Splitting JSON data into train and test sets

Posted 2024-06-29 00:58:01


I am trying to prepare the HuffPost news dataset (https://www.kaggle.com/rmisra/news-category-dataset) for a CNN. The dataset is in JSON format, and my data looks like this:

[{"category": "CRIME", "headline": "There Were 2 Mass Shootings In Texas Last Week, But Only 1 On TV", "authors": "Melissa Jeltsen", "link": "https://www.huffingtonpost.com/entry/texas-amanda-painter-mass-shooting_us_5b081ab4e4b0802d69caad89", "short_description": "She left her husband. He killed their children. Just another day in America.", "date": "2018-05-26"} , {"category": "ENTERTAINMENT", "headline": "Will Smith Joins Diplo And Nicky Jam For The 2018 World Cup's Official Song", "authors": "Andy McDonald", "link": "https://www.huffingtonpost.com/entry/will-smith-joins-diplo-and-nicky-jam-for-the-official-2018-world-cup-song_us_5b09726fe4b0fdb2aa541201", "short_description": "Of course it has a song.", "date": "2018-05-26"} ]

This is the code I am trying. The original source is https://www.kaggle.com/kredy10/simple-lstm-for-text-classification, and I want to fit an LSTM on this data.

import pandas as pd
import json
with open('News_Category_Dataset_v2.json', 'r') as f:
    train = json.load(f)

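Now I want to split the data into training and test sets, but I don't know how to get the arrays (X and Y) that the split expects. Can anyone help?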

X_train,X_test,Y_train,Y_test = train_test_split(X,Y,test_size=0.15)

1 answer
User
#1 · Posted 2024-06-29 00:58:01

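This is how I did it: I first use train_test_split to get the train (70%) and test (30%) sets, then apply the same function to the test set to divide it in half into test (50%) and validation (50%) sets. Overall that gives a 70/15/15 split.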

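from sklearn.model_selection import train_test_split

# Each line of the file is treated as one sample
with open('file_name') as f:
    lines = f.readlines()

# 70% train, 30% held out
train, test = train_test_split(lines, test_size=0.3)
# Split the held-out 30% in half: validation and test
val, test = train_test_split(test, test_size=0.5)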
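
The snippet above splits raw lines, while the question asks for separate X and Y arrays. Below is a minimal sketch of one way to get them, assuming the JSON has already been parsed into a list of dicts like the sample in the question. The `records` list here is made-up stand-in data, and using `headline` plus `short_description` as the text input is an assumption, not something the question specifies.

```python
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the parsed JSON: a list of dicts with the same
# keys as the records in the question (the values are invented).
records = [
    {
        "category": "CRIME" if i % 2 == 0 else "ENTERTAINMENT",
        "headline": f"headline {i}",
        "short_description": f"description {i}",
    }
    for i in range(20)
]

# Text features and labels pulled out of each record, in the same order.
X = [r["headline"] + " " + r["short_description"] for r in records]
Y = [r["category"] for r in records]

# First split: 70% train, 30% held out. Passing X and Y together keeps
# each text paired with its label.
X_train, X_rest, Y_train, Y_rest = train_test_split(
    X, Y, test_size=0.3, random_state=42
)

# Second split: halve the held-out part into validation and test.
X_val, X_test, Y_val, Y_test = train_test_split(
    X_rest, Y_rest, test_size=0.5, random_state=42
)

print(len(X_train), len(X_val), len(X_test))
```

Fixing `random_state` makes the split reproducible. It can be dropped if a different shuffle on every run is wanted.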

Hope this helps.
