Saving TF-IDF results to a CSV file

Posted 2024-09-28 22:13:12

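This is my first time asking a question, and English is not my native language, so please forgive any mistakes. I scraped scripts from websites and computed TF-IDF features for them. I would like to save the result to a CSV file, including all rows and columns. Thanks for your help.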

import pandas as pd
import nltk
import csv

from sklearn.feature_extraction.text import TfidfVectorizer
data = pd.read_csv("script.csv", header=None)
data.columns = ['website','script']

tfidf2 = TfidfVectorizer(min_df=5,max_df= 0.9,max_features=3000,sublinear_tf=True)
X = tfidf2.fit_transform(data['script'])
df = pd.DataFrame(X.toarray(), columns=tfidf2.get_feature_names())
print(df)
with open ("tf_idf.csv",'a', newline='') as file:
    writer = csv.writer(file)
    writer.writerow([df])
    file.close()

The output is:

          10        12        14  ...  undefined       url     width
0   0.109124  0.184763  0.109124  ...   0.229009  0.000000  0.182132
1   0.000000  0.000000  0.000000  ...   0.000000  0.000000  0.000000
2   0.000000  0.000000  0.000000  ...   0.000000  0.146687  0.000000
3   0.186309  0.000000  0.088777  ...   0.000000  0.000000  0.070605
4   0.000000  0.000000  0.000000  ...   0.000000  0.078447  0.000000
5   0.000000  0.145435  0.000000  ...   0.000000  0.226503  0.195839
6   0.000000  0.000000  0.000000  ...   0.125661  0.157894  0.099939
7   0.109124  0.184763  0.109124  ...   0.229009  0.000000  0.182132
8   0.000000  0.000000  0.000000  ...   0.000000  0.000000  0.000000
9   0.000000  0.000000  0.000000  ...   0.000000  0.145549  0.000000
10  0.185179  0.000000  0.088239  ...   0.000000  0.000000  0.070177
11  0.000000  0.000000  0.000000  ...   0.000000  0.078447  0.000000
12  0.000000  0.145435  0.000000  ...   0.000000  0.226503  0.195839
13  0.000000  0.000000  0.000000  ...   0.125661  0.157894  0.099939
14  0.228102  0.108692  0.184031  ...   0.283624  0.136572  0.000000

[15 rows x 80 columns]
[Finished in 5.8s]