Cleaning data from a CSV file

Posted 2024-10-02 20:38:23


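I am doing sentiment analysis on cryptocurrency. My task is to clean the data in a CSV file. The data was collected from Twitter and saved to the CSV file, and it has to be cleaned before I can run the sentiment analysis: for example, removing punctuation and URLs and converting the text to lowercase. The rows are tweets.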

I have already imported some useful libraries, such as NLTK (natural language processing), pandas, and numpy.

This is the output of the "Tweets" column.

   ctweet['Tweets'][0:6]



 Out[5]:


    0    RT @TheLTCnews: The @LTCFoundation has publish...
    1    RT @WildchildSings: "https:/ " + /t.co/"FZrGw6xsZU ac..."
    2    RT @HODL_Whale: 5 days until #LitePay launches...
    3    LTC to USD price $211.92 "https:/" + /t.co/"CFjg1mIg..."
    4    LTC to BTC price B0.020218 "https:/" +/t.co/"XPL8NI..."
    5    LTC to GBP price £151.89 "https:/" +/t.co/"iOIbhgyd..."
    6    Litecoin dropped into the bear zone as sugges...
    Name: Tweets, dtype: object

Note: the output contains URLs. Stack Overflow won't let me post them, so I altered each URL by adding quotes and "//".

My next task is to clean the data. This is the preprocessing code.

[preprocessing code not preserved in the source]

The code above removes punctuation and URLs, converts the text to lowercase, extracts usernames, and so on. When I run it, it raises an error.

TypeError                                 Traceback (most recent call last)
<ipython-input-3-8254e078073a> in <module>()
      5 for i in range(len(ctweet['Tweets'])):
      6     try:
----> 7         ctweet['tweetos'][i] = ctweet['Tweets'].str.split(' ')[i][0]
      8     except AttributeError:
      9         ctweet['tweetos'][i] = 'other'

TypeError: 'float' object has no attribute '__getitem__'

What does this error mean, and how can I fix it? I am using Jupyter Notebook 5.4.1.

Update

AttributeError                            Traceback (most recent call last)
<ipython-input-7-bb6b24f62739> in <module>()
     16 # remove URLs, RTs, and twitter handles
     17 for i in range(len(ctweet['Tweets'])):
---> 18     ctweet['Tweets'][i] = " ".join([word for word in ctweet['Tweets'][i].split()
     19                                 if 'http' not in word and '@' not in word and '<' not in word])
     20 

AttributeError: 'float' object has no attribute 'split'
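
Both tracebacks complain about a 'float' object, which most likely means some rows in the "Tweets" column are missing values: pandas reads an empty CSV cell as NaN, and NaN is a float. The sketch below shows one way to check for that. It assumes ctweet is a pandas DataFrame, and the two sample rows are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample standing in for the real CSV; the second row is a missing tweet.
ctweet = pd.DataFrame(
    {"Tweets": ["RT @HODL_Whale: 5 days until #LitePay launches", np.nan]}
)

# Boolean mask that is True wherever the tweet is missing.
missing = ctweet["Tweets"].isna()

# A missing value is NaN, and NaN is a float, so it has no .split()
# and cannot be indexed with [0].
print(type(ctweet["Tweets"][1]))

# Row labels of the missing tweets.
print(ctweet[missing].index.tolist())
```

If the second print lists any rows, those are the ones the loop fails on. Note that the first loop only catches AttributeError, so the TypeError raised by indexing into NaN is not handled.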


1 answer

#1 · Posted 2024-10-02 20:38:23

It looks like ctweet is a dictionary, so you need to index the element first, like this:

ctweet['tweetos'][i] = ctweet['Tweets'][i].split(' ')[0]

instead of:

ctweet['tweetos'][i] = ctweet['Tweets'].str.split(' ')[i][0]
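
Changing the indexing may not be enough if some tweets are missing, because a missing value is NaN (a float) and has no string methods. Here is a minimal sketch of one way to handle this, assuming ctweet is a pandas DataFrame. The sample rows, the clean_tweet helper, and the "clean" column are hypothetical names introduced for illustration; the token filter mirrors the condition shown in the second traceback.

```python
import string

import numpy as np
import pandas as pd

# Hypothetical sample data; the real frame would be loaded from the CSV file.
ctweet = pd.DataFrame({
    "Tweets": [
        "RT @HODL_Whale: 5 days until #LitePay launches!",
        np.nan,  # an empty CSV cell is read as NaN (a float)
        "LTC to USD price $211.92 https://t.co/abc",
    ]
})

# Replace missing tweets with empty strings so every value is a str.
tweets = ctweet["Tweets"].fillna("")

# First whitespace-separated token of each tweet, or 'other' when there is none.
ctweet["tweetos"] = tweets.str.split().str[0].fillna("other")


def clean_tweet(text):
    """Drop URL/handle/markup tokens, strip punctuation, and lowercase."""
    words = [
        word for word in text.split()
        if "http" not in word and "@" not in word and "<" not in word
    ]
    no_punct = " ".join(words).translate(str.maketrans("", "", string.punctuation))
    return no_punct.lower()


# Hypothetical output column holding the cleaned text.
ctweet["clean"] = tweets.apply(clean_tweet)
print(ctweet[["tweetos", "clean"]])
```

Filling the missing values once up front avoids the per-row try/except, and the vectorized .str methods replace the explicit loop over range(len(...)). Dropping the empty rows with dropna instead of fillna would also work if those tweets are not needed.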