How to ignore commas inside sentences when reading a CSV file with Python pandas

Published 2024-06-26 10:26:56


I have a CSV file that I want to read with the pandas library in Python.

Here are the header and the first row of my file.

content,topic,class,NRC-Affect-Intensity-anger_Score,NRC-Affect-Intensity-fear_Score,NRC-Affect-Intensity-sadness_Score,NRC-Affect-Intensity-joy_Score
'@stellargirl I loooooooovvvvvveee my Kindle2. Not that the DX is cool, but the 2 is fantastic in its own right.',kindle2,positive,0,0,0,0

It is comma-separated and has 7 fields. When I try to read this file, I get an error:

^{pr2}$

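I think it is complaining about the commas in the first column (the part between the ' characters).

Is it possible to read this file correctly?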

head -15 less proc_data.csv
head: less: No such file or directory
==> proc_data.csv <==
content,topic,class,NRC-Affect-Intensity-anger_Score,NRC-Affect-Intensity-fear_Score,NRC-Affect-Intensity-sadness_Score,NRC-Affect-Intensity-joy_Score
'@stellargirl I loooooooovvvvvveee my Kindle2. Not that the DX is cool, but the 2 is fantastic in its own right.',kindle2,positive,0,0,0,0
'Reading my kindle2...  Love it... Lee childs is good read.',kindle2,positive,0,0,0,1.375
'Ok, first assesment of the #kindle2 ...it fucking rocks!!!',kindle2,positive,0,0,0,0
'@kenburbary You\'ll love your Kindle2. I\'ve had mine for a few months and never looked back. The new big one is huge! No need for remorse! :)',kindle2,positive,0,0,0.594,1.125
'@mikefish  Fair enough. But i have the Kindle2 and I think it\'s perfect  :)',kindle2,positive,0,0,0,0.719
'@richardebaker no. it is too big. I\'m quite happy with the Kindle2.',kindle2,positive,0,0,0,0.788
'Fuck this economy. I hate aig and their non loan given asses.',aig,negative,0.828,0.484,0.656,0
'Jquery is my new best friend.',jquery,positive,0,0,0,0.471
'Loves twitter',twitter,positive,0,0,0,0
'how can you not love Obama? he makes jokes about himself.',obama,positive,0,0,0,0.828
'Check this video out -- President Obama at the White House Correspondents\' Dinner ',obama,neutral,0,0,0,0.109
'@Karoli I firmly believe that Obama/Pelosi have ZERO desire to be civil.  It\'s a charade and a slogan, but they want to destroy conservatism',obama,negative,0,0,0,0.484
'House Correspondents dinner was last night whoopi, barbara &amp; sherri went, Obama got a standing ovation',obama,positive,0,0,0.078,0
'Watchin Espn..Jus seen this new Nike Commerical with a Puppet Lebron..sh*t was hilarious...LMAO!!!',nike,positive,0,0,0,0.672

1 Answer
User
#1 · Posted 2024-06-26 10:26:56

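You are splitting the columns on commas, but commas can also occur inside the strings.

This is normally handled by the quotechar parameter of read_csv, which defaults to quotechar='"'. Your CSV file uses single quotes, however, so you need to change it to quotechar="'".

That, however, runs into another problem: the strings contain apostrophes that are escaped with a preceding backslash. The escapechar parameter of pd.read_csv is None by default, so you have to set it as well.

Putting it all together, we end up with: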

pd.read_csv('proc_data.csv', sep=',', quotechar="'", escapechar='\\')

Note that the escapechar itself has to be escaped here: the Python string literal '\\' is a single backslash character.
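As a quick check, here is a minimal sketch that applies the same quotechar and escapechar settings to a small in-memory sample. The sample text is a shortened, hypothetical stand-in for proc_data.csv (fewer columns, two rows adapted from the question), read through io.StringIO instead of from disk.

```python
import io

import pandas as pd

# Hypothetical, shortened stand-in for proc_data.csv: single-quoted text
# fields that contain commas and backslash-escaped apostrophes.
sample = (
    "content,topic,class,joy\n"
    "'Not that the DX is cool, but the 2 is fantastic.',kindle2,positive,0\n"
    "'I\\'m quite happy with the Kindle2.',kindle2,positive,0.788\n"
)

df = pd.read_csv(
    io.StringIO(sample),
    sep=",",
    quotechar="'",    # fields are wrapped in single quotes
    escapechar="\\",  # a backslash escapes an apostrophe inside a field
)

print(df.shape)
print(df["content"].tolist())
```

The comma in the first row stays inside the content column, and the escaped apostrophe in the second row is read as a plain apostrophe.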

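If you do not care much about individual rows and just want to read in as much as can be parsed successfully, you can add the keyword error_bad_lines=False (deprecated in pandas 1.3 and removed in 2.0, where on_bad_lines='warn' takes its place). Then work out from the warnings whether the skipped lines can be fixed or have to be dropped.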
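A minimal sketch of skipping unparseable rows, assuming pandas 1.3 or newer, where the on_bad_lines parameter is available. The sample data is hypothetical: its last row has one field too many, so it cannot be parsed against the three-column header.

```python
import io

import pandas as pd

# Hypothetical sample: the last row has four fields but the header has three.
sample = (
    "content,topic,class\n"
    "'Loves twitter',twitter,positive\n"
    "'Jquery is my new best friend.',jquery,positive,unexpected\n"
)

df_ok = pd.read_csv(
    io.StringIO(sample),
    quotechar="'",
    escapechar="\\",
    on_bad_lines="skip",  # "warn" also skips, and reports each bad line
)

print(len(df_ok))
```

Using "warn" instead of "skip" is the closer match to the advice above, since the reported line numbers show which rows to inspect in the source file.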
