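I have tried all sorts of ways to read the tweets in this file (sample below). The Unicode character Victory Hand (U+270C) does not seem to want to parse. Here is a sample of the data.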
399491624029274112,Kyle aka K-LO,I unlocked 2 Xbox Live achievements in WWE 2K14! http://t.co/wRIxZTjYWg,False,0,Raptr,,,,,2013,11,10,11,0,0,0,0,1,0,0,0,0,0
399491626584014848,Dots Group LLC,GeekWire Radio: Amazon vs. author Xbox One first take and favorite iPad apps - GeekWire http://t.co/jbbryoHpHe,False,0,IFTTT,,,,,2013,11,10,11,0,0,0,0,1,0,0,0,0,2
399491630149169152,BETTINGGENIUS!,RT @xJohn69: Sergio Ramos giveaway!; XBOX + PS3; ; -RT; -Follow me and @NeillWagers; -S/Os appreciated; ; Goodluck http://t.co/D997faGSB5,False,0,Twitter for iPad,,,,,2013,11,10,11,0,1,1,0,1,0,0,0,0,2
399491635735953408,Princess of TV,Toy Story of Terror is amaze balls. Thanks Xbox for the free NowTV #disneyweekend,False,0,Twitter for iPhone,,,,,2013,11,10,11,0,2,0,0,1,0,0,0,0,2
399491654136369152,Sam Hambre,'9 Things You Should Know Before Buying a PlayStation 4' http://t.co/Q3Ma1R83cF,False,0,Buffer,,,,,2013,11,10,11,0,7,0,1,0,0,0,0,0,0
399491655780167680,Rhi ✌,@Escape2theMoon that's done what? im not on rn obvs i dont even have access to an xbox :c ?,False,0,web,399490703761223680,Escape2theMoon,1404625770,,2013,11,10,11,0,7,0,0,1,0,0,0,0,0
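You can see the Victory Hand in the second field of the last tweet.
What I want to do is build one long string out of all the tweets. I cannot even get a simple script to do this; the loop that fails is visible in the traceback below.
I have tried many permutations of importing, encoding, joining, converting to unicode, and so on, but I cannot get past the Victory Hand. The error I keep getting is: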
UnicodeEncodeError Traceback (most recent call last)
<ipython-input-114-fd9b136abd74> in <module>()
----> 1 for record in data:
2 tweets = tweets + ' ' + record[2].encode('utf-8', 'replace')
UnicodeEncodeError: 'ascii' codec can't encode character u'\u270c' in position 23: ordinal not in range(128)
What am I doing wrong? How can I join all these tweets into one string without running into Unicode problems?
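The problem is csv.reader, which tries to convert the unicode back to ascii. The Python 2 csv docs note that the module does not support Unicode input and that all input should be UTF-8 or printable ASCII. As suggested there, you can use the unicode_csv_reader recipe from the docs examples.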
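The idea behind that recipe is to encode each unicode line to UTF-8 before it reaches csv.reader and to decode every cell again afterwards. What follows is a minimal sketch of that idea, not the docs' exact text: the version check is an assumption added here so the same helper also runs on Python 3, where csv handles text natively.

```python
import csv
import sys

PY2 = sys.version_info[0] == 2


def unicode_csv_reader(unicode_lines, dialect=csv.excel, **kwargs):
    """Yield rows of unicode cells from an iterable of unicode lines."""
    if PY2:
        # Python 2's csv module only handles byte strings, so feed it
        # UTF-8 bytes and decode each cell back to unicode afterwards.
        encoded = (line.encode('utf-8') for line in unicode_lines)
        for row in csv.reader(encoded, dialect=dialect, **kwargs):
            yield [cell.decode('utf-8') for cell in row]
    else:
        # Python 3's csv module works on text directly (added assumption,
        # not part of the original Python 2 recipe).
        for row in csv.reader(unicode_lines, dialect=dialect, **kwargs):
            yield row
```

Because the helper is a generator, rows are produced lazily, so the file it reads from has to stay open until iteration finishes.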
With the unicode_csv_reader helper utility, your code needs only slight changes: use a closure and join instead of the loop.
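A sketch of how the joined string might be built is shown here. The function name join_tweets, the path argument, and the default column index 2 (taken from record[2] in the traceback) are assumptions for illustration; a compact copy of the helper is included so the block runs on its own.

```python
import csv
import io
import sys


def unicode_csv_reader(unicode_lines, **kwargs):
    # Compact copy of the helper so this sketch is self-contained.
    if sys.version_info[0] >= 3:
        return csv.reader(unicode_lines, **kwargs)
    encoded = (line.encode('utf-8') for line in unicode_lines)
    return ([cell.decode('utf-8') for cell in row]
            for row in csv.reader(encoded, **kwargs))


def join_tweets(path, column=2):
    """Join one column of a UTF-8 CSV file into a single unicode string."""
    # io.open decodes the file as UTF-8 on both Python 2 and Python 3.
    with io.open(path, encoding='utf-8', newline='') as f:
        rows = unicode_csv_reader(f)
        # join consumes the reader while the file is still open;
        # rows too short to contain the column are skipped.
        return u' '.join(row[column] for row in rows if len(row) > column)
```

Building the string with a single join avoids the repeated concatenation of the original loop, and no explicit encode call is needed because every cell is already unicode.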