Strange Python csv module behavior: records are not split correctly

>>> len(open('cities5000.txt').read().splitlines())
46955
>>> len(list(csv.reader(open('cities5000.txt'))))
46955
# but here comes some fun
>>> len(list(csv.reader(open('cities5000.txt'), delimiter='\t')))
46048

2 answers

User

Answer #1 · edited 2024-09-30 05:21:12

I think the default delimiter is defined by the default dialect, "excel" (https://docs.python.org/2/library/csv.html#csv-fmt-params).

I don't know which delimiter that is, but I think defining the delimiter yourself gives you better control over how the data is split.
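For what it's worth, the defaults of the "excel" dialect can be checked from the standard library. A minimal sketch (the variable names are only for illustration):

```python
import csv

# Look up the dialect that csv.reader falls back to when none is given.
excel = csv.get_dialect('excel')

default_delimiter = excel.delimiter  # the field separator
default_quotechar = excel.quotechar  # the character that opens and closes quoted fields

print(repr(default_delimiter), repr(default_quotechar))
```

Passing delimiter='\t' only changes the separator; the quote character of the dialect stays in effect unless it is overridden as well.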

I could also imagine problems with some city names and UTF-8 encoding (not sure, just a hint for further investigation).
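One way to rule encoding in or out is to open the file with an explicit encoding. The sketch below is for Python 3 and assumes the data is UTF-8; the sample rows and the temporary file name are made up and are not taken from cities5000.txt.

```python
import csv
import os
import tempfile

# Made-up sample rows with non-ASCII city names (an assumption for illustration).
sample = "1\tZürich\tCH\n2\tMálaga\tES\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(sample)

    # Explicit encoding, plus newline="" as the csv documentation recommends.
    with open(path, encoding="utf-8", newline="") as f:
        rows = list(csv.reader(f, delimiter="\t"))

print(rows)
```

If the record count is still off with the encoding pinned down like this, the cause is probably somewhere else.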

Edit: a quick Google search turns up https://github.com/oamasood/GeonamesPy. Maybe that helps too.

User

Answer #2 · edited 2024-09-30 05:21:12

The default dialect also specifies a quote character, which can be used to escape newlines. You can override it with quotechar=None.

>>> len(open('cities5000.txt').read().splitlines())
46957
>>> len(list(csv.reader(open('cities5000.txt'), delimiter='\t')))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
_csv.Error: field larger than field limit (131072)
>>> len(list(csv.reader(open('cities5000.txt'), delimiter='\t', quotechar=None)))
46957
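The effect can be reproduced without the real file. The sketch below uses made-up tab-separated data in which one name contains a stray double quote, and it disables quoting with quoting=csv.QUOTE_NONE, which is a more explicit way of getting the same result as quotechar=None.

```python
import csv
import io

# Made-up data: the name in record 2 contains a stray double quote.
data = (
    '1\tAlpha\tAA\n'
    '2\t"Beta Town\tBB\n'
    '3\tGamma\tCC\n'
)

# Default quoting: the stray quote opens a quoted field, so the tabs and
# newlines that follow are treated as part of that field.
default_rows = list(csv.reader(io.StringIO(data), delimiter='\t'))

# Quoting disabled: every physical line becomes one record.
plain_rows = list(
    csv.reader(io.StringIO(data), delimiter='\t', quoting=csv.QUOTE_NONE)
)

print(len(data.splitlines()), len(default_rows), len(plain_rows))
```

With the default settings the reader returns fewer records than there are lines, because the field that starts at the stray quote absorbs the lines after it. On a large file such a field can grow past the limit, which would explain the "field larger than field limit" error above.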
