Answers to "Reading a CSV with a regex separator"

Reading a CSV with a regex separator


<p>I have been trying to read a custom CSV file like this:</p>
<pre><code>6 Rotterdam NLD Zuid-Holland 593321
19 Zaanstad NLD Noord-Holland 135621
214 Porto Alegre BRA Rio Grande do Sul 1314032
397 Lauro de Freitas BRA Bahia 109236
547 Dobric BGR Varna 100399
552 Bujumbura BDI Bujumbura 300000
554 Santiago de Chile CHL Santiago 4703954
626 al-Minya EGY al-Minya 201360
646 Santa Ana SLV Santa Ana 139389
762 Bahir Dar ETH Amhara 96140
123 Chicago 10000
222 New York 200000
</code></pre>
<p>I tried the regex on <a href="https://regex101.com/" rel="nofollow noreferrer">https://regex101.com/</a>, and the following code works:</p>
<h2>This works</h2>
<pre class="lang-py prettyprint-override"><code># https://regex101.com/
import re

s = "6 Rotterdam NLD Zuid-Holland 593321 "
pat = r'(\d+)\s+([\D]+)\s(\d+)\s+'
m = re.match(pat, s)
m.groups()
# ('6', 'Rotterdam NLD Zuid-Holland', '593321')
</code></pre>
<p>This gives the correct result, but when I use the same idea as the separator in pandas <code>read_csv</code>, it somehow does not work.</p>
<h2>My attempt</h2>
<pre class="lang-py prettyprint-override"><code>import numpy as np
import pandas as pd
from io import StringIO

s = """6 Rotterdam NLD Zuid-Holland 593321
19 Zaanstad NLD Noord-Holland 135621
214 Porto Alegre BRA Rio Grande do Sul 1314032
397 Lauro de Freitas BRA Bahia 109236
547 Dobric BGR Varna 100399
552 Bujumbura BDI Bujumbura 300000
554 Santiago de Chile CHL Santiago 4703954
626 al-Minya EGY al-Minya 201360
646 Santa Ana SLV Santa Ana 139389
762 Bahir Dar ETH Amhara 96140
123 Chicago 10000
222 New York 200000
"""

sep = r'(\d+)\s+|([\D]+)\s+|(\d+)\s+'
df = pd.read_csv(StringIO(s), sep=sep, engine='python')
df
</code></pre>
<p>I get lots of NaN values. How can I get just 3 columns?</p>
<p><code>Column names are: ID CITY POPULATION</code></p>
<h2>Similar question</h2>
<ul>
<li><a href="https://stackoverflow.com/questions/61110997/how-to-read-the-custom-table-in-pandas-which-has-number-string-number-number/61111148#61111148">How to read the custom table in pandas which has number string number number?</a></li>
</ul>
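A minimal standard-library sketch of what likely goes wrong, assuming that pandas' Python engine splits each stripped line on a regex `sep` the way `re.split` does. `re.split` also returns the text of every capturing group in the pattern (or `None` for a group that did not take part in the match), so the question's separator, built entirely from capturing groups, turns into extra fields. The names `FIELD_SEP`, `split_lines` and `parse_records` are hypothetical and only for illustration. The idea is to split only on whitespace that touches a number, using lookarounds, which match without capturing anything.

```python
import re

# Sample records copied from the question, one record per line.
DATA = """6 Rotterdam NLD Zuid-Holland 593321
19 Zaanstad NLD Noord-Holland 135621
214 Porto Alegre BRA Rio Grande do Sul 1314032
397 Lauro de Freitas BRA Bahia 109236
547 Dobric BGR Varna 100399
552 Bujumbura BDI Bujumbura 300000
554 Santiago de Chile CHL Santiago 4703954
626 al-Minya EGY al-Minya 201360
646 Santa Ana SLV Santa Ana 139389
762 Bahir Dar ETH Amhara 96140
123 Chicago 10000
222 New York 200000
"""

# The separator from the question: every alternative is a capturing group.
QUESTION_SEP = r'(\d+)\s+|([\D]+)\s+|(\d+)\s+'

# Hypothetical alternative: only whitespace preceded or followed by a digit
# separates fields. Lookarounds do not capture, so nothing extra is returned.
FIELD_SEP = r'\s+(?=\d)|(?<=\d)\s+'


def split_lines(text, sep):
    """Split every non-empty line on a regex separator with re.split.

    Assumption: this mirrors how pandas' Python engine applies a regex sep.
    """
    return [re.split(sep, line.strip()) for line in text.splitlines() if line.strip()]


def parse_records(text):
    """Return (ID, CITY, POPULATION) tuples using the lookaround separator."""
    rows = []
    for id_, city, population in split_lines(text, FIELD_SEP):
        rows.append((int(id_), city, int(population)))
    return rows


if __name__ == "__main__":
    # Capturing groups come back as extra fields (None where a group did not
    # participate), which is probably where the NaN columns come from.
    print(split_lines(DATA, QUESTION_SEP)[0])
    # Without capturing groups, every line yields exactly three fields.
    for row in parse_records(DATA):
        print(row)
```

If that assumption about the Python engine holds, the same pattern should also work directly as `pd.read_csv(StringIO(s), sep=r'\s+(?=\d)|(?<=\d)\s+', engine='python', names=['ID', 'CITY', 'POPULATION'])`; passing `names` also stops the first record from being read as the header. The approach relies on no city or region name containing a digit.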


1 answer

Anonymous, 1 day ago

Skills: python, mysql, java
