<p>I am doing some text mining in Python with this dataset:
<a href="https://raw.githubusercontent.com/jpatokal/openflights/master/data/airports.dat" rel="nofollow noreferrer">https://raw.githubusercontent.com/jpatokal/openflights/master/data/airports.dat</a></p>
<p>Most of the file is well formatted, but some entries look like this:</p>
<pre><code>6898,"RAAF Williams, Laverton Base","Laverton","Australia",\N,"YLVT",-37.86360168457031,144.74600219726562,18,10,"O","Australia/Hobart","airport","OurAirports"
6899,"Nowra Airport","Nowra","Australia","NOA","YSNW",-34.94889831542969,150.53700256347656,400,10,"O","Australia/Sydney","airport","OurAirports"
</code></pre>
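<p>These entries have a comma inside the name field. Splitting on commas therefore produces lists of irregular length, because the single name field is broken into several elements.</p>
<p>The code that turns each line into a list:</p>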
^{pr2}$
<p>My main problem is that <code>linea[3]</code> should be the country, <code>Australia</code>, in this example, but it returns {<cd3>}.</p>
<p>I also tried the csv library, and it made almost no difference.</p>
<p>Also relevant: my code returns this value for that entry:</p>
<pre><code>['6898', 'RAAF Williams, Laverton Base', 'Laverton', 'Australia', '\\N', 'YLVT', '-37.86360168457031', '144.74600219726562', '18', '10', 'O', 'Australia/Hobart', 'airport', 'OurAirports']
</code></pre>
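<p>For comparison, here is a minimal sketch of how the sample row behaves under a plain <code>split(",")</code> versus <code>csv.reader</code> from the standard library. The original parsing code is not shown above, so the variable <code>sample</code> and the helper <code>parse_rows</code> are assumptions made for illustration, not the code from the question. With default settings, <code>csv.reader</code> treats a comma inside double quotes as part of the field, so the country should land at index 3.</p>

```python
import csv
import io

# Sample row copied from the question; the quoted name field contains a comma.
# "\\N" in this Python literal is a backslash followed by N, as in the data file.
sample = (
    '6898,"RAAF Williams, Laverton Base","Laverton","Australia",\\N,"YLVT",'
    '-37.86360168457031,144.74600219726562,18,10,"O","Australia/Hobart",'
    '"airport","OurAirports"'
)


def parse_rows(text):
    """Hypothetical helper: parse CSV text, keeping quoted commas in one field."""
    return list(csv.reader(io.StringIO(text)))


# Naive split: the comma inside the quoted name creates an extra element,
# and the surrounding double quotes stay attached to each value.
naive = sample.split(",")

# csv.reader: the default quotechar '"' keeps the name as a single field.
linea = parse_rows(sample)[0]

print(len(naive), naive[3])
print(len(linea), linea[3])
```

<p>If the real file is read from disk, the same reader can be given a file object opened with <code>newline=""</code> and an explicit encoding, which is what the csv module documentation recommends.</p>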