Handling commas inside strings in a CSV

Posted 2024-10-03 23:18:56


I am doing some text mining in Python with this dataset: https://raw.githubusercontent.com/jpatokal/openflights/master/data/airports.dat

Everything is well formatted, but some entries look like this:

6898,"RAAF Williams, Laverton Base","Laverton","Australia",\N,"YLVT",-37.86360168457031,144.74600219726562,18,10,"O","Australia/Hobart","airport","OurAirports"   
6899,"Nowra Airport","Nowra","Australia","NOA","YSNW",-34.94889831542969,150.53700256347656,400,10,"O","Australia/Sydney","airport","OurAirports"

These have a comma inside the name, which produces irregular lists, because a single field (the name) gets split into several elements.
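The symptom can be reproduced with a short sketch. The asker's own code is not shown here, so this assumes, purely for illustration, that each line is split with a plain `str.split(",")`; the variable names `line` and `fields` are hypothetical.

```python
# Hypothetical illustration: naive comma splitting on one record from the question.
# A raw string is used so that the literal \N in the data is kept as-is.
line = r'6898,"RAAF Williams, Laverton Base","Laverton","Australia",\N,"YLVT",-37.86360168457031,144.74600219726562,18,10,"O","Australia/Hobart","airport","OurAirports"'

# split(",") cuts at every comma, including the one inside the quoted airport name.
fields = line.split(",")

print(len(fields))   # one more than the 14 real columns
print(fields[1:3])   # the airport name ends up in two pieces
print(fields[4])     # the country has shifted from index 3 to index 4
```

With this kind of splitting, every column after the name is off by one for the affected rows, and the quote characters stay attached to the values.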

The code that assigns each line to a list:

^{pr2}$

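My main problem is that linea[3] should be the country, Australia, in this case, but it returns {}.

I also tried the csv library, and it made almost no difference.

Also relevant: my code returns this for that entry: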

['6898', 'RAAF Williams, Laverton Base', 'Laverton', 'Australia', '\\N', 'YLVT', '-37.86360168457031', '144.74600219726562', '18', '10', 'O', 'Australia/Hobart', 'airport', 'OurAirports']

2 answers

If switching to another package is an option, you can read the file with pandas:

import pandas as pd

# The file has no header row, so header=None keeps the first record as data
df = pd.read_csv(filename, sep=',', header=None)

print(df)

     0                             1         2          3    4     5          6           7    8   9  10                11       12              13
0  6898  RAAF Williams, Laverton Base  Laverton  Australia   \N  YLVT -37.863602  144.746002   18  10  O  Australia/Hobart  airport  OurAirports   
1  6899                 Nowra Airport     Nowra  Australia  NOA  YSNW -34.948898  150.537003  400  10  O  Australia/Sydney  airport     OurAirports

# to_numpy() returns a 2-D NumPy array, close to the list of lists you get from the csv module
# (as_matrix() was removed in pandas 1.0; use df.values.tolist() for an actual list of lists)
df.to_numpy()

[[6898 'RAAF Williams, Laverton Base' 'Laverton' 'Australia' '\\N' 'YLVT'
  -37.86360168457031 144.74600219726562 18 10 'O' 'Australia/Hobart'
  'airport' 'OurAirports   ']
 [6899 'Nowra Airport' 'Nowra' 'Australia' 'NOA' 'YSNW' -34.948898315429695
  150.53700256347656 400 10 'O' 'Australia/Sydney' 'airport' 'OurAirports']]
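A self-contained sketch of the pandas approach, using the two records from the question instead of the real file. The in-memory `sample` buffer and the `rows` name are assumptions made for illustration; with the actual dataset you would pass the file path to `pd.read_csv`.

```python
import io

import pandas as pd

# Two records copied from the question; the raw string keeps the literal \N.
sample = io.StringIO(
    r'''6898,"RAAF Williams, Laverton Base","Laverton","Australia",\N,"YLVT",-37.86360168457031,144.74600219726562,18,10,"O","Australia/Hobart","airport","OurAirports"
6899,"Nowra Airport","Nowra","Australia","NOA","YSNW",-34.94889831542969,150.53700256347656,400,10,"O","Australia/Sydney","airport","OurAirports"
'''
)

# header=None: the data has no header row, so columns are numbered 0..13.
df = pd.read_csv(sample, header=None)

# Convert to a plain list of lists, similar to what the csv module yields.
rows = df.values.tolist()

print(df.shape)
print(rows[0][1], "|", rows[0][3])
```

The quoted comma stays inside column 1, so column 3 holds the country for both records.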

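Python has long supported CSV parsing. Refer to this link.

You need to set quotechar in the parser. Basically, any comma between two quote characters is treated as part of the field rather than as a delimiter.

For example: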

import csv

# newline='' lets the csv module handle line endings itself
with open(filename, newline='') as csvfile:
    csvreader = csv.reader(csvfile, delimiter=',', quotechar='"')
    for row in csvreader:
        # do something with the row
        print(row)
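To check the behaviour without downloading the file, the same reader can be fed the two records from the question directly. This is a minimal sketch: the `lines` list stands in for the open file (an assumption for illustration), which works because `csv.reader` accepts any iterable of strings.

```python
import csv

# Two records copied from the question; raw strings keep the literal \N.
lines = [
    r'6898,"RAAF Williams, Laverton Base","Laverton","Australia",\N,"YLVT",-37.86360168457031,144.74600219726562,18,10,"O","Australia/Hobart","airport","OurAirports"',
    r'6899,"Nowra Airport","Nowra","Australia","NOA","YSNW",-34.94889831542969,150.53700256347656,400,10,"O","Australia/Sydney","airport","OurAirports"',
]

# The comma inside the quoted name is kept as part of the field.
rows = list(csv.reader(lines, delimiter=",", quotechar='"'))

for row in rows:
    print(len(row), row[1], "->", row[3])
```

Both rows come back with 14 fields and the country at index 3. Note that every value is a string here; numeric columns need an explicit `int()` or `float()` conversion.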
