Appending an empty string after the last value in a list built in a for loop

Posted 2024-10-04 09:24:08


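I'm trying to scrape a website, and while doing so I want to extract specific information such as the location name, latitude, longitude, and film name. However, when I extract this information across several pages, I can't tell which film each group of three values belongs to.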

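I thought of a way around this: add an empty string after each film's values, so that whenever I reach an empty string I can split the values into a separate list for each film.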

However, I'm having trouble placing the empty string correctly. Here is what I have so far:

import requests
from bs4 import BeautifulSoup

test = ['https://www.latlong.net/location/10-things-i-hate-about-you-locations-250',
 'https://www.latlong.net/location/12-angry-men-locations-818',
 'https://www.latlong.net/location/12-monkeys-locations-501']

for i in range(0, len(test), 1):
    r = requests.get(test[i])
    testone = {'location name':[],'film':[]}
    soup = BeautifulSoup(r.content, 'lxml')
    for th in soup.select("td"):
        testone['location name'].append(th.text.strip())
        testone['location name'].append('')
    for h in soup.select_one("h3"):
        testone['film'].append(h)

However, this appends an empty string after every value:

'location name': ["1117 Broadway (Gil's Music Shop)",
  '',
  '47.252495',
  '',
  '-122.439644',
  '',
  "2715 North Junett St (Kat and Bianca's House)",
  '',
  '47.272591',
  '',
  '-122.474480', ....

What I expect:


'location name': ["1117 Broadway (Gil's Music Shop)",
  '47.252495',
  '-122.439644',
  "2715 North Junett St (Kat and Bianca's House)",
  '47.272591',
  '-122.474480',
  'Aurora Bridge',
  '47.646713',
  '-122.347435',
  'Buckaroo Tavern (closed)',
  '47.657841',
  '-122.350327',
  'Century Ballroom',
  '47.615028',
  '-122.319855',
  'Fremont Place Books (closed)',
  '47.650452',
  '-122.350510',
  'Fremont Troll',
  '47.651093',
  '-122.347435',
  'Gas Works Park',
  '47.645561',
  '-122.334496',
  'Kerry Park',
  '47.629402',
  '-122.360008',
  'Kingdome',
  '47.595993',
  '-122.333649',
  'Paramount Theatre',
  '47.613235',
  '-122.331451',
  'Seattle',
  '47.601871',
  '-122.341248',
  'Stadium High School',
  '47.265991',
  '-122.448570',
  'Tacoma',
  '47.250828',
  '-122.449135',
  '',
  'New York City',
  '40.742298',
  '-73.982559',
  'New York County Courthouse',
  '40.714310',
  '-74.001930',
  '', ................],
 'film': ['10 Things I Hate About You Locations Map','12 Angry Men Locations Map'...]}

2 Answers

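The problem is that you append an empty string '' after every table cell you read. Because the location name, latitude, and longitude are in 3 separate cells, an empty string ends up between each of them.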
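
The effect can be reproduced without any scraping. This is a minimal sketch in which a hard-coded list of cell texts (an assumption for illustration, standing in for the result of soup.select("td") on one page) is processed twice, once with the separator appended inside the loop and once after it:

```python
# Hypothetical cell texts standing in for the <td> elements of one page
cells = ["Aurora Bridge", "47.646713", "-122.347435"]

# Appending '' inside the loop adds one separator per cell
inside = []
for cell in cells:
    inside.append(cell.strip())
    inside.append('')

# Appending '' after the loop adds one separator per page
after = []
for cell in cells:
    after.append(cell.strip())
after.append('')

print(inside)
print(after)
```

Only the second list has the shape shown in the expected output, with a single empty string closing each film's values.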

The best solution is probably to add a counter and store everything in a dictionary instead of two lists:

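import requests
from bs4 import BeautifulSoup

test = ['https://www.latlong.net/location/10-things-i-hate-about-you-locations-250',
'https://www.latlong.net/location/12-angry-men-locations-818',
'https://www.latlong.net/location/12-monkeys-locations-501']

testone = {}  # film title -> list of [name, latitude, longitude] rows
for url in test:
    r = requests.get(url)
    soup = BeautifulSoup(r.content, 'lxml')  # parse before selecting cells
    film = soup.select_one("h3").text.strip()
    cells = [td.text.strip() for td in soup.select("td")]
    testone[film] = []
    # Counter j steps through the cells three at a time: name, latitude, longitude
    for j in range(0, len(cells), 3):
        testone[film].append(cells[j:j + 3])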

This way, you can use testone[<filmname>] to get all the information about a film.
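
Grouping by film also makes it easy to turn the coordinates into numbers. The following is a sketch only: rows_of_three is a hypothetical helper, not part of any library, and it assumes every location contributes exactly three cells (name, latitude, longitude), as the question describes. The sample values are copied from the question's expected output.

```python
def rows_of_three(cells):
    """Group a flat list of cell texts into (name, latitude, longitude) tuples."""
    if len(cells) % 3 != 0:
        raise ValueError("expected a multiple of three cells")
    return [
        (cells[i], float(cells[i + 1]), float(cells[i + 2]))
        for i in range(0, len(cells), 3)
    ]

# Sample values copied from the question's expected output
cells = ["New York City", "40.742298", "-73.982559",
         "New York County Courthouse", "40.714310", "-74.001930"]
locations = {"12 Angry Men Locations Map": rows_of_three(cells)}
print(locations["12 Angry Men Locations Map"][0])
```

The length check fails loudly if a page ever has a row with a different number of cells, rather than silently shifting names and coordinates out of step.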

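Keep append() for the cells: strip() returns a single string, not a list, so extend() would add the text to testone['location name'] one character at a time. Append the empty string once, after the loop over the cells, so that it marks the end of each film's values.
Try this:

import requests
from bs4 import BeautifulSoup

testone = {'location name': [], 'film': []}  # created once, before the loop
for url in test:
    r = requests.get(url)
    soup = BeautifulSoup(r.content, 'lxml')
    for td in soup.select("td"):
        # strip() returns a str, so append() adds one item per cell
        testone['location name'].append(td.text.strip())
    # One empty string per page marks the end of that film's values
    testone['location name'].append('')
    testone['film'].append(soup.select_one("h3").text.strip())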
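
Once the list contains one empty string after each film's values, as the question describes, it can be split back into one list per film. This is a minimal sketch using itertools.groupby from the standard library; split_on_empty is a hypothetical helper name, and the sample data is a shortened version of the question's expected output.

```python
from itertools import groupby

def split_on_empty(values):
    """Split a flat list into sublists at each empty-string separator."""
    return [
        list(group)
        for is_separator, group in groupby(values, key=lambda v: v == '')
        if not is_separator
    ]

# Shortened sample in the shape of the question's expected output
flat = ["Tacoma", "47.250828", "-122.449135", "",
        "New York City", "40.742298", "-73.982559", ""]
films = ["10 Things I Hate About You Locations Map",
         "12 Angry Men Locations Map"]

per_film = dict(zip(films, split_on_empty(flat)))
print(per_film["12 Angry Men Locations Map"])
```

Note that zip pairs the groups with the titles by position, so a page that yields no cells at all would shift the later films. If that can happen, storing the values in a dictionary keyed by film from the start avoids the problem.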
