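I have some Python code that fetches the right data from the web, but the guests column holds several strings per cell and the code currently only picks up one of them. How can I iterate over the list in that column's cell and return the three guests as separate columns, Guest1, Guest2 and Guest3? Thanks.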
import requests
import pandas as pd
from bs4 import BeautifulSoup
import numpy as np

df = pd.DataFrame(columns=['NoInSeason', 'Guests', 'Winner', 'OriginalAirDate'])
page = requests.get("https://en.wikipedia.org/wiki/List_of_QI_episodes")
soup = BeautifulSoup(page.content, "lxml")
my_tables = soup.find_all("table", {"class": "wikitable plainrowheaders wikiepisodetable"})
for table in my_tables:
    table_rows = table.find_all("tr")
    for tr in table_rows:
        td = tr.find_all("td")
        # Only rows with exactly five cells are treated as episode rows
        if len(td) == 5:
            NoInSeason = td[0].find(text=True)
            Guests = td[2].find_all(text=True)  # every text node in the guests cell
            Winner = td[3].find(text=True)
            OriginalAirDate = td[4].find(text=True)
            if len(Guests) == 3:
                Guest1 = Guests[0]
                Guest2 = Guests[1]
                Guest3 = Guests[2]
                row = {'NoInSeason': NoInSeason, 'Guest1': Guest1, 'Guest2': Guest2, 'Guest3': Guest3, 'Winner': Winner, 'OriginalAirDate': OriginalAirDate}
                # DataFrame.append was removed in pandas 2.0; concatenate a one-row frame instead
                df = pd.concat([df, pd.DataFrame([row])], ignore_index=True)
df.to_csv("output.csv")
print(df)
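Is this what you are looking for?
Edit: I see you have modified the code. Does it work now? Wouldn't it be better to declare Guest1, Guest2 and Guest3 as columns of the DataFrame, so that you don't end up with a Guests column full of NaN?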
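In case it helps, here is a rough sketch of how one guests cell could be split into Guest1, Guest2 and Guest3 fields. It assumes the text nodes from the cell may include separators such as ", " as their own entries, which would make the `len(Guests) == 3` check miss rows. The helper name `split_guests` and the sample data are made up for illustration; they are not taken from the page itself.

```python
import re


def split_guests(text_nodes, max_guests=3):
    """Turn the text nodes of one guests cell into Guest1..GuestN fields."""
    # Join the fragments first: separators such as ", " may arrive as their own text nodes.
    joined = "".join(text_nodes)
    # Split on commas or newlines, trim whitespace, and drop empty pieces.
    names = [name.strip() for name in re.split(r"[,\n]", joined)]
    names = [name for name in names if name]
    # Pad with None so every row has the same keys, then cut to max_guests.
    names = (names + [None] * max_guests)[:max_guests]
    return {f"Guest{i}": name for i, name in enumerate(names, start=1)}


# Hypothetical text nodes, shaped like what td[2].find_all(text=True) might return.
sample_cell = ["Alice Example", ", ", "Bob Example", ", ", "Carol Example\n"]

row = {"NoInSeason": "1", "Winner": "Alice Example", "OriginalAirDate": "1 January 2000"}
row.update(split_guests(sample_cell))
rows = [row]
print(rows)
```

Collecting plain dicts in `rows` and calling `pd.DataFrame(rows)` once after the loop should give the three guest columns directly, without a leftover Guests column.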