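I've been working on a project that scrapes a schedule from an amateur hockey website and exports it as a CSV file that can be uploaded to the Sports Engine app. I've managed to get the data I need as plain text, but now I need to figure out how to convert it to CSV.
Below is a sample of the script's output, shortened for brevity: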
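AL1602 · Nov 6 · Atom A League · FVC Flight 3FINALMSA Arena · Abbotsford, BCLANGLEY MHA ATOM A4 EAGLES2 - 6ABBOTSFORD ATOM A2 HAWKSAL1607 · Nov 10 · Atom A League · FVC Flight 3FINALMission Leisure Centre · North · Mission, BCTime change due to ice conflictABBOTSFORD ATOM A2 HAWKS5 - 4MISSION MHA ATOM A2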
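The fields run together because .text concatenates every text node in a row with nothing in between, so the end of one field runs straight into the start of the next. Below is a minimal sketch using a trimmed copy of the first row. It assumes BeautifulSoup's get_text(separator, strip=True) is an acceptable substitute for .text. That call inserts a separator between text nodes and keeps the boundaries visible. "html.parser" is used here only so the bare <tr> parses without lxml.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the first gamelist-row from the print(tables) output
html = (
    '<tr class="gamelist-row"><td class="game-details">'
    '<div class="game-meta text-muted">AL1602 · Nov 6'
    '<a class="text-muted"> · Atom A League · FVC Flight 3</a></div>'
    '<div class="game-time">FINAL</div>'
    '<div class="game-arena">MSA Arena<span class="text-muted"> · Abbotsford, BC</span></div>'
    '</td></tr>'
)

row = BeautifulSoup(html, "html.parser").find("tr", class_="gamelist-row")

# .text joins every text node with nothing in between
glued = row.text
# get_text() with a separator keeps the node boundaries;
# strip=True trims whitespace around each piece and drops empty ones
pieces = row.get_text(" | ", strip=True)

print(glued)   # AL1602 · Nov 6 · Atom A League · FVC Flight 3FINALMSA Arena · Abbotsford, BC
print(pieces)  # AL1602 · Nov 6 | · Atom A League · FVC Flight 3 | FINAL | MSA Arena | · Abbotsford, BC
```

Splitting the separated string is still fragile. Pulling each field from its own element by class name is usually more robust.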
Here is a sample of the script's output using just print(tables), which shows the HTML structure rather than only the text:
[<tr class="gamelist-row"><td class="game-details"><div class="game-meta text-muted">AL1602 · Nov 6<a class="text-muted" href="/leagues/786?scheduleId=1265&groupId=5" title="Atom A League · FVC Flight 3"> · Atom A League · FVC Flight 3</a></div><div class="game-time">FINAL</div><div class="game-arena">MSA Arena<span class="text-muted"> · Abbotsford, BC</span></div></td><td><div class="game-matchup"><a class="team-link" href="/teams/4688?scheduleId=1265&groupId=5"><div class="d-flex flex-row" style="min-width: 125px;"><div class="pr-2"><div alt="LANGLEY MHA ATOM A4 EAGLES" class="team-logo" style='background-image: url("https://s3-ca-central-1.amazonaws.com/hisports-logos/1537488764672.png");'></div></div><div class="d-flex flex-fill flex-column justify-content-center"><span class="team-name text-uppercase">LANGLEY MHA ATOM A4 EAGLES</span></div></div></a><div class="game-result score"><div class="result result-loss">2</div><span class="text-muted"> - </span><div class="result result-win">6</div></div><a class="team-link" href="/teams/4326?scheduleId=1265&groupId=5"><div class="d-flex flex-row flex-row-reverse" style="min-width: 125px;"><div class="pl-2"><div alt="ABBOTSFORD ATOM A2 HAWKS" class="team-logo" style='background-image: url("https://s3-ca-central-1.amazonaws.com/hisports-logos/1538567502609.jpg");'></div></div><div class="d-flex flex-fill flex-column justify-content-center"><span class="team-name text-uppercase text-right">ABBOTSFORD ATOM A2 HAWKS</span></div></div></a></div></td></tr>, <tr class="gamelist-row"><td class="game-details"><div class="game-meta text-muted">AL1607
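The HTML above gives each field its own element and class: game-meta, game-time, game-arena, team-name, and result inside game-result. The sketch below reads those fields separately instead of using the row's combined text. parse_row and its dict keys are hypothetical names chosen for illustration. team_1/team_2 follow the left-to-right order on the page, because the markup does not say which team is home. It also assumes the class names are the same on every row, which has only been seen in the sample.

```python
from bs4 import BeautifulSoup

def parse_row(tr):
    """Return one game's fields from a <tr class="gamelist-row"> as a dict."""
    def text_of(tag, cls):
        node = tr.find(tag, class_=cls)
        # separator " " keeps "Nov 6" and "· Atom A League" from gluing together
        return node.get_text(" ", strip=True) if node else ""

    teams = [t.get_text(strip=True) for t in tr.find_all("span", class_="team-name")]
    result_box = tr.find("div", class_="game-result")
    scores = ([s.get_text(strip=True) for s in result_box.find_all("div", class_="result")]
              if result_box else [])
    return {
        "meta": text_of("div", "game-meta"),
        "time_or_status": text_of("div", "game-time"),  # "FINAL" for played games
        "arena": text_of("div", "game-arena"),
        "team_1": teams[0] if len(teams) > 0 else "",
        "score_1": scores[0] if len(scores) > 0 else "",
        "team_2": teams[1] if len(teams) > 1 else "",
        "score_2": scores[1] if len(scores) > 1 else "",
    }

# Trimmed copy of the first row (logos and links removed)
html = (
    '<tr class="gamelist-row"><td class="game-details">'
    '<div class="game-meta text-muted">AL1602 · Nov 6'
    '<a class="text-muted"> · Atom A League · FVC Flight 3</a></div>'
    '<div class="game-time">FINAL</div>'
    '<div class="game-arena">MSA Arena<span class="text-muted"> · Abbotsford, BC</span></div>'
    '</td><td><div class="game-matchup">'
    '<span class="team-name text-uppercase">LANGLEY MHA ATOM A4 EAGLES</span>'
    '<div class="game-result score"><div class="result result-loss">2</div>'
    '<span class="text-muted"> - </span><div class="result result-win">6</div></div>'
    '<span class="team-name text-uppercase text-right">ABBOTSFORD ATOM A2 HAWKS</span>'
    '</div></td></tr>'
)
row = BeautifulSoup(html, "html.parser").find("tr", class_="gamelist-row")
print(parse_row(row))
```

In the real script the same function would be applied to each element of tables.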
Below is the script:
from selenium import webdriver
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
#launch url
url = "https://games.pcaha.ca/teams/4326"
#create a new Firefox session
driver = webdriver.Firefox()
driver.implicitly_wait(30)
driver.get(url)
#After opening the url above, Selenium finds the table with the schedule
#(find_elements_by_id was removed in Selenium 4.3; find_elements(By.ID, ...) replaces it)
games = driver.find_elements(By.ID, "table-responsive")
#Selenium hands the page source to Beautiful Soup
soupsource = BeautifulSoup(driver.page_source, 'lxml')
#Beautiful Soup grabs the class gamelist-row
tables = soupsource.find_all("tr", class_="gamelist-row")
#close the browser once the page source has been captured
driver.quit()
# prints out the text only
for x in tables:
    print(x.text)
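Each game's metadata prints as one string, such as "AL1602 · Nov 6 · Atom A League · FVC Flight 3". A small standard-library sketch can split that string into separate columns. split_meta is a hypothetical helper. It assumes, based only on the sample rows, that the first two pieces are the game number and the date and that everything after them is the league and flight. The sample dates also carry no year, so an import file that needs a full date would have to get the year from elsewhere.

```python
def split_meta(meta):
    """Split 'AL1602 · Nov 6 · Atom A League · FVC Flight 3' on the '·' separator."""
    parts = [p.strip() for p in meta.split("·") if p.strip()]
    return {
        "game_id": parts[0] if parts else "",
        "date": parts[1] if len(parts) > 1 else "",
        # whatever remains is treated as the league/flight description
        "division": " · ".join(parts[2:]),
    }

print(split_meta("AL1602 · Nov 6 · Atom A League · FVC Flight 3"))
# {'game_id': 'AL1602', 'date': 'Nov 6', 'division': 'Atom A League · FVC Flight 3'}
```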
Try this small snippet to write the data to a CSV file, and modify it to suit your needs.
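One way to write the file uses Python's built-in csv module and assumes each game has already been turned into a dict. The column names in FIELDNAMES are hypothetical, because Sports Engine's actual import template is not given here. Rename or reorder them to match what the app expects. write_schedule is likewise an illustrative name.

```python
import csv

# Hypothetical column names; adjust to Sports Engine's import template
FIELDNAMES = ["game_id", "date", "division", "arena",
              "team_1", "score_1", "team_2", "score_2"]

def write_schedule(rows, path):
    """Write a list of dicts (one per game) to a CSV file at path."""
    # newline="" is what the csv docs recommend (prevents blank rows on Windows);
    # utf-8 keeps the '·' characters intact
    with open(path, "w", newline="", encoding="utf-8") as f:
        # extrasaction="ignore" silently drops dict keys not listed in FIELDNAMES
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)

games = [
    {"game_id": "AL1602", "date": "Nov 6", "division": "Atom A League · FVC Flight 3",
     "arena": "MSA Arena · Abbotsford, BC",
     "team_1": "LANGLEY MHA ATOM A4 EAGLES", "score_1": "2",
     "team_2": "ABBOTSFORD ATOM A2 HAWKS", "score_2": "6"},
]
write_schedule(games, "schedule.csv")
```

The csv module quotes values that contain commas (such as "MSA Arena · Abbotsford, BC") automatically, which is the main reason to prefer it over joining strings with ",".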