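I've built the following scraper for NFL play-by-play data. It writes the results to a CSV file and does everything I need, except that I can't figure out how to append a column to each row of the CSV file showing which team actually has possession of the ball.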
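I can scrape the text in the "home" and "away" <tr> tags to show who is playing in the game, so I can query it later, but I need the scraper to recognize when possession changes (from home to away, or vice versa). I'm fairly new to Python and have tried different indentation, but I don't think that's the problem. Any help would be greatly appreciated; I feel like the answer is beyond my current understanding.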
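One way to get a possession column is to walk every <tr> in document order, instead of selecting the play, home and away rows separately, and carry forward the team named in the most recent "home"/"away" row. This is a minimal sketch that assumes (not verified against the CBS page) that a home/away row appears right before the plays run by the team with the ball. The function name tag_plays_with_team is hypothetical, and it takes plain (row_id, text) tuples so it can be tried without fetching the page; with lxml you could build that list as [(row.get('id'), row.text_content()) for row in data.cssselect('tr')].

```python
import re


def tag_plays_with_team(rows):
    """Pair each 'play' row with the team from the latest 'home'/'away' row.

    rows: list of (row_id, text) tuples in document order.
    Assumption: a 'home' or 'away' row precedes the plays run by that team.
    Plays that appear before any such row get None as their team.
    """
    current_team = None
    tagged = []
    for row_id, text in rows:
        if row_id in ('home', 'away'):
            # Same rule as the scraper: first word of the row, upper-cased
            match = re.search(r'(\w+)\s', text)
            if match:
                current_team = match.group(1).upper()
        elif row_id == 'play':
            tagged.append((current_team, text))
        # Rows with any other id (or no id) are ignored
    return tagged


# Hypothetical rows standing in for the scraped page
rows = [
    ('play', 'Kickoff'),
    ('away', 'Pittsburgh Steelers'),
    ('play', '1-10-PIT20 (14:55) run for 3 yards'),
    ('home', 'Cincinnati Bengals'),
    ('play', '1-10-CIN30 (12:40) pass for 7 yards'),
    (None, 'row without an id'),
]
tagged = tag_plays_with_team(rows)
```

Inside the scraper, the team from each pair would be appended to frame alongside the other fields.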
I also realize my code may not be the most Pythonic, but I'm still learning. I'm using Python 2.7.9.
```python
import csv
import re
import urllib2

import lxml.html

game_date = raw_input('Enter game date: ')
data_html = 'http://www.cbssports.com/nfl/gametracker/playbyplay/NFL_20160109_PIT@CIN'
url = urllib2.urlopen(data_html).read()
data = lxml.html.fromstring(url)

# Every play row, plus the rows that name the away and home teams
plays = data.cssselect('tr#play')
home = data.cssselect('tr#home')
away = data.cssselect('tr#away')

# Python 2's csv module needs the file opened in binary mode ('ab');
# otherwise blank lines appear between rows on Windows
csvfile = open('C:\\DATA\\PBP.csv', 'ab')
writer = csv.writer(csvfile)

for play in plays:
    frame = []
    play = play.text_content()

    # Each field stays None when its pattern is not found in the row
    down = re.search(r'\d', play)
    if down is not None:
        down = down.group()
    dist = re.search(r'-(\d+)', play)
    if dist is not None:
        dist = dist.group(1)
    field_end = re.search(r'[A-Z]+', play)
    if field_end is not None:
        field_end = field_end.group()
    yard_line = re.search(r'[A-Z]+([\d]+)', play)
    if yard_line is not None:
        yard_line = yard_line.group(1)
    desc = re.search(r'\s(.*)', play)
    if desc is not None:
        desc = desc.group()
    time = re.search(r'\((..*\d)\)\s', play)
    if time is not None:
        time = time.group(1)

    # Team names: first word of the away/home row text, upper-cased
    for team in away:
        teamA = team.text_content()
        teamA = re.search(r'(\w+)\s', teamA)
        teamA = teamA.group(1)
        teamA = teamA.upper()
    for team in home:
        teamH = team.text_content()
        teamH = re.search(r'(\w+)\s', teamH)
        teamH = teamH.group(1)
        teamH = teamH.upper()

    frame.append(game_date)
    frame.append(down)
    frame.append(dist)
    frame.append(field_end)
    frame.append(yard_line)
    frame.append(time)
    frame.append(teamA)
    frame.append(teamH)
    frame.append(desc)
    writer.writerow(frame)

csvfile.close()
```
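Since the question mentions the code not being very Pythonic: the repeated search / "if None" / .group() blocks can be folded into one small helper. This is a sketch; first_group is a hypothetical name, and the sample string below is made up to exercise the existing regexes, not copied from the CBS page.

```python
import re


def first_group(pattern, text, group=0):
    """Return the requested group of the first match, or None if nothing matches.

    Equivalent to the scraper's 'search, then .group(...) only if not None' blocks.
    """
    match = re.search(pattern, text)
    return match.group(group) if match else None


# Hypothetical play text; the real row format on the page may differ
sample = '1-10-CIN25 (15:00) A.Dalton pass short right for 8 yards'

down = first_group(r'\d', sample)
dist = first_group(r'-(\d+)', sample, 1)
field_end = first_group(r'[A-Z]+', sample)
yard_line = first_group(r'[A-Z]+([\d]+)', sample, 1)
time = first_group(r'\((..*\d)\)\s', sample, 1)
desc = first_group(r'\s(.*)', sample)  # group 0 keeps the leading space, as before
```

Each field still ends up as None when its pattern does not match, so the CSV output is unchanged.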
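I'm guessing you need to add another value to frame for each row, indicating whether possession has changed.
After:
Add:
(Note: this assumes the team names are consistent, with no extra whitespace, padding or formatting in the teamA/teamH values.)
You don't have to use strings; for example, you could use 0 for no change and 1 for a change of possession.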
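A minimal sketch of the 0/1 idea, assuming you already have the team with the ball for each play, in order. possession_flags is a hypothetical helper; it strips and upper-cases names before comparing (so stray whitespace or case differences don't count as a change) and marks the first play as 0 because there is nothing to compare it with.

```python
def possession_flags(teams):
    """Return 1 where a play's team differs from the previous play's team, else 0."""
    flags = []
    previous = None
    for team in teams:
        current = team.strip().upper()  # normalise before comparing
        flags.append(1 if previous is not None and current != previous else 0)
        previous = current
    return flags


flags = possession_flags(['PIT', 'PIT ', 'cin', 'CIN', 'PIT'])
```

Inside the scraper the same comparison works row by row: keep the previous team in a variable defined before the for loop and append the flag to frame before writer.writerow(frame).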
HTH, Barny