Parsing with lxml in Python

Posted 2024-09-28 19:31:17

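I have built the following scraper for NFL play-by-play data. It writes the results to a CSV file and does everything I need, except that I can't figure out how to append a column to each row of the CSV showing which team actually has the ball.
I can grab the text in the "home" and "away" <tr> tags to show who is playing in the game, for querying later, but I need the scraper to recognize when possession changes (from home to away and vice versa). I'm fairly new to Python and have tried different indentation, but I don't think that is the problem. Any help would be greatly appreciated; I feel like the answer is beyond my current understanding.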

I also realize my code is probably not the most Pythonic, but I'm still learning. I'm using Python 2.7.9.

import lxml
from lxml import html
import csv
import urllib2
import re

game_date = raw_input('Enter game date: ')

data_html = 'http://www.cbssports.com/nfl/gametracker/playbyplay/NFL_20160109_PIT@CIN'

url = urllib2.urlopen(data_html).read()

data = lxml.html.fromstring(url)


plays = data.cssselect('tr#play')
home = data.cssselect('tr#home')
away = data.cssselect('tr#away')

# Binary append mode: the Python 2 csv module expects files opened with 'b'
csvfile = open('C:\\DATA\\PBP.csv', 'ab')
writer = csv.writer(csvfile)

for play in plays:

    frame = []
    play = play.text_content()

    down = re.search(r'\d', play)
    if down == None:
        pass
    else:
        down = down.group()

    dist = re.search(r'-(\d+)', play)
    if dist == None:
        pass
    else:
        dist = dist.group(1)

    field_end = re.search(r'[A-Z]+', play)

    if field_end == None:
        pass
    else:
        field_end = field_end.group()

    yard_line = re.search(r'[A-Z]+([\d]+)', play)

    if yard_line == None:
        pass
    else:
        yard_line = yard_line.group(1)

    desc = re.search(r'\s(.*)', play)
    if desc == None:
        pass
    else:
        desc = desc.group()

    time = re.search(r'\((..*\d)\)\s', play)
    if time == None:
        pass
    else:
        time = time.group(1)

    for team in away:
        teamA = team.text_content()
        teamA = re.search(r'(\w+)\s', teamA)
        teamA = teamA.group(1)
        teamA = teamA.upper()

    for team in home:
        teamH = team.text_content()
        teamH = re.search(r'(\w+)\s', teamH)
        teamH = teamH.group(1)
        teamH = teamH.upper()

    frame.append(game_date)
    frame.append(down)
    frame.append(dist)
    frame.append(field_end)
    frame.append(yard_line)
    frame.append(time)
    frame.append(teamA)
    frame.append(teamH)
    frame.append(desc)

    writer.writerow(frame)

csvfile.close()

1 Answer
Answer 1 · posted 2024-09-28 19:31:17

I'm guessing you need to add one more value to frame for each row, indicating whether possession has changed.

After:

frame.append(desc)

add:

if teamA == teamH:
    frame.append("Same possession")
else:
    frame.append("Changed possession")

(Note that this assumes the team names are consistent, with no extra whitespace, padding, or formatting in the teamA/teamH values.)
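
The consistency assumption in that note can be made less fragile by normalizing names before comparing them. A minimal sketch follows; `normalize_team` is a hypothetical helper that does not appear in the question's code, and it only handles whitespace and letter case, not abbreviations or nicknames.

```python
def normalize_team(raw):
    """Collapse whitespace and upper-case a team label so that
    differently padded copies of the same name compare equal."""
    # split() with no arguments drops leading/trailing whitespace and
    # treats any run of spaces, tabs, or newlines as one separator.
    return " ".join(raw.split()).upper()


# Invented sample strings, padded the way scraped cell text often is.
same = normalize_team("  Bengals\n") == normalize_team("bengals ")
```

Applying the helper to both values before the comparison means stray padding from `text_content()` would no longer cause a false mismatch.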

You don't have to use strings; for example, you could use 0 for no change and 1 for a change of possession.
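
In the question's code, teamA and teamH name the away and home teams, so they describe who is playing rather than who has the ball. One possible way to get a 0/1 flag is to remember the most recent team header row and compare it with the team recorded for the previous play. The sketch below rests on a loud assumption that the question does not confirm: that the page lists its `tr#home` / `tr#away` rows interleaved with the `tr#play` rows in document order, each header marking the start of that team's possession. The `rows` list of `(kind, text)` pairs and the `tag_possession` function are hypothetical stand-ins for the scraped rows, and the sample data is made up.

```python
def tag_possession(rows):
    """Return (play_text, team, changed) for every play row.

    rows: iterable of (kind, text) pairs in document order, where kind
    is 'home', 'away', or 'play' (hypothetical, pre-extracted input).
    changed is 1 when the team differs from the previous play's team,
    else 0; the first play is reported as 0.
    """
    current_team = None   # team named by the most recent header row
    previous_team = None  # team that ran the previous play
    tagged = []
    for kind, text in rows:
        if kind in ('home', 'away'):
            # Header row: assume it marks whose possession starts here.
            current_team = text.strip().upper()
        elif kind == 'play':
            changed = int(previous_team is not None
                          and current_team != previous_team)
            tagged.append((text, current_team, changed))
            previous_team = current_team
    return tagged


# Made-up sample rows, not taken from the real page.
sample = [
    ('away', 'Steelers '),
    ('play', '1-10-PIT20 (15:00) run for 3 yards'),
    ('play', '2-7-PIT23 (14:20) pass incomplete'),
    ('home', 'Bengals '),
    ('play', '1-10-CIN35 (13:50) pass for 12 yards'),
]
result = tag_possession(sample)
```

If the page really is laid out that way, the rows could presumably be gathered in document order with a single selector such as `data.cssselect('tr#home, tr#away, tr#play')`, using each row's `id` attribute as `kind`; the team and the flag would then be appended to frame before `writer.writerow(frame)`.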

HTH, Barney
