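This question may seem trivial. It is about Scrapy: I tried to solve it my own way, but could not get it to work.
Basically, I need the data from this WebisteTest URL.
The site is structured as follows: the data for each section sits inside an element with the class box, so all of the data appears within these box elements.
The box elements are themselves wrapped in a parent div.
I have already written code that fetches all of the data in one go, but I ran into problems when I tried to convert it into a loop that fetches the data one section at a time. I want the loop so that I can easily insert the data into a database.
Below is the code I wrote to fetch everything at once: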
# Competition names
all_competition_names = response.xpath(
    "/html/body/div[2]/div[10]/div[1]/div[@class='box']/div[@class='table-header']/h2/a/text()"
).getall()
# Table columns
competition_columns = response.xpath(
    '/html/body/div[2]/div[10]/div/div[@class="box"]/div[@class="responsive-table"]/table/thead/tr/th/text()'
).getall()
# Matchday info (might not be the same)
matchday_info = response.xpath(
    '/html/body/div[2]/div[10]/div/div[@class="box"]/div[@class="responsive-table"]/table/tbody/tr/td[@class="zentriert"][1]/text()'
).getall()
# Date info
date_info = response.xpath(
    '/html/body/div[2]/div[10]/div/div[@class="box"]/div[@class="responsive-table"]/table/tbody/tr/td[2]/text()'
).getall()
# Time info
time_info = response.xpath(
    '/html/body/div[2]/div[10]/div/div[@class="box"]/div[@class="responsive-table"]/table/tbody/tr/td[@class="zentriert"][2]/text()'
).getall()
# Home team
home_team = response.xpath(
    '/html/body/div[2]/div[10]/div/div[@class="box"]/div[@class="responsive-table"]/table/tbody/tr/td[5]/a/text()'
).getall()
# Away team
away_team = response.xpath(
    '/html/body/div[2]/div[10]/div/div[@class="box"]/div[@class="responsive-table"]/table/tbody/tr/td[7]/a/text()'
).getall()
# Formation
formation = response.xpath(
    '/html/body/div[2]/div[10]/div/div[@class="box"]/div[@class="responsive-table"]/table/tbody/tr/td[8]/text()'
).getall()
# Coach for match
coach_for_match = response.xpath(
    '/html/body/div[2]/div[10]/div/div[@class="box"]/div[@class="responsive-table"]/table/tbody/tr/td[9]/a/text()'
).getall()
# Match result
match_result = response.xpath(
    '/html/body/div[2]/div[10]/div/div[@class="box"]/div[@class="responsive-table"]/table/tbody/tr/td[11]/a/span/text()'
).getall()
# Match statistics link
match_stats_link = [
    self.base_url + i
    for i in response.xpath(
        '/html/body/div[2]/div[10]/div/div[@class="box"]/div[@class="responsive-table"]/table/tbody/tr/td[11]/a/@href'
    ).getall()
]
# Attempt at a loop over the sections
for item in response.xpath("/html/body/div[2]/div[10]/div"):
    print(
        item.xpath(
            ".//div[@class='box']/div[@class='table-header']/h2/a/text()"
        )
    )
# print(
# all_competition_names,
# competition_columns,
# matchday_info,
# date_info,
# time_info,
# home_team,
# away_team,
# match_result,
# match_stats_link,
# coach_for_match,
# )
Any suggestions about where I went wrong would be very helpful.
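In this case you need to target each table separately. Then iterate over the elements, using selectors scoped to the specific element (rather than selectors on the whole response), and log the output: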
Solution without a loop: