How can I follow a 302 redirect in Scrapy and still get the page information?

try:
    print('Current page index: ', page_index)
except:
    # Will be thrown if page_index wasn't found due to redirection.
    if response.status in (302,) and 'Location' in response.headers:
        location = to_native_str(response.headers['location'].decode('latin1'))
        yield scrapy.Request(response.urljoin(location), method='POST', callback=self.parse)
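
The redirect check in the snippet above can be tried outside a spider. The following is a minimal sketch using only the standard library: `resolve_redirect` is a hypothetical helper (not part of Scrapy), the `example.com` URLs are placeholders, and the headers are assumed to arrive as bytes. Keep in mind that Scrapy's redirect middleware normally follows a 302 before the callback runs, so the callback only sees that status when the request opts out, for example through the `dont_redirect` and `handle_httpstatus_list` meta keys.

```python
from urllib.parse import urljoin


def resolve_redirect(status, headers, base_url):
    """Return the absolute redirect target, or None if there is none.

    Hypothetical helper: `headers` is assumed to map header names to
    bytes (or str) values.
    """
    if status != 302:
        return None
    # Header names are case-insensitive, so normalise them before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    raw = lowered.get('location')
    if raw is None:
        return None
    location = raw.decode('latin1') if isinstance(raw, bytes) else raw
    # A relative Location is resolved against the URL that was requested.
    return urljoin(base_url, location)


# Placeholder URLs, for illustration only.
base = 'https://example.com/search/results.aspx?page=2'
print(resolve_redirect(302, {'Location': b'/search/start.aspx'}, base))
print(resolve_redirect(200, {}, base))
```

A helper like this keeps the status and header checks in one place, so the callback can simply test whether a target URL came back.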

https://m.tennislink.usta.com/TournamentSearch/searchresults.aspx?typeofsubmit=&action=2&keywords=&tournamentid=&sectiondistrict=&city=&state=&zip=&month=0&startdate=&enddate=&day=&year=2019&division=G16&category=28&surface=&onlineentry=&drawssheets=&usertime=&sanctioned=-1&agegroup=Y&searchradius=-1

1 Answer

User

#1 · Posted on 2024-09-26 18:05:01

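You don't need to follow the 302 redirect. Instead, you can send a POST request and receive the page details directly. The following code prints the data from the first 5 pages: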

import requests
from bs4 import BeautifulSoup

url = 'https://m.tennislink.usta.com/TournamentSearch/searchresults.aspx'

# Query-string filters; they are the same for every page.
params = {'year': '2019', 'division': 'G16', 'month': '0', 'searchradius': '-1'}
pages = 5

for i in range(pages):
    # __EVENTTARGET names the pager control that the postback targets.
    payload = {'__EVENTTARGET': 'dgTournaments:_ctl1:_ctl' + str(i)}

    res = requests.post(url, params=params, data=payload)
    soup = BeautifulSoup(res.content, 'lxml')

    table = soup.find('table', id='ctl00_mainContent_dgTournaments')
    if table is None:
        # No results table in this response (e.g. an error page); skip it.
        continue

    # Pretty-print the table contents, one cell per line.
    for row in table.find_all('tr'):
        for column in row.find_all('td'):
            text = ', '.join(x.strip() for x in column.text.split('\n') if x.strip())
            print(text)
        print('-' * 10)
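
The two pure steps in the loop above, building the form data for a page and flattening a table cell, can be pulled out and checked without touching the network. This is only a sketch: `page_payload` and `clean_cell` are hypothetical helper names, and the sample cell text is made up for illustration.

```python
def page_payload(page_index):
    """Form data for one results page, using the control-name pattern from the loop above."""
    return {'__EVENTTARGET': 'dgTournaments:_ctl1:_ctl' + str(page_index)}


def clean_cell(text):
    """Collapse a multi-line table cell into one comma-separated line."""
    return ', '.join(part.strip() for part in text.split('\n') if part.strip())


print(page_payload(0))
# Made-up cell text, for illustration only.
print(clean_cell('  Girls 16 \n\n  Singles \n'))
```

Keeping these steps as small functions makes it easy to spot a wrong control name or a stray blank line before sending any requests.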
