Parts missing when scraping

3 answers

User

Answer 1 · edited 2024-09-28 17:18:41

I don't think it's possible to scrape that site with requests. I'd suggest using Selenium or Scrapy.

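User

Answer 2 · edited 2024-09-28 17:18:41

The problem is that the initial GET doesn't fetch the data (I assume that's the job listings). The JS that does fetch it sends a POST with an authorization token in its headers. You need to get that token and then make the POST yourself to get the data.

The token appears to be dynamic, so this is a bit fragile, but it works.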
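Because the token changes, it helps to pull it out of the page robustly. The sketch below is one way to do that with only the standard library, assuming (as the code in this answer does) that the page contains a statement like `csod.context={...};`. The names `extract_csod_context` and `build_auth_headers` are hypothetical, and which headers the endpoint really requires is an assumption to verify.

```python
import json
import re

# Matches the assignment "csod.context =" (with optional spaces), as assumed
# from the page structure used in this answer.
_CONTEXT_RE = re.compile(r"csod\.context\s*=\s*")


def extract_csod_context(html: str) -> dict:
    """Hypothetical helper: return the JSON object assigned to csod.context."""
    match = _CONTEXT_RE.search(html)
    if match is None:
        raise ValueError("csod.context assignment not found in page")
    # raw_decode parses one JSON value starting at the given index and stops
    # at its end, so the trailing ';' is ignored and semicolons inside string
    # values are left untouched.
    obj, _end = json.JSONDecoder().raw_decode(html, match.end())
    return obj


def build_auth_headers(token: str) -> dict:
    """Hypothetical minimal header set for the search POST (an assumption)."""
    return {
        "accept": "application/json",
        "authorization": "Bearer " + token,
        "content-type": "application/json",
    }
```

Calling `extract_csod_context(r.text)["token"]` would stand in for the script-search loop and the string stripping before `json.loads`, and it avoids stripping semicolons that may legitimately appear inside the JSON values.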

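import json

from bs4 import BeautifulSoup as bs
from requests_html import HTMLSession

url0=r'https://germanamerican.csod.com/ux/ats/careersite/5/home?c=germanamerican'
url=r'https://germanamerican.csod.com/services/x/career-site/v1/search'

s=HTMLSession()
r=s.get(url0)
print(r.status_code)
r.html.render()  # runs the page's JS (requests_html downloads Chromium on first use)

# soup parses r.text, i.e. the HTML as originally served; the csod.context script is in it
soup=bs(r.text,'html.parser')

scripts=soup.find_all('script')

# Find the <script> that assigns csod.context (it carries the auth token)
x=None
for script in scripts:
    if script.string and 'csod.context=' in script.string:
        x=script
        break
if x is None:
    raise RuntimeError('csod.context script not found; the page layout may have changed')

# Remove the assignment and only the trailing ';', so semicolons inside values survive
j=json.loads(x.string.strip().replace('csod.context=','',1).rstrip(';'))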


payload={
    'careerSiteId': 5,
    'cities': [],
    'countryCodes': [],
    'cultureId': 1,
    'cultureName': "en-US",
    'customFieldCheckboxKeys': [],
    'customFieldDropdowns': [],
    'customFieldRadios': [],
    'pageNumber': 1,
    'pageSize': 25,
    'placeID': "",
    'postingsWithinDays': None,
    'radius': None,
    'searchText': "",
    'states': []
}

# Headers copied from the browser's request. content-length is left out:
# requests computes it from the JSON body, and a stale hard-coded value breaks the POST.
headers={
    'accept': 'application/json; q=1.0, text/*; q=0.8, */*; q=0.1',
    'accept-encoding': 'gzip, deflate',  # 'br' dropped: Brotli is only decoded if a brotli package is installed
    'accept-language': 'en-US,en;q=0.9',
    'authorization': 'Bearer '+j['token'],
    'cache-control': 'no-cache',
    'content-type': 'application/json',
    'csod-accept-language': 'en-US',
    'origin': 'https://germanamerican.csod.com',
    'referer': 'https://germanamerican.csod.com/ux/ats/careersite/5/home?c=germanamerican',
    'sec-fetch-mode': 'cors',
    'sec-fetch-site': 'same-origin',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36',
    'x-requested-with': 'XMLHttpRequest'
}

r=s.post(url,headers=headers,json=payload)
print(r.status_code)
print(r.json())

The printed r.json() is a nicely structured JSON list of the job postings.
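The payload asks for one page at a time (`pageNumber`, `pageSize`), so getting every posting means repeating the POST with increasing page numbers. Below is a minimal sketch of that loop. The answer does not show which key of the response holds the job list, so `fetch_page` is a hypothetical callable the caller supplies (for example one that posts `{**payload, "pageNumber": n}` with the session `s` and returns the list from the JSON). Stopping on a short page is an assumption about how the endpoint signals the end.

```python
from typing import Callable, Iterator, List


def iter_jobs(fetch_page: Callable[[int], List[dict]],
              page_size: int = 25,
              max_pages: int = 100) -> Iterator[dict]:
    """Yield jobs page by page until a page comes back short or empty.

    fetch_page(n) is hypothetical: it should POST the search payload with
    pageNumber=n and return that page's list of jobs.
    """
    for page_number in range(1, max_pages + 1):
        jobs = fetch_page(page_number)
        yield from jobs
        # Fewer results than requested is taken to mean this was the last page.
        if len(jobs) < page_size:
            break
```

Injecting the fetch function keeps the paging logic independent of the exact response layout and easy to check against a fake fetcher; `max_pages` guards against looping forever if the endpoint keeps returning full pages.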

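User

Answer 3 · edited 2024-09-28 17:18:41

Welcome to SO!

Unfortunately, you won't be able to scrape that page with requests (nor with requests_html or similar libraries), because you need a tool that can handle dynamic, i.e. JavaScript-based, pages.

For Python, I strongly recommend Selenium with its webdriver. Below is code that prints the desired output, i.e. all the listed jobs. (Note: you need to install selenium and the Firefox webdriver, geckodriver, and make sure it can be found on the correct path.)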

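# Import libraries
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# Set the URL you want to webscrape from
url = 'https://germanamerican.csod.com/ux/ats/careersite/5/home?c=germanamerican'

browser = webdriver.Firefox()  # initialize the webdriver. I use FF, but Chrome/Chromium etc. also work

try:
    browser.get(url)  # go to the desired page
    # Wait (up to 20 s) until the page's JavaScript has rendered at least one job link
    WebDriverWait(browser, 20).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, 'a[data-tag="displayJobTitle"]'))
    )
    page = browser.page_source  # the page source, now filled with the loaded listings
    soup = BeautifulSoup(page, "lxml")  # needs the lxml package; "html.parser" also works
    jobs = soup.find_all('a', {'data-tag': 'displayJobTitle'})
    for j in jobs:
        print(j.text)
finally:
    browser.quit()  # always close the browser, even if the wait times out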
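If you would rather not depend on BeautifulSoup and lxml for the last step, the same link extraction can be sketched with the standard library's `html.parser`, run on `browser.page_source`. This swaps in a different parser than the answer uses; it assumes, as the answer does, that job titles are anchors with `data-tag="displayJobTitle"`, and the `JobTitleParser` and `job_titles` names are hypothetical.

```python
from html.parser import HTMLParser
from typing import List


class JobTitleParser(HTMLParser):
    """Collect the text of <a data-tag="displayJobTitle"> elements."""

    def __init__(self) -> None:
        super().__init__()
        self.titles: List[str] = []
        self._inside = False          # True while inside a matching <a>
        self._buffer: List[str] = []  # text pieces of the current title

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "a" and dict(attrs).get("data-tag") == "displayJobTitle":
            self._inside = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "a" and self._inside:
            self.titles.append("".join(self._buffer).strip())
            self._inside = False

    def handle_data(self, data):
        # Text from nested tags (e.g. <span>) inside the link is included too
        if self._inside:
            self._buffer.append(data)


def job_titles(page_source: str) -> List[str]:
    parser = JobTitleParser()
    parser.feed(page_source)
    parser.close()
    return parser.titles
```

Usage: `for title in job_titles(browser.page_source): print(title)` inside the `try` block, in place of the BeautifulSoup lines.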
