Trying to scrape the "Apply Now" and "Learn More" URLs, but can't get them using Beautiful Soup and Python

    from urllib.request import urlopen
    from bs4 import BeautifulSoup
    import json, requests, re

    AMEXurl = ['https://www.americanexpress.com/in/credit-cards/all-cards/?sourcecode=A0000FCRAA&cpid=100370494&dsparms=dc_pcrid_408453063287_kword_american%20express%20credit%20card_match_e&gclid=Cj0KCQiApY6BBhCsARIsAOI_GjaRsrXTdkvQeJWvKzFy_9BhDeBe2L2N668733FSHTHm96wrPGxkv7YaAl6qEALw_wcB&gclsrc=aw.ds']
    identity = ['filmstrip_container']

    html_1 = urlopen(AMEXurl[0])
    soup_1 = BeautifulSoup(html_1,'lxml')
    address = soup_1.find('div',attrs={"class" : identity[0]})

    for x in address.find_all('a',id = 'html-link'):
        print(x)

    <a href="https://global.americanexpress.com/acq/intl/dpa/japa/ind/pers/begin.do?perform=IntlEapp:IND:platinum_charge&intlink=in-amex-cardshop-allcards-apply-AmericanExpressPlatinum-carousel&cpid=100370494&sourcecode=A0000FCRAA" id="html-link"><div><span>Apply Now</span></div></a>
    <a href="charge-cards/platinum-card/?linknav=in-amex-cardshop-allcards-learn-AmericanExpressPlatinum-carousel&cpid=100370494&sourcecode=A0000FCRAA" id="html-link"><div><span>Learn More</span></div></a>
    <a href="https://global.americanexpress.com/acq/intl/dpa/japa/ind/pers/begin.do?perform=IntlEapp:IND:platinum_charge&intlink=in-amex-cardshop-allcards-apply-AmericanExpressPlatinum-carousel&cpid=100370494&sourcecode=A0000FCRAA" id="html-link"><div><span>Apply Now</span></div></a>
    <a href="charge-cards/platinum-card/?linknav=in-amex-cardshop-allcards-learn-AmericanExpressPlatinum-carousel&cpid=100370494&sourcecode=A0000FCRAA" id="html-link"><div><span>Learn More</span></div></a>

2 Answers

User
Answer 1 · edited 2024-09-30 05:21:38

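You can adapt this to use your own lists and syntax, but it gets the links I believe you want. Note that find does not return what is needed here, whereas find_all with href=True, taking only the first match, does.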
    import re

    import requests
    from bs4 import BeautifulSoup

    nurl = 'https://www.americanexpress.com/in/credit-cards/all-cards/?sourcecode=A0000FCRAA&cpid=100370494&dsparms=dc_pcrid_408453063287_kword_american%20express%20credit%20card_match_e&gclid=Cj0KCQiApY6BBhCsARIsAOI_GjaRsrXTdkvQeJWvKzFy_9BhDeBe2L2N668733FSHTHm96wrPGxkv7YaAl6qEALw_wcB&gclsrc=aw.ds'
    npage = requests.get(nurl)
    nsoup = BeautifulSoup(npage.text, "html.parser")

    # for link in nsoup.find_all('a'):
    # First link whose text matches "Apply Now"; its href is already absolute.
    for link in nsoup.find_all('a', string=re.compile('Apply Now'), href=True)[0:1]:
        print(link.get('href'))

    # First link whose text matches "Learn"; its href is relative, so prepend the site base.
    for link in nsoup.find_all('a', string=re.compile('Learn'), href=True)[0:1]:
        print('https://www.americanexpress.com/in/' + link.get('href'))
Output:
    https://global.americanexpress.com/acq/intl/dpa/japa/ind/pers/begin.do?perform=IntlEapp:IND:platinum_charge&intlink=in-amex-cardshop-allcards-apply-AmericanExpressPlatinum-carousel&cpid=100370494&sourcecode=A0000FCRAA
    https://www.americanexpress.com/in/charge-cards/platinum-card/?linknav=in-amex-cardshop-allcards-learn-AmericanExpressPlatinum-carousel&cpid=100370494&sourcecode=A0000FCRAA
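As an alternative to concatenating the base string by hand, the standard library's urljoin can resolve relative hrefs and leave absolute ones untouched. The following is only a minimal sketch: the base URL is the one used in this answer, while the helper name absolutize and the shortened sample hrefs are made up for illustration.

```python
from urllib.parse import urljoin

# Base taken from the answer above; the trailing slash matters, because
# urljoin would otherwise replace the last path segment ("in").
BASE_URL = "https://www.americanexpress.com/in/"


def absolutize(href, base=BASE_URL):
    """Hypothetical helper: resolve href against base; absolute URLs pass through."""
    return urljoin(base, href)


if __name__ == "__main__":
    # Shortened sample hrefs in the same shape as the scraped ones.
    hrefs = [
        "https://global.americanexpress.com/acq/intl/dpa/japa/ind/pers/begin.do?perform=IntlEapp:IND:platinum_charge",
        "charge-cards/platinum-card/?linknav=in-amex-cardshop-allcards-learn-AmericanExpressPlatinum-carousel",
    ]
    for href in hrefs:
        print(absolutize(href))
```

With this approach the same call can be applied to every href without first checking whether it is relative.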

User
Answer 2 · edited 2024-09-30 05:21:38

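Not all of the URLs you are looking for are stored in the HTML. A further request is needed, which returns the information as JSON, and that request also requires a session ID. For example: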
    from bs4 import BeautifulSoup
    import requests
    import json

    url = 'https://www.americanexpress.com/in/credit-cards/all-cards/?sourcecode=A0000FCRAA&cpid=100370494&dsparms=dc_pcrid_408453063287_kword_american%20express%20credit%20card_match_e&gclid=Cj0KCQiApY6BBhCsARIsAOI_GjaRsrXTdkvQeJWvKzFy_9BhDeBe2L2N668733FSHTHm96wrPGxkv7YaAl6qEALw_wcB&gclsrc=aw.ds'
    r = requests.get(url)
    soup = BeautifulSoup(r.content, 'lxml')

    # Find the inline <script> that embeds the page data and pull the session ID out of it.
    for script in soup.find_all('script'):
        if script.contents and "intlUserSessionId" in script.contents[0]:
            # Everything from the first '{' onwards is treated as JSON.
            json_raw = script.contents[0][script.contents[0].find('{'):]
            json_data = json.loads(json_raw)
            session_id = json_data["pageData"]["pageValues"]["intlUserSessionId"]

    # Second request: the card data is returned as JSON and needs the session ID.
    url2 = 'https://acquisition-1.americanexpress.com/api/acquisition/digital/v1/shop/us/cardshop-api/api/v1/intl/content/compare-cards/in/default'
    r2 = requests.get(url2, params={'sessionId': session_id})
    json_data = r2.json()

    for entry in json_data:
        cta_group = entry["ctaGroup"][0]
        click_url = cta_group['clickUrl']
        print(f"{cta_group['text']} - {click_url}")

        learn_more = entry['learnMore']['ctaGroup'][0]
        print(f"{learn_more['text']} - {learn_more['clickUrl']}")
This gives you the following links:
    Apply Now - https://global.americanexpress.com/acq/intl/dpa/japa/ind/pers/begin.do?perform=IntlEapp:IND:membershiprewards_credit&feePay=P1
    Learn more - credit-cards/membership-rewards-card/
    Apply Now - https://global.americanexpress.com/acq/intl/dpa/japa/ind/pers/begin.do?perform=IntlEapp:IND:travel_platinum&feePay=T1
    Learn more - credit-cards/platinum-travel-credit-card/
    Apply Now - https://global.americanexpress.com/acq/intl/dpa/japa/ind/pers/begin.do?perform=IntlEapp:IND:gold_charge&feePay=G4&intlink=mainapplynow
    Learn more - charge-cards/gold-card/
    Apply Now - https://global.americanexpress.com/acq/intl/dpa/japa/ind/pers/begin.do?perform=IntlEapp:IND:platinum_reserve&feePay=LV&intlink=mainapplynow
    Learn more - credit-cards/platinum-reserve-credit-card/
    Learn more - credit-cards/jet-airways-platinum-credit-card/
    Learn more - credit-cards/jet-airways-platinum-credit-card/
    Apply Now - https://global.americanexpress.com/acq/intl/dpa/japa/ind/pers/begin.do?perform=IntlEapp:IND:platinum_charge
    Learn more - charge-cards/platinum-card/
    Learn more - credit-cards/payback-card/
    Learn more - credit-cards/payback-card/
    Apply Now - https://global.americanexpress.com/acq/intl/dpa/japa/ind/pers/begin.do?perform=IntlEapp:IND:smart_earn&feepay=ES1
    Learn more - credit-cards/smart-earn-credit-card/
The "Learn more" URLs are relative, so the site's base URL needs to be prepended to them.
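The session-ID step in this answer slices the script text from the first '{' and hands the rest to json.loads, which raises an error if anything (for example a trailing semicolon) follows the JSON object. A possibly more tolerant variant, sketched here under assumptions, uses json.JSONDecoder().raw_decode, which stops at the end of the first complete JSON value. The function name extract_embedded_json and the sample script text are made up for illustration; the real page's script may look different, and the first '{' is assumed to start the JSON object.

```python
import json


def extract_embedded_json(script_text, marker="intlUserSessionId"):
    """Hypothetical helper: return the first JSON object in script_text.

    Returns None when the marker or an opening brace is missing.
    raw_decode parses one complete JSON value starting at the given index
    and ignores whatever JavaScript follows it.
    """
    if marker not in script_text:
        return None
    start = script_text.find("{")
    if start == -1:
        return None
    obj, _end = json.JSONDecoder().raw_decode(script_text, start)
    return obj


if __name__ == "__main__":
    # Made-up script body; the variable name and session value are placeholders.
    sample = 'window.__DATA__ = {"pageData": {"pageValues": {"intlUserSessionId": "abc-123"}}};'
    data = extract_embedded_json(sample)
    print(data["pageData"]["pageValues"]["intlUserSessionId"])
```

In the answer's loop, this helper would be called on script.contents[0] in place of the slice-and-json.loads pair.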
