BS4 web scraping returns nothing

Posted 2024-10-01 09:35:26


My code:

import bs4
import requests

res = requests.get('https://www.flickr.com/photos/')
res.raise_for_status()

soup = bs4.BeautifulSoup(res.text, 'html.parser')
# The selector must stay on one line, and the attribute value needs quotes.
linkItem = soup.select('div.photo-list-photo-interaction a[href^="/photos"]')
print(linkItem)

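It returns nothing (an empty list). When I inspect the element in the browser, the photos sit inside a <div class="photo-list-photo-interaction">, so the soup.select call above should match them. But it doesn't. Any ideas?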


1 answer
User
#1 · Posted 2024-10-01 09:35:26

If you look at the actual page source, this is what you see:

<div  class="view photo-list-view requiredToShowOnServer" style="height: 4578px" data-view-signature="photo-list-view__UA_1__exploreId_2016-09-17__isMobile_false__isOwner_false__photoListConfig_1__photoListLayoutStyle_justified__requiredToShowOnClient_true__requiredToShowOnServer_true__subnavConfig_1"><div  class="view photo-list-photo-view requiredToShowOnServer awake" style="transform: translate(0px, 4px); -webkit-transform: translate(0px, 4px); -ms-transform: translate(0px, 4px); width: 564px; height: 305px; background-image: url(//c6.staticflickr.com/9/8279/29103697453_ca811d0e07_z.jpg)" data-view-signature="photo-list-photo-view__UA_1__contextSuffix_explore-2016-09-17__engagementModelName_photo-lite-models__exploreId_2016-09-17__id_29103697453__interactionViewName_photo-list-photo-interaction-view__isMobile_false__isOwner_false__layoutItem_1__measureAFT_true__model_1__parentContainer_1__parentSignature_photolist-232__photoListLayoutStyle_justified__requiredToShowOnClient_true__requiredToShowOnServer_true__subnavConfig_1">

The URLs you see in the browser are created dynamically using CSS. Inside the div you can see background-image: url(//c6.staticflickr.com/9/8279/29103697453_ca811d0e07_z.jpg), and that is the value you need to extract.
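As a small illustration of that point, here is a sketch that pulls the image URL out of an inline style string. The style value is shortened from the div shown above; the name BG_URL_RE and the tolerance for optional quotes around the URL are assumptions added for this example, not something the page is known to need.

```python
import re

# Inline style shortened from the div shown above (only the relevant properties).
style = (
    "width: 564px; height: 305px; "
    "background-image: url(//c6.staticflickr.com/9/8279/29103697453_ca811d0e07_z.jpg)"
)

# Hypothetical helper pattern: capture whatever sits between "url(" and ")".
# Optional quotes around the URL are tolerated (an assumption, see above).
BG_URL_RE = re.compile(r"background-image:\s*url\(\s*['\"]?(.*?)['\"]?\s*\)")

match = BG_URL_RE.search(style)
image_url = match.group(1) if match else None
print(image_url)
```

The captured value still begins with //, that is, it is a protocol-relative URL without a scheme.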

You can combine a CSS selector with a regular expression:

In [1]: import requests
In [2]: from bs4 import BeautifulSoup    
In [3]: import re
In [4]: url_re = re.compile(r"url\(//(.*?)\)")

In [5]: res = requests.get('https://www.flickr.com/photos/')

In [6]: soup = BeautifulSoup(res.text, 'html.parser')

In [7]: urls = [url_re.search(d["style"]).group(1) for d in soup.select('div.view.photo-list-view div[style*="url(//"]')]

In [8]: print(urls)
[u'c1.staticflickr.com/9/8385/29133157664_856aef9bc3_n.jpg', u'c5.staticflickr.com/9/8212/29128075804_0c166556c5_n.jpg', u'c3.staticflickr.com/9/8070/29138685794_984cf0a7f2.jpg', u'c3.staticflickr.com/9/8084/29465161650_4a1a160928.jpg', u'c5.staticflickr.com/9/8202/29642526492_357d7da694_n.jpg', u'c8.staticflickr.com/9/8164/29769287735_6523928d3d.jpg', u'c5.staticflickr.com/9/8313/29722500236_76d7bdbdd8.jpg', u'c8.staticflickr.com/9/8580/29776721935_f1ce85e967_n.jpg', u'c3.staticflickr.com/9/8364/29731556026_1f9d166845.jpg', u'c3.staticflickr.com/9/8178/29726200506_4439500c3d.jpg', u'c4.staticflickr.com/9/8405/29138108963_288aa48d06.jpg', u'c4.staticflickr.com/9/8565/29137949003_fb41535bd6.jpg', u'c5.staticflickr.com/9/8109/29735723636_0e494810a2.jpg', u'c3.staticflickr.com/9/8482/29662415042_5b0d05c8f3.jpg', u'c1.staticflickr.com/9/8346/29726788896_8c293fbdf7.jpg', u'c3.staticflickr.com/9/8524/29725439906_2b067f0212.jpg', u'c6.staticflickr.com/9/8303/29140293093_e355f8e8cd.jpg', u'c3.staticflickr.com/9/8011/29477607810_db00655d55.jpg', u'c1.staticflickr.com/9/8227/29465026920_36ab1c9637.jpg', u'c2.staticflickr.com/9/8014/29770085625_5163a499d1.jpg', u'c1.staticflickr.com/9/8090/29719718136_5f5ab26519.jpg', u'c1.staticflickr.com/9/8198/29645435472_f5284dedfd.jpg', u'c1.staticflickr.com/9/8692/29469829440_4481cea5e2.jpg', u'c4.staticflickr.com/9/8126/29142193643_f7a2100439.jpg', u'c3.staticflickr.com/9/8395/29646613162_8bbfcb4783.jpg', u'c1.staticflickr.com/9/8182/29482891560_66a7453201.jpg', u'c6.staticflickr.com/9/8078/29137768373_f8c8ebc474.jpg', u'c4.staticflickr.com/9/8142/29754486795_5517360b29.jpg', u'c1.staticflickr.com/9/8276/29138669944_c94fb64f7e.jpg', u'c7.staticflickr.com/9/8189/29658148142_44845e5842.jpg', u'c3.staticflickr.com/9/8168/29724488906_dd17d56015_n.jpg', u'c1.staticflickr.com/9/8450/29727877336_b9d852bc7b.jpg', u'c7.staticflickr.com/8/7471/29129926854_ceff45aaeb.jpg', u'c4.staticflickr.com/9/8298/29690071131_5a7589870d.jpg', 
u'c8.staticflickr.com/9/8003/29131670143_9cb629648a.jpg', u'c3.staticflickr.com/9/8022/29722826586_c07240a926_n.jpg', u'c3.staticflickr.com/9/8332/29663153602_c9364a94ac.jpg', u'c4.staticflickr.com/9/8219/29767151515_3c9c12d47a.jpg', u'c6.staticflickr.com/9/8475/29675880341_baa5c43403.jpg', u'c5.staticflickr.com/9/8246/29646906852_ff44a93f55_n.jpg', u'c2.staticflickr.com/9/8113/29141997673_64184d61fd_n.jpg', u'c7.staticflickr.com/9/8517/29131116894_1319f5a4af.jpg', u'c5.staticflickr.com/9/8169/29472205700_4930f81031_n.jpg', u'c3.staticflickr.com/9/8051/29466854090_804671e48d.jpg', u'c4.staticflickr.com/9/8459/29772050115_0d602920a9_n.jpg', u'c6.staticflickr.com/9/8413/29762049765_951f4c683c_n.jpg', u'c8.staticflickr.com/9/8480/29132401623_50619e22c5_n.jpg', u'c7.staticflickr.com/9/8410/29482793550_8b338c8432_z.jpg', u'c6.staticflickr.com/8/7501/29693717381_dd907ac02a.jpg']
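The entries in that list have no scheme, because the regular expression skips the leading //. If you want to download the images, something like the following sketch can turn them into full URLs. The helper name to_absolute and the choice of https are assumptions made for this example (the page itself was requested over https); the two sample entries are copied from the output above.

```python
# Two entries copied from the printed list above.
urls = [
    "c1.staticflickr.com/9/8385/29133157664_856aef9bc3_n.jpg",
    "c5.staticflickr.com/9/8212/29128075804_0c166556c5_n.jpg",
]


def to_absolute(host_and_path, scheme="https"):
    """Prefix a scheme (https is an assumption); also accepts a leading '//'."""
    return "{}://{}".format(scheme, host_and_path.lstrip("/"))


absolute_urls = [to_absolute(u) for u in urls]
print(absolute_urls)
```

Each resulting string can then be passed to requests.get in the same way as the page URL.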
