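I'm following the tutorial here: https://medium.com/swlh/tutorial-web-scraping-instagrams-most-precious-resource-corgis-235bf0389b0c
I have used it in the past, but for some reason it now returns empty arrays, as shown below, instead of a list of permalinks: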
C:\Users\19053\InstagramPublicImageDownloader\venv\Scripts\python.exe C:/Users/19053/InstagramPublicImageDownloader/getpermalinks.py
[]
[]
[]
[]
[]
[]
[]
[]
It should look like this:
['https://www.instagram.com/p/CDRbCxjBakW/','https://www.instagram.com/p/CDMQ9J2Fvl4/','...and so on']
The code is as follows:
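import time

from selenium.webdriver import Chrome
from selenium.webdriver.common.by import By

url = "https://www.instagram.com/dairyqueen/"
browser = Chrome()
browser.get(url)

post = 'https://www.instagram.com/p/'
post_links = []
while len(post_links) < 25:
    # Collect the href of every anchor currently on the page
    links = [a.get_attribute('href') for a in browser.find_elements(By.TAG_NAME, 'a')]
    for link in links:
        # Keep only post permalinks that have not been seen yet
        if post in link and link not in post_links:
            post_links.append(link)
    # Scroll to the bottom so that more posts load
    scroll_down = "window.scrollTo(0, document.body.scrollHeight);"
    browser.execute_script(scroll_down)
    time.sleep(10)
else:
    # Runs once the while condition becomes false
    print(post_links[:25])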
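To collect the URLs you want, use this CSS selector:
div.v1Nh3.kIKUG._bz0w > a
and use WebDriverWait instead of time.sleep(...).
You should put the scroll-to-bottom call inside the loop block and repeat it until the number of elements reaches the expected value.
Try the following code: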
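A minimal sketch of the approach described in this answer, and not the answerer's original code. The selector is the one quoted above, and since Instagram's generated class names change over time it should be treated as an assumption that may need updating. The names merge_post_links, scrape_post_links, found_new_links, and the timeout parameter are hypothetical and introduced only for illustration.

```python
POST_PREFIX = "https://www.instagram.com/p/"
# Selector quoted in the answer; assumed to still match the page markup.
POST_SELECTOR = "div.v1Nh3.kIKUG._bz0w > a"
SCROLL_DOWN = "window.scrollTo(0, document.body.scrollHeight);"


def merge_post_links(found, hrefs, prefix=POST_PREFIX):
    """Append unseen post permalinks from hrefs to found, keeping order."""
    for href in hrefs:
        # Anchors without an href yield None, so skip falsy values.
        if href and href.startswith(prefix) and href not in found:
            found.append(href)
    return found


def scrape_post_links(url, target=25, timeout=10):
    """Scroll a profile page until `target` permalinks are collected."""
    # Imported here so merge_post_links works without Selenium installed.
    from selenium.common.exceptions import (
        StaleElementReferenceException,
        TimeoutException,
    )
    from selenium.webdriver import Chrome
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    browser = Chrome()
    found = []

    def found_new_links(driver):
        # Wait condition: true once the page shows a permalink not seen before.
        anchors = driver.find_elements(By.CSS_SELECTOR, POST_SELECTOR)
        before = len(found)
        merge_post_links(found, [a.get_attribute("href") for a in anchors])
        return len(found) > before

    try:
        browser.get(url)
        wait = WebDriverWait(
            browser, timeout, ignored_exceptions=[StaleElementReferenceException]
        )
        while len(found) < target:
            try:
                # Poll until new links appear instead of sleeping a fixed time.
                wait.until(found_new_links)
            except TimeoutException:
                # Nothing new loaded: end of feed, login wall, or stale selector.
                break
            # Scrolling happens inside the loop so each pass loads more posts.
            browser.execute_script(SCROLL_DOWN)
    finally:
        browser.quit()
    return found[:target]
```

Calling scrape_post_links("https://www.instagram.com/dairyqueen/") would open Chrome and return up to 25 permalinks. If the selector no longer matches anything, the wait times out and the function returns an empty list rather than looping forever.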