Getting empty arrays when scraping Instagram post links

Posted 2024-10-03 09:20:31


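I am following this tutorial: https://medium.com/swlh/tutorial-web-scraping-instagrams-most-precious-resource-corgis-235bf0389b0c

It worked for me in the past, but for some reason it now prints empty arrays, as shown below, instead of a list of permalinks: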

C:\Users\19053\InstagramPublicImageDownloader\venv\Scripts\python.exe C:/Users/19053/InstagramPublicImageDownloader/getpermalinks.py
[]
[]
[]
[]
[]
[]
[]
[]

The output should look like this:

['https://www.instagram.com/p/CDRbCxjBakW/','https://www.instagram.com/p/CDMQ9J2Fvl4/','...and so on']

Here is the code:

import time

from selenium.webdriver import Chrome

url = "https://www.instagram.com/dairyqueen/"
browser = Chrome()
browser.get(url)
post = 'https://www.instagram.com/p/'
post_links = []
while len(post_links) < 25:
    links = [a.get_attribute('href') for a in browser.find_elements_by_tag_name('a')]
    for link in links:
        if post in link and link not in post_links:
            post_links.append(link)
            scroll_down = "window.scrollTo(0, document.body.scrollHeight);"
            browser.execute_script(scroll_down)
            time.sleep(10)
        else:
            print(post_links[:25])

1 answer
User
#1 · Posted 2024-10-03 09:20:31

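To collect the URLs you want, use the CSS selector div.v1Nh3.kIKUG._bz0w > a, and use WebDriverWait instead of time.sleep(...).

You should also move the scroll-to-bottom call so that it runs once per pass of the loop, and repeat it until the number of matched elements reaches the count you expect.

Try the following code: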

browser.get('https://www.instagram.com/dairyqueen/')

scroll_down = "window.scrollTo(0, document.body.scrollHeight);"

while True:
    links = WebDriverWait(browser, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, 'div.v1Nh3.kIKUG._bz0w > a')))
    if(len(links) < 25):
        browser.execute_script(scroll_down)
    else:
        break

post_links = []
for link in links:
    post_links.append(link.get_attribute('href'))
    
print(post_links[:25])

It needs the following imports:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
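
The "scroll and re-read until enough links are found" idea can also be sketched independently of Selenium, which makes it easy to try out without a browser. The sketch below is only an illustration under a few assumptions: the names merge_post_links, collect_post_links, fetch_hrefs, scroll and max_rounds are hypothetical and not part of Selenium or of the code above; it matches post URLs by prefix (the question used a substring check); and it adds a cap on the number of scrolls, so the loop ends even when the profile has fewer posts than the target.

```python
from typing import Callable, Iterable, List, Optional

# Same prefix the question uses to recognise post permalinks.
POST_PREFIX = "https://www.instagram.com/p/"


def merge_post_links(found: List[str],
                     hrefs: Iterable[Optional[str]],
                     prefix: str = POST_PREFIX) -> List[str]:
    """Append not-yet-seen post URLs from hrefs to found, keeping order."""
    for href in hrefs:
        # An anchor without an href attribute yields None, so skip falsy values.
        if href and href.startswith(prefix) and href not in found:
            found.append(href)
    return found


def collect_post_links(fetch_hrefs: Callable[[], Iterable[Optional[str]]],
                       scroll: Callable[[], None],
                       target: int = 25,
                       max_rounds: int = 20) -> List[str]:
    """Read links, scroll, and repeat until target links are found.

    fetch_hrefs and scroll are hypothetical callables supplied by the
    caller; max_rounds is an assumed safety cap, not a Selenium setting.
    """
    found: List[str] = []
    for _ in range(max_rounds):
        merge_post_links(found, fetch_hrefs())
        if len(found) >= target:
            break
        scroll()  # scroll once per round, not once per link
    return found[:target]
```

With a real browser, fetch_hrefs could be a function that returns [a.get_attribute('href') for a in browser.find_elements(By.TAG_NAME, 'a')], and scroll a function that calls browser.execute_script(scroll_down). Accumulating links across rounds, rather than relying on the last read alone, also helps if the page drops earlier posts from the DOM while scrolling.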
