I am trying to get the URL below from the page source of https://www.aliexpress.com/item/32212764152.html, but it is buried inside a script tag.
<script>
window.runParams = {"descriptionModule":{"descriptionUrl":"https://aeproductsourcesite.alicdn.com/product/description/pc/v2/en_US/desc.htm?productId=32212764152&key=HTB1GwO_aVY7gK0jSZKzM7OikpXac.zip&token=f32528ddd34e37aecddda1c7778d5f0c"} .... </script>
I have managed to get the page source, but I don't know how to extract the URL as a single object.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.keys import Keys
import time
import re

CHROMEDRIVER_PATH = '/Users/reezalaq/PycharmProjects/wholesale/driver/chromedriver'

# Chrome options, collected on a single object so the flags are not lost
options = Options()
options.add_argument('--ignore-certificate-errors')
options.add_argument("--test-type")
options.headless = False

driver = webdriver.Chrome(CHROMEDRIVER_PATH, options=options)
driver.get('https://www.aliexpress.com/item/32212764152.html')
html = driver.page_source

def run_script():
    # Scroll to the bottom of the page, then page up once
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    body = driver.find_element_by_css_selector('body')
    body.send_keys(Keys.PAGE_UP)

count = 0
while count < 3:  # 13
    run_script()
    count += 1
    time.sleep(5)

# startswith() only checks the very beginning of the page source,
# so this does not find a URL that sits in the middle of the HTML
x = html.startswith('https://aeproductsourcesite.alicdn.com')
print(x)
How can I filter out everything else in the source and end up with just that one object?
You can extract the value with a regular expression.
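The original answer's code did not survive on this page, so the following is only a minimal sketch of the regex approach, not the answerer's exact code. It assumes the page source contains a `"descriptionUrl":"..."` pair as in the question's snippet. `sample_html`, `DESCRIPTION_URL_RE`, and `extract_description_url` are hypothetical names introduced for illustration, and `sample_html` is a shortened stand-in for `driver.page_source` (the real `window.runParams` object holds much more).

```python
import json
import re

# Shortened stand-in for driver.page_source (assumption: the real page
# wraps the same "descriptionUrl" key in a much larger runParams object).
sample_html = '''<html><head><script>
window.runParams = {"descriptionModule":{"descriptionUrl":"https://aeproductsourcesite.alicdn.com/product/description/pc/v2/en_US/desc.htm?productId=32212764152&key=HTB1GwO_aVY7gK0jSZKzM7OikpXac.zip&token=f32528ddd34e37aecddda1c7778d5f0c"}};
</script></head><body></body></html>'''

# Capture the quoted string that follows "descriptionUrl":,
# allowing backslash escapes inside the value.
DESCRIPTION_URL_RE = re.compile(r'"descriptionUrl"\s*:\s*"((?:[^"\\]|\\.)*)"')


def extract_description_url(html):
    """Return the descriptionUrl value found in html, or None if absent."""
    match = DESCRIPTION_URL_RE.search(html)
    if match is None:
        return None
    # Parse the captured text as a JSON string so escapes such as \/ are decoded.
    return json.loads('"' + match.group(1) + '"')


url = extract_description_url(sample_html)
print(url)
```

With Selenium, you would pass `driver.page_source` instead of `sample_html`. Searching the whole source with `re.search` is what makes this work where `html.startswith(...)` does not, since the URL sits in the middle of the document.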