I'm using the following Python code to scrape the anchor text and the corresponding href value of every link on a page:
import requests
from bs4 import BeautifulSoup
url = "https://www.mydomain.co.uk/contact-us"
response = requests.get(url)
soup = BeautifulSoup(response.text, "lxml")
# Print the anchor text and href of every <a> tag on the page
for link in soup.find_all('a'):
    print(link.text, '-', link.get('href'))
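It works, but it also picks up image links. Because an image link has no anchor text, only the "-" and the href are printed for it. For example: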
Contact Us - /contact-us
About Us - /about
- /locations
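The bare "-" lines come from anchors whose only content is an image: such an `<a>` contains no text nodes, so `link.text` is an empty string. A minimal illustration, using hypothetical markup that only mimics the page (the `/img/map.png` image and the surrounding HTML are made up, and `html.parser` is used so the example doesn't need lxml):

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two text links and one link that wraps only an <img>
html = """
<a href="/contact-us">Contact Us</a>
<a href="/about">About Us</a>
<a href="/locations"><img src="/img/map.png" alt="Locations"></a>
"""

soup = BeautifulSoup(html, "html.parser")
for link in soup.find_all("a"):
    # repr() makes the empty string visible for the image-only link
    print(repr(link.text), link.get("href"))
```

The third line printed shows `''` as the text for `/locations`, which is why the original loop prints nothing before the "-".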
I'd like it to ignore image links entirely, so that the output is:
Contact Us - /contact-us
About Us - /about
Is this possible?
Thanks
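One possible approach, sketched under the assumption that an "image link" is an `<a>` whose content is just an `<img>` (and so has no visible text): skip any anchor whose stripped text is empty. The `text_links` helper and its `drop_any_image` flag are hypothetical names for illustration; the flag additionally drops anchors that contain an `<img>` alongside some text. The markup is again a made-up stand-in; on the real page you would pass the `soup` built from `requests.get(url)`.

```python
from bs4 import BeautifulSoup


def text_links(soup, drop_any_image=False):
    """Return (anchor text, href) pairs, skipping image-only links.

    Hypothetical helper. With drop_any_image=True, any <a> that contains
    an <img> is skipped as well, even if it also has some text.
    """
    links = []
    # href=True restricts the search to <a> tags that actually have an href
    for link in soup.find_all("a", href=True):
        text = link.get_text(strip=True)
        if not text:
            continue  # image-only link: nothing but an <img> or whitespace inside
        if drop_any_image and link.find("img") is not None:
            continue  # link mixes an image with text; drop it on request
        links.append((text, link["href"]))
    return links


# Hypothetical markup standing in for the real page
html = """
<a href="/contact-us">Contact Us</a>
<a href="/about">About Us</a>
<a href="/locations"><img src="/img/map.png" alt="Locations"></a>
"""
soup = BeautifulSoup(html, "html.parser")

for text, href in text_links(soup):
    print(text, "-", href)
```

With this markup only the "Contact Us" and "About Us" lines are printed. Using `get_text(strip=True)` rather than `link.text` also treats anchors that contain only whitespace around the image as empty.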