I'm trying to scrape company names from multiple pages of a website. I use a for loop to go through each page and look for the company names.
from requests_html import HTMLSession
from bs4 import BeautifulSoup

### CREATING LOOP TO GO THROUGH PAGES ###
results = []  # variable to store loop results
for i in range(4):  # goes through 4 pages (0-3)
    url = 'https://clutch.co/it-services/msp?page={}'.format(i)  # passes the number from range into the {}
    session = HTMLSession()
    resp = session.get(url)
    resp.html.render()  # render in case it is a JavaScript site
    soup = BeautifulSoup(resp.html.html, features='lxml')
    print(url)  # shows which page the loop is on
    agencies = soup.find_all(class_='company-name')
    for a in agencies:
        text = a.text
    results.append(text)
print(results)
The code above only returns the text of the last element on each page.
Result:
https://clutch.co/it-services/msp?page=0
https://clutch.co/it-services/msp?page=1
https://clutch.co/it-services/msp?page=2
https://clutch.co/it-services/msp?page=3
['\nAgency Partner Interactive LLC ', '\nTEAM International ', '\nAstute Technology Management ', '\nWP Tech Support ']
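My understanding is that this happens because the nested for loop only shows one element? What is the correct way to get the text of every element on all pages?
Thanks in advance.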
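This happens because the statement that appends each entry to the results list is not inside the inner for loop. It runs once per page, after the inner loop has finished, so by then text holds only the last company on that page.
Try this: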
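The answer's original snippet is missing here, so what follows is only a minimal sketch of the fix it describes: the append is moved inside the inner loop. The function names (`fetch_rendered_html`, `extract_names`, `scrape_company_names`) and the `fetch`/`extract` parameters are hypothetical, introduced so the loop can be tried without network access. The `.strip()` call is also an addition, assumed to be wanted because the output in the question shows a leading newline and a trailing space on every name.

```python
def fetch_rendered_html(url):
    """Fetch a page and return its HTML after JavaScript rendering."""
    # Third-party import kept local so the loop below can run without it.
    from requests_html import HTMLSession
    session = HTMLSession()
    resp = session.get(url)
    resp.html.render()  # render in case it is a JavaScript site
    return resp.html.html


def extract_names(html):
    """Return the text of every element with class 'company-name'."""
    from bs4 import BeautifulSoup
    soup = BeautifulSoup(html, features='lxml')
    agencies = soup.find_all(class_='company-name')
    # strip() is an assumption: it drops the surrounding whitespace
    return [a.text.strip() for a in agencies]


def scrape_company_names(num_pages, fetch=fetch_rendered_html,
                         extract=extract_names):
    """Collect company names from pages 0 .. num_pages - 1."""
    results = []
    for i in range(num_pages):
        url = 'https://clutch.co/it-services/msp?page={}'.format(i)
        print(url)  # shows which page the loop is on
        for name in extract(fetch(url)):
            # Inside the inner loop: runs once per company, not once per page.
            results.append(name)
    return results
```

Calling `scrape_company_names(4)` should then cover the same four pages as the question's loop and return one entry per company rather than one per page. Since the inner loop only appends, `results.extend(extract(fetch(url)))` would be an equivalent one-line alternative.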