Nested loops for Beautiful Soup text

Posted 2024-06-28 20:14:39


I am trying to scrape company names from several pages of a website. I use a for loop to go through each page and find the company names:

### CREATING LOOP TO GO THROUGH PAGES ###

from bs4 import BeautifulSoup
from requests_html import HTMLSession

results = []  # stores the loop results
for i in range(4):  # goes through 4 pages (0-3)
    url = 'https://clutch.co/it-services/msp?page={}'.format(i)  # puts the page number into the {}
    session = HTMLSession()
    resp = session.get(url)
    resp.html.render()  # renders the page in case it's a JavaScript site
    soup = BeautifulSoup(resp.html.html, features='lxml')
    print(url)  # shows which page the loop is on
    agencies = soup.find_all(class_='company-name')
    for a in agencies:
        text = a.text
    results.append(text)

print(results)

The code above only gives me the last element on each page as text.

Result:

https://clutch.co/it-services/msp?page=0
https://clutch.co/it-services/msp?page=1
https://clutch.co/it-services/msp?page=2
https://clutch.co/it-services/msp?page=3
['\nAgency Partner Interactive LLC ', '\nTEAM International ', '\nAstute Technology Management ', '\nWP Tech Support ']

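My understanding is that this happens because the nested for loop only shows one element? What is the correct way to get the text of every element on all pages?

Thanks in advance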


1 answer
User
#1 · Posted 2024-06-28 20:14:39

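This happens because the statement that appends each entry to the results list is not inside the inner for loop. `results.append(text)` sits at the indentation level of the outer loop, so it runs only once per page, after the inner loop has finished. By then `text` has been overwritten on every iteration and holds only the last company name on that page.

Try this: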
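To see the effect of the indentation without requesting the site, here is a minimal sketch that replaces the scraped pages with hard-coded lists. The `pages` data and both function names are hypothetical; each inner list stands in for the `agencies` found on one page.

```python
# Hypothetical stand-in for the scraped pages: each inner list plays the
# role of the `agencies` found on one page.
pages = [
    ["A1", "A2", "A3"],
    ["B1", "B2"],
]


def append_after_inner_loop(pages):
    """Mirrors the question's code: append is outside the inner loop."""
    results = []
    for agencies in pages:
        for a in agencies:
            text = a  # overwritten on every iteration
        results.append(text)  # runs once per page, after the inner loop
    return results


def append_inside_inner_loop(pages):
    """Mirrors the answer's code: append is inside the inner loop."""
    results = []
    for agencies in pages:
        for a in agencies:
            text = a
            results.append(text)  # runs once per element
    return results


print(append_after_inner_loop(pages))   # ['A3', 'B2']
print(append_inside_inner_loop(pages))  # ['A1', 'A2', 'A3', 'B1', 'B2']
```

Only the indentation of the `append` line differs between the two functions, which matches the difference between the question's code and the fix.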

### CREATING LOOP TO GO THROUGH PAGES ###

from bs4 import BeautifulSoup
from requests_html import HTMLSession

results = []  # stores the loop results
session = HTMLSession()  # one session can be reused for every page
for i in range(4):  # goes through 4 pages (0-3)
    url = 'https://clutch.co/it-services/msp?page={}'.format(i)  # puts the page number into the {}
    resp = session.get(url)
    resp.html.render()  # renders the page in case it's a JavaScript site
    soup = BeautifulSoup(resp.html.html, features='lxml')
    print(url)  # shows which page the loop is on
    agencies = soup.find_all(class_='company-name')
    for a in agencies:
        text = a.text
        results.append(text)  # now inside the inner loop: runs once per company

print(results)
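The scraped strings still carry the whitespace visible in the question's output (for example `'\nTEAM International '`). A minimal sketch of collecting every name per page with `list.extend` and cleaning it with `str.strip()` follows. The input shape (one list of raw `a.text` strings per page), the function name `collect_company_names`, and the grouping of the sample strings into pages are all hypothetical; the strings themselves are taken from the question's output.

```python
def collect_company_names(pages_of_texts):
    """Flatten per-page raw texts into one list of cleaned names.

    `pages_of_texts` is a hypothetical input: one list of raw `a.text`
    strings per scraped page.
    """
    results = []
    for page_texts in pages_of_texts:
        # extend() adds every name from this page, not just the last one;
        # strip() removes the leading '\n' and trailing space.
        results.extend(text.strip() for text in page_texts)
    return results


raw = [
    ["\nAgency Partner Interactive LLC ", "\nTEAM International "],
    ["\nAstute Technology Management ", "\nWP Tech Support "],
]
print(collect_company_names(raw))
```

In the scraper itself, the inner loop could then be written as `results.extend(a.get_text(strip=True) for a in agencies)`, using Beautiful Soup's `get_text(strip=True)` in place of `a.text` plus a separate `strip()`.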
