How do I crawl several pages with Python and BeautifulSoup?

from bs4 import BeautifulSoup
import requests

maximum = 0
page = 1

URL = 'http://www.saramin.co.kr/zf_user/jobs/company-labs/list/page/1'
response = requests.get(URL)
source = response.text
soup = BeautifulSoup(source, 'html.parser')

whole_source = ""
for page_number in range(1, maximum+1):
    URL = 'http://www.saramin.co.kr/zf_user/jobs/company-labs/list/page/' + str(page_number)
    response = requests.get(URL)
    whole_source = whole_source + response.text
soup = BeautifulSoup(whole_source, 'html.parser')
find_company = soup.select("#content > div.wrap_analysis_data > div.public_con_box.public_list_wrap > ul > li:nth-child(13) > div > strong")
for company in find_company:
    print(company.text)

2 answers

User

#1 · Edited 2024-10-03 00:16:57

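So you want to strip out all the headers and get only the company names as strings? Basically, you can use soup.findAll to find the company entries, which look like this: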

<strong class="company"><span>중소기업진흥공단</span></strong>

Then use .find to get the <span> element inside each entry:

<span>중소기업진흥공단</span>

After that, use the .contents attribute to get the string out of the <span> tag:

'중소기업진흥공단'
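The three steps above (findAll, then .find('span'), then .contents) can be tried offline on the sample markup. This is a minimal sketch assuming each entry is shaped exactly like `<strong class="company"><span>…</span></strong>`; `SAMPLE_HTML` and its second entry ("Example Co.") are hypothetical test data, not scraped output, and `extract_company_names` is an illustrative helper name.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the snippet above (the second entry is made up)
SAMPLE_HTML = """
<ul>
  <li><strong class="company"><span>중소기업진흥공단</span></strong></li>
  <li><strong class="company"><span>Example Co.</span></strong></li>
</ul>
"""


def extract_company_names(html):
    """Return the text of the <span> inside every <strong class="company">."""
    soup = BeautifulSoup(html, "html.parser")
    names = []
    # Step 1: find every <strong class="company"> element
    for strong in soup.find_all("strong", attrs={"class": "company"}):
        # Step 2: locate the <span> inside it
        span = strong.find("span")
        if span is None or not span.contents:
            continue  # skip entries without a usable <span>
        # Step 3: .contents[0] is the span's first child, here its text node
        names.append(str(span.contents[0]))
    return names


print(extract_company_names(SAMPLE_HTML))  # ['중소기업진흥공단', 'Example Co.']
```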

You can then write a loop that does the same thing for every page, appending each page's results to a list called company_list as you go.

Here is the code:

from bs4 import BeautifulSoup
import requests

maximum = 12

company_list = [] # List for result storing
for page_number in range(1, maximum+1):
    URL = 'http://www.saramin.co.kr/zf_user/jobs/company-labs/list/page/{}'.format(page_number) 
    response = requests.get(URL)
    print(page_number)
    whole_source = response.text
    soup = BeautifulSoup(whole_source, 'html.parser')
    for entry in soup.findAll('strong', attrs={'class': 'company'}): # Finding all company names in the page
        company_list.append(entry.find('span').contents[0]) # Extracting name from the result

company_list will then contain all the company names you need.
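The per-page loop can also be split so that fetching and parsing are separate, which makes the parsing testable without network access. This is a sketch under stated assumptions, not the answer's exact code: `fetch` is a hypothetical parameter standing in for `requests.get(url).text`, the fake pages are made-up test data, and it uses `get_text(strip=True)` in place of `.contents[0]` so surrounding whitespace is trimmed. Like the answer, it parses each page on its own rather than concatenating all pages into one HTML string as the question does. Note that `range(1, maximum + 1)` is empty when `maximum` is 0, which is why the loop in the question never runs.

```python
from bs4 import BeautifulSoup

# URL pattern from the question; listing pages are numbered from 1
BASE_URL = "http://www.saramin.co.kr/zf_user/jobs/company-labs/list/page/{}"


def page_urls(maximum):
    """URLs for pages 1..maximum inclusive (an empty list when maximum is 0)."""
    return [BASE_URL.format(n) for n in range(1, maximum + 1)]


def names_on_page(html):
    """Company names on one page, assuming the <strong class="company"><span> structure."""
    soup = BeautifulSoup(html, "html.parser")
    names = []
    for strong in soup.find_all("strong", attrs={"class": "company"}):
        span = strong.find("span")
        if span is not None:
            names.append(span.get_text(strip=True))
    return names


def crawl(fetch, maximum):
    """Parse each page separately and collect all names in one list.

    `fetch` is a hypothetical url -> HTML callable; with requests it could be
    `lambda url: requests.get(url).text`.
    """
    company_list = []
    for url in page_urls(maximum):
        company_list.extend(names_on_page(fetch(url)))
    return company_list


# Usage with made-up pages instead of live requests
fake_pages = {
    BASE_URL.format(1): '<strong class="company"><span>A</span></strong>',
    BASE_URL.format(2): '<strong class="company"><span>B</span></strong>',
}
print(crawl(fake_pages.get, 2))  # ['A', 'B']
```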

User

#2 · Edited 2024-10-03 00:16:57

I finally got it working. Thanks for your answer!

[Image: code captured in a Jupyter notebook]

Here is my final code:

from urllib.request import urlopen
from bs4 import BeautifulSoup

company_list = []
for n in range(12):
    # Pages are numbered from 1, so request page n + 1
    url = 'http://www.saramin.co.kr/zf_user/jobs/company-labs/list/page/{}'.format(n + 1)
    webpage = urlopen(url)
    source = BeautifulSoup(webpage, 'html.parser', from_encoding='utf-8')
    companys = source.findAll('strong', {'class': 'company'})

    for company in companys:
        # Trim the ends, then remove any newline, tab and carriage-return characters
        company_list.append(company.get_text().strip().replace('\n', '').replace('\t', '').replace('\r', ''))

# Write one company name per line; the with-block closes the file automatically
with open('company_name1.txt', 'w', encoding='utf-8') as file:
    for company in company_list:
        file.write(company + '\n')
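The chained `.replace()` calls in the final code can be collapsed into a single `str.translate` call that deletes newline, tab and carriage-return characters in one pass. This is a minimal sketch; `raw` is a made-up example of what `get_text()` might return, not real page output, and the function names are illustrative. Unlike `''.join(text.split())`, this keeps ordinary spaces inside a name.

```python
# Translation table that deletes newline, tab and carriage-return characters
DELETE_CONTROL_WS = str.maketrans("", "", "\n\t\r")


def clean_chain(text):
    """The cleanup used in the final code, kept for comparison."""
    return text.strip().replace('\n', '').replace('\t', '').replace('\r', '')


def clean_translate(text):
    """Same result: trim both ends, then delete every newline, tab and carriage return."""
    return text.strip().translate(DELETE_CONTROL_WS)


raw = "\n\t  Example\r\n Co.  \t\n"  # hypothetical get_text() result
print(clean_translate(raw))  # Example Co.
print(clean_chain(raw) == clean_translate(raw))  # True
```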
