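The script below runs fine and fetches category names from a Wikipedia page. How can I make it stop after 10 minutes, or once it has collected 100 categories, whichever comes first?
The code below needs the limitation mentioned above. Any ideas?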
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import requests
import csv
import time
#getting all the contents of a url
url = 'https://en.wikipedia.org/wiki/Category:Free software'
content = requests.get(url).content
soup = BeautifulSoup(content,'lxml')
#showing the category-pages Summary
catPageSummaryTag = soup.find(id='mw-pages')
catPageSummary = catPageSummaryTag.find('p')
print(catPageSummary.text)
#showing the category-pages only
catPageSummaryTag = soup.find(id='mw-pages')
tag = soup.find(id='mw-pages')
links = tag.findAll('a')
#getting the category pages
catpages = soup.find(id='mw-pages')
whatlinksherelist = catpages.find_all('li')
things_to_write = []
for titles in whatlinksherelist:
    things_to_write.append(titles.find('a').get('title'))
    WAIT_TIME = 15
    print(titles.text)
    time.sleep(WAIT_TIME)
#writing the category pages as a output file
with open('001-catPages.csv', 'a') as csvfile:
    writer = csv.writer(csvfile,delimiter="\n")
    writer.writerow(things_to_write)
Add this to your program:
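The answer's original code did not survive on this page, so what follows is only a minimal sketch of one way to apply both limits, not the answerer's actual snippet. The helper name `collect_titles` and its parameters (`max_items`, `max_seconds`, `wait_time`, `clock`, `sleep`) are hypothetical names introduced here for illustration. The idea is to record a deadline before the loop and to check both the item count and the clock at the top of every iteration.

```python
import time

MAX_ITEMS = 100         # stop after this many categories...
MAX_SECONDS = 10 * 60   # ...or after ten minutes, whichever comes first


def collect_titles(items, max_items=MAX_ITEMS, max_seconds=MAX_SECONDS,
                   wait_time=0, clock=time.monotonic, sleep=time.sleep):
    """Collect entries from `items` until the count or time limit is hit.

    `clock` and `sleep` are injectable (hypothetical parameters) so the
    limits can be exercised without actually waiting.
    """
    deadline = clock() + max_seconds
    collected = []
    for item in items:
        # Check both limits before doing any more work.
        if len(collected) >= max_items or clock() >= deadline:
            break
        collected.append(item)
        if wait_time:
            sleep(wait_time)  # polite pause between entries
    return collected
```

In the script from the question, the `for titles in whatlinksherelist:` loop could then be replaced with something like `things_to_write = collect_titles((li.find('a').get('title') for li in whatlinksherelist), wait_time=15)`. Note that the time limit is only checked between entries, so the run can overshoot the deadline by up to one `wait_time` pause. `time.monotonic` is used instead of `time.time` so that system clock adjustments do not affect the deadline.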