How do I stop a script's execution after a number of edits?

Posted 2024-10-04 05:33:25

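The script below works well and fetches category names from a Wikipedia page. How can I make it stop after 10 minutes, or after it has collected 100 categories?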

The code below needs the limitation described above:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

from bs4 import BeautifulSoup
import requests
import csv
import time

#getting all the contents of a url
url = 'https://en.wikipedia.org/wiki/Category:Free software'
content = requests.get(url).content
soup = BeautifulSoup(content,'lxml')

#showing the category-pages Summary
catPageSummaryTag = soup.find(id='mw-pages')
catPageSummary = catPageSummaryTag.find('p')
print(catPageSummary.text)

#showing the category-pages only
catPageSummaryTag = soup.find(id='mw-pages')
tag = soup.find(id='mw-pages')
links = tag.findAll('a')

#getting the category pages
catpages = soup.find(id='mw-pages')
whatlinksherelist = catpages.find_all('li')
things_to_write = []
for titles in whatlinksherelist:
  things_to_write.append(titles.find('a').get('title'))
  WAIT_TIME = 15
  print(titles.text)
  time.sleep(WAIT_TIME)  
#writing the category pages as a output file
with open('001-catPages.csv', 'a') as csvfile:
  writer = csv.writer(csvfile,delimiter="\n")
  writer.writerow(things_to_write)

1 answer
User
#1 · Posted 2024-10-04 05:33:25

Add this to your program:

#showing the category-pages only
catPageSummaryTag = soup.find(id='mw-pages')
tag = soup.find(id='mw-pages')
links = tag.findAll('a')
catpages = soup.find(id='mw-pages')
whatlinksherelist = catpages.find_all('li')
things_to_write = []
count = 0  # number of categories collected so far
for titles in whatlinksherelist:
    if count >= 100:  # stop as soon as 100 categories are collected
        break
    things_to_write.append(titles.find('a').get('title'))
    count += 1
    print(titles.text)

#writing the category pages as a output file
with open('001-catPages.csv', 'a') as csvfile:
  writer = csv.writer(csvfile,delimiter="\n")
  writer.writerow(things_to_write)
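
The snippet above only covers the 100-category limit. The question also asks for a 10-minute limit, so here is a minimal sketch of one way to combine both conditions. The helper name `collect_limited` and its `clock` parameter are hypothetical (they are not part of the original script), and the sample `titles` list stands in for the values scraped from the page.

```python
import time


def collect_limited(items, max_items=100, max_seconds=600, clock=time.monotonic):
    """Collect items until max_items are gathered or max_seconds have elapsed.

    Hypothetical helper for illustration; `clock` is injectable so the
    time limit can be exercised without actually waiting.
    """
    deadline = clock() + max_seconds
    collected = []
    for item in items:
        # Check both limits before taking the next item.
        if len(collected) >= max_items or clock() >= deadline:
            break
        collected.append(item)
    return collected


# Stand-in data; in the original script these would be the link titles.
titles = ["Page %d" % i for i in range(250)]
limited = collect_limited(titles, max_items=100, max_seconds=600)
print(len(limited))  # prints 100
```

In the original script this could be called with a generator such as `(li.find('a').get('title') for li in whatlinksherelist)`. Note that the limits are only checked between items, so a `time.sleep` or a slow request that is already running is not interrupted; the loop stops at the next iteration.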
