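I can't seem to get the return value, i.e. the list of differences between the old URLs and the new URLs. This script is supposed to keep polling until there is a difference, append that difference, return it, and then go back to def newresponse().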
from bs4 import BeautifulSoup
import requests
import time

old_urls = []
new_urls = []

def main():
    s = requests.session()
    url = s.get('https://www.sivasdescalzo.com/sitemaps/en/sitemap-1.xml')
    soup = BeautifulSoup(url.content, "html.parser")
    all_urls = soup.find_all("url")
    for url in all_urls:
        old_urls.append(url.find('loc').get_text())

def newresponse():
    s = requests.session()
    url = s.get('https://www.sivasdescalzo.com/sitemaps/en/sitemap-1.xml')
    soup2 = BeautifulSoup(url.content, "html.parser")
    all_newurls = soup2.find_all("url")
    for urls in all_newurls:
        new_urls.append(urls.find('loc').get_text())

def monitorchange():
    x = list(set(new_urls) - set(old_urls))
    print("looking for change")
    while True:
        s = requests.session()
        url = s.get('https://www.sivasdescalzo.com/sitemaps/en/sitemap-1.xml')
        soup3 = BeautifulSoup(url.content, "html.parser")
        if new_urls != old_urls:
            return x
            old_urls.append(x)
            continue
        elif url.status_code == 403:
            print("banned")
        else:
            time.sleep(60)

main()
newresponse()
monitorchange()
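main() and newresponse() above repeat the same download-and-parse steps. A minimal sketch of a single helper that pulls every <loc> value out of sitemap XML is shown below. It uses the standard library's xml.etree.ElementTree instead of BeautifulSoup (a swap, made only to avoid the bs4 dependency), and the function name extract_sitemap_urls is hypothetical.

```python
import xml.etree.ElementTree as ET


def extract_sitemap_urls(xml_bytes):
    """Return the text of every <loc> element in a sitemap, in document order.

    Hypothetical helper: uses ElementTree rather than BeautifulSoup.
    The local-name comparison makes it work whether or not the sitemap
    declares the sitemaps.org XML namespace.
    """
    root = ET.fromstring(xml_bytes)
    urls = []
    for elem in root.iter():
        # With a namespace the tag looks like
        # '{http://www.sitemaps.org/schemas/sitemap/0.9}loc'; without one it is 'loc'.
        if elem.tag.rsplit('}', 1)[-1] == 'loc' and elem.text:
            urls.append(elem.text.strip())
    return urls
```

With requests, it could be called as extract_sitemap_urls(requests.get(sitemap_url).content), so that the old and new snapshots come from the same code path.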
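Using def defines a function; it does not call it. The main() and newresponse() functions are never called, so your old and new lists stay empty and can't show any difference.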
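Beyond calling the functions, the polling loop also has to re-fetch the sitemap and recompute the difference on every pass; in the question, x is computed once before the loop and new_urls is never refreshed inside it. A minimal sketch of that loop, written as a generator so it can hand back each batch of new URLs and then keep polling, is shown below. The names watch_for_new_urls and fetch_urls are hypothetical. fetch_urls is assumed to be any zero-argument callable that returns the current list of sitemap URLs, for example one that wraps the requests.get call and the <loc> extraction from the question.

```python
import time


def watch_for_new_urls(fetch_urls, interval=60, sleep=time.sleep):
    """Yield a sorted list of newly seen URLs each time the sitemap grows.

    fetch_urls: hypothetical zero-argument callable returning an iterable of
    URL strings. Any error handling (e.g. a 403 ban) is assumed to live there.
    """
    seen = set(fetch_urls())        # the "old" snapshot, taken once
    while True:
        sleep(interval)             # wait before polling again
        current = set(fetch_urls()) # re-fetch on every iteration
        added = current - seen      # URLs not seen in any earlier snapshot
        if added:
            seen |= added           # remember them so they are reported once
            yield sorted(added)
```

Usage would look like `for batch in watch_for_new_urls(fetch): print(batch)`. A generator fits the "return the difference, then go back and keep polling" behaviour the question describes better than a plain return, which ends the function after the first change.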