<p>I'm new to this, so I put together a script to learn how to scrape.
I'm querying a main index page to get a list of URLs that contain the contact information I want.</p>
<p>I managed to collect the index list into a set, and then tried to iterate over it using two functions (I'm sure there's a better way to do this). After the first iteration it stops, and I don't understand why at all. Any suggestions are welcome.</p>
<pre><code>import requests
from bs4 import BeautifulSoup

linkset = set()
url = "http://someurl.com/venues"
r = requests.get(url)
soup = BeautifulSoup(r.content, "lxml")
base_url = "http://someurl.com/uk/"
links = soup.find_all("a", class_="supplier-link")

# A function to get the links from the top level directory.
def get_venue_link_list(links):
    for link in links:
        linkset.add(link.get("href"))
        return linkset

#get_venue_link_list(links)
# When I test by printing linkset, I get the list of unique URLs.
# This works as expected.
#print linkset

# A function to go retrieve the contact info for each venue.
def go_retrieve_contact(link_value):
    for i in link_value:
        link = i
        venue_link = base_url + link
        venue_request = requests.get(venue_link)
        venue_soup = BeautifulSoup(venue_request.content, "lxml")
        info = venue_soup.find_all("section", {"class": "findout"})
        header = venue_soup.find_all("div", {"id": "supplier-header-desktop"})
        go_get_info(info)

# Email, Phone and Website were nested in one div, so they were a little easier to get.
# Will need to use a different div for the address and social media names.
def go_get_info(info):
    for item in info:
        print "%s" % ((item.contents[3].find_all("span", {"class": "text"})[0].text)).strip()
        print "%s" % ((item.contents[3].find_all("span", {"class": "text"})[1].text)).strip()
        print "%s" % ((item.contents[3].find_all("span", {"class": "text"})[2].text)).strip()
        # Let's comment out this next nested loop until I fix the above.
        #for item in header:
        #    print item.contents[1].text

go_retrieve_contact(get_venue_link_list(links))
</code></pre>
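<p>For context, one common reason a loop like this "stops after the first iteration" is either a <code>return</code> indented inside the loop body, or an unhandled exception raised partway through. Below is a minimal, self-contained sketch of the keep-iterating-and-log-failures pattern; <code>fake_fetch</code> is a hypothetical stand-in for the real <code>requests.get</code> call, not part of the original script:</p>

```python
def fake_fetch(link):
    # Stand-in for the real network request; raises for one "bad" link
    # to simulate a request that fails mid-loop.
    if link == "bad-venue":
        raise ValueError("simulated request failure")
    return "contact info for " + link

def go_retrieve_contact(link_values):
    results = []
    for link in link_values:
        try:
            results.append(fake_fetch(link))
        except Exception as exc:
            # Log the failure and continue, instead of letting one bad
            # link kill the whole loop.
            print("skipping %s: %s" % (link, exc))
    # return OUTSIDE the loop, so every link is processed first
    return results

results = go_retrieve_contact(["venue-a", "bad-venue", "venue-b"])
```

<p>Note the placement of <code>return</code>: if it were indented one level deeper, inside the <code>for</code> body, the function would exit after the first link, which is exactly the "stops after one iteration" symptom.</p>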