Getting data from multiple web pages using a .txt file of URLs with Python and BeautifulSoup

2 Answers

User

Answer 1 · edited 2024-09-30 10:40:02

Your approach can be fixed by adjusting two lines in your code.

Try this:

with open('urls.txt', 'r') as f:
    urls = f.readlines()   #make sure this line is properly indented.
for url in urls:
    uClient = urlopen(url.strip())
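
The same idea of reading the lines and stripping each one can be wrapped in a small helper. This is only a sketch, assuming Python 3; the name 'read_urls' and the example.com URLs are made up for illustration and are not part of the original answer. It also skips blank lines, which a hand-edited urls.txt often contains.

```python
import os
import tempfile


def read_urls(path):
    """Return the non-empty lines of a text file, stripped of whitespace."""
    with open(path, "r") as f:
        # Iterating over the file object yields one line at a time.
        return [line.strip() for line in f if line.strip()]


# Build a throwaway file that stands in for urls.txt (hypothetical contents).
tmp = tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False)
tmp.write("http://example.com/a\n\nhttp://example.com/b\n")
tmp.close()

urls = read_urls(tmp.name)
os.remove(tmp.name)
```

Each entry in 'urls' can then be passed to urlopen without a trailing newline.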

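User

Answer 2 · edited 2024-09-30 10:40:02

You cannot read the whole file into a string with 'f.read()' and then iterate over that string: iterating over a string yields single characters, not lines. To fix this, see the changes below. I also removed your last line. When you use the 'with' statement, the file is closed automatically when the code block finishes.

The code from Greg Hewgill (for Python 2) shows whether the URL string is of type 'str' or 'unicode'.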
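
The difference between the two kinds of iteration, and the automatic close, can be seen in a small self-contained sketch. It assumes Python 3 and uses io.StringIO as a stand-in for the real urls.txt; the example.com URLs are hypothetical.

```python
import io

# Hypothetical stand-in for the contents of urls.txt.
text = "http://example.com/a\nhttp://example.com/b\n"

# Iterating over the string that read() returns yields single characters.
chars = [item for item in text][:4]

# Iterating over the file object yields one line per URL.
with io.StringIO(text) as f:
    lines = [line.strip() for line in f]

# The 'with' block has already closed the file-like object here.
is_closed = f.closed
```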

from urllib2 import urlopen
from bs4 import BeautifulSoup as soup

# Code from Greg Hewgill
def whatisthis(s):
    if isinstance(s, str):
        print "ordinary string"
    elif isinstance(s, unicode):
        print "unicode string"
    else:
        print "not a string"

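# Iterate over the file object directly: each iteration yields one line (one URL).
with open('urls.txt', 'r') as f:
    for url in f:
        url = url.strip()  # drop the trailing newline before using the URL
        print(url)
        whatisthis(url)
        uClient = urlopen(url)
        page_html = uClient.read()
        uClient.close()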

        page_soup = soup(page_html, "html.parser")

        containers = page_soup.findAll("tr", {"class":"data"})

        for container in containers:
            unform_name = container.findAll("th", {"width":"30%"})
            name = unform_name[0].text.strip()

            unform_delegate = container.findAll("td", {"id":"y000"})
            delegate = unform_delegate[0].text.strip()

            print(name)
            print(delegate)

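Running the code with a text file containing the URLs listed in the question prints, for each URL, the URL and its string type, followed by the name and delegate text of every matching table row.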
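
The type check in this answer depends on Python 2's 'unicode' type and print statement, so it will not run on Python 3. A rough Python 3 analogue might look like the sketch below; 'describe_string' is a hypothetical name, and the str/bytes distinction is swapped in for the original str/unicode one.

```python
def describe_string(s):
    """Report whether a value is text, raw bytes, or neither (Python 3)."""
    if isinstance(s, str):
        return "text string (str)"
    elif isinstance(s, (bytes, bytearray)):
        return "byte string (bytes)"
    else:
        return "not a string"


# Lines read from a file opened in text mode are str;
# data returned by a network read is bytes.
from_text_file = describe_string("http://example.com/a\n")
from_network = describe_string(b"<html></html>")
from_other = describe_string(42)
```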
