Why doesn't my link extraction work?

from BeautifulSoup import BeautifulSoup
import urllib2

url="http://www.popsci.com/"
page=urllib2.urlopen(url)
soup = BeautifulSoup(page.read())
sci=soup.findAll('a')
for eachsci in sci:
    print eachsci['href']+","+eachsci.string

Traceback (most recent call last):
  File "/root/Desktop/3.py", line 12, in <module>
    print eachsci['href']+","+eachsci.string
TypeError: coercing to Unicode: need string or buffer, NoneType found
[Finished in 1.3s with exit code 1]

1 Answer

User

#1 · Posted 2024-10-01 00:29:04

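When an a element contains no text (for example, a link that wraps only an image), eachsci.string is None, and you cannot concatenate None with a string using the + operator the way you are trying to.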

If you replace eachsci.string with eachsci.text, this error goes away: when the a element is empty, eachsci.text is the empty string '', and concatenating that with another string is no problem.
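The failure itself can be reproduced without BeautifulSoup. The sketch below is only an illustration: link_text_missing stands in for what eachsci.string holds on a text-less link, and join_href_and_text is a hypothetical helper, not part of any library. It is written to run on Python 3.

```python
# Stand-in for eachsci.string on a link with no text (an assumption for illustration).
link_text_missing = None


def join_href_and_text(href, text):
    """Join href and link text with a comma, treating None as empty text."""
    return href + "," + (text or "")


# Concatenating None directly fails, which is the error from the traceback.
try:
    "/science" + "," + link_text_missing
    concat_failed = False
except TypeError:
    concat_failed = True

print(concat_failed)
print(join_href_and_text("/science", link_text_missing))
print(join_href_and_text("/science", "Science"))
```

Falling back to an empty string has the same effect as switching to eachsci.text: the concatenation always receives a string.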

However, you will hit another problem when you reach an a element that has no href attribute: in that case eachsci['href'] raises a KeyError.

You can solve this with get(). The a element behaves like a dictionary of its attributes, so get() can return a default value when a key is not present.
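To see the difference between indexing and get(), here is a minimal sketch that uses plain dictionaries as stand-ins for the attributes of two a elements. The attribute values and the href_or_default helper are assumptions made up for the illustration; only the get() behaviour is the point.

```python
# Plain dicts standing in for the attributes of two <a> elements (illustrative values).
attrs_with_href = {"href": "http://www.popsci.com/"}
attrs_without_href = {"name": "top"}


def href_or_default(attrs, default="[no href found]"):
    """Return the href attribute, or a default when it is missing."""
    return attrs.get("href", default)


# Indexing a missing key raises KeyError; get() returns the default instead.
try:
    attrs_without_href["href"]
    index_failed = False
except KeyError:
    index_failed = True

print(index_failed)
print(href_or_default(attrs_with_href))
print(href_or_default(attrs_without_href))
```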

Putting all of this together, here is a variant of the for loop:

for eachsci in sci:
    print eachsci.get('href', '[no href found]') + "," + eachsci.text
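If you want to check both edge cases without fetching a page or installing anything, the same idea can be sketched with Python 3's standard html.parser module in place of BeautifulSoup. This is a different parser from the one in the question, and LinkCollector and SAMPLE_HTML are made-up names for the example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a> element fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []     # finished (href, text) pairs
        self._href = None   # href of the <a> currently open
        self._text = None   # list of text chunks while inside an <a>, else None

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs; a missing href gets a default.
            self._href = dict(attrs).get("href", "[no href found]")
            self._text = []

    def handle_data(self, data):
        if self._text is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._text is not None:
            # An <a> with no text yields an empty string, never None.
            self.links.append((self._href, "".join(self._text)))
            self._text = None


# Made-up sample: a normal link, a link with no text, and an anchor with no href.
SAMPLE_HTML = (
    '<a href="/science">Science</a>'
    '<a href="/img"><img src="x.png"></a>'
    '<a name="top">Top</a>'
)

collector = LinkCollector()
collector.feed(SAMPLE_HTML)
collector.close()

lines = [href + "," + text for href, text in collector.links]
for line in lines:
    print(line)
```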
