<p>Since you don't specify which library you're using, you could do something like the following:</p>
<pre><code>import urllib2
import re

def is_fully_alive(url, live_check=False):
    try:
        # Reject URLs with no network location (e.g. no host)
        if not urllib2.urlparse.urlparse(url).netloc:
            return False
        website = urllib2.urlopen(url)
        html = website.read()
        if website.code != 200:
            return False
        # Get all the links on the page
        for link in re.findall('"((http|ftp)s?://.*?)"', html):
            url = link[0]
            if not urllib2.urlparse.urlparse(url).netloc:
                return False
            if live_check:
                website = urllib2.urlopen(url)
                if website.code != 200:
                    print "Failed link : ", url
                    return False
    except Exception, e:
        print "Errored while attempting to validate link : ", url
        print e
        return False
    return True
</code></pre>
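<p>The function above is Python 2 (<code>urllib2</code>, <code>print</code> statements). On Python 3, <code>urllib2</code> was split into <code>urllib.request</code> and <code>urllib.parse</code>; a rough equivalent of the same logic, as a sketch rather than a drop-in replacement, might look like this:</p>

```python
import re
from urllib.parse import urlparse
from urllib.request import urlopen

def is_fully_alive(url, live_check=False):
    """Python 3 sketch of the same check: identical logic, new module names."""
    try:
        # Reject URLs with no network location (e.g. no host)
        if not urlparse(url).netloc:
            return False
        website = urlopen(url)
        # read() returns bytes in Python 3, so decode before regex matching
        html = website.read().decode("utf-8", errors="replace")
        if website.status != 200:
            return False
        # Get all the links on the page
        for link in re.findall(r'"((http|ftp)s?://.*?)"', html):
            link_url = link[0]
            if not urlparse(link_url).netloc:
                return False
            if live_check:
                if urlopen(link_url).status != 200:
                    print("Failed link:", link_url)
                    return False
    except Exception as e:
        print("Errored while attempting to validate link:", url)
        print(e)
        return False
    return True
```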
<p>Check your URL:</p>
<p>Check by opening every link on the page:</p>
<pre><code># Takes some time depending on your net speed and no. of links in the page
>>> is_fully_alive("http://www.google.com", True)
True
</code></pre>
<p>Checking an invalid URL:</p>
<pre><code>>>> is_fully_alive("//www.google.com")
Errored while attempting to validate link : //www.google.com
unknown url type: //www.google.com
False
</code></pre>
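<p>The error comes from the missing scheme: <code>urlparse</code> happily parses a scheme-relative URL like <code>//www.google.com</code> (it even extracts the host, so the <code>netloc</code> check passes), but <code>urlopen</code> cannot dispatch a request without a scheme and raises "unknown url type", which the <code>except</code> block turns into <code>False</code>. A small illustration, using Python 3's <code>urllib.parse</code> for brevity:</p>

```python
from urllib.parse import urlparse  # urllib2.urlparse.urlparse in Python 2

p = urlparse("//www.google.com")
print(repr(p.scheme))  # '' -- no http/https/ftp scheme given, so urlopen fails
print(repr(p.netloc))  # 'www.google.com' -- the netloc check alone still passes
```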