擅长:python、mysql、java
<p>上面的代码对我不起作用,在我的例子中,这是因为页面通过一个脚本组装它们的链接,而不是包含在代码中。当我遇到这个问题时,我使用了以下代码,这只是一个刮刀:</p>
<pre><code>import os
import re
import urllib
import urllib2
from bs4 import BeautifulSoup
url = ''
connection = urllib2.urlopen(url)
soup = BeautifulSoup(connection) #Everything the same up to here
regex = '(.+?).zip' #Here we insert the pattern we are looking for
pattern = re.compile(regex)
link = re.findall(pattern,str(soup)) #This finds all the .zip (.exe) in the text
x=0
for i in link:
link[x]=i.split(' ')[len(i.split(' '))-1]
# When it finds all the .zip, it usually comes back with a lot of undesirable
# text, luckily the file name is almost always separated by a space from the
# rest of the text which is why we do the split
x+=1
os.chdir("F:\Documents")
# This is the filepath where I want to save everything I download
for i in link:
urllib.urlretrieve(url,filename=i+".zip") # Remember that the text we found doesn't include the .zip (or .exe in your case) so we want to reestablish that.
</code></pre>
<p>这并不如前面的答案中的代码有效,但它几乎适用于大多数网站。在</p>