<p>I have this code, which walks through a txt file of URLs and searches each page for files to download:</p>
<pre><code>import csv
import os
import urlparse
from re import compile
from urllib import urlopen, urlretrieve
from bs4 import BeautifulSoup as bs

URLS = open("urlfile.txt").readlines()

def downloader():
    with open('data.csv', 'w') as csvfile:
        writer = csv.writer(csvfile)
        for url in downloadtools.URLS:
            try:
                html_data = urlopen(url)
            except:
                print 'Error opening URL: ' + url
                pass
            # Creates a BS object out of the open URL.
            soup = bs(html_data)
            # Parsing the URL for later use
            urlinfo = urlparse.urlparse(url)
            domain = urlparse.urlunparse((urlinfo.scheme, urlinfo.netloc, '', '', '', ''))
            path = urlinfo.path.rsplit('/', 1)[0]
            FILETYPE = ['\.pdf$', '\.ppt$', '\.pptx$', '\.doc$', '\.docx$', '\.xls$', '\.xlsx$', '\.wmv$', '\.mp4$', '\.mp3$']
            # Loop iterates through the list of file types for the open URL.
            for types in FILETYPE:
                for link in soup.findAll(href=compile(types)):
                    urlfile = link.get('href')
                    filename = urlfile.split('/')[-1]
                    # Bump a _N suffix until the name no longer collides.
                    while os.path.exists(filename):
                        try:
                            fileprefix = filename.split('_')[0]
                            filetype = filename.split('.')[-1]
                            num = int(filename.split('.')[0].split('_')[1])
                            filename = fileprefix + '_' + str(num + 1) + '.' + filetype
                        except:
                            filetype = filename.split('.')[1]
                            fileprefix = filename.split('.')[0] + '_' + str(1)
                            filename = fileprefix + '.' + filetype
                    # Creates a full URL if needed.
                    if '://' not in urlfile and not urlfile.startswith('//'):
                        if not urlfile.startswith('/'):
                            urlfile = urlparse.urljoin(path, urlfile)
                        urlfile = urlparse.urljoin(domain, urlfile)
                    # Downloads the urlfile or records an error for manual inspection.
                    # Percentage is a reporthook callback defined elsewhere.
                    try:
                        urlretrieve(urlfile, filename, Percentage)
                        writer.writerow(['SUCCESS', url, urlfile, filename])
                        print " SUCCESS"
                    except:
                        print " ERROR"
                        writer.writerow(['ERROR', url, urlfile, filename])
</code></pre>
</code></pre>
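<p>For reference, the filename-deduplication loop in the middle of the script can be isolated into a standalone function. This is a minimal sketch: <code>next_free_name</code> is a hypothetical helper name, and it checks membership in an in-memory set instead of calling <code>os.path.exists</code>, so it can be exercised without touching the disk:</p>
<pre><code>def next_free_name(filename, existing):
    # Mirrors the while-loop in the script: keep bumping a _N
    # suffix until the name no longer collides with an existing one.
    while filename in existing:
        try:
            prefix = filename.split('_')[0]
            ext = filename.split('.')[-1]
            num = int(filename.split('.')[0].split('_')[1])
            filename = prefix + '_' + str(num + 1) + '.' + ext
        except (IndexError, ValueError):
            # No _N suffix yet: start at _1.
            ext = filename.split('.')[1]
            filename = filename.split('.')[0] + '_1.' + ext
    return filename

print(next_free_name('report.pdf', {'report.pdf'}))
print(next_free_name('report.pdf', {'report.pdf', 'report_1.pdf'}))
</code></pre>
<p>Note this logic (like the original) assumes the base filename contains no underscores of its own; a name like <code>my_report.pdf</code> would confuse the <code>split('_')</code> calls.</p>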
<p>Everything works except that no data gets written to the CSV. No directories are being changed (at least none that I know of...).</p>
<p>The script walks through the external list of URLs, finds the files, downloads them correctly, and prints "SUCCESS" or "ERROR" along the way without any problem. The only thing it does not do is write the data to the CSV file; it runs to completion without writing a single row.</p>
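<p>As a sanity check, independent of the script above, <code>csv.writer</code> reliably commits its rows once the <code>with</code> block closes the file, so the writer itself is an unlikely culprit. One thing worth verifying is the process's working directory at write time, since a relative path like <code>'data.csv'</code> lands wherever the process currently is. A minimal sketch that writes rows the same way the script does, then reads them back from an explicit absolute path:</p>
<pre><code>import csv
import os
import tempfile

# An absolute path, so a directory change elsewhere cannot redirect the file.
path = os.path.join(tempfile.mkdtemp(), 'data.csv')

with open(path, 'w') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(['SUCCESS', 'http://example.com', 'a.pdf', 'a.pdf'])
    writer.writerow(['ERROR', 'http://example.com', 'b.pdf', 'b.pdf'])

# Read the rows back to confirm they reached the disk.
with open(path) as csvfile:
    rows = [r for r in csv.reader(csvfile) if r]
print(rows)
</code></pre>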
<p>I tried running it in a virtualenv to make sure there wasn't some strange package issue.</p>
<p>Is there something wrong with my nested loops that is preventing the CSV data from being written?</p>