<p>I want to load a CSV file from a zip archive at a URL into a DataFrame. I followed the answer <a href="https://stackoverflow.com/questions/41218216/using-pandas-to-download-load-zipped-csv-file-from-url">here</a> and used the same solution, as shown below:</p>
<pre><code>from urllib import request
import zipfile

import pandas as pd
# link to the zip file
link = 'https://cricsheet.org/downloads/'
# the zip file is named as ipl_csv2.zip
request.urlretrieve(link, 'ipl_csv2.zip')
compressed_file = zipfile.ZipFile('ipl_csv2.zip')
# I need the csv file named all_matches.csv from ipl_csv2.zip
csv_file = compressed_file.open('all_matches.csv')
data = pd.read_csv(csv_file)
data.head()
</code></pre>
<p>But after running the code, I get the following error:</p>
<pre><code>BadZipFile Traceback (most recent call last)
&lt;ipython-input-3-7b7a01259813&gt; in &lt;module&gt;
1 link = 'https://cricsheet.org/downloads/'
2 request.urlretrieve(link, 'ipl_csv2.zip')
----> 3 compressed_file = zipfile.ZipFile('ipl_csv2.zip')
4 csv_file = compressed_file.open('all_matches.csv')
5 data = pd.read_csv(csv_file)
~\Anaconda3\lib\zipfile.py in __init__(self, file, mode, compression, allowZip64, compresslevel, strict_timestamps)
1267 try:
1268 if mode == 'r':
-> 1269 self._RealGetContents()
1270 elif mode in ('w', 'x'):
1271 # set the modified flag so central directory gets written
~\Anaconda3\lib\zipfile.py in _RealGetContents(self)
1334 raise BadZipFile("File is not a zip file")
1335 if not endrec:
-> 1336 raise BadZipFile("File is not a zip file")
1337 if self.debug > 1:
1338 print(endrec)
BadZipFile: File is not a zip file
</code></pre>
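<p>The traceback shows <code>BadZipFile</code> being raised because no zip end record was found in the saved file, so one way to narrow this down is to inspect what was actually downloaded. The following is a minimal sketch, assuming the downloaded content is available as bytes; <code>looks_like_zip</code> and <code>describe_payload</code> are hypothetical helper names, not part of any library:</p>

```python
import io
import zipfile


def looks_like_zip(payload: bytes) -> bool:
    """Return True if the bytes form something zipfile can recognise as an archive."""
    return zipfile.is_zipfile(io.BytesIO(payload))


def describe_payload(payload: bytes) -> str:
    """Give a rough label for downloaded bytes: zip archive, HTML page, or unknown."""
    if looks_like_zip(payload):
        return "zip archive"
    # A web page usually starts with a doctype or an <html> tag.
    head = payload[:200].lstrip().lower()
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "HTML page"
    return "unknown"
```

<p>For example, <code>describe_payload(open('ipl_csv2.zip', 'rb').read())</code> could be run on the file saved by <code>urlretrieve</code>. If it reports an HTML page, the URL that was requested probably points to a web page rather than to the archive itself.</p>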
<p>I'm not very familiar with handling zip files in Python, so please help me figure out what I need to correct in my code.</p>
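<p>If I open the URL <code>https://cricsheet.org/downloads/ipl_csv2.zip</code> in a web browser, the zip file is downloaded to my system automatically. Since data is added to this zip file every day, I want to access the URL and fetch the CSV file directly through Python to save storage space.</p>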
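<p>One possible direction, sketched under assumptions: request the full archive URL (the one that works in the browser) instead of the <code>/downloads/</code> page, and keep the archive in memory with <code>io.BytesIO</code> so nothing is written to disk. The helper names <code>fetch_bytes</code>, <code>extract_member</code>, and <code>load_csv_from_zip_url</code> are hypothetical, and the sketch assumes the server accepts a plain <code>urllib</code> request and that the archive really contains <code>all_matches.csv</code>:</p>

```python
import io
import zipfile
from urllib import request

# Full URL of the archive itself (taken from the question), not the downloads page.
ZIP_URL = "https://cricsheet.org/downloads/ipl_csv2.zip"


def fetch_bytes(url: str) -> bytes:
    """Download the resource at url and return its body as bytes (no file on disk)."""
    with request.urlopen(url) as response:
        return response.read()


def extract_member(zip_bytes: bytes, member: str) -> bytes:
    """Return the raw bytes of one named member of an in-memory zip archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return archive.read(member)


def load_csv_from_zip_url(url: str, member: str):
    """Download a zip archive and parse one CSV member into a pandas DataFrame."""
    # Imported here so the zip helpers above stay usable without pandas installed.
    import pandas as pd

    csv_bytes = extract_member(fetch_bytes(url), member)
    return pd.read_csv(io.BytesIO(csv_bytes))
```

<p>With these helpers, <code>data = load_csv_from_zip_url(ZIP_URL, 'all_matches.csv')</code> followed by <code>data.head()</code> would mirror the original code. The whole archive is still held in memory while it is parsed, so this saves disk space but not RAM.</p>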
<p><strong>Edit1:</strong> If you have any other code solutions, please share them.</p>