Read a zipped Stata file from a URL into pandas

Posted 2024-09-22 16:36:40


Is it possible to read a .zip file that contains only a .dta file directly from a URL?

For example, https://www.federalreserve.gov/econres/files/scfp2016s.zip contains a single file, rscfp2016.dta, but pd.read_stata does not work on it:

import pandas as pd
pd.read_stata('https://www.federalreserve.gov/econres/files/scfp2016s.zip')

ValueError: Version of given Stata file is not 104, 105, 108, 111 (Stata 7SE), 113 (Stata 8/9), 114 (Stata 10/11), 115 (Stata 12), 117 (Stata 13), or 118 (Stata 14)

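pd.read_csv can read compressed files through its compression parameter, which by default infers the compression type from the file extension (for a .zip this works as long as the archive contains a single CSV file). read_stata has no such option.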
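For comparison, here is a small offline sketch of the read_csv behaviour described above: it writes a zip archive holding one CSV into a temporary directory and lets pd.read_csv infer the compression from the .zip suffix. The file names data.zip and data.csv are made up for the example.

```python
import tempfile
import zipfile
from pathlib import Path

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    zip_path = Path(tmp) / "data.zip"  # hypothetical file name
    # Build a zip archive that holds a single CSV member.
    with zipfile.ZipFile(zip_path, "w") as archive:
        archive.writestr("data.csv", "a,b\n1,2\n3,4\n")
    # compression="infer" (the default) recognizes the .zip suffix.
    csv_df = pd.read_csv(zip_path)

print(csv_df)
```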

I can download and unzip the file and then read it, but that is messy:

!wget https://www.federalreserve.gov/econres/files/scfp2016s.zip
!unzip scfp2016s.zip
df = pd.read_stata('rscfp2016.dta')
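If you do want the extracted file on disk, the same steps can be done in plain Python instead of the !wget and !unzip shell escapes, which only work inside IPython/Jupyter. This is a sketch using urllib.request.urlretrieve and zipfile; the helper names extract_and_read and download_extract_and_read are hypothetical, and the code simply takes the first .dta member it finds in the archive.

```python
import tempfile
import zipfile
from pathlib import Path
from urllib.request import urlretrieve

import pandas as pd


def extract_and_read(zip_path, dest_dir) -> pd.DataFrame:
    """Extract the first .dta member of zip_path into dest_dir and read it."""
    with zipfile.ZipFile(zip_path) as archive:
        dta_names = [n for n in archive.namelist() if n.lower().endswith(".dta")]
        if not dta_names:
            raise ValueError(f"no .dta file in {zip_path}")
        # extract() creates dest_dir if needed and returns the extracted path.
        extracted = archive.extract(dta_names[0], path=dest_dir)
    return pd.read_stata(extracted)


def download_extract_and_read(url: str) -> pd.DataFrame:
    """Python equivalent of the !wget / !unzip steps, using a temp directory."""
    with tempfile.TemporaryDirectory() as tmp:
        zip_path = Path(tmp) / "archive.zip"
        urlretrieve(url, zip_path)  # download to disk, like wget
        # The DataFrame is fully read before the temp directory is deleted.
        return extract_and_read(zip_path, tmp)
```

The temporary directory keeps the working directory clean, which addresses part of the mess, but the file still round-trips through disk.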

Is there a better way?


2 Answers

read_stata accepts file-like objects, so you can do the following:

import pandas as pd
from io import BytesIO
from zipfile import ZipFile
from urllib.request import urlopen

url = 'https://www.federalreserve.gov/econres/files/scfp2016s.zip'
with urlopen(url) as request:
    data = BytesIO(request.read())

with ZipFile(data) as archive:
    with archive.open(archive.namelist()[0]) as stata:
        df = pd.read_stata(stata)
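A variant of the same idea, sketched as reusable helpers: it selects the member by its .dta extension instead of assuming it is namelist()[0], and hands read_stata an in-memory BytesIO of that member. The function names read_stata_from_zip and read_zipped_stata_url are hypothetical.

```python
import io
import zipfile
from urllib.request import urlopen

import pandas as pd


def read_stata_from_zip(data: bytes) -> pd.DataFrame:
    """Read the single .dta member of a zip archive held in memory."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        # Select by extension rather than trusting namelist()[0].
        dta_names = [n for n in archive.namelist() if n.lower().endswith(".dta")]
        if len(dta_names) != 1:
            raise ValueError(f"expected exactly one .dta member, found {dta_names}")
        stata_bytes = archive.read(dta_names[0])
    return pd.read_stata(io.BytesIO(stata_bytes))


def read_zipped_stata_url(url: str) -> pd.DataFrame:
    """Download a zip archive into memory and read its .dta member."""
    with urlopen(url) as response:
        return read_stata_from_zip(response.read())
```

For example, read_zipped_stata_url('https://www.federalreserve.gov/econres/files/scfp2016s.zip') should return the contents of rscfp2016.dta, assuming the archive still has that layout. Splitting download and parsing also lets you test the parsing step without network access.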

Alternatively, you can use requests:

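import io
import zipfile

import pandas as pd
import requests

# Download the whole archive into memory and fail loudly on HTTP errors.
response = requests.get('https://www.federalreserve.gov/econres/files/scfp2016s.zip')
response.raise_for_status()
archive = zipfile.ZipFile(io.BytesIO(response.content))
# Raw bytes of the first (and only) member, rscfp2016.dta.
stata_bytes = archive.read(archive.namelist()[0])
df = pd.read_stata(io.BytesIO(stata_bytes))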
