Reading multiple CSV files from a website and concatenating specific columns into one DataFrame

Published 2024-09-27 23:20:19


Using the code below, I downloaded multiple CSV files from a website:

# Import Key Modules

from urllib.parse import urljoin

from bs4 import BeautifulSoup
import requests
import re
import pandas as pd
import numpy as np
import glob, os

# Get Futures

def main(url):
    with requests.Session() as req:
        r = req.get(url)
        soup = BeautifulSoup(r.content, 'html.parser')
        # Resolve each href against the page URL instead of slicing url[:20]
        target = [urljoin(url, item['href']) for item in soup.select(
            "a[href$='VX.csv']")]
        for x in target:
            print(f"Downloading {x}")
            r = req.get(x)
            name = x.rsplit("/", 1)[-1]
            with open(name, 'wb') as f:
                f.write(r.content)


main("https://www.cboe.com/products/futures/market-data/historical-data-archive")

The code successfully downloaded the required files to my working directory:

Downloading https://www.cboe.com/Publish/ScheduledTask/MktData/datahouse/CFE_F18_VX.csv
Downloading https://www.cboe.com/Publish/ScheduledTask/MktData/datahouse/CFE_G18_VX.csv
Downloading https://www.cboe.com/Publish/ScheduledTask/MktData/datahouse/CFE_H18_VX.csv
Downloading https://www.cboe.com/Publish/ScheduledTask/MktData/datahouse/CFE_J18_VX.csv
Downloading https://www.cboe.com/Publish/ScheduledTask/MktData/datahouse/CFE_K18_VX.csv
Downloading https://www.cboe.com/Publish/ScheduledTask/MktData/datahouse/CFE_M18_VX.csv
Downloading https://www.cboe.com/Publish/ScheduledTask/MktData/datahouse/CFE_N18_VX.csv
Downloading https://www.cboe.com/Publish/ScheduledTask/MktData/datahouse/CFE_Q18_VX.csv
Downloading https://www.cboe.com/Publish/ScheduledTask/MktData/datahouse/CFE_U18_VX.csv
Downloading https://www.cboe.com/Publish/ScheduledTask/MktData/datahouse/CFE_V18_VX.csv
Downloading https://www.cboe.com/Publish/ScheduledTask/MktData/datahouse/CFE_X18_VX.csv
Downloading https://www.cboe.com/Publish/ScheduledTask/MktData/datahouse/CFE_F17_VX.csv

What I want to do is read specific columns from the multiple CSV files, in the order they were downloaded.
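
One way to keep the download order is to carry an explicit list of file paths through to the reading step, since `glob.glob` returns matches in arbitrary order. The following is only a minimal sketch under that assumption: `read_in_order` is a hypothetical helper name, not something from the code above, and `columns` stands for whichever column names the files actually contain.

```python
import os

import pandas as pd


def read_in_order(paths, columns):
    """Read `columns` from each CSV in `paths`, keeping the order of `paths`.

    Hypothetical helper: `paths` is assumed to be the list of file names
    in the order they were downloaded.
    """
    frames = []
    for fp in paths:
        # File name without directory or extension, e.g. 'CFE_F18_VX'
        contract = os.path.splitext(os.path.basename(fp))[0]
        # usecols limits parsing to the wanted columns; the trailing
        # [columns] puts them in the requested order
        frame = pd.read_csv(fp, usecols=columns)[columns]
        frames.append(frame.assign(Contract=contract))
    # pd.concat stacks the frames in list order
    return pd.concat(frames, ignore_index=True)
```

To use it, the download loop would need to append each saved `name` to a list and hand that list to `read_in_order`.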

Here is what I tried:

# Folder with Files

files = glob.glob('data/*.csv')

# Read Mutiple CSV Files from the Folder into the Dataframe, Rearrange Columns, Rename Some Columns

df = pd.concat([pd.read_csv(fp).assign(Contract=os.path.basename(fp).split('.')[0]) for fp in files])
df = df[['Trade Date', 'Contract', 'Futures', 'Open', 'High', 'Low', 
         'Close', 'Settle', 'Change', 'Total Volume', 'EFP', 'Open Interest']]
df = df.rename(columns={'Trade Date':'Date', 
                        'Total Volume':'Volume', 
                        'Open Interest':'Open_Interest'})
df['Date'] = pd.to_datetime(df['Date'])

# Remove Unwanted Variables from Ticker Names and Save the File

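# Note: str.lstrip('CFE_') / str.rstrip('_VX') strip character sets, not a prefix/suffix,
# so 'CFE_F18_VX' would become '18'. Anchored regex removal keeps 'F18'.
df['Contract'] = df['Contract'].str.replace(r'^CFE_|_VX$', '', regex=True)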
df.to_csv("vxdata.csv")
df.head()

It gives the following error:

---------------------------------------------------------------------------
ParserError                               Traceback (most recent call last)
<ipython-input-3-1e010a7d272a> in <module>
      1 # Read Mutiple CSV Files from the Folder into the Dataframe, Rearrange Columns, Rename Some Columns
      2 
----> 3 df = pd.concat([pd.read_csv(fp).assign(Contract=os.path.basename(fp).split('.')[0]) for fp in files])
      4 df = df[['Trade Date', 'Contract', 'Futures', 'Open', 'High', 'Low', 
      5          'Close', 'Settle', 'Change', 'Total Volume', 'EFP', 'Open Interest']]

<ipython-input-3-1e010a7d272a> in <listcomp>(.0)
      1 # Read Mutiple CSV Files from the Folder into the Dataframe, Rearrange Columns, Rename Some Columns
      2 
----> 3 df = pd.concat([pd.read_csv(fp).assign(Contract=os.path.basename(fp).split('.')[0]) for fp in files])
      4 df = df[['Trade Date', 'Contract', 'Futures', 'Open', 'High', 'Low', 
      5          'Close', 'Settle', 'Change', 'Total Volume', 'EFP', 'Open Interest']]

Does anyone know what I might be doing wrong and, most importantly, how I can correct it?
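
The traceback above is cut off before its final message, so the exact cause is not visible. A `ParserError` from `pd.read_csv` usually means that some line has more fields than the parser expected, which can happen when a file starts with an extra text line above the real header row. The sketch below assumes that this is the problem and that the header row starts with `Trade Date` (the first column name used in the code above). The helper names `find_header_row` and `read_futures_csv` are hypothetical.

```python
import io

import pandas as pd


def find_header_row(text, marker="Trade Date"):
    """Return the 0-based index of the first line that starts with `marker`."""
    for i, line in enumerate(text.splitlines()):
        if line.startswith(marker):
            return i
    raise ValueError(f"no line starting with {marker!r}")


def read_futures_csv(text, marker="Trade Date"):
    """Parse CSV text, skipping any preamble lines above the header row.

    Assumption: the real header is the first line beginning with `marker`.
    """
    # skiprows=n drops the first n lines, so the header row is parsed first
    return pd.read_csv(io.StringIO(text), skiprows=find_header_row(text, marker))
```

For a file on disk, the text could be read with `open(fp).read()` and passed to `read_futures_csv`. Reading the files one at a time inside a `try`/`except pd.errors.ParserError` block would also show which file triggers the error.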

