Using the code below, I downloaded several CSV files from a website:
# Import Key Modules
from bs4 import BeautifulSoup
import requests
import re
import pandas as pd
import numpy as np
import glob, os

# Get Futures
def main(url):
    with requests.Session() as req:
        r = req.get(url)
        soup = BeautifulSoup(r.content, 'html.parser')
        # url[:20] is the site root, "https://www.cboe.com"; each href is appended to it
        target = [f"{url[:20]}{item['href']}" for item in soup.select(
            "a[href$='VX.csv']")]
        for x in target:
            print(f"Downloading {x}")
            r = req.get(x)
            # Save under the last path component, e.g. CFE_F18_VX.csv
            name = x.rsplit("/", 1)[-1]
            with open(name, 'wb') as f:
                f.write(r.content)

main("https://www.cboe.com/products/futures/market-data/historical-data-archive")
The code successfully downloaded the required files to my working directory:
Downloading https://www.cboe.com/Publish/ScheduledTask/MktData/datahouse/CFE_F18_VX.csv
Downloading https://www.cboe.com/Publish/ScheduledTask/MktData/datahouse/CFE_G18_VX.csv
Downloading https://www.cboe.com/Publish/ScheduledTask/MktData/datahouse/CFE_H18_VX.csv
Downloading https://www.cboe.com/Publish/ScheduledTask/MktData/datahouse/CFE_J18_VX.csv
Downloading https://www.cboe.com/Publish/ScheduledTask/MktData/datahouse/CFE_K18_VX.csv
Downloading https://www.cboe.com/Publish/ScheduledTask/MktData/datahouse/CFE_M18_VX.csv
Downloading https://www.cboe.com/Publish/ScheduledTask/MktData/datahouse/CFE_N18_VX.csv
Downloading https://www.cboe.com/Publish/ScheduledTask/MktData/datahouse/CFE_Q18_VX.csv
Downloading https://www.cboe.com/Publish/ScheduledTask/MktData/datahouse/CFE_U18_VX.csv
Downloading https://www.cboe.com/Publish/ScheduledTask/MktData/datahouse/CFE_V18_VX.csv
Downloading https://www.cboe.com/Publish/ScheduledTask/MktData/datahouse/CFE_X18_VX.csv
Downloading https://www.cboe.com/Publish/ScheduledTask/MktData/datahouse/CFE_F17_VX.csv
What I want to do is read specific columns from these CSV files, in the order in which they were downloaded.
Here is what I tried:
# Folder with Files
files = glob.glob('data/*.csv')
# Read Multiple CSV Files from the Folder into the Dataframe, Rearrange Columns, Rename Some Columns
df = pd.concat([pd.read_csv(fp).assign(Contract=os.path.basename(fp).split('.')[0]) for fp in files])
df = df[['Trade Date', 'Contract', 'Futures', 'Open', 'High', 'Low',
'Close', 'Settle', 'Change', 'Total Volume', 'EFP', 'Open Interest']]
df = df.rename(columns={'Trade Date':'Date',
'Total Volume':'Volume',
'Open Interest':'Open_Interest'})
df['Date'] = pd.to_datetime(df['Date'])
# Remove Unwanted Variables from Ticker Names and Save the File
df['Contract'] = df['Contract'].map(lambda x: x.lstrip('CFE_').rstrip('_VX'))
df.to_csv("vxdata.csv")
df.head()
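A side note on the `Contract` cleanup in the attempt above, separate from the error reported below: `str.lstrip` and `str.rstrip` remove any of the given characters, not a literal prefix or suffix, so `'CFE_F18_VX'.lstrip('CFE_').rstrip('_VX')` comes out as `'18'` rather than `'F18'`. A minimal sketch of a prefix/suffix-based alternative follows; the helper name `contract_code` is made up for illustration.

```python
def contract_code(stem, prefix="CFE_", suffix="_VX"):
    """Strip a literal prefix and suffix from a file stem such as 'CFE_F18_VX'.

    `contract_code` is a hypothetical helper, not part of the original code.
    """
    if stem.startswith(prefix):
        stem = stem[len(prefix):]
    if stem.endswith(suffix):
        stem = stem[:-len(suffix)]
    return stem


# Character-set stripping eats the leading 'F' of the contract code
print("CFE_F18_VX".lstrip("CFE_").rstrip("_VX"))
# Literal prefix/suffix removal keeps it
print(contract_code("CFE_F18_VX"))
```

With a helper like this, the `map` call could become `df['Contract'].map(contract_code)`.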
The attempt above gives the following error:
---------------------------------------------------------------------------
ParserError Traceback (most recent call last)
<ipython-input-3-1e010a7d272a> in <module>
1 # Read Mutiple CSV Files from the Folder into the Dataframe, Rearrange Columns, Rename Some Columns
2
----> 3 df = pd.concat([pd.read_csv(fp).assign(Contract=os.path.basename(fp).split('.')[0]) for fp in files])
4 df = df[['Trade Date', 'Contract', 'Futures', 'Open', 'High', 'Low',
5 'Close', 'Settle', 'Change', 'Total Volume', 'EFP', 'Open Interest']]
<ipython-input-3-1e010a7d272a> in <listcomp>(.0)
1 # Read Mutiple CSV Files from the Folder into the Dataframe, Rearrange Columns, Rename Some Columns
2
----> 3 df = pd.concat([pd.read_csv(fp).assign(Contract=os.path.basename(fp).split('.')[0]) for fp in files])
4 df = df[['Trade Date', 'Contract', 'Futures', 'Open', 'High', 'Low',
5 'Close', 'Settle', 'Change', 'Total Volume', 'EFP', 'Open Interest']]
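The traceback is cut off before the final message, so the exact cause is not visible here. One common reason for a pandas `ParserError` is a row with a different number of fields than the parser expects, for example an extra text line above the real header. Whether that applies to these files is an assumption that would need checking. A rough sketch of such a check, using only the standard library (the function name `odd_rows` is made up for illustration):

```python
import csv
from collections import Counter


def odd_rows(path):
    """Return (row_number, field_count) for rows whose field count
    differs from the most common field count in the file.

    `odd_rows` is a hypothetical helper for inspecting a CSV by hand.
    """
    with open(path, newline="") as f:
        counts = [(n, len(row)) for n, row in enumerate(csv.reader(f), start=1)]
    if not counts:
        return []
    usual = Counter(c for _, c in counts).most_common(1)[0][0]
    return [(n, c) for n, c in counts if c != usual]
```

Running `odd_rows` on each downloaded file and printing the non-empty results would show which files, if any, contain irregular rows.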
Does anyone know what I might be doing wrong and, most importantly, how I can fix it?
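Two things may be worth checking. First, the download code writes into the working directory, while the attempt reads from `data/*.csv`, and `glob.glob` does not promise any particular order. Second, if some files carry an extra line above the header (an assumption, since the error message is truncated), `pd.read_csv` can fail on them. The sketch below reads a chosen set of columns from an explicit list of paths, in the order given, and skips any lines before the header. The names `find_header_row`, `load_in_order`, and `HEADER_MARKER` are made up for illustration, and the sketch assumes the real header row starts with `Trade Date`.

```python
import os

import pandas as pd

# Assumption: the real header row starts with this column name
HEADER_MARKER = "Trade Date"


def find_header_row(path, marker=HEADER_MARKER):
    """Return the 0-based index of the first line that starts with `marker`."""
    with open(path, newline="") as f:
        for i, line in enumerate(f):
            if line.lstrip().startswith(marker):
                return i
    raise ValueError(f"no line starting with {marker!r} in {path}")


def load_in_order(paths, columns):
    """Read `columns` from each file and concatenate in the order of `paths`."""
    frames = []
    for path in paths:
        # Skip everything above the header, then keep only the wanted columns
        frame = pd.read_csv(path, skiprows=find_header_row(path), usecols=columns)
        # Tag each row with the file stem, e.g. 'CFE_F18_VX'
        stem = os.path.basename(path).split(".")[0]
        frames.append(frame[columns].assign(Contract=stem))
    return pd.concat(frames, ignore_index=True)
```

To keep the download order, the paths could be collected in a list inside the download loop (appending `name` after each file is written) and that list passed as `paths`, instead of relying on `glob.glob`.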