How do I use pandas to check daily tick data within a monthly tick-data CSV?

Posted 2024-06-28 15:04:35


I have monthly tick data to analyze, which looks like this:

Time (UTC),Ask,Bid,AskVolume,BidVolume
2007.04.01 21:00:47.593,95.203,95.159,19.1,8.8
2007.04.01 21:00:47.968,95.174,95.124,23.9,9.2
2007.04.01 21:01:02.695,95.132,95.092,4,4
2007.04.01 21:01:05.934,95.154,95.104,11.2,4
2007.04.01 21:01:18.430,95.171,95.131,12,5.2
2007.04.01 21:01:19.957,95.188,95.153,8,9.2
2007.04.01 21:01:56.308,95.208,95.148,22.3,4
2007.04.01 21:01:57.233,95.192,95.152,7.2,9.2
2007.04.01 21:01:57.443,95.188,95.143,7.2,9.2
2007.04.01 21:01:59.691,95.184,95.139,7.2,9.2
2007.04.01 21:01:59.934,95.181,95.141,8,3.9
2007.04.01 21:02:10.569,95.171,95.136,11.9,4
2007.04.01 21:02:20.708,95.166,95.126,11.2,8.8
2007.04.01 21:02:35.211,95.17,95.135,21.5,4
2007.04.01 21:02:39.946,95.196,95.156,7.2,8.8
2007.04.01 21:02:40.206,95.224,95.164,0.8,0.8
2007.04.01 21:02:43.600,95.222,95.177,8,9.2
2007.04.01 21:02:54.578,95.216,95.186,25.5,5.2
2007.04.01 21:03:04.811,95.23,95.18,7.9,7.9

and so on until the last day of the month.

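I need to find the days on which the Ask price's percentage change for that day, (max - min) / max, is greater than 0.05. My approach is to split the data day by day, compute the percentage change, check whether the price moved by more than 5% that day, and if it did, return that day. I am still new to pandas; here is what I have so far: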

import pandas as pd

df = pd.read_csv('AUDJPY_Ticks_2007.04.01_2007.04.30.csv')
percentChange = ((df['Ask'].max() - df['Ask'].min()) / df['Ask'].max()) >= 0.05
print(percentChange)

This only gives me the percentage change for the whole month, not for each day.


1 answer
User
#1 · Posted 2024-06-28 15:04:35

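A solution using resample and transform:


Data:

I modified your sample data so that the test case spans several days and at least one day has a change greater than 0.05 (that is, 5%). It also gives the next person who reads this a copy-paste example.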

import pandas as pd
from io import StringIO

test_data = StringIO("""Time (UTC),Ask,Bid,AskVolume,BidVolume
2007.04.01 21:00:47.593,95.203,95.159,19.1,8.8
2007.04.01 21:00:47.968,95.174,95.124,23.9,9.2
2007.04.01 21:01:02.695,95.132,95.092,4,4
2007.04.01 21:01:05.934,95.154,95.104,11.2,4
2007.04.02 21:01:18.430,95.171,95.131,12,5.2
2007.04.02 21:01:19.957,95.188,95.153,8,9.2
2007.04.02 21:01:56.308,95.208,95.148,22.3,4
2007.04.02 21:01:57.233,95.192,95.152,7.2,9.2
2007.04.03 21:01:57.443,91.188,95.143,7.2,9.2
2007.04.03 21:01:59.691,97.684,95.139,7.2,9.2
2007.04.03 21:01:59.934,95.181,95.141,8,3.9
2007.04.03 21:02:10.569,95.171,95.136,11.9,4
2007.04.04 21:02:20.708,95.166,95.126,11.2,8.8
2007.04.04 21:02:35.211,95.17,95.135,21.5,4
2007.04.04 21:02:39.946,95.196,95.156,7.2,8.8
2007.04.04 21:02:40.206,95.224,95.164,0.8,0.8
2007.04.05 21:02:43.600,95.222,95.177,8,9.2
2007.04.05 21:02:54.578,95.216,95.186,25.5,5.2
2007.04.05 21:03:04.811,95.23,95.18,7.9,7.9""")

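df = pd.read_csv(test_data)
# Parse the dotted timestamps with an explicit format instead of relying on inference
df["Time (UTC)"] = pd.to_datetime(df["Time (UTC)"], format="%Y.%m.%d %H:%M:%S.%f")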


Set the datetime column as the index:

df.set_index("Time (UTC)", drop=True, inplace=True)


Resample and transform:

daily_ask = df.resample("D")["Ask"]
df["daily_ask_min"] = daily_ask.transform("min")
df["daily_ask_max"] = daily_ask.transform("max")
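
To make the role of transform concrete, here is a minimal sketch on a hypothetical four-tick series (the `ask`, `per_day` and `per_tick` names are illustrative only and not part of the answer's code). Aggregating a resampled series returns one value per day, while transform repeats each day's value on every original tick, which is why the result can be assigned back as a column of df.

```python
import pandas as pd

# Hypothetical mini-series: two ticks on each of two days
idx = pd.to_datetime([
    "2007-04-02 21:01:18", "2007-04-02 21:01:19",
    "2007-04-03 21:01:57", "2007-04-03 21:01:59",
])
ask = pd.Series([95.171, 95.188, 91.188, 97.684], index=idx, name="Ask")

# Aggregation: one row per day
per_day = ask.resample("D").max()

# Transform: the day's max repeated on every tick, same index as the input
per_tick = ask.resample("D").transform("max")

print(len(per_day), len(per_tick))  # 2 4
```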


Compute the daily percentage change:

df["daily_ask_change"] = (df["daily_ask_max"] - df["daily_ask_min"]) / df["daily_ask_max"]


Find changes greater than 0.05 (5%):

df.loc[df["daily_ask_change"] > 0.05, "daily_ask_change"]

# Time (UTC)
# 2007-04-03 21:01:57.443    0.0665
# 2007-04-03 21:01:59.691    0.0665
# 2007-04-03 21:01:59.934    0.0665
# 2007-04-03 21:02:10.569    0.0665
# Name: daily_ask_change, dtype: float64
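
As a quick sanity check on the 0.0665 figure, the value can be reproduced by hand from the two extreme Ask prices that the modified sample data contains for 2007-04-03 (97.684 and 91.188). The variable names below are illustrative only.

```python
# Highest and lowest Ask on 2007-04-03 in the sample data above
day_max = 97.684
day_min = 91.188

# Same formula as in the question: (max - min) / max
change = (day_max - day_min) / day_max

print(round(change, 4))  # 0.0665
print(change > 0.05)     # True
```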


df.loc[df["daily_ask_change"] > 0.05, "daily_ask_change"].resample("D").mean()

# Time (UTC)
# 2007-04-03    0.0665
# Freq: D, Name: daily_ask_change, dtype: float64
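
If only the list of qualifying days is needed, a more compact route may be to aggregate once per calendar day instead of adding per-tick columns. The following is a sketch under assumptions, not the method used above: it swaps resample/transform for a groupby on the date part of the timestamp, and the `ticks`, `daily` and `big_move_days` names and the shortened sample are hypothetical.

```python
import pandas as pd
from io import StringIO

# Shortened hypothetical sample in the same layout as the question's file
sample = StringIO("""Time (UTC),Ask,Bid,AskVolume,BidVolume
2007.04.02 21:01:18.430,95.171,95.131,12,5.2
2007.04.02 21:01:19.957,95.188,95.153,8,9.2
2007.04.03 21:01:57.443,91.188,95.143,7.2,9.2
2007.04.03 21:01:59.691,97.684,95.139,7.2,9.2
2007.04.04 21:02:20.708,95.166,95.126,11.2,8.8
2007.04.04 21:02:35.211,95.17,95.135,21.5,4
""")

ticks = pd.read_csv(sample)
ticks["Time (UTC)"] = pd.to_datetime(
    ticks["Time (UTC)"], format="%Y.%m.%d %H:%M:%S.%f"
)

# One row per calendar day holding that day's lowest and highest Ask
daily = ticks.groupby(ticks["Time (UTC)"].dt.date)["Ask"].agg(["min", "max"])
daily["change"] = (daily["max"] - daily["min"]) / daily["max"]

# Keep only the days whose intraday range exceeds the 0.05 threshold
big_move_days = daily[daily["change"] > 0.05]
print(big_move_days)
```

Because the grouping only produces days that actually contain ticks, days without data do not show up as NaN rows, which can happen when resample("D") is applied to a filtered result that spans non-adjacent days.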
