How to select values at specific times within each date (large list of dates, times, values)

Posted 2024-06-14 05:02:29


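I have a file with the following columns: date, time, and stock value; essentially, the stock value for every minute. I want to compute the difference between the stock value at 10 a.m. and at 4 p.m. This is the code I have so far: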

fileName = "C:\\...\\US_200901_210907.csv"

with open(fileName) as f:
    for line in f.readlines()[1:]:
        split = line.split(";")
        time = split[3]
        date = split[2]
        for timev in f.readlines()[1:]:
            if timev == '100000':
                Spot = float(split[2])
            elif timev == '160000':
                Close = float(split[2])
        Diff = Spot - Close
        print(Diff)

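I'm not sure I'm doing this right. The code needs to loop through each date, find the stock values at "100000" and "160000", and compute the difference between the two, then move on to the next day. At the end of each day, it should print that day's difference.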

The line "Diff = Spot - Close" also gives me an error: "NameError: name 'Spot' is not defined".

Any help is appreciated.

The dataset looks like this (excerpt): [image]

=======================================

After doing some more work on my own, I was able to get it working:

import csv
filename = "C:\\...\\US_200901_210907.csv"
with open(filename, 'r') as f:
    reader = csv.reader(f, delimiter=';')
    next(reader, None)  # skip header
    rows = list(reader)

listOfDates = []
for row in rows:
    # column 2 holds the date; keep each date once, in file order
    if row[2] not in listOfDates:
        listOfDates.append(row[2])
print(listOfDates)

for date in listOfDates:
    # reset both prices for every day
    startPrice = None
    endPrice = None
    # iterating over the rows directly avoids a separate index counter,
    # which could run past the end of the list on the last date
    for row in rows:
        if row[2] != date:
            continue
        if row[3] == '100000':
            startPrice = float(row[7])
        elif row[3] == '160000':
            endPrice = float(row[7])
    # print only when both prices were found for this day
    if startPrice is not None and endPrice is not None:
        print(date, startPrice, endPrice, startPrice - endPrice)
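
The same per-day comparison can also be sketched as a single pass over the rows, keeping both prices per date in a dictionary. This is only a minimal sketch, not the script above: the helper name `daily_differences` and the sample rows are hypothetical, and the column positions (date in column 2, time in column 3, price in column 7) are assumptions copied from the script.

```python
from collections import defaultdict


def daily_differences(rows, date_col=2, time_col=3, price_col=7):
    """Return {date: (start, end, start - end)} for days that have both prices.

    Hypothetical helper; the default column positions are assumptions
    copied from the script above.
    """
    prices = defaultdict(dict)  # date -> {'start': float, 'end': float}
    for row in rows:
        if row[time_col] == '100000':
            prices[row[date_col]]['start'] = float(row[price_col])
        elif row[time_col] == '160000':
            prices[row[date_col]]['end'] = float(row[price_col])

    result = {}
    for date, p in prices.items():
        if 'start' in p and 'end' in p:  # skip days missing either price
            result[date] = (p['start'], p['end'], p['start'] - p['end'])
    return result


# Made-up rows, for illustration only
sample = [
    ['US', 'x', '20200901', '100000', '', '', '', '10.5'],
    ['US', 'x', '20200901', '160000', '', '', '', '11.0'],
    ['US', 'x', '20200902', '100000', '', '', '', '12.0'],
]
for date, (start, end, diff) in daily_differences(sample).items():
    print(date, start, end, diff)
```

Each row is visited once, so the run time grows linearly with the file size rather than with the number of dates times the number of rows.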

2 answers

Why not use a DataFrame for this calculation?

import pandas as pd
df = pd.read_csv("C:\\...\\US_200901_210907.csv", sep=";")  # the file is semicolon-delimited

# give appropriate column names before or after loading the data
# assuming we have the columns 'time', 'date' & 'stockvalue' in df
# might have to use pandas.to_datetime

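# combine boolean masks with & (Python has no && operator)
start = df[(df['time'] == 'time1') & (df['date'] == 'date1')]['stockvalue']
end = df[(df['time'] == 'time2') & (df['date'] == 'date1')]['stockvalue']
# to_numpy() drops the row labels; subtracting the Series directly would align on index and give NaN
print(start.to_numpy() - end.to_numpy())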

Also, why is there a nested for loop?
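
The nested loop is also the most likely cause of the NameError in the question. The outer loop's `f.readlines()` already consumes the whole file, so the second call inside the loop returns an empty list, the inner loop body never runs, and `Spot` is never assigned. A small sketch with an in-memory file (`io.StringIO` with made-up contents standing in for the CSV) shows the behavior:

```python
import io

# In-memory stand-in for the CSV file; the contents are made up
f = io.StringIO("header\n100000;10.5\n160000;11.0\n")

first = f.readlines()   # reads every line and leaves the position at the end
second = f.readlines()  # nothing left to read

print(len(first))   # 3
print(second)       # []
```

Reading the lines once into a list and working from that list avoids the problem.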

One approach, using the table you provided:

import pandas as pd
from collections import defaultdict
df = pd.read_excel("Data.xlsx", header=None, dtype='str')

out = defaultdict(lambda: defaultdict(float))
for rowindex, row in df.iterrows():
    date = row[2]
    name = row[0]
    if row[3] == "100000":
        out[name]['DATE'] = row[2]
        out[name]['START'] = float(row[4])
    if row[3] == "160000":
        out[name]['END'] = float(row[4])


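for stock, data in out.items():
    # str + float raises TypeError, so format the values with an f-string;
    # the difference is kept as a float so fractional changes are not truncated
    print(f"{stock}: DATE: {data['DATE']} START: {data['START']} END: {data['END']}  diff = {data['END'] - data['START']}")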
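
Because the dictionary above is keyed by stock name only, a later date overwrites an earlier one. One way to get a difference for every stock and every day is to reshape the two times of interest into columns. The following is a rough sketch under stated assumptions: the column names `name`, `date`, `time` and `price` and the sample values are made up, and the real file may need renaming or type conversion first.

```python
import pandas as pd

# Made-up data; the column names 'name', 'date', 'time' and 'price' are assumptions
df = pd.DataFrame({
    'name':  ['US', 'US', 'US', 'US', 'US'],
    'date':  ['20200901', '20200901', '20200901', '20200902', '20200902'],
    'time':  ['100000', '100100', '160000', '100000', '160000'],
    'price': [10.5, 10.6, 11.0, 12.0, 11.5],
})

# Keep only the two times of interest
wanted = df[df['time'].isin(['100000', '160000'])]

# One row per (stock, day), one column per time
table = wanted.set_index(['name', 'date', 'time'])['price'].unstack('time')

# Difference between the 16:00 and 10:00 prices (END - START, as in the answer above)
table['diff'] = table['160000'] - table['100000']
print(table)
```

Days that lack one of the two times end up with NaN in `diff`, which makes gaps in the data easy to spot.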
