Iterating over DataFrame rows faster

Posted 2024-10-02 12:27:25


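I'm trying to iterate over the rows of a DataFrame, take data from one column of each row, and use that data to add new columns. The code is listed below, but it is very slow. Is there a way to do what I'm attempting without looping over the rows of the DataFrame?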

ctqparam = []
wwy = []
ww = []
for index, row in df.iterrows():
    date = str(row['Event_Start_Time'])
    day = int(date[8] + date[9])
    month = int(date[5] + date[6])
    total = 0
    for i in range(0, month-1):
        total += months[i]
    total += day
    out = total // 7
    ww += [out]
    wwy += [str(date[0] + date[1] + date[2] + date[3])]

    val = str(row['TPRev'])
    out = ""
    for letter in val:
        if letter != '.':
            out += letter
    df.replace(to_replace=row['TPRev'], value=str(out), inplace = True)

    val = str(row['Subtest'])
    if val in ctqparam_dict.keys():
        ctqparam += [ctqparam_dict[val]]

# add WWY column, WW column, and correct data format of Test_Tape column
df.insert(0, column='Work_Week_Year', value = wwy)
df.insert(3, column='Work_Week', value = ww)
df.insert(4, column='ctqparam', value = ctqparam)

1 answer
User
#1 · Posted 2024-10-02 12:27:25

It's hard to tell exactly what you're trying to do. However, if you are looping over rows, there is very likely a better way.

For example, given a CSV file that looks like this:

Event_Start_Time,TPRev,Subtest
4/12/19 06:00,"this. string. has dots.. in it.",{'A_Dict':'maybe?'}
6/10/19 04:27,"another stri.ng wi.th d.ots.",{'A_Dict':'aVal'}

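You probably want to:

  1. Format Event_Start_Time as a datetime
  2. Get the week number from Event_Start_Time
  3. Remove all dots (.) from the strings in the TPRev column
  4. Expand the dictionary contained in Subtest into its own column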

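Instead of looping over the rows, think in terms of operations on whole columns. It is like working out the operation for the first "cell" of a column and having it replicated down the rest.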

Code:

import pandas as pd

df = pd.read_csv('data.csv')

print(df)

     Event_Start_Time    TPRev                              Subtest
0    4/12/19 06:00       this. string. has dots.. in it.    {'A_Dict':'maybe?'}
1    6/10/19 04:27       another stri.ng wi.th d.ots.       {'A_Dict':'aVal'}


# format 'Event_Start_Time' as a datetime (the sample data is day/month/year)
df['Event_Start_Time'] = pd.to_datetime(df['Event_Start_Time'], format='%d/%m/%y %H:%M')

# get the week number from 'Event_Start_Time'
df['Week_Number'] = df['Event_Start_Time'].dt.isocalendar().week

# replace all '.' (periods) in the 'TPRev' column
df['TPRev'] = df['TPRev'].str.replace('.', '', regex=False)

# parse the dictionary string in column 'Subtest' and expand its keys into new columns
# NOTE: eval executes arbitrary code, so only use it on data you trust
df = pd.concat([df.drop(['Subtest'], axis=1), df['Subtest'].map(eval).apply(pd.Series)], axis=1)

print(df)

     Event_Start_Time      TPRev                       Week_Number    A_Dict
0    2019-12-04 06:00:00   this string has dots in it  49             maybe?
1    2019-10-06 04:27:00   another string with dots    40             aVal


print(df.info())

Data columns (total 4 columns):
 #   Column            Non-Null Count  Dtype         
---  ------            --------------  -----         
 0   Event_Start_Time  2 non-null      datetime64[ns]
 1   TPRev             2 non-null      object        
 2   Week_Number       2 non-null      UInt32        
 3   A_Dict            2 non-null      object        
dtypes: UInt32(1), datetime64[ns](1), object(2)

So you end up with a DataFrame like this:

     Event_Start_Time      TPRev                       Week_Number    A_Dict
0    2019-12-04 06:00:00   this string has dots in it  49             maybe?
1    2019-10-06 04:27:00   another string with dots    40             aVal
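The Subtest step in the code above relies on eval, which runs whatever Python the string contains. If the cells only ever hold plain dictionary literals, a minimal sketch using the standard library's ast.literal_eval might look like the following. The sample frame is hypothetical and only mirrors the two rows shown here.

```python
import ast

import pandas as pd

# Hypothetical sample mirroring the two rows in the answer
df = pd.DataFrame({
    "TPRev": ["this string has dots in it", "another string with dots"],
    "Subtest": ["{'A_Dict':'maybe?'}", "{'A_Dict':'aVal'}"],
})

# literal_eval only accepts Python literals (dicts, lists, strings, numbers, ...)
# and raises an exception for anything else, so it cannot execute code
parsed = df["Subtest"].map(ast.literal_eval)

# Build the new columns from the list of dicts, keeping the original index
expanded = pd.DataFrame(parsed.tolist(), index=df.index)

# Swap the raw string column for the expanded columns
df = pd.concat([df.drop(columns="Subtest"), expanded], axis=1)
print(df)
```

Building the frame from parsed.tolist() also tends to be faster than .apply(pd.Series) on large inputs, though that is worth measuring on your own data.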

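Obviously, you may want to do other things. Look at your data. List what you want to do to each column, or which new columns you need. Chances are it has been done before, and you only need to find the existing method.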
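Applied to the loop in the question, that per-column list could be sketched roughly as below. This is an illustration under assumptions, not the asker's exact logic: the sample values and the contents of ctqparam_dict are invented, Event_Start_Time is assumed to be parseable by pd.to_datetime, and the question's months list is assumed to hold the number of days in each month, so that its running total equals the day of the year.

```python
import pandas as pd

# Hypothetical data: column names come from the question, values are invented
df = pd.DataFrame({
    "Event_Start_Time": ["2019-04-12 06:00:00", "2019-10-06 04:27:00"],
    "TPRev": ["1.2.3", "4.5"],
    "Subtest": ["sub_a", "sub_b"],
})
# Hypothetical stand-in for the question's ctqparam_dict
ctqparam_dict = {"sub_a": "CTQ_1", "sub_b": "CTQ_2"}

# Parse the timestamps once for the whole column
ts = pd.to_datetime(df["Event_Start_Time"])

# Strip the dots from TPRev in one vectorized string operation
df["TPRev"] = df["TPRev"].astype(str).str.replace(".", "", regex=False)

# Year as a string, like the question's wwy list
df.insert(0, "Work_Week_Year", ts.dt.year.astype(str))

# Day of year floor-divided by 7, like the question's ww list
# (dayofyear accounts for leap years, which a fixed months list may not)
df.insert(3, "Work_Week", ts.dt.dayofyear // 7)

# Dictionary lookup for every row; keys missing from the dict become NaN
df.insert(4, "ctqparam", df["Subtest"].map(ctqparam_dict))

print(df)
```

One difference from the loop is worth noting: the loop skips Subtest values that are not in the dictionary, which would leave the ctqparam list shorter than the DataFrame, whereas map keeps one entry per row and marks misses as NaN.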

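For instance, you might write down "get the difference in days between the current row and the row below". Then search for how to do the formatting or calculation you need. Break the problem down.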
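As one possible sketch of that "difference in days to the row below" idea, Series.shift can line each row up with its successor so the subtraction happens on whole columns. The timestamps and the Days_To_Next column name are made up for illustration.

```python
import pandas as pd

# Hypothetical timestamps, already sorted in the order we want to compare
df = pd.DataFrame({
    "Event_Start_Time": pd.to_datetime(
        ["2019-10-06 04:27", "2019-10-09 04:27", "2019-12-04 06:00"]
    )
})

# shift(-1) aligns each row with the value from the row below it
next_start = df["Event_Start_Time"].shift(-1)

# Whole days between this row and the next; the last row has no successor,
# so its result is missing (NaN)
df["Days_To_Next"] = (next_start - df["Event_Start_Time"]).dt.days

print(df)
```

Because the last row has nothing below it, the column comes back as floats with a trailing NaN; fill or drop it depending on what the downstream step expects.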
