How to speed up a for-loop iteration in Python

Posted 2024-09-30 14:33:09


I need to add a new column that classifies the outlet temperature in each row as low, normal, or high. My DataFrame has one million rows.

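To do this, I use a for loop with multiple if statements to classify each row: low temp (below 61), normal temp (between 61 and 64, inclusive) and high temp (above 64). However, one million iterations is too many and takes a very long time. I think my PC froze, and I had to close the Spyder IDE.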

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import xlrd
import plotly.offline as py
import plotly.graph_objects as go
import warnings
warnings.filterwarnings('ignore')
def read_date(date):
    return xlrd.xldate.xldate_as_datetime(date, 0)
data1 = pd.read_csv(r'C:\Dados1_14a26_maio.txt', sep=r'\t', engine='python')
data1 = data1.drop('Descartar',axis=1)
data2 = pd.read_csv(r'C:\Dados2_14a26_maio.txt', sep=r'\t', engine='python')
data3 = pd.read_csv(r'C:\Dados3_14a26_maio.txt', sep=r'\t', engine='python')
data3 = data3.drop(['Descartar','Descartar.1','Descartar.2','Descartar.3','Descartar.4','Descartar.5','Descartar.6','Descartar.7','Descartar.8'],axis=1)
DataHora = pd.DataFrame(data1, columns=['Hora'])
DataHora['Hora'] = pd.to_datetime(DataHora['Hora'].apply(read_date), errors='coerce')
data_in = [data1.TentHT, data2.NumVentOn, data3.Tamb]
entrada = pd.concat(data_in, axis=1)
data_out = [data1.TsaidaHT]
saida = pd.concat(data_out, axis=1)
pca_matriz = pd.concat([entrada, saida], axis=1)
cond = pd.DataFrame()
Status = pd.Series([], dtype=object)  # empty Series that the loop fills in
# Classify the outlet temperature TsaidaHT one row at a time
for x in saida.index:
    if saida.TsaidaHT[x] < 61: 
        Status[x] = "Low"
    elif saida.TsaidaHT[x] >= 61 and saida.TsaidaHT[x] <= 64: 
        Status[x] = "Normal"
    elif saida.TsaidaHT[x] > 64: 
        Status[x] = "High"
cond.insert(0,"Status",Status)

Is there a way to make this loop with multiple if statements faster, so that it does not freeze my computer?
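If a loop is kept at all, one likely cause of the freeze is that `Status[x] = ...` grows an initially empty Series one label at a time. Each such enlargement makes pandas rebuild the Series, so the total work grows roughly quadratically with the number of rows, on top of the per-row label lookups in `saida.TsaidaHT[x]`. The following is a minimal sketch, assuming the same thresholds as the question, that keeps a plain Python loop but iterates over a NumPy array, collects labels in a list and builds the Series only once. The function name `classify_loop` and the sample values are hypothetical.

```python
import numpy as np
import pandas as pd


def classify_loop(temps: pd.Series) -> pd.Series:
    """Label each temperature Low (<61), Normal (61 to 64 inclusive) or High (>64).

    Still a Python loop, but labels are appended to a list and the
    Series is created once at the end instead of being enlarged per row.
    """
    labels = []
    for t in temps.to_numpy():  # plain NumPy values, no per-row label lookup
        if t < 61:
            labels.append("Low")
        elif t <= 64:  # t >= 61 is already implied by the first branch
            labels.append("Normal")
        elif t > 64:
            labels.append("High")
        else:
            # NaN fails every comparison; keep it missing instead of guessing
            labels.append(None)
    return pd.Series(labels, index=temps.index, name="Status")


# Hypothetical sample data standing in for saida.TsaidaHT
temps = pd.Series([58.0, 61.0, 63.5, 64.0, 70.2, np.nan], name="TsaidaHT")
print(classify_loop(temps).tolist())
# ['Low', 'Normal', 'Normal', 'Normal', 'High', None]
```

This is still much slower than a vectorized approach, but it avoids the repeated rebuilding of the Series.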


1 answer
User
Answer 1 · Posted 2024-09-30 14:33:09

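You can use numpy.select for this; it is much better optimized than a Python loop. Running the program below on one million dummy records took me about 2-3 seconds.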

import pandas as pd
import numpy as np
from random import randint

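# One million random integer "temperatures" as dummy data
d = {"temp":[randint(1,100) for _ in range(1000000)]}

df = pd.DataFrame(d)

# np.select uses the first matching condition for each row:
# below 61 -> "Low", above 64 -> "High", everything else (61 to 64) -> "Normal"
df["Status"] = np.select([df["temp"]<61, df["temp"]>64],
                         ["Low","High"],
                         default="Normal")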
df.tail()

Result:

        temp  Status
999995     8     Low
999996    62  Normal
999997    40     Low
999998     3     Low
999999    48     Low
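Applied to the question's own column, the same idea might look like this. One caveat with the version above: a missing reading (NaN) fails both `< 61` and `> 64`, so it would silently receive the `default="Normal"` label. This sketch assumes `TsaidaHT` may contain NaN and checks for it first; the `"Unknown"` label and the sample values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's `saida` DataFrame
saida = pd.DataFrame({"TsaidaHT": [58.0, 61.0, 63.5, 64.0, 70.2, np.nan]})
t = saida["TsaidaHT"]

# Conditions are checked in order; the first True one wins for each row.
# The NaN check comes first because NaN is neither < 61 nor > 64.
conditions = [t.isna(), t < 61, t > 64]
choices = ["Unknown", "Low", "High"]

# Rows matching none of the conditions (61 <= t <= 64) get "Normal".
saida["Status"] = np.select(conditions, choices, default="Normal")
print(saida["Status"].tolist())
# ['Low', 'Normal', 'Normal', 'Normal', 'High', 'Unknown']
```

Because the boundaries are expressed only as `< 61` and `> 64`, both 61 and 64 fall into "Normal", matching the inclusive range described in the question.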
