The apply function takes a very long time to run

RangeIndex: 32084542 entries, 0 to 32084541

df.head()
                        time  device                                kpi  value
0  2020-10-22 00:04:03+00:00  1-xxxx  chassis.routing-engine.0.cpu-idle    100
1  2020-10-22 00:04:06+00:00  2-yyyy  chassis.routing-engine.0.cpu-idle     97
2  2020-10-22 00:04:07+00:00  3-zzzz  chassis.routing-engine.0.cpu-idle    100
3  2020-10-22 00:04:10+00:00  4-dddd  chassis.routing-engine.0.cpu-idle     93
4  2020-10-22 00:04:10+00:00  5-rrrr  chassis.routing-engine.0.cpu-idle     99

def router_role(row):
    if row["device"].startswith("1"):
        row["role"] = '1'
    if row["device"].startswith("2"):
        row["role"] = '2'
    if row["device"].startswith("3"):
        row["role"] = '3'
    if row["device"].startswith("4"):
        row["role"] = '4'
    return row
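The call itself isn't shown. A minimal sketch, assuming the function is applied row-wise with df.apply(router_role, axis=1), that rebuilds the five sample rows from df.head() above. With axis=1, pandas passes each row to the function as a Series, so the Python code runs once per row (about 32 million times on the full frame). The last sample row, 5-rrrr, matches none of the prefixes, so its role comes out as NaN.

```python
import pandas as pd

# The five sample rows from df.head() above.
df = pd.DataFrame({
    "time": pd.to_datetime([
        "2020-10-22 00:04:03+00:00",
        "2020-10-22 00:04:06+00:00",
        "2020-10-22 00:04:07+00:00",
        "2020-10-22 00:04:10+00:00",
        "2020-10-22 00:04:10+00:00",
    ]),
    "device": ["1-xxxx", "2-yyyy", "3-zzzz", "4-dddd", "5-rrrr"],
    "kpi": ["chassis.routing-engine.0.cpu-idle"] * 5,
    "value": [100, 97, 100, 93, 99],
})

def router_role(row):
    # Same checks as the question's router_role, condensed into a loop.
    for prefix in ("1", "2", "3", "4"):
        if row["device"].startswith(prefix):
            row["role"] = prefix
    return row

# Assumption: the question calls it like this. Each row arrives as a Series,
# and rows that never get a "role" label end up with NaN in that column.
result = df.apply(router_role, axis=1)
print(result[["device", "role"]])
```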

3 answers

User

Answer 1 · edited 2024-09-26 18:00:13

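apply is notoriously slow: with axis=1 it calls a Python function once per row, and it does not use multiple threads (see, e.g., pandas multiprocessing apply). Use the built-in vectorized methods instead: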

>>> import pandas as pd
>>> df = pd.DataFrame([["some-data", "1-xxxx"], ["more-data", "1-yyyy"], ["other-data", "2-xxxx"]])
>>> df
            0       1
0   some-data  1-xxxx
1   more-data  1-yyyy
2  other-data  2-xxxx
>>> df["Derived Column"] = df[1].str.split("-", expand=True)[0]
>>> df
            0       1 Derived Column
0   some-data  1-xxxx              1
1   more-data  1-yyyy              1
2  other-data  2-xxxx              2

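Here I'm assuming there may be more than one digit before the hyphen (e.g. 42-aaaa), which is why the extra work of splitting the column and taking the first element of the split is needed. If you only need the first character, do what @teepee did in their answer and simply index into the string.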
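A variant that is not part of the answer: Series.str.extract with a regular expression pulls out a multi-digit prefix in one step. The pattern ^(\d+)- is an assumption about the device naming (one or more digits followed by a hyphen at the start of the string). With a single capture group and expand=False it returns a Series, and rows that don't match become NaN, whereas the split approach returns the whole string when there is no hyphen.

```python
import pandas as pd

# Sample devices, including a multi-digit prefix and one without a hyphen.
df = pd.DataFrame({"device": ["1-xxxx", "2-yyyy", "42-aaaa", "nohyphen"]})

# One capture group + expand=False -> a Series rather than a DataFrame.
# Non-matching rows (no leading digits followed by "-") get NaN.
df["role"] = df["device"].str.extract(r"^(\d+)-", expand=False)
print(df)
```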

User

Answer 2 · edited 2024-09-26 18:00:13

apply is very slow and rarely a good choice. Try this instead:

df['role'] = df['device'].str[0]
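One difference from the question's function: str[0] assigns a role to every row, while router_role only set role for devices starting with 1, 2, 3 or 4. A sketch that keeps that restriction, still without apply, masks the other rows with Series.where:

```python
import pandas as pd

df = pd.DataFrame({"device": ["1-xxxx", "2-yyyy", "3-zzzz", "4-dddd", "5-rrrr"]})

# First character of every device string, computed for the whole column at once.
first_char = df["device"].str[0]

# Keep only the prefixes router_role handled; everything else becomes NaN.
df["role"] = first_char.where(first_char.isin(["1", "2", "3", "4"]))
print(df)
```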

User

Answer 3 · edited 2024-09-26 18:00:13

You can simply convert your code to use np.vectorize().

See here: Performance of Pandas apply vs np.vectorize to create new column from existing columns
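A minimal sketch of that conversion, assuming the row-based function is rewritten to take a single device string (role_for_device is a hypothetical name). The NumPy documentation describes np.vectorize as provided for convenience rather than performance, essentially a for loop, so any speed-up over a row-wise apply comes mostly from not building a Series for each row.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"device": ["1-xxxx", "2-yyyy", "3-zzzz", "4-dddd", "5-rrrr"]})

def role_for_device(device):
    # Hypothetical per-value version of router_role: one device string in,
    # the matching prefix (or None) out.
    for prefix in ("1", "2", "3", "4"):
        if device.startswith(prefix):
            return prefix
    return None

# otypes=[object] sets the output dtype explicitly so None can be returned;
# without it, np.vectorize infers the dtype by calling the function on the first element.
role_vec = np.vectorize(role_for_device, otypes=[object])
df["role"] = role_vec(df["device"].to_numpy())
print(df)
```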
