Python: match records whose elements are identical but whose dollar amounts are within a given percentage

Posted 2024-10-02 04:22:36


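I am trying to pull anomalies out of a spreadsheet in which most, if not all, elements of a record match except for the dollar amount. For example, if columns A through C match but the dollar amounts differ by 10% or less, I want logic that highlights only those records in the dataframe. I need this for every case where it occurs, not just for one static ID. Example: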

Client ID(Numeric) Client_2nd_ID(Alphanumeric) Instrument(text)  Dollars(numeric)
12345              FA000123AB                  Baseball          600
45678              PP000157DC                  Football          800
12345              FA000123AB                  Baseball          570
12345              FA000123AB                  Baseball          645
12345              FB000159EE                  Baseball          605

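Using the example above, I would want the dataframe to show only the three records with Client ID 12345, 2nd_ID FA000123AB, Instrument Baseball, and Dollars 600, 570 and 645. As mentioned, the same should apply to any other group of similar records, not only to the IDs in this example (the ID must not be hard-coded).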


1 answer
User
#1 · Posted 2024-10-02 04:22:36

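The following code drops any record in a Client_ID/Instrument group whose Dollars value differs from the nearest other value in that group by less than the threshold (1% here). Of two close records only the higher one is dropped, because the filter requires a positive difference: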

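import pandas as pd
import numpy as np

threshold = 0.01  # relative tolerance as a fraction (0.01 = 1%)

df = pd.DataFrame({'Client_ID': [12345, 45678, 12345, 12345, 12345],
                    'Client_2nd_ID': ["FA000123AB", "PP000157DC", "FA000123AB", "FA000123AB", "FB000159EE"],
                    'Instrument': ["Baseball", "Football", "Baseball", "Baseball", "Baseball"],
                    'Dollars': [600, 800, 570, 645, 605]})

def nearest_dollars(row):
    # Dollars of every record with the same client and instrument
    same_group = df.loc[(df['Client_ID'] == row['Client_ID']) & (df['Instrument'] == row['Instrument']), 'Dollars']
    # Absolute distance to this record; zero distances (the record itself) are ignored
    diffs = (same_group - row['Dollars']).abs().replace(0, np.nan)
    # A record with no other value in its group has no neighbour, so return NaN
    return same_group.loc[diffs.idxmin()] if diffs.notna().any() else np.nan

nearest = df.apply(nearest_dollars, axis=1)

# Signed relative difference to the nearest value in the group
df['percent'] = (df['Dollars'] - nearest) / nearest

# Drop records that sit above their nearest neighbour by no more than the threshold
df = df.drop(df[(df.percent <= threshold) & (df.percent > 0)].index)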
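To make the effect of the filter concrete, the arithmetic for client 12345 can be checked by hand. The snippet below is only an illustrative sketch in plain Python, not part of the answer's code; `nearest_other`, `dropped` and `kept` are names made up for this check. It treats all four Baseball amounts as one group, because the answer groups on Client_ID and Instrument only.

```python
threshold = 0.01
dollars = [600, 570, 645, 605]  # all Baseball amounts for client 12345


def nearest_other(value, values):
    # Closest amount in the group, ignoring amounts equal to the value itself
    others = [v for v in values if v != value]
    return min(others, key=lambda v: abs(v - value))


# Signed relative difference to the nearest amount, as in the 'percent' column
percent = {d: (d - nearest_other(d, dollars)) / nearest_other(d, dollars)
           for d in dollars}

dropped = [d for d, p in percent.items() if 0 < p <= threshold]
kept = [d for d in dollars if d not in dropped]

print(dropped)  # [605]
print(kept)     # [600, 570, 645]
```

Only 605 is removed: its nearest neighbour is 600, and 5 / 600 is about 0.83%, which is positive and below the 1% threshold. The record at 600 is equally close to 605, but its difference is negative, so it stays.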


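This looks like it matches the condition for client #12345. As an additional check, I added a value of 805 for client #45678 to make sure it also works for a different client: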

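import pandas as pd
import numpy as np

threshold = 0.01  # relative tolerance as a fraction (0.01 = 1%)

df = pd.DataFrame({'Client_ID': [12345, 45678, 12345, 12345, 12345, 45678],
                    'Client_2nd_ID': ["FA000123AB", "PP000157DC", "FA000123AB", "FA000123AB", "FB000159EE", "PP000157DC"],
                    'Instrument': ["Baseball", "Football", "Baseball", "Baseball", "Baseball", "Football"],
                    'Dollars': [600, 800, 570, 645, 605, 805]})

def nearest_dollars(row):
    # Dollars of every record with the same client and instrument
    same_group = df.loc[(df['Client_ID'] == row['Client_ID']) & (df['Instrument'] == row['Instrument']), 'Dollars']
    # Absolute distance to this record; zero distances (the record itself) are ignored
    diffs = (same_group - row['Dollars']).abs().replace(0, np.nan)
    # A record with no other value in its group has no neighbour, so return NaN
    return same_group.loc[diffs.idxmin()] if diffs.notna().any() else np.nan

nearest = df.apply(nearest_dollars, axis=1)

# Signed relative difference to the nearest value in the group
df['percent'] = (df['Dollars'] - nearest) / nearest

# Drop records that sit above their nearest neighbour by no more than the threshold
df = df.drop(df[(df.percent <= threshold) & (df.percent > 0)].index)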
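If the goal is instead to keep the similar records, as the question describes, one possible alternative is to sort each group by Dollars and compare every record with its neighbours in that order. The following is a rough sketch under several assumptions that are not in the original post: groups are formed on all three identifying columns, the tolerance is the 10% from the question, the relative difference is measured against the larger of the two amounts, and all amounts are positive. The names `keys`, `tolerance` and `similar` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'Client_ID': [12345, 45678, 12345, 12345, 12345],
                   'Client_2nd_ID': ["FA000123AB", "PP000157DC", "FA000123AB",
                                     "FA000123AB", "FB000159EE"],
                   'Instrument': ["Baseball", "Football", "Baseball",
                                  "Baseball", "Baseball"],
                   'Dollars': [600, 800, 570, 645, 605]})

keys = ['Client_ID', 'Client_2nd_ID', 'Instrument']  # columns that must match exactly
tolerance = 0.10                                     # 10%, assumed from the question

# Sort so that, within each group, the nearest amounts are adjacent rows
ordered = df.sort_values(keys + ['Dollars'])
grouped = ordered.groupby(keys)['Dollars']
current = ordered['Dollars']
previous_value = grouped.shift(1)   # next lower amount in the same group (NaN if none)
next_value = grouped.shift(-1)      # next higher amount in the same group (NaN if none)

# Relative gap to each neighbour, measured against the larger amount of the pair;
# comparisons involving NaN evaluate to False, so lone records are never flagged
close_to_previous = (current - previous_value) / current <= tolerance
close_to_next = (next_value - current) / next_value <= tolerance

# Keep records with at least one neighbour within the tolerance, in original order
similar = ordered[close_to_previous | close_to_next].sort_index()
print(similar)
```

On the sample data this keeps the rows with Dollars 600, 570 and 645 and leaves out both the 605 record (different Client_2nd_ID) and the lone Football record. To highlight rather than filter, the same boolean mask could be stored as an extra column.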
