Create a column based on the date order of another column

Posted 2024-09-27 07:28:42


I have the following DataFrame:

   account_id contract_id date_activated
0           1         AAA     2021-01-05
1           1         ADS     2020-12-12
2           1        ADGD     2021-02-03
3           2         HHA     2021-03-05
4           2        HAKD     2021-03-06
5           3       HADSA     2021-05-01

I would like to get the following result:

   account_id contract_id date_activated Renewal Order
0           1         ADS     2020-12-12      Original
1           1         AAA     2021-01-05           1st
2           1        ADGD     2021-02-03           2nd
3           2         HHA     2021-03-05      Original
4           2        HAKD     2021-03-06           1st
5           3       HADSA     2021-05-01      Original

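The column I want to create is "Renewal Order". Each account can have multiple contracts. The label depends on the account (account_id) and on the order in which that account's contracts were activated (date_activated). The first contract is labeled "Original", and each later contract is labeled "1st", "2nd", and so on.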

Here is the original DataFrame as a dictionary:

{'account_id': {0: 1, 1: 1, 2: 1, 3: 2, 4: 2, 5: 3},
 'contract_id': {0: 'AAA',
  1: 'ADS',
  2: 'ADGD',
  3: 'HHA',
  4: 'HAKD',
  5: 'HADSA'},
 'date_activated': {0: Timestamp('2021-01-05 00:00:00'),
  1: Timestamp('2020-12-12 00:00:00'),
  2: Timestamp('2021-02-03 00:00:00'),
  3: Timestamp('2021-03-05 00:00:00'),
  4: Timestamp('2021-03-06 00:00:00'),
  5: Timestamp('2021-05-01 00:00:00')}}
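The dictionary above looks like the output of df.to_dict(); to evaluate it as a Python literal, Timestamp has to be imported from pandas first. A minimal sketch for rebuilding the input frame (the variable name data is just for illustration):

```python
import pandas as pd
from pandas import Timestamp

# Dict as posted in the question (looks like the output of df.to_dict())
data = {'account_id': {0: 1, 1: 1, 2: 1, 3: 2, 4: 2, 5: 3},
        'contract_id': {0: 'AAA', 1: 'ADS', 2: 'ADGD', 3: 'HHA', 4: 'HAKD', 5: 'HADSA'},
        'date_activated': {0: Timestamp('2021-01-05 00:00:00'),
                           1: Timestamp('2020-12-12 00:00:00'),
                           2: Timestamp('2021-02-03 00:00:00'),
                           3: Timestamp('2021-03-05 00:00:00'),
                           4: Timestamp('2021-03-06 00:00:00'),
                           5: Timestamp('2021-05-01 00:00:00')}}

# Outer keys become columns, inner keys become the row index
df = pd.DataFrame(data)
print(df)
```

Because the dates are Timestamp objects, date_activated comes out as a datetime column, so sorting on it orders by date rather than by string.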

Here is the expected result as a dictionary:

{'account_id': {0: 1, 1: 1, 2: 1, 3: 2, 4: 2, 5: 3},
 'contract_id': {0: 'ADS',
  1: 'AAA',
  2: 'ADGD',
  3: 'HHA',
  4: 'HAKD',
  5: 'HADSA'},
 'date_activated': {0: Timestamp('2020-12-12 00:00:00'),
  1: Timestamp('2021-01-05 00:00:00'),
  2: Timestamp('2021-02-03 00:00:00'),
  3: Timestamp('2021-03-05 00:00:00'),
  4: Timestamp('2021-03-06 00:00:00'),
  5: Timestamp('2021-05-01 00:00:00')},
 'Renewal Order': {0: 'Original',
  1: '1st',
  2: '2nd',
  3: 'Original',
  4: '1st',
  5: 'Original'}}

3 Answers

Another option:

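import pandas as pd

df = df.sort_values(['account_id', 'date_activated']).reset_index(drop=True)
# cumcount() numbers each account's contracts 0, 1, 2, ... in date order; "tsnrhtdd"[k::4] picks th/st/nd/rd
df['Renewal Order'] = df.groupby('account_id').cumcount().apply(
    lambda n: 'Original' if n == 0 else "%d%s" % (n, "tsnrhtdd"[(n//10 % 10 != 1)*(n % 10 < 4)*n % 10::4]))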
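The one-liner above packs the English ordinal suffixes into the string "tsnrhtdd": taking every 4th character starting at offset 0, 1, 2 or 3 gives "th", "st", "nd", "rd". The offset equals n % 10 only when the tens digit is not 1 and the last digit is below 4; otherwise it is 0, which selects "th". A sketch that spells this out and checks it against the original expression (the helper names golf_suffix and readable_suffix are hypothetical):

```python
def golf_suffix(n):
    # Expression from the answer above, unchanged
    return "tsnrhtdd"[(n//10 % 10 != 1)*(n % 10 < 4)*n % 10::4]


def readable_suffix(n):
    # Teens (11th, 12th, 13th, 111th, ...) always take "th"
    if n // 10 % 10 == 1:
        return "th"
    # Otherwise the last digit decides; 0 and 4-9 fall back to "th"
    return {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")


# The four slices of the packed string, for offsets 0..3
print(["tsnrhtdd"[i::4] for i in range(4)])  # ['th', 'st', 'nd', 'rd']

# Both versions agree on a wide range of positive counters
assert all(golf_suffix(n) == readable_suffix(n) for n in range(1, 1000))
```

The readable version trades compactness for a rule that is easy to check by eye; both give the same suffix for every positive counter tested.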

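Try sort_values to make sure the contracts are in the right order, plus groupby cumcount to get each order number, then apply or map a function that converts the numbers into the desired string values: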

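import pandas as pd


def format_order(n):
    # 0 is the account's first contract
    if n == 0:
        return 'Original'
    # The last digit picks the suffix; 0 and 4-9 take 'th'
    suffix = ['th', 'st', 'nd', 'rd', 'th'][min(n % 10, 4)]
    # 11, 12, 13 (and 111, 112, ...) are exceptions and take 'th'
    if 11 <= (n % 100) <= 13:
        suffix = 'th'
    return str(n) + suffix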


df = df.sort_values(['account_id', 'date_activated']).reset_index(drop=True)
# Option 1: Series.apply
df['Renewal Order'] = df.groupby('account_id').cumcount().apply(format_order)

df = df.sort_values(['account_id', 'date_activated']).reset_index(drop=True)
# Option 2: Series.map (same result)
df['Renewal Order'] = df.groupby('account_id').cumcount().map(format_order)

Output:
   account_id contract_id date_activated Renewal Order
0           1         ADS     2020-12-12      Original
1           1         AAA     2021-01-05           1st
2           1        ADGD     2021-02-03           2nd
3           2         HHA     2021-03-05      Original
4           2        HAKD     2021-03-06           1st
5           3       HADSA     2021-05-01      Original
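If the original row order should be kept instead of re-sorting df itself, one possible variant (not part of the answer above) is to run cumcount() on a sorted copy: the result keeps the original index labels, so assigning it back to df aligns by index. This sketch rebuilds the question's data with pd.to_datetime and repeats format_order so it runs on its own:

```python
import pandas as pd


def format_order(n):
    # Same helper as in the answer above
    if n == 0:
        return 'Original'
    suffix = ['th', 'st', 'nd', 'rd', 'th'][min(n % 10, 4)]
    if 11 <= (n % 100) <= 13:
        suffix = 'th'
    return str(n) + suffix


# The question's input, in its original (unsorted) row order
df = pd.DataFrame({
    'account_id': [1, 1, 1, 2, 2, 3],
    'contract_id': ['AAA', 'ADS', 'ADGD', 'HHA', 'HAKD', 'HADSA'],
    'date_activated': pd.to_datetime(['2021-01-05', '2020-12-12', '2021-02-03',
                                      '2021-03-05', '2021-03-06', '2021-05-01']),
})

# cumcount() on a sorted copy numbers contracts by date but keeps the original index labels...
order = df.sort_values(['account_id', 'date_activated']).groupby('account_id').cumcount()
# ...so this assignment aligns by index and leaves df's row order untouched
df['Renewal Order'] = order.map(format_order)
print(df)
```

Here row 0 (AAA) gets "1st" and row 1 (ADS) gets "Original", because ADS was activated first even though it appears second.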

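We can first compute cumcount by grouping on account_id, then use np.select to map conditions to labels: if Renewal Order is 0, replace it with Original, and likewise for the following positions.
This can be extended to 3rd, 4th, and so on.
I also set default='unOriginal' as the label for any position that has no matching condition.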

Code

import numpy as np
import pandas as pd

df = df.sort_values(['account_id', 'date_activated']).reset_index(drop=True)
# 0-based position of each contract within its account
df['Renewal Order'] = df.groupby('account_id').cumcount()
conditions = [
    df['Renewal Order'] == 0,
    df['Renewal Order'] == 1,
    df['Renewal Order'] == 2
]
choices = ['Original', '1st', '2nd']
# Rows matching no condition (3rd renewal onward) get the default label
df['Renewal Order'] = np.select(conditions, choices, default='unOriginal')
df

Output

   account_id contract_id date_activated Renewal Order
0           1         ADS     2020-12-12      Original
1           1         AAA     2021-01-05           1st
2           1        ADGD     2021-02-03           2nd
3           2         HHA     2021-03-05      Original
4           2        HAKD     2021-03-06           1st
5           3       HADSA     2021-05-01      Original
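The np.select call above can also be written as a dictionary lookup. This is a swapped-in alternative, not the answerer's code: Series.map with a dict returns NaN for counters that have no entry, and fillna then plays the role of default=. In this sketch the contract 'ZZZ' dated 2021-06-01 is a hypothetical fourth contract for account 1, added only so the default label shows up:

```python
import pandas as pd

# The question's data plus one hypothetical 4th contract ('ZZZ') for account 1
df = pd.DataFrame({
    'account_id': [1, 1, 1, 2, 2, 3, 1],
    'contract_id': ['AAA', 'ADS', 'ADGD', 'HHA', 'HAKD', 'HADSA', 'ZZZ'],
    'date_activated': pd.to_datetime(['2021-01-05', '2020-12-12', '2021-02-03',
                                      '2021-03-05', '2021-03-06', '2021-05-01',
                                      '2021-06-01']),
})

df = df.sort_values(['account_id', 'date_activated']).reset_index(drop=True)

# Same labels as the np.select choices; add entries such as 3: '3rd' to extend
labels = {0: 'Original', 1: '1st', 2: '2nd'}

# map() yields NaN for counters missing from labels; fillna supplies the default
df['Renewal Order'] = df.groupby('account_id').cumcount().map(labels).fillna('unOriginal')
print(df)
```

Extending to later renewals then means adding dictionary entries rather than writing one more condition per position.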
