Create a column of buckets based on value ranges in another column

Posted 2024-10-01 00:27:28


I have a sample DataFrame:

[sample table not preserved]

I need to create another column C that buckets column B according to some breakpoints:

Breakpts = [50, 100, 250, 350]

[expected output table not preserved]

I have the following code, which works:

def conditions(i): 
    if i <=50: return '0-50'
    if i > 50 and i <=100: return '50-100'
    if i > 100 and i <=250: return '100-250'
    if i > 250 and i <=350: return '250-350'
    if i > 350: return '>350'

df['C']=df['B'].apply(conditions)

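However, I want to make this dynamic. If I use different breakpoints, say [100, 250, 300, 400], the code should automatically create the corresponding buckets from those breakpoints.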

Is there a way to do this?


1 Answer
User
#1 · Posted 2024-10-01 00:27:28

As pointed out in the comments, pd.cut() is the way to go. You can make the breaks dynamic and set the bins and labels yourself:

import pandas as pd
import numpy as np

bins = [0,50, 100,250, 350, np.inf]
labels = ["'0-50'","'50-100'","'100-250'","'250-350'","'>350'"]
df['C'] = pd.cut(df['B'], bins=bins, labels=labels)
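To make the breaks fully dynamic, the bin edges and labels can both be derived from the breakpoint list. The following is a minimal sketch, not the answerer's code: the helper name `bucketize` is hypothetical, the sample values are taken from the printed output in this answer, the labels are written without the extra embedded quotes, and the lowest edge is assumed to be `-np.inf` so that every value up to the first breakpoint lands in the first bucket, as in the question's `conditions` function.

```python
import numpy as np
import pandas as pd


def bucketize(values, breakpts):
    """Bucket values into right-closed intervals defined by sorted breakpoints."""
    # Open-ended edges on both sides so no value falls outside the bins
    edges = [-np.inf] + list(breakpts) + [np.inf]
    # First label mirrors the question ('0-50'), then 'lo-hi' pairs, then '>last'
    labels = [f"0-{breakpts[0]}"]
    labels += [f"{lo}-{hi}" for lo, hi in zip(breakpts[:-1], breakpts[1:])]
    labels.append(f">{breakpts[-1]}")
    # right=True is the default, so each bin is (lo, hi]
    return pd.cut(values, bins=edges, labels=labels)


df = pd.DataFrame({"A": ["X", "Y", "Z", "XX"], "B": [30, 150, 450, 300]})
df["C"] = bucketize(df["B"], [50, 100, 250, 350])
print(df)
```

Calling `bucketize(df["B"], [100, 250, 300, 400])` produces the buckets for the second breakpoint list from the question without any other change.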

Also take a look at pd.qcut, which is a quantile-based discretization function.
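For comparison, a quantile-based split with pd.qcut chooses the bin edges from the data instead of from fixed breakpoints. A small sketch, assuming the same four B values as in the printed output and two hypothetical labels, "low" and "high":

```python
import pandas as pd

s = pd.Series([30, 150, 450, 300], name="B")

# q=2 splits at the median, so each bucket receives half of the values
halves = pd.qcut(s, q=2, labels=["low", "high"])
print(halves.tolist())
```

This is useful when buckets should hold roughly equal numbers of rows. It is not a drop-in replacement for fixed breakpoints such as [50, 100, 250, 350].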


Alternatively, use np.select:

col = 'B'
conditions = [
              df[col].between(0,50),   # inclusive="both" is the default; np.select keeps the first match
              df[col].between(50,100),  
              df[col].between(100,250),
              df[col].between(250,350),
              df[col].ge(350)
             ]
choices = ["'0-50'","'50-100'","'100-250'","'250-350'","'>350'"]
    
df["C"] = np.select(conditions, choices, default="nan")  # string default: np.nan may fail to combine with string choices on recent NumPy
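The np.select variant can also be driven by a breakpoint list. This is only a sketch under stated assumptions: the list comprehensions, the unquoted labels, and the "unknown" default are illustrative choices, and the sample values come from the printed output in this answer. Because np.select keeps the first matching condition, a chain of `le` tests behaves like right-closed intervals.

```python
import numpy as np
import pandas as pd

breakpts = [50, 100, 250, 350]
df = pd.DataFrame({"A": ["X", "Y", "Z", "XX"], "B": [30, 150, 450, 300]})

# One "<= hi" test per breakpoint, plus a final "> last" test.
# The first condition that is True wins, so 150 skips "<= 50" and "<= 100"
# and is caught by "<= 250".
conditions = [df["B"].le(hi) for hi in breakpts]
conditions.append(df["B"].gt(breakpts[-1]))

# Labels built the same way: '0-first', 'lo-hi' pairs, then '>last'
choices = [f"0-{breakpts[0]}"]
choices += [f"{lo}-{hi}" for lo, hi in zip(breakpts[:-1], breakpts[1:])]
choices.append(f">{breakpts[-1]}")

# A string default keeps every choice the same type; it only applies to missing values
df["C"] = np.select(conditions, choices, default="unknown")
print(df)
```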

Both print:

    A    B          C
0   X   30     '0-50'
1   Y  150  '100-250'
2   Z  450     '>350'
3  XX  300  '250-350'
