Create a column of buckets based on value ranges in another column

def conditions(i):
    if i <= 50:
        return '0-50'
    if i > 50 and i <= 100:
        return '50-100'
    if i > 100 and i <= 250:
        return '100-250'
    if i > 250 and i <= 350:
        return '250-350'
    if i > 350:
        return '>350'

df['C'] = df['B'].apply(conditions)

1 Answer

User

#1 · Posted 2024-10-01 00:27:28

As pointed out in the comments, pd.cut() is the way to go. You can make the breaks dynamic and set them yourself:

import pandas as pd
import numpy as np

bins = [0,50, 100,250, 350, np.inf]
labels = ["'0-50'","'50-100'","'100-250'","'250-350'","'>350'"]
df['C'] = pd.cut(df['B'], bins=bins, labels=labels)
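One detail worth checking with these bins: pd.cut uses right-closed intervals by default, so the first bucket is (0, 50] and a value of exactly 0 comes back as NaN, whereas the function in the question sends everything <= 50 to '0-50'. Here is a minimal sketch of the difference. The sample column is made up to hit the bin edges, and the labels are written without the embedded quotes.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data chosen to sit on the bin edges.
df = pd.DataFrame({"B": [0, 50, 51, 100, 350, 351]})

labels = ["0-50", "50-100", "100-250", "250-350", ">350"]

# Bins as in the answer: intervals are (0, 50], (50, 100], ...
as_given = pd.cut(df["B"], bins=[0, 50, 100, 250, 350, np.inf], labels=labels)

# Lowest edge moved to -inf so that 0 and negative values land in the
# first bucket, mirroring the `i <= 50` branch of the question's function.
open_low = pd.cut(df["B"], bins=[-np.inf, 50, 100, 250, 350, np.inf], labels=labels)

print(pd.DataFrame({"B": df["B"], "as_given": as_given, "open_low": open_low}))
```

If only the single value 0 matters, passing include_lowest=True to the original bins should also work, but it still leaves negative values unassigned.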

Also have a look at pd.qcut(), which is a quantile-based discretization function.
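For comparison, here is a small sketch of quantile-based bucketing, assuming the function meant here is pd.qcut. The sample values and the Q1..Q4 labels are invented for illustration. Unlike fixed bins, the edges are derived from the data, so each bucket ends up with roughly the same number of rows.

```python
import pandas as pd

# Hypothetical sample values; not taken from the question.
df = pd.DataFrame({"B": [10, 20, 30, 40, 150, 300, 450, 1000]})

# q=4 asks for quartiles: the bucket edges come from the data's own
# quantiles rather than from a hand-written list of breaks.
df["quartile"] = pd.qcut(df["B"], q=4, labels=["Q1", "Q2", "Q3", "Q4"])

print(df)
print(df["quartile"].value_counts().sort_index())
```

This is probably not what the question wants, since the ranges there are fixed in advance, but it is handy when the buckets should be balanced.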

Alternatively, use np.select:

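col = 'B'
conditions = [
    df[col].between(0, 50),     # both endpoints are included by default
    df[col].between(50, 100),   # np.select takes the first match, so 50 stays in '0-50'
    df[col].between(100, 250),
    df[col].between(250, 350),
    df[col].ge(350)
]
choices = ["'0-50'", "'50-100'", "'100-250'", "'250-350'", "'>350'"]

# Use a string default: recent NumPy versions raise a TypeError when
# default=np.nan is combined with string choices. "unknown" is an
# arbitrary placeholder for values that match none of the conditions.
df["C"] = np.select(conditions, choices, default="unknown")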

Both print:

    A    B          C
0   X   30     '0-50'
1   Y  150  '100-250'
2   Z  450     '>350'
3  XX  300  '250-350'
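To check that the vectorized versions agree with the row-wise function from the question, something like the following can be run. It is only a sketch: the frame is reconstructed from the printed output above, the labels are written without the embedded quotes, and the names via_apply, via_cut and via_select are just illustrative.

```python
import numpy as np
import pandas as pd

# Sample frame reconstructed from the printed output above.
df = pd.DataFrame({"A": ["X", "Y", "Z", "XX"], "B": [30, 150, 450, 300]})

labels = ["0-50", "50-100", "100-250", "250-350", ">350"]

def conditions(i):
    """Row-wise bucketing, equivalent to the function in the question."""
    if i <= 50:
        return "0-50"
    if i <= 100:
        return "50-100"
    if i <= 250:
        return "100-250"
    if i <= 350:
        return "250-350"
    return ">350"

via_apply = df["B"].apply(conditions)

# pd.cut with right-closed bins; the lowest edge is -inf so that
# values <= 0 fall into the first bucket as they do in `conditions`.
via_cut = pd.cut(df["B"], bins=[-np.inf, 50, 100, 250, 350, np.inf], labels=labels)

# np.select returns the choice of the first condition that is True,
# so each test only needs an upper bound; the last label is the default.
upper_bounds = [50, 100, 250, 350]
conds = [df["B"].le(u) for u in upper_bounds]
via_select = np.select(conds, labels[:-1], default=labels[-1])

df["C"] = via_cut
print(df)
```

On larger frames the two vectorized variants are usually preferable to apply, which calls the Python function once per row.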
