How should I add age-range columns in pandas?

Posted 2024-05-19 07:21:58

Suppose I have a simple DataFrame recording how long people have been playing music over their lives, like this:

import pandas as pd

df = pd.DataFrame(
    [[15,  8,  7],
     [20, 10, 10],
     [35, 15, 20],
     [50, 12, 38]],
    columns=['current age', 'age started playing music', 'years playing music'])

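How should I add extra columns that break down the number of years each person played music in each decade of their life? For example, if the added columns were 0-10, 10-20, 20-30 and so on, the first person would have 2 years of playing in their first decade, 5 years in their second decade, 0 years in their third decade, and so on.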
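
To make the expected numbers concrete, here is a minimal sketch of the arithmetic for a single person. The helper name `years_in_band` is hypothetical (it does not appear in the question), and it assumes each band is the half-open age interval [lo, hi), which is one reasonable reading of the column names.

```python
# Hypothetical helper (not from the question): years spent playing within one
# age band [lo, hi), computed as the overlap with [age_started, current_age).
def years_in_band(age_started, current_age, lo, hi):
    return max(0, min(current_age, hi) - max(age_started, lo))

# First person in the example: started at 8, currently 15.
first_person = [years_in_band(8, 15, lo, lo + 10) for lo in (0, 10, 20)]
print(first_person)  # [2, 5, 0]
```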


2 Answers

I suggest writing a function that returns a list with the number of years played in each decade, and then applying it to the DataFrame:

import numpy as np

# Return a list with the number of years played in each decade of life
def get_years_playing_music_decade(current_age, age_start):
    if age_start > current_age:  # should not be possible
        return None
    # Build one flag per year of life (index i = age i), up to age 100.
    # Playing flags: 0 before age_start, 1 from age_start onwards.
    # Example: age_start = 3 gives [0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, ...]
    age_start_lst = [0] * age_start + (100 - age_start) * [1]
    # Lived flags: 1 for each year already lived, 0 afterwards
    current_age_lst = [1] * current_age + (100 - current_age) * [0]
    # A year counts only if the person was both alive and playing
    playing_music_lst = [1 if x and y else 0 for x, y in zip(age_start_lst, current_age_lst)]
    # Sum the flags in blocks of 10 years
    playing_music_lst_10y = [sum(playing_music_lst[(10*i):((10*i)+10)]) for i in range(0, 10)]
    return playing_music_lst_10y

get_years_playing_music_decade(current_age=33, age_start=12)
# [0, 8, 10, 3, 0, 0, 0, 0, 0, 0]

# create columns 0-10 .. 90-100
colnames = [str(10*i) + '-' + str(10*(i+1)) for i in range(10)]

# apply defined function to the dataframe
df[colnames] = pd.DataFrame(df.apply(lambda x: get_years_playing_music_decade(
    int(x['current age']), int(x['age started playing music'])), axis=1).values.tolist())

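
As a possible alternative to the row-wise `apply`, the same per-decade counts can be computed with NumPy broadcasting. This is only a sketch, not the answer's method: it assumes ages are whole years below 100 and that each decade is the half-open interval [lower, upper). The names `lower`, `upper`, `overlap`, `decades` and `result` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[15, 8, 7], [20, 10, 10], [35, 15, 20], [50, 12, 38]],
    columns=['current age', 'age started playing music', 'years playing music'])

# Decade boundaries: lower edges 0, 10, ..., 90 and upper edges 10, 20, ..., 100.
lower = np.arange(0, 100, 10)
upper = lower + 10

start = df['age started playing music'].to_numpy()[:, None]  # shape (n, 1)
end = df['current age'].to_numpy()[:, None]                  # shape (n, 1)

# Overlap of [start, end) with each decade [lower, upper), floored at zero.
overlap = np.clip(np.minimum(end, upper) - np.maximum(start, lower), 0, None)

decades = pd.DataFrame(overlap, index=df.index,
                       columns=[f'{lo}-{hi}' for lo, hi in zip(lower, upper)])
result = df.join(decades)
```

Each row of `decades` should sum to that person's 'years playing music' value, which gives a quick sanity check on the result.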

You can also try this with pd.cut and value_counts:

df.join(df.apply(lambda x: pd.cut(np.arange(x['age started playing music'], 
                                            x['current age']),
                                  bins=[0, 9, 19, 29, 39, 49], 
                                  labels=['0-10', '10-20', 
                                          '20-30', '30-40',
                                          '40+'])
                             .value_counts(),
                 axis=1))

Output:

   current age  age started playing music  years playing music  0-10  10-20  20-30  30-40  40+
0           15                          8                    7     2      5      0      0    0
1           20                         10                   10     0     10      0      0    0
2           35                         15                   20     0      5     10      5    0
3           50                         12                   38     0      8     10     10   10
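
One edge case worth checking with this approach: `pd.cut` uses right-closed bins by default, so with `bins=[0, 9, 19, ...]` the first bin is (0, 9] and an age of 0 is not counted. The sketch below uses a hypothetical person who started at age 0, and shows one way to include that year by passing `right=False` with decade edges. The edge list and the '40-50' label are assumptions for illustration, not part of the answer above.

```python
import numpy as np
import pandas as pd

edges = [0, 10, 20, 30, 40, 50]
labels = ['0-10', '10-20', '20-30', '30-40', '40-50']

# Hypothetical person who started at age 0 and is now 12: ages 0..11 were played.
ages_played = np.arange(0, 12)

# Bins as in the answer above: right-closed, so age 0 falls outside (0, 9].
original = pd.cut(ages_played, bins=[0, 9, 19, 29, 39, 49],
                  labels=labels).value_counts()

# right=False makes each bin [lo, hi), so age 0 lands in '0-10'.
counts = pd.cut(ages_played, bins=edges, right=False,
                labels=labels).value_counts()

print(original['0-10'], counts['0-10'], counts['10-20'])  # 9 10 2
```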
