Pandas: split one row into 2 or 3 rows (taking a percentage of the original row's value)

Posted 2024-10-06 07:40:25


I have seen many questions about this, but I still cannot piece them together to solve my problem.

I have a df like this:

idx value   name1   %1  name2   %2  name3   %3
0   100     person1 0.3 person2 0.5 person3 0.2
1   100     person4 1.0 None    NaN None    None
2   100     person1 0.6 person5 0.4 None    None

It is generated like this: pd.DataFrame(columns= ['value','name1','%1','name2','%2','name3','%3'],data=[[100,'person1',0.3,'person2',0.5,'person3','0.2'],[100,'person4',1], [100,'person1',0.6,'person5',0.4]])

I want to split each row that has multiple names into separate rows, like this:

idx value   name    
0   30      person1 
1   50      person2 
2   20      person3
3   100     person4
4   60      person1 
5   40      person5

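Each unique person should get their percentage of the original value, in a new row of their own. For example, person1 in row 0 gets 100 * 0.3 (the %1 value).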

I hope this is clear. Thank you very much for your help.


3 answers

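First, let's add the information we need (here `a` is the DataFrame from the question):

# '%3' holds the string '0.2' in the question's data, so convert each
# percentage column to numbers before multiplying
a['value * %1'] = a['value'] * pd.to_numeric(a['%1'])
a['value * %2'] = a['value'] * pd.to_numeric(a['%2'])
a['value * %3'] = a['value'] * pd.to_numeric(a['%3'])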

This gives:

    value   name1   %1  name2   %2  name3   %3   value * %1 value * %2  value * %3
0   100     person1 0.3 person2 0.5 person3 0.2  30.0       50.0        20.0
1   100     person4 1.0 None    NaN None    NaN  100.0      NaN         NaN
2   100     person1 0.6 person5 0.4 None    NaN  60.0       40.0        NaN

Now we only need to create a new DataFrame and fill in the values:

df = pd.DataFrame()

df['value'] = a['value * %1'].tolist() + a['value * %2'].tolist() + a['value * %3'].tolist()
df['name'] = a['name1'].tolist() + a['name2'].tolist() + a['name3'].tolist()

Finally, drop the NaN values (dropna() returns a new DataFrame, so assign the result back):

df = df.dropna()

There may be a better way, but this is what I came up with.
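The steps in this answer can be put together as one runnable sketch. It assumes the question's DataFrame is named `a`, as in the snippets above, with the short rows padded explicitly with None. The `pd.to_numeric` step is an addition that is not in the original answer; it is there because the question's data stores one percentage as the string '0.2'. The loop over slots and the name `pieces` are only for illustration.

```python
import pandas as pd

# The question's data; short rows are padded with None so every row has 7 items.
a = pd.DataFrame(
    columns=['value', 'name1', '%1', 'name2', '%2', 'name3', '%3'],
    data=[[100, 'person1', 0.3, 'person2', 0.5, 'person3', '0.2'],
          [100, 'person4', 1.0, None, None, None, None],
          [100, 'person1', 0.6, 'person5', 0.4, None, None]],
)

pieces = []
for i in (1, 2, 3):
    # Assumption: percentages may arrive as strings, so coerce them to numbers.
    pct = pd.to_numeric(a[f'%{i}'], errors='coerce')
    pieces.append(pd.DataFrame({'value': a['value'] * pct,
                                'name': a[f'name{i}']}))

# Stack the three slots, drop the empty ones, and renumber the rows.
df = pd.concat(pieces).dropna().reset_index(drop=True)
print(df)
```

Note that the rows come out grouped by slot (all name1 entries first, then name2, then name3), not in the order of the original rows.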

You can try the following:

df

   value    name1   %1    name2   %2    name3   %3
0    100  person1  0.3  person2  0.5  person3  0.2
1    100  person4  1.0     None  NaN     None  NaN
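2    100  person1  0.6  person5  0.4     None  NaN

def get_value(sr):
    # Map each name in the row to its share of the row's value
    dict_ = {}
    for i in range(1, 4):
        name = sr['name' + str(i)]
        # Skip empty slots (None or NaN)
        if pd.isna(name):
            continue
        # float() guards against percentages stored as strings, e.g. '0.2'
        dict_[name] = sr['value'] * float(sr['%' + str(i)])
    return pd.Series(dict_)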

df_new = df.apply(get_value, axis=1).stack().reset_index()
df_new

   level_0  level_1      0
0        0  person1   30.0
1        0  person2   50.0
2        0  person3   20.0
3        1  person4  100.0
4        2  person1   60.0
5        2  person5   40.0
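
To get from this result to the two-column layout asked for in the question, one option is to name the stacked index levels before resetting the index. The following is a minimal sketch under the question's data (short rows padded with None). The `float(...)` cast, the `pd.isna` check, the extra `dropna()` after `stack()`, and the names `row` and `stacked` are additions for illustration, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame(
    columns=['value', 'name1', '%1', 'name2', '%2', 'name3', '%3'],
    data=[[100, 'person1', 0.3, 'person2', 0.5, 'person3', '0.2'],
          [100, 'person4', 1.0, None, None, None, None],
          [100, 'person1', 0.6, 'person5', 0.4, None, None]],
)

def get_value(sr):
    # Map each name in the row to its share of the row's value.
    out = {}
    for i in range(1, 4):
        name = sr[f'name{i}']
        if pd.isna(name):  # empty slot
            continue
        out[name] = sr['value'] * float(sr[f'%{i}'])
    return pd.Series(out)

# dropna() is a safeguard: whether stack() itself discards the NaN cells
# is assumed here to depend on the pandas version.
stacked = df.apply(get_value, axis=1).stack().dropna()

# Name the two index levels and the values, then keep the requested columns.
df_new = (stacked.rename_axis(['row', 'name'])
                 .rename('value')
                 .reset_index()[['value', 'name']])
print(df_new)
```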

Here is a multi-step solution, with explanations in the comments:

import pandas as pd

df = pd.DataFrame(columns=['value', 'name1', '%1', 'name2', '%2', 'name3', '%3'], 
                  data=[[100, 'person1', 0.3, 'person2', 0.5, 'person3', '0.2'],
                       [100, 'person4', 1], [100, 'person1', 0.6, 'person5', 0.4]])

# Move the name columns below each other in rows
df1 = pd.melt(df, id_vars=['value'], value_vars=['name1', 'name2', 'name3'], 
              value_name='name')

# Move the percentage columns below each other in rows
df2 = pd.melt(df, id_vars=['value'], value_vars=['%1', '%2', '%3'], 
              value_name='percentage')

# Some of the percentages were entered as strings (note '0.2' in the question);
# make sure they are all floats
df2['percentage'] = df2['percentage'].astype(float)

# NaNs are equivalent to zero in this case; easier to calculate with 0.0
df2 = df2.fillna(0)

# We can safely concatenate the two frames side by side, assuming that the
# rows of df1 and df2 line up (both melts walk the columns in the same order)
df3 = pd.concat([df1, df2], axis=1)

# Remove duplicated columns from the concatenation ('value')
df3 = df3.loc[:, ~df3.columns.duplicated()]

# Calculate the actual percentage-based values
df3.loc[:, 'value'] = df3['value'] * df3['percentage']

# dropna() will remove any row with a NaN/None anywhere. Since we've already 
# replaced the percentages with 0.0, this will drop rows that have a
# 'name' of None
df4 = df3.dropna()

# Select the two relevant columns
df4 = df4[['value', 'name']]
print(df4)

   value     name
0   30.0  person1
1  100.0  person4
2   60.0  person1
3   50.0  person2
5   40.0  person5
6   20.0  person3
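
The result above is grouped by slot (name1 entries first) and keeps gaps in the index. If the row order from the question matters, one possible variant keeps the source row index through `melt` and sorts on it afterwards. This is only a sketch: it assumes pandas 1.1 or later for the `ignore_index` parameter, pads the short rows with None, and uses `pd.to_numeric` in place of `astype(float)`; the names `names`, `pcts`, and `out` are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    columns=['value', 'name1', '%1', 'name2', '%2', 'name3', '%3'],
    data=[[100, 'person1', 0.3, 'person2', 0.5, 'person3', '0.2'],
          [100, 'person4', 1.0, None, None, None, None],
          [100, 'person1', 0.6, 'person5', 0.4, None, None]],
)

# Melt names and percentages separately, keeping the source row index.
names = df.melt(id_vars=['value'], value_vars=['name1', 'name2', 'name3'],
                value_name='name', ignore_index=False)
pcts = df.melt(id_vars=['value'], value_vars=['%1', '%2', '%3'],
               value_name='percentage', ignore_index=False)

# Both melts list the rows in the same order, so combine them positionally.
share = pd.to_numeric(pcts['percentage'], errors='coerce').to_numpy()
out = pd.DataFrame({'value': names['value'].to_numpy() * share,
                    'name': names['name'].to_numpy()},
                   index=names.index)

# Drop empty slots, restore source row order (mergesort is stable, so the
# slot order within a row is preserved), and renumber.
out = out.dropna().sort_index(kind='mergesort').reset_index(drop=True)
print(out)
```

With the question's data this should list person1, person2, person3, person4, person1, person5 with values 30, 50, 20, 100, 60, 40, which is the order requested in the question.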
