基于anoth中的值自行创建列

2024-09-19 07:05:33 发布

您现在位置:Python中文网/ 问答频道 /正文

这和我昨天问的问题很相似。其目的是能够添加一个功能,允许根据另一个列中显示的值创建一个列。例如,当它在指定的文件中找到一个国家代码时,我希望它创建一个名为“country codeTotal”的列,并对具有相同国家代码的每行的单位数量求和

这是我的脚本目前的输出:

Script Output

我想看到的是:

Goal

我的剧本:

df['Sum of Revenue'] = df['Units Sold'] * df['Dealer Price']
    df['AR Revenue'] = df[]
    df = df.sort_values(['End Consumer Country', 'Currency Code'])
    # Sets first value of index by position
    df.loc[df.index[0], 'Unit Total'] = df['Units Sold'].sum()
    # Sets first value of index by position
    df.loc[df.index[0], 'Total Revenue'] = df['Sum of Revenue'].sum()
    # Sums the amout of Units with the End Consumer Country AR
    df['AR Total'] = df.loc[df['End Consumer Country'] == 'AR', 'Units Sold'].sum()
    # Sums the amount of Units with the End Consumer Country AU
    df['AU Total'] = df.loc[df['End Consumer Country'] == 'AU', 'Units Sold'].sum()
    # Sums the amount of Units with the End Consumer Country NZ
    df['NZ Total'] = df.loc[df['End Consumer Country'] == 'NZ', 'Units Sold'].sum()

但是,由于我知道这个文件中会出现哪些国家,因此我将它们相应地添加到我的脚本中以查找。如何编写脚本,以便在找到另一个国家/地区代码(例如GB)时,创建一个名为“GB Total”的列,并将国家/地区代码设置为GB时每行的单位相加。你知道吗

任何帮助都将不胜感激!你知道吗


Tags: ofthe代码dfconsumer国家countryloc
1条回答
网友
1楼 · 发布于 2024-09-19 07:05:33

如果您真的需要这种格式,那么下面是我将如何继续(下面的起始数据):

# Get those first two columns
d = {'Sum of Revenue': 'Total Revenue', 'Units Sold': 'Total Sold'}
for col, newcol in d.items():
    df.loc[df.index[0], newcol] = df[col].sum()

# Add the rest for every country:
s = df.groupby('End Consumer Country')['Units Sold'].sum().to_frame().T.add_suffix(' Total')
s.index = [df.index[0]]

df  = pd.concat([df, s], 1, sort=False)

输出:df

  End Consumer Country  Sum of Revenue  Units Sold  Total Revenue  Total Sold  AR Total  AU Total  NZ Total  US Total
a                   AR       13.486216           1     124.007334        28.0       3.0       7.0      11.0       7.0
b                   AR       25.984073           2            NaN         NaN       NaN       NaN       NaN       NaN
c                   AU       21.697871           3            NaN         NaN       NaN       NaN       NaN       NaN
d                   AU       10.962232           4            NaN         NaN       NaN       NaN       NaN       NaN
e                   NZ       16.528398           5            NaN         NaN       NaN       NaN       NaN       NaN
f                   NZ       29.908619           6            NaN         NaN       NaN       NaN       NaN       NaN
g                   US        5.439925           7            NaN         NaN       NaN       NaN       NaN       NaN

如您所见,pandas添加了一堆NaN值,因为我们只给第一行赋值,而且DataFrame必须是矩形的


用一个不同的DataFrame来概括每个国家的总数和内部情况要简单得多。如果这是好的,那么一切都简化为一个.pivot_table

df.pivot_table(index='End Consumer Country', 
               values=['Sum of Revenue', 'Units Sold'],
               margins=True,
               aggfunc='sum').T.add_suffix(' Total)

输出:

End Consumer Country   AR Total   AU Total   NZ Total  US Total   All Total
Sum of Revenue        39.470289  32.660103  46.437018  5.439925  124.007334
Units Sold             3.000000   7.000000  11.000000  7.000000   28.000000

同样的信息,更容易编码。你知道吗


样本数据:

import pandas as pd
import numpy as np

np.random.seed(123)
df = pd.DataFrame({'End Consumer Country': ['AR', 'AR', 'AU', 'AU', 'NZ', 'NZ', 'US'],
                   'Sum of Revenue': np.random.normal(20,6,7),
                   'Units Sold': np.arange(1,8,1)},
                   index = list('abcdefg'))

  End Consumer Country  Sum of Revenue  Units Sold
a                   AR       13.486216           1
b                   AR       25.984073           2
c                   AU       21.697871           3
d                   AU       10.962232           4
e                   NZ       16.528398           5
f                   NZ       29.908619           6
g                   US        5.439925           7

相关问题 更多 >