Summing data for different categories in a DataFrame

Posted 2024-06-28 20:51:17

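I created an Excel spreadsheet of data and exported it to a CSV file. I want to add up the data for each ethnicity within each distinct year. I tried creating a data index and totalling each ethnicity, but I was unable to save or hold on to the results. I tried working with df directly and also wrote a "for" loop to store the data for each ethnicity, but I got an error message. The original Excel sheet has one row per show, with a count for each ethnicity and the year (season) the show belongs to. I have not been able to total the ethnicity columns by year.

Should I use a for loop or an if statement to iterate over the specific years, and is my approach correct?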

#this is the first method I have tried
import pandas as pd
import numpy as np

from google.colab import files
uploaded = files.upload()
# df = pd.read_csv('/content/drive/My Drive/allTheaterDataV2.csv')

import io
df = pd.read_csv(io.BytesIO(uploaded['allTheaterDataV2.csv']))
# Dataset is now stored in a Pandas Dataframe

#create list that contains the specific season that we want to reference
# print(df)

data = pd.DataFrame(allTheaterDataV2)

dataindex = [20082009, 20102011, 20112012, 20122013, 20132014, 20142015]
print(dataindex)


df.loc['total',:] = df.sum(axis=0)

print(df.loc[1:42, ['ASIAM','AFRAM','LAT','CAU','OTH']].sum())

# The second method I have tried is included below
for i in dataindex:
  # create a new data frame that stores the data per year
  hold_ASIAM = df[df.index == i]
  # allows for data for each season to be contained together
  ETHtotalASIAM = df['ASIAM'].sum()
  hold_ASIAM.append(ETHtotalASIAM)
print(hold_ASIAM)

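I expected the output to give me the total (some number) for each ethnicity (e.g. African American) for each year (e.g. 20082009), but the actual output is "name 'allTheaterDataV2' is not defined".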


1 answer
User
#1 · Posted 2024-06-28 20:51:17

This should work.

import pandas as pd

df = pd.DataFrame({'ID':['Billy Elliot','next to normal','shrek','guys and dolls',
                         'west side story', 'pal joey'],
                   'Season' : [20082009,20082009,20082009,
                               20082009,20082009,20082009],
                   'AFRAM' : [2,0,4,4,0,1],
                   'ASIAM' : [0,0,1,0,0,0],
                   'CAU' : [48,10,25,24,28,20],
                   'LAT' : [1,0,1,3,18,0],
                   'OTH' : [0,0,0,0,0,0]}) 

print(df)
#    AFRAM  ASIAM  CAU               ID  LAT  OTH    Season
# 0      2      0   48     Billy Elliot    1    0  20082009
# 1      0      0   10   next to normal    0    0  20082009
# 2      4      1   25            shrek    1    0  20082009
# 3      4      0   24   guys and dolls    3    0  20082009
# 4      0      0   28  west side story   18    0  20082009
# 5      1      0   20         pal joey    0    0  20082009

# drop the ID column since it is just a string
df = df.drop(['ID'], axis = 1)

# group by season and add the other columns
df = df.groupby('Season').sum()

print(df)
#             AFRAM  ASIAM  CAU  LAT  OTH
# Season                                 
# 20082009     11      1  155   23    0
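
To connect this back to the question: the NameError comes from the line data = pd.DataFrame(allTheaterDataV2), because no variable with that name is ever defined. df returned by pd.read_csv is already a DataFrame, so that line can simply be removed. Below is a minimal sketch of the same groupby idea with more than one season, with no explicit loop. The rows for season 20102011 and the show names 'show A' and 'show B' are made up for illustration, and eth_cols is just a helper name, not something from the original CSV.

```python
import pandas as pd

# Hypothetical sample rows standing in for allTheaterDataV2.csv.
# The 20102011 rows ('show A', 'show B') are invented for illustration.
df = pd.DataFrame({
    'ID': ['Billy Elliot', 'shrek', 'show A', 'show B'],
    'Season': [20082009, 20082009, 20102011, 20102011],
    'AFRAM': [2, 4, 3, 1],
    'ASIAM': [0, 1, 2, 0],
    'CAU': [48, 25, 30, 12],
    'LAT': [1, 1, 0, 5],
    'OTH': [0, 0, 1, 0],
})

# Ethnicity columns to total (same names as in the question).
eth_cols = ['ASIAM', 'AFRAM', 'LAT', 'CAU', 'OTH']

# One row per season, one column per ethnicity.
# Selecting eth_cols keeps the string column 'ID' out of the sum.
totals = df.groupby('Season')[eth_cols].sum()
print(totals)

# Look up a single total, e.g. AFRAM for season 20082009.
print(totals.loc[20082009, 'AFRAM'])
```

With the real file, the sample DataFrame would be replaced by the df loaded with pd.read_csv, assuming the CSV has a Season column like the one above.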
