Converting a single row into a different DataFrame in Python

Published 2024-10-03 04:35:42


I am working with a DataFrame of 146 rows x 48 columns. The columns are

['Region','Rank 2015','Score 2015','Economy 2015','Family 2015','Health 2015','Freedom 2015','Generosity 2015','Trust 2015','Rank 2016','Score 2016','Economy 2016','Family 2016','Health 2016','Freedom 2016','Generosity 2016','Trust 2016','Rank 2017','Score 2017','Economy 2017','Family 2017','Health 2017','Freedom 2017','Generosity 2017','Trust 2017','Rank 2018','Score 2018','Economy 2018','Family 2018','Health 2018','Freedom 2018','Generosity 2018','Trust 2018','Rank 2019','Score 2019','Economy 2019','Family 2019','Health 2019','Freedom 2019','Generosity 2019','Trust 2019','Score Mean','Economy Mean','Family Mean','Health Mean','Freedom Mean','Generosity Mean','Trust Mean']

I want to access one particular row and convert it into the following DataFrame

    Year    Rank    Score   Family  Health  Freedom Generosity  Trust
0   2015     NaN      NaN     NaN     NaN     NaN         NaN   NaN
1   2016     NaN      NaN     NaN     NaN     NaN         NaN   NaN
2   2017     NaN      NaN     NaN     NaN     NaN         NaN   NaN
3   2018     NaN      NaN     NaN     NaN     NaN         NaN   NaN
4   2019     NaN      NaN     NaN     NaN     NaN         NaN   NaN 

Any help is welcome, and thanks in advance.


3 answers

Another way:

cols=['Region','Rank 2015','Score 2015','Economy 2015','Family 2015','Health 2015','Freedom 2015','Generosity 2015', 'Trust 2015','Rank 2016','Score 2016','Economy 2016','Family 2016','Health 2016','Freedom 2016','Generosity 2016','Trust 2016', 'Rank 2017','Score 2017','Economy 2017','Family 2017','Health 2017','Freedom 2017','Generosity 2017','Trust 2017','Rank 2018','Score 2018','Economy 2018','Family 2018','Health 2018','Freedom 2018','Generosity 2018','Trust 2018','Rank 2019','Score 2019','Economy 2019','Family 2019','Health 2019','Freedom 2019','Generosity 2019','Trust 2019','Score Mean','Economy Mean','Family Mean','Health Mean','Freedom Mean','Generosity Mean','Trust Mean']

import pandas as pd

# source dataframe
df1 = pd.DataFrame(columns=cols)
df1.loc[0] = [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1]

#target dataframe
df2 = pd.DataFrame(columns=['Year','Rank','Score','Family','Health','Freedom','Generosity','Trust','Economy'])
df2['Year']=['2015','2016','2017','2018','2019','Mean']

df2.set_index('Year', inplace=True)

idx = 0  # source row to copy

for col in df1.columns[1:]:  # skip 'Region'
    c, r = col.split(" ")    # e.g. 'Rank 2015' -> c = 'Rank', r = '2015'
    df2.at[r, c] = df1.at[idx, col]

print(df2)

    Rank Score Family Health Freedom Generosity Trust Economy
Year
2015    1     1      1      1       1          1     1       1
2016    1     1      1      1       1          1     1       1
2017    1     1      1      1       1          1     1       1
2018    1     1      1      1       1          1     1       1
2019    1     1      1      1       1          1     1       1
Mean  NaN     1      1      1       1          1     1       1
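The loop above can also be written without explicit iteration. The following is only a sketch under stated assumptions: it uses a small hypothetical one-row frame with a reduced set of columns, and it assumes every column label other than 'Region' consists of exactly two space-separated parts. It splits the labels into a two-level index and lets unstack pivot the measure names into columns.

```python
import pandas as pd

# Hypothetical one-row frame with a reduced set of columns, for illustration only.
df1 = pd.DataFrame(
    [["Asia", 1, 7.5, 2, 7.4, 7.45]],
    columns=["Region", "Rank 2015", "Score 2015",
             "Rank 2016", "Score 2016", "Score Mean"],
)

idx = 0  # source row to reshape
row = df1.drop(columns="Region").loc[idx]

# Split each label such as "Rank 2015" into the pair ("Rank", "2015").
row.index = pd.MultiIndex.from_tuples(
    [tuple(label.split(" ")) for label in row.index],
    names=["Measure", "Year"],
)

# Pivot the measure level into columns; the year/"Mean" level stays as the index.
df2 = row.unstack(level="Measure")
print(df2)
```

Combinations that do not exist in the source, such as a 'Rank Mean' column, come out as NaN, which matches the output of the loop version.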

Here is a solution that uses a list comprehension:

Input:

import numpy as np
import pandas as pd

cols = ['Region','Rank 2015','Score 2015','Economy 2015','Family 2015','Health 2015','Freedom 2015','Generosity 2015','Trust 2015','Rank 2016','Score 2016','Economy 2016','Family 2016','Health 2016','Freedom 2016','Generosity 2016','Trust 2016','Rank 2017','Score 2017','Economy 2017','Family 2017','Health 2017','Freedom 2017','Generosity 2017','Trust 2017','Rank 2018','Score 2018','Economy 2018','Family 2018','Health 2018','Freedom 2018','Generosity 2018','Trust 2018','Rank 2019','Score 2019','Economy 2019','Family 2019','Health 2019','Freedom 2019','Generosity 2019','Trust 2019','Score Mean','Economy Mean','Family Mean','Health Mean','Freedom Mean','Generosity Mean','Trust Mean']
df = pd.DataFrame(np.random.randint(1,10,(3,48)))
df.columns = cols
print(df.iloc[:, :4])

   Region  Rank 2015  Score 2015  Economy 2015
0       7          9           9             9
1       8          7           2             3
2       3          3           4             5

The new DataFrame would be:

target_cols = ['Rank', 'Score', 'Family', 'Health', 'Freedom', 'Generosity', 'Trust']
years = ['2015', '2016', '2017', '2018', '2019']
newdf = pd.DataFrame([df.loc[1, [x + ' ' + year for x in target_cols]].values for year in years])
newdf.columns = target_cols
newdf['year'] = years
print(newdf)

   Rank  Score  Family  Health  Freedom  Generosity  Trust  year
0     7      2       6       9        3           4      9  2015
1     2      8       1       1        7           6      1  2016
2     7      4       2       5        1           7      4  2017
3     9      7       1       4        7           5      2  2018
4     5      4       4       9        1           6      2  2019
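If every row needs this reshaping rather than just one, pandas.wide_to_long can do it in a single call. This is a sketch rather than part of the answer above: the small frame is hypothetical, and it assumes that 'Region' uniquely identifies each row and that the per-year columns follow the "<name> <year>" pattern from the question.

```python
import pandas as pd

# Hypothetical reduced frame: two regions, two measures, two years.
df = pd.DataFrame({
    "Region": ["A", "B"],
    "Rank 2015": [1, 2],
    "Score 2015": [7.5, 7.1],
    "Rank 2016": [2, 1],
    "Score 2016": [7.4, 7.6],
})

# stubnames are the measure names; sep=" " is the separator before the year.
long_df = pd.wide_to_long(
    df, stubnames=["Rank", "Score"], i="Region", j="Year", sep=" "
).sort_index()

# Pick one region and turn the Year index level back into a column.
one_region = long_df.loc["B"].reset_index()
print(one_region)
```

The result is indexed by (Region, Year), so any single region can be selected afterwards with .loc instead of rebuilding the frame row by row.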

This assumes that your target years are only 2015 through 2019 and that the target columns are known.

My procedure is as follows:

(1) Define the target columns and years

target_columns = ['Rank', 'Score', 'Family', 'Health', 'Freedom', 'Generosity', 'Trust']
target_years = ['2015', '2016', '2017', '2018', '2019']

(2) Retrieve the particular row; I assume your starting DataFrame is initial_dataframe

particular_row = initial_dataframe.iloc[0]

(3) Retrieve and reshape the information from particular_row

reshaped_row = { 'Year': target_years }

reshaped_row.update({
    column_name: [particular_row[column_name + ' ' + year_name] for year_name in target_years]
    for column_name in target_columns
})

(4) Build output_dataframe from the reshaped row

output_dataframe = pd.DataFrame(reshaped_row)
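The four steps above can be collected into one function, sketched here under some assumptions: reshape_row is a hypothetical helper name, and the small initial_dataframe is made-up sample data with a reduced set of columns so that the example runs on its own.

```python
import pandas as pd


def reshape_row(frame, row_position, columns, years):
    """Return one row of `frame` as a DataFrame with one line per year."""
    row = frame.iloc[row_position]
    data = {"Year": list(years)}
    for name in columns:
        # Look up e.g. "Rank 2015", "Rank 2016", ... for each measure.
        data[name] = [row[f"{name} {year}"] for year in years]
    return pd.DataFrame(data)


# Made-up sample data, for illustration only.
initial_dataframe = pd.DataFrame(
    [["A", 1, 7.5, 2, 7.4], ["B", 2, 7.1, 1, 7.6]],
    columns=["Region", "Rank 2015", "Score 2015", "Rank 2016", "Score 2016"],
)

output_dataframe = reshape_row(
    initial_dataframe, 1, ["Rank", "Score"], ["2015", "2016"]
)
print(output_dataframe)
```

A missing column such as 'Rank 2020' raises a KeyError here, which makes a mismatch between the expected and actual column names visible immediately.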
