How do I construct a new DataFrame from data computed in a for loop?

for i in datacomplete2['Country'].unique():
    life.append(
        datacomplete2.loc[(datacomplete2['Country']==i)&(datacomplete2['Year']==2016), 'life']
        - datacomplete2.loc[(datacomplete2['Country']==i)&(datacomplete2['Year']==2000), 'life']
    )
    health.append(
        datacomplete2.loc[(datacomplete2['Country']==i)&(datacomplete2['Year']==2016), 'health']
        - datacomplete2.loc[(datacomplete2['Country']==i)&(datacomplete2['Year']==2000), 'health']
    )
    lifegdp.append(
        datacomplete2.loc[(datacomplete2['Country']==i)&(datacomplete2['Year']==2016), 'lifegdp']
        - datacomplete2.loc[(datacomplete2['Country']==i)&(datacomplete2['Year']==2000), 'lifegdp']
    )

newData = pd.DataFrame([life, health, lifegdp, datacomplete2['Country'].unique()], columns = ['life', 'health', 'lifegdp', 'country'])
newData

     Country              Code      Year  life       health    lifegdp
0    Algeria              DZA       2000  70.292000  3.489033  20.146558
1    Algeria              DZA       2016  76.078000  6.603844  11.520259
2    Angola               AGO       2000  47.113000  1.908599  24.684593
3    Angola               AGO       2016  61.547000  2.713149  22.684710
4    Antigua and Barbuda  ATG       2000  73.541000  4.480701  16.412834
...  ...                  ...       ...   ...        ...       ...
415  Vietnam              VNM       2016  76.253000  5.659194  13.474181
416  World                OWID_WRL  2000  67.684998  8.617628  7.854249
417  World                OWID_WRL  2016  72.035337  9.978453  7.219088
418  Zambia               ZMB       2000  44.702000  7.152371  6.249955
419  Zambia               ZMB       2016  61.874000  4.477207  13.819775

2 answers

User

#1 · edited 2024-05-19 12:34:51

You can do it like this:

# drop() returns a new DataFrame, so assign the result; 'Code' is text and cannot be subtracted
df = df.drop(columns=['Code'])

# Take each year's rows, drop 'Year' so it is not subtracted, and index by country
df_2016 = df.loc[df['Year']==2016].drop(columns=['Year']).set_index('Country')
df_2000 = df.loc[df['Year']==2000].drop(columns=['Year']).set_index('Country')

# 2016 minus 2000, aligned on country; fill_value=0 treats a country missing from one year as 0
newData = df_2016.subtract(df_2000, fill_value=0)
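
To check the year-by-year subtraction on something small, here is a minimal self-contained sketch that uses four rows copied from the question's table. The name value_cols is only an illustrative helper, and this version deliberately leaves out fill_value, so a country missing from either year would come out as NaN rather than being treated as 0.

```python
import pandas as pd

# Four rows copied from the question's table (two countries, for brevity)
df = pd.DataFrame({
    "Country": ["Algeria", "Algeria", "Angola", "Angola"],
    "Code": ["DZA", "DZA", "AGO", "AGO"],
    "Year": [2000, 2016, 2000, 2016],
    "life": [70.292, 76.078, 47.113, 61.547],
    "health": [3.489033, 6.603844, 1.908599, 2.713149],
    "lifegdp": [20.146558, 11.520259, 24.684593, 22.684710],
})

# Illustrative helper: the numeric columns whose change we want
value_cols = ["life", "health", "lifegdp"]

# One row per country for each year, indexed by country so the subtraction aligns on it
df_2016 = df.loc[df["Year"] == 2016].set_index("Country")[value_cols]
df_2000 = df.loc[df["Year"] == 2000].set_index("Country")[value_cols]

# 2016 minus 2000; turn the country index back into a regular column
newData = (df_2016 - df_2000).reset_index()
print(newData)
```

Because both frames are indexed by Country, pandas matches rows by country name rather than by position, so the two yearly frames do not need to list the countries in the same order.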

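User

#2 · edited 2024-05-19 12:34:51

Anurag Reddy's answer is a good, concise solution if you know the years in advance. As a more general alternative, this problem is a good example use case for pandas.DataFrame.diff.

Note that the example data is already in order, so you don't strictly need to sort it, but I've included a sort_values() line below to show how to handle an unsorted DataFrame.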

import pandas as pd

# Read the raw datafile in
df = pd.read_csv("example.csv")

# Sort the data if required (by country and year, so diff() subtracts the earlier year from the later one)
df.sort_values(by=["Country", "Year"], inplace=True)

# Remove columns where you don't need the difference
new_df = df.drop(["Code", "Year"], axis=1)

# Group the data by country, take the difference between the rows, remove NaN rows, and reset the index to sequential integers
new_df = new_df.groupby(["Country"], as_index=False).diff().dropna().reset_index(drop=True)

# Add back the country names and codes as columns in the new DataFrame
new_df.insert(loc=0, column="Country", value=df["Country"].unique())
new_df.insert(loc=1, column="Code", value=df["Code"].unique())
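
The same idea can be tried on a few rows without a CSV file. The following is a minimal sketch, assuming exactly two years per country; the rows are copied from the question's table and shuffled on purpose, and the names value_cols, changes and result are illustrative only. Instead of re-inserting the unique country names afterwards, it keeps Country and Code next to the differences by concatenating on the row index.

```python
import pandas as pd

# Rows copied from the question's table, deliberately out of order
df = pd.DataFrame({
    "Country": ["Zambia", "Algeria", "Zambia", "Algeria"],
    "Code": ["ZMB", "DZA", "ZMB", "DZA"],
    "Year": [2016, 2000, 2000, 2016],
    "life": [61.874, 70.292, 44.702, 76.078],
    "health": [4.477207, 3.489033, 7.152371, 6.603844],
    "lifegdp": [13.819775, 20.146558, 6.249955, 11.520259],
})

# Sort by country and year so diff() computes later year minus earlier year
df = df.sort_values(["Country", "Year"])

# Illustrative helper: the numeric columns to difference
value_cols = ["life", "health", "lifegdp"]

# Row-to-row difference within each country; the first row of each group is NaN
changes = df.groupby("Country")[value_cols].diff()

# Put the identifying columns back beside the differences (aligned on the row index),
# drop the NaN first-year rows, and renumber the rows
result = (
    pd.concat([df[["Country", "Code"]], changes], axis=1)
    .dropna()
    .reset_index(drop=True)
)
print(result)
```

Concatenating on the index avoids relying on the order of unique(), which would only line up with the differences if every country contributes exactly one row after dropna().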
