Return multiple rows for each row of a DataFrame

merged_df

   Full Name          Kommata 2007   Kommata 2015                Kommata 2019
0  Athanasios bouras  New democracy  New democracy               New democracy
1  Andreas loverdos   Pasok          Pasok-democratic alignment  Movement for change
2  Theodora tzakri    Pasok          Pasok                       Syriza
3  Thanasis zempilis  Pasok          NaN                         New democracy

edges_df

   Source                           Target
0  New democracy_2007               New democracy_2015
1  New democracy_2015               New democracy_2019
2  Pasok_2007                       Pasok-democratic alignment_2015
3  Pasok-democratic alignment_2015  Movement for change_2019
4  Pasok_2007                       Pasok_2015
5  Pasok_2015                       Syriza_2019
6  Pasok_2007                       New democracy_2019

1 answer

User

#1 · Posted 2024-10-01 02:23:41

You can do this with pd.melt:

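import pandas as pd

# Assumption: df starts out as a copy of the question's merged_df.
df = merged_df.copy()

# A list of columns to melt (every column except the first, 'Full Name').
value_cols = list(df.columns)[1:]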

# Melt said columns while leaving the others (in this case only 'Full Name') intact.
df = pd.melt(df, id_vars=['Full Name'], value_vars=value_cols)

# Get the year from 'variable'
df['variable'] = df['variable'].str.split(' ').str[-1]

# Sort the values by 'Full Name' and then year (required).
df = df.sort_values(by=['Full Name', 'variable'])

# Drop rows with empty values.
df = df.dropna()

df['Source'] = df['value'] + '_' + df['variable']

# Pair the values (This is why the previous sort is required).
df['Target'] = df['Source'].shift(-1)

# Remove rows where the values don't belong to the same name.
mask = df['Full Name'].eq(df['Full Name'].shift(-1).bfill())
df = df.loc[mask]

# Keep only relevant columns.
df = df.reindex(columns=['Source', 'Target'])

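I am assuming that the order of the output does not matter. The output of this code is sorted alphabetically by 'Full Name'.
If you need to preserve the order, modify the df.sort_values line so that it sorts by the original order of 'Full Name' rather than alphabetically.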
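
One possible way to keep the original row order, sketched here under some assumptions, is to turn 'Full Name' into an ordered categorical before sorting, so that sort_values follows the order of first appearance instead of the alphabet. The helper name build_edges and the variable name_order are hypothetical and not part of the answer above; the sample frame is copied from the question.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question's merged_df.
merged_df = pd.DataFrame({
    'Full Name': ['Athanasios bouras', 'Andreas loverdos',
                  'Theodora tzakri', 'Thanasis zempilis'],
    'Kommata 2007': ['New democracy', 'Pasok', 'Pasok', 'Pasok'],
    'Kommata 2015': ['New democracy', 'Pasok-democratic alignment',
                     'Pasok', np.nan],
    'Kommata 2019': ['New democracy', 'Movement for change',
                     'Syriza', 'New democracy'],
})


def build_edges(frame):
    """Hypothetical helper: same steps as the answer, original name order kept."""
    # Names in order of first appearance (assumption: this is the order wanted).
    name_order = frame['Full Name'].drop_duplicates().tolist()

    df = pd.melt(frame, id_vars=['Full Name'],
                 value_vars=list(frame.columns)[1:])
    df['variable'] = df['variable'].str.split(' ').str[-1]

    # An ordered categorical sorts by category order, not alphabetically.
    df['Full Name'] = pd.Categorical(df['Full Name'],
                                     categories=name_order, ordered=True)
    df = df.sort_values(by=['Full Name', 'variable']).dropna()

    df['Source'] = df['value'] + '_' + df['variable']
    df['Target'] = df['Source'].shift(-1)

    # Keep only pairs whose two ends belong to the same person.
    names = df['Full Name'].astype(str)
    mask = names.eq(names.shift(-1))
    return df.loc[mask, ['Source', 'Target']].reset_index(drop=True)


edges_df = build_edges(merged_df)
print(edges_df)
```

With the sample data this should yield the seven edges in the same order as the edges_df shown in the question. The sketch assumes each 'Full Name' identifies exactly one person; if two people share a name, their rows would be interleaved.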
