如何合并多个数据帧并使用Pandas为假人添加列？

df1 = pd.DataFrame({'id': ['01','02'], 'like_wave_1': ['red','red'], 'like_wave_2': ['red','yellow']}) print(df1) df2 = pd.DataFrame({'id': ['01','02'], 'dislike_wave1_yellow': ['7','2'], 'dislike_wave1_red':['1','1'], 'dislike_wave1_blue':['2','7'], 'dislike_wave2_yellow': ['7','1'], 'dislike_wave2_red':['1','2'], 'dislike_wave2_blue':['3','7']}) print(df2)

list_id = ['01','02'] list_color = ['yellow','red','blue'] list_wave = ['1','2'] expand = list(product(list_id, list_color, list_wave)) df = pd.DataFrame.from_records(expand, columns=['id', 'color', 'wave']) print(df) id color wave 0 01 yellow 1 1 01 yellow 2 2 01 red 1 3 01 red 2 4 01 blue 1 5 01 blue 2 6 02 yellow 1 7 02 yellow 2 8 02 red 1 9 02 red 2 10 02 blue 1 11 02 blue 2

id color wave like dislike 0 01 yellow 1 0 7 1 01 yellow 2 0 7 2 01 red 1 1 1 3 01 red 2 1 1 4 01 blue 1 0 2 5 01 blue 2 0 3 6 02 yellow 1 0 2 7 02 yellow 2 1 1 8 02 red 1 1 1 9 02 red 2 0 2 10 02 blue 1 0 7 11 02 blue 2 0 7

2条回答

网友

1楼 · 编辑于 2024-06-30 14:03:57

在合并之前，我们可以使用pivot_longerfrom pyjanitor来重塑各个数据帧：

left = (df1.pivot_longer('id', 
                         names_to=('.value', 'num'), 
                         names_pattern=r".+_(.+)_(\d$)")
           .rename(columns={"wave":"color",
                            "num":"wave"})
           .assign(like = 1)
         )

left
 
   id wave   color  like
0  01    1     red     1
1  02    1     red     1
2  01    2     red     1
3  02    2  yellow     1


right = (df2.pivot_longer('id',
                          names_to=(".value", "dislike", "color"), 
                          names_pattern = r".+_(.+)(\d)_(.+)", 
                          sort_by_appearance=True)
           .rename(columns = {"dislike":"wave", "wave":"dislike"})
          )

right
 
    id wave   color dislike
0   01    1  yellow       7
1   01    1     red       1
2   01    1    blue       2
3   01    2  yellow       7
4   01    2     red       1
5   01    2    blue       3
6   02    1  yellow       2
7   02    1     red       1
8   02    1    blue       7
9   02    2  yellow       1
10  02    2     red       2
11  02    2    blue       7

right.merge(left, how = 'outer').fillna(0)

    id wave   color dislike  like
0   01    1  yellow       7   0.0
1   01    1     red       1   1.0
2   01    1    blue       2   0.0
3   01    2  yellow       7   0.0
4   01    2     red       1   1.0
5   01    2    blue       3   0.0
6   02    1  yellow       2   0.0
7   02    1     red       1   1.0
8   02    1    blue       7   0.0
9   02    2  yellow       1   1.0
10  02    2     red       2   0.0
11  02    2    blue       7   0.0

网友

2楼 · 编辑于 2024-06-30 14:03:57

尝试将两个帧转换为与另一帧兼容的格式：

DF1

# Get df1 into usable format
df1 = df1.set_index('id')
# Create Multi Index by splitting columns on '_'
df1.columns = df1.columns.str.split('_', expand=True)
# Stack to create long format frame
df1 = df1.stack().reset_index()
# Fix column names to match df2/output
df1.columns = ['id', 'wave', 'color']
# Set like to 1 for these since this table indicates likes
df1['like'] = 1

df1：

   id wave   color  like
0  01    1     red     1
1  01    2     red     1
2  02    1     red     1
3  02    2  yellow     1

DF2

# Get df2 into usable format
# Set index to ID
df2 = df2.set_index('id')
# Create Multi Index by splitting columns on '_'
df2.columns = df2.columns.str.split('_', expand=True)
# Stack to create long format frame
df2 = df2.stack(level=[1, 2]).reset_index()
# Fix column names to match df1
df2.columns = ['id', 'wave', 'color', 'dislike']
# Turn "wave1" into 1, "wave2" into 2, ... etc.
df2['wave'] = df2['wave'].str.lstrip('wave')

df2：

    id wave   color dislike
0   01    1    blue       2
1   01    1     red       1
2   01    1  yellow       7
3   01    2    blue       3
4   01    2     red       1
5   01    2  yellow       7
6   02    1    blue       7
7   02    1     red       1
8   02    1  yellow       2
9   02    2    blue       7
10  02    2     red       2
11  02    2  yellow       1

然后merge将帧放在一起：

# Merge On Common Columns
df3 = df1.merge(df2, on=['id', 'wave', 'color'], how='outer')

# Fill empty values in like and dislike with 0 (only 1s in source DF1)
# (Fill dislikes in case there are likes in df1 that are not dislikes in df2)
df3[['like', 'dislike']] = df3[['like', 'dislike']].fillna(0).astype(int)

# Sort Values and fix index (to match output in question)
df3 = df3.sort_values(
    ['id', 'color'], ascending=[True, False]
).reset_index(drop=True)

df3：

    id wave   color  like dislike
0   01    1  yellow     0       7
1   01    2  yellow     0       7
2   01    1     red     1       1
3   01    2     red     1       1
4   01    1    blue     0       2
5   01    2    blue     0       3
6   02    1  yellow     0       2
7   02    2  yellow     1       1
8   02    1     red     1       1
9   02    2     red     0       2
10  02    1    blue     0       7
11  02    2    blue     0       7

相关问题更多 >

编程相关推荐

热门问题

热门文章