Pandas: conditionally merging/joining DataFrames on a duplicated ID

+--------+------------+
| GlobID | Issue      |
+--------+------------+
| 1      | Building M |
+--------+------------+
| 2      | Building V |
+--------+------------+
| 3      | Building H |
+--------+------------+

+----+---------+---------+------------+---------+---------+------------+
| ID | Issue_A | Note_A  | Location_A | Issue_B | Note_B  | Location_B |
+----+---------+---------+------------+---------+---------+------------+
| 1  | Y       | broken  | bathroom   | N       |         |            |
+----+---------+---------+------------+---------+---------+------------+
| 2  | Y       | stained | bedroom    | Y       | rusty   | basement   |
+----+---------+---------+------------+---------+---------+------------+
| 3  | Y       | missing | kitchen    | Y       | cracked | attic      |
+----+---------+---------+------------+---------+---------+------------+

2 answers

User

#1 · edited 2024-06-01 09:15:55

You can use an approach like this:

import pandas as pd

df1 = pd.DataFrame([[1,"Building M"],[2,"Building V"], [3, "Building H"]], columns=["GlobID","Issue"])
df2 = pd.DataFrame([[1,"Y","broken","bathroom","N","",""],
                    [2,"Y","stained","bedroom","Y","rusty","basement"],
                    [3,"Y","missing","kitchen","Y","cracked","attic"]], 
                   columns=["ID","Issue_A","Note_A", "Location_A", "Issue_B", "Note_B", "Location_B"])

df1 = df1.set_index("GlobID")
df2 = df2.set_index("ID")

# split df2 into a list of data frames, one per issue suffix ("A", "B")
issues = ["A", "B"]
description = ["Issue", "Note", "Location"]
delimiter = "_"
issues_df_list = []
for issue in issues:
    # prepare concrete issue description fields
    issue_labels = [descr + delimiter + issue for descr in description]
    # select sub df for each issue
    df = df2[issue_labels]
    # rename and unify columns labels
    df.columns = description
    # then add sub df to the df list
    issues_df_list.append(df)

# then concat list of dfs to one big df
issues_df = pd.concat(issues_df_list, sort=False)  # stacks the per-issue frames into long form

# drop rows with "N" values
issues_df = issues_df[issues_df["Issue"] != "N"]

# drop Issue column
issues_df = issues_df.loc[:,issues_df.columns != "Issue"]

# rename Note column label to the Issue 
issues_df = issues_df.rename(columns={"Note":"Issue"})

issues_df

This gives you:

+----+---------+----------+
|    |  Issue  | Location |
+----+---------+----------+
| ID |         |          |
| 1  | broken  | bathroom |
| 2  | stained | bedroom  |
| 3  | missing | kitchen  |
| 2  | rusty   | basement |
| 3  | cracked | attic    |
+----+---------+----------+
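The reshaping loop above can also be sketched with `pd.wide_to_long`, which splits column names such as `Note_A` into a stub (`Note`) and a suffix (`A`). This is only an alternative sketch, not the answerer's method; the level name `slot` and the variable names `long_df` and `issues_long` are made up for illustration, and the row order may differ from the table above.

```python
import pandas as pd

df2 = pd.DataFrame(
    [[1, "Y", "broken", "bathroom", "N", "", ""],
     [2, "Y", "stained", "bedroom", "Y", "rusty", "basement"],
     [3, "Y", "missing", "kitchen", "Y", "cracked", "attic"]],
    columns=["ID", "Issue_A", "Note_A", "Location_A",
             "Issue_B", "Note_B", "Location_B"],
)

# Reshape wide -> long: one row per (ID, suffix) pair.
# "slot" is a hypothetical name for the index level holding "A"/"B".
long_df = pd.wide_to_long(
    df2,
    stubnames=["Issue", "Note", "Location"],
    i="ID",
    j="slot",
    sep="_",
    suffix=r"[A-Z]+",
)

# Keep only the rows flagged "Y", drop the flag column,
# and relabel the note text as "Issue", as in the answer above.
issues_long = (
    long_df[long_df["Issue"] == "Y"]
    .drop(columns="Issue")
    .rename(columns={"Note": "Issue"})
    .reset_index(level="slot", drop=True)
)

print(issues_long)
```

The result is indexed by `ID` with the columns `Issue` and `Location`, so it can stand in for `issues_df` in the merge step.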

Then you can do a simple merge:

pd.merge(df1.rename(columns={"Issue":"Name"}), issues_df, left_index=True, right_index=True)

+---+------------+---------+----------+
|   |    Name    |  Issue  | Location |
+---+------------+---------+----------+
| 1 | Building M | broken  | bathroom |
| 2 | Building V | stained | bedroom  |
| 2 | Building V | rusty   | basement |
| 3 | Building H | missing | kitchen  |
| 3 | Building H | cracked | attic    |
+---+------------+---------+----------+
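Because one building can match several issue rows, the merge above is one-to-many. As a hedged sketch, the `validate` argument of `pd.merge` can make that expectation explicit. The frames below are rebuilt by hand from the tables in this answer, and the names `buildings`, `merged`, and `one_to_one_failed` are illustrative only.

```python
import pandas as pd

# Left side: one row per building, unique index.
buildings = pd.DataFrame(
    {"Name": ["Building M", "Building V", "Building H"]},
    index=pd.Index([1, 2, 3], name="GlobID"),
)

# Right side: one row per reported issue, so IDs 2 and 3 repeat.
issues_df = pd.DataFrame(
    {"Issue": ["broken", "stained", "missing", "rusty", "cracked"],
     "Location": ["bathroom", "bedroom", "kitchen", "basement", "attic"]},
    index=pd.Index([1, 2, 3, 2, 3], name="ID"),
)

# validate="one_to_many" checks that the left index is unique
# while allowing repeated keys on the right.
merged = pd.merge(
    buildings, issues_df,
    left_index=True, right_index=True,
    validate="one_to_many",
)
print(merged)

# Claiming a one-to-one relationship is rejected, because the
# issue frame contains duplicated IDs.
one_to_one_failed = False
try:
    pd.merge(buildings, issues_df,
             left_index=True, right_index=True,
             validate="one_to_one")
except pd.errors.MergeError:
    one_to_one_failed = True
print("one-to-one check failed:", one_to_one_failed)
```

If duplicated building IDs ever crept into the left frame, the `one_to_many` check would raise instead of silently multiplying rows.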

User

#2 · edited 2024-06-01 09:15:55

Here is a simple way to solve the problem:

import numpy as np
import pandas as pd

df1 = pd.DataFrame([[1, "Building M"], [2, "Building V"], [3, "Building H"]], columns=["id", "Issue"])
df2 = pd.DataFrame([[1, "Y", "broken", "bathroom", "N", np.nan, np.nan], [2,"Y", "stained", "bedroom", "Y", "rusty", "basement"], [3, "Y", "missing", "kitchen", "Y", "cracked", "attic"]], columns=["id", "Issue_A", "Note_A", "Location_A", "Issue_B", "Note_B", "Location_B"])

# stack the B columns under the A columns, then drop rows that contain NaN
df2 = pd.concat([df2[["id", "Issue_A", "Location_A"]], df2[["id", "Issue_B", "Location_B"]].rename(columns={"Issue_B" : "Issue_A", "Location_B" : "Location_A" })]).dropna()

# left join on the only shared column, "id"
df_result = pd.merge(df1, df2, how="left")

print(df_result)
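Note that the code above keeps the `Issue_A` flag column ("Y") and leaves out the note text. If the goal is the same table as in the first answer (an assumption, since the desired output is not shown here), a variant along these lines would carry the `Note_*` columns instead. The names `parts`, `stacked`, and `result` are illustrative, and `df1`'s `Issue` column is renamed to `Name` so it does not clash with the note text.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    [[1, "Building M"], [2, "Building V"], [3, "Building H"]],
    columns=["id", "Issue"],
)
df2 = pd.DataFrame(
    [[1, "Y", "broken", "bathroom", "N", np.nan, np.nan],
     [2, "Y", "stained", "bedroom", "Y", "rusty", "basement"],
     [3, "Y", "missing", "kitchen", "Y", "cracked", "attic"]],
    columns=["id", "Issue_A", "Note_A", "Location_A",
             "Issue_B", "Note_B", "Location_B"],
)

parts = []
for suffix in ("A", "B"):
    # Select only the rows flagged "Y" for this suffix and keep
    # the note and location, not the flag itself.
    flagged = df2[f"Issue_{suffix}"] == "Y"
    part = df2.loc[flagged, ["id", f"Note_{suffix}", f"Location_{suffix}"]]
    part = part.rename(columns={f"Note_{suffix}": "Issue",
                                f"Location_{suffix}": "Location"})
    parts.append(part)

# One row per reported issue, with repeated ids where a building has several.
stacked = pd.concat(parts, ignore_index=True)

# Join explicitly on "id"; the building label becomes "Name".
result = df1.rename(columns={"Issue": "Name"}).merge(stacked, on="id", how="left")
print(result)
```

Filtering on the "Y" flag replaces the `dropna()` call, so a flagged issue with a missing location would still be kept.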
