Combining rows in a DataFrame

df =

       s       token    pred  tokenID
      17    hakawati   B-Loc        3
      17     theatre   L-Loc        3
      17   jerusalem   U-Loc        7
      56  university   B-Org        5
      56          of   I-Org        5
      56       texas   I-Org        5
      56        here   L-Org        6
     ...
    5402      dwight  B-Peop        1
    5402          d.  I-Peop        1
    5402  eisenhower  L-Peop        1

df2 =

       s                     token          pred
      17          hakawati theatre      Location
      17                 jerusalem      Location
      56  university of texas here  Organisation
     ...
    5402      dwight d. eisenhower        People

2 answers

User

#1 · edited 2024-09-21 05:22:25

You can group by both s and tokenID and aggregate as follows:

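import pandas as pd

def aggregate(df):
    # Join the tokens of one (s, tokenID) group into a single string.
    token = " ".join(df.token)
    # Take the first row's tag and drop its prefix, e.g. "B-Loc" -> "Loc".
    pred = df.iloc[0].pred.split("-", 1)[1]
    return pd.Series({"token": token, "pred": pred})

df.groupby(["s", "tokenID"]).apply(aggregate)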

# Output
                             token  pred
s    tokenID                            
17   3            hakawati theatre   Loc
     7                   jerusalem   Loc
56   5         university of texas   Org
     6                        here   Org
5402 1        dwight d. eisenhower  Peop
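
The same grouping can also be written with named aggregation instead of apply. The following is only a sketch: the DataFrame is rebuilt from the rows shown in the question (the elided "..." rows are left out), and the result name res is an illustrative choice.

```python
import pandas as pd

# Sample rows copied from the question's df; the elided rows are omitted.
df = pd.DataFrame({
    "s": [17, 17, 17, 56, 56, 56, 56, 5402, 5402, 5402],
    "token": ["hakawati", "theatre", "jerusalem", "university", "of",
              "texas", "here", "dwight", "d.", "eisenhower"],
    "pred": ["B-Loc", "L-Loc", "U-Loc", "B-Org", "I-Org",
             "I-Org", "L-Org", "B-Peop", "I-Peop", "L-Peop"],
    "tokenID": [3, 3, 7, 5, 5, 5, 6, 1, 1, 1],
})

# Named aggregation: join the tokens of each (s, tokenID) group, and keep
# the first tag of the group with its prefix ("B-", "I-", ...) removed.
res = df.groupby(["s", "tokenID"]).agg(
    token=("token", " ".join),
    pred=("pred", lambda p: p.iloc[0].split("-", 1)[1]),
)
print(res)
```

As with the apply version, "here" stays in its own row because its tokenID (6) differs from that of "university of texas" (5).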

User

#2 · edited 2024-09-21 05:22:25

One solution uses a helper column:

df['pred_cat'] = df['pred'].str.split('-').str[-1]

res = df.groupby(['s', 'pred_cat'])['token']\
        .apply(' '.join).reset_index()

print(res)

      s pred_cat                       token
0    17      Loc  hakawati theatre jerusalem
1    56      Org    university of texas here
2  5402     Peop        dwight d. eisenhower

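Note that this does not exactly match the desired output. Grouping on s and pred_cat merges "jerusalem" into "hakawati theatre", because both are Loc rows with s = 17, whereas df2 keeps them as separate rows, and the labels stay abbreviated (Loc rather than Location). Reproducing df2 appears to need some data-specific handling.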
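
One possible form of that data-specific handling, offered as a sketch rather than a confirmed solution: if the pred prefixes follow a BILOU-style scheme (an assumption, not stated in the question), every "B-" or "U-" row opens a new entity, so a cumulative sum over those rows gives a group key. The names entity, label and label_names are illustrative, and the mapping to full label names is guessed from the df2 rows shown.

```python
import pandas as pd

# Sample rows copied from the question's df; the elided rows are omitted.
df = pd.DataFrame({
    "s": [17, 17, 17, 56, 56, 56, 56, 5402, 5402, 5402],
    "token": ["hakawati", "theatre", "jerusalem", "university", "of",
              "texas", "here", "dwight", "d.", "eisenhower"],
    "pred": ["B-Loc", "L-Loc", "U-Loc", "B-Org", "I-Org",
             "I-Org", "L-Org", "B-Peop", "I-Peop", "L-Peop"],
    "tokenID": [3, 3, 7, 5, 5, 5, 6, 1, 1, 1],
})

# Assumption: BILOU-style tags, so "B-" and "U-" rows each start an entity.
starts = df["pred"].str[0].isin(["B", "U"])
entity_id = starts.cumsum()  # running counter: one id per entity

# Hypothetical mapping from the short labels to the names used in df2.
label_names = {"Loc": "Location", "Org": "Organisation", "Peop": "People"}

df2 = (
    df.assign(entity=entity_id,
              label=df["pred"].str.split("-", n=1).str[1])
      .groupby(["s", "entity"], sort=False)
      .agg(token=("token", " ".join), pred=("label", "first"))
      .reset_index()
      .drop(columns="entity")
)
df2["pred"] = df2["pred"].map(label_names)
print(df2)
```

On the sample rows this keeps "jerusalem" separate from "hakawati theatre" and joins "here" onto "university of texas", which tokenID-based grouping does not. It relies on the rows being in token order and on every entity starting with a "B-" or "U-" tag.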
