Creating sentences from a DataFrame

pos   sentence_idx   word
NNS   1.0            Thousands
IN    1.0            of
NNS   1.0            demonstrators
VBP   1.0            have
VBN   1.0            marched
...   ...            ...
PRP   47959.0        they
VBD   47959.0        responded
TO    47959.0        to
DT    47959.0        the
NN    47959.0        attack

[[('Thousands', 'NNS'), ('of', 'IN'), ('demonstrators', 'NNS'), ('have', 'VBP'), ('marched', 'VBN'), ('through', 'IN'), ('London', 'NNP'), ('to', 'TO'), ('protest', 'VB'), ('the', 'DT'), ('war', 'NN'), ('in', 'IN'), ('Iraq', 'NNP'), ('and', 'CC'), ('demand', 'VB'), ('withdrawal', 'NN'), ('British', 'JJ'), ('troops', 'NNS'), ('from', 'IN'), ('that', 'DT'), ('country', 'NN'), ('.', '.')], [('Families', 'NNS'), ('of', 'IN'), ('soldiers', 'NNS'), ('killed', 'VBN'), ('in', 'IN'), ('the', 'DT'), ('conflict', 'NN'), ('joined', 'VBD'), ('protesters', 'NNS'), ('who', 'WP'), ('carried', 'VBD'), ('banners', 'NNS'), ('with', 'IN'), ('such', 'JJ'), ('slogans', 'NNS'), ('as', 'IN'), ('"', '``'), ('Bush', 'NNP'), ('Number', 'NN'), ('One', 'CD'), ('Terrorist', 'NN'), ('and', 'CC'), ('Stop', 'VB'), ('Bombings', 'NNS'), ('.', '.')],...

2 Answers

User

#1 · edited 2024-06-24 12:32:04

This should work:

def compute(group):
    # Pair each word with its POS tag for one sentence.
    return list(zip(group['word'], group['pos']))

df.groupby('sentence_idx').apply(compute).values.tolist()
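To make the approach above easy to try, here is a minimal sketch on a toy DataFrame. The sample rows and the helper name `to_pairs` are assumptions for illustration, not the original data. Selecting `['word', 'pos']` before `apply` is a small variation on the answer that keeps the grouping column out of each group.

```python
import pandas as pd

# Toy data with the same three columns as the question (rows are illustrative).
df = pd.DataFrame({
    "pos": ["NNS", "IN", "NNS", "PRP", "VBD"],
    "sentence_idx": [1.0, 1.0, 1.0, 2.0, 2.0],
    "word": ["Thousands", "of", "demonstrators", "they", "responded"],
})

def to_pairs(group):
    # Pair each word with its POS tag, keeping row order within the sentence.
    return list(zip(group["word"], group["pos"]))

# One list of (word, pos) tuples per sentence_idx value.
sentences = df.groupby("sentence_idx")[["word", "pos"]].apply(to_pairs).tolist()
print(sentences)
```

Each inner list corresponds to one `sentence_idx` value, in sorted key order.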

User

#2 · edited 2024-06-24 12:32:04

I'm not sure about efficiency, but here are a few ways to do this:

df.groupby('sentence_idx')[['word', 'pos']].apply(lambda x: list(zip(*zip(*x.values.tolist())))).tolist()

df.groupby('sentence_idx').apply(lambda x: x[['word', 'pos']].apply(tuple, axis=1).tolist()).tolist()

df.groupby('sentence_idx').apply(lambda x: [tuple(y) for y in x[['word', 'pos']].values]).tolist()
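As a quick sanity check, the three tuple-producing variants can be compared on a small sample. This is only a sketch: the toy rows and the variable names `a`, `b`, `c` are assumptions, and the columns are selected before `apply` in all three so the lambdas receive just `word` and `pos`.

```python
import pandas as pd

# Toy data shaped like the question's DataFrame (rows are illustrative).
df = pd.DataFrame({
    "pos": ["VBP", "VBN", "DT", "NN"],
    "sentence_idx": [1.0, 1.0, 47959.0, 47959.0],
    "word": ["have", "marched", "the", "attack"],
})

grouped = df.groupby("sentence_idx")[["word", "pos"]]

# Variant 1: transpose twice with zip to turn row lists into row tuples.
a = grouped.apply(lambda x: list(zip(*zip(*x.values.tolist())))).tolist()
# Variant 2: build one tuple per row with a row-wise apply.
b = grouped.apply(lambda x: x.apply(tuple, axis=1).tolist()).tolist()
# Variant 3: convert each row of the underlying array to a tuple.
c = grouped.apply(lambda x: [tuple(y) for y in x.values]).tolist()

print(a == b == c)
```

On this sample all three produce the same nested list of `(word, pos)` tuples, so the choice between them comes down to readability and speed on the real data.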

If you don't strictly need tuples (i.e. lists are fine), it is much simpler:

df.groupby('sentence_idx').apply(lambda x: x[['word', 'pos']].values.tolist()).tolist()
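For comparison, a small sketch of what the list-based version returns, and how it could be converted to tuples afterwards if they turn out to be needed. The toy rows and the names `rows` and `as_tuples` are assumptions for illustration.

```python
import pandas as pd

# Toy data shaped like the question's DataFrame (rows are illustrative).
df = pd.DataFrame({
    "pos": ["VBP", "VBN", "DT", "NN"],
    "sentence_idx": [1.0, 1.0, 47959.0, 47959.0],
    "word": ["have", "marched", "the", "attack"],
})

# Each sentence becomes a list of [word, pos] lists rather than tuples.
rows = (
    df.groupby("sentence_idx")[["word", "pos"]]
    .apply(lambda x: x.values.tolist())
    .tolist()
)

# Optional: convert the inner pairs to tuples later in plain Python.
as_tuples = [[tuple(pair) for pair in sentence] for sentence in rows]
print(rows)
print(as_tuples)
```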
