<p>Suppose I have the following dataset:</p>
<pre><code>pos  sentence_idx  word
NNS  1.0           Thousands
IN   1.0           of
NNS  1.0           demonstrators
VBP  1.0           have
VBN  1.0           marched
...  ...           ...
PRP  47959.0       they
VBD  47959.0       responded
TO   47959.0       to
DT   47959.0       the
NN   47959.0       attack
</code></pre>
<p>I want to build the sentences (for which I have to use <code>sentence_idx</code>). I can do this with the following code:</p>
<pre><code>sent = []
for i in df['sentence_idx'].unique():
    sent.append([(w, t) for w, t in zip(df[df['sentence_idx'] == i]['word'].values.tolist(), df[df['sentence_idx'] == i]['pos'].values.tolist())])
</code></pre>
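<p>However, this is inefficient (it uses a Python for loop and re-filters the whole DataFrame for every sentence instead of relying on numpy/pandas functions), and it also looks ugly.
How can I do this more efficiently?</p>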
<p><strong>Edit:</strong>
The result should be a list of sentences, where each sentence is a list of (word, POS tag) tuples:</p>
<pre><code>[[('Thousands', 'NNS'),
('of', 'IN'),
('demonstrators', 'NNS'),
('have', 'VBP'),
('marched', 'VBN'),
('through', 'IN'),
('London', 'NNP'),
('to', 'TO'),
('protest', 'VB'),
('the', 'DT'),
('war', 'NN'),
('in', 'IN'),
('Iraq', 'NNP'),
('and', 'CC'),
('demand', 'VB'),
('withdrawal', 'NN'),
('British', 'JJ'),
('troops', 'NNS'),
('from', 'IN'),
('that', 'DT'),
('country', 'NN'),
('.', '.')],
[('Families', 'NNS'),
('of', 'IN'),
('soldiers', 'NNS'),
('killed', 'VBN'),
('in', 'IN'),
('the', 'DT'),
('conflict', 'NN'),
('joined', 'VBD'),
('protesters', 'NNS'),
('who', 'WP'),
('carried', 'VBD'),
('banners', 'NNS'),
('with', 'IN'),
('such', 'JJ'),
('slogans', 'NNS'),
('as', 'IN'),
('"', '``'),
('Bush', 'NNP'),
('Number', 'NN'),
('One', 'CD'),
('Terrorist', 'NN'),
('and', 'CC'),
('Stop', 'VB'),
('Bombings', 'NNS'),
('.', '.')],...
</code></pre>
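<p>For reference, one possible way to get a result of this shape is to group the rows by <code>sentence_idx</code> once, rather than filtering the DataFrame once per sentence. This is only a minimal sketch: the small <code>df</code> below is a hypothetical stand-in for the real data, and it still uses a list comprehension over the groups, so it is not fully vectorized.</p>

```python
import pandas as pd

# Hypothetical miniature of the dataset above, with the same three columns.
df = pd.DataFrame({
    "pos": ["NNS", "IN", "NNS", "PRP", "VBD"],
    "sentence_idx": [1.0, 1.0, 1.0, 2.0, 2.0],
    "word": ["Thousands", "of", "demonstrators", "they", "responded"],
})

# Split the frame into per-sentence groups in a single pass.
# sort=False keeps sentences in order of first appearance, like unique().
sent = [
    list(zip(group["word"], group["pos"]))
    for _, group in df.groupby("sentence_idx", sort=False)
]

print(sent)
```

<p>Note that <code>groupby</code> drops rows whose <code>sentence_idx</code> is NaN by default, so such rows would need to be handled separately if they exist.</p>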