<p>If I understand correctly, you can do this with built-in pandas functions: <a href="https://pandas.pydata.org/docs/reference/api/pandas.Series.str.count.html" rel="nofollow noreferrer"><strong><code>str.count</code></strong></a> to count the <code>queries</code>, and <a href="https://pandas.pydata.org/docs/reference/api/pandas.melt.html" rel="nofollow noreferrer"><strong><code>melt</code></strong></a> to reshape into the final column structure.</p>
<p>Given a sample <code>df</code>:</p>
<pre class="lang-py prettyprint-override"><code>import pandas as pd

df = pd.DataFrame({'docid': {0: 0, 1: 0, 2: 0}, 'title': {0: 'A', 1: 'A', 2: 'A'}, 'lineid': {0: 0, 1: 1, 2: 2}, 'text': {0: 'shopping and orders have become more com...', 1: 'people wrote to the postal service online...', 2: 'text updates really from the U.S. Postal...'}, 'tokencount': {0: 66, 1: 67, 2: 43}})
# docid title lineid text tokencount
# 0 0 A 0 shopping and orders have become more com... 66
# 1 0 A 1 people wrote to the postal service online... 67
# 2 0 A 2 text updates really from the U.S. Postal... 43
</code></pre>
<p>First, count the <code>queries</code> with <code>str.count</code>:</p>
<pre class="lang-py prettyprint-override"><code>queries = ['order', 'shop', 'text']
df = df.assign(**{f'query_{query}': df.text.str.count(query) for query in queries})
# docid title lineid text tokencount query_order query_shop query_text
# 0 0 A 0 shopping and orders have become more com... 66 1 1 0
# 1 0 A 1 people wrote to the postal service online... 67 0 0 0
# 2 0 A 2 text updates really from the U.S. Postal... 43 0 0 1
</code></pre>
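<p>One thing to keep in mind: <code>str.count</code> treats each query as a regular expression and counts substring matches, so <code>'order'</code> also matches inside <code>'orders'</code>. If that matters for your data, a minimal sketch of the difference might look like this (the sample strings and the variable names <code>raw</code>, <code>literal</code>, <code>substring</code> and <code>whole_word</code> are made up for illustration):</p>

```python
import re

import pandas as pd

# Made-up sample strings, not taken from the question's data.
text = pd.Series(['U.S. Postal orders', 'UPS reorder', 'one order'])

# Unescaped pattern: '.' matches any character, so 'U.S' also matches 'UPS'.
raw = text.str.count('U.S')

# Escaped pattern: only the literal string 'U.S' is counted.
literal = text.str.count(re.escape('U.S'))

# Plain substring match: counts 'order' inside 'orders' and 'reorder' too.
substring = text.str.count('order')

# Word boundaries: count 'order' only when it stands alone as a word.
whole_word = text.str.count(r'\border\b')

print(raw.tolist(), literal.tolist(), substring.tolist(), whole_word.tolist())
```

<p>Whether substring or whole-word counting is the right choice depends on what you want a "lemma" match to mean, so treat this as an option rather than a required change.</p>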
<p>Then <a href="https://pandas.pydata.org/docs/reference/api/pandas.melt.html" rel="nofollow noreferrer"><strong><code>melt</code></strong></a> into the final column structure:</p>
<pre class="lang-py prettyprint-override"><code>df.melt(
    id_vars=['title', 'lineid'],
    value_vars=[f'query_{query}' for query in queries],
    var_name='lemma',
    value_name='count',
).replace(r'^query_', '', regex=True)
# title lineid lemma count
# 0 A 0 order 1
# 1 A 1 order 0
# 2 A 2 order 0
# 3 A 0 shop 1
# 4 A 1 shop 0
# 5 A 2 shop 0
# 6 A 0 text 0
# 7 A 1 text 0
# 8 A 2 text 1
</code></pre>
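<p>If you need this more than once, the two steps could be wrapped in a small helper. This is only a sketch: <code>count_queries</code>, its parameters and the shortened sample rows are hypothetical names and data introduced here, not part of pandas or of the question. It strips the <code>query_</code> prefix from the <code>lemma</code> column only, so other string columns such as <code>title</code> are never touched by the prefix removal.</p>

```python
import pandas as pd


def count_queries(df, queries, text_col='text', id_vars=('title', 'lineid')):
    """Hypothetical helper: count each query per row and return long format."""
    prefix = 'query_'
    # One temporary count column per query; the prefix avoids clashing
    # with existing columns (e.g. a query named 'text').
    counted = df.assign(
        **{f'{prefix}{q}': df[text_col].str.count(q) for q in queries}
    )
    long = counted.melt(
        id_vars=list(id_vars),
        value_vars=[f'{prefix}{q}' for q in queries],
        var_name='lemma',
        value_name='count',
    )
    # Remove the prefix from the lemma column only.
    long['lemma'] = long['lemma'].str[len(prefix):]
    return long


# Shortened, made-up rows in the same shape as the sample df.
sample = pd.DataFrame({
    'title': ['A', 'A', 'A'],
    'lineid': [0, 1, 2],
    'text': ['shopping and orders', 'people wrote to the postal service', 'text updates'],
})
result = count_queries(sample, ['order', 'shop', 'text'])
print(result)
```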