<p>The first idea is to use a lambda function with <a href="http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.len.html" rel="nofollow noreferrer"><code>Series.str.len</code></a> and <code>max</code>:</p>
<pre><code>df = (df.groupby('source')['text_column']
        .agg(lambda x: x.str.len().max())
        .reset_index(name='something'))
print(df)

  source  something
0      a        9.0
1      b       14.0
2      c        9.0
</code></pre>
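<p>To make the snippet above reproducible, here is a minimal sketch. The sample <code>df</code> is an assumption (the question's data is not shown here); it is built only so that it matches the outputs printed in this answer. The missing value in group <code>a</code> is also the likely reason the result is float: <code>str.len()</code> returns <code>NaN</code> for missing strings.</p>

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, not taken from the original question.
df = pd.DataFrame({
    'source': ['a', 'a', 'b', 'c', 'a', 'c', 'b', 'b'],
    'text_column': ['abcdefghi', 'abcde', 'qwertyiop', 'plmnkoijb',
                    np.nan, 'abcde', 'qaz', 'qazxswedcdcvfr'],
})

# Longest string length per group; the NaN row forces a float result.
out = (df.groupby('source')['text_column']
         .agg(lambda x: x.str.len().max())
         .reset_index(name='something'))
print(out)
```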
<p>Or you can call <a href="http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.len.html" rel="nofollow noreferrer"><code>Series.str.len</code></a> first and then aggregate with <code>max</code>:</p>
<pre><code>df = (df['text_column'].str.len()
        .groupby(df['source'])
        .max()
        .reset_index(name='something'))
print(df)
</code></pre>
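<p>No output is shown for this variant, so the following sketch checks that both approaches give the same lengths. The sample <code>df</code> and the names <code>via_lambda</code> and <code>via_len_first</code> are assumptions made for illustration.</p>

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, not taken from the original question.
df = pd.DataFrame({
    'source': ['a', 'a', 'b', 'c', 'a', 'c', 'b', 'b'],
    'text_column': ['abcdefghi', 'abcde', 'qwertyiop', 'plmnkoijb',
                    np.nan, 'abcde', 'qaz', 'qazxswedcdcvfr'],
})

# Approach 1: compute the lengths inside each group with a lambda.
via_lambda = (df.groupby('source')['text_column']
                .agg(lambda x: x.str.len().max())
                .reset_index(name='something'))

# Approach 2: compute all lengths once, then group by the source column.
via_len_first = (df['text_column'].str.len()
                   .groupby(df['source'])
                   .max()
                   .reset_index(name='something'))

print(via_len_first)
same = via_lambda['something'].tolist() == via_len_first['something'].tolist()
print(same)
```

<p>The second form avoids calling a Python lambda once per group, which may matter when there are many groups.</p>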
<p>If you need integers, drop the missing values first with <a href="http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dropna.html" rel="nofollow noreferrer"><code>DataFrame.dropna</code></a>:</p>
<pre><code>df = (df.dropna(subset=['text_column'])
        .assign(text_column=lambda x: x['text_column'].str.len())
        .groupby('source', as_index=False)['text_column']
        .max())
print(df)

  source  text_column
0      a            9
1      b           14
2      c            9
</code></pre>
<p>EDIT: For the top two values per group, use <a href="http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sort_values.html" rel="nofollow noreferrer"><code>DataFrame.sort_values</code></a> with <a href="http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.groupby.GroupBy.head.html" rel="nofollow noreferrer"><code>GroupBy.head</code></a>:</p>
<pre><code>df1 = (df.dropna(subset=['text_column'])
         .assign(something=lambda x: x['text_column'].str.len())
         .sort_values(['source','something'], ascending=[True, False])
         .groupby('source', as_index=False)
         .head(2))
print(df1)

  source     text_column  something
0      a       abcdefghi          9
1      a           abcde          5
7      b  qazxswedcdcvfr         14
2      b       qwertyiop          9
3      c       plmnkoijb          9
5      c           abcde          5
</code></pre>
<p>An alternative solution with <a href="http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.groupby.SeriesGroupBy.nlargest.html" rel="nofollow noreferrer"><code>SeriesGroupBy.nlargest</code></a>, which is noticeably slower:</p>
<pre><code>df1 = (df.dropna(subset=['text_column'])
         .assign(something=lambda x: x['text_column'].str.len())
         .groupby('source')['something']
         .nlargest(2)
         .reset_index(level=1, drop=True)
         .reset_index())
print(df1)

  source  something
0      a          9
1      a          5
2      b         14
3      b          9
4      c          9
5      c          5
</code></pre>
<p>Finally, a solution that puts the results in new <code>top1</code> and <code>top2</code> columns:</p>
<pre><code>df = (df.dropna(subset=['text_column'])
        .assign(something=lambda x: x['text_column'].str.len()))
df = df.sort_values(['source','something'], ascending=[True, False])
# number the rows within each group: 1 = longest, 2 = second longest
df['g'] = df.groupby('source').cumcount().add(1)
# keyword arguments: pivot no longer accepts positional ones in pandas 2.0+
df = (df[df['g'].le(2)].pivot(index='source', columns='g', values='something')
        .add_prefix('top')
        .rename_axis(index=None, columns=None))
print(df)

   top1  top2
a     9     5
b    14     9
c     9     5
</code></pre>
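<p>The last step can be tried end to end with the sketch below. The sample <code>df</code> and the names <code>lengths</code> and <code>top</code> are assumptions for illustration, and <code>pivot</code> is called with keyword arguments, which pandas 2.0 and later require.</p>

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, not taken from the original question.
df = pd.DataFrame({
    'source': ['a', 'a', 'b', 'c', 'a', 'c', 'b', 'b'],
    'text_column': ['abcdefghi', 'abcde', 'qwertyiop', 'plmnkoijb',
                    np.nan, 'abcde', 'qaz', 'qazxswedcdcvfr'],
})

# Drop missing strings, add the lengths, and sort longest first within each source.
lengths = (df.dropna(subset=['text_column'])
             .assign(something=lambda x: x['text_column'].str.len())
             .sort_values(['source', 'something'], ascending=[True, False]))

# Rank rows within each group: 1 = longest, 2 = second longest, and so on.
lengths['g'] = lengths.groupby('source').cumcount().add(1)

# Keep the top two ranks and spread them into top1/top2 columns.
top = (lengths[lengths['g'].le(2)]
       .pivot(index='source', columns='g', values='something')
       .add_prefix('top')
       .rename_axis(index=None, columns=None))
print(top)
```

<p>If a group has fewer than two non-missing strings, its <code>top2</code> cell would be <code>NaN</code> and the column would become float.</p>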