<p>Let's try <code>groupby</code> and <code>reduce</code>:</p>
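<pre><code>from functools import reduce
import pandas as pd

# One all-ones adjacency frame per user, labelled by that user's slugs
dfs = [pd.DataFrame(1, index=list(s), columns=list(s))
       for _, s in df.groupby('user_id')['page_view_page_slug']]
# Sum the frames with label alignment; cells absent from every frame become 0
df_out = reduce(lambda x, y: x.add(y, fill_value=0), dfs).fillna(0).astype(int)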
</code></pre>
<p><strong>Details:</strong></p>
<p><a href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html" rel="nofollow noreferrer">Group</a> the dataframe on <code>user_id</code>, then for each group in <code>page_view_page_slug</code> create an adjacency dataframe whose index and columns are the <code>slugs</code> in that group.</p>
<pre><code>>>> dfs
[       slug1  slug2  slug3  slug4
 slug1      1      1      1      1
 slug2      1      1      1      1
 slug3      1      1      1      1
 slug4      1      1      1      1,
        slug5  slug3  slug2  slug1
 slug5      1      1      1      1
 slug3      1      1      1      1
 slug2      1      1      1      1
 slug1      1      1      1      1]
</code></pre>
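<p>To make the grouping step concrete, here is a minimal self-contained sketch. The sample <code>df</code> is an assumption reconstructed from the frames printed above (two users with the hypothetical ids 1 and 2); the original input data is not shown in this answer.</p>

```python
import pandas as pd

# Assumed sample data, reconstructed from the printed adjacency frames
df = pd.DataFrame({
    'user_id': [1, 1, 1, 1, 2, 2, 2, 2],
    'page_view_page_slug': ['slug1', 'slug2', 'slug3', 'slug4',
                            'slug5', 'slug3', 'slug2', 'slug1'],
})

dfs = []
# Each group is the Series of slugs viewed by a single user
for user_id, slugs in df.groupby('user_id')['page_view_page_slug']:
    labels = list(slugs)
    # All-ones square frame: every pair of this user's slugs co-occurs once
    dfs.append(pd.DataFrame(1, index=labels, columns=labels))

print(len(dfs))                 # one frame per user
print(dfs[1].index.tolist())    # slugs of the second user, in input order
```

<p>Each frame keeps the order in which the slugs appear for that user; the labels are only aligned later, when the frames are added together.</p>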
<p>Now <a href="https://docs.python.org/3/library/functools.html#reduce" rel="nofollow noreferrer"><code>reduce</code></a> the adjacency dataframes above, using <a href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.add.html" rel="nofollow noreferrer"><code>DataFrame.add</code></a> with the optional parameter <code>fill_value=0</code> as the reduction function, to count for each pair of slugs the user ids that viewed both.</p>
<pre><code>>>> df_out
       slug1  slug2  slug3  slug4  slug5
slug1      2      2      2      1      1
slug2      2      2      2      1      1
slug3      2      2      2      1      1
slug4      1      1      1      1      0
slug5      1      1      1      0      1
</code></pre>
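<p>The role of <code>fill_value=0</code> and of the trailing <code>fillna(0)</code> can be seen on two small frames. This is an illustrative sketch; the frames <code>a</code> and <code>b</code> are made up for the example.</p>

```python
import pandas as pd

# Two hypothetical adjacency frames with partly different labels
a = pd.DataFrame(1, index=['slug1', 'slug4'], columns=['slug1', 'slug4'])
b = pd.DataFrame(1, index=['slug1', 'slug5'], columns=['slug1', 'slug5'])

total = a.add(b, fill_value=0)
# (slug1, slug1) is present in both frames: 1 + 1 = 2
# (slug4, slug4) is present only in a, so b contributes 0: 1 + 0 = 1
# (slug4, slug5) is present in neither frame, so it stays NaN

# Replace the remaining NaN cells with 0 and restore an integer dtype
result = total.fillna(0).astype(int)
print(result)
```

<p><code>fill_value</code> only substitutes a value when one of the two operands is missing a cell. A cell missing from both stays <code>NaN</code>, which is why the final <code>fillna(0)</code> is still needed.</p>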
<hr/>
<p><strong>Optionally</strong> you can wrap the above code in a function as follows:</p>
<pre><code>def count():
    # Start from an empty frame and accumulate one adjacency frame per user
    df_out = pd.DataFrame()
    for _, s in df.groupby('user_id')['page_view_page_slug']:
        df_out = df_out.add(
            pd.DataFrame(1, index=list(s), columns=list(s)), fill_value=0)
    # Cells never seen in any group are NaN; turn them into 0
    return df_out.fillna(0).astype(int)

>>> count()
       slug1  slug2  slug3  slug4  slug5
slug1      2      2      2      1      1
slug2      2      2      2      1      1
slug3      2      2      2      1      1
slug4      1      1      1      1      0
slug5      1      1      1      0      1
</code></pre>
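<p>If you would rather not rely on a global <code>df</code>, a parameterized variant could look like the sketch below. The name <code>count_shared_users</code> and its arguments are hypothetical. It also drops repeated slugs within a user, on the assumption that a user who views the same page twice should be counted once; the code above does not do this.</p>

```python
from functools import reduce

import pandas as pd


def count_shared_users(df, user_col='user_id', slug_col='page_view_page_slug'):
    """For each pair of slugs, count the users who viewed both (hypothetical helper)."""
    frames = []
    for _, slugs in df.groupby(user_col)[slug_col]:
        # Assumption: repeat views by the same user count once, so deduplicate
        labels = slugs.drop_duplicates().tolist()
        frames.append(pd.DataFrame(1, index=labels, columns=labels))
    # Label-aligned sum of all per-user frames (needs at least one user)
    total = reduce(lambda x, y: x.add(y, fill_value=0), frames)
    return total.fillna(0).astype(int)


# Made-up sample: user 1 views 'a' twice
sample = pd.DataFrame({
    'user_id': [1, 1, 1, 2, 2],
    'page_view_page_slug': ['a', 'b', 'a', 'b', 'c'],
})
out = count_shared_users(sample)
print(out)
```

<p>The diagonal of the result holds the number of distinct users per slug, and off-diagonal cells hold the number of users shared by the two slugs.</p>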