<p>我想你需要<a href="http://pandas.pydata.org/pandas-docs/stable/indexing.html#boolean-indexing" rel="nofollow noreferrer">^{<cd1>}</a>和<a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.crosstab.html" rel="nofollow noreferrer">^{<cd2>}</a>:</p>
<pre><code>df1 = df[df['ease'] == 1]
df = pd.crosstab(df1['tags'], df1['date'])
print (df)
date 'date1' 'date2'
tags
'tag1' 2 1
'tag2' 0 1
'tag3' 0 1
</code></pre>
<p>另一种解决方案是<code>crosstab</code>将<a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.groupby.html" rel="nofollow noreferrer">^{<cd4>}</a>与<a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.core.groupby.GroupBy.size.html" rel="nofollow noreferrer">^{<cd5>}</a>一起使用,并对<a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.unstack.html" rel="nofollow noreferrer">^{<cd6>}</a>进行整形:</p>
<pre><code>df = df[df['ease'] == 1].groupby(["date", "tags"]).size().unstack(level=0, fill_value=0)
print (df)
date 'date1' 'date2'
tags
'tag1' 2 1
'tag2' 0 1
'tag3' 0 1
</code></pre>
<p>编辑:</p>
<p>在测试完我发布的解决方案后,需要添加函数<a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.reindex.html" rel="nofollow noreferrer">^{<cd7>}</a>和<a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.sort_index.html" rel="nofollow noreferrer">^{<cd8>}</a>,因为如果过滤掉非<code>1</code>值,它会删除最终<code>DataFrame</code>中的行。你知道吗</p>
<pre><code>print (df[df['ease'] == 1].groupby(["date", "tags"])
.size()
.unstack(level=0, fill_value=0)
.reindex(index=df.tags.unique(), columns=df.date.unique(), fill_value=0)
.sort_index()
.sort_index(axis=1))
</code></pre>
<p>还有第二个解决方案:</p>
<pre><code>df1 = df[df['ease'] == 1]
df2 = pd.crosstab(df1['tags'], df1['date'])
.reindex(index=df.tags.unique(), columns=df.date.unique(), fill_value=0)
.sort_index()
.sort_index(axis=1)
</code></pre>
<p>时间安排:</p>
<p>(Psidom的第二个解决方案通常是错误的,所以我从计时中省略了它)</p>
<pre><code>np.random.seed(123)
N = 10000
dates = pd.date_range('2017-01-01', periods=100)
tags = ['tag' + str(i) for i in range(100)]
ease = range(10)
df = pd.DataFrame({'date':np.random.choice(dates, N),
'tags': np.random.choice(tags, N),
'ease': np.random.choice(ease, N)})
df = df.reindex_axis(['date','tags','ease'], axis=1)
#[10000 rows x 3 columns]
#print (df)
</code></pre>
<pre><code>print (df.groupby(["date", "tags"]).agg({"ease": lambda x: (x == 1).sum()}).ease.unstack(level=0).fillna(0))
print (df[df['ease'] == 1].groupby(["date", "tags"]).size().unstack(level=0, fill_value=0).reindex(index=df.tags.unique(), columns=df.date.unique(), fill_value=0).sort_index().sort_index(axis=1))
def jez(df):
df1 = df[df['ease'] == 1]
return pd.crosstab(df1['tags'], df1['date']).reindex(index=df.tags.unique(), columns=df.date.unique(), fill_value=0).sort_index().sort_index(axis=1)
print (jez(df))
</code></pre>
<hr/>
<pre><code>#Psidom solution
In [56]: %timeit (df.groupby(["date", "tags"]).agg({"ease": lambda x: (x == 1).sum()}).ease.unstack(level=0).fillna(0))
1 loop, best of 3: 1.94 s per loop
In [57]: %timeit (df[df['ease'] == 1].groupby(["date", "tags"]).size().unstack(level=0, fill_value=0).reindex(index=df.tags.unique(), columns=df.date.unique(), fill_value=0).sort_index().sort_index(axis=1))
100 loops, best of 3: 5.74 ms per loop
In [58]: %timeit (jez(df))
10 loops, best of 3: 54.5 ms per loop
</code></pre>