Answers to the question: create a df or other array that counts the entries of another df that satisfy a specific condition


1 answer

Anonymous, 1 day ago

I think you need <a href="http://pandas.pydata.org/pandas-docs/stable/indexing.html#boolean-indexing" rel="nofollow noreferrer"><code>boolean indexing</code></a> and <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.crosstab.html" rel="nofollow noreferrer"><code>crosstab</code></a>:
<pre><code>df1 = df[df['ease'] == 1]
df = pd.crosstab(df1['tags'], df1['date'])
print (df)
date    'date1'  'date2'
tags
'tag1'        2        1
'tag2'        0        1
'tag3'        0        1
</code></pre>
An alternative to <code>crosstab</code> is to use <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.groupby.html" rel="nofollow noreferrer"><code>groupby</code></a> with <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.core.groupby.GroupBy.size.html" rel="nofollow noreferrer"><code>size</code></a> and reshape the result with <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.unstack.html" rel="nofollow noreferrer"><code>unstack</code></a>:
<pre><code>df = df[df['ease'] == 1].groupby(["date", "tags"]).size().unstack(level=0, fill_value=0)
print (df)
date    'date1'  'date2'
tags
'tag1'        2        1
'tag2'        0        1
'tag3'        0        1
</code></pre>
Edit: After testing the solutions I posted, <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.reindex.html" rel="nofollow noreferrer"><code>reindex</code></a> and <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.sort_index.html" rel="nofollow noreferrer"><code>sort_index</code></a> need to be added, because filtering out the non-<code>1</code> values drops from the final <code>DataFrame</code> every tag or date that never has <code>ease == 1</code>.
<pre><code>print (df[df['ease'] == 1].groupby(["date", "tags"])
         .size()
         .unstack(level=0, fill_value=0)
         .reindex(index=df.tags.unique(), columns=df.date.unique(), fill_value=0)
         .sort_index()
         .sort_index(axis=1))
</code></pre>
And the second solution:
<pre><code>df1 = df[df['ease'] == 1]
# Parentheses are needed so the chained calls can span several lines
df2 = (pd.crosstab(df1['tags'], df1['date'])
         .reindex(index=df.tags.unique(), columns=df.date.unique(), fill_value=0)
         .sort_index()
         .sort_index(axis=1))
</code></pre>
Timings (Psidom's second solution is generally wrong, so I omitted it from the timings):
<pre><code>import numpy as np
import pandas as pd

np.random.seed(123)
N = 10000
dates = pd.date_range('2017-01-01', periods=100)
tags = ['tag' + str(i) for i in range(100)]
ease = range(10)
df = pd.DataFrame({'date':np.random.choice(dates, N),
                   'tags': np.random.choice(tags, N),
                   'ease': np.random.choice(ease, N)})
# reindex_axis was removed in pandas 1.0; reindex(columns=...) gives the same column order
df = df.reindex(columns=['date','tags','ease'])
#[10000 rows x 3 columns]
#print (df)
</code></pre>
<pre><code># Psidom's solution
print (df.groupby(["date", "tags"]).agg({"ease": lambda x: (x == 1).sum()}).ease.unstack(level=0).fillna(0))

# groupby + size + unstack, with missing tags and dates restored
print (df[df['ease'] == 1].groupby(["date", "tags"]).size().unstack(level=0, fill_value=0).reindex(index=df.tags.unique(), columns=df.date.unique(), fill_value=0).sort_index().sort_index(axis=1))

# crosstab, with missing tags and dates restored
def jez(df):
    df1 = df[df['ease'] == 1]
    return pd.crosstab(df1['tags'], df1['date']).reindex(index=df.tags.unique(), columns=df.date.unique(), fill_value=0).sort_index().sort_index(axis=1)

print (jez(df))
</code></pre>
<hr/>
<pre><code>#Psidom solution
In [56]: %timeit (df.groupby(["date", "tags"]).agg({"ease": lambda x: (x == 1).sum()}).ease.unstack(level=0).fillna(0))
1 loop, best of 3: 1.94 s per loop

In [57]: %timeit (df[df['ease'] == 1].groupby(["date", "tags"]).size().unstack(level=0, fill_value=0).reindex(index=df.tags.unique(), columns=df.date.unique(), fill_value=0).sort_index().sort_index(axis=1))
100 loops, best of 3: 5.74 ms per loop

In [58]: %timeit (jez(df))
10 loops, best of 3: 54.5 ms per loop
</code></pre>
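The answer never shows the input frame, so here is a minimal, self-contained sketch of why the <code>reindex</code> step matters. The sample data is an assumption made up for illustration (it is not the asker's data), and the names <code>filtered</code>, <code>by_crosstab</code>, <code>by_groupby</code> and <code>full</code> are hypothetical. It adds a tag, <code>tag4</code>, that never has <code>ease == 1</code>, to show that both approaches drop it until the result is reindexed against the unfiltered frame.

```python
import pandas as pd

# Hypothetical sample data (assumption: not the frame from the question).
# tag4 only ever appears with ease == 0.
df = pd.DataFrame({
    "date": ["date1", "date1", "date1", "date2", "date2", "date2", "date2"],
    "tags": ["tag1", "tag1", "tag2", "tag1", "tag2", "tag3", "tag4"],
    "ease": [1, 1, 0, 1, 1, 1, 0],
})

# Boolean indexing: keep only the rows where ease equals 1.
filtered = df[df["ease"] == 1]

# Approach 1: crosstab counts the tags/date combinations in the filtered rows.
by_crosstab = pd.crosstab(filtered["tags"], filtered["date"])

# Approach 2: groupby + size counts the same combinations,
# and unstack moves the dates into columns.
by_groupby = (
    filtered.groupby(["date", "tags"])
    .size()
    .unstack(level=0, fill_value=0)
)

# Both results lack tag4, because it was filtered out before counting.
# Reindexing against the unfiltered frame restores it with zero counts.
full = (
    by_groupby
    .reindex(index=df["tags"].unique(), columns=df["date"].unique(), fill_value=0)
    .sort_index()
    .sort_index(axis=1)
)

print(by_crosstab)
print(full)
```

If every tag and date in the data has at least one row with <code>ease == 1</code>, the reindexed result should be the same as the plain one, so the extra step only costs a little time.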
