Answers to: Creating a DataFrame of averages with pandas


1 Answer

Anonymous, 1 day ago

First call <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_excel.html" rel="nofollow noreferrer"><code>read_excel</code></a> with the parameter <code>sheet_name=None</code> (spelled <code>sheetname</code> in pandas releases before 0.21) to get a <code>dict</code> of <code>DataFrames</code>, one per sheet. Then build one large <code>df</code> with <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.concat.html" rel="nofollow noreferrer"><code>concat</code></a>, apply <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.groupby.html" rel="nofollow noreferrer"><code>groupby</code></a> on the second level of the <code>index</code>, and aggregate with <code>mean</code>:

<pre><code>import pandas as pd

# sheet_name=None reads every sheet into a dict keyed by sheet name
dict_dfs = pd.read_excel('multiple_sheets.xlsx', sheet_name=None)
print(dict_dfs)
{'sheetname1':    a  b
0  1  4
1  2  8, 'sheetname2':    a  b
0  7  1
1  5  0, 'sheetname3':    a  b
0  4  5}

# the dict keys become the first level of a MultiIndex
df = pd.concat(dict_dfs)
print(df)
              a  b
sheetname1 0  1  4
           1  2  8
sheetname2 0  7  1
           1  5  0
sheetname3 0  4  5

# average rows that share the same original row label (level 1)
df = df.groupby(level=1).mean()
print(df)
     a         b
0  4.0  3.333333
1  3.5  4.000000
</code></pre>

Edit: with the sample data <a href="https://dl.dropboxusercontent.com/u/84444599/multiple_sheets.xlsx" rel="nofollow noreferrer">file</a>:

<pre><code>dict_dfs = pd.read_excel('multiple_sheets.xlsx', sheet_name=None, index_col=0)
df = pd.concat(dict_dfs)
df = df.groupby(level=1).mean()
print(df)
                 Austria  Belgium  Denmark  France  Germany  Italy  \
Fromcountry
Austria                4        0        0       0        0      0
Belgium                0        0        0       2        1      1
Denmark                0        2        0       2        0      1
France                 0        0        0       0        6      0
Germany                0        2        0       6        0      0
Italy                  0        0        3       0        1      0
Luxembourg             0        0        0       4        0      1
Switzerland            0        1        0       0        0      0
The Netherlands        1        0        5       1        0      2
USA                    3        4        0       0        0      0
United Kingdom         2        0        2       2        0      2

                 Luxembourg  Switzerland  The Netherlands  USA  United Kingdom
Fromcountry
Austria                   3            0                6  4.0               1
Belgium                   0            0                5  4.0               1
Denmark                   0            2                3  5.0               0
France                    0            0                4  0.0               0
Germany                   0            1                1  0.0               0
Italy                     4            1                1  0.0               0
Luxembourg                0            1                3  0.0               1
Switzerland               0            0                7  0.0               2
The Netherlands           0            0                0  0.0               1
USA                       0            0                0  0.0               0
United Kingdom            1            0                1  0.0               0
</code></pre>

If some sheets contain extra countries, finish with <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.reindex.html" rel="nofollow noreferrer"><code>reindex</code></a> to filter the result down to the <code>index</code> and <code>columns</code> names of a reference sheet:
<pre><code># reference sheet: sheetname1
idx = dict_dfs['sheetname1'].index
cols = dict_dfs['sheetname1'].columns

df = df.reindex(index=idx, columns=cols)
print(df)
                 Austria  Belgium  Denmark  France  Germany  Italy  \
Fromcountry
Austria                4        0        0       0        0      0
Belgium                0        0        0       2        1      1
Denmark                0        2        0       2        0      1
France                 0        0        0       0        6      0
Germany                0        2        0       6        0      0
Italy                  0        0        3       0        1      0
Luxembourg             0        0        0       4        0      1
Switzerland            0        1        0       0        0      0
The Netherlands        1        0        5       1        0      2
United Kingdom         2        0        2       2        0      2

                 Luxembourg  Switzerland  The Netherlands  United Kingdom
Fromcountry
Austria                   3            0                6               1
Belgium                   0            0                5               1
Denmark                   0            2                3               0
France                    0            0                4               0
Germany                   0            1                1               0
Italy                     4            1                1               0
Luxembourg                0            1                3               1
Switzerland               0            0                7               2
The Netherlands           0            0                0               1
United Kingdom            1            0                1               0
</code></pre>
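For anyone who wants to try the steps without an Excel file, here is a minimal sketch that bundles them into one function. It assumes the sheets are already loaded into a dict of DataFrames; the helper name `average_sheets` and its `reference` argument are made up for this illustration and are not part of pandas. The in-memory frames reuse the small `a`/`b` numbers from the first example.

```python
import pandas as pd

# Stand-in for what pd.read_excel(..., sheet_name=None) would return:
# a dict mapping sheet names to DataFrames (values from the first example).
dict_dfs = {
    "sheetname1": pd.DataFrame({"a": [1, 2], "b": [4, 8]}),
    "sheetname2": pd.DataFrame({"a": [7, 5], "b": [1, 0]}),
    "sheetname3": pd.DataFrame({"a": [4], "b": [5]}),
}


def average_sheets(sheets, reference=None):
    """Average a dict of DataFrames cell by cell (hypothetical helper).

    If `reference` names one of the sheets, the result is restricted to
    that sheet's row and column labels.
    """
    # Dict keys become level 0 of the index, original row labels level 1.
    stacked = pd.concat(sheets)
    # Rows with the same original label are averaged across sheets.
    averaged = stacked.groupby(level=1).mean()
    if reference is not None:
        ref = sheets[reference]
        averaged = averaged.reindex(index=ref.index, columns=ref.columns)
    return averaged


result = average_sheets(dict_dfs, reference="sheetname1")
print(result)
```

Note that a row label missing from some sheets (label `1` is absent from `sheetname3` here) is averaged only over the sheets that contain it, which matches the `3.5` and `4.0` in the output shown earlier.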
