Dict comprehension to compute statistics for each key of the inner dicts

2 answers

User

Answer 1 · edited 2024-05-19 21:14:35

You should use pandas for this task:

Edit

pandas can help you get what you want:

In [1]: import numpy as np

In [2]: import pandas as pd

In [3]: df = pd.DataFrame(property2region2value); df
Out[3]: 
   countryA  countryB  countryC  countryD  countryE  countryF
a      24.0       3.0       121     123.0    1011.0      1433
b      56.0      98.0     12121    1312.0    1911.0     19829
c      78.0       NaN  12989121    1231.0       NaN      1132
d       NaN       NaN     16171       NaN       NaN      1791

In [4]: df.apply(np.min, axis=1)
Out[4]: 
a       3.0
b      56.0
c      78.0
d    1791.0
dtype: float64

In [5]: df.apply(np.mean, axis=1)
Out[5]: 
a    4.525000e+02
b    5.887833e+03
c    3.247890e+06
d    8.981000e+03
dtype: float64

In [6]: mean_dict = df.apply(np.mean, axis=1).to_dict()

In [7]: mean_dict
Out[7]: {'a': 452.5, 'b': 5887.833333333333, 'c': 3247890.5, 'd': 8981.0}
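
The same row-wise results can also be obtained with the DataFrame's built-in reductions, which skip NaN by default. The following is only a sketch: the nested dict is an assumption, reconstructed from the table printed in Out[3], because the question's original literal is not part of this page.

```python
import pandas as pd

# Assumed input, reconstructed from the table in Out[3]:
# outer keys are regions, inner keys are properties.
property2region2value = {
    "countryA": {"a": 24, "b": 56, "c": 78},
    "countryB": {"a": 3, "b": 98},
    "countryC": {"a": 121, "b": 12121, "c": 12989121, "d": 16171},
    "countryD": {"a": 123, "b": 1312, "c": 1231},
    "countryE": {"a": 1011, "b": 1911},
    "countryF": {"a": 1433, "b": 19829, "c": 1132, "d": 1791},
}

df = pd.DataFrame(property2region2value)  # rows: properties, columns: regions

# axis=1 reduces across the columns, giving one value per property.
row_min = df.min(axis=1)
row_mean = df.mean(axis=1)

mean_dict = row_mean.to_dict()
print(mean_dict)
```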

Or, even more simply, you can transpose the DataFrame and summarize it with df.T.describe().
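
A minimal, self-contained sketch of the transposed approach follows. The input dict is an assumption reconstructed from the table in Out[3]; after transposing, each property becomes a column, so `describe()` summarizes per property.

```python
import pandas as pd

# Assumed input, reconstructed from the table in Out[3].
property2region2value = {
    "countryA": {"a": 24, "b": 56, "c": 78},
    "countryB": {"a": 3, "b": 98},
    "countryC": {"a": 121, "b": 12121, "c": 12989121, "d": 16171},
    "countryD": {"a": 123, "b": 1312, "c": 1231},
    "countryE": {"a": 1011, "b": 1911},
    "countryF": {"a": 1433, "b": 19829, "c": 1132, "d": 1791},
}

df = pd.DataFrame(property2region2value)

# Transpose so that properties are columns, then summarize each column.
summary = df.T.describe()
print(summary)
```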

如果您想要更精细的控制，您可以选择：

In [24]: df.T.describe().loc[['mean','std','min','max'],:]
Out[24]: 
                a             b             c             d
mean   452.500000   5887.833333  3.247890e+06   8981.000000
std    612.768717   8215.770187  6.494154e+06  10168.195513
min      3.000000     56.000000  7.800000e+01   1791.000000
max   1433.000000  19829.000000  1.298912e+07  16171.000000

In [25]: df.T.describe().loc[['mean','std','min','max'],:].to_dict()
Out[25]: 
{'a': {'max': 1433.0,
       'mean': 452.5,
       'min': 3.0,
       'std': 612.76871656441472},
 'b': {'max': 19829.0,
       'mean': 5887.833333333333,
       'min': 56.0,
       'std': 8215.770187065038},
 'c': {'max': 12989121.0,
       'mean': 3247890.5,
       'min': 78.0,
       'std': 6494153.687626767},
 'd': {'max': 16171.0,
       'mean': 8981.0,
       'min': 1791.0,
       'std': 10168.195513462553}}

From the original answer

This way you can easily get what you want:

In [8]: df.apply(np.min)
Out[8]: 
countryA      24.0
countryB       3.0
countryC     121.0
countryD     123.0
countryE    1011.0
countryF    1132.0
dtype: float64

In [9]: df.apply(np.max)
Out[9]: 
countryA          78.0
countryB          98.0
countryC    12989121.0
countryD        1312.0
countryE        1911.0
countryF       19829.0
dtype: float64

In [10]: df.apply(np.std)
Out[10]: 
countryA    2.217105e+01
countryB    4.750000e+01
countryC    5.620356e+06
countryD    5.424170e+02
countryE    4.500000e+02
countryF    7.960893e+03
dtype: float64

You can even easily turn everything back into a dictionary:

In [11]: df.apply(np.min).to_dict()
Out[11]: 
{'countryA': 24.0,
 'countryB': 3.0,
 'countryC': 121.0,
 'countryD': 123.0,
 'countryE': 1011.0,
 'countryF': 1132.0}

Heck! All of your data-processing needs become simpler:

In [12]: df.describe()
Out[12]: 
        countryA   countryB      countryC     countryD     countryE  \
count   3.000000   2.000000  4.000000e+00     3.000000     2.000000   
mean   52.666667  50.500000  3.254384e+06   888.666667  1461.000000   
std    27.153882  67.175144  6.489829e+06   664.322462   636.396103   
min    24.000000   3.000000  1.210000e+02   123.000000  1011.000000   
25%    40.000000  26.750000  9.121000e+03   677.000000  1236.000000   
50%    56.000000  50.500000  1.414600e+04  1231.000000  1461.000000   
75%    67.000000  74.250000  3.259408e+06  1271.500000  1686.000000   
max    78.000000  98.000000  1.298912e+07  1312.000000  1911.000000   

           countryF  
count      4.000000  
mean    6046.250000  
std     9192.447602  
min     1132.000000  
25%     1357.750000  
50%     1612.000000  
75%     6300.500000  
max    19829.000000
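
One detail worth noting: the std values in Out[10] (22.17 for countryA) differ from those reported by describe() (27.15). The numbers are consistent with Out[10] being a population standard deviation (dividing by n) and describe() reporting the sample standard deviation (dividing by n - 1). The sketch below makes the divisor explicit through the `ddof` argument of `Series.std`, using the three countryA values from the table.

```python
import pandas as pd

# countryA values for properties a, b and c, taken from the table above.
country_a = pd.Series([24, 56, 78], index=["a", "b", "c"])

population_std = country_a.std(ddof=0)  # divide by n
sample_std = country_a.std(ddof=1)      # divide by n - 1 (the pandas default)

print(population_std, sample_std)
```

Passing `ddof` explicitly avoids surprises when results from different calls are compared.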

User

Answer 2 · edited 2024-05-19 21:14:35

pandas is undoubtedly a good choice for this task, but in plain Python you can simply collect all the values of each property in a loop.

from collections import defaultdict

xs = defaultdict(list)

for vs in property2region2value.values():
    for k, v in vs.items():
        xs[k].append(v)

# xs: defaultdict(<type 'list'>, {'a': [3, 121, 24, 1433, 123, 1011], 'c': [12989121, 78, 1132, 1231], 'b': [98, 12121, 56, 19829, 1312, 1911], 'd': [16171, 1791]})

Then you can compute the statistics for each item.
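
A minimal sketch of that last step, using a dict comprehension and the standard-library `statistics` module. The nested input dict is an assumption reconstructed from the values listed in the comment above, and the choice of `min`, `max`, `mean` and `stdev` is only illustrative. Note that `statistics.stdev` is the sample standard deviation and needs at least two values per property.

```python
from collections import defaultdict
import statistics

# Assumed input, reconstructed from the values shown on this page.
property2region2value = {
    "countryA": {"a": 24, "b": 56, "c": 78},
    "countryB": {"a": 3, "b": 98},
    "countryC": {"a": 121, "b": 12121, "c": 12989121, "d": 16171},
    "countryD": {"a": 123, "b": 1312, "c": 1231},
    "countryE": {"a": 1011, "b": 1911},
    "countryF": {"a": 1433, "b": 19829, "c": 1132, "d": 1791},
}

# Collect every value of each property across all regions.
xs = defaultdict(list)
for vs in property2region2value.values():
    for k, v in vs.items():
        xs[k].append(v)

# One inner dict of statistics per property.
stats = {
    k: {
        "min": min(vs),
        "max": max(vs),
        "mean": statistics.mean(vs),
        "std": statistics.stdev(vs),
    }
    for k, vs in xs.items()
}

print(stats["a"])
```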
