Changing the index of a DataFrame: getting an AttributeError

Posted 2024-10-03 19:33:02


I am using Python to try to change the index of a DataFrame. Here is my code:

import pandas as pd
df = pd.read_csv("data_file.csv", na_values=' ')
table = df['HINCP'].groupby(df['HHT'])
print(table.describe()[['mean', 'std', 'count', 'min', 'max']].sort_values('mean', ascending=False))

Here is the current DataFrame:

              mean            std    count      min        max
HHT                                                           
1.0  106790.565562  100888.917804  25495.0  -5100.0  1425000.0
5.0   79659.567376   74734.380152   1410.0      0.0   625000.0
7.0   69055.725901   63871.751863   1193.0      0.0   645000.0
2.0   64023.122122   59398.970193   1998.0      0.0   610000.0
3.0   49638.428821   48004.399101   5718.0  -5100.0   609000.0
4.0   48545.356298   60659.516163   5835.0  -5100.0   681000.0
6.0   37282.245015   44385.091076   8024.0 -11200.0   676000.0

I would like the index values to be the following, instead of the numbers 1, 2, ..., 7:

Married couple household
Nonfamily household:Male 
Nonfamily household:Female 
Other family household:Male 
Other family household:Female 
Nonfamily household:Male 
Nonfamily household:Female 

I tried calling set_index() on table, passing the list of index values above as the keys, but this results in the following error:

AttributeError: 'SeriesGroupBy' object has no attribute 'set_index'

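I would also like to know whether there is a way to change the HHT label at the top of the index, or whether that will come with changing the index values.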


2 answers

It would be more robust to use a dict with map to translate the numeric HHT values into HHT labels:

hht_map = {
    1: 'Married couple household',
    2: 'Nonfamily household:Male',
    3: 'Nonfamily household:Female',
    4: 'Other family household:Male',
    5: 'Other family household:Female',
    6: 'Nonfamily household:Male',
    7: 'Nonfamily household:Female',
}
# assumes df is the summary DataFrame returned by table.describe(), not the raw CSV data
df.index = df.index.map(hht_map)
print(df)
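As a minimal sketch of the mapping above, applied to the grouped summary rather than to the raw data: the small DataFrame below is made-up stand-in data (not the asker's CSV), and only three of the labels are used. It also shows `rename_axis`, which is one way to change the HHT label at the top of the index that the question asks about; the name "Household type" is just an illustrative choice.

```python
import pandas as pd

# Hypothetical stand-in data; the real data would come from data_file.csv.
df = pd.DataFrame({
    "HHT": [1, 1, 2, 2, 3],
    "HINCP": [100.0, 200.0, 50.0, 70.0, 30.0],
})

hht_map = {
    1: "Married couple household",
    2: "Nonfamily household:Male",
    3: "Nonfamily household:Female",
}

# groupby() returns a SeriesGroupBy, which has no set_index();
# describe() on it returns an ordinary DataFrame indexed by HHT.
summary = df["HINCP"].groupby(df["HHT"]).describe()[["mean", "count"]]

# Replace the numeric codes with labels, then rename the index label itself.
summary.index = summary.index.map(hht_map)
summary = summary.rename_axis("Household type")
print(summary)
```

Because the labels are looked up by code rather than assigned by position, the result does not depend on the order in which the groups happen to appear.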

Edit: try applying the mapping to pums_df before grouping.

Use map to create a new label column:

pums_df['label'] = pums_df.HHT.map(hht_map)

Then group by the new label column:

table = pums_df['HINCP'].groupby(pums_df['label'])
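Putting the two steps together, the label-then-group approach might look like the sketch below. The contents of `pums_df` here are invented for illustration, and `hht_map` is trimmed to three entries; the describe/sort chain is the one from the question.

```python
import pandas as pd

# Hypothetical stand-in for pums_df; the real frame is read from the CSV.
pums_df = pd.DataFrame({
    "HHT": [1, 1, 2, 3, 3],
    "HINCP": [90000, 110000, 60000, 40000, 50000],
})

hht_map = {
    1: "Married couple household",
    2: "Nonfamily household:Male",
    3: "Nonfamily household:Female",
}

# Codes missing from hht_map become NaN, and groupby drops NaN keys by default.
pums_df["label"] = pums_df["HHT"].map(hht_map)

table = pums_df["HINCP"].groupby(pums_df["label"])
summary = (
    table.describe()[["mean", "std", "count", "min", "max"]]
    .sort_values("mean", ascending=False)
)
print(summary)
```

With this variant the index is labelled "label" from the start, so no index reassignment is needed after describe().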

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(columns = ["HHT", "HINC"], data = np.transpose([[2,3,2,2,2,3,3,3,4], [1,1,3,1,4,7,8,9,11]]))
>>> df
   HHT  HINC
0    2     1
1    3     1
2    2     3
3    2     1
4    2     4
5    3     7
6    3     8
7    3     9
8    4    11
>>> table = df['HINC'].groupby(df['HHT'])
>>> td = table.describe()
>>> df2 = pd.DataFrame(td)
>>> df2.index = ['lab1', 'lab2', 'lab3']
>>> df2
      count   mean       std   min   25%   50%    75%   max
lab1    4.0   2.25  1.500000   1.0   1.0   2.0   3.25   4.0
lab2    4.0   6.25  3.593976   1.0   5.5   7.5   8.25   9.0
lab3    1.0  11.00       NaN  11.0  11.0  11.0  11.00  11.0
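Assigning a list to the index, as above, relies on the labels being in the same order as the groups and on there being exactly one label per group. A hedged alternative, using the same toy data, is to pass a dict to `rename`; the `labels` dict below is an illustrative name, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "HHT": [2, 3, 2, 2, 2, 3, 3, 3, 4],
    "HINC": [1, 1, 3, 1, 4, 7, 8, 9, 11],
})

td = df["HINC"].groupby(df["HHT"]).describe()

# Map each group key to its label explicitly instead of relying on row order.
labels = {2: "lab1", 3: "lab2", 4: "lab3"}
df2 = td.rename(index=labels)
print(df2)
```

Keys that are absent from the dict are left unchanged by `rename`, so a missing label shows up as a leftover number rather than raising a length-mismatch error.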
