Pandas: How do I set an index on the columns of an existing DataFrame?

**df_1**

|        | Total Assets |
|--------|--------------|
| Firm 1 | 100          |
| Firm 2 | 200          |
| Firm 3 | 300          |

**df_2**

|        | AUMS |
|--------|------|
| Firm 1 | 300  |
| Firm 2 | 3400 |
| Firm 3 | 800  |
| Firm 4 | 800  |

And so on until df_10. The firms can also differ from one DataFrame to the next.

**Merged_df**

| Importance | L            | H    |
|------------|--------------|------|
| Category   | Cat1         | Cat2 |
|            | Total Assets | AUMs |
| Firm 1     | 100          | 300  |
| Firm 2     | 200          | 3400 |
| Firm 3     | 300          | 800  |
| Firm 4     | NaN          | 800  |

1 Answer

User

#1 · Posted on 2024-09-25 00:35:43

We can use `pd.concat` on axis=1 with `MultiIndex` keys:

dfs = [df1, df2]
merged_df = pd.concat(
    dfs, axis=1,
    keys=pd.MultiIndex.from_arrays([
        ['L', 'H'],       # Top Level Keys
        ['Cat1', 'Cat2']  # Second Level Keys
    ], names=['Importance', 'Category'])
)

merged_df:

Importance            L     H
Category           Cat1  Cat2
           Total Assets  AUMS
Firm 1            100.0   300
Firm 2            200.0  3400
Firm 3            300.0   800
Firm 4              NaN   800
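The question mentions up to ten frames, so it may help to keep each frame next to its labels instead of maintaining two parallel lists. The following is a minimal sketch under that assumption: the `labelled` list and its pairing of labels to frames are hypothetical, and it passes plain tuples to `keys` together with `names` rather than building a `MultiIndex` first.

```python
import pandas as pd

# Frames from the question (only two of the ten are shown here).
df1 = pd.DataFrame({
    'Total Assets': {'Firm 1': 100, 'Firm 2': 200, 'Firm 3': 300}
})
df2 = pd.DataFrame({
    'AUMS': {'Firm 1': 300, 'Firm 2': 3400, 'Firm 3': 800, 'Firm 4': 800}
})

# Hypothetical pairing of (Importance, Category) labels with each frame;
# extend this list with the remaining frames in the same way.
labelled = [
    (('L', 'Cat1'), df1),
    (('H', 'Cat2'), df2),
]

keys = [label for label, _ in labelled]
frames = [frame for _, frame in labelled]

# Tuple keys become the outer column levels; `names` labels those levels.
merged = pd.concat(frames, axis=1, keys=keys,
                   names=['Importance', 'Category'])

# Select every column that sits under one top-level key.
low_importance = merged['L']
```

Firms missing from a frame are filled with NaN, because `pd.concat` aligns on the row index with an outer join by default.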

`CategoricalDtype` can be used to establish the ordering:

dfs = [df1, df2]
# Specify Categorical Types
# These lists should contain _only_ the unique categories
# in the desired order
importance_type = pd.CategoricalDtype(categories=['H', 'L'], ordered=True)
category_type = pd.CategoricalDtype(categories=['Cat1', 'Cat2'], ordered=True)


# Keys should contain the _complete_ list of _all_ columns
merged_df = pd.concat(
    dfs, axis=1,
    keys=pd.MultiIndex.from_arrays([
        pd.Series(['L', 'H'],            # Top Level Keys
                  dtype=importance_type),
        pd.Series(['Cat1', 'Cat2'],      # Second Level Keys
                  dtype=category_type)
    ], names=['Importance', 'Category'])
)

`sort_index` can then be used, and it will work as expected: H comes before L, and so on.

# Sorting Now Works As Expected
merged_df = merged_df.sort_index(level=[0, 1], axis=1)

merged_df:

Importance     H            L
Category    Cat2         Cat1
            AUMS Total Assets
Firm 1       300        100.0
Firm 2      3400        200.0
Firm 3       800        300.0
Firm 4       800          NaN
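With only H and L, alphabetical order happens to match the desired order, so the effect of the categorical type is easy to miss. The small sketch below adds a hypothetical third label, 'M', which is not part of the question, to show how an ordered categorical sorts by its declared category order and not alphabetically.

```python
import pandas as pd

# 'M' is a hypothetical extra label, added only to make the effect visible.
labels = ['L', 'H', 'M']

# Plain strings sort alphabetically.
plain = pd.Index(labels).sort_values()

# An ordered categorical sorts by the declared category order instead.
importance_type = pd.CategoricalDtype(categories=['H', 'M', 'L'], ordered=True)
ranked = pd.Series(labels, dtype=importance_type).sort_values()

print(list(plain))   # ['H', 'L', 'M']
print(list(ranked))  # ['H', 'M', 'L']
```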

DataFrames:

import pandas as pd

df1 = pd.DataFrame({
    'Total Assets': {'Firm 1': 100, 'Firm 2': 200, 'Firm 3': 300}
})

df2 = pd.DataFrame({
    'AUMS': {'Firm 1': 300, 'Firm 2': 3400, 'Firm 3': 800, 'Firm 4': 800}
})
