Concat 2 DataFrames, 54 entries, get 1 - Q&A - Python中文网

Posted 2024-10-02 00:32:02

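I have created two DataFrames that share a common index based on Year and District. Each DataFrame has 58 rows, and the years and districts match exactly. However, when I try to join them, I get a new DataFrame with all the columns combined (which is what I want) but only a single row: New York City. That row exists in both DataFrames, and so do all the others, yet it is the only one that makes it into the combined DataFrame. I have tried several different ways of joining the DataFrames, and they all do the same thing. This example uses: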

pd.concat([groupeddf, Popdf], axis=1)

Here is Popdf, indexed by (Year, District):

                            Population
Year District                
2017 Albany                 309612
     Allegany               46894
     Broome                 193639
     Cattaraugus            77348
     Cayuga                 77603

Here is groupeddf, also indexed by Year and District (some columns removed for clarity):

                            Total SNAP Households  Total SNAP Persons  \
Year District                                                 
2017 Albany                 223057                 416302   
     Allegany               36935                  69802   
     Broome                 201586                 363504   
     Cattaraugus            75567                  144572   
     Cayuga                 64168                  121988

Here is the combined DataFrame after running pd.concat([groupeddf, Popdf], axis=1):

                     Population       Total SNAP Households  Total SNAP Persons  
Year District                                                               
2017 New York City      8622698       11314598               19987958

This shows that the combined DataFrame has only one entry:

<class 'pandas.core.frame.DataFrame'>
MultiIndex: 1 entries, (2017, New York City) to (2017, New York City)
Data columns (total 4 columns):
Population               1 non-null int64
Total SNAP Households    1 non-null int64
Total SNAP Persons       1 non-null int64
Total SNAP Benefits      1 non-null float64
dtypes: float64(1), int64(3)
memory usage: 170.0+ bytes

Update: I tried another approach, and it shows that the indexes, which look identical to me, are not actually identical.

When I run this code, I get duplicated rows instead of merged ones:

combined_df = groupeddf.merge(Popdf, how='outer', left_index=True, right_index=True)

The result:

Year District                                                   
2017 Albany                      223057.0            416302.0   
    Albany                           NaN                 NaN   
    Allegany                     36935.0             69802.0   
    Allegany                         NaN                 NaN   
    Broome                      201586.0            363504.0   
    Broome                           NaN                 NaN   
    Cattaraugus                  75567.0            144572.0   
    Cattaraugus                      NaN                 NaN

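The only exception is New York City. That row is not duplicated, so it really is treated as the same index entry. So something is wrong with the data, but I can't tell what.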
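
One way to narrow down a mismatch like this is to compare the two indexes directly and print the unmatched labels with repr(), which makes invisible characters visible. The sketch below is only an illustration: the frames `snap` and `pop` are hypothetical miniature stand-ins for groupeddf and Popdf, and the trailing space in some population labels is an assumption chosen to reproduce the symptom.

```python
import pandas as pd

# Hypothetical miniature stand-ins for groupeddf and Popdf.
# The trailing space in two population labels is an assumption.
snap = pd.DataFrame(
    {"Total SNAP Households": [223057, 36935, 11314598]},
    index=pd.MultiIndex.from_tuples(
        [(2017, "Albany"), (2017, "Allegany"), (2017, "New York City")],
        names=["Year", "District"],
    ),
)
pop = pd.DataFrame(
    {"Population": [309612, 46894, 8622698]},
    index=pd.MultiIndex.from_tuples(
        [(2017, "Albany "), (2017, "Allegany "), (2017, "New York City")],
        names=["Year", "District"],
    ),
)

# Labels that appear in one index but not in the other.
only_in_snap = snap.index.difference(pop.index)
only_in_pop = pop.index.difference(snap.index)

# repr() shows the quotes around each label, so stray whitespace stands out.
for label in only_in_pop.get_level_values("District"):
    print(repr(label))
```

If both differences come back empty, the labels really do match and the problem lies elsewhere, for example in the dtype of the Year level.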

2 Answers

User

#1 · edited 2024-10-02 00:32:02

Have you tried using merge, like this:

combined_df = pd.merge(groupeddf, Popdf, how='inner', on=['Year', 'District'])

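That is what to use if you only want to combine rows where the District and Year exist in both DataFrames. If you want to keep all the data from the left DataFrame and only the matching rows from the right, do a left join instead, and so on.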
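
To make the difference between the join types concrete, here is a small sketch. The frames `left` and `right` are hypothetical and deliberately only partly overlapping, and the merge is done on the index with left_index/right_index because both frames are indexed by (Year, District).

```python
import pandas as pd

keys = ["Year", "District"]

# Hypothetical frames that share only the (2017, "Albany") key.
left = pd.DataFrame(
    {"Year": [2017, 2017],
     "District": ["Albany", "Broome"],
     "Total SNAP Persons": [416302, 363504]}
).set_index(keys)
right = pd.DataFrame(
    {"Year": [2017, 2017],
     "District": ["Albany", "Cayuga"],
     "Population": [309612, 77603]}
).set_index(keys)

# Inner join: keeps only keys present in both frames.
inner = pd.merge(left, right, how="inner", left_index=True, right_index=True)

# Left join: keeps every row of `left`; unmatched rows get NaN for Population.
left_join = pd.merge(left, right, how="left", left_index=True, right_index=True)

print(inner)
print(left_join)
```

Note that neither join type helps when the keys differ by invisible characters: the rows are simply treated as unmatched.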

User

#2 · edited 2024-10-02 00:32:02

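It took a while, but I finally figured it out. The district names in the population DataFrame had a space at the end of the name, while the names in the SNAP df did not.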

"Albany " vs "Albany"
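
A minimal sketch of one possible cleanup, assuming the only difference is leading or trailing whitespace in the District level. The frames `snap` and `pop` are hypothetical stand-ins for the SNAP and population DataFrames; the approach resets the index, strips the strings with Series.str.strip, and restores the index before concatenating.

```python
import pandas as pd

keys = ["Year", "District"]

# Hypothetical stand-ins; the population labels carry a trailing space.
snap = pd.DataFrame(
    {"Year": [2017, 2017],
     "District": ["Albany", "Allegany"],
     "Total SNAP Households": [223057, 36935]}
).set_index(keys)
pop = pd.DataFrame(
    {"Year": [2017, 2017],
     "District": ["Albany ", "Allegany "],
     "Population": [309612, 46894]}
).set_index(keys)

# Move the index levels back to columns, strip whitespace, re-index.
pop_clean = pop.reset_index()
pop_clean["District"] = pop_clean["District"].str.strip()
pop_clean = pop_clean.set_index(keys)

# With matching labels, the rows now line up side by side.
combined = pd.concat([snap, pop_clean], axis=1)
print(combined)
```

Stripping at load time, before the index is built, avoids the round trip through reset_index.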
