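I have a dataframe in which one column outputs the following when I ask for its unique values (I originally considered mapping the counts by hand if there were only a few combinations):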
df.amenities.unique()
array(['{TV,Wifi,Kitchen,Elevator,Heating,Washer,"First aid kit","Fire extinguisher",Essentials,Hangers,"Hair dryer",Iron,"Laptop friendly workspace","Private entrance"}',
'{TV,Wifi,Kitchen,"Free parking on premises","Indoor fireplace",Heating,"Family/kid friendly",Washer,"First aid kit","Fire extinguisher",Essentials,"Lock on bedroom door",Hangers,"Hair dryer",Iron,"Laptop friendly workspace","Private entrance"}'])
To process this array of amenities, I decided to strip the quotation marks first:
df['amenities'] = df['amenities'].str.replace('"', '')
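My strategy was to count the commas in each array element, add 1 to account for the missing trailing comma, and use reset_index to name the column where I want the counts to appear: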
(df['amenities'].str.count(',').add(1).sum().reset_index(name='amenities_count'))
This did not work; I got the following error:
AttributeError: 'numpy.int64' object has no attribute 'reset_index'
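A minimal sketch of where this error seems to come from, using a made-up two-row frame rather than the real data: `.sum()` collapses the per-row counts into a single scalar, and a scalar has no `reset_index`. Leaving out `.sum()` keeps one count per row, which can then be assigned directly to a new column.

```python
import pandas as pd

# Hypothetical two-row sample, not the real dataset.
df = pd.DataFrame({
    "amenities": [
        '{TV,Wifi,"First aid kit"}',
        '{"Hair dryer"}',
    ]
})

# One count per row: still a Series aligned with df's index.
per_row = df["amenities"].str.count(",").add(1)

# .sum() reduces the whole Series to a single number,
# so Series/DataFrame methods such as reset_index are gone.
total = per_row.sum()
print(hasattr(per_row, "reset_index"))  # True
print(hasattr(total, "reset_index"))    # False

# Assigning the per-row Series creates the named column directly.
df["amenities_count"] = per_row
print(df)
```

With this, `reset_index(name=...)` is not needed at all, since the column name comes from the assignment.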
If possible, could you explain why this is not a good approach and what a better alternative would be?
Thank you for your time.
In response to Bernard:
Dataframe:
Apt Counties amenities
S1 C1 {TV, "Kitchen", "WiFi"}
S1 C1 {"Hair dryer"}
S2 C1 {"Heating", Essentials}
S2 C2 {"Cable", Kitchen, "WiFi"}
Output:
Apt Counties amenities amenities_counts
S1 C1 {TV, "Kitchen", "WiFi"} 3
S1 C1 {"Hair dryer"} 1
S2 C1 {"Heating", Essentials} 2
S2 C2 {"Cable", Kitchen, "WiFi"} 3
As in the example: count the ',' characters, add 1, and assign the result to a new column.
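The desired output above could be sketched roughly as follows, rebuilding the four sample rows by hand. The guard for an empty `{}` is an assumption added for illustration, since "commas plus 1" would report 1 for a row with no amenities; it is not part of the sample data.

```python
import pandas as pd

# The four sample rows from the example above, rebuilt by hand.
df = pd.DataFrame({
    "Apt": ["S1", "S1", "S2", "S2"],
    "Counties": ["C1", "C1", "C1", "C2"],
    "amenities": [
        '{TV, "Kitchen", "WiFi"}',
        '{"Hair dryer"}',
        '{"Heating", Essentials}',
        '{"Cable", Kitchen, "WiFi"}',
    ],
})

# Count commas per row, add 1, and store the result as a new column.
df["amenities_counts"] = df["amenities"].str.count(",").add(1)

# Assumed edge case: "{}" has no commas but also no amenities, so set it to 0.
is_empty = df["amenities"].str.strip("{} ").eq("")
df.loc[is_empty, "amenities_counts"] = 0

print(df)
```

Note that this relies on amenity names never containing a comma themselves; if they can, the strings would need to be parsed rather than counted.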