How do I map a variable using a nested dictionary in pandas?

Posted 2024-05-17 23:29:11


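I searched previous posts and couldn't find anything that helps with what I'm trying to do. I'm also new to Python.

Essentially, I want to find the most streamlined way to create multiple variables from a single variable.

Say my data looks like this: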

CaseNumber   Offense
ABC123       1      
ABC123       1
ABC124       24
ABC124       62
ABC125       12
ABC126       10

I'd like to know how to create variables using a nested dictionary, like so:

offense_variable = { 'Traffic': {1: 1},
'Violence': {24: 1},
'DUI': {62: 1},
'Theft': {12: 1},
'Drugs': {10: 1}
}

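and then use the map function to create the values of the 'Traffic', 'Violence', etc. variables from the keys in Offense.

Thanks, everyone!

Edit: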

The goal is essentially to turn this: 

CaseNumber   Offense
ABC123       1      
ABC123       1
ABC124       24
ABC124       62
ABC125       12
ABC126       10

into this:

CaseNumber   Offense   Traffic   Violence    DUI    Theft    Drugs   Flag    
ABC123       1           1         0          0       0       0        1
ABC123       1           1         0          0       0       0        1
ABC124       24          0         1          0       0       0        1 
ABC124       62          0         0          1       0       0        1
ABC125       12          0         0          0       1       0        0
ABC126       10          0         0          0       0       1        1

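I'd also like some added functionality to include other dummy flags. For instance, if the last column, 'Flag', is 1, then Theft would also be 1, in addition to the case where Offense = 12.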


1 Answer
User
Answer #1 · posted 2024-05-17 23:29:11

Here is a reply revised according to the OP's comment:

from io import StringIO
import pandas as pd

data = '''CaseNumber   Offense
ABC123       1      
ABC123       1
ABC124       24
ABC124       62
ABC125       12
ABC126       10
'''
# create data frame
df = pd.read_csv(StringIO(data), sep=r'\s+', engine='python')

# create dict of dict
offense_variable = { 'Traffic': {1: 1}, 'Violence': {24: 1},
    'DUI': {62: 1}, 'Theft': {12: 1}, 'Drugs': {10: 1} }

# flatten the offense_variable from nested dicts to ordinary dict
ov = { num: desc
      for desc, vs in offense_variable.items()
      for num in vs }

# use flattened dict to convert Offense (number) to desc (string)
df['offense_desc'] = df['Offense'].map(ov)

# use `pd.get_dummies()` for one-hot encoding; dtype=int yields 0/1 columns
# (pandas 2.x returns booleans by default)
df = pd.concat([df, pd.get_dummies(df['offense_desc'], dtype=int)], axis=1)

print(df)

  CaseNumber  Offense offense_desc  DUI  Drugs  Theft  Traffic  Violence
0     ABC123        1      Traffic    0      0      0        1         0
1     ABC123        1      Traffic    0      0      0        1         0
2     ABC124       24     Violence    0      0      0        0         1
3     ABC124       62          DUI    1      0      0        0         0
4     ABC125       12        Theft    0      0      1        0         0
5     ABC126       10        Drugs    0      1      0        0         0
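
As an alternative that stays closer to what the question asks for, the nested dictionary can be used directly, with one `Series.map` call per inner dict. This is only a minimal sketch, assuming every inner dict maps offense codes to 1 and that codes missing from an inner dict should become 0. It does not handle the 'Flag' rule from the edit, since that rule is not fully specified in the question.

```python
from io import StringIO
import pandas as pd

data = """CaseNumber Offense
ABC123 1
ABC123 1
ABC124 24
ABC124 62
ABC125 12
ABC126 10
"""
df = pd.read_csv(StringIO(data), sep=r"\s+")

# nested dict from the question: column name -> {offense code: value}
offense_variable = {
    "Traffic": {1: 1},
    "Violence": {24: 1},
    "DUI": {62: 1},
    "Theft": {12: 1},
    "Drugs": {10: 1},
}

for name, codes in offense_variable.items():
    # Series.map gives NaN for codes absent from the inner dict,
    # so fill those with 0 and cast back to integers
    df[name] = df["Offense"].map(codes).fillna(0).astype(int)

print(df)
```

Because the columns are created in the dictionary's insertion order, they come out as Traffic, Violence, DUI, Theft, Drugs, which matches the layout in the question, and an inner dict with several codes (for example `{1: 1, 2: 1}`) would work the same way.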
