我想使用努比。哪里还有字典。你知道吗
目前我正在尝试这个使用只是努比。哪里如果我有很多类别的话,我的代码就会增加很多。我想使用字典创建一个地图,然后在中使用该地图努比。哪里. 你知道吗
示例数据帧:
dataF = pd.DataFrame({'TITLE':['CEO','CHIEF EXECUTIVE','EXECUTIVE OFFICER','FOUNDER',
'CHIEF OP','TECH OFFICER','CHIEF TECH','VICE PRES','PRESIDENT','PRESIDANTE','OWNER','CO OWNER',
'DIRECTOR','MANAGER',np.nan]})
dataF
TITLE
0 CEO
1 CHIEF EXECUTIVE
2 EXECUTIVE OFFICER
3 FOUNDER
4 CHIEF OP
5 TECH OFFICER
6 CHIEF TECH
7 VICE PRES
8 PRESIDENT
9 PRESIDANTE
10 OWNER
11 CO OWNER
12 DIRECTOR
13 MANAGER
14 NaN
Numpy操作
dataF['TITLE_GRP'] = np.where(dataF['TITLE'].isna(),'NOTAVAILABLE',
np.where(dataF['TITLE'].str.contains('CEO|CHIEF EXECUTIVE|EXECUTIVE OFFICER|FOUN'),'CEO_FOUNDER',
np.where(dataF['TITLE'].str.contains('CHIEF|OFFICER|^CFO$|^COO$|^CIO$|^CTO$|^CMO$'),'OTHER_OFFICERS',
np.where(dataF['TITLE'].str.contains('VICE|VP'),'VP',
np.where(dataF['TITLE'].str.contains('PRESIDENT|PRES'),'PRESIDENT',
np.where(dataF['TITLE'].str.contains('OWNER'),'OWNER_CO_OWN',
np.where(dataF['TITLE'].str.contains('MANAGER|GM|MGR|MNGR|DIR|HEAD|CHAIR'),'DIR_MGR_HEAD'
,dataF['TITLE'])))))))
转换的数据
TITLE TITLE_GRP
0 CEO CEO_FOUNDER
1 CHIEF EXECUTIVE CEO_FOUNDER
2 EXECUTIVE OFFICER CEO_FOUNDER
3 FOUNDER CEO_FOUNDER
4 CHIEF OP OTHER_OFFICERS
5 TECH OFFICER OTHER_OFFICERS
6 CHIEF TECH OTHER_OFFICERS
7 VICE PRES VP
8 PRESIDENT PRESIDENT
9 PRESIDANTE PRESIDENT
10 OWNER OWNER_CO_OWN
11 CO OWNER OWNER_CO_OWN
12 DIRECTOR DIR_MGR_HEAD
13 MANAGER DIR_MGR_HEAD
14 NaN NOTAVAILABLE
我想做的是创建一些映射,如下所示:
TITLE_REPLACE = {'CEO_FOUNDER':'CEO|CHIEF EXECUTIVE|EXECUTIVE OFFICER|FOUN',
'OTHER_OFFICERS':'CHIEF|OFFICER|^CFO$|^COO$|^CIO$|^CTO$|^CMO$',
'VP':'VICE|VP',
'PRESIDENT':'PRESIDENT|PRES',
'OWNER_CO_OWN':'OWNER',
'DIR_MGR_HEAD':'MANAGER|GM|MGR|MNGR|DIR|HEAD|CHAIR'}
然后把它交给一个函数,这个函数应用了逐步numpy运算,得到了与上面相同的结果。你知道吗
我这样做,我必须参数化我的代码,这样所有的参数数据操作将提供一个json文件。你知道吗
我在努力1.更换因为它有字典功能,但是它没有像嵌套中那样保留层次结构np.哪里,它也不能替换整个标题,因为它只是在找到匹配项时替换字符串。你知道吗
如果您能够为上述问题提供解决方案,我还想知道如何解决以下两种其他情况:
dataF['INDUSTRY'] = np.where(dataF['INDUSTRY'].isin(['AEROSPACE','AGRICULTURE/MINING','EDUCATION','ENERGY']),'AER_AGR_MIN_EDU_ENER',
np.where(dataF['INDUSTRY'].isin(['TRAVEL','INSURANCE','GOVERNMENT','FINANCIAL SERVICES','AUTO','PHARMACEUTICALS']),'TRA_INS_GOVT_FIN_AUT_PHAR',
np.where(dataF['INDUSTRY'].isin(['BUSINESS GOODS/SERVICES','CHEMICALS ','TELECOM','TRANSPORTATION']),'BS_CHEM_TELE_TRANSP',
np.where(dataF['INDUSTRY'].isin(['CONSUMER GOODS','ENTERTAINMENT','FOOD AND BEVERAGE','HEALTHCARE','INDUSTRIAL/MANUFACTURING','TECHNOLOGY']),'CG_ENTER_FB_HLTH_IND_TECH',
np.where(dataF['INDUSTRY'].isin(['ADVERTISING','ASSOCIATION','CONSULTING/ACCOUNTING','PUBLISHING/MEDIA','TECHNOLOGY']),'ADV_ASS_CONS_ACC_PUBL_MED_TECH',
np.where(dataF['INDUSTRY'].isin(['RESTAURANT','SOFTWARE']),'REST_SOFT',
'NOTAVAILABLE'))))))
dataF['annual_revn'] = np.where(dataF['annual_revn'].between(1000000,10000000),'1_10_MILLION',
np.where(dataF['annual_revn'].between(10000000,15000000),'10_15_MILLION',
np.where(dataF['annual_revn'].between(15000000,20000000),'15_20_MILLION',
np.where(dataF['annual_revn'].between(20000000,50000000),'20_50_MILLION',
np.where(dataF['annual_revn'].between(50000000,1000000000),'50_1000_MILLION',
'NOTAVAILABLE_OUTLIER')))))
下面的方法是可行的,但它不是特别优雅,也可能没有那么快。你知道吗
对于您的其他场景,使用行业映射数据构造df可能是有意义的,然后执行
df.merge
从行业中确定分组相关问题 更多 >
编程相关推荐