Generating a hash table with 4 string keys / numeric values

Posted 2024-05-19 20:26:52


I have a CSV file that I am reading into a DataFrame:

ZoneMaterialName1,ZoneThickness1,ZoneMaterialName2,ZoneThickness2,ZoneMaterialName3,ZoneThickness3,ZoneMaterialName4,ZoneThickness4
Copper,2.5,Silver,5,Gold,12,Selenium,6
Copper,2.5,Silver,5,Gold,12,Selenium,6
Copper,2,Silver,8,Gold,2,Selenium,3
Aluminium,3,Sodium,14,,,Titanium,5
Aluminium,13,Sodium,5,,,Titanium,15

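I want to generate a hash table whose keys are the unique combinations of the four ZoneMaterialName fields and whose values are the corresponding numeric ZoneThickness fields.

Example of the expected output: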

Copper,Silver,Gold,Selenium:[[2.5,5,12,6],[2,8,2,3]]
Aluminium,Sodium,,Titanium:[[3,14,,5],[13,5,,15]]

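If the same four values appear again for a given key, the repeat should be ignored so that only unique values are kept.

Sometimes a row may contain empty fields, but as shown above, these should still be taken into account in both the keys and the values of the hash table.

I have not been able to do this effectively: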

import pandas as pd
import numpy as np
df = pd.read_csv('/mnt/c/python_test/Materials.csv')
myfilter = ~df.ZoneMaterialName1.duplicated(keep='first') & \
           ~df.ZoneMaterialName2.duplicated(keep='first') & \
           ~df.ZoneMaterialName3.duplicated(keep='first') & \
           ~df.ZoneMaterialName4.duplicated(keep='first')
df.loc[myfilter, 'uniqueID'] = np.arange(myfilter.sum(), dtype='int')
print(df)

I am new to pandas, so any help or guidance is much appreciated.


1 answer
User
Reply #1 · Posted 2024-05-19 20:26:52
import pandas as pd
import numpy as np
df = pd.read_csv('/mnt/c/python_test/Materials.csv')

# replace nan with 'NA' for material names so they are not excluded from groupby
df[['ZoneMaterialName1','ZoneMaterialName2','ZoneMaterialName3','ZoneMaterialName4']] =df[['ZoneMaterialName1','ZoneMaterialName2','ZoneMaterialName3','ZoneMaterialName4']].fillna('NA')

# Get the list of all thickness values for each row
df['combined'] = df.apply(lambda row: [row['ZoneThickness1'],row['ZoneThickness2'],row['ZoneThickness3'],row['ZoneThickness4']], axis=1)

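# Group by the material-name columns, making a list of lists of thicknesses;
# assign the result back so the de-duplication step below operates on the groups
df = df.groupby(['ZoneMaterialName1','ZoneMaterialName2','ZoneMaterialName3','ZoneMaterialName4'])['combined'].apply(list).reset_index()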

# Get rid of duplicates
df['combined'] = df['combined'].apply(lambda x: set(tuple(i) for i in x))

Output

 ZoneMaterialName1 ZoneMaterialName2 ZoneMaterialName3 ZoneMaterialName4                                 combined
0         Aluminium            Sodium                NA          Titanium  {(3.0, 14, nan, 5), (13.0, 5, nan, 15)}
1            Copper            Silver              Gold          Selenium    {(2.0, 8, 2.0, 3), (2.5, 5, 12.0, 6)}
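
If a plain Python dict is wanted rather than a DataFrame, something along these lines might work. This is only a sketch and not part of the answer above: the helper name `build_thickness_table` is hypothetical, the sample rows from the question are loaded through `io.StringIO` in place of the real CSV file, empty material names are assumed to become empty strings in the key, and missing thicknesses are assumed to be stored as `None`.

```python
import io
import pandas as pd

# Sample rows copied from the question; io.StringIO stands in for the CSV file.
CSV_TEXT = """ZoneMaterialName1,ZoneThickness1,ZoneMaterialName2,ZoneThickness2,ZoneMaterialName3,ZoneThickness3,ZoneMaterialName4,ZoneThickness4
Copper,2.5,Silver,5,Gold,12,Selenium,6
Copper,2.5,Silver,5,Gold,12,Selenium,6
Copper,2,Silver,8,Gold,2,Selenium,3
Aluminium,3,Sodium,14,,,Titanium,5
Aluminium,13,Sodium,5,,,Titanium,15
"""

NAME_COLS = [f"ZoneMaterialName{i}" for i in range(1, 5)]
THICKNESS_COLS = [f"ZoneThickness{i}" for i in range(1, 5)]


def build_thickness_table(df):
    """Hypothetical helper: map each tuple of names to its unique thickness rows."""
    table = {}
    # Assumption: an empty material name is kept in the key as an empty string.
    name_rows = df[NAME_COLS].fillna("").itertuples(index=False, name=None)
    thickness_rows = df[THICKNESS_COLS].itertuples(index=False, name=None)
    for key, thicknesses in zip(name_rows, thickness_rows):
        # Assumption: a missing thickness is stored as None, others as floats.
        values = tuple(None if pd.isna(v) else float(v) for v in thicknesses)
        rows = table.setdefault(key, [])
        if values not in rows:  # skip thickness rows already seen for this key
            rows.append(values)
    return table


df = pd.read_csv(io.StringIO(CSV_TEXT))
table = build_thickness_table(df)
for key, rows in table.items():
    print(",".join(key), rows)
```

Unlike the set-based version, this keeps the thickness rows in the order they first appear in the file, and storing `None` instead of NaN sidesteps the fact that NaN does not compare equal to itself.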
