How do I convert a dictionary to a matrix in Python?

Published 2024-05-03 08:53:30

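I have a dictionary like this:

{device1: (news1, news2, …), device2: (news2, news4, …)}

How can I convert it into a two-dimensional 0-1 matrix in Python? It should look like this: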

         news1 news2 news3 news4
device1    1     1     0      0
device2    0     1     0      1
device3    1     0     0      1

3 Answers

Here is another option for converting dictionaries to a matrix:

# Load library
from sklearn.feature_extraction import DictVectorizer

# Our data: a list of dictionaries, one per row of the matrix
data_dict = [{'Red': 2, 'Blue': 4},
             {'Red': 4, 'Blue': 3},
             {'Red': 1, 'Yellow': 2},
             {'Red': 2, 'Yellow': 2}]
# Create DictVectorizer object
dictvectorizer = DictVectorizer(sparse=False)

# Convert dictionary into feature matrix
features = dictvectorizer.fit_transform(data_dict)
print(features)
#output
'''
[[4. 2. 0.]
 [3. 4. 0.]
 [0. 1. 2.]
 [0. 2. 2.]]
'''
# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
print(dictvectorizer.get_feature_names_out())
#output
'''
['Blue' 'Red' 'Yellow']
'''

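Here is some code that uses the numpy package to create a matrix (a 2D array). Note that we use an explicit, ordered list of names to fix the row order: before Python 3.7, dictionaries were not guaranteed to keep keys in insertion order, and even on newer versions an explicit list makes the row order independent of how the dictionary was built.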

import numpy as np

dataDict = {'device1':(1,1,0,1), 'device2':(0,1,0,1), 'device3':(1,0,0,1)}
orderedNames = ['device1','device2','device3']

dataMatrix = np.array([dataDict[i] for i in orderedNames])

print(dataMatrix)

The output is:

[[1 1 0 1]
 [0 1 0 1]
 [1 0 0 1]]
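
The array above starts from 0/1 tuples that are already known. To get from the question's structure (each device mapped to the news items it saw) to those rows, a minimal sketch could look like the following. The helper name `to_indicator_rows`, the sample data, and the explicit `columns` list are assumptions made for illustration; they are not part of the original answer.

```python
def to_indicator_rows(device_news, columns):
    """Return (devices, rows): one 0/1 row per device, devices in sorted order."""
    devices = sorted(device_news)
    rows = []
    for device in devices:
        seen = set(device_news[device])  # news items this device has
        rows.append([1 if name in seen else 0 for name in columns])
    return devices, rows


# Sample data assumed to match the table in the question
device_news = {'device1': ('news1', 'news2'),
               'device2': ('news2', 'news4'),
               'device3': ('news1', 'news4')}
# Explicit column order; news3 is listed even though no device has it
columns = ['news1', 'news2', 'news3', 'news4']

devices, rows = to_indicator_rows(device_news, columns)
for device, row in zip(devices, rows):
    print(device, row)
```

The resulting `rows` is a plain list of lists, so it can be passed to `np.array(rows)` exactly as in the answer above if a numpy array is needed.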

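I'm adding this because I think the earlier answers assume a different data structure from yours and don't directly solve your problem.

Assuming I have understood your data structure correctly, and that the names of the row indices in the matrix don't matter: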

from sklearn.feature_extraction import DictVectorizer

# Named device_news rather than dict, to avoid shadowing the built-in type
device_news = {'device1':['news1', 'news2'],
               'device2':['news2', 'news4'],
               'device3':['news1', 'news4']}

restructured = []

for key in device_news:
    data_dict = {}
    for news in device_news[key]:
        data_dict[news] = 1
    # news3 appears in no device's list, so add it explicitly to get its column
    data_dict['news3'] = 0
    restructured.append(data_dict)

#restructured should now look like
'''
[{'news1':1, 'news2':1, 'news3':0},
 {'news2':1, 'news4':1, 'news3':0},
 {'news1':1, 'news4':1, 'news3':0}]
'''

dictvectorizer = DictVectorizer(sparse=False)
features = dictvectorizer.fit_transform(restructured)

print(features)

#output (DictVectorizer sorts the columns by feature name by default)
'''
[[1. 1. 0. 0.]
 [0. 1. 0. 1.]
 [1. 0. 0. 1.]]
'''
# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
print(dictvectorizer.get_feature_names_out())
#output
'''
['news1' 'news2' 'news3' 'news4']
'''
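
As a possible alternative to restructuring the dictionary by hand, scikit-learn's `MultiLabelBinarizer` can build the same kind of 0-1 matrix directly from lists of labels. The sketch below is not from the original answer; it assumes the same sample data and that the full list of news names (`all_news`, a name introduced here) is known in advance, which is what allows an all-zero `news3` column to appear.

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Sample data assumed to match the answer above
device_news = {'device1': ['news1', 'news2'],
               'device2': ['news2', 'news4'],
               'device3': ['news1', 'news4']}

devices = sorted(device_news)                     # fixes the row order
all_news = ['news1', 'news2', 'news3', 'news4']   # assumed full column list

# classes= fixes both the set and the order of the columns
binarizer = MultiLabelBinarizer(classes=all_news)
matrix = binarizer.fit_transform([device_news[d] for d in devices])

print(devices)
print(matrix)
```

Each row of `matrix` corresponds to the device at the same position in `devices`, and each column to the entry at the same position in `all_news`. If `classes` is left out, the columns are inferred from the data, so a news item that no device has would get no column.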
