Encode all categorical elements in an array as binary elements

Posted 2024-09-20 03:54:07


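I'm taking a machine learning course, and I have an assignment to convert categorical values into binary values so that they are compatible with the algorithms I wrote earlier.

This is where I get the data from: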

https://archive.ics.uci.edu/ml/datasets/Mushroom

from numpy import genfromtxt
mushroom = genfromtxt('dane/mushroom.csv', delimiter=',', dtype=str)

features = mushroom[:, 1:23]  # columns 1..22: the categorical attributes
classes = mushroom[:, 0]      # column 0: the class label


#7. Attribute Information: (classes: edible=e, poisonous=p)
#     1. cap-shape:                bell=b,conical=c,convex=x,flat=f,
#                                  knobbed=k,sunken=s
#     2. cap-surface:              fibrous=f,grooves=g,scaly=y,smooth=s
#     3. cap-color:                brown=n,buff=b,cinnamon=c,gray=g,green=r,
#                                  pink=p,purple=u,red=e,white=w,yellow=y
#     4. bruises?:                 bruises=t,no=f
#     5. odor:                     almond=a,anise=l,creosote=c,fishy=y,foul=f,
#                                  musty=m,none=n,pungent=p,spicy=s
#     6. gill-attachment:          attached=a,descending=d,free=f,notched=n
#     7. gill-spacing:             close=c,crowded=w,distant=d
#     8. gill-size:                broad=b,narrow=n
#     9. gill-color:               black=k,brown=n,buff=b,chocolate=h,gray=g,
#                                  green=r,orange=o,pink=p,purple=u,red=e,
#                                  white=w,yellow=y
#    10. stalk-shape:              enlarging=e,tapering=t
#    11. stalk-root:               bulbous=b,club=c,cup=u,equal=e,
#                                  rhizomorphs=z,rooted=r,missing=?
#    12. stalk-surface-above-ring: fibrous=f,scaly=y,silky=k,smooth=s
#    13. stalk-surface-below-ring: fibrous=f,scaly=y,silky=k,smooth=s
#    14. stalk-color-above-ring:   brown=n,buff=b,cinnamon=c,gray=g,orange=o,
#                                  pink=p,red=e,white=w,yellow=y
#    15. stalk-color-below-ring:   brown=n,buff=b,cinnamon=c,gray=g,orange=o,
#                                  pink=p,red=e,white=w,yellow=y
#    16. veil-type:                partial=p,universal=u
#    17. veil-color:               brown=n,orange=o,white=w,yellow=y
#    18. ring-number:              none=n,one=o,two=t
#    19. ring-type:                cobwebby=c,evanescent=e,flaring=f,large=l,
#                                  none=n,pendant=p,sheathing=s,zone=z
#    20. spore-print-color:        black=k,brown=n,buff=b,chocolate=h,green=r,
#                                  orange=o,purple=u,white=w,yellow=y
#    21. population:               abundant=a,clustered=c,numerous=n,
#                                  scattered=s,several=v,solitary=y
#    22. habitat:                  grasses=g,leaves=l,meadows=m,paths=p,
#                                  urban=u,waste=w,woods=d

I have an array like this:

x   s   n 
x   s   y

I want to change the features like this:

x = 0, 0, 1
s = 0, 1, 0
n = 0, 1, 1
y = 1, 0, 0

Result:

 0  1  0,   0, 1, 0,   0, 1, 1
 0  1  0,   0, 1, 0,   1, 0, 0

For a course exercise this is fairly easy, so I don't need help.

Thanks in advance.


1 answer
User
#1 · Posted 2024-09-20 03:54:07
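from functools import reduce

import numpy as np

mushroom = np.genfromtxt('dane/mushroom.csv', delimiter=',', dtype=str)

features = mushroom[:, 1:23]  # columns 1..22: the categorical attributes
classes = mushroom[:, 0]      # column 0: class label (e = edible, p = poisonous)

def toBinaryFeatures(features):
    COLUMNS = features.shape[1]
    # Tag every value with its column index so the same letter in two columns stays distinct
    v = [x + str(i % COLUMNS) for i, x in enumerate(features.flatten())]
    l = features.tolist()
    uv = sorted(set(v))  # unique values of all features; sorted so the bit order is reproducible

    mv = {}  # mapping to unique powers of 2
    for i, x in enumerate(uv):
        mv[x] = 2**i

    # OR the powers of 2 of each row together: one integer with one bit set per feature value
    as_numbers = [reduce(lambda x, y: x | y, [mv[x + str(i)] for i, x in enumerate(row)]) for row in l]
    TO_BIN = "{0:0" + str(len(mv)) + "b}"  # zero-padded binary string, one digit per unique value
    flattened_features = [[int(char) for char in TO_BIN.format(number)] for number in as_numbers]
    return np.array(flattened_features)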
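
For comparison, the same kind of 0/1 matrix can probably be built more directly with per-column one-hot encoding. The sketch below is only an illustration under my own assumptions: the name one_hot_columns and the two-row sample array are made up for this example, and the output is a one-hot layout (one bit per distinct value in each column), not the fixed three-bit codes written in the question.

```python
import numpy as np

def one_hot_columns(features):
    """One-hot encode every column of a 2-D array of category codes.

    Hypothetical helper for illustration; not part of the original answer.
    """
    blocks = []
    for col in features.T:
        # categories: sorted unique values of this column
        # codes: for each row, the index of its value in categories
        categories, codes = np.unique(col, return_inverse=True)
        # Row k of the identity matrix is the one-hot vector for category k
        blocks.append(np.eye(len(categories), dtype=int)[codes.reshape(-1)])
    # Put the per-column blocks side by side
    return np.hstack(blocks)

# Sample rows taken from the question: (x, s, n) and (x, s, y)
sample = np.array([["x", "s", "n"],
                   ["x", "s", "y"]])
encoded = one_hot_columns(sample)
print(encoded)
```

Because np.unique returns the categories in sorted order, the column layout is the same on every run. A column that holds only one distinct value (the first two columns of the sample) turns into a single column of ones.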
