Change the dtype of a subset of values in a numpy array in place

2 answers

User

Answer 1 · edited 2024-10-01 15:40:15

You can't change the dtype in place.

In [59]: arr = np.array(list_of_lists)                                                         
In [60]: arr                                                                                   
Out[60]: 
array([['Africa', '1990', '0', '', '32.6'],
       ['Asia', '2006', '32.4', '5.5', '46.6'],
       ['Europe', '2011', '5.4', '', '55.4']], dtype='<U6')

The common dtype of the input is string.
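Here is a small sketch of what "one common dtype" means in practice. It uses the `list_of_lists` data from the question. The "Australia" assignment is my own illustration and does not come from the question. numpy chooses a single fixed-width unicode dtype for the whole array, so that width also limits what you can store later.

```python
import numpy as np

list_of_lists = [["Africa", "1990", "0", "", "32.6"],
                 ["Asia", "2006", "32.4", "5.5", "46.6"],
                 ["Europe", "2011", "5.4", "", "55.4"]]

arr = np.array(list_of_lists)
# One dtype for every element: unicode, as wide as the longest input string (6 chars)
print(arr.dtype)  # <U6

# The width is fixed as well: a longer string is silently truncated to 6 characters
arr[0, 0] = "Australia"
print(arr[0, 0])  # Austra
```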

Replacing "" with nan puts the string representation, 'nan', into the array:

In [62]: arr[arr == ""] = np.nan                                                                                       
In [63]: arr                                                                                   
Out[63]: 
array([['Africa', '1990', '0', 'nan', '32.6'],
       ['Asia', '2006', '32.4', '5.5', '46.6'],
       ['Europe', '2011', '5.4', 'nan', '55.4']], dtype='<U6')
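The following sketch checks that the stored value is really the string `'nan'`. It also shows that the string still parses back to a float NaN when you convert a copy of the column.

```python
import numpy as np

list_of_lists = [["Africa", "1990", "0", "", "32.6"],
                 ["Asia", "2006", "32.4", "5.5", "46.6"],
                 ["Europe", "2011", "5.4", "", "55.4"]]

arr = np.array(list_of_lists)
# np.nan is cast to its string form, because the array can only hold <U6 strings
arr[arr == ""] = np.nan
print(arr[0, 3])  # nan

# astype(float) makes a converted copy; the string "nan" parses back to float NaN
col3 = arr[:, 3].astype(float)
print(col3)
```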

Look at part of the underlying data buffer:

In [64]: arr.tobytes()                                                                         
Out[64]: b'A\x00\x00\x00f\x00\x00\x00r\x00\x00\x00i\x00\x00\x00c\x00\x00\x00a\x00\x00\x001\x00\x00\x009\x00\x00\x009\x00\x00\....'

You can see the actual characters, each stored as a 4-byte (UTF-32) code unit.
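The buffer layout can be decoded by hand, assuming a little-endian machine as in the transcript above. Each `<U6` element occupies 6 × 4 = 24 bytes. Shorter strings are padded with zero bytes.

```python
import numpy as np

list_of_lists = [["Africa", "1990", "0", "", "32.6"],
                 ["Asia", "2006", "32.4", "5.5", "46.6"],
                 ["Europe", "2011", "5.4", "", "55.4"]]

arr = np.array(list_of_lists)
print(arr.dtype, arr.itemsize)  # <U6 24

raw = arr.tobytes()
# First element ("Africa") fills all 6 slots of 4 bytes each
first = raw[0:arr.itemsize].decode("utf-32-le")
# Second element ("1990") is padded with two NUL characters up to the fixed width
second = raw[arr.itemsize:2 * arr.itemsize].decode("utf-32-le")
print(repr(first), repr(second))
```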

A slice of the array is a view, but an astype conversion is a new array with its own data buffer:

In [65]: arr[:,2:]                                                                             
Out[65]: 
array([['0', 'nan', '32.6'],
       ['32.4', '5.5', '46.6'],
       ['5.4', 'nan', '55.4']], dtype='<U6')
In [66]: arr[:,2:].astype(float)                                                               
Out[66]: 
array([[ 0. ,  nan, 32.6],
       [32.4,  5.5, 46.6],
       [ 5.4,  nan, 55.4]])
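`np.shares_memory` lets you confirm the view-versus-copy distinction directly. The sketch below reuses the question's data. `sub` and `floats` are just illustrative names.

```python
import numpy as np

list_of_lists = [["Africa", "1990", "0", "", "32.6"],
                 ["Asia", "2006", "32.4", "5.5", "46.6"],
                 ["Europe", "2011", "5.4", "", "55.4"]]

arr = np.array(list_of_lists)
arr[arr == ""] = np.nan

sub = arr[:, 2:]            # basic slicing: a view onto arr's buffer
floats = sub.astype(float)  # astype: a new array with its own float64 buffer

print(np.shares_memory(arr, sub))     # True
print(np.shares_memory(arr, floats))  # False
```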

You can't write Out[66] back into arr without converting it back to strings.
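The next sketch shows what happens if you try anyway. Slice assignment casts the floats back into the array's existing `<U6` dtype, so `arr` ends up holding strings again.

```python
import numpy as np

list_of_lists = [["Africa", "1990", "0", "", "32.6"],
                 ["Asia", "2006", "32.4", "5.5", "46.6"],
                 ["Europe", "2011", "5.4", "", "55.4"]]

arr = np.array(list_of_lists)
arr[arr == ""] = np.nan
floats = arr[:, 2:].astype(float)

# Writing the floats back casts them to strings; arr's dtype never changes
arr[:, 2:] = floats
print(arr.dtype)               # <U6
print(type(arr[1, 2]).__name__)
```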

You could create an object dtype array:

In [67]: arr = np.array(list_of_lists, dtype=object)                                           
In [68]: arr                                                                                   
Out[68]: 
array([['Africa', '1990', '0', '', '32.6'],
       ['Asia', '2006', '32.4', '5.5', '46.6'],
       ['Europe', '2011', '5.4', '', '55.4']], dtype=object)
In [69]: arr = np.array(list_of_lists, dtype=object)                                           
In [70]: arr[arr == ""] = np.nan                                                               
In [71]: arr                                                                                   
Out[71]: 
array([['Africa', '1990', '0', nan, '32.6'],
       ['Asia', '2006', '32.4', '5.5', '46.6'],
       ['Europe', '2011', '5.4', nan, '55.4']], dtype=object)
In [72]: arr[:,2:] = arr[:,2:].astype(float)                                                   
In [73]: arr                                                                                   
Out[73]: 
array([['Africa', '1990', 0.0, nan, 32.6],
       ['Asia', '2006', 32.4, 5.5, 46.6],
       ['Europe', '2011', 5.4, nan, 55.4]], dtype=object)

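The dtype is still object, but the types of the elements can change. That's because an object dtype array is a glorified (or debased) list. You gain some flexibility, but lose most of numpy's numeric speed.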
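This sketch inspects the element types to show the "glorified list" behaviour. Each cell holds an ordinary Python object. A reduction like `sum` therefore works, but it calls Python float addition one element at a time rather than running a compiled float64 loop.

```python
import numpy as np

list_of_lists = [["Africa", "1990", "0", "", "32.6"],
                 ["Asia", "2006", "32.4", "5.5", "46.6"],
                 ["Europe", "2011", "5.4", "", "55.4"]]

arr = np.array(list_of_lists, dtype=object)
arr[arr == ""] = np.nan
arr[:, 2:] = arr[:, 2:].astype(float)

print(arr.dtype)                 # object
print(type(arr[0, 0]).__name__)  # str
print(type(arr[0, 2]).__name__)  # float

# Works, but each addition is a Python-level float.__add__ call
total = arr[:, 4].sum()
print(total)
```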

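The structured array (compound dtype) shown in the other answer is another possibility. It's easy to produce such an array when loading a csv with np.genfromtxt. You still can't change dtypes in place, and you can't do math across the fields of a structured array.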
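Here is a rough sketch of the genfromtxt route. It assumes the data arrives as CSV text; an in-memory `io.StringIO` stands in for a real file. With `dtype=None`, genfromtxt infers a separate dtype per column and returns a 1-D structured array with default field names `f0`..`f4`. To do math across fields, one option is `numpy.lib.recfunctions.structured_to_unstructured`, which copies selected fields into an ordinary 2-D array first.

```python
import io

import numpy as np
import numpy.lib.recfunctions as rfn

# Hypothetical CSV text mirroring list_of_lists (an empty field is a missing value)
csv_text = """Africa,1990,0,,32.6
Asia,2006,32.4,5.5,46.6
Europe,2011,5.4,,55.4"""

# dtype=None: infer one dtype per column, giving a structured (compound) dtype
data = np.genfromtxt(io.StringIO(csv_text), delimiter=",", dtype=None, encoding="utf-8")
print(data.shape)        # one record per row, not a 2-D grid
print(data.dtype.names)

# Row-wise math across fields needs a plain 2-D array built from those fields
vals = rfn.structured_to_unstructured(data[["f2", "f4"]]).astype(float)
print(vals.mean(axis=1))
```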

pandas

In [153]: df = pd.DataFrame(list_of_lists)                                                     
In [154]: df                                                                                   
Out[154]: 
        0     1     2    3     4
0  Africa  1990     0       32.6
1    Asia  2006  32.4  5.5  46.6
2  Europe  2011   5.4       55.4
In [156]: df.info()                                                                            
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 5 columns):
0    3 non-null object
1    3 non-null object
2    3 non-null object
3    3 non-null object
4    3 non-null object
dtypes: object(5)
memory usage: 248.0+ bytes

Convert column dtypes:

In [158]: df[2]=df[2].astype(float)
In [162]: df[4]=df[4].astype(float)

Column 3 needs its empty strings converted to nan before it can be converted.

In [164]: df                                                                                   
Out[164]: 
        0     1     2    3     4
0  Africa  1990   0.0       32.6
1    Asia  2006  32.4  5.5  46.6
2  Europe  2011   5.4       55.4
In [165]: df.dtypes                                                                            
Out[165]: 
0     object
1     object
2    float64
3     object
4    float64
dtype: object
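Another option, which swaps `pd.to_numeric` in for `astype`, handles column 3 without a separate "" to nan step. With `errors="coerce"`, strings that don't parse as numbers (here the empty strings) become NaN. This is a sketch using the question's data.

```python
import pandas as pd

list_of_lists = [["Africa", "1990", "0", "", "32.6"],
                 ["Asia", "2006", "32.4", "5.5", "46.6"],
                 ["Europe", "2011", "5.4", "", "55.4"]]

df = pd.DataFrame(list_of_lists)
# Reassigning each column replaces it, so its dtype becomes float64
for col in [2, 3, 4]:
    df[col] = pd.to_numeric(df[col], errors="coerce")
print(df.dtypes)
```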

There are better pandas programmers here; I focus more on numpy.

User

Answer 2 · edited 2024-10-01 15:40:15

It seems you need a structured array to handle multiple dtypes.

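import numpy as np

list_of_lists = [["Africa", "1990", "0", "", "32.6"], ["Asia", "2006", "32.4", "5.5", "46.6"],
                 ["Europe", "2011", "5.4", "", "55.4"]]

temp = np.array(list_of_lists)
temp[temp == ''] = 0  # empty strings become '0' so every value field parses as a float

# np.float was removed in NumPy 1.24; the builtin float (float64) is the replacement
dtypes = np.dtype([('name', 'S10'),
    ('val1', float),
    ('val2', float),
    ('val3', float),
    ('val4', float)])

# Each row becomes one record; string values are converted to each field's dtype
array = np.array(list(map(tuple, temp)), dtype=dtypes)

# Now you can modify the structured array
array[['val3', 'val4']] = 20
array[0]['name'] = 'Australia'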

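The question is whether you can pretend these are columns, and the answer is no: it's just a structure with shape (3,). I'd suggest switching to a pandas DataFrame.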
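The "shape (3,)" point is easy to check. In this sketch, `records` and its `U10` name field are my own choices. Field access returns a 1-D array that looks like a column. There is no second axis, though, so 2-D column indexing fails.

```python
import numpy as np

dtypes = np.dtype([("name", "U10"), ("val1", float), ("val2", float),
                   ("val3", float), ("val4", float)])
records = np.array([("Africa", 1990, 0, 0, 32.6),
                    ("Asia", 2006, 32.4, 5.5, 46.6),
                    ("Europe", 2011, 5.4, 0, 55.4)], dtype=dtypes)

print(records.shape)    # (3,): one element per row, not a 3x5 grid
print(records["val4"])  # a field is a 1-D array that behaves like a column

indexing_failed = False
try:
    records[:, 4]       # there is no second axis to index into
except IndexError:
    indexing_failed = True
print(indexing_failed)
```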

import pandas as pd

array = pd.DataFrame(list_of_lists)
array.replace('', '0', inplace=True)  # empty strings become '0' so they parse as floats
array[array.columns[2:]] = array[array.columns[2:]].astype(float)

array.dtypes

# 0 object
# 1 object
# 2 float64
# 3 float64
# 4 float64
# dtype: object
