Is the object dtype static in Python?

Posted 2024-09-30 01:30:17

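I have a DataFrame with two columns: one contains strings and the other contains integers. As expected, the integer column has dtype int64. The string column, however, has dtype object.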

Now I want to convert the string column into an integer column by assigning a given integer to each string. I do this as follows:

from pandas import DataFrame

# Create a data frame with two columns:
# - `catCol` represents categorical data and consists of strings
# - `intCol` represents numerical data and consists of integers
myList = {'catCol': ['NM', 'VT', 'VA', 'NY', 'VA'], 'intCol': [3, 6, 10, -1, 0]}
df = DataFrame(myList)

print('Before the mapping:')
print(df)
print('Data type of `catCol`:', df['catCol'].dtype)
print('Data type of a `catCol` element:', type(df['catCol'][3]))
print('Data type of `intCol`:', df['intCol'].dtype)
print('Data type of a `intCol` elements:', type(df['intCol'][3]))

# Replace the values of the categorical column with unique integer IDs.
fromList = df['catCol'].unique()
toList = list(range(len(fromList)))

for idx in range(len(fromList)):
    df.loc[df['catCol'] == fromList[idx], 'catCol'] = toList[idx]

print()
print('After the mapping:')
print(df)    
print('Data type of `catCol`:', df['catCol'].dtype)
print('Data type of a `catCol` element:', type(df['catCol'][3]))
print('Data type of `intCol`:', df['intCol'].dtype)
print('Data type of a `intCol` elements:', type(df['intCol'][3]))

The output is:

Before the mapping:
  catCol  intCol
0     NM       3
1     VT       6
2     VA      10
3     NY      -1
4     VA       0
Data type of `catCol`: object
Data type of a `catCol` element: <class 'str'>
Data type of `intCol`: int64
Data type of a `intCol` elements: <class 'numpy.int64'>

After the mapping:
  catCol  intCol
0      0       3
1      1       6
2      2      10
3      3      -1
4      2       0
Data type of `catCol`: object
Data type of a `catCol` element: <class 'int'>
Data type of `intCol`: int64
Data type of a `intCol` elements: <class 'numpy.int64'>

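Here is the question: if the converted catCol now contains only integers, why is its dtype still object? I need it to have an integer dtype, just like intCol. How can I fix this without using any casts?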
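
The behaviour described in the question can be reproduced in a few lines. This is a minimal sketch, assuming current pandas behaviour: writing values into an existing object column through a boolean mask stores them in place and does not re-infer the column's dtype, whereas building a new Series from the same values does. The names `labels` and `rebuilt` are illustrative and do not appear in the original post.

```python
import pandas as pd

# Illustrative object-dtype column; the dtype is set explicitly so the
# sketch does not depend on the default string dtype of the pandas version.
labels = pd.Series(['NM', 'VT', 'VA', 'NY', 'VA'], dtype=object)

# Element-wise replacement through a boolean mask, as in the question.
for code, name in enumerate(['NM', 'VT', 'VA', 'NY']):
    labels[labels == name] = code

print(labels.dtype)     # the container dtype is unchanged: object
print(labels.tolist())  # [0, 1, 2, 3, 2]

# Constructing a new Series lets pandas infer the dtype from the values.
rebuilt = pd.Series(labels.tolist())
print(rebuilt.dtype)    # an integer dtype
```

The object dtype describes the column as a container of arbitrary Python objects, so replacing its elements one group at a time leaves that description in place even when every element ends up being an int.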


1 answer
User
#1 · Posted 2024-09-30 01:30:17

In this case, I would use the map() method:

In [84]: toList = pd.Series(range(len(df['catCol'].unique())), index=df['catCol'].unique())

In [85]: toList
Out[85]:
NM    0
VT    1
VA    2
NY    3
dtype: int32

In [86]: df.catCol.map(toList)
Out[86]:
0    0
1    1
2    2
3    3
4    2
Name: catCol, dtype: int32

In [87]: df['catCol'] = df.catCol.map(toList)

In [88]: df
Out[88]:
   catCol  intCol
0       0       3
1       1       6
2       2      10
3       3      -1
4       2       0

In [89]: df.dtypes
Out[89]:
catCol    int32
intCol    int64
dtype: object
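
The session above can be condensed into a standalone script. This is a sketch that assumes pandas is imported as `pd`; the names `codes_by_label`, `alt_codes` and `alt_uniques` are illustrative. `pd.factorize` is shown only as an alternative that is not part of the original answer; it also numbers labels in order of first appearance.

```python
import pandas as pd

df = pd.DataFrame({'catCol': ['NM', 'VT', 'VA', 'NY', 'VA'],
                   'intCol': [3, 6, 10, -1, 0]})

# Lookup table: each unique label maps to its position of first appearance.
uniques = df['catCol'].unique()
codes_by_label = pd.Series(range(len(uniques)), index=uniques)

# Alternative (an assumption, not in the original answer): factorize returns
# the integer codes and the unique labels in a single call.
alt_codes, alt_uniques = pd.factorize(df['catCol'])

# map() builds a new column, so its dtype is inferred from the integer codes.
df['catCol'] = df['catCol'].map(codes_by_label)

print(df.dtypes)
print(df['catCol'].tolist())  # [0, 1, 2, 3, 2]
print(alt_codes.tolist())     # [0, 1, 2, 3, 2]
```

Because the whole column is replaced by the result of map(), no explicit cast is needed. The exact integer width (int32 in the session above, int64 on many current setups) depends on the platform and library versions.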
