Adding a column with a single categorical value to a DataFrame

User
Reply #1 · edited 2024-10-03 13:27:22

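This solution definitely handles the first point, though not necessarily the second:
df['col'] = pd.Categorical(('hello' for _ in range(len(df))))
In essence:
First we create a generator that yields 'hello' once for each record in df (note that range(len(df)) is required; len(df) alone is an int and cannot be iterated).
Then we pass it to pd.Categorical, which turns it into a categorical column.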
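For anyone who wants to run this end to end, here is a minimal sketch of the same idea. The sample frame and its column name 'a' are assumptions made for illustration, and a plain list is used in place of the generator to keep the input unambiguous:

```python
import pandas as pd

# Hypothetical sample frame, not part of the original answer.
df = pd.DataFrame({'a': [1, 2, 3]})

# One 'hello' per row, wrapped in a Categorical.
# A list is used here instead of a generator; it allocates a
# temporary list of len(df) references to the same string.
df['col'] = pd.Categorical(['hello'] * len(df))

print(df.dtypes)
print(df['col'].cat.categories)
```

The column ends up with a single category, 'hello', and every row points at it.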

User
Reply #2 · edited 2024-10-03 13:27:22

A simple way is to create the new column with df.assign, then change its dtype to category with df.astype and a dict that maps specific columns to dtypes:
df = df.assign(col="hello").astype({'col':'category'})
df.dtypes

A         int64
col    category
dtype: object

This way you don't have to build a sequence as long as the DataFrame. The input string is broadcast directly, which saves some time and memory.
As you can see, this approach scales well. You can assign as many columns as you need, some of them computed by more complex functions, and then set the dtypes you want for them:
df = pd.DataFrame({'A':[1,2,3,4]})
df = (df.assign(col1 = 'hello',                    #Define column based on series or broadcasting
                col2 = lambda x:x['A']**2,         #Define column based on existing columns
                col3 = lambda x:x['col2']/x['A'])  #Define column based on previously defined columns
        .astype({'col1':'category', 'col2':'float'}))
print(df)
print(df.dtypes)

   A   col1  col2  col3
0  1  hello   1.0   1.0
1  2  hello   4.0   2.0
2  3  hello   9.0   3.0
3  4  hello  16.0   4.0

A          int64
col1    category  #<-changed dtype
col2     float64  #<-changed dtype
col3     float64
dtype: object
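As a rough way to check the memory side of this, the sketch below adds the same constant column twice, once left as plain strings and once converted to category, and compares their sizes. The row count and the column names plain and cat are assumptions chosen for illustration, and the exact byte counts depend on the pandas version:

```python
import pandas as pd

n = 100_000  # hypothetical row count, chosen only for illustration
df = pd.DataFrame({'A': range(n)})

# Same broadcast value twice; only 'cat' is converted to category.
df = df.assign(plain='hello', cat='hello').astype({'cat': 'category'})

# deep=True counts the string payload, index=False ignores the shared index.
plain_bytes = df['plain'].memory_usage(index=False, deep=True)
cat_bytes = df['cat'].memory_usage(index=False, deep=True)

print(f"plain strings: {plain_bytes} bytes")
print(f"category:      {cat_bytes} bytes")
```

The category column stores one small integer code per row plus a single copy of 'hello', so it should come out noticeably smaller.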

User
Reply #3 · edited 2024-10-03 13:27:22

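Instead of letting __setitem__ build the column implicitly and converting it afterwards, we can explicitly construct a Series of the right size and dtype: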

df['col'] = pd.Series('hello', index=df.index, dtype='category')

Example program:

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3]})

df['col'] = pd.Series('hello', index=df.index, dtype='category')

print(df)
print(df.dtypes)
print(df['col'].cat.categories)

   a    col
0  1  hello
1  2  hello
2  3  hello

a         int64
col    category
dtype: object

Index(['hello'], dtype='object')
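The index=df.index argument matters once the DataFrame does not use the default 0..n-1 index, because assigning a Series aligns on index labels. A small sketch, assuming a hypothetical frame labelled 'x', 'y', 'z' and an extra column named bad for contrast:

```python
import pandas as pd

# Hypothetical frame with a non-default index.
df = pd.DataFrame({'a': [1, 2, 3]}, index=['x', 'y', 'z'])

# Sharing df's index: every row receives the value.
df['col'] = pd.Series('hello', index=df.index, dtype='category')

# Default RangeIndex (0, 1, 2): no label matches 'x', 'y', 'z',
# so alignment leaves every row of the new column missing.
misaligned = pd.Series(['hello'] * len(df), dtype='category')
df['bad'] = misaligned

print(df)
print(df.dtypes)
```

Passing the frame's own index to the Series constructor avoids this silent misalignment.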
