Encoding a column as categorical values - Q&A - Python中文网


Posted 2024-10-03 13:19:44


I have the following DataFrame:

import pandas as pd

d = {'item': [1, 2, 3, 4, 5, 6], 'time': [1297468800, 1297468809, 1297468801, 1297468890, 1297468820, 1297468805]}
df = pd.DataFrame(data=d)

df looks like this:

   item         time
0     1   1297468800
1     2   1297468809
2     3   1297468801
3     4   1297468890
4     5   1297468820
5     6   1297468805

Here time is a Unix timestamp. My goal is to replace the time column in the DataFrame.

For example, given

mintime = 1297468800
maxtime = 1297468890

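I want to split the range from mintime to maxtime into 10 intervals (the count should be adjustable through a parameter, e.g. 20 intervals) and recode the time column of df with the number of the interval each value falls into. For example: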

   item         time
0     1          1
1     2          1
2     3          1
3     4          9
4     5          3
5     6          1

Since I have hundreds of millions of records, what is the most efficient way to do this? Thanks.


1 answer

User

Answer 1 · posted 2024-10-03 13:19:44

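You can use pd.cut together with np.linspace to specify the bin edges. This encodes the column as an ordered categorical, from which you can then extract the integer codes: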

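import numpy as np

# 10 evenly spaced edges; start just below the minimum so it lands in the first interval
bins = np.linspace(df.time.min() - 1, df.time.max(), 10)
# right=True makes each interval (a, b]; codes start at 0, so add 1
df['time'] = pd.cut(df.time, bins=bins, right=True).cat.codes + 1
df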

   item  time
0     1     1
1     2     1
2     3     1
3     4     9
4     5     3
5     6     1
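Note that passing 10 to np.linspace produces 10 edges and therefore 9 intervals, which is why the largest code above is 9. If exactly n intervals are wanted, one option is to request n + 1 edges. The sketch below wraps this in a helper; the name encode_time_bins and its n_bins parameter are made up for illustration and are not part of pandas.

```python
import numpy as np
import pandas as pd


def encode_time_bins(times, n_bins=10):
    """Return 1-based interval numbers for a Series of timestamps.

    Hypothetical helper: n_bins + 1 edges define exactly n_bins intervals.
    """
    # Shift the lower edge by 1 so the minimum falls inside the first (a, b] interval.
    edges = np.linspace(times.min() - 1, times.max(), n_bins + 1)
    # cat.codes is 0-based, so add 1 to number the intervals from 1.
    return pd.cut(times, bins=edges, right=True).cat.codes + 1


df = pd.DataFrame({
    'item': [1, 2, 3, 4, 5, 6],
    'time': [1297468800, 1297468809, 1297468801,
             1297468890, 1297468820, 1297468805],
})

codes_9 = encode_time_bins(df['time'], n_bins=9)    # same binning as the answer above
codes_10 = encode_time_bins(df['time'], n_bins=10)  # ten intervals, largest code is 10
```

With n_bins=9 this reproduces the output shown above; with n_bins=10 the value 1297468809 moves into the second interval because the intervals become narrower.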

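Alternatively, depending on how you want to handle the interval edges, you can make the intervals closed on the left instead (run this against the original time column):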

bins = np.linspace(df.time.min(), df.time.max() + 1, 10)
pd.cut(df.time, bins=bins, right=False).cat.codes + 1

0    1
1    1
2    1
3    9
4    2
5    1
dtype: int8
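Because the question mentions a very large number of rows, it may be worth trying a plain NumPy version that skips building a Categorical. The sketch below swaps pd.cut for np.searchsorted on the same edges; bin_codes and its closed parameter are hypothetical names chosen for this example. Whether it is actually faster depends on the data and the pandas version, so time both on a sample first.

```python
import numpy as np


def bin_codes(values, n_bins=10, closed="right"):
    """Return 1-based interval numbers using np.searchsorted (illustrative helper)."""
    values = np.asarray(values)
    lo, hi = values.min(), values.max()
    if closed == "right":
        # Intervals (a, b]: side="left" returns i with edges[i-1] < v <= edges[i].
        edges = np.linspace(lo - 1, hi, n_bins + 1)
        return np.searchsorted(edges, values, side="left")
    # Intervals [a, b): side="right" returns i with edges[i-1] <= v < edges[i].
    edges = np.linspace(lo, hi + 1, n_bins + 1)
    return np.searchsorted(edges, values, side="right")


times = np.array([1297468800, 1297468809, 1297468801,
                  1297468890, 1297468820, 1297468805])

right_closed = bin_codes(times, n_bins=9, closed="right")
left_closed = bin_codes(times, n_bins=9, closed="left")
```

With 9 intervals the two calls give the same codes as the two pd.cut variants above, including the difference at 1297468820, which falls in interval 3 with right-closed edges and interval 2 with left-closed edges. The result can be assigned back with df['time'] = bin_codes(df['time'].to_numpy(), n_bins=9).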
