Dask + Pandas: Returning a series of conditional dummies

2 Answers

User

Answer 1 · edited 2024-09-27 23:26:28

This also worked for me:

data['ebt_dummy'] = dd.from_array(np.where((df["paymenttypeid"]=='ebt'), 1, 0))

User

Answer 2 · edited 2024-09-27 23:26:28

Here is some sample data:

In [1]:
df = pd.DataFrame(np.transpose([np.random.choice(['ebt','other'], (10)),
              np.random.rand(10)]), columns=['paymenttypeid','other'])

df

Out[1]:

  paymenttypeid                 other
0         other    0.3130770966143612
1         other    0.5167434068096931
2           ebt    0.7606898392115471
3           ebt    0.9424572692382547
4           ebt     0.624282017575857
5           ebt    0.8584841824784487
6         other    0.5017083765654611
7         other  0.025994123211164233
8           ebt   0.07045354449612984
9           ebt   0.11976351556850084

Let's convert it to a Dask dataframe:

In [2]:
[code cell missing from the source]

Then use apply (on the series) to assign the new column:

In [3]:
# meta describes the output Series: its name and dtype (the lambda returns integers)
data['ebt_dummy'] = data.paymenttypeid.apply(lambda x: 1 if x == 'ebt' else 0, meta=('ebt_dummy', 'int64'))
data.compute()

Out[3]:
  paymenttypeid                 other  ebt_dummy
0         other    0.3130770966143612          0
1         other    0.5167434068096931          0
2           ebt    0.7606898392115471          1
3           ebt    0.9424572692382547          1
4           ebt     0.624282017575857          1
5           ebt    0.8584841824784487          1
6         other    0.5017083765654611          0
7         other  0.025994123211164233          0
8           ebt   0.07045354449612984          1
9           ebt   0.11976351556850084          1

Update:

The meta you passed seems to be the problem, because this works:

data = data.map_partitions(lambda df: df.assign(
                                    ebt_dummy = np.where((df["paymenttypeid"]=='ebt'), 1, 0)))

data.compute()

In my example, if I wanted to specify meta, I had to pass the dtypes of the current data, not the dtypes I expected after the assignment:

data.map_partitions(lambda df: df.assign(
                                    ebt_dummy = np.where((df["paymenttypeid"]=='ebt'), 1, 0)), 
               meta={'paymenttypeid': 'str', 'other': 'float64'})
