Dask + Pandas: returning a series of conditional dummy variables



1 Answer

Anonymous, 1 day ago

Here is some sample data:
<pre><code>In [1]: import numpy as np
        import pandas as pd
        df = pd.DataFrame(np.transpose([np.random.choice(['ebt', 'other'], (10)),
                                        np.random.rand(10)]),
                          columns=['paymenttypeid', 'other'])
        df
Out[1]:
  paymenttypeid                 other
0         other    0.3130770966143612
1         other    0.5167434068096931
2           ebt    0.7606898392115471
3           ebt    0.9424572692382547
4           ebt     0.624282017575857
5           ebt    0.8584841824784487
6         other    0.5017083765654611
7         other  0.025994123211164233
8           ebt   0.07045354449612984
9           ebt   0.11976351556850084
</code></pre>
Let's convert it to a Dask DataFrame (<code>data</code>) and use <code>apply</code> (on the Series) to assign the new column:
<pre><code>In [3]: # meta names the resulting Series and gives its dtype (integer 0/1 flags)
        data['ebt_dummy'] = data.paymenttypeid.apply(
            lambda x: 1 if x == 'ebt' else 0,
            meta=('ebt_dummy', 'int64'))
        data.compute()
Out[3]:
  paymenttypeid                 other  ebt_dummy
0         other    0.3130770966143612          0
1         other    0.5167434068096931          0
2           ebt    0.7606898392115471          1
3           ebt    0.9424572692382547          1
4           ebt     0.624282017575857          1
5           ebt    0.8584841824784487          1
6         other    0.5017083765654611          0
7         other  0.025994123211164233          0
8           ebt   0.07045354449612984          1
9           ebt   0.11976351556850084          1
</code></pre>
<hr/>
Update: the <code>meta</code> you passed seems to be the problem, because this works:
<pre><code>data = data.map_partitions(lambda df: df.assign(
    ebt_dummy=np.where(df["paymenttypeid"] == 'ebt', 1, 0)))
data.compute()
</code></pre>
In my example, when I wanted to specify <code>meta</code>, I had to pass the dtypes of the current <code>data</code> rather than the ones I expected after the assignment:
<pre><code>data.map_partitions(lambda df: df.assign(
    ebt_dummy=np.where(df["paymenttypeid"] == 'ebt', 1, 0)),
    meta={'paymenttypeid': 'str', 'other': 'float64'})
</code></pre>
