h2o python: prepending a string to an existing column in an H2O data frame

Posted 2024-06-25 22:48:27


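In Python, how do I prefix the values of an existing column in an H2O data frame with a string? The column is numeric to begin with. I have been able to do this with the R interface to H2O, but in the Python version of H2O it seems hard or impossible.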

This seems to work in R:

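library(h2o)
h2o.init()
df = as.h2o(mtcars)
# convert the numeric column to character so the string functions apply
df['mpg']=h2o.ascharacter(df['mpg'])
# substitute the empty pattern with the prefix
df['mpg']=h2o.sub('','hey--------',df['mpg'])
df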

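However, when I try to do this in Python I get all kinds of errors. Sometimes I can convert the numeric column to strings without an error, but then I get an error when I go to view the data frame. I can post the code if needed. Since these are the same functions, I would expect this to be relatively easy, but I must be missing something.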


1 Answer
User
#1 · Posted 2024-06-25 22:48:27

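Edited (the first version did not answer the original question; this one does). Here is how to convert a numeric column into a column of string values and then replace those values.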

import h2o
prostate = "http://h2o-public-test-data.s3.amazonaws.com/smalldata/prostate/prostate.csv"
h2o.init()
df = h2o.import_file(prostate)
# create the example column, with every value equal to 23
df['mpg'] = 23
# convert the numeric column to strings
df['mpg'] = df['mpg'].ascharacter()
df[1, 'mpg']  # see that it is now a string
# replace the value '23' with the prefixed version
df['mpg'] = df['mpg'].sub('23', 'please-help-me  23')
df
Out[16]:
  ID    CAPSULE    AGE    RACE    DPROS    DCAPS    PSA    VOL    GLEASON  mpg
----  ---------  -----  ------  -------  -------  -----  -----  ---------  ------------------
   1          0     65       1        2        1    1.4    0            6  please-help-me  23
   2          0     72       1        3        2    6.7    0            7  please-help-me  23
   3          0     70       1        1        2    4.9    0            6  please-help-me  23
   4          0     76       2        2        1   51.2   20            7  please-help-me  23
   5          0     69       1        1        1   12.3   55.9          6  please-help-me  23
   6          1     71       1        3        2    3.3    0            8  please-help-me  23
   7          0     68       2        4        2   31.9    0            7  please-help-me  23
   8          0     61       2        4        2   66.7   27.2          7  please-help-me  23
   9          0     69       1        1        1    3.9   24            7  please-help-me  23
  10          0     68       2        1        2   13      0            6  please-help-me  23

[380 rows x 10 columns]
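
The `sub` call above relies on every value being the same '23'. For a column whose values vary, the R snippet in the question uses an empty pattern instead. As a rough, cluster-free illustration of why an empty pattern behaves like a prefix, the sketch below uses Python's standard `re` module rather than H2O. The helper name `prefix_first_match` is made up for this example, and it is only an assumption that H2O's `sub` follows the same first-match regex semantics, so the result should be checked on a live frame.

```python
import re


def prefix_first_match(value, prefix, pattern=""):
    """Replace the first match of `pattern` in `value` with `prefix`.

    Hypothetical helper, plain Python only (no H2O involved).
    An empty pattern (or "^") matches at position 0, so the
    replacement text ends up in front of the original value.
    """
    # A function replacement keeps `prefix` literal (no backslash escapes).
    return re.sub(pattern, lambda match: prefix, value, count=1)


# Stand-in values for a numeric column already converted to strings.
values = ["21.0", "22.8", "18.7"]
prefixed = [prefix_first_match(v, "hey--------") for v in values]
print(prefixed)
```

If that assumption holds, the H2O counterpart would be `df['mpg'].sub('', 'hey--------')` applied after `ascharacter()`, mirroring the R code in the question.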

(Answer to the wrong question, below:) You have to pass a new list of column names, of the same length as the original column list.

df.columns = new_column_list

For example, I can rename the column ID to NEW:

[code block missing from the source page]

It will display:

Checking whether there is an H2O instance running at http://localhost:54321. connected.
                              
H2O cluster uptime:         9 hours 31 mins
H2O cluster version:        3.10.4.8
H2O cluster version age:    1 month and 6 days
H2O cluster name:           H2O_from_python_laurend_tzhifp
H2O cluster total nodes:    1
H2O cluster free memory:    3.276 Gb
H2O cluster total cores:    8
H2O cluster allowed cores:  8
H2O cluster status:         locked, healthy
H2O connection url:         http://localhost:54321
H2O connection proxy:
H2O internal security:      False
Python version:             3.5.1 final
                              
Parse progress: |████████████████████████████████████████████████████████████████████████████| 100%
['ID', 'CAPSULE', 'AGE', 'RACE', 'DPROS', 'DCAPS', 'PSA', 'VOL', 'GLEASON']
['NEW', 'CAPSULE', 'AGE', 'RACE', 'DPROS', 'DCAPS', 'PSA', 'VOL', 'GLEASON']
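
The two lists printed above differ only in their first entry. A minimal sketch of building such a same-length replacement list in plain Python follows. `rename_column` is a hypothetical helper, not part of H2O, and the final assignment to `df.columns` appears only as a comment because it needs a running cluster.

```python
def rename_column(columns, old, new):
    """Return a copy of `columns` with `old` replaced by `new`.

    Hypothetical helper: the result always has the same length as the
    input, which is what assigning to `df.columns` requires.
    """
    if old not in columns:
        raise KeyError("column not found: " + old)
    return [new if name == old else name for name in columns]


# Column names taken from the output shown in this answer.
columns = ['ID', 'CAPSULE', 'AGE', 'RACE', 'DPROS', 'DCAPS', 'PSA', 'VOL', 'GLEASON']
new_columns = rename_column(columns, 'ID', 'NEW')
print(columns)
print(new_columns)

# With a live H2OFrame `df`, the new list would then be assigned back:
# df.columns = new_columns
```

Building the full list this way avoids retyping every name by hand and fails early if the column to rename does not exist.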
