pandas: create another column while splitting each row of the first column

Posted 2024-10-02 12:23:12


Goal: create a second column from the first column.

column1, column2
Hello World, #HelloWorld
US Election, #USElection

I have a simple file with only one column:

column1
Hello World
US Election

I wrote the following function:

>>> def newColumn(row):
...     r = "#" + "".join(row.split(" "))
...     return r

Then I used pandas to create the second column:

df['column2'] = df.apply (lambda row: newColumn(row),axis=1)

But I end up with the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/anuradha_uduwage/anaconda2/lib/python2.7/site-packages/pandas/core/frame.py", line 3972, in apply
    return self._apply_standard(f, axis, reduce=reduce)
  File "/Users/anuradha_uduwage/anaconda2/lib/python2.7/site-packages/pandas/core/frame.py", line 4064, in _apply_standard
    results[i] = func(v)
  File "<stdin>", line 1, in <lambda>
  File "<stdin>", line 2, in newColumn
  File "/Users/anuradha_uduwage/anaconda2/lib/python2.7/site-packages/pandas/core/generic.py", line 2360, in __getattr__
    (type(self).__name__, name))
AttributeError: ("'Series' object has no attribute 'split'", u'occurred at index 0')

So I changed the split to:

r = "".join(row.str.split(" "))

But that didn't work either.
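A minimal sketch of why the second attempt also fails, assuming a one-column DataFrame shaped like the file above (the sample values are illustrative). Inside df.apply(..., axis=1) the function receives each row as a one-element Series, not as the cell's string. row.str.split(" ") therefore returns a Series holding a list, and "".join() then receives a list where it expects strings.

```python
import pandas as pd

# Illustrative data shaped like the question's file
df = pd.DataFrame({'column1': ['Hello World', 'US Election']})

# With axis=1, apply passes each row as a Series (index = column names)
row = df.iloc[0]
print(type(row))  # <class 'pandas.core.series.Series'>

# .str.split works on that Series, but returns a Series whose values are lists
parts = row.str.split(" ")
print(parts['column1'])  # ['Hello', 'World']

# "".join iterates the Series values, which are lists, so it raises TypeError
try:
    "".join(parts)
except TypeError as exc:
    print("TypeError:", exc)
```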


3 answers

Try a list comprehension:

import pandas

df = pandas.DataFrame({'columnOne': ['Hello World', 'US Election', 'Movie Night']})

df['column2'] = ['#' + item.replace(' ', '') for item in df.columnOne]

In [2]: df
Out[2]:
     columnOne      column2
0  Hello World  #HelloWorld
1  US Election  #USElection
2  Movie Night  #MovieNight

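Your general approach is fine; there are just a couple of problems. When you call apply on the whole DataFrame, the function receives an entire row or column at a time (as a Series). With axis=1, each row arrives as a Series, which is why row.split raises AttributeError. What you want is to work on each cell of the first column, so instead of df.apply you want df['columnOne'].apply.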

Here's what I would do:

import pandas as pd

df = pd.DataFrame(['First test here', 'Second test'], columns=['A'])

# Note that this function expects a string, and returns a string
def new_string(s):
    # Get rid of the spaces
    s = s.replace(' ','')
    # Add the hash
    s = '#' + s
    return s

# Then apply it to the first column and save the result in a new second column
df['B'] = df['A'].apply(new_string)

Or, if you really want a one-liner:

df['B'] = df['A'].apply(lambda s: '#' + s.replace(' ', ''))
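Following the explanation above, a short sketch contrasting the two ways of calling apply. The column name column1 and the function newColumn come from the question; the sample values are illustrative. Series.apply hands each cell's string to the function. DataFrame.apply with axis=1 hands over the whole row, so the cell has to be picked out by column name first.

```python
import pandas as pd

df = pd.DataFrame({'column1': ['Hello World', 'US Election']})

def newColumn(s):
    # Expects a plain string: drop the spaces and prefix a hash
    return "#" + "".join(s.split(" "))

# Option 1: apply to the column, so each call receives a str
df['column2'] = df['column1'].apply(newColumn)

# Option 2: keep the row-wise apply, but select the cell from the row Series
df['column3'] = df.apply(lambda row: newColumn(row['column1']), axis=1)

print(df)
```

Option 1 is usually preferable: it avoids building a Series per row and states the intent (one column in, one column out) directly.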

You can use str.replace with a regex, or replace with the parameter regex=True, to replace all whitespace with an empty string:

import pandas as pd

df = pd.DataFrame({'column1': ['Hello World', 'US Election']})
df['column2'] = '#' + df.column1.str.replace(r'\s+', '', regex=True)
df['column3'] = '#' + df.column1.replace(r'\s+', '', regex=True)

print (df)
       column1      column2      column3
0  Hello World  #HelloWorld  #HelloWorld
1  US Election  #USElection  #USElection
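The regex and the literal-space approaches are not equivalent on every input. A minimal sketch with made-up sample strings compares replace(' ', '') against the regex r'\s+'. The regex also strips tabs and other whitespace, while the literal version removes only space characters. Passing regex=True explicitly matters because pandas 2.0 changed the default of Series.str.replace to regex=False; under that default, '\s+' would be matched as literal text and nothing would be replaced.

```python
import pandas as pd

# Made-up samples: a double space and a tab character
s = pd.Series(['Hello World', 'US  Election', 'Tab\tSeparated'])

# Literal replacement: removes every space character, but leaves tabs
literal = '#' + s.str.replace(' ', '', regex=False)

# Regex replacement: \s+ matches any run of whitespace (spaces, tabs, newlines)
regex = '#' + s.str.replace(r'\s+', '', regex=True)

print(pd.DataFrame({'literal': literal, 'regex': regex}))
```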
