How do I correctly call a function and return an updated DataFrame?

df = pd.DataFrame(['adam', 'ed', 'dra', 'dave', 'sed', 'mike'], index=['a', 'b', 'c', 'd', 'e', 'f'], columns=['A'])

def get_item(data):
    comb = pd.DataFrame()
    comb['Newfield'] = data  # create new columns
    comb['AnotherNewfield'] = 'y'
    return pd.DataFrame(comb)

>>> newdf = df['A'].apply(get_item)
>>> newdf
a    A Newfield AnotherNewfield a adam st...
b    A Newfield AnotherNewfield e sed st...
c    A Newfield AnotherNewfield d dave st...
d    A Newfield AnotherNewfield d dave st...
e    A Newfield AnotherNewfield s NaN st...
f    A Newfield AnotherNewfield m NaN str(...
Name: A, dtype: object
>>> type(newdf)
<class 'pandas.core.series.Series'>

2 Answers

User

#1 · Edited 2024-09-27 21:28:44

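For anyone looking for an answer: I got the result I wanted by running code I found in another post. I will post that person's name to give them credit. Basically, it lets me edit the function and, through apply, get the data it creates into separate columns: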

def get_item(data):
    
    value = data     #create new columns using variables
    AnotherNewfield = 'y'
    return pd.Series(value),pd.Series(AnotherNewfield)

>>> df['B'], df['C'] = zip(*df['A'].apply(get_item))
>>> df
      A        B     C
a  adam  (adam,)  (y,)
b    ed    (ed,)  (y,)
c   dra   (dra,)  (y,)
d  dave  (dave,)  (y,)
e   sed   (sed,)  (y,)
f  mike  (mike,)  (y,)
>>>

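The only problem is that parentheses and commas come along with the data. I plan to strip them in code outside the function, perhaps like this: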

>>> import re
>>> df['B'] = df['B'].apply(lambda x: re.sub(r"[^a-zA-Z0-9-]+", ' ', str(x)))
>>> df
      A       B     C
a  adam   adam   (y,)
b    ed     ed   (y,)
c   dra    dra   (y,)
d  dave   dave   (y,)
e   sed    sed   (y,)
f  mike   mike   (y,)
>>> df['C'] = df['C'].apply(lambda x: re.sub(r"[^a-zA-Z0-9-]+", ' ', str(x)))
>>> df
      A       B    C
a  adam   adam    y 
b    ed     ed    y 
c   dra    dra    y 
d  dave   dave    y 
e   sed    sed    y 
f  mike   mike    y
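
A possible way to skip the regex cleanup, sketched under the assumption that each returned value should simply land in its own column: when the function passed to `Series.apply` returns a labelled `pd.Series`, pandas expands the result into a DataFrame with one column per label. The column names `B` and `C` follow the example above; the names `expanded` and `result` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    ['adam', 'ed', 'dra', 'dave', 'sed', 'mike'],
    index=['a', 'b', 'c', 'd', 'e', 'f'],
    columns=['A'],
)

def get_item(value):
    # Return one labelled Series per row; the labels become column names.
    return pd.Series({'B': value, 'C': 'y'})

# Series.apply with a Series-returning function yields a DataFrame
expanded = df['A'].apply(get_item)

# Attach the new columns to the original frame (both share the index a..f)
result = df.join(expanded)
print(result)
```

Because the values are stored as plain strings rather than as wrapped objects, no parentheses or commas should need stripping afterwards.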

User

#2 · Edited 2024-09-27 21:28:44

You can combine groupby with apply to get a DataFrame back from the apply call, like this:

import pandas as pd

# add a constant column B for groupby - we need a single group only to do the trick
df = pd.DataFrame(
    {'A': ['adam', 'ed', 'dra', 'dave', 'sed', 'mike'], 'B': [1, 1, 1, 1, 1, 1]},
    index=['a', 'b', 'c', 'd', 'e', 'f'])

def get_item(data):
    # data is the sub-DataFrame for one group; build the dataframe to be returned.
    # Series.append was removed in pandas 2.0, so the column is filled directly.
    comb = pd.DataFrame({'Newfield': data['A'].to_numpy()})
    comb['AnotherNewfield'] = 'y'
    # return complete dataframe
    return comb

# group on column B so that get_item receives a dataframe instead of a single value
newdf = df.groupby('B').apply(get_item)
# the result has a MultiIndex - drop level 0 (the group key B == 1 added by groupby)
newdf = newdf.droplevel(0)

Output:

    Newfield    AnotherNewfield
0   adam        y
1   ed          y
2   dra         y
3   dave        y
4   sed         y
5   mike        y
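
If the per-row logic is as simple as in this example, the same table can probably be built without groupby at all. A minimal sketch, assuming the goal is only to copy column `A` and add a constant column; the `reset_index` call is there just to reproduce the 0..5 index shown above and can be dropped to keep the original labels.

```python
import pandas as pd

df = pd.DataFrame(
    {'A': ['adam', 'ed', 'dra', 'dave', 'sed', 'mike']},
    index=['a', 'b', 'c', 'd', 'e', 'f'],
)

# Build the new frame directly; the scalar 'y' is broadcast to every row
newdf = pd.DataFrame({'Newfield': df['A'], 'AnotherNewfield': 'y'})

# Optional: replace the a..f labels with a fresh 0..5 integer index
newdf = newdf.reset_index(drop=True)
print(newdf)
```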
