Pandas apply: return a 4-element list into 4 column keys

Posted 2024-09-27 22:24:53


I'm trying to apply a function to one column of a df and add 4 new columns based on the list it returns.

Here is the function that returns the list:

import re

def separateReagan(data):
    block = None
    township = None
    section = None
    acres = None

    if 'BLK' in data:
        pattern = r'BLK (\d{1,3})'
        blockList = re.findall(pattern, data)
        if blockList:
            block = blockList[0]
    else:
        pattern = r'B-([0-9]{1,3})'
        blockList = re.findall(pattern, data)
        if blockList:
            block = blockList[0]

    # Similar for others

    return [block, township, section, acres]
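`separateReagan` keeps only element `[0]` of the `re.findall` result, so only the first match matters. The sketch below compares that with `re.search`, which stops at the first match (the answer further down suggests the same switch). The helper names `first_block_findall` / `first_block_search` and the sample strings are hypothetical; the real descriptions are truncated in the question.

```python
import re
from typing import Optional

BLK_PATTERN = r'BLK (\d{1,3})'
B_DASH_PATTERN = r'B-([0-9]{1,3})'


def first_block_findall(data: str) -> Optional[str]:
    # Mirrors separateReagan: choose a pattern, then keep element [0] of findall
    pattern = BLK_PATTERN if 'BLK' in data else B_DASH_PATTERN
    matches = re.findall(pattern, data)
    return matches[0] if matches else None


def first_block_search(data: str) -> Optional[str]:
    # re.search stops at the first match; group(1) is the captured digits
    pattern = BLK_PATTERN if 'BLK' in data else B_DASH_PATTERN
    match = re.search(pattern, data)
    return match.group(1) if match else None


# Hypothetical inputs; the real descriptions are truncated in the question
samples = ["BLK 12 SEC 4", "TRACT B-7 | B-8", "BLK B-5", "NO BLOCK HERE"]
for text in samples:
    print(repr(text), first_block_findall(text), first_block_search(text))
```

Both helpers agree on every sample. Note the `"BLK B-5"` case: because the pattern is chosen by `'BLK' in data`, the `B-` fallback is never tried once `BLK` appears, so the result is `None`.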

And here is the DataFrame code:

df = df[['ID','Legal Description']]

# Dataframe looks like this
#          ID                                  Legal Description
# 0        1  143560 CLARKSON | ENDEAVOR ENERGY RESO | A- ,B...
# 1        2  143990 CLARKSON ESTATE | ENDEAVOR ENERGY RESO ...
# 2        3  144420 CLARKSON RANCH | ENDEAVOR ENERGY RESO |...

df[['Block','Township','Section','Acres']] = df.apply(lambda x: separateReagan(x['Legal Description']),axis=1)

I get this error:

KeyError: "['Block' 'Township' 'Section' 'Acres'] not in index"

I tried returning a tuple instead of a list, but that didn't work either.
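Separately from the `KeyError`, it helps to look at what the right-hand side actually is. With the default `result_type=None`, `DataFrame.apply(..., axis=1)` returns a Series whose elements are the returned lists (tuples are list-like too, which is why switching to a tuple changes nothing). A minimal check, using a hypothetical `split_fields` stand-in for `separateReagan` and made-up data:

```python
import pandas as pd


def split_fields(text):
    # Hypothetical stand-in for separateReagan: always returns a 4-element list
    return [text[:3], None, None, None]


# Made-up data with the question's column names
df = pd.DataFrame({'ID': [1, 2], 'Legal Description': ['BLK 1 ...', 'B-22 ...']})

# Default result_type=None: list results are NOT spread into columns
out = df.apply(lambda x: split_fields(x['Legal Description']), axis=1)
print(type(out).__name__)  # Series
print(out.dtype)           # object
print(out.iloc[0])         # ['BLK', None, None, None]
```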


1 answer
Anonymous user
Reply #1 · posted 2024-09-27 22:24:53

I quickly put together a small suggestion; maybe it's what you're looking for. Let me know if it helps.

from pandas import DataFrame
import re

def separate_reagan(row):
    # With df.apply(fcn, axis=1), `row` is a single row of the DataFrame
    # (a Series indexed by column name), so you can also set values on it.

    # Instead of local variables, initialize the new fields directly on the row.
    # (Fields that are never set would end up as NaN or None anyway,
    # depending on the dtype.)
    row['township'] = None
    row['section'] = None
    row['acres'] = None
    row['block'] = None

    # Read the legal description from the row instead of passing it in
    # as the only argument
    data = row['legal_description']
    if 'BLK' in data:
        block_list = re.search(r'BLK (\d{1,3})', data)
        if block_list:
            row['block'] = block_list.group(1)
    else:
        # since you only seem to want the first match, 
        # search is probably more what you're looking for
        block_list = re.search(r'B-([0-9]{1,3})', data)
        if block_list:
            row['block'] = block_list.group(1)

    # Similar for others

    # returns the modified row.
    return row

df = DataFrame([
    {'id': 1, 'legal_description': '43560 CLARKSON | ENDEAVOR ENERGY RESO | A- ,B...'},
    {'id': 2, 'legal_description': '4143990 CLARKSON ESTATE | ENDEAVOR ENERGY RESO ...'},
    {'id': 3, 'legal_description': '144420 CLARKSON RANCH | ENDEAVOR ENERGY RESO |...'},
])
df = df[['id', 'legal_description']]

# df now has only the columns id and legal_description.

# In your version, the left-hand side df[['Block', 'Township', 'Section', 'Acres']]
# refers to columns that do not exist in the DataFrame yet. The apply call does
# not create them either, because separateReagan never sets them on the row;
# it only returns a plain list.

df = df.apply(separate_reagan, axis=1)
# Now these columns exist, because the function set them on each row
print(df[['block', 'township', 'section', 'acres']])
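One alternative, not the answerer's code: pass `result_type='expand'` so that `apply` spreads each returned list over columns, name those columns, and `join` them back onto the original frame. Joining avoids assigning a list of not-yet-existing column names, which appears to be what raises the `KeyError` in the question on older pandas versions. The function below is a hypothetical, trimmed-down version of `separateReagan` that only parses the block number, and the sample rows are made up.

```python
import re

import pandas as pd


def separate_reagan(data):
    # Hypothetical trimmed-down version of the question's function: only the
    # block is parsed; township/section/acres stay None (their patterns are not shown)
    block = None
    pattern = r'BLK (\d{1,3})' if 'BLK' in data else r'B-([0-9]{1,3})'
    match = re.search(pattern, data)
    if match:
        block = match.group(1)
    return [block, None, None, None]


# Made-up rows using the question's column names
df = pd.DataFrame({
    'ID': [1, 2, 3],
    'Legal Description': ['BLK 12 | SEC 4', 'TRACT B-7', 'NO BLOCK INFO'],
})

# result_type='expand' spreads each returned list over columns 0..3
parts = df.apply(lambda x: separate_reagan(x['Legal Description']),
                 axis=1, result_type='expand')
parts.columns = ['Block', 'Township', 'Section', 'Acres']

# join aligns on the shared row index and appends the four new columns
df = df.join(parts)
print(df)
```

Returning a `pd.Series` indexed by the four column names from the function would also work, because `apply` expands Series results into columns.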
