Groupby apply/transform with a custom function that takes arguments

Posted 2024-10-06 12:13:34


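I am doing some NLP work and am trying to use groupby with a lambda function to send a POST request and get a JSON object back. Unfortunately, the result is NaN. I need to "explode" the returned fields and add them to the DataFrame as new columns.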

Custom function:

def posTagger(text):
    post = { "text": title }
    endpoint = 'http://localhost:8001/api/postagger'
    r = requests.post(endpoint, json=post)
    r = r.json()
    time.sleep(1)
    return {"title": title, "result": r}


posTagger return value:

[
    {
        "text": "Contemporary Modern Soft Area Rugs Nonslip",
        "terms": [
            {
                "text": "Contemporary",
                "penn": "JJ",
                "tags": [
                    "Adjective"
                ]
            },
            {
                "text": "Modern",
                "penn": "NNP",
                "tags": [
                    "ProperNoun",
                    "Noun",
                    "Singular"
                ]
            },
            {
                "text": "Soft",
                "penn": "NNP",
                "tags": [
                    "ProperNoun",
                    "Noun",
                    "Singular"
                ]
            },
            {
                "text": "Area",
                "penn": "NN",
                "tags": [
                    "Singular",
                    "Noun",
                    "ProperNoun"
                ]
            },
            {
                "text": "Rugs",
                "penn": "NNP",
                "tags": [
                    "ProperNoun",
                    "Noun",
                    "Plural"
                ]
            },
            {
                "text": "Nonslip",
                "penn": "NNP",
                "tags": [
                    "ProperNoun",
                    "Noun",
                    "Singular"
                ]
            }
        ]
    }
]

DataFrame:

title = [
    'Contemporary Modern Soft Area Rugs Nonslip Velvet Home Room Carpet Floor Mat Rug', 
    'Traditional Distressed Area Rug 8x10 Large Rugs for Living Room 5x8 Gray Ivory', 
    'Shaggy Area Rugs Fluffy Tie-Dye Floor Soft Carpet Living Room Bedroom Large Rug'
    ]
df = pd.DataFrame(title, columns=['title'])
df

# Initial dataframe:

# title
# 0 Contemporary Modern Soft Area Rugs Nonslip...
# 1 Traditional Distressed Area Rug 8x10 Large...
# 2 Shaggy Area Rugs Fluffy Tie-Dye Floor Soft...

Here is the groupby I used with .apply:

df['result'] = pd.DataFrame(df.groupby(['title']).apply(lambda x: posTagger(x)))
df

# Resulting DataFrame after **.apply**:

#   title   result
# 0 Contemporary Modern Soft Area Rugs Nonslip Vel...   NaN
# 1 Traditional Distressed Area Rug 8x10 Large Rug...   NaN
# 2 Shaggy Area Rugs Fluffy Tie-Dye Floor Soft Car...   NaN

Here is the groupby I used with .transform:

df['result'] = pd.DataFrame(df.groupby(['title']).transform(lambda x: posTagger(x)))
df

# Resulting DataFrame after **.transform**:

# title result
# 0 Contemporary Modern Soft Area Rugs Nonslip Vel...   {'title': ['Contemporary Modern Soft Area Rugs...
# 1 Traditional Distressed Area Rug 8x10 Large Rug...   {'title': ['Contemporary Modern Soft Area Rugs...
# 2 Shaggy Area Rugs Fluffy Tie-Dye Floor Soft Car...   {'title': ['Contemporary Modern Soft Area Rugs...

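Note that with .transform the same value is sent multiple times. Why?

  1. How can I take the return value of a custom function (which returns an object with nested arrays) and add it, in exploded form, to the same DataFrame as new columns?
  2. Is it better to use .apply or .transform to achieve this?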

Tags: text, df, title, tags, area, result, soft, noun
1 answer
User
Answer 1 · posted 2024-10-06 12:13:34

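I will stick to apply() here. There are a few things you need to think through.

With your current function, you can get that result (the dictionary) by keeping the function essentially as written and changing the code that calls it. Unless some titles are identical, you are not really grouping by title, so just use apply() without groupby(). This does not explode the dictionary; there are several ways to approach that part.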

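import time
import requests

def posTagger(text):
    # use the function argument here; `title` is the global list of all titles
    post = { "text": text }
    endpoint = 'http://localhost:8001/api/postagger'
    r = requests.post(endpoint, json=post)
    r = r.json()
    time.sleep(1)
    return {"title": text, "result": r}

# apply to the title column so each call receives a single string
df['result'] = df['title'].apply(posTagger)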
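To explode the nested result into columns, one option is pandas.json_normalize. The sketch below is only an illustration: fake_pos_tagger is a hypothetical stand-in that mimics the response shape shown in the question (so it runs without the local endpoint), and the term_ column prefix is my own choice.

```python
import pandas as pd

def fake_pos_tagger(text):
    # Hypothetical stub: same shape as the real response, with dummy tags.
    return [{
        "text": text,
        "terms": [{"text": w, "penn": "NN", "tags": ["Noun"]} for w in text.split()],
    }]

df = pd.DataFrame({"title": ["Soft Area Rugs", "Shaggy Rug"]})
df["result"] = df["title"].apply(fake_pos_tagger)

# Each result is a list of documents; collect them into one flat list.
docs = [doc for result in df["result"] for doc in result]

# One row per term; the document-level "text" is carried along as metadata.
terms = pd.json_normalize(docs, record_path="terms", meta=["text"], record_prefix="term_")
terms = terms.rename(columns={"text": "title"})

# Attach the exploded terms back to the original rows by title.
exploded = df[["title"]].merge(terms, on="title", how="left")
print(exploded)
```

This gives one row per term rather than one row per title. If you would rather keep one row per title, you could leave the dictionary in the result column and only normalize it when needed.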

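Now, if you really do want to use groupby().apply(), pass each group (a DataFrame) in as x, operate on it, and return x. This is untested, but it is one way to think about the problem.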

def posTagger(x):
    # x is the sub-DataFrame for one title, so every row shares the same title
    text = x['title'].iloc[0]
    post = { "text": text }
    endpoint = 'http://localhost:8001/api/postagger'
    r = requests.post(endpoint, json=post)
    r = r.json()
    time.sleep(1)
    x = x.copy()
    # one dict per row; wrapped in a list so pandas stores the dict as a cell value
    x['result'] = [{"title": text, "result": r}] * len(x)
    # or you may be able to code the explode here using something like
    # dftemp = pd.DataFrame({"title": text, "result": r})
    # merging x = x.merge(dftemp)
    # not tested at all but this would return x to the original dataframe
    return x

df = df.groupby('title', group_keys=False).apply(posTagger)
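As for the NaN column in the question, a likely cause is index alignment: a groupby result is indexed by the group key (the title strings), while df is indexed 0, 1, 2, and column assignment aligns on the index. The sketch below shows the effect with made-up titles and size() standing in for the tagger call; mapping through the title column is one way to line the values up again.

```python
import pandas as pd

df = pd.DataFrame({"title": ["a rug", "b rug", "c rug"]})

# Any per-group result is indexed by the group key, here the title strings.
by_title = df.groupby("title").size()

# Direct assignment aligns on index: 0, 1, 2 never match the titles, so NaN.
df["misaligned"] = by_title

# Looking the values up through the title column lines them up correctly.
df["aligned"] = df["title"].map(by_title)
print(df)
```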
