Derive a new column from a row's value and apply it until the next such value appears

Posted 2024-07-03 05:52:37


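In a string column of a DataFrame, I want to derive a new column from a row's value and carry that value down until the next such value appears. What is the most efficient / cleanest way to do this?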

Input DataFrame:

import pandas as pd

df = pd.DataFrame({'neighborhood':['Chicago City', 'Wicker Park', 'Bucktown','Lincoln Park','West Loop','River North','Milwaukee City','Bay View','East Side','South Side','Bronzeville','North Side','New York City','Harlem','Midtown','Chinatown']})

The DataFrame output I want is:

      neighborhood city
0     Chicago City Chicago
1      Wicker Park Chicago
2         Bucktown Chicago
3     Lincoln Park Chicago
4        West Loop Chicago
5      River North Chicago
6   Milwaukee City Milwaukee
7         Bay View Milwaukee
8        East Side Milwaukee
9       South Side Milwaukee
10     Bronzeville Milwaukee
11      North Side Milwaukee
12   New York City New York
13          Harlem New York
14         Midtown New York
15       Chinatown New York

3 answers

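1) If the first column contains "City", copy the value to the second column with the " City" part stripped off

2) Fill the NA values using forward fill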

import numpy as np

# Keep the city name on rows that contain 'City'; leave the other rows empty
df['city'] = np.where(
    df['neighborhood'].str.contains('City'),
    df['neighborhood'].str.replace(' City', '', regex=False),
    None)

Result:

      neighborhood       city
0     Chicago City    Chicago
1      Wicker Park       None
2         Bucktown       None
3     Lincoln Park       None
4        West Loop       None
5      River North       None
6   Milwaukee City  Milwaukee
7         Bay View       None
8        East Side       None
9       South Side       None
10     Bronzeville       None
11      North Side       None
12   New York City   New York
13          Harlem       None
14         Midtown       None
15       Chinatown       None

# fillna(method='ffill') is deprecated in recent pandas; ffill() is equivalent
df['city'] = df['city'].ffill()

Result:

      neighborhood       city
0     Chicago City    Chicago
1      Wicker Park    Chicago
2         Bucktown    Chicago
3     Lincoln Park    Chicago
4        West Loop    Chicago
5      River North    Chicago
6   Milwaukee City  Milwaukee
7         Bay View  Milwaukee
8        East Side  Milwaukee
9       South Side  Milwaukee
10     Bronzeville  Milwaukee
11      North Side  Milwaukee
12   New York City   New York
13          Harlem   New York
14         Midtown   New York
15       Chinatown   New York
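
The two steps above can also be chained into a single pandas expression without NumPy. This is only a minimal sketch on a shortened copy of the data, assuming that every city row ends with the literal suffix " City"; the name `is_city` is introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'neighborhood': ['Chicago City', 'Wicker Park', 'Bucktown',
                                    'Milwaukee City', 'Bay View',
                                    'New York City', 'Harlem']})

# True only on rows that name a city (assumed to end with " City")
is_city = df['neighborhood'].str.endswith(' City')

# Strip the suffix, blank out the non-city rows, then forward-fill the gaps
df['city'] = (
    df['neighborhood']
    .str.replace(' City', '', regex=False)
    .where(is_city)
    .ffill()
)
print(df)
```

`Series.where` keeps values where the mask is True and inserts NaN elsewhere, which is what `ffill` then fills in.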

Use .str.extract + ffill

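# Raw string avoids the invalid-escape warning for \s; expand=False returns a Series
df['city'] = df['neighborhood'].str.extract(r'(.*)\sCity', expand=False).ffill()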
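
To see why this works, it may help to look at the intermediate result: `str.extract` returns the captured group on rows that match and NaN on rows that do not, and `ffill` then carries each match downward. The sketch below uses a shortened copy of the data; the `$` anchor is an addition that is not in the answer, assuming "City" should only count at the end of the string.

```python
import pandas as pd

s = pd.Series(['Chicago City', 'Wicker Park', 'Bucktown',
               'New York City', 'Harlem'])

# Rows without a trailing " City" yield NaN in the extracted Series
extracted = s.str.extract(r'(.*)\sCity$', expand=False)

# Forward-fill propagates the last seen city into the NaN rows
filled = extracted.ffill()

print(pd.DataFrame({'neighborhood': s, 'extracted': extracted, 'city': filled}))
```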

You can simply map a custom function that behaves as expected

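city = None  # most recent city seen so far

def generate(s):
    global city
    # Strip ' City' including the leading space, so no trailing blank remains
    if 'City' in s:
        city = s.replace(' City', '')
    return city

df['city'] = df['neighborhood'].map(generate)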

This returns the expected output
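
Because the function above keeps its state in a module-level variable, running it a second time starts from whatever city was seen last. One possible way around that is a closure that creates fresh state on each call. This is only a sketch: `make_city_tracker`, `track` and the `suffix` parameter are hypothetical names, and it assumes city rows end with " City".

```python
import pandas as pd

def make_city_tracker(suffix=' City'):
    """Return a function that remembers the most recent city it has seen."""
    current = None

    def track(name):
        nonlocal current
        # Update the remembered city only on rows that end with the suffix
        if name.endswith(suffix):
            current = name[:-len(suffix)]
        return current

    return track

df = pd.DataFrame({'neighborhood': ['Chicago City', 'Wicker Park',
                                    'Milwaukee City', 'Bay View']})

# A fresh tracker per call, so repeated runs do not share state
df['city'] = df['neighborhood'].map(make_city_tracker())
print(df)
```

Rows that appear before the first city row would get None, since no city has been seen yet at that point.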
