Remove part of a string in a column

Posted 2024-09-27 21:31:12


My DataFrame has a column that looks like this:

Input

df['location.display_name']

Output

 Kelso, Scottish Borders
 Manchester, Greater Manchester
 Northampton, Northamptonshire
 Reading, Berkshire
 Leicester, Leicestershire
 Newport, Wales
 Swindon, Wiltshire
 Perth, Perth & Kinross
 Manchester, Greater Manchester
 Perth, Perth & Kinross
 Cardiff
 Hull, East Riding Of Yorkshire
 Chester, Cheshire
 Southampton
 Leamington Spa, Warwickshire
 Swindon, Wiltshire
 Slough, Berkshire
 Portsmouth, Hampshire

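I want to create a new column that contains only the first part of the location. For example, from "Swindon, Wiltshire" I want to keep "Swindon" and put it in the new column.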

Also, how would this affect values that I want to keep as they are, such as "Cardiff", which has no comma?


2 answers

To run a custom function on every element of a column, you can use the pandas apply method. In your case, the following code should work:

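import pandas as pd

def get_first_substring(x):
    # Skip missing values. pd.notna() is False for both None and NaN;
    # a test like x != numpy.nan is always True, since NaN never equals anything.
    if pd.notna(x):
        return x.split(',')[0]
    return None  # leave missing values empty

df['new'] = df['location.display_name'].apply(get_first_substring)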

The output looks like this:

          old                     new
substring1, substring2      substring1
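A minimal, self-contained sketch of the apply approach. The sample DataFrame is hypothetical: three values come from the question, and the None and NaN rows are added only to show how missing values are handled. It also answers the Cardiff question: split(',') on a string with no comma returns a one-element list, so [0] gives back the whole string.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: values from the question plus two missing entries
df = pd.DataFrame({
    'location.display_name': [
        'Swindon, Wiltshire',
        'Cardiff',                 # no comma: kept whole
        'Perth, Perth & Kinross',
        None,                      # missing value
        np.nan,                    # missing value
    ]
})

def get_first_substring(x):
    # pd.notna() is False for both None and NaN, so missing values are skipped
    if pd.notna(x):
        return x.split(',')[0]
    return None

df['new'] = df['location.display_name'].apply(get_first_substring)
print(df)
```

If the column could also hold non-missing, non-string values (numbers, for example), isinstance(x, str) is a safer guard than pd.notna(x), because a number would pass the notna check and then fail on .split.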

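I think you need str.split, then str[0] to select the first element of each resulting list, or expand=True with [0] to select the first column: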

df['new'] = df['location.display_name'].str.split(',').str[0]
# alternative:
# df['new'] = df['location.display_name'].str.split(',', expand=True)[0]
print(df)
              location.display_name              new
0           Kelso, Scottish Borders            Kelso
1    Manchester, Greater Manchester       Manchester
2     Northampton, Northamptonshire      Northampton
3                Reading, Berkshire          Reading
4         Leicester, Leicestershire        Leicester
5                    Newport, Wales          Newport
6                Swindon, Wiltshire          Swindon
7            Perth, Perth & Kinross            Perth
8    Manchester, Greater Manchester       Manchester
9            Perth, Perth & Kinross            Perth
10                          Cardiff          Cardiff
11   Hull, East Riding Of Yorkshire             Hull
12                Chester, Cheshire          Chester
13                      Southampton      Southampton
14     Leamington Spa, Warwickshire   Leamington Spa
15               Swindon, Wiltshire          Swindon
16                Slough, Berkshire           Slough
17            Portsmouth, Hampshire       Portsmouth
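A variant of the same vectorized approach, offered as a sketch rather than the answerer's exact code: n=1 stops splitting after the first comma (enough when only the first part is needed), .str.strip() trims any surrounding whitespace, and missing values simply propagate as NaN instead of raising an error. The sample data is hypothetical; the leading space on the first value mirrors how the question's listing is indented.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'location.display_name': [
        ' Kelso, Scottish Borders',   # leading space
        'Leamington Spa, Warwickshire',
        'Southampton',                # no comma: returned unchanged
        np.nan,                       # missing: stays NaN
    ]
})

# Split on the first comma only, keep the first piece, trim whitespace
df['new'] = (
    df['location.display_name']
    .str.split(',', n=1)
    .str[0]
    .str.strip()
)
print(df)
```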

If there are no NaNs or Nones in the data, you can use a list comprehension:

df['new'] = [x.split(',')[0] for x in df['location.display_name']]
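If the column might contain missing values after all, that list comprehension raises AttributeError at the first NaN, because NaN is a float and has no split method. A guarded version, sketched here with hypothetical sample data, keeps the list-comprehension style while passing missing values through as NaN:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'location.display_name': ['Reading, Berkshire', 'Cardiff', np.nan]
})

# Split only real strings; anything else (None, NaN) becomes NaN in the result
df['new'] = [
    x.split(',')[0] if isinstance(x, str) else np.nan
    for x in df['location.display_name']
]
print(df)
```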
