Remove part of a string in a column

Kelso, Scottish Borders
Manchester, Greater Manchester
Northampton, Northamptonshire
Reading, Berkshire
Leicester, Leicestershire
Newport, Wales
Swindon, Wiltshire
Perth, Perth & Kinross
Manchester, Greater Manchester
Perth, Perth & Kinross
Cardiff
Hull, East Riding Of Yorkshire
Chester, Cheshire
Southampton
Leamington Spa, Warwickshire
Swindon, Wiltshire
Slough, Berkshire
Portsmouth, Hampshire

2 answers

User

#1 · edited 2024-09-27 21:31:12

To run a custom function on every element of a column, you can use the pandas apply function. In your case, the following code should work:

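import pandas

def get_first_substring(x):
    # Only strings can be split; None and NaN are returned unchanged.
    # (A test such as x != numpy.nan is always True, so it cannot detect NaN.)
    if isinstance(x, str):
        return x.split(',')[0]
    return x

dataframe['new'] = dataframe['location.display_name'].apply(get_first_substring)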

The output looks like this:

          old                     new
substring1, substring2        substring1
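
As a side note on the missing-value check used with apply, here is a minimal sketch, assuming a small made-up sample that contains both None and NaN. NaN never compares equal to anything, so an inequality test cannot detect it; pandas.isna handles both kinds of missing value. The names first_part and sample are hypothetical and not part of the original answer.

```python
import numpy as np
import pandas as pd

# NaN is never equal to anything, itself included, so `!=` always yields True.
print(np.nan != np.nan)  # True
# pandas.isna recognises both None and NaN.
print(pd.isna(np.nan), pd.isna(None))  # True True

def first_part(value):
    """Hypothetical helper: text before the first comma, missing values unchanged."""
    if pd.isna(value):
        return value
    return value.split(",")[0]

# Made-up sample data for illustration only.
sample = pd.DataFrame(
    {"location.display_name": ["Kelso, Scottish Borders", "Cardiff", None, np.nan]}
)
sample["new"] = sample["location.display_name"].apply(first_part)
print(sample)
```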

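User

#2 · edited 2024-09-27 21:31:12

I think you need str.split, then either select the first item of each list with str[0], or pass expand=True and select the first column with [0]: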

df['new'] = df['location.display_name'].str.split(',').str[0]
# Alternative: expand=True returns a DataFrame, so [0] selects its first column
#df['new'] = df['location.display_name'].str.split(',', expand=True)[0]
print(df)
              location.display_name              new
0           Kelso, Scottish Borders            Kelso
1    Manchester, Greater Manchester       Manchester
2     Northampton, Northamptonshire      Northampton
3                Reading, Berkshire          Reading
4         Leicester, Leicestershire        Leicester
5                    Newport, Wales          Newport
6                Swindon, Wiltshire          Swindon
7            Perth, Perth & Kinross            Perth
8    Manchester, Greater Manchester       Manchester
9            Perth, Perth & Kinross            Perth
10                          Cardiff          Cardiff
11   Hull, East Riding Of Yorkshire             Hull
12                Chester, Cheshire          Chester
13                      Southampton      Southampton
14     Leamington Spa, Warwickshire   Leamington Spa
15               Swindon, Wiltshire          Swindon
16                Slough, Berkshire           Slough
17            Portsmouth, Hampshire       Portsmouth
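
The vectorized string methods also tolerate missing values. The sketch below assumes a made-up three-row frame in which one entry is NaN; the n=1 argument is an optional addition that stops splitting after the first comma, since only the first piece is needed.

```python
import numpy as np
import pandas as pd

# Made-up sample data for illustration only.
df = pd.DataFrame(
    {"location.display_name": ["Perth, Perth & Kinross", "Southampton", np.nan]}
)

# .str methods skip missing values and leave them missing in the result.
# n=1 limits the split to the first comma.
df["new"] = df["location.display_name"].str.split(",", n=1).str[0]
print(df)
```

Rows without a comma keep their full text, and the missing row stays missing instead of raising an error.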

If the data contains no NaNs or Nones, you can use a list comprehension:

df['new'] = [x.split(',')[0] for x in df['location.display_name']]
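
If missing values might be present after all, the comprehension can be guarded with a type check. This is only a sketch under that assumption, using a made-up frame; it is not part of the original answer.

```python
import numpy as np
import pandas as pd

# Made-up sample data for illustration only.
df = pd.DataFrame(
    {"location.display_name": ["Hull, East Riding Of Yorkshire", None, np.nan]}
)

# Split only real strings; anything else (None, NaN) is kept as it is.
df["new"] = [
    x.split(",")[0] if isinstance(x, str) else x
    for x in df["location.display_name"]
]
print(df)
```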
