How to split and extract location names from a column in Python

Posted 2024-10-05 11:04:45


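I have been working with data that was originally exported to CSV and later imported from that same CSV for EDA. There is an address column with the suburb/locality name appended to each address. I tried to split or extract these names into a separate column using Excel, but I did not get the output I wanted. It would be helpful to know whether I can do this with Python (for example, with NLTK functions).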

Here is my sample data:

**Address column**
4a Mcarthurs Road, Altona north
1 Neal court, Altona North 
4 Vermilion Drive, Greenvale
Lot 307 Bonds Lane, Greenvale
430 Blackshaws rd, Altona North 
159 Bonds lane, Greenvale
Lot 1105 4 compass Drive Greenvale
6005 Bethany dr tarneet
Lot 655 Potofino Way Wollert
lot 403 Binds Lane, Greenvale
157 Maidstone street Altona
11 Laramie Street, Greenvale 
10 Preveli Way Wollert 
21 Laramie Street, Greenvale 
20 taipan crt tarneit
4 bisect road greenvale
83 everton road truganina
Lot 450 Vermilion Drive, Greenvale
Lot 641 Preveli Way Wollert 
648 hogans rd tarneit

Expected output:

Address                    Suburb
4a Mcarthurs Road          Altona North
1 Neal court               Altona North
4 Vermilion Drive          Greenvale
Lot 307 Bonds Lane         Greenvale
430 Blackshaws rd          Altona North
159 Bonds lane             Greenvale
Lot 1105 4 compass Drive   Greenvale
6005 Bethany dr            Tarneet
Lot 655 Potofino Way       Wollert
lot 403 Binds Lane         Greenvale
157 Maidstone street       Altona
11 Laramie Street          Greenvale
10 Preveli Way             Wollert
21 Laramie Street          Greenvale
20 taipan crt              Tarneit
4 bisect road              Greenvale
83 everton road            Truganina
Lot 450 Vermilion Drive    Greenvale
Lot 641 Preveli Way        Wollert
648 hogans rd              Tarneit

Any help with this would be greatly appreciated.

Thanks in advance for your support.


Tags: street, rd, drive, way, lot, north, road, lane
2 answers

You can try the following approach:

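# Strip stray whitespace first so trailing spaces do not end up in the result.
addr = df['Address column'].str.strip()
# Take the text after the last ", "; where there is no comma, fall back to the last word.
df['local'] = addr.str.extract(r'.+, (.*)', expand=False)\
                  .fillna(addr.str.extract(r'.* (.*)$', expand=False))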

print(df['local'])
0     Altona north
1     Altona North
2        Greenvale
3        Greenvale
4     Altona North
5        Greenvale
6        Greenvale
7          tarneet
8          Wollert
9        Greenvale
10          Altona
11       Greenvale
12         Wollert
13       Greenvale
14         tarneit
15       greenvale
16       truganina
17       Greenvale
18         Wollert
19         tarneit
Name: local, dtype: object
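
The same fallback idea can be sketched in plain Python, which makes the two cases easier to see. This is only an illustration, not the answer's code: the function name `extract_suburb` is made up here, and the title-casing step is an assumption added to match the capitalisation in the expected output.

```python
def extract_suburb(address: str) -> str:
    """Return the suburb part of an address string (hypothetical helper)."""
    text = address.strip()
    if "," in text:
        # Case 1: a comma separates the street from the suburb.
        suburb = text.rsplit(",", 1)[1]
    else:
        # Case 2: no comma, so assume the suburb is the last word.
        words = text.split()
        suburb = words[-1] if words else ""
    # Collapse repeated spaces and normalise capitalisation.
    return " ".join(suburb.split()).title()


if __name__ == "__main__":
    for sample in ["4a Mcarthurs Road, Altona north",
                   "10 Preveli Way Wollert ",
                   "6005 Bethany dr tarneet"]:
        print(repr(sample), "->", extract_suburb(sample))
```

Like the regex version, the last-word fallback cannot recover a multi-word suburb when the comma is missing; an address such as "157 Maidstone street Altona north" would yield only "North". With pandas, a function like this could be applied with `df['Address column'].map(extract_suburb)`.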

I noticed that you can split the address column into address and suburb using the following regex pattern:

number word word split here word

(df["Address column"]
.str.extract(r"(?P<Address>.*\d+[\w+?|\s]\s?\w+\s+\w+),?\s(?P<Suburb>.*$)")  # raw string keeps \d, \w, \s intact
.apply(lambda x: x.str.title()))  # title-case both extracted columns

Output:

                     Address        Suburb
0          4A Mcarthurs Road  Altona North
1               1 Neal Court  Altona North
2          4 Vermilion Drive     Greenvale
3         Lot 307 Bonds Lane     Greenvale
4          430 Blackshaws Rd  Altona North
5             159 Bonds Lane     Greenvale
6   Lot 1105 4 Compass Drive     Greenvale
7            6005 Bethany Dr       Tarneet
8       Lot 655 Potofino Way       Wollert
9         Lot 403 Binds Lane     Greenvale
10      157 Maidstone Street        Altona
11         11 Laramie Street     Greenvale
12            10 Preveli Way       Wollert
13         21 Laramie Street     Greenvale
14             20 Taipan Crt       Tarneit
15             4 Bisect Road     Greenvale
16           83 Everton Road     Truganina
17   Lot 450 Vermilion Drive     Greenvale
18       Lot 641 Preveli Way       Wollert
19             648 Hogans Rd       Tarneit
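
To make the "number word word split here word" idea easier to follow, here is a commented sketch of a similar pattern written with `re.VERBOSE`. It is a variation rather than the exact regex above: the character class is simplified to `[\w\s]`, and the suburb group ends in `\S` so trailing spaces are dropped. The names `ADDRESS_RE` and `split_address` are hypothetical.

```python
import re

ADDRESS_RE = re.compile(
    r"""
    (?P<Address>
        .*\d+        # everything up to and including the last street number
        [\w\s]       # one more character: a unit letter ("4a") or a space
        \s?          # optional space after a unit letter
        \w+\s+\w+    # two words: street name and street type
    )
    ,?\s+            # optional comma, then whitespace (the split point)
    (?P<Suburb>.*\S) # the rest of the line, without trailing whitespace
    """,
    re.VERBOSE,
)


def split_address(address):
    """Return (address, suburb) in title case, or None if the pattern fails."""
    match = ADDRESS_RE.match(address)
    if match is None:
        return None
    return match.group("Address").title(), match.group("Suburb").title()


if __name__ == "__main__":
    print(split_address("4a Mcarthurs Road, Altona north"))
    print(split_address("Lot 1105 4 compass Drive Greenvale"))
```

The pattern assumes every street is exactly two words after the number (name plus type), so an address such as "12 St Kilda Road Altona" would be split in the wrong place.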

Note: I'm sure this regex could be made neater.
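
One possible alternative to a purely positional regex, sketched here under the assumption that the set of suburbs is known in advance, is to match the end of each address against a lookup list. The list `KNOWN_SUBURBS` and the function `split_by_known_suburb` are hypothetical and would need to be filled in from your own data.

```python
import re

# Assumed list of valid suburbs; extend it to cover your own data.
KNOWN_SUBURBS = ["Altona North", "Altona", "Greenvale",
                 "Tarneit", "Truganina", "Wollert"]

# Map lower-cased names back to their canonical spelling.
_CANONICAL = {name.lower(): name for name in KNOWN_SUBURBS}

# Try longer names first, and only accept a match at the end of the string.
_SUBURB_RE = re.compile(
    r"[,\s]+("
    + "|".join(re.escape(n) for n in sorted(KNOWN_SUBURBS, key=len, reverse=True))
    + r")\s*$",
    re.IGNORECASE,
)


def split_by_known_suburb(address):
    """Return (street, suburb), or None if no known suburb ends the address."""
    match = _SUBURB_RE.search(address)
    if match is None:
        return None
    street = address[:match.start()].strip()
    return street, _CANONICAL[match.group(1).lower()]


if __name__ == "__main__":
    print(split_by_known_suburb("430 Blackshaws rd, Altona North "))
    print(split_by_known_suburb("6005 Bethany dr tarneet"))  # misspelt, so None
```

This handles multi-word suburbs even when the comma is missing, and misspellings such as "tarneet" come back as `None`, which flags those rows for manual correction instead of silently producing a wrong suburb.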
