How to extract the city name from team names in a pandas DataFrame with regex

Posted 2024-09-28 05:36:24


I have the DataFrame below (only one column is shown):

0           Atlantic Division
1        Tampa Bay Lightning*
2              Boston Bruins*
3        Toronto Maple Leafs*
4            Florida Panthers
5           Detroit Red Wings
6          Montreal Canadiens
7             Ottawa Senators
8              Buffalo Sabres
9       Metropolitan Division
10       Washington Capitals*
11       Pittsburgh Penguins*
12       Philadelphia Flyers*
13     Columbus Blue Jackets*
14         New Jersey Devils*
15        Carolina Hurricanes
16         New York Islanders
17           New York Rangers
18           Central Division
19       Nashville Predators*
20             Winnipeg Jets*
21            Minnesota Wild*
22        Colorado Avalanche*
23            St. Louis Blues
24               Dallas Stars
25         Chicago Blackhawks
26           Pacific Division
27      Vegas Golden Knights*
28             Anaheim Ducks*
29           San Jose Sharks*
30         Los Angeles Kings*
31             Calgary Flames
32            Edmonton Oilers
33          Vancouver Canucks
34            Arizona Coyotes
35          Atlantic Division
36        Montreal Canadiens*
37           Ottawa Senators*
38             Boston Bruins*
39       Toronto Maple Leafs*
40        Tampa Bay Lightning
41           Florida Panthers
42          Detroit Red Wings
43             Buffalo Sabres
44      Metropolitan Division
45       Washington Capitals*
46       Pittsburgh Penguins*
47     Columbus Blue Jackets*
48          New York Rangers*
49         New York Islanders
50        Philadelphia Flyers
51        Carolina Hurricanes
52          New Jersey Devils
53           Central Division
54        Chicago Blackhawks*
55            Minnesota Wild*
56           St. Louis Blues*
57       Nashville Predators*
58              Winnipeg Jets
59               Dallas Stars
60         Colorado Avalanche
61           Pacific Division
62             Anaheim Ducks*
63           Edmonton Oilers*
64           San Jose Sharks*
65            Calgary Flames*
66          Los Angeles Kings
67            Arizona Coyotes
68          Vancouver Canucks
69          Atlantic Division
70          Florida Panthers*
71       Tampa Bay Lightning*
72         Detroit Red Wings*
73              Boston Bruins
74            Ottawa Senators
75         Montreal Canadiens
76             Buffalo Sabres
77        Toronto Maple Leafs
78      Metropolitan Division
79       Washington Capitals*
80       Pittsburgh Penguins*
81          New York Rangers*
82        New York Islanders*
83       Philadelphia Flyers*
84        Carolina Hurricanes
85          New Jersey Devils
86      Columbus Blue Jackets
87           Central Division
88              Dallas Stars*
89           St. Louis Blues*
90        Chicago Blackhawks*
91       Nashville Predators*
92            Minnesota Wild*
93         Colorado Avalanche
94              Winnipeg Jets
95           Pacific Division
96             Anaheim Ducks*
97         Los Angeles Kings*
98           San Jose Sharks*
99            Arizona Coyotes
100            Calgary Flames
101         Vancouver Canucks
102           Edmonton Oilers
103         Atlantic Division
104       Montreal Canadiens*
105      Tampa Bay Lightning*
106        Detroit Red Wings*
107          Ottawa Senators*
108             Boston Bruins
109          Florida Panthers
110       Toronto Maple Leafs
111            Buffalo Sabres
112     Metropolitan Division
113         New York Rangers*
114      Washington Capitals*
115       New York Islanders*
116      Pittsburgh Penguins*
117     Columbus Blue Jackets
118       Philadelphia Flyers
119         New Jersey Devils
120       Carolina Hurricanes
121          Central Division
122          St. Louis Blues*
123      Nashville Predators*
124       Chicago Blackhawks*
125           Minnesota Wild*
126            Winnipeg Jets*
127              Dallas Stars
128        Colorado Avalanche
129          Pacific Division
130            Anaheim Ducks*
131        Vancouver Canucks*
132           Calgary Flames*
133         Los Angeles Kings
134           San Jose Sharks
135           Edmonton Oilers
136           Arizona Coyotes
137         Atlantic Division
138            Boston Bruins*
139      Tampa Bay Lightning*
140       Montreal Canadiens*
141        Detroit Red Wings*
142           Ottawa Senators
143       Toronto Maple Leafs
144          Florida Panthers
145            Buffalo Sabres
146     Metropolitan Division
147      Pittsburgh Penguins*
148         New York Rangers*
149      Philadelphia Flyers*
150    Columbus Blue Jackets*
151       Washington Capitals
152         New Jersey Devils
153       Carolina Hurricanes
154        New York Islanders
155          Central Division
156       Colorado Avalanche*
157          St. Louis Blues*
158       Chicago Blackhawks*
159           Minnesota Wild*
160             Dallas Stars*
161       Nashville Predators
162             Winnipeg Jets
163          Pacific Division
164            Anaheim Ducks*
165          San Jose Sharks*
166        Los Angeles Kings*
167           Phoenix Coyotes
168         Vancouver Canucks
169            Calgary Flames
170           Edmonton Oilers
Name: team, dtype: object

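I need to create an additional column containing the city name.

At first glance the regex looks simple: the first word should be the city name and the rest should be the team name.

However, some cities have two words (Los Angeles, St. Louis, etc.).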
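A minimal sketch of the naive "first word" idea shows where it breaks. The `teams` Series here is only a hand-picked sample copied from the column above, not the full data:

```python
import pandas as pd

# Hand-picked sample of names from the column above (assumption: not the full data)
teams = pd.Series(["Boston Bruins*", "Toronto Maple Leafs*", "Los Angeles Kings*", "St. Louis Blues"])

# Naive approach: split on whitespace and keep the first token
naive_city = teams.str.split().str[0]
print(naive_city.tolist())
# ['Boston', 'Toronto', 'Los', 'St.']
```

One-word cities come out fine, but `Los Angeles` and `St. Louis` are truncated to their first word.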

Is it possible to do this with a regex, or does it have to be done manually?

Update: I tried the following:

nhl_df['city']=nhl_df['team'].str.extract(r'^(?:([\w.]{1,5}\s\w+)|(\w+)|)(?:\s\w+)+\*?$')

But I got this error:

ValueError: Wrong number of items passed 2, placement implies 1
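The error comes from the number of capture groups: `str.extract` returns one column per group, and this pattern has two, so the result is a two-column DataFrame that cannot be assigned to the single column `nhl_df['city']` (the exact wording of the ValueError differs between pandas versions). A small check on two sample names, chosen here only for illustration:

```python
import pandas as pd

# Two sample names (assumption: a subset of the real column)
teams = pd.Series(["Tampa Bay Lightning*", "Boston Bruins*"], name="team")
pattern = r'^(?:([\w.]{1,5}\s\w+)|(\w+)|)(?:\s\w+)+\*?$'

# str.extract returns one column per capture group; this pattern has two
groups = teams.str.extract(pattern)
print(groups.shape)  # (2, 2)

# Two-word cities land in column 0, one-word cities in column 1; the other cell is NaN
print(groups)
```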

3 answers

You can use

^([\w.]{1,5}(?:\s\w+)?\w*)

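See the regex demo. Details:

  • ^ - start of the string
  • ([\w.]{1,5}(?:\s\w+)?\w*) - Capturing group 1:
    • [\w.]{1,5} - one to five word or dot characters
    • (?:\s\w+)? - an optional occurrence of a whitespace character followed by one or more word characters
    • \w* - zero or more word characters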
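To see how the three parts interact, here is a small sketch that applies the same pattern with Python's `re` module to a few sample names; `city_of` is a hypothetical helper written only for this illustration. For `Toronto Maple Leafs*`, `[\w.]{1,5}` takes `Toron`, the optional group cannot start at `t`, and `\w*` picks up the remaining `to`. Note the last case: because `Vegas` fits within the five-character limit, the optional `(?:\s\w+)?` also takes ` Golden`, so the pattern returns `Vegas Golden` rather than `Vegas`.

```python
import re

# The answer's pattern, compiled once for reuse
CITY_RE = re.compile(r'^([\w.]{1,5}(?:\s\w+)?\w*)')

def city_of(team: str) -> str:
    """Hypothetical helper: return capture group 1 for a single team name."""
    match = CITY_RE.match(team)
    return match.group(1) if match else ""

samples = ["Toronto Maple Leafs*", "St. Louis Blues*", "Tampa Bay Lightning*", "Vegas Golden Knights*"]
for team in samples:
    # Prints Toronto, St. Louis, Tampa Bay, and Vegas Golden (the last one is wrong)
    print(f"{team!r:26} -> {city_of(team)!r}")
```

So the `Vegas Golden Knights*` row in the data would still need a manual correction.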

Test in pandas:

import pandas as pd
nhl_df = pd.DataFrame({"team":["Atlantic Division","Tampa Bay Lightning*","Boston Bruins*","Toronto Maple Leafs*","Florida Panthers","Detroit Red Wings","Montreal Canadiens","Ottawa Senators","Buffalo Sabres","Metropolitan Division","Washington Capitals*","Pittsburgh Penguins*","Philadelphia Flyers*","Columbus Blue Jackets*","New Jersey Devils*","Carolina Hurricanes","New York Islanders","New York Rangers","Central Division","Nashville Predators*","Winnipeg Jets*","Minnesota Wild*","Colorado Avalanche*","St. Louis Blues","Dallas Stars","Chicago Blackhawks","Pacific Division","Vegas Golden Knights*","Anaheim Ducks*","San Jose Sharks*","Los Angeles Kings*","Calgary Flames","Edmonton Oilers","Vancouver Canucks","Arizona Coyotes","Atlantic Division","Montreal Canadiens*","Ottawa Senators*","Boston Bruins*","Toronto Maple Leafs*","Tampa Bay Lightning","Florida Panthers","Detroit Red Wings","Buffalo Sabres","Metropolitan Division","Washington Capitals*","Pittsburgh Penguins*","Columbus Blue Jackets*","New York Rangers*","New York Islanders","Philadelphia Flyers","Carolina Hurricanes","New Jersey Devils","Central Division","Chicago Blackhawks*","Minnesota Wild*","St. Louis Blues*","Nashville Predators*","Winnipeg Jets","Dallas Stars","Colorado Avalanche","Pacific Division","Anaheim Ducks*","Edmonton Oilers*","San Jose Sharks*","Calgary Flames*","Los Angeles Kings","Arizona Coyotes","Vancouver Canucks","Atlantic Division","Florida Panthers*","Tampa Bay Lightning*","Detroit Red Wings*","Boston Bruins","Ottawa Senators","Montreal Canadiens","Buffalo Sabres","Toronto Maple Leafs","Metropolitan Division","Washington Capitals*","Pittsburgh Penguins*","New York Rangers*","New York Islanders*","Philadelphia Flyers*","Carolina Hurricanes","New Jersey Devils","Columbus Blue Jackets","Central Division","Dallas Stars*","St. Louis Blues*","Chicago Blackhawks*","Nashville Predators*","Minnesota Wild*","Colorado Avalanche","Winnipeg Jets","Pacific Division","Anaheim Ducks*","Los Angeles Kings*","San Jose Sharks*","Arizona Coyotes","Calgary Flames","Vancouver Canucks","Edmonton Oilers","Atlantic Division","Montreal Canadiens*","Tampa Bay Lightning*","Detroit Red Wings*","Ottawa Senators*","Boston Bruins","Florida Panthers","Toronto Maple Leafs","Buffalo Sabres","Metropolitan Division","New York Rangers*","Washington Capitals*","New York Islanders*","Pittsburgh Penguins*","Columbus Blue Jackets","Philadelphia Flyers","New Jersey Devils","Carolina Hurricanes","Central Division","St. Louis Blues*","Nashville Predators*","Chicago Blackhawks*","Minnesota Wild*","Winnipeg Jets*","Dallas Stars","Colorado Avalanche","Pacific Division","Anaheim Ducks*","Vancouver Canucks*","Calgary Flames*","Los Angeles Kings","San Jose Sharks","Edmonton Oilers","Arizona Coyotes","Atlantic Division","Boston Bruins*","Tampa Bay Lightning*","Montreal Canadiens*","Detroit Red Wings*","Ottawa Senators","Toronto Maple Leafs","Florida Panthers","Buffalo Sabres","Metropolitan Division","Pittsburgh Penguins*","New York Rangers*","Philadelphia Flyers*","Columbus Blue Jackets*","Washington Capitals","New Jersey Devils","Carolina Hurricanes","New York Islanders","Central Division","Colorado Avalanche*","St. Louis Blues*","Chicago Blackhawks*","Minnesota Wild*","Dallas Stars*","Nashville Predators","Winnipeg Jets","Pacific Division","Anaheim Ducks*","San Jose Sharks*","Los Angeles Kings*","Phoenix Coyotes","Vancouver Canucks","Calgary Flames","Edmonton Oilers"]})
nhl_df['city']=nhl_df['team'].str.extract(r'^([\w.]{1,5}(?:\s\w+)?\w*)')
>>> nhl_df
                     team         city
0       Atlantic Division     Atlantic
1    Tampa Bay Lightning*    Tampa Bay
2          Boston Bruins*       Boston
3    Toronto Maple Leafs*      Toronto
4        Florida Panthers      Florida
..                    ...          ...
166    Los Angeles Kings*  Los Angeles
167       Phoenix Coyotes      Phoenix
168     Vancouver Canucks    Vancouver
169        Calgary Flames      Calgary
170       Edmonton Oilers     Edmonton

^\S+(?=\s\S+$)

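This regex matches the first word of every team name that consists of exactly two words. The rest you will have to sort out manually, because the pattern alone cannot tell whether the middle word belongs to the city or to the team name.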
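A sketch of this approach in pandas, assuming a small sample Series: the pattern is wrapped in a capture group so that `str.extract` can return it, names with three or more words come back as NaN, and those are then filled from a hand-written lookup. `MANUAL` is a hypothetical dictionary; for the real data it would have to list every name the pattern leaves empty.

```python
import pandas as pd

# Sample names (assumption: a subset of the real column)
teams = pd.Series(["Boston Bruins*", "Dallas Stars", "Tampa Bay Lightning*", "New York Rangers"])

# Capture-group version of ^\S+(?=\s\S+$): only two-word names match, others give NaN
city = teams.str.extract(r'^(\S+)(?=\s\S+$)', expand=False)

# Hypothetical manual lookup for the names the pattern cannot decide
MANUAL = {"Tampa Bay Lightning*": "Tampa Bay", "New York Rangers": "New York"}
city = city.fillna(teams.map(MANUAL))
print(city.tolist())
# ['Boston', 'Dallas', 'Tampa Bay', 'New York']
```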

You can try something like:

^(?:([\w.]{1,5}\s\w+)|(\w+)|)(?:\s\w+)+\*?$

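Here you should look for the city name in either the first or the second group.

The pattern assumes that the first part of a two-word city name is at most 5 characters long. The result may not be perfectly clean, but it seems to work well on the given sample.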
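Because this pattern has two capture groups, `str.extract` returns two columns, which is why assigning it directly to `nhl_df['city']` raised the ValueError in the question. One way to get a single column, sketched here on a few sample rows (not the full data), is to take group 1 and fill its gaps from group 2 with `fillna`:

```python
import pandas as pd

# Sample rows (assumption: a subset of the real column)
nhl_df = pd.DataFrame({"team": ["Tampa Bay Lightning*", "Boston Bruins*", "St. Louis Blues", "Columbus Blue Jackets*"]})
pattern = r'^(?:([\w.]{1,5}\s\w+)|(\w+)|)(?:\s\w+)+\*?$'

# Column 0 holds two-word cities (group 1), column 1 one-word cities (group 2)
groups = nhl_df['team'].str.extract(pattern)

# At most one group is filled per row, so merge them into a single column
nhl_df['city'] = groups[0].fillna(groups[1])
print(nhl_df['city'].tolist())
# ['Tampa Bay', 'Boston', 'St. Louis', 'Columbus']
```

Since only one alternative can match per row, the merged column simply holds whichever group captured the city.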
