我有下面的数据框,只显示一列
0 Atlantic Division
1 Tampa Bay Lightning*
2 Boston Bruins*
3 Toronto Maple Leafs*
4 Florida Panthers
5 Detroit Red Wings
6 Montreal Canadiens
7 Ottawa Senators
8 Buffalo Sabres
9 Metropolitan Division
10 Washington Capitals*
11 Pittsburgh Penguins*
12 Philadelphia Flyers*
13 Columbus Blue Jackets*
14 New Jersey Devils*
15 Carolina Hurricanes
16 New York Islanders
17 New York Rangers
18 Central Division
19 Nashville Predators*
20 Winnipeg Jets*
21 Minnesota Wild*
22 Colorado Avalanche*
23 St. Louis Blues
24 Dallas Stars
25 Chicago Blackhawks
26 Pacific Division
27 Vegas Golden Knights*
28 Anaheim Ducks*
29 San Jose Sharks*
30 Los Angeles Kings*
31 Calgary Flames
32 Edmonton Oilers
33 Vancouver Canucks
34 Arizona Coyotes
35 Atlantic Division
36 Montreal Canadiens*
37 Ottawa Senators*
38 Boston Bruins*
39 Toronto Maple Leafs*
40 Tampa Bay Lightning
41 Florida Panthers
42 Detroit Red Wings
43 Buffalo Sabres
44 Metropolitan Division
45 Washington Capitals*
46 Pittsburgh Penguins*
47 Columbus Blue Jackets*
48 New York Rangers*
49 New York Islanders
50 Philadelphia Flyers
51 Carolina Hurricanes
52 New Jersey Devils
53 Central Division
54 Chicago Blackhawks*
55 Minnesota Wild*
56 St. Louis Blues*
57 Nashville Predators*
58 Winnipeg Jets
59 Dallas Stars
60 Colorado Avalanche
61 Pacific Division
62 Anaheim Ducks*
63 Edmonton Oilers*
64 San Jose Sharks*
65 Calgary Flames*
66 Los Angeles Kings
67 Arizona Coyotes
68 Vancouver Canucks
69 Atlantic Division
70 Florida Panthers*
71 Tampa Bay Lightning*
72 Detroit Red Wings*
73 Boston Bruins
74 Ottawa Senators
75 Montreal Canadiens
76 Buffalo Sabres
77 Toronto Maple Leafs
78 Metropolitan Division
79 Washington Capitals*
80 Pittsburgh Penguins*
81 New York Rangers*
82 New York Islanders*
83 Philadelphia Flyers*
84 Carolina Hurricanes
85 New Jersey Devils
86 Columbus Blue Jackets
87 Central Division
88 Dallas Stars*
89 St. Louis Blues*
90 Chicago Blackhawks*
91 Nashville Predators*
92 Minnesota Wild*
93 Colorado Avalanche
94 Winnipeg Jets
95 Pacific Division
96 Anaheim Ducks*
97 Los Angeles Kings*
98 San Jose Sharks*
99 Arizona Coyotes
100 Calgary Flames
101 Vancouver Canucks
102 Edmonton Oilers
103 Atlantic Division
104 Montreal Canadiens*
105 Tampa Bay Lightning*
106 Detroit Red Wings*
107 Ottawa Senators*
108 Boston Bruins
109 Florida Panthers
110 Toronto Maple Leafs
111 Buffalo Sabres
112 Metropolitan Division
113 New York Rangers*
114 Washington Capitals*
115 New York Islanders*
116 Pittsburgh Penguins*
117 Columbus Blue Jackets
118 Philadelphia Flyers
119 New Jersey Devils
120 Carolina Hurricanes
121 Central Division
122 St. Louis Blues*
123 Nashville Predators*
124 Chicago Blackhawks*
125 Minnesota Wild*
126 Winnipeg Jets*
127 Dallas Stars
128 Colorado Avalanche
129 Pacific Division
130 Anaheim Ducks*
131 Vancouver Canucks*
132 Calgary Flames*
133 Los Angeles Kings
134 San Jose Sharks
135 Edmonton Oilers
136 Arizona Coyotes
137 Atlantic Division
138 Boston Bruins*
139 Tampa Bay Lightning*
140 Montreal Canadiens*
141 Detroit Red Wings*
142 Ottawa Senators
143 Toronto Maple Leafs
144 Florida Panthers
145 Buffalo Sabres
146 Metropolitan Division
147 Pittsburgh Penguins*
148 New York Rangers*
149 Philadelphia Flyers*
150 Columbus Blue Jackets*
151 Washington Capitals
152 New Jersey Devils
153 Carolina Hurricanes
154 New York Islanders
155 Central Division
156 Colorado Avalanche*
157 St. Louis Blues*
158 Chicago Blackhawks*
159 Minnesota Wild*
160 Dallas Stars*
161 Nashville Predators
162 Winnipeg Jets
163 Pacific Division
164 Anaheim Ducks*
165 San Jose Sharks*
166 Los Angeles Kings*
167 Phoenix Coyotes
168 Vancouver Canucks
169 Calgary Flames
170 Edmonton Oilers
Name: team, dtype: object
我需要用城市名称创建一个附加列
乍一看,正则表达式很简单(第一个单词)应该是城市名称,其余的是团队名称
然而,有些城市有两个词(洛杉矶、圣路易斯等)
是否有可能使用正则表达式执行此操作,或者必须手动执行
更新:我尝试了以下方法:
nhl_df['city']=nhl_df['team'].str.extract(r'^(?:([\w.]{1,5}\s\w+)|(\w+)|)(?:\s\w+)+\*?$')
但我得到了这个错误:
ValueError: Wrong number of items passed 2, placement implies 1
你可以用
见regex demo详细信息:
^
-字符串的开头([\w.]{1,5}(?:\s\w+)?\w*)
-捕获组1:[\w.]{1,5}
-一到五个字或点字(?:\s\w+)?
-可选出现一个空格,然后出现一个或多个单词字符\w*
-零个或多个单词字符李>熊猫测试:
^\S+(?=\s\S+$)
这个正则表达式给出了所有团队名称中只有两个单词的第一个单词。其他的你必须手动排序,因为没有办法仅仅通过模式来判断中间的单词是城市的一部分还是团队名称
您可以尝试类似的方法:
在这里,您应该在第一组或第二组中查找城市名称
该模式假设两个单词城市名称的第一部分不超过5个符号。结果可能不那么清晰,但在给定的示例中似乎效果很好
相关问题 更多 >
编程相关推荐