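<p>Since there are only three possible delimiters, you can chain <code>split()</code> calls: when a delimiter is not found, <code>split()</code> returns a one-element list holding the unmodified string, so taking index <code>[0]</code> is always safe.</p>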
<pre><code>>>> s = """196 Boston (Boston University, Boston College, Bos...
... 197 Bridgewater (Bridgewater State College)[2]
... 198 Cambridge (Harvard University, Massachusetts I...
... 199 Chestnut Hill (Boston College)
... 200 The Colleges of Worcester Consortium:
... 201 Dudley (Nichols College)
... 240 Faribault, South Central College
... 241 Mankato (Minnesota State University, Mankato),...
... 242 Marshall (Southwest Minnesota State University...
... 243 Moorhead (Minnesota State University, Moorhead...
... 244 Morris (University of Minnesota Morris)[2]
... 245 Northfield (Carleton College, St. Olaf College...
... 246 North Mankato, South Central College
... 247 St. Cloud (St. Cloud State University, The Col...
... 248 St. Joseph (College of Saint Benedict)[2]
... 249 St. Peter (Gustavus Adolphus College)[2]"""
>>> for i in s.split('\n'):
...     number, text = i.split('(')[0].split(',')[0].split(':')[0].split(' ', 1)
...     print('{} {}'.format(number, text.strip()))
...
196 Boston
197 Bridgewater
198 Cambridge
199 Chestnut Hill
200 The Colleges of Worcester Consortium
201 Dudley
240 Faribault
241 Mankato
242 Marshall
243 Moorhead
244 Morris
245 Northfield
246 North Mankato
247 St. Cloud
248 St. Joseph
249 St. Peter
</code></pre>
<p>The same transformation can be applied to strings in a pandas DataFrame with <a href="https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.apply.html#pandas.DataFrame.apply" rel="nofollow noreferrer"><code>DataFrame.apply</code></a>.</p>
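<p>A minimal sketch of how that might look, assuming the strings live in a single column. The column names <code>raw</code> and <code>place</code> and the helper <code>leading_name</code> are hypothetical, chosen only for illustration. For one column, <code>Series.apply</code> is the simpler call; <code>DataFrame.apply</code> with <code>axis=1</code> gives the same result row by row.</p>

```python
import pandas as pd


def leading_name(line):
    """Return 'number name', dropping everything from the first '(', ',' or ':'."""
    # split() returns [line] unchanged when the delimiter is absent,
    # so indexing with [0] never fails.
    head = line.split('(')[0].split(',')[0].split(':')[0]
    number, text = head.split(' ', 1)
    return '{} {}'.format(number, text.strip())


# Hypothetical one-column frame; the column name "raw" is an assumption.
df = pd.DataFrame({'raw': [
    '199 Chestnut Hill (Boston College)',
    '200 The Colleges of Worcester Consortium:',
    '240 Faribault, South Central College',
]})

# Series.apply calls the helper once per string in the column.
df['place'] = df['raw'].apply(leading_name)

# DataFrame.apply with axis=1 passes each row; pick the column inside the lambda.
same = df.apply(lambda row: leading_name(row['raw']), axis=1)

print(df['place'].tolist())
```

<p>Keeping the chained split in a named function makes it easy to check on a few sample strings before applying it to the whole column.</p>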