<p>Say you have a column stored as a Series, like this:</p>
<pre><code>s
0 United Kingdom - ��Global Consumer Technolog...
1 United Kingdom - ��VP Technology - Founder -...
2 Aberdeen - ��SeniorCore Analysis Specialist ...
3 London, - ��ED, Equit Technology, London - �...
4 United Kingdom - ��Chief Officer, Group Tech...
Name: Summary 1, dtype: object
</code></pre>
<p><strong>Option 1</strong><br/>
Building on <a href="https://stackoverflow.com/questions/3203190/regex-any-ascii-character">this answer</a>, you can use <code>str.split</code> to split on each hyphen together with the non-ASCII characters that follow it:</p>
<pre><code>s.str.split(r'-\s*[^\x00-\x7f]+', expand=True)
   0               1                                2
0  United Kingdom  Global Consumer Technology       American Express
1  United Kingdom  VP Technology - Founder          Hogarth Worldwide
2  Aberdeen        SeniorCore Analysis Specialist   COREX Group
3  London,         ED, Equit Technology, London     Morgan Stanley
4  United Kingdom  Chief Officer, Group Technology  BP
</code></pre>
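<p>The split keeps the space that preceded each hyphen, so a self-contained sketch of this option might also strip every column afterwards. The sample rows below are made up to mirror the output above, the bullet character (U+2022) is only an assumed stand-in for the garbled separators, and the column names are hypothetical.</p>

```python
import pandas as pd

# Hypothetical sample data: the real separator characters are garbled in the
# original output, so a bullet (U+2022) stands in for them here.
s = pd.Series(
    [
        "United Kingdom - \u2022Global Consumer Technology - \u2022American Express",
        "United Kingdom - \u2022VP Technology - Founder - \u2022Hogarth Worldwide",
        "Aberdeen - \u2022SeniorCore Analysis Specialist - \u2022COREX Group",
    ],
    name="Summary 1",
)

# Split on: a hyphen, optional whitespace, then one or more non-ASCII characters.
# A plain " - " followed by ASCII text (as in "VP Technology - Founder") is kept.
parts = s.str.split(r"-\s*[^\x00-\x7f]+", expand=True)

# Each piece still carries the space that sat before the hyphen; strip it.
parts = parts.apply(lambda col: col.str.strip())

# Hypothetical column names, chosen only for readability.
parts.columns = ["location", "title", "company"]

print(parts)
```

<p>Stripping after the split keeps the pattern itself short; folding a leading <code>\s*</code> into the pattern would be another way to get the same result.</p>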
<hr/>
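<p><strong>Option 2</strong><br/>
<code>str.extractall</code> + <code>unstack</code>: extract every run of ASCII characters, strip the trailing hyphens and spaces, then pivot the matches into columns:</p>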
<pre><code>s.str.extractall(r'([\x00-\x7f]+)')[0].str.rstrip('- ').unstack()
match  0               1                                2
0      United Kingdom  Global Consumer Technology       American Express
1      United Kingdom  VP Technology - Founder          Hogarth Worldwide
2      Aberdeen        SeniorCore Analysis Specialist   COREX Group
3      London,         ED, Equit Technology, London     Morgan Stanley
4      United Kingdom  Chief Officer, Group Technology  BP
</code></pre>
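<p>Broken into steps, the chain above might look like the sketch below. The sample rows are again made up, and the bullet character (U+2022) is an assumed placeholder for the garbled separators.</p>

```python
import pandas as pd

# Hypothetical sample data; U+2022 stands in for the unknown non-ASCII separators.
s = pd.Series(
    [
        "United Kingdom - \u2022Global Consumer Technology - \u2022American Express",
        "United Kingdom - \u2022VP Technology - Founder - \u2022Hogarth Worldwide",
        "Aberdeen - \u2022SeniorCore Analysis Specialist - \u2022COREX Group",
    ],
    name="Summary 1",
)

# Step 1: one row per run of ASCII characters, indexed by (original row, match number).
matches = s.str.extractall(r"([\x00-\x7f]+)")

# Step 2: take the single capture group and drop the trailing "- " left on each run.
cleaned = matches[0].str.rstrip("- ")

# Step 3: pivot the match number into columns, giving one row per original entry.
wide = cleaned.unstack()

print(wide)
```

<p>If some rows contain fewer ASCII runs than others, <code>unstack</code> fills the missing cells with <code>NaN</code>.</p>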