<p>As suggested in the comments, a regular expression should work well here.</p>
<h2>Example DataFrame:</h2>
<pre><code>>>> df
PLUGS\nDESIGN\nGEAR
0 700\nDaewoo 8000 Gearless
1 300\nHyundai 4400 Gearless
2 600\nSTX 2600 Gearless
3 200\nB170 \nGeared
4 362 Wenchong 1700 Mk II \nGeared
5 252\nRichMax 1550 Gearless
6 220\nCV 1100 Plus \nGeared
7 232\nOrskov Mk VII Gearless
8 119\nKouan 1000 Gearless
9 100\nHanjin 700 Gearless
</code></pre>
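<p>To reproduce the frame shown above, something like the following sketch could be used. It assumes that every <code>\n</code> in the printout is a literal backslash followed by the letter <code>n</code> (two characters) rather than a real newline, which is why raw strings are used; the variable name <code>rows</code> is only illustrative.</p>

```python
import pandas as pd

# Assumption: each "\n" is a literal backslash + "n", not a newline,
# so raw strings are used to keep the backslash as-is.
rows = [
    r"700\nDaewoo 8000 Gearless",
    r"300\nHyundai 4400 Gearless",
    r"600\nSTX 2600 Gearless",
    r"200\nB170 \nGeared",
    r"362 Wenchong 1700 Mk II \nGeared",
    r"252\nRichMax 1550 Gearless",
    r"220\nCV 1100 Plus \nGeared",
    r"232\nOrskov Mk VII Gearless",
    r"119\nKouan 1000 Gearless",
    r"100\nHanjin 700 Gearless",
]

# A single column whose name also contains the literal "\n" sequences.
df = pd.DataFrame({r"PLUGS\nDESIGN\nGEAR": rows})
```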
<p>First, remove the literal <code>\n</code> sequences from the column name so that it is easier to read and use:</p>
<pre><code>>>> df.columns = df.columns.str.replace(r"\\n", " ", regex=True)
</code></pre>
<p>Now the column name no longer contains any special characters:</p>
<pre><code>>>> df
PLUGS DESIGN GEAR
0 700\nDaewoo 8000 Gearless
1 300\nHyundai 4400 Gearless
2 600\nSTX 2600 Gearless
3 200\nB170 \nGeared
4 362 Wenchong 1700 Mk II \nGeared
5 252\nRichMax 1550 Gearless
6 220\nCV 1100 Plus \nGeared
7 232\nOrskov Mk VII Gearless
8 119\nKouan 1000 Gearless
9 100\nHanjin 700 Gearless
</code></pre>
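<p>The pattern <code>r"\\n"</code> used for the column name matches a literal backslash followed by <code>n</code>, not a newline character. The short check below, using only the standard <code>re</code> module, illustrates the difference; the variable names are illustrative, and the combined pattern at the end is just one possible way to cover both cases.</p>

```python
import re

literal = r"PLUGS\nDESIGN\nGEAR"  # backslash + "n" (two characters each)
newline = "PLUGS\nDESIGN\nGEAR"   # real newline characters

pattern = r"\\n"  # regex: an escaped backslash followed by the letter n

# The literal sequences are replaced ...
cleaned_literal = re.sub(pattern, " ", literal)
# ... but real newlines are left untouched by this pattern.
cleaned_newline = re.sub(pattern, " ", newline)

# One possible pattern that handles both forms.
cleaned_both = re.sub(r"\\n|\n", " ", newline)
```

<p>If the real data holds actual newlines instead, the replacement pattern would need to be adjusted accordingly.</p>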
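<p>Now we can use <a href="https://pandas.pydata.org/docs/reference/api/pandas.Series.str.extract.html" rel="nofollow noreferrer">pandas.Series.str.extract</a>. Each capture group <code>()</code> in the regex becomes a column in the result, and named groups would supply the column names.</p>
<p>Because the groups here are unnamed, the resulting columns get the default integer labels <code>0, 1, 2</code>, so we rename them to get the desired result:</p>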
<pre><code>>>> df = df['PLUGS DESIGN GEAR'].str.extract(r"^(\d+)[\\n\s]+([^\\]+)[\\n\s]+([\\n|^Gear][a-z]+)").rename(columns={0: 'PLUGS', 1: 'DESIGN', 2: 'GEAR'})
</code></pre>
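<p>As an alternative sketch, the same pattern can be written with named groups so that <code>str.extract</code> labels the columns directly and no <code>rename</code> is needed. The three sample rows and the extra <code>str.strip()</code> call are additions for illustration: with this pattern the second group can keep a trailing space when a literal <code>\n</code> follows it.</p>

```python
import pandas as pd

# Three sample rows, assuming literal backslash + "n" sequences.
s = pd.Series([
    r"700\nDaewoo 8000 Gearless",
    r"200\nB170 \nGeared",
    r"362 Wenchong 1700 Mk II \nGeared",
])

# Same pattern as above, with (?P<name>...) groups naming the columns.
pattern = (
    r"^(?P<PLUGS>\d+)[\\n\s]+"
    r"(?P<DESIGN>[^\\]+)[\\n\s]+"
    r"(?P<GEAR>[\\n|^Gear][a-z]+)"
)

out = s.str.extract(pattern)

# The DESIGN group may capture a trailing space before a literal "\n".
out["DESIGN"] = out["DESIGN"].str.strip()
```

<p>All three columns are extracted as strings; <code>PLUGS</code> can be converted with <code>astype(int)</code> if numeric values are needed.</p>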
<h2>Result:</h2>
<pre><code>>>> print(df)
PLUGS DESIGN GEAR
0 700 Daewoo 8000 Gearless
1 300 Hyundai 4400 Gearless
2 600 STX 2600 Gearless
3 200 B170 Geared
4 362 Wenchong 1700 Mk II Geared
5 252 RichMax 1550 Gearless
6 220 CV 1100 Plus Geared
7 232 Orskov Mk VII Gearless
8 119 Kouan 1000 Gearless
9 100 Hanjin 700 Gearless
</code></pre>
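<p>Regex explanation:</p>
<p>You can check it at <a href="https://regex101.com/" rel="nofollow noreferrer">regex101.com</a>. Note that the pattern explained below omits the leading <code>^</code> anchor and writes the last character class as <code>[\|^Gear]</code> rather than <code>[\\n|^Gear]</code>; otherwise it is the same as the one used in the code above.</p>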
<pre><code>(\d+)[\\n\s]+([^\\]+)[\\n\s]+([\|^Gear][a-z]+)
</code></pre>
<p><strong>First capturing group (\d+)</strong></p>
<pre><code>\d matches a digit (equivalent to [0-9])
  + matches the previous token between one and unlimited times, as many times as possible, giving back as needed (greedy)
Match a single character present in the list below [\\n\s]
  + matches the previous token between one and unlimited times, as many times as possible, giving back as needed (greedy)
  \\ matches the character \ literally (case sensitive)
  n matches the character n literally (case sensitive)
  \s matches any whitespace character (equivalent to [\r\n\t\f\v ])
</code></pre>
<p><strong>Second capturing group ([^\\]+)</strong></p>
<pre><code>Match a single character not present in the list below [^\\]
  + matches the previous token between one and unlimited times, as many times as possible, giving back as needed (greedy)
  \\ matches the character \ literally (case sensitive)
Match a single character present in the list below [\\n\s]
  + matches the previous token between one and unlimited times, as many times as possible, giving back as needed (greedy)
  \\ matches the character \ literally (case sensitive)
  n matches the character n literally (case sensitive)
  \s matches any whitespace character (equivalent to [\r\n\t\f\v ])
</code></pre>
<p><strong>Third capturing group ([\|^Gear][a-z]+)</strong></p>
<pre><code>Match a single character present in the list below [\|^Gear]
  \| matches the character | literally (case sensitive)
  ^Gear matches a single character in the list ^Gear (case sensitive)
Match a single character present in the list below [a-z]
  + matches the previous token between one and unlimited times, as many times as possible, giving back as needed (greedy)
  a-z matches a single character in the range between a (index 97) and z (index 122) (case sensitive)
Global pattern flags (regex101 settings; str.extract above is called without flags)
  g modifier: global. All matches (don't return after first match)
  m modifier: multi line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string)
</code></pre>
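<p>As the breakdown shows, <code>[\|^Gear]</code> is a character class that matches one character out of <code>|</code>, <code>^</code>, <code>G</code>, <code>e</code>, <code>a</code>, <code>r</code>; it does not match the word "Gear". The sketch below, using the standard <code>re</code> module, illustrates this. The word list and the stricter pattern <code>Gear(?:less|ed)</code> are hypothetical additions for comparison, not part of the original answer.</p>

```python
import re

# Third-group pattern as explained above: one character from the class,
# followed by one or more lowercase letters.
loose = re.compile(r"[\|^Gear][a-z]+")

# Hypothetical stricter alternative that only accepts the two known values.
strict = re.compile(r"Gear(?:less|ed)")

words = ["Gearless", "Geared", "rabbit", "|abc"]

# fullmatch requires the whole word to match the pattern.
loose_hits = [w for w in words if loose.fullmatch(w)]
strict_hits = [w for w in words if strict.fullmatch(w)]
```

<p>For the sample data both patterns give the same result, since the last field is always <code>Gearless</code> or <code>Geared</code>; the stricter form only matters if other values can appear.</p>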