<p>You can use</p>
<pre class="lang-py prettyprint-override"><code>import pandas as pd

df = pd.DataFrame({'Date_Title': ['05-21 I. Don Quixote', '21-20 IV. Macbeth', '10-12 ML. To Kill a Mockingbird', '12 V. Invisible Man'],
                   'Date': [1605, 1629, 1960, 1897],
                   'Copies': [252, 987, 478, 136]})
rx = r'^(\d+(?:-\d+)?\s*(M{0,4}(?:C[MD]|D?C{0,3})(?:X[CL]|L?X{0,3})(?:I[XV]|V?I{0,3})))\.\s*(.*)'
# Extract the three capturing groups into new columns (Date_Title is removed from df)
df[['NumRoman','Roman','Name']] = df.pop('Date_Title').str.extract(rx)
# Put the new columns first, where Date_Title used to be
df = df[['NumRoman','Roman','Name', 'Date', 'Copies']]
print(df)
#    NumRoman  Roman                   Name  Date  Copies
# 0   05-21 I      I            Don Quixote  1605     252
# 1  21-20 IV     IV                Macbeth  1629     987
# 2  10-12 ML     ML  To Kill a Mockingbird  1960     478
# 3      12 V      V          Invisible Man  1897     136
</code></pre>
<p>See the <a href="https://regex101.com/r/zOVQ5i/2/" rel="nofollow noreferrer">regex demo</a>. <em>Details</em>:</p>
<ul>
<li><code>^</code> - start of string</li>
<li><code>(\d+(?:-\d+)?\s*(M{0,4}(?:C[MD]|D?C{0,3})(?:X[CL]|L?X{0,3})(?:I[XV]|V?I{0,3})))</code> - Group 1 ("NumRoman"):
<ul>
<li><code>\d+(?:-\d+)?</code> - one or more digits, optionally followed by a <code>-</code> and one or more digits</li>
<li><code>\s*</code> - zero or more whitespace chars</li>
<li><code>(M{0,4}(?:C[MD]|D?C{0,3})(?:X[CL]|L?X{0,3})(?:I[XV]|V?I{0,3}))</code> - Group 2 ("Roman"): see <a href="https://stackoverflow.com/questions/267399">How do you match only valid roman numerals with a regular expression?</a> for an explanation</li>
</ul>
</li>
<li><code>\.</code> - a literal dot</li>
<li><code>\s*</code> - zero or more whitespace chars</li>
<li><code>(.*)</code> - Group 3 ("Name"): zero or more chars other than line break chars, as many as possible</li>
</ul>
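<p>To sanity-check the group breakdown outside pandas, here is a minimal sketch using Python's built-in <code>re</code> module with the same pattern. The sample strings come from the data above (plus one made-up value, <code>'12. Title'</code>); <code>roman</code> and <code>results</code> are illustrative names for the Group 2 sub-pattern and its test results, not part of the answer's code.</p>

```python
import re

# Same pattern as in the answer
rx = r'^(\d+(?:-\d+)?\s*(M{0,4}(?:C[MD]|D?C{0,3})(?:X[CL]|L?X{0,3})(?:I[XV]|V?I{0,3})))\.\s*(.*)'

# Groups 1 ("NumRoman"), 2 ("Roman") and 3 ("Name") for one sample value
print(re.match(rx, '05-21 I. Don Quixote').groups())
# ('05-21 I', 'I', 'Don Quixote')

# Illustrative: the Roman numeral sub-pattern (Group 2) checked on its own
roman = r'M{0,4}(?:C[MD]|D?C{0,3})(?:X[CL]|L?X{0,3})(?:I[XV]|V?I{0,3})'
results = {s: bool(re.fullmatch(roman, s)) for s in ['I', 'IV', 'ML', 'MCMXCIV', 'IIII', 'VX']}
print(results)
# {'I': True, 'IV': True, 'ML': True, 'MCMXCIV': True, 'IIII': False, 'VX': False}

# Every part of the Roman sub-pattern is optional, so Group 2 can be empty
print(re.match(rx, '12. Title').groups())
# ('12', '', 'Title')
```

<p>The last line shows a caveat: since the Roman numeral sub-pattern also matches an empty string, a value with no numeral before the dot still matches <code>rx</code>, with an empty <code>Roman</code> group.</p>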
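<p>Note that <code>df.pop('Date_Title')</code> removes the <code>Date_Title</code> column from <code>df</code> and returns it, so it can be passed straight to <code>.str.extract</code>. The new columns are appended at the end of the frame, so the <code>df = df[['NumRoman','Roman','Name', 'Date', 'Copies']]</code> line is only necessary if you want to keep the original column order.</p>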
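<p>Here is a minimal sketch of how <code>pop</code> and the column assignment affect the column order, using a two-row subset of the sample data. The names <code>title</code>, <code>cols_after_pop</code> and <code>cols_after_extract</code> are illustrative only.</p>

```python
import pandas as pd

# Two rows of the sample data from the answer
df = pd.DataFrame({'Date_Title': ['05-21 I. Don Quixote', '12 V. Invisible Man'],
                   'Date': [1605, 1897],
                   'Copies': [252, 136]})
rx = r'^(\d+(?:-\d+)?\s*(M{0,4}(?:C[MD]|D?C{0,3})(?:X[CL]|L?X{0,3})(?:I[XV]|V?I{0,3})))\.\s*(.*)'

# pop() removes the column from df and returns it as a Series
title = df.pop('Date_Title')            # illustrative name
cols_after_pop = list(df.columns)
print(cols_after_pop)                   # ['Date', 'Copies']

# The extracted columns are appended at the end of the frame
df[['NumRoman', 'Roman', 'Name']] = title.str.extract(rx)
cols_after_extract = list(df.columns)
print(cols_after_extract)               # ['Date', 'Copies', 'NumRoman', 'Roman', 'Name']

# Reordering moves them back to where Date_Title used to be
df = df[['NumRoman', 'Roman', 'Name', 'Date', 'Copies']]
print(list(df.columns))                 # ['NumRoman', 'Roman', 'Name', 'Date', 'Copies']
```

<p>If the column order does not matter, the last step can simply be skipped.</p>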