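<p>You can build a regular expression that captures a number together with the word that follows it, wrap it in a function, and apply that function to the whole DataFrame column:</p>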
<pre><code>import pandas as pd
import re
df = pd.DataFrame({'text':["Halve the clementine and place into the cavity along with the bay leaves. Transfer the duck to a medium roasting tray and roast for around 1 hour 20 minutes.",
"Add the stock, then bring to the boil and reduce to a simmer for around 15 minutes.",
"2 heaped teaspoons Chinese five-spice",
"100 ml Marsala",
"1 litre organic chicken stock"]})
def extract_qty(txt):
    # r'\d+ \w+': one or more digits, a single space, then one word (e.g. "100 ml")
    return re.findall(r'\d+ \w+', txt)
df['extracted_qty'] = df['text'].apply(extract_qty)
df
# text extracted_qty
#0 Halve the clementine and place into the cavity... [1 hour, 20 minutes]
#1 Add the stock, then bring to the boil and redu... [15 minutes]
#2 2 heaped teaspoons Chinese five-spice [2 heaped]
#3 100 ml Marsala [100 ml]
#4 1 litre organic chicken stock [1 litre]
</code></pre>
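<p>As an alternative to a Python-level <code>apply</code>, pandas exposes the same regex search through the <code>Series.str</code> accessor. The sketch below uses <code>Series.str.findall</code> with the same pattern on the same sample text and checks that it produces the same lists. One practical difference is that <code>str.findall</code> returns <code>NaN</code> for missing values, whereas <code>re.findall</code> inside <code>apply</code> would raise on a non-string.</p>
<pre><code>import re
import pandas as pd

df = pd.DataFrame({'text': [
    "Halve the clementine and place into the cavity along with the bay leaves. "
    "Transfer the duck to a medium roasting tray and roast for around 1 hour 20 minutes.",
    "Add the stock, then bring to the boil and reduce to a simmer for around 15 minutes.",
    "2 heaped teaspoons Chinese five-spice",
    "100 ml Marsala",
    "1 litre organic chicken stock",
]})

PATTERN = r'\d+ \w+'  # digits, one space, one word

# Series.str.findall runs re.findall on every element and returns a Series of lists
df['extracted_qty'] = df['text'].str.findall(PATTERN)

# The apply-based version from the answer, for comparison
via_apply = df['text'].apply(lambda txt: re.findall(PATTERN, txt))

print(df['extracted_qty'].tolist())
# [['1 hour', '20 minutes'], ['15 minutes'], ['2 heaped'], ['100 ml'], ['1 litre']]
</code></pre>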
<p>Then filter each extracted list against <code>to_compare</code> with a list comprehension to keep only the common values:</p>
<pre><code>to_compare = ["1 hour", "20 litres", "100 ml", "2", "15 minutes", "20 minutes"]
# keep only the extracted quantities that also appear in to_compare
df['common'] = df['extracted_qty'].apply(lambda x: [el for el in x if el in to_compare])
df
# text extracted_qty common
#0 Halve the clementine ... [1 hour, 20 minutes] [1 hour, 20 minutes]
#1 Add the stock, then ... [15 minutes] [15 minutes]
#2 2 heaped teaspoons ... [2 heaped] []
#3 100 ml Marsala [100 ml] [100 ml]
#4 1 litre organic chicken... [1 litre] []
</code></pre>
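<p>If <code>to_compare</code> grows large, a small variation is to convert it to a <code>set</code> once, so each membership test is a hash lookup instead of a scan of the list. The sketch below rebuilds only the <code>extracted_qty</code> column from the output above, so it runs on its own. Note that the bare <code>"2"</code> in <code>to_compare</code> can never match, because every extracted item has the form "number word" (here <code>"2 heaped"</code>).</p>
<pre><code>import pandas as pd

# extracted_qty values copied from the output above
df = pd.DataFrame({'extracted_qty': [
    ['1 hour', '20 minutes'],
    ['15 minutes'],
    ['2 heaped'],
    ['100 ml'],
    ['1 litre'],
]})

to_compare = ["1 hour", "20 litres", "100 ml", "2", "15 minutes", "20 minutes"]
to_compare_set = set(to_compare)  # build once; set lookups are O(1) on average

# keep only extracted quantities found in to_compare, preserving their order
df['common'] = df['extracted_qty'].apply(
    lambda qtys: [q for q in qtys if q in to_compare_set]
)

print(df['common'].tolist())
# [['1 hour', '20 minutes'], ['15 minutes'], [], ['100 ml'], []]
</code></pre>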