<p>Use the <code>re</code> package to strip the redundant substring, and use <code>apply</code> to run that clean-up on every row of the pandas <code>DataFrame</code>.</p>
<p>The code below shows one possible solution:</p>
<pre><code>import re
import pandas as pd

def remove_redundant_data(row):
    # Rows without a strength keep their name unchanged.
    if pd.isna(row["strength"]):
        return row["name"]
    # Escape the strength; each space may match zero or one whitespace character.
    pattern = r"\s?".join(re.escape(part) for part in row["strength"].split(" "))
    return re.sub(pattern + r"\s?", "", row["name"], flags=re.IGNORECASE).strip()

df = pd.DataFrame({
    "name": [
        "Vitamin B12 Tab 500mcg", "Vitamin B12 Tab 5mcg",
        "Vitamin B12 Tablets 250mcg", "Vitamin B12-folic Acid",
        "Vitamin B6 & B12 With Folic Acid", "Vitamin Deficiency Injectable System - B12",
        "Vitamine 110 Liq", "Vitamine B-12 Tab 100mcg",
        "Vitamine B12 25 Mcg - Tablet", "Vitamine B12 250mcg",
    ],
    "strength": ["500 mcg", "5 mcg", "250 mcg", None, None, None, None, "100 mcg", "25 mcg", "250 mcg"],
})
df["name"] = df.apply(remove_redundant_data, axis=1)
</code></pre>
<p>The resulting <code>DataFrame</code>:</p>
<pre><code>>>> df
                                         name  strength
0                             Vitamin B12 Tab   500 mcg
1                             Vitamin B12 Tab     5 mcg
2                         Vitamin B12 Tablets   250 mcg
3                      Vitamin B12-folic Acid      None
4            Vitamin B6 & B12 With Folic Acid      None
5  Vitamin Deficiency Injectable System - B12      None
6                            Vitamine 110 Liq      None
7                           Vitamine B-12 Tab   100 mcg
8                       Vitamine B12 - Tablet    25 mcg
9                                Vitamine B12   250 mcg
</code></pre>
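<p>This way, the <code>strength</code> column is used to find the redundant substring in the <code>name</code> column and remove it, while allowing for the case where the parts of that substring are not separated by a space (for example <code>500mcg</code> versus <code>500 mcg</code>).</p>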
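<p>To see the optional-whitespace matching in isolation, here is a minimal sketch that works on plain strings, without pandas. The helper name <code>strip_strength</code> is hypothetical, and the <code>(?&lt;!\d)</code> lookbehind is an extra guard that is not part of the solution above: it is assumed here so that a strength such as <code>5 mcg</code> does not match inside <code>25mcg</code>.</p>

```python
import re


def strip_strength(name, strength):
    """Remove `strength` from `name`, ignoring case and optional spaces.

    Hypothetical helper for illustration only.
    """
    if strength is None:
        return name
    # Escape each token so regex metacharacters in the strength
    # (for example the "." in "2.5 mg") are matched literally.
    tokens = [re.escape(token) for token in strength.split()]
    # Added guard (an assumption, not in the original solution):
    # the match must not start right after another digit.
    pattern = r"(?<!\d)" + r"\s?".join(tokens) + r"\s?"
    return re.sub(pattern, "", name, flags=re.IGNORECASE).strip()


print(strip_strength("Vitamin B12 Tab 500mcg", "500 mcg"))       # Vitamin B12 Tab
print(strip_strength("Vitamine B12 25 Mcg - Tablet", "25 mcg"))  # Vitamine B12 - Tablet
print(strip_strength("Vitamin B12 Tab 25mcg", "5 mcg"))          # Vitamin B12 Tab 25mcg
```

<p>Whether the digit guard is wanted depends on the data: if the <code>strength</code> column always matches the number in <code>name</code> exactly, the simpler pattern is enough.</p>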