<p>If I understand correctly, the following should work:</p>
<h2>DataFrame:</h2>
<pre><code>df
column A
0 TTT-Changing Car-BBBB-KKKK
1 TTT-KKKK - Changing device-KKKK
2 Releasing device-RRRR-KKKK-TTTT
3 RRRR-BBBB-Switching Car-TTTT
4 Login issue -RRRR-KKKK-TTTT
5 CCCC-Activation issue-RRRR-KKKK-TTTT
</code></pre>
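<p>To follow along, the sample frame shown above can be rebuilt by hand. This is only a sketch for reproducing the example; the values are copied from the printed frame, and the variable name <code>df</code> matches the rest of the answer.</p>

```python
import pandas as pd

# Sample data copied from the frame printed above.
df = pd.DataFrame({
    "column A": [
        "TTT-Changing Car-BBBB-KKKK",
        "TTT-KKKK - Changing device-KKKK",
        "Releasing device-RRRR-KKKK-TTTT",
        "RRRR-BBBB-Switching Car-TTTT",
        "Login issue -RRRR-KKKK-TTTT",
        "CCCC-Activation issue-RRRR-KKKK-TTTT",
    ]
})
print(df)
```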
<p>Use <code>str.extract</code> to pull out the <code>Activation</code> and <code>Changing</code> strings:</p>
<pre><code>df['column B'] = df['column A'].str.extract('(Activation|Changing[^-]*)')
   column A                              column B
0  TTT-Changing Car-BBBB-KKKK            Changing Car
1  TTT-KKKK - Changing device-KKKK       Changing device
2  Releasing device-RRRR-KKKK-TTTT       NaN
3  RRRR-BBBB-Switching Car-TTTT          NaN
4  Login issue -RRRR-KKKK-TTTT           NaN
5  CCCC-Activation issue-RRRR-KKKK-TTTT  Activation
</code></pre>
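<p>To see what the pattern captures on its own, here is a small sketch using only the standard <code>re</code> module. The helper name <code>extract_label</code> is hypothetical and exists just for illustration; like <code>str.extract</code>, it returns the first match of the capture group.</p>

```python
import re

# Same pattern as in the str.extract call: either the literal word
# "Activation", or "Changing" followed by everything up to the next hyphen.
PATTERN = re.compile(r"(Activation|Changing[^-]*)")

def extract_label(text):
    """Return the first match of PATTERN in text, or None if there is none."""
    match = PATTERN.search(text)
    return match.group(1) if match else None

samples = [
    "TTT-Changing Car-BBBB-KKKK",
    "TTT-KKKK - Changing device-KKKK",
    "Login issue -RRRR-KKKK-TTTT",
    "CCCC-Activation issue-RRRR-KKKK-TTTT",
]
for s in samples:
    print(repr(s), "->", extract_label(s))
```

<p>Note that <code>[^-]*</code> belongs only to the <code>Changing</code> branch of the alternation, which is why the last row yields <code>Activation</code> rather than <code>Activation issue</code>.</p>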
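<p>Now replace the words in the new column, i.e. <code>column B</code>, as needed:</p>
<pre><code># regex=True is required in recent pandas, where str.replace treats the pattern literally by default
df['column B'] = df['column B'].str.replace(r'(^.*Changing.*$)', 'Change', regex=True)
df['column B'] = df['column B'].str.replace(r'(^.*Activation.*$)', 'Activation', regex=True)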
df
   column A                              column B
0  TTT-Changing Car-BBBB-KKKK            Change
1  TTT-KKKK - Changing device-KKKK       Change
2  Releasing device-RRRR-KKKK-TTTT       NaN
3  RRRR-BBBB-Switching Car-TTTT          NaN
4  Login issue -RRRR-KKKK-TTTT           NaN
5  CCCC-Activation issue-RRRR-KKKK-TTTT  Activation
</code></pre>
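<h2>Another approach:</h2>
<p>A better approach is to collect all the replacements you want in a dictionary and then apply them to the DataFrame in one call, as follows:</p>
<pre><code>import pandas as pd

df = pd.read_csv("data_file")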
df['column B'] = df['column A'].str.extract('(Activation|Changing[^-]*)')
replacements = {
'column B': {
r'(^.*Changing.*$)': 'Change',
r'(^.*Activation.*$)': 'Activation'}
}
df = df.replace(replacements, regex=True)
print(df)
</code></pre>
<h2>Result:</h2>
<pre><code>   column A                              column B
0  TTT-Changing Car-BBBB-KKKK            Change
1  TTT-KKKK - Changing device-KKKK       Change
2  Releasing device-RRRR-KKKK-TTTT       NaN
3  RRRR-BBBB-Switching Car-TTTT          NaN
4  Login issue -RRRR-KKKK-TTTT           NaN
5  CCCC-Activation issue-RRRR-KKKK-TTTT  Activation
</code></pre>
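<p>Or:</p>
<p>Here the column name is not specified inside <code>replacements</code>, so you need to call <code>replace</code> on the column itself and assign the result back with <code>df['column B'] = </code>:</p>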
<pre><code>df['column B'] = df['column A'].str.extract('(Activation|Changing[^-]*)')
replacements = {
r'(^.*Changing.*$)': 'Change',
r'(^.*Activation.*$)': 'Activation'
}
print(replacements)
df['column B'] = df['column B'].replace(replacements, regex=True)
print(df)
</code></pre>
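<p>The difference between the nested and the flat dictionary can be seen on a one-row frame. This is a minimal sketch with made-up variable names (<code>scoped</code>, <code>unscoped</code>): with a nested dict, the outer key restricts the search to that column, while a flat dict passed to <code>DataFrame.replace</code> is applied to every column.</p>

```python
import pandas as pd

df = pd.DataFrame({
    "column A": ["TTT-Changing Car-BBBB-KKKK"],
    "column B": ["Changing Car"],
})

pattern = r"^.*Changing.*$"

# Nested dict: the outer key names the column to search, so only
# 'column B' is rewritten and 'column A' is left alone.
scoped = df.replace({"column B": {pattern: "Change"}}, regex=True)

# Flat dict on the whole frame: every column is searched, so
# 'column A' is rewritten as well.
unscoped = df.replace({pattern: "Change"}, regex=True)

print(scoped)
print(unscoped)
```

<p>This is why the flat dictionary is applied to <code>df['column B']</code> only: calling it on the whole frame would also overwrite <code>column A</code>.</p>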
<h2>Note:</h2>
<p><code>replace</code> with a dictionary of regular expressions is relatively slow, whereas column-wise operations are fast enough.</p>
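<p>As one possible column-wise alternative (a sketch, not the method used above), you could extract only the keyword and translate it with <code>Series.map</code>. The <code>labels</code> dictionary is an assumption made for this example, and no timing is claimed here.</p>

```python
import pandas as pd

df = pd.DataFrame({
    "column A": [
        "TTT-Changing Car-BBBB-KKKK",
        "TTT-KKKK - Changing device-KKKK",
        "Releasing device-RRRR-KKKK-TTTT",
        "RRRR-BBBB-Switching Car-TTTT",
        "Login issue -RRRR-KKKK-TTTT",
        "CCCC-Activation issue-RRRR-KKKK-TTTT",
    ]
})

# Hypothetical mapping from the extracted keyword to the final label.
labels = {"Changing": "Change", "Activation": "Activation"}

# expand=False returns a Series; rows without a match become NaN,
# and map() keeps them as NaN because NaN is not a key in `labels`.
keyword = df["column A"].str.extract(r"(Activation|Changing)", expand=False)
df["column B"] = keyword.map(labels)
print(df)
```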