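<p>The easiest way to work around this odd pandas behaviour is to generate a mask with <code>df.duplicated(col_name) | df.duplicated(col_name, take_last=True)</code> (from pandas 0.17.0 on, <code>take_last=True</code> is spelled <code>keep='last'</code>). The first call flags every occurrence except the first and the second flags every occurrence except the last, so the bitwise OR yields a Series that is <code>True</code> for every duplicated row.</p>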
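<p>As a minimal sketch of what that mask looks like on a toy frame (the column name <code>name</code> and its values are made up for illustration, and the code uses the newer <code>keep</code> spelling, so it assumes pandas 0.17.0 or later):</p>

```python
import pandas as pd

# Hypothetical toy data: 'a' appears three times, 'b' once
df = pd.DataFrame({'name': ['a', 'a', 'b', 'a']})

# Flags every occurrence except the first one
after_first = df.duplicated('name', keep='first')
# Flags every occurrence except the last one
before_last = df.duplicated('name', keep='last')

# Bitwise OR: True for every row whose value appears more than once
mask = after_first | before_last
print(mask.tolist())
```

<p>Neither call on its own covers all three <code>'a'</code> rows; only the combination does.</p>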
<p>Then use that mask as an index to set each value to either the original name or a new name with the ID number in front.</p>
<p>Here is your case:</p>
<pre><code>import pandas as pd

# Generating your DataFrame
df_attachment = pd.DataFrame(index=range(5))
df_attachment['ID'] = [1, 2, 3, 4, 5]
df_attachment['File Name'] = ['Text.csv', 'TEXT.csv', 'unique.csv',
'unique2.csv', 'text.csv']
df_attachment['LowerFileName'] = df_attachment['File Name'].str.lower()
# Answer from here, mask generation over two lines for readability
mask = df_attachment.duplicated('LowerFileName')
# keep='last' needs pandas 0.17.0+; older versions spelled it take_last=True
mask = mask | df_attachment.duplicated('LowerFileName', keep='last')
df_attachment['Duplicate'] = mask
# New column names if possible
df_attachment['number_name'] = df_attachment['ID'].astype(str) + df_attachment['File Name']
# Set the final unique name column using the mask already generated
df_attachment.loc[mask, 'UniqueFileName'] = df_attachment.loc[mask, 'number_name']
df_attachment.loc[~mask, 'UniqueFileName'] = df_attachment.loc[~mask, 'File Name']
# Drop the intermediate column used
del df_attachment['number_name']
</code></pre>
<p>The final <code>df_attachment</code>:</p>
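<pre><code>   ID    File Name  LowerFileName  Duplicate  UniqueFileName
0   1     Text.csv       text.csv       True       1Text.csv
1   2     TEXT.csv       text.csv       True       2TEXT.csv
2   3   unique.csv     unique.csv      False      unique.csv
3   4  unique2.csv    unique2.csv      False     unique2.csv
4   5     text.csv       text.csv       True       5text.csv
</code></pre>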
<p>This method uses only vectorised pandas operations and indexing, so it should stay quick even on large DataFrames.</p>
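<p>If you prefer a single assignment over the two <code>.loc</code> steps, one possible variant is <code>Series.where</code>. This is a swapped-in alternative rather than the approach used above, shown on a shortened, made-up copy of the data and assuming pandas 0.17.0 or later for the <code>keep</code> argument:</p>

```python
import pandas as pd

# Shortened, illustrative copy of the frame from the answer
df = pd.DataFrame({'ID': [1, 2, 3],
                   'File Name': ['Text.csv', 'TEXT.csv', 'unique.csv']})

lower = df['File Name'].str.lower()
# Same OR-ed mask as above, built on a Series instead of a DataFrame
mask = lower.duplicated(keep='first') | lower.duplicated(keep='last')

numbered = df['ID'].astype(str) + df['File Name']
# where() keeps the original value where the condition is True
# and takes the value from `numbered` everywhere else
df['UniqueFileName'] = df['File Name'].where(~mask, numbered)
print(df['UniqueFileName'].tolist())
```

<p>This stays fully vectorised and avoids the intermediate <code>number_name</code> column.</p>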
<h2>EDIT: 2017-03-28</h2>
<p>Someone upvoted this yesterday, so I thought I would edit it to say that pandas has supported this natively since <code>0.17.0</code>; see the changes here: <a href="http://pandas.pydata.org/pandas-docs/version/0.19.2/whatsnew.html#v0-17-0-october-9-2015" rel="nofollow noreferrer">http://pandas.pydata.org/pandas-docs/version/0.19.2/whatsnew.html#v0-17-0-october-9-2015</a></p>
<p>You can now use the <code>keep</code> argument of <code>drop_duplicates</code> and <code>duplicated</code> and set it to <code>False</code> to mark all duplicates: <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.duplicated.html" rel="nofollow noreferrer">http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.duplicated.html</a></p>
<p>So the lines above that generate the <code>Duplicate</code> column become:</p>
<p><code>df_attachment['Duplicate'] = df_attachment.duplicated('LowerFileName', keep=False)</code></p>
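<p>A quick sanity check, on made-up data, that <code>keep=False</code> gives the same mask as OR-ing the <code>keep='first'</code> and <code>keep='last'</code> results, and that <code>drop_duplicates(keep=False)</code> discards every row that has a duplicate (this assumes pandas 0.17.0 or later):</p>

```python
import pandas as pd

# Illustrative data only: 'text.csv' appears three times
df = pd.DataFrame({'LowerFileName': ['text.csv', 'text.csv',
                                     'unique.csv', 'text.csv']})

# The two-step mask from the original answer
combined = (df.duplicated('LowerFileName', keep='first')
            | df.duplicated('LowerFileName', keep='last'))
# The one-step equivalent
all_dupes = df.duplicated('LowerFileName', keep=False)
print(all_dupes.equals(combined))

# keep=False on drop_duplicates removes all copies, not all but one
only_unique = df.drop_duplicates('LowerFileName', keep=False)
print(only_unique['LowerFileName'].tolist())
```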