<p>Use <code>df.duplicated</code> with <code>keep=False</code> to get a boolean mask that marks every duplicated row, then select those rows with it:</p>
<pre><code>import pandas as pd

# split name / number from your csv file (quoting=1 is csv.QUOTE_ALL)
df = pd.read_csv('names_dup2.csv', quoting=1, header=None)[0] \
.str.split('\t', expand=True)
# increment index to match line number
df.index += 1
# keep duplicate entries
out = df[df[0].duplicated(keep=False)]
# export to duplicated_data.csv
out.to_csv('duplicated_data.csv', header=False)
</code></pre>
<p>Contents of the output file:</p>
<pre><code>15,ANDREW ZHAO CHONG,83091746
19,ANDREW ZHAO CHONG,83091746
26,ANDREW ZHAO CHONG,83091746
48,ANDREW ZHAO CHONG,83091746
53,KOH KANG RI,89943392
56,KOH KANG RI,89943392
63,ENOS ZHAO KANG SONG,80746554
66,ENOS ZHAO KANG SONG,80746554
80,ENOS ZHAO KANG SONG,80746554
</code></pre>
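<p>To see why <code>keep=False</code> matters here, the following minimal sketch compares it with the default <code>keep='first'</code>. The sample names and numbers are made up for illustration and are not taken from <code>names_dup2.csv</code>:</p>

```python
import pandas as pd

# Hypothetical sample data (not from names_dup2.csv)
df = pd.DataFrame({0: ["ANN", "BOB", "ANN", "CAT", "ANN"],
                   1: ["111", "222", "111", "333", "111"]})
df.index += 1  # 1-based labels, mirroring line numbers

# Default keep='first': the first occurrence is NOT flagged as a duplicate
rows_default = df[df[0].duplicated()]

# keep=False: every member of a duplicated group is flagged
rows_all = df[df[0].duplicated(keep=False)]

print(rows_default.index.tolist())  # [3, 5]
print(rows_all.index.tolist())      # [1, 3, 5]
```

<p>With the default setting, the first occurrence (line 1 in this sketch) would be missing from the report, so <code>keep=False</code> is the better fit when every line number of a repeated entry should be listed.</p>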
<p><strong>One-liner version</strong></p>
<pre><code>pd.read_csv('names_dup2.csv', quoting=1, header=None)[0] \
.str.split('\t', expand=True) \
.assign(index=lambda x: x.index+1) \
.set_index('index') \
[lambda x: x[0].duplicated(keep=False)] \
.to_csv('duplicated_data.csv', header=False)
</code></pre>