<p>Consider the following dataframes:</p>
<pre><code>import numpy as np
import pandas as pd

# Two 4x3 frames of random values; the keys 'a' and 'c' appear in both
TableA = pd.DataFrame(np.random.rand(4, 3),
                      pd.Index(list('abcd'), name='Key'),
                      ['A', 'B', 'C']).reset_index()
TableB = pd.DataFrame(np.random.rand(4, 3),
                      pd.Index(list('aecf'), name='Key'),
                      ['A', 'B', 'C']).reset_index()
</code></pre>
<hr/>
<pre><code>TableA
</code></pre>
<p><a href="https://i.stack.imgur.com/ACXcv.png" rel="noreferrer"><img src="https://i.stack.imgur.com/ACXcv.png" alt="enter image description here"/></a></p>
<hr/>
<pre><code>TableB
</code></pre>
<p><a href="https://i.stack.imgur.com/uIB9Y.png" rel="noreferrer"><img src="https://i.stack.imgur.com/uIB9Y.png" alt="enter image description here"/></a></p>
<p>Here is one way to do what you want.</p>
<h3>Method 1</h3>
<pre><code># Identify which keys are in TableB but not in TableA
key_diff = set(TableB.Key).difference(TableA.Key)
where_diff = TableB.Key.isin(key_diff)
# Slice TableB accordingly and append those rows to TableA
# (DataFrame.append was removed in pandas 2.0, so pd.concat is used instead)
pd.concat([TableA, TableB[where_diff]], ignore_index=True)
</code></pre>
<p><a href="https://i.stack.imgur.com/jnNkF.png" rel="noreferrer"><img src="https://i.stack.imgur.com/jnNkF.png" alt="enter image description here"/></a></p>
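<p>As a minimal sketch of the same idea, the set difference can be skipped by negating <code>isin</code> directly. The frames <code>table_a</code> and <code>table_b</code> and their values are made up for illustration so that the result is deterministic; they are not the random tables above.</p>

```python
import pandas as pd

# Hypothetical, deterministic stand-ins for TableA and TableB
table_a = pd.DataFrame({"Key": list("abcd"), "A": [1.0, 2.0, 3.0, 4.0]})
table_b = pd.DataFrame({"Key": list("aecf"), "A": [10.0, 20.0, 30.0, 40.0]})

# Keep only the rows of table_b whose Key does not occur in table_a
only_in_b = table_b[~table_b["Key"].isin(table_a["Key"])]

# Stack them under table_a and renumber the index
combined = pd.concat([table_a, only_in_b], ignore_index=True)
print(combined)
```

<p>Rows already present in <code>table_a</code> keep their values; only the keys <code>e</code> and <code>f</code> are brought over from <code>table_b</code>.</p>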
<h3>Method 2</h3>
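<pre><code># Collect, row by row, the TableB rows whose Key is missing from TableA
rows = []
for i, row in TableB.iterrows():
    if row.Key not in TableA.Key.values:
        rows.append(row)
# Each row is a Series, so concatenate column-wise and transpose back
pd.concat([TableA.T] + rows, axis=1).T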
</code></pre>
<hr/>
<h3>Timing</h3>
<p><strong>4 rows, 2 overlapping</strong></p>
<p>Method 1 is much faster.</p>
<p><a href="https://i.stack.imgur.com/wpKrE.png" rel="noreferrer"><img src="https://i.stack.imgur.com/wpKrE.png" alt="enter image description here"/></a></p>
<p><strong>10,000 rows, 5,000 overlapping</strong></p>
<p><strong>Loops are bad</strong></p>
<p><a href="https://i.stack.imgur.com/ZVXCU.png" rel="noreferrer"><img src="https://i.stack.imgur.com/ZVXCU.png" alt="enter image description here"/></a></p>
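<p>To reproduce a comparison like this locally, something along the following lines could be used. It is only a sketch: <code>make_tables</code>, <code>method1</code> and <code>method2</code> are hypothetical helper names, the table size is kept small so the loop finishes quickly, and the measured times will depend on the machine and the pandas version.</p>

```python
import timeit

import numpy as np
import pandas as pd


def make_tables(n, overlap):
    """Build two n-row frames that share `overlap` keys (illustrative helper)."""
    keys_a = [f"k{i}" for i in range(n)]
    keys_b = [f"k{i}" for i in range(n - overlap, 2 * n - overlap)]
    a = pd.DataFrame({"Key": keys_a, "A": np.random.rand(n)})
    b = pd.DataFrame({"Key": keys_b, "A": np.random.rand(n)})
    return a, b


def method1(a, b):
    # Vectorized: boolean mask on the keys, then one concat
    return pd.concat([a, b[~b.Key.isin(a.Key)]], ignore_index=True)


def method2(a, b):
    # Row-by-row loop, as in Method 2
    rows = [row for _, row in b.iterrows() if row.Key not in a.Key.values]
    return pd.concat([a.T] + rows, axis=1).T


a, b = make_tables(1000, 500)
t1 = timeit.timeit(lambda: method1(a, b), number=3)
t2 = timeit.timeit(lambda: method2(a, b), number=3)
print(f"method1: {t1:.4f}s  method2: {t2:.4f}s")
```

<p>Both functions should return the same keys in the same order; only the running time differs.</p>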