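<p>I have two dataframes of different sizes. Both have four columns: Words, x, y and z.</p>
<p>When combining the two dataframes, I want to keep df1's x, y, z values for the words that appear in both, and also keep the words that exist in df2 but not in df1.</p>
<p>I tried <code>pd.merge</code>, but that keeps both sets of values and only the words common to both. If I use <code>pd.concat</code>, I have to drop the duplicated entries, but the rows from the first dataframe must not be the ones dropped.</p>
<h2>Sample</h2>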
<pre class="lang-py prettyprint-override"><code>df1 = pd.DataFrame({'Words':
['aardvark', 'abalone', 'abandon'],
'x': [0.999, 0.888, 0.777],
'y': [0.999, 0.888, 0.777],
'z': [0.999, 0.888, 0.777]})
df2 = pd.DataFrame({'Words':
['aaaaahh', 'aardvark', 'abalone', 'abandon', 'zoo', 'zoom', 'zucchini'],
'x': [0.199, 0.111, 0.222, 0.333, 0.232, 0.842, 0.945],
'y': [0.929, 0.111, 0.222, 0.333, 0.112, 0.62, 0.265],
'z': [0.993, 0.111, 0.222, 0.333, 0.212, 0.344, 0.745]})
# Expected output
df_res = pd.DataFrame({'Words':
['aaaaahh', 'aardvark', 'abalone', 'abandon', 'zoo', 'zoom', 'zucchini'],
'x': [0.199, 0.999, 0.888, 0.777, 0.232, 0.842, 0.945],
'y': [0.929, 0.999, 0.888, 0.777, 0.112, 0.62, 0.265],
'z': [0.993, 0.999, 0.888, 0.777, 0.212, 0.344, 0.745]})
</code></pre>
<h2>What I tried</h2>
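<pre class="lang-py prettyprint-override"><code>import pandas as pd
# Merge: keeps only the words present in both frames, with both sets of values
df_res = pd.merge(df1, df2, on='Words', how='inner')
# Concat: keep=False drops every copy of a duplicated word, including df1's rows
df_concat = pd.concat(objs=[df1, df2], ignore_index=True)
df_concat = df_concat.drop_duplicates(subset=['Words'], keep=False, ignore_index=True)
# Compare: fails because df1 and df2 have different lengths
d_res = df1[(df1['Words'] != df2['Words'])]
# ValueError: Can only compare identically-labeled Series objects
</code></pre>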
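<p>One possible approach, offered as a minimal sketch rather than a definitive answer: concatenate with <code>df1</code> first and call <code>drop_duplicates</code> with <code>keep='first'</code>, so that df1's row wins whenever a word appears in both frames. The final alphabetical sort is an assumption, made only because the expected output happens to be in alphabetical order; drop it if row order does not matter.</p>

```python
import pandas as pd

df1 = pd.DataFrame({'Words': ['aardvark', 'abalone', 'abandon'],
                    'x': [0.999, 0.888, 0.777],
                    'y': [0.999, 0.888, 0.777],
                    'z': [0.999, 0.888, 0.777]})
df2 = pd.DataFrame({'Words': ['aaaaahh', 'aardvark', 'abalone', 'abandon',
                              'zoo', 'zoom', 'zucchini'],
                    'x': [0.199, 0.111, 0.222, 0.333, 0.232, 0.842, 0.945],
                    'y': [0.929, 0.111, 0.222, 0.333, 0.112, 0.62, 0.265],
                    'z': [0.993, 0.111, 0.222, 0.333, 0.212, 0.344, 0.745]})

df_res = (
    pd.concat([df1, df2], ignore_index=True)      # df1 rows come first
      .drop_duplicates(subset='Words', keep='first')  # keep df1's copy of shared words
      .sort_values('Words')                        # assumption: alphabetical order wanted
      .reset_index(drop=True)
)
print(df_res)
```

<p>Unlike <code>keep=False</code>, which discards every copy of a duplicated word, <code>keep='first'</code> retains the first occurrence, which is why the order of the frames passed to <code>pd.concat</code> matters here.</p>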