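<p>So far, @Rob Raymond's approach is the better one.</p>
<p>However, if both DataFrames have the same number of rows, you can get a similar result with a dictionary and for loops (although explicit loops are considered poor practice in pandas):</p>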
<pre><code>import numpy as np
import pandas as pd

df_have1 = pd.DataFrame({
    'age': [7, 34, 19],
    'gender': ['F', np.nan, 'M'],
    'profession': ['student', 'CEO', 'artist']})
df_have2 = pd.DataFrame({
    'age': [7, 34, 19],
    'gender': [np.nan, 'F', np.nan],
    'interests': ['acting', 'cars', 'gardening']})
df_need = pd.DataFrame({
    'age': [7, 34, 19],
    'gender': ['F', 'F', 'M'],
    'profession': ['student', 'CEO', 'artist'],
    'interests': ['acting', 'cars', 'gardening']})
# One inner dict per column, keyed by row label; duplicate column names collapse.
dct = {k: {} for k in (list(df_have1.columns) + list(df_have2.columns))}
for col in dct:
    if col in df_have1.columns:
        for row in df_have1.index:
            if col in df_have2.columns:  # column present in both frames
                if pd.notna(df_have1.loc[row, col]):
                    dct[col][row] = df_have1.loc[row, col]
                elif pd.notna(df_have2.loc[row, col]):
                    dct[col][row] = df_have2.loc[row, col]
                else:  # both entries are NaN
                    dct[col][row] = np.nan
            else:  # column only in df_have1
                dct[col][row] = df_have1.loc[row, col]
    else:  # column only in df_have2
        for row in df_have2.index:
            dct[col][row] = df_have2.loc[row, col]
df_get = pd.DataFrame(dct)
assert df_get.equals(df_need)  # both frames must be identical
</code></pre>
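<p>For comparison, here is a minimal vectorized sketch of the same fill logic using <code>DataFrame.combine_first</code>. It is not necessarily what the other answer does. It assumes both frames share the same index, and the names <code>column_order</code> and <code>df_combined</code> are only illustrative.</p>

```python
import numpy as np
import pandas as pd

df_have1 = pd.DataFrame({
    'age': [7, 34, 19],
    'gender': ['F', np.nan, 'M'],
    'profession': ['student', 'CEO', 'artist']})
df_have2 = pd.DataFrame({
    'age': [7, 34, 19],
    'gender': [np.nan, 'F', np.nan],
    'interests': ['acting', 'cars', 'gardening']})

# combine_first may return the columns in sorted order, so restore the
# intended order: df_have1's columns first, then those unique to df_have2.
column_order = list(df_have1.columns) + [
    c for c in df_have2.columns if c not in df_have1.columns]

# Keep df_have1's values and fill its missing entries from df_have2,
# aligned on index and column labels (assumption: both share one index).
df_combined = df_have1.combine_first(df_have2)[column_order]
print(df_combined)
```

<p>Depending on the pandas version, <code>combine_first</code> may change a column's dtype (for example, integers to floats), so compare values rather than relying on <code>equals</code> if that matters.</p>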