I'm new to pandas.
I have two data sources, A and B.
A and B each have a single column, with data like this:
A
Cj0KCQiAiZPvBRDZARIsAORkq7fOa9HW8u6iqLm1KvTjAhWTrYoLeL_baPPO5WoiLHsHeVYUmFFxXa0aAvxKEALw_wcB
EAIaIQobChMImLDtsuSY5gIVR3RgCh1ckQ1fEAAYASAAEgJ4nvD_BwE
Cj0KCQiAiZPvBRDZARIsAORkq7fOa9HW8u6iqLm1KvTjAhWTrYoLeL_baPPO5WoiLHsHeVYUmFFxXa0aAvxKEALw_wcB
Cj0KCQiAiZPvBRDZARIsAORkq7enWHEermCPb4NKdGwnh2HQwUPftxai7nufoVPOgDHE8CE9_s0hSAIaArPJEALw_wcB
Cj0KCQiAiZPvBRDZARIsAORkq7fQm2PgqtRHrXGkzcBPsZo-1Rwm4Ln6RuSBLumtNeElnoASiyC49HAaAoTWEALw_wcB
B
EAIaIQobChMI_tf0seSY5gIViKztCh1TbAAhEAAYASAAEgKcg_D_BwE
EAIaIQobChMImpyb_-OY5gIVET5gCh38Kw3bEAAYBCAAEgLmHfD_BwE
Cj0KCQiAiZPvBRDZARIsAORkq7fnlXGP7pfobqU5VFzlMPdPSjCKzSE6n43QSnkbQ264SVnX9kkSyHAaApudEALw_wcB
EAIaIQobChMIwvGQt-SY5gIVh6ztCh1c0gHQEAAYAyAAEgLqvPD_BwE
Cj0KCQiAiZPvBRDZARIsAORkq7ej_kXsK5XGwISOQTWUZoChlugerRH0Wcz4Wrpn1qJzlIkKxwqljCsaAhRNEALw_wcB
I concatenate the two frames into a single column, like this:
import numpy as np
import pandas as pd

joined = pd.concat([A, B])
That gives me one column containing the values from both sources.
Next I create a new DataFrame, storing joined in the first column and B in the second:
final_export = pd.DataFrame()
final_export['A'] = joined
final_export['B'] = B
The DataFrame looks like this:
final_export
A B
EAIaIQobChMI_tf0seSY5gIViKztCh1TbAAhEAAYASAAEgKcg_D_BwE EAIaIQobChMI_tf0seSY5gIViKztCh1TbAAhEAAYASAAEgKcg_D_BwE
EAIaIQobChMImpyb_-OY5gIVET5gCh38Kw3bEAAYBCAAEgLmHfD_BwE EAIaIQobChMI_tf0seSY5gIViKztCh1TbAAhEAAYASAAEgKcg_D_BwE
EAIaIQobChMIwvGQt-SY5gIVh6ztCh1c0gHQEAAYAyAAEgLqvPD_BwE
EAIaIQobChMI_tf0seSY5gIViKztCh1TbAAhEAAYASAAEgKcg_D_BwE
EAIaIQobChMImpyb_-OY5gIVET5gCh38Kw3bEAAYBCAAEgLmHfD_BwE
EAIaIQobChMIwvGQt-SY5gIVh6ztCh1c0gHQEAAYAyAAEgLqvPD_BwE
...
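Column A has more entries than column B.
Then I create another DataFrame with three columns: In both, Only in A, and Only in B. The logic is that I have one list containing all the values, and I need to check whether each value exists in both sources; values that exist in only one source go into the Only in A or Only in B column: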
df_export = pd.DataFrame({
    'In both': pd.Series(np.intersect1d(final_export['A'], final_export['B'])),
    'Only in A': pd.Series(np.setdiff1d(final_export['A'], final_export['B'])),
    'Only in B': pd.Series(np.setdiff1d(final_export['B'], final_export['A'])),
})
But I get an error:
TypeError: '<' not supported between instances of 'float' and 'str'
I tried using .fillna('') on column B, since it has fewer entries than column A, but I still get the same error.
Thanks for any suggestions.
This should be done in pure Python first, with the DataFrame created afterwards. So:
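The answer's original snippet is not preserved here, so what follows is only a minimal sketch of the pure-Python step, not necessarily the code the answerer posted. The TypeError most likely arises because the shorter column is padded with NaN (a float), and np.intersect1d and np.setdiff1d sort their inputs, which forces a comparison between float and str. Plain Python sets avoid sorting mixed types. The sample values and the helper name clean are hypothetical stand-ins for the real IDs in A and B.

```python
# Hypothetical sample values; in practice these could come from
# A.iloc[:, 0].tolist() and B.iloc[:, 0].tolist().
a_values = ["id1", "id2", "id1", "id3"]
b_values = ["id2", "id4", None, float("nan")]


def clean(values):
    """Return the distinct string values, dropping None/NaN placeholders."""
    return {v for v in values if isinstance(v, str)}


set_a = clean(a_values)
set_b = clean(b_values)

# Set algebra gives the three groups directly; sorted() is safe here
# because only strings remain after cleaning.
in_both = sorted(set_a & set_b)
only_in_a = sorted(set_a - set_b)
only_in_b = sorted(set_b - set_a)

print(in_both, only_in_a, only_in_b)
```

Note that sets discard duplicates, which matches the behavior of np.intersect1d and np.setdiff1d, since those also return unique values.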
Then create the DataFrame, as in:
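Again as a sketch rather than the answerer's own code: once the three groups exist as ordinary lists, wrapping each one in a pd.Series lets pandas build a DataFrame from columns of unequal length, padding the shorter ones with NaN. The list contents below are hypothetical placeholders for the results of the set operations.

```python
import pandas as pd

# Hypothetical results of the set operations on A and B.
in_both = ["id2"]
only_in_a = ["id1", "id3"]
only_in_b = ["id4"]

# A dict of Series is aligned on the index, so lists of different
# lengths are accepted and the gaps are filled with NaN.
df_export = pd.DataFrame({
    "In both": pd.Series(in_both, dtype="object"),
    "Only in A": pd.Series(only_in_a, dtype="object"),
    "Only in B": pd.Series(only_in_b, dtype="object"),
})

print(df_export)
```

Passing the bare lists in the dict instead of Series would raise a ValueError when their lengths differ, which is why each list is wrapped.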