How can I compare two columns in Pandas using only part of each code?

Posted 2024-09-26 04:44:28

In Python 3 and pandas I have two dataframes, "doacoes_cnpjs" and "te":

doacoes_cnpjs.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 22811 entries, 0 to 47353
Data columns (total 19 columns):
UF                                22811 non-null object
Partido_x                         22811 non-null object
Cargo_x                           22811 non-null object
Nome_candidato_x                  22811 non-null object
CPF_candidato                     22811 non-null int64
CPF_CNPJ_doador                   22811 non-null float64
Nome_doador                       22811 non-null object
Nome_doador_Receita               22811 non-null object
Valor                             22811 non-null float64
CPF_CNPJ_doador_originario        22811 non-null object
Nome_doador_originario            22811 non-null object
Nome_doador_originario_Receita    22811 non-null object
Estado                            22811 non-null object
Cargo_y                           22811 non-null object
Nome_candidato_y                  22811 non-null object
CPF                               22811 non-null int64
Nome_urna                         22811 non-null object
Partido_y                         22811 non-null object
Situacao                          22811 non-null object
dtypes: float64(2), int64(2), object(15)
memory usage: 3.5+ MB


te.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 5541 entries, 0 to 5664
Data columns (total 13 columns):
DATA_LS               4118 non-null object
DATA_INCLUS           2957 non-null object
Proprietario          5541 non-null object
Nome_propriedade      5541 non-null object
Municipio             5525 non-null object
Estado                5533 non-null object
CNPJ_CPF_CEI          5541 non-null object
CNPJ_CPF_CEI_limpo    5541 non-null float64
Trab_Envolv           4529 non-null float64
Ramo_atividade        2840 non-null object
Localizacao           2734 non-null object
Cod_ativ              2975 non-null object
Tipo_lista            5541 non-null object
dtypes: float64(2), object(11)
memory usage: 606.0+ KB

Each dataframe has a column holding the same kind of code: "CPF_CNPJ_doador" and "CNPJ_CPF_CEI_limpo". The codes are integers with 13 or 14 digits, for example "615895900136", "78141843000103", "46991295000106", "5351494000172"...

I want to create a new dataframe by comparing "doacoes_cnpjs" and "te" on the columns "CPF_CNPJ_doador" and "CNPJ_CPF_CEI_limpo". But it cannot be an ordinary merge.

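I only want to compare the first eight digits of each code. Example: from "615895900136" use only "61589590", and compare it with "78141843" taken from the code "78141843000103", and so on across all rows.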
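
One possible way to express this comparison, sketched on small made-up frames that reuse the column names above: derive an eight-character key from each code column, then merge on that key. The helper `first_eight`, the key column name `raiz`, and the sample rows are assumptions for illustration only, not part of the original data.

```python
import pandas as pd

# Toy stand-ins for the real frames; the rows are illustrative only.
doacoes_cnpjs = pd.DataFrame({
    "Nome_doador": ["A", "B", "C"],
    "CPF_CNPJ_doador": [615895900136.0, 78141843000103.0, 46991295000106.0],
})
te = pd.DataFrame({
    "Proprietario": ["X", "Y"],
    "CNPJ_CPF_CEI_limpo": [78141843000285.0, 5351494000172.0],
})

def first_eight(codes: pd.Series) -> pd.Series:
    """Return the first eight digits of each float-stored code as text."""
    # float64 -> int64 drops the fractional part, so the string form
    # contains only digits before it is sliced.
    return codes.astype("int64").astype(str).str[:8]

# Hypothetical key column holding the eight-digit prefix.
doacoes_cnpjs["raiz"] = first_eight(doacoes_cnpjs["CPF_CNPJ_doador"])
te["raiz"] = first_eight(te["CNPJ_CPF_CEI_limpo"])

# An inner merge on the prefix keeps only rows whose first eight digits agree,
# even when the full codes differ.
matches = doacoes_cnpjs.merge(te, on="raiz", how="inner")
print(matches[["Nome_doador", "Proprietario", "raiz"]])
```

Because the key is built from the number as stored, any leading zeros lost when the codes were read as floats are not restored here; that matches the example in the question, which takes "61589590" directly from "615895900136".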

Please, is there a way to do this? Or is it better to convert the codes to strings before extracting the first characters?
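
On the string-conversion question, a minimal sketch of why the route to text matters when the codes are stored as float64, as both columns are here. The sample values are assumptions: the first comes from the question, the second is a hypothetical short code included only to expose the edge case.

```python
import pandas as pd

# Float-stored codes, as in the two dataframes above.
# 1234567.0 is a hypothetical code shorter than eight digits.
codes = pd.Series([615895900136.0, 1234567.0])

# Direct float -> text keeps the float formatting (decimal point and
# fractional part), which can end up inside the slice for short codes.
direct = codes.astype(str).str[:8]

# Going through int64 first yields digit-only strings before slicing.
via_int = codes.astype("int64").astype(str).str[:8]

print(direct.tolist())
print(via_int.tolist())
```

For codes of 13 or 14 digits both routes give the same eight characters, so the integer step mainly guards against shorter values and keeps the key free of float formatting.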

