pd.merge does not work with converted dtypes

Posted 2024-09-29 19:34:02


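I am trying to merge two pandas dataframes (one is a spatial dataframe, sdf, the other a plain dataframe) on a common field, GEOID, using pd.merge. In the sdf the GEOID is a string; in the df it is an int. I used .astype('str') to convert the df GEOID field to a string as well. However, when I call pd.merge, the output is either empty or I get an error saying "You are trying to merge on object and int64 columns". I have confirmed with .dtypes that both columns are strings. Any idea why the merge isn't working?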

I have tried converting both to strings and both to ints, and I have also tried .join and pd.concat, but none of these gave the expected result.

import pandas as pd
#Read in CSV with updates (would already be df from Socrata pull in real version)
updated_csv= r"C:\Users\mad10412\Desktop\Active_Business_Data_Edited.csv"
updated_csv_df = pd.read_csv(updated_csv)
updated_csv_df.head(5)
updated_csv_df['GEOID10']=updated_csv_df['GEOID10'].astype(str)
updated_csv_df.dtypes
output_layer_name = 'Join_Features_Test5'
actbus=gis.content.search(output_layer_name)
ActiveBusinesses_item = actbus[0]
ActiveBusinesses_item
ActiveBusinesses_flayer = ActiveBusinesses_item.layers[0]
ActiveBusinesses_flayer
ActiveBusinesses_fset = ActiveBusinesses_flayer.query() #querying without any conditions returns all the features
ActiveBusinesses_fset.sdf.head()
ActiveBusinesses_fset.sdf.shape
ActiveBusinesses_fset.sdf.dtypes
##Attempt 1: Includes original data and Adds Column names but no data

overlap_rows = ActiveBusinesses_fset.sdf.join(updated_csv_df.set_index('GEOID10'),on='GEOID10', lsuffix='_left', rsuffix='_right')
overlap_rows.head(10)
overlap_rows.to_csv("C:\\Users\\mad10412\\Desktop\\ConcatDF.csv")

##Attempt 2: Only includes column name. no data at all
overlap_rows = pd.merge(left = ActiveBusinesses_fset.sdf, 
                        right = updated_csv_df, 
                        how='inner',
                        on = 'GEOID10')
overlap_rows.head(5)
overlap_rows.to_csv("C:\\Users\\mad10412\\Desktop\\ConcatDF2.csv")

##Attempt 3: Includes all columns and all data, but GEOIDs don't match
result = pd.concat([ActiveBusinesses_fset.sdf, updated_csv_df], axis=1, join='inner')
result.head(5)
result.to_csv("C:\\Users\\mad10412\\Desktop\\ConcatDF3.csv")


##Attempt 4:  Only includes column name. no data at all
left=ActiveBusinesses_fset.sdf
right=updated_csv_df
result = pd.merge(left, right, how='inner',on=['GEOID10', 'GEOID10'])
result.head(5)
result.to_csv("C:\\Users\\mad10412\\Desktop\\ConcatDF4.csv")
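
One possible explanation for the empty inner merges above, offered as an assumption rather than a confirmed diagnosis: if the CSV stores GEOID10 with a leading zero, pd.read_csv infers int64 and drops that zero, so a later .astype(str) produces an 11-character string that never equals the 12-character string in the sdf. The sketch below reproduces this with small made-up frames (sdf_like and csv_text are hypothetical stand-ins, not the real layer or file) and shows two ways to keep the keys aligned.

```python
import io
import pandas as pd

# Hypothetical stand-in for the feature layer's sdf: GEOID10 is stored as text.
sdf_like = pd.DataFrame({"GEOID10": ["060372932023", "060372941201"],
                         "Mining": [6, 4]})

# Hypothetical stand-in for the CSV file on disk.
csv_text = "GEOID10,Mining\n060372932023,8\n060372941201,3\n"

# read_csv infers int64 for GEOID10, which silently drops the leading zero.
csv_df = pd.read_csv(io.StringIO(csv_text))
csv_df["GEOID10"] = csv_df["GEOID10"].astype(str)  # "60372932023", 11 characters

# Both key columns now hold strings, yet no key matches, so the result is empty.
empty = pd.merge(sdf_like, csv_df, on="GEOID10", how="inner")

# Option A: read the key as text from the start, so nothing is lost.
csv_fixed = pd.read_csv(io.StringIO(csv_text), dtype={"GEOID10": str})

# Option B: restore the padding afterwards (assumes 12-character GEOIDs).
csv_df["GEOID10"] = csv_df["GEOID10"].str.zfill(12)

matched = pd.merge(sdf_like, csv_fixed, on="GEOID10", how="inner",
                   suffixes=("", "2"))
```

Comparing a few raw key values from each frame (for example with repr() on the first element) is a quick way to check whether this is what is happening in the real data.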

The data in the two dataframes looks like this:

df=pd.DataFrame({'GEOID': ['060372932023', '060372941201', '060372932022'],
               'Mining': [6, 4,2 ],
               'Agriculture': [10, 12, 4]})
df

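The only difference between the dataframes is that one of them has a shape column containing geometry. Essentially, I am trying to merge these dataframes to find instances where the values in fields such as Agriculture and Mining differ.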

df=pd.DataFrame({'GEOID': ['060372932023', '060372941201', '060372932022'],
               'Mining': [6, 4,2 ],
               'Agriculture': [10, 12, 4],
                'Mining2': [8, 3 , 1],
               'Agriculture2': [14, 0, 6]})
df

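This would produce one row per GEOID, containing the data from both dataframes. See the comments in the last code snippet for what the output actually looks like.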
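
As a minimal sketch of the desired result, assuming both frames already hold GEOID as matching strings: the left frame reuses the sample data above, while the right frame is hypothetical and borrows its numbers from the Mining2 and Agriculture2 columns. The suffixes argument yields the Mining2/Agriculture2 names, and a row-wise comparison then picks out the GEOIDs whose values differ.

```python
import pandas as pd

geoids = ["060372932023", "060372941201", "060372932022"]

# Sample data from the question (stands in for the sdf without its shape column).
left = pd.DataFrame({"GEOID": geoids,
                     "Mining": [6, 4, 2],
                     "Agriculture": [10, 12, 4]})

# Hypothetical updated data; values taken from the desired-output example.
right = pd.DataFrame({"GEOID": geoids,
                      "Mining": [8, 3, 1],
                      "Agriculture": [14, 0, 6]})

# One row per GEOID; overlapping columns from the right frame get the suffix "2".
# validate raises an error if a GEOID is duplicated on either side.
merged = left.merge(right, on="GEOID", how="inner",
                    suffixes=("", "2"), validate="one_to_one")

# Flag rows where any paired value differs between the two sources.
value_cols = ["Mining", "Agriculture"]
updated_cols = [c + "2" for c in value_cols]
differs = (merged[value_cols].to_numpy() != merged[updated_cols].to_numpy()).any(axis=1)
changed = merged[differs]
```

Passing indicator=True to merge with how='outer' is another option when the aim is also to see which GEOIDs exist in only one of the frames.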

