I just want to find the values that are missing between two DataFrames. This is the code I tried, and it works well:
import pandas as pd
df1 = pd.DataFrame([1, 2, 3, 4, 5, 6], columns=["my_column"])
df2 = pd.DataFrame([1, 2, 3], columns=["my_column"])
result = df1[~df1.set_index(list(df1)).index.isin(df2.set_index(list(df2)).index)].dropna()
print(result)
Output:
   my_column
3          4
4          5
5          6
So it works well on static DataFrames.
But when I use it with data loaded from SQL, I run into a problem. Here is my full code:
import pyodbc
import pandas as pd
import os
import sqlalchemy as db
from sqlalchemy import create_engine, MetaData, Table, Column, Integer, String, Date, Float
import datetime as dt
# connect db
engine = create_engine(r'mssql+pyodbc://xxxxxxxxxx\SMARTRNO_EXPRESS/myDB?driver=SQL+Server+Native+Client+11.0')
connection = engine.connect()
esn_datafeed_query = 'SELECT * FROM [myDB].[dbo].[esn_datafeed]'
esn_inter_intra_query = 'SELECT * FROM [esn_inter_intra_merge]'
esn_datafeed_df = pd.read_sql(esn_datafeed_query ,engine)
esn_inter_intra_merge_df = (esn_inter_intra_query, engine)
df1 = pd.DataFrame(esn_datafeed_df, columns=["st_umts_df_relation_key"])
df2 = pd.DataFrame(esn_inter_intra_merge_df, columns=["st_umts_esn_inter_intra_relation_key"])
result = df1[~df1.set_index(list(df1)).index.isin(df2.set_index(list(df2)).index)].dropna()
print(result)
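One detail in the listing above may explain the result: `esn_inter_intra_merge_df` is assigned the tuple `(esn_inter_intra_query, engine)`, and `pd.read_sql` is never called for it. Wrapping that tuple in `pd.DataFrame` yields a two-row frame holding the query string and the connection object, so none of the real keys can match. The sketch below reproduces the effect with an in-memory SQLite database standing in for SQL Server; the sample rows and the variable names `not_a_frame`, `wrapped` and `frame` are assumptions made for illustration.

```python
import sqlite3

import pandas as pd

# In-memory SQLite stands in for the SQL Server engine (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE esn_inter_intra_merge (st_umts_esn_inter_intra_relation_key TEXT)"
)
conn.executemany(
    "INSERT INTO esn_inter_intra_merge VALUES (?)", [("a",), ("b",), ("c",)]
)
conn.commit()

query = "SELECT * FROM esn_inter_intra_merge"
col = "st_umts_esn_inter_intra_relation_key"

# What the listing does: a plain 2-tuple, no query is executed.
not_a_frame = (query, conn)
wrapped = pd.DataFrame(not_a_frame, columns=[col])

# What was probably intended: run the query and get a real DataFrame.
frame = pd.read_sql(query, conn)

print(type(not_a_frame).__name__)  # the object is a tuple, not a DataFrame
print(len(wrapped), len(frame))    # rows built from the tuple vs. rows in the table
```

If that is the cause, calling `pd.read_sql(esn_inter_intra_query, engine)` on that line would give `isin` real keys to compare against.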
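So the SQL-based code prints every value, which is not what I need; I only want the missing values. I tried a different approach, with the following code: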
esn_datafeed_df = pd.read_sql('SELECT * FROM [myDB].[dbo].[esn_datafeed]', engine)
esn_inter_intra_merge_df = pd.read_sql('SELECT * FROM [myDB].[dbo].[esn_inter_intra_merge]', engine)
df1 = pd.DataFrame(esn_datafeed_df, columns=["st_umts_df_relation_key"])
df2 = pd.DataFrame(esn_inter_intra_merge_df, columns=["st_umts_esn_inter_intra_relation_key"])
merged = df1.merge(df2 , how="left", indicator=True)
result = merged.query("_merge == 'left_only'")[["st_umts_df_relation_key"]]
print(result)
But I got an error:
Traceback (most recent call last):
File "C:/Users/haroo501/PycharmProjects/tool_check_nbr/my_missing_result.py", line 18, in <module>
merged = df1.merge(df2 , how="left", indicator=True)
File "C:\Users\haroo501\PycharmProjects\tool_check_nbr\venv\lib\site-packages\pandas\core\frame.py", line 7336, in merge
return merge(
File "C:\Users\haroo501\PycharmProjects\tool_check_nbr\venv\lib\site-packages\pandas\core\reshape\merge.py", line 68, in merge
op = _MergeOperation(
File "C:\Users\haroo501\PycharmProjects\tool_check_nbr\venv\lib\site-packages\pandas\core\reshape\merge.py", line 619, in __init__
self._validate_specification()
File "C:\Users\haroo501\PycharmProjects\tool_check_nbr\venv\lib\site-packages\pandas\core\reshape\merge.py", line 1183, in _validate_specification
raise MergeError(
pandas.errors.MergeError: No common columns to perform merge on. Merge options: left_on=None, right_on=None, left_index=False, right_index=False
I also tried the following code:
df = df1.merge(df2, how = 'outer' ,indicator=True).loc[lambda x : x['_merge']=='left_only']
But I got this error:
Traceback (most recent call last):
File "C:/Users/haroo501/PycharmProjects/tool_check_nbr/my_missing_result.py", line 23, in <module>
df = df1.merge(df2, how = 'outer' ,indicator=True).loc[lambda x : x['_merge']=='left_only']
File "C:\Users\haroo501\PycharmProjects\tool_check_nbr\venv\lib\site-packages\pandas\core\frame.py", line 7336, in merge
return merge(
File "C:\Users\haroo501\PycharmProjects\tool_check_nbr\venv\lib\site-packages\pandas\core\reshape\merge.py", line 68, in merge
op = _MergeOperation(
File "C:\Users\haroo501\PycharmProjects\tool_check_nbr\venv\lib\site-packages\pandas\core\reshape\merge.py", line 619, in __init__
self._validate_specification()
File "C:\Users\haroo501\PycharmProjects\tool_check_nbr\venv\lib\site-packages\pandas\core\reshape\merge.py", line 1183, in _validate_specification
raise MergeError(
pandas.errors.MergeError: No common columns to perform merge on. Merge options: left_on=None, right_on=None, left_index=False, right_index=False
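For reference, `DataFrame.merge` joins on the column names the two frames share when no keys are given, and here the two frames have no column name in common, which is what the `MergeError` reports. A minimal sketch of naming the keys explicitly with `left_on` and `right_on` follows; the sample values are made up for illustration, and only the two column names come from the tables above.

```python
import pandas as pd

# Toy stand-ins for the two SQL tables (values are assumptions).
df1 = pd.DataFrame({"st_umts_df_relation_key": ["a", "b", "c", "d"]})
df2 = pd.DataFrame({"st_umts_esn_inter_intra_relation_key": ["a", "b"]})

# Tell merge which column to use on each side, since the names differ.
merged = df1.merge(
    df2,
    how="left",
    left_on="st_umts_df_relation_key",
    right_on="st_umts_esn_inter_intra_relation_key",
    indicator=True,
)

# Rows marked 'left_only' have a key in df1 with no match in df2.
missing = merged.loc[merged["_merge"] == "left_only", ["st_umts_df_relation_key"]]
print(missing)
```

If the right-hand key column contains duplicates, a left merge repeats the matching left rows, so calling `drop_duplicates()` on `df2` first may be worthwhile.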
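To explain my database briefly, I have two tables.
This is the second table, esn_inter_intra_merge.
Now I want to compute the difference between the two tables: I need the values that are in esn_datafeed.st_umts_df_relation_key but not in esn_inter_intra_merge.st_umts_esn_inter_intra_relation_key.
Does anyone know how to solve this? Maybe it happens because the database holds a large amount of data.
Is there a way to handle this in the query itself?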
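I think the problem is that your new DataFrames use different names for their columns. However, it sounds like you should be using sets anyway. Here is how to get the symmetric difference between the values of the two columns: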
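A minimal sketch of the set approach, assuming small toy frames in place of the SQL results (the values and the names `left_keys`, `right_keys`, `either_side_only` and `only_in_df1` are illustrative):

```python
import pandas as pd

# Toy stand-ins for the two SQL tables (values are assumptions).
df1 = pd.DataFrame({"st_umts_df_relation_key": ["a", "b", "c", "d"]})
df2 = pd.DataFrame({"st_umts_esn_inter_intra_relation_key": ["a", "b", "e"]})

# Sets compare values only, so the differing column names do not matter.
left_keys = set(df1["st_umts_df_relation_key"].dropna())
right_keys = set(df2["st_umts_esn_inter_intra_relation_key"].dropna())

# Symmetric difference: values present in exactly one of the two columns.
either_side_only = left_keys ^ right_keys

# One-sided difference: values in df1 that df2 lacks.
only_in_df1 = left_keys - right_keys

print(sorted(either_side_only))
print(sorted(only_in_df1))
```

The symmetric difference covers both directions; if only the keys missing from the second table matter, the one-sided difference `left_keys - right_keys` is the narrower fit.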
Then you can check whether the DataFrame values are among those missing values:
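As a sketch of that check, `Series.isin` accepts a set, so the rows of the first frame can be filtered directly. The frames below are toy data, and `missing_values` is an illustrative name:

```python
import pandas as pd

# Toy stand-ins for the two SQL tables (values are assumptions).
df1 = pd.DataFrame({"st_umts_df_relation_key": ["a", "b", "c", "d"]})
df2 = pd.DataFrame({"st_umts_esn_inter_intra_relation_key": ["a", "b"]})

# Keys that appear in df1 but not in df2.
missing_values = set(df1["st_umts_df_relation_key"]) - set(
    df2["st_umts_esn_inter_intra_relation_key"]
)

# Keep only the df1 rows whose key is one of the missing values.
result = df1[df1["st_umts_df_relation_key"].isin(missing_values)]
print(result)
```

Filtering this way keeps the original row index of `df1`, which helps when the rows need to be traced back to the source table.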
Edit
On further thought, isn't this really a SQL question that has nothing to do with pandas?
Would this work? This SQL query selects all records from t1 (esn_datafeed) for which t2 (esn_inter_intra_merge) has no matching st_umts_df_relation_key value in its st_umts_esn_inter_intra_relation_key column.
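One way such a query could be written is with `NOT EXISTS`. The sketch below runs it against an in-memory SQLite database so that it is self-contained; the sample rows are assumptions, and on SQL Server the same statement would presumably be passed to `pd.read_sql` with the existing engine instead.

```python
import sqlite3

# In-memory SQLite stands in for the SQL Server database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE esn_datafeed (st_umts_df_relation_key TEXT)")
conn.execute(
    "CREATE TABLE esn_inter_intra_merge (st_umts_esn_inter_intra_relation_key TEXT)"
)
conn.executemany(
    "INSERT INTO esn_datafeed VALUES (?)", [("a",), ("b",), ("c",), ("d",)]
)
conn.executemany("INSERT INTO esn_inter_intra_merge VALUES (?)", [("a",), ("b",)])
conn.commit()

# Anti-join: keep t1 rows for which no t2 row carries the same key.
anti_join_query = """
SELECT t1.st_umts_df_relation_key
FROM esn_datafeed AS t1
WHERE NOT EXISTS (
    SELECT 1
    FROM esn_inter_intra_merge AS t2
    WHERE t2.st_umts_esn_inter_intra_relation_key = t1.st_umts_df_relation_key
)
ORDER BY t1.st_umts_df_relation_key
"""

rows = conn.execute(anti_join_query).fetchall()
print(rows)
```

Doing the comparison in the database means only the missing keys are transferred, which may matter if the tables are as large as the question suggests.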