我有一个试图推入数据库的数据帧,但收到一条错误消息,表明违反了主键约束:
ecom.to_sql('ecom',
con = engine,
schema = 'ga_shop',
index = False,
if_exists = 'append')
sqlalchemy.exc.IntegrityError: (psycopg2.errors.UniqueViolation) duplicate key value violates unique constraint "ecom_pk"
DETAIL: Key (product_name, dimension3, dimension1)=(Apples, 2019-10-29 19:12:58.83+00, 1572376787423.pgp5jcd7) already exists.
我想过滤dataframe ecom以查看这些重复行,但我无法:
ecom[(ecom['dimension3'] == '2019-10-29 19:12:58.83+00')]
返回一个空数据帧
ecom.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 82 entries, 14014 to 15002
Data columns (total 15 columns):
dimension1 82 non-null object
dimension3 82 non-null object
product_name 82 non-null object
ecom.head()
dimension1 dimension3 ... item_revenue sampling
0 1572337407387.0epe8tn 2019-10-29T04:20:08.119-04:00 ... 0.0 False
1 1572337407387.0epe8tn 2019-10-29T04:20:08.119-04:00 ... 0.0 False
2 1572337407387.0epe8tn 2019-10-29T04:20:08.119-04:00 ... 0.0 False
3 1572337407387.0epe8tn 2019-10-29T04:20:08.119-04:00 ... 0.0 False
4 1572337407387.0epe8tn 2019-10-29T04:20:08.119-04:00 ... 0.0 False
字段dimension3是我从API检索到的iso时间戳,但我不清楚pandas是如何存储它的,因为当我检查ecom.info()
时,它显示“82非空对象”
我如何筛选维度3(时间戳)以匹配错误消息的维度,以便我可以看到这些重复项
不幸的是,在使用
utc=True
进行筛选之前,您必须转换为日期时间,因为存在不同的时区:另一种解决方案应为仅转换字符串:
相关问题 更多 >
编程相关推荐