Filter dataframe,其中字段假定为时间戳,但info()显示非空obj

2024-09-29 06:28:10 发布

您现在位置:Python中文网/ 问答频道 /正文

我有一个试图推入数据库的数据帧,但收到一条错误消息,表明违反了主键约束:

ecom.to_sql('ecom',
            con = engine,
            schema = 'ga_shop',
            index = False,
            if_exists = 'append')
sqlalchemy.exc.IntegrityError: (psycopg2.errors.UniqueViolation) duplicate key value violates unique constraint "ecom_pk"
DETAIL:  Key (product_name, dimension3, dimension1)=(Apples, 2019-10-29 19:12:58.83+00, 1572376787423.pgp5jcd7) already exists.

我想过滤dataframe ecom以查看这些重复行,但我无法:

ecom[(ecom['dimension3'] == '2019-10-29 19:12:58.83+00')]

返回一个空数据帧

ecom.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 82 entries, 14014 to 15002
Data columns (total 15 columns):
dimension1                   82 non-null object
dimension3                   82 non-null object
product_name                 82 non-null object

ecom.head()
              dimension1                     dimension3  ... item_revenue sampling
0  1572337407387.0epe8tn  2019-10-29T04:20:08.119-04:00  ...          0.0    False
1  1572337407387.0epe8tn  2019-10-29T04:20:08.119-04:00  ...          0.0    False
2  1572337407387.0epe8tn  2019-10-29T04:20:08.119-04:00  ...          0.0    False
3  1572337407387.0epe8tn  2019-10-29T04:20:08.119-04:00  ...          0.0    False
4  1572337407387.0epe8tn  2019-10-29T04:20:08.119-04:00  ...          0.0    False

字段dimension3是我从API检索到的iso时间戳,但我不清楚pandas是如何存储它的,因为当我检查ecom.info()时,它显示“82非空对象”

我如何筛选维度3(时间戳)以匹配错误消息的维度,以便我可以看到这些重复项


Tags: to数据namefalse消息object错误exists
1条回答
网友
1楼 · 发布于 2024-09-29 06:28:10

不幸的是,在使用utc=True进行筛选之前,您必须转换为日期时间,因为存在不同的时区:

date = pd.to_datetime('2019-10-29 19:12:58.83+00', utc=True)
df = ecom[pd.to_datetime(ecom['dimension3'], utc=True) == date]

另一种解决方案应为仅转换字符串:

print (ecom)
              dimension1                        dimension3  item_revenue  \
0  1572337407387.0epe8tn  2019-10-29 04:20:08.119000-04:00           0.0   
1  1572337407387.0epe8tn  2019-10-29 04:20:08.119000-04:00           0.0   
2  1572337407387.0epe8tn  2019-10-29 04:20:08.119000-04:00           0.0   
3  1572337407387.0epe8tn  2019-10-29 04:20:08.119000-04:00           0.0   
4  1572337407387.0epe8tn  2019-10-29 19:12:58.830000+00:00           0.0   

   sampling  
0     False  
1     False  
2     False  
3     False  
4     False <-changed date

print (type(ecom['dimension3'].iat[0]))
<class 'pandas._libs.tslibs.timestamps.Timestamp'>

df = ecom[ecom['dimension3'] == pd.to_datetime('2019-10-29 19:12:58.83+00')]
print (df)
              dimension1                        dimension3  item_revenue  \
4  1572337407387.0epe8tn  2019-10-29 19:12:58.830000+00:00           0.0   

   sampling  
4     False  

相关问题 更多 >