在将数据帧保存到csv文件时删除一些数据

2024-10-03 17:15:24 发布

您现在位置:Python中文网/ 问答频道 /正文

我正在运行红移查询,它有4000万条记录。但当我保存到csv文件时,它只显示了7000条记录。你能帮我解决这个问题吗

例如:

代码:

    conn = gcso_conn1()
    with conn.cursor() as cur:
        
        query = "select * from (select a.src_nm Source_System ,b.day_id Date,b.qty Market_Volume,b.cntng_unt Volume_Units,b.sls_in_lcl_crncy Market_Value,b.crncy_cd Value_Currency,a.panel Sales_Channel,a.cmpny Competitor_Name,a.lcl_mnfcr Local_Manufacturer ,a.src_systm_id SKU_PackID_ProductNumber,upper(a.mol_list) Molecule_Name,a.brnd_nm BrandName_Intl,a.lcl_prod_nm BrandName_Local,d.atc3_desc Brand_Indication,a.prsd_strngth_1_nbr Strength,a.prsd_strngth_1_unt Strength_Units,a.pck_desc Pack_Size_Number,a.prod_nm Product_Description,c.iso3_cntry_cd Country_ISO_Code,c.cntry_nm Country_Name from gcso_prd_cpy.dim_prod a join gcso_prd_cpy.fct_sales b on (a.SRC_NM='IMS' and b.SRC_NM='IMS' and a.prod_id = b.prod_id) join gcso_prd_cpy.dim_cntry c on (a.cntry_id = c.cntry_id) left outer join gcso_prd_cpy.dim_thrc_area d on (a.prod_id = d.prod_id) WHERE a.SRC_NM='IMS' and c.iso3_cntry_cd in ('JPN','IND','CAN','USA') and upper(a.mol_list) in ('AMBRISENTAN', 'BERAPROST','BOSENTAN') ORDER BY b.day_id ) a"
        #print(query)
        cur.execute(query)
        result = cur.fetchall()
        conn.commit()
        column = [i[0] for i in cur.description]
        sqldf = pd.DataFrame(result, columns= column)
        print(sqldf.count())
        #print(df3)
        
        sqldf.to_csv(Output_Path, index= False, sep= '\001', encoding = 'utf-8')

Tags: andnameinidcdprodconnquery
1条回答
网友
1楼 · 发布于 2024-10-03 17:15:24

一切都应该正常工作。我认为主要问题是使用count()进行调试。您期望记录的数量,但文档显示:

Count non-NA cells for each column or row.

在调试DataFrame时最好使用:

print(len(df))
print(df.shape)
print(df.info())

此外,您还可以使用read_sql更轻松地执行此操作:

import pandas as pd
from sqlalchemy import create_engine


header = True
for chunk in pd.read_sql(
    'your query here - SELECT * FROM... ',
    con=create_engine('creds', echo=True),  # set creds - postgres+psycopg2://user:password@host:5432/db_name
    chunksize=1000,  # read by chunks
):
    file_path = '/tmp/path_to_your.csv'
    chunk.to_csv(
        file_path,
        header=header,
        mode='a',
        index=False,
    )
    header = False

相关问题 更多 >