使用列表理解为两个不同的条件创建元组列表

2024-10-01 19:17:37 发布

您现在位置:Python中文网/ 问答频道 /正文

是否有一种方法可以使用列表理解创建具有两种不同条件的元组列表

我正在通过一个数组进行交互,如果它符合任一条件,我想返回元组中的整行。第一个是DF在任何列中是否有nan值。 另一个是如果DF中名为ODFS_FILE_CREATE_DATETIME的列与日期列的正则表达式模式不匹配。日期列应该有如下输出:2005242132。10个数字。因此,如果df返回类似于2004dg的内容,则应将其作为错误提取,并将该行添加到我的元组列表中

我可悲的尝试:

[tuple(x) for x in odfscsv_df[odfscsv_df.isna().any(1)].values or x in odfscdate_re.search(str(odfscsv_df['ODFS_FILE_CREATE_DATETIME'])) ]

包含两个独立元组列表的完整函数:

def process_csv_formatting(csv):
    odfscsv_df = pd.read_csv(csv, header=None,names=['ODFS_LOG_FILENAME', 'ODFS_FILE_CREATE_DATETIME', 'LOT', 'TESTER', 'WAFER_SCRIBE'])
    odfscsv_df['CSV_FILENAME'] = csv.name
    odfscdate_re = re.compile(r"\d{10}")
    #print(odfscsv_df)
    #odfscsv_df = odfscsv_df.replace('', np.nan)
    errortup = [(odfsname, "Bad_ODFS_FILE_CREATE_DATETIME= " + str(cdatetime), csv.name) for odfsname,cdatetime in zip(odfscsv_df['ODFS_LOG_FILENAME'], odfscsv_df['ODFS_FILE_CREATE_DATETIME']) if not odfscdate_re.search(str(cdatetime))]
    emptypdf = pd.DataFrame(columns=['ODFS_LOG_FILENAME', 'ODFS_FILE_CREATE_DATETIME', 'LOT', 'TESTER', 'WAFER_SCRIBE'])
 
    print([tuple(x) for x in odfscsv_df[odfscsv_df.isna().any(1)].values])

    [tuple(x) for x in odfscsv_df[odfscsv_df.isna().any(1)].values or x in odfscdate_re.search(str(odfscsv_df['ODFS_FILE_CREATE_DATETIME'])) ]
    #print(odfscsv_df[(odfscsv_df[column_name].notnull()) & (odfscsv_df[column_name] != u'')].index)
    for index, row in odfscsv_df.iterrows():
        #print((row['WAFER_SCRIBE']))
        print((row['ODFS_FILE_CREATE_DATETIME']))
    #errortup = [x for x in odfscsv_df['ODFS_FILE_CREATE_DATETIME']]
    if len(errortup) != 0:
        #print(errortup)  #put this in log file statement somehow
        #print(errortup[0][2])
        return emptypdf
    else:

        return odfscsv_df

示例CSV数据。逗号使单元格无效:

2005091432_943SK1J.00J.SK1J-23.FPD.FMGN520.Jx6D36ny5EO53qAtX4.log,,W943SK10,MGN520,0Z0RK072TCD2
2005230137_014SF1J.00J.SF1J-23.WCPC.FMGN520.XlwHcgyP5eFCpZm5cf.log,,W014SF10,MGN520,DM4MU129SEC1
2005240909_001914J.E0J.914J-15.WRO3PC.FMGN520.nZKn7OvjGKw1i4pxiu.log,,K001914E,MGN520,DM3FZ226SEE3
2005242132_001914J.E0J.914J-15.WRO4PC.FMGN520.V8dcLhEgygRj2rP2Df.log,2005242132,K001914E,MGN520,DM3FZ226SEE3
2005251037_001914J.E0J.914J-15.WRO4PC.FMGN520.dyixmQ5r4SvbDFkivY.log,2005251037,K001914E,MGN520,DM3FZ226SEE3
2005251215_949949J.E0J.949J-21.WRO2PP.FMGN520.yp1i4e7a7D1ighkdB7.log,2005251215,K949949E,MGN520,DG2KV122SEF6
2005251231_949949J.E0J.949J-25.WRO2PP.FMGN520.oLQGhc2whAlhC3dSuR.log,2005251231,K949949E,MGN520,DG2KV333SEF3
2005260105_001914J.E0J.914J-15.WRO4PC.FMGN520.wOQMUOfZgkQK9iHJS5.log,2005260105,K001914E,MGN520,DM3FZ226SEE3
2006111130_950909J.00J.909J-22.FPC.FMGN520.UuqeGtw9xP6lLDUW9N.log,2006111130,K9509090,MGN520,DG7LW031SEE7
2006111612_950909J.00J.909J-22.FPC.FMGN520.hoDl3QSNPKhcs4oA2N.log,2006111612,K9509090,MGN520,DG7LW031SEE7
2006120638_006914J.E0J.914J-15.CZPC.FMGN520.qCgFUH2H21ieT641i9.log,2006120638,K006914E,MGN520,DM8KJ568SEC3
2006122226_006914J.E0J.914J-15.CZPC.FMGN520.nSHSp7klxjrQlVTcCu.log,2006122226,K006914E,MGN520,DM8KJ568SEC3
2006130919_006914J.E0J.914J-15.CZPC.FMGN520.Zd6DrMUsCjuEVBFwvn.log,2006130919,K006914E,MGN520,DM8KJ568SEC3
2006140457_007911J.E0J.911J-25.RDR2PC.FMGN520.QPX9r59TnXObXyfibv.log,2006140457,K007911E,MGN520,DN4AU351SED1
2006141722_007911J.E0J.911J-25.WCPC.FMGN520.dNQLkvQlPTplEjJspB.log,2006141722,K007911E,MGN520,DN4AU351SED1
2006160332_007911J.E0J.911J-25.WCPC.FMGN520.DQiH82Ze9fCoaLVbDE.log,2006160332,K007911E,MGN520,DN4AU351SED1
2006170539_007911J.E0J.911J-25.WCPC.FMGN520.TjakhXkmhmlGhfLheo.log,2006170539,K007911E,MGN520,DN4AU351SED1

Tags: csvinrelogdffordatetimecreate
1条回答
网友
1楼 · 发布于 2024-10-01 19:17:37

添加dtype参数,以便在调用read_csv时将'ODFS_FILE_CREATE_DATETIME'作为数据类型字符串导入

odfscsv_df = pd.read_csv(csv, header=None,
                              names=['ODFS_LOG_FILENAME', 'ODFS_FILE_CREATE_DATETIME', 'LOT', 'TESTER', 'WAFER_SCRIBE'],
                              dtype={'ODFS_FILE_CREATE_DATETIME': str})

m1 = odfscsv_df.isna().any(1)
s = odfscsv_df['ODFS_FILE_CREATE_DATETIME']
m2 = ~s.astype(str).str.isnumeric()
m3 = s.astype(str).str.len().ne(10)

[tuple(x) for x in odfscsv_df[m1 | m2 | m3].values]

相关问题 更多 >

    热门问题