Adding a new column from a regex gives NaN, but a regex tester shows the regex is valid

Published 2024-09-30 20:19:26


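I have a CSV of error messages from failed regression tests. I am importing it into a pandas DataFrame, and I want to find certain substrings related to exceptions.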

I populate the DataFrame with the contents of the .csv like this:

import pandas as pd

df = pd.read_csv('ErrorMessage3.csv', header=None, sep=',',
                 names=['ErrorMessage'])

I have the following regex and a corresponding test string (the first entry in the DataFrame's error message column), and it returns exactly what I want:

import re

# Adjacent string literals are concatenated, so the message stays on one logical line
teststring = (
    "Step 13 - Iteration 1 Failed: Action: <Update Latest CC Exp "
    "Date Record from Epay Account {DBServer;UserName;Password='', "
    "DatabaseName='',Year Offset='-10'}> ---> "
    "System.Data.SqlTypes.SqlNullValueException: Data is Null. This method or "
    "property cannotbecalled "
    "on Null values. ---> System.Data.SqlTypes.SqlNullValueException2: Data is Null."
)

re.findall(r"---> ([^:]+): ", teststring)

This produces the following output:

['System.Data.SqlTypes.SqlNullValueException',
 'System.Data.SqlTypes.SqlNullValueException2']

But I want to add this to my DataFrame as an "Exceptions" column. I thought this would work:

df['Exceptions'] = df['ErrorMessage'].str.extract(r"---> ([^:]+): ")

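But when I run it, the "Exceptions" column is added and every row is NaN. I verified that ErrorMessage has dtype object, and I used an online regex tester to confirm that at least a subset of my ErrorMessage entries do contain exceptions matching my regex. I have read a few similar Stack Overflow questions, but without much luck.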

Why does applying the regex to the DataFrame produce NaN, while applying it to a single string returns the result I want?
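One way to narrow this down is to compare what `str.extract` does on rows known to contain the pattern with what is actually stored in the imported column. The sketch below uses hypothetical sample rows in place of the CSV contents (the messages and the `sample` frame are assumptions, not the asker's data). It shows that `str.extract` returns the first match for a row that contains the pattern and NaN only for rows that do not, so an all-NaN column suggests the imported strings differ from the tested string.

```python
import pandas as pd

pattern = r"---> ([^:]+): "

# Hypothetical sample rows standing in for the CSV contents
sample = pd.DataFrame({
    "ErrorMessage": [
        "Action failed ---> System.Data.SqlTypes.SqlNullValueException: Data is Null. "
        "---> System.InvalidOperationException: Wrapped.",
        "Step 2 passed",
    ]
})

# expand=False returns a Series with the first match per row (NaN if no match)
first_match = sample["ErrorMessage"].str.extract(pattern, expand=False)

# Literal check for the arrow; running this on the real column shows
# whether the imported text contains the marker at all
has_arrow = sample["ErrorMessage"].str.contains("--->", regex=False)

print(first_match.tolist())
print(has_arrow.tolist())
# repr() exposes stray whitespace or truncation in the stored string
print(repr(sample["ErrorMessage"].iloc[0][:60]))
```

Running the same `str.contains` and `repr` checks against the real `df['ErrorMessage']` should reveal whether the rows hold the full message or only a fragment of it.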


2 answers

As @Trenton_M pointed out, extractall returns a new MultiIndex DataFrame, so one solution is to group by the original row index and then join all of the matched strings.

Here is a simple demo:

import pandas as pd
df = pd.DataFrame([""""Step 13 - Iteration 1 Failed: Action: <Update Latest CC Exp 
Date Record from Epay Account {DBServer;UserName;Password='', 
DatabaseName='',Year Offset='-10'}>  -> 1System.Data.SqlTypes.SqlNullValueException: Data is Null. This method or 
property cannotbecalled 
on Null values.  -> 2System.Data.SqlTypes.SqlNullValueException2: Data is Null."""] * 2, columns=['ErrorMessage'])

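# extractall returns one row per match, indexed by (original row, match number)
mulIndexDataFrame = df['ErrorMessage'].str.extractall(r" -> ([^:]+): ")
# Group on the original row index (level 0) and join that row's matches with commas
df['test'] = mulIndexDataFrame.groupby(level=0)[0].apply(','.join)
print(df)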

Output:

                                        ErrorMessage  \
0  "Step 13 - Iteration 1 Failed: Action: <Update...   
1  "Step 13 - Iteration 1 Failed: Action: <Update...   

                                                test  
0  1System.Data.SqlTypes.SqlNullValueException,2S...  
1  1System.Data.SqlTypes.SqlNullValueException,2S...  
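As an alternative to the groupby step, the same comma-joined column can be sketched with `Series.str.findall` followed by `Series.str.join`, which avoids the MultiIndex altogether. The messages below are hypothetical, shortened stand-ins for the demo strings, and the `Exceptions` column name is an assumption.

```python
import pandas as pd

# Hypothetical, shortened messages in the same shape as the demo data
df = pd.DataFrame({
    "ErrorMessage": [
        "Failed  -> ExceptionA: Data is Null.  -> ExceptionB: Data is Null.",
        "No exception here",
    ]
})

# str.findall returns one list of captured groups per row (empty list if no match)
matches = df["ErrorMessage"].str.findall(r" -> ([^:]+): ")

# str.join concatenates each row's list into a single comma-separated string
df["Exceptions"] = matches.str.join(",")

print(df["Exceptions"].tolist())
```

One difference from the groupby approach: a row with no match ends up as an empty string here rather than NaN.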
teststring1 = """Step 13 - Iteration 1 Failed: Action: <Update Latest CC Exp Date Record from Epay Account 
                {DBServer;UserName;Password='', DatabaseName='',Year Offset='-10'}>  -> System.Data.SqlTypes.SqlNullValueException1: 
                Data is Null. This method or property cannotbecalled on Null values.  -> System.Data.SqlTypes.SqlNullValueException2: Data is Null. 
                 -> System.Data.SqlTypes.SqlNullValueException21:   -> System.Data.SqlTypes.SqlNullValueException22:   -> System.Data.SqlTypes.SqlNullValueException23: 
                 -> System.Data.SqlTypes.SqlNullValueException24: """
teststring2 = """Step 13 - Iteration 1 Failed: Action: <Update Latest CC Exp Date Record from Epay Account 
                {DBServer;UserName;Password='', DatabaseName='',Year Offset='-10'}>  -> System.Data.SqlTypes.SqlNullValueException3: 
                Data is Null. This method or property cannotbecalled on Null values.  -> System.Data.SqlTypes.SqlNullValueException4: Data is Null."""
teststring3 = """Step 13 - Iteration 1 Failed: Action: <Update Latest CC Exp Date Record from Epay Account 
                {DBServer;UserName;Password='', DatabaseName='',Year Offset='-10'}>  -> System.Data.SqlTypes.SqlNullValueException5: 
                Data is Null. This method or property cannotbecalled on Null values.  -> System.Data.SqlTypes.SqlNullValueException6: Data is Null."""
teststring4 = """Step 13 - Iteration 1 Failed: Action: <Update Latest CC Exp Date Record from Epay Account 
                {DBServer;UserName;Password='', DatabaseName='',Year Offset='-10'}>  -> System.Data.SqlTypes.SqlNullValueException7: 
                Data is Null. This method or property cannotbecalled on Null values.  -> System.Data.SqlTypes.SqlNullValueException8: Data is Null."""
teststring5 = """Step 13 - Iteration 1 Failed: Action: <Update Latest CC Exp Date Record from Epay Account 
                {DBServer;UserName;Password='', DatabaseName='',Year Offset='-10'}>  -> System.Data.SqlTypes.SqlNullValueException9: 
                Data is Null. This method or property cannotbecalled on Null values.  -> System.Data.SqlTypes.SqlNullValueException10: Data is Null."""
teststring6 = """Step 13 - Iteration 1 Failed: Action: <Update Latest CC Exp Date Record from Epay Account 
                {DBServer;UserName;Password='', DatabaseName='',Year Offset='-10'}>  -> System.Data.SqlTypes.SqlNullValueException11: 
                Data is Null. This method or property cannotbecalled on Null values.  -> System.Data.SqlTypes.SqlNullValueException12: Data is Null."""


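import pandas as pd

# One row per error message
values = [[teststring1], [teststring2], [teststring3], [teststring4], [teststring5], [teststring6]]
header = ['ErrorMessage']

df = pd.DataFrame(values, columns=header)

# extractall returns every match, one row per (original index, match number)
exceptions = df['ErrorMessage'].str.extractall(r" -> ([^:]+): ")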

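extractall returns a new MultiIndex DataFrame: the first index level matches the original DataFrame's index, and the second level ("match") is the match number within that row. Because the two frames have different indexes, the result cannot be assigned directly as a column of the original DataFrame.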

                                                  0
   match    
0   0   System.Data.SqlTypes.SqlNullValueException1
    1   System.Data.SqlTypes.SqlNullValueException2
    2   System.Data.SqlTypes.SqlNullValueException21
    3   System.Data.SqlTypes.SqlNullValueException22
    4   System.Data.SqlTypes.SqlNullValueException23
    5   System.Data.SqlTypes.SqlNullValueException24
1   0   System.Data.SqlTypes.SqlNullValueException3
    1   System.Data.SqlTypes.SqlNullValueException4
2   0   System.Data.SqlTypes.SqlNullValueException5
    1   System.Data.SqlTypes.SqlNullValueException6
3   0   System.Data.SqlTypes.SqlNullValueException7
    1   System.Data.SqlTypes.SqlNullValueException8
4   0   System.Data.SqlTypes.SqlNullValueException9
    1   System.Data.SqlTypes.SqlNullValueException10
5   0   System.Data.SqlTypes.SqlNullValueException11
    1   System.Data.SqlTypes.SqlNullValueException12
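To bring a result of this shape back onto the original rows, one option is to collapse or pivot the `match` level so that the index lines up again. The following is a minimal sketch on hypothetical, shortened messages; the `Exceptions` column and the `Exception_` prefix are illustrative names, not part of the original answer.

```python
import pandas as pd

# Hypothetical, shortened messages; the second row has only one match
df = pd.DataFrame({
    "ErrorMessage": [
        "Failed  -> ExceptionA: Data is Null.  -> ExceptionB: Data is Null.",
        "Failed  -> ExceptionC: Data is Null.",
    ]
})

exceptions = df["ErrorMessage"].str.extractall(r" -> ([^:]+): ")

# Option 1: collapse the "match" level into one list per original row;
# the result is indexed like df, so it aligns on assignment
df["Exceptions"] = exceptions[0].groupby(level=0).agg(list)

# Option 2: pivot the "match" level into columns, one column per match number
wide = exceptions[0].unstack("match").add_prefix("Exception_")
result = df.join(wide)

print(result)
```

With the pivoted form, rows that have fewer matches than the widest row get NaN in the extra columns.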
