Parse strings based on order of occurrence

Published 2024-06-23 19:55:48


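I have data like the SampleDf below, and I am trying to write code that extracts the first "Avg", "Sum", or "Count" from each string and puts it in a new column, "Agg". My code below almost does this, but it imposes a hierarchy: if Count appears before Sum in a string, it still puts Sum in the Agg column. OutputDf below shows what I want to get.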

Sample Data:

import pandas as pd

SampleDf=pd.DataFrame([['tom',"Avg(case when Value1 in ('Value2') and [DateType] in ('Value3') then LOS end)"],['bob',"isnull(Sum(case when XferToValue2 in (1) and DateType in ('Value3') and  [Value1] in ('HM') then  Count(LOS) end),0)"]],columns=['ReportField','OtherField'])

Sample Output:

OutputDf=pd.DataFrame([['tom',"Avg(case when Value1 in ('Value2') and [DateType] in ('Value3') then LOS end)",'Avg'],['bob',"isnull(Sum(case when XferToValue2 in (1) and DateType in ('Value3') and  [Value1] in ('HM') then  Count(LOS) end),0)",'Sum']],columns=['ReportField','OtherField','Agg'])


Code:
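import numpy as np

# Nested checks in a fixed order (Sum, then Count, then Avg),
# so the first matching check wins regardless of position in the string
SampleDf['Agg'] = np.where(SampleDf.OtherField.str.contains("Sum"), "Sum",
                  np.where(SampleDf.OtherField.str.contains("Count"), "Count",
                  np.where(SampleDf.OtherField.str.contains("Avg"), "Avg", "Nothing")))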
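The fixed priority in the nested np.where calls is what causes the problem. A minimal pure-Python sketch, using hypothetical helper names priority_pick and first_occurrence_pick, contrasts that behaviour with picking whichever term appears earliest in the string:

```python
# Pure-Python sketch of the two behaviours (helper names are hypothetical).

def priority_pick(s):
    # Mirrors the nested np.where: terms are checked in a fixed order,
    # so "Sum" wins whenever it appears anywhere in the string.
    for term in ("Sum", "Count", "Avg"):
        if term in s:
            return term
    return "Nothing"

def first_occurrence_pick(s, terms=("Avg", "Sum", "Count")):
    # Record the position of each term present, then return the earliest one.
    positions = {t: s.find(t) for t in terms if t in s}
    if not positions:
        return "Nothing"
    return min(positions, key=positions.get)

s = "Count(LOS) + Sum(LOS)"
print(priority_pick(s))          # Sum
print(first_occurrence_pick(s))  # Count
```

Like str.contains, plain substring search also matches inside longer identifiers (for example "Count" inside "CountryCode"); the answer below avoids that by comparing whole tokens.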

1 Answer
Anonymous user
#1 · Posted 2024-06-23 19:55:48

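A quick-and-dirty approach is to write a function that returns:
- whichever term of interest (e.g. ['Avg', 'Sum', 'Count']) appears first in the string, if any of them appears;
- or None if none of them does: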

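import re

terms = ['Avg', 'Sum', 'Count']

def extractTerms(s, t=terms):
    # Replace every non-word character and every digit with a space, then split into tokens
    s_clean = re.sub(r"[^\w]|[\d]", " ", s).split()
    # Keep only tokens that are terms of interest, preserving their order of occurrence
    s_array = [w for w in s_clean if w in t]
    try:
        return s_array[0]
    except IndexError:
        # No term of interest appears in the string
        return None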
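To see why index 0 is the earliest term, here is a sketch that runs the same cleaning and filtering steps as extractTerms on the bob string and prints the intermediate lists (the variable names are only for illustration):

```python
import re

terms = ['Avg', 'Sum', 'Count']

s = "isnull(Sum(case when XferToValue2 in (1) and DateType in ('Value3') and  [Value1] in ('HM') then  Count(LOS) end),0)"

# Same cleaning step as extractTerms: non-word characters and digits become spaces
tokens = re.sub(r"[^\w]|[\d]", " ", s).split()
print(tokens)

# The list comprehension keeps the original order, so index 0 is the earliest match
matches = [w for w in tokens if w in terms]
print(matches)  # ['Sum', 'Count']
```

Because the filter never reorders tokens, "Sum" comes before "Count" here, which is exactly the order-of-occurrence behaviour the question asks for.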

Demonstration when the terms appear in the string:

SampleDf['Agg'] = SampleDf['OtherField'].apply(extractTerms)
SampleDf

ReportField OtherField  Agg
0   tom Avg(case when Value1 in ('Value2') and [DateType] in ('Value3') then LOS end)   Avg
1   bob isnull(Sum(case when XferToValue2 in (1) and DateType in ('Value3') and [Value1] in ('HM') then Count(LOS) end),0)  Sum

Demonstration when no term appears in the string:

SampleDf.loc[0, 'OtherField'] = 'foo'  # replace the first string with one that contains no term
SampleDf['Agg'] = SampleDf['OtherField'].apply(extractTerms)
SampleDf

ReportField OtherField  Agg
0   tom foo None
1   bob isnull(Sum(case when XferToValue2 in (1) and DateType in ('Value3') and [Value1] in ('HM') then Count(LOS) end),0)  Sum
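An alternative (not part of the original answer) is a single regex alternation: re.search returns the leftmost match, which is exactly the first term in the string. A minimal sketch, with a hypothetical first_agg helper:

```python
import re

# \b word boundaries stop "Count" from matching inside e.g. "CountryCode".
# re.search scans left to right, so the first term in the string wins.
AGG_PATTERN = re.compile(r"\b(Avg|Sum|Count)\b")

def first_agg(s):
    m = AGG_PATTERN.search(s)
    return m.group(1) if m else None

print(first_agg("Avg(case when Value1 in ('Value2') then LOS end)"))        # Avg
print(first_agg("isnull(Sum(case when x in (1) then Count(LOS) end),0)"))  # Sum
print(first_agg("foo"))                                                     # None
```

With pandas this can be applied as SampleDf['OtherField'].apply(first_agg), or vectorized as SampleDf['OtherField'].str.extract(r"\b(Avg|Sum|Count)\b", expand=False), which yields NaN rather than None when nothing matches. One edge case differs from extractTerms: because extractTerms strips digits first, a token like "Sum2" still counts as "Sum", whereas the \b pattern does not match it.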
