Checking abbreviations in a DataFrame column

Name                                       Count
This is Ante Meridian (AM) not included    3
This is Ante Meridian (AM) included        3
This is Ante Meridian (AM) not included    3
Extra module with Post Meridian (PM)       1
Post Meridian (PO) is not available        0    # Mismatch

1 answer

User

#1 · Posted 2024-06-28 10:06:21

First, use regular expressions to check whether the letters inside the () match the first letters of the two words in front of them.

import numpy as np
import pandas as pd

# get the two words immediately before "("
wordsbefore = df['Name'].str.extract(r'(\w+) (\w+) (?=\()')

# take the first letter of both words to build what should be inside ()
check = wordsbefore[0].str[0] + wordsbefore[1].str[0]

# extract the letters in () as a Series (expand=False) so they compare row by row
acronym = df['Name'].str.extract(r"\((.*)\)", expand=False)

# keep the letters in () if they match our check, otherwise 0
df['count'] = np.where(acronym == check, acronym, 0)

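You now have a df in which the acronym sits in its own column, or 0 where it does not match. All that is left is to replace each acronym with its count.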

df['count'] = df['count'].map(dict(df[df['count']!=0]['count'].value_counts())).fillna(0)

              Name                          count
0   This is Ante Meridian (AM) not included   3.0
1   This is Ante Meridian (AM) included       3.0
2   This is Ante Meridian (AM) not included   3.0
3   Extra module with Post Meridian (PM)      1.0
4   Post Meridian (PO) is not available       0.0

If a row contains no () at all, it also ends up with 0.
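The steps above can be put together into one runnable sketch. The sample rows are copied from the question; the variable names (`words_before`, `expected`, `found`, `matched`) are my own, and using `Series.where` plus a final `astype(int)` instead of `np.where` is a substitution, not the exact code from the answer.

```python
import pandas as pd

# Sample rows copied from the question
df = pd.DataFrame({
    "Name": [
        "This is Ante Meridian (AM) not included",
        "This is Ante Meridian (AM) included",
        "This is Ante Meridian (AM) not included",
        "Extra module with Post Meridian (PM)",
        "Post Meridian (PO) is not available",
    ]
})

# Two words immediately before "(" -> DataFrame with columns 0 and 1
words_before = df["Name"].str.extract(r"(\w+) (\w+) (?=\()")

# Expected abbreviation: first letter of each of the two words
expected = words_before[0].str[0] + words_before[1].str[0]

# Letters actually written inside the parentheses (a Series, not a DataFrame)
found = df["Name"].str.extract(r"\((.*)\)", expand=False)

# Keep only the abbreviations that match; missing values never compare equal,
# so rows without () become NaN here
matched = found.where(found == expected)

# Count each matching abbreviation and map the counts back; mismatches become 0
df["count"] = matched.map(matched.value_counts()).fillna(0).astype(int)

print(df)
```

With these rows the `count` column should come out as 3, 3, 3, 1, 0, which matches the table above apart from being integers rather than floats.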

This can be extended to abbreviations of 3 or more letters by following the same pattern in a loop:

import re
import numpy as np
import pandas as pd

acy = re.compile(r"\((.*)\)")
twoWords = re.compile(r'(\w+) (\w+) (?=\()')
threeWords = re.compile(r'(\w+) (\w+) (\w+) (?=\()')

acyList = []

# Pull the first letters out of the words before ()
for index, value in df['Name'].items():
    # get the letters in () and inspect them to decide whether to check 2 or 3 words
    getAcy = acy.search(value)
    try:
        # the letters in () have length 2
        if len(getAcy[1]) == 2:
            # search for two words and join their first letters
            words = twoWords.search(value)
            acyList.append(words[1][0] + words[2][0])

        # the letters in () have length 3
        elif len(getAcy[1]) == 3:
            # search for three words and join their first letters
            words = threeWords.search(value)
            acyList.append(words[1][0] + words[2][0] + words[3][0])

        else:
            # other lengths are not handled; append NaN so the list stays aligned with df
            acyList.append(np.nan)

    except TypeError:
        # a search returned None: no () in the row, or too few words before it
        acyList.append(np.nan)

# compare as Series on the same index, then replace each matching acronym with its count
acronym = df['Name'].str.extract(r"\((.*)\)", expand=False)
df['count'] = np.where(acronym == pd.Series(acyList, index=df.index, dtype=object), acronym, 0)
df['count'] = df['count'].map(dict(df[df['count']!=0]['count'].value_counts())).fillna(0)
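If a separate regex per length gets unwieldy, one possible alternative is to let the length of the abbreviation decide how many words to look at. The sketch below is an assumption-laden variant, not the code from the answer: `matching_abbreviation` is a hypothetical helper, it only inspects the first () in each row, it splits words with `\w+` instead of requiring single spaces, and the comparison stays case-sensitive. The "PNG" row is made up for illustration.

```python
import re
import pandas as pd

# First parenthesised group in the text (hypothetical pattern name)
ABBREVIATION = re.compile(r"\(([^()]*)\)")

def matching_abbreviation(text):
    """Return the abbreviation in () if it equals the initials of the
    same number of words right before it, otherwise None."""
    if not isinstance(text, str):
        return None
    match = ABBREVIATION.search(text)
    if match is None:
        return None
    abbreviation = match.group(1)
    n = len(abbreviation)
    if n == 0:
        return None
    # Words that appear before the opening parenthesis
    words = re.findall(r"\w+", text[:match.start()])
    if len(words) < n:
        return None
    initials = "".join(word[0] for word in words[-n:])
    return abbreviation if initials == abbreviation else None

df = pd.DataFrame({"Name": [
    "This is Ante Meridian (AM) not included",
    "This is Ante Meridian (AM) included",
    "Extra module with Post Meridian (PM)",
    "Post Meridian (PO) is not available",
    "Saved as Portable Network Graphics (PNG)",
    "No abbreviation here",
]})

# Matching abbreviation per row (None where it does not match or is absent)
matched = df["Name"].map(matching_abbreviation)

# Replace each matching abbreviation with how often it occurs; everything else is 0
df["count"] = matched.map(matched.value_counts()).fillna(0).astype(int)

print(df)
```

Because the number of words is derived from the abbreviation itself, no extra branch is needed for 4 or more letters, at the cost of slightly looser word splitting than the original patterns.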
