Comparing acronyms in pandas

Posted 2024-10-05 14:29:58


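I am experimenting with and learning Python using a dataset that contains company information.

The DataFrame is structured as follows (these are made-up records):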

import pandas as pd

df = pd.DataFrame({'key': [111, 222, 333, 444, 555, 666, 777, 888, 999], 
                   'left_name' : ['ET CETERA SYSTEMS', 'ODDS AND ENDS', 'MAXIMA COMPANY', 'MUSIC MANY', 
                                  'GRAPHIC MASTER', 'ARC SECURITY', 'MINDNSOLES', 'REX ENERGY', 'THESIS COMPANY'],
                   'right_name' : ['ET CETERA SYS', 'ODDSNENDS', 'MAX COMP', 'MUSICMANY', 'GRAPHIC MSTR', 
                                  'ARC SECU', 'MIND AND SOLES', 'REXX', 'THESIS COMP']})

print(df)

   key          left_name      right_name
0  111  ET CETERA SYSTEMS   ET CETERA SYS
1  222      ODDS AND ENDS       ODDSNENDS
2  333     MAXIMA COMPANY        MAX COMP
3  444         MUSIC MANY       MUSICMANY
4  555     GRAPHIC MASTER    GRAPHIC MSTR
5  666       ARC SECURITY        ARC SECU
6  777         MINDNSOLES  MIND AND SOLES
7  888         REX ENERGY            REXX
8  999     THESIS COMPANY     THESIS COMP

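My goal is to compare the acronyms of each pair. Specifically, if the acronym formed by concatenating the initial letters of the words in left_name equals the acronym formed the same way from right_name, return a flag of 1. Otherwise, return 0.

For example, comparing the first two pairs gives:

  • ECS == ECS → 1
  • OAE != O → 0

Visually, the resulting DataFrame I am looking for should look like this: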

   key          left_name      right_name  name_flag
0  111  ET CETERA SYSTEMS   ET CETERA SYS          1
1  222      ODDS AND ENDS       ODDSNENDS          0
2  333     MAXIMA COMPANY        MAX COMP          1
3  444         MUSIC MANY       MUSICMANY          0
4  555     GRAPHIC MASTER    GRAPHIC MSTR          1
5  666       ARC SECURITY        ARC SECU          1
6  777         MINDNSOLES  MIND AND SOLES          0
7  888         REX ENERGY            REXX          0
8  999     THESIS COMPANY     THESIS COMP          1

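I think my question is closely related to this one: Upper case first letter of each word in a phrase

Unfortunately, I have not been able to adapt that code to solve my problem. Any additional help would be greatly appreciated.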


3 answers
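def abbr(x):
    # Take the first character of each whitespace-separated word;
    # split() with no argument avoids empty strings from repeated spaces
    return ''.join(word[0] for word in x.split())

df['name_flag'] = (df['left_name'].apply(abbr) == df['right_name'].apply(abbr)).astype(int)

Output: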

0    1
1    0
2    1
3    0
4    1
5    1
6    0
7    0
8    1
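
To see why a given row is flagged 0, it can help to keep the intermediate acronyms as columns. The following is a minimal sketch rather than part of the original answer: it uses only three of the rows, and the column names `left_abbr` and `right_abbr` are made up for illustration.

```python
import pandas as pd

# Three rows from the question's data
df = pd.DataFrame({
    'left_name': ['ET CETERA SYSTEMS', 'ODDS AND ENDS', 'MINDNSOLES'],
    'right_name': ['ET CETERA SYS', 'ODDSNENDS', 'MIND AND SOLES'],
})

def abbr(name):
    # First character of each whitespace-separated word
    return ''.join(word[0] for word in name.split())

# Hypothetical helper columns, kept only for inspection
df['left_abbr'] = df['left_name'].apply(abbr)
df['right_abbr'] = df['right_name'].apply(abbr)
df['name_flag'] = (df['left_abbr'] == df['right_abbr']).astype(int)

print(df[['left_abbr', 'right_abbr', 'name_flag']])
```

Once the flags look right, the helper columns can be dropped with `df.drop(columns=['left_abbr', 'right_abbr'])`.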


import re

''.join(re.findall(r'^[A-Z]|\s[A-Z]', s)).replace(' ', '')

or

''.join(re.findall(r'\b\w', s))

also works inside the function (here s is the input string).
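
The two regular expressions above are not interchangeable on every input, which a small sketch can show. The function names `abbr_upper` and `abbr_word` are hypothetical, and the mixed-case string is an invented example that does not appear in the question's data.

```python
import re

def abbr_upper(s):
    # An uppercase letter at the start of the string or right after
    # whitespace; the matched spaces are stripped afterwards
    return ''.join(re.findall(r'^[A-Z]|\s[A-Z]', s)).replace(' ', '')

def abbr_word(s):
    # The first word character after every word boundary
    return ''.join(re.findall(r'\b\w', s))

print(abbr_upper('ODDS AND ENDS'))  # OAE
print(abbr_word('ODDS AND ENDS'))   # OAE

# The patterns differ once lowercase words are involved
print(abbr_upper('Rex energy'))     # R
print(abbr_word('Rex energy'))      # Re
```

For the all-uppercase names in the question both patterns give the same acronyms. Note that `\b\w` also starts a new "word" after punctuation such as an apostrophe, so names containing punctuation may need extra care.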

Try this:

l = df.left_name.str.findall(r'\b\w')
r = df.right_name.str.findall(r'\b\w')
df['name_flag'] = (l == r).astype(int)

Out[366]:
   key          left_name      right_name  name_flag
0  111  ET CETERA SYSTEMS   ET CETERA SYS          1
1  222      ODDS AND ENDS       ODDSNENDS          0
2  333     MAXIMA COMPANY        MAX COMP          1
3  444         MUSIC MANY       MUSICMANY          0
4  555     GRAPHIC MASTER    GRAPHIC MSTR          1
5  666       ARC SECURITY        ARC SECU          1
6  777         MINDNSOLES  MIND AND SOLES          0
7  888         REX ENERGY            REXX          0
8  999     THESIS COMPANY     THESIS COMP          1
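
The snippet above compares the lists returned by `findall` row by row. If plain strings are preferred, for example to store the acronyms in a column, the lists can be joined first. This is a sketch of that variation on three of the rows, not the answerer's code; the variable names `left` and `right` are illustrative.

```python
import pandas as pd

# Three rows from the question's data
df = pd.DataFrame({
    'left_name': ['MAXIMA COMPANY', 'MUSIC MANY', 'REX ENERGY'],
    'right_name': ['MAX COMP', 'MUSICMANY', 'REXX'],
})

# findall returns a list per row; str.join('') collapses it into one string
left = df['left_name'].str.findall(r'\b\w').str.join('')
right = df['right_name'].str.findall(r'\b\w').str.join('')

df['name_flag'] = (left == right).astype(int)
print(df)
```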

This will do it:

def get_acronym(phrase):
    words = phrase.split(' ')
    return ''.join(w[0] for w in words)

df['name_flag'] = df.right_name.map(get_acronym) == df.left_name.map(get_acronym)
df['name_flag'] = df['name_flag'].astype(int)

Output of df:

   key          left_name      right_name  name_flag
0  111  ET CETERA SYSTEMS   ET CETERA SYS          1
1  222      ODDS AND ENDS       ODDSNENDS          0
2  333     MAXIMA COMPANY        MAX COMP          1
3  444         MUSIC MANY       MUSICMANY          0
4  555     GRAPHIC MASTER    GRAPHIC MSTR          1
5  666       ARC SECURITY        ARC SECU          1
6  777         MINDNSOLES  MIND AND SOLES          0
7  888         REX ENERGY            REXX          0
8  999     THESIS COMPANY     THESIS COMP          1
