Selecting repeated identical characters

Posted 2024-09-28 13:26:11


I am trying to find sentences in which the same character is repeated within a single word, for example:

Sentence 
    are they saddddd?
    I don't want to go
    heyyyyy
    12333
    00unit
    00wolf                        
    01man                         
    20595                         
    2091996                       
    03dumbdumb                    

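I want to assign 1 if a sentence contains three or more identical consecutive characters and 0 if it does not (expected output below):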

Sentence                         Lab
    are they saddddd?             1 
    I don't want to go            0
    heyyyyy                       1
    12333                         1
    00unit                        0 
    00wolf                        0
    01man                         0
    20595                         0
    2091996                       0
    03dumbdumb                    0

I tried the following:

df.loc[(df['Sentence'].str.findall(r'([a-zA-Z])\1{3}').astype(bool)), 'Lab']=1

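However, it does not correctly assign 1 to every sentence with at least 3 identical consecutive characters.

Could you look at my code and tell me why it doesn't work?

Some values, such as 00unit, 00wolf, 01man, 20595, 2091996 and 03dumbdumb, are wrongly selected by the code above as containing three identical consecutive characters, even though they don't.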
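A quick way to see why the original pattern misbehaves is to try it with Python's re module directly. This is only an illustrative sketch; the extra sample string 'sooo' is made up and not from the question. In ([a-zA-Z])\1{3} the character class leaves out digits, and \1{3} asks for three more copies after the first character, so four identical letters in a row are needed. The second pattern, shown for comparison, adds digits and relaxes the count to three or more.

```python
import re

# The question's pattern: one ASCII letter in group 1, then exactly three
# more copies of it via the backreference \1{3} (four in a row in total).
asker = re.compile(r'([a-zA-Z])\1{3}')
# For comparison: letters or digits, then two or more further copies
# (three or more in a row in total).
fixed = re.compile(r'([a-zA-Z\d])\1{2,}')

samples = ['are they saddddd?', 'heyyyyy', '12333', '00unit', 'sooo']

for s in samples:
    # search() returns a Match object or None; bool() turns that into True/False
    print(repr(s), 'asker:', bool(asker.search(s)), 'fixed:', bool(fixed.search(s)))
```

In this sketch, 12333 (digits) and sooo (only three letters) are missed by the original pattern, while 00unit matches neither pattern.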


2 answers

The pattern is any word character followed by at least two more occurrences of that same character:

import pandas as pd

s = ['    are they saddddd?',
     "    I don't want to go",
     '    heyyyyy',
     '    12333',
     '    00unit',
     '    00wolf', 
     '    01man',
     '    20595',   
     '    2091996',   
     '    03dumbdumb']

df = pd.DataFrame(s,columns=['Sentence'])

In [25]: pattern = r'((\w)\2{2,})'

In [26]: df.loc[(df['Sentence'].str.findall(pattern).astype(bool)), 'Lab']=1

In [27]: df
Out[27]: 
                 Sentence  Lab
0       are they saddddd?  1.0
1      I don't want to go  NaN
2                 heyyyyy  1.0
3                   12333  1.0
4                  00unit  NaN
5                  00wolf  NaN
6                   01man  NaN
7                   20595  NaN
8                 2091996  NaN
9              03dumbdumb  NaN
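In r'((\w)\2{2,})', group 2 captures a single character and \2{2,} requires at least two further copies of it, so a run of three or more is needed; the outer group 1 captures the whole run. Because the pattern has two groups, findall returns a list of (run, character) tuples per string, and an empty list when nothing matches. A small sketch with the standard re module on some of the sample sentences:

```python
import re

# Group 1 wraps the whole run; group 2 is the single repeated character.
# \2{2,} requires at least two more copies of group 2, i.e. 3+ in a row.
pattern = re.compile(r'((\w)\2{2,})')

for s in ['are they saddddd?', 'heyyyyy', '12333', '00unit', '03dumbdumb']:
    # With more than one group, findall returns a list of (group1, group2) tuples
    print(repr(s), pattern.findall(s))
```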

Or use pattern = r'(([a-zA-Z0-9])\2{2,})' if you don't want to match underscores.
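The difference between the two patterns is that \w also matches the underscore and, for str patterns in Python 3, non-ASCII letters such as é, whereas [a-zA-Z0-9] only covers ASCII letters and digits. A minimal comparison, using made-up sample strings:

```python
import re

word_pat = re.compile(r'((\w)\2{2,})')            # \w: letters, digits and underscore
alnum_pat = re.compile(r'(([a-zA-Z0-9])\2{2,})')  # ASCII letters and digits only

# '\u00e9' is the precomposed letter é
for s in ['snake___case', '\u00e9\u00e9\u00e9!', 'Ahhh']:
    print(repr(s), word_pat.findall(s), alnum_pat.findall(s))
```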


pattern = r'(([a-zA-Z0-9])\2{2,})'
S = df.Sentence.str.findall(pattern)
df['Lab'] = S.astype(bool).astype(int)

In [13]: df
Out[13]: 
                 Sentence  Lab
0       are they saddddd?    1
1      I don't want to go    0
2                 heyyyyy    1
3                   12333    1
4                  00unit    0
5                  00wolf    0
6                   01man    0
7                   20595    0
8                 2091996    0
9              03dumbdumb    0
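Both answers rely on the fact that str.findall returns one list per row and that an empty list becomes False under astype(bool). If only a 0/1 flag is needed, one alternative (not taken from either answer) is to call re.search per row, which stops at the first run instead of collecting all of them. A sketch, assuming a shortened copy of the sample data:

```python
import re
import pandas as pd

# Same character class as the answer above: ASCII letters and digits.
pattern = re.compile(r'([a-zA-Z0-9])\1{2,}')

df = pd.DataFrame({'Sentence': [
    'are they saddddd?', "I don't want to go", 'heyyyyy',
    '12333', '00unit', '20595',
]})

# search() returns a Match object for the first run found, or None otherwise,
# so no list of all matches is built per row.
df['Lab'] = df['Sentence'].map(lambda s: int(pattern.search(s) is not None))
print(df)
```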

How about adding \d to the character class and changing {3} to {2,}:

df['Lab'] = df['Sentence'].str.findall(r'([a-zA-Z\d])\1{2,}').astype(bool).astype(int)

Output:

             Sentence  Lab
0   are they saddddd?    1
1  I don't want to go    0
2             heyyyyy    1
3               12333    1
4              00unit    0
5              00wolf    0
6               01man    0
7               20595    0
8             2091996    0
9          03dumbdumb    0
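One subtle difference between [a-zA-Z\d] and [a-zA-Z0-9]: for str patterns, Python's \d matches any Unicode decimal digit, not just 0-9. The sketch below uses fullwidth digits as an illustrative input; passing flags=re.ASCII (Series.str.findall also accepts a flags argument) restricts \d to 0-9.

```python
import re

answer2 = r'([a-zA-Z\d])\1{2,}'
text = '\uff13\uff13\uff13'  # three FULLWIDTH DIGIT THREE characters

print(bool(re.search(answer2, text)))                  # True: \d is Unicode-aware for str patterns
print(bool(re.search(answer2, text, flags=re.ASCII)))  # False: \d restricted to [0-9]
print(bool(re.search(r'([a-zA-Z0-9])\1{2,}', text)))   # False: explicit ASCII range
```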
