Alternatives to the regular expression function

Posted 2024-06-26 02:47:49


Here is my code:

import re
with open('newfiles.txt') as f:
   k = f.read()
p = re.compile(r'\w+|[^\w\-\s]')
originaltext = p.findall(k)
uniquelist = []
for word in originaltext:
   if word not in uniquelist:
       uniquelist.append(word)
indexes = ' '.join(str(uniquelist.index(word)+1) for word in originaltext)
print('Here are the index positions of the text file : ' + indexes)

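It takes a text file (a couple of random sentences with punctuation) and outputs the position of each word or punctuation mark. If something appears more than once, the position of its first occurrence is shown. In this program, a punctuation mark is treated as a separate word.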
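To make the behaviour described above concrete, here is a minimal sketch on a short made-up sentence. The sample string is an assumption for illustration and is not taken from newfiles.txt; the pattern and the indexing logic are the same as in the code above.

```python
import re

# Hypothetical sample sentence, not part of newfiles.txt
sample = "the cat, the dog."

# Same pattern as above: a run of word characters, or one punctuation mark
tokens = re.findall(r'\w+|[^\w\-\s]', sample)

# Keep each token once, in order of first appearance
unique = []
for tok in tokens:
    if tok not in unique:
        unique.append(tok)

# 1-based position of each token's first occurrence
positions = [unique.index(tok) + 1 for tok in tokens]

print(tokens)     # ['the', 'cat', ',', 'the', 'dog', '.']
print(positions)  # [1, 2, 3, 1, 4, 5]
```

The second "the" gets position 1 again, and the comma and full stop each get their own position, just like words.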

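I played around with the code and tried to simplify it. With the regex function I only need two lines to find and separate the words and punctuation, so it is very efficient. However, does anyone know a simpler, more basic way to do this than regex? If you answer, please don't change the rest of the code; just use another method that achieves the same thing (showing the word indexes) without regex. It will obviously be longer, so that doesn't matter.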

newfiles.txt

Parkour, also known as freerunning, is a relatively new sport founded by Sebastian Foucan, who showed off his skills in the James Bond movie "Casino Royale", which was released in 2006. Parkour is running, jumping over obstacles, or climbing over buildings and walls.
It is daring, breathtaking and at times terrifying, and now it is also an official sport in the UK, making the UK the first country in the world to recognise it. This means that people can teach parkour in schools.
Some people are worried about the sport being too dangerous, but the founder says that it is as safe as any sport, comparing to rugby, wrestling, surfing or climbing, but, - if you do not do it in the right way, you can get hurt.

Output

Here are the index positions of the text file : 1 2 3 4 5 6 2 7 8 9 10 11 12 13 14 15 2 16 17 18 19 20 21 22 23 24 25 26 27 28 26 2 29 30 31 21 32 33 1 7 34 2 35 36 37 2 38 39 36 40 41 42 33 43 7 44 2 45 41 46 47 48 2 41 49 50 7 3 51 52 11 21 22 53 2 54 22 53 22 55 56 21 22 57 58 59 50 33 60 61 62 63 64 65 66 21 67 33 68 63 69 70 71 22 11 72 73 74 2 75 22 76 77 62 50 7 5 78 5 79 11 2 80 58 81 2 82 2 83 38 39 2 75 2 84 85 86 87 86 50 21 22 88 89 2 85 64 90 91 33

Thanks


1 answer
User
#1 · Posted 2024-06-26 02:47:49

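Writing "readable code" is really hard. I still don't understand why you want to do this, but it was a nice challenge :) I couldn't help myself and also changed the way you build the unique collection (using OrderedDict):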

import re
from collections import OrderedDict
import string
from numpy.testing import assert_array_equal

k = '''Parkour, also known as freerunning, is a relatively new sport founded by Sebastian Foucan, who showed off his skills in the James Bond movie "Casino Royale", which was released in 2006. Parkour is running, jumping over obstacles, or climbing over buildings and walls.
It is daring, breathtaking and at times terrifying, and now it is also an official sport in the UK, making the UK the first country in the world to recognise it. This means that people can teach parkour in schools.
Some people are worried about the sport being too dangerous, but the founder says that it is as safe as any sport, comparing to rugby, wrestling, surfing or climbing, but, - if you do not do it in the right way, you can get hurt.'''

# the one you know
p = re.compile(r'\w+|[^\w\-\s]')
originaltext = p.findall(k)
uniquelist = []
for word in originaltext:
   if word not in uniquelist:
       uniquelist.append(word)
indexes = ' '.join(str(uniquelist.index(word)+1) for word in originaltext)


# the 'readable one'
w = string.ascii_uppercase + string.ascii_lowercase + "0123456789" 
originaltext2 = []
word = ""
for char in k:
    if char in " -\t\n\r\f\v":
        if word != "":
            originaltext2.append(word)
        word = ""
    elif char not in w:
        if word != "":
            originaltext2.append(word)
        originaltext2.append(char)
        word = ""
    else:
        word += char
# flush the last word if the text does not end with punctuation or whitespace
if word != "":
    originaltext2.append(word)

# list() is needed on Python 3, where keys() returns a view that has no .index()
uniquelist2 = list(OrderedDict.fromkeys(originaltext2))
indexes2 = ' '.join(str(uniquelist2.index(word)+1) for word in originaltext2)

# same output
assert_array_equal(indexes, indexes2)
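The same idea can also be wrapped in two small functions. This is only a sketch under some assumptions: the names `tokenize` and `first_positions` are made up here, `str.isalnum()` plus the underscore is used as a rough stand-in for `\w`, `str.isspace()` as a rough stand-in for `\s`, and a plain dict replaces the `uniquelist.index()` lookup. The sample string is a shortened, invented variant of the first line of newfiles.txt.

```python
def tokenize(text):
    """Split text into words and single punctuation marks without regex."""
    tokens = []
    word = ""
    for char in text:
        if char.isalnum() or char == "_":
            word += char            # still inside a word
            continue
        if word:
            tokens.append(word)     # a word has just ended
            word = ""
        if not char.isspace() and char != "-":
            tokens.append(char)     # punctuation becomes its own token
    if word:
        tokens.append(word)         # flush a word at the very end of the text
    return tokens


def first_positions(tokens):
    """Return the 1-based first-occurrence position for every token."""
    rank = {}
    for tok in tokens:
        # setdefault only stores a position the first time a token is seen
        rank.setdefault(tok, len(rank) + 1)
    return [rank[tok] for tok in tokens]


# Hypothetical shortened sample based on the first line of newfiles.txt
sample = "Parkour, also known as freerunning, is a sport - Parkour is running"
tokens = tokenize(sample)
print(' '.join(str(i) for i in first_positions(tokens)))
# 1 2 3 4 5 6 2 7 8 9 1 7 10
```

The dict lookup avoids rescanning the unique list for every token, which may matter for larger files, but for a text of this size either approach is fine.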
