Python 2.7: selecting the words in a list that come before a number

infile = open('a.txt')
outfile = open('b.txt', 'w')

# '1', '2', ... up to twenty, and then a list based on words that commonly
# occur after the numbers, such as 'topical', etc.
replacements = {'1': '', '2': '', 'topical': ''}

for line in infile:
    for src, target in replacements.iteritems():
        line = line.replace(src, target)
    outfile.write(line)

infile.close()
outfile.close()

2 Answers

User

Answer 1 · edited 2024-09-27 21:29:22

Try this; it splits each line on the number and gives you the name part:

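import re

# Capture an integer or a decimal number such as 5, 50 or 2.5
exp = re.compile(r'(\d+(?:\.\d+)?)')

with open('mainfile.txt') as f, open('names.txt', 'w') as out:
    for line in f:
        line = line.strip()
        if line:
            # Everything before the first number is the name part
            out.write('{}\n'.format(exp.split(line)[0].strip()))

The regular expression \d+(?:\.\d+)? means:

\d+ one or more digits
(?:\.\d+)? an optional decimal part: a literal . (in a regular expression . has a special meaning, so we escape it when we mean the literal character) followed by one or more digits; (?:...) groups these without capturing

The () around the whole pattern make it a capturing group, so re.split keeps the matched number in the result:

>>> x = r'(\d+(?:\.\d+)?)'
>>> l = 'Benzoyl Peroxide 50 MG/ML Topical Lotion'
>>> re.split(x, l)
['Benzoyl Peroxide ', '50', ' MG/ML Topical Lotion']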
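
If a line contains no number at all, re.split returns the whole line as its only element, so the full line ends up in the output. A minimal sketch of one way to make that case explicit, assuming a hypothetical helper name_before_number and a pattern that accepts integers and decimals (neither is part of the answer above; the second sample line is made up for illustration):

```python
import re

# Assumed pattern: an integer or a decimal such as 5, 50 or 2.5
NUMBER = re.compile(r'\d+(?:\.\d+)?')


def name_before_number(line):
    """Return the text before the first number, or None if there is no number."""
    match = NUMBER.search(line)
    if match is None:
        return None
    # Slice off everything from the first number onwards
    return line[:match.start()].strip()


print(name_before_number('Benzoyl Peroxide 50 MG/ML Topical Lotion'))  # Benzoyl Peroxide
print(name_before_number('Hydrocortisone 2.5 % Cream'))                # Hydrocortisone
print(name_before_number('no digits here'))                            # None
```

Returning None lets the caller decide whether to skip the line or report it, instead of silently writing it out unchanged.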

User

Answer 2 · edited 2024-09-27 21:29:22

Why not loop over the words and use isdigit() to find the first number? For example:

with open('a.txt') as f, open('b.txt', 'w') as writef:
    for line in f:
        words = line.split()
        for i, word in enumerate(words):
            # Treat tokens such as "50" or "2.5" as numbers
            if word.replace('.', '', 1).isdigit():
                if i > 0:
                    # Write the word just before the first number
                    writef.write(words[i - 1] + '\n')
                break  # stop at the first number on the line
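
The loop above writes only the single word directly before the number. If the whole name is wanted (for example both words of "Benzoyl Peroxide"), the same isdigit() idea can collect every word up to the first numeric token. This is a rough sketch, assuming a hypothetical helper words_before_number and whitespace-separated tokens; a token such as "50MG" would not be recognised as a number:

```python
def words_before_number(line):
    """Return the words that precede the first numeric token, joined by spaces."""
    name = []
    for word in line.split():
        # Allow at most one decimal point, so "50" and "2.5" both count as numbers
        if word.replace('.', '', 1).isdigit():
            break
        name.append(word)
    return ' '.join(name)


print(words_before_number('Benzoyl Peroxide 50 MG/ML Topical Lotion'))  # Benzoyl Peroxide
```

When a line has no numeric token, the function returns the whole line with its whitespace normalised.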
