Python 2.7: selecting the words in a list that come before the number

Published 2024-09-27 21:29:22


I have a text file, a.txt, that contains:

Hydrocortisone 10 MG/ML Topical Cream
Tretinoin 0.25 MG/ML Topical Cream
Benzoyl Peroxide 50 MG/ML Topical Lotion
Ketoconazole 20 MG/ML Medicated Shampoo
etc

I need a way to select every word that comes before the first number on each line and write those words to another file, b.txt:

Hydrocortisone
Tretinoin 
Benzoyl Peroxide
Ketoconazole
etc

I have a basic idea of how to do find-and-replace in a file, but my understanding of Python is very limited, so my first attempt was:

infile = open('a.txt')
outfile = open('b.txt', 'w')
# ...and so on up to twenty, plus words that commonly occur after the numbers
replacements = {'1': '', '2': '', 'topical': ''}
for line in infile:
    for src, target in replacements.iteritems():
        line = line.replace(src, target)
    outfile.write(line)
infile.close()
outfile.close()

But all this does is delete whatever is listed in "replacements". There are thousands of variations, so I cannot list them all.

Sorry for not being clear, and thanks for your help.


Tags: txt, line, etc, ml, infile, outfile, cream, replacements
2 answers

Try this; it splits each line on the number and gives you the name part:

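import re

# One or more digits, optionally followed by a decimal part (matches 5, 50 and 0.25).
exp = re.compile(r'(\d+(?:\.\d+)?)')

with open('a.txt') as f, open('b.txt', 'w') as out:
    for line in f:
        line = line.strip()
        if line:
            # re.split returns the text before the first number as element 0.
            out.write('{}\n'.format(exp.split(line)[0].strip()))

The regular expression \d+(?:\.\d+)? means:

  • \d+ one or more digits
  • (?:\.\d+)? an optional decimal part: a literal . (escaped, because . has a special meaning in regular expressions) followed by one or more digits

The () around it make it a capturing group, so re.split also returns the matched number. The result looks like this:

>>> x = r'(\d+(?:\.\d+)?)'
>>> l = 'Benzoyl Peroxide 50 MG/ML Topical Lotion'
>>> re.split(x, l)
['Benzoyl Peroxide ', '50', ' MG/ML Topical Lotion']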
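
The same idea can be wrapped in a small function and checked against the sample lines from the question. This is only a sketch under some assumptions: the name `name_before_number` is made up for illustration, and it uses `re.search` to find the first digit instead of `re.split`, so a line with no number at all is returned unchanged.

```python
import re

# Matches the first digit in the line; everything before it is the name.
FIRST_DIGIT = re.compile(r'\d')


def name_before_number(line):
    """Return the text before the first digit, or the whole line if there is none."""
    match = FIRST_DIGIT.search(line)
    if match is None:
        return line.strip()
    return line[:match.start()].strip()


samples = [
    'Hydrocortisone 10 MG/ML Topical Cream',
    'Tretinoin 0.25 MG/ML Topical Cream',
    'Benzoyl Peroxide 50 MG/ML Topical Lotion',
    'Ketoconazole 20 MG/ML Medicated Shampoo',
]
names = [name_before_number(s) for s in samples]
print(names)
```

One limitation to keep in mind: a name that itself contains a digit would be cut short at that digit.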

Why not write a loop and use isdigit() to find the first number? For example:

with open('a.txt') as f, open('b.txt', 'w') as writef:
    for line in f:
        words = line.split()
        for i, word in enumerate(words):
            # Treat tokens such as "10" or "0.25" as numbers.
            if word.replace('.', '', 1).isdigit():
                # Write every word before the first number, then stop scanning this line.
                writef.write(' '.join(words[:i]) + '\n')
                break
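
The word-by-word check can also be written without an explicit index, which may be easier to test on its own. The sketch below is one possible variant, not the answer's original code: `is_number` and `words_before_number` are hypothetical helper names, and `itertools.takewhile` is swapped in for the manual loop.

```python
from itertools import takewhile


def is_number(token):
    """True for tokens such as '10' or '0.25' (digits with at most one dot)."""
    return token.replace('.', '', 1).isdigit()


def words_before_number(line):
    """Join the leading words of a line, stopping before the first numeric token."""
    return ' '.join(takewhile(lambda w: not is_number(w), line.split()))


print(words_before_number('Benzoyl Peroxide 50 MG/ML Topical Lotion'))
```

Because the test is applied to whole tokens, a word that merely contains a digit is kept as part of the name, and a line with no numeric token comes back whole.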
