Stripping phrases from a batch of files with Python

import re

string = open('/Users/Cynthia/Desktop/Jeunesse/Longivity English/Jeunesse Longevity TV - Episode 27 - Lifestyle - PART 4 - Healthy Nutrition 2 2.en.transcribed.txt').read()
new_str = re.sub('<font color="#CCCCCC">', ' ', string)
open('b.txt', 'w').write(new_str)

string = open('/Users/Cynthia/Desktop/Jeunesse/Longivity English/b.txt').read()
new_str = re.sub('<font color="#E5E5E5">', ' ', string)
open('c.txt', 'w').write(new_str)

string = open('/Users/Cynthia/Desktop/Jeunesse/Longivity English/c.txt').read()
new_str = re.sub('</font>', ' ', string)
open('d.txt', 'w').write(new_str)

1 answer

User

#1 · Posted 2024-10-02 10:30:25

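Here is a beginner-friendly approach that uses a function to process a single file. It chains your substitutions together and writes the result to a new file.

So you just call strip_html once per file, passing the name of the input file and the name of the new file.

In this example there is a list of filenames, and each fixed file is written with ".fixed" appended to its name.

Note that this is a simple approach, and I have left out a lot to keep it easy to follow. Once you know more about programming you will find better ways, but this should get you going.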

import re

def strip_html(filename, newfilename):
    # Read the whole input file into memory.
    with open(filename) as f1:
        string = f1.read()
    # Replace each font tag with a space, one pattern at a time.
    new_str = re.sub('<font color="#CCCCCC">', ' ', string)
    new_str = re.sub('<font color="#E5E5E5">', ' ', new_str)
    new_str = re.sub('</font>', ' ', new_str)
    # Write the cleaned text to the new file.
    with open(newfilename, 'w') as w1:
        w1.write(new_str)

files = [
    '/Users/Cynthia/Desktop/Jeunesse/Longivity English/Jeunesse Longevity TV - Episode 27 - Lifestyle - PART 4 - Healthy Nutrition 2 2.en.transcribed.txt',
    '/Users/Cynthia/Desktop/Jeunesse/Longivity English/Jeunesse Longevity TV - Episode 28 - Lifestyle - PART 1 - Healthy Nutrition 3 2.en.transcribed.txt',
]

for file in files:  
    strip_html(file, file + '.fixed')
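As one possible refinement, the three substitutions above could be collapsed into a single pattern. This is only a sketch: it assumes that every opening and closing font tag should go, whatever its color, which is broader than the two colors in the question. The names FONT_TAG and strip_font_tags are made up for illustration.

```python
import re

# Assumption: any <font ...> or </font> tag should be replaced,
# not only the two colors listed in the question.
FONT_TAG = re.compile(r'</?font\b[^>]*>')

def strip_font_tags(text):
    # Replace each tag with a single space, as the code above does.
    return FONT_TAG.sub(' ', text)

sample = '<font color="#CCCCCC">hello</font> <font color="#E5E5E5">world</font>'
print(strip_font_tags(sample))
```

Compiling the pattern once keeps the function short, and a file with a third color would be handled without another re.sub line.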

Hope this helps.

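Once you have this running, take a look at os.listdir() to see how to get the list of filenames from a directory rather than writing them into the code.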
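A minimal sketch of that idea follows. The helper name list_transcripts and the '.transcribed.txt' suffix filter are assumptions for illustration; adjust the suffix to match your own files.

```python
import os

def list_transcripts(folder, suffix='.transcribed.txt'):
    # os.listdir returns bare names, so join each one back onto the folder.
    # The suffix filter is an assumption; it also skips '.fixed' output files.
    return sorted(
        os.path.join(folder, name)
        for name in os.listdir(folder)
        if name.endswith(suffix)
    )

# Hypothetical usage, assuming strip_html from the answer is defined:
# for path in list_transcripts('/Users/Cynthia/Desktop/Jeunesse/Longivity English'):
#     strip_html(path, path + '.fixed')
```

Filtering on the suffix means a second run will not pick up the ".fixed" files written by the first run.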
