Delete some words and replace others

Posted 2024-10-03 15:23:48


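I have a txt file (myText.txt) that contains many lines of text.

I would like to know:

  • how to create a list of words to delete (I want to choose the words myself)
  • how to create a list of words to replace

For example, if myText.txt is: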

    The ancient Romans influenced countries and civilizations in the following centuries.
    Their language, Latin, became the basis for many other European languages. They stayed in Roma for 3 month.
  • I want to delete "and", "in" and "the"
  • I want to replace "ancient" with "old"
  • I want to replace "month" and "centuries" with "years"
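In Python, these two requirements map naturally onto a list (the words to delete) and a dict (each word mapped to its replacement). If you want to type the words yourself instead of hard-coding them, one option is to parse them from simple strings. This is a minimal sketch that assumes a comma-separated input format. The helper name `parse_word_lists` and the `old=new` syntax are hypothetical and do not come from the thread:

```python
def parse_word_lists(delete_spec, replace_spec):
    """Hypothetical helper: build the two structures from user-edited strings.

    delete_spec:  words separated by commas, e.g. "and, in, the"
    replace_spec: old=new pairs separated by commas, e.g. "ancient=old, month=years"
    """
    # List of words to delete, ignoring empty entries and surrounding spaces
    delete_words = [w.strip() for w in delete_spec.split(",") if w.strip()]
    # Dict mapping each word to its replacement
    replace_words = {}
    for pair in replace_spec.split(","):
        if pair.strip():
            old, new = pair.split("=", 1)
            replace_words[old.strip()] = new.strip()
    return delete_words, replace_words


delete_words, replace_words = parse_word_lists(
    "and, in, the", "ancient=old, month=years, centuries=years"
)
print(delete_words)   # ['and', 'in', 'the']
print(replace_words)  # {'ancient': 'old', 'month': 'years', 'centuries': 'years'}
```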

3 answers

You can always use regular expressions:

import re

st = '''\
The ancient Romans influenced countries and civilizations in the following centuries.
Their language, Latin, became the basis for many other European languages. They stayed in Roma for 3 month.'''

deletions = ('and', 'in', 'the')
repl = {"ancient": "old", "month": "years", "centuries": "years"}

# Delete every whole-word occurrence of the listed words in one pass
tgt = '|'.join(r'\b{}\b'.format(re.escape(e)) for e in deletions)
st = re.sub(tgt, '', st)

# Replace each whole word with its counterpart from the dict
for word in repl:
    tgt = r'\b{}\b'.format(re.escape(word))
    st = re.sub(tgt, repl[word], st)

print(st)
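The regex answer has two side effects. First, deleting a word leaves the spaces around it behind (for example "countries  civilizations"). Second, the replacement loop applies one key at a time, so a value could be rewritten again by a later key (for example with {"ancient": "old", "old": "new"}). Below is a minimal sketch of a variant that keeps the case-sensitive, whole-word matching of the original. The function name `clean` is hypothetical:

```python
import re


def clean(text, deletions, repl):
    """Delete whole words in `deletions`, then swap whole words using `repl`."""
    if deletions:
        # One alternation, e.g. \b(?:and|in|the)\b, removes all listed words
        del_pat = re.compile(r"\b(?:{})\b".format("|".join(map(re.escape, deletions))))
        text = del_pat.sub("", text)
    if repl:
        # Single pass: each match is looked up in the dict, so a replacement
        # value is never re-replaced by another key
        rep_pat = re.compile(r"\b(?:{})\b".format("|".join(map(re.escape, repl))))
        text = rep_pat.sub(lambda m: repl[m.group(0)], text)
    # Collapse runs of spaces left behind by deletions (newlines are kept)
    return re.sub(r" {2,}", " ", text)


st = ("The ancient Romans influenced countries and civilizations in the following centuries.\n"
      "Their language, Latin, became the basis for many other European languages. "
      "They stayed in Roma for 3 month.")

print(clean(st, ("and", "in", "the"), {"ancient": "old", "month": "years", "centuries": "years"}))
```

With the sample text, the capitalised "The" at the start of the first line is kept because matching is case-sensitive. Passing `flags=re.IGNORECASE` to the deletion `re.compile` call would remove it too.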

Use a list for the deletions and a dictionary for the replacements. It should look something like this:

def processTextFile(filename_in, filename_out, delWords, repWords):
    # Open both files once; "w" overwrites any previous output
    with open(filename_in, "r") as sourcefile, open(filename_out, "w") as outfile:
        for line in sourcefile:
            # Plain substring removal, so "the " also matches inside e.g. "bathe "
            for item in delWords:
                line = line.replace(item, "")
            for key, value in repWords.items():
                line = line.replace(key, value)
            outfile.write(line)


if __name__ == "__main__":
    delWords = ["the ", "and ", "in "]
    repWords = {"ancient": "old", "month": "years", "centuries": "years"}

    processTextFile("myText.txt", "myOutText.txt", delWords, repWords)

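Note that this was written for Python 3.3.2, which is why it uses items() (in Python 3 this returns a lightweight view). On Python 2.x you can use iteritems() instead. It iterates lazily rather than building a list of all key-value pairs, although the saving depends on the size of the replacement dictionary, not on the size of the text file.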
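One caveat with the `str.replace` approach in this answer is that it matches substrings, not whole words. As a result, "the " also matches inside "bathe ", and "month" turns "months" into "yearss". The comparison below uses a made-up sentence and checks it against a regex with `\b` word boundaries:

```python
import re

line = "They bathe in the river each month and count the months."

# Substring replacement, as in the answer: also hits "bathe " and "months"
naive = line.replace("the ", "").replace("month", "years")

# Whole-word replacement: \b only matches at a word boundary
exact = re.sub(r"\bthe\b ", "", line)
exact = re.sub(r"\bmonth\b", "years", exact)

print(naive)  # They bain river each years and count yearss.
print(exact)  # They bathe in river each years and count months.
```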

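This should do the trick. Use a list to store the words to delete, then loop over the list and remove each element from the contents string. Then use a dictionary that maps each word you currently have to the word that should replace it. Loop over the dictionary as well and replace each current word with its replacement.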

def replace():
    deleteWords = ["the ", "and ", "in "]
    replaceWords = {"ancient": "old", "month": "years", "centuries": "years"}

    # Read the whole file into one string
    with open("myText.txt") as f:
        contents = f.read()

    # Remove each unwanted word (plain substring match)
    for word in deleteWords:
        contents = contents.replace(word, "")

    # Swap each key for its replacement value
    for key, value in replaceWords.items():
        contents = contents.replace(key, value)
    return contents
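The function above hard-codes the file name and the word lists, and it only returns the new text. The sketch below is a hypothetical variant (`replace_in_file` is not from the answer) that takes these as parameters and also writes the result out using `pathlib`. The replacement logic is unchanged, so the substring caveat still applies:

```python
from pathlib import Path


def replace_in_file(path_in, path_out, delete_words, replace_words):
    """Hypothetical parameterised version of replace() that also saves the result."""
    # Read the whole input file into one string
    contents = Path(path_in).read_text()
    # Same substring-based logic as the answer
    for word in delete_words:
        contents = contents.replace(word, "")
    for key, value in replace_words.items():
        contents = contents.replace(key, value)
    # Write the cleaned text to the output file (overwriting it)
    Path(path_out).write_text(contents)
    return contents
```

For the question's file this would be called as `replace_in_file("myText.txt", "myOutText.txt", ["the ", "and ", "in "], {"ancient": "old", "month": "years", "centuries": "years"})`.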
