Searching for and extracting WH-words line by line from a file with Python and regex

Posted 2024-09-30 06:15:39


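I have a file with one sentence per line. I am trying to read the file and use a regular expression to check whether each sentence is a question, extract the WH-word from it, and save the words to another file in the order they appear in the first file.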

This is what I have so far:

def whWordExtractor(inputFile):
    try:
        openFileObject = open(inputFile, "r")
        try:

            whPattern = re.compile(r'(.*)who|what|how|where|when|why|which|whom|whose(\.*)', re.IGNORECASE)
            with openFileObject as infile:
                for line in infile:

                    whWord = whPattern.search(line)
                    print whWord

# Save the whWord extracted from inputFile into another whWord.txt file
#                    writeFileObject = open('whWord.txt','a')                   
#                    if not whWord:
#                        writeFileObject.write('None' + '\n')
#                    else:
#                        whQuestion = whWord   
#                        writeFileObject.write(whQuestion+ '\n') 

        finally:
            print 'Done. All WH-word extracted.'
            openFileObject.close()
    except IOError:
        pass

The result after running the code above: set([])

Is there something I am doing wrong? I would appreciate it if someone could point it out.


3 answers

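The | operator has the lowest precedence in a regex, so in '(.*)who|what|how|where|when|why|which|whom|whose(\.*)' the (.*) belongs only to who and the (\.*) only to whose. Group the alternatives instead, for example ".*(?:who|what|how|where|when|why|which|whom|whose).*\.". That pattern still matches the whole sentence and needs a literal period after the WH-word, so the version below uses a capturing group with word boundaries to pull out only the word.

Like this:

import re

def whWordExtractor(inputFile):
   # \b keeps "how" from matching inside "show"; the group captures only the WH-word
   whPattern = re.compile(r'\b(who|what|how|where|when|why|which|whom|whose)\b', re.IGNORECASE)
   try:
      with open(inputFile) as f1:
           with open('whWord.txt','a') as f2:  # open the output file only once, to reduce I/O operations
               for line in f1:
                   whWord = whPattern.search(line)
                   if not whWord:
                        f2.write('None' + '\n')
                   else:
                        # re.search returns a match object, not a string, so use either
                        # whWord.group(1) or, to get every WH-word, whPattern.findall(line)
                        whQuestion = whWord.group(1)
                        f2.write(whQuestion + '\n')
           print('Done. All WH-word extracted.')
   except IOError:
        pass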
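
To see the precedence problem in isolation, here is a minimal sketch that compares the pattern from the question with a grouped, word-bounded one. The names `words`, `original`, `grouped` and `first_wh_word` are made up for this illustration and are not from the question.

```python
import re

words = r'who|what|how|where|when|why|which|whom|whose'

# Pattern from the question: "(.*)who" and "whose(\.*)" are separate alternatives.
original = re.compile(r'(.*)' + words + r'(\.*)', re.IGNORECASE)

# Grouped alternatives with word boundaries: the group captures only the WH-word.
grouped = re.compile(r'\b(' + words + r')\b', re.IGNORECASE)


def first_wh_word(line):
    """Return the first WH-word in line, or None if there is none."""
    match = grouped.search(line)
    return match.group(1) if match else None


line = "Tell me, who is there?"
print(repr(original.search(line).group()))  # leading text is swallowed by (.*)
print(repr(first_wh_word(line)))            # only the WH-word
print(repr(first_wh_word("Show me the way.")))  # "how" inside "Show" is ignored
```

The word boundaries also matter for `whom` and `whose`: without them, the earlier alternative `who` would match first and the longer words would never be returned.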

Not sure if this is what you are looking for, but you could try something like this:

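import re

def whWordExtractor(inputFile):
    try:
        # \b stops "who" from matching the start of "whom"/"whose" and "how" inside "show"
        whPattern = re.compile(r'\b(?:who|what|how|where|when|why|which|whom|whose)\b', re.IGNORECASE)
        with open(inputFile, "r") as infile:
            for line in infile:
                whMatch = whPattern.search(line)
                if whMatch:
                    whWord = whMatch.group()
                    print(whWord)
                    # save to file
                else:
                    # no match; a statement is required here, a comment alone is a syntax error
                    pass
    except IOError:
        pass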
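
The "save to file" step above could be filled in along these lines. This is only a sketch: the function names, the comma separator and the choice to overwrite the output file are assumptions, not something stated in the question. It uses `findall` so that a line with several WH-words keeps all of them, and writes `None` for lines without one so the output stays aligned with the input.

```python
import re

WH_PATTERN = re.compile(
    r'\b(who|what|how|where|when|why|which|whom|whose)\b', re.IGNORECASE)


def wh_words_per_line(lines):
    """Return one entry per line: its WH-words joined by commas, or 'None'."""
    results = []
    for line in lines:
        found = WH_PATTERN.findall(line)  # captured WH-words, in order of appearance
        results.append(','.join(found) if found else 'None')
    return results


def save_wh_words(input_path, output_path):
    """Read input_path line by line and write one result line per sentence."""
    with open(input_path) as infile, open(output_path, 'w') as outfile:
        for entry in wh_words_per_line(infile):
            outfile.write(entry + '\n')
```

Calling `save_wh_words('sentences.txt', 'whWord.txt')` (hypothetical file names) would then produce one output line for each input line, in the same order.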
