Using Regex to search a text file in Python

#Opens Temp file
TrueURL = open("TrueURL_tmp.txt","w+")
#Reviews Data grabbed from BeautifulSoup and write urls to file
for link in g_data:
    TrueURL.write(link.get("href") + '\n')
#Creates Regex Pattern for TrueURL_tmp
pattern = re.compile(r'thread/.*/*apple|thread/.*/*potato')
search_pattern = re.search(pattern, str(TrueURL))
#Uses Regex Pattern against TrueURL_tmp file.
for url in search_pattern:
    print (url)
#Closes and deletes file
TrueURL.close()
os.remove("TrueURL_tmp.txt")

2 Answers

User

Answer #1 · edited 2024-09-30 14:16:24

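Your search returns no matches because you are searching the str representation of the file object, not the actual file contents.

You are essentially searching the following: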

<open file 'TrueURL_tmp.txt', mode 'w+' at 0x7f2d86522390>
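To make the point above concrete, here is a minimal sketch that runs the same pattern against `str(file_object)` and against the text actually read back from the file. The sample URL is a hypothetical stand-in for the scraped data, and the exact repr text depends on the Python version and platform.

```python
import re
from tempfile import TemporaryFile

# Hypothetical sample URL standing in for one scraped href.
sample = "http://example.com/thread/123/apple-pie\n"
pattern = re.compile(r'thread/.*/*apple|thread/.*/*potato')

with TemporaryFile("w+") as handle:
    handle.write(sample)
    # str() of a file object is its repr, not the text written to it.
    as_string = str(handle)
    repr_match = pattern.search(as_string)
    # Rewind and read to get at the real content.
    handle.seek(0)
    content_match = pattern.search(handle.read())

print(repr_match)             # None: the repr contains no URL
print(content_match.group())  # thread/123/apple
```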

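If you want to search the file contents, close the file so the content is definitely written, then reopen it and read the lines, or simply search inside the for link in g_data: loop.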
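The close-then-reopen route could look roughly like the sketch below. The `hrefs` list is a hypothetical stand-in for the values taken from `g_data`, and the file is placed in a throwaway directory so nothing is left behind.

```python
import os
import re
import tempfile

# Hypothetical hrefs standing in for the values pulled out of g_data.
hrefs = [
    "http://example.com/thread/1/apple",
    "http://example.com/thread/2/banana",
    "http://example.com/thread/3/potato",
]
pattern = re.compile(r'thread/.*/*apple|thread/.*/*potato')

matches = []
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "TrueURL_tmp.txt")

    # Leaving the with block closes the file, which flushes the writes to disk.
    with open(path, "w") as out:
        for href in hrefs:
            out.write(href + "\n")

    # Reopen for reading and test each line on its own.
    with open(path) as src:
        for line in src:
            found = pattern.search(line)
            if found:
                matches.append(found.group())

print(matches)  # ['thread/1/apple', 'thread/3/potato']
```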

If you really do want to write to a temporary file, use tempfile:

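import re
from tempfile import TemporaryFile

# Text mode ("w+") so that str can be written on Python 3; the default mode is binary
with TemporaryFile("w+") as f:
    for link in g_data:
        f.write(link.get("href") + '\n')
    # Rewind to the start before reading back what was written
    f.seek(0)
    # Creates Regex Pattern for TrueURL_tmp
    pattern = re.compile(r'thread/.*/*apple|thread/.*/*potato')
    search_pattern = re.search(pattern, f.read())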

search_pattern is a match object (_sre.SRE_Match), or None when nothing matches, so you can call group on it, i.e. print(search_pattern.group()), or perhaps you want findall instead:

search_pattern = re.findall(pattern, f.read())

for url in search_pattern:
    print(url)

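I still think searching before you write anything is probably the best approach, and perhaps not writing at all. I'm not entirely sure what you are actually trying to do, though, because I can't see how the file fits into it; concatenating everything into a string would achieve the same thing.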

pattern = re.compile(r'thread/.*/*apple|thread/.*/*potato')
for link in g_data:
    match = pattern.search(link.get("href"))
    if match:
        print(match.group())
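A self-contained variant of the loop above, with no file involved, might look like this. Here `g_data` is a hypothetical stand-in built from plain dicts, used only because their `.get("href")` returns None for a missing key, as an anchor without an href attribute would. The helper name `matching_urls` is also made up for the example, and it collects the full href rather than just the matched fragment.

```python
import re

pattern = re.compile(r'thread/.*/*apple|thread/.*/*potato')

# Hypothetical stand-in for the scraped links (plain dicts, not real tags).
g_data = [
    {"href": "http://example.com/thread/1/apple"},
    {"name": "anchor-without-href"},
    {"href": "http://example.com/thread/2/banana"},
    {"href": "http://example.com/thread/3/potato"},
]


def matching_urls(links):
    """Return the full href of every link whose href matches the pattern."""
    results = []
    for link in links:
        href = link.get("href")
        # Skip links with no href; passing None to search() raises TypeError.
        if href is None:
            continue
        if pattern.search(href):
            results.append(href)
    return results


for url in matching_urls(g_data):
    print(url)
```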

User

Answer #2 · edited 2024-09-30 14:16:24

This is the answer I found to my original question, although Padraic's approach is the correct and less painful one.

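import re
from tempfile import TemporaryFile

# TemporaryFile() opens in binary mode by default, so everything written and read is bytes
with TemporaryFile() as f:
    for link in g_data:
        f.write(bytes(link.get("href") + '\n', 'UTF-8'))
    f.seek(0)
    # Creates Regex Pattern for TrueURL_tmp (a bytes pattern, to match the bytes read back)
    pattern = re.compile(rb'thread/.*/*apple|thread/.*/*potato')
    read = f.read()
    search_pattern = re.findall(pattern, read)
    # Uses Regex Pattern against TrueURL_tmp file.
    for url in search_pattern:
        print(url.decode('utf-8'))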
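One detail of the binary-mode approach is worth spelling out: on Python 3, `re` requires the pattern and the searched data to be of the same type, so bytes read from the file need a bytes pattern. A small sketch, with a hypothetical `hrefs` list in place of the scraped links:

```python
import re
from tempfile import TemporaryFile

# Hypothetical hrefs standing in for the scraped links.
hrefs = [
    "http://example.com/thread/1/apple",
    "http://example.com/thread/2/banana",
    "http://example.com/thread/3/potato",
]

str_pattern = re.compile(r'thread/.*/*apple|thread/.*/*potato')
bytes_pattern = re.compile(rb'thread/.*/*apple|thread/.*/*potato')

with TemporaryFile() as f:  # default mode is binary ("w+b")
    for href in hrefs:
        f.write((href + "\n").encode("utf-8"))
    f.seek(0)
    data = f.read()  # bytes, not str

# A str pattern applied to bytes raises TypeError on Python 3.
try:
    str_pattern.findall(data)
    mixed_ok = True
except TypeError:
    mixed_ok = False

# A bytes pattern works; decode each match to get text back.
found = [m.decode("utf-8") for m in bytes_pattern.findall(data)]

print(mixed_ok)  # False
print(found)     # ['thread/1/apple', 'thread/3/potato']
```

Opening the temporary file in text mode, as in `TemporaryFile("w+")`, avoids the encode and decode steps altogether.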
