Python findall, regex, unicode

import os, glob, re

def scandirs(path):
    for currentFile in glob.glob(os.path.join(path, '*')):
        if os.path.isdir(currentFile):
            scandirs(currentFile)
        if os.path.splitext(currentFile)[1] == ".flac":
            rpath = os.path.relpath(currentFile)
            print "**DEBUG** rpath =", rpath
            title = os.path.basename(currentFile)
            title = re.findall(u'\d\d\s(.*).flac', title, re.U)
            title = title[0].decode("utf8")
            print "**DEBUG** title =", title
            fpath = os.path.split(os.path.dirname(currentFile))
            artist = fpath[0][2:]
            print "**DEBUG** artist =", artist
            album = fpath[1]
            print "**DEBUG** album =", album
            out = "%s | %s | %s | %s\n" % (rpath, artist, album, title)
            flist = open('filelist.tmp', 'a')
            flist.write(out)
            flist.close()

scandirs('./')

>>> import re
>>> title = "Thriftworks - Fader - 01 180°.flac"
>>> title2 = "dummy"
>>> title = re.findall(u'\d\d\s(.*).flac', title, re.U)
>>> title = title[0].decode("utf8")
>>> out = "%s | %s\n" % (title2, title)
>>> print out
dummy | 180°

3 Answers

User

#1 · edited 2024-09-28 22:06:36

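In the console, your terminal settings define the encoding. Nowadays that is mostly Unicode on Unix-like systems (e.g. Linux/BSD/MacOS) and Windows-1252 on Windows. In the interpreter, it defaults to the encoding of the Python file, which is usually ASCII (unless the code starts with a UTF byte order mark).
I'm not really sure, but prepending a u to the string "%s | %s | %s | %s\n" to make it a unicode string might help.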
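
To check which encodings are actually in play, the interpreter can be asked directly. This is a minimal sketch, assuming Python 3 (the question's code is Python 2, where the same functions exist but report different values); the helper name `report_encodings` is made up for illustration.

```python
import locale
import sys


def report_encodings():
    """Collect the encodings that affect printing and file I/O (hypothetical helper)."""
    return {
        # Encoding used when text is printed to the terminal; may be None.
        "stdout": getattr(sys.stdout, "encoding", None),
        # Encoding suggested by the user's locale settings.
        "locale": locale.getpreferredencoding(False),
        # Interpreter-wide default string encoding
        # (ascii on Python 2, utf-8 on Python 3).
        "default": sys.getdefaultencoding(),
        # Encoding used to convert file names to and from the OS.
        "filesystem": sys.getfilesystemencoding(),
    }


if __name__ == "__main__":
    for name, value in sorted(report_encodings().items()):
        print("%-10s %s" % (name, value))
```

If the `stdout` entry cannot represent a character such as `°`, printing it will fail or show a substitute, regardless of how the data is stored internally.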

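User

#2 · edited 2024-09-28 22:06:36

The Python console works together with your terminal and interprets Unicode encodings according to its locale settings.

Replace that line with the newer str.format: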

out = u"{} | {} | {} | {}\n".format(rpath, artist, album, title)

and encode to UTF-8 when writing to the file:

[code snippet not preserved in the source]
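
As a stand-in for the missing snippet, here is a minimal sketch of encoding on write. It assumes the file is opened in binary append mode so that it accepts the encoded bytes on both Python 2 and Python 3; the helper name `append_line` and the sample values are illustrative, not taken from the original answer.

```python
import os
import tempfile


def append_line(path, text):
    """Append a Unicode string to `path` as UTF-8 bytes (hypothetical helper)."""
    # Binary mode: the file receives exactly the bytes produced by encode().
    with open(path, "ab") as f:
        f.write(text.encode("utf8"))


if __name__ == "__main__":
    # Sample values only; \u00b0 is the degree sign from the question's file name.
    out = u"{} | {} | {} | {}\n".format(
        u"Thriftworks/Fader/01 180\u00b0.flac", u"Thriftworks", u"Fader", u"180\u00b0"
    )
    target = os.path.join(tempfile.mkdtemp(), "filelist.tmp")
    append_line(target, out)
    with open(target, "rb") as f:
        print(repr(f.read()))
```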

Or import codecs and do it directly:

[code snippet not preserved in the source]
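
A minimal sketch of this variant, assuming the answer meant `codecs.open`, which returns a file object that encodes Unicode strings as they are written; the helper name `append_line_codecs` is illustrative.

```python
import codecs
import os
import tempfile


def append_line_codecs(path, text):
    """Append a Unicode string; the codecs wrapper does the UTF-8 encoding."""
    with codecs.open(path, "a", encoding="utf8") as f:
        f.write(text)  # no explicit .encode() needed here


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "filelist.tmp")
    append_line_codecs(target, u"dummy | 180\u00b0\n")
    with open(target, "rb") as f:
        print(repr(f.read()))
```

On Python 3 the built-in `open(path, "a", encoding="utf8")` does the same job without the extra import.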

Or, since UTF-8 is the default:

with open('filelist.tmp', 'a') as f:
    f.write(out)
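
Whether UTF-8 really is the default depends on the Python version and the platform's locale, so passing the encoding explicitly avoids relying on it. This is a minimal sketch using `io.open`, which accepts an `encoding` argument on both Python 2.6+ and Python 3 (where it is the same function as the built-in `open`); `append_line_explicit` is an illustrative name.

```python
import io
import os
import tempfile


def append_line_explicit(path, text):
    """Append a Unicode string in text mode with the encoding stated explicitly."""
    with io.open(path, "a", encoding="utf-8") as f:
        f.write(text)


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "filelist.tmp")
    append_line_explicit(target, u"dummy | 180\u00b0\n")
    # Read it back with the same explicit encoding.
    with io.open(target, encoding="utf-8") as f:
        print(repr(f.read()))
```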

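User

#3 · edited 2024-09-28 22:06:36

When using glob with file names that contain Unicode characters, use a Unicode string for the pattern. This makes glob return Unicode strings instead of byte strings. On output, printing Unicode strings automatically encodes them in the console's encoding. You will still have problems if your songs contain characters that the host encoding does not support. In that case, write the data to a UTF-8-encoded file and view it in an editor that supports UTF-8.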

>>> import glob
>>> for f in glob.glob('*'): print f
...
ThriftworksFaderThriftworks - Fader - 01 180░.flac
>>> for f in glob.glob(u'*'): print f
...
ThriftworksFaderThriftworks - Fader - 01 180°.flac
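
The session above is Python 2. On Python 3 the same rule holds in a different form: `glob.glob` returns `str` for a `str` pattern and `bytes` for a `bytes` pattern. A minimal sketch, assuming Python 3 and a file system that can store the `°` character; the helper name `glob_both`, the temporary directory, and the file name are made up for the demonstration.

```python
import glob
import os
import tempfile


def glob_both(directory):
    """Glob the same directory with a text pattern and a bytes pattern."""
    text_pattern = os.path.join(directory, "*")
    bytes_pattern = os.fsencode(text_pattern)  # same pattern as raw bytes
    return glob.glob(text_pattern), glob.glob(bytes_pattern)


if __name__ == "__main__":
    demo_dir = tempfile.mkdtemp()
    open(os.path.join(demo_dir, u"01 180\u00b0.flac"), "w").close()
    text_hits, bytes_hits = glob_both(demo_dir)
    # The result type mirrors the pattern type.
    print(type(text_hits[0]).__name__, type(bytes_hits[0]).__name__)
```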

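This also works with os.walk, which is a simpler way to do a recursive search:

[code snippet not preserved in the source]

Output:

[output not preserved in the source]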
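
Since the `os.walk` snippet did not survive, here is a minimal sketch of what such a recursive scan might look like; it is not the answerer's actual code. It assumes Python 3, a hypothetical `artist/album/NN title.flac` layout (simplified from the question), and an illustrative helper name `scan_flac`. The regular expression is the question's pattern with the dot escaped and the end anchored.

```python
import io
import os
import re
import tempfile

# Two digits, whitespace, the title, then a literal ".flac" at the end.
TITLE_RE = re.compile(r"\d\d\s(.*)\.flac$", re.UNICODE)


def scan_flac(root):
    """Return 'rpath | artist | album | title' lines for .flac files under root."""
    lines = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # make the traversal order deterministic
        for name in sorted(filenames):
            match = TITLE_RE.search(name)
            if match is None:
                continue  # not a "NN title.flac" file
            rpath = os.path.relpath(os.path.join(dirpath, name), root)
            # Assumed layout: artist/album/file, so the parent directories name them.
            artist, album = os.path.split(os.path.dirname(rpath))
            lines.append(
                u"{} | {} | {} | {}\n".format(rpath, artist, album, match.group(1))
            )
    return lines


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    album_dir = os.path.join(root, "Thriftworks", "Fader")
    os.makedirs(album_dir)
    demo_name = u"Thriftworks - Fader - 01 180\u00b0.flac"
    open(os.path.join(album_dir, demo_name), "w").close()
    # Write the list as UTF-8 so the result does not depend on the console.
    with io.open(os.path.join(root, "filelist.tmp"), "a", encoding="utf-8") as f:
        f.writelines(scan_flac(root))
```

Because `os.walk` is given a text path, every directory and file name it yields is text as well, so no `.decode()` call is needed before formatting the line.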
