Posted 2024-09-26 18:09:11
User
I am using Python to read a txt file that contains a right single quotation mark: ’
ord("’")
Out[46]: 8217
http://www.fileformat.info/info/unicode/char/2019/index.html
I am reading the txt file with the following code:
with open(text_path, 'r', encoding='utf-8') as f:
    transcript = f.read()
You can write a custom encoding function that converts non-ASCII characters to the ASCII characters specified in a lookup table:
# -*- coding: utf-8 -*-
# Written for Python 3, matching the open(..., encoding='utf-8') call in the question.
import io

def encode_file(filepath, conversion_table=None):
    '''Replace non-ASCII characters with the ASCII equivalents given in conversion_table.'''
    if conversion_table is None:
        conversion_table = {}
    with io.open(filepath, "r", encoding="utf-8") as f:
        transcript = f.read()
    new_transcript = ""
    for ch in transcript:
        if ord(ch) < 128:
            # keep the character unchanged if it is already ASCII
            new_transcript += ch
        else:
            # use the custom ASCII equivalent, or "?" if no conversion is found
            new_transcript += conversion_table.get(ch, "?")
    return new_transcript

text_path = "/path/to/file.txt"
conversion_table = {'ü': 'u', 'ô': 'o', 'é': 'e', 'į': 'i'}
print(encode_file(text_path, conversion_table))
For example, if the file contains my ünicôdé strįng, this produces my unicode string
So you can add '’':'\'' (or any other conversion) to conversion_table, and it will make the replacement for you.
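As a rough sketch of the same lookup-table idea applied to a string that has already been read, the snippet below uses Python 3's built-in str.translate instead of a character-by-character loop. The names to_ascii and CONVERSIONS are hypothetical and only for illustration; the "?" fallback mirrors the answer above.

```python
# Hypothetical names: to_ascii and CONVERSIONS are illustrative, not from the answer.
CONVERSIONS = {
    "\u2019": "'",  # right single quotation mark, code point 8217
    "ü": "u",
    "ô": "o",
    "é": "e",
    "į": "i",
}

def to_ascii(text, conversions=CONVERSIONS, fallback="?"):
    """Apply the lookup table, then replace any remaining non-ASCII character."""
    # str.maketrans accepts a dict whose keys are single characters
    table = str.maketrans(conversions)
    translated = text.translate(table)
    return "".join(ch if ord(ch) < 128 else fallback for ch in translated)

print(to_ascii("my ünicôdé strįng"))  # my unicode string
print(to_ascii("it\u2019s"))          # it's
```

Under these assumptions, the string returned by f.read() in the question could be passed straight to to_ascii, with no need to reopen the file.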