Answers to the question: Using Chinese in Python



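I am trying to process Chinese text and big data with Python. Part of the work is cleaning unwanted data out of the text, and I use regex for that. However, I have run into some problems with Python regex and with the PyCharm application:

1) The data is stored in PostgreSQL and displays correctly in the column, but after selecting it and pulling it into a variable, it is shown as squares: <img src="https://i.stack.imgur.com/Tvfth.jpg" alt="enter image description here"/> When the value is printed to the console, it looks like this: 薄荷糖100g So I assume the problem is not the application encoding but how the debugger displays it. However, I have not found any solution for this behavior.

2) The regex I need to get working removes the values between brackets, including the brackets themselves. The code I use is:

<pre><code>#!/usr/bin/env python
# -*- coding: utf-8 -*
import re
from pprint import pprint
import sys, locale, os

# row, columnName and valuestoremove are defined elsewhere (not shown)
columnString = row[columnName]
startFrom = valuestoremove["startsTo"]
endWith = valuestoremove["endsAt"]
isInclude = valuestoremove["include"]

escapeCharsRegex = re.compile('([\.\^\$\*\+\?\[\{\|])')
nonASCIIregex = re.compile('([^\x00-\x7F])')

# escape the delimiters only if they start with a regex special character
if escapeCharsRegex.match(startFrom):
    startFrom = re.escape(startFrom)
if escapeCharsRegex.match(endWith):
    endWith = re.escape(endWith)

if isInclude:
    regex = startFrom + '(.*)' + endWith
else:
    regex = '(?<=' + startFrom + ').*?(?=' + endWith + ')'

if nonASCIIregex.match(regex):
    p = re.compile(ur'' + regex)
else:
    p = re.compile(regex)

row[columnName] = p.sub("", columnString).strip()
</code></pre>

But the regex has no effect on the given string. I ran a test with the following code: ^{pr2}$ and it works fine for me. The only difference between the two code samples is that in the first one the regex values come from a txt file containing JSON, encoded as UTF-8:

<pre><code>{
    "between": { "startsTo": "(", "endsAt": "）", "include": true, "sequenceID": "1" }
},
{
    "between": { "startsTo": "（", "endsAt": ")", "include": true, "sequenceID": "2" }
},
{
    "between": { "startsTo": "(", "endsAt": ")", "include": true, "sequenceID": "2" }
},
{
    "between": { "startsTo": "（", "endsAt": "）", "include": true, "sequenceID": "2" }
}
</code></pre>

The Chinese brackets from the file are also displayed as squares: <img src="https://i.stack.imgur.com/gyE30.jpg" alt="enter image description here"/>

I cannot find an explanation or any workaround for this behavior, so I am asking the community for help. Thanks for your help.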
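For reference, here is a minimal sketch of the bracket-removal step described above. It is not the asker's code and not a confirmed fix. Two things about the question's code are worth noting: the character class in `escapeCharsRegex` does not contain `(` or `)`, so an ASCII parenthesis delimiter is never escaped, and `re.match` only tests the start of the string. The sketch assumes Python 3 (where `str` is Unicode), calls `re.escape` unconditionally, and uses a non-greedy `.*?` in place of the greedy `(.*)`. `RULES_JSON`, `build_pattern`, and `clean` are hypothetical names standing in for the rules file and the surrounding loop, which the question does not show.

```python
import json
import re

# Hypothetical stand-in for the UTF-8 JSON rules file from the question.
RULES_JSON = '''
[
  {"between": {"startsTo": "(", "endsAt": "）", "include": true}},
  {"between": {"startsTo": "（", "endsAt": ")", "include": true}},
  {"between": {"startsTo": "(", "endsAt": ")", "include": true}},
  {"between": {"startsTo": "（", "endsAt": "）", "include": true}}
]
'''


def build_pattern(start, end, include):
    """Compile a pattern for the text between two literal delimiters."""
    # Escape unconditionally: re.escape leaves ordinary characters matching
    # themselves and neutralizes special ones such as "(" and ")".
    start, end = re.escape(start), re.escape(end)
    if include:
        # Remove the delimiters together with the enclosed text.
        body = start + '.*?' + end
    else:
        # Remove only the enclosed text, keep the delimiters.
        body = '(?<=' + start + ').*?(?=' + end + ')'
    return re.compile(body)


def clean(text, rules):
    """Apply every rule in order and strip surrounding whitespace."""
    for rule in rules:
        spec = rule["between"]
        pattern = build_pattern(spec["startsTo"], spec["endsAt"], spec["include"])
        text = pattern.sub("", text)
    return text.strip()


rules = json.loads(RULES_JSON)
cleaned = clean("薄荷糖（促销）100g", rules)
print(cleaned)  # 薄荷糖100g
```

Note that the mixed rules (an ASCII opener with a full-width closer, and the reverse) can span more text than intended when a string contains both kinds of brackets, so the order of the rules matters.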


1 Answer

Anonymous, 1 day ago