What is a good Python profanity filter library? (Answers)

What is a good Python profanity filter library?


1 answer

Anonymous · 1 day ago

Expertise: python, mysql, java

<p>I couldn't find any Python profanity-filter library, so I wrote my own.</p>
<h2>Parameters</h2>
<hr/>
<h3><code>filterlist</code></h3>
<p>A list of regular expressions that match the forbidden words. Do not add <code>\b</code> to your patterns yourself; word boundaries are inserted automatically depending on <code>inside_words</code>.</p>
<p>Example: <code>['bad', r'un\w+']</code></p>
<h3><code>ignore_case</code></h3>
<p>Default: <code>True</code></p>
<p>Whether matching is case-insensitive.</p>
<h3><code>replacements</code></h3>
<p>Default: <code>"$@%-?!"</code></p>
<p>A string of characters from which each replacement string is randomly generated.</p>
<p>For example: <code>"%&$?!"</code> or <code>"-"</code></p>
<h3><code>complete</code></h3>
<p>Default: <code>True</code></p>
<p>Controls whether the entire matched word is replaced, or whether its first and last characters are kept.</p>
<h3><code>inside_words</code></h3>
<p>Default: <code>False</code></p>
<p>Controls whether forbidden words are also searched for inside other words. When disabled, the patterns are wrapped in <code>\b</code> word boundaries, so only whole words match.</p>
<h2>Module source</h2>
<hr/>
<p>(Examples at the end.)</p>
<pre><code>"""
Module that provides a class that filters profanities.
"""

__author__ = "leoluk"
__version__ = '0.0.1'

import random
import re


class ProfanitiesFilter(object):
    def __init__(self, filterlist, ignore_case=True, replacements="$@%-?!",
                 complete=True, inside_words=False):
        """
        Inits the profanity filter.

        filterlist -- a list of regular expressions that
        matches words that are forbidden
        ignore_case -- ignore capitalization
        replacements -- string with characters to replace the forbidden word
        complete -- completely remove the word or keep the first and last char?
        inside_words -- search inside other words?
        """
        self.badwords = filterlist
        self.ignore_case = ignore_case
        self.replacements = replacements
        self.complete = complete
        self.inside_words = inside_words

    def _make_clean_word(self, length):
        """
        Generates a random replacement string of a given length
        using the chars in self.replacements.
        """
        return ''.join([random.choice(self.replacements) for i in range(length)])

    def __replacer(self, match):
        """Returns the replacement text for a single regex match."""
        value = match.group()
        # Matches of one or two characters have no middle part to mask,
        # so they are always replaced completely.
        if self.complete or len(value) <= 2:
            return self._make_clean_word(len(value))
        else:
            # Keep the first and last characters, mask everything between.
            return value[0] + self._make_clean_word(len(value) - 2) + value[-1]

    def clean(self, text):
        """Cleans a string from profanity."""
        # Unless inside_words is set, anchor the alternation with \b
        # so that only whole words are matched.
        regexp_insidewords = {
            True: r'(%s)',
            False: r'\b(%s)\b',
        }
        regexp = (regexp_insidewords[self.inside_words] %
                  '|'.join(self.badwords))
        r = re.compile(regexp, re.IGNORECASE if self.ignore_case else 0)
        return r.sub(self.__replacer, text)


if __name__ == '__main__':
    f = ProfanitiesFilter(['bad', r'un\w+'], replacements="-")
    example = "I am doing bad ungood badlike things."

    print(f.clean(example))
    # Prints "I am doing --- ------ badlike things."

    f.inside_words = True
    print(f.clean(example))
    # Prints "I am doing --- ------ ---like things."

    f.complete = False
    print(f.clean(example))
    # Prints "I am doing b-d u----d b-dlike things."
</code></pre>
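<p>The <code>clean()</code> method above rebuilds and recompiles its regular expression on every call, and it treats every list entry as a raw regex, so a plain word containing <code>.</code> would match any character at that position. If the list is meant to hold literal words rather than patterns (an assumption; the module above deliberately accepts regexes), a minimal sketch of an alternative escapes each word with <code>re.escape</code> and compiles the pattern once. <code>make_word_filter</code> is a hypothetical helper name, not part of the module above.</p>

```python
import re
from typing import Callable, Iterable


def make_word_filter(words: Iterable[str], replacement: str = "*",
                     ignore_case: bool = True,
                     inside_words: bool = False) -> Callable[[str], str]:
    """Hypothetical helper: build a cleaner for a list of literal words.

    Entries are escaped with re.escape, so '.' and similar characters are
    matched literally, and the regex is compiled once, not on every call.
    """
    # Try longer entries first so that an entry is not cut short by a
    # shorter entry that is its prefix (matters when inside_words=True).
    ordered = sorted(words, key=len, reverse=True)
    alternatives = "|".join(re.escape(w) for w in ordered)
    # Same wrapping idea as ProfanitiesFilter.clean(): add \b anchors
    # unless matches inside other words are wanted.
    template = r"(%s)" if inside_words else r"\b(%s)\b"
    flags = re.IGNORECASE if ignore_case else 0
    pattern = re.compile(template % alternatives, flags)

    def clean(text: str) -> str:
        # Replace every character of each match with the replacement string.
        return pattern.sub(lambda m: replacement * len(m.group()), text)

    return clean


if __name__ == "__main__":
    clean = make_word_filter(["bad", "no.way"], replacement="-")
    print(clean("Bad, no.way, noXway, badlike"))
    # prints "---, ------, noXway, badlike"
```

<p>Because entries are escaped, this variant cannot express patterns such as <code>un\w+</code>; the original class is the better fit when regex entries are needed. Note also that <code>\b</code> only matches next to a word character, so in whole-word mode an entry that ends with punctuation, such as <code>c++</code>, will not match when it is followed by a space or the end of the text.</p>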
