<p>I want to remove the whitespace inside words like "can't" or "won't", either with a regex or at detokenization time.</p>
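<pre><code>import re

from nltk.tokenize import WordPunctTokenizer
# Assumption: MosesDetokenizer comes from the sacremoses package
# (older NLTK releases exposed it as nltk.tokenize.moses.MosesDetokenizer).
from sacremoses import MosesDetokenizer

tok = WordPunctTokenizer()
detok = MosesDetokenizer()
pattern = r"[^\w ]+ "
text = "i can ' t use this cause they won ' t fit"
string = re.sub(pattern, '', text)
tk = tok.tokenize(string)
output = detok.detokenize(tk, return_str=True)
print(output)
# Output I get:
# i can 't use this cause they won' t fit
</code></pre>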
<p>Any ideas on how to remove the whitespace after "can" and "won" so that I get "can't" and "won't"? When I detokenize, I get two spaces, one before and one after the apostrophe.</p>
<p>I think you can simply do something like this:</p>
<pre><code>import re

output = "i can 't use this cause they won' t fit"
# Collapse any whitespace around each apostrophe, keeping the apostrophe itself
output = re.sub(r"\s*'\s*", "'", output)
print(output)
# i can't use this cause they won't fit
</code></pre>
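<p>If the text may also contain single-quoted phrases, closing up every apostrophe could glue the quotes to neighbouring words. A more conservative sketch is below. It assumes that only the common English contraction endings (<code>t</code>, <code>s</code>, <code>re</code>, <code>ve</code>, <code>ll</code>, <code>d</code>, <code>m</code>) need fixing; the function name <code>join_contractions</code> and the list of endings are illustrative choices, not part of NLTK or Moses.</p>

```python
import re

# Hypothetical helper: closes the gap around an apostrophe only when it sits
# between a letter and a common contraction ending (t, s, re, ve, ll, d, m).
CONTRACTION = re.compile(r"(?<=[A-Za-z])\s*'\s*(t|s|re|ve|ll|d|m)\b", re.IGNORECASE)


def join_contractions(text):
    """Return text with spaced-out contractions such as "can ' t" rejoined."""
    return CONTRACTION.sub(r"'\1", text)


print(join_contractions("i can ' t use this cause they won ' t fit"))
# i can't use this cause they won't fit
```

<p>It can be applied either to the raw text before tokenizing or to the detokenized string; a quoted phrase such as <code>he said ' hello ' to me</code> is left unchanged because no contraction ending follows the apostrophe.</p>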