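<p>I would do it in three steps (five if you also apply the optional ones). Each pattern below is a regular expression applied to every match:</p>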
<ol>
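<li>First, <code>text.replace('   *','(@)')</code> (three spaces before the asterisk, so the pattern matches two or more spaces). This converts every run of two or more spaces into a marker that you can be sure never occurs in the text (I use <code>(@)</code> as an example), as shown in <a href="https://regex101.com/r/49VxlG/1/" rel="nofollow noreferrer">demo1</a>. It keeps runs of two or more spaces from being treated as sequences of single spaces, which the next step deletes.</li>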
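<li>Next, <code>text.replace(' ','')</code>. This converts every remaining single space into the empty string, as shown in <a href="https://regex101.com/r/49VxlG/2" rel="nofollow noreferrer">demo2</a>. <strong>Be careful: this joins the many words in the sample text that are separated by only a single space.</strong></li>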
<li>Finally, <code>text.replace('\(@\)',' ')</code>. This converts every marker from the first step into a single space, as shown in <a href="https://regex101.com/r/49VxlG/3" rel="nofollow noreferrer">demo3</a>.</li>
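<li>[Optional] <code>text.replace(' *([.!?]) *([A-Z])','$1  $2')</code>. If you also convert every sentence-ending mark followed by an uppercase letter into that mark, two spaces, and the matched uppercase letter, the result looks nicer, as shown in <a href="https://regex101.com/r/49VxlG/5" rel="nofollow noreferrer">demo4</a>.</li>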
<li>[Optional] <code>text.replace(' *([,;:]) *','$1 ')</code>. This does the same for the other punctuation marks, but with only one space.</li>
</ol>
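<p>The three mandatory steps above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the function name <code>collapse_spaced_letters</code> and the constant <code>MARKER</code> are hypothetical, and the answer itself does not prescribe a language.</p>

```python
import re

MARKER = "(@)"  # hypothetical placeholder; must not occur in the input text


def collapse_spaced_letters(text: str, marker: str = MARKER) -> str:
    """Drop single spaces while keeping each run of two or more as one space."""
    if marker in text:
        raise ValueError("marker already occurs in the text; choose another one")
    # Step 1: protect every run of two or more spaces with the marker.
    text = re.sub(r" {2,}", marker, text)
    # Step 2: remove the remaining (single) spaces.
    text = text.replace(" ", "")
    # Step 3: turn each marker back into one space.
    return text.replace(marker, " ")


print(collapse_spaced_letters("the  same  fa m i l y"))  # the same family
```

<p>The explicit check for the marker is an addition of this sketch; it guards against the case the first step warns about, where the marker already appears in the text.</p>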
<p>You can do this with <code>sed(1)</code>, as follows:</p>
<pre><code>$ sed -e 's/   */#@#/g' \
      -e 's/ //g' \
      -e 's/#@#/ /g' \
      -e 's/ *\([.!?]\) *\([A-Z]\)/\1  \2/g' \
      -e 's/ *\([,;:]\) */\1 /g' \
<<EOF
The European  l a n g u a g es  ar e  members  of
the  same  fa m i l y .  Their  sep a rate  e xi ste nce
is a myth .  F or  s c i e n c e ,  music,  sport ,
etc,  Europe uses the  s a m e  v oca bula ry.  The
languages  o n l y  d i f f e r  i n  t heir
grammar,  their  pro nu n c iation  and their most
common words. Everyone realizes why a new common
language would be desirable: one could
refuse to pay expensive translators.
EOF
TheEuropean languages are members of
the same family.  Their separate existence
isamyth.  For science, music, sport,
etc, Europeusesthe same vocabulary.  The
languages only differ in their
grammar, their pronunciation andtheirmost
commonwords.  Everyonerealizeswhyanewcommon
languagewouldbedesirable: onecould
refusetopayexpensivetranslators.
$ _
</code></pre>
<p>This last example also converts each of <code>[,;:]</code> into itself followed by one space, and treats <code>?</code> and <code>!</code> as sentence terminators as well.</p>
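<p>For comparison, the two punctuation substitutions could be written in Python along these lines. This is a minimal sketch; the function name <code>tidy_punctuation</code> is hypothetical, and the patterns are the same ones used in the <code>sed</code> call, translated to Python's <code>re</code> syntax.</p>

```python
import re


def tidy_punctuation(text: str) -> str:
    # Sentence end followed by a capital letter: no space before, two after.
    text = re.sub(r" *([.!?]) *([A-Z])", r"\1  \2", text)
    # Comma, semicolon, colon: no space before, exactly one after.
    return re.sub(r" *([,;:]) *", r"\1 ", text)


print(tidy_punctuation("a myth . For science , music"))  # a myth.  For science, music
```

<p>Because the patterns match only literal spaces, they never reach across a line break, which mirrors the line-by-line behaviour of <code>sed</code>.</p>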
<blockquote>
<p>How do I remove all the spaces from a string that are bracketed between pairs of double spaces?</p>
</blockquote>
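<p>Don't think in terms of the spaces between two... this is the same as <em>two or more</em>: just use <code>text.replace('   *','  ')</code> (three spaces before the <code>*</code>), that is, replace every run of two or more spaces with a run of exactly <em>two</em>. The same can be achieved with <code>text.replace('  +','  ')</code> (two spaces before the <code>+</code>).</p>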
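<p>A small Python sketch of that normalization, assuming the replacement is exactly two spaces (the name <code>squeeze_runs</code> is hypothetical). It also shows that the three spellings of "two or more spaces" behave the same:</p>

```python
import re


def squeeze_runs(text: str) -> str:
    # Replace every run of two or more spaces with exactly two spaces.
    return re.sub(r" {2,}", "  ", text)


sample = "a    b  c d"
print(repr(squeeze_runs(sample)))  # 'a  b  c d'

# Equivalent patterns: two spaces then '+', or three spaces then '*'.
assert squeeze_runs(sample) == re.sub(r"  +", "  ", sample)
assert squeeze_runs(sample) == re.sub(r"   *", "  ", sample)
```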