<p><strong>Try this:</strong></p>
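<pre><code>import re

def normalize_text(get_text):
    saved_new_lines = []
    counter = 0  # number of lines collected so far
    for each_line in get_text.split("\n"):
        if each_line != "":  # skip empty lines
            # collapse every run of whitespace into a single space
            normalize_each_line = re.sub(r'\s+', ' ', each_line.strip())
            # an indented line continues the previous line
            # (the list check avoids an IndexError on an indented first line)
            if each_line.startswith(" ") and saved_new_lines:
                saved_new_lines[counter-1] += " " + normalize_each_line
            else:
                saved_new_lines.append(normalize_each_line)
                counter += 1
    return "\n".join(saved_new_lines)

print(normalize_text(S))  # S is the input string to normalize
</code></pre>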
<p>Output:</p>
<pre class="lang-none prettyprint-override"><code>Missing Since 06/01/1976
Missing From Napa, California
Classification Endangered Missing
Sex Female
Race White
Date of Birth 02/06/1957 (63)
Age 19 years old
Height and Weight 5'2, 130 pounds
Distinguishing Characteristics Caucasian female. Brown hair, hazel eyes.
</code></pre>
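<p>@FedericoBaù gave me a hint, so I updated my code. This version has no per-line empty-line check (consecutive newlines are collapsed up front instead), so it should be faster than the version above.</p>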
<p><strong>Update:</strong></p>
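<pre><code>import re

def normalize_text(get_text):
    saved_new_lines = []
    counter = 0  # number of lines collected so far
    # strip the whole text and collapse consecutive newlines first,
    # so no empty-line check is needed inside the loop
    for each_line in re.sub(r'\n+', '\n', get_text.strip()).splitlines():
        # collapse every run of whitespace into a single space
        normalize_each_line = re.sub(r'\s+', ' ', each_line.strip())
        # an indented line continues the previous line
        if each_line.startswith(" "):
            saved_new_lines[counter-1] += " {}".format(normalize_each_line)
        else:
            saved_new_lines.append(normalize_each_line)
            counter += 1
    return "\n".join(saved_new_lines)

print(normalize_text(test_string))  # test_string is the input string to normalize
</code></pre>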
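<p>For reference, here is a minimal self-contained sketch of the same idea that can be run as-is. It is a variation, not the exact code above: it drops the counter in favour of <code>merged[-1]</code>, skips whitespace-only lines, and treats any leading whitespace (tabs included) as a continuation marker. The <code>sample</code> string is made up for illustration and is not the original input.</p>
<pre><code>import re

def normalize_text(text):
    """Collapse whitespace and fold indented continuation lines into the previous line."""
    merged = []
    for line in text.strip().splitlines():
        # collapse every run of whitespace into a single space
        cleaned = re.sub(r"\s+", " ", line.strip())
        if not cleaned:
            continue  # skip blank or whitespace-only lines
        if line[0].isspace() and merged:
            merged[-1] += " " + cleaned  # continuation of the previous field
        else:
            merged.append(cleaned)
    return "\n".join(merged)

# Hypothetical sample input, made up for illustration
sample = (
    "Missing Since      06/01/1976\n"
    "\n"
    "Distinguishing Characteristics    Caucasian female.\n"
    "      Brown hair, hazel eyes.\n"
)
print(normalize_text(sample))
</code></pre>
<p>With this sample, the indented second line of the last field is appended to the line before it, and the blank line is dropped.</p>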