<p>I would try something like this:</p>
<pre><code>import re  # regex module

in_string = """Text from above"""
records = []  # list to store all records in order
record = ""  # string to store current record
for line in in_string.splitlines():  # go through each line of the input
    if re.match(r'\d\d/\d\d/\d\d', line):  # match the date at the start
        records.append(record)  # add current record to list
        record = ""  # start new current record
    record += line.strip()  # add line (without surrounding whitespace) to current record
records.append(record)  # add last record to records list
</code></pre>
<p>This produces the following value of <code>records</code>:</p>
<blockquote>
<p>['', </p>
<p>'01/01/11 S11-55555 20/444-55-6666 A. PROSTATE AND SEMINAL VESICLES, PROSTATECTOMY:- ADENOCARCINOMA.TOTAL GLEASON SCORE: GLEASON 5+4=9TUMOR LOCATION: BILATERALTUMOR QUANTITATION: 15% OF PROSTATE INVOLVED BY TUMOREXTRAPROSTATIC EXTENSION: PRESENT AT RIGHT POSTERIORSEMINAL VESICLE INVASION: PRESENTMARGINS: UNINVOLVEDLYMPHOVASCULAR INVASION: PRESENTPERINEURAL INVASION: PRESENTLYMPH NODES (SPECIMENS B AND C):NUMBER EXAMINED: 25NUMBER INVOLVED: 1DIAMETER OF LARGEST METASTASIS: 1.7 mmADDITIONAL FINDINGS: HIGH-GRADE PROSTATIC INTRAEPITHELIAL NEOPLASIA,ACUTE AND CHRONIC INFLAMMATION, INTRADUCTAL EXTENSION OF INVASIVECARCINOMAPATHOLOGIC STAGE: pT3b N1 MXB. LYMPH NODES, RIGHT PELVIC, EXCISION:- ONE OF SEVENTEEN LYMPH NODES POSITIVE FOR METASTASIS (1/17).C. LYMPH NODES, LEFT PELVIC, EXCISION:- EIGHT LYMPH NODES NEGATIVE FOR METASTASIS (0/8).',</p>
<p>'01/02/11 S11-4444 20/111-22-3333 PROSTATE AND SEMINAL VESICLES, PROSTATECTOMY:- ADENOCARCINOMA.GLEASON SCORE: 3 + 3 = 6 WITH TERTIARY PATTERN OF 5.TUMOR QUANTITATION: APPROXIMATELY 10% BY VOLUME.TUMOR LOCATION: BILATERAL.EXTRAPROSTATIC EXTENSION: NOT IDENTIFIED.MARGINS: NEGATIVE.PERINEURAL INVASION: IDENTIFIED.LYMPH-VASCULAR INVASION: NOT IDENTIFIED.SEMINAL VESICLE/VASA DEFERENTIA INVASION: NOT IDENTIFIED.LYMPH NODES: NONE SUBMITTED.OTHER: HIGH GRADE PROSTATIC INTRAEPITHELIAL NEOPLASIA.PATHOLOGIC STAGE (pTNM): pT2c NX.']</p>
</blockquote>
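<p>Note: this is a crude regex. It treats any line that starts with "nn/nn/nn" as the start of a new record, and because the first record is appended before anything has been collected, the list begins with an empty string.</p>
<p>You will probably want to add a space between lines, something like <code>record += line.strip()+' '</code>; otherwise the last word of one line runs into the first word of the next, as in the output above.</p>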
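<p>Putting both notes together, here is a minimal sketch of a slightly stricter variant. It assumes every record header looks like a date followed by an accession number such as <code>S11-55555</code>; that format is inferred only from the sample output, and <code>split_records</code>, <code>RECORD_START</code> and the sample text are hypothetical names and data made up for illustration.</p>

```python
import re

# Assumed header format: date, whitespace, accession number like "S11-55555".
# This is inferred from the sample output only; adjust it to the real data.
RECORD_START = re.compile(r"\d{2}/\d{2}/\d{2}\s+S\d{2}-\d+")


def split_records(text):
    """Group lines into records, starting a new record at each header line."""
    records = []
    current = []  # stripped lines of the record being built
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # ignore blank lines
        if RECORD_START.match(stripped) and current:
            records.append(" ".join(current))  # close the previous record
            current = []
        current.append(stripped)
    if current:
        records.append(" ".join(current))  # close the last record
    return records


# Hypothetical input layout, shortened from the report text
sample = """01/01/11 S11-55555 20/444-55-6666
A. PROSTATE AND SEMINAL VESICLES, PROSTATECTOMY:
- ADENOCARCINOMA.
01/02/11 S11-4444 20/111-22-3333
PROSTATE AND SEMINAL VESICLES, PROSTATECTOMY:
- ADENOCARCINOMA.
"""

for rec in split_records(sample):
    print(rec)
```

<p>Collecting the lines in a list and joining them with <code>" ".join(...)</code> avoids both the leading empty record and a trailing space. Any text that appears before the first header is kept as a record of its own.</p>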
<p>Good luck!</p>
<hr/>
<p>You can experiment with regular expressions (regex/re) <a href="http://regexpal.com/" rel="nofollow">here</a>: put the regular expression (e.g. <code>\d\d/\d\d/\d\d S11</code>) in the top box and the text in the bottom box.</p>
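<p>The same check can be run locally in Python. This is only a sketch: the test lines are hypothetical, and <code>re.match</code> anchors the pattern at the start of each line, just as in the loop above.</p>

```python
import re

pattern = re.compile(r"\d\d/\d\d/\d\d S11")

# Hypothetical test lines: one record header, two lines that should not match
lines = [
    "01/01/11 S11-55555 20/444-55-6666",
    "- ONE OF SEVENTEEN LYMPH NODES POSITIVE FOR METASTASIS (1/17).",
    "12/31/10 follow-up noted",
]

# re.match only succeeds if the pattern matches at the beginning of the line
matches = [bool(pattern.match(line)) for line in lines]
print(matches)  # [True, False, False]
```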