<p>Python's <code>re</code> module has a VERBOSE option that I like a lot. The code should be self-explanatory (tested with Python 3.6):</p>
<pre><code>import re
data = """
-
| COLUMN_NAME | DATA_TYPE |
-
| C460 | VARCHAR2 |
| C459 | CLOB |
| C458 | VARCHAR2 |
| C8 | BLOB |
| C60901 | INT |
"""
pattern = r"""
(C\d+)             # Capture a capital C followed by at least one digit
(?:\s*\|\s)        # Non-capturing group: optional whitespace, a pipe, one whitespace character
(?=INT|CLOB|BLOB)  # Positive lookahead: what follows starts with INT, CLOB or BLOB
"""
match_column = re.compile(pattern, re.VERBOSE)
columns = match_column.findall(data)
print(columns)  # findall already returns a list
</code></pre>
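<p>This should give you <code>['C459', 'C8', 'C60901']</code>, which is what you are after. Once you understand the pattern, you could shorten it to <code>r'(C\d+)(?:.*(?:INT|CLOB|BLOB))'</code>. There is something to be said, though, for being verbose and specific about what you match (the whitespace and pipe characters): overusing <code>.</code> has often left me with regular expressions that match things beyond my wildest dreams.</p>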
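<p>To illustrate the risk with <code>.</code>, here is a small sketch comparing the two patterns. The row with a third <code>POINTS</code> cell is a made-up example, not part of the original data, and the names <code>strict</code> and <code>loose</code> are only for illustration.</p>

```python
import re

# The specific pattern from above, written on one line.
strict = re.compile(r"(C\d+)(?:\s*\|\s)(?=INT|CLOB|BLOB)")
# The shortened pattern that lets "." skip over anything on the line.
loose = re.compile(r"(C\d+)(?:.*(?:INT|CLOB|BLOB))")

# Hypothetical row: the type is VARCHAR2, but a later cell contains "INT".
row = "| C7 | VARCHAR2 | POINTS |"

strict_hits = strict.findall(row)
loose_hits = loose.findall(row)
print(strict_hits)  # []
print(loose_hits)   # ['C7']
```

<p>The loose pattern reports <code>C7</code> because <code>.*</code> happily skips ahead to the <code>INT</code> inside <code>POINTS</code>; the specific pattern only looks at the cell right after the column name.</p>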
<p>That said, you really should not do any of the above! The great hacker Jamie Zawinski once said:</p>
<blockquote>
<p>Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.</p>
</blockquote>
<p>If you are able to process the input line by line, I would do it like this:</p>
^{pr2}$
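<p>One way such line-by-line processing could look, as a minimal sketch without regular expressions. It assumes every data row has exactly two pipe-delimited cells, as in the sample above; the names <code>WANTED_TYPES</code> and <code>wanted_columns</code> are hypothetical.</p>

```python
# Assumed set of data types we want to keep (taken from the regex above).
WANTED_TYPES = {"INT", "CLOB", "BLOB"}


def wanted_columns(text):
    """Return the column names whose data type is in WANTED_TYPES."""
    names = []
    for line in text.splitlines():
        # Drop the outer pipes, then split the row into its cells.
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        # Separator lines and blank lines do not yield two cells and are skipped.
        if len(cells) == 2 and cells[1] in WANTED_TYPES:
            names.append(cells[0])
    return names


data = """
-
| COLUMN_NAME | DATA_TYPE |
-
| C460 | VARCHAR2 |
| C459 | CLOB |
| C458 | VARCHAR2 |
| C8 | BLOB |
| C60901 | INT |
"""

print(wanted_columns(data))  # ['C459', 'C8', 'C60901']
```

<p>Because the type is compared as a whole cell, a type such as <code>INTEGER</code> or a stray <code>INT</code> elsewhere on the line would not be picked up by accident.</p>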