<p>A solution using <code>re</code>:</p>
<pre class="lang-py prettyprint-override"><code>import re

# Named "lines" rather than "input" to avoid shadowing the built-in input().
lines = [
    "[Base Font : IOHLGA+Trebuchet, Font Size : 3.5324998, Font Weight : 0.0] [(X=250.44,Y=223.48499) height=3.5324998 width=4.2910004]DECEMBER 31,",
    "[Base Font : IOFOEO+Imago-Book, Font Size : 3.876, Font Weight : 0.0] [(X=307.5,Y=240.48499) height=3.876 width=2.9970093]respectively. The net decrease in the revenue",
    "[Base Font : IOHLGA+Trebuchet, Font Size : 3.5324998, Font Weight : 0.0] [(X=49.5,Y=233.98499) height=3.5324998 width=2.5690002](US$ in millions)",
]

def extract(s):
    # Group 1 is the X coordinate field, group 2 is the text after the next "]".
    # The raw string keeps the backslashes intact for the regex engine.
    match = re.search(r"(X=\d+(?:\.\d*)?).*?\](.*?)$", s)
    return match.groups()

output = [extract(item) for item in lines]
print(output)
</code></pre>
<p>Output (line breaks added for readability):</p>
<pre><code>[
('X=250.44', 'DECEMBER 31,'),
('X=307.5', 'respectively. The net decrease in the revenue'),
('X=49.5', '(US$ in millions)')
]
</code></pre>
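<p>Note that <code>re.search</code> returns <code>None</code> when a line does not match, so <code>match.groups()</code> would raise an <code>AttributeError</code> on such input. A minimal sketch of a more defensive variant follows; the names <code>PATTERN</code> and <code>extract_or_none</code> and the shortened sample lines are made up for illustration:</p>

```python
import re

# Same pattern as in the answer, compiled once for reuse.
PATTERN = re.compile(r"(X=\d+(?:\.\d*)?).*?\](.*?)$")

def extract_or_none(s):
    """Return (x_field, text), or None when the line does not match."""
    match = PATTERN.search(s)
    return match.groups() if match else None

# Hypothetical sample lines: one in the expected format, one without coordinates.
samples = [
    "[Base Font : A, Font Size : 1.0] [(X=49.5,Y=233.98) height=3.5 width=2.5](US$ in millions)",
    "a line without coordinates",
]
results = [extract_or_none(s) for s in samples]
print(results)
```

<p>Callers can then filter out the <code>None</code> entries or log the offending lines, depending on how strict the parsing needs to be.</p>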
<p>Explanation:</p>
<ul>
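<li><code>\d</code> ... a digit</li>
<li><code>\d+</code> ... one or more digits</li>
<li><code>(?:...)</code> ... non-capturing ("plain") parentheses, used only for grouping</li>
<li><code>\.\d*</code> ... a dot followed by zero or more digits</li>
<li><code>(?:\.\d*)?</code> ... an optional (zero or one) "fractional part"</li>
<li><code>(X=\d+(?:\.\d*)?)</code> ... the first group, <code>X=number</code></li>
<li><code>.*?</code> ... zero or more of any character (non-greedy)</li>
<li><code>\]</code> ... a literal <code>]</code> character</li>
<li><code>$</code> ... end of the string</li>
<li><code>\](.*?)$</code> ... the second group, everything between the <code>]</code> and the end of the string</li>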
</ul>
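<p>The breakdown above can also be kept next to the pattern itself by using <code>re.VERBOSE</code>, which ignores whitespace in the pattern and allows <code>#</code> comments. This is only a sketch of the same regex in annotated form; <code>VERBOSE_PATTERN</code> and the conversion of the X value to <code>float</code> are illustrative additions, not part of the original answer:</p>

```python
import re

VERBOSE_PATTERN = re.compile(
    r"""
    (X=\d+(?:\.\d*)?)   # group 1: X= followed by a number, fractional part optional
    .*?                 # as few characters as possible ...
    \]                  # ... up to the next literal closing bracket
    (.*?)               # group 2: the remaining text
    $                   # end of the string
    """,
    re.VERBOSE,
)

line = (
    "[Base Font : IOHLGA+Trebuchet, Font Size : 3.5324998, Font Weight : 0.0] "
    "[(X=250.44,Y=223.48499) height=3.5324998 width=4.2910004]DECEMBER 31,"
)

match = VERBOSE_PATTERN.search(line)
x_field, text = match.groups()

# Illustrative extra step: strip the leading "X=" to get a numeric value.
x_value = float(x_field[2:])
print(x_field, x_value, text)
```

<p>The verbose form matches exactly the same strings as the one-line pattern; it only trades compactness for readability.</p>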