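<p>I wouldn't use regular expressions here. Judging by the text, once the whitespace is removed the string roughly follows, and repeats, this format:</p>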
<pre><code>thing 1
score
thing 2
"final"
</code></pre>
<p>So I would clean up the string, iterate over it, and return each group of four lines as an entry in a dictionary.</p>
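<p>For example, given a <code>chunk</code> helper that yields successive groups of <code>n</code> items from a list, a <code>get_scores</code> function can build a dictionary keyed by the first name in each group.</p>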
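<p>A minimal sketch of the two functions this answer relies on, <code>chunk</code> and <code>get_scores</code>. The body of <code>chunk</code> is an assumption inferred from the call <code>chunk(clean, 4)</code> used in this answer; dropping a trailing incomplete group is likewise an assumed choice, made so the four-way unpacking never fails.</p>

```python
def chunk(items, size):
    """Yield successive groups of `size` items (assumed helper).

    A trailing group with fewer than `size` items is dropped.
    """
    for start in range(0, len(items) - size + 1, size):
        yield items[start:start + size]


def get_scores(raw):
    # Strip each line and discard the blank ones.
    clean = [line.strip() for line in raw.split('\n') if line.strip() != '']
    # Each group of four lines is: thing 1, score, thing 2, "final".
    return {thing1: (thing1, score, thing2)
            for (thing1, score, thing2, _) in chunk(clean, 4)}
```

<p>This version keeps the original capitalization, so the keys must be looked up exactly as they appear in the text.</p>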
<p>Then you can do:</p>
<pre><code>>>> raw = ''.join(soup.findAll(text=True))
>>> scores = get_scores(raw)
>>> print(scores['Norfolk St.'])
('Norfolk St.', '0 - 38', 'Rutgers')
</code></pre>
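<p>If you want the lookup to be case-insensitive, lowercase every line as you clean it (and lowercase the key you look up as well):</p>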
<pre><code>def get_scores(raw):
    # Strip and lowercase each line, discarding the blank ones.
    clean = [line.strip().lower() for line in raw.split('\n') if line.strip() != '']
    return {thing1: (thing1, score, thing2) for (thing1, score, thing2, _) in chunk(clean, 4)}
</code></pre>
<p>If you want to look up either "Norfolk St." or "Rutgers" and get the same result, store the tuple under both names:</p>
<pre><code>def get_scores(raw):
    clean = [line.strip().lower() for line in raw.split('\n') if line.strip() != '']
    output = {}
    for (thing1, score, thing2, _) in chunk(clean, 4):
        data = (thing1, score, thing2)
        # Register the same tuple under both names.
        output[thing1] = data
        output[thing2] = data
    return output
</code></pre>
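<p>Because this version lowercases the keys, the name you look up has to be lowercased too. The following is a self-contained sketch of that lookup; <code>lookup</code> is a hypothetical helper that is not part of the answer's code, <code>chunk</code> is an assumed implementation of the helper called above, and the sample text only mimics the four-line format.</p>

```python
def chunk(items, size):
    # Assumed helper: yield full groups of `size` items, dropping any remainder.
    for start in range(0, len(items) - size + 1, size):
        yield items[start:start + size]


def get_scores(raw):
    clean = [line.strip().lower() for line in raw.split('\n') if line.strip() != '']
    output = {}
    for (thing1, score, thing2, _) in chunk(clean, 4):
        data = (thing1, score, thing2)
        output[thing1] = data
        output[thing2] = data
    return output


def lookup(scores, team):
    # Hypothetical helper: normalize the query the same way the keys were
    # normalized; returns None when the team is not present.
    return scores.get(team.strip().lower())


# Sample text shaped like the four-line format described above.
raw = "Norfolk St.\n0 - 38\nRutgers\nFinal\n"
scores = get_scores(raw)
print(lookup(scores, "NORFOLK ST."))
print(lookup(scores, "rutgers"))
```

<p>Both calls print the same lowercased tuple, <code>('norfolk st.', '0 - 38', 'rutgers')</code>, since the two names point at one shared entry.</p>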