<pre><code>from __future__ import print_function
import re
import datetime
from bs4 import BeautifulSoup

# Parse the saved page.
with open("/tmp/a.html") as page:
    soup = BeautifulSoup(page.read(), "html.parser")

# The data table sits inside a div identified by its inline style.
table = soup.find('div', {'style': 'overflow:auto; border:1px #cccccc solid;'}).find('table')
trs = table.find_all('tr')

game = ""
section = ""
for tr in trs:
    # Rows with a class attribute carry the game name.
    if tr.has_attr('class'):
        game = tr.text.strip('\n')
    if tr.has_attr('bgcolor'):
        if tr['bgcolor'] == '#CCE4F1':
            # Rows with this background colour are section headers.
            section = tr.text.strip('\n')
        else:
            # Any other coloured row is a data row.
            tds = tr.find_all('td')
            # Strip non-ASCII characters and surrounding whitespace from each cell.
            extracted_text = [re.sub(r'([^\x00-\x7F])+', '', td.text) for td in tds]
            extracted_text = [x.strip() for x in extracted_text]
            # Keep only cells longer than two characters.
            extracted_text = list(filter(lambda x: len(x) > 2, extracted_text))
            # Drop the second remaining cell.
            extracted_text.pop(1)
            extracted_text[2] = "Player " + extracted_text[2]
            # Convert e.g. '09/14/16 07:30 PM' to '2016-09-14'.
            extracted_text[3] = datetime.datetime.strptime(extracted_text[3], '%m/%d/%y %I:%M %p').strftime("%Y-%m-%d")
            # Quote every field and print the row as one CSV line.
            extracted_text = ['"' + x + '"' for x in [game, section] + extracted_text]
            print(','.join(extracted_text))
</code></pre>
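<p>To make the per-row cleaning easier to follow without the HTML file, here is a minimal sketch that applies the same steps to a plain list of cell strings. The function name <code>clean_row</code> and the sample cell values are hypothetical and only for illustration; the real page's columns may differ. It assumes Python 3.</p>

```python
import re
import datetime


def clean_row(cells, game, section):
    """Apply the same cleaning steps as the loop above to one row of cell texts."""
    # Remove non-ASCII characters, then surrounding whitespace.
    texts = [re.sub(r'[^\x00-\x7F]+', '', c).strip() for c in cells]
    # Keep only cells longer than two characters.
    texts = [t for t in texts if len(t) > 2]
    # Drop the second remaining cell, as the original code does.
    texts.pop(1)
    texts[2] = "Player " + texts[2]
    # Reformat the timestamp as an ISO date.
    texts[3] = datetime.datetime.strptime(
        texts[3], '%m/%d/%y %I:%M %p').strftime('%Y-%m-%d')
    # Quote every field and join into one CSV line.
    return ','.join('"' + x + '"' for x in [game, section] + texts)


# Hypothetical cell texts, not taken from the real page.
cells = ['1', 'Team A\xa0', 'dropped cell', 'Forward', 'Smith', '09/14/16 07:30 PM']
print(clean_row(cells, 'Game 1', 'Section A'))
```

<p>Note that the steps depend on cell positions after filtering, so a row with fewer long cells than expected would raise an <code>IndexError</code> or a <code>ValueError</code> from <code>strptime</code>.</p>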
<p>When run, the script prints one quoted, comma-separated line per data row.</p>
<p>Based on further conversation with the OP, the input is <a href="https://paste.fedoraproject.org/428111/87928814/raw/" rel="nofollow">https://paste.fedoraproject.org/428111/87928814/raw/</a>, and the output after running the code above is: <a href="https://paste.fedoraproject.org/428110/38792211/raw/" rel="nofollow">https://paste.fedoraproject.org/428110/38792211/raw/</a></p>
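<p>The code wraps each field in double quotes by hand, which would produce malformed CSV if a cell ever contained a double quote itself. As an alternative sketch (not part of the original answer, and assuming Python 3), the standard <code>csv</code> module can do the quoting and escaping. The helper name <code>to_csv_line</code> and the sample values are hypothetical.</p>

```python
import csv
import io


def to_csv_line(fields):
    """Return the fields as one fully quoted CSV line, without a trailing newline."""
    buf = io.StringIO()
    # QUOTE_ALL quotes every field; embedded quotes are doubled by default.
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL)
    writer.writerow(fields)
    return buf.getvalue().rstrip('\r\n')


# Hypothetical row containing an embedded double quote.
print(to_csv_line(['Game 1', 'Section A', 'Player "Ace" Smith', '2016-09-14']))
```

<p>For fields without embedded quotes this yields the same text as the manual approach.</p>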