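<p>Scraping a heavily formatted table with Beautiful Soup is bound to be painful (no knock on Beautiful Soup, it's wonderful for several use cases). If you're willing to be a bit pragmatic about it, there's a "hack" I use to scrape data that is surrounded by dense markup:</p>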
<pre><code>1. Select entire table on web page
2. Copy + paste into Evernote (simplifies and reformats the HTML)
3. Copy + paste from Evernote into Excel or other spreadsheet software (removes the HTML)
4. Save as .csv
</code></pre>
<p><strong>Input</strong>
<a href="https://i.stack.imgur.com/nrGQS.png" rel="nofollow noreferrer"><img src="https://i.stack.imgur.com/nrGQS.png" alt="heavily formatted data surrounded with dense HTML"/></a>
<strong>Output</strong>
<a href="https://i.stack.imgur.com/UFEsc.png" rel="nofollow noreferrer"><img src="https://i.stack.imgur.com/UFEsc.png" alt="minimally formatted data in csv"/></a></p>
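<p>It's not perfect. There will be empty rows in the CSV, but removing empty rows is easier and far less time-consuming than scraping this kind of data. Good luck!</p>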
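<p>If you'd rather not delete the empty rows by hand, here is a minimal sketch of one way to do it with Python's standard <code>csv</code> module. The function names (<code>drop_empty_rows</code>, <code>clean_csv</code>) and the sample data are made up for illustration, and it assumes a row counts as "empty" when every cell is blank or whitespace:</p>

```python
import csv
import io


def drop_empty_rows(rows):
    """Yield only rows that contain at least one non-blank cell.

    Hypothetical helper: a row is treated as empty when it has no cells
    or when every cell is blank or whitespace-only.
    """
    for row in rows:
        if any(cell.strip() for cell in row):
            yield row


def clean_csv(src_path, dst_path):
    """Copy src_path to dst_path, skipping empty rows (paths are up to you)."""
    with open(src_path, newline="", encoding="utf-8") as src, \
         open(dst_path, "w", newline="", encoding="utf-8") as dst:
        csv.writer(dst).writerows(drop_empty_rows(csv.reader(src)))


if __name__ == "__main__":
    # Made-up sample standing in for the CSV exported from the spreadsheet.
    sample = "Player,Goals\r\n,\r\nSmith,12\r\n\r\nJones,7\r\n"
    for row in drop_empty_rows(csv.reader(io.StringIO(sample))):
        print(row)
```

<p>For a real file you would call something like <code>clean_csv("export.csv", "export_clean.csv")</code>, where both file names are placeholders.</p>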
<p>For reference, I've linked my own conversions below:</p>
<ul>
<li><a href="https://www.evernote.com/shard/s413/client/snv?noteGuid=f17165aa-f11a-46c9-8954-f1e94f88f80e&noteKey=cf7338c3e903b5ce&sn=https%3A%2F%2Fwww.evernote.com%2Fshard%2Fs413%2Fsh%2Ff17165aa-f11a-46c9-8954-f1e94f88f80e%2Fcf7338c3e903b5ce&title=Skater%2BStatistics" rel="nofollow noreferrer">Parsed to Evernote</a></li>
<li><a href="https://docs.google.com/spreadsheets/d/14zVYnXhOC_yi_PwSL0qHYyasqk7ljuo2-tePdmwkg9c/edit?usp=sharing" rel="nofollow noreferrer">Parsed to Excel</a></li>
</ul>