<p>I am trying to learn data science with Python on SimpleArn. In the matplotlib section, they scrape the web page <a href="https://www.hubertiming.com/results/2018MLK" rel="nofollow noreferrer">here</a>.</p>
<pre class="lang-py prettyprint-override"><code>import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
from urllib.request import urlopen
from bs4 import BeautifulSoup
import re
url = "https://www.hubertiming.com/results/2018MLK"  # page to scrape
html = urlopen(url)
soup = BeautifulSoup(html, "lxml")
title = soup.title
print(title)
print(title.text)
# Print every link target on the page
links = soup.find_all('a', href=True)
for link in links:
    print(link['href'])
# Collect the text of every table cell, one list per table row
data = []
allrows = soup.find_all("tr")
for row in allrows:
    row_list = row.find_all("td")
    dataRow = []
    data_converted = []
    for cell in row_list:
        dataRow.append(cell.text)  # cell.text keeps surrounding whitespace
    data.append(dataRow)
data = data[4:]  # drop the first four rows
print(data[-2:])
</code></pre>
<p>This is the result:</p>
<pre><code>[['190', '2087', '\r\n\r\n LEESHA POSEY\r\n\r\n ', 'F', '43', 'PORTLAND', 'OR', '1:33:53', '30:17', '\r\n\r\n 112 of 113\r\n\r\n ', 'F 40-54', '\r\n\r\n 36 of 37\r\n\r\n ', '0:00', '1:33:53'], ['191', '1216', '\r\n\r\n ZULMA OCHOA\r\n\r\n ', 'F', '40', 'GRESHAM', 'OR', '1:43:27', '33:22', '\r\n\r\n 113 of 113\r\n\r\n ', 'F 40-54', '\r\n\r\n 37 of 37\r\n\r\n ', '0:00', '1:43:27']]
</code></pre>
<p>How can I get rid of the <code>\r\n\r\n</code>? I tried the <code>replace</code> function, but it fails with <code>'list' object has no attribute 'replace'</code>, and I can't use <code>strip</code> either.</p>
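<p>A minimal sketch of one possible fix, assuming the error comes from calling <code>replace</code> on a row (which is a list) rather than on each string inside it. <code>replace</code> and <code>strip</code> are string methods, so they have to be applied cell by cell. The sample rows are abridged from the output above, and <code>clean_rows</code> is a hypothetical helper name, not part of the tutorial.</p>

```python
# Stand-in data: a few cells copied from the scraped output above (abridged).
rows = [
    ['190', '2087', '\r\n\r\n LEESHA POSEY\r\n\r\n ', 'F', '\r\n\r\n 112 of 113\r\n\r\n '],
    ['191', '1216', '\r\n\r\n ZULMA OCHOA\r\n\r\n ', 'F', '\r\n\r\n 113 of 113\r\n\r\n '],
]


def clean_rows(rows):
    # str.strip() with no arguments removes leading and trailing whitespace,
    # which includes carriage returns, newlines, and spaces.
    # It is called on each cell (a str), not on the row (a list).
    return [[cell.strip() for cell in row] for row in rows]


cleaned = clean_rows(rows)
print(cleaned[0])  # ['190', '2087', 'LEESHA POSEY', 'F', '112 of 113']
```

<p>In the scraping loop itself, the same effect should be achievable by appending <code>cell.text.strip()</code> instead of <code>cell.text</code>, or by using BeautifulSoup's <code>cell.get_text(strip=True)</code>.</p>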