I'm trying to learn data science with Python on SimpleArn. In the matplotlib section, they scrape a web page from here:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
from urllib.request import urlopen
from bs4 import BeautifulSoup
import re
url="https://www.hubertiming.com/results/2018MLK" #OPEN LINK
html=urlopen(url)  # names are case-sensitive: the variable above is url, not URL
soup=BeautifulSoup(html,"lxml")
title = soup.title
print (title)
print(title.text)
links = soup.find_all('a',href=True)
for link in links:
    print (link['href'])
data =[]
allrows=soup.find_all("tr")
for row in allrows:
    row_list = row.find_all("td")
    dataRow=[]
    data_converted = []
    for cell in row_list:
        dataRow.append(cell.text)
    data.append(dataRow)
data=data[4:]
print(data[-2:])
This is the result:
[['190', '2087', '\r\n\r\n LEESHA POSEY\r\n\r\n ', 'F', '43', 'PORTLAND', 'OR', '1:33:53', '30:17', '\r\n\r\n 112 of 113\r\n\r\n ', 'F 40-54', '\r\n\r\n 36 of 37\r\n\r\n ', '0:00', '1:33:53'], ['191', '1216', '\r\n\r\n ZULMA OCHOA\r\n\r\n ', 'F', '40', 'GRESHAM', 'OR', '1:43:27', '33:22', '\r\n\r\n 113 of 113\r\n\r\n ', 'F 40-54', '\r\n\r\n 37 of 37\r\n\r\n ', '0:00', '1:43:27']]
How can I get rid of the \r\n\r\n? I already tried the replace function, but it says "'list' object has no attribute 'replace'", and I can't use strip either.
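The error comes from calling replace() on a list: replace() and strip() are str methods, while data (and every row inside it) is a list. A minimal sketch, using one row copied from the output above (the whitespace may not match the live page exactly):

```python
# One row copied from the scraped output above (trimmed to five columns).
row = ['190', '2087', '\r\n\r\n LEESHA POSEY\r\n\r\n ', 'F', '43']

# replace() is a str method, so calling it on the list itself fails.
error_message = ""
try:
    row.replace('\r\n', '')
except AttributeError as exc:
    error_message = str(exc)
print(error_message)  # 'list' object has no attribute 'replace'

# Calling it on a single string element works; strip() then trims the leftover spaces.
name = row[2].replace('\r\n', '').strip()
print(name)  # LEESHA POSEY
```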
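You can simply do this: in your code, change
cell.text
to cell.text.strip(), as shown below:
- .read_html() will not work.
- Code such as df.Name = df.Name.str.strip() or df.Name = df.Name.str.replace('\r', '') will do the job (the latter removes only the \r characters, not the \n or the spaces).
- You have a 2D list.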
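A small sketch of the pandas route, assuming the rows have already been loaded into a DataFrame. The column names Place, Bib, Name and Gender are assumptions for illustration, not taken from the page. It also shows why str.strip() is the safer of the two calls:

```python
import pandas as pd

# Two rows from the scraped output above, trimmed to four columns.
# The column names are hypothetical, chosen only for this example.
rows = [
    ['190', '2087', '\r\n\r\n LEESHA POSEY\r\n\r\n ', 'F'],
    ['191', '1216', '\r\n\r\n ZULMA OCHOA\r\n\r\n ', 'F'],
]
df = pd.DataFrame(rows, columns=['Place', 'Bib', 'Name', 'Gender'])

# str.replace('\r', '') removes only carriage returns; newlines and spaces remain.
only_cr_removed = df.Name.str.replace('\r', '', regex=False)
print(repr(only_cr_removed[0]))  # '\n\n LEESHA POSEY\n\n '

# str.strip() removes all leading/trailing whitespace, including \r and \n.
df.Name = df.Name.str.strip()
print(df.Name.tolist())  # ['LEESHA POSEY', 'ZULMA OCHOA']
```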
What we are using: the
strip()
method, with the following code:
Output:
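A minimal sketch of the strip() approach applied to the 2D list, using the two rows printed in the question. Because data is a list of lists, strip() has to be called on every string in every row, which a nested list comprehension does in one line:

```python
# The last two rows printed in the question, copied verbatim.
data = [
    ['190', '2087', '\r\n\r\n LEESHA POSEY\r\n\r\n ', 'F', '43', 'PORTLAND', 'OR',
     '1:33:53', '30:17', '\r\n\r\n 112 of 113\r\n\r\n ', 'F 40-54',
     '\r\n\r\n 36 of 37\r\n\r\n ', '0:00', '1:33:53'],
    ['191', '1216', '\r\n\r\n ZULMA OCHOA\r\n\r\n ', 'F', '40', 'GRESHAM', 'OR',
     '1:43:27', '33:22', '\r\n\r\n 113 of 113\r\n\r\n ', 'F 40-54',
     '\r\n\r\n 37 of 37\r\n\r\n ', '0:00', '1:43:27'],
]

# Outer comprehension walks the rows, inner one strips each cell string.
cleaned = [[cell.strip() for cell in row] for row in data]

for row in cleaned:
    print(row)
```

Stripping at scrape time instead, with dataRow.append(cell.text.strip()) in the original loop, avoids this extra cleanup pass.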