Extracting numbers from a table on the web

import urllib2
from bs4 import BeautifulSoup
import re

url = 'http://www.saiawos2.com/K61/15MinuteReport.php'
page = urllib2.urlopen(url)
soup = BeautifulSoup(page.read())

table = soup.findAll('table')[0]
rows = table.findAll('tr')

second_columns = []
thirteen_columns = []

for row in rows[1:]:
    second_columns.append(row.findAll('td')[1])  #Column with times
    thirteen_columns.append(row.findAll('td')[12])  #Precipitation Column

for second, thirteen in zip(second_columns, thirteen_columns):
    times = ['12:00','11:00','10:00','09:00','08:00','07:00','06:00',
             '05:00','04:00','03:00','02:00','01:00','00:00','23:00',
             '22:00','21:00','20:00','19:00','18:00','17:00','16:00',
             '15:00','14:00','13:00',]
    time = '|'.join(times)
    if re.search(time, second.text):
        pcpn = re.sub('[^0-9]', '', thirteen.text)  #Get rid of text
        print sum(pcpn[1:])  #Print sum and get rid of leading zero
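The time filter above joins the 24 on-the-hour strings into one alternation pattern (`12:00|11:00|...`), so `re.search` succeeds for any time cell that falls on the hour. A minimal Python 3 sketch of just that step, with made-up cell strings standing in for the scraped `second.text` values (the names `hourly` and `cells` are illustrative, not from the question):

```python
import re

# The same 24 on-the-hour times as the hard-coded list in the question,
# generated instead of typed out (the order differs, which doesn't matter).
times = ['%02d:00' % hour for hour in range(24)]
hourly = re.compile('|'.join(times))  # compiled once, outside any loop

# Hypothetical time-cell texts standing in for second.text.
cells = ['13:00', '13:15', '13:30', '13:45', '14:00']
on_the_hour = [cell for cell in cells if hourly.search(cell)]
print(on_the_hour)  # ['13:00', '14:00']
```

Building the pattern once outside the loop avoids re-joining the list for every row; in the question it is rebuilt inside the `for` loop, which gives the same result, just with repeated work.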

1 answer

Anonymous user

Answer #1 · posted 2024-09-30 03:24:01

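The problem is that sum() adds up numbers, but pcpn[1:] is a unicode string, i.e. a sequence of characters, and characters can't be summed, so Python raises a TypeError.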
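To see the failure in isolation, here is a small Python 3 sketch (in Python 3 `str` takes the place of Python 2's `unicode`, so the error message names `str`). The cell text `'0.01 in'` is a made-up example, not taken from the real page. It also shows that `re.sub('[^0-9]', '', ...)` throws away the decimal point, so the leftover digits no longer represent the original amount:

```python
import re

cell_text = '0.01 in'                      # hypothetical precipitation cell
pcpn = re.sub('[^0-9]', '', cell_text)     # drops '.', ' ', 'i', 'n'
print(repr(pcpn))                          # '001'

error = None
try:
    sum(pcpn[1:])                          # sum('01') starts at 0, then 0 + '0' fails
except TypeError as err:
    error = err                            # keep a reference after the except block
    print('TypeError:', err)
```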

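You need to convert each element to a number first and pass those numbers to sum. Since precipitation amounts can have decimals, the code below uses float rather than int: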

if re.search(time, second.text):
    pcpn = re.findall(r'[0-9.]+', thirteen.text)
    print sum(float(x) for x in pcpn)
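The same fix as a self-contained Python 3 sketch that runs on a few hard-coded cell texts instead of the live page. The strings are invented for illustration; the real cells may be formatted differently:

```python
import re

# Hypothetical precipitation cell texts standing in for thirteen.text.
cells = ['0.02 in', '0.00 in', '0.10 in']

totals = []
for text in cells:
    pcpn = re.findall(r'[0-9.]+', text)   # list of number strings, e.g. ['0.02']
    total = sum(float(x) for x in pcpn)   # convert each string, then add
    totals.append(total)

print(totals)  # [0.02, 0.0, 0.1]
```

If a cell contains no number, `findall` returns an empty list and the sum is `0`. If a cell contains a lone `.` (for example a full stop after the unit, as in `'0.02 in.'`), `[0-9.]+` matches it too and `float('.')` raises `ValueError`; a stricter pattern such as `r'\d+(?:\.\d+)?'` may be safer for messier cells.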

How does it work?

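- re.findall(r'[0-9.]+', thirteen.text): instead of re.sub, this uses re.findall, which returns a list of all matches. Each match is a run of digits and dots, i.e. a number such as 0.01, so the decimal point is kept. That list is what gets passed on to sum().
- sum(float(x) for x in pcpn) converts each element to a float and adds the results.
  - (float(x) for x in pcpn) is a generator expression, which produces the converted values one at a time as sum() asks for them, instead of building a whole list first.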
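A small Python 3 sketch of the difference between the generator expression and an equivalent list comprehension, using a hypothetical `pcpn` list like the one `re.findall` returns. Both give the same sum; the generator just converts values lazily and can be consumed only once:

```python
pcpn = ['0.25', '0.50', '1.00']   # hypothetical output of re.findall

gen = (float(x) for x in pcpn)    # generator expression: nothing converted yet
lst = [float(x) for x in pcpn]    # list comprehension: everything converted now

first_pass = sum(gen)             # consumes the generator
second_pass = sum(gen)            # generator is exhausted, so this sums nothing

print(type(gen).__name__)         # generator
print(first_pass, sum(lst))       # 1.75 1.75
print(second_pass)                # 0
```

When a generator expression is the only argument to a call, its own parentheses can be omitted, which is why `sum(float(x) for x in pcpn)` needs no second pair.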
