Answers to: Parsing a website table with empty entries in Python


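I am writing some Python code to fetch data from a website. The table is well laid out, and most of the time everything works fine.

However, when the parser encounters a blank field, it skips it entirely. I need blank cells to be counted, but I don't know how to do that. As a result, some of the arrays I am using give me <code>out of bounds</code> errors.

Anyway, here is my code:

<pre><code>import urllib                       # Python 2 (urllib.urlopen)
from HTMLParser import HTMLParser   # Python 2 module name
from django.http import HttpResponse  # assumption: HttpResponse is Django's


class MyParser(HTMLParser):
    def __init__(self, *args, **kwargs):
        # There are only 2 tables in the source code. Outer one is useless to me
        self.outerloop = True
        # Set to true when we are in the table, and we want to collect data
        self.capture_data = False
        # Array to store the captured data
        self.dataArray = []
        HTMLParser.__init__(self, *args, **kwargs)

    def handle_starttag(self, tag, attrs):
        if tag == 'table' and self.outerloop:
            self.outerloop = False
        elif tag == 'td' and not self.outerloop:
            self.capture_data = True
        elif tag == 'th':
            self.capture_data = False

    def handle_endtag(self, tag):
        if tag == 'table':
            self.capture_data = False

    def handle_data(self, data):
        # Never called for an empty cell, so nothing is appended for it
        if self.capture_data:
            self.dataArray.append(data)


# Function to call the parser (a method of the calling class, which is not shown)
def getData(self):
    self.p = MyParser()
    url = 'http://www.mysite.com/get.php'
    content = urllib.urlopen(url).read()
    self.p.feed(content)
    val = 0
    resultString = ""
    while val < len(self.p.dataArray):
        resultString += self.p.dataArray[val] + ","
        val += 1
    return HttpResponse(resultString[:-1])
</code></pre>

The problem is in the <code>handle_data</code> function. Somehow I need to tell it to store <code><td></td></code> as <code>""</code>, i.e. an empty string. This matters because I output the data to my web page as a comma-separated list of values, as shown at the bottom of the code.

If anyone can help me, I would really appreciate it. Thanks.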
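The behavior described in the question can be reproduced in isolation. The following is a minimal sketch, assuming Python 3's `html.parser` rather than the Python 2 module used in the question; `DataOnlyParser` and `collect_cells` are hypothetical names introduced only for this illustration. It shows that `handle_data` is simply not called for a cell with no content, so the collected list ends up one entry short.

```python
from html.parser import HTMLParser


class DataOnlyParser(HTMLParser):
    """Hypothetical parser that records only what handle_data receives."""

    def __init__(self):
        super().__init__()
        self.cells = []

    def handle_data(self, data):
        # Called only when there is text between two tags
        self.cells.append(data)


def collect_cells(html):
    """Feed the markup to the parser and return the collected text chunks."""
    parser = DataOnlyParser()
    parser.feed(html)
    parser.close()
    return parser.cells


row = "<tr><td>1.2</td><td></td><td>3.4</td></tr>"
print(collect_cells(row))  # ['1.2', '3.4'] : the empty middle cell is missing
```

Three cells go in and two values come out, which is why index-based access on the result later runs out of bounds.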


1 Answer

Anonymous, 1 day ago

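OK, I know that answering your own question is frowned upon, but in case anyone runs into this problem in the future, I'll post my source.

I fixed it with two integers, both starting at 0. When I encounter the start tag in question (<code>td</code>), I increment the first number. When data is handled, I bring the second number up to match the first. When I reach the end tag for that particular tag, I check whether the two numbers are equal; if data was handled, they should be.

If the numbers are not equal, it means the program handled no data, i.e. the cell was blank. I then simply append the filler value <code>NA</code> to the array, and that makes it work. See here:

<pre><code>import urllib                       # Python 2 (urllib.urlopen)
from HTMLParser import HTMLParser   # Python 2 module name
from django.http import HttpResponse  # assumption: HttpResponse is Django's


class MyHTMLParser(HTMLParser):
    def __init__(self, *args, **kwargs):
        self.outerloop = True
        self.capture_data = False
        self.dataArray = []
        self.celldata = "NA"   # filler stored for blank cells
        self.firstnum = 0      # counts td start tags seen
        self.secondnum = 0     # catches up to firstnum once data is handled
        HTMLParser.__init__(self, *args, **kwargs)

    def handle_starttag(self, tag, attrs):
        if tag == 'table' and self.outerloop:
            self.outerloop = False
        elif tag == 'td' and not self.outerloop:
            self.capture_data = True  # bool to indicate we want to capture data
            self.firstnum += 1  # increment first num to say we have encountered the tag in question
        elif tag == 'th':
            self.capture_data = False

    def handle_endtag(self, tag):
        if tag == 'table':
            self.capture_data = False
        elif tag == 'td' and not self.firstnum == self.secondnum:  # check if they are not equal
            self.dataArray.append(self.celldata)  # append filler data
            self.secondnum = self.firstnum  # make them equal for next tag

    def handle_data(self, data):
        if self.capture_data:
            self.dataArray.append(data)
            self.secondnum = self.firstnum


# Method of the calling class, which is not shown
def getTides(self):
    self.p = MyHTMLParser()
    url = 'http://www.mysite.com/page.php'
    content = urllib.urlopen(url).read()
    self.p.feed(content)
    val = 0
    resultString = ""
    while val < len(self.p.dataArray):
        resultString += self.p.dataArray[val] + ","
        val += 1
    return HttpResponse(resultString[:-1])
</code></pre>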
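For readers on Python 3, the same goal (one entry per cell, with a filler for blank cells) can also be reached by buffering text between `<td>` and `</td>` and deciding what to store when the cell closes. This is only a sketch of an alternative technique, not the answer's two-counter method. It assumes that the data table is nested inside the outer table, and the names `CellParser`, `parse_cells`, and `filler` are hypothetical.

```python
from html.parser import HTMLParser


class CellParser(HTMLParser):
    """Collects one entry per <td> of a nested table; blank cells get a filler."""

    def __init__(self, filler="NA"):
        super().__init__()
        self.filler = filler
        self.table_depth = 0   # assumption: the data table is nested (depth 2)
        self.in_cell = False
        self.buffer = []
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.table_depth += 1
        elif tag == "td" and self.table_depth >= 2:
            self.in_cell = True
            self.buffer = []   # start a fresh buffer for this cell

    def handle_endtag(self, tag):
        if tag == "table":
            self.table_depth -= 1
        elif tag == "td" and self.in_cell:
            # Exactly one append per cell, whether or not any data arrived
            text = "".join(self.buffer).strip()
            self.cells.append(text or self.filler)
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell:
            self.buffer.append(data)


def parse_cells(html, filler="NA"):
    """Return the cell values of the nested table as a list of strings."""
    parser = CellParser(filler)
    parser.feed(html)
    parser.close()
    return parser.cells


page = (
    "<table><tr><td><table>"
    "<tr><th>a</th><th>b</th><th>c</th></tr>"
    "<tr><td>1.2</td><td></td><td>3.4</td></tr>"
    "</table></td></tr></table>"
)
print(",".join(parse_cells(page)))  # 1.2,NA,3.4
```

Because the append happens in `handle_endtag`, whitespace between cells is never stored, and `",".join(...)` replaces the manual loop that builds the comma-separated string.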
