Python SAX parser program computes the wrong result

    <?xml version="1.0" encoding="UTF-8"?>
    <lieferungen xmlns="urn:myspace:lieferungen"
                 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                 xsi:schemaLocation="urn:myspace:lieferungen C:\xml\lieferungen.xsd">
        <artikel id="3526">
            <name>apfel</name>
            <preis stueckpreis="true">8.97</preis>
            <lieferant>Fa. Krause</lieferant>
        </artikel>
        <artikel id="7866">
            <name>Kirschen</name>
            <preis stueckpreis="false">10.45</preis>
            <lieferant>Fa. Helbig</lieferant>
        </artikel>
        <artikel id="4444">
            <name>apfel</name>
            <preis stueckpreis="true">12.67</preis>
            <lieferant>Fa. Liebig</lieferant>
        </artikel>
        <artikel id="7866">
            <name>Kirschen</name>
            <preis stueckpreis="false">17.67</preis>
            <lieferant>Fa. Krause</lieferant>
        </artikel>
        <artikel id="2345">
            <name>apfel</name>
            <preis stueckpreis="true">9.54</preis>
            <lieferant>Fa. Mertes</lieferant>
        </artikel>
        <artikel id="7116">
            <name>Kirschen</name>
            <preis stueckpreis="false">16.45</preis>
            <lieferant>Fa. Hoeller</lieferant>
        </artikel>
        <artikel id="7868">
            <name>Kohl</name>
            <preis stueckpreis="false">3.20</preis>
            <lieferant>Fa. Hoeller</lieferant>
        </artikel>
        <artikel id="7866">
            <name>Kirschen</name>
            <preis stueckpreis="false">12.45</preis>
            <lieferant>Fa. Richard</lieferant>
        </artikel>
        <artikel id="3245">
            <name>Bananen</name>
            <preis stueckpreis="false">15.67</preis>
            <lieferant>Fa. Hoeller</lieferant>
        </artikel>
        <artikel id="6745">
            <name>Kohl</name>
            <preis stueckpreis="false">3.10</preis>
            <lieferant>Fa. Reinhardt</lieferant>
        </artikel>
        <artikel id="7789">
            <name>Ananas</name>
            <preis stueckpreis="true">8.60</preis>
            <lieferant>Fa. Richard</lieferant>
        </artikel>
    </lieferungen>

    import xml.sax
    import sys

    class C_Handler(xml.sax.ContentHandler):
        def __init__(self):
            self.items = {}
            self.items2 = {}
            self.read = 0
            self.id = 0

        def startDocument(self):
            print("Inconsistencies:\n")

        def startElement(self, tag, attributes):
            if tag=="name":
                self.read = 1
            if tag=="artikel":
                self.id = attributes["id"]

        def endElement(self, tag):
            if tag=="name":
                self.read = 0

        def characters(self, content):
            if self.read == 1:
                item = content
                #check whether the item is not yet part of the dictionaries
                if item not in self.items:
                    #add item (e.g. "apfel") to both dictionary "items" and
                    #dictionary "items2". The value for the item is the id in the
                    #case of dictionary "items" and "0" in the case of dictionary
                    #"items2". The second dictionary contains the number of
                    #inconsistencies for each product. At the beginning, the
                    #number of inconsistencies for the product is zero.
                    self.items[item] = self.id
                    self.items2[item] = 0
                else:
                    if self.items[item] == self.id:
                        #increase number of inconsistencies by 1:
                        self.items2[item] = self.items2[item] + 1

        def endDocument(self):
            for prod in self.items2:
                if self.items2[prod]>0:
                    print("There are {} different IDs for item \"{}\".".format(self.items2[prod] + 1, prod))

    if ( __name__ == "__main__"):
        c = C_Handler()
        xml.sax.parse("lieferungen.xml", c)

1 answer

User

Answer 1 · posted 2024-09-29 21:47:25

Unless I've misunderstood, the error is this line:

                if self.items[item] == self.id:

It should be:

                if self.items[item] != self.id:

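As things stand, your program seems to be counting consistencies rather than inconsistencies: Kirschen uses the ID 7866 three times, and no other product uses the same ID more than once, so the output you get is correct for what the code actually counts.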

With the change above, I get the following output:

    Inconsistencies:

    There are 3 different IDs for item "apfel".
    There are 2 different IDs for item "Kirschen".
    There are 2 different IDs for item "Kohl".
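One way to check that output without a file on disk is to feed the corrected handler an in-memory copy of the data through xml.sax.parseString. The sketch below is the question's handler with the != fix applied. Two things are assumptions added for this illustration: the sample is trimmed to the id attribute and the <name> element (the handler never looks at <preis> or <lieferant>, so this should not change the result), and the lines attribute exists only so the report can be inspected after parsing.

```python
import xml.sax

# Trimmed copy of the sample data (assumption: only id and <name> matter here).
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<lieferungen xmlns="urn:myspace:lieferungen">
  <artikel id="3526"><name>apfel</name></artikel>
  <artikel id="7866"><name>Kirschen</name></artikel>
  <artikel id="4444"><name>apfel</name></artikel>
  <artikel id="7866"><name>Kirschen</name></artikel>
  <artikel id="2345"><name>apfel</name></artikel>
  <artikel id="7116"><name>Kirschen</name></artikel>
  <artikel id="7868"><name>Kohl</name></artikel>
  <artikel id="7866"><name>Kirschen</name></artikel>
  <artikel id="3245"><name>Bananen</name></artikel>
  <artikel id="6745"><name>Kohl</name></artikel>
  <artikel id="7789"><name>Ananas</name></artikel>
</lieferungen>"""


class C_Handler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.items = {}   # item name -> first id seen for that name
        self.items2 = {}  # item name -> count of later ids that differ from it
        self.read = 0
        self.id = 0
        self.lines = []   # hypothetical: report lines kept for inspection

    def startDocument(self):
        print("Inconsistencies:\n")

    def startElement(self, tag, attributes):
        if tag == "name":
            self.read = 1
        if tag == "artikel":
            self.id = attributes["id"]

    def endElement(self, tag):
        if tag == "name":
            self.read = 0

    def characters(self, content):
        if self.read == 1:
            item = content
            if item not in self.items:
                self.items[item] = self.id
                self.items2[item] = 0
            elif self.items[item] != self.id:  # the fix: != instead of ==
                self.items2[item] += 1

    def endDocument(self):
        for prod in self.items2:
            if self.items2[prod] > 0:
                line = 'There are {} different IDs for item "{}".'.format(
                    self.items2[prod] + 1, prod)
                self.lines.append(line)
                print(line)


handler = C_Handler()
xml.sax.parseString(SAMPLE, handler)
```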

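Having said that, I'm not sure your code will always do what you want. Try moving the <artikel> element with ID 7116 above all the other <artikel> elements and run the code again. The code will then tell you that Kirschen has four different IDs, when in fact it only has two.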

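This is because the number of IDs the program reports for an item is one (for the first ID found for that item) plus the number of other <artikel> elements with that name whose ID differs from that first ID.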
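The overcounting can be reproduced without any XML. Below, the two counting rules are written as plain functions over a list of IDs; the function names are made up for this illustration. The lists are the Kirschen IDs in file order and after moving 7116 to the top.

```python
def ids_reported_by_first_id_rule(ids):
    """Mimic the question's logic (with the != fix): count the first ID,
    plus one for every later occurrence whose ID differs from the first."""
    first = ids[0]
    return 1 + sum(1 for i in ids[1:] if i != first)


def distinct_ids(ids):
    """What was actually wanted: the number of distinct IDs."""
    return len(set(ids))


original_order = ["7866", "7866", "7116", "7866"]  # Kirschen IDs in file order
moved_order = ["7116", "7866", "7866", "7866"]     # after moving 7116 to the top

for order in (original_order, moved_order):
    print(order, ids_reported_by_first_id_rule(order), distinct_ids(order))
```

With 7866 first, only the single 7116 differs, so the rule happens to give the right answer (2); with 7116 first, all three 7866 entries differ from it, giving 4 even though there are still only two distinct IDs.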

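If you really want to count the number of IDs used for each product, a better approach is to use a set to store the IDs used for each product, and then print the size of every set that contains more than one element. Here is what your characters method looks like after this change; I'll leave the necessary changes to the endDocument method to you: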

    def characters(self, content):
        if self.read == 1:
            item = content
            #check whether the item is not yet part of the dictionary
            if item not in self.items:
                self.items[item] = set([self.id])
            else:
                self.items[item].add(self.id)

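Note that in the last line I don't need to check whether the set in self.items[item] already contains self.id. The nice thing about sets is that adding an ID that is already in the set does nothing, so a set never ends up with duplicate IDs. Note also that I no longer use self.items2, because self.items holds all the information I need.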
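For completeness, here is one possible way to finish the set-based handler, including an endDocument that reports every item with more than one ID. It is a sketch, not the answerer's code. One deliberate change: the set update is moved from characters to endElement, because the Python documentation for ContentHandler.characters notes that a parser may deliver contiguous character data in several chunks, so collecting the text and using it when </name> closes is the safer pattern. The data is the trimmed sample with the 7116 element moved to the top, the case that broke the first-ID approach; the report attribute is a hypothetical addition for inspection.

```python
import xml.sax

# Trimmed sample with <artikel id="7116"> moved above all others.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<lieferungen xmlns="urn:myspace:lieferungen">
  <artikel id="7116"><name>Kirschen</name></artikel>
  <artikel id="3526"><name>apfel</name></artikel>
  <artikel id="7866"><name>Kirschen</name></artikel>
  <artikel id="4444"><name>apfel</name></artikel>
  <artikel id="7866"><name>Kirschen</name></artikel>
  <artikel id="2345"><name>apfel</name></artikel>
  <artikel id="7868"><name>Kohl</name></artikel>
  <artikel id="7866"><name>Kirschen</name></artikel>
  <artikel id="3245"><name>Bananen</name></artikel>
  <artikel id="6745"><name>Kohl</name></artikel>
  <artikel id="7789"><name>Ananas</name></artikel>
</lieferungen>"""


class C_Handler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.items = {}   # item name -> set of ids seen for that name
        self.read = 0
        self.id = 0
        self.text = []    # text chunks of the current <name> element
        self.report = {}  # hypothetical: item name -> number of distinct ids

    def startElement(self, tag, attributes):
        if tag == "name":
            self.read = 1
            self.text = []
        if tag == "artikel":
            self.id = attributes["id"]

    def characters(self, content):
        if self.read == 1:
            self.text.append(content)  # text may arrive in several chunks

    def endElement(self, tag):
        if tag == "name":
            self.read = 0
            item = "".join(self.text)
            if item not in self.items:
                self.items[item] = set([self.id])
            else:
                self.items[item].add(self.id)  # no-op if the id is already there

    def endDocument(self):
        print("Inconsistencies:\n")
        for prod, ids in self.items.items():
            if len(ids) > 1:
                self.report[prod] = len(ids)
                print('There are {} different IDs for item "{}".'.format(len(ids), prod))


handler = C_Handler()
xml.sax.parseString(SAMPLE, handler)
```

Because the count is now the size of a set, the position of the 7116 element no longer matters: Kirschen is reported with two IDs either way.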

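You can go one step further. At the moment we have to check whether item is in self.items and, if it isn't, create a set for that item. If you use a defaultdict, it creates the set for us when the item isn't there yet. Add the line from collections import defaultdict above the C_Handler class, and replace the line self.items = {} with self.items = defaultdict(set). Once you have done that, the characters method only needs to look like this: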

    def characters(self, content):
        if self.read == 1:
            item = content
            self.items[item].add(self.id)
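The defaultdict(set) behaviour can be seen on its own, without the SAX machinery. In this sketch the (id, name) pairs are the sample file's data typed in by hand, in file order; pairs, art_id and name are illustrative names only.

```python
from collections import defaultdict

# (id, name) pairs in the order they appear in the sample file
pairs = [("3526", "apfel"), ("7866", "Kirschen"), ("4444", "apfel"),
         ("7866", "Kirschen"), ("2345", "apfel"), ("7116", "Kirschen"),
         ("7868", "Kohl"), ("7866", "Kirschen"), ("3245", "Bananen"),
         ("6745", "Kohl"), ("7789", "Ananas")]

items = defaultdict(set)
for art_id, name in pairs:
    items[name].add(art_id)  # a missing key first gets a fresh empty set

for name, ids in items.items():
    if len(ids) > 1:
        print('There are {} different IDs for item "{}".'.format(len(ids), name))
```

One thing to keep in mind with this design: merely reading items[some_name] for a name that was never seen also inserts an empty set, so in endDocument it is safer to iterate over items.items() (as above) than to look names up one by one.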
