Selecting certain XML tags using criteria matched against other tags

Posted 2024-09-29 23:27:16


I have an XML file structured as follows:

<text>
  <dialogue>
     <pattern>
        We're having a {nice|great} time.
     </pattern>
     <criterion>
       <!-- match this tag, get the above pattern -->
        average_person, tourist, delighted
     </criterion>
  </dialogue>
  <dialogue>
     <pattern>
        The service {here stinks|is terrible}!
     </pattern>
     <criterion>
        tourist, disgruntled, average_person
     </criterion>
  </dialogue>
  <dialogue>
     <pattern>
        They have {smoothies|funny hats}. Neat!
     </pattern>
     <criterion>
        tourist, smoothie_enthusiast
     </criterion>
  </dialogue>
  <dialogue>
     <pattern>
        I wonder how {expensive|valuable} these resort tickets are?
     </pattern>
     <criterion>
        merchant, average_person
     </criterion>
  </dialogue>
</text>

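What I want to do is iterate over the dialogue tags, look at each criterion tag, and match it against a list of words. If they match, I will use the pattern from that dialogue tag. I'm using Python for this.

What I currently do is iterate over the tags with lxml's etree, like this: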

from lxml import etree

# personality, tags_to_match and match_tags() are defined elsewhere.
tree = etree.parse('tourists.xml')
root = tree.getroot()
tourist_patterns = []
g = 0
for i in root.iterfind('dialogue/criterion'):
   a = i.text.split(',')
   # The "personality" variable has a value like "delighted" or "disgruntled".
   # "tags_to_match" are the criteria that we want to, well, match. It may
   # hold criteria like "merchant", "tourist", or "delighted".
   # When the tags match (the "match_tags" function returns True), the
   # pattern is appended to the "tourist_patterns" list.
   # Compare strings with != rather than "is not", which tests identity.
   if personality != 'average_person' and match_tags(tags_to_match, a):
       tourist_patterns.append(root[g][0].text)
   g += 1
# When we don't have a match, we just go with the "average_person" tag.
if len(tourist_patterns) == 0:
   # Go through the tags again, choosing the ones that match the
   # 'average_person' personality and put them in the "tourist_patterns" list.
   pass  # (fallback code not shown)

I then go through the elements of the tourist_patterns list and pick out the one I want.

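I'm trying to simplify this. How can I iterate over the tags, match the desired text in the criterion tag, and then get the pattern from the pattern tag? I'm also trying to set up a default for when the criteria don't match (hence the "average_person" personality criterion).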
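One way the loop could be simplified, as a minimal sketch rather than a definitive answer: iterate over each dialogue element and read its own pattern and criterion children, so no separate index counter is needed. The sketch makes several assumptions. It uses the standard library's xml.etree.ElementTree in place of lxml so that it runs without extra packages (lxml.etree offers the same iterfind, find, findtext and itertext calls, so the import should be swappable, though that is not tested here). It assumes every pattern/criterion pair is wrapped in a dialogue element, and that a dialogue matches when all wanted words appear in its criterion. The names SAMPLE, criterion_tags and select_patterns are hypothetical.

```python
import xml.etree.ElementTree as ET

# Inline copy of the sample file, with each pair wrapped in <dialogue>.
SAMPLE = """
<text>
  <dialogue>
    <pattern>
      We're having a {nice|great} time.
    </pattern>
    <criterion>
      <!-- match this tag, get the above pattern -->
      average_person, tourist, delighted
    </criterion>
  </dialogue>
  <dialogue>
    <pattern>
      The service {here stinks|is terrible}!
    </pattern>
    <criterion>
      tourist, disgruntled, average_person
    </criterion>
  </dialogue>
  <dialogue>
    <pattern>
      They have {smoothies|funny hats}. Neat!
    </pattern>
    <criterion>
      tourist, smoothie_enthusiast
    </criterion>
  </dialogue>
  <dialogue>
    <pattern>
      I wonder how {expensive|valuable} these resort tickets are?
    </pattern>
    <criterion>
      merchant, average_person
    </criterion>
  </dialogue>
</text>
"""


def criterion_tags(dialogue):
    """Return the comma-separated criterion words of one dialogue as a set."""
    criterion = dialogue.find("criterion")
    if criterion is None:
        return set()
    # Join all text pieces so that text around an XML comment is not lost.
    raw = "".join(criterion.itertext())
    return {word.strip() for word in raw.split(",") if word.strip()}


def select_patterns(root, wanted, fallback=("average_person",)):
    """Collect patterns whose criterion contains every wanted word.

    Assumption: "match" means all wanted words are present. If nothing
    matches, the search is retried once with the fallback words.
    """
    def collect(words):
        words = set(words)
        return [
            dialogue.findtext("pattern", default="").strip()
            for dialogue in root.iterfind("dialogue")
            if words <= criterion_tags(dialogue)
        ]

    return collect(wanted) or collect(fallback)


root = ET.fromstring(SAMPLE)
print(select_patterns(root, ["tourist", "smoothie_enthusiast"]))
# ['They have {smoothies|funny hats}. Neat!']
print(select_patterns(root, ["merchant", "disgruntled"]))
# no dialogue has both words, so the three "average_person" patterns are returned
```

Reading pattern and criterion from the same dialogue element keeps the two in step even if other elements appear between dialogues, which the root[g][0] indexing does not guarantee.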


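Edit: Some commenters asked for a list of what should be matched. Basically, I want it to match some or all of the words in the criterion tag and give me the text of the pattern tag under the same dialogue tag. So if I'm looking for "tourist" and "smoothie_enthusiast", one match is found in my XML example. I then want to get the pattern tag's text, "They have {smoothies|funny hats}. Neat!". If that doesn't match any of the words in the criterion tags, I would just try to match "average_person" and "tourist".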

In turn, tourist_patterns would look like this when there is a match:

>>> tourist_patterns
    ['They have {smoothies|funny hats}. Neat!']

When there is no match, it would fall back to this:

>>> tourist_patterns
    ['They have {smoothies|funny hats}. Neat!', 'The service {here stinks|is terrible}!']
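The "some or all of the words" wording can be read in two ways, and the two readings select different dialogues. The sketch below compares them on the criterion strings from the sample XML. The function names parse_tags, matches_all and matches_any are hypothetical, and which rule is intended is an assumption. Only the "all wanted words present" reading yields the single smoothie pattern shown for the matching case.

```python
def parse_tags(text):
    """Split a comma-separated criterion string into a set of stripped words."""
    return {word.strip() for word in text.split(",") if word.strip()}


def matches_all(wanted, criterion_text):
    """True when every wanted word appears in the criterion."""
    return set(wanted) <= parse_tags(criterion_text)


def matches_any(wanted, criterion_text):
    """True when at least one wanted word appears in the criterion."""
    return not set(wanted).isdisjoint(parse_tags(criterion_text))


# Criterion texts of the four dialogues in the sample, in document order.
criteria = [
    "average_person, tourist, delighted",
    "tourist, disgruntled, average_person",
    "tourist, smoothie_enthusiast",
    "merchant, average_person",
]
wanted = ["tourist", "smoothie_enthusiast"]

print([matches_all(wanted, c) for c in criteria])  # [False, False, True, False]
print([matches_any(wanted, c) for c in criteria])  # [True, True, True, False]
```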

Hopefully that clears things up.


