Problems parsing some elements with xml.etree.ElementTree

Posted 2024-05-19 21:38:04

I hope you are well. I am having some trouble with a parser. My dataset looks like this:

<?xml version="1.0"?>

<bugrepository name="AspectJ">
  <bug id="28974" opendate="2003-1-3 10:28:00" fixdate="2003-1-14 14:30:00">
    <buginformation>
      <summary>"Compiler error when introducing a ""final"" field"</summary>
      <description>The aspecs the problem...</description>
    </buginformation>
    <fixedFiles>
      <file>org.aspectj/modules/weaver/src/org/aspectj/weaver/AjcMemberMaker.java</file>
    </fixedFiles>
  </bug>

  <bug id="28919" opendate="2002-12-30 16:40:00" fixdate="2003-1-14 15:06:00">
    <buginformation>
      <summary>waever tries to weave into native methods ...</summary>
      <description>If youat org.aspectj.ajdt.internal.core.burce</description>
    </buginformation>
    <fixedFiles>
      <file>org.aspectj/modules/weaver/src/org/aspectj/weaver/bcel/LazyMethodGen.java</file>
    </fixedFiles>
  </bug>
  
  <bug id="29186" opendate="2003-1-8 21:22:00" fixdate="2003-1-14 16:43:00">
    <buginformation>
      <summary>ajc -emacssym chokes on pointcut that includes an intertype method</summary>
      <description>This ;void Foo.ajc$before$Foo</description>
    </buginformation>
    <fixedFiles>
      <file>org.aspectj/modules/weaver/src/org/aspectj/weaver/Lint.java</file>
      <file>org.aspectj/modules/weaver/src/org/aspectj/weaver/Shadow.java</file>
      <file>org.aspectj/modules/weaver/src/org/aspectj/weaver/bcel/BcelWeaver.java</file>
    </fixedFiles>
  </bug>
  
  <bug id="29769" opendate="2003-1-19 11:42:00" fixdate="2003-1-24 21:17:00">
    <buginformation>
      <summary>Ajde does not support new AspectJ 1.1 compiler options</summary>
      <description>The org.aspectj.ajpiler. This enhancement is needed byort.</description>
    </buginformation>
    <fixedFiles>
      <file>org.aspectj/modules/ajde/testdata/examples/figures-coverage/figures/Figure.java</file>
      <file>org.aspectj/modules/ajde/testsrc/org/aspectj/ajde/AjdeTests.java</file>
      <file>org.aspectj/modules/ajde/testsrc/org/aspectj/ajde/ui/StructureViewManagerTest.java</file>
      <file>org.aspectj/modules/org.aspectj.ajdt.core/src/org/aspectj/ajdt/ajc/BuildArgParser.java</file>
      <file>org.aspectj/modules/org.aspectj.ajdt.core/src/org/aspectj/ajdt/internal/core/builder/AjBuildConfig.java</file>
      <file>org.aspectj/modules/org.aspectj.ajdt.core/testsrc/org/aspectj/ajdt/ajc/BuildArgParserTestCase.java</file>
    </fixedFiles>
  </bug>
  <bug id="29959" opendate="2003-1-22 7:10:00" fixdate="2003-2-13 16:00:00">
    <buginformation>
      <summary>super call in intertype method declaration body causes VerifyError</summary>
      <description>AspectJ Compiler 1.1 showstopper</description>
    </buginformation>
    <fixedFiles>
      <file>org.aspectj/modules/org.aspectj.ajdt.core/src/org/compiler/ast/InterTypeConstructorDeclaration.java</file>
      <file>org.aspectj/modules/org.aspectj.ajdt.core/src/org/aspectj/ajdt/internal/compiler/ast/SuperFixerVisitor.java</file>
      <file>org.aspectj/modules/org.aspectj.ajdt.core/src/org/aspectj/ajdt/internal/compiler/lookup/InterTypeMethodBinding.java</file>
      <file>org.aspectj/modules/tests/bugs/SuperToIntro.java</file>
    </fixedFiles>
  </bug>
</bugrepository>

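I would like to extract some elements of the dataset so that I can use them in a pandas DataFrame.

First problem: getting all the child elements of a tag as a list.

In practice, my code either retrieves only the first element and ignores the others, or retrieves all the elements but without the structure I want. Here I get only empty lists ([]) without content:

Code: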

import pandas as pd 
from xml.etree.ElementTree import parse

document = parse('dataset.xml')
summary = []
description = []
fixedfile = []

for item in document.iterfind('bug'):
    summary.append(item.findtext('buginformation/summary'))
    description.append(item.findtext('buginformation/description'))
    fixedfile.append(item.findall('fixedFiles/file'))
    
#df = pd.DataFrame({'summary':summary, 'description':description, 'fixed_files':fixedfile})
df = pd.DataFrame({'fixed_files': fixedfile})
df

Here I get only the first element:

Code:

import pandas as pd 
from xml.etree.ElementTree import parse

document = parse('dataset.xml')
summary = []
description = []
fixedfile = []

for item in document.iterfind('bug'):
    summary.append(item.findtext('buginformation/summary'))
    description.append(item.findtext('buginformation/description'))
    fixedfile.append(item.findtext('fixedFiles/file'))
    
#df = pd.DataFrame({'summary':summary, 'description':description, 'fixed_files':fixedfile})
df = pd.DataFrame({'fixed_files': fixedfile})
df

I found a solution that fits my case in "Problem traversing XML tree with Python xml.etree.ElementTree". It works, but not the way I want (a list for each element): I can load all the elements, but not separately.

Code:

import xml.etree.ElementTree as ET
import pandas as pd 

xmldoc = ET.parse('dataset.xml')
root = xmldoc.getroot()
summary = []
description = []
fixedfile = []

for bug in xmldoc.iter(tag='bug'): 
    
    #for item in document.iterfind('bug'):
    #summary.append(item.findtext('buginformation/summary'))
    #description.append(item.findtext('buginformation/description'))
    
    for file in bug.iterfind('./fixedFiles/file'):
        fixedfile.append([file.text])

fixedfile
#df = pd.DataFrame({'summary':summary, 'description':description, 'fixed_files':fixedfile})
df = pd.DataFrame({'fixed_files': fixedfile})
df

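When I try to include the other columns of my DataFrame (summary, description), I get the following error message: ValueError: All arrays must be of the same length

Second problem: being able to select, for example, all the tags that have 2 or 3 child elements.

Best regards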


2 answers

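The code below collects the data. The idea is to find all the bug elements and iterate over them; for each bug, look up the required child elements.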

import xml.etree.ElementTree as ET
import pandas as pd

xml = '''<?xml version="1.0"?>

<bugrepository name="AspectJ">
  <bug id="28974" opendate="2003-1-3 10:28:00" fixdate="2003-1-14 14:30:00">
    <buginformation>
      <summary>"Compiler error when introducing a ""final"" field"</summary>
      <description>The aspecs the problem...</description>
    </buginformation>
    <fixedFiles>
      <file>org.aspectj/modules/weaver/src/org/aspectj/weaver/AjcMemberMaker.java</file>
    </fixedFiles>
  </bug>

  <bug id="28919" opendate="2002-12-30 16:40:00" fixdate="2003-1-14 15:06:00">
    <buginformation>
      <summary>waever tries to weave into native methods ...</summary>
      <description>If youat org.aspectj.ajdt.internal.core.burce</description>
    </buginformation>
    <fixedFiles>
      <file>org.aspectj/modules/weaver/src/org/aspectj/weaver/bcel/LazyMethodGen.java</file>
    </fixedFiles>
  </bug>
  
  <bug id="29186" opendate="2003-1-8 21:22:00" fixdate="2003-1-14 16:43:00">
    <buginformation>
      <summary>ajc -emacssym chokes on pointcut that includes an intertype method</summary>
      <description>This ;void Foo.ajc$before$Foo</description>
    </buginformation>
    <fixedFiles>
      <file>org.aspectj/modules/weaver/src/org/aspectj/weaver/Lint.java</file>
      <file>org.aspectj/modules/weaver/src/org/aspectj/weaver/Shadow.java</file>
      <file>org.aspectj/modules/weaver/src/org/aspectj/weaver/bcel/BcelWeaver.java</file>
    </fixedFiles>
  </bug>
  
  <bug id="29769" opendate="2003-1-19 11:42:00" fixdate="2003-1-24 21:17:00">
    <buginformation>
      <summary>Ajde does not support new AspectJ 1.1 compiler options</summary>
      <description>The org.aspectj.ajpiler. This enhancement is needed byort.</description>
    </buginformation>
    <fixedFiles>
      <file>org.aspectj/modules/ajde/testdata/examples/figures-coverage/figures/Figure.java</file>
      <file>org.aspectj/modules/ajde/testsrc/org/aspectj/ajde/AjdeTests.java</file>
      <file>org.aspectj/modules/ajde/testsrc/org/aspectj/ajde/ui/StructureViewManagerTest.java</file>
      <file>org.aspectj/modules/org.aspectj.ajdt.core/src/org/aspectj/ajdt/ajc/BuildArgParser.java</file>
      <file>org.aspectj/modules/org.aspectj.ajdt.core/src/org/aspectj/ajdt/internal/core/builder/AjBuildConfig.java</file>
      <file>org.aspectj/modules/org.aspectj.ajdt.core/testsrc/org/aspectj/ajdt/ajc/BuildArgParserTestCase.java</file>
    </fixedFiles>
  </bug>
  <bug id="29959" opendate="2003-1-22 7:10:00" fixdate="2003-2-13 16:00:00">
    <buginformation>
      <summary>super call in intertype method declaration body causes VerifyError</summary>
      <description>AspectJ Compiler 1.1 showstopper</description>
    </buginformation>
    <fixedFiles>
      <file>org.aspectj/modules/org.aspectj.ajdt.core/src/org/compiler/ast/InterTypeConstructorDeclaration.java</file>
      <file>org.aspectj/modules/org.aspectj.ajdt.core/src/org/aspectj/ajdt/internal/compiler/ast/SuperFixerVisitor.java</file>
      <file>org.aspectj/modules/org.aspectj.ajdt.core/src/org/aspectj/ajdt/internal/compiler/lookup/InterTypeMethodBinding.java</file>
      <file>org.aspectj/modules/tests/bugs/SuperToIntro.java</file>
    </fixedFiles>
  </bug>
  </bugrepository>'''

data = []
root = ET.fromstring(xml)
for bug in root.findall('.//bug'):
    bug_info = bug.find('buginformation')
    fixed_files = bug.find('fixedFiles')
    # One dict per bug: the description is read from <description>, not <summary>
    entry = {'summary': bug_info.find('summary').text,
             'description': bug_info.find('description').text,
             'fixedFiles': [x.text for x in fixed_files]}
    data.append(entry)
for entry in data:
    print(entry)
df = pd.DataFrame(data)

Output

{'summary': '"Compiler error when introducing a ""final"" field"', 'description': 'The aspecs the problem...', 'fixedFiles': ['org.aspectj/modules/weaver/src/org/aspectj/weaver/AjcMemberMaker.java']}
{'summary': 'waever tries to weave into native methods ...', 'description': 'If youat org.aspectj.ajdt.internal.core.burce', 'fixedFiles': ['org.aspectj/modules/weaver/src/org/aspectj/weaver/bcel/LazyMethodGen.java']}
{'summary': 'ajc -emacssym chokes on pointcut that includes an intertype method', 'description': 'This ;void Foo.ajc$before$Foo', 'fixedFiles': ['org.aspectj/modules/weaver/src/org/aspectj/weaver/Lint.java', 'org.aspectj/modules/weaver/src/org/aspectj/weaver/Shadow.java', 'org.aspectj/modules/weaver/src/org/aspectj/weaver/bcel/BcelWeaver.java']}
{'summary': 'Ajde does not support new AspectJ 1.1 compiler options', 'description': 'The org.aspectj.ajpiler. This enhancement is needed byort.', 'fixedFiles': ['org.aspectj/modules/ajde/testdata/examples/figures-coverage/figures/Figure.java', 'org.aspectj/modules/ajde/testsrc/org/aspectj/ajde/AjdeTests.java', 'org.aspectj/modules/ajde/testsrc/org/aspectj/ajde/ui/StructureViewManagerTest.java', 'org.aspectj/modules/org.aspectj.ajdt.core/src/org/aspectj/ajdt/ajc/BuildArgParser.java', 'org.aspectj/modules/org.aspectj.ajdt.core/src/org/aspectj/ajdt/internal/core/builder/AjBuildConfig.java', 'org.aspectj/modules/org.aspectj.ajdt.core/testsrc/org/aspectj/ajdt/ajc/BuildArgParserTestCase.java']}
{'summary': 'super call in intertype method declaration body causes VerifyError', 'description': 'AspectJ Compiler 1.1 showstopper', 'fixedFiles': ['org.aspectj/modules/org.aspectj.ajdt.core/src/org/compiler/ast/InterTypeConstructorDeclaration.java', 'org.aspectj/modules/org.aspectj.ajdt.core/src/org/aspectj/ajdt/internal/compiler/ast/SuperFixerVisitor.java', 'org.aspectj/modules/org.aspectj.ajdt.core/src/org/aspectj/ajdt/internal/compiler/lookup/InterTypeMethodBinding.java', 'org.aspectj/modules/tests/bugs/SuperToIntro.java']}
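
As a side note on why collecting one entry per bug matters: the ValueError in the question most likely comes from building one flat list with an entry per file while the other columns have an entry per bug. The sketch below illustrates the length mismatch on a hypothetical two-bug sample (the SAMPLE string and its contents are made up for illustration, not taken from the dataset).

```python
import xml.etree.ElementTree as ET

# Hypothetical two-bug sample, much smaller than the dataset in the question.
SAMPLE = """<bugrepository name="Demo">
  <bug id="1">
    <buginformation><summary>first</summary></buginformation>
    <fixedFiles><file>a.java</file></fixedFiles>
  </bug>
  <bug id="2">
    <buginformation><summary>second</summary></buginformation>
    <fixedFiles><file>b.java</file><file>c.java</file></fixedFiles>
  </bug>
</bugrepository>"""

root = ET.fromstring(SAMPLE)

# Flat approach: one entry per <file>, across all bugs.
flat_files = [[f.text]
              for bug in root.iter('bug')
              for f in bug.iterfind('./fixedFiles/file')]

# Row-wise approach: one dict per <bug>, with its files nested inside.
rows = [{'summary': bug.findtext('buginformation/summary'),
         'fixedFiles': [f.text for f in bug.iterfind('fixedFiles/file')]}
        for bug in root.iter('bug')]
summaries = [row['summary'] for row in rows]

print(len(flat_files), len(summaries))  # 3 2: columns of different lengths
print(len(rows))                        # 2: one row per bug
```

Because every dict in rows describes exactly one bug, passing such a list to pd.DataFrame gives columns of equal length by construction.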

To keep the files in a list associated with each description and summary, add them to a new list for each bug.

Try:

import pandas as pd
from xml.etree.ElementTree import parse

document = parse('dataset.xml')
summary = []
description = []
fixedfile = []

for item in document.iterfind('bug'):
    summary.append(item.findtext('buginformation/summary'))
    description.append(item.findtext('buginformation/description'))
    fixedfile.append([elt.text for elt in item.findall('fixedFiles/file')])

df = pd.DataFrame({'summary': summary,
                   'description': description,
                   'fixed_files': fixedfile})
df
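
To see why the two attempts in the question behave differently from this version, here is a minimal sketch on a hypothetical one-bug fragment (the fragment is an assumption made for illustration). findall returns Element objects rather than strings, and findtext returns the text of the first match only. A childless Element has length 0, which is probably why pandas displayed those entries as empty lists.

```python
import xml.etree.ElementTree as ET

# Hypothetical single-bug fragment with two <file> children.
bug = ET.fromstring(
    "<bug><fixedFiles><file>a.java</file><file>b.java</file></fixedFiles></bug>"
)

elements = bug.findall('fixedFiles/file')     # list of Element objects, not strings
first_only = bug.findtext('fixedFiles/file')  # text of the first match only
texts = [elt.text for elt in elements]        # text of every match

print(len(elements[0]))  # 0: a <file> element has no child elements
print(first_only)        # a.java
print(texts)             # ['a.java', 'b.java']
```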

For the second part, this keeps only the bugs with two or more files:

newdf = df[df.fixed_files.str.len() >= 2]

If you want only the bugs with exactly 2 or 3 files, then:

newdf = df[(df.fixed_files.str.len() == 2) | (df.fixed_files.str.len() == 3)]
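
The same selection can also be done on the XML tree before any DataFrame is built. The sketch below is one possible way, using a hypothetical sample and a hypothetical helper named bugs_with_file_count. The count is done in Python because the XPath subset supported by ElementTree has no count() function.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: four bugs with 1, 2, 3 and 4 files respectively.
SAMPLE = """<bugrepository>
  <bug id="1"><fixedFiles><file>a</file></fixedFiles></bug>
  <bug id="2"><fixedFiles><file>b</file><file>c</file></fixedFiles></bug>
  <bug id="3"><fixedFiles><file>d</file><file>e</file><file>f</file></fixedFiles></bug>
  <bug id="4"><fixedFiles><file>g</file><file>h</file><file>i</file><file>j</file></fixedFiles></bug>
</bugrepository>"""

root = ET.fromstring(SAMPLE)


def bugs_with_file_count(root, allowed=(2, 3)):
    """Return the <bug> elements whose number of <file> children is in `allowed`."""
    return [bug for bug in root.iterfind('bug')
            if len(bug.findall('fixedFiles/file')) in allowed]


selected_ids = [bug.get('id') for bug in bugs_with_file_count(root)]
print(selected_ids)  # ['2', '3']
```

On the DataFrame side, df.fixed_files.str.len().isin([2, 3]) should express the same condition as the two comparisons above in a single call.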
