Parsing updateinfo.xml

Posted 2024-06-25 05:39:16


I have been trying to parse Amazon's updateinfo.xml file in Python for a university project. A sample of the file looks like this:

<?xml version="1.0" ?> <updates> <update author="linux-security@amazon.com" from="linux-security@amazon.com" status="final" type="security" version="1.4"> <id>AL2012-2014-001</id> <title>Amazon Linux 2012.03 - AL2012-2014-001: important priority package update for libxml2</title> <issued date="2014-10-19 15:48" /> <updated date="2014-10-19 15:48" /> <severity>important</severity> <description>Package updates are available for Amazon Linux that fix the following vulnerabilities: CVE-2012-5134: A heap-based buffer underflow flaw was found in the way libxml2 decoded certain entities. A remote attacker could provide a specially-crafted XML file that, when opened in an application linked against libxml2, would cause the application to crash or, potentially, execute arbitrary code with the privileges of the user running the application. </description> <references> <reference href="http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2012-5134" id="CVE-2012-5134" title="" type="cve" /> <reference href="https://rhn.redhat.com/errata/RHSA-2012:1512.html" id="RHSA-2012:1512" title="" type="redhat" /> </references> <pkglist> <collection short="amazon-linux"> <name>Amazon Linux</name> <package arch="x86_64" epoch="0" name="libxml2-debuginfo" release="10.23.26.ec2" version="2.7.8"> <filename>Packages/libxml2-debuginfo-2.7.8-10.23.26.ec2.x86_64.rpm</filename> </package> <package arch="x86_64" epoch="0" name="libxml2-devel" release="10.23.26.ec2" version="2.7.8"> <filename>Packages/libxml2-devel-2.7.8-10.23.26.ec2.x86_64.rpm</filename> </package> <package arch="x86_64" epoch="0" name="libxml2" release="10.23.26.ec2" version="2.7.8"> <filename>Packages/libxml2-2.7.8-10.23.26.ec2.x86_64.rpm</filename> </package> <package arch="x86_64" epoch="0" name="libxml2-static" release="10.23.26.ec2" version="2.7.8"> <filename>Packages/libxml2-static-2.7.8-10.23.26.ec2.x86_64.rpm</filename> </package> <package arch="x86_64" epoch="0" name="libxml2-python" release="10.23.26.ec2" version="2.7.8"> 
<filename>Packages/libxml2-python-2.7.8-10.23.26.ec2.x86_64.rpm</filename> </package> </collection> </pkglist> </update> <update author="linux-security@amazon.com" from="linux-security@amazon.com" status="final" type="security" version="1.4"> <id>AL2012-2015-088</id> <title>Amazon Linux 2012.03 - AL2012-2015-088: medium priority package update for gnutls</title> <issued date="2015-07-29 20:47" /> <updated date="2015-07-29 20:47" /> <severity>medium</severity> <description>Package updates are available for Amazon Linux that fix the following vulnerabilities: CVE-2015-0294: It was discovered that GnuTLS did not check if all sections of X.509 certificates indicate the same signature algorithm. This flaw, in combination with a different flaw, could possibly lead to a bypass of the certificate signature check. CVE-2015-0282: It was found that GnuTLS did not verify whether a hashing algorithm listed in a signature matched the hashing algorithm listed in the certificate. An attacker could create a certificate that used a different hashing algorithm than it claimed, possibly causing GnuTLS to use an insecure, disallowed hashing algorithm during certificate verification. CVE-2014-8155: It was found that GnuTLS did not check activation and expiration dates of CA certificates. This could cause an application using GnuTLS to incorrectly accept a certificate as valid when its issuing CA is already expired. 
</description> <references> <reference href="http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2014-8155" id="CVE-2014-8155" title="" type="cve" /> <reference href="http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2015-0282" id="CVE-2015-0282" title="" type="cve" /> <reference href="http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2015-0294" id="CVE-2015-0294" title="" type="cve" /> <reference href="https://rhn.redhat.com/errata/RHSA-2015:1457.html" id="RHSA-2015:1457" title="" type="redhat" /> </references> <pkglist> <collection short="amazon-linux"> <name>Amazon Linux</name> <package arch="x86_64" epoch="0" name="gnutls-debuginfo" release="18.14.al12" version="2.8.5"> <filename>Packages/gnutls-debuginfo-2.8.5-18.14.al12.x86_64.rpm</filename></package> <package arch="x86_64" epoch="0" name="gnutls" release="18.14.al12" version="2.8.5"><filename>Packages/gnutls-2.8.5-18.14.al12.x86_64.rpm</filename></package> <package arch="x86_64" epoch="0" name="gnutls-devel" release="18.14.al12" version="2.8.5"><filename>Packages/gnutls-devel-2.8.5-18.14.al12.x86_64.rpm</filename></package> <package arch="x86_64" epoch="0" name="gnutls-utils" release="18.14.al12" version="2.8.5"><filename>Packages/gnutls-utils-2.8.5-18.14.al12.x86_64.rpm</filename></package> <package arch="x86_64" epoch="0" name="gnutls-guile" release="18.14.al12" version="2.8.5"><filename>Packages/gnutls-guile-2.8.5-18.14.al12.x86_64.rpm</filename></package> <package arch="i686" epoch="0" name="gnutls-debuginfo" release="18.14.al12" version="2.8.5"><filename>Packages/gnutls-debuginfo-2.8.5-18.14.al12.i686.rpm</filename></package> <package arch="i686" epoch="0" name="gnutls-devel" release="18.14.al12" version="2.8.5"><filename>Packages/gnutls-devel-2.8.5-18.14.al12.i686.rpm</filename></package> <package arch="i686" epoch="0" name="gnutls-guile" release="18.14.al12" version="2.8.5"><filename>Packages/gnutls-guile-2.8.5-18.14.al12.i686.rpm</filename></package> <package arch="i686" epoch="0" name="gnutls" 
release="18.14.al12" version="2.8.5"><filename>Packages/gnutls-2.8.5-18.14.al12.i686.rpm</filename></package> <package arch="i686" epoch="0" name="gnutls-utils" release="18.14.al12" version="2.8.5"><filename>Packages/gnutls-utils-2.8.5-18.14.al12.i686.rpm</filename></package> </collection> </pkglist> </update> </updates>

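I am trying to pull out details such as the architecture, the name, the release version, and the filename without the "Packages/" prefix.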

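My question is: how can I do this efficiently for a file that contains more than 300 entries? With my limited knowledge of Python, I can get this result for a single entry. But because there are so many entries (700+, with a file size of 1.5G), running it in a for loop consumes a lot of resources and produces errors. How should I approach this?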


1 answer
User
#1 · Posted 2024-06-25 05:39:16

Use the xml.etree.ElementTree module. In my experience, performance is good when working with xml.etree.

For example:

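import xml.etree.ElementTree as ET

# Parse the whole file and get the <updates> root element.
tree = ET.parse('updateinfo.xml')
root = tree.getroot()

for update in root.findall('update'):
    # A path expression avoids an AttributeError when an <update> has no <pkglist>.
    for package in update.findall('pkglist/collection/package'):
        filename = package.findtext('filename', default='')
        # Print arch, name, release, and the filename without the 'Packages/' prefix.
        print(package.attrib['arch'], package.attrib['name'], package.attrib['release'],
              filename.replace('Packages/', ''))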

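Run with python3, this prints one line per package, with the architecture, name, release, and filename separated by spaces. For the first package in the sample above, the line is:

x86_64 libxml2-debuginfo 10.23.26.ec2 libxml2-debuginfo-2.7.8-10.23.26.ec2.x86_64.rpm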
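Since the question mentions a 1.5G file, it may be worth noting that ET.parse builds the entire tree in memory before the loop starts. A minimal sketch of a streaming alternative, using xml.etree.ElementTree.iterparse from the same module, is shown here. The helper name iter_packages is hypothetical (it is not part of the answer above), and the sketch assumes every file follows the updates/update/pkglist/collection/package layout of the sample.

```python
import xml.etree.ElementTree as ET


def iter_packages(source):
    """Yield (arch, name, release, filename) for every <package> element.

    `source` is a file path or a binary file object.
    iter_packages is a hypothetical helper name used for illustration.
    """
    # Only "end" events are requested, so each element is complete when seen.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "update":
            continue
        for package in elem.iterfind("pkglist/collection/package"):
            filename = package.findtext("filename", default="")
            yield (
                package.get("arch"),
                package.get("name"),
                package.get("release"),
                # Strip only the leading directory prefix, once.
                filename.replace("Packages/", "", 1),
            )
        # Drop the children of the finished <update> to release memory.
        elem.clear()
```

It could be used as `for arch, name, release, filename in iter_packages('updateinfo.xml'): print(arch, name, release, filename)`. Calling elem.clear() after each update keeps memory use roughly proportional to one update rather than to the whole file, although the emptied update elements remain attached to the root until parsing ends.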
