Automatically append to every paragraph?

Posted 2024-09-27 02:21:33


I'm a beginner programmer and would appreciate some help with what is probably a small problem: I have .xml files with the following structure:

<norm builddate="20120625150106" doknr="BJNR000020963BJNE000401308">
    <metadaten>
        <jurabk>BUrlG</jurabk>
        <enbez>§ 3</enbez>
        <titel format="parat">sometitle</titel>
    </metadaten>
    <textdaten>
        <text format="XML">
            <Content>
                <P>(1) sometext</P>
                <P>(2) anothertext</P>
            </Content>
        </text>
        <fussnoten/>
    </textdaten>
</norm>

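Now I want to append to each "P" element a string built from the contents of the "enbez" tag, the number at the start of the "P" text, and the "jurabk" tag, for example: § 3 (1) BUrlG. I then apply some formatting so that it becomes § 3 Abs. 1 BUrlG.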
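To make the target format concrete, here is a minimal sketch of just the formatting step. The helper name `make_citation` and the `PARA_NUMBER` pattern are hypothetical, and the sketch assumes every numbered paragraph starts with a number in parentheses, as in the sample above.

```python
import re

# Assumption: a numbered paragraph starts with "(<digits>)", e.g. "(1) sometext".
PARA_NUMBER = re.compile(r"\((\d+)\)")


def make_citation(enbez, paragraph_text, jurabk):
    """Build a citation such as '[§ 3 Abs. 1 BUrlG]' (hypothetical helper)."""
    match = PARA_NUMBER.match(paragraph_text.strip())
    if match is None:
        # No leading number found: cite the section only.
        return "[{} {}]".format(enbez, jurabk)
    return "[{} Abs. {} {}]".format(enbez, match.group(1), jurabk)


print(make_citation("§ 3", "(1) sometext", "BUrlG"))  # [§ 3 Abs. 1 BUrlG]
```

Capturing the digits with a group avoids the two separate `replace` calls for the brackets, and `\d+` also covers paragraph numbers of 10 and above.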

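I've managed to write example code that works for one specific "enbez" and one specific "P" tag. What I want is to run this process automatically over the whole document, but I can't write the iteration correctly so that it picks up every "P" tag under each "enbez" and applies the append to the right paragraph. Also, every step is written about as clumsily as possible, so if there is a better way to do any of this I'd be very grateful for suggestions!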

Example code:

import re
from bs4 import BeautifulSoup

# Read the local file as bytes and let BeautifulSoup detect the encoding.
# html.parser lowercases tag names, so <P> is matched as "p" below.
with open('burlg.xml', 'rb') as f:
    soup = BeautifulSoup(f, 'html.parser')

# Find a specific enbez; the norm parent always contains only one
enbez = soup.find_all("enbez")
enbezspecial = enbez[3]

# Find the norm parent
norm = enbezspecial.find_parent("norm")

# Find all p's belonging to the norm parent
p = norm.find_all("p")
pspecial = p[1]

# Get the number without the brackets and add a whitespace
regex = re.compile(r'\((\d+)\)')
result = regex.match(pspecial.string)
resultstring3 = " " + result.group(1)

# Find the short title; it is the same for the whole document
jurabk = soup.find("jurabk")

# Add some output formatting
enbezprint = enbezspecial.text
paraprint = " Abs." + resultstring3
jurabkprint = " " + jurabk.text
appendix = "[" + enbezprint + paraprint + jurabkprint + "]"

pspecial.append(appendix)
print(pspecial)
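One possible way to run this over the whole document is a nested loop: the outer loop visits every "norm", the inner loop visits every "P" inside it. The sketch below is not a drop-in for the code above; it swaps BeautifulSoup for the standard library's `xml.etree.ElementTree` so that it runs without extra packages. The `<dokumente>` wrapper element, the `SAMPLE` data and the function names are assumptions made for illustration, and tag names are case-sensitive here ("P", not "p").

```python
import re
import xml.etree.ElementTree as ET

# Assumed sample: a hypothetical <dokumente> root wrapping one <norm>.
SAMPLE = """<dokumente>
<norm doknr="BJNR000020963BJNE000401308">
    <metadaten>
        <jurabk>BUrlG</jurabk>
        <enbez>§ 3</enbez>
    </metadaten>
    <textdaten>
        <text format="XML">
            <Content>
                <P>(1) sometext</P>
                <P>(2) anothertext</P>
            </Content>
        </text>
    </textdaten>
</norm>
</dokumente>"""

# Assumption: numbered paragraphs start with "(<digits>)".
PARA_NUMBER = re.compile(r"\((\d+)\)")


def append_text(element, suffix):
    """Append suffix after all existing content of element."""
    if len(element):
        # Element has children: trailing text lives in the last child's tail.
        last = element[-1]
        last.tail = (last.tail or "") + suffix
    else:
        element.text = (element.text or "") + suffix


def annotate(root):
    """Append '[<enbez> Abs. <n> <jurabk>]' to every numbered <P> in every <norm>."""
    for norm in root.iter("norm"):
        enbez = norm.findtext("metadaten/enbez")
        jurabk = norm.findtext("metadaten/jurabk")
        if enbez is None or jurabk is None:
            continue  # skip norms without the metadata needed for a citation
        for p in norm.iter("P"):
            match = PARA_NUMBER.match((p.text or "").strip())
            if match is None:
                continue  # paragraph has no leading number
            append_text(p, " [{} Abs. {} {}]".format(enbez, match.group(1), jurabk))
    return root


root = annotate(ET.fromstring(SAMPLE))
for p in root.iter("P"):
    print(p.text)
```

The same loop structure should carry over to BeautifulSoup by iterating `soup.find_all("norm")` and, inside each one, `norm.find_all("p")`. For a real file, `ET.parse('burlg.xml').getroot()` would replace `ET.fromstring(SAMPLE)`.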

