How do I manually strip XML tags?

Posted 2024-09-26 18:00:54

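I'm trying to remove the tags and write the result to a new file, but I don't know how to do it. I've been given a file containing XML markup, and I want to use strip and split to produce a list or string from it. I can't use an XML parser or any other library.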

Here is the text file:

<team> <name>Denver Broncos</name> <players> <player> <jno>50</jno> <fname>Zaire</fname> <lname>Anderson</lname> <height>5-11</height> <weight>220</weight> <age>24</age> <position>ILB</position> <school>Nebraska</school> </player> <player> <jno>48</jno> <fname>Shaquil</fname> <lname>Barrett</lname> <height>6-2</height> <weight>250</weight> <age>23</age> <position>OLB</position> <school>Colorado State</school> </player> <player> <jno>35</jno> <fname>Kapri</fname> <lname>Bibbs</lname> <height>5-11</height> <weight>203</weight> <age>23</age> <position>RB</position> <school>Colorado State</school> </player> </players> </team>

I want to use strings/lists to generate sentences like these:

Here is the roster for the Denver Broncos. There are 3 players on the team. Zaire Anderson, ILB, wears #50. He is 5 foot 11 inches tall, and weighs 220 pounds. He is 24 years old. He went to Nebraska. Shaquil Barrett, OLB, wears #48. He is 6 foot 2 inches tall, and weighs 250 pounds. He is 23 years old. He went to Colorado State. Kapri Bibbs, RB, wears #35. He is 5 foot 11 inches tall, and weighs 203 pounds. He is 23 years old. He went to Colorado State.
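Once the values for one player are available as strings, each sentence above is plain string formatting. The only conversion needed is splitting a height such as "5-11" on "-" into feet and inches. A minimal sketch, assuming each player is held in a dict keyed by the tag names (player_sentence and that dict layout are hypothetical, not from the original post):

```python
def player_sentence(p):
    """Build one roster sentence from a dict of field values (all strings)."""
    # Height is stored as feet-inches, e.g. "5-11" -> feet "5", inches "11"
    feet, inches = p["height"].split("-")
    return (
        f"{p['fname']} {p['lname']}, {p['position']}, wears #{p['jno']}. "
        f"He is {feet} foot {inches} inches tall, and weighs {p['weight']} pounds. "
        f"He is {p['age']} years old. He went to {p['school']}."
    )


# Hypothetical record, copied by hand from the first <player> in the file
zaire = {"jno": "50", "fname": "Zaire", "lname": "Anderson", "height": "5-11",
         "weight": "220", "age": "24", "position": "ILB", "school": "Nebraska"}
print(player_sentence(zaire))
```

The remaining problem is getting from the tagged text to such a dict.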

def test(filename):
    f=open(filename,"r")
    line = f.readline()
    f2 = open("BearsRoster.txt", "w")
    print line
    myList = []
    stringl = ""
    for i in line:
        if i == ("<"):
            while i != ">":
                line.remove(i)


        else:


            stringl = stringl + i
            myList.append(stringl)
            stringl = ""
        else:
            stringl = stringl + i
    print myList
    for i in myList:
        print i
        print myList

        if i[0] == "<" or " ":
            myList.remove(i)

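Obviously this code is incorrect. My idea was to iterate over the string and strip out the <xxxxx> tags, but I don't know how to go about it. After that, I want to produce the sentences I posted above.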
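Since only strip and split are allowed, one way to pull out the values (rather than just delete the tags) is to split on the opening tag and then on the closing tag. This sketch assumes tags carry no attributes and a tag never contains another tag of the same name, which holds for this roster file. values_between is a hypothetical helper name, and the data below is a trimmed excerpt of the file:

```python
def values_between(text, tag):
    """Return every value enclosed by <tag>...</tag>, using only str.split/strip."""
    values = []
    # Every piece after an "<tag>" starts with the value we want
    for chunk in text.split("<" + tag + ">")[1:]:
        # The value ends where the matching closing tag begins
        values.append(chunk.split("</" + tag + ">")[0].strip())
    return values


# Trimmed excerpt of the roster file
data = ("<team> <name>Denver Broncos</name> <players> "
        "<player> <fname>Zaire</fname> </player> "
        "<player> <fname>Shaquil</fname> </player> </players> </team>")
print(values_between(data, "name"))   # ['Denver Broncos']
print(values_between(data, "fname"))  # ['Zaire', 'Shaquil']
```

Because every <player> in this file has each field exactly once and in the same order, the lists returned for different tags line up and can be combined with zip(), e.g. zip(values_between(data, "fname"), values_between(data, "lname")).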


1 answer
Anonymous user
Answer 1 · posted 2024-09-26 18:00:54

To remove the tags, use a flag variable skip = True/False to control when a character is copied into the new string.

When you encounter < set skip = True, and when you encounter > set skip = False. Characters are copied only while skip is False.

data = '''<team> <name>Denver Broncos</name> <players> <player> <jno>50</jno> <fname>Zaire</fname> <lname>Anderson</lname> <height>5-11</height> <weight>220</weight> <age>24</age> <position>ILB</position> <school>Nebraska</school> </player> <player> <jno>48</jno> <fname>Shaquil</fname> <lname>Barrett</lname> <height>6-2</height> <weight>250</weight> <age>23</age> <position>OLB</position> <school>Colorado State</school> </player> <player> <jno>35</jno> <fname>Kapri</fname> <lname>Bibbs</lname> <height>5-11</height> <weight>203</weight> <age>23</age> <position>RB</position> <school>Colorado State</school> </player> </players> </team>'''

skip = False
result = ''

for char in data:
    if char == '<':
        skip = True
    elif char == '>':
        skip = False
    elif not skip:
        result += char

print(result)
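The loop above keeps the text but loses the boundaries between values, so "Denver Broncos", "50", "Zaire" and so on end up separated only by stray spaces. A small variation, sketched here with a hypothetical function tag_values, writes a separator wherever a tag started so the result can be split back into a list. It assumes the values themselves never contain the "|" character:

```python
def tag_values(data):
    """Strip tags with a skip flag, but mark each tag position with '|'.

    Splitting on '|' then yields the text between tags; whitespace-only
    pieces (the spaces between adjacent tags) are dropped.
    """
    skip = False
    result = ""
    for char in data:
        if char == "<":
            skip = True
            result += "|"  # a tag starts here: record a boundary
        elif char == ">":
            skip = False
        elif not skip:
            result += char
    return [piece.strip() for piece in result.split("|") if piece.strip()]


data = '''<team> <name>Denver Broncos</name> <players> <player> <jno>50</jno> <fname>Zaire</fname> <lname>Anderson</lname> <height>5-11</height> <weight>220</weight> <age>24</age> <position>ILB</position> <school>Nebraska</school> </player> <player> <jno>48</jno> <fname>Shaquil</fname> <lname>Barrett</lname> <height>6-2</height> <weight>250</weight> <age>23</age> <position>OLB</position> <school>Colorado State</school> </player> <player> <jno>35</jno> <fname>Kapri</fname> <lname>Bibbs</lname> <height>5-11</height> <weight>203</weight> <age>23</age> <position>RB</position> <school>Colorado State</school> </player> </players> </team>'''

values = tag_values(data)
team = values[0]
# Assumes every <player> has exactly 8 fields in this order:
# jno, fname, lname, height, weight, age, position, school
players = [values[i:i + 8] for i in range(1, len(values), 8)]
print(team, len(players))
print(players[0])
```

Chunking by position is fragile: if any player is missing a field, every later value shifts. Tracking the tag names avoids that.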

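If you need the data from inside the tags, you have to build a parser: recognize opening and closing tags, remember the tag names, and possibly build a tree from them. That takes more work.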
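For a file this regular, such a parser can stay small. The sketch below (parse_players and roster_text are hypothetical names) scans the string once, keeps a stack of open tag names, checks that each closing tag matches the most recent opening tag, and stores text found directly inside a child of <player> in a dict. It assumes no attributes, comments, self-closing tags, or entities such as &amp;, and uses only built-in string methods, so it still respects the no-library constraint:

```python
def parse_players(data):
    """Return (team_name, players) where players is a list of dicts."""
    stack = []      # names of the currently open tags
    team_name = ""
    players = []
    i = 0
    while i < len(data):
        if data[i] == "<":
            end = data.index(">", i)
            tag = data[i + 1:end]
            if tag.startswith("/"):
                # Closing tag: it must match the most recently opened tag
                opened = stack.pop()
                if opened != tag[1:]:
                    raise ValueError(f"expected </{opened}>, found <{tag}>")
            else:
                stack.append(tag)
                if tag == "player":
                    players.append({})  # start a new record
            i = end + 1
        else:
            # Text runs until the next tag (or the end of the data)
            end = data.find("<", i)
            if end == -1:
                end = len(data)
            text = data[i:end].strip()
            if text and stack:
                if stack[-1] == "name" and stack[-2:-1] == ["team"]:
                    team_name = text
                elif len(stack) >= 2 and stack[-2] == "player":
                    players[-1][stack[-1]] = text
            i = end
    return team_name, players


def roster_text(data):
    """Build the roster paragraph from the tagged data."""
    team, players = parse_players(data)
    parts = [f"Here is the roster for the {team}.",
             f"There are {len(players)} players on the team."]
    for p in players:
        feet, inches = p["height"].split("-")  # "5-11" -> "5", "11"
        parts.append(f"{p['fname']} {p['lname']}, {p['position']}, wears #{p['jno']}. "
                     f"He is {feet} foot {inches} inches tall, and weighs {p['weight']} pounds. "
                     f"He is {p['age']} years old. He went to {p['school']}.")
    return " ".join(parts)


data = '''<team> <name>Denver Broncos</name> <players> <player> <jno>50</jno> <fname>Zaire</fname> <lname>Anderson</lname> <height>5-11</height> <weight>220</weight> <age>24</age> <position>ILB</position> <school>Nebraska</school> </player> <player> <jno>48</jno> <fname>Shaquil</fname> <lname>Barrett</lname> <height>6-2</height> <weight>250</weight> <age>23</age> <position>OLB</position> <school>Colorado State</school> </player> <player> <jno>35</jno> <fname>Kapri</fname> <lname>Bibbs</lname> <height>5-11</height> <weight>203</weight> <age>23</age> <position>RB</position> <school>Colorado State</school> </player> </players> </team>'''

print(roster_text(data))
```

Run on the file from the question, this prints the requested paragraph, with Kapri Bibbs wearing #35 as in the data. To create the new file the question mentions, the returned string can be written with open(..., "w").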
