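So, I'm writing code that converts a particular XML document into an HTML document that presents a story. I've done my best, but when I join a list into a string and append that new string to another list, the list is empty. I've tried to work out where the problem is with my limited understanding, but so far without success. I'll show you my code and the area where I think the problem lies.
I've already fixed one thing I noticed, where the variable I needed wasn't the one I was using, but I've gone over the code and can't find any other mistakes like that.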
import codecs
import re

fileIn = codecs.open("differenceInAbility.xml", "r", "utf-8")
text = fileIn.read()
fileIn.close()

chapterTitle = re.findall(r'<chapter number="(\d)" name="(.+?)">', text)
chapters = re.findall(r'<chapter number="\d" name=".+?">(.+?)</chapter>', text, flags=re.DOTALL)
paragraphs = re.findall(r"<paragraph>(.+?)</paragraph>", text, flags=re.DOTALL)

cleanParagraphs = []
for entry in paragraphs:
    cleanup = re.sub(r"\r\n[ ]+", " ", entry)
    cleanup2 = re.sub(r"[ ]+", " ", cleanup)
    cleanParagraphs.append(cleanup2)

chaptersHTML = []
chapterCounter = 1
for entry in chapters:
    if chapterTitle[0] == r"\d+":
        chapterHTML = "<h1> Chapter " + chapterCounter + " - " + chapterTitle[1] + "</h1>"
        chapterTitle.pop(0)
        chapterTitle.pop(1)
    paragraphsHTML = []
    for paragraph in cleanParagraphs:
        if paragraph in entry:
            p = "<p>" + paragraph + "</p>"
            paragraphsHTML.append(p)
    allParagraphsHTML = "\n".join(paragraphsHTML)
    wholeSection = chapterHTML + allParagraphsHTML
    chaptersHTML.append(wholeSection)
    chapterCounter += 1
print(chaptersHTML)
I think the relevant part is:
paragraphsHTML = []
for paragraph in cleanParagraphs:
    if paragraph in entry:
        p = "<p>" + paragraph + "</p>"
        paragraphsHTML.append(p)
allParagraphsHTML = "\n".join(paragraphsHTML)
wholeSection = chapterHTML + allParagraphsHTML
chaptersHTML.append(wholeSection)
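I say this because the cleanParagraphs list has the correct contents, with each paragraph of the XML document as its own entry in the list. The problem may be the line "if paragraph in entry", since it doesn't recognise the parts of "entry" as being those paragraphs.
If that is the case, how do I go about fixing it? How do I make sure it knows which paragraph belongs to which chapter?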
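The entries in cleanParagraphs are no longer substrings of the original text, because the cleanup step changed their whitespace, so of course they don't appear in the unmodified chapters values. You should process each chapter separately (including splitting it into paragraphs), so that you don't have to rediscover which paragraphs it contains (and so that you avoid mishandling paragraphs that happen to be identical in two chapters).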
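A minimal sketch of that per-chapter approach might look like the following. It is not the asker's exact program: SAMPLE_XML is a made-up stand-in for differenceInAbility.xml (whose contents are not shown), the names CHAPTER_RE, PARAGRAPH_RE, clean and chapters_to_html are hypothetical, and the cleanup collapses all whitespace with \s+ instead of the two substitutions in the question. The chapter number is also read from the XML attribute as a string, which sidesteps concatenating an integer counter onto a string.

```python
import re

# Hypothetical sample standing in for differenceInAbility.xml; the real
# file's contents are not shown in the question.
SAMPLE_XML = (
    '<story>\r\n'
    '  <chapter number="1" name="The Start">\r\n'
    '    <paragraph>First line\r\n      wrapped here.</paragraph>\r\n'
    '    <paragraph>Second   paragraph.</paragraph>\r\n'
    '  </chapter>\r\n'
    '  <chapter number="2" name="The End">\r\n'
    '    <paragraph>Last one.</paragraph>\r\n'
    '  </chapter>\r\n'
    '</story>\r\n'
)

# One pattern captures the number, the name and the body of each chapter,
# so the title and the text of a chapter can never get out of step.
CHAPTER_RE = re.compile(
    r'<chapter number="(\d+)" name="(.+?)">(.+?)</chapter>', flags=re.DOTALL
)
PARAGRAPH_RE = re.compile(r"<paragraph>(.+?)</paragraph>", flags=re.DOTALL)


def clean(paragraph):
    """Collapse every run of whitespace (including line breaks) into one space."""
    return re.sub(r"\s+", " ", paragraph).strip()


def chapters_to_html(text):
    """Return one HTML string per chapter, in document order."""
    chapters_html = []
    for number, name, body in CHAPTER_RE.findall(text):
        heading = "<h1>Chapter " + number + " - " + name + "</h1>"
        # Search for paragraphs inside this chapter's body only, so there is
        # no need to test afterwards which chapter a paragraph belongs to.
        paragraphs = ["<p>" + clean(p) + "</p>" for p in PARAGRAPH_RE.findall(body)]
        chapters_html.append("\n".join([heading] + paragraphs))
    return chapters_html


if __name__ == "__main__":
    for chapter in chapters_to_html(SAMPLE_XML):
        print(chapter)
```

Because the paragraphs are extracted from each chapter's own body before they are cleaned, the substring test disappears entirely, and two chapters containing an identical paragraph are handled correctly.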