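I have a string containing information downloaded from a Wikia page.
In order to parse its content, how can I strip all the wiki formatting from the page and leave only the raw text?
Here is an example of what might appear: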
#REDIRECT[[Blah]]
{{
I have some stuff in here
}}
[[I also have some stuff in here|and here]]
[[http://blehthisisfake.com Link to a fake website]]
<span class="plainlinks">This is quite useless. Why was [[this page]] even created?</span>
<nowiki>There are more HTML tags, they should probably all be stripped...</nowiki>
There is random text in here. bleh bleh bleh
I'm not sure what single [brackets] do, but they should be stripped too...
The expected output is only the raw text, with the markup above stripped.
Is there a module that can do this?
A Google search for "python wiki parser" turns up this code, which strips and replaces the markup (see the source code at the link for details).
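For a rough idea of what such stripping involves, here is a minimal sketch using only the standard `re` module. It is not the linked code. The function name `strip_wiki_markup` is made up for illustration, and its choices are assumptions that may not match what the asker wants: it drops templates and redirects entirely, keeps only the label of a piped link, and treats anything that looks like an HTML tag as removable.

```python
import re


def strip_wiki_markup(text):
    """Best-effort removal of common wiki markup (hypothetical helper, not a full parser)."""
    # Drop redirect directives such as "#REDIRECT[[Blah]]".
    text = re.sub(r"(?im)^#REDIRECT[ \t]*\[\[[^\]]*\]\][ \t]*$", "", text)

    # Remove templates "{{ ... }}", innermost first so nested ones are handled.
    # Assumption: template contents are not wanted in the output.
    template = re.compile(r"\{\{[^{}]*\}\}")
    while template.search(text):
        text = template.sub("", text)

    # Double-bracket links that start with a URL: keep the label after the URL.
    text = re.sub(r"\[\[(?:https?|ftp)://\S+\s+([^\]]*)\]\]", r"\1", text)
    # Piped internal links "[[target|label]]": keep the label (an assumption).
    text = re.sub(r"\[\[[^\]|]*\|([^\]]*)\]\]", r"\1", text)
    # Plain internal links "[[target]]": keep the target text.
    text = re.sub(r"\[\[([^\]]*)\]\]", r"\1", text)

    # Single-bracket external links "[http://url label]": keep the label.
    text = re.sub(r"\[(?:https?|ftp)://\S+\s+([^\]]*)\]", r"\1", text)
    # Any remaining single brackets: keep the inner text.
    text = re.sub(r"\[([^\[\]]*)\]", r"\1", text)

    # Strip HTML-like tags, including <nowiki> and <span ...>.
    text = re.sub(r"</?[A-Za-z][^>]*>", "", text)

    # Drop the blank lines left behind by removed markup.
    lines = [line.strip() for line in text.splitlines()]
    return "\n".join(line for line in lines if line)


if __name__ == "__main__":
    sample = (
        "#REDIRECT[[Blah]]\n"
        "[[http://blehthisisfake.com Link to a fake website]]\n"
        "I'm not sure what single [brackets] do\n"
    )
    print(strip_wiki_markup(sample))
```

Regular expressions like these only cover the simple cases in the example. For real pages with tables, nested links, or parser functions, a dedicated parser is likely to be more robust; one option is the third-party `mwparserfromhell` package, whose parsed object has a `strip_code()` method.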