Merging multiple strings with the same content but overlapping HTML tags in Python



<p>The title alone is not a clear question, so here is an example:</p>
<p>I have a sample string:</p>
<pre class="lang-none prettyprint-override"><code>Created and managed websites for clients to communicate securely
</code></pre>
<p>There are many "versions" of it. In a "version", a word or phrase of the string is wrapped in an HTML div tag, e.g. <code><div style="font-size: 0.1000000">foo bar</div></code>. (The tags are arbitrary; the number given to the font-size attribute corresponds to a score, and those scores will later drive other CSS features that are not relevant here.) Here are 4 versions of the string:</p>
<pre class="lang-none prettyprint-override"><code>Created and <div style="font-size: 1">managed</div> websites for clients to communicate securely
Created and <div style="font-size: 2">managed websites</div> for clients to communicate securely
Created and managed websites for clients to <div style="font-size: 3">communicate</div> securely
<div style="font-size: 4">Created</div> and managed websites for clients to communicate securely
</code></pre>
<p>I want to merge all of these versions into:</p>
<p><code><div style="font-size: 4">Created</div> and <div style="font-size: 2"><div style="font-size: 1">managed</div> websites</div> for clients to <div style="font-size: 3">communicate</div> securely</code></p>
<p>As the merged result shows, tags can overlap (here the tags with <code>font-size: 2</code> and <code>font-size: 1</code>). A string can have anywhere from 1 to 50 versions, so there may be multiple overlaps.</p>
<p>So far, my approach using regex is as follows:</p>
<pre class="lang-py prettyprint-override"><code>import re

div_str = "<div style=.*</div>"  # the div tags
div_text_str = "(?<=(>)).*(?=(</div>))"  # the content inside the div tags

# compile the regexes
div_regex = re.compile(div_str)
div_text_regex = re.compile(div_text_str)

def merge_strings(str1, str2):
    # grab the div tag off the first version
    div = div_regex.search(str1).group()
    # grab the contents of that div tag
    div_text = div_text_regex.search(div).group()
    # find the div content in the second version, then substitute
    # with the div tag
    return re.sub(div_text, div, str2)
</code></pre>
<p>I run this function in a loop, merging two strings at a time until I get the final output. The problem is that overlapping tags do not work with this function, because the regex pattern does not match them. Substituting more than one div tag at a time also fails.</p>
<p>Any help would be greatly appreciated!</p>
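One possible way around the pairwise regex substitution, offered only as a rough sketch: strip the tags from every version while recording each div as a (start, end, style) span over the shared plain text, then re-emit the text once with all spans opened and closed at their offsets. The names `extract_spans` and `merge_versions` are hypothetical, and the sketch assumes every tag looks exactly like `<div style="...">`, that all versions share identical plain text, and that spans are either disjoint or fully nested. Spans that cross each other partially, or that are empty, are not handled.

```python
import re

# Matches an opening <div style="..."> (style captured) or a closing </div>.
TAG_RE = re.compile(r'<div style="([^"]*)">|</div>')


def extract_spans(version):
    """Return (plain_text, spans); spans are (start, end, style) offsets into plain_text."""
    plain, spans, stack = [], [], []
    pos = 0      # read position in the tagged string
    length = 0   # length of the plain text collected so far
    for m in TAG_RE.finditer(version):
        chunk = version[pos:m.start()]
        plain.append(chunk)
        length += len(chunk)
        pos = m.end()
        if m.group(0) == "</div>":
            start, style = stack.pop()          # close the innermost open div
            spans.append((start, length, style))
        else:
            stack.append((length, m.group(1)))  # remember where this div opened
    plain.append(version[pos:])
    return "".join(plain), spans


def merge_versions(versions):
    """Merge tagged versions of one text (assumes disjoint or nested spans only)."""
    parsed = [extract_spans(v) for v in versions]
    text = parsed[0][0]
    spans = set()  # identical spans from different versions are kept once
    for plain, found in parsed:
        if plain != text:
            raise ValueError("versions do not share the same plain text")
        spans.update(found)

    opens, closes = {}, {}
    for start, end, style in spans:
        opens.setdefault(start, []).append((end, style))
        closes[end] = closes.get(end, 0) + 1

    out = []
    for i in range(len(text) + 1):
        out.append("</div>" * closes.get(i, 0))
        # Open the longest span first so that shorter spans nest inside it.
        for end, style in sorted(opens.get(i, []), key=lambda s: -s[0]):
            out.append(f'<div style="{style}">')
        if i < len(text):
            out.append(text[i])
    return "".join(out)


versions = [
    'Created and <div style="font-size: 1">managed</div> websites for clients to communicate securely',
    'Created and <div style="font-size: 2">managed websites</div> for clients to communicate securely',
    'Created and managed websites for clients to <div style="font-size: 3">communicate</div> securely',
    '<div style="font-size: 4">Created</div> and managed websites for clients to communicate securely',
]
print(merge_versions(versions))
```

For the four versions in the question this prints the merged string given there. Working on offsets rather than substituting matched text means a phrase that occurs twice in the sentence is not tagged in the wrong place, and any number of versions can be merged in one pass instead of two at a time.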


