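<p>Since @Kolmar has already given a <a href="https://stackoverflow.com/a/69710728/843953">regex solution</a>, I'll add one without regex.</p>
<p>To help think this through, I'll first show you my solution for <em>encoding</em> a regular string into the p-language. In this approach, I use <code>itertools.groupby()</code> to group the characters of the string by whether or not they are vowels. This function puts consecutive elements that have the same key into the same group.</p>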
<pre><code>import itertools

def p_encode(s):
    vowels = {'a', 'e', 'i', 'o', 'u'}
    s_groups = [(k, list(v)) for k, v in itertools.groupby(s, lambda c: c.lower() in vowels)]
    # For scorpion, this will look like this:
    # [(False, ['s', 'c']),
    #  (True, ['o']),
    #  (False, ['r', 'p']),
    #  (True, ['i', 'o']),
    #  (False, ['n'])]

    p_output = []
    # Now, we go over each group and do the encoding for the vowels.
    for is_vowel_group, group_chars in s_groups:
        p_output.extend(group_chars) # Add these chars to the output
        if is_vowel_group: # Special treatment for vowel groups
            p_output.append("p")
            p_output.extend(c.lower() for c in group_chars)

    return "".join(p_output)
</code></pre>
<p>I added a list comprehension to define <code>s_groups</code> to show you how it works. You can skip the list comprehension and iterate directly with <code>for is_vowel_group, group_chars in itertools.groupby(s, lambda c: c.lower() in vowels)</code>.</p>
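<p>As a minimal sketch of that direct iteration (the name <code>p_encode_direct</code> and the module-level <code>VOWELS</code> constant are made up for this illustration), note that each group yielded by <code>groupby()</code> is a one-shot iterator, so it has to be materialized with <code>list()</code> before it is used twice:</p>

```python
import itertools

VOWELS = frozenset("aeiou")  # hypothetical constant, same vowels as in p_encode


def p_encode_direct(s):
    """Variant of p_encode that loops over groupby() without building s_groups."""
    out = []
    for is_vowel_group, group in itertools.groupby(s, lambda c: c.lower() in VOWELS):
        chars = list(group)  # the group iterator can only be consumed once
        out.extend(chars)    # every group is copied to the output as-is
        if is_vowel_group:
            # Vowel groups are followed by 'p' and a lower-cased repeat
            out.append("p")
            out.extend(c.lower() for c in chars)
    return "".join(out)


print(p_encode_direct("scorpion"))  # scoporpiopion
```

<p>This should behave the same as <code>p_encode</code>; it just avoids holding all the groups in memory at once.</p>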
<hr/>
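<p>Now, for <em>decoding</em> this, we can do something similar in reverse, but this time we do the grouping manually, because we need to treat <code>p</code>s differently when they are in the middle of a vowel group.</p>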
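<p>I suggest you don't modify the string while you are iterating over it. At best, you'll have written some code that is hard to understand. At worst, you'll have bugs, because the loop will try to iterate over more indices than actually exist.</p>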
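<p>To illustrate that failure mode, here is a deliberately buggy sketch. It is not the code from the question; <code>strip_p_buggy</code> is a hypothetical function, and it works on a list because Python strings are immutable. The loop bounds are computed once from the original length, so deleting an element leaves the loop asking for an index that no longer exists:</p>

```python
def strip_p_buggy(word):
    """Deliberately buggy: shrinks the list while looping over its original indices."""
    chars = list(word)
    for i in range(len(chars)):  # range is computed once, from the original length
        if chars[i] == "p":
            del chars[i]         # the list gets shorter, but the loop bounds do not
    return "".join(chars)


try:
    strip_p_buggy("apa")
except IndexError as exc:
    print("IndexError:", exc)
```

<p>Building a new output list and leaving the input untouched, as the functions in this answer do, sidesteps the problem.</p>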
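<p>Also, you iterate over <code>1..len(p)</code> and then try to access <code>p[i+1]</code>. On the last iteration, this will throw an <code>IndexError</code>. And because you want repeated vowels to count as a single group, this approach doesn't work. You'll have to group the vowels and non-vowels separately, and then join them into a single string.</p>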
<pre><code>def p_decode(p):
    vowels = {'a', 'e', 'i', 'o', 'u'}
    p_groups = []
    current_group = None
    for c in p:
        if current_group is not None:
            # If the 'vowelness' of the current group is the same as this character
            # or ( the current group is a vowel group
            #      and the current character is a 'p'
            #      and the current group doesn't contain a 'p' already )
            if (c.lower() in vowels) == current_group[0] or \
               ( current_group[0] and
                 c.lower() == 'p' and
                 'p' not in current_group[1]):
                current_group[1].append(c) # Add c to the current group
            else:
                current_group = None # Reset the current group to None so you can make it later

        if current_group is None:
            current_group = (c.lower() in vowels, [c]) # Make the current group
            p_groups.append(current_group) # Append it to the list

    # For scorpion => scoporpiopion
    # p_groups looks like this:
    # [(False, ['s', 'c']),
    #  (True, ['o', 'p', 'o']),
    #  (False, ['r', 'p']),
    #  (True, ['i', 'o', 'p', 'i', 'o']),
    #  (False, ['n'])]

    p_output = []
    for is_vowel_group, group_chars in p_groups:
        if is_vowel_group:
            h1 = group_chars[:len(group_chars)//2] # First half of the group
            h2 = group_chars[-len(group_chars)//2+1:] # Second half of the group, excluding the p

            # Add the first half to the output
            p_output.extend(h1)

            # Compare case-insensitively: p_encode lower-cases the repeated
            # vowels, so "An" is encoded as "Apan"
            if [c.lower() for c in h1] != [c.lower() for c in h2]:
                # The second half of this group is not repeated characters
                # so something in the input was wrong!
                raise ValueError(f"Invalid input '{p}' to p_decode(): vowels before and after 'p' are not the same in group '{''.join(group_chars)}'")
        else:
            # Add all chars in non-vowel groups to the output
            p_output.extend(group_chars)

    return "".join(p_output)
</code></pre>
<p>Now, we have:</p>
<pre><code>words = ["An elephant", "scorpion", "boat", "boot", "Hello World", "stupid"]
for w in words:
    p = p_encode(w)
    d = p_decode(p)
    print(w, p, d, sep=" | ")
</code></pre>
<p>which gives (prettification mine):</p>
<div class="s-table-container">
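<table class="s-table">
<thead>
<tr><th>Word</th><th>Encoded</th><th>Decoded</th></tr>
</thead>
<tbody>
<tr><td>An elephant</td><td>Apan epelepephapant</td><td>An elephant</td></tr>
<tr><td>scorpion</td><td>scoporpiopion</td><td>scorpion</td></tr>
<tr><td>boat</td><td>boapoat</td><td>boat</td></tr>
<tr><td>boot</td><td>boopoot</td><td>boot</td></tr>
<tr><td>Hello World</td><td>Hepellopo Woporld</td><td>Hello World</td></tr>
<tr><td>stupid</td><td>stupupipid</td><td>stupid</td></tr>
</tbody>
</table>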
</div>
<p>Also, words that aren't actually encoded correctly (such as <code>"stupid"</code>) will throw a <code>ValueError</code>:</p>
<pre><code>>>> p_decode("stupid")
ValueError: Invalid input 'stupid' to p_decode(): vowels before and after 'p' are not the same in group 'upi'
</code></pre>