<p>You can use <code>groupby</code>: split the string into words and group them using <code>__contains__</code> as the key function:</p>
<pre><code>from itertools import groupby

s = "dylankid: *random words d* senpai: *random words s* dylankid: *random words d* senpai: *random words s*"
d = {"dylankid:": [], "senpai:": []}
# Key each word by whether it is one of the names in d
grps = groupby(s.split(" "), d.__contains__)
for k, v in grps:
    if k:
        # next(v) is the name; next(grps)[1] is the run of words that follows it
        d[next(v)].append(" ".join(next(grps)[1]))
print(d)
</code></pre>
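<p>Output:</p>
<pre><code>{'dylankid:': ['*random words d*', '*random words d*'], 'senpai:': ['*random words s*', '*random words s*']}
</code></pre>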
<p>Each time we hit a name, we call <code>next(grps)</code> and join the words that follow it, up to the next name.</p>
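<p>To see why this works, it can help to look at the runs that <code>groupby</code> produces. The following is only an illustrative sketch; the shortened input string and the names <code>names</code> and <code>runs</code> are made up for this example:</p>

```python
from itertools import groupby

# Hypothetical, shortened input for illustration only
names = {"dylankid:": [], "senpai:": []}
s = "dylankid: hello there senpai: hi"

# groupby is lazy, so materialise each run as a list to inspect it.
# The key is True for words that are names in the dict, False otherwise.
runs = [
    (is_name, list(words))
    for is_name, words in groupby(s.split(" "), names.__contains__)
]
print(runs)
# [(True, ['dylankid:']), (False, ['hello', 'there']), (True, ['senpai:']), (False, ['hi'])]
```

<p>The runs alternate between a name (<code>True</code>) and the words after it (<code>False</code>), which is why the loop can take the name from the current group and the message from the next one.</p>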
<p>If a name happens to have no words after it, you can pass empty lists as the default value for the <code>next</code> call:</p>
<pre><code>from itertools import groupby

s = "dylankid: *random words d* senpai: *random words s* dylankid: *random words d* senpai: *random words s* senpai:"
d = {"dylankid:": [], "senpai:": []}
grps = groupby(s.split(" "), d.__contains__)
for k, v in grps:
    if k:
        # The default [[], []] stands in for a missing (key, group) pair,
        # so a trailing name with no words gets an empty string
        d[next(v)].append(" ".join(next(grps, [[], []])[1]))
print(d)
</code></pre>
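<p>The same idea can be wrapped in a small function. This is a sketch rather than a definitive implementation: the function name <code>split_by_speaker</code>, its parameters, and the sample string are assumptions introduced here, and the default <code>(None, [])</code> plays the same role as <code>[[], []]</code> above:</p>

```python
from itertools import groupby

def split_by_speaker(text, speakers):
    """Collect the words following each speaker name (hypothetical helper)."""
    result = {name: [] for name in speakers}
    groups = groupby(text.split(" "), result.__contains__)
    for is_name, words in groups:
        if is_name:
            name = next(words)
            # If the name is the last token, there is no following group;
            # fall back to an empty "group" instead of raising StopIteration.
            _, following = next(groups, (None, []))
            result[name].append(" ".join(following))
    return result

out = split_by_speaker("dylankid: hi senpai:", ["dylankid:", "senpai:"])
print(out)
# {'dylankid:': ['hi'], 'senpai:': ['']}
```

<p>Without the default, the trailing <code>senpai:</code> would make <code>next(groups)</code> raise <code>StopIteration</code>.</p>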
<p>Some timings on a larger string:</p>
<pre><code>In [15]: dy, sn = "dylankid:", " senpai:"

In [16]: t = " foo " * 1000

In [17]: s = "".join([dy + t + sn + t for _ in range(1000)])

In [18]: %%timeit
   ....: d = {"dylankid:": [], "senpai:": []}
   ....: grps = groupby(s.split(" "), d.__contains__)
   ....: for k, v in grps:
   ....:     if k:
   ....:         d[next(v)].append(" ".join(next(grps, [[], []])[1]))
   ....:
1 loop, best of 3: 376 ms per loop

In [19]: %%timeit
   ....: PATTERN = '''
   ....:     \s*                          # Any amount of space
   ....:     (dylankid|senpai)            # Capture person
   ....:     :\s                          # Colon and single space
   ....:     (.*?)                        # Capture everything, non-greedy
   ....:     (?=\sdylankid:|\ssenpai:|$)  # Until we find following person or end of string
   ....: '''
   ....: res = defaultdict(list)
   ....: for person, message in re.findall(PATTERN, s, re.VERBOSE):
   ....:     res[person].append(message)
   ....:
1 loop, best of 3: 753 ms per loop
</code></pre>
<p>Both approaches produce the same output:</p>
<pre><code>In [20]: d = {"dylankid:": [], "senpai:": []}

In [21]: grps = groupby(s.split(" "), d.__contains__)

In [22]: for k, v in grps:
   ....:     if k:
   ....:         d[next(v)].append(" ".join(next(grps, [[], []])[1]))
   ....:

In [23]: PATTERN = '''
   ....:     \s*                          # Any amount of space
   ....:     (dylankid|senpai)            # Capture person
   ....:     :\s                          # Colon and single space
   ....:     (.*?)                        # Capture everything, non-greedy
   ....:     (?=\sdylankid:|\ssenpai:|$)  # Until we find following person or end of string
   ....: '''

In [24]: res = defaultdict(list)

In [25]: for person, message in re.findall(PATTERN, s, re.VERBOSE):
   ....:     res[person].append(message)
   ....:

In [26]: d["dylankid:"] == res["dylankid"]
Out[26]: True

In [27]: d["senpai:"] == res["senpai"]
Out[27]: True
</code></pre>
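<p>For anyone who wants to repeat the comparison outside IPython, here is a self-contained sketch using the standard <code>timeit</code> module. The function names <code>with_groupby</code> and <code>with_regex</code>, the raw-string pattern, and the much smaller test string are assumptions made for this example, so the numbers it prints will not match the timings above:</p>

```python
import re
import timeit
from collections import defaultdict
from itertools import groupby

# Same pattern as in the session above, written as a raw string
PATTERN = r'''
    \s*                          # Any amount of space
    (dylankid|senpai)            # Capture person
    :\s                          # Colon and single space
    (.*?)                        # Capture everything, non-greedy
    (?=\sdylankid:|\ssenpai:|$)  # Until the following person or end of string
'''

def with_groupby(s):
    d = {"dylankid:": [], "senpai:": []}
    grps = groupby(s.split(" "), d.__contains__)
    for k, v in grps:
        if k:
            d[next(v)].append(" ".join(next(grps, [[], []])[1]))
    return d

def with_regex(s):
    res = defaultdict(list)
    for person, message in re.findall(PATTERN, s, re.VERBOSE):
        res[person].append(message)
    return res

base = "dylankid: *random words d* senpai: *random words s*"
# Hypothetical test input: far smaller than the string timed above
s = " ".join([base] * 200)

d, res = with_groupby(s), with_regex(s)
# Keys differ only by the trailing colon
same = d["dylankid:"] == res["dylankid"] and d["senpai:"] == res["senpai"]
print("same output:", same)

print("groupby:", timeit.timeit(lambda: with_groupby(s), number=10))
print("regex:  ", timeit.timeit(lambda: with_regex(s), number=10))
```

<p>Wrapping each approach in a function keeps the setup out of the timed region and makes the equality check reusable.</p>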