<p>A fairly complex problem, but this might get you started:</p>
<pre><code>import re
from pandas import DataFrame

# one match per inventory block: the code, then the lines below the BIN header
rx = re.compile(r'''
(?:INVENTORY\ CODE:)\s*
(?P<inv>.+\S)
[\s\S]+?
^BIN.+[\n\r]
(?P<bin_msg>(?:(?!^\ ).+[\n\r])+)
''', re.MULTILINE | re.VERBOSE)

# splits one bin/message line at the first run of two or more spaces
rxbinmsg = re.compile(r'^(?P<bin>(?:(?!\ {2}).)+)\s+(?P<message>.+\S)\s*$', re.MULTILINE)

string = your_string_here  # replace with your input text

# set up the dataframe
df = DataFrame(columns=['BIN', 'INV', 'MESSAGE'])

for match in rx.finditer(string):
    inv = match.group('inv')
    bin_msg_raw = match.group('bin_msg').split("\n")
    for item in bin_msg_raw:
        for m in rxbinmsg.finditer(item):
            # append it to the dataframe
            df.loc[len(df.index)] = [m.group('bin'), inv, m.group('message')]

print(df)
</code></pre>
<h3>Explanation</h3>
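<p>The first expression looks for <code>INVENTORY CODE</code> and sets up two groups (<code>inv</code> and <code>bin_msg</code>) for further processing afterwards (note: this would be easier if you had only one bin/msg line per block, because the <code>bin_msg</code> group has to be split into lines here).<br/>
Then it splits each line into its <code>bin</code> and <code>message</code> parts and appends everything to the <code>df</code> object.</p>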
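<p>To make the two steps concrete, here is a minimal sketch that runs the same two expressions on a made-up input. The sample text, the <code>DESCRIPTION</code> line, and the helper name <code>parse_inventory</code> are assumptions for illustration only; your real data may be laid out differently, in which case the patterns need adjusting. It collects plain tuples instead of writing to a dataframe so that it runs with the standard library alone.</p>

```python
import re

# Same block-level pattern as above: inventory code, then the lines under the BIN header
rx = re.compile(r'''
    (?:INVENTORY\ CODE:)\s*
    (?P<inv>.+\S)
    [\s\S]+?
    ^BIN.+[\n\r]
    (?P<bin_msg>(?:(?!^\ ).+[\n\r])+)
    ''', re.MULTILINE | re.VERBOSE)

# Same line-level pattern: bin ends where two consecutive spaces begin
rxbinmsg = re.compile(r'^(?P<bin>(?:(?!\ {2}).)+)\s+(?P<message>.+\S)\s*$')

# Hypothetical input; the real report format is an assumption here
sample = (
    "INVENTORY CODE: ABC-123\n"
    "DESCRIPTION: widgets\n"
    "BIN    MESSAGE\n"
    "A1     low stock\n"
    "B2     reorder placed\n"
    "\n"
    "INVENTORY CODE: XYZ-9\n"
    "BIN    MESSAGE\n"
    "C3     ok\n"
)

def parse_inventory(text):
    """Return a list of (bin, inv, message) tuples found in text."""
    rows = []
    for match in rx.finditer(text):
        inv = match.group('inv')
        # the bin_msg group holds several lines, so handle them one by one
        for line in match.group('bin_msg').splitlines():
            m = rxbinmsg.match(line)
            if m:
                rows.append((m.group('bin'), inv, m.group('message')))
    return rows

rows = parse_inventory(sample)
for row in rows:
    print(row)
```

<p>If you want a dataframe at the end, passing the finished list to <code>DataFrame(rows, columns=['BIN', 'INV', 'MESSAGE'])</code> once is usually cheaper than growing it row by row with <code>df.loc</code>.</p>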