Python script to search for a variable string between two constant strings



My script so far:
<pre><code>import re

infile = open('document.txt','r')
outfile= open('output.txt','w')
copy = False
for line in infile:
    if line.strip() == "--operation():":
        bucket = []
        copy = True
    elif line.strip() == "StartOperation":
        for strings in bucket:
            outfile.write( strings + ',')
        for strings in bucket:
            outfile.write('\n')
        copy = False
    elif copy:
        bucket.append(line.strip())
</code></pre>
The CSV output should look like this:
<pre><code>id, name, poid, error
5896, AutoAuthOSUserSubmit, 900105270, 0x4002
</code></pre>
My log file has several sections, each starting with <code>==== START ====</code> and ending with <code>==== END ====</code>. I want to extract the string between <code>--operation():</code> and <code>StartOperation</code>, for example <code>AutoAuthOSUserSubmit.</code> I also want to extract the <code>poid</code> value from the line <code>poid: 900105270, poidLen: 9</code>. Finally, I want to extract the return value, for example <code>0x4002</code>, but only if <code>Roll back all updates</code> is found after it.

I was not even able to extract the raw text when <code>Start</code> and <code>End</code> are not on the same line. How can I do this?

Here is a sample log excerpt containing two sections:
<pre><code>-- 08/24 02:07:56 [mds.ecas(5896) ECAS_CP1] **==== START ====**
open file /ecas/public/onsite-be/config/timer.conf failed
INFO 08/24/16 02:07:56 salt1be-d1-ap(**5896**/0) main.c(780*****):--operation(): AutoAuthOSUserSubmit. StartOperation*****
INFO 08/24/16 02:07:56 salt1be-d1-ap(5896/0) main.c(784):--Client Information: Request from host 'malt-d1-wb' process id 12382.
DEBUG 08/24/16 02:07:56 salt1be-d1-ap(5896/0) TOci.cc(571):FetchServiceObjects: ServiceCert.sql
DEBUG 08/22/16 23:15:53 pepper1be-d1-ap(2680/0) vsserviceagent.cpp(517):Generate Certificate 2: c1cd00d5c3de082360a08730fef9cd1d
DEBUG 08/22/16 23:15:53 pepper1be-d1-ap(2680/0) junk.c(1373):GenerateWebPin : poid: **900105270**, poidLen: 9
DEBUG 08/22/16 23:15:53 pepper1be-d1-ap(2680/0) junk.c(1408):GenerateWebPin : pinStr
DEBUG 08/24/16 02:07:56 salt1be-d1-ap(5896/0) uaadapter_vasco_totp.c(275):UAVascoTOTPImpl.close() -- Releasing Adapter Context
DEBUG 08/22/16 23:15:53 pepper1be-d1-ap(2680/0) vsenterprise.cpp(288):VSEnterprise::Engage returns 0x4002 - Unknown error code **(0x4002)**
ERROR 08/22/16 23:15:53 pepper1be-d1-ap(2680/0) vsautoauth.cpp(696):OSAAEndUserEnroll: error occurred. **Roll back** all updates!
INFO 08/24/16 02:07:56 salt1be-d1-ap(5896/0) uaotptokenstoreqmimpl.cpp(199):Close token store
INFO 08/24/16 02:07:56 salt1be-d1-ap(5896/0) main.c(990):-- EndOperation
-- 08/24 02:07:56 [mds.ecas(5896) ECAS_CP1] **==== END ====**
OPERATION = AutoAuthOSUserSubmit, rc = 0x0 (0)
SYSINFO Elapse = 0.687, Heap = 1334K, Stack = 64K
</code></pre>



1 Answer

Anonymous, 1 day ago


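This looks like a job for regular expressions! Several of them, in fact. Thankfully, they are not complicated to use in this case.

Two main observations lead me to choose regular expressions over anything else: <ol> <li>You need to extract a bit of variable text from between two known constant values</li> <li>You need to follow the same pattern several times, for different strings</li> </ol>

You could try something like this:
<pre><code>import re

def read_my_file(path='document.txt'):
    # Placeholder helper: read the whole log into one string.
    with open(path) as f:
        return f.read()

def capture(text, pattern_string, flags=0):
    # Return the first capture group of the first match, or '' if none.
    pattern = re.compile(pattern_string, flags)
    match = pattern.search(text)
    if match:
        output = match.group(1)
        print('{}\n'.format(output))
        return output
    return ''

if __name__ == '__main__':
    data = read_my_file()
    # Non-greedy, so only the first START/END section is captured.
    log_pattern = r"\*\*==== START ====\*\*(.+?)\*\*==== END ====\*\*"
    log_text = capture(data, log_pattern, flags=re.DOTALL)
    op_pattern = r"--operation\(\): (.+)\. StartOperation\*\*\*\*\*"
    op_name = capture(log_text, op_pattern)
    poid_pattern = r"poid: \*\*(\d+)\*\*, poidLen: "
    poid = capture(log_text, poid_pattern)
    # DOTALL lets .+ span the line break before "Roll back".
    retcode_pattern = r"Unknown error code \*\*\((.+?)\)\*\*.+\*\*Roll back\*\* all updates!"
    retcode = capture(log_text, retcode_pattern, flags=re.DOTALL)
</code></pre>
This approach essentially splits the problem into several mostly independent steps. In each regular expression I use a capture group, such as <code>(.+)</code> or <code>(\d+)</code>, between long runs of constant characters. The DOTALL flag lets <code>.</code> match newlines, so line breaks in the text are handled like any other part of the string. (MULTILINE only changes how <code>^</code> and <code>$</code> behave, so it is not needed here.) The patterns escape the literal <code>**</code> markers that appear in the posted excerpt; drop those escapes if the real log does not contain them.

I am also making a big assumption here: that your logs are not huge files, probably a few hundred megabytes at most. Note the call to <code>read_my_file()</code>: I chose to read the whole file and work in memory rather than solve this one line at a time. If the files get very large, or you are building an application that will see a lot of traffic, this may be a bad idea.

Hope this helps!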
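The code above prints the values for one section only. As a rough sketch of how the same capture-group idea might be extended to every <code>==== START ====</code> / <code>==== END ====</code> section and to the CSV layout from the question, something like the following could work. The names <code>extract_rows</code>, <code>to_csv</code>, <code>FIELD_PATTERNS</code> and <code>SAMPLE</code> are made up for illustration. It assumes the <code>**</code> markers are optional emphasis (so they are matched if present and ignored if absent) and that the <code>id</code> is the first <code>(pid/n)</code> pair inside the section; neither assumption is confirmed by the question.

```python
import csv
import io
import re

B = r"(?:\*\*)?"  # assumption: the ** markers in the excerpt are optional emphasis

# Non-greedy body, so each START/END pair becomes its own match.
SECTION_RE = re.compile(
    B + r"==== START ====" + B + r"(.+?)" + B + r"==== END ====", re.DOTALL
)

FIELDS = ["id", "name", "poid", "error"]
FIELD_PATTERNS = {
    # assumption: the id is the first "(pid/n)" pair in the section
    "id": re.compile(r"\(" + B + r"(\d+)" + B + r"/\d+\)"),
    "name": re.compile(r"--operation\(\): (.+?)\. StartOperation"),
    "poid": re.compile(r"poid: " + B + r"(\d+)" + B + r", poidLen"),
    # The error code only counts when "Roll back all updates" follows it.
    "error": re.compile(
        r"Unknown error code " + B + r"\((0x[0-9A-Fa-f]+)\)" + B
        + r".+?" + B + r"Roll back" + B + r" all updates",
        re.DOTALL,
    ),
}


def extract_rows(log_text):
    """Return one [id, name, poid, error] row per START/END section."""
    rows = []
    for section in SECTION_RE.finditer(log_text):
        body = section.group(1)
        row = []
        for field in FIELDS:
            match = FIELD_PATTERNS[field].search(body)
            row.append(match.group(1) if match else "")  # blank when absent
        rows.append(row)
    return rows


def to_csv(rows):
    """Render the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(FIELDS)
    writer.writerows(rows)
    return buffer.getvalue()


# Shortened copy of the excerpt from the question.
SAMPLE = """-- 08/24 02:07:56 [mds.ecas(5896) ECAS_CP1] **==== START ====**
INFO 08/24/16 02:07:56 salt1be-d1-ap(**5896**/0) main.c(780*****):--operation(): AutoAuthOSUserSubmit. StartOperation*****
DEBUG 08/22/16 23:15:53 pepper1be-d1-ap(2680/0) junk.c(1373):GenerateWebPin : poid: **900105270**, poidLen: 9
DEBUG 08/22/16 23:15:53 pepper1be-d1-ap(2680/0) vsenterprise.cpp(288):VSEnterprise::Engage returns 0x4002 - Unknown error code **(0x4002)**
ERROR 08/22/16 23:15:53 pepper1be-d1-ap(2680/0) vsautoauth.cpp(696):OSAAEndUserEnroll: error occurred. **Roll back** all updates!
-- 08/24 02:07:56 [mds.ecas(5896) ECAS_CP1] **==== END ====**
"""

if __name__ == "__main__":
    print(to_csv(extract_rows(SAMPLE)), end="")
```

This still reads the whole log into memory, so the size caveat above applies. Fields that are not found in a section are left blank rather than raising an error, which keeps sections without a rollback in the output.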


