我正在处理许多文本文件,其中一些包含uuencoding,可以是.jpg或.pdf或.xlsx的.zip等。我不关心嵌入的UUencoded数据,所以我只想放弃这些段落,保留其余的文本。我正在努力想出一个方法,只跳过一点点,但不要太多。在
{以每个blob开始摘要}
begin 644 filename.extension
644开头之后的每一行似乎都是以字母开头的
^{pr2}$所以这也可能有帮助。你知道如何使用一个函数来删除文件夹(目录)中所有.txt文件的所有行吗?在
例如,下面是一个.jpg uuencoding
GRAPHIC
18
g438975g32h99a01.jpg
begin 644 g438975g32h99a01.jpg
M_]C_X``02D9)1@`!`@$`8`!@``#_[0G64&AO;=&]S:&]P(#,N,``X0DE-`^T`
M`````!``8`````$``0!@`````0`!.$))300-```````$````'CA"24T$&0``
M````!````!XX0DE-`_,```````D```````````$`.$))300*```````!```X
M0DE-)Q````````H``0`````````".$))30/U``````!(`"]F9@`!`&QF9;@`&
M```````!`"]F9@`!`*&9F@`&```````!`#(````!`%H````&```````!`#4`
M```!`"T````&```````!.$))30/X``````!P``#_____________________
M________`^@`````_____________________________P/H`````/______
M______________________\#Z`````#_____________________________
M`^@``#A"24T$"```````$`````$```)````"0``````X0DE-!!X```````0`
M````.$))300:``````!M````!@``````````````)P```+`````&`&<`,P`R
M`&@`.0`Y`````0`````````````````````````!``````````````"P````
M)P`````````````````````````````````````````````X0DE-!!$`````
M``$!`#A"24T$%```````!`````(X0DE-!`P`````!SH````!````<````!D`
M``%0```@T```!QX`&``!_]C_X``02D9)1@`!`@$`2`!(``#_[@`.061O8F4`
M9(`````!_]L`A``,"`@("0@,"0D,$0L*"Q$5#PP,#Q48$Q,5$Q,8$0P,#`P,
M#!$,#`P,#`P,#`P,#`P,#`P,#`P,#`P,#`P,#`P,`0T+"PT.#1`.#A`4#@X.
M%!0.#@X.%!$,#`P,#!$1#`P,#`P,$0P,#`P,#`P,#`P,#`P,#`P,#`P,#`P,
M#`P,#`S_P``1"``9`'`#`2(``A$!`Q$!_]T`!``'_\0!/P```04!`0$!`0$`
M`````````P`!`@0%!@<("0H+`0`!!0$!`0$!`0`````````!``(#!`4&!P@)
M"@L0``$$`0,"!`(%!P8(!0,,,P$``A$#!"$2,05!46$3(G&!,@84D:&Q0B;,D
M%5+!8C,T<H+10P)E\K.$P]-U
MX_-&)Y2DA;25Q-3D]*6UQ=7E]59F=H:6IK;&UN;V-T=79W>'EZ>WQ]?G]Q$`
M`@(!`@0$`P0%!@<'!@4U`0`"$0,A,1($05%A<2(3!3*!D12AL4(CP5+1\#,D
M8N%R@I)#4Q5C<S3Q)086HK*#!R8UPM)$DU2C%V1%539T9>+RLX3#TW7C\T:4
MI(6TE<34Y/2EM<75Y?569G:&EJ;:VQM;F]B
我只想留下
GRAPHIC
18
g438975g32h99a01.jpg
有关背景,请参阅我之前的问题How to remove weird encoding from txt file
编辑:来试试
start_marker='开始644'
with open('fileWithBegin644.txt') as inf:
ignoreLines = False
for line in inf:
if start_marker in line:
print line,
ignoreLines = True
if not ignoreLines:
with open("strip_" + inf, "w") as f:
f.write(line.get_text().encode('utf-8'))
我在跟踪错误
File "removeUuencodingFromAll.py", line 10
with open("strip_" + inf, "w") as f:
^
IndentationError: expected an indented block
我编了一个简单的发电机。因为规范有点乏味(为什么在不同的行上有两个独立的结束标记?)它相当笨重,但这里有。它应该同时作为uuencode的验证器工作,但我只在非常有限的设置中测试过它。在
这样使用:
^{pr2}$或者像这样:
或者像这样:
或者在code you seem to be using的情况下:
您链接到的sample在第二个uuencoded blob中有一个无效的长度指示符,所以我添加了一个选项来忽略它。在
相关问题 更多 >
编程相关推荐