# OfficeDissector

OfficeDissector is a parser library for static security analysis of Office Open XML (OOXML) documents, created by Grier Forensics for the Cyber System Assessments Group at MIT's Lincoln Laboratory.
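
OfficeDissector is the first parser designed specifically for security analysis of OOXML documents. It exposes all internals, including document properties, parts, content types, relationships, embedded macros and multimedia, comments, and more. It provides full JSON export and a MASTIFF-based plugin architecture. It also includes a test corpus of nearly 600 MB, unit tests with nearly 100% coverage, smoke tests that run against the entire corpus, and simple, well-factored, fully commented code.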

To download and install automatically:

    $ sudo pip install lxml  # if you don't already have lxml installed
    $ sudo pip install officedissector

Alternatively, you can download OfficeDissector from [GitHub](https://github.com/grierforensics/officedissector/) or as a [zip](https://github.com/grierforensics/officedissector/archive/mas…) and install the local copy using pip (recommended) or the Python setup script:

    $ sudo pip install /path/to/thisfolder  # recommended, since pip supports uninstall
    $ sudo python setup.py install  # alternative
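
To use OfficeDissector without installing it, point `PYTHONPATH` at the `officedissector` directory:

    $ export PYTHONPATH=/path/to/thisfolder

For more on the MASTIFF plugin architecture and the example plugins, see `mastiff-plugins/README.txt`.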

The examples below are taken from an IPython session and assume a document has already been loaded as `doc`:

    In [5]: doc.is_template

    In [6]: mp = doc.main_part()

    In [7]: mp.content_type()
    Out[7]: 'application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml'

We can read the part's data stream:

    In [17]: mp.stream().read(200)
    Out[17]: '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\r\n<w:document xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas" xmlns:mc="http://schemas.openxmlformats.org/markup-c'

    Out[…]: '… Es verwendet f\xfcr … in den Dokumenteigenschaften festgelegte Eintr\xe4ge …'
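
To make the content-type lookup and stream read above more concrete, here is a minimal sketch that uses only the Python standard library. It does not use OfficeDissector and is not its implementation; it relies on the fact that an OOXML file is a ZIP package whose `[Content_Types].xml` maps parts to content types. The helper names (`build_sample_package`, `content_type`, `read_stream`) and the sample package contents are hypothetical.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

CT_NS = "{http://schemas.openxmlformats.org/package/2006/content-types}"
MAIN_CT = ("application/vnd.openxmlformats-officedocument."
           "wordprocessingml.document.main+xml")


def build_sample_package():
    """Build a tiny, hypothetical OOXML-style ZIP package in memory."""
    content_types = (
        '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
        '<Default Extension="jpeg" ContentType="image/jpeg"/>'
        '<Override PartName="/word/document.xml" ContentType="' + MAIN_CT + '"/>'
        '</Types>'
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("[Content_Types].xml", content_types)
        zf.writestr("word/document.xml", "<document>sample body</document>")
        zf.writestr("word/media/image1.jpeg", b"\xff\xd8\xff\xe0")
    return buf.getvalue()


def content_type(zf, part_uri):
    """Look up a part's content type; Override entries win over Default ones."""
    root = ET.fromstring(zf.read("[Content_Types].xml"))
    for override in root.findall(CT_NS + "Override"):
        if override.get("PartName") == part_uri:
            return override.get("ContentType")
    extension = part_uri.rsplit(".", 1)[-1].lower()
    for default in root.findall(CT_NS + "Default"):
        if default.get("Extension", "").lower() == extension:
            return default.get("ContentType")
    return None


def read_stream(zf, part_uri, size):
    """Return the first `size` bytes of a part's data stream."""
    with zf.open(part_uri.lstrip("/")) as stream:
        return stream.read(size)


package = zipfile.ZipFile(io.BytesIO(build_sample_package()))
print(content_type(package, "/word/document.xml"))
print(read_stream(package, "/word/document.xml", 10))
```

Reading only a fixed number of bytes, as in the session above, is a cheap way to inspect a part without loading a potentially large or hostile stream into memory.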
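
We can list the relationships that originate from the main part:

    In [39]: mp.relationships_out()
    Out[39]:
    [Relationship [rId8] (source Part [/word/document.xml]),
     Relationship [rId13] (source Part [/word/document.xml]),
     Relationship [rId3] (source Part [/word/document.xml]),
     …
     Relationship [rId14] (source Part [/word/document.xml])]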

Each relationship exposes its type and its target part:

    In [40]: rel = mp.relationships_out()[0]

    In [43]: rel.type
    Out[43]: 'http://schemas.openxmlformats.org/officeDocument/2006/relationships/endnotes'

    In [46]: endnotes = rel.target_part

    In [48]: endnotes.content_type()
    Out[48]: 'application/vnd.openxmlformats-officedocument.wordprocessingml.endnotes+xml'
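
Exported to JSON, the endnotes part looks like this:

    {
        "content-type": "application/vnd.openxmlformats-officedocument.wordprocessingml.endnotes+xml",
        "uri": "/word/endnotes.xml",
        "relationships_out": [],
        "relationships_in": [
            "Relationship [rId8] (source Part [/word/document.xml])"
        ]
    }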
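
The relationship walk above can be illustrated without the library. The sketch below is a standard-library approximation, not OfficeDissector's code: in an OOXML package, the outgoing relationships of `/word/document.xml` live in `/word/_rels/document.xml.rels`, and internal targets are resolved relative to the source part. The function names and the sample relationships (including the `rId1` and `rId9` entries) are hypothetical.

```python
import posixpath
import xml.etree.ElementTree as ET

REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"
OFFICE_REL = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/"

# Hypothetical contents of /word/_rels/document.xml.rels.
SAMPLE_RELS = (
    '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId8" Type="' + OFFICE_REL + 'endnotes" Target="endnotes.xml"/>'
    '<Relationship Id="rId1" Type="' + OFFICE_REL + 'image" Target="media/image1.jpeg"/>'
    '<Relationship Id="rId9" Type="' + OFFICE_REL + 'hyperlink"'
    ' Target="http://example.com/" TargetMode="External"/>'
    '</Relationships>'
)


def rels_path(source_uri):
    """Return the URI of the .rels part that belongs to a source part."""
    directory, name = posixpath.split(source_uri)
    return posixpath.join(directory, "_rels", name + ".rels")


def relationships_out(rels_xml, source_uri):
    """Parse a .rels document and resolve internal targets to absolute URIs."""
    base = posixpath.dirname(source_uri)
    found = []
    for rel in ET.fromstring(rels_xml).findall(REL_NS + "Relationship"):
        external = rel.get("TargetMode") == "External"
        target = rel.get("Target")
        if not external:
            # Internal targets are relative to the source part's directory.
            target = posixpath.normpath(posixpath.join(base, target))
        found.append({"id": rel.get("Id"), "type": rel.get("Type"),
                      "target": target, "external": external})
    return found


for rel in relationships_out(SAMPLE_RELS, "/word/document.xml"):
    print(rel["id"], rel["target"])
```

External targets are kept as-is and flagged, since links that leave the package are often of particular interest in a security review.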
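
Features are exposed automatically:

    In [55]: doc.features.[TAB]
    …
    doc.features.comments
    doc.features.custom_properties
    doc.features.custom_xml
    doc.features.digital_signatures
    doc.features.doc
    doc.features.embedded_controls
    doc.features.embedded_objects
    doc.features.embedded_packages
    doc.features.fonts
    doc.features.get_parts
    doc.features.get_union
    doc.features.images
    doc.features.macros
    doc.features.sounds
    doc.features.videos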

    In [55]: doc.features.images
    Out[55]: [Part [/word/media/image1.jpeg]]

    In [56]: image = doc.features.images[0]

    In [58]: image.content_type()
    Out[58]: 'image/jpeg'

We can also export the binary data to JSON by setting `include_stream=True`:

    In [61]: print image.to_json(include_stream=True)
    {
        "stream_b64": "/9j/4AAQSkZJRg…",
        "content-type": "image/jpeg",
        "uri": "/word/media/image1.jpeg",
        "relationships_out": [],
        "relationships_in": [
            "Relationship [rId1] (source Part [/word/theme/theme1.xml])"
        ]
    }
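
As a rough illustration of what such an export involves, the sketch below serializes a part's metadata to JSON and, on request, embeds the binary stream as Base64 text. It is a standard-library approximation under stated assumptions, not the library's `to_json` implementation: the function `part_to_json`, its parameters, and the `stream_b64` key name are hypothetical.

```python
import base64
import json


def part_to_json(uri, content_type, data, relationships_in=(),
                 relationships_out=(), include_stream=False):
    """Serialize part metadata to JSON, optionally with a Base64 stream."""
    record = {
        "uri": uri,
        "content-type": content_type,
        "relationships_in": list(relationships_in),
        "relationships_out": list(relationships_out),
    }
    if include_stream:
        # JSON cannot hold raw bytes, so the stream is Base64-encoded first.
        record["stream_b64"] = base64.b64encode(data).decode("ascii")
    return json.dumps(record, indent=4, sort_keys=True)


# The first 12 bytes of a JFIF-style JPEG file, used as sample data.
jpeg_header = b"\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01"
print(part_to_json("/word/media/image1.jpeg", "image/jpeg", jpeg_header,
                   include_stream=True))
```

Leaving the stream out by default keeps exports small; embedding it is useful when the JSON needs to be a complete, self-contained record of the part.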

We can check for macros:

    In [62]: doc.features.macros
    Out[62]: []

or comments:

    In [63]: doc.features.comments
    Out[63]: []
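
One plausible way to derive feature lists like these is to group parts by content type. The sketch below shows that idea on a plain dictionary of part URIs and content types. It is an assumption about the general approach, not a description of how OfficeDissector computes its features; `classify_features` and the sample parts are hypothetical.

```python
VBA_CT = "application/vnd.ms-office.vbaProject"
COMMENTS_CT = ("application/vnd.openxmlformats-officedocument."
               "wordprocessingml.comments+xml")


def classify_features(parts):
    """Group part URIs into feature lists based on their content types."""
    features = {"images": [], "macros": [], "comments": []}
    for uri, ctype in sorted(parts.items()):
        if ctype.startswith("image/"):
            features["images"].append(uri)
        elif ctype == VBA_CT:
            features["macros"].append(uri)
        elif ctype == COMMENTS_CT:
            features["comments"].append(uri)
    return features


# Hypothetical part listings: one plain document, one macro-enabled document.
plain_doc = {
    "/word/document.xml": ("application/vnd.openxmlformats-officedocument."
                           "wordprocessingml.document.main+xml"),
    "/word/media/image1.jpeg": "image/jpeg",
}
macro_doc = dict(plain_doc)
macro_doc["/word/vbaProject.bin"] = VBA_CT

print(classify_features(plain_doc))
print(classify_features(macro_doc))
```

An empty list, as in the session above, then simply means that no part of the package carries the corresponding content type.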

Core properties are exposed:

    In [64]: doc.core_properties.[TAB]
    …
    doc.core_properties.content_status
    doc.core_properties.core_prop_part
    doc.core_properties.created
    doc.core_properties.creator
    doc.core_properties.description
    doc.core_properties.identifier
    doc.core_properties.keywords
    doc.core_properties.language
    doc.core_properties.last_modified_by
    doc.core_properties.last_printed
    doc.core_properties.modified
    doc.core_properties.name
    doc.core_properties.parse_all
    doc.core_properties.parse_prop
    doc.core_properties.revision
    doc.core_properties.subject
    doc.core_properties.title
    doc.core_properties.version
    doc.core_properties.category

    In [68]: doc.core_properties.modified
    Out[68]: '2009-12-04T14:47:00Z'
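
For context, OOXML core properties are stored in a small XML part (conventionally `docProps/core.xml`) that mixes the package core-properties namespace with Dublin Core elements. The sketch below reads such a part with the standard library. It is an illustration under stated assumptions rather than the library's implementation: the sample XML and its author names are made up, and the helpers only borrow the names `parse_prop` and `parse_all` from the listing above without claiming to match their real signatures.

```python
import xml.etree.ElementTree as ET

NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Hypothetical core properties part.
SAMPLE_CORE = (
    '<cp:coreProperties'
    ' xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties"'
    ' xmlns:dc="http://purl.org/dc/elements/1.1/"'
    ' xmlns:dcterms="http://purl.org/dc/terms/"'
    ' xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">'
    '<dc:creator>Example Author</dc:creator>'
    '<cp:lastModifiedBy>Example Editor</cp:lastModifiedBy>'
    '<cp:revision>2</cp:revision>'
    '<dcterms:modified xsi:type="dcterms:W3CDTF">2009-12-04T14:47:00Z</dcterms:modified>'
    '</cp:coreProperties>'
)


def parse_prop(root, qualified_name):
    """Return the text of one property such as 'dcterms:modified', or None."""
    element = root.find(qualified_name, NS)
    return element.text if element is not None else None


def parse_all(root):
    """Return every property present as a {'prefix:name': text} dictionary."""
    prefixes = {uri: prefix for prefix, uri in NS.items()}
    props = {}
    for child in root:
        # ElementTree tags look like '{namespace-uri}localname'.
        uri, _, local = child.tag[1:].partition("}")
        props[prefixes.get(uri, uri) + ":" + local] = child.text
    return props


core = ET.fromstring(SAMPLE_CORE)
print(parse_prop(core, "dcterms:modified"))
print(parse_all(core))
```

Properties that are absent from the part come back as `None`, which keeps missing metadata distinct from an empty value.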