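Plugin for using IMAGIC programs within the Scipion framework
Detailed description of the scipion-em-imagic Python project
This plugin includes two protocols that wrap the multivariate statistical analysis (MSA) module of the IMAGIC software suite. IMAGIC is licensed software: it is not distributed with Scipion and must be installed by the user.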
Installation
You need Scipion version 2.0 to run these protocols. There are two ways to install the plugin:
- Stable version
  scipion installp -p scipion-em-imagic
- Developer version
  - Download the repository
    git clone https://github.com/scipion-em/scipion-em-imagic.git
  - Install
    scipion installp -p path_to_scipion-em-imagic --devel
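In addition, you need a working IMAGIC installation. The default installation path is assumed to be software/em/imagic-180311. To change it, set IMAGIC_HOME in the scipion.conf file to the folder where IMAGIC is installed (this is the same as the IMAGIC_ROOT variable in your shell environment). If you want to use MPI-based parallel job execution, make sure the IMAGIC installation folder contains an openmpi directory.
To check the installation, run the following Scipion test: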
scipion test tests.em.workflows.test_workflow_imagicMSA.TestImagicWorkflow
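Before running the test, the folder requirements described above can be checked by hand. The following is a minimal sketch, not part of the plugin: `check_imagic_home` is a hypothetical helper, and it assumes that the value of IMAGIC_HOME is also visible as an environment variable, which may not hold in every setup.

```python
import os

# Default path quoted in the installation notes; it is relative, so it is
# resolved against the current working directory (an assumption of this sketch).
DEFAULT_IMAGIC_HOME = os.path.join("software", "em", "imagic-180311")


def check_imagic_home(imagic_home=None, need_mpi=False):
    """Return a list of problems found with the IMAGIC folder (empty if none).

    Hypothetical helper for illustration only; the plugin may validate
    its installation differently.
    """
    home = imagic_home or os.environ.get("IMAGIC_HOME", DEFAULT_IMAGIC_HOME)
    problems = []
    if not os.path.isdir(home):
        problems.append("IMAGIC folder not found: " + home)
    elif need_mpi and not os.path.isdir(os.path.join(home, "openmpi")):
        # MPI-based execution needs an openmpi directory inside the folder.
        problems.append("no openmpi directory inside: " + home)
    return problems
```

Calling `check_imagic_home(need_mpi=True)` returns an empty list when the folder exists and contains an openmpi directory, and a list of human-readable messages otherwise.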
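Supported versions
Because almost every release of IMAGIC changes how the user interacts with its programs, we have devised a way to support multiple versions. The imagic/scripts directory contains one folder per supported version, each holding batch-like scripts similar to those used by IMAGIC. This makes it possible to create equivalent scripts specific to a given version. Currently supported versions are 110308 (March 2011), 160418 (April 2016) and 180311 (March 2018). If you run into any problems or need help adapting the scripts to your IMAGIC version, please do not hesitate to create an issue on GitHub. Besides editing the scripts directory, you also need to add the version number to the _supportedVersions list in imagic/__init__.py and, if necessary, set the IMAGIC_HOME variable in scipion.conf.
Protocols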
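The per-version folder layout can be illustrated with a short sketch. Everything here is an assumption made for illustration: the helper names are hypothetical, the version list is a local copy of the one described above, and inferring the version from the suffix of the installation folder name is only one possible approach, not necessarily what the plugin does.

```python
import os

# Version numbers listed in the text. In the plugin this list is said to live
# in imagic/__init__.py; the copy here exists only for illustration.
SUPPORTED_VERSIONS = ["110308", "160418", "180311"]


def version_from_home(imagic_home):
    """Extract the version suffix from a folder name such as 'imagic-180311'."""
    name = os.path.basename(os.path.normpath(imagic_home))
    return name.rsplit("-", 1)[-1]


def scripts_dir(imagic_home, scripts_root=os.path.join("imagic", "scripts")):
    """Return the per-version scripts folder, or raise if unsupported."""
    version = version_from_home(imagic_home)
    if version not in SUPPORTED_VERSIONS:
        raise ValueError("unsupported IMAGIC version: %s (supported: %s)"
                         % (version, ", ".join(SUPPORTED_VERSIONS)))
    return os.path.join(scripts_root, version)
```

Under these assumptions, supporting a new release means adding a folder of scripts named after the version and appending that version number to the list.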
imagic - msa
Multivariate Statistical Analysis (MSA) is a powerful technique for identifying the largest variations in a large data set. It was originally introduced to discriminate between various classes of molecular projections prior to averaging. In the MSA approach, aligned molecular images are submitted to correspondence analysis (CA), which determines the main (orthogonal) directions of inter-image variance and calculates the image coordinates in a system spanned by these newly determined axes. Since this new coordinate system is adapted to the general behavior of the image data, a large reduction in the total amount of data can be obtained: for example, instead of 64x64=4096 density values (pixels) per image, each image is now characterized by at most the first eight factorial-axis coordinates. With this large data reduction, the classification of the images becomes much simpler.
To launch the MSA protocol, you have to provide an aligned (or at least centered) SetOfParticles, the number of factors (eigenvectors), the maximum number of iterations for the algorithm to converge and, if you want to analyze variance within a specific area of your particles, a mask (fig. 1). Usually 20-25 factors and a similar number of iterations are enough even for large data sets.
If you want to play with advanced parameters, select Advanced expert level and look at the Help message for any particular option.
This protocol produces only eigenimages as output. Eigenimages represent eigenvectors in the image space and account for the major density variations in the data set (fig. 3). The very first eigenimage is the total sum of all particles. The following eigenimages show the data set variance in decreasing order. The last eigenimages are usually very noisy and can be discarded from further analysis.
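The data reduction described in this section can be sketched in a few lines. Note that this is not IMAGIC's algorithm: it substitutes plain principal component analysis (with mean-centering, computed by power iteration with deflation) for correspondence analysis, so here the first axis is not the sum of all particles. The function name `leading_axes` and its parameters are hypothetical and chosen only to mirror the number of factors and the maximum number of iterations mentioned above.

```python
import random


def dot(u, v):
    """Dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def leading_axes(images, num_factors, max_iterations=25, seed=0):
    """Estimate the first `num_factors` principal axes by power iteration.

    `images` is a list of equal-length pixel lists. Returns (axes, coords),
    where coords[i] holds the coordinates of image i on the axes.
    """
    n_pix = len(images[0])
    mean = [sum(img[p] for img in images) / len(images) for p in range(n_pix)]
    centered = [[x - m for x, m in zip(img, mean)] for img in images]
    rng = random.Random(seed)
    axes = []
    for _ in range(num_factors):
        axis = [rng.uniform(-1.0, 1.0) for _ in range(n_pix)]
        for _ in range(max_iterations):
            # Remove components along axes already found (deflation).
            for prev in axes:
                proj = dot(axis, prev)
                axis = [a - proj * p for a, p in zip(axis, prev)]
            # Multiply by the scatter matrix without forming it: X^T (X v).
            scores = [dot(row, axis) for row in centered]
            axis = [sum(s * row[p] for s, row in zip(scores, centered))
                    for p in range(n_pix)]
            norm = dot(axis, axis) ** 0.5
            if norm == 0.0:
                break  # no variance left in the remaining directions
            axis = [a / norm for a in axis]
        axes.append(axis)
    coords = [[dot(row, axis) for axis in axes] for row in centered]
    return axes, coords
```

Each returned axis has one value per pixel and can be displayed as an image, which is the sense in which eigenvectors become eigenimages. Each particle is then described by `num_factors` coordinates instead of one value per pixel.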
imagic - msa-classify
After MSA you can use a subset of the eigenimages to cluster the original images (which will be reconstructed from a linear combination of the selected eigenvectors) into groups. The IMAGIC MSA module implements hierarchical ascendant classification (HAC), which merges images into clusters by minimizing intra-class variance and maximizing inter-class variance.
The msa-classify protocol requires the SetOfParticles from the previous msa run, the number of factors to use for the analysis and the number of classes. At the moment, only the first N eigenimages can be chosen for MSA-based classification. In future versions of the protocol it will be possible to select eigenimages independently and also to assign weighting coefficients for more advanced image analysis.
As always, if you want to play with advanced parameters, select Advanced expert level and look at the Help message for any particular option.
In the end you will obtain 2D classes that will most likely display what kind of heterogeneity you have in your data set.
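The idea behind hierarchical ascendant classification can be sketched as follows. This is an illustration under stated assumptions, not IMAGIC's implementation: it uses Ward's merging cost, a common criterion that merges the pair of clusters whose union increases the total intra-class variance the least, and a simple greedy search that is only practical for small inputs. The function name `ward_classify` is hypothetical; `coords` stands for the per-image coordinates on the selected factors.

```python
def ward_classify(coords, num_classes):
    """Greedy agglomerative clustering with Ward's merging cost.

    `coords` is a list of per-image factor coordinates. Returns one class
    label per image, numbered from 0.
    """
    if num_classes < 1:
        raise ValueError("num_classes must be at least 1")
    # Each cluster keeps its member indices and its centroid.
    clusters = [([i], list(c)) for i, c in enumerate(coords)]
    while len(clusters) > num_classes:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                (ma, ca), (mb, cb) = clusters[a], clusters[b]
                dist2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
                # Increase in total intra-class variance caused by the merge.
                cost = len(ma) * len(mb) / (len(ma) + len(mb)) * dist2
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        (ma, ca), (mb, cb) = clusters[a], clusters[b]
        merged = ma + mb
        # Size-weighted mean of the two centroids.
        centroid = [(len(ma) * x + len(mb) * y) / len(merged)
                    for x, y in zip(ca, cb)]
        clusters[a] = (merged, centroid)
        del clusters[b]  # b > a, so index a stays valid
    labels = [0] * len(coords)
    for label, (members, _) in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels
```

Averaging the original images that share a label would then give one 2D class average per class.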