A method for attributing the predictions of a deep network to its input features
Enhanced Integrated Gradients (EIG)
Anupama Jha and Yoseph Barash
Biociphers Lab, Departments of CIS and Genetics, University of Pennsylvania
Citation
Improving interpretability of deep learning models: splicing codes as a case study. Jha, A., Aicher, J. K., Gazzara, M. R., Singh, D., & Barash, Y. (2019). bioRxiv preprint, 700096.
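Introduction
Integrated Gradients (IG) is a method for attributing the prediction of a deep network to its input features (Sundararajan et al.). We introduce Enhanced Integrated Gradients (EIG), which extends IG with three main contributions: non-linear paths, meaningful baselines, and class-wide feature significance. These contributions let us answer interpretation questions such as: which features distinguish a class of interest from a baseline class? For example, EIG identifies the pixels that distinguish images of the digit 5 (the samples, class of interest) from images of the digit 3 (the baseline class).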
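As a point of reference for the extensions described here, the standard linear-path IG computation can be sketched in a few lines of NumPy. This is a minimal illustration, not the EIG package's API: `integrated_gradients`, `grad_f`, and the toy quadratic model are hypothetical names chosen for the example, and the path integral is approximated with a midpoint Riemann sum.

```python
import numpy as np

def integrated_gradients(grad_fn, sample, baseline, steps=200):
    """Approximate IG along the straight line from baseline to sample.

    grad_fn(x) must return the gradient of the model output w.r.t. x.
    (Hypothetical helper for illustration; not part of the eig package.)
    """
    sample = np.asarray(sample, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    # Midpoint rule: evaluate the gradient at the centre of each sub-interval.
    alphas = (np.arange(steps) + 0.5) / steps
    grads = np.stack([grad_fn(baseline + a * (sample - baseline)) for a in alphas])
    # Attribution = (sample - baseline) * average gradient along the path.
    return (sample - baseline) * grads.mean(axis=0)

# Toy stand-in for a network: f(x) = sum(w * x**2), with an analytic gradient.
w = np.array([1.0, -2.0, 0.5])

def f(x):
    return float(np.sum(w * x ** 2))

def grad_f(x):
    return 2.0 * w * x

sample = np.array([1.0, 2.0, 3.0])
baseline = np.zeros(3)
attributions = integrated_gradients(grad_f, sample, baseline)

# Completeness: attributions sum to f(sample) - f(baseline).
gap = attributions.sum() - (f(sample) - f(baseline))
```

With a real network, `grad_fn` would be supplied by the framework's automatic differentiation rather than written by hand.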
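The EIG package includes four paths, which can be computed in the original feature space or in the hidden (latent) space. To compute paths in the latent space, we assume an autoencoder that can encode samples from the original space into the hidden space and decode samples from the hidden space back into the original feature space.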
Path | Description |
---|---|
Original space Linear path (O-L-IG) | Linear path computed by linearly interpolating between the sample and the baseline in the original feature space. |
Hidden space Linear path (H-L-IG) | Linear path computed by linearly interpolating between the sample and the baseline in the hidden space. |
Original space Neighbors path (O-N-IG) | Neighbors path computed by picking nearest data points between the sample and the baseline in the original feature space. |
Hidden space Neighbors path (H-N-IG) | Neighbors path computed by picking nearest data points between the sample and the baseline in the hidden space. |
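The sketch below shows one way the path types in the table could be constructed and then integrated over. It rests on several assumptions: all function names are hypothetical and are not the eig package's API, the "autoencoder" is a toy invertible linear map, and the neighbors path is built by snapping points of the straight line to their nearest data points, which is only one plausible reading of "picking nearest data points between the sample and the baseline".

```python
import numpy as np

def path_attributions(grad_fn, path):
    """Sum gradient * step over a piecewise-linear path (n_points, n_features)."""
    path = np.asarray(path, dtype=float)
    steps = np.diff(path, axis=0)            # displacement of each segment
    mids = 0.5 * (path[:-1] + path[1:])      # gradients taken at segment midpoints
    grads = np.stack([grad_fn(m) for m in mids])
    return (grads * steps).sum(axis=0)

def linear_path(start, end, n_points=50):
    """Evenly spaced points from start to end, both included."""
    alphas = np.linspace(0.0, 1.0, n_points)[:, None]
    return start + alphas * (end - start)

def hidden_linear_path(encode, decode, baseline, sample, n_points=50):
    """Interpolate linearly in the hidden space, then decode every point."""
    z_path = linear_path(encode(baseline), encode(sample), n_points)
    return np.stack([decode(z) for z in z_path])

def neighbors_path(data, baseline, sample, n_points=10):
    """Assumed construction: snap interior line points to the nearest data point."""
    line = linear_path(baseline, sample, n_points)
    interior = [data[np.argmin(np.linalg.norm(data - p, axis=1))] for p in line[1:-1]]
    return np.vstack([baseline, *interior, sample])

# Toy model and toy invertible linear "autoencoder" (both hypothetical).
w = np.array([1.0, -2.0, 0.5])

def f(x):
    return float(np.sum(w * x ** 2))

def grad_f(x):
    return 2.0 * w * x

W = np.array([[2.0, 0.0, 0.0], [0.0, 1.0, 1.0], [0.0, 0.0, 1.0]])
W_inv = np.linalg.inv(W)

def encode(x):
    return x @ W

def decode(z):
    return z @ W_inv

rng = np.random.default_rng(0)
data = rng.normal(size=(100, 3))
baseline, sample = np.zeros(3), np.array([1.0, 2.0, 3.0])

attr_hidden = path_attributions(grad_f, hidden_linear_path(encode, decode, baseline, sample))
attr_neighbors = path_attributions(grad_f, neighbors_path(data, baseline, sample))
target = f(sample) - f(baseline)
```

Both attribution vectors sum to `f(sample) - f(baseline)`, but they split that total differently across features because the paths differ. With a real, non-linear autoencoder the decoded endpoints are reconstructions and need not coincide exactly with the original sample and baseline.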
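EIG also includes two classes of baselines: group-agnostic and group-specific. The first, the group-agnostic baseline, does not require any prior biological information to define it. The group-specific baselines use different methods (k-means, median, close, and random) to select reference points from the class of interest. These baseline points can be selected in the original feature space or in the hidden feature space.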
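To make the four group-specific strategies concrete, here is a minimal NumPy sketch of how reference points might be chosen from a set of class members. The function names are hypothetical, and the definitions are assumptions: the package's own choices (for example, what "close" is measured against, or how k-means is initialised) may differ. Passing encoded points instead of raw features would give the hidden-space variant.

```python
import numpy as np

def median_baseline(points):
    """Feature-wise median of the reference points, returned as one row."""
    return np.median(points, axis=0, keepdims=True)

def random_baselines(points, k, rng):
    """k reference points drawn without replacement."""
    idx = rng.choice(len(points), size=k, replace=False)
    return points[idx]

def close_baselines(points, sample, k):
    """Assumed meaning of 'close': the k points nearest (Euclidean) to the sample."""
    dist = np.linalg.norm(points - sample, axis=1)
    return points[np.argsort(dist)[:k]]

def kmeans_baselines(points, k, rng, n_iter=20):
    """Cluster centres from a few iterations of Lloyd's algorithm."""
    centres = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign every point to its nearest centre.
        dists = np.linalg.norm(points[:, None, :] - centres[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members):          # keep the old centre if a cluster empties
                centres[j] = members.mean(axis=0)
    return centres

rng = np.random.default_rng(0)
points = rng.normal(size=(50, 4))     # stand-in for samples of one class
sample = np.ones(4)

b_median = median_baseline(points)
b_random = random_baselines(points, 3, rng)
b_close = close_baselines(points, sample, 3)
b_kmeans = kmeans_baselines(points, 3, rng)
```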
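Finally, we include a significance testing procedure to identify the features that are significantly associated with the prediction task. The procedure first computes the relative ranks of the feature attributions for the samples belonging to a class of interest. It then computes the same ranks for random sample sets of similar size. The two sets of relative ranks are then compared using a one-sided t-test with Bonferroni correction to determine the set of significant features.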
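The procedure can be sketched with SciPy as follows. This is an illustration under stated assumptions, not the package's implementation: the function names are hypothetical, features are ranked by their raw attribution value within each sample, the test is a pooled-variance two-sample t-test in the "greater" direction, and the synthetic attributions are generated only to exercise the code.

```python
import numpy as np
from scipy import stats

def relative_ranks(attributions):
    """Rank features within each sample and scale the ranks to (0, 1]."""
    ranks = stats.rankdata(attributions, axis=1)
    return ranks / attributions.shape[1]

def significant_features(class_attr, random_attr, alpha=0.05):
    """Per-feature one-sided t-test on relative ranks, Bonferroni-corrected."""
    r_class = relative_ranks(class_attr)
    r_rand = relative_ranks(random_attr)
    # H1: a feature ranks higher in the class of interest than in random sets.
    _, p_values = stats.ttest_ind(r_class, r_rand, axis=0, alternative="greater")
    n_features = class_attr.shape[1]
    # Bonferroni: divide the significance level by the number of tests.
    selected = np.flatnonzero(p_values < alpha / n_features)
    return selected, p_values

# Synthetic attributions: feature 0 is made consistently important in the class.
rng = np.random.default_rng(0)
class_attr = rng.normal(size=(200, 20))
class_attr[:, 0] += 5.0
random_attr = rng.normal(size=(200, 20))

selected, p_values = significant_features(class_attr, random_attr)
```

Dividing `alpha` by the number of features is the Bonferroni step; it controls the chance of any false positive across all per-feature tests.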
Installation
EIG can be installed using:
pip install eig
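Examples
The following files contain examples that use the EIG paths and baselines. We demonstrate EIG with a convolutional neural network (CNN) on MNIST digits and with a feed-forward network (DNN) on splicing data.
To run the splicing examples, please download the splicing data from here and place the file in the data folder.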
File | Description |
---|---|
O-L-IG path with digits CNN | This notebook contains MNIST digit examples with linear path in the original feature space with group specific baselines (median, k-means, close, random) and group agnostic baseline (encoded_zero). |
H-L-IG path with digits CNN | This notebook contains MNIST digit examples with linear path in the latent feature space with group specific baselines (median, k-means, close, random) and group agnostic baseline (encoded_zero). |
O-N-IG path with digits CNN | This notebook contains MNIST digit examples with neighbors path in the original feature space with group specific baselines (median, k-means, close, random) and group agnostic baseline (encoded_zero). |
H-N-IG path with digits CNN | This notebook contains MNIST digit examples with neighbors path in the latent feature space with group specific baselines (median, k-means, close, random) and group agnostic baseline (encoded_zero). |
O-L-IG path with splicing DNN | This notebook contains splicing examples with linear path in the original feature space with group specific baselines (median, k-means, close, random) and group agnostic baseline (encoded_zero). |
H-L-IG path with splicing DNN | This notebook contains splicing examples with linear path in the latent feature space with group specific baselines (median, k-means, close, random) and group agnostic baseline (encoded_zero). |
O-N-IG path with splicing DNN | This notebook contains splicing examples with neighbors path in the original feature space with group specific baselines (median, k-means, close, random) and group agnostic baseline (encoded_zero). |
H-N-IG path with splicing DNN | This notebook contains splicing examples with neighbors path in the latent feature space with group specific baselines (median, k-means, close, random) and group agnostic baseline (encoded_zero). |