Heatmap of huge data

Region ATF3 BCL3 BCLAF1 BDP1 BRF1 BRF2 Brg1 CCNT2 CEBPB CHD2 CTCF CTCFL E2F6 ELF1
chr1:109102470:109102970 0 0 1 0 0 0 0 1 0 0 4 1 4 1
chr1:110526886:110527386 0 0 0 0 0 0 0 1 1 0 4 1 0 1
chr1:115300671:115301171 0 0 1 0 0 0 0 0 1 1 4 1 1 1
chr1:115323308:115323808 0 0 0 0 0 0 0 1 0 0 2 1 1 0
chr1:11795641:11796141 1 0 0 0 0 0 0 1 2 0 0 0 1 0
chr1:118148103:118148603 0 0 0 0 0 0 0 1 0 0 0 0 0 1
chr1:150521397:150521897 0 0 0 0 0 0 0 2 2 0 6 2 4 0
chr1:150601609:150602109 0 0 0 0 0 0 0 0 3 2 0 0 1 0
chr1:150602098:150602598 0 0 0 0 0 0 0 0 1 1 0 0 0 0
chr1:151119140:151119640 0 0 0 0 0 0 0 1 0 0 0 0 1 0
chr1:151128604:151129104 0 0 0 0 0 0 0 0 0 0 3 0 0 0
chr1:153517729:153518229 0 0 0 0 0 0 0 0 0 0 0 0 0 0
chr1:153962738:153963238 0 0 0 0 0 0 0 1 1 0 0 0 0 1
chr1:154155682:154156182 0 0 0 0 0 0 0 1 0 0 0 0 1 1
chr1:154155725:154156225 0 0 0 0 0 0 0 1 0 0 0 0 1 1
chr1:154192154:154192654 0 0 0 0 0 0 0 0 0 0 0 0 0 0
chr1:154192824:154193324 1 0 0 0 0 0 0 1 0 1 0 0 1 1
chr1:154192943:154193443 1 0 0 0 0 0 0 1 0 2 0 0 1 1
chr1:154193273:154193773 1 0 0 0 0 0 0 1 0 2 0 0 2 1
chr1:154193313:154193813 0 0 0 0 0 0 0 1 0 2 0 0 2 1
chr1:155904188:155904688 0 0 0 0 0 0 0 1 0 0 0 0 1 1
chr1:155947966:155948466 0 0 0 0 0 0 0 1 0 0 3 0 0 1
chr1:155948336:155948836 0 0 0 0 0 0 0 1 0 0 5 1 0 1
chr1:156023516:156024016 0 0 0 0 0 0 0 1 0 1 4 1 1 1
chr1:156024016:156024516 0 1 1 0 0 0 0 0 0 2 0 0 1 1
chr1:156163229:156163729 0 0 0 0 0 0 0 0 0 0 2 0 0 1
chr1:160990902:160991402 0 0 0 0 0 0 0 0 0 1 0 0 1 2
chr1:160991133:160991633 0 0 0 0 0 0 0 0 0 1 0 0 1 2
chr1:161474704:161475204 0 0 0 0 0 0 0 0 0 0 0 0 0 0
chr1:161509530:161510030 0 0 1 1 1 0 0 0 1 0 1 0 0 1
chr1:161590964:161591464 0 0 0 1 1 0 0 0 0 0 0 0 0 0
chr1:169075446:169075946 0 0 0 0 0 0 0 2 0 0 4 0 3 0
chr1:17053279:17053779 0 0 0 1 0 0 0 0 0 1 0 0 0 0
chr1:1709909:1710409 0 0 0 0 0 0 0 2 0 1 0 0 3 1
chr1:1710297:1710797 0 0 0 0 0 0 0 0 0 1 6 0 1 1

2 Answers

User

#1 · edited 2024-05-18 07:13:40

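Following the comments on my other answer, the OP had a further question about searching for 2D clusters. Here is an answer to that.

I use a method called find_clusters, taken from my library eegpy. It walks through a 2D array and finds all clusters of neighboring cells that lie above or below a given threshold.

Here is my code: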

import matplotlib.pyplot as plt
import numpy as np
from queue import Queue  # Python 3 name of the Python 2 "Queue" module


def find_clusters(ar, thres, cmp_type="greater"):
    """For a given 2d-array (test statistic), find all clusters which
    are above/below a certain threshold.
    """
    if cmp_type not in ["lower", "greater", "abs_greater"]:
        raise ValueError("cmp_type must be in [\"lower\",\"greater\",\"abs_greater\"]")
    clusters = []
    # Boolean array marking the cells that satisfy the threshold condition
    if cmp_type == "lower":
        ar_in = (ar < thres).astype(bool)
    elif cmp_type == "greater":
        ar_in = (ar > thres).astype(bool)
    else:  # cmp_type == "abs_greater"
        ar_in = (abs(ar) > thres).astype(bool)

    already_visited = np.zeros(ar_in.shape, dtype=bool)
    for i_s in range(ar_in.shape[0]):  # i_s as in i_sample
        for i_f in range(ar_in.shape[1]):
            if not already_visited[i_s, i_f]:
                if ar_in[i_s, i_f]:
                    # Breadth-first flood fill starting from this cell
                    mask = np.zeros(ar_in.shape, dtype=bool)
                    check_queue = Queue()
                    check_queue.put((i_s, i_f))
                    while not check_queue.empty():
                        pos_x, pos_y = check_queue.get()
                        if not already_visited[pos_x, pos_y]:
                            already_visited[pos_x, pos_y] = True
                            if ar_in[pos_x, pos_y]:
                                mask[pos_x, pos_y] = True
                                # Direct (4-connected) neighbors
                                for coords in [(pos_x-1, pos_y), (pos_x+1, pos_y), (pos_x, pos_y-1), (pos_x, pos_y+1)]:
                                    if 0 <= coords[0] < ar_in.shape[0] and 0 <= coords[1] < ar_in.shape[1]:
                                        check_queue.put(coords)
                    clusters.append(mask)
    return clusters

fn = "14318737.txt"
with open(fn, "r") as f:
    labels = f.readline().rstrip("\n").split()[1:]
# Column 0 holds the region names; the converter replaces them with 0
data = np.loadtxt(fn, skiprows=1, converters={0: lambda x: 0})

clusters = find_clusters(data, 0, "greater")

plot_data = np.ma.masked_equal(data[:, 1:], 0)

plt.subplots_adjust(left=0.1, bottom=0.15, right=0.99, top=0.95)
# The extent shifts the image so that x positions match the column indices of data
plt.imshow(plot_data, cmap="Reds", interpolation="nearest", aspect="auto",
           vmin=0, extent=[0.5, plot_data.shape[1] + 0.5, plot_data.shape[0] - 0.5, -0.5])
plt.colorbar()

for cl in clusters:
    plt.contour(cl.astype(int), [0.5], colors="k", linewidths=2)
# One tick per factor label (columns 1..len(labels) of data)
plt.xticks(np.arange(1, len(labels) + 1), labels, rotation=90, va="top", ha="center")

plt.show()

This produces the following image:

[Image: plot with contours around the clusters]

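clusters is a list of boolean 2D arrays (True/False). Each array represents one cluster, and each boolean value indicates whether a particular "point" is part of that cluster. You can use these arrays in any further analysis.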
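To illustrate how such boolean masks can be used, here is a minimal sketch on a toy array. The function `find_clusters_4conn` is a compact stand-in written for this example (it is not the eegpy implementation), and the `toy` array is made up.

```python
from collections import deque

import numpy as np


def find_clusters_4conn(cells):
    """Return one boolean mask per 4-connected group of True cells.

    Hypothetical helper for this sketch only.
    """
    n_rows, n_cols = cells.shape
    visited = np.zeros(cells.shape, dtype=bool)
    clusters = []
    for r in range(n_rows):
        for c in range(n_cols):
            if visited[r, c] or not cells[r, c]:
                continue
            # Breadth-first flood fill from the first unvisited True cell
            mask = np.zeros(cells.shape, dtype=bool)
            queue = deque([(r, c)])
            visited[r, c] = True
            while queue:
                y, x = queue.popleft()
                mask[y, x] = True
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < n_rows and 0 <= nx < n_cols
                    if inside and cells[ny, nx] and not visited[ny, nx]:
                        visited[ny, nx] = True
                        queue.append((ny, nx))
            clusters.append(mask)
    return clusters


# Made-up toy data: three separate groups of non-zero cells
toy = np.array([[0, 2, 1, 0],
                [0, 0, 3, 0],
                [4, 0, 0, 0],
                [1, 0, 0, 5]])
toy_clusters = find_clusters_4conn(toy > 0)

for cl in toy_clusters:
    size = int(cl.sum())               # number of cells in the cluster
    total = int(toy[cl].sum())         # sum of the data inside the cluster
    cells = np.argwhere(cl).tolist()   # (row, column) pairs of its members
    print(size, total, cells)
```

Indexing the data with a mask (`toy[cl]`) selects exactly the values that belong to that cluster, so sizes, sums, or means per cluster follow directly.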

EDIT

Now for some more fun with the clusters

[Code block missing in the source]

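I filter for all clusters that contain more than 5 points, and I plot only those. Alternatively, you could use the sum of data within each cluster as the criterion. Then I sort these large clusters by size in descending order.

Finally, I print a summary of all large clusters, including the names of all the factors each cluster extends across.

[Image: large clusters only]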
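The filtering, sorting, and summary steps described above might look roughly like the following. This is a sketch, not the code from the original answer: `summarize_large_clusters` is a hypothetical helper, the masks are built by hand, and it assumes each mask has one column per factor label (masks computed on an array that still contains the dummy region column would need an offset of one).

```python
import numpy as np


def summarize_large_clusters(clusters, labels, min_points=5):
    """Keep clusters with more than `min_points` cells, largest first.

    Returns the kept masks and a list of (size, factor names) tuples.
    """
    large = [cl for cl in clusters if cl.sum() > min_points]
    large.sort(key=lambda cl: cl.sum(), reverse=True)
    summary = []
    for cl in large:
        cols = np.flatnonzero(cl.any(axis=0))  # columns the cluster touches
        summary.append((int(cl.sum()), [labels[c] for c in cols]))
    return large, summary


# Hand-built example masks on a 7 x 5 grid (illustrative only)
labels = ["CEBPB", "CHD2", "CTCF", "CTCFL", "E2F6"]
shape = (7, len(labels))
a = np.zeros(shape, dtype=bool)
a[0:3, 0:2] = True   # 6 cells
b = np.zeros(shape, dtype=bool)
b[5:7, 0] = True     # 2 cells, will be filtered out
c = np.zeros(shape, dtype=bool)
c[0:4, 3:5] = True   # 8 cells

large, summary = summarize_large_clusters([a, b, c], labels)
for size, factors in summary:
    print(size, ", ".join(factors))
```

To rank by the summed data instead of the cell count, one could replace `cl.sum()` with `data[cl].sum()` in the filter and the sort key.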

User

#2 · edited 2024-05-18 07:13:40

Using Matplotlib:

import matplotlib.pyplot as plt
import numpy as np

# Skip the header row; column 0 (region names) is converted to 0 and dropped below
data = np.loadtxt("14318737.txt", skiprows=1, converters={0: lambda x: 0})
plot_data = np.ma.masked_equal(data[:, 1:], 0)

plt.imshow(plot_data, cmap="Reds", interpolation="nearest")
plt.colorbar()

plt.show()

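I ignore the first row and the first column (if they are needed for labels, this has to change). In the remaining data, all zero values are masked (so they appear white in the plot), and the data is then plotted as a color-coded image.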
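As a small illustration of the masking step, here is what `np.ma.masked_equal` does to a made-up array of counts:

```python
import numpy as np

# Made-up counts; zeros should not be colored in the heatmap
counts = np.array([[0, 1, 4],
                   [2, 0, 0]])

masked = np.ma.masked_equal(counts, 0)

print(masked)          # masked entries are shown as "--"
print(masked.mask)     # True where the original value was 0
print(masked.count())  # number of cells that are not masked
```

With the default colormap settings, imshow leaves masked cells transparent, so the white axes background shows through.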

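imshow has a range of other parameters to control the result, for example origin (lower/upper) and aspect (auto/equal/some_ratio).

You wrote about regions: do you mean geographic regions? In that case you may want to look at the Basemap Toolkit for Matplotlib to create color-coded maps.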
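A minimal sketch of how `origin` and `aspect` change the result, using a made-up 3 x 4 array and two side-by-side axes:

```python
import matplotlib.pyplot as plt
import numpy as np

demo = np.arange(12).reshape(3, 4)  # made-up values 0..11

fig, (ax_upper, ax_lower) = plt.subplots(1, 2, figsize=(8, 3))

# origin="upper" (the usual default): row 0 is drawn at the top.
# aspect="equal": every cell is drawn as a square.
ax_upper.imshow(demo, origin="upper", aspect="equal", interpolation="nearest")
ax_upper.set_title('origin="upper", aspect="equal"')

# origin="lower": row 0 is drawn at the bottom.
# aspect="auto": cells are stretched to fill the axes.
ax_lower.imshow(demo, origin="lower", aspect="auto", interpolation="nearest")
ax_lower.set_title('origin="lower", aspect="auto"')
```

Call `plt.show()` or `fig.savefig(...)` afterwards to view the figure. For a table with many rows and few columns, `aspect="auto"` is usually the more readable choice.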

EDIT

New requirements, new example

[Code block missing in the source]

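Now I first read the labels from the first row. I add the keyword argument aspect to the imshow call. I create a tick label for each factor.

In addition, I adjust the position of the plot with subplots_adjust. You can play with these parameters until they fit your needs.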

The result now looks like this:

[Image: resulting heatmap]

If you want different ticks on the y-axis, use plt.yticks in the same way as xticks in my example.
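For instance, the region names could be used as y tick labels. The snippet below is a sketch on a small excerpt of the question's data (three regions, last four factors); the left margin value is an assumption chosen to fit the long labels.

```python
import matplotlib.pyplot as plt
import numpy as np

# Excerpt from the question's table: three regions, last four factors
regions = ["chr1:109102470:109102970",
           "chr1:110526886:110527386",
           "chr1:115300671:115301171"]
factors = ["CTCF", "CTCFL", "E2F6", "ELF1"]
values = np.array([[4, 1, 4, 1],
                   [4, 1, 0, 1],
                   [4, 1, 1, 1]])

plt.figure()
plt.subplots_adjust(left=0.35, bottom=0.2)  # room for the long region labels
plt.imshow(np.ma.masked_equal(values, 0), cmap="Reds",
           interpolation="nearest", aspect="auto", vmin=0)
plt.colorbar()

# Rows sit at y = 0, 1, 2 because no extent is passed to imshow
plt.yticks(np.arange(len(regions)), regions)
plt.xticks(np.arange(len(factors)), factors, rotation=90)
```

With thousands of regions, labelling every row is not readable; passing only every n-th position and label to plt.yticks is one way to thin them out.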
