How do I draw a proper distribution tree plot?

Posted on 2024-09-30 10:35:27


I'm using Python with matplotlib and need to visualize the percentage distribution of the subgroups of a dataset.

Imagine this tree:

Data --- group1 (40%)
     |
     --- group2 (25%)
     |
     --- group3 (35%)


group1 --- A (25%)
       |
       --- B (25%)
       |
       --- C (50%)

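This can go on further: each group can have several subgroups, and each of those can in turn be split the same way.

How can I draw a suitable chart for this information?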
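Reading each percentage in the tree as relative to its parent node (which is also how the answer below computes its percentages), the share of the whole dataset is the product of the percentages along the path: group1 → C is 40% × 50% = 20% of all data. A minimal sketch, assuming the tree is stored as nested dicts; the `tree` structure and the `overall_shares` helper are hypothetical and not part of the question:

```python
# Hypothetical nested structure: name -> (share of parent, children).
# Only group1's children are given in the question, so the other groups are leaves here.
tree = {
    "group1": (0.40, {"A": (0.25, {}), "B": (0.25, {}), "C": (0.50, {})}),
    "group2": (0.25, {}),
    "group3": (0.35, {}),
}


def overall_shares(tree, parent_share=1.0, path=()):
    """Return {path: share of the whole dataset} for every node, depth first."""
    shares = {}
    for name, (share, children) in tree.items():
        node_path = path + (name,)
        # A node's share of the whole data is its parent's share times its own
        shares[node_path] = parent_share * share
        # Recurse so that arbitrarily deep trees are handled the same way
        shares.update(overall_shares(children, shares[node_path], node_path))
    return shares


for path, share in overall_shares(tree).items():
    print(" / ".join(path), f"{share:.0%}")
```

This prints group1 at 40%, A and B at 10% each, C at 20%, group2 at 25% and group3 at 35%. Because the function recurses, deeper levels need no extra code.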



I created a minimal reproducible example that I think matches your description, but let me know if this isn't what you need.

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Random example data: every row belongs to one group and one subgroup
data = pd.DataFrame()
n_rows = 100
data['group'] = np.random.choice(['1', '2', '3'], n_rows)
data['subgroup'] = np.random.choice(['A', 'B', 'C'], n_rows)

For example, we might get the following subgroup counts (the data is random, so your numbers will differ):

In [1]: data.groupby(['group'])['subgroup'].value_counts()
Out[1]:
group  subgroup
1      A           17
       C           16
       B            5
2      A           23
       C           10
       B            7
3      C            8
       A            7
       B            7
Name: subgroup, dtype: int64
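The counts above are absolute, while the question asks for percentages. pandas can normalize them directly with `value_counts(normalize=True)`, both for the top level and for each subgroup within its parent group. A sketch on a small hypothetical DataFrame (not the random data above):

```python
import pandas as pd

# Small deterministic stand-in for the random data (hypothetical values)
data = pd.DataFrame({
    "group":    ["1", "1", "1", "1", "2", "2"],
    "subgroup": ["A", "A", "B", "C", "A", "C"],
})

# Share of each group in the whole dataset
group_share = data["group"].value_counts(normalize=True)

# Share of each subgroup within its parent group; the values sum to 1 per group
subgroup_share = data.groupby("group")["subgroup"].value_counts(normalize=True)

print(group_share)
print(subgroup_share)
```

Here group 1 holds 4 of the 6 rows, and within group 1 subgroup A accounts for 50%.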

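I wrote a function that, given an ordering of the columns (e.g. ['group', 'subgroup']), computes the necessary counts and draws one column of stacked bars per level, each bar annotated with its percentage of the parent group.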

import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import numpy as np

def plot_tree(data, ordering, axis=False):
    """
    Plots a sequence of stacked bar plots reflecting how the data
    is distributed at different levels. The order of the
    levels is given by the ordering parameter.

    Parameters
    ----------
    data: pandas DataFrame
    ordering: list
        Names of the columns to be plotted. They should be
        ordered top down, from the larger to the smaller group.
    axis: boolean
        Whether to plot the axis.

    Returns
    -------
    fig: matplotlib figure object.
        The final tree plot.
    """

    # Frame set-up: one column of bars for 'All' plus one per level
    fig, ax = plt.subplots(figsize=(9.2, 3*len(ordering)))
    ax.set_xticks(np.arange(-1, len(ordering)) + 0.5)
    ax.set_xticklabels(['All'] + ordering, fontsize=18)
    if not axis:
        ax.axis('off')

    # Get colormap: one color per distinct label across all levels
    labels = ['All']
    for o in reversed(ordering):
        labels.extend(data[o].unique().tolist())
    # Pastel1 is nice but has only 9 colors (they repeat beyond that).
    # Change for a larger map if needed. plt.get_cmap is used because
    # matplotlib.cm.get_cmap was removed in Matplotlib 3.9.
    cmap = plt.get_cmap('Pastel1', len(labels))
    colors = dict(zip(labels, [cmap(i) for i in range(len(labels))]))

    # Group the counts: c_<col> is the size of each group at that level
    counts = data.groupby(ordering).size().reset_index(name='c_' + ordering[-1])
    for i, o in enumerate(ordering[:-1], 1):
        counts['c_' + o] = counts.groupby(ordering[:i])['c_' + ordering[-1]].transform('sum')
    # Calculate percentages: the top level relative to all data,
    # every deeper level relative to its parent group
    counts['p_' + ordering[0]] = counts['c_' + ordering[0]]/data.shape[0]
    for i, o in enumerate(ordering[1:], 1):
        counts['p_' + o] = counts['c_' + o]/counts['c_' + ordering[i-1]]

    # Plot first bar - all data
    ax.bar(-1, data.shape[0], width=1, color=colors['All'], align="edge")
    ax.annotate('All   100%', (-0.9, 0.5), fontsize=12)
    for bar, col in enumerate(ordering):
        # Get only the relevant counts at this level (rows stay sorted by parent group)
        local_counts = counts[ordering[:bar+1] +
                              ['c_' + o for o in ordering[:bar+1]] +
                              ['p_' + o for o in ordering[:bar+1]]].drop_duplicates()
        sizes = local_counts['c_' + col]
        percs = local_counts['p_' + col]
        # Take the labels from the counts themselves, so a missing
        # combination cannot shift labels onto the wrong bars
        labels = local_counts[col]
        bottom = 0  # start stacking from 0
        for size, perc, label in zip(sizes, percs, labels):
            ax.bar(bar, size, width=1, bottom=bottom, color=colors[label], align="edge")
            ax.annotate('{}   {:.0%}'.format(label, perc), (bar+0.1, bottom+0.5), fontsize=12)
            bottom += size  # stack the bars
    # Build the legend explicitly so each label appears once with its own color
    ax.legend(handles=[mpatches.Patch(color=c, label=l) for l, c in colors.items()])
    return fig

With the data shown above, we get the following result:

fig = plot_tree(data, ['group', 'subgroup'], axis=True)

[Figure: tree plot example]
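The counting step inside `plot_tree` can also be run on its own to inspect the intermediate table, which helps when checking deeper trees like the ones the question mentions. A sketch of that step as a hypothetical standalone helper `level_counts`, mirroring the counting part of `plot_tree`, applied to a small made-up three-level dataset (the `leaf` column is an assumption for illustration):

```python
import pandas as pd


def level_counts(data, ordering):
    """Hypothetical helper: per-level counts c_<col> and parent-relative shares p_<col>."""
    last = ordering[-1]
    # One row per finest-level combination, with its row count
    counts = data.groupby(ordering).size().reset_index(name="c_" + last)
    # Counts of the coarser levels: sum the finest counts within each parent group
    for i, col in enumerate(ordering[:-1], 1):
        counts["c_" + col] = counts.groupby(ordering[:i])["c_" + last].transform("sum")
    # Top level relative to the whole dataset, deeper levels relative to their parent
    counts["p_" + ordering[0]] = counts["c_" + ordering[0]] / len(data)
    for i, col in enumerate(ordering[1:], 1):
        counts["p_" + col] = counts["c_" + col] / counts["c_" + ordering[i - 1]]
    return counts


data = pd.DataFrame({
    "group":    ["1", "1", "1", "1", "2", "2"],
    "subgroup": ["A", "A", "B", "B", "A", "A"],
    "leaf":     ["x", "y", "x", "x", "x", "x"],
})
print(level_counts(data, ["group", "subgroup", "leaf"]))
```

Each row of the result carries the counts and percentages of every ancestor, which is why `plot_tree` calls `drop_duplicates()` after selecting the columns of one level.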
