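Bubble plot - a data visualization package

Detailed description of the bubble-plot Python project


Bubble plot

Hi everyone,

I love data visualizations! If you love them too, I think you will find this bubble plot very nice and useful.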

How to install

Very simple - just write in your command line:

^{pr 1}$

Motivation & Usage

The goal of the bubble plot is to help us visualize linear and non-linear relationships between numerical and categorical features in our data in a simple way. The bubble plot is a kind of 2-dimensional histogram drawn with bubbles. It suits every combination of categorical and numerical features.

The size of each bubble is proportional to the frequency of the data points that fall at that location.
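To make the idea of "size proportional to frequency" concrete, here is a minimal sketch using pandas. It is not the package's own implementation; the column names and the `max_bubble_size` constant are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Toy data; the column names are illustrative only.
df = pd.DataFrame({
    "x": ["a", "a", "a", "b", "b", "c"],
    "y": ["u", "u", "v", "u", "v", "v"],
})

# Count how many rows fall on each (y, x) combination.
counts = pd.crosstab(df["y"], df["x"])

# Scale the counts so the most frequent cell gets the largest bubble.
max_bubble_size = 1000  # hypothetical scaling constant
sizes = counts / counts.values.max() * max_bubble_size
print(sizes)
```

Cells that never occur in the data get a size of zero, so no bubble would be drawn for them.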

Function signature:

^{pr 2}$

For numerical features, the values are presented in buckets. Ten equally spaced bins are used by default; you can provide specific bins or the number of bins through the bin parameters for x and y.
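The bucketing described above can be sketched with `pandas.cut`. This only illustrates equal-width binning and explicit bin edges; whether the package uses `pandas.cut` internally is an assumption, and the sample values are made up.

```python
import pandas as pd

# Made-up numerical feature.
ages = pd.Series([18, 25, 33, 41, 47, 52, 60, 68, 75, 90])

# Ten equal-width bins spanning the observed range (the default described above).
binned = pd.cut(ages, bins=10)

# Alternatively, pass explicit bin edges; intervals are closed on the right.
custom = pd.cut(ages, bins=[0, 30, 60, 100])

print(binned.cat.categories)
print(custom.value_counts().sort_index())
```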

For categorical features, the values are presented according to their categories. If you would like the categories to appear in a specific order, supply a list of the values in that order through the ordering parameters for x and y.
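One way to think about an explicit category order is as an ordered pandas categorical. The sketch below is an illustration under that assumption, not the package's code; the education levels are made-up sample values.

```python
import pandas as pd

# Made-up categorical feature.
education = pd.Series(["Masters", "HS-grad", "Bachelors", "HS-grad", "Doctorate"])

# The order in which we want the categories to appear on the axis.
order = ["HS-grad", "Bachelors", "Masters", "Doctorate"]

# An ordered categorical keeps that order when counting and sorting.
ordered = pd.Categorical(education, categories=order, ordered=True)
counts = pd.Series(ordered).value_counts().sort_index()
print(counts)
```

Without the explicit list, the categories would typically appear in alphabetical or first-seen order, which is rarely meaningful for ordinal values such as education levels.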

Any pairing works: numerical vs. numerical, numerical vs. categorical, categorical vs. numerical, or categorical vs. categorical.

Setting the parameter normalization_by_all to False plots P(y|x), the distribution of y given x. Each column in this plot is an independent (1D) histogram of the values of y for that x. Setting normalization_by_all to True plots the joint distribution of x and y, P(x,y), which is in effect a 2D histogram drawn with bubbles.
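The difference between the two normalizations can be seen on a small count table. This is a sketch with `pandas.crosstab` on made-up data, not the package's implementation: normalizing per column gives P(y|x), and normalizing over the whole table gives P(x,y).

```python
import pandas as pd

# Made-up data with three rows at x="a" and one row at x="b".
df = pd.DataFrame({
    "x": ["a", "a", "a", "b"],
    "y": ["u", "u", "v", "v"],
})

# P(y|x): every column (one value of x) sums to 1 on its own.
p_y_given_x = pd.crosstab(df["y"], df["x"], normalize="columns")

# P(x,y): the whole table sums to 1.
p_xy = pd.crosstab(df["y"], df["x"], normalize="all")

print(p_y_given_x)
print(p_xy)
```

In the conditional table the single row at x="b" gets probability 1, even though it is only a quarter of the data; the joint table shows it as 0.25.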

Enabling the log option applies the natural log function, element-wise, to the counts. This makes the difference between the largest and the smallest bubble much smaller, so it is worth using when the frequencies of different values differ greatly.
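A quick numeric check shows how strongly the natural log compresses the range of counts. The counts below are made up, and the snippet only demonstrates the transformation itself, not how the package applies it.

```python
import numpy as np

# Made-up bucket counts spanning three orders of magnitude.
counts = np.array([10, 100, 10000])

# Element-wise natural log of the counts.
log_counts = np.log(counts)

# Ratio between the largest and smallest value, before and after the log.
raw_ratio = counts.max() / counts.min()
log_ratio = log_counts.max() / log_counts.min()
print(raw_ratio, log_ratio)
```

Note that the log of a count of 1 is 0, so a bucket holding a single data point would shrink to nothing under a plain log transform.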

Setting the z_boolean parameter to the name of a boolean field, or of a categorical field with two categories, makes the color of each bucket proportional to the ratio of the z values within that bucket: (z == value_1).sum() / ((z == value_1).sum() + (z == value_2).sum()).
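The per-bucket ratio can be sketched with a pandas group-by. The data frame, the `income` column, and the `is_high` helper column are hypothetical; the point is only that the mean of a boolean column within a bucket equals the ratio in the formula above.

```python
import pandas as pd

# Made-up data: two buckets, with a two-category field "income".
df = pd.DataFrame({
    "x": ["a", "a", "a", "b", "b"],
    "y": ["u", "u", "u", "v", "v"],
    "income": [">50K", "<=50K", ">50K", "<=50K", "<=50K"],
})

# Turn the two-category field into a boolean (hypothetical helper column).
df["is_high"] = df["income"] == ">50K"

# Mean of a boolean = count(True) / (count(True) + count(False)) per bucket.
ratio = df.groupby(["x", "y"])["is_high"].mean()
print(ratio)
```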

Usage Example

^{pr 3}$

The resulting bubble plot will look like this:

Usage Example 2

Census income dataset - plot the age vs. hours per week vs. the income level. How is that even possible? Can we visualize three dimensions of information in a two dimensional plot?

^{pr 4}$

The resulting bubble plot will look like this:

P(x,y), x: age, y: working hours, color — proportional to the rate of high income people within each bucket

In this bubble plot we see the joint distribution of hours-per-week vs. age, P(x,y), but here the color is proportional to the rate of high-income people among all the people in the bucket: (#>50K) / ((#>50K) + (#≤50K)). By supplying the z_boolean variable, we added a third dimension to the plot through the color of the bubble.

The pinker the color, the higher the ratio for the given boolean feature/target Z. See colormap in the image.

Cool colormap: pink stands for the higher ratios in our case, cyan for the lower ratios
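The encoding described above (bubble area for frequency, color for ratio) can be approximated directly in matplotlib, whose built-in "cool" colormap runs from cyan to magenta. This is a sketch on a made-up bucket table, not the package's plotting code; the `buckets` frame and the factor of 20 are assumptions made for the example.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up bucket table: one row per (x, y) bucket with a count and a ratio.
buckets = pd.DataFrame({
    "x": [25, 25, 45, 45],
    "y": [30, 50, 30, 50],
    "count": [40, 10, 30, 20],
    "ratio": [0.05, 0.20, 0.25, 0.60],
})

fig, ax = plt.subplots()
bubbles = ax.scatter(
    buckets["x"], buckets["y"],
    s=buckets["count"] * 20,   # marker area grows with the bucket frequency
    c=buckets["ratio"],        # color encodes the per-bucket ratio
    cmap="cool", vmin=0, vmax=1,
)
fig.colorbar(bubbles, ax=ax, label="ratio of high-income rows")
ax.set_xlabel("age")
ax.set_ylabel("hours per week")
```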

This plot clearly shows that high income is much more common among people older than 30 who work more than 40 hours a week.

Dependencies

Contact

More usage examples and explanations can be found at: https://medium.com/@DataLady/exploring-the-census-income-dataset-using-bubble-plot-cfa1b366313b

If you have any questions, please let me know. My email is meir.shir86@gmail.com

Enjoy, Shir
