outkast Python package - PyPI

Infer caste from Indian names

outkast: detailed project description


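Using data from over 140 million Indians across 19 states, we estimate the proportion of Scheduled Caste (SC), Scheduled Tribe (ST), and Other for a given last name, year, and state.

Why?

We provide this package so that people can assess, highlight, and fight unfairness.

How is the underlying data produced?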

A script downloads the clean version of the published SECC data.
Infer the last name:

remove names with non-alphabetical characters
remove records with missing last names
remove last names shorter than 2 characters
remove rows with birth_date < 1900
keep only last names shared by at least 1,000 people

Group by last name, state, and year, and generate the aggregated dataset.
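
The filtering and grouping steps above can be sketched with pandas. This is only an illustration, not the authors' script: the function name `build_last_name_counts`, the input columns (`name`, `state`, `birth_year`, `category`), the rule that the last name is the final whitespace-separated token, and the treatment of single-token names as having a missing last name are all assumptions.

```python
import pandas as pd


def build_last_name_counts(records: pd.DataFrame, min_count: int = 1000) -> pd.DataFrame:
    """Aggregate per-person records into counts by last name, state, and year.

    Hypothetical input columns: name, state, birth_year, and
    category (one of 'sc', 'st', 'other').
    """
    df = records.copy()
    df["name"] = df["name"].fillna("").str.strip().str.lower()

    # Remove names with non-alphabetical characters (spaces are allowed).
    df = df[df["name"].str.match(r"^[a-z ]+$")].copy()

    # Assumption: the last token is the last name; one-token names have none.
    tokens = df["name"].str.split()
    df["last_name"] = tokens.str[-1].where(tokens.str.len() >= 2)
    df = df.dropna(subset=["last_name"])

    # Remove last names shorter than 2 characters and births before 1900.
    df = df[df["last_name"].str.len() >= 2]
    df = df[df["birth_year"] >= 1900]

    # Keep only last names shared by at least `min_count` people.
    shared = df.groupby("last_name")["last_name"].transform("size")
    df = df[shared >= min_count]

    # Count people per (last name, state, year) in each category.
    return (
        df.groupby(["last_name", "state", "birth_year", "category"])
        .size()
        .unstack("category", fill_value=0)
        .reindex(columns=["sc", "st", "other"], fill_value=0)
        .add_prefix("n_")
        .reset_index()
    )
```

The result has one row per last name, state, and birth year with columns `n_sc`, `n_st`, and `n_other`, mirroring the count columns that the package reports.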

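Base classifier

We start with a base model for last names that gives the Bayes optimal solution: the proportion of people with a given last name who are SC, ST, and Other. We also provide a series of base models for cases where the state of residence is known.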
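
As a rough illustration of what the base model computes, the sketch below turns per-name counts into proportions and picks the category with the largest share, which is the Bayes optimal prediction under 0-1 loss when the last name is the only feature. The counts are copied from the `secc_caste` output in the Usage section; the function name `base_classifier` and the `most_likely` column are hypothetical and not part of the package, which reports the proportions themselves.

```python
import pandas as pd

# Counts copied from the secc_caste example output in the Usage section.
counts = pd.DataFrame(
    {
        "name": ["patel", "zala", "lal", "agarwal"],
        "n_sc": [5681, 667, 703595, 39],
        "n_st": [112302, 14, 241846, 12],
        "n_other": [631393, 34550, 1314224, 4375],
    }
).set_index("name")


def base_classifier(counts: pd.DataFrame) -> pd.DataFrame:
    """Return per-name proportions plus the most likely category (hypothetical helper)."""
    # Divide each row of counts by its row total to get proportions.
    props = counts.div(counts.sum(axis=1), axis=0)
    props.columns = [c.replace("n_", "prop_") for c in props.columns]

    out = props.copy()
    # The Bayes optimal label under 0-1 loss is the category with the largest share.
    out["most_likely"] = props.idxmax(axis=1).str.replace("prop_", "", regex=False)
    return out


result = base_classifier(counts)
```

For these four names the largest share is Other in every case, so a hard label discards most of the information; the proportions are usually the more useful output.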

Installation

We strongly recommend installing outkast inside a Python virtual environment (see the venv documentation).

pip install outkast

Usage

Using secc_caste

>>> import pandas as pd
>>> from outkast import secc_caste
>>>
>>> names = [{'name': 'patel'},
...             {'name': 'zala'},
...             {'name': 'lal'},
...             {'name': 'agarwal'}]
>>>
>>> df = pd.DataFrame(names)
>>>
>>> secc_caste(df, 'name')
    name    n_sc    n_st  n_other   prop_sc   prop_st  prop_other
0    patel    5681  112302   631393  0.007581  0.149861    0.842558
1     zala     667      14    34550  0.018932  0.000397    0.980670
2      lal  703595  241846  1314224  0.311371  0.107027    0.581601
3  agarwal      39      12     4375  0.008812  0.002711    0.988477


>>>
>>> help(secc_caste)
Help on method secc_caste in module outkast.secc_caste_ln:

secc_caste(df, namecol, state=None, year=None) method of builtins.type instance
    Appends additional columns from SECC data to the input DataFrame
    based on the last name.

    Removes extra space. Checks if the name is the SECC data.
    If it is, outputs data from that row.

    Args:
        df (:obj:`DataFrame`): Pandas DataFrame containing the last name
            column.
        namecol (str or int): Column's name or location of the name in
            DataFrame.
        state (str): The state name of SECC data to be used.
            (default is None for all states)
        year (int): The year of SECC data to be used.
            (default is None for all years)

    Returns:
        DataFrame: Pandas DataFrame with additional columns:-
            'n_sc', 'n_st', 'n_other',
            'prop_sc', 'prop_st', 'prop_other' by last name

Authors

Suriyan Laohaprapanon and Gaurav Sood

License

The package is released under the MIT License.


outkast 0.2.1
