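Question: Relabel the marital status variable DMDMARTL so that it has short but informative character labels. Then build a frequency table of these values for all people, for women, and for men. Then build the same three frequency tables using only people aged 30 to 40. I have finished everything except the table for men and women aged 30 to 40. Below is all of my code so far, and this is the link to the dataset: https://raw.githubusercontent.com/Mauliklm10/Cartwheel.csv/master/datasetNHANES.csv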
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import statsmodels.api as sm
import numpy as np
da = pd.read_csv("nhanes_2015_2016.csv") # this is where the dataset link will be entered
# count how often each marital status code occurs, most frequent first
da.DMDMARTL.value_counts()
# Give the numeric codes meaningful names.
# The relabeled variable is stored as strings in a new column:
# codes such as 1, 2, 3 become labels such as Married, Widowed, Divorced.
da["DMDMARTLV2"] = da.DMDMARTL.replace({1:"Married",2:"Widowed",3:"Divorced",4:"Separated",5:"Never_Married",
6:"Living_With_Partner",77:"Refused",99:"Dont_Know"})
da.DMDMARTLV2.value_counts()
# Below is the way to find out the values that have been lost/are missing
pd.isnull(da.DMDMARTLV2).sum()
# Relabel the gender variable too, since we will be working with it as well.
# The labels go into a new column so that the original data is left unchanged,
# and the codes 1 and 2 become the meaningful labels Male and Female.
da["RIAGENDRV2"] = da.RIAGENDR.replace({1: "Male", 2: "Female"})
# The counts don't add up to the number of rows, meaning some values are missing,
# so we label those rows explicitly with the .fillna method
da["DMDMARTLV2"] = da.DMDMARTLV2.fillna("Missing")
da.DMDMARTLV2.value_counts()
# this is to get the frequency table for Females and Males individually
da.groupby("RIAGENDRV2")["DMDMARTLV2"].value_counts()
# this is to get the agegroup 30 to 40
# (pd.cut with edges [30, 40] gives the single bin (30, 40], so age 30 itself is excluded)
da["agegrp"] = pd.cut(da.RIDAGEYR, [30, 40])
da.groupby("agegrp")["DMDMARTLV2"].value_counts()
# this is to get the agegroup 30 to 40 with males and females
da["agegrp"] = pd.cut(da.RIDAGEYR, [30, 40])
da.groupby("agegrp")("RIAGENDRV2")["DMDMARTLV2"].value_counts()
The code above raises a TypeError: 'DataFrameGroupBy' object is not callable
I found the answer, so this post no longer needs a reply. The line of code is: da.groupby(["agegrp", "RIAGENDRV2"])["DMDMARTLV2"].value_counts()
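For anyone who hits the same error, here is a minimal sketch of why it happens and how the fix behaves. It uses a small made-up DataFrame (the rows are hypothetical, not NHANES data) with the same column names as above. Writing `groupby("agegrp")("RIAGENDRV2")` tries to call the GroupBy object like a function, which is what raises the TypeError; passing a list of column names groups by both keys at once. The `observed=True` argument and the `between` filter are my additions, not part of the original answer: the filter is one possible way to include age 30, which the `(30, 40]` bin from `pd.cut` leaves out.

```python
import pandas as pd

# Hypothetical toy rows standing in for the NHANES columns used above.
da = pd.DataFrame({
    "RIDAGEYR": [30, 31, 35, 40, 40, 45, 38, 33],
    "RIAGENDRV2": ["Male", "Female", "Female", "Male",
                   "Female", "Male", "Male", "Female"],
    "DMDMARTLV2": ["Married", "Married", "Divorced", "Married",
                   "Never_Married", "Married", "Married", "Married"],
})

# Edges [30, 40] produce one right-closed bin, (30, 40]; ages 30 and 45 become NaN.
da["agegrp"] = pd.cut(da.RIDAGEYR, [30, 40])

# Calling the GroupBy object with parentheses reproduces the error from the question.
try:
    da.groupby("agegrp", observed=True)("RIAGENDRV2")
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# The fix: pass both grouping columns as a list.
counts = da.groupby(["agegrp", "RIAGENDRV2"], observed=True)["DMDMARTLV2"].value_counts()
print(counts)

# Alternative (an assumption about what "30 to 40" should mean):
# filter rows with both endpoints included, then group by gender only.
in_range = da[da.RIDAGEYR.between(30, 40)]
counts_inclusive = in_range.groupby("RIAGENDRV2")["DMDMARTLV2"].value_counts()
print(counts_inclusive)
```

Whether age 30 belongs in the group depends on how the exercise defines the range, so it is worth checking which of the two tables matches the intended definition.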