How to get a three-column output from a dataset in a Jupyter notebook in Python

Posted 2024-10-04 05:20:20

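Question: Relabel the marital status variable DMDMARTL so that it has short but informative character labels. Then construct frequency tables of these values for everyone, for women only, and for men only. Then construct the same three frequency tables using only people aged 30 to 40. I have done everything except the tables for women and men aged 30 to 40. Below is all of my code so far, and here is the link to the dataset: https://raw.githubusercontent.com/Mauliklm10/Cartwheel.csv/master/datasetNHANES.csv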

%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import statsmodels.api as sm
import numpy as np

da = pd.read_csv("nhanes_2015_2016.csv") # this is where the dataset link will be entered

# value_counts() lists how often each code occurs, most frequent first
da.DMDMARTL.value_counts()

# Give the numeric codes meaningful labels
# The relabeled variable is stored as a new string column, DMDMARTLV2
# The raw data stores each status as a numeric code (1, 2, 3, ...); we map these codes to labels such as Married, Divorced, etc.
da["DMDMARTLV2"] = da.DMDMARTL.replace({1:"Married",2:"Widowed",3:"Divorced",4:"Separated",5:"Never_Married",
                                     6:"Living_With_Partner",77:"Refused",99:"Dont_Know"})
da.DMDMARTLV2.value_counts()

# Below is the way to find out the values that have been lost/are missing
pd.isnull(da.DMDMARTLV2).sum()

# Relabel the gender variable too, since we will be working with it as well
# Writing the labels to a new column leaves the original column unchanged, and
# turns the numeric codes 1 and 2 into meaningful labels: Male and Female
da["RIAGENDRV2"] = da.RIAGENDR.replace({1: "Male", 2: "Female"})

# The counts don't add up to the number of rows, meaning some values are missing,
# so we label those missing values as "Missing" with .fillna
da["DMDMARTLV2"] = da.DMDMARTLV2.fillna("Missing")
da.DMDMARTLV2.value_counts()

# this is to get the frequency table for Females and Males individually
da.groupby("RIAGENDRV2")["DMDMARTLV2"].value_counts()

# age group 30 to 40 (include_lowest=True keeps age 30 in the bin; without it the bin is (30, 40])
da["agegrp"] = pd.cut(da.RIDAGEYR, [30, 40], include_lowest=True)
da.groupby("agegrp", observed=True)["DMDMARTLV2"].value_counts()
# age group 30 to 40 split by males and females: pass both grouping keys to groupby as one list
da.groupby(["agegrp", "RIAGENDRV2"], observed=True)["DMDMARTLV2"].value_counts()

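The last line, as I first wrote it, was da.groupby("agegrp")("RIAGENDRV2")["DMDMARTLV2"].value_counts(), and it raised TypeError: 'DataFrameGroupBy' object is not callable. The second pair of parentheses calls the GroupBy object as if it were a function; passing both keys as a list, da.groupby(["agegrp", "RIAGENDRV2"]), groups by age group and gender together.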
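A minimal sketch of the corrected 30-to-40 step, run on a few made-up rows that reuse the notebook's column names (the values are hypothetical, not NHANES data). It assumes pandas is installed. observed=True only silences the categorical-grouping warning in recent pandas versions; with a single age bin it does not change the counts.

```python
import pandas as pd

# Hypothetical toy rows (not real NHANES data) using the notebook's column names.
da = pd.DataFrame({
    "RIDAGEYR": [25, 30, 33, 35, 38, 40, 41, 36],
    "RIAGENDRV2": ["Male", "Female", "Male", "Female",
                   "Male", "Female", "Male", "Female"],
    "DMDMARTLV2": ["Never_Married", "Married", "Married", "Divorced",
                   "Married", "Married", "Widowed", "Married"],
})

# One bin covering ages 30 to 40 inclusive; ages outside it become NaN
# and are dropped by groupby.
da["agegrp"] = pd.cut(da.RIDAGEYR, [30, 40], include_lowest=True)

# Frequency table for everyone aged 30 to 40.
all_3040 = da.groupby("agegrp", observed=True)["DMDMARTLV2"].value_counts()

# Frequency tables for women and men aged 30 to 40:
# both keys go into groupby() as a single list.
by_gender_3040 = (
    da.groupby(["agegrp", "RIAGENDRV2"], observed=True)["DMDMARTLV2"]
    .value_counts()
)

print(all_3040)
print(by_gender_3040)
```

On the real data, drop the toy DataFrame and run the last statements against the `da` loaded from the CSV.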


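If "three-column output" in the title means a single table whose columns are women, men and everyone, pd.crosstab with margins=True gives that layout directly. This is an alternative to the groupby approach, shown under that assumption on hypothetical toy rows with the notebook's column names.

```python
import pandas as pd

# Hypothetical toy rows (not real NHANES data) using the notebook's column names.
da = pd.DataFrame({
    "RIDAGEYR": [25, 30, 33, 35, 38, 40, 41, 36],
    "RIAGENDRV2": ["Male", "Female", "Male", "Female",
                   "Male", "Female", "Male", "Female"],
    "DMDMARTLV2": ["Never_Married", "Married", "Married", "Divorced",
                   "Married", "Married", "Widowed", "Married"],
})

# Keep only people aged 30 to 40; between() includes both endpoints by default.
da3040 = da[da.RIDAGEYR.between(30, 40)]

# Rows are marital statuses; columns are Female, Male and an "All" total,
# so the three frequency tables sit side by side.
# margins=True also appends an "All" row holding the column totals.
table = pd.crosstab(da3040.DMDMARTLV2, da3040.RIAGENDRV2, margins=True)
print(table)
```

Combinations that never occur (here, divorced men) show up as 0 rather than being omitted.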