How to get a three-column output from a dataset in a Jupyter notebook in Python

Posted 2024-10-04 05:20:20


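Question: Relabel the marital status variable DMDMARTL so that it has short but informative character labels. Then construct a frequency table of these values for all people, for women only, and for men only. Then construct the same three frequency tables using only people aged between 30 and 40. I have finished everything except the tables for men and women aged 30 to 40. Below is all the code so far, and this is the link to the dataset: https://raw.githubusercontent.com/Mauliklm10/Cartwheel.csv/master/datasetNHANES.csv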

%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import statsmodels.api as sm
import numpy as np

# load the dataset (the filename can be replaced with the dataset link above)
da = pd.read_csv("nhanes_2015_2016.csv")

# count how often each marital status code occurs, most frequent first
da.DMDMARTL.value_counts()

# Give the numeric codes meaningful names.
# The relabeled variable is stored as strings in a new column:
# the data holds codes such as 1, 2, 3, which become labels such as Married, Widowed, Divorced.
da["DMDMARTLV2"] = da.DMDMARTL.replace({1:"Married",2:"Widowed",3:"Divorced",4:"Separated",5:"Never_Married",
                                     6:"Living_With_Partner",77:"Refused",99:"Dont_Know"})
da.DMDMARTLV2.value_counts()

# Count the values that are missing
pd.isnull(da.DMDMARTLV2).sum()

# Relabel the gender variable as well, since we will be working with it.
# Writing to a new column leaves the original column unchanged,
# and the numeric codes 1 and 2 become the meaningful labels Male and Female.
da["RIAGENDRV2"] = da.RIAGENDR.replace({1: "Male", 2: "Female"})

# The counts do not add up to the number of rows, so some values are missing;
# label them explicitly with the .fillna method
da["DMDMARTLV2"] = da.DMDMARTLV2.fillna("Missing")
da.DMDMARTLV2.value_counts()

# this is to get the frequency table for Females and Males individually
da.groupby("RIAGENDRV2")["DMDMARTLV2"].value_counts()

# this is to get the agegroup 30 to 40
da["agegrp"] = pd.cut(da.RIDAGEYR, [30, 40])
da.groupby("agegrp")["DMDMARTLV2"].value_counts()

# this is to get the agegroup 30 to 40 with males and females
da["agegrp"] = pd.cut(da.RIDAGEYR, [30, 40])
da.groupby("agegrp")("RIAGENDRV2")["DMDMARTLV2"].value_counts()

The code above raises a TypeError: 'DataFrameGroupBy' object is not callable


1 answer

User

#1 · Posted 2024-10-04 05:20:20

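I found the answer, so this post no longer needs a reply. To group by more than one column, pass the column names to groupby as a single list; putting a second pair of parentheses after groupby("agegrp") tries to call the groupby object, which is what raised the TypeError. The line of code is: da.groupby(["agegrp", "RIAGENDRV2"])["DMDMARTLV2"].value_counts()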
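The fix can be tried on a small made-up table before running it on the full dataset. The sketch below is only an illustration: the eight rows are hypothetical and merely reuse the column names from the question. It also assumes that age 30 should count as part of the 30 to 40 group. By default pd.cut builds right-closed bins, so [30, 40] means (30, 40] and leaves 30-year-olds out; include_lowest=True is one way to keep them.

```python
import pandas as pd

# Hypothetical toy rows that reuse the column names from the question
da = pd.DataFrame({
    "RIDAGEYR": [30, 31, 35, 40, 38, 45, 33, 29],
    "RIAGENDRV2": ["Male", "Female", "Female", "Male",
                   "Male", "Female", "Male", "Female"],
    "DMDMARTLV2": ["Married", "Married", "Divorced", "Married",
                   "Never_Married", "Widowed", "Married", "Married"],
})

# Bins are right-closed by default, so [30, 40] alone would exclude age 30.
# include_lowest=True (an assumption about the intended range) keeps it.
da["agegrp"] = pd.cut(da.RIDAGEYR, [30, 40], include_lowest=True)

# A list of keys groups by both columns at once. Rows whose age falls
# outside the bin have a missing agegrp and are dropped by groupby.
counts = da.groupby(["agegrp", "RIAGENDRV2"], observed=True)["DMDMARTLV2"].value_counts()

# Optional: spread the marital status labels across columns,
# giving one row per gender and one column per label
table = counts.unstack(fill_value=0)
print(table)
```

The unstack step is optional. Without it, the result is a long Series indexed by age group, gender, and marital status, which matches the three-column layout asked for in the title.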
