Python conditional filtering in a csv file

Published 2024-06-01 18:45:25


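Please help! I've tried different approaches and packages for writing a program that takes 4 inputs and, based on that combination of inputs, returns the writing-score statistics for the matching group in a csv file. This is my first project, so I'd really appreciate any insights, tips, or hints!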

Here is a sample of the csv (it has 200 rows in total):

id   gender  ses     schtyp  prog      write
70   male    low     public  general   52
121  female  middle  public  vocation  68
86   male    high    public  general   33
141  male    high    public  vocation  63
172  male    middle  public  academic  47
113  male    middle  public  academic  44
50   male    middle  public  general   59
11   male    middle  public  academic  34
84   male    middle  public  general   57
48   male    middle  public  academic  57
75   male    middle  public  vocation  60
60   male    middle  public  academic  57

Here is what I have so far:

[code not preserved]

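What I'm missing is how to filter and get statistics for a specific group. For example, say I enter male, public, middle, and academic: I want the average writing score for that subset. I tried the groupby function in pandas, but it only gives me statistics for broad groups (such as public vs. private). I also tried a pandas DataFrame, but that only lets me filter on one input, and I don't know how to get the writing scores from it. Any tips would be greatly appreciated!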


2 Answers

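I agree with Ramon that pandas is definitely the best choice: once you get used to it, its filtering/subsetting capabilities are exceptional. It can be hard to wrap your head around at first, though (at least it was for me!), so I dug up some examples of the kind of subsetting you need from some of my old code. The variable itu below is a pandas DataFrame containing data for different countries.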

# Subsetting by using True/False:
subset = itu['CntryName'] == 'Albania'  # returns True/False values
itu[subset]  # returns 1x144 DataFrame of only data for Albania
itu[itu['CntryName'] == 'Albania']  # one-line command, equivalent to the above two lines

# Pandas has many built-in functions like .isin() to provide params to filter on    
itu[itu.cntrycode.isin(['USA','FRA'])]  # returns where itu['cntrycode'] is 'USA' or 'FRA'
itu[itu.year.isin([2000,2001,2002])]  # Returns all of itu for only years 2000-2002
# Advanced subsetting can include logical operations:
itu[itu.cntrycode.isin(['USA','FRA']) & itu.year.isin([2000,2001,2002])]  # Both of above at same time

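# Use .loc with two elements to simultaneously select by row/index & column.
# Selecting rows by 'USA' assumes the index holds country codes (e.g. after itu.set_index('cntrycode', drop=False)).
itu.loc['USA','CntryName']
itu.iloc[204,0]  # .iloc selects by integer position instead of label
itu.loc[['USA','BHS'], ['CntryName', 'year']]
itu.iloc[[204, 13], [0, 1]]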

# Can do many operations at once, but this reduces "readability" of the code
itu[itu.cntrycode.isin(['USA','FRA']) &
    itu.year.isin([2000,2001,2002])].loc[:, ['cntrycode','CntryName','year','mpen','fpen']]

# Finally, if you're comfortable with using map() and list comprehensions,
# you can do some advanced subsetting that includes evaluations & functions
# to determine what elements you want to select from the whole, such as all
# countries whose name begins with "United":
criterion = itu['CntryName'].map(lambda x: x.startswith('United'))
itu[criterion]['CntryName']  # gives us UAE, UK, & US
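Applied to the question's data, the same & technique combines one True/False condition per input, and selecting the write column from the filtered rows gives that subset's scores. A sketch assuming pandas is installed and using the 12 sample rows from the question (SAMPLE, mask and subset_write are illustrative names):

```python
import io

import pandas as pd

# The 12 sample rows from the question (the real file has about 200 rows).
SAMPLE = """id gender ses schtyp prog write
70 male low public general 52
121 female middle public vocation 68
86 male high public general 33
141 male high public vocation 63
172 male middle public academic 47
113 male middle public academic 44
50 male middle public general 59
11 male middle public academic 34
84 male middle public general 57
48 male middle public academic 57
75 male middle public vocation 60
60 male middle public academic 57
"""
df = pd.read_csv(io.StringIO(SAMPLE), sep=r"\s+")

# Each comparison yields a boolean Series; & keeps rows where all four hold.
# The inner parentheses are required because & binds tighter than ==.
mask = (
    (df["gender"] == "male")
    & (df["ses"] == "middle")
    & (df["schtyp"] == "public")
    & (df["prog"] == "academic")
)

# .loc[rows, column]: filtered rows, write column only.
subset_write = df.loc[mask, "write"]
print(len(subset_write), subset_write.mean())  # 5 rows, mean 47.8 for the sample
```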

Try pandas. I think it will cut down your csv-parsing work and provide the subsetting functionality you asked for:

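import pandas as pd

# The file is whitespace-separated, so split on runs of whitespace
# (sep=r'\s+' replaces delim_whitespace=True, which is deprecated since pandas 2.2)
data = pd.read_csv('fileName.txt', sep=r'\s+')

# get all of the male students
data[data['gender'] == 'male']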
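To extend the single gender filter to all four inputs, one option is to wrap the filter in a function and let DataFrame.query do the combining, where @name refers to a Python variable rather than a column. The function name write_stats is hypothetical, and this sketch reads the 12 sample rows from a string instead of the real file:

```python
import io

import pandas as pd

# The 12 sample rows from the question (the real file has about 200 rows).
SAMPLE = """id gender ses schtyp prog write
70 male low public general 52
121 female middle public vocation 68
86 male high public general 33
141 male high public vocation 63
172 male middle public academic 47
113 male middle public academic 44
50 male middle public general 59
11 male middle public academic 34
84 male middle public general 57
48 male middle public academic 57
75 male middle public vocation 60
60 male middle public academic 57
"""


def write_stats(df: pd.DataFrame, gender: str, ses: str, schtyp: str, prog: str) -> pd.Series:
    """Summary statistics of the 'write' column for one combination of inputs."""
    # Bare names are columns; '@name' refers to this function's arguments.
    subset = df.query(
        "gender == @gender and ses == @ses and schtyp == @schtyp and prog == @prog"
    )
    # describe() returns count, mean, std, min, quartiles and max.
    return subset["write"].describe()


data = pd.read_csv(io.StringIO(SAMPLE), sep=r"\s+")
print(write_stats(data, "male", "middle", "public", "academic"))
```

In an interactive program the four arguments could come from four input() calls. A combination with no matching rows should give a count of 0 and NaN for the other statistics rather than an error.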
