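Update to the update:
I did the following and it worked: 1. Replaced the if-if-elif structure with if-elif-else (see below). 2. Compared dec as a string (i.e. dec == '1' instead of dec == 1).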
if len(SframeDup.index) > 0 and dec == '1':
    SframeDup.to_csv('NWEA CSVs/Students/StudentDuplicates.csv', sep=',')
    print ("%d instances of repeated student IDs detected." % len(SframeDup.index))
    print ("See StudentDuplicates.csv for duplicates.")
    print ("\nThis program will now stop.")
    raise SystemExit
    # quit() and exit() work too, but only in the editor;
    # doing this in IPython Notebook will restart the kernel and require
    # re-running and re-compiling the preceding code
elif len(SframeDup.index) > 0 and dec == '2':
    print ("%d instances of repeated student IDs detected." % len(SframeDup.index))
    print ("See StudentDuplicates.csv for duplicates.")
    Sframe['dup_check_1'] = Sframe.duplicated(cols = ['TermName', 'SchoolName', 'StudentID'], take_last = False)
    Sframe['dup_check_2'] = Sframe.duplicated(cols = ['TermName', 'SchoolName', 'StudentID'], take_last = True)
    Sframe = Sframe[(Sframe['dup_check_1'] == False) & (Sframe['dup_check_2'] == False)]
    del Sframe['dup_check_1'], Sframe['dup_check_2']
else:
    print ("No duplicates found. Oh yeah!")
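For anyone wondering why the string comparison matters: in Python 3, input() always returns a str, so a test like dec == 1 can never be true and the branch is silently skipped. Here is a minimal sketch of the same if-elif-else dispatch. The helper choose_action is hypothetical (it is not in the code above) and only returns a label instead of doing the real work:

```python
def choose_action(dec, n_duplicates):
    """Hypothetical helper: map the raw user answer to an action label."""
    if n_duplicates > 0 and dec == '1':
        return 'stop'
    elif n_duplicates > 0 and dec == '2':
        return 'drop_duplicates'
    else:
        return 'continue'

# A str never equals an int, which is why dec == 1 failed.
print('1' == 1)                # False
print(choose_action('2', 6))   # drop_duplicates
print(choose_action(2, 6))     # continue (the int matches neither branch)
```

Note that the else branch also catches any unexpected answer, so validating dec before the if may be worth considering.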
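Update:
Although I've "moved on" as best I can, I'd still like to document this as thoroughly as possible. I've pasted two sets of code; the first attempts to use if/elif but fails to make Sframe drop the duplicates. The second successfully drops the duplicates, but to do so I had to remove the if/elif.
Output of the first set (the if/elif version): 2840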
import pandas as pd
import numpy as np
import glob
import csv
import os
import sys
path = r'NWEA CSVs/Students/Raw'
allFiles = glob.glob(path + "/*.csv")
Sframe = pd.DataFrame()
list = []
for file in allFiles:
    sdf = pd.read_csv(file,index_col=None, header=0)
    list.append(sdf)
Sframe = pd.concat(list,ignore_index=False)
Sframe.to_csv('NWEA CSVs/Students/OutStudents.csv', sep=',')
Sframe["TermSchoolStudent"]=Sframe["TermName"]+Sframe["SchoolName"]+\
    Sframe["StudentID"].map(str)
SframeDup = Sframe[Sframe.duplicated("TermSchoolStudent") == True]
if len(SframeDup.index) > 0:
    SframeDup.to_csv('NWEA CSVs/Students/StudentDuplicates.csv', sep=',')
    print ("%d instances of repeated student IDs detected." % len(SframeDup.index))
    print ("See StudentDuplicates.csv for duplicates.")
    Sframe['dup_check_1'] = Sframe.duplicated(cols = ['TermName', 'SchoolName', 'StudentID'], take_last = False)
    Sframe['dup_check_2'] = Sframe.duplicated(cols = ['TermName', 'SchoolName', 'StudentID'], take_last = True)
    Sframe = Sframe[(Sframe['dup_check_1'] == False) & (Sframe['dup_check_2'] == False)]
    del Sframe['dup_check_1'], Sframe['dup_check_2']
print (len(Sframe))
Output: 2834
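A side note for readers on newer pandas: the cols and take_last arguments used above were later replaced by subset and keep. Assuming a recent pandas release, the two-flag trick (mark duplicates from the front, then from the back, keep rows flagged by neither) can be sketched in one call with keep=False, which flags every member of a duplicated group. The sample data and the names all_dup_rows and deduped are made up for illustration:

```python
import pandas as pd

# Made-up sample; only the three key columns from the post are used.
Sframe = pd.DataFrame({
    'TermName':   ['Fall', 'Fall', 'Fall', 'Spring'],
    'SchoolName': ['A', 'A', 'B', 'A'],
    'StudentID':  [1, 1, 2, 1],
})

keys = ['TermName', 'SchoolName', 'StudentID']

# keep=False flags all rows of a duplicated group, not just the repeats.
is_dup = Sframe.duplicated(subset=keys, keep=False)

all_dup_rows = Sframe[is_dup]   # every row involved in a duplicate
deduped = Sframe[~is_dup]       # rows whose key occurs exactly once

print(len(all_dup_rows))  # 2
print(len(deduped))       # 2
```

Unlike SframeDup in the post, all_dup_rows here also contains the first occurrence of each repeated key.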
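I have what I think is a simple problem whose answer isn't obvious to me as a new programmer. Basically, I have a DataFrame (Sframe) that my program checks for duplicates. If the user indicates that the program should continue without the duplicates, the duplicates (and their unique original values) are removed from the DataFrame, and Sframe is set to the version with the duplicates removed (so the modified Sframe replaces the original). Afterwards, in the main program, if the user chose "2" as described above, Sframe should be the modified version. Otherwise, if no duplicates were detected in the first place (so the user was never prompted), the original Sframe should be used.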
My code looks like this:
import pandas as pd
Sframe = pd.DataFrame()
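Here, the code checks for duplicates. If they exist, the following runs. If they don't exist, the following is skipped and Sframe is used as originally defined.
This is the code that runs when duplicates are detected: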
dec = input("-->")
if dec == 1:
    print ("This program will now stop.")
    print ("this_file.csv to resolve a problem.")
    raise SystemExit
elif dec == 2:
    # add "Repeated" field to student with duplicates table. Values="NaN"
    SframeDup["Repeated"]="NaN"
    # New table joins (left, inner) Sframe with duplicates table (SframeDup) to
    # identify all rows of duplicates (including the unique values that had
    # duplicates)
    SframeWDup=pd.merge(Sframe, SframeDup, on='identifier', how='left')
    # Eliminate all repeating rows, including originals as pulled during left join
    SframeWODup=SframeWDup[SframeWDup.Repeated_y!="NaN"]
    # So here, in my mind, I should be able to just do this and the rest of
    # the code should treat replace Sframe with SframeWODup (without the found
    # duplicates)...
    Sframe = SframeWODup
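But it doesn't work. I know this because when I check len(Sframe) after choosing 2 to eliminate the duplicates (and their unique original values), I get the same number as before the duplicates were handled.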
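The goal described above (drop every row whose identifier appears in the duplicates table, first occurrence included) could also be sketched without the merge, using isin. This is only an illustration under assumptions: the column name identifier is taken from the post, the data is invented, and it is not a diagnosis of why the merge version left len(Sframe) unchanged:

```python
import pandas as pd

# Invented data; 'identifier' is the key column named in the post.
Sframe = pd.DataFrame({'identifier': ['a', 'b', 'b', 'c', 'c', 'd'],
                       'Grade': [3, 4, 4, 5, 5, 6]})

# Rows that repeat an earlier identifier (the "duplicates table").
SframeDup = Sframe[Sframe.duplicated('identifier')]

# Keep only rows whose identifier never shows up in the duplicates table.
SframeWODup = Sframe[~Sframe['identifier'].isin(SframeDup['identifier'])]

before = len(Sframe)
Sframe = SframeWODup
print(before, len(Sframe))  # 6 2
```

Printing the length before and after the reassignment, as here, is a quick way to confirm that the branch actually ran.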
Thanks in advance for your help. If anything is unclear, I'm happy to clarify.
Update: Sframe.dtypes
TermName              object
DistrictName          object
SchoolName            object
StudentLastName       object
StudentFirstName      object
StudentMI             object
StudentID             object
StudentDateOfBirth    object
StudentEthnicGroup    object
StudentGender         object
Grade                 object
TermSchoolStudent     object
dtype: object
Sframe.head() returns the table shown in the image at the following link: https://drive.google.com/file/d/0B1cr7dwUpr_JR3d0YzlwLWFwQU0/view?usp=sharing
I did the following and it worked: 1. Replaced the if-if-elif structure with if-elif-else (see below). 2. Compared dec as a string (i.e. dec == '1' instead of dec == 1).
Try:
Sframe = SframeWODup.copy()
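For context, a small sketch of what .copy() changes: it gives Sframe its own data, so later edits to Sframe do not touch SframeWODup. Whether that addresses the symptom in the question is not established in this thread; the one-column frame below is made up:

```python
import pandas as pd

# Made-up stand-in for the filtered frame from the question.
SframeWODup = pd.DataFrame({'StudentID': [1, 2, 3]})

Sframe = SframeWODup.copy()        # independent copy of the data
Sframe.loc[0, 'StudentID'] = 99    # edit the copy only

print(SframeWODup.loc[0, 'StudentID'])  # 1
print(Sframe.loc[0, 'StudentID'])       # 99
```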
Update: Can you use this code to achieve the result you want?