Answers to the question: Replace an entire DataFrame with another DataFrame (overwrite) (Python 3.4, pandas)

Update update: I did the following, and it worked:

1. Replaced the if-if-elif structure with if-elif-else (see below).
2. Evaluated dec as a string (i.e., dec == '1' instead of dec == 1).

<pre><code>if len(SframeDup.index) > 0 and dec == '1':
    SframeDup.to_csv('NWEA CSVs/Students/StudentDuplicates.csv', sep=',')
    print("%d instances of repeated student IDs detected." % len(SframeDup.index))
    print("See StudentDuplicates.csv for duplicates.")
    print("\nThis program will now stop.")
    raise SystemExit
    # quit() and exit() work too, but only in the editor;
    # doing this in IPython Notebook will restart the kernel and require
    # re-running and re-compiling the preceding code
elif len(SframeDup.index) > 0 and dec == '2':
    print("%d instances of repeated student IDs detected." % len(SframeDup.index))
    print("See StudentDuplicates.csv for duplicates.")
    # cols= and take_last= are older pandas parameter names;
    # current pandas uses subset= and keep= instead
    Sframe['dup_check_1'] = Sframe.duplicated(cols=['TermName', 'SchoolName', 'StudentID'], take_last=False)
    Sframe['dup_check_2'] = Sframe.duplicated(cols=['TermName', 'SchoolName', 'StudentID'], take_last=True)
    # keep only rows that belong to no duplicated group at all
    Sframe = Sframe[(Sframe['dup_check_1'] == False) & (Sframe['dup_check_2'] == False)]
    del Sframe['dup_check_1'], Sframe['dup_check_2']
else:
    print("No duplicates found. Oh yeah!")
</code></pre>

Update: Although I've "moved on" as best I can, I still want to document this as fully as possible. I've pasted two sets of code: the first tries to use if/elif but fails to make Sframe drop the duplicates; the second successfully omits the duplicates, but to get that to work I had to remove the if/elif.

…

Output: 2840

<pre><code>import pandas as pd
import numpy as np
import glob
import csv
import os
import sys

path = r'NWEA CSVs/Students/Raw'
allFiles = glob.glob(path + "/*.csv")
Sframe = pd.DataFrame()
frames = []  # one DataFrame per raw CSV file
for file in allFiles:
    sdf = pd.read_csv(file, index_col=None, header=0)
    frames.append(sdf)
Sframe = pd.concat(frames, ignore_index=False)
Sframe.to_csv('NWEA CSVs/Students/OutStudents.csv', sep=',')
# single Term + School + StudentID key used for duplicate detection
Sframe["TermSchoolStudent"] = Sframe["TermName"] + Sframe["SchoolName"] + \
    Sframe["StudentID"].map(str)
SframeDup = Sframe[Sframe.duplicated("TermSchoolStudent") == True]
if len(SframeDup.index) > 0:
    SframeDup.to_csv('NWEA CSVs/Students/StudentDuplicates.csv', sep=',')
    print("%d instances of repeated student IDs detected." % len(SframeDup.index))
    print("See StudentDuplicates.csv for duplicates.")
    Sframe['dup_check_1'] = Sframe.duplicated(cols=['TermName', 'SchoolName', 'StudentID'], take_last=False)
    Sframe['dup_check_2'] = Sframe.duplicated(cols=['TermName', 'SchoolName', 'StudentID'], take_last=True)
    Sframe = Sframe[(Sframe['dup_check_1'] == False) & (Sframe['dup_check_2'] == False)]
    del Sframe['dup_check_1'], Sframe['dup_check_2']
print(len(Sframe))
</code></pre>

Output: 2834

Old stuff:

I have what I think is a simple question, but the answer isn't obvious to me as a new programmer. Basically, I have a DataFrame (Sframe) that my program checks for duplicates. If the user indicates that the program should continue without the duplicates, the duplicates (and their unique originals) are removed from the DataFrame, and Sframe is set equal to Sframe with the duplicates removed (so the original Sframe is replaced by the modified one). After that, in the main program, Sframe should be the modified version if the user chose "2" as described above. Otherwise, if no duplicates were detected in the first place (so the user was never prompted), the original Sframe should be used.

My code looks like this:

<pre><code>import pandas as pd

Sframe = pd.DataFrame()
</code></pre>

At this point the code checks for duplicates. If any exist, the following runs; if not, it is skipped and Sframe is used as originally defined.

This is the code that runs once duplicates have been detected:

<pre><code>dec = input("-->")
if dec == 1:
    print("This program will now stop.")
    print("this_file.csv to resolve a problem.")
    raise SystemExit
elif dec == 2:
    # add "Repeated" field to the students-with-duplicates table. Values="NaN"
    SframeDup["Repeated"] = "NaN"
    # New table joins (left, inner) Sframe with the duplicates table (SframeDup) to
    # identify all rows of duplicates (including the unique values that had
    # duplicates)
    SframeWDup = pd.merge(Sframe, SframeDup, on='identifier', how='left')
    # Eliminate all repeating rows, including originals as pulled during left join
    SframeWODup = SframeWDup[SframeWDup.Repeated_y != "NaN"]
    # So here, in my mind, I should be able to just do this and the rest of
    # the code should replace Sframe with SframeWODup (without the found
    # duplicates)...
    Sframe = SframeWODup
</code></pre>

But it doesn't work. I know this because when I check <code>len(Sframe)</code> after choosing <code>2</code> to eliminate the duplicates (and their unique originals), I get the same number as before the duplicates were handled.

Thanks in advance for your help. If anything is unclear, I'm happy to clarify.

Update:

<code>Sframe.dtypes</code>:

TermName object
DistrictName object
SchoolName object
StudentLastName object
StudentFirstName object
StudentMI object
StudentID object
StudentDateOfBirth object
StudentEthnicGroup object
StudentGender object
Grade object
TermSchoolStudent object
dtype: object

<code>Sframe.head()</code> returns the table in the image at the following link: <a href="https://drive.google.com/file/d/0B1cr7dwUpr_JR3d0YzlwLWFwQU0/view?usp=sharing" rel="nofollow">https://drive.google.com/file/d/0B1cr7dwUpr_JR3d0YzlwLWFwQU0/view?usp=sharing</a>
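A likely reason the first version never changed `len(Sframe)`: in Python 3, `input()` always returns a `str`, so `dec == 1` and `dec == 2` compare a string with an integer and are always false. Neither branch runs, `Sframe = SframeWODup` is never reached, and the original frame stays in place. This is consistent with the fix described in the update (comparing against `'1'` and `'2'`). The minimal sketch below assumes a hypothetical helper, `classify_choice`, that receives the raw prompt text, so the comparison can be checked without typing anything interactively.

```python
# Minimal sketch: in Python 3, input() always returns a str, never an int.
# classify_choice is a hypothetical helper, not part of the original program.

def classify_choice(dec):
    """Map the raw text typed at the "-->" prompt to an action name."""
    dec = dec.strip()            # tolerate stray whitespace such as "2 "
    if dec == '1':               # compare with the string '1', not the int 1
        return 'stop'
    elif dec == '2':
        return 'drop_duplicates'
    else:
        return 'unknown'


# The original comparison silently fails: a str never equals an int.
print('2' == 2)                  # False
print(classify_choice('2'))      # drop_duplicates
# In the real program the argument would be the result of input("-->").
```

Converting with `int(dec)` would also work, but it raises `ValueError` on non-numeric input, so comparing strings is the simpler choice for a menu prompt like this one.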

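For the goal in the updates (drop every row whose TermName/SchoolName/StudentID combination repeats, including the first occurrence), newer pandas can build the mask in one call: `DataFrame.duplicated(subset=..., keep=False)` flags every member of each duplicated group, which replaces the pair of `take_last=False` / `take_last=True` checks. The sketch below uses a small made-up frame with the question's column names (the data is hypothetical, not from the NWEA files). It also shows that assigning the filtered result back to `Sframe` does replace the old frame for later code at the same scope.

```python
import pandas as pd

# Hypothetical sample data using the column names from the question.
Sframe = pd.DataFrame({
    'TermName':   ['Fall 2014', 'Fall 2014', 'Fall 2014', 'Fall 2014'],
    'SchoolName': ['North', 'North', 'North', 'South'],
    'StudentID':  ['101', '101', '102', '101'],
})

key_cols = ['TermName', 'SchoolName', 'StudentID']

# keep=False marks *all* rows of a duplicated group, including the first one.
is_dup = Sframe.duplicated(subset=key_cols, keep=False)

print(len(Sframe))            # 4
SframeDup = Sframe[is_dup]    # rows to report, e.g. write to StudentDuplicates.csv

# Rebinding the name replaces the old frame for all later code at this scope.
Sframe = Sframe[~is_dup].reset_index(drop=True)
print(len(Sframe))            # 2
```

`Sframe.drop_duplicates(subset=key_cols, keep=False)` returns the same filtered rows in one step. If this filtering lives inside a function, the reassignment only rebinds a local name, so the function should return the new frame and the caller should assign it (for example `Sframe = dedupe(Sframe)`, where `dedupe` is a hypothetical name).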
1 Answer

Anonymous, 1 day ago

Expertise: python, mysql, java
