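I'm trying to set up a script that extracts data from a CSV file and writes it to an XML file. I used the information in this link (With PYTHON convert CSV file to XML file) to create the script. It partly works, but I need more guidance to get the exact layout.
The layout I need is as follows: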
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ImportAcademicExtract xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<AcademicExtract>
<StudentId>StudentID</StudentId>
<LastName>LastName</LastName>
<FirstName>FirstName</FirstName>
<MiddleName>MiddleName</MiddleName>
<SocialSecurityNumber>SocialSecurityNumber</SocialSecurityNumber>
<BirthDate>BirthDate</BirthDate>
<GradeLevel>GradeLevel</GradeLevel>
<SpecialProgramIndicator>SpecialProgramIndicator</SpecialProgramIndicator>
<CIPCode>CIPCode</CIPCode>
<RegisteredHours>RegisteredHours</RegisteredHours>
<PostalAddresses>
<PostalAddress>
<AddressLine1>Address</AddressLine1>
<AddressLine2>Address2ndLn</AddressLine2>
<City>City</City>
<State>State</State>
<PostalCode>PostalCode</PostalCode>
<CountryCode>Country</CountryCode>
</PostalAddress>
</PostalAddresses>
<EmailAddresses>
<EmailAddress>Email</EmailAddress>
</EmailAddresses>
<PhoneNumbers>
<PhoneNumber>PhoneNumber</PhoneNumber>
</PhoneNumbers>
<AdmissionsTerm>
<AdmissionTerm>
<TermName>TermName</TermName>
<AcademicYear>AcademicYear</AcademicYear>
<AdmittedDate>AdmittedDate</AdmittedDate>
</AdmissionTerm>
</AdmissionsTerm>
<EnrollmentTerms>
<EnrollmentTerm>
<TermName>TermNames</TermName>
<AcademicYear>AcademicYears</AcademicYear>
<CumulativeAttemptedHours>CumulativeAttemptedHours</CumulativeAttemptedHours>
<CumulativeRegisteredHours>CumulativeRegisteredHours</CumulativeRegisteredHours>
<CumulativeEarnedHours>CumulativeEarnedHours</CumulativeEarnedHours>
<CumulativeGPA>CumulativeGPA</CumulativeGPA>
<TermStartDate>TermStartDate</TermStartDate>
<TermEndDate>TermEndDate</TermEndDate>
<EnrollmentType>EnrollmentType</EnrollmentType>
<EnrollmentStatus>EnrollmentStatus</EnrollmentStatus>
<FirstTimeDegreeSeeking>FirstTimeDegreeSeeking</FirstTimeDegreeSeeking>
<IntentToReturn>IntentToReturn</IntentToReturn>
<WithdrawnDate>WithdrawnDate</WithdrawnDate>
<ExtTermName>ExtTermName</ExtTermName>
<AcademicYearStartDate>AcademicYearStartDate</AcademicYearStartDate>
<AcademicYearEndDate>AcademicYearEndDate</AcademicYearEndDate>
<ExtEnrollmentType>ExtEnrollmentType</ExtEnrollmentType>
<DateOfDetermination>DateOfDetermination</DateOfDetermination>
<LastDateOfAttendance>LastDateOfAttendance</LastDateOfAttendance>
<SAPStatus>SAPStatus</SAPStatus>
</EnrollmentTerm>
</EnrollmentTerms>
<AcademicPrograms>
<AcademicProgram>
<ProgramCredentialLevel>ProgramCredentialLevel</ProgramCredentialLevel>
<ProgramName>ProgramName</ProgramName>
<EdMajor1>EdMajor1</EdMajor1>
<SiteName>SiteName</SiteName>
<EffectiveStartDate>EffectiveStartDate</EffectiveStartDate>
<EffectiveEndDate>EffectiveEndDate</EffectiveEndDate>
<GraduationDate>GraduationDate</GraduationDate>
<AnticipatedGraduationDate>AnticipatedGraduationDate</AnticipatedGraduationDate>
<ProgramLengthInWeeks>ProgramLengthInWeeks</ProgramLengthInWeeks>
<ProgramLengthInMonths>ProgramLengthInMonths</ProgramLengthInMonths>
<ProgramLengthInYears>ProgramLengthInYears</ProgramLengthInYears>
<AcademicYearBeginDate>AcademicYearBeginDate2</AcademicYearBeginDate>
<AcademicYearEndDate>AcademicYearEndDate2</AcademicYearEndDate>
<WeeksInProgramAcademicYear>WeeksInProgramAcademicYear</WeeksInProgramAcademicYear>
</AcademicProgram>
</AcademicPrograms>
</AcademicExtract>
I am using the following script:
import itertools
import csv
import os
csvFile = r'J:\JFAFiles\JFA-BR.csv'
xmlFile = r'J:\JFAFiles\XML-BR.xml'
csvData = csv.reader(open(csvFile))
xmlData = open(xmlFile, 'w')
xmlData.write('<?xml version="1.0" encoding="UTF-8" standalone="yes"?>' + "\n" +'<ImportAcademicExtract xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">' + "\n" )
rowNum = 0
for row in csvData:
    if rowNum == 0:
        tags = row
        # replace spaces w/ underscores in tag names
        for i in range(len(tags)):
            tags[i] = tags[i].replace(' ', '_')
    else:
        xmlData.write(' '+'<AcademicExtract>' +"\n")
        for i in range (len(tags)):
            xmlData.write(' ' +'<' + tags[i] + '>' \
                + row[i] + '</' + tags[i] + '>' + "\n")
        xmlData.write(' '+'</AcademicExtract>' + "\n")
    rowNum +=1
xmlData.write('</ImportAcademicExtract>' + "\n")
xmlData.close()
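When I run the script, I can't work out how to get the nested sections to appear (such as the PostalAddresses, EmailAddresses, PhoneNumbers and AdmissionsTerm areas); everything comes out at the same level.
Below is the output from running the script. The underscores mark where text from the file was written; I removed it because the file contains sensitive information. The text comes across correctly. I just don't know how to get these other sections nested further.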
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ImportAcademicExtract xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<AcademicExtract>
<StudentID>____</StudentID>
<LastName>____</LastName>
<FirstName>____</FirstName>
<MiddleName>____</MiddleName>
<SocialSecurityNumber>____</SocialSecurityNumber>
<BirthDate>____</BirthDate>
<GradeLevel>____</GradeLevel>
<SpecialProgramIndicator>____</SpecialProgramIndicator>
<CIPCode>____</CIPCode>
<RegisteredHours>____</RegisteredHours>
<Address>____</Address>
<Address2ndLn>____</Address2ndLn>
<City>____</City>
<State>____</State>
<PostalCode>____</PostalCode>
<Country>____</Country>
<Email>____</Email>
<PhoneNumber>____</PhoneNumber>
<TermName>____</TermName>
<AcademicYear>____</AcademicYear>
<AdmittedDate>____</AdmittedDate>
<TermName2>____</TermName2>
<AcademicYear2>____</AcademicYear2>
<CumulativeAttemptedHours>____</CumulativeAttemptedHours>
<CumulativeRegisteredHours>____</CumulativeRegisteredHours>
<CumulativeEarnedHours>____</CumulativeEarnedHours>
<CumulativeGPA>____</CumulativeGPA>
<TermStartDate>____</TermStartDate>
<TermEndDate>____</TermEndDate>
<EnrollmentType>____</EnrollmentType>
<EnrollmentStatus>____</EnrollmentStatus>
<FirstTimeDegreeSeeking>____</FirstTimeDegreeSeeking>
<IntentToReturn>____</IntentToReturn>
<WithdrawnDate>____</WithdrawnDate>
<ExtTermName>____</ExtTermName>
<AcademicYearStartDate>____</AcademicYearStartDate>
<AcademicYearEndDate>____</AcademicYearEndDate>
<ExtEnrollmentType>____</ExtEnrollmentType>
<DateOfDetermination>____</DateOfDetermination>
<LastDateOfAttendance>____</LastDateOfAttendance>
<SAPStatus>____</SAPStatus>
<ProgramCredentialLevel>____</ProgramCredentialLevel>
<ProgramName>____</ProgramName>
<EdMajor1>____</EdMajor1>
<SiteName>____</SiteName>
<EffectiveStartDate>____</EffectiveStartDate>
<EffectiveEndDate>____</EffectiveEndDate>
<GraduationDateDate>____</GraduationDateDate>
<AnticipatedGraduationDate>____</AnticipatedGraduationDate>
<ProgramLengthInWeeks>____</ProgramLengthInWeeks>
<ProgramLengthInMonths>____</ProgramLengthInMonths>
<ProgramLengthInYears>____</ProgramLengthInYears>
<AcademicYearBeginDate2>____</AcademicYearBeginDate2>
<AcademicYearEndDate2>____</AcademicYearEndDate2>
<WeeksInProgramAcademicYear>____</WeeksInProgramAcademicYear>
</AcademicExtract>
</ImportAcademicExtract>
I tried adding this to the script, but it did not work:
rowNum = 0
for row in csvData:
    if rowNum == 0:
        tags = row
        # replace spaces w/ underscores in tag names
        for i in range(len(tags)):
            tags[i] = tags[i].replace(' ', '_')
    else:
        xmlData.write(' '+'<AcademicExtract>' +"\n")
        for i in range (len(tags)):
            xmlData.write(' ' +'<' + tags[i] + '>' \
                + row[i] + '</' + tags[i] + '>' + "\n")
        if rowNum == 11:
            tags = row
            # replace spaces w/ underscores in tag names
            for i in range(len(tags)):
                tags[i] = tags[i].replace(' ', '_')
        else:
            xmlData.write(' '+'<PostalAddresses>' +"\n"+' '+'<PostalAddress>' +"\n")
            for i in range (len(tags)):
                xmlData.write(' ' +'<' + tags[i] + '>' \
                    + row[i] + '</' + tags[i] + '>' + "\n")
Any help on how to edit the code to get the correct layout would be greatly appreciated.
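I think this is easier to do with a pandas DataFrame. You can use the script below, although it may not be very efficient.
All you need to do is change xml_map to define the output you want.
Save this Python script in a file named csv2xml.py.
For the xml_map variable, I took the list of elements from your second sample output, since those look like the actual column names in the file.
In the xml_map array, anything between square brackets [...] must match a column name in the input file.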
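As a rough illustration of the bracket-placeholder idea (this is not the pandas script referred to above), here is a minimal sketch that uses only the standard library. The names `XML_MAP`, `render_row` and `csv_to_xml` are hypothetical, the template is shortened to a few elements, and the column names are assumed to be the ones shown in the question's output.

```python
import csv
import io
import re
from xml.sax.saxutils import escape

# Hypothetical, shortened template. Text inside [...] is a CSV column name;
# everything else is written to the output unchanged, including the nesting.
XML_MAP = [
    "  <AcademicExtract>",
    "    <StudentId>[StudentID]</StudentId>",
    "    <LastName>[LastName]</LastName>",
    "    <PostalAddresses>",
    "      <PostalAddress>",
    "        <AddressLine1>[Address]</AddressLine1>",
    "        <City>[City]</City>",
    "      </PostalAddress>",
    "    </PostalAddresses>",
    "    <EmailAddresses>",
    "      <EmailAddress>[Email]</EmailAddress>",
    "    </EmailAddresses>",
    "  </AcademicExtract>",
]

HEADER = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n'
    '<ImportAcademicExtract xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">\n'
)
FOOTER = "</ImportAcademicExtract>\n"
PLACEHOLDER = re.compile(r"\[([^\]]+)\]")


def render_row(row, xml_map=XML_MAP):
    """Fill every [Column] placeholder in the template from one CSV row."""
    def fill(match):
        # A KeyError here means the template names a column the CSV lacks.
        # escape() turns &, < and > into XML entities.
        return escape(row[match.group(1)])
    return "".join(PLACEHOLDER.sub(fill, line) + "\n" for line in xml_map)


def csv_to_xml(csv_stream, xml_stream):
    """Write one <AcademicExtract> per data row; the first CSV row is the header."""
    xml_stream.write(HEADER)
    for row in csv.DictReader(csv_stream):
        xml_stream.write(render_row(row))
    xml_stream.write(FOOTER)


if __name__ == "__main__":
    sample = io.StringIO(
        "StudentID,LastName,Address,City,Email\n"
        "1001,Smith,1 Main St,Springfield,smith@example.com\n"
    )
    out = io.StringIO()
    csv_to_xml(sample, out)
    print(out.getvalue())
```

Because the nesting lives in the template rather than in the loop, adding the PhoneNumbers, AdmissionsTerm or EnrollmentTerms sections should only require more template lines, not more code.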
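In this example, the input file is:
Note: the first row of the CSV file must contain the column names.
I assume you are using PowerShell and already have Python installed. Run the script as follows:
Note: put quotes around the file names, because Windows allows spaces in file names and paths.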
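For context on why the quotes matter: a script like this usually receives its file names as command-line arguments, and an unquoted path containing a space arrives as two separate arguments. The sketch below assumes the script takes an input CSV name followed by an output XML name; that argument order, and the names `parse_args`, `csv_file` and `xml_file`, are assumptions rather than something stated above.

```python
import argparse


def parse_args(argv=None):
    """Parse two positional arguments: the input CSV and the output XML file."""
    parser = argparse.ArgumentParser(description="Convert a CSV file to XML.")
    parser.add_argument("csv_file", help="input CSV; the first row holds column names")
    parser.add_argument("xml_file", help="output XML file to create")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print("reading:", args.csv_file)
    print("writing:", args.xml_file)
```

With quotes, a path that contains a space reaches the script as a single `csv_file` value; without them, argparse sees an extra argument and exits with a usage error.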
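If you get the error ModuleNotFoundError: No module named 'pandas', you need to install the pandas module.
If you get a KeyError when running it like this:
Note: column names are case-sensitive.
then you can run the following Python code to check the actual column names in the DataFrame: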
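A minimal sketch of such a check, written with the standard-library csv module rather than pandas so it runs without extra installs. The helper name `missing_columns` and the sample data are hypothetical. With pandas, printing `list(df.columns)` after `pd.read_csv(...)` shows the same header information.

```python
import csv
import io


def missing_columns(csv_stream, wanted):
    """Return the names in `wanted` that are absent from the CSV header row.

    The comparison is exact, so 'StudentId' and 'StudentID' count as different.
    """
    header = next(csv.reader(csv_stream), [])
    return [name for name in wanted if name not in header]


if __name__ == "__main__":
    sample = io.StringIO("StudentID,LastName\n1001,Smith\n")
    # 'StudentId' differs from 'StudentID' only in case, so it is reported.
    print(missing_columns(sample, ["StudentId", "LastName"]))
```

Any name this reports is one that would raise a KeyError when looked up, most often because of a difference in case or a stray space in the header.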
The result of running the script on the sample input above is: