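I am trying to import a large amount of data from XML files into Access. The problem is that in the files I want to import, the first line of each record carries the VIN as an id attribute: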
<vin id="11111111111111111">
<description>Mazda3 L 2.0l MZR 150 PS 4T 5AG AL-EDITION TRA-P</description>
<type>BL</type>
<typeapproval>e11*2001/116*0262*07</typeapproval>
<variant>B2F</variant>
<version>7EU</version>
<series>Mazda3</series>
<body>L</body>
<engine>2.0l MZR 150 PS</engine>
<grade>AL-EDITION</grade>
<transmission>5AG</transmission>
<colourtype>Mica</colourtype>
<extcolourcode>34K</extcolourcode>
<extcolourcodedescription>Crystal White Pearl</extcolourcodedescription>
<intcolourcode>BU4</intcolourcode>
<intcolourcodedescription>Black</intcolourcodedescription>
<registrationdate>2012-07-20</registrationdate>
<productiondate>2011-11-30</productiondate>
</vin>
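As a result, the import gives me every field except the vehicle's VIN number, which is defined as the id.
I tried manually replacing character sequences such as "> and so on
to get rid of that id, but I have dozens of files with several hundred thousand records each, so this is quite painful.
So I thought about merging all the files together with a Python script: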
import os
import csv
import pandas as pd
import numpy as np

ver = '2011'
# Note: these paths contain no {} placeholder, so .format(ver) leaves them unchanged.
dirName = r'C:\Users\dawid\Desktop\DE_DATA\Mazda_DE\VINs_DE\Mazda\xml'.format(ver)
out_file = r'C:\Users\dawid\Desktop\DE_DATA\Mazda_DE\VINs_DE\Mazda\Output.xml'.format(ver)

def getListOfFiles(dirName):
    # Create a list of the files and subdirectory names in the given directory
    listOfFile = os.listdir(dirName)
    allFiles = list()
    # Iterate over all the entries
    for entry in listOfFile:
        # Create full path
        fullPath = os.path.join(dirName, entry)
        # If entry is a directory, get the list of files in that directory
        if os.path.isdir(fullPath):
            allFiles = allFiles + getListOfFiles(fullPath)
        else:
            allFiles.append(fullPath)
    return allFiles

listOfFileOut = getListOfFiles(dirName)

# Copy every input file, line by line, into a single output file.
# 'ANSI' is an alias for the 'mbcs' codec and is only available on Windows.
with open(out_file, 'w', encoding='ANSI') as outfile:
    for fname in listOfFileOut:
        with open(fname, encoding='ANSI') as infile:
            for line in infile:
                outfile.write(line)
print("Done")
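But this completely destroys the structure of the XML file, and I can no longer import it. Can anyone suggest whether Python can be used to get rid of all these ids, so that I can import the whole database into Access?
Thanks in advance.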
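The line-by-line concatenation most likely fails because each source file is a complete XML document with its own root element (and possibly its own XML declaration), so the merged file ends up with several roots, which XML parsers reject. A minimal illustration with two made-up documents; the `<vins>` root name and the helper `is_well_formed` are assumptions for the example, since the question does not show the real root element:

```python
import xml.etree.ElementTree as ET

# Two tiny, well-formed documents standing in for two source files.
# The <vins> root name is hypothetical.
doc_a = '<vins><vin id="1"><series>Mazda3</series></vin></vins>'
doc_b = '<vins><vin id="2"><series>Mazda6</series></vin></vins>'


def is_well_formed(text):
    """Return True if text parses as a single XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(doc_a))          # True: each file parses on its own
print(is_well_formed(doc_a + doc_b))  # False: two root elements in one document
```

A merged file therefore needs exactly one root element wrapping all the records.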
Try this:
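One possible approach, as a minimal sketch rather than a definitive implementation: parse each file with the standard library's `xml.etree.ElementTree`, move the `id` attribute of every `<vin>` into an ordinary child element, and stream all records into one output file under a single root. It assumes that Access only picks up child elements on import (which matches the behaviour described in the question), that every input file is well-formed on its own, and that the records are `<vin>` elements without namespaces. The names `promote_id`, `merge_vin_files`, `vinnumber` and `vins` are hypothetical and can be changed freely.

```python
import xml.etree.ElementTree as ET


def promote_id(vin, tag="vinnumber"):
    """Move the id attribute of a <vin> element into a new first child element."""
    child = ET.Element(tag)
    child.text = vin.attrib.pop("id", "")
    child.tail = vin.text  # reuse the record's existing indentation, if any
    vin.insert(0, child)
    return vin


def merge_vin_files(paths, out_path, root_tag="vins"):
    """Write every <vin> record from the input files into one well-formed file.

    Returns the number of records written.
    """
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        out.write('<?xml version="1.0" encoding="UTF-8"?>\n')
        out.write(f"<{root_tag}>\n")
        for path in paths:
            # iterparse yields each element once its closing tag has been read,
            # so a whole file never has to be serialized in one go.
            for _event, elem in ET.iterparse(path, events=("end",)):
                if elem.tag == "vin":
                    promote_id(elem)
                    elem.tail = "\n"
                    out.write(ET.tostring(elem, encoding="unicode"))
                    elem.clear()  # drop the record's children once written
                    count += 1
        out.write(f"</{root_tag}>\n")
    return count
```

With the names from the question, the call would be `merge_vin_files(getListOfFiles(dirName), out_file)`. The output is written as UTF-8 regardless of the input encoding, because the parser decodes each file according to its own XML declaration. Writing record by record keeps memory use modest even with several hundred thousand records per file.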
Result: