Importing XML files into an Access database when the record id is defined as an attribute

Posted 2024-06-26 00:15:12


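I am trying to import a large amount of data from XML files into Access. The problem is that, in the files I want to import, the first line of every record carries an id: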

<vin id="11111111111111111">
<description>Mazda3 L 2.0l MZR 150 PS 4T 5AG AL-EDITION TRA-P</description>
<type>BL</type>
<typeapproval>e11*2001/116*0262*07</typeapproval>
<variant>B2F</variant>
<version>7EU</version>
<series>Mazda3</series>
<body>L</body>
<engine>2.0l MZR 150 PS</engine>
<grade>AL-EDITION</grade>
<transmission>5AG</transmission>
<colourtype>Mica</colourtype>
<extcolourcode>34K</extcolourcode>
<extcolourcodedescription>Crystal White Pearl</extcolourcodedescription>
<intcolourcode>BU4</intcolourcode>
<intcolourcodedescription>Black</intcolourcodedescription>
<registrationdate>2012-07-20</registrationdate>
<productiondate>2011-11-30</productiondate>
</vin>

As a result, my import contains all of the rows except the vehicle's VIN, which is the value defined as the id.
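To illustrate the distinction at play here, the following minimal sketch uses the standard-library `xml.etree.ElementTree` to show that the VIN lives in an attribute of `<vin>`, while every other field is a child element. The shortened `SAMPLE` string is an abbreviation of the record above, made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the record from the question (illustration only).
SAMPLE = """<vin id="11111111111111111">
<type>BL</type>
<series>Mazda3</series>
</vin>"""

vin = ET.fromstring(SAMPLE)

# The VIN is stored as an attribute on the opening tag ...
attributes = dict(vin.attrib)
# ... whereas the remaining fields are child elements with text content.
children = {child.tag: child.text for child in vin}

print(attributes)
print(children)
```

An importer that only maps child elements to columns would therefore see `type` and `series` but not the `id` value.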

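I tried manually replacing characters such as "> and so on to get rid of that id, but I actually have dozens of files, each with several hundred thousand records, so that is quite painful.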

So I considered combining all the files into one with a Python script:

import os

ver = '2011'  # data year; the paths below contain no {} placeholder, so it is unused

dirName = r'C:\Users\dawid\Desktop\DE_DATA\Mazda_DE\VINs_DE\Mazda\xml'

out_file = r'C:\Users\dawid\Desktop\DE_DATA\Mazda_DE\VINs_DE\Mazda\Output.xml'


def getListOfFiles(dirName):
    # Return the full paths of all files in dirName and its subdirectories.
    listOfFile = os.listdir(dirName)
    allFiles = list()
    # Iterate over all the entries
    for entry in listOfFile:
        # Create full path
        fullPath = os.path.join(dirName, entry)
        # If entry is a directory, recurse into it; otherwise record the file
        if os.path.isdir(fullPath):
            allFiles = allFiles + getListOfFiles(fullPath)
        else:
            allFiles.append(fullPath)
    return allFiles


listOfFileOut = getListOfFiles(dirName)

# 'ANSI' is accepted as a codec name on Windows only (alias of 'mbcs').
# Copying whole documents one after another puts several root elements
# into Output.xml, so the result is not well-formed XML.
with open(out_file, 'w', encoding='ANSI') as outfile:
    for fname in listOfFileOut:
        with open(fname, encoding='ANSI') as infile:
            for line in infile:
                outfile.write(line)

print("Done")

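But this completely destroys the structure of the XML file, and I can no longer import it. Can anyone suggest whether Python can be used to remove all of these ids, so that the whole dataset can be imported into Access?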

Thanks in advance.
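For the merging step described in the question, one possible approach is to parse each file and collect its `<vin>` elements under a single new root, rather than copying raw lines. The sketch below uses only the standard library. The function name `merge_vin_files` and the wrapper element name `vins` are hypothetical, and it assumes each input file is well-formed XML whose encoding is either UTF-8 or declared in its XML declaration.

```python
import xml.etree.ElementTree as ET


def merge_vin_files(paths, out_path, root_tag="vins"):
    """Collect every <vin> element from the given files under one new root.

    root_tag is a hypothetical wrapper name, not taken from the original files.
    Returns the number of <vin> records written.
    """
    merged_root = ET.Element(root_tag)
    for path in paths:
        tree = ET.parse(path)
        # iter() walks the whole tree, including the root element itself,
        # so it works whatever the files' own root element is called.
        for vin in tree.getroot().iter("vin"):
            merged_root.append(vin)
    # Write one document with a single XML declaration and a single root.
    ET.ElementTree(merged_root).write(
        out_path, encoding="utf-8", xml_declaration=True
    )
    return len(merged_root)
```

With the names from the question, a call might look like `merge_vin_files(getListOfFiles(dirName), out_file)`. Each file is loaded into memory in full, which may matter for very large inputs.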


1 answer
User
#1 · Posted 2024-06-26 00:15:12

Try this:

from simplified_scrapy import utils, SimplifiedDoc, req

dirName = r'C:\Users\dawid\Desktop\DE_DATA\Mazda_DE\VINs_DE\Mazda\xml'
listFile = utils.getSubFile(dirName, end='.xml')
for f in listFile:
    doc = SimplifiedDoc(utils.getFileContent(f, encoding='ANSI'))
    doc.replaceReg('<vin[^>]*>', '<vin>')
    print(doc.html)
    # utils.saveFile(f, doc.html, encoding='ANSI') # write to original file

Result:

<vin>
<description>Mazda3 L 2.0l MZR 150 PS 4T 5AG AL-EDITION TRA-P</description>
<type>BL</type>
<typeapproval>e11*2001/116*0262*07</typeapproval>
<variant>B2F</variant>
<version>7EU</version>
...
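Note that this replacement drops the VIN entirely. If the VIN should survive the import, a variation is to move the id value into a child element instead. The following is a minimal standard-library sketch, not part of the answer above. The function name `move_vin_id_to_child` and the element name `vin_number` are hypothetical, and the pattern assumes the opening tag always has exactly the form `<vin id="...">` shown in the question.

```python
import re

# Matches an opening tag of the form <vin id="..."> and captures the id value.
VIN_OPEN_TAG = re.compile(r'<vin\s+id="([^"]*)"\s*>')


def move_vin_id_to_child(xml_text, child_tag="vin_number"):
    """Rewrite <vin id="X"> as <vin> followed by <child_tag>X</child_tag>.

    child_tag is a hypothetical element name chosen for this illustration.
    """
    def replacement(match):
        return "<vin>\n<{0}>{1}</{0}>".format(child_tag, match.group(1))

    return VIN_OPEN_TAG.sub(replacement, xml_text)


sample = '<vin id="11111111111111111">\n<type>BL</type>\n</vin>'
print(move_vin_id_to_child(sample))
```

Each record then carries the VIN as an ordinary field alongside `description`, `type`, and the rest.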
