How to optimize iterators when parsing files in Python

Posted 2024-09-30 22:21:11


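I have been given files containing NTFS audit permissions and am using Python to parse them. The raw CSV file lists a path followed by which group has which access rights, in a pattern like this: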

E:\DIR A, CREATOR OWNER FullControl
E:\DIR A, Sales FullControl
E:\DIR A, HR Full Control
E:\DIR A\SUBDIR, Sales FullControl
E:\DIR A\SUBDIR, HR FullControl

My code parses the file and produces the following output:

File Access for: E:\DIR A
CREATOR OWNER,FullControl
Sales,FullControl
HR,FullControl

File Access For: E:\DIR A\SUBDIR
Sales,FullControl
HR,FullControl

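I'm new to generators, but I'd like to use them to optimize my code. Nothing I have tried seems to work, so here is the original code (I know it's ugly). It works, but it is slow. The only way I could get it done was to first parse out the paths, put them in a list, build a set to make them unique, then iterate over that list, match each path against the paths in a second list, and write out every entry found. As I said, it's ugly, but it works.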

import os, codecs, sys
reload(sys)
sys.setdefaultencoding('utf8')  # to prevent cp-932 errors on screen

file = "aud.csv"
outfile = "access-2.csv"


filelist = []
accesslist = []
with codecs.open(file,"r",'utf-8-sig') as infile:
    for line in infile:
        newline = line.split(',')
        folder = newline[0].replace("\"","")
        user = newline[1].replace("\"","")
        filelist.append(folder)
        accesslist.append(folder+","+user)

newfl = sorted(set(filelist))

def makeFile():
 print "Starting, please wait"
 for i in range(1,len(newfl)):
  searchItem = str(newfl[i])
  with codecs.open(outfile,"a",'utf-8-sig') as output:
    outtext = ("\r\nFile access for: "+ searchItem + "\r\n")
    output.write(outtext)
    for item in accesslist:
        searchBreak = item.split(",")
        searchTarg = searchBreak[0]
        if searchItem == searchTarg:
            searchBreaknew = searchBreak[1].replace("FSA-INC01S\\","")
            searchBreaknew = str(searchBreaknew)
            # print(searchBreaknew)
            searchBreaknew = searchBreaknew.replace(" ",",")
            searchBreaknew = searchBreaknew.replace("CREATOR,OWNER","CREATOR OWNER")
            output.write(searchBreaknew)

How should I optimize it?

Edit:

Here is an edited version. It runs much faster, although I'm sure it can still be improved:

import os, codecs, sys, csv
reload(sys)
sys.setdefaultencoding('utf8')

file = "aud.csv"
outfile = "access-3.csv"


filelist = []
accesslist = []
with codecs.open(file,"r",'utf-8-sig') as csvinfile:
    auditfile = csv.reader(csvinfile, delimiter=",")
    for line in auditfile:
        folder = line[0]
        user = line[1].replace("FSA-INC01S\\","")
        filelist.append(folder)
        accesslist.append(folder+","+user)

newfl = sorted(set(filelist))

def makeFile():
 print "Starting, please wait"
 for i in xrange(1,len(newfl)):
  searchItem = str(newfl[i])
  outtext = ("\r\nFile access for: "+ searchItem + "\r\n")
  accessUserlist = ""
  for item in accesslist:
        searchBreak = item.split(",")
        if searchItem == searchBreak[0]:
            searchBreaknew = str(searchBreak[1]).replace(" ",",")
            searchBreaknew = searchBreaknew.replace("R,O","R O")
            accessUserlist += searchBreaknew+"\r\n"
  with codecs.open(outfile,"a",'utf-8-sig') as output:
    output.write(outtext)
    output.write(accessUserlist)

1 answer

Answer 1 · posted 2024-09-30 22:21:11

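I was misled by the .csv file extension you used.
The expected output you show is not CSV-compatible, because a record cannot contain a \n.
I suggest using a generator that returns the data record by record: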

class Audit(object):
    def __init__(self, fieldnames):
        self.fieldnames = fieldnames
        self.__access = {}

    def append(self, row):
        folder = row[self.fieldnames[0]]
        access = row[self.fieldnames[1]].strip(' ')
        access = access.replace("FSA-INC01S\\", "")
        access = access.split(' ')
        if len(access) == 3:
            if access[0] == 'CREATOR':
                access[0] += ' ' + access[1]
                del access[1]
            elif access[1] == 'Full':
                access[1] += ' ' + access[2]
                del access[2]

        if folder not in self.__access:
            self.__access[folder] = []

        self.__access[folder].append(access)

    # Generator for class Audit
    def __iter__(self):
        record = ''
        for folder in sorted(self.__access):
            record = folder + '\n'
            for access in self.__access[folder]:
                record += '%s\n' % (','.join(access) )

            yield record + '\n'

How to use it:

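def main():
    import io, csv
    audit = Audit(['Folder', 'Accesslist'])

    # `file` and `outfile` are the path variables defined in the question.
    # newline='' lets the csv module handle line endings itself.
    with io.open(file, "r", newline='', encoding='utf-8') as csv_in:
        # DictReader takes the column names from the first row, so the input
        # is expected to start with a "Folder,Accesslist" header line.
        for row in csv.DictReader(csv_in, delimiter=","):
            audit.append(row)

    with io.open(outfile, 'w', newline='', encoding='utf-8') as txt_out:
        for record in audit:
            txt_out.write(record)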

Tested with Python 3.4.2, csv 1.0
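
For comparison, here is a minimal sketch of the same single-pass idea written as a plain generator function instead of a class. It is not the code tested above, only an illustration under a few assumptions: the input has no header row (as in the sample in the question), the access right is the last word of the second column, and a two-word "Full Control" should be normalized to "FullControl" as in the expected output. The names `parse_access` and `iter_records` are made up for this example. Each row is read once and stored under its folder in a dict, which avoids rescanning the whole access list for every folder.

```python
import csv
import io
from collections import defaultdict

PREFIX = "FSA-INC01S\\"  # domain prefix taken from the question's code


def parse_access(text):
    """Split e.g. 'Sales FullControl' into ['Sales', 'FullControl'].

    Assumption: the access right is the last word, or the two words
    'Full Control', which are normalized to 'FullControl'.
    """
    text = text.strip().replace(PREFIX, "")
    name, _, right = text.rpartition(" ")
    if right == "Control" and name.endswith(" Full"):
        name, right = name[:-len(" Full")], "FullControl"
    return [name, right]


def iter_records(rows):
    """Yield one text block per folder after a single pass over rows."""
    grouped = defaultdict(list)
    for row in rows:
        if len(row) < 2:
            continue  # skip blank or malformed lines
        grouped[row[0].strip()].append(parse_access(row[1]))
    # Folders come out in sorted order; entries keep their input order.
    for folder in sorted(grouped):
        lines = ["File Access for: " + folder]
        lines += [",".join(entry) for entry in grouped[folder]]
        yield "\n".join(lines) + "\n\n"


# In-memory stand-in for the audit file, using the sample from the question.
sample = io.StringIO(
    "E:\\DIR A, CREATOR OWNER FullControl\n"
    "E:\\DIR A, Sales FullControl\n"
    "E:\\DIR A, HR Full Control\n"
    "E:\\DIR A\\SUBDIR, Sales FullControl\n"
    "E:\\DIR A\\SUBDIR, HR FullControl\n"
)
records = list(iter_records(csv.reader(sample)))
for record in records:
    print(record, end="")
```

With a real file, the `io.StringIO` object would be replaced by a file opened with `newline=''`, and each yielded record written straight to the output file. Note that the dict still holds every row in memory; the generator only defers building the output text until it is needed.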
