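I use an ETL tool that embeds Python 2.6 as its scripting language, so when I needed to split a large file into chunks for downstream processing, Python seemed the obvious choice. I originally wrote and tested the script with Python 2.6 on a MacBook (OS X 10.8).
When I moved it to Windows, I was surprised to find it ran 10 times slower... even on an enterprise-class server (32 cores, 64 GB RAM, Fibre Channel SAN, etc.).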
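While trying to narrow down the difference, I commented out the write operation: on OS X this made almost no difference, while on Windows it made roughly a fivefold difference.
Are there any fundamental file I/O differences between OS X and Windows?
Thanks for your help :)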
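The narrowing-down experiment above can be repeated in isolation with a small timing harness. This is a minimal sketch, not the original script: the helper name `time_split` and the synthetic input are hypothetical, and it only mimics the script's pattern of opening one output file per run of lines with the same first field. With `do_write=False` the files are still opened and closed, which corresponds to commenting out the write.

```python
import os
import shutil
import tempfile
import time


def time_split(lines, do_write=True):
    """Time the open/write/close pattern of the splitting script (hypothetical helper).

    Each run of lines sharing the first space-delimited field goes to one file.
    With do_write=False the files are still opened and closed, but nothing is written.
    """
    out_dir = tempfile.mkdtemp()
    f_out = None
    try:
        start = time.time()
        current = None
        for line in lines:
            name, _, rest = line.partition(' ')
            if name != current:
                if f_out is not None:
                    f_out.close()
                current = name
                f_out = open(os.path.join(out_dir, name), 'w')
            if do_write:
                f_out.write(rest)
        if f_out is not None:
            f_out.close()
        return time.time() - start
    finally:
        # Close the last file (closing twice is harmless) before deleting the directory.
        if f_out is not None:
            f_out.close()
        shutil.rmtree(out_dir)


if __name__ == '__main__':
    # Synthetic input (an assumption): 1,000 output files with 10 lines each.
    sample = ['k%05d some payload\n' % (i // 10) for i in range(10000)]
    print('with writes:    %.3f s' % time_split(sample, True))
    print('without writes: %.3f s' % time_split(sample, False))
```

Running both variants on each machine separates the cost of the `write()` calls from the cost of opening and closing many files.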
import os
import sys
import re
from time import time
t = time()
"""
# Split a pre-sorted text file into multiple outputs based on the leftmost element
# delimited by spaces.
# The second element can be used for an additional sort and will be stripped from the
# output when 'isLeadingSort=1'
#
# parameter:
# path: char path for the input file
# outPath: char path for the output files
# isLeadingSort int use everything from the 2nd (or, if set, the 3rd) element on as output data
# isdbg int enable debug prints
"""
# For now, just take the parameters from the command line for testing
path = sys.argv[1]
outPath = sys.argv[2]
isLeadingSort = int(sys.argv[3])
isdbg = int(sys.argv[4])
#outPath = os.getcwd()
#isLeadingSort = 0
#isdbg = 0
# define all the functions up front
def printStr(msg):
    """ print when the debug option is set """
    if isdbg:
        print(msg)

def testPath(path):
    """raise an exception if we can't find the path or file"""
    if not os.path.exists(path):
        raise Exception('File not found: ' + path)
#
# This is where we start
#
# check that the paths exist or raise an exception
testPath(path)
testPath(outPath)
printStr('paths ok')

# init
fnameOut = chr(1)  # sentinel that never matches a real output file name
fOut = None        # currently open output file, if any
# with isLeadingSort the 2nd element is only a sort key, so split off one more field
if isLeadingSort:
    wrds = 2
else:
    wrds = 1

# open the input file for reading and process it line by line
with open(path, 'r') as f:
    for line in f:
        printStr('for line in f: ')
        arLine = re.split('[ \n]+', line, wrds)
        newFname = arLine[0]
        outLine = arLine[-1]
        if newFname == fnameOut:
            printStr('writing to open file: ' + fnameOut)
        else:
            fnameOut = newFname
            printStr('opennextfile: ' + fnameOut + ' - closing: ' + str(fOut))
            if fOut is not None:
                fOut.close()
            if fnameOut in ('', '\n'):
                raise Exception('Filename is not the first element of the data: ')
            fOut = open(os.path.join(outPath, fnameOut), 'w')  # open the next output file
        # write
        fOut.write(outLine)

if fOut is not None:
    fOut.close()
print('timediff : ' + str(time() - t))
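The splitting technique the script implements (one output file per run of lines sharing the first field of a pre-sorted input) can also be sketched with `itertools.groupby`, which is available in Python 2.6. This is a minimal sketch, assuming the input really is sorted on the first field; `parse_line` and `split_sorted` are hypothetical names, not part of the ETL tool. Like the script, it reopens (and truncates) a file if a name reappears later in unsorted input.

```python
import itertools
import os
import re


def parse_line(line, is_leading_sort=False):
    """Return (file_name, payload) for one input line, the way the script does.

    The first space-delimited field is the output file name; when is_leading_sort
    is set, the second field (the sort key) is dropped as well.
    """
    parts = re.split('[ \n]+', line, maxsplit=2 if is_leading_sort else 1)
    return parts[0], parts[-1]


def split_sorted(lines, out_dir, is_leading_sort=False):
    """Write each run of consecutive lines with the same file name to one file.

    Returns the output file names in the order they were written.
    """
    written = []
    parsed = (parse_line(line, is_leading_sort) for line in lines)
    for name, group in itertools.groupby(parsed, key=lambda pair: pair[0]):
        if not name:
            raise ValueError('Filename is not the first element of the data')
        # Each run is opened once, written, and closed by the with block.
        with open(os.path.join(out_dir, name), 'w') as f_out:
            for _, payload in group:
                f_out.write(payload)
        written.append(name)
    return written
```

Inside the script's `with open(path, 'r') as f:` block, a call such as `split_sorted(f, outPath, isLeadingSort)` would take over the manual `fnameOut`/`fOut` bookkeeping.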