java.nio error when parsing an XML file
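I have a function in Jython that uses Popen to run another program. That program writes an XML file to its standard output, which is redirected to a file. When the process finishes, I close the file and call another function to parse it. During parsing I got a series of error messages about accessing a closed file and/or malformed XML (the files look fine when I inspect them). I thought output.close() might be returning before the file was actually closed, so I added a loop that waits for output.closed to become true. At first this seemed to work, but then my program printed the following: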
blasting
blasted
parsing
parsed
Extending genes found via genemark, 10.00% done
blasting
blasted
parsing
Exception in thread "_CouplerThread-7 (stdout)" Traceback (most recent call last):
  File "/Users/mbsulli/jython/Lib/subprocess.py", line 675, in run
    self.write_func(buf)
IOError: java.nio.channels.AsynchronousCloseException
[Fatal Error] 17_2_corr.blastp.xml:15902:63: XML document structures must start and end within the same entity.
Retry
blasting
blasted
parsing
Exception in thread "_CouplerThread-9 (stdout)" Traceback (most recent call last):
  File "/Users/mbsulli/jython/Lib/subprocess.py", line 675, in run
    self.write_func(buf)
IOError: java.nio.channels.ClosedChannelException
[Fatal Error] 17_2_corr.blastp.xml:15890:30: XML document structures must start and end within the same entity.
Retry
blasting
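I'm not sure what my options are. Am I right in thinking that the XML has not been fully written by the time I parse it? If so, how can I make sure that it has been?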
def parseBlast(fileName):
    """
    A function for parsing XML blast output.
    """
    print "parsing"
    reader = XMLReaderFactory.createXMLReader()
    reader.entityResolver = reader.contentHandler = BlastHandler()
    reader.parse(fileName)
    print "parsed"
    return dict(map(lambda iteration: (iteration.query, iteration), reader.getContentHandler().iterations))

def cachedBlast(fileName, blastLocation, database, eValue, query, pipeline, remote = False, force = False):
    """
    Performs a blast search using the blastp executable and database in blastLocation on
    the query with the eValue. The result is an XML file saved to fileName. If fileName
    already exists the search is skipped. If remote is true then the search is done remotely.
    """
    if not os.path.isfile(fileName) or force:
        output = open(fileName, "w")
        command = [blastLocation + "/bin/blastp",
                   "-evalue", str(eValue),
                   "-outfmt", "5",
                   "-query", query]
        if remote:
            command += ["-remote",
                        "-db", database]
        else:
            command += ["-num_threads", str(Runtime.getRuntime().availableProcessors()),
                        "-db", database]
        print "blasting"
        blastProcess = subprocess.Popen(command,
                                        stdout = output)
        while blastProcess.poll() == None:
            if pipeline.exception:
                print "Stopping in blast"
                blastProcess.kill()
                output.close()
                raise pipeline.exception
        output.close()
        while not output.closed:
            pass
        print "blasted"
    try:
        return parseBlast(fileName)
    except SAXParseException:
        print 'Retry'
        return cachedBlast(fileName, blastLocation, database, eValue, query, pipeline, remote, True)
# Answer 1
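I think the problem started when I switched from calling wait on the subprocess to using the poll method, which I did so that I could stop the process while it was running. It is hard to say for sure: I already have cached results for many of the data sets I have processed, so it takes a while before a subprocess is launched again. In any case, my guess is that the output was still being written when I closed it. My solution was to switch to a pipe and write the file myself.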
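The approach described above might look roughly like the sketch below. It is not the code I actually ended up with: the names run_to_file, should_stop and chunk_size are made up for illustration, and it is written against the standard subprocess API on the assumption that Jython's version behaves the same way for PIPE. The point is that the file is closed only after the read loop has reached end-of-file on the pipe, so nothing can still be writing to it.

```python
import subprocess

def run_to_file(command, file_name, should_stop=lambda: False, chunk_size=8192):
    """Run command and copy its stdout into file_name ourselves.

    Returns the exit code, or None if should_stop() requested an early kill.
    (Hypothetical helper; the names are illustrative only.)
    """
    process = subprocess.Popen(command, stdout=subprocess.PIPE)
    output = open(file_name, "wb")
    try:
        while True:
            # Checked between chunks, so a quiet child delays the check
            # until the blocking read below returns.
            if should_stop():
                process.kill()
                process.wait()
                return None
            chunk = process.stdout.read(chunk_size)
            if not chunk:
                # Empty read means EOF: the child has closed its stdout,
                # so everything it wrote has already been copied.
                break
            output.write(chunk)
        return process.wait()
    finally:
        # Runs on every path, and only after we have stopped writing.
        process.stdout.close()
        output.close()
```

In cachedBlast this would replace the Popen call and the poll loop, with something like should_stop=lambda: pipeline.exception and the caller re-raising pipeline.exception when None comes back. Because read blocks until data or EOF arrives, a stop request is only noticed between chunks; a reader thread would be needed for a more responsive cancel.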