Java NIO error when parsing an XML file

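I have a function in Jython that uses Popen to run another program. That program writes an XML file to its standard output, which is redirected to a file. When the process finishes, I close the file and call another function to parse it. During parsing I got a series of error messages about accessing a closed file and/or malformed XML (the files look fine when I inspect them). I thought output.close() might be returning before the file was actually closed, so I added a loop that waits for output.closed to become true. This seemed to work at first, but then my program printed the following: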

blasting  
blasted  
parsing  
parsed  
    Extending genes found via genemark, 10.00% done  
blasting  
blasted  
parsing  
Exception in thread "_CouplerThread-7 (stdout)" Traceback (most recent call last):  
  File "/Users/mbsulli/jython/Lib/subprocess.py", line 675, in run  
    self.write_func(buf)  
IOError: java.nio.channels.AsynchronousCloseException  
[Fatal Error] 17_2_corr.blastp.xml:15902:63: XML document structures must start and end within the same entity.  
Retry  
blasting  
blasted  
parsing  
Exception in thread "_CouplerThread-9 (stdout)" Traceback (most recent call last):  
  File "/Users/mbsulli/jython/Lib/subprocess.py", line 675, in run  
    self.write_func(buf)  
IOError: java.nio.channels.ClosedChannelException  
[Fatal Error] 17_2_corr.blastp.xml:15890:30: XML document structures must start and end within the same entity.  
Retry  
blasting  

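I'm not sure what my options are from here. Am I right in thinking that the XML has not been fully written by the time I parse it? If so, how can I make sure that it is?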

def parseBlast(fileName):
  """
  A function for parsing XML blast output.
  """
  print "parsing"
  reader = XMLReaderFactory.createXMLReader()
  reader.entityResolver = reader.contentHandler = BlastHandler()
  reader.parse(fileName)
  print "parsed"

  return dict(map(lambda iteration: (iteration.query, iteration), reader.getContentHandler().iterations))

def cachedBlast(fileName, blastLocation, database, eValue, query, pipeline, remote = False, force = False):
  """
  Performs a blast search using the blastp executable and database in blastLocation on
  the query with the eValue.  The result is an XML file saved to fileName.  If fileName
  already exists the search is skipped.  If remote is true then the search is done remotely.
  """
  if not os.path.isfile(fileName) or force:
    output = open(fileName, "w")
    command = [blastLocation + "/bin/blastp",
               "-evalue", str(eValue),
               "-outfmt", "5",
               "-query", query]
    if remote:
      command += ["-remote",
                  "-db", database]
    else:
      command += ["-num_threads", str(Runtime.getRuntime().availableProcessors()),
                  "-db", database]
    print "blasting"
    blastProcess = subprocess.Popen(command,
                                      stdout = output)
    while blastProcess.poll() == None:
      if pipeline.exception:
        print "Stopping in blast"
        blastProcess.kill()
        output.close()
        raise pipeline.exception
    output.close()
    while not output.closed:
      pass
    print "blasted"
  try:
    return parseBlast(fileName)
  except SAXParseException:
    print 'Retry'
    return cachedBlast(fileName, blastLocation, database, eValue, query, pipeline, remote, True)

1 answer

  1. # Answer 1

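    I think the problem started when I switched from calling wait on the subprocess to using the poll method, so that I could stop the process while it was running. It's hard to say for sure, because I already had cached results for many of the data sets I had processed, so it took a while before a subprocess was started again. Either way, my guess is that the output was still being written when I closed it. My solution was to switch to a pipe and write the file myself: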

    def cachedBlast(fileName, blastLocation, database, eValue, query, pipeline, remote = False, force = False):
      """
      Performs a blast search using the blastp executable and database in blastLocation on
      the query with the eValue. The result is an XML file saved to fileName. If fileName
      already exists the search is skipped. If remote is true then the search is done remotely.
      """
      if not os.path.isfile(fileName) or force:
        output = open(fileName, "w")
        command = [blastLocation + "/bin/blastp",
                   "-evalue", str(eValue),
                   "-outfmt", "5",
                   "-query", query]
        if remote:
          command += ["-remote",
                      "-db", database]
        else:
          command += ["-num_threads", str(Runtime.getRuntime().availableProcessors()),
                      "-db", database]
        blastProcess = subprocess.Popen(command,
                                        stdout = subprocess.PIPE)
        while blastProcess.poll() == None:
          output.write(blastProcess.stdout.read())
          if pipeline.exception:
            psProcess = subprocess.Popen(["ps", "aux"], stdout = subprocess.PIPE)
            awkProcess = subprocess.Popen(["awk", "/" + " ".join(command).replace("/", "\\/") + "/"], stdin = psProcess.stdout, stdout = subprocess.PIPE)
            for line in awkProcess.stdout:
              subprocess.Popen(["kill", "-9", re.split(r"\s+", line)[1]])
            output.close()
            raise pipeline.exception
        remaining = blastProcess.stdout.read()
        while remaining:
          output.write(remaining)
          remaining = blastProcess.stdout.read()
    
        output.close()
    
      try:
        return parseBlast(fileName)
      except SAXParseException:
        return cachedBlast(fileName, blastLocation, database, eValue, query, pipeline, remote, True)
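
The pipe-based approach in the answer can be reduced to a small helper. The sketch below is only an illustration, not the answer author's code: `run_to_file`, `should_stop`, and `chunk_size` are hypothetical names, and it assumes that `Popen.kill()` is enough to stop the child (the answer falls back to `ps`/`awk`/`kill -9`, which suggests that may not hold under every Jython version). It reads the pipe in fixed-size chunks, because `stdout.read()` with no argument blocks until the child closes its output and so delays any stop check until the end. It also calls `wait()` before closing the file, so the file is closed only after the child has exited.

```python
import subprocess


def run_to_file(command, file_name, should_stop=lambda: False, chunk_size=65536):
    """Run command and copy its stdout to file_name.

    Returns the child's exit code, or None if should_stop() became
    true and the child was killed before it finished.
    """
    process = subprocess.Popen(command, stdout=subprocess.PIPE)
    output = open(file_name, "wb")
    try:
        while True:
            # Blocks until chunk_size bytes have arrived or the child
            # closes its stdout; an empty result means end of output.
            chunk = process.stdout.read(chunk_size)
            if not chunk:
                break
            output.write(chunk)
            # The stop request is only checked between chunks.
            if should_stop():
                process.kill()
                process.wait()
                return None
        # All output has been copied; wait for the child to exit.
        return process.wait()
    finally:
        # Runs on every path, so the file is closed exactly once and
        # only after the copy loop has finished.
        process.stdout.close()
        output.close()
```

Inside `cachedBlast`, the call might look like `run_to_file(command, fileName, lambda: pipeline.exception is not None)`, assuming `pipeline.exception` is `None` until something fails. Because the stop check happens between chunks, a child that produces no output for a long time also delays the check; a smaller `chunk_size` makes the check more frequent at the cost of more reads.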