Calling a Python script from Java and collecting its output line by line, as it is written

Posted 2024-10-02 14:19:21

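I am currently integrating a Python (3.6.7) module into Java (8.0.191-oracle). In case it matters, I am using IntelliJ IDEA on Ubuntu 18.04. The Python code is written as a batch job and uses the print function to write detailed log information to standard output. The log messages currently include timestamps, so that some light "post-mortem" analysis is possible if the batch job starts to slow down. However, the logging should be done by the calling Java program and interleaved with logs from other parts of the system. I therefore need to read the Python process's standard out (and err) and redirect it to the standard out (and err) of the calling Java class.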

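To illustrate the situation and the problem I am running into, I wrote the following example (note that the timestamps here are only for development and to illustrate the problem). I have a Python file, cr_test.py, that simply prints about 1000 lines of timestamped text at 0.01-second intervals: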

from datetime import datetime
import time

for i in range(1, 1000):
    print(f"[PyTime:{datetime.now().strftime('%H:%M:%S.%f')[:-3]}] This is message no {i} from Python.")
    time.sleep(0.01)

print("Python test complete!")

When run, this produces:

[PyTime:13:42:10.486] This is message no 1 from Python.
[PyTime:13:42:10.496] This is message no 2 from Python.
[PyTime:13:42:10.507] This is message no 3 from Python.
[PyTime:13:42:10.517] This is message no 4 from Python.
[PyTime:13:42:10.527] This is message no 5 from Python.
[PyTime:13:42:10.537] This is message no 6 from Python.
[PyTime:13:42:10.548] This is message no 7 from Python.
[PyTime:13:42:10.558] This is message no 8 from Python.
[PyTime:13:42:10.568] This is message no 9 from Python.
[PyTime:13:42:10.578] This is message no 10 from Python.
[PyTime:13:42:10.589] This is message no 11 from Python.
...

and so on.

I also have a Java class that simply starts the Python script with ProcessBuilder and creates a thread to listen to the Python process's standard out (standard err is ignored in this example):

package com.company;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.text.SimpleDateFormat;
import java.util.Date;

public class Main {

    public static void main(String[] args) {
        SimpleDateFormat sdfDate = new SimpleDateFormat("HH:mm:ss.SSS");

        ProcessBuilder builder = new ProcessBuilder("python3", "cr_test.py");
        try {
            final Process process = builder.start();
            Thread outThread = new Thread(() -> {
                try (BufferedReader reader = new BufferedReader(new InputStreamReader(process.getInputStream()), 4096)) {

                    String line = null;

                    while ((line = reader.readLine()) != null) {
                        System.out.println("[JavaTime:"+sdfDate.format(new Date())+"] "+line);
                    }
                } catch (Exception e) {
                    e.printStackTrace();
                }
            });

            outThread.start();

            int exitCode = -1;
            try {
                exitCode = process.waitFor();
            } catch (Exception e) {
                e.printStackTrace();
            }

        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

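This approach works, but it has one important flaw: when the Python process is launched from Java, its output is not read line by line and printed as it arrives. Instead, there appears to be a delay or some buffering that produces output like the following (look at the Java and Python timestamps; I have omitted most lines to show where the "chunks" occur):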

[JavaTime:13:43:28.629] [PyTime:13:43:27.163] This is message no 1 from Python.
[JavaTime:13:43:28.630] [PyTime:13:43:27.173] This is message no 2 from Python.
[JavaTime:13:43:28.631] [PyTime:13:43:27.183] This is message no 3 from Python.
[JavaTime:13:43:28.631] [PyTime:13:43:27.193] This is message no 4 from Python.
[JavaTime:13:43:28.631] [PyTime:13:43:27.203] This is message no 5 from Python.
[JavaTime:13:43:28.631] [PyTime:13:43:27.213] This is message no 6 from Python.
[JavaTime:13:43:28.631] [PyTime:13:43:27.223] This is message no 7 from Python.
[JavaTime:13:43:28.631] [PyTime:13:43:27.234] This is message no 8 from Python.
[JavaTime:13:43:28.632] [PyTime:13:43:27.244] This is message no 9 from Python.
[JavaTime:13:43:28.632] [PyTime:13:43:27.254] This is message no 10 from Python.
[JavaTime:13:43:28.632] [PyTime:13:43:27.264] This is message no 11 from Python.
 ...
[JavaTime:13:43:28.637] [PyTime:13:43:28.576] This is message no 139 from Python.
[JavaTime:13:43:28.637] [PyTime:13:43:28.586] This is message no 140 from Python.
[JavaTime:13:43:28.637] [PyTime:13:43:28.597] This is message no 141 from Python.
[JavaTime:13:43:28.637] [PyTime:13:43:28.607] This is message no 142 from Python.
[JavaTime:13:43:28.637] [PyTime:13:43:28.617] This is message no 143 from Python.
[JavaTime:13:43:30.092] [PyTime:13:43:28.627] This is message no 144 from Python.
[JavaTime:13:43:30.092] [PyTime:13:43:28.638] This is message no 145 from Python.
[JavaTime:13:43:30.092] [PyTime:13:43:28.648] This is message no 146 from Python.
[JavaTime:13:43:30.093] [PyTime:13:43:28.658] This is message no 147 from Python.
[JavaTime:13:43:30.093] [PyTime:13:43:28.668] This is message no 148 from Python.
...
[JavaTime:13:43:30.099] [PyTime:13:43:30.040] This is message no 281 from Python.
[JavaTime:13:43:30.099] [PyTime:13:43:30.050] This is message no 282 from Python.
[JavaTime:13:43:30.099] [PyTime:13:43:30.060] This is message no 283 from Python.
[JavaTime:13:43:30.099] [PyTime:13:43:30.071] This is message no 284 from Python.
[JavaTime:13:43:30.099] [PyTime:13:43:30.081] This is message no 285 from Python.
[JavaTime:13:43:31.556] [PyTime:13:43:30.091] This is message no 286 from Python.
[JavaTime:13:43:31.556] [PyTime:13:43:30.101] This is message no 287 from Python.
[JavaTime:13:43:31.556] [PyTime:13:43:30.112] This is message no 288 from Python.
[JavaTime:13:43:31.557] [PyTime:13:43:30.122] This is message no 289 from Python.
[JavaTime:13:43:31.557] [PyTime:13:43:30.132] This is message no 290 from Python.
...

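In other words, the Python output arrives in chunks roughly every 1.5 seconds. If I change the interval in Python to print every 0.03 seconds, I get essentially the same chunks, only with longer gaps on the Java side, so this does not seem to be a matter of time but of data volume. Well, sort of. Specifically, Java seems to release about 142 lines for printing at a time, yet changing the length of the lines does not seem to have any effect.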
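As a rough sanity check on the "about 142 lines" figure, the sketch below counts how many complete sample lines fit into one 8192-byte block. The 8192-byte size is an assumption (a common default for Python's buffered I/O, and a guess at what happens when stdout is a pipe), and the helper name complete_lines_in_first_block is hypothetical. The timestamp is a fixed placeholder taken from the sample output.

```python
BUFFER_SIZE = 8192  # assumed buffer size in bytes; not confirmed by the original post


def complete_lines_in_first_block(buffer_size=BUFFER_SIZE):
    """Count how many whole sample lines fit into the first buffer_size bytes."""
    used = 0
    count = 0
    for i in range(1, 1000):
        # Same shape as the sample output; only the message number varies.
        line = f"[PyTime:13:43:27.163] This is message no {i} from Python.\n"
        size = len(line.encode("utf-8"))
        if used + size > buffer_size:
            break
        used += size
        count += 1
    return count


print(complete_lines_in_first_block())  # prints 143
```

Under that assumption the first block would hold messages 1 to 143, which lines up with the first jump in the Java timestamps shown above. It would not, however, explain the observation that changing the line length made no difference, so treat this as a hint rather than a diagnosis.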

I have tried flushing the buffer after every call to readLine, and I have tried changing the buffer size. Neither seems to affect the problem.

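Can anyone explain what is happening here, and whether there is a way to have Java print the lines as Python prints them? Or can someone at least point me to the likely source of the problem? Is it a Java issue? A Python issue? A virtual machine issue? A system issue? This kind of programming is not my specialty, so I may be missing something obvious.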
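One likely place to look, offered as a sketch rather than a confirmed diagnosis: when Python's standard output is a pipe instead of a terminal, CPython block-buffers it, so lines only leave the process once a buffer fills. If that is what is happening here, flushing after every print on the Python side should make the lines reach Java as they are written. The helper name log_line below is hypothetical and not part of the original cr_test.py.

```python
import sys
from datetime import datetime


def log_line(message, stream=None):
    """Print one timestamped line and flush it immediately.

    Hypothetical helper for illustration; not part of the original script.
    """
    if stream is None:
        stream = sys.stdout
    stamp = datetime.now().strftime("%H:%M:%S.%f")[:-3]
    # flush=True pushes the line through Python's buffers after every call,
    # so a reader on the other end of a pipe can see it without waiting
    # for the buffer to fill.
    print(f"[PyTime:{stamp}] {message}", file=stream, flush=True)
```

In cr_test.py this would replace the print call inside the loop, for example log_line(f"This is message no {i} from Python."). If the Python code cannot be changed, two alternatives on the launching side are to start the interpreter with the -u option ("python3", "-u", "cr_test.py" in the ProcessBuilder arguments) or to set the PYTHONUNBUFFERED environment variable for the child process.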

