On the process blocking caused by Process.waitFor()

Java Process waitfor


 2019/09/04

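I recently ran into a seemingly bizarre problem: the same native command, invoked from two different apps, produced very different results. According to its source code, the native process does nothing complicated:

Read data from a serial port, /dev/ttyUSBX
Write the data to a local directory (with a 1 KB read buffer)

The logic of the native process is simple: once started, the main thread creates a worker thread that does the reading and writing, then waits for it with pthread_join and exits when it finishes.

The problem: when app A ran the command, the saved log stopped growing at just under 4 MB, every single time, whereas with app B the data kept being written. While app A was running the command, I dumped the native process's stack with debuggerd -b <tid>, which looked roughly like this: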

[Figure: process stack]

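So the native process was stuck while writing data. But where exactly? cat /proc/<pid>/wchan shows the kernel function the process is currently sleeping in (strace -p <pid> shows the system call it is in), and it reported pipe_wait. How could that be? The native process does not use any pipe for its own data transfer, so the pipe was most likely the communication channel between the Java parent process and the native process.

Back to the original question: why does the same command behave so differently in the two apps? Let's compare the code that app A and app B actually run.

App A's calling logic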



Process process = null;
try {
    process = Runtime.getRuntime().exec(COMMAND);
    process.waitFor();
} catch (IOException e) {
    e.printStackTrace();
} catch (InterruptedException e) {
    e.printStackTrace();
} finally {
    if(process != null) {
        process.destroy();
    }
}

App B's calling logic


Process process = null;
try {
    process = Runtime.getRuntime().exec(COMMAND);
    // No waitFor() here, so nothing in this block throws InterruptedException
    // and the corresponding catch clause (a compile error) is dropped.
} catch (IOException e) {
    e.printStackTrace();
} finally {
    if(process != null) {
        process.destroy();
    }
}

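With the two side by side, the culprit looks like Process.waitFor(). A similar case I found online, https://www.cnblogs.com/embedded-linux/p/6986525.html, made everything click: this is exactly the problem I had hit! Here is what the java.lang.Process documentation says (only the key paragraph is quoted):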



By default, the created process does not have its own terminal
or console.  All its standard I/O (i.e. stdin, stdout, stderr)
operations will be redirected to the parent process, where they can
be accessed via the streams obtained using the methods
{@link #getOutputStream()},
{@link #getInputStream()}, and
 {@link #getErrorStream()}.
The parent process uses these streams to feed input to and get output
from the process.  Because some native platforms only provide
limited buffer size for standard input and output streams, failure
to promptly write the input stream or read the output stream of
the process may cause the process to block, or even deadlock.

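In short: a native child process created from Java has no terminal or console of its own, and its standard input, output and error streams are all redirected to the parent process, which accesses them through Process.getOutputStream(), Process.getInputStream() and Process.getErrorStream(). Because the platform's buffers for these streams are limited in size, a child whose output the parent never reads may block, or even deadlock if the parent is in turn waiting for the child. That makes the cause clear: app A never consumed the child's output stream and then called Process.waitFor(). The native process prints log output continuously, so it blocked as soon as the output buffer filled up, while the parent, unaware of this, kept waiting forever. The lesson: reading the documentation before calling an API always pays off, or at least saves detours when you later have to track a problem down.

Let's first walk through how Java launches a native process, and then look at how to fix the problem. It starts with Runtime.exec(cmd):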
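The blocking described above can be reproduced with a small sketch. Everything named here is an assumption for illustration: the class PipeBlockDemo, the helper exitsWithin, and the NOISY command (head copying 1 MiB from /dev/zero), which stands in for the chatty native process and needs a Unix-like system. It also relies on the timed Process.waitFor(long, TimeUnit) from Java 8. On Linux a pipe holds 64 KiB by default, so the child is expected to stall long before it finishes writing unless the parent reads.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.TimeUnit;

class PipeBlockDemo {
    // Hypothetical stand-in for the native process: writes 1 MiB to stdout,
    // far more than a typical pipe buffer can hold.
    static final String[] NOISY = {"head", "-c", "1048576", "/dev/zero"};

    /** Returns true if the child exits within timeoutSeconds. */
    static boolean exitsWithin(String[] command, boolean drainStdout, long timeoutSeconds)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).start();
        try {
            if (drainStdout) {
                // Read until EOF so the child never finds the pipe full.
                try (InputStream in = process.getInputStream()) {
                    byte[] buffer = new byte[8192];
                    while (in.read(buffer) != -1) {
                        // Data is simply thrown away in this sketch.
                    }
                }
            }
            // Timed wait, so the demo itself cannot hang the way app A did.
            return process.waitFor(timeoutSeconds, TimeUnit.SECONDS);
        } finally {
            // Kill the child if it is still alive (no effect if it already exited).
            process.destroyForcibly();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("exits without draining: " + exitsWithin(NOISY, false, 2));
        System.out.println("exits with draining:    " + exitsWithin(NOISY, true, 2));
    }
}
```

The only difference between the two calls is whether the parent reads the child's stdout, which mirrors the difference between a parent that just waits and one that consumes the output.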


public Process exec(String prog) throws java.io.IOException {
    return exec(prog, null, null);
}


public Process exec(String prog, String[] envp, File directory) throws java.io.IOException {
    // Sanity checks
    if (prog == null) {
        throw new NullPointerException("prog == null");
    } else if (prog.isEmpty()) {
        throw new IllegalArgumentException("prog is empty");
    }

    // Break down into tokens, as described in Java docs
    StringTokenizer tokenizer = new StringTokenizer(prog);
    int length = tokenizer.countTokens();
    String[] progArray = new String[length];
    for (int i = 0; i < length; i++) {
        progArray[i] = tokenizer.nextToken();
    }

    // Delegate
    return exec(progArray, envp, directory);
}


public Process exec(String[] progArray, String[] envp, File directory) throws IOException {
    // ProcessManager is responsible for all argument checking.
    return ProcessManager.getInstance().exec(progArray, envp, directory, false);
}
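One detail of the code above that is easy to miss: exec(String) splits the command with a plain StringTokenizer, so quotes do not group words. The sketch below repeats that tokenization outside of Runtime; the class name ExecTokens and the sample command are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class ExecTokens {
    /** Splits a command line the same way the exec(String) overload above does. */
    static List<String> split(String prog) {
        List<String> tokens = new ArrayList<>();
        // Default delimiters are whitespace only; quotes get no special treatment.
        StringTokenizer tokenizer = new StringTokenizer(prog);
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints: [sh, -c, "echo, hello, world"]
        System.out.println(split("sh -c \"echo hello world\""));
    }
}
```

When a command needs an argument that contains spaces, the String[] overload avoids this splitting altogether.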

This in turn calls ProcessManager.getInstance().exec():


/**
 * Executes a process and returns an object representing it.
 */
public Process exec(String[] taintedCommand, String[] taintedEnvironment, File workingDirectory,
        boolean redirectErrorStream) throws IOException {
    // Make sure we throw the same exceptions as the RI.
    if (taintedCommand == null) {
        throw new NullPointerException("taintedCommand == null");
    }
    if (taintedCommand.length == 0) {
        throw new IndexOutOfBoundsException("taintedCommand.length == 0");
    }

    // Handle security and safety by copying mutable inputs and checking them.
    String[] command = taintedCommand.clone();
    String[] environment = taintedEnvironment != null ? taintedEnvironment.clone() : null;

    // Check we're not passing null Strings to the native exec.
    for (int i = 0; i < command.length; i++) {
        if (command[i] == null) {
            throw new NullPointerException("taintedCommand[" + i + "] == null");
        }
    }
    // The environment is allowed to be null or empty, but no element may be null.
    if (environment != null) {
        for (int i = 0; i < environment.length; i++) {
            if (environment[i] == null) {
                throw new NullPointerException("taintedEnvironment[" + i + "] == null");
            }
        }
    }

    FileDescriptor in = new FileDescriptor();
    FileDescriptor out = new FileDescriptor();
    FileDescriptor err = new FileDescriptor();

    String workingPath = (workingDirectory == null)
            ? null
            : workingDirectory.getPath();

    // Ensure onExit() doesn't access the process map before we add our
    // entry.
    synchronized (processReferences) {
        int pid;
        try {
            // Call the JNI method, which creates a child process and returns its PID
            pid = exec(command, environment, workingPath, in, out, err, redirectErrorStream);
        } catch (IOException e) {
            IOException wrapper = new IOException("Error running exec()."
                    + " Command: " + Arrays.toString(command)
                    + " Working Directory: " + workingDirectory
                    + " Environment: " + Arrays.toString(environment));
            wrapper.initCause(e);
            throw wrapper;
        }
        ProcessImpl process = new ProcessImpl(pid, in, out, err);
        ProcessReference processReference = new ProcessReference(process, referenceQueue);
        processReferences.put(pid, processReference);

        /*
         * This will wake up the child monitor thread in case there
         * weren't previously any children to wait on.
         */
        processReferences.notifyAll();

        return process;
    }
}

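Next, look at the corresponding JNI code in java_lang_ProcessManager.cpp. ExecuteProcess redirects the child's stdin, stdout and stderr to one end of a pipe each, while the other ends become the parent's output, input and error streams for that child. So the child was blocked in pipe_wait precisely because the pipe behind its stdout was full and could accept no more data. (You may still wonder why the native process keeps writing to the pipe at all. Running the command by hand makes it obvious: it relentlessly prints the name of every file it writes data to on standard output.)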



static pid_t ProcessManager_exec(JNIEnv* env, jclass, jobjectArray javaCommands,
	                         jobjectArray javaEnvironment, jstring javaWorkingDirectory,
	                         jobject inDescriptor, jobject outDescriptor, jobject errDescriptor,
	                         jboolean redirectErrorStream) {

  ExecStrings commands(env, javaCommands);
  ExecStrings environment(env, javaEnvironment);

  // Extract working directory string.
  const char* workingDirectory = NULL;
  if (javaWorkingDirectory != NULL) {
    workingDirectory = env->GetStringUTFChars(javaWorkingDirectory, NULL);
  }

  pid_t result = ExecuteProcess(env, commands.get(), environment.get(), workingDirectory,
	                        inDescriptor, outDescriptor, errDescriptor, redirectErrorStream);

  // Clean up working directory string.
  if (javaWorkingDirectory != NULL) {
    env->ReleaseStringUTFChars(javaWorkingDirectory, workingDirectory);
  }

  return result;
}


/** Executes a command in a child process. */
static pid_t ExecuteProcess(JNIEnv* env, char** commands, char** environment,
	                    const char* workingDirectory, jobject inDescriptor,
	                    jobject outDescriptor, jobject errDescriptor,
	                    jboolean redirectErrorStream) {

  // Create 4 pipes: stdin, stdout, stderr, and an exec() status pipe.
  int pipes[PIPE_COUNT * 2] = { -1, -1, -1, -1, -1, -1, -1, -1 };
  for (int i = 0; i < PIPE_COUNT; i++) {
    if (pipe(pipes + i * 2) == -1) {
      jniThrowIOException(env, errno);
      ClosePipes(pipes, -1);
      return -1;
    }
  }
  int stdinIn = pipes[0];
  int stdinOut = pipes[1];
  int stdoutIn = pipes[2];
  int stdoutOut = pipes[3];
  int stderrIn = pipes[4];
  int stderrOut = pipes[5];
  int statusIn = pipes[6];
  int statusOut = pipes[7];

  pid_t childPid = fork();

  // If fork() failed...
  if (childPid == -1) {
    jniThrowIOException(env, errno);
    ClosePipes(pipes, -1);
    return -1;
  }

  // If this is the child process...
  if (childPid == 0) {
    // Note: We cannot malloc(3) or free(3) after this point!
    // A thread in the parent that no longer exists in the child may have held the heap lock
    // when we forked, so an attempt to malloc(3) or free(3) would result in deadlock.

    // Replace stdin, out, and err with pipes.
    dup2(stdinIn, 0);
    dup2(stdoutOut, 1);
    if (redirectErrorStream) {
      dup2(stdoutOut, 2);
    } else {
      dup2(stderrOut, 2);
    }

    // Close all but statusOut. This saves some work in the next step.
    ClosePipes(pipes, statusOut);

    // Make statusOut automatically close if execvp() succeeds.
    fcntl(statusOut, F_SETFD, FD_CLOEXEC);

    // Close remaining unwanted open fds.
    CloseNonStandardFds(statusOut);

    // Switch to working directory.
    if (workingDirectory != NULL) {
      if (chdir(workingDirectory) == -1) {
        AbortChild(statusOut);
      }
    }

    // Set up environment.
    if (environment != NULL) {
      extern char** environ; // Standard, but not in any header file.
      environ = environment;
    }

    // Execute process. By convention, the first argument in the arg array
    // should be the command itself.
    execvp(commands[0], commands);
    AbortChild(statusOut);
  }

  // This is the parent process.

  // Close child's pipe ends.
  close(stdinIn);
  close(stdoutOut);
  close(stderrOut);
  close(statusOut);

  // Check status pipe for an error code. If execvp(2) succeeds, the other
  // end of the pipe should automatically close, in which case, we'll read
  // nothing.
  int child_errno;
  ssize_t count = TEMP_FAILURE_RETRY(read(statusIn, &child_errno, sizeof(int)));
  close(statusIn);
  if (count > 0) {
    // chdir(2) or execvp(2) in the child failed.
    // TODO: track which so we can be more specific in the detail message.
    jniThrowIOException(env, child_errno);

    close(stdoutIn);
    close(stdinOut);
    close(stderrIn);

    // Reap our zombie child right away.
    int status;
    int rc = TEMP_FAILURE_RETRY(waitpid(childPid, &status, 0));
    if (rc == -1) {
      ALOGW("waitpid on failed exec failed: %s", strerror(errno));
    }

    return -1;
  }

  // Fill in file descriptor wrappers.
  jniSetFileDescriptorOfFD(env, inDescriptor, stdoutIn);
  jniSetFileDescriptorOfFD(env, outDescriptor, stdinOut);
  jniSetFileDescriptorOfFD(env, errDescriptor, stderrIn);

  return childPid;
}

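If you are interested, you can go on to read fs/pipe.c in the kernel source to see how pipe_write and pipe_read are implemented and how pipe_wait comes into play.

With the code path laid out, the possible fixes become clear. Roughly:

Remove Process.waitFor() from the Java code. The child process can still block this way, although the parent no longer deadlocks with it.
Read the child process's output stream, Process.getInputStream(), in a new thread, so that the child is never blocked.
Discard all of the child's streams outright, if you have no interest in them.
Ask whoever wrote the native process to remove all the unnecessary printing (not a good option: without the logs, problems become harder to track down).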
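The second and third options could look roughly like the sketch below. It is one possible shape, not the code either app actually used: the class ChildOutput and both method names are invented here, it goes through ProcessBuilder rather than Runtime.exec so that stderr can be merged into stdout, and the /dev/null target assumes a Unix-like system (Java 9 and later also offer ProcessBuilder.Redirect.DISCARD for the same purpose).

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

class ChildOutput {
    /** Option 2: drain the child's output on a helper thread while the caller waits. */
    static int runAndDrain(StringBuilder sink, String... command)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .redirectErrorStream(true) // stderr shares stdout's pipe: one stream to drain
                .start();
        Thread drainer = new Thread(() -> {
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    sink.append(line).append('\n');
                }
            } catch (IOException e) {
                // The stream was closed underneath us; nothing left to read.
            }
        });
        drainer.start();
        int exitCode = process.waitFor(); // safe now: the pipe is being emptied
        drainer.join();                   // make sure all output has reached the sink
        return exitCode;
    }

    /** Option 3: send the output to /dev/null so no pipe exists to fill up. */
    static int runDiscarding(String... command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .redirectErrorStream(true)
                .redirectOutput(ProcessBuilder.Redirect.to(new File("/dev/null")))
                .start();
        return process.waitFor();
    }
}
```

Merging stderr into stdout keeps the sketch to a single reader thread; if the two streams must stay separate, each needs its own drainer, because a full stderr pipe blocks the child just as a full stdout pipe does.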

Original author: Jason Wang

Last updated: 2022-03-16, 12:30:58
