rules/python: Rework built-in coveragepy support

TLATER · bradb423 · TLATER · commit 72a0f9ed13e4 · 2022-02-02T16:27:54.000Z
[Coveragepy recently gained support for lcov output][nedbat/coveragepy#1289], which allows implementing full support for python coverage without relying on a downstream fork of that project Coveragepy actually must be invoked twice; One generating a `.coverage` database file, the other exporting the data. This means that it is incompatible with the old implementation. The fork of coveragepy previously intended for use with bazel circumvented this by just changing how coveragepy works, and never outputting that database - just outputting the lcov directly instead. If we'd like to use upstream coveragepy, this is of course not possible. The stub_template seems to be written with the idea of supporting other coverage tooling in mind, however it still hard-codes arguments specific to coveragepy. Instead, we think it makes sense to properly support one of them for now, and to rethink a more generic interface later - it will probably take specific scripting for each implementation of coverage in python anyway. As such, this patch rewrites the python stub template to fully support upstream coveragepy as a coverage tool, and reworks some of the logic around invoking python to do so more cleanly. Additional notes: - Python coverage will only work with Python 3.7+ with upstream coveragepy, since the first release with lcov support does not support earlier Python versions - this is unfortunate, but there is not much we can do downstream short of forking to resolve that. The stub template itself should still work with Python 2.4+. - Comments in the code claim to use `os.execv` for performance reasons. There may be a small overhead to `subprocess.call`, but it shouldn't be too impactful, especially considering the overhead in logic (written in Python) this involves - if this is indeed for performance reasons, this is probably a somewhat premature optimization. A colleauge helped dig through some history, finding 3bed4af as the source of this - if that commit is to believed, this is actually to resolve issues with signal handling, however that seems odd as well, since this calls arbitrary python applications, which in turn may use subprocesses again as well, and therefore break what that commit seems to attempt to fix. It's completely opaque to me why we put so much effort into trying to ensure we use `os.execv`. I've replicated the behavior and comments assuming it was correct previously, but the patch probably shouldn't land as-is - the comment explaining the use of `os.execv` is most likely misleading. --- [nedbat/coveragepy#1289]: nedbat/coveragepy#1289 Co-authored-by: Bradley Burns <bradley.burns@codethink.co.uk>
diff --git a/src/main/java/com/google/devtools/build/lib/bazel/rules/python/python_stub_template.txt b/src/main/java/com/google/devtools/build/lib/bazel/rules/python/python_stub_template.txt
@@ -281,6 +281,71 @@ def Deduplicate(items):
           seen.add(it)
           yield it
 
+def ExecuteFile(python_program, main_filename, args, env, module_space,
+                coverage_tool=None, workspace=None):
+  """Executes the given python file using the various environment settings.
+
+  This will not return, and acts much like os.execv, except is much
+  more restricted, and handles bazel-related edge cases.
+
+  Args:
+    python_program: Path to the python binary to use for execution
+    main_filename: The python file to execute
+    args: Additional args to pass to the python file
+    env: A dict of environment variables to set for the execution
+    module_space: The module space/runfiles tree
+    coverage_tool: The coverage tool to execute with
+    workspace: The workspace to execute in. This is expected to be a
+               directory under the runfiles tree, and will recursively
+               delete the runfiles directory if set.
+  """
+  # We want to use os.execv instead of subprocess.call, which causes
+  # problems with signal passing (making it difficult to kill
+  # bazel). However, these conditions force us to run via
+  # subprocess.call instead:
+  #
+  # - On Windows, os.execv doesn't handle arguments with spaces
+  #   correctly, and it actually starts a subprocess just like
+  #   subprocess.call.
+  # - When running in a workspace (i.e., if we're running from a zip),
+  #   we need to clean up the workspace after the process finishes so
+  #   control must return here.
+  # - If we may need to emit a host config warning after execution, we
+  #   can't execv because we need control to return here. This only
+  #   happens for targets built in the host config.
+  # - For coverage targets, at least coveragepy requires running in
+  #   two invocations, which also requires control to return here.
+  #
+  if not (IsWindows() or workspace or %enable_host_version_warning% or coverage_tool):
+    os.environ.update(env)
+    os.execv(python_program, [python_program, main_filename] + args)
+
+  if coverage_tool is not None:
+    # Coveragepy wants to frst create a .coverage database file, from
+    # which we can then export lcov.
+    subprocess.call(
+      [python_program, coverage_tool, "run", "--append", "--branch", main_filename] + args,
+      env=env,
+      cwd=workspace
+    )
+    output_filename = os.environ.get('COVERAGE_DIR') + '/pylcov.dat'
+    ret_code = subprocess.call(
+      [python_program, coverage_tool, "lcov", "-o", output_filename] + args,
+      env=env,
+      cwd=workspace
+    )
+  else:
+    ret_code = subprocess.call(
+      [python_program, main_filename] + args,
+      env=env,
+      cwd=workspace
+    )
+
+  if workspace:
+    shutil.rmtree(os.path.dirname(module_space), True)
+  MaybeEmitHostVersionWarning(ret_code)
+  sys.exit(ret_code)
+
 def Main():
   args = sys.argv[1:]
 
@@ -332,54 +397,51 @@ def Main():
   if python_program is None:
     raise AssertionError('Could not find python binary: ' + PYTHON_BINARY)
 
-  cov_tool = os.environ.get('PYTHON_COVERAGE')
-  if cov_tool:
-    # Inhibit infinite recursion:
-    del os.environ['PYTHON_COVERAGE']
+  # COVERAGE_DIR is set iff the instrumentation is configured for the
+  # file and coverage is enabled.
+  if os.environ.get('COVERAGE_DIR'):
+    if 'PYTHON_COVERAGE' in os.environ:
+      cov_tool = os.environ.get('PYTHON_COVERAGE')
+    else:
+      raise EnvironmentError(
+        'No python coverage tool set, '
+        'set PYTHON_COVERAGE '
+        'to configure the coverage tool'
+      )
+
     if not os.path.exists(cov_tool):
       raise EnvironmentError('Python coverage tool %s not found.' % cov_tool)
-    args = [python_program, cov_tool, 'run', '-a', '--branch', main_filename] + args
+
     # coverage library expects sys.path[0] to contain the library, and replaces
     # it with the directory of the program it starts. Our actual sys.path[0] is
     # the runfiles directory, which must not be replaced.
     # CoverageScript.do_execute() undoes this sys.path[0] setting.
     #
     # Update sys.path such that python finds the coverage package. The coverage
     # entry point is coverage.coverage_main, so we need to do twice the dirname.
-    new_env['PYTHONPATH'] = \
-        new_env['PYTHONPATH'] + ':' + os.path.dirname(os.path.dirname(cov_tool))
-    new_env['PYTHON_LCOV_FILE'] = os.environ.get('COVERAGE_DIR') + '/pylcov.dat'
+    new_env['PYTHONPATH'] = (
+      new_env['PYTHONPATH'] + ':' + os.path.dirname(os.path.dirname(cov_tool))
+    )
   else:
-    args = [python_program, main_filename] + args
+    cov_tool = None
 
-  os.environ.update(new_env)
+  new_env.update((key, val) for key, val in os.environ.items() if key not in new_env)
+
+  workspace = None
+  if IsRunningFromZip():
+    # If RUN_UNDER_RUNFILES equals 1, it means we need to
+    # change directory to the right runfiles directory.
+    # (So that the data files are accessible)
+    if os.environ.get('RUN_UNDER_RUNFILES') == '1':
+      workspace = os.path.join(module_space, '%workspace_name%')
 
   try:
     sys.stdout.flush()
-    if IsRunningFromZip():
-      # If RUN_UNDER_RUNFILES equals 1, it means we need to
-      # change directory to the right runfiles directory.
-      # (So that the data files are accessible)
-      if os.environ.get('RUN_UNDER_RUNFILES') == '1':
-        os.chdir(os.path.join(module_space, '%workspace_name%'))
-      ret_code = subprocess.call(args)
-      shutil.rmtree(os.path.dirname(module_space), True)
-      MaybeEmitHostVersionWarning(ret_code)
-      sys.exit(ret_code)
-    else:
-      # On Windows, os.execv doesn't handle arguments with spaces correctly,
-      # and it actually starts a subprocess just like subprocess.call.
-      #
-      # If we may need to emit a host config warning after execution, don't
-      # execv because we need control to return here. This only happens for
-      # targets built in the host config, so other targets still get to take
-      # advantage of the performance benefits of execv.
-      if IsWindows() or %enable_host_version_warning%:
-        ret_code = subprocess.call(args)
-        MaybeEmitHostVersionWarning(ret_code)
-        sys.exit(ret_code)
-      else:
-        os.execv(args[0], args)
+    ExecuteFile(
+      python_program, main_filename, args, new_env, module_space,
+      cov_tool, workspace
+    )
+
   except EnvironmentError:
     # This works from Python 2.4 all the way to 3.x.
     e = sys.exc_info()[1]