
SBOMs for Scientific Software: Python

This is the first of a series of posts exploring the feasibility of generating Software Bills of Materials (SBOMs) for complex scientific software.

SBOMs provide a list of the components, libraries, and modules that are required to build a piece of software. The United States 2021 Executive Order on Cybersecurity highlights the role of SBOMs in supporting risk assessments for newly discovered vulnerabilities. Further, the U.S. National Institute of Standards and Technology (NIST) released its Secure Software Development Framework, which requires SBOM information to be available for software.

Both open source and commercial software are impacted by these policies.  Consequently, developers of scientific software should expect that the use of their software may be restricted in some contexts unless accurate SBOMs can be generated. 

The past few years have seen an industry-wide effort to embrace SBOMs and other software security practices highlighted by the U.S. government (more here). Three standard formats have emerged: SPDX, CycloneDX and SWID. Below, we show that mature tools exist to generate SPDX and CycloneDX SBOMs. We first consider a simple python repository that includes different types of software dependencies. Next, we consider the generation of SBOMs for pyomo, a complex python package that supports the analysis of structured optimization problems.

Key points:
  1. Existing tools can easily generate SBOMs for simple python packages.

  2. Developers should be clear about the distinction between required and optional dependencies, since optional dependencies may not be captured in SBOMs.

  3. It is unclear how to capture dependencies on cython and compiled software extensions in complex python packages like pyomo.

Automatically Generating SBOMs

A variety of software packages support the automatic generation of SBOMs for python software packages. My goal here is not to recommend one particular tool (see a recommendation for syft here). However, the following tools are frequently referenced in blogs:
  • cyclonedx-python
    • Generates SBOMs in CycloneDX format.
    • Can generate SBOMs from pip, poetry and the python environment.
  • sbom4python
    • Generates SBOMs in CycloneDX and SPDX formats.
    • Can generate SBOMs from an installed python module or the python environment.
  • syft
    • Generates SBOMs in CycloneDX and SPDX formats.
    • Scans container images, filesystems, archives, and more to discover packages and libraries.
  • cdxgen
    • Generates SBOMs in CycloneDX format.
    • Can generate SBOMs for a variety of programming languages and execution platforms.
Note that OWASP has developed a component verification standard for SBOMs. I have not found a detailed comparison of the python SBOM generators with respect to a standard like this. However, this standard provides context for critiquing SBOMs.

Generating SBOMs for Python Packages

A Simple Example

Consider the examplepythonsbom package.  This package has the following directory structure:

The poetry configuration file, pyproject.toml, specifies the following package dependencies:

  • numpy - required
  • scipy - optional
  • pytest - required for development
I installed examplepythonsbom with poetry and pip, and I used cyclonedx-python and sbom4python to generate SBOMs.  I tested the following configurations:
  1. I installed examplepythonsbom with poetry, and I used cyclonedx-py:

    cyclonedx-py poetry -o sboms/cyclonedx_poetry_sbom.json

    The SBOM file documents the numpy and pytest components, and it lists both as dependencies of examplepythonsbom.

  2. I installed examplepythonsbom with pip, and I used cyclonedx-py to generate an SBOM of the environment:

    cyclonedx-py environment -o cyclonedx_env_sbom.json

    The SBOM file documents all installed python packages, including examplepythonsbom. The dependencies of examplepythonsbom were documented as numpy and scipy.

  3. I installed examplepythonsbom with pip, and I used sbom4python to generate an SBOM:

    sbom4python --module examplepythonsbom --sbom cyclonedx --format json -o sbom4python_module_sbom.json

    The SBOM file documents the examplepythonsbom and numpy components, and it shows numpy as the only dependency of examplepythonsbom.
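
To compare results like these side by side, it helps to pull the direct dependencies of one package out of each CycloneDX JSON file. The sketch below assumes each tool fills in the standard CycloneDX `components`, `metadata.component`, and `dependencies` (`ref`/`dependsOn`) fields; `direct_dependencies` and `load_sbom` are hypothetical helper names, and the in-memory `EXAMPLE` document is hand-written for illustration rather than copied from any tool's output.

```python
import json
from typing import Any


def load_sbom(path: str) -> dict[str, Any]:
    """Read a CycloneDX SBOM stored as JSON."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def direct_dependencies(sbom: dict[str, Any], package: str) -> set[str]:
    """Return the names of the components that `package` directly depends on."""
    # The root component lives in metadata; all others live in "components"
    components = list(sbom.get("components", []))
    root = sbom.get("metadata", {}).get("component")
    if root:
        components.append(root)
    name_by_ref = {c["bom-ref"]: c["name"] for c in components if "bom-ref" in c}
    refs = {ref for ref, name in name_by_ref.items() if name == package}
    deps: set[str] = set()
    for entry in sbom.get("dependencies", []):
        if entry.get("ref") in refs:
            # Fall back to the raw bom-ref if it has no matching component
            deps.update(name_by_ref.get(d, d) for d in entry.get("dependsOn", []))
    return deps


# Hand-written, minimal CycloneDX-style document (illustrative only)
EXAMPLE: dict[str, Any] = {
    "bomFormat": "CycloneDX",
    "metadata": {"component": {"type": "library", "bom-ref": "examplepythonsbom-ref",
                               "name": "examplepythonsbom"}},
    "components": [{"type": "library", "bom-ref": "numpy-ref", "name": "numpy"}],
    "dependencies": [
        {"ref": "examplepythonsbom-ref", "dependsOn": ["numpy-ref"]},
        {"ref": "numpy-ref", "dependsOn": []},
    ],
}

print(sorted(direct_dependencies(EXAMPLE, "examplepythonsbom")))
```

Applied to the three files generated above (for example, `direct_dependencies(load_sbom("cyclonedx_env_sbom.json"), "examplepythonsbom")`), this gives one comparable set per tool.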
These examples illustrate that existing tools can easily be applied to generate SBOMs. However, the differences among these results do not inspire confidence. A deeper dive suggests that installing this poetry-configured package with pip records its dependencies differently within the installation than installing it directly with poetry itself. These differences are reflected in the SBOMs.
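
One place to inspect how dependencies are recorded within an installation is the `Requires-Dist` metadata of the installed distribution, which the standard-library `importlib.metadata.requires()` returns. The sketch below assumes (this is not confirmed above) that this metadata is where the pip and poetry installations differ, and it separates unconditional requirements from those gated by an `extra == ...` marker. The helper names and the sample requirement strings are hypothetical.

```python
import re
from importlib.metadata import PackageNotFoundError, requires

NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")
EXTRA = re.compile(r"\bextra\s*==")


def split_requirements(reqs: list[str]) -> dict[str, list[str]]:
    """Split Requires-Dist strings into unconditional and extra-gated names."""
    result: dict[str, list[str]] = {"required": [], "optional": []}
    for req in reqs:
        match = NAME.match(req)
        if match is None:
            continue
        _, _, marker = req.partition(";")  # environment marker, if any
        key = "optional" if EXTRA.search(marker) else "required"
        result[key].append(match.group(1))
    return result


def installed_requirements(dist: str) -> dict[str, list[str]]:
    """Classify the dependency metadata recorded for an installed distribution."""
    try:
        reqs = requires(dist) or []
    except PackageNotFoundError:
        reqs = []
    return split_requirements(reqs)


# Hypothetical Requires-Dist entries for examplepythonsbom
print(split_requirements(['numpy (>=1.26,<2.0)', 'scipy (>=1.11,<2.0) ; extra == "scipy"']))
```

Running `installed_requirements("examplepythonsbom")` in each environment would show whether the installed metadata itself differs before any SBOM tool is involved.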

Additionally, SBOMs generated by different tools will clearly differ. Thus, developers need to critique SBOMs to confirm that they match their expectations. For example, I was surprised to see that scipy was omitted as a dependency in two of the three SBOMs. However, the toml configuration file specifies scipy as an optional dependency, so an SBOM may legitimately omit it.
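
Part of this critique can be automated by comparing what an SBOM reports against what the developer declared. The sketch below is a hypothetical helper, `critique_sbom`, that treats a missing required dependency as an error and a missing optional dependency as a note. The declared and reported sets are taken from the examplepythonsbom configurations described above.

```python
from collections.abc import Set as AbstractSet


def critique_sbom(
    reported: AbstractSet[str],
    required: AbstractSet[str],
    optional: AbstractSet[str] = frozenset(),
    dev: AbstractSet[str] = frozenset(),
) -> list[str]:
    """Compare the dependencies an SBOM reports with the declared ones."""
    findings = [f"MISSING required dependency: {n}" for n in sorted(required - reported)]
    findings += [f"note: optional dependency not listed: {n}" for n in sorted(optional - reported)]
    findings += [f"note: development dependency listed: {n}" for n in sorted(dev & reported)]
    # Anything left over was never declared at all
    unexpected = reported - required - optional - dev
    findings += [f"UNEXPECTED dependency: {n}" for n in sorted(unexpected)]
    return findings


DECLARED = {"required": {"numpy"}, "optional": {"scipy"}, "dev": {"pytest"}}
# Direct dependencies of examplepythonsbom reported in the three configurations above
REPORTED = {
    "cyclonedx-py poetry": {"numpy", "pytest"},
    "cyclonedx-py environment": {"numpy", "scipy"},
    "sbom4python --module": {"numpy"},
}
for tool, deps in REPORTED.items():
    print(tool, critique_sbom(deps, **DECLARED))
```

None of the three SBOMs produces a "MISSING" finding here; the differences show up only as notes about optional and development dependencies, which is consistent with the discussion above.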

A Complex Real-World Example

Next, consider the pyomo optimization modeling software. Pyomo is a Python-based open-source software package that supports a diverse set of optimization capabilities for formulating and analyzing optimization models. Pyomo is used to define symbolic problems, create concrete problem instances, and solve these instances with standard solvers.

Pyomo includes optional dependencies on a large number of python packages and external optimization solvers. However, pyomo only requires the installation of the ply package. Pyomo uses a traditional setup.py configuration file that supports installation with pip.

After installing pyomo, I generated SBOMs using cyclonedx-py and sbom4python as shown above. These SBOMs correctly documented the ply dependency, but cyclonedx-py also documented a dependency on pyyaml.

However, these SBOMs have two apparent limitations that may naturally arise in other scientific software packages:
  1. Pyomo supports installations using cython, but it is not clear how information about the compiled cython extensions could be reflected in an SBOM. Capturing them would require, for example, information about the compiler and compilation options.

  2. Pyomo can execute external optimization solvers that are separate executables. In some cases, these solvers can be installed via python packages, but in many cases they cannot, and it is not obvious how to augment the generation of an SBOM to denote this dependency.
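
One possible manual workaround, sketched here purely as an assumption and not something the tools above do, is to post-process a generated CycloneDX JSON SBOM and add the components they miss: compiled extension files recorded for an installed distribution (found via `importlib.metadata.files()` and `importlib.machinery.EXTENSION_SUFFIXES`) as `file` components, and external executables as `application` components. The helper names, the `bom-ref` naming scheme, and the `example:executable-path` property name are all hypothetical. Note that this only records that the compiled files exist; it still captures nothing about the compiler or compilation options.

```python
import shutil
from importlib.machinery import EXTENSION_SUFFIXES
from importlib.metadata import PackageNotFoundError, files
from typing import Any


def _link_to_root(sbom: dict[str, Any], ref: str) -> None:
    """Add `ref` to the dependsOn list of the SBOM's root component, if any."""
    root_ref = sbom.get("metadata", {}).get("component", {}).get("bom-ref")
    if root_ref is None:
        return
    deps = sbom.setdefault("dependencies", [])
    entry = next((d for d in deps if d.get("ref") == root_ref), None)
    if entry is None:
        entry = {"ref": root_ref, "dependsOn": []}
        deps.append(entry)
    entry.setdefault("dependsOn", []).append(ref)


def add_compiled_extensions(sbom: dict[str, Any], dist: str) -> list[str]:
    """Add each compiled extension file of `dist` as a CycloneDX 'file' component."""
    try:
        recorded = files(dist) or []
    except PackageNotFoundError:
        return []
    suffixes = tuple(EXTENSION_SUFFIXES)  # platform dependent, e.g. '.so' or '.pyd'
    found = sorted(str(p) for p in recorded if str(p).endswith(suffixes))
    for path in found:
        ref = f"file:{path}"  # hypothetical bom-ref naming scheme
        sbom.setdefault("components", []).append({"type": "file", "bom-ref": ref, "name": path})
        _link_to_root(sbom, ref)
    return found


def add_external_tool(sbom: dict[str, Any], name: str) -> str:
    """Add an external executable (e.g. a solver) as an 'application' component."""
    ref = f"external:{name}"  # hypothetical bom-ref naming scheme
    component: dict[str, Any] = {"type": "application", "bom-ref": ref, "name": name}
    path = shutil.which(name)
    if path is not None:
        # Made-up property name; CycloneDX allows free-form name/value properties
        component["properties"] = [{"name": "example:executable-path", "value": path}]
    sbom.setdefault("components", []).append(component)
    _link_to_root(sbom, ref)
    return ref
```

For example, after loading a pyomo SBOM one might call `add_compiled_extensions(sbom, "pyomo")` and `add_external_tool(sbom, "glpsol")` (the GLPK command-line solver) before writing the JSON back out; what is found depends entirely on how pyomo and the solvers were installed.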
