Friday, July 17, 2009

Using Open-Source Tools to Manage Software Quality

At PyCon 2009, Aaron Maxwell gave a presentation about the use of BuildBot to support an automated software QA infrastructure. Listening to his talk (online) made me think more carefully about the reasons I am not using BuildBot, which I took a look at several years ago.  After working with a custom automated build tool for a few years, I have recently begun using Hudson to automate software quality processes for a variety of open source software packages.  Hudson automates the following QA activities for these packages:
  • portability tests - building packages with different compilers, language versions and compute platforms
  • continuous integration - rapid builds and software tests to provide developers continuous feedback
  • integration tests - builds that test the integration of different software tools
  • archiving QA statistics - test histories, code coverage statistics, build times, etc.
  • managing third-party builds - building third-party libraries that my codes depend on
Although I am reasonably happy with Hudson, I must admit that I did not immediately decide that it was perfect for my needs the first time I looked at it.  However, as the scope of my QA needs has grown, it has become critical to have a flexible, extensible strategy for automating software QA activities. The following high-level issues have proven to be major considerations when assessing the viability of tools like BuildBot and Hudson:
  • GUI/web interface
    GUI and web interfaces are key to ensuring that developers regularly use the QA data that is being generated. Interactive interrogation of QA current data facilitates effective use of this data, and GUI interfaces are very important when developers do not all have access to the same computing platforms. These interfaces can also convey valuable QA in a concise manner, such as graphical representations of QA history (e.g test failure/successes of time).

  • Extensibility
    Any automation framework is going to need to be customized to adapt to local operating constraints. Thus, the extensibility of the automation framework is a key issue for end-users. A particularly effective strategy for supporting this flexibility is with plugin components, which are supported in Hudson.

  • Loosely Coupled QA Tools
    Hudson uses a standards-based approach for integrating QA information. QA activities can be initiated in a very generic manner, using shell commands whose scope is not restricted. If the QA information is provided in a standard format, then Hudson can integrate it into its QA summaries. For example, Hudson recognizes testing results that are documented with the xUnit XML format, and code coverage results that are documented with the Cobertura XML format. This strategy supports a loose coupling between the QA processes and the use of Hudson, which allows for the application of a heterogeneous set of QA tools, including tools for different test harnesses and programming languages!

  • Compute Resource Management
    Coordinating of a large number of QA activities requires scalable strategies for managing computing resources. Frameworks like Hudson provide basic resource management strategies, including dynamic scheduling of continuous integration builds on a build farm. More generally, scalable automated testing tools need to support strategies like fractional factorial test designs, which test many build options (configuration, platform, compiler, etc) with a small number of builds. Also, management of daemon clients also becomes an issue for large build farms (e.g. notification of exceptional events like running out of disk space).

  • Ease of Use
    It is worth restating that ease-of-use is a major factor in practice. Developers will not use QA frameworks unless they add value to the development process. Further, it can be difficult to convince an organization to support the maintenance of automated QA frameworks on a large build farm.
As a final note, the Acro Developer Resources page summarizes the QA tools that the Acro project is using with Hudson to support software development. It is noteworthy that this effort includes QA processes for both C/C++ software and Python software. On another project, we have also used Hudson to summarize tests of Matlab code.

P.S. I want to thank John Siirola for brainstorming about this blog. John has done most of the work setting up the Hudson server that we are using for Acro and related open source software development.

Summarizing gcov Coverage Statistics with gcovr

The gcovr command provides a utility for running the gcov command and summarizing code coverage results. This command is inspired by the Python coverage.py package, which provides a similar utility in Python. Further, gcovr can be viewed as a command-line alternative of the lcov utility, which runs gcov and generates an HTML output.

The gcovr command currently generates two different types of output:
  • Text Summary
    For each file that generates gcov statistics, gcovr will summarize the number of lines covered, the percentage of coverage and enumerate the lines that are not covered.
  • Cobertura XML
    An XML summary of the coverage statistics can be generated in a format that is consistent with Cobertura.
I find the text summary quite convenient for interactive assessment of coverage, especially as I design tests to improve coverage. The Cobertura summary can be used by continues build tools like Hudson. For example, see the acro-utilib coverage report that was generated with gcovr, using the Cobertura XML output option.

See the gcovr Trac page for further details about this tool. The gcovr command is currently bundled with the FAST Python package, which you can download from the FAST Trac site.  However, gcovr is a stand-alone Python script.  Thus, it is also convenient to download the latest development version here.

Videos for PyCon2009!

I just discovered that the talks at PyCon 2009 were video taped! Excellent! I had fun browsing them last night, looking for clues to the challenges that I face managing several complex Python packages.

I am not sure how the PyCon organizers justified the cost for doing this, but I think that this was an excellent idea. I would love to see other conferences adopt this idea (or at least support online publishing of electronic slides). I suspect that this would discourage some people from attending a conference. However, people like me are already traveling too much. Thus, the PyCon organizers did not lose anything with me; I was already unable to attend. Further, having access to the presentations makes me more likely to adopt the techniques/approaches that they are presenting! This sounds like a win-win to me!!

Thursday, July 2, 2009

PyUtilib Plugins

After blogging about Python plugin frameworks earlier this year, I wound up implemented a new framework in the PyUtilib software package.  The PyUtilib wiki provides a detailed description of the PyUtilib Plugin Framework, but here's a brief summary:
  • This framework is derived from the Trac plugin framework (a.k.a. component architecture)
  • Provides support for both singleton and non-singleton plugin instances
  • Includes utilities for managing plugins within namespaces
  • The core framework is defined in a single Python module, which can be used independently from PyUtilib
  • PyUtilib also includes commonly use plugins, such as
    • A config-file reader/writer based on ConfigParser
    • Loading utilities for eggs and modules
    • A file manager for temporary files
Although I initially resisted the urge to develop my own framework, I was led to develop this because (1) I wanted a light-weight framework like that provided by Trac, but (2) Trac's framework is not particularlly modular within the Trac software repository.  Also, I really needed a plugin framework that supported non-singleton plugins. Development of the PyUtilib Plugin Framework is mostly motivated by my work with Coopr, which extensively leverages plugins to support a flexible, extensible optimization tools.