Friday, January 20, 2012

Testing Open Source Software

Software testing is widely recognized as a best practice for software development. Software tests define expected functionality, and they can focus developer efforts by providing an objective assessment of the state of a software project. Additionally, software testing data can provide evidence that a software package can be reliably used. For example, when evaluating whether to try out open source software, I routinely look for software testing data to confirm which platforms the software will run on, the versions of associated software that are used, and test coverage statistics that indicate how much of the code is tested.

Unfortunately, most open source software projects do not publish software test data. I suspect that this indicates that only a small fraction of OSS projects have robust test suites. However, this also reflects another aspect of the OSS community: hosting facilities for open source software do not support web-based testing facilities, like Jenkins, that developers can use to remotely launch jobs on test machines with a variety of configurations. This is not totally unexpected, since testing can be computationally intensive.

Recently, I learned about CloudBees, which provides cloud services for building, running and managing Java applications. Happily, CloudBees makes its Dev@cloud service freely available to open source projects! This includes the Jenkins testing service, which provides a limited number of CPU hours each month for testing an OSS project. For example, the CxxTest project now hosts tests on a CloudBees Jenkins server. Cool!

Sunday, January 8, 2012

A Different Model for Writing Blog Posts


This is a blog that I have been meaning to write for some time.  I occasionally take a look at the download statistics for this blog, and recently I was prompted to do this by other bloggers who were reporting their end-of-year statistics (e.g. see Laura McLay’s review of the Punk Rock OR blog).

Unlike Laura, I do not have impressive download statistics to report about the many blogs I have written in 2011; frankly, I did not create many posts. However, an interesting pattern has emerged regarding this blog’s readership:  there are a few key blog posts that are frequently downloaded.  For example, my most frequently downloaded blog post is a survey of Python plugin software, which I wrote in 2009.  I suspect that other bloggers have seen the same thing; they have a few posts that are very popular because people do web searches on that topic.  However, it is worth stepping back and thinking about the implications of this when writing a blog.

When I first started blogging, I imagined that readers would view my blog the way that I view Laura’s blog. They would use an RSS reader to collect and view blog updates. These would be read shortly after they were published, and afterwards they might be used as a reference. This led me to create blogs that referenced each other as part of a larger conversation on a topic. For example, after blogging about Python Plugin Frameworks, I had several follow-up blogs, including a brief description of PyUtilib Plugins that I had developed.

However, I have realized that my blogs are more likely to be found through internet searches focused on a topic. Consequently, the Python Plugin Frameworks post gets read frequently while the PyUtilib Plugins post rarely gets read. Readers are finding my blog posts after searching for “python plugins”. The narrower topic covered by the PyUtilib Plugins post is not frequently referenced on the internet, and consequently it is not strongly associated with the more general topic of Python plugins; for example, I did not see it in the first three pages of a Google search for “python plugin”.

This suggests a different model for writing blog posts that has already begun to affect my blogging. Since blog posts are individual artifacts that may have enduring value to readers, updating a blog post with new content makes more sense than creating a new post that continues the previous discussion. For example, I’ve updated the Python Plugin Frameworks post to include references to PyUtilib’s plugins. This may confuse readers of RSS feeds, and I do not know whether RSS readers will automatically refresh to capture updates like this. I would assume not. However, this is clearly a strategy that will enhance the long-term impact of a blog post on a specific topic.

Saturday, January 7, 2012

The Pyomo Book is Coming Soon

The Python Optimization Modeling Objects (Pyomo) package is an open source tool for modeling optimization applications in Python. Pyomo can be used to define symbolic problems, create concrete problem instances, and solve these instances with standard solvers. Pyomo provides a capability that is commonly associated with algebraic modeling languages such as AMPL, AIMMS, and GAMS, but Pyomo's modeling objects are embedded within a full-featured high-level programming language with a rich set of supporting libraries. Pyomo leverages the capabilities of the Coopr software library, which integrates Python packages for defining optimizers, modeling optimization applications, and managing computational experiments.

Of course, there is very little online documentation describing Pyomo.  However, the first book on Pyomo is set to be published in February!
Pyomo - Optimization Modeling in Python. William E. Hart, Carl Laird, Jean-Paul Watson and David L. Woodruff. Springer, 2012.
Here are some links if you want to learn more:
Enjoy!

A Pythonic C++ Parser

If you google for "python C++ parser", you will find a variety of internet discussions related to parsing C++ in Python. C++ cannot be parsed by an LALR parser, and it is well known that parsing C++ is a nontrivial task. Thus, these discussions generally fall into one of several categories:
  1. It is too hard to parse C++ in Python, so use a package like GCC_XML that does this for you.  If you really need to do something in Python, write a wrapper to GCC_XML.
  2. It is too hard to perform a complete parse of C++ in Python, but we can use an LALR parser to collect gross structural information from C++ files.  The CppHeaderParser is an example of this type of package, which uses the ply parser to collect information about classes in header files.
In the recent release of CxxTest, I included an LALR C++ parser that is similar to CppHeaderParser. CxxTest is a unit testing framework for C++ that is similar in spirit to JUnit, CppUnit, and xUnit. CxxTest is easy to use because it does not require precompiling a CxxTest testing library, it employs no advanced features of C++ (e.g. RTTI), and it supports a very flexible form of test discovery.

CxxTest performs test discovery by searching C++ header files for CxxTest test classes. The default process for test discovery is a simple process that analyzes each line in a header file sequentially, looking for a sequence of lines that represent class definitions and test method definitions.
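To make the idea of a line-by-line scan concrete, here is a minimal sketch in Python. It is not the actual CxxTest implementation: the regular expressions, the `discover_tests` function, and the sample header are all hypothetical, and they assume that a test suite is a class that directly inherits from `CxxTest::TestSuite` and that test methods are `void` methods whose names start with `test`.

```python
import re

# Hypothetical patterns for illustration; a real scanner handles more cases.
SUITE_RE = re.compile(
    r'^\s*class\s+(\w+)\s*:\s*public\s+(?:CxxTest\s*::\s*)?TestSuite\b')
CLASS_RE = re.compile(r'^\s*class\s+\w+')
TEST_RE = re.compile(
    r'^\s*(?:virtual\s+)?void\s+([Tt]est\w*)\s*\(\s*(?:void)?\s*\)')


def discover_tests(header_text):
    """Return {suite_name: [test_method, ...]} using a sequential line scan."""
    suites = {}
    current = None  # name of the suite whose body we are currently inside
    for line in header_text.splitlines():
        match = SUITE_RE.match(line)
        if match:
            current = match.group(1)
            suites[current] = []
            continue
        if CLASS_RE.match(line):
            # Some other class starts here, so stop collecting test methods.
            current = None
            continue
        match = TEST_RE.match(line)
        if match and current is not None:
            suites[current].append(match.group(1))
    return suites


# A made-up header used only to exercise the sketch.
HEADER = """\
#include <cxxtest/TestSuite.h>

class MyTests : public CxxTest::TestSuite
{
public:
    void testAddition(void) { TS_ASSERT_EQUALS(1 + 1, 2); }
    void test_multiplication() { TS_ASSERT_EQUALS(2 * 3, 6); }
    void helper() {}
};

class Derived : public MyTests
{
public:
    void testExtra() {}
};
"""

found = discover_tests(HEADER)
print(found)
```

Note that this sketch finds `MyTests` but skips `Derived`, since a scan that only looks at one line at a time has no way to know that `MyTests` is itself a test suite.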

I added a new test discovery mechanism in CxxTest 4.0 that is based on a parser for the Flexible Object Generator (FOG) language, which is a superset of C++. The grammar for the FOG language was adapted to parse C++ header files to identify class definitions and class inheritance relationships, class and namespace nesting of declarations, and class methods. This allows CxxTest to identify test classes that are defined with complex inheritance relationships.
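Here is a small Python sketch of why inheritance information helps. It assumes that a parser has already produced a mapping from each class name to its direct base classes; the `bases` dictionary and the `find_test_classes` function are hypothetical and do not reflect the data structures that the FOG parser actually builds.

```python
def find_test_classes(bases, root="CxxTest::TestSuite"):
    """Return the sorted names of classes that derive from root.

    bases maps a class name to the list of its direct base class names.
    """
    def derives(name, seen):
        if name == root:
            return True
        if name in seen:
            # Guard against malformed input that contains a cycle.
            return False
        seen = seen | {name}
        return any(derives(base, seen) for base in bases.get(name, []))

    return sorted(name for name in bases if derives(name, frozenset()))


# Made-up class hierarchy, as a parser might report it.
bases = {
    "BaseFixture": ["CxxTest::TestSuite"],
    "Loggable": [],
    "DbTests": ["BaseFixture", "Loggable"],
    "NetTests": ["DbTests"],
    "Helper": ["Loggable"],
}

test_classes = find_test_classes(bases)
print(test_classes)
```

In this example `DbTests` and `NetTests` are recognized as test classes even though neither one names `CxxTest::TestSuite` directly, while `Helper` and `Loggable` are excluded.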

As I noted earlier, the CxxTest FOG parser is similar to the parser in CppHeaderParser.  Based on my limited knowledge of CppHeaderParser, here are some points of contrast between these two capabilities:

  1. The FOG parser is embedded in CxxTest, while the CppHeaderParser is a stand-alone package.  Although I implemented the FOG parser as a separate component in CxxTest, I did not have specific design requirements that led me to make this a separate package.  (Interested parties should give me a buzz...)
  2. The FOG parser is specifically focused on the features required by CxxTest, and thus it does not parse out much of the information that CppHeaderParser provides (return values, argument types, etc).
  3. The FOG parser was specifically designed to capture class inheritance relationships.  It is not clear to me that the CppHeaderParser does this.
  4. The FOG parser is based on a superset of C++.  Thus, it can robustly parse C++ method and function definitions.  The examples provided by CppHeaderParser suggest that it can parse function and method declarations, but not headers that include their definitions.  (Of course, the FOG parser ignores these definitions, but that's the point.  The parser can do that.)
  5. The FOG parser has been tested on a large set of C and C++ test files that are used to test the ELSA compiler.  This is a much more extensive test suite than is used to develop CppHeaderParser.
The point of this comparison is that the FOG parser may be of interest for other C++ parsing applications.  It has not been developed for general use, but it could easily be adapted to provide a more general capability.