Wednesday, October 21, 2009

MINLP Test Problems

Ignacio Grossmann and Jon Lee recently announced the CMU-IBM Cyberinfrastructure Collaborative site for MINLP.  The goal of this web site is to create a library of optimization problems in different application areas in which one or several alternative models are presented with their derivation. In addition, each model has one or several instances that can serve to test various algorithms. This effort is different from other test problem collections by requiring a description of the problem, and encouraging the contribution of alternate modeling formulations.  Thus, the actual models in this collection may be MILP or NLP formulations that simplify a nonlinear problem, including simplifications of other MINLP formulations.

As it happens, Cindy Phillips, Regan Murray and I are working on a paper that describes our work on sensor placement for water security, where we describe various MILP formulations for this nonlinear application.  I guess we should try to add our models to this repository!

Tuesday, October 20, 2009

Using easy_install to download source files

Python's setuptools package includes the easy_install script, which provides a convenient mechanism for installing a Python package from the PyPI repository.  Normally, easy_install installs a Python package in the Python site-packages directory.  However, I recently discovered that easy_install can also download the source for a Python package.  For example, the following command downloads the Coopr optimization package into the coopr directory:

easy_install -q --editable --build-directory . Coopr

I had to browse a variety of web pages before I figured this syntax out.  Enjoy!

Wednesday, October 14, 2009

Book Recommendation: Software IP

I have been working with software-related intellectual property issues for several years now.  I finally broke down and bought a book to help me get the big picture.  The following book has been remarkably helpful:
Intellectual Property and Open Source: A practical guide to protecting code
Van Lindberg
O'Reilly, 2008
I have been quite surprised how well Lindberg describes the complex legal issues related to intellectual property law.  Lindberg is a lawyer and software developer, and he uses computer science analogies that are quite straightforward.



Monday, October 5, 2009

Applying the CBC Presolver

Here's a fun fact that I wanted to archive...  In a recent COIN-OR email exchange on coin-discuss, John Forrest suggested the following command-line for applying the CBC preprocessor to an MPS file:

     cbc xxxxx.mps -preprocess save -heuristic off -maxnode -1 -solve

This command will save the cbc-presolved model in the file presolved.mps. The -heuristic off and -maxnode -1 options make cbc stop as quickly as possible.
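
For scripted use, the same invocation can be wrapped in a few lines of Python. This is only a sketch: the function names are hypothetical, it assumes a cbc executable is on the PATH, and the option comments simply restate what is described above.

```python
import subprocess


def cbc_presolve_command(mps_file, cbc="cbc"):
    """Build the argument list for the presolve-only CBC run quoted above."""
    return [
        cbc, mps_file,
        "-preprocess", "save",  # save the presolved model (presolved.mps)
        "-heuristic", "off",    # with -maxnode -1, makes cbc stop quickly
        "-maxnode", "-1",
        "-solve",
    ]


def run_cbc_presolve(mps_file, cbc="cbc"):
    """Run CBC and return its exit code (assumes cbc is installed)."""
    return subprocess.call(cbc_presolve_command(mps_file, cbc))
```

Calling run_cbc_presolve("xxxxx.mps") should then leave presolved.mps in the working directory, as described above.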

Friday, July 17, 2009

Using Open-Source Tools to Manage Software Quality

At PyCon 2009, Aaron Maxwell gave a presentation about the use of BuildBot to support an automated software QA infrastructure. Listening to his talk (online) made me think more carefully about the reasons I am not using BuildBot, which I took a look at several years ago.  After working with a custom automated build tool for a few years, I have recently begun using Hudson to automate software quality processes for a variety of open source software packages.  Hudson automates the following QA activities for these packages:
  • portability tests - building packages with different compilers, language versions and compute platforms
  • continuous integration - rapid builds and software tests to provide developers continuous feedback
  • integration tests - builds that test the integration of different software tools
  • archiving QA statistics - test histories, code coverage statistics, build times, etc.
  • managing third-party builds - building third-party libraries that my codes depend on
Although I am reasonably happy with Hudson, I must admit that I did not immediately decide that it was perfect for my needs the first time I looked at it.  However, as the scope of my QA needs has grown, it has become critical to have a flexible, extensible strategy for automating software QA activities. The following high-level issues have proven to be major considerations when assessing the viability of tools like BuildBot and Hudson:
  • GUI/web interface
    GUI and web interfaces are key to ensuring that developers regularly use the QA data that is being generated. Interactive interrogation of current QA data facilitates effective use of this data, and GUI interfaces are very important when developers do not all have access to the same computing platforms. These interfaces can also convey valuable QA information in a concise manner, such as graphical representations of QA history (e.g. test failures/successes over time).

  • Extensibility
    Any automation framework is going to need to be customized to adapt to local operating constraints. Thus, the extensibility of the automation framework is a key issue for end-users. A particularly effective strategy for supporting this flexibility is with plugin components, which are supported in Hudson.

  • Loosely Coupled QA Tools
    Hudson uses a standards-based approach for integrating QA information. QA activities can be initiated in a very generic manner, using shell commands whose scope is not restricted. If the QA information is provided in a standard format, then Hudson can integrate it into its QA summaries. For example, Hudson recognizes testing results that are documented with the xUnit XML format, and code coverage results that are documented with the Cobertura XML format. This strategy supports a loose coupling between the QA processes and the use of Hudson, which allows for the application of a heterogeneous set of QA tools, including tools for different test harnesses and programming languages!

  • Compute Resource Management
    Coordinating a large number of QA activities requires scalable strategies for managing computing resources. Frameworks like Hudson provide basic resource management strategies, including dynamic scheduling of continuous integration builds on a build farm. More generally, scalable automated testing tools need to support strategies like fractional factorial test designs, which test many build options (configuration, platform, compiler, etc.) with a small number of builds. Management of daemon clients also becomes an issue for large build farms (e.g. notification of exceptional events like running out of disk space).

  • Ease of Use
    It is worth restating that ease-of-use is a major factor in practice. Developers will not use QA frameworks unless they add value to the development process. Further, it can be difficult to convince an organization to support the maintenance of automated QA frameworks on a large build farm.
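
To make the loose coupling described above concrete, here is a minimal sketch of a script that reports test results in an xUnit-style XML layout. The function name and the sample data are hypothetical, and the element names follow the commonly used JUnit-style convention (testsuite, testcase, failure); the exact schema that a given Hudson version accepts is an assumption here, not something taken from its documentation.

```python
import xml.etree.ElementTree as ET


def build_xunit_report(results, suite_name="example"):
    """Return an xUnit-style XML string.

    results is a list of (test_name, failure_message_or_None) pairs.
    """
    failures = sum(1 for _, msg in results if msg is not None)
    suite = ET.Element("testsuite", name=suite_name,
                       tests=str(len(results)), failures=str(failures))
    for name, msg in results:
        case = ET.SubElement(suite, "testcase",
                             classname=suite_name, name=name)
        if msg is not None:
            # A failed test carries a <failure> child element.
            failure = ET.SubElement(case, "failure", message=msg)
            failure.text = msg
    return ET.tostring(suite, encoding="unicode")


report = build_xunit_report([
    ("test_parse", None),
    ("test_solve", "expected 1, got 2"),
])
```

Any test harness, in any language, that can emit a file like this can feed the same QA summaries, which is the point of the standards-based approach.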
As a final note, the Acro Developer Resources page summarizes the QA tools that the Acro project is using with Hudson to support software development. It is noteworthy that this effort includes QA processes for both C/C++ software and Python software. On another project, we have also used Hudson to summarize tests of Matlab code.

P.S. I want to thank John Siirola for brainstorming about this blog. John has done most of the work setting up the Hudson server that we are using for Acro and related open source software development.

Summarizing gcov Coverage Statistics with gcovr

The gcovr command provides a utility for running the gcov command and summarizing code coverage results. This command is inspired by the Python coverage.py package, which provides a similar utility in Python. Further, gcovr can be viewed as a command-line alternative to the lcov utility, which runs gcov and generates HTML output.

The gcovr command currently generates two different types of output:
  • Text Summary
    For each file that generates gcov statistics, gcovr will summarize the number of lines covered, the percentage of coverage and enumerate the lines that are not covered.
  • Cobertura XML
    An XML summary of the coverage statistics can be generated in a format that is consistent with Cobertura.
I find the text summary quite convenient for interactive assessment of coverage, especially as I design tests to improve coverage. The Cobertura summary can be used by continuous build tools like Hudson. For example, see the acro-utilib coverage report that was generated with gcovr, using the Cobertura XML output option.
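
The kind of per-file summary described above can be sketched in a few lines of Python. This is not gcovr's implementation; it is an illustrative parser (the function name and sample data are hypothetical) for the annotated text that gcov writes, where each line has the form "count:line_number:source", a count of "-" marks a non-executable line, and "#####" marks an executable line that was never run.

```python
def summarize_gcov(lines):
    """Return (covered, total, percent, uncovered_line_numbers)."""
    covered = 0
    uncovered = []
    for line in lines:
        parts = line.split(":", 2)
        if len(parts) < 3:
            continue  # e.g. function or branch annotation lines
        count, lineno = parts[0].strip(), parts[1].strip()
        if not lineno.isdigit() or lineno == "0" or count == "-":
            continue  # metadata (line 0) or non-executable source
        if count in ("#####", "====="):
            uncovered.append(int(lineno))
        elif count.rstrip("*").isdigit():
            covered += 1
    total = covered + len(uncovered)
    percent = 100.0 * covered / total if total else 0.0
    return covered, total, percent, uncovered


sample = [
    "        -:    0:Source:demo.c",
    "        -:    1:#include <stdio.h>",
    "        2:    2:int f(int x) {",
    "        2:    3:  if (x > 0) return 1;",
    "    #####:    4:  return 0;",
    "        -:    5:}",
]
covered, total, percent, missing = summarize_gcov(sample)
```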

See the gcovr Trac page for further details about this tool. The gcovr command is currently bundled with the FAST Python package, which you can download from the FAST Trac site.  However, gcovr is a stand-alone Python script.  Thus, it is also convenient to download the latest development version here.

Videos for PyCon2009!

I just discovered that the talks at PyCon 2009 were videotaped! Excellent! I had fun browsing them last night, looking for clues to the challenges that I face managing several complex Python packages.

I am not sure how the PyCon organizers justified the cost for doing this, but I think that this was an excellent idea. I would love to see other conferences adopt this idea (or at least support online publishing of electronic slides). I suspect that this would discourage some people from attending a conference. However, people like me are already traveling too much. Thus, the PyCon organizers did not lose anything with me; I was already unable to attend. Further, having access to the presentations makes me more likely to adopt the techniques/approaches that they are presenting! This sounds like a win-win to me!!

Thursday, July 2, 2009

PyUtilib Plugins

After blogging about Python plugin frameworks earlier this year, I wound up implementing a new framework in the PyUtilib software package.  The PyUtilib wiki provides a detailed description of the PyUtilib Plugin Framework, but here's a brief summary:
  • This framework is derived from the Trac plugin framework (a.k.a. component architecture)
  • Provides support for both singleton and non-singleton plugin instances
  • Includes utilities for managing plugins within namespaces
  • The core framework is defined in a single Python module, which can be used independently from PyUtilib
  • PyUtilib also includes commonly used plugins, such as
    • A config-file reader/writer based on ConfigParser
    • Loading utilities for eggs and modules
    • A file manager for temporary files
Although I initially resisted the urge to develop my own framework, I was led to develop this because (1) I wanted a light-weight framework like that provided by Trac, but (2) Trac's framework is not particularly modular within the Trac software repository.  Also, I really needed a plugin framework that supported non-singleton plugins. Development of the PyUtilib Plugin Framework is mostly motivated by my work with Coopr, which extensively leverages plugins to support flexible, extensible optimization tools.
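
To illustrate the singleton versus non-singleton distinction, here is a minimal sketch of a plugin registry in the spirit of a Trac-style component architecture. It is not the PyUtilib API: every name below (Interface, Plugin, implements, extensions, and the example classes) is hypothetical and chosen only for illustration.

```python
class Interface(object):
    """Marker base class for extension interfaces."""


class Plugin(object):
    """Base class for plugins; set singleton = True to share one instance."""
    singleton = False


_registry = {}    # interface class -> list of plugin classes
_singletons = {}  # plugin class -> its single shared instance


def implements(interface):
    """Class decorator that registers a plugin class for an interface."""
    def decorator(cls):
        _registry.setdefault(interface, []).append(cls)
        return cls
    return decorator


def extensions(interface):
    """Return one plugin instance per class registered for interface."""
    result = []
    for cls in _registry.get(interface, []):
        if cls.singleton:
            if cls not in _singletons:
                _singletons[cls] = cls()
            result.append(_singletons[cls])
        else:
            result.append(cls())  # fresh instance on every request
    return result


class ISolver(Interface):
    pass


@implements(ISolver)
class SharedSolver(Plugin):
    singleton = True


@implements(ISolver)
class PerUseSolver(Plugin):
    pass


first = extensions(ISolver)
second = extensions(ISolver)
```

Here first[0] and second[0] are the same SharedSolver object, while first[1] and second[1] are distinct PerUseSolver objects. Non-singleton plugins matter when each plugin instance needs to carry its own state.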

Monday, January 26, 2009

Another Discussion of Python Plugins

Here's a nice discussion and comparison of Python plugin frameworks that I ran across today: Design Docs Plugins - PiTiViWiKi. This notes that a big difference between Zope and Trac plugins is that Zope defines interfaces, which allow for checking interface implementation/definition, as well as facilities for plugin adapters. In this respect, the Envisage Core plugins are similar to Zope.

Python Plugin Frameworks

Various Python projects I am working on could benefit from the use of a plug-in framework.  However, there does not appear to be a standard Python plug-in framework, though there are some mature packages that support plug-ins.

Here's a summary of my recent web research:
  • yapsy - This is a simple plug-in framework that was designed specifically to support plug-ins with no external dependencies.

  • Marty Alchin describes a simple plugin framework, with a similar goal.  His classes provide an API for the plugins, with few supporting features (e.g. searching for plugins).

  • André Roberge has a series of posts that describe the application of plugins to refactor a simple calculator application.  The aim is to illustrate the requirements for plugin frameworks and to identify best practices for plugins. There are some interesting replies to these posts, which consider implementations of the plugins he proposes in zope and grok. Both of these pull in quite a few libraries, which raises the question of whether it makes sense for external users to rely on these components for only plugin support.

  • Enthought's Envisage project includes a framework for building extensible, pluggable applications.  The enthought.envisage package defines these capabilities, and there is nice documentation here.

  • Trac and Zope are frameworks that incorporate plug-ins, and these capabilities may be modular enough for use in other applications. Trac's component architecture is detailed in the Trac wiki pages. Zope's plug-ins appear to be called products (see also here). Martin Aspeli describes his experience writing Trac plug-ins and contrasts them with Zope.
These resources highlight one reason why there is not a standard Python plug-in framework: there are a variety of different capabilities that a user may want, and the complexity of the framework generally increases as these new capabilities are added (e.g. security, API validation, etc).

Another item that seems clear is that while a variety of packages support plug-ins, many of them do not use them in a modular fashion.  Thus, it is difficult to reuse sophisticated plug-in frameworks without incorporating a lot of extraneous code.