Wednesday, October 21, 2009

MINLP Test Problems

Ignacio Grossmann and Jon Lee recently announced the CMU-IBM Cyberinfrastructure Collaborative site for MINLP.  The goal of this web site is to create a library of optimization problems in different application areas in which one or several alternative models are presented with their derivation. In addition, each model has one or several instances that can serve to test various algorithms. This effort is different from other test problem collections by requiring a description of the problem, and encouraging the contribution of alternate modeling formulations.  Thus, the actual models in this collection may be MILP or NLP formulations that simplify a nonlinear problem, including simplifications of other MINLP formulations.

As it happens, Cindy Phillips, Regan Murray and I are working on a paper that describes our work on sensor placement for water security, where we describe various MILP formulations for this nonlinear application.  I guess we should try to add our models to this repository!

Tuesday, October 20, 2009

Using easy_install to download source files

Python's setuptools package includes the easy_install script, which provides a convenient mechanism for installing a Python package from the PyPI repository.  Normally, easy_install installs a Python package in the Python site-packages directory.  However, I recently discovered that easy_install can also download the source for a Python package.  For example, the following command downloads the Coopr optimization package into the coopr directory:

easy_install -q --editable --build-directory . Coopr

I had to browse a variety of web pages before I figured this syntax out.  Enjoy!

Wednesday, October 14, 2009

Book Recommendation: Software IP

I have been working with software-related intellectual property issues for several years now.  I finally broke down and bought a book to help me get the big picture.  The following book has been remarkably helpful:
Intellectual Property and Open Source: A practical guide to protecting code
Van Lindberg
O'Reilly, 2008
I have been quite surprised at how well Lindberg describes the complex legal issues related to intellectual property law.  Lindberg is a lawyer and software developer, and he uses computer science analogies that are quite straightforward.



Monday, October 5, 2009

Applying the CBC Presolver

Here's a fun fact that I wanted to archive...  In a recent COIN-OR email exchange on coin-discuss, John Forrest suggested the following command-line for applying the CBC preprocessor to an MPS file:

     cbc xxxxx.mps -preprocess save -heuristic off -maxnode -1 -solve

This command will save the cbc-presolved model in the file presolved.mps. The -heuristic off and -maxnode -1 options make cbc stop as quickly as possible.
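
If this presolve step needs to be applied to many MPS files from a script, the command line above can be assembled programmatically. The following is only a sketch: the helper name cbc_presolve_command is hypothetical, and the options are exactly those quoted above, with no claim about other cbc options.

```python
import shlex


def cbc_presolve_command(mps_file):
    """Build the argument list for the presolve-only cbc run quoted above.

    The function name is hypothetical; the options are copied from the post.
    """
    return [
        "cbc", mps_file,
        "-preprocess", "save",  # save the presolved model (presolved.mps)
        "-heuristic", "off",    # with -maxnode -1, makes cbc stop quickly
        "-maxnode", "-1",
        "-solve",
    ]


if __name__ == "__main__":
    # Print the command so it can be inspected or pasted into a shell.
    print(" ".join(shlex.quote(arg) for arg in cbc_presolve_command("xxxxx.mps")))
```

The resulting list can be passed to subprocess.call() on a machine where cbc is installed.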

Friday, July 17, 2009

Using Open-Source Tools to Manage Software Quality

At PyCon 2009, Aaron Maxwell gave a presentation about the use of BuildBot to support an automated software QA infrastructure. Listening to his talk (online) made me think more carefully about the reasons I am not using BuildBot, which I took a look at several years ago.  After working with a custom automated build tool for a few years, I have recently begun using Hudson to automate software quality processes for a variety of open source software packages.  Hudson automates the following QA activities for these packages:
  • portability tests - building packages with different compilers, language versions and compute platforms
  • continuous integration - rapid builds and software tests to provide developers continuous feedback
  • integration tests - builds that test the integration of different software tools
  • archiving QA statistics - test histories, code coverage statistics, build times, etc.
  • managing third-party builds - building third-party libraries that my codes depend on
Although I am reasonably happy with Hudson, I must admit that I did not immediately decide that it was perfect for my needs the first time I looked at it.  However, as the scope of my QA needs has grown, it has become critical to have a flexible, extensible strategy for automating software QA activities. The following high-level issues have proven to be major considerations when assessing the viability of tools like BuildBot and Hudson:
  • GUI/web interface
    GUI and web interfaces are key to ensuring that developers regularly use the QA data that is being generated. Interactive interrogation of current QA data facilitates effective use of this data, and GUI interfaces are very important when developers do not all have access to the same computing platforms. These interfaces can also convey valuable QA information in a concise manner, such as graphical representations of QA history (e.g. test failures/successes over time).

  • Extensibility
    Any automation framework is going to need to be customized to adapt to local operating constraints. Thus, the extensibility of the automation framework is a key issue for end-users. A particularly effective strategy for supporting this flexibility is with plugin components, which are supported in Hudson.

  • Loosely Coupled QA Tools
    Hudson uses a standards-based approach for integrating QA information. QA activities can be initiated in a very generic manner, using shell commands whose scope is not restricted. If the QA information is provided in a standard format, then Hudson can integrate it into its QA summaries. For example, Hudson recognizes testing results that are documented with the xUnit XML format, and code coverage results that are documented with the Cobertura XML format. This strategy supports a loose coupling between the QA processes and the use of Hudson, which allows for the application of a heterogeneous set of QA tools, including tools for different test harnesses and programming languages!

  • Compute Resource Management
    Coordinating a large number of QA activities requires scalable strategies for managing computing resources. Frameworks like Hudson provide basic resource management strategies, including dynamic scheduling of continuous integration builds on a build farm. More generally, scalable automated testing tools need to support strategies like fractional factorial test designs, which test many build options (configuration, platform, compiler, etc.) with a small number of builds. Management of daemon clients also becomes an issue for large build farms (e.g. notification of exceptional events like running out of disk space).

  • Ease of Use
    It is worth restating that ease-of-use is a major factor in practice. Developers will not use QA frameworks unless they add value to the development process. Further, it can be difficult to convince an organization to support the maintenance of automated QA frameworks on a large build farm.
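
To make the loose coupling described above more concrete, here is a minimal sketch of a script that writes test results in an xUnit-style XML layout. The function name xunit_report is hypothetical, and the element and attribute names (testsuite, testcase, failure, tests, failures, classname, name, message) follow the commonly used JUnit-style layout; the exact schema that a given Hudson installation accepts is an assumption here.

```python
import xml.etree.ElementTree as ET


def xunit_report(suite_name, results):
    """Return an xUnit-style XML string for one test suite.

    `results` maps a test name to None (the test passed) or to a
    failure message (the test failed).
    """
    failures = sum(1 for msg in results.values() if msg is not None)
    suite = ET.Element("testsuite", name=suite_name,
                       tests=str(len(results)), failures=str(failures))
    for test_name, msg in results.items():
        case = ET.SubElement(suite, "testcase",
                             classname=suite_name, name=test_name)
        if msg is not None:
            # A failed test carries a nested <failure> element.
            ET.SubElement(case, "failure", message=msg)
    return ET.tostring(suite, encoding="unicode")


if __name__ == "__main__":
    print(xunit_report("example", {"test_ok": None, "test_bad": "boom"}))
```

Any test harness, in any language, that can emit a file like this can feed the same QA summary, which is the point of the standards-based approach.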
As a final note, the Acro Developer Resources page summarizes the QA tools that the Acro project is using with Hudson to support software development. It is noteworthy that this effort includes QA processes for both C/C++ software and Python software. On another project, we have also used Hudson to summarize tests of Matlab code.

P.S. I want to thank John Siirola for brainstorming about this blog. John has done most of the work setting up the Hudson server that we are using for Acro and related open source software development.

Summarizing gcov Coverage Statistics with gcovr

The gcovr command provides a utility for running the gcov command and summarizing code coverage results. This command is inspired by the Python coverage.py package, which provides a similar utility in Python. Further, gcovr can be viewed as a command-line alternative to the lcov utility, which runs gcov and generates an HTML output.

The gcovr command currently generates two different types of output:
  • Text Summary
    For each file that generates gcov statistics, gcovr will summarize the number of lines covered, the percentage of coverage and enumerate the lines that are not covered.
  • Cobertura XML
    An XML summary of the coverage statistics can be generated in a format that is consistent with Cobertura.
I find the text summary quite convenient for interactive assessment of coverage, especially as I design tests to improve coverage. The Cobertura summary can be used by continuous build tools like Hudson. For example, see the acro-utilib coverage report that was generated with gcovr, using the Cobertura XML output option.
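
The kind of per-file text summary described above (lines covered, percentage, and the uncovered lines) can be sketched in a few lines of Python. This is an illustration only, not gcovr's actual code: the function name summarize_coverage and the input format (a dictionary of line number to execution count) are assumptions.

```python
def summarize_coverage(line_counts):
    """Summarize coverage for one source file.

    `line_counts` maps each executable line number to its execution count.
    Returns (covered, total, percent, missing), where `missing` is a string
    of uncovered line ranges such as "3-4,9".
    """
    total = len(line_counts)
    uncovered = sorted(n for n, count in line_counts.items() if count == 0)
    covered = total - len(uncovered)
    percent = 100.0 * covered / total if total else 100.0

    # Collapse runs of consecutive uncovered line numbers into ranges.
    ranges = []
    for n in uncovered:
        if ranges and n == ranges[-1][1] + 1:
            ranges[-1][1] = n
        else:
            ranges.append([n, n])
    missing = ",".join(str(a) if a == b else "%d-%d" % (a, b)
                       for a, b in ranges)
    return covered, total, percent, missing


if __name__ == "__main__":
    counts = {1: 2, 2: 1, 3: 0, 4: 0, 5: 1, 9: 0}
    print("%d/%d lines, %.0f%%, missing: %s" % summarize_coverage(counts))
```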

See the gcovr Trac page for further details about this tool. The gcovr command is currently bundled with the FAST Python package, which you can download from the FAST Trac site.  However, gcovr is a stand-alone Python script.  Thus, it is also convenient to download the latest development version here.

Videos for PyCon2009!

I just discovered that the talks at PyCon 2009 were videotaped! Excellent! I had fun browsing them last night, looking for clues to the challenges that I face managing several complex Python packages.

I am not sure how the PyCon organizers justified the cost for doing this, but I think that this was an excellent idea. I would love to see other conferences adopt this idea (or at least support online publishing of electronic slides). I suspect that this would discourage some people from attending a conference. However, people like me are already traveling too much. Thus, the PyCon organizers did not lose anything with me; I was already unable to attend. Further, having access to the presentations makes me more likely to adopt the techniques/approaches that they are presenting! This sounds like a win-win to me!!

Thursday, July 2, 2009

PyUtilib Plugins

After blogging about Python plugin frameworks earlier this year, I wound up implementing a new framework in the PyUtilib software package.  The PyUtilib wiki provides a detailed description of the PyUtilib Plugin Framework, but here's a brief summary:
  • This framework is derived from the Trac plugin framework (a.k.a. component architecture)
  • Provides support for both singleton and non-singleton plugin instances
  • Includes utilities for managing plugins within namespaces
  • The core framework is defined in a single Python module, which can be used independently from PyUtilib
  • PyUtilib also includes commonly used plugins, such as
    • A config-file reader/writer based on ConfigParser
    • Loading utilities for eggs and modules
    • A file manager for temporary files
Although I initially resisted the urge to develop my own framework, I was led to develop this because (1) I wanted a light-weight framework like that provided by Trac, but (2) Trac's framework is not particularly modular within the Trac software repository.  Also, I really needed a plugin framework that supported non-singleton plugins. Development of the PyUtilib Plugin Framework is mostly motivated by my work with Coopr, which extensively leverages plugins to support flexible, extensible optimization tools.
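
The distinction between singleton and non-singleton plugins can be illustrated with a small registry. This is a sketch under stated assumptions and is not the PyUtilib API: the class PluginRegistry, its methods, and the interface names ILogger and ISolver are all hypothetical.

```python
class PluginRegistry:
    """Maps an interface name to the plugin classes registered for it."""

    def __init__(self):
        self._classes = {}     # interface name -> list of plugin classes
        self._singletons = {}  # plugin class -> its one shared instance

    def register(self, interface, singleton=True):
        """Class decorator that records a plugin under `interface`."""
        def decorator(cls):
            self._classes.setdefault(interface, []).append(cls)
            if singleton:
                # Singleton plugins are constructed once, at registration.
                self._singletons[cls] = cls()
            return cls
        return decorator

    def singletons(self, interface):
        """Return the shared instances registered for `interface`."""
        return [self._singletons[cls]
                for cls in self._classes.get(interface, [])
                if cls in self._singletons]

    def create(self, interface, name, *args, **kwargs):
        """Construct a fresh instance of the non-singleton plugin `name`."""
        for cls in self._classes.get(interface, []):
            if cls.__name__ == name and cls not in self._singletons:
                return cls(*args, **kwargs)
        raise KeyError("no non-singleton plugin %r for %r" % (name, interface))


registry = PluginRegistry()


@registry.register("ILogger")
class ConsoleLogger:
    def log(self, msg):
        return "console: " + msg


@registry.register("ISolver", singleton=False)
class RandomSearch:
    def __init__(self, seed=0):
        self.seed = seed
```

Here ConsoleLogger exists as one shared instance from the moment its module is loaded, while each call to registry.create("ISolver", "RandomSearch", seed=1) builds a new, independently configured object.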

Monday, January 26, 2009

Another Discussion of Python Plugins

Here's a nice discussion and comparison of Python plugin frameworks that I ran across today: Design Docs Plugins - PiTiViWiKi. This notes that a big difference between Zope and Trac plugins is that Zope defines interfaces, which allows for checking interface implementation/definition, and also provides facilities for plugin adapters. In this respect, the Envisage Core plugins are similar to Zope.
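
As a rough illustration of what checking an interface implementation means, the sketch below uses Python's standard abc module as a stand-in. It is not the Zope interface machinery, and the names ISolver, GoodPlugin, BadPlugin and implements are hypothetical.

```python
from abc import ABC, abstractmethod


class ISolver(ABC):
    """Hypothetical plugin interface with one required method."""

    @abstractmethod
    def solve(self, problem):
        """Return a solution for `problem`."""


class GoodPlugin(ISolver):
    def solve(self, problem):
        return "solved " + problem


class BadPlugin(ISolver):
    pass  # declares the interface but forgets to implement solve()


def implements(cls, interface):
    """True if `cls` derives from `interface` and implements every
    abstract method that the interface declares."""
    return (issubclass(cls, interface)
            and not getattr(cls, "__abstractmethods__", None))


if __name__ == "__main__":
    print(implements(GoodPlugin, ISolver))  # True
    print(implements(BadPlugin, ISolver))   # False
```

With this approach an incomplete plugin is also rejected at construction time, since calling BadPlugin() raises a TypeError.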

Python Plugin Frameworks

Updated to include pointers to the PyUtilib Component Architecture and PnP.

Various Python projects I am working on could benefit from the use of a Plug-in framework.  However, there does not appear to be a standard Python plug-in framework, though there are some mature packages that support plug-ins.

Here's a summary of my recent web research:
  • yapsy - This is a simple plug-in framework that was designed specifically to support plug-ins with no external dependencies.
  • Marty Alchin describes a simple plugin framework, with a similar goal.  His classes provide an API for the plugins, with few supporting features (e.g. searching for plugins).
  • André Roberge has a series of posts that describe the application of plugins to refactor a simple calculator application.  The goal of this is to illustrate the requirements for plugin frameworks, with the goal of identifying best practices for plugins. There are some interesting replies to this post, which consider implementations of the plugins he proposes in zope and grok. Both of these pull in quite a few libraries, which raises the question of whether it makes sense for external users to rely on these components for only plugin support.
  • Enthought's Envisage project includes a framework for building extensible, pluggable applications.  The enthought.envisage package defines these capabilities, and there is nice documentation here.
  • Trac and Zope are frameworks that incorporate plug-ins, and these capabilities may be modular enough for use in other applications. Trac's component architecture is detailed in the Trac wiki pages. Zope's plug-ins appear to be called products (see also here). Martin Aspeli describes his experience writing Trac plug-ins and contrasts them with Zope.
  • The PyUtilib Component Architecture (PCA) is a component architecture in Python that is derived from the Trac component framework. One important extension was to support both singleton and non-singleton plugins. Singleton plugins are created immediately when the plugin module is loaded, which is well-suited for persistent applications like Trac. However, many other applications need to employ plugins "on demand", which need to be explicitly constructed by the end-user. (See the PyUtilib wiki for further details.)
  • Plug n' Play (PnP) is a generic plug-in system inspired by Trac's internal component management. PnP is roughly an implementation of the Observer pattern (http://en.wikipedia.org/wiki/Observer_pattern).
     
These resources highlight one reason why there is not a standard Python plug-in framework: there are a variety of different capabilities that a user may want, and the complexity of the framework generally increases as these new capabilities are added (e.g. security, API validation, etc). Another item that seems clear is that while a variety of packages support plug-ins, many of them do not use them in a modular fashion.  Thus, it is difficult to reuse sophisticated plug-in frameworks without incorporating a lot of extraneous code.
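
For readers unfamiliar with the Observer pattern mentioned in the PnP entry, the following sketch shows the basic idea in plain Python. It does not use PnP itself; the Event class and the backup and reindex callbacks are hypothetical names chosen for illustration.

```python
class Event:
    """A named extension point that notifies every subscribed observer."""

    def __init__(self):
        self._observers = []

    def subscribe(self, callback):
        self._observers.append(callback)
        return callback  # returning it lets subscribe() act as a decorator

    def notify(self, *args, **kwargs):
        # Call the observers in subscription order and collect the results.
        return [callback(*args, **kwargs) for callback in self._observers]


on_save = Event()


@on_save.subscribe
def backup(path):
    return "backup " + path


@on_save.subscribe
def reindex(path):
    return "reindex " + path


if __name__ == "__main__":
    print(on_save.notify("notes.txt"))
```

The application only calls on_save.notify(); it does not need to know which plugins are listening, which is what keeps this style of framework small.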

Tuesday, January 20, 2009

A Python Trick: Adding a Lambda Method to a Class

It does not take much to add a lambda function to a Python class. For example, consider the following:

>>> class A: pass
>>> f = lambda self,x:x
>>> setattr(A,"f",f)

This code adds the method f to class A. For example:

>>> a=A()
>>> a.f(1)
1

However, the f method created this way does not have the expected Python name:

>>> A.f.__name__
'<lambda>'

Further, defining this value is not possible; the instancemethod f does not have a __name__ attribute.

The trick is to name the lambda function before defining the class method:

>>> class A: pass
>>> f = lambda self,x:x
>>> f.__name__ = "f"
>>> setattr(A,"f",f)
>>> f.__name__
'f'

This is simple, but it took too long to figure this out...
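
The session above reflects the Python 2 behavior of unbound methods. As a hedged sketch, assuming Python 3 (where a function stored on a class is returned as a plain function), the same trick can be written as a script:

```python
class A:
    pass


f = lambda self, x: x
original_name = f.__name__   # every lambda starts out named '<lambda>'

f.__name__ = "f"             # name the function before attaching it
A.f = f                      # same effect as setattr(A, "f", f)

a = A()
print(original_name)         # <lambda>
print(A.f.__name__)          # f
print(a.f(1))                # 1
```

Note that renaming the function this way does not change f.__qualname__, which remains '<lambda>'.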

Interrupting the UNIX time command

Consider the following use of the standard Unix time command:
/usr/bin/time ls -R /

If the SIGTERM signal is sent to the time process, then the ls process will continue! This is an unexpected behavior, which is not well-documented.

Normally, this is not much of an issue; the process that is monitored will simply terminate quietly. However, when the time utility is used in interactive applications, process interrupts can lead to many unexpected rogue processes.

The timer command is a modification of the UNIX timing utility that behaves as expected. When using timer, the SIGTERM signal is sent to the process, which terminates it as expected. The timer command is available in the UTILIB software library, but it can be compiled independently.
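
The behavior described for timer, forwarding SIGTERM to the monitored process, can be sketched in Python as follows. This is not the UTILIB implementation; the function name run_timed is hypothetical, and the sketch assumes a Unix platform and that it is called from the main thread (a Python requirement for installing signal handlers).

```python
import signal
import subprocess
import sys
import time


def run_timed(argv):
    """Run `argv`, forward SIGTERM to it, and return (returncode, seconds)."""
    start = time.time()
    child = subprocess.Popen(argv)

    def forward(signum, frame):
        # Pass the signal on so the child does not outlive this wrapper.
        child.send_signal(signum)

    # Note: there is a short window between Popen() and this call in which
    # a SIGTERM would still be handled by the previous handler.
    previous = signal.signal(signal.SIGTERM, forward)
    try:
        returncode = child.wait()
    finally:
        if previous is not None:
            signal.signal(signal.SIGTERM, previous)
    return returncode, time.time() - start


if __name__ == "__main__":
    code, seconds = run_timed([sys.executable, "-c", "pass"])
    print("exit status %d after %.2f seconds" % (code, seconds))
```

If the wrapper receives SIGTERM while the child is running, the child is terminated too, and on Unix its return code is reported as the negative signal number.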

Monday, January 19, 2009

Monitoring Maximum Memory Usage in Linux

There are many tools available that can be used to monitor memory usage in computer programs. However, there are few tools that can be applied to monitor the memory usage of a specific process in an automated manner. Most memory monitoring tools provide a GUI that a user can watch.

However, these tools are not useful in contexts where memory must be monitored repeatedly. For example, automated software tests may require checks to validate that the memory usage does not exceed expected limits.

The memmon command is a new memory monitoring tool that is included in the UTILIB software library. memmon provides a convenient mechanism to report the maximum amount of memory that a process uses.

The memmon command requires the absolute path to the command that will be executed. Beyond that, its default syntax is quite simple:
$ ./memmon /bin/sleep 1
53768 Kb used


The memmon command can also be used to terminate a process whose memory exceeds a specified threshold:

$ ./memmon -k 10 /bin/sleep 1
./memmon: Error: memory exceeded
53764 Kb used


At present, memmon only supports memory monitoring on Linux platforms, and it is not clear how its capability could be ported to other operating systems. I am particularly interested in this capability on MS Windows, if anyone has ideas for how to do that...
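
For comparison, a rough approximation of the maximum-memory report can be obtained from Python's standard resource module on Unix systems. This sketch is not how memmon works, and it measures peak resident set size, which may differ from the quantity that memmon reports. The function name max_child_memory_kb is hypothetical.

```python
import resource
import subprocess
import sys


def max_child_memory_kb(argv):
    """Run `argv` and return (returncode, peak resident set size).

    For RUSAGE_CHILDREN, ru_maxrss is the largest peak among all children
    that this process has waited for, so start a fresh Python process for
    each measurement. On Linux the value is reported in kilobytes.
    """
    returncode = subprocess.call(argv)
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    return returncode, usage.ru_maxrss


if __name__ == "__main__":
    code, kb = max_child_memory_kb([sys.executable, "-c", "pass"])
    print("%d Kb used" % kb)
```

Unlike memmon's -k option, this approach only reports the peak after the process has finished; it cannot terminate a process that exceeds a threshold.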

Software Releases

I have not blogged much this fall because I have been busy managing a variety of software releases.  I plan to include further details in upcoming blogs, but I thought I would summarize these releases here:
  • acro 2.0 - Acro is A Common Repository for Optimizers that integrates a rich variety of optimization libraries and solvers that have been developed for large-scale engineering and scientific applications. Acro was developed to facilitate the design, development, integration and support of optimization software libraries. Thus, Acro includes both individual optimization solvers as well as optimization frameworks that provide abstract interfaces for flexible interoperability of solver components. Furthermore, many solvers included in Acro can exploit parallel computing resources to solve optimization problems more quickly.

  • utilib 4.0 - Utilib is a library of general-purpose C++ utilities, similar in spirit to the Boost libraries. While generally treated as an Acro package, Utilib is hosted in a separate subversion repository to facilitate use by projects outside of Acro.

  • PyUtilib 1.0 - PyUtilib is yet another Python utility library. PyUtilib supports several Python projects under development at Sandia National Laboratories, including Acro, Coopr and FAST.

  • FAST 2.0 - FAST provides a collection of Python tools that support software testing. In its current form, this does not really constitute a framework, but the plan is to develop a set of tools that support comprehensive testing of software tools.

  • Coopr 1.0 - The Coopr Python package integrates a variety of Python optimization-related packages. Most of these packages rely on externally built optimization solvers. In particular, the Acro software builds many optimizers used by Coopr packages.

The Trac wikis for these packages contain a variety of tools that I'll describe in more detail later...