Thursday, December 22, 2005

New home for Cheesecake and Sparkplot

As Titus mentioned in a recent post, I've been moving two of my projects, Cheesecake and Sparkplot, over to a Linux VPS server from JohnCompanies. They each have their own domain name, and each runs on its own Trac instance. Thanks to Titus for setting up a huge number of packages and server processes in an amazingly short amount of time (and I'm talking HUGE: Apache 2, Subversion, Darcs, tailor, Trac with WSGI, SCGI, and the list goes on).

I also want to thank Micah Elliott for providing a temporary home for Cheesecake at TracOS. I prefer to have my projects on a server that's co-located though, and I also want more control over the environment (I have a sudo-level account on the new VPS server).

Update 12/23/05: Thanks to Ian Bicking's suggestion, I reorganized the project under Subversion so that it now includes trunk, branches and tags directories.

You can check out the main trunk of the Cheesecake project via subversion by doing
svn co cheesecake
(Note: make sure you specify the target directory when you do the svn checkout; otherwise the package files will be checked out directly into your current directory.)

See the main Wiki page for usage examples and other information.

I'll provide a Python Egg for Cheesecake in the near future, as soon as I finish smoothing some rough edges.

Also, please update your bookmarks (I'm optimistic I guess in thinking people actually have bookmarks for these things) for the Python Testing Tools Taxonomy page and for the Cheesecake Index Measurement Ideas page.

Tuesday, December 20, 2005

More Cheesecake bits

Thanks to Michael Bernstein for updating the Index Measurement Ideas Wiki page for the Cheesecake project with some great information regarding standard file naming practices. He cites from ESR's The Art of Unix Programming, namely from Chapter 19 on Open Source:

Here are some standard top-level file names and what they mean. Not every distribution needs all of these.

  • README - The roadmap file, to be read first.
  • INSTALL - Configuration, build, and installation instructions.
  • AUTHORS - List of project contributors (GNU convention).
  • NEWS - Recent project news.
  • HISTORY - Project history.
  • CHANGES - Log of significant changes between revisions.
  • COPYING - Project license terms (GNU convention).
  • LICENSE - Project license terms.
  • FAQ - Plain-text Frequently-Asked-Questions document for the project.
Clearly not all these files are required, but this confirms Will Guaraldi's idea of adding points to the Cheesecake index if at least a certain percentage of these (and other similar) files is found.

Another idea I liked from the Good Distribution Practice section of ARTU is to encourage people to provide checksums for their distributions. I'm thinking about including a Cheesecake index for this kind of thing.

On another front, I'm busily changing the layout of the Cheesecake source code as I'm packaging it up as an egg. It's a breeze to work with setuptools, as I'll report in another post. I'll also be adding more unit tests, and it's a blast to run them via setuptools' test hook. I'm using the nose collector mechanism to automatically run the unit tests when python setup.py test is invoked (thanks to Titus for enlightening me on this aspect of setuptools).
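For the curious, the setuptools hook in question looks roughly like this (a sketch only; the project name and version below are placeholders, not the actual Cheesecake setup.py):

```python
# setup.py (sketch) -- wiring nose's collector into the setuptools test hook,
# so that "python setup.py test" discovers and runs the unit tests via nose.
from setuptools import setup

setup(
    name="cheesecake",            # placeholder project metadata
    version="0.1",
    test_suite="nose.collector",  # nose finds and runs the tests
)
```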

A whiff of Cheesecake

I've been pretty busy lately, but I wanted to take the time to work a bit on the Cheesecake project, especially because I finally got some feedback on it. I still haven't produced an official release yet, but people interested in this project (you two know who you are :-) can grab the source code via either svn or cvs:

SVN:
svn co
CVS from SourceForge (all on one line):
cvs -z3 co -P cheesecake
Here are some things that have changed:
  • The cheesecake module now computes 3 partial indexes, in addition to the overall Cheesecake index (thanks to PJE for the suggestion):
    • an INSTALLABILITY index (can the package be downloaded/unpacked/installed in a temporary directory)
    • a DOCUMENTATION index (which of the expected files and directories are present, what is the percentage of modules/classes/methods/functions with docstrings)
    • a CODE KWALITEE index (average of pylint score)
  • The license file is now considered a critical file (thanks to Will Guaraldi for the suggestion)
  • The PKG-INFO file is no longer checked (the check was redundant with an existing check)
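The docstring part of the DOCUMENTATION index could be computed along these lines (a sketch using the modern ast module, not the actual Cheesecake implementation):

```python
import ast

def docstring_stats(source):
    """Return (documented, total) for the module plus its classes and functions."""
    tree = ast.parse(source)
    # ast.walk yields the Module node itself, so filter to defs and prepend the module
    nodes = [tree] + [n for n in ast.walk(tree)
                      if isinstance(n, (ast.ClassDef, ast.FunctionDef))]
    documented = sum(1 for n in nodes if ast.get_docstring(n))
    return documented, len(nodes)

src = '''"""Module docstring."""

def documented():
    "Has a docstring."

def bare():
    pass
'''
assert docstring_stats(src) == (2, 3)  # module + documented(), but not bare()
```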
Here are some things that will change in the very near future.

Per Will Guaraldi's suggestion, I'm thinking of changing the way the index is computed for required non-critical files. Will suggested: "Would it make sense to use a 3/4 rule for non-critical required files? If a project has 3/4 of the non-critical required files, they get x points, otherwise they get 0 points." I'm actually thinking of having a maximum of 100 points for the "required files" check, and giving 50 points if at least 40% of the files are there, 75 points if at least 60% are there, and 100 points if at least 80% are there. This might prove more encouraging for people who want to increase their Cheesecake index.
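That tiered scheme is easy to express in code (a hypothetical helper, not actual Cheesecake source):

```python
def required_files_score(found, expected):
    """Tiered scoring for required non-critical files (hypothetical helper):
    >= 80% present -> 100 points, >= 60% -> 75, >= 40% -> 50, otherwise 0."""
    ratio = len(found & expected) / float(len(expected))
    if ratio >= 0.8:
        return 100
    if ratio >= 0.6:
        return 75
    if ratio >= 0.4:
        return 50
    return 0

expected = {"README", "INSTALL", "AUTHORS", "NEWS", "CHANGES"}
assert required_files_score({"README", "INSTALL"}, expected) == 50          # 40%
assert required_files_score({"README", "INSTALL", "NEWS"}, expected) == 75  # 60%
assert required_files_score(expected, expected) == 100
```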

Another one of Will's observations: "For the required files and directories, it'd be nice to have Cheesecake output some documentation as to where to find documentation on such things. For example, what content should README contain and why shouldn't I put the acknowledgements in the README? I don't know if this is covered in the Art of Unix Programming or not (mentioned above)--I don't have a copy of that book. Clearly we're creating standards here, so those standards should have some documentation."

I'm thinking of addressing this issue by computing the Cheesecake index for a variety of projects hosted at the CheeseShop. The results will be saved in a database file (I'm thinking of using Durus for its simplicity) and the cheeseshop module will be able to query the file for things such as:
  • show me the URLs for projects which contain README files
  • show me the top N projects in the area of INSTALLABILITY (or DOCUMENTATION, or CODE KWALITEE, or OVERALL INDEX)
This will allow a package creator to look at stuff other people are (successfully) doing in their packages.
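A minimal sketch of those queries, with a plain dict standing in for the Durus root object (all package names, URLs and scores below are made up):

```python
# a plain dict stands in for the Durus root object; the data is made up
scores = {
    "pkgA": {"url": "http://example.com/a", "installability": 90, "has_readme": True},
    "pkgB": {"url": "http://example.com/b", "installability": 70, "has_readme": False},
    "pkgC": {"url": "http://example.com/c", "installability": 80, "has_readme": True},
}

def urls_with_readme(db):
    """Show me the URLs for projects which contain README files."""
    return [info["url"] for info in db.values() if info["has_readme"]]

def top_n(db, index, n):
    """Show me the top N projects for a given partial index."""
    return sorted(db, key=lambda name: db[name][index], reverse=True)[:n]

assert urls_with_readme(scores) == ["http://example.com/a", "http://example.com/c"]
assert top_n(scores, "installability", 2) == ["pkgA", "pkgC"]
```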

The way I see this working as a quick fix is that I will generate the database file on one of my servers, then I will make it available via svn.

In the long run, this is obviously not ideal, so I'm thinking about putting a Web interface around this functionality. You will then be able to see the top N projects in a nice graph, then issue queries via the Web interface, etc...(if only days had 48 hours)

Some things I also want to add ASAP:
  • use pyflakes in addition to pylint
  • improve CODE KWALITEE index by inspecting unit test stuff: number of unit tests, percentage of methods/functions unit tested, running the coverage module against the unit tests (although it might prove tricky to do this right, since a package can rely on a 3rd party unit test framework such as py.test, nose, etc.)
As always, suggestions/comments are more than welcome.

Wednesday, December 14, 2005

Damian Conway on code maintainability

One of the random quotes displayed in Bugzilla:

"Always code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live." -- Damian Conway

Swamped and stuff

Just when I was getting into a rhythm with my blog postings, I had to drop out for a while because of increased work load. Plus I started to work with Titus on an application we'll present at our "Agile development and testing in Python" tutorial at PyCon 06. And here's where the "stuff" part of the post comes into play. We've been developing and testing for a week now, and it's been a most enjoyable experience. I won't go into details about the app itself, which is still in its infancy/prototype stage, but here are some methodology ideas we've been trying to follow:
  • 1-week iterations
  • We release working software every 3 weeks
  • 10 iterations and 3 releases until Feb. 23rd (PyCon'06 tutorial day)
  • Enter releases as milestones in Trac
  • Features are developed as stories
  • A story should fit on an index card
  • During each iteration, one or more stories are completed
  • A story is not completed if it is not tested
  • Two types of tests per story
    • Unit tests
    • Functional tests (at the logic layer, not the GUI layer)
  • We package the software continuously (every night) using buildbot
  • We also run unit tests and a smoke test (a subset of the functional tests) every night via buildbot
  • Before each release, we also test at the GUI layer
  • Before each release, we also run performance tests
  • We file bugs using Trac's ticket system
  • Think about using burndown charts (see XPlanner for ideas)
We haven't hit all of these items yet, but we still have some time until our first release, which is slated for Dec. 31st. Here are some random thoughts that belong in a "Lessons learned" category.

Remote pair programming/testing rocks

It's amazing how much difference a second brain and a second set of eyeballs make. Although Titus and I haven't practiced "classical" pair programming, we've been going back and forth via email and have added code and unit tests almost in real time, so it felt to me like we were pair programming. Having another person integrating your code and your unit tests instantly and giving you feedback in the form of comments/suggestions/modified code/modified unit tests goes a long way towards improving the quality of the code.

Learning new code by writing unit tests rocks

I think I first read about this concept in one of Mike Clark's blog posts, where he was describing his experiences in learning Ruby by writing unit tests for simple language constructs. I found that writing unit tests for a piece of code that's not mine is an amazing way of learning and understanding the new code. Generally speaking, I find that there is no way to write quality code if you don't write unit tests.

What we've done so far in this project is not quite TDD, but it's close. I'm tempted to call it TED, for Test Enhanced Development: you write some code, then you start writing unit tests for it, then you notice that the code doesn't quite express what you want, then you modify the code, then you add another unit test, etc. It's a code-test-code cycle that's very tight and makes your code airtight too.

In this context, I'd also like to add that coverage rocks: we've been using Ned Batchelder's coverage module to keep track of our unit testing coverage. It's well known that no code coverage metric is perfect, but the coverage percentage reported by the tool at least keeps us honest and makes us strive to improve it.
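Here's the kind of tiny test I mean -- pinning down the behavior of a language feature you're learning by asserting what it actually does:

```python
import unittest

class LearnStringMethods(unittest.TestCase):
    """Learn a language feature by asserting its behavior in unit tests."""

    def test_split_with_no_args_collapses_whitespace(self):
        self.assertEqual("a  b\tc".split(), ["a", "b", "c"])

    def test_strip_removes_surrounding_whitespace_only(self):
        self.assertEqual("  spam  ".strip(), "spam")

suite = unittest.defaultTestLoader.loadTestsFromTestCase(LearnStringMethods)
result = unittest.TextTestRunner(verbosity=0).run(suite)
assert result.wasSuccessful()
```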

Trac rocks

Trac proved to be absolutely essential for our "agile" approach to development and testing. Trac combines:
  • a Wiki -- which allows easy editing/jotting down of ideas and encourages brainstorming
  • a defect tracking system -- which simplifies keeping track of not only bugs, but also tasks and enhancements
  • a source browser -- which makes it easy to see all the SVN changesets
  • a roadmap -- where you can define milestones that are tied into the tracker
  • a timeline -- which is a constantly-updated, RSS-ified chronological image of all the important events that happened in the project: code checkins, ticket changes, wiki changes, milestones

Tuesday, December 06, 2005

First consumer of Cheesecake spotted in the wild

That would be Will Guaraldi, the main developer of PyBlosxom. In a recent blog post, he talks about using the Cheesecake metrics as an indicator of the 'kwalitee' of his software. I'm glad to see that somebody is actually using this stuff; it motivates me to keep working on it. I also welcome comments such as the ones made by PJE:

I wouldn't take the Cheesecake metrics too seriously at this point; there are too many oddly-biased measurements in there. For example, to score all the points for documentation, you'd have to include both a News and a Changelog, as well as a FAQ, Announce, and a Thanks - to name just a few. Including that many files for anything but a huge project seems counterproductive. PyLint is also incredibly picky by default about things that really don't matter, and you're also being penalized for not including even though you're not using setuptools. (On the flip side, you're given 10 points free for having PKG-INFO, which the distutils automatically generates...)

The Cheesecake metrics would be more meaningful if they were split out into at least 3 separate ratings for installability, documentation, and code quality, and the documentation rating(s) didn't implicitly encourage having lots of redundant files. In effect, you should probably consider the metrics more as a "list of things that some people have in their packages", and then pay attention to only the ones you actually care to have.

It's true that currently Cheesecake puts somewhat too much emphasis on the different files it expects. I tried to find a set of files that I've seen commonly used in Python projects and other open source projects. Also, I tried to include directories that I think should be included in any Python package. But there should be more emphasis on things such as unit test coverage and other metrics (see this page for the list of things I'm currently contemplating, and BTW it's a Wiki, so feel free to add more stuff).

The 3 separate ratings that PJE mentions make a lot of sense. If people have more such ideas/suggestions/comments, please send them to me directly (grig at gheorghiu dot net) or leave a comment here. I can only hope that more people will follow Will's example and start feeding back some of the Cheesecake metrics into their projects.

Monday, December 05, 2005

More updates to PTTT

The Python Testing Tools Taxonomy Wiki has seen more updates recently. Here are some examples:
  • Kent Johnson added two "miscellaneous" Python testing tools: HeapPy, written by Sverker Nilsson, and PySizer, written by Nick Smallbone; both tools aid in debugging, profiling and optimizing memory usage issues in Python programs
  • Ned Batchelder added Pester, a Python port of a very interesting Java tool called Jester, written by Ivan Moore; the idea behind Jester is to mutate the source code, then run your unit tests and find those tests that don't fail and should fail
  • Dave Kirby added Python Mock, a tool he wrote that enables the easy creation of mock objects
  • Inspired by Dave's reference to mock objects, I added another mock-related Python tool I came across: pMock, written by Graham Carlyle and inspired by the Java jMock library


Came across a regular-expression library, which seems pretty useful if you don't want to reinvent the wheel every time you need a regular expression for things such as phone numbers, zip codes, etc.

Friday, December 02, 2005

Jim Shore on FIT and Agile Requirements

Jim Shore just posted a blog entry with links to many of his articles/essays on FIT and Agile Requirements. Mandatory reading for people interested in agile methodologies and acceptance testing with FIT.

Tutorial at PyCon 2006

Titus and I will be giving a tutorial at PyCon 2006: "Agile development and testing in Python". The tutorial will happen only if there is enough interest -- so if you're planning on going to PyCon next year, I urge you to sign up :-) It will be on Thursday, Feb. 23rd, the day before the conference starts. It should be a fun and interactive 3 hours.

Here is the outline we have so far. If you're interested in other topics related to agile methodologies applied to a Python project, please leave a comment.

Agile development and testing in Python

We will present a Python application that we developed together as an "agile team", using agile development and testing approaches, techniques and tools. The value of the tutorial lies, on one hand, in detailing the development and testing methodologies we used, and on the other hand in demonstrating the specific Python tools we used for development and testing. We will cover TDD, unit testing, code coverage, functional/acceptance testing, Web application testing, continuous integration, source code management, issue tracking, project management, documentation, and Python package management.

Presentation outline

1st hour:

2nd hour:

3rd hour:

Thursday, December 01, 2005


Via an email from SourceForge, I found out about splunk, a piece of software that indexes and searches log files (actually not only logs, but any "fast-moving IT data", as they put it). I downloaded the free version and installed it on a server I have, then indexed the /var/log/messages file and played with it a bit.

Here is the search results page for "Failed password". A thing to note is that every single word on the results page is clickable; clicking a word starts a new search on that word. If you want to add the word to the current search terms, Ctrl-click it; if you want to exclude the word from the search, Ctrl-Alt-click it.

Pretty impressive. It uses various AJAX techniques to enhance the user experience, and best of all, part of the server software is written in Python! The search interface is based on Twisted:

root 504 1 0 11:26 pts/0 00:00:04 python /opt/splunk/lib/python2.4/site-packages/twisted/scripts/ --pidfile=/opt/splunk/var/run/splunk/ -noy /opt/splunk/lib/python2.4/site-packages/splunk/search/Search.tac

Definitely worth checking out.

New home for Selenium project: OpenQA

I just found out yesterday that the Selenium project will have a new home at the OpenQA site.

OpenQA is a JIRA-based site created by Patrick Lightbody, who intends it to be a portal for open-source testing-related projects. Selenium is the first such project, but Patrick will be adding another one soon: a test management application he's working on.

I'll be copying over some Wiki pages from the current Selenium Confluence wiki to the new OpenQA wiki. OpenQA is very much a work in progress, so please be patient while it gets fleshed out some more.

QA Podcast with Kent Beck

Nothing earth-shattering, but it's always nice to hear Kent Beck's perspective on XP and testing.

Monday, November 28, 2005


Fascinating blog I came across a couple of days ago: Xooglers (which stands for "ex-Googlers"). A real page-turner about the anticipated ups, but mostly about the unexpected downs of life as a Googler.

Wednesday, November 23, 2005

Updates to PTTT

...where PTTT stands of course for Python Testing Tools Taxonomy. I'm glad to see that people are updating the Wiki page and adding more stuff to their tool description or adding other tools to the list. Some examples:
  • Ori Peleg updated the description of his TestOOB tool: "unittest enhancements; test filtering via regex/glob patterns; reporting in XML/HTML; colorized output; runs pdb on failing tests; run in parallel in threads/processes; verbose asserts; report failures immediately; and a little more;"
  • Ian Bicking added his paste TestFileEnvironment tool ("A simple environment for testing command-line applications, running commands and seeing what files they write to")
  • Geoff Bache added his TextTest tool, an acceptance testing tool "written in python but it can be used to test programs written in any language. Comes with extensive self tests which serve as examples of how to use it, ie how to test a non-trivial application with a pyGTK GUI"; TextTest looks really intriguing, I'll check it out and comment on it soon
  • Zack Cerza added two GUI testing tools:
    • dogtail ("Created by Redhat engineers on linux. Uses the X11 accessability framework (AT-SPI) to drive applications so works well with the gnome desktop on Unixes. Has flash movies")
    • ldtp ("Also uses the X11 accessability framework (AT-SPI) to drive applications so works well with the gnome desktop on Unixes. Has extensive tests for the evolution groupware client.")

Tuesday, November 22, 2005

Martin Fowler on In-Memory Databases and Testing

Just came across Martin Fowler's post on In-Memory Databases. I was glad to see his mention of Firebird, which has been a favorite of mine for a number of years. Fowler talks about the primary use of in-memory databases: testing, or to be more precise test-driven development, where speed is of the essence. He also mentions SQLite, which I've only recently started playing with, mainly by going through the Django tutorial. I'd like to explore it further though in the context of testing.

Monday, November 21, 2005

Python Testing Tools Taxonomy

There's been a flurry of blog posts recently on the subject of Python testing, especially on Web application testing tools. I thought it would be a good idea to put up a Wiki page with the tools I know of, so that anybody who's interested in contributing to the "Python Testing Tools Taxonomy" can do so easily. Here is what I have so far. Feel free to modify it.

Update 05/01/06

The PTTT page has moved here. Please update your bookmarks etc.

Thursday, November 17, 2005

Cheesecake project update

Micah Elliott graciously offered to host my Cheesecake project at his TracOS site, a Trac-based Wiki that hosts a collection of Open Source projects. Check out the brand new Cheesecake home page and let me know what you think. I haven't had time to properly package my code (which is kind of ironic, considering that checking packages for their 'goodness' is after all the goal of the Cheesecake project), but you can grab the source code via subversion:
svn co
I'll put the code up on SourceForge too at some point, but it seems to me that the Trac-based Wiki is much more "agile" than the SourceForge interface, so I'll make TracOS the primary home for my project.

Bob Koss on "Refrigerator Code"

Seen via Jeffrey Fredrick's blog: Bob Koss talks about Refrigerator Code, i.e. code that you're so proud of that you're ready to put it on the refrigerator, next to your kids' drawings. Nice metaphor. It reminds me of an expression used by the late Chick Hearn, the famous play-by-play announcer for the Lakers: "This game is in the refrigerator!"

Lightning Talks session at Star West 2005

Yesterday I gave a 5-minute Lightning Talk on Selenium at the Star West 2005 testing conference in Anaheim. There were 9 speakers in all, coordinated by Erik Petersen. It was my second Lightning Talk experience, after the one at PyCon 2005 earlier this year. It wasn't quite as interactive as at PyCon, mainly because it was based on slides rather than live demos (I did do a live demo of Selenium though), but it was still very interesting and intense. Erik made it even more fun by introducing every speaker with a little Vegas-style tune that evoked the Rat Pack's appearance on stage.

My favorite presentation was Rob Sabourin's, who talked about the Iron Ring given to all Professional Engineers in Canada during a secret "Calling of the Engineers" ceremony. The ring is made of iron and is initially very coarse, but the engineers who receive it are supposed to wear it permanently on their working hand, so in time it becomes round and smooth. It is a symbol of pride to be part of an important profession, but it's also a symbol of humility and social responsibility. Rob linked the ring to the Quebec Bridge Disaster which happened at the beginning of the 20th century, when what was supposed to be the longest cantilever bridge in the world collapsed twice during construction because of a scaling problem (the weight of the bridge turned out to be more than what the engineers were used to from previous constructions of smaller bridges). What I took away from the talk was the need for more social responsibility from software developers and testers in particular.

Another fun presentation was Jon Bach's, who talked about an interesting form of testing: open-book exams. He has an article online on this subject. The main idea is that if you want your application to be tested by technical and non-technical people alike, just give them an open-book exam with 30-40 questions related to various areas of functionality, and let them collaborate in finding the answers by exploring the application. This offers a much more focused testing environment than just telling them to go explore the application on their own. Interesting indeed...I have to think about ways to apply this in my day-to-day work.

Other interesting topics from some of the other Lightning Talks:
  • a test team's success with using session-based testing (here's a good introductory article by the same Jon Bach)
  • keeping track of testing progress by means of simple Excel spreadsheets (as opposed to cumbersome project management software and Gantt charts that quickly become obsolete) and using burndown charts to show where the test team stands
  • using virtual machines to simplify the administration of the testing environment
  • how gaining even a millisecond is important when doing performance testing at VeriSign
  • how RUP could not be easily adapted to non-software manufacturing environments
On the whole, it was a very focused session, chock-full of ideas and lessons to take away. Kudos to Erik Petersen for putting it together. If you want my advice, the next time you attend a conference, don't miss the Lightning Talks if they are offered!

Wednesday, November 16, 2005

Agile Dilbert

Saw this on Roy Osherove's blog.

Article on All-pairs testing technique

From the agile-testing mailing list, courtesy of Todd Bradley, here's a link to a PDF version of an article by Bernie Berger on the All-pairs testing technique. If you're a tester, you need to get acquainted with this technique, so that you can mitigate the combinatorial explosion of your test cases.
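To see why all-pairs helps, consider three parameters with two values each (the parameter names and values below are made up): the full cartesian product has 8 combinations, but 4 well-chosen cases already cover every pair of values:

```python
from itertools import combinations, product

# hypothetical test parameters
params = {
    "browser": ["IE", "Firefox"],
    "os": ["Windows", "Linux"],
    "db": ["MySQL", "Postgres"],
}

full = list(product(*params.values()))  # exhaustive: 2 * 2 * 2 = 8 cases
names = list(params)

# every pair of values that an all-pairs suite must cover
required = set()
for i, j in combinations(range(len(names)), 2):
    for v1 in params[names[i]]:
        for v2 in params[names[j]]:
            required.add((i, v1, j, v2))

def covers_all_pairs(cases):
    got = set()
    for case in cases:
        for i, j in combinations(range(len(case)), 2):
            got.add((i, case[i], j, case[j]))
    return required <= got

# a hand-picked pairwise suite: 4 cases instead of 8
suite = [
    ("IE", "Windows", "MySQL"),
    ("IE", "Linux", "Postgres"),
    ("Firefox", "Windows", "Postgres"),
    ("Firefox", "Linux", "MySQL"),
]

assert len(full) == 8
assert covers_all_pairs(suite)
```

With more parameters the savings grow dramatically, which is exactly the combinatorial-explosion mitigation the article describes.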

Monday, November 14, 2005

Exciting times in the Python testing world

If you are a developer or tester using Python, you live in exciting, ebullient times. There are Python-based testing frameworks newly-announced or recently-updated almost every day. Here is a rundown of the latest I'm aware of:

Unit testing
  • py.test: no recent new release, but changes are happening almost daily in svn
  • TestOOB: version 0.7 was released recently (TestOOB is an enhancement to the standard unittest module, offering many features that py.test offers)
  • nose: version 0.7.2 was freshly released yesterday (nose, in its author's words, "provides an alternate test discovery and running process for unittest, one that is intended to mimic the behavior of py.test as much as is reasonably possible without resorting to too much magic"; nose will become, if it's not already, the official test framework for TurboGears)
Web application testing
  • twill: version 0.7.4 was released on Nov. 11th, with unit tests that use nose, and with new commands to help developers use twill to unit test their own Web apps; also, Titus Brown announced yesterday that he extended twill to add in-process testing of WSGI applications (I blogged about twill here)
  • FunkLoad: version 1.3.1 was released on Nov.10th (FunkLoad offers functional, performance, load and stress testing of Web applications)
  • zope.testbrowser: version 0.9.0 was released on Nov.12th (zope.testbrowser is the stand-alone version of the Zope 3 functional testing framework)
  • Sancho: version 2.1 was released on Nov. 2nd (Sancho is the unit test framework for the MEMS Exchange applications; for those who are not familiar with MEMS Exchange, they are the guys behind Quixote, Durus and other Python apps)
  • Django's own doctest-based test framework: read my blog post about it
  • ibofobi's doctest-based framework for Django: blog post from Nov. 8th announces a new unit test framework for Django-based Web applications
  • [Update from Ian Bicking, the creator of Paste]: paste.test.fixture: the unit test framework used in Paste (read "What is Paste, yet again" on Ian's blog)
  • [Update from Robert Brewer, the creator of CherryPy]: webtest is a small, but helpful and isolated web-application-test module (it extends unittest) used to test CherryPy
GUI testing
  • guitest: version 0.3 was released on Nov. 13th (guitest is a Python helper library for testing Python GUI applications, with pyGTK support being the most mature)
  • retest: version 0.5.1 was released on Sept. 23rd (retest enables testing of Python regular expressions in a web browser; it uses SimpleHTTPServer and AJAX)
I also want to mention MochiKit as an example of an application that makes it a point to offer top-notch tests and documentation. MochiKit is a JavaScript library that is very "Pythonical" in nature, which is not surprising given that one of its main developers is Bob Ippolito, well-known for his contributions in the Python community. One of the goals of MochiKit is to maintain 100% documentation coverage, and another is to test itself mercilessly. If only all applications followed these tenets, the world would truly be a better place :-)

SoCal Code Camp

From the xpsocal mailing list: the Southern California Code Camp will be held on Jan.21-22 2006 at Cal State Fullerton. I checked out the Sessions page and the organizers say "All technologies are welcome C++, C#, VB.Net, Java, Ruby, COBOL???, SQL, etc... if it is code... we want it." I noticed at least one glaring omission (what? no Python?) so I decided to remedy it by sending a proposal for a Python session, which is basically my PyCon 2005 talk on "Agile testing with Python test frameworks". Let's see if they go for it.

Tuesday, November 08, 2005

ibofobi's Django doctest framework

In the "This is way too cool" category: a new doctest-based framework for testing your Django apps. Other than doctest, it also uses Beautiful Soup and YAML. I need to check it out at some point.

Friday, November 04, 2005

Drucker on agility and testing

The other day I picked up a copy of "The Daily Drucker" from the local library. It was again one of those fortuitous events, almost as if the book told me to pick it up from the shelf. I had another similar experience last year at the same library when I picked up "XP Explained", so I think it's particularly fitting to find references to agility and testing in Drucker's ideas. I don't think Peter Drucker needs any introduction, but I have to confess I haven't read any of his books yet (although this is about to change!). "The Daily Drucker" is a collection of fragments from his books, put together in "a thought a day" format that's pretty agile in itself :-)

Here's what Drucker has to say on organizational inertia: "All organizations need to know that virtually no program or activity will perform effectively for a long time without modification and redesign. Eventually every activity becomes obsolete."

These are sobering words that can be applied verbatim to software development (note the use of the word 'program' in the above quote!). They say that you must be prepared for change, and indeed welcome it, as XP teaches. In fact, in the same fragment (from "The Age of Discontinuity"), Drucker goes on to say: "All organizations must be capable of change. We need concepts and measurements that give to other kinds of organizations what the market test and profitability yardstick give to business. Those tests and yardsticks will be quite different."

Note the use of the word "tests". To apply this idea to software, you can't prove that your application is capable/performing if you don't have tests and yardsticks. Again, an idea that appears over and over in agile software methodologies. Here's Drucker again: "Each institution will be the stronger the more clearly it defines its objectives. It will be more effective the more yardsticks and measurements there are against which its performance can be appraised. It will be more legitimate the more strictly it bases authority on justification by performance." You couldn't give a better justification for having unit/acceptance/performance tests in place for your application even if you tried :-) !!!

I was also struck by some of Drucker's ideas on abandoning what doesn't work. It reminded me of the virtue of courage in XP, which gives you the fortitude to throw away code and start anew, knowing that you'll probably do a much better job. Here's Drucker: "Without systematic and purposeful abandonment, an organization will be overtaken by events. It will squander its best resources on things it should never have been doing or should no longer do. As a result, it will lack the resources, especially capable people, needed to exploit the opportunities that arise." It's almost like Drucker is talking about the YAGNI principle and about Big Design Up Front. The alternative, of course, is to Do The Simplest Thing That Could Possibly Work. In fact, according to Drucker: "The question to be asked - and asked seriously - is: 'If we did not do this already, would we, knowing what we know, go into it now?' If the answer is no, the reaction must be 'What do we do now?'" In XP, we know what to do now because we have the current iteration to work on, having obtained feedback from the customers.

Drucker talks of course about feedback too. He associates it with 'continuous learning' (another agile concept) and he advises everybody to write down the results they anticipate before doing anything of significance. Then, after a number of months, they should compare the actual results with those anticipations. This accomplishes three things: firstly, it shows you what you did well and what your strengths are; secondly, it shows you what you have to learn and what habits to change; and thirdly, it shows you what you are not gifted for and cannot do well. These are the keys to continuous learning, and it seems to me that they're used very effectively in the agile practices of test-driven development, continuous integration, constant feedback and frequent iterations.

Finally, Drucker has something to say about agility itself: "Large organizations cannot be versatile. A large organization is effective through its mass rather than through its agility. An organization, no matter what it would like to do, can do only a small number of tasks at the same time. This is not something that better organization or 'effective communication' can cure. The law of organization is concentration. Yet modern organization must be capable of change. Indeed it must be capable of initiating change, that is innovation. It must be able to move scarce and expensive resources of knowledge from areas of low productivity and nonresults to opportunities for achievement and contribution. This however, requires the ability to stop doing what wastes resources." The lesson for agile development? Keep your teams small and focused, encourage an environment of collaboration and freely flowing ideas, don't pigeonhole people into roles, be prepared to react quickly and to throw away what doesn't work.

I am amazed by the depth and breadth of Drucker's ideas, and by their direct applicability to software development. My TODO list has just been augmented with "Read at least one Druckerian wisdom nugget every day".

"Articles and tutorials" page updated

As blog posts recede into the archives, finding a specific article becomes increasingly hard. This page is meant to be a central repository of links to the various articles and tutorials I have posted so far.

Updated 01/19/2006

Unit testing
Unit testing in Python part 1: unittest
Unit testing in Python part 2: doctest
Unit testing in Python part 3: py.test
Michael Feathers on unit testing rules

Acceptance testing with FitNesse
PyFIT/FitNesse Tutorial Part 1
PyFIT/FitNesse Tutorial Part 2
PyFIT/FitNesse Tutorial Part 3

Web application testing
Web app testing with Jython and HttpUnit
Web app testing with Python part 1: MaxQ
Web app testing with Python part 3: twill
Acceptance tests for Web apps: GUI vs. business logic

Selenium-specific articles
Web app testing with Python part 2: Selenium and Twisted
Quick update on Selenium in TestRunner mode
Quick update on Selenium in Twisted Server mode
Using Selenium to test a Plone site (part 1)
Using Selenium to test a Plone site (part 2)
New features in Selenium 0.3
Article on Selenium in Oct. 2005 issue of "Better Software"
Selenium at OpenQA
Running Selenium in Python Driven Mode
Testing Commentary (and thus Ajax) with Selenium

Performance/load/stress testing
pyUnitPerf Tutorial
Performance vs. load vs. stress testing
HTTP performance testing with httperf, autobench and openload
More on performance vs. load testing

Automated test distribution, execution and reporting
STAF/STAX Tutorial

General testing topics
Quick black-box testing example
White-box vs. black-box testing

Agile Documentation
Agile Documentation with doctest and epydoc
Agile documentation in the Django project

The Cheesecake project
Cheesecake: How tasty is your code?
Cheesecake project on SourceForge
A whiff of Cheesecake
More Cheesecake bits
New home for Cheesecake and Sparkplot projects


Databases
Installing and using cx_Oracle on Unix
Installing Python 2.4.1 and cx_Oracle on AIX
Installing the Firebird database on a 64-bit RHEL Linux server

The py library
Keyword-based logging with the py library
py lib gems: greenlets and py.xml
'py library overview' slides

Python on Windows
Handling the 'Path' Windows registry value correctly
Running a Python script as a Windows service

System Administration HOWTOS
Telecommuting via ssh tunneling
Managing DNS zone files with dnspython
Configuring OpenLDAP as a replacement for NIS
Chroot-ed FTP with wu-ftpd
System monitoring via SNMP
Compiling and installing a custom Linux kernel
Configuring Apache 2 and Tomcat 5.5 with mod_jk

Data visualization
sparkplot: creating sparklines with matplotlib

Jakob Nielsen on Usability Testing
Jakob Nielsen on Blog Usability

Other articles

Python as an agile language
Oblique Strategies and testing

Monday, October 31, 2005

Configuring Apache 2 and Tomcat 5.5 with mod_jk

Update May 15th 2007

If you're interested in setting up Apache virtual hosts with Tomcat 5.5 and mod_jk, check out my recent blog post on that subject.

I recently went through the painful exercise of configuring Tomcat 5.5 behind Apache 2 using the mod_jk connector. I had done it before with mod_jk2, but it seems that mod_jk2 is deprecated, so I wanted to redo it with the officially supported mod_jk connector. Although I found plenty of tutorials and howtos on Google, they all missed some important details or were not exactly tailored to my situation. So here's my own howto:

Step 1: Install Apache 2

I won't go into many details, as this is a very well documented process. I installed httpd-2.0.55 and I used the following configuration options:

./configure --enable-so --enable-mods-shared=most

In the following discussion, I will assume that Apache 2 is installed in /usr/local/apache2.

Step 2: Install JDK 1.5

In my case, I put the JDK files in /usr/local/java and I added this line to root's .bash_profile file:

export JAVA_HOME=/usr/java/jdk1.5.0_05

Step 3: Install Tomcat 5.5

Download apache-tomcat-5.5.12.tar.gz and put it in /usr/local. Unpack the package, create tomcat user and group, change ownership:

# cd /usr/local; tar xvfz apache-tomcat-5.5.12.tar.gz
# ln -s /usr/local/apache-tomcat-5.5.12 /usr/local/tomcat
# groupadd tomcat; useradd tomcat -g tomcat -d /usr/local/tomcat tomcat
# chown -R tomcat.tomcat /usr/local/apache-tomcat-5.5.12 /usr/local/tomcat

Modify .bash_profile for user tomcat and source it

Become user tomcat and modify .bash_profile file by adding following lines:

export PATH=$PATH:/usr/local/bin:/usr/local/tomcat/bin
export JAVA_HOME=/usr/java/jdk1.5.0_05
export CATALINA_HOME=/usr/local/tomcat

$ . ~/.bash_profile

Test starting/stopping the Tomcat server

Try to start up Tomcat by running startup.sh (it's in the bin subdirectory, which should be in your PATH).

If you do ps -def | grep tomcat you should see something like:

tomcat 18591 1 88 06:40 pts/0 00:00:02 /usr/java/jdk1.5.0_05/bin/java -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Djava.util.logging.config.file=/usr/local/tomcat/conf/ -Djava.endorsed.dirs=/usr/local/tomcat/common/endorsed -classpath :/usr/local/tomcat/bin/bootstrap.jar:/usr/local/tomcat/bin/commons-logging-api.jar -Dcatalina.base=/usr/local/tomcat -Dcatalina.home=/usr/local/tomcat org.apache.catalina.startup.Bootstrap start

Try to shut down Tomcat by running shutdown.sh (it's in the bin subdirectory, which should be in your PATH).

If you do ps -def | grep tomcat you shouldn't see the above process running.

Step 4: Install the mod_jk connector

Download the jakarta-tomcat-connectors source tarball and unpack it, then configure and build it:

# cd jakarta-tomcat-connectors-
# ./buildconf.sh
# ./configure --with-apxs=/usr/local/apache2/bin/apxs
# make; make install

Verify that mod_jk.so is in /usr/local/apache2/modules and is chmod 755.

Step 5: Connect Tomcat to Apache

Create a workers.properties file in /usr/local/apache2/conf with the following contents:

# This file provides minimal jk configuration properties needed to
# connect to Tomcat.
# We define a worker named 'default'
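A minimal set of properties defining a single AJP 1.3 worker named 'default' might look like the sketch below (port 8009 is Tomcat's default AJP connector port; adjust host and port for your setup):

```
worker.list=default
worker.default.type=ajp13
worker.default.host=localhost
worker.default.port=8009
```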



Edit httpd.conf and add the following mod_jk-specific directives (I added them just before the start of Section 3 / Virtual Hosts).

Important note: the name of the worker defined in the file ('default' in this example) needs to be the same as the worker that appears in httpd.conf in the JkMount lines. Also note that the JkMount lines below only map the two sample JSP/servlet applications that ship with Tomcat. You need to add similar lines for your custom application.

# Mod_jk settings
# Load mod_jk module
LoadModule jk_module modules/mod_jk.so
# Where to find workers.properties
JkWorkersFile conf/workers.properties
# Where to put jk logs
JkLogFile logs/mod_jk.log
# Set the jk log level [debug/error/info]
JkLogLevel debug
# Select the log format
JkLogStampFormat "[%a %b %d %H:%M:%S %Y] "
# JkOptions indicate to send SSL KEY SIZE,
JkOptions +ForwardKeySize +ForwardURICompat -ForwardDirectories
# JkRequestLogFormat set the request format
JkRequestLogFormat "%w %V %T"
# Send JSPs for context /jsp-examples to worker named default
JkMount /jsp-examples/*.jsp default
# Send servlets-examples to worker named default
JkMount /servlets-examples/* default

Keep editing httpd.conf and add the following Alias directives (for example under the entry for the icon Alias). These directives tell Apache to map /jsp-examples and /servlets-examples to the sample directories that ship with Tomcat.

# Static files in the jsp-examples webapp are served by apache
Alias /jsp-examples "/usr/local/tomcat/webapps/jsp-examples/"

<Directory "/usr/local/tomcat/webapps/jsp-examples/">
Options FollowSymLinks
AllowOverride None
Allow from all
</Directory>

# The following directives prohibit users from directly accessing WEB-INF
<Location "/jsp-examples/WEB-INF/">
AllowOverride None
deny from all
</Location>

# Static files in the servlets-examples webapp are served by apache
Alias /servlets-examples "/usr/local/tomcat/webapps/servlets-examples/"

<Directory "/usr/local/tomcat/webapps/servlets-examples/">
Options FollowSymLinks
AllowOverride None
Allow from all
</Directory>

# The following directives prohibit users from directly accessing WEB-INF
<Location "/servlets-examples/WEB-INF/">
AllowOverride None
deny from all
</Location>

Restart Apache via /etc/rc.d/init.d/httpd restart

Test standalone Tomcat server by going to http://Web_server_name_or_IP:8080

Test Apache/Tomcat integration by going to http://Web_server_name_or_IP/jsp-examples and http://Web_server_name_or_IP/servlets-examples

That should be it. At least it did the trick for me. In future posts I'll cover Tomcat clustering/session replication and some tips for tuning Apache/Tomcat.

Helpful articles/howtos/tutorials:

A. Ramnishath's Knowledge Base: Configuring AJP1.3 Connector for Tomcat
Jakarta Tomcat Connector -- Apache Howto
JSP Quick-Start for Linux

Friday, October 21, 2005

Mailing lists for Cheesecake project at SourceForge

At Titus's prompting (he challenged me to be a Real Man and not use wimpy Forums but Mailing Lists), I created two mailing lists for the Cheesecake project at cheesecake-devel and cheesecake-users. Feel free to check them out and contribute if you're interested in this project.

Proper location for LICENSE, CHANGELOG and other files?

What do people think should be the proper location for files such as LICENSE, ANNOUNCE, CHANGELOG, README?

Some projects have them in the top-level directory, some have them in a sub-directory such as 'docs'. Currently the Cheesecake index penalizes projects that do not have these files in the top-level project directory.

If you have any ideas, please leave a comment here or, even better, post to this Cheesecake Open Discussion thread.

Update: I also created two mailing lists for the Cheesecake project at cheesecake-devel and cheesecake-users.

Wednesday, October 19, 2005

Recommended blog: Maeda's 'Simplicity'

"Thoughts on Simplicity" is the blog of John Maeda, a professor at the MIT Media Lab. I've been reading it for a couple of months and I always take away intriguing ideas, especially about how to strive for simplicity and elegant design in our cluttered and complex world.

Maeda periodically posts his Laws of Simplicity and he says he'll stop the blog when he reaches the sixteenth. He's now up to ten. Here is Maeda's Tenth Law of Simplicity:

Less breeds less; more breeds more.
Equilibrium is found at many
points between less and more,
but never nearest the extrema.

Monday, October 17, 2005

Cheesecake project on SourceForge

I registered Cheesecake at SourceForge. People interested in the idea of putting together a "Cheesecake index" that measures the goodness of Python projects are welcome to post in the Open Discussion forum. I got things going there by posting a few ideas contributed by Micah Elliott. If you're interested in participating in the project, send me an email at grig at gheorghiu dot net and I'll add you to the developer list.

Jakob Nielsen on Blog Usability

Usability guru Jakob Nielsen talks about "Weblog Usability: The Top Ten Design Mistakes". I guess I'm guilty of #1 (no author bio), #2 (no author photo) and #10 (generic blog domain name). Oh well, nobody's perfect :-)

Thursday, October 13, 2005

Article on Selenium in October issue of "Better Software"

My "Tool Look: A Look at Selenium" article was published in the Oct. 2005 issue of Better Software. I can now post a PDF version of the article that you can download from here. The "Sticky Notes" referenced in the article are also available online.

Cheesecake: how tasty is your code?

Update 3/20/06: I'm republishing this post in order to fix this blog's atom.xml index file by getting rid of some malformed XML.

Our friends in the Perl community came up with the concept of KWALITEE: "It looks like quality, it sounds like quality, but it's not quite quality". Kwalitee is an empirical measure of how good a specific body of code is. It defines quality indicators and measures the code along them. It is currently used by the CPANTS Testing Service to evaluate the 'goodness' of CPAN packages. Here are some of the quality indicators that measure kwalitee:
  • extractable: does the package use a known packaging format?
  • has_version: does the package name contain a version number?
  • has_readme: does the package contain a README file?
  • has_buildtool: does the package contain a Makefile?
  • has_tests: does the package contain tests?
I think it would be worth having a similar quality indicator for Python modules. Since the Python CPAN equivalent is the PyPI hosted at the Cheese Shop, it stands to reason that the quality indicator of a PyPI package should be called the Cheesecake index, and I hereby declare that I'm starting the Cheesecake project. The goal of the project is to produce a tool that emits a Cheesecake index for a given Python distribution.

Here are some metrics and tools that I think could be used in computing the Cheesecake index, in addition to some of the CPAN kwalitee metrics:
  • unit test coverage: how many methods/functions are exercised in the unit tests?
  • docstring coverage: how many methods/functions have docstrings?
  • PyFlakes/PyLint validation
As synchronicity would have it, I found a post today that refers to well-written Python code. Here are some ideas that Micah Elliott shared about what constitutes a "Pythonic" distribution:

  • Has modules grouped into packages, all are cohesive, loosely coupled, and reasonable length
  • Largely follows PEP conventions
  • Avoids reinventing any wheels by using as many Python-provided modules as possible
  • Well documented for users (manpages or other) and developers (docstrings), yet self-documenting with minimal inline commenting
  • Uses distutils for ease of distribution
  • Contains standard informational files such as: BUGS.txt COPYING.txt FAQ.txt HISTORY.txt README.txt THANKS.txt
  • Contains standard directory structure such as: doc/ tools/ (or scripts/ or bin/) packageX/ packageY/ test/
  • Clean UI, easy to use, probably relying on optparse or getopt
  • Has many unit tests that are trivial to run, and code is structured to facilitate building of tests
  • The first example of a pythonic package that comes to my mind is docutils
Checking for some of these things can be automated. Some properties, such as 'clean UI' or 'reasonable length', are more subjective and harder to automate, but in any case they're all very good ideas and a good starting point for computing the Cheesecake index.
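To make the automatable part concrete, here is a toy sketch of such a check (my own illustration, not an actual Cheesecake implementation; the file and directory names checked are just examples of the "standard informational files" idea above):

```python
import os

# Hypothetical checklist; a real index would weigh many more indicators.
STANDARD_FILES = ['README.txt', 'COPYING.txt', 'FAQ.txt', 'HISTORY.txt']
STANDARD_DIRS = ['doc', 'test']

def naive_cheesecake_index(project_dir):
    """Score an unpacked distribution: one point per standard file or
    directory found at the top level."""
    entries = set(os.listdir(project_dir))
    score = sum(1 for name in STANDARD_FILES if name in entries)
    score += sum(1 for name in STANDARD_DIRS
                 if name in entries
                 and os.path.isdir(os.path.join(project_dir, name)))
    return score
```

Docstring coverage, unit test coverage and PyFlakes/PyLint results could then be folded into the same score.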

Any other ideas? Anybody interested in participating in such a project? Leave a comment with your email address or send me email at grig at gheorghiu dot net.

Wednesday, October 12, 2005

Software Test and Performance magazine

Came across the Software Test & Performance magazine via a blog post by Alexander Podelko. The neat thing is that you can download all back issues in PDF format. I checked out the October issue and it has some really interesting articles on performance testing, and also on agile software development -- which is very aptly compared to candlestick making (dip a string in wax, get a prototype of a candle, repeat until you get a finished candle while always having a 'working' candle in your hands).

Tuesday, October 11, 2005

Mini HOWTO #3: compiling and installing a custom Linux kernel

  • Go to the /usr/src/linux directory, where linux is a link to the appropriate kernel version (e.g. /usr/src/linux-2.4)
  • Edit the EXTRAVERSION line of /usr/src/linux/Makefile. Change the definition for EXTRAVERSION=versionnumber to something that uniquely identifies your kernel, for example EXTRAVERSION=RHcustom.
  • Run make mrproper to ensure your source files are in a consistent and clean state.
  • Save a copy of the old configuration file /usr/src/linux/.config to a secure location.
  • If you want to reuse an old configuration file as a starting point, copy it to /usr/src/linux/.config and run make oldconfig
  • Customize the kernel. Use make config for a text-based interface, make menuconfig for a curses-based interface or make xconfig for an X-based interface. Select all desired/needed options.
  • Run make dep to set up all dependencies correctly.
  • Run make bzImage to create a gzip-compressed kernel image file. This will compile the kernel, which can be a lengthy process depending on your hardware.
  • Run make modules to build the kernel modules.
  • Copy kernel image file to /boot directory: cp /usr/src/linux/arch/i386/boot/bzImage /boot/vmlinuz-2.4.19-RHcustom
  • Run make modules_install to install the kernel modules into /lib/modules/2.4.19-RHcustom.
  • Build support for an initial RAM disk (initrd) by running: mkinitrd /boot/initrd-2.4.19-RHcustom.img 2.4.19-RHcustom
  • Copy the new kernel’s symbol table and configuration file to the /boot partition: cp /usr/src/linux/System.map /boot/System.map-2.4.19-RHcustom; cp /usr/src/linux/.config /boot/config-2.4.19-RHcustom
  • Update the boot manager configuration files.
    • For GRUB, update /etc/grub.conf with a section for the new kernel.
    • For LILO, update /etc/lilo.conf with a section for the new kernel and then run lilo -v.

Monday, October 10, 2005

Mini HOWTO #2: system monitoring via SNMP

Goal: We want to monitor system resources such as CPU utilization, memory utilization, disk space, processes, system load via SNMP

Solution: Install and configure Net-SNMP

1. Install Net-SNMP
  • if installing from source, the configuration file snmpd.conf will go into /usr/local/share/snmp
  • by default there is no configuration file; it can be generated via the snmpconf Perl utility
2. Configure Net-SNMP by editing /usr/local/share/snmp/snmpd.conf

2a. Keep things simple with access control; the following entries can be defined (as opposed to more complicated com2sec, group etc.):

# rwuser: a SNMPv3 read-write user
# arguments: user [noauth|auth|priv] [restriction_oid]
rwuser topsecretv3

# rouser: a SNMPv3 read-only user
# arguments: user [noauth|auth|priv] [restriction_oid]
rouser topsecretv3_ro

# rocommunity: a SNMPv1/SNMPv2c read-only access community name
# arguments: community [default|hostname|network/bits] [oid]
rocommunity topsecret_ro

# rwcommunity: a SNMPv1/SNMPv2c read-write access community name
# arguments: community [default|hostname|network/bits] [oid]
rwcommunity topsecret

2b. Disk space can be monitored by adding entries to the 'disk' section. Example:

disk /
disk /boot
disk /usr

2c. Processes can be monitored by adding entries to the 'proc' section. Example:

proc java
proc postmaster
proc mysqld

2d. System load can be monitored by adding entries to the 'load' section. Example:

load 5 5 5

2e. The EXAMPLE.conf file in the source directory shows more capabilities of the SNMP agent (you can run executables/scripts and return one line of output and an exit code)

3. Start up the SNMP daemon (agent) by running /usr/local/sbin/snmpd. If you want snmpd to start up automatically at boot time, add the line '/usr/local/sbin/snmpd' to /etc/rc.d/rc.local on Red Hat systems, or equivalent on other flavors of Unix

3a. The agent logs to /var/log/snmpd.log (for more detailed debugging info, start the agent with the -D flag)

4. On the SNMP monitoring host, use snmpget to query the SNMP agent running on the target host. The trick here is to know which OIDs to use when you query the agent.


Get available disk space for / on the target host:

snmpget -v 1 -c "community" target_name_or_ip .

(this will return available disk space for the first entry in the 'disk' section of snmpd.conf; replace 1 with n for the nth entry)

Get the number of java processes running on the target host:

snmpget -v 1 -c "community" target_name_or_ip .

(replace 1 at the end with n for the nth entry in the 'proc' section)
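When scripting these queries, the index substitution is easy to capture in a helper. The base OIDs below are my assumption of the usual UCD-SNMP-MIB columns (dskAvail and prCount); double-check them against the MIB files shipped with your Net-SNMP build:

```python
# Assumed UCD-SNMP-MIB base OIDs; verify against your installed MIB files.
DSK_AVAIL = '.1.3.6.1.4.1.2021.9.1.7'  # available space, per 'disk' entry
PR_COUNT = '.1.3.6.1.4.1.2021.2.1.5'   # process count, per 'proc' entry

def table_oid(base_oid, n):
    """Return the OID for the nth (1-based) entry of a UCD-SNMP table
    column, ready to pass to snmpget."""
    return '%s.%d' % (base_oid, n)
```

For example, table_oid(DSK_AVAIL, 2) yields the OID for the second 'disk' entry in snmpd.conf (/boot in the configuration above).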

Get the 1-minute system load on the target host:

snmpget -v 1 -c "community" target_name_or_ip .

Get the 5-minute system load on the target host:

snmpget -v 1 -c "community" target_name_or_ip .

Get the 15-minute system load on the target host:

snmpget -v 1 -c "community" target_name_or_ip .

Get various CPU utilization metrics on the target host via snmpwalk:

snmpwalk -v 1 -c "community" target_name_or_ip .

Sample output:

UCD-SNMP-MIB::ssIndex.0 = INTEGER: 1
UCD-SNMP-MIB::ssErrorName.0 = STRING: systemStats
UCD-SNMP-MIB::ssSwapOut.0 = INTEGER: 0
UCD-SNMP-MIB::ssIOReceive.0 = INTEGER: 5
UCD-SNMP-MIB::ssSysInterrupts.0 = INTEGER: 5
UCD-SNMP-MIB::ssSysContext.0 = INTEGER: 8
UCD-SNMP-MIB::ssCpuUser.0 = INTEGER: 0
UCD-SNMP-MIB::ssCpuSystem.0 = INTEGER: 0
UCD-SNMP-MIB::ssCpuIdle.0 = INTEGER: 99
UCD-SNMP-MIB::ssCpuRawUser.0 = Counter32: 1007102
UCD-SNMP-MIB::ssCpuRawNice.0 = Counter32: 3879
UCD-SNMP-MIB::ssCpuRawSystem.0 = Counter32: 544737
UCD-SNMP-MIB::ssCpuRawIdle.0 = Counter32: 238396576

To retrieve a specific metric, for example the number of interrupts, you would do:

snmpget -v 1 -c "community" target_name_or_ip .

(we append 7.0 to the OID that we used in snmpwalk, because ssSysInterrupts is the 7th variable in the snmpwalk output)
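Counting the variable's position by hand is error-prone, so here is a small sketch of my own that derives the suffix from the snmpwalk output (it assumes the agent returns every column of the table, in order):

```python
def oid_suffix(walk_output, name):
    """Given snmpwalk output and a variable name such as 'ssSysInterrupts',
    return the 'n.0' suffix to append to the base OID for snmpget."""
    for position, line in enumerate(walk_output.strip().splitlines(), 1):
        # Lines look like: UCD-SNMP-MIB::ssSysInterrupts.0 = INTEGER: 5
        if line.split(' = ')[0].endswith('::%s.0' % name):
            return '%d.0' % position
    raise ValueError('%s not found in snmpwalk output' % name)
```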

Get various memory utilization metrics on the target host via snmpwalk:

snmpwalk -v 1 -c "community" target_name_or_ip .

Sample output:

UCD-SNMP-MIB::memIndex.0 = INTEGER: 0
UCD-SNMP-MIB::memErrorName.0 = STRING: swap
UCD-SNMP-MIB::memTotalSwap.0 = INTEGER: 2048276
UCD-SNMP-MIB::memAvailSwap.0 = INTEGER: 2005604
UCD-SNMP-MIB::memTotalReal.0 = INTEGER: 998560
UCD-SNMP-MIB::memAvailReal.0 = INTEGER: 89896
UCD-SNMP-MIB::memTotalFree.0 = INTEGER: 2095500
UCD-SNMP-MIB::memMinimumSwap.0 = INTEGER: 16000
UCD-SNMP-MIB::memShared.0 = INTEGER: 0
UCD-SNMP-MIB::memBuffer.0 = INTEGER: 234884
UCD-SNMP-MIB::memCached.0 = INTEGER: 459016
UCD-SNMP-MIB::memSwapError.0 = INTEGER: 0
UCD-SNMP-MIB::memSwapErrorMsg.0 = STRING:

To retrieve a specific metric, for example the amount of available swap space, you would do:

snmpget -v 1 -c "community" target_name_or_ip .

(we append 4.0 to the OID that we used in snmpwalk, because memAvailSwap is the 4th variable in the snmpwalk output)

Note: for CPU and memory stats, you don't need to add any special directives in the snmpd.conf configuration file

Mini HOWTO #1: chroot-ed FTP with wu-ftpd

Scenario: We have an Apache server whose DocumentRoot directory is /var/www/html. We have wu-ftpd running as the FTP server.

Goal: We want developers to be able to access /var/www/html via ftp, but we want to grant access only to that directory and below.

Solution: Set up a chroot-ed ftp environment

1. Create special 'ftpuser' user and group:

useradd ftpuser

2. Change the /etc/passwd entry for user ftpuser to the following (the 505 uid/gid is only an example; keep the values your system assigned):

ftpuser:x:505:505::/var/www/html/./:/sbin/nologin

(note the dot after the chroot directory)

3. Add /sbin/nologin to /etc/shells.

4. Create and set permissions on the following directories under the chroot directory:

cd /var/www/html
mkdir -p bin dev etc usr/lib
chmod 0555 bin dev etc usr/lib

5. Copy ls and more binaries to the bin subdirectory:

cp /bin/ls bin
cp /bin/more bin
chmod 111 bin/ls bin/more

Also copy to usr/lib all libraries needed by ls and more. Run "ldd /bin/ls" and "ldd /bin/more" to see which shared libraries you need to copy.

6. Create special "zero" file in dev subdirectory:

cd dev
mknod -m 666 zero c 1 5
chown root.mem zero

7. Create bare-bones passwd and group files in etc subdirectory:
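These files only need enough entries for ls to map ids to names inside the chroot; a sketch of the two files follows (the 505 uid/gid is hypothetical — use ftpuser's actual ids from the system /etc/passwd):

```
--- etc/passwd ---
ftpuser:x:505:505::/:/sbin/nologin

--- etc/group ---
ftpuser:x:505:
```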





8. Edit /etc/ftpaccess and add following lines:

class all real,guest *

guestgroup ftpuser

chmod no guest,anonymous
umask no guest,anonymous
delete no anonymous
overwrite no anonymous
rename no anonymous

upload /var/www/html / yes root ftpuser 0664 dirs

9. Change group (via chgrp) for files under /var/www/html to ftpuser
  • also change permissions to 775 for directories and 664 for files
  • but be careful to exclude the bin, dev, etc and usr subdirectories

10. Modify httpd.conf so that access to special subdirectories is not allowed:

<Directory /var/www/html/bin>
order deny,allow
deny from all
</Directory>

<Directory /var/www/html/dev>
order deny,allow
deny from all
</Directory>

<Directory /var/www/html/etc>
order deny,allow
deny from all
</Directory>

<Directory /var/www/html/usr>
order deny,allow
deny from all
</Directory>

11. Restart Apache and wu-ftpd

12. Test by ftp-ing as user ftpuser
  • Verify that you can upload/delete files in /var/www/html and subdirectories
  • Verify that you can't access files outside of /var/www/html and subdirectories

System administration and security mini HOWTOs

Over the years I kept notes on how to do various sysadmin/security-related tasks. I thought it might be a good idea to post some of them on this blog, both for my own reference and for other folks who might be interested. The first "Mini HOWTO" post will be on setting up a chroot-ed FTP environment with wu-ftpd.

Friday, October 07, 2005

Configuring OpenLDAP as a replacement for NIS

Here's a step-by-step tutorial on installing OpenLDAP on a Red Hat Linux system and configuring it as a replacement for NIS. In a future blog post I intend to cover the python-ldap package.

Install OpenLDAP
# tar xvfz openldap-stable-20050429.tgz
# cd openldap-2.2.26
# ./configure
# make
# make install

Configure and run the OpenLDAP server process slapd
  • In what follows, the LDAP domain is 'myldap'
  • Change the slapd root password:

[root@myhost openldap]# slappasswd
New password:
Re-enter new password:
  • Edit /usr/local/etc/openldap/slapd.conf

    • Change my-domain to myldap

    • Point 'directory' entry to /usr/local/var/openldap-data/myldap

    • Point 'rootpw' entry to line obtained via slappasswd: 'rootpw {SSHA}dYjrA1-JukrfESe/8b1HdZWfcToVE/cC'

    • Add following lines after 'include /usr/local/etc/openldap/schema/core.schema' line:

include         /usr/local/etc/openldap/schema/cosine.schema
include /usr/local/etc/openldap/schema/inetorgperson.schema
include /usr/local/etc/openldap/schema/misc.schema
include /usr/local/etc/openldap/schema/nis.schema
include /usr/local/etc/openldap/schema/openldap.schema
  • Create data directory:

mkdir /usr/local/var/openldap-data/myldap
  • Start up the slapd server:

/usr/local/libexec/slapd
  • Test slapd by running an ldap search:

# ldapsearch -x -b '' -s base '(objectclass=*)' namingContexts
# extended LDIF
# LDAPv3
# base <> with scope base
# filter: (objectclass=*)
# requesting: namingContexts

namingContexts: dc=myldap,dc=com

# search result
search: 2
result: 0 Success

# numResponses: 2
# numEntries: 1

Populate the LDAP database
  • Create /usr/local/var/openldap-data/myldap/myldap.ldif LDIF file:

dn: dc=myldap,dc=com
objectclass: dcObject
objectclass: organization
o: My LDAP Domain
dc: myldap

dn: cn=Manager,dc=myldap,dc=com
objectclass: organizationalRole
cn: Manager
  • Add LDIF contents to the LDAP database via ldapadd:

# ldapadd -x -D "cn=Manager,dc=myldap,dc=com" -W -f myldap.ldif
Enter LDAP Password:
adding new entry "dc=myldap,dc=com"

adding new entry "cn=Manager,dc=myldap,dc=com"
  • Verify that the entries were added by doing an LDAP search:

[root@myhost myldap]# ldapsearch -x -b 'dc=myldap,dc=com' '(objectclass=*)'
# extended LDIF
# LDAPv3
# base <dc=myldap,dc=com> with scope sub
# filter: (objectclass=*)
# requesting: ALL

dn: dc=myldap,dc=com
objectClass: dcObject
objectClass: organization
o: My LDAP Domain
dc: myldap

# Manager,
dn: cn=Manager,dc=myldap,dc=com
objectClass: organizationalRole
cn: Manager

# search result
search: 2
result: 0 Success

# numResponses: 3
# numEntries: 2
  • Sample LDIF file with organizational unit info: /usr/local/var/openldap-data/myldap/myldap_ou.ldif

dn: ou=Sales,dc=myldap,dc=com
ou: Sales
objectClass: top
objectClass: organizationalUnit
description: Members of Sales

dn: ou=Engineering,dc=myldap,dc=com
ou: Engineering
objectClass: top
objectClass: organizationalUnit
description: Members of Engineering
  • Add contents of LDIF file to LDAP database via ldapadd:

[root@myhost myldap]# ldapadd -x -D "cn=Manager,dc=myldap,dc=com" -W -f myldap_ou.ldif
Enter LDAP Password:
adding new entry "ou=Sales,dc=myldap,dc=com"

adding new entry "ou=Engineering,dc=myldap,dc=com"
  • Sample LDIF file with user info: /usr/local/var/openldap-data/myldap/myldap_user.ldif

dn: cn=Larry Fine,ou=Sales,dc=myldap,dc=com
ou: Sales
o: myldap
cn: Larry Fine
objectClass: top
objectClass: person
objectClass: organizationalPerson
objectClass: inetOrgPerson
givenname: Larry
sn: Fine
uid: larry
homePostalAddress: 15 Cherry Ln.$Plano TX 78888
postalAddress: 215 Fitzhugh Ave.
l: Dallas
st: TX
postalcode: 75226
telephoneNumber: (800)555-1212
homePhone: 800-555-1313
facsimileTelephoneNumber: 800-555-1414
userPassword: larrysecret
title: Account Executive
destinationindicator: /bios/images/lfine.jpg

dn: cn=Moe Howard,ou=Sales,dc=myldap,dc=com
ou: Sales
o: myldap
cn: Moe Howard
objectClass: top
objectClass: person
objectClass: organizationalPerson
objectClass: inetOrgPerson
givenname: Moe
sn: Howard
uid: moe
initials: Bob
homePostalAddress: 16 Cherry Ln.$Plano TX 78888
postalAddress: 216 South Fitzhugh Ave.
l: Dallas
st: TX
postalcode: 75226
pager: 800-555-1319
homePhone: 800-555-1313
telephoneNumber: (800)555-1213
mobile: 800-555-1318
title: Manager of Product Development
facsimileTelephoneNumber: 800-555-3318
manager: cn=Larry Howard,ou=Sales,dc=myldap,dc=com
userPassword: moesecret
destinationindicator: /bios/images/mhoward.jpg

dn: cn=Curley Howard,ou=Engineering,dc=myldap,dc=com
ou: Engineering
o: myldap
cn: Curley Howard
objectClass: top
objectClass: person
objectClass: organizationalPerson
objectClass: inetOrgPerson
givenname: Curley
sn: Howard
uid: curley
initials: Joe
homePostalAddress: 14 Cherry Ln.$Plano TX 78888
postalAddress: 2908 Greenville Ave.
l: Dallas
st: TX
postalcode: 75206
pager: 800-555-1319
homePhone: 800-555-1313
telephoneNumber: (800)555-1214
mobile: 800-555-1318
title: Development Engineer
facsimileTelephoneNumber: 800-555-3318
userPassword: curleysecret
destinationindicator: /bios/images/choward.jpg
  • Add contents of LDIF file to LDAP database via ldapadd:

[root@myhost myldap]# ldapadd -x -D "cn=Manager,dc=myldap,dc=com" -W -f myldap_user.ldif
Enter LDAP Password:
adding new entry "cn=Larry Fine,ou=Sales,dc=myldap,dc=com"

adding new entry "cn=Moe Howard,ou=Sales,dc=myldap,dc=com"

adding new entry "cn=Curley Howard,ou=Engineering,dc=myldap,dc=com"
  • Verify entries were added by doing an LDAP search:

[root@myhost myldap]# ldapsearch -x -b 'dc=myldap,dc=com' '(objectclass=*)'
  • Search output should end with:

# search result
search: 2
result: 0 Success

# numResponses: 8
# numEntries: 7
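Before feeding an LDIF file to ldapadd, it can be handy to sanity-check it programmatically. Here is a minimal LDIF parser sketch in Python (stdlib only); it deliberately ignores line continuations, comments, and base64-encoded values, and the sample data mirrors the myldap_ou.ldif file above:

```python
def parse_ldif(text):
    """Split LDIF text into entries: a list of dicts mapping
    attribute names to lists of values. Ignores line continuations,
    comments, and base64 values for brevity."""
    entries = []
    entry = {}
    for line in text.splitlines():
        line = line.rstrip()
        if not line:                      # a blank line ends an entry
            if entry:
                entries.append(entry)
                entry = {}
            continue
        if line.startswith('#'):          # skip LDIF comments
            continue
        attr, _, value = line.partition(': ')
        entry.setdefault(attr, []).append(value)
    if entry:                             # flush the last entry
        entries.append(entry)
    return entries

sample = """\
dn: ou=Sales,dc=myldap,dc=com
ou: Sales
objectClass: top
objectClass: organizationalUnit

dn: ou=Engineering,dc=myldap,dc=com
ou: Engineering
objectClass: top
objectClass: organizationalUnit
"""

entries = parse_ldif(sample)
print(len(entries))                       # 2
print(entries[0]['dn'][0])                # ou=Sales,dc=myldap,dc=com
```

A real validator would also check required attributes per objectClass, but even this much catches missing blank-line separators between entries.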

Replace NIS with LDAP

Generate ldif files from /etc/passwd and /etc/group and add them to the LDAP database
  • Generate ldif file for creating 'people' and 'group' organizational units:

  • Edit /usr/local/var/openldap-data/myldap/myldap_people.ldif:

dn: ou=people,dc=myldap,dc=com
objectclass: organizationalUnit
ou: people

dn: ou=group,dc=myldap,dc=com
objectclass: organizationalUnit
ou: group
  • Insert contents of myldap_people.ldif into the LDAP database:

# ldapadd -x -D "cn=Manager,dc=myldap,dc=com" -W -f myldap_people.ldif
  • Download the MigrationTools package from padl.com and extract it:

# tar xvfz MigrationTools.tgz
# cd MigrationTools-46/
  • Edit migrate_common.ph and specify the following setting:

$DEFAULT_BASE = "dc=myldap,dc=com";
  • Generate passwd.ldif and group.ldif files:

[root@myhost MigrationTools-46]# ./migrate_passwd.pl /etc/passwd /usr/local/var/openldap-data/myldap/myldap_passwd.ldif
[root@myhost MigrationTools-46]# ./migrate_group.pl /etc/group /usr/local/var/openldap-data/myldap/myldap_group.ldif
  • Insert contents of myldap_passwd.ldif and myldap_group.ldif into the LDAP database:

# ldapadd -x -D "cn=Manager,dc=myldap,dc=com" -W -f myldap_passwd.ldif
# ldapadd -x -D "cn=Manager,dc=myldap,dc=com" -W -f myldap_group.ldif
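For the curious, the mapping the migration scripts perform is straightforward. This is a rough Python sketch of what happens to a single /etc/passwd line: it becomes a posixAccount LDIF entry under ou=People. The base DN matches the examples in this post; everything else is a simplified assumption (the real PADL scripts also handle shadow data, GECOS splitting, and more):

```python
BASE_DN = "dc=myldap,dc=com"   # matches the examples in this post

def passwd_to_ldif(line):
    """Map one /etc/passwd line to a posixAccount LDIF entry (simplified)."""
    name, _pw, uid, gid, gecos, home, shell = line.strip().split(':')
    dn = f"uid={name},ou=People,{BASE_DN}"
    return "\n".join([
        f"dn: {dn}",
        f"uid: {name}",
        f"cn: {gecos or name}",
        "objectClass: account",
        "objectClass: posixAccount",
        "objectClass: top",
        f"uidNumber: {uid}",
        f"gidNumber: {gid}",
        f"homeDirectory: {home}",
        f"loginShell: {shell}",
    ])

print(passwd_to_ldif("larry:x:500:500:Larry Fine:/home/larry:/bin/bash"))
```

In practice, stick with the MigrationTools scripts; this sketch is only meant to demystify the passwd.ldif output they produce.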
Install the pam_ldap and nss_ldap modules
  • Install pam_ldap.tgz:

# tar xvfz pam_ldap.tgz
# cd pam_ldap-180
# ./configure
# make
# make install
  • Install nss_ldap.tgz:

# tar xvfz nss_ldap.tgz
# cd nss_ldap-243/
# ./configure --enable-rfc2307bis
# make
# make install
  • (See NOTE below before doing this) Edit /etc/ldap.conf (note that there's also /etc/openldap/ldap.conf; you need the one in /etc) and specify the following settings:

base dc=myldap,dc=com
scope sub
timelimit 30
pam_filter objectclass=posixAccount
nss_base_passwd ou=People,dc=myldap,dc=com?one
nss_base_shadow ou=People,dc=myldap,dc=com?one
nss_base_group ou=Group,dc=myldap,dc=com?one
  • (See NOTE below before doing this) Edit /etc/nsswitch.conf and specify:

passwd:     files ldap
shadow: files ldap
group: files ldap

NOTE: Instead of manually modifying /etc/ldap.conf and /etc/nsswitch.conf, you should run the authconfig utility and specify the LDAP server IP and the LDAP base DN ('dc=myldap,dc=com' in our example). authconfig will automatically modify /etc/ldap.conf (minus the nss_base entries), /etc/nsswitch.conf and also /etc/pam.d/system-auth. This is how /etc/pam.d/system-auth looks on a RHEL 4 system after running authconfig:

# This file is auto-generated.
# User changes will be destroyed the next time authconfig is run.
auth        required      /lib/security/$ISA/pam_env.so
auth        sufficient    /lib/security/$ISA/pam_unix.so likeauth nullok
auth        sufficient    /lib/security/$ISA/pam_ldap.so use_first_pass
auth        required      /lib/security/$ISA/pam_deny.so

account     required      /lib/security/$ISA/pam_unix.so broken_shadow
account     sufficient    /lib/security/$ISA/pam_succeed_if.so uid < 100 quiet
account     [default=bad success=ok user_unknown=ignore] /lib/security/$ISA/pam_ldap.so
account     required      /lib/security/$ISA/pam_permit.so

password    requisite     /lib/security/$ISA/pam_cracklib.so retry=3
password    sufficient    /lib/security/$ISA/pam_unix.so nullok use_authtok md5 shadow
password    sufficient    /lib/security/$ISA/pam_ldap.so use_authtok
password    required      /lib/security/$ISA/pam_deny.so

session     required      /lib/security/$ISA/pam_limits.so
session     required      /lib/security/$ISA/pam_unix.so
session     optional      /lib/security/$ISA/pam_ldap.so
Test the LDAP installation with an LDAP-only user

  • Add new user in LDAP database which doesn't exist in /etc/passwd; create /usr/local/var/openldap-data/myldap/myldap_myuser.ldif file:

dn: uid=myuser,ou=People,dc=myldap,dc=com
uid: myuser
cn: myuser
objectClass: account
objectClass: posixAccount
objectClass: top
objectClass: shadowAccount
userPassword: secret
shadowLastChange: 13063
shadowMax: 99999
shadowWarning: 7
loginShell: /bin/bash
uidNumber: 500
gidNumber: 500
homeDirectory: /home/myuser

dn: cn=myuser,ou=Group,dc=myldap,dc=com
objectClass: posixGroup
objectClass: top
cn: myuser
userPassword: {crypt}x
gidNumber: 500
  • Add contents of the myldap_myuser.ldif file to LDAP database via ldapadd:

# ldapadd -x -D "cn=Manager,dc=myldap,dc=com" -W -f myldap_myuser.ldif
  • Create /home/myuser directory and change permissions:

# mkdir /home/myuser
# chown myuser.myuser /home/myuser
  • Change the password for user 'myuser' via ldappasswd:

ldappasswd -x -D "cn=Manager,dc=myldap,dc=com" -W -S "uid=myuser,ou=People,dc=myldap,dc=com"
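After ldappasswd runs, the cleartext userPassword from the LDIF is replaced by a hashed value, by default in OpenLDAP's {SSHA} (salted SHA-1) scheme: base64(SHA1(password + salt) + salt). The sketch below shows how such a value is built and verified, purely for illustration; in practice let ldappasswd or slappasswd do this for you:

```python
import base64
import hashlib
import os

def make_ssha(password, salt=None):
    """Build an {SSHA} userPassword value: base64(SHA1(pw + salt) + salt)."""
    salt = salt if salt is not None else os.urandom(4)
    digest = hashlib.sha1(password.encode() + salt).digest()
    return "{SSHA}" + base64.b64encode(digest + salt).decode()

def check_ssha(password, ssha_value):
    """Verify a password against an {SSHA} value, the way slapd does."""
    raw = base64.b64decode(ssha_value[len("{SSHA}"):])
    digest, salt = raw[:20], raw[20:]     # SHA-1 digests are 20 bytes
    return hashlib.sha1(password.encode() + salt).digest() == digest

hashed = make_ssha("secret")
print(hashed)                             # e.g. {SSHA}<base64 blob>
print(check_ssha("secret", hashed))       # True
```

The random salt is why two users with the same password get different userPassword values in the directory.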
  • Log in from a remote system via ssh as user myuser; everything should work fine

Adding another host to the myldap LDAP domain
  • On any client machine that you want to join the myldap LDAP domain

    • Make sure the OpenLDAP client package is installed (from source or RPM)

    • Install the nss_ldap and pam_ldap packages

    • Run authconfig and indicate the LDAP server and the LDAP base DN

    • In a terminal console, try to su as user myuser (which doesn't exist locally); it should work

      • To avoid the "home directory not found" message, you'll also need to NFS-mount the home directory of user myuser from the LDAP server

    • Restart sshd and try to ssh from a remote machine as user myuser; it should work (it didn't work in my case until I restarted sshd)

Various notes
  • At this point, you can maintain a central repository of user accounts by adding/deleting/modifying them on the LDAP server machine via various LDAP client utilities such as ldapadd/ldapdelete/ldapmodify
    • For example, to delete user myuser and group myuser, you can run:
# ldapdelete -x -D "cn=Manager,dc=myldap,dc=com" -W 'uid=myuser,ou=People,dc=myldap,dc=com'
# ldapdelete -x -D "cn=Manager,dc=myldap,dc=com" -W 'cn=myuser,ou=Group,dc=myldap,dc=com'
  • I experimented with various ACL entries in slapd.conf in order to allow users to change their own passwords via 'passwd'; however, I was unable to find the proper ACL incantations for doing this (if anybody has a recipe for this, please leave a comment)
  • To properly secure the LDAP communication between clients and the LDAP server, you should enable SSL/TLS (see this HOWTO)
Here are some links I found very useful:

OpenLDAP Quick Start Guide
YoLinux LDAP Tutorial
Linux LDAP Authentication article
LDAP for Rocket Scientists -- an Open Source guide
Paranoid Penguin - Authenticate with LDAP, Part III
Turn your world LDAP-tastic -- blog entry by Ed Dumbill
