Gone sailing

It’s time to unplug for a while and calm down, to disconnect not only from the Web but also from daily life. In case you need me during the coming week, I’ll be sailing along the Spanish coast with no mobile or WiFi coverage, sorry. I have several drafts of blog posts waiting patiently to be polished, so be prepared for something new when I return with my mental batteries recharged.

Why it’s good to be lazy

I recently presented my talk on Functional Programming with Python at the RuPy conference in Poznań; the slides (this time in English) are available below. The organizers promise that video recordings of all talks will be published shortly, and I will keep you informed when that happens.

Tracing and profiling Ruby code

Every child knows that premature optimization is the root of all evil, and that even when optimization is necessary we should concentrate on the bottlenecks. This is where profiling becomes crucial. Ruby includes a simple profiler in the standard library, so to generate a report of program execution you just have to invoke it with ruby -r profile or add require "profile" to the code. In fact the whole profiler is implemented in only 59 lines of Ruby and relies on the set_trace_func method to register a callback that traces certain events during program execution (method calls and returns in this case). This tool should suffice for simple profiling, but if you need something faster and more powerful you should try ruby-prof instead.

The powerful introspection features of dynamic languages make tricks like this not only possible, but also straightforward. This gives me the idea that the same approach could be used to implement an aspect-oriented library for Ruby, but I’m almost sure somebody has already tried this.
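To illustrate the callback-based tracing described above, here is a minimal sketch in Python rather than Ruby. It is not the Ruby profile library: it uses Python's sys.setprofile as a rough analogue of set_trace_func, and it only counts calls instead of timing them. The names profile_calls and fib are made up for the example.

```python
import sys
from collections import defaultdict


def profile_calls(func, *args, **kwargs):
    """Run func and return ({function name: call count}, result)."""
    counts = defaultdict(int)

    def tracer(frame, event, arg):
        # "call" is reported each time a Python function is entered.
        if event == "call":
            counts[frame.f_code.co_name] += 1

    sys.setprofile(tracer)
    try:
        result = func(*args, **kwargs)
    finally:
        sys.setprofile(None)  # always unregister the callback
    return dict(counts), result


def fib(n):
    # Deliberately naive, so that there are many calls to count.
    return n if n < 2 else fib(n - 1) + fib(n - 2)


counts, value = profile_calls(fib, 10)
print(counts, value)
```

A real profiler would also record timestamps on the call and return events, which is where most of the extra lines go.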

Thinner Ruby deployment

In the post on benchmarking HTTP performance I mentioned that according to my tests a cluster of Mongrels performs about 10-20% worse than the same number of FastCGI processes behind a reverse proxy. Recently I tried Thin, a new web server based on Mongrel libraries, and it turns out to be a solution that gets the best of both worlds. It is very easy to set up and manage (even easier than Mongrel), extremely flexible (mostly thanks to Rack) and really fast. It matches FastCGI in performance, without all the quirks, and can communicate through UNIX sockets too. I have to admit that I was impressed by the simple and clean design of Thin (which is based on existing quality modules). The only disadvantage is that it isn’t very mature yet, but in the near future Thin might become the best server for deploying Ruby web applications.

Benchmarking HTTP performance

Deployment of Rails applications is a subject that tends to provoke heated discussions and many misunderstandings. That’s why I decided to try different deployment strategies and check for myself how they perform.

To make any reasonable comparison it is crucial to measure the performance of different configurations. The most common metric is the number of requests processed per second (RPS). This metric (and many others) can be measured by HTTP benchmarking tools like ab and httperf.

The first tool, ab, comes bundled with Apache and is very easy to use, so it is a good option to start with. You can provide the total number of requests to perform (-n) and the number of concurrent requests (-c). If you like, you can also cap the total time spent benchmarking, in seconds (-t). Note that -t limits the whole run, not the wait for a single response (newer versions of ab have a separate -s option for that), so keep an eye on the reported response times too, as real users won’t wait more than a few seconds for a page to load.

For example, to issue 1000 requests with a concurrency of 100 you might run the following (remember the trailing slash in the URL, ab requires it)

% ab -n 1000 -c 100 http://www.example.com/

httperf is a slightly more complex tool with more features. The most important are the ability to issue multiple requests per connection (the --num-calls command line option) and support for replaying sessions that imitate real use cases. The tool is also believed to be more robust and to give more reliable results. Basic use might look like

% httperf --server www.example.com --num-conns 1000 \
          --num-calls 10 --rate 10

This will open 1000 connections at a rate of ten new connections per second (and no more), passing ten requests through each connection before it is closed, so the total number of requests will be 10000. Be sure to remember the distinction between connections and requests, otherwise interpreting the results can get confusing. Another tricky part is the actual meaning of the --rate command line option. The rate is not the number of simultaneous connections at a given time (like concurrency in ab), but the number of new connections made per second. This means your RPS cannot exceed the given rate multiplied by the number of requests per connection (here 10 × 10 = 100), so httperf has to be run multiple times with increasing rates to find the saturation point of the server.[1]
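The arithmetic above is easy to get wrong, so here is a small Python sketch that spells it out. The function names and the list of rates to sweep are hypothetical; the script only builds httperf command lines and does not run them.

```python
def httperf_plan(num_conns, num_calls, rate):
    """Return (total requests, upper bound on RPS, minimum run time in seconds)."""
    total_requests = num_conns * num_calls
    max_rps = rate * num_calls              # new connections/s times requests per connection
    min_duration = num_conns / float(rate)  # time needed just to open all connections
    return total_requests, max_rps, min_duration


def httperf_command(server, num_conns, num_calls, rate):
    """Build (but do not run) an httperf command line."""
    return ("httperf --server %s --num-conns %d --num-calls %d --rate %d"
            % (server, num_conns, num_calls, rate))


# The example from the text: 1000 connections, 10 requests each, 10 connections/s.
total, max_rps, duration = httperf_plan(1000, 10, 10)
print(total, max_rps, duration)

# A hypothetical sweep of increasing rates to look for the saturation point.
commands = [httperf_command("www.example.com", 1000, 10, r) for r in (10, 20, 40, 80)]
for command in commands:
    print(command)
```

If the measured RPS stays close to the upper bound for a given rate, the server is probably not saturated yet and the rate can be raised.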

When benchmarking HTTP performance don’t just accept the first results blindly. Think for a minute about what you are actually measuring. Check the status of the replies: if most requests fail, it is a sign that something is wrong, and if you are getting 3xx redirects you should probably test the URL the redirects point to instead. If many requests have timed out, the concurrency you requested might be too high.
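As a rough illustration of these sanity checks, the Python sketch below groups a list of reply status codes by class and flags the two situations mentioned above. The function name, the sample data and the "more than half failed" threshold are assumptions made for the example, not something ab or httperf provides.

```python
from collections import Counter


def summarize_statuses(statuses):
    """Group HTTP status codes by class and collect warnings about suspicious runs."""
    classes = Counter("%dxx" % (code // 100) for code in statuses)
    total = len(statuses)
    warnings = []
    failed = classes["4xx"] + classes["5xx"]
    if total and failed * 2 > total:  # assumed threshold: more than half failed
        warnings.append("most requests failed")
    if classes["3xx"]:
        warnings.append("redirects seen: benchmark the target URL instead")
    return dict(classes), warnings


# Made-up sample: seven successes, two redirects, one server error.
classes, warnings = summarize_statuses([200] * 7 + [302] * 2 + [500])
print(classes, warnings)
```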

Never perform such tests from your desktop machine far away from the server. In a perfect world you should run the benchmark from an independent machine in the same network segment as the server, and make sure the network is not saturated during the test. If you have to run the tests on the local machine, remember that the load caused by the test itself can skew the results (note that in my experience ab causes considerably less load than httperf).

Finally, consider where the URL you provided points to. If this is a static page or file, you can easily achieve thousands of RPS, as the performance is bounded mostly by disk operations. On the other hand, if you measure a dynamic page running multiple SQL queries you might get very low results, as the database will be the bottleneck. Many recommend benchmarking a simple dynamic “hello world” application that doesn’t communicate with the database. But if you want to measure the performance of the application rather than the web server, you can measure and compare different URLs.

In my benchmarks I found that three Mongrel instances load-balanced by Pound are about 10-20% slower than three static[2] FastCGI processes running under a vanilla Apache installation. This is probably because the front-end server communicates with Mongrels through TCP connections, which are considerably slower than the UNIX sockets used by FastCGI. On the other hand this architecture makes scaling Mongrels easier, because one load balancer can proxy requests to multiple machines.

It looks like there are reasonable arguments for both strategies, and I find it a bit surprising that the whole Rails community is voting against FastCGI, calling it a legacy solution. It’s true that FastCGI can be tricky to set up correctly, but at the end of the day it performs better, and there are other benchmarks showing similar results (as shown on this chart).

[1] More information on good HTTP benchmarking practices and the usage of httperf can be found in the Linux HTTP Benchmarking HOWTO.

[2] Never use dynamic FastCGI processes for production purposes. Dynamic processes are killed when unused and due to timing issues users can get internal server errors. Moreover every request assigned to a fresh process is delayed, as it has to wait for the new process to boot.

Subversion Scripts for Finder

Subversion is one of the basic tools in my daily work. I know, distributed version control is more en vogue these days, but I would argue that for personal use and small teams Subversion is still a reasonable choice[1]: it is very popular and flexible, and there are many additional tools available.

About a year ago, when I started to play with my first Mac, I was looking for Subversion tools that integrate smoothly with Finder, the standard Mac OS X file manager. To my surprise I couldn’t find anything useful, only SCPlugin, which didn’t work at that time and as far as I know is still somewhat buggy. So I decided to write my own set of scripts, as an excuse to play with AppleScript, a funny high-level scripting language that can speak with Mac applications (including Finder) over simple interfaces called dictionaries, which unsurprisingly consist of nouns (objects) and verbs (methods). This custom set of scripts has been so useful to me (especially when invoked from Quicksilver) that I decided to release it publicly, starting a small open source project.

Recently I’ve released version 1.2 of the scripts, including support for Copy, Move and Checkout operations, with improved Leopard and MacPorts support. The release was also a good excuse to make some adjustments to the project page and publish a screencast showing how to use the scripts. Judging from the download statistics and the feedback I get, people find the project useful, so if you are a Mac user consider giving it a try!

[1] But not for projects with many independent branches of development: branching and merging suck in Subversion (this will be improved in Subversion 1.5, which is now in beta). Linus’ critical opinion on Subversion is well known, and I don’t claim it is the best choice for large open source projects (though many such projects use it).

Trivial accessors and uniform access

Some tend to think that Java is a synonym for object orientation done right, and some don’t even know of any alternatives. But it has always seemed unnatural to me that most Java classes start their existence with plenty of boilerplate code like this[1]

public class Money {
    private double amount;
 
    public double getAmount() {
        return this.amount;
    }
 
    public void setAmount(double amount) {
        this.amount = amount;
    }
}

This is a lot of code to write just to define one single property, and the code is mostly meaningless. But in Java you have to introduce getters and setters from the very beginning, or it will come back to bite you in the future. This clearly contradicts the DRY principle and the preference for evolutionary design, which discourages writing code that is useless right now but may (or may not) be needed in the future. Things get even worse when such code is created automatically by some code generation tool.

In theory your methods should always have some meaningful behaviour, and you should avoid trivial accessors in the public interface. This is a good rule of thumb, and when it’s broken, it is often a symptom of some wrong design decisions. In many practical situations, though, you simply need trivial accessors without any behaviour, for example when mapping relational databases to objects[2].

The whole problem boils down to the fact that in Java you can’t apply the Uniform Access Principle, which states that users of a class shouldn’t care whether a given service is implemented through storage (property) or computation (method). But the syntax for accessing a property and calling a method in Java is completely different, and you can’t start with a simple public property and change it into a method later when it becomes necessary, keeping the public interface intact. So you are told not to use public properties at all and always define trivial accessors just in case.

I would like to contrast this approach with two dynamic languages, Python and Ruby, each presenting a different point of view on the problem we discuss.

In Ruby, which was inspired by Smalltalk, properties (instance variables) are always private, and the only way to interact with an object is by sending messages to it. This is similar to a method call, but the meaning is slightly different, and there are certain conventions to make the syntax nicer. You can’t access instance variables from outside the class, so the following code

cash = Money.new
cash.amount = 10
puts cash.amount

is actually the same as sending messages amount= and amount to the instance, which can be written explicitly as

cash = Money.new
cash.amount=(10)
puts cash.amount()

This means that Ruby has a uniform syntax for attribute access, but you still have to write message handling methods inside the class. This is where attr_accessor comes in handy (along with its siblings attr_reader and attr_writer), avoiding duplication and making the code terser. The following piece of code

class Money
  attr_accessor :amount
end

has the same effect as

class Money
  def amount
    @amount
  end
 
  def amount=(value)
    @amount = value
  end
end

When the class evolves and we would like to make the accessors more complex (for example to implement lazy loading or caching) we can replace attr_accessor with real methods, keeping the external interface intact.

Python takes a different approach from the message passing metaphor. It publicly exposes all attributes of an instance as slots you can access freely. Such a slot can hold any object (in the Pythonic sense of the word), including standard objects (integers, tuples, etc.) and methods. The client simply fetches the object from the slot and either invokes it (if it is a callable) or uses its value directly, so the access is not uniform.
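A short sketch may make the difference visible. The Invoice class and its attribute names are invented for this example; the point is only that the caller has to know whether a slot holds a plain value or a callable.

```python
class Invoice(object):
    def __init__(self):
        self.amount = 10  # a slot holding a plain value

    def total(self):      # a slot holding a method
        return self.amount * 2


inv = Invoice()
stored = inv.amount     # storage: use the value directly
bound = inv.total       # fetching without calling yields a bound method object
computed = inv.total()  # computation: the client has to invoke it explicitly

print(stored, computed, callable(bound), callable(stored))
```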

To maintain an illusion of uniform access when refactoring a property into a method you can use the property() function, passing new getters and setters as arguments. This means you can start with a class as simple as

class Money(object):
    def __init__(self):
        self.amount = 0

Later, when you need some more complex accessors, you can refactor the class with property(), maintaining the same external interface

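class Money(object):
    def __init__(self):
        # The value lives in a separate backing attribute; storing it in
        # self.amount would call the property again and recurse forever.
        self._amount = 0

    def _get_amount(self):
        # Getter code here
        return self._amount

    def _set_amount(self, value):
        # Setter code here
        self._amount = value

    amount = property(_get_amount, _set_amount)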

As you can see, the code is not as clear as in Ruby, and there are some other problems with this approach, but it is possible to maintain uniform access in Python.

I believe obeying the Uniform Access Principle is the right way of solving the accessor problem, and both Ruby and Python handle this quite well. If you see trivial getProperty() and setProperty() methods in Python or Ruby code, beware. This probably means the code has been written by a programmer who is unable to change his mindset.
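The same refactoring is often written with property used as a decorator, which some find easier to read. The sketch below assumes a backing attribute named _amount and a made-up validation rule (no negative amounts) standing in for the more complex accessor logic; neither comes from the original class.

```python
class Money(object):
    def __init__(self):
        self._amount = 0  # backing attribute (hypothetical name)

    @property
    def amount(self):
        # Getter: could compute, cache or lazily load the value.
        return self._amount

    @amount.setter
    def amount(self, value):
        # Setter: an example rule, assumed only for illustration.
        if value < 0:
            raise ValueError("amount must not be negative")
        self._amount = value


cash = Money()
cash.amount = 10    # same syntax as with a plain attribute
print(cash.amount)
```

Client code written against the original plain attribute keeps working unchanged, which is the whole point of uniform access.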

[1] To convince you this is not a fake example I did a quick search on the Web, finding this piece of code.

[2] Martin Fowler, on page 155 of his PEAA book, gives an example of a class that maps a simple person table. He writes that it starts with data fields and accessors, and then gives an example of over twenty lines of boilerplate accessor code.

PS. Thanks to Tomek for reviewing the first draft of this article.

Sharing knowledge inside a team

What I like about being a programmer is that you have to constantly learn new things: either new languages, tools and frameworks that make your job easier (and more fun), or interesting theoretical concepts that stretch your mind, a kind of mental yoga. As a math graduate I can tell you that even if this knowledge is not instantly useful, it will probably pay off in the future.

There are different ways for programmers to gather knowledge, and most have something to do with reading. But, as skilled craftsmen have known for centuries, the best way to learn is from a person who is willing to share his real-life experiences, tricks and habits. For example, skimming through a cryptic Vim reference sheet or reading even the best tutorial is very different from seeing the actual usage patterns of a skilled user.

That’s why every week at Code Sprinters we organize Tech Talks, meetings during which one person speaks about a topic of his (or her) interest. These are not official presentations with slides, but rather discussions around the whiteboard, sometimes turning into a hands-on workshop. The subject doesn’t have to be purely technical: a few weeks ago I spoke about Getting Things Done, a way of organizing myself that I use and find very useful. We don’t claim that we are experts on a given subject, but certainly each of us knows something that might be interesting to others.

I think holding such meetings is a great (and quite easy) way to spread the knowledge inside a team or a small company, and it’s also fun!

Functional programming in Python

Below you can find the slides from my talk on functional programming in Python, which I presented a few days ago at the first meeting of Pythonistas in Kraków (the slides are in Polish only, sorry). Feel free to leave your comments!

Update 2008-04-24: Slides are also available in English.

Agile goes underground

About a year ago I had an idea to run an informal Open Space event here in Kraków, to gather a group of people interested in Agile methods of software development. At that time there was little or no interest, so my motivation slowly decreased. But a few weeks ago I met with Kuba, who had a very similar idea some time ago, and we decided to give it a try. After a few days of quick (and quite dirty) preparations we are proud to invite you to the first iteration of Agile Underground, an event which will take place on Feb 28th here in Kraków.