How my backflip went…

Well, I’m still alive :)

This is a brief entry as I’m currently enjoying a two week vacation in Thailand. I’ll give you the full details when I get back (assuming my inbox hasn’t burst by then).

I achieved a single back handspring, and eventually three back handsprings. I failed to do a complete back somersault after four attempts. Two out of three ain’t bad.

I’m happy though. Thanks to an amazing number of sponsors, I’ve raised 1400 GBP (approx 2500 USD), which will be donated to autistic research.

I’ve got some photos and videos to upload when I get back. Jez has also uploaded some.

Backflippin’ in 4 hours.

Gulp. Six weeks went quickly.

The story so far.

If anyone has a video camera (or one of those digital cameras that can record video snippets), please bring it along… I’m having a bit of a camera shortage.

Is 100% test coverage a BAD thing?

I’m a huuuge advocate of TDD and high test coverage, and I will often go to great lengths to ensure this, but is 100% such a good thing?

I recently heard Tim Lister talking about risk in software projects and the CMM (PowerPoint slides).

The ‘ultimate’ level of CMM ensures that everything is documented, everything goes through a rigorous procedure, blah blah blah. Amusingly, Tim pointed out that no CEO in their right mind would ever want their organization to be like that, as they would not be effectively managing risk. You only need this extra stuff when you actually need this extra stuff. If there’s little risk, then this added process adds a lot of cost with no real value – you’re just pissing away money.

This also applies to test coverage. There are always going to be untested parts of your system, but when increasing coverage you have to balance the cost against the value.

With test coverage, you get the value of higher-quality software that’s easier to change, but it follows the law of diminishing returns. The effort required to get from 99% to 100% is huge… couldn’t that effort be spent on something more valuable, like adding business functionality or simplifying the system?

Personally, I’m most comfortable with coverage in the 80-90% region, but your mileage may vary.

Looking back at the SiteMesh HTML parser

Before talking about how the new SiteMesh HTML processor works (to be released in SiteMesh 3), I thought I’d write a bit about how the current parser has evolved since its first attempt in 1999 – purely in the interest of nostalgia.

The original version used a bunch of regular expressions to extract the necessary chunks of text from the document. This was easy to get running, but very error-prone, as the matches had no context about where they were in a document. For example, a <title> element in a <head> block is very important to SiteMesh; however, titles sometimes appear elsewhere, such as in a comment, <script> or <xml> block.
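As a rough illustration of the problem (a sketch, not the original SiteMesh code), a naive regular expression happily matches a title that only exists inside a comment:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NaiveTitleExtractor {
    public static void main(String[] args) {
        String html = "<html><head>"
                + "<!-- <title>Old draft title</title> -->"
                + "<title>Real title</title>"
                + "</head><body>...</body></html>";

        // The pattern knows nothing about comments, <script> or <xml> blocks,
        // so the first match is the commented-out title, not the real one.
        Matcher matcher = Pattern.compile("<title>(.*?)</title>").matcher(html);
        if (matcher.find()) {
            System.out.println(matcher.group(1)); // prints "Old draft title"
        }
    }
}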

This was dumped in favour of a DOM-based parser, which initially used JTidy to convert HTML to XHTML so it could be traversed as a standard DOM tree. Much nicer, but very slooow. Too slow, so I switched to OpenXML, an XML parser that was tolerant of nasty HTML, giving a slight boost to performance. I was much happier with OpenXML – even though it still added a fair amount of overhead and rewrote bits of HTML that I didn’t want it to.

Annoyingly, not long after that, the OpenXML project merged with the IBM XML4J parser project, rebranded itself as the mighty Apache Xerces and promptly dropped support for HTML parsing. So now I was dependent on a library that no longer existed.

By this time, SiteMesh had been open-sourced, and along came Victor Salaman, who was the third user to discover it (after Mike Cannon-Brookes and Joseph Ottinger). He saw the potential but hated the parser. About three hours later, he’d produced his own version that used low-level string manipulation. It wasn’t pretty, but it went like the clappers – twelve times faster than the OpenXML one, with the bonus feature of not rewriting great chunks of the document. This brought SiteMesh into the mainstream as it was now ready for use on high-traffic sites. 1.0 was released.

This parser really is the core of SiteMesh. It’s been our friend thanks to its speed and reliability. It’s been our enemy because of its awkwardness to understand and change. For a couple of years it remained largely untouched, except when we occasionally poked at it from afar with a long pointy stick for the odd change. Three years later, Chris Miller and Hani Suleiman took the plunge and gave its guts an overhaul – making it six times faster! Very brave.

Despite its awkwardness, it proudly lived on and is still the primary ingredient of SiteMesh today. It’s even been ported to VB.Net!

I’ve kept my eye on other HTML parsers, such as HotSAX, NekoHTML and TagSoup, always with the intention of implementing an easier-to-maintain parser, but I just couldn’t get the performance anywhere near what Victor, Chris and Hani achieved.

The problem is that most HTML parsers try to represent an HTML document as a tree of nodes, like XML. This makes sense, as that’s what HTML is meant to be; however, to do this every single tag in a document must be analysed and balanced accordingly. This is hard, error-prone and adds a lot of overhead.

There’s another approach, though. The new parser focuses on ease of use and the ability to customize, without compromising on performance or robustness. I hope you’ll like it…

Update: Sorry, I forgot to mention Hani in the original posting of this. How could I forget!

The road ahead for SiteMesh 3

Here’s an update of what’s in store for the upcoming SiteMesh releases and how they benefit you.

Firstly, there are a number of accumulated bugs that we’re steadily working our way through. The recent 2.1 and 2.2 releases have been mostly bugfixes, and this will continue for the 2.x series, including those related to using MVC frameworks such as Struts and WebWork.

Meanwhile, SiteMesh 3 has been brewing. It’s been four years since SiteMesh was first open-sourced (it existed for two years before that as closed-source) and in that time it hasn’t really changed significantly. SiteMesh 3 is going to see the largest set of improvements since it was initially released.

h3. Flexible HTML processing

The core of SiteMesh is based around an HTML parser that is very fast and tolerant of badly formed HTML, but at the cost of being extremely hard to extend.

SiteMesh 3 will contain a new parser, which is easy to customize, without compromising on performance and tolerance to malformed HTML. This will allow extensions to be written that can:

* Extract user-defined properties from the page beyond the predefined ones from <title>, <meta>, <content>, etc.
* Remove blocks of content from the page.
* Transform HTML as the page is parsed.

SiteMesh will come bundled with extensions for popular tasks and it will be trivial to add your own. More on this in a follow-up entry.
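The API hasn’t been finalized yet, so purely as a hypothetical illustration of the first bullet above (the class and method names below are invented, not the real SiteMesh 3 interface), a rule for capturing a user-defined property might look something like this:

import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch only - these names are invented, not the actual SiteMesh 3 API.
// The idea: a small rule object that the parser calls back as it streams through the
// page, capturing the body of a custom tag as a user-defined page property.
public class AuthorRule {

    private final Map<String, String> pageProperties;

    public AuthorRule(Map<String, String> pageProperties) {
        this.pageProperties = pageProperties;
    }

    // Imagined callback: invoked with a tag name and the text between the opening
    // and closing tags, e.g. <author>Joe Walnes</author>.
    public void tagBody(String tagName, String body) {
        if ("author".equals(tagName)) {
            pageProperties.put("author", body.trim());
        }
    }

    public static void main(String[] args) {
        Map<String, String> properties = new HashMap<String, String>();
        AuthorRule rule = new AuthorRule(properties);
        rule.tagBody("author", " Joe Walnes ");
        System.out.println(properties.get("author")); // prints "Joe Walnes"
    }
}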

h3. Improved Velocity integration

This follows on from some work done by Atlassian and will allow a page to be generated using the Velocity API as an alternative to calling Servlet RequestDispatchers and the Filter.

This offers significant performance improvements for applications that don’t use JSP and allows more of SiteMesh to be used in environments outside of the Servlet container, which leads nicely on to the next feature.

h3. Offline support with StaticMesh

There has been a lot of demand for using SiteMesh to generate web-sites offline. A common case is as a simpler alternative to DocBook-style tools, allowing documents to be authored in standard HTML-capable word-processing tools (such as MS Word, OpenOffice and Mozilla Composer), giving you the full capabilities of a rich-text word-processor without the need to learn a special markup/schema.

SiteMesh can then process these raw HTML files and generate another set of static HTML files with the appropriate presentation and navigation added.

Building upon the extended HTML processing capabilities, it will also be possible to do things like generate a table of contents, footnotes, and diagrams from inline syntax.

At least three separate incarnations of StaticMesh have appeared over the last few years. We hope to bring the best bits of each into the final version.

StaticMesh will have a simple API for configuration, bundled with a command-line wrapper and Ant task.

h2. Backwards compatibility

Just to ease your minds, you’re not going to have to rewrite your applications to use SiteMesh 3. Great effort will be taken to ensure that backwards compatibility is preserved. The library will have more features, but at the same time a lot of the old stuff can be simplified. Dependencies will be minimized and optional – for example, you will only need velocity.jar if you’re actually using the Velocity stuff.

I’ll be posting more information later.

Watch this space…

Joe’s Backflipping for Autistic Research – time is nearly up…

A reminder of my challenge:

I’ve had six weeks (I’m now on week five) to learn to do backflips to raise money for the International Autistic Research Organization.

This includes:

1. A single back handspring
2. Three back handsprings in a row
3. A single back somersault

Full details, including descriptions of the moves, can be found in my previous blog entry.

h2. When/Where?

I’m going to attempt this on Friday 8th October at 3pm. If you’re in London and want to come and watch, it’ll be at Finsbury Pavement – nearest tube Moorgate or Liverpool Street.

If you can’t make it, there will also be a video available on the net not long afterwards.

h2. To those who have sponsored me

I’ve been completely overwhelmed by the generosity of the sponsors. I genuinely never expected to raise this amount of money. I’ll reveal the total amount on the day. Of course, it’s now down to me to actually do the stuff so you have to pay up :).

So, once again, THANK YOU!

After the challenge, I’ll send out details of how to get the money to me.

h2. To those who haven’t sponsored me yet

There’s still time! Just email me saying how much you’ll sponsor for
1) A single back handspring
2) Three back handsprings in a row
3) A single back somersault

For example: “Joe, I will sponsor you 10, 20, 50” (or currency of your choosing).

Note that if I achieve more than one of these, you only pay the maximum amount.

h2. My progress

My shoulder is slightly injured and I’m still feeling dizzy from the time I landed on the ground head-first. Other than that it’s been going well. I learned the technique from some tutorials I found round the net, and have been practising every weekend on my bouncy castle.

The biggest problem I have at the moment is fear. No matter how many times I do it, I always get really nervous before launching and usually bottle out. I hope this doesn’t happen on the day.

For safety reasons, I’ve decided that I will attempt the complete back somersault (i.e. with no hands) with an inflatable mattress behind me to break my fall, rather than my neck. The back handsprings shall be on grass. I will also be wearing wrist supports.

SiteMesh 2.2 Released

I’ve just released SiteMesh 2.2. This release fixes a number of minor bugs. No code changes are required if migrating from 2.1.

The following improvements have been made:

* The <excludes> tag in decorators.xml now takes the ServletPath, PathInfo and QueryString into account when matching patterns (see the sketch after this list).
* Overhaul of the main Servlet Filter to remove unnecessary complexity and to more gracefully handle situations where calls on the ServletResponse, PrintWriter and ServletOutputStream occur in an awkward order.
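For anyone who hasn’t used excludes before, here’s a minimal decorators.xml sketch (the paths are made up for illustration); with this release, the patterns are evaluated taking the ServletPath, PathInfo and QueryString into account:

<decorators defaultdir="/decorators">
    <excludes>
        <pattern>/static/*</pattern>
        <pattern>/popup/*</pattern>
    </excludes>
    <decorator name="main" page="main.jsp">
        <pattern>/*</pattern>
    </decorator>
</decorators>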

Links:

* Sitemesh
* Release notes and changes
* Download

Stay tuned for news on the cool new features coming up in SiteMesh 3!

Advanced SiteMesh

Sunil Patil has written an article about SiteMesh for ONJava.com:

Read Advanced SiteMesh.

REST and FishEye

The latest version of FishEye now has a REST API (oh and an XML-RPC API too, which is less interesting).

I’m happy partly because FishEye is a great tool and this will allow integration with other tools – something we are seeing more and more of with its natural friends like JIRA, DamageControl and Confluence.

But what I’m really glad to see is that they made the decision to go with REST rather than just sticking with the usual suspects, SOAP or XML-RPC. Looking at the actual API for FishEye, REST seems like a no-brainer, as all of the calls are queries.

On the last system I worked on, we were struggling with SOAP and switched to a simpler REST approach. It had a number of benefits.

Firstly, it simplified things greatly. With REST there was no need for complicated SOAP libraries on either the client or the server – just a plain HTTP call. This reduced coupling and brittleness. We had previously lost hours (possibly days) tracing problems through libraries that were outside of our control.
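To show just how little is involved, here’s a minimal Java sketch of that kind of plain HTTP call (the URL is made up for illustration, not a real FishEye endpoint):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class RestQuery {
    public static void main(String[] args) throws Exception {
        // A plain GET against a hypothetical query URL - no SOAP toolkit,
        // no generated stubs, just the standard library.
        URL url = new URL("http://fisheye.example.com/api/rest/changesets?rep=myrepo");
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        connection.setRequestMethod("GET");

        BufferedReader reader = new BufferedReader(
                new InputStreamReader(connection.getInputStream()));
        String line;
        while ((line = reader.readLine()) != null) {
            System.out.println(line); // the response body (XML, plain text, CSV...)
        }
        reader.close();
    }
}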

Secondly, it improved scalability. Though this was not the reason we moved, it was a nice side-effect. The web-server, the client HTTP library and any HTTP proxy in between understood things like the difference between GET and POST, and whether a resource had been modified, so they could offer effective caching – greatly reducing the amount of traffic. This is why REST is a more scalable solution than XML-RPC or SOAP over HTTP.
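As a rough sketch of why this helps (again with a made-up URL): a client that remembers when it last fetched a resource can issue a conditional GET, and the server – or any proxy in between – can answer with a 304 Not Modified and no body at all:

import java.net.HttpURLConnection;
import java.net.URL;

public class ConditionalGet {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://fisheye.example.com/api/rest/changesets?rep=myrepo");
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();

        // Tell the server (or any proxy in between) when we last saw this resource.
        // setIfModifiedSince() takes milliseconds since the epoch.
        connection.setIfModifiedSince(System.currentTimeMillis() - 60 * 60 * 1000L);

        if (connection.getResponseCode() == HttpURLConnection.HTTP_NOT_MODIFIED) {
            System.out.println("Not modified - reuse the locally cached copy");
        } else {
            System.out.println("Changed - read the new response body");
        }
    }
}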

Thirdly, it reduced the payload over the wire. There was no need for SOAP envelope wrappers, and it gave us the flexibility to use formats other than XML for the actual resource data. For instance, a resource containing the body of an unformatted news headline is simpler to express as plain text, and a table of numbers is more concise (and readable) as CSV.

REST is something I hadn’t really considered until recently and is by no means a golden-hammer, but in our case it worked out better than SOAP and XML-RPC.

The power of closures in C# 2.0

Martin Fowler (obligatory Fowlbot namedrop) recently blogged about the power of closures in languages that support them. It’s worth remembering that C# 2.0 has true closure support in the form of anonymous delegates. This includes reading and modifying variables outside the context of the closure – unlike Java’s anonymous inner classes.

Just for kicks, I’ve rewritten all of Martin’s Ruby examples in C# 2.0. This makes use of the improved APIs in .NET 2.0 pointed out by Zohar.

Ruby:

def managers(emps)
  return emps.select {|e| e.isManager}
end

C# 2.0:

public List<Employee> Managers(List<Employee> emps) {
    return emps.FindAll(delegate(Employee e) {
        return e.IsManager;
    });
}

Ruby:

def highPaid(emps)
  threshold = 150
  return emps.select {|e| e.salary > threshold}
end

C# 2.0:

public List<Employee> HighPaid(List<Employee> emps) {
    int threshold = 150;
    return emps.FindAll(delegate(Employee e) {
        return e.Salary > threshold;
    });
}

Ruby:

def paidMore(amount)
  return Proc.new {|e| e.salary > amount}
end

C# 2.0:

public Predicate<Employee> PaidMore(int amount) {
    return delegate(Employee e) {
        return e.Salary > amount;
    };
}

Ruby:

highPaid = paidMore(150)
john = Employee.new
john.salary = 200
print highPaid.call(john)

C# 2.0:

Predicate<Employee> highPaid = PaidMore(150);
Employee john = new Employee();
john.Salary = 200;
Console.WriteLine(highPaid(john));

The difference in code between the languages isn’t that big. The C# 2.0 code is obviously longer (though not by a lot) because:

  • C# 2.0 is statically typed (let’s not get started on the static vs dynamic debate).
  • C# 2.0 requires the ‘delegate’ keyword.
  • Ruby allows you to omit the ‘return’ keyword.

You can try this stuff out yourself by playing with Visual C# Express.