Archive for the ‘ Software ’ Category

SiteMesh and Content Management @ O’Reilly OpenSource Conference

I’m talking at the O’Reilly OpenSource Conference (OSCON) – Wednesday Aug 3, Portland, Oregon.

Come and say hi.

A problem faced in every web application is how to separate style from content. SiteMesh is a framework that provides an elegant solution to this, resulting in a clean separation that is straightforward to work with, complements other web frameworks, and is easily applied to existing applications.

The first part of this session introduces SiteMesh, including an overview of the architecture and patterns, comparisons with other approaches, and how it can complement existing web frameworks (such as WebWork, Spring, and Struts).

The second part of this session demonstrates how SiteMesh can be blended with other technologies to form the foundation of a rich content management system that distinguishes between the specialized roles of users, their skills, and the most suitable tools. Content writers can use a word processor, web designers can use a WYSIWYG web development tool, and developers can use their IDE.

Allowing these different roles and tools to come together to produce one website is a trivial task with SiteMesh–allowing content management to be easily introduced to existing applications.

Finally, some of the advanced features of SiteMesh are discussed, such as real world tips and tricks, how to create custom strategies for which look and feel to apply, assembling pages from components and building portal style applications.

And for the first time, new features in SiteMesh 3 will be demonstrated, including extending the HTML processor, using it outside of SiteMesh, and offline support.

http://conferences.oreillynet.com/os2005/

XStream 1.1.2 released. Java 5 Enums, JavaBeans, field aliasing, StAX, and more…

New features:

  • Java 5 Enum support.
  • JavaBeanConverter for serialization using getters and setters.
  • Aliasing of fields.
  • StAX integration, with namespaces.
  • Improved support on JDK 1.3 and IBM JDK.

Changelog:
http://xstream.codehaus.org/changes.html

Full download:
http://dist.codehaus.org/xstream/distributions/xstream-1.1.2.zip

Jar only:
http://dist.codehaus.org/xstream/jars/xstream-1.1.2.jar

VB.Net is the bestest

I was happily coding away in VB.Net today (grrr) when I noticed a little weirdity in the intellisense popup.

stupid.gif

Documentation says:

The NotOverridable modifier defines a method of a base class that cannot be overridden in derived classes. All methods are NotOverridable unless marked with the Overridable modifier. You can use the NotOverridable modifier when you do not want to allow an overridden method to be overridden again in a derived class.

Makes C++ look simple.

XStream 1.1.1 released

I’m pleased to announce the release of XStream 1.1.1 – the powerful, yet easy to use Java to XML serialization library.

Some of the improvements in this release:

  • Converters can be registered with a priority, allowing more generic filters to handle classes that don’t have more specific converters.
  • Converters can now access underlying HierarchicalStreamReader/Writer implementations to make implementation specific calls.
  • Improved support for classes using ObjectInputFields and ObjectInputValidation to follow the serialization specification.
  • Default ClassLoader may be changed using XStream.setClassLoader().
  • Loads of bugfixes and performance enhancements.

Full change log: http://xstream.codehaus.org/changes.html
Download: http://xstream.codehaus.org/download.html

Accessing generic type information at runtime

A common misconception about generics in Java 5 is that you can’t access them at runtime.

What you can’t find out at runtime is which generic type is associated with an instance of an object. However you can use reflection to look at which types have been staticly associated with a member of a class.

public class GenericsTest extends TestCase {

class Thing {
public Map stuff;
}

public void test() throws Exception {
Field field = Thing.class.getField("stuff");
ParameterizedType type = (ParameterizedType) field.getGenericType();
assertEquals(Map.class, type.getRawType());
assertEquals(String.class, type.getActualTypeArguments()[0]);
assertEquals(Integer.class, type.getActualTypeArguments()[1]);
}

}

Just wanted to clear that up.

(This is something that I’ll probably exploit in XStream for J5 users to further simplify the XML.)

XStream 1.1 released

I’m pleased to announce the release of XStream 1.1.

New features include:

  • Improved support for serializing objects following the Java Serialization Specification:
    • Calls custom serialization methods, readObject(), writeObject(), readResolve() and writeReplace() in class, if defined.
    • Supports ObjectInputStream.getFields() and ObjectOutputStream.putFields() in custom serialization.
  • Provides implementations of ObjectInputStream and ObjectOutputStream, allowing drop in replacements for standard serialization,
    including support for streams of objects. [More…]
  • Reads and writes directly to most XML Java APIs: DOM, DOM4J, JDOM, XOM, Electric XML, StAX, Trax (write only), SAX (write only).
    [More…]

View the complete change log and
download.

JUnit tip: Setting the default timezone with a TestDecorator

A problem I was finding when testing XStream is that many of the tests were timezone dependent. Initially I had some code in each setUp()/tearDown() method to set the default timezone and reset it afterwards.

This lead to a lot of duplication. Putting the common code in a super class was an option but this lead to a fragile base class.

Using composition allows this commonality to be put in one case and applied to the relevant test cases. This gets out of the ‘single inheritance’ issue.

JUnit provides an (often overlooked) decorator class to help with this. Here’s my TimeZoneTestSuite:

  public class TimeZoneTestSuite extends TestDecorator {

private final TimeZone timeZone;
private final TimeZone originalTimeZone;

public TimeZoneTestSuite(String timeZone, Test test) {
super(test);
this.timeZone = TimeZone.getTimeZone(timeZone);
this.originalTimeZone = TimeZone.getDefault();
}

public void run(TestResult testResult) {
try {
TimeZone.setDefault(timeZone);
super.run(testResult);
} finally {
TimeZone.setDefault(originalTimeZone); // cleanup
}
}

}

To use it, you need to override the default test suite behavior by adding a method to your TestCase:

  public class MyTest extends TestCase {
public static Test suite() {
Test result = new TestSuite(MyTest.class); // default behavior
result = new TimeZoneTestSuite("EST", result); // ensure it runs in EST timezone
return result;
}

TestDecorators are a very powerful feature of JUnit – don’t forget about them.

XStream: how to serialize objects to non XML formats

As you know, XStream makes it easy to serialize objects to XML:

Person person = ...;
xstream.toXML(out);

Producing:

Joe
Walnes

123
433535


4545
4534

I often use this approach whilst debugging to dump out the contents of an object. It works, but my eyes just aren’t that good at parsing XML.

By creating an alternative writer implementation, XStream can be used to serialize objects in other formats:

Person person = ...;
xstream.marshal(stuff, new AnAlternativeWriter(out));

Producing the slightly more digestible:

com.blah.Person
firstName = Joe
lastName = Walnes
homePhone
areaCode = 123
number = 433535
cellPhone
areaCode = 4545
number = 4534

To gain roundtrip serialization/deserialization support in alternative formats to XML, you need to provide your own implementations of both HierarchicalStreamWriter and HierarchicalStreamReader.

Is 100% test coverage a BAD thing?

I’m a huuuge advocate of TDD and high test coverage, and I will often go to great lengths to ensure this, but is 100% such a good thing?

I recently heard Tim Lister talking about risk in software projects and the CMM (powerpoint slides).

The ‘ultimate’ level of CMM ensure that everything is documented, everything goes through a rigorous procedure, blah blah blah. Amusingly, Tim pointed out that no CEO in their right mind would ever want their organization to be like that as they would not be effectively managing risk. You only need this extra stuff when you actually need this extra stuff. If there’s little risk, then this added process adds a lot of cost with no real value – you’re just pissing away money.

This also applies for test coverage. There are always going to be untested parts of your system but when increasing the coverage you have to balance the cost with the value.

With test coverage, you get the value of higher quality software that’s easier to change, but it follows the Law of diminishing returns. The effort required to get from 99% to 100% is huge… couldn’t that be spent on something more valuable like adding business functionality or simplifying the system?

Personally, I’m most comfortable with coverage in the 80-90% region, but your mileage may vary.

Looking back at the SiteMesh HTML parser

Before talking about how the new SiteMesh HTML processor works (to be released in SiteMesh 3), I thought I’d write a bit about how the current parser has evolved since it’s first attempt in 1999 – purely in the interest of nostalgia.

The original version used a bunch of regular expressions to extract the necessary chunks of text from the document. This was easy to get running, but very error prone as the matches had no context about where they were in a document. For example, a <title> element in a <head> block is very important to SiteMesh, however sometimes they appear elsewhere, such as in a comment, <script> or <xml> block.

This was dumped, in favour of a DOM based parser, which initially used JTidy to convert HTML to XHTML so it could be traversed as a standard DOM tree. Much nicer, but very slooow. Too slow, so I switched to OpenXML, an XML parser that was tolerant to nasty HTML, giving a slight boost to performance. I was much happier with OpenXML – even though it still added a fair amount of overhead and rewrote bits of HTML that I didn’t want it to.

Annoyingly, not long after that, the OpenXML project merged with the IBM XML4J parser project, rebranded itself as the mighty Apache Xerces and promptly dropped support for HTML parsing. So now I was dependant on a library that no longer existed.

By this time, SiteMesh had been open-sourced, and along came Victor Salaman, who was the third user to discover it (after Mike Cannon-Brookes and Joseph Ottinger). He saw the potential but hated the parser. About three hours later, he’d produced his own version that used low-level string manipulation. It wasn’t pretty, but it went like the clappers – twelve times faster than the OpenXML one, with the bonus feature of not rewriting great chunks of the document. This brought SiteMesh into the mainstream as it was now ready for use on high-traffic sites. 1.0 was released.

This parser really is the core of SiteMesh. It’s been our friend thanks to its speed and reliability. It’s been our enemy because of it’s awkwardness to understand and change. For a couple of years it remained barely untouched, except when we occasionally poked at it from afar with a long pointy stick for the odd change. Three years later, Chris Miller and Hani Suleiman took the plunge and gave its guts an overhaul – making it six times faster! Very brave.

Despite its awkwardness, it proudly lived on and is still the primary ingredient of SiteMesh today. It’s even been ported to VB.Net!

I’ve kept my eye on other HTML parsers, such as HotSAX, NekoHTML and TagSoup, always with the intention of implementing an easier to maintain parser, but I just couldn’t get the performance to be anything like what Victor, Chris and Hani achieved.

The problem is that most HTML parsers try to represent an HTML document as tree of nodes, like XML. This makes sense as that’s what HTML is meant to be, however, to do this, every single tag in a document must be analysed and balanced accordingly. This is hard, error-prone and adds a lot of overhead.

There’s another approach though. The new parser focusses on ease-of-use and ability to customize, without compromising on performance and robustness. I hope you’ll like it…

Update: Sorry, I forget to mention Hani in the original posting of this. how could I forget!