Posts Tagged ‘ OpenSource ’

XStream 1.1.1 released

I’m pleased to announce the release of XStream 1.1.1 – the powerful, yet easy to use Java to XML serialization library.

Some of the improvements in this release:

  • Converters can be registered with a priority, allowing more generic filters to handle classes that don’t have more specific converters.
  • Converters can now access underlying HierarchicalStreamReader/Writer implementations to make implementation specific calls.
  • Improved support for classes using ObjectInputFields and ObjectInputValidation to follow the serialization specification.
  • Default ClassLoader may be changed using XStream.setClassLoader().
  • Loads of bugfixes and performance enhancements.

Full change log: http://xstream.codehaus.org/changes.html
Download: http://xstream.codehaus.org/download.html

XStream 1.1 released

I’m pleased to announce the release of XStream 1.1.

New features include:

  • Improved support for serializing objects following the Java Serialization Specification:
    • Calls custom serialization methods, readObject(), writeObject(), readResolve() and writeReplace() in class, if defined.
    • Supports ObjectInputStream.getFields() and ObjectOutputStream.putFields() in custom serialization.
  • Provides implementations of ObjectInputStream and ObjectOutputStream, allowing drop in replacements for standard serialization,
    including support for streams of objects. [More…]
  • Reads and writes directly to most XML Java APIs: DOM, DOM4J, JDOM, XOM, Electric XML, StAX, Trax (write only), SAX (write only).
    [More…]

View the complete change log and
download.

Looking back at the SiteMesh HTML parser

Before talking about how the new SiteMesh HTML processor works (to be released in SiteMesh 3), I thought I’d write a bit about how the current parser has evolved since it’s first attempt in 1999 – purely in the interest of nostalgia.

The original version used a bunch of regular expressions to extract the necessary chunks of text from the document. This was easy to get running, but very error prone as the matches had no context about where they were in a document. For example, a <title> element in a <head> block is very important to SiteMesh, however sometimes they appear elsewhere, such as in a comment, <script> or <xml> block.

This was dumped, in favour of a DOM based parser, which initially used JTidy to convert HTML to XHTML so it could be traversed as a standard DOM tree. Much nicer, but very slooow. Too slow, so I switched to OpenXML, an XML parser that was tolerant to nasty HTML, giving a slight boost to performance. I was much happier with OpenXML – even though it still added a fair amount of overhead and rewrote bits of HTML that I didn’t want it to.

Annoyingly, not long after that, the OpenXML project merged with the IBM XML4J parser project, rebranded itself as the mighty Apache Xerces and promptly dropped support for HTML parsing. So now I was dependant on a library that no longer existed.

By this time, SiteMesh had been open-sourced, and along came Victor Salaman, who was the third user to discover it (after Mike Cannon-Brookes and Joseph Ottinger). He saw the potential but hated the parser. About three hours later, he’d produced his own version that used low-level string manipulation. It wasn’t pretty, but it went like the clappers – twelve times faster than the OpenXML one, with the bonus feature of not rewriting great chunks of the document. This brought SiteMesh into the mainstream as it was now ready for use on high-traffic sites. 1.0 was released.

This parser really is the core of SiteMesh. It’s been our friend thanks to its speed and reliability. It’s been our enemy because of it’s awkwardness to understand and change. For a couple of years it remained barely untouched, except when we occasionally poked at it from afar with a long pointy stick for the odd change. Three years later, Chris Miller and Hani Suleiman took the plunge and gave its guts an overhaul – making it six times faster! Very brave.

Despite its awkwardness, it proudly lived on and is still the primary ingredient of SiteMesh today. It’s even been ported to VB.Net!

I’ve kept my eye on other HTML parsers, such as HotSAX, NekoHTML and TagSoup, always with the intention of implementing an easier to maintain parser, but I just couldn’t get the performance to be anything like what Victor, Chris and Hani achieved.

The problem is that most HTML parsers try to represent an HTML document as tree of nodes, like XML. This makes sense as that’s what HTML is meant to be, however, to do this, every single tag in a document must be analysed and balanced accordingly. This is hard, error-prone and adds a lot of overhead.

There’s another approach though. The new parser focusses on ease-of-use and ability to customize, without compromising on performance and robustness. I hope you’ll like it…

Update: Sorry, I forget to mention Hani in the original posting of this. how could I forget!

The road ahead for SiteMesh 3

Here’s an update of what’s in store for the upcoming SiteMesh releases and how they benefit you.

Firstly, there are a number of accumulated bugs that we’re steadily working our way through. The recent 2.1 and 2.2 releases have been mostly bugfixes, and this will continue for the 2.x series, including those related to using MVC frameworks such as Struts and WebWork.

Meanwhile, SiteMesh 3 has been brewing. It’s been four years since SiteMesh was first open-sourced (it existed for two years before that as closed-source) and in that time it hasn’t really changed significantly. SiteMesh 3 is going to see the largest set of improvements since it was initially released.

h3. Flexible HTML processing

The core of SiteMesh is based around an HTML parser that is very fast and tolerant to badly formed HTML, however at the cost of being extremely hard to extend.

SiteMesh 3 will contain a new parser, which is easy to customize, without compromising on performance and tolerance to malformed HTML. This will allow extensions to be written that can:

* Extract user-defined properties from the page beyond the predefined ones from <title>, <meta>, <content>, etc.
* Remove blocks of content from the page.
* Transform HTML as the page is parsed.

SiteMesh will come bundled with extensions for popular tasks and it will be trivial to add your own. More on this in a follow-up entry.

h3. Improved Velocity integration

This follows on from some work done by Atlassian and will allow a page to be generated using the Velocity API as an alternative to calling Servlet
RequestDispatchers and the Filter.

This offers significant performance improvements for applications that don’t use JSP and allows more of SiteMesh to be used in environments outside of the Servlet container, which leads nicely on to the next feature.

h3. Offline support with StaticMesh

There has been a lot of demand for using SiteMesh to generate web-sites in an offline manner. A common case for this is a simpler alternative to DocBook style tools, allowing documents to be authored in standard HTML capable word-processing tools (such as MS Word, OpenOffice and Mozilla Composer), giving you the full capabilities of a rich-text word-processor and without the need to learn a special markup/schema.

SiteMesh can then process these raw HTML files and generated another set of static HTML files with the appropriate presentation and navigation added.

Building upon the extended HTML processing capabilities, it will also be possible to do things like generate a table of contents, footnotes, and diagrams from inline syntax.

There have been at least three seperate incarnations of StaticMesh appear over the last few years. We hope to bring the best bits from each of these into the final version.

StaticMesh will have a simple API for configuration, bundled with a command-line wrapper and Ant task.

h2. Backwards compatability

Just to ease your minds, you’re not going to have to rewrite your applications to use SiteMesh 3. Great effort will be taken to ensure that backwards compatability is preserved. The library will have more features, but at the same time a lot of the old stuff can be simplified. Dependencies will be minimized and optional – for example, you will only need velocity.jar if you’re actually using the Velocity stuff.

I’ll be posting more information later.

Watch this space…

SiteMesh 2.2 Released

I’ve just released SiteMesh 2.2 This release fixes a number of minor bugs. No code changes are required if migrating from 2.1.

The following improvements have been made:

* The <excludes> tag in decorators.xml now takes into account ServletPath, PathInfo and QueryString.
* Overhaul of the main Servlet Filter to remove unnecessary complexity and more gracefully handle situations where the order of calls on the ServletResponse, PrintWriter and ServletOutputStream occur in an awkward order.

Links:

* Sitemesh
* Release notes and changes
* Download

Stay tuned for news on the cool new features coming up SiteMesh 3!…

XStream article on O’Reilly XML.com

http://www.xml.com/pub/a/2004/08/18/xstream.html

XStream roadmap: What do YOU want to see?

I’m assessing where the next version of XStream should go. There are three major chunks of work I’d like to tackle (described below), however I can only take on one at a time and need to prioritize.

What do you want to see? Leave your comments on my blog.

1. Full support for the Java serialization specification.

Current situation: XStream uses reflection to access an objects fields and convert them to and from XML.

Problem: While this works well with most custom objects, there are many classes (particularly in the standard java and javax packages) that rely on advanced features of the serialization API, by implementing the methods, readObject(), writeObject(), writeReplace() or the methods found in java.io.Externalizable.

Solution: The default Converter in XStream could be enhanced to support these features.

Outcome: XStream should be able to serialize any object that standard Java serialization can, without any modifications.

2. Improved JVM support.

Current situation: XStream uses reflection to instantiate objects when deserializing. Using plain reflection, this requires each object to have a default constructor, which XStream will call. An ‘enhanced-mode’ is also supported that uses JVM specific calls to construct objects without ever invoking a constructor, guaranteeing no side-effects.

Problem: Currently the ‘enhanced-mode’ is only available if using the Sun 1.4 JVM or later.

Solution: Enhance the enhanced mode to support more popular JVMs, including Sun 1.3, IBM 1.4, and JRockit.

Outcome: More users can use XStream without having to make intrusive modifications to their classes.

3. Configurable mappings.

Current situation: XStream was originally designed as a tool for transparent serialization of objects (specifically for persistence or to send across the wire in messaging systems). As such, the goal was to allow XStream to figure out what the XML should look like on the fly, without forcing the mappings to be defined for each class.

Problem: An unforseen popular usage of XStream is for assembling objects from XML configuration files. However, because XStream has been designed to just figure out the XML on the fly, it does not allow users to (easily) specify their own XML structure.

Solution: To add more simple to use ‘tweaks’ to the generated XML. For example, to allow attributes to be used in some elements, alias elements based on where they appear or remove redundant tags to make the XML easier to understand.

Outcome: More flexibility for the resulting XML, without forcing users to write custom converters.