Friday, 6 July 2007

Setting up the Saxon Servlet

I found very little documentation on how to implement the Saxon Servlet and get it working, so here goes.
1. Install Tomcat - I have Tomcat v6 already installed. It was the basic installation.
2. In webapps, add a directory called xslt.
3. Copy the files to the following locations:
xslt
  |
  WEB-INF
    web.xml
    |
    classes
      SaxonServlet.class
    |
    lib
      Saxon8.jar + other Saxon jars

web.xml consists of the following lines:
<web-app>
  <display-name>Saxon Servlet Example</display-name>
  <description>
    Saxon Servlet Example
  </description>
  <servlet>
    <servlet-name>SaxonServlet</servlet-name>
    <servlet-class>SaxonServlet</servlet-class>
  </servlet>
  <servlet-mapping>
    <servlet-name>SaxonServlet</servlet-name>
    <url-pattern>/SaxonServlet</url-pattern>
  </servlet-mapping>
</web-app>


Then use the following URL to call the transformation:
http://localhost:8080/xslt/SaxonServlet?source=books.xml&style=books.xsl

and to clear the stylesheet cache:
http://localhost:8080/xslt/SaxonServlet?clear-stylesheet-cache=yes
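
If the file names are not fixed, the two URLs above can be built rather than typed. The following is only a minimal sketch in Python: the base URL assumes Tomcat on its default port with the webapp directory named xslt, and the helper names transform_url and clear_cache_url are my own, not part of Saxon. The parameter names source, style and clear-stylesheet-cache are the ones used in the URLs above.

```python
from urllib.parse import urlencode

# Assumption: Tomcat on its default port 8080 and a webapp directory named "xslt".
BASE_URL = "http://localhost:8080/xslt/SaxonServlet"


def transform_url(source, style, base_url=BASE_URL):
    """Return the URL that asks the servlet to transform `source` with `style`."""
    # urlencode escapes characters such as spaces that are not allowed in a query string.
    return base_url + "?" + urlencode({"source": source, "style": style})


def clear_cache_url(base_url=BASE_URL):
    """Return the URL that asks the servlet to clear its stylesheet cache."""
    return base_url + "?" + urlencode({"clear-stylesheet-cache": "yes"})


if __name__ == "__main__":
    print(transform_url("books.xml", "books.xsl"))
    print(clear_cache_url())
```

Building the query string with urlencode mainly helps when a file name contains spaces or other characters that need escaping.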

Thursday, 5 July 2007

XSLT Generating lists in InDesign Interchange

Lists within the INX format are wrapped in a <pcnt> element. Each list item is generated by a carriage return; there are no child elements to define a list item. This becomes an issue when processing inline elements from the source. The way I have overcome this is a two-stage process: the template for a list applies templates in a given mode and holds the result in a variable, and this variable is then transformed in another mode. (Applying templates directly to the variable like this relies on XSLT 2.0, which Saxon 8 supports; XSLT 1.0 would need a node-set() extension function.)
<xsl:template match="ul">
  <xsl:variable name="list">
    <xsl:apply-templates mode="inline"/>
  </xsl:variable>
  <xsl:apply-templates select="$list" mode="out"/>
</xsl:template>
Without this method the result contains elements for each list item.
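
To make the target shape concrete, here is a rough sketch in Python rather than XSLT. It is not the stylesheet itself: the sample markup is made up, the function name flatten_list is hypothetical, inline markup is simply dropped (the real "inline" mode would presumably map it to INX), and using U+2029 as the item separator is an assumption. It only shows the idea of flattening each item first and then joining the items with a break instead of wrapping each one in an element.

```python
import xml.etree.ElementTree as ET

# Assumption: items are separated by the Unicode paragraph separator.
PARAGRAPH_BREAK = "\u2029"


def flatten_list(ul):
    """Flatten a <ul> element into one string with a break between items."""
    # Stage 1: reduce each <li> to its text, dropping any inline elements.
    items = ["".join(li.itertext()).strip() for li in ul.findall("li")]
    # Stage 2: join the items with a break rather than one element per item.
    return PARAGRAPH_BREAK.join(items)


# Made-up sample input with one inline element.
source = ET.fromstring(
    "<ul><li>Take <b>one</b> tablet</li><li>Swallow whole</li></ul>"
)
flat = flatten_list(source)
```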

Wednesday, 4 July 2007

InDesign Interchange Format

InDesign CS2 provides an export format called Interchange (INX). This produces an XML document which can be used for rolling InDesign documents back to previous versions. Adobe's documentation states that this format should only be used for backward compatibility, because the INX schema may change in future releases. Even so, a tinkering guy like myself can't help but notice that if we generate an INX document from an XSL transformation on raw data, we have a means of generating unique InDesign documents instead of using XML feeds.

As a proof of concept I constructed an InDesign document based upon a PIL from a well-known brand of hay-fever tablets. I then exported this as INX and, using the exported INX as a guide, attempted to produce an XSLT to take data from KMS and run out a formatted InDesign document. This works!

A few things are worth noting:

ACE processing instructions: these insert characters disallowed in a well-formed XML document - see http://partners.adobe.com/public/developer/indesign/sdk/explodedSDK/cs.01/docs/references/api/TextChar_8h-source.html for a full listing

aid processing instructions: these are markers and can usually be ignored when producing an INX using XSLT

Text nodes - each text node is prefixed with c_ - this indicates that the element content is a string.

Lists - an INX list is contained within a <pcnt> element and each item is separated by a paragraph break

Hidden InDesign Characters

An interesting aspect I have come across when trying to introduce InDesign into workflows is the use of "hidden characters" in the exported XML and InDesign Interchange formats to denote line and paragraph breaks. Viewed as UTF-8 they are completely hidden; switching to an ANSI view displays them as "â€©" (that's without the speech marks).
This can cause issues when using an XSL transformation to generate data for use within InDesign. Without these hidden characters all paragraph and line breaks cease to exist in the document.
The initial trick I used to get around this was to include a named template thus:
<xsl:template name="break">
<xsl:text>
</xsl:text>
</xsl:template>
This was called each time a paragraph break was needed.
More recently I have found that using the character references &#x2029; (for a paragraph break) and &#x2028; (for a line break) does the same trick.
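
For anyone wanting to check which character is which, a small Python sketch follows. In Unicode, U+2028 is named LINE SEPARATOR and U+2029 PARAGRAPH SEPARATOR. The helper name as_ansi_view is my own, and treating "ANSI" as Windows-1252 is an assumption about the editor in use.

```python
LINE_SEPARATOR = "\u2028"       # Unicode LINE SEPARATOR
PARAGRAPH_SEPARATOR = "\u2029"  # Unicode PARAGRAPH SEPARATOR


def as_ansi_view(text):
    """Show how the UTF-8 bytes of `text` look when misread as Windows-1252."""
    # Assumption: the "ANSI" view is the Windows-1252 code page.
    return text.encode("utf-8").decode("cp1252")


if __name__ == "__main__":
    for name, char in (("line", LINE_SEPARATOR), ("paragraph", PARAGRAPH_SEPARATOR)):
        print(name, char.encode("utf-8").hex(), as_ansi_view(char))
```

Each character is three bytes in UTF-8, which is why a single invisible break turns into three visible characters in the ANSI view.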