Monday 2 December 2013

Accessing network drives in XSLT

Issue:

I have an XML document that lists several folders:

<folders>
 <folder>foo1</folder>
 <folder>foo2</folder>
 <folder>foo3</folder>
</folders>

and these are to be used to create three result documents in a shared folder on the network, using Saxon 9 as the transformation engine.

Resolution:

Using the simple template:

<xsl:template match="folder">
  <xsl:result-document href="{.}/new.xml" method="xml">
  <new>
   <folder/>
  </new>
  </xsl:result-document>
</xsl:template> 

the relative path in the href attribute can be replaced with an absolute one.

Either

Map the shared folder on the network drive to a local drive letter and use the following:

file:///w:/share/{.}/new.xml

Or write directly to the shared folder using its network (UNC) path, noting the four forward slashes:

file:////networkPC1234/share/{.}/new.xml
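A minimal sketch of the complete stylesheet, assuming Saxon 9 (XSLT 2.0) and a hypothetical share-root parameter, so that either form of path can be passed in without editing the template. Unlike the template above, it also writes the folder name into each result document, which is purely an illustrative choice.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical parameter: base URI of the share (UNC form by default) -->
  <xsl:param name="share-root" select="'file:////networkPC1234/share'"/>

  <xsl:template match="/folders">
    <xsl:apply-templates select="folder"/>
  </xsl:template>

  <!-- One result document per folder element, e.g. .../share/foo1/new.xml -->
  <xsl:template match="folder">
    <xsl:result-document href="{$share-root}/{.}/new.xml" method="xml" indent="yes">
      <new>
        <folder><xsl:value-of select="."/></folder>
      </new>
    </xsl:result-document>
  </xsl:template>

</xsl:stylesheet>
```

For example, the mapped-drive form could then be chosen on the Saxon command line with something like the following (the jar name varies with the Saxon version and edition):

java -jar saxon9he.jar -s:folders.xml -xsl:folders.xsl share-root=file:///w:/share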

Thursday 31 October 2013

XSD schema: Sequence of Optional Elements where one Element is allowed in any Position

A requirement recently arose where I needed a sequence of optional elements foo1, foo2, foo3, foo4, foo5, where an optional bar element was permitted between (or after) any of the foo{n} elements.

The following definition handles just such a case and also allows foo2 to occur multiple times. It shows foo1 to foo4; further elements follow the same pattern.

<xsd:element name="container">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:choice minOccurs="0" >
        <xsd:sequence>
          <xsd:element ref="foo1"/>
          <xsd:element ref="bar"  minOccurs="0" maxOccurs="unbounded"/>
        </xsd:sequence> 
      </xsd:choice>
      <xsd:choice minOccurs="0"  maxOccurs="unbounded">
        <xsd:sequence>
          <xsd:element ref="foo2"/>
          <xsd:element ref="bar"  minOccurs="0" maxOccurs="unbounded"/>
        </xsd:sequence> 
      </xsd:choice>
      <xsd:choice minOccurs="0" >
        <xsd:sequence>
          <xsd:element ref="foo3"/>
          <xsd:element ref="bar"  minOccurs="0" maxOccurs="unbounded"/>
        </xsd:sequence> 
      </xsd:choice>
      <xsd:choice minOccurs="0" >
        <xsd:sequence>
          <xsd:element ref="foo4"/>
          <xsd:element ref="bar"  minOccurs="0" maxOccurs="unbounded"/>
        </xsd:sequence> 
      </xsd:choice>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element> 
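Because an xsd:choice with only one branch behaves exactly like that branch, the same content model can be written with plain optional sequences. The sketch below does that and also adds the foo5 group from the requirement. The global declarations for foo1 to foo5 and bar, with string content, are placeholders assumed only to make the schema self-contained.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <xsd:element name="container">
    <xsd:complexType>
      <xsd:sequence>
        <!-- Each optional fooN group may be followed by any number of bar elements -->
        <xsd:sequence minOccurs="0">
          <xsd:element ref="foo1"/>
          <xsd:element ref="bar" minOccurs="0" maxOccurs="unbounded"/>
        </xsd:sequence>
        <!-- foo2 may repeat, each occurrence optionally followed by bar elements -->
        <xsd:sequence minOccurs="0" maxOccurs="unbounded">
          <xsd:element ref="foo2"/>
          <xsd:element ref="bar" minOccurs="0" maxOccurs="unbounded"/>
        </xsd:sequence>
        <xsd:sequence minOccurs="0">
          <xsd:element ref="foo3"/>
          <xsd:element ref="bar" minOccurs="0" maxOccurs="unbounded"/>
        </xsd:sequence>
        <xsd:sequence minOccurs="0">
          <xsd:element ref="foo4"/>
          <xsd:element ref="bar" minOccurs="0" maxOccurs="unbounded"/>
        </xsd:sequence>
        <xsd:sequence minOccurs="0">
          <xsd:element ref="foo5"/>
          <xsd:element ref="bar" minOccurs="0" maxOccurs="unbounded"/>
        </xsd:sequence>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>

  <!-- Placeholder declarations: string content is an assumption -->
  <xsd:element name="foo1" type="xsd:string"/>
  <xsd:element name="foo2" type="xsd:string"/>
  <xsd:element name="foo3" type="xsd:string"/>
  <xsd:element name="foo4" type="xsd:string"/>
  <xsd:element name="foo5" type="xsd:string"/>
  <xsd:element name="bar" type="xsd:string"/>

</xsd:schema>
```

With this schema an instance such as <container><foo1/><bar/><foo2/><foo2/><bar/><bar/><foo4/></container> should validate, whereas a bar before foo1, or a foo3 before a foo2, should not.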

Wednesday 30 October 2013

XSLT - return filename of the input document

On occasion one needs the filename of the input document that is being transformed. This can be supplied as a parameter, but in XSLT 2.0 it can also be determined within the stylesheet using either the base-uri() or the document-uri() function. Store the URI of the document in a global variable, then tokenize it on the forward slash; the last token is the filename.



<xsl:variable name="base-uri" select="base-uri(.)"/>
<xsl:variable name="document-uri" select="document-uri(.)"/>
 
<xsl:variable name="filename" select="(tokenize($document-uri,'/'))[last()]"/>
 
<xsl:template match="/">
 <docs>
  <uri><xsl:value-of select="$document-uri"/></uri>
  <uri><xsl:value-of select="$base-uri"/></uri>
  <filename><xsl:value-of select="$filename"/></filename>
  <xsl:apply-templates/>
 </docs>
</xsl:template> 
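Wrapped into a complete XSLT 2.0 stylesheet, the approach might look like the sketch below. The basename variable, which strips the final extension with replace(), is a hypothetical extra that is not part of the original. Note that document-uri() can be empty when the source is not read from a file, in which case both variables come out empty.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>

  <!-- At global level the context item is the document node of the principal source -->
  <xsl:variable name="document-uri" select="document-uri(/)"/>

  <!-- Everything after the last slash, e.g. file:/C:/data/report.xml gives report.xml -->
  <xsl:variable name="filename" select="tokenize($document-uri, '/')[last()]"/>

  <!-- Hypothetical extra: the filename without its final extension, e.g. report -->
  <xsl:variable name="basename" select="replace($filename, '\.[^.]*$', '')"/>

  <xsl:template match="/">
    <docs>
      <filename><xsl:value-of select="$filename"/></filename>
      <basename><xsl:value-of select="$basename"/></basename>
    </docs>
  </xsl:template>

</xsl:stylesheet>
```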

Thursday 24 October 2013

XSLT: Quotation marks and apostrophes

XSLT 1 - Single quotes and apostrophes

An intriguing issue came up when trying to remove an apostrophe from a text string using XSLT 1. After a little research I found a way around it.

It was a simple case of swapping the quotes: delimit the select attribute with single quotes, use double quotes for the string arguments of the function, and write the apostrophe as the entity &apos;:

<xsl:value-of select='translate($textstring, "&apos;", "_")'/>

This worked with both the Saxon 6 and MS transformation engines.

XSLT 1 - Double quotes

For double quotes, use the &quot; entity inside a single-quoted string literal (regex-group() is not available in XSLT 1):

<xsl:value-of select="translate($textstring, '&quot;', '')"/>

XSLT 2 - Single quotes and apostrophes

When using XSLT 2 the same result can be achieved using:

<xsl:value-of select="translate($textstring, '''', '_')"/>

or:

<xsl:value-of select="replace($textstring, '''', '_')"/>

XSLT 2 - Double quotes

A similar mechanism can be used in XSLT 2 for double quotes, by doubling them inside a double-quoted string literal, either:

<xsl:value-of select='translate($textstring, """", "_")'/>

or:

<xsl:value-of select='replace($textstring, """", "_")'/>
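Another common workaround, shown here as a sketch rather than one of the methods above, is to hold the awkward characters in variables so that no escaping is needed inside the XPath expression at all. It works in both XSLT 1 and XSLT 2; the default value of the textstring parameter is purely illustrative.

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Illustrative input containing both kinds of quote -->
  <xsl:param name="textstring">It's a "test"</xsl:param>

  <!-- Each awkward character lives in a variable, so the XPath needs no escaping -->
  <xsl:variable name="apos">'</xsl:variable>
  <xsl:variable name="quot">"</xsl:variable>

  <xsl:template match="/">
    <!-- Apostrophes become underscores and double quotes are dropped: It_s a test -->
    <xsl:value-of select="translate($textstring, concat($apos, $quot), '_')"/>
  </xsl:template>

</xsl:stylesheet>
```

Because translate() drops characters that have no counterpart in its third argument, the single '_' maps the apostrophe and the double quote is simply removed.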

Monday 21 October 2013

Namespace Identity Transformation

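A simple identity transformation for switching namespaces. Note that it places every element in the new namespace, whatever its original namespace, but moves only those attributes that are in the old namespace; comments and processing instructions are dropped by the built-in templates.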

<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:param name="oldNS" select="'http://foo/1.0'"/>
 <xsl:param name="newNS" select="'http://bar/1.0'"/>

 <xsl:template match="/">
  <xsl:apply-templates/>
 </xsl:template> 
 
 <xsl:template match="*">
  <xsl:element name="{name()}" namespace="{$newNS}">
   <xsl:apply-templates select="node()|@*"/>
  </xsl:element>
 </xsl:template>

 <xsl:template match="@*">
  <xsl:choose>
   <xsl:when test="namespace-uri()=$oldNS">
    <xsl:attribute name="{name()}" namespace="{$newNS}">
     <xsl:value-of select="."/>
    </xsl:attribute>
   </xsl:when>
   <xsl:otherwise>
    <xsl:copy-of select="."/>
   </xsl:otherwise>
  </xsl:choose>
 </xsl:template>

</xsl:stylesheet>
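If only the elements that are actually in the old namespace should move, with everything else (including comments) copied unchanged, a variation built on the standard identity template might look like the following sketch. It is an alternative rather than the stylesheet above, and it needs XSLT 2.0 because it refers to the parameters inside match patterns.

```xml
<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:param name="oldNS" select="'http://foo/1.0'"/>
 <xsl:param name="newNS" select="'http://bar/1.0'"/>

 <!-- Identity rule: copy any node or attribute not matched more specifically -->
 <xsl:template match="@*|node()">
  <xsl:copy>
   <xsl:apply-templates select="@*|node()"/>
  </xsl:copy>
 </xsl:template>

 <!-- Elements in the old namespace are rebuilt in the new one, keeping their names -->
 <xsl:template match="*[namespace-uri() = $oldNS]">
  <xsl:element name="{name()}" namespace="{$newNS}">
   <xsl:apply-templates select="@*|node()"/>
  </xsl:element>
 </xsl:template>

 <!-- Attributes in the old namespace get the same treatment -->
 <xsl:template match="@*[namespace-uri() = $oldNS]">
  <xsl:attribute name="{name()}" namespace="{$newNS}" select="."/>
 </xsl:template>

</xsl:stylesheet>
```

The predicate templates have a default priority of 0.5, so they win over the identity rule (priority -0.5) for nodes in the old namespace.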

Thursday 8 August 2013

Conversion of W3C schema to RelaxNG

There appear to be very few tools for such conversions. Two I have found are:

 RNG Convert


rngconvert which can be found at: http://java.net/downloads/msv/nightly/ 

This is a Java-based utility that can convert RELAX Core, RELAX Namespace, TREX, W3C XML Schema or DTD to RELAX NG. It is run with the following command line (which automatically detects the schema type):

java -jar rngconv.jar myschema.xsd > result.rng

Or for an XML DTD the -dtd option is needed:

java -jar rngconv.jar -dtd myschema.dtd > result.rng

XSD to RNG Converter


This project was initially hosted on Google Code (http://code.google.com/p/xsdtorngconverter/) but is now on GitHub: https://github.com/epiasini/XSDtoRNG

This is a straightforward, common-or-garden XSLT 1.0 stylesheet.

I have not tried either of these tools as yet but will add to this posting as and when I do.

Wednesday 19 June 2013

Making a Google Map for Woodfordes Ale Trail using XSLT

Every two years Woodfordes Brewery gives the budding drinker the chance to win prizes by taking part in their Ale Trail. This is a great way to combine walking with some ale supping, as demonstrated by their last trail, some of whose walks are recorded on my Great Walks blog. Woodfordes produces a booklet that lists all of the outlets taking part in the Ale Trail and includes a grid of squares, so that a sticker can be added for each outlet visited at which a pint of beer is consumed. A prize is offered for every 12 squares filled with stickers, the biggest prize being a polypin of ale for all 72 squares.

There is a series of maps at the back of the booklet to help locate the pubs, but these are not very detailed, and organising a walk that takes in several pubs can be a little tedious. So in 2009 I decided to convert the locations into a Google Map. This was purely for my own use, but I added it to a publicly accessible site so that others could use it too. The original Google Map, created in 2009, was generated with a quick and dirty conversion from the Ale Trail PDF using a bit of XSLT and a lot of manual effort.

So, for this year's Ale Trail I decided that the conversion needed to be quicker, easier to generate and far less manual. The source would be the available PDF document of the Ale Trail. The age-old problem with PDF data is its lack of structure when extracting it to other formats. The tool used for this conversion was an online utility provided by Xerox called Rossinante. Its purpose is to produce an EPUB, but it also makes it possible to grab the XML from each of the stages used to create the EPUB. I grabbed the content XML that had been generated by the PDF extraction. This was not pretty:

 <PARAGRAPH id="psgmt_427" sgm_info="interlign" x="14.4062" y="209.629" width="232.0" height="18.99">
      <LINE x="14.4062" y="209.629" width="232.0" height="10.0" id="l21_19" base="217.287" font-size="9" font-name="myriad" bold="?" italic="no">
        <TEXT width="231.91" height="9.909" x="14.4062" y="209.629" id="p21_t19">
          <TOKEN sid="p21_s9175" id="p21_w148" font-name="myriad" bold="yes" italic="no" font-size="9" font-color="#d10018" rotation="0" angle="0" x="14.4062" y="209.629" base="217.287" width="17.325" height="9.909">186.</TOKEN>
          <TOKEN sid="p21_s9177" id="p21_w149" font-name="myriad" bold="yes" italic="no" font-size="9" font-color="#d10018" rotation="0" angle="0" x="33.4052" y="209.629" base="217.287" width="48.177" height="9.909">Marlingford</TOKEN>
          <TOKEN sid="p21_s9178" id="p21_w150" font-name="myriad" bold="yes" italic="no" font-size="9" font-color="#d10018" rotation="0" angle="0" x="83.4002" y="209.629" base="217.287" width="15.183" height="9.909">Bell</TOKEN>
          <TOKEN sid="p21_s9179" id="p21_w151" font-name="myriad" bold="no" italic="no" font-size="9" font-color="#000000" rotation="0" angle="0" x="100.493" y="209.808" base="217.287" width="38.763" height="9.729">Bawburgh</TOKEN>
          <TOKEN sid="p21_s9180" id="p21_w152" font-name="myriad" bold="no" italic="no" font-size="9" font-color="#000000" rotation="0" angle="0" x="141.163" y="209.808" base="217.287" width="20.862" height="9.729">Road,</TOKEN>
          <TOKEN sid="p21_s9181" id="p21_w153" font-name="myriad" bold="yes" italic="no" font-size="9" font-color="#000000" rotation="0" angle="0" x="163.316" y="209.629" base="217.287" width="48.177" height="9.909">Marlingford</TOKEN>
          <TOKEN sid="p21_s9182" id="p21_w154" font-name="myriad" bold="no" italic="no" font-size="9" font-color="#000000" rotation="0" angle="0" x="213.403" y="209.808" base="217.287" width="15.381" height="9.729">NR9</TOKEN>
          <TOKEN sid="p21_s9183" id="p21_w155" font-name="myriad" bold="no" italic="no" font-size="9" font-color="#000000" rotation="0" angle="0" x="230.692" y="209.808" base="217.287" width="15.624" height="9.729">5HX</TOKEN>
        </TEXT>
      </LINE>.......</PARAGRAPH>


As expected, the structure was severely lacking, although there was enough styling information to infer the structure I needed. So first, an XSLT transformation was used to simplify the data and to generate some basic elements from the styling data, from which the structure could then be inferred:


 <para>
      <bold>
         <id>186. </id>
         <t>Marlingford </t>
         <t>Bell </t>
      </bold>
      <normal>
         <t>Bawburgh </t>
         <t>Road, </t>
      </normal>
      <bold>
         <t>Marlingford </t>
      </bold>
      <normal>
         <postcode>NR9 </postcode>
         <postcode>5HX </postcode>
      </normal>
      <normal>
         <tel>01603 </tel>
         <tel>880263 </tel>
      </normal>
      <bold>
         <t>1 </t>
         <t>2 </t>
         <t>4 </t>
      </bold>
   </para>

This wasn't perfect: there were a few instances where an entry was split across paragraphs, or where several entries had ended up in a single paragraph. As this was a quick and dirty method I sorted these out manually, but I may look into getting the XSLT to detect these cases and remove the manual intervention. As it happened, the manual effort took less than an hour, which was not bad for 680 or so pubs!
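The first-stage simplification could be approximated along the following lines. This is a sketch, not the stylesheet actually used: it assumes the Rossinante elements shown above (PARAGRAPH, TOKEN and its bold attribute) and reproduces only the bold/normal grouping, not the id, postcode and tel classification.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>

  <xsl:template match="/">
    <paras>
      <xsl:apply-templates select="//PARAGRAPH"/>
    </paras>
  </xsl:template>

  <!-- One para per PARAGRAPH; each run of adjacent tokens sharing the same
       bold flag becomes a single bold or normal element -->
  <xsl:template match="PARAGRAPH">
    <para>
      <xsl:for-each-group select=".//TOKEN" group-adjacent="@bold = 'yes'">
        <xsl:element name="{if (current-grouping-key()) then 'bold' else 'normal'}">
          <xsl:for-each select="current-group()">
            <t><xsl:value-of select="."/></t>
          </xsl:for-each>
        </xsl:element>
      </xsl:for-each-group>
    </para>
  </xsl:template>

</xsl:stylesheet>
```

On the sample PARAGRAPH this would give a bold group for "186. Marlingford Bell", a normal group for "Bawburgh Road,", a bold group for the second "Marlingford" and a normal group for the postcode, mirroring the shape of the simplified output.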

A second-stage XSLT transformation was then used to create the structure and add the geolocation data, which was obtained by calling the Google Maps geocoding API. The issue encountered here was that the API limited access to 20 requests per minute, so the transformation needed to be slowed down. This was done by giving the transformation engine (Saxon) a simple but time-consuming task:

<xsl:message><xsl:value-of select="for $i in 1 to 100000 return $i*2.5"/></xsl:message>


This took a little experimentation before reaching the desired result, but in the end it worked a dream. The geocoding response also provided the county information, which would otherwise have had to be inferred from the identity number of each pub. The result was a lot cleaner:


<pub>
 <name id="186.">Marlingford Bell</name>
 <address>Bawburgh Road,</address>
 <address type="place">Marlingford</address>
 <postcode>NR9 5HX</postcode>
 <coords worked="60000">
  <GeocodeResponse>
  <status>OK</status>
  <result>
   <type>postal_code</type>
   <formatted_address>Marlingford, Norfolk NR9 5HX, UK</formatted_address>
   <address_component>
   <long_name>NR9 5HX</long_name>
   <short_name>NR9 5HX</short_name>
   <type>postal_code</type>
   </address_component>
   <address_component>
   <long_name>Marlingford</long_name>
   <short_name>Marlingford</short_name>
   <type>locality</type>
   <type>political</type>
   </address_component>
   <address_component>
   <long_name>Norfolk</long_name>
   <short_name>Norfk</short_name>
   <type>administrative_area_level_2</type>
   <type>political</type>
   </address_component>
   <address_component>
   <long_name>United Kingdom</long_name>
   <short_name>GB</short_name>
   <type>country</type>
   <type>political</type>
   </address_component>
   <address_component>
   <long_name>Norwich</long_name>
   <short_name>Norwich</short_name>
   <type>postal_town</type>
   </address_component>
   <geometry>
   <location>
   <lat>52.6377664</lat>
   <lng>1.1488654</lng>
   </location>
   <location_type>APPROXIMATE</location_type>
   <viewport>
   <southwest>
   <lat>52.6364174</lat>
   <lng>1.1472765</lng>
   </southwest>
   <northeast>
   <lat>52.6391153</lat>
   <lng>1.1504542</lng>
   </northeast>
   </viewport>
   <bounds>
   <southwest>
   <lat>52.6370836</lat>
   <lng>1.1472765</lng>
   </southwest>
   <northeast>
   <lat>52.6384491</lat>
   <lng>1.1504542</lng>
   </northeast>
   </bounds>
  </geometry>
  </result>
  </GeocodeResponse>
 </coords>
 <tel>01603  880263 </tel>
 <key>
  <keyitem>1</keyitem>
  <keyitem>2</keyitem>
  <keyitem>4</keyitem>
 </key>
</pub>
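The geocoding lookup in the second stage might be sketched as follows. This is not the stylesheet actually used: it assumes an input already shaped like the <pub> above but without <coords>, and both the endpoint URL and the api-key parameter are assumptions (the Geocoding API's XML output currently expects a key). Since doc() makes one request per postcode, the delay trick above, or a pause outside XSLT, is still needed to stay within the rate limit.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>

  <!-- Assumed endpoint returning a GeocodeResponse document -->
  <xsl:param name="geocode-base"
    select="'https://maps.googleapis.com/maps/api/geocode/xml?address='"/>
  <!-- Hypothetical parameter: your own API key -->
  <xsl:param name="api-key" select="''"/>

  <!-- Identity rule: copy everything else unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- After each postcode, look it up and embed the raw response in coords -->
  <xsl:template match="pub/postcode">
    <xsl:copy-of select="."/>
    <coords>
      <xsl:copy-of select="doc(concat($geocode-base,
                                       encode-for-uri(normalize-space(.)),
                                       '&amp;key=', $api-key))"/>
    </coords>
  </xsl:template>

</xsl:stylesheet>
```

The worked attribute seen on coords above is not reproduced here.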

The final stage was a transformation to create the JavaScript array entries (JSON-like, although they use single-quoted strings) that were inserted into the HTML file for the Google Map:


['Marlingford Bell, Marlingford',52.6377664,1.1488654,'','Bawburgh Road, Marlingford, Norfolk NR9 5HX. Tel: 01603  880263 ',0,1,1,0,1,0]
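A final-stage stylesheet producing lines of roughly this shape might look like the sketch below. The field order is inferred from the single sample line, so treat it as an assumption. In particular it assumes the six trailing flags mark key items 0 to 5 (which matches the sample, where key items 1, 2 and 4 give 0,1,1,0,1,0), and it builds the address from the pub's own elements, so the county from the geocoding response is not included. The f:js function is a hypothetical helper that escapes apostrophes so that a name like King's Head does not break the JavaScript string.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:f="urn:example:functions"
  exclude-result-prefixes="xs f">
  <xsl:output method="text"/>

  <xsl:variable name="apos" as="xs:string">'</xsl:variable>

  <!-- Hypothetical helper: single-quote a value, escaping apostrophes as \' -->
  <xsl:function name="f:js" as="xs:string">
    <xsl:param name="s" as="xs:string?"/>
    <xsl:sequence select="concat($apos,
        replace(normalize-space($s), $apos, concat('\\', $apos)), $apos)"/>
  </xsl:function>

  <xsl:template match="/">
    <xsl:for-each select="//pub">
      <!-- First location in the embedded GeocodeResponse -->
      <xsl:variable name="loc" select="(coords//location)[1]"/>
      <xsl:value-of select="concat('[',
          f:js(concat(name, ', ', address[@type = 'place'])), ',',
          $loc/lat, ',', $loc/lng, ',',
          f:js(''), ',',
          f:js(concat(string-join(for $a in (address, postcode)
                                  return replace(normalize-space($a), ',$', ''), ', '),
                      '. Tel: ', tel)), ',',
          string-join(for $i in 0 to 5
                      return (if (key/keyitem = $i) then '1' else '0'), ','),
          ']')"/>
      <xsl:if test="position() != last()">,</xsl:if>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

The sketch expects exactly one name, place address, tel and location per pub; concat() raises an error if any of these occurs more than once.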

As stated, this was quick and dirty, but it achieved a result. Hopefully I can make it more seamless in the future. The result can be found at http://vulcanarms.freehostia.com/woodfordes/2013.htm

Tuesday 9 April 2013

Remove leading zeros from a number

There are three ways to remove leading zeros from a number using XSLT. Of these, only the replace() approach can be adapted to strip some other leading character, by changing the pattern.

Use the number() function

number(.) 

To return a string, use:

string(number(.))

Use the format-number() function 

format-number(.,'#')

Use the replace() function (XSLT 2.0 only)

replace(.,'^0+','')

Try it using the following XML:
<test>
    <number>0001</number>
    <number>0002</number>
    <number>0003</number>
    <number>0011</number>
    <number>0102</number>
    <number>1003</number>
</test>


and the following XSLT template:

<xsl:template match="test">
  <number>
    <xsl:for-each select="number">
      <n><xsl:value-of select="number(.)"/></n>
    </xsl:for-each>
  </number>
  <replace>
    <xsl:for-each select="number">
      <n><xsl:value-of select="replace(.,'^0+','')"/></n>
    </xsl:for-each>
  </replace>
  <format>
    <xsl:for-each select="number">
      <n><xsl:value-of select="format-number(.,'#')"/></n>
    </xsl:for-each>
  </format>
</xsl:template>
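Two edge cases are worth testing by adding <number>0000</number> and <number>0001000000</number> to the XML above. In XSLT 2.0, number() returns an xs:double, so string(number('0001000000')) gives 1.0E6, and replace(.,'^0+','') turns 0000 into an empty string. A sketch of two alternatives (a cast to xs:integer, and a regex that keeps the last character) follows; note that the cast fails on non-numeric input.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  exclude-result-prefixes="xs">
  <xsl:output indent="yes"/>

  <xsl:template match="test">
    <results>
      <xsl:for-each select="number">
        <n value="{.}">
          <!-- xs:integer keeps full precision and never uses 1.0E6-style notation -->
          <integer><xsl:value-of select="xs:integer(.)"/></integer>
          <!-- Keep one character back, so 0000 becomes 0 rather than an empty string -->
          <regex><xsl:value-of select="replace(., '^0+(.)', '$1')"/></regex>
        </n>
      </xsl:for-each>
    </results>
  </xsl:template>

</xsl:stylesheet>
```

The greedy 0+ backs off by one character when the value is all zeros, so (.) still captures the final 0; for 0102 it captures the 1 and the remaining 02 is left untouched, giving 102.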

Friday 8 March 2013

Enable JConsole for tomcat

To enable JConsole (the Java monitoring console) for Tomcat, use the following procedure (this is for Windows, where Tomcat is installed as a service):



1. Start the Tomcat configuration tool (tomcat7w.exe or tomcat6w.exe), which is located in {install path}/bin.

2. On the Java tab, add the following lines to the text area labelled Java Options, one option per line:

-Dcom.sun.management.jmxremote.port=9090 

-Dcom.sun.management.jmxremote.ssl=false 

-Dcom.sun.management.jmxremote.authenticate=false 

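changing the port to whichever one the console should connect to. Disabling SSL and authentication like this is only suitable for a trusted development machine.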

3. Click the Apply button.

4. Restart Tomcat.

5. Start JConsole (found in the JDK's bin directory) from the command line, targeting localhost:9090:

jconsole localhost:9090

Friday 1 February 2013

Ant Task for FOP

There is an Ant task for FOP. For full details see:

For FOP 0.95: http://xmlgraphics.apache.org/fop/0.95/anttask.html

For FOP 1.0: http://xmlgraphics.apache.org/fop/1.0/anttask.html


<property name="fop.home" value="....path to your FOP HOME directory..."/>

<taskdef name="fop"
         classname="org.apache.fop.tools.anttasks.Fop">
  <classpath>
    <fileset dir="${fop.home}/lib">
      <include name="*.jar"/>
    </fileset>
    <fileset dir="${fop.home}/build">
      <include name="fop.jar"/>
      <include name="fop-hyph.jar" />
    </fileset>
  </classpath>
</taskdef>



<target name="generate-pdf" description="Generates a single PDF file">
   <fop format="application/pdf"
        fofile="c:\working\foDirectory\foDocument.fo"
        outfile="c:\working\pdfDirectory\pdfDocument.pdf" />
</target>

<target name="generate-multiple-pdfs"
        description="Generates multiple PDF files">
   <fop format="application/pdf"
        outdir="${build.dir}" messagelevel="debug">
        <fileset dir="${fo.examples.dir}">
           <include name="*.fo"/>
        </fileset>
   </fop>
</target>
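Putting the pieces together, a complete build file might look like the sketch below. The values of fop.home, build.dir and fo.examples.dir are hypothetical placeholders (the fragments above leave build.dir and fo.examples.dir undefined), and the mkdir steps simply make sure the output directory exists.

```xml
<project name="fop-example" default="generate-pdf" basedir=".">

  <!-- Hypothetical locations: adjust to your own installation and sources -->
  <property name="fop.home" location="C:/fop-1.0"/>
  <property name="build.dir" location="build/pdf"/>
  <property name="fo.examples.dir" location="fo"/>

  <!-- Same classpath as above, written with includes attributes -->
  <taskdef name="fop" classname="org.apache.fop.tools.anttasks.Fop">
    <classpath>
      <fileset dir="${fop.home}/lib" includes="*.jar"/>
      <fileset dir="${fop.home}/build" includes="fop.jar fop-hyph.jar"/>
    </classpath>
  </taskdef>

  <!-- Render one FO file to one PDF -->
  <target name="generate-pdf" description="Generates a single PDF file">
    <mkdir dir="${build.dir}"/>
    <fop format="application/pdf"
         fofile="${fo.examples.dir}/foDocument.fo"
         outfile="${build.dir}/pdfDocument.pdf"/>
  </target>

  <!-- Render every .fo file in the source directory into build.dir -->
  <target name="generate-multiple-pdfs" description="Generates multiple PDF files">
    <mkdir dir="${build.dir}"/>
    <fop format="application/pdf" outdir="${build.dir}" messagelevel="info">
      <fileset dir="${fo.examples.dir}" includes="*.fo"/>
    </fop>
  </target>

</project>
```

Running ant generate-multiple-pdfs should then render each .fo file in the fo directory to a PDF in build/pdf.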