Friday, 3 October 2014

Make structured XML from a flat source with XSLT 1

Issue

The requirement was to generate structured XML from a flat XML source using only XSLT 1.0 for the transformation. All paragraph elements had to be nested within their subsection element, and all subparagraph elements within their paragraph element. An additional requirement was for the textual content of each element to be wrapped in a <text/> element.

Resolution

Source:

<sectiontitle>sample content text</sectiontitle>
<subsection>sample content text</subsection>
<paragraph>sample content text</paragraph>
<paragraph>sample content text</paragraph>
<subparagraph>sample content text</subparagraph>
<subparagraph>sample content text</subparagraph>
<subsection>sample content text</subsection>

Required Output:

<sectiontitle><text>sample content text</text></sectiontitle>
<subsection>
    <text>sample content text</text>
    <paragraph><text>sample content text</text></paragraph>
    <paragraph>
        <text>sample content text</text>
        <subparagraph><text>sample content text</text></subparagraph>
        <subparagraph><text>sample content text</text></subparagraph>
    </paragraph>
</subsection>
<subsection><text>sample content text</text></subsection>


XSLT

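<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<!-- Identity template: copy everything not matched by a more specific template -->
<xsl:template match="node()|@*">
    <xsl:copy>
        <xsl:apply-templates select="node()|@*" />
    </xsl:copy>
</xsl:template>

<!-- Wrap the section title content in a text element, as the required output shows -->
<xsl:template match="sectiontitle">
    <sectiontitle>
        <text>
            <xsl:value-of select="." />
        </text>
    </sectiontitle>
</xsl:template>

<!-- A subsection pulls in every following paragraph whose nearest
     preceding subsection is this one -->
<xsl:template match="subsection">
    <subsection>
        <text>
            <xsl:value-of select="." />
        </text>
        <xsl:apply-templates
        select="following-sibling::paragraph
        [generate-id(preceding-sibling::subsection[1])
        = generate-id(current())]"  mode="nest" />
    </subsection>
</xsl:template>

<!-- A paragraph pulls in every following subparagraph whose nearest
     preceding paragraph is this one -->
<xsl:template match="paragraph" mode="nest">
    <paragraph>
        <text>
            <xsl:value-of select="." />
        </text>
        <xsl:apply-templates 
            select="following-sibling::subparagraph
            [generate-id(preceding-sibling::paragraph[1])
            = generate-id(current())]"  mode="nest" />
    </paragraph>
</xsl:template>

<xsl:template match="subparagraph" mode="nest">
    <xsl:copy>
        <text>
            <xsl:apply-templates />
        </text>
    </xsl:copy>
</xsl:template>

<!-- Suppress paragraphs and subparagraphs in the default mode:
     they are output once, in mode="nest", inside their parent -->
<xsl:template match="paragraph"/>

<xsl:template match="subparagraph"/>

</xsl:stylesheet>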

Points to note:

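  • The xsl:value-of could be replaced by xsl:apply-templates if the content contains anything other than a text node (for example, inline markup)
  • The XML source must be consistent: each paragraph is attached to the nearest preceding subsection and each subparagraph to the nearest preceding paragraph, so a subparagraph with no paragraph before it in its subsection ends up under the wrong paragraph or is dropped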
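In the stylesheet above, every subsection and paragraph scans all of its following siblings and tests each one, so the work grows roughly with the square of the number of elements. A common XSLT 1.0 alternative is to index the elements once with xsl:key, keyed on the generated id of the nearest preceding subsection or paragraph. This is a sketch rather than a tested replacement: it assumes the flat elements sit inside a single root element (of any name) and the same consistent source, and the key names paras-by-subsection and subparas-by-paragraph are hypothetical.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<!-- Index paragraphs by the generated id of their nearest preceding subsection -->
<xsl:key name="paras-by-subsection" match="paragraph"
         use="generate-id(preceding-sibling::subsection[1])"/>

<!-- Index subparagraphs by the generated id of their nearest preceding paragraph -->
<xsl:key name="subparas-by-paragraph" match="subparagraph"
         use="generate-id(preceding-sibling::paragraph[1])"/>

<!-- Identity template for everything else -->
<xsl:template match="node()|@*">
    <xsl:copy>
        <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
</xsl:template>

<xsl:template match="sectiontitle">
    <sectiontitle><text><xsl:value-of select="."/></text></sectiontitle>
</xsl:template>

<!-- Look up this subsection's paragraphs instead of scanning all following siblings -->
<xsl:template match="subsection">
    <subsection>
        <text><xsl:value-of select="."/></text>
        <xsl:apply-templates select="key('paras-by-subsection', generate-id())" mode="nest"/>
    </subsection>
</xsl:template>

<!-- Look up this paragraph's subparagraphs the same way -->
<xsl:template match="paragraph" mode="nest">
    <paragraph>
        <text><xsl:value-of select="."/></text>
        <xsl:apply-templates select="key('subparas-by-paragraph', generate-id())" mode="nest"/>
    </paragraph>
</xsl:template>

<xsl:template match="subparagraph" mode="nest">
    <subparagraph><text><xsl:value-of select="."/></text></subparagraph>
</xsl:template>

<!-- Paragraphs and subparagraphs are only output via mode="nest" -->
<xsl:template match="paragraph|subparagraph"/>

</xsl:stylesheet>
```

The key lookup returns nodes in document order, so paragraphs and subparagraphs keep their original sequence. Paragraphs that appear before the first subsection are keyed to an empty string and, as in the original, are not output.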

Thursday, 4 September 2014

Ant XSLT task for Saxon transformations

As the Saxon AntTransform task is no longer supported, it is recommended to use the standard Ant XSLT task instead (see the Ant XSLT Task documentation).

An example build file is given below:

<?xml version="1.0" encoding="UTF-8"?>
<project name="opsi" default="transformer" basedir=".">

 <property file="ant.properties"/>
 <property name="source.xml" location="share.xml"/>
 <property name="output.xml" location="output.xml"/>
 <!-- location of the Saxon jar -->
 <property name="saxon.jar" location="saxon/saxon9.jar"/>

 <path id="saxon.classpath">
  <pathelement location="${saxon.jar}"/>
  <pathelement location="saxon/saxon9he.jar"/>
 </path>

 <target name="transformer">  
  <xslt in="${source.xml}"
      out="${output.xml}"
      style="share.xslt"
      processor="trax">
   <factory name="net.sf.saxon.TransformerFactoryImpl"/>
   <classpath refid="saxon.classpath" />
  </xslt>  
 </target>

</project>
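If share.xslt declares a top-level parameter, a value can be passed to it from the build file through a nested <param> element of the xslt task. A sketch, assuming a hypothetical docVersion parameter in the stylesheet and a hypothetical doc.version property; these lines would sit inside the <project> element above:

```xml
<!-- hypothetical property holding the value to pass in -->
<property name="doc.version" value="1.0"/>

<target name="transformer-with-param">
 <xslt in="${source.xml}"
     out="${output.xml}"
     style="share.xslt"
     processor="trax">
  <factory name="net.sf.saxon.TransformerFactoryImpl"/>
  <classpath refid="saxon.classpath" />
  <!-- received by <xsl:param name="docVersion"/> in share.xslt -->
  <param name="docVersion" expression="${doc.version}"/>
 </xslt>
</target>
```

Run it with ant transformer-with-param. Note that the xslt task skips the transformation when the output file is newer than both the input and the stylesheet; adding force="true" to the task makes it run every time.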

Thursday, 10 July 2014

Large Text File Viewer

A recent issue involved interrogating log files larger than 1 GB. None of the text editors I had could cope with them, including Emacs, Notepad++ and Windows Notepad. Eventually I found the simply named Large Text File Viewer from SwiftGear, which worked a treat. Downloadable as a zip, it requires no installation and the executable is only 572 KB.

Wednesday, 25 June 2014

XPath to generate an XPath string for the current item

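An XPath 2.0 expression, usable in either XQuery or XSLT 2.0, that generates the hierarchical path to the current item. It walks ancestor-or-self::* rather than using current(), which is only available in XSLT: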


string-join(
   for $node in ancestor-or-self::*
   return
     concat($node/name(),
            '[',
            count($node/preceding-sibling::*[name() = $node/name()]) + 1,
            ']'
           ),
  '/')
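A sketch that wraps the path expression in a reusable XSLT 2.0 function and lists the path of every element in the source. The f prefix and its namespace URI (urn:example:functions) are hypothetical; the function takes the element as a parameter, so it relies on neither the context item nor current().

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:f="urn:example:functions"
  exclude-result-prefixes="xs f">

  <xsl:output indent="yes"/>

  <!-- Return a positional path such as root[1]/child[2] for any element -->
  <xsl:function name="f:path" as="xs:string">
    <xsl:param name="item" as="element()"/>
    <xsl:sequence select="string-join(
        for $node in $item/ancestor-or-self::*
        return concat(name($node), '[',
                      count($node/preceding-sibling::*[name() = name($node)]) + 1,
                      ']'),
        '/')"/>
  </xsl:function>

  <!-- List the path of every element in the source document -->
  <xsl:template match="/">
    <paths>
      <xsl:for-each select="//*">
        <path><xsl:value-of select="f:path(.)"/></path>
      </xsl:for-each>
    </paths>
  </xsl:template>

</xsl:stylesheet>
```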


Find Duplicate IDs with XSLT

This little snippet of XSLT finds all duplicate ids within an XML source document and generates a report giving the number of duplicates and the XPath to each element that has a duplicate id attribute.

It relies on the attribute in question being named @id, but it is simple enough to change this to whichever attribute you need to check.

Note that this is XSLT 2.0 and has been used with the Saxon transformation engine.


<xsl:stylesheet version="2.0" 
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
 
  exclude-result-prefixes="xs">

  <xsl:output indent="yes"/>
  
  <xsl:key name="ids" match="*[@id]" use="@id"/> 

  <xsl:template match="/">
    <duplicates>
      <xsl:apply-templates select="//*[@id]"/>
    </duplicates>
  </xsl:template>


  <xsl:template match="*[@id]">
    <xsl:if test="count(key('ids', @id)) &gt; 1">
      <duplicate 
        id="{@id}" 
        dup-count="{count(key('ids', @id))}" 
        node-xpath="{string-join(
            (for $node in ancestor::*
             return concat($node/name(), '[', count($node/preceding-sibling::*[name() = $node/name()]) + 1, ']'),
             concat(name(), '[', count(preceding-sibling::*[name() = current()/name()]) + 1, ']')),
            '/')}">
      </duplicate>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
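The template above writes one duplicate element per offending element, so an id used three times is reported three times. If one entry per id is preferred, XSLT 2.0 grouping can do it. In this sketch, which uses xsl:for-each-group keyed on @id, the paths are listed as node-xpath child elements; that layout is a choice of the sketch, not part of the original report.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output indent="yes"/>

  <xsl:template match="/">
    <duplicates>
      <!-- one group per distinct @id value -->
      <xsl:for-each-group select="//*[@id]" group-by="@id">
        <!-- keep only ids that are used more than once -->
        <xsl:if test="count(current-group()) gt 1">
          <duplicate id="{current-grouping-key()}" dup-count="{count(current-group())}">
            <!-- one positional path per element carrying this id -->
            <xsl:for-each select="current-group()">
              <node-xpath>
                <xsl:value-of select="string-join(
                    for $node in ancestor-or-self::*
                    return concat(name($node), '[',
                                  count($node/preceding-sibling::*[name() = name($node)]) + 1,
                                  ']'),
                    '/')"/>
              </node-xpath>
            </xsl:for-each>
          </duplicate>
        </xsl:if>
      </xsl:for-each-group>
    </duplicates>
  </xsl:template>

</xsl:stylesheet>
```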

Wednesday, 4 June 2014

XSPARQL - a first attempt!

XSPARQL is an easy way of using SPARQL queries with an XQuery syntax. I thought I would give it a go with a data scrape of the schema.org markup on my own walks blog. The specific page I chose was The South West Coast Path - Marazion to Porthleven walk. I wasn't going to do anything complicated, just attempt to pull out the names and coordinates of the places featured in the walk posting.

First I needed an implementation of the XSPARQL specification, which I found at http://xsparql.deri.org/.

Then I needed an RDF source. This was created by scraping the blog posting mentioned above with the Apache Any23 (Anything To Triples) Live Service Demo, requesting RDF/XML output, which was saved as any23.org.rdf.

The XSPARQL code was then put together. It seemed to be whitespace sensitive in some instances and took a few attempts to get working. The code below was saved to a file named query.xs:

declare namespace place = "http://schema.org/Place/";
declare namespace geo = "http://schema.org/GeoCoordinates/";

<places>
{ for $Place $Name from <any23.org.rdf>
  where { $Place place:name $Name }
  order by $Name
  return <place obj="{$Place}" name="{$Name}" >
         { for $Name $Geo $lat $long from <any23.org.rdf>
           where { $Place place:geo  $Geo.
   $Place place:name $Name.
   $Geo geo:latitude $lat.
   $Geo geo:longitude $long}
           return <geo> 
   <lat>{ $lat }</lat>
   <long>{$long}</long>
  </geo>
         }
</place>
}
</places>

The implementation was then invoked with the following command line:

java -jar cli-0.5-jar-with-dependencies.jar query.xs -f result.xml

This generated the result document:


<places>
   <place name="Marazion" obj="b0">
      <geo>
         <lat>50.118267</lat>
         <long>-5.4776716</long>
      </geo>
   </place>
   <place name="Porthleven" obj="b1">
      <geo>
         <lat>50.118267</lat>
         <long>-5.4776716</long>
      </geo>
   </place>
   <place name="Prussia Cove Smugglers" obj="b2">
      <geo>
         <lat>50.101272</lat>
         <long>-5.4157501</long>
      </geo>
   </place>
   <place name="St Michael's Mount" obj="b3">
      <geo>
         <lat>50.116836</lat>
         <long>-5.4779291</long>
      </geo>
   </place>
</places>

Funky stuff!

Tuesday, 13 May 2014

Simple XSLT Construct Tester

When it comes to testing XQuery constructs, I find the easiest method is to use MarkLogic's CQ interface. A simple way to do the same with XSLT is to use a single named template in a stylesheet.

Just create a basic stylesheet named tester.xsl, add one named template called 'testConstruct', and call it using the following command line (the -it option names the initial template, so no source document is needed):

java -jar saxon.jar -it:testConstruct -xsl:tester.xsl -o:result.xml

The sample below tests a construct that splits a string of schedule and paragraph references on 'Sch.' and 'para.' and then reassembles it:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema">

<xsl:template name="testConstruct">
 <xsl:variable name="provisions" select="'Sch. 08 para. 028(02) para. 058(03)A Sch. 15'" />
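 <!-- tokenize() takes a regular expression, so the full stop is escaped -->
 <xsl:variable name="schtokens" select="tokenize($provisions, 'Sch\. ')" />

 <!-- skip the empty token produced by the leading 'Sch. ', which would otherwise add a leading space -->
 <xsl:for-each select="$schtokens[normalize-space(.)]">
  <xsl:variable name="paraTokens" select="tokenize(substring-after(., ' '), 'para\.')" />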
  <xsl:variable name="schNo" select="if (contains(., ' ')) then substring-before(., ' ')   else ." />
  <xsl:value-of select="if (not(matches(., '^\s*$')) and contains(., 'para')) then 
    (concat('Sch. ', $schNo, ' para.', substring-before($paraTokens[2],'('), 
     string-join(for $p in $paraTokens return replace(normalize-space($p), '^[0-9]+', ''),'')) )
    else if (not(matches(., '^\s*$'))) then 
    ( concat('Sch. ', $schNo) ) 
    else ()" />
  <xsl:if test="not(position() = last())">
   <xsl:value-of select="' '"/>
  </xsl:if>
 </xsl:for-each>
</xsl:template>
</xsl:stylesheet>
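A variant sketch of tester.xsl makes the test string a stylesheet parameter, so different inputs can be tried from the command line without editing the stylesheet. It assumes the Saxon 9 command line, which accepts stylesheet parameters as name=value pairs after the options; the parameter name input is hypothetical. The body just prints each 'Sch.' token on its own line in brackets (so stray spaces are visible); swap in whatever construct is under test.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

 <!-- plain text output makes quick checks easier to read -->
 <xsl:output method="text"/>

 <!-- hypothetical parameter; the default is the sample string from above -->
 <xsl:param name="input" select="'Sch. 08 para. 028(02) para. 058(03)A Sch. 15'"/>

 <xsl:template name="testConstruct">
  <!-- construct under test: split on 'Sch. ' and drop empty tokens -->
  <xsl:for-each select="tokenize($input, 'Sch\. ')[normalize-space(.)]">
   <!-- one token per line, bracketed so leading and trailing spaces show up -->
   <xsl:value-of select="concat('[', ., ']&#10;')"/>
  </xsl:for-each>
 </xsl:template>

</xsl:stylesheet>
```

Usage, overriding the default string:

java -jar saxon.jar -it:testConstruct -xsl:tester.xsl -o:result.txt input="Sch. 01 para. 002 Sch. 03"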