Wednesday 25 June 2014

XPath to generate an XPath string to the current item

An XPath expression, usable in either XQuery or XSLT, that generates the hierarchical path to the current item.


string-join(
   (: ancestor-or-self::* includes the context item itself, so the XSLT-only
      current() function is not needed and the expression also runs in XQuery :)
   for $node in ancestor-or-self::*
   return
     concat($node/name(),
            '[',
            count($node/preceding-sibling::*[name() = $node/name()]) + 1,
            ']'),
   '/')
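To try the same logic outside an XSLT or XQuery processor, here is a rough Python equivalent using only the standard library's xml.etree.ElementTree. It is a sketch, not a drop-in replacement: ElementTree has no parent axis, so the function first builds a child-to-parent map, and it assumes un-namespaced elements, because ElementTree reports namespaced names as {uri}local rather than the prefixed name that name() returns. The function name positional_path is hypothetical.

```python
import xml.etree.ElementTree as ET


def positional_path(root, target):
    """Return a path such as 'walks[1]/walk[2]/place[2]' for target.

    One name[position] step per element from the document element down,
    where position counts same-named preceding siblings plus one.
    Assumes un-namespaced elements (ElementTree reports namespaced tags as
    '{uri}local', not the prefixed name that XPath's name() returns).
    """
    # ElementTree elements have no parent axis, so map each child to its parent.
    parent_of = {child: parent for parent in root.iter() for child in parent}

    steps = []
    node = target
    while node is not None:
        parent = parent_of.get(node)
        if parent is None:
            position = 1  # the document element has no siblings
        else:
            siblings = list(parent)
            preceding = siblings[: siblings.index(node)]
            # equivalent of count(preceding-sibling::*[name() = ...]) + 1
            position = sum(1 for s in preceding if s.tag == node.tag) + 1
        steps.append(f"{node.tag}[{position}]")
        node = parent
    return "/".join(reversed(steps))


if __name__ == "__main__":
    doc = ET.fromstring("<walks><walk/><walk><place/><place/></walk></walks>")
    print(positional_path(doc, doc[1][1]))  # walks[1]/walk[2]/place[2]
```

Like the XPath version, the result is a relative path starting at the document element; prefix it with '/' if you need an absolute expression.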


Find Duplicate IDs with XSLT

This little snippet of XSLT is a useful tool for finding all duplicate IDs within an XML source document and generating a report with the count of duplicates and the XPath to each element that has a duplicate id attribute.

This relies on the attribute in question being named @id, but it is simple enough to change this to whatever attribute you need to interrogate: edit the match and use attributes of the xsl:key and the @id references in the stylesheet.

Note that this is XSLT 2.0 (it relies on string-join() and a for expression) and has been used with the Saxon transformation engine.


<xsl:stylesheet version="2.0" 
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
 
  exclude-result-prefixes="xs">

  <xsl:output indent="yes"/>
  
  <xsl:key name="ids" match="*[@id]" use="@id"/> 

  <xsl:template match="/">
    <duplicates>
      <xsl:apply-templates select="//*[@id]"/>
    </duplicates>
  </xsl:template>


  <xsl:template match="*[@id]">
    <xsl:if test="count(key('ids', @id)) &gt; 1">
      <duplicate
        id="{@id}"
        dup-count="{count(key('ids', @id))}"
        node-xpath="{string-join(for $node in ancestor-or-self::*
                       return concat($node/name(), '[',
                         count($node/preceding-sibling::*[name() = $node/name()]) + 1, ']'),
                       '/')}"/>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
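For comparison, the same report can be sketched in Python with the standard library, which also turns the attribute name into a parameter instead of something edited by hand. This is an illustration under stated assumptions, not a replacement for the stylesheet: duplicate_report and element_path are hypothetical names, a collections.Counter stands in for xsl:key, and element names are assumed to be un-namespaced. Like the stylesheet, it visits elements in document order and emits one duplicate element per offending node.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def element_path(parent_of, node):
    """Positional path such as 'book[1]/ch[2]' (un-namespaced names assumed)."""
    steps = []
    while node is not None:
        parent = parent_of.get(node)
        if parent is None:
            position = 1  # document element
        else:
            siblings = list(parent)
            preceding = siblings[: siblings.index(node)]
            position = sum(1 for s in preceding if s.tag == node.tag) + 1
        steps.append(f"{node.tag}[{position}]")
        node = parent
    return "/".join(reversed(steps))


def duplicate_report(root, attr="id"):
    """Return a <duplicates> element listing every element whose attr value is not unique."""
    parent_of = {child: parent for parent in root.iter() for child in parent}
    # root.iter() walks in document order, like //*[@id] in the stylesheet
    with_attr = [el for el in root.iter() if attr in el.attrib]
    # Counter plays the role of key('ids', @id): occurrences per attribute value
    counts = Counter(el.get(attr) for el in with_attr)

    report = ET.Element("duplicates")
    for el in with_attr:
        value = el.get(attr)
        if counts[value] > 1:
            ET.SubElement(report, "duplicate", {
                "id": value,
                "dup-count": str(counts[value]),
                "node-xpath": element_path(parent_of, el),
            })
    return report


if __name__ == "__main__":
    doc = ET.fromstring('<book><ch id="a"/><ch id="b"><p id="a"/></ch></book>')
    print(ET.tostring(duplicate_report(doc), encoding="unicode"))
```

Calling duplicate_report(root, attr="ref") would report duplicated ref values instead; as in the stylesheet, the output attribute is still called id.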

Wednesday 4 June 2014

XSPARQL - a first attempt!

XSPARQL is an easy way of combining SPARQL queries with XQuery syntax. I thought I would give it a go with data scraped from the schema.org markup on my own walks blog. The specific page I chose was The South West Coast Path - Marazion to Porthleven walk. I wasn't going to do anything complicated, just attempt to pull out the co-ordinates and names of the places featured in the walk posting.

First I needed an implementation of the XSPARQL specification, which I found at http://xsparql.deri.org/.

Then I needed an RDF source. This was created by scraping the blog posting mentioned above with Apache Any23 (Anything To Triples) via its Live Service Demo, requesting RDF/XML output, which was saved as any23.org.rdf.

The XSPARQL code was then put together. It seemed to be whitespace sensitive in some instances and took a few attempts to get working. The code below was saved to a file named query.xs:

declare namespace place = "http://schema.org/Place/";
declare namespace geo = "http://schema.org/GeoCoordinates/";

<places>
{ for $Place $Name from <any23.org.rdf>
  where { $Place place:name $Name }
  order by $Name
  return <place obj="{$Place}" name="{$Name}" >
         { for $Name $Geo $lat $long from <any23.org.rdf>
           where { $Place place:geo  $Geo.
   $Place place:name $Name.
   $Geo geo:latitude $lat.
   $Geo geo:longitude $long}
           return <geo> 
   <lat>{ $lat }</lat>
   <long>{$long}</long>
  </geo>
         }
</place>
}
</places>
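The nested loops in the query amount to a join: the outer query finds each named place, and the inner query, re-run for each $Place, follows that place's geo node to its latitude and longitude. The Python sketch below mimics that join over an in-memory list of (subject, predicate, object) tuples. It only illustrates the logic; it is not how XSPARQL evaluates queries, it leaves out the inner query's extra $Name binding, and the triples (including the g0 node label) are hypothetical.

```python
# Property URI prefixes, as declared for place: and geo: in query.xs
PLACE = "http://schema.org/Place/"
GEO = "http://schema.org/GeoCoordinates/"


def places_with_coordinates(triples):
    """Emulate the two nested for/from/where loops over (s, p, o) tuples.

    Outer loop: every place with a name, ordered by name.
    Inner loop: for that same place, follow its geo node to latitude/longitude.
    """
    def objects(subject, predicate):
        return [o for s, p, o in triples if s == subject and p == predicate]

    # { $Place place:name $Name } ... order by $Name
    named = sorted(
        ((s, o) for s, p, o in triples if p == PLACE + "name"),
        key=lambda pair: pair[1],
    )
    result = []
    for place, name in named:
        # A place without geo data gets an empty list here, much as the query
        # should emit a <place> element with no <geo> child.
        coords = [
            (lat, lon)
            for geo in objects(place, PLACE + "geo")
            for lat in objects(geo, GEO + "latitude")
            for lon in objects(geo, GEO + "longitude")
        ]
        result.append((place, name, coords))
    return result


if __name__ == "__main__":
    # Hypothetical triples: b0/b3 and the coordinates mirror the result
    # document in this post; the geo node label g0 is invented.
    triples = [
        ("b3", PLACE + "name", "St Michael's Mount"),
        ("b0", PLACE + "name", "Marazion"),
        ("b0", PLACE + "geo", "g0"),
        ("g0", GEO + "latitude", "50.118267"),
        ("g0", GEO + "longitude", "-5.4776716"),
    ]
    for row in places_with_coordinates(triples):
        print(row)
```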

The implementation was then invoked with the following command line:

java -jar cli-0.5-jar-with-dependencies.jar query.xs -f result.xml

This generated the result document:


<places>
   <place name="Marazion" obj="b0">
      <geo>
         <lat>50.118267</lat>
         <long>-5.4776716</long>
      </geo>
   </place>
   <place name="Porthleven" obj="b1">
      <geo>
         <lat>50.118267</lat>
         <long>-5.4776716</long>
      </geo>
   </place>
   <place name="Prussia Cove Smugglers" obj="b2">
      <geo>
         <lat>50.101272</lat>
         <long>-5.4157501</long>
      </geo>
   </place>
   <place name="St Michael's Mount" obj="b3">
      <geo>
         <lat>50.116836</lat>
         <long>-5.4779291</long>
      </geo>
   </place>
</places>

Funky stuff!