XML XSLT and the Universe

Tuesday, 3 March 2020

Prevent closed elements

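There seem to be a lot of websites that offer methods of preventing self-closed elements. Not sure if this is of any relevance, but if we put a zero-width space character (U+200B) inside the element then it does offer a good method of display for the rendered element, since the element is no longer empty and keeps separate start and end tags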

which will render as
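As a minimal sketch of the idea, assuming Python's standard `xml.etree.ElementTree` serializer (the post does not name a serializer), an element with no content is written in its self-closed form, while the same element holding only a zero-width space keeps separate start and end tags. The helper `serialize` is hypothetical.

```python
import xml.etree.ElementTree as ET

ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE, &#8203; as a character reference


def serialize(tag, text=None, encoding="unicode"):
    """Hypothetical helper: serialize a single element with optional text."""
    elem = ET.Element(tag)
    elem.text = text
    return ET.tostring(elem, encoding=encoding)


if __name__ == "__main__":
    print(serialize("div"))              # <div />
    print(repr(serialize("div", ZWSP)))  # '<div>\u200b</div>'
    # With an ASCII output encoding the invisible character becomes a reference
    print(serialize("div", ZWSP, encoding="us-ascii"))  # b'<div>&#8203;</div>'
```

ElementTree also accepts `short_empty_elements=False` on `tostring()`, which writes `<div></div>` without adding any character; whether a comparable switch exists depends on the serializer in use.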

Tuesday, 23 October 2018

Dynamic geographic extent finder

This uses a power set to return all the applicable geographic extents for a requested value. The code is based on the functions described in David Cassel's blog.


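(: Not used by the power set code below: true for a stop word unless the word
   was quoted in the query. $search:stopWords is assumed to be declared elsewhere. :)
declare function search:ignore-word($word as element(cts:word-query)) as xs:boolean {
 $word/cts:text/lower-case(.) = $search:stopWords and not($word/@qtextpre = '"' and $word/@qtextpost = '"')
};


(: Joins the selected values; $index is unused but is passed by xdmp:apply below :)
declare function search:print($index, $values, $selections, $connector) {
    fn:string-join($values[$selections], $connector)
};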

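(: Calls $function once for every non-empty subset of $seq. Each integer $i from
   1 to 2^n - 1 is treated as a bitmask and $targets holds the positions of its
   set bits, i.e. the members of that subset. :)
declare function search:apply-on-power-set($seq as item()*, $function as xdmp:function, $data) {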
    let $len := fn:count($seq)
    for $i in (1 to xs:int(math:pow(2, $len) - 1))
    let $targets :=
        for $bit in (1 to $len)
        let $shifted :=
            if ($bit > 1) then
                xs:int($i div (math:pow(2, $bit - 1)))
            else $i
        return
            if (math:fmod($shifted, 2) eq 1) then
                $bit
            else ()
    return
        xdmp:apply($function, $i, $seq, $targets, $data)
};

declare variable $search:extents as xs:string+ := ('E', 'W', 'S', 'N.I.', 'E.U.');

declare variable $search:extentcombinations as xs:string+ := search:apply-on-power-set(($search:extents), xdmp:function(xs:QName("search:print")), '+');

declare function search:extent-combinations($extents as xs:string+) as xs:string* {
    distinct-values(
        for $e in $extents
        return
            (: 'E$|E\+' matches a bare E but not the E of E.U. :)
            if ($e = ('england', 'E')) then $search:extentcombinations[matches(., 'E$|E\+')]
            else if ($e = ('wales', 'W')) then $search:extentcombinations[matches(., 'W')]
            else if ($e = ('scotland', 'S')) then $search:extentcombinations[matches(., 'S')]
            else if ($e = ('ni', 'N.I.')) then $search:extentcombinations[matches(., 'N\.I\.')]
            else if ($e = ('eu', 'E.U.')) then $search:extentcombinations[matches(., 'E\.U\.')]
            else ()
    )
};
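The bitmask technique behind search:apply-on-power-set, and the regex filtering in search:extent-combinations, can be sketched outside MarkLogic in Python. This is a port for illustration only: the names `power_set_combinations`, `extent_combinations`, `ALIASES` and `PATTERNS` are hypothetical, and unlike distinct-values(), which makes no ordering promise, the sketch keeps the first occurrence of each combination in order.

```python
import re

EXTENTS = ["E", "W", "S", "N.I.", "E.U."]

# Hypothetical lookup tables mirroring the if/else chain in search:extent-combinations
ALIASES = {"england": "E", "wales": "W", "scotland": "S", "ni": "N.I.", "eu": "E.U."}
PATTERNS = {
    "E": r"E$|E\+",  # bare E, not the E of E.U.
    "W": r"W",
    "S": r"S",
    "N.I.": r"N\.I\.",
    "E.U.": r"E\.U\.",
}


def power_set_combinations(values, connector="+"):
    """Join every non-empty subset of values, using 1..2^n - 1 as bitmasks."""
    combos = []
    for i in range(1, 2 ** len(values)):
        # bit b of i (counting from 0) selects values[b]
        selected = [v for b, v in enumerate(values) if (i >> b) & 1]
        combos.append(connector.join(selected))
    return combos


COMBINATIONS = power_set_combinations(EXTENTS)


def extent_combinations(extents):
    """Return every combination containing any requested extent, without duplicates."""
    result = []
    for e in extents:
        pattern = PATTERNS.get(ALIASES.get(e, e))
        if pattern is None:
            continue  # unknown extent, like the final else () branch
        for combo in COMBINATIONS:
            # re.search, like fn:matches, is not anchored
            if re.search(pattern, combo) and combo not in result:
                result.append(combo)
    return result


if __name__ == "__main__":
    print(power_set_combinations(["a", "b", "c"]))
    # ['a', 'b', 'a+b', 'c', 'a+c', 'b+c', 'a+b+c']
    print(len(extent_combinations(["wales", "scotland"])))  # 24
```

With five extents there are 2^5 - 1 = 31 combinations, and each extent appears in 16 of them.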

Tuesday, 2 October 2018

Bluetooth not loading in Windows 10

Issue

When Windows 10 starts, the Bluetooth driver fails to load and refuses to start manually from Services

Resolution

Press Windows key + X, then select Device Manager. Delete (uninstall) the Bluetooth driver; Windows will replace it, which resolves the issue

Thursday, 2 August 2018

Returning xml data from a json file with XSLT 3

XSLT 3 has built-in functions for working with JSON. A single call retrieves a JSON file and converts it to XML:


 <xsl:sequence select="json-to-xml(unparsed-text($my-uri))"/>
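To show the shape of the result, here is a rough Python approximation of the XML that json-to-xml() returns: each JSON value becomes a map, array, string, number, boolean or null element in the http://www.w3.org/2005/xpath-functions namespace, and map entries carry a key attribute. It is a sketch only: the function name is hypothetical, it ignores the options (such as escape and duplicates) that the real function accepts, and number text is re-serialised by Python rather than copied from the input.

```python
import json
import xml.etree.ElementTree as ET

FN = "http://www.w3.org/2005/xpath-functions"


def json_to_xml(text):
    """Rough sketch of the XPath 3.1 json-to-xml() element mapping."""

    def build(value, key=None):
        # bool must be tested before int/float because bool is a subclass of int
        if isinstance(value, dict):
            tag = "map"
        elif isinstance(value, list):
            tag = "array"
        elif isinstance(value, bool):
            tag = "boolean"
        elif value is None:
            tag = "null"
        elif isinstance(value, (int, float)):
            tag = "number"
        else:
            tag = "string"
        elem = ET.Element(f"{{{FN}}}{tag}")
        if key is not None:
            elem.set("key", key)
        if tag == "map":
            for k, v in value.items():
                elem.append(build(v, k))
        elif tag == "array":
            for v in value:
                elem.append(build(v))
        elif tag == "boolean":
            elem.text = "true" if value else "false"
        elif tag == "number":
            elem.text = json.dumps(value)
        elif tag == "string":
            elem.text = value
        return elem

    return build(json.loads(text))


if __name__ == "__main__":
    ET.register_namespace("fn", FN)
    root = json_to_xml('{"name": "Ada", "langs": ["XSLT", "XQuery"], "active": true}')
    print(ET.tostring(root, encoding="unicode"))
```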

Friday, 27 July 2018

xquery return a regex matched string

A useful function that returns the string matched by a regex pattern. Note that this only works for a single match in the string: the computed length covers every match, so several matches give a wrong result


 declare function local:get-string-match($string as xs:string?, $regex as xs:string)  as xs:string? {
 if (matches($string,$regex)) then
  let $length := string-length($string) - string-length(replace($string, $regex,''))
  let $start := string-length(tokenize($string, $regex)[1]) + 1
  return substring($string, $start, $length)
 else ()
 } ;
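A Python port makes the single-match restriction easy to see: the length is the combined length of every match, so a second match stretches the returned substring. The name `get_string_match` is hypothetical; `re.search` is shown for comparison because it returns only the first match (inside XQuery 3.0, fn:analyze-string offers a similar first-match route).

```python
import re


def get_string_match(string, regex):
    """Python port of local:get-string-match (hypothetical name)."""
    if string is None or not re.search(regex, string):
        return None
    # Total length of ALL matches: original length minus length with matches removed
    length = len(string) - len(re.sub(regex, "", string))
    # The first match begins right after the first token (0-based here, 1-based in XQuery)
    start = len(re.split(regex, string)[0])
    return string[start:start + length]


if __name__ == "__main__":
    print(get_string_match("order 12345 shipped", r"\d+"))  # 12345
    print(get_string_match("a1b22c", r"\d+"))               # 1b2, not 1
    print(re.search(r"\d+", "a1b22c").group())              # 1
```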

Wednesday, 6 June 2018

Resolve curl's double header when requesting from a digest authentication source in an XSLT pipeline

When curl makes a request against a source that uses digest authentication, the dumped headers contain two header blocks: a 401 response followed by the actual response to the request. This is because, with the --digest flag, curl first receives a 401 that carries the nonce and other challenge parameters, and then uses them to make the second request that returns the real data.

A sample request would be something like:

curl http://www.mysite.com --digest -u user:pw --dump-header headers.txt

An example of the response from such a request is shown below:

HTTP/1.1 401 Unauthorized
Server: MarkLogic
WWW-Authenticate: Digest realm="public", qop="auth", nonce="563c5b25a3dbad7b6aa54cdc7dc0d294", opaque="56e3687fefa06c25"
Content-Type: text/html; charset=utf-8
Content-Length: 209
Connection: Keep-Alive
Keep-Alive: timeout=5

HTTP/1.1 200 OK
Server: MarkLogic
Content-Type: application/xml; charset=UTF-8
Content-Length: 175
Connection: Keep-Alive
Keep-Alive: timeout=5

When this is part of an XML/XSLT workflow that makes use of the returned header file, the following code resolves to the correct header, with these considerations:

This is merely a code snippet that must be used within an XSLT stylesheet
The code makes use of functions from the http://www.xsltfunctions.com library (xmlns:functx="http://www.functx.com")
The $actualHeader variable is the header that is required


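<!-- the trailing slash on 'mydir/' makes headers.txt resolve inside mydir rather than next to it -->
<xsl:variable name="unparsedText"  select="unparsed-text(resolve-uri('headers.txt', resolve-uri('mydir/', static-base-uri())))"/>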
<xsl:variable name="actualHeader"  
  select="if (tokenize($unparsedText, 'HTTP')[3]) then 
       substring($unparsedText, functx:index-of-string-last($unparsedText,'HTTP')) 
   else $unparsedText"/>

<xsl:function name="functx:index-of-string-last" as="xs:integer?">
  <xsl:param name="arg" as="xs:string?"/>
  <xsl:param name="substring" as="xs:string"/>

  <xsl:sequence select="
  functx:index-of-string($arg, $substring)[last()]
 "/>

</xsl:function>

<xsl:function name="functx:index-of-string" as="xs:integer*">
  <xsl:param name="arg" as="xs:string?"/>
  <xsl:param name="substring" as="xs:string"/>

  <xsl:sequence select="
  if (contains($arg, $substring))
  then (string-length(substring-before($arg, $substring))+1,
        for $other in
           functx:index-of-string(substring-after($arg, $substring),
                               $substring)
        return
          $other +
          string-length(substring-before($arg, $substring)) +
          string-length($substring))
  else ()
 "/>

</xsl:function>
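Where the header file is post-processed outside the stylesheet, the same "keep the last block" idea can be sketched in Python. This is an illustration under stated assumptions, not part of the original pipeline: the function names are hypothetical, and it treats only lines beginning with `HTTP/` as status lines (a stricter rule than searching for `HTTP` anywhere), so a header value that happens to contain "HTTP" does not start a new block.

```python
def actual_header(dump):
    """Return the last response header block from a curl --dump-header file."""
    lines = dump.splitlines(keepends=True)
    starts = [i for i, line in enumerate(lines) if line.startswith("HTTP/")]
    if len(starts) < 2:
        return dump  # a single response: nothing to strip
    return "".join(lines[starts[-1]:])


def status_code(header):
    """Read the numeric status from the first line, e.g. 'HTTP/1.1 200 OK' -> 200."""
    return int(header.splitlines()[0].split()[1])


SAMPLE = (
    "HTTP/1.1 401 Unauthorized\r\n"
    'WWW-Authenticate: Digest realm="public", qop="auth"\r\n'
    "Content-Length: 209\r\n"
    "\r\n"
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: application/xml; charset=UTF-8\r\n"
    "Content-Length: 175\r\n"
    "\r\n"
)

if __name__ == "__main__":
    print(status_code(actual_header(SAMPLE)))  # 200
```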

Thursday, 24 May 2018

Remove illegal XML characters

A recent issue, where some text extracted with PDFBox needed to be transformed using XSLT, highlighted problems with illegal characters in the content. This was part of an Ant task, so a simple replaceregexp solved the issue. The matter was complicated by characters that had not been correctly translated to UTF-8, notably the Greek π character. Therefore a couple of regexes were required: the first matches all characters that are not allowed by the XML specification, and the second is specific to the π character


<target name="remove-illegal">
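 <!-- strip every character outside the XML 1.0 Char production; \x{...} is needed for code points above U+FFFF -->
 <replaceregexp match="[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\x{10000}-\x{10FFFF}]" replace="" flags="g" encoding="utf-8">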
  <fileset dir="PDFbox-text" includes="*.txt **/*.txt"/>
 </replaceregexp>
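 <!-- \xCF is U+00CF, the form the first UTF-8 byte of π (0xCF 0x80) takes when the text is mis-decoded as Latin-1 -->
 <replaceregexp match="\xCF" replace="" flags="gs" encoding="utf-8">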
  <fileset dir="PDFbox-text" includes="*.txt **/*.txt"/>
 </replaceregexp>
</target>
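The first clean-up can also be sketched with Python's `re` module, for cases where the text is handled outside Ant. This is an illustration, not part of the original build: the names are hypothetical, and it covers only the XML 1.0 character filter, not the π-specific regex.

```python
import re
import xml.etree.ElementTree as ET

# Complement of the XML 1.0 Char production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
ILLEGAL_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def remove_illegal_xml_chars(text):
    """Drop every character that may not appear in an XML 1.0 document."""
    return ILLEGAL_XML_CHARS.sub("", text)


if __name__ == "__main__":
    dirty = "pi = \u03c0\x00, tab\tok\x0b\ufffe"
    clean = remove_illegal_xml_chars(dirty)
    print(repr(clean))  # 'pi = π, tab\tok'
    # The cleaned text can now be wrapped in an element and parsed
    print(ET.fromstring(f"<t>{clean}</t>").text == clean)  # True
```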