Wednesday, 6 June 2018

Resolve curl's double header when requesting from a digest authentication source in an XSLT pipeline

When using curl to make a request against a source that uses digest authentication, the dumped headers will contain two header blocks: the first is a 401 response, followed by the headers of the actual response. This is because, with the --digest flag, curl first sends a request that returns a 401 carrying the nonce and the other challenge values, and then uses them to make the second, authenticated request that gets the real data.

A sample request would be something like:

curl http://www.mysite.com --digest -u user:pw --dump-header headers.txt

An example of the response from such a request is shown below:

HTTP/1.1 401 Unauthorized
Server: MarkLogic
WWW-Authenticate: Digest realm="public", qop="auth", nonce="563c5b25a3dbad7b6aa54cdc7dc0d294", opaque="56e3687fefa06c25"
Content-Type: text/html; charset=utf-8
Content-Length: 209
Connection: Keep-Alive
Keep-Alive: timeout=5

HTTP/1.1 200 OK
Server: MarkLogic
Content-Type: application/xml; charset=UTF-8
Content-Length: 175
Connection: Keep-Alive
Keep-Alive: timeout=5

When the returned header file is used as part of an XML/XSLT workflow, the following code will resolve to the correct header, with the following considerations:

  • This is only a code snippet and needs to be placed within an XSLT stylesheet
  • The code makes use of functions from the http://www.xsltfunctions.com library (xmlns:functx="http://www.functx.com")
  • The $actualHeader variable is the header that is required

<xsl:variable name="unparsedText"  select="unparsed-text(resolve-uri('headers.txt', resolve-uri('mydir', static-base-uri())))"/>
<xsl:variable name="actualHeader"  
  select="if (tokenize($unparsedText, 'HTTP')[3]) then 
       substring($unparsedText, functx:index-of-string-last($unparsedText,'HTTP')) 
   else $unparsedText"/>

<xsl:function name="functx:index-of-string-last" as="xs:integer?">
  <xsl:param name="arg" as="xs:string?"/>
  <xsl:param name="substring" as="xs:string"/>

  <xsl:sequence select="
  functx:index-of-string($arg, $substring)[last()]
 "/>

</xsl:function>

<xsl:function name="functx:index-of-string" as="xs:integer*">
  <xsl:param name="arg" as="xs:string?"/>
  <xsl:param name="substring" as="xs:string"/>

  <xsl:sequence select="
  if (contains($arg, $substring))
  then (string-length(substring-before($arg, $substring))+1,
        for $other in
           functx:index-of-string(substring-after($arg, $substring),
                               $substring)
        return
          $other +
          string-length(substring-before($arg, $substring)) +
          string-length($substring))
  else ()
 "/>

</xsl:function>
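
As a rough alternative that avoids the functx dependency, the last header block can also be isolated with a single regular expression. This is only a sketch: the headerFile parameter and the main template name are assumptions made for illustration, and it assumes that the string 'HTTP/' only ever appears at the start of a status line. It also pulls the status code out of the resolved header as an example of using it.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:output method="text"/>

  <!-- Hypothetical parameter: location of the curl header dump, relative to this stylesheet -->
  <xsl:param name="headerFile" as="xs:string" select="'headers.txt'"/>

  <xsl:template name="main">
    <xsl:variable name="unparsedText"
      select="unparsed-text(resolve-uri($headerFile, static-base-uri()))"/>

    <!-- With the 's' flag '.' also matches newlines, so the greedy '.*' consumes
         everything up to the last 'HTTP/'. If there is only one header block,
         the text is returned unchanged. -->
    <xsl:variable name="actualHeader"
      select="replace($unparsedText, '^.*(HTTP/)', '$1', 's')"/>

    <!-- The first line of the block is the status line, e.g. 'HTTP/1.1 200 OK' -->
    <xsl:variable name="statusLine" select="tokenize($actualHeader, '\r?\n')[1]"/>

    <!-- The second space-separated token of the status line is the status code -->
    <xsl:value-of select="tokenize($statusLine, ' ')[2]"/>
  </xsl:template>

</xsl:stylesheet>
```

With Saxon, for instance, this could be started at the named template using the -it:main option. For the sample response above it should output 200.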


Thursday, 24 May 2018

Remove illegal XML characters

A recent issue, where some text extracted by PDFBox needed to be transformed using XSLT, highlighted problems with illegal characters in the content. This was part of an Ant task, so a simple replaceregexp solved the issue. The matter was complicated by characters that had not been correctly translated to UTF-8, notably the Greek π character, so a couple of regexes were required. The first matches all characters that are not permitted by the XML specification; the second is specific to the π character.

<target name="remove-illegal">
 <replaceregexp match="[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\x{10000}-\x{10FFFF}]" replace="" flags="g" encoding="utf-8">
  <fileset dir="PDFbox-text" includes="*.txt **/*.txt"/>
 </replaceregexp>
 <replaceregexp match="[\xcf]" replace="" flags="gs" encoding="utf-8">
  <fileset dir="PDFbox-text" includes="*.txt **/*.txt"/>
 </replaceregexp>
</target>


Thursday, 1 March 2018

XSLT: Format number as an alphabetic value

Here is a useful pair of functions that will format a numeric value as an alphabetic value, ideal for generating alphabetic list numbers from integers. This is not limited to the 26 alphabetic characters: for each further multiple of 26 the letter is repeated once more, so that the list will grow from a-z, aa-zz, aaa-zzz, etc.

xmlns:xslttricks="http://xslt-tricks.blogspot.co.uk/functions"


 <xsl:function name="xslttricks:format-number-as-alpha">
  <xsl:param name="number" as="xs:decimal"/> 
  <xsl:value-of select="xslttricks:format-number-as-alpha($number, ())"/>
 </xsl:function>

 <xsl:function name="xslttricks:format-number-as-alpha">
  <xsl:param name="number" as="xs:decimal"/>
  <xsl:param name="case" as="xs:string?"/>
  <xsl:variable name="int" select="xs:integer(round($number))"/>
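  <!-- Shift to zero-based before mod/div so that 26 maps to 'z' rather than to an empty string -->
  <xsl:variable name="mod" select="(($int - 1) mod 26) + 1"/>
  <xsl:variable name="times" select="xs:integer(floor(($int - 1) div 26) + 1)"/>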
  <xsl:variable name="alpha" select="('a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z')"/>
  <xsl:variable name="numberstring" select="string-join((for $n in 1 to $times return $alpha[$mod]), '')"/>
  <xsl:value-of select="if ($case = ('upper', 'uppercase')) then upper-case($numberstring) else $numberstring"/>
 </xsl:function>
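
To illustrate the numbering scheme described above, here is a minimal, self-contained sketch. It is a compact variant and not the functions themselves: the ex namespace, the ex:alpha function and the main template are hypothetical names used only for this example, and it assumes positive integers as input.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:ex="http://example.com/sketch"
    exclude-result-prefixes="xs ex">

  <xsl:output method="text"/>

  <!-- Hypothetical helper: repeats one letter, so 1-26 give a-z, 27-52 give aa-zz, and so on -->
  <xsl:function name="ex:alpha" as="xs:string">
    <xsl:param name="n" as="xs:integer"/>
    <!-- Zero-based position in the alphabet, added to the codepoint of 'a' (97) -->
    <xsl:variable name="letter" select="codepoints-to-string(97 + ($n - 1) mod 26)"/>
    <!-- Number of repetitions: 1 for 1-26, 2 for 27-52, 3 for 53-78, ... -->
    <xsl:variable name="times" select="($n - 1) idiv 26 + 1"/>
    <xsl:sequence select="string-join(for $i in 1 to $times return $letter, '')"/>
  </xsl:function>

  <!-- Prints one 'number = letters' line for each sample value -->
  <xsl:template name="main">
    <xsl:for-each select="(1, 26, 27, 52, 53)">
      <xsl:value-of select="concat(., ' = ', ex:alpha(.), '&#10;')"/>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Started at the main template, this should print 1 = a, 26 = z, 27 = aa, 52 = zz and 53 = aaa, one per line.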