Thursday 23 October 2014

CSV to XML with a Quick and Dirty XSLT

Issue

A CSV file has to be converted into XML.

Resolution

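The following XSLT uses simple tokenization to generate XML from plain separated text, with the separator defined by the parameter 'seperator' (spelled this way in the stylesheet). The example below uses a tab character. Note that the separator is passed to tokenize() and is therefore treated as a regular expression, and that quoted fields are not handled.

Other parameters control whether a header row is included (header-row) and the names of the elements that make up the table, row and cell structure (tableName, rowName and cellName). Any non-empty value of header-row, including 'false', is treated as meaning a header row is present; pass an empty string to switch it off.

The transformation is XSLT 2.0 and can be invoked with Saxon using the following command line, where thisXSLT.xsl is the code below: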

java -jar saxon.jar -it:main -xsl:thisXSLT.xsl -o:result.xml "csvFile=myfile.csv"

XSLT

<xsl:stylesheet 
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
  xmlns:fn="http://www.w3.org/2005/xpath-functions" 
  xmlns:local="http://www.griffmonster.org" 
  xmlns:xs="http://www.w3.org/2001/XMLSchema" 
 
  version="2.0"
  exclude-result-prefixes="xsl xs fn local">
 <xsl:output indent="yes" encoding="UTF-8" method="xml"/>
 
 <!--
 
 a more complex routine is available at http://rosettacode.org/wiki/Csv-to-xml.xslt
 
 -->

 <xsl:param name="csvFile" as="xs:string" />
 <xsl:param name="header-row" as="xs:string" select="'true'" />
 <xsl:param name="seperator" as="xs:string"  select="'&#9;'"/>
 <xsl:param name="tableName" as="xs:string"  select="'legislation'"/>
 <xsl:param name="rowName" as="xs:string"  select="'item'"/>
 <xsl:param name="cellName" as="xs:string"  select="'data'"/>
 
 <xsl:template match="/" name="main">
  <xsl:copy-of select="local:csv-to-xml($csvFile)" />
 </xsl:template>

 <!-- if this function is available from xslt 3 then use it otherwise use the makeshift expression  -->
 <xsl:function name="local:unparsed-text-lines" as="xs:string+">
  <xsl:param name="href" as="xs:string" />
  <xsl:sequence use-when="function-available('unparsed-text-lines')" 
    select="fn:unparsed-text-lines($href)" />
  <xsl:sequence use-when="not(function-available('unparsed-text-lines'))" 
    select="tokenize(unparsed-text($href), '\r\n|\r|\n')[not(position()=last() and .='')]" />
 </xsl:function>

 <xsl:function name="local:csv-to-xml" as="node()+">
  <xsl:param name="href" as="xs:string" />
  <xsl:variable name="header-row" as="xs:string*" 
    select="if ($header-row != '') then 
       tokenize(local:unparsed-text-lines($href)[1], $seperator) 
      else ()"/>
  <xsl:element name="{$tableName}">
   <xsl:for-each select="local:unparsed-text-lines($href)">
    <xsl:choose>
     <xsl:when test="position() = 1 and exists($header-row)">
     </xsl:when>
     <xsl:otherwise>
      <xsl:element name="{$rowName}">
       <xsl:variable name="tokens"  as="xs:string*" select="tokenize(., $seperator)"/>
       <xsl:for-each select="$tokens">
        <xsl:variable name="position" as="xs:integer" 
          select="position()"/>
        <xsl:variable name="celltitle" as="xs:string?" 
          select="if (exists($header-row)) then 
             $header-row[$position]
            else ()"/>
        <xsl:element name="{$cellName}">
         <xsl:if test="exists($header-row)">
          <xsl:attribute name="title" select="$celltitle"/>
         </xsl:if>
         <xsl:value-of select="."/>
        </xsl:element>
       </xsl:for-each>
      </xsl:element>
     </xsl:otherwise>
    </xsl:choose>
    
   </xsl:for-each>
  </xsl:element>
 </xsl:function>
</xsl:stylesheet>
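
For anyone wanting to sanity-check the output outside Saxon, the same tokenization approach can be sketched in Python. This is only an illustration under stated assumptions: the function name csv_to_xml and its keyword arguments are hypothetical, chosen to mirror the stylesheet parameters, and like the XSLT it makes no attempt to handle quoted fields.

```python
import xml.etree.ElementTree as ET


def csv_to_xml(text, separator="\t", header_row=True,
               table_name="legislation", row_name="item", cell_name="data"):
    """Build a table/row/cell tree from separated text (hypothetical helper).

    Mirrors the stylesheet's approach: plain splitting on the separator,
    with no support for quoted or escaped fields.
    """
    lines = text.splitlines()
    # The first line supplies the cell titles when a header row is present.
    headers = lines[0].split(separator) if header_row and lines else []
    table = ET.Element(table_name)
    for line in (lines[1:] if header_row else lines):
        row = ET.SubElement(table, row_name)
        for index, token in enumerate(line.split(separator)):
            cell = ET.SubElement(row, cell_name)
            # Cells beyond the header width simply get no title attribute.
            if index < len(headers):
                cell.set("title", headers[index])
            cell.text = token
    return table


sample = "title\tyear\nData Protection Act\t1998\n"
print(ET.tostring(csv_to_xml(sample), encoding="unicode"))
```

Unlike the stylesheet, this sketch splits on the literal separator rather than a regular expression, so a separator such as '|' needs no escaping here.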

Friday 3 October 2014

Make structured xml from flat source with XSLT 1

Issue

Structured XML had to be generated from a flat XML source, but only XSLT 1 could be used for the transformation. All paragraph elements needed to be nested within the preceding subsection element, and all subparagraph elements within the preceding paragraph element. An additional requirement was for textual content to be contained within a <text/> element.

Resolution

Source:

<sectiontitle>sample content text</sectiontitle>
<subsection>sample content text</subsection>
<paragraph>sample content text</paragraph>
<paragraph>sample content text</paragraph>
<subparagraph>sample content text</subparagraph>
<subparagraph>sample content text</subparagraph>
<subsection>sample content text</subsection>

Required Output:

<sectiontitle><text>sample content text</text></sectiontitle>
<subsection>
    <text>sample content text</text>
    <paragraph><text>sample content text</text></paragraph>
    <paragraph>
        <text>sample content text</text>
        <subparagraph><text>sample content text</text></subparagraph>
        <subparagraph><text>sample content text</text></subparagraph>
    </paragraph>
</subsection>
<subsection><text>sample content text</text></subsection>


XSLT

<xsl:template match="node()|@*">
    <xsl:copy>
        <xsl:apply-templates select="node()|@*" />
    </xsl:copy>
</xsl:template>

<xsl:template match="subsection">
    <subsection>
        <text>
            <xsl:value-of select="." />
        </text>
        <xsl:apply-templates
        select="following-sibling::paragraph
        [generate-id(preceding-sibling::subsection[1])
        = generate-id(current())]"  mode="nest" />
    </subsection>
</xsl:template>

 <xsl:template match="paragraph" mode="nest">
    <paragraph>
        <text>
            <xsl:value-of select="." />
        </text>
        <xsl:apply-templates 
            select="following-sibling::subparagraph
            [generate-id(preceding-sibling::paragraph[1])
            = generate-id(current())]"  mode="nest" />
    </paragraph>
</xsl:template>

<xsl:template match="subparagraph" mode="nest">
    <xsl:copy>
        <text>
            <xsl:apply-templates />
        </text>
    </xsl:copy>
</xsl:template>
 
<xsl:template match="paragraph"/>
 
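<xsl:template match="subparagraph"/>

<!-- wrap the section title content in <text/>, as shown in the required output -->
<xsl:template match="sectiontitle">
    <xsl:copy>
        <text>
            <xsl:apply-templates />
        </text>
    </xsl:copy>
</xsl:template>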

Points to note:

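  • The xsl:value-of could be replaced with xsl:apply-templates if the content contains anything other than a text node
  • The XML source must be structured consistently: the grouping relies on each paragraph following its subsection, and each subparagraph following its paragraph, as siblings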
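
The grouping rule that the templates implement (each paragraph belongs to the nearest preceding subsection, each subparagraph to the nearest preceding paragraph) can also be sketched procedurally in Python. The function name nest and the wrapper element root are assumptions made for the illustration and are not part of the original transformation.

```python
import xml.etree.ElementTree as ET


def nest(flat_root):
    """Return a new tree with paragraphs nested in subsections and
    subparagraphs nested in paragraphs (hypothetical helper)."""
    out = ET.Element(flat_root.tag)
    subsection = paragraph = None
    for child in flat_root:
        if child.tag == "paragraph" and subsection is not None:
            parent = subsection
        elif child.tag == "subparagraph" and paragraph is not None:
            parent = paragraph
        else:
            parent = out  # subsections, orphans and other elements stay at the top level
        node = ET.SubElement(parent, child.tag)
        ET.SubElement(node, "text").text = child.text
        # Remember the most recent containers for the elements that follow.
        if child.tag == "subsection":
            subsection, paragraph = node, None
        elif child.tag == "paragraph":
            paragraph = node
    return out


flat = ET.fromstring(
    "<root><sectiontitle>t</sectiontitle><subsection>s1</subsection>"
    "<paragraph>p1</paragraph><paragraph>p2</paragraph>"
    "<subparagraph>sp1</subparagraph><subparagraph>sp2</subparagraph>"
    "<subsection>s2</subsection></root>"
)
print(ET.tostring(nest(flat), encoding="unicode"))
```

One difference from the XSLT: a paragraph with no preceding subsection is kept at the top level here, whereas the empty templates in the stylesheet drop it.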

Thursday 4 September 2014

Ant XSLT task for saxon transformations

As the Saxon AntTransform is no longer supported, it is recommended to use the standard Ant XSLT task (see XSLT Task for documentation).

Example code is given below:

<?xml version="1.0" encoding="UTF-8"?>
<project name="opsi" default="transformer" basedir=".">

 <property file="ant.properties"/>
 <property name="source.xml" location="share.xml"/>
 <property name="output.xml" location="output.xml"/>
 <!-- location of the Saxon jar -->
 <property name="saxon.jar" location="saxon/saxon9.jar"/>

 <path id="saxon.classpath">
  <pathelement location="${saxon.jar}"/>
  <pathelement location="saxon/saxon9he.jar"/>
 </path>

 <target name="transformer">  
  <xslt in="${source.xml}"
      out="${output.xml}"
      style="share.xslt"
      processor="trax">
   <factory name="net.sf.saxon.TransformerFactoryImpl"/>
   <classpath refid="saxon.classpath" />
  </xslt>  
 </target>

</project>

Thursday 10 July 2014

Large Text File Viewer

A recent issue required interrogating log files larger than 1 GB. None of the text editors I had could cope with this, including Emacs, Notepad++ and Windows Notepad. Eventually I found the simply named Large Text File Viewer from Swiftgear, which worked a treat. Downloadable as a zip, it requires no installation and the executable is only 572 KB.

Wednesday 25 June 2014

Xpath to generate an xpath string to the current item

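An XPath 2.0 expression, for use in either XQuery or XSLT, that generates the hierarchical path to the current item. Note that the current() function is only available in XSLT; in XQuery, bind the context item to a variable first and compare against that instead.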


string-join(
   (for $node in ancestor::* 
    return 
      concat($node/name(),
              '[', 
              count($node/preceding-sibling::*[name() = $node/name()])+1, 
              ']'
            ),
      concat(name(),
              '[', 
              count(preceding-sibling::*[name() = current()/name()]) + 1, 
              ']'
            )
    ),
  '/')
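
To see the kind of string the expression produces, here is a rough Python equivalent that computes the same positional path for every element in a document. ElementTree has no parent axis, so the sketch builds the paths top-down rather than walking the ancestors; the function name element_paths is made up for the example.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def element_paths(element, prefix=""):
    """Yield (element, path) pairs such as 'doc[1]/sec[2]/p[1]'.

    The index counts same-named siblings up to and including the element,
    which matches count(preceding-sibling::*[name() = ...]) + 1.
    """
    path = prefix or f"{element.tag}[1]"
    yield element, path
    seen = Counter()
    for child in element:
        seen[child.tag] += 1
        yield from element_paths(child, f"{path}/{child.tag}[{seen[child.tag]}]")


doc = ET.fromstring("<doc><sec><p/></sec><sec><p/><p/></sec></doc>")
for _, path in element_paths(doc):
    print(path)
```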


Find Duplicate IDs with XSLT

This little snippet of XSLT is a useful tool to find all duplicate ids within an XML source document and generate a report with the count of duplicates and the XPath to each element that has a duplicate id attribute.

This relies upon the attribute in question being named @id, but it is simple enough to change this to whatever attribute you need to interrogate.

Note that this is XSLT 2 and has been used with the Saxon transformation engine.


<xsl:stylesheet version="2.0" 
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
 
  exclude-result-prefixes="xs">

  <xsl:output indent="yes"/>
  
  <xsl:key name="ids" match="*[@id]" use="@id"/> 

  <xsl:template match="/">
    <duplicates>
      <xsl:apply-templates select="//*[@id]"/>
    </duplicates>
  </xsl:template>


  <xsl:template match="*[@id]">
    <xsl:if test="count(key('ids', @id)) &gt; 1">
      <duplicate 
        id="{@id}" 
        dup-count="{count(key('ids', @id))}" 
        node-xpath="{string-join((for $node in ancestor::* return concat($node/name(),'[', count($node/preceding-sibling::*[name() = $node/name()])+1, ']'),concat(name(),'[', count(preceding-sibling::*[name() = current()/name()]) + 1, ']')
   ),'/')}">
     
      </duplicate>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
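
As a quick cross-check of the report, the counting step (the role played by the key in the stylesheet) can be sketched in Python. The helper name duplicate_ids is hypothetical, and the sketch only reports the counts, not the path to each element.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def duplicate_ids(root, attribute="id"):
    """Return {value: count} for attribute values used more than once."""
    counts = Counter(
        el.get(attribute) for el in root.iter() if el.get(attribute) is not None
    )
    return {value: n for value, n in counts.items() if n > 1}


doc = ET.fromstring('<doc><a id="x"/><b id="y"/><c id="x"/><d/></doc>')
print(duplicate_ids(doc))
```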

Wednesday 4 June 2014

XSPARQL - a first attempt!

XSPARQL is an easy method of using SPARQL queries with an XQuery syntax. I thought I would give this a go with a data scrape from the schema.org markup on my own walks blog. The specific page I chose was The South West Coast Path - Marazion to Porthleven walk. I wasn't going to do anything complicated, just attempt to pull out the co-ordinates and names of the places featured in the walk posting.

First I needed an implementation of the XSPARQL specification, which I found at http://xsparql.deri.org/.

Then I needed an RDF source. This was created by scraping the blog posting mentioned above using the Apache Any23 (Anything To Triples) Live Service Demo and requesting RDF/XML output, which was saved as any23.org.rdf.

The XSPARQL code was then put together. It seemed to be whitespace sensitive in some instances and took a few attempts to get working. The code below was used and saved to a file named query.xs:

declare namespace place = "http://schema.org/Place/";
declare namespace geo = "http://schema.org/GeoCoordinates/";

<places>
{ for $Place $Name from <any23.org.rdf>
  where { $Place place:name $Name }
  order by $Name
  return <place obj="{$Place}" name="{$Name}" >
         { for $Name $Geo $lat $long from <any23.org.rdf>
           where { $Place place:geo  $Geo.
   $Place place:name $Name.
   $Geo geo:latitude $lat.
   $Geo geo:longitude $long}
           return <geo> 
   <lat>{ $lat }</lat>
   <long>{$long}</long>
  </geo>
         }
</place>
}
</places>

The implementation was then invoked with the following command line:

java -jar cli-0.5-jar-with-dependencies.jar query.xs -f result.xml

This generated the result document:


<places>
   <place name="Marazion" obj="b0">
      <geo>
         <lat>50.118267</lat>
         <long>-5.4776716</long>
      </geo>
   </place>
   <place name="Porthleven" obj="b1">
      <geo>
         <lat>50.118267</lat>
         <long>-5.4776716</long>
      </geo>
   </place>
   <place name="Prussia Cove Smugglers" obj="b2">
      <geo>
         <lat>50.101272</lat>
         <long>-5.4157501</long>
      </geo>
   </place>
   <place name="St Michael's Mount" obj="b3">
      <geo>
         <lat>50.116836</lat>
         <long>-5.4779291</long>
      </geo>
   </place>
</places>

Funky stuff!

Tuesday 13 May 2014

Simple XSLT Construct Tester

When it comes to testing XQuery constructs, I find the easiest method is to use MarkLogic's CQ interface. A simple way of doing the same with XSLT is to use a single named template in a stylesheet.

Just create a basic XSLT named tester.xsl, add one named template called 'testConstruct' and call it using the following command line:

java -jar saxon.jar -it:testConstruct -xsl:tester.xsl -o:result.xml

The sample below tests the reconstruction of a provision reference from a test string:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema">

<xsl:template name="testConstruct">
 <xsl:variable name="provisions" select="'Sch. 08 para. 028(02) para. 058(03)A Sch. 15'" />
 <xsl:variable name="schtokens" select="tokenize($provisions, 'Sch. ')" />
 
 <xsl:for-each select="$schtokens">
  <xsl:variable name="paraTokens" select="tokenize(substring-after(., ' '), 'para.')" />
  <xsl:variable name="schNo" select="if (contains(., ' ')) then substring-before(., ' ')   else ." />
  <xsl:value-of select="if (not(matches(., '^\s*$')) and contains(., 'para')) then 
    (concat('Sch. ', $schNo, ' para.', substring-before($paraTokens[2],'('), 
     string-join(for $p in $paraTokens return replace(normalize-space($p), '^[0-9]+', ''),'')) )
    else if (not(matches(., '^\s*$'))) then 
    ( concat('Sch. ', $schNo) ) 
    else ()" />
  <xsl:if test="not(position() = last())">
   <xsl:value-of select="' '"/>
  </xsl:if>
 </xsl:for-each>
</xsl:template>
</xsl:stylesheet>

Wednesday 16 April 2014

FOP External Graphics Caching

A recent issue encountered with FOP 1.0 involved the FOP processor grinding to a halt when a document contained a batch of external images hosted on a separate domain. It appeared to be fine with just two or three images but froze if greater numbers were included. The issue appeared to be centred around a prefetch utility in the image loader. This was attempting to fetch all the images in the document prior to processing and it appeared to be overloading the Apache Server with the request.

This was further substantiated when removing the images and then processing with just a couple, then reprocessing with the next few inserted. This worked because the initial ones had already been cached. However, such a routine is certainly not an efficient work-around.

The resolution was to disable the FOP image caching using a system property, -Dorg.apache.xmlgraphics.image.loader.impl.AbstractImageSessionContext.no-source-reuse=true which was added to the JAVAOPTS environment variable. This is easily done by modifying the FOP batch script that is part of the FOP download, adding the property to the appropriate line:

set JAVAOPTS=-Denv.windir=%WINDIR% -Xmx2048m -Dorg.apache.xmlgraphics.image.loader.impl.AbstractImageSessionContext.no-source-reuse=true

The result of this fix is that the processor only fetches the images when needed, as separate requests, rather than as a bulk load in one request. This worked perfectly: no more hanging, and even documents with hundreds of images were processed within a minute.

More information can be found at http://xmlgraphics.apache.org/commons/image-loader.html

Wednesday 9 April 2014

Determine openssl version in apache

With the Heartbleed vulnerability in circulation, there is a need to determine the version of OpenSSL used on Apache systems.

The affected and unaffected OpenSSL versions are:

  • OpenSSL 1.0.1 through 1.0.1f (inclusive) are vulnerable
  • OpenSSL 1.0.1g is NOT vulnerable
  • OpenSSL 1.0.0 branch is NOT vulnerable
  • OpenSSL 0.9.8 branch is NOT vulnerable

To determine the version of OpenSSL being run by an Apache installation on Windows, open the command line, navigate to the apache/bin directory and use the following line:

openssl version -a
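
The first line of that output can be checked against the list above. The following Python sketch does this purely on the version string; the function name is_heartbleed_vulnerable is hypothetical, and because packaged builds may carry backported fixes without a change of version letter, the result should be treated as a first indication only.

```python
import re

# Affected range taken from the list above: 1.0.1 through 1.0.1f inclusive.
VULNERABLE_SUFFIXES = {"", "a", "b", "c", "d", "e", "f"}


def is_heartbleed_vulnerable(version_output):
    """Classify the first line of `openssl version -a` (hypothetical helper)."""
    match = re.search(r"OpenSSL\s+(\d+\.\d+\.\d+)([a-z]*)", version_output)
    if match is None:
        raise ValueError("no OpenSSL version found in: %r" % version_output)
    base, suffix = match.groups()
    return base == "1.0.1" and suffix in VULNERABLE_SUFFIXES


print(is_heartbleed_vulnerable("OpenSSL 1.0.1e 11 Feb 2013"))
print(is_heartbleed_vulnerable("OpenSSL 1.0.1g 7 Apr 2014"))
```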

To check for the vulnerability on Apache-based sites, use the online tool at:

http://filippo.io/Heartbleed

ipconfig switches

The following switches can be used with the ipconfig command line utility:

Switch                                  Effect
/all                                    Produces a detailed configuration report for all interfaces.
/flushdns                               Removes all entries from the DNS name cache.
/registerdns                            Refreshes all DHCP leases and reregisters DNS names.
/displaydns                             Displays the contents of the DNS resolver cache.
/release <adapter>                      Releases the IP address for a specified interface.
/renew <adapter>                        Renews the IP address for a specified interface.
/showclassid <adapter>                  Displays all the DHCP class IDs allowed for the adapter specified.
/setclassid <adapter> <classID to set>  Changes the DHCP class ID for the adapter specified.
/?                                      Displays this list.

Thursday 3 April 2014

XSLT: processing instructions

A recent requirement entailed the round-tripping of elements to processing instructions and then back again to their original elements using a second transformation. This presented a few challenges.

The elements concerned were of the form:


<err:Warning 
foo="1234567" 
bar="abcd/efgh/jklm">content that records "some message"</err:Warning>

The first part of the round trip is to take the elements, which were all in a single namespace, and recast them as PIs. The attributes are taken through as name-value pairs and the content is taken through as a name-value pair of the form content={content}.


<xsl:template match="err:*">
       <xsl:processing-instruction name="err-{local-name()}">
       <xsl:for-each select="@*">
   <xsl:value-of select="name()"/>
   <xsl:text>="</xsl:text>
   <xsl:value-of select="."/>
   <xsl:text>" </xsl:text>
    </xsl:for-each>
    <xsl:text>content="</xsl:text>
    <xsl:value-of select='.'/>
    <xsl:text>"</xsl:text>
 </xsl:processing-instruction>
</xsl:template>

Running this transformation in Saxon gives the end result of:


<?err-Warning foo="1234567" bar="abcd/efgh/jklm" content="content that records "some message""?>

It must be remembered that PIs do not have attributes. Although the result looks to all intents and purposes like a set of attributes, it is simply a string, and you cannot target a PI attribute with an XPath because it does not exist. It is therefore a little more of a challenge to get these back to their original attributes. In XSLT 2 we can use regular expressions to recreate them.

Firstly, filter out the content={value} name-value pair, then use the remaining string as the source for an analyze-string with a regular expression of ([\c]+)="(.*?") where the second group includes the end quote of the name-value pair. This end quote then needs removing with the translate() or replace() function.


<xsl:template match="processing-instruction()[starts-with(name(),'err-')]">
 <xsl:variable name="PI" select="substring-before(.,'content=')" />
 <xsl:variable name="PIname" select="substring-after(name(), 'err-')"/>
 <xsl:variable name="content" select="substring-after(., 'content="')"/>
 <xsl:element name="err:{$PIname}">
  <xsl:analyze-string select="$PI" regex='([\c]+)="(.*?")'>
    <xsl:matching-substring>
    <xsl:attribute name="{regex-group(1)}">
     <xsl:value-of select='translate(regex-group(2),"""", "")'/>
    </xsl:attribute>
    </xsl:matching-substring>
  </xsl:analyze-string>
  <!-- remove the last double quote from the content  -->
  <xsl:value-of select='replace($content,"""$", "")' />
 </xsl:element>
</xsl:template>

Running this transformation in Saxon returns us back to the original element:


<err:Warning foo="1234567" bar="abcd/efgh/jklm">content that records "some message"</err:Warning>
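
The string handling in the second transformation can be tried out in isolation. The Python sketch below follows the same steps (split off the content pair, then match the remaining name-value pairs with a regular expression); the function name parse_pi_data is hypothetical, and the character class used for names is only an approximation of \c, which Python's re module does not support.

```python
import re


def parse_pi_data(data):
    """Split PI data into (attributes, content) (hypothetical helper).

    Assumes the layout produced by the first stylesheet: name="value" pairs
    followed by a final content="..." pair that may itself contain quotes.
    """
    head, _, tail = data.partition('content="')
    # [\w:.-]+ is a rough stand-in for the XML name class \c.
    attributes = dict(re.findall(r'([\w:.-]+)="(.*?)"', head))
    # Remove the closing quote of the content pair, if present.
    content = tail[:-1] if tail.endswith('"') else tail
    return attributes, content


pi = 'foo="1234567" bar="abcd/efgh/jklm" content="content that records "some message""'
print(parse_pi_data(pi))
```

As with the XSLT, this breaks if an attribute value itself contains the string content=" so it is only suitable where the attribute values are known to be simple.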

Friday 14 March 2014

SPARQL highlighting in Notepad++

SPARQL highlighting in Notepad++ can be achieved by adding a user-defined language. Luckily this has already been catered for and is available from SourceForge, the specific file being notepadpp_sparql.xml.

To add this, locate your user-defined language file userDefineLang.xml, which will sit either in the application directory or at {user}\AppData\Roaming\Notepad++\userDefineLang.xml, and add the <UserLang/> node and all its content from notepadpp_sparql.xml as a child of the <NotepadPlus> element. If the file does not exist, just rename notepadpp_sparql.xml to userDefineLang.xml and place it in the appropriate directory.

Thursday 20 February 2014

Create list of files using XSLT

Issue - need a directory listing of xml files using purely XSLT

Resolution - create an XSLT with the following code, name this as lister.xsl and use the Saxon initial template option -it to call the template

java -jar saxon.jar -it:lister -xsl:lister.xsl -o:filelist.xml

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xsl:template name="lister">
    <fileList>
      <xsl:for-each 
        select="collection('./?select=*.xml;recurse=yes;on-error=warning')" >
        <xsl:element name='file'>
          <xsl:attribute name="full" select="document-uri(.)"/>
          <xsl:value-of select="tokenize(document-uri(.), '/')[last()]"/>
        </xsl:element>
      </xsl:for-each>
    </fileList>
  </xsl:template>
</xsl:stylesheet>
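
To check the listing against the file system without Saxon, a similar report can be sketched in Python. The function name list_xml_files is hypothetical. Unlike collection(), this does not parse the files, so malformed XML files are listed too.

```python
from pathlib import Path
import xml.etree.ElementTree as ET


def list_xml_files(directory="."):
    """Build a fileList element for every *.xml file below directory."""
    file_list = ET.Element("fileList")
    for path in sorted(Path(directory).rglob("*.xml")):
        # Store the absolute file: URI, like document-uri() in the stylesheet.
        entry = ET.SubElement(file_list, "file", full=path.resolve().as_uri())
        entry.text = path.name
    return file_list


if __name__ == "__main__":
    print(ET.tostring(list_xml_files("."), encoding="unicode"))
```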

Convert unescaped html to xml

An XSLT stylesheet to convert unescaped HTML to XML.

This was used to extract the HTML from a Blogger feed, in this case http://griffmonster-walks.blogspot.co.uk/feeds/posts/full

<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:functx="http://www.functx.com"
  xmlns:xs="http://www.w3.org/2001/XMLSchema">

 <xsl:output omit-xml-declaration="yes" indent="yes"  />

 <xsl:variable name="htmlEntities" as="element()*">
  <entity name="&amp;nbsp;" char="&#160;"/>  <!-- no-break space -->
  <entity name="&amp;iexcl;" char="&#161;"/>  <!-- inverted exclamation mark -->
  <entity name="&amp;cent;" char="&#162;"/>  <!-- cent sign -->
  <entity name="&amp;pound;" char="&#163;"/>  <!-- pound sterling sign -->
  <entity name="&amp;curren;" char="&#164;"/>  <!-- general currency sign -->
  <entity name="&amp;yen;" char="&#165;"/>  <!-- yen sign -->
  <entity name="&amp;brvbar;" char="&#166;"/>  <!-- broken (vertical) bar -->
  <entity name="&amp;sect;" char="&#167;"/>  <!-- section sign -->
  <entity name="&amp;uml;" char="&#168;"/>  <!-- umlaut (dieresis) -->
  <entity name="&amp;copy;" char="&#169;"/>  <!-- copyright sign -->
  <entity name="&amp;ordf;" char="&#170;"/>  <!-- ordinal indicator, feminine -->
  <entity name="&amp;laquo;" char="&#171;"/>  <!-- angle quotation mark, left -->
  <entity name="&amp;not;" char="&#172;"/>  <!-- not sign -->
  <entity name="&amp;shy;" char="&#173;"/>  <!-- soft hyphen -->
  <entity name="&amp;reg;" char="&#174;"/>  <!-- registered sign -->
  <entity name="&amp;macr;" char="&#175;"/>  <!-- macron -->
  <entity name="&amp;deg;" char="&#176;"/>  <!-- degree sign -->
  <entity name="&amp;plusmn;" char="&#177;"/>  <!-- plus-or-minus sign -->
  <entity name="&amp;sup2;" char="&#178;"/>  <!-- superscript two -->
  <entity name="&amp;sup3;" char="&#179;"/>  <!-- superscript three -->
  <entity name="&amp;acute;" char="&#180;"/>  <!-- acute accent -->
  <entity name="&amp;micro;" char="&#181;"/>  <!-- micro sign -->
  <entity name="&amp;para;" char="&#182;"/>  <!-- pilcrow (paragraph sign) -->
  <entity name="&amp;middot;" char="&#183;"/>  <!-- middle dot -->
  <entity name="&amp;cedil;" char="&#184;"/>  <!-- cedilla -->
  <entity name="&amp;sup1;" char="&#185;"/>  <!-- superscript one -->
  <entity name="&amp;ordm;" char="&#186;"/>  <!-- ordinal indicator, masculine -->
  <entity name="&amp;raquo;" char="&#187;"/>  <!-- angle quotation mark, right -->
  <entity name="&amp;frac14;" char="&#188;"/>  <!-- fraction one-quarter -->
  <entity name="&amp;frac12;" char="&#189;"/>  <!-- fraction one-half -->
  <entity name="&amp;frac34;" char="&#190;"/>  <!-- fraction three-quarters -->
  <entity name="&amp;iquest;" char="&#191;"/>  <!-- inverted question mark -->
  <entity name="&amp;Agrave;" char="&#192;"/>  <!-- capital A, grave accent -->
  <entity name="&amp;Aacute;" char="&#193;"/>  <!-- capital A, acute accent -->
  <entity name="&amp;Acirc;" char="&#194;"/>  <!-- capital A, circumflex accent -->
  <entity name="&amp;Atilde;" char="&#195;"/>  <!-- capital A, tilde -->
  <entity name="&amp;Auml;" char="&#196;"/>  <!-- capital A, dieresis or umlaut mark -->
  <entity name="&amp;Aring;" char="&#197;"/>  <!-- capital A, ring -->
  <entity name="&amp;AElig;" char="&#198;"/>  <!-- capital AE diphthong (ligature) -->
  <entity name="&amp;Ccedil;" char="&#199;"/>  <!-- capital C, cedilla -->
  <entity name="&amp;Egrave;" char="&#200;"/>  <!-- capital E, grave accent -->
  <entity name="&amp;Eacute;" char="&#201;"/>  <!-- capital E, acute accent -->
  <entity name="&amp;Ecirc;" char="&#202;"/>  <!-- capital E, circumflex accent -->
  <entity name="&amp;Euml;" char="&#203;"/>  <!-- capital E, dieresis or umlaut mark -->
  <entity name="&amp;Igrave;" char="&#204;"/>  <!-- capital I, grave accent -->
  <entity name="&amp;Iacute;" char="&#205;"/>  <!-- capital I, acute accent -->
  <entity name="&amp;Icirc;" char="&#206;"/>  <!-- capital I, circumflex accent -->
  <entity name="&amp;Iuml;" char="&#207;"/>  <!-- capital I, dieresis or umlaut mark -->
  <entity name="&amp;ETH;" char="&#208;"/>  <!-- capital Eth, Icelandic -->
  <entity name="&amp;Ntilde;" char="&#209;"/>  <!-- capital N, tilde -->
  <entity name="&amp;Ograve;" char="&#210;"/>  <!-- capital O, grave accent -->
  <entity name="&amp;Oacute;" char="&#211;"/>  <!-- capital O, acute accent -->
  <entity name="&amp;Ocirc;" char="&#212;"/>  <!-- capital O, circumflex accent -->
  <entity name="&amp;Otilde;" char="&#213;"/>  <!-- capital O, tilde -->
  <entity name="&amp;Ouml;" char="&#214;"/>  <!-- capital O, dieresis or umlaut mark -->
  <entity name="&amp;times;" char="&#215;"/>  <!-- multiply sign -->
  <entity name="&amp;Oslash;" char="&#216;"/>  <!-- capital O, slash -->
  <entity name="&amp;Ugrave;" char="&#217;"/>  <!-- capital U, grave accent -->
  <entity name="&amp;Uacute;" char="&#218;"/>  <!-- capital U, acute accent -->
  <entity name="&amp;Ucirc;" char="&#219;"/>  <!-- capital U, circumflex accent -->
  <entity name="&amp;Uuml;" char="&#220;"/>  <!-- capital U, dieresis or umlaut mark -->
  <entity name="&amp;Yacute;" char="&#221;"/>  <!-- capital Y, acute accent -->
  <entity name="&amp;THORN;" char="&#222;"/>  <!-- capital THORN, Icelandic -->
  <entity name="&amp;szlig;" char="&#223;"/>  <!-- small sharp s, German (sz ligature) -->
  <entity name="&amp;agrave;" char="&#224;"/>  <!-- small a, grave accent -->
  <entity name="&amp;aacute;" char="&#225;"/>  <!-- small a, acute accent -->
  <entity name="&amp;acirc;" char="&#226;"/>  <!-- small a, circumflex accent -->
  <entity name="&amp;atilde;" char="&#227;"/>  <!-- small a, tilde -->
  <entity name="&amp;auml;" char="&#228;"/>  <!-- small a, dieresis or umlaut mark -->
  <entity name="&amp;aring;" char="&#229;"/>  <!-- small a, ring -->
  <entity name="&amp;aelig;" char="&#230;"/>  <!-- small ae diphthong (ligature) -->
  <entity name="&amp;ccedil;" char="&#231;"/>  <!-- small c, cedilla -->
  <entity name="&amp;egrave;" char="&#232;"/>  <!-- small e, grave accent -->
  <entity name="&amp;eacute;" char="&#233;"/>  <!-- small e, acute accent -->
  <entity name="&amp;ecirc;" char="&#234;"/>  <!-- small e, circumflex accent -->
  <entity name="&amp;euml;" char="&#235;"/>  <!-- small e, dieresis or umlaut mark -->
  <entity name="&amp;igrave;" char="&#236;"/>  <!-- small i, grave accent -->
  <entity name="&amp;iacute;" char="&#237;"/>  <!-- small i, acute accent -->
  <entity name="&amp;icirc;" char="&#238;"/>  <!-- small i, circumflex accent -->
  <entity name="&amp;iuml;" char="&#239;"/>  <!-- small i, dieresis or umlaut mark -->
  <entity name="&amp;eth;" char="&#240;"/>  <!-- small eth, Icelandic -->
  <entity name="&amp;ntilde;" char="&#241;"/>  <!-- small n, tilde -->
  <entity name="&amp;ograve;" char="&#242;"/>  <!-- small o, grave accent -->
  <entity name="&amp;oacute;" char="&#243;"/>  <!-- small o, acute accent -->
  <entity name="&amp;ocirc;" char="&#244;"/>  <!-- small o, circumflex accent -->
  <entity name="&amp;otilde;" char="&#245;"/>  <!-- small o, tilde -->
  <entity name="&amp;ouml;" char="&#246;"/>  <!-- small o, dieresis or umlaut mark -->
  <entity name="&amp;divide;" char="&#247;"/>  <!-- divide sign -->
  <entity name="&amp;oslash;" char="&#248;"/>  <!-- small o, slash -->
  <entity name="&amp;ugrave;" char="&#249;"/>  <!-- small u, grave accent -->
  <entity name="&amp;uacute;" char="&#250;"/>  <!-- small u, acute accent -->
  <entity name="&amp;ucirc;" char="&#251;"/>  <!-- small u, circumflex accent -->
  <entity name="&amp;uuml;" char="&#252;"/>  <!-- small u, dieresis or umlaut mark -->
  <entity name="&amp;yacute;" char="&#253;"/>  <!-- small y, acute accent -->
  <entity name="&amp;thorn;" char="&#254;"/>  <!-- small thorn, Icelandic -->
  <entity name="&amp;yuml;" char="&#255;"/>  <!-- small y, dieresis or umlaut mark -->
  <!-- Latin Extended-B -->
  <entity name="&amp;fnof;" char="&#402;"/>  <!-- latin small f with hook, =function, =florin, u+0192 ISOtech -->

  <!-- Greek -->
  <entity name="&amp;Alpha;" char="&#913;"/>  <!-- greek capital letter alpha,  u+0391 -->
  <entity name="&amp;Beta;" char="&#914;"/>  <!-- greek capital letter beta,  u+0392 -->
  <entity name="&amp;Gamma;" char="&#915;"/>  <!-- greek capital letter gamma,  u+0393 ISOgrk3 -->
  <entity name="&amp;Delta;" char="&#916;"/>  <!-- greek capital letter delta,  u+0394 ISOgrk3 -->
  <entity name="&amp;Epsilon;" char="&#917;"/>  <!-- greek capital letter epsilon,  u+0395 -->
  <entity name="&amp;Zeta;" char="&#918;"/>  <!-- greek capital letter zeta,  u+0396 -->
  <entity name="&amp;Eta;" char="&#919;"/>  <!-- greek capital letter eta,  u+0397 -->
  <entity name="&amp;Theta;" char="&#920;"/>  <!-- greek capital letter theta,  u+0398 ISOgrk3 -->
  <entity name="&amp;Iota;" char="&#921;"/>  <!-- greek capital letter iota,  u+0399 -->
  <entity name="&amp;Kappa;" char="&#922;"/>  <!-- greek capital letter kappa,  u+039A -->
  <entity name="&amp;Lambda;" char="&#923;"/>  <!-- greek capital letter lambda,  u+039B ISOgrk3 -->
  <entity name="&amp;Mu;" char="&#924;"/>  <!-- greek capital letter mu,  u+039C -->
  <entity name="&amp;Nu;" char="&#925;"/>  <!-- greek capital letter nu,  u+039D -->
  <entity name="&amp;Xi;" char="&#926;"/>  <!-- greek capital letter xi,  u+039E ISOgrk3 -->
  <entity name="&amp;Omicron;" char="&#927;"/>  <!-- greek capital letter omicron,  u+039F -->
  <entity name="&amp;Pi;" char="&#928;"/>  <!-- greek capital letter pi,  u+03A0 ISOgrk3 -->
  <entity name="&amp;Rho;" char="&#929;"/>  <!-- greek capital letter rho,  u+03A1 -->
  <!-- (there is no Sigmaf, and no u+03A2 character either) -->
  <entity name="&amp;Sigma;" char="&#931;"/>  <!-- greek capital letter sigma,  u+03A3 ISOgrk3 -->
  <entity name="&amp;Tau;" char="&#932;"/>  <!-- greek capital letter tau,  u+03A4 -->
  <entity name="&amp;Upsilon;" char="&#933;"/>  <!-- greek capital letter upsilon,  u+03A5 ISOgrk3 -->
  <entity name="&amp;Phi;" char="&#934;"/>  <!-- greek capital letter phi,  u+03A6 ISOgrk3 -->
  <entity name="&amp;Chi;" char="&#935;"/>  <!-- greek capital letter chi,  u+03A7 -->
  <entity name="&amp;Psi;" char="&#936;"/>  <!-- greek capital letter psi,  u+03A8 ISOgrk3 -->
  <entity name="&amp;Omega;" char="&#937;"/>  <!-- greek capital letter omega,  u+03A9 ISOgrk3 -->

  <entity name="&amp;alpha;" char="&#945;"/>  <!-- greek small letter alpha, u+03B1 ISOgrk3 -->
  <entity name="&amp;beta;" char="&#946;"/>  <!-- greek small letter beta,  u+03B2 ISOgrk3 -->
  <entity name="&amp;gamma;" char="&#947;"/>  <!-- greek small letter gamma,  u+03B3 ISOgrk3 -->
  <entity name="&amp;delta;" char="&#948;"/>  <!-- greek small letter delta,  u+03B4 ISOgrk3 -->
  <entity name="&amp;epsilon;" char="&#949;"/>  <!-- greek small letter epsilon,  u+03B5 ISOgrk3 -->
  <entity name="&amp;zeta;" char="&#950;"/>  <!-- greek small letter zeta,  u+03B6 ISOgrk3 -->
  <entity name="&amp;eta;" char="&#951;"/>  <!-- greek small letter eta,  u+03B7 ISOgrk3 -->
  <entity name="&amp;theta;" char="&#952;"/>  <!-- greek small letter theta,  u+03B8 ISOgrk3 -->
  <entity name="&amp;iota;" char="&#953;"/>  <!-- greek small letter iota,  u+03B9 ISOgrk3 -->
  <entity name="&amp;kappa;" char="&#954;"/>  <!-- greek small letter kappa,  u+03BA ISOgrk3 -->
  <entity name="&amp;lambda;" char="&#955;"/>  <!-- greek small letter lambda,  u+03BB ISOgrk3 -->
  <entity name="&amp;mu;" char="&#956;"/>  <!-- greek small letter mu,  u+03BC ISOgrk3 -->
  <entity name="&amp;nu;" char="&#957;"/>  <!-- greek small letter nu,  u+03BD ISOgrk3 -->
  <entity name="&amp;xi;" char="&#958;"/>  <!-- greek small letter xi,  u+03BE ISOgrk3 -->
  <entity name="&amp;omicron;" char="&#959;"/>  <!-- greek small letter omicron,  u+03BF NEW -->
  <entity name="&amp;pi;" char="&#960;"/>  <!-- greek small letter pi,  u+03C0 ISOgrk3 -->
  <entity name="&amp;rho;" char="&#961;"/>  <!-- greek small letter rho,  u+03C1 ISOgrk3 -->
  <entity name="&amp;sigmaf;" char="&#962;"/>  <!-- greek small letter final sigma,  u+03C2 ISOgrk3 -->
  <entity name="&amp;sigma;" char="&#963;"/>  <!-- greek small letter sigma,  u+03C3 ISOgrk3 -->
  <entity name="&amp;tau;" char="&#964;"/>  <!-- greek small letter tau,  u+03C4 ISOgrk3 -->
  <entity name="&amp;upsilon;" char="&#965;"/>  <!-- greek small letter upsilon,  u+03C5 ISOgrk3 -->
  <entity name="&amp;phi;" char="&#966;"/>  <!-- greek small letter phi,  u+03C6 ISOgrk3 -->
  <entity name="&amp;chi;" char="&#967;"/>  <!-- greek small letter chi,  u+03C7 ISOgrk3 -->
  <entity name="&amp;psi;" char="&#968;"/>  <!-- greek small letter psi,  u+03C8 ISOgrk3 -->
  <entity name="&amp;omega;" char="&#969;"/>  <!-- greek small letter omega,  u+03C9 ISOgrk3 -->
  <entity name="&amp;thetasym;" char="&#977;"/>  <!-- greek small letter theta symbol,  u+03D1 NEW -->
  <entity name="&amp;upsih;" char="&#978;"/>  <!-- greek upsilon with hook symbol,  u+03D2 NEW -->
  <entity name="&amp;piv;" char="&#982;"/>  <!-- greek pi symbol,  u+03D6 ISOgrk3 -->

  <!-- General Punctuation -->
  <entity name="&amp;bull;" char="&#8226;"/>  <!-- bullet, =black small circle, u+2022 ISOpub  -->
  <!-- bullet is NOT the same as bullet operator, u+2219 -->
  <entity name="&amp;hellip;" char="&#8230;"/>  <!-- horizontal ellipsis, =three dot leader, u+2026 ISOpub  -->
  <entity name="&amp;prime;" char="&#8242;"/>  <!-- prime, =minutes, =feet, u+2032 ISOtech -->
  <entity name="&amp;Prime;" char="&#8243;"/>  <!-- double prime, =seconds, =inches, u+2033 ISOtech -->
  <entity name="&amp;oline;" char="&#8254;"/>  <!-- overline, =spacing overscore, u+203E NEW -->
  <entity name="&amp;frasl;" char="&#8260;"/>  <!-- fraction slash, u+2044 NEW -->
  <!-- Letterlike Symbols -->
  <entity name="&amp;weierp;" char="&#8472;"/>  <!-- script capital P, =power set, =Weierstrass p, u+2118 ISOamso -->
  <entity name="&amp;image;" char="&#8465;"/>  <!-- blackletter capital I, =imaginary part, u+2111 ISOamso -->
  <entity name="&amp;real;" char="&#8476;"/>  <!-- blackletter capital R, =real part symbol, u+211C ISOamso -->
  <entity name="&amp;trade;" char="&#8482;"/>  <!-- trade mark sign, u+2122 ISOnum -->
  <entity name="&amp;alefsym;" char="&#8501;"/>  <!-- alef symbol, =first transfinite cardinal, u+2135 NEW -->
  <!-- alef symbol is NOT the same as hebrew letter alef, u+05D0 although the same glyph
     could be used to depict both characters -->

  <!-- Arrows -->
  <entity name="&amp;larr;" char="&#8592;"/>  <!-- leftwards arrow, u+2190 ISOnum -->
  <entity name="&amp;uarr;" char="&#8593;"/>  <!-- upwards arrow, u+2191 ISOnum-->
  <entity name="&amp;rarr;" char="&#8594;"/>  <!-- rightwards arrow, u+2192 ISOnum -->
  <entity name="&amp;darr;" char="&#8595;"/>  <!-- downwards arrow, u+2193 ISOnum -->
  <entity name="&amp;harr;" char="&#8596;"/>  <!-- left right arrow, u+2194 ISOamsa -->
  <entity name="&amp;crarr;" char="&#8629;"/>  <!-- downwards arrow with corner leftwards, =carriage return, u+21B5 NEW -->
  <entity name="&amp;lArr;" char="&#8656;"/>  <!-- leftwards double arrow, u+21D0 ISOtech -->
  <!-- Unicode does not say that lArr is the same as the 'is implied by' arrow but also 
     does not have any other character for that function. So ? lArr can be used for 
     'is implied by' as ISOtech suggests -->
  <entity name="&amp;uArr;" char="&#8657;"/>  <!-- upwards double arrow, u+21D1 ISOamsa -->
  <entity name="&amp;rArr;" char="&#8658;"/>  <!-- rightwards double arrow, u+21D2 ISOtech -->
  <!-- Unicode does not say this is the 'implies' character but does not have another 
     character with this function so ? rArr can be used for 'implies' as ISOtech suggests -->
  <entity name="&amp;dArr;" char="&#8659;"/>  <!-- downwards double arrow, u+21D3 ISOamsa -->
  <entity name="&amp;hArr;" char="&#8660;"/>  <!-- left right double arrow, u+21D4 ISOamsa -->

  <!-- Mathematical Operators -->
  <entity name="&amp;forall;" char="&#8704;"/>  <!-- for all, u+2200 ISOtech -->
  <entity name="&amp;part;" char="&#8706;"/>  <!-- partial differential, u+2202 ISOtech  -->
  <entity name="&amp;exist;" char="&#8707;"/>  <!-- there exists, u+2203 ISOtech -->
  <entity name="&amp;empty;" char="&#8709;"/>  <!-- empty set, =null set, =diameter, u+2205 ISOamso -->
  <entity name="&amp;nabla;" char="&#8711;"/>  <!-- nabla, =backward difference, u+2207 ISOtech -->
  <entity name="&amp;isin;" char="&#8712;"/>  <!-- element of, u+2208 ISOtech -->
  <entity name="&amp;notin;" char="&#8713;"/>  <!-- not an element of, u+2209 ISOtech -->
  <entity name="&amp;ni;" char="&#8715;"/>  <!-- contains as member, u+220B ISOtech -->
  <!-- should there be a more memorable name than 'ni'? -->
  <entity name="&amp;prod;" char="&#8719;"/>  <!-- n-ary product, =product sign, u+220F ISOamsb -->
  <!-- prod is NOT the same character as u+03A0 'greek capital letter pi' though the same 
     glyph might be used for both -->
  <entity name="&amp;sum;" char="&#8721;"/>  <!-- n-ary summation, u+2211 ISOamsb -->
  <!-- sum is NOT the same character as u+03A3 'greek capital letter sigma' though the same 
     glyph might be used for both -->
  <entity name="&amp;minus;" char="&#8722;"/>  <!-- minus sign, u+2212 ISOtech -->
  <entity name="&amp;lowast;" char="&#8727;"/>  <!-- asterisk operator, u+2217 ISOtech -->
  <entity name="&amp;radic;" char="&#8730;"/>  <!-- square root, =radical sign, u+221A ISOtech -->
  <entity name="&amp;prop;" char="&#8733;"/>  <!-- proportional to, u+221D ISOtech -->
  <entity name="&amp;infin;" char="&#8734;"/>  <!-- infinity, u+221E ISOtech -->
  <entity name="&amp;ang;" char="&#8736;"/>  <!-- angle, u+2220 ISOamso -->
  <entity name="&amp;and;" char="&#8743;"/>  <!-- logical and, =wedge, u+2227 ISOtech -->
  <entity name="&amp;or;" char="&#8744;"/>  <!-- logical or, =vee, u+2228 ISOtech -->
  <entity name="&amp;cap;" char="&#8745;"/>  <!-- intersection, =cap, u+2229 ISOtech -->
  <entity name="&amp;cup;" char="&#8746;"/>  <!-- union, =cup, u+222A ISOtech -->
  <entity name="&amp;int;" char="&#8747;"/>  <!-- integral, u+222B ISOtech -->
  <entity name="&amp;there4;" char="&#8756;"/>  <!-- therefore, u+2234 ISOtech -->
  <entity name="&amp;sim;" char="&#8764;"/>  <!-- tilde operator, =varies with, =similar to, u+223C ISOtech -->
  <!-- tilde operator is NOT the same character as the tilde, u+007E, although the same 
     glyph might be used to represent both  -->
  <entity name="&amp;cong;" char="&#8773;"/>  <!-- approximately equal to, u+2245 ISOtech -->
  <entity name="&amp;asymp;" char="&#8776;"/>  <!-- almost equal to, =asymptotic to, u+2248 ISOamsr -->
  <entity name="&amp;ne;" char="&#8800;"/>  <!-- not equal to, u+2260 ISOtech -->
  <entity name="&amp;equiv;" char="&#8801;"/>  <!-- identical to, u+2261 ISOtech -->
  <entity name="&amp;le;" char="&#8804;"/>  <!-- less-than or equal to, u+2264 ISOtech -->
  <entity name="&amp;ge;" char="&#8805;"/>  <!-- greater-than or equal to, u+2265 ISOtech -->
  <entity name="&amp;sub;" char="&#8834;"/>  <!-- subset of, u+2282 ISOtech -->
  <entity name="&amp;sup;" char="&#8835;"/>  <!-- superset of, u+2283 ISOtech -->
  <!-- note that nsup, 'not a superset of, u+2285' is not covered by the Symbol font 
     encoding and is not included. Should it be, for symmetry? It is in ISOamsn  -->
  <entity name="&amp;nsub;" char="&#8836;"/>  <!-- not a subset of, u+2284 ISOamsn -->
  <entity name="&amp;sube;" char="&#8838;"/>  <!-- subset of or equal to, u+2286 ISOtech -->
  <entity name="&amp;supe;" char="&#8839;"/>  <!-- superset of or equal to, u+2287 ISOtech -->
  <entity name="&amp;oplus;" char="&#8853;"/>  <!-- circled plus, =direct sum, u+2295 ISOamsb -->
  <entity name="&amp;otimes;" char="&#8855;"/>  <!-- circled times, =vector product, u+2297 ISOamsb -->
  <entity name="&amp;perp;" char="&#8869;"/>  <!-- up tack, =orthogonal to, =perpendicular, u+22A5 ISOtech -->
  <entity name="&amp;sdot;" char="&#8901;"/>  <!-- dot operator, u+22C5 ISOamsb -->
  <!-- dot operator is NOT the same character as u+00B7 middle dot -->

  <!-- Miscellaneous Technical -->
  <entity name="&amp;lceil;" char="&#8968;"/>  <!-- left ceiling, =apl upstile, u+2308, ISOamsc  -->
  <entity name="&amp;rceil;" char="&#8969;"/>  <!-- right ceiling, u+2309, ISOamsc  -->
  <entity name="&amp;lfloor;" char="&#8970;"/>  <!-- left floor, =apl downstile, u+230A, ISOamsc  -->
  <entity name="&amp;rfloor;" char="&#8971;"/>  <!-- right floor, u+230B, ISOamsc  -->
  <entity name="&amp;lang;" char="&#9001;"/>  <!-- left-pointing angle bracket, =bra, u+2329 ISOtech -->
  <!-- lang is NOT the same character as u+003C 'less than' 
     or u+2039 'single left-pointing angle quotation mark' -->
  <entity name="&amp;rang;" char="&#9002;"/>  <!-- right-pointing angle bracket, =ket, u+232A ISOtech -->
  <!-- rang is NOT the same character as u+003E 'greater than' 
     or u+203A 'single right-pointing angle quotation mark' -->

  <!-- Geometric Shapes -->
  <entity name="&amp;loz;" char="&#9674;"/>  <!-- lozenge, u+25CA ISOpub -->

  <!-- Miscellaneous Symbols -->
  <entity name="&amp;spades;" char="&#9824;"/>  <!-- black spade suit, u+2660 ISOpub -->
  <!-- black here seems to mean filled as opposed to hollow -->
  <entity name="&amp;clubs;" char="&#9827;"/>  <!-- black club suit, =shamrock, u+2663 ISOpub -->
  <entity name="&amp;hearts;" char="&#9829;"/>  <!-- black heart suit, =valentine, u+2665 ISOpub -->
  <entity name="&amp;diams;" char="&#9830;"/>  <!-- black diamond suit, u+2666 ISOpub -->
  <!-- C0 Controls and Basic Latin -->
  <entity name="&amp;quot;" char="&#34;"/>  <!--  quotation mark, =apl quote, u+0022 ISOnum -->
  <entity name="&amp;amp;" char="&#38;amp;"/>  <!--  ampersand, u+0026 ISOnum -->
  <entity name="&amp;lt;" char="&#60;"/>  <!--  less-than sign, u+003C ISOnum -->
  <entity name="&amp;gt;" char="&#62;"/>  <!--  greater-than sign, u+003E ISOnum -->

  <!-- Latin Extended-A -->
  <entity name="&amp;OElig;" char="&#338;"/>  <!--  latin capital ligature oe, u+0152 ISOlat2 -->
  <entity name="&amp;oelig;" char="&#339;"/>  <!--  latin small ligature oe, u+0153 ISOlat2 -->
  <!-- ligature is a misnomer, this is a separate character in some languages -->
  <entity name="&amp;Scaron;" char="&#352;"/>  <!--  latin capital letter s with caron, u+0160 ISOlat2 -->
  <entity name="&amp;scaron;" char="&#353;"/>  <!--  latin small letter s with caron, u+0161 ISOlat2 -->
  <entity name="&amp;Yuml;" char="&#376;"/>  <!--  latin capital letter y with diaeresis, u+0178 ISOlat2 -->

  <!-- Spacing Modifier Letters -->
  <entity name="&amp;circ;" char="&#710;"/>  <!--  modifier letter circumflex accent, u+02C6 ISOpub -->
  <entity name="&amp;tilde;" char="&#732;"/>  <!--  small tilde, u+02DC ISOdia -->

  <!-- General Punctuation -->
  <entity name="&amp;ensp;" char="&#8194;"/>  <!--  en space, u+2002 ISOpub -->
  <entity name="&amp;emsp;" char="&#8195;"/>  <!--  em space, u+2003 ISOpub -->
  <entity name="&amp;thinsp;" char="&#8201;"/>  <!--  thin space, u+2009 ISOpub -->
  <entity name="&amp;zwnj;" char="&#8204;"/>  <!--  zero width non-joiner, u+200C NEW RFC 2070 -->
  <entity name="&amp;zwj;" char="&#8205;"/>  <!--  zero width joiner, u+200D NEW RFC 2070 -->
  <entity name="&amp;lrm;" char="&#8206;"/>  <!--  left-to-right mark, u+200E NEW RFC 2070 -->
  <entity name="&amp;rlm;" char="&#8207;"/>  <!--  right-to-left mark, u+200F NEW RFC 2070 -->
  <entity name="&amp;ndash;" char="&#8211;"/>  <!--  en dash, u+2013 ISOpub -->
  <entity name="&amp;mdash;" char="&#8212;"/>  <!--  em dash, u+2014 ISOpub -->
  <entity name="&amp;lsquo;" char="&#8216;"/>  <!--  left single quotation mark, u+2018 ISOnum -->
  <entity name="&amp;rsquo;" char="&#8217;"/>  <!--  right single quotation mark, u+2019 ISOnum -->
  <entity name="&amp;sbquo;" char="&#8218;"/>  <!--  single low-9 quotation mark, u+201A NEW -->
  <entity name="&amp;ldquo;" char="&#8220;"/>  <!--  left double quotation mark, u+201C ISOnum -->
  <entity name="&amp;rdquo;" char="&#8221;"/>  <!--  right double quotation mark, u+201D ISOnum -->
  <entity name="&amp;bdquo;" char="&#8222;"/>  <!--  double low-9 quotation mark, u+201E NEW -->
  <entity name="&amp;dagger;" char="&#8224;"/>  <!--  dagger, u+2020 ISOpub -->
  <entity name="&amp;Dagger;" char="&#8225;"/>  <!--  double dagger, u+2021 ISOpub -->
  <entity name="&amp;permil;" char="&#8240;"/>  <!--  per mille sign, u+2030 ISOtech -->
  <entity name="&amp;lsaquo;" char="&#8249;"/>  <!--  single left-pointing angle quotation mark, u+2039 ISO proposed -->
  <!-- lsaquo is proposed but not yet ISO standardised -->
  <entity name="&amp;rsaquo;" char="&#8250;"/>  <!--  single right-pointing angle quotation mark, u+203A ISO proposed -->
  <!-- rsaquo is proposed but not yet ISO standardised -->

 </xsl:variable>

 <xsl:template match="/">
  
  <xsl:apply-templates/>
  
 </xsl:template>
 
 <!-- identity template: node() is included so that text nodes are copied too -->
 <xsl:template match="@* | node()">
  <xsl:copy>
   <xsl:apply-templates select="@* | node()" />
  </xsl:copy>
 </xsl:template>
 
 <xsl:template match="atom:content">
  <xsl:variable name="from" select="($htmlEntities/@name)"/>
  <xsl:variable name="to" select="($htmlEntities/@char)"/>
  <content>
   <xsl:value-of select="functx:replace-multi(. ,$from,$to)" disable-output-escaping="yes" />
  </content>
 </xsl:template>

 <xsl:function name="functx:replace-multi" as="xs:string?">
  <xsl:param name="arg" as="xs:string?"/>
  <xsl:param name="changeFrom" as="xs:string*"/>
  <xsl:param name="changeTo" as="xs:string*"/>

  <xsl:sequence select="
     if (count($changeFrom) > 0)
     then functx:replace-multi(
      replace($arg, $changeFrom[1],
        functx:if-absent($changeTo[1],'')),
      $changeFrom[position() > 1],
      $changeTo[position() > 1])
     else $arg "/>

 </xsl:function>

 <xsl:function name="functx:if-absent" as="item()*">
  <xsl:param name="arg" as="item()*"/>
  <xsl:param name="value" as="item()*"/>

  <xsl:sequence select="
   if (exists($arg))
   then $arg
   else $value
   "/>

 </xsl:function>
</xsl:stylesheet>

Tuesday 11 February 2014

Rename MarkLogic Document Function

A simple function that renames a document within a MarkLogic database while retaining its collections, permissions, quality and properties:

declare function local:document-rename(
   $old-uri as xs:string, $new-uri as xs:string)
  as empty-sequence()
{
    xdmp:document-delete($old-uri),
    let $permissions := xdmp:document-get-permissions($old-uri)
    let $collections := xdmp:document-get-collections($old-uri)
    return xdmp:document-insert(
      $new-uri, doc($old-uri),
      if ($permissions) then $permissions
      else xdmp:default-permissions(),
      if ($collections) then $collections
      else xdmp:default-collections(),
      xdmp:document-get-quality($old-uri)
    )
    ,
    let $prop-ns := namespace-uri(<prop:properties/>)
    let $properties :=
      xdmp:document-properties($old-uri)/node()
        [ namespace-uri(.) ne $prop-ns ]
    return xdmp:document-set-properties($new-uri, $properties)
};

Thursday 23 January 2014

XSLT transformation to create newlines within a cell in a csv file

When transforming XML input to a csv file, a newline that is meant to appear inside a single spreadsheet cell causes a problem, because the newline character

<xsl:text>&#xa;</xsl:text>

starts a new row in the spreadsheet.

To get around this, wrap the content of that cell in double quotes:

<xsl:text>"</xsl:text>
      <xsl:value-of select="string-join(paths/path,'&#xa;')"/>
<xsl:text>"</xsl:text>

This example converts a Subversion log file to csv and puts each changed file on a new line within that particular cell.
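
If a cell can itself contain a double quote, wrapping it in quotes is not enough; the usual csv convention is to double each embedded quote. The following is only a minimal sketch. It assumes input shaped like the output of svn log --xml -v (a log element containing logentry elements, each with a revision attribute, an author element and paths/path children). The local:csv-cell helper and its namespace URI are hypothetical names introduced here for illustration.

```xslt
<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:local="urn:example:csv"
  version="2.0"
  exclude-result-prefixes="xs local">
 <xsl:output method="text" encoding="UTF-8"/>

 <!-- hypothetical helper: wrap a value in double quotes and
      double any quote characters already inside it -->
 <xsl:function name="local:csv-cell" as="xs:string">
  <xsl:param name="value" as="xs:string?"/>
  <xsl:sequence select="concat('&quot;',
     replace(string($value), '&quot;', '&quot;&quot;'),
     '&quot;')"/>
 </xsl:function>

 <!-- one csv row per logentry: revision, author, changed paths -->
 <xsl:template match="/log">
  <xsl:for-each select="logentry">
   <xsl:value-of select="string-join((
      local:csv-cell(@revision),
      local:csv-cell(author),
      local:csv-cell(string-join(paths/path, '&#xa;'))), ',')"/>
   <!-- row terminator, outside any quoted cell -->
   <xsl:text>&#xa;</xsl:text>
  </xsl:for-each>
 </xsl:template>
</xsl:stylesheet>
```

Only the newlines inside the quoted third cell stay within that cell; the newline emitted after each row still starts a new row.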

Friday 17 January 2014

Gmail Search Operators

has:attachment - returns all mail with attachments

size - returns messages larger than a specified size - eg size:5m returns messages larger than 5 MB, size:512000 returns messages larger than 500 KB

larger -  returns messages larger than a specified size - eg larger:5m - also larger_than


smaller -  returns messages smaller than a specified size - eg smaller:5m - also smaller_than

older_than - returns messages older than a specified time, where y = year, m = month and d = day - eg older_than:2y

newer_than - returns messages newer than a specified time, where y = year, m = month and d = day - eg newer_than:2y

has:userlabels - returns messages that have user-defined labels.

has:nouserlabels - returns messages that have no user-defined labels.

Saturday 4 January 2014

Word Counting in XSLT

A word count is easily determined either within a string or within the text nodes of an xml document.

To count the words within a string supplied in a variable $string:

 <xsl:sequence select="count(tokenize($string, '\W+')[. != ''])"/>
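
The [. != ''] filter matters because tokenize returns an empty string whenever the input starts or ends with a separator. A small worked sketch follows; the sample string is made up, and the named template main is only there so that the stylesheet can be run without a source document (for example with Saxon's -it:main option).

```xslt
<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="2.0">
 <xsl:output method="text"/>

 <xsl:template name="main">
  <!-- made-up sample input; the trailing full stop is a separator -->
  <xsl:variable name="string" select="'The cat, the hat.'"/>
  <!-- tokenize gives ('The', 'cat', 'the', 'hat', ''); the predicate
       drops the trailing empty string, so the output is 4 -->
  <xsl:value-of select="count(tokenize($string, '\W+')[. != ''])"/>
 </xsl:template>
</xsl:stylesheet>
```

Without the predicate the count for this string would be 5, because the empty token after the final full stop would be included.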

To do the same for the whole XML document use:

<xsl:template match="/">
 <xsl:sequence select="count(//text()/tokenize(., '\W+')[. != ''])"/>
</xsl:template>

To count the words within the text nodes of an xml document and sort them based upon the word frequency, use this:

<xsl:template match="/">
<xsl:for-each-group group-by="." select="for $word in //text()/tokenize(., '\W+')[. != ''] return lower-case($word)">
 <xsl:sort select="count(current-group())" order="descending"/>
 <word word="{current-grouping-key()}"
  frequency="{count(current-group())}"/>
</xsl:for-each-group>
</xsl:template>