Saturday 4 January 2014

Word Counting in XSLT

A word count is easily determined in XSLT 2.0, either within a string or across the text nodes of an XML document.

To count the words within a string supplied in a variable $string:

 <xsl:sequence select="count(tokenize($string, '\W+')[. != ''])"/>
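
As a minimal sketch of how that expression behaves, here is a complete stylesheet wrapped around it. The sample sentence and the text output method are my own assumptions for illustration; the stylesheet ignores its input, so it can be run against any XML document with an XSLT 2.0 processor.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

 <xsl:output method="text"/>

 <!-- Sample value, assumed for illustration only -->
 <xsl:variable name="string" select="'Hello, world! Hello again.'"/>

 <xsl:template match="/">
  <!-- tokenize() splits on runs of non-word characters; the trailing '.'
       leaves one empty token, which the [. != ''] predicate drops -->
  <xsl:sequence select="count(tokenize($string, '\W+')[. != ''])"/>
 </xsl:template>

</xsl:stylesheet>
```

For this sample string the result is 4. Without the predicate the count would be 5, because the empty token after the final full stop would be included.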

To do the same for every text node in an XML document, use:

<xsl:template match="/">
 <xsl:sequence select="count(//text()/tokenize(., '\W+')[. != ''])"/>
</xsl:template>
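
To see what the document-wide count actually treats as a word, the following sketch puts the template into a complete stylesheet and shows a small input in a comment. The input document and the text output method are assumptions made for the example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

 <xsl:output method="text"/>

 <!-- Hypothetical input:
      <doc><title>Word counting</title><p>It's easy, isn't it?</p></doc>
 -->

 <xsl:template match="/">
  <!-- Each text node is tokenized separately and all non-empty tokens
       are counted together; whitespace-only nodes yield only empty tokens -->
  <xsl:sequence select="count(//text()/tokenize(., '\W+')[. != ''])"/>
 </xsl:template>

</xsl:stylesheet>
```

For the input shown, the result is 8 rather than the 6 a human reader would count. The apostrophe is a non-word character, so "It's" and "isn't" are each split into two tokens. If that matters, a different separator pattern is needed.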

To count the words within the text nodes of an XML document, ignoring case, and list them in descending order of frequency, use this:

<xsl:template match="/">
 <xsl:for-each-group select="for $word in //text()/tokenize(., '\W+')[. != ''] return lower-case($word)"
   group-by=".">
  <xsl:sort select="count(current-group())" order="descending"/>
  <word word="{current-grouping-key()}"
   frequency="{count(current-group())}"/>
 </xsl:for-each-group>
</xsl:template>
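
The template above emits a flat sequence of word elements. If a single well-formed document is wanted, one option is to wrap them in a root element, as in this sketch. The `words` wrapper name, the indented XML output and the sample input are my own assumptions, not part of the original template.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

 <xsl:output method="xml" indent="yes"/>

 <!-- Hypothetical input:
      <doc><p>The cat and the hat.</p><p>The end</p></doc>
 -->

 <xsl:template match="/">
  <!-- 'words' is an assumed wrapper so the result has one root element -->
  <words>
   <!-- Lower-case every non-empty token, then group identical words -->
   <xsl:for-each-group
     select="for $word in //text()/tokenize(., '\W+')[. != ''] return lower-case($word)"
     group-by=".">
    <!-- Most frequent words first -->
    <xsl:sort select="count(current-group())" order="descending"/>
    <word word="{current-grouping-key()}"
     frequency="{count(current-group())}"/>
   </xsl:for-each-group>
  </words>
 </xsl:template>

</xsl:stylesheet>
```

For the input shown, "the" should come first with a frequency of 3 (two capitalised occurrences and one lower-case), followed by "cat", "and", "hat" and "end" with a frequency of 1 each.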
