Counting words is straightforward in XSLT 2.0, whether in a single string or across the text nodes of an XML document.
To count the words within a string supplied in a variable $string:
<xsl:sequence select="count(tokenize($string, '\W+')[. != ''])"/>
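If the count is needed in several places, the same expression can be wrapped in a stylesheet function. The following is only a sketch: the function name my:word-count and the namespace URI urn:example:word-count are made up for illustration and are not part of any standard library.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="urn:example:word-count"
    exclude-result-prefixes="xs my">

  <!-- Hypothetical helper: returns the number of non-empty tokens in $string.
       An empty sequence or empty string yields 0. -->
  <xsl:function name="my:word-count" as="xs:integer">
    <xsl:param name="string" as="xs:string?"/>
    <xsl:sequence select="count(tokenize($string, '\W+')[. != ''])"/>
  </xsl:function>

</xsl:stylesheet>
```

With this in place, my:word-count('Hello, world!') returns 2. The trailing "!" makes tokenize() emit an empty final token, which is why the [. != ''] predicate is needed. Note that \W also matches apostrophes, so a contraction such as "don't" is counted as two words.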
To do the same for the whole XML document use:
<xsl:template match="/">
<xsl:sequence select="count(//text()/tokenize(., '\W+')[. != ''])"/>
</xsl:template>
To count how often each word occurs in the text nodes of an XML document and sort the words by frequency (ignoring case, since each word is lower-cased first), use this:
<xsl:template match="/">
<xsl:for-each-group group-by="." select="for $word in //text()/tokenize(., '\W+')[. != ''] return lower-case($word)">
<xsl:sort select="count(current-group())" order="descending"/>
<word word="{current-grouping-key()}"
frequency="{count(current-group())}"/>
</xsl:for-each-group>
</xsl:template>
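The templates above are fragments, so here is a sketch of a complete stylesheet built around the frequency template. It assumes an XSLT 2.0 processor (for-each-group, tokenize and lower-case are not available in XSLT 1.0). The words wrapper element is an addition of this sketch, included only so that the result has a single root element.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <!-- <words> is an assumed wrapper, not part of the original snippet -->
    <words>
      <!-- Tokenize every text node, drop empty tokens, lower-case each word,
           then group identical words together -->
      <xsl:for-each-group group-by="."
          select="for $word in //text()/tokenize(., '\W+')[. != ''] return lower-case($word)">
        <!-- Most frequent words first -->
        <xsl:sort select="count(current-group())" order="descending"/>
        <word word="{current-grouping-key()}"
              frequency="{count(current-group())}"/>
      </xsl:for-each-group>
    </words>
  </xsl:template>

</xsl:stylesheet>
```

For an input such as <doc>The cat and the dog</doc>, the first word element in the result is the one for "the" with a frequency of 2, followed by the words that occur once.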