Saturday, June 5, 2010

Porter Stemmer


A PHP5-only class to facilitate the stemming of words. Stemming is the process of removing suffixes from words to produce their "stems", e.g. national -> nation, running -> run, adoption -> adopt. This class, as the name suggests, implements the algorithm by Martin Porter.

Usage

The public API is very simple, consisting of one function that takes two arguments. The first is the word you wish to stem; the optional second controls whether the built-in cache is used. Only use the cache if you expect to stem the same word multiple times.
    $stem = PorterStemmer::Stem("nationalize");

One difference from the published code is that this version also removes British English suffixes, e.g. "ise" as well as "ize".
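To make the optional cache and the "ise"/"ize" handling more concrete, here is a minimal sketch of how such a function might be structured. Note that `ToyStemmer`, its `$computed` counter and its suffix list are all hypothetical stand-ins invented for illustration. This is not the PorterStemmer class itself, and the suffix stripping is deliberately naive rather than the Porter algorithm.

```php
<?php
// Hypothetical stand-in class, NOT the PorterStemmer class from this post.
// It only illustrates an optional per-word cache on a static method.
class ToyStemmer
{
    // word => stem, filled only when the caller asks for caching
    private static $cache = array();

    // Counts how often the suffix rules actually ran (for illustration only)
    public static $computed = 0;

    public static function Stem($word, $useCache = false)
    {
        $word = strtolower($word);

        // Return the stored stem if this word was already stemmed with caching on
        if ($useCache && isset(self::$cache[$word])) {
            return self::$cache[$word];
        }

        $stem = self::stripSuffix($word);

        if ($useCache) {
            self::$cache[$word] = $stem;
        }
        return $stem;
    }

    // Deliberately naive: removes at most one suffix from a short list,
    // treating the British "ise" spelling the same as "ize".
    private static function stripSuffix($word)
    {
        self::$computed++;
        foreach (array('ize', 'ise', 'ing', 'ion') as $suffix) {
            $len = strlen($suffix);
            // Require a few remaining characters so short words are left alone
            if (strlen($word) > $len + 2 && substr($word, -$len) === $suffix) {
                return substr($word, 0, -$len);
            }
        }
        return $word;
    }
}

$a = ToyStemmer::Stem('nationalise', true); // computed, then stored in the cache
$b = ToyStemmer::Stem('nationalise', true); // served from the cache
$c = ToyStemmer::Stem('nationalize');       // computed, cache not used
```

In this sketch the second call skips the suffix rules entirely, which is why caching only pays off when the same word turns up repeatedly. For text with mostly unique words, the cache would just consume memory.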
