Quickstart ========== This is a very brief introduction to the use of PyStemmer. First, import the library: >>> import Stemmer Just for show, we'll display a list of the available stemming algorithms: >>> print Stemmer.algorithms() ['danish', 'dutch', 'english', 'finnish', 'french', 'german', 'italian', 'norwegian', 'porter', 'portuguese', 'russian', 'spanish', 'swedish'] Now, we'll get an instance of the english stemming algorithm: >>> stemmer = Stemmer.Stemmer('english') Stem a single word: >>> print stemmer.stemWord('cycling') cycl Stem a list of words: >>> print stemmer.stemWords(['cycling', 'cyclist']) ['cycl', 'cyclist'] Strings which are supplied are assumed to be UTF-8 encoded. We can use unicode input, too: >>> print stemmer.stemWords(['cycling', u'cyclist']) ['cycl', u'cyclist'] Each instance of the stemming algorithms uses a cache to speed up processing of common words. By default, the cache holds 10000 words, but this may be modified. The cache may be disabled entirely by setting the cache size to 0: >>> print stemmer.maxCacheSize 10000 >>> stemmer.maxCacheSize = 1000 >>> print stemmer.maxCacheSize 1000