This project aims to create affix and dictionary files that allow ispell to be used for spell checking Finnish-language documents.
Both the dictionary and the affix files are distributed under the terms of the GNU General Public License version 2 as published by the Free Software Foundation. (See the file COPYING.)
This spell check dictionary for ispell should cover a moderate portion of common Finnish words.
The dictionary is usable, but not very suitable for serious use. (See Flaws.)
Three different versions of this dictionary exist: small, medium and large. They differ only in the number of words derived from other words. (See statistics in CHANGELOG.)
The small version recognizes 756588 words and requires 2.7 megabytes of hard disk space. Ispell uses 5 megabytes of memory with this version, so it should also be usable on slower computers.
The medium version recognizes 1894529 words, and requires 5.3 megabytes of hard disk space. Ispell uses about 10 megabytes of memory when using this version. This is the recommended version.
The large version recognizes 6678677 words and requires 9.0 megabytes of hard disk space. Ispell uses as much as 19 megabytes of memory with this version. It may not be worth the memory it uses.
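To compare the three versions at a glance, the figures quoted above can be put side by side. The following is only a small illustrative sketch: the numbers are copied from the text, and the helper name `words_per_memory_mb` is made up for this example.

```python
# Figures copied from the text above:
# (recognized words, disk space in MB, approximate ispell memory use in MB)
VERSIONS = {
    "small": (756588, 2.7, 5),
    "medium": (1894529, 5.3, 10),
    "large": (6678677, 9.0, 19),
}


def words_per_memory_mb(name):
    """Recognized words per megabyte of memory for one version (hypothetical helper)."""
    words, _disk_mb, memory_mb = VERSIONS[name]
    return words / memory_mb


for name, (words, disk_mb, memory_mb) in VERSIONS.items():
    # Print one summary row per version.
    print(
        f"{name:<6} {words:>8} words  "
        f"{words / disk_mb:>9.0f} words per MB of disk  "
        f"{words_per_memory_mb(name):>9.0f} words per MB of memory"
    )
```

The ratios say nothing about how useful the additional derived words are in practice, which is why the medium version remains the recommended one.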
You can get the files mentioned here, such as INSTALL, from http://ispell-fi.sourceforge.net/.
After this you should do either this:
or this:
Spell checking Finnish with ispell should work now.
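One way to confirm that the installation works is to feed ispell a short text and see which words it rejects. The sketch below is not part of the package. It assumes the dictionary was installed under the name `finnish` (guessed from the personal dictionary file name ~/.ispell_finnish) and that text is exchanged in Latin-1; adjust both if your setup differs. It relies only on ispell's `-d` (select dictionary) and `-l` (list misspelled words from standard input) options.

```python
import shutil
import subprocess


def ispell_list_command(dictionary="finnish"):
    """Build an ispell command line that lists misspelled words from stdin.

    The default dictionary name "finnish" is an assumption, not taken from
    the package documentation.
    """
    return ["ispell", "-d", dictionary, "-l"]


def misspelled_words(text, dictionary="finnish", encoding="latin-1"):
    """Return the words ispell rejects, or None if ispell is not installed.

    The Latin-1 encoding is an assumption about how the dictionary was built.
    """
    if shutil.which("ispell") is None:
        return None
    result = subprocess.run(
        ispell_list_command(dictionary),
        input=text,
        capture_output=True,
        encoding=encoding,
        check=False,  # a missing dictionary makes ispell exit with an error
    )
    return result.stdout.split()
```

For example, `misspelled_words("Tämä on testi.\n")` should return an empty list if the dictionary is found and accepts all three words.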
Some inflected forms are missing for many words. More words should therefore be tagged as roots for inflection.
Ispell has only elementary support for compound words, so they do not work very well. For example, there may be no suggestions for a misspelled compound word.
Some names of countries (and other places) are still written in lowercase letters. More such names could also be added.
The dictionary contains quite a lot of rarely used words and specialized vocabulary (computers, linguistics). They may slow down ispell and take up unnecessary disk space and memory. However, I don't know how big a benefit removing them would be. (Removing them is also likely to be rather hard.)
There are also a number of abbreviations in the dictionary. They may cause some misspelled words to be accepted.
Additional word lists are appreciated, especially if they are both extensive and spell checked. (They must, of course, be free to add to this package, which is distributed under the GNU GPL.)
Please inform the authors (us) if ispell accepts a word that is definitely misspelled when using this dictionary. Before reporting, however, make sure that the flaw is in the main dictionary and not in your personal dictionary (which is usually the file ~/.ispell_finnish).
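Checking the personal dictionary before reporting can be done by hand, or with a few lines of code. The following is a minimal sketch, assuming the personal dictionary is a plain word list with one entry per line, read as Latin-1. The function name `in_personal_dictionary` is hypothetical, and the case-insensitive comparison is a simplification of how ispell treats capitalization.

```python
from pathlib import Path


def in_personal_dictionary(word, path=None):
    """Return True if `word` appears in the personal dictionary file.

    Assumes one entry per line; anything after a "/" on a line is ignored
    in case an entry carries affix flags.
    """
    if path is None:
        # Usual location according to the text above.
        path = Path.home() / ".ispell_finnish"
    path = Path(path)
    if not path.exists():
        return False
    target = word.lower()
    with path.open(encoding="latin-1") as handle:
        for line in handle:
            entry = line.strip().split("/", 1)[0]
            if entry and entry.lower() == target:
                return True
    return False
```

If this returns True for the word in question, the word was accepted because of the personal dictionary, and there is nothing to report.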
These affix files are based on the affix file written by Martin Vermeer, which was used in version 0.1 and earlier. The book Finnish Grammar by Fred Karlsson (1983), published by Werner Söderström Oy, and its Finnish edition have also been a great help.
Currently the affix files are partly generated automatically by the genfisuffix program. If you are curious, you can get the source code from genfisuffix/genfisuffix-0.7.tar.bz2.
Page last updated by mvermeer 2004-04-23.