Link: http://dev.mysql.com/tech-resources/articles/full-text-revealed.html
For a long time the ft_stopword_file variable was a poorly documented feature.
According to Wikipedia:
stop words is the name given to words which are filtered out prior to, or after, processing of natural language data (text).
Because of this, stopwords are not indexed in a MySQL full-text index.
If you are looking for the list of stopwords, you can find it in myisam/ft_static.c, or at the link above if you don't want to dig through the source.
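To make the idea concrete, here is a minimal sketch of stopword filtering in Python. It is only an illustration of the principle, not MySQL's implementation: the tiny SAMPLE_STOPWORDS set and the indexable_words function are made up for this example, and the sketch ignores every other rule MySQL applies when it builds a full-text index.

```python
import re

# Hypothetical, tiny stopword set for illustration only.
# This is NOT MySQL's built-in list.
SAMPLE_STOPWORDS = {"the", "is", "a", "of", "and"}


def indexable_words(text, stopwords=SAMPLE_STOPWORDS):
    """Return the words of `text` that would survive stopword filtering."""
    # Lowercase the text and split it into word-like tokens.
    tokens = re.findall(r"[\w']+", text.lower())
    # Drop every token that appears in the stopword set.
    return [token for token in tokens if token not in stopwords]
```

Calling indexable_words("The history of MySQL") on this sample set keeps only "history" and "mysql", which is why a full-text search for a stopword alone finds nothing.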
If you see the output below, you are using the default (built-in) stopword list.
mysql> show variables like 'ft_stopword_file';
+------------------+------------+
| Variable_name    | Value      |
+------------------+------------+
| ft_stopword_file | (built-in) |
+------------------+------------+
Unfortunately there can only be one stopword file per MySQL instance and the default one is for English.
Fortunately, there are options:
1. You can run one MySQL instance for every stopword file;
2. You can create a larger file containing stopwords for all languages in your server.
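The second option can be sketched in a few lines of Python. This is a rough sketch under assumptions, not a tested recipe: the merge_stopwords function and the file names in the usage comment are hypothetical, and it assumes the per-language files are UTF-8 text in which words are separated by anything other than letters, digits, underscores or apostrophes. Check the encoding and separators against what your server expects.

```python
import re
from pathlib import Path


def merge_stopwords(sources, destination, encoding="utf-8"):
    """Merge several stopword files into one de-duplicated, sorted file.

    `sources` is a list of paths to per-language stopword files and
    `destination` is the path of the combined file to write.
    Returns the merged list of words.
    """
    words = set()
    for source in sources:
        text = Path(source).read_text(encoding=encoding)
        # Assumption: a word is a run of letters, digits, "_" or "'".
        words.update(word.lower() for word in re.findall(r"[\w']+", text))
    merged = sorted(words)
    # Write one word per line so the result is easy to review by hand.
    Path(destination).write_text("\n".join(merged) + "\n", encoding=encoding)
    return merged


# Hypothetical usage (file names are placeholders):
# merge_stopwords(["stopwords-en.txt", "stopwords-pt.txt"], "stopwords-all.txt")
```

The combined file would then be the one that ft_stopword_file points to. Since the variable is read at server startup, expect to restart the server and rebuild the existing full-text indexes before the new list takes effect.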
At first sight neither is that attractive, but I believe both are reasonable solutions.
More to come in a future post...
But before that: is anyone interested in maintaining a stop_word_file for their (native) language?
Maybe we can start a project!
(I'll do the paperwork!)
Attachments:
ft_stop_word_file-EN.txt (3.4 KB)