Note: the changes below are built in to Glimpse 4.14 and Webglimpse 2.1.01 and above; no manual changes are necessary with those versions.
How to configure Glimpse with International Charsets
Instructions written by Reuven Lerner, http://lerner.co.il
Thanks to Reuven and to Didi Melchior for making this available!
(1) Modify the file Makefile.linux so that Glimpse can index
international characters. The line that currently reads
ISO_CHAR_SET = 0
should be changed to read
ISO_CHAR_SET = 1
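If you prefer to script the edit in step (1), a minimal sketch follows. The function name set_iso_charset is hypothetical, and the sketch assumes the setting sits on a line of its own starting with ISO_CHAR_SET, as shown above. It keeps a backup copy with a .bak suffix.

```shell
#!/bin/sh
# Sketch only: switch ISO_CHAR_SET from 0 to 1 in the given makefile.
# set_iso_charset is a hypothetical helper name, not part of Glimpse.
set_iso_charset() {
    makefile="$1"
    # Keep a backup, then rewrite only the ISO_CHAR_SET assignment;
    # every other line is copied through unchanged.
    cp "$makefile" "$makefile.bak" &&
    sed 's/^ISO_CHAR_SET[[:space:]]*=[[:space:]]*0[[:space:]]*$/ISO_CHAR_SET = 1/' \
        "$makefile.bak" > "$makefile"
}

# Example call (run it in the Glimpse source directory):
#   set_iso_charset Makefile.linux
```

If the makefile already contains "ISO_CHAR_SET = 1", the sketch leaves it as it is.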
(2) Compile Glimpse by running
make -f Makefile.linux
or the appropriate makefile for your system.
If you have already compiled Glimpse in the past, then you will
first want to "clean" out the previous compilations with:
make -f Makefile.linux clean
(3) Download and install Webglimpse. (http://webglimpse.net/download.html)
(4) You will now have to tell Glimpse to pay attention to the
"locale," which is the Unix term for "what country you're in, and
thus what characters should be considered letters." The locale
for Hebrew is "iw_IL", and is described in the directory
/usr/share/locale/iw_IL. The contents of this directory are as
follows:
LC_COLLATE LC_CTYPE LC_MESSAGES
LC_MONETARY LC_NUMERIC LC_TIME
If LC_CTYPE and LC_COLLATE aren't there, you will probably have
problems.
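The check in step (4) can be sketched as a small shell function. The name check_locale_dir is hypothetical; pass it the locale directory named in step (4). It reports which of the two required files are missing.

```shell
#!/bin/sh
# Sketch only: verify that a locale directory contains the two files
# that step (4) calls out as required. check_locale_dir is a
# hypothetical helper name.
check_locale_dir() {
    dir="$1"
    rc=0
    for f in LC_CTYPE LC_COLLATE; do
        if [ ! -e "$dir/$f" ]; then
            echo "missing: $dir/$f" >&2
            rc=1
        fi
    done
    # Return 0 only when both files are present.
    return "$rc"
}

# Example call:
#   check_locale_dir /path/to/your/locale/directory && echo "locale looks usable"
```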
(5) In Webglimpse, you will have to edit the files "confarc",
"makecron", "makehn", and "webglimpse". (The last of these is a
program in the cgi-bin directory.)
Each of these files must have the following two lines added at the
top:
$ENV{'LANG'} = 'iw_IL';
$ENV{'LC_ALL'} = 'iw_IL';
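Since the same two lines go into four files, the edit in step (5) could be scripted. The sketch below is one possible approach, not part of Webglimpse: add_locale_lines is a hypothetical name, and it assumes each script begins with a "#!" interpreter line that must stay first, so the new lines are inserted directly after it.

```shell
#!/bin/sh
# Sketch only: insert the two locale lines after the first line of a
# Perl script. Assumes line 1 is the "#!" interpreter line.
# add_locale_lines is a hypothetical helper name.
add_locale_lines() {
    script="$1"
    locale_name="$2"
    tmp="$script.tmp.$$"
    {
        sed -n '1p' "$script"                            # keep line 1 first
        printf "\$ENV{'LANG'} = '%s';\n" "$locale_name"
        printf "\$ENV{'LC_ALL'} = '%s';\n" "$locale_name"
        sed '1d' "$script"                               # rest of the script
    } > "$tmp" &&
    cat "$tmp" > "$script" &&                            # preserves file permissions
    rm -f "$tmp"
}

# Example calls:
#   for f in confarc makecron makehn; do add_locale_lines "$f" iw_IL; done
#   add_locale_lines cgi-bin/webglimpse iw_IL
```

Running it twice on the same file adds the lines twice, so check each file afterwards.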
(6) Remove the existing index with rmarc, and build it again with
confarc. To enable searching by directory, specify every
subdirectory as well. For example, you might specify /home/kalir/www,
/home/kalir/www/Articles/, and /home/kalir/www/Articles-Havot/.
(7) To enable searching in Hebrew, you will have to modify the
"webglimpse" program itself, commenting out the following two
lines by adding a # character at the beginning of each line:
#$highlight =~ s/^\W+//;
#$highlight = join("|",split(/\W+/,$highlight));
(8) The $ind_dir variable in "webglimpse" should contain the name
of the HTML root directory. For example, this might be
"/usr/local/httpd/htdocs".
(9) There are several important files to keep track of, some of which
you may want to modify:
(a) wgindex.html, which is the form for searching through *all*
indexed files.
(b) wgall.html, an HTML form which allows you to search by
directories.
(c) archive.cfg, configuration information about indexed
directories. You probably won't have to change this.
(10) Every time you index a directory, check the file .glimpse_index
(with an underscore, not a hyphen) to ensure that words --
particularly Hebrew words -- were included. If Hebrew words
weren't included in the index, then searches won't return any
results. (This file is in your archive directory, which by default
may be the same directory you indexed.)
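One way to do the check in step (10) is to search the index file for a word you know appears in your documents. The sketch below is an assumption about how you might do this, not a Glimpse feature: index_has_word is a hypothetical name, and the search is a plain byte-for-byte match, so the word must be typed in the same encoding as the indexed files.

```shell
#!/bin/sh
# Sketch only: succeed if the given word occurs anywhere in the index
# file. index_has_word is a hypothetical helper name.
index_has_word() {
    index_file="$1"
    word="$2"
    # -F: treat the word as a fixed string, not a pattern
    # -q: no output, just the exit status
    # LC_ALL=C: compare raw bytes regardless of the current locale
    LC_ALL=C grep -F -q -- "$word" "$index_file"
}

# Example call (replace the path and the word with your own):
#   index_has_word /path/to/archive/.glimpse_index 'someword' || echo "word not indexed"
```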
(11) In the same directory as .glimpse_index is .wgfilter-index (with
a hyphen, not an underscore) which removes certain file
extensions from the indexing. For example, you probably don't
want to index Perl programs, so you can remove everything ending
with .pl. You probably won't need to change this, but it exists
if you need it.
(12) Another important thing is to use:
in the HTML search form.