Webglimpse 2.X Overview

Webglimpse has two parts: Glimpse, the fast C engine that does the text indexing and pattern matching, and Webglimpse proper, the flexible Perl spider, archive manager and user interface script.

When you build an index, you first specify some rules for which files should be included. You might index by directory, by following links from a starting page, or a combination of both. The indexing script first makes a complete list of all files to be indexed, and retrieves any remote links. Then it feeds this list to the glimpseindex program, which builds a keyword index for fast searching.
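The file-list step described above can be sketched as follows. This is a minimal Python illustration, not Webglimpse's own Perl code: the function names and the include/exclude glob rules are hypothetical, and the sketch covers only the index-by-directory case (it does not follow links or retrieve remote pages). The resulting list, one path per line, stands in for what would be handed to glimpseindex.

```python
import fnmatch
import os


def build_file_list(root, include=("*.html", "*.htm", "*.txt"), exclude=()):
    """Walk `root` and return a sorted list of files matching the rules.

    `include` and `exclude` are hypothetical glob rules applied to the
    file name: a file is kept if it matches any include pattern and no
    exclude pattern.
    """
    selected = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not any(fnmatch.fnmatch(name, pat) for pat in include):
                continue
            if any(fnmatch.fnmatch(name, pat) for pat in exclude):
                continue
            selected.append(os.path.join(dirpath, name))
    return sorted(selected)


def write_file_list(paths, list_path):
    """Write one path per line so the list can be passed to an indexer."""
    with open(list_path, "w", encoding="utf-8") as fh:
        for path in paths:
            fh.write(path + "\n")
```

Building the complete list before indexing, as the text describes, keeps the selection rules separate from the indexer itself, so the same rules can be reapplied on every reindex.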

When a user runs a search, the webglimpse.cgi script first gets the query, checks all the options and does some pre-processing. Then it sends the query to glimpse for a fast search of the index and files. Finally, it parses the raw results, formats them according to that site's customized output format, ranks them, highlights keywords and presents the results to the user.
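The post-processing half of that flow (parse, rank, highlight) might look roughly like the sketch below. It is written in Python for illustration only; webglimpse.cgi itself is a Perl script. The grep-like "path: matched line" input format, the rank-by-number-of-matching-lines rule, and all function names are assumptions made for this example, not Webglimpse's actual behavior.

```python
import html
import re
from collections import defaultdict


def parse_hits(raw_output):
    """Group raw 'path: matched line' output by file (assumed format)."""
    hits = defaultdict(list)
    for line in raw_output.splitlines():
        path, sep, text = line.partition(": ")
        if sep:  # skip lines that do not look like a hit
            hits[path].append(text)
    return hits


def highlight(text, keywords):
    """HTML-escape `text` and wrap each keyword occurrence in <b> tags."""
    escaped = html.escape(text)
    keywords = [k for k in keywords if k]  # drop empty keywords
    if not keywords:
        return escaped
    pattern = re.compile(
        "|".join(re.escape(html.escape(k)) for k in keywords),
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: "<b>" + m.group(0) + "</b>", escaped)


def rank_and_format(raw_output, keywords):
    """Return (path, highlighted lines) pairs, most matching lines first."""
    hits = parse_hits(raw_output)
    # Assumed ranking: more matching lines first, ties broken by path.
    ranked = sorted(hits.items(), key=lambda item: (-len(item[1]), item[0]))
    return [(path, [highlight(t, keywords) for t in lines])
            for path, lines in ranked]
```

Escaping before highlighting keeps markup in the matched text from leaking into the results page. A production version would also need to avoid matching keywords inside the escaped entities themselves.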

Webglimpse 2.X has been out for almost two years and is in wide use, with new development continuing steadily and new releases about once a month.

See below for a detailed file list.

Webglimpse Directory and File Layout

The basic layout of a 2.X installation looks something like this, supposing you installed into the default /usr/local/wg2 directory:

/usr/local/wg2 Webglimpse home directory (or WGHOME)
/usr/local/wg2/archives Directory for archive-related files
Libraries, commercial modules go here
Template files are used to make each archive
Dist files are copied into each archive
First archive created through the web interface
Second ...
/home/httpd/cgi-bin/wg2 cgi-bin directory where webglimpse and wgarcmin scripts are placed

Important files:

Under the webglimpse home directory (e.g. /usr/local/wg2/)
wgsites.conf Contains domain/site info, such as DocumentRoot. Normally edited through the web interface.
archives.list Contains list of all archives, titles and where they are located

Under each archive directory (e.g. /usr/local/wg2/archives/1/)
archive.cfg Archive configuration, normally edited through the web interface
wgreindex Script for reindexing this archive; can be placed in your crontab
wgindex.html Sample search form; you can copy this to your website
wgall.html Sample form to search by subdirectory
.glimpse_filters Programs to filter non-ASCII files into text for indexing. You can add programs here for PDF, Word, compressed, and other file formats.
wgoutput.cfg Custom output configuration file (with commercial version only)
.wgrankhits.cfg Custom ranking formula for ordering results (commercial only)

Under the webglimpse cgi-bin directory (e.g. /home/httpd/cgi-bin/wg2)
webglimpse.cgi the cgi script to perform searches (calls glimpse)
wgarcmin.cgi the cgi script to manage archives
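
As a quick sanity check of an installation against the lists above, something like the following Python sketch could be used. The file names come from this page; the function name `missing_files` is hypothetical, and the commercial-only files (wgoutput.cfg, .wgrankhits.cfg) are deliberately left out because they may not be present.

```python
import os

# File names taken from the lists above; commercial-only files omitted.
HOME_FILES = ("wgsites.conf", "archives.list")
ARCHIVE_FILES = ("archive.cfg", "wgreindex", "wgindex.html",
                 "wgall.html", ".glimpse_filters")
CGI_FILES = ("webglimpse.cgi", "wgarcmin.cgi")


def missing_files(wghome, archive_dir, cgi_dir):
    """Return the expected paths that do not exist on disk."""
    expected = (
        [os.path.join(wghome, name) for name in HOME_FILES]
        + [os.path.join(archive_dir, name) for name in ARCHIVE_FILES]
        + [os.path.join(cgi_dir, name) for name in CGI_FILES]
    )
    return [path for path in expected if not os.path.exists(path)]
```

With the default locations used on this page, the call would be missing_files("/usr/local/wg2", "/usr/local/wg2/archives/1", "/home/httpd/cgi-bin/wg2"); an empty list means every listed file was found.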

The only files that you generally want to edit by hand are the special configuration files in each archive directory, including .glimpse_filters, .wgoutput.cfg, .wgrankhits.cfg, .wginputfields and .wgoutputfields. Detailed instructions for editing each of these files (and why you would want to) are in the Howto's on the main documentation page.

