How-To's

 


Logging User Searches

User searches are logged to the file searches.log in the archive directory, in common log format, so you can analyze this log file with any standard web analysis software. Webglimpse has built-in support for wusage, a web site statistics program that can help you determine who is viewing your search archives and how they are using them, and can be configured to work with it. You can find more information on wusage on its website: http://www.boutell.com/wusage/
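Because the log uses common log format, any program that reads that format can process it. As a rough illustration, here is a minimal C sketch that splits one common log format line into its fields with sscanf. The struct clf_entry type, the parse_clf name and the buffer sizes are our own assumptions for this example; the exact request text that Webglimpse writes to searches.log is not described here.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* One parsed line of common log format:
   host ident authuser [date] "request" status bytes
   Buffer sizes are arbitrary limits chosen for this sketch. */
struct clf_entry {
    char host[256];
    char date[64];
    char request[1024];
    int status;
    long bytes; /* -1 when the log shows "-" */
};

/* Parse one line into *e. Returns 1 on success, 0 if the line
   does not have the expected shape. */
int parse_clf(const char *line, struct clf_entry *e)
{
    char ident[256], user[256], bytes[32];
    /* %63[^]] reads the date up to the closing bracket;
       %1023[^"] reads the quoted request up to the closing quote. */
    int n = sscanf(line, "%255s %255s %255s [%63[^]]] \"%1023[^\"]\" %d %31s",
                   e->host, ident, user, e->date, e->request,
                   &e->status, bytes);
    if (n != 7)
        return 0;
    /* The bytes field is "-" when no body was sent. */
    e->bytes = (strcmp(bytes, "-") == 0) ? -1 : strtol(bytes, NULL, 10);
    return 1;
}
```

A caller would typically read searches.log line by line with fgets and pass each line to parse_clf, for example to count searches per host. This sketch rejects lines whose quoted request field is empty.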

wusage.conf is the configuration file used for logging user searches.

Adding Sponsored Links

On the Manage Archive screen of the web administrator, check the box labeled "Optional - Include Sponsored Searchfeed links" (see Figure 20 below).

Figure 20

Next, click Set up/manage Account, a link for setting up an account with Searchfeed.com. Searchfeed.com, an online advertising and content provider, supplies sponsored search results intended to be relevant to the keywords the user searches on. Once your account is set up, enter the partner ID and track ID provided by Searchfeed.com and choose how many ads should appear at the top of your search results.

Setting up an account is simple. To get the most out of your ads, you can use the suite of online tools provided by Searchfeed.com to monitor which keywords users search on, which ads they click, and how much you earn from each click.

Search Tips for the End User

Webglimpse supports boolean queries and wildcards, as well as some limited regular expressions. Words separated by spaces are treated as a phrase. You can use the keywords AND, OR and NOT, along with parentheses, and the '#' character for wildcard matching.

Example boolean queries:

will match documents containing the words "tree" and "leaf" anywhere in the document

will match documents containing the words "tree" and "leaf" but not the word "binary"


will match documents containing the phrase "open source" or the word "shareware"
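To make the AND, OR and NOT semantics concrete, the following C sketch evaluates a small boolean query tree against the list of words in one document. The struct query representation and the eval_query function are hypothetical and do not reflect how glimpse evaluates queries internally; parsing the query string and phrase matching are left out.

```c
#include <stddef.h>
#include <string.h>

/* Node kinds for a tiny boolean query tree (hypothetical representation). */
enum query_op { Q_WORD, Q_AND, Q_OR, Q_NOT };

struct query {
    enum query_op op;
    const char *word;           /* used when op == Q_WORD */
    const struct query *left;   /* operand of Q_NOT, left side of Q_AND/Q_OR */
    const struct query *right;  /* right side of Q_AND/Q_OR */
};

/* Does the NULL-terminated list of document words contain w? */
static int contains_word(const char *const *words, const char *w)
{
    for (; *words != NULL; words++)
        if (strcmp(*words, w) == 0)
            return 1;
    return 0;
}

/* Evaluate q against one document's word list:
   AND needs both sides, OR needs either side, NOT inverts its operand. */
int eval_query(const struct query *q, const char *const *words)
{
    switch (q->op) {
    case Q_WORD: return contains_word(words, q->word);
    case Q_AND:  return eval_query(q->left, words) && eval_query(q->right, words);
    case Q_OR:   return eval_query(q->left, words) || eval_query(q->right, words);
    case Q_NOT:  return !eval_query(q->left, words);
    }
    return 0; /* not reached for well-formed trees */
}
```

For example, a tree with Q_AND at the root and the words "tree" and "leaf" as its children accepts any document whose word list contains both words, regardless of where they appear.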

Example use of wildcard #:

will match 'language' and 'languages' but not 'slang' or 'clangs'. This is different from checking the 'partial match' box in the query form, which would also match 'slang' and 'clangs'.


will match all the words mentioned above: 'languages', 'language', 'slang' and 'clangs'. This has the same effect as checking the 'partial match' box, but checking the box is faster.


will match patterns like www.anything.com
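The '#' wildcard behaves like "any run of zero or more characters". The C sketch below implements that rule for a single word, assuming the pattern must cover the whole word and every other character is matched literally. This is a simplification: glimpse's own pattern language is richer (see its man page), and the naive recursion can be slow on patterns containing many '#' characters.

```c
/* Return 1 if word matches pat, where '#' stands for any run of zero or
   more characters and every other character must match literally.
   The pattern must cover the whole word. */
int wildcard_match(const char *pat, const char *word)
{
    if (*pat == '\0')
        return *word == '\0';
    if (*pat == '#') {
        /* Let '#' absorb 0, 1, 2, ... characters of word. */
        for (;; word++) {
            if (wildcard_match(pat + 1, word))
                return 1;
            if (*word == '\0')
                return 0;
        }
    }
    /* Ordinary character: it must equal the next character of word. */
    return *word == *pat && wildcard_match(pat + 1, word + 1);
}
```

Under these assumptions, the hypothetical pattern lang# accepts 'language' and 'languages' but rejects 'slang' and 'clangs', while #lang# accepts all four.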

With complex queries like those above, you may sometimes get matches in code inside HTML tags rather than in the visible text of the file. To avoid this, check the "Use Filters" box in the search options form. (Always check this box when searching non-ASCII documents such as PDF or Word files. If the box does not appear on the form, it may already be set for you in a hidden form field.)

There's more...

The above examples cover the most common types of queries. Webglimpse (and glimpse) can also accept a limited form of regular expressions, including support for the '|', '*' and '[]' characters and parentheses. See the Patterns section of the glimpse man page for full details. Note that Webglimpse calls glimpse with the switches -U -W -j -y -z, plus others determined by the checkboxes on the form. To see the exact call to glimpse made for a particular Webglimpse query, view the source of the results page.

Note: The above applies to the commercial/contributors version of Webglimpse. EDU and demo versions that lack the translator module must use the original syntax: ',' for OR, ';' for AND, and '~' for NOT.
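The following C sketch shows one way a query written with AND, OR and NOT could be rewritten into the original symbols. It is a hypothetical illustration of the idea, not the translator module itself: it assumes operators appear as separate upper-case words, re-joins ordinary words with single spaces so phrases survive, writes the symbols without surrounding spaces, and does not handle parentheses.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: rewrite AND/OR/NOT keywords into ';' ',' '~'.
   Returns 1 on success, 0 if the query is too long or out (of size
   outsz) is too small. Parentheses are not handled. */
int to_legacy_syntax(const char *query, char *out, size_t outsz)
{
    char buf[1024];
    size_t len = 0;
    int prev_was_word = 0;

    if (outsz == 0 || strlen(query) >= sizeof buf)
        return 0;
    strcpy(buf, query);   /* strtok modifies its input, so work on a copy */
    out[0] = '\0';

    for (char *tok = strtok(buf, " "); tok != NULL; tok = strtok(NULL, " ")) {
        const char *piece = tok;
        int is_word = 0;
        if (strcmp(tok, "AND") == 0)
            piece = ";";
        else if (strcmp(tok, "OR") == 0)
            piece = ",";
        else if (strcmp(tok, "NOT") == 0)
            piece = "~";
        else
            is_word = 1;

        /* A space is needed only between two consecutive ordinary words. */
        size_t sep = (is_word && prev_was_word) ? 1 : 0;
        size_t plen = strlen(piece);
        if (len + sep + plen + 1 > outsz)
            return 0;
        if (sep)
            out[len++] = ' ';
        memcpy(out + len, piece, plen);
        len += plen;
        out[len] = '\0';
        prev_was_word = is_word;
    }
    return 1;
}
```

With these assumptions, open source OR shareware becomes open source,shareware and tree AND NOT binary becomes tree;~binary.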

Language Issues

Glimpse can index any language with a single-byte encoding, which includes all European languages such as Spanish and French. (It currently cannot handle double-byte encoded languages such as Chinese, or languages with special word-break rules such as Thai.) With the included filter program, accented characters such as umlauts are indexed correctly whether they appear as HTML character entities or as actual upper-ASCII characters.
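The idea behind the filter can be sketched in C: decode HTML character entities into the same single-byte values used by the raw characters, so both spellings of a word are indexed identically. This is not Webglimpse's filter program; the sketch assumes ISO-8859-1 (Latin-1) documents and handles only a few sample entities.

```c
#include <stddef.h>
#include <string.h>

/* A few HTML entities and their ISO-8859-1 (Latin-1) byte values.
   A real filter would need the full entity table; this list is a sample. */
static const struct { const char *name; unsigned char latin1; } entities[] = {
    {"&auml;", 0xE4}, {"&ouml;", 0xF6}, {"&uuml;", 0xFC},
    {"&Auml;", 0xC4}, {"&Ouml;", 0xD6}, {"&Uuml;", 0xDC},
    {"&eacute;", 0xE9}, {"&ntilde;", 0xF1}, {"&szlig;", 0xDF},
};

/* Decode the listed entities in place. The result is never longer than
   the input, so writing behind the read position is safe. */
void decode_entities(char *s)
{
    char *out = s;
    while (*s != '\0') {
        size_t i, n = sizeof entities / sizeof entities[0];
        for (i = 0; i < n; i++) {
            size_t len = strlen(entities[i].name);
            if (strncmp(s, entities[i].name, len) == 0) {
                *out++ = (char)entities[i].latin1;
                s += len;
                break;
            }
        }
        if (i == n)          /* not a known entity: copy the byte as-is */
            *out++ = *s++;
    }
    *out = '\0';
}
```

After decoding, M&uuml;nchen contains the same bytes as the raw Latin-1 spelling of München, so an indexer sees one word instead of two.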

Thanks to contributions from our users, we have incorporated search forms and result templates in the following languages:
Hebrew, German, Spanish, Italian, Finnish, Portuguese and Estonian. Creating a search interface that searches and displays correctly in these languages takes just a single click!

Licensed users can easily add their own templates in any language with the Custom Output Module.

 

Indexing Files with Spaces in the Names

Step 1: Make the following change in one or two files, depending on your version of Webglimpse. Change

$FILE_END_MARK = " ";

to something other than a space. For example, set it to

$FILE_END_MARK = "\t";

to use a tab character. (You then cannot index files with tabs in their names, but such names are uncommon.)
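Why the delimiter matters can be seen in a small C sketch. It assumes, for illustration only, a record that holds a file name followed by FILE_END_MARK and further data; the real layout of the files written by makenh and read by glimpse is not described here. read_file_name is a hypothetical helper that copies the name up to the first end mark.

```c
#include <stddef.h>
#include <string.h>

/* Copy the file name at the start of record into name, stopping at the
   first end_mark character, the end of the string, or when name is full. */
void read_file_name(const char *record, char end_mark, char *name, size_t namesz)
{
    size_t i = 0;
    if (namesz == 0)
        return;
    while (record[i] != '\0' && record[i] != end_mark && i + 1 < namesz) {
        name[i] = record[i];
        i++;
    }
    name[i] = '\0';
}
```

With a space as the mark, the hypothetical record "/docs/annual report.html" followed by a tab and 1024 yields the truncated name /docs/annual; with a tab as the mark it yields the full name.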

The file(s) you need to change are:

Webglimpse 2.0 and above:

/your/webglimpse/home/lib/wgHeader.pm

Webglimpse 1.X:

/your/webglimpse/home/makenh

and

webglimpse (installed somewhere under your cgi-bin area)

Step 2: Make a corresponding change in glimpse and glimpseindex. (Note: this requires recompiling the glimpse binaries.)

Edit glimpse.h (in the /index subdirectory of the glimpse sources) to set a matching value in glimpse itself.
Change the line

#define FILE_END_MARK ' '

to match the value you set in Step 1. For example, change it to

#define FILE_END_MARK '\t'

Recompile glimpse. Copy the binaries to wherever you keep them, normally somewhere in your path.

 

Glimpse 4.18.0 and above

From the directory where you extracted glimpse sources, run

./configure --with-file-end-mark='\t'
make clean
make
make install

Glimpse 4.14 thru 4.17.4

Glimpse 4.13.2 and below

From the directory where you extracted glimpse sources, run

./configure --file-end-mark='\t'
make clean
make
make install

Step 3: Reindex your archive with wgreindex.
