Features: Ease of Use, Speed, Power, and Flexibility

Webglimpse is a feature-rich search engine that has been used on thousands of sites, on all different flavors of Unix, and in countries from Nigeria to Norway and Chile to Canada. Nearly everything about Webglimpse is configurable: how to select the files to index, how to search them, and how to present the results to the user. Yet we have tried to keep the install simple and quick so that you can get your search up and running today.

Glimpse is the powerful indexing and query system inside of Webglimpse. It can also be used as a stand-alone program in a Unix environment. Glimpse and glimpseindex are written in C for speed, while most of the management and interface parts of Webglimpse are written in Perl.

Specific features are listed below. Features with a red check are available in the commercial version; ones with a green check are also available in the free version for educational institutions. Click on the question mark to the right for detailed instructions on how to enable each feature.


Ease of Use

Com | Edu | Features included
Guaranteed Install: If you purchase a license, we guarantee a successful install on your site. We will walk you through configuration on your server and, if desired, perform a remote install for you.
Web-ministration Interface: Manage all your archives (searchable sites) through a single web interface. Context-sensitive help tells you the exact directory paths on your system where all configuration files are located.
Automated Search Form: A simple form allows users to search for ANY KEYWORDS, ALL KEYWORDS, or EXACT PHRASE, and to find only matches WITHOUT certain words. It automatically generates the correct boolean expression and sends it to Glimpse.
Search From Any Page of Your Site: An optional search box can be added to all pages of your site, so that users can search from anywhere. They can also search just the links on the page they are looking at, so that the hits are more likely to be relevant.
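
The way the Automated Search Form turns its ANY / ALL / EXACT PHRASE / WITHOUT choices into a boolean expression can be pictured with a small Python sketch. This is not Webglimpse's own code (the real interface is written in Perl); the function name `build_query` and its parameters are hypothetical, and the sketch assumes Glimpse's pattern operators `;` (AND), `,` (OR), `~` (NOT) and `{}` (grouping).

```python
def build_query(words, mode="all", without=()):
    """Build a Glimpse-style boolean pattern (hypothetical helper).

    mode    -- "any" joins the words with ',' (OR), "all" joins them with
               ';' (AND), "phrase" keeps them together as one literal string.
    without -- words that must not appear; each is appended as ';~word'.
    """
    words = [w for w in words if w]
    if not words:
        raise ValueError("at least one keyword is required")

    if mode == "any":
        core = ",".join(words)
    elif mode == "all":
        core = ";".join(words)
    elif mode == "phrase":
        core = " ".join(words)
    else:
        raise ValueError(f"unknown mode: {mode}")

    # Assumption: braces group an OR list so the exclusions that follow
    # apply to the whole list rather than only to its last word.
    if without and mode == "any" and len(words) > 1:
        core = "{" + core + "}"

    return core + "".join(f";~{w}" for w in without if w)


if __name__ == "__main__":
    print(build_query(["index", "search"], mode="all", without=["spider"]))
```

The sketch does not escape characters that have a special meaning in a pattern; a real form handler would need to do that before passing user input to the search program.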

Speed and Power

Com | Edu | Features included
* Fast Searching: Glimpse builds a keyword index in advance for very fast searching (though it can also access individual files for complex boolean queries). Uncommon words are found rapidly even in a very large file set of up to several gigabytes. Common words (with hundreds or thousands of matches) take longer, but if the number of hits returned can be limited, even those searches are very fast. The core index and search programs are written in C.
Glimpse uses a small index, typically less than 5% of the total data size, so it can usually be loaded entirely into memory.

Sample search times on a dual 400 MHz processor running Linux (run from the command line so that page load time is not included):

Dataset size | Index size | Number of times keyword appears | Search time
12608 files, 1.5 GB | 3.1 MB | 4 | 1 sec
12608 files, 1.5 GB | 3.1 MB | 606 | 31 sec to return all hits; 1 sec to return only top 20 hits
499 files, 64 MB | 300 KB | 49 | 1 sec
* The Edu version searches quickly when only a small number of hits is returned, but performance may be degraded when searching for common words.

Large Data Sets: To our knowledge, Glimpse has been used on data sets of up to 9 gigabytes. Because of the two-level search design that localizes keywords to a 'block' of data, the index footprint is quite small, typically less than 5% of the total data set size. The speed of the search scales with the number of matches to the keyword and only secondarily with the total size of the indexed data.
Result Caching/Large Number of Hits: Webglimpse maintains a cache of recent searches, which allows even searches returning a very large number of hits to return quickly if they are commonly performed. The cache also allows rapid navigation using a Next Hits toolbar, which lets the user jump to any page of hits returned.
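
The two-level design mentioned under Large Data Sets can be illustrated with a toy Python sketch. It is only an illustration of the general idea (a small index that maps each word to a block of files, followed by a scan of just those blocks), not Glimpse's actual C implementation; the function names and the `files_per_block` parameter are hypothetical.

```python
import re
from collections import defaultdict

WORD = re.compile(r"[a-z0-9]+")


def build_block_index(files, files_per_block=2):
    """files: dict mapping file name -> text. Returns (blocks, index).

    Level 1: the index records only WHICH BLOCK a word occurs in,
    not the exact file or position, which keeps the index small.
    """
    names = sorted(files)
    blocks = [names[i:i + files_per_block]
              for i in range(0, len(names), files_per_block)]
    index = defaultdict(set)
    for block_id, block in enumerate(blocks):
        for name in block:
            for word in WORD.findall(files[name].lower()):
                index[word].add(block_id)
    return blocks, dict(index)


def search(word, files, blocks, index):
    """Level 2: scan only the blocks that the index says contain the word."""
    word = word.lower()
    hits = []
    for block_id in sorted(index.get(word, ())):
        for name in blocks[block_id]:
            for line_no, line in enumerate(files[name].splitlines(), start=1):
                if word in WORD.findall(line.lower()):
                    hits.append((name, line_no, line))
    return hits
```

In this sketch a rare word touches few blocks, so little text is scanned, while a common word forces a scan of many blocks. That matches the behavior described above, where search time depends mainly on the number of matches.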


Flexibility

Probably Webglimpse's strongest point: extremely flexible, configurable rules for which files to index, how to search, pattern matching, result ranking, and more.
Com | Edu | Features included
Customized Results Output: Using a template file, the Webglimpse administrator can control every aspect of the 'search results' page, from backgrounds and font colors to demarcations between individual hits. Results can be displayed using any valid HTML tags: in a list, a table, multiple individual tables, and so on. They can be displayed by title, by filename, or even by pulling a specific pattern out of the matching file.

Keyword highlighting and amount of context returned are also under administrative control.

Customized Ranking of Hits: Four built-in ranking schemes are available to users as they search your site: rank by most recent first, by matches in title and meta tags, by link popularity, or by a combination of all these (the default). As the administrator, you can create your own ranking formula using all the available information about the match, and make your own customized ranking schemes available to your users (or limit them to one scheme of your choice). Re-sorting and ranking of hits is an extremely powerful tool that allows users to find the hits most relevant to them.
META tag support allows you to include any meta tag explicitly in your ranking formulas. This gives you precise control over ordering of your hit results, if you so desire.
Index Local and Remote Pages: Webglimpse is not limited to searching only your own data! The Spider program has flexible rules for gathering pages from remote sites. It can gather all the pages under a specified domain, or traverse a set number of 'hops' from a starting page regardless of domain, or a combination of these rules. You can even make a single archive, searchable from one form, that combines local data on your hard drive and multiple remote sites you specify.
Boolean Expressions, Wildcards, Misspellings & More: The extremely powerful agrep engine allows users to specify partial or whole-word matches, use regular expressions and boolean combinations, set the number of spelling errors allowed, and choose case-sensitive or case-insensitive matching. The administrator can modify the search form to preset these values for optimum searching, or allow the user to choose at the time of search.
HTML, PDF, Word, other formats: any filetype that can be converted to text can be indexed. On-the-fly conversion saves drive space, or you can create permanent text versions for speed. 3rd-party format converters are easily configured into the system.
Dynamically Generated Pages: PHP and database-driven sites are hard for many indexing programs to handle. Webglimpse allows you to handle both dynamically created and static pages in a single index. Dynamically generated pages are passed through the web server before indexing, so Webglimpse indexes exactly what a browser would see, rather than the source code used to generate the page.
"Neighborhood" Searching: Once your site is indexed, users can search the entire site, any subdirectory, or just the links on the current page. For large sites with many different areas, this is an essential tool to help users quickly find the most relevant results. As the site administrator, you control what options to provide.
Query Log: All the keywords users enter are logged in the 'common log' format, which is compatible with standard log analysis tools such as wusage. See how users are actually interacting with your site, and find out which words and phrases are most commonly searched for. This information can help you optimize the design of your site.
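
As a rough illustration of what can be done with a query log in common log format, the Python sketch below counts the most frequently searched terms. It rests on assumptions that are not stated above: that each search appears in the request field as a URL with a `query` parameter, and that the script path is as shown in the sample. The function name `top_queries` is hypothetical too.

```python
import re
from collections import Counter
from urllib.parse import urlsplit, parse_qs

# Common Log Format: host ident authuser [date] "request" status bytes
CLF = re.compile(r'^(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+)$')


def top_queries(lines, param="query", n=10):
    """Return the n most common search strings found in CLF log lines."""
    counts = Counter()
    for line in lines:
        match = CLF.match(line.strip())
        if not match:
            continue  # skip lines that are not valid common log entries
        request = match.group(5).split()
        if len(request) < 2:
            continue
        # request[1] is the requested URL; pull the search terms out of it.
        params = parse_qs(urlsplit(request[1]).query)
        for value in params.get(param, []):
            counts[value.strip().lower()] += 1
    return counts.most_common(n)


if __name__ == "__main__":
    sample = [
        '10.0.0.1 - - [10/Oct/2002:13:55:36 -0700] '
        '"GET /cgi-bin/search.cgi?query=glimpse HTTP/1.0" 200 2326',
    ]
    print(top_queries(sample))
```

Because the log uses a standard format, the same file can also be fed to an off-the-shelf log analyzer; the sketch only shows the kind of summary such a tool produces.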


Languages

Com | Edu | Features included
Index Any Single-Byte Language: Glimpse can index any language that is single-byte encoded, which includes all European languages such as Spanish and French. Currently we cannot index double-byte encoded languages such as Chinese, or languages with special word-break rules such as Thai. With the included filter program, special characters such as umlauts are correctly indexed whether they appear as HTML character entities or as actual upper-ASCII characters.
Preset Results Output and Search Forms: Thanks to contributions from our users, we have incorporated search forms and result templates in the following languages:
Hebrew, German, Spanish, Italian, Finnish, Portuguese and Estonian.
Creating a search interface that searches and displays correctly in these languages takes just a single click!

Licensed users with the Custom Output Module can easily add their own templates in any language.

System Requirements

A Unix server with ssh or telnet access is required. If you do not have experience installing Unix applications, our free remote install may be helpful. Root access is not required.

Platforms: Linux, Solaris, SunOS, HP-UX, FreeBSD, AIX, IRIX, OSF, Mach, Rhapsody (Mac OS X).

Glimpse has been reported to compile and run using the Cygwin compiler for Windows, but we have not yet tested it on any Win32 platform.

Disk space requirements

Between 5% and 15% of the indexed file space is required for the index; additional scratch space is required during indexing. Remote files must be retrieved and stored locally for indexing. The program itself takes about 5 MB.
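
The figures above translate into a quick back-of-the-envelope estimate. The Python sketch below is only a planning aid built from the 5-15% range and the roughly 5 MB program size quoted here; the function name is hypothetical, and scratch space and locally stored remote files are deliberately left out because no figure is given for them.

```python
MB = 1024 ** 2


def index_space_estimate(data_bytes, program_bytes=5 * MB):
    """Return (low, high) disk usage in bytes for the index plus the program.

    Uses the 5%-15% index size range; scratch space needed during indexing
    and local copies of remote files are NOT included.
    """
    low = data_bytes * 5 // 100 + program_bytes
    high = data_bytes * 15 // 100 + program_bytes
    return low, high


if __name__ == "__main__":
    low, high = index_space_estimate(1000 * MB)
    print(f"{low // MB} MB to {high // MB} MB")
```

For 1000 MB of indexed files this gives a range of 55 MB to 155 MB.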



Webglimpse Advanced Site Search Software: providing flexible local search since 1997

Copyright © Internet WorkShop, 2002. All Rights Reserved.