The Webglimpse Solution

Webglimpse site search software includes a web administration interface, a remote link spider, and the powerful Glimpse file indexing and query system. Add sophisticated search capability to your site.

Webglimpse is scalable: index one small local site, hundreds of remote sites, or gigabytes of compressed documents. The code is open, mature, widely used, and actively supported.

Webglimpse requires a Unix server to run, but it can index documents on any server as long as they are accessible via the Web or a networked drive.

Aside from creating a searchable website, Webglimpse can be used for data mining applications, as part of a document management solution, or as Glimpse for LXR and other integrated solutions.

At bTucson.com, our site for Tucson, Arizona, we run a production search that combines SQL matches with full-text search results and returns result pages at Google-friendly GoodURLs. See also our site for Dallas, TX.

On these sites, Webglimpse combines SQL item and category matches with full-text search without running any SQL queries during the search. How? We search a specially formatted dump of the tables, find matching records with glimpse, and rapidly parse out the fields. The links in the search results go to SQL-generated pages that print specific database fields.
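The dump-and-search idea can be sketched as follows. This is a minimal Python illustration, not the Webglimpse implementation. It assumes a hypothetical dump format with one record per line and tab-separated fields (table, id, name, category). A plain regular-expression scan stands in for glimpse to find matching lines, and the fields are then split back out without touching the database.

```python
import re
from typing import NamedTuple

class Record(NamedTuple):
    # Hypothetical field layout of one dump line (an assumption for illustration).
    table: str
    record_id: str
    name: str
    category: str

def search_dump(dump_text: str, query: str) -> list[Record]:
    """Return the records whose dump line matches the query (case-insensitive).

    The regex scan stands in for glimpse, which would locate the same lines
    through its index instead of reading the whole dump.
    """
    pattern = re.compile(re.escape(query), re.IGNORECASE)
    hits = []
    for line in dump_text.splitlines():
        if pattern.search(line):
            # Parse the fields out of the matching line; no SQL query is run.
            hits.append(Record(*line.split("\t")))
    return hits

dump = (
    "items\t101\tExample Cafe\tRestaurants\n"
    "items\t102\tExample Garden Tour\tAttractions\n"
    "categories\t7\tRestaurants\tFood\n"
)

for rec in search_dump(dump, "restaurants"):
    # Each hit could link to an SQL-generated page keyed by table and id.
    print(rec.table, rec.record_id, rec.name)
```

Keeping the table name and id in every dump line is what lets each hit link straight to the matching SQL-generated page.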

Webglimpse was originally written at the University of Arizona, and is now developed and maintained by Internet WorkShop.


Download the latest versions (as of 6/9/12):
Webglimpse 2.21.0; Glimpse 4.18.6 (source) / 4.18.5 (binary)

Some recent history (see the Developer's area for the full story, and Downloads for the code):

6/9/2012: Webglimpse 2.21.0 no longer installs wgarcmin.cgi by default due to security vulnerabilities. wgarcmin.cgi is the Web-based manager script and is not required for the operation of Webglimpse. Users who are not in a secure intranet should use the shell-based wgcmd manager instead. To install wgarcmin.cgi into a secure environment, edit the wginstall.pl file and uncomment the indicated line:

       # REMOVED for security reasons.  If you can run in a safe intranet environment,
       # uncomment the following line to install the web-based archive manager wgarcmin.cgi
       #                               "wgarcmin.cgi",
Note: users running 2.20.0 and earlier should delete the file wgarcmin.cgi. Upgrading to 2.21.0 will delete this file if it was installed to the same cgi-bin directory as previous versions.

2/14/12: Webglimpse 2.20.0 fixes an important security hole; all Webglimpse users should download and install it.

7/30/10: Still no luck getting the University of Arizona to agree to let me open-source Webglimpse, so the code is now a bit stagnant. However, I'm still using it on several sites, most recently bTeaching.com. Here is a quick tip to avoid indexing repeated files from '#here' links and form results: include the following lines in the .wgfilter-index file to exclude unwanted URLs:

Deny \&
Deny \#
Deny \?
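Here is a small Python model of which URLs these rules catch. It assumes each Deny line is applied as a regular expression searched anywhere in the candidate URL. That is a reading of the filter, not Webglimpse's own Perl code. Under that reading, \&, \# and \? each match a literal &, # or ?, so any URL containing one of those characters is skipped.

```python
import re

# The three Deny patterns from the tip above, compiled as Python regexes.
# Each escaped character matches itself literally.
DENY_PATTERNS = [re.compile(p) for p in (r"\&", r"\#", r"\?")]

def is_denied(url: str) -> bool:
    """Return True if any Deny pattern matches somewhere in the URL."""
    return any(p.search(url) for p in DENY_PATTERNS)

urls = [
    "http://example.com/page.html",
    "http://example.com/page.html#here",
    "http://example.com/search?q=glimpse",
    "http://example.com/list.cgi?a=1&b=2",
]
for url in urls:
    print(url, "denied" if is_denied(url) else "indexed")
```

Note that this excludes every dynamic URL with a query string, so it suits sites whose content is reachable through plain static links.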

4/22/09: Webglimpse 2.18.8 times out slow sites much more aggressively in order to speed up the spider. (The timeout is now 5 seconds; change it in [webglimpse_home]/wglib/wgAgent.pm.)

12/19/08: Glimpse 4.18.6 has a fix contributed by Doug McLaren for indexing documents with long titles (or missing </title> tags). (Available as source only.)

10/27/08: A fix re-enables selective searching by directory, as described in the doc How To Search by Directory. This feature had been broken for a few releases by a security enhancement.

8/16/08: Important security update! A contributed script was not running with the -T (Perl taint mode) option; please upgrade now if you have Webglimpse 2.17.0 or above!

8/8/08: New Google Group for Webglimpse Users available. This replaces the former mailing lists we were running in-house. Sign up to ask questions or make suggestions!

8/05/08: Webglimpse 2.18.5 has some minor fixes and tweaks for the new extern module feature. Also a new version of the CommandWeb module has been brought over from the Abra project so sites using both Webglimpse and Abra won't have conflicts. For a site using the new call-external-module feature, see MyRecipeSearcher.com

6/18/08: Webglimpse 2.18.4 adds a major new feature: optional hook for inserting data from an external module into the results output. (Inspired by the ultra-cool Drupal api, though the Webglimpse hook is not quite as elegant.) Read the wgoutput.cfg file for notes on how to use this feature. Also a fix for the New Query box when using the simple search form, and an updated install script that avoids compiler warnings for old code.

2/16/08: Eliminate repetitive text from search results: a new doc. The capability has actually been there for some time but was not well documented. Most sites have repetitive navigation and copyright info that can interfere with search result quality; a few simple steps let you pull out those sections prior to indexing.

1/3/08: Improved handling of dynamic URLs of the form http://somesite.tld/path?query=stuff.

11/29/07: Just some minor tweaks to the install, to make errors less annoying...

10/9/07: Major new features for incorporating SQL results into a Webglimpse search. Both exact and approximate matches against database tables can be combined in a controlled way with a full-text search of websites and local files. Also, SEO types take note: we can now export search result pages as well-formed static URLs in the site. With a small amount of URL rewriting, any keyword search can appear as a static web page. These new features are a bit tricky to configure, and users may need help to take full advantage of them at this point, but they are so powerful that we want to release them now. Both major new features are in use at bTucson.com and bDallas.com.

9/14/07: Advanced version download free for (almost all) nonprofits, .EDU and .GOV sites. Download now!

3/22/07: One thing leads to another: Webglimpse 2.17.3 has a fix for dealing with badly formed URLs. Basically, we eval() the URI call and fall through to our own methods if the URI module fails. Also some minor tweaks for dealing with multiple archives, and a new option inside the makenh script for keeping large hashes out of memory by using dbmopen.
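Here is a Python sketch of the eval()-and-fall-through pattern. The standard library's urllib.parse.urlsplit plays the role of the URI module; it raises ValueError on some malformed URLs, such as an unbalanced '[' in the host. The regex fallback is a hypothetical stand-in for the older home-grown parsing, not the Webglimpse code.

```python
import re
from urllib.parse import urlsplit

# Hypothetical permissive fallback: scheme://netloc followed by anything.
FALLBACK_RE = re.compile(r"^([A-Za-z][A-Za-z0-9+.-]*)://([^/?#]*)(.*)$")

def get_host(url: str) -> str:
    """Return the host part of a URL, tolerating some badly formed URLs."""
    try:
        # Preferred path: the library parser (like calling the URI module inside eval).
        return urlsplit(url).netloc
    except ValueError:
        # The library rejected the URL; fall through to our own pattern.
        m = FALLBACK_RE.match(url)
        if m is None:
            raise  # neither parser can make sense of it
        return m.group(2)

print(get_host("http://example.com/docs/index.html"))  # example.com
print(get_host("http://[bad/path"))                     # [bad
```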

3/15/07: Webglimpse 2.17.2 uses the URI cpan module if available for URL parsing instead of our own recipe from back when.

1/25/07: New German (Deutsch) translations available of several pages of this site.

1/23/07: Webglimpse 2.17.1 has a minor tweak to avoid errors when LWP is not available.

12/06/06: Webglimpse 2.17.0 contains a new nifty add-on called BibGlimpse that lets you use Webglimpse as a scientific reprints repository. Thanks to David Kreil and Tom Tuechler at Boku Bioinformatics, Vienna, Austria. Take a look at the BibGlimpse online documentation that also links to detailed installation instructions.

11/23/06: A new application that builds on Webglimpse to aid distributed literature research is now available and ships in the latest Webglimpse distribution. Features of this lightweight PDF reprint manager, BibGlimpse, include adding reprints without forms; automated bibliographic record retrieval for PubMed-listed papers, using machine learning techniques to match a record to the PDF with a success rate of over 95%; management of user annotations of papers; and structured full-text queries using the Webglimpse engine.

9/30/06: RECOMMENDED: Webglimpse 2.16.4 adds one last bit of sanitization by eliminating some deprecated variables. This is the currently recommended release.

9/17/06: Webglimpse 2.16.3 adds further sanitization of output variables to prevent XSS/HTML injection. We had not fully sanitized the query string, because unusual characters are required for regexp queries; those characters must be neutralized before the query string is displayed on a page. Also minor fixes to the "Within X words" feature and to allow spaces inside archive configuration variables.
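The distinction behind this fix can be sketched in Python with html.escape. The raw query keeps its regexp characters for the search engine, while only the copy echoed back into the page is escaped. This illustrates the principle and is not Webglimpse's Perl sanitization code.

```python
import html

def display_query(raw_query: str) -> str:
    """Escape a search string before echoing it into a results page.

    The raw string, regexp characters intact, is still what goes to the
    search engine; only the copy shown to the user is escaped.
    """
    return html.escape(raw_query, quote=True)

query = '<script>alert("x")</script> foo.*bar'
print(display_query(query))
# &lt;script&gt;alert(&quot;x&quot;)&lt;/script&gt; foo.*bar
```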

8/16/06: Webglimpse 2.16.2 has some further tweaks to handling of BASE HREF tags and also eliminates mailto: and javascript: links at an earlier step in the spidering process.
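Dropping these links early can be as simple as checking each link's scheme before it is queued for fetching. This is a minimal Python sketch, assuming links arrive as raw href strings; it is not the Webglimpse spider code.

```python
from urllib.parse import urlsplit

# Schemes that can never lead to an indexable page.
SKIP_SCHEMES = {"mailto", "javascript"}

def spiderable(links: list[str]) -> list[str]:
    """Drop mailto: and javascript: links before any fetching is attempted.

    urlsplit lowercases the scheme, so "JavaScript:" is caught too;
    relative links have an empty scheme and are kept.
    """
    return [link for link in links if urlsplit(link).scheme not in SKIP_SCHEMES]

print(spiderable([
    "http://example.com/a.html",
    "mailto:info@example.com",
    "JavaScript:void(0)",
    "docs/page.html",
]))
```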

8/10/06: The Searchable Site article is now available for all to view (without a subscription to Linux Journal). Has a nice introduction to Webglimpse for newcomers, with a focus on making your cool website generate some revenue...

8/01/06: Webglimpse 2.15.5 installs the htuml2txt.pl filter automatically and removes the choice, which was confusing some users. With prefiltering on, htuml2txt.pl should always be the best option. Alternate filters can still be specified by manually editing the .glimpse_filters file in the archive directory.

7/23/06: Webglimpse 2.15.3 removes <SCRIPT ..> ... </SCRIPT> sections by default (if you choose the default htuml2txt.pl as the filter). Eliminates ugly hit results matching javascript code.
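The effect of this default can be approximated with a single non-greedy regular expression. This is a simplified Python sketch, not the htuml2txt.pl filter itself. A regex like this ignores edge cases that a real filter may handle differently, such as a literal "</script>" inside a JavaScript string.

```python
import re

# Non-greedy match from an opening <script ...> tag to the nearest </script>,
# across newlines (DOTALL) and regardless of case (IGNORECASE).
SCRIPT_RE = re.compile(r"<script\b[^>]*>.*?</script\s*>", re.IGNORECASE | re.DOTALL)

def strip_scripts(html_text: str) -> str:
    """Remove <SCRIPT> ... </SCRIPT> sections before the text is indexed."""
    return SCRIPT_RE.sub("", html_text)

page = '<p>Hello</p><SCRIPT type="text/javascript">\nvar hits = 0;\n</SCRIPT><p>World</p>'
print(strip_scripts(page))  # <p>Hello</p><p>World</p>
```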

7/19/06: Webglimpse 2.15.0 uses WWW::Mechanize, if it's available, to parse the links out of each page. If it is not available we fall back to our original code, but WWW::Mechanize does a better job of recognizing BASE HREF tags and generally has more modern HTML parsing code.
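To show why BASE HREF matters when extracting links, here is a Python analog using the standard html.parser and urljoin. It is not WWW::Mechanize or the Webglimpse fallback code. Relative links are resolved against the page's first <base href> when one is present, and against the page URL otherwise.

```python
from html.parser import HTMLParser
from typing import List, Optional
from urllib.parse import urljoin

class LinkCollector(HTMLParser):
    """Collect <a href> values and the first <base href>, if any."""

    def __init__(self) -> None:
        super().__init__()
        self.base: Optional[str] = None
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if not href:
            return
        if tag == "base" and self.base is None:
            self.base = href
        elif tag == "a":
            self.hrefs.append(href)

def extract_links(page_url: str, html_text: str) -> List[str]:
    """Return absolute link URLs, resolved against <base href> when present."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    # A <base href> (itself resolved against the page URL) replaces the page URL as the base.
    base = urljoin(page_url, parser.base) if parser.base else page_url
    return [urljoin(base, href) for href in parser.hrefs]

print(extract_links(
    "http://example.com/dir/page.html",
    '<base href="http://mirror.example.org/docs/"><a href="intro.html">x</a>',
))
```

Without the <base> tag, the same "intro.html" would have resolved to http://example.com/dir/intro.html. That is the kind of wrong URL a spider produces when it ignores BASE HREF.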

5/30/06: Webglimpse 2.14.9 has a fix to the NextHits toolbar (a problem was caused by the input sanitization introduced in 2.14.5)

5/30/06: Check out The Searchable Site - our article in the July issue of Linux Journal! (you must be a subscriber - we'll post the content here on August 1)

5/24/06: Webglimpse 2.14.8 has bugfixes and improvements fixing warnings and speeding up searches on large archives. Tests not required in the ranking formula are avoided. Fix to CenterOutput routine avoids warnings and also speeds up code if centering is not necessary.

5/05/06: Webglimpse 2.14.7 has small fix for handling literal '[' and ']' chars in URLs and other places. Thanks to Robert Pelcher for the report!

5/01/06: The cPanel installer now automatically adds an alias of www.[domain name] during installation. cPanel does this automatically for domains so we do too...

4/23/06: cPanel installer now uses an automatic installation script. Single command and you are done, if you are the administrator of a cPanel server. Thanks to contract programmer Julian Lishev for getting this done and making Webglimpse accessible to the world of cPanel users.

4/19/06: The Webglimpse Manual. Finally, an organized layout of all the Webglimpse documentation plus several brand new docs. Thanks to contract docwriter Edis Feldhouse who did a really heroic job (especially given what she had to work with!). The manual is released in beta and any feedback is welcome!

4/17/06: Webglimpse 2.14.6 has several fixes for jump-to-line function; also allows use of custom filters that require the original filename prior to filtering. Thanks to Dr D. P. Kreil and to alert user Steve Cochran for the fixes & reports!

4/08/06: Webglimpse 2.14.5 adds additional input sanitization to fix reported XSS vulnerability

4/08/06: cPanel installer released. 3.X FTP-only install is pulled for now due to difficulties with permissions and security issues. During beta testing we found that most users with only FTP access are either under cPanel or hSphere anyway, so using the built-in install mechanisms of those platforms will give us better usability and security.

4/03/06: Webglimpse 3.0.11b has a fix for the installer to display the path to a detailed log file in case of error. The FTP-only install has several potential permissions issues, but in some cases it's the only option users have...

4/01/06: Glimpse 4.18.5 has several compile-time fixes and a new make check target, thanks to Nelson Beebe, who not only patiently took us through several rounds of fixes but contributed binaries for 19 different platforms! Thanks, Nelson!

(binaries now available for several flavors of Linux, SunOS, FreeBSD, Darwin, IRIX, NetBSD, OSF1 and OpenBSD)

3/18/06: Not strictly a Webglimpse thing, but since I'm the primary maintainer...I've finally got a home page (Golda's). It does use the 'next generation' Abra software that we're working on...

3/6/06: Webglimpse 2.14.3 adds three hidden tags that allow you to modify the URL of hit results. This is useful for shopping carts such as SoftCart or Minivend that need to keep a session id in the URL, and for use with Google Analytics. Thanks to Tom Monroe of Infinity Imaging for this detailed Analytics & Softcart HowTo!
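The underlying operation, appending extra query parameters to each hit URL while keeping the ones already there, can be sketched in Python with urllib.parse. The actual hidden tag names are documented in the HowTo and are not reproduced here. The parameter names below ("session", "utm_source") are hypothetical examples.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def add_params(url: str, extra: dict[str, str]) -> str:
    """Append extra query parameters to a hit URL, keeping existing ones."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.extend(extra.items())
    return urlunsplit(parts._replace(query=urlencode(query)))

# Hypothetical session id for a shopping cart, and a hypothetical analytics tag.
print(add_params("http://shop.example.com/item.html?id=5", {"session": "abc123"}))
print(add_params("http://example.com/a.html", {"utm_source": "sitesearch"}))
```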

2/14/06: Webglimpse 2.14.2 now supports LimitPrefix for TREE type roots. That means you can traverse external links on only a limited portion of a starting site without hitting unwanted pages.

2/14/06: A user reports that Glimpse 4.18.2 "compiled and seems to be working find under MS services for unix" - so Windows users may be able to now use Glimpse without worrying about Cygwin. Windows Services for Unix appears to be a free download from Microsoft.

2/6/06: Webglimpse 3.00.01b is now available for beta test. Web based install wizard eliminates need for shell access (or at most a few commands will be needed to set permissions). Beta testers welcome!

2/3/06: Glimpse 4.18.2 ends backwards compatibility with varargs.h, as some systems now don't have STRICT_ANSI defined to let us know they have stdarg.h. That's ok ... stdarg.h is there on just about all *nixes since circa 1995.

9/5/05: Note added to docs - delete /tmp/xpdf* files if using xpdf to filter your PDFs to text. Also we are in development on Webglimpse 3.0, which will feature an FTP-only install.

8/09/05: Webglimpse 2.14.1 fixes a bug that prevented the administrative interface cookies from working with Internet Explorer. Also some fine-tuning to the keyword highlighting and centering code.

4/29/05: Webglimpse 2.14.0 now centers output around the keywords and makes sure to always display the matched keywords even when the matching line has to be trimmed for size.
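Here is one way to center output around a keyword, sketched in Python. The function name and the "..." markers are assumptions; this is not the Webglimpse CenterOutput routine. The window is placed around the middle of the first match and shifted back inside the line when it runs off either end. As long as the width is at least the keyword length, the match always survives trimming.

```python
import re

def center_snippet(line: str, keyword: str, width: int) -> str:
    """Trim `line` to `width` characters, centered on the first keyword match.

    Assumes width >= len(keyword), so the matched keyword is never cut.
    """
    if len(line) <= width:
        return line
    m = re.search(re.escape(keyword), line, re.IGNORECASE)
    if m is None:
        return line[:width] + "..."
    middle = (m.start() + m.end()) // 2
    start = max(0, middle - width // 2)
    end = min(len(line), start + width)
    start = max(0, end - width)  # shift left if the window ran off the end
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(line) else ""
    return prefix + line[start:end] + suffix

line = "x" * 50 + "needle" + "y" * 50
print(center_snippet(line, "needle", 20))  # ...xxxxxxxneedleyyyyyyy...
```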

3/22/05: Webglimpse 2.13.2 fixes a bug which on some systems prevented indexing of PDF files. Also adds a new optional filter script, and improves the German language output results template.

12/28/04: Webglimpse 2.13.1 has support for Dutch (Nederlandse) thanks to Rev David Morris of GentleWare Studios!

11/27/04: Webglimpse 2.13.0 now supports the option to find keywords WITHIN X WORDS of each other (wordspan). Also some minor fixes regarding highlighting keywords and elimination of redundant links when spidering.
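Here is a minimal Python sketch of a wordspan check. It assumes "within X words" means the two keywords' word positions differ by at most X. Webglimpse's exact counting rule may differ, for example by counting only the words in between, so treat this as an illustration of the idea.

```python
import re

def within_words(text: str, a: str, b: str, n: int) -> bool:
    """True if some occurrence of word `a` is within `n` word positions of word `b`."""
    words = re.findall(r"\w+", text.lower())
    pos_a = [i for i, w in enumerate(words) if w == a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == b.lower()]
    return any(abs(i - j) <= n for i in pos_a for j in pos_b)

text = "Glimpse builds a compact index of every file"
print(within_words(text, "glimpse", "index", 4))  # True: positions 0 and 4
print(within_words(text, "glimpse", "index", 3))  # False
```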

10/7/04: Webglimpse 2.12.2 has some minor fixes and tweaks dealing with special characters in searches, caching and structured queries.

8/02/04: Webglimpse 2.12.0 detects and uses LWP and HTTP modules if available. This enables us to traverse sites requiring cookies and cookie-based login.

6/10/04: Glimpse 4.18.0 has new configure script generated by autoconf 2.57 - may fix compilation problems on FreeBSD

5/25/04: Webglimpse 2.11.0 has support for Romanian and updated French text, thanks to Marian-Nicolae V. Ion!

4/27/04: Webglimpse 2.10.4 has a fix to make jump-to-line work on subsequent 'Next Hits' pages

4/16/04: wgusers mailing list is re-enabled using Mailman. (Had been taken down as spammers were abusing the list thru majordomo)

3/11/04: Webglimpse 2.10.2 has several fixes to the Customized Output module, specifically to the INCLUDE FILE feature and affecting behaviour of cached output pages.

12/12/03: Webglimpse 2.10.1 adds optional , cleans up Next Hits toolbar and offers 100% uptime

11/16/03: Webglimpse 2.8.1 is a maintenance release with several small fixes and additional tests. Recommended to install.

9/02/03: Webglimpse 2.8.0 has much cleaner, more modern results output (commercial version) using stylesheets. Also added support for Bulgarian, and several minor bugfixes.

6/19/03: Webglimpse 2.7.8 handles Russian month names correctly (thanks Adeena Ascher at the JDC!), and fixes a problem with cachefile links containing the '#' wildcard.

6/16/03: Webglimpse 2.7.7 now defaults to DD/MM/YY numeric dates for non-English languages.

5/15/03: Webglimpse 2.7.6 has support for Polish (Polsku), thanks to Wojciech Dorosz! Also some minor fixes involving queries with quote characters.

4/5/03: Webglimpse 2.7.4 can prefilter files for greater speed; plus several other major fixes and features.

2/02/03: Webglimpse 2.6.7 has improved handling of PHP files and more powerful options for Customizing results output.

12/25/02: White paper for using Agrep and MySQL for powerful full-text searches of database entries, by Kevin McGrail

11/29/02: Glimpse 4.17.2 has fixes for compiling on FreeBSD, binaries now available for MacOS and Linux.

11/22/02: PPL: Pay-per-line licensing model to be tried for next-generation search project.

11/18/02: Webglimpse 2.6.2 can search multiple archives from a single search screen. Also has an option to traverse but not index starting 'trunk' pages in a link tree.

10/01/02: Webglimpse 2.5.4 has a new option for greatly improved performance when searching for common words in large archives; an option to return full sentences instead of fragments; several minor fixes; and output templates in Estonian!

8/01/02: Webglimpse 2.5.1 allows you to highlight query words in color, or use your own custom tags. Plus, smarter and easier install for non-root users.
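Highlighting with configurable tags amounts to wrapping every case-insensitive match in an opening and closing string. This is a Python sketch with a hypothetical function name, not the Webglimpse code. In real output the text should already be HTML-escaped before the tags are inserted.

```python
import re

def highlight(text: str, words: list[str], open_tag: str = "<b>", close_tag: str = "</b>") -> str:
    """Wrap each case-insensitive occurrence of a query word in open_tag/close_tag."""
    words = [w for w in words if w]
    if not words:
        return text
    # Longest words first, so a longer query word wins over a shorter prefix of it.
    alternation = "|".join(re.escape(w) for w in sorted(words, key=len, reverse=True))
    pattern = re.compile(alternation, re.IGNORECASE)
    return pattern.sub(lambda m: open_tag + m.group(0) + close_tag, text)

print(highlight("Glimpse indexes files; glimpse searches them.",
                ["glimpse"], '<span style="color:red">', "</span>"))
```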

6/17/02: Webglimpse 2.4.6 provides several options for search interfaces, including use of ANY or ALL keywords instead of making the user create their own boolean expression by hand. Plus several minor fixes and a somewhat significant one to handling of documents with no titles.
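An ANY/ALL option can be translated into glimpse's Boolean syntax, in which ';' means AND and ',' means OR. This is a minimal Python sketch. The mode names "ANY" and "ALL" are assumptions, and keywords that themselves contain ';' or ',' would need escaping, which is not handled here.

```python
def to_glimpse_query(user_input: str, mode: str) -> str:
    """Turn a plain list of keywords into a glimpse Boolean pattern.

    glimpse/agrep syntax: ';' joins terms with AND, ',' joins them with OR.
    """
    words = user_input.split()
    if mode == "ALL":
        return ";".join(words)
    if mode == "ANY":
        return ",".join(words)
    raise ValueError(f"unknown mode: {mode!r}")

print(to_glimpse_query("pizza cheese", "ALL"))  # pizza;cheese
print(to_glimpse_query("pizza cheese", "ANY"))  # pizza,cheese
```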

5/5/02: Webglimpse 2.4.0 has the ability to search only the links on any particular page; if you want, you can add search boxes to all the pages in your site so your users can combine browsing with searching. Plus lots of other fixes and tweaks, including the ability to re-sort hits after a search, handle files with really long lines, and optionally register your archives with us!

3/21/02: Webglimpse 2.3.3 is a maintenance release with several minor bugfixes, also some more flexible rules for defining sites and a new statistics module for logging searches. Webglimpse 2.3.1 added support for multiple ranking formulas: users can sort hits by date, title, meta tags, or (also new) link popularity. Plus, new templates for French and Norwegian languages.
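Supporting several ranking formulas comes down to choosing a different sort key for the same list of hits. This is a Python sketch with a hypothetical Hit record. Its inbound_links field is a simple stand-in for link popularity; the actual Webglimpse formulas are not reproduced here.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Hit:
    title: str
    modified: date
    inbound_links: int  # hypothetical stand-in for "link popularity"

def sort_hits(hits: list[Hit], by: str) -> list[Hit]:
    """Re-sort hits by one of several illustrative ranking keys."""
    if by == "date":
        return sorted(hits, key=lambda h: h.modified, reverse=True)  # newest first
    if by == "title":
        return sorted(hits, key=lambda h: h.title.lower())
    if by == "popularity":
        return sorted(hits, key=lambda h: h.inbound_links, reverse=True)
    raise ValueError(f"unknown ranking: {by!r}")

hits = [
    Hit("Zebra notes", date(2002, 1, 5), 3),
    Hit("alpha guide", date(2002, 3, 1), 10),
    Hit("Mid report", date(2001, 12, 31), 7),
]
print([h.title for h in sort_hits(hits, "date")])
```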

11/04/01: Webglimpse 2.2.0 has a nifty command-line interface for managing your archives through a telnet session. ===> Versions 2.2.1 and higher have an important security fix for link-based archives. And 2.2.2 can auto-generate its own search form.

All 2.X versions have some cool web-based administration tools for multiple archives, the ability to combine local directories, remote sites and links into a single archive, and the beginnings of category support. See the Live demo (read-only) of the management interface.


Glimpse was originally developed by Udi Manber, Sun Wu, and Burra Gopal.

Glimpse, Webglimpse and this site now maintained by Internet Workshop, and the Webglimpse developer community.



Webglimpse Advanced Site Search Software: providing flexible local search since 1997


Copyright © Internet WorkShop, 2002. All Rights Reserved.