3/6/06: Webglimpse 2.14.3 adds three hidden tags that allow you to modify the URL of hit results. This is useful for shopping carts such as SoftCart or Minivend that need to keep a session id in the URL, and for use with Google Analytics. Thanks to Tom Monroe of Infinity Imaging for this detailed Analytics & Softcart HowTo!
The tags are prepath, postpath and insertbefore. Add these as HIDDEN tags to the search form, and the result URL will be modified like so:
original result url: http://yourserver.com/shoppingcartpath/somefile.html
example tags: <input name=prepath value="/cgi/mycart.cgi?path=">
<input name=insertbefore value="shoppingcartpath">
<input name=postpath value="&sessionid=[SESSION_ID]">
modified result url: http://yourserver.com/cgi/mycart.cgi?path=shoppingcartpath/somefile.html&sessionid=[SESSION_ID]
See the HowTo referenced above for an example that applies specifically to SoftCart and Google Analytics.
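The rewrite shown in the example above can be sketched in Python. This is only an illustration, not Webglimpse's own code: the function name `rewrite_hit_url` is hypothetical, and both the slash handling at the join and the behaviour when the marker is missing are assumptions inferred from the single example.

```python
def rewrite_hit_url(url, prepath="", insertbefore="", postpath=""):
    """Insert prepath before the first occurrence of insertbefore, then append postpath."""
    if not insertbefore:
        return url
    idx = url.find(insertbefore)
    if idx == -1:
        # Assumption: URLs that do not contain the marker are left untouched.
        return url
    head, tail = url[:idx], url[idx:]
    # Assumption: drop one slash where head and prepath meet, so the
    # result matches the "modified result url" given in the text.
    if head.endswith("/") and prepath.startswith("/"):
        head = head[:-1]
    return head + prepath + tail + postpath


original = "http://yourserver.com/shoppingcartpath/somefile.html"
modified = rewrite_hit_url(
    original,
    prepath="/cgi/mycart.cgi?path=",
    insertbefore="shoppingcartpath",
    postpath="&sessionid=[SESSION_ID]",
)
print(modified)
```

With the tag values from the example, the printed URL should match the modified result url listed above.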
2/14/06: Webglimpse 2.14.2 now supports LimitPrefix for TREE type roots. That means you can traverse external links on only a limited portion of a starting site without hitting unwanted pages.
2/14/06: A user reports that Glimpse 4.18.2 "compiled and seems to be working find under MS services for unix" - so Windows users may be able to now use Glimpse without worrying about Cygwin. Windows Services for Unix appears to be a free download from Microsoft.
2/6/06: Webglimpse 3.00.01b is now available for beta test. Web based install wizard eliminates need for shell access (or at most a few commands will be needed to set permissions). Beta testers welcome!
2/3/06: Glimpse 4.18.2 ends backwards compatibility with varargs.h, as some systems now don't have STRICT_ANSI defined to let us know they have stdarg.h. That's ok ... stdarg.h is there on just about all *nixes since circa 1995.
9/5/05: Note added to docs - delete /tmp/xpdf* files if using xpdf to filter your PDFs to text. Also we are in development on Webglimpse 3.0, which will feature an FTP-only install.
8/09/05: Webglimpse 2.14.1 fixes a bug that prevented the administrative interface cookies from working with Internet Explorer. Also some fine-tuning to the keyword highlighting & centering code. No longer highlights partial matches when user was searching for whole word. Also fix to remove some leftover line numbers that were printing in multi-line titles.
4/29/05:Webglimpse 2.14.0 now centers output around the keywords and makes sure to always display the matched keywords even when the matching line has to be trimmed for size. Also does not highlight keywords separately when EXACT MATCH selected and user enters a phrase. NOTE: maxchars setting now refers to max chars per line, rather than per file. So, some admins may want to tweak settings to reduce number of lines displayed per file if you had been relying on the maxchars setting to limit the total output.
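The idea of centering a trimmed line around the keyword might look roughly like the following Python sketch. It is not the Webglimpse implementation; the function name `center_on_keyword` is hypothetical, and it assumes a single keyword that is shorter than maxchars.

```python
import re


def center_on_keyword(line, keyword, maxchars):
    """Trim line to at most maxchars characters, keeping the first keyword match in view."""
    if len(line) <= maxchars:
        return line
    match = re.search(re.escape(keyword), line, flags=re.IGNORECASE)
    if match is None:
        # No match on this line: fall back to the leading maxchars characters.
        return line[:maxchars]
    middle = (match.start() + match.end()) // 2
    # Center the window on the match, then clamp it to the line boundaries.
    start = max(0, middle - maxchars // 2)
    start = min(start, len(line) - maxchars)
    return line[start:start + maxchars]


long_line = "a" * 100 + "needle" + "b" * 100
snippet = center_on_keyword(long_line, "needle", 40)
print(snippet)
```

Because the limit applies to each line, the total output per file is bounded by maxchars times the number of lines displayed, which is why the note above suggests also limiting lines per file.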
3/22/05: Webglimpse 2.13.2 fixes a bug which on some systems prevented indexing of PDF files. Also adds a new filter script, htfilter.pl, which can be used in place of htuml2txt.pl and can be modified to remove sections of files prior to indexing. This is useful if you have a fixed header and footer which you do not want to be part of the index.
12/28/04: Webglimpse 2.13.1 has support for Dutch (Nederlands) thanks to Rev David Morris of GentleWare Studios. Also some fixes to normalization of 'A/..' type links and added some code to later support recognizing keywords in the 'link text'. (That is, we save the text that is used to link to each page; later we can use this to help rank searches or even add these keywords into the page for searching on)
11/27/04: Webglimpse 2.13.0 now supports the option to find keywords WITHIN X WORDS of each other (wordspan). Also some minor fixes regarding highlighting keywords and elimination of redundant links when spidering.
To use the wordspan feature, install the latest version and view the sample input forms. Note that when searching for multiple terms within X words of each other, the terms are also required to appear in the same record rather than anywhere in the file.
10/7/04: Webglimpse 2.12.2 has two feature tweaks:
- Special characters (e.g. *,;~!) in the query string are automatically escaped on searches using the AUTOSYNTAX drop down. That is, if a user is searching on the simple form wgsimple.html specifying 'ALL the words' or 'ANY of the words' then we assume they don't really want to use regular expressions, and just escape regexp chars for them. You can still use regular expression characters in the advanced search page (wgindex.html), or escape them explicitly as you like.
- For structured queries, we now print the first [maxchars] text from the file on a hit, rather than printing the somewhat ugly SOIF format
filename{len}: value
Note that you still have to have the SOIF format in the indexed file, normally this is produced by a customized prefilter. A version of htuml2txt.pl that saves all Metatags in SOIF format is available, for instance.
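The automatic escaping described in the first tweak can be illustrated with a small Python sketch. The names `SPECIAL_CHARS` and `prepare_query` are hypothetical, and the character set is an assumption: it contains only the five characters named above plus the backslash, while the real list used by Webglimpse may be longer.

```python
# Assumption: only the characters named in the text, plus backslash itself.
SPECIAL_CHARS = set("*,;~!\\")


def escape_query(query):
    """Prefix each special character with a backslash so it is matched literally."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in query)


def prepare_query(query, autosyntax):
    """Escape only for the simple ALL/ANY forms; leave advanced queries as typed."""
    return escape_query(query) if autosyntax else query


print(prepare_query("cats;dogs*", autosyntax=True))
print(prepare_query("cats;dogs*", autosyntax=False))
```

The second call leaves the query untouched, mirroring the advanced search page where regular expression characters keep their meaning.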
9/13/04: Webglimpse 2.12.1 is a maintenance release, fixing a problem where similar searches could erroneously share a single cache file, resulting in inaccurate search results. Cache filenames are now unique for each combination of search options.
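One way to get a cache filename that is unique per combination of search options is to hash a canonical form of the options. The Python sketch below shows the general technique only; the function name, the `wgcache_` prefix, and the option keys are all hypothetical and are not taken from ResultCache.pm.

```python
import hashlib
import json


def cache_filename(options, cache_dir="/tmp"):
    """Derive a cache path from every search option, independent of key order."""
    # sort_keys gives the same serialization for the same set of options.
    canonical = json.dumps(options, sort_keys=True)
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"{cache_dir}/wgcache_{digest[:32]}"


first = cache_filename({"query": "glimpse", "case": "off", "maxfiles": 50})
same = cache_filename({"maxfiles": 50, "query": "glimpse", "case": "off"})
other = cache_filename({"query": "glimpse", "case": "on", "maxfiles": 50})
print(first)
print(other)
```

Two searches share a cache file only when every option matches, which is the property the fix above is after.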
8/02/04: Webglimpse 2.12.0 has several new features and minor fixes:
- Cookie support - we detect LWP and HTTP, if both are present then use those in place of old httpget.c. This lets us support cookie-based login. You can set the login credentials in the 'Configure Remote Domain' screen in the web-ministration interface, or edit wgsites.conf directly.
- Cleaner bolding of keywords. mfs.cgi is responsible for delivering the processed version of a web page when the user clicks on a jump-to-line link. Now it is more careful never to insert the bold tags in the middle of an existing tag, even if the tag spans multiple lines.
- Wildcard filter *.* is now allowed as a glob pattern in .glimpse_filters, so you can filter _everything_ regardless of extension. With previous versions each extension had to be specified individually.
- Fixed a few bugs - makenh trying to call wgLog in non-commercial version, and incorrect path to perl being put in htuml2txt.pl on some systems.
- Indexing tweaks - skip .png image files, and eliminate some common forms of redundant links.
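The "never insert bold tags inside an existing tag" behaviour described for mfs.cgi can be sketched in Python by splitting the page into tags and text and highlighting only the text. This is an illustration of the technique, not the mfs.cgi code; the function name is hypothetical, and the sketch assumes tags contain no literal '>' inside attribute values and ignores comments and scripts.

```python
import re

# A tag is '<', anything up to the next '>', then '>'. Newlines are allowed
# inside, so tags that span several lines are treated as one unit.
TAG = re.compile(r"(<[^>]*>)")


def bold_keyword(html, keyword):
    """Wrap keyword matches in <b>...</b>, but only in text outside of tags."""
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    parts = TAG.split(html)
    # With one capturing group, split() puts text at even indexes, tags at odd ones.
    for i in range(0, len(parts), 2):
        parts[i] = pattern.sub(lambda m: "<b>" + m.group(0) + "</b>", parts[i])
    return "".join(parts)


page = '<a href="glimpse.html"\n   title="glimpse">About Glimpse</a>'
print(bold_keyword(page, "glimpse"))
```

Only the visible link text is wrapped; the occurrences inside the multi-line anchor tag are left alone.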
6/10/04: Glimpse 4.18.0 finally uses a standard autoconf-generated configure script. Hopefully this will fix compilation problems some folks ran into on FreeBSD and other platforms (esp those without dlopen support).
- NOTE - if you were indexing files with spaces in the names and had run ./configure --file-end-mark='\t', the new syntax is
./configure --with-file-end-mark='\t'
That way we conform to standard configure script conventions. If you were not already doing this, be sure to read the spaces in filenames HowTo before making this change.
5/25/04: Webglimpse 2.11.0 has support for Romanian and updated French text, thanks to Marian-Nicolae V. Ion!
4/27/04: Webglimpse 2.10.4 has a fix to make jump-to-line work on subsequent 'Next Hits' pages - thanks to Rhonda Ames at the Ohio state library for the catch!
3/22/04: Webglimpse 2.10.3 has a few minor fixes/tweaks:
- CustomOutputTool.pm now uses the wgoutput.cfg from the first archive and not the generic template when an alternate archive is used for 100% uptime.
- ResultCache.pm is more careful not to trigger Perl taint warnings on opening the cache file
- .wg_err and .wg_log files now include timestamps
3/11/04: Webglimpse 2.10.2 is primarily a maintenance release, fixing the following:
- the [INCLUDE filename] variable in wgoutput.cfg was erroneously being filtered out before applying - so INCLUDEs were broken in 2.10.1. This is now fixed.
- the [end_files] variable in wgoutput.cfg was being output twice on live searches and once on cached searches; it is now output once on all searches. If you had adjusted your code to expect it to be output twice, you will need to readjust for the (correct) single output.
- Also, foundation is laid for custom filtering - the wgFilter.pm module is now included in the distribution and can be called upon by your own replacement for htuml2txt.pl if you desire. Makes it easier to eliminate repeated headers, footers and other repetitive code chunks from the search. Ask us for sample scripts if you need them.
12/12/03: Webglimpse 2.10.1 adds several significant new features:
- Optional from SearchFeed.com - generate revenue from portal sites with relevant keyword-driven .
- Next Hits toolbar now split into groups, thanks to Peter Moore for making this suggestion AND sponsoring the work! For large numbers of hits this cleans up the appearance of the results page greatly.
- 100% uptime while indexing - thanks again to Peter Moore for both the idea and the sponsorship! You can now keep your search interface available all the time; simply create two identical archives, put them at different times in the crontab, and turn on the alternate hidden tag in the search form (see source of wgany.html in the archive directory).
11/16/03: Webglimpse 2.8.1 is a maintenance release with fixes to: the paging in the nexthits toolbar (had been broken in some instances); better handling of prefiltering files with tags spanning multiple lines; more error-checking regarding glimpse version; and a fix for a dumb hardcoded ID in the newquery box.
9/02/03: Webglimpse 2.8.0 adds support and templates for Bulgarian, using charset windows-1251, fixes problems with retrieving and caching files with spaces or shell escape chars in the filenames. Also modernized the default results output to be much cleaner using stylesheets. And, CVS has been moved to a new server.
Only the newer language translations have been updated to new look - thanks to Nick Stallman for updates to Italian. Other languages will be updated as translations come in...
6/19/03: Webglimpse 2.7.8 handles Russian month names correctly (thanks Adeena Ascher at the JDC!), and fixes a problem with cachefile links containing the '#' wildcard.
6/16/03: Webglimpse 2.7.7 now defaults to use DD/MM/YY numeric date format for non-English languages. Also make nicer-looking titles from links for PDF and other files that do not contain <TITLE> tag. (Avoid using hex char codes)
5/15/03: Webglimpse 2.7.6 has support for Polish (Polsku), thanks to Wojciech Dorosz! Also some minor fixes involving queries with quote characters, and correction to a file counting bug that could have caused some SITE type archives to index less than the set maximum number of pages.
4/5/03: Webglimpse 2.7.4 is out - fixes and features include
- major new option: Prefiltering of PDF, HTML and any other files for which we have a filter. This can cut search times by a factor of 4 or more in some cases. Please see Prefiltering for Speed for a fuller explanation and instructions for use.
- added Italian language templates, thanks to William Maddler!
- important bugfix for TREE and SITE type archives. Sites with many duplicate links in some cases would fail to index some non-duplicate links as well; this is now fixed.
- added support for entering Login information for remote sites, so that sites requiring username & password for access can be indexed. Note, it is up to the administrator to also password-protect the search interface, if desired.
- more control over ranking of hits, admin can now use $LinkString variable to specifically make some pages or directories come up first (Commercial version only)
- fix for traversing client-side imagemaps, thanks to Chris Poirier of tapestry-os.org for pointing this out!
2/02/03: Webglimpse 2.6.7 has several new fixes and features:
- Custom Output can now [INCLUDE file] in header and footer
- Handle .php files as in directory-type indexes
- Customizable entries for [LINK] and [DATE]. See comments in wgoutput.cfg and .wgoutputfields for more details.
- Correction for gathering remote links with ' -type quotes and embedded quotes
- Speedup for wgarcmin interface - avoid loading passwd entries! (Was being done to process UserDir settings, but not necessary in all cases)
- More robust install on systems that can't compile - install will recognize binaries of httpget and html2txt (you can get these for Linux from the download page)
- Recognize php links of form .php?stuff=value&...
- Fix for handling "#ABC" type links, thanks to Renfrew Kwong of CS System of the Hong Kong University of Science and Technology
12/25/02: Agrep and MySQL - a white paper by Kevin McGrail
11/29/02: Several contributed Glimpse fixes and files, thanks to all who helped!
- MacOS binaries for glimpse 4.17.1 from Fr. Vince Bork (see download page for link)
- Some notes on compiling for MacOS from ln at madsci.org
- Linux binaries for glimpse 4.17.2 from yours truly
- Fixes in 4.17.2 should let it compile on FreeBSD, additional notes here. Thanks to Clemens Fischer who runs a Wiki at http://wiki.haribeau.de/
- darwinports binary may be available upon request (was uploaded, I lost it!) Thanks to Larry Sica of http://www.opendarwin.org/projects/darwinports
- Notify user correctly that Bestmatch mode is not working with Linenumber output. Thanks to Kevin McGrail of peregrinehw.com for pointing this out. Note, Kevin has also changed his version of agrep to allow Bestmatch to run along with Linenum output, he says they work together fine! Have not changed the distributed version yet pending more testing.
11/22/02: Check out some ideas for a new cooperative development project where the developers get paid for their work, based on sales of the final product.
11/18/02: Webglimpse 2.6.2 can now search multiple archives at once! Also new option to traverse but not index the 'trunk' pages of a tree.
- To search multiple archives, just put the path to webglimpse.cgi into your browser, something like:
http://yourdomain.com/cgi-bin/wg2/webglimpse.cgi
It should automatically generate a form with checkboxes for all archives on your server. You can save the source and edit this form further, remove any archives you don't want searched, etc. You can also use the wgany.html template that is copied to each archive directory. When the search is performed, configuration files from the first checked archive are used to format the results.
- The 'don't-index-trunk' option is helpful if you have a page whose only purpose is to link to all the various pages you want indexed. Usually in this case you won't want the start page indexed. Now, when you create a new TREE type archive in the wgarcmin.cgi interface, you can just uncheck the box that reads "Index starting URL?"
Or, edit archive.cfg by hand and set
IndexTrunk 0
in the archive you want affected. It will only have an effect on TREE type archives.
10/10/02: Glimpse 4.17.1 has an important fix that will prevent missing hits. If you are running 4.16.2 thru 4.17.0, you should upgrade now. Those versions contain an error that could cause hits to be missed in certain cases; for reliable searches, upgrade to 4.17.1.
10/01/02: Webglimpse 2.5.4 and Glimpse 4.17.0 have some exciting stuff:
- Webglimpse advanced search form has a new option to limit the number of hits returned; limiting it to the 50 most recent files can speed up searches 30 times or more for common words in a large archive! On our test archive, one search went from 31 seconds to 1 second by adding this limit. It's true the user won't see all the possible hits, but very few users really look thru all the 1000+ hits that can be returned on common words. We're also working on doing background searches with caching for the cases where all the hits are wanted.
- Option to return full sentences in the search results page - especially useful when the raw files have line returns in odd places, or no returns at all.
- Thanks to Ragnar Kurm, we have Estonian (eesti keeles) templates for search forms and output results now built in.
- Several minor fixes, particularly to the "new query" box - search options are now preserved correctly.
- And the big news...Glimpse 4.17.0 should compile and run under Windows if you get the latest Cygwin compiler from RedHat (free, but we can't distribute it or the binaries). Please let us know what mileage you get if you try this!
8/01/02: Webglimpse 2.5.1 has several significant improvements
- wgoutput.cfg has a new set of variables: begin_highlight and end_highlight let you control the highlighting of matched query terms.
- wginstall.pl is much easier now when installing as a non-root user. It picks smarter defaults and tests before prompting, so you don't get unnecessary permission errors.
- several small security improvements to the interface - avoid printing hard drive paths in the search URL
7/08/02: Webglimpse 2.4.8 includes new template file "newquery.html" for customizing the query box at the end of the results page (make it blank to eliminate the box entirely). Requires commercial version. Upgrade to 2.4.8, press 'Save Changes' on the manage archive screen, then edit 'newquery.html' as desired (will be copied to your archive directory). Also new function FindSentences allows the '.' char to be used for line breaks in the results instead of hard returns.
6/17/02: Webglimpse 2.4.6 provides several options for search interfaces, including use of ANY or ALL keywords instead of making the user create their own boolean expression by hand. Plus several minor fixes and a somewhat significant one to handling of documents with no titles.
6/17/02: Glimpse 4.16.2 finally fixes the intermittent segfault on very large indexes (multiple-Gb). Just a buffer overrun, folks... See ChangeLog.txt for details.
5/16/02: Webglimpse 2.4.3 cleans up the Next Hits toolbar
5/7/02: Webglimpse 2.4.1 has minor install tweak and correction to sendmail command line for optional archive registration.
5/5/02: Webglimpse 2.4.0 adds several new (and old) features, plus lots of minor fixes & cleanup:
- Neighborhoods are back! For old-time webglimpse users, you may remember the feature to search just the links from a particular page, plus the automatic adding of search boxes to every page of a site. This was one of the original 'big features' of Webglimpse, the ability to combine searching with browsing.
Now with 2.4.0, just install, go to the "Manage Archive" screen, click on "Add search boxes to pages", press "Save Changes", and reindex - by running the wgreindex script as an appropriately privileged user.
- Output format cleaner, faster, and more powerful. First off, we output lots of little tables instead of one big one, so that the results appear faster. Default to 1000 chars per file instead of 10000, so files without returns in them don't crowd up the results screen. Provide ranking options in the "new query" box so the user can re-sort the results as desired. Removed some of the hard-coded output messages and moved them into the configurable variables in wgoutput.cfg.
- Management interface tweaks: sort by ID, clean up 'Manage All Archives' screen to save space and make buttons accessible, fix defaults for Domain Configuration screen. Add button to visit URL's indexed.
- Several minor ranking tweaks: count hits case-insensitively for scoring purposes, give more weight to the title in the default formula
- Added option to register archives, not much behind it yet - more about this in the next release.
3/21/02: Webglimpse 2.3.3 fixes a problem with the install when CHOWN failed; now we tell the user how to do it themselves and keep going instead of bailing out altogether.
3/17/02: Webglimpse 2.3.2 is mainly a maintenance release, a few new features:
- Indexing by Site had been controlled by matching the prefix - that is, all pages starting with http://www.somedomain.com/ would be gathered. Now, you can check a box to match as a regexp instead of a fixed prefix string. This lets you define much more flexibly what a "site" is.
- Number of files indexed now shows up in the admin interface overview
- Nicer cleanup if Ctrl-C is hit during link gathering
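The difference between the fixed-prefix test and the regexp option for defining a "site" can be shown with a short Python sketch. The function name `in_site` is hypothetical, and whether Webglimpse anchors the regexp at the start of the URL is not stated here, so the example pattern anchors itself explicitly with `^`.

```python
import re


def in_site(url, pattern, use_regexp=False):
    """Decide whether url belongs to the site, by fixed prefix or by regexp."""
    if use_regexp:
        return re.search(pattern, url) is not None
    return url.startswith(pattern)


prefix = "http://www.somedomain.com/"
regexp = r"^https?://(www|docs)\.somedomain\.com/"

print(in_site("http://www.somedomain.com/a.html", prefix))
print(in_site("http://docs.somedomain.com/a.html", prefix))
print(in_site("http://docs.somedomain.com/a.html", regexp, use_regexp=True))
```

The prefix form can describe only one host and path, while the regexp form can take in several hosts or protocols in one rule.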
2/12/02: Webglimpse 2.3.1 has lots of new goodies:
- makenh now counts links to each page in a site, allowing us to include link popularity as a ranking variable
- RankHits.pm now supports multiple named ranking formulas, so you can give users more than one way to sort results. This is really nice for searches that produce pages of hits - users can re-sort them by date, by title matches, link pop, or other custom formulas you want to provide them. The default formula names are currently hardcoded into the search form template...if a lot of people are coming up with their own custom ranking formulas please let me know so I can make this nicer.
- webglimpse.cgi now has a much faster and cleaner way of parsing the raw glimpse output, thanks to Derek Pomery for the suggestion! The monster regexp is no more...
- French templates for the form and output page are finally incorporated, thanks to Cortina Jone for contributing these (back in August!)
- Norwegian templates too, thanks to Arnfinn Anda
- Several minor fixes, url cleanup, better redirect handling of relative urls, etc - thanks to all who submitted bug reports & fixes!
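The notion of multiple named ranking formulas can be sketched in Python as a table of scoring functions. Everything here is hypothetical: the formula names, the hit fields (`mtime`, `title_matches`, `body_matches`, `link_count`), and the weights are invented for illustration and are not the formulas shipped in RankHits.pm.

```python
# Each formula maps a hit (a dict of per-page counts) to a numeric score.
RANKING_FORMULAS = {
    "by_date": lambda hit: hit["mtime"],
    "by_title": lambda hit: hit["title_matches"],
    "by_linkpop": lambda hit: hit["link_count"],
    # Assumed weights: title matches count five times as much as body matches.
    "default": lambda hit: hit["body_matches"] + 5 * hit["title_matches"] + hit["link_count"],
}


def rank_hits(hits, formula="default"):
    """Return hits sorted from highest to lowest score under the named formula."""
    return sorted(hits, key=RANKING_FORMULAS[formula], reverse=True)


hits = [
    {"url": "a.html", "mtime": 100, "title_matches": 0, "body_matches": 1, "link_count": 9},
    {"url": "b.html", "mtime": 200, "title_matches": 2, "body_matches": 1, "link_count": 1},
]
for name in RANKING_FORMULAS:
    print(name, [hit["url"] for hit in rank_hits(hits, name)])
```

Letting the user pick the formula by name is what allows the same result set to be re-sorted without running the search again.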
12/05/01: Important security fix in Webglimpse 2.2.1 and 1.7.12, thanks to Gerry Magennis for finding it!
- URL's containing control characters are now correctly escaped and quoted when retrieved through a shell command.
No exploits of this problem are known, but any sites that index by gathering links are advised to upgrade to either 1.7.12 or 2.2.1 or higher versions immediately. Because it is an important fix, it was rolled into both 1.X and 2.X sources.
- URL's containing commas are also handled correctly starting in 2.2.2, thanks Gerry!
- New in WG 2.2.2 - if you run webglimpse.cgi with no parameters, it now prints out a nice search form. If you only have one archive, it prints the search form for it; if you have more than one, it prints a form that lets you search any available archive by picking it out of a drop-down list. See the demo area for a live example.
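The class of fix described in the security note above, escaping and quoting a gathered URL before it reaches a shell command, can be sketched in Python with the standard library. This is not the Webglimpse patch itself; the function name is hypothetical, and the way `httpget` takes its argument is an assumption made for the example.

```python
import shlex
from urllib.parse import quote


def safe_fetch_command(url, fetcher="httpget"):
    """Build a shell command line that fetches url without letting it inject commands."""
    # Percent-encode control characters, spaces, and quotes; keep URL delimiters
    # and '%' so that already-encoded sequences are not encoded twice.
    encoded = quote(url, safe=":/?&=#%,;@+$~")
    # Quote the whole argument so the shell treats it as one literal word.
    return f"{fetcher} {shlex.quote(encoded)}"


hostile = "http://example.com/a b;rm -rf /\n"
command = safe_fetch_command(hostile)
print(command)
```

Where possible, passing the arguments as a list to a process-spawning call and skipping the shell entirely avoids the quoting problem altogether.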
11/04/01: New with Webglimpse 2.2.0, wgcmd manages and tests your archives from the command line!
- List your archives and view all last-built dates at a glance
- See the configuration details for any archive by hitting one key
- Test searches right from the management interface
- Add documents, build archives manually, delete or create new
- Get detailed help with exact directory paths for other operations
It's pretty cool...try it out and let me know what you think!
10/15/01: Thanks to Nelson Beebe for all the binaries! Download 4.15 for your OS at ftp://webglimpse.net/pub/glimpse/
9/13/01: New shell script for processing PDF files contributed by Tong Sun
- much lower overhead than the previous perl version
- support for gzipped pdf files
- better temp file handling.
- See howto_xpdf.html and howto_pdf.html for details.
8/30/01: Webglimpse 2.1.06 has support for Finnish (suomi), thanks to Marko Asplund!
8/22/01: Webglimpse 2.1.05 trial and licensed test release
- Support for arbitrary protocol (such as https or ftp) in local directory->URL conversions. We still don't do https retrievals, but you can index your local secure server and have the URL's reported correctly.
- Fixes to domain configuration interface - it is now possible to cancel adding a domain, and change the hostname, protocol or port without deleting the domain altogether. Thanks to Seth Chaiklin for detailed bug reports!
- Updated Hebrew output config file, corrected highlighting in non-English languages, and fix for missing META CHARSET tag - thanks to Adeena Ascher for thorough testing!
- The "New Query" text and button at the bottom of the results page is now customizable
Note, this version has only been lightly tested so far, so be prepared to test carefully if you install in the next few days. Please report any bugs you find!
8/20/01: Glimpse compiles on FreeBSD
- Glimpse 4.15 has a fix to the configure script and Makefiles to avoid errors on systems without the dl library (for dynamic loading of shared libs). Dynamic loading is not necessary for glimpse to work, it is just used to speed up filtering.
7/28/01: Portuguese support added
- Webglimpse 2.1.04 trial and Commercial/contributors versions have Portuguese support, thanks to Kepler Oliveira of Brazil for the translations!
7/23/01: Expiring Demo doesn't require registration
- Webglimpse 2.1.03 demo version is self-expiring, so you don't need to remember a username/password to download it anymore. Yeah, open source expiration can always be hacked - but if you are willing to do that much work, just do it on the project and you'll get the commercial version free!
2.1.03 has some other fixes to the install that should solve 'missing library' problems. If you had problems with earlier 2.X's, please try this one, it should be much smoother. If you still have problems, please e-mail me! (use contact form)
7/08/01: Language support built-in!
- Glimpse 4.14 defaults to multi-language support. Also minor cleanup in makefiles, option for spaces in filenames added to configure script.
- WG 2.1.01 licensed and WG 2.1.01 EDU/trial have built-in support for Spanish, Hebrew and German. Thanks to Victor Gonzales, Adeena Ascher, and Peter Zeltmann for translations, and to Adeena and Lerner Communications for information on needed environment variable settings. Other languages are now easy to add, if interested in translating please email me - gvelez AT webglimpse.net
- The 2.1.01 versions also have "smart" docs in the web-ministration interface, they will tell you exactly which files to edit for each archive to do the few tasks that have to be done by hand.
5/20/01: Glimpse 4.13.2b has a new configure script that works much better, thanks to Sang-yong Suh.
- You can now run
sh configure --enable-structured-queries --enable-iso-charset
This has been tested on:
Linux-2.4.2 gcc-2.96 RedHat-7.1
Linux-2.2.16 egcs-2.91.66 RedHat-6.2
Linux-2.2.12 gcc-2.8.1
Solaris-2.5.1 gcc-2.7.2.1
SunOS-4.1.3 gcc-2.8.0
AIX-3.4 cc
Please let us know if you have successes/problems on other systems!
ChangeLog
5/15/01: WG 2.0.11 licensed and WG 2.0.11 EDU/trial have several minor fixes:
- makenh had been looking for libraries only in /usr/local/wg2, now it correctly looks wherever you actually installed them!
- wginstall.pl was always telling you to run wgarcmin from 'localhost', now it uses the actual ServerName setting.
- Site-type roots had been limiting links only by domain name; now they default to limiting to links within the same subdirectory as the starting page, but can be edited by the user through the web interface.
5/12/01: Webglimpse 2.0.10 can update wusage stats thru the web interface, adds .cgi extensions to scripts
- WG 2.0.10 licensed and WG 2.0.10 EDU/trial versions contain a small fix from Tom Boutell - unset GATEWAY_INTERFACE environment variable so we can run wusage from a cgi script and automatically update the stats. Also, webglimpse and wgarcmin are renamed to end with .cgi to be compatible with some webserver configurations that require that extension. So, if you upgrade over an existing installation with live archives, you may want to press "Save Changes" in your Manage Archive screen and then copy over the search form again to your web pages. New archives should not have any problems.
ChangeLog
5/08/01: Webglimpse 2.0.08 logs searches to file in common log format, compatible with Wusage software from boutell.com.
- WG 2.0.08 licensed and WG 2.0.08 EDU/trial versions include a new module, wgLog.pm, that generates common log style entries for keyword searches. Also several fixes to wginstall.pl that should avoid headaches in some situations (like glimpse not found in a usual place). Experimental code to run wusage from the web interface, may not work on all systems but it should work fine when run by hand or from cron.
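For readers unfamiliar with common log format, the Python sketch below builds one such entry for a keyword search. It shows the standard field layout only; the exact fields wgLog.pm writes are not documented here, and the function name, the `query` parameter name, and the script path are assumptions.

```python
from datetime import datetime, timezone
from urllib.parse import quote_plus


def clf_entry(host, query, when, status=200, size=0,
              script="/cgi-bin/wg2/webglimpse.cgi"):
    """Format one search as a common log format line: host ident user [time] "request" status bytes."""
    timestamp = when.strftime("%d/%b/%Y:%H:%M:%S %z")
    request = f"GET {script}?query={quote_plus(query)} HTTP/1.0"
    return f'{host} - - [{timestamp}] "{request}" {status} {size}'


when = datetime(2001, 5, 8, 12, 0, 0, tzinfo=timezone.utc)
entry = clf_entry("192.0.2.1", "glimpse index", when)
print(entry)
```

Because the line has the same shape as a web server access log entry, log analysis tools that read that format can report on search keywords without special support.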
3/28/01: Webglimpse 2.0.07 uses saved files when possible to avoid unnecessary URL retrievals
3/08/01: Webglimpse 2.0.06 released
- WG 2.0.06 licensed and the WG 2.0.06 EDU/trial have a very small but important fix - the -o switch has been added back to the glimpseindex command line! If you prefer, just edit your wgreindex files and add the -o switch to the command line yourself, instead of downloading the new version. There are several other minor changes, but the only other significant one is that builds are now done in the background so the browser won't time out.
1/29/01: Webglimpse 2.0.05 released
- WG 2.0.05 licensed and the WG 2.0.05 EDU/trial are available for download. Corrected problem with some files looking for old config.pl, added some docs explaining the different types of roots, fixed bugs with adding & deleting roots with blank URLs, other minor fixes. Please try it out & send me more bug reports!
1/23/01: wgusers and wgdev mailing lists archived and searchable using 2.X
- wgusers list archive dates back to 10/99
- wgdev list archive dates back to 3/99
1/17/01: New ResultCache.pm module for 2.X
- Webglimpse 2.04 has a new ResultCache module with two main improvements: a toolbar that lets the user jump to an arbitrary group of hits (1-10, 11-20, etc) instead of just "next N hits"; and it expires old cache files so they don't fill up the /tmp directory for people who don't automatically delete from there. Also it generates the cache file name by the glimpse command line, so if another user does the same exact search before the cache expires, it will use the cachefile and be very fast. For heavily used archives with common search terms this might be a big performance help.
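The two ResultCache ideas above, grouping hits for a jump toolbar and expiring old cache files, can be sketched in Python as follows. The function names, the group size of 10, and the age-based expiry rule are assumptions for illustration; the module's actual logic is not shown on this page.

```python
def hit_groups(total_hits, group_size=10):
    """Return (first, last) hit numbers for each toolbar group: 1-10, 11-20, ..."""
    return [
        (start, min(start + group_size - 1, total_hits))
        for start in range(1, total_hits + 1, group_size)
    ]


def is_expired(file_mtime, now, max_age_seconds):
    """A cache file is stale once it is older than max_age_seconds."""
    return now - file_mtime > max_age_seconds


groups = hit_groups(25)
print(groups)
print(is_expired(file_mtime=1000, now=5000, max_age_seconds=3600))
```

A cached result set can serve any group from the toolbar, so jumping to hits 21-25 does not require re-running the glimpse search while the cache file is still fresh.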
12/26/00: Check this out! Major new release of Webglimpse.
- Webglimpse 2.0.03 has a web interface for managing all your archives, combining indexes of directories & linktrees, a lot more flexibility for configuring host-related parameters, lots more.
- Read-only demo lets you preview the interface on our server. Login as "guest", password "guest".
- The source is all available through cvsweb if you just want to browse.
Please send in bug reports, this is still definitely in beta-test mode!
11/22/00: Speeded-up searches of docs without linebreaks
- Webglimpse 1.7.11: We found that for some documents with very long records, a very large % of the search time was spent parsing each "hit" returned by glimpse. However, often the whole record would not be used anyway, because webglimpse accepts a "maxchars" variable limiting the amount of text that is printed. By chopping the returned lines down to maxchars before parsing, searches on these kinds of files speeded up by more than a factor of 10 in some cases. Note, an entire query could have been slowed down even if only 1 or 2 such ultra-long records were returned. Usually this happens if some large files do not have linebreaks, which glimpse uses as the default record delimiter.
08/15/00: New release versions, GNU-style ChangeLogs
- Glimpse 4.13.1: Contains a few minor fixes, including one for an oversight that may have disabled filters on some sites with 4.13.0. See the Glimpse ChangeLog for details.
- Webglimpse 1.7.9: Most fixes were actually done to RankHits.pm, which is not included in the edu release. Contributing developers may have access to the commercial version, if you don't just e-mail me - (use contact form). The Webglimpse ChangeLog is also in GNU format now.
05/24/00: Docs, libraries & maintenance releases
- Glimpse 4.13.0 contains experimental support for shared libraries added by Christian Vogler. Tested on linux, should work on OS's with dlopen().
- Webglimpse 1.7.8 is now released. Mostly minor fixes, see CHANGES-1.7.8 for details. Most important fixes probably were to Aliases hash in siteconf.pl and to handle paths containing './' sequences.
- RankHits module now available for beta testing. Note, this will be part of the commercial release, but edu/gov/nonprofit users who test it before 7/1/00 can keep a free copy.
- Faster lex-based html2uml translator contributed by Christian Vogler. Note, to use this you need glimpse 4.13.0 or higher, with dynamic lib support compiled in.
- Better instructions for indexing upper-ascii characters contributed by Reuven Lerner.
12/13/99: Site and developer tool updates
- webglimpse.net contains an index of all glimpse-related publications in PDF format
- Bugzilla is now installed to track all bugs and feature requests.
- The CVS Source Tree for both Glimpse and Webglimpse is now online (read-only). Remote read/write access can also be arranged for developers.
11/17/99: Webglimpse 1.7.7 has a more robust version of httpget, Vineel fixed it not to hang on certain remote sites. Also several fixes to reduce warning messages and output when run from cron. See CHANGES-1.7.7 for full cvs details.
10/25/99: Glimpse 4.12.6 fixes several bugs, thanks to Morey, including the sgrep.c fix below. See the CHANGES.glimpse file for brief summary. Improved configure script also included, contributed by Michael Heironimus - see Michael's notes at http://www.iit.edu/~heirmic/devel/glimpse.shtml.
10/4/99: New version of sgrep.c fixes a bug with the record count when using a user-defined delimiter. Thanks to Morey Hubin for this tricky fix! Please test if you have a chance, this change will be incorporated into the next Glimpse release. See bug119.txt for more info.
9/15/99: Webglimpse 1.7.6 contains
- Several security fixes from Christian Vogler - we were not checking for \'s in all strings, and the InputSyntax commercial module was interfering with the quote-escapes. This could have been a major security hole, so it is highly recommended to install 1.7.6.
- HPUX support thanks to Art Brittain
- Fix for extra '/' characters in CustomOutputTool, from Charlie Roche
- Comments supported in .wgfilter-index, thanks to Mike Kay
- Multiple DirectoryIndex settings supported in config.pl, from Seth Chaiklin
- makeEndFileDesc was missing from OutputTool.pm, pointed out by Jeff Magnum
- Several fixes for traversing links on local virtual hosts, my errors / my fix --G
- Other minor stuff...see CHANGES.1.7.6 for a cvs dump.
7/13/99: Webglimpse 1.7.5 contains
- Fix for parsing robots.txt with wildcards - thanks Russel Duncan Jr and Alistair Young for the error report!
- Added checkbox in wgindex.html for applying filters to result sets (for indexing non-Ascii files)
- Support for field-based searching. See the howto
- Fix for a bug when exactly maxfiles hits were returned. Thanks to Steve Wix for the error report!
- Some other minor stuff...see CHANGES.1.7.5 for a cvs dump.
7/13/99: glimpse-4.12.5.tar.gz contains a fix to allow filtering with structured queries. This is necessary for field-based searching through webglimpse.
6/7/99: SpacesInFilenames.tar contains changes to glimpse, glimpseindex, makenh and webglimpse that will allow indexing of filenames with spaces in them. So far it works in some small tests, but it's a very "deep" change and needs some heavy testing! Please let me know if you try it! --GV
6/2/99: Webglimpse 1.7.2 has several minor fixes
6/1/99: Glimpse 4.12.3 may fix the nasty segfault bug with large indexes. (Bug# 116) --GV
5/25/99: LEX-based htuml2txt contributed by Christian Vogler.
5/19/99: Notes on compiling under OSF 3.0 including some modified system header files.
Archived News
For current bug reports, to submit a new bug or enhancement request, or add information to an existing bug, please use our installation of Bugzilla (from the nice folks at www.mozilla.org).
You can access the CVS tree for both Glimpse and Webglimpse through the cvsweb script (from Bill Fenner's site).
If you will be making a lot of changes, talk to me about getting a username/password for remote CVS access.