Indexing MS Word and Excel Documents

last updated 3/9/06
To index a Microsoft Word document with Glimpse, you need a command-line converter from Word to ASCII text. Thanks to Vitus Wagner, there is a program called catdoc that does the job nicely.

NEW: Recent versions of catdoc also include the program xls2csv, which converts Excel documents on the fly in the same way catdoc handles MS Word documents. Note that this is very new and may have some bugs: I got the error message "No BOF record found" when trying to convert an Excel document containing images.

  1. Download and install catdoc version 0.90 or later from http://www.45.free.net/~vitus/ice/catdoc/

  2. Add the following (or similar) lines to the file .glimpse_filters:
    For MS-Word:
    
    	*.doc	/usr/local/bin/catdoc <
    	*.DOC	/usr/local/bin/catdoc <
    
    For Excel:
    	
    	*.xls	/usr/local/bin/xls2csv <
    	*.XLS	/usr/local/bin/xls2csv <
    

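  3. Edit the wgreindex file in each archive that needs to index non-ASCII files. Change both glimpseindex command lines to add the -z option, which tells glimpseindex to run files through the filters listed in .glimpse_filters. For an archive in /home/WWW/proj/test, the edited lines look like this: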
    	/bin/cat /home/WWW/proj/test/.wg_toindex | /usr/local/bin/glimpseindex -n -H /home/WWW/proj/test -o -t -h -X -U -f -C -F -z > /dev/null
    
    	/bin/cat /home/WWW/proj/test/.wg_toindex | /usr/local/bin/glimpseindex -n -H /home/WWW/proj/test -o -t -h -X -U -f -C -F -z
    
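If you have many archives, editing every wgreindex file by hand gets tedious. Below is a minimal sketch of a shell helper that makes the step 3 change automatically. It is not part of Webglimpse: the function name add_z_option is hypothetical, and it assumes glimpseindex accepts -z anywhere among its options, so it inserts -z right after the program name instead of at the end as shown above. It keeps a backup in FILE.bak; compare the two with diff before relying on the result.

```shell
#!/bin/sh
# Hypothetical helper: add_z_option FILE
# Adds the -z option to every glimpseindex command line in FILE
# (for example an archive's wgreindex script).
# Assumption: glimpseindex does not care where -z appears among its options.
add_z_option() {
    file=$1
    # Keep an untouched copy so the edit can be checked or undone.
    cp "$file" "$file.bak" || return 1
    # On lines that mention glimpseindex and do not already contain " -z",
    # insert "-z" directly after the program name. Running it twice is harmless.
    # Writing with ">" into the existing file keeps its permissions.
    sed -e '/glimpseindex/{
/ -z/!s|glimpseindex |glimpseindex -z |
}' "$file.bak" > "$file"
}
```

For example, add_z_option /home/WWW/proj/test/wgreindex followed by diff /home/WWW/proj/test/wgreindex.bak /home/WWW/proj/test/wgreindex should show only the two glimpseindex lines changing.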

That should be it. We've tested it here, and it works for us.
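Before reindexing, it may be worth confirming that the converters produce text for your own files, especially Excel files with images, which triggered the "No BOF record found" error mentioned above. The sketch below runs a converter the same way the .glimpse_filters entries do (input redirected from the file) and reports whether any text came out. The function name check_filter is hypothetical, and an empty result is only a hint that the file will not be indexed, not proof.

```shell
#!/bin/sh
# Hypothetical helper: check_filter CONVERTER FILE
# Runs CONVERTER with FILE on standard input, like a .glimpse_filters
# entry such as "/usr/local/bin/catdoc <", and reports the outcome.
check_filter() {
    conv=$1
    file=$2
    if [ ! -x "$conv" ]; then
        echo "MISSING: $conv is not an executable file" >&2
        return 1
    fi
    # Capture the converted text; the assignment fails if the converter
    # exits with a non-zero status. Converter error messages still go to stderr.
    out=$("$conv" < "$file") || {
        echo "FAILED: $conv < $file" >&2
        return 1
    }
    # Command substitution drops trailing newlines, so output consisting
    # only of blank lines also counts as empty here.
    if [ -z "$out" ]; then
        echo "EMPTY: $conv < $file produced no text" >&2
        return 1
    fi
    echo "OK: $file"
}
```

For example, with placeholder file names: check_filter /usr/local/bin/catdoc report.doc and check_filter /usr/local/bin/xls2csv budget.xls.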


Questions: webglimpse-support@iwhome.com