Indexing MS Word and Excel Documents

last updated 3/9/06
To index a Microsoft Word document with Glimpse, you need a Word-to-ASCII command-line converter. Thanks to Vitus Wagner, there is a program called catdoc that does the job nicely.

NEW: recent versions of catdoc also include the program xls2csv, which allows on-the-fly conversion of Excel documents in the same way as for MS Word. Note that this is very new and may have some bugs: I got the error message "No BOF record found" when trying to convert an Excel document containing images.

  1. Download and install catdoc version 0.90 or later from

  2. Add the following (or similar) to the file .glimpse_filters:
    For MS-Word:
    	*.doc	/usr/local/bin/catdoc <
    	*.DOC	/usr/local/bin/catdoc <
    For Excel:
    	*.xls	/usr/local/bin/xls2csv <
    	*.XLS	/usr/local/bin/xls2csv <

  3. Edit the wgreindex file in each archive that needs to index non-ASCII files. Change both glimpseindex command lines to add the -z option, which tells glimpseindex to run files through the filters listed in .glimpse_filters, like so:
    	/bin/cat /home/WWW/proj/test/.wg_toindex | /usr/local/bin/glimpseindex -n -H /home/WWW/proj/test -o -t -h -X -U -f -C -F -z > /dev/null
    	/bin/cat /home/WWW/proj/test/.wg_toindex | /usr/local/bin/glimpseindex -n -H /home/WWW/proj/test -o -t -h -X -U -f -C -F -z
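If you have several archives to update, steps 2 and 3 can be scripted. The following is a minimal POSIX shell sketch, not part of Glimpse or WebGlimpse. The function names add_filter, add_glimpse_filters and add_z_option are hypothetical, and it assumes catdoc and xls2csv are installed in /usr/local/bin and that .glimpse_filters and wgreindex both live in the archive directory (the one passed to glimpseindex with -H). Unlike the hand-edited lines above, it inserts -z right after the program name rather than at the end of the line; since -z takes no argument the position should not matter, but check the result before relying on it.

```shell
# Hypothetical helpers automating steps 2 and 3 for one archive directory.
# Assumes catdoc/xls2csv are in /usr/local/bin and that .glimpse_filters
# and wgreindex sit in the archive directory (the glimpseindex -H directory).

# add_filter FILE PATTERN COMMAND
# Append "PATTERN<TAB>COMMAND" to FILE unless that exact line is already there.
add_filter() {
    file=$1
    line=$(printf '%s\t%s' "$2" "$3")
    # -F: fixed string, -x: whole-line match, -q: no output
    if ! grep -F -x -q -- "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Step 2: register the Word and Excel converters in .glimpse_filters.
add_glimpse_filters() {
    filters="$1/.glimpse_filters"
    add_filter "$filters" '*.doc' '/usr/local/bin/catdoc <'
    add_filter "$filters" '*.DOC' '/usr/local/bin/catdoc <'
    add_filter "$filters" '*.xls' '/usr/local/bin/xls2csv <'
    add_filter "$filters" '*.XLS' '/usr/local/bin/xls2csv <'
}

# Step 3: on every line of wgreindex that does not already contain " -z",
# insert -z right after the glimpseindex program name.
add_z_option() {
    script="$1/wgreindex"
    tmp="$script.tmp.$$"
    sed -e '/ -z/!s|glimpseindex |glimpseindex -z |' "$script" > "$tmp" &&
        cat "$tmp" > "$script" &&   # overwrite in place so permissions are kept
        rm -f "$tmp"
}
```

For example, after making a backup copy of wgreindex, run add_glimpse_filters /home/WWW/proj/test followed by add_z_option /home/WWW/proj/test. Running either function a second time leaves the files unchanged. Note that the sed expression would also modify a comment line that happens to contain "glimpseindex ", so look over the edited script afterwards.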

That should be it - we've tested it here, and it works for us...
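Given the xls2csv problems mentioned above, it may be worth spot-checking that the converters can actually read your documents before reindexing. The sketch below defines a hypothetical helper, check_filter, that runs one converter over a list of files and reports any that give a non-zero exit status or produce no output. It assumes a failed conversion shows up in one of those two ways; that has not been confirmed for the "No BOF record found" case.

```shell
# check_filter FILTER FILE...
# Hypothetical helper: run FILTER (a command that reads a document on stdin,
# such as /usr/local/bin/catdoc) on each FILE and print "FAILED: FILE" for
# every file where FILTER exits non-zero or writes nothing to stdout.
# Returns 1 if any file failed, 0 otherwise.
check_filter() {
    filter=$1
    shift
    failed=0
    for f in "$@"; do
        # The exit status of "text=$(...)" is that of the filter itself.
        if ! text=$("$filter" < "$f" 2>/dev/null) || [ -z "$text" ]; then
            printf 'FAILED: %s\n' "$f"
            failed=1
        fi
    done
    return "$failed"
}
```

For example, check_filter /usr/local/bin/xls2csv /home/WWW/proj/test/*.xls lists the spreadsheets that xls2csv could not convert. The trailing "<" from .glimpse_filters is left out because the function redirects stdin itself.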

