To index PDF files with Xpdf, you will need to add the lines

    *.pdf /path/to/usexpdf.sh <
    *.PDF /path/to/usexpdf.sh <
    *.pdf.gz /path/to/usexpdf.sh -z <
    *.PDF.gz /path/to/usexpdf.sh -z <

to the .glimpse_filters file in your archive directory.
NOTE: usexpdf.sh assumes that pdftotext is in your path. If it is not, you will need to edit the script accordingly.
NOTE2: When saving usexpdf.sh from the link above, delete the .txt extension. It is there only so that you can view the script conveniently in Netscape.
We use usexpdf.sh because .glimpse_filters works on STDIN, but pdftotext requires an input file for random access.
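To illustrate the idea, here is a minimal sketch of a wrapper that bridges STDIN to a named file. It is not the actual usexpdf.sh; the function name pdf_stdin_to_text, the PDFTOTEXT variable, and the temp file name are assumptions made for this example. It relies on pdftotext writing to standard output when the output file is given as "-".

```shell
#!/bin/sh
# Hypothetical stand-in for a STDIN-to-file wrapper; NOT the real usexpdf.sh.
# PDFTOTEXT is an assumed override so the converter location can be changed
# without editing the function body.
PDFTOTEXT="${PDFTOTEXT:-pdftotext}"

pdf_stdin_to_text() {
    # Spool STDIN to a temporary file, because the converter needs a
    # real file it can seek in.
    tmp=$(mktemp "${TMPDIR:-/tmp}/pdfwrap.XXXXXX") || return 1

    if [ "$1" = "-z" ]; then
        # Input is gzip-compressed: decompress while spooling.
        gzip -dc > "$tmp"
    else
        cat > "$tmp"
    fi

    # "-" as the output file sends the extracted text to STDOUT.
    "$PDFTOTEXT" "$tmp" -
    status=$?

    # Remove the spool file so nothing is left behind.
    rm -f "$tmp"
    return $status
}
```

Called as pdf_stdin_to_text < some.pdf (or with -z for a gzipped file), the function prints the extracted text, which is the shape of filter that .glimpse_filters expects.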
Enter all or pdf PDF in the field labelled "Prefilter filetypes for speed:".
Prefiltering is recommended for efficiency and speed. However, if you prefer to filter files on the fly in order to save space, then edit the wgreindex file in each archive that needs to access PDF files. You will need to add the -z switch to both glimpseindex command lines.
Add rm /tmp/xpdf* either to your crontab or to the end of the wgreindex script. The xpdf filter tends to leave temporary files behind, and these can fill up your hard drive if they are not deleted regularly.
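If a plain rm feels too blunt, for instance because indexing might be running when the cleanup fires, a slightly more cautious variant removes only files older than a given age. This is a sketch, not part of WebGlimpse; the function name clean_xpdf_tmp and the 60-minute default are assumptions, and the -maxdepth and -mmin options are GNU/BSD find extensions rather than POSIX.

```shell
#!/bin/sh
# Hypothetical cleanup helper: delete xpdf temp files older than a cutoff.
# Usage: clean_xpdf_tmp [directory] [age-in-minutes]
clean_xpdf_tmp() {
    dir="${1:-/tmp}"
    age_min="${2:-60}"

    # Only regular files directly inside $dir whose names start with "xpdf"
    # and which were last modified more than $age_min minutes ago.
    find "$dir" -maxdepth 1 -type f -name 'xpdf*' \
        -mmin "+$age_min" -exec rm -f {} +
}

# Example crontab entry (the script path is a placeholder):
#   15 3 * * * /path/to/clean_xpdf_tmp.sh
```

Saved as a script that ends with a call to clean_xpdf_tmp, it can be run from cron or from the end of the wgreindex script, in the same places suggested above.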
Make sure pdftotext runs when executed as the webserver user. On some systems you may get errors about missing shared libraries. With Apache 1.1 and later, you can use a directive like

    SetEnv LD_LIBRARY_PATH [path/to/required_libs]

in your httpd.conf file to add directories to the list of places searched for shared libraries. (It may be better to install the libraries in an accessible directory if you can.)
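One way to find out which libraries are affected is to ask the dynamic linker directly. The sketch below assumes a system that provides ldd (most Linux systems do); the function name missing_libs is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: list shared libraries that cannot be resolved
# for the given binary. Prints nothing if every library is found.
missing_libs() {
    # ldd marks unresolved libraries with the words "not found".
    ldd "$1" 2>/dev/null | grep 'not found'
}
```

Running missing_libs "$(command -v pdftotext)" as the webserver user (for example through su or sudo, with whatever account name your server uses) shows which libraries that user cannot load. The directories that contain them are the ones to list in LD_LIBRARY_PATH.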