Webglimpse 2.7.4 and above have a new setting that allows prefiltering of PDF and even HTML files, greatly increasing search speed for large indexes. Indexing is also significantly faster, and for remote files, storage needs are actually reduced. For local files, storage requirements increase by about 20% of the total size of the files indexed. Prefiltering works by keeping the pure-text version of each file that glimpse needs to index and search, rather than generating it on the fly as needed. Remote file storage is reduced because the filtered version is stored instead of the original. Extra meta information can now be stored in the filtered file (such as line numbers, for jump-to-line), which will allow administrators to add meta information about specific files even if they do not own those files.
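The caching idea described above can be sketched in a few lines of shell. This is only an illustration of the general technique, not Webglimpse's actual code: the function name cached_filter, the cache directory argument, and the ".txt" suffix for cached copies are all assumptions made for the example.

```shell
#!/bin/sh
# Illustrative sketch only; not Webglimpse's implementation.
#
# cached_filter SRC CACHE_DIR FILTER_CMD
# Prints the filtered text of SRC. FILTER_CMD reads the file on stdin
# and writes plain text to stdout. The filter is run only when no cached
# copy exists or when SRC is newer than the cached copy.
cached_filter() {
    src=$1
    cache_dir=$2
    filter=$3
    cached="$cache_dir/$(basename "$src").txt"   # hypothetical naming scheme

    mkdir -p "$cache_dir" || return 1

    # -nt is supported by the test builtin of bash, dash, and ksh,
    # although it is not strictly POSIX.
    if [ ! -f "$cached" ] || [ "$src" -nt "$cached" ]; then
        # Write to a temporary name first so that a failed filter run
        # never leaves a truncated cache file behind.
        if $filter < "$src" > "$cached.tmp"; then
            mv "$cached.tmp" "$cached"
        else
            rm -f "$cached.tmp"
            return 1
        fi
    fi
    cat "$cached"
}
```

A call such as cached_filter report.pdf /tmp/wgcache /usr/local/bin/usexpdf.sh would pay the filtering cost once and serve later requests from the stored text, which is the trade of disk space for speed described above. The file name and cache path in that call are placeholders.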
Anyway, if you just want to implement it, here is how:
    *.pdf /usr/local/bin/usexpdf.sh <
    *.PDF /usr/local/bin/usexpdf.sh <
    *.html /usr/local/wgdemo/lib/htuml2txt.pl <
    *.htm /usr/local/wgdemo/lib/htuml2txt.pl <
    ...more filetypes here...

Note: if you have not yet indexed PDF files, please see How To Index PDF Documents using XPDF.
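Before reindexing, it may be worth confirming that every filter program named in the entries above exists and is executable. The following is a minimal sketch, assuming each entry has the form "PATTERN PROGRAM <" as shown; the function name check_filters is hypothetical, and the file holding the entries is passed in as an argument.

```shell
#!/bin/sh
# Illustrative helper; not part of Webglimpse.
#
# check_filters FILE
# Reads entries of the form "PATTERN PROGRAM <" from FILE and prints one
# line for each entry whose PROGRAM is missing or not executable.
# Returns 0 if every program is usable, 1 otherwise.
check_filters() {
    status=0
    # The "|| [ -n ... ]" clause handles a last line with no newline.
    while read -r pattern program rest || [ -n "$pattern" ]; do
        # Skip blank lines. Lines starting with '#' are also skipped as a
        # convenience; whether the real file permits comments is an assumption.
        case $pattern in
            ''|'#'*) continue ;;
        esac
        if [ ! -x "$program" ]; then
            echo "not executable: $program (pattern $pattern)"
            status=1
        fi
    done < "$1"
    return $status
}
```

Each filter can also be tried by hand in the same way the entries invoke it, feeding a sample file on standard input (for example, /usr/local/bin/usexpdf.sh < sample.pdf, where sample.pdf is a placeholder) and checking that readable text comes out.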
Then rebuild the index, either by running /path/to/your/archive/wgreindex manually or by pressing the 'Build Index' button in the web interface. Note that once you rebuild it manually, you may have permissions problems rebuilding from the web in the future unless you reset ownership to the web user. Generally you should pick one method and use it consistently to avoid problems.
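If you have mixed the two methods, one way to spot the ownership problem is to list the files in the archive directory that the web user does not own. This is a small sketch using standard find; the function name list_foreign_files is hypothetical, and the web user's account name varies by system (www-data, apache, and nobody are common, but check your server's configuration).

```shell
#!/bin/sh
# Illustrative helper; not part of Webglimpse.
#
# list_foreign_files DIR USER
# Prints every file and directory under DIR that is NOT owned by USER.
# USER must be an existing account name, otherwise find reports an error.
list_foreign_files() {
    find "$1" ! -user "$2" -print
}
```

Running list_foreign_files /path/to/your/archive www-data (with your own web user in place of www-data) prints nothing when ownership is consistent. Anything it does list can be handed back to the web user with chown, run as root.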