For example, &uuml; => ü, &aacute; => á
A filter script or library must be used to convert HTML character codes to upper-ASCII characters for indexing. To take advantage of such a filter, you will also need to configure Glimpse and Webglimpse to index upper-ASCII characters. Please see howto_charsets.txt for detailed instructions.
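To illustrate the kind of conversion such a filter performs, here is a minimal sketch in Python. It is not the htuml2txt.pl filter and does not show how that script works internally; it only uses the standard library's html.unescape to turn character codes into the characters they name. The function name entities_to_chars and the sample text are made up for this example.

```python
import html


def entities_to_chars(text: str) -> str:
    """Replace HTML character codes such as &uuml; with the characters they name.

    Illustration only: a real indexing filter would also strip HTML tags,
    which this sketch does not attempt.
    """
    return html.unescape(text)


if __name__ == "__main__":
    # Hypothetical sample text containing HTML character codes.
    sample = "M&uuml;nchen, &aacute;rbol"
    print(entities_to_chars(sample))
```

Note that html.unescape also converts codes such as &amp; and &lt;, which is usually what you want for text that is about to be indexed.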
NEW: Faster lex-based html2uml translator contributed by Christian Vogler. This has not been built into the core code or the install program yet, so you'll need to download it separately.
The slower htuml2txt.pl perl filter for International Characters is now distributed with Webglimpse version 1.7 and above, and is an option in the install.
If you want to install the filter by hand, try this:
Copy the provided htuml2txt.pl script somewhere in your path.
Edit the .glimpse_filters file in your archive directory and replace html2txt with htuml2txt.pl.
Re-run wgreindex. This should let you search with the actual upper-ASCII characters and match the corresponding HTML codes. Indexing will be slightly slower than with the original html2txt filter.
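If you prefer to script the edit to .glimpse_filters, the following is a rough sketch in Python. It assumes nothing about the layout of .glimpse_filters beyond the filter name html2txt appearing in it as plain text, and it does a simple text substitution; the function names and the backup file name are invented for this example. If your filter entry includes a full path, check afterwards that the path still points to where you copied htuml2txt.pl.

```python
from pathlib import Path

OLD_FILTER = "html2txt"
NEW_FILTER = "htuml2txt.pl"


def switch_filter(filters_text: str) -> str:
    """Return the file contents with html2txt replaced by htuml2txt.pl."""
    return filters_text.replace(OLD_FILTER, NEW_FILTER)


def update_filters_file(archive_dir: str) -> bool:
    """Rewrite .glimpse_filters in archive_dir; return True if it changed."""
    path = Path(archive_dir) / ".glimpse_filters"
    # latin-1 maps every byte to one character, so the file round-trips
    # unchanged apart from the substitution.
    original = path.read_text(encoding="latin-1")
    updated = switch_filter(original)
    if updated == original:
        return False
    # Keep a backup (hypothetical name) before overwriting the original.
    path.with_name(".glimpse_filters.bak").write_text(original, encoding="latin-1")
    path.write_text(updated, encoding="latin-1")
    return True
```

Running the substitution a second time changes nothing, because htuml2txt.pl does not itself contain the string html2txt. After updating the file, re-run wgreindex as described above.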
We have tested this on a very small set of files, and it appears to work smoothly. Again, however, you will also need to follow the instructions in howto_charsets.txt for correct indexing and searching of upper-ASCII characters. (With the filter alone, you will get some success, but you will also see unexpected errors such as "Query is too broad".)
Please let us know your results at email@example.com!