
htuml2txt.pl in Webglimpse 1.7.1



Hello,

I noticed that on a Linux 2.2 system, Webglimpse searches take an
inordinate amount of time with the htuml2txt.pl filter. For example, a
search that took only five seconds with the plain html2txt filter took more
than 30 seconds with the htuml2txt filter. I took some measurements, and
the bottleneck seems to be the startup of the Perl interpreter.

I reimplemented the functionality of this script as a set of lex rules and
got search speeds comparable to the plain html2txt filter. The lex file
might be of general use, so I am willing to share it if you are
interested.
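
As an illustration only (this is not the lex file mentioned above, and it
assumes the core job of such a filter is dropping markup), a compiled
filter of this kind boils down to a small state machine with no interpreter
to start. A minimal sketch in C, using the hypothetical name strip_tags:

```c
#include <stddef.h>

/* Hypothetical helper, not part of Webglimpse.
 * Copies `in` to `out`, dropping everything between '<' and '>'.
 * Each completed tag is replaced by one space so that words on either
 * side of it stay separate. An unterminated tag is dropped entirely.
 * `out` must hold at least strlen(in) + 1 bytes.
 * Returns the number of bytes written, not counting the final NUL. */
size_t strip_tags(const char *in, char *out)
{
    int in_tag = 0;   /* 1 while we are between '<' and '>' */
    size_t n = 0;     /* bytes written to `out` so far */

    for (; *in != '\0'; ++in) {
        if (in_tag) {
            if (*in == '>') {
                in_tag = 0;
                out[n++] = ' ';
            }
        } else if (*in == '<') {
            in_tag = 1;
        } else {
            out[n++] = *in;
        }
    }
    out[n] = '\0';
    return n;
}
```

A real filter would also have to deal with entities, comments, and '>'
inside quoted attribute values, which is where a set of lex rules is more
convenient than hand-written code.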

Regards,
Christian