
Re: ignoring commas, other punctuation



Golda Velez wrote:
> The most general solution to this problem would be to use the regexp
> syntax, so that you can specify search patterns such as
> 
>         'form[ \,]*and'
> 
> (See http://glimpse.cs.arizona.edu/glimpsehelp.html#sect7 )
> 
> But, I tested this, and unfortunately it does not seem to be compatible
> with the -U switch needed for Webglimpse!  So, while you can run glimpse on
> the command line and get arbitrary punctuation ignored, you cannot do the
> same thing from webglimpse :-(.

I tried this. On my machine it was the -w switch, not the -U switch, that
kept it from working. If I allow partial matches in the webglimpse search
form, regular expressions work fine for me, albeit much more slowly than
normal searches.
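
To show what the pattern itself does, independent of glimpse, here is a
small sketch using Python's re module as a stand-in. Python's regexp dialect
is not glimpse's, so this only illustrates the idea: the backslash before
the comma is dropped because Python does not need it, and the sample strings
and the matches() helper are made up for the example.

```python
import re

# Stand-in for the glimpse pattern 'form[ \,]*and': the literal "form",
# then any run of spaces and commas (possibly empty), then "and".
PATTERN = re.compile(r"form[ ,]*and")

def matches(text):
    """Return True if the pattern occurs anywhere in text (a partial match)."""
    return PATTERN.search(text) is not None

samples = ["form and function", "form, and function", "form; and function"]
for s in samples:
    print(repr(s), matches(s))
```

Only the characters listed in the bracket expression are skipped, so the
semicolon sample is not matched unless ';' is added to the set.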

However, escapes are a problem. For security reasons, the most recent
webglimpse version changes '[ \,]*' to '[ \\,]*', and glimpse does not like
that. This seems to be kind of tricky to resolve, because different shells
seem to handle escapes within single quotes differently. So I don't know
how to allow the flexibility AND still keep webglimpse secure.
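
One possible way around the quoting problem, at least in principle, is not
to go through a shell at all: if the pattern is handed to the program as a
single argument, nothing has to be escaped. Below is a rough sketch of that
idea in Python (webglimpse itself is not written in Python, so this is not
its code). build_argv is a hypothetical helper, the other switches that
webglimpse passes are left out, and nothing here actually runs glimpse.
Where a shell command line cannot be avoided, shlex.quote shows what
POSIX-style single quoting would look like.

```python
import shlex

def build_argv(pattern, glimpse_bin="glimpse"):
    """Build an argument vector with the pattern as one untouched element.

    Hypothetical helper: passing this list to subprocess.run() without
    shell=True would hand the pattern to the program unmodified.
    """
    return [glimpse_bin, pattern]

pattern = r"form[ \,]*and"
argv = build_argv(pattern)

# If a shell command line cannot be avoided, quote the pattern once ...
quoted = shlex.quote(pattern)
# ... and check that a POSIX-style parser recovers it unchanged.
recovered = shlex.split(quoted)

print(argv)
print(quoted)
print(recovered == [pattern])
```

This only covers shells that follow POSIX quoting rules; it does not settle
how other shells treat a backslash inside single quotes.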

In my opinion the Right Thing(TM) would be to extend glimpse such that you
can specify arbitrary sets of characters as word delimiters. Such a feature
is also likely to yield much faster searches than regular expressions.
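
To make the delimiter idea concrete, here is a minimal sketch in Python. It
is not how glimpse works internally and says nothing about speed; it only
shows the matching rule I have in mind. The names split_words and
contains_phrase and the default delimiter set are assumptions for the
example.

```python
def split_words(text, delimiters=" ,"):
    """Split text into words; every character in delimiters separates words."""
    words, current = [], []
    for ch in text:
        if ch in delimiters:
            if current:
                words.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        words.append("".join(current))
    return words

def contains_phrase(text, phrase, delimiters=" ,"):
    """Return True if the words of phrase appear consecutively in text."""
    haystack = split_words(text, delimiters)
    needle = split_words(phrase, delimiters)
    n = len(needle)
    if n == 0:
        return False
    return any(haystack[i:i + n] == needle
               for i in range(len(haystack) - n + 1))

print(contains_phrase("form, and function", "form and"))
print(contains_phrase("form; and function", "form and"))
print(contains_phrase("form; and function", "form and", delimiters=" ,;"))
```

With the delimiter set configurable, ignoring one more punctuation character
means adding it to the set rather than rewriting the search pattern.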

BTW, glimpse does not follow the POSIX standard for specifying regular
expressions. According to POSIX, 'Christian*' means to look for 'Christia'
followed by an arbitrary number of n's (including zero). In glimpse you
need to specify this search as 'Christia(n)*'. This can be confusing to
people who actually know regexps and would like to use them.
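
For comparison, here is how the two spellings behave in Python's re module.
Python is not a POSIX implementation either, but it binds '*' to the single
preceding character in the way described above, so it serves as a quick
illustration. In Python both patterns accept the same strings; the test
words are made up.

```python
import re

# '*' applies to the preceding 'n' only: "Christia" plus zero or more n's.
posix_style = re.compile(r"Christian*")
# The explicit group that glimpse needs; same meaning in Python.
grouped = re.compile(r"Christia(n)*")

for s in ["Christia", "Christian", "Christiannn", "Christians"]:
    print(s, bool(posix_style.fullmatch(s)), bool(grouped.fullmatch(s)))
```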

- Christian