Re: ignoring commas, other punctuation
At 09:41 AM 9/20/99 -0600, William H. Haller wrote:
> Dear Golda: Perhaps I am the only one who feels it is a problem, and
>that is OK too. Assume you are a user who knows nothing about the layout of
>the document you want to search, including punctuation - or perhaps you
>know there is punctuation in the phrase you are looking for, but the
>punctuation character is a comma - not an uncommon occurrence when searching
>text documents. As an example, assume you know that the phrase 'form and'
>is in the document somewhere. Assume that the document has
>form, and
>inform and
>form, function, and
>form and
>I don't know of users who would expect the search engine to find
>instances where the phrase goes across a sentence-ending punctuation mark
>(. ! ? etc.), but they do expect results that skip simple in-line punctuation
>like commas, semi-colons, colons, etcetera. The problem is that there
>doesn't appear to be a way to get the search engine to return lines 1 and 4
>without also getting lines 2 and 3.
Hi William
The most general solution to this problem would be to use the regexp
syntax, so that you can specify search patterns such as
'form[ \,]*and'
(See http://glimpse.cs.arizona.edu/glimpsehelp.html#sect7 )
But I tested this, and unfortunately it does not seem to be compatible
with the -U switch needed for Webglimpse. So, while you can run glimpse on
the command line and get arbitrary punctuation ignored, you cannot do the
same thing from Webglimpse :-(.
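
To make the pattern's behaviour concrete, here is a minimal sketch that runs
it over the four sample lines from William's message. It uses Python's re
module, not glimpse or agrep, so it only illustrates the regular expression
itself. The \b word-boundary variant is an assumption added for illustration;
this sketch does not verify whether glimpse's regexp syntax accepts it.

```python
import re

# The four sample lines from William's message, in order.
LINES = ["form, and", "inform and", "form, function, and", "form and"]

# The pattern from the message, written for Python's re module
# (illustration only; this is not glimpse/agrep syntax).
LOOSE = re.compile(r"form[ ,]*and")

# Assumption: \b word boundaries added so that "inform and" is excluded.
STRICT = re.compile(r"\bform[ ,]*and\b")


def matching_line_numbers(pattern, lines):
    """Return the 1-based numbers of the lines where pattern matches."""
    return [i for i, line in enumerate(lines, start=1) if pattern.search(line)]


print(matching_line_numbers(LOOSE, LINES))   # [1, 2, 4]
print(matching_line_numbers(STRICT, LINES))  # [1, 4]
```

Without the word boundaries the pattern also matches line 2, because "form"
occurs inside "inform". Line 3 is skipped either way, since only spaces and
commas are allowed between "form" and "and".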
Any suggestions? I suspect the incompatibility was a design decision of some
sort and that it might take some work to make the -U switch compatible. I
think that with regexp patterns, glimpse actually runs as agrep.
--G
------------------------------------------------------------
Golda Velez gvelez@tucson.com 520-620-6878
Internet Workshop http://tucson.com
Webglimpse Search Software http://webglimpse.net
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Help organize the world - index your own corner of the web