
Re: ignoring commas, other punctuation



At 09:41 AM 9/20/99 -0600, William H. Haller wrote:
>  Dear Golda:   Perhaps I am the only one who feels it is a problem, and
>that is OK too. Assume you are a user who knows nothing about the layout of
>the document you want to search, including punctuation - or perhaps you
>know there is punctuation in the phrase you are looking for, but the
>punctuation character is a comma - not an uncommon occurrence when searching
>text documents. As an example, assume you know that the phrase 'form and'
>is in the document somewhere. Assume that the document has 
>form, and 
>inform and 
>form, function, and 
>form and
>
>I don't know of users who would expect the search engine to find
>instances where the phrase goes across a sentence-ending punctuation mark
>(.!?, etc), but they do expect results that skip simple in-line punctuation
>like commas, semi-colons, colons, etcetera. The problem is that there
>doesn't appear to be a way to get the search engine to return lines 1 and 4
>without also getting lines 2 and 3.

Hi William

The most general solution to this problem would be to use the regexp
syntax, so that you can specify search patterns such as

	'form[ \,]*and'

(See http://glimpse.cs.arizona.edu/glimpsehelp.html#sect7 )

But I tested this, and unfortunately regexp patterns do not seem to be
compatible with the -U switch needed for Webglimpse!  So, while you can run
glimpse on the command line and get arbitrary punctuation ignored, you
cannot do the same thing from Webglimpse :-(.
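
To make the idea behind the pattern concrete, here is a rough sketch in
Python rather than glimpse itself.  Python's re syntax is not the same as
the glimpse/agrep regexp syntax, so this only illustrates the matching
logic on the four sample lines.  The helper name matching_lines and the
second, word-bounded pattern are my own assumptions for illustration, not
something glimpse provides.

```python
import re

# The four sample lines from the quoted message, in order.
LINES = ["form, and", "inform and", "form, function, and", "form and"]


def matching_lines(pattern, lines=LINES):
    """Return the 1-based numbers of the lines where pattern matches anywhere."""
    rx = re.compile(pattern)
    return [n for n, text in enumerate(lines, start=1) if rx.search(text)]


# The pattern as written above: any run of spaces/commas between the words.
# It has no word boundary, so it also matches inside "inform and".
print(matching_lines(r"form[ \,]*and"))

# Hypothetical variant (an assumption, Python syntax only): require whole
# words and at least one separator character between them.
print(matching_lines(r"\bform[ ,]+and\b"))
```

In Python the first pattern reports lines 1, 2 and 4, because "inform and"
contains "form and"; the word-bounded variant reports only lines 1 and 4.
On the glimpse side you would presumably need some form of whole-word
matching to get the same effect.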

Any suggestions?  I suspect the incompatibility was a deliberate design
decision, and that it might take some work to make the -U switch work with
regexps - I think that with regexp patterns, glimpse actually runs as agrep.

--G
------------------------------------------------------------
Golda Velez         gvelez@tucson.com	        520-620-6878
Internet Workshop                          http://tucson.com
Webglimpse Search Software             http://webglimpse.net
		~~~~~~~~~~~~~~~~~~~~~~~~~~~
 Help organize the world - index your own corner of the web