
Re: ignoring stopwords?



Hi Dennis

Interesting - I've used -r with glimpseindex, but mainly to handle files
that have no hard returns in them and so need an alternate delimiter.  I
use it in conjunction with glimpse -d for that purpose.

Note, if you want to search only the index, you can use glimpse -N, but it
is not guaranteed to be 100% precise for all queries.  It will be very
fast, though.

--Golda
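A minimal Python sketch of the frequency check discussed in this thread. It uses the index-only search (glimpse -N) for speed. It assumes `glimpse` is on the PATH and that `glimpse -N word` prints one candidate file name per line; please check that against your glimpse version. The names `index_file_count`, `frequent_words`, `total_files` and `max_fraction` are hypothetical. The threshold is an arbitrary heuristic, not the rule glimpseindex itself uses to pick stop words. Because glimpseindex may leave its own stop words out of the index entirely, this sketch cannot promise how -N reports those words.

```python
from __future__ import annotations

import subprocess
from typing import Callable, Iterable, List

# Hypothetical type: "run this command and return its stdout as text".
Runner = Callable[[List[str]], str]


def run_glimpse(cmd: List[str]) -> str:
    """Run a command and return its standard output as text."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return result.stdout


def index_file_count(word: str, run: Runner = run_glimpse) -> int:
    """Count the files an index-only search (glimpse -N) reports for `word`.

    ASSUMPTION: `glimpse -N word` prints one candidate file name per line.
    Index-only results are not guaranteed exact, so treat this as an estimate.
    """
    output = run(["glimpse", "-N", word])
    # Count non-blank output lines, one per reported file.
    return sum(1 for line in output.splitlines() if line.strip())


def frequent_words(words: Iterable[str], total_files: int,
                   max_fraction: float = 0.5,
                   run: Runner = run_glimpse) -> set[str]:
    """Return the words reported in more than `max_fraction` of all files.

    `total_files` must be supplied by the caller (hypothetical parameter);
    the 0.5 default is an arbitrary choice, not a glimpse setting.
    """
    limit = max_fraction * total_files
    # Query each distinct word once.
    return {w for w in set(words) if index_file_count(w, run) > limit}
```

The `run` argument lets the glimpse call be swapped out, for example for testing. Since -N may over-report, the counts can run high, which errs toward treating more words as noise.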

At 12:23 AM 11/6/02 -0600, Daniel Mahler wrote:
>Hi Golda,
>
>I have found something relevant:
>
>glimpseindex -help says:
>     ...
>     -r delim: build an index at the granularity of delimiter `delim'
>	to do booleans by reading ONLY the index
>     ...
>
>This seems to suggest that an index built with -r
>may do what I want, at least in the "the,dog" case:
>if it does not have "the" in the index then it can only
>find "dog".
>All bets are off for the "the;dog" case.
>I am building a -r index now to see what it does.
>Have you used the -r option before?
>Is it restricted to a particular index type (-b -o)?
>Is the behaviour automatic for that sort of index
>or does it need to be enabled somehow? 
>
>regards
>
>D
>
>Golda Velez writes:
> > Hi Dennis
> > 
> > Unfortunately, there is no actual 'stop list' file - glimpseindex decides
> > on the fly which words not to index, but does not save a list of them
> > anywhere.
> > 
> > It's a good idea, but I think you would have to implement it the way you
> > already thought of (calls to glimpse to determine frequency), or do some
> > hacking in the glimpseindex code.  If you try the latter, let me know;
> > perhaps I can help some.
> > 
> > --Golda
> > 
> > At 04:52 PM 11/5/02 -0600, Daniel Mahler wrote:
> > >
> > >Hello again,
> > >
> > >Is there a way to make glimpse drop stopwords from queries?
> > >i.e. to make "play;in;the;garden" act just like "play;garden",
> > >and also "play,in,the,garden" act like "play,garden".
> > >Or put another way, make stop words act like the identity
> > >of both logical operators
> > >[I do know there are theoretical problems with that]
> > >
> > >I am constructing queries from input text
> > >and I'd like stop words treated as noise.
> > >However, I want the stop words to coincide with
> > >the statistically determined stop words
> > >that glimpse generates, rather than trying to construct a list.
> > >I could try using -N to test the frequency of each word first,
> > >but is there something more elegant?
> > >
> > >thanks
> > >
> > >D
> > >
> > >
> > >
> > ------------------------------------------------------------
> > Golda Velez         (use contact form)       626-792-9277
> > Internet Workshop                          http://iwhome.com
> > Webglimpse Search Software             http://webglimpse.net
> > 		~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >  Help organize the world - index your own corner of the web
>
>
>
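For reference, the query rewriting asked about in this thread can be sketched in Python. In glimpse patterns, `;` means AND and `,` means OR, so dropping a stop word makes it behave like the identity of whichever operator joins the words. `build_query` is a hypothetical helper. The caller has to supply the stop list (for example from a frequency check), since glimpseindex does not save its own list anywhere.

```python
from typing import Iterable, Set


def build_query(words: Iterable[str], stopwords: Set[str], op: str = ";") -> str:
    """Join the non-stop words into a glimpse boolean pattern.

    In glimpse patterns ';' is AND and ',' is OR. Dropping a stop word treats
    it as the identity of the chosen operator, so "play;in;the;garden" becomes
    "play;garden". `stopwords` is expected to be in lower case.
    """
    if op not in (";", ","):
        raise ValueError("op must be ';' (AND) or ',' (OR)")
    # Compare case-insensitively against the lower-case stop list.
    kept = [w for w in words if w.lower() not in stopwords]
    # An empty result means every word was a stop word; the caller should
    # decide what to do rather than run an empty pattern.
    return op.join(kept)
```

For example, `build_query("play in the garden".split(), {"in", "the"})` returns `"play;garden"`, and passing `op=","` gives `"play,garden"`. As the thread points out, treating a word as the identity of both operators at once is a convention, not a sound boolean rule.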