Re: ignoring stopwords?
Hi Golda,
I have found something relevant:
glimpseindex -help says:
...
-r delim: build an index at the granularity of delimiter `delim'
to do booleans by reading ONLY the index
...
this seems to suggest that an index built with -r
may do what I want at least in the "the,dog" case:
if it does not have "the" in the index then it can only
find "dog".
All bets are off for the "the;dog" case.
I am building a -r index now to see what it does.
Have you used the -r option before?
Is it restricted to a particular index type (-b -o)?
Is the behaviour automatic for that sort of index
or does it need to be enabled somehow?
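
In the meantime, here is a minimal sketch (Python) of the query-side
workaround: strip stop words from the query string before it is handed to
glimpse, so that they behave like the identity of both ';' (AND) and ','
(OR). The function name drop_stopwords and the stopwords set are
hypothetical. glimpseindex does not save its stop list, so the set would
have to be built some other way, for example from per-word frequency checks.
Where a dropped word sits between two different operators, the sketch
arbitrarily keeps the operator directly in front of the next surviving term.

```python
import re

def drop_stopwords(query, stopwords):
    """Remove stop words from a glimpse-style boolean query.

    `stopwords` is a caller-supplied set of lowercase words (an assumption:
    glimpse itself does not expose its stop list).
    """
    # Split on ';' (AND) and ',' (OR), keeping the operators in the result.
    tokens = re.split(r"([;,])", query)
    terms = tokens[0::2]
    ops = [None] + tokens[1::2]   # ops[i] is the operator in front of terms[i]
    out = []
    for op, term in zip(ops, terms):
        term = term.strip()
        if not term or term.lower() in stopwords:
            continue              # drop the word together with its operator
        if out:
            out.append(op)        # operator directly preceding this kept term
        out.append(term)
    return "".join(out)

print(drop_stopwords("play;in;the;garden", {"in", "the"}))
print(drop_stopwords("the,dog", {"in", "the"}))
```

With the stop set {"in", "the"}, the two calls print "play;garden" and "dog".
A query made up only of stop words comes back as an empty string, which the
caller would need to handle before running glimpse.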
regards
D
Golda Velez writes:
> Hi Dennis
>
> Unfortunately, there is no actual 'stop list' file - glimpseindex decides
> on the fly which words not to index, but does not save a list of them
> anywhere.
>
> It's a good idea, but I think you would have to implement it the way you
> already thought of (calls to glimpse to determine frequency), or do some
> hacking in the glimpseindex code. If you try the latter let me know,
> perhaps I can help some.
>
> --Golda
>
> At 04:52 PM 11/5/02 -0600, Daniel Mahler wrote:
> >
> >Hello again,
> >
> >Is there a way to make glimpse drop stopwords from queries?
> >i.e. to make "play;in;the;garden" just act like "play;garden"
> >and also "play,in,the,garden" as "play,garden".
> >Or put another way, make stop words act like the identity
> >of both logical operators
> >[I do know there are theoretical problems with that]
> >
> >I am constructing queries from input text
> >and I like stop words treated as noise.
> >However I want stop words to coincide with
> >the statistically determined stopwords
> >that glimpse generates rather than trying to construct a list.
> >I could try using -N to test the frequency of each word first,
> >but is there something more elegant?
> >
> >thanks
> >
> >D
> >
> >
> >
> ------------------------------------------------------------
> Golda Velez (use contact form) 626-792-9277
> Internet Workshop http://iwhome.com
> Webglimpse Search Software http://webglimpse.net
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~
> Help organize the world - index your own corner of the web