
Re: ignoring stopwords?



Hi Golda,

I have found something relevant:

glimpseindex -help says:
     ...
     -r delim: build an index at the granularity of delimiter `delim'
	to do booleans by reading ONLY the index
     ...

This seems to suggest that an index built with -r
may do what I want, at least in the "the,dog" case:
if "the" is not in the index, then glimpse can only
find "dog".
All bets are off for the "the;dog" case.
I am building a -r index now to see what it does.
Have you used the -r option before?
Is it restricted to a particular index type (-b -o)?
Is the behaviour automatic for that sort of index
or does it need to be enabled somehow? 
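
In the meantime, here is a rough sketch of the query-side filtering
I am after, in case the -r index does not help. It assumes the stop
words are already known from somewhere (glimpse does not hand them
out, so the set passed in is purely hypothetical), and the function
name drop_stopwords is my own, not part of glimpse. It treats a stop
word as the identity of both ';' and ',' by removing the word along
with the operator in front of it:

```python
import re

def drop_stopwords(query, stopwords):
    """Remove stop words from a query that uses ';' (AND) and ',' (OR).

    `stopwords` is a hypothetical set of lowercase words supplied by
    the caller; nothing here asks glimpse for it.
    """
    # Split on the operators but keep them: term, op, term, op, term ...
    parts = re.split(r"([;,])", query)
    terms = parts[0::2]
    ops = [None] + parts[1::2]  # operator standing before each term

    out = []
    for op, term in zip(ops, terms):
        word = term.strip()
        if not word or word.lower() in stopwords:
            continue  # drop the stop word and the operator before it
        if out:
            out.append(op)  # rejoin with the operator before this term
        out.append(word)
    return "".join(out)

stop = {"in", "the"}
print(drop_stopwords("play;in;the;garden", stop))
print(drop_stopwords("play,in,the,garden", stop))
```

For queries that mix ';' and ',' around a dropped word, the choice of
which operator survives is arbitrary here, which is one of the
theoretical problems I mentioned before.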

regards

D

Golda Velez writes:
 > Hi Dennis
 > 
 > Unfortunately, there is no actual 'stop list' file - glimpseindex decides
 > on the fly which words not to index, but does not save a list of them
 > anywhere.
 > 
 > It's a good idea, but I think you would have to implement it the way you
 > already thought of (calls to glimpse to determine frequency), or do some
 > hacking in the glimpseindex code.  If you try the latter let me know,
 > perhaps I can help some.
 > 
 > --Golda
 > 
 > At 04:52 PM 11/5/02 -0600, Daniel Mahler wrote:
 > >
 > >Hello again,
 > >
 > >Is there a way to make glimpse drop stopwords from queries?
 > >i.e. to make "play;in;the;garden" just act like "play;garden"
 > >and also "play,in,the,garden" as "play,garden".
 > >Or put another way, make stop words act like the identity
 > >of both logical operators
 > >[I do know there are theoretical problems with that]
 > >
 > >I am constructing queries from input text
 > >and I like stop words treated as noise.
 > >However I want stop words to coincide with
 > >the statistically determined stopwords
 > >that glimpse generates rather than trying to construct a list.
 > >I could try using -N to test the frequency of each word first,
 > >but is there something more elegant?
 > >
 > >thanks
 > >
 > >D
 > >
 > >
 > >
 > ------------------------------------------------------------
 > Golda Velez         (use contact form)       626-792-9277
 > Internet Workshop                          http://iwhome.com
 > Webglimpse Search Software             http://webglimpse.net
 > 		~~~~~~~~~~~~~~~~~~~~~~~~~~~
 >  Help organize the world - index your own corner of the web