
Re: Agrep linecount bug in large files...



>> At 05:04 PM 3/29/00 -0600, Wilson Smith wrote:

>> >agrep seems to miscount the line number when searching files larger than
>> >the BlockSize / Max_record constants defined in agrep.h -- the line
>> >number returned is larger than the number of the actual line which
>> >matches...

You're right; it appears to add 1 to the line count each time a BlockSize
boundary is crossed.

I looked through the code a bit, and I bet the reason is the extra '\n'
char added to the end of each block in bitap.c, in the bitap() function.
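
To make the suspected mechanism concrete, here is a toy sketch in C.  It is
not agrep's code: DEMO_BLOCK_SIZE, count_newlines(), true_line_number() and
blockwise_line_number() are all made-up names, and it simply assumes that
the appended '\n' gets counted like any other newline.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for BlockSize; tiny so the effect is easy to see. */
#define DEMO_BLOCK_SIZE 8

/* Count '\n' bytes in buf[0..len). */
static size_t count_newlines(const char *buf, size_t len)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++)
        if (buf[i] == '\n')
            n++;
    return n;
}

/* 1-based line number of byte offset pos, counting only real newlines. */
static size_t true_line_number(const char *text, size_t pos)
{
    assert(text != NULL);
    return 1 + count_newlines(text, pos);
}

/*
 * Same question, but answered block by block.  Every full block that ends
 * at or before pos is copied, gets a sentinel '\n' appended, and then ALL
 * of its newlines are counted, sentinel included (the assumed bug).
 */
static size_t blockwise_line_number(const char *text, size_t pos)
{
    char block[DEMO_BLOCK_SIZE + 1];
    size_t line = 1;
    size_t off = 0;

    assert(text != NULL);
    while (off + DEMO_BLOCK_SIZE <= pos) {
        memcpy(block, text + off, DEMO_BLOCK_SIZE);
        block[DEMO_BLOCK_SIZE] = '\n';            /* the extra newline */
        line += count_newlines(block, DEMO_BLOCK_SIZE + 1);
        off += DEMO_BLOCK_SIZE;
    }
    /* Partial block that contains pos: no sentinel has been added yet. */
    line += count_newlines(text + off, pos - off);
    return line;
}
```

With four 5-byte lines and an 8-byte block, an offset past the first block
boundary comes out one line too high, and an offset past the second comes
out two too high, which matches the symptom reported.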

However, this extra '\n' seems to matter for other reasons in the code:
without it, the last bit of text is simply dropped.  So I'm not sure what
the best solution is.  I would lean towards backing up the read pointer so
that we break blocks at the nearest delimiter instead of splitting a record
in two.  (Splitting records may cause other problems too, such as complex
queries whose parts end up on either side of a block boundary.)
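
A rough sketch of the "break at the nearest delimiter" idea, in C.  This is
only an illustration under assumptions, not a patch against bitap.c: the
helper names complete_prefix_len() and carry_over_tail() are hypothetical,
and the delimiter is assumed to be a single byte, whereas a real fix would
have to respect whatever record delimiter is in use.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return how many bytes of buf[0..len) form complete records, i.e. the
 * length up to and including the last delim.  If the block contains no
 * delim at all, return len: the record is longer than the block and the
 * caller has to decide how to handle it.
 */
static size_t complete_prefix_len(const char *buf, size_t len, char delim)
{
    assert(buf != NULL);
    for (size_t i = len; i > 0; i--)
        if (buf[i - 1] == delim)
            return i;
    return len;
}

/*
 * Move the unprocessed tail (the partial record) to the front of the
 * buffer so the next read can append to it.  Returns the tail length.
 */
static size_t carry_over_tail(char *buf, size_t len, size_t processed)
{
    size_t tail;

    assert(buf != NULL && processed <= len);
    tail = len - processed;
    memmove(buf, buf + processed, tail);   /* regions may overlap */
    return tail;
}
```

The caller would search only the first complete_prefix_len() bytes of each
block, then read the next chunk in after the carried-over tail.  Since every
block then ends on a real delimiter, no artificial '\n' would be needed
except possibly after the very last record of the file.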

Does anyone have any thoughts on this?  

Thanks!

--Golda

------------------------------------------------------------
Golda Velez         gvelez@iwhome.com	        520-620-6878
Internet Workshop                          http://iwhome.com
Webglimpse Search Software             http://webglimpse.net
		~~~~~~~~~~~~~~~~~~~~~~~~~~~
 Help organize the world - index your own corner of the web