
Re: Fix for Bug119: -c + -d Doubles Hit Count



Thanks, Morey!  I tried this out on some sample files, and it seems to be
right on the money.  For now I just used your fix below rather than trying
to mess with delim.c at this point; it's safe and it works.  Anyone who
wants the convenience can download the patched sgrep.c from the developer
page, http://webglimpse.net/dev/

However, I wonder if this could also be related to bug #120,
http://webglimpse.net/dev/bugs/bug120.txt .  That one is not fixed by the
patched sgrep.c, but maybe a fix to forward_delimiter() would do it?

If you have a clear idea what should be fixed, I'll be happy to take the
patch through some heavy testing and make sure nothing else breaks...

--G

At 02:30 PM 9/26/99 -0400, root@mhubin.ne.mediaone.net wrote:
>
>				mhubin@mediaone.net
>				Morey Hubin
>
>
>    Hi Cath,
>
>    I saw your glimpse bug dating back to late December and took a crack
>at it.  Using '>>>' as a record delimiter, or any other delimiter, in a
>regular text file gives double the hit count.
>              glimpse -H index_dir  -c -d '>>>'  PLATYPUS
>
>WORKAROUND:
>    The short answer is that it is a problem with delimiter alignment
>inside agrep. The immediate solution is to add the '-t' option whenever
>you use -d and -c together. This will properly adjust the alignment
>internally and give you the correct hit count. -t simply prints the '>>>'
>delimiter found at the end of the record rather than the beginning, but
>since you are not printing the record it never shows.
>
>
>SOLUTION:
>    The technical explanation and proper solution follows:
>sgrep.c's function bm() line 724 calls two functions in succession
>1)          curtextbegin = backward_delimiter(text,......);
>2)          curtextend   = forward_delimiter(curtextbegin,...);
>
>    Call 1) searches backward (to the left) in the file for the leading
>'>>>' delimiter, and call 2) starts from there and searches forward for
>the next '>>>' to the right.
>
>    The problem arises because 1) leaves the '>>>' at the beginning of
>curtextbegin, so 2) also finds the leading '>>>' and does nothing
>(i.e. curtextbegin == curtextend). This means that it takes two loops to
>pass out of the current record (until we get to backward_delimiter again).
>Both loops increment the number of hits and presto, double the count.
>
>    Using -t causes agrep (and glimpse) to take the trailing delimiter.
>In this case 1) and 2) work properly because 2) does not get stuck on
>the leading '>>>' in curtextbegin.
>
>    The code must be altered so that the leading '>>>' is not left when
>curtextbegin is passed to forward_delimiter(). The following additions do
>the job nicely for plain text and compressed-filtered files. These are
>also -d&-c specific so cannot break anything else in glimpse.
>
>    See the attachment for the original.
>
>   The better fix would be to properly fix delim.c's forward_delimiter()
>so that it does not get hung up on the leading delimiter (of length
>D_length) when OUTTAIL is off.
>forward_delimiter is called from a number of other places, so I'm not
>touching it until I get more experience with the glimpse source.
>                            =)  (=
>
>===========================================================================
># diff    sgrep.c  sgrep.c119
>712c712,716
><                                       curtextend = forward_delimiter(curtextbegin/*text-m*/, textend, tc_D_pattern,tc_D_length, OUTTAIL);
>---
>>                                         if (!OUTTAIL) {
>>                                          curtextend = forward_delimiter(curtextbegin+D_length/*text-m*/, textend, tc_D_pattern,tc_D_length, OUTTAIL);
>>                                         }else{
>>                                          curtextend = forward_delimiter(curtextbegin/*text-m*/, textend, tc_D_pattern,tc_D_length, OUTTAIL);
>>                                         }
>725c729,733
><                                       curtextend = forward_delimiter(curtextbegin/*text-m*/, textend, D_pattern, D_length,OUTTAIL);
>---
>>                                         if (!OUTTAIL) {
>>                                          curtextend = forward_delimiter(curtextbegin+D_length/*text-m*/, textend, D_pattern,D_length, OUTTAIL);
>>                                         }else{
>>                                          curtextend = forward_delimiter(curtextbegin/*text-m*/, textend, D_pattern,D_length, OUTTAIL);
>>                                         }
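
For anyone following along without the glimpse source at hand, here is a
rough, self-contained C sketch of the alignment problem Morey describes.
It is not the real agrep code: toy_backward(), toy_forward() and
toy_record_end() are hypothetical stand-ins with simplified signatures,
and the buffer contents are made up. It only shows that a forward search
started on the leading delimiter stops right there (begin == end), while
a search started D_length bytes later reaches the next record boundary.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for backward_delimiter(): returns the offset of
 * the nearest delimiter starting at or before pos, or 0 if none exists. */
static size_t toy_backward(const char *buf, size_t pos, const char *delim)
{
    size_t dlen = strlen(delim);
    size_t len = strlen(buf);
    size_t i = pos;

    for (;;) {
        if (i + dlen <= len && strncmp(buf + i, delim, dlen) == 0)
            return i;
        if (i == 0)
            return 0;
        i--;
    }
}

/* Hypothetical stand-in for forward_delimiter(): returns the offset of
 * the first delimiter starting at or after from, or strlen(buf) if none. */
static size_t toy_forward(const char *buf, size_t from, const char *delim)
{
    const char *p = strstr(buf + from, delim);
    return p ? (size_t)(p - buf) : strlen(buf);
}

/* Finds the end of the record containing the match at offset hit.
 * skip_leading == 0 mimics the unpatched call (search from begin);
 * skip_leading != 0 mimics the patched call (search from begin + D_length). */
static size_t toy_record_end(const char *buf, size_t hit,
                             const char *delim, int skip_leading)
{
    size_t begin = toy_backward(buf, hit, delim);
    size_t from = skip_leading ? begin + strlen(delim) : begin;

    if (from > strlen(buf))      /* stay inside the buffer */
        from = strlen(buf);
    return toy_forward(buf, from, delim);
}

/* Prints the record boundaries for one made-up two-record buffer. */
static void toy_demo(void)
{
    const char *buf = ">>>one PLATYPUS\n>>>two\n";
    const char *delim = ">>>";
    size_t hit = (size_t)(strstr(buf, "PLATYPUS") - buf);

    printf("begin=%zu unpatched_end=%zu patched_end=%zu\n",
           toy_backward(buf, hit, delim),
           toy_record_end(buf, hit, delim, 0),
           toy_record_end(buf, hit, delim, 1));
}
```

Calling toy_demo() from a main() should show the unpatched end landing
on the same offset as the record start, and the patched end landing on
the second '>>>'. The sketch stops there on purpose: how the empty
record then turns into a second count depends on the loop in bm(), which
is not reproduced here.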

------------------------------------------------------------
Golda Velez         gvelez@tucson.com	        520-620-6878
Internet Workshop                          http://tucson.com
Webglimpse Search Software             http://webglimpse.net
		~~~~~~~~~~~~~~~~~~~~~~~~~~~
 Help organize the world - index your own corner of the web