
Fix for Bug119: -c + -d Doubles Hit Count




				mhubin@mediaone.net
				Morey Hubin


    Hi Cath,

    I saw your glimpse bug dating back to late December and took a crack
at it.  Using '>>>' as a record delimiter, or any other delimiter, in a
regular text file gives double the hit count.
              glimpse -H index_dir  -c -d '>>>'  PLATYPUS

WORKAROUND:
    The short answer is, it is a problem with delimiter alignment inside
agrep. The immediate solution is to add the '-t' option whenever you use
-d and -c together. This will properly adjust the alignment internally
and give you the correct hit count. -t simply prints the '>>>' delimiter
found at the end of the record rather than the beginning, but since you
are not printing the record it never shows.


SOLUTION:
    The technical explanation and proper solution follow.
sgrep.c's function bm(), at line 724, calls two functions in succession:
1)          curtextbegin = backward_delimiter(text,......);
2)          curtextend   = forward_delimiter(curtextbegin,...);

    Call 1) searches backward (to the left) in the file for the leading
'>>>' delimiter, and call 2) starts from there and searches forward (to
the right) for the next '>>>'.

    The problem arises because 1) leaves the '>>>' at the beginning of
curtextbegin, so 2) immediately finds that same leading '>>>' and does
not advance (i.e. curtextend == curtextbegin). This means that it takes
two loops to pass out of the current record (until we get to
backward_delimiter again). Both loops increment the number of hits and
presto, double the count.
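
    To make the stuck forward search concrete, here is a small toy
model. It is only a sketch: toy_backward(), toy_forward() and
toy_record_len() are made-up stand-ins, not the real delim.c or sgrep.c
code, and they only assume the behaviour described above (the backward
search returns the start of the leading delimiter, and the forward search
returns the first delimiter at or after its starting point).

```c
#include <stddef.h>
#include <string.h>

/* Toy stand-in for backward_delimiter() (hypothetical, not agrep code):
 * scan left from p and return the start of the nearest delimiter,
 * or the start of the buffer if none is found. */
static const char *toy_backward(const char *buf, const char *p,
                                const char *delim, size_t dlen)
{
    for (; p > buf; p--)
        if (strncmp(p, delim, dlen) == 0)
            return p;
    return buf;
}

/* Toy stand-in for forward_delimiter() (hypothetical, not agrep code):
 * scan right from p and return the start of the first delimiter found
 * at or after p, or end if there is none. */
static const char *toy_forward(const char *p, const char *end,
                               const char *delim, size_t dlen)
{
    for (; p + dlen <= end; p++)
        if (strncmp(p, delim, dlen) == 0)
            return p;
    return end;
}

/* Length of the record computed around a hit at offset `hit`, using the
 * unpatched call order: the forward scan starts exactly at the value the
 * backward scan returned. */
static size_t toy_record_len(const char *buf, size_t hit, const char *delim)
{
    size_t dlen = strlen(delim);
    const char *end = buf + strlen(buf);
    const char *begin = toy_backward(buf, buf + hit, delim, dlen);
    const char *stop = toy_forward(begin, end, delim, dlen);
    return (size_t)(stop - begin);
}
```

    With the buffer ">>>one PLATYPUS\n>>>two\n" and the hit at offset 7,
toy_record_len() returns 0: the forward scan stops on the very delimiter
the backward scan just returned, so the "record" is empty and the hit is
still ahead of it.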

    Using -t causes agrep (and glimpse) to take the trailing delimiter.
In this case 1) and 2) work properly because 2) does not get stuck on
the leading '>>>' in curtextbegin.

    The code must be altered so that the leading '>>>' is not left when
curtextbegin is passed to forward_delimiter(). The following additions do
the job nicely for plain text and compressed-filtered files. These are
also -d&-c specific so cannot break anything else in glimpse.

    See the attachment for the original.
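
    The idea behind the change can be sketched in isolation. This is a
toy illustration only: toy_forward() and toy_record_end() are invented
names, the outtail argument merely plays the role of OUTTAIL, and the
forward search is assumed to return the first delimiter at or after its
starting point. It is not the patch itself.

```c
#include <stddef.h>
#include <string.h>

/* Toy stand-in for forward_delimiter() (hypothetical, not agrep code):
 * return the start of the first delimiter at or after p, or end. */
static const char *toy_forward(const char *p, const char *end,
                               const char *delim, size_t dlen)
{
    for (; p + dlen <= end; p++)
        if (strncmp(p, delim, dlen) == 0)
            return p;
    return end;
}

/* Same shape as the patched call site: when the trailing-delimiter mode
 * is off, start the forward scan just past the leading delimiter;
 * otherwise pass begin through unchanged.
 * Assumes begin points at a delimiter, so begin + dlen stays in range. */
static const char *toy_record_end(const char *begin, const char *end,
                                  const char *delim, size_t dlen,
                                  int outtail)
{
    const char *start = outtail ? begin : begin + dlen;
    return toy_forward(start, end, delim, dlen);
}
```

    For the buffer ">>>one PLATYPUS\n>>>two\n" with begin at offset 0 and
outtail set to 0, toy_record_end() returns offset 16, the start of the
second '>>>', so the record now spans the hit and one pass is enough to
leave it.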

   The better fix would be to properly fix delim.c's forward_delimiter()
so that it does not get hung up on the leading delimiter (D_length bytes
long) when OUTTAIL is off, that is, when -t is not given.
forward_delimiter() is called from a number of other places, so I'm not
touching it until I get more experience with the glimpse source.
                            =)  (=

====================================================================================================
# diff    sgrep.c  sgrep.c119
712c712,716
<                                       curtextend = forward_delimiter(curtextbegin/*text-m*/, textend, tc_D_pattern,tc_D_length, OUTTAIL);
---
>                                         if (!OUTTAIL) {
>                                          curtextend = forward_delimiter(curtextbegin+D_length/*text-m*/, textend, tc_D_pattern,tc_D_length, OUTTAIL);
>                                         }else{
>                                          curtextend = forward_delimiter(curtextbegin/*text-m*/, textend, tc_D_pattern,tc_D_length, OUTTAIL);
>                                         }
725c729,733
<                                       curtextend = forward_delimiter(curtextbegin/*text-m*/, textend, D_pattern, D_length,OUTTAIL);
---
>                                         if (!OUTTAIL) {
>                                          curtextend = forward_delimiter(curtextbegin+D_length/*text-m*/, textend, D_pattern,D_length, OUTTAIL);
>                                         }else{
>                                          curtextend = forward_delimiter(curtextbegin/*text-m*/, textend, D_pattern, D_length,OUTTAIL);
>                                         }
====================================================================================================
