Fix for Bug119: -c + -d Doubles Hist Count
mhubin@mediaone.net
Morey Hubin
Hi Cath,
I saw your glimpse bug report from late December and took a crack
at it. Using '>>>' as a record delimiter (or any other delimiter) with
-c on a regular text file gives double the hit count:
glimpse -H index_dir -c -d '>>>' PLATYPUS
WORKAROUND:
The short answer is that it is a problem with delimiter alignment inside
agrep. The immediate solution is to add the '-t' option whenever you use
-d and -c together. This properly adjusts the alignment internally and
gives you the correct hit count. -t simply prints the '>>>' delimiter
found at the end of the record rather than the one at the beginning, but
since you are not printing the record, it never shows.
SOLUTION:
The technical explanation and proper solution follows:
sgrep.c's function bm() (line 724) calls two functions in succession:
1) curtextbegin = backward_delimiter(text,......);
2) curtextend = forward_delimiter(curtextbegin,...);
Function 1) searches backward (to the left) in the file for the leading
'>>>' delimiter, and 2) then searches forward from there for the next
'>>>' to the right.
The problem arises because 1) leaves the '>>>' at the beginning of
curtextbegin, so 2) finds that same leading '>>>' immediately and does
nothing (i.e., curtextend == curtextbegin). This means it takes two
loops to pass out of the current record (until we get to
backward_delimiter() again). Both loops increment the number of hits,
and presto, double the count.
Using -t causes agrep (and glimpse) to take the trailing delimiter.
In this case 1) and 2) work properly because 2) does not get stuck on
the leading '>>>' in curtextbegin.
The code must be altered so that the leading '>>>' is skipped when
curtextbegin is passed to forward_delimiter(). The following changes do
the job nicely for plain text and compressed-filtered files. They are
specific to -d with -c, so they cannot break anything else in glimpse.
See the attachment for the original.
The better fix would be to change delim.c's forward_delimiter() itself
so it does not get hung up on the leading D_length-byte delimiter when
OUTTAIL is off. forward_delimiter() is called from a number of other
places, so I'm not touching it until I get more experience with the
glimpse source.
=) (=
====================================================================================================
# diff sgrep.c sgrep.c119
712c712,716
< curtextend = forward_delimiter(curtextbegin/*text-m*/, textend, tc_D_pattern,tc_D_length, OUTTAIL);
---
> if (!OUTTAIL) {
> curtextend = forward_delimiter(curtextbegin+tc_D_length/*text-m*/, textend, tc_D_pattern,tc_D_length, OUTTAIL);
> }else{
> curtextend = forward_delimiter(curtextbegin/*text-m*/, textend, tc_D_pattern,tc_D_length, OUTTAIL);
> }
725c729,733
< curtextend = forward_delimiter(curtextbegin/*text-m*/, textend, D_pattern, D_length,OUTTAIL);
---
> if (!OUTTAIL) {
> curtextend = forward_delimiter(curtextbegin+D_length/*text-m*/, textend, D_pattern,D_length, OUTTAIL);
> }else{
> curtextend = forward_delimiter(curtextbegin/*text-m*/, textend, D_pattern, D_length,OUTTAIL);
> }
====================================================================================================