Re: Fix for Bug119: -c + -d Doubles Hist Count
Thanks, Morey! I tried this out on some sample files, and it seems to be right
on the money. For now I just used your fix below, rather than trying to
mess with delim.c at this point - it's safe and it works. Anyone who wants
the convenience can download the patched sgrep.c from the developer page,
http://webglimpse.net/dev/
However, I wonder if this could also be related to bug # 120 -
http://webglimpse.net/dev/bugs/bug120.txt . It's not fixed by the
patched sgrep, but maybe a fix to forward_delimiter() would do it?
If you have a clear idea what should be fixed, I'll be happy to take the
patch through some heavy testing and make sure nothing else breaks...
--G
At 02:30 PM 9/26/99 -0400, root@mhubin.ne.mediaone.net wrote:
>
> mhubin@mediaone.net
> Morey Hubin
>
>
> Hi Cath,
>
> I saw your glimpse bug dating back to late December and took a crack
>at it. Using '>>>' as a record delimiter, or any other delimiter, in a
>regular text file gives double the hit count.
> glimpse -H index_dir -c -d '>>>' PLATYPUS
>
>WORKAROUND:
> The short answer is that it is a problem with delimiter alignment inside
>agrep. The immediate solution is to add the '-t' option whenever you use
>-d and -c together. This will properly adjust the alignment internally
>and give you the correct hit count. -t simply prints the '>>>' delimiter
>found at the end of the record rather than the beginning, but since you
>are not printing the record it never shows.
>
>
>SOLUTION:
> The technical explanation and proper solution follows:
>sgrep.c's function bm() line 724 calls two functions in succession
>1) curtextbegin = backward_delimiter(text,......);
>2) curtextend = forward_delimiter(curtextbegin,...);
>
> That is, 1) searches backward (to the left) in the file for the leading
>'>>>' delimiter, and 2) starts from there and searches forward (to the
>right) for the next '>>>'.
>
> The problem arises because 1) leaves the '>>>' at the beginning of
>curtextbegin so 2) also finds the leading '>>>' and nothing is done in 2).
>(i.e. curtextbegin == curtextend). This means that it takes two loops to
>pass out of the current record (until we get to backward_delimiter again).
>Both loops increment the number of hits and presto, double the count.
>
> Using -t causes agrep (and glimpse) to take the trailing delimiter.
>In this case 1) and 2) work properly because 2) does not get stuck on
>the leading '>>>' in curtextbegin.
>
> The code must be altered so that the leading '>>>' is not left when
>curtextbegin is passed to forward_delimiter(). The following additions do
>the job nicely for plain text and compressed-filtered files. These are
>also -d&-c specific so cannot break anything else in glimpse.
>
> See the attachment for the original.
>
> The better fix would be to properly fix delim.c's forward_delimiter()
>to not get hung up on the leading 'D_length' delimiter if OUTTAIL is ON.
>forward_delimiter is called from a number of other places so I'm not
>touching it until I get more experience with glimpse source.
> =) (=
>
>===========================================================================
># diff sgrep.c sgrep.c119
>712c712,716
>< curtextend = forward_delimiter(curtextbegin/*text-m*/, textend, tc_D_pattern,tc_D_length, OUTTAIL);
>---
>> if (!OUTTAIL) {
>> curtextend = forward_delimiter(curtextbegin+D_length/*text-m*/, textend, tc_D_pattern,tc_D_length, OUTTAIL);
>> }else{
>> curtextend = forward_delimiter(curtextbegin/*text-m*/, textend, tc_D_pattern,tc_D_length, OUTTAIL);
>> }
>725c729,733
>< curtextend = forward_delimiter(curtextbegin/*text-m*/, textend, D_pattern, D_length,OUTTAIL);
>---
>> if (!OUTTAIL) {
>> curtextend = forward_delimiter(curtextbegin+D_length/*text-m*/, textend, D_pattern,D_length, OUTTAIL);
>> }else{
>> curtextend = forward_delimiter(curtextbegin/*text-m*/, textend, D_pattern, D_length,OUTTAIL);
>> }
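
For anyone following along, here is a small standalone sketch of the
alignment problem Morey describes. It is not the glimpse code:
find_delim_forward() and record_length() are hypothetical stand-ins I made
up for illustration, and the real forward_delimiter() in delim.c takes
different arguments. It only shows that a forward search started on the
leading delimiter stops immediately (curtextbegin == curtextend), while a
search started D_length bytes later reaches the next delimiter.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for forward_delimiter(): returns the offset of the
 * first occurrence of delim at or after `from`, or len if there is none.
 * This is NOT the signature used in delim.c. */
static size_t find_delim_forward(const char *text, size_t len, size_t from,
                                 const char *delim, size_t dlen)
{
    for (size_t i = from; i + dlen <= len; i++) {
        if (memcmp(text + i, delim, dlen) == 0)
            return i;
    }
    return len;
}

/* Distance from `begin` (the offset of a leading delimiter, playing the role
 * of curtextbegin) to the delimiter found by the forward search.
 * skip_leading == 0 models the original call (search starts at begin);
 * skip_leading != 0 models the patched call (search starts at begin + dlen). */
static size_t record_length(const char *text, size_t begin,
                            const char *delim, int skip_leading)
{
    size_t len = strlen(text);
    size_t dlen = strlen(delim);
    size_t from = skip_leading ? begin + dlen : begin;
    size_t end = find_delim_forward(text, len, from, delim, dlen);
    return end - begin;
}
```

With the text ">>>PLATYPUS one\n>>>other\n" and begin = 0,
record_length(text, 0, ">>>", 0) is 0 (the search stops on the leading
delimiter), and record_length(text, 0, ">>>", 1) is 16 (the search reaches
the second '>>>').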
------------------------------------------------------------
Golda Velez gvelez@tucson.com 520-620-6878
Internet Workshop http://tucson.com
Webglimpse Search Software http://webglimpse.net
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Help organize the world - index your own corner of the web