Re: Problems indexing
15072 is an odd number; did you check .wg_err for reasons they were not
indexed, or .wg_log for any entries associated with the missing files?
With a directory index, the file list is generated from the find command.
If you run
find /dir -type f -follow -print > tmpfile
where "/dir" is the directory you are indexing, do you get all the files
that way? (It sounds like a full test would mean running it on all 21
directories, but maybe you can just try it on the ones that are missing files.)
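
A rough sketch of that comparison, in case it saves some typing. It assumes the file list (.wg_toindex) holds one path per line, written the same way find prints it; I have not confirmed that format, so adjust as needed. The function name missing_from_index is made up for this example, and it uses find -L (the POSIX way to follow symlinks) instead of the -follow form above.

```shell
#!/bin/sh
# Sketch only: print files that find sees under a directory but that
# do not appear in a given file list.
# Assumption: the list has one path per line, in the same form find prints.
missing_from_index() {
    dir=$1     # directory being indexed
    list=$2    # file list to compare against, e.g. the archive's .wg_toindex
    found=$(mktemp) || return 1
    listed=$(mktemp) || { rm -f "$found"; return 1; }
    # -L follows symbolic links while walking the tree
    find -L "$dir" -type f -print | sort > "$found"
    sort "$list" > "$listed"
    # comm -23 keeps only the lines unique to the first (find) listing
    comm -23 "$found" "$listed"
    rm -f "$found" "$listed"
}
```

Something like `missing_from_index /dir /your/archive/directory/.wg_toindex > missing.txt` would then give a list to look through, and `grep -F -f missing.txt .wg_err .wg_log` might show whether any of those names turn up in the error or log files (again assuming the names appear there verbatim).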
--Golda
At 02:57 PM 10/18/02 -0400, Hugh Taylor wrote:
> I finally found some time to get back and troubleshoot the problem. It
>looks as though the file list being created in .wg_toindex is only getting
>15072 filenames, not the 21,000+ files that are in the directories. (I've
>split the files into 21 directories.) Is there a configuration file that
>needs modification? I have set the MaxLocal in archive.cfg to 30000. Thanks.
>
> On Wed, 2002-09-25 at 13:47, Golda Velez wrote:
>> At 01:24 PM 9/24/02 -0400, Hugh Taylor wrote:
>> >OK, I now have two of my archives working - there were some problems
>> >with file ownership (I changed them all to owner = admin, and group =
>> >home), permissions, etc. I added -M 16 to the lines that call
>> >glimpseindex in wgreindex and removed the .glimpse directory from the
>> >directory being archived, and I changed the MaxLocal line in
>> >archive.cfg.
>>
>> Hm - the .glimpse* files should not be anywhere near the documents being
>> indexed - Webglimpse 2.X puts all the archive-related files in their own
>> space. Unless you have your own .glimpse directory somewhere?
>>
>> General note - if you do use the -M switch to speed things up, for a
>> large archive larger settings are better, up to -M 200 and higher if you
>> have the memory (that would be 200 MB). This lets glimpseindex use more
>> memory for scratch space while indexing. Of course don't do it if you're
>> using swap for memory - defeats the purpose!
>>
>> >I still have one problem: the third archive, which is supposed to
>> >include our whole website, is not created correctly. Running wgreindex
>> >seems to work, but when it is done and I try to search the archive, I
>> >get an error that glimpse_partitions cannot be found. Any ideas on how
>> >to fix this?
>>
>> Doesn't sound good. Try running the wgreindex script by hand and look
>> for error messages. You can pipe the output to a file and just look at
>> the error output like so: cd /your/archive/directory > wgout
>>
>> >On another note, I can't modify the search page wgall to hide the Jump
>> >to Line and Use Filters checkboxes. I want to hide Use Filters because
>> >I didn't index anything but .html. Jump to Line just doesn't work. I
>> >get a script file not found error. It looks as though it is trying to
>> >run <>. And if I do get my search page to work, can I copy it over
>> >wgall.html in all the archive directories?
>>
>> Hm - hiding the checkboxes is just a matter of changing TYPE=CHECKBOX to
>> TYPE=HIDDEN in the html code. You can certainly use any search page you
>> want. If you want to make changes to the templates to be used for future
>> archives, then you need to edit
>> /your/webglimpse/home/templates/wgall.html
>> (The default location would be /usr/local/wg2/templates/wgall.html)
>> That is the template used to create archives. Note, it is overwritten
>> when you upgrade webglimpse, so if you do make changes to it make sure
>> to set the permissions to not writable, or make a backup for the next
>> time you upgrade.
>>
>> The way mfs.cgi finds the archive has been changed recently. What
>> version of Webglimpse are you using?
>>
>> Hope some of that helps...
>> --Golda
>>
>> >Thanks for any help.
>> >--
>> >Chemical Propulsion Information Agency
> --
> Hugh R. Taylor
> Information Systems Supervisor
> Chemical Propulsion Information Agency
> 10630 Little Patuxent Parkway, Suite 202
> Columbia, MD 21044-3204
> Phone: (410)992-9952 Fax: (410)730-4969
> mailto: hrt@jhu.edu
> http://www.cpia.jhu.edu
------------------------------------------------------------
Golda Velez (use contact form) 626-792-9277
Internet Workshop http://iwhome.com
Webglimpse Search Software http://webglimpse.net
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Help organize the world - index your own corner of the web