
Re: Problems indexing



15072 is an odd number; did you check .wg_err for reasons they were not
indexed, or .wg_log for any entries associated with the missing files?
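A minimal sketch of that log check, assuming .wg_err and .wg_log are plain-text files kept in the archive directory (/your/archive/directory below is a placeholder). The helper name check_logs is hypothetical; it simply greps both logs for each file name you give it:

```shell
#!/bin/sh
# Grep the Webglimpse logs for each file name given after the archive dir.
# Assumption: .wg_err and .wg_log are plain-text files in that directory.
# No 'local' so it runs under plain POSIX sh; prefixed names avoid clashes.
check_logs() {
    cl_dir=$1
    shift
    for cl_name in "$@"; do
        echo "== $cl_name"
        cl_hit=no
        for cl_log in "$cl_dir/.wg_err" "$cl_dir/.wg_log"; do
            [ -f "$cl_log" ] || continue     # skip a log that is absent
            # -F: literal match (dots in file names are not wildcards)
            # -H: prefix each hit with the log file it came from
            if grep -F -H -- "$cl_name" "$cl_log"; then
                cl_hit=yes
            fi
        done
        if [ "$cl_hit" = no ]; then
            echo "   (no log entries)"
        fi
    done
}

# Example (placeholder paths):
#   check_logs /your/archive/directory /dir/some/missing/file.html
```

A file with no entries in either log was probably never handed to the indexer at all, which points back at how the file list was built.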

With a directory index, the file list is generated by the find command.
If you run

	find /dir -type f -follow -print > tmpfile

where "/dir" is the directory you are indexing, do you get all the files
that way?  (To test fully you would have to do all 21, but maybe you
can just try it with the directories that are missing files.)
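To go from a count to actual file names, the find output can be compared with the list Webglimpse built. A rough sketch, assuming the list (for example the archive's .wg_toindex) holds one path per line, spelled exactly the way find prints it; that format is an assumption, and missing_from_list is a hypothetical helper name:

```shell
#!/bin/sh
# Print files that find sees under DIR but that are absent from LIST.
# Assumption: LIST has one path per line, written the same way find
# prints paths (e.g. both absolute).
missing_from_list() {
    mf_dir=$1
    mf_list=$2
    mf_found=$(mktemp) || return 1
    mf_listed=$(mktemp) || { rm -f "$mf_found"; return 1; }
    # Same find invocation as above; sort both sides in the C locale so
    # comm compares them byte for byte
    find "$mf_dir" -type f -follow -print | LC_ALL=C sort > "$mf_found"
    LC_ALL=C sort "$mf_list" > "$mf_listed"
    # comm -23 keeps lines that appear only in the first file:
    # found on disk, but not in the list
    LC_ALL=C comm -23 "$mf_found" "$mf_listed"
    rm -f "$mf_found" "$mf_listed"
}

# Example (placeholder paths):
#   missing_from_list /dir /your/archive/directory/.wg_toindex
```

Sorting both sides the same way matters: comm only works on sorted input, and mixing locales can make identical paths look different.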

--Golda

At 02:57 PM 10/18/02 -0400, Hugh Taylor wrote:
>    I finally found some time to get back and troubleshoot the problem. It
>looks as though the file list being created in .wg_toindex is only getting
>15072 filenames, not the 21,000+ files that are in the directories. (I've
>split the files into 21 directories) Is there a configuration file that
>needs modification? I have set the MaxLocal in archive.cfg to 30000. Thanks. 
>  
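To see which of the 21 directories account for the shortfall, a per-directory count with the find options suggested above could help. A sketch, assuming all 21 directories are direct children of one parent directory; count_per_dir is a hypothetical name:

```shell
#!/bin/sh
# Count regular files in each immediate subdirectory of PARENT and print
# a grand total. Assumption: the indexed directories share one parent.
count_per_dir() {
    cnt_total=0
    for cnt_dir in "$1"/*/; do
        [ -d "$cnt_dir" ] || continue        # glob matched nothing
        # tr strips the padding some wc implementations add
        cnt_n=$(find "$cnt_dir" -type f -follow -print | wc -l | tr -d ' ')
        printf '%6d  %s\n' "$cnt_n" "$cnt_dir"
        cnt_total=$((cnt_total + cnt_n))
    done
    printf '%6d  total\n' "$cnt_total"
}

# Example (placeholder path):
#   count_per_dir /dir
```

If the total matches the 21,000+ on disk, find itself is seeing every file and the loss happens later in the indexing run.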
> On Wed, 2002-09-25 at 13:47, Golda Velez wrote:
>> At 01:24 PM 9/24/02 -0400, Hugh Taylor wrote:
>>>OK, I now have two of my archives working - there were some problems
>>>with file ownership (I changed them all to owner = admin, and group =
>>>home, permissions, etc.). I added -M 16 to the lines that call
>>>glimpseindex in wgreindex and removed the .glimpse directory from the
>>>directory being archived, and I changed the MaxLocal line in archive.cfg.
>>
>> Hm - the .glimpse* files should not be anywhere near the documents being
>> indexed - Webglimpse 2.X puts all the archive-related files in their own
>> space.  Unless you have your own .glimpse directory somewhere?
>>
>> General note - if you do use the -M switch to speed things up, for a
>> large archive larger settings are better, up to -M 200 and higher if you
>> have the memory (that would be 200 MB).  This lets glimpseindex use more
>> memory for scratch space while indexing.  Of course don't do it if
>> you're using swap for memory - defeats the purpose!
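A sketch of picking an -M value that should still fit in physical memory, in the spirit of the swap warning above. Assumptions: a Linux system whose /proc/meminfo has a MemAvailable line (newer kernels only), and an arbitrary safety margin of half the available memory; suggest_m is a hypothetical name:

```shell
#!/bin/sh
# Suggest a glimpseindex -M value (MB of scratch memory) that stays
# within RAM so indexing does not spill into swap.
# Assumptions: Linux /proc/meminfo with MemAvailable; keep half of the
# available memory free for everything else (arbitrary margin).
suggest_m() {
    sm_want=${1:-200}                    # desired -M value in MB
    sm_meminfo=${2:-/proc/meminfo}
    sm_avail_kb=$(awk '$1 == "MemAvailable:" { print $2 }' "$sm_meminfo" 2>/dev/null) || sm_avail_kb=
    if [ -z "$sm_avail_kb" ]; then
        echo "no MemAvailable in $sm_meminfo; keeping -M $sm_want" >&2
        echo "$sm_want"
        return 0
    fi
    sm_cap=$((sm_avail_kb / 1024 / 2))   # kB -> MB, then halve
    if [ "$sm_want" -gt "$sm_cap" ]; then
        sm_want=$sm_cap
    fi
    echo "$sm_want"
}

# Example:
#   suggest_m 200
```

The printed number is what would go after -M on the glimpseindex lines in wgreindex.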
>>
>>>I still have one problem: the third archive, which is supposed to
>>>include our whole website, is not created correctly.  Running wgreindex
>>>seems to work, but when it is done and I try to search the archive, I
>>>get an error that glimpse_partitions cannot be found.  Any ideas on how
>>>to fix this?
>>
>> Doesn't sound good.  Try running the wgreindex script by hand and look
>> for error messages.  You can pipe the output to a file and just look at
>> the error output like so
>>
>>	cd /your/archive/directory > wgout
>>
>>>On another note, I can't modify the search page wgall to hide the Jump
>>>to Line and Use Filters checkboxes.  I want to hide Use Filters because
>>>I didn't index anything but .html.  Jump to Line just doesn't work; I
>>>get a script file not found error.  It looks as though it is trying to
>>>run <>.  And if I do get my search page to work, can I copy it over
>>>wgall.html in all the archive directories?
>>
>> Hm - hiding the checkboxes is just a matter of changing
>>
>>	TYPE=CHECKBOX
>> to
>>	TYPE=HIDDEN
>>
>> in the html code.
>>
>> You can certainly use any search page you want.  If you want to make
>> changes to the templates to be used for future archives, then you need
>> to edit
>>
>>	/your/webglimpse/home/templates/wgall.html
>>
>> (The default location would be /usr/local/wg2/templates/wgall.html.)
>> That is the template used to create archives.  Note, it is overwritten
>> when you upgrade webglimpse, so if you do make changes to it make sure
>> to set the permissions to not writable, or make a backup for the next
>> time you upgrade.
>>
>> The way mfs.cgi finds the archive has been changed recently; what
>> version of Webglimpse are you using?
>>
>> Hope some of that helps...
>>
>> --Golda
>>
>>>Thanks for any help.
>>>--
>>>Chemical Propulsion Information Agency
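One way to follow the run-it-by-hand suggestion and then check for the file the search complained about. This assumes the archive directory contains an executable script named wgreindex and that the glimpse index files, including .glimpse_partitions, are written into that same directory; both are assumptions, as is the helper name reindex_by_hand:

```shell
#!/bin/sh
# Rebuild one archive by hand: normal output goes to wgout, so only error
# messages reach the terminal. Then report whether .glimpse_partitions
# exists. Assumptions: ./wgreindex lives in the archive directory and the
# index files are written there.
reindex_by_hand() {
    (
        cd "$1" || exit 1
        # stdout -> wgout; stderr is left on the terminal
        if ./wgreindex > wgout; then rb_status=0; else rb_status=$?; fi
        if [ -f .glimpse_partitions ]; then
            echo "found $1/.glimpse_partitions"
        else
            echo "missing $1/.glimpse_partitions" >&2
        fi
        exit "$rb_status"            # pass wgreindex's status back
    )
}

# Example (placeholder path):
#   reindex_by_hand /your/archive/directory
```

Running it in a subshell keeps the cd from changing the caller's working directory.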
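For the template advice, a sketch that makes a dated backup of a customized wgall.html and then drops its write permission. The default path is the one quoted in this thread; the backup naming scheme and the helper name protect_template are hypothetical:

```shell
#!/bin/sh
# Keep a customized wgall.html template safe across Webglimpse upgrades:
# save a dated copy, then remove write permission from the template.
protect_template() {
    pt_tmpl=${1:-/usr/local/wg2/templates/wgall.html}
    if [ ! -f "$pt_tmpl" ]; then
        echo "no such template: $pt_tmpl" >&2
        return 1
    fi
    pt_backup="$pt_tmpl.$(date +%Y%m%d).bak"
    cp -p "$pt_tmpl" "$pt_backup" || return 1
    # Read-only guards against an accidental overwrite, but root (or an
    # installer that deletes and recreates the file) ignores it, so the
    # dated backup is the real safety net.
    chmod a-w "$pt_tmpl"
    echo "$pt_backup"
}

# Example:
#   protect_template /your/webglimpse/home/templates/wgall.html
```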
>> ------------------------------------------------------------
>> Golda Velez         (use contact form)       626-792-9277
>> Internet Workshop                          http://iwhome.com
>> Webglimpse Search Software             http://webglimpse.net
>> 		~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>  Help organize the world - index your own corner of the web
>--
>Hugh R. Taylor
>Information Systems Supervisor
>Chemical Propulsion Information Agency
>10630 Little Patuxent Parkway, Suite 202
>Columbia, MD 21044-3204
>Phone: (410)992-9952  Fax: (410)730-4969
>mailto: hrt@jhu.edu
>http://www.cpia.jhu.edu
------------------------------------------------------------
Golda Velez         (use contact form)       626-792-9277
Internet Workshop                          http://iwhome.com
Webglimpse Search Software             http://webglimpse.net
		~~~~~~~~~~~~~~~~~~~~~~~~~~~
 Help organize the world - index your own corner of the web