
RE: PDF Indexing



Hi Golda,

I checked the -z option in wgreindex, and it is included in both places where
glimpseindex is called.

I may have another workaround for this perplexing problem.  I found a direct
translator from PDF to text.  It's called xpdf, and it doesn't need ghostview.
The package is freeware.  I just modified the processpdf.pl script to use it,
and it seems to work.  I haven't done much testing yet, but the performance
seems comparable.
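
For anyone wanting to try the same approach, here is a rough sketch of the idea in Python rather than the actual processpdf.pl change, which isn't shown in this message.  It assumes the xpdf converter is installed as "pdftotext" and that it accepts "-" as the output file to write to stdout; the function names are made up for illustration.  The second helper is only a crude check for the symptom quoted below (raw PDF objects showing up in the matched lines).

```python
import subprocess

def pdftotext_command(pdf_path, exe="pdftotext"):
    """Build the argument list for the converter (hypothetical helper).

    Assumption: the xpdf tool is on PATH as 'pdftotext' and treats a
    trailing '-' as "write the extracted text to stdout".
    """
    return [exe, pdf_path, "-"]

def pdf_to_text(pdf_path, exe="pdftotext"):
    """Run the converter and return the extracted text; raises on failure."""
    result = subprocess.run(
        pdftotext_command(pdf_path, exe),
        capture_output=True,
        check=True,
    )
    # PDF text may not be UTF-8; latin-1 never fails to decode.
    return result.stdout.decode("latin-1")

def looks_like_raw_pdf(text):
    """Crude check: does the text contain PDF object syntax, not prose?"""
    return "endobj" in text and " obj <<" in text
```

If looks_like_raw_pdf() returns True for text that is about to be indexed, the conversion step was probably skipped or failed for that file.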

Steve

-----Original Message-----
From: Golda Velez [mailto:gvelez@tucson.com]
Sent: Sunday, May 23, 1999 3:49 AM
To: Wix, Steven D
Subject: Re: PDF Indexing


Hi Steven - I got your files, but actually I don't have ghostscript
installed myself. Are you sure that glimpseindex has the -z option in
wgreindex?  Remember, this is not installed automatically; you have to do
it by hand every time.  I'll change that in the next release.

--G 

At 08:55 AM 5/19/99 -0600, you wrote:
>Golda,
>
>I've been working a bit on that lingering PDF problem I've been having on a
>specific SUN platform.
>The OS is solaris 5.5.1 and the hardware is an ULTRA-2.
>
>I updated the webglimpse software to 1.7.1 and glimpse is 4.1.  Perl is
>5.005_02, ghostscript is 5.50 and pstotext is 1.8
>
>Here's what I get when I search for the word Micro.  The results look like
>stuff in the native pdf language. Webglimpse is picking up the word micro
>from the line containing /Author Microsim on the 5th line.
>
>Search Results
>
>Query was: Micro    Search on entire archive
>File name (modification date), and list of matched lines
>
> http://sass2152.csu891.sandia.gov/TESTPDF2/docs/GETSTAR.PDF, Mar 6 1999
> endstream endobj 549 0 obj << /Type /Pages /Kids [ 562 0 R 1 0 R 19 0 R 37
>0 R 51 0 R 59 0 R ] /Count 6 /Parent 550 0 R >>
> endobj 550 0 obj << /Type /Pages /Kids [ 549 0 R 551 0 R 553 0 R 554 0 R
>555 0 R 556 0 R 557 0 R 558 0 R ] /Count 51 >>
> endobj 551 0 obj << /Type /Pages /Kids [ 67 0 R 75 0 R 86 0 R 94 0 R 104 0
>R 115 0 R 126 0 R ] /Count 7 /Parent 550 0 R >>
> endobj 552 0 obj << /CreationDate (D:19970623093430) /Producer (Acrobat
>Distiller 2.1 for Windows) /Title (Getting Started with
> MicroSim) /Author (MicroSim Corporation) /ModDate (D:19970709142905) >>
>endobj 553 0 obj << /Type /Pages /Kids [ 137 0
> R 149 0 R 161 0 R 171 0 R 184 0 R ] /Count 5 /Parent 550 0 R >> endobj 554
>0 obj << /Type /Pages /Kids [ 194 0 R 205 0 R
> 215 0 R 225 0 R 235 0 R ] /Count 5 /Parent 550 0 R >> endobj 555 0 obj <<
>/Type /Pages /Kids [ 245 0 R 255 0 R 268 0 R 278
> 0 R 288 0 R 299 0 R 310 0 R 321 0 R ] /Count 8 /Parent 550 0 R >> endobj
>556 0 obj << /Type /Pages /Kids [ 332 0 R 346 0 R
> 360 0 R 370 0 R 381 0 R ] /Count 5 /Parent 550 0 R >> endobj 557 0 obj <<
>/Type /Pages /Kids [ 391 0 R 403 0 R 413 0 R 426
> 0 R 436 0 R ] /Count 5 /Parent 550 0 R >> endobj 558 0 obj << /Type /Pages
>/Kids [ 450 0 R 458 0 R 466 0 R 474 0 R 482 0 R
> 490 0 R 498 0 R 506 0 R 514 0 R 522 0 R ] /Count 10 /Parent 550 0 R >>
>endobj xref 0 559 0000000000 65535 f 
>
>
>
>I've also attached the PDF file.  Can you index the attached pdf file and
>perform a search on Micro?  I'm curious about the results you get and
>whether the search results can be duplicated.
>
> <<GETSTAR.PDF>> 
>Thanks
>
>_______________________________________________________
>
> Steven D. Wix
> Software R&D, Information Specialist
> SANDIA NATIONAL LABORATORIES
> P. O. Box 5800, MS 0525
> Albuquerque, NM 87185-0525
> phone: (505) 844-0778
> fax: (505) 844-8168
>
> Sandia National Laboratories
> Component Information and Models Department 1734
> e-mail: sdwix@sandia.gov
>
> "If you build it, they will come"                    Go Gators!
> "Go the distance"
>_______________________________________________________
>
>
>
------------------------------------------------------------------
Golda Velez	     mailto:gvelez@tucson.com	Ph. (520) 620-6878
Internet WorkShop    http://tucson.com		FAX (520) 620-6841