Configuring An Archive

This document assumes you have successfully installed Webglimpse, and are now ready to configure an archive to be searched.

Sections:

  1. What is an archive?
  2. Choosing which documents to index
  3. Using the web interface to create an archive
  4. Searching and testing your archive

What is an archive?

We use the term 'archive' for the collection of documents to be indexed, along with the index itself and associated archive configuration files. By default, Webglimpse will store the indexes and archive-specific configuration files in the directory
/[webglimpse_home]/archives/[N]
where 'N' is the number of the archive. You can create multiple archives with a single installation of Webglimpse. Each archive may include local files from your server's hard drive, remote files retrieved from other servers, and/or files retrieved from your local server through its web interface.
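
As a rough illustration of the layout described above, the following Python sketch lists the numbered archive directories under an installation. The function name and the path you pass in are assumptions made for this example; only the archives/[N] layout comes from the text above, and a non-default installation may store archives elsewhere.

```python
from pathlib import Path


def list_archive_dirs(webglimpse_home):
    """Return the numbered directories under <webglimpse_home>/archives, in numeric order.

    Hypothetical helper: it only inspects the default layout described above
    and does not read any Webglimpse configuration file.
    """
    archives_root = Path(webglimpse_home) / "archives"
    if not archives_root.is_dir():
        return []
    numbered = [
        entry for entry in archives_root.iterdir()
        if entry.is_dir() and entry.name.isdecimal()
    ]
    # Sort numerically so that archive 10 comes after archive 2.
    return sorted(numbered, key=lambda entry: int(entry.name))
```

Calling list_archive_dirs with your own [webglimpse_home] path should return one entry per archive you have created, assuming the default storage location is in use.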

Choosing which documents to index

Before creating your archive, it is a good idea to have a clear picture of exactly which documents you want indexed. While this may sound obvious, it is not always so. Do you want to index all the files linked in your site, all the files in a certain directory, or all links from your site to other sites? If indexing by directory, do you have 'junk' lying around that you will need to either clean up or exclude from the index? If indexing by site, is there a clear definition of the files in your 'site'? For example, is it everything starting with http://yourdomain.com and linked to from that starting page (in any number of clicks)? Do you want to index only .html files, or also PDF, .doc, or compressed files?
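
If you plan to index by directory, one way to answer these questions is to take a quick inventory of the file types present before you build anything. The sketch below is not part of Webglimpse; it is a small standalone Python helper (the name count_extensions is made up for this example) that counts files by extension under a directory, so that stray backup or temporary files stand out.

```python
from collections import Counter
from pathlib import Path


def count_extensions(root):
    """Count the files under `root` (recursively), grouped by lowercased extension."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            # Files without an extension are grouped under "(none)".
            counts[path.suffix.lower() or "(none)"] += 1
    return counts
```

Printing count_extensions(your_directory).most_common() gives a list you can scan for extensions you did not expect, such as .bak or .tmp, before deciding what to clean up or exclude.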

If you need to index .doc or PDF files, please see howto_word.html and howto_xpdf.html, respectively, for instructions on installing the helper programs (filters) needed.

Using the web interface to create an archive

Step 1: Run wgarcmin.cgi

At the completion of the install, you should have been given the URL where you chose to install wgarcmin.cgi, likely something like

http://yourserver.com/cgi-bin/wgarcmin.cgi
or
http://yourserver.com/cgi-bin/webglimpse/wgarcmin.cgi
depending on where you chose to install the CGI programs. Open this URL in your browser. You will be prompted for the administrative username and password you set during the install. If you have forgotten your password, you can set a new one using the htpasswd utility that comes with the Apache web server. Run
htpasswd /[webglimpse_home]/.wgpasswd [newuser]
Supplying your existing username sets a new password for it; supplying a new username adds an additional admin user. You will need to run this command as the user that installed the program, or as root.
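
If you are unsure which admin usernames already exist, you can read them from the password file. The sketch below assumes the file is in the standard htpasswd format (one username:hash entry per line), which is what the htpasswd utility writes; the function name is made up for this example, and the script needs read access to the file.

```python
def list_htpasswd_users(passwd_path):
    """Return the usernames found in an htpasswd-style file, in file order."""
    users = []
    with open(passwd_path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            # Defensively skip blank lines, comment lines, and anything
            # that does not look like a "username:hash" entry.
            if not line or line.startswith("#") or ":" not in line:
                continue
            users.append(line.split(":", 1)[0])
    return users
```

The hashes themselves are never printed or decoded; the helper only reports the names before the first colon.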

Step 2: Add New Archive

Now, assuming you can run wgarcmin.cgi, you will see the initial 'Manage Archives' screen; if this is your first time, then the drop-down list will be empty. Press the Add New Archive button at the top of the screen.

Enter any Title, Category and Description you choose for the new archive. The Title you enter will be displayed in the search forms and result page. You may also choose a language at this time.

Step 3: Choose Indexing Method

Press one of the buttons: "Index by Directory", "Index by Site", or "Index by Traversing Link". This is the main choice you need to make when creating an archive: how to determine the files to be indexed. The questions under 'Choosing which documents to index' above are the things to consider when making it.

Step 4: Specify URL or files to index

In the "Add Directory/Site/Tree" screen, enter the URL for the files you want to index, along with the other information requested. The context-sensitive help, shown by pressing 'help how does this work', describes each field in detail.

Step 5: Build the Index

You should now be at the Manage Archive ... screen. If the status message says 'Ready to build index', simply press the 'Build Index' button in the lower part of the screen. Note that if you have chosen a Directory type index and specified a domain name different from the one you used during the install, you may need to configure that domain name so that Webglimpse knows how to relate URLs to files on your server. The Configure Domain buttons are on the main 'Manage Archives' screen, reachable from the 'WgMin Home' link. For most installations this is not necessary, but it may be needed when multiple virtual domains are hosted on a single server.
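
To see why a domain may need configuring, it helps to picture the URL-to-file relationship as a lookup from domain name to document directory. The following Python sketch illustrates the idea only; it is not how Webglimpse itself stores or applies domain settings, and the domain names and directories in DOMAIN_ROOTS are hypothetical.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlsplit

# Hypothetical mapping from virtual domain to document directory.
# These values are placeholders, not read from any Webglimpse file.
DOMAIN_ROOTS = {
    "yourdomain.com": "/var/www/yourdomain",
    "otherdomain.com": "/var/www/otherdomain",
}


def url_to_file(url, domain_roots=DOMAIN_ROOTS):
    """Map a URL to a local file path, or return None for an unconfigured domain."""
    parts = urlsplit(url)
    root = domain_roots.get((parts.hostname or "").lower())
    if root is None:
        # Without an entry for this domain there is no way to relate
        # the URL to a file on the server.
        return None
    relative = unquote(parts.path).lstrip("/")
    return str(PurePosixPath(root) / relative)
```

With a single domain the lookup has only one possible answer, which is why most installations need no extra configuration. With several virtual domains on one server, each domain needs its own entry.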

The archive will build in the background. You can press "Update Status" or reload the page until the 'Last Built...' message appears with the number of files indexed.

Searching and testing your archive

From the 'Manage Archive # X' screen, you can simply press the 'Search Archive' button to search your archive.

To integrate the search interface into your website, click on the link titled Add a search box or page to your website. This link will give you the exact code you need to cut and paste into your website to search this particular archive, and will also tell you the exact locations on your server where the sample search forms reside. You may edit those forms in any way you like, as long as you preserve the FORM tag and the names of the input tags.
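
After editing a sample form, you may want to confirm that the FORM tag and the input names survived your changes. The sketch below is one possible check, written with Python's standard html.parser module; the class and function names are made up for this example. It compares the attributes of the first FORM tag and the set of INPUT names in the original and edited HTML, and it deliberately looks only at input tags, as the text above describes.

```python
from html.parser import HTMLParser


class FormFieldCollector(HTMLParser):
    """Record the attributes of the first <form> tag and the names of all <input> tags."""

    def __init__(self):
        super().__init__()
        self.form_attrs = None
        self.input_names = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <FORM> and <form> match.
        attr_map = dict(attrs)
        if tag == "form" and self.form_attrs is None:
            self.form_attrs = attr_map
        elif tag == "input" and attr_map.get("name"):
            self.input_names.add(attr_map["name"])


def summarize_form(html):
    """Return (form attributes, set of input names) for a piece of HTML."""
    collector = FormFieldCollector()
    collector.feed(html)
    collector.close()
    return collector.form_attrs, collector.input_names


def form_preserved(original_html, edited_html):
    """True if the edited HTML keeps the same FORM attributes and input names."""
    return summarize_form(original_html) == summarize_form(edited_html)
```

This check is strict: adding an attribute to the FORM tag, or adding a new named input, also counts as a difference. Changes to surrounding markup, labels, and unnamed inputs such as a plain submit button are ignored.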

