Help for Add Link Tree
The Add Link Tree page adds a Tree-type root to an archive. A Tree-type root is an index of pages generated by traversing links from the specified start page, limited by the number of "hops" chosen and by the rules for traversing remote sites. The Tree root is the most flexible archive component, but also a complex one; it can index an unexpectedly large number of pages, so the "Max # Remote Pages" limit should be chosen to restrain a runaway gatherer.
"Traversing links" means following <A HREF...> and similar tags, just as if the program were a user clicking on links on the page. <FRAME> and <IMAGEMAP> tags are also traversed.
Start URL: specifies the starting page for the index.
Max # Hops limits the depth to which links will be traversed. For example, if this is set to 1, then only pages linked directly from the starting page will be indexed. For a Tree-type root, this should NOT be set too high; a value of 1 or 2 is usually sufficient. Three hops from yahoo.com would be a lot of pages.
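The effect of the hop limit can be sketched as a breadth-first traversal that stops following links once the limit is reached. The Python below is only an illustration under stated assumptions: the HOP_LINKS table and the pages_within_hops function are hypothetical names, not part of the program, and the sketch says nothing about how the program itself is implemented.

```python
from collections import deque

# Hypothetical link table for illustration: page -> pages it links to.
HOP_LINKS = {
    "start.html": ["one.html", "two.html"],
    "one.html": ["three.html"],
    "two.html": ["three.html", "start.html"],
    "three.html": ["four.html"],
    "four.html": [],
}

def pages_within_hops(start, max_hops, links=HOP_LINKS):
    """Return {page: hops} for every page reachable within max_hops links."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if hops[page] == max_hops:
            continue  # at the limit: do not follow links from this page
        for target in links.get(page, []):
            if target not in hops:  # first visit uses the fewest hops
                hops[target] = hops[page] + 1
                queue.append(target)
    return hops
```

With this table, a limit of 1 reaches only one.html and two.html beyond the start page; each extra hop adds every page linked from the previous layer, which is why the total grows so quickly on a heavily linked site.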
Follow links: these checkboxes give you fairly precise control over how the traversing spider behaves. In the examples below, let http://A.com/a.html be the start page; let A.com be a local server; let there be links from a.html to http://A.com/b.html and http://B.com/b.html; and from b.html to http://B.com/c.html and http://C.com/c.html. Make a drawing if this sounds confusing.
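As an alternative to a drawing, the example links can be written out and each one labeled by whether it stays on the local server or leaves it. This is a sketch only: the names LOCAL_HOSTS, EXAMPLE_LINKS, is_local and classify_links are hypothetical, and reading "b.html" as http://A.com/b.html is an assumption, since the text does not say which of the two b.html pages is meant.

```python
from urllib.parse import urlsplit

# Assumption: A.com is the only local server (hostnames compare in lowercase).
LOCAL_HOSTS = {"a.com"}

# The example links; "b.html" is assumed to mean http://A.com/b.html.
EXAMPLE_LINKS = {
    "http://A.com/a.html": ["http://A.com/b.html", "http://B.com/b.html"],
    "http://A.com/b.html": ["http://B.com/c.html", "http://C.com/c.html"],
}

def is_local(url):
    """True when the URL's host is one of the local servers."""
    return urlsplit(url).hostname in LOCAL_HOSTS

def classify_links(links=EXAMPLE_LINKS):
    """Return (source, target, kind) for every link, kind like 'local -> remote'."""
    labeled = []
    for source, targets in links.items():
        for target in targets:
            source_kind = "local" if is_local(source) else "remote"
            target_kind = "local" if is_local(target) else "remote"
            labeled.append((source, target, source_kind + " -> " + target_kind))
    return labeled

if __name__ == "__main__":
    for source, target, kind in classify_links():
        print(kind, ":", source, "=>", target)
```

Under these assumptions, one link stays on the local server and three lead from a local page to a remote one.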
Max # Local Pages sets the maximum number of pages to index from the local server. The storage requirements are lower for indexing local pages because they do not need to be copied to the local machine before being indexed.
Max # Remote Pages sets the maximum number of pages to gather and index from remote servers. These pages are actually retrieved and stored on the local server as well as being indexed.
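How two separate page limits restrain a traversal can be sketched as follows. This is a hedged illustration, not the program's code: the gather function, the SAMPLE_LINKS table and the hosts in it are hypothetical, and the behavior at the limit (skipping further pages of that kind and not following their links) is an assumption, since the text does not describe it.

```python
from collections import deque
from urllib.parse import urlsplit

# Hypothetical link table for illustration only.
SAMPLE_LINKS = {
    "http://a.com/": ["http://b.com/1", "http://b.com/2", "http://a.com/x"],
    "http://b.com/1": ["http://c.com/1"],
}

def gather(start, links, local_hosts, max_local, max_remote):
    """Breadth-first walk that stops accepting pages of a kind at its cap."""
    counts = {"local": 0, "remote": 0}
    caps = {"local": max_local, "remote": max_remote}
    accepted = []
    seen = {start}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        kind = "local" if urlsplit(url).hostname in local_hosts else "remote"
        if counts[kind] >= caps[kind]:
            continue  # assumed behavior: skip the page, do not follow its links
        counts[kind] += 1
        accepted.append(url)
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return accepted, counts
```

For example, gather("http://a.com/", SAMPLE_LINKS, {"a.com"}, 10, 2) accepts both local pages and the two b.com pages, then skips http://c.com/1 because the remote limit of 2 has already been reached.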
Reindex Freq is NOT actually functional as of version 2.0.04. It will be used to create a crontab fragment that can be manually included in a user's crontab to reindex regularly.