Help for Edit Site
The Edit Site page edits a Site-type root. A Site-type root is an index of pages generated by traversing links from the specified start page, limited to pages on the same server as the start page. Links to other servers or domains are not indexed. The list of pages can be further limited to a particular subdirectory on the starting server by editing the LimitPrefix directive by hand in the archive.cfg file.
"Traversing links" means following <A HREF...> and similar tags, just as if the program were a user clicking on links on the page. <FRAME> and <IMAGEMAP> tags are also traversed.
Site URL: specifies the starting page for the index. The domain of this starting page will be used to limit the pages included in the index.
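The same-server restriction can be pictured as a host comparison between the start page and each candidate link. The sketch below is an assumption about how such a check might look, not the program's actual rule: the function name same_server is hypothetical, and the comparison deliberately ignores the port and the scheme.

```python
from urllib.parse import urlsplit


def same_server(start_url, candidate_url):
    """True when both URLs name the same host (compared case-insensitively)."""
    # urlsplit(...).hostname is already lowercased and excludes any port.
    return urlsplit(start_url).hostname == urlsplit(candidate_url).hostname
```

With a Site URL of http://www.mysite.com/index.html, a link to http://www.mysite.com/a/b.html would pass this check and a link to http://www.other.com/ would not.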
Max # Pages: sets the maximum number of pages to index for this site. For example, if this is set to 100, then only the first 100 pages traversed will be indexed.
Max # Hops: limits the depth to which links will be traversed. For example, if this is set to 1, then only pages linked directly from the starting page will be indexed. For a Site-type root, this can safely be set to a large value, since pages on other servers will not be gathered or indexed.
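The interaction of the page limit and the hop limit can be sketched as a breadth-first traversal. This is a minimal illustration under stated assumptions, not the program's algorithm: the function crawl and its get_links callback are hypothetical, the start page is assumed to be hop 0, and the start page is assumed to count toward the page limit.

```python
from collections import deque


def crawl(start, get_links, max_pages, max_hops):
    """Breadth-first traversal; returns pages in the order they were indexed."""
    indexed = []
    seen = {start}
    queue = deque([(start, 0)])  # (url, hops from the start page)
    while queue and len(indexed) < max_pages:
        url, hops = queue.popleft()
        indexed.append(url)
        if hops == max_hops:
            continue  # pages at the hop limit are indexed but not followed
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, hops + 1))
    return indexed
```

In this sketch, max_hops=1 yields the start page plus the pages it links to directly, and max_pages stops the traversal as soon as that many pages have been indexed, whichever limit is reached first.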
Index only links starting with: limits which links are traversed, by URL prefix. For example, if this is set to http://www.mysite.com then only links starting with http://www.mysite.com will be traversed. Note that relative links will have been transformed into full URLs before the comparison.
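The order of operations described here (resolve relative links first, then compare against the prefix) can be sketched as follows. The function name links_to_traverse is hypothetical, and Python's urljoin stands in for whatever resolution the program actually performs.

```python
from urllib.parse import urljoin


def links_to_traverse(page_url, hrefs, prefix):
    """Resolve each href against page_url, then keep those starting with prefix."""
    absolute = [urljoin(page_url, href) for href in hrefs]
    return [url for url in absolute if url.startswith(prefix)]
```

Because a plain prefix test compares characters only, a prefix without a trailing slash, such as http://www.mysite.com, would also accept a URL like http://www.mysite.com.example.org/ in this sketch. Ending the prefix with a slash avoids that.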
The above is a regular expression: if checked, the string above is taken to be a regular expression instead of a fixed prefix string. For example, you could use http:\/\/([^\.]*\.)*somedomain\.com(\/|$) to index all subdomains of somedomain.com.
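To check a pattern like this before entering it, it can be tried against a few sample URLs. The sketch below uses Python's re module, which may differ in detail from the regular expression engine the program uses. The pattern is written with the dot in somedomain\.com escaped and the ending grouped as (\/|$), which is assumed to be the intended form; the helper name is_indexed is hypothetical.

```python
import re

# Subdomain pattern: any number of "label." parts, then somedomain.com,
# followed by either a slash or the end of the URL.
SUBDOMAIN_PATTERN = re.compile(r"http:\/\/([^\.]*\.)*somedomain\.com(\/|$)")


def is_indexed(url):
    """True when the pattern matches at the start of the URL."""
    return SUBDOMAIN_PATTERN.match(url) is not None


for url in [
    "http://somedomain.com",
    "http://www.somedomain.com/index.html",
    "http://a.b.somedomain.com/x",
    "http://www.somedomain.com.example.org/",
    "http://notsomedomain.com/",
]:
    print(url, is_indexed(url))
```

The first three URLs match and the last two do not. Without the grouped ending, a host that merely begins with somedomain.com would also be accepted.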