
Perl of makenh



Can you tell me whether there is a problem with this code from makenh:

   open(ROBOTFILE, $TEMPROBOTFILE);  # assume it'll work
   while(<ROBOTFILE>){
      s/\#.*$//;                # remove comments

      if(/^User-agent:.*\W$ROBOTNAME\W/io ||
         /^User-agent:\s*[*]/io){
         # check for paths
         print LOGFILE " Found reference to this robot in robot file\n";
         while(<ROBOTFILE>){
            if(/^Disallow:\s*(\S+)\s*(\#.*)?/){
               print LOGFILE " Robot disallowed for $1\n";
               push(@paths, $1);
            }else{
               last;  # we're done with the record
            }
         }
      }
   }

Our robots.txt file looks like this:

User-agent: GPOHTTPGET
Disallow:

User-agent: *
Disallow: /

We set ROBOTNAME to GPOHTTPGET. The code appears to match our User-agent line
correctly and to read the first, empty Disallow:, which should leave nothing
disallowed for our robot. But it also appears to pick up the second record (the
one for the asterisk) and then disallow everything.
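
To check that reading, here is a minimal reproduction sketch. It assumes the
loop behaves the same when fed an in-memory copy of our robots.txt. The names
$robots_txt and $fh are stand-ins of mine, not names from makenh, and I dropped
the /o flag and added \Q...\E around the robot name only so the sketch runs
cleanly under strict.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the makenh setting.
my $ROBOTNAME = 'GPOHTTPGET';

# In-memory copy of our robots.txt (assumption: same content as the real file).
my $robots_txt = <<'END';
User-agent: GPOHTTPGET
Disallow:

User-agent: *
Disallow: /
END

my @paths;
open(my $fh, '<', \$robots_txt) or die "cannot open in-memory file: $!";
while (<$fh>) {
    s/\#.*$//;                                   # remove comments
    if (/^User-agent:.*\W\Q$ROBOTNAME\E\W/i ||   # record naming our robot
        /^User-agent:\s*[*]/i) {                 # or the wildcard record
        while (<$fh>) {
            if (/^Disallow:\s*(\S+)\s*(\#.*)?/) {
                push(@paths, $1);
            } else {
                last;                            # an empty "Disallow:" also lands here
            }
        }
    }
}
close($fh);

print "disallowed: @paths\n";
```

Run on its own, this prints "disallowed: /". The empty Disallow: ends the first
record without adding a path, and the outer loop then goes on to match the
"User-agent: *" record and pushes "/" onto the same list.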

The way I read the standard, this robots.txt file should allow GPOHTTPGET to
index the whole site while excluding everybody else.
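
As a sketch of that reading, the record that names the robot could take
precedence, with the "*" record used only when no record names it. This is not
the makenh code; disallowed_paths() and $robots_txt are hypothetical names, and
the case-insensitive substring match on the robot name is my assumption about
how names should be compared.

```perl
use strict;
use warnings;

# Return the Disallow paths that apply to $robot_name: those of the first
# record naming the robot if there is one, otherwise those of the "*" record.
sub disallowed_paths {
    my ($robot_name, $text) = @_;
    my (@specific, @default);
    my $saw_specific = 0;

    # Assumption: records are separated by one or more blank lines.
    for my $record (split /\n\s*\n/, $text) {
        my (@agents, @paths);
        for my $line (split /\n/, $record) {
            $line =~ s/\s*#.*$//;                  # remove comments
            if ($line =~ /^User-agent:\s*(\S+)/i) {
                push @agents, $1;
            } elsif ($line =~ /^Disallow:\s*(\S*)/i) {
                push @paths, $1 if length $1;      # empty value disallows nothing
            }
        }
        if (!$saw_specific
            && grep { index(lc($_), lc($robot_name)) >= 0 } @agents) {
            @specific     = @paths;
            $saw_specific = 1;
        } elsif (grep { $_ eq '*' } @agents) {
            @default = @paths;
        }
    }
    return $saw_specific ? @specific : @default;
}

my $robots_txt = <<'END';
User-agent: GPOHTTPGET
Disallow:

User-agent: *
Disallow: /
END

for my $name ('GPOHTTPGET', 'OtherBot') {
    my @paths = disallowed_paths($name, $robots_txt);
    print "$name: ", (@paths ? join(' ', @paths) : '(nothing disallowed)'), "\n";
}
```

With our file this reports nothing disallowed for GPOHTTPGET and "/" for
OtherBot, which is the behavior I expected from the standard.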

I'm not too familiar with Perl, and I don't want to try to fix the code if
it's not broken. But it doesn't seem to be working correctly.


Russell