
Re: replacing the monster regex





>>>>>>>>>>>>>>>>>> Original Message <<<<<<<<<<<<<<<<<<

On 1/29/02, 4:13:42 PM, Golda Velez <(use contact form)> wrote regarding 
Re: replacing the monster regex:


> At 04:18 PM 1/29/02 GMT, Derek Pomery wrote:
> >However, when you have a path like:
> >http://thiserver.thislongveryverylongdomainname.net/project/subproject/s
> >ubsubproject/somefurtherdivision/andanother/oneortwo/forfurtherorganizat
> >ion/A ridiculously long file name that describes exactly what the use
> >case is.html
> >
> >I was finding it was unsurprisingly breaking the splits || regex.
> >Hadn't gone back to see what the actual value for the substr was, since
> >everything worked fine without it. :)
> >

> Must have been a monster name indeed - the default limit to the substr
> was 10000 chars!

> But, this is still a good point, and if it runs fast enough without the
> substr we'll just drop it.

Ah.  Took the time to look around, and found this line:
$QS_maxchars = '500';   # Maximum number of characters to print (in case file has no line breaks)
Chalk it up to my foolishness probably, long ago, and not your substr() :)
That was probably due to someone complaining about binary files like 
MSWord docs putting gibberish in the output, so I trimmed 10,000 to 500 
without looking at the side effects.
Possibly doing the substr() *AFTER* having extracted the file name part
(the part of the line split off by FILE_END_MARK, as opposed to ":")
might make more sense.
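
A rough sketch of that ordering, in case it helps. This is not the actual
Webglimpse code: the sub name split_then_trim, the tab used as a stand-in
for FILE_END_MARK, and the sample line are all assumptions made up for
illustration. Only $QS_maxchars and its value of 500 come from the line
quoted above.

```perl
use strict;
use warnings;

# Assumption: a tab stands in for FILE_END_MARK; the real value is not
# shown anywhere in this thread.
my $FILE_END_MARK = "\t";
my $QS_maxchars   = 500;

# Split a result line into (file name, content) at the first marker,
# then truncate only the content, so a long path is never cut short.
sub split_then_trim {
    my ($line, $mark, $max) = @_;
    my $pos = index($line, $mark);

    # No marker found: return the line untouched rather than guess.
    return ($line, '') if $pos < 0;

    my $file    = substr($line, 0, $pos);
    my $content = substr($line, $pos + length $mark);
    $content = substr($content, 0, $max) if length($content) > $max;
    return ($file, $content);
}

# Hypothetical input: a path far longer than $QS_maxchars, followed by
# a long run of content with no line breaks.
my $long_path = 'http://example.net/' . ('dir/' x 200) . 'A long file name.html';
my $line      = $long_path . $FILE_END_MARK . ('x' x 2000);

my ($file, $content) = split_then_trim($line, $FILE_END_MARK, $QS_maxchars);
print "file name kept whole\n" if $file eq $long_path;
print "content chars: ", length($content), "\n";
```

Using index() instead of a regex for the split also avoids having to quote
whatever characters the marker turns out to contain.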

Filtering the file contents through the same filters wgindex uses might 
be nice too, if not too slow.

> The code I sent you expects to be dropped into a new version of makenh
> that does the %20 to ' ' replacements and a bunch of other stuff as
> well.  I'll try to get it off to you tonight so you don't have to spend
> time integrating the fragment I sent into the old version.

Thanks.  I only need the CGIs and their libraries, I think, but it can't
hurt to update everything.  (Granted, I'll have to be careful I don't
blow away all the minor tweaking I've done :) )