Re: ResultCache problems
Hello again, I've been out working on server problems and hiring a few
students this last week, sorry for the lack of responses!
Maqsood Mohammed will be working on Glimpse & Webglimpse for the next month
or two, and he has implemented something elsewhere that could be a very
nice solution for the ResultCache problem:
Instead of trying to cache each individual query, we could index the cached
results by the keyword combination searched on. That way, we do not need
to keep multiple identical caches, and we save a lot of time when a common
search is done because we don't need to call glimpse at all. Each cache
file can be uniquely named by keyword combination, so we don't risk
overwriting.
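
A minimal sketch of how a keyword combination could map to a single cache file name. The directory, the `cache_path` helper, and the normalization rules (case-insensitive, order-insensitive) are assumptions for illustration only; a real version would also have to fold in any search options that change the result set.

```python
import hashlib
import os

CACHE_DIR = "/tmp/wgcache"  # hypothetical cache directory


def cache_path(keywords, cache_dir=CACHE_DIR):
    """Return the cache file path for a keyword combination."""
    # Assumed normalization: trim, lowercase, drop duplicates, sort,
    # so that ["Foo", "bar"] and ["bar", "foo"] share one cache file.
    normalized = sorted({k.strip().lower() for k in keywords if k.strip()})
    # Join with a separator that cannot occur inside a keyword.
    key = "\x00".join(normalized).encode("utf-8")
    # Hash the key so the file name is short and filesystem-safe.
    digest = hashlib.sha1(key).hexdigest()
    return os.path.join(cache_dir, ".wgcache." + digest)
```

Hashing keeps the name independent of keyword length and of characters that are awkward in file names; two different keyword sets only collide if their SHA-1 digests collide.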
We would need to store all the hits in the file and not rewrite the cache
file as hits are "used", but I don't think this is a big drawback. We can
time out the file based on last access time rather than creation time, so
if the file is being used currently, it will definitely not be deleted.
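
A rough sketch of expiry by last access time, under assumptions: the `.wgcache.` prefix follows the naming used elsewhere in this thread, while `MAX_IDLE_SECONDS`, `touch_on_read`, and `expire_idle` are hypothetical names. Because filesystems mounted with noatime or relatime may not update the access time on every read, the sketch sets it explicitly on each cache hit.

```python
import os
import time

MAX_IDLE_SECONDS = 3600  # hypothetical idle timeout


def touch_on_read(path):
    """Mark a cache hit: set the access time to now, keep the mtime."""
    st = os.stat(path)
    os.utime(path, (time.time(), st.st_mtime))


def expire_idle(cache_dir, max_idle=MAX_IDLE_SECONDS, now=None):
    """Delete cache files not accessed within max_idle seconds."""
    now = time.time() if now is None else now
    removed = []
    for name in sorted(os.listdir(cache_dir)):
        # Only consider cache files, and skip any ".lock" companions.
        if not name.startswith(".wgcache.") or name.endswith(".lock"):
            continue
        path = os.path.join(cache_dir, name)
        try:
            if now - os.stat(path).st_atime > max_idle:
                os.unlink(path)
                removed.append(name)
        except FileNotFoundError:
            pass  # another process removed the file first
    return removed
```

There is still a small window between the stat and the unlink in which a reader could open the file, so this narrows the race rather than eliminating it.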
It seems to me this is a really neat solution, one I had not thought of.
Any comments?
--Golda
At 05:32 PM 9/17/99 -0400, Christian Vogler wrote:
>> It seems to me we can eliminate all race conditions if we just add
>> the IP address of the requestor to the name of the cache file. Any
>> objections to that?
>
>IP addresses aren't a reliable indicator of a user's identity. I can
>imagine many situations when such a scheme will fail:
>
>- Proxies (even worse are load-balancing proxies, which means that
> requests from the same end user may be routed through a different
> proxy server every time)
>
>- Firewalls that use IP masquerading
>
>- Multiuser systems
>
>I think we should think through the situations where race conditions
>can occur and decide for each whether any harm would be done. If not,
>we can ignore that specific condition. If yes, lock files are always an
>option on UNIX systems. IIRC, POSIX guarantees that there is a flag
>that makes open() test for and create a file atomically. We would,
>however, have to be very careful not to leave any stale lock files.
>
>Here are three situations that I can imagine right off the bat:
>
>1. Multiple webglimpse processes try to delete files at the same
> time. This won't be a problem with deleting files based on
> the file date. unlink() may simply fail, and that's it. But I'm
> less sure what would happen if multiple processes tried to delete files
> because they take up too much space. Gotta think about this.
>
>2. A webglimpse process tries to delete a file while another one tries to
> access it, because a user wants to see the next page of hits. A lock
> file should help here. That is, if the cache file is named .wgcache.1234,
> create something like .wgcache.1234.lock when you need to read or write
> it. If the deletion phase sees such a lock file, it should leave the file
> alone. If the query phase sees it, it should resubmit the query to
> glimpse instead of reading the cache file.
> I'm not sure if we can simply ignore the race condition, because I don't
> know what happens if a process tries to delete a file that is currently
> open in another process.
>
>3. During the deletion phase, webglimpse marks a file for deletion,
> but meanwhile another webglimpse process deletes it first, and yet
> another process then is in the middle of writing a new results list to
> the same file name. This could result in the first process pulling the
> rug from under the third process.
>
>- Christian
>
>
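
The lock-file scheme in the quoted message could be sketched roughly as follows. This assumes the `.lock` suffix from the example above and uses the POSIX `O_CREAT | O_EXCL` flags to `open`, which make the existence test and the creation one atomic step. `CacheBusy`, `cache_lock`, and `delete_if_unlocked` are hypothetical names, not existing Webglimpse code.

```python
import os
from contextlib import contextmanager


class CacheBusy(Exception):
    """Raised when another process holds the lock for a cache file."""


@contextmanager
def cache_lock(cache_path):
    """Hold <cache_path>.lock for the duration of the with-block."""
    lock_path = cache_path + ".lock"
    try:
        # O_CREAT | O_EXCL: create the file, or fail if it already exists.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        raise CacheBusy(lock_path) from None
    try:
        # Record the owner's PID to help diagnose stale locks.
        with os.fdopen(fd, "w") as f:
            f.write(str(os.getpid()))
        yield lock_path
    finally:
        # Always release the lock, even if the body raised.
        try:
            os.unlink(lock_path)
        except FileNotFoundError:
            pass


def delete_if_unlocked(cache_path):
    """Deletion phase: remove the cache file unless it is locked."""
    try:
        with cache_lock(cache_path):
            try:
                os.unlink(cache_path)
            except FileNotFoundError:
                return False  # already deleted by another process
            return True
    except CacheBusy:
        return False  # in use: leave the file alone
```

A query process would wrap its read or write in `with cache_lock(path):` and fall back to calling glimpse on `CacheBusy`. The `finally` clause covers ordinary exceptions, but a process that is killed outright would still leave a stale lock, so some age-based cleanup of old lock files would be needed as well.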
------------------------------------------------------------
Golda Velez gvelez@tucson.com 520-620-6878
Internet Workshop http://tucson.com
Webglimpse Search Software http://webglimpse.net
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Help organize the world - index your own corner of the web