diff options
Diffstat (limited to 'db2/mutex/README')
-rw-r--r-- | db2/mutex/README | 105 |
1 files changed, 105 insertions, 0 deletions
diff --git a/db2/mutex/README b/db2/mutex/README new file mode 100644 index 0000000000..30d6b6a7d1 --- /dev/null +++ b/db2/mutex/README @@ -0,0 +1,105 @@ +# @(#)README 10.1 (Sleepycat) 4/12/97 + +Resource locking routines: lock based on a db_mutex_t. All this gunk +(including trying to make assembly code portable), is necessary because +System V semaphores require system calls for uncontested locks and we +don't want to make two system calls per resource lock. + +First, this is how it works. The db_mutex_t structure contains a resource +test-and-set lock (tsl), a file offset, a pid for debugging and statistics +information. + +If HAVE_SPINLOCKS is defined (i.e. we know how to do test-and-sets for +this compiler/architecture combination), we try and lock the resource tsl +TSL_DEFAULT_SPINS times. If we can't acquire the lock that way, we use +a system call to sleep for 10ms, 20ms, 40ms, etc. (The time is bounded +at 1 second, just in case.) Using the timer backoff means that there are +two assumptions: that locks are held for brief periods (never over system +calls or I/O) and that locks are not hotly contested. + +If HAVE_SPINLOCKS is not defined, i.e. we can't do test-and-sets, we use +a file descriptor to do byte locking on a file at a specified offset. In +this case, ALL of the locking is done in the kernel. Because file +descriptors are allocated per process, we have to provide the file +descriptor as part of the lock/unlock call. We still have to do timer +backoff because we need to be able to block ourselves, i.e. the lock +manager causes processes to wait by having the process acquire a mutex +and then attempting to re-acquire the mutex. There's no way to use kernel +locking to block yourself, i.e. if you hold a lock and attempt to +re-acquire it, the attempt will succeed. + +Next, let's talk about why it doesn't work the way a reasonable person +would think it should work. + +Ideally, we'd have the ability to try to lock the resource tsl, and if +that fails, increment a counter of waiting processes, then block in the +kernel until the tsl is released. The process holding the resource tsl +would see the wait counter when it went to release the resource tsl, and +would wake any waiting processes up after releasing the lock. This would +actually require both another tsl (call it the mutex tsl) and +synchronization between the call that blocks in the kernel and the actual +resource tsl. The mutex tsl would be used to protect accesses to the +db_mutex_t itself. Locking the mutex tsl would be done by a busy loop, +which is safe because processes would never block holding that tsl (all +they would do is try to obtain the resource tsl and set/check the wait +count). The problem in this model is that the blocking call into the +kernel requires a blocking semaphore, i.e. one whose normal state is +locked. + +The only portable forms of locking under UNIX are fcntl(2) on a file +descriptor/offset, and System V semaphores. Neither of these locking +methods are sufficient to solve the problem. + +The problem with fcntl locking is that only the process that obtained the +lock can release it. Remember, we want the normal state of the kernel +semaphore to be locked. So, if the creator of the db_mutex_t were to +initialize the lock to "locked", then a second process locks the resource +tsl, and then a third process needs to block, waiting for the resource +tsl, when the second process wants to wake up the third process, it can't +because it's not the holder of the lock! For the second process to be +the holder of the lock, we would have to make a system call per +uncontested lock, which is what we were trying to get away from in the +first place. + +There are some hybrid schemes, such as signaling the holder of the lock, +or using a different blocking offset depending on which process is +holding the lock, but it gets complicated fairly quickly. I'm open to +suggestions, but I'm not holding my breath. + +Regardless, we use this form of locking when HAVE_SPINLOCKS is not +defined, (i.e. we're locking in the kernel) because it doesn't have the +limitations found in System V semaphores, and because the normal state of +the kernel object in that case is unlocked, so the process releasing the +lock is also the holder of the lock. + +The System V semaphore design has a number of other limitations that make +it inappropriate for this task. Namely: + +First, the semaphore key name space is separate from the file system name +space (although there exist methods for using file names to create +semaphore keys). If we use a well-known key, there's no reason to believe +that any particular key will not already be in use, either by another +instance of the DB application or some other application, in which case +the DB application will fail. If we create a key, then we have to use a +file system name to rendezvous and pass around the key. + +Second, System V semaphores traditionally have compile-time, system-wide +limits on the number of semaphore keys that you can have. Typically, that +number is far too low for any practical purpose. Since the semaphores +permit more than a single slot per semaphore key, we could try and get +around that limit by using multiple slots, but that means that the file +that we're using for rendezvous is going to have to contain slot +information as well as semaphore key information, and we're going to be +reading/writing it on every db_mutex_t init or destroy operation. Anyhow, +similar compile-time, system-wide limits on the numbers of slots per +semaphore key kick in, and you're right back where you started. + +My fantasy is that once POSIX.1 standard mutexes are in wide-spread use, +we can switch to them. My guess is that it won't happen, because the +POSIX semaphores are only required to work for threads within a process, +and not independent processes. + +Note: there are races in the statistics code, but since it's just that, +I didn't bother fixing them. (The fix requires a mutex tsl, so, when/if +this code is fixed to do rational locking (see above), then change the +statistics update code to acquire/release the mutex tsl. |