Discussion:
[LMDB] Lockups with robust mutexes and crashing processes
Marcos-David Dione
2014-11-21 12:31:51 UTC
I already posted this to the IRC channel, but there was no
response, so I repost this here.

I'm trying out lmdb from master, including the robust mutex code.
We're experiencing lock ups after the process holding the lock dies, as if
the robust lock was not recovered. I tried to come up with an lmdb example
that shows it and I got it, just a few lines. It uses fork() just to
automate it; see that the environment is opened in both children. Here's
the code:

http://pastebin.com/Cbbri6az

If I run this, I see that one of the children waits for the write
lock and is not awakened when the other child dies without closing the txn
(but notice I close the env). This is on purpose, to simulate a crashing
process. The worst part is that I can't reproduce it directly using
libpthread and mmap. Here is the code I came up with:

http://pastebin.com/ybR5L4cP

It's a little bit more verbose because I based it on a glibc test
case.

Are we missing anything? It seems to us that the code does not
break any of LMDB's caveats (especially the one about creating the envs
before fork()'ing). Is it wrong to assume that the waiting process should
recover the lock from staleness?

--
Marcos Dione
Astek Sud-Est
R&D-SSP-DTA-TAE-TDS
for Amadeus SAS
T: +33 (4)4 9704 1727
marcos-***@amadeus.com
Howard Chu
2014-11-23 14:11:29 UTC
Marcos-David Dione wrote:
> I already posted this to the IRC channel, but there was no
> response, so I repost this here.
... already followed up in IRC.
> I'm trying out lmdb from master, including the robust mutex
> code. We're experiencing lock ups after the process holding the lock
> dies, as if the robust lock was not recovered. I tried to come up with
> an lmdb example that shows it and I got it, just a few lines. It uses
> fork() just to automate it; see that the environment is opened in both
> children. Here's the code:
>
> http://pastebin.com/Cbbri6az
The example is broken; it does not mimic the behavior of a crashed
process. In particular it does a clean call to mdb_env_close() but
doesn't call mdb_txn_abort() first. An actual crashing process would not
make the call to mdb_env_close(), and a cleanly exiting process would
close all outstanding transactions before calling env_close.
> If I run this, I see that one of the children waits for the
> write lock and is not awakened when the other child dies without closing
> the txn (but notice I close the env). This is on purpose, to simulate a
> crashing process. The worst part is that I can't reproduce it directly
> using libpthread and mmap. Here is the code I came up with:
>
> http://pastebin.com/ybR5L4cP
>
> It's a little bit more verbose because I based it on a glibc
> test case.
>
> Are we missing anything? It seems to us that the code does not
> break any of LMDB's caveats (especially the one about creating
> the envs before fork()'ing). Is it wrong to assume that the waiting
> process should recover the lock from staleness?
env_close does an munmap of the memory containing the mutex. According
to the manpages, a robust mutex is supposed to automatically unlock when
unmapped. Since this is not happening, it appears you've found a kernel
bug. Regardless, the example is invalid. If you modify the code to just
exit/abort/die without the bogus call to env_close, the other process
wakes up correctly. E.g.

http://pastebin.com/9jieDnUz
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
Marcos-David Dione
2014-11-25 14:16:09 UTC
> Seen like that I'm not sure if there's a defined behaviour
> for that. I'll ask in the glibc and/or kernel MLs and I'll come
> back with the answer.

and here's the answer:

> On 11/24/2014 03:34 PM, Marcos Dione wrote:
>> We found a situation where a robust mutex cannot be recovered
>> from a stale lock and we're wondering if it's simply an undefined
>> situation or a bug in the kernel. Attached you will find the sample
>> code, which is loosely based on a glibc test case. The gist of it is
>> as follows:
>>
>> 1. we open a file.
>> 2. we mmap it and use that mem to store a robust mutex.
>> 3. we lock the mutex.
>> 4. we munmap the file.
>> 5. we close the file.
>
> Undefined behaviour.
>
> This results in undefined behaviour since the allocated storage for
> the mutex object has been lost. You need to keep that storage around
> for the robust algorithms to work with. Without any data you can't
> do anything.
Full answer:

https://sourceware.org/ml/libc-help/2014-11/msg00035.html

--
Marcos Dione
Astek Sud-Est
R&D-SSP-DTA-TAE-TDS
for Amadeus SAS
T: +33 (4)4 9704 1727
marcos-***@amadeus.com
Howard Chu
2014-11-25 15:19:40 UTC
Post by Marcos-David Dione
>> Seen like that I'm not sure if there's a defined behaviour
>> for that. I'll ask in the glibc and/or kernel MLs and I'll come
>> back with the answer.
>
> and here's the answer:
>
>> On 11/24/2014 03:34 PM, Marcos Dione wrote:
>>> We found a situation where a robust mutex cannot be recovered
>>> from a stale lock and we're wondering if it's simply an undefined
>>> situation or a bug in the kernel. Attached you will find the sample
>>> code, which is loosely based on a glibc test case. The gist of it
>>> is as follows:
>>>
>>> 1. we open a file.
>>> 2. we mmap it and use that mem to store a robust mutex.
>>> 3. we lock the mutex.
>>> 4. we munmap the file.
>>> 5. we close the file.
>>
>> Undefined behaviour.
>>
>> This results in undefined behaviour since the allocated storage for
>> the mutex object has been lost. You need to keep that storage around
>> for the robust algorithms to work with. Without any data you can't
>> do anything.
>
> Full answer:
>
> https://sourceware.org/ml/libc-help/2014-11/msg00035.html
Fyi, this would not have been a bug in Solaris:

https://docs.oracle.com/cd/E19253-01/816-5168/pthread-mutexattr-setrobust-np-3c/index.html
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
Howard Chu
2014-11-25 16:21:35 UTC
Marcos-David Dione wrote:
>> Fyi, this would not have been a bug in Solaris:
>>
>> https://docs.oracle.com/cd/E19253-01/816-5168/pthread-mutexattr-setrobust-np-3c/index.html
>
> you mean, because of the phrase «When the owner of a mutex with
> the PTHREAD_MUTEX_ROBUST_NP /robustness/ attribute dies, or when the
> process containing such a locked mutex unmaps the memory containing the
> mutex or performs one of the _exec(2)_
> <https://docs.oracle.com/docs/cd/E19253-01/816-5167/exec-2/index.html>
> functions, the mutex is unlocked.»? I'm not sure that when it says
> 'unmap' it means 'munmap()s'.
There is no other meaning of the word. A process-shared mutex must
reside in shared memory. When a process detaches from that shared
memory, it is unmapped.
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
Marcos-David Dione
2014-11-24 10:10:42 UTC
Post by Howard Chu
> env_close does an munmap of the memory containing the mutex. According
> to the manpages, a robust mutex is supposed to automatically unlock when
> unmapped. Since this is not happening, it appears you've found a kernel
> bug. Regardless, the example is invalid. If you modify the code to just
> exit/abort/die without the bogus call to env_close, the other process
> wakes up correctly. E.g.
>
> http://pastebin.com/9jieDnUz
Ok, so now I managed to mimic the situation with pure pthreads
functions.
The gist of it is as follows:

1. we open a file.
2. we mmap the file and use the mem space to store a mutex.
3. we lock the mutex.
4. we munmap the file.
5. we close the file.

Seen like that I'm not sure if there's a defined behaviour for
that. I'll ask in the glibc and/or kernel MLs and I'll come back
with the answer.

Just for the record, here's the full program:

/* The _np robust-mutex functions are GNU extensions. */
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <pthread.h>
#include <unistd.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/mman.h>

/* Runs in each child: take the lock, then unmap and close without
   unlocking. */
void *tf (int n, int f, pthread_mutex_t *m) {
    int err = pthread_mutex_lock (m);
    printf ("ml: %d\n", err);
    if (err == EOWNERDEAD) {
        /* The previous owner died; mark the mutex usable again. */
        err = pthread_mutex_consistent_np (m);
        printf ("mc: %d\n", err);
        if (err) {
            puts ("pthread_mutex_consistent_np");
            exit (1);
        }
    } else if (err) {
        puts ("pthread_mutex_lock");
        exit (1);
    }
    printf ("%d got the lock.\n", n);
    sleep (3);
    /* exit without unlock */
    munmap (m, sizeof (pthread_mutex_t));
    close (f);
    printf ("%d out\n", n);
    return NULL;
}

int main (void) {
    int err, f;
    pthread_mutex_t *m;
    pthread_mutexattr_t ma;

    pthread_mutexattr_init (&ma);
    err = pthread_mutexattr_setrobust_np (&ma, PTHREAD_MUTEX_ROBUST_NP);
    if (err) {
        puts ("pthread_mutexattr_setrobust_np");
        return 1;
    }
    err = pthread_mutexattr_setpshared (&ma, PTHREAD_PROCESS_SHARED);
    if (err) {
        puts ("pthread_mutexattr_setpshared");
        return 1;
    }
#ifdef ENABLE_PI
    if (pthread_mutexattr_setprotocol (&ma, PTHREAD_PRIO_INHERIT) != 0) {
        puts ("pthread_mutexattr_setprotocol failed");
        return 1;
    }
#endif

    /* O_CREAT requires a mode argument. */
    f = open ("mutex.mmap", O_CREAT|O_TRUNC|O_RDWR, 0600);
    if (f < 0) {
        puts ("open");
        return 1;
    }

    /* Make the file large enough to hold the mutex. */
    err = ftruncate (f, sizeof (pthread_mutex_t));
    if (err) {
        puts ("ftruncate");
        return 1;
    }

    m = (pthread_mutex_t *) mmap (NULL, sizeof (pthread_mutex_t),
                                  PROT_READ|PROT_WRITE, MAP_SHARED, f, 0);
    if (m == MAP_FAILED) {
        puts ("mmap");
        return 1;
    }

    err = pthread_mutex_init (m, &ma);
#ifdef ENABLE_PI
    if (err == ENOTSUP) {
        puts ("PI robust mutexes not supported");
        return 0;
    }
#endif
    if (err) {
        puts ("pthread_mutex_init");
        return 1;
    }

    /* Two children compete for the same mutex. */
    err = fork ();
    if (err == 0) {
        tf (1, f, m);
    } else if (err > 0) {
        int err2 = fork ();
        if (err2 == 0) {
            tf (2, f, m);
        } else if (err2 > 0) {
            // sleep (1);
            // kill (err, SIGKILL);  /* needs <signal.h> */
            // printf ("child killed\n");
            // sleep (10);
            puts ("main out");
        } else {
            puts ("fork2");
            return 1;
        }
    } else {
        puts ("fork1");
        return 1;
    }
    return 0;
}