mdb_cursor

Discussion:

mdb_cursor_del

David Barbour

2015-02-04 23:31:48 UTC

The documentation for the `mdb_cursor_del` function doesn't indicate what
happens to the state of the cursor after the targeted object is deleted.

This makes it difficult to reason about, for example, the task of scanning
through the database and deleting keys or values according to some
criterion.

What does MDB_NEXT mean on a cursor after deletion? Will it return the next
element, or should I use MDB_GET_CURRENT for the next element? Or will
these operations return in error, and I should use MDB_SET_RANGE with the
deleted key?

Similarly, what happens to every other cursor pointing to the same key? I
see you have some code that touches them, but no documentation to tell the
user's what is happening.

This needs to be documented. Iterator invalidation is painful in C++ where
it is thoroughly documented. When it's reduced to guesswork or testing
without a clear indication of intended behavior, it's worse.

Howard Chu

2015-02-05 01:13:51 UTC

Permalink

Content preview: David Barbour wrote: > The documentation for the `mdb_cursor_del`
function doesn't indicate > what happens to the state of the cursor after
the targeted object is > deleted. > > This makes it difficult to reason about,
for example, the task of > scanning through the database and deleting keys
or values according to > some criterion. > > What does MDB_NEXT mean on a
cursor after deletion? Will it return the > next element, or should I use
MDB_GET_CURRENT for the next element? Or > will these operations return in
error, and I should use MDB_SET_RANGE > with the deleted key? > > Similarly,
what happens to every other cursor pointing to the same key? > I see you
have some code that touches them, but no documentation to tell > the user's
what is happening. > > This needs to be documented. Iterator invalidation
is painful in C++ > where it is thoroughly documented. When it's reduced
to guesswork or > testing without a clear indication of intended behavior,
it's worse. [...]

Content analysis details: (-1.9 points, 5.0 required)

pts rule name description
---- ---------------------- --------------------------------------------------
0.0 RCVD_IN_DNSWL_BLOCKED RBL: ADMINISTRATOR NOTICE: The query to DNSWL
was blocked. See
http://wiki.apache.org/spamassassin/DnsBlocklists#dnsbl-block
for more information.
[69.43.206.106 listed in list.dnswl.org]
0.0 URIBL_BLOCKED ADMINISTRATOR NOTICE: The query to URIBL was blocked.
See
http://wiki.apache.org/spamassassin/DnsBlocklists#dnsbl-block
for more information.
[URIs: highlandsun.com]
-1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1%
[score: 0.0000]

Post by David Barbour
The documentation for the `mdb_cursor_del` function doesn't indicate
what happens to the state of the cursor after the targeted object is
deleted.
This makes it difficult to reason about, for example, the task of
scanning through the database and deleting keys or values according to
some criterion.
What does MDB_NEXT mean on a cursor after deletion? Will it return the
next element, or should I use MDB_GET_CURRENT for the next element? Or
will these operations return in error, and I should use MDB_SET_RANGE
with the deleted key?
Similarly, what happens to every other cursor pointing to the same key?
I see you have some code that touches them, but no documentation to tell
the user's what is happening.
This needs to be documented. Iterator invalidation is painful in C++
where it is thoroughly documented. When it's reduced to guesswork or
testing without a clear indication of intended behavior, it's worse.

The behavior wasn't settled, in the beginning. But you can read the
mtest source code to see what the expected behavior is.

In an open source project, you are expected to read source code to
understand what's going on.

Things have settled down enough now to formally document it though.

The point of using cursors is to retain state about a position in the
DB, so it can be reused for a subsequent operation. This is the guiding
principle. Otherwise cursors would have no reason to exist.

After a cursor_del, the cursor's position is unchanged - it points to
the same page and node slot as it did before. But, since the node it
pointed to was deleted, the actual data it points to changes. I.e., all
nodes in the page are moved down 1 slot to occupy the vacated slot, so
it points to whatever was the next item.

Both MDB_NEXT and MDB_GET_CURRENT will return the same record after
cursor_del. This simplifies loops where you just want to delete all the
records in a range - just use MDB_NEXT like any other loop iteration.

The cursor's actual position can change if the deletion causes a page
merge. But its relative position in the set of data will still be the same.

--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/

David Barbour

2015-02-05 14:51:05 UTC

Permalink

I have been.

Though, in this particular case, I needed to dig all the way down to LMDB
page layout to grok key indexes, compare to mdb_cursor_next, search for
instances of C_DEL, etc. to understand what was happening.

Reading code works better when we have something more modular and
compositional, i.e. such that you can read and understand in small pieces.
LMDB is just one big module, you can't understand most pieces without
seeing the whole picture. This isn't a bad thing for its purpose, but it
does mean good API documentation saves hours or days of reading code for
someone who wants to just use LMDB.

Things have settled down enough now to formally document it though.

Then make it so. It will save a lot of people a lot of time. :)

Regards,

Dave

Howard Chu

2015-02-06 05:34:08 UTC

Permalink

Content preview: David Barbour wrote: > > > On Wed, Feb 4, 2015 at 7:13 PM,
Howard Chu <***@symas.com > <mailto:***@symas.com>> wrote: > > > The behavior
wasn't settled, in the beginning. But you can read the > mtest source code
to see what the expected behavior is. > > In an open source project, you
are expected to read source code to > understand what's going on. > > > I
have been. > > Though, in this particular case, I needed to dig all the way
down to > LMDB page layout to grok key indexes, compare to mdb_cursor_next,
search > for instances of C_DEL, etc. to understand what was happening. [...]

Content analysis details: (-1.9 points, 5.0 required)

pts rule name description
---- ---------------------- --------------------------------------------------
0.0 URIBL_BLOCKED ADMINISTRATOR NOTICE: The query to URIBL was blocked.
See
http://wiki.apache.org/spamassassin/DnsBlocklists#dnsbl-block
for more information.
[URIs: highlandsun.com]
-1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1%
[score: 0.0000]

Post by Howard Chu
The behavior wasn't settled, in the beginning. But you can read the
mtest source code to see what the expected behavior is.
In an open source project, you are expected to read source code to
understand what's going on.
I have been.
Though, in this particular case, I needed to dig all the way down to
LMDB page layout to grok key indexes, compare to mdb_cursor_next, search
for instances of C_DEL, etc. to understand what was happening.

You just needed to read mtest.c to understand the intended behavior.

Post by Howard Chu
Things have settled down enough now to formally document it though.
Then make it so. It will save a lot of people a lot of time. :)

Suggest the language for such a change. Patch welcome.

--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/