/*
 * Copyright 2015-2021 Howard Chu, Symas Corp.
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted only as authorized by the OpenLDAP
 * Public License.
 *
 * A copy of this license is available in the file LICENSE in the
 * top-level directory of the distribution or, alternatively, at
 * <http://www.OpenLDAP.org/license.html>.
 */
/** @page starting Getting Started

LMDB is compact, fast, powerful, and robust, and implements a simplified
variant of the BerkeleyDB (BDB) API. (BDB is also very powerful, and verbosely
documented in its own right.) After reading this page, the main
\ref mdb documentation should make sense. Thanks to Bert Hubert
for creating the
<a href="https://github.com/ahupowerdns/ahutils/blob/master/lmdb-semantics.md">
initial version</a> of this writeup.

Everything starts with an environment, created by #mdb_env_create().
Once created, this environment must also be opened with #mdb_env_open().

#mdb_env_open() gets passed a name which is interpreted as a directory
path. Note that this directory must exist already; it is not created
for you. Within that directory, a lock file and a storage file will be
generated. If you don't want to use a directory, you can pass the
#MDB_NOSUBDIR option, in which case the path you provided is used
directly as the data file, and another file with a "-lock" suffix
added will be used for the lock file.

Once the environment is open, a transaction can be created within it
using #mdb_txn_begin(). Transactions may be read-write or read-only,
and read-write transactions may be nested. A transaction must only
be used by one thread at a time. Transactions are always required,
even for read-only access. The transaction provides a consistent
view of the data.

Once a transaction has been created, a database can be opened within it
using #mdb_dbi_open(). If only one database will ever be used in the
environment, NULL can be passed as the database name. For named
databases, the #MDB_CREATE flag must be used to create the database
if it doesn't already exist. Also, #mdb_env_set_maxdbs() must be
called after #mdb_env_create() and before #mdb_env_open() to set the
maximum number of named databases you want to support.

Note: a single transaction can open multiple databases. Generally
databases should only be opened once, by the first transaction in
the process. After the first transaction completes, the database
handles can freely be used by all subsequent transactions.

Within a transaction, #mdb_get() and #mdb_put() can retrieve and store
single key/value pairs if that is all you need to do (but see
\ref Cursors below if you want to do more).

A key/value pair is expressed as two #MDB_val structures. This struct
has two fields, \c mv_size and \c mv_data. The data is a \c void pointer to
an array of \c mv_size bytes.

Because LMDB is very efficient (and usually zero-copy), the data returned
in an #MDB_val structure may be memory-mapped straight from disk. In
other words, <b>look but do not touch</b> (or free() for that matter).
Once a transaction is closed, the values can no longer be used, so
make a copy if you need to keep them after that. In a read-write
transaction, a returned value is also only valid until the next
update operation.

@section Cursors Cursors

To do more powerful things, we must use a cursor.

Within the transaction, a cursor can be created with #mdb_cursor_open().
With this cursor we can store/retrieve/delete (multiple) values using
#mdb_cursor_get(), #mdb_cursor_put(), and #mdb_cursor_del().

#mdb_cursor_get() positions itself depending on the cursor operation
requested, and for some operations, on the supplied key. For example,
to list all key/value pairs in a database, use operation #MDB_FIRST for
the first call to #mdb_cursor_get(), and #MDB_NEXT on subsequent calls,
until the end is hit (#mdb_cursor_get() then returns #MDB_NOTFOUND).

To retrieve all keys starting from a specified key value, use #MDB_SET
to position the cursor at that exact key, or #MDB_SET_RANGE to position
it at the first key greater than or equal to it, and then continue with
#MDB_NEXT.
For more cursor operations, see the \ref mdb docs.

When using #mdb_cursor_put(), either the function will position the
cursor for you based on the \b key, or you can pass the #MDB_CURRENT
flag to replace the item at the cursor's current position. Note that
\b key must then match the current position's key.

@subsection summary Summarizing the Opening

So we have a cursor in a transaction which opened a database in an
environment which is opened from a filesystem after it was
separately created.

Or, we create an environment, open it from a filesystem, create a
transaction within it, open a database within that transaction,
and create a cursor within all of the above.

Got it?

@section thrproc Threads and Processes

LMDB uses POSIX locks on files, and these locks have issues if one
process opens a file multiple times. Because of this, do not
#mdb_env_open() a file multiple times from a single process. Instead,
share the LMDB environment that has opened the file across all threads.
Otherwise, if a single process opens the same environment multiple times,
closing it once will remove all the locks held on it, and the other
instances will be vulnerable to corruption from other processes.

Also note that a transaction is tied to one thread by default using
Thread Local Storage. If you want to pass read-only transactions across
threads, you can use the #MDB_NOTLS option on the environment.

@section txns Transactions, Rollbacks, etc.

To actually get anything done, a transaction must be committed using
#mdb_txn_commit(). Alternatively, all of a transaction's operations
can be discarded using #mdb_txn_abort(). In a read-only transaction,
any cursors will \b not automatically be freed. In a read-write
transaction, all cursors will be freed and must not be used again.

For read-only transactions, obviously there is nothing to commit to
storage. The transaction still must eventually be aborted to close
any database handle(s) opened in it, or committed to keep the
database handles around for reuse in new transactions.

In addition, as long as a transaction is open, a consistent view of
the database is kept alive, which requires storage: pages that belong
to that view cannot be reused by writers until the transaction ends.
A read-only transaction should therefore be terminated (committed or
aborted) as soon as this consistent view is no longer needed (but see
below for an optimization).

There can be multiple simultaneously active read-only transactions
but only one that can write. Once a single read-write transaction
is opened, all further attempts to begin one will block until the
first one is committed or aborted. This has no effect on read-only
transactions, however, and they may continue to be opened at any time.

@section dupkeys Duplicate Keys

#mdb_get() has no support, and #mdb_put() only limited support, for
multiple key/value pairs with identical keys. If there are multiple
values for a key, #mdb_get() will only return the first value.

When multiple values for one key are required, pass the #MDB_DUPSORT
flag to #mdb_dbi_open(). In an #MDB_DUPSORT database, by default
#mdb_put() will not replace the value for a key if the key existed
already. Instead it will add the new value to the key. In addition,
#mdb_del() will pay attention to the value field too, allowing for
specific values of a key to be deleted.

Finally, additional cursor operations (for example #MDB_FIRST_DUP and
#MDB_NEXT_DUP) become available for traversing through and retrieving
duplicate values.

@section optim Some Optimization

If you frequently begin and abort read-only transactions, as an
optimization, it is possible to only reset and renew a transaction.

#mdb_txn_reset() releases any old copies of data kept around for
a read-only transaction. To reuse this reset transaction, call
#mdb_txn_renew() on it. Any cursors in this transaction must also
be renewed using #mdb_cursor_renew().

Note that #mdb_txn_reset() is similar to #mdb_txn_abort() and will
close any databases you opened within the transaction.

To permanently free a transaction, reset or not, use #mdb_txn_abort().

@section cleanup Cleaning Up

For read-only transactions, any cursors created within them must
be closed using #mdb_cursor_close().

It is very rarely necessary to close a database handle, and in
general they should just be left open.

@section onward The Full API

The full \ref mdb documentation lists further details, like how to:

\li size a database (the default limits are intentionally small)
\li drop and clean a database
\li detect and report errors
\li optimize (bulk) loading speed
\li (temporarily) reduce robustness to gain even more speed
\li gather statistics about the database
\li define custom sort orders
*/