PyMongo 3.11.4 Documentation¶
Overview¶
PyMongo is a Python distribution containing tools for working with MongoDB, and is the recommended way to work with MongoDB from Python. This documentation attempts to explain everything you need to know to use PyMongo.
- Installing / Upgrading
- Instructions on how to get the distribution.
- Tutorial
- Start here for a quick overview.
- Examples
- Examples of how to perform specific tasks.
- Using PyMongo with MongoDB Atlas
- Using PyMongo with MongoDB Atlas.
- TLS/SSL and PyMongo
- Using PyMongo with TLS / SSL.
- Client-Side Field Level Encryption
- Using PyMongo with client side encryption.
- Frequently Asked Questions
- Some questions that come up often.
- PyMongo 3 Migration Guide
- A PyMongo 2.x to 3.x migration guide.
- Python 3 FAQ
- Frequently asked questions about Python 3 support.
- Compatibility Policy
- Explanation of deprecations, and how to keep pace with changes in PyMongo’s API.
- API Documentation
- The complete API documentation, organized by module.
- Tools
- A listing of Python tools and libraries that have been written for MongoDB.
- Developer Guide
- Developer guide for contributors to PyMongo.
Getting Help¶
If you’re having trouble or have questions about PyMongo, ask your question on our MongoDB Community Forum. You may also want to consider a commercial support subscription. Once you get an answer, it’d be great if you could work it back into this documentation and contribute!
Issues¶
All issues should be reported (and can be tracked / voted for / commented on) at the main MongoDB JIRA bug tracker, in the “Python Driver” project.
Feature Requests / Feedback¶
Use our feedback engine to send us feature requests and general feedback about PyMongo.
Contributing¶
PyMongo has a large community and contributions are always encouraged. Contributions can be as simple as minor tweaks to this documentation. To contribute, fork the project on GitHub and send a pull request.
Changes¶
See the Changelog for a full list of changes to PyMongo. For older versions of the documentation please see the archive list.
About This Documentation¶
This documentation is generated using the Sphinx documentation generator. The source files for the documentation are located in the doc/ directory of the PyMongo distribution. To generate the docs locally run the following command from the root directory of the PyMongo source:
$ python setup.py doc
Using PyMongo with MongoDB Atlas¶
Atlas is MongoDB, Inc.’s hosted MongoDB as a service offering. To connect to Atlas, pass the connection string provided by Atlas to MongoClient:
client = pymongo.MongoClient(<Atlas connection string>)
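For example, a connection string copied from the Atlas UI typically uses the mongodb+srv:// scheme (the host and credentials below are hypothetical placeholders; note that mongodb+srv:// URIs require the dnspython package, see Dependencies):
import pymongo
client = pymongo.MongoClient(
    "mongodb+srv://user:password@cluster0.abc12.mongodb.net/test")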
Connections to Atlas require TLS/SSL. For connections using TLS/SSL, PyMongo may require third party dependencies as determined by your version of Python. With PyMongo 3.3+, you can install PyMongo and any TLS/SSL-related dependencies using the following pip command:
$ python -m pip install pymongo[tls]
Starting with PyMongo 3.11 this installs PyOpenSSL, requests and service_identity for users of Python versions older than 2.7.9. PyOpenSSL supports SNI for these old Python versions, allowing applications to connect to Atlas free and shared tier instances.
Earlier versions of PyMongo require you to manually install the dependencies. For a list of TLS/SSL-related dependencies, see TLS/SSL and PyMongo.
Note
Connecting to Atlas “Free Tier” or “Shared Cluster” instances requires Server Name Indication (SNI) support. SNI support requires CPython 2.7.9 / PyPy 2.5.1 or newer or PyMongo 3.11+ with PyOpenSSL. To check if your version of Python supports SNI run the following command:
$ python -c "import ssl; print(getattr(ssl, 'HAS_SNI', False))"
You should see “True”.
Warning
Industry best practices recommend, and some regulations require, the use of TLS 1.1 or newer. Though no application changes are required for PyMongo to make use of the newest protocols, some operating systems or versions may not provide an OpenSSL version new enough to support them.
Users of macOS older than 10.13 (High Sierra) will need to install Python from python.org, homebrew, macports, or another similar source.
Users of Linux or other non-macOS Unix can check their OpenSSL version like this:
$ openssl version
If the version number is less than 1.0.1, support for TLS 1.1 or newer is not available. Contact your operating system vendor for a solution or upgrade to a newer distribution.
You can check your Python interpreter by installing the requests module and executing the following command:
$ python -c "import requests; print(requests.get('https://www.howsmyssl.com/a/check', verify=False).json()['tls_version'])"
You should see “TLS 1.X” where X is >= 1.
Installing / Upgrading¶
PyMongo is in the Python Package Index.
Warning
Do not install the “bson” package from pypi. PyMongo comes with its own bson package; doing “pip install bson” or “easy_install bson” installs a third-party package that is incompatible with PyMongo.
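If the third-party package has already been installed, one possible remedy (a sketch using standard pip commands) is to remove it and then reinstall the driver:
$ python -m pip uninstall bson
$ python -m pip install --force-reinstall pymongo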
Installing with pip¶
We recommend using pip to install pymongo on all platforms:
$ python -m pip install pymongo
To get a specific version of pymongo:
$ python -m pip install pymongo==3.5.1
To upgrade using pip:
$ python -m pip install --upgrade pymongo
Note
pip does not support installing Python packages in .egg format. If you would like to install PyMongo from a .egg provided on pypi use easy_install instead.
Installing with easy_install¶
To use easy_install from setuptools do:
$ python -m easy_install pymongo
To upgrade do:
$ python -m easy_install -U pymongo
Dependencies¶
PyMongo supports CPython 2.7, 3.4+, PyPy, and PyPy3.5+.
Optional dependencies:
GSSAPI authentication requires pykerberos on Unix or WinKerberos on Windows. The correct dependency can be installed automatically along with PyMongo:
$ python -m pip install pymongo[gssapi]
MONGODB-AWS authentication requires pymongo-auth-aws:
$ python -m pip install pymongo[aws]
Support for mongodb+srv:// URIs requires dnspython:
$ python -m pip install pymongo[srv]
TLS / SSL support may require ipaddress and certifi or wincertstore depending on the Python version in use. The necessary dependencies can be installed along with PyMongo:
$ python -m pip install pymongo[tls]
Note
Users of Python versions older than 2.7.9 will also receive the dependencies for OCSP when using the tls extra.
OCSP requires PyOpenSSL, requests and service_identity:
$ python -m pip install pymongo[ocsp]
Wire protocol compression with snappy requires python-snappy:
$ python -m pip install pymongo[snappy]
Wire protocol compression with zstandard requires zstandard:
$ python -m pip install pymongo[zstd]
Client-Side Field Level Encryption requires pymongocrypt:
$ python -m pip install pymongo[encryption]
You can install all dependencies automatically with the following command:
$ python -m pip install pymongo[gssapi,aws,ocsp,snappy,srv,tls,zstd,encryption]
Other optional packages:
- backports.pbkdf2, improves authentication performance with SCRAM-SHA-1 and SCRAM-SHA-256. It especially improves performance on Python versions older than 2.7.8.
- monotonic adds support for a monotonic clock, which improves reliability in environments where clock adjustments are frequent. Not needed in Python 3.
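Neither of these packages is covered by a pymongo pip extra; if you want them, a simple approach is to install them directly:
$ python -m pip install backports.pbkdf2 monotonic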
Installing from source¶
If you’d rather install directly from the source (i.e. to stay on the bleeding edge), install the C extension dependencies then check out the latest source from GitHub and install the driver from the resulting tree:
$ git clone git://github.com/mongodb/mongo-python-driver.git pymongo
$ cd pymongo/
$ python setup.py install
Installing from source on Unix¶
To build the optional C extensions on Linux or another non-macOS Unix you must have the GNU C compiler (gcc) installed. Depending on your flavor of Unix (or Linux distribution) you may also need a python development package that provides the necessary header files for your version of Python. The package name may vary from distro to distro.
Debian and Ubuntu users should issue the following command:
$ sudo apt-get install build-essential python-dev
Users of Red Hat based distributions (RHEL, CentOS, Amazon Linux, Oracle Linux, Fedora, etc.) should issue the following command:
$ sudo yum install gcc python-devel
Installing from source on macOS / OSX¶
If you want to install PyMongo with C extensions from source you will need the command line developer tools. On modern versions of macOS they can be installed by running the following in Terminal (found in /Applications/Utilities/):
xcode-select --install
For older versions of OSX you may need Xcode. See the notes below for various OSX and Xcode versions.
Snow Leopard (10.6) - Xcode 3 with ‘UNIX Development Support’.
Snow Leopard Xcode 4: The Python versions shipped with OSX 10.6.x are universal binaries. They support i386, PPC, and x86_64. Xcode 4 removed support for PPC, causing the distutils version shipped with Apple’s builds of Python to fail to build the C extensions if you have Xcode 4 installed. There is a workaround:
# For some Python builds from python.org
$ env ARCHFLAGS='-arch i386 -arch x86_64' python -m easy_install pymongo
See http://bugs.python.org/issue11623 for a more detailed explanation.
Lion (10.7) and newer - PyMongo’s C extensions can be built against versions of Python 2.7 >= 2.7.4 or Python 3.4+ downloaded from python.org. In all cases Xcode must be installed with ‘UNIX Development Support’.
Xcode 5.1: Starting with version 5.1 the version of clang that ships with Xcode throws an error when it encounters compiler flags it doesn’t recognize. This may cause C extension builds to fail with an error similar to:
clang: error: unknown argument: '-mno-fused-madd' [-Wunused-command-line-argument-hard-error-in-future]
There are workarounds:
# Apple specified workaround for Xcode 5.1
# easy_install
$ ARCHFLAGS=-Wno-error=unused-command-line-argument-hard-error-in-future easy_install pymongo
# or pip
$ ARCHFLAGS=-Wno-error=unused-command-line-argument-hard-error-in-future pip install pymongo
# Alternative workaround using CFLAGS
# easy_install
$ CFLAGS=-Qunused-arguments easy_install pymongo
# or pip
$ CFLAGS=-Qunused-arguments pip install pymongo
Installing from source on Windows¶
If you want to install PyMongo with C extensions from source the following requirements apply to both CPython and ActiveState’s ActivePython:
64-bit Windows¶
For Python 3.5 and newer install Visual Studio 2015. For Python 3.4 install Visual Studio 2010. You must use the full version of Visual Studio 2010 as Visual C++ Express does not provide 64-bit compilers. Make sure that you check the “x64 Compilers and Tools” option under Visual C++. For Python 2.7 install the Microsoft Visual C++ Compiler for Python 2.7.
32-bit Windows¶
For Python 3.5 and newer install Visual Studio 2015.
For Python 3.4 install Visual C++ 2010 Express.
For Python 2.7 install the Microsoft Visual C++ Compiler for Python 2.7
Installing Without C Extensions¶
By default, the driver attempts to build and install optional C extensions (used for increasing performance) when it is installed. If any extension fails to build the driver will be installed anyway but a warning will be printed.
If you wish to install PyMongo without the C extensions, even if the extensions build properly, it can be done using a command line option to setup.py:
$ python setup.py --no_ext install
Building PyMongo egg Packages¶
Some organizations do not allow compilers and other build tools on production systems. To install PyMongo on these systems with C extensions you may need to build custom egg packages. Make sure that you have installed the dependencies listed above for your operating system then run the following command in the PyMongo source directory:
$ python setup.py bdist_egg
The egg package can be found in the dist/ subdirectory. The file name will resemble “pymongo-3.6-py2.7-linux-x86_64.egg” but may have a different name depending on your platform and the version of python you use to compile.
Warning
These “binary distributions” will only work on systems that resemble the environment on which you built the package. In other words, ensure that the operating system, version of Python, and architecture (i.e. 32- or 64-bit) match.
Copy this file to the target system and issue the following command to install the package:
$ sudo python -m easy_install pymongo-3.6-py2.7-linux-x86_64.egg
Installing a beta or release candidate¶
MongoDB, Inc. may occasionally tag a beta or release candidate for testing by the community before final release. These releases will not be uploaded to pypi but can be found on the GitHub tags page. They can be installed by passing the full URL for the tag to pip:
$ python -m pip install https://github.com/mongodb/mongo-python-driver/archive/3.11.0rc0.tar.gz
Tutorial¶
This tutorial is intended as an introduction to working with MongoDB and PyMongo.
Prerequisites¶
Before we start, make sure that you have the PyMongo distribution installed. In the Python shell, the following should run without raising an exception:
>>> import pymongo
This tutorial also assumes that a MongoDB instance is running on the default host and port. Assuming you have downloaded and installed MongoDB, you can start it like so:
$ mongod
Making a Connection with MongoClient¶
The first step when working with PyMongo is to create a MongoClient to the running mongod instance. Doing so is easy:
>>> from pymongo import MongoClient
>>> client = MongoClient()
The above code will connect on the default host and port. We can also specify the host and port explicitly, as follows:
>>> client = MongoClient('localhost', 27017)
Or use the MongoDB URI format:
>>> client = MongoClient('mongodb://localhost:27017/')
Getting a Database¶
A single instance of MongoDB can support multiple independent databases. When working with PyMongo you access databases using attribute style access on MongoClient instances:
>>> db = client.test_database
If your database name is such that using attribute style access won’t work (like test-database), you can use dictionary style access instead:
>>> db = client['test-database']
Getting a Collection¶
A collection is a group of documents stored in MongoDB, and can be thought of as roughly the equivalent of a table in a relational database. Getting a collection in PyMongo works the same as getting a database:
>>> collection = db.test_collection
or (using dictionary style access):
>>> collection = db['test-collection']
An important note about collections (and databases) in MongoDB is that they are created lazily - none of the above commands have actually performed any operations on the MongoDB server. Collections and databases are created when the first document is inserted into them.
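For example, assuming a server with a freshly created test_database, nothing exists on the server yet:
>>> db.list_collection_names()
[]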
Documents¶
Data in MongoDB is represented (and stored) using JSON-style documents. In PyMongo we use dictionaries to represent documents. As an example, the following dictionary might be used to represent a blog post:
>>> import datetime
>>> post = {"author": "Mike",
... "text": "My first blog post!",
... "tags": ["mongodb", "python", "pymongo"],
... "date": datetime.datetime.utcnow()}
Note that documents can contain native Python types (like datetime.datetime instances) which will be automatically converted to and from the appropriate BSON types.
Inserting a Document¶
To insert a document into a collection we can use the insert_one() method:
>>> posts = db.posts
>>> post_id = posts.insert_one(post).inserted_id
>>> post_id
ObjectId('...')
When a document is inserted a special key, "_id", is automatically added if the document doesn’t already contain an "_id" key. The value of "_id" must be unique across the collection. insert_one() returns an instance of InsertOneResult. For more information on "_id", see the documentation on _id.
After inserting the first document, the posts collection has actually been created on the server. We can verify this by listing all of the collections in our database:
>>> db.list_collection_names()
[u'posts']
Getting a Single Document With find_one()¶
The most basic type of query that can be performed in MongoDB is find_one(). This method returns a single document matching a query (or None if there are no matches). It is useful when you know there is only one matching document, or are only interested in the first match. Here we use find_one() to get the first document from the posts collection:
>>> import pprint
>>> pprint.pprint(posts.find_one())
{u'_id': ObjectId('...'),
u'author': u'Mike',
u'date': datetime.datetime(...),
u'tags': [u'mongodb', u'python', u'pymongo'],
u'text': u'My first blog post!'}
The result is a dictionary matching the one that we inserted previously.
Note
The returned document contains an "_id", which was automatically added on insert.
find_one() also supports querying on specific elements that the resulting document must match. To limit our results to a document with author “Mike” we do:
>>> pprint.pprint(posts.find_one({"author": "Mike"}))
{u'_id': ObjectId('...'),
u'author': u'Mike',
u'date': datetime.datetime(...),
u'tags': [u'mongodb', u'python', u'pymongo'],
u'text': u'My first blog post!'}
If we try with a different author, like “Eliot”, we’ll get no result:
>>> posts.find_one({"author": "Eliot"})
>>>
Querying By ObjectId¶
We can also find a post by its _id, which in our example is an ObjectId:
>>> post_id
ObjectId(...)
>>> pprint.pprint(posts.find_one({"_id": post_id}))
{u'_id': ObjectId('...'),
u'author': u'Mike',
u'date': datetime.datetime(...),
u'tags': [u'mongodb', u'python', u'pymongo'],
u'text': u'My first blog post!'}
Note that an ObjectId is not the same as its string representation:
>>> post_id_as_str = str(post_id)
>>> posts.find_one({"_id": post_id_as_str}) # No result
>>>
A common task in web applications is to get an ObjectId from the request URL and find the matching document. It’s necessary in this case to convert the ObjectId from a string before passing it to find_one:
from bson.objectid import ObjectId

# The web framework gets post_id from the URL and passes it as a string
def get(post_id):
    # Convert from string to ObjectId:
    document = client.db.collection.find_one({'_id': ObjectId(post_id)})
A Note On Unicode Strings¶
You probably noticed that the regular Python strings we stored earlier look different when retrieved from the server (e.g. u'Mike' instead of 'Mike'). A short explanation is in order.
MongoDB stores data in BSON format. BSON strings are UTF-8 encoded so PyMongo must ensure that any strings it stores contain only valid UTF-8 data. Regular strings (<type 'str'>) are validated and stored unaltered. Unicode strings (<type 'unicode'>) are encoded UTF-8 first. The reason our example string is represented in the Python shell as u'Mike' instead of 'Mike' is that PyMongo decodes each BSON string to a Python unicode string, not a regular str.
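As a quick check (on Python 2, and assuming the post inserted above), the retrieved value really is a unicode instance:
>>> doc = posts.find_one({"author": "Mike"})
>>> isinstance(doc["author"], unicode)
True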
Bulk Inserts¶
In order to make querying a little more interesting, let’s insert a few more documents. In addition to inserting a single document, we can also perform bulk insert operations, by passing a list as the first argument to insert_many(). This will insert each document in the list, sending only a single command to the server:
>>> new_posts = [{"author": "Mike",
... "text": "Another post!",
... "tags": ["bulk", "insert"],
... "date": datetime.datetime(2009, 11, 12, 11, 14)},
... {"author": "Eliot",
... "title": "MongoDB is fun",
... "text": "and pretty easy too!",
... "date": datetime.datetime(2009, 11, 10, 10, 45)}]
>>> result = posts.insert_many(new_posts)
>>> result.inserted_ids
[ObjectId('...'), ObjectId('...')]
There are a couple of interesting things to note about this example:
- The result from insert_many() now returns two ObjectId instances, one for each inserted document.
- new_posts[1] has a different “shape” than the other posts - there is no "tags" field and we’ve added a new field, "title". This is what we mean when we say that MongoDB is schema-free.
Querying for More Than One Document¶
To get more than a single document as the result of a query we use the find() method. find() returns a Cursor instance, which allows us to iterate over all matching documents. For example, we can iterate over every document in the posts collection:
>>> for post in posts.find():
... pprint.pprint(post)
...
{u'_id': ObjectId('...'),
u'author': u'Mike',
u'date': datetime.datetime(...),
u'tags': [u'mongodb', u'python', u'pymongo'],
u'text': u'My first blog post!'}
{u'_id': ObjectId('...'),
u'author': u'Mike',
u'date': datetime.datetime(...),
u'tags': [u'bulk', u'insert'],
u'text': u'Another post!'}
{u'_id': ObjectId('...'),
u'author': u'Eliot',
u'date': datetime.datetime(...),
u'text': u'and pretty easy too!',
u'title': u'MongoDB is fun'}
Just like we did with find_one(), we can pass a document to find() to limit the returned results. Here, we get only those documents whose author is “Mike”:
>>> for post in posts.find({"author": "Mike"}):
... pprint.pprint(post)
...
{u'_id': ObjectId('...'),
u'author': u'Mike',
u'date': datetime.datetime(...),
u'tags': [u'mongodb', u'python', u'pymongo'],
u'text': u'My first blog post!'}
{u'_id': ObjectId('...'),
u'author': u'Mike',
u'date': datetime.datetime(...),
u'tags': [u'bulk', u'insert'],
u'text': u'Another post!'}
Counting¶
If we just want to know how many documents match a query we can perform a count_documents() operation instead of a full query. We can get a count of all of the documents in a collection:
>>> posts.count_documents({})
3
or just of those documents that match a specific query:
>>> posts.count_documents({"author": "Mike"})
2
Range Queries¶
MongoDB supports many different types of advanced queries. As an example, let’s perform a query where we limit results to posts older than a certain date, but also sort the results by author:
>>> d = datetime.datetime(2009, 11, 12, 12)
>>> for post in posts.find({"date": {"$lt": d}}).sort("author"):
... pprint.pprint(post)
...
{u'_id': ObjectId('...'),
u'author': u'Eliot',
u'date': datetime.datetime(...),
u'text': u'and pretty easy too!',
u'title': u'MongoDB is fun'}
{u'_id': ObjectId('...'),
u'author': u'Mike',
u'date': datetime.datetime(...),
u'tags': [u'bulk', u'insert'],
u'text': u'Another post!'}
Here we use the special "$lt" operator to do a range query, and also call sort() to sort the results by author.
Indexing¶
Adding indexes can help accelerate certain queries and can also add additional functionality to querying and storing documents. In this example, we’ll demonstrate how to create a unique index on a key that rejects documents whose value for that key already exists in the index.
First, we’ll need to create the index:
>>> result = db.profiles.create_index([('user_id', pymongo.ASCENDING)],
... unique=True)
>>> sorted(list(db.profiles.index_information()))
[u'_id_', u'user_id_1']
Notice that we have two indexes now: one is the index on _id that MongoDB creates automatically, and the other is the index on user_id we just created.
Now let’s set up some user profiles:
>>> user_profiles = [
... {'user_id': 211, 'name': 'Luke'},
... {'user_id': 212, 'name': 'Ziltoid'}]
>>> result = db.profiles.insert_many(user_profiles)
The index prevents us from inserting a document whose user_id is already in the collection:
>>> new_profile = {'user_id': 213, 'name': 'Drew'}
>>> duplicate_profile = {'user_id': 212, 'name': 'Tommy'}
>>> result = db.profiles.insert_one(new_profile) # This is fine.
>>> result = db.profiles.insert_one(duplicate_profile)
Traceback (most recent call last):
DuplicateKeyError: E11000 duplicate key error index: test_database.profiles.$user_id_1 dup key: { : 212 }
See also
The MongoDB documentation on indexes
Examples¶
The examples in this section are intended to give in depth overviews of how to accomplish specific tasks with MongoDB and PyMongo.
Unless otherwise noted, all examples assume that a MongoDB instance is running on the default host and port. Assuming you have downloaded and installed MongoDB, you can start it like so:
$ mongod
Aggregation Examples¶
There are several methods of performing aggregations in MongoDB. These examples cover the aggregation framework and map/reduce.
Setup¶
To start, we’ll insert some example data which we can perform aggregations on:
>>> from pymongo import MongoClient
>>> db = MongoClient().aggregation_example
>>> result = db.things.insert_many([{"x": 1, "tags": ["dog", "cat"]},
... {"x": 2, "tags": ["cat"]},
... {"x": 2, "tags": ["mouse", "cat", "dog"]},
... {"x": 3, "tags": []}])
>>> result.inserted_ids
[ObjectId('...'), ObjectId('...'), ObjectId('...'), ObjectId('...')]
Aggregation Framework¶
This example shows how to use the aggregate() method to run an aggregation framework pipeline. We’ll perform a simple aggregation to count the number of occurrences for each tag in the tags array, across the entire collection. To achieve this we need to pass in three operations to the pipeline: first we unwind the tags array, then we group by the tags and sum them up, and finally we sort by count.
As Python dictionaries don’t maintain order, you should use SON or collections.OrderedDict where explicit ordering is required, e.g. "$sort":
Note
aggregate requires server version >= 2.1.0.
>>> from bson.son import SON
>>> pipeline = [
... {"$unwind": "$tags"},
... {"$group": {"_id": "$tags", "count": {"$sum": 1}}},
... {"$sort": SON([("count", -1), ("_id", -1)])}
... ]
>>> import pprint
>>> pprint.pprint(list(db.things.aggregate(pipeline)))
[{u'_id': u'cat', u'count': 3},
{u'_id': u'dog', u'count': 2},
{u'_id': u'mouse', u'count': 1}]
To run an explain plan for this aggregation use the command() method:
>>> db.command('aggregate', 'things', pipeline=pipeline, explain=True)
{u'ok': 1.0, u'stages': [...]}
As well as simple aggregations the aggregation framework provides projection capabilities to reshape the returned data. Using projections and aggregation, you can add computed fields, create new virtual sub-objects, and extract sub-fields into the top-level of results.
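As a sketch of such a projection (run against the example collection above; the $size operator requires MongoDB 2.6+, and the output shown assumes documents come back in insertion order), the following pipeline suppresses _id and adds a computed num_tags field:
>>> pipeline = [
...     {"$project": {"_id": 0, "x": 1, "num_tags": {"$size": "$tags"}}}
... ]
>>> pprint.pprint(list(db.things.aggregate(pipeline)))
[{u'num_tags': 2, u'x': 1},
 {u'num_tags': 1, u'x': 2},
 {u'num_tags': 3, u'x': 2},
 {u'num_tags': 0, u'x': 3}]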
See also
The full documentation for MongoDB’s aggregation framework
Map/Reduce¶
Another option for aggregation is to use the map reduce framework. Here we will define map and reduce functions to also count the number of occurrences for each tag in the tags array, across the entire collection.
Our map function just emits a single (key, 1) pair for each tag in the array:
>>> from bson.code import Code
>>> mapper = Code("""
...     function () {
...         this.tags.forEach(function(z) {
...             emit(z, 1);
...         });
...     }
... """)
The reduce function sums over all of the emitted values for a given key:
>>> reducer = Code("""
...     function (key, values) {
...         var total = 0;
...         for (var i = 0; i < values.length; i++) {
...             total += values[i];
...         }
...         return total;
...     }
... """)
Note
We can’t just return values.length as the reduce function might be called iteratively on the results of other reduce steps.
Finally, we call map_reduce() and iterate over the result collection:
>>> result = db.things.map_reduce(mapper, reducer, "myresults")
>>> for doc in result.find().sort("_id"):
... pprint.pprint(doc)
...
{u'_id': u'cat', u'value': 3.0}
{u'_id': u'dog', u'value': 2.0}
{u'_id': u'mouse', u'value': 1.0}
Advanced Map/Reduce¶
PyMongo’s API supports all of the features of MongoDB’s map/reduce engine. One interesting feature is the ability to get more detailed results when desired, by passing full_response=True to map_reduce(). This returns the full response to the map/reduce command, rather than just the result collection:
>>> pprint.pprint(
... db.things.map_reduce(mapper, reducer, "myresults", full_response=True))
{...u'ok': 1.0,... u'result': u'myresults'...}
All of the optional map/reduce parameters are also supported, simply pass them as keyword arguments. In this example we use the query parameter to limit the documents that will be mapped over:
>>> results = db.things.map_reduce(
... mapper, reducer, "myresults", query={"x": {"$lt": 2}})
>>> for doc in results.find().sort("_id"):
... pprint.pprint(doc)
...
{u'_id': u'cat', u'value': 1.0}
{u'_id': u'dog', u'value': 1.0}
You can use SON or collections.OrderedDict to specify a different database to store the result collection:
>>> from bson.son import SON
>>> pprint.pprint(
... db.things.map_reduce(
... mapper,
... reducer,
... out=SON([("replace", "results"), ("db", "outdb")]),
... full_response=True))
{...u'ok': 1.0,... u'result': {u'collection': u'results', u'db': u'outdb'}...}
See also
The full list of options for MongoDB’s map reduce engine
Authentication Examples¶
MongoDB supports several different authentication mechanisms. These examples cover all authentication methods currently supported by PyMongo, documenting Python module and MongoDB version dependencies.
Percent-Escaping Username and Password¶
Username and password must be percent-escaped with
urllib.parse.quote_plus()
in Python 3, or urllib.quote_plus()
in
Python 2, to be used in a MongoDB URI. For example, in Python 3:
>>> from pymongo import MongoClient
>>> import urllib.parse
>>> username = urllib.parse.quote_plus('user')
>>> username
'user'
>>> password = urllib.parse.quote_plus('pass/word')
>>> password
'pass%2Fword'
>>> MongoClient('mongodb://%s:%s@127.0.0.1' % (username, password))
...
SCRAM-SHA-256 (RFC 7677)¶
New in version 3.7.
SCRAM-SHA-256 is the default authentication mechanism supported by a cluster configured for authentication with MongoDB 4.0 or later. Authentication requires a username, a password, and a database name. The default database name is “admin”; this can be overridden with the authSource option. Credentials can be specified as arguments to MongoClient:
>>> from pymongo import MongoClient
>>> client = MongoClient('example.com',
... username='user',
... password='password',
... authSource='the_database',
... authMechanism='SCRAM-SHA-256')
Or through the MongoDB URI:
>>> uri = "mongodb://user:password@example.com/?authSource=the_database&authMechanism=SCRAM-SHA-256"
>>> client = MongoClient(uri)
SCRAM-SHA-1 (RFC 5802)¶
New in version 2.8.
SCRAM-SHA-1 is the default authentication mechanism supported by a cluster configured for authentication with MongoDB 3.0 or later. Authentication requires a username, a password, and a database name. The default database name is “admin”; this can be overridden with the authSource option. Credentials can be specified as arguments to MongoClient:
>>> from pymongo import MongoClient
>>> client = MongoClient('example.com',
... username='user',
... password='password',
... authSource='the_database',
... authMechanism='SCRAM-SHA-1')
Or through the MongoDB URI:
>>> uri = "mongodb://user:password@example.com/?authSource=the_database&authMechanism=SCRAM-SHA-1"
>>> client = MongoClient(uri)
For best performance on Python versions older than 2.7.8 install backports.pbkdf2.
MONGODB-CR¶
Warning
MONGODB-CR was deprecated with the release of MongoDB 3.6 and is no longer supported by MongoDB 4.0.
Before MongoDB 3.0 the default authentication mechanism was MONGODB-CR, the “MongoDB Challenge-Response” protocol:
>>> from pymongo import MongoClient
>>> client = MongoClient('example.com',
... username='user',
... password='password',
... authMechanism='MONGODB-CR')
>>>
>>> uri = "mongodb://user:password@example.com/?authSource=the_database&authMechanism=MONGODB-CR"
>>> client = MongoClient(uri)
Default Authentication Mechanism¶
If no mechanism is specified, PyMongo automatically uses MONGODB-CR when connected to a pre-3.0 version of MongoDB, SCRAM-SHA-1 when connected to MongoDB 3.0 through 3.6, and negotiates the mechanism to use (SCRAM-SHA-1 or SCRAM-SHA-256) when connected to MongoDB 4.0+.
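In practice this means you can usually omit authMechanism entirely and let PyMongo choose an appropriate mechanism for the server version:
>>> from pymongo import MongoClient
>>> client = MongoClient('example.com',
...                      username='user',
...                      password='password',
...                      authSource='the_database')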
Default Database and “authSource”¶
You can specify both a default database and the authentication database in the URI:
>>> uri = "mongodb://user:password@example.com/default_db?authSource=admin"
>>> client = MongoClient(uri)
PyMongo will authenticate on the “admin” database, but the default database will be “default_db”:
>>> # get_database with no "name" argument chooses the DB from the URI
>>> db = MongoClient(uri).get_database()
>>> print(db.name)
'default_db'
MONGODB-X509¶
New in version 2.6.
The MONGODB-X509 mechanism authenticates a username derived from the distinguished subject name of the X.509 certificate presented by the driver during TLS/SSL negotiation. This authentication method requires the use of TLS/SSL connections with certificate validation and is available in MongoDB 2.6 and newer:
>>> from pymongo import MongoClient
>>> client = MongoClient('example.com',
...                      username="<X.509 derived username>",
...                      authMechanism="MONGODB-X509",
...                      tls=True,
...                      tlsCertificateKeyFile='/path/to/client.pem',
...                      tlsCAFile='/path/to/ca.pem')
MONGODB-X509 authenticates against the $external virtual database, so you do not have to specify a database in the URI:
>>> uri = "mongodb://<X.509 derived username>@example.com/?authMechanism=MONGODB-X509"
>>> client = MongoClient(uri,
... tls=True,
... tlsCertificateKeyFile='/path/to/client.pem',
... tlsCAFile='/path/to/ca.pem')
>>>
Changed in version 3.4: When connected to MongoDB >= 3.4 the username is no longer required.
GSSAPI (Kerberos)¶
New in version 2.5.
GSSAPI (Kerberos) authentication is available in the Enterprise Edition of MongoDB.
Unix¶
To authenticate using GSSAPI you must first install the python kerberos or pykerberos module using easy_install or pip. Make sure you run kinit before using the following authentication methods:
$ kinit mongodbuser@EXAMPLE.COM
mongodbuser@EXAMPLE.COM's Password:
$ klist
Credentials cache: FILE:/tmp/krb5cc_1000
Principal: mongodbuser@EXAMPLE.COM
Issued Expires Principal
Feb 9 13:48:51 2013 Feb 9 23:48:51 2013 krbtgt/EXAMPLE.COM@EXAMPLE.COM
Now authenticate using the MongoDB URI. GSSAPI authenticates against the $external virtual database so you do not have to specify a database in the URI:
>>> # Note: the kerberos principal must be url encoded.
>>> from pymongo import MongoClient
>>> uri = "mongodb://mongodbuser%40EXAMPLE.COM@mongo-server.example.com/?authMechanism=GSSAPI"
>>> client = MongoClient(uri)
>>>
The default service name used by MongoDB and PyMongo is mongodb. You can specify a custom service name with the authMechanismProperties option:
>>> from pymongo import MongoClient
>>> uri = "mongodb://mongodbuser%40EXAMPLE.COM@mongo-server.example.com/?authMechanism=GSSAPI&authMechanismProperties=SERVICE_NAME:myservicename"
>>> client = MongoClient(uri)
Windows (SSPI)¶
New in version 3.3.
First install the winkerberos module. Unlike authentication on Unix, kinit is not used. If the user to authenticate is different from the user that owns the application process, provide a password to authenticate:
>>> uri = "mongodb://mongodbuser%40EXAMPLE.COM:mongodbuserpassword@example.com/?authMechanism=GSSAPI"
Two extra authMechanismProperties are supported on Windows platforms:
CANONICALIZE_HOST_NAME - Uses the fully qualified domain name (FQDN) of the MongoDB host for the server principal (GSSAPI libraries on Unix do this by default):
>>> uri = "mongodb://mongodbuser%40EXAMPLE.COM@example.com/?authMechanism=GSSAPI&authMechanismProperties=CANONICALIZE_HOST_NAME:true"
SERVICE_REALM - This is used when the user’s realm is different from the service’s realm:
>>> uri = "mongodb://mongodbuser%40EXAMPLE.COM@example.com/?authMechanism=GSSAPI&authMechanismProperties=SERVICE_REALM:otherrealm"
SASL PLAIN (RFC 4616)¶
New in version 2.6.
MongoDB Enterprise Edition version 2.6 and newer support the SASL PLAIN authentication mechanism, initially intended for delegating authentication to an LDAP server. Using the PLAIN mechanism is very similar to MONGODB-CR. These examples use the $external virtual database for LDAP support:
>>> from pymongo import MongoClient
>>> uri = "mongodb://user:password@example.com/?authMechanism=PLAIN"
>>> client = MongoClient(uri)
>>>
SASL PLAIN is a clear-text authentication mechanism. We strongly recommend that you connect to MongoDB using TLS/SSL with certificate validation when using the SASL PLAIN mechanism:
>>> from pymongo import MongoClient
>>> uri = "mongodb://user:password@example.com/?authMechanism=PLAIN"
>>> client = MongoClient(uri,
... tls=True,
... tlsCertificateKeyFile='/path/to/client.pem',
... tlsCAFile='/path/to/ca.pem')
>>>
MONGODB-AWS¶
New in version 3.11.
The MONGODB-AWS authentication mechanism is available in MongoDB 4.4+ and requires extra pymongo dependencies. To use it, install pymongo with the aws extra:
$ python -m pip install 'pymongo[aws]'
The MONGODB-AWS mechanism authenticates using AWS IAM credentials (an access key ID and a secret access key), temporary AWS IAM credentials obtained from an AWS Security Token Service (STS) Assume Role request, AWS Lambda environment variables, or temporary AWS IAM credentials assigned to an EC2 instance or ECS task. The use of temporary credentials, in addition to an access key ID and a secret access key, also requires a security (or session) token.
Credentials can be configured through the MongoDB URI, environment variables, or the local EC2 or ECS endpoint. The order in which the client searches for credentials is:
- Credentials passed through the URI
- Environment variables
- ECS endpoint if and only if AWS_CONTAINER_CREDENTIALS_RELATIVE_URI is set
- EC2 endpoint
MONGODB-AWS authenticates against the “$external” virtual database, so none of the URIs in this section need to include the authSource URI option.
AWS IAM credentials¶
Applications can authenticate using AWS IAM credentials by providing a valid access key id and secret access key pair as the username and password, respectively, in the MongoDB URI. A sample URI would be:
>>> from pymongo import MongoClient
>>> uri = "mongodb://<access_key_id>:<secret_access_key>@localhost/?authMechanism=MONGODB-AWS"
>>> client = MongoClient(uri)
Note
The access_key_id and secret_access_key passed into the URI MUST be percent escaped.
AssumeRole¶
Applications can authenticate using temporary credentials returned from an assume role request. These temporary credentials consist of an access key ID, a secret access key, and a security token passed into the URI. A sample URI would be:
>>> from pymongo import MongoClient
>>> uri = "mongodb://<access_key_id>:<secret_access_key>@example.com/?authMechanism=MONGODB-AWS&authMechanismProperties=AWS_SESSION_TOKEN:<session_token>"
>>> client = MongoClient(uri)
Note
The access_key_id, secret_access_key, and session_token passed into the URI MUST be percent escaped.
AWS Lambda (Environment Variables)¶
When the username and password are not provided and the MONGODB-AWS mechanism is set, the client will fall back to using the environment variables AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_SESSION_TOKEN for the access key ID, secret access key, and session token, respectively:
$ export AWS_ACCESS_KEY_ID=<access_key_id>
$ export AWS_SECRET_ACCESS_KEY=<secret_access_key>
$ export AWS_SESSION_TOKEN=<session_token>
$ python
>>> from pymongo import MongoClient
>>> uri = "mongodb://example.com/?authMechanism=MONGODB-AWS"
>>> client = MongoClient(uri)
Note
No username, password, or session token is passed into the URI. PyMongo will use credentials set via the environment variables. These environment variables MUST NOT be percent escaped.
ECS Container¶
Applications can authenticate from an ECS container via temporary credentials assigned to the machine. A sample URI on an ECS container would be:
>>> from pymongo import MongoClient
>>> uri = "mongodb://localhost/?authMechanism=MONGODB-AWS"
>>> client = MongoClient(uri)
Note
No username, password, or session token is passed into the URI. PyMongo will query the ECS container endpoint to obtain these credentials.
EC2 Instance¶
Applications can authenticate from an EC2 instance via temporary credentials assigned to the machine. A sample URI on an EC2 machine would be:
>>> from pymongo import MongoClient
>>> uri = "mongodb://localhost/?authMechanism=MONGODB-AWS"
>>> client = MongoClient(uri)
Note
No username, password, or session token is passed into the URI. PyMongo will query the EC2 instance endpoint to obtain these credentials.
Collations¶
See also
The API docs for collation.
Collations are a new feature in MongoDB version 3.4. They provide a set of rules to use when comparing strings that comply with the conventions of a particular language, such as Spanish or German. If no collation is specified, the server sorts strings based on a binary comparison. Many languages have specific ordering rules, and collations allow users to build applications that adhere to language-specific comparison rules.
In French, for example, the last accent in a given word determines the sorting order. The correct sorting order for the following four words in French is:
cote < côte < coté < côté
Specifying a French collation allows users to sort string fields using the French sort order.
Usage¶
Users can specify a collation for a collection, an index, or a CRUD command.
Collation Parameters:¶
Collations can be specified with the Collation model or with plain Python dictionaries. The structure is the same:
Collation(locale=<string>,
          caseLevel=<bool>,
          caseFirst=<string>,
          strength=<int>,
          numericOrdering=<bool>,
          alternate=<string>,
          maxVariable=<string>,
          backwards=<bool>)
The only required parameter is locale, which the server parses as an ICU format locale ID. For example, set locale to en_US to represent US English or fr_CA to represent Canadian French.
For a complete description of the available parameters, see the MongoDB manual.
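For example, the same options can be passed as a plain dictionary instead of a Collation instance (a sketch; a strength of 2 corresponds to CollationStrength.SECONDARY):
collation = {'locale': 'fr_CA', 'strength': 2}
Anywhere the examples below pass a Collation instance, such a dictionary works as well.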
Assign a Default Collation to a Collection¶
The following example demonstrates how to create a new collection called contacts and assign a default collation with the fr_CA locale. This operation ensures that all queries that are run against the contacts collection use the fr_CA collation unless another collation is explicitly specified:
from pymongo import MongoClient
from pymongo.collation import Collation
db = MongoClient().test
collection = db.create_collection('contacts',
                                  collation=Collation(locale='fr_CA'))
Assign a Default Collation to an Index¶
When creating a new index, you can specify a default collation. The following example shows how to create an index on the name field of the contacts collection, with the unique parameter enabled and a default collation with locale set to fr_CA:
from pymongo import MongoClient
from pymongo.collation import Collation
contacts = MongoClient().test.contacts
contacts.create_index('name',
                      unique=True,
                      collation=Collation(locale='fr_CA'))
Specify a Collation for a Query¶
Individual queries can specify a collation to use when sorting results. The following example demonstrates a query that runs on the contacts collection in database test. It matches on documents that contain New York in the city field, and sorts on the name field with the fr_CA collation:
from pymongo import MongoClient
from pymongo.collation import Collation
collection = MongoClient().test.contacts
docs = collection.find({'city': 'New York'}).sort('name').collation(
    Collation(locale='fr_CA'))
Other Query Types¶
You can use collations to control document matching rules for several different types of queries. All the various update and delete methods (update_one(), update_many(), delete_one(), etc.) support collation, and you can create query filters which employ collations to comply with any of the languages and variants available to the locale parameter.
The following example uses a collation with strength set to SECONDARY, which considers only the base character and character accents in string comparisons, but not case sensitivity, for example. All documents in the contacts collection with jürgen (case-insensitive) in the first_name field are updated:
from pymongo import MongoClient
from pymongo.collation import Collation, CollationStrength
contacts = MongoClient().test.contacts
result = contacts.update_many(
    {'first_name': 'jürgen'},
    {'$set': {'verified': 1}},
    collation=Collation(locale='de',
                        strength=CollationStrength.SECONDARY))
Copying a Database¶
To copy a database within a single mongod process, or between mongod servers, simply connect to the target mongod and use the command() method:
>>> from pymongo import MongoClient
>>> client = MongoClient('target.example.com')
>>> client.admin.command('copydb',
...                      fromdb='source_db_name',
...                      todb='target_db_name')
To copy from a different mongod server that is not password-protected:
>>> client.admin.command('copydb',
...                      fromdb='source_db_name',
...                      todb='target_db_name',
...                      fromhost='source.example.com')
If the target server is password-protected, authenticate to the “admin” database:
>>> client = MongoClient('target.example.com',
... username='administrator',
... password='pwd')
>>> client.admin.command('copydb',
...                      fromdb='source_db_name',
...                      todb='target_db_name',
...                      fromhost='source.example.com')
See the authentication examples.
If the source server is password-protected, use the copyDatabase function in the mongo shell.
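A sketch of that shell workflow, using the mongo shell’s db.copyDatabase(fromdb, todb, fromhost, username, password) helper (the credentials here authenticate against the source host and are placeholders):
$ mongo target.example.com
> db.copyDatabase('source_db_name', 'target_db_name', 'source.example.com', 'administrator', 'pwd')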
Versions of PyMongo before 3.0 included a copy_database helper method, but it has been removed.
Custom Type Example¶
This is an example of using a custom type with PyMongo. The example here shows how to subclass TypeCodec to write a type codec, which is used to populate a TypeRegistry. The type registry can then be used to create a custom-type-aware Collection. Read and write operations issued against the resulting collection object transparently manipulate documents as they are saved to or retrieved from MongoDB.
Setting Up¶
We’ll start by getting a clean database to use for the example:
>>> from pymongo import MongoClient
>>> client = MongoClient()
>>> client.drop_database('custom_type_example')
>>> db = client.custom_type_example
Since the purpose of the example is to demonstrate working with custom types, we’ll need a custom data type to use. For this example, we will be working with the Decimal type from Python’s standard library. Since the BSON library’s Decimal128 type (which implements the IEEE 754 decimal128 decimal-based floating-point numbering format) is distinct from Python’s built-in Decimal type, attempting to save an instance of Decimal with PyMongo results in an InvalidDocument exception.
>>> from decimal import Decimal
>>> num = Decimal("45.321")
>>> db.test.insert_one({'num': num})
Traceback (most recent call last):
...
bson.errors.InvalidDocument: cannot encode object: Decimal('45.321'), of type: <class 'decimal.Decimal'>
The TypeCodec Class¶
New in version 3.8.
In order to encode a custom type, we must first define a type codec for that type. A type codec describes how an instance of a custom type can be transformed to and/or from one of the types bson already understands. Depending on the desired functionality, users must choose from the following base classes when defining type codecs:
- TypeEncoder: subclass this to define a codec that encodes a custom Python type to a known BSON type. Users must implement the python_type property/attribute and the transform_python method.
- TypeDecoder: subclass this to define a codec that decodes a specified BSON type into a custom Python type. Users must implement the bson_type property/attribute and the transform_bson method.
- TypeCodec: subclass this to define a codec that can both encode and decode a custom type. Users must implement the python_type and bson_type properties/attributes, as well as the transform_python and transform_bson methods.
The type codec for our custom type simply needs to define how a Decimal instance can be converted into a Decimal128 instance and vice-versa. Since we are interested in both encoding and decoding our custom type, we use the TypeCodec base class to define our codec:
>>> from bson.decimal128 import Decimal128
>>> from bson.codec_options import TypeCodec
>>> class DecimalCodec(TypeCodec):
...     python_type = Decimal    # the Python type acted upon by this type codec
...     bson_type = Decimal128   # the BSON type acted upon by this type codec
...     def transform_python(self, value):
...         """Function that transforms a custom type value into a type
...         that BSON can encode."""
...         return Decimal128(value)
...     def transform_bson(self, value):
...         """Function that transforms a vanilla BSON type value into our
...         custom type."""
...         return value.to_decimal()
>>> decimal_codec = DecimalCodec()
The TypeRegistry Class¶
New in version 3.8.
Before we can begin encoding and decoding our custom type objects, we must first inform PyMongo about the corresponding codec. This is done by creating a TypeRegistry instance:
>>> from bson.codec_options import TypeRegistry
>>> type_registry = TypeRegistry([decimal_codec])
Note that type registries can be instantiated with any number of type codecs. Once instantiated, registries are immutable and the only way to add codecs to a registry is to create a new one.
Putting It Together¶
Finally, we can define a CodecOptions instance with our type_registry and use it to get a Collection object that understands the Decimal data type:
>>> from bson.codec_options import CodecOptions
>>> codec_options = CodecOptions(type_registry=type_registry)
>>> collection = db.get_collection('test', codec_options=codec_options)
Now, we can seamlessly encode and decode instances of Decimal:
>>> collection.insert_one({'num': Decimal("45.321")})
<pymongo.results.InsertOneResult object at ...>
>>> mydoc = collection.find_one()
>>> import pprint
>>> pprint.pprint(mydoc)
{u'_id': ObjectId('...'), u'num': Decimal('45.321')}
We can see what’s actually being saved to the database by creating a fresh collection object without the customized codec options and using that to query MongoDB:
>>> vanilla_collection = db.get_collection('test')
>>> pprint.pprint(vanilla_collection.find_one())
{u'_id': ObjectId('...'), u'num': Decimal128('45.321')}
Encoding Subtypes¶
Consider the situation where, in addition to encoding Decimal, we also need to encode a type that subclasses Decimal. PyMongo does this automatically for types that inherit from Python types that are BSON-encodable by default, but the type codec system described above does not offer the same flexibility.
Consider this subtype of Decimal that has a method to return its value as an integer:
>>> class DecimalInt(Decimal):
...     def my_method(self):
...         """Method implementing some custom logic."""
...         return int(self)
If we try to save an instance of this type without first registering a type codec for it, we get an error:
>>> collection.insert_one({'num': DecimalInt("45.321")})
Traceback (most recent call last):
...
bson.errors.InvalidDocument: cannot encode object: Decimal('45.321'), of type: <class 'decimal.Decimal'>
In order to proceed further, we must define a type codec for DecimalInt. This is trivial to do since the same transformation as the one used for Decimal is adequate for encoding DecimalInt as well:
>>> class DecimalIntCodec(DecimalCodec):
...     @property
...     def python_type(self):
...         """The Python type acted upon by this type codec."""
...         return DecimalInt
>>> decimalint_codec = DecimalIntCodec()
Note
No attempt is made to modify decoding behavior because without additional information, it is impossible to discern which incoming Decimal128 value needs to be decoded as Decimal and which needs to be decoded as DecimalInt. This example only considers the situation where a user wants to encode documents containing either of these types.
After creating a new codec options object and using it to get a collection object, we can seamlessly encode instances of DecimalInt:
>>> type_registry = TypeRegistry([decimal_codec, decimalint_codec])
>>> codec_options = CodecOptions(type_registry=type_registry)
>>> collection = db.get_collection('test', codec_options=codec_options)
>>> collection.drop()
>>> collection.insert_one({'num': DecimalInt("45.321")})
<pymongo.results.InsertOneResult object at ...>
>>> mydoc = collection.find_one()
>>> pprint.pprint(mydoc)
{u'_id': ObjectId('...'), u'num': Decimal('45.321')}
Note that the transform_bson method of the base codec class results in these values being decoded as Decimal (and not DecimalInt).
Decoding Binary Types¶
The decoding treatment of Binary types having subtype = 0 by the bson module varies slightly depending on the version of the Python runtime in use. This must be taken into account while writing a TypeDecoder that modifies how this datatype is decoded.
On Python 3.x, Binary data (subtype = 0) is decoded as a bytes instance:
>>> # On Python 3.x.
>>> from bson.binary import Binary
>>> newcoll = db.get_collection('new')
>>> newcoll.insert_one({'_id': 1, 'data': Binary(b"123", subtype=0)})
>>> doc = newcoll.find_one()
>>> type(doc['data'])
bytes
On Python 2.7.x, the same data is decoded as a Binary instance:
>>> # On Python 2.7.x
>>> newcoll = db.get_collection('new')
>>> doc = newcoll.find_one()
>>> type(doc['data'])
bson.binary.Binary
As a consequence of this disparity, users must set the bson_type attribute on their TypeDecoder classes differently, depending on the Python version in use.
Note
For codebases requiring compatibility with both Python 2 and 3, type decoders will have to be registered for both possible bson_type values.
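A minimal sketch of a single decoder that picks the right bson_type for the running interpreter (the normalization to a plain byte string is only an illustration):
import sys
from bson.binary import Binary
from bson.codec_options import TypeDecoder

class SubtypeZeroDecoder(TypeDecoder):
    # Subtype-0 binary data arrives as a Binary instance on Python 2,
    # but as a bytes instance on Python 3.
    bson_type = Binary if sys.version_info[0] == 2 else bytes

    def transform_bson(self, value):
        # Normalize to a plain byte string on either version.
        return bytes(value)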
The fallback_encoder Callable¶
New in version 3.8.
In addition to type codecs, users can also register a callable to encode types that BSON doesn’t recognize and for which no type codec has been registered. This callable is the fallback encoder and, like the transform_python method, it accepts an unencodable value as a parameter and returns a BSON-encodable value. The following fallback encoder encodes Python’s Decimal type to a Decimal128:
>>> def fallback_encoder(value):
...     if isinstance(value, Decimal):
...         return Decimal128(value)
...     return value
After declaring the callback, we must create a type registry and codec options with this fallback encoder before it can be used for initializing a collection:
>>> type_registry = TypeRegistry(fallback_encoder=fallback_encoder)
>>> codec_options = CodecOptions(type_registry=type_registry)
>>> collection = db.get_collection('test', codec_options=codec_options)
>>> collection.drop()
We can now seamlessly encode instances of Decimal:
>>> collection.insert_one({'num': Decimal("45.321")})
<pymongo.results.InsertOneResult object at ...>
>>> mydoc = collection.find_one()
>>> pprint.pprint(mydoc)
{u'_id': ObjectId('...'), u'num': Decimal128('45.321')}
Note
Fallback encoders are invoked after attempts to encode the given value with standard BSON encoders and any configured type encoders have failed. Therefore, in a type registry configured with a type encoder and fallback encoder that both target the same custom type, the behavior specified in the type encoder will prevail.
Because fallback encoders don’t need to declare the types that they encode beforehand, they can be used to support interesting use-cases that cannot be serviced by TypeEncoder. One such use-case is described in the next section.
Encoding Unknown Types¶
In this example, we demonstrate how a fallback encoder can be used to save arbitrary objects to the database. We will use the standard library’s pickle module to serialize the unknown types and so naturally, this approach only works for types that are picklable.
We start by defining some arbitrary custom types:
class MyStringType(object):
    def __init__(self, value):
        self.__value = value

    def __repr__(self):
        return "MyStringType('%s')" % (self.__value,)

class MyNumberType(object):
    def __init__(self, value):
        self.__value = value

    def __repr__(self):
        return "MyNumberType(%s)" % (self.__value,)
We also define a fallback encoder that pickles whatever objects it receives and returns them as Binary instances with a custom subtype. The custom subtype, in turn, allows us to write a TypeDecoder that identifies pickled artifacts upon retrieval and transparently decodes them back into Python objects:
import pickle
from bson.binary import Binary, USER_DEFINED_SUBTYPE
from bson.codec_options import TypeDecoder

def fallback_pickle_encoder(value):
    return Binary(pickle.dumps(value), USER_DEFINED_SUBTYPE)

class PickledBinaryDecoder(TypeDecoder):
    bson_type = Binary

    def transform_bson(self, value):
        if value.subtype == USER_DEFINED_SUBTYPE:
            return pickle.loads(value)
        return value
Note
The above example is written assuming the use of Python 3. If you are using Python 2, bson_type must be set to Binary. See the Decoding Binary Types section for a detailed explanation.
Finally, we create a CodecOptions instance:
codec_options = CodecOptions(type_registry=TypeRegistry(
    [PickledBinaryDecoder()], fallback_encoder=fallback_pickle_encoder))
We can now round trip our custom objects to MongoDB:
collection = db.get_collection('test_fe', codec_options=codec_options)
collection.insert_one({'_id': 1, 'str': MyStringType("hello world"),
                       'num': MyNumberType(2)})
mydoc = collection.find_one()
assert isinstance(mydoc['str'], MyStringType)
assert isinstance(mydoc['num'], MyNumberType)
Limitations¶
PyMongo’s type codec and fallback encoder features have the following limitations:
- Users cannot customize the encoding behavior of Python types that PyMongo already understands, like int and str (the 'built-in types'). Attempting to instantiate a type registry with one or more codecs that act upon a built-in type results in a TypeError (see the sketch after this list). This limitation extends to all subtypes of the standard types.
- Chaining type encoders is not supported. A custom type value, once transformed by a codec's transform_python method, must result in a type that is either BSON-encodable by default or can be transformed by the fallback encoder into something BSON-encodable; it cannot be transformed a second time by a different type codec.
- The command() method does not apply the user's TypeDecoders while decoding the command response document.
- gridfs does not apply custom type encoding or decoding to any documents received from or returned to the user.
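For example, the first limitation can be demonstrated with a minimal sketch; registering a codec for the built-in int fails at registry construction time:
>>> from bson.codec_options import TypeEncoder, TypeRegistry
>>> class IntEncoder(TypeEncoder):
...     python_type = int  # a built-in type PyMongo already encodes
...     def transform_python(self, value):
...         return value
...
>>> TypeRegistry([IntEncoder()])
Traceback (most recent call last):
TypeError: ...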
Bulk Write Operations¶
This tutorial explains how to take advantage of PyMongo’s bulk write operation features. Executing write operations in batches reduces the number of network round trips, increasing write throughput.
Bulk Insert¶
New in version 2.6.
A batch of documents can be inserted by passing a list to the
insert_many()
method. PyMongo
will automatically split the batch into smaller sub-batches based on
the maximum message size accepted by MongoDB, supporting very large
bulk insert operations.
>>> import pymongo
>>> db = pymongo.MongoClient().bulk_example
>>> db.test.insert_many([{'i': i} for i in range(10000)]).inserted_ids
[...]
>>> db.test.count_documents({})
10000
Mixed Bulk Write Operations¶
New in version 2.7.
PyMongo also supports executing mixed bulk write operations. A batch of insert, update, and remove operations can be executed together using the bulk write operations API.
Ordered Bulk Write Operations¶
Ordered bulk write operations are batched and sent to the server in the
order provided for serial execution. The return value is an instance of
BulkWriteResult
describing the type and count
of operations performed.
>>> from pprint import pprint
>>> from pymongo import InsertOne, DeleteMany, ReplaceOne, UpdateOne
>>> result = db.test.bulk_write([
... DeleteMany({}), # Remove all documents from the previous example.
... InsertOne({'_id': 1}),
... InsertOne({'_id': 2}),
... InsertOne({'_id': 3}),
... UpdateOne({'_id': 1}, {'$set': {'foo': 'bar'}}),
... UpdateOne({'_id': 4}, {'$inc': {'j': 1}}, upsert=True),
... ReplaceOne({'j': 1}, {'j': 2})])
>>> pprint(result.bulk_api_result)
{'nInserted': 3,
'nMatched': 2,
'nModified': 2,
'nRemoved': 10000,
'nUpserted': 1,
'upserted': [{u'_id': 4, u'index': 5}],
'writeConcernErrors': [],
'writeErrors': []}
Warning
nModified
is only reported by MongoDB 2.6 and later. When
connected to an earlier server version, or in certain mixed version sharding
configurations, PyMongo omits this field from the results of a bulk
write operation.
The first write failure that occurs (e.g. duplicate key error) aborts the
remaining operations, and PyMongo raises
BulkWriteError
. The details attribute of
the exception instance provides the execution results up until the failure
occurred and details about the failure, including the operation that caused
the failure.
>>> from pymongo import InsertOne, DeleteOne, ReplaceOne
>>> from pymongo.errors import BulkWriteError
>>> requests = [
... ReplaceOne({'j': 2}, {'i': 5}),
... InsertOne({'_id': 4}), # Violates the unique key constraint on _id.
... DeleteOne({'i': 5})]
>>> try:
...     db.test.bulk_write(requests)
... except BulkWriteError as bwe:
...     pprint(bwe.details)
...
{'nInserted': 0,
'nMatched': 1,
'nModified': 1,
'nRemoved': 0,
'nUpserted': 0,
'upserted': [],
'writeConcernErrors': [],
'writeErrors': [{u'code': 11000,
u'errmsg': u'...E11000...duplicate key error...',
u'index': 1,...
u'op': {'_id': 4}}]}
Unordered Bulk Write Operations¶
Unordered bulk write operations are batched and sent to the server in arbitrary order where they may be executed in parallel. Any errors that occur are reported after all operations are attempted.
In the next example the first and third operations fail due to the unique constraint on _id. Since we are doing unordered execution the second and fourth operations succeed.
>>> requests = [
... InsertOne({'_id': 1}),
... DeleteOne({'_id': 2}),
... InsertOne({'_id': 3}),
... ReplaceOne({'_id': 4}, {'i': 1})]
>>> try:
...     db.test.bulk_write(requests, ordered=False)
... except BulkWriteError as bwe:
...     pprint(bwe.details)
...
{'nInserted': 0,
'nMatched': 1,
'nModified': 1,
'nRemoved': 1,
'nUpserted': 0,
'upserted': [],
'writeConcernErrors': [],
'writeErrors': [{u'code': 11000,
u'errmsg': u'...E11000...duplicate key error...',
u'index': 0,...
u'op': {'_id': 1}},
{u'code': 11000,
u'errmsg': u'...E11000...duplicate key error...',
u'index': 2,...
u'op': {'_id': 3}}]}
Write Concern¶
Bulk operations are executed with the
write_concern
of the collection they
are executed against. Write concern errors (e.g. wtimeout) will be reported
after all operations are attempted, regardless of execution order.
>>> from pymongo import WriteConcern
>>> coll = db.get_collection(
...     'test', write_concern=WriteConcern(w=3, wtimeout=1))
>>> try:
...     coll.bulk_write([InsertOne({'a': i}) for i in range(4)])
... except BulkWriteError as bwe:
...     pprint(bwe.details)
...
{'nInserted': 4,
 'nMatched': 0,
 'nModified': 0,
 'nRemoved': 0,
 'nUpserted': 0,
 'upserted': [],
 'writeConcernErrors': [{u'code': 64...
                         u'errInfo': {u'wtimeout': True},
                         u'errmsg': u'waiting for replication timed out'}],
 'writeErrors': []}
Datetimes and Timezones¶
These examples show how to handle Python datetime.datetime
objects
correctly in PyMongo.
Basic Usage¶
PyMongo uses datetime.datetime
objects for representing dates and times
in MongoDB documents. Because MongoDB assumes that dates and times are in UTC,
care should be taken to ensure that dates and times written to the database
reflect UTC. For example, the following code stores the current UTC date and
time into MongoDB:
>>> result = db.objects.insert_one(
... {"last_modified": datetime.datetime.utcnow()})
Always use datetime.datetime.utcnow()
, which returns the current time in
UTC, instead of datetime.datetime.now()
, which returns the current local
time. Avoid doing this:
>>> result = db.objects.insert_one(
... {"last_modified": datetime.datetime.now()})
The value for last_modified is very different between these two examples, even though both documents were stored at around the same local time. This will be confusing to the application that reads them:
>>> [doc['last_modified'] for doc in db.objects.find()]
[datetime.datetime(2015, 7, 8, 18, 17, 28, 324000),
datetime.datetime(2015, 7, 8, 11, 17, 42, 911000)]
bson.codec_options.CodecOptions
has a tz_aware option that enables
“aware” datetime.datetime
objects, i.e., datetimes that know what
timezone they’re in. By default, PyMongo retrieves naive datetimes:
>>> result = db.tzdemo.insert_one(
... {'date': datetime.datetime(2002, 10, 27, 6, 0, 0)})
>>> db.tzdemo.find_one()['date']
datetime.datetime(2002, 10, 27, 6, 0)
>>> options = CodecOptions(tz_aware=True)
>>> db.get_collection('tzdemo', codec_options=options).find_one()['date']
datetime.datetime(2002, 10, 27, 6, 0,
tzinfo=<bson.tz_util.FixedOffset object at 0x10583a050>)
Saving Datetimes with Timezones¶
When storing datetime.datetime
objects that specify a timezone
(i.e. they have a tzinfo property that isn’t None
), PyMongo will convert
those datetimes to UTC automatically:
>>> import pytz
>>> pacific = pytz.timezone('US/Pacific')
>>> aware_datetime = pacific.localize(
... datetime.datetime(2002, 10, 27, 6, 0, 0))
>>> result = db.times.insert_one({"date": aware_datetime})
>>> db.times.find_one()['date']
datetime.datetime(2002, 10, 27, 14, 0)
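pytz is not strictly required for this; on Python 3 a fixed-offset tzinfo from the standard library works too, though unlike pytz it does not account for daylight saving time. A sketch:
>>> import datetime
>>> pacific = datetime.timezone(datetime.timedelta(hours=-8))
>>> aware_datetime = datetime.datetime(2002, 10, 27, 6, 0, 0, tzinfo=pacific)
>>> result = db.times.insert_one({"date": aware_datetime})
>>> db.times.find_one()['date']
datetime.datetime(2002, 10, 27, 14, 0)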
Reading Time¶
As previously mentioned, by default all datetime.datetime
objects
returned by PyMongo will be naive but reflect UTC (i.e. the time as stored in
MongoDB). By setting the tz_aware option on
CodecOptions
, datetime.datetime
objects
will be timezone-aware and have a tzinfo property that reflects the UTC
timezone.
PyMongo 3.1 introduced a tzinfo property that can be set on
CodecOptions
to convert datetime.datetime
objects to local time automatically. For example, if we wanted to read all times
out of MongoDB in US/Pacific time:
>>> from bson.codec_options import CodecOptions
>>> db.times.find_one()['date']
datetime.datetime(2002, 10, 27, 14, 0)
>>> aware_times = db.times.with_options(codec_options=CodecOptions(
... tz_aware=True,
... tzinfo=pytz.timezone('US/Pacific')))
>>> aware_times.find_one()['date']
datetime.datetime(2002, 10, 27, 6, 0,
                  tzinfo=<DstTzInfo 'US/Pacific' PST-1 day, 16:00:00 STD>)
Geospatial Indexing Example¶
This example shows how to create and use a GEO2D
index in PyMongo. To create a spherical (earth-like) geospatial index use GEOSPHERE
instead.
Creating a Geospatial Index¶
Creating a geospatial index in pymongo is easy:
>>> from pymongo import MongoClient, GEO2D
>>> db = MongoClient().geo_example
>>> db.places.create_index([("loc", GEO2D)])
u'loc_2d'
Inserting Places¶
Locations in MongoDB are represented using either embedded documents or lists where the first two elements are coordinates. Here, we’ll insert a couple of example locations:
>>> result = db.places.insert_many([{"loc": [2, 5]},
... {"loc": [30, 5]},
... {"loc": [1, 2]},
... {"loc": [4, 4]}])
>>> result.inserted_ids
[ObjectId('...'), ObjectId('...'), ObjectId('...'), ObjectId('...')]
Note
If specifying latitude and longitude coordinates in GEOSPHERE
, list the longitude first and then latitude.
Querying¶
Using the geospatial index we can find documents near another point:
>>> import pprint
>>> for doc in db.places.find({"loc": {"$near": [3, 6]}}).limit(3):
...     pprint.pprint(doc)
...
{u'_id': ObjectId('...'), u'loc': [2, 5]}
{u'_id': ObjectId('...'), u'loc': [4, 4]}
{u'_id': ObjectId('...'), u'loc': [1, 2]}
Note
If using pymongo.GEOSPHERE
, the $nearSphere operator is recommended.
The $maxDistance operator requires the use of SON
:
>>> from bson.son import SON
>>> query = {"loc": SON([("$near", [3, 6]), ("$maxDistance", 100)])}
>>> for doc in db.places.find(query).limit(3):
...     pprint.pprint(doc)
...
{u'_id': ObjectId('...'), u'loc': [2, 5]}
{u'_id': ObjectId('...'), u'loc': [4, 4]}
{u'_id': ObjectId('...'), u'loc': [1, 2]}
It’s also possible to query for all items within a given rectangle (specified by lower-left and upper-right coordinates):
>>> query = {"loc": {"$within": {"$box": [[2, 2], [5, 6]]}}}
>>> for doc in db.places.find(query).sort('_id'):
...     pprint.pprint(doc)
{u'_id': ObjectId('...'), u'loc': [2, 5]}
{u'_id': ObjectId('...'), u'loc': [4, 4]}
Or circle (specified by center point and radius):
>>> query = {"loc": {"$within": {"$center": [[0, 0], 6]}}}
>>> for doc in db.places.find(query).sort('_id'):
...     pprint.pprint(doc)
...
{u'_id': ObjectId('...'), u'loc': [2, 5]}
{u'_id': ObjectId('...'), u'loc': [1, 2]}
{u'_id': ObjectId('...'), u'loc': [4, 4]}
geoNear queries are also supported using SON
:
>>> from bson.son import SON
>>> db.command(SON([('geoNear', 'places'), ('near', [1, 2])]))
{u'ok': 1.0, u'stats': ...}
Warning
Starting in MongoDB version 4.0, MongoDB deprecates the geoNear command. Use one of the following operations instead (an example follows this list).
- $geoNear - aggregation stage.
- $near - query operator.
- $nearSphere - query operator.
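For example, a sketch of the geoNear query above rewritten as a $geoNear aggregation stage ($geoNear must be the first stage in the pipeline, and distanceField, which names the computed distance field, is required):
>>> pipeline = [{"$geoNear": {"near": [1, 2], "distanceField": "dist"}}]
>>> for doc in db.places.aggregate(pipeline):
...     pprint.pprint(doc)
...
{u'_id': ObjectId('...'), u'dist': 0.0, u'loc': [1, 2]}
{u'_id': ObjectId('...'), u'dist': ..., u'loc': [2, 5]}
{u'_id': ObjectId('...'), u'dist': ..., u'loc': [4, 4]}
{u'_id': ObjectId('...'), u'dist': ..., u'loc': [30, 5]}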
Gevent¶
PyMongo supports Gevent. Simply call Gevent’s
monkey.patch_all()
before loading any other modules:
>>> # You must call patch_all() *before* importing any other modules
>>> from gevent import monkey
>>> _ = monkey.patch_all()
>>> from pymongo import MongoClient
>>> client = MongoClient()
PyMongo uses thread and socket functions from the Python standard library. Gevent’s monkey-patching replaces those standard functions so that PyMongo does asynchronous I/O with non-blocking sockets, and schedules operations on greenlets instead of threads.
Avoid blocking in Hub.join¶
By default, PyMongo uses threads to discover and monitor your servers’ topology
(see Health Monitoring). If you execute monkey.patch_all()
when
your application first begins, PyMongo automatically uses greenlets instead
of threads.
When shutting down, if your application calls join()
on
Gevent’s Hub
without first terminating these background
greenlets, the call to join()
blocks indefinitely. You
therefore must close or dereference any active
MongoClient
before exiting.
One solution used by some application frameworks is a signal handler that ends the background greenlets by closing the client when your application receives SIGHUP:
import signal


def graceful_reload(signum, traceback):
    """Explicitly close some global MongoClient object."""
    client.close()


signal.signal(signal.SIGHUP, graceful_reload)
Applications using uWSGI prior to 1.9.16 are affected by this issue,
as are newer uWSGI versions run with the --gevent-wait-for-hub
option.
See the uWSGI changelog for details.
GridFS Example¶
This example shows how to use gridfs
to store large binary
objects (e.g. files) in MongoDB.
See also
The API docs for gridfs
.
See also
This blog post for some motivation behind this API.
Setup¶
We start by creating a GridFS
instance to use:
>>> from pymongo import MongoClient
>>> import gridfs
>>>
>>> db = MongoClient().gridfs_example
>>> fs = gridfs.GridFS(db)
Every GridFS
instance is created with and will
operate on a specific Database
instance.
Saving and Retrieving Data¶
The simplest way to work with gridfs
is to use its key/value
interface (the put()
and
get()
methods). To write data to GridFS, use
put()
:
>>> a = fs.put(b"hello world")
put()
creates a new file in GridFS, and returns
the value of the file document’s "_id"
key. Given that "_id"
we can use get()
to get back the contents of the
file:
>>> fs.get(a).read()
'hello world'
get()
returns a file-like object, so we get the
file’s contents by calling read()
.
In addition to putting a str
as a GridFS file, we can also
put any file-like object (an object with a read()
method). GridFS will handle reading the file in chunk-sized segments
automatically. We can also add additional attributes to the file as
keyword arguments:
>>> b = fs.put(fs.get(a), filename="foo", bar="baz")
>>> out = fs.get(b)
>>> out.read()
'hello world'
>>> out.filename
u'foo'
>>> out.bar
u'baz'
>>> out.upload_date
datetime.datetime(...)
The attributes we set in put()
are stored in the
file document, and retrievable after calling
get()
. Some attributes (like "filename"
) are
special and are defined in the GridFS specification - see that
document for more details.
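gridfs also provides a query interface for stored files. For example, exists() checks whether a file document exists and find_one() retrieves one by an arbitrary filter; a short sketch using the file stored above:
>>> fs.exists(b)
True
>>> fs.find_one({"filename": "foo"}).read()
'hello world'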
High Availability and PyMongo¶
PyMongo makes it easy to write highly available applications whether you use a single replica set or a large sharded cluster.
Connecting to a Replica Set¶
PyMongo makes working with replica sets easy. Here we’ll launch a new replica set and show how to handle both initialization and normal connections with PyMongo.
Starting a Replica Set¶
The main replica set documentation contains extensive information about setting up a new replica set or migrating an existing MongoDB setup; be sure to check that out. Here, we'll just do the bare minimum to get a three-node replica set running locally.
Warning
Replica sets should always use multiple nodes in production - putting all set members on the same physical node is only recommended for testing and development.
We start three mongod
processes, each on a different port and with
a different dbpath, but all using the same replica set name “foo”.
$ mkdir -p /data/db0 /data/db1 /data/db2
$ mongod --port 27017 --dbpath /data/db0 --replSet foo
$ mongod --port 27018 --dbpath /data/db1 --replSet foo
$ mongod --port 27019 --dbpath /data/db2 --replSet foo
Initializing the Set¶
At this point all of our nodes are up and running, but the set has yet to be initialized. Until the set is initialized no node will become the primary, and things are essentially “offline”.
To initialize the set we need to connect to a single node and run the initiate command:
>>> from pymongo import MongoClient
>>> c = MongoClient('localhost', 27017)
Note
We could have connected to any of the other nodes instead, but only the node we initiate from is allowed to contain any initial data.
After connecting, we run the initiate command to get things started:
>>> config = {'_id': 'foo', 'members': [
... {'_id': 0, 'host': 'localhost:27017'},
... {'_id': 1, 'host': 'localhost:27018'},
... {'_id': 2, 'host': 'localhost:27019'}]}
>>> c.admin.command("replSetInitiate", config)
{'ok': 1.0, ...}
The three mongod
servers we started earlier will now coordinate
and come online as a replica set.
Connecting to a Replica Set¶
The initial connection as made above is a special case for an
uninitialized replica set. Normally we’ll want to connect
differently. A connection to a replica set can be made using the
MongoClient()
constructor, specifying
one or more members of the set, along with the replica set name. Any of
the following connects to the replica set we just created:
>>> MongoClient('localhost', replicaset='foo')
MongoClient(host=['localhost:27017'], replicaset='foo', ...)
>>> MongoClient('localhost:27018', replicaset='foo')
MongoClient(['localhost:27018'], replicaset='foo', ...)
>>> MongoClient('localhost', 27019, replicaset='foo')
MongoClient(['localhost:27019'], replicaset='foo', ...)
>>> MongoClient('mongodb://localhost:27017,localhost:27018/?replicaSet=foo')
MongoClient(['localhost:27017', 'localhost:27018'], replicaset='foo', ...)
The addresses passed to MongoClient()
are called
the seeds. As long as at least one of the seeds is online, MongoClient
discovers all the members in the replica set, and determines which is the
current primary and which are secondaries or arbiters. Each seed must be the
address of a single mongod. Multihomed and round robin DNS addresses are
not supported.
The MongoClient
constructor is non-blocking:
the constructor returns immediately while the client connects to the replica
set using background threads. Note how, if you create a client and immediately
print the string representation of its
nodes
attribute, the list may be
empty initially. If you wait a moment, MongoClient discovers the whole replica
set:
>>> from time import sleep
>>> c = MongoClient(replicaset='foo'); print(c.nodes); sleep(0.1); print(c.nodes)
frozenset([])
frozenset([(u'localhost', 27019), (u'localhost', 27017), (u'localhost', 27018)])
You need not wait for replica set discovery in your application, however.
If you need to do any operation with a MongoClient, such as a
find()
or an
insert_one()
, the client waits to discover
a suitable member before it attempts the operation.
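The wait is bounded by the serverSelectionTimeoutMS option, which defaults to 30 seconds; if no suitable member is found within that window, the operation raises ServerSelectionTimeoutError. A sketch with a shorter timeout:
>>> from pymongo.errors import ServerSelectionTimeoutError
>>> client = MongoClient(replicaset='foo', serverSelectionTimeoutMS=2000)
>>> try:
...     client.admin.command('ping')
... except ServerSelectionTimeoutError:
...     print('No suitable member was found within 2 seconds.')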
Handling Failover¶
When a failover occurs, PyMongo will automatically attempt to find the new primary node and perform subsequent operations on that node. This can’t happen completely transparently, however. Here we’ll perform an example failover to illustrate how everything behaves. First, we’ll connect to the replica set and perform a couple of basic operations:
>>> db = MongoClient("localhost", replicaSet='foo').test
>>> db.test.insert_one({"x": 1}).inserted_id
ObjectId('...')
>>> db.test.find_one()
{u'x': 1, u'_id': ObjectId('...')}
By checking the host and port, we can see that we’re connected to localhost:27017, which is the current primary:
>>> db.client.address
('localhost', 27017)
Now let’s bring down that node and see what happens when we run our query again:
>>> db.test.find_one()
Traceback (most recent call last):
pymongo.errors.AutoReconnect: ...
We get an AutoReconnect
exception. This means
that the driver was not able to connect to the old primary (which
makes sense, as we killed the server), but that it will attempt to
automatically reconnect on subsequent operations. When this exception
is raised our application code needs to decide whether to retry the
operation or to simply continue, accepting the fact that the operation
might have failed.
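One option is to wrap reads in a small retry helper that backs off while the election completes; a sketch (the attempt budget and backoff are arbitrary choices):
import time

from pymongo.errors import AutoReconnect


def find_one_with_retry(collection, query, max_attempts=5):
    """Retry a read while the replica set elects a new primary."""
    for attempt in range(max_attempts):
        try:
            return collection.find_one(query)
        except AutoReconnect:
            if attempt == max_attempts - 1:
                raise
            time.sleep(0.5 * (attempt + 1))  # simple linear backoff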
On subsequent attempts to run the query we might continue to see this exception. Eventually, however, the replica set will failover and elect a new primary (this should take no more than a couple of seconds in general). At that point the driver will connect to the new primary and the operation will succeed:
>>> db.test.find_one()
{u'x': 1, u'_id': ObjectId('...')}
>>> db.client.address
('localhost', 27018)
Bring the former primary back up. It will rejoin the set as a secondary. Now we can move to the next section: distributing reads to secondaries.
Secondary Reads¶
By default an instance of MongoClient sends queries to the primary member of the replica set. To use secondaries for queries we have to change the read preference:
>>> client = MongoClient(
... 'localhost:27017',
... replicaSet='foo',
... readPreference='secondaryPreferred')
>>> client.read_preference
SecondaryPreferred(tag_sets=None)
Now all queries will be sent to the secondary members of the set. If there are
no secondary members the primary will be used as a fallback. If you have
queries you would prefer to never send to the primary you can specify that
using the secondary
read preference.
By default the read preference of a Database
is
inherited from its MongoClient, and the read preference of a
Collection
is inherited from its Database. To use
a different read preference use the
get_database()
method, or the
get_collection()
method:
>>> from pymongo import ReadPreference
>>> client.read_preference
SecondaryPreferred(tag_sets=None)
>>> db = client.get_database('test', read_preference=ReadPreference.SECONDARY)
>>> db.read_preference
Secondary(tag_sets=None)
>>> coll = db.get_collection('test', read_preference=ReadPreference.PRIMARY)
>>> coll.read_preference
Primary()
You can also change the read preference of an existing
Collection
with the
with_options()
method:
>>> coll2 = coll.with_options(read_preference=ReadPreference.NEAREST)
>>> coll.read_preference
Primary()
>>> coll2.read_preference
Nearest(tag_sets=None)
Note that since most database commands can only be sent to the primary of a
replica set, the command()
method does not obey
the Database’s read_preference
, but you can
pass an explicit read preference to the method:
>>> db.command('dbstats', read_preference=ReadPreference.NEAREST)
{...}
Reads are configured using three options: read preference, tag sets, and local threshold.
Read preference:
Read preference is configured using one of the classes from
read_preferences
(Primary
,
PrimaryPreferred
,
Secondary
,
SecondaryPreferred
, or
Nearest
). For convenience, we also provide
ReadPreference
with the following
attributes:
PRIMARY
: Read from the primary. This is the default read preference, and provides the strongest consistency. If no primary is available, raiseAutoReconnect
.PRIMARY_PREFERRED
: Read from the primary if available, otherwise read from a secondary.SECONDARY
: Read from a secondary. If no matching secondary is available, raiseAutoReconnect
.SECONDARY_PREFERRED
: Read from a secondary if available, otherwise from the primary.NEAREST
: Read from any available member.
Tag sets:
Replica-set members can be tagged according to any
criteria you choose. By default, PyMongo ignores tags when
choosing a member to read from, but your read preference can be configured with
a tag_sets
parameter. tag_sets
must be a list of dictionaries, each
dict providing tag values that the replica set member must match.
PyMongo tries each set of tags in turn until it finds a set of
tags with at least one matching member. For example, to prefer reads from the
New York data center, but fall back to the San Francisco data center, tag your
replica set members according to their location and create a
MongoClient like so:
>>> from pymongo.read_preferences import Secondary
>>> db = client.get_database(
... 'test', read_preference=Secondary([{'dc': 'ny'}, {'dc': 'sf'}]))
>>> db.read_preference
Secondary(tag_sets=[{'dc': 'ny'}, {'dc': 'sf'}])
MongoClient tries to find secondaries in New York, then San Francisco,
and raises AutoReconnect
if none are available. As an
additional fallback, specify a final, empty tag set, {}
, which means “read
from any member that matches the mode, ignoring tags.”
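For example, a sketch that adds the empty tag set as a final fallback:
>>> db = client.get_database(
...     'test', read_preference=Secondary([{'dc': 'ny'}, {'dc': 'sf'}, {}]))
>>> db.read_preference
Secondary(tag_sets=[{'dc': 'ny'}, {'dc': 'sf'}, {}])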
See read_preferences
for more information.
Local threshold:
If multiple members match the read preference and tag sets, PyMongo reads
from among the nearest members, chosen according to ping time. By default,
only members whose ping times are within 15 milliseconds of the nearest
are used for queries. You can choose to distribute reads among members with
higher latencies by setting localThresholdMS
to a larger
number:
>>> client = pymongo.MongoClient(
... replicaSet='repl0',
... readPreference='secondaryPreferred',
... localThresholdMS=35)
In this case, PyMongo distributes reads among matching members within 35 milliseconds of the closest member’s ping time.
Note
localThresholdMS
is ignored when talking to a
replica set through a mongos. The equivalent is the localThreshold command
line option.
When MongoClient is initialized it launches background threads to monitor the replica set for changes in:
- Health: detect when a member goes down or comes up, or if a different member becomes primary
- Configuration: detect when members are added or removed, and detect changes in members’ tags
- Latency: track a moving average of each member’s ping time
Replica-set monitoring ensures queries are continually routed to the proper members as the state of the replica set changes.
mongos Load Balancing¶
An instance of MongoClient
can be configured
with a list of addresses of mongos servers:
>>> client = MongoClient('mongodb://host1,host2,host3')
Each member of the list must be a single mongos server. Multihomed and round robin DNS addresses are not supported. The client continuously monitors all the mongoses’ availability, and its network latency to each.
PyMongo distributes operations evenly among the set of mongoses within its
localThresholdMS
(similar to how it distributes reads to secondaries
in a replica set). By default the threshold is 15 ms.
The lowest-latency server, and all servers with latencies no more than
localThresholdMS
beyond the lowest-latency server’s, receive
operations equally. For example, if we have three mongoses:
- host1: 20 ms
- host2: 35 ms
- host3: 40 ms
By default the localThresholdMS
is 15 ms, so PyMongo uses host1 and host2
evenly. It uses host1 because its network latency to the driver is shortest. It
uses host2 because its latency is within 15 ms of the lowest-latency server’s.
But it excludes host3: host3 is 20 ms beyond the lowest-latency server.
If we set localThresholdMS
to 30 ms all servers are within the threshold:
>>> client = MongoClient('mongodb://host1,host2,host3/?localThresholdMS=30')
Warning
Do not connect PyMongo to a pool of mongos instances through a load balancer. A single socket connection must always be routed to the same mongos instance for proper cursor support.
PyMongo and mod_wsgi¶
To run your application under mod_wsgi, follow these guidelines:
- Run
mod_wsgi
in daemon mode with theWSGIDaemonProcess
directive. - Assign each application to a separate daemon with
WSGIProcessGroup
. - Use
WSGIApplicationGroup %{GLOBAL}
to ensure your application is running in the daemon’s main Python interpreter, not a sub interpreter.
For example, this mod_wsgi
configuration ensures an application runs in the
main interpreter:
<VirtualHost *>
    WSGIDaemonProcess my_process
    WSGIScriptAlias /my_app /path/to/app.wsgi
    WSGIProcessGroup my_process
    WSGIApplicationGroup %{GLOBAL}
</VirtualHost>
If you have multiple applications that use PyMongo, put each in a separate daemon, still in the global application group:
<VirtualHost *>
    WSGIDaemonProcess my_process
    WSGIScriptAlias /my_app /path/to/app.wsgi
    <Location /my_app>
        WSGIProcessGroup my_process
    </Location>

    WSGIDaemonProcess my_other_process
    WSGIScriptAlias /my_other_app /path/to/other_app.wsgi
    <Location /my_other_app>
        WSGIProcessGroup my_other_process
    </Location>

    WSGIApplicationGroup %{GLOBAL}
</VirtualHost>
Background: mod_wsgi
can run in “embedded” mode when only WSGIScriptAlias
is set, or “daemon” mode with WSGIDaemonProcess. In daemon mode, mod_wsgi
can run your application in the Python main interpreter, or in sub interpreters.
The correct way to run a PyMongo application is in daemon mode, using the main
interpreter.
Python C extensions in general have issues running in multiple
Python sub interpreters. These difficulties are explained in the documentation for
Py_NewInterpreter
and in the Multiple Python Sub Interpreters
section of the mod_wsgi
documentation.
Beginning with PyMongo 2.7, the C extension for BSON detects when it is running
in a sub interpreter and activates a workaround, which adds a small cost to
BSON decoding. To avoid this cost, use WSGIApplicationGroup %{GLOBAL}
to
ensure your application runs in the main interpreter.
Since your program runs in the main interpreter it should not share its process with any other applications, lest they interfere with each other’s state. Each application should have its own daemon process, as shown in the example above.
Server Selector Example¶
Users can exert fine-grained control over the server selection algorithm
by setting the server_selector option on the MongoClient
to an appropriate callable. This example shows how to use this functionality
to prefer servers running on localhost
.
Warning
Use of custom server selector functions is a power user feature. Misusing custom server selectors can have unintended consequences such as degraded read/write performance.
Example: Selecting Servers Running on localhost¶
To start, we need to write the server selector function that will be used.
The server selector function should accept a list of
ServerDescription
objects and return a
list of server descriptions that are suitable for the read or write operation.
A server selector must not create or modify
ServerDescription
objects, and must return
the selected instances unchanged.
In this example, we write a server selector that prioritizes servers running on
localhost
. This can be desirable when using a sharded cluster with multiple
mongos
, as locally run queries are likely to see lower latency and higher
throughput. Note, however, that whether preferring localhost
is beneficial depends heavily on the application.
In addition to comparing the hostname with localhost
, our server selector
function accounts for the edge case when no servers are running on
localhost
. In this case, we allow the default server selection logic to
prevail by passing through the received server description list unchanged.
Failure to do this would render the client unable to communicate with MongoDB
in the event that no servers were running on localhost
.
The described server selection logic is implemented in the following server selector function:
>>> def server_selector(server_descriptions):
...     servers = [
...         server for server in server_descriptions
...         if server.address[0] == 'localhost'
...     ]
...     if not servers:
...         return server_descriptions
...     return servers
Finally, we can create a MongoClient
instance with this
server selector:
>>> client = MongoClient(server_selector=server_selector)
Server Selection Process¶
This section dives deeper into the server selection process for reads and writes. In the case of a write, the driver performs the following operations (in order) during the selection process:
- Select all writeable servers from the list of known hosts. For a replica set this is the primary, while for a sharded cluster this is all the known mongoses.
- Apply the user-defined server selector function. Note that the custom server selector is not called if there are no servers left from the previous filtering stage.
- Apply the localThresholdMS setting to the list of remaining hosts. This whittles the host list down to only contain servers whose latency is at most localThresholdMS milliseconds higher than the lowest observed latency (see the sketch below).
- Select a server at random from the remaining host list. The desired operation is then performed against the selected server.
In the case of reads the process is identical except for the first step.
Here, instead of selecting all writeable servers, we select all servers
matching the user’s ReadPreference
from the
list of known hosts. As an example, for a 3-member replica set with a
Secondary
read preference, we would select
all available secondaries.
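The localThresholdMS filtering in step 3 can be illustrated with a short standalone sketch (illustrative only; this is not PyMongo's actual implementation):
def apply_local_threshold(servers, local_threshold_ms=15):
    """Keep servers within local_threshold_ms of the lowest latency.

    ``servers`` is a list of (address, average_latency_ms) pairs.
    """
    if not servers:
        return []
    best = min(latency for _, latency in servers)
    return [(address, latency) for address, latency in servers
            if latency - best <= local_threshold_ms]


# Three servers at 20, 35 and 40 ms; only the first two survive the
# default 15 ms threshold.
print(apply_local_threshold([('host1', 20), ('host2', 35), ('host3', 40)]))
# [('host1', 20), ('host2', 35)]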
Tailable Cursors¶
By default, MongoDB will automatically close a cursor when the client has exhausted all results in the cursor. However, for capped collections you may use a tailable cursor that remains open after the client exhausts the results in the initial cursor.
The following is a basic example of using a tailable cursor to tail the oplog of a replica set member:
import time

import pymongo

client = pymongo.MongoClient()
oplog = client.local.oplog.rs
first = oplog.find().sort('$natural', pymongo.ASCENDING).limit(-1).next()
print(first)
ts = first['ts']

while True:
    # For a regular capped collection CursorType.TAILABLE_AWAIT is the
    # only option required to create a tailable cursor. When querying the
    # oplog, the oplog_replay option enables an optimization to quickly
    # find the 'ts' value we're looking for. The oplog_replay option
    # can only be used when querying the oplog. Starting in MongoDB 4.4
    # this option is ignored by the server as queries against the oplog
    # are optimized automatically by the MongoDB query engine.
    cursor = oplog.find({'ts': {'$gt': ts}},
                        cursor_type=pymongo.CursorType.TAILABLE_AWAIT,
                        oplog_replay=True)
    while cursor.alive:
        for doc in cursor:
            ts = doc['ts']
            print(doc)
        # We end up here if the find() returned no documents or if the
        # tailable cursor timed out (no new documents were added to the
        # collection for more than 1 second).
        time.sleep(1)
TLS/SSL and PyMongo¶
PyMongo supports connecting to MongoDB over TLS/SSL. This guide covers the configuration options supported by PyMongo. See the server documentation to configure MongoDB.
Dependencies¶
For connections using TLS/SSL, PyMongo may require third party dependencies as determined by your version of Python. With PyMongo 3.3+, you can install PyMongo 3.3+ and any TLS/SSL-related dependencies using the following pip command:
$ python -m pip install pymongo[tls]
Starting with PyMongo 3.11 this installs PyOpenSSL, requests and service_identity for users of Python versions older than 2.7.9. PyOpenSSL supports SNI for these old Python versions, allowing applications to connect to Atlas free and shared tier instances.
Earlier versions of PyMongo require you to manually install the dependencies listed below.
Python 2.x¶
The ipaddress module is required on all platforms.
When using CPython < 2.7.9 or PyPy < 2.5.1:
- On Windows, the wincertstore module is required.
- On all other platforms, the certifi module is required.
Warning
Industry best practices recommend, and some regulations require, the use of TLS 1.1 or newer. Though no application changes are required for PyMongo to make use of the newest protocols, some operating systems or versions may not provide an OpenSSL version new enough to support them.
Users of macOS older than 10.13 (High Sierra) will need to install Python from python.org, homebrew, macports, or another similar source.
Users of Linux or other non-macOS Unix can check their OpenSSL version like this:
$ openssl version
If the version number is less than 1.0.1 support for TLS 1.1 or newer is not available. Contact your operating system vendor for a solution or upgrade to a newer distribution.
You can check your Python interpreter by installing the requests module and executing the following command:
python -c "import requests; print(requests.get('https://www.howsmyssl.com/a/check', verify=False).json()['tls_version'])"
You should see “TLS 1.X” where X is >= 1.
Basic configuration¶
In many cases connecting to MongoDB over TLS/SSL requires nothing more than
passing tls=True
as a keyword argument to
MongoClient
:
>>> client = pymongo.MongoClient('example.com', tls=True)
Or passing tls=true
in the URI:
>>> client = pymongo.MongoClient('mongodb://example.com/?tls=true')
This configures PyMongo to connect to the server using TLS, verify the server's certificate, and verify that the host you are attempting to connect to is listed in that certificate.
Certificate verification policy¶
By default, PyMongo is configured to require a certificate from the server when
TLS is enabled. This is configurable using the tlsAllowInvalidCertificates
option. To disable this requirement pass tlsAllowInvalidCertificates=True
as a keyword parameter:
>>> client = pymongo.MongoClient('example.com',
... tls=True,
... tlsAllowInvalidCertificates=True)
Or, in the URI:
>>> uri = 'mongodb://example.com/?tls=true&tlsAllowInvalidCertificates=true'
>>> client = pymongo.MongoClient(uri)
Specifying a CA file¶
In some cases you may want to configure PyMongo to use a specific set of CA
certificates. This is most often the case when you are acting as your own
certificate authority rather than using server certificates signed by a well
known authority. The tlsCAFile
option takes a path to a CA file. It can be
passed as a keyword argument:
>>> client = pymongo.MongoClient('example.com',
... tls=True,
... tlsCAFile='/path/to/ca.pem')
Or, in the URI:
>>> uri = 'mongodb://example.com/?tls=true&tlsCAFile=/path/to/ca.pem'
>>> client = pymongo.MongoClient(uri)
Specifying a certificate revocation list¶
Python 2.7.9+ (pypy 2.5.1+) and 3.4+ provide support for certificate revocation
lists. The tlsCRLFile
option takes a path to a CRL file. It can be passed
as a keyword argument:
>>> client = pymongo.MongoClient('example.com',
... tls=True,
... tlsCRLFile='/path/to/crl.pem')
Or, in the URI:
>>> uri = 'mongodb://example.com/?tls=true&tlsCRLFile=/path/to/crl.pem'
>>> client = pymongo.MongoClient(uri)
Note
Certificate revocation lists and OCSP cannot be used together.
Client certificates¶
PyMongo can be configured to present a client certificate using the
tlsCertificateKeyFile
option:
>>> client = pymongo.MongoClient('example.com',
... tls=True,
... tlsCertificateKeyFile='/path/to/client.pem')
If the private key for the client certificate is stored in a separate file, use
the ssl_keyfile
option:
>>> client = pymongo.MongoClient('example.com',
... tls=True,
... tlsCertificateKeyFile='/path/to/client.pem',
... ssl_keyfile='/path/to/key.pem')
Python 2.7.9+ (pypy 2.5.1+) and 3.3+ support providing a password or passphrase
to decrypt encrypted private keys. Use the tlsCertificateKeyFilePassword
option:
>>> client = pymongo.MongoClient('example.com',
... tls=True,
... tlsCertificateKeyFile='/path/to/client.pem',
... ssl_keyfile='/path/to/key.pem',
... tlsCertificateKeyFilePassword=<passphrase>)
These options can also be passed as part of the MongoDB URI.
OCSP¶
Starting with PyMongo 3.11, if PyMongo was installed with the “ocsp” extra:
python -m pip install pymongo[ocsp]
certificate revocation checking is enabled by way of OCSP (Online Certificate Status Protocol). MongoDB 4.4+ staples OCSP responses to the TLS handshake which PyMongo will verify, failing the TLS handshake if the stapled OCSP response is invalid or indicates that the peer certificate is revoked.
When connecting to a server version older than 4.4, or when a 4.4+ version of MongoDB does not staple an OCSP response, PyMongo will attempt to connect directly to an OCSP endpoint if the peer certificate specified one. The TLS handshake will only fail in this case if the response indicates that the certificate is revoked. Invalid or malformed responses will be ignored, favoring availability over maximum security.
Troubleshooting TLS Errors¶
TLS errors often fall into three categories: certificate verification failure, protocol version mismatch, or certificate revocation checking failure. An error message similar to the following means that OpenSSL was not able to verify the server's certificate:
[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed
This often occurs because OpenSSL does not have access to the system's root certificates or the certificates are out of date. Linux users should ensure that they have the latest root certificate updates installed from their Linux vendor. macOS users using Python 3.6.0 or newer downloaded from python.org may have to run a script included with Python to install root certificates:
open "/Applications/Python <YOUR PYTHON VERSION>/Install Certificates.command"
Users of older PyPy portable versions may have to set an environment variable to tell OpenSSL where to find root certificates. This is easily done using the certifi module from pypi:
$ pypy -m pip install certifi
$ export SSL_CERT_FILE=$(pypy -c "import certifi; print(certifi.where())")
An error message similar to the following message means that the OpenSSL version used by Python does not support a new enough TLS protocol to connect to the server:
[SSL: TLSV1_ALERT_PROTOCOL_VERSION] tlsv1 alert protocol version
Industry best practices recommend, and some regulations require, that older TLS protocols be disabled in some MongoDB deployments. Some deployments may disable TLS 1.0, others may disable TLS 1.0 and TLS 1.1. See the warning earlier in this document for troubleshooting steps and solutions.
An error message similar to the following message means that certificate revocation checking failed:
[('SSL routines', 'tls_process_initial_server_flight', 'invalid status response')]
See OCSP for more details.
Client-Side Field Level Encryption¶
New in MongoDB 4.2, client-side field level encryption allows an application to encrypt specific data fields in addition to pre-existing MongoDB encryption features such as Encryption at Rest and TLS/SSL (Transport Encryption).
With field level encryption, applications can encrypt fields in documents prior to transmitting data over the wire to the server. Client-side field level encryption supports workloads where applications must guarantee that unauthorized parties, including server administrators, cannot read the encrypted data.
Dependencies¶
To get started using client-side field level encryption in your project, you will need to install the pymongocrypt library as well as the driver itself. Install both the driver and a compatible version of pymongocrypt like this:
$ python -m pip install 'pymongo[encryption]'
Note that installing on Linux requires pip 19 or later for manylinux2010 wheel support. For more information about installing pymongocrypt see the installation instructions on the project’s PyPI page.
mongocryptd¶
The mongocryptd
binary is required for automatic client-side encryption
and is included as a component in the MongoDB Enterprise Server package.
For detailed installation instructions see
the MongoDB documentation on mongocryptd.
mongocryptd
performs the following:
- Parses the automatic encryption rules specified to the database connection.
If the JSON schema contains invalid automatic encryption syntax or any
document validation syntax,
mongocryptd
returns an error. - Uses the specified automatic encryption rules to mark fields in read and write operations for encryption.
- Rejects read/write operations that may return unexpected or incorrect results when applied to an encrypted field. For supported and unsupported operations, see Read/Write Support with Automatic Field Level Encryption.
A MongoClient configured with auto encryption will automatically spawn the
mongocryptd
process from the application’s PATH
. Applications can
control the spawning behavior as part of the automatic encryption options.
For example to set the path to the mongocryptd
process:
auto_encryption_opts = AutoEncryptionOpts(
...,
mongocryptd_spawn_path='/path/to/mongocryptd')
To control the logging output of mongocryptd
pass options using
mongocryptd_spawn_args
:
auto_encryption_opts = AutoEncryptionOpts(
...,
mongocryptd_spawn_args=['--logpath=/path/to/mongocryptd.log', '--logappend'])
If your application wishes to manage the mongocryptd
process manually,
it is possible to disable spawning mongocryptd
:
auto_encryption_opts = AutoEncryptionOpts(
...,
mongocryptd_bypass_spawn=True,
# URI of the local ``mongocryptd`` process.
mongocryptd_uri='mongodb://localhost:27020')
mongocryptd
is only responsible for supporting automatic client-side field
level encryption and does not itself perform any encryption or decryption.
Automatic Client-Side Field Level Encryption¶
Automatic client-side field level encryption is enabled by creating a
MongoClient
with the auto_encryption_opts
option set to an instance of
AutoEncryptionOpts
. The following
examples show how to set up automatic client-side field level encryption
using ClientEncryption
to create a new
encryption data key.
Note
Automatic client-side field level encryption requires MongoDB 4.2 enterprise or a MongoDB 4.2 Atlas cluster. The community version of the server supports automatic decryption as well as Explicit Encryption.
The following example shows how to specify automatic encryption rules via the
schema_map
option. The automatic encryption rules are expressed using a
strict subset of the JSON Schema syntax.
Supplying a schema_map
provides more security than relying on
JSON Schemas obtained from the server. It protects against a
malicious server advertising a false JSON Schema, which could trick
the client into sending unencrypted data that should be encrypted.
JSON Schemas supplied in the schema_map
only apply to configuring
automatic client-side field level encryption. Other validation
rules in the JSON schema will not be enforced by the driver and
will result in an error.
import os

from bson.codec_options import CodecOptions
from bson import json_util
from pymongo import MongoClient
from pymongo.encryption import (Algorithm,
                                ClientEncryption)
from pymongo.encryption_options import AutoEncryptionOpts


def create_json_schema_file(kms_providers, key_vault_namespace,
                            key_vault_client):
    client_encryption = ClientEncryption(
        kms_providers,
        key_vault_namespace,
        key_vault_client,
        # The CodecOptions class used for encrypting and decrypting.
        # This should be the same CodecOptions instance you have configured
        # on MongoClient, Database, or Collection. We will not be calling
        # encrypt() or decrypt() in this example so we can use any
        # CodecOptions.
        CodecOptions())

    # Create a new data key and json schema for the encryptedField.
    # https://dochub.mongodb.org/core/client-side-field-level-encryption-automatic-encryption-rules
    data_key_id = client_encryption.create_data_key(
        'local', key_alt_names=['pymongo_encryption_example_1'])
    schema = {
        "properties": {
            "encryptedField": {
                "encrypt": {
                    "keyId": [data_key_id],
                    "bsonType": "string",
                    "algorithm":
                        Algorithm.AEAD_AES_256_CBC_HMAC_SHA_512_Deterministic
                }
            }
        },
        "bsonType": "object"
    }
    # Use CANONICAL_JSON_OPTIONS so that other drivers and tools will be
    # able to parse the MongoDB extended JSON file.
    json_schema_string = json_util.dumps(
        schema, json_options=json_util.CANONICAL_JSON_OPTIONS)

    with open('jsonSchema.json', 'w') as file:
        file.write(json_schema_string)


def main():
    # The MongoDB namespace (db.collection) used to store the
    # encrypted documents in this example.
    encrypted_namespace = "test.coll"

    # This must be the same master key that was used to create
    # the encryption key.
    local_master_key = os.urandom(96)
    kms_providers = {"local": {"key": local_master_key}}

    # The MongoDB namespace (db.collection) used to store
    # the encryption data keys.
    key_vault_namespace = "encryption.__pymongoTestKeyVault"
    key_vault_db_name, key_vault_coll_name = key_vault_namespace.split(".", 1)

    # The MongoClient used to access the key vault (key_vault_namespace).
    key_vault_client = MongoClient()
    key_vault = key_vault_client[key_vault_db_name][key_vault_coll_name]
    # Ensure that two data keys cannot share the same keyAltName.
    key_vault.drop()
    key_vault.create_index(
        "keyAltNames",
        unique=True,
        partialFilterExpression={"keyAltNames": {"$exists": True}})

    create_json_schema_file(
        kms_providers, key_vault_namespace, key_vault_client)

    # Load the JSON Schema and construct the local schema_map option.
    with open('jsonSchema.json', 'r') as file:
        json_schema_string = file.read()
    json_schema = json_util.loads(json_schema_string)
    schema_map = {encrypted_namespace: json_schema}

    auto_encryption_opts = AutoEncryptionOpts(
        kms_providers, key_vault_namespace, schema_map=schema_map)

    client = MongoClient(auto_encryption_opts=auto_encryption_opts)
    db_name, coll_name = encrypted_namespace.split(".", 1)
    coll = client[db_name][coll_name]
    # Clear old data
    coll.drop()

    coll.insert_one({"encryptedField": "123456789"})
    print('Decrypted document: %s' % (coll.find_one(),))
    unencrypted_coll = MongoClient()[db_name][coll_name]
    print('Encrypted document: %s' % (unencrypted_coll.find_one(),))


if __name__ == "__main__":
    main()
The MongoDB 4.2 server supports using schema validation to enforce encryption
of specific fields in a collection. This schema validation will prevent an
application from inserting unencrypted values for any fields marked with the
"encrypt"
JSON schema keyword.
The following example shows how to set up automatic client-side field level
encryption using
ClientEncryption
to create a new encryption
data key and create a collection with the
Automatic Encryption JSON Schema Syntax:
import os

from bson.codec_options import CodecOptions
from bson.binary import STANDARD
from pymongo import MongoClient
from pymongo.encryption import (Algorithm,
                                ClientEncryption)
from pymongo.encryption_options import AutoEncryptionOpts
from pymongo.errors import OperationFailure
from pymongo.write_concern import WriteConcern


def main():
    # The MongoDB namespace (db.collection) used to store the
    # encrypted documents in this example.
    encrypted_namespace = "test.coll"

    # This must be the same master key that was used to create
    # the encryption key.
    local_master_key = os.urandom(96)
    kms_providers = {"local": {"key": local_master_key}}

    # The MongoDB namespace (db.collection) used to store
    # the encryption data keys.
    key_vault_namespace = "encryption.__pymongoTestKeyVault"
    key_vault_db_name, key_vault_coll_name = key_vault_namespace.split(".", 1)

    # The MongoClient used to access the key vault (key_vault_namespace).
    key_vault_client = MongoClient()
    key_vault = key_vault_client[key_vault_db_name][key_vault_coll_name]
    # Ensure that two data keys cannot share the same keyAltName.
    key_vault.drop()
    key_vault.create_index(
        "keyAltNames",
        unique=True,
        partialFilterExpression={"keyAltNames": {"$exists": True}})

    client_encryption = ClientEncryption(
        kms_providers,
        key_vault_namespace,
        key_vault_client,
        # The CodecOptions class used for encrypting and decrypting.
        # This should be the same CodecOptions instance you have configured
        # on MongoClient, Database, or Collection. We will not be calling
        # encrypt() or decrypt() in this example so we can use any
        # CodecOptions.
        CodecOptions())

    # Create a new data key and json schema for the encryptedField.
    data_key_id = client_encryption.create_data_key(
        'local', key_alt_names=['pymongo_encryption_example_2'])
    json_schema = {
        "properties": {
            "encryptedField": {
                "encrypt": {
                    "keyId": [data_key_id],
                    "bsonType": "string",
                    "algorithm":
                        Algorithm.AEAD_AES_256_CBC_HMAC_SHA_512_Deterministic
                }
            }
        },
        "bsonType": "object"
    }

    auto_encryption_opts = AutoEncryptionOpts(
        kms_providers, key_vault_namespace)
    client = MongoClient(auto_encryption_opts=auto_encryption_opts)
    db_name, coll_name = encrypted_namespace.split(".", 1)
    db = client[db_name]
    # Clear old data
    db.drop_collection(coll_name)
    # Create the collection with the encryption JSON Schema.
    db.create_collection(
        coll_name,
        # uuid_representation=STANDARD is required to ensure that any
        # UUIDs in the $jsonSchema document are encoded to BSON Binary
        # with the standard UUID subtype 4. This is only needed when
        # running the "create" collection command with an encryption
        # JSON Schema.
        codec_options=CodecOptions(uuid_representation=STANDARD),
        write_concern=WriteConcern(w="majority"),
        validator={"$jsonSchema": json_schema})
    coll = client[db_name][coll_name]

    coll.insert_one({"encryptedField": "123456789"})
    print('Decrypted document: %s' % (coll.find_one(),))
    unencrypted_coll = MongoClient()[db_name][coll_name]
    print('Encrypted document: %s' % (unencrypted_coll.find_one(),))
    try:
        unencrypted_coll.insert_one({"encryptedField": "123456789"})
    except OperationFailure as exc:
        print('Unencrypted insert failed: %s' % (exc.details,))


if __name__ == "__main__":
    main()
Explicit Encryption¶
Explicit encryption is a MongoDB community feature and does not use the
mongocryptd
process. Explicit encryption is provided by the
ClientEncryption
class, for example:
import os

from pymongo import MongoClient
from pymongo.encryption import (Algorithm,
                                ClientEncryption)


def main():
    # This must be the same master key that was used to create
    # the encryption key.
    local_master_key = os.urandom(96)
    kms_providers = {"local": {"key": local_master_key}}

    # The MongoDB namespace (db.collection) used to store
    # the encryption data keys.
    key_vault_namespace = "encryption.__pymongoTestKeyVault"
    key_vault_db_name, key_vault_coll_name = key_vault_namespace.split(".", 1)

    # The MongoClient used to read/write application data.
    client = MongoClient()
    coll = client.test.coll
    # Clear old data
    coll.drop()

    # Set up the key vault (key_vault_namespace) for this example.
    key_vault = client[key_vault_db_name][key_vault_coll_name]
    # Ensure that two data keys cannot share the same keyAltName.
    key_vault.drop()
    key_vault.create_index(
        "keyAltNames",
        unique=True,
        partialFilterExpression={"keyAltNames": {"$exists": True}})

    client_encryption = ClientEncryption(
        kms_providers,
        key_vault_namespace,
        # The MongoClient to use for reading/writing to the key vault.
        # This can be the same MongoClient used by the main application.
        client,
        # The CodecOptions class used for encrypting and decrypting.
        # This should be the same CodecOptions instance you have configured
        # on MongoClient, Database, or Collection.
        coll.codec_options)

    # Create a new data key for the encryptedField.
    data_key_id = client_encryption.create_data_key(
        'local', key_alt_names=['pymongo_encryption_example_3'])

    # Explicitly encrypt a field:
    encrypted_field = client_encryption.encrypt(
        "123456789",
        Algorithm.AEAD_AES_256_CBC_HMAC_SHA_512_Deterministic,
        key_id=data_key_id)
    coll.insert_one({"encryptedField": encrypted_field})
    doc = coll.find_one()
    print('Encrypted document: %s' % (doc,))

    # Explicitly decrypt the field:
    doc["encryptedField"] = client_encryption.decrypt(doc["encryptedField"])
    print('Decrypted document: %s' % (doc,))

    # Cleanup resources.
    client_encryption.close()
    client.close()


if __name__ == "__main__":
    main()
Explicit Encryption with Automatic Decryption¶
Although automatic encryption requires MongoDB 4.2 enterprise or a
MongoDB 4.2 Atlas cluster, automatic decryption is supported for all users.
To configure automatic decryption without automatic encryption set
bypass_auto_encryption=True
in
AutoEncryptionOpts
:
import os

from pymongo import MongoClient
from pymongo.encryption import (Algorithm,
                                ClientEncryption)
from pymongo.encryption_options import AutoEncryptionOpts


def main():
    # This must be the same master key that was used to create
    # the encryption key.
    local_master_key = os.urandom(96)
    kms_providers = {"local": {"key": local_master_key}}

    # The MongoDB namespace (db.collection) used to store
    # the encryption data keys.
    key_vault_namespace = "encryption.__pymongoTestKeyVault"
    key_vault_db_name, key_vault_coll_name = key_vault_namespace.split(".", 1)

    # bypass_auto_encryption=True disables automatic encryption but keeps
    # the automatic _decryption_ behavior. bypass_auto_encryption will
    # also disable spawning mongocryptd.
    auto_encryption_opts = AutoEncryptionOpts(
        kms_providers, key_vault_namespace, bypass_auto_encryption=True)

    client = MongoClient(auto_encryption_opts=auto_encryption_opts)
    coll = client.test.coll
    # Clear old data
    coll.drop()

    # Set up the key vault (key_vault_namespace) for this example.
    key_vault = client[key_vault_db_name][key_vault_coll_name]
    # Ensure that two data keys cannot share the same keyAltName.
    key_vault.drop()
    key_vault.create_index(
        "keyAltNames",
        unique=True,
        partialFilterExpression={"keyAltNames": {"$exists": True}})

    client_encryption = ClientEncryption(
        kms_providers,
        key_vault_namespace,
        # The MongoClient to use for reading/writing to the key vault.
        # This can be the same MongoClient used by the main application.
        client,
        # The CodecOptions class used for encrypting and decrypting.
        # This should be the same CodecOptions instance you have configured
        # on MongoClient, Database, or Collection.
        coll.codec_options)

    # Create a new data key for the encryptedField.
    data_key_id = client_encryption.create_data_key(
        'local', key_alt_names=['pymongo_encryption_example_4'])

    # Explicitly encrypt a field:
    encrypted_field = client_encryption.encrypt(
        "123456789",
        Algorithm.AEAD_AES_256_CBC_HMAC_SHA_512_Deterministic,
        key_alt_name='pymongo_encryption_example_4')
    coll.insert_one({"encryptedField": encrypted_field})
    # Automatically decrypts any encrypted fields.
    doc = coll.find_one()
    print('Decrypted document: %s' % (doc,))
    unencrypted_coll = MongoClient().test.coll
    print('Encrypted document: %s' % (unencrypted_coll.find_one(),))

    # Cleanup resources.
    client_encryption.close()
    client.close()


if __name__ == "__main__":
    main()
Handling UUID Data¶
PyMongo ships with built-in support for dealing with UUID types.
It is straightforward to store native uuid.UUID objects to MongoDB and
retrieve them as native uuid.UUID objects:
from pymongo import MongoClient
from bson.binary import UuidRepresentation
from uuid import uuid4
# use the 'standard' representation for cross-language compatibility.
client = MongoClient(uuid_representation=UuidRepresentation.STANDARD)
collection = client.get_database('uuid_db').get_collection('uuid_coll')
# remove all documents from collection
collection.delete_many({})
# create a native uuid object
uuid_obj = uuid4()
# save the native uuid object to MongoDB
collection.insert_one({'uuid': uuid_obj})
# retrieve the stored uuid object from MongoDB
document = collection.find_one({})
# check that the retrieved UUID matches the inserted UUID
assert document['uuid'] == uuid_obj
Native uuid.UUID objects can also be used as part of MongoDB queries:
document = collection.find_one({'uuid': uuid_obj})
assert document['uuid'] == uuid_obj
The above examples illustrate the simplest of use-cases - one where the
UUID is generated by, and used in, the same application. However,
the situation can be significantly more complex when dealing with a MongoDB
deployment that contains UUIDs created by other drivers, as the Java and C#
drivers have historically encoded UUIDs using a byte-order that is different
from the one used by PyMongo. Applications that require interoperability across
these drivers must specify the appropriate UuidRepresentation.
In the following sections, we describe how drivers have historically differed
in their encoding of UUIDs, and how applications can use the
UuidRepresentation configuration option to maintain cross-language
compatibility.
Attention
New applications that do not share a MongoDB deployment with
any other application and that have never stored UUIDs in MongoDB
should use the standard UUID representation for cross-language
compatibility. See Configuring a UUID Representation for details
on how to configure the UuidRepresentation.
Legacy Handling of UUID Data¶
Historically, MongoDB Drivers have used different byte-ordering
while serializing UUID types to Binary
.
Consider, for instance, a UUID with the following canonical textual
representation:
00112233-4455-6677-8899-aabbccddeeff
This UUID would historically be serialized by the Python driver as:
00112233-4455-6677-8899-aabbccddeeff
The same UUID would historically be serialized by the C# driver as:
33221100-5544-7766-8899-aabbccddeeff
Finally, the same UUID would historically be serialized by the Java driver as:
77665544-3322-1100-ffee-ddccbbaa9988
Note
For in-depth information about the byte-order historically used by different drivers, see the Handling of Native UUID Types Specification.
This difference in the byte-order of UUIDs encoded by different drivers can result in highly unintuitive behavior in some scenarios. We detail two such scenarios in the next sections.
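The byte reordering can be seen directly by encoding the same UUID with each legacy representation and inspecting the resulting payload. The following is a minimal sketch (Python 3 shown; Binary.from_uuid() requires PyMongo 3.11+):
from uuid import UUID
from bson.binary import Binary, UuidRepresentation

uuid_obj = UUID('00112233-4455-6677-8899-aabbccddeeff')
# PYTHON_LEGACY preserves the RFC 4122 byte order.
assert Binary.from_uuid(uuid_obj, UuidRepresentation.PYTHON_LEGACY).hex() == '00112233445566778899aabbccddeeff'
# CSHARP_LEGACY reverses bytes 0-3, 4-5 and 6-7.
assert Binary.from_uuid(uuid_obj, UuidRepresentation.CSHARP_LEGACY).hex() == '33221100554477668899aabbccddeeff'
# JAVA_LEGACY reverses bytes 0-7 and 8-15.
assert Binary.from_uuid(uuid_obj, UuidRepresentation.JAVA_LEGACY).hex() == '7766554433221100ffeeddccbbaa9988'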
Scenario 2: Round-Tripping UUIDs¶
In the following examples, we see how using a misconfigured
UuidRepresentation can cause an application to inadvertently change the
Binary subtype, and in some cases, the bytes of the Binary field itself when
round-tripping documents containing UUIDs.
Consider the following situation:
from pymongo import MongoClient
from bson.codec_options import CodecOptions
from bson.binary import Binary, UuidRepresentation
from uuid import uuid4

client = MongoClient()
# Using UuidRepresentation.PYTHON_LEGACY stores a Binary subtype-3 UUID
python_opts = CodecOptions(uuid_representation=UuidRepresentation.PYTHON_LEGACY)
input_uuid = uuid4()
collection = client.testdb.get_collection('test', codec_options=python_opts)
collection.insert_one({'_id': 'foo', 'uuid': input_uuid})
assert collection.find_one({'uuid': Binary(input_uuid.bytes, 3)})['_id'] == 'foo'
# Retrieving this document using UuidRepresentation.STANDARD returns a native UUID
std_opts = CodecOptions(uuid_representation=UuidRepresentation.STANDARD)
std_collection = client.testdb.get_collection('test', codec_options=std_opts)
doc = std_collection.find_one({'_id': 'foo'})
assert doc['uuid'] == input_uuid
# Round-tripping the retrieved document silently changes the Binary subtype to 4
std_collection.replace_one({'_id': 'foo'}, doc)
assert collection.find_one({'uuid': Binary(input_uuid.bytes, 3)}) is None
round_tripped_doc = collection.find_one({'uuid': Binary(input_uuid.bytes, 4)})
assert doc == round_tripped_doc
In this example, round-tripping the document using the incorrect
UuidRepresentation (STANDARD instead of PYTHON_LEGACY) changes the Binary
subtype as a side-effect. Note that this can also happen when the situation
is reversed - i.e. when the original document is written using the STANDARD
representation and then round-tripped using the PYTHON_LEGACY representation.
In the next example, we see the consequences of incorrectly using a
representation that modifies byte-order (CSHARP_LEGACY or JAVA_LEGACY)
when round-tripping documents:
from pymongo import MongoClient
from bson.codec_options import CodecOptions
from bson.binary import Binary, UuidRepresentation
from uuid import uuid4

client = MongoClient()
# Using UuidRepresentation.STANDARD stores a Binary subtype-4 UUID
std_opts = CodecOptions(uuid_representation=UuidRepresentation.STANDARD)
input_uuid = uuid4()
collection = client.testdb.get_collection('test', codec_options=std_opts)
collection.insert_one({'_id': 'baz', 'uuid': input_uuid})
assert collection.find_one({'uuid': Binary(input_uuid.bytes, 4)})['_id'] == 'baz'
# Retrieving this document using UuidRepresentation.JAVA_LEGACY returns a native UUID
# without modifying the UUID byte-order
java_opts = CodecOptions(uuid_representation=UuidRepresentation.JAVA_LEGACY)
java_collection = client.testdb.get_collection('test', codec_options=java_opts)
doc = java_collection.find_one({'_id': 'baz'})
assert doc['uuid'] == input_uuid
# Round-tripping the retrieved document silently changes the Binary bytes and subtype
java_collection.replace_one({'_id': 'baz'}, doc)
assert collection.find_one({'uuid': Binary(input_uuid.bytes, 3)}) is None
assert collection.find_one({'uuid': Binary(input_uuid.bytes, 4)}) is None
round_tripped_doc = collection.find_one({'_id': 'baz'})
assert round_tripped_doc['uuid'] == Binary(input_uuid.bytes, 3).as_uuid(UuidRepresentation.JAVA_LEGACY)
In this case, using the incorrect UuidRepresentation (JAVA_LEGACY instead of
STANDARD) changes the Binary bytes and subtype as a side-effect.
Note that this happens when any representation that manipulates byte-order
(CSHARP_LEGACY or JAVA_LEGACY) is incorrectly used to round-trip UUIDs
written with STANDARD. When the situation is reversed - i.e. when the
original document is written using CSHARP_LEGACY or JAVA_LEGACY and then
round-tripped using STANDARD - only the Binary subtype is changed.
Note
Starting in PyMongo 4.0, these issues will be resolved as
the STANDARD representation will decode Binary subtype 3 fields as
Binary objects of subtype 3 (instead of uuid.UUID), and each of the
LEGACY_* representations will decode Binary subtype 4 fields to Binary
objects of subtype 4 (instead of uuid.UUID).
Configuring a UUID Representation¶
Users can work around the problems described above by configuring their
applications with the appropriate UuidRepresentation.
Configuring the representation modifies PyMongo's behavior while
encoding uuid.UUID objects to BSON and decoding
Binary subtype 3 and 4 fields from BSON.
Applications can set the UUID representation in one of the following ways:
- At the MongoClient level, using the uuidRepresentation URI option, e.g.:
client = MongoClient("mongodb://a:27017/?uuidRepresentation=javaLegacy")
Valid values are:
Value | UUID Representation |
---|---|
pythonLegacy | PYTHON_LEGACY |
javaLegacy | JAVA_LEGACY |
csharpLegacy | CSHARP_LEGACY |
standard | STANDARD |
unspecified | UNSPECIFIED |
- Using the uuid_representation kwarg option, e.g.:
from bson.binary import UuidRepresentation
client = MongoClient(uuid_representation=UuidRepresentation.PYTHON_LEGACY)
- By supplying a suitable CodecOptions instance, e.g.:
from bson.codec_options import CodecOptions
csharp_opts = CodecOptions(uuid_representation=UuidRepresentation.CSHARP_LEGACY)
csharp_database = client.get_database('csharp_db', codec_options=csharp_opts)
csharp_collection = client.testdb.get_collection('csharp_coll', codec_options=csharp_opts)
Supported UUID Representations¶
UUID Representation | Default? | Encode uuid.UUID to | Decode Binary subtype 4 to | Decode Binary subtype 3 to |
---|---|---|---|---|
PYTHON_LEGACY | Yes, in PyMongo>=2.9,<4 | Binary subtype 3 with standard byte-order | uuid.UUID in PyMongo<4; Binary subtype 4 in PyMongo>=4 | uuid.UUID |
JAVA_LEGACY | No | Binary subtype 3 with Java legacy byte-order | uuid.UUID in PyMongo<4; Binary subtype 4 in PyMongo>=4 | uuid.UUID |
CSHARP_LEGACY | No | Binary subtype 3 with C# legacy byte-order | uuid.UUID in PyMongo<4; Binary subtype 4 in PyMongo>=4 | uuid.UUID |
STANDARD | No | Binary subtype 4 | uuid.UUID | uuid.UUID in PyMongo<4; Binary subtype 3 in PyMongo>=4 |
UNSPECIFIED | Yes, in PyMongo>=4 | Raise ValueError | Binary subtype 4 | uuid.UUID in PyMongo<4; Binary subtype 3 in PyMongo>=4 |
We now detail the behavior and use-case for each supported UUID representation.
PYTHON_LEGACY¶
Attention
This UUID representation should be used when reading UUIDs generated by existing applications that use the Python driver but don't explicitly set a UUID representation.
Attention
PYTHON_LEGACY has been the default UUID representation since PyMongo 2.9.
The PYTHON_LEGACY representation corresponds to the legacy representation
of UUIDs used by PyMongo. This representation conforms with
RFC 4122 Section 4.1.2.
The following example illustrates the use of this representation:
from pymongo import MongoClient
from bson.codec_options import CodecOptions, DEFAULT_CODEC_OPTIONS
from bson.binary import UuidRepresentation
from uuid import uuid4

client = MongoClient()
# No configured UUID representation
collection = client.python_legacy.get_collection('test', codec_options=DEFAULT_CODEC_OPTIONS)
# Using UuidRepresentation.PYTHON_LEGACY
pylegacy_opts = CodecOptions(uuid_representation=UuidRepresentation.PYTHON_LEGACY)
pylegacy_collection = client.python_legacy.get_collection('test', codec_options=pylegacy_opts)
# UUIDs written by PyMongo with no UuidRepresentation configured can be queried using PYTHON_LEGACY
uuid_1 = uuid4()
collection.insert_one({'uuid': uuid_1})
document = pylegacy_collection.find_one({'uuid': uuid_1})
# UUIDs written using PYTHON_LEGACY can be read by PyMongo with no UuidRepresentation configured
uuid_2 = uuid4()
pylegacy_collection.insert_one({'uuid': uuid_2})
document = collection.find_one({'uuid': uuid_2})
PYTHON_LEGACY encodes native uuid.UUID objects to Binary subtype 3
objects, preserving the same byte-order as bytes:
from bson.binary import Binary
document = collection.find_one({'uuid': Binary(uuid_2.bytes, subtype=3)})
assert document['uuid'] == uuid_2
JAVA_LEGACY¶
Attention
This UUID representation should be used when reading UUIDs written to MongoDB by legacy applications (i.e. applications that don't use the STANDARD representation) using the Java driver.
The JAVA_LEGACY
representation
corresponds to the legacy representation of UUIDs used by the MongoDB Java
Driver.
Note
The JAVA_LEGACY
representation reverses the order of bytes 0-7,
and bytes 8-15.
As an example, consider the same UUID described in Legacy Handling of UUID Data.
Let us assume that an application used the Java driver without an explicitly
specified UUID representation to insert the example UUID
00112233-4455-6677-8899-aabbccddeeff
into MongoDB. If we try to read this
value using PyMongo with no UUID representation specified, we end up with an
entirely different UUID:
UUID('77665544-3322-1100-ffee-ddccbbaa9988')
However, if we explicitly set the representation to
JAVA_LEGACY
, we get the correct result:
UUID('00112233-4455-6677-8899-aabbccddeeff')
PyMongo uses the specified UUID representation to reorder the BSON bytes and
load them correctly. JAVA_LEGACY
encodes native uuid.UUID
objects
to Binary
subtype 3 objects, while performing the same
byte-reordering as the legacy Java driver’s UUID to BSON encoder.
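A minimal sketch of this behavior, using the from_uuid()/as_uuid() helpers available in PyMongo 3.11+ (the UUID value is the example above):
from uuid import UUID
from bson.binary import Binary, UuidRepresentation

original = UUID('00112233-4455-6677-8899-aabbccddeeff')
# Bytes as the legacy Java driver would have stored them (Binary subtype 3).
stored = Binary.from_uuid(original, UuidRepresentation.JAVA_LEGACY)
# Decoding with the Python legacy byte order yields the wrong UUID...
assert stored.as_uuid(UuidRepresentation.PYTHON_LEGACY) == UUID('77665544-3322-1100-ffee-ddccbbaa9988')
# ...while JAVA_LEGACY recovers the original value.
assert stored.as_uuid(UuidRepresentation.JAVA_LEGACY) == original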
CSHARP_LEGACY¶
Attention
This UUID representation should be used when reading UUIDs written to MongoDB by legacy applications (i.e. applications that don't use the STANDARD representation) using the C# driver.
The CSHARP_LEGACY representation corresponds to the legacy representation
of UUIDs used by the MongoDB C# Driver.
Note
The CSHARP_LEGACY
representation reverses the order of bytes 0-3,
bytes 4-5, and bytes 6-7.
As an example, consider the same UUID described in Legacy Handling of UUID Data.
Let us assume that an application used the C# driver without an explicitly
specified UUID representation to insert the example UUID
00112233-4455-6677-8899-aabbccddeeff
into MongoDB. If we try to read this
value using PyMongo with no UUID representation specified, we end up with an
entirely different UUID:
UUID('33221100-5544-7766-8899-aabbccddeeff')
However, if we explicitly set the representation to
CSHARP_LEGACY
, we get the correct result:
UUID('00112233-4455-6677-8899-aabbccddeeff')
PyMongo uses the specified UUID representation to reorder the BSON bytes and
load them correctly. CSHARP_LEGACY
encodes native uuid.UUID
objects to Binary
subtype 3 objects, while performing
the same byte-reordering as the legacy C# driver’s UUID to BSON encoder.
STANDARD¶
Attention
This UUID representation should be used by new applications that have never stored UUIDs in MongoDB.
The STANDARD representation enables cross-language compatibility by
ensuring the same byte-ordering when encoding UUIDs from all drivers.
UUIDs written by a driver with this representation configured will be
handled correctly by every other driver, provided it is also configured
with the STANDARD representation.
STANDARD encodes native uuid.UUID objects to Binary subtype 4 objects.
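For illustration, a round-trip through Binary with the STANDARD representation (a minimal sketch; requires PyMongo 3.11+ for from_uuid()/as_uuid()):
from uuid import uuid4
from bson.binary import Binary, UuidRepresentation

uuid_obj = uuid4()
encoded = Binary.from_uuid(uuid_obj, UuidRepresentation.STANDARD)
assert encoded.subtype == 4  # STANDARD produces Binary subtype 4
assert encoded.as_uuid(UuidRepresentation.STANDARD) == uuid_obj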
UNSPECIFIED¶
Attention
Starting in PyMongo 4.0, UNSPECIFIED will be the default UUID representation used by PyMongo.
The UNSPECIFIED representation prevents the incorrect interpretation of
UUID bytes by stopping short of automatically converting UUID fields in
BSON to native UUID types. Loading a UUID when using this representation
returns a Binary object instead. If required, users can coerce the decoded
Binary objects into native UUIDs using the as_uuid() method and specifying
the appropriate representation format. The following example shows
what this might look like for a UUID stored by the C# driver:
from pymongo import MongoClient
from bson.codec_options import CodecOptions
from bson.binary import Binary, UuidRepresentation
from uuid import uuid4

client = MongoClient()
# Using UuidRepresentation.CSHARP_LEGACY
csharp_opts = CodecOptions(uuid_representation=UuidRepresentation.CSHARP_LEGACY)
# Store a legacy C#-formatted UUID
input_uuid = uuid4()
collection = client.testdb.get_collection('test', codec_options=csharp_opts)
collection.insert_one({'_id': 'foo', 'uuid': input_uuid})
# Using UuidRepresentation.UNSPECIFIED
unspec_opts = CodecOptions(uuid_representation=UuidRepresentation.UNSPECIFIED)
unspec_collection = client.testdb.get_collection('test', codec_options=unspec_opts)
# UUID fields are decoded as Binary when UuidRepresentation.UNSPECIFIED is configured
document = unspec_collection.find_one({'_id': 'foo'})
decoded_field = document['uuid']
assert isinstance(decoded_field, Binary)
# Binary.as_uuid() can be used to coerce the decoded value to a native UUID
decoded_uuid = decoded_field.as_uuid(UuidRepresentation.CSHARP_LEGACY)
assert decoded_uuid == input_uuid
Native uuid.UUID objects cannot directly be encoded to Binary when the
UUID representation is UNSPECIFIED, and attempting to do so will result
in an exception:
unspec_collection.insert_one({'_id': 'bar', 'uuid': uuid4()})
Traceback (most recent call last):
...
ValueError: cannot encode native uuid.UUID with UuidRepresentation.UNSPECIFIED. UUIDs can be manually converted to bson.Binary instances using bson.Binary.from_uuid() or a different UuidRepresentation can be configured. See the documentation for UuidRepresentation for more information.
Instead, applications using UNSPECIFIED must explicitly coerce a native
UUID using the from_uuid() method:
explicit_binary = Binary.from_uuid(uuid4(), UuidRepresentation.PYTHON_LEGACY)
unspec_collection.insert_one({'_id': 'bar', 'uuid': explicit_binary})
Frequently Asked Questions¶
Contents
- Frequently Asked Questions
- Is PyMongo thread-safe?
- Is PyMongo fork-safe?
- How does connection pooling work in PyMongo?
- Does PyMongo support Python 3?
- Does PyMongo support asynchronous frameworks like Gevent, asyncio, Tornado, or Twisted?
- Why does PyMongo add an _id field to all of my documents?
- Key order in subdocuments – why does my query work in the shell but not PyMongo?
- What does CursorNotFound cursor id not valid at server mean?
- How do I change the timeout value for cursors?
- How can I store decimal.Decimal instances?
- I'm saving 9.99 but when I query my document contains 9.9900000000000002 - what's going on here?
- Can you add attribute style access for documents?
- What is the correct way to handle time zones with PyMongo?
- How can I save a datetime.date instance?
- When I query for a document by ObjectId in my web application I get no result
- How can I use PyMongo from Django?
- Does PyMongo work with mod_wsgi?
- Does PyMongo work with PythonAnywhere?
- How can I use something like Python's json module to encode my documents to JSON?
- Why do I get OverflowError decoding dates stored by another language's driver?
- Using PyMongo with Multiprocessing
Is PyMongo thread-safe?¶
PyMongo is thread-safe and provides built-in connection pooling for threaded applications.
Is PyMongo fork-safe?¶
PyMongo is not fork-safe. Care must be taken when using instances of
MongoClient
with fork()
. Specifically,
instances of MongoClient must not be copied from a parent process to
a child process. Instead, the parent process and each child process must
create their own instances of MongoClient. Instances of MongoClient copied from
the parent process have a high probability of deadlock in the child process due
to the inherent incompatibilities between fork()
, threads, and locks
described below. PyMongo will attempt to
issue a warning if there is a chance of this deadlock occurring.
MongoClient spawns multiple threads to run background tasks such as monitoring
connected servers. These threads share state that is protected by instances of
Lock
, which are themselves not fork-safe. The
driver is therefore subject to the same limitations as any other multithreaded
code that uses Lock
(and mutexes in general). One of these
limitations is that the locks become useless after fork()
. During the fork,
all locks are copied over to the child process in the same state as they were
in the parent: if they were locked, the copied locks are also locked. The child
created by fork()
only has one thread, so any locks that were taken out by
other threads in the parent will never be released in the child. The next time
the child process attempts to acquire one of these locks, deadlock occurs.
For a long but interesting read about the problems of Python locks in
multithreaded contexts with fork()
, see http://bugs.python.org/issue6721.
How does connection pooling work in PyMongo?¶
Every MongoClient
instance has a built-in
connection pool per server in your MongoDB topology. These pools open sockets
on demand to support the number of concurrent MongoDB operations that your
multi-threaded application requires. There is no thread-affinity for sockets.
The size of each connection pool is capped at maxPoolSize
, which defaults
to 100. If there are maxPoolSize
connections to a server and all are in
use, the next request to that server will wait until one of the connections
becomes available.
The client instance opens one additional socket per server in your MongoDB topology for monitoring the server’s state.
For example, a client connected to a 3-node replica set opens 3 monitoring
sockets. It also opens as many sockets as needed to support a multi-threaded
application’s concurrent operations on each server, up to maxPoolSize
. With
a maxPoolSize
of 100, if the application only uses the primary (the
default), then only the primary connection pool grows and the total number
of connections is at most 103. If the application uses a
ReadPreference
to query the secondaries,
their pools also grow and the total connections can reach 303.
It is possible to set the minimum number of concurrent connections to each
server with minPoolSize
, which defaults to 0. The connection pool will be
initialized with this number of sockets. If sockets are closed due to any
network errors, causing the total number of sockets (both in use and idle) to
drop below the minimum, more sockets are opened until the minimum is reached.
The maximum number of milliseconds that a connection can remain idle in the
pool before being removed and replaced can be set with maxIdleTimeMS
, which
defaults to None (no limit).
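For example, a client that keeps at least ten connections per server and discards connections idle for more than a minute might be configured as follows (the values are illustrative; tune them to your workload):
client = MongoClient(host, port, minPoolSize=10, maxIdleTimeMS=60000)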
The default configuration for a MongoClient
works for most applications:
client = MongoClient(host, port)
Create this client once for each process, and reuse it for all operations. It is a common mistake to create a new client for each request, which is very inefficient.
To support extremely high numbers of concurrent MongoDB operations within one
process, increase maxPoolSize
:
client = MongoClient(host, port, maxPoolSize=200)
… or make it unbounded:
client = MongoClient(host, port, maxPoolSize=None)
Once the pool reaches its maximum size, additional threads have to wait for
sockets to become available. PyMongo does not limit the number of threads
that can wait for sockets to become available and it is the application’s
responsibility to limit the size of its thread pool to bound queuing during a
load spike. Threads are allowed to wait for any length of time unless
waitQueueTimeoutMS
is defined:
client = MongoClient(host, port, waitQueueTimeoutMS=100)
A thread that waits more than 100ms (in this example) for a socket raises
ConnectionFailure
. Use this option if it is more
important to bound the duration of operations during a load spike than it is to
complete every operation.
When close()
is called by any thread,
all idle sockets are closed, and all sockets that are in use will be closed as
they are returned to the pool.
Does PyMongo support Python 3?¶
PyMongo supports CPython 3.4+ and PyPy3.5+. See the Python 3 FAQ for details.
Does PyMongo support asynchronous frameworks like Gevent, asyncio, Tornado, or Twisted?¶
PyMongo fully supports Gevent.
To use MongoDB with asyncio or Tornado, see the Motor project.
For Twisted, see TxMongo. Its stated mission is to keep feature parity with PyMongo.
Why does PyMongo add an _id field to all of my documents?¶
When a document is inserted to MongoDB using
insert_one()
,
insert_many()
, or
bulk_write()
, and that document does not
include an _id
field, PyMongo automatically adds one for you, set to an
instance of ObjectId
. For example:
>>> my_doc = {'x': 1}
>>> collection.insert_one(my_doc)
<pymongo.results.InsertOneResult object at 0x7f3fc25bd640>
>>> my_doc
{'x': 1, '_id': ObjectId('560db337fba522189f171720')}
Users often discover this behavior when a call to
insert_many() with a list of references to a single document
raises BulkWriteError. Several Python idioms lead to this pitfall:
>>> doc = {}
>>> collection.insert_many(doc for _ in range(10))
Traceback (most recent call last):
...
pymongo.errors.BulkWriteError: batch op errors occurred
>>> doc
{'_id': ObjectId('560f171cfba52279f0b0da0c')}
>>> docs = [{}]
>>> collection.insert_many(docs * 10)
Traceback (most recent call last):
...
pymongo.errors.BulkWriteError: batch op errors occurred
>>> docs
[{'_id': ObjectId('560f1933fba52279f0b0da0e')}]
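The fix is to create a distinct document per insert, for example with a list comprehension:
>>> docs = [{} for _ in range(10)]  # ten distinct dicts
>>> result = collection.insert_many(docs)
>>> len(result.inserted_ids)
10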
PyMongo adds an _id
field in this manner for a few reasons:
- All MongoDB documents are required to have an _id field.
- If PyMongo were to insert a document without an _id MongoDB would add one itself, but it would not report the value back to PyMongo.
- Copying the document to insert before adding the _id field would be prohibitively expensive for most high write volume applications.
If you don’t want PyMongo to add an _id
to your documents, insert only
documents that already have an _id
field, added by your application.
Key order in subdocuments – why does my query work in the shell but not PyMongo?¶
The key-value pairs in a BSON document can have any order (except that _id
is always first). The mongo shell preserves key order when reading and writing
data. Observe that “b” comes before “a” when we create the document and when it
is displayed:
> // mongo shell.
> db.collection.insert( { "_id" : 1, "subdocument" : { "b" : 1, "a" : 1 } } )
WriteResult({ "nInserted" : 1 })
> db.collection.find()
{ "_id" : 1, "subdocument" : { "b" : 1, "a" : 1 } }
PyMongo represents BSON documents as Python dicts by default, and the order of keys in dicts is not defined. That is, a dict declared with the “a” key first is the same, to Python, as one with “b” first:
>>> print({'a': 1.0, 'b': 1.0})
{'a': 1.0, 'b': 1.0}
>>> print({'b': 1.0, 'a': 1.0})
{'a': 1.0, 'b': 1.0}
Therefore, Python dicts are not guaranteed to show keys in the order they are stored in BSON. Here, “a” is shown before “b”:
>>> print(collection.find_one())
{u'_id': 1.0, u'subdocument': {u'a': 1.0, u'b': 1.0}}
To preserve order when reading BSON, use the SON
class,
which is a dict that remembers its key order. First, get a handle to the
collection, configured to use SON
instead of dict:
>>> from bson import CodecOptions, SON
>>> opts = CodecOptions(document_class=SON)
>>> opts
CodecOptions(document_class=<class 'bson.son.SON'>,
tz_aware=False,
uuid_representation=UuidRepresentation.PYTHON_LEGACY,
unicode_decode_error_handler='strict',
tzinfo=None, type_registry=TypeRegistry(type_codecs=[],
fallback_encoder=None))
>>> collection_son = collection.with_options(codec_options=opts)
Now, documents and subdocuments in query results are represented with
SON
objects:
>>> print(collection_son.find_one())
SON([(u'_id', 1.0), (u'subdocument', SON([(u'b', 1.0), (u'a', 1.0)]))])
The subdocument’s actual storage layout is now visible: “b” is before “a”.
Because a dict’s key order is not defined, you cannot predict how it will be serialized to BSON. But MongoDB considers subdocuments equal only if their keys have the same order. So if you use a dict to query on a subdocument it may not match:
>>> collection.find_one({'subdocument': {'a': 1.0, 'b': 1.0}}) is None
True
Swapping the key order in your query makes no difference:
>>> collection.find_one({'subdocument': {'b': 1.0, 'a': 1.0}}) is None
True
… because, as we saw above, Python considers the two dicts the same.
There are two solutions. First, you can match the subdocument field-by-field:
>>> collection.find_one({'subdocument.a': 1.0,
... 'subdocument.b': 1.0})
{u'_id': 1.0, u'subdocument': {u'a': 1.0, u'b': 1.0}}
The query matches any subdocument with an “a” of 1.0 and a “b” of 1.0, regardless of the order you specify them in Python or the order they are stored in BSON. Additionally, this query now matches subdocuments with additional keys besides “a” and “b”, whereas the previous query required an exact match.
The second solution is to use a SON
to specify the key order:
>>> query = {'subdocument': SON([('b', 1.0), ('a', 1.0)])}
>>> collection.find_one(query)
{u'_id': 1.0, u'subdocument': {u'a': 1.0, u'b': 1.0}}
The key order you use when you create a SON
is preserved
when it is serialized to BSON and used as a query. Thus you can create a
subdocument that exactly matches the subdocument in the collection.
What does CursorNotFound cursor id not valid at server mean?¶
Cursors in MongoDB can time out on the server if they've been open for
a long time without any operations being performed on them. This can
lead to a CursorNotFound exception being raised when attempting to
iterate the cursor.
How do I change the timeout value for cursors?¶
MongoDB doesn't support custom timeouts for cursors, but cursor
timeouts can be turned off entirely. Pass no_cursor_timeout=True to find().
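For example (a sketch; handle() is a placeholder for your own processing, and a cursor opened with no_cursor_timeout=True should be closed explicitly since the server will no longer clean it up):
cursor = collection.find({}, no_cursor_timeout=True)
try:
    for doc in cursor:
        handle(doc)  # hypothetical long-running processing
finally:
    cursor.close()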
How can I store decimal.Decimal instances?¶
PyMongo >= 3.4 supports the Decimal128 BSON type introduced in MongoDB 3.4.
See decimal128
for more information.
MongoDB <= 3.2 only supports IEEE 754 floating points - the same as the Python float type. The only way PyMongo could store Decimal instances to these versions of MongoDB would be to convert them to this standard, so you’d really only be storing floats anyway - we force users to do this conversion explicitly so that they are aware that it is happening.
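For example, with MongoDB 3.4+ and PyMongo 3.4+ a decimal.Decimal value can be stored losslessly via Decimal128 (a minimal sketch; the collection name is illustrative):
from decimal import Decimal
from bson.decimal128 import Decimal128

coll = client.test.prices  # illustrative namespace
coll.insert_one({'price': Decimal128(Decimal('9.99'))})
doc = coll.find_one()
assert doc['price'].to_decimal() == Decimal('9.99')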
I’m saving 9.99 but when I query my document contains 9.9900000000000002 - what’s going on here?¶
The database representation is 9.99
as an IEEE floating point (which
is common to MongoDB and Python as well as most other modern
languages). The problem is that 9.99
cannot be represented exactly
with a double precision floating point - this is true in some versions of
Python as well:
>>> 9.99
9.9900000000000002
The result that you get when you save 9.99
with PyMongo is exactly the
same as the result you’d get saving it with the JavaScript shell or
any of the other languages (and as the data you’re working with when
you type 9.99
into a Python program).
Can you add attribute style access for documents?¶
This request has come up a number of times but we’ve decided not to implement anything like this. The relevant jira case has some information about the decision, but here is a brief summary:
- This will pollute the attribute namespace for documents, so could lead to subtle bugs / confusing errors when using a key with the same name as a dictionary method.
- The only reason we even use SON objects instead of regular dictionaries is to maintain key ordering, since the server requires this for certain operations. So we’re hesitant to needlessly complicate SON (at some point it’s hypothetically possible we might want to revert back to using dictionaries alone, without breaking backwards compatibility for everyone).
- It’s easy (and Pythonic) for new users to deal with documents, since they behave just like dictionaries. If we start changing their behavior it adds a barrier to entry for new users - another class to learn.
What is the correct way to handle time zones with PyMongo?¶
See Datetimes and Timezones for examples on how to handle
datetime
objects correctly.
How can I save a datetime.date instance?¶
PyMongo doesn’t support saving datetime.date
instances, since
there is no BSON type for dates without times. Rather than having the
driver enforce a convention for converting datetime.date
instances to datetime.datetime
instances for you, any
conversion should be performed in your client code.
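One common convention is to store midnight of the given date and convert back after reading; a minimal sketch (the field name is illustrative):
from datetime import date, datetime, time

d = date(2021, 5, 17)
collection.insert_one({'shipped_on': datetime.combine(d, time.min)})
doc = collection.find_one()
assert doc['shipped_on'].date() == d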
When I query for a document by ObjectId in my web application I get no result¶
It’s common in web applications to encode documents’ ObjectIds in URLs, like:
"/posts/50b3bda58a02fb9a84d8991e"
Your web framework will pass the ObjectId portion of the URL to your request
handler as a string, so it must be converted to ObjectId
before it is passed to find_one()
. It is a
common mistake to forget to do this conversion. Here’s how to do it correctly
in Flask (other web frameworks are similar):
from pymongo import MongoClient
from bson.objectid import ObjectId
from flask import Flask, render_template
client = MongoClient()
app = Flask(__name__)
@app.route("/posts/<_id>")
def show_post(_id):
    # NOTE!: converting _id from string to ObjectId before passing to find_one
    post = client.db.posts.find_one({'_id': ObjectId(_id)})
    return render_template('post.html', post=post)

if __name__ == "__main__":
    app.run()
How can I use PyMongo from Django?¶
Django is a popular Python web
framework. Django includes an ORM, django.db
. Currently,
there’s no official MongoDB backend for Django.
django-mongodb-engine is an unofficial MongoDB backend that supports Django aggregations, (atomic) updates, embedded objects, Map/Reduce and GridFS. It allows you to use most of Django’s built-in features, including the ORM, admin, authentication, site and session frameworks and caching.
However, it’s easy to use MongoDB (and PyMongo) from Django
without using a Django backend. Certain features of Django that require
django.db
(admin, authentication and sessions) will not work
using just MongoDB, but most of what Django provides can still be
used.
One project which should make working with MongoDB and Django easier
is mango. Mango is a set of
MongoDB backends for Django sessions and authentication (bypassing
django.db
entirely).
Does PyMongo work with mod_wsgi?¶
Yes. See the configuration guide for PyMongo and mod_wsgi.
Does PyMongo work with PythonAnywhere?¶
No. PyMongo creates Python threads which PythonAnywhere does not support. For more information see PYTHON-1495.
How can I use something like Python’s json module to encode my documents to JSON?¶
json_util is PyMongo's built-in, flexible tool for using Python's json
module with BSON documents and MongoDB Extended JSON. The json module
won't work out of the box with all documents from PyMongo as PyMongo
supports some special types (like ObjectId and DBRef) that are not
supported in JSON.
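For example, a round-trip through Extended JSON (a minimal sketch):
from bson import json_util
from bson.objectid import ObjectId

doc = {'_id': ObjectId(), 'x': 1}
json_str = json_util.dumps(doc)   # special types become Extended JSON
assert json_util.loads(json_str) == doc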
python-bsonjs is a fast
BSON to MongoDB Extended JSON converter built on top of
libbson. python-bsonjs does not
depend on PyMongo and can offer a nice performance improvement over
json_util
. python-bsonjs works best with PyMongo when using
RawBSONDocument
.
Why do I get OverflowError decoding dates stored by another language’s driver?¶
PyMongo decodes BSON datetime values to instances of Python’s
datetime.datetime
. Instances of datetime.datetime
are
limited to years between datetime.MINYEAR
(usually 1) and
datetime.MAXYEAR
(usually 9999). Some MongoDB drivers (e.g. the PHP
driver) can store BSON datetimes with year values far outside those supported
by datetime.datetime
.
There are a few ways to work around this issue. One option is to filter
out documents with values outside of the range supported by
datetime.datetime
:
>>> from datetime import datetime
>>> coll = client.test.dates
>>> cur = coll.find({'dt': {'$gte': datetime.min, '$lte': datetime.max}})
Another option, assuming you don’t need the datetime field, is to filter out just that field:
>>> cur = coll.find({}, projection={'dt': False})
Using PyMongo with Multiprocessing¶
On Unix systems the multiprocessing module spawns processes using fork()
.
Care must be taken when using instances of
MongoClient
with fork()
. Specifically,
instances of MongoClient must not be copied from a parent process to a child
process. Instead, the parent process and each child process must create their
own instances of MongoClient. For example:
import multiprocessing

import pymongo

# Each process creates its own instance of MongoClient.
def func():
    db = pymongo.MongoClient().mydb
    # Do something with db.

proc = multiprocessing.Process(target=func)
proc.start()
Never do this:
import multiprocessing

import pymongo

client = pymongo.MongoClient()

# Each child process attempts to copy a global MongoClient
# created in the parent process. Never do this.
def func():
    db = client.mydb
    # Do something with db.

proc = multiprocessing.Process(target=func)
proc.start()
Instances of MongoClient copied from the parent process have a high probability of deadlock in the child process due to inherent incompatibilities between fork(), threads, and locks. PyMongo will attempt to issue a warning if there is a chance of this deadlock occurring.
Compatibility Policy¶
Semantic Versioning¶
PyMongo’s version numbers follow semantic versioning: each version number is structured “major.minor.patch”. Patch releases fix bugs, minor releases add features (and may fix bugs), and major releases include API changes that break backwards compatibility (and may add features and fix bugs).
Deprecation¶
Before we remove a feature in a major release, PyMongo’s maintainers make an effort to release at least one minor version that deprecates it. We add “DEPRECATED” to the feature’s documentation, and update the code to raise a DeprecationWarning. You can ensure your code is future-proof by running your code with the latest PyMongo release and looking for DeprecationWarnings.
Starting with Python 2.7, the interpreter silences DeprecationWarnings by
default. For example, the following code uses the deprecated insert
method but does not raise any warning:
# "insert.py"
from pymongo import MongoClient
client = MongoClient()
client.test.test.insert({})
To print deprecation warnings to stderr, run python with “-Wd”:
$ python -Wd insert.py
insert.py:4: DeprecationWarning: insert is deprecated. Use insert_one or insert_many instead.
client.test.test.insert({})
You can turn warnings into exceptions with “python -We”:
$ python -We insert.py
Traceback (most recent call last):
File "insert.py", line 4, in <module>
client.test.test.insert({})
File "/home/durin/work/mongo-python-driver/pymongo/collection.py", line 2906, in insert
"instead.", DeprecationWarning, stacklevel=2)
DeprecationWarning: insert is deprecated. Use insert_one or insert_many instead.
If your own code’s test suite passes with “python -We” then it uses no deprecated PyMongo features.
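The same effect can be achieved programmatically with the standard warnings module, for example in a test suite's setup code:
import warnings

# Turn DeprecationWarnings into exceptions for this process.
warnings.simplefilter('error', DeprecationWarning)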
See also
The Python documentation on the warnings module, and the -W command line option.
API Documentation¶
The PyMongo distribution contains three top-level packages for
interacting with MongoDB. bson
is an implementation of the
BSON format, pymongo
is a
full-featured driver for MongoDB, and gridfs
is a set of tools
for working with the GridFS storage
specification.
bson
– BSON (Binary JSON) Encoding and Decoding¶
BSON (Binary JSON) encoding and decoding.
The mapping from Python types to BSON types is as follows:
Python Type | BSON Type | Supported Direction |
---|---|---|
None | null | both |
bool | boolean | both |
int [1] | int32 / int64 | py -> bson |
long | int64 | py -> bson |
bson.int64.Int64 | int64 | both |
float | number (real) | both |
string | string | py -> bson |
unicode | string | both |
list | array | both |
dict / SON | object | both |
datetime.datetime [2] [3] | date | both |
bson.regex.Regex | regex | both |
compiled re [4] | regex | py -> bson |
bson.binary.Binary | binary | both |
bson.objectid.ObjectId | oid | both |
bson.dbref.DBRef | dbref | both |
None | undefined | bson -> py |
unicode | code | bson -> py |
bson.code.Code | code | py -> bson |
unicode | symbol | bson -> py |
bytes (Python 3) [5] | binary | both |
Note that, when using Python 2.x, to save binary data it must be wrapped as an instance of bson.binary.Binary. Otherwise it will be saved as a BSON string and retrieved as unicode. Users of Python 3.x can use the Python bytes type.
[1] | A Python int will be saved as a BSON int32 or BSON int64 depending
on its size. A BSON int32 will always decode to a Python int. A BSON
int64 will always decode to an Int64. |
[2] | datetime.datetime instances will be rounded to the nearest millisecond when saved |
[3] | All datetime.datetime instances are treated as naive. Clients should always use UTC. |
[4] | Regex instances and regular expression
objects from re.compile() are both saved as BSON regular expressions.
BSON regular expressions are decoded as Regex
instances. |
[5] | The bytes type from Python 3.x is encoded as BSON binary with
subtype 0. In Python 3.x it will be decoded back to bytes. In Python 2.x
it will be decoded to an instance of Binary with
subtype 0. |
-
class
bson.
BSON
¶ BSON (Binary JSON) data.
Warning
Using this class to encode and decode BSON adds a performance cost. For better performance use the module level functions
encode()
and decode()
instead.
-
decode
(codec_options=CodecOptions(document_class=dict, tz_aware=False, uuid_representation=UuidRepresentation.PYTHON_LEGACY, unicode_decode_error_handler='strict', tzinfo=None, type_registry=TypeRegistry(type_codecs=[], fallback_encoder=None)))¶ Decode this BSON data.
By default, returns a BSON document represented as a Python dict. To use a different MutableMapping class, configure a CodecOptions:
>>> import collections  # From Python standard library.
>>> import bson
>>> from bson.codec_options import CodecOptions
>>> data = bson.BSON.encode({'a': 1})
>>> decoded_doc = bson.BSON(data).decode()
>>> type(decoded_doc)
<type 'dict'>
>>> options = CodecOptions(document_class=collections.OrderedDict)
>>> decoded_doc = bson.BSON(data).decode(codec_options=options)
>>> type(decoded_doc)
<class 'collections.OrderedDict'>
Parameters: - codec_options (optional): An instance of
CodecOptions
.
Changed in version 3.0: Removed compile_re option: PyMongo now always represents BSON regular expressions as
Regex
objects. Use try_compile() to attempt to convert from a BSON regular expression to a Python regular expression object. Replaced as_class, tz_aware, and uuid_subtype options with codec_options.
Changed in version 2.7: Added compile_re option. If set to False, PyMongo represented BSON regular expressions as
Regex
objects instead of attempting to compile BSON regular expressions as Python native regular expressions, thus preventing errors for some incompatible patterns, see PYTHON-500.
-
classmethod
encode
(document, check_keys=False, codec_options=CodecOptions(document_class=dict, tz_aware=False, uuid_representation=UuidRepresentation.PYTHON_LEGACY, unicode_decode_error_handler='strict', tzinfo=None, type_registry=TypeRegistry(type_codecs=[], fallback_encoder=None)))¶ Encode a document to a new
BSON
instance. A document can be any mapping type (like dict).
Raises TypeError if document is not a mapping type, or contains keys that are not instances of basestring (str in python 3). Raises InvalidDocument if document cannot be converted to BSON.
Parameters:
- document: mapping type representing a document
- check_keys (optional): check if keys start with ‘$’ or contain ‘.’, raising InvalidDocument in either case
- codec_options (optional): An instance of
CodecOptions
.
Changed in version 3.0: Replaced uuid_subtype option with codec_options.
-
-
bson.
decode
(data, codec_options=CodecOptions(document_class=dict, tz_aware=False, uuid_representation=UuidRepresentation.PYTHON_LEGACY, unicode_decode_error_handler='strict', tzinfo=None, type_registry=TypeRegistry(type_codecs=[], fallback_encoder=None)))¶ Decode BSON to a document.
By default, returns a BSON document represented as a Python dict. To use a different MutableMapping class, configure a CodecOptions:
>>> import collections  # From Python standard library.
>>> import bson
>>> from bson.codec_options import CodecOptions
>>> data = bson.encode({'a': 1})
>>> decoded_doc = bson.decode(data)
>>> type(decoded_doc)
<type 'dict'>
>>> options = CodecOptions(document_class=collections.OrderedDict)
>>> decoded_doc = bson.decode(data, codec_options=options)
>>> type(decoded_doc)
<class 'collections.OrderedDict'>
Parameters: - data: the BSON to decode. Any bytes-like object that implements the buffer protocol.
- codec_options (optional): An instance of
CodecOptions
.
New in version 3.9.
-
bson.
decode_all
(data, codec_options=CodecOptions(document_class=dict, tz_aware=False, uuid_representation=UuidRepresentation.PYTHON_LEGACY, unicode_decode_error_handler='strict', tzinfo=None, type_registry=TypeRegistry(type_codecs=[], fallback_encoder=None)))¶ Decode BSON data to multiple documents.
data must be a bytes-like object implementing the buffer protocol that provides concatenated, valid, BSON-encoded documents.
Parameters: - data: BSON data
- codec_options (optional): An instance of
CodecOptions
.
Changed in version 3.9: Supports bytes-like objects that implement the buffer protocol.
Changed in version 3.0: Removed compile_re option: PyMongo now always represents BSON regular expressions as
Regex
objects. Use try_compile() to attempt to convert from a BSON regular expression to a Python regular expression object. Replaced as_class, tz_aware, and uuid_subtype options with codec_options.
Changed in version 2.7: Added compile_re option. If set to False, PyMongo represented BSON regular expressions as
Regex
objects instead of attempting to compile BSON regular expressions as Python native regular expressions, thus preventing errors for some incompatible patterns, see PYTHON-500.
-
bson.
decode_file_iter
(file_obj, codec_options=CodecOptions(document_class=dict, tz_aware=False, uuid_representation=UuidRepresentation.PYTHON_LEGACY, unicode_decode_error_handler='strict', tzinfo=None, type_registry=TypeRegistry(type_codecs=[], fallback_encoder=None)))¶ Decode bson data from a file to multiple documents as a generator.
Works similarly to the decode_all function, but reads from the file object in chunks and parses bson in chunks, yielding one document at a time.
Parameters: - file_obj: A file object containing BSON data.
- codec_options (optional): An instance of
CodecOptions
.
Changed in version 3.0: Replaced as_class, tz_aware, and uuid_subtype options with codec_options.
New in version 2.8.
-
bson.
decode_iter
(data, codec_options=CodecOptions(document_class=dict, tz_aware=False, uuid_representation=UuidRepresentation.PYTHON_LEGACY, unicode_decode_error_handler='strict', tzinfo=None, type_registry=TypeRegistry(type_codecs=[], fallback_encoder=None)))¶ Decode BSON data to multiple documents as a generator.
Works similarly to the decode_all function, but yields one document at a time.
data must be a string of concatenated, valid, BSON-encoded documents.
Parameters: - data: BSON data
- codec_options (optional): An instance of
CodecOptions
.
Changed in version 3.0: Replaced as_class, tz_aware, and uuid_subtype options with codec_options.
New in version 2.8.
-
bson.
encode
(document, check_keys=False, codec_options=CodecOptions(document_class=dict, tz_aware=False, uuid_representation=UuidRepresentation.PYTHON_LEGACY, unicode_decode_error_handler='strict', tzinfo=None, type_registry=TypeRegistry(type_codecs=[], fallback_encoder=None)))¶ Encode a document to BSON.
A document can be any mapping type (like dict).
Raises TypeError if document is not a mapping type, or contains keys that are not instances of basestring (str in python 3). Raises InvalidDocument if document cannot be converted to BSON.
Parameters:
- document: mapping type representing a document
- check_keys (optional): check if keys start with ‘$’ or contain ‘.’, raising InvalidDocument in either case
- codec_options (optional): An instance of
CodecOptions
.
New in version 3.9.
-
bson.
gen_list_name
()¶ Generate “keys” for encoded lists in the sequence b”0”, b”1”, b”2”, …
The first 1000 keys are returned from a pre-built cache. All subsequent keys are generated on the fly.
-
bson.
has_c
()¶ Is the C extension installed?
-
bson.
is_valid
(bson)¶ Check that the given string represents valid
BSON
data.
Raises TypeError if bson is not an instance of str (bytes in python 3). Returns True if bson is valid BSON, False otherwise.
Parameters:
- bson: the data to be validated
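For example:
>>> import bson
>>> bson.is_valid(bson.encode({'a': 1}))
True
>>> bson.is_valid(b'not bson')
False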
Sub-modules:
binary
– Tools for representing binary data to be stored in MongoDB¶
-
bson.binary.
BINARY_SUBTYPE
= 0¶ BSON binary subtype for binary data.
This is the default subtype for binary data.
-
bson.binary.
FUNCTION_SUBTYPE
= 1¶ BSON binary subtype for functions.
-
bson.binary.
OLD_BINARY_SUBTYPE
= 2¶ Old BSON binary subtype for binary data.
This is the old default subtype, the current default is
BINARY_SUBTYPE
.
-
bson.binary.
OLD_UUID_SUBTYPE
= 3¶ Old BSON binary subtype for a UUID.
uuid.UUID
instances will automatically be encoded by bson using this subtype.
New in version 2.1.
-
bson.binary.
UUID_SUBTYPE
= 4¶ BSON binary subtype for a UUID.
This is the new BSON binary subtype for UUIDs. The current default is OLD_UUID_SUBTYPE.
Changed in version 2.1: Changed to subtype 4.
-
bson.binary.
STANDARD
= 4¶ An alias for UuidRepresentation.STANDARD.
New in version 3.0.
-
bson.binary.
PYTHON_LEGACY
= 3¶ An alias for UuidRepresentation.PYTHON_LEGACY.
New in version 3.0.
-
bson.binary.
JAVA_LEGACY
= 5¶ An alias for UuidRepresentation.JAVA_LEGACY.
Changed in version 3.6: BSON binary subtype 4 is decoded using RFC-4122 byte order.
New in version 2.3.
-
bson.binary.
CSHARP_LEGACY
= 6¶ An alias for UuidRepresentation.CSHARP_LEGACY.
Changed in version 3.6: BSON binary subtype 4 is decoded using RFC-4122 byte order.
New in version 2.3.
-
bson.binary.
MD5_SUBTYPE
= 5¶ BSON binary subtype for an MD5 hash.
-
bson.binary.
USER_DEFINED_SUBTYPE
= 128¶ BSON binary subtype for any user defined structure.
-
class
bson.binary.
UuidRepresentation
¶ -
CSHARP_LEGACY
= 6¶ The C#/.net legacy UUID representation.
uuid.UUID
instances will automatically be encoded to and decoded from BSON binary subtype OLD_UUID_SUBTYPE, using the C# driver’s legacy byte order.
See CSHARP_LEGACY for details.
New in version 3.11.
-
JAVA_LEGACY
= 5¶ The Java legacy UUID representation.
uuid.UUID
instances will automatically be encoded to and decoded from BSON binary subtype OLD_UUID_SUBTYPE, using the Java driver’s legacy byte order.
See JAVA_LEGACY for details.
New in version 3.11.
-
PYTHON_LEGACY
= 3¶ The Python legacy UUID representation.
uuid.UUID
instances will automatically be encoded to and decoded from BSON binary, using RFC-4122 byte order with binary subtype OLD_UUID_SUBTYPE.
See PYTHON_LEGACY for details.
New in version 3.11.
-
STANDARD
= 4¶ The standard UUID representation.
uuid.UUID
instances will automatically be encoded to and decoded from BSON binary, using RFC-4122 byte order with binary subtype UUID_SUBTYPE.
See STANDARD for details.
New in version 3.11.
-
UNSPECIFIED
= 0¶ An unspecified UUID representation.
When configured, uuid.UUID instances will not be automatically encoded to or decoded from Binary. When encoding a uuid.UUID instance, an error will be raised. To encode a uuid.UUID instance with this configuration, it must be wrapped in the Binary class by the application code. When decoding a BSON binary field with a UUID subtype, a Binary instance will be returned instead of a uuid.UUID instance.
See UNSPECIFIED for details.
New in version 3.11.
-
-
class
bson.binary.
Binary
(data, subtype=BINARY_SUBTYPE)¶ Bases:
bytes
Representation of BSON binary data.
This is necessary because we want to represent Python strings as the BSON string type. We need to wrap binary data so we can tell the difference between what should be considered binary data and what should be considered a string when we encode to BSON.
Raises TypeError if data is not an instance of
bytes
(str
in python 2) or subtype is not an instance ofint
. Raises ValueError if subtype is not in [0, 256).Note
In python 3 instances of Binary with subtype 0 will be decoded directly to
bytes
.Parameters: - data: the binary data to represent. Can be any bytes-like type that implements the buffer protocol.
- subtype (optional): the binary subtype to use
Changed in version 3.9: Support any bytes-like type that implements the buffer protocol.
-
as_uuid
(uuid_representation=4)¶ Create a Python UUID from this BSON Binary object.
Decodes this binary object as a native
uuid.UUID
instance with the provideduuid_representation
.Raises
ValueError
if thisBinary
instance does not contain a UUID.Parameters: - uuid_representation: A member of
UuidRepresentation
. Default:STANDARD
. See Handling UUID Data for details.
New in version 3.11.
- uuid_representation: A member of
-
classmethod
from_uuid
(uuid, uuid_representation=4)¶ Create a BSON Binary object from a Python UUID.
Creates a
Binary
object from auuid.UUID
instance. Assumes that the nativeuuid.UUID
instance uses the byte-order implied by the provideduuid_representation
.Raises
TypeError
if uuid is not an instance ofUUID
.Parameters: - uuid: A
uuid.UUID
instance. - uuid_representation: A member of
UuidRepresentation
. Default:STANDARD
. See Handling UUID Data for details.
New in version 3.11.
- uuid: A
-
subtype
¶ Subtype of this binary data.
-
class
bson.binary.
UUIDLegacy
(obj)¶ Bases:
bson.binary.Binary
DEPRECATED - UUID wrapper to support working with UUIDs stored as PYTHON_LEGACY.
Note
This class has been deprecated and will be removed in PyMongo 4.0. Use from_uuid() and as_uuid() with the appropriate UuidRepresentation to handle legacy-formatted UUIDs instead:
from bson import Binary, UUIDLegacy, UuidRepresentation
import uuid

my_uuid = uuid.uuid4()
legacy_uuid = UUIDLegacy(my_uuid)
binary_uuid = Binary.from_uuid(
    my_uuid, UuidRepresentation.PYTHON_LEGACY)
assert legacy_uuid == binary_uuid
assert legacy_uuid.uuid == binary_uuid.as_uuid(
    UuidRepresentation.PYTHON_LEGACY)
>>> import uuid
>>> from bson.binary import Binary, UUIDLegacy, STANDARD
>>> from bson.codec_options import CodecOptions
>>> my_uuid = uuid.uuid4()
>>> coll = db.get_collection('test',
...                          CodecOptions(uuid_representation=STANDARD))
>>> coll.insert_one({'uuid': Binary(my_uuid.bytes, 3)}).inserted_id
ObjectId('...')
>>> coll.count_documents({'uuid': my_uuid})
0
>>> coll.count_documents({'uuid': UUIDLegacy(my_uuid)})
1
>>> coll.find({'uuid': UUIDLegacy(my_uuid)})[0]['uuid']
UUID('...')
>>>
>>> # Convert from subtype 3 to subtype 4
>>> doc = coll.find_one({'uuid': UUIDLegacy(my_uuid)})
>>> coll.replace_one({"_id": doc["_id"]}, doc).matched_count
1
>>> coll.count_documents({'uuid': UUIDLegacy(my_uuid)})
0
>>> coll.count_documents({'uuid': {'$in': [UUIDLegacy(my_uuid), my_uuid]}})
1
>>> coll.find_one({'uuid': my_uuid})['uuid']
UUID('...')
Raises
TypeError
if obj is not an instance ofUUID
.Parameters: - obj: An instance of
UUID
.
Changed in version 3.11: Deprecated. The same functionality can be replicated using the from_uuid() and as_uuid() methods with PYTHON_LEGACY.
New in version 2.1.
-
uuid
¶ UUID instance wrapped by this UUIDLegacy instance.
- obj: An instance of
code
– Tools for representing JavaScript code¶
Tools for representing JavaScript code in BSON.
-
class
bson.code.
Code
(code, scope=None, **kwargs)¶ Bases:
str
BSON’s JavaScript code type.
Raises
TypeError
if code is not an instance ofbasestring
(str
in python 3) or scope is notNone
or an instance ofdict
.Scope variables can be set by passing a dictionary as the scope argument or by using keyword arguments. If a variable is set as a keyword argument it will override any setting for that variable in the scope dictionary.
Parameters: - code: A string containing JavaScript code to be evaluated or another
instance of Code. In the latter case, the scope of code becomes this
Code’s
scope
. - scope (optional): dictionary representing the scope in which
code should be evaluated - a mapping from identifiers (as
strings) to values. Defaults to
None
. This is applied after any scope associated with a given code above. - **kwargs (optional): scope variables can also be passed as keyword arguments. These are applied after scope and code.
Changed in version 3.4: The default value for
scope
isNone
instead of{}
.-
scope
¶ Scope dictionary for this instance or
None
.
codec_options
– Tools for specifying BSON codec options¶
Tools for specifying BSON codec options.
-
class
bson.codec_options.
CodecOptions
¶ Encapsulates options used encoding and / or decoding BSON.
The document_class option is used to define a custom type for use decoding BSON documents. Access to the underlying raw BSON bytes for a document is available using the
RawBSONDocument
type:
>>> from bson.raw_bson import RawBSONDocument
>>> from bson.codec_options import CodecOptions
>>> codec_options = CodecOptions(document_class=RawBSONDocument)
>>> coll = db.get_collection('test', codec_options=codec_options)
>>> doc = coll.find_one()
>>> doc.raw
'\x16\x00\x00\x00\x07_id\x00[0\x165\x91\x10\xea\x14\xe8\xc5\x8b\x93\x00'
The document class can be any type that inherits from
MutableMapping
:
>>> class AttributeDict(dict):
...     # A dict that supports attribute access.
...     def __getattr__(self, key):
...         return self[key]
...     def __setattr__(self, key, value):
...         self[key] = value
...
>>> codec_options = CodecOptions(document_class=AttributeDict)
>>> coll = db.get_collection('test', codec_options=codec_options)
>>> doc = coll.find_one()
>>> doc._id
ObjectId('5b3016359110ea14e8c58b93')
See Datetimes and Timezones for examples using the tz_aware and tzinfo options.
See UUIDLegacy for examples using the uuid_representation option.
Parameters:
- document_class: BSON documents returned in queries will be decoded to an instance of this class. Must be a subclass of MutableMapping. Defaults to dict.
- tz_aware: If True, BSON datetimes will be decoded to timezone aware instances of datetime. Otherwise they will be naive. Defaults to False.
- uuid_representation: The BSON representation to use when encoding and decoding instances of UUID. Defaults to PYTHON_LEGACY. New applications should consider setting this to STANDARD for cross language compatibility. See Handling UUID Data for details.
- unicode_decode_error_handler: The error handler to apply when a Unicode-related error occurs during BSON decoding that would otherwise raise UnicodeDecodeError. Valid options include 'strict', 'replace', and 'ignore'. Defaults to 'strict'.
- tzinfo: A tzinfo subclass that specifies the timezone to/from which datetime objects should be encoded/decoded.
- type_registry: Instance of TypeRegistry used to customize encoding and decoding behavior.
New in version 3.8: type_registry attribute.
Warning
Care must be taken when changing unicode_decode_error_handler from its default value (‘strict’). The ‘replace’ and ‘ignore’ modes should not be used when documents retrieved from the server will be modified in the client application and stored back to the server.
- with_options(**kwargs)¶
Make a copy of this CodecOptions, overriding some options:
>>> from bson.codec_options import DEFAULT_CODEC_OPTIONS
>>> DEFAULT_CODEC_OPTIONS.tz_aware
False
>>> options = DEFAULT_CODEC_OPTIONS.with_options(tz_aware=True)
>>> options.tz_aware
True
New in version 3.5.
- class bson.codec_options.TypeCodec¶
Base class for defining type codec classes which describe how a custom type can be transformed to/from one of the types bson can already encode/decode.
Codec classes must implement the python_type attribute and the transform_python method to support encoding, as well as the bson_type attribute and the transform_bson method to support decoding.
See The TypeCodec Class documentation for an example.
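As a sketch of that contract, a codec mapping Python's decimal.Decimal to BSON Decimal128 might look like this (the DecimalCodec name is illustrative, not part of the API):
>>> from decimal import Decimal
>>> from bson.decimal128 import Decimal128
>>> from bson.codec_options import TypeCodec
>>> class DecimalCodec(TypeCodec):
...     python_type = Decimal    # the Python type to encode
...     bson_type = Decimal128   # the BSON type to decode
...     def transform_python(self, value):
...         # Called when encoding: wrap Decimal as Decimal128.
...         return Decimal128(value)
...     def transform_bson(self, value):
...         # Called when decoding: unwrap Decimal128 to Decimal.
...         return value.to_decimal()
...
>>> decimal_codec = DecimalCodec()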
- class bson.codec_options.TypeDecoder¶
Base class for defining type codec classes which describe how a BSON type can be transformed to a custom type.
Codec classes must implement the bson_type attribute and the transform_bson method to support decoding.
See The TypeCodec Class documentation for an example.
-
bson_type
¶ The BSON type to be converted into our own type.
-
transform_bson
(value)¶ Convert the given BSON value into our own type.
-
- class bson.codec_options.TypeEncoder¶
Base class for defining type codec classes which describe how a custom type can be transformed to one of the types BSON understands.
Codec classes must implement the python_type attribute and the transform_python method to support encoding.
See The TypeCodec Class documentation for an example.
-
python_type
¶ The Python type to be converted into something serializable.
-
transform_python
(value)¶ Convert the given Python object into something serializable.
-
- class bson.codec_options.TypeRegistry(type_codecs=None, fallback_encoder=None)¶
Encapsulates type codecs used in encoding and/or decoding BSON, as well as the fallback encoder. Type registries cannot be modified after instantiation.
TypeRegistry can be initialized with an iterable of type codecs, and a callable for the fallback encoder:
>>> from bson.codec_options import TypeRegistry
>>> type_registry = TypeRegistry([Codec1, Codec2, Codec3, ...],
...                              fallback_encoder)
See The TypeRegistry Class documentation for an example.
Parameters:
- type_codecs (optional): iterable of type codec instances. If type_codecs contains multiple codecs that transform a single python or BSON type, the transformation specified by the type codec occurring last prevails. A TypeError will be raised if one or more type codecs modify the encoding behavior of a built-in bson type.
- fallback_encoder (optional): callable that accepts a single, unencodable python value and transforms it into a type that bson can encode. See The fallback_encoder Callable documentation for an example.
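A minimal sketch of a fallback encoder wired into CodecOptions (the Decimal-to-Decimal128 mapping is illustrative):
>>> from decimal import Decimal
>>> from bson import encode
>>> from bson.decimal128 import Decimal128
>>> from bson.codec_options import TypeRegistry, CodecOptions
>>> def fallback_encoder(value):
...     # Invoked only for values bson cannot encode natively.
...     if isinstance(value, Decimal):
...         return Decimal128(value)
...     return value
...
>>> type_registry = TypeRegistry(fallback_encoder=fallback_encoder)
>>> options = CodecOptions(type_registry=type_registry)
>>> data = encode({'total': Decimal('9.99')}, codec_options=options)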
dbref – Tools for manipulating DBRefs (references to documents stored in MongoDB)¶
Tools for manipulating DBRefs (references to MongoDB documents).
- class bson.dbref.DBRef(collection, id, database=None, _extra={}, **kwargs)¶
Initialize a new DBRef.
Raises TypeError if collection or database is not an instance of basestring (str in python 3). database is optional and allows references to documents to work across databases. Any additional keyword arguments will create additional fields in the resultant embedded document.
Parameters:
- collection: name of the collection the document is stored in
- id: the value of the document's "_id" field
- database (optional): name of the database to reference
- **kwargs (optional): additional keyword arguments will create additional, custom fields
-
as_doc
()¶ Get the SON document representation of this DBRef.
Generally not needed by application developers.
-
collection
¶ Get the name of this DBRef’s collection as unicode.
-
database
¶ Get the name of this DBRef’s database.
Returns None if this DBRef doesn’t specify a database.
-
id
¶ Get this DBRef’s _id.
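A short sketch of constructing a DBRef and inspecting its embedded-document form (the collection and database names are illustrative):
>>> from bson.dbref import DBRef
>>> from bson.objectid import ObjectId
>>> ref = DBRef('users', ObjectId('0123456789ab0123456789ab'), database='app')
>>> ref.as_doc()
SON([('$ref', 'users'), ('$id', ObjectId('0123456789ab0123456789ab')), ('$db', 'app')])
>>> ref.collection, ref.database
('users', 'app')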
decimal128 – Support for BSON Decimal128¶
Tools for working with the BSON decimal128 type.
New in version 3.4.
Note
The Decimal128 BSON type requires MongoDB 3.4+.
- class bson.decimal128.Decimal128(value)¶
BSON Decimal128 type:
>>> Decimal128(Decimal("0.0005"))
Decimal128('0.0005')
>>> Decimal128("0.0005")
Decimal128('0.0005')
>>> Decimal128((3474527112516337664, 5))
Decimal128('0.0005')
Parameters:
- value: An instance of decimal.Decimal, string, or tuple of (high bits, low bits) from Binary Integer Decimal (BID) format.
Note
Decimal128 uses an instance of decimal.Context configured for IEEE-754 Decimal128 when validating parameters. Signals like decimal.InvalidOperation, decimal.Inexact, and decimal.Overflow are trapped and raised as exceptions:
>>> Decimal128(".13.1")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
...
decimal.InvalidOperation: [<class 'decimal.ConversionSyntax'>]
>>>
>>> Decimal128("1E-6177")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
...
decimal.Inexact: [<class 'decimal.Inexact'>]
>>>
>>> Decimal128("1E6145")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
...
decimal.Overflow: [<class 'decimal.Overflow'>, <class 'decimal.Rounded'>]
To ensure the result of a calculation can always be stored as BSON Decimal128 use the context returned by create_decimal128_context():
>>> import decimal
>>> decimal128_ctx = create_decimal128_context()
>>> with decimal.localcontext(decimal128_ctx) as ctx:
...     Decimal128(ctx.create_decimal(".13.3"))
...
Decimal128('NaN')
>>>
>>> with decimal.localcontext(decimal128_ctx) as ctx:
...     Decimal128(ctx.create_decimal("1E-6177"))
...
Decimal128('0E-6176')
>>>
>>> with decimal.localcontext(decimal128_ctx) as ctx:
...     Decimal128(ctx.create_decimal("1E6145"))
...
Decimal128('Infinity')
To match the behavior of MongoDB's Decimal128 implementation, str(Decimal(value)) may not match str(Decimal128(value)) for NaN values:
>>> Decimal128(Decimal('NaN'))
Decimal128('NaN')
>>> Decimal128(Decimal('-NaN'))
Decimal128('NaN')
>>> Decimal128(Decimal('sNaN'))
Decimal128('NaN')
>>> Decimal128(Decimal('-sNaN'))
Decimal128('NaN')
However, to_decimal() will return the exact value:
>>> Decimal128(Decimal('NaN')).to_decimal()
Decimal('NaN')
>>> Decimal128(Decimal('-NaN')).to_decimal()
Decimal('-NaN')
>>> Decimal128(Decimal('sNaN')).to_decimal()
Decimal('sNaN')
>>> Decimal128(Decimal('-sNaN')).to_decimal()
Decimal('-sNaN')
Two instances of Decimal128 compare equal if their Binary Integer Decimal encodings are equal:
>>> Decimal128('NaN') == Decimal128('NaN')
True
>>> Decimal128('NaN').bid == Decimal128('NaN').bid
True
This differs from decimal.Decimal comparisons for NaN:
>>> Decimal('NaN') == Decimal('NaN')
False
-
bid
¶ The Binary Integer Decimal (BID) encoding of this instance.
-
classmethod from_bid(value)¶
Create an instance of Decimal128 from a Binary Integer Decimal string.
Parameters:
- value: 16 byte string (128-bit IEEE 754-2008 decimal floating point in Binary Integer Decimal (BID) format).
- to_decimal()¶
Returns an instance of decimal.Decimal for this Decimal128.
- bson.decimal128.create_decimal128_context()¶
Returns an instance of decimal.Context appropriate for working with IEEE-754 128-bit decimal floating point values.
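Putting the pieces together, a sketch of storing and reading back a monetary value (assumes an existing db handle, as in the other examples):
>>> import decimal
>>> from bson.decimal128 import Decimal128, create_decimal128_context
>>> ctx = create_decimal128_context()
>>> with decimal.localcontext(ctx) as dec_ctx:
...     price = Decimal128(dec_ctx.create_decimal("9.99"))
...
>>> db.prices.insert_one({'price': price}).inserted_id
ObjectId('...')
>>> db.prices.find_one()['price'].to_decimal()
Decimal('9.99')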
errors – Exceptions raised by the bson package¶
Exceptions raised by the BSON package.
-
exception
bson.errors.
BSONError
¶ Base class for all BSON exceptions.
-
exception
bson.errors.
InvalidBSON
¶ Raised when trying to create a BSON object from invalid data.
-
exception
bson.errors.
InvalidDocument
¶ Raised when trying to create a BSON object from an invalid document.
-
exception
bson.errors.
InvalidId
¶ Raised when trying to create an ObjectId from invalid data.
-
exception
bson.errors.
InvalidStringData
¶ Raised when trying to encode a string containing non-UTF8 data.
int64 – Tools for representing BSON int64¶
New in version 3.0.
A BSON wrapper for long (int in Python 3).
- class bson.int64.Int64¶
Representation of the BSON int64 type.
This is necessary because every integral number is an int in Python 3. Small integral numbers are encoded to BSON int32 by default, but Int64 numbers will always be encoded to BSON int64.
Parameters:
- value: the numeric value to represent
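A brief sketch of the difference, using the top-level bson.encode()/bson.decode() helpers:
>>> from bson import decode, encode
>>> from bson.int64 import Int64
>>> # Wrapping a small int in Int64 forces the BSON int64 type on the
>>> # wire; it decodes back as Int64 (a subclass of int).
>>> doc = decode(encode({'n': Int64(1)}))
>>> type(doc['n'])
<class 'bson.int64.Int64'>
>>> doc['n'] == 1
True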
json_util – Tools for using Python's json module with BSON documents¶
Tools for using Python's json module with BSON documents.
This module provides two helper methods dumps and loads that wrap the
native json
methods and provide explicit BSON conversion to and from
JSON. JSONOptions
provides a way to control how JSON
is emitted and parsed, with the default being the legacy PyMongo format.
json_util
can also generate Canonical or Relaxed Extended JSON
when CANONICAL_JSON_OPTIONS
or RELAXED_JSON_OPTIONS
is
provided, respectively.
Example usage (deserialization):
>>> from bson.json_util import loads
>>> loads('[{"foo": [1, 2]}, {"bar": {"hello": "world"}}, {"code": {"$scope": {}, "$code": "function x() { return 1; }"}}, {"bin": {"$type": "80", "$binary": "AQIDBA=="}}]')
[{u'foo': [1, 2]}, {u'bar': {u'hello': u'world'}}, {u'code': Code('function x() { return 1; }', {})}, {u'bin': Binary('...', 128)}]
Example usage (serialization):
>>> from bson import Binary, Code
>>> from bson.json_util import dumps
>>> dumps([{'foo': [1, 2]},
... {'bar': {'hello': 'world'}},
... {'code': Code("function x() { return 1; }", {})},
... {'bin': Binary(b"")}])
'[{"foo": [1, 2]}, {"bar": {"hello": "world"}}, {"code": {"$code": "function x() { return 1; }", "$scope": {}}}, {"bin": {"$binary": "AQIDBA==", "$type": "00"}}]'
Example usage (with CANONICAL_JSON_OPTIONS
):
>>> from bson import Binary, Code
>>> from bson.json_util import dumps, CANONICAL_JSON_OPTIONS
>>> dumps([{'foo': [1, 2]},
... {'bar': {'hello': 'world'}},
... {'code': Code("function x() { return 1; }")},
... {'bin': Binary(b"")}],
... json_options=CANONICAL_JSON_OPTIONS)
'[{"foo": [{"$numberInt": "1"}, {"$numberInt": "2"}]}, {"bar": {"hello": "world"}}, {"code": {"$code": "function x() { return 1; }"}}, {"bin": {"$binary": {"base64": "AQIDBA==", "subType": "00"}}}]'
Example usage (with RELAXED_JSON_OPTIONS
):
>>> from bson import Binary, Code
>>> from bson.json_util import dumps, RELAXED_JSON_OPTIONS
>>> dumps([{'foo': [1, 2]},
... {'bar': {'hello': 'world'}},
... {'code': Code("function x() { return 1; }")},
... {'bin': Binary(b"")}],
... json_options=RELAXED_JSON_OPTIONS)
'[{"foo": [1, 2]}, {"bar": {"hello": "world"}}, {"code": {"$code": "function x() { return 1; }"}}, {"bin": {"$binary": {"base64": "AQIDBA==", "subType": "00"}}}]'
Alternatively, you can manually pass the default to json.dumps(). It won't handle Binary and Code instances (as they are extended strings, you can't provide custom defaults), but it will be faster as there is less recursion.
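For example, a minimal sketch (legacy output format shown):
>>> import json
>>> from bson import json_util
>>> from bson.objectid import ObjectId
>>> doc = {'_id': ObjectId('0123456789ab0123456789ab')}
>>> json.dumps(doc, default=json_util.default)
'{"_id": {"$oid": "0123456789ab0123456789ab"}}'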
Note
If your application does not need the flexibility offered by
JSONOptions
and spends a large amount of time in the json_util
module, look to
python-bsonjs for a nice
performance improvement. python-bsonjs is a fast BSON to MongoDB
Extended JSON converter for Python built on top of
libbson. python-bsonjs works best
with PyMongo when using RawBSONDocument
.
Changed in version 2.8: The output format for Timestamp
has changed from
‘{“t”: <int>, “i”: <int>}’ to ‘{“$timestamp”: {“t”: <int>, “i”: <int>}}’.
This new format will be decoded to an instance of
Timestamp
. The old format will continue to be
decoded to a python dict as before. Encoding to the old format is no longer
supported as it was never correct and loses type information.
Added support for $numberLong and $undefined - new in MongoDB 2.6 - and
parsing $date in ISO-8601 format.
Changed in version 2.7: Preserves order when rendering SON, Timestamp, Code, Binary, and DBRef instances.
Changed in version 2.3: Added dumps and loads helpers to automatically handle conversion to and from JSON and support Binary and Code.
-
class
bson.json_util.
DatetimeRepresentation
¶ -
LEGACY
= 0¶ Legacy MongoDB Extended JSON datetime representation.
datetime.datetime instances will be encoded to JSON in the format {"$date": <dateAsMilliseconds>}, where dateAsMilliseconds is a 64-bit signed integer giving the number of milliseconds since the Unix epoch UTC. This was the default encoding before PyMongo version 3.4.
New in version 3.4.
-
NUMBERLONG
= 1¶ NumberLong datetime representation.
datetime.datetime instances will be encoded to JSON in the format {"$date": {"$numberLong": "<dateAsMilliseconds>"}}, where dateAsMilliseconds is the string representation of a 64-bit signed integer giving the number of milliseconds since the Unix epoch UTC.
New in version 3.4.
-
ISO8601
= 2¶ ISO-8601 datetime representation.
datetime.datetime instances greater than or equal to the Unix epoch UTC will be encoded to JSON in the format {"$date": "<ISO-8601>"}. datetime.datetime instances before the Unix epoch UTC will be encoded as if the datetime representation is NUMBERLONG.
New in version 3.4.
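A short sketch of selecting a representation through JSONOptions:
>>> import datetime
>>> from bson.json_util import dumps, JSONOptions, DatetimeRepresentation
>>> opts = JSONOptions(datetime_representation=DatetimeRepresentation.ISO8601)
>>> dumps({'d': datetime.datetime(2010, 1, 1)}, json_options=opts)
'{"d": {"$date": "2010-01-01T00:00:00Z"}}'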
-
-
class
bson.json_util.
JSONMode
¶ -
LEGACY
= 0¶ Legacy Extended JSON representation.
In this mode, dumps() produces PyMongo's legacy non-standard JSON output. Consider using RELAXED or CANONICAL instead.
New in version 3.5.
-
RELAXED
= 1¶ Relaxed Extended JSON representation.
In this mode, dumps() produces Relaxed Extended JSON, a mostly JSON-like format. Consider using this for things like a web API, where one is sending a document (or a projection of a document) that only uses ordinary JSON type primitives. In particular, the int, Int64, and float numeric types are represented in the native JSON number format. This output is also the most human readable and is useful for debugging and documentation.
See also
The specification for Relaxed Extended JSON.
New in version 3.5.
-
CANONICAL
= 2¶ Canonical Extended JSON representation.
In this mode, dumps() produces Canonical Extended JSON, a type preserving format. Consider using this for things like testing, where one has to precisely specify expected types in JSON. In particular, the int, Int64, and float numeric types are encoded with type wrappers.
See also
The specification for Canonical Extended JSON.
New in version 3.5.
-
- class bson.json_util.JSONOptions¶
Encapsulates JSON options for dumps() and loads().
Parameters:
- strict_number_long: If True, Int64 objects are encoded to MongoDB Extended JSON's Strict mode type NumberLong, i.e. '{"$numberLong": "<number>"}'. Otherwise they will be encoded as an int. Defaults to False.
- datetime_representation: The representation to use when encoding instances of datetime.datetime. Defaults to LEGACY.
- strict_uuid: If True, uuid.UUID objects are encoded to MongoDB Extended JSON's Strict mode type Binary. Otherwise they will be encoded as '{"$uuid": "<hex>"}'. Defaults to False.
- json_mode: The JSONMode to use when encoding BSON types to Extended JSON. Defaults to LEGACY.
- document_class: BSON documents returned by loads() will be decoded to an instance of this class. Must be a subclass of collections.MutableMapping. Defaults to dict.
- uuid_representation: The UuidRepresentation to use when encoding and decoding instances of uuid.UUID. Defaults to PYTHON_LEGACY.
- tz_aware: If True, MongoDB Extended JSON's Strict mode type Date will be decoded to timezone aware instances of datetime.datetime. Otherwise they will be naive. Defaults to True.
- tzinfo: A datetime.tzinfo subclass that specifies the timezone from which datetime objects should be decoded. Defaults to utc.
- args: arguments to CodecOptions
- kwargs: arguments to CodecOptions
See also
The specification for Relaxed and Canonical Extended JSON.
New in version 3.4.
Changed in version 3.5: Accepts the optional parameter json_mode.
- with_options(**kwargs)¶
Make a copy of this JSONOptions, overriding some options:
>>> from bson.json_util import CANONICAL_JSON_OPTIONS
>>> CANONICAL_JSON_OPTIONS.tz_aware
True
>>> json_options = CANONICAL_JSON_OPTIONS.with_options(tz_aware=False)
>>> json_options.tz_aware
False
New in version 3.12.
-
bson.json_util.
LEGACY_JSON_OPTIONS
= JSONOptions(strict_number_long=False, datetime_representation=0, strict_uuid=False, json_mode=0, document_class=dict, tz_aware=True, uuid_representation=UuidRepresentation.PYTHON_LEGACY, unicode_decode_error_handler='strict', tzinfo=<bson.tz_util.FixedOffset object>, type_registry=TypeRegistry(type_codecs=[], fallback_encoder=None))¶ JSONOptions
for encoding to PyMongo's legacy JSON format.
See also
The documentation for bson.json_util.JSONMode.LEGACY.
New in version 3.5.
-
bson.json_util.
DEFAULT_JSON_OPTIONS
= JSONOptions(strict_number_long=False, datetime_representation=0, strict_uuid=False, json_mode=0, document_class=dict, tz_aware=True, uuid_representation=UuidRepresentation.PYTHON_LEGACY, unicode_decode_error_handler='strict', tzinfo=<bson.tz_util.FixedOffset object>, type_registry=TypeRegistry(type_codecs=[], fallback_encoder=None))¶ The default
JSONOptions
for JSON encoding/decoding.
The same as LEGACY_JSON_OPTIONS. This will change to RELAXED_JSON_OPTIONS in a future release.
New in version 3.4.
-
bson.json_util.
CANONICAL_JSON_OPTIONS
= JSONOptions(strict_number_long=True, datetime_representation=1, strict_uuid=True, json_mode=2, document_class=dict, tz_aware=True, uuid_representation=UuidRepresentation.PYTHON_LEGACY, unicode_decode_error_handler='strict', tzinfo=<bson.tz_util.FixedOffset object>, type_registry=TypeRegistry(type_codecs=[], fallback_encoder=None))¶ JSONOptions
for Canonical Extended JSON.
See also
The documentation for bson.json_util.JSONMode.CANONICAL.
New in version 3.5.
-
bson.json_util.
RELAXED_JSON_OPTIONS
= JSONOptions(strict_number_long=False, datetime_representation=2, strict_uuid=True, json_mode=1, document_class=dict, tz_aware=True, uuid_representation=UuidRepresentation.PYTHON_LEGACY, unicode_decode_error_handler='strict', tzinfo=<bson.tz_util.FixedOffset object>, type_registry=TypeRegistry(type_codecs=[], fallback_encoder=None))¶ JSONOptions
for Relaxed Extended JSON.
See also
The documentation for bson.json_util.JSONMode.RELAXED.
New in version 3.5.
-
bson.json_util.
STRICT_JSON_OPTIONS
= JSONOptions(strict_number_long=True, datetime_representation=2, strict_uuid=True, json_mode=0, document_class=dict, tz_aware=True, uuid_representation=UuidRepresentation.PYTHON_LEGACY, unicode_decode_error_handler='strict', tzinfo=<bson.tz_util.FixedOffset object>, type_registry=TypeRegistry(type_codecs=[], fallback_encoder=None))¶ DEPRECATED -
JSONOptions for MongoDB Extended JSON's Strict mode encoding.
New in version 3.4.
Changed in version 3.5: Deprecated. Use RELAXED_JSON_OPTIONS or CANONICAL_JSON_OPTIONS instead.
-
bson.json_util.dumps(obj, *args, **kwargs)¶
Helper function that wraps json.dumps().
Recursive function that handles all BSON types including Binary and Code.
Parameters:
- json_options: A JSONOptions instance used to modify the encoding of MongoDB Extended JSON types. Defaults to DEFAULT_JSON_OPTIONS.
Changed in version 3.4: Accepts optional parameter json_options. See JSONOptions.
Changed in version 2.7: Preserves order when rendering SON, Timestamp, Code, Binary, and DBRef instances.
-
bson.json_util.loads(s, *args, **kwargs)¶
Helper function that wraps json.loads().
Automatically passes the object_hook for BSON type conversion.
Raises TypeError, ValueError, KeyError, or InvalidId on invalid MongoDB Extended JSON.
Parameters:
- json_options: A JSONOptions instance used to modify the decoding of MongoDB Extended JSON types. Defaults to DEFAULT_JSON_OPTIONS.
Changed in version 3.5: Parses Relaxed and Canonical Extended JSON as well as PyMongo's legacy format. Now raises TypeError or ValueError when parsing JSON type wrappers with values of the wrong type or any extra keys.
Changed in version 3.4: Accepts optional parameter json_options. See JSONOptions.
-
bson.json_util.
object_pairs_hook
(pairs, json_options=JSONOptions(strict_number_long=False, datetime_representation=0, strict_uuid=False, json_mode=0, document_class=dict, tz_aware=True, uuid_representation=UuidRepresentation.PYTHON_LEGACY, unicode_decode_error_handler='strict', tzinfo=<bson.tz_util.FixedOffset object>, type_registry=TypeRegistry(type_codecs=[], fallback_encoder=None)))¶
-
bson.json_util.
object_hook
(dct, json_options=JSONOptions(strict_number_long=False, datetime_representation=0, strict_uuid=False, json_mode=0, document_class=dict, tz_aware=True, uuid_representation=UuidRepresentation.PYTHON_LEGACY, unicode_decode_error_handler='strict', tzinfo=<bson.tz_util.FixedOffset object>, type_registry=TypeRegistry(type_codecs=[], fallback_encoder=None)))¶
-
bson.json_util.
default
(obj, json_options=JSONOptions(strict_number_long=False, datetime_representation=0, strict_uuid=False, json_mode=0, document_class=dict, tz_aware=True, uuid_representation=UuidRepresentation.PYTHON_LEGACY, unicode_decode_error_handler='strict', tzinfo=<bson.tz_util.FixedOffset object>, type_registry=TypeRegistry(type_codecs=[], fallback_encoder=None)))¶
max_key – Representation for the MongoDB internal MaxKey type¶
Representation for the MongoDB internal MaxKey type.
-
class
bson.max_key.
MaxKey
¶ MongoDB internal MaxKey type.
Changed in version 2.7:
MaxKey
now implements comparison operators.
min_key – Representation for the MongoDB internal MinKey type¶
Representation for the MongoDB internal MinKey type.
-
class
bson.min_key.
MinKey
¶ MongoDB internal MinKey type.
Changed in version 2.7:
MinKey
now implements comparison operators.
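A quick sketch of those comparison operators:
>>> from bson.min_key import MinKey
>>> from bson.max_key import MaxKey
>>> # MinKey sorts before, and MaxKey after, every other BSON value.
>>> MinKey() < 0 < MaxKey()
True
>>> MaxKey() > 'any string'
True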
objectid – Tools for working with MongoDB ObjectIds¶
Tools for working with MongoDB ObjectIds.
-
class
bson.objectid.
ObjectId
(oid=None)¶ Initialize a new ObjectId.
An ObjectId is a 12-byte unique identifier consisting of:
- a 4-byte value representing the seconds since the Unix epoch,
- a 5-byte random value,
- a 3-byte counter, starting with a random value.
By default, ObjectId() creates a new unique identifier. The optional parameter oid can be an ObjectId, or any 12 bytes or, in Python 2, any 12-character str.
For example, the 12 bytes b'foo-bar-quux' do not follow the ObjectId specification but they are acceptable input:
>>> ObjectId(b'foo-bar-quux')
ObjectId('666f6f2d6261722d71757578')
oid can also be a unicode or str of 24 hex digits:
>>> ObjectId('0123456789ab0123456789ab')
ObjectId('0123456789ab0123456789ab')
>>>
>>> # A u-prefixed unicode literal:
>>> ObjectId(u'0123456789ab0123456789ab')
ObjectId('0123456789ab0123456789ab')
Raises InvalidId if oid is not 12 bytes nor 24 hex digits, or TypeError if oid is not an accepted type.
Parameters:
- oid (optional): a valid ObjectId.
Changed in version 3.8: ObjectId now implements the ObjectID specification version 0.2.
- str(o)
Get a hex encoded version of ObjectId o.
The following property always holds:
>>> o = ObjectId()
>>> o == ObjectId(str(o))
True
This representation is useful for URLs or other places where o.binary is inappropriate.
-
binary
¶ 12-byte binary representation of this ObjectId.
-
classmethod
from_datetime
(generation_time)¶ Create a dummy ObjectId instance with a specific generation time.
This method is useful for doing range queries on a field containing
ObjectId
instances.Warning
It is not safe to insert a document containing an ObjectId generated using this method. This method deliberately eliminates the uniqueness guarantee that ObjectIds generally provide. ObjectIds generated with this method should be used exclusively in queries.
generation_time will be converted to UTC. Naive datetime instances will be treated as though they already contain UTC.
An example using this helper to get documents where "_id" was generated before January 1, 2010 would be:
>>> gen_time = datetime.datetime(2010, 1, 1)
>>> dummy_id = ObjectId.from_datetime(gen_time)
>>> result = collection.find({"_id": {"$lt": dummy_id}})
Parameters:
- generation_time: datetime to be used as the generation time for the resulting ObjectId.
-
generation_time
¶ A
datetime.datetime
instance representing the time of generation for thisObjectId
.The
datetime.datetime
is timezone aware, and represents the generation time in UTC. It is precise to the second.
-
classmethod
is_valid
(oid)¶ Checks whether an oid string is valid.
Parameters: - oid: the object id to validate
New in version 2.3.
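For example:
>>> from bson.objectid import ObjectId
>>> ObjectId.is_valid('0123456789ab0123456789ab')
True
>>> ObjectId.is_valid('not-an-object-id')
False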
raw_bson – Tools for representing raw BSON documents¶
Tools for representing raw BSON documents.
-
bson.raw_bson.
DEFAULT_RAW_BSON_OPTIONS
= CodecOptions(document_class=<class 'bson.raw_bson.RawBSONDocument'>, tz_aware=False, uuid_representation=UuidRepresentation.PYTHON_LEGACY, unicode_decode_error_handler='strict', tzinfo=None, type_registry=TypeRegistry(type_codecs=[], fallback_encoder=None))¶ The default
CodecOptions
forRawBSONDocument
.
-
class
bson.raw_bson.
RawBSONDocument
(bson_bytes, codec_options=None)¶
Create a new RawBSONDocument.
RawBSONDocument is a representation of a BSON document that provides access to the underlying raw BSON bytes. Only when a field is accessed or modified within the document does RawBSONDocument decode its bytes.
RawBSONDocument implements the Mapping abstract base class from the standard library so it can be used like a read-only dict:
>>> from bson import encode
>>> raw_doc = RawBSONDocument(encode({'_id': 'my_doc'}))
>>> raw_doc.raw
b'...'
>>> raw_doc['_id']
'my_doc'
Parameters:
- bson_bytes: the BSON bytes that compose this document
- codec_options (optional): An instance of CodecOptions whose document_class must be RawBSONDocument. The default is DEFAULT_RAW_BSON_OPTIONS.
Changed in version 3.8: RawBSONDocument now validates that the bson_bytes passed in represent a single bson document.
Changed in version 3.5: If a CodecOptions is passed in, its document_class must be RawBSONDocument.
- items()¶
Lazily decode and iterate elements in this document.
-
raw
¶ The raw BSON bytes composing this document.
regex – Tools for representing MongoDB regular expressions¶
New in version 2.7.
Tools for representing MongoDB regular expressions.
-
class
bson.regex.
Regex
(pattern, flags=0)¶ BSON regular expression data.
This class is useful to store and retrieve regular expressions that are incompatible with Python’s regular expression dialect.
Parameters: - pattern: string
- flags: (optional) an integer bitmask, or a string of flag characters like “im” for IGNORECASE and MULTILINE
- classmethod from_native(regex)¶
Convert a Python regular expression into a Regex instance.
Note that in Python 3, a regular expression compiled from a str has the re.UNICODE flag set. If it is undesirable to store this flag in a BSON regular expression, unset it first:
>>> pattern = re.compile('.*')
>>> regex = Regex.from_native(pattern)
>>> regex.flags ^= re.UNICODE
>>> db.collection.insert_one({'pattern': regex})
Parameters: - regex: A regular expression object from
re.compile()
.
Warning
Python regular expressions use a different syntax and different set of flags than MongoDB, which uses PCRE. A regular expression retrieved from the server may not compile in Python, or may match a different set of strings in Python than when used in a MongoDB query.
- regex: A regular expression object from
-
try_compile()¶
Compile this Regex as a Python regular expression.
Warning
Python regular expressions use a different syntax and different set of flags than MongoDB, which uses PCRE. A regular expression retrieved from the server may not compile in Python, or may match a different set of strings in Python than when used in a MongoDB query.
try_compile() may raise re.error.
son – Tools for working with SON, an ordered mapping¶
Tools for creating and manipulating SON, the Serialized Document Notation.
Regular dictionaries can be used instead of SON objects, but not when the order of keys is important. A SON object can be used just like a normal Python dictionary.
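A brief sketch of why ordering matters; key order is preserved exactly as given:
>>> from bson.son import SON
>>> # A compound sort specification where 'age' must sort before 'name'.
>>> sort_spec = SON([('age', -1), ('name', 1)])
>>> list(sort_spec.keys())
['age', 'name']
>>> sort_spec.to_dict() == {'age': -1, 'name': 1}
True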
-
class
bson.son.
SON
(data=None, **kwargs)¶ SON data.
A subclass of dict that maintains ordering of keys and provides a few extra niceties for dealing with SON. SON provides an API similar to collections.OrderedDict from Python 2.7+.
-
clear
() → None. Remove all items from D.¶
-
copy
() → a shallow copy of D¶
-
get
(key, default=None)¶ Return the value for key if key is in the dictionary, else default.
-
items
() → a set-like object providing a view on D's items¶
-
keys
() → a set-like object providing a view on D's keys¶
-
pop
(k[, d]) → v, remove specified key and return the corresponding value.¶ If key is not found, d is returned if given, otherwise KeyError is raised
-
popitem
() → (k, v), remove and return some (key, value) pair as a¶ 2-tuple; but raise KeyError if D is empty.
-
setdefault
(key, default=None)¶ Insert key with a value of default if key is not in the dictionary.
Return the value for key if key is in the dictionary, else default.
-
to_dict
()¶ Convert a SON document to a normal Python dictionary instance.
This is trickier than just dict(…) because it needs to be recursive.
-
update
([E, ]**F) → None. Update D from dict/iterable E and F.¶ If E is present and has a .keys() method, then does: for k in E: D[k] = E[k] If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v In either case, this is followed by: for k in F: D[k] = F[k]
-
values
() → an object providing a view on D's values¶
-
timestamp – Tools for representing MongoDB internal Timestamps¶
Tools for representing MongoDB internal Timestamps.
-
class
bson.timestamp.
Timestamp
(time, inc)¶
Create a new Timestamp.
This class is only for use with the MongoDB opLog. If you need to store a regular timestamp, please use a datetime.
Raises TypeError if time is not an instance of int or datetime, or inc is not an instance of int. Raises ValueError if time or inc is not in [0, 2**32).
Parameters:
- time: time in seconds since epoch UTC, or a naive UTC datetime, or an aware datetime
- inc: the incrementing counter
tz_util – Utilities for dealing with timezones in Python¶
Timezone related utilities for BSON.
-
class
bson.tz_util.
FixedOffset
(offset, name)¶ Fixed offset timezone, in minutes east from UTC.
Implementation based on the Python standard library documentation. Defining __getinitargs__ enables pickling / copying.
-
dst
(dt)¶ datetime -> DST offset as timedelta positive east of UTC.
-
tzname
(dt)¶ datetime -> string name of time zone.
-
utcoffset
(dt)¶ datetime -> timedelta showing offset from UTC, negative values indicating West of UTC
-
-
bson.tz_util.
utc
= <bson.tz_util.FixedOffset object>¶ Fixed offset timezone representing UTC.
pymongo – Python driver for MongoDB¶
Python driver for MongoDB.
-
pymongo.
version
= '3.11.4'¶
-
pymongo.
MongoClient
¶ Alias for
pymongo.mongo_client.MongoClient
.
-
pymongo.
MongoReplicaSetClient
¶ Alias for
pymongo.mongo_replica_set_client.MongoReplicaSetClient
.
-
pymongo.
ReadPreference
¶ Alias for
pymongo.read_preferences.ReadPreference
.
-
pymongo.
has_c
()¶ Is the C extension installed?
-
pymongo.
MIN_SUPPORTED_WIRE_VERSION
¶ The minimum wire protocol version PyMongo supports.
-
pymongo.
MAX_SUPPORTED_WIRE_VERSION
¶ The maximum wire protocol version PyMongo supports.
Sub-modules:
bulk – The bulk write operations interface¶
The bulk write operations interface.
New in version 2.7.
-
class
pymongo.bulk.
BulkOperationBuilder
(collection, ordered=True, bypass_document_validation=False)¶ DEPRECATED: Initialize a new BulkOperationBuilder instance.
Parameters:
- collection: A Collection instance.
- ordered (optional): If True all operations will be executed serially, in the order provided, and the entire execution will abort on the first error. If False operations will be executed in arbitrary order (possibly in parallel on the server), reporting any errors that occurred after attempting all operations. Defaults to True.
- bypass_document_validation: (optional) If True, allows the write to opt-out of document level validation. Default is False.
Note
bypass_document_validation requires server version >= 3.2
Changed in version 3.5: Deprecated. Use bulk_write() instead.
Changed in version 3.2: Added bypass_document_validation support
-
execute
(write_concern=None)¶ Execute all provided operations.
Parameters: - write_concern (optional): the write concern for this bulk execution.
-
find
(selector, collation=None)¶ Specify selection criteria for bulk operations.
Parameters: - selector (dict): the selection criteria for update and remove operations.
- collation (optional): An instance of Collation. This option is only supported on MongoDB 3.4 and above.
Returns:
- A BulkWriteOperation instance, used to add update and remove operations to this bulk operation.
Changed in version 3.4: Added the collation option.
-
insert
(document)¶ Insert a single document.
Parameters: - document (dict): the document to insert
-
class
pymongo.bulk.
BulkUpsertOperation
(selector, bulk, collation)¶ An interface for adding upsert operations.
-
replace_one
(replacement)¶ Replace one entire document matching the selector criteria.
Parameters: - replacement (dict): the replacement document
-
update
(update)¶ Update all documents matching the selector.
Parameters: - update (dict): the update operations to apply
-
update_one
(update)¶ Update one document matching the selector.
Parameters: - update (dict): the update operations to apply
-
-
class
pymongo.bulk.
BulkWriteOperation
(selector, bulk, collation)¶ An interface for adding update or remove operations.
-
remove
()¶ Remove all documents matching the selector criteria.
-
remove_one
()¶ Remove a single document matching the selector criteria.
-
replace_one
(replacement)¶ Replace one entire document matching the selector criteria.
Parameters: - replacement (dict): the replacement document
-
update
(update)¶ Update all documents matching the selector criteria.
Parameters: - update (dict): the update operations to apply
-
update_one
(update)¶ Update one document matching the selector criteria.
Parameters: - update (dict): the update operations to apply
-
upsert
()¶ Specify that all chained update operations should be upserts.
Returns:
- A BulkUpsertOperation instance, used to add update operations to this bulk operation.
-
change_stream – Watch changes on a collection, database, or cluster¶
Watch changes on a collection, a database, or the entire cluster.
-
class
pymongo.change_stream.
ChangeStream
(target, pipeline, full_document, resume_after, max_await_time_ms, batch_size, collation, start_at_operation_time, session, start_after)¶ The internal abstract base class for change stream cursors.
Should not be called directly by application developers. Use pymongo.collection.Collection.watch(), pymongo.database.Database.watch(), or pymongo.mongo_client.MongoClient.watch() instead.
New in version 3.6.
-
alive
¶ Does this cursor have the potential to return more data?
Note
Even if alive is True, next() can raise StopIteration and try_next() can return None.
New in version 3.8.
-
close
()¶ Close this ChangeStream.
-
next
()¶ Advance the cursor.
This method blocks until the next change document is returned or an unrecoverable error is raised. This method is used when iterating over all changes in the cursor. For example:
try:
    resume_token = None
    pipeline = [{'$match': {'operationType': 'insert'}}]
    with db.collection.watch(pipeline) as stream:
        for insert_change in stream:
            print(insert_change)
            resume_token = stream.resume_token
except pymongo.errors.PyMongoError:
    # The ChangeStream encountered an unrecoverable error or the
    # resume attempt failed to recreate the cursor.
    if resume_token is None:
        # There is no usable resume token because there was a
        # failure during ChangeStream initialization.
        logging.error('...')
    else:
        # Use the interrupted ChangeStream's resume token to create
        # a new ChangeStream. The new stream will continue from the
        # last seen insert change without missing any events.
        with db.collection.watch(
                pipeline, resume_after=resume_token) as stream:
            for insert_change in stream:
                print(insert_change)
Raises StopIteration if this ChangeStream is closed.
-
resume_token
¶ The cached resume token that will be used to resume after the most recently returned change.
New in version 3.9.
-
try_next
()¶ Advance the cursor without blocking indefinitely.
This method returns the next change document without waiting indefinitely for the next change. For example:
with db.collection.watch() as stream:
    while stream.alive:
        change = stream.try_next()
        # Note that the ChangeStream's resume token may be updated
        # even when no changes are returned.
        print("Current resume token: %r" % (stream.resume_token,))
        if change is not None:
            print("Change document: %r" % (change,))
            continue
        # We end up here when there are no recent changes.
        # Sleep for a while before trying again to avoid flooding
        # the server with getMore requests when no changes are
        # available.
        time.sleep(10)
If no change document is cached locally then this method runs a single getMore command. If the getMore yields any documents, the next document is returned; otherwise, if the getMore returns no documents (because there have been no changes) then None is returned.
Returns: The next change document, or None when no document is available after running a single getMore or when the cursor is closed.
New in version 3.8.
-
-
class
pymongo.change_stream.
ClusterChangeStream
(target, pipeline, full_document, resume_after, max_await_time_ms, batch_size, collation, start_at_operation_time, session, start_after)¶ A change stream that watches changes on all collections in the cluster.
Should not be called directly by application developers. Use helper method pymongo.mongo_client.MongoClient.watch() instead.
New in version 3.7.
-
class
pymongo.change_stream.
CollectionChangeStream
(target, pipeline, full_document, resume_after, max_await_time_ms, batch_size, collation, start_at_operation_time, session, start_after)¶ A change stream that watches changes on a single collection.
Should not be called directly by application developers. Use helper method pymongo.collection.Collection.watch() instead.
New in version 3.7.
-
class
pymongo.change_stream.
DatabaseChangeStream
(target, pipeline, full_document, resume_after, max_await_time_ms, batch_size, collation, start_at_operation_time, session, start_after)¶ A change stream that watches changes on all collections in a database.
Should not be called directly by application developers. Use helper method pymongo.database.Database.watch() instead.
New in version 3.7.
client_session – Logical sessions for sequential operations¶
Logical sessions for ordering sequential operations.
Requires MongoDB 3.6.
New in version 3.6.
Causally Consistent Reads¶
with client.start_session(causal_consistency=True) as session:
    collection = client.db.collection
    collection.update_one({'_id': 1}, {'$set': {'x': 10}}, session=session)
    secondary_c = collection.with_options(
        read_preference=ReadPreference.SECONDARY)
    # A secondary read waits for replication of the write.
    secondary_c.find_one({'_id': 1}, session=session)
If causal_consistency is True (the default), read operations that use the session are causally after previous read and write operations. Using a causally consistent session, an application can read its own writes and is guaranteed monotonic reads, even when reading from replica set secondaries.
Transactions¶
MongoDB 4.0 adds support for transactions on replica set primaries. A
transaction is associated with a ClientSession
. To start a transaction
on a session, use ClientSession.start_transaction()
in a with-statement.
Then, execute an operation within the transaction by passing the session to the
operation:
orders = client.db.orders
inventory = client.db.inventory
with client.start_session() as session:
    with session.start_transaction():
        orders.insert_one({"sku": "abc123", "qty": 100}, session=session)
        inventory.update_one({"sku": "abc123", "qty": {"$gte": 100}},
                             {"$inc": {"qty": -100}}, session=session)
Upon normal completion of the with session.start_transaction() block, the transaction automatically calls ClientSession.commit_transaction(). If the block exits with an exception, the transaction automatically calls ClientSession.abort_transaction().
In general, multi-document transactions only support read/write (CRUD) operations on existing collections. However, MongoDB 4.4 adds support for creating collections and indexes with some limitations, including an insert operation that would result in the creation of a new collection. For a complete description of all the supported and unsupported operations see the MongoDB server’s documentation for transactions.
A session may only have a single active transaction at a time; however, multiple transactions on the same session can be executed in sequence.
New in version 3.7.
PyMongo 3.9 adds support for transactions on sharded clusters running MongoDB 4.2. Sharded transactions have the same API as replica set transactions. When running a transaction against a sharded cluster, the session is pinned to the mongos server selected for the first operation in the transaction. All subsequent operations that are part of the same transaction are routed to the same mongos server. When the transaction is completed, by running either commitTransaction or abortTransaction, the session is unpinned.
New in version 3.9.
Classes¶
-
class
pymongo.client_session.
ClientSession
(client, server_session, options, authset, implicit)¶ A session for ordering sequential operations.
ClientSession instances are not thread-safe or fork-safe. They can only be used by one thread or process at a time. A single ClientSession cannot be used to run multiple operations concurrently.
Should not be initialized directly by application developers - to create a ClientSession, call start_session().
abort_transaction
()¶ Abort a multi-statement transaction.
New in version 3.7.
-
advance_cluster_time
(cluster_time)¶ Update the cluster time for this session.
Parameters:
- cluster_time: The cluster_time from another ClientSession instance.
-
advance_operation_time
(operation_time)¶ Update the operation time for this session.
Parameters:
- operation_time: The operation_time from another ClientSession instance.
-
client
¶ The
MongoClient
this session was created from.
-
cluster_time
¶ The cluster time returned by the last operation executed in this session.
-
commit_transaction
()¶ Commit a multi-statement transaction.
New in version 3.7.
-
end_session
()¶ Finish this session. If a transaction has started, abort it.
It is an error to use the session after the session has ended.
-
has_ended
¶ True if this session is finished.
-
in_transaction
¶ True if this session has an active multi-statement transaction.
New in version 3.10.
-
operation_time
¶ The operation time returned by the last operation executed in this session.
-
options
¶ The
SessionOptions
this session was created with.
-
session_id
¶ A BSON document, the opaque server session identifier.
-
start_transaction
(read_concern=None, write_concern=None, read_preference=None, max_commit_time_ms=None)¶ Start a multi-statement transaction.
Takes the same arguments as TransactionOptions.
Changed in version 3.9: Added the max_commit_time_ms option.
New in version 3.7.
-
with_transaction
(callback, read_concern=None, write_concern=None, read_preference=None, max_commit_time_ms=None)¶ Execute a callback in a transaction.
This method starts a transaction on this session, executes callback once, and then commits the transaction. For example:
def callback(session):
    orders = session.client.db.orders
    inventory = session.client.db.inventory
    orders.insert_one({"sku": "abc123", "qty": 100}, session=session)
    inventory.update_one({"sku": "abc123", "qty": {"$gte": 100}},
                         {"$inc": {"qty": -100}}, session=session)

with client.start_session() as session:
    session.with_transaction(callback)
To pass arbitrary arguments to the callback, wrap your callable with a lambda like this:
def callback(session, custom_arg, custom_kwarg=None):
    # Transaction operations...

with client.start_session() as session:
    session.with_transaction(
        lambda s: callback(s, "custom_arg", custom_kwarg=1))
In the event of an exception, with_transaction may retry the commit or the entire transaction, therefore callback may be invoked multiple times by a single call to with_transaction. Developers should be mindful of this possibility when writing a callback that modifies application state or has any other side-effects. Note that even when the callback is invoked multiple times, with_transaction ensures that the transaction will be committed at-most-once on the server.
The callback should not attempt to start new transactions, but should simply run operations meant to be contained within a transaction. The callback should also not commit the transaction; this is handled automatically by with_transaction. If the callback does commit or abort the transaction without error, however, with_transaction will return without taking further action.
ClientSession instances are not thread-safe or fork-safe. Consequently, the callback must not attempt to execute multiple operations concurrently.
When the callback raises an exception, with_transaction automatically aborts the current transaction. When the callback or commit_transaction() raises an exception that includes the "TransientTransactionError" error label, with_transaction starts a new transaction and re-executes the callback.
When commit_transaction() raises an exception with the "UnknownTransactionCommitResult" error label, with_transaction retries the commit until the result of the transaction is known.
This method will cease retrying after 120 seconds has elapsed. This timeout is not configurable and any exception raised by the callback or by ClientSession.commit_transaction() after the timeout is reached will be re-raised. Applications that desire a different timeout duration should not use this method.
Parameters:
- callback: The callable callback to run inside a transaction. The callable must accept a single argument, this session. Note, under certain error conditions the callback may be run multiple times.
- read_concern (optional): The ReadConcern to use for this transaction.
- write_concern (optional): The WriteConcern to use for this transaction.
- read_preference (optional): The read preference to use for this transaction. If None (the default) the read_preference of this Database is used. See read_preferences for options.
Returns: The return value of the callback.
New in version 3.9.
-
-
class
pymongo.client_session.
SessionOptions
(causal_consistency=True, default_transaction_options=None)¶
Options for a new ClientSession.
Parameters:
- causal_consistency (optional): If True (the default), read operations are causally ordered within the session.
- default_transaction_options (optional): The default TransactionOptions to use for transactions started on this session.
-
causal_consistency
¶ Whether causal consistency is configured.
-
default_transaction_options
¶ The default TransactionOptions to use for transactions started on this session.
New in version 3.7.
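A short sketch of configuring session-wide transaction defaults (assumes an existing client, as in the examples above):
from pymongo.client_session import TransactionOptions
from pymongo.read_concern import ReadConcern
from pymongo.write_concern import WriteConcern

# Every transaction started on this session uses these options
# unless start_transaction() overrides them.
txn_opts = TransactionOptions(
    read_concern=ReadConcern("snapshot"),
    write_concern=WriteConcern("majority"))
with client.start_session(default_transaction_options=txn_opts) as session:
    with session.start_transaction():
        client.db.orders.insert_one({"sku": "abc123"}, session=session)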
-
class
pymongo.client_session.
TransactionOptions
(read_concern=None, write_concern=None, read_preference=None, max_commit_time_ms=None)¶
Options for ClientSession.start_transaction().
Parameters:
- read_concern (optional): The ReadConcern to use for this transaction. If None (the default) the read_concern of the MongoClient is used.
- write_concern (optional): The WriteConcern to use for this transaction. If None (the default) the write_concern of the MongoClient is used.
- read_preference (optional): The read preference to use. If None (the default) the read_preference of this MongoClient is used. See read_preferences for options. Transactions which read must use PRIMARY.
- max_commit_time_ms (optional): The maximum amount of time to allow a single commitTransaction command to run. This option is an alias for the maxTimeMS option on the commitTransaction command. If None (the default) maxTimeMS is not used.
Changed in version 3.9: Added the max_commit_time_ms option.
New in version 3.7.
-
max_commit_time_ms
¶ The maxTimeMS to use when running a commitTransaction command.
New in version 3.9.
-
read_concern
¶ This transaction’s
ReadConcern
.
-
read_preference
¶ This transaction’s
ReadPreference
.
-
write_concern
¶ This transaction’s
WriteConcern
.
collation – Tools for working with collations¶
Tools for working with collations.
-
class
pymongo.collation.
Collation
(locale, caseLevel=None, caseFirst=None, strength=None, numericOrdering=None, alternate=None, maxVariable=None, normalization=None, backwards=None, **kwargs)¶
Parameters:
- locale: (string) The locale of the collation. This should be a string that identifies an ICU locale ID exactly. For example, en_US is valid, but en_us and en-US are not. Consult the MongoDB documentation for a list of supported locales.
- caseLevel: (optional) If True, turn on case sensitivity if strength is 1 or 2 (case sensitivity is implied if strength is greater than 2). Defaults to False.
- caseFirst: (optional) Specify that either uppercase or lowercase characters take precedence. Must be one of the following values: UPPER, LOWER, or OFF (the default).
- strength: (optional) Specify the comparison strength. This is also known as the ICU comparison level. This must be one of the following values: PRIMARY, SECONDARY, TERTIARY (the default), QUATERNARY, or IDENTICAL. Each successive level builds upon the previous. For example, a strength of SECONDARY differentiates characters based both on the unadorned base character and its accents.
- numericOrdering: (optional) If True, order numbers numerically instead of in collation order (defaults to False).
- alternate: (optional) Specify whether spaces and punctuation are considered base characters. This must be one of the following values: NON_IGNORABLE (the default) or SHIFTED.
- maxVariable: (optional) When alternate is SHIFTED, this option specifies what characters may be ignored. This must be one of the following values: PUNCT or SPACE.
- normalization: (optional) If True, normalizes text into Unicode NFD. Defaults to False.
- backwards: (optional) If True, accents on characters are considered from the back of the word to the front, as it is done in some French dictionary ordering traditions. Defaults to False.
- kwargs: (optional) Keyword arguments supplying any additional options to be sent with this Collation object.
-
class
pymongo.collation.
CollationStrength
¶ An enum that defines values for strength on a
Collation
.-
PRIMARY
= 1¶ Differentiate base (unadorned) characters.
-
SECONDARY
= 2¶ Differentiate character accents.
-
TERTIARY
= 3¶ Differentiate character case.
-
QUATERNARY
= 4¶ Differentiate words with and without punctuation.
-
IDENTICAL
= 5¶ Differentiate unicode code point (characters are exactly identical).
-
-
class
pymongo.collation.
CollationAlternate
¶ An enum that defines values for alternate on a
Collation
.-
NON_IGNORABLE
= 'non-ignorable'¶ Spaces and punctuation are treated as base characters.
-
SHIFTED
= 'shifted'¶ Spaces and punctuation are not considered base characters.
Spaces and punctuation are distinguished regardless when the Collation strength is at least QUATERNARY.
-
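As a usage sketch, a case-insensitive lookup can be expressed with a strength-two collation (assumes an existing db handle, as in the other examples):
from pymongo.collation import Collation, CollationStrength

# Strength SECONDARY compares base characters and accents but
# ignores case, so 'alice' also matches 'Alice'.
contacts = db.get_collection('contacts')
doc = contacts.find_one(
    {'name': 'alice'},
    collation=Collation(locale='en_US',
                        strength=CollationStrength.SECONDARY))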
collection – Collection level operations¶
Collection level utilities for Mongo.
-
pymongo.
ASCENDING
= 1¶ Ascending sort order.
-
pymongo.
DESCENDING
= -1¶ Descending sort order.
-
pymongo.
GEO2D
= '2d'¶ Index specifier for a 2-dimensional geospatial index.
-
pymongo.
GEOHAYSTACK
= 'geoHaystack'¶ DEPRECATED - Index specifier for a 2-dimensional haystack index.
DEPRECATED - GEOHAYSTACK is deprecated and will be removed in PyMongo 4.0. geoHaystack indexes (and the geoSearch command) were deprecated in MongoDB 4.4. Instead, create a 2d index and use $geoNear or $geoWithin. See https://dochub.mongodb.org/core/4.4-deprecate-geoHaystack.
Changed in version 3.11: Deprecated.
-
pymongo.
GEOSPHERE
= '2dsphere'¶ Index specifier for a spherical geospatial index.
New in version 2.5.
-
pymongo.
HASHED
= 'hashed'¶ Index specifier for a hashed index.
New in version 2.5.
-
pymongo.
TEXT
= 'text'¶ Index specifier for a text index.
See also
MongoDB’s Atlas Search which offers more advanced text search functionality.
New in version 2.7.1.
-
class
pymongo.collection.
ReturnDocument
¶ An enum used with find_one_and_replace() and find_one_and_update().
BEFORE
¶ Return the original document before it was updated/replaced, or
None
if no document matches the query.
-
AFTER
¶ Return the updated/replaced or inserted document.
-
-
class
pymongo.collection.
Collection
(database, name, create=False, **kwargs)¶ Get / create a Mongo collection.
Raises TypeError if name is not an instance of basestring (str in python 3). Raises InvalidName if name is not a valid collection name. Any additional keyword arguments will be used as options passed to the create command. See create_collection() for valid options.
If create is True, collation is specified, or any additional keyword arguments are present, a create command will be sent, using session if specified. Otherwise, a create command will not be sent and the collection will be created implicitly on first use. The optional session argument is only used for the create command; it is not associated with the collection afterward.
Parameters:
- database: the database to get a collection from
- name: the name of the collection to get
- create (optional): if True, force collection creation even without options being set
- codec_options (optional): An instance of CodecOptions. If None (the default) database.codec_options is used.
- read_preference (optional): The read preference to use. If None (the default) database.read_preference is used.
- write_concern (optional): An instance of WriteConcern. If None (the default) database.write_concern is used.
- read_concern (optional): An instance of ReadConcern. If None (the default) database.read_concern is used.
- collation (optional): An instance of Collation. If a collation is provided, it will be passed to the create collection command. This option is only supported on MongoDB 3.4 and above.
- session (optional): a ClientSession that is used with the create collection command
- **kwargs (optional): additional keyword arguments will be passed as options for the create collection command
Changed in version 3.6: Added session parameter.
Changed in version 3.4: Support the collation option.
Changed in version 3.2: Added the read_concern option.
Changed in version 3.0: Added the codec_options, read_preference, and write_concern options. Removed the uuid_subtype attribute. Collection no longer returns an instance of Collection for attribute names with leading underscores. You must use dict-style lookups instead:
collection['__my_collection__']
Not:
collection.__my_collection__
Changed in version 2.2: Removed deprecated argument: options
New in version 2.1: uuid_subtype attribute
-
c[name] || c.name
Get the name sub-collection of
Collection
c.Raises
InvalidName
if an invalid collection name is used.
-
full_name
¶ The full name of this
Collection
.The full name is of the form database_name.collection_name.
-
name
¶ The name of this
Collection
.
-
database
¶ The
Database
that thisCollection
is a part of.
-
codec_options
¶ Read only access to the
CodecOptions
of this instance.
-
read_preference
¶ Read only access to the read preference of this instance.
Changed in version 3.0: The
read_preference
attribute is now read only.
-
write_concern
¶ Read only access to the
WriteConcern
of this instance.Changed in version 3.0: The
write_concern
attribute is now read only.
-
read_concern
¶ Read only access to the
ReadConcern
of this instance.New in version 3.2.
-
with_options
(codec_options=None, read_preference=None, write_concern=None, read_concern=None)¶ Get a clone of this collection changing the specified settings.
>>> coll1.read_preference Primary() >>> from pymongo import ReadPreference >>> coll2 = coll1.with_options(read_preference=ReadPreference.SECONDARY) >>> coll1.read_preference Primary() >>> coll2.read_preference Secondary(tag_sets=None)
Parameters: - codec_options (optional): An instance of
CodecOptions
. IfNone
(the default) thecodec_options
of thisCollection
is used. - read_preference (optional): The read preference to use. If
None
(the default) theread_preference
of thisCollection
is used. Seeread_preferences
for options. - write_concern (optional): An instance of
WriteConcern
. IfNone
(the default) thewrite_concern
of thisCollection
is used. - read_concern (optional): An instance of
ReadConcern
. IfNone
(the default) theread_concern
of thisCollection
is used.
-
bulk_write
(requests, ordered=True, bypass_document_validation=False, session=None)¶ Send a batch of write operations to the server.
Requests are passed as a list of write operation instances (
InsertOne
,UpdateOne
,UpdateMany
,ReplaceOne
,DeleteOne
, orDeleteMany
).>>> for doc in db.test.find({}): ... print(doc) ... {u'x': 1, u'_id': ObjectId('54f62e60fba5226811f634ef')} {u'x': 1, u'_id': ObjectId('54f62e60fba5226811f634f0')} >>> # DeleteMany, UpdateOne, and UpdateMany are also available. ... >>> from pymongo import InsertOne, DeleteOne, ReplaceOne >>> requests = [InsertOne({'y': 1}), DeleteOne({'x': 1}), ... ReplaceOne({'w': 1}, {'z': 1}, upsert=True)] >>> result = db.test.bulk_write(requests) >>> result.inserted_count 1 >>> result.deleted_count 1 >>> result.modified_count 0 >>> result.upserted_ids {2: ObjectId('54f62ee28891e756a6e1abd5')} >>> for doc in db.test.find({}): ... print(doc) ... {u'x': 1, u'_id': ObjectId('54f62e60fba5226811f634f0')} {u'y': 1, u'_id': ObjectId('54f62ee2fba5226811f634f1')} {u'z': 1, u'_id': ObjectId('54f62ee28891e756a6e1abd5')}
Parameters: - requests: A list of write operations (see examples above).
- ordered (optional): If
True
(the default) requests will be performed on the server serially, in the order provided. If an error occurs all remaining operations are aborted. IfFalse
requests will be performed on the server in arbitrary order, possibly in parallel, and all operations will be attempted. - bypass_document_validation: (optional) If
True
, allows the write to opt-out of document level validation. Default isFalse
. - session (optional): a
ClientSession
.
Returns: An instance of
BulkWriteResult
.Note
bypass_document_validation requires server version >= 3.2
Changed in version 3.6: Added
session
parameter.Changed in version 3.2: Added bypass_document_validation support
New in version 3.0.
-
insert_one
(document, bypass_document_validation=False, session=None)¶ Insert a single document.
>>> db.test.count_documents({'x': 1}) 0 >>> result = db.test.insert_one({'x': 1}) >>> result.inserted_id ObjectId('54f112defba522406c9cc208') >>> db.test.find_one({'x': 1}) {u'x': 1, u'_id': ObjectId('54f112defba522406c9cc208')}
Parameters: - document: The document to insert. Must be a mutable mapping type. If the document does not have an _id field one will be added automatically.
- bypass_document_validation: (optional) If
True
, allows the write to opt-out of document level validation. Default isFalse
. - session (optional): a
ClientSession
.
Returns: - An instance of
InsertOneResult
.
Note
bypass_document_validation requires server version >= 3.2
Changed in version 3.6: Added
session
parameter.Changed in version 3.2: Added bypass_document_validation support
New in version 3.0.
-
insert_many
(documents, ordered=True, bypass_document_validation=False, session=None)¶ Insert an iterable of documents.
>>> db.test.count_documents({}) 0 >>> result = db.test.insert_many([{'x': i} for i in range(2)]) >>> result.inserted_ids [ObjectId('54f113fffba522406c9cc20e'), ObjectId('54f113fffba522406c9cc20f')] >>> db.test.count_documents({}) 2
Parameters: - documents: An iterable of documents to insert.
- ordered (optional): If
True
(the default) documents will be inserted on the server serially, in the order provided. If an error occurs all remaining inserts are aborted. IfFalse
, documents will be inserted on the server in arbitrary order, possibly in parallel, and all document inserts will be attempted. - bypass_document_validation: (optional) If
True
, allows the write to opt-out of document level validation. Default isFalse
. - session (optional): a
ClientSession
.
Returns: An instance of
InsertManyResult
.Note
bypass_document_validation requires server version >= 3.2
Changed in version 3.6: Added
session
parameter.Changed in version 3.2: Added bypass_document_validation support
New in version 3.0.
-
replace_one
(filter, replacement, upsert=False, bypass_document_validation=False, collation=None, hint=None, session=None)¶ Replace a single document matching the filter.
>>> for doc in db.test.find({}): ... print(doc) ... {u'x': 1, u'_id': ObjectId('54f4c5befba5220aa4d6dee7')} >>> result = db.test.replace_one({'x': 1}, {'y': 1}) >>> result.matched_count 1 >>> result.modified_count 1 >>> for doc in db.test.find({}): ... print(doc) ... {u'y': 1, u'_id': ObjectId('54f4c5befba5220aa4d6dee7')}
The upsert option can be used to insert a new document if a matching document does not exist.
>>> result = db.test.replace_one({'x': 1}, {'x': 1}, True) >>> result.matched_count 0 >>> result.modified_count 0 >>> result.upserted_id ObjectId('54f11e5c8891e756a6e1abd4') >>> db.test.find_one({'x': 1}) {u'x': 1, u'_id': ObjectId('54f11e5c8891e756a6e1abd4')}
Parameters: - filter: A query that matches the document to replace.
- replacement: The new document.
- upsert (optional): If
True
, perform an insert if no documents match the filter. - bypass_document_validation: (optional) If
True
, allows the write to opt-out of document level validation. Default isFalse
. This option is only supported on MongoDB 3.2 and above. - collation (optional): An instance of
Collation
. This option is only supported on MongoDB 3.4 and above. - hint (optional): An index to use to support the query
predicate specified either by its string name, or in the same
format as passed to
create_index()
(e.g.[('field', ASCENDING)]
). This option is only supported on MongoDB 4.2 and above. - session (optional): a
ClientSession
.
Returns: - An instance of
UpdateResult
.
Changed in version 3.11: Added
hint
parameter.Changed in version 3.6: Added
session
parameter.Changed in version 3.4: Added the collation option.
Changed in version 3.2: Added bypass_document_validation support.
New in version 3.0.
-
update_one
(filter, update, upsert=False, bypass_document_validation=False, collation=None, array_filters=None, hint=None, session=None)¶ Update a single document matching the filter.
>>> for doc in db.test.find(): ... print(doc) ... {u'x': 1, u'_id': 0} {u'x': 1, u'_id': 1} {u'x': 1, u'_id': 2} >>> result = db.test.update_one({'x': 1}, {'$inc': {'x': 3}}) >>> result.matched_count 1 >>> result.modified_count 1 >>> for doc in db.test.find(): ... print(doc) ... {u'x': 4, u'_id': 0} {u'x': 1, u'_id': 1} {u'x': 1, u'_id': 2}
Parameters: - filter: A query that matches the document to update.
- update: The modifications to apply.
- upsert (optional): If
True
, perform an insert if no documents match the filter. - bypass_document_validation: (optional) If
True
, allows the write to opt-out of document level validation. Default isFalse
. This option is only supported on MongoDB 3.2 and above. - collation (optional): An instance of
Collation
. This option is only supported on MongoDB 3.4 and above. - array_filters (optional): A list of filters specifying which array elements an update should apply. This option is only supported on MongoDB 3.6 and above.
- hint (optional): An index to use to support the query
predicate specified either by its string name, or in the same
format as passed to
create_index()
(e.g.[('field', ASCENDING)]
). This option is only supported on MongoDB 4.2 and above. - session (optional): a
ClientSession
.
Returns: - An instance of
UpdateResult
.
Changed in version 3.11: Added
hint
parameter.Changed in version 3.9: Added the ability to accept a pipeline as the
update
.Changed in version 3.6: Added the
array_filters
andsession
parameters.Changed in version 3.4: Added the
collation
option.Changed in version 3.2: Added
bypass_document_validation
support.New in version 3.0.
-
update_many
(filter, update, upsert=False, array_filters=None, bypass_document_validation=False, collation=None, hint=None, session=None)¶ Update one or more documents that match the filter.
>>> for doc in db.test.find(): ... print(doc) ... {u'x': 1, u'_id': 0} {u'x': 1, u'_id': 1} {u'x': 1, u'_id': 2} >>> result = db.test.update_many({'x': 1}, {'$inc': {'x': 3}}) >>> result.matched_count 3 >>> result.modified_count 3 >>> for doc in db.test.find(): ... print(doc) ... {u'x': 4, u'_id': 0} {u'x': 4, u'_id': 1} {u'x': 4, u'_id': 2}
Parameters: - filter: A query that matches the documents to update.
- update: The modifications to apply.
- upsert (optional): If
True
, perform an insert if no documents match the filter. - bypass_document_validation (optional): If
True
, allows the write to opt-out of document level validation. Default isFalse
. This option is only supported on MongoDB 3.2 and above. - collation (optional): An instance of
Collation
. This option is only supported on MongoDB 3.4 and above. - array_filters (optional): A list of filters specifying which array elements an update should apply. This option is only supported on MongoDB 3.6 and above.
- hint (optional): An index to use to support the query
predicate specified either by its string name, or in the same
format as passed to
create_index()
(e.g.[('field', ASCENDING)]
). This option is only supported on MongoDB 4.2 and above. - session (optional): a
ClientSession
.
Returns: - An instance of
UpdateResult
.
Changed in version 3.11: Added
hint
parameter.Changed in version 3.9: Added the ability to accept a pipeline as the update.
Changed in version 3.6: Added
array_filters
andsession
parameters.Changed in version 3.4: Added the collation option.
Changed in version 3.2: Added bypass_document_validation support.
New in version 3.0.
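A minimal sketch of the array_filters option above (the grades field and the g placeholder name are illustrative):
>>> # Add 5 to every array element of 'grades' that is at least 80:
>>> db.test.update_many(
...     {},
...     {'$inc': {'grades.$[g]': 5}},
...     array_filters=[{'g': {'$gte': 80}}])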
-
delete_one
(filter, collation=None, hint=None, session=None)¶ Delete a single document matching the filter.
>>> db.test.count_documents({'x': 1}) 3 >>> result = db.test.delete_one({'x': 1}) >>> result.deleted_count 1 >>> db.test.count_documents({'x': 1}) 2
Parameters: - filter: A query that matches the document to delete.
- collation (optional): An instance of
Collation
. This option is only supported on MongoDB 3.4 and above. - hint (optional): An index to use to support the query
predicate specified either by its string name, or in the same
format as passed to
create_index()
(e.g.[('field', ASCENDING)]
). This option is only supported on MongoDB 4.4 and above. - session (optional): a
ClientSession
.
Returns: - An instance of
DeleteResult
.
Changed in version 3.11: Added
hint
parameter.Changed in version 3.6: Added
session
parameter.Changed in version 3.4: Added the collation option.
New in version 3.0.
-
delete_many
(filter, collation=None, hint=None, session=None)¶ Delete one or more documents matching the filter.
>>> db.test.count_documents({'x': 1}) 3 >>> result = db.test.delete_many({'x': 1}) >>> result.deleted_count 3 >>> db.test.count_documents({'x': 1}) 0
Parameters: - filter: A query that matches the documents to delete.
- collation (optional): An instance of
Collation
. This option is only supported on MongoDB 3.4 and above. - hint (optional): An index to use to support the query
predicate specified either by its string name, or in the same
format as passed to
create_index()
(e.g.[('field', ASCENDING)]
). This option is only supported on MongoDB 4.4 and above. - session (optional): a
ClientSession
.
Returns: - An instance of
DeleteResult
.
Changed in version 3.11: Added
hint
parameter.Changed in version 3.6: Added
session
parameter.Changed in version 3.4: Added the collation option.
New in version 3.0.
-
aggregate
(pipeline, session=None, **kwargs)¶ Perform an aggregation using the aggregation framework on this collection.
All optional aggregate command parameters should be passed as keyword arguments to this method. Valid options include, but are not limited to:
- allowDiskUse (bool): Enables writing to temporary files. When set to True, aggregation stages can write data to the _tmp subdirectory of the --dbpath directory. The default is False.
- maxTimeMS (int): The maximum amount of time to allow the operation to run in milliseconds.
- batchSize (int): The maximum number of documents to return per
batch. Ignored if the connected mongod or mongos does not support
returning aggregate results using a cursor, or useCursor is
False
. - collation (optional): An instance of
Collation
. This option is only supported on MongoDB 3.4 and above. - useCursor (bool): Deprecated. Will be removed in PyMongo 4.0.
The
aggregate()
method obeys theread_preference
of thisCollection
, except when$out
or$merge
are used, in which casePRIMARY
is used.Note
This method does not support the ‘explain’ option. Please use
command()
instead. An example is included in the Aggregation Framework documentation.Note
The
write_concern
of this collection is automatically applied to this operation when using MongoDB >= 3.4.Parameters: - pipeline: a list of aggregation pipeline stages
- session (optional): a
ClientSession
. - **kwargs (optional): See list of options above.
Returns: A
CommandCursor
over the result set.Changed in version 3.9: Apply this collection’s read concern to pipelines containing the $out stage when connected to MongoDB >= 4.2. Added support for the
$merge
pipeline stage. Aggregations that write always use read preferencePRIMARY
.Changed in version 3.6: Added the session parameter. Added the maxAwaitTimeMS option. Deprecated the useCursor option.
Changed in version 3.4: Apply this collection’s write concern automatically to this operation when connected to MongoDB >= 3.4. Support the collation option.
Changed in version 3.0: The
aggregate()
method always returns a CommandCursor. The pipeline argument must be a list.Changed in version 2.7: When the cursor option is used, return
CommandCursor
instead ofCursor
.Changed in version 2.6: Added cursor support.
New in version 2.3.
See also: the Aggregation Framework documentation.
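A minimal pipeline sketch using the options above (field names are illustrative):
>>> pipeline = [
...     {'$match': {'status': 'active'}},
...     {'$group': {'_id': '$category', 'total': {'$sum': '$amount'}}}]
>>> for doc in db.test.aggregate(pipeline, allowDiskUse=True):
...     print(doc)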
-
aggregate_raw_batches
(pipeline, **kwargs)¶ Perform an aggregation and retrieve batches of raw BSON.
Similar to the
aggregate()
method but returns aRawBatchCursor
.This example demonstrates how to work with raw batches, but in practice raw batches should be passed to an external library that can decode BSON into another data type, rather than used with PyMongo’s
bson
module.>>> import bson >>> cursor = db.test.aggregate_raw_batches([ ... {'$project': {'x': {'$multiply': [2, '$x']}}}]) >>> for batch in cursor: ... print(bson.decode_all(batch))
Note
aggregate_raw_batches does not support sessions or auto encryption.
New in version 3.6.
-
watch
(pipeline=None, full_document=None, resume_after=None, max_await_time_ms=None, batch_size=None, collation=None, start_at_operation_time=None, session=None, start_after=None)¶ Watch changes on this collection.
Performs an aggregation with an implicit initial
$changeStream
stage and returns aCollectionChangeStream
cursor which iterates over changes on this collection.Introduced in MongoDB 3.6.
with db.collection.watch() as stream:
    for change in stream:
        print(change)
The
CollectionChangeStream
iterable blocks until the next change document is returned or an error is raised. If the next()
method encounters a network error when retrieving a batch from the server, it will automatically attempt to recreate the cursor such that no change events are missed. Any error encountered during the resume attempt indicates there may be an outage and will be raised.
try:
    with db.collection.watch(
            [{'$match': {'operationType': 'insert'}}]) as stream:
        for insert_change in stream:
            print(insert_change)
except pymongo.errors.PyMongoError:
    # The ChangeStream encountered an unrecoverable error or the
    # resume attempt failed to recreate the cursor.
    logging.error('...')
For a precise description of the resume process see the change streams specification.
Note
Using this helper method is preferred to directly calling
aggregate()
with a$changeStream
stage, for the purpose of supporting resumability.Warning
This Collection’s
read_concern
must beReadConcern("majority")
in order to use the$changeStream
stage.Parameters: - pipeline (optional): A list of aggregation pipeline stages to
append to an initial
$changeStream
stage. Not all pipeline stages are valid after a$changeStream
stage, see the MongoDB documentation on change streams for the supported stages. - full_document (optional): The fullDocument to pass as an option
to the
$changeStream
stage. Allowed values: ‘updateLookup’. When set to ‘updateLookup’, the change notification for partial updates will include both a delta describing the changes to the document, as well as a copy of the entire document that was changed from some time after the change occurred. - resume_after (optional): A resume token. If provided, the change stream will start returning changes that occur directly after the operation specified in the resume token. A resume token is the _id value of a change document.
- max_await_time_ms (optional): The maximum time in milliseconds for the server to wait for changes before responding to a getMore operation.
- batch_size (optional): The maximum number of documents to return per batch.
- collation (optional): The
Collation
to use for the aggregation. - start_at_operation_time (optional): If provided, the resulting
change stream will only return changes that occurred at or after
the specified
Timestamp
. Requires MongoDB >= 4.0. - session (optional): a
ClientSession
. - start_after (optional): The same as resume_after except that start_after can resume notifications after an invalidate event. This option and resume_after are mutually exclusive.
Returns: A
CollectionChangeStream
cursor.Changed in version 3.9: Added the
start_after
parameter.Changed in version 3.7: Added the
start_at_operation_time
parameter.New in version 3.6.
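A sketch of manually resuming from a saved token, per the resume_after description above (error handling omitted):
>>> with db.collection.watch() as stream:
...     change = next(stream)
...     token = change['_id']  # the resume token
>>> # Later, continue from just after that event:
>>> with db.collection.watch(resume_after=token) as stream:
...     for change in stream:
...         print(change)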
-
find
(filter=None, projection=None, skip=0, limit=0, no_cursor_timeout=False, cursor_type=CursorType.NON_TAILABLE, sort=None, allow_partial_results=False, oplog_replay=False, modifiers=None, batch_size=0, manipulate=True, collation=None, hint=None, max_scan=None, max_time_ms=None, max=None, min=None, return_key=False, show_record_id=False, snapshot=False, comment=None, session=None, allow_disk_use=None)¶ Query the database.
The filter argument is a prototype document that all results must match. For example:
>>> db.test.find({"hello": "world"})
only matches documents that have a key “hello” with value “world”. Matches can have other keys in addition to “hello”. The projection argument is used to specify a subset of fields that should be included in the result documents. By limiting results to a certain subset of fields you can cut down on network traffic and decoding time.
Raises
TypeError
if any of the arguments are of improper type. Returns an instance ofCursor
corresponding to this query.The
find()
method obeys theread_preference
of thisCollection
.Parameters: - filter (optional): a SON object specifying elements which must be present for a document to be included in the result set
- projection (optional): a list of field names that should be returned in the result set or a dict specifying the fields to include or exclude. If projection is a list “_id” will always be returned. Use a dict to exclude fields from the result (e.g. projection={‘_id’: False}).
- session (optional): a
ClientSession
. - skip (optional): the number of documents to omit (from the start of the result set) when returning the results
- limit (optional): the maximum number of results to return. A limit of 0 (the default) is equivalent to setting no limit.
- no_cursor_timeout (optional): if False (the default), any returned cursor is closed by the server after 10 minutes of inactivity. If set to True, the returned cursor will never time out on the server. Care should be taken to ensure that cursors with no_cursor_timeout turned on are properly closed.
- cursor_type (optional): the type of cursor to return. The valid
options are defined by
CursorType
:NON_TAILABLE
- the result of this find call will return a standard cursor over the result set.TAILABLE
- the result of this find call will be a tailable cursor - tailable cursors are only for use with capped collections. They are not closed when the last data is retrieved but are kept open and the cursor location marks the final document position. If more data is received iteration of the cursor will continue from the last document received. For details, see the tailable cursor documentation.TAILABLE_AWAIT
- the result of this find call will be a tailable cursor with the await flag set. The server will wait for a few seconds after returning the full result set so that it can capture and return additional data added during the query.EXHAUST
- the result of this find call will be an exhaust cursor. MongoDB will stream batched results to the client without waiting for the client to request each batch, reducing latency. See notes on compatibility below.
- sort (optional): a list of (key, direction) pairs
specifying the sort order for this query. See
sort()
for details. - allow_partial_results (optional): if True, mongos will return partial results if some shards are down instead of returning an error.
- oplog_replay (optional): DEPRECATED - if True, set the oplogReplay query flag. Default: False.
- batch_size (optional): Limits the number of documents returned in a single batch.
- manipulate (optional): DEPRECATED - If True, apply any outgoing SON manipulators before returning. Default: True.
- collation (optional): An instance of
Collation
. This option is only supported on MongoDB 3.4 and above. - return_key (optional): If True, return only the index keys in each document.
- show_record_id (optional): If True, adds a field
$recordId
in each document with the storage engine’s internal record identifier. - snapshot (optional): DEPRECATED - If True, prevents the cursor from returning a document more than once because of an intervening write operation.
- hint (optional): An index, in the same format as passed to
create_index()
(e.g.[('field', ASCENDING)]
). Pass this as an alternative to callinghint()
on the cursor to tell Mongo the proper index to use for the query. - max_time_ms (optional): Specifies a time limit for a query
operation. If the specified time is exceeded, the operation will be
aborted and
ExecutionTimeout
is raised. Pass this as an alternative to callingmax_time_ms()
on the cursor. - max_scan (optional): DEPRECATED - The maximum number of
documents to scan. Pass this as an alternative to calling
max_scan()
on the cursor. - min (optional): A list of field, limit pairs specifying the
inclusive lower bound for all keys of a specific index in order.
Pass this as an alternative to calling
min()
on the cursor.hint
must also be passed to ensure the query utilizes the correct index. - max (optional): A list of field, limit pairs specifying the
exclusive upper bound for all keys of a specific index in order.
Pass this as an alternative to calling
max()
on the cursor.hint
must also be passed to ensure the query utilizes the correct index. - comment (optional): A string to attach to the query to help
interpret and trace the operation in the server logs and in profile
data. Pass this as an alternative to calling
comment()
on the cursor. - modifiers (optional): DEPRECATED - A dict specifying additional MongoDB query modifiers. Use the keyword arguments listed above instead.
- allow_disk_use (optional): if True, MongoDB may use temporary disk files to store data exceeding the system memory limit while processing a blocking sort operation. The option has no effect if MongoDB can satisfy the specified sort using an index, or if the blocking sort requires less memory than the 100 MiB limit. This option is only supported on MongoDB 4.4 and above.
Note
There are a number of caveats to using
EXHAUST
as cursor_type:- The limit option can not be used with an exhaust cursor.
- Exhaust cursors are not supported by mongos and can not be used with a sharded cluster.
- A
Cursor
instance created with theEXHAUST
cursor_type requires an exclusivesocket
connection to MongoDB. If theCursor
is discarded without being completely iterated the underlyingsocket
connection will be closed and discarded without being returned to the connection pool.
Changed in version 3.11: Added the
allow_disk_use
option. Deprecated theoplog_replay
option. Support for this option is deprecated in MongoDB 4.4. The query engine now automatically optimizes queries against the oplog without requiring this option to be set.Changed in version 3.7: Deprecated the
snapshot
option, which is deprecated in MongoDB 3.6 and removed in MongoDB 4.0. Deprecated themax_scan
option. Support for this option is deprecated in MongoDB 4.0. Usemax_time_ms
instead to limit server-side execution time.Changed in version 3.6: Added
session
parameter.Changed in version 3.5: Added the options
return_key
,show_record_id
,snapshot
,hint
,max_time_ms
,max_scan
,min
,max
, andcomment
. Deprecated themodifiers
option.Changed in version 3.4: Added support for the
collation
option.Changed in version 3.0: Changed the parameter names
spec
,fields
,timeout
, andpartial
tofilter
,projection
,no_cursor_timeout
, andallow_partial_results
respectively. Added thecursor_type
,oplog_replay
, andmodifiers
options. Removed thenetwork_timeout
,read_preference
,tag_sets
,secondary_acceptable_latency_ms
,max_scan
,snapshot
,tailable
,await_data
,exhaust
,as_class
, and slave_okay parameters. Removedcompile_re
option: PyMongo now always represents BSON regular expressions asRegex
objects. Usetry_compile()
to attempt to convert from a BSON regular expression to a Python regular expression object. Soft deprecated themanipulate
option.Changed in version 2.7: Added
compile_re
option. If set to False, PyMongo represented BSON regular expressions asRegex
objects instead of attempting to compile BSON regular expressions as Python native regular expressions, thus preventing errors for some incompatible patterns, see PYTHON-500.Changed in version 2.3: Added the
tag_sets
andsecondary_acceptable_latency_ms
parameters.
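A sketch combining several of the options above (field names and values are illustrative):
>>> import pymongo
>>> cursor = db.test.find(
...     {'status': 'active'},
...     projection={'_id': False, 'name': True},
...     sort=[('name', pymongo.ASCENDING)],
...     limit=10,
...     max_time_ms=500)
>>> for doc in cursor:
...     print(doc)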
-
find_raw_batches
(filter=None, projection=None, skip=0, limit=0, no_cursor_timeout=False, cursor_type=CursorType.NON_TAILABLE, sort=None, allow_partial_results=False, oplog_replay=False, modifiers=None, batch_size=0, manipulate=True, collation=None, hint=None, max_scan=None, max_time_ms=None, max=None, min=None, return_key=False, show_record_id=False, snapshot=False, comment=None, allow_disk_use=None)¶ Query the database and retrieve batches of raw BSON.
Similar to the
find()
method but returns aRawBatchCursor
.This example demonstrates how to work with raw batches, but in practice raw batches should be passed to an external library that can decode BSON into another data type, rather than used with PyMongo’s
bson
module.>>> import bson >>> cursor = db.test.find_raw_batches() >>> for batch in cursor: ... print(bson.decode_all(batch))
Note
find_raw_batches does not support sessions or auto encryption.
New in version 3.6.
-
find_one
(filter=None, *args, **kwargs)¶ Get a single document from the database.
All arguments to
find()
are also valid arguments forfind_one()
, although any limit argument will be ignored. Returns a single document, orNone
if no matching document is found.The
find_one()
method obeys theread_preference
of thisCollection
.Parameters: filter (optional): a dictionary specifying the query to be performed OR any other type to be used as the value for a query for
"_id"
.*args (optional): any additional positional arguments are the same as the arguments to
find()
.**kwargs (optional): any additional keyword arguments are the same as the arguments to
find()
.>>> collection.find_one(max_time_ms=100)
-
find_one_and_delete
(filter, projection=None, sort=None, hint=None, session=None, **kwargs)¶ Finds a single document and deletes it, returning the document.
>>> db.test.count_documents({'x': 1}) 2 >>> db.test.find_one_and_delete({'x': 1}) {u'x': 1, u'_id': ObjectId('54f4e12bfba5220aa4d6dee8')} >>> db.test.count_documents({'x': 1}) 1
If multiple documents match filter, a sort can be applied.
>>> for doc in db.test.find({'x': 1}): ... print(doc) ... {u'x': 1, u'_id': 0} {u'x': 1, u'_id': 1} {u'x': 1, u'_id': 2} >>> db.test.find_one_and_delete( ... {'x': 1}, sort=[('_id', pymongo.DESCENDING)]) {u'x': 1, u'_id': 2}
The projection option can be used to limit the fields returned.
>>> db.test.find_one_and_delete({'x': 1}, projection={'_id': False}) {u'x': 1}
Parameters: - filter: A query that matches the document to delete.
- projection (optional): a list of field names that should be returned in the result document or a mapping specifying the fields to include or exclude. If projection is a list “_id” will always be returned. Use a mapping to exclude fields from the result (e.g. projection={‘_id’: False}).
- sort (optional): a list of (key, direction) pairs specifying the sort order for the query. If multiple documents match the query, they are sorted and the first is deleted.
- hint (optional): An index to use to support the query predicate
specified either by its string name, or in the same format as
passed to
create_index()
(e.g.[('field', ASCENDING)]
). This option is only supported on MongoDB 4.4 and above. - session (optional): a
ClientSession
. - **kwargs (optional): additional command arguments can be passed as keyword arguments (for example maxTimeMS can be used with recent server versions).
Changed in version 3.11: Added
hint
parameter.Changed in version 3.6: Added
session
parameter.Changed in version 3.2: Respects write concern.
Warning
Starting in PyMongo 3.2, this command uses the
WriteConcern
of thisCollection
when connected to MongoDB >= 3.2. Note that using an elevated write concern with this command may be slower compared to using the default write concern.Changed in version 3.4: Added the collation option.
New in version 3.0.
-
find_one_and_replace
(filter, replacement, projection=None, sort=None, return_document=ReturnDocument.BEFORE, hint=None, session=None, **kwargs)¶ Finds a single document and replaces it, returning either the original or the replaced document.
The
find_one_and_replace()
method differs fromfind_one_and_update()
by replacing the document matched by filter, rather than modifying the existing document.>>> for doc in db.test.find({}): ... print(doc) ... {u'x': 1, u'_id': 0} {u'x': 1, u'_id': 1} {u'x': 1, u'_id': 2} >>> db.test.find_one_and_replace({'x': 1}, {'y': 1}) {u'x': 1, u'_id': 0} >>> for doc in db.test.find({}): ... print(doc) ... {u'y': 1, u'_id': 0} {u'x': 1, u'_id': 1} {u'x': 1, u'_id': 2}
Parameters: - filter: A query that matches the document to replace.
- replacement: The replacement document.
- projection (optional): A list of field names that should be returned in the result document or a mapping specifying the fields to include or exclude. If projection is a list “_id” will always be returned. Use a mapping to exclude fields from the result (e.g. projection={‘_id’: False}).
- sort (optional): a list of (key, direction) pairs specifying the sort order for the query. If multiple documents match the query, they are sorted and the first is replaced.
- upsert (optional): When
True
, inserts a new document if no document matches the query. Defaults toFalse
. - return_document: If
ReturnDocument.BEFORE
(the default), returns the original document before it was replaced, orNone
if no document matches. IfReturnDocument.AFTER
, returns the replaced or inserted document. - hint (optional): An index to use to support the query
predicate specified either by its string name, or in the same
format as passed to
create_index()
(e.g.[('field', ASCENDING)]
). This option is only supported on MongoDB 4.4 and above. - session (optional): a
ClientSession
. - **kwargs (optional): additional command arguments can be passed as keyword arguments (for example maxTimeMS can be used with recent server versions).
Changed in version 3.11: Added the
hint
option.Changed in version 3.6: Added
session
parameter.Changed in version 3.4: Added the
collation
option.Changed in version 3.2: Respects write concern.
Warning
Starting in PyMongo 3.2, this command uses the
WriteConcern
of thisCollection
when connected to MongoDB >= 3.2. Note that using an elevated write concern with this command may be slower compared to using the default write concern.New in version 3.0.
-
find_one_and_update
(filter, update, projection=None, sort=None, return_document=ReturnDocument.BEFORE, array_filters=None, hint=None, session=None, **kwargs)¶ Finds a single document and updates it, returning either the original or the updated document.
>>> db.test.find_one_and_update( ... {'_id': 665}, {'$inc': {'count': 1}, '$set': {'done': True}}) {u'_id': 665, u'done': False, u'count': 25}
Returns
None
if no document matches the filter.>>> db.test.find_one_and_update( ... {'_exists': False}, {'$inc': {'count': 1}})
When the filter matches, by default
find_one_and_update()
returns the original version of the document before the update was applied. To return the updated (or inserted in the case of upsert) version of the document instead, use the return_document option.>>> from pymongo import ReturnDocument >>> db.example.find_one_and_update( ... {'_id': 'userid'}, ... {'$inc': {'seq': 1}}, ... return_document=ReturnDocument.AFTER) {u'_id': u'userid', u'seq': 1}
You can limit the fields returned with the projection option.
>>> db.example.find_one_and_update( ... {'_id': 'userid'}, ... {'$inc': {'seq': 1}}, ... projection={'seq': True, '_id': False}, ... return_document=ReturnDocument.AFTER) {u'seq': 2}
The upsert option can be used to create the document if it doesn’t already exist.
>>> db.example.delete_many({}).deleted_count 1 >>> db.example.find_one_and_update( ... {'_id': 'userid'}, ... {'$inc': {'seq': 1}}, ... projection={'seq': True, '_id': False}, ... upsert=True, ... return_document=ReturnDocument.AFTER) {u'seq': 1}
If multiple documents match filter, a sort can be applied.
>>> for doc in db.test.find({'done': True}): ... print(doc) ... {u'_id': 665, u'done': True, u'result': {u'count': 26}} {u'_id': 701, u'done': True, u'result': {u'count': 17}} >>> db.test.find_one_and_update( ... {'done': True}, ... {'$set': {'final': True}}, ... sort=[('_id', pymongo.DESCENDING)]) {u'_id': 701, u'done': True, u'result': {u'count': 17}}
Parameters: - filter: A query that matches the document to update.
- update: The update operations to apply.
- projection (optional): A list of field names that should be returned in the result document or a mapping specifying the fields to include or exclude. If projection is a list “_id” will always be returned. Use a dict to exclude fields from the result (e.g. projection={‘_id’: False}).
- sort (optional): a list of (key, direction) pairs specifying the sort order for the query. If multiple documents match the query, they are sorted and the first is updated.
- upsert (optional): When
True
, inserts a new document if no document matches the query. Defaults toFalse
. - return_document: If
ReturnDocument.BEFORE
(the default), returns the original document before it was updated. IfReturnDocument.AFTER
, returns the updated or inserted document. - array_filters (optional): A list of filters specifying which array elements an update should apply. This option is only supported on MongoDB 3.6 and above.
- hint (optional): An index to use to support the query
predicate specified either by its string name, or in the same
format as passed to
create_index()
(e.g.[('field', ASCENDING)]
). This option is only supported on MongoDB 4.4 and above. - session (optional): a
ClientSession
. - **kwargs (optional): additional command arguments can be passed as keyword arguments (for example maxTimeMS can be used with recent server versions).
Changed in version 3.11: Added the
hint
option.Changed in version 3.9: Added the ability to accept a pipeline as the
update
.Changed in version 3.6: Added the
array_filters
andsession
options.Changed in version 3.4: Added the
collation
option.Changed in version 3.2: Respects write concern.
Warning
Starting in PyMongo 3.2, this command uses the
WriteConcern
of thisCollection
when connected to MongoDB >= 3.2. Note that using an elevated write concern with this command may be slower compared to using the default write concern.New in version 3.0.
-
count_documents
(filter, session=None, **kwargs)¶ Count the number of documents in this collection.
Note
For a fast count of the total documents in a collection see
estimated_document_count()
.The
count_documents()
method is supported in a transaction.All optional parameters should be passed as keyword arguments to this method. Valid options include:
- skip (int): The number of matching documents to skip before returning results.
- limit (int): The maximum number of documents to count. Must be a positive integer. If not provided, no limit is imposed.
- maxTimeMS (int): The maximum amount of time to allow this operation to run, in milliseconds.
- collation (optional): An instance of
Collation
. This option is only supported on MongoDB 3.4 and above. - hint (string or list of tuples): The index to use. Specify either the index name as a string or the index specification as a list of tuples (e.g. [(‘a’, pymongo.ASCENDING), (‘b’, pymongo.ASCENDING)]). This option is only supported on MongoDB 3.6 and above.
The
count_documents()
method obeys theread_preference
of thisCollection
.Note
When migrating from
count()
to count_documents()
the following query operators must be replaced:

    Operator      Replacement
    $where        $expr
    $near         $geoWithin with $center
    $nearSphere   $geoWithin with $centerSphere

$expr requires MongoDB 3.6+
Parameters: - filter (required): A query document that selects which documents to count in the collection. Can be an empty document to count all documents.
- session (optional): a
ClientSession
. - **kwargs (optional): See list of options above.
New in version 3.7.
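A short sketch of the options above (values are illustrative):
>>> db.test.count_documents({'x': 1}, skip=5, limit=100, maxTimeMS=1000)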
-
estimated_document_count
(**kwargs)¶ Get an estimate of the number of documents in this collection using collection metadata.
The
estimated_document_count()
method is not supported in a transaction.All optional parameters should be passed as keyword arguments to this method. Valid options include:
- maxTimeMS (int): The maximum amount of time to allow this operation to run, in milliseconds.
Parameters: - **kwargs (optional): See list of options above.
New in version 3.7.
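For example (the maxTimeMS value is illustrative):
>>> db.test.estimated_document_count(maxTimeMS=500)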
-
distinct
(key, filter=None, session=None, **kwargs)¶ Get a list of distinct values for key among all documents in this collection.
Raises
TypeError
if key is not an instance ofbasestring
(str
in python 3).All optional distinct parameters should be passed as keyword arguments to this method. Valid options include:
- maxTimeMS (int): The maximum amount of time to allow the count command to run, in milliseconds.
- collation (optional): An instance of
Collation
. This option is only supported on MongoDB 3.4 and above.
The
distinct()
method obeys theread_preference
of thisCollection
.Parameters: - key: name of the field for which we want to get the distinct values
- filter (optional): A query document that specifies the documents from which to retrieve the distinct values.
- session (optional): a
ClientSession
. - **kwargs (optional): See list of options above.
Changed in version 3.6: Added
session
parameter.Changed in version 3.4: Support the collation option.
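A short sketch of distinct with a filter (field names are illustrative):
>>> db.test.distinct('x', {'y': {'$gt': 0}})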
-
create_index
(keys, session=None, **kwargs)¶ Creates an index on this collection.
Takes either a single key or a list of (key, direction) pairs. The key(s) must be an instance of
basestring
(str
in python 3), and the direction(s) must be one of (ASCENDING
,DESCENDING
,GEO2D
,GEOHAYSTACK
,GEOSPHERE
,HASHED
,TEXT
).To create a single key ascending index on the key
'mike'
we just use a string argument:>>> my_collection.create_index("mike")
For a compound index on
'mike'
descending and'eliot'
ascending we need to use a list of tuples:>>> my_collection.create_index([("mike", pymongo.DESCENDING), ... ("eliot", pymongo.ASCENDING)])
All optional index creation parameters should be passed as keyword arguments to this method. For example:
>>> my_collection.create_index([("mike", pymongo.DESCENDING)], ... background=True)
Valid options include, but are not limited to:
- name: custom name to use for this index - if none is given, a name will be generated.
- unique: if
True
, creates a uniqueness constraint on the index. - background: if
True
, this index should be created in the background. - sparse: if
True
, omit from the index any documents that lack the indexed field. - bucketSize: for use with geoHaystack indexes. Number of documents to group together within a certain proximity to a given longitude and latitude.
- min: minimum value for keys in a
GEO2D
index. - max: maximum value for keys in a
GEO2D
index. - expireAfterSeconds: <int> Used to create an expiring (TTL) collection. MongoDB will automatically delete documents from this collection after <int> seconds. The indexed field must be a UTC datetime or the data will not expire.
- partialFilterExpression: A document that specifies a filter for a partial index. Requires MongoDB >=3.2.
- collation (optional): An instance of
Collation
. Requires MongoDB >= 3.4. - wildcardProjection: Allows users to include or exclude specific field paths from a wildcard index using the {“$**” : 1} key pattern. Requires MongoDB >= 4.2.
- hidden: if
True
, this index will be hidden from the query planner and will not be evaluated as part of query plan selection. Requires MongoDB >= 4.4.
See the MongoDB documentation for a full list of supported options by server version.
Warning
dropDups is not supported by MongoDB 3.0 or newer. The option is silently ignored by the server and unique index builds using the option will fail if a duplicate value is detected.
Note
The
write_concern
of this collection is automatically applied to this operation when using MongoDB >= 3.4.Parameters: - keys: a single key or a list of (key, direction) pairs specifying the index to create
- session (optional): a
ClientSession
. - **kwargs (optional): any additional index creation options (see the above list) should be passed as keyword arguments
Changed in version 3.11: Added the
hidden
option.Changed in version 3.6: Added
session
parameter. Added support for passing maxTimeMS in kwargs.Changed in version 3.4: Apply this collection’s write concern automatically to this operation when connected to MongoDB >= 3.4. Support the collation option.
Changed in version 3.2: Added partialFilterExpression to support partial indexes.
Changed in version 3.0: Renamed key_or_list to keys. Removed the cache_for option.
create_index()
no longer caches index names. Removed support for the drop_dups and bucket_size aliases.
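A sketch of the expireAfterSeconds option above, creating a TTL index (the collection and field names are illustrative; the indexed field must hold UTC datetimes):
>>> db.log.create_index('created_at', expireAfterSeconds=3600)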
-
create_indexes
(indexes, session=None, **kwargs)¶ Create one or more indexes on this collection.
>>> from pymongo import IndexModel, ASCENDING, DESCENDING >>> index1 = IndexModel([("hello", DESCENDING), ... ("world", ASCENDING)], name="hello_world") >>> index2 = IndexModel([("goodbye", DESCENDING)]) >>> db.test.create_indexes([index1, index2]) ["hello_world", "goodbye_-1"]
Parameters: - indexes: A list of
IndexModel
instances. - session (optional): a
ClientSession
. - **kwargs (optional): optional arguments to the createIndexes command (like maxTimeMS) can be passed as keyword arguments.
Note
create_indexes uses the createIndexes command introduced in MongoDB 2.6 and cannot be used with earlier versions.
Note
The
write_concern
of this collection is automatically applied to this operation when using MongoDB >= 3.4.Changed in version 3.6: Added
session
parameter. Added support for arbitrary keyword arguments.Changed in version 3.4: Apply this collection’s write concern automatically to this operation when connected to MongoDB >= 3.4.
New in version 3.0.
-
drop_index
(index_or_name, session=None, **kwargs)¶ Drops the specified index on this collection.
Can be used on non-existent collections or collections with no indexes. Raises OperationFailure on an error (e.g. trying to drop an index that does not exist). index_or_name can be either an index name (as returned by create_index), or an index specifier (as passed to create_index). An index specifier should be a list of (key, direction) pairs. Raises TypeError if index is not an instance of (str, unicode, list).
Warning
if a custom name was used on index creation (by passing the name parameter to
create_index()
orensure_index()
) the index must be dropped by name.Parameters: - index_or_name: index (or name of index) to drop
- session (optional): a
ClientSession
. - **kwargs (optional): optional arguments to the dropIndexes command (like maxTimeMS) can be passed as keyword arguments.
Note
The
write_concern
of this collection is automatically applied to this operation when using MongoDB >= 3.4.Changed in version 3.6: Added
session
parameter. Added support for arbitrary keyword arguments.Changed in version 3.4: Apply this collection’s write concern automatically to this operation when connected to MongoDB >= 3.4.
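For example, dropping an index by the name returned from create_index() (names are illustrative):
>>> name = db.test.create_index('x')
>>> db.test.drop_index(name)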
-
drop_indexes
(session=None, **kwargs)¶ Drops all indexes on this collection.
Can be used on non-existent collections or collections with no indexes. Raises OperationFailure on an error.
Parameters: - session (optional): a
ClientSession
. - **kwargs (optional): optional arguments to the dropIndexes command (like maxTimeMS) can be passed as keyword arguments.
Note
The
write_concern
of this collection is automatically applied to this operation when using MongoDB >= 3.4.Changed in version 3.6: Added
session
parameter. Added support for arbitrary keyword arguments.Changed in version 3.4: Apply this collection’s write concern automatically to this operation when connected to MongoDB >= 3.4.
-
reindex
(session=None, **kwargs)¶ Rebuilds all indexes on this collection.
DEPRECATED - The
reindex()
method is deprecated and will be removed in PyMongo 4.0. Usecommand()
to run thereIndex
command directly instead:db.command({"reIndex": "<collection_name>"})
Note
Starting in MongoDB 4.6, the reIndex command can only be run when connected to a standalone mongod.
Parameters: - session (optional): a
ClientSession
. - **kwargs (optional): optional arguments to the reIndex command (like maxTimeMS) can be passed as keyword arguments.
Warning
reindex blocks all other operations (indexes are built in the foreground) and will be slow for large collections.
Changed in version 3.11: Deprecated.
Changed in version 3.6: Added
session
parameter. Added support for arbitrary keyword arguments.Changed in version 3.5: We no longer apply this collection’s write concern to this operation. MongoDB 3.4 silently ignored the write concern. MongoDB 3.6+ returns an error if we include the write concern.
Changed in version 3.4: Apply this collection’s write concern automatically to this operation when connected to MongoDB >= 3.4.
-
list_indexes
(session=None)¶ Get a cursor over the index documents for this collection.
>>> for index in db.test.list_indexes(): ... print(index) ... SON([('v', 2), ('key', SON([('_id', 1)])), ('name', '_id_')])
Parameters: - session (optional): a
ClientSession
.
Returns: An instance of
CommandCursor
.Changed in version 3.6: Added
session
parameter.New in version 3.0.
-
index_information
(session=None)¶ Get information on this collection’s indexes.
Returns a dictionary where the keys are index names (as returned by create_index()) and the values are dictionaries containing information about each index. The dictionary is guaranteed to contain at least a single key,
"key"
which is a list of (key, direction) pairs specifying the index (as passed to create_index()). It will also contain any other metadata about the indexes, except for the"ns"
and"name"
keys, which are cleaned. Example output might look like this:>>> db.test.create_index("x", unique=True) u'x_1' >>> db.test.index_information() {u'_id_': {u'key': [(u'_id', 1)]}, u'x_1': {u'unique': True, u'key': [(u'x', 1)]}}
Parameters: - session (optional): a
ClientSession
.
Changed in version 3.6: Added
session
parameter.
-
drop
(session=None)¶ Alias for
drop_collection()
.Parameters: - session (optional): a
ClientSession
.
The following two calls are equivalent:
>>> db.foo.drop() >>> db.drop_collection("foo")
Changed in version 3.7:
drop()
now respects thisCollection
’swrite_concern
.Changed in version 3.6: Added
session
parameter.
-
rename
(new_name, session=None, **kwargs)¶ Rename this collection.
If operating in auth mode, client must be authorized as an admin to perform this operation. Raises
TypeError
if new_name is not an instance ofbasestring
(str
in python 3). RaisesInvalidName
if new_name is not a valid collection name.Parameters: - new_name: new name for this collection
- session (optional): a
ClientSession
. - **kwargs (optional): additional arguments to the rename command
may be passed as keyword arguments to this helper method
(i.e.
dropTarget=True
)
Note
The
write_concern
of this collection is automatically applied to this operation when using MongoDB >= 3.4.Changed in version 3.6: Added
session
parameter.Changed in version 3.4: Apply this collection’s write concern automatically to this operation when connected to MongoDB >= 3.4.
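A sketch of rename with the dropTarget argument mentioned above (the target name is illustrative):
>>> db.test.rename('test_archive', dropTarget=True)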
-
options
(session=None)¶ Get the options set on this collection.
Returns a dictionary of options and their values - see
create_collection()
for more information on the possible options. Returns an empty dictionary if the collection has not been created yet.Parameters: - session (optional): a
ClientSession
.
Changed in version 3.6: Added
session
parameter.
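For example, for a collection created with options (names, values, and output are illustrative):
>>> db.create_collection('capped_coll', capped=True, size=4096)
>>> db.capped_coll.options()
{u'capped': True, u'size': 4096}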
-
map_reduce
(map, reduce, out, full_response=False, session=None, **kwargs)¶ Perform a map/reduce operation on this collection.
If full_response is
False
(default) returns aCollection
instance containing the results of the operation. Otherwise, returns the full response from the server to the map reduce command.Parameters: map: map function (as a JavaScript string)
reduce: reduce function (as a JavaScript string)
out: output collection name or out object (dict). See the map reduce command documentation for available options. Note: out options are order sensitive.
SON
can be used to specify multiple options, e.g. SON([('replace', <collection name>), ('db', <database name>)]).
full_response (optional): if
True
, return full response to this command - otherwise just return the result collectionsession (optional): a
ClientSession
.**kwargs (optional): additional arguments to the map reduce command may be passed as keyword arguments to this helper method, e.g.:
>>> db.test.map_reduce(map, reduce, "myresults", limit=2)
Note
The
map_reduce()
method does not obey theread_preference
of thisCollection
. To run mapReduce on a secondary use theinline_map_reduce()
method instead.Note
The
write_concern
of this collection is automatically applied to this operation (if the output is not inline) when using MongoDB >= 3.4.Changed in version 3.6: Added
session
parameter.Changed in version 3.4: Apply this collection’s write concern automatically to this operation when connected to MongoDB >= 3.4.
See also: the map reduce command documentation.
Changed in version 3.4: Added the collation option.
Changed in version 2.2: Removed deprecated arguments: merge_output and reduce_output
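A minimal map/reduce sketch (the JavaScript functions and output collection name are illustrative):
>>> from bson.code import Code
>>> mapper = Code("function () { emit(this.x, 1); }")
>>> reducer = Code("function (key, values) { return Array.sum(values); }")
>>> result = db.test.map_reduce(mapper, reducer, "myresults")
>>> for doc in result.find():
...     print(doc)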
-
inline_map_reduce
(map, reduce, full_response=False, session=None, **kwargs)¶ Perform an inline map/reduce operation on this collection.
Perform the map/reduce operation on the server in RAM. A result collection is not created. The result set is returned as a list of documents.
If full_response is
False
(default) returns the result documents in a list. Otherwise, returns the full response from the server to the map reduce command.The
inline_map_reduce()
method obeys theread_preference
of thisCollection
.Parameters: map: map function (as a JavaScript string)
reduce: reduce function (as a JavaScript string)
full_response (optional): if
True
, return full response to this command - otherwise just return the result collectionsession (optional): a
ClientSession
.**kwargs (optional): additional arguments to the map reduce command may be passed as keyword arguments to this helper method, e.g.:
>>> db.test.inline_map_reduce(map, reduce, limit=2)
Changed in version 3.6: Added
session
parameter.Changed in version 3.4: Added the collation option.
-
parallel_scan
(num_cursors, session=None, **kwargs)¶ DEPRECATED: Scan this entire collection in parallel.
Returns a list of up to
num_cursors
cursors that can be iterated concurrently. As long as the collection is not modified during scanning, each document appears once in one of the cursors result sets.For example, to process each document in a collection using some thread-safe
process_document()
function:>>> def process_cursor(cursor): ... for document in cursor: ... # Some thread-safe processing function: ... process_document(document) >>> >>> # Get up to 4 cursors. ... >>> cursors = collection.parallel_scan(4) >>> threads = [ ... threading.Thread(target=process_cursor, args=(cursor,)) ... for cursor in cursors] >>> >>> for thread in threads: ... thread.start() >>> >>> for thread in threads: ... thread.join() >>> >>> # All documents have now been processed.
The
parallel_scan()
method obeys theread_preference
of thisCollection
.Parameters: - num_cursors: the number of cursors to return
- session (optional): a
ClientSession
. - **kwargs: additional options for the parallelCollectionScan command can be passed as keyword arguments.
Note
Requires server version >= 2.5.5.
Changed in version 3.7: Deprecated.
Changed in version 3.6: Added
session
parameter.Changed in version 3.4: Added back support for arbitrary keyword arguments. MongoDB 3.4 adds support for maxTimeMS as an option to the parallelCollectionScan command.
Changed in version 3.0: Removed support for arbitrary keyword arguments, since the parallelCollectionScan command has no optional arguments.
-
initialize_unordered_bulk_op
(bypass_document_validation=False)¶ DEPRECATED - Initialize an unordered batch of write operations.
Operations will be performed on the server in arbitrary order, possibly in parallel. All operations will be attempted.
Parameters: - bypass_document_validation: (optional) If
True
, allows the write to opt-out of document level validation. Default isFalse
.
Returns a
BulkOperationBuilder
instance.See Unordered Bulk Write Operations for examples.
Note
bypass_document_validation requires server version >= 3.2
Changed in version 3.5: Deprecated. Use
bulk_write()
instead.Changed in version 3.2: Added bypass_document_validation support
New in version 2.7.
-
initialize_ordered_bulk_op
(bypass_document_validation=False)¶ DEPRECATED - Initialize an ordered batch of write operations.
Operations will be performed on the server serially, in the order provided. If an error occurs all remaining operations are aborted.
Parameters: - bypass_document_validation: (optional) If
True
, allows the write to opt-out of document level validation. Default isFalse
.
Returns a
BulkOperationBuilder
instance.See Ordered Bulk Write Operations for examples.
Note
bypass_document_validation requires server version >= 3.2
Changed in version 3.5: Deprecated. Use
bulk_write()
instead.Changed in version 3.2: Added bypass_document_validation support
New in version 2.7.
-
group
(key, condition, initial, reduce, finalize=None, **kwargs)¶ Perform a query similar to an SQL group by operation.
DEPRECATED - The group command was deprecated in MongoDB 3.4. The
group()
method is deprecated and will be removed in PyMongo 4.0. Useaggregate()
with the $group stage ormap_reduce()
instead.Changed in version 3.5: Deprecated the group method.
Changed in version 3.4: Added the collation option.
Changed in version 2.2: Removed deprecated argument: command
-
count
(filter=None, session=None, **kwargs)¶ DEPRECATED - Get the number of documents in this collection.
The
count()
method is deprecated and not supported in a transaction. Please use count_documents()
or estimated_document_count()
instead.
All optional count parameters should be passed as keyword arguments to this method. Valid options include:
- skip (int): The number of matching documents to skip before returning results.
- limit (int): The maximum number of documents to count. A limit of 0 (the default) is equivalent to setting no limit.
- maxTimeMS (int): The maximum amount of time to allow the count command to run, in milliseconds.
- collation (optional): An instance of
Collation
. This option is only supported on MongoDB 3.4 and above.
- hint (string or list of tuples): The index to use. Specify either the index name as a string or the index specification as a list of tuples (e.g. [('a', pymongo.ASCENDING), ('b', pymongo.ASCENDING)]).
The
count()
method obeys the read_preference
of this Collection.
Note
When migrating from
count()
to count_documents()
the following query operators must be replaced:

Operator       Replacement
$where         $expr
$near          $geoWithin with $center
$nearSphere    $geoWithin with $centerSphere

$expr requires MongoDB 3.6+
Parameters: - filter (optional): A query document that selects which documents to count in the collection.
- session (optional): a
ClientSession
. - **kwargs (optional): See list of options above.
Changed in version 3.7: Deprecated.
Changed in version 3.6: Added
session
parameter.
Changed in version 3.4: Support the collation option.
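As a hedged migration sketch (not part of the original reference), the deprecated count() calls above map onto the new helpers roughly as follows; the "orders" collection and its filters are hypothetical:

# Fast, metadata-based estimate of the total collection size:
total = db.orders.estimated_document_count()

# Exact count of matching documents, honoring skip/limit options:
shipped = db.orders.count_documents({"status": "shipped"}, skip=5, limit=100)

# $where is not supported by count_documents(); use $expr (MongoDB 3.6+):
n = db.orders.count_documents({"$expr": {"$lt": ["$a", "$b"]}})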
-
insert
(doc_or_docs, manipulate=True, check_keys=True, continue_on_error=False, **kwargs)¶ Insert a document or documents into this collection.
DEPRECATED - Use
insert_one()
or insert_many()
instead.
Changed in version 3.0: Removed the safe parameter. Pass
w=0
for unacknowledged write operations.
-
save
(to_save, manipulate=True, check_keys=True, **kwargs)¶ Save a document in this collection.
DEPRECATED - Use
insert_one()
or replace_one()
instead.
Changed in version 3.0: Removed the safe parameter. Pass
w=0
for unacknowledged write operations.
-
update
(spec, document, upsert=False, manipulate=False, multi=False, check_keys=True, **kwargs)¶ Update a document or documents in this collection.
DEPRECATED - Use
replace_one()
, update_one()
, or update_many()
instead.
Changed in version 3.0: Removed the safe parameter. Pass
w=0
for unacknowledged write operations.
-
remove
(spec_or_id=None, multi=True, **kwargs)¶ Remove a document or documents from this collection.
DEPRECATED - Use
delete_one()
or delete_many()
instead.
Changed in version 3.0: Removed the safe parameter. Pass
w=0
for unacknowledged write operations.
-
find_and_modify
(query={}, update=None, upsert=False, sort=None, full_response=False, manipulate=False, **kwargs)¶ Update and return an object.
DEPRECATED - Use
find_one_and_delete()
, find_one_and_replace()
, or find_one_and_update()
instead.
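The deprecated write helpers above all have direct modern replacements. A minimal sketch, assuming a hypothetical "things" collection:

things = db.things
things.insert_one({"_id": 1, "x": 1})          # replaces insert()/save()
things.replace_one({"_id": 1}, {"x": 2})       # replaces a full-document update()/save()
things.update_many({}, {"$inc": {"x": 1}})     # replaces update(multi=True)
things.delete_many({"x": {"$gt": 10}})         # replaces remove()
doc = things.find_one_and_update(              # replaces find_and_modify()
    {"_id": 1}, {"$set": {"x": 0}})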
-
ensure_index
(key_or_list, cache_for=300, **kwargs)¶ DEPRECATED - Ensures that an index exists on this collection.
Changed in version 3.0: DEPRECATED
command_cursor
– Tools for iterating over MongoDB command results¶
CommandCursor class to iterate over command results.
-
class
pymongo.command_cursor.
CommandCursor
(collection, cursor_info, address, retrieved=0, batch_size=0, max_await_time_ms=None, session=None, explicit_session=False)¶ Create a new command cursor.
The parameter ‘retrieved’ is unused.
-
address
¶ The (host, port) of the server used, or None.
New in version 3.0.
-
alive
¶ Does this cursor have the potential to return more data?
Even if
alive
is True
, next()
can raise StopIteration.
Best to use a for loop:

for doc in collection.aggregate(pipeline):
    print(doc)
-
batch_size
(batch_size)¶ Limits the number of documents returned in one batch. Each batch requires a round trip to the server. It can be adjusted to optimize performance and limit data transfer.
Note
batch_size cannot override MongoDB’s internal limits on the amount of data it will return to the client in a single batch (i.e. if you set batch size to 1,000,000,000, MongoDB will currently only return 4-16MB of results per batch).
Raises
TypeError
if batch_size is not an integer. Raises ValueError
if batch_size is less than 0.
Parameters: - batch_size: The size of each batch of results requested.
-
close
()¶ Explicitly close / kill this cursor.
-
cursor_id
¶ Returns the id of the cursor.
-
next
()¶ Advance the cursor.
-
session
¶ The cursor’s
ClientSession
, or None.
New in version 3.6.
-
-
class
pymongo.command_cursor.
RawBatchCommandCursor
(collection, cursor_info, address, retrieved=0, batch_size=0, max_await_time_ms=None, session=None, explicit_session=False)¶ Create a new cursor / iterator over raw batches of BSON data.
Should not be called directly by application developers - see
aggregate_raw_batches()
instead.
cursor
– Tools for iterating over MongoDB query results¶
Cursor class to iterate over Mongo query results.
-
class
pymongo.cursor.
CursorType
¶ -
NON_TAILABLE
¶ The standard cursor type.
-
TAILABLE
¶ The tailable cursor type.
Tailable cursors are only for use with capped collections. They are not closed when the last data is retrieved but are kept open and the cursor location marks the final document position. If more data is received iteration of the cursor will continue from the last document received.
-
TAILABLE_AWAIT
¶ A tailable cursor with the await option set.
Creates a tailable cursor that will wait for a few seconds after returning the full result set so that it can capture and return additional data added during the query.
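A hedged sketch of tailing a capped collection (the "events" collection and the sleep interval are illustrative, not from the original docs):

import time
import pymongo

cursor = db.events.find(cursor_type=pymongo.CursorType.TAILABLE_AWAIT)
while cursor.alive:
    try:
        doc = cursor.next()
        print(doc)
    except StopIteration:
        # No new data yet; the server already waited (AWAIT), so pause and retry.
        time.sleep(1)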
-
EXHAUST
¶ An exhaust cursor.
MongoDB will stream batched results to the client without waiting for the client to request each batch, reducing latency.
-
-
class
pymongo.cursor.
Cursor
(collection, filter=None, projection=None, skip=0, limit=0, no_cursor_timeout=False, cursor_type=CursorType.NON_TAILABLE, sort=None, allow_partial_results=False, oplog_replay=False, modifiers=None, batch_size=0, manipulate=True, collation=None, hint=None, max_scan=None, max_time_ms=None, max=None, min=None, return_key=False, show_record_id=False, snapshot=False, comment=None, session=None, allow_disk_use=None)¶ Create a new cursor.
Should not be called directly by application developers - see
find()
instead.-
c[index]
See
__getitem__()
.
-
__getitem__
(index)¶ Get a single document or a slice of documents from this cursor.
Raises
InvalidOperation
if this cursor has already been used.
To get a single document use an integral index, e.g.:
>>> db.test.find()[50]
An
IndexError
will be raised if the index is negative or greater than the number of documents in this cursor. Any limit previously applied to this cursor will be ignored.
To get a slice of documents use a slice index, e.g.:
>>> db.test.find()[20:25]
This will return this cursor with a limit of
5
and skip of 20
applied. Using a slice index will override any prior limits or skips applied to this cursor (including those applied through previous calls to this method). Raises IndexError
when the slice has a step, a negative start value, or a stop value less than or equal to the start value.
Parameters: - index: An integer or slice index to be applied to this cursor
-
add_option
(mask)¶ Set arbitrary query flags using a bitmask.
To set the tailable flag: cursor.add_option(2)
-
address
¶ The (host, port) of the server used, or None.
Changed in version 3.0: Renamed from “conn_id”.
-
alive
¶ Does this cursor have the potential to return more data?
This is mostly useful with tailable cursors since they will stop iterating even though they may return more results in the future.
With regular cursors, simply use a for loop instead of
alive
:

for doc in collection.find():
    print(doc)
-
allow_disk_use
(allow_disk_use)¶ Specifies whether MongoDB can use temporary disk files while processing a blocking sort operation.
Raises
TypeError
if allow_disk_use is not a boolean.
Note
allow_disk_use requires server version >= 4.4
Parameters: - allow_disk_use: if True, MongoDB may use temporary disk files to store data exceeding the system memory limit while processing a blocking sort operation.
New in version 3.11.
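A minimal sketch (the field and collection names are hypothetical): allow the server to spill a large blocking sort to disk on MongoDB 4.4+:

for doc in db.test.find().sort('unindexed_field').allow_disk_use(True):
    print(doc)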
-
batch_size
(batch_size)¶ Limits the number of documents returned in one batch. Each batch requires a round trip to the server. It can be adjusted to optimize performance and limit data transfer.
Note
batch_size cannot override MongoDB’s internal limits on the amount of data it will return to the client in a single batch (i.e. if you set batch size to 1,000,000,000, MongoDB will currently only return 4-16MB of results per batch).
Raises
TypeError
if batch_size is not an integer. Raises ValueError
if batch_size is less than 0. Raises InvalidOperation
if this Cursor
has already been used. The last batch_size applied to this cursor takes precedence.
Parameters: - batch_size: The size of each batch of results requested.
-
clone
()¶ Get a clone of this cursor.
Returns a new Cursor instance with options matching those that have been set on the current instance. The clone will be completely unevaluated, even if the current instance has been partially or completely evaluated.
-
close
()¶ Explicitly close / kill this cursor.
-
collation
(collation)¶ Adds a
Collation
to this query.
This option is only supported on MongoDB 3.4 and above.
Raises
TypeError
if collation is not an instance of Collation
or a dict. Raises InvalidOperation
if this Cursor
has already been used. Only the last collation applied to this cursor has any effect.
Parameters: - collation: An instance of
Collation
.
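A short, hedged sketch of a case-insensitive query via a collation (MongoDB 3.4+); the collection and field names are hypothetical:

from pymongo.collation import Collation

for doc in db.contacts.find({'name': 'anna'}).collation(
        Collation(locale='en_US', strength=2)):
    print(doc)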
-
collection
¶ The
Collection
that this Cursor
is iterating.
-
comment
(comment)¶ Adds a ‘comment’ to the cursor.
http://docs.mongodb.org/manual/reference/operator/comment/
Parameters: - comment: A string to attach to the query to help interpret and trace the operation in the server logs and in profile data.
New in version 2.7.
-
count
(with_limit_and_skip=False)¶ DEPRECATED - Get the size of the results set for this query.
The
count()
method is deprecated and not supported in a transaction. Please use count_documents()
instead.
Returns the number of documents in the results set for this query. Does not take
limit()
and skip()
into account by default - set with_limit_and_skip to True
if that is the desired behavior. Raises OperationFailure
on a database error.
When used with MongoDB >= 2.6,
count()
uses any hint()
applied to the query. In the following example the hint is passed to the count command:

collection.find({'field': 'value'}).hint('field_1').count()

The
count()
method obeys the read_preference
of the Collection
instance on which find()
was called.
Parameters:
Note
The with_limit_and_skip parameter requires server version >= 1.1.4-
Changed in version 3.7: Deprecated.
-
cursor_id
¶ Returns the id of the cursor.
Useful if you need to manage cursor ids and want to handle killing cursors manually using
kill_cursors()
New in version 2.2.
-
distinct
(key)¶ Get a list of distinct values for key among all documents in the result set of this query.
Raises
TypeError
if key is not an instance of basestring
(str
in Python 3).
The
distinct()
method obeys the read_preference
of the Collection
instance on which find()
was called.
Parameters: - key: name of key for which we want to get the distinct values
-
explain
()¶ Returns an explain plan record for this cursor.
Note
Starting with MongoDB 3.2
explain()
uses the default verbosity mode of the explain command, allPlansExecution.
To use a different verbosity use command()
to run the explain command directly.
-
hint
(index)¶ Adds a ‘hint’, telling Mongo the proper index to use for the query.
Judicious use of hints can greatly improve query performance. When doing a query on multiple fields (at least one of which is indexed) pass the indexed field as a hint to the query. Raises
OperationFailure
if the provided hint requires an index that does not exist on this collection, and raises InvalidOperation
if this cursor has already been used.
index should be an index as passed to
create_index()
(e.g. [('field', ASCENDING)]
) or the name of the index. If index is None,
any existing hint for this query is cleared. The last hint applied to this cursor takes precedence over all others.
Parameters: - index: index to hint on (as an index specifier)
Changed in version 2.8: The
hint()
method accepts the name of the index.
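A brief sketch of both forms, assuming an index on 'field' already exists:

from pymongo import ASCENDING

cursor = db.test.find({'field': 1, 'other': 2}).hint([('field', ASCENDING)])
# Or, equivalently, by index name:
cursor = db.test.find({'field': 1, 'other': 2}).hint('field_1')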
-
limit
(limit)¶ Limits the number of results to be returned by this cursor.
Raises
TypeError
if limit is not an integer. Raises InvalidOperation
if this Cursor
has already been used. The last limit applied to this cursor takes precedence. A limit of 0
is equivalent to no limit.
Parameters: - limit: the number of results to return
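As a hedged illustration (the page size and sort key are arbitrary), skip() and limit() combine naturally for pagination; a stable sort keeps pages consistent between requests:

page, per_page = 3, 20
for doc in db.test.find().sort('_id').skip((page - 1) * per_page).limit(per_page):
    print(doc)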
-
max
(spec)¶ Adds
max
operator that specifies the upper bound for a specific index.
When using
max
, hint()
should also be configured to ensure the query uses the expected index and, starting in MongoDB 4.2, hint()
will be required.Parameters: - spec: a list of field, limit pairs specifying the exclusive upper bound for all keys of a specific index in order.
Changed in version 3.8: Deprecated cursors that use
max
without ahint()
.New in version 2.7.
-
max_await_time_ms
(max_await_time_ms)¶ Specifies a time limit for a getMore operation on a
TAILABLE_AWAIT
cursor. For all other types of cursor max_await_time_ms is ignored.Raises
TypeError
if max_await_time_ms is not an integer orNone
. RaisesInvalidOperation
if thisCursor
has already been used.Note
max_await_time_ms requires server version >= 3.2
Parameters: - max_await_time_ms: the time limit after which the operation is aborted
New in version 3.2.
-
max_scan
(max_scan)¶ DEPRECATED - Limit the number of documents to scan when performing the query.
Raises
InvalidOperation
if this cursor has already been used. Only the lastmax_scan()
applied to this cursor has any effect.Parameters: - max_scan: the maximum number of documents to scan
Changed in version 3.7: Deprecated
max_scan()
. Support for this option is deprecated in MongoDB 4.0. Usemax_time_ms()
instead to limit server side execution time.
-
max_time_ms
(max_time_ms)¶ Specifies a time limit for a query operation. If the specified time is exceeded, the operation will be aborted and
ExecutionTimeout
is raised. If max_time_ms is None,
no limit is applied.
Raises
TypeError
if max_time_ms is not an integer or None
. Raises InvalidOperation
if this Cursor
has already been used.Parameters: - max_time_ms: the time limit after which the operation is aborted
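A small, hedged sketch of bounding execution time and handling the resulting error; the 500 ms budget is arbitrary:

from pymongo.errors import ExecutionTimeout

try:
    docs = list(db.test.find({'field': 'value'}).max_time_ms(500))
except ExecutionTimeout:
    print('query exceeded its server-side time limit')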
-
min
(spec)¶ Adds
min
operator that specifies the lower bound for a specific index.
When using
min
, hint()
should also be configured to ensure the query uses the expected index and, starting in MongoDB 4.2, hint()
will be required.Parameters: - spec: a list of field, limit pairs specifying the inclusive lower bound for all keys of a specific index in order.
Changed in version 3.8: Deprecated cursors that use
min
without ahint()
.New in version 2.7.
-
next
()¶ Advance the cursor.
-
remove_option
(mask)¶ Unset arbitrary query flags using a bitmask.
To unset the tailable flag: cursor.remove_option(2)
-
retrieved
¶ The number of documents retrieved so far.
-
rewind
()¶ Rewind this cursor to its unevaluated state.
Reset this cursor if it has been partially or completely evaluated. Any options that are present on the cursor will remain in effect. Future iterating performed on this cursor will cause new queries to be sent to the server, even if the resultant data has already been retrieved by this cursor.
-
session
¶ The cursor’s
ClientSession
, or None.
New in version 3.6.
-
skip
(skip)¶ Skips the first skip results of this cursor.
Raises
TypeError
if skip is not an integer. Raises ValueError
if skip is less than 0. Raises InvalidOperation
if this Cursor
has already been used. The last skip applied to this cursor takes precedence.
Parameters: - skip: the number of results to skip
-
sort
(key_or_list, direction=None)¶ Sorts this cursor’s results.
Pass a field name and a direction, either
ASCENDING
or DESCENDING
:

for doc in collection.find().sort('field', pymongo.ASCENDING):
    print(doc)
To sort by multiple fields, pass a list of (key, direction) pairs:
for doc in collection.find().sort([
        ('field1', pymongo.ASCENDING),
        ('field2', pymongo.DESCENDING)]):
    print(doc)
Beginning with MongoDB version 2.6, text search results can be sorted by relevance:
cursor = db.test.find(
    {'$text': {'$search': 'some words'}},
    {'score': {'$meta': 'textScore'}})

# Sort by 'score' field.
cursor.sort([('score', {'$meta': 'textScore'})])

for doc in cursor:
    print(doc)
For more advanced text search functionality, see MongoDB’s Atlas Search.
Raises
InvalidOperation
if this cursor has already been used. Only the last sort()
applied to this cursor has any effect.
Parameters: - key_or_list: a single key or a list of (key, direction) pairs specifying the keys to sort on
- direction (optional): only used if key_or_list is a single
key, if not given
ASCENDING
is assumed
-
where
(code)¶ Adds a $where clause to this query.
The code argument must be an instance of
basestring
(str
in Python 3) or Code
containing a JavaScript expression. This expression will be evaluated for each document scanned. Only those documents for which the expression evaluates to true will be returned as results. The keyword this refers to the object currently being scanned. For example:

# Find all documents where field "a" is less than "b" plus "c".
for doc in db.test.find().where('this.a < (this.b + this.c)'):
    print(doc)
Raises
TypeError
if code is not an instance of basestring
(str
in Python 3). Raises InvalidOperation
if this Cursor
has already been used. Only the last call to where()
applied to aCursor
has any effect.Parameters: - code: JavaScript expression to use as a filter
-
-
class
pymongo.cursor.
RawBatchCursor
(collection, filter=None, projection=None, skip=0, limit=0, no_cursor_timeout=False, cursor_type=CursorType.NON_TAILABLE, sort=None, allow_partial_results=False, oplog_replay=False, modifiers=None, batch_size=0, collation=None, hint=None, max_scan=None, max_time_ms=None, max=None, min=None, return_key=False, show_record_id=False, snapshot=False, comment=None, allow_disk_use=None)¶ Create a new cursor / iterator over raw batches of BSON data.
Should not be called directly by application developers - see
find_raw_batches()
instead.
cursor_manager
– Managers to handle when cursors are killed after being closed¶
DEPRECATED - A manager to handle when cursors are killed after they are closed.
New cursor managers should be defined as subclasses of CursorManager and can be
installed on a client by calling
set_cursor_manager()
.
Changed in version 3.3: Deprecated, for real this time.
Changed in version 3.0: Undeprecated. close()
now
requires an address argument. The BatchCursorManager
class is removed.
-
class
pymongo.cursor_manager.
CursorManager
(client)¶ Instantiate the manager.
Parameters: - client: a MongoClient
-
close
(cursor_id, address)¶ Kill a cursor.
Raises TypeError if cursor_id is not an instance of (int, long).
Parameters: - cursor_id: cursor id to close
- address: the cursor’s server’s (host, port) pair
Changed in version 3.0: Now requires an address argument.
database
– Database level operations¶
Database level operations.
-
pymongo.auth.
MECHANISMS
= frozenset({'MONGODB-AWS', 'MONGODB-CR', 'DEFAULT', 'MONGODB-X509', 'SCRAM-SHA-256', 'PLAIN', 'SCRAM-SHA-1', 'GSSAPI'})¶ The authentication mechanisms supported by PyMongo.
-
pymongo.
OFF
= 0¶ No database profiling.
-
pymongo.
SLOW_ONLY
= 1¶ Only profile slow operations.
-
pymongo.
ALL
= 2¶ Profile all operations.
-
class
pymongo.database.
Database
(client, name, codec_options=None, read_preference=None, write_concern=None, read_concern=None)¶ Get a database by client and name.
Raises
TypeError
if name is not an instance of basestring
(str
in Python 3). Raises InvalidName
if name is not a valid database name.
Parameters: - client: A
MongoClient
instance. - name: The database name.
- codec_options (optional): An instance of
CodecOptions
. If None
(the default) client.codec_options is used.
- read_preference (optional): The read preference to use. If None
(the default) client.read_preference is used.
- write_concern (optional): An instance of
WriteConcern
. If None
(the default) client.write_concern is used.
- read_concern (optional): An instance of
ReadConcern
. If None
(the default) client.read_concern is used.
Changed in version 3.2: Added the read_concern option.
Changed in version 3.0: Added the codec_options, read_preference, and write_concern options.
Database
no longer returns an instance of Collection
for attribute names with leading underscores. You must use dict-style lookups instead:

db['__my_collection__']

Not:

db.__my_collection__
-
db[collection_name] || db.collection_name
Get the collection_name
Collection
of Database
db.
Raises
InvalidName
if an invalid collection name is used.
Note
Use dictionary style access if collection_name is an attribute of the
Database
class, e.g. db[collection_name].
-
codec_options
¶ Read only access to the
CodecOptions
of this instance.
-
read_preference
¶ Read only access to the read preference of this instance.
Changed in version 3.0: The
read_preference
attribute is now read only.
-
write_concern
¶ Read only access to the
WriteConcern
of this instance.Changed in version 3.0: The
write_concern
attribute is now read only.
-
read_concern
¶ Read only access to the
ReadConcern
of this instance.New in version 3.2.
-
add_son_manipulator
(manipulator)¶ Add a new son manipulator to this database.
DEPRECATED - add_son_manipulator is deprecated.
Changed in version 3.0: Deprecated add_son_manipulator.
-
add_user
(name, password=None, read_only=None, session=None, **kwargs)¶ DEPRECATED: Create user name with password password.
Add a new user with permissions for this
Database
.Note
Will change the password if user name already exists.
Note
add_user is deprecated and will be removed in PyMongo 4.0. Starting with MongoDB 2.6 user management is handled with four database commands, createUser, usersInfo, updateUser, and dropUser.
To create a user:
db.command("createUser", "admin", pwd="password", roles=["root"])
To create a read-only user:
db.command("createUser", "user", pwd="password", roles=["read"])
To change a password:
db.command("updateUser", "user", pwd="newpassword")
Or change roles:
db.command("updateUser", "user", roles=["readWrite"])
Warning
Never create or modify users over an insecure network without the use of TLS. See TLS/SSL and PyMongo for more information.
Parameters: - name: the name of the user to create
- password (optional): the password of the user to create. Cannot
be used with the
userSource
argument. - read_only (optional): if
True
the user will be read only - **kwargs (optional): optional fields for the user document
(e.g.
userSource
,otherDBRoles
, orroles
). See http://docs.mongodb.org/manual/reference/privilege-documents for more information. - session (optional): a
ClientSession
.
Changed in version 3.7: Added support for SCRAM-SHA-256 users with MongoDB 4.0 and later.
Changed in version 3.6: Added
session
parameter. Deprecated add_user.Changed in version 2.5: Added kwargs support for optional fields introduced in MongoDB 2.4
Changed in version 2.2: Added support for read only users
-
aggregate
(pipeline, session=None, **kwargs)¶ Perform a database-level aggregation.
See the aggregation pipeline documentation for a list of stages that are supported.
Introduced in MongoDB 3.6.
# Lists all operations currently running on the server.
with client.admin.aggregate([{"$currentOp": {}}]) as cursor:
    for operation in cursor:
        print(operation)
All optional aggregate command parameters should be passed as keyword arguments to this method. Valid options include, but are not limited to:
- allowDiskUse (bool): Enables writing to temporary files. When set to True, aggregation stages can write data to the _tmp subdirectory of the --dbpath directory. The default is False.
- maxTimeMS (int): The maximum amount of time to allow the operation to run in milliseconds.
- batchSize (int): The maximum number of documents to return per batch. Ignored if the connected mongod or mongos does not support returning aggregate results using a cursor.
- collation (optional): An instance of
Collation
.
The
aggregate()
method obeys the read_preference
of this Database
, except when $out
or $merge
are used, in which case PRIMARY
is used.
Note
This method does not support the ‘explain’ option. Please use
command()
instead.
Note
The
write_concern
of this database is automatically applied to this operation.
Parameters: - pipeline: a list of aggregation pipeline stages
- session (optional): a
ClientSession
. - **kwargs (optional): See list of options above.
Returns: A
CommandCursor
over the result set.
New in version 3.9.
-
authenticate
(name=None, password=None, source=None, mechanism='DEFAULT', **kwargs)¶ DEPRECATED: Authenticate to use this database.
Warning
Starting in MongoDB 3.6, calling
authenticate()
invalidates all existing cursors. It may also leave logical sessions open on the server for up to 30 minutes until they time out.Authentication lasts for the life of the underlying client instance, or until
logout()
is called.Raises
TypeError
if (required) name, (optional) password, or (optional) source is not an instance of basestring
(str
in Python 3).
Note
- This method authenticates the current connection, and
will also cause all new
socket
connections in the underlying client instance to be authenticated automatically. - Authenticating more than once on the same database with different
credentials is not supported. You must call
logout()
before authenticating with new credentials. - When sharing a client instance between multiple threads, all threads will share the authentication. If you need different authentication profiles for different purposes you must use distinct client instances.
Parameters: - name: the name of the user to authenticate. Optional when mechanism is MONGODB-X509 and the MongoDB server version is >= 3.4.
- password (optional): the password of the user to authenticate. Not used with GSSAPI or MONGODB-X509 authentication.
- source (optional): the database to authenticate on. If not specified the current database is used.
- mechanism (optional): See
MECHANISMS
for options. If no mechanism is specified, PyMongo automatically uses MONGODB-CR when connected to a pre-3.0 version of MongoDB, SCRAM-SHA-1 when connected to MongoDB 3.0 through 3.6, and negotiates the mechanism to use (SCRAM-SHA-1 or SCRAM-SHA-256) when connected to MongoDB 4.0+. - authMechanismProperties (optional): Used to specify
authentication mechanism specific options. To specify the service
name for GSSAPI authentication pass
authMechanismProperties='SERVICE_NAME:<service name>'
. To specify the session token for MONGODB-AWS authentication passauthMechanismProperties='AWS_SESSION_TOKEN:<session token>'
.
Changed in version 3.7: Added support for SCRAM-SHA-256 with MongoDB 4.0 and later.
Changed in version 3.5: Deprecated. Authenticating multiple users conflicts with support for logical sessions in MongoDB 3.6. To authenticate as multiple users, create multiple instances of MongoClient.
New in version 2.8: Use SCRAM-SHA-1 with MongoDB 3.0 and later.
Changed in version 2.5: Added the source and mechanism parameters.
authenticate()
now raises a subclass of PyMongoError
if authentication fails due to invalid credentials or configuration issues.
-
collection_names
(include_system_collections=True, session=None)¶ DEPRECATED: Get a list of all the collection names in this database.
Parameters: - include_system_collections (optional): if
False
list will not include system collections (e.g. system.indexes
) - session (optional): a
ClientSession
.
Changed in version 3.7: Deprecated. Use
list_collection_names()
instead.
Changed in version 3.6: Added
session
parameter.
-
command
(command, value=1, check=True, allowable_errors=None, read_preference=None, codec_options=CodecOptions(document_class=dict, tz_aware=False, uuid_representation=UuidRepresentation.PYTHON_LEGACY, unicode_decode_error_handler='strict', tzinfo=None, type_registry=TypeRegistry(type_codecs=[], fallback_encoder=None)), session=None, **kwargs)¶ Issue a MongoDB command.
Send command command to the database and return the response. If command is an instance of
basestring
(str
in Python 3) then the command {command: value} will be sent. Otherwise, command must be an instance of dict
and will be sent as is.Any additional keyword arguments will be added to the final command document before it is sent.
For example, a command like
{buildinfo: 1}
can be sent using:>>> db.command("buildinfo")
For a command where the value matters, like
{collstats: collection_name}
we can do:>>> db.command("collstats", collection_name)
For commands that take additional arguments we can use kwargs. So
{filemd5: object_id, root: file_root}
becomes:>>> db.command("filemd5", object_id, root=file_root)
Parameters: command: document representing the command to be issued, or the name of the command (for simple commands only).
Note
the order of keys in the command document is significant (the “verb” must come first), so commands which require multiple keys (e.g. findandmodify) should use an instance of
SON
or a string and kwargs instead of a Python dict.value (optional): value to use for the command verb when command is passed as a string
check (optional): check the response for errors, raising
OperationFailure
if there are anyallowable_errors: if check is
True
, error messages in this list will be ignored by error-checkingread_preference (optional): The read preference for this operation. See
read_preferences
for options. If the provided session is in a transaction, defaults to the read preference configured for the transaction. Otherwise, defaults to PRIMARY.
codec_options: A
CodecOptions
instance.
session (optional): A
ClientSession
.
**kwargs (optional): additional keyword arguments will be added to the command document before it is sent
Note
command()
does not obey this Database’sread_preference
orcodec_options
. You must use the read_preference and codec_options parameters instead.Note
command()
does not apply any custom TypeDecoders when decoding the command response.Changed in version 3.6: Added
session
parameter.
Changed in version 3.0: Removed the as_class, fields, uuid_subtype, tag_sets, and secondary_acceptable_latency_ms options. Removed the compile_re option: PyMongo now always represents BSON regular expressions as
Regex
objects. Use try_compile()
to attempt to convert from a BSON regular expression to a Python regular expression object. Added the codec_options parameter.Changed in version 2.7: Added compile_re option. If set to False, PyMongo represented BSON regular expressions as
Regex
objects instead of attempting to compile BSON regular expressions as Python native regular expressions, thus preventing errors for some incompatible patterns, see PYTHON-500.Changed in version 2.3: Added tag_sets and secondary_acceptable_latency_ms options.
Changed in version 2.2: Added support for as_class - the class you want to use for the resulting documents
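As a hedged illustration of the key-order note above, a multi-key command can be built with SON, or with the string-plus-kwargs form; the 'distinct' command against a hypothetical 'test' collection is used here:

from bson.son import SON

result = db.command(SON([('distinct', 'test'), ('key', 'field')]))
# Equivalent string-plus-kwargs form:
result = db.command('distinct', 'test', key='field')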
-
create_collection
(name, codec_options=None, read_preference=None, write_concern=None, read_concern=None, session=None, **kwargs)¶ Create a new
Collection
in this database.Normally collection creation is automatic. This method should only be used to specify options on creation.
CollectionInvalid
will be raised if the collection already exists.Options should be passed as keyword arguments to this method. Supported options vary with MongoDB release. Some examples include:
- “size”: desired initial size for the collection (in bytes). For capped collections this size is the max size of the collection.
- “capped”: if True, this is a capped collection
- “max”: maximum number of objects if capped (optional)
See the MongoDB documentation for a full list of supported options by server version.
Parameters: - name: the name of the collection to create
- codec_options (optional): An instance of
CodecOptions
. If None
(the default) the codec_options
of this Database
is used.
- read_preference (optional): The read preference to use. If None
(the default) the read_preference
of this Database
is used.
- write_concern (optional): An instance of
WriteConcern
. If None
(the default) the write_concern
of this Database
is used.
- read_concern (optional): An instance of
ReadConcern
. If None
(the default) the read_concern
of this Database
is used.
- collation (optional): An instance of
Collation
. - session (optional): a
ClientSession
. - **kwargs (optional): additional keyword arguments will be passed as options for the create collection command
Changed in version 3.11: This method is now supported inside multi-document transactions with MongoDB 4.4+.
Changed in version 3.6: Added
session
parameter.
Changed in version 3.4: Added the collation option.
Changed in version 3.0: Added the codec_options, read_preference, and write_concern options.
Changed in version 2.2: Removed deprecated argument: options
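A minimal sketch (the names and sizes are illustrative) of creating a capped collection with explicit options:

log = db.create_collection(
    'log', capped=True, size=1024 * 1024, max=1000)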
-
current_op
(include_all=False, session=None)¶ DEPRECATED: Get information on operations currently running.
Starting with MongoDB 3.6 this helper is obsolete. The functionality provided by this helper is available in MongoDB 3.6+ using the $currentOp aggregation pipeline stage, which can be used with
aggregate()
. Note that, while this helper can only return a single document limited to a 16MB result, aggregate()
returns a cursor avoiding that limitation.
Users of MongoDB versions older than 3.6 can use the currentOp command directly:
# MongoDB 3.2 and 3.4
client.admin.command("currentOp")
Or query the “inprog” virtual collection:
# MongoDB 2.6 and 3.0
client.admin["$cmd.sys.inprog"].find_one()
Parameters: - include_all (optional): if
True
also list currently idle operations in the result - session (optional): a
ClientSession
.
Changed in version 3.9: Deprecated.
Changed in version 3.6: Added
session
parameter.
-
dereference
(dbref, session=None, **kwargs)¶ Dereference a
DBRef
, getting the document it points to.Raises
TypeError
if dbref is not an instance of DBRef.
Returns a document, or None
if the reference does not point to a valid document. Raises ValueError
if dbref has a database specified that is different from the current database.Parameters: - dbref: the reference
- session (optional): a
ClientSession
. - **kwargs (optional): any additional keyword arguments
are the same as the arguments to
find()
.
Changed in version 3.6: Added
session
parameter.
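A short, hedged sketch of storing and following a reference; the 'people' collection is hypothetical:

from bson.dbref import DBRef

oid = db.people.insert_one({'name': 'Ada'}).inserted_id
assert db.dereference(DBRef('people', oid))['name'] == 'Ada'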
-
drop_collection
(name_or_collection, session=None)¶ Drop a collection.
Parameters: - name_or_collection: the name of a collection to drop or the collection object itself
- session (optional): a
ClientSession
.
Note
The
write_concern
of this database is automatically applied to this operation when using MongoDB >= 3.4.
Changed in version 3.6: Added
session
parameter.
Changed in version 3.4: Apply this database’s write concern automatically to this operation when connected to MongoDB >= 3.4.
-
error
()¶ DEPRECATED: Get the error if one occurred on the last operation.
This method is obsolete: all MongoDB write operations (insert, update, remove, and so on) use the write concern
w=1
and report their errors by default.Changed in version 2.8: Deprecated.
-
eval
(code, *args)¶ DEPRECATED: Evaluate a JavaScript expression in MongoDB.
Parameters: - code: string representation of JavaScript code to be evaluated
- args (optional): additional positional arguments are passed to the code being evaluated
Warning
the eval command is deprecated in MongoDB 3.0 and will be removed in a future server version.
-
get_collection
(name, codec_options=None, read_preference=None, write_concern=None, read_concern=None)¶ Get a
Collection
with the given name and options.Useful for creating a
Collection
with different codec options, read preference, and/or write concern from thisDatabase
>>> db.read_preference
Primary()
>>> coll1 = db.test
>>> coll1.read_preference
Primary()
>>> from pymongo import ReadPreference
>>> coll2 = db.get_collection(
...     'test', read_preference=ReadPreference.SECONDARY)
>>> coll2.read_preference
Secondary(tag_sets=None)
Parameters: - name: The name of the collection - a string.
- codec_options (optional): An instance of
CodecOptions
. If None
(the default) the codec_options
of this Database
is used.
- read_preference (optional): The read preference to use. If None
(the default) the read_preference
of this Database
is used. See read_preferences
for options.
- write_concern (optional): An instance of
WriteConcern
. If None
(the default) the write_concern
of this Database
is used.
- read_concern (optional): An instance of
ReadConcern
. If None
(the default) the read_concern
of this Database
is used.
-
incoming_copying_manipulators
¶ DEPRECATED: All incoming SON copying manipulators.
Changed in version 3.5: Deprecated.
New in version 2.0.
-
incoming_manipulators
¶ DEPRECATED: All incoming SON manipulators.
Changed in version 3.5: Deprecated.
New in version 2.0.
-
last_status
()¶ DEPRECATED: Get status information from the last operation.
This method is obsolete: all MongoDB write operations (insert, update, remove, and so on) use the write concern
w=1
and report their errors by default.Returns a SON object with status information.
Changed in version 2.8: Deprecated.
-
list_collection_names
(session=None, filter=None, **kwargs)¶ Get a list of all the collection names in this database.
For example, to list all non-system collections:
filter = {"name": {"$regex": r"^(?!system\.)"}}
db.list_collection_names(filter=filter)
Parameters: - session (optional): a
ClientSession
. - filter (optional): A query document to filter the list of collections returned from the listCollections command.
- **kwargs (optional): Optional parameters of the listCollections command can be passed as keyword arguments to this method. The supported options differ by server version.
Changed in version 3.8: Added the
filter
and **kwargs
parameters.
New in version 3.6.
-
list_collections
(session=None, filter=None, **kwargs)¶ Get a cursor over the collections of this database.
Parameters: - session (optional): a
ClientSession
. - filter (optional): A query document to filter the list of collections returned from the listCollections command.
- **kwargs (optional): Optional parameters of the listCollections command can be passed as keyword arguments to this method. The supported options differ by server version.
Returns: An instance of
CommandCursor
.New in version 3.6.
-
logout
()¶ DEPRECATED: Deauthorize use of this database.
Warning
Starting in MongoDB 3.6, calling
logout()
invalidates all existing cursors. It may also leave logical sessions open on the server for up to 30 minutes until they time out.
-
outgoing_copying_manipulators
¶ DEPRECATED: All outgoing SON copying manipulators.
Changed in version 3.5: Deprecated.
New in version 2.0.
-
outgoing_manipulators
¶ DEPRECATED: All outgoing SON manipulators.
Changed in version 3.5: Deprecated.
New in version 2.0.
-
previous_error
()¶ DEPRECATED: Get the most recent error on this database.
This method is obsolete: all MongoDB write operations (insert, update, remove, and so on) use the write concern
w=1
and report their errors by default.Only returns errors that have occurred since the last call to
reset_error_history()
. Returns None if no such errors have occurred.Changed in version 2.8: Deprecated.
-
profiling_info
(session=None)¶ Returns a list containing current profiling information.
Parameters: - session (optional): a
ClientSession
.
Changed in version 3.6: Added
session
parameter.
-
profiling_level
(session=None)¶ Get the database’s current profiling level.
Returns one of (
OFF
,SLOW_ONLY
,ALL
).Parameters: - session (optional): a
ClientSession
.
Changed in version 3.6: Added
session
parameter.
-
remove_user
(name, session=None)¶ DEPRECATED: Remove user name from this
Database
.User name will no longer have permissions to access this
Database
.Note
remove_user is deprecated and will be removed in PyMongo 4.0. Use the dropUser command instead:
db.command("dropUser", "user")
Parameters: - name: the name of the user to remove
- session (optional): a
ClientSession
.
Changed in version 3.6: Added
session
parameter. Deprecated remove_user.
-
reset_error_history
()¶ DEPRECATED: Reset the error history of this database.
This method is obsolete: all MongoDB write operations (insert, update, remove, and so on) use the write concern
w=1
and report their errors by default.Calls to
previous_error()
will only return errors that have occurred since the most recent call to this method.Changed in version 2.8: Deprecated.
-
set_profiling_level
(level, slow_ms=None, session=None)¶ Set the database’s profiling level.
Parameters: - level: Specifies a profiling level, see list of possible values below.
- slow_ms: Optionally modify the threshold at which the profiler considers a query or operation slow. Even if the profiler is off, queries slower than slow_ms will be written to the logs.
- session (optional): a
ClientSession
.
Possible level values:
Level        Setting
OFF          Off. No profiling.
SLOW_ONLY    On. Only includes slow operations.
ALL          On. Includes all operations.

Raises
ValueError
if level is not one of (OFF
,SLOW_ONLY
,ALL
).Changed in version 3.6: Added
session
parameter.
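A brief, hedged sketch (the 200 ms threshold is arbitrary): profile only slow operations, then turn profiling off:

import pymongo

db.set_profiling_level(pymongo.SLOW_ONLY, slow_ms=200)
assert db.profiling_level() == pymongo.SLOW_ONLY
db.set_profiling_level(pymongo.OFF)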
-
system_js
¶ DEPRECATED:
SystemJS
helper for thisDatabase
.See the documentation for
SystemJS
for more details.
-
validate_collection
(name_or_collection, scandata=False, full=False, session=None, background=None)¶ Validate a collection.
Returns a dict of validation info. Raises CollectionInvalid if validation fails.
See also the MongoDB documentation on the validate command.
Parameters: - name_or_collection: A Collection object or the name of a collection to validate.
- scandata: Do extra checks beyond checking the overall structure of the collection.
- full: Have the server do a more thorough scan of the collection. Use with scandata for a thorough scan of the structure of the collection and the individual documents.
- session (optional): a
ClientSession
. - background (optional): A boolean flag that determines whether the command runs in the background. Requires MongoDB 4.4+.
Changed in version 3.11: Added
background
parameter.
Changed in version 3.6: Added
session
parameter.
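A minimal usage sketch against a hypothetical 'test' collection:

info = db.validate_collection('test', scandata=True, full=True)
print(info['valid'])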
-
watch
(pipeline=None, full_document=None, resume_after=None, max_await_time_ms=None, batch_size=None, collation=None, start_at_operation_time=None, session=None, start_after=None)¶ Watch changes on this database.
Performs an aggregation with an implicit initial
$changeStream
stage and returns aDatabaseChangeStream
cursor which iterates over changes on all collections in this database.Introduced in MongoDB 4.0.
with db.watch() as stream:
    for change in stream:
        print(change)
The
DatabaseChangeStream
iterable blocks until the next change document is returned or an error is raised. If the next()
method encounters a network error when retrieving a batch from the server, it will automatically attempt to recreate the cursor such that no change events are missed. Any error encountered during the resume attempt indicates there may be an outage and will be raised.

try:
    with db.watch(
            [{'$match': {'operationType': 'insert'}}]) as stream:
        for insert_change in stream:
            print(insert_change)
except pymongo.errors.PyMongoError:
    # The ChangeStream encountered an unrecoverable error or the
    # resume attempt failed to recreate the cursor.
    logging.error('...')
For a precise description of the resume process see the change streams specification.
Parameters: - pipeline (optional): A list of aggregation pipeline stages to
append to an initial
$changeStream
stage. Not all pipeline stages are valid after a$changeStream
stage, see the MongoDB documentation on change streams for the supported stages. - full_document (optional): The fullDocument to pass as an option
to the
$changeStream
stage. Allowed values: ‘updateLookup’. When set to ‘updateLookup’, the change notification for partial updates will include both a delta describing the changes to the document, as well as a copy of the entire document that was changed from some time after the change occurred. - resume_after (optional): A resume token. If provided, the change stream will start returning changes that occur directly after the operation specified in the resume token. A resume token is the _id value of a change document.
- max_await_time_ms (optional): The maximum time in milliseconds for the server to wait for changes before responding to a getMore operation.
- batch_size (optional): The maximum number of documents to return per batch.
- collation (optional): The
Collation
to use for the aggregation. - start_at_operation_time (optional): If provided, the resulting
change stream will only return changes that occurred at or after
the specified
Timestamp
. Requires MongoDB >= 4.0. - session (optional): a
ClientSession
. - start_after (optional): The same as resume_after except that start_after can resume notifications after an invalidate event. This option and resume_after are mutually exclusive.
Returns: A
DatabaseChangeStream
cursor.
Changed in version 3.9: Added the
start_after
parameter.
New in version 3.7.
-
with_options
(codec_options=None, read_preference=None, write_concern=None, read_concern=None)¶ Get a clone of this database changing the specified settings.
>>> db1.read_preference
Primary()
>>> from pymongo import ReadPreference
>>> db2 = db1.with_options(read_preference=ReadPreference.SECONDARY)
>>> db1.read_preference
Primary()
>>> db2.read_preference
Secondary(tag_sets=None)
Parameters: - codec_options (optional): An instance of
CodecOptions
. If None
(the default) the codec_options
of this Database
is used.
- read_preference (optional): The read preference to use. If None
(the default) the read_preference
of this Database
is used. See read_preferences
for options.
- write_concern (optional): An instance of
WriteConcern
. If None
(the default) the write_concern
of this Database
is used.
- read_concern (optional): An instance of
ReadConcern
. If None
(the default) the read_concern
of this Database
is used.
New in version 3.8.
driver_info
¶
Advanced options for MongoDB drivers implemented on top of PyMongo.
-
class
pymongo.driver_info.
DriverInfo
(name=None, version=None, platform=None)¶ Info about a driver wrapping PyMongo.
The MongoDB server logs PyMongo’s name, version, and platform whenever PyMongo establishes a connection. A driver implemented on top of PyMongo can add its own info to this log message. Initialize with three strings like ‘MyDriver’, ‘1.2.3’, ‘some platform info’. Any of these strings may be None to accept PyMongo’s default.
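A hedged sketch of passing DriverInfo when constructing a client; the name, version, and platform strings are illustrative:

from pymongo import MongoClient
from pymongo.driver_info import DriverInfo

client = MongoClient(driver=DriverInfo('MyDriver', '1.2.3', 'CPython 3.8'))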
encryption
– Client-Side Field Level Encryption¶
Support for explicit client-side field level encryption.
-
class
pymongo.encryption.
Algorithm
¶ An enum that defines the supported encryption algorithms.
-
class
pymongo.encryption.
ClientEncryption
(kms_providers, key_vault_namespace, key_vault_client, codec_options)¶ Explicit client-side field level encryption.
The ClientEncryption class encapsulates explicit operations on a key vault collection that cannot be done directly on a MongoClient. Similar to configuring auto encryption on a MongoClient, it is constructed with a MongoClient (to a MongoDB cluster containing the key vault collection), KMS provider configuration, and keyVaultNamespace. It provides an API for explicitly encrypting and decrypting values, and creating data keys. It does not provide an API to query keys from the key vault collection, as this can be done directly on the MongoClient.
See Explicit Encryption for an example.
Parameters: kms_providers: Map of KMS provider options. Two KMS providers are supported: “aws” and “local”. The kmsProviders map values differ by provider:
- aws: Map with “accessKeyId” and “secretAccessKey” as strings. These are the AWS access key ID and AWS secret access key used to generate KMS messages.
- azure: Map with “tenantId”, “clientId”, and “clientSecret” as strings. Additionally, “identityPlatformEndpoint” may also be specified as a string (defaults to ‘login.microsoftonline.com’). These are the Azure Active Directory credentials used to generate Azure Key Vault messages.
- gcp: Map with “email” as a string and “privateKey” as bytes or a base64 encoded string (unicode on Python 2). Additionally, “endpoint” may also be specified as a string (defaults to ‘oauth2.googleapis.com’). These are the credentials used to generate Google Cloud KMS messages.
- local: Map with “key” as bytes (96 bytes in length) or a base64 encoded string (unicode on Python 2) which decodes to 96 bytes. “key” is the master key used to encrypt/decrypt data keys. This key should be generated and stored as securely as possible.
key_vault_namespace: The namespace for the key vault collection. The key vault collection contains all data keys used for encryption and decryption. Data keys are stored as documents in this MongoDB collection. Data keys are protected with encryption by a KMS provider.
key_vault_client: A MongoClient connected to a MongoDB cluster containing the key_vault_namespace collection.
codec_options: An instance of
CodecOptions
to use when encoding a value for encryption and decoding the decrypted BSON value. This should be the same CodecOptions instance configured on the MongoClient, Database, or Collection used to access application data.
New in version 3.9.
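A hedged construction sketch using the 'local' KMS provider; a real deployment would load a securely persisted 96-byte master key rather than generating one per run:

import os
from bson.codec_options import CodecOptions
from pymongo import MongoClient
from pymongo.encryption import ClientEncryption

local_master_key = os.urandom(96)  # illustrative only; persist this securely
kms_providers = {'local': {'key': local_master_key}}
client = MongoClient()
client_encryption = ClientEncryption(
    kms_providers, 'encryption.__keyVault', client, CodecOptions())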
-
close
()¶ Release resources.
Note that using this class in a with-statement will automatically call
close()
:

with ClientEncryption(...) as client_encryption:
    encrypted = client_encryption.encrypt(value, ...)
    decrypted = client_encryption.decrypt(encrypted)
-
create_data_key
(kms_provider, master_key=None, key_alt_names=None)¶ Create and insert a new data key into the key vault collection.
Parameters: kms_provider: The KMS provider to use. Supported values are “aws” and “local”.
master_key: Identifies a KMS-specific key used to encrypt the new data key. If the kmsProvider is “local” the master_key is not applicable and may be omitted.
If the kms_provider is “aws” it is required and has the following fields:
- `region` (string): Required. The AWS region, e.g. "us-east-1".
- `key` (string): Required. The Amazon Resource Name (ARN) to the AWS customer master key (CMK).
- `endpoint` (string): Optional. An alternate host to send KMS requests to. May include port number, e.g. "kms.us-east-1.amazonaws.com:443".
If the kms_provider is “azure” it is required and has the following fields:
- `keyVaultEndpoint` (string): Required. Host with optional port, e.g. "example.vault.azure.net".
- `keyName` (string): Required. Key name in the key vault.
- `keyVersion` (string): Optional. Version of the key to use.
If the kms_provider is “gcp” it is required and has the following fields:
- `projectId` (string): Required. The Google cloud project ID.
- `location` (string): Required. The GCP location, e.g. "us-east1".
- `keyRing` (string): Required. Name of the key ring that contains the key to use.
- `keyName` (string): Required. Name of the key to use.
- `keyVersion` (string): Optional. Version of the key to use.
- `endpoint` (string): Optional. Host with optional port. Defaults to "cloudkms.googleapis.com".
key_alt_names (optional): An optional list of string alternate names used to reference a key. If a key is created with alternate names, then encryption may refer to the key by the unique alternate name instead of by
key_id
. The following example shows creating and referring to a data key by alternate name:

client_encryption.create_data_key("local", key_alt_names=["name1"])
# reference the key with the alternate name
client_encryption.encrypt("457-55-5462", key_alt_name="name1",
                          algorithm=Algorithm.AEAD_AES_256_CBC_HMAC_SHA_512_Random)
Returns: The
_id
of the created data key document as aBinary
with subtypeUUID_SUBTYPE
.
-
decrypt
(value)¶ Decrypt an encrypted value.
Parameters: - value (Binary): The encrypted value, a
Binary
with subtype 6.
Returns: The decrypted BSON value.
- value (Binary): The encrypted value, a
-
encrypt
(value, algorithm, key_id=None, key_alt_name=None)¶ Encrypt a BSON value with a given key and algorithm.
Note that exactly one of
key_id
orkey_alt_name
must be provided.Parameters: - value: The BSON value to encrypt.
- algorithm (string): The encryption algorithm to use. See
Algorithm
for some valid options. - key_id: Identifies a data key by
_id
which must be aBinary
with subtype 4 (UUID_SUBTYPE
). - key_alt_name: Identifies a key vault document by ‘keyAltName’.
Returns: The encrypted value, a
Binary
with subtype 6.
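A short, hedged round-trip sketch with the client_encryption instance built as in the constructor example above:

from pymongo.encryption import Algorithm

key_id = client_encryption.create_data_key('local')
encrypted = client_encryption.encrypt(
    '457-55-5462', Algorithm.AEAD_AES_256_CBC_HMAC_SHA_512_Deterministic,
    key_id=key_id)
assert client_encryption.decrypt(encrypted) == '457-55-5462'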
encryption_options
– Automatic Client-Side Field Level Encryption¶
Support for automatic client-side field level encryption.
-
class
pymongo.encryption_options.
AutoEncryptionOpts
(kms_providers, key_vault_namespace, key_vault_client=None, schema_map=None, bypass_auto_encryption=False, mongocryptd_uri='mongodb://localhost:27020', mongocryptd_bypass_spawn=False, mongocryptd_spawn_path='mongocryptd', mongocryptd_spawn_args=None)¶ Options to configure automatic client-side field level encryption.
Automatic client-side field level encryption requires MongoDB 4.2 enterprise or a MongoDB 4.2 Atlas cluster. Automatic encryption is not supported for operations on a database or view and will result in an error.
Although automatic encryption requires MongoDB 4.2 enterprise or a MongoDB 4.2 Atlas cluster, automatic decryption is supported for all users. To configure automatic decryption without automatic encryption set
bypass_auto_encryption=True
. Explicit encryption and explicit decryption is also supported for all users with theClientEncryption
class.See Automatic Client-Side Field Level Encryption for an example.
Parameters: kms_providers: Map of KMS provider options. Two KMS providers are supported: “aws” and “local”. The kmsProviders map values differ by provider:
- aws: Map with “accessKeyId” and “secretAccessKey” as strings. These are the AWS access key ID and AWS secret access key used to generate KMS messages.
- azure: Map with “tenantId”, “clientId”, and “clientSecret” as strings. Additionally, “identityPlatformEndpoint” may also be specified as a string (defaults to ‘login.microsoftonline.com’). These are the Azure Active Directory credentials used to generate Azure Key Vault messages.
- gcp: Map with “email” as a string and “privateKey” as bytes or a base64 encoded string (unicode on Python 2). Additionally, “endpoint” may also be specified as a string (defaults to ‘oauth2.googleapis.com’). These are the credentials used to generate Google Cloud KMS messages.
- local: Map with “key” as bytes (96 bytes in length) or a base64 encoded string (unicode on Python 2) which decodes to 96 bytes. “key” is the master key used to encrypt/decrypt data keys. This key should be generated and stored as securely as possible.
key_vault_namespace: The namespace for the key vault collection. The key vault collection contains all data keys used for encryption and decryption. Data keys are stored as documents in this MongoDB collection. Data keys are protected with encryption by a KMS provider.
key_vault_client (optional): By default the key vault collection is assumed to reside in the same MongoDB cluster as the encrypted MongoClient. Use this option to route data key queries to a separate MongoDB cluster.
schema_map (optional): Map of collection namespace (“db.coll”) to JSON Schema. By default, a collection’s JSONSchema is periodically polled with the listCollections command. But a JSONSchema may be specified locally with the schemaMap option.
Supplying a `schema_map` provides more security than relying on JSON Schemas obtained from the server. It protects against a malicious server advertising a false JSON Schema, which could trick the client into sending unencrypted data that should be encrypted.
Schemas supplied in the schemaMap only apply to configuring automatic encryption for client side encryption. Other validation rules in the JSON schema will not be enforced by the driver and will result in an error.
bypass_auto_encryption (optional): If True, automatic encryption will be disabled but automatic decryption will still be enabled. Defaults to False.
mongocryptd_uri (optional): The MongoDB URI used to connect to the local mongocryptd process. Defaults to 'mongodb://localhost:27020'.
mongocryptd_bypass_spawn (optional): If True, the encrypted MongoClient will not attempt to spawn the mongocryptd process. Defaults to False.
mongocryptd_spawn_path (optional): Used for spawning the mongocryptd process. Defaults to 'mongocryptd' and spawns mongocryptd from the system path.
mongocryptd_spawn_args (optional): A list of string arguments to use when spawning the mongocryptd process. Defaults to ['--idleShutdownTimeoutSecs=60']. If the list does not include the idleShutdownTimeoutSecs option then '--idleShutdownTimeoutSecs=60' will be added.
New in version 3.9.
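A minimal sketch of enabling automatic encryption with the “local” KMS provider (the key vault namespace and key handling here are illustrative assumptions; automatic encryption additionally requires the optional pymongocrypt dependency and a mongocryptd binary):

import os
from pymongo import MongoClient
from pymongo.encryption_options import AutoEncryptionOpts

# Illustration only: a real application must load a persistent 96-byte
# master key from secure storage; a fresh key on each run would make
# previously encrypted data unreadable.
local_master_key = os.urandom(96)
kms_providers = {"local": {"key": local_master_key}}

opts = AutoEncryptionOpts(
    kms_providers,
    "encryption.__keyVault",  # assumed key vault namespace
)
client = MongoClient(auto_encryption_opts=opts)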
errors – Exceptions raised by the pymongo package¶
Exceptions raised by PyMongo.
-
exception
pymongo.errors.
AutoReconnect
(message='', errors=None)¶ Raised when a connection to the database is lost and an attempt to auto-reconnect will be made.
In order to auto-reconnect you must handle this exception, recognizing that the operation which caused it has not necessarily succeeded. Future operations will attempt to open a new connection to the database (and will continue to raise this exception until the first successful connection is made).
Subclass of
ConnectionFailure
.
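A minimal sketch of handling this exception (the database, collection, and operation are assumptions; whether a retry is safe depends on the operation):

from pymongo import MongoClient
from pymongo.errors import AutoReconnect

client = MongoClient()
try:
    client.test.pages.update_one({"_id": 1}, {"$inc": {"views": 1}})
except AutoReconnect:
    # The update may or may not have been applied; only retry here if
    # repeating the operation is safe for your data model.
    pass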
-
exception
pymongo.errors.
BulkWriteError
(results)¶ Exception class for bulk write errors.
New in version 2.7.
-
exception
pymongo.errors.
CollectionInvalid
(message='', error_labels=None)¶ Raised when collection validation fails.
-
exception
pymongo.errors.
ConfigurationError
(message='', error_labels=None)¶ Raised when something is incorrectly configured.
-
exception
pymongo.errors.
ConnectionFailure
(message='', error_labels=None)¶ Raised when a connection to the database cannot be made or is lost.
-
exception
pymongo.errors.
CursorNotFound
(error, code=None, details=None, max_wire_version=None)¶ Raised while iterating query results if the cursor is invalidated on the server.
New in version 2.7.
-
exception
pymongo.errors.
DocumentTooLarge
¶ Raised when an encoded document is too large for the connected server.
-
exception
pymongo.errors.
DuplicateKeyError
(error, code=None, details=None, max_wire_version=None)¶ Raised when an insert or update fails due to a duplicate key error.
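For example (a sketch; the database and collection names are assumptions):

from pymongo import MongoClient
from pymongo.errors import DuplicateKeyError

collection = MongoClient().test.users
collection.insert_one({"_id": "alice"})
try:
    collection.insert_one({"_id": "alice"})  # same _id again
except DuplicateKeyError as exc:
    # details holds the server's error document, when available.
    print(exc.details)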
-
exception
pymongo.errors.
EncryptionError
(cause)¶ Raised when encryption or decryption fails.
This error always wraps another exception which can be retrieved via the
cause
property.
New in version 3.9.
-
cause
¶ The exception that caused this encryption or decryption error.
-
-
exception
pymongo.errors.
ExceededMaxWaiters
(message='', error_labels=None)¶ Raised when a thread tries to get a connection from a pool and
maxPoolSize * waitQueueMultiple
threads are already waiting.
New in version 2.6.
-
exception
pymongo.errors.
ExecutionTimeout
(error, code=None, details=None, max_wire_version=None)¶ Raised when a database operation times out, exceeding the $maxTimeMS set in the query or command option.
Note
Requires server version >= 2.6.0
New in version 2.7.
-
exception
pymongo.errors.
InvalidName
(message='', error_labels=None)¶ Raised when an invalid name is used.
-
exception
pymongo.errors.
InvalidOperation
(message='', error_labels=None)¶ Raised when a client attempts to perform an invalid operation.
-
exception
pymongo.errors.
InvalidURI
(message='', error_labels=None)¶ Raised when trying to parse an invalid mongodb URI.
-
exception
pymongo.errors.
NetworkTimeout
(message='', errors=None)¶ An operation on an open connection exceeded socketTimeoutMS.
The remaining connections in the pool stay open. In the case of a write operation, you cannot know whether it succeeded or failed.
Subclass of
AutoReconnect
.
-
exception
pymongo.errors.
NotMasterError
(message='', errors=None)¶ The server responded “not master” or “node is recovering”.
These errors result from a query, write, or command. The operation failed because the client thought it was using the primary but the primary has stepped down, or the client thought it was using a healthy secondary but the secondary is stale and trying to recover.
The client launches a refresh operation on a background thread, to update its view of the server as soon as possible after throwing this exception.
Subclass of
AutoReconnect
.
-
exception
pymongo.errors.
OperationFailure
(error, code=None, details=None, max_wire_version=None)¶ Raised when a database operation fails.
New in version 2.7: The
details
attribute.
-
code
¶ The error code returned by the server, if any.
-
details
¶ The complete error document returned by the server.
Depending on the error that occurred, the error document may include useful information beyond just the error message. When connected to a mongos the error document may contain one or more subdocuments if errors occurred on multiple shards.
-
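A sketch of inspecting these attributes (the deliberately invalid query operator is an assumption used only to provoke a server error):

from pymongo import MongoClient
from pymongo.errors import OperationFailure

client = MongoClient()
try:
    client.test.coll.find_one({"$notAnOperator": 1})
except OperationFailure as exc:
    print(exc.code)     # numeric server error code, if any
    print(exc.details)  # complete error document from the server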
-
exception
pymongo.errors.
ProtocolError
(message='', error_labels=None)¶ Raised for failures related to the wire protocol.
-
exception
pymongo.errors.
PyMongoError
(message='', error_labels=None)¶ Base class for all PyMongo exceptions.
-
has_error_label
(label)¶ Return True if this error contains the given label.
New in version 3.7.
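For example, a retry loop for transactions commonly checks the server-defined "TransientTransactionError" label (a sketch; the client, database, and collection are assumptions):

from pymongo.errors import PyMongoError

try:
    with client.start_session() as session:
        with session.start_transaction():
            client.test.coll.insert_one({"x": 1}, session=session)
except PyMongoError as exc:
    if exc.has_error_label("TransientTransactionError"):
        pass  # the whole transaction can safely be retried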
-
-
exception
pymongo.errors.
ServerSelectionTimeoutError
(message='', errors=None)¶ Thrown when no MongoDB server is available for an operation.
If there is no suitable server for an operation PyMongo tries for
serverSelectionTimeoutMS
(default 30 seconds) to find one, then throws this exception. For example, it is thrown after attempting an operation when PyMongo cannot connect to any server, or if you attempt an insert into a replica set that has no primary and does not elect one within the timeout window, or if you attempt to query with a Read Preference that the replica set cannot satisfy.
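A sketch of failing fast instead of waiting the default 30 seconds (the 5-second budget is an assumption):

from pymongo import MongoClient
from pymongo.errors import ServerSelectionTimeoutError

client = MongoClient(serverSelectionTimeoutMS=5000)
try:
    client.admin.command('ismaster')
except ServerSelectionTimeoutError:
    print("No suitable server found within 5 seconds")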
-
exception
pymongo.errors.
WTimeoutError
(error, code=None, details=None, max_wire_version=None)¶ Raised when a database operation times out (i.e. wtimeout expires) before replication completes.
With newer versions of MongoDB the details attribute may include write concern fields like ‘n’, ‘updatedExisting’, or ‘writtenTo’.
New in version 2.7.
-
exception
pymongo.errors.
WriteConcernError
(error, code=None, details=None, max_wire_version=None)¶ Base exception type for errors raised due to write concern.
New in version 3.0.
-
exception
pymongo.errors.
WriteError
(error, code=None, details=None, max_wire_version=None)¶ Base exception type for errors raised during write operations.
New in version 3.0.
message – Tools for creating messages to be sent to MongoDB¶
Tools for creating messages to be sent to MongoDB.
Note
This module is for internal use and is generally not needed by application developers.
-
pymongo.message.
delete
(collection_name, spec, safe, last_error_args, opts, flags=0, ctx=None)¶ Get a delete message.
opts is a CodecOptions. flags is a bit vector that may contain the SingleRemove flag or not:
http://docs.mongodb.org/meta-driver/latest/legacy/mongodb-wire-protocol/#op-delete
-
pymongo.message.
get_more
(collection_name, num_to_return, cursor_id, ctx=None)¶ Get a getMore message.
-
pymongo.message.
insert
(collection_name, docs, check_keys, safe, last_error_args, continue_on_error, opts, ctx=None)¶ Get an insert message.
-
pymongo.message.
kill_cursors
(cursor_ids)¶ Get a killCursors message.
-
pymongo.message.
query
(options, collection_name, num_to_skip, num_to_return, query, field_selector, opts, check_keys=False, ctx=None)¶ Get a query message.
-
pymongo.message.
update
(collection_name, upsert, multi, spec, doc, safe, last_error_args, check_keys, opts, ctx=None)¶ Get an update message.
mongo_client – Tools for connecting to MongoDB¶
Tools for connecting to MongoDB.
See also
High Availability and PyMongo for examples of connecting to replica sets or sets of mongos servers.
To get a Database
instance from a
MongoClient
use either dictionary-style or attribute-style
access:
>>> from pymongo import MongoClient
>>> c = MongoClient()
>>> c.test_database
Database(MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True), u'test_database')
>>> c['test-database']
Database(MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True), u'test-database')
-
class
pymongo.mongo_client.
MongoClient
(host='localhost', port=27017, document_class=dict, tz_aware=False, connect=True, **kwargs)¶ Client for a MongoDB instance, a replica set, or a set of mongoses.
The client object is thread-safe and has connection-pooling built in. If an operation fails because of a network error,
ConnectionFailure
is raised and the client reconnects in the background. Application code should handle this exception (recognizing that the operation failed) and then continue to execute.
The host parameter can be a full mongodb URI, in addition to a simple hostname. It can also be a list of hostnames or URIs. Any port specified in the host string(s) will override the port parameter. If multiple mongodb URIs containing database or auth information are passed, the last database, username, and password present will be used. For usernames and passwords reserved characters like ‘:’, ‘/’, ‘+’ and ‘@’ must be percent encoded following RFC 2396:
try:
    # Python 3.x
    from urllib.parse import quote_plus
except ImportError:
    # Python 2.x
    from urllib import quote_plus

uri = "mongodb://%s:%s@%s" % (
    quote_plus(user), quote_plus(password), host)
client = MongoClient(uri)
Unix domain sockets are also supported. The socket path must be percent encoded in the URI:
uri = "mongodb://%s:%s@%s" % ( quote_plus(user), quote_plus(password), quote_plus(socket_path)) client = MongoClient(uri)
But not when passed as a simple hostname:
client = MongoClient('/tmp/mongodb-27017.sock')
Starting with version 3.6, PyMongo supports mongodb+srv:// URIs. The URI must include one, and only one, hostname. The hostname will be resolved to one or more DNS SRV records which will be used as the seed list for connecting to the MongoDB deployment. When using SRV URIs, the authSource and replicaSet configuration options can be specified using TXT records. See the Initial DNS Seedlist Discovery spec for more details. Note that the use of SRV URIs implicitly enables TLS support. Pass tls=false in the URI to override.
Note
MongoClient creation will block waiting for answers from DNS when mongodb+srv:// URIs are used.
Note
Starting with version 3.0 the
MongoClient
constructor no longer blocks while connecting to the server or servers, and it no longer raises ConnectionFailure if they are unavailable, nor ConfigurationError if the user’s credentials are wrong. Instead, the constructor returns immediately and launches the connection process on background threads. You can check if the server is available like this:

from pymongo.errors import ConnectionFailure
client = MongoClient()
try:
    # The ismaster command is cheap and does not require auth.
    client.admin.command('ismaster')
except ConnectionFailure:
    print("Server not available")
Warning
When using PyMongo in a multiprocessing context, please read Using PyMongo with Multiprocessing first.
Note
Many of the following options can be passed using a MongoDB URI or keyword parameters. If the same option is passed in a URI and as a keyword parameter the keyword parameter takes precedence.
Parameters: - host (optional): hostname or IP address or Unix domain socket path of a single mongod or mongos instance to connect to, or a mongodb URI, or a list of hostnames / mongodb URIs. If host is an IPv6 literal it must be enclosed in ‘[’ and ‘]’ characters following the RFC2732 URL syntax (e.g. ‘[::1]’ for localhost). Multihomed and round robin DNS addresses are not supported.
- port (optional): port number on which to connect
- document_class (optional): default class to use for documents returned from queries on this client
- type_registry (optional): instance of TypeRegistry to enable encoding and decoding of custom types.
- tz_aware (optional): if True, datetime instances returned as values in a document by this MongoClient will be timezone aware (otherwise they will be naive)
- connect (optional): if True (the default), immediately begin connecting to MongoDB in the background. Otherwise connect on the first operation.
- directConnection (optional): if True, forces this client to connect directly to the specified MongoDB host as a standalone. If False, the client connects to the entire replica set of which the given MongoDB host(s) is a part. If this is True and a mongodb+srv:// URI or a URI containing multiple seeds is provided, an exception will be raised.
Other optional parameters can be passed as keyword arguments:
maxPoolSize (optional): The maximum allowable number of concurrent connections to each connected server. Requests to a server will block if there are maxPoolSize outstanding connections to the requested server. Defaults to 100. Cannot be 0.
minPoolSize (optional): The minimum required number of concurrent connections that the pool will maintain to each connected server. Default is 0.
maxIdleTimeMS (optional): The maximum number of milliseconds that a connection can remain idle in the pool before being removed and replaced. Defaults to None (no limit).
socketTimeoutMS: (integer or None) Controls how long (in milliseconds) the driver will wait for a response after sending an ordinary (non-monitoring) database operation before concluding that a network error has occurred. 0 or None means no timeout. Defaults to None (no timeout).
connectTimeoutMS: (integer or None) Controls how long (in milliseconds) the driver will wait during server monitoring when connecting a new socket to a server before concluding the server is unavailable. 0 or None means no timeout. Defaults to 20000 (20 seconds).
server_selector: (callable or None) Optional, user-provided function that augments server selection rules. The function should accept as an argument a list of ServerDescription objects and return a list of server descriptions that should be considered suitable for the desired operation.
serverSelectionTimeoutMS: (integer) Controls how long (in milliseconds) the driver will wait to find an available, appropriate server to carry out a database operation; while it is waiting, multiple server monitoring operations may be carried out, each controlled by connectTimeoutMS. Defaults to 30000 (30 seconds).
waitQueueTimeoutMS: (integer or None) How long (in milliseconds) a thread will wait for a socket from the pool if the pool has no free sockets. Defaults to None (no timeout).
waitQueueMultiple: (integer or None) Multiplied by maxPoolSize to give the number of threads allowed to wait for a socket at one time. Defaults to None (no limit).
heartbeatFrequencyMS: (optional) The number of milliseconds between periodic server checks, or None to accept the default frequency of 10 seconds.
appname: (string or None) The name of the application that created this MongoClient instance. MongoDB 3.4 and newer will print this value in the server log upon establishing each connection. It is also recorded in the slow query log and profile collections.
driver: (pair or None) A driver implemented on top of PyMongo can pass a DriverInfo to add its name, version, and platform to the message printed in the server log when establishing a connection.
event_listeners: a list or tuple of event listeners. See monitoring for details.
retryWrites: (boolean) Whether supported write operations executed within this MongoClient will be retried once after a network error on MongoDB 3.6+. Defaults to True. The supported write operations are: bulk_write() (as long as UpdateMany or DeleteMany are not included), delete_one(), insert_one(), insert_many(), replace_one(), update_one(), find_one_and_delete(), find_one_and_replace(), and find_one_and_update(). Unsupported write operations include, but are not limited to, aggregate() using the $out pipeline operator and any operation with an unacknowledged write concern (e.g. {w: 0}). See https://github.com/mongodb/specifications/blob/master/source/retryable-writes/retryable-writes.rst
retryReads: (boolean) Whether supported read operations executed within this MongoClient will be retried once after a network error on MongoDB 3.6+. Defaults to True. The supported read operations are: find(), find_one(), aggregate() without $out, distinct(), count(), estimated_document_count(), count_documents(), pymongo.collection.Collection.watch(), list_indexes(), pymongo.database.Database.watch(), list_collections(), pymongo.mongo_client.MongoClient.watch(), and list_databases(). Unsupported read operations include, but are not limited to: map_reduce(), inline_map_reduce(), command(), and any getMore operation on a cursor. Enabling retryable reads makes applications more resilient to transient errors such as network failures, database upgrades, and replica set failovers. For an exact definition of which errors trigger a retry, see the retryable reads specification.
socketKeepAlive: (boolean) DEPRECATED Whether to send periodic keep-alive packets on connected sockets. Defaults to True. Disabling it is not recommended, see https://docs.mongodb.com/manual/faq/diagnostics/#does-tcp-keepalive-time-affect-mongodb-deployments
compressors: Comma separated list of compressors for wire protocol compression. The list is used to negotiate a compressor with the server. Currently supported options are “snappy”, “zlib” and “zstd”. Support for snappy requires the python-snappy package. zlib support requires the Python standard library zlib module. zstd requires the zstandard package. By default no compression is used. Compression support must also be enabled on the server. MongoDB 3.4+ supports snappy compression. MongoDB 3.6 adds support for zlib. MongoDB 4.2 adds support for zstd. (See the sketch after this parameter list.)
zlibCompressionLevel: (int) The zlib compression level to use when zlib is used as the wire protocol compressor. Supported values are -1 through 9. -1 tells the zlib library to use its default compression level (usually 6). 0 means no compression. 1 is best speed. 9 is best compression. Defaults to -1.
uuidRepresentation: The BSON representation to use when encoding from and decoding to instances of
UUID
. Valid values are pythonLegacy (the default), javaLegacy, csharpLegacy, standard and unspecified. New applications should consider setting this to standard for cross language compatibility. See Handling UUID Data for details.
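As a sketch of the compression options above (assuming zlib is acceptable to your deployment and enabled on the server):

from pymongo import MongoClient

# Negotiate zlib wire-protocol compression; level 6 is zlib's usual
# default trade-off between speed and compression ratio.
client = MongoClient(compressors="zlib", zlibCompressionLevel=6)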
Write Concern options (only set if passed; no default values):
- w: (integer or string) If this is a replica set, write operations will block until they have been replicated to the specified number or tagged set of servers. w=<int> always includes the replica set primary (e.g. w=3 means write to the primary and wait until replicated to two secondaries). Passing w=0 disables write acknowledgement and all other write concern options.
- wTimeoutMS: (integer) Used in conjunction with w. Specify a value in milliseconds to control how long to wait for write propagation to complete. If replication does not complete in the given timeframe, a timeout exception is raised. Passing wTimeoutMS=0 will cause write operations to wait indefinitely.
- journal: If True, block until write operations have been committed to the journal. Cannot be used in combination with fsync. Prior to MongoDB 2.6 this option was ignored if the server was running without journaling. Starting with MongoDB 2.6 write operations will fail with an exception if this option is used when the server is running without journaling.
- fsync: If True and the server is running without journaling, blocks until the server has synced all data files to disk. If the server is running with journaling, this acts the same as the j option, blocking until write operations have been committed to the journal. Cannot be used in combination with j.
Replica set keyword arguments for connecting with a replica set, either directly or via a mongos:
- replicaSet: (string or None) The name of the replica set to connect to. The driver will verify that all servers it connects to match this name. Implies that the hosts specified are a seed list and the driver should attempt to find all members of the set. Defaults to None.
Read Preference:
- readPreference: The replica set read preference for this client. One of primary, primaryPreferred, secondary, secondaryPreferred, or nearest. Defaults to primary.
- readPreferenceTags: Specifies a tag set as a comma-separated list of colon-separated key-value pairs. For example dc:ny,rack:1. Defaults to None.
- maxStalenessSeconds: (integer) The maximum estimated length of time a replica set secondary can fall behind the primary in replication before it will no longer be selected for operations. Defaults to -1, meaning no maximum. If maxStalenessSeconds is set, it must be a positive integer greater than or equal to 90 seconds.
See also
Authentication:
- username: A string.
- password: A string.

Although username and password must be percent-escaped in a MongoDB URI, they must not be percent-escaped when passed as parameters. In this example, both the space and slash special characters are passed as-is:

MongoClient(username="user name", password="pass/word")

- authSource: The database to authenticate on. Defaults to the database specified in the URI, if provided, or to “admin”.
- authMechanism: See MECHANISMS for options. If no mechanism is specified, PyMongo automatically uses MONGODB-CR when connected to a pre-3.0 version of MongoDB, SCRAM-SHA-1 when connected to MongoDB 3.0 through 3.6, and negotiates the mechanism to use (SCRAM-SHA-1 or SCRAM-SHA-256) when connected to MongoDB 4.0+.
- authMechanismProperties: Used to specify authentication mechanism specific options. To specify the service name for GSSAPI authentication pass authMechanismProperties='SERVICE_NAME:<service name>'. To specify the session token for MONGODB-AWS authentication pass authMechanismProperties='AWS_SESSION_TOKEN:<session token>'.
See also
TLS/SSL configuration:
- tls: (boolean) If True, create the connection to the server using transport layer security. Defaults to False.
- tlsInsecure: (boolean) Specify whether TLS constraints should be relaxed as much as possible. Setting tlsInsecure=True implies tlsAllowInvalidCertificates=True and tlsAllowInvalidHostnames=True. Defaults to False. Think very carefully before setting this to True as it dramatically reduces the security of TLS.
- tlsAllowInvalidCertificates: (boolean) If True, continues the TLS handshake regardless of the outcome of the certificate verification process. If this is False, and a value is not provided for tlsCAFile, PyMongo will attempt to load system provided CA certificates. If the python version in use does not support loading system CA certificates then the tlsCAFile parameter must point to a file of CA certificates. tlsAllowInvalidCertificates=False implies tls=True. Defaults to False. Think very carefully before setting this to True as that could make your application vulnerable to man-in-the-middle attacks.
- tlsAllowInvalidHostnames: (boolean) If True, disables TLS hostname verification. tlsAllowInvalidHostnames=False implies tls=True. Defaults to False. Think very carefully before setting this to True as that could make your application vulnerable to man-in-the-middle attacks.
- tlsCAFile: A file containing a single or a bundle of “certification authority” certificates, which are used to validate certificates passed from the other end of the connection. Implies tls=True. Defaults to None.
- tlsCertificateKeyFile: A file containing the client certificate and private key. If you want to pass the certificate and private key as separate files, use the ssl_certfile and ssl_keyfile options instead. Implies tls=True. Defaults to None.
- tlsCRLFile: A file containing a PEM or DER formatted certificate revocation list. Only supported by python 2.7.9+ (pypy 2.5.1+). Implies tls=True. Defaults to None.
- tlsCertificateKeyFilePassword: The password or passphrase for decrypting the private key in tlsCertificateKeyFile or ssl_keyfile. Only necessary if the private key is encrypted. Only supported by python 2.7.9+ (pypy 2.5.1+) and 3.3+. Defaults to None.
- tlsDisableOCSPEndpointCheck: (boolean) If True, disables certificate revocation status checking via the OCSP responder specified on the server certificate. Defaults to False.
- ssl: (boolean) Alias for tls.
- ssl_certfile: The certificate file used to identify the local connection against mongod. Implies tls=True. Defaults to None.
- ssl_keyfile: The private keyfile used to identify the local connection against mongod. Can be omitted if the keyfile is included with the tlsCertificateKeyFile. Implies tls=True. Defaults to None.
Read Concern options (if not set explicitly, the server default is used):
- readConcernLevel: (string) The read concern level specifies the level of isolation for read operations. For example, a read operation using a read concern level of majority will only return data that has been written to a majority of nodes. If the level is left unspecified, the server default will be used.
Client side encryption options (if not set explicitly, client side encryption will not be enabled):
- auto_encryption_opts: An AutoEncryptionOpts which configures this client to automatically encrypt collection commands and automatically decrypt results. See Automatic Client-Side Field Level Encryption for an example.
Changed in version 3.11: Added the following keyword arguments and URI options:
tlsDisableOCSPEndpointCheck
directConnection
Changed in version 3.9: Added the retryReads keyword argument and URI option. Added the tlsInsecure keyword argument and URI option. The following keyword arguments and URI options were deprecated:
- wTimeout was deprecated in favor of wTimeoutMS.
- j was deprecated in favor of journal.
- ssl_cert_reqs was deprecated in favor of tlsAllowInvalidCertificates.
- ssl_match_hostname was deprecated in favor of tlsAllowInvalidHostnames.
- ssl_ca_certs was deprecated in favor of tlsCAFile.
- ssl_certfile was deprecated in favor of tlsCertificateKeyFile.
- ssl_crlfile was deprecated in favor of tlsCRLFile.
- ssl_pem_passphrase was deprecated in favor of tlsCertificateKeyFilePassword.
Changed in version 3.9: retryWrites now defaults to True.
Changed in version 3.8: Added the server_selector keyword argument. Added the type_registry keyword argument.
Changed in version 3.7: Added the driver keyword argument.
Changed in version 3.6: Added support for mongodb+srv:// URIs. Added the retryWrites keyword argument and URI option.
Changed in version 3.5: Add username and password options. Document the authSource, authMechanism, and authMechanismProperties options. Deprecated the socketKeepAlive keyword argument and URI option. socketKeepAlive now defaults to True.
Changed in version 3.0:
MongoClient
is now the one and only client class for a standalone server, mongos, or replica set. It includes the functionality that had been split into MongoReplicaSetClient: it can connect to a replica set, discover all its members, and monitor the set for stepdowns, elections, and reconfigs.
The MongoClient constructor no longer blocks while connecting to the server or servers, and it no longer raises ConnectionFailure if they are unavailable, nor ConfigurationError if the user’s credentials are wrong. Instead, the constructor returns immediately and launches the connection process on background threads.
Therefore the alive method is removed since it no longer provides meaningful information; even if the client is disconnected, it may discover a server in time to fulfill the next operation.
In PyMongo 2.x, MongoClient accepted a list of standalone MongoDB servers and used the first it could connect to:

MongoClient(['host1.com:27017', 'host2.com:27017'])
A list of multiple standalones is no longer supported; if multiple servers are listed they must be members of the same replica set, or mongoses in the same sharded cluster.
The behavior for a list of mongoses is changed from “high availability” to “load balancing”. Before, the client connected to the lowest-latency mongos in the list, and used it until a network error prompted it to re-evaluate all mongoses’ latencies and reconnect to one of them. In PyMongo 3, the client monitors its network latency to all the mongoses continuously, and distributes operations evenly among those with the lowest latency. See mongos Load Balancing for more information.
The connect option is added.
The start_request, in_request, and end_request methods are removed, as well as the auto_start_request option.
The copy_database method is removed, see the copy_database examples for alternatives.
The MongoClient.disconnect() method is removed; it was a synonym for close().
MongoClient no longer returns an instance of Database for attribute names with leading underscores. You must use dict-style lookups instead:

client['__my_database__']

Not:

client.__my_database__
-
close
()¶ Cleanup client resources and disconnect from MongoDB.
On MongoDB >= 3.6, end all server sessions created by this client by sending one or more endSessions commands.
Close all sockets in the connection pools and stop the monitor threads. If this instance is used again it will be automatically re-opened and the threads restarted unless auto encryption is enabled. A client enabled with auto encryption cannot be used again after being closed; any attempt will raise
InvalidOperation
.
Changed in version 3.6: End all server sessions created by this client.
-
c[db_name] || c.db_name
Get the db_name Database on MongoClient c.
Raises InvalidName if an invalid database name is used.
-
event_listeners
¶ The event listeners registered for this client.
See
monitoring
for details.
-
address
¶ (host, port) of the current standalone, primary, or mongos, or None.
Accessing address raises InvalidOperation if the client is load-balancing among mongoses, since there is no single address. Use nodes instead.
If the client is not connected, this will block until a connection is established or raise ServerSelectionTimeoutError if no server is available.
New in version 3.0.
-
primary
¶ The (host, port) of the current primary of the replica set.
Returns
None
if this client is not connected to a replica set, there is no primary, or this client was created without the replicaSet option.New in version 3.0: MongoClient gained this property in version 3.0 when MongoReplicaSetClient’s functionality was merged in.
-
secondaries
¶ The secondary members known to this client.
A sequence of (host, port) pairs. Empty if this client is not connected to a replica set, there are no visible secondaries, or this client was created without the replicaSet option.
New in version 3.0: MongoClient gained this property in version 3.0 when MongoReplicaSetClient’s functionality was merged in.
-
arbiters
¶ Arbiters in the replica set.
A sequence of (host, port) pairs. Empty if this client is not connected to a replica set, there are no arbiters, or this client was created without the replicaSet option.
-
is_primary
¶ If this client is connected to a server that can accept writes.
True if the current server is a standalone, mongos, or the primary of a replica set. If the client is not connected, this will block until a connection is established or raise ServerSelectionTimeoutError if no server is available.
-
is_mongos
If this client is connected to mongos. If the client is not connected, this will block until a connection is established or raise ServerSelectionTimeoutError if no server is available.
-
max_pool_size
¶ The maximum allowable number of concurrent connections to each connected server. Requests to a server will block if there are maxPoolSize outstanding connections to the requested server. Defaults to 100. Cannot be 0.
When a server’s pool has reached max_pool_size, operations for that server block waiting for a socket to be returned to the pool. If
waitQueueTimeoutMS
is set, a blocked operation will raise ConnectionFailure after a timeout. By default waitQueueTimeoutMS is not set.
-
min_pool_size
¶ The minimum required number of concurrent connections that the pool will maintain to each connected server. Default is 0.
-
max_idle_time_ms
¶ The maximum number of milliseconds that a connection can remain idle in the pool before being removed and replaced. Defaults to None (no limit).
-
nodes
¶ Set of all currently connected servers.
Warning
When connected to a replica set the value of nodes can change over time as MongoClient’s view of the replica set changes. nodes can also be an empty set when MongoClient is first instantiated and hasn’t yet connected to any servers, or a network partition causes it to lose connection to all servers.
-
max_bson_size
¶ The largest BSON object the connected server accepts in bytes.
If the client is not connected, this will block until a connection is established or raise ServerSelectionTimeoutError if no server is available.
-
max_message_size
¶ The largest message the connected server accepts in bytes.
If the client is not connected, this will block until a connection is established or raise ServerSelectionTimeoutError if no server is available.
-
max_write_batch_size
¶ The maxWriteBatchSize reported by the server.
If the client is not connected, this will block until a connection is established or raise ServerSelectionTimeoutError if no server is available.
Returns a default value when connected to server versions prior to MongoDB 2.6.
-
local_threshold_ms
¶ The local threshold for this instance.
-
server_selection_timeout
¶ The server selection timeout for this instance in seconds.
-
codec_options
¶ Read only access to the
CodecOptions
of this instance.
-
read_preference
¶ Read only access to the read preference of this instance.
Changed in version 3.0: The
read_preference
attribute is now read only.
-
write_concern
¶ Read only access to the
WriteConcern
of this instance.Changed in version 3.0: The
write_concern
attribute is now read only.
-
read_concern
¶ Read only access to the
ReadConcern
of this instance.New in version 3.2.
-
start_session
(causal_consistency=True, default_transaction_options=None)¶ Start a logical session.
This method takes the same parameters as SessionOptions. See the client_session module for details and examples.
Requires MongoDB 3.6. It is an error to call start_session() if this client has been authenticated to multiple databases using the deprecated method authenticate().
A ClientSession may only be used with the MongoClient that started it. ClientSession instances are not thread-safe or fork-safe. They can only be used by one thread or process at a time. A single ClientSession cannot be used to run multiple operations concurrently.
Returns: An instance of ClientSession.
New in version 3.6.
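A minimal sketch (the database and collection names are assumptions):

from pymongo import MongoClient

client = MongoClient()
with client.start_session(causal_consistency=True) as session:
    # Operations passed the same session observe a causally
    # consistent history.
    client.test.coll.insert_one({"x": 1}, session=session)
    client.test.coll.find_one({"x": 1}, session=session)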
-
list_databases
(session=None, **kwargs)¶ Get a cursor over the databases of the connected server.
Parameters: - session (optional): a
ClientSession
. - **kwargs (optional): Optional parameters of the listDatabases command can be passed as keyword arguments to this method. The supported options differ by server version.
Returns: An instance of
CommandCursor
.New in version 3.6.
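For example:

for db_info in client.list_databases():
    # Each result is a document such as
    # {'name': 'admin', 'sizeOnDisk': ..., 'empty': False}
    print(db_info["name"])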
-
list_database_names
(session=None)¶ Get a list of the names of all databases on the connected server.
Parameters: - session (optional): a
ClientSession
.
New in version 3.6.
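For example:

names = client.list_database_names()  # e.g. ['admin', 'config', 'local']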
-
database_names
(session=None)¶ DEPRECATED: Get a list of the names of all databases on the connected server.
Parameters: - session (optional): a
ClientSession
.
Changed in version 3.7: Deprecated. Use
list_database_names()
instead.
Changed in version 3.6: Added
session
parameter.
-
drop_database
(name_or_database, session=None)¶ Drop a database.
Raises TypeError if name_or_database is not an instance of basestring (str in python 3) or Database.
Parameters:
- name_or_database: the name of a database to drop, or a Database instance representing the database to drop
- session (optional): a ClientSession.
Changed in version 3.6: Added session parameter.
Note
The write_concern of this client is automatically applied to this operation when using MongoDB >= 3.4.
Changed in version 3.4: Apply this client’s write concern automatically to this operation when connected to MongoDB >= 3.4.
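For example (a sketch; 'test_db' is an assumed database name):

client.drop_database('test_db')        # by name
client.drop_database(client.test_db)   # or as a Database instance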
-
get_default_database
(default=None, codec_options=None, read_preference=None, write_concern=None, read_concern=None)¶ Get the database named in the MongoDB connection URI.
>>> uri = 'mongodb://host/my_database'
>>> client = MongoClient(uri)
>>> db = client.get_default_database()
>>> assert db.name == 'my_database'
>>> db = client.get_database()
>>> assert db.name == 'my_database'
Useful in scripts where you want to choose which database to use based only on the URI in a configuration file.
Parameters: - default (optional): the database name to use if no database name was provided in the URI.
- codec_options (optional): An instance of
CodecOptions
. IfNone
(the default) thecodec_options
of thisMongoClient
is used. - read_preference (optional): The read preference to use. If
None
(the default) theread_preference
of thisMongoClient
is used. Seeread_preferences
for options. - write_concern (optional): An instance of
WriteConcern
. IfNone
(the default) thewrite_concern
of thisMongoClient
is used. - read_concern (optional): An instance of
ReadConcern
. IfNone
(the default) theread_concern
of thisMongoClient
is used.
Changed in version 3.8: Undeprecated. Added the default, codec_options, read_preference, write_concern and read_concern parameters.
Changed in version 3.5: Deprecated, use get_database() instead.
-
get_database
(name=None, codec_options=None, read_preference=None, write_concern=None, read_concern=None)¶ Get a
Database
with the given name and options.Useful for creating a
Database
with different codec options, read preference, and/or write concern from thisMongoClient
>>> client.read_preference
Primary()
>>> db1 = client.test
>>> db1.read_preference
Primary()
>>> from pymongo import ReadPreference
>>> db2 = client.get_database(
...     'test', read_preference=ReadPreference.SECONDARY)
>>> db2.read_preference
Secondary(tag_sets=None)
Parameters: - name (optional): The name of the database - a string. If
None
(the default) the database named in the MongoDB connection URI is returned. - codec_options (optional): An instance of
CodecOptions
. IfNone
(the default) thecodec_options
of thisMongoClient
is used. - read_preference (optional): The read preference to use. If
None
(the default) theread_preference
of thisMongoClient
is used. Seeread_preferences
for options. - write_concern (optional): An instance of
WriteConcern
. IfNone
(the default) thewrite_concern
of thisMongoClient
is used. - read_concern (optional): An instance of
ReadConcern
. IfNone
(the default) theread_concern
of thisMongoClient
is used.
Changed in version 3.5: The name parameter is now optional, defaulting to the database named in the MongoDB connection URI.
-
server_info
(session=None)¶ Get information about the MongoDB server we’re connected to.
Parameters: - session (optional): a
ClientSession
.
Changed in version 3.6: Added
session
parameter.
-
watch
(pipeline=None, full_document=None, resume_after=None, max_await_time_ms=None, batch_size=None, collation=None, start_at_operation_time=None, session=None, start_after=None)¶ Watch changes on this cluster.
Performs an aggregation with an implicit initial
$changeStream
stage and returns a ClusterChangeStream cursor which iterates over changes on all databases on this cluster.
Introduced in MongoDB 4.0.
with client.watch() as stream:
    for change in stream:
        print(change)
The ClusterChangeStream iterable blocks until the next change document is returned or an error is raised. If the next() method encounters a network error when retrieving a batch from the server, it will automatically attempt to recreate the cursor such that no change events are missed. Any error encountered during the resume attempt indicates there may be an outage and will be raised.

try:
    with client.watch(
            [{'$match': {'operationType': 'insert'}}]) as stream:
        for insert_change in stream:
            print(insert_change)
except pymongo.errors.PyMongoError:
    # The ChangeStream encountered an unrecoverable error or the
    # resume attempt failed to recreate the cursor.
    logging.error('...')
For a precise description of the resume process see the change streams specification.
Parameters: - pipeline (optional): A list of aggregation pipeline stages to
append to an initial
$changeStream
stage. Not all pipeline stages are valid after a$changeStream
stage, see the MongoDB documentation on change streams for the supported stages. - full_document (optional): The fullDocument to pass as an option
to the
$changeStream
stage. Allowed values: ‘updateLookup’. When set to ‘updateLookup’, the change notification for partial updates will include both a delta describing the changes to the document, as well as a copy of the entire document that was changed from some time after the change occurred. - resume_after (optional): A resume token. If provided, the change stream will start returning changes that occur directly after the operation specified in the resume token. A resume token is the _id value of a change document.
- max_await_time_ms (optional): The maximum time in milliseconds for the server to wait for changes before responding to a getMore operation.
- batch_size (optional): The maximum number of documents to return per batch.
- collation (optional): The
Collation
to use for the aggregation. - start_at_operation_time (optional): If provided, the resulting
change stream will only return changes that occurred at or after
the specified
Timestamp
. Requires MongoDB >= 4.0. - session (optional): a
ClientSession
. - start_after (optional): The same as resume_after except that start_after can resume notifications after an invalidate event. This option and resume_after are mutually exclusive.
Returns: A ClusterChangeStream cursor.
Changed in version 3.9: Added the start_after parameter.
New in version 3.7.
-
close_cursor
(cursor_id, address=None)¶ DEPRECATED - Send a kill cursors message soon with the given id.
Raises TypeError if cursor_id is not an instance of (int, long). What closing the cursor actually means depends on this client’s cursor manager.
This method may be called from a
Cursor
destructor during garbage collection, so it isn’t safe to take a lock or do network I/O. Instead, we schedule the cursor to be closed soon on a background thread.Parameters: - cursor_id: id of cursor to close
- address (optional): (host, port) pair of the cursor’s server. If it is not provided, the client attempts to close the cursor on the primary or standalone, or a mongos server.
Changed in version 3.7: Deprecated.
Changed in version 3.0: Added
address
parameter.
-
kill_cursors
(cursor_ids, address=None)¶ DEPRECATED - Send a kill cursors message soon with the given ids.
Raises
TypeError
if cursor_ids is not an instance of list.
Parameters:
- cursor_ids: list of cursor ids to kill
- address (optional): (host, port) pair of the cursor’s server. If it is not provided, the client attempts to close the cursor on the primary or standalone, or a mongos server.
Changed in version 3.3: Deprecated.
Changed in version 3.0: Now accepts an address argument. Schedules the cursors to be closed on a background thread instead of sending the message immediately.
-
set_cursor_manager
(manager_class)¶ DEPRECATED - Set this client’s cursor manager.
Raises
TypeError
if manager_class is not a subclass of CursorManager. A cursor manager handles closing cursors. Different managers can implement different policies in terms of when to actually kill a cursor that has been closed.
Parameters:
- manager_class: cursor manager to use
Changed in version 3.3: Deprecated, for real this time.
Changed in version 3.0: Undeprecated.
-
is_locked
¶ DEPRECATED: Is this server locked? While locked, all write operations are blocked, although read operations may still be allowed. Use
unlock()
to unlock.
Deprecated. Users of MongoDB version 3.2 or newer can run the currentOp command directly with command():

is_locked = client.admin.command('currentOp').get('fsyncLock')
Users of MongoDB version 2.6 and 3.0 can query the “inprog” virtual collection:
is_locked = client.admin["$cmd.sys.inprog"].find_one().get('fsyncLock')
Changed in version 3.11: Deprecated.
-
fsync
(**kwargs)¶ DEPRECATED: Flush all pending writes to datafiles.
- Optional parameters can be passed as keyword arguments:
- lock: If True lock the server to disallow writes.
- async: If True don’t block while synchronizing.
- session (optional): a
ClientSession
.
Note
Starting with Python 3.7 async is a reserved keyword. The async option to the fsync command can be passed using a dictionary instead:
options = {'async': True}
client.fsync(**options)
Deprecated. Run the fsync command directly with
command()
instead. For example:

client.admin.command('fsync', lock=True)
Changed in version 3.11: Deprecated.
Changed in version 3.6: Added
session
parameter.
Warning
async and lock cannot be used together.
Warning
MongoDB does not support the async option on Windows and will raise an exception on that platform.
-
unlock
(session=None)¶ DEPRECATED: Unlock a previously locked server.
Parameters: - session (optional): a
ClientSession
.
Deprecated. Users of MongoDB version 3.2 or newer can run the fsyncUnlock command directly with
command()
:

client.admin.command('fsyncUnlock')
Users of MongoDB version 2.6 and 3.0 can query the “unlock” virtual collection:
client.admin["$cmd.sys.unlock"].find_one()
Changed in version 3.11: Deprecated.
Changed in version 3.6: Added
session
parameter.
mongo_replica_set_client – Tools for connecting to a MongoDB replica set¶
Deprecated. See High Availability and PyMongo.
-
class
pymongo.mongo_replica_set_client.
MongoReplicaSetClient
(hosts_or_uri, document_class=dict, tz_aware=False, connect=True, **kwargs)¶ Deprecated alias for
MongoClient
. MongoReplicaSetClient will be removed in a future version of PyMongo.
Changed in version 3.0:
MongoClient
is now the one and only client class for a standalone server, mongos, or replica set. It includes the functionality that had been split into MongoReplicaSetClient: it can connect to a replica set, discover all its members, and monitor the set for stepdowns, elections, and reconfigs.
The refresh method is removed from MongoReplicaSetClient, as are the seeds and hosts properties.
-
close
()¶ Cleanup client resources and disconnect from MongoDB.
On MongoDB >= 3.6, end all server sessions created by this client by sending one or more endSessions commands.
Close all sockets in the connection pools and stop the monitor threads. If this instance is used again it will be automatically re-opened and the threads restarted unless auto encryption is enabled. A client enabled with auto encryption cannot be used again after being closed; any attempt will raise
InvalidOperation
.
Changed in version 3.6: End all server sessions created by this client.
-
c[db_name] || c.db_name
Get the db_name Database on MongoReplicaSetClient c.
Raises InvalidName if an invalid database name is used.
-
primary
¶ The (host, port) of the current primary of the replica set.
Returns
None
if this client is not connected to a replica set, there is no primary, or this client was created without the replicaSet option.New in version 3.0: MongoClient gained this property in version 3.0 when MongoReplicaSetClient’s functionality was merged in.
-
secondaries
¶ The secondary members known to this client.
A sequence of (host, port) pairs. Empty if this client is not connected to a replica set, there are no visible secondaries, or this client was created without the replicaSet option.
New in version 3.0: MongoClient gained this property in version 3.0 when MongoReplicaSetClient’s functionality was merged in.
-
arbiters
¶ Arbiters in the replica set.
A sequence of (host, port) pairs. Empty if this client is not connected to a replica set, there are no arbiters, or this client was created without the replicaSet option.
-
max_pool_size
¶ The maximum allowable number of concurrent connections to each connected server. Requests to a server will block if there are maxPoolSize outstanding connections to the requested server. Defaults to 100. Cannot be 0.
When a server’s pool has reached max_pool_size, operations for that server block waiting for a socket to be returned to the pool. If
waitQueueTimeoutMS
is set, a blocked operation will raise ConnectionFailure after a timeout. By default waitQueueTimeoutMS is not set.
-
max_bson_size
¶ The largest BSON object the connected server accepts in bytes.
If the client is not connected, this will block until a connection is established or raise ServerSelectionTimeoutError if no server is available.
-
max_message_size
¶ The largest message the connected server accepts in bytes.
If the client is not connected, this will block until a connection is established or raise ServerSelectionTimeoutError if no server is available.
-
local_threshold_ms
¶ The local threshold for this instance.
-
codec_options
¶ Read only access to the
CodecOptions
of this instance.
-
read_preference
¶ Read only access to the read preference of this instance.
Changed in version 3.0: The
read_preference
attribute is now read only.
-
write_concern
¶ Read only access to the
WriteConcern
of this instance.Changed in version 3.0: The
write_concern
attribute is now read only.
-
database_names
(session=None)¶ DEPRECATED: Get a list of the names of all databases on the connected server.
Parameters: - session (optional): a
ClientSession
.
Changed in version 3.7: Deprecated. Use
list_database_names()
instead.
Changed in version 3.6: Added
session
parameter.
-
drop_database
(name_or_database, session=None)¶ Drop a database.
Raises TypeError if name_or_database is not an instance of basestring (str in python 3) or Database.
Parameters:
- name_or_database: the name of a database to drop, or a Database instance representing the database to drop
- session (optional): a ClientSession.
Changed in version 3.6: Added session parameter.
Note
The write_concern of this client is automatically applied to this operation when using MongoDB >= 3.4.
Changed in version 3.4: Apply this client’s write concern automatically to this operation when connected to MongoDB >= 3.4.
-
get_database
(name=None, codec_options=None, read_preference=None, write_concern=None, read_concern=None)¶ Get a
Database
with the given name and options.Useful for creating a
Database
with different codec options, read preference, and/or write concern from thisMongoClient
.>>> client.read_preference Primary() >>> db1 = client.test >>> db1.read_preference Primary() >>> from pymongo import ReadPreference >>> db2 = client.get_database( ... 'test', read_preference=ReadPreference.SECONDARY) >>> db2.read_preference Secondary(tag_sets=None)
Parameters: - name (optional): The name of the database - a string. If
None
(the default) the database named in the MongoDB connection URI is returned. - codec_options (optional): An instance of
CodecOptions
. IfNone
(the default) thecodec_options
of thisMongoClient
is used. - read_preference (optional): The read preference to use. If
None
(the default) theread_preference
of thisMongoClient
is used. Seeread_preferences
for options. - write_concern (optional): An instance of
WriteConcern
. IfNone
(the default) thewrite_concern
of thisMongoClient
is used. - read_concern (optional): An instance of
ReadConcern
. IfNone
(the default) theread_concern
of thisMongoClient
is used.
Changed in version 3.5: The name parameter is now optional, defaulting to the database named in the MongoDB connection URI.
-
close_cursor
(cursor_id, address=None)¶ DEPRECATED - Send a kill cursors message soon with the given id.
Raises TypeError if cursor_id is not an instance of (int, long). What closing the cursor actually means depends on this client’s cursor manager.
This method may be called from a
Cursor
destructor during garbage collection, so it isn’t safe to take a lock or do network I/O. Instead, we schedule the cursor to be closed soon on a background thread.Parameters: - cursor_id: id of cursor to close
- address (optional): (host, port) pair of the cursor’s server. If it is not provided, the client attempts to close the cursor on the primary or standalone, or a mongos server.
Changed in version 3.7: Deprecated.
Changed in version 3.0: Added
address
parameter.
-
kill_cursors
(cursor_ids, address=None)¶ DEPRECATED - Send a kill cursors message soon with the given ids.
Raises
TypeError
if cursor_ids is not an instance of list.
Parameters:
- cursor_ids: list of cursor ids to kill
- address (optional): (host, port) pair of the cursor’s server. If it is not provided, the client attempts to close the cursor on the primary or standalone, or a mongos server.
Changed in version 3.3: Deprecated.
Changed in version 3.0: Now accepts an address argument. Schedules the cursors to be closed on a background thread instead of sending the message immediately.
-
set_cursor_manager
(manager_class)¶ DEPRECATED - Set this client’s cursor manager.
Raises
TypeError
if manager_class is not a subclass of CursorManager. A cursor manager handles closing cursors. Different managers can implement different policies in terms of when to actually kill a cursor that has been closed.
Parameters:
- manager_class: cursor manager to use
Changed in version 3.3: Deprecated, for real this time.
Changed in version 3.0: Undeprecated.
-
get_default_database
(default=None, codec_options=None, read_preference=None, write_concern=None, read_concern=None)¶ Get the database named in the MongoDB connection URI.
>>> uri = 'mongodb://host/my_database'
>>> client = MongoClient(uri)
>>> db = client.get_default_database()
>>> assert db.name == 'my_database'
>>> db = client.get_database()
>>> assert db.name == 'my_database'
Useful in scripts where you want to choose which database to use based only on the URI in a configuration file.
Parameters: - default (optional): the database name to use if no database name was provided in the URI.
- codec_options (optional): An instance of
CodecOptions
. IfNone
(the default) thecodec_options
of thisMongoClient
is used. - read_preference (optional): The read preference to use. If
None
(the default) theread_preference
of thisMongoClient
is used. Seeread_preferences
for options. - write_concern (optional): An instance of
WriteConcern
. IfNone
(the default) thewrite_concern
of thisMongoClient
is used. - read_concern (optional): An instance of
ReadConcern
. IfNone
(the default) theread_concern
of thisMongoClient
is used.
Changed in version 3.8: Undeprecated. Added the default, codec_options, read_preference, write_concern and read_concern parameters.
Changed in version 3.5: Deprecated, use get_database() instead.
-
monitoring – Tools for monitoring driver events.¶
Tools to monitor driver events.
New in version 3.1.
Attention
Starting in PyMongo 3.11, the monitoring classes outlined below
are included in the PyMongo distribution under the
event_loggers
submodule.
Use register()
to register global listeners for specific events.
Listeners must inherit from one of the abstract classes below and implement
the correct functions for that class.
For example, a simple command logger might be implemented like this:
import logging
from pymongo import monitoring
class CommandLogger(monitoring.CommandListener):
def started(self, event):
logging.info("Command {0.command_name} with request id "
"{0.request_id} started on server "
"{0.connection_id}".format(event))
def succeeded(self, event):
logging.info("Command {0.command_name} with request id "
"{0.request_id} on server {0.connection_id} "
"succeeded in {0.duration_micros} "
"microseconds".format(event))
def failed(self, event):
logging.info("Command {0.command_name} with request id "
"{0.request_id} on server {0.connection_id} "
"failed in {0.duration_micros} "
"microseconds".format(event))
monitoring.register(CommandLogger())
Server discovery and monitoring events are also available. For example:
class ServerLogger(monitoring.ServerListener):
def opened(self, event):
logging.info("Server {0.server_address} added to topology "
"{0.topology_id}".format(event))
def description_changed(self, event):
previous_server_type = event.previous_description.server_type
new_server_type = event.new_description.server_type
if new_server_type != previous_server_type:
# server_type_name was added in PyMongo 3.4
logging.info(
"Server {0.server_address} changed type from "
"{0.previous_description.server_type_name} to "
"{0.new_description.server_type_name}".format(event))
def closed(self, event):
logging.warning("Server {0.server_address} removed from topology "
"{0.topology_id}".format(event))
class HeartbeatLogger(monitoring.ServerHeartbeatListener):
def started(self, event):
logging.info("Heartbeat sent to server "
"{0.connection_id}".format(event))
def succeeded(self, event):
# The reply.document attribute was added in PyMongo 3.4.
logging.info("Heartbeat to server {0.connection_id} "
"succeeded with reply "
"{0.reply.document}".format(event))
def failed(self, event):
logging.warning("Heartbeat to server {0.connection_id} "
"failed with error {0.reply}".format(event))
class TopologyLogger(monitoring.TopologyListener):
def opened(self, event):
logging.info("Topology with id {0.topology_id} "
"opened".format(event))
def description_changed(self, event):
logging.info("Topology description updated for "
"topology id {0.topology_id}".format(event))
previous_topology_type = event.previous_description.topology_type
new_topology_type = event.new_description.topology_type
if new_topology_type != previous_topology_type:
# topology_type_name was added in PyMongo 3.4
logging.info(
"Topology {0.topology_id} changed type from "
"{0.previous_description.topology_type_name} to "
"{0.new_description.topology_type_name}".format(event))
# The has_writable_server and has_readable_server methods
# were added in PyMongo 3.4.
if not event.new_description.has_writable_server():
logging.warning("No writable servers available.")
if not event.new_description.has_readable_server():
logging.warning("No readable servers available.")
def closed(self, event):
logging.info("Topology with id {0.topology_id} "
"closed".format(event))
Connection monitoring and pooling events are also available. For example:
class ConnectionPoolLogger(ConnectionPoolListener):
def pool_created(self, event):
logging.info("[pool {0.address}] pool created".format(event))
def pool_cleared(self, event):
logging.info("[pool {0.address}] pool cleared".format(event))
def pool_closed(self, event):
logging.info("[pool {0.address}] pool closed".format(event))
def connection_created(self, event):
logging.info("[pool {0.address}][conn #{0.connection_id}] "
"connection created".format(event))
def connection_ready(self, event):
logging.info("[pool {0.address}][conn #{0.connection_id}] "
"connection setup succeeded".format(event))
def connection_closed(self, event):
logging.info("[pool {0.address}][conn #{0.connection_id}] "
"connection closed, reason: "
"{0.reason}".format(event))
def connection_check_out_started(self, event):
logging.info("[pool {0.address}] connection check out "
"started".format(event))
def connection_check_out_failed(self, event):
logging.info("[pool {0.address}] connection check out "
"failed, reason: {0.reason}".format(event))
def connection_checked_out(self, event):
logging.info("[pool {0.address}][conn #{0.connection_id}] "
"connection checked out of pool".format(event))
def connection_checked_in(self, event):
logging.info("[pool {0.address}][conn #{0.connection_id}] "
"connection checked into pool".format(event))
Event listeners can also be registered per instance of
MongoClient
:
client = MongoClient(event_listeners=[CommandLogger()])
Note that previously registered global listeners are automatically included when configuring per client event listeners. Registering a new global listener will not add that listener to existing client instances.
Note
Events are delivered synchronously. Application threads block
waiting for event handlers (e.g. started()
) to
return. Care must be taken to ensure that your event handlers are efficient
enough to not adversely affect overall application performance.
Warning
The command documents published through this API are not copies. If you intend to modify them in any way you must copy them in your event handler first.
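For example, a listener that needs to keep or modify a command document can take a copy in its handler first. A minimal sketch (the SafeCommandLogger name and the logged fields are illustrative, not part of the API):
import logging

from pymongo import monitoring

class SafeCommandLogger(monitoring.CommandListener):
    def started(self, event):
        # event.command is shared with the driver; copy before modifying.
        command = dict(event.command)
        command.pop('lsid', None)  # e.g. drop session info before logging
        logging.info("started: %s", command)

    def succeeded(self, event):
        pass

    def failed(self, event):
        pass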
-
pymongo.monitoring.
register
(listener)¶ Register a global event listener.
Parameters: - listener: A subclass of CommandListener, ServerHeartbeatListener, ServerListener, TopologyListener, or ConnectionPoolListener.
-
class
pymongo.monitoring.
CommandListener
¶ Abstract base class for command listeners.
Handles CommandStartedEvent, CommandSucceededEvent, and CommandFailedEvent.
-
failed
(event)¶ Abstract method to handle a CommandFailedEvent.
Parameters: - event: An instance of CommandFailedEvent.
-
started
(event)¶ Abstract method to handle a CommandStartedEvent.
Parameters: - event: An instance of CommandStartedEvent.
-
succeeded
(event)¶ Abstract method to handle a CommandSucceededEvent.
Parameters: - event: An instance of CommandSucceededEvent.
-
-
class
pymongo.monitoring.
ServerListener
¶ Abstract base class for server listeners. Handles ServerOpeningEvent, ServerDescriptionChangedEvent, and ServerClosedEvent.
New in version 3.3.
-
closed
(event)¶ Abstract method to handle a ServerClosedEvent.
Parameters: - event: An instance of ServerClosedEvent.
-
description_changed
(event)¶ Abstract method to handle a ServerDescriptionChangedEvent.
Parameters: - event: An instance of ServerDescriptionChangedEvent.
-
opened
(event)¶ Abstract method to handle a ServerOpeningEvent.
Parameters: - event: An instance of ServerOpeningEvent.
-
-
class
pymongo.monitoring.
ServerHeartbeatListener
¶ Abstract base class for server heartbeat listeners.
Handles ServerHeartbeatStartedEvent, ServerHeartbeatSucceededEvent, and ServerHeartbeatFailedEvent.
New in version 3.3.
-
failed
(event)¶ Abstract method to handle a ServerHeartbeatFailedEvent.
Parameters: - event: An instance of ServerHeartbeatFailedEvent.
-
started
(event)¶ Abstract method to handle a ServerHeartbeatStartedEvent.
Parameters: - event: An instance of ServerHeartbeatStartedEvent.
-
succeeded
(event)¶ Abstract method to handle a ServerHeartbeatSucceededEvent.
Parameters: - event: An instance of ServerHeartbeatSucceededEvent.
-
-
class
pymongo.monitoring.
TopologyListener
¶ Abstract base class for topology monitoring listeners. Handles TopologyOpenedEvent, TopologyDescriptionChangedEvent, and TopologyClosedEvent.
New in version 3.3.
-
closed
(event)¶ Abstract method to handle a TopologyClosedEvent.
Parameters: - event: An instance of TopologyClosedEvent.
-
description_changed
(event)¶ Abstract method to handle a TopologyDescriptionChangedEvent.
Parameters: - event: An instance of TopologyDescriptionChangedEvent.
-
opened
(event)¶ Abstract method to handle a TopologyOpenedEvent.
Parameters: - event: An instance of TopologyOpenedEvent.
-
-
class
pymongo.monitoring.
ConnectionPoolListener
¶ Abstract base class for connection pool listeners.
Handles all of the connection pool events defined in the Connection Monitoring and Pooling Specification:
PoolCreatedEvent, PoolClearedEvent, PoolClosedEvent, ConnectionCreatedEvent, ConnectionReadyEvent, ConnectionClosedEvent, ConnectionCheckOutStartedEvent, ConnectionCheckOutFailedEvent, ConnectionCheckedOutEvent, and ConnectionCheckedInEvent.
New in version 3.9.
-
connection_check_out_failed
(event)¶ Abstract method to handle a ConnectionCheckOutFailedEvent.
Emitted when the driver’s attempt to check out a connection fails.
Parameters: - event: An instance of ConnectionCheckOutFailedEvent.
-
connection_check_out_started
(event)¶ Abstract method to handle a ConnectionCheckOutStartedEvent.
Emitted when the driver starts attempting to check out a connection.
Parameters: - event: An instance of ConnectionCheckOutStartedEvent.
-
connection_checked_in
(event)¶ Abstract method to handle a ConnectionCheckedInEvent.
Emitted when the driver checks in a Connection back to the Connection Pool.
Parameters: - event: An instance of ConnectionCheckedInEvent.
-
connection_checked_out
(event)¶ Abstract method to handle a ConnectionCheckedOutEvent.
Emitted when the driver successfully checks out a Connection.
Parameters: - event: An instance of ConnectionCheckedOutEvent.
-
connection_closed
(event)¶ Abstract method to handle a ConnectionClosedEvent.
Emitted when a Connection Pool closes a Connection.
Parameters: - event: An instance of ConnectionClosedEvent.
-
connection_created
(event)¶ Abstract method to handle a ConnectionCreatedEvent.
Emitted when a Connection Pool creates a Connection object.
Parameters: - event: An instance of ConnectionCreatedEvent.
-
connection_ready
(event)¶ Abstract method to handle a ConnectionReadyEvent.
Emitted when a Connection has finished its setup, and is now ready to use.
Parameters: - event: An instance of ConnectionReadyEvent.
-
pool_cleared
(event)¶ Abstract method to handle a PoolClearedEvent.
Emitted when a Connection Pool is cleared.
Parameters: - event: An instance of PoolClearedEvent.
-
pool_closed
(event)¶ Abstract method to handle a PoolClosedEvent.
Emitted when a Connection Pool is closed.
Parameters: - event: An instance of PoolClosedEvent.
-
pool_created
(event)¶ Abstract method to handle a PoolCreatedEvent.
Emitted when a Connection Pool is created.
Parameters: - event: An instance of PoolCreatedEvent.
-
-
class
pymongo.monitoring.
CommandStartedEvent
(command, database_name, *args)¶ Event published when a command starts.
Parameters: - command: The command document.
- database_name: The name of the database this command was run against.
- request_id: The request id for this operation.
- connection_id: The address (host, port) of the server this command was sent to.
- operation_id: An optional identifier for a series of related events.
-
command
¶ The command document.
-
command_name
¶ The command name.
-
connection_id
¶ The address (host, port) of the server this command was sent to.
-
database_name
¶ The name of the database this command was run against.
-
operation_id
¶ An id for this series of events or None.
-
request_id
¶ The request id for this operation.
-
class
pymongo.monitoring.
CommandSucceededEvent
(duration, reply, command_name, request_id, connection_id, operation_id)¶ Event published when a command succeeds.
Parameters: - duration: The command duration as a datetime.timedelta.
- reply: The server reply document.
- command_name: The command name.
- request_id: The request id for this operation.
- connection_id: The address (host, port) of the server this command was sent to.
- operation_id: An optional identifier for a series of related events.
-
command_name
¶ The command name.
-
connection_id
¶ The address (host, port) of the server this command was sent to.
-
duration_micros
¶ The duration of this operation in microseconds.
-
operation_id
¶ An id for this series of events or None.
-
reply
¶ The server reply document for this operation.
-
request_id
¶ The request id for this operation.
-
class
pymongo.monitoring.
CommandFailedEvent
(duration, failure, *args)¶ Event published when a command fails.
Parameters: - duration: The command duration as a datetime.timedelta.
- failure: The server reply document.
- command_name: The command name.
- request_id: The request id for this operation.
- connection_id: The address (host, port) of the server this command was sent to.
- operation_id: An optional identifier for a series of related events.
-
command_name
¶ The command name.
-
connection_id
¶ The address (host, port) of the server this command was sent to.
-
duration_micros
¶ The duration of this operation in microseconds.
-
failure
¶ The server failure document for this operation.
-
operation_id
¶ An id for this series of events or None.
-
request_id
¶ The request id for this operation.
-
class
pymongo.monitoring.
ServerDescriptionChangedEvent
(previous_description, new_description, *args)¶ Published when server description changes.
New in version 3.3.
-
new_description
¶ The new
ServerDescription
.
-
previous_description
¶ The previous
ServerDescription
.
-
server_address
¶ The address (host, port) pair of the server
-
topology_id
¶ A unique identifier for the topology this server is a part of.
-
-
class
pymongo.monitoring.
ServerOpeningEvent
(server_address, topology_id)¶ Published when server is initialized.
New in version 3.3.
-
server_address
¶ The address (host, port) pair of the server
-
topology_id
¶ A unique identifier for the topology this server is a part of.
-
-
class
pymongo.monitoring.
ServerClosedEvent
(server_address, topology_id)¶ Published when server is closed.
New in version 3.3.
-
server_address
¶ The address (host, port) pair of the server
-
topology_id
¶ A unique identifier for the topology this server is a part of.
-
-
class
pymongo.monitoring.
TopologyDescriptionChangedEvent
(previous_description, new_description, *args)¶ Published when the topology description changes.
New in version 3.3.
-
new_description
¶ The new
TopologyDescription
.
-
previous_description
¶ The previous
TopologyDescription
.
-
topology_id
¶ A unique identifier for the topology this server is a part of.
-
-
class
pymongo.monitoring.
TopologyOpenedEvent
(topology_id)¶ Published when the topology is initialized.
New in version 3.3.
-
topology_id
¶ A unique identifier for the topology this server is a part of.
-
-
class
pymongo.monitoring.
TopologyClosedEvent
(topology_id)¶ Published when the topology is closed.
New in version 3.3.
-
topology_id
¶ A unique identifier for the topology this server is a part of.
-
-
class
pymongo.monitoring.
ServerHeartbeatStartedEvent
(connection_id)¶ Published when a heartbeat is started.
New in version 3.3.
-
connection_id
¶ The address (host, port) of the server this heartbeat was sent to.
-
-
class
pymongo.monitoring.
ServerHeartbeatSucceededEvent
(duration, reply, connection_id, awaited=False)¶ Fired when the server heartbeat succeeds.
New in version 3.3.
-
awaited
¶ Whether the heartbeat was awaited.
If true, then
duration()
reflects the sum of the round trip time to the server and the time that the server waited before sending a response.
-
connection_id
¶ The address (host, port) of the server this heartbeat was sent to.
-
duration
¶ The duration of this heartbeat in microseconds.
-
-
class
pymongo.monitoring.
ServerHeartbeatFailedEvent
(duration, reply, connection_id, awaited=False)¶ Fired when the server heartbeat fails, either with an “ok: 0” or a socket exception.
New in version 3.3.
-
awaited
¶ Whether the heartbeat was awaited.
If true, then
duration()
reflects the sum of the round trip time to the server and the time that the server waited before sending a response.
-
connection_id
¶ The address (host, port) of the server this heartbeat was sent to.
-
duration
¶ The duration of this heartbeat in microseconds.
-
-
class
pymongo.monitoring.
PoolCreatedEvent
(address, options)¶ Published when a Connection Pool is created.
Parameters: - address: The address (host, port) pair of the server this Pool is attempting to connect to.
- options: Any non-default pool options that were set on this Connection Pool.
New in version 3.9.
-
address
¶ The address (host, port) pair of the server the pool is attempting to connect to.
-
options
¶ Any non-default pool options that were set on this Connection Pool.
-
class
pymongo.monitoring.
PoolClearedEvent
(address)¶ Published when a Connection Pool is cleared.
Parameters: - address: The address (host, port) pair of the server this Pool is attempting to connect to.
New in version 3.9.
-
address
¶ The address (host, port) pair of the server the pool is attempting to connect to.
-
class
pymongo.monitoring.
PoolClosedEvent
(address)¶ Published when a Connection Pool is closed.
Parameters: - address: The address (host, port) pair of the server this Pool is attempting to connect to.
New in version 3.9.
-
address
¶ The address (host, port) pair of the server the pool is attempting to connect to.
-
class
pymongo.monitoring.
ConnectionCreatedEvent
(address, connection_id)¶ Published when a Connection Pool creates a Connection object.
NOTE: This connection is not ready for use until the ConnectionReadyEvent is published.
Parameters: - address: The address (host, port) pair of the server this Connection is attempting to connect to.
- connection_id: The integer ID of the Connection in this Pool.
New in version 3.9.
-
address
¶ The address (host, port) pair of the server this connection is attempting to connect to.
-
connection_id
¶ The ID of the Connection.
-
class
pymongo.monitoring.
ConnectionReadyEvent
(address, connection_id)¶ Published when a Connection has finished its setup, and is ready to use.
Parameters: - address: The address (host, port) pair of the server this Connection is attempting to connect to.
- connection_id: The integer ID of the Connection in this Pool.
New in version 3.9.
-
address
¶ The address (host, port) pair of the server this connection is attempting to connect to.
-
connection_id
¶ The ID of the Connection.
-
class
pymongo.monitoring.
ConnectionClosedReason
¶ An enum that defines values for reason on a ConnectionClosedEvent.
New in version 3.9.
-
ERROR
= 'error'¶ The connection experienced an error, making it no longer valid.
-
IDLE
= 'idle'¶ The connection became stale by being idle for too long (maxIdleTimeMS).
-
POOL_CLOSED
= 'poolClosed'¶ The pool was closed, making the connection no longer valid.
-
STALE
= 'stale'¶ The pool was cleared, making the connection no longer valid.
-
-
class
pymongo.monitoring.
ConnectionClosedEvent
(address, connection_id, reason)¶ Published when a Connection is closed.
Parameters: - address: The address (host, port) pair of the server this Connection is attempting to connect to.
- connection_id: The integer ID of the Connection in this Pool.
- reason: A reason explaining why this connection was closed.
New in version 3.9.
-
address
¶ The address (host, port) pair of the server this connection is attempting to connect to.
-
connection_id
¶ The ID of the Connection.
-
reason
¶ A reason explaining why this connection was closed.
The reason must be one of the strings from the
ConnectionClosedReason
enum.
-
class
pymongo.monitoring.
ConnectionCheckOutStartedEvent
(address)¶ Published when the driver starts attempting to check out a connection.
Parameters: - address: The address (host, port) pair of the server this Connection is attempting to connect to.
New in version 3.9.
-
address
¶ The address (host, port) pair of the server this connection is attempting to connect to.
-
class
pymongo.monitoring.
ConnectionCheckOutFailedReason
¶ An enum that defines values for reason on a ConnectionCheckOutFailedEvent.
New in version 3.9.
-
CONN_ERROR
= 'connectionError'¶ The connection check out attempt experienced an error while setting up a new connection.
-
POOL_CLOSED
= 'poolClosed'¶ The pool was previously closed, and cannot provide new connections.
-
TIMEOUT
= 'timeout'¶ The connection check out attempt exceeded the specified timeout.
-
-
class
pymongo.monitoring.
ConnectionCheckOutFailedEvent
(address, reason)¶ Published when the driver’s attempt to check out a connection fails.
Parameters: - address: The address (host, port) pair of the server this Connection is attempting to connect to.
- reason: A reason explaining why connection check out failed.
New in version 3.9.
-
address
¶ The address (host, port) pair of the server this connection is attempting to connect to.
-
reason
¶ A reason explaining why connection check out failed.
The reason must be one of the strings from the
ConnectionCheckOutFailedReason
enum.
-
class
pymongo.monitoring.
ConnectionCheckedOutEvent
(address, connection_id)¶ Published when the driver successfully checks out a Connection.
Parameters: - address: The address (host, port) pair of the server this Connection is attempting to connect to.
- connection_id: The integer ID of the Connection in this Pool.
New in version 3.9.
-
address
¶ The address (host, port) pair of the server this connection is attempting to connect to.
-
connection_id
¶ The ID of the Connection.
-
class
pymongo.monitoring.
ConnectionCheckedInEvent
(address, connection_id)¶ Published when the driver checks in a Connection into the Pool.
Parameters: - address: The address (host, port) pair of the server this Connection is attempting to connect to.
- connection_id: The integer ID of the Connection in this Pool.
New in version 3.9.
-
address
¶ The address (host, port) pair of the server this connection is attempting to connect to.
-
connection_id
¶ The ID of the Connection.
operations
– Operation class definitions¶
Operation class definitions.
-
class
pymongo.operations.
DeleteMany
(filter, collation=None, hint=None)¶ Create a DeleteMany instance.
For use with bulk_write().
Parameters: - filter: A query that matches the documents to delete.
- collation (optional): An instance of Collation. This option is only supported on MongoDB 3.4 and above.
- hint (optional): An index to use to support the query predicate specified either by its string name, or in the same format as passed to create_index() (e.g. [('field', ASCENDING)]). This option is only supported on MongoDB 4.4 and above.
Changed in version 3.11: Added the hint option.
Changed in version 3.5: Added the collation option.
-
class
pymongo.operations.
DeleteOne
(filter, collation=None, hint=None)¶ Create a DeleteOne instance.
For use with bulk_write().
Parameters: - filter: A query that matches the document to delete.
- collation (optional): An instance of Collation. This option is only supported on MongoDB 3.4 and above.
- hint (optional): An index to use to support the query predicate specified either by its string name, or in the same format as passed to create_index() (e.g. [('field', ASCENDING)]). This option is only supported on MongoDB 4.4 and above.
Changed in version 3.11: Added the hint option.
Changed in version 3.5: Added the collation option.
-
class
pymongo.operations.
IndexModel
(keys, **kwargs)¶ Create an Index instance.
For use with create_indexes().
Takes either a single key or a list of (key, direction) pairs. The key(s) must be an instance of basestring (str in python 3), and the direction(s) must be one of (ASCENDING, DESCENDING, GEO2D, GEOHAYSTACK, GEOSPHERE, HASHED, TEXT).
Valid options include, but are not limited to:
- name: custom name to use for this index - if none is given, a name will be generated.
- unique: if True, creates a uniqueness constraint on the index.
- background: if True, this index should be created in the background.
- sparse: if True, omit from the index any documents that lack the indexed field.
- bucketSize: for use with geoHaystack indexes. Number of documents to group together within a certain proximity to a given longitude and latitude.
- min: minimum value for keys in a GEO2D index.
- max: maximum value for keys in a GEO2D index.
- expireAfterSeconds: <int> Used to create an expiring (TTL) collection. MongoDB will automatically delete documents from this collection after <int> seconds. The indexed field must be a UTC datetime or the data will not expire.
- partialFilterExpression: A document that specifies a filter for a partial index. Requires MongoDB >= 3.2.
- collation: An instance of Collation that specifies the collation to use in MongoDB >= 3.4.
- wildcardProjection: Allows users to include or exclude specific field paths from a wildcard index using the { “$**” : 1} key pattern. Requires MongoDB >= 4.2.
- hidden: if True, this index will be hidden from the query planner and will not be evaluated as part of query plan selection. Requires MongoDB >= 4.4.
See the MongoDB documentation for a full list of supported options by server version.
Parameters: - keys: a single key or a list of (key, direction) pairs specifying the index to create
- **kwargs (optional): any additional index creation options (see the above list) should be passed as keyword arguments
Changed in version 3.11: Added the hidden option.
Changed in version 3.2: Added the partialFilterExpression option to support partial indexes.
-
document
¶ An index document suitable for passing to the createIndexes command.
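For example, several indexes can be declared up front and created in one call to create_indexes(). A minimal sketch (the database and collection names are illustrative):
from pymongo import ASCENDING, DESCENDING, IndexModel, MongoClient

client = MongoClient()  # assumes a local mongod
orders = client.test.orders  # illustrative namespace

indexes = [
    # Unique compound index with an explicit name.
    IndexModel([('user_id', ASCENDING), ('created', DESCENDING)],
               name='user_created', unique=True),
    # TTL index: documents expire an hour after their 'created' datetime.
    IndexModel([('created', ASCENDING)], expireAfterSeconds=3600),
]
print(orders.create_indexes(indexes))  # the names of the created indexes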
-
class
pymongo.operations.
InsertOne
(document)¶ Create an InsertOne instance.
For use with bulk_write().
Parameters: - document: The document to insert. If the document is missing an _id field one will be added.
-
class
pymongo.operations.
ReplaceOne
(filter, replacement, upsert=False, collation=None, hint=None)¶ Create a ReplaceOne instance.
For use with bulk_write().
Parameters: - filter: A query that matches the document to replace.
- replacement: The new document.
- upsert (optional): If True, perform an insert if no documents match the filter.
- collation (optional): An instance of Collation. This option is only supported on MongoDB 3.4 and above.
- hint (optional): An index to use to support the query predicate specified either by its string name, or in the same format as passed to create_index() (e.g. [('field', ASCENDING)]). This option is only supported on MongoDB 4.2 and above.
Changed in version 3.11: Added the hint option.
Changed in version 3.5: Added the collation option.
-
class
pymongo.operations.
UpdateMany
(filter, update, upsert=False, collation=None, array_filters=None, hint=None)¶ Create an UpdateMany instance.
For use with bulk_write().
Parameters: - filter: A query that matches the documents to update.
- update: The modifications to apply.
- upsert (optional): If True, perform an insert if no documents match the filter.
- collation (optional): An instance of Collation. This option is only supported on MongoDB 3.4 and above.
- array_filters (optional): A list of filters specifying which array elements an update should apply. Requires MongoDB 3.6+.
- hint (optional): An index to use to support the query predicate specified either by its string name, or in the same format as passed to create_index() (e.g. [('field', ASCENDING)]). This option is only supported on MongoDB 4.2 and above.
Changed in version 3.11: Added the hint option.
Changed in version 3.9: Added the ability to accept a pipeline as the update.
Changed in version 3.6: Added the array_filters option.
Changed in version 3.5: Added the collation option.
-
class
pymongo.operations.
UpdateOne
(filter, update, upsert=False, collation=None, array_filters=None, hint=None)¶ Represents an update_one operation.
For use with bulk_write().
Parameters: - filter: A query that matches the document to update.
- update: The modifications to apply.
- upsert (optional): If True, perform an insert if no documents match the filter.
- collation (optional): An instance of Collation. This option is only supported on MongoDB 3.4 and above.
- array_filters (optional): A list of filters specifying which array elements an update should apply. Requires MongoDB 3.6+.
- hint (optional): An index to use to support the query predicate specified either by its string name, or in the same format as passed to create_index() (e.g. [('field', ASCENDING)]). This option is only supported on MongoDB 4.2 and above.
Changed in version 3.11: Added the hint option.
Changed in version 3.9: Added the ability to accept a pipeline as the update.
Changed in version 3.6: Added the array_filters option.
Changed in version 3.5: Added the collation option.
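Taken together, these operation classes are passed as a list to bulk_write(), which executes them in order unless ordered=False is given. A minimal sketch (the namespace is illustrative):
from pymongo import (DeleteOne, InsertOne, MongoClient, ReplaceOne,
                     UpdateMany)

client = MongoClient()  # assumes a local mongod
coll = client.test.test  # illustrative namespace

requests = [
    InsertOne({'y': 1}),
    UpdateMany({'y': 1}, {'$set': {'z': 0}}),
    DeleteOne({'x': 1}),
    ReplaceOne({'w': 1}, {'z': 1}, upsert=True),
]
result = coll.bulk_write(requests)
print(result.inserted_count, result.modified_count, result.upserted_ids)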
pool
– Pool module for use with a MongoDB client.¶
-
class
pymongo.pool.
SocketInfo
(sock, pool, address, id)¶ Store a socket with some metadata.
Parameters: - sock: a raw socket object
- pool: a Pool instance
- address: the server’s (host, port)
- id: the id of this socket in its pool
-
authenticate
(credentials)¶ Log in to the server and store these credentials in authset.
Can raise ConnectionFailure or OperationFailure.
Parameters: - credentials: A MongoCredential.
-
check_auth
(all_credentials)¶ Update this socket’s authentication.
Log in or out to bring this socket’s credentials up to date with those provided. Can raise ConnectionFailure or OperationFailure.
Parameters: - all_credentials: dict, maps auth source to MongoCredential.
-
close_socket
(reason)¶ Close this connection with a reason.
-
command
(dbname, spec, slave_ok=False, read_preference=Primary(), codec_options=CodecOptions(document_class=dict, tz_aware=False, uuid_representation=UuidRepresentation.PYTHON_LEGACY, unicode_decode_error_handler='strict', tzinfo=None, type_registry=TypeRegistry(type_codecs=[], fallback_encoder=None)), check=True, allowable_errors=None, check_keys=False, read_concern=None, write_concern=None, parse_write_concern_error=False, collation=None, session=None, client=None, retryable_write=False, publish_events=True, user_fields=None, exhaust_allowed=False)¶ Execute a command or raise an error.
Parameters: - dbname: name of the database on which to run the command
- spec: a command document as a dict, SON, or mapping object
- slave_ok: whether to set the SlaveOkay wire protocol bit
- read_preference: a read preference
- codec_options: a CodecOptions instance
- check: raise OperationFailure if there are errors
- allowable_errors: errors to ignore if check is True
- check_keys: if True, check spec for invalid keys
- read_concern: The read concern for this command.
- write_concern: The write concern for this command.
- parse_write_concern_error: Whether to parse the
writeConcernError
field in the command response. - collation: The collation for this command.
- session: optional ClientSession instance.
- client: optional MongoClient for gossipping $clusterTime.
- retryable_write: True if this command is a retryable write.
- publish_events: Should we publish events for this command?
- user_fields (optional): Response fields that should be decoded using the TypeDecoders from codec_options, passed to bson._decode_all_selective.
-
idle_time_seconds
()¶ Seconds since this socket was last checked into its pool.
-
legacy_write
(request_id, msg, max_doc_size, with_last_error)¶ Send OP_INSERT, etc., optionally returning response as a dict.
Can raise ConnectionFailure or OperationFailure.
Parameters: - request_id: an int.
- msg: bytes, an OP_INSERT, OP_UPDATE, or OP_DELETE message, perhaps with a getlasterror command appended.
- max_doc_size: size in bytes of the largest document in msg.
- with_last_error: True if a getlasterror command is appended.
-
receive_message
(request_id)¶ Receive a raw BSON message or raise ConnectionFailure.
If any exception is raised, the socket is closed.
-
send_cluster_time
(command, session, client)¶ Add cluster time for MongoDB >= 3.6.
-
send_message
(message, max_doc_size)¶ Send a raw BSON message or raise ConnectionFailure.
If a network exception is raised, the socket is closed.
-
socket_closed
()¶ Return True if we know socket has been closed, False otherwise.
-
validate_session
(client, session)¶ Validate this session before use with client.
Raises error if this session is logged in as a different user or the client is not the one that created the session.
-
write_command
(request_id, msg)¶ Send “insert” etc. command, returning response as a dict.
Can raise ConnectionFailure or OperationFailure.
Parameters: - request_id: an int.
- msg: bytes, the command message.
read_concern
– Tools for working with read concern.¶
Tools for working with read concerns.
-
class
pymongo.read_concern.
ReadConcern
(level=None)¶ Parameters: - level: (string) The read concern level specifies the level of isolation for read operations. For example, a read operation using a read concern level of majority will only return data that has been written to a majority of nodes. If the level is left unspecified, the server default will be used.
New in version 3.2.
-
document
¶ The document representation of this read concern.
Note
ReadConcern is immutable. Mutating the value of document does not mutate this ReadConcern.
-
level
¶ The read concern level.
-
ok_for_legacy
¶ Return
True
if this read concern is compatible with old wire protocol versions.
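For example, a read concern is typically attached to a database or collection with with_options(). A minimal sketch (the namespace is illustrative; 'majority' assumes a server that supports it):
from pymongo import MongoClient
from pymongo.read_concern import ReadConcern

client = MongoClient()  # assumes a local mongod
coll = client.test.test.with_options(
    read_concern=ReadConcern('majority'))
print(coll.read_concern.document)  # {'level': 'majority'}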
read_preferences
– Utilities for choosing which member of a replica set to read from.¶
Utilities for choosing which member of a replica set to read from.
-
class
pymongo.read_preferences.
Primary
¶ Primary read preference.
- When directly connected to one mongod queries are allowed if the server is standalone or a replica set primary.
- When connected to a mongos queries are sent to the primary of a shard.
- When connected to a replica set queries are sent to the primary of the replica set.
-
document
¶ Read preference as a document.
-
mode
¶ The mode of this read preference instance.
-
name
¶ The name of this read preference.
-
class
pymongo.read_preferences.
PrimaryPreferred
(tag_sets=None, max_staleness=-1, hedge=None)¶ PrimaryPreferred read preference.
- When directly connected to one mongod queries are allowed to standalone servers, to a replica set primary, or to replica set secondaries.
- When connected to a mongos queries are sent to the primary of a shard if available, otherwise a shard secondary.
- When connected to a replica set queries are sent to the primary if available, otherwise a secondary.
Parameters: - tag_sets: The tag_sets to use if the primary is not available.
- max_staleness: (integer, in seconds) The maximum estimated length of time a replica set secondary can fall behind the primary in replication before it will no longer be selected for operations. Default -1, meaning no maximum. If it is set, it must be at least 90 seconds.
- hedge: The hedge to use if the primary is not available.
Changed in version 3.11: Added hedge parameter.
-
document
¶ Read preference as a document.
-
hedge
¶ The read preference hedge parameter.
A dictionary that configures how the server will perform hedged reads. It consists of the following keys:
- enabled: Enables or disables hedged reads in sharded clusters.
Hedged reads are automatically enabled in MongoDB 4.4+ when using a nearest read preference. To explicitly enable hedged reads, set the enabled key to true:
>>> Nearest(hedge={'enabled': True})
To explicitly disable hedged reads, set the enabled key to False:
>>> Nearest(hedge={'enabled': False})
New in version 3.11.
-
max_staleness
¶ The maximum estimated length of time (in seconds) a replica set secondary can fall behind the primary in replication before it will no longer be selected for operations, or -1 for no maximum.
-
min_wire_version
¶ The wire protocol version the server must support.
Some read preferences impose version requirements on all servers (e.g. maxStalenessSeconds requires MongoDB 3.4 / maxWireVersion 5).
All servers’ maxWireVersion must be at least this read preference’s min_wire_version, or the driver raises
ConfigurationError
.
-
mode
¶ The mode of this read preference instance.
-
mongos_mode
¶ The mongos mode of this read preference.
-
name
¶ The name of this read preference.
-
tag_sets
¶ Set tag_sets to a list of dictionaries like [{'dc': 'ny'}] to read only from members whose dc tag has the value "ny". To specify a priority-order for tag sets, provide a list of tag sets: [{'dc': 'ny'}, {'dc': 'la'}, {}]. A final, empty tag set, {}, means “read from any member that matches the mode, ignoring tags.” MongoReplicaSetClient tries each set of tags in turn until it finds a set of tags with at least one matching member.
See also
-
class
pymongo.read_preferences.
Secondary
(tag_sets=None, max_staleness=-1, hedge=None)¶ Secondary read preference.
- When directly connected to one mongod queries are allowed to standalone servers, to a replica set primary, or to replica set secondaries.
- When connected to a mongos queries are distributed among shard secondaries. An error is raised if no secondaries are available.
- When connected to a replica set queries are distributed among secondaries. An error is raised if no secondaries are available.
Parameters: - tag_sets: The tag_sets for this read preference.
- max_staleness: (integer, in seconds) The maximum estimated length of time a replica set secondary can fall behind the primary in replication before it will no longer be selected for operations. Default -1, meaning no maximum. If it is set, it must be at least 90 seconds.
- hedge: The hedge for this read preference.
Changed in version 3.11: Added hedge parameter.
-
document
¶ Read preference as a document.
-
hedge
¶ The read preference hedge parameter.
A dictionary that configures how the server will perform hedged reads. It consists of the following keys:
- enabled: Enables or disables hedged reads in sharded clusters.
Hedged reads are automatically enabled in MongoDB 4.4+ when using a nearest read preference. To explicitly enable hedged reads, set the enabled key to true:
>>> Nearest(hedge={'enabled': True})
To explicitly disable hedged reads, set the enabled key to False:
>>> Nearest(hedge={'enabled': False})
New in version 3.11.
-
max_staleness
¶ The maximum estimated length of time (in seconds) a replica set secondary can fall behind the primary in replication before it will no longer be selected for operations, or -1 for no maximum.
-
min_wire_version
¶ The wire protocol version the server must support.
Some read preferences impose version requirements on all servers (e.g. maxStalenessSeconds requires MongoDB 3.4 / maxWireVersion 5).
All servers’ maxWireVersion must be at least this read preference’s min_wire_version, or the driver raises
ConfigurationError
.
-
mode
¶ The mode of this read preference instance.
-
mongos_mode
¶ The mongos mode of this read preference.
-
name
¶ The name of this read preference.
-
tag_sets
¶ Set tag_sets to a list of dictionaries like [{'dc': 'ny'}] to read only from members whose dc tag has the value "ny". To specify a priority-order for tag sets, provide a list of tag sets: [{'dc': 'ny'}, {'dc': 'la'}, {}]. A final, empty tag set, {}, means “read from any member that matches the mode, ignoring tags.” MongoReplicaSetClient tries each set of tags in turn until it finds a set of tags with at least one matching member.
See also
-
class
pymongo.read_preferences.
SecondaryPreferred
(tag_sets=None, max_staleness=-1, hedge=None)¶ SecondaryPreferred read preference.
- When directly connected to one mongod queries are allowed to standalone servers, to a replica set primary, or to replica set secondaries.
- When connected to a mongos queries are distributed among shard secondaries, or the shard primary if no secondary is available.
- When connected to a replica set queries are distributed among secondaries, or the primary if no secondary is available.
Parameters: - tag_sets: The tag_sets for this read preference.
- max_staleness: (integer, in seconds) The maximum estimated length of time a replica set secondary can fall behind the primary in replication before it will no longer be selected for operations. Default -1, meaning no maximum. If it is set, it must be at least 90 seconds.
- hedge: The hedge for this read preference.
Changed in version 3.11: Added hedge parameter.
-
document
¶ Read preference as a document.
-
hedge
¶ The read preference hedge parameter.
A dictionary that configures how the server will perform hedged reads. It consists of the following keys:
- enabled: Enables or disables hedged reads in sharded clusters.
Hedged reads are automatically enabled in MongoDB 4.4+ when using a nearest read preference. To explicitly enable hedged reads, set the enabled key to true:
>>> Nearest(hedge={'enabled': True})
To explicitly disable hedged reads, set the enabled key to False:
>>> Nearest(hedge={'enabled': False})
New in version 3.11.
-
max_staleness
¶ The maximum estimated length of time (in seconds) a replica set secondary can fall behind the primary in replication before it will no longer be selected for operations, or -1 for no maximum.
-
min_wire_version
¶ The wire protocol version the server must support.
Some read preferences impose version requirements on all servers (e.g. maxStalenessSeconds requires MongoDB 3.4 / maxWireVersion 5).
All servers’ maxWireVersion must be at least this read preference’s min_wire_version, or the driver raises
ConfigurationError
.
-
mode
¶ The mode of this read preference instance.
-
mongos_mode
¶ The mongos mode of this read preference.
-
name
¶ The name of this read preference.
-
tag_sets
¶ Set tag_sets to a list of dictionaries like [{'dc': 'ny'}] to read only from members whose dc tag has the value "ny". To specify a priority-order for tag sets, provide a list of tag sets: [{'dc': 'ny'}, {'dc': 'la'}, {}]. A final, empty tag set, {}, means “read from any member that matches the mode, ignoring tags.” MongoReplicaSetClient tries each set of tags in turn until it finds a set of tags with at least one matching member.
See also
-
class
pymongo.read_preferences.
Nearest
(tag_sets=None, max_staleness=-1, hedge=None)¶ Nearest read preference.
- When directly connected to one mongod queries are allowed to standalone servers, to a replica set primary, or to replica set secondaries.
- When connected to a mongos queries are distributed among all members of a shard.
- When connected to a replica set queries are distributed among all members.
Parameters: - tag_sets: The tag_sets for this read preference.
- max_staleness: (integer, in seconds) The maximum estimated length of time a replica set secondary can fall behind the primary in replication before it will no longer be selected for operations. Default -1, meaning no maximum. If it is set, it must be at least 90 seconds.
- hedge: The hedge for this read preference.
Changed in version 3.11: Added hedge parameter.
-
document
¶ Read preference as a document.
-
hedge
¶ The read preference hedge parameter.
A dictionary that configures how the server will perform hedged reads. It consists of the following keys:
- enabled: Enables or disables hedged reads in sharded clusters.
Hedged reads are automatically enabled in MongoDB 4.4+ when using a nearest read preference. To explicitly enable hedged reads, set the enabled key to true:
>>> Nearest(hedge={'enabled': True})
To explicitly disable hedged reads, set the enabled key to False:
>>> Nearest(hedge={'enabled': False})
New in version 3.11.
-
max_staleness
¶ The maximum estimated length of time (in seconds) a replica set secondary can fall behind the primary in replication before it will no longer be selected for operations, or -1 for no maximum.
-
min_wire_version
¶ The wire protocol version the server must support.
Some read preferences impose version requirements on all servers (e.g. maxStalenessSeconds requires MongoDB 3.4 / maxWireVersion 5).
All servers’ maxWireVersion must be at least this read preference’s min_wire_version, or the driver raises
ConfigurationError
.
-
mode
¶ The mode of this read preference instance.
-
mongos_mode
¶ The mongos mode of this read preference.
-
name
¶ The name of this read preference.
-
tag_sets
¶ Set tag_sets to a list of dictionaries like [{'dc': 'ny'}] to read only from members whose dc tag has the value "ny". To specify a priority-order for tag sets, provide a list of tag sets: [{'dc': 'ny'}, {'dc': 'la'}, {}]. A final, empty tag set, {}, means “read from any member that matches the mode, ignoring tags.” MongoReplicaSetClient tries each set of tags in turn until it finds a set of tags with at least one matching member.
See also
-
class
pymongo.read_preferences.
ReadPreference
¶ An enum that defines the read preference modes supported by PyMongo.
See High Availability and PyMongo for code examples.
A read preference is used in three cases:
MongoClient connected to a single mongod:
- PRIMARY: Queries are allowed if the server is standalone or a replica set primary.
- All other modes allow queries to standalone servers, to a replica set primary, or to replica set secondaries.
MongoClient initialized with the replicaSet option:
- PRIMARY: Read from the primary. This is the default, and provides the strongest consistency. If no primary is available, raise AutoReconnect.
- PRIMARY_PREFERRED: Read from the primary if available, or if there is none, read from a secondary.
- SECONDARY: Read from a secondary. If no secondary is available, raise AutoReconnect.
- SECONDARY_PREFERRED: Read from a secondary if available, otherwise from the primary.
- NEAREST: Read from any member.
MongoClient connected to a mongos, with a sharded cluster of replica sets:
- PRIMARY: Read from the primary of the shard, or raise OperationFailure if there is none. This is the default.
- PRIMARY_PREFERRED: Read from the primary of the shard, or if there is none, read from a secondary of the shard.
- SECONDARY: Read from a secondary of the shard, or raise OperationFailure if there is none.
- SECONDARY_PREFERRED: Read from a secondary of the shard if available, otherwise from the shard primary.
- NEAREST: Read from any shard member.
-
PRIMARY
= Primary()¶
-
PRIMARY_PREFERRED
= PrimaryPreferred(tag_sets=None, max_staleness=-1, hedge=None)¶
-
SECONDARY
= Secondary(tag_sets=None, max_staleness=-1, hedge=None)¶
-
SECONDARY_PREFERRED
= SecondaryPreferred(tag_sets=None, max_staleness=-1, hedge=None)¶
-
NEAREST
= Nearest(tag_sets=None, max_staleness=-1, hedge=None)¶
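For example, a read preference can be set for the whole client through the URI, or per collection with with_options(). A minimal sketch (a replica set deployment and the tag values are assumed):
from pymongo import MongoClient
from pymongo.read_preferences import SecondaryPreferred

client = MongoClient(
    'mongodb://localhost/?replicaSet=rs0&readPreference=secondaryPreferred')
coll = client.test.test.with_options(
    read_preference=SecondaryPreferred(
        tag_sets=[{'dc': 'ny'}, {}],  # prefer 'ny' members, then any member
        max_staleness=120))           # seconds; must be at least 90 if set
print(coll.read_preference)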
results
– Result class definitions¶
Result class definitions.
-
class
pymongo.results.
BulkWriteResult
(bulk_api_result, acknowledged)¶ Create a BulkWriteResult instance.
Parameters: - bulk_api_result: A result dict from the bulk API
- acknowledged: Was this write result acknowledged? If False then all properties of this object will raise InvalidOperation.
-
acknowledged
¶ Is this the result of an acknowledged write operation?
The acknowledged attribute will be False when using WriteConcern(w=0), otherwise True.
Note
If the acknowledged attribute is False all other attributes of this class will raise InvalidOperation when accessed. Values for other attributes cannot be determined if the write operation was unacknowledged.
See also
-
bulk_api_result
¶ The raw bulk API result.
-
deleted_count
¶ The number of documents deleted.
-
inserted_count
¶ The number of documents inserted.
-
matched_count
¶ The number of documents matched for an update.
-
modified_count
¶ The number of documents modified.
Note
modified_count is only reported by MongoDB 2.6 and later. When connected to an earlier server version, or in certain mixed version sharding configurations, this attribute will be set to
None
.
-
upserted_count
¶ The number of documents upserted.
-
upserted_ids
¶ A map of operation index to the _id of the upserted document.
-
class
pymongo.results.
DeleteResult
(raw_result, acknowledged)¶ The return type for
delete_one()
anddelete_many()
-
acknowledged
¶ Is this the result of an acknowledged write operation?
The acknowledged attribute will be False when using WriteConcern(w=0), otherwise True.
Note
If the acknowledged attribute is False all other attributes of this class will raise InvalidOperation when accessed. Values for other attributes cannot be determined if the write operation was unacknowledged.
See also
-
deleted_count
¶ The number of documents deleted.
-
raw_result
¶ The raw result document returned by the server.
-
-
class
pymongo.results.
InsertManyResult
(inserted_ids, acknowledged)¶ The return type for
insert_many()
.-
acknowledged
¶ Is this the result of an acknowledged write operation?
The acknowledged attribute will be False when using WriteConcern(w=0), otherwise True.
Note
If the acknowledged attribute is False all other attributes of this class will raise InvalidOperation when accessed. Values for other attributes cannot be determined if the write operation was unacknowledged.
See also
-
inserted_ids
¶ A list of _ids of the inserted documents, in the order provided.
Note
If False is passed for the ordered parameter to insert_many() the server may have inserted the documents in a different order than what is presented here.
-
-
class
pymongo.results.
InsertOneResult
(inserted_id, acknowledged)¶ The return type for
insert_one()
.-
acknowledged
¶ Is this the result of an acknowledged write operation?
The acknowledged attribute will be False when using WriteConcern(w=0), otherwise True.
Note
If the acknowledged attribute is False all other attributes of this class will raise InvalidOperation when accessed. Values for other attributes cannot be determined if the write operation was unacknowledged.
See also
-
inserted_id
¶ The inserted document’s _id.
-
-
class
pymongo.results.
UpdateResult
(raw_result, acknowledged)¶ The return type for
update_one()
,update_many()
, andreplace_one()
.-
acknowledged
¶ Is this the result of an acknowledged write operation?
The acknowledged attribute will be False when using WriteConcern(w=0), otherwise True.
Note
If the acknowledged attribute is False all other attributes of this class will raise InvalidOperation when accessed. Values for other attributes cannot be determined if the write operation was unacknowledged.
See also
-
matched_count
¶ The number of documents matched for this update.
-
modified_count
¶ The number of documents modified.
Note
modified_count is only reported by MongoDB 2.6 and later. When connected to an earlier server version, or in certain mixed version sharding configurations, this attribute will be set to
None
.
-
raw_result
¶ The raw result document returned by the server.
-
upserted_id
¶ The _id of the inserted document if an upsert took place. Otherwise
None
.
-
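For example, the result classes above are what the CRUD methods return. A minimal sketch (the namespace is illustrative):
from pymongo import MongoClient
from pymongo.write_concern import WriteConcern

client = MongoClient()  # assumes a local mongod
coll = client.test.test  # illustrative namespace

result = coll.insert_one({'x': 1})
print(result.acknowledged, result.inserted_id)

update = coll.update_one({'x': 1}, {'$set': {'x': 2}}, upsert=True)
print(update.matched_count, update.modified_count, update.upserted_id)

# With w=0 the write is unacknowledged, so most properties raise
# InvalidOperation when accessed.
unacked = coll.with_options(
    write_concern=WriteConcern(w=0)).insert_one({'x': 3})
print(unacked.acknowledged)  # False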
son_manipulator
– Manipulators that can edit SON documents as they are saved or retrieved¶
DEPRECATED: Manipulators that can edit SON objects as they enter and exit a database.
The SONManipulator
API has limitations as a
technique for transforming your data. Instead, it is more flexible and
straightforward to transform outgoing documents in your own code before passing
them to PyMongo, and transform incoming documents after receiving them from
PyMongo. SON Manipulators will be removed from PyMongo in 4.0.
PyMongo does not apply SON manipulators to documents passed to
the modern methods bulk_write()
,
insert_one()
,
insert_many()
,
update_one()
, or
update_many()
. SON manipulators are
not applied to documents returned by the modern methods
find_one_and_delete()
,
find_one_and_replace()
, and
find_one_and_update()
.
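For example, the recommended replacement for a SON manipulator is a pair of plain functions applied in application code. A minimal sketch (the helper names and namespace are illustrative):
import datetime

from pymongo import MongoClient

client = MongoClient()  # assumes a local mongod
coll = client.test.events  # illustrative namespace

def outgoing(doc):
    # Transform the document before handing it to PyMongo.
    doc['saved_at'] = datetime.datetime.utcnow()
    return doc

def incoming(doc):
    # Transform the document after receiving it from PyMongo.
    if doc is not None:
        doc.pop('saved_at', None)
    return doc

coll.insert_one(outgoing({'name': 'example'}))
print(incoming(coll.find_one({'name': 'example'})))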
-
class
pymongo.son_manipulator.
AutoReference
(db)¶ Transparently reference and de-reference already saved embedded objects.
This manipulator should probably only be used when the NamespaceInjector is also being used, otherwise it doesn’t make too much sense - documents can only be auto-referenced if they have an _ns field.
NOTE: this will behave poorly if you have a circular reference.
TODO: this only works for documents that are in the same database. To fix this we’ll need to add a DatabaseInjector that adds _db and then make use of the optional database support for DBRefs.
-
transform_incoming
(son, collection)¶ Replace embedded documents with DBRefs.
-
transform_outgoing
(son, collection)¶ Replace DBRefs with embedded documents.
-
will_copy
()¶ We need to copy so the user’s document doesn’t get transformed refs.
-
-
class
pymongo.son_manipulator.
NamespaceInjector
¶ A son manipulator that adds the _ns field.
-
transform_incoming
(son, collection)¶ Add the _ns field to the incoming object
-
-
class
pymongo.son_manipulator.
ObjectIdInjector
¶ A son manipulator that adds the _id field if it is missing.
Changed in version 2.7: ObjectIdInjector is no longer used by PyMongo, but remains in this module for backwards compatibility.
-
transform_incoming
(son, collection)¶ Add an _id field if it is missing.
-
-
class
pymongo.son_manipulator.
ObjectIdShuffler
¶ A son manipulator that moves _id to the first position.
-
transform_incoming
(son, collection)¶ Move _id to the front if it’s there.
-
will_copy
()¶ We need to copy to be sure that we are dealing with SON, not a dict.
-
-
class
pymongo.son_manipulator.
SONManipulator
¶ A base son manipulator.
This manipulator just saves and restores objects without changing them.
-
transform_incoming
(son, collection)¶ Manipulate an incoming SON object.
Parameters: - son: the SON object to be inserted into the database
- collection: the collection the object is being inserted into
-
transform_outgoing
(son, collection)¶ Manipulate an outgoing SON object.
Parameters: - son: the SON object being retrieved from the database
- collection: the collection this object was stored in
-
will_copy
()¶ Will this SON manipulator make a copy of the incoming document?
Derived classes that do need to make a copy should override this method, returning True instead of False. All non-copying manipulators will be applied first (so that the user’s document will be updated appropriately), followed by copying manipulators.
-
uri_parser
– Tools to parse and validate a MongoDB URI¶
Tools to parse and validate a MongoDB URI.
-
pymongo.uri_parser.
parse_host
(entity, default_port=27017)¶ Validates a host string
Returns a 2-tuple of host followed by port where port is default_port if it wasn’t specified in the string.
Parameters: - entity: A host or host:port string where host could be a hostname or IP address.
- default_port: The port number to use when one wasn’t specified in entity.
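For example (a sketch; the hostnames are illustrative):
from pymongo import uri_parser

print(uri_parser.parse_host('example.com'))        # ('example.com', 27017)
print(uri_parser.parse_host('example.com:27018'))  # ('example.com', 27018)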
-
pymongo.uri_parser.
parse_ipv6_literal_host
(entity, default_port)¶ Validates an IPv6 literal host:port string.
Returns a 2-tuple of IPv6 literal followed by port where port is default_port if it wasn’t specified in entity.
Parameters: - entity: A string that represents an IPv6 literal enclosed in brackets (e.g. ‘[::1]’ or ‘[::1]:27017’).
- default_port: The port number to use when one wasn’t specified in entity.
-
pymongo.uri_parser.
parse_uri
(uri, default_port=27017, validate=True, warn=False, normalize=True, connect_timeout=None)¶ Parse and validate a MongoDB URI.
Returns a dict of the form:
{
    'nodelist': <list of (host, port) tuples>,
    'username': <username> or None,
    'password': <password> or None,
    'database': <database name> or None,
    'collection': <collection name> or None,
    'options': <dict of MongoDB URI options>,
    'fqdn': <fqdn of the MongoDB+SRV URI> or None
}
If the URI scheme is “mongodb+srv://” DNS SRV and TXT lookups will be done to build nodelist and options.
Parameters: - uri: The MongoDB URI to parse.
- default_port: The port number to use when one wasn’t specified for a host in the URI.
- validate (optional): If True (the default), validate and normalize all options. Default: True.
- warn (optional): When validating, if True then warn the user and ignore any invalid options or values. If False, validation will error when options are unsupported or values are invalid. Default: False.
- normalize (optional): If True, convert names of URI options to their internally-used names. Default: True.
- connect_timeout (optional): The maximum time in milliseconds to wait for a response from the DNS server.
Changed in version 3.9: Added the normalize parameter.
Changed in version 3.6: Added support for mongodb+srv:// URIs.
Changed in version 3.5: Return the original value of the readPreference MongoDB URI option instead of the validated read preference mode.
Changed in version 3.1: warn added so invalid options can be ignored.
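For example, a sketch with illustrative credentials and hosts:
from pymongo import uri_parser

parsed = uri_parser.parse_uri(
    'mongodb://user:pwd@localhost:27017,otherhost:27018/app?replicaSet=rs0')
print(parsed['nodelist'])  # [('localhost', 27017), ('otherhost', 27018)]
print(parsed['database'])  # 'app'
print(parsed['options'])   # the validated URI options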
-
pymongo.uri_parser.
parse_userinfo
(userinfo)¶ Validates the format of user information in a MongoDB URI. Reserved characters like ‘:’, ‘/’, ‘+’ and ‘@’ must be escaped following RFC 3986.
Returns a 2-tuple containing the unescaped username followed by the unescaped password.
Parameters: - userinfo: A string of the form <username>:<password>
Changed in version 2.2: Now uses urllib.unquote_plus so + characters must be escaped.
-
pymongo.uri_parser.
split_hosts
(hosts, default_port=27017)¶ Takes a string of the form host1[:port],host2[:port]… and splits it into (host, port) tuples. If [:port] isn’t present the default_port is used.
Returns a set of 2-tuples containing the host name (or IP) followed by port number.
Parameters: - hosts: A string of the form host1[:port],host2[:port],…
- default_port: The port number to use when one wasn’t specified for a host.
-
pymongo.uri_parser.
split_options
(opts, validate=True, warn=False, normalize=True)¶ Takes the options portion of a MongoDB URI, validates each option and returns the options in a dictionary.
Parameters: - opts: A string representing MongoDB URI options.
- validate: If True (the default), validate and normalize all options.
- warn: If False (the default), suppress all warnings raised during validation of options.
- normalize: If True (the default), renames all options to their internally-used names.
-
pymongo.uri_parser.
validate_options
(opts, warn=False)¶ Validates and normalizes options passed in a MongoDB URI.
Returns a new dictionary of validated and normalized options. If warn is False then errors will be thrown for invalid options, otherwise they will be ignored and a warning will be issued.
Parameters: - opts: A dict of MongoDB URI options.
- warn (optional): If
True
then warnings will be logged and invalid options will be ignored. Otherwise invalid options will cause errors.
write_concern
– Tools for specifying write concern¶
Tools for working with write concerns.
-
class
pymongo.write_concern.
WriteConcern
(w=None, wtimeout=None, j=None, fsync=None)¶ Parameters: - w: (integer or string) Used with replication, write operations will block until they have been replicated to the specified number or tagged set of servers. w=<integer> always includes the replica set primary (e.g. w=3 means write to the primary and wait until replicated to two secondaries). w=0 disables acknowledgement of write operations and can not be used with other write concern options.
- wtimeout: (integer) Used in conjunction with w. Specify a value in milliseconds to control how long to wait for write propagation to complete. If replication does not complete in the given timeframe, a timeout exception is raised.
- j: If
True
block until write operations have been committed to the journal. Cannot be used in combination with fsync. Prior to MongoDB 2.6 this option was ignored if the server was running without journaling. Starting with MongoDB 2.6 write operations will fail with an exception if this option is used when the server is running without journaling. - fsync: If
True
and the server is running without journaling, blocks until the server has synced all data files to disk. If the server is running with journaling, this acts the same as the j option, blocking until write operations have been committed to the journal. Cannot be used in combination with j.
-
acknowledged
¶ If
True
write operations will wait for acknowledgement before returning.
-
document
¶ The document representation of this write concern.
Note
WriteConcern
is immutable. Mutating the value ofdocument
does not mutate thisWriteConcern
.
-
is_server_default
¶ Does this WriteConcern match the server default.
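A short sketch tying these attributes together; the values are illustrative:
from pymongo.write_concern import WriteConcern

wc = WriteConcern(w="majority", wtimeout=5000)
wc.document       # {'w': 'majority', 'wtimeout': 5000}
wc.acknowledged   # True
WriteConcern(w=0).acknowledged   # False: w=0 disables acknowledgement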
event_loggers
– Example loggers¶
Example event logger classes.
New in version 3.11.
These loggers can be registered using register()
or
MongoClient
.
monitoring.register(CommandLogger())
or
MongoClient(event_listeners=[CommandLogger()])
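Putting both registration styles together, a minimal sketch; note that these loggers emit at the INFO level, so the standard logging module must be configured to show INFO messages:
import logging

from pymongo import MongoClient, monitoring
from pymongo.event_loggers import CommandLogger

logging.basicConfig(level=logging.INFO)

# Global registration: applies to every MongoClient created afterwards.
monitoring.register(CommandLogger())

# Per-client registration: applies only to this client.
client = MongoClient(event_listeners=[CommandLogger()])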
-
class
pymongo.event_loggers.
CommandLogger
¶ A simple listener that logs command events.
Listens for
CommandStartedEvent
,CommandSucceededEvent
andCommandFailedEvent
events and logs them at the INFO severity level usinglogging
.New in version 3.11.
-
failed
(event)¶ Abstract method to handle a CommandFailedEvent.
Parameters: - event: An instance of
CommandFailedEvent
.
-
started
(event)¶ Abstract method to handle a CommandStartedEvent.
Parameters: - event: An instance of
CommandStartedEvent
.
-
succeeded
(event)¶ Abstract method to handle a CommandSucceededEvent.
Parameters: - event: An instance of
CommandSucceededEvent
.
-
-
class
pymongo.event_loggers.
ConnectionPoolLogger
¶ A simple listener that logs server connection pool events.
Listens for
PoolCreatedEvent
,PoolClearedEvent
,PoolClosedEvent
,ConnectionCreatedEvent,ConnectionReadyEvent
,ConnectionClosedEvent
,ConnectionCheckOutStartedEvent
,ConnectionCheckOutFailedEvent
,ConnectionCheckedOutEvent
, andConnectionCheckedInEvent
events and logs them at the INFO severity level usinglogging
.New in version 3.11.
-
connection_check_out_failed
(event)¶ Abstract method to handle a
ConnectionCheckOutFailedEvent
.Emitted when the driver’s attempt to check out a connection fails.
Parameters: - event: An instance of
ConnectionCheckOutFailedEvent
.
-
connection_check_out_started
(event)¶ Abstract method to handle a
ConnectionCheckOutStartedEvent
.Emitted when the driver starts attempting to check out a connection.
Parameters: - event: An instance of
ConnectionCheckOutStartedEvent
.
-
connection_checked_in
(event)¶ Abstract method to handle a
ConnectionCheckedInEvent
.Emitted when the driver checks in a Connection back to the Connection Pool.
Parameters: - event: An instance of
ConnectionCheckedInEvent
.
-
connection_checked_out
(event)¶ Abstract method to handle a
ConnectionCheckedOutEvent
.Emitted when the driver successfully checks out a Connection.
Parameters: - event: An instance of
ConnectionCheckedOutEvent
.
-
connection_closed
(event)¶ Abstract method to handle a
ConnectionClosedEvent
.Emitted when a Connection Pool closes a Connection.
Parameters: - event: An instance of
ConnectionClosedEvent
.
-
connection_created
(event)¶ Abstract method to handle a
ConnectionCreatedEvent
.Emitted when a Connection Pool creates a Connection object.
Parameters: - event: An instance of
ConnectionCreatedEvent
.
-
connection_ready
(event)¶ Abstract method to handle a
ConnectionReadyEvent
.Emitted when a Connection has finished its setup, and is now ready to use.
Parameters: - event: An instance of
ConnectionReadyEvent
.
-
pool_cleared
(event)¶ Abstract method to handle a PoolClearedEvent.
Emitted when a Connection Pool is cleared.
Parameters: - event: An instance of
PoolClearedEvent
.
-
pool_closed
(event)¶ Abstract method to handle a PoolClosedEvent.
Emitted when a Connection Pool is closed.
Parameters: - event: An instance of
PoolClosedEvent
.
-
pool_created
(event)¶ Abstract method to handle a
PoolCreatedEvent
.Emitted when a Connection Pool is created.
Parameters: - event: An instance of
PoolCreatedEvent
.
-
-
class
pymongo.event_loggers.
HeartbeatLogger
¶ A simple listener that logs server heartbeat events.
Listens for
ServerHeartbeatStartedEvent
,ServerHeartbeatSucceededEvent
, andServerHeartbeatFailedEvent
events and logs them at the INFO severity level usinglogging
.New in version 3.11.
-
failed
(event)¶ Abstract method to handle a ServerHeartbeatFailedEvent.
Parameters: - event: An instance of
ServerHeartbeatFailedEvent
.
-
started
(event)¶ Abstract method to handle a ServerHeartbeatStartedEvent.
Parameters: - event: An instance of
ServerHeartbeatStartedEvent
.
-
succeeded
(event)¶ Abstract method to handle a ServerHeartbeatSucceededEvent.
Parameters: - event: An instance of
ServerHeartbeatSucceededEvent
.
-
-
class
pymongo.event_loggers.
ServerLogger
¶ A simple listener that logs server discovery events.
Listens for
ServerOpeningEvent
,ServerDescriptionChangedEvent
, andServerClosedEvent
events and logs them at the INFO severity level usinglogging
.New in version 3.11.
-
closed
(event)¶ Abstract method to handle a ServerClosedEvent.
Parameters: - event: An instance of
ServerClosedEvent
.
-
description_changed
(event)¶ Abstract method to handle a ServerDescriptionChangedEvent.
Parameters: - event: An instance of
ServerDescriptionChangedEvent
.
-
opened
(event)¶ Abstract method to handle a ServerOpeningEvent.
Parameters: - event: An instance of
ServerOpeningEvent
.
-
-
class
pymongo.event_loggers.
TopologyLogger
¶ A simple listener that logs server topology events.
Listens for
TopologyOpenedEvent
,TopologyDescriptionChangedEvent
, andTopologyClosedEvent
events and logs them at the INFO severity level usinglogging
.New in version 3.11.
-
closed
(event)¶ Abstract method to handle a TopologyClosedEvent.
Parameters: - event: An instance of
TopologyClosedEvent
.
-
description_changed
(event)¶ Abstract method to handle a TopologyDescriptionChangedEvent.
Parameters: - event: An instance of
TopologyDescriptionChangedEvent
.
-
opened
(event)¶ Abstract method to handle a TopologyOpenedEvent.
Parameters: - event: An instance of
TopologyOpenedEvent
.
-
gridfs
– Tools for working with GridFS¶
GridFS is a specification for storing large objects in Mongo.
The gridfs
package is an implementation of GridFS on top of
pymongo
, exposing a file-like interface.
-
class
gridfs.
GridFS
(database, collection='fs', disable_md5=False)¶ Create a new instance of
GridFS
.Raises
TypeError
if database is not an instance ofDatabase
.Parameters: - database: database to use
- collection (optional): root collection to use
- disable_md5 (optional): When True, MD5 checksums will not be computed for uploaded files. Useful in environments where MD5 cannot be used for regulatory or other reasons. Defaults to False.
Changed in version 3.11: Running a GridFS operation in a transaction now always raises an error. GridFS does not support multi-document transactions.
Changed in version 3.1: Indexes are only ensured on the first write to the DB.
Changed in version 3.0: database must use an acknowledged
write_concern
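A minimal usage sketch for the methods documented below, assuming a running local mongod; the database name is hypothetical:
from pymongo import MongoClient
import gridfs

db = MongoClient().my_database   # hypothetical database name
fs = gridfs.GridFS(db)

file_id = fs.put(b"hello gridfs", filename="hello.txt")   # store a file
fs.get(file_id).read()                                    # b'hello gridfs'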
-
delete
(file_id, session=None)¶ Delete a file from GridFS by
"_id"
.Deletes all data belonging to the file with
"_id"
: file_id.Warning
Any processes/threads reading from the file while this method is executing will likely see an invalid/corrupt file. Care should be taken to avoid concurrent reads to a file while it is being deleted.
Note
Deletes of non-existent files are considered successful since the end result is the same: no file with that _id remains.
Parameters: - file_id:
"_id"
of the file to delete - session (optional): a
ClientSession
Changed in version 3.6: Added
session
parameter.Changed in version 3.1:
delete
no longer ensures indexes.
-
exists
(document_or_id=None, session=None, **kwargs)¶ Check if a file exists in this instance of
GridFS
.The file to check for can be specified by the value of its
_id
key, or by passing in a query document. A query document can be passed in as dictionary, or by using keyword arguments. Thus, the following three calls are equivalent:>>> fs.exists(file_id) >>> fs.exists({"_id": file_id}) >>> fs.exists(_id=file_id)
As are the following two calls:
>>> fs.exists({"filename": "mike.txt"}) >>> fs.exists(filename="mike.txt")
And the following two:
>>> fs.exists({"foo": {"$gt": 12}}) >>> fs.exists(foo={"$gt": 12})
Returns
True
if a matching file exists,False
otherwise. Calls toexists()
will not automatically create appropriate indexes; application developers should be sure to create indexes if needed and as appropriate.Parameters: - document_or_id (optional): query document, or _id of the document to check for
- session (optional): a
ClientSession
- **kwargs (optional): keyword arguments are used as a query document, if they’re present.
Changed in version 3.6: Added
session
parameter.
-
find
(*args, **kwargs)¶ Query GridFS for files.
Returns a cursor that iterates across files matching arbitrary queries on the files collection. Can be combined with other modifiers for additional control. For example:
for grid_out in fs.find({"filename": "lisa.txt"}, no_cursor_timeout=True):
    data = grid_out.read()
would iterate through all versions of “lisa.txt” stored in GridFS. Note that setting no_cursor_timeout to True may be important to prevent the cursor from timing out during long multi-file processing work.
As another example, the call:
most_recent_three = fs.find().sort("uploadDate", -1).limit(3)
would return a cursor to the three most recently uploaded files in GridFS.
Follows a similar interface to
find()
inCollection
.If a
ClientSession
is passed tofind()
, all returnedGridOut
instances are associated with that session.Parameters: - filter (optional): a SON object specifying elements which must be present for a document to be included in the result set
- skip (optional): the number of files to omit (from the start of the result set) when returning the results
- limit (optional): the maximum number of results to return
- no_cursor_timeout (optional): if False (the default), any returned cursor is closed by the server after 10 minutes of inactivity. If set to True, the returned cursor will never time out on the server. Care should be taken to ensure that cursors with no_cursor_timeout turned on are properly closed.
- sort (optional): a list of (key, direction) pairs
specifying the sort order for this query. See
sort()
for details.
Raises
TypeError
if any of the arguments are of improper type. Returns an instance ofGridOutCursor
corresponding to this query.Changed in version 3.0: Removed the read_preference, tag_sets, and secondary_acceptable_latency_ms options.
New in version 2.7.
-
find_one
(filter=None, session=None, *args, **kwargs)¶ Get a single file from gridfs.
All arguments to
find()
are also valid arguments forfind_one()
, although any limit argument will be ignored. Returns a singleGridOut
, orNone
if no matching file is found. For example:file = fs.find_one({"filename": "lisa.txt"})
Parameters: - filter (optional): a dictionary specifying
the query to be performed OR any other type to be used as
the value for a query for
"_id"
in the file collection. - *args (optional): any additional positional arguments are
the same as the arguments to
find()
. - session (optional): a
ClientSession
- **kwargs (optional): any additional keyword arguments
are the same as the arguments to
find()
.
Changed in version 3.6: Added
session
parameter.
-
get
(file_id, session=None)¶ Get a file from GridFS by
"_id"
.Returns an instance of
GridOut
, which provides a file-like interface for reading.Parameters: - file_id:
"_id"
of the file to get - session (optional): a
ClientSession
Changed in version 3.6: Added
session
parameter.
-
get_last_version
(filename=None, session=None, **kwargs)¶ Get the most recent version of a file in GridFS by
"filename"
or metadata fields.Equivalent to calling
get_version()
with the default version (-1
).Parameters: - filename:
"filename"
of the file to get, or None - session (optional): a
ClientSession
- **kwargs (optional): find files by custom metadata.
Changed in version 3.6: Added
session
parameter.
-
get_version
(filename=None, version=-1, session=None, **kwargs)¶ Get a file from GridFS by
"filename"
or metadata fields.Returns a version of the file in GridFS whose filename matches filename and whose metadata fields match the supplied keyword arguments, as an instance of
GridOut
.Version numbering is a convenience atop the GridFS API provided by MongoDB. If more than one file matches the query (either by filename alone, by metadata fields, or by a combination of both), then version
-1
will be the most recently uploaded matching file,-2
the second most recently uploaded, etc. Version0
will be the first version uploaded,1
the second version, etc. So if three versions have been uploaded, then version0
is the same as version-3
, version1
is the same as version-2
, and version2
is the same as version-1
.Raises
NoFile
if no such version of that file exists.Parameters: - filename:
"filename"
of the file to get, or None - version (optional): version of the file to get (defaults to -1, the most recent version uploaded)
- session (optional): a
ClientSession
- **kwargs (optional): find files by custom metadata.
Changed in version 3.6: Added
session
parameter.Changed in version 3.1:
get_version
no longer ensures indexes.
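To make the version numbering concrete, a short sketch assuming "report.txt" was uploaded three times to the GridFS instance fs:
oldest = fs.get_version("report.txt", 0)    # first upload, same as version -3
middle = fs.get_version("report.txt", 1)    # second upload, same as version -2
newest = fs.get_version("report.txt", -1)   # third upload, the default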
-
list
(session=None)¶ List the names of all files stored in this instance of
GridFS
.Parameters: - session (optional): a
ClientSession
Changed in version 3.6: Added
session
parameter.Changed in version 3.1:
list
no longer ensures indexes.
-
new_file
(**kwargs)¶ Create a new file in GridFS.
Returns a new
GridIn
instance to which data can be written. Any keyword arguments will be passed through toGridIn()
.If the
"_id"
of the file is manually specified, it must not already exist in GridFS. OtherwiseFileExists
is raised.Parameters: - **kwargs (optional): keyword arguments for file creation
-
put
(data, **kwargs)¶ Put data in GridFS as a new file.
Equivalent to doing:
try:
    f = new_file(**kwargs)
    f.write(data)
finally:
    f.close()
data can be either an instance of
str
(bytes
in python 3) or a file-like object providing aread()
method. If an encoding keyword argument is passed, data can also be aunicode
(str
in python 3) instance, which will be encoded as encoding before being written. Any keyword arguments will be passed through to the created file - seeGridIn()
for possible arguments. Returns the"_id"
of the created file.If the
"_id"
of the file is manually specified, it must not already exist in GridFS. OtherwiseFileExists
is raised.Parameters: - data: data to be written as a file.
- **kwargs (optional): keyword arguments for file creation
Changed in version 3.0: w=0 writes to GridFS are now prohibited.
-
class
gridfs.
GridFSBucket
(db, bucket_name='fs', chunk_size_bytes=261120, write_concern=None, read_preference=None, disable_md5=False)¶ Create a new instance of
GridFSBucket
.Raises
TypeError
if database is not an instance ofDatabase
.Raises
ConfigurationError
if write_concern is not acknowledged.Parameters: - database: database to use.
- bucket_name (optional): The name of the bucket. Defaults to ‘fs’.
- chunk_size_bytes (optional): The chunk size in bytes. Defaults to 255KB.
- write_concern (optional): The
WriteConcern
to use. IfNone
(the default) db.write_concern is used. - read_preference (optional): The read preference to use. If
None
(the default) db.read_preference is used. - disable_md5 (optional): When True, MD5 checksums will not be computed for uploaded files. Useful in environments where MD5 cannot be used for regulatory or other reasons. Defaults to False.
Changed in version 3.11: Running a GridFS operation in a transaction now always raises an error. GridFSBucket does not support multi-document transactions.
New in version 3.1.
-
delete
(file_id, session=None)¶ Given a file_id, delete this stored file’s files collection document and associated chunks from a GridFS bucket.
For example:
my_db = MongoClient().test
fs = GridFSBucket(my_db)
# Get _id of file to delete
file_id = fs.upload_from_stream("test_file", "data I want to store!")
fs.delete(file_id)
Raises
NoFile
if no file with file_id exists.Parameters: - file_id: The _id of the file to be deleted.
- session (optional): a
ClientSession
Changed in version 3.6: Added
session
parameter.
-
download_to_stream
(file_id, destination, session=None)¶ Downloads the contents of the stored file specified by file_id and writes the contents to destination.
For example:
my_db = MongoClient().test
fs = GridFSBucket(my_db)
# Get _id of file to read
file_id = fs.upload_from_stream("test_file", "data I want to store!")
# Get file to write to
file = open('myfile', 'wb+')
fs.download_to_stream(file_id, file)
file.seek(0)
contents = file.read()
Raises
NoFile
if no file with file_id exists.Parameters: - file_id: The _id of the file to be downloaded.
- destination: a file-like object implementing
write()
. - session (optional): a
ClientSession
Changed in version 3.6: Added
session
parameter.
-
download_to_stream_by_name
(filename, destination, revision=-1, session=None)¶ Write the contents of filename (with optional revision) to destination.
For example:
my_db = MongoClient().test
fs = GridFSBucket(my_db)
# Get file to write to
file = open('myfile', 'wb')
fs.download_to_stream_by_name("test_file", file)
Raises
NoFile
if no such version of that file exists.Raises
ValueError
if filename is not a string.Parameters: - filename: The name of the file to read from.
- destination: A file-like object that implements
write()
. - revision (optional): Which revision (documents with the same filename and different uploadDate) of the file to retrieve. Defaults to -1 (the most recent revision).
- session (optional): a
ClientSession
Note: Revision numbers are defined as follows:
- 0 = the original stored file
- 1 = the first revision
- 2 = the second revision
- etc…
- -2 = the second most recent revision
- -1 = the most recent revision
Changed in version 3.6: Added
session
parameter.
-
find
(*args, **kwargs)¶ Find and return the files collection documents that match
filter
Returns a cursor that iterates across files matching arbitrary queries on the files collection. Can be combined with other modifiers for additional control.
For example:
for grid_data in fs.find({"filename": "lisa.txt"}, no_cursor_timeout=True):
    data = grid_data.read()
would iterate through all versions of “lisa.txt” stored in GridFS. Note that setting no_cursor_timeout to True may be important to prevent the cursor from timing out during long multi-file processing work.
As another example, the call:
most_recent_three = fs.find().sort("uploadDate", -1).limit(3)
would return a cursor to the three most recently uploaded files in GridFS.
Follows a similar interface to
find()
inCollection
.If a
ClientSession
is passed tofind()
, all returnedGridOut
instances are associated with that session.Parameters: - filter: Search query.
- batch_size (optional): The number of documents to return per batch.
- limit (optional): The maximum number of documents to return.
- no_cursor_timeout (optional): The server normally times out idle cursors after an inactivity period (10 minutes) to prevent excess memory use. Set this option to True to prevent that.
- skip (optional): The number of documents to skip before returning.
- sort (optional): The order by which to sort results. Defaults to None.
-
open_download_stream
(file_id, session=None)¶ Opens a Stream from which the application can read the contents of the stored file specified by file_id.
For example:
my_db = MongoClient().test
fs = GridFSBucket(my_db)
# get _id of file to read.
file_id = fs.upload_from_stream("test_file", "data I want to store!")
grid_out = fs.open_download_stream(file_id)
contents = grid_out.read()
Returns an instance of
GridOut
.Raises
NoFile
if no file with file_id exists.Parameters: - file_id: The _id of the file to be downloaded.
- session (optional): a
ClientSession
Changed in version 3.6: Added
session
parameter.
-
open_download_stream_by_name
(filename, revision=-1, session=None)¶ Opens a Stream from which the application can read the contents of filename and optional revision.
For example:
my_db = MongoClient().test
fs = GridFSBucket(my_db)
grid_out = fs.open_download_stream_by_name("test_file")
contents = grid_out.read()
Returns an instance of
GridOut
.Raises
NoFile
if no such version of that file exists.Raises
ValueError
if filename is not a string.Parameters: - filename: The name of the file to read from.
- revision (optional): Which revision (documents with the same filename and different uploadDate) of the file to retrieve. Defaults to -1 (the most recent revision).
- session (optional): a
ClientSession
Note: Revision numbers are defined as follows:
- 0 = the original stored file
- 1 = the first revision
- 2 = the second revision
- etc…
- -2 = the second most recent revision
- -1 = the most recent revision
Changed in version 3.6: Added
session
parameter.
-
open_upload_stream
(filename, chunk_size_bytes=None, metadata=None, session=None)¶ Opens a Stream that the application can write the contents of the file to.
The user must specify the filename, and can choose to add any additional information in the metadata field of the file document or modify the chunk size. For example:
my_db = MongoClient().test
fs = GridFSBucket(my_db)
grid_in = fs.open_upload_stream(
    "test_file", chunk_size_bytes=4,
    metadata={"contentType": "text/plain"})
grid_in.write("data I want to store!")
grid_in.close()  # uploaded on close
Returns an instance of
GridIn
.Raises
NoFile
if no such version of that file exists. RaisesValueError
if filename is not a string.Parameters: - filename: The name of the file to upload.
- chunk_size_bytes (optional): The number of bytes per chunk of this
file. Defaults to the chunk_size_bytes in
GridFSBucket
. - metadata (optional): User data for the ‘metadata’ field of the files collection document. If not provided the metadata field will be omitted from the files collection document.
- session (optional): a
ClientSession
Changed in version 3.6: Added
session
parameter.
-
open_upload_stream_with_id
(file_id, filename, chunk_size_bytes=None, metadata=None, session=None)¶ Opens a Stream that the application can write the contents of the file to.
The user must specify the file id and filename, and can choose to add any additional information in the metadata field of the file document or modify the chunk size. For example:
my_db = MongoClient().test
fs = GridFSBucket(my_db)
grid_in = fs.open_upload_stream_with_id(
    ObjectId(), "test_file", chunk_size_bytes=4,
    metadata={"contentType": "text/plain"})
grid_in.write("data I want to store!")
grid_in.close()  # uploaded on close
Returns an instance of
GridIn
.Raises
NoFile
if no such version of that file exists. RaisesValueError
if filename is not a string.Parameters: - file_id: The id to use for this file. The id must not have already been used for another file.
- filename: The name of the file to upload.
- chunk_size_bytes (optional): The number of bytes per chunk of this
file. Defaults to the chunk_size_bytes in
GridFSBucket
. - metadata (optional): User data for the ‘metadata’ field of the files collection document. If not provided the metadata field will be omitted from the files collection document.
- session (optional): a
ClientSession
Changed in version 3.6: Added
session
parameter.
-
rename
(file_id, new_filename, session=None)¶ Renames the stored file with the specified file_id.
For example:
my_db = MongoClient().test
fs = GridFSBucket(my_db)
# Get _id of file to rename
file_id = fs.upload_from_stream("test_file", "data I want to store!")
fs.rename(file_id, "new_test_name")
Raises
NoFile
if no file with file_id exists.Parameters: - file_id: The _id of the file to be renamed.
- new_filename: The new name of the file.
- session (optional): a
ClientSession
Changed in version 3.6: Added
session
parameter.
-
upload_from_stream
(filename, source, chunk_size_bytes=None, metadata=None, session=None)¶ Uploads a user file to a GridFS bucket.
Reads the contents of the user file from source and uploads it to the file filename. Source can be a string or file-like object. For example:
my_db = MongoClient().test
fs = GridFSBucket(my_db)
file_id = fs.upload_from_stream(
    "test_file", "data I want to store!",
    chunk_size_bytes=4,
    metadata={"contentType": "text/plain"})
Returns the _id of the uploaded file.
Raises
NoFile
if no such version of that file exists. RaisesValueError
if filename is not a string.Parameters: - filename: The name of the file to upload.
- source: The source stream of the content to be uploaded. Must be
a file-like object that implements
read()
or a string. - chunk_size_bytes (options): The number of bytes per chunk of this
file. Defaults to the chunk_size_bytes of
GridFSBucket
. - metadata (optional): User data for the ‘metadata’ field of the files collection document. If not provided the metadata field will be omitted from the files collection document.
- session (optional): a
ClientSession
Changed in version 3.6: Added
session
parameter.
-
upload_from_stream_with_id
(file_id, filename, source, chunk_size_bytes=None, metadata=None, session=None)¶ Uploads a user file to a GridFS bucket with a custom file id.
Reads the contents of the user file from source and uploads it to the file filename. Source can be a string or file-like object. For example:
my_db = MongoClient().test
fs = GridFSBucket(my_db)
file_id = ObjectId()
fs.upload_from_stream_with_id(
    file_id, "test_file", "data I want to store!",
    chunk_size_bytes=4,
    metadata={"contentType": "text/plain"})
Raises
NoFile
if no such version of that file exists. RaisesValueError
if filename is not a string.Parameters: - file_id: The id to use for this file. The id must not have already been used for another file.
- filename: The name of the file to upload.
- source: The source stream of the content to be uploaded. Must be
a file-like object that implements
read()
or a string. - chunk_size_bytes (options): The number of bytes per chunk of this
file. Defaults to the chunk_size_bytes of
GridFSBucket
. - metadata (optional): User data for the ‘metadata’ field of the files collection document. If not provided the metadata field will be omitted from the files collection document.
- session (optional): a
ClientSession
Changed in version 3.6: Added
session
parameter.
Sub-modules:
errors
– Exceptions raised by the gridfs
package¶
Exceptions raised by the gridfs
package
-
exception
gridfs.errors.
CorruptGridFile
(message='', error_labels=None)¶ Raised when a file in
GridFS
is malformed.
-
exception
gridfs.errors.
FileExists
(message='', error_labels=None)¶ Raised when trying to create a file that already exists.
-
exception
gridfs.errors.
GridFSError
(message='', error_labels=None)¶ Base class for all GridFS exceptions.
-
exception
gridfs.errors.
NoFile
(message='', error_labels=None)¶ Raised when trying to read from a non-existent file.
grid_file
– Tools for representing files stored in GridFS¶
Tools for representing files stored in GridFS.
-
class
gridfs.grid_file.
GridIn
(root_collection, session=None, disable_md5=False, **kwargs)¶ Write a file to GridFS
Application developers should generally not need to instantiate this class directly - instead see the methods provided by
GridFS
.Raises
TypeError
if root_collection is not an instance ofCollection
.Any of the file level options specified in the GridFS Spec may be passed as keyword arguments. Any additional keyword arguments will be set as additional fields on the file document. Valid keyword arguments include:
"_id"
: unique ID for this file (default:ObjectId
) - this"_id"
must not have already been used for another file"filename"
: human name for the file"contentType"
or"content_type"
: valid mime-type for the file"chunkSize"
or"chunk_size"
: size of each of the chunks, in bytes (default: 255 kb)"encoding"
: encoding used for this file. In Python 2, anyunicode
that is written to the file will be converted to astr
. In Python 3, anystr
that is written to the file will be converted tobytes
.
Parameters: - root_collection: root collection to write to
- session (optional): a
ClientSession
to use for all commands - disable_md5 (optional): When True, an MD5 checksum will not be computed for the uploaded file. Useful in environments where MD5 cannot be used for regulatory or other reasons. Defaults to False.
- **kwargs (optional): file level options (see above)
Changed in version 3.6: Added
session
parameter.Changed in version 3.0: root_collection must use an acknowledged
write_concern
-
_id
¶ The
'_id'
value for this file.This attribute is read-only.
-
abort
()¶ Remove all chunks/files that may have been uploaded and close.
-
chunk_size
¶ Chunk size for this file.
This attribute is read-only.
-
close
()¶ Flush the file and close it.
A closed file cannot be written any more. Calling
close()
more than once is allowed.
-
closed
¶ Is this file closed?
-
content_type
¶ Mime-type for this file.
-
filename
¶ Name of this file.
-
length
¶ Length (in bytes) of this file.
This attribute is read-only and can only be read after
close()
has been called.
-
md5
¶ MD5 of the contents of this file if an md5 sum was created.
This attribute is read-only and can only be read after
close()
has been called.
-
name
¶ Alias for filename.
-
upload_date
¶ Date that this file was uploaded.
This attribute is read-only and can only be read after
close()
has been called.
-
write
(data)¶ Write data to the file. There is no return value.
data can be either a string of bytes or a file-like object (implementing
read()
). If the file has anencoding
attribute, data can also be aunicode
(str
in python 3) instance, which will be encoded asencoding
before being written.Due to buffering, the data may not actually be written to the database until the
close()
method is called. RaisesValueError
if this file is already closed. RaisesTypeError
if data is not an instance ofstr
(bytes
in python 3), a file-like object, or an instance ofunicode
(str
in python 3). Unicode data is only allowed if the file has anencoding
attribute.Parameters: - data: string of bytes or file-like object to be written to the file
-
writelines
(sequence)¶ Write a sequence of strings to the file.
Does not add separators.
-
class
gridfs.grid_file.
GridOut
(root_collection, file_id=None, file_document=None, session=None)¶ Read a file from GridFS
Application developers should generally not need to instantiate this class directly - instead see the methods provided by
GridFS
Either file_id or file_document must be specified; file_document will be given priority if present. Raises
TypeError
if root_collection is not an instance ofCollection
.Parameters: - root_collection: root collection to read from
- file_id (optional): value of
"_id"
for the file to read - file_document (optional): file document from root_collection.files
- session (optional): a
ClientSession
to use for all commands
Changed in version 3.8: For better performance and to better follow the GridFS spec,
GridOut
now uses a single cursor to read all the chunks in the file.Changed in version 3.6: Added
session
parameter.Changed in version 3.0: Creating a GridOut does not immediately retrieve the file metadata from the server. Metadata is fetched when first needed.
-
_id
¶ The
'_id'
value for this file.This attribute is read-only.
-
__iter__
()¶ Return an iterator over all of this file’s data.
The iterator will return chunk-sized instances of
str
(bytes
in python 3). This can be useful when serving files using a webserver that handles such an iterator efficiently.Note
This is different from
io.IOBase
which iterates over lines in the file. UseGridOut.readline()
to read line by line instead of chunk by chunk.Changed in version 3.8: The iterator now raises
CorruptGridFile
when encountering any truncated, missing, or extra chunk in a file. The previous behavior was to only raiseCorruptGridFile
on a missing chunk.
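A short sketch of chunk-by-chunk iteration, assuming grid_out is a GridOut for an existing file; this pattern suits streaming a file without holding it all in memory:
total = 0
for chunk in grid_out:   # each chunk is a chunk-sized bytes (str in Python 2) object
    total += len(chunk)
assert total == grid_out.length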
-
aliases
¶ List of aliases for this file.
This attribute is read-only.
-
chunk_size
¶ Chunk size for this file.
This attribute is read-only.
-
close
()¶ Make GridOut more generically file-like.
-
content_type
¶ Mime-type for this file.
This attribute is read-only.
-
filename
¶ Name of this file.
This attribute is read-only.
-
length
¶ Length (in bytes) of this file.
This attribute is read-only.
-
md5
¶ MD5 of the contents of this file if an md5 sum was created.
This attribute is read-only.
-
metadata
¶ Metadata attached to this file.
This attribute is read-only.
-
name
¶ Alias for filename.
This attribute is read-only.
-
read
(size=-1)¶ Read at most size bytes from the file (less if there isn’t enough data).
The bytes are returned as an instance of
str
(bytes
in python 3). If size is negative or omitted all data is read.Parameters: - size (optional): the number of bytes to read
Changed in version 3.8: This method now only checks for extra chunks after reading the entire file. Previously, this method would check for extra chunks on every call.
-
readchunk
()¶ Reads a chunk at a time. If the current position is within a chunk the remainder of the chunk is returned.
-
readline
(size=-1)¶ Read one line or up to size bytes from the file.
Parameters: - size (optional): the maximum number of bytes to read
-
seek
(pos, whence=0)¶ Set the current position of this file.
Parameters: - pos: the position (or offset if using relative positioning) to seek to
- whence (optional): where to seek
from.
os.SEEK_SET
(0
) for absolute file positioning,os.SEEK_CUR
(1
) to seek relative to the current position,os.SEEK_END
(2
) to seek relative to the file’s end.
-
tell
()¶ Return the current position of this file.
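A short sketch combining seek() and tell(), assuming grid_out is an open GridOut:
import os

grid_out.seek(0, os.SEEK_END)   # jump to the end of the file
size = grid_out.tell()          # position now equals the file length
grid_out.seek(0)                # rewind to the beginning
first_kb = grid_out.read(1024)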
-
upload_date
¶ Date that this file was first uploaded.
This attribute is read-only.
-
class
gridfs.grid_file.
GridOutCursor
(collection, filter=None, skip=0, limit=0, no_cursor_timeout=False, sort=None, batch_size=0, session=None)¶ Create a new cursor, similar to the normal
Cursor
.Should not be called directly by application developers - see the
GridFS
methodfind()
instead.-
add_option
(*args, **kwargs)¶ Set arbitrary query flags using a bitmask.
To set the tailable flag: cursor.add_option(2)
-
next
()¶ Get next GridOut object from cursor.
-
remove_option
(*args, **kwargs)¶ Unset arbitrary query flags using a bitmask.
To unset the tailable flag: cursor.remove_option(2)
-
Tools¶
Many tools have been written for working with PyMongo. If you know of or have created a tool for working with MongoDB from Python please list it here.
Note
We try to keep this list current. As such, projects that have not been updated recently or appear to be unmaintained will occasionally be removed from the list or moved to the back (to keep the list from becoming too intimidating).
If a project gets removed that is still being developed or is in active use please let us know or add it back.
ORM-like Layers¶
Some people have found that they prefer to work with a layer that has more features than PyMongo provides. Often, things like models and validation are desired. To that end, several different ORM-like layers have been written by various authors.
It is our recommendation that new users begin by working directly with PyMongo, as described in the rest of this documentation. Many people have found that the features of PyMongo are enough for their needs. Even if you eventually come to the decision to use one of these layers, the time spent working directly with the driver will have increased your understanding of how MongoDB actually works.
- PyMODM
- PyMODM is an ORM-like framework on top of PyMongo. PyMODM is maintained by engineers at MongoDB, Inc. and is quick to adopt new MongoDB features. PyMODM is a “core” ODM, meaning that it provides simple, extensible functionality that can be leveraged by other libraries to target platforms like Django. At the same time, PyMODM is powerful enough to be used for developing applications on its own. Complete documentation is available on readthedocs in addition to a Gitter channel for discussing the project.
- Humongolus
- Humongolus is a lightweight ORM framework for Python and MongoDB. The name comes from the combination of MongoDB and Homunculus (the concept of a miniature though fully formed human body). Humongolus allows you to create models/schemas with robust validation. It attempts to be as pythonic as possible and exposes the pymongo cursor objects whenever possible. The code is available for download at GitHub. Tutorials and usage examples are also available at GitHub.
- MincePy
- MincePy is an object-document mapper (ODM) designed to make any Python object storable and queryable in a MongoDB database. It is designed with machine learning and big-data computational and experimental science applications in mind, but is entirely general and can be useful to anyone looking to organise, share, or process large amounts of data with as little change to their current workflow as possible.
- Ming
- Ming (the Merciless) is a library that allows you to enforce schemas on a MongoDB database in your Python application. It was developed by SourceForge in the course of their migration to MongoDB. See the introductory blog post for more details.
- MongoEngine
- MongoEngine is another ORM-like layer on top of PyMongo. It allows you to define schemas for documents and query collections using syntax inspired by the Django ORM. The code is available on GitHub; for more information, see the tutorial.
- MotorEngine
- MotorEngine is a port of MongoEngine to Motor, for asynchronous access with Tornado. It implements the same modeling APIs to be data-portable, meaning that a model defined in MongoEngine can be read in MotorEngine. The source is available on GitHub.
- uMongo
- uMongo is a Python MongoDB ODM. Its inception comes from two needs: the lack of an async ODM and the difficulty of doing document (un)serialization with existing ODMs. It works with multiple drivers: PyMongo, TxMongo, motor_asyncio, and mongomock. The source is available on GitHub.
No longer maintained¶
- MongoKit
- The MongoKit framework is an ORM-like layer on top of PyMongo. There is also a MongoKit google group.
- MongoAlchemy
- MongoAlchemy is another ORM-like layer on top of PyMongo. Its API is inspired by SQLAlchemy. The code is available on GitHub; for more information, see the tutorial.
- Minimongo
- minimongo is a lightweight, pythonic interface to MongoDB. It retains pymongo’s query and update API, and provides a number of additional features, including a simple document-oriented interface, connection pooling, index management, and collection & database naming helpers. The source is on GitHub.
- Manga
- Manga aims to be a simpler ORM-like layer on top of PyMongo. The syntax for defining schema is inspired by the Django ORM, but Pymongo’s query language is maintained. The source is on GitHub.
Framework Tools¶
This section lists tools and adapters that have been designed to work with various Python frameworks and libraries.
- Djongo is a connector for using Django with MongoDB as the database backend. Use the Django Admin GUI to add and modify documents in MongoDB. The Djongo Source Code is hosted on GitHub and the Djongo package is on pypi.
- Django MongoDB Engine is a MongoDB database backend for Django that completely integrates with its ORM. For more information see the tutorial.
- mango provides MongoDB backends for
Django sessions and authentication (bypassing
django.db
entirely). - Django MongoEngine is a MongoDB backend for Django, an example:. For more information http://docs.mongoengine.org/en/latest/django.html
- mongodb_beaker is a project to enable using MongoDB as a backend for beaker’s caching / session system. The source is on GitHub.
- Log4Mongo is a flexible Python logging handler that can store logs in MongoDB using normal and capped collections.
- MongoLog is a Python logging handler that stores logs in MongoDB using a capped collection.
- c5t is a content-management system using TurboGears and MongoDB.
- rod.recipe.mongodb is a ZC Buildout recipe for downloading and installing MongoDB.
- repoze-what-plugins-mongodb is a project
working to support a plugin for using MongoDB as a backend for
repoze.what
. - mongobox is a tool to run a sandboxed MongoDB instance from within a python app.
- Flask-MongoAlchemy Add Flask support for MongoDB using MongoAlchemy.
- Flask-MongoKit Flask extension to better integrate MongoKit into Flask.
- Flask-PyMongo Flask-PyMongo bridges Flask and PyMongo.
Alternative Drivers¶
These are alternatives to PyMongo.
Contributors¶
The following is a list of people who have contributed to PyMongo. If you belong here and are missing please let us know (or send a pull request after adding yourself to the list):
- Mike Dirolf (mdirolf)
- Jeff Jenkins (jeffjenkins)
- Jim Jones
- Eliot Horowitz (erh)
- Michael Stephens (mikejs)
- Joakim Sernbrant (serbaut)
- Alexander Artemenko (svetlyak40wt)
- Mathias Stearn (RedBeard0531)
- Fajran Iman Rusadi (fajran)
- Brad Clements (bkc)
- Andrey Fedorov (andreyf)
- Joshua Roesslein (joshthecoder)
- Gregg Lind (gregglind)
- Michael Schurter (schmichael)
- Daniel Lundin
- Michael Richardson (mtrichardson)
- Dan McKinley (mcfunley)
- David Wolever (wolever)
- Carlos Valiente (carletes)
- Jehiah Czebotar (jehiah)
- Drew Perttula (drewp)
- Carl Baatz (c-w-b)
- Johan Bergstrom (jbergstroem)
- Jonas Haag (jonashaag)
- Kristina Chodorow (kchodorow)
- Andrew Sibley (sibsibsib)
- Flavio Percoco Premoli (FlaPer87)
- Ken Kurzweil (kurzweil)
- Christian Wyglendowski (dowski)
- James Murty (jmurty)
- Brendan W. McAdams (bwmcadams)
- Bernie Hackett (behackett)
- Reed O’Brien (reedobrien)
- Francisco Souza (fsouza)
- Alexey I. Froloff (raorn)
- Steve Lacy (slacy)
- Richard Shea (shearic)
- Vladimir Sidorenko (gearheart)
- Aaron Westendorf (awestendorf)
- Dan Crosta (dcrosta)
- Ryan Smith-Roberts (rmsr)
- David Pisoni (gefilte)
- Abhay Vardhan (abhayv)
- Alexey Borzenkov (snaury)
- Kostya Rybnikov (k-bx)
- A Jesse Jiryu Davis (ajdavis)
- Samuel Clay (samuelclay)
- Ross Lawley (rozza)
- Wouter Bolsterlee (wbolster)
- Alex Grönholm (agronholm)
- Christoph Simon (kalanzun)
- Chris Tompkinson (tompko)
- Mike O’Brien (mpobrien)
- T Dampier (dampier)
- Michael Henson (hensom)
- Craig Hobbs (craigahobbs)
- Emily Stolfo (estolfo)
- Sam Helman (shelman)
- Justin Patrin (reversefold)
- Xiuming Chen (cxmcc)
- Tyler Jones (thomascirca)
- Amalia Hawkins (hawka)
- Yuchen Ying (yegle)
- Kyle Erf (3rf)
- Luke Lovett (lovett89)
- Jaroslav Semančík (girogiro)
- Don Mitchell (dmitchell)
- Ximing (armnotstrong)
- Can Zhang (cannium)
- Sergey Azovskov (last-g)
- Heewa Barfchin (heewa)
- Anna Herlihy (aherlihy)
- Len Buckens (buckensl)
- ultrabug
- Shane Harvey (ShaneHarvey)
- Cao Siyang (caosiyang)
- Zhecong Kwok (gzcf)
- TaoBeier (tao12345666333)
- Jagrut Trivedi (Jagrut)
- Shrey Batra (shreybatra)
- Felipe Rodrigues (fbidu)
- Terence Honles (terencehonles)
- Paul Fisher (thetorpedodog)
- Julius Park (juliusgeo)
Changelog¶
Changes in Version 3.11.4¶
Issues Resolved¶
Version 3.11.4 fixes a bug where a MongoClient would mistakenly attempt to create minPoolSize connections to arbiter nodes (PYTHON-2634).
See the PyMongo 3.11.4 release notes in JIRA for the list of resolved issues in this release.
Changes in Version 3.11.3¶
Issues Resolved¶
Version 3.11.3 fixes a bug that prevented PyMongo from retrying writes after
a writeConcernError
on MongoDB 4.4+ (PYTHON-2452).
See the PyMongo 3.11.3 release notes in JIRA for the list of resolved issues in this release.
Changes in Version 3.11.2¶
Issues Resolved¶
Version 3.11.2 includes a number of bugfixes. Highlights include:
- Fixed a memory leak caused by failing SDAM monitor checks on Python 3 (PYTHON-2433).
- Fixed a regression that changed the string representation of
BulkWriteError
(PYTHON-2438). - Fixed a bug that made it impossible to use
bson.codec_options.CodecOptions.with_options()
andwith_options()
on some early versions of Python 3.4 and Python 3.5 due to a bug in the standard library implementation ofcollections.namedtuple._asdict()
(PYTHON-2440). - Fixed a bug that resulted in a
TypeError
exception when a PyOpenSSL socket was configured with a timeout ofNone
(PYTHON-2443).
See the PyMongo 3.11.2 release notes in JIRA for the list of resolved issues in this release.
Changes in Version 3.11.1¶
Version 3.11.1 adds support for Python 3.9 and includes a number of bugfixes. Highlights include:
- Support for Python 3.9.
- Initial support for Azure and GCP KMS providers for client side field level
encryption is in beta. See the docstring for
MongoClient
,AutoEncryptionOpts
, andencryption
. Note: Backwards-breaking changes may be made before the final release. - Fixed a bug where the
bson.json_util.JSONOptions
API did not match thebson.codec_options.CodecOptions
API due to the absence of abson.json_util.JSONOptions.with_options()
method. This method has now been added. - Fixed a bug which made it impossible to serialize
BulkWriteError
instances usingpickle
. - Fixed a bug wherein PyMongo did not always discard an implicit session after encountering a network error.
- Fixed a bug where connections created in the background were not authenticated.
- Fixed a memory leak in the
bson
module when using aTypeRegistry
.
Issues Resolved¶
See the PyMongo 3.11.1 release notes in JIRA for the list of resolved issues in this release.
Changes in Version 3.11.0¶
Version 3.11 adds support for MongoDB 4.4 and includes a number of bug fixes. Highlights include:
- Support for OCSP (Online Certificate Status Protocol).
- Support for PyOpenSSL as an alternative TLS implementation. PyOpenSSL is required for OCSP support. It will also be installed when using the “tls” extra if the version of Python in use is older than 2.7.9.
- Support for the MONGODB-AWS authentication mechanism.
- Support for the
directConnection
URI option and kwarg toMongoClient
. - Support for speculative authentication attempts in connection handshakes which reduces the number of network roundtrips needed to authenticate new connections on MongoDB 4.4+.
- Support for creating collections in multi-document transactions with
create_collection()
on MongoDB 4.4+. - Added index hinting support to the
replace_one()
,update_one()
,update_many()
,find_one_and_replace()
,find_one_and_update()
,delete_one()
,delete_many()
, andfind_one_and_delete()
commands. - Added index hinting support to the
ReplaceOne
,UpdateOne
,UpdateMany
,DeleteOne
, andDeleteMany
bulk operations. - Added support for
bson.binary.UuidRepresentation.UNSPECIFIED
andMongoClient(uuidRepresentation='unspecified')
which will become the default UUID representation starting in PyMongo 4.0. See Handling UUID Data for details. - Added the
background
parameter topymongo.database.Database.validate_collection()
. For a description of this parameter see the MongoDB documentation for the validate command. - Added the
allow_disk_use
parameters topymongo.collection.Collection.find()
. - Added the
hedge
parameter toPrimaryPreferred
,Secondary
,SecondaryPreferred
,Nearest
to support disabling (or explicitly enabling) hedged reads in MongoDB 4.4+. - Fixed a bug in change streams that could cause PyMongo to miss some change documents when resuming a stream that was started without a resume token and whose first batch did not contain any change documents.
- Fixed a bug where using gevent.Timeout to time out an operation could lead to a deadlock.
Deprecations:
- Deprecated the
oplog_replay
parameter topymongo.collection.Collection.find()
. Starting in MongoDB 4.4, the server optimizes queries against the oplog collection without requiring the user to set this flag. - Deprecated
pymongo.collection.Collection.reindex()
. Usecommand()
to run thereIndex
command instead. - Deprecated
pymongo.mongo_client.MongoClient.fsync()
. Usecommand()
to run thefsync
command instead. - Deprecated
pymongo.mongo_client.MongoClient.unlock()
. Usecommand()
to run thefsyncUnlock
command instead. See the documentation for more information. - Deprecated
pymongo.mongo_client.MongoClient.is_locked
. Usecommand()
to run thecurrentOp
command instead. See the documentation for more information.
Unavoidable breaking changes:
GridFSBucket
andGridFS
do not support multi-document transactions. Running a GridFS operation in a transaction now always raises the following error:InvalidOperation: GridFS does not support multi-document transactions
Issues Resolved¶
See the PyMongo 3.11.0 release notes in JIRA for the list of resolved issues in this release.
Changes in Version 3.10.1¶
Version 3.10.1 fixes the following issues discovered since the release of 3.10.0:
- Fix a TypeError logged to stderr that could be triggered during server
maintenance or during
pymongo.mongo_client.MongoClient.close()
. - Avoid creating new connections during
pymongo.mongo_client.MongoClient.close()
.
Issues Resolved¶
See the PyMongo 3.10.1 release notes in JIRA for the list of resolved issues in this release.
Changes in Version 3.10.0¶
Version 3.10 includes a number of improvements and bug fixes. Highlights include:
- Support for Client-Side Field Level Encryption with MongoDB 4.2. See Client-Side Field Level Encryption for examples.
- Support for Python 3.8.
- Added
pymongo.client_session.ClientSession.in_transaction
. - Do not hold the Topology lock while creating connections in a MongoClient’s background thread. This change fixes a bug where application operations would block while the background thread ensures that all server pools have minPoolSize connections.
- Fix a UnicodeDecodeError bug when coercing a PyMongoError with a non-ascii error message to unicode on Python 2.
- Fix an edge case bug where PyMongo could exceed the server’s maxMessageSizeBytes when generating a compressed bulk write command.
Issues Resolved¶
See the PyMongo 3.10 release notes in JIRA for the list of resolved issues in this release.
Changes in Version 3.9.0¶
Version 3.9 adds support for MongoDB 4.2. Highlights include:
Support for MongoDB 4.2 sharded transactions. Sharded transactions have the same API as replica set transactions. See Transactions.
New method
pymongo.client_session.ClientSession.with_transaction()
to support conveniently running a transaction in a session with automatic retries and at-most-once semantics.Initial support for client side field level encryption. See the docstring for
MongoClient
,AutoEncryptionOpts
, andencryption
for details. Note: Support for client side encryption is in beta. Backwards-breaking changes may be made before the final release.Added the
max_commit_time_ms
parameter tostart_transaction()
.Implement the URI options specification in the
MongoClient()
constructor. Consequently, there are a number of changes in connection options:- The
tlsInsecure
option has been added. - The
tls
option has been added. The olderssl
option has been retained as an alias to the newtls
option. wTimeout
has been deprecated in favor ofwTimeoutMS
.wTimeoutMS
now overrideswTimeout
if the user provides both.j
has been deprecated in favor ofjournal
.journal
now overridesj
if the user provides both.ssl_cert_reqs
has been deprecated in favor oftlsAllowInvalidCertificates
. Instead ofssl.CERT_NONE
,ssl.CERT_OPTIONAL
andssl.CERT_REQUIRED
, the new option expects a boolean value -True
is equivalent tossl.CERT_NONE
, whileFalse
is equivalent tossl.CERT_REQUIRED
.ssl_match_hostname
has been deprecated in favor oftlsAllowInvalidHostnames
.ssl_ca_certs
has been deprecated in favor oftlsCAFile
.ssl_certfile
has been deprecated in favor oftlsCertificateKeyFile
.ssl_pem_passphrase
has been deprecated in favor oftlsCertificateKeyFilePassword
.waitQueueMultiple
has been deprecated without replacement. This option was a poor solution for putting an upper bound on queuing since it didn’t affect queuing in other parts of the driver.
- The
The
retryWrites
URI option now defaults toTrue
. Supported write operations that fail with a retryable error will automatically be retried one time, with at-most-once semantics.Support for retryable reads and the
retryReads
URI option which is enabled by default. See theMongoClient
documentation for details. Now that supported operations are retried automatically and transparently, users should consider adjusting any custom retry logic to prevent an application from inadvertently retrying for too long.Support zstandard for wire protocol compression.
Support for periodically polling DNS SRV records to update the mongos proxy list without having to change client configuration.
New method
pymongo.database.Database.aggregate()
to support running database level aggregations.Support for publishing Connection Monitoring and Pooling events via the new
ConnectionPoolListener
class. Seemonitoring
for an example.pymongo.collection.Collection.aggregate()
andpymongo.database.Database.aggregate()
now support the$merge
pipeline stage and use read preferencePRIMARY
if the$out
or$merge
pipeline stages are used.Support for specifying a pipeline or document in
update_one()
,update_many()
,find_one_and_update()
,UpdateOne()
, andUpdateMany()
.Binary
now supports any bytes-like type that implements the buffer protocol.Resume tokens can now be accessed from a
ChangeStream
cursor using theresume_token
attribute.Connections now survive primary step-down when using MongoDB 4.2+. Applications should expect less socket connection turnover during replica set elections.
Unavoidable breaking changes:
- Applications that use MongoDB with the MMAPv1 storage engine must now
explicitly disable retryable writes via the connection string
(e.g.
MongoClient("mongodb://my.mongodb.cluster/db?retryWrites=false")
) or theMongoClient
constructor’s keyword argument (e.g.MongoClient("mongodb://my.mongodb.cluster/db", retryWrites=False)
) to avoid running intoOperationFailure
exceptions during write operations. The MMAPv1 storage engine is deprecated and does not support retryable writes which are now turned on by default. - In order to ensure that the
connectTimeoutMS
URI option is honored when connecting to clusters with amongodb+srv://
connection string, the minimum required version of the optionaldnspython
dependency has been bumped to 1.16.0. This is a breaking change for applications that use PyMongo’s SRV support with a version ofdnspython
older than 1.16.0.
Issues Resolved¶
See the PyMongo 3.9 release notes in JIRA for the list of resolved issues in this release.
Changes in Version 3.8.0¶
Warning
PyMongo no longer supports Python 2.6. RHEL 6 users should install Python 2.7 or newer from Red Hat Software Collections. CentOS 6 users should install Python 2.7 or newer from SCL.
Warning
PyMongo no longer supports PyPy3 versions older than 3.5. Users must upgrade to PyPy3.5+.
- ObjectId now implements the ObjectID specification version 0.2.
- For better performance and to better follow the GridFS spec, GridOut now uses a single cursor to read all the chunks in the file. Previously, each chunk in the file was queried individually using find_one().
- gridfs.grid_file.GridOut.read() now only checks for extra chunks after reading the entire file. Previously, this method would check for extra chunks on every call.
- current_op() now always uses the Database's codec_options when decoding the command response. Previously the codec_options were only used when the MongoDB server version was <= 3.0.
- Undeprecated get_default_database() and added the default parameter.
- TLS Renegotiation is now disabled when possible.
- Custom types can now be directly encoded to, and decoded from, MongoDB using the TypeCodec and TypeRegistry APIs. For more information, see the custom type example and the sketch after this list.
- Attempting a multi-document transaction on a sharded cluster now raises a ConfigurationError.
- pymongo.cursor.Cursor.distinct() and pymongo.cursor.Cursor.count() now send the Cursor's comment() as the "comment" top-level command option instead of "$comment". Also note that "comment" must be a string.
- Added the filter parameter to list_collection_names().
- Changes can now be requested from a ChangeStream cursor without blocking indefinitely using the new pymongo.change_stream.ChangeStream.try_next() method.
- Fixed a reference leak bug when splitting a batched write command based on maxWriteBatchSize or the max message size.
- Deprecated running find queries that set min() and/or max() but do not also set a hint() of which index to use. The find command is expected to require a hint() when using min/max starting in MongoDB 4.2.
- Documented support for the uuidRepresentation URI option, which has been supported since PyMongo 2.7. Valid values are pythonLegacy (the default), javaLegacy, csharpLegacy, and standard. New applications should consider setting this to standard for cross language compatibility.
- RawBSONDocument now validates that the bson_bytes passed in represent a single BSON document. Earlier versions would mistakenly accept multiple BSON documents.
- Iterating over a RawBSONDocument now maintains the same field order as the underlying raw BSON document.
- Applications can now register a custom server selector. For more information, see the server selector example.
- The connection pool now implements a LIFO policy.
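A minimal sketch of the TypeCodec/TypeRegistry pattern, modeled on the custom type example (the database and collection names are illustrative):

from decimal import Decimal

from bson.codec_options import CodecOptions, TypeCodec, TypeRegistry
from bson.decimal128 import Decimal128
from pymongo import MongoClient

class DecimalCodec(TypeCodec):
    python_type = Decimal    # the custom Python type handled by this codec
    bson_type = Decimal128   # the BSON type it decodes back into

    def transform_python(self, value):
        # Encode decimal.Decimal as BSON Decimal128.
        return Decimal128(value)

    def transform_bson(self, value):
        # Decode BSON Decimal128 back into decimal.Decimal.
        return value.to_decimal()

options = CodecOptions(type_registry=TypeRegistry([DecimalCodec()]))
prices = MongoClient().test.get_collection("prices", codec_options=options)
prices.insert_one({"amount": Decimal("9.99")})  # stored as Decimal128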
Unavoidable breaking changes:
- In order to follow the ObjectID Spec version 0.2, an ObjectId’s 3-byte machine identifier and 2-byte process id have been replaced with a single 5-byte random value generated per process. This is a breaking change for any application that attempts to interpret those bytes.
Issues Resolved¶
See the PyMongo 3.8 release notes in JIRA for the list of resolved issues in this release.
Changes in Version 3.7.2¶
Version 3.7.2 fixes a few issues discovered since the release of 3.7.1.
- Fixed a bug in retryable writes where a previous command’s “txnNumber” field could be sent leading to incorrect results.
- Fixed a memory leak of a few bytes on some insert, update, or delete commands when running against MongoDB 3.6+.
- Fixed a bug that caused
pymongo.collection.Collection.ensure_index()
to only cache a single index per database. - Updated the documentation examples to use
pymongo.collection.Collection.count_documents()
instead ofpymongo.collection.Collection.count()
andpymongo.cursor.Cursor.count()
.
Issues Resolved¶
See the PyMongo 3.7.2 release notes in JIRA for the list of resolved issues in this release.
Changes in Version 3.7.1¶
Version 3.7.1 fixes a few issues discovered since the release of 3.7.0.
- Calling
authenticate()
more than once with the same credentials results in OperationFailure. - Authentication fails when SCRAM-SHA-1 is used to authenticate users with only MONGODB-CR credentials.
- A millisecond rounding problem when decoding datetimes in the pure Python BSON decoder on 32 bit systems and AWS lambda.
Issues Resolved¶
See the PyMongo 3.7.1 release notes in JIRA for the list of resolved issues in this release.
Changes in Version 3.7.0¶
Version 3.7 adds support for MongoDB 4.0. Highlights include:
- Support for single replica set multi-document ACID transactions. See Transactions.
- Support for wire protocol compression. See the MongoClient() documentation for details.
- Support for Python 3.7.
- New count methods, count_documents() and estimated_document_count(). count_documents() is always accurate when used with MongoDB 3.6+, or when used with older standalone or replica set deployments. With older sharded clusters it is always accurate when used with Primary read preference. It can also be used in a transaction, unlike the now deprecated pymongo.collection.Collection.count() and pymongo.cursor.Cursor.count() methods (see the sketch after this list).
- Support for watching changes on all collections in a database using the new pymongo.database.Database.watch() method.
- Support for watching changes on all collections in all databases using the new pymongo.mongo_client.MongoClient.watch() method.
- Support for watching changes starting at a user provided timestamp using the new start_at_operation_time parameter for the watch() helpers.
- Better support for using PyMongo in a FIPS 140-2 environment. Specifically, the following features and changes allow PyMongo to function when MD5 support is disabled in OpenSSL by the FIPS Object Module:
  - Support for the SCRAM-SHA-256 authentication mechanism. The GSSAPI, PLAIN, and MONGODB-X509 mechanisms can also be used to avoid issues with OpenSSL in FIPS environments.
  - MD5 checksums are now optional in GridFS. See the disable_md5 option of GridFS and GridFSBucket.
  - ObjectId machine bytes are now hashed using FNV-1a instead of MD5.
- The list_collection_names() and collection_names() methods use the nameOnly option when supported by MongoDB.
- The pymongo.collection.Collection.watch() method now returns an instance of the CollectionChangeStream class, which is a subclass of ChangeStream.
- SCRAM client and server keys are cached for improved performance, following RFC 5802.
- If not specified, the authSource for the PLAIN authentication mechanism defaults to $external.
- wtimeoutMS is once again supported as a URI option.
- When using unacknowledged write concern and connected to MongoDB server version 3.6 or greater, the bypass_document_validation option is now supported in the following write helpers: insert_one(), replace_one(), update_one(), update_many().
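For example, a minimal sketch of the new count helpers (the database and collection names are illustrative):

from pymongo import MongoClient

db = MongoClient().test

# Accurate filtered count; usable in a transaction, unlike count().
active = db.users.count_documents({"status": "active"})

# Fast approximate total taken from collection metadata; accepts no filter.
total = db.users.estimated_document_count()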
Deprecations:
- Deprecated pymongo.collection.Collection.count() and pymongo.cursor.Cursor.count(). These two methods use the count command and may or may not be accurate, depending on the options used and connected MongoDB topology. Use count_documents() instead.
- Deprecated the snapshot option of find() and find_one(). The option was deprecated in MongoDB 3.6 and removed in MongoDB 4.0.
- Deprecated the max_scan option of find() and find_one(). The option was deprecated in MongoDB 4.0. Use maxTimeMS instead.
- Deprecated close_cursor(). Use close() instead.
- Deprecated database_names(). Use list_database_names() instead.
- Deprecated collection_names(). Use list_collection_names() instead.
- Deprecated parallel_scan(). MongoDB 4.2 will remove the parallelCollectionScan command.
Unavoidable breaking changes:
- Commands that fail with server error codes 10107, 13435, 13436, 11600, 11602, 189, 91 (NotMaster, NotMasterNoSlaveOk, NotMasterOrSecondary, InterruptedAtShutdown, InterruptedDueToReplStateChange, PrimarySteppedDown, ShutdownInProgress respectively) now always raise NotMasterError instead of OperationFailure.
- parallel_scan() no longer uses an implicit session. Explicit sessions are still supported.
- Unacknowledged writes (w=0) with an explicit session parameter now raise a client side error. Since PyMongo does not wait for a response for an unacknowledged write, two unacknowledged writes run serially by the client may be executed simultaneously on the server. However, the server requires that a single session not be used simultaneously by more than one operation. Therefore explicit sessions cannot support unacknowledged writes. Unacknowledged writes without a session parameter are still supported.
Issues Resolved¶
See the PyMongo 3.7 release notes in JIRA for the list of resolved issues in this release.
Changes in Version 3.6.1¶
Version 3.6.1 fixes bugs reported since the release of 3.6.0:
- Fix a regression in PyMongo 3.5.0 that causes idle sockets to be closed almost instantly when maxIdleTimeMS is set. Idle sockets are now closed after maxIdleTimeMS milliseconds.
- pymongo.mongo_client.MongoClient.max_idle_time_ms now returns milliseconds instead of seconds.
- Properly import and use the monotonic library for monotonic time when it is installed.
- aggregate() now ignores the batchSize argument when running a pipeline with a $out stage.
- Always send handshake metadata for new connections.
Issues Resolved¶
See the PyMongo 3.6.1 release notes in JIRA for the list of resolved issues in this release.
Changes in Version 3.6.0¶
Version 3.6 adds support for MongoDB 3.6, drops support for CPython 3.3 (PyPy3
is still supported), and drops support for MongoDB versions older than 2.6. If
connecting to a MongoDB 2.4 server or older, PyMongo now throws a
ConfigurationError
.
Highlights include:
- Support for change streams. See the watch() method for details.
- Support for array_filters in update_one(), update_many(), find_one_and_update(), UpdateOne(), and UpdateMany() (see the sketch after this list).
- New Session API, see start_session().
- New methods find_raw_batches() and aggregate_raw_batches() for use with external libraries that can parse raw batches of BSON data.
- New methods list_databases() and list_database_names().
- New methods list_collections() and list_collection_names().
- Support for mongodb+srv:// URIs. See MongoClient for details.
- Index management helpers (create_index(), create_indexes(), drop_index(), drop_indexes(), reindex()) now support maxTimeMS.
- Support for retryable writes and the retryWrites URI option. See MongoClient for details.
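A minimal sketch of array_filters (the collection and field names are illustrative; requires MongoDB 3.6+):

from pymongo import MongoClient

students = MongoClient().test.students

# Update only the embedded grade documents matched by the array filter.
students.update_one(
    {"_id": 1},
    {"$set": {"grades.$[elem].mean": 100}},
    array_filters=[{"elem.grade": {"$gte": 85}}],
)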
Deprecations:
- The useCursor option for aggregate() is deprecated. The option was only necessary when upgrading from MongoDB 2.4 to MongoDB 2.6. MongoDB 2.4 is no longer supported.
- The add_user() and remove_user() methods are deprecated. See the method docstrings for alternatives.
Unavoidable breaking changes:
- Starting in MongoDB 3.6, the deprecated methods authenticate() and logout() now invalidate all cursors created prior. Instead of using these methods to change credentials, pass credentials for one user to the MongoClient at construction time, and either grant access to several databases to one user account, or use a distinct client object for each user.
- BSON binary subtype 4 is decoded using RFC-4122 byte order regardless of the UUID representation. This is a change in behavior for applications that use UUID representation bson.binary.JAVA_LEGACY or bson.binary.CSHARP_LEGACY to decode BSON binary subtype 4. Other UUID representations, bson.binary.PYTHON_LEGACY (the default) and bson.binary.STANDARD, and the decoding of BSON binary subtype 3 are unchanged.
Issues Resolved¶
See the PyMongo 3.6 release notes in JIRA for the list of resolved issues in this release.
Changes in Version 3.5.1¶
Version 3.5.1 fixes bugs reported since the release of 3.5.0:
- Work around socket.getsockopt issue with NetBSD.
- pymongo.command_cursor.CommandCursor.close() now closes the cursor synchronously instead of deferring to a background thread.
- Fix documentation build warnings with Sphinx 1.6.x.
Issues Resolved¶
See the PyMongo 3.5.1 release notes in JIRA for the list of resolved issues in this release.
Changes in Version 3.5¶
Version 3.5 implements a number of improvements and bug fixes:
Highlights include:
- Username and password can be passed to MongoClient as keyword arguments (see the sketch after this list). Before, the only way to pass them was in the URI.
- Increased the performance of using RawBSONDocument.
- Increased the performance of database_names() by using the nameOnly option for listDatabases when available.
- Increased the performance of bulk_write() by reducing the memory overhead of InsertOne, DeleteOne, and DeleteMany.
- Added the collation option to DeleteOne, DeleteMany, ReplaceOne, UpdateOne, and UpdateMany.
- Implemented the MongoDB Extended JSON specification.
- Decimal128 now works when cdecimal is installed.
- PyMongo is now tested against a wider array of operating systems and CPU architectures (including s390x, ARM64, and POWER8).
Changes and Deprecations:
- find() has new options return_key, show_record_id, snapshot, hint, max_time_ms, max_scan, min, max, and comment. Deprecated the option modifiers.
- Deprecated group(). The group command was deprecated in MongoDB 3.4 and is expected to be removed in MongoDB 3.6. Applications should use aggregate() with the $group pipeline stage instead.
- Deprecated authenticate(). Authenticating multiple users conflicts with support for logical sessions in MongoDB 3.6. To authenticate as multiple users, create multiple instances of MongoClient.
- Deprecated eval(). The eval command was deprecated in MongoDB 3.0 and will be removed in a future server version.
- Deprecated SystemJS.
- Deprecated get_default_database(). Applications should use get_database() without the name parameter instead.
- Deprecated the MongoClient option socketKeepAlive. It now defaults to true and disabling it is not recommended; see does TCP keepalive time affect MongoDB Deployments?
- Deprecated initialize_ordered_bulk_op(), initialize_unordered_bulk_op(), and BulkOperationBuilder. Use bulk_write() instead.
- Deprecated STRICT_JSON_OPTIONS. Use RELAXED_JSON_OPTIONS or CANONICAL_JSON_OPTIONS instead.
- If a custom CodecOptions is passed to RawBSONDocument, its document_class must be RawBSONDocument.
- list_indexes() no longer raises OperationFailure when the collection (or database) does not exist on MongoDB >= 3.0. Instead, it returns an empty CommandCursor to make the behavior consistent across all MongoDB versions.
- In Python 3, loads() now automatically decodes JSON $binary with a subtype of 0 into bytes instead of Binary. See the Python 3 FAQ for more details.
- loads() now raises TypeError or ValueError when parsing JSON type wrappers with values of the wrong type or any extra keys.
- pymongo.cursor.Cursor.close() and pymongo.mongo_client.MongoClient.close() now kill cursors synchronously instead of deferring to a background thread.
- parse_uri() now returns the original value of the readPreference MongoDB URI option instead of the validated read preference mode.
Issues Resolved¶
See the PyMongo 3.5 release notes in JIRA for the list of resolved issues in this release.
Changes in Version 3.4¶
Version 3.4 implements the new server features introduced in MongoDB 3.4 and a whole lot more:
Highlights include:
- Complete support for MongoDB 3.4:
  - Unicode aware string comparison using Collations (see the sketch after this list).
  - Support for the new Decimal128 BSON type.
  - A new maxStalenessSeconds read preference option.
  - A username is no longer required for the MONGODB-X509 authentication mechanism when connected to MongoDB >= 3.4.
  - parallel_scan() supports maxTimeMS.
  - WriteConcern is automatically applied by all helpers for commands that write to the database when connected to MongoDB 3.4+. This change affects the following helpers: drop_database(), create_collection(), drop_collection(), aggregate() (when using $out), create_indexes(), create_index(), drop_indexes(), drop_index(), map_reduce() (when output is not "inline"), reindex(), rename().
- Improved support for logging server discovery and monitoring events. See monitoring for examples.
- Support for matching iPAddress subjectAltName values for TLS certificate verification.
- TLS compression is now explicitly disabled when possible.
- The Server Name Indication (SNI) TLS extension is used when possible.
- Finer control over JSON encoding/decoding with JSONOptions.
- Allow Code objects to have a scope of None, signifying no scope. Also allow encoding Code objects with an empty scope (i.e. {}).
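A minimal sketch of a case-insensitive query using a collation (the collection name is illustrative; requires MongoDB 3.4+):

from pymongo import MongoClient
from pymongo.collation import Collation

contacts = MongoClient().test.contacts

# strength=2 makes the comparison ignore case differences.
cursor = contacts.find(
    {"name": "cafe"},
    collation=Collation(locale="en_US", strength=2),
)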
Warning
Starting in PyMongo 3.4, bson.code.Code.scope may return None, as the default scope is None instead of {}.
Note
PyMongo 3.4+ attempts to create sockets non-inheritable when possible (i.e. it sets the close-on-exec flag on socket file descriptors). Support is limited to a subset of POSIX operating systems (not including Windows) and the flag usually cannot be set in a single atomic operation. CPython 3.4+ implements PEP 446, creating all file descriptors non-inheritable by default. Users that require this behavior are encouraged to upgrade to CPython 3.4+.
Since 3.4rc0, the max staleness option has been renamed from maxStalenessMS
to maxStalenessSeconds
, its smallest value has changed from twice
heartbeatFrequencyMS
to 90 seconds, and its default value has changed from
None
or 0 to -1.
Issues Resolved¶
See the PyMongo 3.4 release notes in JIRA for the list of resolved issues in this release.
Changes in Version 3.3.1¶
Version 3.3.1 fixes a memory leak when decoding elements inside of a
RawBSONDocument
.
Issues Resolved¶
See the PyMongo 3.3.1 release notes in JIRA for the list of resolved issues in this release.
Changes in Version 3.3¶
Version 3.3 adds the following major new features:
- C extensions support on big endian systems.
- Kerberos authentication support on Windows using WinKerberos.
- A new ssl_crlfile option to support certificate revocation lists.
- A new ssl_pem_passphrase option to support encrypted key files.
- Support for publishing server discovery and monitoring events. See monitoring for details.
- New connection pool options minPoolSize and maxIdleTimeMS (see the sketch after this list).
- New heartbeatFrequencyMS option controls the rate at which background monitoring threads re-check servers. Default is once every 10 seconds.
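A minimal sketch of the new pool and monitoring options (the values shown are illustrative):

from pymongo import MongoClient

client = MongoClient(
    "mongodb://localhost:27017/",
    minPoolSize=4,               # keep at least 4 sockets open per server
    maxIdleTimeMS=30000,         # close sockets idle for more than 30 seconds
    heartbeatFrequencyMS=10000,  # re-check servers every 10 seconds (the default)
)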
Warning
PyMongo 3.3 drops support for MongoDB versions older than 2.4. It also drops support for Python 3.2 (PyPy3 continues to be supported).
Issues Resolved¶
See the PyMongo 3.3 release notes in JIRA for the list of resolved issues in this release.
Changes in Version 3.2.2¶
Version 3.2.2 fixes a few issues reported since the release of 3.2.1, including a fix for using the connect option in the MongoDB URI and support for setting the batch size for a query to 1 when using MongoDB 3.2+.
Issues Resolved¶
See the PyMongo 3.2.2 release notes in JIRA for the list of resolved issues in this release.
Changes in Version 3.2.1¶
Version 3.2.1 fixes a few issues reported since the release of 3.2, including
running the mapreduce command twice when calling the
inline_map_reduce()
method and a
TypeError
being raised when calling
download_to_stream()
. This release also
improves error messaging around BSON decoding.
Issues Resolved¶
See the PyMongo 3.2.1 release notes in JIRA for the list of resolved issues in this release.
Changes in Version 3.2¶
Version 3.2 implements the new server features introduced in MongoDB 3.2.
Highlights include:
- Full support for MongoDB 3.2 including:
  - Support for ReadConcern.
  - WriteConcern is now applied to find_one_and_replace(), find_one_and_update(), and find_one_and_delete().
  - Support for the new bypassDocumentValidation option in write helpers.
- Support for reading and writing raw BSON with RawBSONDocument.
Note
Certain MongoClient
properties now
block until a connection is established or raise
ServerSelectionTimeoutError
if no server is available.
See MongoClient
for details.
Issues Resolved¶
See the PyMongo 3.2 release notes in JIRA for the list of resolved issues in this release.
Changes in Version 3.1.1¶
Version 3.1.1 fixes a few issues reported since the release of 3.1, including a regression in error handling for oversize command documents and interrupt handling issues in the C extensions.
Issues Resolved¶
See the PyMongo 3.1.1 release notes in JIRA for the list of resolved issues in this release.
Changes in Version 3.1¶
Version 3.1 implements a few new features and fixes bugs reported since the release of 3.0.3.
Highlights include:
- Command monitoring support. See monitoring for details.
- Configurable error handling for UnicodeDecodeError. See the unicode_decode_error_handler option of CodecOptions.
- Optional automatic timezone conversion when decoding BSON datetime. See the tzinfo option of CodecOptions (and the sketch after this list).
- An implementation of GridFSBucket from the new GridFS spec.
- Compliance with the new Connection String spec.
- Reduced idle CPU usage in Python 2.
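For example, a minimal sketch of timezone-aware decoding (the collection name is illustrative):

from bson.codec_options import CodecOptions
from bson.tz_util import utc
from pymongo import MongoClient

# Decode BSON datetimes as timezone-aware datetime objects in UTC.
events = MongoClient().test.get_collection(
    "events", codec_options=CodecOptions(tz_aware=True, tzinfo=utc)
)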
Changes in internal classes¶
The private PeriodicExecutor
class no longer takes a condition_class
option, and the private thread_util.Event
class is removed.
Issues Resolved¶
See the PyMongo 3.1 release notes in JIRA for the list of resolved issues in this release.
Changes in Version 3.0.3¶
Version 3.0.3 fixes issues reported since the release of 3.0.2, including a feature breaking bug in the GSSAPI implementation.
Issues Resolved¶
See the PyMongo 3.0.3 release notes in JIRA for the list of resolved issues in this release.
Changes in Version 3.0.2¶
Version 3.0.2 fixes issues reported since the release of 3.0.1, most
importantly a bug that could route operations to replica set members
that are not in primary or secondary state when using
PrimaryPreferred
or
Nearest
. It is a recommended upgrade for
all users of PyMongo 3.0.x.
Issues Resolved¶
See the PyMongo 3.0.2 release notes in JIRA for the list of resolved issues in this release.
Changes in Version 3.0.1¶
Version 3.0.1 fixes issues reported since the release of 3.0, most importantly a bug in GridFS.delete that could prevent file chunks from actually being deleted.
Issues Resolved¶
See the PyMongo 3.0.1 release notes in JIRA for the list of resolved issues in this release.
Changes in Version 3.0¶
PyMongo 3.0 is a partial rewrite of PyMongo bringing a large number of improvements:
- A unified client class. MongoClient is the one and only client class for connecting to a standalone mongod, replica set, or sharded cluster. Migrating from a standalone, to a replica set, to a sharded cluster can be accomplished with only a simple URI change.
- MongoClient is much more responsive to configuration changes in your MongoDB deployment. All connected servers are monitored in a non-blocking manner. Slow to respond or down servers no longer block server discovery, reducing application startup time and time to respond to new or reconfigured servers and replica set failovers.
- A unified CRUD API. All official MongoDB drivers now implement a standard CRUD API allowing polyglot developers to move from language to language with ease.
- Single source support for Python 2.x and 3.x. PyMongo no longer relies on 2to3 to support Python 3.
- A rewritten pure Python BSON implementation, improving performance with PyPy and CPython deployments without support for C extensions.
- Better support for greenlet based async frameworks including eventlet.
- Immutable client, database, and collection classes, avoiding a host of thread safety issues in client applications.
PyMongo 3.0 brings a large number of API changes. Be sure to read the changes listed below before upgrading from PyMongo 2.x.
Warning
PyMongo no longer supports Python 2.4, 2.5, or 3.1. If you must use PyMongo with these versions of Python the 2.x branch of PyMongo will be minimally supported for some time.
SONManipulator changes¶
The SONManipulator
API has limitations as a
technique for transforming your data. Instead, it is more flexible and
straightforward to transform outgoing documents in your own code before passing
them to PyMongo, and transform incoming documents after receiving them from
PyMongo.
Thus the add_son_manipulator()
method is
deprecated. PyMongo 3’s new CRUD API does not apply SON manipulators to
documents passed to bulk_write()
,
insert_one()
,
insert_many()
,
update_one()
, or
update_many()
. SON manipulators are not
applied to documents returned by the new methods
find_one_and_delete()
,
find_one_and_replace()
, and
find_one_and_update()
.
SSL/TLS changes¶
When ssl is True
the ssl_cert_reqs option now defaults to
ssl.CERT_REQUIRED
if not provided. PyMongo will attempt to load OS
provided CA certificates to verify the server, raising
ConfigurationError
if it cannot.
Gevent Support¶
In previous versions, PyMongo supported Gevent in two modes: you could call
gevent.monkey.patch_socket()
and pass use_greenlets=True
to
MongoClient
, or you could simply call
gevent.monkey.patch_all()
and omit the use_greenlets
argument.
In PyMongo 3.0, the use_greenlets
option is gone. To use PyMongo with
Gevent simply call gevent.monkey.patch_all()
.
For more information, see PyMongo’s Gevent documentation.
MongoClient
changes¶
MongoClient
is now the one and only
client class for a standalone server, mongos, or replica set.
It includes the functionality that had been split into
MongoReplicaSetClient
: it can connect to a replica set, discover all its
members, and monitor the set for stepdowns, elections, and reconfigs.
MongoClient
now also supports the full
ReadPreference
API.
The obsolete classes MasterSlaveConnection
, Connection
, and
ReplicaSetConnection
are removed.
The MongoClient
constructor no
longer blocks while connecting to the server or servers, and it no
longer raises ConnectionFailure
if they
are unavailable, nor ConfigurationError
if the user’s credentials are wrong. Instead, the constructor
returns immediately and launches the connection process on
background threads. The connect
option is added to control whether
these threads are started immediately, or when the client is first used.
Therefore the alive
method is removed since it no longer provides meaningful
information; even if the client is disconnected, it may discover a server in
time to fulfill the next operation.
In PyMongo 2.x, MongoClient
accepted a list of
standalone MongoDB servers and used the first it could connect to:
MongoClient(['host1.com:27017', 'host2.com:27017'])
A list of multiple standalones is no longer supported; if multiple servers are listed they must be members of the same replica set, or mongoses in the same sharded cluster.
The behavior for a list of mongoses is changed from “high availability” to “load balancing”. Before, the client connected to the lowest-latency mongos in the list, and used it until a network error prompted it to re-evaluate all mongoses’ latencies and reconnect to one of them. In PyMongo 3, the client monitors its network latency to all the mongoses continuously, and distributes operations evenly among those with the lowest latency. See mongos Load Balancing for more information.
The client methods start_request
, in_request
, and end_request
are removed, and so is the auto_start_request
option. Requests were
designed to make read-your-writes consistency more likely with the w=0
write concern. Additionally, a thread in a request used the same member for
all secondary reads in a replica set. To ensure read-your-writes consistency
in PyMongo 3.0, do not override the default write concern with w=0
, and
do not override the default read preference of
PRIMARY.
Support for the slaveOk
(or slave_okay
), safe
, and
network_timeout
options has been removed. Use
SECONDARY_PREFERRED
instead of
slave_okay. Accept the default write concern, acknowledged writes, instead of
setting safe=True. Use socketTimeoutMS in place of network_timeout (note that
network_timeout was in seconds, whereas socketTimeoutMS is milliseconds).
The max_pool_size
option has been removed. It is replaced by the
maxPoolSize
MongoDB URI option. maxPoolSize
is now a supported URI
option in PyMongo and can be passed as a keyword argument.
The copy_database
method is removed, see the
copy_database examples for alternatives.
The disconnect
method is removed. Use
close()
instead.
The get_document_class
method is removed. Use
codec_options
instead.
The get_lasterror_options
, set_lasterror_options
, and
unset_lasterror_options
methods are removed. Write concern options
can be passed to MongoClient
as keyword
arguments or MongoDB URI options.
The get_database()
method is added for
getting a Database instance with its options configured differently than the
MongoClient’s.
The following read-only attributes have been added:
The following attributes are now read-only:
The following attributes have been removed:
- document_class (use codec_options instead)
- host (use address instead)
- min_wire_version
- max_wire_version
- port (use address instead)
- safe (use write_concern instead)
- slave_okay (use read_preference instead)
- tag_sets (use read_preference instead)
- tz_aware (use codec_options instead)
The following attributes have been renamed:
- secondary_acceptable_latency_ms is now local_threshold_ms and is now read-only.
Cursor management changes¶
CursorManager
and
set_cursor_manager()
are no longer
deprecated. If you subclass CursorManager
your implementation of close()
must now take a second parameter, address. The BatchCursorManager
class
is removed.
The second parameter to close_cursor()
is renamed from _conn_id
to address
.
kill_cursors()
now accepts an address
parameter.
Database
changes¶
The connection
property is renamed to
client
.
The following read-only attributes have been added:
The following attributes are now read-only:
Use get_database()
for getting a
Database instance with its options configured differently than the
MongoClient’s.
The following attributes have been removed:
- safe
- secondary_acceptable_latency_ms
- slave_okay
- tag_sets
The following methods have been added:
The following methods have been changed:
- command(). Support for as_class, uuid_subtype, tag_sets, and secondary_acceptable_latency_ms has been removed. You can instead pass an instance of CodecOptions as codec_options and an instance of a read preference class from read_preferences as read_preference. The fields and compile_re options are also removed. The fields option was undocumented and never really worked. Regular expressions are always decoded to Regex.
The following methods have been deprecated:
The following methods have been removed:
The get_lasterror_options
, set_lasterror_options
, and
unset_lasterror_options
methods have been removed. Use
WriteConcern
with
get_database()
instead.
Collection
changes¶
The following read-only attributes have been added:
The following attributes are now read-only:
Use get_collection()
or
with_options()
for getting a Collection
instance with its options configured differently than the Database’s.
The following attributes have been removed:
- safe
- secondary_acceptable_latency_ms
- slave_okay
- tag_sets
The following methods have been added (a combined usage sketch follows the list):
- bulk_write()
- insert_one()
- insert_many()
- update_one()
- update_many()
- replace_one()
- delete_one()
- delete_many()
- find_one_and_delete()
- find_one_and_replace()
- find_one_and_update()
- with_options()
- create_indexes()
- list_indexes()
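A minimal sketch of the new CRUD methods used together (the database, collection, and documents are illustrative):

from pymongo import MongoClient, ReturnDocument

coll = MongoClient().test.users

coll.insert_one({"name": "Ada", "logins": 0})
coll.update_one({"name": "Ada"}, {"$set": {"lang": "Python"}})
doc = coll.find_one_and_update(
    {"name": "Ada"},
    {"$inc": {"logins": 1}},
    return_document=ReturnDocument.AFTER,  # return the updated document
)
coll.delete_one({"name": "Ada"})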
The following methods have changed:
- aggregate() now always returns an instance of CommandCursor. See the documentation for all options.
- count() now optionally takes a filter argument, as well as other options supported by the count command.
- distinct() now optionally takes a filter argument.
- create_index() no longer caches indexes, therefore the cache_for parameter has been removed. It also no longer supports the bucket_size and drop_dups aliases for bucketSize and dropDups.
The following methods are deprecated:
The following methods have been removed:
The get_lasterror_options
, set_lasterror_options
, and
unset_lasterror_options
methods have been removed. Use
WriteConcern
with
with_options()
instead.
Changes to find()
and find_one()
¶
The following find/find_one options have been renamed:
These renames only affect your code if you passed these as keyword arguments, like find(fields=['fieldname']). If you passed only positional parameters these changes are not significant for your application. A before/after sketch follows the list below.
- spec -> filter
- fields -> projection
- partial -> allow_partial_results
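A minimal before/after sketch of the renamed keyword arguments (the collection and filter are illustrative):

from pymongo import MongoClient

coll = MongoClient().test.docs

# PyMongo 2.x keyword arguments:
#   coll.find(spec={"x": 1}, fields={"x": 1}, partial=True)
# PyMongo 3.x equivalents:
cursor = coll.find(
    filter={"x": 1},
    projection={"x": 1},
    allow_partial_results=True,
)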
The following find/find_one options have been added:
- cursor_type (see CursorType for values)
- oplog_replay
- modifiers
The following find/find_one options have been removed:
- network_timeout (use max_time_ms() instead)
- slave_okay (use one of the read preference classes from read_preferences and with_options() instead)
- read_preference (use with_options() instead)
- tag_sets (use one of the read preference classes from read_preferences and with_options() instead)
- secondary_acceptable_latency_ms (use the localThresholdMS URI option instead)
- max_scan (use the new modifiers option instead)
- snapshot (use the new modifiers option instead)
- tailable (use the new cursor_type option instead)
- await_data (use the new cursor_type option instead)
- exhaust (use the new cursor_type option instead)
- as_class (use with_options() with CodecOptions instead)
- compile_re (BSON regular expressions are always decoded to Regex)
The following find/find_one options are deprecated:
- manipulate
The following renames need special handling.
- timeout -> no_cursor_timeout - The default for timeout was True. The default for no_cursor_timeout is False. If you were previously passing False for timeout you must pass True for no_cursor_timeout to keep the previous behavior.
gridfs
changes¶
Since PyMongo 1.6, methods open
and close
of GridFS
raised an UnsupportedAPI
exception, as did the entire GridFile
class.
The unsupported methods, the class, and the exception are all deleted.
bson
changes¶
The compile_re option is removed from all methods
that accepted it in bson
and json_util
. Additionally, it
is removed from find()
,
find_one()
,
aggregate()
,
command()
, and so on.
PyMongo now always represents BSON regular expressions as
Regex
objects. This prevents errors for incompatible
patterns, see PYTHON-500. Use try_compile()
to
attempt to convert from a BSON regular expression to a Python regular
expression object.
PyMongo now decodes the int64 BSON type to Int64
, a
trivial wrapper around long (in python 2.x) or int (in python 3.x). This
allows BSON int64 to be round tripped without losing type information in
python 3. Note that if you store a python long (or a python int larger than
4 bytes) it will be returned from PyMongo as Int64
.
The as_class, tz_aware, and uuid_subtype options are removed from all
BSON encoding and decoding methods. Use
CodecOptions
to configure these options. The
APIs affected are:
This is a breaking change for any application that uses the BSON API directly and changes any of the named parameter defaults. No changes are required for applications that use the default values for these options. The behavior remains the same.
Issues Resolved¶
See the PyMongo 3.0 release notes in JIRA for the list of resolved issues in this release.
Changes in Version 2.9.5¶
Version 2.9.5 works around ssl module deprecations in Python 3.6, and expected future ssl module deprecations. It also fixes bugs found since the release of 2.9.4.
- Use ssl.SSLContext and ssl.PROTOCOL_TLS_CLIENT when available.
- Fixed a C extensions build issue when the interpreter was built with -std=c99.
- Fixed various build issues with MinGW32.
- Fixed a write concern bug in add_user() and remove_user() when connected to MongoDB 3.2+.
- Fixed various test failures related to changes in gevent, MongoDB, and our CI test environment.
Issues Resolved¶
See the PyMongo 2.9.5 release notes in JIRA for the list of resolved issues in this release.
Changes in Version 2.9.4¶
Version 2.9.4 fixes issues reported since the release of 2.9.3.
- Fixed __repr__ for closed instances of MongoClient.
- Fixed MongoReplicaSetClient handling of uuidRepresentation.
- Fixed building and testing the documentation with Python 3.x.
- New documentation for TLS/SSL and PyMongo and Using PyMongo with MongoDB Atlas.
Issues Resolved¶
See the PyMongo 2.9.4 release notes in JIRA for the list of resolved issues in this release.
Changes in Version 2.9.3¶
Version 2.9.3 fixes a few issues reported since the release of 2.9.2 including
thread safety issues in ensure_index()
,
drop_index()
, and
drop_indexes()
.
Issues Resolved¶
See the PyMongo 2.9.3 release notes in JIRA for the list of resolved issues in this release.
Changes in Version 2.9.2¶
Version 2.9.2 restores Python 3.1 support, which was broken in PyMongo 2.8. It
improves an error message when decoding BSON as well as fixes a couple other
issues including aggregate()
ignoring
codec_options
and
command()
raising a superfluous
DeprecationWarning.
Issues Resolved¶
See the PyMongo 2.9.2 release notes in JIRA for the list of resolved issues in this release.
Changes in Version 2.9.1¶
Version 2.9.1 fixes two interrupt handling issues in the C extensions and adapts a test case for a behavior change in MongoDB 3.2.
Issues Resolved¶
See the PyMongo 2.9.1 release notes in JIRA for the list of resolved issues in this release.
Changes in Version 2.9¶
Version 2.9 provides an upgrade path to PyMongo 3.x. Most of the API changes from PyMongo 3.0 have been backported in a backward compatible way, allowing applications to be written against PyMongo >= 2.9, rather than PyMongo 2.x or PyMongo 3.x. See the PyMongo 3 Migration Guide for detailed examples.
Note
There are a number of new deprecations in this release for features that were removed in PyMongo 3.0.
- MongoClient:
  - host
  - port
  - use_greenlets
  - document_class
  - tz_aware
  - secondary_acceptable_latency_ms
  - tag_sets
  - uuid_subtype
  - disconnect()
  - alive()
- MongoReplicaSetClient:
  - use_greenlets
  - document_class
  - tz_aware
  - secondary_acceptable_latency_ms
  - tag_sets
  - uuid_subtype
  - alive()
- Database:
  - secondary_acceptable_latency_ms
  - tag_sets
  - uuid_subtype
- Collection:
  - secondary_acceptable_latency_ms
  - tag_sets
  - uuid_subtype
Warning
In previous versions of PyMongo, changing the value of
document_class
changed
the behavior of all existing instances of
Collection
:
>>> coll = client.test.test
>>> coll.find_one()
{u'_id': ObjectId('5579dc7cfba5220cc14d9a18')}
>>> from bson.son import SON
>>> client.document_class = SON
>>> coll.find_one()
SON([(u'_id', ObjectId('5579dc7cfba5220cc14d9a18'))])
The document_class setting is now configurable at the client,
database, collection, and per-operation level. This required breaking
the existing behavior. To change the document class per operation in a
forward compatible way use
with_options()
:
>>> coll.find_one()
{u'_id': ObjectId('5579dc7cfba5220cc14d9a18')}
>>> from bson.codec_options import CodecOptions
>>> coll.with_options(CodecOptions(SON)).find_one()
SON([(u'_id', ObjectId('5579dc7cfba5220cc14d9a18'))])
Issues Resolved¶
See the PyMongo 2.9 release notes in JIRA for the list of resolved issues in this release.
Changes in Version 2.8.1¶
Version 2.8.1 fixes a number of issues reported since the release of PyMongo 2.8. It is a recommended upgrade for all users of PyMongo 2.x.
Issues Resolved¶
See the PyMongo 2.8.1 release notes in JIRA for the list of resolved issues in this release.
Changes in Version 2.8¶
Version 2.8 is a major release that provides full support for MongoDB 3.0 and fixes a number of bugs.
Special thanks to Don Mitchell, Ximing, Can Zhang, Sergey Azovskov, and Heewa Barfchin for their contributions to this release.
Highlights include:
- Support for the SCRAM-SHA-1 authentication mechanism (new in MongoDB 3.0).
- JSON decoder support for the new $numberLong and $undefined types.
- JSON decoder support for the $date type as an ISO-8601 string.
- Support passing an index name to hint().
- The count() method will use a hint if one has been provided through hint().
- A new socketKeepAlive option for the connection pool.
- New generator based BSON decode functions, decode_iter() and decode_file_iter().
- Internal changes to support alternative storage engines like wiredtiger.
Note
There are a number of deprecations in this release for features that will be removed in PyMongo 3.0. These include:
- start_request()
- in_request()
- end_request()
- copy_database()
- error()
- last_status()
- previous_error()
- reset_error_history()
- MasterSlaveConnection
The JSON format for Timestamp has changed from '{"t": <int>, "i": <int>}' to '{"$timestamp": {"t": <int>, "i": <int>}}'. This new format will be decoded to an instance of Timestamp. The old format will continue to be decoded to a python dict as before. Encoding to the old format is no longer supported as it was never correct and loses type information.
Issues Resolved¶
See the PyMongo 2.8 release notes in JIRA for the list of resolved issues in this release.
Changes in Version 2.7.2¶
Version 2.7.2 includes fixes for upsert reporting in the bulk API for MongoDB
versions previous to 2.6, a regression in how son manipulators are applied in
insert()
, a few obscure connection pool
semaphore leaks, and a few other minor issues. See the list of issues resolved
for full details.
Issues Resolved¶
See the PyMongo 2.7.2 release notes in JIRA for the list of resolved issues in this release.
Changes in Version 2.7.1¶
Version 2.7.1 fixes a number of issues reported since the release of 2.7, most importantly a fix for creating indexes and manipulating users through mongos versions older than 2.4.0.
Issues Resolved¶
See the PyMongo 2.7.1 release notes in JIRA for the list of resolved issues in this release.
Changes in Version 2.7¶
PyMongo 2.7 is a major release with a large number of new features and bug fixes. Highlights include:
- Full support for MongoDB 2.6.
- A new bulk write operations API.
- Support for server side query timeouts using max_time_ms().
- Support for writing aggregate() output to a collection.
- A new parallel_scan() helper.
- OperationFailure and its subclasses now include a details attribute with complete error details from the server.
- A new GridFS find() method that returns a GridOutCursor.
- Greatly improved support for mod_wsgi when using PyMongo's C extensions. Read Jesse's blog post for details.
- Improved C extension support for ARM little endian.
Breaking changes¶
Version 2.7 drops support for replica sets running MongoDB versions older than 1.6.2.
Issues Resolved¶
See the PyMongo 2.7 release notes in JIRA for the list of resolved issues in this release.
Changes in Version 2.6.3¶
Version 2.6.3 fixes issues reported since the release of 2.6.2, most importantly a semaphore leak when a connection to the server fails.
Issues Resolved¶
See the PyMongo 2.6.3 release notes in JIRA for the list of resolved issues in this release.
Changes in Version 2.6.2¶
Version 2.6.2 fixes a TypeError
problem when max_pool_size=None
is used in Python 3.
Issues Resolved¶
See the PyMongo 2.6.2 release notes in JIRA for the list of resolved issues in this release.
Changes in Version 2.6.1¶
Version 2.6.1 fixes a reference leak in
the insert()
method.
Issues Resolved¶
See the PyMongo 2.6.1 release notes in JIRA for the list of resolved issues in this release.
Changes in Version 2.6¶
Version 2.6 includes some frequently requested improvements and adds support for some early MongoDB 2.6 features.
Special thanks go to Justin Patrin for his work on the connection pool in this release.
Important new features:
- The max_pool_size option for MongoClient and MongoReplicaSetClient now actually caps the number of sockets the pool will open concurrently. Once the pool has reached max_pool_size, operations will block waiting for a socket to become available. If waitQueueTimeoutMS is set, an operation that blocks waiting for a socket will raise ConnectionFailure after the timeout. By default waitQueueTimeoutMS is not set. See How does connection pooling work in PyMongo? for more information, and the sketch after this list.
- The insert() method automatically splits large batches of documents into multiple insert messages based on max_message_size.
- Support for the exhaust cursor flag. See find() for details and caveats.
- Support for the PLAIN and MONGODB-X509 authentication mechanisms. See the authentication docs for more information.
- Support aggregation output as a Cursor. See aggregate() for details.
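A minimal sketch of the 2.6-era pool cap (option names follow the PyMongo 2.6 API; the values are illustrative):

from pymongo import MongoClient
from pymongo.errors import ConnectionFailure

# Cap concurrent sockets at 100 and wait at most one second for a free one.
client = MongoClient(max_pool_size=100, waitQueueTimeoutMS=1000)
try:
    client.test.collection.find_one()
except ConnectionFailure:
    # Raised if no socket became available within waitQueueTimeoutMS.
    pass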
Warning
SIGNIFICANT BEHAVIOR CHANGE in 2.6. Previously, max_pool_size
would limit only the idle sockets the pool would hold onto, not the
number of open sockets. The default has also changed, from 10 to 100.
If you pass a value for max_pool_size
make sure it is large enough for
the expected load. (Sockets are only opened when needed, so there is no cost
to having a max_pool_size
larger than necessary. Err towards a larger
value.) If your application accepts the default, continue to do so.
See How does connection pooling work in PyMongo? for more information.
Issues Resolved¶
See the PyMongo 2.6 release notes in JIRA for the list of resolved issues in this release.
Changes in Version 2.5.2¶
Version 2.5.2 fixes a NULL pointer dereference issue when decoding
an invalid DBRef
.
Issues Resolved¶
See the PyMongo 2.5.2 release notes in JIRA for the list of resolved issues in this release.
Changes in Version 2.5.1¶
Version 2.5.1 is a minor release that fixes issues discovered after the release of 2.5. Most importantly, this release addresses some race conditions in replica set monitoring.
Issues Resolved¶
See the PyMongo 2.5.1 release notes in JIRA for the list of resolved issues in this release.
Changes in Version 2.5¶
Version 2.5 includes changes to support new features in MongoDB 2.4.
Important new features:
- Support for GSSAPI (Kerberos) authentication.
- Support for SSL certificate validation with hostname matching.
- Support for delegated and role based authentication.
- New GEOSPHERE (2dsphere) and HASHED index constants.
Note
authenticate()
now raises a
subclass of PyMongoError
if authentication
fails due to invalid credentials or configuration issues.
Issues Resolved¶
See the PyMongo 2.5 release notes in JIRA for the list of resolved issues in this release.
Changes in Version 2.4.2¶
Version 2.4.2 is a minor release that fixes issues discovered after the release of 2.4.1. Most importantly, PyMongo will no longer select a replica set member for read operations that is not in primary or secondary state.
Issues Resolved¶
See the PyMongo 2.4.2 release notes in JIRA for the list of resolved issues in this release.
Changes in Version 2.4.1¶
Version 2.4.1 is a minor release that fixes issues discovered after the
release of 2.4. Most importantly, this release fixes a regression using
aggregate()
, and possibly other commands,
with mongos.
Issues Resolved¶
See the PyMongo 2.4.1 release notes in JIRA for the list of resolved issues in this release.
Changes in Version 2.4¶
Version 2.4 includes a few important new features and a large number of bug fixes.
Important new features:
- New MongoClient and MongoReplicaSetClient classes - these connection classes do acknowledged write operations (previously referred to as 'safe' writes) by default. Connection and ReplicaSetConnection are deprecated but still support the old default fire-and-forget behavior.
- A new write concern API implemented as a write_concern attribute on the connection, Database, or Collection classes.
- MongoClient (and Connection) now support Unix Domain Sockets.
- Cursor can be copied with functions from the copy module.
- The set_profiling_level() method now supports a slow_ms option.
- The replica set monitor task (used by MongoReplicaSetClient and ReplicaSetConnection) is a daemon thread once again, meaning you won't have to call close() before exiting the python interactive shell.
Warning
The constructors for MongoClient
,
MongoReplicaSetClient
,
Connection
, and
ReplicaSetConnection
now raise
ConnectionFailure
instead of its subclass
AutoReconnect
if the server is unavailable. Applications
that expect to catch AutoReconnect
should now catch
ConnectionFailure
while creating a new connection.
Issues Resolved¶
See the PyMongo 2.4 release notes in JIRA for the list of resolved issues in this release.
Changes in Version 2.3¶
Version 2.3 adds support for new features and behavior changes in MongoDB 2.2.
Important New Features:
- Support for expanded read preferences including directing reads to tagged servers - See Secondary Reads for more information.
- Support for mongos failover.
- A new aggregate() method to support MongoDB's new aggregation framework.
- Support for legacy Java and C# byte order when encoding and decoding UUIDs.
- Support for connecting directly to an arbiter.
Warning
Starting with MongoDB 2.2 the getLastError command requires authentication when the server’s authentication features are enabled. Changes to PyMongo were required to support this behavior change. Users of authentication must upgrade to PyMongo 2.3 (or newer) for “safe” write operations to function correctly.
Issues Resolved¶
See the PyMongo 2.3 release notes in JIRA for the list of resolved issues in this release.
Changes in Version 2.2.1¶
Version 2.2.1 is a minor release that fixes issues discovered after the release of 2.2. Most importantly, this release fixes an incompatibility with mod_wsgi 2.x that could cause connections to leak. Users of mod_wsgi 2.x are strongly encouraged to upgrade from PyMongo 2.2.
Issues Resolved¶
See the PyMongo 2.2.1 release notes in JIRA for the list of resolved issues in this release.
Changes in Version 2.2¶
Version 2.2 adds a few more frequently requested features and fixes a number of bugs.
Special thanks go to Alex Grönholm for his contributions to Python 3 support and maintaining the original pymongo3 port. Christoph Simon, Wouter Bolsterlee, Mike O’Brien, and Chris Tompkinson also contributed to this release.
Important New Features:
- Support for Python 3 - See the Python 3 FAQ for more information.
- Support for Gevent - See Gevent for more information.
- Improved connection pooling. See PYTHON-287.
Warning
A number of methods and method parameters that were deprecated in PyMongo 1.9 or older versions have been removed in this release. The full list of changes can be found in the following JIRA ticket:
https://jira.mongodb.org/browse/PYTHON-305
BSON module aliases from the pymongo package that were deprecated in PyMongo 1.9 have also been removed in this release. See the following JIRA ticket for details:
https://jira.mongodb.org/browse/PYTHON-304
As a result of this cleanup some minor code changes may be required to use this release.
Issues Resolved¶
See the PyMongo 2.2 release notes in JIRA for the list of resolved issues in this release.
Changes in Version 2.1.1¶
Version 2.1.1 is a minor release that fixes a few issues
discovered after the release of 2.1. You can now use
ReplicaSetConnection
to run inline map reduce commands on secondaries. See
inline_map_reduce()
for details.
Special thanks go to Samuel Clay and Ross Lawley for their contributions to this release.
Issues Resolved¶
See the PyMongo 2.1.1 release notes in JIRA for the list of resolved issues in this release.
Changes in Version 2.1¶
Version 2.1 adds a few frequently requested features and includes the usual round of bug fixes and improvements.
Special thanks go to Alexey Borzenkov, Dan Crosta, Kostya Rybnikov, Flavio Percoco Premoli, Jonas Haag, and Jesse Davis for their contributions to this release.
Important New Features:
- ReplicaSetConnection - ReplicaSetConnection can be used to distribute reads to secondaries in a replica set. It supports automatic failover handling and periodically checks the state of the replica set to handle issues like primary stepdown or secondaries being removed for backup operations. Read preferences are defined through ReadPreference.
- PyMongo supports the new BSON binary subtype 4 for UUIDs. The default subtype to use can be set through uuid_subtype. The current default remains OLD_UUID_SUBTYPE but will be changed to UUID_SUBTYPE in a future release.
- The getLastError option 'w' can be set to a string, allowing for options like "majority" available in newer versions of MongoDB.
- Added support for the MongoDB URI options socketTimeoutMS and connectTimeoutMS.
- Added support for the ContinueOnError insert flag.
- Added basic SSL support.
- Added basic support for Jython.
- Secondaries can be used for count(), distinct(), group(), and querying GridFS.
- Added document_class and tz_aware options to MasterSlaveConnection.
Issues Resolved¶
See the PyMongo 2.1 release notes in JIRA for the list of resolved issues in this release.
Changes in Version 2.0.1¶
Version 2.0.1 fixes a regression in GridIn
when
writing pre-chunked strings. Thanks go to Alexey Borzenkov for reporting the
issue and submitting a patch.
Issues Resolved¶
- PYTHON-271: Regression in GridFS leads to serious loss of data.
Changes in Version 2.0¶
Version 2.0 adds a large number of features and fixes a number of issues.
Special thanks go to James Murty, Abhay Vardhan, David Pisoni, Ryan Smith-Roberts, Andrew Pendleton, Mher Movsisyan, Reed O’Brien, Michael Schurter, Josip Delic and Jonas Haag for their contributions to this release.
Important New Features:
- PyMongo now performs automatic per-socket database authentication. You no
longer have to re-authenticate for each new thread or after a replica set
failover. Authentication credentials are cached by the driver until the
application calls
logout()
. - slave_okay can be set independently at the connection, database, collection or query level. Each level will inherit the slave_okay setting from the previous level and each level can override the previous level’s setting.
- safe and getLastError options (e.g. w, wtimeout, etc.) can be set independently at the connection, database, collection or query level. Each level will inherit settings from the previous level and each level can override the previous level’s setting.
- PyMongo now supports the await_data and partial cursor flags. If the await_data flag is set on a tailable cursor the server will block for some extra time waiting for more data to return. The partial flag tells a mongos to return partial data for a query if not all shards are available.
- map_reduce() will accept a dict or instance of SON as the out parameter.
- The URI parser has been moved into its own module and can be used directly by application code.
- The AutoReconnect exception now provides information about the error that actually occurred instead of a generic failure message.
- A number of new helper methods have been added with options for setting and unsetting cursor flags, re-indexing a collection, fsync and locking a server, and getting the server’s current operations.
API changes:
- If only one host:port pair is specified
Connection
will make a direct connection to only that host. Please note that slave_okay must be True in order to query from a secondary. - If more than one host:port pair is specified or the replicaset option is used PyMongo will treat the specified host:port pair(s) as a seed list and connect using replica set behavior.
Warning
The default subtype for Binary
has changed
from OLD_BINARY_SUBTYPE
(2) to
BINARY_SUBTYPE
(0).
Issues Resolved¶
See the PyMongo 2.0 release notes in JIRA for the list of resolved issues in this release.
Changes in Version 1.11¶
Version 1.11 adds a few new features and fixes a few more bugs.
New Features:
- Basic IPv6 support: pymongo prefers IPv4 but will try IPv6. You can also specify an IPv6 address literal in the host parameter or a MongoDB URI provided it is enclosed in '[' and ']'.
- max_pool_size option: previously pymongo had a hard coded pool size of 10 connections. With this change you can specify a different pool size as a parameter to Connection (max_pool_size=<integer>) or in the MongoDB URI (maxPoolSize=<integer>).
- Find by metadata in GridFS: You can now specify query fields as keyword parameters for get_version() and get_last_version().
- Per-query slave_okay option: slave_okay=True is now a valid keyword argument for find() and find_one().
API changes:
- validate_collection() now returns a dict instead of a string. This change was required to deal with an API change on the server. This method also now takes the optional scandata and full parameters. See the documentation for more details.
Warning
The pool_size, auto_start_request, and timeout parameters
for Connection
have been completely
removed in this release. They were deprecated in pymongo-1.4 and
have had no effect since then. Please make sure that your code
doesn’t currently pass these parameters when creating a
Connection instance.
Issues resolved¶
- PYTHON-241: Support setting slaveok at the cursor level.
- PYTHON-240: Queries can sometimes permanently fail after a replica set fail over.
- PYTHON-238: error after few million requests
- PYTHON-237: Basic IPv6 support.
- PYTHON-236: Restore option to specify pool size in Connection.
- PYTHON-212: pymongo does not recover after stale config
- PYTHON-138: Find method for GridFS
Changes in Version 1.10.1¶
Version 1.10.1 is primarily a bugfix release. It fixes a regression in version 1.10 that broke pickling of ObjectIds. A number of other bugs have been fixed as well.
There are two behavior changes to be aware of:
- If a read slave raises AutoReconnect, MasterSlaveConnection will now retry the query on each slave until it is successful or all slaves have raised AutoReconnect. Any other exception will immediately be raised. The order that the slaves are tried is random. Previously the read would be sent to one randomly chosen slave and AutoReconnect was immediately raised in case of a connection failure.
- A Python long is now always BSON encoded as an int64. Previously the encoding was based only on the value of the field and a long with a value less than 2147483648 or greater than -2147483649 would always be BSON encoded as an int32.
Issues resolved¶
- PYTHON-234: Fix setup.py to raise exception if any when building extensions
- PYTHON-233: Add information to build and test with extensions on windows
- PYTHON-232: Traceback when hashing a DBRef instance
- PYTHON-231: Traceback when pickling a DBRef instance
- PYTHON-230: Pickled ObjectIds are not compatible between pymongo 1.9 and 1.10
- PYTHON-228: Cannot pickle bson.ObjectId
- PYTHON-227: Traceback when calling find() on system.js
- PYTHON-216: MasterSlaveConnection is missing disconnect() method
- PYTHON-186: When storing integers, type is selected according to value instead of type
- PYTHON-173: as_class option is not propagated by Cursor.clone
- PYTHON-113: Redundancy in MasterSlaveConnection
Changes in Version 1.10¶
Version 1.10 includes changes to support new features in MongoDB 1.8.x. Highlights include a modified map/reduce API including an inline map/reduce helper method, a new find_and_modify helper, and the ability to query the server for the maximum BSON document size it supports.
- added find_and_modify().
- added inline_map_reduce().
- changed map_reduce().
Warning
MongoDB versions greater than 1.7.4 no longer generate temporary collections for map/reduce results. An output collection name must be provided and the output will replace any existing output collection with the same name. map_reduce() now requires the out parameter.
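For example, a sketch of the post-1.10 calling convention, assuming db is a Database instance (the collection and field names here are hypothetical):

from bson.code import Code

mapper = Code("function () { emit(this.tag, 1); }")
reducer = Code(
    "function (key, values) {"
    "  var total = 0;"
    "  values.forEach(function (v) { total += v; });"
    "  return total;"
    "}"
)

# "out" is now required; results replace any existing "tag_counts" collection.
result = db.things.map_reduce(mapper, reducer, out="tag_counts")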
Issues resolved¶
- PYTHON-225: ObjectId class definition should use __slots__.
- PYTHON-223: Documentation fix.
- PYTHON-220: Documentation fix.
- PYTHON-219: KeyError in find_and_modify()
- PYTHON-213: Query server for maximum BSON document size.
- PYTHON-208: Fix Connection __repr__.
- PYTHON-207: Changes to Map/Reduce API.
- PYTHON-205: Accept slaveOk in the URI to match the URI docs.
- PYTHON-203: When slave_okay=True and we only specify one host don't autodetect other set members.
- PYTHON-194: Show size when whining about a document being too large.
- PYTHON-184: Raise DuplicateKeyError for duplicate keys in capped collections.
- PYTHON-178: Don't segfault when trying to encode a recursive data structure.
- PYTHON-177: Don't segfault when decoding dicts with broken iterators.
- PYTHON-172: Fix a typo.
- PYTHON-170: Add find_and_modify().
- PYTHON-169: Support deepcopy of DBRef.
- PYTHON-167: Duplicate of PYTHON-166.
- PYTHON-166: Fixes a concurrency issue.
- PYTHON-158: Add code and err string to db assertion messages.
Changes in Version 1.9¶
Version 1.9 adds a new package to the PyMongo distribution, bson. bson contains all of the BSON encoding and decoding logic, and the BSON types that were formerly in the pymongo package. The following modules have been renamed:
- pymongo.bson -> bson
- pymongo._cbson -> bson._cbson and pymongo._cmessage
- pymongo.binary -> bson.binary
- pymongo.code -> bson.code
- pymongo.dbref -> bson.dbref
- pymongo.json_util -> bson.json_util
- pymongo.max_key -> bson.max_key
- pymongo.min_key -> bson.min_key
- pymongo.objectid -> bson.objectid
- pymongo.son -> bson.son
- pymongo.timestamp -> bson.timestamp
- pymongo.tz_util -> bson.tz_util
In addition, the following exception classes have been renamed:
- pymongo.errors.InvalidBSON -> bson.errors.InvalidBSON
- pymongo.errors.InvalidStringData -> bson.errors.InvalidStringData
- pymongo.errors.InvalidDocument -> bson.errors.InvalidDocument
- pymongo.errors.InvalidId -> bson.errors.InvalidId
The above exceptions now inherit from bson.errors.BSONError rather than pymongo.errors.PyMongoError.
Note
All of the renamed modules and exceptions above have aliases created with the old names, so these changes should not break existing code. The old names will eventually be deprecated and then removed, so users should begin migrating towards the new names now.
Warning
The change to the exception hierarchy mentioned above is possibly breaking. If your code is catching PyMongoError, then the exceptions raised by bson will not be caught, even though they would have been caught previously. Before upgrading, it is recommended that users check for any cases like this.
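One defensive pattern during the upgrade, sketched here, is to catch both base classes; the operation, document, and handler below are hypothetical:

from bson.errors import BSONError
from pymongo.errors import PyMongoError

try:
    collection.save(document)  # any operation that may raise from either package
except (PyMongoError, BSONError):
    handle_failure()  # hypothetical application-level handler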
- the C extension now shares buffer.c/h with the Ruby driver
- bson no longer raises InvalidName, all occurrences have been replaced with InvalidDocument.
- renamed bson._to_dicts() to decode_all().
- renamed from_dict() to encode() and to_dict() to decode().
- added batch_size().
- allow updating (some) file metadata after a GridIn instance has been closed.
- performance improvements for reading from GridFS.
- special cased slice with the same start and stop to return an empty cursor.
- allow writing unicode to GridFS if an encoding attribute has been specified for the file.
- added gridfs.GridFS.get_version().
- scope variables for Code can now be specified as keyword arguments.
- added readline() to GridOut.
- make a best effort to transparently auto-reconnect if a Connection has been idle for a while.
- added list() to SystemJS.
- added file_document argument to GridOut() to allow initializing from an existing file document.
- raise TimeoutError even if the getLastError command was run manually and not through "safe" mode.
- added uuid support to json_util.
Changes in Version 1.8.1¶
- fixed a typo in the C extension that could cause safe-mode operations to report a failure (SystemError) even when none occurred.
- added a __ne__() implementation to any class where we define __eq__().
Changes in Version 1.8¶
Version 1.8 adds support for connecting to replica sets, specifying per-operation values for w and wtimeout, and decoding to timezone-aware datetimes.
- fixed a reference leak in the C extension when decoding a DBRef.
- added support for w, wtimeout, and fsync (and any other options for getLastError) to "safe mode" operations.
- added nodes property.
- added a maximum pool size of 10 sockets.
- added support for replica sets.
- DEPRECATED from_uri() and paired(), both are supplanted by extended functionality in Connection().
- added tz aware support for datetimes in ObjectId, Timestamp and json_util methods.
- added drop() helper.
- reuse the socket used for finding the master when a Connection is first created.
- added support for MinKey, MaxKey and Timestamp to json_util.
- added support for decoding datetimes as aware (UTC) - it is highly recommended to enable this by setting the tz_aware parameter to Connection() to True.
- added network_timeout option for individual calls to find() and find_one().
- added exists() to check if a file exists in GridFS.
- added support for additional keys in DBRef instances.
- added code attribute to OperationFailure exceptions.
- fixed serialization of int and float subclasses in the C extension.
Changes in Version 1.7¶
Version 1.7 is a recommended upgrade for all PyMongo users. The full release notes are below, and some more in depth discussion of the highlights is here.
- no longer attempt to build the C extension on big-endian systems.
- added MinKey and MaxKey.
- use unsigned for Timestamp in BSON encoder/decoder.
- support True as "ok" in command responses, in addition to 1.0 - necessary for server versions >= 1.5.X
- BREAKING change to index_information() to add support for querying unique status and other index information.
- added document_class, to specify class for returned documents.
- added as_class argument for find(), and in the BSON decoder.
- added support for creating Timestamp instances using a datetime.
- allow dropTarget argument for rename.
- handle aware datetime instances, by converting to UTC.
- added support for max_scan.
- raise FileExists exception when creating a duplicate GridFS file.
- use y2038 for time handling in the C extension - eliminates 2038 problems when extension is installed.
- added sort parameter to find()
- finalized deprecation of changes from versions <= 1.4
- take any non-dict as an "_id" query for find_one() or remove()
- added ability to pass a dict for fields argument to find() (supports "$slice" and field negation)
- simplified code to find master, since paired setups don't always have a remote
- fixed bug in C encoder for certain invalid types (like Collection instances).
- don't transparently map "filename" key to name attribute for GridFS.
Changes in Version 1.6¶
The biggest change in version 1.6 is a complete re-implementation of
gridfs
with a lot of improvements over the old
implementation. There are many details and examples of using the new
API in this blog post. The
old API has been removed in this version, so existing code will need
to be modified before upgrading to 1.6.
- fixed issue where connection pool was being shared across Connection instances.
- more improvements to Python code caching in C extension - should improve behavior on mod_wsgi.
- added from_datetime().
- complete rewrite of gridfs support.
- improvements to the command() API.
- fixed drop_indexes() behavior on non-existent collections.
- disallow empty bulk inserts.
Changes in Version 1.5.2¶
- fixed response handling to ignore unknown response flags in queries.
- handle server versions containing ‘-pre-‘.
Changes in Version 1.5.1¶
- added _id property for GridFile instances.
- fix for making a Connection (with slave_okay set) directly to a slave in a replica pair.
- accept kwargs for create_index() and ensure_index() to support all indexing options.
- add pymongo.GEO2D and support for geo indexing.
- improvements to Python code caching in C extension - should improve behavior on mod_wsgi.
Changes in Version 1.5¶
- added subtype constants to binary module.
- DEPRECATED options argument to Collection() and create_collection() in favor of kwargs.
- added has_c() to check for C extension.
- added copy_database().
- added alive to tell when a cursor might have more data to return (useful for tailable cursors).
- added Timestamp to better support dealing with internal MongoDB timestamps.
- added name argument for create_index() and ensure_index().
- fixed connection pooling w/ fork
- paired() takes all kwargs that are allowed for Connection().
- insert() returns list for bulk inserts of size one.
- fixed handling of datetime.datetime instances in json_util.
- added from_uri() to support MongoDB connection uri scheme.
- fixed chunk number calculation when unaligned in gridfs.
- command() takes a string for simple commands.
- added system_js helper for dealing with server-side JS.
- don't wrap queries containing "$query" (support manual use of "$min", etc.).
- added GridFSError as base class for gridfs exceptions.
Changes in Version 1.4¶
Perhaps the most important change in version 1.4 is that we have decided to no longer support Python 2.3. The most immediate reason for this is to allow some improvements to connection pooling. This will also allow us to use some new (as in Python 2.4 ;) idioms and will help begin the path towards supporting Python 3.0. If you need to use Python 2.3 you should consider using version 1.3 of this driver, although that will no longer be actively supported.
Other changes:
- move "_id" to front only for top-level documents (fixes some corner cases).
- update() and remove() return the entire response to the lastError command when safe is True.
- completed removal of things that were deprecated in version 1.2 or earlier.
- enforce that collection names do not contain the NULL byte.
- fix to allow using UTF-8 collection names with the C extension.
- added PyMongoError as base exception class for all errors. this changes the exception hierarchy somewhat, and is a BREAKING change if you depend on ConnectionFailure being a IOError or InvalidBSON being a ValueError, for example.
- added DuplicateKeyError for calls to insert() or update() with safe set to True.
- removed thread_util.
- added add_user() and remove_user() helpers.
- fix for authenticate() when using non-UTF-8 names or passwords.
- minor fixes for MasterSlaveConnection.
- clean up all cases where ConnectionFailure is raised.
- simplification of connection pooling - makes driver ~2x faster for simple benchmarks. see How does connection pooling work in PyMongo? for more information.
- DEPRECATED pool_size, auto_start_request and timeout parameters to Connection. DEPRECATED start_request().
- use socket.sendall().
- removed from_xml() as it was only being used for some internal testing - also eliminates dependency on elementtree.
- implementation of update() in C.
- deprecate _command() in favor of command().
- send all commands without wrapping as {"query": ...}.
- support string as key argument to group() (keyf) and run all groups as commands.
- support for equality testing for Code instances.
- allow the NULL byte in strings and disallow it in key names or regex patterns
Changes in Version 1.3¶
- DEPRECATED running group() as eval(), also changed default for group() to running as a command
- remove pymongo.cursor.Cursor.__len__(), which was deprecated in 1.1.1 - needed to do this aggressively due to its presence breaking Django template for loops
- DEPRECATED host(), port(), connection(), name(), database(), name() and full_name() in favor of host, port, connection, name, database, name and full_name, respectively. The deprecation schedule for this change will probably be faster than usual, as it carries some performance implications.
- added disconnect()
Changes in Version 1.2.1¶
- added Changelog to docs
- added setup.py doc --test to run doctests for tutorial, examples
- moved most examples to Sphinx docs (and remove from examples/ directory)
- raise InvalidId instead of TypeError when passing a 24 character string to ObjectId that contains non-hexadecimal characters
- allow unicode instances for ObjectId init
Changes in Version 1.2¶
- spec parameter for remove() is now optional to allow for deleting all documents in a Collection
- always wrap queries with {query: ...} even when no special options - get around some issues with queries on fields named query
- enforce 4MB document limit on the client side
- added map_reduce() helper - see example
- added distinct() method on Cursor instances to allow distinct with queries
- fix for __getitem__() after skip()
- allow any UTF-8 string in BSON encoder, not just ASCII subset
- added generation_time
- removed support for legacy ObjectId format - pretty sure this was never used, and is just confusing
- DEPRECATED url_encode() and url_decode() in favor of str() and ObjectId(), respectively
- allow oplog.$main as a valid collection name
- some minor fixes for installation process
- added support for datetime and regex in json_util
Changes in Version 1.1.2¶
- improvements to insert() speed (using C for insert message creation)
- use random number for request_id
- fix some race conditions with AutoReconnect
Changes in Version 1.1.1¶
- added multi parameter for update()
- fix unicode regex patterns with C extension
- added distinct()
- added database support for DBRef
- added json_util with helpers for encoding / decoding special types to JSON
- DEPRECATED pymongo.cursor.Cursor.__len__() in favor of count() with with_limit_and_skip set to True due to performance regression
- switch documentation to Sphinx
Changes in Version 1.1¶
- added __hash__() for DBRef and ObjectId
- bulk insert() works with any iterable
- fix ObjectId generation when using multiprocessing
- added collection
- added network_timeout parameter for Connection()
- DEPRECATED slave_okay parameter for individual queries
- fix for safe mode when multi-threaded
- added safe parameter for remove()
- added tailable parameter for find()
Changes in Version 1.0¶
- fixes for MasterSlaveConnection
- added finalize parameter for group()
- improvements to insert() speed
- improvements to gridfs speed
- added __getitem__() and __len__() for Cursor instances
Changes in Version 0.16¶
Changes in Version 0.15.2¶
- documentation changes only
Changes in Version 0.15.1¶
- various performance improvements
- API CHANGE no longer need to specify direction for create_index() and ensure_index() when indexing a single key
- support for encoding tuple instances as list instances
Changes in Version 0.15¶
Changes in Version 0.14.2¶
- minor bugfixes
Changes in Version 0.14.1¶
- seek() and tell() for (read mode) GridFile instances
Changes in Version 0.14¶
Changes in Version 0.13¶
Changes in Version 0.12¶
- improved ObjectId generation
- added AutoReconnect exception for when reconnection is possible
- make gridfs thread-safe
- fix for gridfs with non-ObjectId _id
Changes in Version 0.11.3¶
- don’t allow NULL bytes in string encoder
- fixes for Python 2.3
Changes in Version 0.11.1¶
- fix for connection pooling under Python 2.5
Changes in Version 0.11¶
- better build failure detection
- driver support for selecting fields in sub-documents
- disallow insertion of invalid key names
- added timeout parameter for Connection()
Changes in Version 0.10.3¶
- fix bug with large limit()
- better exception when modules get reloaded out from underneath the C extension
- better exception messages when calling a Collection or Database instance
Changes in Version 0.10.1¶
- alias Connection as pymongo.Connection
- raise an exception rather than silently overflowing in encoder
Changes in Version 0.10¶
- added ensure_index()
Changes in Version 0.9.7¶
- allow sub-collections of $cmd as valid Collection names
- add version as pymongo.version
- add --no_ext command line option to setup.py
Python 3 FAQ¶
What Python 3 versions are supported?¶
PyMongo supports CPython 3.4+ and PyPy3.5+.
Are there any PyMongo behavior changes with Python 3?¶
Only one intentional change. Instances of bytes are encoded as BSON type 5 (Binary data) with subtype 0. In Python 3 they are decoded back to bytes. In Python 2 they are decoded to Binary with subtype 0.
For example, let's insert a bytes instance using Python 3 then read it back. Notice the byte string is decoded back to bytes:
Python 3.6.1 (v3.6.1:69c0db5050, Mar 21 2017, 01:21:04)
[GCC 4.9.3] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pymongo
>>> c = pymongo.MongoClient()
>>> c.test.bintest.insert_one({'binary': b'this is a byte string'}).inserted_id
ObjectId('4f9086b1fba5222021000000')
>>> c.test.bintest.find_one()
{'binary': b'this is a byte string', '_id': ObjectId('4f9086b1fba5222021000000')}
Now retrieve the same document in Python 2. Notice the byte string is decoded to Binary:
Python 2.7.6 (default, Feb 26 2014, 10:36:22)
[GCC 4.7.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pymongo
>>> c = pymongo.MongoClient()
>>> c.test.bintest.find_one()
{u'binary': Binary('this is a byte string', 0), u'_id': ObjectId('4f9086b1fba5222021000000')}
There is a similar change in behavior in parsing JSON binary with subtype 0. In Python 3 they are decoded into bytes. In Python 2 they are decoded to Binary with subtype 0.
For example, let's decode a JSON binary subtype 0 using Python 3. Notice the byte string is decoded to bytes:
Python 3.6.1 (v3.6.1:69c0db5050, Mar 21 2017, 01:21:04)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from bson.json_util import loads
>>> loads('{"b": {"$binary": "dGhpcyBpcyBhIGJ5dGUgc3RyaW5n", "$type": "00"}}')
{'b': b'this is a byte string'}
Now decode the same JSON in Python 2. Notice the byte string is decoded to Binary:
Python 2.7.10 (default, Feb 7 2017, 00:08:15)
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.34)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from bson.json_util import loads
>>> loads('{"b": {"$binary": "dGhpcyBpcyBhIGJ5dGUgc3RyaW5n", "$type": "00"}}')
{u'b': Binary('this is a byte string', 0)}
PyMongo 3 Migration Guide¶
PyMongo 3 is a partial rewrite bringing a large number of improvements. It also brings a number of backward breaking changes. This guide provides a roadmap for migrating an existing application from PyMongo 2.x to 3.x or writing libraries that will work with both PyMongo 2.x and 3.x.
PyMongo 2.9¶
The first step in any successful migration involves upgrading to, or requiring, at least PyMongo 2.9. If your project has a requirements.txt file, add the line "pymongo >= 2.9, < 3.0" until you have completely migrated to PyMongo 3. Most of the key new methods and options from PyMongo 3.0 are backported in PyMongo 2.9, making migration much easier.
Enable Deprecation Warnings¶
Starting with PyMongo 2.9, DeprecationWarning
is raised by most methods
removed in PyMongo 3.0. Make sure you enable runtime warnings to see
where deprecated functions and methods are being used in your application:
python -Wd <your application>
Warnings can also be changed to errors:
python -Wd -Werror <your application>
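If you cannot control the interpreter's command line, the same filters can be installed in code; a sketch using the standard warnings module:

import warnings

# Print each DeprecationWarning once per location...
warnings.simplefilter("default", DeprecationWarning)
# ...or turn them into exceptions while running your test suite:
# warnings.simplefilter("error", DeprecationWarning)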
Note
Not all deprecated features raise DeprecationWarning
when
used. For example, the find()
options
renamed in PyMongo 3.0 do not raise DeprecationWarning
when used in
PyMongo 2.x. See also Removed features with no migration path.
CRUD API¶
Changes to find() and find_one()¶
“spec” renamed “filter”¶
The spec option has been renamed to filter. Code like this:
>>> cursor = collection.find(spec={"a": 1})
can be changed to this with PyMongo 2.9 or later:
>>> cursor = collection.find(filter={"a": 1})
or this with any version of PyMongo:
>>> cursor = collection.find({"a": 1})
“fields” renamed “projection”¶
The fields option has been renamed to projection. Code like this:
>>> cursor = collection.find({"a": 1}, fields={"_id": False})
can be changed to this with PyMongo 2.9 or later:
>>> cursor = collection.find({"a": 1}, projection={"_id": False})
or this with any version of PyMongo:
>>> cursor = collection.find({"a": 1}, {"_id": False})
“partial” renamed “allow_partial_results”¶
The partial option has been renamed to allow_partial_results. Code like this:
>>> cursor = collection.find({"a": 1}, partial=True)
can be changed to this with PyMongo 2.9 or later:
>>> cursor = collection.find({"a": 1}, allow_partial_results=True)
“timeout” replaced by “no_cursor_timeout”¶
The timeout option has been replaced by no_cursor_timeout. Code like this:
>>> cursor = collection.find({"a": 1}, timeout=False)
can be changed to this with PyMongo 2.9 or later:
>>> cursor = collection.find({"a": 1}, no_cursor_timeout=True)
“network_timeout” is removed¶
The network_timeout option has been removed. This option was always the wrong solution for timing out long running queries and should never be used in production. Starting with MongoDB 2.6 you can use the $maxTimeMS query modifier. Code like this:
# Set a 5 second select() timeout.
>>> cursor = collection.find({"a": 1}, network_timeout=5)
can be changed to this with PyMongo 2.9 or later:
# Set a 5 second (5000 millisecond) server side query timeout.
>>> cursor = collection.find({"a": 1}, modifiers={"$maxTimeMS": 5000})
or with PyMongo 3.5 or later:
>>> cursor = collection.find({"a": 1}, max_time_ms=5000)
or with any version of PyMongo:
>>> cursor = collection.find({"$query": {"a": 1}, "$maxTimeMS": 5000})
Tailable cursors¶
The tailable and await_data options have been replaced by cursor_type. Code like this:
>>> cursor = collection.find({"a": 1}, tailable=True)
>>> cursor = collection.find({"a": 1}, tailable=True, await_data=True)
can be changed to this with PyMongo 2.9 or later:
>>> from pymongo import CursorType
>>> cursor = collection.find({"a": 1}, cursor_type=CursorType.TAILABLE)
>>> cursor = collection.find({"a": 1}, cursor_type=CursorType.TAILABLE_AWAIT)
Other removed options¶
The slave_okay, read_preference, tag_sets, and secondary_acceptable_latency_ms options have been removed. See the Read Preferences section for solutions.
The aggregate method always returns a cursor¶
PyMongo 2.6 added an option to return an iterable cursor from
aggregate()
. In PyMongo 3
aggregate()
always returns a cursor. Use
the cursor option for consistent behavior with PyMongo 2.9 and later:
>>> for result in collection.aggregate([], cursor={}):
... pass
Read Preferences¶
The “slave_okay” option is removed¶
The slave_okay option is removed from PyMongo’s API. The secondaryPreferred read preference provides the same behavior. Code like this:
>>> client = MongoClient(slave_okay=True)
can be changed to this with PyMongo 2.9 or newer:
>>> client = MongoClient(readPreference="secondaryPreferred")
The “read_preference” attribute is immutable¶
Code like this:
>>> from pymongo import ReadPreference
>>> db = client.my_database
>>> db.read_preference = ReadPreference.SECONDARY
can be changed to this with PyMongo 2.9 or later:
>>> db = client.get_database("my_database",
... read_preference=ReadPreference.SECONDARY)
Code like this:
>>> cursor = collection.find({"a": 1},
... read_preference=ReadPreference.SECONDARY)
can be changed to this with PyMongo 2.9 or later:
>>> coll2 = collection.with_options(read_preference=ReadPreference.SECONDARY)
>>> cursor = coll2.find({"a": 1})
The “tag_sets” option and attribute are removed¶
The tag_sets MongoClient option is removed. The read_preference option can be used instead. Code like this:
>>> client = MongoClient(
... read_preference=ReadPreference.SECONDARY,
... tag_sets=[{"dc": "ny"}, {"dc": "sf"}])
can be changed to this with PyMongo 2.9 or later:
>>> from pymongo.read_preferences import Secondary
>>> client = MongoClient(read_preference=Secondary([{"dc": "ny"}]))
To change the tags sets for a Database or Collection, code like this:
>>> db = client.my_database
>>> db.read_preference = ReadPreference.SECONDARY
>>> db.tag_sets = [{"dc": "ny"}]
can be changed to this with PyMongo 2.9 or later:
>>> db = client.get_database("my_database",
... read_preference=Secondary([{"dc": "ny"}]))
Code like this:
>>> cursor = collection.find(
... {"a": 1},
... read_preference=ReadPreference.SECONDARY,
... tag_sets=[{"dc": "ny"}])
can be changed to this with PyMongo 2.9 or later:
>>> from pymongo.read_preferences import Secondary
>>> coll2 = collection.with_options(
... read_preference=Secondary([{"dc": "ny"}]))
>>> cursor = coll2.find({"a": 1})
The “secondary_acceptable_latency_ms” option and attribute are removed¶
PyMongo 2.x supports secondary_acceptable_latency_ms as an option to methods throughout the driver, but mongos only supports a global latency option. PyMongo 3.x has changed to match the behavior of mongos, allowing migration from a single server, to a replica set, to a sharded cluster without a surprising change in server selection behavior. A new option, localThresholdMS, is available through MongoClient and should be used in place of secondaryAcceptableLatencyMS. Code like this:
>>> client = MongoClient(readPreference="nearest",
... secondaryAcceptableLatencyMS=100)
can be changed to this with PyMongo 2.9 or later:
>>> client = MongoClient(readPreference="nearest",
... localThresholdMS=100)
Write Concern¶
The “safe” option is removed¶
In PyMongo 3 the safe option is removed from the entire API.
MongoClient
has always defaulted to acknowledged
write operations and continues to do so in PyMongo 3.
The “write_concern” attribute is immutable¶
The write_concern attribute is immutable in PyMongo 3. Code like this:
>>> client = MongoClient()
>>> client.write_concern = {"w": "majority"}
can be changed to this with any version of PyMongo:
>>> client = MongoClient(w="majority")
Code like this:
>>> db = client.my_database
>>> db.write_concern = {"w": "majority"}
can be changed to this with PyMongo 2.9 or later:
>>> from pymongo import WriteConcern
>>> db = client.get_database("my_database",
... write_concern=WriteConcern(w="majority"))
The new CRUD API write methods do not accept write concern options. Code like this:
>>> oid = collection.insert({"a": 2}, w="majority")
can be changed to this with PyMongo 2.9 or later:
>>> from pymongo import WriteConcern
>>> coll2 = collection.with_options(
... write_concern=WriteConcern(w="majority"))
>>> oid = coll2.insert({"a": 2})
Codec Options¶
The “document_class” attribute is removed¶
Code like this:
>>> from bson.son import SON
>>> client = MongoClient()
>>> client.document_class = SON
can be replaced by this in any version of PyMongo:
>>> from bson.son import SON
>>> client = MongoClient(document_class=SON)
or to change the document_class for a Database
with PyMongo 2.9 or later:
>>> from bson.codec_options import CodecOptions
>>> from bson.son import SON
>>> db = client.get_database("my_database", CodecOptions(SON))
The “uuid_subtype” option and attribute are removed¶
Code like this:
>>> from bson.binary import JAVA_LEGACY
>>> db = client.my_database
>>> db.uuid_subtype = JAVA_LEGACY
can be replaced by this with PyMongo 2.9 or later:
>>> from bson.binary import JAVA_LEGACY
>>> from bson.codec_options import CodecOptions
>>> db = client.get_database("my_database",
... CodecOptions(uuid_representation=JAVA_LEGACY))
MongoClient¶
MongoClient connects asynchronously¶
In PyMongo 3, the MongoClient
constructor no
longer blocks while connecting to the server or servers, and it no longer
raises ConnectionFailure
if they are unavailable, nor
ConfigurationError
if the user’s credentials are wrong.
Instead, the constructor returns immediately and launches the connection
process on background threads. The connect option is added to control whether
these threads are started immediately, or when the client is first used.
For consistent behavior in PyMongo 2.x and PyMongo 3.x, code like this:
>>> from pymongo.errors import ConnectionFailure
>>> try:
... client = MongoClient()
... except ConnectionFailure:
... print("Server not available")
>>>
can be changed to this with PyMongo 2.9 or later:
>>> from pymongo.errors import ConnectionFailure
>>> client = MongoClient(connect=False)
>>> try:
... result = client.admin.command("ismaster")
... except ConnectionFailure:
... print("Server not available")
>>>
Any operation can be used to determine if the server is available; we choose the "ismaster" command here because it is cheap and does not require auth.
The max_pool_size parameter is removed¶
PyMongo 3 replaced the max_pool_size parameter with support for the MongoDB URI maxPoolSize option. Code like this:
>>> client = MongoClient(max_pool_size=10)
can be replaced by this with PyMongo 2.9 or later:
>>> client = MongoClient(maxPoolSize=10)
>>> client = MongoClient("mongodb://localhost:27017/?maxPoolSize=10")
The “disconnect” method is removed¶
Code like this:
>>> client.disconnect()
can be replaced by this with PyMongo 2.9 or later:
>>> client.close()
The host and port attributes are removed¶
Code like this:
>>> host = client.host
>>> port = client.port
can be replaced by this with PyMongo 2.9 or later:
>>> address = client.address
>>> host, port = address or (None, None)
BSON¶
“as_class”, “tz_aware”, and “uuid_subtype” are removed¶
The as_class, tz_aware, and uuid_subtype parameters have been
removed from the functions provided in bson
. Furthermore, the
encode()
and decode()
functions have been added
as more performant alternatives to the bson.BSON.encode()
and
bson.BSON.decode()
methods. Code like this:
>>> from bson import BSON
>>> from bson.son import SON
>>> encoded = BSON.encode({"a": 1}, as_class=SON)
can be replaced by this in PyMongo 2.9 or later:
>>> from bson import encode
>>> from bson.codec_options import CodecOptions
>>> from bson.son import SON
>>> encoded = encode({"a": 1}, codec_options=CodecOptions(SON))
Removed features with no migration path¶
MasterSlaveConnection is removed¶
Master slave deployments are deprecated in MongoDB. Starting with MongoDB 3.0 a replica set can have up to 50 members and that limit is likely to be removed in later releases. We recommend migrating to replica sets instead.
Requests are removed¶
The client methods start_request, in_request, and end_request are removed. Requests were designed to make read-your-writes consistency more likely with the w=0 write concern. Additionally, a thread in a request used the same member for all secondary reads in a replica set. To ensure read-your-writes consistency in PyMongo 3.0, do not override the default write concern with w=0, and do not override the default read preference of PRIMARY.
The “compile_re” option is removed¶
In PyMongo 3 regular expressions are never compiled to Python regular expression objects; BSON regular expressions are decoded as Regex instances instead.
The “use_greenlets” option is removed¶
The use_greenlets option was meant to allow use of PyMongo with Gevent without the use of gevent.monkey.patch_threads(). This option caused a lot of confusion and made it difficult to support alternative asynchronous frameworks like Eventlet. Users of Gevent should use gevent.monkey.patch_all() instead, as sketched below.
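A sketch of the recommended Gevent setup; patching must happen before PyMongo (or anything else that uses threads or sockets) is imported:

from gevent import monkey
monkey.patch_all()  # patch threads, sockets, etc. before other imports

import pymongo

client = pymongo.MongoClient()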
Developer Guide¶
Technical guide for contributors to PyMongo.
Periodic Executors¶
PyMongo implements a PeriodicExecutor
for two
purposes: as the background thread for Monitor
, and to
regularly check if there are OP_KILL_CURSORS messages that must be sent to the server.
Killing Cursors¶
An incompletely iterated Cursor
on the client represents an
open cursor object on the server. In code like this, we lose a reference to
the cursor before finishing iteration:
for doc in collection.find():
raise Exception()
We try to send an OP_KILL_CURSORS to the server to tell it to clean up the
server-side cursor. But we must not take any locks directly from the cursor’s
destructor (see PYTHON-799), so we cannot safely use the PyMongo data
structures required to send a message. The solution is to add the cursor’s id
to an array on the MongoClient
without taking any locks.
Each client has a PeriodicExecutor
devoted to
checking the array for cursor ids. Any it sees are the result of cursors that
were freed while the server-side cursor was still open. The executor can safely
take the locks it needs in order to send the OP_KILL_CURSORS message.
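A simplified sketch of this hand-off; the class and method names are illustrative, not PyMongo's actual internals:

import time

class ClientSketch:
    def __init__(self):
        self._dead_cursor_ids = []  # shared with destructors, never locked

    def close_cursor_soon(self, cursor_id):
        # Called from Cursor.__del__: list.append is atomic under the GIL,
        # so the destructor takes no locks.
        self._dead_cursor_ids.append(cursor_id)

    def _send_kill_cursors(self, cursor_ids):
        # Stand-in for the real OP_KILL_CURSORS message send.
        print("killing cursors:", cursor_ids)

    def _kill_cursors_chore(self):
        # Runs forever on the periodic executor thread, which may lock safely.
        while True:
            ids, self._dead_cursor_ids = self._dead_cursor_ids, []
            if ids:
                self._send_kill_cursors(ids)
            time.sleep(0.5)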
Stopping Executors¶
Just as Cursor
must not take any locks from its destructor,
neither can MongoClient
and Topology
.
Thus, although the client calls close()
on its kill-cursors thread, and
the topology calls close()
on all its monitor threads, the close()
method cannot actually call wake()
on the executor, since wake()
takes a lock.
Instead, executors wake periodically to check if self.close
is set,
and if so they exit.
A thread can log spurious errors if it wakes late in the Python interpreter’s
shutdown sequence, so we try to join threads before then. Each periodic
executor (either a monitor or a kill-cursors thread) adds a weakref to itself
to a set called _EXECUTORS
, in the periodic_executor
module.
An exit handler runs on shutdown and tells all executors to stop, then tries (with a short timeout) to join all executor threads.
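A sketch of that shutdown pattern; the registration helper and executor methods here are stand-ins for the real ones in periodic_executor:

import atexit
import weakref

_EXECUTORS = set()  # weakrefs to all live periodic executors

def register_executor(executor):
    # The callback removes the dead weakref from the set automatically.
    _EXECUTORS.add(weakref.ref(executor, _EXECUTORS.discard))

def _shutdown_executors():
    executors = [ref() for ref in list(_EXECUTORS)]
    for executor in executors:
        if executor:
            executor.close()  # tell each executor to stop...
    for executor in executors:
        if executor:
            executor.join(1)  # ...then join with a short timeout

atexit.register(_shutdown_executors)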
Monitoring¶
For each server in the topology, Topology
uses a periodic
executor to launch a monitor thread. This thread must not prevent the topology
from being freed, so it weakrefs the topology. Furthermore, it uses a weakref
callback to terminate itself soon after the topology is freed.
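A sketch of that weakref arrangement (class and method names are illustrative):

import weakref

class MonitorSketch:
    def __init__(self, topology):
        self._stopped = False
        # Hold the topology weakly so this monitor cannot keep it alive;
        # when the topology is garbage collected, stop the monitor.
        self._topology_ref = weakref.ref(topology, self._on_topology_freed)

    def _on_topology_freed(self, ref):
        self._stopped = True  # noticed on the run loop's next wakeup

    def _check_once(self):
        topology = self._topology_ref()
        if topology is None or self._stopped:
            return False  # nothing left to monitor; let the executor stop
        # ... check the server and report the result to the topology ...
        return True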
[Diagram omitted: solid lines represent strong references, dashed lines weak ones.]
See Stopping Executors above for an explanation of the _EXECUTORS
set.
It is a requirement of the Server Discovery And Monitoring Spec that a sleeping monitor can be awakened early. Aside from infrequent wakeups to do their appointed chores and these occasional early interruptions, periodic executors also wake regularly to check if they should terminate.
Our first implementation of this idea was the obvious one: use the Python standard library’s threading.Condition.wait with a timeout. Another thread wakes the executor early by signaling the condition variable.
A topology cannot signal the condition variable to tell the executor to terminate, because it would risk a deadlock in the garbage collector: no destructor or weakref callback can take a lock to signal the condition variable (see PYTHON-863); thus the only way for a dying object to terminate a periodic executor is to set its “stopped” flag and let the executor see the flag next time it wakes.
We erred on the side of prompt cleanup, and set the check interval at 100ms. We assumed that checking a flag and going back to sleep 10 times a second was cheap on modern machines.
Starting in Python 3.2, the builtin C implementation of lock.acquire takes a timeout parameter, so Python 3.2+ Condition variables sleep simply by calling lock.acquire; they are implemented as efficiently as expected.
But in Python 2, lock.acquire has no timeout. To wait with a timeout, a Python 2 condition variable sleeps a millisecond, tries to acquire the lock, sleeps twice as long, and tries again. This exponential backoff reaches a maximum sleep time of 50ms.
If PyMongo calls the condition variable’s “wait” method with a short timeout, the exponential backoff is restarted frequently. Overall, the condition variable is not waking a few times a second, but hundreds of times. (See PYTHON-983.)
Thus the current design of periodic executors is surprisingly simple: they do a simple time.sleep for a half-second, check if it is time to wake or terminate, and sleep again.
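A sketch of that loop; this is a stand-in for the real PeriodicExecutor, using Python 3's time.monotonic for the clock:

import time

class PeriodicExecutorSketch:
    def __init__(self, interval, target):
        self._interval = interval  # seconds between chores
        self._target = target      # callable; returns False to stop
        self._stopped = False
        self._woken = False

    def wake(self):
        self._woken = True    # a plain flag write, safe anywhere

    def close(self):
        self._stopped = True  # also just a flag write, safe in destructors

    def _run(self):
        deadline = 0.0
        while not self._stopped:
            now = time.monotonic()
            if self._woken or now >= deadline:
                self._woken = False
                deadline = now + self._interval
                if not self._target():
                    break
            time.sleep(0.5)  # re-check the flags in half a second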