PyMongo 3.6.0 Documentation

Overview

PyMongo is a Python distribution containing tools for working with MongoDB, and is the recommended way to work with MongoDB from Python. This documentation attempts to explain everything you need to know to use PyMongo.

Installing / Upgrading
Instructions on how to get the distribution.
Tutorial
Start here for a quick overview.
Examples
Examples of how to perform specific tasks.
Using PyMongo with MongoDB Atlas
How to connect to MongoDB Atlas from PyMongo.
TLS/SSL and PyMongo
Using PyMongo with TLS / SSL.
Frequently Asked Questions
Some questions that come up often.
PyMongo 3 Migration Guide
A PyMongo 2.x to 3.x migration guide.
Python 3 FAQ
Frequently asked questions about Python 3 support.
Compatibility Policy
Explanation of deprecations, and how to keep pace with changes in PyMongo’s API.
API Documentation
The complete API documentation, organized by module.
Tools
A listing of Python tools and libraries that have been written for MongoDB.
Developer Guide
Developer guide for contributors to PyMongo.

Getting Help

If you’re having trouble or have questions about PyMongo, the best place to ask is the MongoDB user group. Once you get an answer, it’d be great if you could work it back into this documentation and contribute!

Issues

All issues should be reported (and can be tracked / voted for / commented on) at the main MongoDB JIRA bug tracker, in the “Python Driver” project.

Contributing

PyMongo has a large community and contributions are always encouraged. Contributions can be as simple as minor tweaks to this documentation. To contribute, fork the project on GitHub and send a pull request.

Changes

See the Changelog for a full list of changes to PyMongo. For older versions of the documentation please see the archive list.

About This Documentation

This documentation is generated using the Sphinx documentation generator. The source files for the documentation are located in the doc/ directory of the PyMongo distribution. To generate the docs locally run the following command from the root directory of the PyMongo source:

$ python setup.py doc

Using PyMongo with MongoDB Atlas

Atlas is MongoDB, Inc.’s hosted MongoDB as a service offering. To connect to Atlas, pass the connection string provided by Atlas to MongoClient:

client = pymongo.MongoClient(<Atlas connection string>)
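
For example, a connection call might look like the following sketch (the hosts, user, and password are hypothetical placeholders, not real Atlas values):

client = pymongo.MongoClient(
    "mongodb://user:password@cluster0-shard-00-00.example.mongodb.net:27017,"
    "cluster0-shard-00-01.example.mongodb.net:27017/test"
    "?ssl=true&replicaSet=Cluster0-shard-0&authSource=admin")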

Connections to Atlas require TLS/SSL. For TLS/SSL connections, PyMongo may require third-party dependencies, depending on your version of Python. Starting with PyMongo 3.3, you can install PyMongo and any TLS/SSL-related dependencies using the following pip command:

$ python -m pip install pymongo[tls]

Earlier versions of PyMongo require you to manually install the dependencies. For a list of TLS/SSL-related dependencies, see TLS/SSL and PyMongo.

Installing / Upgrading

PyMongo is in the Python Package Index.

Warning

Do not install the “bson” package. PyMongo comes with its own bson package; doing “pip install bson” or “easy_install bson” installs a third-party package that is incompatible with PyMongo.

Installing with pip

We recommend using pip to install pymongo on all platforms:

$ python -m pip install pymongo

To get a specific version of pymongo:

$ python -m pip install pymongo==3.1.1

To upgrade using pip:

$ python -m pip install --upgrade pymongo

Note

pip does not support installing Python packages in .egg format. If you would like to install PyMongo from a .egg provided on PyPI, use easy_install instead.

Installing with easy_install

To use easy_install from setuptools do:

$ python -m easy_install pymongo

To upgrade do:

$ python -m easy_install -U pymongo

Dependencies

PyMongo supports CPython 2.6, 2.7, 3.4+, PyPy, and PyPy3.

Optional dependencies:

GSSAPI authentication requires pykerberos on Unix or WinKerberos on Windows. The correct dependency can be installed automatically along with PyMongo:

$ python -m pip install pymongo[gssapi]

Support for mongodb+srv:// URIs requires dnspython:

$ python -m pip install pymongo[srv]

TLS / SSL support may require ipaddress and certifi or wincertstore depending on the Python version in use. The necessary dependencies can be installed along with PyMongo:

$ python -m pip install pymongo[tls]

You can install all dependencies automatically with the following command:

$ python -m pip install pymongo[gssapi,srv,tls]

Other optional packages:

  • backports.pbkdf2 improves authentication performance with SCRAM-SHA-1, the default authentication mechanism for MongoDB 3.0+. It especially improves performance on Python older than 2.7.8, or on Python 3 before 3.4.
  • monotonic adds support for a monotonic clock, which improves reliability in environments where clock adjustments are frequent. Not needed in Python 3.3+.
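
Both optional packages can be installed with pip; for example (a sketch installing the two together):

$ python -m pip install backports.pbkdf2 monotonic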

Dependencies for installing C Extensions on Unix

MongoDB, Inc. does not provide statically linked binary packages for Unix flavors other than OSX. To build the optional C extensions you must have the GNU C compiler (gcc) installed. Depending on your flavor of Unix (or Linux distribution) you may also need a python development package that provides the necessary header files for your version of Python. The package name may vary from distro to distro.

Debian and Ubuntu users should issue the following command:

$ sudo apt-get install build-essential python-dev

Users of Red Hat based distributions (RHEL, CentOS, Amazon Linux, Oracle Linux, Fedora, etc.) should issue the following command:

$ sudo yum install gcc python-devel

Installing from source

If you’d rather install directly from the source (i.e. to stay on the bleeding edge), install the C extension dependencies, then check out the latest source from GitHub and install the driver from the resulting tree:

$ git clone git://github.com/mongodb/mongo-python-driver.git pymongo
$ cd pymongo/
$ python setup.py install
Installing from source on OSX

If you want to install PyMongo from source on OSX you will have to install the following to build the C extensions:

Snow Leopard (10.6) - Xcode 3 with ‘UNIX Development Support’.

Snow Leopard Xcode 4: The Python versions shipped with OSX 10.6.x are universal binaries. They support i386, PPC, and (in the case of python2.6) x86_64. Xcode 4 removed support for PPC, causing the distutils version shipped with Apple’s builds of Python to fail to build the C extensions if you have Xcode 4 installed. There is a workaround:

# For Apple-supplied Python2.6 (installed at /usr/bin/python2.6) and
# some builds from python.org
$ env ARCHFLAGS='-arch i386 -arch x86_64' python -m easy_install pymongo

See http://bugs.python.org/issue11623 for a more detailed explanation.

Lion (10.7) and newer - PyMongo’s C extensions can be built against versions of Python 2.7 >= 2.7.4 or Python 3.4+ downloaded from python.org. In all cases Xcode must be installed with ‘UNIX Development Support’.

Xcode 5.1: Starting with version 5.1, the version of clang that ships with Xcode throws an error when it encounters compiler flags it doesn’t recognize. This may cause C extension builds to fail with an error similar to:

clang: error: unknown argument: '-mno-fused-madd' [-Wunused-command-line-argument-hard-error-in-future]

There are workarounds:

# Apple specified workaround for Xcode 5.1
# easy_install
$ ARCHFLAGS=-Wno-error=unused-command-line-argument-hard-error-in-future easy_install pymongo
# or pip
$ ARCHFLAGS=-Wno-error=unused-command-line-argument-hard-error-in-future pip install pymongo

# Alternative workaround using CFLAGS
# easy_install
$ CFLAGS=-Qunused-arguments easy_install pymongo
# or pip
$ CFLAGS=-Qunused-arguments pip install pymongo
Installing from source on Windows

If you want to install PyMongo with C extensions from source, the following requirements apply to both CPython and ActiveState’s ActivePython:

64-bit Windows

For Python 3.5 and newer install Visual Studio 2015. For Python 3.4 install Visual Studio 2010. For Python 2.6 and 2.7 install Visual Studio 2008, or the Microsoft Visual C++ Compiler for Python 2.7. You must use the full version of Visual Studio 2010 or 2008 as Visual C++ Express does not provide 64-bit compilers. Make sure that you check the “x64 Compilers and Tools” option under Visual C++.

32-bit Windows

For Python 3.5 and newer install Visual Studio 2015.

For Python 3.4 install Visual C++ 2010 Express.

For Python 2.6 and 2.7 install Visual C++ 2008 Express SP1.

Installing Without C Extensions

By default, the driver attempts to build and install optional C extensions (used for increasing performance) when it is installed. If any extension fails to build the driver will be installed anyway but a warning will be printed.

If you wish to install PyMongo without the C extensions, even if the extensions build properly, it can be done using a command line option to setup.py:

$ python setup.py --no_ext install

Building PyMongo egg Packages

Some organizations do not allow compilers and other build tools on production systems. To install PyMongo on these systems with C extensions you may need to build custom egg packages. Make sure that you have installed the dependencies listed above for your operating system then run the following command in the PyMongo source directory:

$ python setup.py bdist_egg

The egg package can be found in the dist/ subdirectory. The file name will resemble “pymongo-3.6-py2.7-linux-x86_64.egg” but may vary depending on your platform and the version of Python used to compile.

Warning

These “binary distributions” will only work on systems that resemble the environment in which you built the package. In other words, ensure that the operating system, the version of Python, and the architecture (i.e. 32- or 64-bit) all match.

Copy this file to the target system and issue the following command to install the package:

$ sudo python -m easy_install pymongo-3.6-py2.7-linux-x86_64.egg

Installing a beta or release candidate

MongoDB, Inc. may occasionally tag a beta or release candidate for testing by the community before final release. These releases will not be uploaded to PyPI but can be found on the GitHub tags page. They can be installed by passing the full URL for the tag to pip:

$ python -m pip install https://github.com/mongodb/mongo-python-driver/archive/3.6rc0.tar.gz

or easy_install:

$ python -m easy_install https://github.com/mongodb/mongo-python-driver/archive/3.6rc0.tar.gz

Tutorial

This tutorial is intended as an introduction to working with MongoDB and PyMongo.

Prerequisites

Before we start, make sure that you have the PyMongo distribution installed. In the Python shell, the following should run without raising an exception:

>>> import pymongo

This tutorial also assumes that a MongoDB instance is running on the default host and port. Assuming you have downloaded and installed MongoDB, you can start it like so:

$ mongod

Making a Connection with MongoClient

The first step when working with PyMongo is to create a MongoClient to the running mongod instance. Doing so is easy:

>>> from pymongo import MongoClient
>>> client = MongoClient()

The above code will connect on the default host and port. We can also specify the host and port explicitly, as follows:

>>> client = MongoClient('localhost', 27017)

Or use the MongoDB URI format:

>>> client = MongoClient('mongodb://localhost:27017/')

Getting a Database

A single instance of MongoDB can support multiple independent databases. When working with PyMongo you access databases using attribute style access on MongoClient instances:

>>> db = client.test_database

If your database name is such that using attribute style access won’t work (like test-database), you can use dictionary style access instead:

>>> db = client['test-database']

Getting a Collection

A collection is a group of documents stored in MongoDB, and can be thought of as roughly the equivalent of a table in a relational database. Getting a collection in PyMongo works the same as getting a database:

>>> collection = db.test_collection

or (using dictionary style access):

>>> collection = db['test-collection']

An important note about collections (and databases) in MongoDB is that they are created lazily - none of the above commands have actually performed any operations on the MongoDB server. Collections and databases are created when the first document is inserted into them.

Documents

Data in MongoDB is represented (and stored) using JSON-style documents. In PyMongo we use dictionaries to represent documents. As an example, the following dictionary might be used to represent a blog post:

>>> import datetime
>>> post = {"author": "Mike",
...         "text": "My first blog post!",
...         "tags": ["mongodb", "python", "pymongo"],
...         "date": datetime.datetime.utcnow()}

Note that documents can contain native Python types (like datetime.datetime instances) which will be automatically converted to and from the appropriate BSON types.

Inserting a Document

To insert a document into a collection we can use the insert_one() method:

>>> posts = db.posts
>>> post_id = posts.insert_one(post).inserted_id
>>> post_id
ObjectId('...')

When a document is inserted a special key, "_id", is automatically added if the document doesn’t already contain an "_id" key. The value of "_id" must be unique across the collection. insert_one() returns an instance of InsertOneResult. For more information on "_id", see the documentation on _id.
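
For example, a document may supply its own "_id" value; here is a small sketch using a scratch collection (the value simply has to be unique within the collection):

>>> result = db.points.insert_one({"_id": "custom-id", "x": 1})
>>> result.inserted_id
'custom-id'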

After inserting the first document, the posts collection has actually been created on the server. We can verify this by listing all of the collections in our database:

>>> db.collection_names(include_system_collections=False)
[u'posts']

Getting a Single Document With find_one()

The most basic type of query that can be performed in MongoDB is find_one(). This method returns a single document matching a query (or None if there are no matches). It is useful when you know there is only one matching document, or are only interested in the first match. Here we use find_one() to get the first document from the posts collection:

>>> import pprint
>>> pprint.pprint(posts.find_one())
{u'_id': ObjectId('...'),
 u'author': u'Mike',
 u'date': datetime.datetime(...),
 u'tags': [u'mongodb', u'python', u'pymongo'],
 u'text': u'My first blog post!'}

The result is a dictionary matching the one that we inserted previously.

Note

The returned document contains an "_id", which was automatically added on insert.

find_one() also supports querying on specific elements that the resulting document must match. To limit our results to a document with author “Mike” we do:

>>> pprint.pprint(posts.find_one({"author": "Mike"}))
{u'_id': ObjectId('...'),
 u'author': u'Mike',
 u'date': datetime.datetime(...),
 u'tags': [u'mongodb', u'python', u'pymongo'],
 u'text': u'My first blog post!'}

If we try with a different author, like “Eliot”, we’ll get no result:

>>> posts.find_one({"author": "Eliot"})
>>>

Querying By ObjectId

We can also find a post by its _id, which in our example is an ObjectId:

>>> post_id
ObjectId(...)
>>> pprint.pprint(posts.find_one({"_id": post_id}))
{u'_id': ObjectId('...'),
 u'author': u'Mike',
 u'date': datetime.datetime(...),
 u'tags': [u'mongodb', u'python', u'pymongo'],
 u'text': u'My first blog post!'}

Note that an ObjectId is not the same as its string representation:

>>> post_id_as_str = str(post_id)
>>> posts.find_one({"_id": post_id_as_str}) # No result
>>>

A common task in web applications is to get an ObjectId from the request URL and find the matching document. It’s necessary in this case to convert the ObjectId from a string before passing it to find_one:

from bson.objectid import ObjectId

# The web framework gets post_id from the URL and passes it as a string
def get(post_id):
    # Convert from string to ObjectId:
    document = client.db.collection.find_one({'_id': ObjectId(post_id)})
    return document

A Note On Unicode Strings

You probably noticed that the regular Python strings we stored earlier look different when retrieved from the server (e.g. u'Mike' instead of 'Mike'). A short explanation is in order.

MongoDB stores data in BSON format. BSON strings are UTF-8 encoded, so PyMongo must ensure that any strings it stores contain only valid UTF-8 data. Regular strings (<type 'str'>) are validated and stored unaltered. Unicode strings (<type 'unicode'>) are encoded as UTF-8 first. The reason our example string is represented in the Python shell as u'Mike' instead of 'Mike' is that PyMongo decodes each BSON string to a Python unicode string, not a regular str.

You can read more about Python unicode strings in the Python documentation.
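
For example, on Python 2 a value we stored as a regular str comes back as a unicode string (a quick sketch, continuing with the posts collection from above):

>>> doc = posts.find_one({"author": "Mike"})
>>> type(doc['author'])
<type 'unicode'>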

Bulk Inserts

In order to make querying a little more interesting, let’s insert a few more documents. In addition to inserting a single document, we can also perform bulk insert operations, by passing a list as the first argument to insert_many(). This will insert each document in the list, sending only a single command to the server:

>>> new_posts = [{"author": "Mike",
...               "text": "Another post!",
...               "tags": ["bulk", "insert"],
...               "date": datetime.datetime(2009, 11, 12, 11, 14)},
...              {"author": "Eliot",
...               "title": "MongoDB is fun",
...               "text": "and pretty easy too!",
...               "date": datetime.datetime(2009, 11, 10, 10, 45)}]
>>> result = posts.insert_many(new_posts)
>>> result.inserted_ids
[ObjectId('...'), ObjectId('...')]

There are a couple of interesting things to note about this example:

  • The result from insert_many() now returns two ObjectId instances, one for each inserted document.
  • new_posts[1] has a different “shape” than the other posts - there is no "tags" field and we’ve added a new field, "title". This is what we mean when we say that MongoDB is schema-free.

Querying for More Than One Document

To get more than a single document as the result of a query we use the find() method. find() returns a Cursor instance, which allows us to iterate over all matching documents. For example, we can iterate over every document in the posts collection:

>>> for post in posts.find():
...   pprint.pprint(post)
...
{u'_id': ObjectId('...'),
 u'author': u'Mike',
 u'date': datetime.datetime(...),
 u'tags': [u'mongodb', u'python', u'pymongo'],
 u'text': u'My first blog post!'}
{u'_id': ObjectId('...'),
 u'author': u'Mike',
 u'date': datetime.datetime(...),
 u'tags': [u'bulk', u'insert'],
 u'text': u'Another post!'}
{u'_id': ObjectId('...'),
 u'author': u'Eliot',
 u'date': datetime.datetime(...),
 u'text': u'and pretty easy too!',
 u'title': u'MongoDB is fun'}

Just like we did with find_one(), we can pass a document to find() to limit the returned results. Here, we get only those documents whose author is “Mike”:

>>> for post in posts.find({"author": "Mike"}):
...   pprint.pprint(post)
...
{u'_id': ObjectId('...'),
 u'author': u'Mike',
 u'date': datetime.datetime(...),
 u'tags': [u'mongodb', u'python', u'pymongo'],
 u'text': u'My first blog post!'}
{u'_id': ObjectId('...'),
 u'author': u'Mike',
 u'date': datetime.datetime(...),
 u'tags': [u'bulk', u'insert'],
 u'text': u'Another post!'}

Counting

If we just want to know how many documents match a query we can perform a count() operation instead of a full query. We can get a count of all of the documents in a collection:

>>> posts.count()
3

or just of those documents that match a specific query:

>>> posts.find({"author": "Mike"}).count()
2

Range Queries

MongoDB supports many different types of advanced queries. As an example, let’s perform a query where we limit results to posts older than a certain date, but also sort the results by author:

>>> d = datetime.datetime(2009, 11, 12, 12)
>>> for post in posts.find({"date": {"$lt": d}}).sort("author"):
...   pprint.pprint(post)
...
{u'_id': ObjectId('...'),
 u'author': u'Eliot',
 u'date': datetime.datetime(...),
 u'text': u'and pretty easy too!',
 u'title': u'MongoDB is fun'}
{u'_id': ObjectId('...'),
 u'author': u'Mike',
 u'date': datetime.datetime(...),
 u'tags': [u'bulk', u'insert'],
 u'text': u'Another post!'}

Here we use the special "$lt" operator to do a range query, and also call sort() to sort the results by author.

Indexing

Adding indexes can help accelerate certain queries and can also add additional functionality to querying and storing documents. In this example, we’ll demonstrate how to create a unique index on a key that rejects documents whose value for that key already exists in the index.

First, we’ll need to create the index:

>>> result = db.profiles.create_index([('user_id', pymongo.ASCENDING)],
...                                   unique=True)
>>> sorted(list(db.profiles.index_information()))
[u'_id_', u'user_id_1']

Notice that we have two indexes now: one is the index on _id that MongoDB creates automatically, and the other is the index on user_id we just created.

Now let’s set up some user profiles:

>>> user_profiles = [
...     {'user_id': 211, 'name': 'Luke'},
...     {'user_id': 212, 'name': 'Ziltoid'}]
>>> result = db.profiles.insert_many(user_profiles)

The index prevents us from inserting a document whose user_id is already in the collection:

>>> new_profile = {'user_id': 213, 'name': 'Drew'}
>>> duplicate_profile = {'user_id': 212, 'name': 'Tommy'}
>>> result = db.profiles.insert_one(new_profile)  # This is fine.
>>> result = db.profiles.insert_one(duplicate_profile)
Traceback (most recent call last):
DuplicateKeyError: E11000 duplicate key error index: test_database.profiles.$user_id_1 dup key: { : 212 }

See also

The MongoDB documentation on indexes

Examples

The examples in this section are intended to give in-depth overviews of how to accomplish specific tasks with MongoDB and PyMongo.

Unless otherwise noted, all examples assume that a MongoDB instance is running on the default host and port. Assuming you have downloaded and installed MongoDB, you can start it like so:

$ mongod

Aggregation Examples

There are several methods of performing aggregations in MongoDB. These examples cover the new aggregation framework, using map reduce and using the group method.

Setup

To start, we’ll insert some example data which we can perform aggregations on:

>>> from pymongo import MongoClient
>>> db = MongoClient().aggregation_example
>>> result = db.things.insert_many([{"x": 1, "tags": ["dog", "cat"]},
...                                 {"x": 2, "tags": ["cat"]},
...                                 {"x": 2, "tags": ["mouse", "cat", "dog"]},
...                                 {"x": 3, "tags": []}])
>>> result.inserted_ids
[ObjectId('...'), ObjectId('...'), ObjectId('...'), ObjectId('...')]
Aggregation Framework

This example shows how to use the aggregate() method to run an aggregation framework pipeline. We’ll perform a simple aggregation to count the number of occurrences of each tag in the tags array, across the entire collection. To achieve this we pass three operations to the pipeline: first, we unwind the tags array; then we group by tag and sum the counts; finally, we sort by count.

As Python dictionaries don’t maintain order, you should use SON or collections.OrderedDict where explicit ordering is required, e.g. “$sort”:

Note

aggregate requires server version >= 2.1.0.

>>> from bson.son import SON
>>> pipeline = [
...     {"$unwind": "$tags"},
...     {"$group": {"_id": "$tags", "count": {"$sum": 1}}},
...     {"$sort": SON([("count", -1), ("_id", -1)])}
... ]
>>> import pprint
>>> pprint.pprint(list(db.things.aggregate(pipeline)))
[{u'_id': u'cat', u'count': 3},
 {u'_id': u'dog', u'count': 2},
 {u'_id': u'mouse', u'count': 1}]

To run an explain plan for this aggregation use the command() method:

>>> db.command('aggregate', 'things', pipeline=pipeline, explain=True)
{u'ok': 1.0, u'stages': [...]}

As well as simple aggregations, the aggregation framework provides projection capabilities to reshape the returned data. Using projections and aggregation, you can add computed fields, create new virtual sub-objects, and extract sub-fields into the top-level of results.
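
For instance, here is a minimal sketch of a $project stage that adds a computed field to each result (the field name "doubled" is purely illustrative):

>>> pipeline = [
...     {"$project": {"_id": 0, "x": 1, "doubled": {"$multiply": ["$x", 2]}}}
... ]
>>> for doc in db.things.aggregate(pipeline):
...     pprint.pprint(doc)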

See also

The full documentation for MongoDB’s aggregation framework

Map/Reduce

Another option for aggregation is to use the map reduce framework. Here we will define map and reduce functions to also count the number of occurrences for each tag in the tags array, across the entire collection.

Our map function just emits a single (key, 1) pair for each tag in the array:

>>> from bson.code import Code
>>> mapper = Code("""
...               function () {
...                 this.tags.forEach(function(z) {
...                   emit(z, 1);
...                 });
...               }
...               """)

The reduce function sums over all of the emitted values for a given key:

>>> reducer = Code("""
...                function (key, values) {
...                  var total = 0;
...                  for (var i = 0; i < values.length; i++) {
...                    total += values[i];
...                  }
...                  return total;
...                }
...                """)

Note

We can’t just return values.length as the reduce function might be called iteratively on the results of other reduce steps.

Finally, we call map_reduce() and iterate over the result collection:

>>> result = db.things.map_reduce(mapper, reducer, "myresults")
>>> for doc in result.find():
...   pprint.pprint(doc)
...
{u'_id': u'cat', u'value': 3.0}
{u'_id': u'dog', u'value': 2.0}
{u'_id': u'mouse', u'value': 1.0}
Advanced Map/Reduce

PyMongo’s API supports all of the features of MongoDB’s map/reduce engine. One interesting feature is the ability to get more detailed results when desired, by passing full_response=True to map_reduce(). This returns the full response to the map/reduce command, rather than just the result collection:

>>> pprint.pprint(
...     db.things.map_reduce(mapper, reducer, "myresults", full_response=True))
{...u'counts': {u'emit': 6, u'input': 4, u'output': 3, u'reduce': 2},
 u'ok': ...,
 u'result': u'...',
 u'timeMillis': ...}

All of the optional map/reduce parameters are also supported, simply pass them as keyword arguments. In this example we use the query parameter to limit the documents that will be mapped over:

>>> results = db.things.map_reduce(
...     mapper, reducer, "myresults", query={"x": {"$lt": 2}})
>>> for doc in results.find():
...   pprint.pprint(doc)
...
{u'_id': u'cat', u'value': 1.0}
{u'_id': u'dog', u'value': 1.0}

You can use SON or collections.OrderedDict to specify a different database to store the result collection:

>>> from bson.son import SON
>>> pprint.pprint(
...     db.things.map_reduce(
...         mapper,
...         reducer,
...         out=SON([("replace", "results"), ("db", "outdb")]),
...         full_response=True))
{...u'counts': {u'emit': 6, u'input': 4, u'output': 3, u'reduce': 2},
 u'ok': ...,
 u'result': {u'collection': ..., u'db': ...},
 u'timeMillis': ...}

See also

The full list of options for MongoDB’s map reduce engine

Authentication Examples

MongoDB supports several different authentication mechanisms. These examples cover all authentication methods currently supported by PyMongo, documenting Python module and MongoDB version dependencies.

Percent-Escaping Username and Password

Username and password must be percent-escaped with urllib.parse.quote_plus() in Python 3, or urllib.quote_plus() in Python 2, to be used in a MongoDB URI. For example, in Python 3:

>>> from pymongo import MongoClient
>>> import urllib.parse
>>> username = urllib.parse.quote_plus('user')
>>> username
'user'
>>> password = urllib.parse.quote_plus('pass/word')
>>> password
'pass%2Fword'
>>> MongoClient('mongodb://%s:%s@127.0.0.1' % (username, password))
...
SCRAM-SHA-1 (RFC 5802)

New in version 2.8.

SCRAM-SHA-1 is the default authentication mechanism supported by a cluster configured for authentication with MongoDB 3.0 or later. Authentication requires a username, a password, and a database name. The default database name is “admin”; this can be overridden with the authSource option. Credentials can be specified as arguments to MongoClient:

>>> from pymongo import MongoClient
>>> client = MongoClient('example.com',
...                      username='user',
...                      password='password',
...                      authSource='the_database',
...                      authMechanism='SCRAM-SHA-1')

Or through the MongoDB URI:

>>> uri = "mongodb://user:password@example.com/the_database?authMechanism=SCRAM-SHA-1"
>>> client = MongoClient(uri)

For best performance install backports.pbkdf2, especially on Python older than 2.7.8, or on Python 3 before Python 3.4.

MONGODB-CR

Before MongoDB 3.0 the default authentication mechanism was MONGODB-CR, the “MongoDB Challenge-Response” protocol:

>>> from pymongo import MongoClient
>>> client = MongoClient('example.com',
...                      username='user',
...                      password='password',
...                      authMechanism='MONGODB-CR')
>>>
>>> uri = "mongodb://user:password@example.com/the_database?authMechanism=MONGODB-CR"
>>> client = MongoClient(uri)
Default Authentication Mechanism

If no mechanism is specified, PyMongo automatically uses MONGODB-CR when connected to a pre-3.0 version of MongoDB, and SCRAM-SHA-1 when connected to MongoDB 3.0 or later.
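
For example, omitting authMechanism entirely lets PyMongo negotiate the appropriate mechanism (a sketch using the same placeholder credentials as the examples above):

>>> from pymongo import MongoClient
>>> client = MongoClient('example.com',
...                      username='user',
...                      password='password')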

Default Database and “authSource”

You can specify both a default database and the authentication database in the URI:

>>> uri = "mongodb://user:password@example.com/default_db?authSource=admin"
>>> client = MongoClient(uri)

PyMongo will authenticate on the “admin” database, but the default database will be “default_db”:

>>> # get_database with no "name" argument chooses the DB from the URI
>>> db = MongoClient(uri).get_database()
>>> print(db.name)
default_db
MONGODB-X509

New in version 2.6.

The MONGODB-X509 mechanism authenticates a username derived from the distinguished subject name of the X.509 certificate presented by the driver during SSL negotiation. This authentication method requires the use of SSL connections with certificate validation and is available in MongoDB 2.6 and newer:

>>> import ssl
>>> from pymongo import MongoClient
>>> client = MongoClient('example.com',
...                      username="<X.509 derived username>",
...                      authMechanism="MONGODB-X509",
...                      ssl=True,
...                      ssl_certfile='/path/to/client.pem',
...                      ssl_cert_reqs=ssl.CERT_REQUIRED,
...                      ssl_ca_certs='/path/to/ca.pem')

MONGODB-X509 authenticates against the $external virtual database, so you do not have to specify a database in the URI:

>>> uri = "mongodb://<X.509 derived username>@example.com/?authMechanism=MONGODB-X509"
>>> client = MongoClient(uri,
...                      ssl=True,
...                      ssl_certfile='/path/to/client.pem',
...                      ssl_cert_reqs=ssl.CERT_REQUIRED,
...                      ssl_ca_certs='/path/to/ca.pem')
>>>

Changed in version 3.4: When connected to MongoDB >= 3.4 the username is no longer required.

GSSAPI (Kerberos)

New in version 2.5.

GSSAPI (Kerberos) authentication is available in the Enterprise Edition of MongoDB.

Unix

To authenticate using GSSAPI you must first install the python kerberos or pykerberos module using easy_install or pip. Make sure you run kinit before using the following authentication methods:

$ kinit mongodbuser@EXAMPLE.COM
mongodbuser@EXAMPLE.COM's Password:
$ klist
Credentials cache: FILE:/tmp/krb5cc_1000
        Principal: mongodbuser@EXAMPLE.COM

  Issued                Expires               Principal
Feb  9 13:48:51 2013  Feb  9 23:48:51 2013  krbtgt/EXAMPLE.COM@EXAMPLE.COM

Now authenticate using the MongoDB URI. GSSAPI authenticates against the $external virtual database so you do not have to specify a database in the URI:

>>> # Note: the kerberos principal must be url encoded.
>>> from pymongo import MongoClient
>>> uri = "mongodb://mongodbuser%40EXAMPLE.COM@mongo-server.example.com/?authMechanism=GSSAPI"
>>> client = MongoClient(uri)
>>>

The default service name used by MongoDB and PyMongo is mongodb. You can specify a custom service name with the authMechanismProperties option:

>>> from pymongo import MongoClient
>>> uri = "mongodb://mongodbuser%40EXAMPLE.COM@mongo-server.example.com/?authMechanism=GSSAPI&authMechanismProperties=SERVICE_NAME:myservicename"
>>> client = MongoClient(uri)
Windows (SSPI)

New in version 3.3.

First install the winkerberos module. Unlike authentication on Unix, kinit is not used. If the user to authenticate is different from the user that owns the application process, provide a password to authenticate:

>>> uri = "mongodb://mongodbuser%40EXAMPLE.COM:mongodbuserpassword@example.com/?authMechanism=GSSAPI"

Two extra authMechanismProperties are supported on Windows platforms:

  • CANONICALIZE_HOST_NAME - Uses the fully qualified domain name (FQDN) of the MongoDB host for the server principal (GSSAPI libraries on Unix do this by default):

    >>> uri = "mongodb://mongodbuser%40EXAMPLE.COM@example.com/?authMechanism=GSSAPI&authMechanismProperties=CANONICALIZE_HOST_NAME:true"
    
  • SERVICE_REALM - This is used when the user’s realm is different from the service’s realm:

    >>> uri = "mongodb://mongodbuser%40EXAMPLE.COM@example.com/?authMechanism=GSSAPI&authMechanismProperties=SERVICE_REALM:otherrealm"
    
SASL PLAIN (RFC 4616)

New in version 2.6.

MongoDB Enterprise Edition version 2.6 and newer support the SASL PLAIN authentication mechanism, initially intended for delegating authentication to an LDAP server. Using the PLAIN mechanism is very similar to MONGODB-CR. These examples use the $external virtual database for LDAP support:

>>> from pymongo import MongoClient
>>> uri = "mongodb://user:password@example.com/?authMechanism=PLAIN&authSource=$external"
>>> client = MongoClient(uri)
>>>

SASL PLAIN is a clear-text authentication mechanism. We strongly recommend that you connect to MongoDB using SSL with certificate validation when using the SASL PLAIN mechanism:

>>> import ssl
>>> from pymongo import MongoClient
>>> uri = "mongodb://user:password@example.com/?authMechanism=PLAIN&authSource=$external"
>>> client = MongoClient(uri,
...                      ssl=True,
...                      ssl_certfile='/path/to/client.pem',
...                      ssl_cert_reqs=ssl.CERT_REQUIRED,
...                      ssl_ca_certs='/path/to/ca.pem')
>>>

Collations

See also

The API docs for collation.

Collations are a new feature in MongoDB version 3.4. They provide a set of rules to use when comparing strings that comply with the conventions of a particular language, such as Spanish or German. If no collation is specified, the server sorts strings based on a binary comparison. Many languages have specific ordering rules, and collations allow users to build applications that adhere to language-specific comparison rules.

In French, for example, the last accent in a given word determines the sorting order. The correct sorting order for the following four words in French is:

cote < côte < coté < côté

Specifying a French collation allows users to sort string fields using the French sort order.

Usage

Users can specify a collation for a collection, an index, or a CRUD command.

Collation Parameters:

Collations can be specified with the Collation model or with plain Python dictionaries. The structure is the same:

Collation(locale=<string>,
          caseLevel=<bool>,
          caseFirst=<string>,
          strength=<int>,
          numericOrdering=<bool>,
          alternate=<string>,
          maxVariable=<string>,
          backwards=<bool>)

The only required parameter is locale, which the server parses as an ICU format locale ID. For example, set locale to en_US to represent US English or fr_CA to represent Canadian French.

For a complete description of the available parameters, see the MongoDB manual.
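
For example, the following two spellings express the same simple collation (a sketch; only locale is required):

from pymongo.collation import Collation

collation = Collation(locale='fr_CA')
# Or, as a plain Python dictionary with the same structure:
collation = {'locale': 'fr_CA'}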

Assign a Default Collation to a Collection

The following example demonstrates how to create a new collection called contacts and assign a default collation with the fr_CA locale. This operation ensures that all queries that are run against the contacts collection use the fr_CA collation unless another collation is explicitly specified:

from pymongo import MongoClient
from pymongo.collation import Collation

db = MongoClient().test
collection = db.create_collection('contacts',
                                  collation=Collation(locale='fr_CA'))
Assign a Default Collation to an Index

When creating a new index, you can specify a default collation.

The following example shows how to create an index on the name field of the contacts collection, with the unique parameter enabled and a default collation with locale set to fr_CA:

from pymongo import MongoClient
from pymongo.collation import Collation

contacts = MongoClient().test.contacts
contacts.create_index('name',
                      unique=True,
                      collation=Collation(locale='fr_CA'))
Specify a Collation for a Query

Individual queries can specify a collation to use when sorting results. The following example demonstrates a query that runs on the contacts collection in database test. It matches on documents that contain New York in the city field, and sorts on the name field with the fr_CA collation:

from pymongo import MongoClient
from pymongo.collation import Collation

collection = MongoClient().test.contacts
docs = collection.find({'city': 'New York'}).sort('name').collation(
    Collation(locale='fr_CA'))
Other Query Types

You can use collations to control document matching rules for several different types of queries. All the various update and delete methods (update_one(), update_many(), delete_one(), etc.) support collation, and you can create query filters which employ collations to comply with any of the languages and variants available to the locale parameter.

The following example uses a collation with strength set to SECONDARY, which considers only base characters and character accents in string comparisons, but not case. All documents in the contacts collection with jürgen (case-insensitive) in the first_name field are updated:

from pymongo import MongoClient
from pymongo.collation import Collation, CollationStrength

contacts = MongoClient().test.contacts
result = contacts.update_many(
    {'first_name': 'jürgen'},
    {'$set': {'verified': 1}},
    collation=Collation(locale='de',
                        strength=CollationStrength.SECONDARY))

Copying a Database

To copy a database within a single mongod process, or between mongod servers, simply connect to the target mongod and use the command() method:

>>> from pymongo import MongoClient
>>> client = MongoClient('target.example.com')
>>> client.admin.command('copydb',
...                      fromdb='source_db_name',
...                      todb='target_db_name')

To copy from a different mongod server that is not password-protected:

>>> client.admin.command('copydb',
...                      fromdb='source_db_name',
...                      todb='target_db_name',
...                      fromhost='source.example.com')

If the target server is password-protected, authenticate to the “admin” database:

>>> client = MongoClient('target.example.com',
...                      username='administrator',
...                      password='pwd')
>>> client.admin.command('copydb',
...                      fromdb='source_db_name',
...                      todb='target_db_name',
...                      fromhost='source.example.com')

See the authentication examples.

If the source server is password-protected, use the copyDatabase function in the mongo shell.

Versions of PyMongo before 3.0 included a copy_database helper method, but it has been removed.

Bulk Write Operations

This tutorial explains how to take advantage of PyMongo’s bulk write operation features. Executing write operations in batches reduces the number of network round trips, increasing write throughput.

Bulk Insert

New in version 2.6.

A batch of documents can be inserted by passing a list to the insert_many() method. PyMongo will automatically split the batch into smaller sub-batches based on the maximum message size accepted by MongoDB, supporting very large bulk insert operations.

>>> import pymongo
>>> db = pymongo.MongoClient().bulk_example
>>> db.test.insert_many([{'i': i} for i in range(10000)]).inserted_ids
[...]
>>> db.test.count()
10000
Mixed Bulk Write Operations

New in version 2.7.

PyMongo also supports executing mixed bulk write operations. A batch of insert, update, and remove operations can be executed together using the bulk write operations API.

Ordered Bulk Write Operations

Ordered bulk write operations are batched and sent to the server in the order provided for serial execution. The return value is an instance of BulkWriteResult describing the type and count of operations performed.

>>> from pprint import pprint
>>> from pymongo import InsertOne, DeleteMany, ReplaceOne, UpdateOne
>>> result = db.test.bulk_write([
...     DeleteMany({}),  # Remove all documents from the previous example.
...     InsertOne({'_id': 1}),
...     InsertOne({'_id': 2}),
...     InsertOne({'_id': 3}),
...     UpdateOne({'_id': 1}, {'$set': {'foo': 'bar'}}),
...     UpdateOne({'_id': 4}, {'$inc': {'j': 1}}, upsert=True),
...     ReplaceOne({'j': 1}, {'j': 2})])
>>> pprint(result.bulk_api_result)
{'nInserted': 3,
 'nMatched': 2,
 'nModified': 2,
 'nRemoved': 10000,
 'nUpserted': 1,
 'upserted': [{u'_id': 4, u'index': 5}],
 'writeConcernErrors': [],
 'writeErrors': []}

Warning

nModified is only reported by MongoDB 2.6 and later. When connected to an earlier server version, or in certain mixed version sharding configurations, PyMongo omits this field from the results of a bulk write operation.

The first write failure that occurs (e.g. duplicate key error) aborts the remaining operations, and PyMongo raises BulkWriteError. The details attribute of the exception instance provides the execution results up until the failure occurred and details about the failure - including the operation that caused the failure.

>>> from pymongo import InsertOne, DeleteOne, ReplaceOne
>>> from pymongo.errors import BulkWriteError
>>> requests = [
...     ReplaceOne({'j': 2}, {'i': 5}),
...     InsertOne({'_id': 4}),  # Violates the unique key constraint on _id.
...     DeleteOne({'i': 5})]
>>> try:
...     db.test.bulk_write(requests)
... except BulkWriteError as bwe:
...     pprint(bwe.details)
...
{'nInserted': 0,
 'nMatched': 1,
 'nModified': 1,
 'nRemoved': 0,
 'nUpserted': 0,
 'upserted': [],
 'writeConcernErrors': [],
 'writeErrors': [{u'code': 11000,
                  u'errmsg': u'...E11000...duplicate key error...',
                  u'index': 1,
                  u'op': {'_id': 4}}]}
Unordered Bulk Write Operations

Unordered bulk write operations are batched and sent to the server in arbitrary order where they may be executed in parallel. Any errors that occur are reported after all operations are attempted.

In the next example the first and third operations fail due to the unique constraint on _id. Since we are doing unordered execution the second and fourth operations succeed.

>>> requests = [
...     InsertOne({'_id': 1}),
...     DeleteOne({'_id': 2}),
...     InsertOne({'_id': 3}),
...     ReplaceOne({'_id': 4}, {'i': 1})]
>>> try:
...     db.test.bulk_write(requests, ordered=False)
... except BulkWriteError as bwe:
...     pprint(bwe.details)
...
{'nInserted': 0,
 'nMatched': 1,
 'nModified': 1,
 'nRemoved': 1,
 'nUpserted': 0,
 'upserted': [],
 'writeConcernErrors': [],
 'writeErrors': [{u'code': 11000,
                  u'errmsg': u'...E11000...duplicate key error...',
                  u'index': 0,
                  u'op': {'_id': 1}},
                 {u'code': 11000,
                  u'errmsg': u'...E11000...duplicate key error...',
                  u'index': 2,
                  u'op': {'_id': 3}}]}
Write Concern

Bulk operations are executed with the write_concern of the collection they are executed against. Write concern errors (e.g. wtimeout) will be reported after all operations are attempted, regardless of execution order.

>>> from pymongo import WriteConcern
>>> coll = db.get_collection(
...     'test', write_concern=WriteConcern(w=3, wtimeout=1))
>>> try:
...     coll.bulk_write([InsertOne({'a': i}) for i in range(4)])
... except BulkWriteError as bwe:
...     pprint(bwe.details)
...
{'nInserted': 4,
 'nMatched': 0,
 'nModified': 0,
 'nRemoved': 0,
 'nUpserted': 0,
 'upserted': [],
 'writeConcernErrors': [{u'code': 64...
                         u'errInfo': {u'wtimeout': True},
                         u'errmsg': u'waiting for replication timed out'}],
 'writeErrors': []}

Datetimes and Timezones

These examples show how to handle Python datetime.datetime objects correctly in PyMongo.

Basic Usage

PyMongo uses datetime.datetime objects for representing dates and times in MongoDB documents. Because MongoDB assumes that dates and times are in UTC, care should be taken to ensure that dates and times written to the database reflect UTC. For example, the following code stores the current UTC date and time into MongoDB:

>>> result = db.objects.insert_one(
...     {"last_modified": datetime.datetime.utcnow()})

Always use datetime.datetime.utcnow(), which returns the current time in UTC, instead of datetime.datetime.now(), which returns the current local time. Avoid doing this:

>>> result = db.objects.insert_one(
...     {"last_modified": datetime.datetime.now()})

The value for last_modified is very different between these two examples, even though both documents were stored at around the same local time. This will be confusing to the application that reads them:

>>> [doc['last_modified'] for doc in db.objects.find()]  
[datetime.datetime(2015, 7, 8, 18, 17, 28, 324000),
 datetime.datetime(2015, 7, 8, 11, 17, 42, 911000)]

bson.codec_options.CodecOptions has a tz_aware option that enables “aware” datetime.datetime objects, i.e., datetimes that know what timezone they’re in. By default, PyMongo retrieves naive datetimes:

>>> result = db.tzdemo.insert_one(
...     {'date': datetime.datetime(2002, 10, 27, 6, 0, 0)})
>>> db.tzdemo.find_one()['date']
datetime.datetime(2002, 10, 27, 6, 0)
>>> from bson.codec_options import CodecOptions
>>> options = CodecOptions(tz_aware=True)
>>> db.get_collection('tzdemo', codec_options=options).find_one()['date']  
datetime.datetime(2002, 10, 27, 6, 0,
                  tzinfo=<bson.tz_util.FixedOffset object at 0x10583a050>)
Saving Datetimes with Timezones

When storing datetime.datetime objects that specify a timezone (i.e. they have a tzinfo property that isn’t None), PyMongo will convert those datetimes to UTC automatically:

>>> import pytz
>>> pacific = pytz.timezone('US/Pacific')
>>> aware_datetime = pacific.localize(
...     datetime.datetime(2002, 10, 27, 6, 0, 0))
>>> result = db.times.insert_one({"date": aware_datetime})
>>> db.times.find_one()['date']
datetime.datetime(2002, 10, 27, 14, 0)
Reading Time

As previously mentioned, by default all datetime.datetime objects returned by PyMongo will be naive but reflect UTC (i.e. the time as stored in MongoDB). By setting the tz_aware option on CodecOptions, datetime.datetime objects will be timezone-aware and have a tzinfo property that reflects the UTC timezone.

PyMongo 3.1 introduced a tzinfo property that can be set on CodecOptions to convert datetime.datetime objects to local time automatically. For example, if we wanted to read all times out of MongoDB in US/Pacific time:

>>> from bson.codec_options import CodecOptions
>>> db.times.find_one()['date']
datetime.datetime(2002, 10, 27, 14, 0)
>>> aware_times = db.times.with_options(codec_options=CodecOptions(
...     tz_aware=True,
...     tzinfo=pytz.timezone('US/Pacific')))
>>> aware_times.find_one()['date']
datetime.datetime(2002, 10, 27, 6, 0,
                  tzinfo=<DstTzInfo 'US/Pacific' PST-1 day, 16:00:00 STD>)

Geospatial Indexing Example

This example shows how to create and use a GEO2D index in PyMongo.

See also

The MongoDB documentation on geospatial indexes.

Creating a Geospatial Index

Creating a geospatial index in pymongo is easy:

>>> from pymongo import MongoClient, GEO2D
>>> db = MongoClient().geo_example
>>> db.places.create_index([("loc", GEO2D)])
u'loc_2d'
Inserting Places

Locations in MongoDB are represented using either embedded documents or lists where the first two elements are coordinates. Here, we’ll insert a couple of example locations:

>>> result = db.places.insert_many([{"loc": [2, 5]},
...                                 {"loc": [30, 5]},
...                                 {"loc": [1, 2]},
...                                 {"loc": [4, 4]}])  
>>> result.inserted_ids
[ObjectId('...'), ObjectId('...'), ObjectId('...'), ObjectId('...')]
Querying

Using the geospatial index we can find documents near another point:

>>> import pprint
>>> for doc in db.places.find({"loc": {"$near": [3, 6]}}).limit(3):
...   pprint.pprint(doc)
...
{u'_id': ObjectId('...'), u'loc': [2, 5]}
{u'_id': ObjectId('...'), u'loc': [4, 4]}
{u'_id': ObjectId('...'), u'loc': [1, 2]}

The $maxDistance operator requires the use of SON:

>>> from bson.son import SON
>>> query = {"loc": SON([("$near", [3, 6]), ("$maxDistance", 100)])}
>>> for doc in db.places.find(query).limit(3):
...   pprint.pprint(doc)
...
{u'_id': ObjectId('...'), u'loc': [2, 5]}
{u'_id': ObjectId('...'), u'loc': [4, 4]}
{u'_id': ObjectId('...'), u'loc': [1, 2]}

It’s also possible to query for all items within a given rectangle (specified by lower-left and upper-right coordinates):

>>> query = {"loc": {"$within": {"$box": [[2, 2], [5, 6]]}}}
>>> for doc in db.places.find(query).sort('_id'):
...     pprint.pprint(doc)
{u'_id': ObjectId('...'), u'loc': [2, 5]}
{u'_id': ObjectId('...'), u'loc': [4, 4]}

Or circle (specified by center point and radius):

>>> query = {"loc": {"$within": {"$center": [[0, 0], 6]}}}
>>> for doc in db.places.find(query).sort('_id'):
...   pprint.pprint(doc)
...
{u'_id': ObjectId('...'), u'loc': [2, 5]}
{u'_id': ObjectId('...'), u'loc': [1, 2]}
{u'_id': ObjectId('...'), u'loc': [4, 4]}

geoNear queries are also supported using SON:

>>> from bson.son import SON
>>> db.command(SON([('geoNear', 'places'), ('near', [1, 2])]))
{u'ok': 1.0, u'stats': ...}

Gevent

PyMongo supports Gevent. Simply call Gevent’s monkey.patch_all() before loading any other modules:

>>> # You must call patch_all() *before* importing any other modules
>>> from gevent import monkey
>>> monkey.patch_all()
>>> from pymongo import MongoClient
>>> client = MongoClient()

PyMongo uses thread and socket functions from the Python standard library. Gevent’s monkey-patching replaces those standard functions so that PyMongo does asynchronous I/O with non-blocking sockets, and schedules operations on greenlets instead of threads.

Avoid blocking in Hub.join

By default, PyMongo uses threads to discover and monitor your servers’ topology (see Health Monitoring). If you execute monkey.patch_all() when your application first begins, PyMongo automatically uses greenlets instead of threads.

When shutting down, if your application calls join() on Gevent’s Hub without first terminating these background greenlets, the call to join() blocks indefinitely. You therefore must close or dereference any active MongoClient before exiting.

An example solution to this issue in some application frameworks is a signal handler to end background greenlets when your application receives SIGHUP:

import signal

def graceful_reload(signum, traceback):
    """Explicitly close some global MongoClient object."""
    client.close()

signal.signal(signal.SIGHUP, graceful_reload)

Applications using uWSGI versions prior to 1.9.16 are affected by this issue, as are newer uWSGI versions run with the --gevent-wait-for-hub option. See the uWSGI changelog for details.

GridFS Example

This example shows how to use gridfs to store large binary objects (e.g. files) in MongoDB.

See also

The API docs for gridfs.

See also

This blog post for some motivation behind this API.

Setup

We start by creating a GridFS instance to use:

>>> from pymongo import MongoClient
>>> import gridfs
>>>
>>> db = MongoClient().gridfs_example
>>> fs = gridfs.GridFS(db)

Every GridFS instance is created with and will operate on a specific Database instance.

Saving and Retrieving Data

The simplest way to work with gridfs is to use its key/value interface (the put() and get() methods). To write data to GridFS, use put():

>>> a = fs.put(b"hello world")

put() creates a new file in GridFS, and returns the value of the file document’s "_id" key. Given that "_id" we can use get() to get back the contents of the file:

>>> fs.get(a).read()
'hello world'

get() returns a file-like object, so we get the file’s contents by calling read().

In addition to putting a str as a GridFS file, we can also put any file-like object (an object with a read() method). GridFS will handle reading the file in chunk-sized segments automatically. We can also add additional attributes to the file as keyword arguments:

>>> b = fs.put(fs.get(a), filename="foo", bar="baz")
>>> out = fs.get(b)
>>> out.read()
'hello world'
>>> out.filename
u'foo'
>>> out.bar
u'baz'
>>> out.upload_date
datetime.datetime(...)

The attributes we set in put() are stored in the file document, and retrievable after calling get(). Some attributes (like "filename") are special and are defined in the GridFS specification - see that document for more details.
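
put() also accepts an open file from disk; here is a small sketch (the path and filename are hypothetical):

>>> with open('/path/to/my_file.txt', 'rb') as f:
...     file_id = fs.put(f, filename='my_file.txt')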

High Availability and PyMongo

PyMongo makes it easy to write highly available applications whether you use a single replica set or a large sharded cluster.

Connecting to a Replica Set

PyMongo makes working with replica sets easy. Here we’ll launch a new replica set and show how to handle both initialization and normal connections with PyMongo.

See also

The MongoDB documentation on replica sets.

Starting a Replica Set

The main replica set documentation contains extensive information about setting up a new replica set or migrating an existing MongoDB setup, be sure to check that out. Here, we’ll just do the bare minimum to get a three node replica set setup locally.

Warning

Replica sets should always use multiple nodes in production - putting all set members on the same physical node is only recommended for testing and development.

We start three mongod processes, each on a different port and with a different dbpath, but all using the same replica set name “foo”.

$ mkdir -p /data/db0 /data/db1 /data/db2
$ mongod --port 27017 --dbpath /data/db0 --replSet foo
$ mongod --port 27018 --dbpath /data/db1 --replSet foo
$ mongod --port 27019 --dbpath /data/db2 --replSet foo
Initializing the Set

At this point all of our nodes are up and running, but the set has yet to be initialized. Until the set is initialized no node will become the primary, and things are essentially “offline”.

To initialize the set we need to connect to a single node and run the initiate command:

>>> from pymongo import MongoClient
>>> c = MongoClient('localhost', 27017)

Note

We could have connected to any of the other nodes instead, but only the node we initiate from is allowed to contain any initial data.

After connecting, we run the initiate command to get things started:

>>> config = {'_id': 'foo', 'members': [
...     {'_id': 0, 'host': 'localhost:27017'},
...     {'_id': 1, 'host': 'localhost:27018'},
...     {'_id': 2, 'host': 'localhost:27019'}]}
>>> c.admin.command("replSetInitiate", config)
{'ok': 1.0, ...}

The three mongod servers we started earlier will now coordinate and come online as a replica set.

Connecting to a Replica Set

The initial connection as made above is a special case for an uninitialized replica set. Normally we’ll want to connect differently. A connection to a replica set can be made using the MongoClient() constructor, specifying one or more members of the set, along with the replica set name. Any of the following connects to the replica set we just created:

>>> MongoClient('localhost', replicaset='foo')
MongoClient(host=['localhost:27017'], replicaset='foo', ...)
>>> MongoClient('localhost:27018', replicaset='foo')
MongoClient(['localhost:27018'], replicaset='foo', ...)
>>> MongoClient('localhost', 27019, replicaset='foo')
MongoClient(['localhost:27019'], replicaset='foo', ...)
>>> MongoClient('mongodb://localhost:27017,localhost:27018/?replicaSet=foo')
MongoClient(['localhost:27017', 'localhost:27018'], replicaset='foo', ...)

The addresses passed to MongoClient() are called the seeds. As long as at least one of the seeds is online, MongoClient discovers all the members in the replica set, and determines which is the current primary and which are secondaries or arbiters. Each seed must be the address of a single mongod. Multihomed and round robin DNS addresses are not supported.

The MongoClient constructor is non-blocking: the constructor returns immediately while the client connects to the replica set using background threads. Note how, if you create a client and immediately print the string representation of its nodes attribute, the list may be empty initially. If you wait a moment, MongoClient discovers the whole replica set:

>>> from time import sleep
>>> c = MongoClient(replicaset='foo'); print(c.nodes); sleep(0.1); print(c.nodes)
frozenset([])
frozenset([(u'localhost', 27019), (u'localhost', 27017), (u'localhost', 27018)])

You need not wait for replica set discovery in your application, however. If you need to do any operation with a MongoClient, such as a find() or an insert_one(), the client waits to discover a suitable member before it attempts the operation.

Handling Failover

When a failover occurs, PyMongo will automatically attempt to find the new primary node and perform subsequent operations on that node. This can’t happen completely transparently, however. Here we’ll perform an example failover to illustrate how everything behaves. First, we’ll connect to the replica set and perform a couple of basic operations:

>>> db = MongoClient("localhost", replicaSet='foo').test
>>> db.test.insert_one({"x": 1}).inserted_id
ObjectId('...')
>>> db.test.find_one()
{u'x': 1, u'_id': ObjectId('...')}

By checking the host and port, we can see that we’re connected to localhost:27017, which is the current primary:

>>> db.client.address
('localhost', 27017)

Now let’s bring down that node and see what happens when we run our query again:

>>> db.test.find_one()
Traceback (most recent call last):
pymongo.errors.AutoReconnect: ...

We get an AutoReconnect exception. This means that the driver was not able to connect to the old primary (which makes sense, as we killed the server), but that it will attempt to automatically reconnect on subsequent operations. When this exception is raised our application code needs to decide whether to retry the operation or to simply continue, accepting the fact that the operation might have failed.

On subsequent attempts to run the query we might continue to see this exception. Eventually, however, the replica set will fail over and elect a new primary (in general this should take no more than a couple of seconds). At that point the driver will connect to the new primary and the operation will succeed:

>>> db.test.find_one()
{u'x': 1, u'_id': ObjectId('...')}
>>> db.client.address
('localhost', 27018)

Bring the former primary back up. It will rejoin the set as a secondary. Now we can move to the next section: distributing reads to secondaries.

Secondary Reads

By default an instance of MongoClient sends queries to the primary member of the replica set. To use secondaries for queries we have to change the read preference:

>>> client = MongoClient(
...     'localhost:27017',
...     replicaSet='foo',
...     readPreference='secondaryPreferred')
>>> client.read_preference
SecondaryPreferred(tag_sets=None)

Now all queries will be sent to the secondary members of the set. If there are no secondary members the primary will be used as a fallback. If you have queries you would prefer to never send to the primary you can specify that using the secondary read preference.
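
For example, here is a brief sketch of a client whose queries are never sent to the primary, reusing the replica set from this guide:

>>> client = MongoClient(
...     'localhost:27017',
...     replicaSet='foo',
...     readPreference='secondary')
>>> client.read_preference
Secondary(tag_sets=None)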

By default the read preference of a Database is inherited from its MongoClient, and the read preference of a Collection is inherited from its Database. To use a different read preference use the get_database() method, or the get_collection() method:

>>> from pymongo import ReadPreference
>>> client.read_preference
SecondaryPreferred(tag_sets=None)
>>> db = client.get_database('test', read_preference=ReadPreference.SECONDARY)
>>> db.read_preference
Secondary(tag_sets=None)
>>> coll = db.get_collection('test', read_preference=ReadPreference.PRIMARY)
>>> coll.read_preference
Primary()

You can also change the read preference of an existing Collection with the with_options() method:

>>> coll2 = coll.with_options(read_preference=ReadPreference.NEAREST)
>>> coll.read_preference
Primary()
>>> coll2.read_preference
Nearest(tag_sets=None)

Note that since most database commands can only be sent to the primary of a replica set, the command() method does not obey the Database’s read_preference, but you can pass an explicit read preference to the method:

>>> db.command('dbstats', read_preference=ReadPreference.NEAREST)
{...}

Reads are configured using three options: read preference, tag sets, and local threshold.

Read preference:

Read preference is configured using one of the classes from read_preferences (Primary, PrimaryPreferred, Secondary, SecondaryPreferred, or Nearest). For convenience, we also provide ReadPreference with the following attributes:

  • PRIMARY: Read from the primary. This is the default read preference, and provides the strongest consistency. If no primary is available, raise AutoReconnect.
  • PRIMARY_PREFERRED: Read from the primary if available, otherwise read from a secondary.
  • SECONDARY: Read from a secondary. If no matching secondary is available, raise AutoReconnect.
  • SECONDARY_PREFERRED: Read from a secondary if available, otherwise from the primary.
  • NEAREST: Read from any available member.

Tag sets:

Replica-set members can be tagged according to any criteria you choose. By default, PyMongo ignores tags when choosing a member to read from, but your read preference can be configured with a tag_sets parameter. tag_sets must be a list of dictionaries, each dict providing tag values that the replica set member must match. PyMongo tries each set of tags in turn until it finds a set of tags with at least one matching member. For example, to prefer reads from the New York data center, but fall back to the San Francisco data center, tag your replica set members according to their location and create a MongoClient like so:

>>> from pymongo.read_preferences import Secondary
>>> db = client.get_database(
...     'test', read_preference=Secondary([{'dc': 'ny'}, {'dc': 'sf'}]))
>>> db.read_preference
Secondary(tag_sets=[{'dc': 'ny'}, {'dc': 'sf'}])

MongoClient tries to find secondaries in New York, then San Francisco, and raises AutoReconnect if none are available. As an additional fallback, specify a final, empty tag set, {}, which means “read from any member that matches the mode, ignoring tags.”
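
For example, a sketch of that fallback using the tags from above:

>>> db = client.get_database(
...     'test',
...     read_preference=Secondary([{'dc': 'ny'}, {'dc': 'sf'}, {}]))
>>> db.read_preference
Secondary(tag_sets=[{'dc': 'ny'}, {'dc': 'sf'}, {}])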

See read_preferences for more information.

Local threshold:

If multiple members match the read preference and tag sets, PyMongo reads from among the nearest members, chosen according to ping time. By default, only members whose ping times are within 15 milliseconds of the nearest are used for queries. You can choose to distribute reads among members with higher latencies by setting localThresholdMS to a larger number:

>>> client = pymongo.MongoClient(
...     replicaSet='repl0',
...     readPreference='secondaryPreferred',
...     localThresholdMS=35)

In this case, PyMongo distributes reads among matching members within 35 milliseconds of the closest member’s ping time.

Note

localThresholdMS is ignored when talking to a replica set through a mongos. The equivalent is the localThreshold command line option.

Health Monitoring

When MongoClient is initialized it launches background threads to monitor the replica set for changes in:

  • Health: detect when a member goes down or comes up, or if a different member becomes primary
  • Configuration: detect when members are added or removed, and detect changes in members’ tags
  • Latency: track a moving average of each member’s ping time

Replica-set monitoring ensures queries are continually routed to the proper members as the state of the replica set changes.

mongos Load Balancing

An instance of MongoClient can be configured with a list of addresses of mongos servers:

>>> client = MongoClient('mongodb://host1,host2,host3')

Each member of the list must be a single mongos server. Multihomed and round robin DNS addresses are not supported. The client continuously monitors all the mongoses’ availability, and its network latency to each.

PyMongo distributes operations evenly among the set of mongoses within its localThresholdMS (similar to how it distributes reads to secondaries in a replica set). By default the threshold is 15 ms.

The lowest-latency server, and all servers with latencies no more than localThresholdMS beyond the lowest-latency server’s, receive operations equally. For example, if we have three mongoses:

  • host1: 20 ms
  • host2: 35 ms
  • host3: 40 ms

By default the localThresholdMS is 15 ms, so PyMongo uses host1 and host2 evenly. It uses host1 because its network latency to the driver is shortest. It uses host2 because its latency is within 15 ms of the lowest-latency server's. But it excludes host3: host3 is 20 ms beyond the lowest-latency server.

If we set localThresholdMS to 30 ms all servers are within the threshold:

>>> client = MongoClient('mongodb://host1,host2,host3/?localThresholdMS=30')

Warning

Do not connect PyMongo to a pool of mongos instances through a load balancer. A single socket connection must always be routed to the same mongos instance for proper cursor support.

PyMongo and mod_wsgi

To run your application under mod_wsgi, follow these guidelines:

  • Run mod_wsgi in daemon mode with the WSGIDaemonProcess directive.
  • Assign each application to a separate daemon with WSGIProcessGroup.
  • Use WSGIApplicationGroup %{GLOBAL} to ensure your application is running in the daemon’s main Python interpreter, not a sub interpreter.

For example, this mod_wsgi configuration ensures an application runs in the main interpreter:

<VirtualHost *>
    WSGIDaemonProcess my_process
    WSGIScriptAlias /my_app /path/to/app.wsgi
    WSGIProcessGroup my_process
    WSGIApplicationGroup %{GLOBAL}
</VirtualHost>

If you have multiple applications that use PyMongo, put each in a separate daemon, still in the global application group:

<VirtualHost *>
    WSGIDaemonProcess my_process
    WSGIScriptAlias /my_app /path/to/app.wsgi
    <Location /my_app>
        WSGIProcessGroup my_process
    </Location>

    WSGIDaemonProcess my_other_process
    WSGIScriptAlias /my_other_app /path/to/other_app.wsgi
    <Location /my_other_app>
        WSGIProcessGroup my_other_process
    </Location>

    WSGIApplicationGroup %{GLOBAL}
</VirtualHost>

Background: mod_wsgi can run in “embedded” mode when only WSGIScriptAlias is set, or “daemon” mode with WSGIDaemonProcess. In daemon mode, mod_wsgi can run your application in the Python main interpreter, or in sub interpreters. The correct way to run a PyMongo application is in daemon mode, using the main interpreter.

Python C extensions in general have issues running in multiple Python sub interpreters. These difficulties are explained in the documentation for Py_NewInterpreter and in the Multiple Python Sub Interpreters section of the mod_wsgi documentation.

Beginning with PyMongo 2.7, the C extension for BSON detects when it is running in a sub interpreter and activates a workaround, which adds a small cost to BSON decoding. To avoid this cost, use WSGIApplicationGroup %{GLOBAL} to ensure your application runs in the main interpreter.

Since your program runs in the main interpreter it should not share its process with any other applications, lest they interfere with each other’s state. Each application should have its own daemon process, as shown in the example above.

Tailable Cursors

By default, MongoDB will automatically close a cursor when the client has exhausted all results in the cursor. However, for capped collections you may use a tailable cursor that remains open after the client exhausts the results in the initial cursor.

The following is a basic example of using a tailable cursor to tail the oplog of a replica set member:

import time

import pymongo

client = pymongo.MongoClient()
oplog = client.local.oplog.rs
first = oplog.find().sort('$natural', pymongo.ASCENDING).limit(-1).next()
print(first)
ts = first['ts']

while True:
    # For a regular capped collection CursorType.TAILABLE_AWAIT is the
    # only option required to create a tailable cursor. When querying the
    # oplog the oplog_replay option enables an optimization to quickly
    # find the 'ts' value we're looking for. The oplog_replay option
    # can only be used when querying the oplog.
    cursor = oplog.find({'ts': {'$gt': ts}},
                        cursor_type=pymongo.CursorType.TAILABLE_AWAIT,
                        oplog_replay=True)
    while cursor.alive:
        for doc in cursor:
            ts = doc['ts']
            print(doc)
        # We end up here if the find() returned no documents or if the
        # tailable cursor timed out (no new documents were added to the
        # collection for more than 1 second).
        time.sleep(1)

TLS/SSL and PyMongo

PyMongo supports connecting to MongoDB over TLS/SSL. This guide covers the configuration options supported by PyMongo. See the server documentation to configure MongoDB.

Dependencies

For connections using TLS/SSL, PyMongo may require third party dependencies as determined by your version of Python. With PyMongo 3.3+, you can install PyMongo 3.3+ and any TLS/SSL-related dependencies using the following pip command:

$ python -m pip install pymongo[tls]

Earlier versions of PyMongo require you to manually install the dependencies listed below.

Python 2.x

The ipaddress module is required on all platforms.

When using CPython < 2.7.9 or PyPy < 2.5.1:

  • On Windows, the wincertstore module is required.
  • On all other platforms, the certifi module is required.
Python 3.x

On Windows, the wincertstore module is required when using CPython < 3.4.0 or any version of PyPy3.

Basic configuration

In many cases connecting to MongoDB over TLS/SSL requires nothing more than passing ssl=True as a keyword argument to MongoClient:

>>> client = pymongo.MongoClient('example.com', ssl=True)

Or passing ssl=true in the URI:

>>> client = pymongo.MongoClient('mongodb://example.com/?ssl=true')

This configures PyMongo to connect to the server using TLS, verify the server’s certificate and verify that the host you are attempting to connect to is listed by that certificate.

Certificate verification policy

By default, PyMongo is configured to require a certificate from the server when TLS is enabled. This is configurable using the ssl_cert_reqs option. To disable this requirement pass ssl.CERT_NONE as a keyword parameter:

>>> import ssl
>>> client = pymongo.MongoClient('example.com',
...                              ssl=True,
...                              ssl_cert_reqs=ssl.CERT_NONE)

Or, in the URI:

>>> uri = 'mongodb://example.com/?ssl=true&ssl_cert_reqs=CERT_NONE'
>>> client = pymongo.MongoClient(uri)

Specifying a CA file

In some cases you may want to configure PyMongo to use a specific set of CA certificates. This is most often the case when using “self-signed” server certificates. The ssl_ca_certs option takes a path to a CA file. It can be passed as a keyword argument:

>>> client = pymongo.MongoClient('example.com',
...                              ssl=True,
...                              ssl_ca_certs='/path/to/ca.pem')

Or, in the URI:

>>> uri = 'mongodb://example.com/?ssl=true&ssl_ca_certs=/path/to/ca.pem'
>>> client = pymongo.MongoClient(uri)

Specifying a certificate revocation list

Python 2.7.9+ (pypy 2.5.1+) and 3.4+ provide support for certificate revocation lists. The ssl_crlfile option takes a path to a CRL file. It can be passed as a keyword argument:

>>> client = pymongo.MongoClient('example.com',
...                              ssl=True,
...                              ssl_crlfile='/path/to/crl.pem')

Or, in the URI:

>>> uri = 'mongodb://example.com/?ssl=true&ssl_crlfile=/path/to/crl.pem'
>>> client = pymongo.MongoClient(uri)

Client certificates

PyMongo can be configured to present a client certificate using the ssl_certfile option:

>>> client = pymongo.MongoClient('example.com',
...                              ssl=True,
...                              ssl_certfile='/path/to/client.pem')

If the private key for the client certificate is stored in a separate file use the ssl_keyfile option:

>>> client = pymongo.MongoClient('example.com',
...                              ssl=True,
...                              ssl_certfile='/path/to/client.pem',
...                              ssl_keyfile='/path/to/key.pem')

Python 2.7.9+ (pypy 2.5.1+) and 3.3+ support providing a password or passphrase to decrypt encrypted private keys. Use the ssl_pem_passphrase option:

>>> client = pymongo.MongoClient('example.com',
...                              ssl=True,
...                              ssl_certfile='/path/to/client.pem',
...                              ssl_keyfile='/path/to/key.pem',
...                              ssl_pem_passphrase=<passphrase>)

These options can also be passed as part of the MongoDB URI.
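
For example, a sketch of the client certificate options in URI form (the paths are placeholders):

>>> uri = ('mongodb://example.com/?ssl=true'
...        '&ssl_certfile=/path/to/client.pem'
...        '&ssl_keyfile=/path/to/key.pem')
>>> client = pymongo.MongoClient(uri)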

Frequently Asked Questions

Is PyMongo thread-safe?

PyMongo is thread-safe and provides built-in connection pooling for threaded applications.
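
For example, a minimal sketch of one client shared by many threads (the test.counters namespace is illustrative):

import threading

from pymongo import MongoClient

# One MongoClient per process; all threads share its connection pool.
client = MongoClient()

def worker(i):
    client.test.counters.insert_one({'worker': i})

threads = [threading.Thread(target=worker, args=(i,)) for i in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()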

Is PyMongo fork-safe?

PyMongo is not fork-safe. Care must be taken when using instances of MongoClient with fork(). Specifically, instances of MongoClient must not be copied from a parent process to a child process. Instead, the parent process and each child process must create their own instances of MongoClient. Instances of MongoClient copied from the parent process have a high probability of deadlock in the child process due to the inherent incompatibilities between fork(), threads, and locks described below. PyMongo will attempt to issue a warning if there is a chance of this deadlock occurring.

MongoClient spawns multiple threads to run background tasks such as monitoring connected servers. These threads share state that is protected by instances of Lock, which are themselves not fork-safe. The driver is therefore subject to the same limitations as any other multithreaded code that uses Lock (and mutexes in general). One of these limitations is that the locks become useless after fork(). During the fork, all locks are copied over to the child process in the same state as they were in the parent: if they were locked, the copied locks are also locked. The child created by fork() only has one thread, so any locks that were taken out by other threads in the parent will never be released in the child. The next time the child process attempts to acquire one of these locks, deadlock occurs.

For a long but interesting read about the problems of Python locks in multithreaded contexts with fork(), see http://bugs.python.org/issue6721.

How does connection pooling work in PyMongo?

Every MongoClient instance has a built-in connection pool per server in your MongoDB topology. These pools open sockets on demand to support the number of concurrent MongoDB operations that your multi-threaded application requires. There is no thread-affinity for sockets.

The size of each connection pool is capped at maxPoolSize, which defaults to 100. If there are maxPoolSize connections to a server and all are in use, the next request to that server will wait until one of the connections becomes available.

The client instance opens one additional socket per server in your MongoDB topology for monitoring the server’s state.

For example, a client connected to a 3-node replica set opens 3 monitoring sockets. It also opens as many sockets as needed to support a multi-threaded application's concurrent operations on each server, up to maxPoolSize. With a maxPoolSize of 100, if the application only uses the primary (the default), then only the primary connection pool grows and the total number of connections is at most 103. If the application uses a ReadPreference to query the secondaries, their pools also grow and the total can reach 303.

It is possible to set the minimum number of concurrent connections to each server with minPoolSize, which defaults to 0. The connection pool will be initialized with this number of sockets. If sockets are closed due to any network errors, causing the total number of sockets (both in use and idle) to drop below the minimum, more sockets are opened until the minimum is reached.

The maximum number of milliseconds that a connection can remain idle in the pool before being removed and replaced can be set with maxIdleTimeMS, which defaults to None (no limit).

The default configuration for a MongoClient works for most applications:

client = MongoClient(host, port)

Create this client once for each process, and reuse it for all operations. It is a common mistake to create a new client for each request, which is very inefficient.

To support extremely high numbers of concurrent MongoDB operations within one process, increase maxPoolSize:

client = MongoClient(host, port, maxPoolSize=200)

… or make it unbounded:

client = MongoClient(host, port, maxPoolSize=None)

By default, any number of threads are allowed to wait for sockets to become available, and they can wait any length of time. Override waitQueueMultiple to cap the number of waiting threads. E.g., to keep the number of waiters less than or equal to 500:

client = MongoClient(host, port, maxPoolSize=50, waitQueueMultiple=10)

When 500 threads are waiting for a socket, the 501st that needs a socket raises ExceededMaxWaiters. Use this option to bound the amount of queueing in your application during a load spike, at the cost of additional exceptions.

Once the pool reaches its max size, additional threads are allowed to wait indefinitely for sockets to become available, unless you set waitQueueTimeoutMS:

client = MongoClient(host, port, waitQueueTimeoutMS=100)

A thread that waits more than 100 ms (in this example) for a socket raises ConnectionFailure. Use this option if it is more important to bound the duration of operations during a load spike than it is to complete every operation.

When close() is called by any thread, all idle sockets are closed, and all sockets that are in use will be closed as they are returned to the pool.

Does PyMongo support Python 3?

PyMongo supports CPython 3.4+ and PyPy3. See the Python 3 FAQ for details.

Does PyMongo support asynchronous frameworks like Gevent, asyncio, Tornado, or Twisted?

PyMongo fully supports Gevent.

To use MongoDB with asyncio or Tornado, see the Motor project.

For Twisted, see TxMongo. Its stated mission is to keep feature parity with PyMongo.

Why does PyMongo add an _id field to all of my documents?

When a document is inserted to MongoDB using insert_one(), insert_many(), or bulk_write(), and that document does not include an _id field, PyMongo automatically adds one for you, set to an instance of ObjectId. For example:

>>> my_doc = {'x': 1}
>>> collection.insert_one(my_doc)
<pymongo.results.InsertOneResult object at 0x7f3fc25bd640>
>>> my_doc
{'x': 1, '_id': ObjectId('560db337fba522189f171720')}

Users often discover this behavior when a call to insert_many() with a list of references to a single document raises BulkWriteError. Several Python idioms lead to this pitfall:

>>> doc = {}
>>> collection.insert_many(doc for _ in range(10))
Traceback (most recent call last):
...
pymongo.errors.BulkWriteError: batch op errors occurred
>>> doc
{'_id': ObjectId('560f171cfba52279f0b0da0c')}

>>> docs = [{}]
>>> collection.insert_many(docs * 10)
Traceback (most recent call last):
...
pymongo.errors.BulkWriteError: batch op errors occurred
>>> docs
[{'_id': ObjectId('560f1933fba52279f0b0da0e')}]
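
The fix for both idioms is to create a distinct dict per document, for example:

>>> result = collection.insert_many([{} for _ in range(10)])
>>> len(result.inserted_ids)
10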

PyMongo adds an _id field in this manner for a few reasons:

  • All MongoDB documents are required to have an _id field.
  • If PyMongo were to insert a document without an _id MongoDB would add one itself, but it would not report the value back to PyMongo.
  • Copying the document to insert before adding the _id field would be prohibitively expensive for most high write volume applications.

If you don’t want PyMongo to add an _id to your documents, insert only documents that already have an _id field, added by your application.
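
For example (a minimal sketch; an ObjectId is one convenient choice of _id):

>>> from bson.objectid import ObjectId
>>> doc = {'_id': ObjectId(), 'x': 1}
>>> collection.insert_one(doc).inserted_id == doc['_id']
True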

Key order in subdocuments – why does my query work in the shell but not PyMongo?

The key-value pairs in a BSON document can have any order (except that _id is always first). The mongo shell preserves key order when reading and writing data. Observe that “b” comes before “a” when we create the document and when it is displayed:

> // mongo shell.
> db.collection.insert( { "_id" : 1, "subdocument" : { "b" : 1, "a" : 1 } } )
WriteResult({ "nInserted" : 1 })
> db.collection.find()
{ "_id" : 1, "subdocument" : { "b" : 1, "a" : 1 } }

PyMongo represents BSON documents as Python dicts by default, and the order of keys in dicts is not defined. That is, a dict declared with the “a” key first is the same, to Python, as one with “b” first:

>>> print({'a': 1.0, 'b': 1.0})
{'a': 1.0, 'b': 1.0}
>>> print({'b': 1.0, 'a': 1.0})
{'a': 1.0, 'b': 1.0}

Therefore, Python dicts are not guaranteed to show keys in the order they are stored in BSON. Here, “a” is shown before “b”:

>>> print(collection.find_one())
{u'_id': 1.0, u'subdocument': {u'a': 1.0, u'b': 1.0}}

To preserve order when reading BSON, use the SON class, which is a dict that remembers its key order. First, get a handle to the collection, configured to use SON instead of dict:

>>> from bson import CodecOptions, SON
>>> opts = CodecOptions(document_class=SON)
>>> opts
CodecOptions(document_class=<class 'bson.son.SON'>,
             tz_aware=False,
             uuid_representation=PYTHON_LEGACY,
             unicode_decode_error_handler='strict',
             tzinfo=None)
>>> collection_son = collection.with_options(codec_options=opts)

Now, documents and subdocuments in query results are represented with SON objects:

>>> print(collection_son.find_one())
SON([(u'_id', 1.0), (u'subdocument', SON([(u'b', 1.0), (u'a', 1.0)]))])

The subdocument’s actual storage layout is now visible: “b” is before “a”.

Because a dict’s key order is not defined, you cannot predict how it will be serialized to BSON. But MongoDB considers subdocuments equal only if their keys have the same order. So if you use a dict to query on a subdocument it may not match:

>>> collection.find_one({'subdocument': {'a': 1.0, 'b': 1.0}}) is None
True

Swapping the key order in your query makes no difference:

>>> collection.find_one({'subdocument': {'b': 1.0, 'a': 1.0}}) is None
True

… because, as we saw above, Python considers the two dicts the same.

There are two solutions. First, you can match the subdocument field-by-field:

>>> collection.find_one({'subdocument.a': 1.0,
...                      'subdocument.b': 1.0})
{u'_id': 1.0, u'subdocument': {u'a': 1.0, u'b': 1.0}}

The query matches any subdocument with an “a” of 1.0 and a “b” of 1.0, regardless of the order you specify them in Python or the order they are stored in BSON. Additionally, this query now matches subdocuments with additional keys besides “a” and “b”, whereas the previous query required an exact match.

The second solution is to use a SON to specify the key order:

>>> query = {'subdocument': SON([('b', 1.0), ('a', 1.0)])}
>>> collection.find_one(query)
{u'_id': 1.0, u'subdocument': {u'a': 1.0, u'b': 1.0}}

The key order you use when you create a SON is preserved when it is serialized to BSON and used as a query. Thus you can create a subdocument that exactly matches the subdocument in the collection.

What does CursorNotFound cursor id not valid at server mean?

Cursors in MongoDB can time out on the server if they've been open for a long time without any operations being performed on them. This can lead to a CursorNotFound exception being raised when attempting to iterate the cursor.

How do I change the timeout value for cursors?

MongoDB doesn’t support custom timeouts for cursors, but cursor timeouts can be turned off entirely. Pass no_cursor_timeout=True to find().
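
For example (a sketch; be sure to close the cursor yourself when server timeouts are disabled):

>>> cursor = collection.find({}, no_cursor_timeout=True)
>>> try:
...     for doc in cursor:
...         pass  # Process each document.
... finally:
...     cursor.close()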

How can I store decimal.Decimal instances?

PyMongo >= 3.4 supports the Decimal128 BSON type introduced in MongoDB 3.4. See decimal128 for more information.

MongoDB <= 3.2 only supports IEEE 754 floating points - the same as the Python float type. The only way PyMongo could store Decimal instances to these versions of MongoDB would be to convert them to this standard, so you’d really only be storing floats anyway - we force users to do this conversion explicitly so that they are aware that it is happening.
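
For example, a brief sketch of each approach, Decimal128 for MongoDB 3.4+ and an explicit float conversion for older servers:

>>> from decimal import Decimal
>>> from bson.decimal128 import Decimal128
>>> price = Decimal('9.99')
>>> result = collection.insert_one({'price': Decimal128(price)})  # MongoDB 3.4+
>>> result = collection.insert_one({'price': float(price)})  # MongoDB <= 3.2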

I’m saving 9.99 but when I query my document contains 9.9900000000000002 - what’s going on here?

The database representation is 9.99 as an IEEE floating point (which is common to MongoDB and Python as well as most other modern languages). The problem is that 9.99 cannot be represented exactly with a double precision floating point - this is true in some versions of Python as well:

>>> 9.99
9.9900000000000002

The result that you get when you save 9.99 with PyMongo is exactly the same as the result you’d get saving it with the JavaScript shell or any of the other languages (and as the data you’re working with when you type 9.99 into a Python program).

Can you add attribute style access for documents?

This request has come up a number of times but we've decided not to implement anything like this. The relevant JIRA case has some information about the decision, but here is a brief summary:

  1. This will pollute the attribute namespace for documents, so could lead to subtle bugs / confusing errors when using a key with the same name as a dictionary method.
  2. The only reason we even use SON objects instead of regular dictionaries is to maintain key ordering, since the server requires this for certain operations. So we’re hesitant to needlessly complicate SON (at some point it’s hypothetically possible we might want to revert back to using dictionaries alone, without breaking backwards compatibility for everyone).
  3. It’s easy (and Pythonic) for new users to deal with documents, since they behave just like dictionaries. If we start changing their behavior it adds a barrier to entry for new users - another class to learn.

What is the correct way to handle time zones with PyMongo?

See Datetimes and Timezones for examples on how to handle datetime objects correctly.

How can I save a datetime.date instance?

PyMongo doesn’t support saving datetime.date instances, since there is no BSON type for dates without times. Rather than having the driver enforce a convention for converting datetime.date instances to datetime.datetime instances for you, any conversion should be performed in your client code.
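
For example, a minimal sketch of one such convention (midnight of the given day):

>>> from datetime import date, datetime
>>> d = date(2017, 12, 1)
>>> result = collection.insert_one({'when': datetime(d.year, d.month, d.day)})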

When I query for a document by ObjectId in my web application I get no result

It’s common in web applications to encode documents’ ObjectIds in URLs, like:

"/posts/50b3bda58a02fb9a84d8991e"

Your web framework will pass the ObjectId portion of the URL to your request handler as a string, so it must be converted to ObjectId before it is passed to find_one(). It is a common mistake to forget to do this conversion. Here’s how to do it correctly in Flask (other web frameworks are similar):

from pymongo import MongoClient
from bson.objectid import ObjectId

from flask import Flask, render_template

client = MongoClient()
app = Flask(__name__)

@app.route("/posts/<_id>")
def show_post(_id):
    # NOTE: convert _id from string to ObjectId before passing to find_one().
    post = client.db.posts.find_one({'_id': ObjectId(_id)})
    return render_template('post.html', post=post)

if __name__ == "__main__":
    app.run()

How can I use PyMongo from Django?

Django is a popular Python web framework. Django includes an ORM, django.db. Currently, there’s no official MongoDB backend for Django.

django-mongodb-engine is an unofficial MongoDB backend that supports Django aggregations, (atomic) updates, embedded objects, Map/Reduce and GridFS. It allows you to use most of Django’s built-in features, including the ORM, admin, authentication, site and session frameworks and caching.

However, it’s easy to use MongoDB (and PyMongo) from Django without using a Django backend. Certain features of Django that require django.db (admin, authentication and sessions) will not work using just MongoDB, but most of what Django provides can still be used.

One project which should make working with MongoDB and Django easier is mango. Mango is a set of MongoDB backends for Django sessions and authentication (bypassing django.db entirely).

Does PyMongo work with mod_wsgi?

Yes. See the configuration guide for PyMongo and mod_wsgi.

How can I use something like Python’s json module to encode my documents to JSON?

json_util is PyMongo's built-in, flexible tool for using Python's json module with BSON documents and MongoDB Extended JSON. The json module won't work out of the box with all documents from PyMongo as PyMongo supports some special types (like ObjectId and DBRef) that are not supported in JSON.

python-bsonjs is a fast BSON to MongoDB Extended JSON converter built on top of libbson. python-bsonjs does not depend on PyMongo and can offer a nice performance improvement over json_util. python-bsonjs works best with PyMongo when using RawBSONDocument.

Why do I get OverflowError decoding dates stored by another language’s driver?

PyMongo decodes BSON datetime values to instances of Python’s datetime.datetime. Instances of datetime.datetime are limited to years between datetime.MINYEAR (usually 1) and datetime.MAXYEAR (usually 9999). Some MongoDB drivers (e.g. the PHP driver) can store BSON datetimes with year values far outside those supported by datetime.datetime.

There are a few ways to work around this issue. One option is to filter out documents with values outside of the range supported by datetime.datetime:

>>> from datetime import datetime
>>> coll = client.test.dates
>>> cur = coll.find({'dt': {'$gte': datetime.min, '$lte': datetime.max}})

Another option, assuming you don’t need the datetime field, is to filter out just that field:

>>> cur = coll.find({}, projection={'dt': False})

Using PyMongo with Multiprocessing

On Unix systems the multiprocessing module spawns processes using fork(). Care must be taken when using instances of MongoClient with fork(). Specifically, instances of MongoClient must not be copied from a parent process to a child process. Instead, the parent process and each child process must create their own instances of MongoClient. For example:

import multiprocessing

import pymongo

# Each process creates its own instance of MongoClient.
def func():
    db = pymongo.MongoClient().mydb
    # Do something with db.

proc = multiprocessing.Process(target=func)
proc.start()

Never do this:

client = pymongo.MongoClient()

# Each child process attempts to copy a global MongoClient
# created in the parent process. Never do this.
def func():
    db = client.mydb
    # Do something with db.

proc = multiprocessing.Process(target=func)
proc.start()

Instances of MongoClient copied from the parent process have a high probability of deadlock in the child process due to inherent incompatibilities between fork(), threads, and locks. PyMongo will attempt to issue a warning if there is a chance of this deadlock occurring.

Compatibility Policy

Semantic Versioning

PyMongo’s version numbers follow semantic versioning: each version number is structured “major.minor.patch”. Patch releases fix bugs, minor releases add features (and may fix bugs), and major releases include API changes that break backwards compatibility (and may add features and fix bugs).

Deprecation

Before we remove a feature in a major release, PyMongo’s maintainers make an effort to release at least one minor version that deprecates it. We add “DEPRECATED” to the feature’s documentation, and update the code to raise a DeprecationWarning. You can ensure your code is future-proof by running your code with the latest PyMongo release and looking for DeprecationWarnings.

Starting with Python 2.7, the interpreter silences DeprecationWarnings by default. For example, the following code uses the deprecated insert method but does not raise any warning:

# "insert.py"
from pymongo import MongoClient

client = MongoClient()
client.test.test.insert({})

To print deprecation warnings to stderr, run python with “-Wd”:

$ python -Wd insert.py
insert.py:4: DeprecationWarning: insert is deprecated. Use insert_one or insert_many instead.
  client.test.test.insert({})

You can turn warnings into exceptions with “python -We”:

$ python -We insert.py
Traceback (most recent call last):
  File "insert.py", line 4, in <module>
    client.test.test.insert({})
  File "/home/durin/work/mongo-python-driver/pymongo/collection.py", line 2906, in insert
    "instead.", DeprecationWarning, stacklevel=2)
DeprecationWarning: insert is deprecated. Use insert_one or insert_many instead.

If your own code’s test suite passes with “python -We” then it uses no deprecated PyMongo features.
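
The same check can be made programmatically, for example in a test suite's setup, using the standard warnings module:

import warnings

# Raise an exception on any DeprecationWarning, as "python -We" does.
warnings.simplefilter('error', DeprecationWarning)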

See also

The Python documentation on the warnings module, and the -W command line option.

API Documentation

The PyMongo distribution contains three top-level packages for interacting with MongoDB. bson is an implementation of the BSON format, pymongo is a full-featured driver for MongoDB, and gridfs is a set of tools for working with the GridFS storage specification.

bson – BSON (Binary JSON) Encoding and Decoding

BSON (Binary JSON) encoding and decoding.

The mapping from Python types to BSON types is as follows:

Python Type                BSON Type       Supported Direction
None                       null            both
bool                       boolean         both
int [1]                    int32 / int64   py -> bson
long                       int64           py -> bson
bson.int64.Int64           int64           both
float                      number (real)   both
string                     string          py -> bson
unicode                    string          both
list                       array           both
dict / SON                 object          both
datetime.datetime [2] [3]  date            both
bson.regex.Regex           regex           both
compiled re [4]            regex           py -> bson
bson.binary.Binary         binary          both
bson.objectid.ObjectId     oid             both
bson.dbref.DBRef           dbref           both
None                       undefined       bson -> py
unicode                    code            bson -> py
bson.code.Code             code            py -> bson
unicode                    symbol          bson -> py
bytes (Python 3) [5]       binary          both

Note that on Python 2.x, binary data must be wrapped as an instance of bson.binary.Binary to be saved; otherwise it will be saved as a BSON string and retrieved as unicode. Users of Python 3.x can use the Python bytes type.

[1] A Python int will be saved as a BSON int32 or BSON int64 depending on its size. A BSON int32 will always decode to a Python int. A BSON int64 will always decode to an Int64.
[2] datetime.datetime instances will be rounded to the nearest millisecond when saved.
[3] All datetime.datetime instances are treated as naive. Clients should always use UTC.
[4] Regex instances and regular expression objects from re.compile() are both saved as BSON regular expressions. BSON regular expressions are decoded as Regex instances.
[5] The bytes type from Python 3.x is encoded as BSON binary with subtype 0. In Python 3.x it will be decoded back to bytes. In Python 2.x it will be decoded to an instance of Binary with subtype 0.
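
As a brief illustration of the table and notes above, here is a minimal round-trip sketch using this distribution's bson package:

>>> import bson
>>> from bson.binary import Binary
>>> from bson.int64 import Int64
>>> # Small ints encode as int32; Int64 always encodes as int64 (footnote [1]).
>>> data = bson.BSON.encode({'small': 1, 'big': Int64(1)})
>>> # Wrapping bytes in Binary keeps them distinct from BSON strings on Python 2.
>>> bson.BSON.decode(bson.BSON.encode({'payload': Binary(b'\x00\x01')}))['payload']
Binary('\x00\x01', 0)
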
class bson.BSON

BSON (Binary JSON) data.

decode(codec_options=CodecOptions(document_class=dict, tz_aware=False, uuid_representation=PYTHON_LEGACY, unicode_decode_error_handler='strict', tzinfo=None))

Decode this BSON data.

By default, returns a BSON document represented as a Python dict. To use a different MutableMapping class, configure a CodecOptions:

>>> import collections  # From Python standard library.
>>> import bson
>>> from bson.codec_options import CodecOptions
>>> data = bson.BSON.encode({'a': 1})
>>> decoded_doc = bson.BSON.decode(data)
>>> type(decoded_doc)
<type 'dict'>
>>> options = CodecOptions(document_class=collections.OrderedDict)
>>> decoded_doc = bson.BSON.decode(data, codec_options=options)
>>> type(decoded_doc)
<class 'collections.OrderedDict'>

Parameters:
  • codec_options (optional): An instance of CodecOptions.

Changed in version 3.0: Removed compile_re option: PyMongo now always represents BSON regular expressions as Regex objects. Use try_compile() to attempt to convert from a BSON regular expression to a Python regular expression object.

Replaced as_class, tz_aware, and uuid_subtype options with codec_options.

Changed in version 2.7: Added compile_re option. If set to False, PyMongo represented BSON regular expressions as Regex objects instead of attempting to compile BSON regular expressions as Python native regular expressions, thus preventing errors for some incompatible patterns, see PYTHON-500.

classmethod encode(document, check_keys=False, codec_options=CodecOptions(document_class=dict, tz_aware=False, uuid_representation=PYTHON_LEGACY, unicode_decode_error_handler='strict', tzinfo=None))

Encode a document to a new BSON instance.

A document can be any mapping type (like dict).

Raises TypeError if document is not a mapping type, or contains keys that are not instances of basestring (str in python 3). Raises InvalidDocument if document cannot be converted to BSON.

Parameters:
  • document: mapping type representing a document
  • check_keys (optional): check if keys start with ‘$’ or contain ‘.’, raising InvalidDocument in either case
  • codec_options (optional): An instance of CodecOptions.

Changed in version 3.0: Replaced uuid_subtype option with codec_options.

bson.decode_all(data, codec_options=CodecOptions(document_class=dict, tz_aware=False, uuid_representation=PYTHON_LEGACY, unicode_decode_error_handler='strict', tzinfo=None))

Decode BSON data to multiple documents.

data must be a string of concatenated, valid, BSON-encoded documents.

Parameters:
  • data: BSON data
  • codec_options (optional): An instance of CodecOptions.

Changed in version 3.0: Removed compile_re option: PyMongo now always represents BSON regular expressions as Regex objects. Use try_compile() to attempt to convert from a BSON regular expression to a Python regular expression object.

Replaced as_class, tz_aware, and uuid_subtype options with codec_options.

Changed in version 2.7: Added compile_re option. If set to False, PyMongo represented BSON regular expressions as Regex objects instead of attempting to compile BSON regular expressions as Python native regular expressions, thus preventing errors for some incompatible patterns, see PYTHON-500.

bson.decode_file_iter(file_obj, codec_options=CodecOptions(document_class=dict, tz_aware=False, uuid_representation=PYTHON_LEGACY, unicode_decode_error_handler='strict', tzinfo=None))

Decode BSON data from a file to multiple documents as a generator.

Works similarly to the decode_all function, but reads from the file object in chunks and parses BSON in chunks, yielding one document at a time.

Parameters:
  • file_obj: A file object containing BSON data.
  • codec_options (optional): An instance of CodecOptions.

Changed in version 3.0: Replaced as_class, tz_aware, and uuid_subtype options with codec_options.

New in version 2.8.

bson.decode_iter(data, codec_options=CodecOptions(document_class=dict, tz_aware=False, uuid_representation=PYTHON_LEGACY, unicode_decode_error_handler='strict', tzinfo=None))

Decode BSON data to multiple documents as a generator.

Works similarly to the decode_all function, but yields one document at a time.

data must be a string of concatenated, valid, BSON-encoded documents.

Parameters:
  • data: BSON data
  • codec_options (optional): An instance of CodecOptions.

Changed in version 3.0: Replaced as_class, tz_aware, and uuid_subtype options with codec_options.

New in version 2.8.

bson.gen_list_name()

Generate “keys” for encoded lists in the sequence b"0", b"1", b"2", …

The first 1000 keys are returned from a pre-built cache. All subsequent keys are generated on the fly.

bson.has_c()

Is the C extension installed?

bson.is_valid(bson)

Check that the given string represents valid BSON data.

Raises TypeError if bson is not an instance of str (bytes in python 3). Returns True if bson is valid BSON, False otherwise.

Parameters:
  • bson: the data to be validated
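
For example (a quick sketch):

>>> import bson
>>> bson.is_valid(bson.BSON.encode({'a': 1}))
True
>>> bson.is_valid(b'not bson')
False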

Sub-modules:

binary – Tools for representing binary data to be stored in MongoDB
bson.binary.BINARY_SUBTYPE = 0

BSON binary subtype for binary data.

This is the default subtype for binary data.

bson.binary.FUNCTION_SUBTYPE = 1

BSON binary subtype for functions.

bson.binary.OLD_BINARY_SUBTYPE = 2

Old BSON binary subtype for binary data.

This is the old default subtype; the current default is BINARY_SUBTYPE.

bson.binary.OLD_UUID_SUBTYPE = 3

Old BSON binary subtype for a UUID.

uuid.UUID instances will automatically be encoded by bson using this subtype.

New in version 2.1.

bson.binary.UUID_SUBTYPE = 4

BSON binary subtype for a UUID.

This is the new BSON binary subtype for UUIDs. The current default is OLD_UUID_SUBTYPE but will change to this in a future release.

Changed in version 2.1: Changed to subtype 4.

bson.binary.STANDARD = 4

The standard UUID representation.

uuid.UUID instances will automatically be encoded to and decoded from BSON binary, using RFC-4122 byte order with binary subtype UUID_SUBTYPE.

New in version 3.0.

bson.binary.PYTHON_LEGACY = 3

The Python legacy UUID representation.

uuid.UUID instances will automatically be encoded to and decoded from BSON binary, using RFC-4122 byte order with binary subtype OLD_UUID_SUBTYPE.

New in version 3.0.

bson.binary.JAVA_LEGACY = 5

The Java legacy UUID representation.

uuid.UUID instances will automatically be encoded to and decoded from BSON binary subtype OLD_UUID_SUBTYPE, using the Java driver’s legacy byte order.

Changed in version 3.6: BSON binary subtype 4 is decoded using RFC-4122 byte order.

New in version 2.3.

bson.binary.CSHARP_LEGACY = 6

The C#/.net legacy UUID representation.

uuid.UUID instances will automatically be encoded to and decoded from BSON binary subtype OLD_UUID_SUBTYPE, using the C# driver’s legacy byte order.

Changed in version 3.6: BSON binary subtype 4 is decoded using RFC-4122 byte order.

New in version 2.3.

bson.binary.MD5_SUBTYPE = 5

BSON binary subtype for an MD5 hash.

bson.binary.USER_DEFINED_SUBTYPE = 128

BSON binary subtype for any user defined structure.

class bson.binary.Binary(data, subtype=BINARY_SUBTYPE)

Bases: bytes

Representation of BSON binary data.

This is necessary because we want to represent Python strings as the BSON string type. We need to wrap binary data so we can tell the difference between what should be considered binary data and what should be considered a string when we encode to BSON.

Raises TypeError if data is not an instance of str (bytes in python 3) or subtype is not an instance of int. Raises ValueError if subtype is not in [0, 256).

Note

In python 3 instances of Binary with subtype 0 will be decoded directly to bytes.

Parameters:
  • data: the binary data to represent
  • subtype (optional): the binary subtype to use

subtype

Subtype of this binary data.

class bson.binary.UUIDLegacy(obj)

Bases: bson.binary.Binary

UUID wrapper to support working with UUIDs stored as PYTHON_LEGACY.

>>> import uuid
>>> from bson.binary import Binary, UUIDLegacy, STANDARD
>>> from bson.codec_options import CodecOptions
>>> my_uuid = uuid.uuid4()
>>> coll = db.get_collection('test',
...                          CodecOptions(uuid_representation=STANDARD))
>>> coll.insert_one({'uuid': Binary(my_uuid.bytes, 3)}).inserted_id
ObjectId('...')
>>> coll.find({'uuid': my_uuid}).count()
0
>>> coll.find({'uuid': UUIDLegacy(my_uuid)}).count()
1
>>> coll.find({'uuid': UUIDLegacy(my_uuid)})[0]['uuid']
UUID('...')
>>>
>>> # Convert from subtype 3 to subtype 4
>>> doc = coll.find_one({'uuid': UUIDLegacy(my_uuid)})
>>> coll.replace_one({"_id": doc["_id"]}, doc).matched_count
1
>>> coll.find({'uuid': UUIDLegacy(my_uuid)}).count()
0
>>> coll.find({'uuid': {'$in': [UUIDLegacy(my_uuid), my_uuid]}}).count()
1
>>> coll.find_one({'uuid': my_uuid})['uuid']
UUID('...')

Raises TypeError if obj is not an instance of UUID.

Parameters:
  • obj: An instance of UUID.

uuid

UUID instance wrapped by this UUIDLegacy instance.

code – Tools for representing JavaScript code

Tools for representing JavaScript code in BSON.

class bson.code.Code(code, scope=None, **kwargs)

Bases: str

BSON’s JavaScript code type.

Raises TypeError if code is not an instance of basestring (str in python 3) or scope is not None or an instance of dict.

Scope variables can be set by passing a dictionary as the scope argument or by using keyword arguments. If a variable is set as a keyword argument it will override any setting for that variable in the scope dictionary.

Parameters:
  • code: A string containing JavaScript code to be evaluated or another instance of Code. In the latter case, the scope of code becomes this Code’s scope.
  • scope (optional): dictionary representing the scope in which code should be evaluated - a mapping from identifiers (as strings) to values. Defaults to None. This is applied after any scope associated with a given code above.
  • **kwargs (optional): scope variables can also be passed as keyword arguments. These are applied after scope and code.

Changed in version 3.4: The default value for scope is None instead of {}.

scope

Scope dictionary for this instance or None.
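
For example, a brief sketch of combining a scope dict with keyword arguments:

>>> from bson.code import Code
>>> c = Code('function () { return x + y; }', {'x': 1}, y=2)
>>> c.scope == {'x': 1, 'y': 2}
True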

codec_options – Tools for specifying BSON codec options

Tools for specifying BSON codec options.

class bson.codec_options.CodecOptions

Encapsulates BSON options used in CRUD operations.

Parameters:
  • document_class: BSON documents returned in queries will be decoded to an instance of this class. Must be a subclass of MutableMapping. Defaults to dict.
  • tz_aware: If True, BSON datetimes will be decoded to timezone aware instances of datetime. Otherwise they will be naive. Defaults to False.
  • uuid_representation: The BSON representation to use when encoding and decoding instances of UUID. Defaults to PYTHON_LEGACY.
  • unicode_decode_error_handler: The error handler to use when decoding an invalid BSON string. Valid options include ‘strict’, ‘replace’, and ‘ignore’. Defaults to ‘strict’.
  • tzinfo: A tzinfo subclass that specifies the timezone to/from which datetime objects should be encoded/decoded.

Warning

Care must be taken when changing unicode_decode_error_handler from its default value (‘strict’). The ‘replace’ and ‘ignore’ modes should not be used when documents retrieved from the server will be modified in the client application and stored back to the server.
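
For example, a brief sketch of requesting timezone-aware datetimes (collection stands in for any existing Collection):

>>> from bson.codec_options import CodecOptions
>>> aware_coll = collection.with_options(
...     codec_options=CodecOptions(tz_aware=True))
>>> doc = aware_coll.find_one()  # BSON datetimes decode as aware UTC datetimes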

with_options(**kwargs)

Make a copy of this CodecOptions, overriding some options:

>>> from bson.codec_options import DEFAULT_CODEC_OPTIONS
>>> DEFAULT_CODEC_OPTIONS.tz_aware
False
>>> options = DEFAULT_CODEC_OPTIONS.with_options(tz_aware=True)
>>> options.tz_aware
True

New in version 3.5.

dbref – Tools for manipulating DBRefs (references to documents stored in MongoDB)

Tools for manipulating DBRefs (references to MongoDB documents).

class bson.dbref.DBRef(collection, id, database=None, _extra={}, **kwargs)

Initialize a new DBRef.

Raises TypeError if collection or database is not an instance of basestring (str in python 3). database is optional and allows references to documents to work across databases. Any additional keyword arguments will create additional fields in the resultant embedded document.

Parameters:
  • collection: name of the collection the document is stored in
  • id: the value of the document’s "_id" field
  • database (optional): name of the database to reference
  • **kwargs (optional): additional keyword arguments will create additional, custom fields
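
For example (a sketch; the owner keyword becomes a custom field in the reference):

>>> from bson.dbref import DBRef
>>> from bson.objectid import ObjectId
>>> ref = DBRef('posts', ObjectId('50b3bda58a02fb9a84d8991e'),
...             database='blog', owner='alice')
>>> ref.id
ObjectId('50b3bda58a02fb9a84d8991e')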

See also

The MongoDB documentation on dbrefs.

as_doc()

Get the SON document representation of this DBRef.

Generally not needed by application developers.

collection

Get the name of this DBRef’s collection as unicode.

database

Get the name of this DBRef’s database.

Returns None if this DBRef doesn’t specify a database.

id

Get this DBRef’s _id.

decimal128 – Support for BSON Decimal128

Tools for working with the BSON decimal128 type.

New in version 3.4.

Note

The Decimal128 BSON type requires MongoDB 3.4+.

class bson.decimal128.Decimal128(value)

BSON Decimal128 type:

>>> Decimal128(Decimal("0.0005"))
Decimal128('0.0005')
>>> Decimal128("0.0005")
Decimal128('0.0005')
>>> Decimal128((3474527112516337664, 5))
Decimal128('0.0005')

Parameters:
  • value: An instance of decimal.Decimal, string, or tuple of (high bits, low bits) from Binary Integer Decimal (BID) format.

Note

Decimal128 uses an instance of decimal.Context configured for IEEE-754 Decimal128 when validating parameters. Signals like decimal.InvalidOperation, decimal.Inexact, and decimal.Overflow are trapped and raised as exceptions:

>>> Decimal128(".13.1")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  ...
decimal.InvalidOperation: [<class 'decimal.ConversionSyntax'>]
>>>
>>> Decimal128("1E-6177")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  ...
decimal.Inexact: [<class 'decimal.Inexact'>]
>>>
>>> Decimal128("1E6145")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  ...
decimal.Overflow: [<class 'decimal.Overflow'>, <class 'decimal.Rounded'>]

To ensure the result of a calculation can always be stored as BSON Decimal128 use the context returned by create_decimal128_context():

>>> import decimal
>>> decimal128_ctx = create_decimal128_context()
>>> with decimal.localcontext(decimal128_ctx) as ctx:
...     Decimal128(ctx.create_decimal(".13.3"))
...
Decimal128('NaN')
>>>
>>> with decimal.localcontext(decimal128_ctx) as ctx:
...     Decimal128(ctx.create_decimal("1E-6177"))
...
Decimal128('0E-6176')
>>>
>>> with decimal.localcontext(decimal128_ctx) as ctx:
...     Decimal128(ctx.create_decimal("1E6145"))
...
Decimal128('Infinity')

To match the behavior of MongoDB's Decimal128 implementation, str(Decimal128(value)) may not match str(Decimal(value)) for NaN values:

>>> Decimal128(Decimal('NaN'))
Decimal128('NaN')
>>> Decimal128(Decimal('-NaN'))
Decimal128('NaN')
>>> Decimal128(Decimal('sNaN'))
Decimal128('NaN')
>>> Decimal128(Decimal('-sNaN'))
Decimal128('NaN')

However, to_decimal() will return the exact value:

>>> Decimal128(Decimal('NaN')).to_decimal()
Decimal('NaN')
>>> Decimal128(Decimal('-NaN')).to_decimal()
Decimal('-NaN')
>>> Decimal128(Decimal('sNaN')).to_decimal()
Decimal('sNaN')
>>> Decimal128(Decimal('-sNaN')).to_decimal()
Decimal('-sNaN')

Two instances of Decimal128 compare equal if their Binary Integer Decimal encodings are equal:

>>> Decimal128('NaN') == Decimal128('NaN')
True
>>> Decimal128('NaN').bid == Decimal128('NaN').bid
True

This differs from decimal.Decimal comparisons for NaN:

>>> Decimal('NaN') == Decimal('NaN')
False

bid

The Binary Integer Decimal (BID) encoding of this instance.

classmethod from_bid(value)

Create an instance of Decimal128 from a Binary Integer Decimal (BID) string.

Parameters:
  • value: 16 byte string (128-bit IEEE 754-2008 decimal floating point in Binary Integer Decimal (BID) format).

to_decimal()

Returns an instance of decimal.Decimal for this Decimal128.

bson.decimal128.create_decimal128_context()

Returns an instance of decimal.Context appropriate for working with IEEE-754 128-bit decimal floating point values.

errors – Exceptions raised by the bson package

Exceptions raised by the BSON package.

exception bson.errors.BSONError

Base class for all BSON exceptions.

exception bson.errors.InvalidBSON

Raised when trying to create a BSON object from invalid data.

exception bson.errors.InvalidDocument

Raised when trying to create a BSON object from an invalid document.

exception bson.errors.InvalidId

Raised when trying to create an ObjectId from invalid data.

exception bson.errors.InvalidStringData

Raised when trying to encode a string containing non-UTF8 data.

int64 – Tools for representing BSON int64

New in version 3.0.

A BSON wrapper for long (int in python3)

class bson.int64.Int64

Representation of the BSON int64 type.

This is necessary because every integral number is an int in Python 3. Small integral numbers are encoded to BSON int32 by default, but Int64 numbers will always be encoded to BSON int64.

Parameters:
  • value: the numeric value to represent

json_util – Tools for using Python’s json module with BSON documents

Tools for using Python’s json module with BSON documents.

This module provides two helper methods dumps and loads that wrap the native json methods and provide explicit BSON conversion to and from JSON. JSONOptions provides a way to control how JSON is emitted and parsed, with the default being the legacy PyMongo format. json_util can also generate Canonical or Relaxed Extended JSON when CANONICAL_JSON_OPTIONS or RELAXED_JSON_OPTIONS is provided, respectively.

Example usage (deserialization):

>>> from bson.json_util import loads
>>> loads('[{"foo": [1, 2]}, {"bar": {"hello": "world"}}, {"code": {"$scope": {}, "$code": "function x() { return 1; }"}}, {"bin": {"$type": "80", "$binary": "AQIDBA=="}}]')
[{u'foo': [1, 2]}, {u'bar': {u'hello': u'world'}}, {u'code': Code('function x() { return 1; }', {})}, {u'bin': Binary('...', 128)}]

Example usage (serialization):

>>> from bson import Binary, Code
>>> from bson.json_util import dumps
>>> dumps([{'foo': [1, 2]},
...        {'bar': {'hello': 'world'}},
...        {'code': Code("function x() { return 1; }", {})},
...        {'bin': Binary(b"\x01\x02\x03\x04")}])
'[{"foo": [1, 2]}, {"bar": {"hello": "world"}}, {"code": {"$code": "function x() { return 1; }", "$scope": {}}}, {"bin": {"$binary": "AQIDBA==", "$type": "00"}}]'

Example usage (with CANONICAL_JSON_OPTIONS):

>>> from bson import Binary, Code
>>> from bson.json_util import dumps, CANONICAL_JSON_OPTIONS
>>> dumps([{'foo': [1, 2]},
...        {'bar': {'hello': 'world'}},
...        {'code': Code("function x() { return 1; }")},
...        {'bin': Binary(b"\x01\x02\x03\x04")}],
...       json_options=CANONICAL_JSON_OPTIONS)
'[{"foo": [{"$numberInt": "1"}, {"$numberInt": "2"}]}, {"bar": {"hello": "world"}}, {"code": {"$code": "function x() { return 1; }"}}, {"bin": {"$binary": {"base64": "AQIDBA==", "subType": "00"}}}]'

Example usage (with RELAXED_JSON_OPTIONS):

>>> from bson import Binary, Code
>>> from bson.json_util import dumps, RELAXED_JSON_OPTIONS
>>> dumps([{'foo': [1, 2]},
...        {'bar': {'hello': 'world'}},
...        {'code': Code("function x() { return 1; }")},
...        {'bin': Binary(b"\x01\x02\x03\x04")}],
...       json_options=RELAXED_JSON_OPTIONS)
'[{"foo": [1, 2]}, {"bar": {"hello": "world"}}, {"code": {"$code": "function x() { return 1; }"}}, {"bin": {"$binary": {"base64": "AQIDBA==", "subType": "00"}}}]'

Alternatively, you can manually pass the default to json.dumps(). It won’t handle Binary and Code instances (as they are extended strings you can’t provide custom defaults), but it will be faster as there is less recursion.
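
For example (a minimal sketch):

>>> import json
>>> from bson.json_util import default
>>> from bson.objectid import ObjectId
>>> json.dumps({'_id': ObjectId('5889e3a94846f77b90472a45')}, default=default)
'{"_id": {"$oid": "5889e3a94846f77b90472a45"}}'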

Note

If your application does not need the flexibility offered by JSONOptions and spends a large amount of time in the json_util module, look to python-bsonjs for a nice performance improvement. python-bsonjs is a fast BSON to MongoDB Extended JSON converter for Python built on top of libbson. python-bsonjs works best with PyMongo when using RawBSONDocument.

Changed in version 2.8: The output format for Timestamp has changed from ‘{“t”: <int>, “i”: <int>}’ to ‘{“$timestamp”: {“t”: <int>, “i”: <int>}}’. This new format will be decoded to an instance of Timestamp. The old format will continue to be decoded to a python dict as before. Encoding to the old format is no longer supported as it was never correct and loses type information. Added support for $numberLong and $undefined - new in MongoDB 2.6 - and parsing $date in ISO-8601 format.

Changed in version 2.7: Preserves order when rendering SON, Timestamp, Code, Binary, and DBRef instances.

Changed in version 2.3: Added dumps and loads helpers to automatically handle conversion to and from json and supports Binary and Code

class bson.json_util.DatetimeRepresentation
LEGACY = 0

Legacy MongoDB Extended JSON datetime representation.

datetime.datetime instances will be encoded to JSON in the format {“$date”: <dateAsMilliseconds>}, where dateAsMilliseconds is a 64-bit signed integer giving the number of milliseconds since the Unix epoch UTC. This was the default encoding before PyMongo version 3.4.

New in version 3.4.

NUMBERLONG = 1

NumberLong datetime representation.

datetime.datetime instances will be encoded to JSON in the format {“$date”: {“$numberLong”: “<dateAsMilliseconds>”}}, where dateAsMilliseconds is the string representation of a 64-bit signed integer giving the number of milliseconds since the Unix epoch UTC.

New in version 3.4.

ISO8601 = 2

ISO-8601 datetime representation.

datetime.datetime instances greater than or equal to the Unix epoch UTC will be encoded to JSON in the format {“$date”: “<ISO-8601>”}. datetime.datetime instances before the Unix epoch UTC will be encoded as if the datetime representation is NUMBERLONG.

New in version 3.4.
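
For example (a minimal sketch), the representation is selected through JSONOptions:

>>> import datetime
>>> from bson.json_util import dumps, JSONOptions, DatetimeRepresentation
>>> opts = JSONOptions(datetime_representation=DatetimeRepresentation.ISO8601)
>>> dumps({'when': datetime.datetime(2018, 1, 1)}, json_options=opts)
'{"when": {"$date": "2018-01-01T00:00:00Z"}}'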

class bson.json_util.JSONMode
LEGACY = 0

Legacy Extended JSON representation.

In this mode, dumps() produces PyMongo’s legacy non-standard JSON output. Consider using RELAXED or CANONICAL instead.

New in version 3.5.

RELAXED = 1

Relaxed Extended JSON representation.

In this mode, dumps() produces Relaxed Extended JSON, a mostly JSON-like format. Consider using this for things like a web API, where one is sending a document (or a projection of a document) that only uses ordinary JSON type primitives. In particular, the int, Int64, and float numeric types are represented in the native JSON number format. This output is also the most human readable and is useful for debugging and documentation.

See also

The specification for Relaxed Extended JSON.

New in version 3.5.

CANONICAL = 2

Canonical Extended JSON representation.

In this mode, dumps() produces Canonical Extended JSON, a type preserving format. Consider using this for things like testing, where one has to precisely specify expected types in JSON. In particular, the int, Int64, and float numeric types are encoded with type wrappers.

See also

The specification for Canonical Extended JSON.

New in version 3.5.
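
The three modes differ mainly in how much type information they preserve. A minimal sketch using an Int64 value:

>>> from bson.int64 import Int64
>>> from bson.json_util import dumps, CANONICAL_JSON_OPTIONS, RELAXED_JSON_OPTIONS
>>> dumps({'n': Int64(1)})
'{"n": 1}'
>>> dumps({'n': Int64(1)}, json_options=RELAXED_JSON_OPTIONS)
'{"n": 1}'
>>> dumps({'n': Int64(1)}, json_options=CANONICAL_JSON_OPTIONS)
'{"n": {"$numberLong": "1"}}'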

class bson.json_util.JSONOptions

Encapsulates JSON options for dumps() and loads().

Raises ConfigurationError on Python 2.6 if simplejson >= 2.1.0 is not installed and document_class is not the default (dict).

Parameters:
  • strict_number_long: If True, Int64 objects are encoded to MongoDB Extended JSON’s Strict mode type NumberLong, ie '{"$numberLong": "<number>" }'. Otherwise they will be encoded as an int. Defaults to False.
  • datetime_representation: The representation to use when encoding instances of datetime.datetime. Defaults to LEGACY.
  • strict_uuid: If True, uuid.UUID objects are encoded to MongoDB Extended JSON’s Strict mode type Binary. Otherwise they will be encoded as '{"$uuid": "<hex>" }'. Defaults to False.
  • json_mode: The JSONMode to use when encoding BSON types to Extended JSON. Defaults to LEGACY.
  • document_class: BSON documents returned by loads() will be decoded to an instance of this class. Must be a subclass of collections.MutableMapping. Defaults to dict.
  • uuid_representation: The BSON representation to use when encoding and decoding instances of uuid.UUID. Defaults to PYTHON_LEGACY.
  • tz_aware: If True, MongoDB Extended JSON’s Strict mode type Date will be decoded to timezone aware instances of datetime.datetime. Otherwise they will be naive. Defaults to True.
  • tzinfo: A datetime.tzinfo subclass that specifies the timezone from which datetime objects should be decoded. Defaults to utc.
  • args: arguments to CodecOptions
  • kwargs: arguments to CodecOptions

See also

The specification for Relaxed and Canonical Extended JSON.

New in version 3.4.

Changed in version 3.5: Accepts the optional parameter json_mode.

bson.json_util.LEGACY_JSON_OPTIONS = JSONOptions(strict_number_long=False, datetime_representation=0, strict_uuid=False, json_mode=0, document_class=dict, tz_aware=True, uuid_representation=PYTHON_LEGACY, unicode_decode_error_handler='strict', tzinfo=<bson.tz_util.FixedOffset object>)

JSONOptions for encoding to PyMongo’s legacy JSON format.

See also

The documentation for bson.json_util.JSONMode.LEGACY.

New in version 3.5.

bson.json_util.DEFAULT_JSON_OPTIONS = JSONOptions(strict_number_long=False, datetime_representation=0, strict_uuid=False, json_mode=0, document_class=dict, tz_aware=True, uuid_representation=PYTHON_LEGACY, unicode_decode_error_handler='strict', tzinfo=<bson.tz_util.FixedOffset object>)

The default JSONOptions for JSON encoding/decoding.

The same as LEGACY_JSON_OPTIONS. This will change to RELAXED_JSON_OPTIONS in a future release.

New in version 3.4.

bson.json_util.CANONICAL_JSON_OPTIONS = JSONOptions(strict_number_long=True, datetime_representation=1, strict_uuid=True, json_mode=2, document_class=dict, tz_aware=True, uuid_representation=PYTHON_LEGACY, unicode_decode_error_handler='strict', tzinfo=<bson.tz_util.FixedOffset object>)

JSONOptions for Canonical Extended JSON.

See also

The documentation for bson.json_util.JSONMode.CANONICAL.

New in version 3.5.

bson.json_util.RELAXED_JSON_OPTIONS = JSONOptions(strict_number_long=False, datetime_representation=2, strict_uuid=True, json_mode=1, document_class=dict, tz_aware=True, uuid_representation=PYTHON_LEGACY, unicode_decode_error_handler='strict', tzinfo=<bson.tz_util.FixedOffset object>)

JSONOptions for Relaxed Extended JSON.

See also

The documentation for bson.json_util.JSONMode.RELAXED.

New in version 3.5.

bson.json_util.STRICT_JSON_OPTIONS = JSONOptions(strict_number_long=True, datetime_representation=2, strict_uuid=True, json_mode=0, document_class=dict, tz_aware=True, uuid_representation=PYTHON_LEGACY, unicode_decode_error_handler='strict', tzinfo=<bson.tz_util.FixedOffset object>)

DEPRECATED - JSONOptions for MongoDB Extended JSON’s Strict mode encoding.

New in version 3.4.

Changed in version 3.5: Deprecated. Use RELAXED_JSON_OPTIONS or CANONICAL_JSON_OPTIONS instead.

bson.json_util.dumps(obj, *args, **kwargs)

Helper function that wraps json.dumps().

Recursive function that handles all BSON types including Binary and Code.

Parameters:
  • json_options (optional): A JSONOptions instance used to modify the encoding of MongoDB Extended JSON types. Defaults to DEFAULT_JSON_OPTIONS.

Changed in version 3.4: Accepts optional parameter json_options. See JSONOptions.

Changed in version 2.7: Preserves order when rendering SON, Timestamp, Code, Binary, and DBRef instances.

bson.json_util.loads(s, *args, **kwargs)

Helper function that wraps json.loads().

Automatically passes the object_hook for BSON type conversion.

Raises TypeError, ValueError, KeyError, or InvalidId on invalid MongoDB Extended JSON.

Parameters:
  • json_options (optional): A JSONOptions instance used to modify the decoding of MongoDB Extended JSON types. Defaults to DEFAULT_JSON_OPTIONS.

Changed in version 3.5: Parses Relaxed and Canonical Extended JSON as well as PyMongo’s legacy format. Now raises TypeError or ValueError when parsing JSON type wrappers with values of the wrong type or any extra keys.

Changed in version 3.4: Accepts optional parameter json_options. See JSONOptions.

bson.json_util.object_pairs_hook(pairs, json_options=JSONOptions(strict_number_long=False, datetime_representation=0, strict_uuid=False, json_mode=0, document_class=dict, tz_aware=True, uuid_representation=PYTHON_LEGACY, unicode_decode_error_handler='strict', tzinfo=<bson.tz_util.FixedOffset object>))
bson.json_util.object_hook(dct, json_options=JSONOptions(strict_number_long=False, datetime_representation=0, strict_uuid=False, json_mode=0, document_class=dict, tz_aware=True, uuid_representation=PYTHON_LEGACY, unicode_decode_error_handler='strict', tzinfo=<bson.tz_util.FixedOffset object>))
bson.json_util.default(obj, json_options=JSONOptions(strict_number_long=False, datetime_representation=0, strict_uuid=False, json_mode=0, document_class=dict, tz_aware=True, uuid_representation=PYTHON_LEGACY, unicode_decode_error_handler='strict', tzinfo=<bson.tz_util.FixedOffset object>))
max_key – Representation for the MongoDB internal MaxKey type

Representation for the MongoDB internal MaxKey type.

class bson.max_key.MaxKey

MongoDB internal MaxKey type.

Changed in version 2.7: MaxKey now implements comparison operators.

min_key – Representation for the MongoDB internal MinKey type

Representation for the MongoDB internal MinKey type.

class bson.min_key.MinKey

MongoDB internal MinKey type.

Changed in version 2.7: MinKey now implements comparison operators.

objectid – Tools for working with MongoDB ObjectIds

Tools for working with MongoDB ObjectIds.

class bson.objectid.ObjectId(oid=None)

Initialize a new ObjectId.

An ObjectId is a 12-byte unique identifier consisting of:

  • a 4-byte value representing the seconds since the Unix epoch,
  • a 3-byte machine identifier,
  • a 2-byte process id, and
  • a 3-byte counter, starting with a random value.

By default, ObjectId() creates a new unique identifier. The optional parameter oid can be an ObjectId, or any 12 bytes or, in Python 2, any 12-character str.

For example, the 12 bytes b'foo-bar-quux' do not follow the ObjectId specification but they are acceptable input:

>>> ObjectId(b'foo-bar-quux')
ObjectId('666f6f2d6261722d71757578')

oid can also be a unicode or str of 24 hex digits:

>>> ObjectId('0123456789ab0123456789ab')
ObjectId('0123456789ab0123456789ab')
>>>
>>> # A u-prefixed unicode literal:
>>> ObjectId(u'0123456789ab0123456789ab')
ObjectId('0123456789ab0123456789ab')

Raises InvalidId if oid is neither 12 bytes nor 24 hex digits, or TypeError if oid is not an accepted type.

Parameters:
  • oid (optional): a valid ObjectId.

See also

The MongoDB documentation on

objectids

str(o)

Get a hex encoded version of ObjectId o.

The following property always holds:

>>> o = ObjectId()
>>> o == ObjectId(str(o))
True

This representation is useful for urls or other places where o.binary is inappropriate.

binary

12-byte binary representation of this ObjectId.

classmethod from_datetime(generation_time)

Create a dummy ObjectId instance with a specific generation time.

This method is useful for doing range queries on a field containing ObjectId instances.

Warning

It is not safe to insert a document containing an ObjectId generated using this method. This method deliberately eliminates the uniqueness guarantee that ObjectIds generally provide. ObjectIds generated with this method should be used exclusively in queries.

generation_time will be converted to UTC. Naive datetime instances will be treated as though they already contain UTC.

An example using this helper to get documents where "_id" was generated before January 1, 2010 would be:

>>> gen_time = datetime.datetime(2010, 1, 1)
>>> dummy_id = ObjectId.from_datetime(gen_time)
>>> result = collection.find({"_id": {"$lt": dummy_id}})
Parameters:
  • generation_time: datetime to be used as the generation time for the resulting ObjectId.
generation_time

A datetime.datetime instance representing the time of generation for this ObjectId.

The datetime.datetime is timezone aware, and represents the generation time in UTC. It is precise to the second.
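
For example (a minimal sketch):

>>> import datetime
>>> from bson.objectid import ObjectId
>>> from bson.tz_util import utc
>>> gen_time = datetime.datetime(2010, 1, 1)
>>> oid = ObjectId.from_datetime(gen_time)
>>> oid.generation_time == gen_time.replace(tzinfo=utc)
True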

classmethod is_valid(oid)

Checks if an oid string is valid.

Parameters:
  • oid: the object id to validate

New in version 2.3.

raw_bson – Tools for representing raw BSON documents.

Tools for representing raw BSON documents.

class bson.raw_bson.RawBSONDocument(bson_bytes, codec_options=None)

Create a new RawBSONDocument.

Parameters:
  • bson_bytes: the BSON bytes that compose this document
  • codec_options (optional): An instance of CodecOptions.

Changed in version 3.5: If a CodecOptions is passed in, its document_class must be RawBSONDocument.

items()

Lazily decode and iterate elements in this document.

raw

The raw BSON bytes composing this document.
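
For example (a minimal sketch), a RawBSONDocument can be constructed from encoded bytes and only decodes elements on access:

>>> from bson import BSON
>>> from bson.raw_bson import RawBSONDocument
>>> raw_doc = RawBSONDocument(BSON.encode({'a': 1}))
>>> raw_doc['a']
1
>>> raw_doc.raw == BSON.encode({'a': 1})
True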

regex – Tools for representing MongoDB regular expressions

New in version 2.7.

Tools for representing MongoDB regular expressions.

class bson.regex.Regex(pattern, flags=0)

BSON regular expression data.

This class is useful to store and retrieve regular expressions that are incompatible with Python’s regular expression dialect.

Parameters:
  • pattern: string
  • flags: (optional) an integer bitmask, or a string of flag characters like “im” for IGNORECASE and MULTILINE
classmethod from_native(regex)

Convert a Python regular expression into a Regex instance.

Note that in Python 3, a regular expression compiled from a str has the re.UNICODE flag set. If it is undesirable to store this flag in a BSON regular expression, unset it first:

>>> pattern = re.compile('.*')
>>> regex = Regex.from_native(pattern)
>>> regex.flags ^= re.UNICODE
>>> db.collection.insert_one({'pattern': regex})
Parameters:
  • regex: A regular expression object from re.compile().

Warning

Python regular expressions use a different syntax and different set of flags than MongoDB, which uses PCRE. A regular expression retrieved from the server may not compile in Python, or may match a different set of strings in Python than when used in a MongoDB query.

try_compile()

Compile this Regex as a Python regular expression.

Warning

Python regular expressions use a different syntax and different set of flags than MongoDB, which uses PCRE. A regular expression retrieved from the server may not compile in Python, or may match a different set of strings in Python than when used in a MongoDB query. try_compile() may raise re.error.
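
For example (a minimal sketch):

>>> import re
>>> from bson.regex import Regex
>>> regex = Regex('^mongo', 'i')
>>> pattern = regex.try_compile()
>>> bool(pattern.match('MongoDB'))
True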

son – Tools for working with SON, an ordered mapping

Tools for creating and manipulating SON, the Serialized Ocument Notation.

Regular dictionaries can be used instead of SON objects, but not when the order of keys is important. A SON object can be used just like a normal Python dictionary.

class bson.son.SON(data=None, **kwargs)

SON data.

A subclass of dict that maintains ordering of keys and provides a few extra niceties for dealing with SON. SON provides an API similar to collections.OrderedDict from Python 2.7+.
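
For example (a minimal sketch), keys retain their insertion order:

>>> from bson.son import SON
>>> son = SON([('a', 1), ('b', 2)])
>>> son['c'] = 3
>>> list(son.keys())
['a', 'b', 'c']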

clear() → None. Remove all items from D.
copy() → a shallow copy of D
get(key, default=None)

Return the value for key if key is in the dictionary, else default.

items() → a set-like object providing a view on D's items
keys() → a set-like object providing a view on D's keys
pop(k[, d]) → v, remove specified key and return the corresponding value.

If key is not found, d is returned if given, otherwise KeyError is raised

popitem() → (k, v), remove and return some (key, value) pair as a 2-tuple; raise KeyError if D is empty.

setdefault(key, default=None)

Insert key with a value of default if key is not in the dictionary.

Return the value for key if key is in the dictionary, else default.

to_dict()

Convert a SON document to a normal Python dictionary instance.

This is trickier than just dict(…) because it needs to be recursive.

update([E, ]**F) → None. Update D from dict/iterable E and F.

If E is present and has a .keys() method, then does: for k in E: D[k] = E[k]. If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v. In either case, this is followed by: for k in F: D[k] = F[k].

values() → an object providing a view on D's values

timestamp – Tools for representing MongoDB internal Timestamps

Tools for representing MongoDB internal Timestamps.

class bson.timestamp.Timestamp(time, inc)

Create a new Timestamp.

This class is only for use with the MongoDB opLog. If you need to store a regular timestamp, please use a datetime.

Raises TypeError if time is not an instance of int or datetime, or inc is not an instance of int. Raises ValueError if time or inc is not in [0, 2**32).

Parameters:
  • time: time in seconds since epoch UTC, or a naive UTC datetime, or an aware datetime
  • inc: the incrementing counter
as_datetime()

Return a datetime instance corresponding to the time portion of this Timestamp.

The returned datetime’s timezone is UTC.

inc

Get the inc portion of this Timestamp.

time

Get the time portion of this Timestamp.
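
For example (a minimal sketch):

>>> import datetime
>>> from bson.timestamp import Timestamp
>>> from bson.tz_util import utc
>>> ts = Timestamp(1514764800, 1)
>>> (ts.time, ts.inc)
(1514764800, 1)
>>> ts.as_datetime() == datetime.datetime(2018, 1, 1, tzinfo=utc)
True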

tz_util – Utilities for dealing with timezones in Python

Timezone related utilities for BSON.

class bson.tz_util.FixedOffset(offset, name)

Fixed offset timezone, in minutes east from UTC.

Implementation based on the Python standard library documentation. Defining __getinitargs__ enables pickling / copying.

dst(dt)

datetime -> DST offset as timedelta positive east of UTC.

tzname(dt)

datetime -> string name of time zone.

utcoffset(dt)

datetime -> timedelta showing offset from UTC, negative values indicating West of UTC
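
For example (a minimal sketch; the offset and zone name are arbitrary):

>>> from datetime import timedelta
>>> from bson.tz_util import FixedOffset
>>> tz = FixedOffset(-5 * 60, 'fixed/-05:00')
>>> tz.utcoffset(None) == timedelta(hours=-5)
True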

bson.tz_util.utc = <bson.tz_util.FixedOffset object>

Fixed offset timezone representing UTC.

pymongo – Python driver for MongoDB

Python driver for MongoDB.

pymongo.version = '3.6.0'

The version of this PyMongo distribution, as a string.

pymongo.MongoClient

Alias for pymongo.mongo_client.MongoClient.

pymongo.MongoReplicaSetClient

Alias for pymongo.mongo_replica_set_client.MongoReplicaSetClient.

pymongo.ReadPreference

Alias for pymongo.read_preferences.ReadPreference.

pymongo.has_c()

Is the C extension installed?

pymongo.MIN_SUPPORTED_WIRE_VERSION

The minimum wire protocol version PyMongo supports.

pymongo.MAX_SUPPORTED_WIRE_VERSION

The maximum wire protocol version PyMongo supports.

Sub-modules:

database – Database level operations

Database level operations.

pymongo.auth.MECHANISMS = frozenset({'GSSAPI', 'MONGODB-CR', 'SCRAM-SHA-1', 'MONGODB-X509', 'DEFAULT', 'PLAIN'})

The authentication mechanisms supported by PyMongo.

pymongo.OFF = 0

No database profiling.

pymongo.SLOW_ONLY = 1

Only profile slow operations.

pymongo.ALL = 2

Profile all operations.

class pymongo.database.Database(client, name, codec_options=None, read_preference=None, write_concern=None, read_concern=None)

Get a database by client and name.

Raises TypeError if name is not an instance of basestring (str in python 3). Raises InvalidName if name is not a valid database name.

Parameters:
  • client: A MongoClient instance.
  • name: The database name.
  • codec_options (optional): An instance of CodecOptions. If None (the default) client.codec_options is used.
  • read_preference (optional): The read preference to use. If None (the default) client.read_preference is used.
  • write_concern (optional): An instance of WriteConcern. If None (the default) client.write_concern is used.
  • read_concern (optional): An instance of ReadConcern. If None (the default) client.read_concern is used.

See also

The MongoDB documentation on

databases

Changed in version 3.2: Added the read_concern option.

Changed in version 3.0: Added the codec_options, read_preference, and write_concern options. Database no longer returns an instance of Collection for attribute names with leading underscores. You must use dict-style lookups instead:

db['__my_collection__']

Not:

db.__my_collection__

db[collection_name] || db.collection_name

Get the collection_name Collection of Database db.

Raises InvalidName if an invalid collection name is used.

Note

Use dictionary-style access if collection_name is an attribute of the Database class, e.g. db[collection_name].

codec_options

Read only access to the CodecOptions of this instance.

read_preference

Read only access to the read preference of this instance.

Changed in version 3.0: The read_preference attribute is now read only.

write_concern

Read only access to the WriteConcern of this instance.

Changed in version 3.0: The write_concern attribute is now read only.

read_concern

Read only access to the ReadConcern of this instance.

New in version 3.2.

add_son_manipulator(manipulator)

Add a new son manipulator to this database.

DEPRECATED - add_son_manipulator is deprecated.

Changed in version 3.0: Deprecated add_son_manipulator.

add_user(name, password=None, read_only=None, session=None, **kwargs)

DEPRECATED: Create user name with password password.

Add a new user with permissions for this Database.

Note

Will change the password if user name already exists.

Note

add_user is deprecated and will be removed in PyMongo 4.0. Starting with MongoDB 2.6 user management is handled with four database commands, createUser, usersInfo, updateUser, and dropUser.

To create a user:

db.command("createUser", "admin", pwd="password", roles=["root"])

To create a read-only user:

db.command("createUser", "user", pwd="password", roles=["read"])

To change a password:

db.command("updateUser", "user", pwd="newpassword")

Or change roles:

db.command("updateUser", "user", roles=["readWrite"])
Parameters:
  • name: the name of the user to create
  • password (optional): the password of the user to create. Cannot be used with the userSource argument.
  • read_only (optional): if True the user will be read only
  • **kwargs (optional): optional fields for the user document (e.g. userSource, otherDBRoles, or roles). See http://docs.mongodb.org/manual/reference/privilege-documents for more information.
  • session (optional): a ClientSession.

Changed in version 3.6: Added session parameter. Deprecated add_user.

Changed in version 2.5: Added kwargs support for optional fields introduced in MongoDB 2.4

Changed in version 2.2: Added support for read only users

authenticate(name=None, password=None, source=None, mechanism='DEFAULT', **kwargs)

DEPRECATED: Authenticate to use this database.

Warning

Starting in MongoDB 3.6, calling authenticate() invalidates all existing cursors. It may also leave logical sessions open on the server for up to 30 minutes until they time out.

Authentication lasts for the life of the underlying client instance, or until logout() is called.

Raises TypeError if (required) name, (optional) password, or (optional) source is not an instance of basestring (str in python 3).

Note

  • This method authenticates the current connection, and will also cause all new socket connections in the underlying client instance to be authenticated automatically.
  • Authenticating more than once on the same database with different credentials is not supported. You must call logout() before authenticating with new credentials.
  • When sharing a client instance between multiple threads, all threads will share the authentication. If you need different authentication profiles for different purposes you must use distinct client instances.
Parameters:
  • name: the name of the user to authenticate. Optional when mechanism is MONGODB-X509 and the MongoDB server version is >= 3.4.
  • password (optional): the password of the user to authenticate. Not used with GSSAPI or MONGODB-X509 authentication.
  • source (optional): the database to authenticate on. If not specified the current database is used.
  • mechanism (optional): See MECHANISMS for options. By default, use SCRAM-SHA-1 with MongoDB 3.0 and later, MONGODB-CR (MongoDB Challenge Response protocol) for older servers.
  • authMechanismProperties (optional): Used to specify authentication mechanism specific options. To specify the service name for GSSAPI authentication pass authMechanismProperties='SERVICE_NAME:<service name>'

Changed in version 3.5: Deprecated. Authenticating multiple users conflicts with support for logical sessions in MongoDB 3.6. To authenticate as multiple users, create multiple instances of MongoClient.

New in version 2.8: Use SCRAM-SHA-1 with MongoDB 3.0 and later.

Changed in version 2.5: Added the source and mechanism parameters. authenticate() now raises a subclass of PyMongoError if authentication fails due to invalid credentials or configuration issues.

See also

The MongoDB documentation on

authenticate

client

The client instance for this Database.

collection_names(include_system_collections=True, session=None)

Get a list of all the collection names in this database.

Parameters:
  • include_system_collections (optional): if False list will not include system collections (e.g system.indexes)
  • session (optional): a ClientSession.

Changed in version 3.6: Added session parameter.

command(command, value=1, check=True, allowable_errors=None, read_preference=Primary(), codec_options=CodecOptions(document_class=dict, tz_aware=False, uuid_representation=PYTHON_LEGACY, unicode_decode_error_handler='strict', tzinfo=None), session=None, **kwargs)

Issue a MongoDB command.

Send command command to the database and return the response. If command is an instance of basestring (str in python 3) then the command {command: value} will be sent. Otherwise, command must be an instance of dict and will be sent as is.

Any additional keyword arguments will be added to the final command document before it is sent.

For example, a command like {buildinfo: 1} can be sent using:

>>> db.command("buildinfo")

For a command where the value matters, like {collstats: collection_name} we can do:

>>> db.command("collstats", collection_name)

For commands that take additional arguments we can use kwargs. So {filemd5: object_id, root: file_root} becomes:

>>> db.command("filemd5", object_id, root=file_root)
Parameters:
  • command: document representing the command to be issued, or the name of the command (for simple commands only).

    Note

    the order of keys in the command document is significant (the “verb” must come first), so commands which require multiple keys (e.g. findandmodify) should use an instance of SON or a string and kwargs instead of a Python dict.

  • value (optional): value to use for the command verb when command is passed as a string

  • check (optional): check the response for errors, raising OperationFailure if there are any

  • allowable_errors: if check is True, error messages in this list will be ignored by error-checking

  • read_preference: The read preference for this operation. See read_preferences for options.

  • codec_options: A CodecOptions instance.

  • session (optional): a ClientSession.

  • **kwargs (optional): additional keyword arguments will be added to the command document before it is sent

Note

command() does not obey this Database’s read_preference or codec_options. You must use the read_preference and codec_options parameters instead.
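
For example, to send a multi-key command with a guaranteed key order (a minimal sketch; assumes a collection named "test"):

>>> from bson.son import SON
>>> db.command(SON([("count", "test"), ("query", {"x": 1})]))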

Changed in version 3.6: Added session parameter.

Changed in version 3.0: Removed the as_class, fields, uuid_subtype, tag_sets, and secondary_acceptable_latency_ms options. Removed the compile_re option: PyMongo now always represents BSON regular expressions as Regex objects. Use try_compile() to attempt to convert from a BSON regular expression to a Python regular expression object. Added the codec_options parameter.

Changed in version 2.7: Added compile_re option. If set to False, PyMongo represented BSON regular expressions as Regex objects instead of attempting to compile BSON regular expressions as Python native regular expressions, thus preventing errors for some incompatible patterns, see PYTHON-500.

Changed in version 2.3: Added tag_sets and secondary_acceptable_latency_ms options.

Changed in version 2.2: Added support for as_class - the class you want to use for the resulting documents

See also

The MongoDB documentation on

commands

create_collection(name, codec_options=None, read_preference=None, write_concern=None, read_concern=None, session=None, **kwargs)

Create a new Collection in this database.

Normally collection creation is automatic. This method should only be used to specify options on creation. CollectionInvalid will be raised if the collection already exists.

Options should be passed as keyword arguments to this method. Supported options vary with MongoDB release. Some examples include:

  • “size”: desired initial size for the collection (in bytes). For capped collections this size is the max size of the collection.
  • “capped”: if True, this is a capped collection
  • “max”: maximum number of objects if capped (optional)

See the MongoDB documentation for a full list of supported options by server version.
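
For example, to create a one megabyte capped collection (a minimal sketch; the collection name is arbitrary):

>>> events = db.create_collection('events', capped=True, size=1024 * 1024)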

Parameters:
  • name: the name of the collection to create
  • codec_options (optional): An instance of CodecOptions. If None (the default) the codec_options of this Database is used.
  • read_preference (optional): The read preference to use. If None (the default) the read_preference of this Database is used.
  • write_concern (optional): An instance of WriteConcern. If None (the default) the write_concern of this Database is used.
  • read_concern (optional): An instance of ReadConcern. If None (the default) the read_concern of this Database is used.
  • collation (optional): An instance of Collation.
  • session (optional): a ClientSession.
  • **kwargs (optional): additional keyword arguments will be passed as options for the create collection command

Changed in version 3.6: Added session parameter.

Changed in version 3.4: Added the collation option.

Changed in version 3.0: Added the codec_options, read_preference, and write_concern options.

Changed in version 2.2: Removed deprecated argument: options

current_op(include_all=False, session=None)

Get information on operations currently running.

Parameters:
  • include_all (optional): if True also list currently idle operations in the result
  • session (optional): a ClientSession.

Changed in version 3.6: Added session parameter.

dereference(dbref, session=None, **kwargs)

Dereference a DBRef, getting the document it points to.

Raises TypeError if dbref is not an instance of DBRef. Returns a document, or None if the reference does not point to a valid document. Raises ValueError if dbref has a database specified that is different from the current database.

Parameters:
  • dbref: the reference
  • session (optional): a ClientSession.
  • **kwargs (optional): any additional keyword arguments are the same as the arguments to find().

Changed in version 3.6: Added session parameter.

drop_collection(name_or_collection, session=None)

Drop a collection.

Parameters:
  • name_or_collection: the name of a collection to drop or the collection object itself
  • session (optional): a ClientSession.

Note

The write_concern of this database is automatically applied to this operation when using MongoDB >= 3.4.

Changed in version 3.6: Added session parameter.

Changed in version 3.4: Apply this database’s write concern automatically to this operation when connected to MongoDB >= 3.4.

error()

DEPRECATED: Get the error if one occurred on the last operation.

This method is obsolete: all MongoDB write operations (insert, update, remove, and so on) use the write concern w=1 and report their errors by default.

Changed in version 2.8: Deprecated.

eval(code, *args)

DEPRECATED: Evaluate a JavaScript expression in MongoDB.

Parameters:
  • code: string representation of JavaScript code to be evaluated
  • args (optional): additional positional arguments are passed to the code being evaluated

Warning

the eval command is deprecated in MongoDB 3.0 and will be removed in a future server version.

get_collection(name, codec_options=None, read_preference=None, write_concern=None, read_concern=None)

Get a Collection with the given name and options.

Useful for creating a Collection with different codec options, read preference, and/or write concern from this Database.

>>> db.read_preference
Primary()
>>> coll1 = db.test
>>> coll1.read_preference
Primary()
>>> from pymongo import ReadPreference
>>> coll2 = db.get_collection(
...     'test', read_preference=ReadPreference.SECONDARY)
>>> coll2.read_preference
Secondary(tag_sets=None)
Parameters:
  • name: The name of the collection - a string.
  • codec_options (optional): An instance of CodecOptions. If None (the default) the codec_options of this Database is used.
  • read_preference (optional): The read preference to use. If None (the default) the read_preference of this Database is used.
  • write_concern (optional): An instance of WriteConcern. If None (the default) the write_concern of this Database is used.
  • read_concern (optional): An instance of ReadConcern. If None (the default) the read_concern of this Database is used.
incoming_copying_manipulators

DEPRECATED: All incoming SON copying manipulators.

Changed in version 3.5: Deprecated.

New in version 2.0.

incoming_manipulators

DEPRECATED: All incoming SON manipulators.

Changed in version 3.5: Deprecated.

New in version 2.0.

last_status()

DEPRECATED: Get status information from the last operation.

This method is obsolete: all MongoDB write operations (insert, update, remove, and so on) use the write concern w=1 and report their errors by default.

Returns a SON object with status information.

Changed in version 2.8: Deprecated.

list_collection_names(session=None)

Get a list of all the collection names in this database.

Parameters:
  • session (optional): a ClientSession.

New in version 3.6.

list_collections(session=None, **kwargs)

Get a cursor over the collections of this database.

Parameters:
  • session (optional): a ClientSession.
  • **kwargs (optional): Optional parameters of the listCollections command can be passed as keyword arguments to this method. The supported options differ by server version.
Returns:

An instance of CommandCursor.

New in version 3.6.

logout()

DEPRECATED: Deauthorize use of this database.

Warning

Starting in MongoDB 3.6, calling logout() invalidates all existing cursors. It may also leave logical sessions open on the server for up to 30 minutes until they time out.

name

The name of this Database.

outgoing_copying_manipulators

DEPRECATED: All outgoing SON copying manipulators.

Changed in version 3.5: Deprecated.

New in version 2.0.

outgoing_manipulators

DEPRECATED: All outgoing SON manipulators.

Changed in version 3.5: Deprecated.

New in version 2.0.

previous_error()

DEPRECATED: Get the most recent error on this database.

This method is obsolete: all MongoDB write operations (insert, update, remove, and so on) use the write concern w=1 and report their errors by default.

Only returns errors that have occurred since the last call to reset_error_history(). Returns None if no such errors have occurred.

Changed in version 2.8: Deprecated.

profiling_info(session=None)

Returns a list containing current profiling information.

Parameters:
  • session (optional): a ClientSession.

Changed in version 3.6: Added session parameter.

See also

The MongoDB documentation on

profiling

profiling_level(session=None)

Get the database’s current profiling level.

Returns one of (OFF, SLOW_ONLY, ALL).

Parameters:
  • session (optional): a ClientSession.

Changed in version 3.6: Added session parameter.

See also

The MongoDB documentation on

profiling

remove_user(name, session=None)

DEPRECATED: Remove user name from this Database.

User name will no longer have permissions to access this Database.

Note

remove_user is deprecated and will be removed in PyMongo 4.0. Use the dropUser command instead:

db.command("dropUser", "user")
Parameters:
  • name: the name of the user to remove
  • session (optional): a ClientSession.

Changed in version 3.6: Added session parameter. Deprecated remove_user.

reset_error_history()

DEPRECATED: Reset the error history of this database.

This method is obsolete: all MongoDB write operations (insert, update, remove, and so on) use the write concern w=1 and report their errors by default.

Calls to previous_error() will only return errors that have occurred since the most recent call to this method.

Changed in version 2.8: Deprecated.

set_profiling_level(level, slow_ms=None, session=None)

Set the database’s profiling level.

Parameters:
  • level: Specifies a profiling level, see list of possible values below.
  • slow_ms: Optionally modify the threshold at which the profiler considers a query or operation slow. Even if the profiler is off, queries slower than the slow_ms threshold will be written to the logs.
  • session (optional): a ClientSession.

Possible level values:

Level Setting
OFF Off. No profiling.
SLOW_ONLY On. Only includes slow operations.
ALL On. Includes all operations.

Raises ValueError if level is not one of (OFF, SLOW_ONLY, ALL).
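
For example (a minimal sketch; assumes a connected Database db):

>>> from pymongo import SLOW_ONLY
>>> db.set_profiling_level(SLOW_ONLY, slow_ms=200)
>>> db.profiling_level() == SLOW_ONLY
True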

Changed in version 3.6: Added session parameter.

See also

The MongoDB documentation on

profiling

system_js

DEPRECATED: SystemJS helper for this Database.

See the documentation for SystemJS for more details.

validate_collection(name_or_collection, scandata=False, full=False, session=None)

Validate a collection.

Returns a dict of validation info. Raises CollectionInvalid if validation fails.

Parameters:
  • name_or_collection: A Collection object or the name of a collection to validate.
  • scandata: Do extra checks beyond checking the overall structure of the collection.
  • full: Have the server do a more thorough scan of the collection. Use with scandata for a thorough scan of the structure of the collection and the individual documents.
  • session (optional): a ClientSession.

Changed in version 3.6: Added session parameter.

class pymongo.database.SystemJS(database)

DEPRECATED: Get a system js helper for the database database.

SystemJS will be removed in PyMongo 4.0.

list()

Get a list of the names of the functions stored in this database.

change_stream – Watch changes on a collection

ChangeStream cursor to iterate over changes on a collection.

class pymongo.change_stream.ChangeStream(collection, pipeline, full_document, resume_after=None, max_await_time_ms=None, batch_size=None, collation=None, session=None)

A change stream cursor.

Should not be called directly by application developers. Use watch() instead.
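
A typical pattern is to iterate the stream in a loop (a minimal sketch; the pipeline and collection name are arbitrary):

with db.collection.watch([{'$match': {'operationType': 'insert'}}]) as stream:
    for change in stream:
        print(change)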

See also

The MongoDB documentation on

changeStreams

close()

Close this ChangeStream.

next()

Advance the cursor.

This method blocks until the next change document is returned or an unrecoverable error is raised.

Raises StopIteration if this ChangeStream is closed.

client_session – Logical sessions for sequential operations

Logical sessions for ordering sequential operations.

Requires MongoDB 3.6.

New in version 3.6.

Causally Consistent Reads
with client.start_session(causal_consistency=True) as session:
    collection = client.db.collection
    collection.update_one({'_id': 1}, {'$set': {'x': 10}}, session=session)
    secondary_c = collection.with_options(
        read_preference=ReadPreference.SECONDARY)

    # A secondary read waits for replication of the write.
    secondary_c.find_one({'_id': 1}, session=session)

If causal_consistency is True (the default), read operations that use the session are causally after previous read and write operations. Using a causally consistent session, an application can read its own writes and is guaranteed monotonic reads, even when reading from replica set secondaries.

See also

The MongoDB documentation on

causal-consistency

Classes
class pymongo.client_session.ClientSession(client, server_session, options, authset)

A session for ordering sequential operations.

advance_cluster_time(cluster_time)

Update the cluster time for this session.

Parameters:
  • cluster_time: The cluster_time from another ClientSession instance.
advance_operation_time(operation_time)

Update the operation time for this session.

Parameters:
  • operation_time: The operation_time from another ClientSession instance.
client

The MongoClient this session was created from.

cluster_time

The cluster time returned by the last operation executed in this session.

end_session()

Finish this session.

It is an error to use the session or any derived Database, Collection, or Cursor after the session has ended.

has_ended

True if this session is finished.

operation_time

The operation time returned by the last operation executed in this session.

options

The SessionOptions this session was created with.

session_id

A BSON document, the opaque server session identifier.

class pymongo.client_session.SessionOptions(causal_consistency=True)

Options for a new ClientSession.

Parameters:
  • causal_consistency (optional): If True (the default), read operations are causally ordered within the session.
causal_consistency

Whether causal consistency is configured.

collation – Tools for working with collations.

Tools for working with collations.

class pymongo.collation.Collation(locale, caseLevel=None, caseFirst=None, strength=None, numericOrdering=None, alternate=None, maxVariable=None, normalization=None, backwards=None, **kwargs)
Parameters:
  • locale: (string) The locale of the collation. This should be a string that identifies an ICU locale ID exactly. For example, en_US is valid, but en_us and en-US are not. Consult the MongoDB documentation for a list of supported locales.

  • caseLevel: (optional) If True, turn on case sensitivity if strength is 1 or 2 (case sensitivity is implied if strength is greater than 2). Defaults to False.

  • caseFirst: (optional) Specify that either uppercase or lowercase characters take precedence. Must be one of the values defined by CollationCaseFirst, below.

  • strength: (optional) Specify the comparison strength. This is also known as the ICU comparison level. Must be one of the values defined by CollationStrength, below.

    Each successive level builds upon the previous. For example, a strength of SECONDARY differentiates characters based both on the unadorned base character and its accents.

  • numericOrdering: (optional) If True, order numbers numerically instead of in collation order (defaults to False).

  • alternate: (optional) Specify whether spaces and punctuation are considered base characters. Must be one of the values defined by CollationAlternate, below.

  • maxVariable: (optional) When alternate is SHIFTED, this option specifies what characters may be ignored. Must be one of the values defined by CollationMaxVariable, below.

  • normalization: (optional) If True, normalizes text into Unicode NFD. Defaults to False.

  • backwards: (optional) If True, accents on characters are considered from the back of the word to the front, as it is done in some French dictionary ordering traditions. Defaults to False.

  • kwargs: (optional) Keyword arguments supplying any additional options to be sent with this Collation object.
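
For example, a case-insensitive query can use SECONDARY strength (a minimal sketch; the collection and filter are arbitrary):

>>> from pymongo.collation import Collation, CollationStrength
>>> collation = Collation(locale='en_US',
...                       strength=CollationStrength.SECONDARY)
>>> cursor = db.test.find({'name': 'cafe'}, collation=collation)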

class pymongo.collation.CollationStrength

An enum that defines values for strength on a Collation.

PRIMARY = 1

Differentiate base (unadorned) characters.

SECONDARY = 2

Differentiate character accents.

TERTIARY = 3

Differentiate character case.

QUATERNARY = 4

Differentiate words with and without punctuation.

IDENTICAL = 5

Differentiate unicode code point (characters are exactly identical).

class pymongo.collation.CollationAlternate

An enum that defines values for alternate on a Collation.

NON_IGNORABLE = 'non-ignorable'

Spaces and punctuation are treated as base characters.

SHIFTED = 'shifted'

Spaces and punctuation are not considered base characters.

Spaces and punctuation are distinguished regardless of this setting when the Collation strength is at least QUATERNARY.

class pymongo.collation.CollationCaseFirst

An enum that defines values for case_first on a Collation.

UPPER = 'upper'

Sort uppercase characters first.

LOWER = 'lower'

Sort lowercase characters first.

OFF = 'off'

Default for locale or collation strength.

class pymongo.collation.CollationMaxVariable

An enum that defines values for max_variable on a Collation.

PUNCT = 'punct'

Both punctuation and spaces are ignored.

SPACE = 'space'

Spaces alone are ignored.

collection – Collection level operations

Collection level utilities for Mongo.

pymongo.ASCENDING = 1

Ascending sort order.

pymongo.DESCENDING = -1

Descending sort order.

pymongo.GEO2D = '2d'

Index specifier for a 2-dimensional geospatial index.

pymongo.GEOHAYSTACK = 'geoHaystack'

Index specifier for a 2-dimensional haystack index.

New in version 2.1.

pymongo.GEOSPHERE = '2dsphere'

Index specifier for a spherical geospatial index.

New in version 2.5.

pymongo.HASHED = 'hashed'

Index specifier for a hashed index.

New in version 2.5.

pymongo.TEXT = 'text'

Index specifier for a text index.

New in version 2.7.1.

class pymongo.collection.ReturnDocument

An enum used with find_one_and_replace() and find_one_and_update().

BEFORE

Return the original document before it was updated/replaced, or None if no document matches the query.

AFTER

Return the updated/replaced or inserted document.
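
For example (a minimal sketch; the counter document is arbitrary):

>>> from pymongo.collection import ReturnDocument
>>> db.test.find_one_and_update(
...     {'_id': 'counter'},
...     {'$inc': {'seq': 1}},
...     return_document=ReturnDocument.AFTER)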

class pymongo.collection.Collection(database, name, create=False, **kwargs)

Get / create a Mongo collection.

Raises TypeError if name is not an instance of basestring (str in python 3). Raises InvalidName if name is not a valid collection name. Any additional keyword arguments will be used as options passed to the create command. See create_collection() for valid options.

If create is True, collation is specified, or any additional keyword arguments are present, a create command will be sent, using session if specified. Otherwise, a create command will not be sent and the collection will be created implicitly on first use. The optional session argument is only used for the create command; it is not associated with the collection afterward.

Parameters:
  • database: the database to get a collection from
  • name: the name of the collection to get
  • create (optional): if True, force collection creation even without options being set
  • codec_options (optional): An instance of CodecOptions. If None (the default) database.codec_options is used.
  • read_preference (optional): The read preference to use. If None (the default) database.read_preference is used.
  • write_concern (optional): An instance of WriteConcern. If None (the default) database.write_concern is used.
  • read_concern (optional): An instance of ReadConcern. If None (the default) database.read_concern is used.
  • collation (optional): An instance of Collation. If a collation is provided, it will be passed to the create collection command. This option is only supported on MongoDB 3.4 and above.
  • session (optional): a ClientSession that is used with the create collection command
  • **kwargs (optional): additional keyword arguments will be passed as options for the create collection command

Changed in version 3.6: Added session parameter.

Changed in version 3.4: Support the collation option.

Changed in version 3.2: Added the read_concern option.

Changed in version 3.0: Added the codec_options, read_preference, and write_concern options. Removed the uuid_subtype attribute. Collection no longer returns an instance of Collection for attribute names with leading underscores. You must use dict-style lookups instead:

collection['__my_collection__']

Not:

collection.__my_collection__

Changed in version 2.2: Removed deprecated argument: options

New in version 2.1: uuid_subtype attribute

See also

The MongoDB documentation on

collections

c[name] || c.name

Get the name sub-collection of Collection c.

Raises InvalidName if an invalid collection name is used.

full_name

The full name of this Collection.

The full name is of the form database_name.collection_name.

name

The name of this Collection.

database

The Database that this Collection is a part of.

codec_options

Read only access to the CodecOptions of this instance.

read_preference

Read only access to the read preference of this instance.

Changed in version 3.0: The read_preference attribute is now read only.

write_concern

Read only access to the WriteConcern of this instance.

Changed in version 3.0: The write_concern attribute is now read only.

read_concern

Read only access to the ReadConcern of this instance.

New in version 3.2.

with_options(codec_options=None, read_preference=None, write_concern=None, read_concern=None)

Get a clone of this collection changing the specified settings.

>>> coll1.read_preference
Primary()
>>> from pymongo import ReadPreference
>>> coll2 = coll1.with_options(read_preference=ReadPreference.SECONDARY)
>>> coll1.read_preference
Primary()
>>> coll2.read_preference
Secondary(tag_sets=None)
Parameters:
  • codec_options (optional): An instance of CodecOptions. If None (the default) the codec_options of this Collection is used.
  • read_preference (optional): The read preference to use. If None (the default) the read_preference of this Collection is used.
  • write_concern (optional): An instance of WriteConcern. If None (the default) the write_concern of this Collection is used.
  • read_concern (optional): An instance of ReadConcern. If None (the default) the read_concern of this Collection is used.
bulk_write(requests, ordered=True, bypass_document_validation=False, session=None)

Send a batch of write operations to the server.

Requests are passed as a list of write operation instances ( InsertOne, UpdateOne, UpdateMany, ReplaceOne, DeleteOne, or DeleteMany).

>>> for doc in db.test.find({}):
...     print(doc)
...
{u'x': 1, u'_id': ObjectId('54f62e60fba5226811f634ef')}
{u'x': 1, u'_id': ObjectId('54f62e60fba5226811f634f0')}
>>> # DeleteMany, UpdateOne, and UpdateMany are also available.
...
>>> from pymongo import InsertOne, DeleteOne, ReplaceOne
>>> requests = [InsertOne({'y': 1}), DeleteOne({'x': 1}),
...             ReplaceOne({'w': 1}, {'z': 1}, upsert=True)]
>>> result = db.test.bulk_write(requests)
>>> result.inserted_count
1
>>> result.deleted_count
1
>>> result.modified_count
0
>>> result.upserted_ids
{2: ObjectId('54f62ee28891e756a6e1abd5')}
>>> for doc in db.test.find({}):
...     print(doc)
...
{u'x': 1, u'_id': ObjectId('54f62e60fba5226811f634f0')}
{u'y': 1, u'_id': ObjectId('54f62ee2fba5226811f634f1')}
{u'z': 1, u'_id': ObjectId('54f62ee28891e756a6e1abd5')}
Parameters:
  • requests: A list of write operations (see examples above).
  • ordered (optional): If True (the default) requests will be performed on the server serially, in the order provided. If an error occurs all remaining operations are aborted. If False requests will be performed on the server in arbitrary order, possibly in parallel, and all operations will be attempted.
  • bypass_document_validation: (optional) If True, allows the write to opt-out of document level validation. Default is False.
  • session (optional): a ClientSession.
Returns:

An instance of BulkWriteResult.

Note

bypass_document_validation requires server version >= 3.2

Changed in version 3.6: Added session parameter.

Changed in version 3.2: Added bypass_document_validation support

New in version 3.0.

insert_one(document, bypass_document_validation=False, session=None)

Insert a single document.

>>> db.test.count({'x': 1})
0
>>> result = db.test.insert_one({'x': 1})
>>> result.inserted_id
ObjectId('54f112defba522406c9cc208')
>>> db.test.find_one({'x': 1})
{u'x': 1, u'_id': ObjectId('54f112defba522406c9cc208')}
Parameters:
  • document: The document to insert. Must be a mutable mapping type. If the document does not have an _id field one will be added automatically.
  • bypass_document_validation: (optional) If True, allows the write to opt-out of document level validation. Default is False.
  • session (optional): a ClientSession.
Returns:

An instance of InsertOneResult.

Note

bypass_document_validation requires server version >= 3.2

Changed in version 3.6: Added session parameter.

Changed in version 3.2: Added bypass_document_validation support

New in version 3.0.

insert_many(documents, ordered=True, bypass_document_validation=False, session=None)

Insert an iterable of documents.

>>> db.test.count()
0
>>> result = db.test.insert_many([{'x': i} for i in range(2)])
>>> result.inserted_ids
[ObjectId('54f113fffba522406c9cc20e'), ObjectId('54f113fffba522406c9cc20f')]
>>> db.test.count()
2
Parameters:
  • documents: An iterable of documents to insert.
  • ordered (optional): If True (the default) documents will be inserted on the server serially, in the order provided. If an error occurs all remaining inserts are aborted. If False, documents will be inserted on the server in arbitrary order, possibly in parallel, and all document inserts will be attempted.
  • bypass_document_validation: (optional) If True, allows the write to opt-out of document level validation. Default is False.
  • session (optional): a ClientSession.
Returns:

An instance of InsertManyResult.

Note

bypass_document_validation requires server version >= 3.2

Changed in version 3.6: Added session parameter.

Changed in version 3.2: Added bypass_document_validation support

New in version 3.0.

replace_one(filter, replacement, upsert=False, bypass_document_validation=False, collation=None, session=None)

Replace a single document matching the filter.

>>> for doc in db.test.find({}):
...     print(doc)
...
{u'x': 1, u'_id': ObjectId('54f4c5befba5220aa4d6dee7')}
>>> result = db.test.replace_one({'x': 1}, {'y': 1})
>>> result.matched_count
1
>>> result.modified_count
1
>>> for doc in db.test.find({}):
...     print(doc)
...
{u'y': 1, u'_id': ObjectId('54f4c5befba5220aa4d6dee7')}

The upsert option can be used to insert a new document if a matching document does not exist.

>>> result = db.test.replace_one({'x': 1}, {'x': 1}, True)
>>> result.matched_count
0
>>> result.modified_count
0
>>> result.upserted_id
ObjectId('54f11e5c8891e756a6e1abd4')
>>> db.test.find_one({'x': 1})
{u'x': 1, u'_id': ObjectId('54f11e5c8891e756a6e1abd4')}
Parameters:
  • filter: A query that matches the document to replace.
  • replacement: The new document.
  • upsert (optional): If True, perform an insert if no documents match the filter.
  • bypass_document_validation: (optional) If True, allows the write to opt-out of document level validation. Default is False.
  • collation (optional): An instance of Collation. This option is only supported on MongoDB 3.4 and above.
  • session (optional): a ClientSession.
Returns:

An instance of UpdateResult.

Note

bypass_document_validation requires server version >= 3.2

Changed in version 3.6: Added session parameter.

Changed in version 3.4: Added the collation option.

Changed in version 3.2: Added bypass_document_validation support

New in version 3.0.

update_one(filter, update, upsert=False, bypass_document_validation=False, collation=None, array_filters=None, session=None)

Update a single document matching the filter.

>>> for doc in db.test.find():
...     print(doc)
...
{u'x': 1, u'_id': 0}
{u'x': 1, u'_id': 1}
{u'x': 1, u'_id': 2}
>>> result = db.test.update_one({'x': 1}, {'$inc': {'x': 3}})
>>> result.matched_count
1
>>> result.modified_count
1
>>> for doc in db.test.find():
...     print(doc)
...
{u'x': 4, u'_id': 0}
{u'x': 1, u'_id': 1}
{u'x': 1, u'_id': 2}
Parameters:
  • filter: A query that matches the document to update.
  • update: The modifications to apply.
  • upsert (optional): If True, perform an insert if no documents match the filter.
  • bypass_document_validation: (optional) If True, allows the write to opt-out of document level validation. Default is False.
  • collation (optional): An instance of Collation. This option is only supported on MongoDB 3.4 and above.
  • array_filters (optional): A list of filters specifying which array elements an update should apply. Requires MongoDB 3.6+.
  • session (optional): a ClientSession.
Returns:

An instance of UpdateResult.

Note

bypass_document_validation requires server version >= 3.2

Changed in version 3.6: Added the array_filters and session parameters.

Changed in version 3.4: Added the collation option.

Changed in version 3.2: Added bypass_document_validation support

New in version 3.0.

update_many(filter, update, upsert=False, array_filters=None, bypass_document_validation=False, collation=None, session=None)

Update one or more documents that match the filter.

>>> for doc in db.test.find():
...     print(doc)
...
{u'x': 1, u'_id': 0}
{u'x': 1, u'_id': 1}
{u'x': 1, u'_id': 2}
>>> result = db.test.update_many({'x': 1}, {'$inc': {'x': 3}})
>>> result.matched_count
3
>>> result.modified_count
3
>>> for doc in db.test.find():
...     print(doc)
...
{u'x': 4, u'_id': 0}
{u'x': 4, u'_id': 1}
{u'x': 4, u'_id': 2}
Parameters:
  • filter: A query that matches the documents to update.
  • update: The modifications to apply.
  • upsert (optional): If True, perform an insert if no documents match the filter.
  • bypass_document_validation (optional): If True, allows the write to opt-out of document level validation. Default is False.
  • collation (optional): An instance of Collation. This option is only supported on MongoDB 3.4 and above.
  • array_filters (optional): A list of filters specifying which array elements an update should apply. Requires MongoDB 3.6+.
  • session (optional): a ClientSession.
Returns:

An instance of UpdateResult.

Note

bypass_document_validation requires server version >= 3.2

Changed in version 3.6: Added array_filters and session parameters.

Changed in version 3.4: Added the collation option.

Changed in version 3.2: Added bypass_document_validation support

New in version 3.0.

delete_one(filter, collation=None, session=None)

Delete a single document matching the filter.

>>> db.test.count({'x': 1})
3
>>> result = db.test.delete_one({'x': 1})
>>> result.deleted_count
1
>>> db.test.count({'x': 1})
2
Parameters:
  • filter: A query that matches the document to delete.
  • collation (optional): An instance of Collation. This option is only supported on MongoDB 3.4 and above.
  • session (optional): a ClientSession.
Returns:

An instance of DeleteResult.

Changed in version 3.6: Added session parameter.

Changed in version 3.4: Added the collation option.

New in version 3.0.

delete_many(filter, collation=None, session=None)

Delete one or more documents matching the filter.

>>> db.test.count({'x': 1})
3
>>> result = db.test.delete_many({'x': 1})
>>> result.deleted_count
3
>>> db.test.count({'x': 1})
0
Parameters:
  • filter: A query that matches the documents to delete.
  • collation (optional): An instance of Collation. This option is only supported on MongoDB 3.4 and above.
  • session (optional): a ClientSession.
Returns:

An instance of DeleteResult.

Changed in version 3.6: Added session parameter.

Changed in version 3.4: Added the collation option.

New in version 3.0.

aggregate(pipeline, session=None, **kwargs)

Perform an aggregation using the aggregation framework on this collection.

All optional aggregate command parameters should be passed as keyword arguments to this method. Valid options include, but are not limited to:

  • allowDiskUse (bool): Enables writing to temporary files. When set to True, aggregation stages can write data to the _tmp subdirectory of the --dbpath directory. The default is False.
  • maxTimeMS (int): The maximum amount of time to allow the operation to run in milliseconds.
  • batchSize (int): The maximum number of documents to return per batch. Ignored if the connected mongod or mongos does not support returning aggregate results using a cursor, or useCursor is False.
  • collation (optional): An instance of Collation. This option is only supported on MongoDB 3.4 and above.
  • useCursor (bool): Deprecated. Will be removed in PyMongo 4.0.

The aggregate() method obeys the read_preference of this Collection. Please note that using the $out pipeline stage requires a read preference of PRIMARY (the default). The server will raise an error if the $out pipeline stage is used with any other read preference.
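
For example, a simple two-stage pipeline; the field names are illustrative rather than part of the API:

>>> pipeline = [
...     {'$match': {'x': {'$gte': 1}}},
...     {'$group': {'_id': None, 'total': {'$sum': '$x'}}}]
>>> for doc in db.test.aggregate(pipeline, allowDiskUse=True):
...     print(doc)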

Note

This method does not support the ‘explain’ option. Please use command() instead. An example is included in the Aggregation Framework documentation.

Note

The write_concern of this collection is automatically applied to this operation when using MongoDB >= 3.4.

Parameters:
  • pipeline: a list of aggregation pipeline stages
  • session (optional): a ClientSession.
  • **kwargs (optional): See list of options above.
Returns:

A CommandCursor over the result set.

Changed in version 3.6: Added the session parameter. Added the maxAwaitTimeMS option. Deprecated the useCursor option.

Changed in version 3.4: Apply this collection’s write concern automatically to this operation when connected to MongoDB >= 3.4. Support the collation option.

Changed in version 3.0: The aggregate() method always returns a CommandCursor. The pipeline argument must be a list.

Changed in version 2.7: When the cursor option is used, return CommandCursor instead of Cursor.

Changed in version 2.6: Added cursor support.

New in version 2.3.

aggregate_raw_batches(pipeline, **kwargs)

Perform an aggregation and retrieve batches of raw BSON.

Similar to the aggregate() method but returns a RawBatchCursor.

This example demonstrates how to work with raw batches, but in practice raw batches should be passed to an external library that can decode BSON into another data type, rather than used with PyMongo’s bson module.

>>> import bson
>>> cursor = db.test.aggregate_raw_batches([
...     {'$project': {'x': {'$multiply': [2, '$x']}}}])
>>> for batch in cursor:
...     print(bson.decode_all(batch))

Note

aggregate_raw_batches does not support sessions.

New in version 3.6.

watch(pipeline=None, full_document='default', resume_after=None, max_await_time_ms=None, batch_size=None, collation=None, session=None)

Watch changes on this collection.

Performs an aggregation with an implicit initial $changeStream stage and returns a ChangeStream cursor which iterates over changes on this collection. Introduced in MongoDB 3.6.

for change in db.collection.watch():
    print(change)

The ChangeStream iterable blocks until the next change document is returned or an error is raised. If the next() method encounters a network error when retrieving a batch from the server, it will automatically attempt to recreate the cursor such that no change events are missed. Any error encountered during the resume attempt indicates there may be an outage and will be raised.

try:
    for insert_change in db.collection.watch(
            [{'$match': {'operationType': 'insert'}}]):
        print(insert_change)
except pymongo.errors.PyMongoError:
    # The ChangeStream encountered an unrecoverable error or the
    # resume attempt failed to recreate the cursor.
    log.error('...')

For a precise description of the resume process see the change streams specification.

Note

Using this helper method is preferred to directly calling aggregate() with a $changeStream stage, for the purpose of supporting resumability.

Warning

This Collection’s read_concern must be ReadConcern("majority") in order to use the $changeStream stage.

Parameters:
  • pipeline (optional): A list of aggregation pipeline stages to append to an initial $changeStream stage. Not all pipeline stages are valid after a $changeStream stage, see the MongoDB documentation on change streams for the supported stages.
  • full_document (optional): The fullDocument to pass as an option to the $changeStream stage. Allowed values: ‘default’, ‘updateLookup’. Defaults to ‘default’. When set to ‘updateLookup’, the change notification for partial updates will include both a delta describing the changes to the document, as well as a copy of the entire document that was changed from some time after the change occurred.
  • resume_after (optional): The logical starting point for this change stream.
  • max_await_time_ms (optional): The maximum time in milliseconds for the server to wait for changes before responding to a getMore operation.
  • batch_size (optional): The maximum number of documents to return per batch.
  • collation (optional): The Collation to use for the aggregation.
  • session (optional): a ClientSession.
Returns:

A ChangeStream cursor.

New in version 3.6.

See also

The MongoDB documentation on

changeStreams

find(filter=None, projection=None, skip=0, limit=0, no_cursor_timeout=False, cursor_type=CursorType.NON_TAILABLE, sort=None, allow_partial_results=False, oplog_replay=False, modifiers=None, batch_size=0, manipulate=True, collation=None, hint=None, max_scan=None, max_time_ms=None, max=None, min=None, return_key=False, show_record_id=False, snapshot=False, comment=None, session=None)

Query the database.

The filter argument is a prototype document that all results must match. For example:

>>> db.test.find({"hello": "world"})

only matches documents that have a key “hello” with value “world”. Matches can have other keys in addition to “hello”. The projection argument is used to specify a subset of fields that should be included in the result documents. By limiting results to a certain subset of fields you can cut down on network traffic and decoding time.
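
For example, combining a filter with a projection, sort, and limit (the field names are illustrative):

>>> cursor = db.test.find(
...     {'hello': 'world'},
...     projection={'_id': False, 'hello': True},
...     sort=[('hello', pymongo.ASCENDING)],
...     limit=10)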

Raises TypeError if any of the arguments are of improper type. Returns an instance of Cursor corresponding to this query.

The find() method obeys the read_preference of this Collection.

Parameters:
  • filter (optional): a SON object specifying elements which must be present for a document to be included in the result set
  • projection (optional): a list of field names that should be returned in the result set or a dict specifying the fields to include or exclude. If projection is a list “_id” will always be returned. Use a dict to exclude fields from the result (e.g. projection={'_id': False}).
  • session (optional): a ClientSession.
  • skip (optional): the number of documents to omit (from the start of the result set) when returning the results
  • limit (optional): the maximum number of results to return
  • no_cursor_timeout (optional): if False (the default), any returned cursor is closed by the server after 10 minutes of inactivity. If set to True, the returned cursor will never time out on the server. Care should be taken to ensure that cursors with no_cursor_timeout turned on are properly closed.
  • cursor_type (optional): the type of cursor to return. The valid options are defined by CursorType:
    • NON_TAILABLE - the result of this find call will return a standard cursor over the result set.
    • TAILABLE - the result of this find call will be a tailable cursor - tailable cursors are only for use with capped collections. They are not closed when the last data is retrieved but are kept open and the cursor location marks the final document position. If more data is received iteration of the cursor will continue from the last document received. For details, see the tailable cursor documentation.
    • TAILABLE_AWAIT - the result of this find call will be a tailable cursor with the await flag set. The server will wait for a few seconds after returning the full result set so that it can capture and return additional data added during the query.
    • EXHAUST - the result of this find call will be an exhaust cursor. MongoDB will stream batched results to the client without waiting for the client to request each batch, reducing latency. See notes on compatibility below.
  • sort (optional): a list of (key, direction) pairs specifying the sort order for this query. See sort() for details.
  • allow_partial_results (optional): if True, mongos will return partial results if some shards are down instead of returning an error.
  • oplog_replay (optional): If True, set the oplogReplay query flag.
  • batch_size (optional): Limits the number of documents returned in a single batch.
  • manipulate (optional): DEPRECATED - If True (the default), apply any outgoing SON manipulators before returning.
  • collation (optional): An instance of Collation. This option is only supported on MongoDB 3.4 and above.
  • return_key (optional): If True, return only the index keys in each document.
  • show_record_id (optional): If True, adds a field $recordId in each document with the storage engine’s internal record identifier.
  • snapshot (optional): If True, prevents the cursor from returning a document more than once because of an intervening write operation.
  • hint (optional): An index, in the same format as passed to create_index() (e.g. [('field', ASCENDING)]). Pass this as an alternative to calling hint() on the cursor to tell Mongo the proper index to use for the query.
  • max_time_ms (optional): Specifies a time limit for a query operation. If the specified time is exceeded, the operation will be aborted and ExecutionTimeout is raised. Pass this as an alternative to calling max_time_ms() on the cursor.
  • max_scan (optional): The maximum number of documents to scan. Pass this as an alternative to calling max_scan() on the cursor.
  • min (optional): A list of field, limit pairs specifying the inclusive lower bound for all keys of a specific index in order. Pass this as an alternative to calling min() on the cursor.
  • max (optional): A list of field, limit pairs specifying the exclusive upper bound for all keys of a specific index in order. Pass this as an alternative to calling max() on the cursor.
  • comment (optional): A string or document. Pass this as an alternative to calling comment() on the cursor.
  • modifiers (optional): DEPRECATED - A dict specifying additional MongoDB query modifiers. Use the keyword arguments listed above instead.

Note

There are a number of caveats to using EXHAUST as cursor_type:

  • The limit option can not be used with an exhaust cursor.
  • Exhaust cursors are not supported by mongos and can not be used with a sharded cluster.
  • A Cursor instance created with the EXHAUST cursor_type requires an exclusive socket connection to MongoDB. If the Cursor is discarded without being completely iterated the underlying socket connection will be closed and discarded without being returned to the connection pool.

Changed in version 3.6: Added session parameter.

Changed in version 3.5: Added the options return_key, show_record_id, snapshot, hint, max_time_ms, max_scan, min, max, and comment. Deprecated the option modifiers.

Changed in version 3.4: Support the collation option.

Changed in version 3.0: Changed the parameter names spec, fields, timeout, and partial to filter, projection, no_cursor_timeout, and allow_partial_results respectively. Added the cursor_type, oplog_replay, and modifiers options. Removed the network_timeout, read_preference, tag_sets, secondary_acceptable_latency_ms, max_scan, snapshot, tailable, await_data, exhaust, as_class, and slave_okay parameters. Removed compile_re option: PyMongo now always represents BSON regular expressions as Regex objects. Use try_compile() to attempt to convert from a BSON regular expression to a Python regular expression object. Soft deprecated the manipulate option.

Changed in version 2.7: Added compile_re option. If set to False, PyMongo represented BSON regular expressions as Regex objects instead of attempting to compile BSON regular expressions as Python native regular expressions, thus preventing errors for some incompatible patterns, see PYTHON-500.

New in version 2.3: The tag_sets and secondary_acceptable_latency_ms parameters.

See also

The MongoDB documentation on

find

find_raw_batches(filter=None, projection=None, skip=0, limit=0, no_cursor_timeout=False, cursor_type=CursorType.NON_TAILABLE, sort=None, allow_partial_results=False, oplog_replay=False, modifiers=None, batch_size=0, manipulate=True, collation=None, hint=None, max_scan=None, max_time_ms=None, max=None, min=None, return_key=False, show_record_id=False, snapshot=False, comment=None)

Query the database and retrieve batches of raw BSON.

Similar to the find() method but returns a RawBatchCursor.

This example demonstrates how to work with raw batches, but in practice raw batches should be passed to an external library that can decode BSON into another data type, rather than used with PyMongo’s bson module.

>>> import bson
>>> cursor = db.test.find_raw_batches()
>>> for batch in cursor:
...     print(bson.decode_all(batch))

Note

find_raw_batches does not support sessions.

New in version 3.6.

find_one(filter=None, *args, **kwargs)

Get a single document from the database.

All arguments to find() are also valid arguments for find_one(), although any limit argument will be ignored. Returns a single document, or None if no matching document is found.

The find_one() method obeys the read_preference of this Collection.

Parameters:
  • filter (optional): a dictionary specifying the query to be performed OR any other type to be used as the value for a query for "_id".

  • *args (optional): any additional positional arguments are the same as the arguments to find().

  • **kwargs (optional): any additional keyword arguments are the same as the arguments to find().

    >>> collection.find_one(max_time_ms=100)
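
As a sketch of the _id shortcut described above (the document contents are hypothetical):

>>> doc = collection.find_one({'hello': 'world'})
>>> collection.find_one(doc['_id'])  # equivalent to find_one({'_id': doc['_id']})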
    
find_one_and_delete(filter, projection=None, sort=None, session=None, **kwargs)

Finds a single document and deletes it, returning the document.

>>> db.test.count({'x': 1})
2
>>> db.test.find_one_and_delete({'x': 1})
{u'x': 1, u'_id': ObjectId('54f4e12bfba5220aa4d6dee8')}
>>> db.test.count({'x': 1})
1

If multiple documents match filter, a sort can be applied.

>>> for doc in db.test.find({'x': 1}):
...     print(doc)
...
{u'x': 1, u'_id': 0}
{u'x': 1, u'_id': 1}
{u'x': 1, u'_id': 2}
>>> db.test.find_one_and_delete(
...     {'x': 1}, sort=[('_id', pymongo.DESCENDING)])
{u'x': 1, u'_id': 2}

The projection option can be used to limit the fields returned.

>>> db.test.find_one_and_delete({'x': 1}, projection={'_id': False})
{u'x': 1}
Parameters:
  • filter: A query that matches the document to delete.
  • projection (optional): a list of field names that should be returned in the result document or a mapping specifying the fields to include or exclude. If projection is a list “_id” will always be returned. Use a mapping to exclude fields from the result (e.g. projection={'_id': False}).
  • sort (optional): a list of (key, direction) pairs specifying the sort order for the query. If multiple documents match the query, they are sorted and the first is deleted.
  • session (optional): a ClientSession.
  • **kwargs (optional): additional command arguments can be passed as keyword arguments (for example maxTimeMS can be used with recent server versions).

Changed in version 3.6: Added session parameter.

Changed in version 3.4: Added the collation option.

Changed in version 3.2: Respects write concern.

Warning

Starting in PyMongo 3.2, this command uses the WriteConcern of this Collection when connected to MongoDB >= 3.2. Note that using an elevated write concern with this command may be slower compared to using the default write concern.

New in version 3.0.

find_one_and_replace(filter, replacement, projection=None, sort=None, upsert=False, return_document=ReturnDocument.BEFORE, session=None, **kwargs)

Finds a single document and replaces it, returning either the original or the replaced document.

The find_one_and_replace() method differs from find_one_and_update() by replacing the document matched by filter, rather than modifying the existing document.

>>> for doc in db.test.find({}):
...     print(doc)
...
{u'x': 1, u'_id': 0}
{u'x': 1, u'_id': 1}
{u'x': 1, u'_id': 2}
>>> db.test.find_one_and_replace({'x': 1}, {'y': 1})
{u'x': 1, u'_id': 0}
>>> for doc in db.test.find({}):
...     print(doc)
...
{u'y': 1, u'_id': 0}
{u'x': 1, u'_id': 1}
{u'x': 1, u'_id': 2}
Parameters:
  • filter: A query that matches the document to replace.
  • replacement: The replacement document.
  • projection (optional): A list of field names that should be returned in the result document or a mapping specifying the fields to include or exclude. If projection is a list “_id” will always be returned. Use a mapping to exclude fields from the result (e.g. projection={'_id': False}).
  • sort (optional): a list of (key, direction) pairs specifying the sort order for the query. If multiple documents match the query, they are sorted and the first is replaced.
  • upsert (optional): When True, inserts a new document if no document matches the query. Defaults to False.
  • return_document: If ReturnDocument.BEFORE (the default), returns the original document before it was replaced, or None if no document matches. If ReturnDocument.AFTER, returns the replaced or inserted document.
  • session (optional): a ClientSession.
  • **kwargs (optional): additional command arguments can be passed as keyword arguments (for example maxTimeMS can be used with recent server versions).

Changed in version 3.6: Added session parameter.

Changed in version 3.4: Added the collation option.

Changed in version 3.2: Respects write concern.

Warning

Starting in PyMongo 3.2, this command uses the WriteConcern of this Collection when connected to MongoDB >= 3.2. Note that using an elevated write concern with this command may be slower compared to using the default write concern.

New in version 3.0.

find_one_and_update(filter, update, projection=None, sort=None, upsert=False, return_document=ReturnDocument.BEFORE, array_filters=None, session=None, **kwargs)

Finds a single document and updates it, returning either the original or the updated document.

>>> db.test.find_one_and_update(
...    {'_id': 665}, {'$inc': {'count': 1}, '$set': {'done': True}})
{u'_id': 665, u'done': False, u'count': 25}

By default find_one_and_update() returns the original version of the document before the update was applied. To return the updated version of the document instead, use the return_document option.

>>> from pymongo import ReturnDocument
>>> db.example.find_one_and_update(
...     {'_id': 'userid'},
...     {'$inc': {'seq': 1}},
...     return_document=ReturnDocument.AFTER)
{u'_id': u'userid', u'seq': 1}

You can limit the fields returned with the projection option.

>>> db.example.find_one_and_update(
...     {'_id': 'userid'},
...     {'$inc': {'seq': 1}},
...     projection={'seq': True, '_id': False},
...     return_document=ReturnDocument.AFTER)
{u'seq': 2}

The upsert option can be used to create the document if it doesn’t already exist.

>>> db.example.delete_many({}).deleted_count
1
>>> db.example.find_one_and_update(
...     {'_id': 'userid'},
...     {'$inc': {'seq': 1}},
...     projection={'seq': True, '_id': False},
...     upsert=True,
...     return_document=ReturnDocument.AFTER)
{u'seq': 1}

If multiple documents match filter, a sort can be applied.

>>> for doc in db.test.find({'done': True}):
...     print(doc)
...
{u'_id': 665, u'done': True, u'result': {u'count': 26}}
{u'_id': 701, u'done': True, u'result': {u'count': 17}}
>>> db.test.find_one_and_update(
...     {'done': True},
...     {'$set': {'final': True}},
...     sort=[('_id', pymongo.DESCENDING)])
{u'_id': 701, u'done': True, u'result': {u'count': 17}}
Parameters:
  • filter: A query that matches the document to update.
  • update: The update operations to apply.
  • projection (optional): A list of field names that should be returned in the result document or a mapping specifying the fields to include or exclude. If projection is a list “_id” will always be returned. Use a dict to exclude fields from the result (e.g. projection={'_id': False}).
  • sort (optional): a list of (key, direction) pairs specifying the sort order for the query. If multiple documents match the query, they are sorted and the first is updated.
  • upsert (optional): When True, inserts a new document if no document matches the query. Defaults to False.
  • return_document: If ReturnDocument.BEFORE (the default), returns the original document before it was updated, or None if no document matches. If ReturnDocument.AFTER, returns the updated or inserted document.
  • array_filters (optional): A list of filters specifying which array elements an update should apply. Requires MongoDB 3.6+.
  • session (optional): a ClientSession.
  • **kwargs (optional): additional command arguments can be passed as keyword arguments (for example maxTimeMS can be used with recent server versions).

Changed in version 3.6: Added the array_filters and session options.

Changed in version 3.4: Added the collation option.

Changed in version 3.2: Respects write concern.

Warning

Starting in PyMongo 3.2, this command uses the WriteConcern of this Collection when connected to MongoDB >= 3.2. Note that using an elevated write concern with this command may be slower compared to using the default write concern.

New in version 3.0.

count(filter=None, session=None, **kwargs)

Get the number of documents in this collection.

All optional count parameters should be passed as keyword arguments to this method. Valid options include:

  • hint (string or list of tuples): The index to use. Specify either the index name as a string or the index specification as a list of tuples (e.g. [('a', pymongo.ASCENDING), ('b', pymongo.ASCENDING)]).
  • limit (int): The maximum number of documents to count.
  • skip (int): The number of matching documents to skip before returning results.
  • maxTimeMS (int): The maximum amount of time to allow the count command to run, in milliseconds.
  • collation (optional): An instance of Collation. This option is only supported on MongoDB 3.4 and above.

The count() method obeys the read_preference of this Collection.
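
For example (the option values are illustrative):

>>> db.test.count({'x': 1}, skip=1, maxTimeMS=1000)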

Parameters:
  • filter (optional): A query document that selects which documents to count in the collection.
  • session (optional): a ClientSession.
  • **kwargs (optional): See list of options above.

Changed in version 3.6: Added session parameter.

Changed in version 3.4: Support the collation option.

distinct(key, filter=None, session=None, **kwargs)

Get a list of distinct values for key among all documents in this collection.

Raises TypeError if key is not an instance of basestring (str in python 3).

All optional distinct parameters should be passed as keyword arguments to this method. Valid options include:

  • maxTimeMS (int): The maximum amount of time to allow the distinct command to run, in milliseconds.
  • collation (optional): An instance of Collation. This option is only supported on MongoDB 3.4 and above.

The distinct() method obeys the read_preference of this Collection.
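
For example, restricting the documents considered with a filter (the field names are illustrative):

>>> db.test.distinct('x', {'y': {'$gt': 0}})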

Parameters:
  • key: name of the field for which we want to get the distinct values
  • filter (optional): A query document that specifies the documents from which to retrieve the distinct values.
  • session (optional): a ClientSession.
  • **kwargs (optional): See list of options above.

Changed in version 3.6: Added session parameter.

Changed in version 3.4: Support the collation option.

create_index(keys, session=None, **kwargs)

Creates an index on this collection.

Takes either a single key or a list of (key, direction) pairs. The key(s) must be an instance of basestring (str in python 3), and the direction(s) must be one of (ASCENDING, DESCENDING, GEO2D, GEOHAYSTACK, GEOSPHERE, HASHED, TEXT).

To create a single key ascending index on the key 'mike' we just use a string argument:

>>> my_collection.create_index("mike")

For a compound index on 'mike' descending and 'eliot' ascending we need to use a list of tuples:

>>> my_collection.create_index([("mike", pymongo.DESCENDING),
...                             ("eliot", pymongo.ASCENDING)])

All optional index creation parameters should be passed as keyword arguments to this method. For example:

>>> my_collection.create_index([("mike", pymongo.DESCENDING)],
...                            background=True)

Valid options include, but are not limited to:

  • name: custom name to use for this index - if none is given, a name will be generated.
  • unique: if True creates a uniqueness constraint on the index.
  • background: if True this index should be created in the background.
  • sparse: if True, omit from the index any documents that lack the indexed field.
  • bucketSize: for use with geoHaystack indexes. Number of documents to group together within a certain proximity to a given longitude and latitude.
  • min: minimum value for keys in a GEO2D index.
  • max: maximum value for keys in a GEO2D index.
  • expireAfterSeconds: <int> Used to create an expiring (TTL) collection. MongoDB will automatically delete documents from this collection after <int> seconds. The indexed field must be a UTC datetime or the data will not expire.
  • partialFilterExpression: A document that specifies a filter for a partial index.
  • collation (optional): An instance of Collation. This option is only supported on MongoDB 3.4 and above.

See the MongoDB documentation for a full list of supported options by server version.
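
As a hedged sketch of a partial index using partialFilterExpression (requires MongoDB 3.2+; the filter shown is hypothetical):

>>> my_collection.create_index(
...     [("mike", pymongo.ASCENDING)],
...     unique=True,
...     partialFilterExpression={'rating': {'$gt': 5}})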

Warning

dropDups is not supported by MongoDB 3.0 or newer. The option is silently ignored by the server and unique index builds using the option will fail if a duplicate value is detected.

Note

partialFilterExpression requires server version >= 3.2

Note

The write_concern of this collection is automatically applied to this operation when using MongoDB >= 3.4.

Parameters:
  • keys: a single key or a list of (key, direction) pairs specifying the index to create
  • session (optional): a ClientSession.
  • **kwargs (optional): any additional index creation options (see the above list) should be passed as keyword arguments

Changed in version 3.6: Added session parameter. Added support for passing maxTimeMS in kwargs.

Changed in version 3.4: Apply this collection’s write concern automatically to this operation when connected to MongoDB >= 3.4. Support the collation option.

Changed in version 3.2: Added partialFilterExpression to support partial indexes.

Changed in version 3.0: Renamed key_or_list to keys. Removed the cache_for option. create_index() no longer caches index names. Removed support for the drop_dups and bucket_size aliases.

See also

The MongoDB documentation on

indexes

create_indexes(indexes, session=None, **kwargs)

Create one or more indexes on this collection.

>>> from pymongo import IndexModel, ASCENDING, DESCENDING
>>> index1 = IndexModel([("hello", DESCENDING),
...                      ("world", ASCENDING)], name="hello_world")
>>> index2 = IndexModel([("goodbye", DESCENDING)])
>>> db.test.create_indexes([index1, index2])
["hello_world", "goodbye_-1"]
Parameters:
  • indexes: A list of IndexModel instances.
  • session (optional): a ClientSession.
  • **kwargs (optional): optional arguments to the createIndexes command (like maxTimeMS) can be passed as keyword arguments.

Note

create_indexes uses the createIndexes command introduced in MongoDB 2.6 and cannot be used with earlier versions.

Note

The write_concern of this collection is automatically applied to this operation when using MongoDB >= 3.4.

Changed in version 3.6: Added session parameter. Added support for arbitrary keyword arguments.

Changed in version 3.4: Apply this collection’s write concern automatically to this operation when connected to MongoDB >= 3.4.

New in version 3.0.

drop_index(index_or_name, session=None, **kwargs)

Drops the specified index on this collection.

Can be used on non-existent collections or collections with no indexes. Raises OperationFailure on an error (e.g. trying to drop an index that does not exist). index_or_name can be either an index name (as returned by create_index), or an index specifier (as passed to create_index). An index specifier should be a list of (key, direction) pairs. Raises TypeError if index is not an instance of (str, unicode, list).
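
For example, dropping an index by name or by specifier (the name follows the server’s default naming for a single descending key):

>>> my_collection.drop_index("mike_-1")
>>> my_collection.drop_index([("mike", pymongo.DESCENDING)])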

Warning

if a custom name was used on index creation (by passing the name parameter to create_index() or ensure_index()) the index must be dropped by name.

Parameters:
  • index_or_name: index (or name of index) to drop
  • session (optional): a ClientSession.
  • **kwargs (optional): optional arguments to the dropIndexes command (like maxTimeMS) can be passed as keyword arguments.

Note

The write_concern of this collection is automatically applied to this operation when using MongoDB >= 3.4.

Changed in version 3.6: Added session parameter. Added support for arbitrary keyword arguments.

Changed in version 3.4: Apply this collection’s write concern automatically to this operation when connected to MongoDB >= 3.4.

drop_indexes(session=None, **kwargs)

Drops all indexes on this collection.

Can be used on non-existent collections or collections with no indexes. Raises OperationFailure on an error.

Parameters:
  • session (optional): a ClientSession.
  • **kwargs (optional): optional arguments to the dropIndexes command (like maxTimeMS) can be passed as keyword arguments.

Note

The write_concern of this collection is automatically applied to this operation when using MongoDB >= 3.4.

Changed in version 3.6: Added session parameter. Added support for arbitrary keyword arguments.

Changed in version 3.4: Apply this collection’s write concern automatically to this operation when connected to MongoDB >= 3.4.

reindex(session=None, **kwargs)

Rebuilds all indexes on this collection.

Parameters:
  • session (optional): a ClientSession.
  • **kwargs (optional): optional arguments to the reIndex command (like maxTimeMS) can be passed as keyword arguments.

Warning

reindex blocks all other operations (indexes are built in the foreground) and will be slow for large collections.

Changed in version 3.6: Added session parameter. Added support for arbitrary keyword arguments.

Changed in version 3.5: We no longer apply this collection’s write concern to this operation. MongoDB 3.4 silently ignored the write concern. MongoDB 3.6+ returns an error if we include the write concern.

Changed in version 3.4: Apply this collection’s write concern automatically to this operation when connected to MongoDB >= 3.4.

list_indexes(session=None)

Get a cursor over the index documents for this collection.

>>> for index in db.test.list_indexes():
...     print(index)
...
SON([(u'v', 1), (u'key', SON([(u'_id', 1)])),
     (u'name', u'_id_'), (u'ns', u'test.test')])
Parameters:
  • session (optional): a ClientSession.
Returns:

An instance of CommandCursor.

Changed in version 3.6: Added session parameter.

New in version 3.0.

index_information(session=None)

Get information on this collection’s indexes.

Returns a dictionary where the keys are index names (as returned by create_index()) and the values are dictionaries containing information about each index. The dictionary is guaranteed to contain at least a single key, "key" which is a list of (key, direction) pairs specifying the index (as passed to create_index()). It will also contain any other metadata about the indexes, except for the "ns" and "name" keys, which are cleaned. Example output might look like this:

>>> db.test.create_index("x", unique=True)
u'x_1'
>>> db.test.index_information()
{u'_id_': {u'key': [(u'_id', 1)]},
 u'x_1': {u'unique': True, u'key': [(u'x', 1)]}}
Parameters:
  • session (optional): a ClientSession.

Changed in version 3.6: Added session parameter.

drop(session=None)

Alias for drop_collection().

Parameters:
  • session (optional): a ClientSession.

The following two calls are equivalent:

>>> db.foo.drop()
>>> db.drop_collection("foo")

Changed in version 3.6: Added session parameter.

rename(new_name, session=None, **kwargs)

Rename this collection.

If operating in auth mode, client must be authorized as an admin to perform this operation. Raises TypeError if new_name is not an instance of basestring (str in python 3). Raises InvalidName if new_name is not a valid collection name.
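
For example (the collection names are hypothetical):

>>> db.foo.rename("bar", dropTarget=True)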

Parameters:
  • new_name: new name for this collection
  • session (optional): a ClientSession.
  • **kwargs (optional): additional arguments to the rename command may be passed as keyword arguments to this helper method (i.e. dropTarget=True)

Note

The write_concern of this collection is automatically applied to this operation when using MongoDB >= 3.4.

Changed in version 3.6: Added session parameter.

Changed in version 3.4: Apply this collection’s write concern automatically to this operation when connected to MongoDB >= 3.4.

options(session=None)

Get the options set on this collection.

Returns a dictionary of options and their values - see create_collection() for more information on the possible options. Returns an empty dictionary if the collection has not been created yet.

Parameters:
  • session (optional): a ClientSession.

Changed in version 3.6: Added session parameter.

map_reduce(map, reduce, out, full_response=False, session=None, **kwargs)

Perform a map/reduce operation on this collection.

If full_response is False (default) returns a Collection instance containing the results of the operation. Otherwise, returns the full response from the server to the map reduce command.
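
A minimal sketch, assuming documents with a hypothetical x field to group on; the JavaScript functions are illustrative:

>>> from bson.code import Code
>>> mapper = Code("function () { emit(this.x, 1); }")
>>> reducer = Code("function (key, values) { return Array.sum(values); }")
>>> result = db.test.map_reduce(mapper, reducer, "myresults")
>>> for doc in result.find():
...     print(doc)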

Parameters:
  • map: map function (as a JavaScript string)

  • reduce: reduce function (as a JavaScript string)

  • out: output collection name or out object (dict). See the map reduce command documentation for available options. Note: out options are order sensitive. SON can be used to specify multiple options. e.g. SON([('replace', <collection name>), ('db', <database name>)])

  • full_response (optional): if True, return full response to this command - otherwise just return the result collection

  • session (optional): a ClientSession.

  • **kwargs (optional): additional arguments to the map reduce command may be passed as keyword arguments to this helper method, e.g.:

    >>> db.test.map_reduce(map, reduce, "myresults", limit=2)
    

Note

The map_reduce() method does not obey the read_preference of this Collection. To run mapReduce on a secondary use the inline_map_reduce() method instead.

Note

The write_concern of this collection is automatically applied to this operation (if the output is not inline) when using MongoDB >= 3.4.

Changed in version 3.6: Added session parameter.

Changed in version 3.4: Apply this collection’s write concern automatically to this operation when connected to MongoDB >= 3.4.

Changed in version 3.4: Added the collation option.

Changed in version 2.2: Removed deprecated arguments: merge_output and reduce_output

See also

The MongoDB documentation on

mapreduce

inline_map_reduce(map, reduce, full_response=False, session=None, **kwargs)

Perform an inline map/reduce operation on this collection.

Perform the map/reduce operation on the server in RAM. A result collection is not created. The result set is returned as a list of documents.

If full_response is False (default) returns the result documents in a list. Otherwise, returns the full response from the server to the map reduce command.

The inline_map_reduce() method obeys the read_preference of this Collection.

Parameters:
  • map: map function (as a JavaScript string)

  • reduce: reduce function (as a JavaScript string)

  • full_response (optional): if True, return full response to this command - otherwise just return the result collection

  • session (optional): a ClientSession.

  • **kwargs (optional): additional arguments to the map reduce command may be passed as keyword arguments to this helper method, e.g.:

    >>> db.test.inline_map_reduce(map, reduce, limit=2)
    

Changed in version 3.6: Added session parameter.

Changed in version 3.4: Added the collation option.

parallel_scan(num_cursors, session=None, **kwargs)

Scan this entire collection in parallel.

Returns a list of up to num_cursors cursors that can be iterated concurrently. As long as the collection is not modified during scanning, each document appears once in one of the cursors’ result sets.

For example, to process each document in a collection using some thread-safe process_document() function:

>>> def process_cursor(cursor):
...     for document in cursor:
...         # Some thread-safe processing function:
...         process_document(document)
>>>
>>> # Get up to 4 cursors.
...
>>> cursors = collection.parallel_scan(4)
>>> threads = [
...     threading.Thread(target=process_cursor, args=(cursor,))
...     for cursor in cursors]
>>>
>>> for thread in threads:
...     thread.start()
>>>
>>> for thread in threads:
...     thread.join()
>>>
>>> # All documents have now been processed.

The parallel_scan() method obeys the read_preference of this Collection.

Parameters:
  • num_cursors: the number of cursors to return
  • session (optional): a ClientSession.
  • **kwargs: additional options for the parallelCollectionScan command can be passed as keyword arguments.

Note

Requires server version >= 2.5.5.

Changed in version 3.6: Added session parameter.

Changed in version 3.4: Added back support for arbitrary keyword arguments. MongoDB 3.4 adds support for maxTimeMS as an option to the parallelCollectionScan command.

Changed in version 3.0: Removed support for arbitrary keyword arguments, since the parallelCollectionScan command has no optional arguments.

initialize_unordered_bulk_op(bypass_document_validation=False)

DEPRECATED - Initialize an unordered batch of write operations.

Operations will be performed on the server in arbitrary order, possibly in parallel. All operations will be attempted.

Parameters:
  • bypass_document_validation (optional): If True, allows the write to opt-out of document level validation. Default is False.

Returns a BulkOperationBuilder instance.

See Unordered Bulk Write Operations for examples.

Note

bypass_document_validation requires server version >= 3.2

Changed in version 3.5: Deprecated. Use bulk_write() instead.

Changed in version 3.2: Added bypass_document_validation support

New in version 2.7.

initialize_ordered_bulk_op(bypass_document_validation=False)

DEPRECATED - Initialize an ordered batch of write operations.

Operations will be performed on the server serially, in the order provided. If an error occurs all remaining operations are aborted.

Parameters:
  • bypass_document_validation (optional): If True, allows the write to opt-out of document level validation. Default is False.

Returns a BulkOperationBuilder instance.

See Ordered Bulk Write Operations for examples.

Note

bypass_document_validation requires server version >= 3.2

Changed in version 3.5: Deprecated. Use bulk_write() instead.

Changed in version 3.2: Added bypass_document_validation support

New in version 2.7.

group(key, condition, initial, reduce, finalize=None, **kwargs)

Perform a query similar to an SQL group by operation.

DEPRECATED - The group command was deprecated in MongoDB 3.4. The group() method is deprecated and will be removed in PyMongo 4.0. Use aggregate() with the $group stage or map_reduce() instead.

Changed in version 3.5: Deprecated the group method.

Changed in version 3.4: Added the collation option.

Changed in version 2.2: Removed deprecated argument: command

insert(doc_or_docs, manipulate=True, check_keys=True, continue_on_error=False, **kwargs)

Insert a document(s) into this collection.

DEPRECATED - Use insert_one() or insert_many() instead.

Changed in version 3.0: Removed the safe parameter. Pass w=0 for unacknowledged write operations.

save(to_save, manipulate=True, check_keys=True, **kwargs)

Save a document in this collection.

DEPRECATED - Use insert_one() or replace_one() instead.

Changed in version 3.0: Removed the safe parameter. Pass w=0 for unacknowledged write operations.

update(spec, document, upsert=False, manipulate=False, multi=False, check_keys=True, **kwargs)

Update a document(s) in this collection.

DEPRECATED - Use replace_one(), update_one(), or update_many() instead.

Changed in version 3.0: Removed the safe parameter. Pass w=0 for unacknowledged write operations.

remove(spec_or_id=None, multi=True, **kwargs)

Remove a document(s) from this collection.

DEPRECATED - Use delete_one() or delete_many() instead.

Changed in version 3.0: Removed the safe parameter. Pass w=0 for unacknowledged write operations.

find_and_modify(query={}, update=None, upsert=False, sort=None, full_response=False, manipulate=False, **kwargs)

Update and return an object.

DEPRECATED - Use find_one_and_delete(), find_one_and_replace(), or find_one_and_update() instead.

ensure_index(key_or_list, cache_for=300, **kwargs)

DEPRECATED - Ensures that an index exists on this collection.

Changed in version 3.0: DEPRECATED

command_cursor – Tools for iterating over MongoDB command results

CommandCursor class to iterate over command results.

class pymongo.command_cursor.CommandCursor(collection, cursor_info, address, retrieved=0, batch_size=0, max_await_time_ms=None, session=None, explicit_session=False)

Create a new command cursor.

The parameter ‘retrieved’ is unused.

address

The (host, port) of the server used, or None.

New in version 3.0.

alive

Does this cursor have the potential to return more data?

Even if alive is True, next() can raise StopIteration. Best to use a for loop:

for doc in collection.aggregate(pipeline):
    print(doc)

Note

alive can be True while iterating a cursor from a failed server. In this case alive will return False after next() fails to retrieve the next batch of results from the server.

batch_size(batch_size)

Limits the number of documents returned in one batch. Each batch requires a round trip to the server. It can be adjusted to optimize performance and limit data transfer.

Note

batch_size can not override MongoDB’s internal limits on the amount of data it will return to the client in a single batch (i.e. if you set batch size to 1,000,000,000, MongoDB will currently only return 4-16MB of results per batch).

Raises TypeError if batch_size is not an integer. Raises ValueError if batch_size is less than 0.

Parameters:
  • batch_size: The size of each batch of results requested.
close()

Explicitly close / kill this cursor.

cursor_id

Returns the id of the cursor.

next()

Advance the cursor.

session

The cursor’s ClientSession, or None.

New in version 3.6.

class pymongo.command_cursor.RawBatchCommandCursor(collection, cursor_info, address, retrieved=0, batch_size=0, max_await_time_ms=None, session=None, explicit_session=False)

Create a new cursor / iterator over raw batches of BSON data.

Should not be called directly by application developers - see aggregate_raw_batches() instead.

See also

The MongoDB documentation on

cursors

cursor – Tools for iterating over MongoDB query results

Cursor class to iterate over Mongo query results.

class pymongo.cursor.CursorType
NON_TAILABLE

The standard cursor type.

TAILABLE

The tailable cursor type.

Tailable cursors are only for use with capped collections. They are not closed when the last data is retrieved but are kept open and the cursor location marks the final document position. If more data is received iteration of the cursor will continue from the last document received.

TAILABLE_AWAIT

A tailable cursor with the await option set.

Creates a tailable cursor that will wait for a few seconds after returning the full result set so that it can capture and return additional data added during the query.

EXHAUST

An exhaust cursor.

MongoDB will stream batched results to the client without waiting for the client to request each batch, reducing latency.

class pymongo.cursor.Cursor(collection, filter=None, projection=None, skip=0, limit=0, no_cursor_timeout=False, cursor_type=CursorType.NON_TAILABLE, sort=None, allow_partial_results=False, oplog_replay=False, modifiers=None, batch_size=0, manipulate=True, collation=None, hint=None, max_scan=None, max_time_ms=None, max=None, min=None, return_key=False, show_record_id=False, snapshot=False, comment=None)

Create a new cursor.

Should not be called directly by application developers - see find() instead.

See also

The MongoDB documentation on

cursors

c[index]

See __getitem__().

__getitem__(index)

Get a single document or a slice of documents from this cursor.

Raises InvalidOperation if this cursor has already been used.

To get a single document use an integral index, e.g.:

>>> db.test.find()[50]

An IndexError will be raised if the index is negative or greater than the number of documents in this cursor. Any limit previously applied to this cursor will be ignored.

To get a slice of documents use a slice index, e.g.:

>>> db.test.find()[20:25]

This will return this cursor with a limit of 5 and skip of 20 applied. Using a slice index will override any prior limits or skips applied to this cursor (including those applied through previous calls to this method). Raises IndexError when the slice has a step, a negative start value, or a stop value less than or equal to the start value.

Parameters:
  • index: An integer or slice index to be applied to this cursor
add_option(mask)

Set arbitrary query flags using a bitmask.

To set the tailable flag: cursor.add_option(2)

address

The (host, port) of the server used, or None.

Changed in version 3.0: Renamed from “conn_id”.

alive

Does this cursor have the potential to return more data?

This is mostly useful with tailable cursors since they will stop iterating even though they may return more results in the future.

With regular cursors, simply use a for loop instead of alive:

for doc in collection.find():
    print(doc)

Note

Even if alive is True, next() can raise StopIteration. alive can also be True while iterating a cursor from a failed server. In this case alive will return False after next() fails to retrieve the next batch of results from the server.

batch_size(batch_size)

Limits the number of documents returned in one batch. Each batch requires a round trip to the server. It can be adjusted to optimize performance and limit data transfer.

Note

batch_size can not override MongoDB’s internal limits on the amount of data it will return to the client in a single batch (i.e. if you set batch size to 1,000,000,000, MongoDB will currently only return 4-16MB of results per batch).

Raises TypeError if batch_size is not an integer. Raises ValueError if batch_size is less than 0. Raises InvalidOperation if this Cursor has already been used. The last batch_size applied to this cursor takes precedence.

Parameters:
  • batch_size: The size of each batch of results requested.
clone()

Get a clone of this cursor.

Returns a new Cursor instance with options matching those that have been set on the current instance. The clone will be completely unevaluated, even if the current instance has been partially or completely evaluated.

close()

Explicitly close / kill this cursor.

collation(collation)

Adds a Collation to this query.

This option is only supported on MongoDB 3.4 and above.

Raises TypeError if collation is not an instance of Collation or a dict. Raises InvalidOperation if this Cursor has already been used. Only the last collation applied to this cursor has any effect.

Parameters:
  • collation: An instance of Collation or a dict.
collection

The Collection that this Cursor is iterating.

comment(comment)

Adds a ‘comment’ to the cursor.

http://docs.mongodb.org/manual/reference/operator/comment/

Parameters:
  • comment: A string or document

New in version 2.7.

count(with_limit_and_skip=False)

Get the size of the results set for this query.

Returns the number of documents in the results set for this query. Does not take limit() and skip() into account by default - set with_limit_and_skip to True if that is the desired behavior. Raises OperationFailure on a database error.

When used with MongoDB >= 2.6, count() uses any hint() applied to the query. In the following example the hint is passed to the count command:

collection.find({'field': 'value'}).hint('field_1').count()

The count() method obeys the read_preference of the Collection instance on which find() was called.

Parameters:
  • with_limit_and_skip (optional): take any limit() or skip() that has been applied to this cursor into account when getting the count

Note

The with_limit_and_skip parameter requires server version >= 1.1.4

Changed in version 2.8: The count() method now supports hint().

cursor_id

Returns the id of the cursor

Useful if you need to manage cursor ids and want to handle killing cursors manually using kill_cursors()

New in version 2.2.

distinct(key)

Get a list of distinct values for key among all documents in the result set of this query.

Raises TypeError if key is not an instance of basestring (str in python 3).

The distinct() method obeys the read_preference of the Collection instance on which find() was called.

Parameters:
  • key: name of key for which we want to get the distinct values
explain()

Returns an explain plan record for this cursor.

See also

The MongoDB documentation on

explain

hint(index)

Adds a ‘hint’, telling Mongo the proper index to use for the query.

Judicious use of hints can greatly improve query performance. When doing a query on multiple fields (at least one of which is indexed) pass the indexed field as a hint to the query. Hinting will not do anything if the corresponding index does not exist. Raises InvalidOperation if this cursor has already been used.

index should be an index as passed to create_index() (e.g. [('field', ASCENDING)]) or the name of the index. If index is None any existing hint for this query is cleared. The last hint applied to this cursor takes precedence over all others.
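
For example, assuming an ascending index on 'field' already exists:

cursor = db.test.find({'field': 'value'}).hint([('field', pymongo.ASCENDING)])
cursor = db.test.find({'field': 'value'}).hint('field_1')  # by index name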

Parameters:
  • index: index to hint on (as an index specifier)

Changed in version 2.8: The hint() method accepts the name of the index.

limit(limit)

Limits the number of results to be returned by this cursor.

Raises TypeError if limit is not an integer. Raises InvalidOperation if this Cursor has already been used. The last limit applied to this cursor takes precedence. A limit of 0 is equivalent to no limit.

Parameters:
  • limit: the number of results to return

See also

The MongoDB documentation on

limit
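
Cursor modifiers return the cursor itself, so limit() chains with other methods; a small sketch:

for doc in db.test.find().sort('x', pymongo.ASCENDING).limit(10):
    print(doc)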

max(spec)

Adds the max operator that specifies the exclusive upper bound for a specific index.

Parameters:
  • spec: a list of field, limit pairs specifying the exclusive upper bound for all keys of a specific index in order.

New in version 2.7.

max_await_time_ms(max_await_time_ms)

Specifies a time limit for a getMore operation on a TAILABLE_AWAIT cursor. For all other types of cursor max_await_time_ms is ignored.

Raises TypeError if max_await_time_ms is not an integer or None. Raises InvalidOperation if this Cursor has already been used.

Note

max_await_time_ms requires server version >= 3.2

Parameters:
  • max_await_time_ms: the time limit after which the operation is aborted

New in version 3.2.

max_scan(max_scan)

Limit the number of documents to scan when performing the query.

Raises InvalidOperation if this cursor has already been used. Only the last max_scan() applied to this cursor has any effect.

Parameters:
  • max_scan: the maximum number of documents to scan
max_time_ms(max_time_ms)

Specifies a time limit for a query operation. If the specified time is exceeded, the operation will be aborted and ExecutionTimeout is raised. If max_time_ms is None no limit is applied.

Raises TypeError if max_time_ms is not an integer or None. Raises InvalidOperation if this Cursor has already been used.

Parameters:
  • max_time_ms: the time limit after which the operation is aborted
min(spec)

Adds the min operator that specifies the inclusive lower bound for a specific index.

Parameters:
  • spec: a list of field, limit pairs specifying the inclusive lower bound for all keys of a specific index in order.

New in version 2.7.
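
A hedged sketch combining min() and max() to bound a hypothetical index on x; pairing the bounds with a matching hint() is recommended so the server knows which index they apply to:

cursor = (db.test.find()
          .hint([('x', pymongo.ASCENDING)])
          .min([('x', 10)])
          .max([('x', 100)]))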

next()

Advance the cursor.

remove_option(mask)

Unset arbitrary query flags using a bitmask.

To unset the tailable flag: cursor.remove_option(2)

retrieved

The number of documents retrieved so far.

rewind()

Rewind this cursor to its unevaluated state.

Reset this cursor if it has been partially or completely evaluated. Any options that are present on the cursor will remain in effect. Future iterating performed on this cursor will cause new queries to be sent to the server, even if the resultant data has already been retrieved by this cursor.

session

The cursor’s ClientSession, or None.

New in version 3.6.

skip(skip)

Skips the first skip results of this cursor.

Raises TypeError if skip is not an integer. Raises ValueError if skip is less than 0. Raises InvalidOperation if this Cursor has already been used. The last skip applied to this cursor takes precedence.

Parameters:
  • skip: the number of results to skip
sort(key_or_list, direction=None)

Sorts this cursor’s results.

Pass a field name and a direction, either ASCENDING or DESCENDING:

for doc in collection.find().sort('field', pymongo.ASCENDING):
    print(doc)

To sort by multiple fields, pass a list of (key, direction) pairs:

for doc in collection.find().sort([
        ('field1', pymongo.ASCENDING),
        ('field2', pymongo.DESCENDING)]):
    print(doc)

Beginning with MongoDB version 2.6, text search results can be sorted by relevance:

cursor = db.test.find(
    {'$text': {'$search': 'some words'}},
    {'score': {'$meta': 'textScore'}})

# Sort by 'score' field.
cursor.sort([('score', {'$meta': 'textScore'})])

for doc in cursor:
    print(doc)

Raises InvalidOperation if this cursor has already been used. Only the last sort() applied to this cursor has any effect.

Parameters:
  • key_or_list: a single key or a list of (key, direction) pairs specifying the keys to sort on
  • direction (optional): only used if key_or_list is a single key, if not given ASCENDING is assumed
where(code)

Adds a $where clause to this query.

The code argument must be an instance of basestring (str in python 3) or Code containing a JavaScript expression. This expression will be evaluated for each document scanned. Only those documents for which the expression evaluates to true will be returned as results. The keyword this refers to the object currently being scanned.

Raises TypeError if code is not an instance of basestring (str in python 3). Raises InvalidOperation if this Cursor has already been used. Only the last call to where() applied to a Cursor has any effect.

Parameters:
  • code: JavaScript expression to use as a filter
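
For example, a sketch assuming a collection db.inventory whose documents have qty and reorder_level fields:

# Return documents where qty has fallen below reorder_level.
for doc in db.inventory.find().where('this.qty < this.reorder_level'):
    print(doc)
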
class pymongo.cursor.RawBatchCursor(collection, filter=None, projection=None, skip=0, limit=0, no_cursor_timeout=False, cursor_type=CursorType.NON_TAILABLE, sort=None, allow_partial_results=False, oplog_replay=False, modifiers=None, batch_size=0, collation=None, hint=None, max_scan=None, max_time_ms=None, max=None, min=None, return_key=False, show_record_id=False, snapshot=False, comment=None)

Create a new cursor / iterator over raw batches of BSON data.

Should not be called directly by application developers - see find_raw_batches() instead.

See also

The MongoDB documentation on

cursors

bulk – The bulk write operations interface

The bulk write operations interface.

New in version 2.7.

class pymongo.bulk.BulkOperationBuilder(collection, ordered=True, bypass_document_validation=False)

DEPRECATED: Initialize a new BulkOperationBuilder instance.

Parameters:
  • collection: A Collection instance.
  • ordered (optional): If True all operations will be executed serially, in the order provided, and the entire execution will abort on the first error. If False operations will be executed in arbitrary order (possibly in parallel on the server), reporting any errors that occurred after attempting all operations. Defaults to True.
  • bypass_document_validation: (optional) If True, allows the write to opt-out of document level validation. Default is False.

Note

bypass_document_validation requires server version >= 3.2

Changed in version 3.5: Deprecated. Use bulk_write() instead.

Changed in version 3.2: Added bypass_document_validation support

execute(write_concern=None)

Execute all provided operations.

Parameters:
  • write_concern (optional): the write concern for this bulk execution.
find(selector, collation=None)

Specify selection criteria for bulk operations.

Parameters:
  • selector (dict): the selection criteria for update and remove operations.
  • collation (optional): An instance of Collation. This option is only supported on MongoDB 3.4 and above.
Returns:
  • A BulkWriteOperation instance, used to add update and remove operations to this bulk operation.

Changed in version 3.4: Added the collation option.

insert(document)

Insert a single document.

Parameters:
  • document (dict): the document to insert
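
Taken together, a typical sequence with this deprecated builder looks like the following sketch; the collection db.items is an assumption, and the builder is obtained from the (also deprecated) Collection.initialize_ordered_bulk_op() helper:

bulk = db.items.initialize_ordered_bulk_op()
bulk.insert({'sku': 'a1', 'qty': 5})
# find() returns a BulkWriteOperation for chaining.
bulk.find({'sku': 'a1'}).update_one({'$inc': {'qty': 1}})
# upsert() switches the chain to upsert semantics.
bulk.find({'sku': 'b2'}).upsert().update_one({'$set': {'qty': 0}})
result = bulk.execute()
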
class pymongo.bulk.BulkUpsertOperation(selector, bulk, collation)

An interface for adding upsert operations.

replace_one(replacement)

Replace one entire document matching the selector criteria.

Parameters:
  • replacement (dict): the replacement document
update(update)

Update all documents matching the selector.

Parameters:
  • update (dict): the update operations to apply
update_one(update)

Update one document matching the selector.

Parameters:
  • update (dict): the update operations to apply
class pymongo.bulk.BulkWriteOperation(selector, bulk, collation)

An interface for adding update or remove operations.

remove()

Remove all documents matching the selector criteria.

remove_one()

Remove a single document matching the selector criteria.

replace_one(replacement)

Replace one entire document matching the selector criteria.

Parameters:
  • replacement (dict): the replacement document
update(update)

Update all documents matching the selector criteria.

Parameters:
  • update (dict): the update operations to apply
update_one(update)

Update one document matching the selector criteria.

Parameters:
  • update (dict): the update operations to apply
upsert()

Specify that all chained update operations should be upserts.

Returns:
  • A BulkUpsertOperation instance, used to add update and replace operations to this bulk operation.

errors – Exceptions raised by the pymongo package

Exceptions raised by PyMongo.

exception pymongo.errors.AutoReconnect(message='', errors=None)

Raised when a connection to the database is lost and an attempt to auto-reconnect will be made.

In order to auto-reconnect you must handle this exception, recognizing that the operation which caused it has not necessarily succeeded. Future operations will attempt to open a new connection to the database (and will continue to raise this exception until the first successful connection is made).

Subclass of ConnectionFailure.
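
A common pattern is to retry the failed operation a bounded number of times; a sketch, where the retry count, backoff, and collection are assumptions and the operation must be safe to repeat:

import time
from pymongo.errors import AutoReconnect

for attempt in range(3):
    try:
        # Safe to retry: replacing by a fixed _id is idempotent.
        db.status.replace_one({'_id': 1}, {'_id': 1, 'ok': True}, upsert=True)
        break
    except AutoReconnect:
        time.sleep(2 ** attempt)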

exception pymongo.errors.BulkWriteError(results)

Exception class for bulk write errors.

New in version 2.7.
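
Its details attribute carries the raw server result; a sketch of inspecting it, where the collection and requests are assumptions:

from pymongo import InsertOne
from pymongo.errors import BulkWriteError

requests = [InsertOne({'_id': 1}), InsertOne({'_id': 1})]  # duplicate _id
try:
    db.test.bulk_write(requests)
except BulkWriteError as exc:
    for err in exc.details['writeErrors']:
        print(err['code'], err['errmsg'])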

exception pymongo.errors.CollectionInvalid

Raised when collection validation fails.

exception pymongo.errors.ConfigurationError

Raised when something is incorrectly configured.

exception pymongo.errors.ConnectionFailure

Raised when a connection to the database cannot be made or is lost.

exception pymongo.errors.CursorNotFound(error, code=None, details=None)

Raised while iterating query results if the cursor is invalidated on the server.

New in version 2.7.

exception pymongo.errors.DocumentTooLarge

Raised when an encoded document is too large for the connected server.

exception pymongo.errors.DuplicateKeyError(error, code=None, details=None)

Raised when an insert or update fails due to a duplicate key error.

exception pymongo.errors.ExceededMaxWaiters

Raised when a thread tries to get a connection from a pool and maxPoolSize * waitQueueMultiple threads are already waiting.

New in version 2.6.

exception pymongo.errors.ExecutionTimeout(error, code=None, details=None)

Raised when a database operation times out, exceeding the $maxTimeMS set in the query or command option.

Note

Requires server version >= 2.6.0

New in version 2.7.

exception pymongo.errors.InvalidName

Raised when an invalid name is used.

exception pymongo.errors.InvalidOperation

Raised when a client attempts to perform an invalid operation.

exception pymongo.errors.InvalidURI

Raised when trying to parse an invalid mongodb URI.

exception pymongo.errors.NetworkTimeout(message='', errors=None)

An operation on an open connection exceeded socketTimeoutMS.

The remaining connections in the pool stay open. In the case of a write operation, you cannot know whether it succeeded or failed.

Subclass of AutoReconnect.

exception pymongo.errors.NotMasterError(message='', errors=None)

The server responded “not master” or “node is recovering”.

These errors result from a query, write, or command. The operation failed because the client thought it was using the primary but the primary has stepped down, or the client thought it was using a healthy secondary but the secondary is stale and trying to recover.

The client launches a refresh operation on a background thread, to update its view of the server as soon as possible after throwing this exception.

Subclass of AutoReconnect.

exception pymongo.errors.OperationFailure(error, code=None, details=None)

Raised when a database operation fails.

New in version 2.7: The details attribute.

code

The error code returned by the server, if any.

details

The complete error document returned by the server.

Depending on the error that occurred, the error document may include useful information beyond just the error message. When connected to a mongos the error document may contain one or more subdocuments if errors occurred on multiple shards.
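
For example, a sketch in which the failing command is arbitrary:

from pymongo.errors import OperationFailure

try:
    db.command('collMod', 'no_such_collection')
except OperationFailure as exc:
    print(exc.code)     # the server error code, e.g. 26 (NamespaceNotFound)
    print(exc.details)  # the complete error document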

exception pymongo.errors.ProtocolError

Raised for failures related to the wire protocol.

exception pymongo.errors.PyMongoError

Base class for all PyMongo exceptions.

exception pymongo.errors.ServerSelectionTimeoutError(message='', errors=None)

Thrown when no MongoDB server is available for an operation.

If there is no suitable server for an operation PyMongo tries for serverSelectionTimeoutMS (default 30 seconds) to find one, then throws this exception. For example, it is thrown after attempting an operation when PyMongo cannot connect to any server, or if you attempt an insert into a replica set that has no primary and does not elect one within the timeout window, or if you attempt to query with a Read Preference that the replica set cannot satisfy.

exception pymongo.errors.WTimeoutError(error, code=None, details=None)

Raised when a database operation times out (i.e. wtimeout expires) before replication completes.

With newer versions of MongoDB the details attribute may include write concern fields like ‘n’, ‘updatedExisting’, or ‘writtenTo’.

New in version 2.7.

exception pymongo.errors.WriteConcernError(error, code=None, details=None)

Base exception type for errors raised due to write concern.

New in version 3.0.

exception pymongo.errors.WriteError(error, code=None, details=None)

Base exception type for errors raised during write operations.

New in version 3.0.

message – Tools for creating messages to be sent to MongoDB

Tools for creating messages to be sent to MongoDB.

Note

This module is for internal use and is generally not needed by application developers.

pymongo.message.delete(collection_name, spec, safe, last_error_args, opts, flags=0)

Get a delete message.

opts is a CodecOptions. flags is a bit vector that may optionally contain the SingleRemove flag:

http://docs.mongodb.org/meta-driver/latest/legacy/mongodb-wire-protocol/#op-delete

pymongo.message.get_more(collection_name, num_to_return, cursor_id)

Get a getMore message.

pymongo.message.insert(collection_name, docs, check_keys, safe, last_error_args, continue_on_error, opts)

Get an insert message.

pymongo.message.kill_cursors(cursor_ids)

Get a killCursors message.

pymongo.message.query(options, collection_name, num_to_skip, num_to_return, query, field_selector, opts, check_keys=False)

Get a query message.

pymongo.message.update(collection_name, upsert, multi, spec, doc, safe, last_error_args, check_keys, opts)

Get an update message.

monitoring – Tools for monitoring driver events.

Tools to monitor driver events.

New in version 3.1.

Use register() to register global listeners for specific events. Listeners must inherit from one of the abstract classes below and implement the correct functions for that class.

For example, a simple command logger might be implemented like this:

import logging

from pymongo import monitoring

class CommandLogger(monitoring.CommandListener):

    def started(self, event):
        logging.info("Command {0.command_name} with request id "
                     "{0.request_id} started on server "
                     "{0.connection_id}".format(event))

    def succeeded(self, event):
        logging.info("Command {0.command_name} with request id "
                     "{0.request_id} on server {0.connection_id} "
                     "succeeded in {0.duration_micros} "
                     "microseconds".format(event))

    def failed(self, event):
        logging.info("Command {0.command_name} with request id "
                     "{0.request_id} on server {0.connection_id} "
                     "failed in {0.duration_micros} "
                     "microseconds".format(event))

monitoring.register(CommandLogger())

Server discovery and monitoring events are also available. For example:

class ServerLogger(monitoring.ServerListener):

    def opened(self, event):
        logging.info("Server {0.server_address} added to topology "
                     "{0.topology_id}".format(event))

    def description_changed(self, event):
        previous_server_type = event.previous_description.server_type
        new_server_type = event.new_description.server_type
        if new_server_type != previous_server_type:
            # server_type_name was added in PyMongo 3.4
            logging.info(
                "Server {0.server_address} changed type from "
                "{0.previous_description.server_type_name} to "
                "{0.new_description.server_type_name}".format(event))

    def closed(self, event):
        logging.warning("Server {0.server_address} removed from topology "
                        "{0.topology_id}".format(event))


class HeartbeatLogger(monitoring.ServerHeartbeatListener):

    def started(self, event):
        logging.info("Heartbeat sent to server "
                     "{0.connection_id}".format(event))

    def succeeded(self, event):
        # The reply.document attribute was added in PyMongo 3.4.
        logging.info("Heartbeat to server {0.connection_id} "
                     "succeeded with reply "
                     "{0.reply.document}".format(event))

    def failed(self, event):
        logging.warning("Heartbeat to server {0.connection_id} "
                        "failed with error {0.reply}".format(event))

class TopologyLogger(monitoring.TopologyListener):

    def opened(self, event):
        logging.info("Topology with id {0.topology_id} "
                     "opened".format(event))

    def description_changed(self, event):
        logging.info("Topology description updated for "
                     "topology id {0.topology_id}".format(event))
        previous_topology_type = event.previous_description.topology_type
        new_topology_type = event.new_description.topology_type
        if new_topology_type != previous_topology_type:
            # topology_type_name was added in PyMongo 3.4
            logging.info(
                "Topology {0.topology_id} changed type from "
                "{0.previous_description.topology_type_name} to "
                "{0.new_description.topology_type_name}".format(event))
        # The has_writable_server and has_readable_server methods
        # were added in PyMongo 3.4.
        if not event.new_description.has_writable_server():
            logging.warning("No writable servers available.")
        if not event.new_description.has_readable_server():
            logging.warning("No readable servers available.")

    def closed(self, event):
        logging.info("Topology with id {0.topology_id} "
                     "closed".format(event))

Event listeners can also be registered per instance of MongoClient:

client = MongoClient(event_listeners=[CommandLogger()])

Note that previously registered global listeners are automatically included when configuring per client event listeners. Registering a new global listener will not add that listener to existing client instances.

Note

Events are delivered synchronously. Application threads block waiting for event handlers (e.g. started()) to return. Care must be taken to ensure that your event handlers are efficient enough to not adversely affect overall application performance.

Warning

The command documents published through this API are not copies. If you intend to modify them in any way you must copy them in your event handler first.

pymongo.monitoring.register(listener)

Register a global event listener.

Parameters:
  • listener: An instance of a CommandListener, ServerHeartbeatListener, ServerListener, or TopologyListener subclass.

class pymongo.monitoring.CommandListener

Abstract base class for command listeners. Handles CommandStartedEvent, CommandSucceededEvent, and CommandFailedEvent.

failed(event)

Abstract method to handle a CommandFailedEvent.

Parameters:
  • event: An instance of CommandFailedEvent.

started(event)

Abstract method to handle a CommandStartedEvent.

Parameters:
  • event: An instance of CommandStartedEvent.

succeeded(event)

Abstract method to handle a CommandSucceededEvent.

Parameters:
  • event: An instance of CommandSucceededEvent.

class pymongo.monitoring.ServerListener

Abstract base class for server listeners. Handles ServerOpeningEvent, ServerDescriptionChangedEvent, and ServerClosedEvent.

New in version 3.3.

closed(event)

Abstract method to handle a ServerClosedEvent.

Parameters:
  • event: An instance of ServerClosedEvent.

description_changed(event)

Abstract method to handle a ServerDescriptionChangedEvent.

Parameters:
  • event: An instance of ServerDescriptionChangedEvent.

opened(event)

Abstract method to handle a ServerOpeningEvent.

Parameters:
  • event: An instance of ServerOpeningEvent.

class pymongo.monitoring.ServerHeartbeatListener

Abstract base class for server heartbeat listeners. Handles ServerHeartbeatStartedEvent, ServerHeartbeatSucceededEvent, and ServerHeartbeatFailedEvent.

New in version 3.3.

failed(event)

Abstract method to handle a ServerHeartbeatFailedEvent.

Parameters:
  • event: An instance of ServerHeartbeatFailedEvent.

started(event)

Abstract method to handle a ServerHeartbeatStartedEvent.

Parameters:
  • event: An instance of ServerHeartbeatStartedEvent.

succeeded(event)

Abstract method to handle a ServerHeartbeatSucceededEvent.

Parameters:
  • event: An instance of ServerHeartbeatSucceededEvent.

class pymongo.monitoring.TopologyListener

Abstract base class for topology monitoring listeners. Handles TopologyOpenedEvent, TopologyDescriptionChangedEvent, and TopologyClosedEvent.

New in version 3.3.

closed(event)

Abstract method to handle a TopologyClosedEvent.

Parameters:
  • event: An instance of TopologyClosedEvent.

description_changed(event)

Abstract method to handle a TopologyDescriptionChangedEvent.

Parameters:
  • event: An instance of TopologyDescriptionChangedEvent.

opened(event)

Abstract method to handle a TopologyOpenedEvent.

Parameters:
  • event: An instance of TopologyOpenedEvent.

class pymongo.monitoring.CommandStartedEvent(command, database_name, *args)

Event published when a command starts.

Parameters:
  • command: The command document.
  • database_name: The name of the database this command was run against.
  • request_id: The request id for this operation.
  • connection_id: The address (host, port) of the server this command was sent to.
  • operation_id: An optional identifier for a series of related events.
command

The command document.

command_name

The command name.

connection_id

The address (host, port) of the server this command was sent to.

database_name

The name of the database this command was run against.

operation_id

An id for this series of events or None.

request_id

The request id for this operation.

class pymongo.monitoring.CommandSucceededEvent(duration, reply, command_name, request_id, connection_id, operation_id)

Event published when a command succeeds.

Parameters:
  • duration: The command duration as a datetime.timedelta.
  • reply: The server reply document.
  • command_name: The command name.
  • request_id: The request id for this operation.
  • connection_id: The address (host, port) of the server this command was sent to.
  • operation_id: An optional identifier for a series of related events.
command_name

The command name.

connection_id

The address (host, port) of the server this command was sent to.

duration_micros

The duration of this operation in microseconds.

operation_id

An id for this series of events or None.

reply

The server reply document for this operation.

request_id

The request id for this operation.

class pymongo.monitoring.CommandFailedEvent(duration, failure, *args)

Event published when a command fails.

Parameters:
  • duration: The command duration as a datetime.timedelta.
  • failure: The server reply document.
  • command_name: The command name.
  • request_id: The request id for this operation.
  • connection_id: The address (host, port) of the server this command was sent to.
  • operation_id: An optional identifier for a series of related events.
command_name

The command name.

connection_id

The address (host, port) of the server this command was sent to.

duration_micros

The duration of this operation in microseconds.

failure

The server failure document for this operation.

operation_id

An id for this series of events or None.

request_id

The request id for this operation.

class pymongo.monitoring.ServerDescriptionChangedEvent(previous_description, new_description, *args)

Published when server description changes.

New in version 3.3.

new_description

The new ServerDescription.

previous_description

The previous ServerDescription.

server_address

The address (host/port pair) of the server.

topology_id

A unique identifier for the topology this server is a part of.

class pymongo.monitoring.ServerOpeningEvent(server_address, topology_id)

Published when server is initialized.

New in version 3.3.

server_address

The address (host/port pair) of the server.

topology_id

A unique identifier for the topology this server is a part of.

class pymongo.monitoring.ServerClosedEvent(server_address, topology_id)

Published when server is closed.

New in version 3.3.

server_address

The address (host/port pair) of the server.

topology_id

A unique identifier for the topology this server is a part of.

class pymongo.monitoring.TopologyDescriptionChangedEvent(previous_description, new_description, *args)

Published when the topology description changes.

New in version 3.3.

new_description

The new TopologyDescription.

previous_description

The previous TopologyDescription.

topology_id

A unique identifier for the topology this server is a part of.

class pymongo.monitoring.TopologyOpenedEvent(topology_id)

Published when the topology is initialized.

New in version 3.3.

topology_id

A unique identifier for the topology this server is a part of.

class pymongo.monitoring.TopologyClosedEvent(topology_id)

Published when the topology is closed.

New in version 3.3.

topology_id

A unique identifier for the topology this server is a part of.

class pymongo.monitoring.ServerHeartbeatStartedEvent(connection_id)

Published when a heartbeat is started.

New in version 3.3.

connection_id

The address (host, port) of the server this heartbeat was sent to.

class pymongo.monitoring.ServerHeartbeatSucceededEvent(duration, reply, *args)

Fired when the server heartbeat succeeds.

New in version 3.3.

connection_id

The address (host, port) of the server this heartbeat was sent to.

duration

The duration of this heartbeat in microseconds.

reply

An instance of IsMaster.

class pymongo.monitoring.ServerHeartbeatFailedEvent(duration, reply, *args)

Fired when the server heartbeat fails, either with an “ok: 0” or a socket exception.

New in version 3.3.

connection_id

The address (host, port) of the server this heartbeat was sent to.

duration

The duration of this heartbeat in microseconds.

reply

A subclass of Exception.

mongo_client – Tools for connecting to MongoDB

Tools for connecting to MongoDB.

See also

High Availability and PyMongo for examples of connecting to replica sets or sets of mongos servers.

To get a Database instance from a MongoClient use either dictionary-style or attribute-style access:

>>> from pymongo import MongoClient
>>> c = MongoClient()
>>> c.test_database
Database(MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True), u'test_database')
>>> c['test-database']
Database(MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True), u'test-database')
class pymongo.mongo_client.MongoClient(host='localhost', port=27017, document_class=dict, tz_aware=False, connect=True, **kwargs)

Client for a MongoDB instance, a replica set, or a set of mongoses.

The client object is thread-safe and has connection-pooling built in. If an operation fails because of a network error, ConnectionFailure is raised and the client reconnects in the background. Application code should handle this exception (recognizing that the operation failed) and then continue to execute.

The host parameter can be a full mongodb URI, in addition to a simple hostname. It can also be a list of hostnames or URIs. Any port specified in the host string(s) will override the port parameter. If multiple mongodb URIs containing database or auth information are passed, the last database, username, and password present will be used. For usernames and passwords, reserved characters like ‘:’, ‘/’, ‘+’ and ‘@’ must be percent-encoded following RFC 2396:

try:
    # Python 3.x
    from urllib.parse import quote_plus
except ImportError:
    # Python 2.x
    from urllib import quote_plus

uri = "mongodb://%s:%s@%s" % (
    quote_plus(user), quote_plus(password), host)
client = MongoClient(uri)

Unix domain sockets are also supported. The socket path must be percent encoded in the URI:

uri = "mongodb://%s:%s@%s" % (
    quote_plus(user), quote_plus(password), quote_plus(socket_path))
client = MongoClient(uri)

But not when passed as a simple hostname:

client = MongoClient('/tmp/mongodb-27017.sock')

Starting with version 3.6, PyMongo supports mongodb+srv:// URIs. The URI must include one, and only one, hostname. The hostname will be resolved to one or more DNS SRV records which will be used as the seed list for connecting to the MongoDB deployment. When using SRV URIs, the authSource and replicaSet configuration options can be specified using TXT records. See the Initial DNS Seedlist Discovery spec for more details. Note that the use of SRV URIs implicitly enables TLS support. Pass ssl=false in the URI to override.
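
For example, a sketch where server.example.com stands in for a DNS name that has the required SRV (and optional TXT) records:

client = MongoClient('mongodb+srv://server.example.com/')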

Note

MongoClient creation will block waiting for answers from DNS when mongodb+srv:// URIs are used.

Note

Starting with version 3.0 the MongoClient constructor no longer blocks while connecting to the server or servers, and it no longer raises ConnectionFailure if they are unavailable, nor ConfigurationError if the user’s credentials are wrong. Instead, the constructor returns immediately and launches the connection process on background threads. You can check if the server is available like this:

from pymongo.errors import ConnectionFailure
client = MongoClient()
try:
    # The ismaster command is cheap and does not require auth.
    client.admin.command('ismaster')
except ConnectionFailure:
    print("Server not available")

Warning

When using PyMongo in a multiprocessing context, please read Using PyMongo with Multiprocessing first.

Parameters:
  • host (optional): hostname or IP address or Unix domain socket path of a single mongod or mongos instance to connect to, or a mongodb URI, or a list of hostnames / mongodb URIs. If host is an IPv6 literal it must be enclosed in ‘[‘ and ‘]’ characters following the RFC2732 URL syntax (e.g. ‘[::1]’ for localhost). Multihomed and round robin DNS addresses are not supported.
  • port (optional): port number on which to connect
  • document_class (optional): default class to use for documents returned from queries on this client
  • tz_aware (optional): if True, datetime instances returned as values in a document by this MongoClient will be timezone aware (otherwise they will be naive)
  • connect (optional): if True (the default), immediately begin connecting to MongoDB in the background. Otherwise connect on the first operation.
Other optional parameters can be passed as keyword arguments:
  • maxPoolSize (optional): The maximum allowable number of concurrent connections to each connected server. Requests to a server will block if there are maxPoolSize outstanding connections to the requested server. Defaults to 100. Cannot be 0.

  • minPoolSize (optional): The minimum required number of concurrent connections that the pool will maintain to each connected server. Default is 0.

  • maxIdleTimeMS (optional): The maximum number of milliseconds that a connection can remain idle in the pool before being removed and replaced. Defaults to None (no limit).

  • socketTimeoutMS: (integer or None) Controls how long (in milliseconds) the driver will wait for a response after sending an ordinary (non-monitoring) database operation before concluding that a network error has occurred. Defaults to None (no timeout).

  • connectTimeoutMS: (integer or None) Controls how long (in milliseconds) the driver will wait during server monitoring when connecting a new socket to a server before concluding the server is unavailable. Defaults to 20000 (20 seconds).

  • serverSelectionTimeoutMS: (integer) Controls how long (in milliseconds) the driver will wait to find an available, appropriate server to carry out a database operation; while it is waiting, multiple server monitoring operations may be carried out, each controlled by connectTimeoutMS. Defaults to 30000 (30 seconds).

  • waitQueueTimeoutMS: (integer or None) How long (in milliseconds) a thread will wait for a socket from the pool if the pool has no free sockets. Defaults to None (no timeout).

  • waitQueueMultiple: (integer or None) Multiplied by maxPoolSize to give the number of threads allowed to wait for a socket at one time. Defaults to None (no limit).

  • heartbeatFrequencyMS: (optional) The number of milliseconds between periodic server checks, or None to accept the default frequency of 10 seconds.

  • appname: (string or None) The name of the application that created this MongoClient instance. MongoDB 3.4 and newer will print this value in the server log upon establishing each connection. It is also recorded in the slow query log and profile collections.

  • event_listeners: a list or tuple of event listeners. See monitoring for details.

  • retryWrites: (boolean) Whether supported write operations executed within this MongoClient will be retried once after a network error on MongoDB 3.6+. Defaults to False. The supported write operations are: insert_one(), replace_one(), update_one(), delete_one(), find_one_and_delete(), find_one_and_replace(), find_one_and_update(), and bulk_write() as long as none of its requests are UpdateMany or DeleteMany.

    Unsupported write operations include, but are not limited to, aggregate() using the $out pipeline operator and any operation with an unacknowledged write concern (e.g. {w: 0}). See https://github.com/mongodb/specifications/blob/master/source/retryable-writes/retryable-writes.rst

  • socketKeepAlive: (boolean) DEPRECATED Whether to send periodic keep-alive packets on connected sockets. Defaults to True. Disabling it is not recommended, see https://docs.mongodb.com/manual/faq/diagnostics/#does-tcp-keepalive-time-affect-mongodb-deployments”,

Write Concern options:
(Only set if passed. No default values.)
  • w: (integer or string) If this is a replica set, write operations will block until they have been replicated to the specified number or tagged set of servers. w=<int> always includes the replica set primary (e.g. w=3 means write to the primary and wait until replicated to two secondaries). Passing w=0 disables write acknowledgement and all other write concern options.
  • wtimeout: (integer) Used in conjunction with w. Specify a value in milliseconds to control how long to wait for write propagation to complete. If replication does not complete in the given timeframe, a timeout exception is raised.
  • j: If True block until write operations have been committed to the journal. Cannot be used in combination with fsync. Prior to MongoDB 2.6 this option was ignored if the server was running without journaling. Starting with MongoDB 2.6 write operations will fail with an exception if this option is used when the server is running without journaling.
  • fsync: If True and the server is running without journaling, blocks until the server has synced all data files to disk. If the server is running with journaling, this acts the same as the j option, blocking until write operations have been committed to the journal. Cannot be used in combination with j.
Replica set keyword arguments for connecting with a replica set - either directly or via a mongos:
  • replicaSet: (string or None) The name of the replica set to connect to. The driver will verify that all servers it connects to match this name. Implies that the hosts specified are a seed list and the driver should attempt to find all members of the set. Defaults to None.
Read Preference:
  • readPreference: The replica set read preference for this client. One of primary, primaryPreferred, secondary, secondaryPreferred, or nearest. Defaults to primary.
  • readPreferenceTags: Specifies a tag set as a comma-separated list of colon-separated key-value pairs. For example dc:ny,rack:1. Defaults to None.
  • maxStalenessSeconds: (integer) The maximum estimated length of time a replica set secondary can fall behind the primary in replication before it will no longer be selected for operations. Defaults to -1, meaning no maximum. If maxStalenessSeconds is set, it must be a positive integer greater than or equal to 90 seconds.
Authentication:
  • username: A string.

  • password: A string.

    Although username and password must be percent-escaped in a MongoDB URI, they must not be percent-escaped when passed as parameters. In this example, both the space and slash special characters are passed as-is:

    MongoClient(username="user name", password="pass/word")
    
  • authSource: The database to authenticate on. Defaults to the database specified in the URI, if provided, or to “admin”.

  • authMechanism: See MECHANISMS for options. By default, use SCRAM-SHA-1 with MongoDB 3.0 and later, MONGODB-CR (MongoDB Challenge Response protocol) for older servers.

  • authMechanismProperties: Used to specify authentication mechanism specific options. To specify the service name for GSSAPI authentication pass authMechanismProperties=’SERVICE_NAME:<service name>’

SSL configuration:
  • ssl: If True, create the connection to the server using SSL. Defaults to False.
  • ssl_certfile: The certificate file used to identify the local connection against mongod. Implies ssl=True. Defaults to None.
  • ssl_keyfile: The private keyfile used to identify the local connection against mongod. If included with the certfile then only the ssl_certfile is needed. Implies ssl=True. Defaults to None.
  • ssl_pem_passphrase: The password or passphrase for decrypting the private key in ssl_certfile or ssl_keyfile. Only necessary if the private key is encrypted. Only supported by python 2.7.9+ (pypy 2.5.1+) and 3.3+. Defaults to None.
  • ssl_cert_reqs: Specifies whether a certificate is required from the other side of the connection, and whether it will be validated if provided. It must be one of the three values ssl.CERT_NONE (certificates ignored), ssl.CERT_REQUIRED (certificates required and validated), or ssl.CERT_OPTIONAL (the same as CERT_REQUIRED, unless the server was configured to use anonymous ciphers). If the value of this parameter is not ssl.CERT_NONE and a value is not provided for ssl_ca_certs PyMongo will attempt to load system provided CA certificates. If the python version in use does not support loading system CA certificates then the ssl_ca_certs parameter must point to a file of CA certificates. Implies ssl=True. Defaults to ssl.CERT_REQUIRED if not provided and ssl=True.
  • ssl_ca_certs: The ca_certs file contains a set of concatenated “certification authority” certificates, which are used to validate certificates passed from the other end of the connection. Implies ssl=True. Defaults to None.
  • ssl_crlfile: The path to a PEM or DER formatted certificate revocation list. Only supported by python 2.7.9+ (pypy 2.5.1+) and 3.4+. Defaults to None.
  • ssl_match_hostname: If True (the default), and ssl_cert_reqs is not ssl.CERT_NONE, enables hostname verification using the match_hostname() function from python’s ssl module. Think very carefully before setting this to False as that could make your application vulnerable to man-in-the-middle attacks.
Read Concern options:
(If not set explicitly, this will use the server default)
  • readConcernLevel: (string) The read concern level specifies the level of isolation for read operations. For example, a read operation using a read concern level of majority will only return data that has been written to a majority of nodes. If the level is left unspecified, the server default will be used.

See also

The MongoDB documentation on

connections

Changed in version 3.6: Added support for mongodb+srv:// URIs. Added the retryWrites keyword argument and URI option.

Changed in version 3.5: Added the username and password options. Documented the authSource, authMechanism, and authMechanismProperties options. Deprecated the socketKeepAlive keyword argument and URI option. socketKeepAlive now defaults to True.

Changed in version 3.0: MongoClient is now the one and only client class for a standalone server, mongos, or replica set. It includes the functionality that had been split into MongoReplicaSetClient: it can connect to a replica set, discover all its members, and monitor the set for stepdowns, elections, and reconfigs.

The MongoClient constructor no longer blocks while connecting to the server or servers, and it no longer raises ConnectionFailure if they are unavailable, nor ConfigurationError if the user’s credentials are wrong. Instead, the constructor returns immediately and launches the connection process on background threads.

Therefore the alive method is removed since it no longer provides meaningful information; even if the client is disconnected, it may discover a server in time to fulfill the next operation.

In PyMongo 2.x, MongoClient accepted a list of standalone MongoDB servers and used the first it could connect to:

MongoClient(['host1.com:27017', 'host2.com:27017'])

A list of multiple standalones is no longer supported; if multiple servers are listed they must be members of the same replica set, or mongoses in the same sharded cluster.

The behavior for a list of mongoses is changed from “high availability” to “load balancing”. Before, the client connected to the lowest-latency mongos in the list, and used it until a network error prompted it to re-evaluate all mongoses’ latencies and reconnect to one of them. In PyMongo 3, the client monitors its network latency to all the mongoses continuously, and distributes operations evenly among those with the lowest latency. See mongos Load Balancing for more information.

The connect option is added.

The start_request, in_request, and end_request methods are removed, as well as the auto_start_request option.

The copy_database method is removed, see the copy_database examples for alternatives.

The MongoClient.disconnect() method is removed; it was a synonym for close().

MongoClient no longer returns an instance of Database for attribute names with leading underscores. You must use dict-style lookups instead:

client['__my_database__']

Not:

client.__my_database__

close()

Cleanup client resources and disconnect from MongoDB.

On MongoDB >= 3.6, end all server sessions created by this client by sending one or more endSessions commands.

Close all sockets in the connection pools and stop the monitor threads. If this instance is used again it will be automatically re-opened and the threads restarted.

Changed in version 3.6: End all server sessions created by this client.

c[db_name] || c.db_name

Get the db_name Database on MongoClient c.

Raises InvalidName if an invalid database name is used.

event_listeners

The event listeners registered for this client.

See monitoring for details.

address

(host, port) of the current standalone, primary, or mongos, or None.

Accessing address raises InvalidOperation if the client is load-balancing among mongoses, since there is no single address. Use nodes instead.

If the client is not connected, this will block until a connection is established or raise ServerSelectionTimeoutError if no server is available.

New in version 3.0.

primary

The (host, port) of the current primary of the replica set.

Returns None if this client is not connected to a replica set, there is no primary, or this client was created without the replicaSet option.

New in version 3.0: MongoClient gained this property in version 3.0 when MongoReplicaSetClient’s functionality was merged in.

secondaries

The secondary members known to this client.

A sequence of (host, port) pairs. Empty if this client is not connected to a replica set, there are no visible secondaries, or this client was created without the replicaSet option.

New in version 3.0: MongoClient gained this property in version 3.0 when MongoReplicaSetClient’s functionality was merged in.

arbiters

Arbiters in the replica set.

A sequence of (host, port) pairs. Empty if this client is not connected to a replica set, there are no arbiters, or this client was created without the replicaSet option.

is_primary

If this client is connected to a server that can accept writes.

True if the current server is a standalone, mongos, or the primary of a replica set. If the client is not connected, this will block until a connection is established or raise ServerSelectionTimeoutError if no server is available.

is_mongos

If this client is connected to mongos. If the client is not connected, this will block until a connection is established or raise ServerSelectionTimeoutError if no server is available.

max_pool_size

The maximum allowable number of concurrent connections to each connected server. Requests to a server will block if there are maxPoolSize outstanding connections to the requested server. Defaults to 100. Cannot be 0.

When a server’s pool has reached max_pool_size, operations for that server block waiting for a socket to be returned to the pool. If waitQueueTimeoutMS is set, a blocked operation will raise ConnectionFailure after a timeout. By default waitQueueTimeoutMS is not set.

min_pool_size

The minimum required number of concurrent connections that the pool will maintain to each connected server. Default is 0.

max_idle_time_ms

The maximum number of milliseconds that a connection can remain idle in the pool before being removed and replaced. Defaults to None (no limit).

nodes

Set of all currently connected servers.

Warning

When connected to a replica set the value of nodes can change over time as MongoClient’s view of the replica set changes. nodes can also be an empty set when MongoClient is first instantiated and hasn’t yet connected to any servers, or a network partition causes it to lose connection to all servers.

max_bson_size

The largest BSON object the connected server accepts in bytes.

If the client is not connected, this will block until a connection is established or raise ServerSelectionTimeoutError if no server is available.

max_message_size

The largest message the connected server accepts in bytes.

If the client is not connected, this will block until a connection is established or raise ServerSelectionTimeoutError if no server is available.

max_write_batch_size

The maxWriteBatchSize reported by the server.

If the client is not connected, this will block until a connection is established or raise ServerSelectionTimeoutError if no server is available.

Returns a default value when connected to server versions prior to MongoDB 2.6.

local_threshold_ms

The local threshold for this instance.

server_selection_timeout

The server selection timeout for this instance in seconds.

codec_options

Read only access to the CodecOptions of this instance.

read_preference

Read only access to the read preference of this instance.

Changed in version 3.0: The read_preference attribute is now read only.

write_concern

Read only access to the WriteConcern of this instance.

Changed in version 3.0: The write_concern attribute is now read only.

read_concern

Read only access to the ReadConcern of this instance.

New in version 3.2.

is_locked

Is this server locked? While locked, all write operations are blocked, although read operations may still be allowed. Use unlock() to unlock.

start_session(causal_consistency=True)

Start a logical session.

This method takes the same parameters as SessionOptions. See the client_session module for details and examples.

Requires MongoDB 3.6. It is an error to call start_session() if this client has been authenticated to multiple databases using the deprecated method authenticate().

A ClientSession may only be used with the MongoClient that started it.

Returns:An instance of ClientSession.

New in version 3.6.
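
A sketch of using a session for causally consistent reads; the collection client.db.coll is an assumption, and MongoDB 3.6 is required:

with client.start_session(causal_consistency=True) as session:
    coll = client.db.coll
    coll.insert_one({'x': 1}, session=session)
    # This read is causally ordered after the insert above.
    doc = coll.find_one({'x': 1}, session=session)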

list_databases(session=None, **kwargs)

Get a cursor over the databases of the connected server.

Parameters:
  • session (optional): a ClientSession.
  • **kwargs (optional): Optional parameters of the listDatabases command can be passed as keyword arguments to this method. The supported options differ by server version.
Returns:

An instance of CommandCursor.

New in version 3.6.
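
For example, a minimal sketch:

for spec in client.list_databases():
    # Each result document contains at least a 'name' field.
    print(spec['name'])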

list_database_names(session=None)

Get a list of the names of all databases on the connected server.

Parameters:
  • session (optional): a ClientSession.

Changed in version 3.6: Added session parameter.

database_names(session=None)

Get a list of the names of all databases on the connected server.

Parameters:
  • session (optional): a ClientSession.

Changed in version 3.6: Added session parameter.

drop_database(name_or_database, session=None)

Drop a database.

Raises TypeError if name_or_database is not an instance of basestring (str in python 3) or Database.

Parameters:
  • name_or_database: the name of a database to drop, or a Database instance representing the database to drop
  • session (optional): a ClientSession.

Changed in version 3.6: Added session parameter.

Note

The write_concern of this client is automatically applied to this operation when using MongoDB >= 3.4.

Changed in version 3.4: Apply this client’s write concern automatically to this operation when connected to MongoDB >= 3.4.

get_database(name=None, codec_options=None, read_preference=None, write_concern=None, read_concern=None)

Get a Database with the given name and options.

Useful for creating a Database with different codec options, read preference, and/or write concern from this MongoClient.

>>> client.read_preference
Primary()
>>> db1 = client.test
>>> db1.read_preference
Primary()
>>> from pymongo import ReadPreference
>>> db2 = client.get_database(
...     'test', read_preference=ReadPreference.SECONDARY)
>>> db2.read_preference
Secondary(tag_sets=None)
Parameters:
  • name (optional): The name of the database - a string. If None (the default) the database named in the MongoDB connection URI is returned.
  • codec_options (optional): An instance of CodecOptions. If None (the default) the codec_options of this MongoClient is used.
  • read_preference (optional): The read preference to use. If None (the default) the read_preference of this MongoClient is used. See read_preferences for options.
  • write_concern (optional): An instance of WriteConcern. If None (the default) the write_concern of this MongoClient is used.
  • read_concern (optional): An instance of ReadConcern. If None (the default) the read_concern of this MongoClient is used.

Changed in version 3.5: The name parameter is now optional, defaulting to the database named in the MongoDB connection URI.

server_info(session=None)

Get information about the MongoDB server we’re connected to.

Parameters:
  • session (optional): a ClientSession.

Changed in version 3.6: Added session parameter.

close_cursor(cursor_id, address=None)

Send a kill cursors message soon with the given id.

Raises TypeError if cursor_id is not an instance of (int, long). What closing the cursor actually means depends on this client’s cursor manager.

This method may be called from a Cursor destructor during garbage collection, so it isn’t safe to take a lock or do network I/O. Instead, we schedule the cursor to be closed soon on a background thread.

Parameters:
  • cursor_id: id of cursor to close
  • address (optional): (host, port) pair of the cursor’s server. If it is not provided, the client attempts to close the cursor on the primary or standalone, or a mongos server.

Changed in version 3.0: Added address parameter.

kill_cursors(cursor_ids, address=None)

DEPRECATED - Send a kill cursors message soon with the given ids.

Raises TypeError if cursor_ids is not an instance of list.

Parameters:
  • cursor_ids: list of cursor ids to kill
  • address (optional): (host, port) pair of the cursor’s server. If it is not provided, the client attempts to close the cursor on the primary or standalone, or a mongos server.

Changed in version 3.3: Deprecated.

Changed in version 3.0: Now accepts an address argument. Schedules the cursors to be closed on a background thread instead of sending the message immediately.

set_cursor_manager(manager_class)

DEPRECATED - Set this client’s cursor manager.

Raises TypeError if manager_class is not a subclass of CursorManager. A cursor manager handles closing cursors. Different managers can implement different policies in terms of when to actually kill a cursor that has been closed.

Parameters:
  • manager_class: cursor manager to use

Changed in version 3.3: Deprecated, for real this time.

Changed in version 3.0: Undeprecated.

fsync(**kwargs)

Flush all pending writes to datafiles.

Optional parameters can be passed as keyword arguments:
  • lock: If True lock the server to disallow writes.
  • async: If True don’t block while synchronizing.
  • session (optional): a ClientSession.

Changed in version 3.6: Added session parameter.

Warning

async and lock cannot be used together.

Warning

MongoDB does not support the async option on Windows and will raise an exception on that platform.

unlock(session=None)

Unlock a previously locked server.

Parameters:
  • session (optional): a ClientSession.

Changed in version 3.6: Added session parameter.
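
A sketch pairing fsync(lock=True) with unlock(); locking blocks all writes on the server, so this is for illustration only:

client.fsync(lock=True)
try:
    assert client.is_locked
    # ... e.g. copy data files for a backup while writes are blocked ...
finally:
    client.unlock()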

get_default_database()

DEPRECATED - Get the database named in the MongoDB connection URI.

>>> uri = 'mongodb://host/my_database'
>>> client = MongoClient(uri)
>>> db = client.get_default_database()
>>> assert db.name == 'my_database'
>>> db = client.get_database()
>>> assert db.name == 'my_database'

Useful in scripts where you want to choose which database to use based only on the URI in a configuration file.

Changed in version 3.5: Deprecated, use get_database() instead.

mongo_replica_set_client – Tools for connecting to a MongoDB replica set

Deprecated. See High Availability and PyMongo.

class pymongo.mongo_replica_set_client.MongoReplicaSetClient(hosts_or_uri, document_class=dict, tz_aware=False, connect=True, **kwargs)

Deprecated alias for MongoClient.

MongoReplicaSetClient will be removed in a future version of PyMongo.

Changed in version 3.0: MongoClient is now the one and only client class for a standalone server, mongos, or replica set. It includes the functionality that had been split into MongoReplicaSetClient: it can connect to a replica set, discover all its members, and monitor the set for stepdowns, elections, and reconfigs.

The refresh method is removed from MongoReplicaSetClient, as are the seeds and hosts properties.

close()

Cleanup client resources and disconnect from MongoDB.

On MongoDB >= 3.6, end all server sessions created by this client by sending one or more endSessions commands.

Close all sockets in the connection pools and stop the monitor threads. If this instance is used again it will be automatically re-opened and the threads restarted.

Changed in version 3.6: End all server sessions created by this client.

c[db_name] || c.db_name

Get the db_name Database on MongoReplicaSetClient c.

Raises InvalidName if an invalid database name is used.

primary

The (host, port) of the current primary of the replica set.

Returns None if this client is not connected to a replica set, there is no primary, or this client was created without the replicaSet option.

New in version 3.0: MongoClient gained this property in version 3.0 when MongoReplicaSetClient’s functionality was merged in.

secondaries

The secondary members known to this client.

A sequence of (host, port) pairs. Empty if this client is not connected to a replica set, there are no visible secondaries, or this client was created without the replicaSet option.

New in version 3.0: MongoClient gained this property in version 3.0 when MongoReplicaSetClient’s functionality was merged in.

arbiters

Arbiters in the replica set.

A sequence of (host, port) pairs. Empty if this client is not connected to a replica set, there are no arbiters, or this client was created without the replicaSet option.

max_pool_size

The maximum allowable number of concurrent connections to each connected server. Requests to a server will block if there are maxPoolSize outstanding connections to the requested server. Defaults to 100. Cannot be 0.

When a server’s pool has reached max_pool_size, operations for that server block waiting for a socket to be returned to the pool. If waitQueueTimeoutMS is set, a blocked operation will raise ConnectionFailure after a timeout. By default waitQueueTimeoutMS is not set.

max_bson_size

The largest BSON object the connected server accepts in bytes.

If the client is not connected, this will block until a connection is established or raise ServerSelectionTimeoutError if no server is available.

max_message_size

The largest message the connected server accepts in bytes.

If the client is not connected, this will block until a connection is established or raise ServerSelectionTimeoutError if no server is available.

local_threshold_ms

The local threshold for this instance.

codec_options

Read only access to the CodecOptions of this instance.

read_preference

Read only access to the read preference of this instance.

Changed in version 3.0: The read_preference attribute is now read only.

write_concern

Read only access to the WriteConcern of this instance.

Changed in version 3.0: The write_concern attribute is now read only.

database_names(session=None)

Get a list of the names of all databases on the connected server.

Parameters:
  • session (optional): a ClientSession.

Changed in version 3.6: Added session parameter.

drop_database(name_or_database, session=None)

Drop a database.

Raises TypeError if name_or_database is not an instance of basestring (str in python 3) or Database.

Parameters:
  • name_or_database: the name of a database to drop, or a Database instance representing the database to drop
  • session (optional): a ClientSession.

Changed in version 3.6: Added session parameter.

Note

The write_concern of this client is automatically applied to this operation when using MongoDB >= 3.4.

Changed in version 3.4: Apply this client’s write concern automatically to this operation when connected to MongoDB >= 3.4.

get_database(name=None, codec_options=None, read_preference=None, write_concern=None, read_concern=None)

Get a Database with the given name and options.

Useful for creating a Database with different codec options, read preference, and/or write concern from this MongoClient.

>>> client.read_preference
Primary()
>>> db1 = client.test
>>> db1.read_preference
Primary()
>>> from pymongo import ReadPreference
>>> db2 = client.get_database(
...     'test', read_preference=ReadPreference.SECONDARY)
>>> db2.read_preference
Secondary(tag_sets=None)
Parameters:
  • name (optional): The name of the database - a string. If None (the default) the database named in the MongoDB connection URI is returned.
  • codec_options (optional): An instance of CodecOptions. If None (the default) the codec_options of this MongoClient is used.
  • read_preference (optional): The read preference to use. If None (the default) the read_preference of this MongoClient is used. See read_preferences for options.
  • write_concern (optional): An instance of WriteConcern. If None (the default) the write_concern of this MongoClient is used.
  • read_concern (optional): An instance of ReadConcern. If None (the default) the read_concern of this MongoClient is used.

Changed in version 3.5: The name parameter is now optional, defaulting to the database named in the MongoDB connection URI.

close_cursor(cursor_id, address=None)

Send a kill cursors message soon with the given id.

Raises TypeError if cursor_id is not an instance of (int, long). What closing the cursor actually means depends on this client’s cursor manager.

This method may be called from a Cursor destructor during garbage collection, so it isn’t safe to take a lock or do network I/O. Instead, we schedule the cursor to be closed soon on a background thread.

Parameters:
  • cursor_id: id of cursor to close
  • address (optional): (host, port) pair of the cursor’s server. If it is not provided, the client attempts to close the cursor on the primary or standalone, or a mongos server.

Changed in version 3.0: Added address parameter.

kill_cursors(cursor_ids, address=None)

DEPRECATED - Send a kill cursors message soon with the given ids.

Raises TypeError if cursor_ids is not an instance of list.

Parameters:
  • cursor_ids: list of cursor ids to kill
  • address (optional): (host, port) pair of the cursor’s server. If it is not provided, the client attempts to close the cursor on the primary or standalone, or a mongos server.

Changed in version 3.3: Deprecated.

Changed in version 3.0: Now accepts an address argument. Schedules the cursors to be closed on a background thread instead of sending the message immediately.

set_cursor_manager(manager_class)

DEPRECATED - Set this client’s cursor manager.

Raises TypeError if manager_class is not a subclass of CursorManager. A cursor manager handles closing cursors. Different managers can implement different policies in terms of when to actually kill a cursor that has been closed.

Parameters:
  • manager_class: cursor manager to use

Changed in version 3.3: Deprecated, for real this time.

Changed in version 3.0: Undeprecated.

get_default_database()

DEPRECATED - Get the database named in the MongoDB connection URI.

>>> uri = 'mongodb://host/my_database'
>>> client = MongoClient(uri)
>>> db = client.get_default_database()
>>> assert db.name == 'my_database'
>>> db = client.get_database()
>>> assert db.name == 'my_database'

Useful in scripts where you want to choose which database to use based only on the URI in a configuration file.

Changed in version 3.5: Deprecated, use get_database() instead.

operations – Operation class definitions

Operation class definitions.

class pymongo.operations.DeleteMany(filter, collation=None)

Create a DeleteMany instance.

For use with bulk_write().

Parameters:
  • filter: A query that matches the documents to delete.
  • collation (optional): An instance of Collation. This option is only supported on MongoDB 3.4 and above.

Changed in version 3.5: Added the collation option.

class pymongo.operations.DeleteOne(filter, collation=None)

Create a DeleteOne instance.

For use with bulk_write().

Parameters:
  • filter: A query that matches the document to delete.
  • collation (optional): An instance of Collation. This option is only supported on MongoDB 3.4 and above.

Changed in version 3.5: Added the collation option.

class pymongo.operations.IndexModel(keys, **kwargs)

Create an Index instance.

For use with create_indexes().

Takes either a single key or a list of (key, direction) pairs. The key(s) must be an instance of basestring (str in python 3), and the direction(s) must be one of (ASCENDING, DESCENDING, GEO2D, GEOHAYSTACK, GEOSPHERE, HASHED, TEXT).

Valid options include, but are not limited to:

  • name: custom name to use for this index - if none is given, a name will be generated.
  • unique: if True creates a uniqueness constraint on the index.
  • background: if True this index should be created in the background.
  • sparse: if True, omit from the index any documents that lack the indexed field.
  • bucketSize: for use with geoHaystack indexes. Number of documents to group together within a certain proximity to a given longitude and latitude.
  • min: minimum value for keys in a GEO2D index.
  • max: maximum value for keys in a GEO2D index.
  • expireAfterSeconds: <int> Used to create an expiring (TTL) index. MongoDB will automatically delete documents from the collection after <int> seconds. The indexed field must be a UTC datetime or the data will not expire.
  • partialFilterExpression: A document that specifies a filter for a partial index.
  • collation: An instance of Collation that specifies the collation to use in MongoDB >= 3.4.

See the MongoDB documentation for a full list of supported options by server version.

Note

partialFilterExpression requires server version >= 3.2

Parameters:
  • keys: a single key or a list of (key, direction) pairs specifying the index to create
  • **kwargs (optional): any additional index creation options (see the above list) should be passed as keyword arguments

Changed in version 3.2: Added partialFilterExpression to support partial indexes.

document

An index document suitable for passing to the createIndexes command.
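
For example, a minimal sketch (collection and field names are assumed for illustration) that builds two index specifications and creates them in a single call:

from pymongo import ASCENDING, DESCENDING, IndexModel, MongoClient

coll = MongoClient().test.my_collection
indexes = [
    IndexModel([("username", ASCENDING)], name="username_idx", unique=True),
    IndexModel([("created", DESCENDING)]),
]
coll.create_indexes(indexes)  # returns the names of the new indexes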

class pymongo.operations.InsertOne(document)

Create an InsertOne instance.

For use with bulk_write().

Parameters:
  • document: The document to insert. If the document is missing an _id field one will be added.
class pymongo.operations.ReplaceOne(filter, replacement, upsert=False, collation=None)

Create a ReplaceOne instance.

For use with bulk_write().

Parameters:
  • filter: A query that matches the document to replace.
  • replacement: The new document.
  • upsert (optional): If True, perform an insert if no documents match the filter.
  • collation (optional): An instance of Collation. This option is only supported on MongoDB 3.4 and above.

Changed in version 3.5: Added the collation option.

class pymongo.operations.UpdateMany(filter, update, upsert=False, collation=None, array_filters=None)

Create an UpdateMany instance.

For use with bulk_write().

Parameters:
  • filter: A query that matches the documents to update.
  • update: The modifications to apply.
  • upsert (optional): If True, perform an insert if no documents match the filter.
  • collation (optional): An instance of Collation. This option is only supported on MongoDB 3.4 and above.
  • array_filters (optional): A list of filters specifying which array elements an update should apply. Requires MongoDB 3.6+.

Changed in version 3.6: Added the array_filters option.

Changed in version 3.5: Added the collation option.

class pymongo.operations.UpdateOne(filter, update, upsert=False, collation=None, array_filters=None)

Represents an update_one operation.

For use with bulk_write().

Parameters:
  • filter: A query that matches the document to update.
  • update: The modifications to apply.
  • upsert (optional): If True, perform an insert if no documents match the filter.
  • collation (optional): An instance of Collation. This option is only supported on MongoDB 3.4 and above.
  • array_filters (optional): A list of filters specifying which array elements an update should apply. Requires MongoDB 3.6+.

Changed in version 3.6: Added the array_filters option.

Changed in version 3.5: Added the collation option.
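
As a usage sketch (collection and field names are assumed for illustration), several of these operation types can be combined in a single ordered bulk_write() call:

from pymongo import (DeleteOne, InsertOne, MongoClient,
                     ReplaceOne, UpdateOne)

coll = MongoClient().test.my_collection
result = coll.bulk_write([
    InsertOne({"x": 1}),
    UpdateOne({"x": 1}, {"$inc": {"x": 1}}, upsert=True),
    ReplaceOne({"x": 2}, {"x": 3}),
    DeleteOne({"x": 3}),
])
print(result.inserted_count, result.modified_count, result.deleted_count)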

pool – Pool module for use with a MongoDB client.
class pymongo.pool.SocketInfo(sock, pool, ismaster, address)

Store a socket with some metadata.

Parameters:
  • sock: a raw socket object
  • pool: a Pool instance
  • ismaster: optional IsMaster instance, response to ismaster on sock
  • address: the server’s (host, port)
authenticate(credentials)

Log in to the server and store these credentials in authset.

Can raise ConnectionFailure or OperationFailure.

Parameters:
  • credentials: A MongoCredential.
check_auth(all_credentials)

Update this socket’s authentication.

Log in or out to bring this socket’s credentials up to date with those provided. Can raise ConnectionFailure or OperationFailure.

Parameters:
  • all_credentials: dict, maps auth source to MongoCredential.
command(dbname, spec, slave_ok=False, read_preference=Primary(), codec_options=CodecOptions(document_class=dict, tz_aware=False, uuid_representation=PYTHON_LEGACY, unicode_decode_error_handler='strict', tzinfo=None), check=True, allowable_errors=None, check_keys=False, read_concern=None, write_concern=None, parse_write_concern_error=False, collation=None, session=None, client=None, retryable_write=False)

Execute a command or raise an error.

Parameters:
  • dbname: name of the database on which to run the command
  • spec: a command document as a dict, SON, or mapping object
  • slave_ok: whether to set the SlaveOkay wire protocol bit
  • read_preference: a read preference
  • codec_options: a CodecOptions instance
  • check: raise OperationFailure if there are errors
  • allowable_errors: errors to ignore if check is True
  • check_keys: if True, check spec for invalid keys
  • read_concern: The read concern for this command.
  • write_concern: The write concern for this command.
  • parse_write_concern_error: Whether to parse the writeConcernError field in the command response.
  • collation: The collation for this command.
  • session: optional ClientSession instance.
  • client: optional MongoClient for gossipping $clusterTime.
  • retryable_write: True if this command is a retryable write.
legacy_write(request_id, msg, max_doc_size, with_last_error)

Send OP_INSERT, etc., optionally returning response as a dict.

Can raise ConnectionFailure or OperationFailure.

Parameters:
  • request_id: an int.
  • msg: bytes, an OP_INSERT, OP_UPDATE, or OP_DELETE message, perhaps with a getlasterror command appended.
  • max_doc_size: size in bytes of the largest document in msg.
  • with_last_error: True if a getlasterror command is appended.
receive_message(request_id)

Receive a raw BSON message or raise ConnectionFailure.

If any exception is raised, the socket is closed.

send_cluster_time(command, session, client)

Add cluster time for MongoDB >= 3.6.

send_message(message, max_doc_size)

Send a raw BSON message or raise ConnectionFailure.

If a network exception is raised, the socket is closed.

validate_session(client, session)

Validate this session before use with client.

Raises error if this session is logged in as a different user or the client is not the one that created the session.

write_command(request_id, msg)

Send “insert” etc. command, returning response as a dict.

Can raise ConnectionFailure or OperationFailure.

Parameters:
  • request_id: an int.
  • msg: bytes, the command message.
read_concern – Tools for working with read concern.

Tools for working with read concerns.

class pymongo.read_concern.ReadConcern(level=None)
Parameters:
  • level: (string) The read concern level specifies the level of isolation for read operations. For example, a read operation using a read concern level of majority will only return data that has been written to a majority of nodes. If the level is left unspecified, the server default will be used.

New in version 3.2.

document

The document representation of this read concern.

Note

ReadConcern is immutable. Mutating the value of document does not mutate this ReadConcern.

level

The read concern level.

ok_for_legacy

Return True if this read concern is compatible with old wire protocol versions.
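
As a usage sketch (database and collection names are assumed for illustration), a read concern is typically supplied when obtaining a database or collection object rather than per operation:

from pymongo import MongoClient
from pymongo.read_concern import ReadConcern

coll = MongoClient().test.get_collection(
    "my_collection", read_concern=ReadConcern("majority"))
doc = coll.find_one()  # returns only majority-committed data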

read_preferences – Utilities for choosing which member of a replica set to read from.

Utilities for choosing which member of a replica set to read from.

class pymongo.read_preferences.Primary

Primary read preference.

  • When directly connected to one mongod queries are allowed if the server is standalone or a replica set primary.
  • When connected to a mongos queries are sent to the primary of a shard.
  • When connected to a replica set queries are sent to the primary of the replica set.
document

Read preference as a document.

mode

The mode of this read preference instance.

name

The name of this read preference.

class pymongo.read_preferences.PrimaryPreferred(tag_sets=None, max_staleness=-1)

PrimaryPreferred read preference.

  • When directly connected to one mongod queries are allowed to standalone servers, to a replica set primary, or to replica set secondaries.
  • When connected to a mongos queries are sent to the primary of a shard if available, otherwise a shard secondary.
  • When connected to a replica set queries are sent to the primary if available, otherwise a secondary.
Parameters:
  • tag_sets: The tag_sets to use if the primary is not available.
  • max_staleness: (integer, in seconds) The maximum estimated length of time a replica set secondary can fall behind the primary in replication before it will no longer be selected for operations. Default -1, meaning no maximum. If it is set, it must be at least 90 seconds.
document

Read preference as a document.

max_staleness

The maximum estimated length of time (in seconds) a replica set secondary can fall behind the primary in replication before it will no longer be selected for operations, or -1 for no maximum.

min_wire_version

The wire protocol version the server must support.

Some read preferences impose version requirements on all servers (e.g. maxStalenessSeconds requires MongoDB 3.4 / maxWireVersion 5).

All servers’ maxWireVersion must be at least this read preference’s min_wire_version, or the driver raises ConfigurationError.

mode

The mode of this read preference instance.

mongos_mode

The mongos mode of this read preference.

name

The name of this read preference.

tag_sets

Set tag_sets to a list of dictionaries like [{'dc': 'ny'}] to read only from members whose dc tag has the value "ny". To specify a priority-order for tag sets, provide a list of tag sets: [{'dc': 'ny'}, {'dc': 'la'}, {}]. A final, empty tag set, {}, means “read from any member that matches the mode, ignoring tags.” The client tries each set of tags in turn until it finds a set of tags with at least one matching member.
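
For example, a minimal sketch (tag values and names are assumed for illustration) that reads from the primary when available, and otherwise from a secondary tagged {'dc': 'ny'} if one exists (the trailing {} allows any secondary):

from pymongo import MongoClient
from pymongo.read_preferences import PrimaryPreferred

pref = PrimaryPreferred(tag_sets=[{'dc': 'ny'}, {}], max_staleness=120)
coll = MongoClient().test.my_collection.with_options(read_preference=pref)
doc = coll.find_one()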

class pymongo.read_preferences.Secondary(tag_sets=None, max_staleness=-1)

Secondary read preference.

  • When directly connected to one mongod queries are allowed to standalone servers, to a replica set primary, or to replica set secondaries.
  • When connected to a mongos queries are distributed among shard secondaries. An error is raised if no secondaries are available.
  • When connected to a replica set queries are distributed among secondaries. An error is raised if no secondaries are available.
Parameters:
  • tag_sets: The tag_sets for this read preference.
  • max_staleness: (integer, in seconds) The maximum estimated length of time a replica set secondary can fall behind the primary in replication before it will no longer be selected for operations. Default -1, meaning no maximum. If it is set, it must be at least 90 seconds.
document

Read preference as a document.

max_staleness

The maximum estimated length of time (in seconds) a replica set secondary can fall behind the primary in replication before it will no longer be selected for operations, or -1 for no maximum.

min_wire_version

The wire protocol version the server must support.

Some read preferences impose version requirements on all servers (e.g. maxStalenessSeconds requires MongoDB 3.4 / maxWireVersion 5).

All servers’ maxWireVersion must be at least this read preference’s min_wire_version, or the driver raises ConfigurationError.

mode

The mode of this read preference instance.

mongos_mode

The mongos mode of this read preference.

name

The name of this read preference.

tag_sets

Set tag_sets to a list of dictionaries like [{'dc': 'ny'}] to read only from members whose dc tag has the value "ny". To specify a priority-order for tag sets, provide a list of tag sets: [{'dc': 'ny'}, {'dc': 'la'}, {}]. A final, empty tag set, {}, means “read from any member that matches the mode, ignoring tags.” The client tries each set of tags in turn until it finds a set of tags with at least one matching member.

class pymongo.read_preferences.SecondaryPreferred(tag_sets=None, max_staleness=-1)

SecondaryPreferred read preference.

  • When directly connected to one mongod queries are allowed to standalone servers, to a replica set primary, or to replica set secondaries.
  • When connected to a mongos queries are distributed among shard secondaries, or the shard primary if no secondary is available.
  • When connected to a replica set queries are distributed among secondaries, or the primary if no secondary is available.
Parameters:
  • tag_sets: The tag_sets for this read preference.
  • max_staleness: (integer, in seconds) The maximum estimated length of time a replica set secondary can fall behind the primary in replication before it will no longer be selected for operations. Default -1, meaning no maximum. If it is set, it must be at least 90 seconds.
document

Read preference as a document.

max_staleness

The maximum estimated length of time (in seconds) a replica set secondary can fall behind the primary in replication before it will no longer be selected for operations, or -1 for no maximum.

min_wire_version

The wire protocol version the server must support.

Some read preferences impose version requirements on all servers (e.g. maxStalenessSeconds requires MongoDB 3.4 / maxWireVersion 5).

All servers’ maxWireVersion must be at least this read preference’s min_wire_version, or the driver raises ConfigurationError.

mode

The mode of this read preference instance.

mongos_mode

The mongos mode of this read preference.

name

The name of this read preference.

tag_sets

Set tag_sets to a list of dictionaries like [{'dc': 'ny'}] to read only from members whose dc tag has the value "ny". To specify a priority-order for tag sets, provide a list of tag sets: [{'dc': 'ny'}, {'dc': 'la'}, {}]. A final, empty tag set, {}, means “read from any member that matches the mode, ignoring tags.” The client tries each set of tags in turn until it finds a set of tags with at least one matching member.

class pymongo.read_preferences.Nearest(tag_sets=None, max_staleness=-1)

Nearest read preference.

  • When directly connected to one mongod queries are allowed to standalone servers, to a replica set primary, or to replica set secondaries.
  • When connected to a mongos queries are distributed among all members of a shard.
  • When connected to a replica set queries are distributed among all members.
Parameters:
  • tag_sets: The tag_sets for this read preference.
  • max_staleness: (integer, in seconds) The maximum estimated length of time a replica set secondary can fall behind the primary in replication before it will no longer be selected for operations. Default -1, meaning no maximum. If it is set, it must be at least 90 seconds.
document

Read preference as a document.

max_staleness

The maximum estimated length of time (in seconds) a replica set secondary can fall behind the primary in replication before it will no longer be selected for operations, or -1 for no maximum.

min_wire_version

The wire protocol version the server must support.

Some read preferences impose version requirements on all servers (e.g. maxStalenessSeconds requires MongoDB 3.4 / maxWireVersion 5).

All servers’ maxWireVersion must be at least this read preference’s min_wire_version, or the driver raises ConfigurationError.

mode

The mode of this read preference instance.

mongos_mode

The mongos mode of this read preference.

name

The name of this read preference.

tag_sets

Set tag_sets to a list of dictionaries like [{'dc': 'ny'}] to read only from members whose dc tag has the value "ny". To specify a priority-order for tag sets, provide a list of tag sets: [{'dc': 'ny'}, {'dc': 'la'}, {}]. A final, empty tag set, {}, means “read from any member that matches the mode, ignoring tags.” The client tries each set of tags in turn until it finds a set of tags with at least one matching member.

class pymongo.read_preferences.ReadPreference

An enum that defines the read preference modes supported by PyMongo.

See High Availability and PyMongo for code examples.

A read preference is used in three cases:

MongoClient connected to a single mongod:

  • PRIMARY: Queries are allowed if the server is standalone or a replica set primary.
  • All other modes allow queries to standalone servers, to a replica set primary, or to replica set secondaries.

MongoClient initialized with the replicaSet option:

  • PRIMARY: Read from the primary. This is the default, and provides the strongest consistency. If no primary is available, raise AutoReconnect.
  • PRIMARY_PREFERRED: Read from the primary if available, or if there is none, read from a secondary.
  • SECONDARY: Read from a secondary. If no secondary is available, raise AutoReconnect.
  • SECONDARY_PREFERRED: Read from a secondary if available, otherwise from the primary.
  • NEAREST: Read from any member.

MongoClient connected to a mongos, with a sharded cluster of replica sets:

  • PRIMARY: Read from the primary of the shard, or raise OperationFailure if there is none. This is the default.
  • PRIMARY_PREFERRED: Read from the primary of the shard, or if there is none, read from a secondary of the shard.
  • SECONDARY: Read from a secondary of the shard, or raise OperationFailure if there is none.
  • SECONDARY_PREFERRED: Read from a secondary of the shard if available, otherwise from the shard primary.
  • NEAREST: Read from any shard member.
PRIMARY = Primary()
PRIMARY_PREFERRED = PrimaryPreferred(tag_sets=None, max_staleness=-1)
SECONDARY = Secondary(tag_sets=None, max_staleness=-1)
SECONDARY_PREFERRED = SecondaryPreferred(tag_sets=None, max_staleness=-1)
NEAREST = Nearest(tag_sets=None, max_staleness=-1)
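
As a usage sketch (the replica set name is assumed for illustration), one of these enum members can be used wherever a read preference instance is expected:

from pymongo import MongoClient, ReadPreference

client = MongoClient(replicaSet='rs0')
coll = client.test.my_collection.with_options(
    read_preference=ReadPreference.SECONDARY_PREFERRED)
doc = coll.find_one()
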
results – Result class definitions

Result class definitions.

class pymongo.results.BulkWriteResult(bulk_api_result, acknowledged)

Create a BulkWriteResult instance.

Parameters:
  • bulk_api_result: A result dict from the bulk API
  • acknowledged: Was this write result acknowledged? If False then all properties of this object will raise InvalidOperation.
acknowledged

Is this the result of an acknowledged write operation?

The acknowledged attribute will be False when using WriteConcern(w=0), otherwise True.

Note

If the acknowledged attribute is False all other attributes of this class will raise InvalidOperation when accessed. Values for other attributes cannot be determined if the write operation was unacknowledged.

See also

WriteConcern

bulk_api_result

The raw bulk API result.

deleted_count

The number of documents deleted.

inserted_count

The number of documents inserted.

matched_count

The number of documents matched for an update.

modified_count

The number of documents modified.

Note

modified_count is only reported by MongoDB 2.6 and later. When connected to an earlier server version, or in certain mixed version sharding configurations, this attribute will be set to None.

upserted_count

The number of documents upserted.

upserted_ids

A map of operation index to the _id of the upserted document.

class pymongo.results.DeleteResult(raw_result, acknowledged)

The return type for delete_one() and delete_many().

acknowledged

Is this the result of an acknowledged write operation?

The acknowledged attribute will be False when using WriteConcern(w=0), otherwise True.

Note

If the acknowledged attribute is False all other attributes of this class will raise InvalidOperation when accessed. Values for other attributes cannot be determined if the write operation was unacknowledged.

See also

WriteConcern

deleted_count

The number of documents deleted.

raw_result

The raw result document returned by the server.

class pymongo.results.InsertManyResult(inserted_ids, acknowledged)

The return type for insert_many().

acknowledged

Is this the result of an acknowledged write operation?

The acknowledged attribute will be False when using WriteConcern(w=0), otherwise True.

Note

If the acknowledged attribute is False all other attributes of this class will raise InvalidOperation when accessed. Values for other attributes cannot be determined if the write operation was unacknowledged.

See also

WriteConcern

inserted_ids

A list of _ids of the inserted documents, in the order provided.

Note

If False is passed for the ordered parameter to insert_many() the server may have inserted the documents in a different order than what is presented here.

class pymongo.results.InsertOneResult(inserted_id, acknowledged)

The return type for insert_one().

acknowledged

Is this the result of an acknowledged write operation?

The acknowledged attribute will be False when using WriteConcern(w=0), otherwise True.

Note

If the acknowledged attribute is False all other attributes of this class will raise InvalidOperation when accessed. Values for other attributes cannot be determined if the write operation was unacknowledged.

See also

WriteConcern

inserted_id

The inserted document’s _id.

class pymongo.results.UpdateResult(raw_result, acknowledged)

The return type for update_one(), update_many(), and replace_one().

acknowledged

Is this the result of an acknowledged write operation?

The acknowledged attribute will be False when using WriteConcern(w=0), otherwise True.

Note

If the acknowledged attribute is False all other attributes of this class will raise InvalidOperation when accessed. Values for other attributes cannot be determined if the write operation was unacknowledged.

See also

WriteConcern

matched_count

The number of documents matched for this update.

modified_count

The number of documents modified.

Note

modified_count is only reported by MongoDB 2.6 and later. When connected to an earlier server version, or in certain mixed version sharding configurations, this attribute will be set to None.

raw_result

The raw result document returned by the server.

upserted_id

The _id of the inserted document if an upsert took place. Otherwise None.
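
For example, a minimal sketch (collection and field names are assumed for illustration) that inspects an UpdateResult, guarding attribute access behind acknowledged:

from pymongo import MongoClient

coll = MongoClient().test.my_collection
result = coll.update_one({"x": 1}, {"$set": {"x": 2}}, upsert=True)
if result.acknowledged:
    print(result.matched_count, result.modified_count, result.upserted_id)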

son_manipulator – Manipulators that can edit SON documents as they are saved or retrieved

DEPRECATED: Manipulators that can edit SON objects as they enter and exit a database.

The SONManipulator API has limitations as a technique for transforming your data. Instead, it is more flexible and straightforward to transform outgoing documents in your own code before passing them to PyMongo, and transform incoming documents after receiving them from PyMongo. SON Manipulators will be removed from PyMongo in 4.0.

PyMongo does not apply SON manipulators to documents passed to the modern methods bulk_write(), insert_one(), insert_many(), update_one(), or update_many(). SON manipulators are not applied to documents returned by the modern methods find_one_and_delete(), find_one_and_replace(), and find_one_and_update().

class pymongo.son_manipulator.AutoReference(db)

Transparently reference and de-reference already saved embedded objects.

This manipulator should probably only be used when the NamespaceInjector is also being used, otherwise it doesn’t make too much sense - documents can only be auto-referenced if they have an _ns field.

NOTE: this will behave poorly if you have a circular reference.

TODO: this only works for documents that are in the same database. To fix this we’ll need to add a DatabaseInjector that adds _db and then make use of the optional database support for DBRefs.

transform_incoming(son, collection)

Replace embedded documents with DBRefs.

transform_outgoing(son, collection)

Replace DBRefs with embedded documents.

will_copy()

We need to copy so the user’s document doesn’t get transformed refs.

class pymongo.son_manipulator.NamespaceInjector

A son manipulator that adds the _ns field.

transform_incoming(son, collection)

Add the _ns field to the incoming object

class pymongo.son_manipulator.ObjectIdInjector

A son manipulator that adds the _id field if it is missing.

Changed in version 2.7: ObjectIdInjector is no longer used by PyMongo, but remains in this module for backwards compatibility.

transform_incoming(son, collection)

Add an _id field if it is missing.

class pymongo.son_manipulator.ObjectIdShuffler

A son manipulator that moves _id to the first position.

transform_incoming(son, collection)

Move _id to the front if it’s there.

will_copy()

We need to copy to be sure that we are dealing with SON, not a dict.

class pymongo.son_manipulator.SONManipulator

A base son manipulator.

This manipulator just saves and restores objects without changing them.

transform_incoming(son, collection)

Manipulate an incoming SON object.

Parameters:
  • son: the SON object to be inserted into the database
  • collection: the collection the object is being inserted into
transform_outgoing(son, collection)

Manipulate an outgoing SON object.

Parameters:
  • son: the SON object being retrieved from the database
  • collection: the collection this object was stored in
will_copy()

Will this SON manipulator make a copy of the incoming document?

Derived classes that do need to make a copy should override this method, returning True instead of False. All non-copying manipulators will be applied first (so that the user’s document will be updated appropriately), followed by copying manipulators.
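
As an illustration of this (deprecated) API, a minimal manipulator sketch (the field name is assumed) that stamps every incoming document:

import datetime

from pymongo import MongoClient
from pymongo.son_manipulator import SONManipulator

class TimestampInjector(SONManipulator):
    def transform_incoming(self, son, collection):
        son["last_modified"] = datetime.datetime.utcnow()
        return son

db = MongoClient().test
db.add_son_manipulator(TimestampInjector())  # deprecated API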

cursor_manager – Managers to handle when cursors are killed after being closed

DEPRECATED - A manager to handle when cursors are killed after they are closed.

New cursor managers should be defined as subclasses of CursorManager and can be installed on a client by calling set_cursor_manager().

Changed in version 3.3: Deprecated, for real this time.

Changed in version 3.0: Undeprecated. close() now requires an address argument. The BatchCursorManager class is removed.

class pymongo.cursor_manager.CursorManager(client)

Instantiate the manager.

Parameters:
  • client: a MongoClient
close(cursor_id, address)

Kill a cursor.

Raises TypeError if cursor_id is not an instance of (int, long).

Parameters:
  • cursor_id: cursor id to close
  • address: the cursor’s server’s (host, port) pair

Changed in version 3.0: Now requires an address argument.
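
For illustration of this (deprecated) interface, a minimal subclass sketch that logs each close before delegating to the default behavior:

from pymongo import MongoClient
from pymongo.cursor_manager import CursorManager

class LoggingCursorManager(CursorManager):
    def close(self, cursor_id, address):
        print("closing cursor %d on %r" % (cursor_id, address))
        super(LoggingCursorManager, self).close(cursor_id, address)

client = MongoClient()
client.set_cursor_manager(LoggingCursorManager)  # deprecated API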

uri_parser – Tools to parse and validate a MongoDB URI

Tools to parse and validate a MongoDB URI.

pymongo.uri_parser.parse_host(entity, default_port=27017)

Validates a host string.

Returns a 2-tuple of host followed by port where port is default_port if it wasn’t specified in the string.

Parameters:
  • entity: A host or host:port string where host could be a hostname or IP address.
  • default_port: The port number to use when one wasn’t specified in entity.
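
A quick interactive sketch (the hostname is a placeholder):

>>> from pymongo import uri_parser
>>> uri_parser.parse_host('example.com')
('example.com', 27017)
>>> uri_parser.parse_host('example.com:27018')
('example.com', 27018)
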
pymongo.uri_parser.parse_ipv6_literal_host(entity, default_port)

Validates an IPv6 literal host:port string.

Returns a 2-tuple of IPv6 literal followed by port where port is default_port if it wasn’t specified in entity.

Parameters:
  • entity: A string that represents an IPv6 literal enclosed in braces (e.g. ‘[::1]’ or ‘[::1]:27017’).
  • default_port: The port number to use when one wasn’t specified in entity.
pymongo.uri_parser.parse_uri(uri, default_port=27017, validate=True, warn=False)

Parse and validate a MongoDB URI.

Returns a dict of the form:

{
    'nodelist': <list of (host, port) tuples>,
    'username': <username> or None,
    'password': <password> or None,
    'database': <database name> or None,
    'collection': <collection name> or None,
    'options': <dict of MongoDB URI options>
}

If the URI scheme is “mongodb+srv://”, DNS SRV and TXT lookups will be done to build nodelist and options.

Parameters:
  • uri: The MongoDB URI to parse.
  • default_port: The port number to use when one wasn’t specified for a host in the URI.
  • validate: If True (the default), validate and normalize all options.
  • warn (optional): When validating, if True then will warn the user then ignore any invalid options or values. If False, validation will error when options are unsupported or values are invalid.

Changed in version 3.6: Added support for mongodb+srv:// URIs

Changed in version 3.5: Return the original value of the readPreference MongoDB URI option instead of the validated read preference mode.

Changed in version 3.1: warn added so invalid options can be ignored.
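
A quick interactive sketch (hostnames and credentials are placeholders):

>>> from pymongo import uri_parser
>>> parsed = uri_parser.parse_uri(
...     'mongodb://user:pass@host1,host2:27018/mydb?replicaSet=rs0')
>>> parsed['nodelist']
[('host1', 27017), ('host2', 27018)]
>>> parsed['username']
'user'
>>> parsed['database']
'mydb'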

pymongo.uri_parser.parse_userinfo(userinfo)

Validates the format of user information in a MongoDB URI. Reserved characters like ‘:’, ‘/’, ‘+’ and ‘@’ must be escaped following RFC 3986.

Returns a 2-tuple containing the unescaped username followed by the unescaped password.

Parameters:
  • userinfo: A string of the form <username>:<password>

Changed in version 2.2: Now uses urllib.unquote_plus so + characters must be escaped.

pymongo.uri_parser.split_hosts(hosts, default_port=27017)

Takes a string of the form host1[:port],host2[:port]… and splits it into (host, port) tuples. If [:port] isn’t present the default_port is used.

Returns a set of 2-tuples containing the host name (or IP) followed by port number.

Parameters:
  • hosts: A string of the form host1[:port],host2[:port],…
  • default_port: The port number to use when one wasn’t specified for a host.
pymongo.uri_parser.split_options(opts, validate=True, warn=False)

Takes the options portion of a MongoDB URI, validates each option and returns the options in a dictionary.

Parameters:
  • opts: A string representing MongoDB URI options.
  • validate: If True (the default), validate and normalize all options.
  • warn (optional): If True, warn about then ignore any invalid options or values. If False, invalid options or values cause errors.
pymongo.uri_parser.validate_options(opts, warn=False)

Validates and normalizes options passed in a MongoDB URI.

Returns a new dictionary of validated and normalized options. If warn is False then errors will be raised for invalid options; otherwise they will be ignored and a warning will be issued.

Parameters:
  • opts: A dict of MongoDB URI options.
  • warn (optional): If True then warnings will be logged and invalid options will be ignored. Otherwise invalid options will cause errors.
write_concern – Tools for specifying write concern

Tools for working with write concerns.

class pymongo.write_concern.WriteConcern(w=None, wtimeout=None, j=None, fsync=None)
Parameters:
  • w: (integer or string) Used with replication, write operations will block until they have been replicated to the specified number or tagged set of servers. w=<integer> always includes the replica set primary (e.g. w=3 means write to the primary and wait until replicated to two secondaries). w=0 disables acknowledgement of write operations and can not be used with other write concern options.
  • wtimeout: (integer) Used in conjunction with w. Specify a value in milliseconds to control how long to wait for write propagation to complete. If replication does not complete in the given timeframe, a timeout exception is raised.
  • j: If True block until write operations have been committed to the journal. Cannot be used in combination with fsync. Prior to MongoDB 2.6 this option was ignored if the server was running without journaling. Starting with MongoDB 2.6 write operations will fail with an exception if this option is used when the server is running without journaling.
  • fsync: If True and the server is running without journaling, blocks until the server has synced all data files to disk. If the server is running with journaling, this acts the same as the j option, blocking until write operations have been committed to the journal. Cannot be used in combination with j.
acknowledged

If True write operations will wait for acknowledgement before returning.

document

The document representation of this write concern.

Note

WriteConcern is immutable. Mutating the value of document does not mutate this WriteConcern.
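
As a usage sketch (database and collection names are assumed for illustration), a write concern is typically attached to a database or collection object:

from pymongo import MongoClient
from pymongo.write_concern import WriteConcern

coll = MongoClient().test.get_collection(
    "my_collection", write_concern=WriteConcern(w=2, wtimeout=5000, j=True))
# Blocks until the insert is journaled and replicated to two members,
# or raises if that does not happen within 5 seconds.
coll.insert_one({"x": 1})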

gridfs – Tools for working with GridFS

GridFS is a specification for storing large objects in Mongo.

The gridfs package is an implementation of GridFS on top of pymongo, exposing a file-like interface.

See also

The MongoDB documentation on

gridfs

class gridfs.GridFS(database, collection='fs')

Create a new instance of GridFS.

Raises TypeError if database is not an instance of Database.

Parameters:
  • database: database to use
  • collection (optional): root collection to use

Changed in version 3.1: Indexes are only ensured on the first write to the DB.

Changed in version 3.0: database must use an acknowledged write_concern

See also

The MongoDB documentation on

gridfs

delete(file_id, session=None)

Delete a file from GridFS by "_id".

Deletes all data belonging to the file with "_id": file_id.

Warning

Any processes/threads reading from the file while this method is executing will likely see an invalid/corrupt file. Care should be taken to avoid concurrent reads to a file while it is being deleted.

Note

Deletes of non-existent files are considered successful since the end result is the same: no file with that _id remains.

Parameters:
  • file_id: "_id" of the file to delete
  • session (optional): a ClientSession

Changed in version 3.6: Added session parameter.

Changed in version 3.1: delete no longer ensures indexes.

exists(document_or_id=None, session=None, **kwargs)

Check if a file exists in this instance of GridFS.

The file to check for can be specified by the value of its _id key, or by passing in a query document. A query document can be passed in as a dictionary, or by using keyword arguments. Thus, the following three calls are equivalent:

>>> fs.exists(file_id)
>>> fs.exists({"_id": file_id})
>>> fs.exists(_id=file_id)

As are the following two calls:

>>> fs.exists({"filename": "mike.txt"})
>>> fs.exists(filename="mike.txt")

And the following two:

>>> fs.exists({"foo": {"$gt": 12}})
>>> fs.exists(foo={"$gt": 12})

Returns True if a matching file exists, False otherwise. Calls to exists() will not automatically create appropriate indexes; application developers should be sure to create indexes if needed and as appropriate.

Parameters:
  • document_or_id (optional): query document, or _id of the document to check for
  • session (optional): a ClientSession
  • **kwargs (optional): keyword arguments are used as a query document, if they’re present.

Changed in version 3.6: Added session parameter.

find(*args, **kwargs)

Query GridFS for files.

Returns a cursor that iterates across files matching arbitrary queries on the files collection. Can be combined with other modifiers for additional control. For example:

for grid_out in fs.find({"filename": "lisa.txt"},
                        no_cursor_timeout=True):
    data = grid_out.read()

would iterate through all versions of “lisa.txt” stored in GridFS. Note that setting no_cursor_timeout to True may be important to prevent the cursor from timing out during long multi-file processing work.

As another example, the call:

most_recent_three = fs.find().sort("uploadDate", -1).limit(3)

would return a cursor to the three most recently uploaded files in GridFS.

Follows a similar interface to find() in Collection.

If a ClientSession is passed to find(), all returned GridOut instances are associated with that session.

Parameters:
  • filter (optional): a SON object specifying elements which must be present for a document to be included in the result set
  • skip (optional): the number of files to omit (from the start of the result set) when returning the results
  • limit (optional): the maximum number of results to return
  • no_cursor_timeout (optional): if False (the default), any returned cursor is closed by the server after 10 minutes of inactivity. If set to True, the returned cursor will never time out on the server. Care should be taken to ensure that cursors with no_cursor_timeout turned on are properly closed.
  • sort (optional): a list of (key, direction) pairs specifying the sort order for this query. See sort() for details.

Raises TypeError if any of the arguments are of improper type. Returns an instance of GridOutCursor corresponding to this query.

Changed in version 3.0: Removed the read_preference, tag_sets, and secondary_acceptable_latency_ms options.

New in version 2.7.

See also

The MongoDB documentation on

find

find_one(filter=None, session=None, *args, **kwargs)

Get a single file from gridfs.

All arguments to find() are also valid arguments for find_one(), although any limit argument will be ignored. Returns a single GridOut, or None if no matching file is found. For example:

file = fs.find_one({"filename": "lisa.txt"})
Parameters:
  • filter (optional): a dictionary specifying the query to be performed, OR any other type, which will be used as the value of an "_id" query in the files collection.
  • *args (optional): any additional positional arguments are the same as the arguments to find().
  • session (optional): a ClientSession
  • **kwargs (optional): any additional keyword arguments are the same as the arguments to find().

Changed in version 3.6: Added session parameter.

get(file_id, session=None)

Get a file from GridFS by "_id".

Returns an instance of GridOut, which provides a file-like interface for reading.

Parameters:
  • file_id: "_id" of the file to get
  • session (optional): a ClientSession

Changed in version 3.6: Added session parameter.

get_last_version(filename=None, session=None, **kwargs)

Get the most recent version of a file in GridFS by "filename" or metadata fields.

Equivalent to calling get_version() with the default version (-1).

Parameters:
  • filename: "filename" of the file to get, or None
  • session (optional): a ClientSession
  • **kwargs (optional): find files by custom metadata.

Changed in version 3.6: Added session parameter.

get_version(filename=None, version=-1, session=None, **kwargs)

Get a file from GridFS by "filename" or metadata fields.

Returns a version of the file in GridFS whose filename matches filename and whose metadata fields match the supplied keyword arguments, as an instance of GridOut.

Version numbering is a convenience atop the GridFS API provided by MongoDB. If more than one file matches the query (either by filename alone, by metadata fields, or by a combination of both), then version -1 will be the most recently uploaded matching file, -2 the second most recently uploaded, etc. Version 0 will be the first version uploaded, 1 the second version, etc. So if three versions have been uploaded, then version 0 is the same as version -3, version 1 is the same as version -2, and version 2 is the same as version -1.

Raises NoFile if no such version of that file exists.

Parameters:
  • filename: "filename" of the file to get, or None
  • version (optional): version of the file to get (defaults to -1, the most recent version uploaded)
  • session (optional): a ClientSession
  • **kwargs (optional): find files by custom metadata.

Changed in version 3.6: Added session parameter.

Changed in version 3.1: get_version no longer ensures indexes.
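
A short sketch of the numbering (the filename is assumed, with at least three versions already uploaded):

import gridfs
from pymongo import MongoClient

fs = gridfs.GridFS(MongoClient().test)
oldest = fs.get_version("hello.txt", 0)     # the first version uploaded
previous = fs.get_version("hello.txt", -2)  # the second most recent version
latest = fs.get_version("hello.txt")        # the most recent (default -1)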

list(session=None)

List the names of all files stored in this instance of GridFS.

Parameters:
  • session (optional): a ClientSession

Changed in version 3.6: Added session parameter.

Changed in version 3.1: list no longer ensures indexes.

new_file(**kwargs)

Create a new file in GridFS.

Returns a new GridIn instance to which data can be written. Any keyword arguments will be passed through to GridIn().

If the "_id" of the file is manually specified, it must not already exist in GridFS. Otherwise FileExists is raised.

Parameters:
  • **kwargs (optional): keyword arguments for file creation
put(data, **kwargs)

Put data in GridFS as a new file.

Equivalent to doing:

try:
    f = new_file(**kwargs)
    f.write(data)
finally:
    f.close()

data can be either an instance of str (bytes in python 3) or a file-like object providing a read() method. If an encoding keyword argument is passed, data can also be a unicode (str in python 3) instance, which will be encoded as encoding before being written. Any keyword arguments will be passed through to the created file - see GridIn() for possible arguments. Returns the "_id" of the created file.

If the "_id" of the file is manually specified, it must not already exist in GridFS. Otherwise FileExists is raised.

Parameters:
  • data: data to be written as a file.
  • **kwargs (optional): keyword arguments for file creation

Changed in version 3.0: w=0 writes to GridFS are now prohibited.
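
A minimal round-trip sketch (database and file names are assumed for illustration):

import gridfs
from pymongo import MongoClient

fs = gridfs.GridFS(MongoClient().test)
file_id = fs.put(b"hello gridfs", filename="hello.txt")
assert fs.get(file_id).read() == b"hello gridfs"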

class gridfs.GridFSBucket(db, bucket_name='fs', chunk_size_bytes=261120, write_concern=None, read_preference=None)

Create a new instance of GridFSBucket.

Raises TypeError if database is not an instance of Database.

Raises ConfigurationError if write_concern is not acknowledged.

Parameters:
  • database: database to use.
  • bucket_name (optional): The name of the bucket. Defaults to ‘fs’.
  • chunk_size_bytes (optional): The chunk size in bytes. Defaults to 255KB.
  • write_concern (optional): The WriteConcern to use. If None (the default) db.write_concern is used.
  • read_preference (optional): The read preference to use. If None (the default) db.read_preference is used.

New in version 3.1.

See also

The MongoDB documentation on

gridfs

delete(file_id, session=None)

Given a file_id, delete the stored file’s files collection document and associated chunks from a GridFS bucket.

For example:

my_db = MongoClient().test
fs = GridFSBucket(my_db)
# Get _id of file to delete
file_id = fs.upload_from_stream("test_file", "data I want to store!")
fs.delete(file_id)

Raises NoFile if no file with file_id exists.

Parameters:
  • file_id: The _id of the file to be deleted.
  • session (optional): a ClientSession

Changed in version 3.6: Added session parameter.

download_to_stream(file_id, destination, session=None)

Downloads the contents of the stored file specified by file_id and writes the contents to destination.

For example:

my_db = MongoClient().test
fs = GridFSBucket(my_db)
# Get _id of file to read
file_id = fs.upload_from_stream("test_file", "data I want to store!")
# Get file to write to
file = open('myfile','wb+')
fs.download_to_stream(file_id, file)
file.seek(0)
contents = file.read()

Raises NoFile if no file with file_id exists.

Parameters:
  • file_id: The _id of the file to be downloaded.
  • destination: a file-like object implementing write().
  • session (optional): a ClientSession

Changed in version 3.6: Added session parameter.

download_to_stream_by_name(filename, destination, revision=-1, session=None)

Write the contents of filename (with optional revision) to destination.

For example:

my_db = MongoClient().test
fs = GridFSBucket(my_db)
# Get file to write to
file = open('myfile','wb')
fs.download_to_stream_by_name("test_file", file)

Raises NoFile if no such version of that file exists.

Raises ValueError if filename is not a string.

Parameters:
  • filename: The name of the file to read from.
  • destination: A file-like object that implements write().
  • revision (optional): Which revision (documents with the same filename and different uploadDate) of the file to retrieve. Defaults to -1 (the most recent revision).
  • session (optional): a ClientSession
Note

Revision numbers are defined as follows:

  • 0 = the original stored file
  • 1 = the first revision
  • 2 = the second revision
  • etc…
  • -2 = the second most recent revision
  • -1 = the most recent revision

Changed in version 3.6: Added session parameter.

find(*args, **kwargs)

Find and return the files collection documents that match the filter.

Returns a cursor that iterates across files matching arbitrary queries on the files collection. Can be combined with other modifiers for additional control.

For example:

for grid_data in fs.find({"filename": "lisa.txt"},
                        no_cursor_timeout=True):
    data = grid_data.read()

would iterate through all versions of “lisa.txt” stored in GridFS. Note that setting no_cursor_timeout to True may be important to prevent the cursor from timing out during long multi-file processing work.

As another example, the call:

most_recent_three = fs.find().sort("uploadDate", -1).limit(3)

would return a cursor to the three most recently uploaded files in GridFS.

Follows a similar interface to find() in Collection.

If a ClientSession is passed to find(), all returned GridOut instances are associated with that session.

Parameters:
  • filter: Search query.
  • batch_size (optional): The number of documents to return per batch.
  • limit (optional): The maximum number of documents to return.
  • no_cursor_timeout (optional): The server normally times out idle cursors after an inactivity period (10 minutes) to prevent excess memory use. Set this option to True to prevent that.
  • skip (optional): The number of documents to skip before returning.
  • sort (optional): The order by which to sort results. Defaults to None.
open_download_stream(file_id, session=None)

Opens a Stream from which the application can read the contents of the stored file specified by file_id.

For example:

my_db = MongoClient().test
fs = GridFSBucket(my_db)
# get _id of file to read.
file_id = fs.upload_from_stream("test_file", "data I want to store!")
grid_out = fs.open_download_stream(file_id)
contents = grid_out.read()

Returns an instance of GridOut.

Raises NoFile if no file with file_id exists.

Parameters:
  • file_id: The _id of the file to be downloaded.
  • session (optional): a ClientSession

Changed in version 3.6: Added session parameter.

open_download_stream_by_name(filename, revision=-1, session=None)

Opens a Stream from which the application can read the contents of filename and optional revision.

For example:

my_db = MongoClient().test
fs = GridFSBucket(my_db)
grid_out = fs.open_download_stream_by_name("test_file")
contents = grid_out.read()

Returns an instance of GridOut.

Raises NoFile if no such version of that file exists.

Raises ValueError if filename is not a string.

Parameters:
  • filename: The name of the file to read from.
  • revision (optional): Which revision (documents with the same filename and different uploadDate) of the file to retrieve. Defaults to -1 (the most recent revision).
  • session (optional): a ClientSession
Note

Revision numbers are defined as follows:

  • 0 = the original stored file
  • 1 = the first revision
  • 2 = the second revision
  • etc…
  • -2 = the second most recent revision
  • -1 = the most recent revision

Changed in version 3.6: Added session parameter.

open_upload_stream(filename, chunk_size_bytes=None, metadata=None, session=None)

Opens a Stream that the application can write the contents of the file to.

The user must specify the filename, and can choose to add any additional information in the metadata field of the file document or modify the chunk size. For example:

my_db = MongoClient().test
fs = GridFSBucket(my_db)
grid_in = fs.open_upload_stream(
      "test_file", chunk_size_bytes=4,
      metadata={"contentType": "text/plain"})
grid_in.write("data I want to store!")
grid_in.close()  # uploaded on close

Returns an instance of GridIn.

Raises ValueError if filename is not a string.

Parameters:
  • filename: The name of the file to upload.
  • chunk_size_bytes (optional): The number of bytes per chunk of this file. Defaults to the chunk_size_bytes in GridFSBucket.
  • metadata (optional): User data for the ‘metadata’ field of the files collection document. If not provided the metadata field will be omitted from the files collection document.
  • session (optional): a ClientSession

Changed in version 3.6: Added session parameter.

open_upload_stream_with_id(file_id, filename, chunk_size_bytes=None, metadata=None, session=None)

Opens a Stream that the application can write the contents of the file to.

The user must specify the file id and filename, and can choose to add any additional information in the metadata field of the file document or modify the chunk size. For example:

my_db = MongoClient().test
fs = GridFSBucket(my_db)
grid_in = fs.open_upload_stream_with_id(
      ObjectId(),
      "test_file",
      chunk_size_bytes=4,
      metadata={"contentType": "text/plain"})
grid_in.write("data I want to store!")
grid_in.close()  # uploaded on close

Returns an instance of GridIn.

Raises ValueError if filename is not a string.

Parameters:
  • file_id: The id to use for this file. The id must not have already been used for another file.
  • filename: The name of the file to upload.
  • chunk_size_bytes (optional): The number of bytes per chunk of this file. Defaults to the chunk_size_bytes in GridFSBucket.
  • metadata (optional): User data for the ‘metadata’ field of the files collection document. If not provided the metadata field will be omitted from the files collection document.
  • session (optional): a ClientSession

Changed in version 3.6: Added session parameter.

rename(file_id, new_filename, session=None)

Renames the stored file with the specified file_id.

For example:

my_db = MongoClient().test
fs = GridFSBucket(my_db)
# Get _id of file to rename
file_id = fs.upload_from_stream("test_file", "data I want to store!")
fs.rename(file_id, "new_test_name")

Raises NoFile if no file with file_id exists.

Parameters:
  • file_id: The _id of the file to be renamed.
  • new_filename: The new name of the file.
  • session (optional): a ClientSession

Changed in version 3.6: Added session parameter.

upload_from_stream(filename, source, chunk_size_bytes=None, metadata=None, session=None)

Uploads a user file to a GridFS bucket.

Reads the contents of the user file from source and uploads it to the file filename. Source can be a string or file-like object. For example:

my_db = MongoClient().test
fs = GridFSBucket(my_db)
file_id = fs.upload_from_stream(
    "test_file",
    "data I want to store!",
    chunk_size_bytes=4,
    metadata={"contentType": "text/plain"})

Returns the _id of the uploaded file.

Raises ValueError if filename is not a string.

Parameters:
  • filename: The name of the file to upload.
  • source: The source stream of the content to be uploaded. Must be a file-like object that implements read() or a string.
  • chunk_size_bytes (optional): The number of bytes per chunk of this file. Defaults to the chunk_size_bytes of GridFSBucket.
  • metadata (optional): User data for the ‘metadata’ field of the files collection document. If not provided the metadata field will be omitted from the files collection document.
  • session (optional): a ClientSession

Changed in version 3.6: Added session parameter.

upload_from_stream_with_id(file_id, filename, source, chunk_size_bytes=None, metadata=None, session=None)

Uploads a user file to a GridFS bucket with a custom file id.

Reads the contents of the user file from source and uploads it to the file filename. Source can be a string or file-like object. For example:

my_db = MongoClient().test
fs = GridFSBucket(my_db)
file_id = ObjectId()
fs.upload_from_stream_with_id(
    file_id,
    "test_file",
    "data I want to store!",
    chunk_size_bytes=4,
    metadata={"contentType": "text/plain"})

Raises ValueError if filename is not a string.

Parameters:
  • file_id: The id to use for this file. The id must not have already been used for another file.
  • filename: The name of the file to upload.
  • source: The source stream of the content to be uploaded. Must be a file-like object that implements read() or a string.
  • chunk_size_bytes (optional): The number of bytes per chunk of this file. Defaults to the chunk_size_bytes of GridFSBucket.
  • metadata (optional): User data for the ‘metadata’ field of the files collection document. If not provided the metadata field will be omitted from the files collection document.
  • session (optional): a ClientSession

Changed in version 3.6: Added session parameter.

Sub-modules:

errors – Exceptions raised by the gridfs package

Exceptions raised by the gridfs package

exception gridfs.errors.CorruptGridFile

Raised when a file in GridFS is malformed.

exception gridfs.errors.FileExists

Raised when trying to create a file that already exists.

exception gridfs.errors.GridFSError

Base class for all GridFS exceptions.

exception gridfs.errors.NoFile

Raised when trying to read from a non-existent file.

grid_file – Tools for representing files stored in GridFS

Tools for representing files stored in GridFS.

class gridfs.grid_file.GridIn(root_collection, session=None, **kwargs)

Write a file to GridFS

Application developers should generally not need to instantiate this class directly - instead see the methods provided by GridFS.

Raises TypeError if root_collection is not an instance of Collection.

Any of the file level options specified in the GridFS Spec may be passed as keyword arguments. Any additional keyword arguments will be set as additional fields on the file document. Valid keyword arguments include:

  • "_id": unique ID for this file (default: ObjectId) - this "_id" must not have already been used for another file
  • "filename": human name for the file
  • "contentType" or "content_type": valid mime-type for the file
  • "chunkSize" or "chunk_size": size of each of the chunks, in bytes (default: 255 kb)
  • "encoding": encoding used for this file. In Python 2, any unicode that is written to the file will be converted to a str. In Python 3, any str that is written to the file will be converted to bytes.
Parameters:
  • root_collection: root collection to write to
  • session (optional): a ClientSession to use for all commands
  • **kwargs (optional): file level options (see above)

Changed in version 3.6: Added session parameter.

Changed in version 3.0: root_collection must use an acknowledged write_concern

_id

The '_id' value for this file.

This attribute is read-only.

abort()

Remove all chunks/files that may have been uploaded and close.

chunk_size

Chunk size for this file.

This attribute is read-only.

close()

Flush the file and close it.

A closed file cannot be written any more. Calling close() more than once is allowed.

closed

Is this file closed?

content_type

Mime-type for this file.

filename

Name of this file.

length

Length (in bytes) of this file.

This attribute is read-only and can only be read after close() has been called.

md5

MD5 of the contents of this file (generated on the server).

This attribute is read-only and can only be read after close() has been called.

name

Alias for filename.

upload_date

Date that this file was uploaded.

This attribute is read-only and can only be read after close() has been called.

write(data)

Write data to the file. There is no return value.

data can be either a string of bytes or a file-like object (implementing read()). If the file has an encoding attribute, data can also be a unicode (str in python 3) instance, which will be encoded as encoding before being written.

Due to buffering, the data may not actually be written to the database until the close() method is called. Raises ValueError if this file is already closed. Raises TypeError if data is not an instance of str (bytes in python 3), a file-like object, or an instance of unicode (str in python 3). Unicode data is only allowed if the file has an encoding attribute.

Parameters:
  • data: string of bytes or file-like object to be written to the file
writelines(sequence)

Write a sequence of strings to the file.

Does not add separators.

class gridfs.grid_file.GridOut(root_collection, file_id=None, file_document=None, session=None)

Read a file from GridFS

Application developers should generally not need to instantiate this class directly - instead see the methods provided by GridFS.

Either file_id or file_document must be specified; file_document is given priority if present. Raises TypeError if root_collection is not an instance of Collection.

Parameters:
  • root_collection: root collection to read from
  • file_id (optional): value of "_id" for the file to read
  • file_document (optional): file document from root_collection.files
  • session (optional): a ClientSession to use for all commands

Changed in version 3.6: Added session parameter.

Changed in version 3.0: Creating a GridOut does not immediately retrieve the file metadata from the server. Metadata is fetched when first needed.
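A matching read-side sketch, continuing the hypothetical GridIn example above (the client and fin._id carry over from it):

from gridfs.grid_file import GridOut

fout = GridOut(client.test.fs, file_id=fin._id)  # metadata is fetched lazily
data = fout.read()                               # the whole file as bytes

# or stream it chunk by chunk, e.g. when serving the file over HTTP:
fout = GridOut(client.test.fs, file_id=fin._id)
for chunk in fout:
    pass  # each chunk is a chunk_size string of bytes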

_id

The '_id' value for this file.

This attribute is read-only.

__iter__()

Return an iterator over all of this file’s data.

The iterator will return chunk-sized instances of str (bytes in python 3). This can be useful when serving files using a webserver that handles such an iterator efficiently.

aliases

List of aliases for this file.

This attribute is read-only.

chunk_size

Chunk size for this file.

This attribute is read-only.

close()

Make GridOut more generically file-like.

content_type

Mime-type for this file.

This attribute is read-only.

filename

Name of this file.

This attribute is read-only.

length

Length (in bytes) of this file.

This attribute is read-only.

md5

MD5 of the contents of this file (generated on the server).

This attribute is read-only.

metadata

Metadata attached to this file.

This attribute is read-only.

name

Alias for filename.

This attribute is read-only.

read(size=-1)

Read at most size bytes from the file (less if there isn’t enough data).

The bytes are returned as an instance of str (bytes in python 3). If size is negative or omitted all data is read.

Parameters:
  • size (optional): the number of bytes to read

readchunk()

Reads a chunk at a time. If the current position is within a chunk the remainder of the chunk is returned.

readline(size=-1)

Read one line or up to size bytes from the file.

Parameters:
  • size (optional): the maximum number of bytes to read

seek(pos, whence=0)

Set the current position of this file.

Parameters:
  • pos: the position (or offset if using relative positioning) to seek to
  • whence (optional): where to seek from. os.SEEK_SET (0) for absolute file positioning, os.SEEK_CUR (1) to seek relative to the current position, os.SEEK_END (2) to seek relative to the file’s end.

tell()

Return the current position of this file.
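A short sketch of random access with seek() and tell(), continuing the hypothetical fout from the example above:

import os

fout.seek(0, os.SEEK_END)  # jump to the end of the file
assert fout.tell() == fout.length
fout.seek(6)               # absolute positioning
tail = fout.read()         # bytes from offset 6 through the end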

upload_date

Date that this file was first uploaded.

This attribute is read-only.

class gridfs.grid_file.GridOutCursor(collection, filter=None, skip=0, limit=0, no_cursor_timeout=False, sort=None, batch_size=0, session=None)

Create a new cursor, similar to the normal Cursor.

Should not be called directly by application developers - see the GridFS method find() instead.

See also

The MongoDB documentation on cursors.

add_option(*args, **kwargs)

Set arbitrary query flags using a bitmask.

To set the tailable flag: cursor.add_option(2)

next()

Get next GridOut object from cursor.

remove_option(*args, **kwargs)

Unset arbitrary query flags using a bitmask.

To unset the tailable flag: cursor.remove_option(2)

Tools

Many tools have been written for working with PyMongo. If you know of or have created a tool for working with MongoDB from Python please list it here.

Note

We try to keep this list current. As such, projects that have not been updated recently or appear to be unmaintained will occasionally be removed from the list or moved to the back (to keep the list from becoming too intimidating).

If a project gets removed that is still being developed or is in active use please let us know or add it back.

ORM-like Layers

Some people have found that they prefer to work with a layer that has more features than PyMongo provides. Often, things like models and validation are desired. To that end, several different ORM-like layers have been written by various authors.

It is our recommendation that new users begin by working directly with PyMongo, as described in the rest of this documentation. Many people have found that the features of PyMongo are enough for their needs. Even if you eventually come to the decision to use one of these layers, the time spent working directly with the driver will have increased your understanding of how MongoDB actually works.

PyMODM
PyMODM is an ORM-like framework on top of PyMongo. PyMODM is maintained by engineers at MongoDB, Inc. and is quick to adopt new MongoDB features. PyMODM is a “core” ODM, meaning that it provides simple, extensible functionality that can be leveraged by other libraries to target platforms like Django. At the same time, PyMODM is powerful enough to be used for developing applications on its own. Complete documentation is available on readthedocs in addition to a Gitter channel for discussing the project.
Humongolus
Humongolus is a lightweight ORM framework for Python and MongoDB. The name comes from the combination of MongoDB and Homunculus (the concept of a miniature though fully formed human body). Humongolus allows you to create models/schemas with robust validation. It attempts to be as pythonic as possible and exposes the pymongo cursor objects whenever possible. The code is available for download at github. Tutorials and usage examples are also available at GitHub.
MongoKit
The MongoKit framework is an ORM-like layer on top of PyMongo. There is also a MongoKit google group.
Ming
Ming (the Merciless) is a library that allows you to enforce schemas on a MongoDB database in your Python application. It was developed by SourceForge in the course of their migration to MongoDB. See the introductory blog post for more details.
MongoAlchemy
MongoAlchemy is another ORM-like layer on top of PyMongo. Its API is inspired by SQLAlchemy. The code is available on github; for more information, see the tutorial.
MongoEngine
MongoEngine is another ORM-like layer on top of PyMongo. It allows you to define schemas for documents and query collections using syntax inspired by the Django ORM. The code is available on github; for more information, see the tutorial.
Minimongo
minimongo is a lightweight, pythonic interface to MongoDB. It retains pymongo’s query and update API, and provides a number of additional features, including a simple document-oriented interface, connection pooling, index management, and collection & database naming helpers. The source is on github.
Manga
Manga aims to be a simpler ORM-like layer on top of PyMongo. The syntax for defining schema is inspired by the Django ORM, but Pymongo’s query language is maintained. The source is on github.
MotorEngine
MotorEngine is a port of MongoEngine to Motor, for asynchronous access with Tornado. It implements the same modeling APIs to be data-portable, meaning that a model defined in MongoEngine can be read in MotorEngine. The source is available on github.

Framework Tools

This section lists tools and adapters that have been designed to work with various Python frameworks and libraries.

Alternative Drivers

These are alternatives to PyMongo.

  • Motor is a full-featured, non-blocking MongoDB driver for Python Tornado applications.
  • TxMongo is an asynchronous Twisted Python driver for MongoDB.

Contributors

The following is a list of people who have contributed to PyMongo. If you belong here and are missing please let us know (or send a pull request after adding yourself to the list):

  • Mike Dirolf (mdirolf)
  • Jeff Jenkins (jeffjenkins)
  • Jim Jones
  • Eliot Horowitz (erh)
  • Michael Stephens (mikejs)
  • Joakim Sernbrant (serbaut)
  • Alexander Artemenko (svetlyak40wt)
  • Mathias Stearn (RedBeard0531)
  • Fajran Iman Rusadi (fajran)
  • Brad Clements (bkc)
  • Andrey Fedorov (andreyf)
  • Joshua Roesslein (joshthecoder)
  • Gregg Lind (gregglind)
  • Michael Schurter (schmichael)
  • Daniel Lundin
  • Michael Richardson (mtrichardson)
  • Dan McKinley (mcfunley)
  • David Wolever (wolever)
  • Carlos Valiente (carletes)
  • Jehiah Czebotar (jehiah)
  • Drew Perttula (drewp)
  • Carl Baatz (c-w-b)
  • Johan Bergstrom (jbergstroem)
  • Jonas Haag (jonashaag)
  • Kristina Chodorow (kchodorow)
  • Andrew Sibley (sibsibsib)
  • Flavio Percoco Premoli (FlaPer87)
  • Ken Kurzweil (kurzweil)
  • Christian Wyglendowski (dowski)
  • James Murty (jmurty)
  • Brendan W. McAdams (bwmcadams)
  • Bernie Hackett (behackett)
  • Reed O’Brien (reedobrien)
  • Francisco Souza (fsouza)
  • Alexey I. Froloff (raorn)
  • Steve Lacy (slacy)
  • Richard Shea (shearic)
  • Vladimir Sidorenko (gearheart)
  • Aaron Westendorf (awestendorf)
  • Dan Crosta (dcrosta)
  • Ryan Smith-Roberts (rmsr)
  • David Pisoni (gefilte)
  • Abhay Vardhan (abhayv)
  • Alexey Borzenkov (snaury)
  • Kostya Rybnikov (k-bx)
  • A Jesse Jiryu Davis (ajdavis)
  • Samuel Clay (samuelclay)
  • Ross Lawley (rozza)
  • Wouter Bolsterlee (wbolster)
  • Alex Grönholm (agronholm)
  • Christoph Simon (kalanzun)
  • Chris Tompkinson (tompko)
  • Mike O’Brien (mpobrien)
  • T Dampier (dampier)
  • Michael Henson (hensom)
  • Craig Hobbs (craigahobbs)
  • Emily Stolfo (estolfo)
  • Sam Helman (shelman)
  • Justin Patrin (reversefold)
  • Xiuming Chen (cxmcc)
  • Tyler Jones (thomascirca)
  • Amalia Hawkins (hawka)
  • Yuchen Ying (yegle)
  • Kyle Erf (3rf)
  • Luke Lovett (lovett89)
  • Jaroslav Semančík (girogiro)
  • Don Mitchell (dmitchell)
  • Ximing (armnotstrong)
  • Can Zhang (cannium)
  • Sergey Azovskov (last-g)
  • Heewa Barfchin (heewa)
  • Anna Herlihy (aherlihy)
  • Len Buckens (buckensl)
  • ultrabug
  • Shane Harvey (ShaneHarvey)
  • Cao Siyang (caosiyang)
  • Zhecong Kwok (gzcf)
  • TaoBeier (tao12345666333)
  • Jagrut Trivedi (Jagrut)

Changelog

Changes in Version 3.6.0

Version 3.6 adds support for MongoDB 3.6, drops support for CPython 3.3 (PyPy3 is still supported), and drops support for MongoDB versions older than 2.6. If connecting to a MongoDB 2.4 server or older, PyMongo now throws a ConfigurationError.

Highlights include:

Deprecations:

  • The useCursor option for aggregate() is deprecated. The option was only necessary when upgrading from MongoDB 2.4 to MongoDB 2.6. MongoDB 2.4 is no longer supported.
  • The add_user() and remove_user() methods are deprecated. See the method docstrings for alternatives.

Unavoidable breaking changes:

  • Starting in MongoDB 3.6, the deprecated methods authenticate() and logout() now invalidate all cursors created before the call. Instead of using these methods to change credentials, pass the credentials for one user to the MongoClient at construction time, and either grant that single user access to several databases, or use a distinct client object for each user (see the sketch after this list).
  • BSON binary subtype 4 is decoded using RFC-4122 byte order regardless of the UUID representation. This is a change in behavior for applications that use UUID representation bson.binary.JAVA_LEGACY or bson.binary.CSHARP_LEGACY to decode BSON binary subtype 4. Other UUID representations, bson.binary.PYTHON_LEGACY (the default) and bson.binary.STANDARD, and the decoding of BSON binary subtype 3 are unchanged.
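A minimal sketch of the credential pattern described in the first item above (the host name, user names, and passwords are placeholders):

client_alice = pymongo.MongoClient(
    'mongodb://alice:alicepass@db.example.com:27017/admin')
client_bob = pymongo.MongoClient(
    'mongodb://bob:bobpass@db.example.com:27017/admin')
# one client per user; do not switch credentials with authenticate()/logout()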

Issues Resolved

See the PyMongo 3.6 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 3.5.1

Version 3.5.1 fixes bugs reported since the release of 3.5.0:

  • Work around socket.getsockopt issue with NetBSD.
  • pymongo.command_cursor.CommandCursor.close() now closes the cursor synchronously instead of deferring to a background thread.
  • Fix documentation build warnings with Sphinx 1.6.x.

Issues Resolved

See the PyMongo 3.5.1 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 3.5

Version 3.5 implements a number of improvements and bug fixes:

Highlights include:

Changes and Deprecations:

Issues Resolved

See the PyMongo 3.5 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 3.4

Version 3.4 implements the new server features introduced in MongoDB 3.4 and a whole lot more:

Highlights include:

  • Complete support for MongoDB 3.4:
  • Improved support for logging server discovery and monitoring events. See monitoring for examples.
  • Support for matching iPAddress subjectAltName values for TLS certificate verification.
  • TLS compression is now explicitly disabled when possible.
  • The Server Name Indication (SNI) TLS extension is used when possible.
  • Finer control over JSON encoding/decoding with JSONOptions.
  • Allow Code objects to have a scope of None, signifying no scope. Also allow encoding Code objects with an empty scope (i.e. {}).

Warning

Starting in PyMongo 3.4, bson.code.Code.scope may return None, as the default scope is None instead of {}.

Note

PyMongo 3.4+ attempts to create sockets non-inheritable when possible (i.e. it sets the close-on-exec flag on socket file descriptors). Support is limited to a subset of POSIX operating systems (not including Windows) and the flag usually cannot be set in a single atomic operation. CPython 3.4+ implements PEP 446, creating all file descriptors non-inheritable by default. Users that require this behavior are encouraged to upgrade to CPython 3.4+.

Since 3.4rc0, the max staleness option has been renamed from maxStalenessMS to maxStalenessSeconds, its smallest value has changed from twice heartbeatFrequencyMS to 90 seconds, and its default value has changed from None or 0 to -1.
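For example, the renamed option can be passed in the URI or as a keyword argument (the host name is a placeholder; a non-primary read preference is required):

client = pymongo.MongoClient(
    'mongodb://db.example.com/?readPreference=secondaryPreferred'
    '&maxStalenessSeconds=90')
# or equivalently:
client = pymongo.MongoClient('db.example.com',
                             readPreference='secondaryPreferred',
                             maxStalenessSeconds=90)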

Issues Resolved

See the PyMongo 3.4 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 3.3.1

Version 3.3.1 fixes a memory leak when decoding elements inside of a RawBSONDocument.

Issues Resolved

See the PyMongo 3.3.1 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 3.3

Version 3.3 adds the following major new features:

  • C extensions support on big endian systems.
  • Kerberos authentication support on Windows using WinKerberos.
  • A new ssl_crlfile option to support certificate revocation lists.
  • A new ssl_pem_passphrase option to support encrypted key files.
  • Support for publishing server discovery and monitoring events. See monitoring for details.
  • New connection pool options minPoolSize and maxIdleTimeMS.
  • New heartbeatFrequencyMS option controls the rate at which background monitoring threads re-check servers. Default is once every 10 seconds.

Warning

PyMongo 3.3 drops support for MongoDB versions older than 2.4. It also drops support for python 3.2 (pypy3 continues to be supported).

Issues Resolved

See the PyMongo 3.3 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 3.2.2

Version 3.2.2 fixes a few issues reported since the release of 3.2.1, including a fix for using the connect option in the MongoDB URI and support for setting the batch size for a query to 1 when using MongoDB 3.2+.

Issues Resolved

See the PyMongo 3.2.2 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 3.2.1

Version 3.2.1 fixes a few issues reported since the release of 3.2, including running the mapreduce command twice when calling the inline_map_reduce() method and a TypeError being raised when calling download_to_stream(). This release also improves error messaging around BSON decoding.

Issues Resolved

See the PyMongo 3.2.1 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 3.2

Version 3.2 implements the new server features introduced in MongoDB 3.2.

Highlights include:

Note

Certain MongoClient properties now block until a connection is established or raise ServerSelectionTimeoutError if no server is available. See MongoClient for details.

Issues Resolved

See the PyMongo 3.2 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 3.1.1

Version 3.1.1 fixes a few issues reported since the release of 3.1, including a regression in error handling for oversize command documents and interrupt handling issues in the C extensions.

Issues Resolved

See the PyMongo 3.1.1 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 3.1

Version 3.1 implements a few new features and fixes bugs reported since the release of 3.0.3.

Highlights include:

  • Command monitoring support. See monitoring for details.
  • Configurable error handling for UnicodeDecodeError. See the unicode_decode_error_handler option of CodecOptions.
  • Optional automatic timezone conversion when decoding BSON datetime. See the tzinfo option of CodecOptions.
  • An implementation of GridFSBucket from the new GridFS spec.
  • Compliance with the new Connection String spec.
  • Reduced idle CPU usage in Python 2.

Changes in internal classes

The private PeriodicExecutor class no longer takes a condition_class option, and the private thread_util.Event class is removed.

Issues Resolved

See the PyMongo 3.1 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 3.0.3

Version 3.0.3 fixes issues reported since the release of 3.0.2, including a feature breaking bug in the GSSAPI implementation.

Issues Resolved

See the PyMongo 3.0.3 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 3.0.2

Version 3.0.2 fixes issues reported since the release of 3.0.1, most importantly a bug that could route operations to replica set members that are not in primary or secondary state when using PrimaryPreferred or Nearest. It is a recommended upgrade for all users of PyMongo 3.0.x.

Issues Resolved

See the PyMongo 3.0.2 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 3.0.1

Version 3.0.1 fixes issues reported since the release of 3.0, most importantly a bug in GridFS.delete that could prevent file chunks from actually being deleted.

Issues Resolved

See the PyMongo 3.0.1 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 3.0

PyMongo 3.0 is a partial rewrite of PyMongo bringing a large number of improvements:

  • A unified client class. MongoClient is the one and only client class for connecting to a standalone mongod, replica set, or sharded cluster. Migrating from a standalone, to a replica set, to a sharded cluster can be accomplished with only a simple URI change.
  • MongoClient is much more responsive to configuration changes in your MongoDB deployment. All connected servers are monitored in a non-blocking manner. Slow to respond or down servers no longer block server discovery, reducing application startup time and time to respond to new or reconfigured servers and replica set failovers.
  • A unified CRUD API. All official MongoDB drivers now implement a standard CRUD API allowing polyglot developers to move from language to language with ease.
  • Single source support for Python 2.x and 3.x. PyMongo no longer relies on 2to3 to support Python 3.
  • A rewritten pure Python BSON implementation, improving performance with pypy and cpython deployments without support for C extensions.
  • Better support for greenlet based async frameworks including eventlet.
  • Immutable client, database, and collection classes, avoiding a host of thread safety issues in client applications.

PyMongo 3.0 brings a large number of API changes. Be sure to read the changes listed below before upgrading from PyMongo 2.x.

Warning

PyMongo no longer supports Python 2.4, 2.5, or 3.1. If you must use PyMongo with these versions of Python the 2.x branch of PyMongo will be minimally supported for some time.

SONManipulator changes

The SONManipulator API has limitations as a technique for transforming your data. Instead, it is more flexible and straightforward to transform outgoing documents in your own code before passing them to PyMongo, and transform incoming documents after receiving them from PyMongo.

Thus the add_son_manipulator() method is deprecated. PyMongo 3’s new CRUD API does not apply SON manipulators to documents passed to bulk_write(), insert_one(), insert_many(), update_one(), or update_many(). SON manipulators are not applied to documents returned by the new methods find_one_and_delete(), find_one_and_replace(), and find_one_and_update().

SSL/TLS changes

When ssl is True the ssl_cert_reqs option now defaults to ssl.CERT_REQUIRED if not provided. PyMongo will attempt to load OS provided CA certificates to verify the server, raising ConfigurationError if it cannot.
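A sketch of the new default behavior (the host name is a placeholder; disabling verification is shown only for contrast and is not recommended):

import ssl
import pymongo

# certificate verification is now on by default:
client = pymongo.MongoClient('db.example.com', ssl=True)
# opting out must be explicit:
client = pymongo.MongoClient('db.example.com', ssl=True,
                             ssl_cert_reqs=ssl.CERT_NONE)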

Gevent Support

In previous versions, PyMongo supported Gevent in two modes: you could call gevent.monkey.patch_socket() and pass use_greenlets=True to MongoClient, or you could simply call gevent.monkey.patch_all() and omit the use_greenlets argument.

In PyMongo 3.0, the use_greenlets option is gone. To use PyMongo with Gevent simply call gevent.monkey.patch_all().
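For instance, a minimal sketch (the patch must run before any sockets are created):

from gevent import monkey
monkey.patch_all()  # patch sockets, threads, etc. first

import pymongo
client = pymongo.MongoClient()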

For more information, see PyMongo’s Gevent documentation.

MongoClient changes

MongoClient is now the one and only client class for a standalone server, mongos, or replica set. It includes the functionality that had been split into MongoReplicaSetClient: it can connect to a replica set, discover all its members, and monitor the set for stepdowns, elections, and reconfigs. MongoClient now also supports the full ReadPreference API.

The obsolete classes MasterSlaveConnection, Connection, and ReplicaSetConnection are removed.

The MongoClient constructor no longer blocks while connecting to the server or servers, and it no longer raises ConnectionFailure if they are unavailable, nor ConfigurationError if the user’s credentials are wrong. Instead, the constructor returns immediately and launches the connection process on background threads. The connect option is added to control whether these threads are started immediately, or when the client is first used.

Therefore the alive method is removed since it no longer provides meaningful information; even if the client is disconnected, it may discover a server in time to fulfill the next operation.

In PyMongo 2.x, MongoClient accepted a list of standalone MongoDB servers and used the first it could connect to:

MongoClient(['host1.com:27017', 'host2.com:27017'])

A list of multiple standalones is no longer supported; if multiple servers are listed they must be members of the same replica set, or mongoses in the same sharded cluster.
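For example, the same list form remains valid for seeding a replica set connection (the set name and hosts are placeholders):

MongoClient(['host1.com:27017', 'host2.com:27017'], replicaSet='my_set')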

The behavior for a list of mongoses is changed from “high availability” to “load balancing”. Before, the client connected to the lowest-latency mongos in the list, and used it until a network error prompted it to re-evaluate all mongoses’ latencies and reconnect to one of them. In PyMongo 3, the client monitors its network latency to all the mongoses continuously, and distributes operations evenly among those with the lowest latency. See mongos Load Balancing for more information.

The client methods start_request, in_request, and end_request are removed, and so is the auto_start_request option. Requests were designed to make read-your-writes consistency more likely with the w=0 write concern. Additionally, a thread in a request used the same member for all secondary reads in a replica set. To ensure read-your-writes consistency in PyMongo 3.0, do not override the default write concern with w=0, and do not override the default read preference of PRIMARY.

Support for the slaveOk (or slave_okay), safe, and network_timeout options has been removed. Use SECONDARY_PREFERRED instead of slave_okay. Accept the default write concern, acknowledged writes, instead of setting safe=True. Use socketTimeoutMS in place of network_timeout (note that network_timeout was in seconds, whereas socketTimeoutMS is milliseconds).

The max_pool_size option has been removed. It is replaced by the maxPoolSize MongoDB URI option. maxPoolSize is now a supported URI option in PyMongo and can be passed as a keyword argument.
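For example, either spelling below is a sketch of the replacement (the host is a placeholder):

MongoClient('mongodb://host1.com:27017/?maxPoolSize=50')
MongoClient('host1.com', maxPoolSize=50)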

The copy_database method is removed, see the copy_database examples for alternatives.

The disconnect method is removed. Use close() instead.

The get_document_class method is removed. Use codec_options instead.

The get_lasterror_options, set_lasterror_options, and unset_lasterror_options methods are removed. Write concern options can be passed to MongoClient as keyword arguments or MongoDB URI options.

The get_database() method is added for getting a Database instance with its options configured differently than the MongoClient’s.

The following read-only attributes have been added:

The following attributes are now read-only:

The following attributes have been removed:

The following attributes have been renamed:

Cursor changes

The conn_id property is renamed to address.

Cursor management changes

CursorManager and set_cursor_manager() are no longer deprecated. If you subclass CursorManager your implementation of close() must now take a second parameter, address. The BatchCursorManager class is removed.

The second parameter to close_cursor() is renamed from _conn_id to address. kill_cursors() now accepts an address parameter.

Database changes

The connection property is renamed to client.

The following read-only attributes have been added:

The following attributes are now read-only:

Use get_database() for getting a Database instance with its options configured differently than the MongoClient’s.

The following attributes have been removed:

  • safe
  • secondary_acceptable_latency_ms
  • slave_okay
  • tag_sets

The following methods have been added:

The following methods have been changed:

  • command(). Support for as_class, uuid_subtype, tag_sets, and secondary_acceptable_latency_ms has been removed. You can instead pass an instance of CodecOptions as codec_options and an instance of a read preference class from read_preferences as read_preference (see the sketch below). The fields and compile_re options are also removed. The fields option was undocumented and never really worked. Regular expressions are always decoded to Regex.
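A short sketch of the replacement options (the collection name is a placeholder):

from bson.codec_options import CodecOptions
from bson.son import SON
from pymongo import ReadPreference

db.command(SON([('count', 'test')]),
           read_preference=ReadPreference.SECONDARY_PREFERRED,
           codec_options=CodecOptions(document_class=SON))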

The following methods have been deprecated:

The following methods have been removed:

The get_lasterror_options, set_lasterror_options, and unset_lasterror_options methods have been removed. Use WriteConcern with get_database() instead.

Collection changes

The following read-only attributes have been added:

The following attributes are now read-only:

Use get_collection() or with_options() for getting a Collection instance with its options configured differently than the Database’s.

The following attributes have been removed:

  • safe
  • secondary_acceptable_latency_ms
  • slave_okay
  • tag_sets

The following methods have been added:

The following methods have changed:

  • aggregate() now always returns an instance of CommandCursor. See the documentation for all options.
  • count() now optionally takes a filter argument, as well as other options supported by the count command.
  • distinct() now optionally takes a filter argument.
  • create_index() no longer caches indexes, therefore the cache_for parameter has been removed. It also no longer supports the bucket_size and drop_dups aliases for bucketSize and dropDups.

The following methods are deprecated:

The following methods have been removed:

The get_lasterror_options, set_lasterror_options, and unset_lasterror_options methods have been removed. Use WriteConcern with with_options() instead.

Changes to find() and find_one()

The following find/find_one options have been renamed:

These renames only affect your code if you passed these options as keyword arguments, like find(fields=['fieldname']). If you passed only positional parameters, these changes do not affect your application. A before/after sketch follows the list of renames below.

  • spec -> filter
  • fields -> projection
  • partial -> allow_partial_results
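A before/after sketch of the renamed keyword arguments:

# PyMongo 2.x:
collection.find(spec={'x': 1}, fields=['x'], partial=True)
# PyMongo 3.x:
collection.find(filter={'x': 1}, projection=['x'], allow_partial_results=True)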

The following find/find_one options have been added:

  • cursor_type (see CursorType for values)
  • oplog_replay
  • modifiers

The following find/find_one options have been removed:

  • network_timeout (use max_time_ms() instead)
  • slave_okay (use one of the read preference classes from read_preferences and with_options() instead)
  • read_preference (use with_options() instead)
  • tag_sets (use one of the read preference classes from read_preferences and with_options() instead)
  • secondary_acceptable_latency_ms (use the localThresholdMS URI option instead)
  • max_scan (use the new modifiers option instead)
  • snapshot (use the new modifiers option instead)
  • tailable (use the new cursor_type option instead)
  • await_data (use the new cursor_type option instead)
  • exhaust (use the new cursor_type option instead)
  • as_class (use with_options() with CodecOptions instead)
  • compile_re (BSON regular expressions are always decoded to Regex)

The following find/find_one options are deprecated:

  • manipulate

The following renames need special handling.

  • timeout -> no_cursor_timeout - The default for timeout was True. The default for no_cursor_timeout is False. If you were previously passing False for timeout you must pass True for no_cursor_timeout to keep the previous behavior (see the sketch below).
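A sketch of the inverted option:

# PyMongo 2.x: keep the cursor alive on the server
cursor = collection.find({'x': 1}, timeout=False)
# PyMongo 3.x: the same behavior
cursor = collection.find({'x': 1}, no_cursor_timeout=True)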

errors changes

The exception classes UnsupportedOption and TimeoutError are deleted.

gridfs changes

Since PyMongo 1.6, methods open and close of GridFS raised an UnsupportedAPI exception, as did the entire GridFile class. The unsupported methods, the class, and the exception are all deleted.

bson changes

The compile_re option is removed from all methods that accepted it in bson and json_util. Additionally, it is removed from find(), find_one(), aggregate(), command(), and so on. PyMongo now always represents BSON regular expressions as Regex objects. This prevents errors for incompatible patterns, see PYTHON-500. Use try_compile() to attempt to convert from a BSON regular expression to a Python regular expression object.
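For example (the collection and field names are placeholders):

from bson.regex import Regex

doc = db.test.find_one({'pattern': {'$exists': True}})
regex = doc['pattern']          # BSON regular expressions decode to Regex
compiled = regex.try_compile()  # may raise re.error for Python-incompatible patterns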

PyMongo now decodes the int64 BSON type to Int64, a trivial wrapper around long (in python 2.x) or int (in python 3.x). This allows BSON int64 to be round tripped without losing type information in python 3. Note that if you store a python long (or a python int larger than 4 bytes) it will be returned from PyMongo as Int64.
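A small sketch of the round trip (the collection name is a placeholder):

from bson.int64 import Int64

db.test.insert_one({'n': 2 ** 40})         # larger than 4 bytes
n = db.test.find_one({'n': 2 ** 40})['n']
isinstance(n, Int64)                       # True; Int64 subclasses long/int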

The as_class, tz_aware, and uuid_subtype options are removed from all BSON encoding and decoding methods. Use CodecOptions to configure these options. The APIs affected are:

This is a breaking change for any application that uses the BSON API directly and changes any of the named parameter defaults. No changes are required for applications that use the default values for these options. The behavior remains the same.

Issues Resolved

See the PyMongo 3.0 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 2.9.5

Version 2.9.5 works around ssl module deprecations in Python 3.6, and expected future ssl module deprecations. It also fixes bugs found since the release of 2.9.4.

  • Use ssl.SSLContext and ssl.PROTOCOL_TLS_CLIENT when available.
  • Fixed a C extensions build issue when the interpreter was built with -std=c99.
  • Fixed various build issues with MinGW32.
  • Fixed a write concern bug in add_user() and remove_user() when connected to MongoDB 3.2+.
  • Fixed various test failures related to changes in gevent, MongoDB, and our CI test environment.

Issues Resolved

See the PyMongo 2.9.5 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 2.9.4

Version 2.9.4 fixes issues reported since the release of 2.9.3.

Issues Resolved

See the PyMongo 2.9.4 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 2.9.3

Version 2.9.3 fixes a few issues reported since the release of 2.9.2 including thread safety issues in ensure_index(), drop_index(), and drop_indexes().

Issues Resolved

See the PyMongo 2.9.3 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 2.9.2

Version 2.9.2 restores Python 3.1 support, which was broken in PyMongo 2.8. It improves an error message when decoding BSON as well as fixes a couple other issues including aggregate() ignoring codec_options and command() raising a superfluous DeprecationWarning.

Issues Resolved

See the PyMongo 2.9.2 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 2.9.1

Version 2.9.1 fixes two interrupt handling issues in the C extensions and adapts a test case for a behavior change in MongoDB 3.2.

Issues Resolved

See the PyMongo 2.9.1 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 2.9

Version 2.9 provides an upgrade path to PyMongo 3.x. Most of the API changes from PyMongo 3.0 have been backported in a backward compatible way, allowing applications to be written against PyMongo >= 2.9, rather than PyMongo 2.x or PyMongo 3.x. See the PyMongo 3 Migration Guide for detailed examples.

Note

There are a number of new deprecations in this release for features that were removed in PyMongo 3.0.

MongoClient:
  • host
  • port
  • use_greenlets
  • document_class
  • tz_aware
  • secondary_acceptable_latency_ms
  • tag_sets
  • uuid_subtype
  • disconnect()
  • alive()
MongoReplicaSetClient:
  • use_greenlets
  • document_class
  • tz_aware
  • secondary_acceptable_latency_ms
  • tag_sets
  • uuid_subtype
  • alive()
Database:
  • secondary_acceptable_latency_ms
  • tag_sets
  • uuid_subtype
Collection:
  • secondary_acceptable_latency_ms
  • tag_sets
  • uuid_subtype

Warning

In previous versions of PyMongo, changing the value of document_class changed the behavior of all existing instances of Collection:

>>> coll = client.test.test
>>> coll.find_one()
{u'_id': ObjectId('5579dc7cfba5220cc14d9a18')}
>>> from bson.son import SON
>>> client.document_class = SON
>>> coll.find_one()
SON([(u'_id', ObjectId('5579dc7cfba5220cc14d9a18'))])

The document_class setting is now configurable at the client, database, collection, and per-operation level. This required breaking the existing behavior. To change the document class per operation in a forward compatible way use with_options():

>>> coll.find_one()
{u'_id': ObjectId('5579dc7cfba5220cc14d9a18')}
>>> from bson.codec_options import CodecOptions
>>> coll.with_options(CodecOptions(SON)).find_one()
SON([(u'_id', ObjectId('5579dc7cfba5220cc14d9a18'))])

Issues Resolved

See the PyMongo 2.9 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 2.8.1

Version 2.8.1 fixes a number of issues reported since the release of PyMongo 2.8. It is a recommended upgrade for all users of PyMongo 2.x.

Issues Resolved

See the PyMongo 2.8.1 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 2.8

Version 2.8 is a major release that provides full support for MongoDB 3.0 and fixes a number of bugs.

Special thanks to Don Mitchell, Ximing, Can Zhang, Sergey Azovskov, and Heewa Barfchin for their contributions to this release.

Highlights include:

  • Support for the SCRAM-SHA-1 authentication mechanism (new in MongoDB 3.0).
  • JSON decoder support for the new $numberLong and $undefined types.
  • JSON decoder support for the $date type as an ISO-8601 string.
  • Support passing an index name to hint().
  • The count() method will use a hint if one has been provided through hint().
  • A new socketKeepAlive option for the connection pool.
  • New generator based BSON decode functions, decode_iter() and decode_file_iter().
  • Internal changes to support alternative storage engines like wiredtiger.

Note

There are a number of deprecations in this release for features that will be removed in PyMongo 3.0. These include:

The JSON format for Timestamp has changed from '{"t": <int>, "i": <int>}' to '{"$timestamp": {"t": <int>, "i": <int>}}'. This new format will be decoded to an instance of Timestamp. The old format will continue to be decoded to a python dict as before. Encoding to the old format is no longer supported as it was never correct and loses type information.

Issues Resolved

See the PyMongo 2.8 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 2.7.2

Version 2.7.2 includes fixes for upsert reporting in the bulk API for MongoDB versions previous to 2.6, a regression in how son manipulators are applied in insert(), a few obscure connection pool semaphore leaks, and a few other minor issues. See the list of issues resolved for full details.

Issues Resolved

See the PyMongo 2.7.2 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 2.7.1

Version 2.7.1 fixes a number of issues reported since the release of 2.7, most importantly a fix for creating indexes and manipulating users through mongos versions older than 2.4.0.

Issues Resolved

See the PyMongo 2.7.1 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 2.7

PyMongo 2.7 is a major release with a large number of new features and bug fixes. Highlights include:

Breaking changes

Version 2.7 drops support for replica sets running MongoDB versions older than 1.6.2.

Issues Resolved

See the PyMongo 2.7 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 2.6.3

Version 2.6.3 fixes issues reported since the release of 2.6.2, most importantly a semaphore leak when a connection to the server fails.

Issues Resolved

See the PyMongo 2.6.3 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 2.6.2

Version 2.6.2 fixes a TypeError problem when max_pool_size=None is used in Python 3.

Issues Resolved

See the PyMongo 2.6.2 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 2.6.1

Version 2.6.1 fixes a reference leak in the insert() method.

Issues Resolved

See the PyMongo 2.6.1 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 2.6

Version 2.6 includes some frequently requested improvements and adds support for some early MongoDB 2.6 features.

Special thanks go to Justin Patrin for his work on the connection pool in this release.

Important new features:

Warning

SIGNIFICANT BEHAVIOR CHANGE in 2.6. Previously, max_pool_size would limit only the idle sockets the pool would hold onto, not the number of open sockets. The default has also changed, from 10 to 100. If you pass a value for max_pool_size make sure it is large enough for the expected load. (Sockets are only opened when needed, so there is no cost to having a max_pool_size larger than necessary. Err towards a larger value.) If your application accepts the default, continue to do so.

See How does connection pooling work in PyMongo? for more information.

Issues Resolved

See the PyMongo 2.6 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 2.5.2

Version 2.5.2 fixes a NULL pointer dereference issue when decoding an invalid DBRef.

Issues Resolved

See the PyMongo 2.5.2 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 2.5.1

Version 2.5.1 is a minor release that fixes issues discovered after the release of 2.5. Most importantly, this release addresses some race conditions in replica set monitoring.

Issues Resolved

See the PyMongo 2.5.1 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 2.5

Version 2.5 includes changes to support new features in MongoDB 2.4.

Important new features:

  • Support for GSSAPI (Kerberos) authentication.
  • Support for SSL certificate validation with hostname matching.
  • Support for delegated and role based authentication.
  • New GEOSPHERE (2dsphere) and HASHED index constants.

Note

authenticate() now raises a subclass of PyMongoError if authentication fails due to invalid credentials or configuration issues.

Issues Resolved

See the PyMongo 2.5 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 2.4.2

Version 2.4.2 is a minor release that fixes issues discovered after the release of 2.4.1. Most importantly, PyMongo will no longer select a replica set member for read operations that is not in primary or secondary state.

Issues Resolved

See the PyMongo 2.4.2 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 2.4.1

Version 2.4.1 is a minor release that fixes issues discovered after the release of 2.4. Most importantly, this release fixes a regression using aggregate(), and possibly other commands, with mongos.

Issues Resolved

See the PyMongo 2.4.1 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 2.4

Version 2.4 includes a few important new features and a large number of bug fixes.

Important new features:

  • New MongoClient and MongoReplicaSetClient classes - these connection classes do acknowledged write operations (previously referred to as ‘safe’ writes) by default. Connection and ReplicaSetConnection are deprecated but still support the old default fire-and-forget behavior.
  • A new write concern API implemented as a write_concern attribute on the connection, Database, or Collection classes.
  • MongoClient (and Connection) now support Unix Domain Sockets.
  • Cursor can be copied with functions from the copy module.
  • The set_profiling_level() method now supports a slow_ms option.
  • The replica set monitor task (used by MongoReplicaSetClient and ReplicaSetConnection) is a daemon thread once again, meaning you won’t have to call close() before exiting the python interactive shell.

Warning

The constructors for MongoClient, MongoReplicaSetClient, Connection, and ReplicaSetConnection now raise ConnectionFailure instead of its subclass AutoReconnect if the server is unavailable. Applications that expect to catch AutoReconnect should now catch ConnectionFailure while creating a new connection.

Issues Resolved

See the PyMongo 2.4 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 2.3

Version 2.3 adds support for new features and behavior changes in MongoDB 2.2.

Important New Features:

  • Support for expanded read preferences including directing reads to tagged servers - See Secondary Reads for more information.
  • Support for mongos failover.
  • A new aggregate() method to support MongoDB’s new aggregation framework.
  • Support for legacy Java and C# byte order when encoding and decoding UUIDs.
  • Support for connecting directly to an arbiter.

Warning

Starting with MongoDB 2.2 the getLastError command requires authentication when the server’s authentication features are enabled. Changes to PyMongo were required to support this behavior change. Users of authentication must upgrade to PyMongo 2.3 (or newer) for “safe” write operations to function correctly.

Issues Resolved

See the PyMongo 2.3 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 2.2.1

Version 2.2.1 is a minor release that fixes issues discovered after the release of 2.2. Most importantly, this release fixes an incompatibility with mod_wsgi 2.x that could cause connections to leak. Users of mod_wsgi 2.x are strongly encouraged to upgrade from PyMongo 2.2.

Issues Resolved

See the PyMongo 2.2.1 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 2.2

Version 2.2 adds a few more frequently requested features and fixes a number of bugs.

Special thanks go to Alex Grönholm for his contributions to Python 3 support and maintaining the original pymongo3 port. Christoph Simon, Wouter Bolsterlee, Mike O’Brien, and Chris Tompkinson also contributed to this release.

Important New Features:

  • Support for Python 3 - See the Python 3 FAQ for more information.
  • Support for Gevent - See Gevent for more information.
  • Improved connection pooling. See PYTHON-287.

Warning

A number of methods and method parameters that were deprecated in PyMongo 1.9 or older versions have been removed in this release. The full list of changes can be found in the following JIRA ticket:

https://jira.mongodb.org/browse/PYTHON-305

BSON module aliases from the pymongo package that were deprecated in PyMongo 1.9 have also been removed in this release. See the following JIRA ticket for details:

https://jira.mongodb.org/browse/PYTHON-304

As a result of this cleanup some minor code changes may be required to use this release.

Issues Resolved

See the PyMongo 2.2 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 2.1.1

Version 2.1.1 is a minor release that fixes a few issues discovered after the release of 2.1. You can now use ReplicaSetConnection to run inline map reduce commands on secondaries. See inline_map_reduce() for details.

Special thanks go to Samuel Clay and Ross Lawley for their contributions to this release.

Issues Resolved

See the PyMongo 2.1.1 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 2.1

Version 2.1 adds a few frequently requested features and includes the usual round of bug fixes and improvements.

Special thanks go to Alexey Borzenkov, Dan Crosta, Kostya Rybnikov, Flavio Percoco Premoli, Jonas Haag, and Jesse Davis for their contributions to this release.

Important New Features:

  • ReplicaSetConnection - ReplicaSetConnection can be used to distribute reads to secondaries in a replica set. It supports automatic failover handling and periodically checks the state of the replica set to handle issues like primary stepdown or secondaries being removed for backup operations. Read preferences are defined through ReadPreference.
  • PyMongo supports the new BSON binary subtype 4 for UUIDs. The default subtype to use can be set through uuid_subtype. The current default remains OLD_UUID_SUBTYPE but will be changed to UUID_SUBTYPE in a future release.
  • The getLastError option ‘w’ can be set to a string, allowing for options like “majority” available in newer versions of MongoDB.
  • Added support for the MongoDB URI options socketTimeoutMS and connectTimeoutMS.
  • Added support for the ContinueOnError insert flag.
  • Added basic SSL support.
  • Added basic support for Jython.
  • Secondaries can be used for count(), distinct(), group(), and querying GridFS.
  • Added document_class and tz_aware options to MasterSlaveConnection.

Issues Resolved

See the PyMongo 2.1 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 2.0.1

Version 2.0.1 fixes a regression in GridIn when writing pre-chunked strings. Thanks go to Alexey Borzenkov for reporting the issue and submitting a patch.

Issues Resolved
  • PYTHON-271: Regression in GridFS leads to serious loss of data.

Changes in Version 2.0

Version 2.0 adds a large number of features and fixes a number of issues.

Special thanks go to James Murty, Abhay Vardhan, David Pisoni, Ryan Smith-Roberts, Andrew Pendleton, Mher Movsisyan, Reed O’Brien, Michael Schurter, Josip Delic and Jonas Haag for their contributions to this release.

Important New Features:

  • PyMongo now performs automatic per-socket database authentication. You no longer have to re-authenticate for each new thread or after a replica set failover. Authentication credentials are cached by the driver until the application calls logout().
  • slave_okay can be set independently at the connection, database, collection or query level. Each level will inherit the slave_okay setting from the previous level and each level can override the previous level’s setting.
  • safe and getLastError options (e.g. w, wtimeout, etc.) can be set independently at the connection, database, collection or query level. Each level will inherit settings from the previous level and each level can override the previous level’s setting.
  • PyMongo now supports the await_data and partial cursor flags. If the await_data flag is set on a tailable cursor the server will block for some extra time waiting for more data to return. The partial flag tells a mongos to return partial data for a query if not all shards are available.
  • map_reduce() will accept a dict or instance of SON as the out parameter.
  • The URI parser has been moved into its own module and can be used directly by application code.
  • AutoReconnect exception now provides information about the error that actually occurred instead of a generic failure message.
  • A number of new helper methods have been added with options for setting and unsetting cursor flags, re-indexing a collection, fsync and locking a server, and getting the server’s current operations.

API changes:

  • If only one host:port pair is specified Connection will make a direct connection to only that host. Please note that slave_okay must be True in order to query from a secondary.
  • If more than one host:port pair is specified or the replicaset option is used PyMongo will treat the specified host:port pair(s) as a seed list and connect using replica set behavior.

Warning

The default subtype for Binary has changed from OLD_BINARY_SUBTYPE (2) to BINARY_SUBTYPE (0).

Issues Resolved

See the PyMongo 2.0 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 1.11

Version 1.11 adds a few new features and fixes a few more bugs.

New Features:

  • Basic IPv6 support: pymongo prefers IPv4 but will try IPv6. You can also specify an IPv6 address literal in the host parameter or a MongoDB URI provided it is enclosed in '[' and ']'.
  • max_pool_size option: previously pymongo had a hard coded pool size of 10 connections. With this change you can specify a different pool size as a parameter to Connection (max_pool_size=<integer>) or in the MongoDB URI (maxPoolSize=<integer>).
  • Find by metadata in GridFS: You can now specify query fields as keyword parameters for get_version() and get_last_version().
  • Per-query slave_okay option: slave_okay=True is now a valid keyword argument for find() and find_one().

API changes:

  • validate_collection() now returns a dict instead of a string. This change was required to deal with an API change on the server. This method also now takes the optional scandata and full parameters. See the documentation for more details.

Warning

The pool_size, auto_start_request, and timeout parameters for Connection have been completely removed in this release. They were deprecated in pymongo-1.4 and have had no effect since then. Please make sure that your code doesn’t currently pass these parameters when creating a Connection instance.

Issues resolved
  • PYTHON-241: Support setting slaveok at the cursor level.
  • PYTHON-240: Queries can sometimes permanently fail after a replica set fail over.
  • PYTHON-238: error after few million requests
  • PYTHON-237: Basic IPv6 support.
  • PYTHON-236: Restore option to specify pool size in Connection.
  • PYTHON-212: pymongo does not recover after stale config
  • PYTHON-138: Find method for GridFS

Changes in Version 1.10.1

Version 1.10.1 is primarily a bugfix release. It fixes a regression in version 1.10 that broke pickling of ObjectIds. A number of other bugs have been fixed as well.

There are two behavior changes to be aware of:

  • If a read slave raises AutoReconnect MasterSlaveConnection will now retry the query on each slave until it is successful or all slaves have raised AutoReconnect. Any other exception will immediately be raised. The order that the slaves are tried is random. Previously the read would be sent to one randomly chosen slave and AutoReconnect was immediately raised in case of a connection failure.
  • A Python long is now always BSON encoded as an int64. Previously the encoding was based only on the value of the field and a long with a value less than 2147483648 or greater than -2147483649 would always be BSON encoded as an int32.

Issues resolved
  • PYTHON-234: Fix setup.py to raise exception if any when building extensions
  • PYTHON-233: Add information to build and test with extensions on windows
  • PYTHON-232: Traceback when hashing a DBRef instance
  • PYTHON-231: Traceback when pickling a DBRef instance
  • PYTHON-230: Pickled ObjectIds are not compatible between pymongo 1.9 and 1.10
  • PYTHON-228: Cannot pickle bson.ObjectId
  • PYTHON-227: Traceback when calling find() on system.js
  • PYTHON-216: MasterSlaveConnection is missing disconnect() method
  • PYTHON-186: When storing integers, type is selected according to value instead of type
  • PYTHON-173: as_class option is not propagated by Cursor.clone
  • PYTHON-113: Redundancy in MasterSlaveConnection

Changes in Version 1.10

Version 1.10 includes changes to support new features in MongoDB 1.8.x. Highlights include a modified map/reduce API including an inline map/reduce helper method, a new find_and_modify helper, and the ability to query the server for the maximum BSON document size it supports.

Warning

MongoDB versions greater than 1.7.4 no longer generate temporary collections for map/reduce results. An output collection name must be provided and the output will replace any existing output collection with the same name. map_reduce() now requires the out parameter.

Issues resolved
  • PYTHON-225: ObjectId class definition should use __slots__.
  • PYTHON-223: Documentation fix.
  • PYTHON-220: Documentation fix.
  • PYTHON-219: KeyError in find_and_modify()
  • PYTHON-213: Query server for maximum BSON document size.
  • PYTHON-208: Fix Connection __repr__.
  • PYTHON-207: Changes to Map/Reduce API.
  • PYTHON-205: Accept slaveOk in the URI to match the URI docs.
  • PYTHON-203: When slave_okay=True and we only specify one host don’t autodetect other set members.
  • PYTHON-194: Show size when whining about a document being too large.
  • PYTHON-184: Raise DuplicateKeyError for duplicate keys in capped collections.
  • PYTHON-178: Don’t segfault when trying to encode a recursive data structure.
  • PYTHON-177: Don’t segfault when decoding dicts with broken iterators.
  • PYTHON-172: Fix a typo.
  • PYTHON-170: Add find_and_modify().
  • PYTHON-169: Support deepcopy of DBRef.
  • PYTHON-167: Duplicate of PYTHON-166.
  • PYTHON-166: Fixes a concurrency issue.
  • PYTHON-158: Add code and err string to db assertion messages.

Changes in Version 1.9

Version 1.9 adds a new package to the PyMongo distribution, bson. bson contains all of the BSON encoding and decoding logic, and the BSON types that were formerly in the pymongo package. The following modules have been renamed:

In addition, the following exception classes have been renamed:

The above exceptions now inherit from bson.errors.BSONError rather than pymongo.errors.PyMongoError.

Note

All of the renamed modules and exceptions above have aliases created with the old names, so these changes should not break existing code. The old names will eventually be deprecated and then removed, so users should begin migrating towards the new names now.

Warning

The change to the exception hierarchy mentioned above is possibly breaking. If your code is catching PyMongoError, then the exceptions raised by bson will not be caught, even though they would have been caught previously. Before upgrading, it is recommended that users check for any cases like this.

  • the C extension now shares buffer.c/h with the Ruby driver
  • bson no longer raises InvalidName, all occurrences have been replaced with InvalidDocument.
  • renamed bson._to_dicts() to decode_all().
  • renamed from_dict() to encode() and to_dict() to decode().
  • added batch_size().
  • allow updating (some) file metadata after a GridIn instance has been closed.
  • performance improvements for reading from GridFS.
  • special cased slice with the same start and stop to return an empty cursor.
  • allow writing unicode to GridFS if an encoding attribute has been specified for the file.
  • added gridfs.GridFS.get_version().
  • scope variables for Code can now be specified as keyword arguments.
  • added readline() to GridOut.
  • make a best effort to transparently auto-reconnect if a Connection has been idle for a while.
  • added list() to SystemJS.
  • added file_document argument to GridOut() to allow initializing from an existing file document.
  • raise TimeoutError even if the getLastError command was run manually and not through “safe” mode.
  • added uuid support to json_util.

Changes in Version 1.8.1

  • fixed a typo in the C extension that could cause safe-mode operations to report a failure (SystemError) even when none occurred.
  • added a __ne__() implementation to any class where we define __eq__().

Changes in Version 1.8

Version 1.8 adds support for connecting to replica sets, specifying per-operation values for w and wtimeout, and decoding to timezone-aware datetimes.

  • fixed a reference leak in the C extension when decoding a DBRef.
  • added support for w, wtimeout, and fsync (and any other options for getLastError) to “safe mode” operations.
  • added nodes property.
  • added a maximum pool size of 10 sockets.
  • added support for replica sets.
  • DEPRECATED from_uri() and paired(), both are supplanted by extended functionality in Connection().
  • added tz aware support for datetimes in ObjectId, Timestamp and json_util methods.
  • added drop() helper.
  • reuse the socket used for finding the master when a Connection is first created.
  • added support for MinKey, MaxKey and Timestamp to json_util.
  • added support for decoding datetimes as aware (UTC) - it is highly recommended to enable this by setting the tz_aware parameter of Connection() to True.
  • added network_timeout option for individual calls to find() and find_one().
  • added exists() to check if a file exists in GridFS.
  • added support for additional keys in DBRef instances.
  • added code attribute to OperationFailure exceptions.
  • fixed serialization of int and float subclasses in the C extension.

Changes in Version 1.7

Version 1.7 is a recommended upgrade for all PyMongo users. The full release notes are below, and a more in-depth discussion of the highlights is here.

  • no longer attempt to build the C extension on big-endian systems.
  • added MinKey and MaxKey.
  • use unsigned for Timestamp in BSON encoder/decoder.
  • support True as "ok" in command responses, in addition to 1.0 - necessary for server versions >= 1.5.X
  • BREAKING change to index_information() to add support for querying unique status and other index information.
  • added document_class, to specify class for returned documents.
  • added as_class argument for find(), and in the BSON decoder.
  • added support for creating Timestamp instances using a datetime.
  • allow dropTarget argument for rename.
  • handle aware datetime instances, by converting to UTC.
  • added support for max_scan.
  • raise FileExists exception when creating a duplicate GridFS file.
  • use y2038 for time handling in the C extension - eliminates 2038 problems when extension is installed.
  • added sort parameter to find()
  • finalized deprecation of changes from versions <= 1.4
  • take any non-dict as an "_id" query for find_one() or remove()
  • added ability to pass a dict for fields argument to find() (supports "$slice" and field negation)
  • simplified code to find master, since paired setups don’t always have a remote
  • fixed bug in C encoder for certain invalid types (like Collection instances).
  • don’t transparently map "filename" key to name attribute for GridFS.

Changes in Version 1.6

The biggest change in version 1.6 is a complete re-implementation of gridfs with a lot of improvements over the old implementation. There are many details and examples of using the new API in this blog post. The old API has been removed in this version, so existing code will need to be modified before upgrading to 1.6.

  • fixed issue where connection pool was being shared across Connection instances.
  • more improvements to Python code caching in C extension - should improve behavior on mod_wsgi.
  • added from_datetime().
  • complete rewrite of gridfs support.
  • improvements to the command() API.
  • fixed drop_indexes() behavior on non-existent collections.
  • disallow empty bulk inserts.

Changes in Version 1.5.2

  • fixed response handling to ignore unknown response flags in queries.
  • handle server versions containing ‘-pre-‘.

Changes in Version 1.5.1

  • added _id property for GridFile instances.
  • fix for making a Connection (with slave_okay set) directly to a slave in a replica pair.
  • accept kwargs for create_index() and ensure_index() to support all indexing options.
  • add pymongo.GEO2D and support for geo indexing.
  • improvements to Python code caching in C extension - should improve behavior on mod_wsgi.

Changes in Version 1.5

  • added subtype constants to binary module.
  • DEPRECATED options argument to Collection() and create_collection() in favor of kwargs.
  • added has_c() to check for C extension.
  • added copy_database().
  • added alive to tell when a cursor might have more data to return (useful for tailable cursors).
  • added Timestamp to better support dealing with internal MongoDB timestamps.
  • added name argument for create_index() and ensure_index().
  • fixed connection pooling w/ fork
  • paired() takes all kwargs that are allowed for Connection().
  • insert() returns list for bulk inserts of size one.
  • fixed handling of datetime.datetime instances in json_util.
  • added from_uri() to support MongoDB connection uri scheme.
  • fixed chunk number calculation when unaligned in gridfs.
  • command() takes a string for simple commands.
  • added system_js helper for dealing with server-side JS.
  • don’t wrap queries containing "$query" (support manual use of "$min", etc.).
  • added GridFSError as base class for gridfs exceptions.

Changes in Version 1.4

Perhaps the most important change in version 1.4 is that we have decided to no longer support Python 2.3. The most immediate reason for this is to allow some improvements to connection pooling. This will also allow us to use some new (as in Python 2.4 ;) idioms and will help begin the path towards supporting Python 3.0. If you need to use Python 2.3 you should consider using version 1.3 of this driver, although that will no longer be actively supported.

Other changes:

  • move "_id" to front only for top-level documents (fixes some corner cases).
  • update() and remove() return the entire response to the lastError command when safe is True.
  • completed removal of things that were deprecated in version 1.2 or earlier.
  • enforce that collection names do not contain the NULL byte.
  • fix to allow using UTF-8 collection names with the C extension.
  • added PyMongoError as base exception class for all errors. This changes the exception hierarchy somewhat, and is a BREAKING change if you depend on ConnectionFailure being an IOError or InvalidBSON being a ValueError, for example.
  • added DuplicateKeyError for calls to insert() or update() with safe set to True.
  • removed thread_util.
  • added add_user() and remove_user() helpers.
  • fix for authenticate() when using non-UTF-8 names or passwords.
  • minor fixes for MasterSlaveConnection.
  • clean up all cases where ConnectionFailure is raised.
  • simplification of connection pooling - makes driver ~2x faster for simple benchmarks. see How does connection pooling work in PyMongo? for more information.
  • DEPRECATED pool_size, auto_start_request and timeout parameters to Connection. DEPRECATED start_request().
  • use socket.sendall().
  • removed from_xml() as it was only being used for some internal testing - also eliminates dependency on elementtree.
  • implementation of update() in C.
  • deprecate _command() in favor of command().
  • send all commands without wrapping as {"query": ...}.
  • support string as key argument to group() (keyf) and run all groups as commands.
  • support for equality testing for Code instances.
  • allow the NULL byte in strings and disallow it in key names or regex patterns

Changes in Version 1.3

  • DEPRECATED running group() as eval(), also changed default for group() to running as a command
  • remove pymongo.cursor.Cursor.__len__(), which was deprecated in 1.1.1 - needed to do this aggressively due to its presence breaking Django template for loops
  • DEPRECATED host(), port(), connection(), name(), database(), name() and full_name() in favor of host, port, connection, name, database, name and full_name, respectively. The deprecation schedule for this change will probably be faster than usual, as it carries some performance implications.
  • added disconnect()

Changes in Version 1.2.1

  • added Changelog to docs
  • added setup.py doc --test to run doctests for tutorial, examples
  • moved most examples to Sphinx docs (and remove from examples/ directory)
  • raise InvalidId instead of TypeError when passing a 24 character string to ObjectId that contains non-hexadecimal characters
  • allow unicode instances for ObjectId init

Changes in Version 1.2

  • spec parameter for remove() is now optional to allow for deleting all documents in a Collection
  • always wrap queries with {query: ...} even when no special options - get around some issues with queries on fields named query
  • enforce 4MB document limit on the client side
  • added map_reduce() helper - see example
  • added distinct() method on Cursor instances to allow distinct with queries
  • fix for __getitem__() after skip()
  • allow any UTF-8 string in BSON encoder, not just ASCII subset
  • added generation_time
  • removed support for legacy ObjectId format - pretty sure this was never used, and is just confusing
  • DEPRECATED url_encode() and url_decode() in favor of str() and ObjectId(), respectively
  • allow oplog.$main as a valid collection name
  • some minor fixes for installation process
  • added support for datetime and regex in json_util

Changes in Version 1.1.2

  • improvements to insert() speed (using C for insert message creation)
  • use random number for request_id
  • fix some race conditions with AutoReconnect

Changes in Version 1.1.1

  • added multi parameter for update()
  • fix unicode regex patterns with C extension
  • added distinct()
  • added database support for DBRef
  • added json_util with helpers for encoding / decoding special types to JSON
  • DEPRECATED pymongo.cursor.Cursor.__len__() in favor of count() with with_limit_and_skip set to True due to performance regression
  • switch documentation to Sphinx

Changes in Version 1.1

  • added __hash__() for DBRef and ObjectId
  • bulk insert() works with any iterable
  • fix ObjectId generation when using multiprocessing
  • added collection
  • added network_timeout parameter for Connection()
  • DEPRECATED slave_okay parameter for individual queries
  • fix for safe mode when multi-threaded
  • added safe parameter for remove()
  • added tailable parameter for find()

Changes in Version 1.0

Changes in Version 0.16

  • support for encoding/decoding uuid.UUID instances
  • fix for explain() with limits

Changes in Version 0.15.2

  • documentation changes only

Changes in Version 0.15.1

  • various performance improvements
  • API CHANGE no longer need to specify direction for create_index() and ensure_index() when indexing a single key
  • support for encoding tuple instances as list instances

Changes in Version 0.15

  • fix string representation of ObjectId instances
  • added timeout parameter for find()
  • allow scope for reduce function in group()

Changes in Version 0.14.2

  • minor bugfixes

Changes in Version 0.14.1

  • seek() and tell() for (read mode) GridFile instances

Changes in Version 0.14

Changes in Version 0.13

  • better MasterSlaveConnection support
  • API CHANGE insert() and save() both return inserted _id
  • DEPRECATED passing an index name to hint()

Changes in Version 0.12

Changes in Version 0.11.3

  • don’t allow NULL bytes in string encoder
  • fixes for Python 2.3

Changes in Version 0.11.2

  • PEP 8
  • updates for group()
  • VS build

Changes in Version 0.11.1

  • fix for connection pooling under Python 2.5

Changes in Version 0.11

  • better build failure detection
  • driver support for selecting fields in sub-documents
  • disallow insertion of invalid key names
  • added timeout parameter for Connection()

Changes in Version 0.10.3

  • fix bug with large limit()
  • better exception when modules get reloaded out from underneath the C extension
  • better exception messages when calling a Collection or Database instance

Changes in Version 0.10.2

  • support subclasses of dict in C encoder

Changes in Version 0.10.1

  • alias Connection as pymongo.Connection
  • raise an exception rather than silently overflowing in encoder

Changes in Version 0.10

Changes in Version 0.9.7

  • allow sub-collections of $cmd as valid Collection names
  • add version as pymongo.version
  • add --no_ext command line option to setup.py

Python 3 FAQ

What Python 3 versions are supported?

PyMongo supports CPython 3.4+ and PyPy3.

Are there any PyMongo behavior changes with Python 3?

Only one intentional change. Instances of bytes are encoded as BSON type 5 (Binary data) with subtype 0. In Python 3 they are decoded back to bytes. In Python 2 they are decoded to Binary with subtype 0.

For example, let’s insert a bytes instance using Python 3 then read it back. Notice the byte string is decoded back to bytes:

Python 3.6.1 (v3.6.1:69c0db5050, Mar 21 2017, 01:21:04)
[GCC 4.9.3] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pymongo
>>> c = pymongo.MongoClient()
>>> c.test.bintest.insert_one({'binary': b'this is a byte string'}).inserted_id
ObjectId('4f9086b1fba5222021000000')
>>> c.test.bintest.find_one()
{'binary': b'this is a byte string', '_id': ObjectId('4f9086b1fba5222021000000')}

Now retrieve the same document in Python 2. Notice the byte string is decoded to Binary:

Python 2.7.6 (default, Feb 26 2014, 10:36:22)
[GCC 4.7.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pymongo
>>> c = pymongo.MongoClient()
>>> c.test.bintest.find_one()
{u'binary': Binary('this is a byte string', 0), u'_id': ObjectId('4f9086b1fba5222021000000')}

There is a similar change in behavior when parsing JSON binary values with subtype 0: in Python 3 they are decoded to bytes, and in Python 2 to Binary with subtype 0.

For example, let’s decode a JSON binary subtype 0 using Python 3. Notice the byte string is decoded to bytes:

Python 3.6.1 (v3.6.1:69c0db5050, Mar 21 2017, 01:21:04)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from bson.json_util import loads
>>> loads('{"b": {"$binary": "dGhpcyBpcyBhIGJ5dGUgc3RyaW5n", "$type": "00"}}')
{'b': b'this is a byte string'}

Now decode the same JSON in Python 2. Notice the byte string is decoded to Binary:

Python 2.7.10 (default, Feb  7 2017, 00:08:15)
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.34)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from bson.json_util import loads
>>> loads('{"b": {"$binary": "dGhpcyBpcyBhIGJ5dGUgc3RyaW5n", "$type": "00"}}')
{u'b': Binary('this is a byte string', 0)}
Why can’t I share pickled ObjectIds between some versions of Python 2 and 3?

Instances of ObjectId pickled using Python 2 can always be unpickled using Python 3.

If you pickled an ObjectId using Python 2 and want to unpickle it using Python 3 you must pass encoding='latin-1' to pickle.loads:

Python 2.7.6 (default, Feb 26 2014, 10:36:22)
[GCC 4.7.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle
>>> from bson.objectid import ObjectId
>>> oid = ObjectId()
>>> oid
ObjectId('4f919ba2fba5225b84000000')
>>> pickle.dumps(oid)
'ccopy_reg\n_reconstructor\np0\n(cbson.objectid\...'

Python 3.6.1 (v3.6.1:69c0db5050, Mar 21 2017, 01:21:04)
[GCC 4.9.3] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle
>>> pickle.loads(b'ccopy_reg\n_reconstructor\np0\n(cbson.objectid\...', encoding='latin-1')
ObjectId('4f919ba2fba5225b84000000')

If you need to pickle ObjectIds using Python 3 and unpickle them using Python 2 you must use protocol <= 2:

Python 3.6.1 (v3.6.1:69c0db5050, Mar 21 2017, 01:21:04)
[GCC 4.9.3] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle
>>> from bson.objectid import ObjectId
>>> oid = ObjectId()
>>> oid
ObjectId('4f96f20c430ee6bd06000000')
>>> pickle.dumps(oid, protocol=2)
b'\x80\x02cbson.objectid\nObjectId\nq\x00)\x81q\x01c_codecs\nencode\...'

Python 2.6.9 (unknown, Feb 26 2014, 12:39:10)
[GCC 4.7.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle
>>> pickle.loads('\x80\x02cbson.objectid\nObjectId\nq\x00)\x81q\x01c_codecs\nencode\...')
ObjectId('4f96f20c430ee6bd06000000')

PyMongo 3 Migration Guide

PyMongo 3 is a partial rewrite that brings a large number of improvements. It also brings a number of backward-incompatible changes. This guide provides a roadmap for migrating an existing application from PyMongo 2.x to 3.x, or for writing libraries that will work with both PyMongo 2.x and 3.x.

PyMongo 2.9

The first step in any successful migration is upgrading to, or requiring, at least PyMongo 2.9. If your project has a requirements.txt file, add the line “pymongo >= 2.9, < 3.0” until you have completely migrated to PyMongo 3. Most of the key new methods and options from PyMongo 3.0 are backported in PyMongo 2.9, making migration much easier.
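
For example, the relevant line in requirements.txt during the transition period would be:

pymongo >= 2.9, < 3.0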

Enable Deprecation Warnings

Starting with PyMongo 2.9, DeprecationWarning is raised by most methods removed in PyMongo 3.0. Make sure you enable runtime warnings to see where deprecated functions and methods are being used in your application:

python -Wd <your application>

Warnings can also be changed to errors:

python -Wd -Werror <your application>

Note

Not all deprecated features raise DeprecationWarning when used. For example, the find() options renamed in PyMongo 3.0 do not raise DeprecationWarning when used in PyMongo 2.x. See also Removed features with no migration path.

CRUD API

Changes to find() and find_one()
“spec” renamed “filter”

The spec option has been renamed to filter. Code like this:

>>> cursor = collection.find(spec={"a": 1})

can be changed to this with PyMongo 2.9 or later:

>>> cursor = collection.find(filter={"a": 1})

or this with any version of PyMongo:

>>> cursor = collection.find({"a": 1})
“fields” renamed “projection”

The fields option has been renamed to projection. Code like this:

>>> cursor = collection.find({"a": 1}, fields={"_id": False})

can be changed to this with PyMongo 2.9 or later:

>>> cursor = collection.find({"a": 1}, projection={"_id": False})

or this with any version of PyMongo:

>>> cursor = collection.find({"a": 1}, {"_id": False})
“partial” renamed “allow_partial_results”

The partial option has been renamed to allow_partial_results. Code like this:

>>> cursor = collection.find({"a": 1}, partial=True)

can be changed to this with PyMongo 2.9 or later:

>>> cursor = collection.find({"a": 1}, allow_partial_results=True)
“timeout” replaced by “no_cursor_timeout”

The timeout option has been replaced by no_cursor_timeout. Code like this:

>>> cursor = collection.find({"a": 1}, timeout=False)

can be changed to this with PyMongo 2.9 or later:

>>> cursor = collection.find({"a": 1}, no_cursor_timeout=True)
“network_timeout” is removed

The network_timeout option has been removed. This option was always the wrong solution for timing out long running queries and should never be used in production. Starting with MongoDB 2.6 you can use the $maxTimeMS query modifier. Code like this:

# Set a 5 second select() timeout.
>>> cursor = collection.find({"a": 1}, network_timeout=5)

can be changed to this with PyMongo 2.9 or later:

# Set a 5 second (5000 millisecond) server side query timeout.
>>> cursor = collection.find({"a": 1}, modifiers={"$maxTimeMS": 5000})

or with PyMongo 3.5 or later:

>>> cursor = collection.find({"a": 1}, max_time_ms=5000)

or with any version of PyMongo:

>>> cursor = collection.find({"$query": {"a": 1}, "$maxTimeMS": 5000})

See also

$maxTimeMS

Tailable cursors

The tailable and await_data options have been replaced by cursor_type. Code like this:

>>> cursor = collection.find({"a": 1}, tailable=True)
>>> cursor = collection.find({"a": 1}, tailable=True, await_data=True)

can be changed to this with PyMongo 2.9 or later:

>>> from pymongo import CursorType
>>> cursor = collection.find({"a": 1}, cursor_type=CursorType.TAILABLE)
>>> cursor = collection.find({"a": 1}, cursor_type=CursorType.TAILABLE_AWAIT)
Other removed options

The slave_okay, read_preference, tag_sets, and secondary_acceptable_latency_ms options have been removed. See the Read Preferences section for solutions.

The aggregate method always returns a cursor

PyMongo 2.6 added an option to return an iterable cursor from aggregate(). In PyMongo 3 aggregate() always returns a cursor. Use the cursor option for consistent behavior with PyMongo 2.9 and later:

>>> for result in collection.aggregate([], cursor={}):
...     pass

Read Preferences

The “slave_okay” option is removed

The slave_okay option is removed from PyMongo’s API. The secondaryPreferred read preference provides the same behavior. Code like this:

>>> client = MongoClient(slave_okay=True)

can be changed to this with PyMongo 2.9 or newer:

>>> client = MongoClient(readPreference="secondaryPreferred")
The “read_preference” attribute is immutable

Code like this:

>>> from pymongo import ReadPreference
>>> db = client.my_database
>>> db.read_preference = ReadPreference.SECONDARY

can be changed to this with PyMongo 2.9 or later:

>>> db = client.get_database("my_database",
...                          read_preference=ReadPreference.SECONDARY)

Code like this:

>>> cursor = collection.find({"a": 1},
...                          read_preference=ReadPreference.SECONDARY)

can be changed to this with PyMongo 2.9 or later:

>>> coll2 = collection.with_options(read_preference=ReadPreference.SECONDARY)
>>> cursor = coll2.find({"a": 1})

See also

get_collection()

The “tag_sets” option and attribute are removed

The tag_sets MongoClient option is removed. The read_preference option can be used instead. Code like this:

>>> client = MongoClient(
...     read_preference=ReadPreference.SECONDARY,
...     tag_sets=[{"dc": "ny"}, {"dc": "sf"}])

can be changed to this with PyMongo 2.9 or later:

>>> from pymongo.read_preferences import Secondary
>>> client = MongoClient(read_preference=Secondary([{"dc": "ny"}]))

To change the tags sets for a Database or Collection, code like this:

>>> db = client.my_database
>>> db.read_preference = ReadPreference.SECONDARY
>>> db.tag_sets = [{"dc": "ny"}]

can be changed to this with PyMongo 2.9 or later:

>>> db = client.get_database("my_database",
...                          read_preference=Secondary([{"dc": "ny"}]))

Code like this:

>>> cursor = collection.find(
...     {"a": 1},
...     read_preference=ReadPreference.SECONDARY,
...     tag_sets=[{"dc": "ny"}])

can be changed to this with PyMongo 2.9 or later:

>>> from pymongo.read_preferences import Secondary
>>> coll2 = collection.with_options(
...     read_preference=Secondary([{"dc": "ny"}]))
>>> cursor = coll2.find({"a": 1})

See also

get_collection()

The “secondary_acceptable_latency_ms” option and attribute are removed

PyMongo 2.x supports secondary_acceptable_latency_ms as an option to methods throughout the driver, but mongos only supports a global latency option. PyMongo 3.x has changed to match the behavior of mongos, allowing migration from a single server, to a replica set, to a sharded cluster without a surprising change in server selection behavior. A new option, localThresholdMS, is available through MongoClient and should be used in place of secondaryAcceptableLatencyMS. Code like this:

>>> client = MongoClient(readPreference="nearest",
...                      secondaryAcceptableLatencyMS=100)

can be changed to this with PyMongo 2.9 or later:

>>> client = MongoClient(readPreference="nearest",
...                      localThresholdMS=100)

Write Concern

The “safe” option is removed

In PyMongo 3 the safe option is removed from the entire API. MongoClient has always defaulted to acknowledged write operations and continues to do so in PyMongo 3.
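
For example (a sketch following the pattern of the examples above; the safe keyword is the removed PyMongo 2.x form), code like this:

>>> collection.insert({"a": 1}, safe=True)

can be changed to this with PyMongo 2.9 or later:

>>> result = collection.insert_one({"a": 1})

insert_one() performs an acknowledged write by default, so no extra option is needed.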

The “write_concern” attribute is immutable

The write_concern attribute is immutable in PyMongo 3. Code like this:

>>> client = MongoClient()
>>> client.write_concern = {"w": "majority"}

can be changed to this with any version of PyMongo:

>>> client = MongoClient(w="majority")

Code like this:

>>> db = client.my_database
>>> db.write_concern = {"w": "majority"}

can be changed to this with PyMongo 2.9 or later:

>>> from pymongo import WriteConcern
>>> db = client.get_database("my_database",
...                          write_concern=WriteConcern(w="majority"))

The new CRUD API write methods do not accept write concern options. Code like this:

>>> oid = collection.insert({"a": 2}, w="majority")

can be changed to this with PyMongo 2.9 or later:

>>> from pymongo import WriteConcern
>>> coll2 = collection.with_options(
...     write_concern=WriteConcern(w="majority"))
>>> oid = coll2.insert({"a": 2})

See also

get_collection()

Codec Options

The “document_class” attribute is removed

Code like this:

>>> from bson.son import SON
>>> client = MongoClient()
>>> client.document_class = SON

can be replaced by this in any version of PyMongo:

>>> from bson.son import SON
>>> client = MongoClient(document_class=SON)

or to change the document_class for a Database with PyMongo 2.9 or later:

>>> from bson.codec_options import CodecOptions
>>> from bson.son import SON
>>> db = client.get_database("my_database", CodecOptions(SON))
The “uuid_subtype” option and attribute are removed

Code like this:

>>> from bson.binary import JAVA_LEGACY
>>> db = client.my_database
>>> db.uuid_subtype = JAVA_LEGACY

can be replaced by this with PyMongo 2.9 or later:

>>> from bson.binary import JAVA_LEGACY
>>> from bson.codec_options import CodecOptions
>>> db = client.get_database("my_database",
...                          CodecOptions(uuid_representation=JAVA_LEGACY))

MongoClient

MongoClient connects asynchronously

In PyMongo 3, the MongoClient constructor no longer blocks while connecting to the server or servers, and it no longer raises ConnectionFailure if they are unavailable, nor ConfigurationError if the user’s credentials are wrong. Instead, the constructor returns immediately and launches the connection process on background threads. The connect option is added to control whether these threads are started immediately, or when the client is first used.

For consistent behavior in PyMongo 2.x and PyMongo 3.x, code like this:

>>> from pymongo.errors import ConnectionFailure
>>> try:
...     client = MongoClient()
... except ConnectionFailure:
...     print("Server not available")
>>>

can be changed to this with PyMongo 2.9 or later:

>>> from pymongo.errors import ConnectionFailure
>>> client = MongoClient(connect=False)
>>> try:
...     result = client.admin.command("ismaster")
... except ConnectionFailure:
...     print("Server not available")
>>>

Any operation can be used to determine whether the server is available; we use the “ismaster” command here because it is cheap and does not require authentication.

The max_pool_size parameter is removed

PyMongo 3 replaced the max_pool_size parameter with support for the MongoDB URI maxPoolSize option. Code like this:

>>> client = MongoClient(max_pool_size=10)

can be replaced by this with PyMongo 2.9 or later:

>>> client = MongoClient(maxPoolSize=10)
>>> client = MongoClient("mongodb://localhost:27017/?maxPoolSize=10")
The “disconnect” method is removed

Code like this:

>>> client.disconnect()

can be replaced by this with PyMongo 2.9 or later:

>>> client.close()
The host and port attributes are removed

Code like this:

>>> host = client.host
>>> port = client.port

can be replaced by this with PyMongo 2.9 or later:

>>> address = client.address
>>> host, port = address or (None, None)

BSON

“as_class”, “tz_aware”, and “uuid_subtype” are removed

The as_class, tz_aware, and uuid_subtype parameters have been removed from the functions provided in bson. Code like this:

>>> from bson import BSON
>>> from bson.son import SON
>>> encoded = BSON.encode({"a": 1}, as_class=SON)

can be replaced by this in PyMongo 2.9 or later:

>>> from bson import BSON
>>> from bson.codec_options import CodecOptions
>>> from bson.son import SON
>>> encoded = BSON.encode({"a": 1}, codec_options=CodecOptions(SON))

Removed features with no migration path

MasterSlaveConnection is removed

Master slave deployments are deprecated in MongoDB. Starting with MongoDB 3.0 a replica set can have up to 50 members and that limit is likely to be removed in later releases. We recommend migrating to replica sets instead.

Requests are removed

The client methods start_request, in_request, and end_request are removed. Requests were designed to make read-your-writes consistency more likely with the w=0 write concern. Additionally, a thread in a request used the same member for all secondary reads in a replica set. To ensure read-your-writes consistency in PyMongo 3.0, do not override the default write concern with w=0, and do not override the default read preference of PRIMARY.
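
A minimal sketch of the recommended defaults (the database and collection names are illustrative), which provide read-your-writes consistency without requests:

>>> client = MongoClient()  # defaults: acknowledged writes, PRIMARY reads
>>> result = client.db.coll.insert_one({"x": 1})
>>> doc = client.db.coll.find_one({"x": 1})  # sees the write above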

The “compile_re” option is removed

In PyMongo 3 regular expressions are never compiled to Python pattern objects; BSON regular expressions are decoded as bson.regex.Regex instances instead.
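
When a compiled Python pattern is needed, Regex provides a try_compile() method. A sketch, assuming a document that stores a regular expression under the key "pattern":

>>> doc = collection.find_one({"pattern": {"$exists": True}})
>>> regex = doc["pattern"]         # a bson.regex.Regex instance
>>> pattern = regex.try_compile()  # compile to a Python pattern on demand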

The “use_greenlets” option is removed

The use_greenlets option was meant to allow use of PyMongo with Gevent without calling gevent.monkey.patch_thread(). This option caused a lot of confusion and made it difficult to support alternative asynchronous I/O libraries like Eventlet. Users of Gevent should use gevent.monkey.patch_all() instead.
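
For example (a sketch; the essential point is that the patch must be applied before pymongo or threading are imported):

>>> from gevent import monkey
>>> monkey.patch_all()
>>> import pymongo
>>> client = pymongo.MongoClient()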

See also

Gevent

Developer Guide

Technical guide for contributors to PyMongo.

Periodic Executors

PyMongo implements a PeriodicExecutor for two purposes: as the background thread for Monitor, and to regularly check if there are OP_KILL_CURSORS messages that must be sent to the server.

Killing Cursors

An incompletely iterated Cursor on the client represents an open cursor object on the server. In code like this, we lose a reference to the cursor before finishing iteration:

for doc in collection.find():
    raise Exception()

We try to send an OP_KILL_CURSORS to the server to tell it to clean up the server-side cursor. But we must not take any locks directly from the cursor’s destructor (see PYTHON-799), so we cannot safely use the PyMongo data structures required to send a message. The solution is to add the cursor’s id to an array on the MongoClient without taking any locks.

Each client has a PeriodicExecutor devoted to checking the array for cursor ids. Any ids it finds there came from cursors that were freed while the server-side cursor was still open. The executor can safely take the locks it needs in order to send the OP_KILL_CURSORS message.
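
A minimal sketch of this hand-off (the attribute names are illustrative, not PyMongo's actual internals). Under the GIL, list.append is a single atomic operation, so the destructor needs no lock:

class Cursor(object):
    def __del__(self):
        if self._id is not None:
            # Safe without locks: list.append is atomic under the GIL.
            self._client._kill_cursors_queue.append(self._id)

The kill-cursors executor later drains the list and sends OP_KILL_CURSORS for the collected ids, taking whatever locks it needs at that point.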

Stopping Executors

Just as Cursor must not take any locks from its destructor, neither can MongoClient and Topology. Thus, although the client calls close() on its kill-cursors thread, and the topology calls close() on all its monitor threads, the close() method cannot actually call wake() on the executor, since wake() takes a lock.

Instead, executors wake periodically to check whether their “stopped” flag is set, and if so they exit.

A thread can log spurious errors if it wakes late in the Python interpreter’s shutdown sequence, so we try to join threads before then. Each periodic executor (either a monitor or a kill-cursors thread) adds a weakref to itself to a set called _EXECUTORS, in the periodic_executor module.

An exit handler runs on shutdown and tells all executors to stop, then tries (with a short timeout) to join all executor threads.
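
A condensed sketch of the scheme (illustrative code, not the real pymongo.periodic_executor module):

import atexit
import threading
import time
import weakref

_EXECUTORS = set()

class PeriodicExecutor(object):
    def __init__(self, interval, target):
        self._interval = interval
        self._target = target
        self._stopped = False
        # Track executors weakly so the exit handler can find the
        # live ones without keeping them alive.
        _EXECUTORS.add(weakref.ref(self))

    def open(self):
        thread = threading.Thread(target=self._run)
        thread.daemon = True
        thread.start()

    def close(self):
        # Only set a flag; never take a lock or call wake() here,
        # since close() may run from a destructor or weakref callback.
        self._stopped = True

    def _run(self):
        last_run = 0.0
        while not self._stopped:
            now = time.time()
            if now - last_run >= self._interval:
                self._target()
                last_run = now
            # Sleep briefly so a newly set "stopped" flag is seen promptly.
            time.sleep(0.5)

@atexit.register
def _shutdown_executors():
    # Stop every live executor before interpreter shutdown; the real
    # handler also joins the threads with a short timeout.
    for ref in list(_EXECUTORS):
        executor = ref()
        if executor:
            executor.close()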

Monitoring

For each server in the topology, Topology uses a periodic executor to launch a monitor thread. This thread must not prevent the topology from being freed, so it weakrefs the topology. Furthermore, it uses a weakref callback to terminate itself soon after the topology is freed.
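
A sketch of the weakref arrangement (illustrative names, reusing the PeriodicExecutor sketch above):

import weakref

def attach_monitor(topology):
    def target():
        t = topology_ref()       # upgrade the weak reference for one check
        if t is not None:
            t.check_server()     # illustrative method name

    executor = PeriodicExecutor(interval=10, target=target)

    def on_topology_freed(ref):
        # Runs soon after the topology is garbage collected.
        executor.close()

    # The monitor holds only this weak reference, so it never keeps
    # the topology alive.
    topology_ref = weakref.ref(topology, on_topology_freed)
    executor.open()
    return executor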

[Figure periodic-executor-refs.png: solid lines represent strong references, dashed lines weak ones.]

See Stopping Executors above for an explanation of the _EXECUTORS set.

It is a requirement of the Server Discovery And Monitoring Spec that a sleeping monitor can be awakened early. Aside from infrequent wakeups to do their appointed chores, and occasional interruptions, periodic executors also wake periodically to check if they should terminate.

Our first implementation of this idea was the obvious one: use the Python standard library’s threading.Condition.wait with a timeout. Another thread wakes the executor early by signaling the condition variable.

A topology cannot signal the condition variable to tell the executor to terminate, because it would risk a deadlock in the garbage collector: no destructor or weakref callback can take a lock to signal the condition variable (see PYTHON-863); thus the only way for a dying object to terminate a periodic executor is to set its “stopped” flag and let the executor see the flag next time it wakes.

We erred on the side of prompt cleanup, and set the check interval at 100ms. We assumed that checking a flag and going back to sleep 10 times a second was cheap on modern machines.

Starting in Python 3.2, the builtin C implementation of lock.acquire takes a timeout parameter, so Python 3.2+ Condition variables sleep simply by calling lock.acquire; they are implemented as efficiently as expected.

But in Python 2, lock.acquire has no timeout. To wait with a timeout, a Python 2 condition variable sleeps a millisecond, tries to acquire the lock, sleeps twice as long, and tries again. This exponential backoff reaches a maximum sleep time of 50ms.

If PyMongo calls the condition variable’s “wait” method with a short timeout, the exponential backoff is restarted frequently. Overall, the condition variable is not waking a few times a second, but hundreds of times. (See PYTHON-983.)

Thus the current design of periodic executors is surprisingly simple: they do a simple time.sleep for a half-second, check if it is time to wake or terminate, and sleep again.