PyMongo 3.9.0 Documentation

Overview

PyMongo is a Python distribution containing tools for working with MongoDB, and is the recommended way to work with MongoDB from Python. This documentation attempts to explain everything you need to know to use PyMongo.

Installing / Upgrading
Instructions on how to get the distribution.
Tutorial
Start here for a quick overview.
Examples
Examples of how to perform specific tasks.
Using PyMongo with MongoDB Atlas
Connecting to MongoDB Atlas, MongoDB, Inc.’s hosted database service.
TLS/SSL and PyMongo
Using PyMongo with TLS / SSL.
Frequently Asked Questions
Some questions that come up often.
PyMongo 3 Migration Guide
A PyMongo 2.x to 3.x migration guide.
Python 3 FAQ
Frequently asked questions about Python 3 support.
Compatibility Policy
Explanation of deprecations, and how to keep pace with changes in PyMongo’s API.
API Documentation
The complete API documentation, organized by module.
Tools
A listing of Python tools and libraries that have been written for MongoDB.
Developer Guide
Developer guide for contributors to PyMongo.

Getting Help

If you’re having trouble or have questions about PyMongo, the best place to ask is the MongoDB user group. Once you get an answer, it’d be great if you could work it back into this documentation and contribute!

Issues

All issues should be reported (and can be tracked / voted for / commented on) at the main MongoDB JIRA bug tracker, in the “Python Driver” project.

Contributing

PyMongo has a large community and contributions are always encouraged. Contributions can be as simple as minor tweaks to this documentation. To contribute, fork the project on GitHub and send a pull request.

Changes

See the Changelog for a full list of changes to PyMongo. For older versions of the documentation please see the archive list.

About This Documentation

This documentation is generated using the Sphinx documentation generator. The source files for the documentation are located in the doc/ directory of the PyMongo distribution. To generate the docs locally run the following command from the root directory of the PyMongo source:

$ python setup.py doc

Using PyMongo with MongoDB Atlas

Atlas is MongoDB, Inc.’s hosted MongoDB as a service offering. To connect to Atlas, pass the connection string provided by Atlas to MongoClient:

client = pymongo.MongoClient(<Atlas connection string>)

Connections to Atlas require TLS/SSL. For connections using TLS/SSL, PyMongo may require third-party dependencies, as determined by your version of Python. With PyMongo 3.3+, you can install PyMongo and any TLS/SSL-related dependencies using the following pip command:

$ python -m pip install pymongo[tls]

Earlier versions of PyMongo require you to manually install the dependencies. For a list of TLS/SSL-related dependencies, see TLS/SSL and PyMongo.

Note

Connecting to Atlas “Free Tier” or “Shared Cluster” instances requires Server Name Indication (SNI) support. SNI support requires CPython 2.7.9 / PyPy 2.5.1 or newer. To check if your version of Python supports SNI run the following command:

$ python -c "import ssl; print(getattr(ssl, 'HAS_SNI', False))"

You should see “True”.

Warning

Industry best practices recommend, and some regulations require, the use of TLS 1.1 or newer. Though no application changes are required for PyMongo to make use of the newest protocols, some operating systems or versions may not provide an OpenSSL version new enough to support them.

Users of macOS older than 10.13 (High Sierra) will need to install Python from python.org, homebrew, macports, or another similar source.

Users of Linux or other non-macOS Unix can check their OpenSSL version like this:

$ openssl version

If the version number is less than 1.0.1, support for TLS 1.1 or newer is not available. Contact your operating system vendor for a solution or upgrade to a newer distribution.

You can check your Python interpreter by installing the requests module and executing the following command:

python -c "import requests; print(requests.get('https://www.howsmyssl.com/a/check', verify=False).json()['tls_version'])"

You should see “TLS 1.X” where X is >= 1.

You can read more about TLS versions and their security implications here:

https://www.owasp.org/index.php/Transport_Layer_Protection_Cheat_Sheet#Rule_-_Only_Support_Strong_Protocols

Installing / Upgrading

PyMongo is in the Python Package Index.

Warning

Do not install the “bson” package from pypi. PyMongo comes with its own bson package; doing “pip install bson” or “easy_install bson” installs a third-party package that is incompatible with PyMongo.
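
If the incompatible package is already present, one way to recover (a sketch; adjust for your environment) is to uninstall both packages and then reinstall PyMongo:

$ python -m pip uninstall bson pymongo
$ python -m pip install pymongo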

Installing with pip

We recommend using pip to install pymongo on all platforms:

$ python -m pip install pymongo

To get a specific version of pymongo:

$ python -m pip install pymongo==3.5.1

To upgrade using pip:

$ python -m pip install --upgrade pymongo

Note

pip does not support installing Python packages in .egg format. If you would like to install PyMongo from a .egg provided on pypi use easy_install instead.

Installing with easy_install

To use easy_install from setuptools do:

$ python -m easy_install pymongo

To upgrade do:

$ python -m easy_install -U pymongo

Dependencies

PyMongo supports CPython 2.7, 3.4+, PyPy, and PyPy3.5+.

Optional dependencies:

GSSAPI authentication requires pykerberos on Unix or WinKerberos on Windows. The correct dependency can be installed automatically along with PyMongo:

$ python -m pip install pymongo[gssapi]

Support for mongodb+srv:// URIs requires dnspython:

$ python -m pip install pymongo[srv]
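
Once dnspython is installed, a mongodb+srv:// connection string can be passed directly to MongoClient. A minimal sketch, where the hostname is a placeholder:

>>> from pymongo import MongoClient
>>> client = MongoClient("mongodb+srv://server.example.com/")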

TLS / SSL support may require ipaddress and certifi or wincertstore depending on the Python version in use. The necessary dependencies can be installed along with PyMongo:

$ python -m pip install pymongo[tls]

Wire protocol compression with snappy requires python-snappy:

$ python -m pip install pymongo[snappy]

Wire protocol compression with zstandard requires zstandard:

$ python -m pip install pymongo[zstd]
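
With a compression library installed, compression can be requested with the compressors option. A minimal sketch; the server must also be configured to support the requested compressors:

>>> from pymongo import MongoClient
>>> client = MongoClient('mongodb://localhost', compressors='zstd,snappy')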

You can install all dependencies automatically with the following command:

$ python -m pip install pymongo[snappy,gssapi,srv,tls,zstd]

Other optional packages:

  • backports.pbkdf2 improves authentication performance with SCRAM-SHA-1 and SCRAM-SHA-256. It especially improves performance on Python versions older than 2.7.8.
  • monotonic adds support for a monotonic clock, which improves reliability in environments where clock adjustments are frequent. Not needed in Python 3.

Installing from source

If you’d rather install directly from the source (i.e. to stay on the bleeding edge), install the C extension dependencies then check out the latest source from GitHub and install the driver from the resulting tree:

$ git clone git://github.com/mongodb/mongo-python-driver.git pymongo
$ cd pymongo/
$ python setup.py install

Installing from source on Unix

To build the optional C extensions on Linux or another non-macOS Unix you must have the GNU C compiler (gcc) installed. Depending on your flavor of Unix (or Linux distribution) you may also need a python development package that provides the necessary header files for your version of Python. The package name may vary from distro to distro.

Debian and Ubuntu users should issue the following command:

$ sudo apt-get install build-essential python-dev

Users of Red Hat based distributions (RHEL, CentOS, Amazon Linux, Oracle Linux, Fedora, etc.) should issue the following command:

$ sudo yum install gcc python-devel

Installing from source on macOS / OSX

If you want to install PyMongo with C extensions from source you will need the command line developer tools. On modern versions of macOS they can be installed by running the following in Terminal (found in /Applications/Utilities/):

$ xcode-select --install

For older versions of OSX you may need Xcode. See the notes below for various OSX and Xcode versions.

Snow Leopard (10.6) - Xcode 3 with ‘UNIX Development Support’.

Snow Leopard Xcode 4: The Python versions shipped with OSX 10.6.x are universal binaries. They support i386, PPC, and x86_64. Xcode 4 removed support for PPC, causing the distutils version shipped with Apple’s builds of Python to fail to build the C extensions if you have Xcode 4 installed. There is a workaround:

# For some Python builds from python.org
$ env ARCHFLAGS='-arch i386 -arch x86_64' python -m easy_install pymongo

See http://bugs.python.org/issue11623 for a more detailed explanation.

Lion (10.7) and newer - PyMongo’s C extensions can be built against versions of Python 2.7 >= 2.7.4 or Python 3.4+ downloaded from python.org. In all cases Xcode must be installed with ‘UNIX Development Support’.

Xcode 5.1: Starting with version 5.1 the version of clang that ships with Xcode throws an error when it encounters compiler flags it doesn’t recognize. This may cause C extension builds to fail with an error similar to:

clang: error: unknown argument: '-mno-fused-madd' [-Wunused-command-line-argument-hard-error-in-future]

There are workarounds:

# Apple specified workaround for Xcode 5.1
# easy_install
$ ARCHFLAGS=-Wno-error=unused-command-line-argument-hard-error-in-future easy_install pymongo
# or pip
$ ARCHFLAGS=-Wno-error=unused-command-line-argument-hard-error-in-future pip install pymongo

# Alternative workaround using CFLAGS
# easy_install
$ CFLAGS=-Qunused-arguments easy_install pymongo
# or pip
$ CFLAGS=-Qunused-arguments pip install pymongo

Installing from source on Windows

If you want to install PyMongo with C extensions from source the following requirements apply to both CPython and ActiveState’s ActivePython:

64-bit Windows

For Python 3.5 and newer install Visual Studio 2015. For Python 3.4 install Visual Studio 2010. You must use the full version of Visual Studio 2010 as Visual C++ Express does not provide 64-bit compilers. Make sure that you check the “x64 Compilers and Tools” option under Visual C++. For Python 2.7 install the Microsoft Visual C++ Compiler for Python 2.7.

32-bit Windows

For Python 3.5 and newer install Visual Studio 2015.

For Python 3.4 install Visual C++ 2010 Express.

For Python 2.7 install the Microsoft Visual C++ Compiler for Python 2.7

Installing Without C Extensions

By default, the driver attempts to build and install optional C extensions (used for increasing performance) when it is installed. If any extension fails to build the driver will be installed anyway but a warning will be printed.
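
To check whether the C extensions were built and are in use, call the has_c() functions exposed by both pymongo and bson. They return True when the extensions are available:

>>> import pymongo, bson
>>> pymongo.has_c(), bson.has_c()
(True, True)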

If you wish to install PyMongo without the C extensions, even if the extensions build properly, it can be done using a command line option to setup.py:

$ python setup.py --no_ext install

Building PyMongo egg Packages

Some organizations do not allow compilers and other build tools on production systems. To install PyMongo on these systems with C extensions you may need to build custom egg packages. Make sure that you have installed the dependencies listed above for your operating system then run the following command in the PyMongo source directory:

$ python setup.py bdist_egg

The egg package can be found in the dist/ subdirectory. The file name will resemble “pymongo-3.6-py2.7-linux-x86_64.egg” but may have a different name depending on your platform and the version of Python you use to compile.

Warning

These “binary distributions” will only work on systems that resemble the environment in which you built the package. In other words, make sure that the operating system, the version of Python, and the architecture (i.e. 32- or 64-bit) all match.

Copy this file to the target system and issue the following command to install the package:

$ sudo python -m easy_install pymongo-3.6-py2.7-linux-x86_64.egg

Installing a beta or release candidate

MongoDB, Inc. may occasionally tag a beta or release candidate for testing by the community before final release. These releases will not be uploaded to pypi but can be found on the GitHub tags page. They can be installed by passing the full URL for the tag to pip:

$ python -m pip install https://github.com/mongodb/mongo-python-driver/archive/3.9.0b1.tar.gz

or easy_install:

$ python -m easy_install https://github.com/mongodb/mongo-python-driver/archive/3.9.0b1.tar.gz

Tutorial

This tutorial is intended as an introduction to working with MongoDB and PyMongo.

Prerequisites

Before we start, make sure that you have the PyMongo distribution installed. In the Python shell, the following should run without raising an exception:

>>> import pymongo

This tutorial also assumes that a MongoDB instance is running on the default host and port. Assuming you have downloaded and installed MongoDB, you can start it like so:

$ mongod

Making a Connection with MongoClient

The first step when working with PyMongo is to create a MongoClient to the running mongod instance. Doing so is easy:

>>> from pymongo import MongoClient
>>> client = MongoClient()

The above code will connect on the default host and port. We can also specify the host and port explicitly, as follows:

>>> client = MongoClient('localhost', 27017)

Or use the MongoDB URI format:

>>> client = MongoClient('mongodb://localhost:27017/')

Getting a Database

A single instance of MongoDB can support multiple independent databases. When working with PyMongo you access databases using attribute style access on MongoClient instances:

>>> db = client.test_database

If your database name is such that using attribute style access won’t work (like test-database), you can use dictionary style access instead:

>>> db = client['test-database']

Getting a Collection

A collection is a group of documents stored in MongoDB, and can be thought of as roughly the equivalent of a table in a relational database. Getting a collection in PyMongo works the same as getting a database:

>>> collection = db.test_collection

or (using dictionary style access):

>>> collection = db['test-collection']

An important note about collections (and databases) in MongoDB is that they are created lazily - none of the above commands have actually performed any operations on the MongoDB server. Collections and databases are created when the first document is inserted into them.
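
You can observe this lazy behavior by listing the collections before anything has been inserted (assuming a freshly created test database):

>>> db.list_collection_names()
[]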

Documents

Data in MongoDB is represented (and stored) using JSON-style documents. In PyMongo we use dictionaries to represent documents. As an example, the following dictionary might be used to represent a blog post:

>>> import datetime
>>> post = {"author": "Mike",
...         "text": "My first blog post!",
...         "tags": ["mongodb", "python", "pymongo"],
...         "date": datetime.datetime.utcnow()}

Note that documents can contain native Python types (like datetime.datetime instances) which will be automatically converted to and from the appropriate BSON types.

Inserting a Document

To insert a document into a collection we can use the insert_one() method:

>>> posts = db.posts
>>> post_id = posts.insert_one(post).inserted_id
>>> post_id
ObjectId('...')

When a document is inserted a special key, "_id", is automatically added if the document doesn’t already contain an "_id" key. The value of "_id" must be unique across the collection. insert_one() returns an instance of InsertOneResult. For more information on "_id", see the documentation on _id.
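
Because "_id" values must be unique, inserting a second document with the same "_id" raises a DuplicateKeyError. A sketch reusing post_id from above, with the error message abbreviated:

>>> posts.insert_one({"_id": post_id})
Traceback (most recent call last):
...
DuplicateKeyError: E11000 duplicate key error ...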

After inserting the first document, the posts collection has actually been created on the server. We can verify this by listing all of the collections in our database:

>>> db.list_collection_names()
[u'posts']

Getting a Single Document With find_one()

The most basic type of query that can be performed in MongoDB is find_one(). This method returns a single document matching a query (or None if there are no matches). It is useful when you know there is only one matching document, or are only interested in the first match. Here we use find_one() to get the first document from the posts collection:

>>> import pprint
>>> pprint.pprint(posts.find_one())
{u'_id': ObjectId('...'),
 u'author': u'Mike',
 u'date': datetime.datetime(...),
 u'tags': [u'mongodb', u'python', u'pymongo'],
 u'text': u'My first blog post!'}

The result is a dictionary matching the one that we inserted previously.

Note

The returned document contains an "_id", which was automatically added on insert.

find_one() also supports querying on specific elements that the resulting document must match. To limit our results to a document with author “Mike” we do:

>>> pprint.pprint(posts.find_one({"author": "Mike"}))
{u'_id': ObjectId('...'),
 u'author': u'Mike',
 u'date': datetime.datetime(...),
 u'tags': [u'mongodb', u'python', u'pymongo'],
 u'text': u'My first blog post!'}

If we try with a different author, like “Eliot”, we’ll get no result:

>>> posts.find_one({"author": "Eliot"})
>>>

Querying By ObjectId

We can also find a post by its _id, which in our example is an ObjectId:

>>> post_id
ObjectId(...)
>>> pprint.pprint(posts.find_one({"_id": post_id}))
{u'_id': ObjectId('...'),
 u'author': u'Mike',
 u'date': datetime.datetime(...),
 u'tags': [u'mongodb', u'python', u'pymongo'],
 u'text': u'My first blog post!'}

Note that an ObjectId is not the same as its string representation:

>>> post_id_as_str = str(post_id)
>>> posts.find_one({"_id": post_id_as_str}) # No result
>>>

A common task in web applications is to get an ObjectId from the request URL and find the matching document. It’s necessary in this case to convert the ObjectId from a string before passing it to find_one:

from bson.objectid import ObjectId

# The web framework gets post_id from the URL and passes it as a string
def get(post_id):
    # Convert from string to ObjectId:
    document = client.db.collection.find_one({'_id': ObjectId(post_id)})
    return document

A Note On Unicode Strings

You probably noticed that the regular Python strings we stored earlier look different when retrieved from the server (e.g. u’Mike’ instead of ‘Mike’). A short explanation is in order.

MongoDB stores data in BSON format. BSON strings are UTF-8 encoded, so PyMongo must ensure that any strings it stores contain only valid UTF-8 data. Regular strings (<type 'str'>) are validated and stored unaltered. Unicode strings (<type 'unicode'>) are encoded to UTF-8 first. The reason our example string is represented in the Python shell as u'Mike' instead of 'Mike' is that PyMongo decodes each BSON string to a Python unicode string, not a regular str.
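
On Python 2 you can observe this by checking the type of a retrieved string (on Python 3 every string is unicode, so there is no visible difference):

>>> # Python 2 only
>>> type(posts.find_one()['author'])
<type 'unicode'>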

You can read more about Python unicode strings in the Python documentation.

Bulk Inserts

In order to make querying a little more interesting, let’s insert a few more documents. In addition to inserting a single document, we can also perform bulk insert operations, by passing a list as the first argument to insert_many(). This will insert each document in the list, sending only a single command to the server:

>>> new_posts = [{"author": "Mike",
...               "text": "Another post!",
...               "tags": ["bulk", "insert"],
...               "date": datetime.datetime(2009, 11, 12, 11, 14)},
...              {"author": "Eliot",
...               "title": "MongoDB is fun",
...               "text": "and pretty easy too!",
...               "date": datetime.datetime(2009, 11, 10, 10, 45)}]
>>> result = posts.insert_many(new_posts)
>>> result.inserted_ids
[ObjectId('...'), ObjectId('...')]

There are a couple of interesting things to note about this example:

  • The result from insert_many() now returns two ObjectId instances, one for each inserted document.
  • new_posts[1] has a different “shape” than the other posts - there is no "tags" field and we’ve added a new field, "title". This is what we mean when we say that MongoDB is schema-free.

Querying for More Than One Document

To get more than a single document as the result of a query we use the find() method. find() returns a Cursor instance, which allows us to iterate over all matching documents. For example, we can iterate over every document in the posts collection:

>>> for post in posts.find():
...   pprint.pprint(post)
...
{u'_id': ObjectId('...'),
 u'author': u'Mike',
 u'date': datetime.datetime(...),
 u'tags': [u'mongodb', u'python', u'pymongo'],
 u'text': u'My first blog post!'}
{u'_id': ObjectId('...'),
 u'author': u'Mike',
 u'date': datetime.datetime(...),
 u'tags': [u'bulk', u'insert'],
 u'text': u'Another post!'}
{u'_id': ObjectId('...'),
 u'author': u'Eliot',
 u'date': datetime.datetime(...),
 u'text': u'and pretty easy too!',
 u'title': u'MongoDB is fun'}

Just like we did with find_one(), we can pass a document to find() to limit the returned results. Here, we get only those documents whose author is “Mike”:

>>> for post in posts.find({"author": "Mike"}):
...   pprint.pprint(post)
...
{u'_id': ObjectId('...'),
 u'author': u'Mike',
 u'date': datetime.datetime(...),
 u'tags': [u'mongodb', u'python', u'pymongo'],
 u'text': u'My first blog post!'}
{u'_id': ObjectId('...'),
 u'author': u'Mike',
 u'date': datetime.datetime(...),
 u'tags': [u'bulk', u'insert'],
 u'text': u'Another post!'}

Counting

If we just want to know how many documents match a query we can perform a count_documents() operation instead of a full query. We can get a count of all of the documents in a collection:

>>> posts.count_documents({})
3

or just of those documents that match a specific query:

>>> posts.count_documents({"author": "Mike"})
2

Range Queries

MongoDB supports many different types of advanced queries. As an example, let’s perform a query where we limit results to posts older than a certain date, but also sort the results by author:

>>> d = datetime.datetime(2009, 11, 12, 12)
>>> for post in posts.find({"date": {"$lt": d}}).sort("author"):
...   pprint.pprint(post)
...
{u'_id': ObjectId('...'),
 u'author': u'Eliot',
 u'date': datetime.datetime(...),
 u'text': u'and pretty easy too!',
 u'title': u'MongoDB is fun'}
{u'_id': ObjectId('...'),
 u'author': u'Mike',
 u'date': datetime.datetime(...),
 u'tags': [u'bulk', u'insert'],
 u'text': u'Another post!'}

Here we use the special "$lt" operator to do a range query, and also call sort() to sort the results by author.

Indexing

Adding indexes can help accelerate certain queries and can also add additional functionality to querying and storing documents. In this example, we’ll demonstrate how to create a unique index on a key that rejects documents whose value for that key already exists in the index.

First, we’ll need to create the index:

>>> result = db.profiles.create_index([('user_id', pymongo.ASCENDING)],
...                                   unique=True)
>>> sorted(list(db.profiles.index_information()))
[u'_id_', u'user_id_1']

Notice that we have two indexes now: one is the index on _id that MongoDB creates automatically, and the other is the index on user_id we just created.

Now let’s set up some user profiles:

>>> user_profiles = [
...     {'user_id': 211, 'name': 'Luke'},
...     {'user_id': 212, 'name': 'Ziltoid'}]
>>> result = db.profiles.insert_many(user_profiles)

The index prevents us from inserting a document whose user_id is already in the collection:

>>> new_profile = {'user_id': 213, 'name': 'Drew'}
>>> duplicate_profile = {'user_id': 212, 'name': 'Tommy'}
>>> result = db.profiles.insert_one(new_profile)  # This is fine.
>>> result = db.profiles.insert_one(duplicate_profile)
Traceback (most recent call last):
DuplicateKeyError: E11000 duplicate key error index: test_database.profiles.$user_id_1 dup key: { : 212 }

See also

The MongoDB documentation on indexes

Examples

The examples in this section are intended to give in-depth overviews of how to accomplish specific tasks with MongoDB and PyMongo.

Unless otherwise noted, all examples assume that a MongoDB instance is running on the default host and port. Assuming you have downloaded and installed MongoDB, you can start it like so:

$ mongod

Aggregation Examples

There are several methods of performing aggregations in MongoDB. These examples cover the aggregation framework and aggregation with map/reduce.

Setup

To start, we’ll insert some example data which we can perform aggregations on:

>>> from pymongo import MongoClient
>>> db = MongoClient().aggregation_example
>>> result = db.things.insert_many([{"x": 1, "tags": ["dog", "cat"]},
...                                 {"x": 2, "tags": ["cat"]},
...                                 {"x": 2, "tags": ["mouse", "cat", "dog"]},
...                                 {"x": 3, "tags": []}])
>>> result.inserted_ids
[ObjectId('...'), ObjectId('...'), ObjectId('...'), ObjectId('...')]

Aggregation Framework

This example shows how to use the aggregate() method to run an aggregation framework pipeline. We’ll perform a simple aggregation to count the number of occurrences of each tag in the tags array, across the entire collection. To achieve this we pass three operations to the pipeline: first we unwind the tags array, then group by tag and sum the counts, and finally sort by count.

As Python dictionaries don’t maintain order, you should use SON or collections.OrderedDict where explicit ordering is required, e.g. for “$sort”:

Note

aggregate requires server version >= 2.1.0.

>>> from bson.son import SON
>>> pipeline = [
...     {"$unwind": "$tags"},
...     {"$group": {"_id": "$tags", "count": {"$sum": 1}}},
...     {"$sort": SON([("count", -1), ("_id", -1)])}
... ]
>>> import pprint
>>> pprint.pprint(list(db.things.aggregate(pipeline)))
[{u'_id': u'cat', u'count': 3},
 {u'_id': u'dog', u'count': 2},
 {u'_id': u'mouse', u'count': 1}]

To run an explain plan for this aggregation use the command() method:

>>> db.command('aggregate', 'things', pipeline=pipeline, explain=True)
{u'ok': 1.0, u'stages': [...]}

As well as simple aggregations the aggregation framework provides projection capabilities to reshape the returned data. Using projections and aggregation, you can add computed fields, create new virtual sub-objects, and extract sub-fields into the top-level of results.
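
For example, a $project stage can add a computed field. The following sketch uses the $size operator to count the tags on each document from the setup above (output assumes insertion order):

>>> pipeline = [{"$project": {"_id": 0, "x": 1, "tag_count": {"$size": "$tags"}}}]
>>> pprint.pprint(list(db.things.aggregate(pipeline)))
[{u'tag_count': 2, u'x': 1},
 {u'tag_count': 1, u'x': 2},
 {u'tag_count': 3, u'x': 2},
 {u'tag_count': 0, u'x': 3}]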

See also

The full documentation for MongoDB’s aggregation framework

Map/Reduce

Another option for aggregation is to use the map reduce framework. Here we will define map and reduce functions to also count the number of occurrences for each tag in the tags array, across the entire collection.

Our map function just emits a single (key, 1) pair for each tag in the array:

>>> from bson.code import Code
>>> mapper = Code("""
...               function () {
...                 this.tags.forEach(function(z) {
...                   emit(z, 1);
...                 });
...               }
...               """)

The reduce function sums over all of the emitted values for a given key:

>>> reducer = Code("""
...                function (key, values) {
...                  var total = 0;
...                  for (var i = 0; i < values.length; i++) {
...                    total += values[i];
...                  }
...                  return total;
...                }
...                """)

Note

We can’t just return values.length as the reduce function might be called iteratively on the results of other reduce steps.

Finally, we call map_reduce() and iterate over the result collection:

>>> result = db.things.map_reduce(mapper, reducer, "myresults")
>>> for doc in result.find():
...   pprint.pprint(doc)
...
{u'_id': u'cat', u'value': 3.0}
{u'_id': u'dog', u'value': 2.0}
{u'_id': u'mouse', u'value': 1.0}

Advanced Map/Reduce

PyMongo’s API supports all of the features of MongoDB’s map/reduce engine. One interesting feature is the ability to get more detailed results when desired, by passing full_response=True to map_reduce(). This returns the full response to the map/reduce command, rather than just the result collection:

>>> pprint.pprint(
...     db.things.map_reduce(mapper, reducer, "myresults", full_response=True))
{...u'counts': {u'emit': 6, u'input': 4, u'output': 3, u'reduce': 2},
 u'ok': ...,
 u'result': u'...',
 u'timeMillis': ...}

All of the optional map/reduce parameters are also supported, simply pass them as keyword arguments. In this example we use the query parameter to limit the documents that will be mapped over:

>>> results = db.things.map_reduce(
...     mapper, reducer, "myresults", query={"x": {"$lt": 2}})
>>> for doc in results.find():
...   pprint.pprint(doc)
...
{u'_id': u'cat', u'value': 1.0}
{u'_id': u'dog', u'value': 1.0}

You can use SON or collections.OrderedDict to specify a different database to store the result collection:

>>> from bson.son import SON
>>> pprint.pprint(
...     db.things.map_reduce(
...         mapper,
...         reducer,
...         out=SON([("replace", "results"), ("db", "outdb")]),
...         full_response=True))
{...u'counts': {u'emit': 6, u'input': 4, u'output': 3, u'reduce': 2},
 u'ok': ...,
 u'result': {u'collection': ..., u'db': ...},
 u'timeMillis': ...}

See also

The full list of options for MongoDB’s map reduce engine

Authentication Examples

MongoDB supports several different authentication mechanisms. These examples cover all authentication methods currently supported by PyMongo, documenting Python module and MongoDB version dependencies.

Percent-Escaping Username and Password

Username and password must be percent-escaped with urllib.parse.quote_plus() in Python 3, or urllib.quote_plus() in Python 2, to be used in a MongoDB URI. For example, in Python 3:

>>> from pymongo import MongoClient
>>> import urllib.parse
>>> username = urllib.parse.quote_plus('user')
>>> username
'user'
>>> password = urllib.parse.quote_plus('pass/word')
>>> password
'pass%2Fword'
>>> MongoClient('mongodb://%s:%s@127.0.0.1' % (username, password))
...

SCRAM-SHA-256 (RFC 7677)

New in version 3.7.

SCRAM-SHA-256 is the default authentication mechanism supported by a cluster configured for authentication with MongoDB 4.0 or later. Authentication requires a username, a password, and a database name. The default database name is “admin”; this can be overridden with the authSource option. Credentials can be specified as arguments to MongoClient:

>>> from pymongo import MongoClient
>>> client = MongoClient('example.com',
...                      username='user',
...                      password='password',
...                      authSource='the_database',
...                      authMechanism='SCRAM-SHA-256')

Or through the MongoDB URI:

>>> uri = "mongodb://user:password@example.com/?authSource=the_database&authMechanism=SCRAM-SHA-256"
>>> client = MongoClient(uri)

SCRAM-SHA-1 (RFC 5802)

New in version 2.8.

SCRAM-SHA-1 is the default authentication mechanism supported by a cluster configured for authentication with MongoDB 3.0 or later. Authentication requires a username, a password, and a database name. The default database name is “admin”; this can be overridden with the authSource option. Credentials can be specified as arguments to MongoClient:

>>> from pymongo import MongoClient
>>> client = MongoClient('example.com',
...                      username='user',
...                      password='password',
...                      authSource='the_database',
...                      authMechanism='SCRAM-SHA-1')

Or through the MongoDB URI:

>>> uri = "mongodb://user:password@example.com/?authSource=the_database&authMechanism=SCRAM-SHA-1"
>>> client = MongoClient(uri)

For best performance on Python versions older than 2.7.8 install backports.pbkdf2.
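
It can be installed with pip:

$ python -m pip install backports.pbkdf2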

MONGODB-CR

Warning

MONGODB-CR was deprecated with the release of MongoDB 3.6 and is no longer supported by MongoDB 4.0.

Before MongoDB 3.0 the default authentication mechanism was MONGODB-CR, the “MongoDB Challenge-Response” protocol:

>>> from pymongo import MongoClient
>>> client = MongoClient('example.com',
...                      username='user',
...                      password='password',
...                      authMechanism='MONGODB-CR')
>>>
>>> uri = "mongodb://user:password@example.com/?authSource=the_database&authMechanism=MONGODB-CR"
>>> client = MongoClient(uri)

Default Authentication Mechanism

If no mechanism is specified, PyMongo automatically uses MONGODB-CR when connected to a pre-3.0 version of MongoDB, SCRAM-SHA-1 when connected to MongoDB 3.0 through 3.6, and negotiates the mechanism to use (SCRAM-SHA-1 or SCRAM-SHA-256) when connected to MongoDB 4.0+.
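
In other words, it is usually enough to supply only the credentials and let PyMongo negotiate the mechanism:

>>> from pymongo import MongoClient
>>> client = MongoClient('example.com',
...                      username='user',
...                      password='password')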

Default Database and “authSource”

You can specify both a default database and the authentication database in the URI:

>>> uri = "mongodb://user:password@example.com/default_db?authSource=admin"
>>> client = MongoClient(uri)

PyMongo will authenticate on the “admin” database, but the default database will be “default_db”:

>>> # get_database with no "name" argument chooses the DB from the URI
>>> db = MongoClient(uri).get_database()
>>> print(db.name)
'default_db'

MONGODB-X509

New in version 2.6.

The MONGODB-X509 mechanism authenticates a username derived from the distinguished subject name of the X.509 certificate presented by the driver during SSL negotiation. This authentication method requires the use of SSL connections with certificate validation and is available in MongoDB 2.6 and newer:

>>> import ssl
>>> from pymongo import MongoClient
>>> client = MongoClient('example.com',
...                      username="<X.509 derived username>",
...                      authMechanism="MONGODB-X509",
...                      ssl=True,
...                      ssl_certfile='/path/to/client.pem',
...                      ssl_cert_reqs=ssl.CERT_REQUIRED,
...                      ssl_ca_certs='/path/to/ca.pem')

MONGODB-X509 authenticates against the $external virtual database, so you do not have to specify a database in the URI:

>>> uri = "mongodb://<X.509 derived username>@example.com/?authMechanism=MONGODB-X509"
>>> client = MongoClient(uri,
...                      ssl=True,
...                      ssl_certfile='/path/to/client.pem',
...                      ssl_cert_reqs=ssl.CERT_REQUIRED,
...                      ssl_ca_certs='/path/to/ca.pem')
>>>

Changed in version 3.4: When connected to MongoDB >= 3.4 the username is no longer required.

GSSAPI (Kerberos)

New in version 2.5.

GSSAPI (Kerberos) authentication is available in the Enterprise Edition of MongoDB.

Unix

To authenticate using GSSAPI you must first install the kerberos or pykerberos Python module using easy_install or pip. Make sure you run kinit before using the following authentication methods:

$ kinit mongodbuser@EXAMPLE.COM
mongodbuser@EXAMPLE.COM's Password:
$ klist
Credentials cache: FILE:/tmp/krb5cc_1000
        Principal: mongodbuser@EXAMPLE.COM

  Issued                Expires               Principal
Feb  9 13:48:51 2013  Feb  9 23:48:51 2013  krbtgt/EXAMPLE.COM@EXAMPLE.COM

Now authenticate using the MongoDB URI. GSSAPI authenticates against the $external virtual database so you do not have to specify a database in the URI:

>>> # Note: the kerberos principal must be url encoded.
>>> from pymongo import MongoClient
>>> uri = "mongodb://mongodbuser%40EXAMPLE.COM@mongo-server.example.com/?authMechanism=GSSAPI"
>>> client = MongoClient(uri)
>>>

The default service name used by MongoDB and PyMongo is mongodb. You can specify a custom service name with the authMechanismProperties option:

>>> from pymongo import MongoClient
>>> uri = "mongodb://mongodbuser%40EXAMPLE.COM@mongo-server.example.com/?authMechanism=GSSAPI&authMechanismProperties=SERVICE_NAME:myservicename"
>>> client = MongoClient(uri)

Windows (SSPI)

New in version 3.3.

First install the winkerberos module. Unlike authentication on Unix, kinit is not used. If the user to authenticate is different from the user that owns the application process, provide a password to authenticate:

>>> uri = "mongodb://mongodbuser%40EXAMPLE.COM:mongodbuserpassword@example.com/?authMechanism=GSSAPI"

Two extra authMechanismProperties are supported on Windows platforms:

  • CANONICALIZE_HOST_NAME - Uses the fully qualified domain name (FQDN) of the MongoDB host for the server principal (GSSAPI libraries on Unix do this by default):

    >>> uri = "mongodb://mongodbuser%40EXAMPLE.COM@example.com/?authMechanism=GSSAPI&authMechanismProperties=CANONICALIZE_HOST_NAME:true"
    
  • SERVICE_REALM - This is used when the user’s realm is different from the service’s realm:

    >>> uri = "mongodb://mongodbuser%40EXAMPLE.COM@example.com/?authMechanism=GSSAPI&authMechanismProperties=SERVICE_REALM:otherrealm"
    
SASL PLAIN (RFC 4616)

New in version 2.6.

MongoDB Enterprise Edition version 2.6 and newer support the SASL PLAIN authentication mechanism, initially intended for delegating authentication to an LDAP server. Using the PLAIN mechanism is very similar to MONGODB-CR. These examples use the $external virtual database for LDAP support:

>>> from pymongo import MongoClient
>>> uri = "mongodb://user:password@example.com/?authMechanism=PLAIN"
>>> client = MongoClient(uri)
>>>

SASL PLAIN is a clear-text authentication mechanism. We strongly recommend that you connect to MongoDB using SSL with certificate validation when using the SASL PLAIN mechanism:

>>> import ssl
>>> from pymongo import MongoClient
>>> uri = "mongodb://user:password@example.com/?authMechanism=PLAIN"
>>> client = MongoClient(uri,
...                      ssl=True,
...                      ssl_certfile='/path/to/client.pem',
...                      ssl_cert_reqs=ssl.CERT_REQUIRED,
...                      ssl_ca_certs='/path/to/ca.pem')
>>>

Collations

See also

The API docs for collation.

Collations are a new feature in MongoDB version 3.4. They provide a set of rules to use when comparing strings that comply with the conventions of a particular language, such as Spanish or German. If no collation is specified, the server sorts strings based on a binary comparison. Many languages have specific ordering rules, and collations allow users to build applications that adhere to language-specific comparison rules.

In French, for example, the last accent in a given word determines the sorting order. The correct sorting order for the following four words in French is:

cote < côte < coté < côté

Specifying a French collation allows users to sort string fields using the French sort order.

Usage

Users can specify a collation for a collection, an index, or a CRUD command.

Collation Parameters:

Collations can be specified with the Collation model or with plain Python dictionaries. The structure is the same:

Collation(locale=<string>,
          caseLevel=<bool>,
          caseFirst=<string>,
          strength=<int>,
          numericOrdering=<bool>,
          alternate=<string>,
          maxVariable=<string>,
          backwards=<bool>)

The only required parameter is locale, which the server parses as an ICU format locale ID. For example, set locale to en_US to represent US English or fr_CA to represent Canadian French.

For a complete description of the available parameters, see the MongoDB manual.

Assign a Default Collation to a Collection

The following example demonstrates how to create a new collection called contacts and assign a default collation with the fr_CA locale. This operation ensures that all queries that are run against the contacts collection use the fr_CA collation unless another collation is explicitly specified:

from pymongo import MongoClient
from pymongo.collation import Collation

db = MongoClient().test
collection = db.create_collection('contacts',
                                  collation=Collation(locale='fr_CA'))

Assign a Default Collation to an Index

When creating a new index, you can specify a default collation.

The following example shows how to create an index on the name field of the contacts collection, with the unique parameter enabled and a default collation with locale set to fr_CA:

from pymongo import MongoClient
from pymongo.collation import Collation

contacts = MongoClient().test.contacts
contacts.create_index('name',
                      unique=True,
                      collation=Collation(locale='fr_CA'))

Specify a Collation for a Query

Individual queries can specify a collation to use when sorting results. The following example demonstrates a query that runs on the contacts collection in database test. It matches on documents that contain New York in the city field, and sorts on the name field with the fr_CA collation:

from pymongo import MongoClient
from pymongo.collation import Collation

collection = MongoClient().test.contacts
docs = collection.find({'city': 'New York'}).sort('name').collation(
    Collation(locale='fr_CA'))

Other Query Types

You can use collations to control document matching rules for several different types of queries. All the various update and delete methods (update_one(), update_many(), delete_one(), etc.) support collation, and you can create query filters which employ collations to comply with any of the languages and variants available to the locale parameter.

The following example uses a collation with strength set to SECONDARY, which considers only the base character and character accents in string comparisons, but not, for example, case. All documents in the contacts collection with jürgen (case-insensitive) in the first_name field are updated:

from pymongo import MongoClient
from pymongo.collation import Collation, CollationStrength

contacts = MongoClient().test.contacts
result = contacts.update_many(
    {'first_name': 'jürgen'},
    {'$set': {'verified': 1}},
    collation=Collation(locale='de',
                        strength=CollationStrength.SECONDARY))

Copying a Database

To copy a database within a single mongod process, or between mongod servers, simply connect to the target mongod and use the command() method:

>>> from pymongo import MongoClient
>>> client = MongoClient('target.example.com')
>>> client.admin.command('copydb',
                         fromdb='source_db_name',
                         todb='target_db_name')

To copy from a different mongod server that is not password-protected:

>>> client.admin.command('copydb',
                         fromdb='source_db_name',
                         todb='target_db_name',
                         fromhost='source.example.com')

If the target server is password-protected, authenticate to the “admin” database:

>>> client = MongoClient('target.example.com',
...                      username='administrator',
...                      password='pwd')
>>> client.admin.command('copydb',
                         fromdb='source_db_name',
                         todb='target_db_name',
                         fromhost='source.example.com')

See the authentication examples.

If the source server is password-protected, use the copyDatabase function in the mongo shell.

Versions of PyMongo before 3.0 included a copy_database helper method, but it has been removed.

Custom Type Example

This is an example of using a custom type with PyMongo. The example here shows how to subclass TypeCodec to write a type codec, which is used to populate a TypeRegistry. The type registry can then be used to create a custom-type-aware Collection. Read and write operations issued against the resulting collection object transparently manipulate documents as they are saved to or retrieved from MongoDB.

Setting Up

We’ll start by getting a clean database to use for the example:

>>> from pymongo import MongoClient
>>> client = MongoClient()
>>> client.drop_database('custom_type_example')
>>> db = client.custom_type_example

Since the purpose of the example is to demonstrate working with custom types, we’ll need a custom data type to use. For this example, we will be working with the Decimal type from Python’s standard library. Since the BSON library’s Decimal128 type (which implements the IEEE 754 decimal128 decimal-based floating-point format) is distinct from Python’s built-in Decimal type, attempting to save an instance of Decimal with PyMongo results in an InvalidDocument exception.

>>> from decimal import Decimal
>>> num = Decimal("45.321")
>>> db.test.insert_one({'num': num})
Traceback (most recent call last):
...
bson.errors.InvalidDocument: cannot encode object: Decimal('45.321'), of type: <class 'decimal.Decimal'>

The TypeCodec Class

New in version 3.8.

In order to encode a custom type, we must first define a type codec for that type. A type codec describes how an instance of a custom type can be transformed to and/or from one of the types bson already understands. Depending on the desired functionality, users must choose from the following base classes when defining type codecs:

  • TypeEncoder: subclass this to define a codec that encodes a custom Python type to a known BSON type. Users must implement the python_type property/attribute and the transform_python method.
  • TypeDecoder: subclass this to define a codec that decodes a specified BSON type into a custom Python type. Users must implement the bson_type property/attribute and the transform_bson method.
  • TypeCodec: subclass this to define a codec that can both encode and decode a custom type. Users must implement the python_type and bson_type properties/attributes, as well as the transform_python and transform_bson methods.

The type codec for our custom type simply needs to define how a Decimal instance can be converted into a Decimal128 instance and vice-versa. Since we are interested in both encoding and decoding our custom type, we use the TypeCodec base class to define our codec:

>>> from bson.decimal128 import Decimal128
>>> from bson.codec_options import TypeCodec
>>> class DecimalCodec(TypeCodec):
...     python_type = Decimal    # the Python type acted upon by this type codec
...     bson_type = Decimal128   # the BSON type acted upon by this type codec
...     def transform_python(self, value):
...         """Function that transforms a custom type value into a type
...         that BSON can encode."""
...         return Decimal128(value)
...     def transform_bson(self, value):
...         """Function that transforms a vanilla BSON type value into our
...         custom type."""
...         return value.to_decimal()
>>> decimal_codec = DecimalCodec()

The TypeRegistry Class

New in version 3.8.

Before we can begin encoding and decoding our custom type objects, we must first inform PyMongo about the corresponding codec. This is done by creating a TypeRegistry instance:

>>> from bson.codec_options import TypeRegistry
>>> type_registry = TypeRegistry([decimal_codec])

Note that type registries can be instantiated with any number of type codecs. Once instantiated, registries are immutable and the only way to add codecs to a registry is to create a new one.

Putting It Together

Finally, we can define a CodecOptions instance with our type_registry and use it to get a Collection object that understands the Decimal data type:

>>> from bson.codec_options import CodecOptions
>>> codec_options = CodecOptions(type_registry=type_registry)
>>> collection = db.get_collection('test', codec_options=codec_options)

Now, we can seamlessly encode and decode instances of Decimal:

>>> collection.insert_one({'num': Decimal("45.321")})
<pymongo.results.InsertOneResult object at ...>
>>> mydoc = collection.find_one()
>>> import pprint
>>> pprint.pprint(mydoc)
{u'_id': ObjectId('...'), u'num': Decimal('45.321')}

We can see what’s actually being saved to the database by creating a fresh collection object without the customized codec options and using that to query MongoDB:

>>> vanilla_collection = db.get_collection('test')
>>> pprint.pprint(vanilla_collection.find_one())
{u'_id': ObjectId('...'), u'num': Decimal128('45.321')}

Encoding Subtypes

Consider the situation where, in addition to encoding Decimal, we also need to encode a type that subclasses Decimal. PyMongo does this automatically for types that inherit from Python types that are BSON-encodable by default, but the type codec system described above does not offer the same flexibility.

Consider this subtype of Decimal that has a method to return its value as an integer:

>>> class DecimalInt(Decimal):
...     def my_method(self):
...         """Method implementing some custom logic."""
...         return int(self)

If we try to save an instance of this type without first registering a type codec for it, we get an error:

>>> collection.insert_one({'num': DecimalInt("45.321")})
Traceback (most recent call last):
...
bson.errors.InvalidDocument: cannot encode object: Decimal('45.321'), of type: <class 'decimal.Decimal'>

In order to proceed further, we must define a type codec for DecimalInt. This is trivial to do since the same transformation as the one used for Decimal is adequate for encoding DecimalInt as well:

>>> class DecimalIntCodec(DecimalCodec):
...     @property
...     def python_type(self):
...         """The Python type acted upon by this type codec."""
...         return DecimalInt
>>> decimalint_codec = DecimalIntCodec()

Note

No attempt is made to modify decoding behavior because without additional information, it is impossible to discern which incoming Decimal128 value needs to be decoded as Decimal and which needs to be decoded as DecimalInt. This example only considers the situation where a user wants to encode documents containing either of these types.

After creating a new codec options object and using it to get a collection object, we can seamlessly encode instances of DecimalInt:

>>> type_registry = TypeRegistry([decimal_codec, decimalint_codec])
>>> codec_options = CodecOptions(type_registry=type_registry)
>>> collection = db.get_collection('test', codec_options=codec_options)
>>> collection.drop()
>>> collection.insert_one({'num': DecimalInt("45.321")})
<pymongo.results.InsertOneResult object at ...>
>>> mydoc = collection.find_one()
>>> pprint.pprint(mydoc)
{u'_id': ObjectId('...'), u'num': Decimal('45.321')}

Note that the transform_bson method of the base codec class results in these values being decoded as Decimal (and not DecimalInt).

Decoding Binary Types

The decoding treatment of Binary types having subtype = 0 by the bson module varies slightly depending on the version of the Python runtime in use. This must be taken into account while writing a TypeDecoder that modifies how this datatype is decoded.

On Python 3.x, Binary data (subtype = 0) is decoded as a bytes instance:

>>> # On Python 3.x.
>>> from bson.binary import Binary
>>> newcoll = db.get_collection('new')
>>> newcoll.insert_one({'_id': 1, 'data': Binary(b"123", subtype=0)})
>>> doc = newcoll.find_one()
>>> type(doc['data'])
bytes

On Python 2.7.x, the same data is decoded as a Binary instance:

>>> # On Python 2.7.x
>>> newcoll = db.get_collection('new')
>>> doc = newcoll.find_one()
>>> type(doc['data'])
bson.binary.Binary

As a consequence of this disparity, users must set the bson_type attribute on their TypeDecoder classes differently, depending on the Python version in use.

Note

For codebases requiring compatibility with both Python 2 and 3, type decoders will have to be registered for both possible bson_type values.

The fallback_encoder Callable

New in version 3.8.

In addition to type codecs, users can also register a callable to encode types that BSON doesn’t recognize and for which no type codec has been registered. This callable is the fallback encoder and, like the transform_python method, it accepts an unencodable value as a parameter and returns a BSON-encodable value. The following fallback encoder encodes Python’s Decimal type to a Decimal128:

>>> def fallback_encoder(value):
...     if isinstance(value, Decimal):
...         return Decimal128(value)
...     return value

After declaring the callback, we must create a type registry and codec options with this fallback encoder before it can be used for initializing a collection:

>>> type_registry = TypeRegistry(fallback_encoder=fallback_encoder)
>>> codec_options = CodecOptions(type_registry=type_registry)
>>> collection = db.get_collection('test', codec_options=codec_options)
>>> collection.drop()

We can now seamlessly encode instances of Decimal:

>>> collection.insert_one({'num': Decimal("45.321")})
<pymongo.results.InsertOneResult object at ...>
>>> mydoc = collection.find_one()
>>> pprint.pprint(mydoc)
{u'_id': ObjectId('...'), u'num': Decimal128('45.321')}

Note

Fallback encoders are invoked after attempts to encode the given value with standard BSON encoders and any configured type encoders have failed. Therefore, in a type registry configured with a type encoder and fallback encoder that both target the same custom type, the behavior specified in the type encoder will prevail.

Because fallback encoders don’t need to declare the types that they encode beforehand, they can be used to support interesting use-cases that cannot be serviced by TypeEncoder. One such use-case is described in the next section.

Encoding Unknown Types

In this example, we demonstrate how a fallback encoder can be used to save arbitrary objects to the database. We will use the standard library’s pickle module to serialize the unknown types, so naturally this approach only works for types that are picklable.

We start by defining some arbitrary custom types:

class MyStringType(object):
    def __init__(self, value):
        self.__value = value
    def __repr__(self):
        return "MyStringType('%s')" % (self.__value,)

class MyNumberType(object):
    def __init__(self, value):
        self.__value = value
    def __repr__(self):
        return "MyNumberType(%s)" % (self.__value,)

We also define a fallback encoder that pickles whatever objects it receives and returns them as Binary instances with a custom subtype. The custom subtype, in turn, allows us to write a TypeDecoder that identifies pickled artifacts upon retrieval and transparently decodes them back into Python objects:

import pickle
from bson.binary import Binary, USER_DEFINED_SUBTYPE
def fallback_pickle_encoder(value):
    return Binary(pickle.dumps(value), USER_DEFINED_SUBTYPE)

class PickledBinaryDecoder(TypeDecoder):
    bson_type = Binary
    def transform_bson(self, value):
        if value.subtype == USER_DEFINED_SUBTYPE:
            return pickle.loads(value)
        return value

Note

The above example is written assuming the use of Python 3. If you are using Python 2, bson_type must be set to Binary. See the Decoding Binary Types section for a detailed explanation.

Finally, we create a CodecOptions instance:

codec_options = CodecOptions(type_registry=TypeRegistry(
    [PickledBinaryDecoder()], fallback_encoder=fallback_pickle_encoder))

We can now round trip our custom objects to MongoDB:

collection = db.get_collection('test_fe', codec_options=codec_options)
collection.insert_one({'_id': 1, 'str': MyStringType("hello world"),
                       'num': MyNumberType(2)})
mydoc = collection.find_one()
assert isinstance(mydoc['str'], MyStringType)
assert isinstance(mydoc['num'], MyNumberType)

Limitations

PyMongo’s type codec and fallback encoder features have the following limitations:

  1. Users cannot customize the encoding behavior of Python types that PyMongo already understands like int and str (the ‘built-in types’). Attempting to instantiate a type registry with one or more codecs that act upon a built-in type results in a TypeError (see the sketch after this list). This limitation extends to all subtypes of the standard types.
  2. Chaining type encoders is not supported. A custom type value, once transformed by a codec’s transform_python method, must result in a type that is either BSON-encodable by default, or can be transformed by the fallback encoder into something BSON-encodable–it cannot be transformed a second time by a different type codec.
  3. The command() method does not apply the user’s TypeDecoders while decoding the command response document.
  4. gridfs does not apply custom type encoding or decoding to any documents received from or returned to the user.
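
The following sketch illustrates the first limitation, continuing the doctest session above; instantiating a type registry with a codec that acts upon a built-in type like int fails (exact error message elided):

>>> class IntCodec(TypeCodec):
...     python_type = int    # a built-in type: not allowed
...     bson_type = int
...     def transform_python(self, value):
...         return value
...     def transform_bson(self, value):
...         return value
>>> TypeRegistry([IntCodec()])
Traceback (most recent call last):
...
TypeError: ...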

Bulk Write Operations

This tutorial explains how to take advantage of PyMongo’s bulk write operation features. Executing write operations in batches reduces the number of network round trips, increasing write throughput.

Bulk Insert

New in version 2.6.

A batch of documents can be inserted by passing a list to the insert_many() method. PyMongo will automatically split the batch into smaller sub-batches based on the maximum message size accepted by MongoDB, supporting very large bulk insert operations.

>>> import pymongo
>>> db = pymongo.MongoClient().bulk_example
>>> db.test.insert_many([{'i': i} for i in range(10000)]).inserted_ids
[...]
>>> db.test.count_documents({})
10000

Mixed Bulk Write Operations

New in version 2.7.

PyMongo also supports executing mixed bulk write operations. A batch of insert, update, and remove operations can be executed together using the bulk write operations API.

Ordered Bulk Write Operations

Ordered bulk write operations are batched and sent to the server in the order provided for serial execution. The return value is an instance of BulkWriteResult describing the type and count of operations performed.

>>> from pprint import pprint
>>> from pymongo import InsertOne, DeleteMany, ReplaceOne, UpdateOne
>>> result = db.test.bulk_write([
...     DeleteMany({}),  # Remove all documents from the previous example.
...     InsertOne({'_id': 1}),
...     InsertOne({'_id': 2}),
...     InsertOne({'_id': 3}),
...     UpdateOne({'_id': 1}, {'$set': {'foo': 'bar'}}),
...     UpdateOne({'_id': 4}, {'$inc': {'j': 1}}, upsert=True),
...     ReplaceOne({'j': 1}, {'j': 2})])
>>> pprint(result.bulk_api_result)
{'nInserted': 3,
 'nMatched': 2,
 'nModified': 2,
 'nRemoved': 10000,
 'nUpserted': 1,
 'upserted': [{u'_id': 4, u'index': 5}],
 'writeConcernErrors': [],
 'writeErrors': []}

Warning

nModified is only reported by MongoDB 2.6 and later. When connected to an earlier server version, or in certain mixed version sharding configurations, PyMongo omits this field from the results of a bulk write operation.

The first write failure that occurs (e.g. a duplicate key error) aborts the remaining operations, and PyMongo raises BulkWriteError. The details attribute of the exception instance provides the execution results up until the failure occurred, along with details about the failure, including the operation that caused it.

>>> from pymongo import InsertOne, DeleteOne, ReplaceOne
>>> from pymongo.errors import BulkWriteError
>>> requests = [
...     ReplaceOne({'j': 2}, {'i': 5}),
...     InsertOne({'_id': 4}),  # Violates the unique key constraint on _id.
...     DeleteOne({'i': 5})]
>>> try:
...     db.test.bulk_write(requests)
... except BulkWriteError as bwe:
...     pprint(bwe.details)
...
{'nInserted': 0,
 'nMatched': 1,
 'nModified': 1,
 'nRemoved': 0,
 'nUpserted': 0,
 'upserted': [],
 'writeConcernErrors': [],
 'writeErrors': [{u'code': 11000,
                  u'errmsg': u'...E11000...duplicate key error...',
                  u'index': 1,...
                  u'op': {'_id': 4}}]}
Unordered Bulk Write Operations

Unordered bulk write operations are batched and sent to the server in arbitrary order where they may be executed in parallel. Any errors that occur are reported after all operations are attempted.

In the next example the first and third operations fail due to the unique constraint on _id. Since we are doing unordered execution the second and fourth operations succeed.

>>> requests = [
...     InsertOne({'_id': 1}),
...     DeleteOne({'_id': 2}),
...     InsertOne({'_id': 3}),
...     ReplaceOne({'_id': 4}, {'i': 1})]
>>> try:
...     db.test.bulk_write(requests, ordered=False)
... except BulkWriteError as bwe:
...     pprint(bwe.details)
...
{'nInserted': 0,
 'nMatched': 1,
 'nModified': 1,
 'nRemoved': 1,
 'nUpserted': 0,
 'upserted': [],
 'writeConcernErrors': [],
 'writeErrors': [{u'code': 11000,
                  u'errmsg': u'...E11000...duplicate key error...',
                  u'index': 0,...
                  u'op': {'_id': 1}},
                 {u'code': 11000,
                  u'errmsg': u'...E11000...duplicate key error...',
                  u'index': 2,...
                  u'op': {'_id': 3}}]}
Write Concern

Bulk operations are executed with the write_concern of the collection they are executed against. Write concern errors (e.g. wtimeout) will be reported after all operations are attempted, regardless of execution order.

>>> from pymongo import WriteConcern
>>> coll = db.get_collection(
...     'test', write_concern=WriteConcern(w=3, wtimeout=1))
>>> try:
...     coll.bulk_write([InsertOne({'a': i}) for i in range(4)])
... except BulkWriteError as bwe:
...     pprint(bwe.details)
...
{'nInserted': 4,
 'nMatched': 0,
 'nModified': 0,
 'nRemoved': 0,
 'nUpserted': 0,
 'upserted': [],
 'writeConcernErrors': [{u'code': 64...
                         u'errInfo': {u'wtimeout': True},
                         u'errmsg': u'waiting for replication timed out'}],
 'writeErrors': []}

Datetimes and Timezones

These examples show how to handle Python datetime.datetime objects correctly in PyMongo.

Basic Usage

PyMongo uses datetime.datetime objects for representing dates and times in MongoDB documents. Because MongoDB assumes that dates and times are in UTC, care should be taken to ensure that dates and times written to the database reflect UTC. For example, the following code stores the current UTC date and time into MongoDB:

>>> result = db.objects.insert_one(
...     {"last_modified": datetime.datetime.utcnow()})

Always use datetime.datetime.utcnow(), which returns the current time in UTC, instead of datetime.datetime.now(), which returns the current local time. Avoid doing this:

>>> result = db.objects.insert_one(
...     {"last_modified": datetime.datetime.now()})

The value for last_modified is very different between these two examples, even though both documents were stored at around the same local time. This will be confusing to the application that reads them:

>>> [doc['last_modified'] for doc in db.objects.find()]  
[datetime.datetime(2015, 7, 8, 18, 17, 28, 324000),
 datetime.datetime(2015, 7, 8, 11, 17, 42, 911000)]

bson.codec_options.CodecOptions has a tz_aware option that enables “aware” datetime.datetime objects, i.e., datetimes that know what timezone they’re in. By default, PyMongo retrieves naive datetimes:

>>> result = db.tzdemo.insert_one(
...     {'date': datetime.datetime(2002, 10, 27, 6, 0, 0)})
>>> db.tzdemo.find_one()['date']
datetime.datetime(2002, 10, 27, 6, 0)
>>> options = CodecOptions(tz_aware=True)
>>> db.get_collection('tzdemo', codec_options=options).find_one()['date']  
datetime.datetime(2002, 10, 27, 6, 0,
                  tzinfo=<bson.tz_util.FixedOffset object at 0x10583a050>)
Saving Datetimes with Timezones

When storing datetime.datetime objects that specify a timezone (i.e. they have a tzinfo property that isn’t None), PyMongo will convert those datetimes to UTC automatically:

>>> import pytz
>>> pacific = pytz.timezone('US/Pacific')
>>> aware_datetime = pacific.localize(
...     datetime.datetime(2002, 10, 27, 6, 0, 0))
>>> result = db.times.insert_one({"date": aware_datetime})
>>> db.times.find_one()['date']
datetime.datetime(2002, 10, 27, 14, 0)
Reading Time

As previously mentioned, by default all datetime.datetime objects returned by PyMongo will be naive but reflect UTC (i.e. the time as stored in MongoDB). By setting the tz_aware option on CodecOptions, datetime.datetime objects will be timezone-aware and have a tzinfo property that reflects the UTC timezone.

PyMongo 3.1 introduced a tzinfo property that can be set on CodecOptions to convert datetime.datetime objects to local time automatically. For example, if we wanted to read all times out of MongoDB in US/Pacific time:

>>> from bson.codec_options import CodecOptions
>>> db.times.find_one()['date']
datetime.datetime(2002, 10, 27, 14, 0)
>>> aware_times = db.times.with_options(codec_options=CodecOptions(
...     tz_aware=True,
...     tzinfo=pytz.timezone('US/Pacific')))
>>> aware_times.find_one()['date']
datetime.datetime(2002, 10, 27, 6, 0,
                  tzinfo=<DstTzInfo 'US/Pacific' PST-1 day, 16:00:00 STD>)

Geospatial Indexing Example

This example shows how to create and use a GEO2D index in PyMongo. To create a spherical (earth-like) geospatial index use GEOSPHERE instead.

See also

The MongoDB documentation on geospatial indexes.

Creating a Geospatial Index

Creating a geospatial index in pymongo is easy:

>>> from pymongo import MongoClient, GEO2D
>>> db = MongoClient().geo_example
>>> db.places.create_index([("loc", GEO2D)])
u'loc_2d'
Inserting Places

Locations in MongoDB are represented using either embedded documents or lists where the first two elements are coordinates. Here, we’ll insert a couple of example locations:

>>> result = db.places.insert_many([{"loc": [2, 5]},
...                                 {"loc": [30, 5]},
...                                 {"loc": [1, 2]},
...                                 {"loc": [4, 4]}])  
>>> result.inserted_ids
[ObjectId('...'), ObjectId('...'), ObjectId('...'), ObjectId('...')]

Note

If specifying latitude and longitude coordinates in GEOSPHERE, list the longitude first and then latitude.
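
For illustration, here is a minimal sketch, assuming a hypothetical places_sphere collection; a GeoJSON Point likewise lists longitude before latitude:

>>> from pymongo import GEOSPHERE
>>> db.places_sphere.create_index([("loc", GEOSPHERE)])
u'loc_2dsphere'
>>> result = db.places_sphere.insert_one(
...     {"loc": {"type": "Point", "coordinates": [-73.97, 40.77]}})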

Querying

Using the geospatial index we can find documents near another point:

>>> import pprint
>>> for doc in db.places.find({"loc": {"$near": [3, 6]}}).limit(3):
...   pprint.pprint(doc)
...
{u'_id': ObjectId('...'), u'loc': [2, 5]}
{u'_id': ObjectId('...'), u'loc': [4, 4]}
{u'_id': ObjectId('...'), u'loc': [1, 2]}

Note

When using pymongo.GEOSPHERE, $nearSphere is recommended.

The $maxDistance operator requires the use of SON:

>>> from bson.son import SON
>>> query = {"loc": SON([("$near", [3, 6]), ("$maxDistance", 100)])}
>>> for doc in db.places.find(query).limit(3):
...   pprint.pprint(doc)
...
{u'_id': ObjectId('...'), u'loc': [2, 5]}
{u'_id': ObjectId('...'), u'loc': [4, 4]}
{u'_id': ObjectId('...'), u'loc': [1, 2]}

It’s also possible to query for all items within a given rectangle (specified by lower-left and upper-right coordinates):

>>> query = {"loc": {"$within": {"$box": [[2, 2], [5, 6]]}}}
>>> for doc in db.places.find(query).sort('_id'):
...     pprint.pprint(doc)
{u'_id': ObjectId('...'), u'loc': [2, 5]}
{u'_id': ObjectId('...'), u'loc': [4, 4]}

Or circle (specified by center point and radius):

>>> query = {"loc": {"$within": {"$center": [[0, 0], 6]}}}
>>> for doc in db.places.find(query).sort('_id'):
...   pprint.pprint(doc)
...
{u'_id': ObjectId('...'), u'loc': [2, 5]}
{u'_id': ObjectId('...'), u'loc': [1, 2]}
{u'_id': ObjectId('...'), u'loc': [4, 4]}

geoNear queries are also supported using SON:

>>> from bson.son import SON
>>> db.command(SON([('geoNear', 'places'), ('near', [1, 2])]))
{u'ok': 1.0, u'stats': ...}

Warning

Starting in MongoDB version 4.0, MongoDB deprecates the geoNear command. Use one of the following operations instead.

  • $geoNear - aggregation stage.
  • $near - query operator.
  • $nearSphere - query operator.

Gevent

PyMongo supports Gevent. Simply call Gevent’s monkey.patch_all() before loading any other modules:

>>> # You must call patch_all() *before* importing any other modules
>>> from gevent import monkey
>>> _ = monkey.patch_all()
>>> from pymongo import MongoClient
>>> client = MongoClient()

PyMongo uses thread and socket functions from the Python standard library. Gevent’s monkey-patching replaces those standard functions so that PyMongo does asynchronous I/O with non-blocking sockets, and schedules operations on greenlets instead of threads.

Avoid blocking in Hub.join

By default, PyMongo uses threads to discover and monitor your servers’ topology (see Health Monitoring). If you execute monkey.patch_all() when your application first begins, PyMongo automatically uses greenlets instead of threads.

When shutting down, if your application calls join() on Gevent’s Hub without first terminating these background greenlets, the call to join() blocks indefinitely. You therefore must close or dereference any active MongoClient before exiting.

An example solution to this issue in some application frameworks is a signal handler to end background greenlets when your application receives SIGHUP:

import signal

def graceful_reload(signum, traceback):
    """Explicitly close some global MongoClient object."""
    client.close()

signal.signal(signal.SIGHUP, graceful_reload)

Applications using uWSGI prior to 1.9.16 are affected by this issue, as are newer uWSGI versions run with the --gevent-wait-for-hub option. See the uWSGI changelog for details.

GridFS Example

This example shows how to use gridfs to store large binary objects (e.g. files) in MongoDB.

See also

The API docs for gridfs.

See also

This blog post for some motivation behind this API.

Setup

We start by creating a GridFS instance to use:

>>> from pymongo import MongoClient
>>> import gridfs
>>>
>>> db = MongoClient().gridfs_example
>>> fs = gridfs.GridFS(db)

Every GridFS instance is created with and will operate on a specific Database instance.

Saving and Retrieving Data

The simplest way to work with gridfs is to use its key/value interface (the put() and get() methods). To write data to GridFS, use put():

>>> a = fs.put(b"hello world")

put() creates a new file in GridFS, and returns the value of the file document’s "_id" key. Given that "_id" we can use get() to get back the contents of the file:

>>> fs.get(a).read()
'hello world'

get() returns a file-like object, so we get the file’s contents by calling read().

In addition to putting a str as a GridFS file, we can also put any file-like object (an object with a read() method). GridFS will handle reading the file in chunk-sized segments automatically. We can also add additional attributes to the file as keyword arguments:

>>> b = fs.put(fs.get(a), filename="foo", bar="baz")
>>> out = fs.get(b)
>>> out.read()
'hello world'
>>> out.filename
u'foo'
>>> out.bar
u'baz'
>>> out.upload_date
datetime.datetime(...)

The attributes we set in put() are stored in the file document, and retrievable after calling get(). Some attributes (like "filename") are special and are defined in the GridFS specification - see that document for more details.

High Availability and PyMongo

PyMongo makes it easy to write highly available applications whether you use a single replica set or a large sharded cluster.

Connecting to a Replica Set

PyMongo makes working with replica sets easy. Here we’ll launch a new replica set and show how to handle both initialization and normal connections with PyMongo.

See also

The MongoDB documentation on replica sets.

Starting a Replica Set

The main replica set documentation contains extensive information about setting up a new replica set or migrating an existing MongoDB setup, so be sure to check that out. Here, we’ll just do the bare minimum to get a three node replica set running locally.

Warning

Replica sets should always use multiple nodes in production - putting all set members on the same physical node is only recommended for testing and development.

We start three mongod processes, each on a different port and with a different dbpath, but all using the same replica set name “foo”.

$ mkdir -p /data/db0 /data/db1 /data/db2
$ mongod --port 27017 --dbpath /data/db0 --replSet foo
$ mongod --port 27018 --dbpath /data/db1 --replSet foo
$ mongod --port 27019 --dbpath /data/db2 --replSet foo
Initializing the Set

At this point all of our nodes are up and running, but the set has yet to be initialized. Until the set is initialized no node will become the primary, and things are essentially “offline”.

To initialize the set we need to connect to a single node and run the initiate command:

>>> from pymongo import MongoClient
>>> c = MongoClient('localhost', 27017)

Note

We could have connected to any of the other nodes instead, but only the node we initiate from is allowed to contain any initial data.

After connecting, we run the initiate command to get things started:

>>> config = {'_id': 'foo', 'members': [
...     {'_id': 0, 'host': 'localhost:27017'},
...     {'_id': 1, 'host': 'localhost:27018'},
...     {'_id': 2, 'host': 'localhost:27019'}]}
>>> c.admin.command("replSetInitiate", config)
{'ok': 1.0, ...}

The three mongod servers we started earlier will now coordinate and come online as a replica set.

Connecting to a Replica Set

The initial connection as made above is a special case for an uninitialized replica set. Normally we’ll want to connect differently. A connection to a replica set can be made using the MongoClient() constructor, specifying one or more members of the set, along with the replica set name. Any of the following connects to the replica set we just created:

>>> MongoClient('localhost', replicaset='foo')
MongoClient(host=['localhost:27017'], replicaset='foo', ...)
>>> MongoClient('localhost:27018', replicaset='foo')
MongoClient(['localhost:27018'], replicaset='foo', ...)
>>> MongoClient('localhost', 27019, replicaset='foo')
MongoClient(['localhost:27019'], replicaset='foo', ...)
>>> MongoClient('mongodb://localhost:27017,localhost:27018/?replicaSet=foo')
MongoClient(['localhost:27017', 'localhost:27018'], replicaset='foo', ...)

The addresses passed to MongoClient() are called the seeds. As long as at least one of the seeds is online, MongoClient discovers all the members in the replica set, and determines which is the current primary and which are secondaries or arbiters. Each seed must be the address of a single mongod. Multihomed and round robin DNS addresses are not supported.

The MongoClient constructor is non-blocking: the constructor returns immediately while the client connects to the replica set using background threads. Note how, if you create a client and immediately print the string representation of its nodes attribute, the list may be empty initially. If you wait a moment, MongoClient discovers the whole replica set:

>>> from time import sleep
>>> c = MongoClient(replicaset='foo'); print(c.nodes); sleep(0.1); print(c.nodes)
frozenset([])
frozenset([(u'localhost', 27019), (u'localhost', 27017), (u'localhost', 27018)])

You need not wait for replica set discovery in your application, however. If you need to do any operation with a MongoClient, such as a find() or an insert_one(), the client waits to discover a suitable member before it attempts the operation.

Handling Failover

When a failover occurs, PyMongo will automatically attempt to find the new primary node and perform subsequent operations on that node. This can’t happen completely transparently, however. Here we’ll perform an example failover to illustrate how everything behaves. First, we’ll connect to the replica set and perform a couple of basic operations:

>>> db = MongoClient("localhost", replicaSet='foo').test
>>> db.test.insert_one({"x": 1}).inserted_id
ObjectId('...')
>>> db.test.find_one()
{u'x': 1, u'_id': ObjectId('...')}

By checking the host and port, we can see that we’re connected to localhost:27017, which is the current primary:

>>> db.client.address
('localhost', 27017)

Now let’s bring down that node and see what happens when we run our query again:

>>> db.test.find_one()
Traceback (most recent call last):
pymongo.errors.AutoReconnect: ...

We get an AutoReconnect exception. This means that the driver was not able to connect to the old primary (which makes sense, as we killed the server), but that it will attempt to automatically reconnect on subsequent operations. When this exception is raised our application code needs to decide whether to retry the operation or to simply continue, accepting the fact that the operation might have failed.

On subsequent attempts to run the query we might continue to see this exception. Eventually, however, the replica set will failover and elect a new primary (this should take no more than a couple of seconds in general). At that point the driver will connect to the new primary and the operation will succeed:

>>> db.test.find_one()
{u'x': 1, u'_id': ObjectId('...')}
>>> db.client.address
('localhost', 27018)

Bring the former primary back up. It will rejoin the set as a secondary. Now we can move to the next section: distributing reads to secondaries.

Secondary Reads

By default an instance of MongoClient sends queries to the primary member of the replica set. To use secondaries for queries we have to change the read preference:

>>> client = MongoClient(
...     'localhost:27017',
...     replicaSet='foo',
...     readPreference='secondaryPreferred')
>>> client.read_preference
SecondaryPreferred(tag_sets=None)

Now all queries will be sent to the secondary members of the set. If there are no secondary members the primary will be used as a fallback. If you have queries you would prefer to never send to the primary you can specify that using the secondary read preference.

By default the read preference of a Database is inherited from its MongoClient, and the read preference of a Collection is inherited from its Database. To use a different read preference use the get_database() method, or the get_collection() method:

>>> from pymongo import ReadPreference
>>> client.read_preference
SecondaryPreferred(tag_sets=None)
>>> db = client.get_database('test', read_preference=ReadPreference.SECONDARY)
>>> db.read_preference
Secondary(tag_sets=None)
>>> coll = db.get_collection('test', read_preference=ReadPreference.PRIMARY)
>>> coll.read_preference
Primary()

You can also change the read preference of an existing Collection with the with_options() method:

>>> coll2 = coll.with_options(read_preference=ReadPreference.NEAREST)
>>> coll.read_preference
Primary()
>>> coll2.read_preference
Nearest(tag_sets=None)

Note that since most database commands can only be sent to the primary of a replica set, the command() method does not obey the Database’s read_preference, but you can pass an explicit read preference to the method:

>>> db.command('dbstats', read_preference=ReadPreference.NEAREST)
{...}

Reads are configured using three options: read preference, tag sets, and local threshold.

Read preference:

Read preference is configured using one of the classes from read_preferences (Primary, PrimaryPreferred, Secondary, SecondaryPreferred, or Nearest). For convenience, we also provide ReadPreference with the following attributes:

  • PRIMARY: Read from the primary. This is the default read preference, and provides the strongest consistency. If no primary is available, raise AutoReconnect.
  • PRIMARY_PREFERRED: Read from the primary if available, otherwise read from a secondary.
  • SECONDARY: Read from a secondary. If no matching secondary is available, raise AutoReconnect.
  • SECONDARY_PREFERRED: Read from a secondary if available, otherwise from the primary.
  • NEAREST: Read from any available member.

Tag sets:

Replica-set members can be tagged according to any criteria you choose. By default, PyMongo ignores tags when choosing a member to read from, but your read preference can be configured with a tag_sets parameter. tag_sets must be a list of dictionaries, each dict providing tag values that the replica set member must match. PyMongo tries each set of tags in turn until it finds a set of tags with at least one matching member. For example, to prefer reads from the New York data center, but fall back to the San Francisco data center, tag your replica set members according to their location and create a MongoClient like so:

>>> from pymongo.read_preferences import Secondary
>>> db = client.get_database(
...     'test', read_preference=Secondary([{'dc': 'ny'}, {'dc': 'sf'}]))
>>> db.read_preference
Secondary(tag_sets=[{'dc': 'ny'}, {'dc': 'sf'}])

MongoClient tries to find secondaries in New York, then San Francisco, and raises AutoReconnect if none are available. As an additional fallback, specify a final, empty tag set, {}, which means “read from any member that matches the mode, ignoring tags.”
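
For example, a minimal sketch of that fallback, reusing the Secondary class imported above:

>>> db = client.get_database(
...     'test', read_preference=Secondary([{'dc': 'ny'}, {'dc': 'sf'}, {}]))
>>> db.read_preference
Secondary(tag_sets=[{'dc': 'ny'}, {'dc': 'sf'}, {}])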

See read_preferences for more information.

Local threshold:

If multiple members match the read preference and tag sets, PyMongo reads from among the nearest members, chosen according to ping time. By default, only members whose ping times are within 15 milliseconds of the nearest are used for queries. You can choose to distribute reads among members with higher latencies by setting localThresholdMS to a larger number:

>>> client = pymongo.MongoClient(
...     replicaSet='repl0',
...     readPreference='secondaryPreferred',
...     localThresholdMS=35)

In this case, PyMongo distributes reads among matching members within 35 milliseconds of the closest member’s ping time.

Note

localThresholdMS is ignored when talking to a replica set through a mongos. The equivalent is the localThreshold command line option.

Health Monitoring

When MongoClient is initialized it launches background threads to monitor the replica set for changes in:

  • Health: detect when a member goes down or comes up, or if a different member becomes primary
  • Configuration: detect when members are added or removed, and detect changes in members’ tags
  • Latency: track a moving average of each member’s ping time

Replica-set monitoring ensures queries are continually routed to the proper members as the state of the replica set changes.

mongos Load Balancing

An instance of MongoClient can be configured with a list of addresses of mongos servers:

>>> client = MongoClient('mongodb://host1,host2,host3')

Each member of the list must be a single mongos server. Multihomed and round robin DNS addresses are not supported. The client continuously monitors all the mongoses’ availability, and its network latency to each.

PyMongo distributes operations evenly among the set of mongoses within its localThresholdMS (similar to how it distributes reads to secondaries in a replica set). By default the threshold is 15 ms.

The lowest-latency server, and all servers with latencies no more than localThresholdMS beyond the lowest-latency server’s, receive operations equally. For example, if we have three mongoses:

  • host1: 20 ms
  • host2: 35 ms
  • host3: 40 ms

By default the localThresholdMS is 15 ms, so PyMongo uses host1 and host2 evenly. It uses host1 because its network latency to the driver is shortest. It uses host2 because its latency is within 15 ms of the lowest-latency server’s. But it excludes host3: host3 is 20 ms beyond the lowest-latency server.

If we set localThresholdMS to 30 ms all servers are within the threshold:

>>> client = MongoClient('mongodb://host1,host2,host3/?localThresholdMS=30')

Warning

Do not connect PyMongo to a pool of mongos instances through a load balancer. A single socket connection must always be routed to the same mongos instance for proper cursor support.

PyMongo and mod_wsgi

To run your application under mod_wsgi, follow these guidelines:

  • Run mod_wsgi in daemon mode with the WSGIDaemonProcess directive.
  • Assign each application to a separate daemon with WSGIProcessGroup.
  • Use WSGIApplicationGroup %{GLOBAL} to ensure your application is running in the daemon’s main Python interpreter, not a sub interpreter.

For example, this mod_wsgi configuration ensures an application runs in the main interpreter:

<VirtualHost *>
    WSGIDaemonProcess my_process
    WSGIScriptAlias /my_app /path/to/app.wsgi
    WSGIProcessGroup my_process
    WSGIApplicationGroup %{GLOBAL}
</VirtualHost>

If you have multiple applications that use PyMongo, put each in a separate daemon, still in the global application group:

<VirtualHost *>
    WSGIDaemonProcess my_process
    WSGIScriptAlias /my_app /path/to/app.wsgi
    <Location /my_app>
        WSGIProcessGroup my_process
    </Location>

    WSGIDaemonProcess my_other_process
    WSGIScriptAlias /my_other_app /path/to/other_app.wsgi
    <Location /my_other_app>
        WSGIProcessGroup my_other_process
    </Location>

    WSGIApplicationGroup %{GLOBAL}
</VirtualHost>

Background: mod_wsgi can run in “embedded” mode when only WSGIScriptAlias is set, or “daemon” mode with WSGIDaemonProcess. In daemon mode, mod_wsgi can run your application in the Python main interpreter, or in sub interpreters. The correct way to run a PyMongo application is in daemon mode, using the main interpreter.

Python C extensions in general have issues running in multiple Python sub interpreters. These difficulties are explained in the documentation for Py_NewInterpreter and in the Multiple Python Sub Interpreters section of the mod_wsgi documentation.

Beginning with PyMongo 2.7, the C extension for BSON detects when it is running in a sub interpreter and activates a workaround, which adds a small cost to BSON decoding. To avoid this cost, use WSGIApplicationGroup %{GLOBAL} to ensure your application runs in the main interpreter.

Since your program runs in the main interpreter it should not share its process with any other applications, lest they interfere with each other’s state. Each application should have its own daemon process, as shown in the example above.

Server Selector Example

Users can exert fine-grained control over the server selection algorithm by setting the server_selector option on the MongoClient to an appropriate callable. This example shows how to use this functionality to prefer servers running on localhost.

Warning

Use of custom server selector functions is a power user feature. Misusing custom server selectors can have unintended consequences such as degraded read/write performance.

Example: Selecting Servers Running on localhost

To start, we need to write the server selector function that will be used. The server selector function should accept a list of ServerDescription objects and return a list of server descriptions that are suitable for the read or write operation. A server selector must not create or modify ServerDescription objects, and must return the selected instances unchanged.

In this example, we write a server selector that prioritizes servers running on localhost. This can be desirable when using a sharded cluster with multiple mongos, as locally run queries are likely to see lower latency and higher throughput. Note, however, that whether preferring localhost is beneficial depends heavily on the application.

In addition to comparing the hostname with localhost, our server selector function accounts for the edge case when no servers are running on localhost. In this case, we allow the default server selection logic to prevail by passing through the received server description list unchanged. Failure to do this would render the client unable to communicate with MongoDB in the event that no servers were running on localhost.

The described server selection logic is implemented in the following server selector function:

>>> def server_selector(server_descriptions):
...     servers = [
...         server for server in server_descriptions
...         if server.address[0] == 'localhost'
...     ]
...     if not servers:
...         return server_descriptions
...     return servers

Finally, we can create a MongoClient instance with this server selector.

>>> client = MongoClient(server_selector=server_selector)
Server Selection Process

This section dives deeper into the server selection process for reads and writes. In the case of a write, the driver performs the following operations (in order) during the selection process:

  1. Select all writeable servers from the list of known hosts. For a replica set this is the primary, while for a sharded cluster this is all the known mongoses.
  2. Apply the user-defined server selector function. Note that the custom server selector is not called if there are no servers left from the previous filtering stage.
  3. Apply the localThresholdMS setting to the list of remaining hosts. This whittles the host list down to only contain servers whose latency is at most localThresholdMS milliseconds higher than the lowest observed latency.
  4. Select a server at random from the remaining host list. The desired operation is then performed against the selected server.

In the case of reads the process is identical except for the first step. Here, instead of selecting all writeable servers, we select all servers matching the user’s ReadPreference from the list of known hosts. As an example, for a 3-member replica set with a Secondary read preference, we would select all available secondaries.

Tailable Cursors

By default, MongoDB will automatically close a cursor when the client has exhausted all results in the cursor. However, for capped collections you may use a tailable cursor that remains open after the client exhausts the results in the initial cursor.

The following is a basic example of using a tailable cursor to tail the oplog of a replica set member:

import time

import pymongo

client = pymongo.MongoClient()
oplog = client.local.oplog.rs
first = oplog.find().sort('$natural', pymongo.ASCENDING).limit(-1).next()
print(first)
ts = first['ts']

while True:
    # For a regular capped collection CursorType.TAILABLE_AWAIT is the
    # only option required to create a tailable cursor. When querying the
    # oplog the oplog_replay option enables an optimization to quickly
    # find the 'ts' value we're looking for. The oplog_replay option
    # can only be used when querying the oplog.
    cursor = oplog.find({'ts': {'$gt': ts}},
                        cursor_type=pymongo.CursorType.TAILABLE_AWAIT,
                        oplog_replay=True)
    while cursor.alive:
        for doc in cursor:
            ts = doc['ts']
            print(doc)
        # We end up here if the find() returned no documents or if the
        # tailable cursor timed out (no new documents were added to the
        # collection for more than 1 second).
        time.sleep(1)

TLS/SSL and PyMongo

PyMongo supports connecting to MongoDB over TLS/SSL. This guide covers the configuration options supported by PyMongo. See the server documentation to configure MongoDB.

Dependencies

For connections using TLS/SSL, PyMongo may require third party dependencies as determined by your version of Python. With PyMongo 3.3+, you can install PyMongo 3.3+ and any TLS/SSL-related dependencies using the following pip command:

$ python -m pip install pymongo[tls]

Earlier versions of PyMongo require you to manually install the dependencies listed below.

Python 2.x

The ipaddress module is required on all platforms.

When using CPython < 2.7.9 or PyPy < 2.5.1:

  • On Windows, the wincertstore module is required.
  • On all other platforms, the certifi module is required.

Warning

Industry best practices recommend, and some regulations require, the use of TLS 1.1 or newer. Though no application changes are required for PyMongo to make use of the newest protocols, some operating systems or versions may not provide an OpenSSL version new enough to support them.

Users of macOS older than 10.13 (High Sierra) will need to install Python from python.org, homebrew, macports, or another similar source.

Users of Linux or other non-macOS Unix can check their OpenSSL version like this:

$ openssl version

If the version number is less than 1.0.1, support for TLS 1.1 or newer is not available. Contact your operating system vendor for a solution or upgrade to a newer distribution.

You can check your Python interpreter by installing the requests module and executing the following command:

python -c "import requests; print(requests.get('https://www.howsmyssl.com/a/check', verify=False).json()['tls_version'])"

You should see “TLS 1.X” where X is >= 1.

You can read more about TLS versions and their security implications here:

https://www.owasp.org/index.php/Transport_Layer_Protection_Cheat_Sheet#Rule_-_Only_Support_Strong_Protocols

Basic configuration

In many cases connecting to MongoDB over TLS/SSL requires nothing more than passing ssl=True as a keyword argument to MongoClient:

>>> client = pymongo.MongoClient('example.com', ssl=True)

Or passing ssl=true in the URI:

>>> client = pymongo.MongoClient('mongodb://example.com/?ssl=true')

This configures PyMongo to connect to the server using TLS, verify the server’s certificate and verify that the host you are attempting to connect to is listed by that certificate.

Certificate verification policy

By default, PyMongo is configured to require a certificate from the server when TLS is enabled. This is configurable using the ssl_cert_reqs option. To disable this requirement pass ssl.CERT_NONE as a keyword parameter:

>>> import ssl
>>> client = pymongo.MongoClient('example.com',
...                              ssl=True,
...                              ssl_cert_reqs=ssl.CERT_NONE)

Or, in the URI:

>>> uri = 'mongodb://example.com/?ssl=true&ssl_cert_reqs=CERT_NONE'
>>> client = pymongo.MongoClient(uri)
Specifying a CA file

In some cases you may want to configure PyMongo to use a specific set of CA certificates. This is most often the case when using “self-signed” server certificates. The ssl_ca_certs option takes a path to a CA file. It can be passed as a keyword argument:

>>> client = pymongo.MongoClient('example.com',
...                              ssl=True,
...                              ssl_ca_certs='/path/to/ca.pem')

Or, in the URI:

>>> uri = 'mongodb://example.com/?ssl=true&ssl_ca_certs=/path/to/ca.pem'
>>> client = pymongo.MongoClient(uri)
Specifying a certificate revocation list

Python 2.7.9+ (pypy 2.5.1+) and 3.4+ provide support for certificate revocation lists. The ssl_crlfile option takes a path to a CRL file. It can be passed as a keyword argument:

>>> client = pymongo.MongoClient('example.com',
...                              ssl=True,
...                              ssl_crlfile='/path/to/crl.pem')

Or, in the URI:

>>> uri = 'mongodb://example.com/?ssl=true&ssl_crlfile=/path/to/crl.pem'
>>> client = pymongo.MongoClient(uri)
Client certificates

PyMongo can be configured to present a client certificate using the ssl_certfile option:

>>> client = pymongo.MongoClient('example.com',
...                              ssl=True,
...                              ssl_certfile='/path/to/client.pem')

If the private key for the client certificate is stored in a separate file use the ssl_keyfile option:

>>> client = pymongo.MongoClient('example.com',
...                              ssl=True,
...                              ssl_certfile='/path/to/client.pem',
...                              ssl_keyfile='/path/to/key.pem')

Python 2.7.9+ (pypy 2.5.1+) and 3.3+ support providing a password or passphrase to decrypt encrypted private keys. Use the ssl_pem_passphrase option:

>>> client = pymongo.MongoClient('example.com',
...                              ssl=True,
...                              ssl_certfile='/path/to/client.pem',
...                              ssl_keyfile='/path/to/key.pem',
...                              ssl_pem_passphrase=<passphrase>)

These options can also be passed as part of the MongoDB URI.

Troubleshooting TLS Errors

TLS errors often fall into two categories, certificate verification failure or protocol version mismatch. An error message similar to the following means that OpenSSL was not able to verify the server’s certificate:

[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed

This often occurs because OpenSSL does not have access to the system’s root certificates or the certificates are out of date. Linux users should ensure that they have the latest root certificate updates installed from their Linux vendor. macOS users using Python 3.6.0 or newer downloaded from python.org may have to run a script included with Python to install root certificates:

open "/Applications/Python <YOUR PYTHON VERSION>/Install Certificates.command"

Users of older PyPy portable versions may have to set an environment variable to tell OpenSSL where to find root certificates. This is easily done using the certifi module from pypi:

$ pypy -m pip install certifi
$ export SSL_CERT_FILE=$(pypy -c "import certifi; print(certifi.where())")

An error message similar to the following message means that the OpenSSL version used by Python does not support a new enough TLS protocol to connect to the server:

[SSL: TLSV1_ALERT_PROTOCOL_VERSION] tlsv1 alert protocol version

Industry best practices recommend, and some regulations require, that older TLS protocols be disabled in some MongoDB deployments. Some deployments may disable TLS 1.0, others may disable TLS 1.0 and TLS 1.1. See the warning earlier in this document for troubleshooting steps and solutions.

Frequently Asked Questions

Is PyMongo thread-safe?

PyMongo is thread-safe and provides built-in connection pooling for threaded applications.
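
For example, a minimal sketch of sharing one client across threads (the collection name and worker function here are illustrative):

import threading

from pymongo import MongoClient

# One client per process; threads share it and its connection pool.
client = MongoClient()

def insert_doc(n):
    client.test.test.insert_one({'worker': n})

threads = [threading.Thread(target=insert_doc, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()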

Is PyMongo fork-safe?

PyMongo is not fork-safe. Care must be taken when using instances of MongoClient with fork(). Specifically, instances of MongoClient must not be copied from a parent process to a child process. Instead, the parent process and each child process must create their own instances of MongoClient. Instances of MongoClient copied from the parent process have a high probability of deadlock in the child process due to the inherent incompatibilities between fork(), threads, and locks described below. PyMongo will attempt to issue a warning if there is a chance of this deadlock occurring.

MongoClient spawns multiple threads to run background tasks such as monitoring connected servers. These threads share state that is protected by instances of Lock, which are themselves not fork-safe. The driver is therefore subject to the same limitations as any other multithreaded code that uses Lock (and mutexes in general). One of these limitations is that the locks become useless after fork(). During the fork, all locks are copied over to the child process in the same state as they were in the parent: if they were locked, the copied locks are also locked. The child created by fork() only has one thread, so any locks that were taken out by other threads in the parent will never be released in the child. The next time the child process attempts to acquire one of these locks, deadlock occurs.

For a long but interesting read about the problems of Python locks in multithreaded contexts with fork(), see http://bugs.python.org/issue6721.

How does connection pooling work in PyMongo?

Every MongoClient instance has a built-in connection pool per server in your MongoDB topology. These pools open sockets on demand to support the number of concurrent MongoDB operations that your multi-threaded application requires. There is no thread-affinity for sockets.

The size of each connection pool is capped at maxPoolSize, which defaults to 100. If there are maxPoolSize connections to a server and all are in use, the next request to that server will wait until one of the connections becomes available.

The client instance opens one additional socket per server in your MongoDB topology for monitoring the server’s state.

For example, a client connected to a 3-node replica set opens 3 monitoring sockets. It also opens as many sockets as needed to support a multi-threaded application’s concurrent operations on each server, up to maxPoolSize. With a maxPoolSize of 100, if the application only uses the primary (the default), then only the primary connection pool grows and the total connections is at most 103. If the application uses a ReadPreference to query the secondaries, their pools also grow and the total connections can reach 303.

It is possible to set the minimum number of concurrent connections to each server with minPoolSize, which defaults to 0. The connection pool will be initialized with this number of sockets. If sockets are closed due to any network errors, causing the total number of sockets (both in use and idle) to drop below the minimum, more sockets are opened until the minimum is reached.

The maximum number of milliseconds that a connection can remain idle in the pool before being removed and replaced can be set with maxIdleTimeMS, which defaults to None (no limit).
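
For example, a hedged sketch combining these pool options (the values shown are illustrative, not recommendations):

client = MongoClient(host, port, minPoolSize=10, maxIdleTimeMS=60000)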

The default configuration for a MongoClient works for most applications:

client = MongoClient(host, port)

Create this client once for each process, and reuse it for all operations. It is a common mistake to create a new client for each request, which is very inefficient.

To support extremely high numbers of concurrent MongoDB operations within one process, increase maxPoolSize:

client = MongoClient(host, port, maxPoolSize=200)

… or make it unbounded:

client = MongoClient(host, port, maxPoolSize=None)

Once the pool reaches its maximum size, additional threads have to wait for sockets to become available. PyMongo does not limit the number of threads that can wait for sockets to become available and it is the application’s responsibility to limit the size of its thread pool to bound queuing during a load spike. Threads are allowed to wait for any length of time unless waitQueueTimeoutMS is defined:

client = MongoClient(host, port, waitQueueTimeoutMS=100)

A thread that waits more than 100ms (in this example) for a socket raises ConnectionFailure. Use this option if it is more important to bound the duration of operations during a load spike than it is to complete every operation.

When close() is called by any thread, all idle sockets are closed, and all sockets that are in use will be closed as they are returned to the pool.

Does PyMongo support Python 3?

PyMongo supports CPython 3.4+ and PyPy3.5+. See the Python 3 FAQ for details.

Does PyMongo support asynchronous frameworks like Gevent, asyncio, Tornado, or Twisted?

PyMongo fully supports Gevent.

To use MongoDB with asyncio or Tornado, see the Motor project.

For Twisted, see TxMongo. Its stated mission is to keep feature parity with PyMongo.

Why does PyMongo add an _id field to all of my documents?

When a document is inserted to MongoDB using insert_one(), insert_many(), or bulk_write(), and that document does not include an _id field, PyMongo automatically adds one for you, set to an instance of ObjectId. For example:

>>> my_doc = {'x': 1}
>>> collection.insert_one(my_doc)
<pymongo.results.InsertOneResult object at 0x7f3fc25bd640>
>>> my_doc
{'x': 1, '_id': ObjectId('560db337fba522189f171720')}

Users often discover this behavior when a call to insert_many() with a list of references to a single document raises BulkWriteError. Several Python idioms lead to this pitfall:

>>> doc = {}
>>> collection.insert_many(doc for _ in range(10))
Traceback (most recent call last):
...
pymongo.errors.BulkWriteError: batch op errors occurred
>>> doc
{'_id': ObjectId('560f171cfba52279f0b0da0c')}

>>> docs = [{}]
>>> collection.insert_many(docs * 10)
Traceback (most recent call last):
...
pymongo.errors.BulkWriteError: batch op errors occurred
>>> docs
[{'_id': ObjectId('560f1933fba52279f0b0da0e')}]

PyMongo adds an _id field in this manner for a few reasons:

  • All MongoDB documents are required to have an _id field.
  • If PyMongo were to insert a document without an _id MongoDB would add one itself, but it would not report the value back to PyMongo.
  • Copying the document to insert before adding the _id field would be prohibitively expensive for most high write volume applications.

If you don’t want PyMongo to add an _id to your documents, insert only documents that already have an _id field, added by your application.
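
For example, a minimal sketch that supplies its own _id:

>>> result = collection.insert_one({'_id': 'custom-id-1', 'x': 1})
>>> result.inserted_id
'custom-id-1'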

Key order in subdocuments – why does my query work in the shell but not PyMongo?

The key-value pairs in a BSON document can have any order (except that _id is always first). The mongo shell preserves key order when reading and writing data. Observe that “b” comes before “a” when we create the document and when it is displayed:

> // mongo shell.
> db.collection.insert( { "_id" : 1, "subdocument" : { "b" : 1, "a" : 1 } } )
WriteResult({ "nInserted" : 1 })
> db.collection.find()
{ "_id" : 1, "subdocument" : { "b" : 1, "a" : 1 } }

PyMongo represents BSON documents as Python dicts by default, and the order of keys in dicts is not defined. That is, a dict declared with the “a” key first is the same, to Python, as one with “b” first:

>>> print({'a': 1.0, 'b': 1.0})
{'a': 1.0, 'b': 1.0}
>>> print({'b': 1.0, 'a': 1.0})
{'a': 1.0, 'b': 1.0}

Therefore, Python dicts are not guaranteed to show keys in the order they are stored in BSON. Here, “a” is shown before “b”:

>>> print(collection.find_one())
{u'_id': 1.0, u'subdocument': {u'a': 1.0, u'b': 1.0}}

To preserve order when reading BSON, use the SON class, which is a dict that remembers its key order. First, get a handle to the collection, configured to use SON instead of dict:

>>> from bson import CodecOptions, SON
>>> opts = CodecOptions(document_class=SON)
>>> opts
CodecOptions(document_class=<class 'bson.son.SON'>,
             tz_aware=False,
             uuid_representation=PYTHON_LEGACY,
             unicode_decode_error_handler='strict',
             tzinfo=None, type_registry=TypeRegistry(type_codecs=[],
                                                     fallback_encoder=None))
>>> collection_son = collection.with_options(codec_options=opts)

Now, documents and subdocuments in query results are represented with SON objects:

>>> print(collection_son.find_one())
SON([(u'_id', 1.0), (u'subdocument', SON([(u'b', 1.0), (u'a', 1.0)]))])

The subdocument’s actual storage layout is now visible: “b” is before “a”.

Because a dict’s key order is not defined, you cannot predict how it will be serialized to BSON. But MongoDB considers subdocuments equal only if their keys have the same order. So if you use a dict to query on a subdocument it may not match:

>>> collection.find_one({'subdocument': {'a': 1.0, 'b': 1.0}}) is None
True

Swapping the key order in your query makes no difference:

>>> collection.find_one({'subdocument': {'b': 1.0, 'a': 1.0}}) is None
True

… because, as we saw above, Python considers the two dicts the same.

There are two solutions. First, you can match the subdocument field-by-field:

>>> collection.find_one({'subdocument.a': 1.0,
...                      'subdocument.b': 1.0})
{u'_id': 1.0, u'subdocument': {u'a': 1.0, u'b': 1.0}}

The query matches any subdocument with an “a” of 1.0 and a “b” of 1.0, regardless of the order you specify them in Python or the order they are stored in BSON. Additionally, this query now matches subdocuments with additional keys besides “a” and “b”, whereas the previous query required an exact match.

The second solution is to use a SON to specify the key order:

>>> query = {'subdocument': SON([('b', 1.0), ('a', 1.0)])}
>>> collection.find_one(query)
{u'_id': 1.0, u'subdocument': {u'a': 1.0, u'b': 1.0}}

The key order you use when you create a SON is preserved when it is serialized to BSON and used as a query. Thus you can create a subdocument that exactly matches the subdocument in the collection.

What does CursorNotFound cursor id not valid at server mean?

Cursors in MongoDB can time out on the server if they’ve been open for a long time without any operations being performed on them. This can lead to a CursorNotFound exception being raised when attempting to iterate the cursor.

How do I change the timeout value for cursors?

MongoDB doesn’t support custom timeouts for cursors, but cursor timeouts can be turned off entirely. Pass no_cursor_timeout=True to find().
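
For example, a minimal sketch; since the server no longer reaps the cursor, close it explicitly when done:

>>> cursor = collection.find({}, no_cursor_timeout=True)
>>> for doc in cursor:
...     pass  # Process each document.
...
>>> cursor.close()  # The server will not time this cursor out; close it manually.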

How can I store decimal.Decimal instances?

PyMongo >= 3.4 supports the Decimal128 BSON type introduced in MongoDB 3.4. See decimal128 for more information.
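
For example, a minimal sketch, assuming a PyMongo 3.4+ client connected to a MongoDB 3.4+ server:

>>> from bson.decimal128 import Decimal128
>>> result = collection.insert_one({'price': Decimal128('9.99')})
>>> collection.find_one({'_id': result.inserted_id})['price']
Decimal128('9.99')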

MongoDB <= 3.2 only supports IEEE 754 floating points - the same as the Python float type. The only way PyMongo could store Decimal instances to these versions of MongoDB would be to convert them to this standard, so you’d really only be storing floats anyway - we force users to do this conversion explicitly so that they are aware that it is happening.

I’m saving 9.99 but when I query my document contains 9.9900000000000002 - what’s going on here?

The database representation is 9.99 as an IEEE floating point (which is common to MongoDB and Python as well as most other modern languages). The problem is that 9.99 cannot be represented exactly with a double precision floating point - this is true in some versions of Python as well:

>>> 9.99
9.9900000000000002

The result that you get when you save 9.99 with PyMongo is exactly the same as the result you’d get saving it with the JavaScript shell or any of the other languages (and as the data you’re working with when you type 9.99 into a Python program).

Can you add attribute style access for documents?

This request has come up a number of times but we’ve decided not to implement anything like this. The relevant JIRA case has some information about the decision, but here is a brief summary:

  1. This will pollute the attribute namespace for documents, so could lead to subtle bugs / confusing errors when using a key with the same name as a dictionary method.
  2. The only reason we even use SON objects instead of regular dictionaries is to maintain key ordering, since the server requires this for certain operations. So we’re hesitant to needlessly complicate SON (at some point it’s hypothetically possible we might want to revert back to using dictionaries alone, without breaking backwards compatibility for everyone).
  3. It’s easy (and Pythonic) for new users to deal with documents, since they behave just like dictionaries. If we start changing their behavior it adds a barrier to entry for new users - another class to learn.

How can I save a datetime.date instance?

PyMongo doesn’t support saving datetime.date instances, since there is no BSON type for dates without times. Rather than having the driver enforce a convention for converting datetime.date instances to datetime.datetime instances for you, any conversion should be performed in your client code.
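
For example, a minimal sketch of one common convention, converting the date to a datetime at midnight before saving (the dates collection name is illustrative):

>>> import datetime
>>> d = datetime.date(2015, 7, 8)
>>> dt = datetime.datetime.combine(d, datetime.time.min)
>>> dt
datetime.datetime(2015, 7, 8, 0, 0)
>>> result = db.dates.insert_one({'when': dt})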

When I query for a document by ObjectId in my web application I get no result

It’s common in web applications to encode documents’ ObjectIds in URLs, like:

"/posts/50b3bda58a02fb9a84d8991e"

Your web framework will pass the ObjectId portion of the URL to your request handler as a string, so it must be converted to ObjectId before it is passed to find_one(). It is a common mistake to forget to do this conversion. Here’s how to do it correctly in Flask (other web frameworks are similar):

from pymongo import MongoClient
from bson.objectid import ObjectId

from flask import Flask, render_template

client = MongoClient()
app = Flask(__name__)

@app.route("/posts/<_id>")
def show_post(_id):
   # NOTE!: converting _id from string to ObjectId before passing to find_one
   post = client.db.posts.find_one({'_id': ObjectId(_id)})
   return render_template('post.html', post=post)

if __name__ == "__main__":
    app.run()

How can I use PyMongo from Django?

Django is a popular Python web framework. Django includes an ORM, django.db. Currently, there’s no official MongoDB backend for Django.

django-mongodb-engine is an unofficial MongoDB backend that supports Django aggregations, (atomic) updates, embedded objects, Map/Reduce and GridFS. It allows you to use most of Django’s built-in features, including the ORM, admin, authentication, site and session frameworks and caching.

However, it’s easy to use MongoDB (and PyMongo) from Django without using a Django backend. Certain features of Django that require django.db (admin, authentication and sessions) will not work using just MongoDB, but most of what Django provides can still be used.

One project which should make working with MongoDB and Django easier is mango. Mango is a set of MongoDB backends for Django sessions and authentication (bypassing django.db entirely).

Does PyMongo work with mod_wsgi?

Yes. See the configuration guide for PyMongo and mod_wsgi.

Does PyMongo work with PythonAnywhere?

No. PyMongo creates Python threads which PythonAnywhere does not support. For more information see PYTHON-1495.

How can I use something like Python’s json module to encode my documents to JSON?

json_util is PyMongo’s built in, flexible tool for using Python’s json module with BSON documents and MongoDB Extended JSON. The json module won’t work out of the box with all documents from PyMongo as PyMongo supports some special types (like ObjectId and DBRef) that are not supported in JSON.
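
For example, a minimal sketch using json_util.dumps (in PyMongo 3.x the default output is the legacy MongoDB Extended JSON format):

>>> from bson import json_util
>>> from bson.objectid import ObjectId
>>> json_util.dumps({'_id': ObjectId('560db337fba522189f171720')})
'{"_id": {"$oid": "560db337fba522189f171720"}}'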

python-bsonjs is a fast BSON to MongoDB Extended JSON converter built on top of libbson. python-bsonjs does not depend on PyMongo and can offer a nice performance improvement over json_util. python-bsonjs works best with PyMongo when using RawBSONDocument.

Why do I get OverflowError decoding dates stored by another language’s driver?

PyMongo decodes BSON datetime values to instances of Python’s datetime.datetime. Instances of datetime.datetime are limited to years between datetime.MINYEAR (usually 1) and datetime.MAXYEAR (usually 9999). Some MongoDB drivers (e.g. the PHP driver) can store BSON datetimes with year values far outside those supported by datetime.datetime.

There are a few ways to work around this issue. One option is to filter out documents with values outside of the range supported by datetime.datetime:

>>> from datetime import datetime
>>> coll = client.test.dates
>>> cur = coll.find({'dt': {'$gte': datetime.min, '$lte': datetime.max}})

Another option, assuming you don’t need the datetime field, is to filter out just that field:

>>> cur = coll.find({}, projection={'dt': False})

Using PyMongo with Multiprocessing

On Unix systems the multiprocessing module spawns processes using fork(). Care must be taken when using instances of MongoClient with fork(). Specifically, instances of MongoClient must not be copied from a parent process to a child process. Instead, the parent process and each child process must create their own instances of MongoClient. For example:

# Each process creates its own instance of MongoClient.
def func():
    db = pymongo.MongoClient().mydb
    # Do something with db.

proc = multiprocessing.Process(target=func)
proc.start()

Never do this:

client = pymongo.MongoClient()

# Each child process attempts to copy a global MongoClient
# created in the parent process. Never do this.
def func():
    db = client.mydb
    # Do something with db.

proc = multiprocessing.Process(target=func)
proc.start()

Instances of MongoClient copied from the parent process have a high probability of deadlock in the child process due to inherent incompatibilities between fork(), threads, and locks. PyMongo will attempt to issue a warning if there is a chance of this deadlock occurring.

Compatibility Policy

Semantic Versioning

PyMongo’s version numbers follow semantic versioning: each version number is structured “major.minor.patch”. Patch releases fix bugs, minor releases add features (and may fix bugs), and major releases include API changes that break backwards compatibility (and may add features and fix bugs).

Deprecation

Before we remove a feature in a major release, PyMongo’s maintainers make an effort to release at least one minor version that deprecates it. We add “DEPRECATED” to the feature’s documentation, and update the code to raise a DeprecationWarning. You can ensure your code is future-proof by running your code with the latest PyMongo release and looking for DeprecationWarnings.

Starting with Python 2.7, the interpreter silences DeprecationWarnings by default. For example, the following code uses the deprecated insert method but does not raise any warning:

# "insert.py"
from pymongo import MongoClient

client = MongoClient()
client.test.test.insert({})

To print deprecation warnings to stderr, run python with “-Wd”:

$ python -Wd insert.py
insert.py:4: DeprecationWarning: insert is deprecated. Use insert_one or insert_many instead.
  client.test.test.insert({})

You can turn warnings into exceptions with “python -We”:

$ python -We insert.py
Traceback (most recent call last):
  File "insert.py", line 4, in <module>
    client.test.test.insert({})
  File "/home/durin/work/mongo-python-driver/pymongo/collection.py", line 2906, in insert
    "instead.", DeprecationWarning, stacklevel=2)
DeprecationWarning: insert is deprecated. Use insert_one or insert_many instead.

If your own code’s test suite passes with “python -We” then it uses no deprecated PyMongo features.
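
The same check can be made part of a test suite without the command line flag; a minimal sketch using the standard library warnings module (where you place the filter depends on your test framework):

# Turn DeprecationWarning into an error for the whole test run,
# similar in spirit to running "python -We".
import warnings

warnings.simplefilter("error", DeprecationWarning)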

See also

The Python documentation on the warnings module, and the -W command line option.

API Documentation

The PyMongo distribution contains three top-level packages for interacting with MongoDB. bson is an implementation of the BSON format, pymongo is a full-featured driver for MongoDB, and gridfs is a set of tools for working with the GridFS storage specification.

bson – BSON (Binary JSON) Encoding and Decoding

BSON (Binary JSON) encoding and decoding.

The mapping from Python types to BSON types is as follows:

Python Type                 BSON Type        Supported Direction
None                        null             both
bool                        boolean          both
int [1]                     int32 / int64    py -> bson
long                        int64            py -> bson
bson.int64.Int64            int64            both
float                       number (real)    both
string                      string           py -> bson
unicode                     string           both
list                        array            both
dict / SON                  object           both
datetime.datetime [2] [3]   date             both
bson.regex.Regex            regex            both
compiled re [4]             regex            py -> bson
bson.binary.Binary          binary           both
bson.objectid.ObjectId      oid              both
bson.dbref.DBRef            dbref            both
None                        undefined        bson -> py
unicode                     code             bson -> py
bson.code.Code              code             py -> bson
unicode                     symbol           bson -> py
bytes (Python 3) [5]        binary           both

Note that, when using Python 2.x, binary data must be wrapped as an instance of bson.binary.Binary to be saved as BSON binary. Otherwise it will be saved as a BSON string and retrieved as unicode. Users of Python 3.x can use the Python bytes type.

[1]A Python int will be saved as a BSON int32 or BSON int64 depending on its size. A BSON int32 will always decode to a Python int. A BSON int64 will always decode to an Int64.
[2]datetime.datetime instances will be rounded to the nearest millisecond when saved.
[3]All datetime.datetime instances are treated as naive. Clients should always use UTC.
[4]Regex instances and regular expression objects from re.compile() are both saved as BSON regular expressions. BSON regular expressions are decoded as Regex instances.
[5]The bytes type from Python 3.x is encoded as BSON binary with subtype 0. In Python 3.x it will be decoded back to bytes. In Python 2.x it will be decoded to an instance of Binary with subtype 0.
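
As a quick illustration of footnote [1], a sketch using the module-level encode()/decode() helpers documented below (output shown for Python 3):

>>> import bson
>>> doc = bson.decode(bson.encode({'small': 1, 'big': 2**40}))
>>> type(doc['small'])
<class 'int'>
>>> type(doc['big'])
<class 'bson.int64.Int64'>
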
class bson.BSON

BSON (Binary JSON) data.

Warning

Using this class to encode and decode BSON adds a performance cost. For better performance use the module level functions encode() and decode() instead.

decode(codec_options=CodecOptions(document_class=dict, tz_aware=False, uuid_representation=PYTHON_LEGACY, unicode_decode_error_handler='strict', tzinfo=None, type_registry=TypeRegistry(type_codecs=[], fallback_encoder=None)))

Decode this BSON data.

By default, returns a BSON document represented as a Python dict. To use a different MutableMapping class, configure a CodecOptions:

>>> import collections  # From Python standard library.
>>> import bson
>>> from bson.codec_options import CodecOptions
>>> data = bson.BSON.encode({'a': 1})
>>> decoded_doc = bson.BSON(data).decode()
>>> type(decoded_doc)
<type 'dict'>
>>> options = CodecOptions(document_class=collections.OrderedDict)
>>> decoded_doc = bson.BSON(data).decode(codec_options=options)
>>> type(decoded_doc)
<class 'collections.OrderedDict'>
Parameters:
  • codec_options (optional): An instance of CodecOptions.

Changed in version 3.0: Removed compile_re option: PyMongo now always represents BSON regular expressions as Regex objects. Use try_compile() to attempt to convert from a BSON regular expression to a Python regular expression object.

Replaced as_class, tz_aware, and uuid_subtype options with codec_options.

Changed in version 2.7: Added compile_re option. If set to False, PyMongo represented BSON regular expressions as Regex objects instead of attempting to compile BSON regular expressions as Python native regular expressions, thus preventing errors for some incompatible patterns, see PYTHON-500.

classmethod encode(document, check_keys=False, codec_options=CodecOptions(document_class=dict, tz_aware=False, uuid_representation=PYTHON_LEGACY, unicode_decode_error_handler='strict', tzinfo=None, type_registry=TypeRegistry(type_codecs=[], fallback_encoder=None)))

Encode a document to a new BSON instance.

A document can be any mapping type (like dict).

Raises TypeError if document is not a mapping type, or contains keys that are not instances of basestring (str in python 3). Raises InvalidDocument if document cannot be converted to BSON.

Parameters:
  • document: mapping type representing a document
  • check_keys (optional): check if keys start with ‘$’ or contain ‘.’, raising InvalidDocument in either case
  • codec_options (optional): An instance of CodecOptions.

Changed in version 3.0: Replaced uuid_subtype option with codec_options.

bson.decode(data, codec_options=CodecOptions(document_class=dict, tz_aware=False, uuid_representation=PYTHON_LEGACY, unicode_decode_error_handler='strict', tzinfo=None, type_registry=TypeRegistry(type_codecs=[], fallback_encoder=None)))

Decode BSON to a document.

By default, returns a BSON document represented as a Python dict. To use a different MutableMapping class, configure a CodecOptions:

>>> import collections  # From Python standard library.
>>> import bson
>>> from bson.codec_options import CodecOptions
>>> data = bson.encode({'a': 1})
>>> decoded_doc = bson.decode(data)
>>> type(decoded_doc)
<type 'dict'>
>>> options = CodecOptions(document_class=collections.OrderedDict)
>>> decoded_doc = bson.decode(data, codec_options=options)
>>> type(decoded_doc)
<class 'collections.OrderedDict'>
Parameters:
  • data: the BSON to decode. Any bytes-like object that implements the buffer protocol.
  • codec_options (optional): An instance of CodecOptions.

New in version 3.9.

bson.decode_all(data, codec_options=CodecOptions(document_class=dict, tz_aware=False, uuid_representation=PYTHON_LEGACY, unicode_decode_error_handler='strict', tzinfo=None, type_registry=TypeRegistry(type_codecs=[], fallback_encoder=None)))

Decode BSON data to multiple documents.

data must be a bytes-like object implementing the buffer protocol that provides concatenated, valid, BSON-encoded documents.

Parameters:
  • data: BSON data
  • codec_options (optional): An instance of CodecOptions.

Changed in version 3.9: Supports bytes-like objects that implement the buffer protocol.

Changed in version 3.0: Removed compile_re option: PyMongo now always represents BSON regular expressions as Regex objects. Use try_compile() to attempt to convert from a BSON regular expression to a Python regular expression object.

Replaced as_class, tz_aware, and uuid_subtype options with codec_options.

Changed in version 2.7: Added compile_re option. If set to False, PyMongo represented BSON regular expressions as Regex objects instead of attempting to compile BSON regular expressions as Python native regular expressions, thus preventing errors for some incompatible patterns, see PYTHON-500.

bson.decode_file_iter(file_obj, codec_options=CodecOptions(document_class=dict, tz_aware=False, uuid_representation=PYTHON_LEGACY, unicode_decode_error_handler='strict', tzinfo=None, type_registry=TypeRegistry(type_codecs=[], fallback_encoder=None)))

Decode BSON data from a file to multiple documents as a generator.

Works similarly to the decode_all function, but reads from the file object in chunks and parses BSON in chunks, yielding one document at a time.

Parameters:
  • file_obj: A file object containing BSON data.
  • codec_options (optional): An instance of CodecOptions.

Changed in version 3.0: Replaced as_class, tz_aware, and uuid_subtype options with codec_options.

New in version 2.8.

bson.decode_iter(data, codec_options=CodecOptions(document_class=dict, tz_aware=False, uuid_representation=PYTHON_LEGACY, unicode_decode_error_handler='strict', tzinfo=None, type_registry=TypeRegistry(type_codecs=[], fallback_encoder=None)))

Decode BSON data to multiple documents as a generator.

Works similarly to the decode_all function, but yields one document at a time.

data must be a string of concatenated, valid, BSON-encoded documents.

Parameters:
  • data: BSON data
  • codec_options (optional): An instance of CodecOptions.

Changed in version 3.0: Replaced as_class, tz_aware, and uuid_subtype options with codec_options.

New in version 2.8.

bson.encode(document, check_keys=False, codec_options=CodecOptions(document_class=dict, tz_aware=False, uuid_representation=PYTHON_LEGACY, unicode_decode_error_handler='strict', tzinfo=None, type_registry=TypeRegistry(type_codecs=[], fallback_encoder=None)))

Encode a document to BSON.

A document can be any mapping type (like dict).

Raises TypeError if document is not a mapping type, or contains keys that are not instances of basestring (str in python 3). Raises InvalidDocument if document cannot be converted to BSON.

Parameters:
  • document: mapping type representing a document
  • check_keys (optional): check if keys start with ‘$’ or contain ‘.’, raising InvalidDocument in either case
  • codec_options (optional): An instance of CodecOptions.

New in version 3.9.

bson.gen_list_name()

Generate “keys” for encoded lists in the sequence b'0', b'1', b'2', …

The first 1000 keys are returned from a pre-built cache. All subsequent keys are generated on the fly.

bson.has_c()

Is the C extension installed?

bson.is_valid(bson)

Check that the given string represents valid BSON data.

Raises TypeError if bson is not an instance of str (bytes in python 3). Returns True if bson is valid BSON, False otherwise.

Parameters:
  • bson: the data to be validated
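
A small sketch (assuming the module-level bson.encode() helper described above):

>>> import bson
>>> data = bson.encode({'a': 1})
>>> bson.is_valid(data)
True
>>> bson.is_valid(data[:-1])
False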

Sub-modules:

binary – Tools for representing binary data to be stored in MongoDB
bson.binary.BINARY_SUBTYPE = 0

BSON binary subtype for binary data.

This is the default subtype for binary data.

bson.binary.FUNCTION_SUBTYPE = 1

BSON binary subtype for functions.

bson.binary.OLD_BINARY_SUBTYPE = 2

Old BSON binary subtype for binary data.

This is the old default subtype, the current default is BINARY_SUBTYPE.

bson.binary.OLD_UUID_SUBTYPE = 3

Old BSON binary subtype for a UUID.

uuid.UUID instances will automatically be encoded by bson using this subtype.

New in version 2.1.

bson.binary.UUID_SUBTYPE = 4

BSON binary subtype for a UUID.

This is the new BSON binary subtype for UUIDs. The current default is OLD_UUID_SUBTYPE.

Changed in version 2.1: Changed to subtype 4.

bson.binary.STANDARD = 4

The standard UUID representation.

uuid.UUID instances will automatically be encoded to and decoded from BSON binary, using RFC-4122 byte order with binary subtype UUID_SUBTYPE.

New in version 3.0.

bson.binary.PYTHON_LEGACY = 3

The Python legacy UUID representation.

uuid.UUID instances will automatically be encoded to and decoded from BSON binary, using RFC-4122 byte order with binary subtype OLD_UUID_SUBTYPE.

New in version 3.0.

bson.binary.JAVA_LEGACY = 5

The Java legacy UUID representation.

uuid.UUID instances will automatically be encoded to and decoded from BSON binary subtype OLD_UUID_SUBTYPE, using the Java driver’s legacy byte order.

Changed in version 3.6: BSON binary subtype 4 is decoded using RFC-4122 byte order.

New in version 2.3.

bson.binary.CSHARP_LEGACY = 6

The C#/.net legacy UUID representation.

uuid.UUID instances will automatically be encoded to and decoded from BSON binary subtype OLD_UUID_SUBTYPE, using the C# driver’s legacy byte order.

Changed in version 3.6: BSON binary subtype 4 is decoded using RFC-4122 byte order.

New in version 2.3.

bson.binary.MD5_SUBTYPE = 5

BSON binary subtype for an MD5 hash.

bson.binary.USER_DEFINED_SUBTYPE = 128

BSON binary subtype for any user defined structure.

class bson.binary.Binary(data, subtype=BINARY_SUBTYPE)

Bases: bytes

Representation of BSON binary data.

This is necessary because we want to represent Python strings as the BSON string type. We need to wrap binary data so we can tell the difference between what should be considered binary data and what should be considered a string when we encode to BSON.

Raises TypeError if data is not an instance of bytes (str in python 2) or subtype is not an instance of int. Raises ValueError if subtype is not in [0, 256).

Note

In python 3 instances of Binary with subtype 0 will be decoded directly to bytes.

Parameters:
  • data: the binary data to represent. Can be any bytes-like type that implements the buffer protocol.
  • subtype (optional): the binary subtype to use

Changed in version 3.9: Support any bytes-like type that implements the buffer protocol.

subtype

Subtype of this binary data.
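
A short sketch of wrapping bytes for storage (subtype 0 is the default):

>>> from bson.binary import Binary
>>> data = Binary(b'\x01\x02\x03\x04')
>>> data.subtype
0
>>> bytes(data)
b'\x01\x02\x03\x04'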

class bson.binary.UUIDLegacy(obj)

Bases: bson.binary.Binary

UUID wrapper to support working with UUIDs stored as PYTHON_LEGACY.

>>> import uuid
>>> from bson.binary import Binary, UUIDLegacy, STANDARD
>>> from bson.codec_options import CodecOptions
>>> my_uuid = uuid.uuid4()
>>> coll = db.get_collection('test',
...                          CodecOptions(uuid_representation=STANDARD))
>>> coll.insert_one({'uuid': Binary(my_uuid.bytes, 3)}).inserted_id
ObjectId('...')
>>> coll.count_documents({'uuid': my_uuid})
0
>>> coll.count_documents({'uuid': UUIDLegacy(my_uuid)})
1
>>> coll.find({'uuid': UUIDLegacy(my_uuid)})[0]['uuid']
UUID('...')
>>>
>>> # Convert from subtype 3 to subtype 4
>>> doc = coll.find_one({'uuid': UUIDLegacy(my_uuid)})
>>> coll.replace_one({"_id": doc["_id"]}, doc).matched_count
1
>>> coll.count_documents({'uuid': UUIDLegacy(my_uuid)})
0
>>> coll.count_documents({'uuid': {'$in': [UUIDLegacy(my_uuid), my_uuid]}})
1
>>> coll.find_one({'uuid': my_uuid})['uuid']
UUID('...')

Raises TypeError if obj is not an instance of UUID.

Parameters:
  • obj: An instance of UUID.
uuid

UUID instance wrapped by this UUIDLegacy instance.

code – Tools for representing JavaScript code

Tools for representing JavaScript code in BSON.

class bson.code.Code(code, scope=None, **kwargs)

Bases: str

BSON’s JavaScript code type.

Raises TypeError if code is not an instance of basestring (str in python 3) or scope is not None or an instance of dict.

Scope variables can be set by passing a dictionary as the scope argument or by using keyword arguments. If a variable is set as a keyword argument it will override any setting for that variable in the scope dictionary.

Parameters:
  • code: A string containing JavaScript code to be evaluated or another instance of Code. In the latter case, the scope of code becomes this Code’s scope.
  • scope (optional): dictionary representing the scope in which code should be evaluated - a mapping from identifiers (as strings) to values. Defaults to None. This is applied after any scope associated with a given code above.
  • **kwargs (optional): scope variables can also be passed as keyword arguments. These are applied after scope and code.

Changed in version 3.4: The default value for scope is None instead of {}.

scope

Scope dictionary for this instance or None.
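
A short sketch of how scope and keyword arguments combine (keyword arguments are applied last):

>>> from bson.code import Code
>>> code = Code("function () { return x + y; }", scope={'x': 1}, y=2)
>>> code.scope == {'x': 1, 'y': 2}
True
>>> str(code)
'function () { return x + y; }'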

codec_options – Tools for specifying BSON codec options

Tools for specifying BSON codec options.

class bson.codec_options.CodecOptions

Encapsulates options used when encoding and/or decoding BSON.

The document_class option is used to define a custom type for use decoding BSON documents. Access to the underlying raw BSON bytes for a document is available using the RawBSONDocument type:

>>> from bson.raw_bson import RawBSONDocument
>>> from bson.codec_options import CodecOptions
>>> codec_options = CodecOptions(document_class=RawBSONDocument)
>>> coll = db.get_collection('test', codec_options=codec_options)
>>> doc = coll.find_one()
>>> doc.raw
'\x16\x00\x00\x00\x07_id\x00[0\x165\x91\x10\xea\x14\xe8\xc5\x8b\x93\x00'

The document class can be any type that inherits from MutableMapping:

>>> class AttributeDict(dict):
...     # A dict that supports attribute access.
...     def __getattr__(self, key):
...         return self[key]
...     def __setattr__(self, key, value):
...         self[key] = value
...
>>> codec_options = CodecOptions(document_class=AttributeDict)
>>> coll = db.get_collection('test', codec_options=codec_options)
>>> doc = coll.find_one()
>>> doc._id
ObjectId('5b3016359110ea14e8c58b93')

See Datetimes and Timezones for examples using the tz_aware and tzinfo options.

See UUIDLegacy for examples using the uuid_representation option.

Parameters:
  • document_class: BSON documents returned in queries will be decoded to an instance of this class. Must be a subclass of MutableMapping. Defaults to dict.
  • tz_aware: If True, BSON datetimes will be decoded to timezone aware instances of datetime. Otherwise they will be naive. Defaults to False.
  • uuid_representation: The BSON representation to use when encoding and decoding instances of UUID. Defaults to PYTHON_LEGACY.
  • unicode_decode_error_handler: The error handler to apply when a Unicode-related error occurs during BSON decoding that would otherwise raise UnicodeDecodeError. Valid options include ‘strict’, ‘replace’, and ‘ignore’. Defaults to ‘strict’.
  • tzinfo: A tzinfo subclass that specifies the timezone to/from which datetime objects should be encoded/decoded.
  • type_registry: Instance of TypeRegistry used to customize encoding and decoding behavior.

New in version 3.8: type_registry attribute.

Warning

Care must be taken when changing unicode_decode_error_handler from its default value (‘strict’). The ‘replace’ and ‘ignore’ modes should not be used when documents retrieved from the server will be modified in the client application and stored back to the server.

with_options(**kwargs)

Make a copy of this CodecOptions, overriding some options:

>>> from bson.codec_options import DEFAULT_CODEC_OPTIONS
>>> DEFAULT_CODEC_OPTIONS.tz_aware
False
>>> options = DEFAULT_CODEC_OPTIONS.with_options(tz_aware=True)
>>> options.tz_aware
True

New in version 3.5.

class bson.codec_options.TypeCodec

Base class for defining type codec classes which describe how a custom type can be transformed to/from one of the types bson can already encode/decode.

Codec classes must implement the python_type attribute, and the transform_python method to support encoding, as well as the bson_type attribute, and the transform_bson method to support decoding.

See The TypeCodec Class documentation for an example.

class bson.codec_options.TypeDecoder

Base class for defining type codec classes which describe how a BSON type can be transformed to a custom type.

Codec classes must implement the bson_type attribute, and the transform_bson method to support decoding.

See The TypeCodec Class documentation for an example.

bson_type

The BSON type to be converted into our own type.

transform_bson(value)

Convert the given BSON value into our own type.

class bson.codec_options.TypeEncoder

Base class for defining type codec classes which describe how a custom type can be transformed to one of the types BSON understands.

Codec classes must implement the python_type attribute, and the transform_python method to support encoding.

See The TypeCodec Class documentation for an example.

python_type

The Python type to be converted into something serializable.

transform_python(value)

Convert the given Python object into something serializable.

class bson.codec_options.TypeRegistry(type_codecs=None, fallback_encoder=None)

Encapsulates type codecs used when encoding and/or decoding BSON, as well as the fallback encoder. Type registries cannot be modified after instantiation.

TypeRegistry can be initialized with an iterable of type codecs, and a callable for the fallback encoder:

>>> from bson.codec_options import TypeRegistry
>>> type_registry = TypeRegistry([Codec1, Codec2, Codec3, ...],
...                              fallback_encoder)

See The TypeRegistry Class documentation for an example.

Parameters:
  • type_codecs (optional): iterable of type codec instances. If type_codecs contains multiple codecs that transform a single python or BSON type, the transformation specified by the type codec occurring last prevails. A TypeError will be raised if one or more type codecs modify the encoding behavior of a built-in bson type.
  • fallback_encoder (optional): callable that accepts a single, unencodable python value and transforms it into a type that bson can encode. See The fallback_encoder Callable documentation for an example.
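
A minimal sketch of a fallback encoder that converts decimal.Decimal values to Decimal128 so they can be encoded (the function and variable names are illustrative):

>>> from decimal import Decimal
>>> import bson
>>> from bson.codec_options import CodecOptions, TypeRegistry
>>> from bson.decimal128 import Decimal128
>>> def decimal_fallback(value):
...     # Called only for values bson cannot encode natively.
...     if isinstance(value, Decimal):
...         return Decimal128(value)
...     return value
...
>>> registry = TypeRegistry(fallback_encoder=decimal_fallback)
>>> opts = CodecOptions(type_registry=registry)
>>> encoded = bson.encode({'price': Decimal('9.99')}, codec_options=opts)
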
dbref – Tools for manipulating DBRefs (references to documents stored in MongoDB)

Tools for manipulating DBRefs (references to MongoDB documents).

class bson.dbref.DBRef(collection, id, database=None, _extra={}, **kwargs)

Initialize a new DBRef.

Raises TypeError if collection or database is not an instance of basestring (str in python 3). database is optional and allows references to documents to work across databases. Any additional keyword arguments will create additional fields in the resultant embedded document.

Parameters:
  • collection: name of the collection the document is stored in
  • id: the value of the document’s "_id" field
  • database (optional): name of the database to reference
  • **kwargs (optional): additional keyword arguments will create additional, custom fields

See also

The MongoDB documentation on

dbrefs

as_doc()

Get the SON document representation of this DBRef.

Generally not needed by application developers.

collection

Get the name of this DBRef’s collection as unicode.

database

Get the name of this DBRef’s database.

Returns None if this DBRef doesn’t specify a database.

id

Get this DBRef’s _id.
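
A short sketch of constructing and inspecting a DBRef (the collection and id values are arbitrary):

>>> from bson.dbref import DBRef
>>> from bson.objectid import ObjectId
>>> ref = DBRef('people', ObjectId('5b3016359110ea14e8c58b93'), database='test')
>>> (ref.collection, ref.database)
('people', 'test')
>>> ref.as_doc()
SON([('$ref', 'people'), ('$id', ObjectId('5b3016359110ea14e8c58b93')), ('$db', 'test')])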

decimal128 – Support for BSON Decimal128

Tools for working with the BSON decimal128 type.

New in version 3.4.

Note

The Decimal128 BSON type requires MongoDB 3.4+.

class bson.decimal128.Decimal128(value)

BSON Decimal128 type:

>>> Decimal128(Decimal("0.0005"))
Decimal128('0.0005')
>>> Decimal128("0.0005")
Decimal128('0.0005')
>>> Decimal128((3474527112516337664, 5))
Decimal128('0.0005')
Parameters:
  • value: An instance of decimal.Decimal, string, or tuple of (high bits, low bits) from Binary Integer Decimal (BID) format.

Note

Decimal128 uses an instance of decimal.Context configured for IEEE-754 Decimal128 when validating parameters. Signals like decimal.InvalidOperation, decimal.Inexact, and decimal.Overflow are trapped and raised as exceptions:

>>> Decimal128(".13.1")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  ...
decimal.InvalidOperation: [<class 'decimal.ConversionSyntax'>]
>>>
>>> Decimal128("1E-6177")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  ...
decimal.Inexact: [<class 'decimal.Inexact'>]
>>>
>>> Decimal128("1E6145")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  ...
decimal.Overflow: [<class 'decimal.Overflow'>, <class 'decimal.Rounded'>]

To ensure the result of a calculation can always be stored as BSON Decimal128 use the context returned by create_decimal128_context():

>>> import decimal
>>> decimal128_ctx = create_decimal128_context()
>>> with decimal.localcontext(decimal128_ctx) as ctx:
...     Decimal128(ctx.create_decimal(".13.3"))
...
Decimal128('NaN')
>>>
>>> with decimal.localcontext(decimal128_ctx) as ctx:
...     Decimal128(ctx.create_decimal("1E-6177"))
...
Decimal128('0E-6176')
>>>
>>> with decimal.localcontext(decimal128_ctx) as ctx:
...     Decimal128(ctx.create_decimal("1E6145"))
...
Decimal128('Infinity')

To match the behavior of MongoDB’s Decimal128 implementation, str(Decimal128(value)) may not match str(Decimal(value)) for NaN values:

>>> Decimal128(Decimal('NaN'))
Decimal128('NaN')
>>> Decimal128(Decimal('-NaN'))
Decimal128('NaN')
>>> Decimal128(Decimal('sNaN'))
Decimal128('NaN')
>>> Decimal128(Decimal('-sNaN'))
Decimal128('NaN')

However, to_decimal() will return the exact value:

>>> Decimal128(Decimal('NaN')).to_decimal()
Decimal('NaN')
>>> Decimal128(Decimal('-NaN')).to_decimal()
Decimal('-NaN')
>>> Decimal128(Decimal('sNaN')).to_decimal()
Decimal('sNaN')
>>> Decimal128(Decimal('-sNaN')).to_decimal()
Decimal('-sNaN')

Two instances of Decimal128 compare equal if their Binary Integer Decimal encodings are equal:

>>> Decimal128('NaN') == Decimal128('NaN')
True
>>> Decimal128('NaN').bid == Decimal128('NaN').bid
True

This differs from decimal.Decimal comparisons for NaN:

>>> Decimal('NaN') == Decimal('NaN')
False
bid

The Binary Integer Decimal (BID) encoding of this instance.

classmethod from_bid(value)

Create an instance of Decimal128 from a Binary Integer Decimal string.

Parameters:
  • value: a 16-byte string (128-bit IEEE 754-2008 decimal floating point in Binary Integer Decimal (BID) format).
to_decimal()

Returns an instance of decimal.Decimal for this Decimal128.

bson.decimal128.create_decimal128_context()

Returns an instance of decimal.Context appropriate for working with IEEE-754 128-bit decimal floating point values.

errors – Exceptions raised by the bson package

Exceptions raised by the BSON package.

exception bson.errors.BSONError

Base class for all BSON exceptions.

exception bson.errors.InvalidBSON

Raised when trying to create a BSON object from invalid data.

exception bson.errors.InvalidDocument

Raised when trying to create a BSON object from an invalid document.

exception bson.errors.InvalidId

Raised when trying to create an ObjectId from invalid data.

exception bson.errors.InvalidStringData

Raised when trying to encode a string containing non-UTF8 data.

int64 – Tools for representing BSON int64

New in version 3.0.

A BSON wrapper for long (int in Python 3).

class bson.int64.Int64

Representation of the BSON int64 type.

This is necessary because every integral number is an int in Python 3. Small integral numbers are encoded to BSON int32 by default, but Int64 numbers will always be encoded to BSON int64.

Parameters:
  • value: the numeric value to represent
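
A small sketch forcing int64 encoding for a value that would otherwise fit in an int32 (assuming the module-level bson.encode()/bson.decode() helpers; output shown for Python 3):

>>> import bson
>>> from bson.int64 import Int64
>>> doc = bson.decode(bson.encode({'n': Int64(1)}))
>>> type(doc['n'])
<class 'bson.int64.Int64'>
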
json_util – Tools for using Python’s json module with BSON documents

Tools for using Python’s json module with BSON documents.

This module provides two helper methods dumps and loads that wrap the native json methods and provide explicit BSON conversion to and from JSON. JSONOptions provides a way to control how JSON is emitted and parsed, with the default being the legacy PyMongo format. json_util can also generate Canonical or Relaxed Extended JSON when CANONICAL_JSON_OPTIONS or RELAXED_JSON_OPTIONS is provided, respectively.

Example usage (deserialization):

>>> from bson.json_util import loads
>>> loads('[{"foo": [1, 2]}, {"bar": {"hello": "world"}}, {"code": {"$scope": {}, "$code": "function x() { return 1; }"}}, {"bin": {"$type": "80", "$binary": "AQIDBA=="}}]')
[{u'foo': [1, 2]}, {u'bar': {u'hello': u'world'}}, {u'code': Code('function x() { return 1; }', {})}, {u'bin': Binary('...', 128)}]

Example usage (serialization):

>>> from bson import Binary, Code
>>> from bson.json_util import dumps
>>> dumps([{'foo': [1, 2]},
...        {'bar': {'hello': 'world'}},
...        {'code': Code("function x() { return 1; }", {})},
...        {'bin': Binary(b"\x01\x02\x03\x04")}])
'[{"foo": [1, 2]}, {"bar": {"hello": "world"}}, {"code": {"$code": "function x() { return 1; }", "$scope": {}}}, {"bin": {"$binary": "AQIDBA==", "$type": "00"}}]'

Example usage (with CANONICAL_JSON_OPTIONS):

>>> from bson import Binary, Code
>>> from bson.json_util import dumps, CANONICAL_JSON_OPTIONS
>>> dumps([{'foo': [1, 2]},
...        {'bar': {'hello': 'world'}},
...        {'code': Code("function x() { return 1; }")},
...        {'bin': Binary(b"\x01\x02\x03\x04")}],
...       json_options=CANONICAL_JSON_OPTIONS)
'[{"foo": [{"$numberInt": "1"}, {"$numberInt": "2"}]}, {"bar": {"hello": "world"}}, {"code": {"$code": "function x() { return 1; }"}}, {"bin": {"$binary": {"base64": "AQIDBA==", "subType": "00"}}}]'

Example usage (with RELAXED_JSON_OPTIONS):

>>> from bson import Binary, Code
>>> from bson.json_util import dumps, RELAXED_JSON_OPTIONS
>>> dumps([{'foo': [1, 2]},
...        {'bar': {'hello': 'world'}},
...        {'code': Code("function x() { return 1; }")},
...        {'bin': Binary(b"\x01\x02\x03\x04")}],
...       json_options=RELAXED_JSON_OPTIONS)
'[{"foo": [1, 2]}, {"bar": {"hello": "world"}}, {"code": {"$code": "function x() { return 1; }"}}, {"bin": {"$binary": {"base64": "AQIDBA==", "subType": "00"}}}]'

Alternatively, you can manually pass the default to json.dumps(). It won’t handle Binary and Code instances (as they are extended strings, you can’t provide custom defaults for them), but it will be faster as there is less recursion.
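
For example, a sketch of passing default directly to the standard library (output shown for the default legacy options; the ObjectId value is arbitrary):

>>> import json
>>> from bson.json_util import default
>>> from bson.objectid import ObjectId
>>> json.dumps({'_id': ObjectId('5b3016359110ea14e8c58b93')}, default=default)
'{"_id": {"$oid": "5b3016359110ea14e8c58b93"}}'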

Note

If your application does not need the flexibility offered by JSONOptions and spends a large amount of time in the json_util module, look to python-bsonjs for a nice performance improvement. python-bsonjs is a fast BSON to MongoDB Extended JSON converter for Python built on top of libbson. python-bsonjs works best with PyMongo when using RawBSONDocument.

Changed in version 2.8: The output format for Timestamp has changed from ‘{“t”: <int>, “i”: <int>}’ to ‘{“$timestamp”: {“t”: <int>, “i”: <int>}}’. This new format will be decoded to an instance of Timestamp. The old format will continue to be decoded to a python dict as before. Encoding to the old format is no longer supported as it was never correct and loses type information. Added support for $numberLong and $undefined - new in MongoDB 2.6 - and parsing $date in ISO-8601 format.

Changed in version 2.7: Preserves order when rendering SON, Timestamp, Code, Binary, and DBRef instances.

Changed in version 2.3: Added dumps and loads helpers to automatically handle conversion to and from json and supports Binary and Code

class bson.json_util.DatetimeRepresentation
LEGACY = 0

Legacy MongoDB Extended JSON datetime representation.

datetime.datetime instances will be encoded to JSON in the format {“$date”: <dateAsMilliseconds>}, where dateAsMilliseconds is a 64-bit signed integer giving the number of milliseconds since the Unix epoch UTC. This was the default encoding before PyMongo version 3.4.

New in version 3.4.

NUMBERLONG = 1

NumberLong datetime representation.

datetime.datetime instances will be encoded to JSON in the format {“$date”: {“$numberLong”: “<dateAsMilliseconds>”}}, where dateAsMilliseconds is the string representation of a 64-bit signed integer giving the number of milliseconds since the Unix epoch UTC.

New in version 3.4.

ISO8601 = 2

ISO-8601 datetime representation.

datetime.datetime instances greater than or equal to the Unix epoch UTC will be encoded to JSON in the format {“$date”: “<ISO-8601>”}. datetime.datetime instances before the Unix epoch UTC will be encoded as if the datetime representation is NUMBERLONG.

New in version 3.4.

class bson.json_util.JSONMode
LEGACY = 0

Legacy Extended JSON representation.

In this mode, dumps() produces PyMongo’s legacy non-standard JSON output. Consider using RELAXED or CANONICAL instead.

New in version 3.5.

RELAXED = 1

Relaxed Extended JSON representation.

In this mode, dumps() produces Relaxed Extended JSON, a mostly JSON-like format. Consider using this for things like a web API, where one is sending a document (or a projection of a document) that only uses ordinary JSON type primitives. In particular, the int, Int64, and float numeric types are represented in the native JSON number format. This output is also the most human readable and is useful for debugging and documentation.

See also

The specification for Relaxed Extended JSON.

New in version 3.5.

CANONICAL = 2

Canonical Extended JSON representation.

In this mode, dumps() produces Canonical Extended JSON, a type preserving format. Consider using this for things like testing, where one has to precisely specify expected types in JSON. In particular, the int, Int64, and float numeric types are encoded with type wrappers.

See also

The specification for Canonical Extended JSON.

New in version 3.5.

class bson.json_util.JSONOptions

Encapsulates JSON options for dumps() and loads().

Parameters:
  • strict_number_long: If True, Int64 objects are encoded to MongoDB Extended JSON’s Strict mode type NumberLong, i.e. '{"$numberLong": "<number>" }'. Otherwise they will be encoded as an int. Defaults to False.
  • datetime_representation: The representation to use when encoding instances of datetime.datetime. Defaults to LEGACY.
  • strict_uuid: If True, uuid.UUID objects are encoded to MongoDB Extended JSON’s Strict mode type Binary. Otherwise they will be encoded as '{"$uuid": "<hex>" }'. Defaults to False.
  • json_mode: The JSONMode to use when encoding BSON types to Extended JSON. Defaults to LEGACY.
  • document_class: BSON documents returned by loads() will be decoded to an instance of this class. Must be a subclass of collections.MutableMapping. Defaults to dict.
  • uuid_representation: The BSON representation to use when encoding and decoding instances of uuid.UUID. Defaults to PYTHON_LEGACY.
  • tz_aware: If True, MongoDB Extended JSON’s Strict mode type Date will be decoded to timezone aware instances of datetime.datetime. Otherwise they will be naive. Defaults to True.
  • tzinfo: A datetime.tzinfo subclass that specifies the timezone from which datetime objects should be decoded. Defaults to utc.
  • args: arguments to CodecOptions
  • kwargs: arguments to CodecOptions

See also

The specification for Relaxed and Canonical Extended JSON.

New in version 3.4.

Changed in version 3.5: Accepts the optional parameter json_mode.

bson.json_util.LEGACY_JSON_OPTIONS = JSONOptions(strict_number_long=False, datetime_representation=0, strict_uuid=False, json_mode=0, document_class=dict, tz_aware=True, uuid_representation=PYTHON_LEGACY, unicode_decode_error_handler='strict', tzinfo=<bson.tz_util.FixedOffset object>, type_registry=TypeRegistry(type_codecs=[], fallback_encoder=None))

JSONOptions for encoding to PyMongo’s legacy JSON format.

See also

The documentation for bson.json_util.JSONMode.LEGACY.

New in version 3.5.

bson.json_util.DEFAULT_JSON_OPTIONS = JSONOptions(strict_number_long=False, datetime_representation=0, strict_uuid=False, json_mode=0, document_class=dict, tz_aware=True, uuid_representation=PYTHON_LEGACY, unicode_decode_error_handler='strict', tzinfo=<bson.tz_util.FixedOffset object>, type_registry=TypeRegistry(type_codecs=[], fallback_encoder=None))

The default JSONOptions for JSON encoding/decoding.

The same as LEGACY_JSON_OPTIONS. This will change to RELAXED_JSON_OPTIONS in a future release.

New in version 3.4.

bson.json_util.CANONICAL_JSON_OPTIONS = JSONOptions(strict_number_long=True, datetime_representation=1, strict_uuid=True, json_mode=2, document_class=dict, tz_aware=True, uuid_representation=PYTHON_LEGACY, unicode_decode_error_handler='strict', tzinfo=<bson.tz_util.FixedOffset object>, type_registry=TypeRegistry(type_codecs=[], fallback_encoder=None))

JSONOptions for Canonical Extended JSON.

See also

The documentation for bson.json_util.JSONMode.CANONICAL.

New in version 3.5.

bson.json_util.RELAXED_JSON_OPTIONS = JSONOptions(strict_number_long=False, datetime_representation=2, strict_uuid=True, json_mode=1, document_class=dict, tz_aware=True, uuid_representation=PYTHON_LEGACY, unicode_decode_error_handler='strict', tzinfo=<bson.tz_util.FixedOffset object>, type_registry=TypeRegistry(type_codecs=[], fallback_encoder=None))

JSONOptions for Relaxed Extended JSON.

See also

The documentation for bson.json_util.JSONMode.RELAXED.

New in version 3.5.

bson.json_util.STRICT_JSON_OPTIONS = JSONOptions(strict_number_long=True, datetime_representation=2, strict_uuid=True, json_mode=0, document_class=dict, tz_aware=True, uuid_representation=PYTHON_LEGACY, unicode_decode_error_handler='strict', tzinfo=<bson.tz_util.FixedOffset object>, type_registry=TypeRegistry(type_codecs=[], fallback_encoder=None))

DEPRECATED - JSONOptions for MongoDB Extended JSON’s Strict mode encoding.

New in version 3.4.

Changed in version 3.5: Deprecated. Use RELAXED_JSON_OPTIONS or CANONICAL_JSON_OPTIONS instead.

bson.json_util.dumps(obj, *args, **kwargs)

Helper function that wraps json.dumps().

Recursive function that handles all BSON types including Binary and Code.

Parameters:
  • json_options (optional): A JSONOptions instance used to modify the encoding of MongoDB Extended JSON types. Defaults to DEFAULT_JSON_OPTIONS.

Changed in version 3.4: Accepts optional parameter json_options. See JSONOptions.

Changed in version 2.7: Preserves order when rendering SON, Timestamp, Code, Binary, and DBRef instances.

bson.json_util.loads(s, *args, **kwargs)

Helper function that wraps json.loads().

Automatically passes the object_hook for BSON type conversion.

Raises TypeError, ValueError, KeyError, or InvalidId on invalid MongoDB Extended JSON.

Parameters:
  • json_options (optional): A JSONOptions instance used to modify the decoding of MongoDB Extended JSON types. Defaults to DEFAULT_JSON_OPTIONS.

Changed in version 3.5: Parses Relaxed and Canonical Extended JSON as well as PyMongo’s legacy format. Now raises TypeError or ValueError when parsing JSON type wrappers with values of the wrong type or any extra keys.

Changed in version 3.4: Accepts optional parameter json_options. See JSONOptions.

bson.json_util.object_pairs_hook(pairs, json_options=JSONOptions(strict_number_long=False, datetime_representation=0, strict_uuid=False, json_mode=0, document_class=dict, tz_aware=True, uuid_representation=PYTHON_LEGACY, unicode_decode_error_handler='strict', tzinfo=<bson.tz_util.FixedOffset object>, type_registry=TypeRegistry(type_codecs=[], fallback_encoder=None)))
bson.json_util.object_hook(dct, json_options=JSONOptions(strict_number_long=False, datetime_representation=0, strict_uuid=False, json_mode=0, document_class=dict, tz_aware=True, uuid_representation=PYTHON_LEGACY, unicode_decode_error_handler='strict', tzinfo=<bson.tz_util.FixedOffset object>, type_registry=TypeRegistry(type_codecs=[], fallback_encoder=None)))
bson.json_util.default(obj, json_options=JSONOptions(strict_number_long=False, datetime_representation=0, strict_uuid=False, json_mode=0, document_class=dict, tz_aware=True, uuid_representation=PYTHON_LEGACY, unicode_decode_error_handler='strict', tzinfo=<bson.tz_util.FixedOffset object>, type_registry=TypeRegistry(type_codecs=[], fallback_encoder=None)))
max_key – Representation for the MongoDB internal MaxKey type

Representation for the MongoDB internal MaxKey type.

class bson.max_key.MaxKey

MongoDB internal MaxKey type.

Changed in version 2.7: MaxKey now implements comparison operators.

min_key – Representation for the MongoDB internal MinKey type

Representation for the MongoDB internal MinKey type.

class bson.min_key.MinKey

MongoDB internal MinKey type.

Changed in version 2.7: MinKey now implements comparison operators.

objectid – Tools for working with MongoDB ObjectIds

Tools for working with MongoDB ObjectIds.

class bson.objectid.ObjectId(oid=None)

Initialize a new ObjectId.

An ObjectId is a 12-byte unique identifier consisting of:

  • a 4-byte value representing the seconds since the Unix epoch,
  • a 5-byte random value,
  • a 3-byte counter, starting with a random value.

By default, ObjectId() creates a new unique identifier. The optional parameter oid can be an ObjectId, or any 12 bytes or, in Python 2, any 12-character str.

For example, the 12 bytes b'foo-bar-quux' do not follow the ObjectId specification but they are acceptable input:

>>> ObjectId(b'foo-bar-quux')
ObjectId('666f6f2d6261722d71757578')

oid can also be a unicode or str of 24 hex digits:

>>> ObjectId('0123456789ab0123456789ab')
ObjectId('0123456789ab0123456789ab')
>>>
>>> # A u-prefixed unicode literal:
>>> ObjectId(u'0123456789ab0123456789ab')
ObjectId('0123456789ab0123456789ab')

Raises InvalidId if oid is not 12 bytes nor 24 hex digits, or TypeError if oid is not an accepted type.

Parameters:
  • oid (optional): a valid ObjectId.

See also

The MongoDB documentation on

objectids

Changed in version 3.8: ObjectId now implements the ObjectID specification version 0.2.

str(o)

Get a hex encoded version of ObjectId o.

The following property always holds:

>>> o = ObjectId()
>>> o == ObjectId(str(o))
True

This representation is useful for urls or other places where o.binary is inappropriate.

binary

12-byte binary representation of this ObjectId.

classmethod from_datetime(generation_time)

Create a dummy ObjectId instance with a specific generation time.

This method is useful for doing range queries on a field containing ObjectId instances.

Warning

It is not safe to insert a document containing an ObjectId generated using this method. This method deliberately eliminates the uniqueness guarantee that ObjectIds generally provide. ObjectIds generated with this method should be used exclusively in queries.

generation_time will be converted to UTC. Naive datetime instances will be treated as though they already contain UTC.

An example using this helper to get documents where "_id" was generated before January 1, 2010 would be:

>>> gen_time = datetime.datetime(2010, 1, 1)
>>> dummy_id = ObjectId.from_datetime(gen_time)
>>> result = collection.find({"_id": {"$lt": dummy_id}})
Parameters:
  • generation_time: datetime to be used as the generation time for the resulting ObjectId.
generation_time

A datetime.datetime instance representing the time of generation for this ObjectId.

The datetime.datetime is timezone aware, and represents the generation time in UTC. It is precise to the second.

classmethod is_valid(oid)

Checks whether an oid string is valid.

Parameters:
  • oid: the object id to validate

New in version 2.3.

raw_bson – Tools for representing raw BSON documents

Tools for representing raw BSON documents.

bson.raw_bson.DEFAULT_RAW_BSON_OPTIONS = CodecOptions(document_class=<class 'bson.raw_bson.RawBSONDocument'>, tz_aware=False, uuid_representation=PYTHON_LEGACY, unicode_decode_error_handler='strict', tzinfo=None, type_registry=TypeRegistry(type_codecs=[], fallback_encoder=None))

The default CodecOptions for RawBSONDocument.

class bson.raw_bson.RawBSONDocument(bson_bytes, codec_options=None)

Create a new RawBSONDocument

RawBSONDocument is a representation of a BSON document that provides access to the underlying raw BSON bytes. Only when a field is accessed or modified within the document does RawBSONDocument decode its bytes.

RawBSONDocument implements the Mapping abstract base class from the standard library so it can be used like a read-only dict:

>>> raw_doc = RawBSONDocument(BSON.encode({'_id': 'my_doc'}))
>>> raw_doc.raw
b'...'
>>> raw_doc['_id']
'my_doc'
Parameters:
  • bson_bytes: the BSON bytes that compose this document
  • codec_options (optional): An instance of CodecOptions whose document_class must be RawBSONDocument.

Changed in version 3.8: RawBSONDocument now validates that the bson_bytes passed in represent a single bson document.

Changed in version 3.5: If a CodecOptions is passed in, its document_class must be RawBSONDocument.

items()

Lazily decode and iterate elements in this document.

raw

The raw BSON bytes composing this document.

regex – Tools for representing MongoDB regular expressions

New in version 2.7.

Tools for representing MongoDB regular expressions.

class bson.regex.Regex(pattern, flags=0)

BSON regular expression data.

This class is useful to store and retrieve regular expressions that are incompatible with Python’s regular expression dialect.

Parameters:
  • pattern: string
  • flags: (optional) an integer bitmask, or a string of flag characters like “im” for IGNORECASE and MULTILINE
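
A short sketch of the string form of flags (each character maps to the corresponding re module flag):

>>> import re
>>> from bson.regex import Regex
>>> regex = Regex('^mongo', 'im')
>>> regex.flags == (re.IGNORECASE | re.MULTILINE)
True
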
classmethod from_native(regex)

Convert a Python regular expression into a Regex instance.

Note that in Python 3, a regular expression compiled from a str has the re.UNICODE flag set. If it is undesirable to store this flag in a BSON regular expression, unset it first:

>>> pattern = re.compile('.*')
>>> regex = Regex.from_native(pattern)
>>> regex.flags ^= re.UNICODE
>>> db.collection.insert_one({'pattern': regex})
Parameters:
  • regex: A regular expression object from re.compile().

Warning

Python regular expressions use a different syntax and different set of flags than MongoDB, which uses PCRE. A regular expression retrieved from the server may not compile in Python, or may match a different set of strings in Python than when used in a MongoDB query.

try_compile()

Compile this Regex as a Python regular expression.

Warning

Python regular expressions use a different syntax and different set of flags than MongoDB, which uses PCRE. A regular expression retrieved from the server may not compile in Python, or may match a different set of strings in Python than when used in a MongoDB query. try_compile() may raise re.error.

son – Tools for working with SON, an ordered mapping

Tools for creating and manipulating SON, the Serialized Ocument Notation.

Regular dictionaries can be used instead of SON objects, but not when the order of keys is important. A SON object can be used just like a normal Python dictionary.

class bson.son.SON(data=None, **kwargs)

SON data.

A subclass of dict that maintains ordering of keys and provides a few extra niceties for dealing with SON. SON provides an API similar to collections.OrderedDict from Python 2.7+.

clear() → None. Remove all items from D.
copy() → a shallow copy of D
get(key, default=None)

Return the value for key if key is in the dictionary, else default.

items() → a set-like object providing a view on D's items
keys() → a set-like object providing a view on D's keys
pop(k[, d]) → v, remove specified key and return the corresponding value.

If key is not found, d is returned if given, otherwise KeyError is raised

popitem() → (k, v), remove and return some (key, value) pair as a 2-tuple; but raise KeyError if D is empty.

setdefault(key, default=None)

Insert key with a value of default if key is not in the dictionary.

Return the value for key if key is in the dictionary, else default.

to_dict()

Convert a SON document to a normal Python dictionary instance.

This is trickier than just dict(…) because it needs to be recursive.

update([E, ]**F) → None. Update D from dict/iterable E and F.

If E is present and has a .keys() method, then does: for k in E: D[k] = E[k] If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v In either case, this is followed by: for k in F: D[k] = F[k]

values() → an object providing a view on D's values
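
A short sketch of key-order preservation (useful when building commands where field order matters):

>>> from bson.son import SON
>>> son = SON([('find', 'coll'), ('filter', {'x': 1})])
>>> list(son.keys())
['find', 'filter']
>>> son.to_dict() == {'find': 'coll', 'filter': {'x': 1}}
True
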
timestamp – Tools for representing MongoDB internal Timestamps

Tools for representing MongoDB internal Timestamps.

class bson.timestamp.Timestamp(time, inc)

Create a new Timestamp.

This class is only for use with the MongoDB opLog. If you need to store a regular timestamp, please use a datetime.

Raises TypeError if time is not an instance of int or datetime, or inc is not an instance of int. Raises ValueError if time or inc is not in [0, 2**32).

Parameters:
  • time: time in seconds since epoch UTC, or a naive UTC datetime, or an aware datetime
  • inc: the incrementing counter
as_datetime()

Return a datetime instance corresponding to the time portion of this Timestamp.

The returned datetime’s timezone is UTC.
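
A small sketch (the values are arbitrary):

>>> from bson.timestamp import Timestamp
>>> ts = Timestamp(1, 42)  # One second after the epoch, counter 42.
>>> (ts.time, ts.inc)
(1, 42)
>>> ts.as_datetime().isoformat()
'1970-01-01T00:00:01+00:00'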

inc

Get the inc portion of this Timestamp.

time

Get the time portion of this Timestamp.

tz_util – Utilities for dealing with timezones in Python

Timezone related utilities for BSON.

class bson.tz_util.FixedOffset(offset, name)

Fixed offset timezone, in minutes east from UTC.

Implementation based from the Python standard library documentation. Defining __getinitargs__ enables pickling / copying.

dst(dt)

datetime -> DST offset as timedelta positive east of UTC.

tzname(dt)

datetime -> string name of time zone.

utcoffset(dt)

datetime -> timedelta showing offset from UTC, negative values indicating West of UTC

bson.tz_util.utc = <bson.tz_util.FixedOffset object>

Fixed offset timezone representing UTC.

pymongo – Python driver for MongoDB

Python driver for MongoDB.

pymongo.version = '3.9.0'

The version of this PyMongo distribution, as a string.

pymongo.MongoClient

Alias for pymongo.mongo_client.MongoClient.

pymongo.MongoReplicaSetClient

Alias for pymongo.mongo_replica_set_client.MongoReplicaSetClient.

pymongo.ReadPreference

Alias for pymongo.read_preferences.ReadPreference.

pymongo.has_c()

Is the C extension installed?

pymongo.MIN_SUPPORTED_WIRE_VERSION

The minimum wire protocol version PyMongo supports.

pymongo.MAX_SUPPORTED_WIRE_VERSION

The maximum wire protocol version PyMongo supports.

Sub-modules:

bulk – The bulk write operations interface

The bulk write operations interface.

New in version 2.7.

class pymongo.bulk.BulkOperationBuilder(collection, ordered=True, bypass_document_validation=False)

DEPRECATED: Initialize a new BulkOperationBuilder instance.

Parameters:
  • collection: A Collection instance.
  • ordered (optional): If True all operations will be executed serially, in the order provided, and the entire execution will abort on the first error. If False operations will be executed in arbitrary order (possibly in parallel on the server), reporting any errors that occurred after attempting all operations. Defaults to True.
  • bypass_document_validation: (optional) If True, allows the write to opt-out of document level validation. Default is False.

Note

bypass_document_validation requires server version >= 3.2

Changed in version 3.5: Deprecated. Use bulk_write() instead.

Changed in version 3.2: Added bypass_document_validation support

execute(write_concern=None)

Execute all provided operations.

Parameters:
  • write_concern (optional): the write concern for this bulk execution.
find(selector, collation=None)

Specify selection criteria for bulk operations.

Parameters:
  • selector (dict): the selection criteria for update and remove operations.
  • collation (optional): An instance of Collation. This option is only supported on MongoDB 3.4 and above.
Returns:
  • A BulkWriteOperation instance, used to add update and remove operations to this bulk operation.

Changed in version 3.4: Added the collation option.

insert(document)

Insert a single document.

Parameters:
  • document (dict): the document to insert
class pymongo.bulk.BulkUpsertOperation(selector, bulk, collation)

An interface for adding upsert operations.

replace_one(replacement)

Replace one entire document matching the selector criteria.

Parameters:
  • replacement (dict): the replacement document
update(update)

Update all documents matching the selector.

Parameters:
  • update (dict): the update operations to apply
update_one(update)

Update one document matching the selector.

Parameters:
  • update (dict): the update operations to apply
class pymongo.bulk.BulkWriteOperation(selector, bulk, collation)

An interface for adding update or remove operations.

remove()

Remove all documents matching the selector criteria.

remove_one()

Remove a single document matching the selector criteria.

replace_one(replacement)

Replace one entire document matching the selector criteria.

Parameters:
  • replacement (dict): the replacement document
update(update)

Update all documents matching the selector criteria.

Parameters:
  • update (dict): the update operations to apply
update_one(update)

Update one document matching the selector criteria.

Parameters:
  • update (dict): the update operations to apply
upsert()

Specify that all chained update operations should be upserts.

Returns:
  • A BulkUpsertOperation instance, used to add update operations to this bulk operation.

change_stream – Watch changes on a collection, database, or cluster

Watch changes on a collection, a database, or the entire cluster.

class pymongo.change_stream.ChangeStream(target, pipeline, full_document, resume_after, max_await_time_ms, batch_size, collation, start_at_operation_time, session, start_after)

The internal abstract base class for change stream cursors.

Should not be called directly by application developers. Use pymongo.collection.Collection.watch(), pymongo.database.Database.watch(), or pymongo.mongo_client.MongoClient.watch() instead.

New in version 3.6.

See also

The MongoDB documentation on

changeStreams

alive

Does this cursor have the potential to return more data?

Note

Even if alive is True, next() can raise StopIteration and try_next() can return None.

New in version 3.8.

close()

Close this ChangeStream.

next()

Advance the cursor.

This method blocks until the next change document is returned or an unrecoverable error is raised. This method is used when iterating over all changes in the cursor. For example:

try:
    resume_token = None
    pipeline = [{'$match': {'operationType': 'insert'}}]
    with db.collection.watch(pipeline) as stream:
        for insert_change in stream:
            print(insert_change)
            resume_token = stream.resume_token
except pymongo.errors.PyMongoError:
    # The ChangeStream encountered an unrecoverable error or the
    # resume attempt failed to recreate the cursor.
    if resume_token is None:
        # There is no usable resume token because there was a
        # failure during ChangeStream initialization.
        logging.error('...')
    else:
        # Use the interrupted ChangeStream's resume token to create
        # a new ChangeStream. The new stream will continue from the
        # last seen insert change without missing any events.
        with db.collection.watch(
                pipeline, resume_after=resume_token) as stream:
            for insert_change in stream:
                print(insert_change)

Raises StopIteration if this ChangeStream is closed.

resume_token

The cached resume token that will be used to resume after the most recently returned change.

New in version 3.9.

try_next()

Advance the cursor without blocking indefinitely.

This method returns the next change document without waiting indefinitely for the next change. For example:

with db.collection.watch() as stream:
    while stream.alive:
        change = stream.try_next()
        # Note that the ChangeStream's resume token may be updated
        # even when no changes are returned.
        print("Current resume token: %r" % (stream.resume_token,))
        if change is not None:
            print("Change document: %r" % (change,))
            continue
        # We end up here when there are no recent changes.
        # Sleep for a while before trying again to avoid flooding
        # the server with getMore requests when no changes are
        # available.
        time.sleep(10)

If no change document is cached locally then this method runs a single getMore command. If the getMore yields any documents, the next document is returned, otherwise, if the getMore returns no documents (because there have been no changes) then None is returned.

Returns: The next change document, or None when no document is available after running a single getMore or when the cursor is closed.

New in version 3.8.

class pymongo.change_stream.ClusterChangeStream(target, pipeline, full_document, resume_after, max_await_time_ms, batch_size, collation, start_at_operation_time, session, start_after)

A change stream that watches changes on all collections in the cluster.

Should not be called directly by application developers. Use helper method pymongo.mongo_client.MongoClient.watch() instead.

New in version 3.7.

class pymongo.change_stream.CollectionChangeStream(target, pipeline, full_document, resume_after, max_await_time_ms, batch_size, collation, start_at_operation_time, session, start_after)

A change stream that watches changes on a single collection.

Should not be called directly by application developers. Use helper method pymongo.collection.Collection.watch() instead.

New in version 3.7.

class pymongo.change_stream.DatabaseChangeStream(target, pipeline, full_document, resume_after, max_await_time_ms, batch_size, collation, start_at_operation_time, session, start_after)

A change stream that watches changes on all collections in a database.

Should not be called directly by application developers. Use helper method pymongo.database.Database.watch() instead.

New in version 3.7.

client_session – Logical sessions for sequential operations

Logical sessions for ordering sequential operations.

Requires MongoDB 3.6.

New in version 3.6.

Causally Consistent Reads
with client.start_session(causal_consistency=True) as session:
    collection = client.db.collection
    collection.update_one({'_id': 1}, {'$set': {'x': 10}}, session=session)
    secondary_c = collection.with_options(
        read_preference=ReadPreference.SECONDARY)

    # A secondary read waits for replication of the write.
    secondary_c.find_one({'_id': 1}, session=session)

If causal_consistency is True (the default), read operations that use the session are causally after previous read and write operations. Using a causally consistent session, an application can read its own writes and is guaranteed monotonic reads, even when reading from replica set secondaries.

See also

The MongoDB documentation on

causal-consistency

Transactions

MongoDB 4.0 adds support for transactions on replica set primaries. A transaction is associated with a ClientSession. To start a transaction on a session, use ClientSession.start_transaction() in a with-statement. Then, execute an operation within the transaction by passing the session to the operation:

orders = client.db.orders
inventory = client.db.inventory
with client.start_session() as session:
    with session.start_transaction():
        orders.insert_one({"sku": "abc123", "qty": 100}, session=session)
        inventory.update_one({"sku": "abc123", "qty": {"$gte": 100}},
                             {"$inc": {"qty": -100}}, session=session)

Upon normal completion of the with session.start_transaction() block, the transaction is committed automatically via ClientSession.commit_transaction(). If the block exits with an exception, the transaction is aborted automatically via ClientSession.abort_transaction().

For multi-document transactions, you can only specify read/write (CRUD) operations on existing collections. A multi-document transaction cannot include create or drop operations for collections or indexes; this includes any insert operation that would result in the creation of a new collection.

A session may only have a single active transaction at a time, but multiple transactions can be executed in sequence on the same session.

New in version 3.7.

Sharded Transactions

PyMongo 3.9 adds support for transactions on sharded clusters running MongoDB 4.2. Sharded transactions have the same API as replica set transactions. When running a transaction against a sharded cluster, the session is pinned to the mongos server selected for the first operation in the transaction. All subsequent operations that are part of the same transaction are routed to the same mongos server. When the transaction is completed, by running either commitTransaction or abortTransaction, the session is unpinned.

New in version 3.9.

See also

The MongoDB documentation on

transactions

Classes
class pymongo.client_session.ClientSession(client, server_session, options, authset, implicit)

A session for ordering sequential operations.

abort_transaction()

Abort a multi-statement transaction.

New in version 3.7.

advance_cluster_time(cluster_time)

Update the cluster time for this session.

Parameters:
  • cluster_time: The cluster_time from another ClientSession instance.
advance_operation_time(operation_time)

Update the operation time for this session.

Parameters:
  • operation_time: The operation_time from another ClientSession instance.
client

The MongoClient this session was created from.

cluster_time

The cluster time returned by the last operation executed in this session.

commit_transaction()

Commit a multi-statement transaction.

New in version 3.7.

end_session()

Finish this session. If a transaction has started, abort it.

It is an error to use the session after the session has ended.

has_ended

True if this session is finished.

operation_time

The operation time returned by the last operation executed in this session.

options

The SessionOptions this session was created with.

session_id

A BSON document, the opaque server session identifier.

start_transaction(read_concern=None, write_concern=None, read_preference=None, max_commit_time_ms=None)

Start a multi-statement transaction.

Takes the same arguments as TransactionOptions.

Changed in version 3.9: Added the max_commit_time_ms option.

New in version 3.7.

with_transaction(callback, read_concern=None, write_concern=None, read_preference=None, max_commit_time_ms=None)

Execute a callback in a transaction.

This method starts a transaction on this session, executes callback once, and then commits the transaction. For example:

def callback(session):
    orders = session.client.db.orders
    inventory = session.client.db.inventory
    orders.insert_one({"sku": "abc123", "qty": 100}, session=session)
    inventory.update_one({"sku": "abc123", "qty": {"$gte": 100}},
                         {"$inc": {"qty": -100}}, session=session)

with client.start_session() as session:
    session.with_transaction(callback)

To pass arbitrary arguments to the callback, wrap your callable with a lambda like this:

def callback(session, custom_arg, custom_kwarg=None):
    # Transaction operations...

with client.start_session() as session:
    session.with_transaction(
        lambda s: callback(s, "custom_arg", custom_kwarg=1))

In the event of an exception, with_transaction may retry the commit or the entire transaction, so the callback may be invoked multiple times by a single call to with_transaction. Developers should be mindful of this possibility when writing a callback that modifies application state or has any other side effects. Note that even when the callback is invoked multiple times, with_transaction ensures that the transaction will be committed at most once on the server.

The callback should not attempt to start new transactions, but should simply run operations meant to be contained within a transaction. The callback should also not commit the transaction; this is handled automatically by with_transaction. If the callback does commit or abort the transaction without error, however, with_transaction will return without taking further action.

When callback raises an exception, with_transaction automatically aborts the current transaction. When callback or commit_transaction() raises an exception that includes the "TransientTransactionError" error label, with_transaction starts a new transaction and re-executes the callback.

When commit_transaction() raises an exception with the "UnknownTransactionCommitResult" error label, with_transaction retries the commit until the result of the transaction is known.

This method will cease retrying after 120 seconds has elapsed. This timeout is not configurable and any exception raised by the callback or by ClientSession.commit_transaction() after the timeout is reached will be re-raised. Applications that desire a different timeout duration should not use this method.
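
The retry behavior described above can be pictured as the following simplified loop. This is a minimal sketch for illustration only, not the actual implementation, and run_transaction_with_retry is a hypothetical name:

import time

from pymongo.errors import PyMongoError

def run_transaction_with_retry(session, callback, timeout=120):
    # Approximate the semantics described above: retry the whole
    # transaction on "TransientTransactionError", retry only the commit
    # on "UnknownTransactionCommitResult", give up after `timeout` seconds.
    start = time.monotonic()
    while True:
        session.start_transaction()
        try:
            result = callback(session)
        except Exception as exc:
            session.abort_transaction()
            if (isinstance(exc, PyMongoError)
                    and exc.has_error_label("TransientTransactionError")
                    and time.monotonic() - start < timeout):
                continue  # Retry the entire transaction.
            raise
        while True:
            try:
                session.commit_transaction()
                return result
            except PyMongoError as exc:
                if (exc.has_error_label("UnknownTransactionCommitResult")
                        and time.monotonic() - start < timeout):
                    continue  # Retry only the commit.
                if (exc.has_error_label("TransientTransactionError")
                        and time.monotonic() - start < timeout):
                    break  # Retry the entire transaction.
                raise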

Parameters:
  • callback: The callable callback to run inside a transaction. The callable must accept a single argument, this session. Note, under certain error conditions the callback may be run multiple times.
  • read_concern (optional): The ReadConcern to use for this transaction.
  • write_concern (optional): The WriteConcern to use for this transaction.
  • read_preference (optional): The read preference to use for this transaction. If None (the default) the read_preference of this session's MongoClient is used. See read_preferences for options.
Returns:

The return value of the callback.

New in version 3.9.

class pymongo.client_session.SessionOptions(causal_consistency=True, default_transaction_options=None)

Options for a new ClientSession.

Parameters:
  • causal_consistency (optional): If True (the default), read operations are causally ordered within the session.
  • default_transaction_options (optional): The default TransactionOptions to use for transactions started on this session.
causal_consistency

Whether causal consistency is configured.

default_transaction_options

The default TransactionOptions to use for transactions started on this session.

New in version 3.7.
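
For example, default transaction options can be set once on the session rather than on each call to start_transaction(). The particular ReadConcern and WriteConcern values below are illustrative:

from pymongo.client_session import TransactionOptions
from pymongo.read_concern import ReadConcern
from pymongo.write_concern import WriteConcern

opts = TransactionOptions(
    read_concern=ReadConcern("snapshot"),
    write_concern=WriteConcern("majority"))
with client.start_session(default_transaction_options=opts) as session:
    with session.start_transaction():
        # Operations here use the session's default transaction options.
        client.db.orders.insert_one({"sku": "abc123"}, session=session)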

class pymongo.client_session.TransactionOptions(read_concern=None, write_concern=None, read_preference=None, max_commit_time_ms=None)

Options for ClientSession.start_transaction().

Parameters:
  • read_concern (optional): The ReadConcern to use for this transaction. If None (the default) the read_concern of this MongoClient is used.
  • write_concern (optional): The WriteConcern to use for this transaction. If None (the default) the write_concern of this MongoClient is used.
  • read_preference (optional): The read preference to use. If None (the default) the read_preference of this MongoClient is used. See read_preferences for options. Transactions which read must use PRIMARY.
  • max_commit_time_ms (optional): The maximum amount of time to allow a single commitTransaction command to run. This option is an alias for maxTimeMS option on the commitTransaction command. If None (the default) maxTimeMS is not used.

Changed in version 3.9: Added the max_commit_time_ms option.

New in version 3.7.

max_commit_time_ms

The maxTimeMS to use when running a commitTransaction command.

New in version 3.9.

read_concern

This transaction’s ReadConcern.

read_preference

This transaction’s ReadPreference.

write_concern

This transaction’s WriteConcern.

collation – Tools for working with collations.

Tools for working with collations.

class pymongo.collation.Collation(locale, caseLevel=None, caseFirst=None, strength=None, numericOrdering=None, alternate=None, maxVariable=None, normalization=None, backwards=None, **kwargs)
Parameters:
  • locale: (string) The locale of the collation. This should be a string that identifies an ICU locale ID exactly. For example, en_US is valid, but en_us and en-US are not. Consult the MongoDB documentation for a list of supported locales.

  • caseLevel: (optional) If True, turn on case sensitivity if strength is 1 or 2 (case sensitivity is implied if strength is greater than 2). Defaults to False.

  • caseFirst: (optional) Specify that either uppercase or lowercase characters take precedence. Must be one of the following values: CollationCaseFirst.UPPER, CollationCaseFirst.LOWER, or CollationCaseFirst.OFF (the default).

  • strength: (optional) Specify the comparison strength. This is also known as the ICU comparison level. This must be one of the following values: CollationStrength.PRIMARY, CollationStrength.SECONDARY, CollationStrength.TERTIARY (the default), CollationStrength.QUATERNARY, or CollationStrength.IDENTICAL.

    Each successive level builds upon the previous. For example, a strength of SECONDARY differentiates characters based both on the unadorned base character and its accents.

  • numericOrdering: (optional) If True, order numbers numerically instead of in collation order (defaults to False).

  • alternate: (optional) Specify whether spaces and punctuation are considered base characters. This must be one of the following values: CollationAlternate.NON_IGNORABLE (the default) or CollationAlternate.SHIFTED.

  • maxVariable: (optional) When alternate is SHIFTED, this option specifies what characters may be ignored. This must be one of the following values: CollationMaxVariable.PUNCT or CollationMaxVariable.SPACE.

  • normalization: (optional) If True, normalizes text into Unicode NFD. Defaults to False.

  • backwards: (optional) If True, accents on characters are considered from the back of the word to the front, as it is done in some French dictionary ordering traditions. Defaults to False.

  • kwargs: (optional) Keyword arguments supplying any additional options to be sent with this Collation object.
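
For illustration, the following sketch performs a case-insensitive query by passing a Collation with SECONDARY strength; the contacts collection and last_name field are hypothetical:

from pymongo import MongoClient
from pymongo.collation import Collation, CollationStrength

client = MongoClient()
contacts = client.test.contacts
# strength=SECONDARY ignores case (but still considers accents), so
# this matches 'Smith', 'SMITH', etc.
for doc in contacts.find(
        {'last_name': 'smith'},
        collation=Collation(locale='en_US',
                            strength=CollationStrength.SECONDARY)):
    print(doc)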

class pymongo.collation.CollationStrength

An enum that defines values for strength on a Collation.

PRIMARY = 1

Differentiate base (unadorned) characters.

SECONDARY = 2

Differentiate character accents.

TERTIARY = 3

Differentiate character case.

QUATERNARY = 4

Differentiate words with and without punctuation.

IDENTICAL = 5

Differentiate by Unicode code point (characters must be exactly identical).

class pymongo.collation.CollationAlternate

An enum that defines values for alternate on a Collation.

NON_IGNORABLE = 'non-ignorable'

Spaces and punctuation are treated as base characters.

SHIFTED = 'shifted'

Spaces and punctuation are not considered base characters.

Spaces and punctuation are still distinguished when the Collation strength is at least QUATERNARY.

class pymongo.collation.CollationCaseFirst

An enum that defines values for case_first on a Collation.

UPPER = 'upper'

Sort uppercase characters first.

LOWER = 'lower'

Sort lowercase characters first.

OFF = 'off'

Default for locale or collation strength.

class pymongo.collation.CollationMaxVariable

An enum that defines values for max_variable on a Collation.

PUNCT = 'punct'

Both punctuation and spaces are ignored.

SPACE = 'space'

Spaces alone are ignored.

collection – Collection level operations

Collection level utilities for Mongo.

pymongo.ASCENDING = 1

Ascending sort order.

pymongo.DESCENDING = -1

Descending sort order.

pymongo.GEO2D = '2d'

Index specifier for a 2-dimensional geospatial index.

pymongo.GEOHAYSTACK = 'geoHaystack'

Index specifier for a 2-dimensional haystack index.

New in version 2.1.

pymongo.GEOSPHERE = '2dsphere'

Index specifier for a spherical geospatial index.

New in version 2.5.

pymongo.HASHED = 'hashed'

Index specifier for a hashed index.

New in version 2.5.

pymongo.TEXT = 'text'

Index specifier for a text index.

New in version 2.7.1.

class pymongo.collection.ReturnDocument

An enum used with find_one_and_replace() and find_one_and_update().

BEFORE

Return the original document before it was updated/replaced, or None if no document matches the query.

AFTER

Return the updated/replaced or inserted document.

class pymongo.collection.Collection(database, name, create=False, **kwargs)

Get / create a Mongo collection.

Raises TypeError if name is not an instance of basestring (str in python 3). Raises InvalidName if name is not a valid collection name. Any additional keyword arguments will be used as options passed to the create command. See create_collection() for valid options.

If create is True, collation is specified, or any additional keyword arguments are present, a create command will be sent, using session if specified. Otherwise, a create command will not be sent and the collection will be created implicitly on first use. The optional session argument is only used for the create command; it is not associated with the collection afterward.

Parameters:
  • database: the database to get a collection from
  • name: the name of the collection to get
  • create (optional): if True, force collection creation even without options being set
  • codec_options (optional): An instance of CodecOptions. If None (the default) database.codec_options is used.
  • read_preference (optional): The read preference to use. If None (the default) database.read_preference is used.
  • write_concern (optional): An instance of WriteConcern. If None (the default) database.write_concern is used.
  • read_concern (optional): An instance of ReadConcern. If None (the default) database.read_concern is used.
  • collation (optional): An instance of Collation. If a collation is provided, it will be passed to the create collection command. This option is only supported on MongoDB 3.4 and above.
  • session (optional): a ClientSession that is used with the create collection command
  • **kwargs (optional): additional keyword arguments will be passed as options for the create collection command

Changed in version 3.6: Added session parameter.

Changed in version 3.4: Support the collation option.

Changed in version 3.2: Added the read_concern option.

Changed in version 3.0: Added the codec_options, read_preference, and write_concern options. Removed the uuid_subtype attribute. Collection no longer returns an instance of Collection for attribute names with leading underscores. You must use dict-style lookups instead:

collection['__my_collection__']

Not:

collection.__my_collection__

Changed in version 2.2: Removed deprecated argument: options

New in version 2.1: uuid_subtype attribute

See also

The MongoDB documentation on

collections

c[name] || c.name

Get the name sub-collection of Collection c.

Raises InvalidName if an invalid collection name is used.

full_name

The full name of this Collection.

The full name is of the form database_name.collection_name.

name

The name of this Collection.

database

The Database that this Collection is a part of.

codec_options

Read only access to the CodecOptions of this instance.

read_preference

Read only access to the read preference of this instance.

Changed in version 3.0: The read_preference attribute is now read only.

write_concern

Read only access to the WriteConcern of this instance.

Changed in version 3.0: The write_concern attribute is now read only.

read_concern

Read only access to the ReadConcern of this instance.

New in version 3.2.

with_options(codec_options=None, read_preference=None, write_concern=None, read_concern=None)

Get a clone of this collection changing the specified settings.

>>> coll1.read_preference
Primary()
>>> from pymongo import ReadPreference
>>> coll2 = coll1.with_options(read_preference=ReadPreference.SECONDARY)
>>> coll1.read_preference
Primary()
>>> coll2.read_preference
Secondary(tag_sets=None)
Parameters:
  • codec_options (optional): An instance of CodecOptions. If None (the default) the codec_options of this Collection is used.
  • read_preference (optional): The read preference to use. If None (the default) the read_preference of this Collection is used.
  • write_concern (optional): An instance of WriteConcern. If None (the default) the write_concern of this Collection is used.
  • read_concern (optional): An instance of ReadConcern. If None (the default) the read_concern of this Collection is used.

bulk_write(requests, ordered=True, bypass_document_validation=False, session=None)

Send a batch of write operations to the server.

Requests are passed as a list of write operation instances ( InsertOne, UpdateOne, UpdateMany, ReplaceOne, DeleteOne, or DeleteMany).

>>> for doc in db.test.find({}):
...     print(doc)
...
{u'x': 1, u'_id': ObjectId('54f62e60fba5226811f634ef')}
{u'x': 1, u'_id': ObjectId('54f62e60fba5226811f634f0')}
>>> # DeleteMany, UpdateOne, and UpdateMany are also available.
...
>>> from pymongo import InsertOne, DeleteOne, ReplaceOne
>>> requests = [InsertOne({'y': 1}), DeleteOne({'x': 1}),
...             ReplaceOne({'w': 1}, {'z': 1}, upsert=True)]
>>> result = db.test.bulk_write(requests)
>>> result.inserted_count
1
>>> result.deleted_count
1
>>> result.modified_count
0
>>> result.upserted_ids
{2: ObjectId('54f62ee28891e756a6e1abd5')}
>>> for doc in db.test.find({}):
...     print(doc)
...
{u'x': 1, u'_id': ObjectId('54f62e60fba5226811f634f0')}
{u'y': 1, u'_id': ObjectId('54f62ee2fba5226811f634f1')}
{u'z': 1, u'_id': ObjectId('54f62ee28891e756a6e1abd5')}
Parameters:
  • requests: A list of write operations (see examples above).
  • ordered (optional): If True (the default) requests will be performed on the server serially, in the order provided. If an error occurs all remaining operations are aborted. If False requests will be performed on the server in arbitrary order, possibly in parallel, and all operations will be attempted.
  • bypass_document_validation: (optional) If True, allows the write to opt-out of document level validation. Default is False.
  • session (optional): a ClientSession.
Returns:

An instance of BulkWriteResult.

Note

bypass_document_validation requires server version >= 3.2

Changed in version 3.6: Added session parameter.

Changed in version 3.2: Added bypass_document_validation support

New in version 3.0.

insert_one(document, bypass_document_validation=False, session=None)

Insert a single document.

>>> db.test.count_documents({'x': 1})
0
>>> result = db.test.insert_one({'x': 1})
>>> result.inserted_id
ObjectId('54f112defba522406c9cc208')
>>> db.test.find_one({'x': 1})
{u'x': 1, u'_id': ObjectId('54f112defba522406c9cc208')}
Parameters:
  • document: The document to insert. Must be a mutable mapping type. If the document does not have an _id field one will be added automatically.
  • bypass_document_validation: (optional) If True, allows the write to opt-out of document level validation. Default is False.
  • session (optional): a ClientSession.
Returns:

An instance of InsertOneResult.

Note

bypass_document_validation requires server version >= 3.2

Changed in version 3.6: Added session parameter.

Changed in version 3.2: Added bypass_document_validation support

New in version 3.0.

insert_many(documents, ordered=True, bypass_document_validation=False, session=None)

Insert an iterable of documents.

>>> db.test.count_documents({})
0
>>> result = db.test.insert_many([{'x': i} for i in range(2)])
>>> result.inserted_ids
[ObjectId('54f113fffba522406c9cc20e'), ObjectId('54f113fffba522406c9cc20f')]
>>> db.test.count_documents({})
2
Parameters:
  • documents: An iterable of documents to insert.
  • ordered (optional): If True (the default) documents will be inserted on the server serially, in the order provided. If an error occurs all remaining inserts are aborted. If False, documents will be inserted on the server in arbitrary order, possibly in parallel, and all document inserts will be attempted.
  • bypass_document_validation: (optional) If True, allows the write to opt-out of document level validation. Default is False.
  • session (optional): a ClientSession.
Returns:

An instance of InsertManyResult.

Note

bypass_document_validation requires server version >= 3.2

Changed in version 3.6: Added session parameter.

Changed in version 3.2: Added bypass_document_validation support

New in version 3.0.

replace_one(filter, replacement, upsert=False, bypass_document_validation=False, collation=None, session=None)

Replace a single document matching the filter.

>>> for doc in db.test.find({}):
...     print(doc)
...
{u'x': 1, u'_id': ObjectId('54f4c5befba5220aa4d6dee7')}
>>> result = db.test.replace_one({'x': 1}, {'y': 1})
>>> result.matched_count
1
>>> result.modified_count
1
>>> for doc in db.test.find({}):
...     print(doc)
...
{u'y': 1, u'_id': ObjectId('54f4c5befba5220aa4d6dee7')}

The upsert option can be used to insert a new document if a matching document does not exist.

>>> result = db.test.replace_one({'x': 1}, {'x': 1}, True)
>>> result.matched_count
0
>>> result.modified_count
0
>>> result.upserted_id
ObjectId('54f11e5c8891e756a6e1abd4')
>>> db.test.find_one({'x': 1})
{u'x': 1, u'_id': ObjectId('54f11e5c8891e756a6e1abd4')}
Parameters:
  • filter: A query that matches the document to replace.
  • replacement: The new document.
  • upsert (optional): If True, perform an insert if no documents match the filter.
  • bypass_document_validation: (optional) If True, allows the write to opt-out of document level validation. Default is False.
  • collation (optional): An instance of Collation. This option is only supported on MongoDB 3.4 and above.
  • session (optional): a ClientSession.
Returns:

An instance of UpdateResult.

Note

bypass_document_validation requires server version >= 3.2

Changed in version 3.6: Added session parameter.

Changed in version 3.4: Added the collation option.

Changed in version 3.2: Added bypass_document_validation support

New in version 3.0.

update_one(filter, update, upsert=False, bypass_document_validation=False, collation=None, array_filters=None, session=None)

Update a single document matching the filter.

>>> for doc in db.test.find():
...     print(doc)
...
{u'x': 1, u'_id': 0}
{u'x': 1, u'_id': 1}
{u'x': 1, u'_id': 2}
>>> result = db.test.update_one({'x': 1}, {'$inc': {'x': 3}})
>>> result.matched_count
1
>>> result.modified_count
1
>>> for doc in db.test.find():
...     print(doc)
...
{u'x': 4, u'_id': 0}
{u'x': 1, u'_id': 1}
{u'x': 1, u'_id': 2}
Parameters:
  • filter: A query that matches the document to update.
  • update: The modifications to apply.
  • upsert (optional): If True, perform an insert if no documents match the filter.
  • bypass_document_validation: (optional) If True, allows the write to opt-out of document level validation. Default is False.
  • collation (optional): An instance of Collation. This option is only supported on MongoDB 3.4 and above.
  • array_filters (optional): A list of filters specifying which array elements an update should apply. Requires MongoDB 3.6+.
  • session (optional): a ClientSession.
Returns:

An instance of UpdateResult.

Note

bypass_document_validation requires server version >= 3.2

Changed in version 3.9: Added the ability to accept a pipeline as the update.

Changed in version 3.6: Added the array_filters and session parameters.

Changed in version 3.4: Added the collation option.

Changed in version 3.2: Added bypass_document_validation support

New in version 3.0.

update_many(filter, update, upsert=False, array_filters=None, bypass_document_validation=False, collation=None, session=None)

Update one or more documents that match the filter.

>>> for doc in db.test.find():
...     print(doc)
...
{u'x': 1, u'_id': 0}
{u'x': 1, u'_id': 1}
{u'x': 1, u'_id': 2}
>>> result = db.test.update_many({'x': 1}, {'$inc': {'x': 3}})
>>> result.matched_count
3
>>> result.modified_count
3
>>> for doc in db.test.find():
...     print(doc)
...
{u'x': 4, u'_id': 0}
{u'x': 4, u'_id': 1}
{u'x': 4, u'_id': 2}
Parameters:
  • filter: A query that matches the documents to update.
  • update: The modifications to apply.
  • upsert (optional): If True, perform an insert if no documents match the filter.
  • bypass_document_validation (optional): If True, allows the write to opt-out of document level validation. Default is False.
  • collation (optional): An instance of Collation. This option is only supported on MongoDB 3.4 and above.
  • array_filters (optional): A list of filters specifying which array elements an update should apply. Requires MongoDB 3.6+.
  • session (optional): a ClientSession.
Returns:

An instance of UpdateResult.

Note

bypass_document_validation requires server version >= 3.2

Changed in version 3.9: Added the ability to accept a pipeline as the update.

Changed in version 3.6: Added array_filters and session parameters.

Changed in version 3.4: Added the collation option.

Changed in version 3.2: Added bypass_document_validation support

New in version 3.0.

delete_one(filter, collation=None, session=None)

Delete a single document matching the filter.

>>> db.test.count_documents({'x': 1})
3
>>> result = db.test.delete_one({'x': 1})
>>> result.deleted_count
1
>>> db.test.count_documents({'x': 1})
2
Parameters:
  • filter: A query that matches the document to delete.
  • collation (optional): An instance of Collation. This option is only supported on MongoDB 3.4 and above.
  • session (optional): a ClientSession.
Returns:

An instance of DeleteResult.

Changed in version 3.6: Added session parameter.

Changed in version 3.4: Added the collation option.

New in version 3.0.

delete_many(filter, collation=None, session=None)

Delete one or more documents matching the filter.

>>> db.test.count_documents({'x': 1})
3
>>> result = db.test.delete_many({'x': 1})
>>> result.deleted_count
3
>>> db.test.count_documents({'x': 1})
0
Parameters:
  • filter: A query that matches the documents to delete.
  • collation (optional): An instance of Collation. This option is only supported on MongoDB 3.4 and above.
  • session (optional): a ClientSession.
Returns:

An instance of DeleteResult.

Changed in version 3.6: Added session parameter.

Changed in version 3.4: Added the collation option.

New in version 3.0.

aggregate(pipeline, session=None, **kwargs)

Perform an aggregation using the aggregation framework on this collection.

All optional aggregate command parameters should be passed as keyword arguments to this method. Valid options include, but are not limited to:

  • allowDiskUse (bool): Enables writing to temporary files. When set to True, aggregation stages can write data to the _tmp subdirectory of the --dbpath directory. The default is False.
  • maxTimeMS (int): The maximum amount of time to allow the operation to run in milliseconds.
  • batchSize (int): The maximum number of documents to return per batch. Ignored if the connected mongod or mongos does not support returning aggregate results using a cursor, or useCursor is False.
  • collation (optional): An instance of Collation. This option is only supported on MongoDB 3.4 and above.
  • useCursor (bool): Deprecated. Will be removed in PyMongo 4.0.

The aggregate() method obeys the read_preference of this Collection, except when $out or $merge are used, in which case PRIMARY is used.
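
For example, the following sketch groups documents by a hypothetical status field, passing two of the options above as keyword arguments:

pipeline = [
    {'$match': {'qty': {'$gt': 0}}},
    {'$group': {'_id': '$status', 'count': {'$sum': 1}}},
]
# allowDiskUse and maxTimeMS are passed through to the aggregate command.
for doc in db.test.aggregate(pipeline, allowDiskUse=True, maxTimeMS=1000):
    print(doc)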

Note

This method does not support the ‘explain’ option. Please use command() instead. An example is included in the Aggregation Framework documentation.

Note

The write_concern of this collection is automatically applied to this operation when using MongoDB >= 3.4.

Parameters:
  • pipeline: a list of aggregation pipeline stages
  • session (optional): a ClientSession.
  • **kwargs (optional): See list of options above.
Returns:

A CommandCursor over the result set.

Changed in version 3.9: Apply this collection’s read concern to pipelines containing the $out stage when connected to MongoDB >= 4.2. Added support for the $merge pipeline stage. Aggregations that write always use read preference PRIMARY.

Changed in version 3.6: Added the session parameter. Added the maxAwaitTimeMS option. Deprecated the useCursor option.

Changed in version 3.4: Apply this collection’s write concern automatically to this operation when connected to MongoDB >= 3.4. Support the collation option.

Changed in version 3.0: The aggregate() method always returns a CommandCursor. The pipeline argument must be a list.

Changed in version 2.7: When the cursor option is used, return CommandCursor instead of Cursor.

Changed in version 2.6: Added cursor support.

New in version 2.3.

aggregate_raw_batches(pipeline, **kwargs)

Perform an aggregation and retrieve batches of raw BSON.

Similar to the aggregate() method but returns a RawBatchCursor.

This example demonstrates how to work with raw batches, but in practice raw batches should be passed to an external library that can decode BSON into another data type, rather than used with PyMongo’s bson module.

>>> import bson
>>> cursor = db.test.aggregate_raw_batches([
...     {'$project': {'x': {'$multiply': [2, '$x']}}}])
>>> for batch in cursor:
...     print(bson.decode_all(batch))

Note

aggregate_raw_batches does not support sessions or auto encryption.

New in version 3.6.

watch(pipeline=None, full_document=None, resume_after=None, max_await_time_ms=None, batch_size=None, collation=None, start_at_operation_time=None, session=None, start_after=None)

Watch changes on this collection.

Performs an aggregation with an implicit initial $changeStream stage and returns a CollectionChangeStream cursor which iterates over changes on this collection.

Introduced in MongoDB 3.6.

with db.collection.watch() as stream:
    for change in stream:
        print(change)

The CollectionChangeStream iterable blocks until the next change document is returned or an error is raised. If the next() method encounters a network error when retrieving a batch from the server, it will automatically attempt to recreate the cursor such that no change events are missed. Any error encountered during the resume attempt indicates there may be an outage and will be raised.

try:
    with db.collection.watch(
            [{'$match': {'operationType': 'insert'}}]) as stream:
        for insert_change in stream:
            print(insert_change)
except pymongo.errors.PyMongoError:
    # The ChangeStream encountered an unrecoverable error or the
    # resume attempt failed to recreate the cursor.
    logging.error('...')

For a precise description of the resume process see the change streams specification.

Note

Using this helper method is preferred to directly calling aggregate() with a $changeStream stage, for the purpose of supporting resumability.

Warning

This Collection’s read_concern must be ReadConcern("majority") in order to use the $changeStream stage.

Parameters:
  • pipeline (optional): A list of aggregation pipeline stages to append to an initial $changeStream stage. Not all pipeline stages are valid after a $changeStream stage, see the MongoDB documentation on change streams for the supported stages.
  • full_document (optional): The fullDocument to pass as an option to the $changeStream stage. Allowed values: ‘updateLookup’. When set to ‘updateLookup’, the change notification for partial updates will include both a delta describing the changes to the document, as well as a copy of the entire document that was changed from some time after the change occurred.
  • resume_after (optional): A resume token. If provided, the change stream will start returning changes that occur directly after the operation specified in the resume token. A resume token is the _id value of a change document.
  • max_await_time_ms (optional): The maximum time in milliseconds for the server to wait for changes before responding to a getMore operation.
  • batch_size (optional): The maximum number of documents to return per batch.
  • collation (optional): The Collation to use for the aggregation.
  • start_at_operation_time (optional): If provided, the resulting change stream will only return changes that occurred at or after the specified Timestamp. Requires MongoDB >= 4.0.
  • session (optional): a ClientSession.
  • start_after (optional): The same as resume_after except that start_after can resume notifications after an invalidate event. This option and resume_after are mutually exclusive.
Returns:

A CollectionChangeStream cursor.

Changed in version 3.9: Added the start_after parameter.

Changed in version 3.7: Added the start_at_operation_time parameter.

New in version 3.6.

See also

The MongoDB documentation on

changeStreams

find(filter=None, projection=None, skip=0, limit=0, no_cursor_timeout=False, cursor_type=CursorType.NON_TAILABLE, sort=None, allow_partial_results=False, oplog_replay=False, modifiers=None, batch_size=0, manipulate=True, collation=None, hint=None, max_scan=None, max_time_ms=None, max=None, min=None, return_key=False, show_record_id=False, snapshot=False, comment=None, session=None)

Query the database.

The filter argument is a prototype document that all results must match. For example:

>>> db.test.find({"hello": "world"})

only matches documents that have a key “hello” with value “world”. Matches can have other keys in addition to “hello”. The projection argument is used to specify a subset of fields that should be included in the result documents. By limiting results to a certain subset of fields you can cut down on network traffic and decoding time.

Raises TypeError if any of the arguments are of improper type. Returns an instance of Cursor corresponding to this query.

The find() method obeys the read_preference of this Collection.

Parameters:
  • filter (optional): a SON object specifying elements which must be present for a document to be included in the result set
  • projection (optional): a list of field names that should be returned in the result set or a dict specifying the fields to include or exclude. If projection is a list “_id” will always be returned. Use a dict to exclude fields from the result (e.g. projection={‘_id’: False}).
  • session (optional): a ClientSession.
  • skip (optional): the number of documents to omit (from the start of the result set) when returning the results
  • limit (optional): the maximum number of results to return. A limit of 0 (the default) is equivalent to setting no limit.
  • no_cursor_timeout (optional): if False (the default), any returned cursor is closed by the server after 10 minutes of inactivity. If set to True, the returned cursor will never time out on the server. Care should be taken to ensure that cursors with no_cursor_timeout turned on are properly closed.
  • cursor_type (optional): the type of cursor to return. The valid options are defined by CursorType:
    • NON_TAILABLE - the result of this find call will return a standard cursor over the result set.
    • TAILABLE - the result of this find call will be a tailable cursor - tailable cursors are only for use with capped collections. They are not closed when the last data is retrieved but are kept open and the cursor location marks the final document position. If more data is received iteration of the cursor will continue from the last document received. For details, see the tailable cursor documentation.
    • TAILABLE_AWAIT - the result of this find call will be a tailable cursor with the await flag set. The server will wait for a few seconds after returning the full result set so that it can capture and return additional data added during the query.
    • EXHAUST - the result of this find call will be an exhaust cursor. MongoDB will stream batched results to the client without waiting for the client to request each batch, reducing latency. See notes on compatibility below.
  • sort (optional): a list of (key, direction) pairs specifying the sort order for this query. See sort() for details.
  • allow_partial_results (optional): if True, mongos will return partial results if some shards are down instead of returning an error.
  • oplog_replay (optional): If True, set the oplogReplay query flag.
  • batch_size (optional): Limits the number of documents returned in a single batch.
  • manipulate (optional): DEPRECATED - If True (the default), apply any outgoing SON manipulators before returning.
  • collation (optional): An instance of Collation. This option is only supported on MongoDB 3.4 and above.
  • return_key (optional): If True, return only the index keys in each document.
  • show_record_id (optional): If True, adds a field $recordId in each document with the storage engine’s internal record identifier.
  • snapshot (optional): DEPRECATED - If True, prevents the cursor from returning a document more than once because of an intervening write operation.
  • hint (optional): An index, in the same format as passed to create_index() (e.g. [('field', ASCENDING)]). Pass this as an alternative to calling hint() on the cursor to tell Mongo the proper index to use for the query.
  • max_time_ms (optional): Specifies a time limit for a query operation. If the specified time is exceeded, the operation will be aborted and ExecutionTimeout is raised. Pass this as an alternative to calling max_time_ms() on the cursor.
  • max_scan (optional): DEPRECATED - The maximum number of documents to scan. Pass this as an alternative to calling max_scan() on the cursor.
  • min (optional): A list of field, limit pairs specifying the inclusive lower bound for all keys of a specific index in order. Pass this as an alternative to calling min() on the cursor. hint must also be passed to ensure the query utilizes the correct index.
  • max (optional): A list of field, limit pairs specifying the exclusive upper bound for all keys of a specific index in order. Pass this as an alternative to calling max() on the cursor. hint must also be passed to ensure the query utilizes the correct index.
  • comment (optional): A string to attach to the query to help interpret and trace the operation in the server logs and in profile data. Pass this as an alternative to calling comment() on the cursor.
  • modifiers (optional): DEPRECATED - A dict specifying additional MongoDB query modifiers. Use the keyword arguments listed above instead.
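
As a sketch of how several of the options above combine (the status and name fields are hypothetical):

import pymongo

cursor = db.test.find(
    {'status': 'active'},
    projection={'_id': False, 'name': True},
    sort=[('name', pymongo.ASCENDING)],
    skip=10,   # Omit the first 10 matching documents.
    limit=5)   # Return at most 5 documents.
for doc in cursor:
    print(doc)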

Note

There are a number of caveats to using EXHAUST as cursor_type:

  • The limit option can not be used with an exhaust cursor.
  • Exhaust cursors are not supported by mongos and can not be used with a sharded cluster.
  • A Cursor instance created with the EXHAUST cursor_type requires an exclusive socket connection to MongoDB. If the Cursor is discarded without being completely iterated the underlying socket connection will be closed and discarded without being returned to the connection pool.

Changed in version 3.7: Deprecated the snapshot option, which is deprecated in MongoDB 3.6 and removed in MongoDB 4.0. Deprecated the max_scan option. Support for this option is deprecated in MongoDB 4.0. Use max_time_ms instead to limit server side execution time.

Changed in version 3.6: Added session parameter.

Changed in version 3.5: Added the options return_key, show_record_id, snapshot, hint, max_time_ms, max_scan, min, max, and comment. Deprecated the option modifiers.

Changed in version 3.4: Support the collation option.

Changed in version 3.0: Changed the parameter names spec, fields, timeout, and partial to filter, projection, no_cursor_timeout, and allow_partial_results respectively. Added the cursor_type, oplog_replay, and modifiers options. Removed the network_timeout, read_preference, tag_sets, secondary_acceptable_latency_ms, max_scan, snapshot, tailable, await_data, exhaust, as_class, and slave_okay parameters. Removed compile_re option: PyMongo now always represents BSON regular expressions as Regex objects. Use try_compile() to attempt to convert from a BSON regular expression to a Python regular expression object. Soft deprecated the manipulate option.

Changed in version 2.7: Added compile_re option. If set to False, PyMongo represented BSON regular expressions as Regex objects instead of attempting to compile BSON regular expressions as Python native regular expressions, thus preventing errors for some incompatible patterns, see PYTHON-500.

New in version 2.3: The tag_sets and secondary_acceptable_latency_ms parameters.

See also

The MongoDB documentation on

find

find_raw_batches(filter=None, projection=None, skip=0, limit=0, no_cursor_timeout=False, cursor_type=CursorType.NON_TAILABLE, sort=None, allow_partial_results=False, oplog_replay=False, modifiers=None, batch_size=0, manipulate=True, collation=None, hint=None, max_scan=None, max_time_ms=None, max=None, min=None, return_key=False, show_record_id=False, snapshot=False, comment=None)

Query the database and retrieve batches of raw BSON.

Similar to the find() method but returns a RawBatchCursor.

This example demonstrates how to work with raw batches, but in practice raw batches should be passed to an external library that can decode BSON into another data type, rather than used with PyMongo’s bson module.

>>> import bson
>>> cursor = db.test.find_raw_batches()
>>> for batch in cursor:
...     print(bson.decode_all(batch))

Note

find_raw_batches does not support sessions or auto encryption.

New in version 3.6.

find_one(filter=None, *args, **kwargs)

Get a single document from the database.

All arguments to find() are also valid arguments for find_one(), although any limit argument will be ignored. Returns a single document, or None if no matching document is found.

The find_one() method obeys the read_preference of this Collection.

Parameters:
  • filter (optional): a dictionary specifying the query to be performed OR any other type to be used as the value for a query for "_id".

  • *args (optional): any additional positional arguments are the same as the arguments to find().

  • **kwargs (optional): any additional keyword arguments are the same as the arguments to find().

    >>> collection.find_one(max_time_ms=100)
    
find_one_and_delete(filter, projection=None, sort=None, session=None, **kwargs)

Finds a single document and deletes it, returning the document.

>>> db.test.count_documents({'x': 1})
2
>>> db.test.find_one_and_delete({'x': 1})
{u'x': 1, u'_id': ObjectId('54f4e12bfba5220aa4d6dee8')}
>>> db.test.count_documents({'x': 1})
1

If multiple documents match filter, a sort can be applied.

>>> for doc in db.test.find({'x': 1}):
...     print(doc)
...
{u'x': 1, u'_id': 0}
{u'x': 1, u'_id': 1}
{u'x': 1, u'_id': 2}
>>> db.test.find_one_and_delete(
...     {'x': 1}, sort=[('_id', pymongo.DESCENDING)])
{u'x': 1, u'_id': 2}

The projection option can be used to limit the fields returned.

>>> db.test.find_one_and_delete({'x': 1}, projection={'_id': False})
{u'x': 1}
Parameters:
  • filter: A query that matches the document to delete.
  • projection (optional): a list of field names that should be returned in the result document or a mapping specifying the fields to include or exclude. If projection is a list “_id” will always be returned. Use a mapping to exclude fields from the result (e.g. projection={‘_id’: False}).
  • sort (optional): a list of (key, direction) pairs specifying the sort order for the query. If multiple documents match the query, they are sorted and the first is deleted.
  • session (optional): a ClientSession.
  • **kwargs (optional): additional command arguments can be passed as keyword arguments (for example maxTimeMS can be used with recent server versions).

Changed in version 3.6: Added session parameter.

Changed in version 3.2: Respects write concern.

Warning

Starting in PyMongo 3.2, this command uses the WriteConcern of this Collection when connected to MongoDB >= 3.2. Note that using an elevated write concern with this command may be slower compared to using the default write concern.

Changed in version 3.4: Added the collation option.

New in version 3.0.

find_one_and_replace(filter, replacement, projection=None, sort=None, return_document=ReturnDocument.BEFORE, session=None, **kwargs)

Finds a single document and replaces it, returning either the original or the replaced document.

The find_one_and_replace() method differs from find_one_and_update() by replacing the document matched by filter, rather than modifying the existing document.

>>> for doc in db.test.find({}):
...     print(doc)
...
{u'x': 1, u'_id': 0}
{u'x': 1, u'_id': 1}
{u'x': 1, u'_id': 2}
>>> db.test.find_one_and_replace({'x': 1}, {'y': 1})
{u'x': 1, u'_id': 0}
>>> for doc in db.test.find({}):
...     print(doc)
...
{u'y': 1, u'_id': 0}
{u'x': 1, u'_id': 1}
{u'x': 1, u'_id': 2}
Parameters:
  • filter: A query that matches the document to replace.
  • replacement: The replacement document.
  • projection (optional): A list of field names that should be returned in the result document or a mapping specifying the fields to include or exclude. If projection is a list “_id” will always be returned. Use a mapping to exclude fields from the result (e.g. projection={‘_id’: False}).
  • sort (optional): a list of (key, direction) pairs specifying the sort order for the query. If multiple documents match the query, they are sorted and the first is replaced.
  • upsert (optional): When True, inserts a new document if no document matches the query. Defaults to False.
  • return_document: If ReturnDocument.BEFORE (the default), returns the original document before it was replaced, or None if no document matches. If ReturnDocument.AFTER, returns the replaced or inserted document.
  • session (optional): a ClientSession.
  • **kwargs (optional): additional command arguments can be passed as keyword arguments (for example maxTimeMS can be used with recent server versions).

Changed in version 3.6: Added session parameter.

Changed in version 3.4: Added the collation option.

Changed in version 3.2: Respects write concern.

Warning

Starting in PyMongo 3.2, this command uses the WriteConcern of this Collection when connected to MongoDB >= 3.2. Note that using an elevated write concern with this command may be slower compared to using the default write concern.

New in version 3.0.

find_one_and_update(filter, update, projection=None, sort=None, return_document=ReturnDocument.BEFORE, array_filters=None, session=None, **kwargs)

Finds a single document and updates it, returning either the original or the updated document.

>>> db.test.find_one_and_update(
...    {'_id': 665}, {'$inc': {'count': 1}, '$set': {'done': True}})
{u'_id': 665, u'done': False, u'count': 25}

Returns None if no document matches the filter.

>>> db.test.find_one_and_update(
...    {'_exists': False}, {'$inc': {'count': 1}})

When the filter matches, by default find_one_and_update() returns the original version of the document before the update was applied. To return the updated (or inserted in the case of upsert) version of the document instead, use the return_document option.

>>> from pymongo import ReturnDocument
>>> db.example.find_one_and_update(
...     {'_id': 'userid'},
...     {'$inc': {'seq': 1}},
...     return_document=ReturnDocument.AFTER)
{u'_id': u'userid', u'seq': 1}

You can limit the fields returned with the projection option.

>>> db.example.find_one_and_update(
...     {'_id': 'userid'},
...     {'$inc': {'seq': 1}},
...     projection={'seq': True, '_id': False},
...     return_document=ReturnDocument.AFTER)
{u'seq': 2}

The upsert option can be used to create the document if it doesn’t already exist.

>>> db.example.delete_many({}).deleted_count
1
>>> db.example.find_one_and_update(
...     {'_id': 'userid'},
...     {'$inc': {'seq': 1}},
...     projection={'seq': True, '_id': False},
...     upsert=True,
...     return_document=ReturnDocument.AFTER)
{u'seq': 1}

If multiple documents match filter, a sort can be applied.

>>> for doc in db.test.find({'done': True}):
...     print(doc)
...
{u'_id': 665, u'done': True, u'result': {u'count': 26}}
{u'_id': 701, u'done': True, u'result': {u'count': 17}}
>>> db.test.find_one_and_update(
...     {'done': True},
...     {'$set': {'final': True}},
...     sort=[('_id', pymongo.DESCENDING)])
{u'_id': 701, u'done': True, u'result': {u'count': 17}}
Parameters:
  • filter: A query that matches the document to update.
  • update: The update operations to apply.
  • projection (optional): A list of field names that should be returned in the result document or a mapping specifying the fields to include or exclude. If projection is a list “_id” will always be returned. Use a dict to exclude fields from the result (e.g. projection={‘_id’: False}).
  • sort (optional): a list of (key, direction) pairs specifying the sort order for the query. If multiple documents match the query, they are sorted and the first is updated.
  • upsert (optional): When True, inserts a new document if no document matches the query. Defaults to False.
  • return_document: If ReturnDocument.BEFORE (the default), returns the original document before it was updated. If ReturnDocument.AFTER, returns the updated or inserted document.
  • array_filters (optional): A list of filters specifying which array elements an update should apply. Requires MongoDB 3.6+.
  • session (optional): a ClientSession.
  • **kwargs (optional): additional command arguments can be passed as keyword arguments (for example maxTimeMS can be used with recent server versions).

Changed in version 3.9: Added the ability to accept a pipeline as the update.

Changed in version 3.6: Added the array_filters and session options.

Changed in version 3.4: Added the collation option.

Changed in version 3.2: Respects write concern.

Warning

Starting in PyMongo 3.2, this command uses the WriteConcern of this Collection when connected to MongoDB >= 3.2. Note that using an elevated write concern with this command may be slower compared to using the default write concern.

New in version 3.0.

count_documents(filter, session=None, **kwargs)

Count the number of documents in this collection.

Note

For a fast count of the total documents in a collection see estimated_document_count().

The count_documents() method is supported in a transaction.

All optional parameters should be passed as keyword arguments to this method. Valid options include:

  • skip (int): The number of matching documents to skip before returning results.
  • limit (int): The maximum number of documents to count. Must be a positive integer. If not provided, no limit is imposed.
  • maxTimeMS (int): The maximum amount of time to allow this operation to run, in milliseconds.
  • collation (optional): An instance of Collation. This option is only supported on MongoDB 3.4 and above.
  • hint (string or list of tuples): The index to use. Specify either the index name as a string or the index specification as a list of tuples (e.g. [(‘a’, pymongo.ASCENDING), (‘b’, pymongo.ASCENDING)]). This option is only supported on MongoDB 3.6 and above.

The count_documents() method obeys the read_preference of this Collection.

Note

When migrating from count() to count_documents() the following query operators must be replaced:

Operator      Replacement
$where        $expr
$near         $geoWithin with $center
$nearSphere   $geoWithin with $centerSphere

$expr requires MongoDB 3.6+
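
For example, a count that previously used $where can be expressed with $expr instead; the qty and reorder_level fields are hypothetical:

# Before (count() with $where, deprecated):
#   db.test.count({'$where': 'this.qty > this.reorder_level'})
# After (count_documents() with $expr, MongoDB 3.6+):
n = db.test.count_documents(
    {'$expr': {'$gt': ['$qty', '$reorder_level']}},
    maxTimeMS=1000)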

Parameters:
  • filter (required): A query document that selects which documents to count in the collection. Can be an empty document to count all documents.
  • session (optional): a ClientSession.
  • **kwargs (optional): See list of options above.

New in version 3.7.

estimated_document_count(**kwargs)

Get an estimate of the number of documents in this collection using collection metadata.

The estimated_document_count() method is not supported in a transaction.
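
A minimal usage sketch:

# Fast, metadata-based estimate of the collection size; accepts
# maxTimeMS like other commands.
total = db.test.estimated_document_count(maxTimeMS=500)
print(total)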

All optional parameters should be passed as keyword arguments to this method. Valid options include:

  • maxTimeMS (int): The maximum amount of time to allow this operation to run, in milliseconds.
Parameters:
  • **kwargs (optional): See list of options above.

New in version 3.7.

distinct(key, filter=None, session=None, **kwargs)

Get a list of distinct values for key among all documents in this collection.

Raises TypeError if key is not an instance of basestring (str in python 3).

All optional distinct parameters should be passed as keyword arguments to this method. Valid options include:

  • maxTimeMS (int): The maximum amount of time to allow the distinct command to run, in milliseconds.
  • collation (optional): An instance of Collation. This option is only supported on MongoDB 3.4 and above.

The distinct() method obeys the read_preference of this Collection.
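
A short sketch, with hypothetical category and qty fields:

# Distinct values of 'category' among documents that are in stock.
categories = db.test.distinct('category', {'qty': {'$gt': 0}})
print(categories)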

Parameters:
  • key: name of the field for which we want to get the distinct values
  • filter (optional): A query document that specifies the documents from which to retrieve the distinct values.
  • session (optional): a ClientSession.
  • **kwargs (optional): See list of options above.

Changed in version 3.6: Added session parameter.

Changed in version 3.4: Support the collation option.

create_index(keys, session=None, **kwargs)

Creates an index on this collection.

Takes either a single key or a list of (key, direction) pairs. The key(s) must be an instance of basestring (str in python 3), and the direction(s) must be one of (ASCENDING, DESCENDING, GEO2D, GEOHAYSTACK, GEOSPHERE, HASHED, TEXT).

To create a single key ascending index on the key 'mike' we just use a string argument:

>>> my_collection.create_index("mike")

For a compound index on 'mike' descending and 'eliot' ascending we need to use a list of tuples:

>>> my_collection.create_index([("mike", pymongo.DESCENDING),
...                             ("eliot", pymongo.ASCENDING)])

All optional index creation parameters should be passed as keyword arguments to this method. For example:

>>> my_collection.create_index([("mike", pymongo.DESCENDING)],
...                            background=True)

Valid options include, but are not limited to:

  • name: custom name to use for this index - if none is given, a name will be generated.
  • unique: if True creates a uniqueness constraint on the index.
  • background: if True this index should be created in the background.
  • sparse: if True, omit from the index any documents that lack the indexed field.
  • bucketSize: for use with geoHaystack indexes. Number of documents to group together within a certain proximity to a given longitude and latitude.
  • min: minimum value for keys in a GEO2D index.
  • max: maximum value for keys in a GEO2D index.
  • expireAfterSeconds: <int> Used to create an expiring (TTL) collection. MongoDB will automatically delete documents from this collection after <int> seconds. The indexed field must be a UTC datetime or the data will not expire.
  • partialFilterExpression: A document that specifies a filter for a partial index. Requires server version >=3.2.
  • collation (optional): An instance of Collation. This option is only supported on MongoDB 3.4 and above.
  • wildcardProjection: Allows users to include or exclude specific field paths from a wildcard index using the {"$**": 1} key pattern. Requires server version >= 4.2.

See the MongoDB documentation for a full list of supported options by server version.
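
For illustration, two of the options above in use; the collection and field names are hypothetical:

import pymongo

# TTL index: documents expire 3600 seconds after their 'created_at' time.
db.sessions.create_index('created_at', expireAfterSeconds=3600)

# Partial unique index: uniqueness is enforced only where 'email' exists.
db.users.create_index(
    [('email', pymongo.ASCENDING)],
    unique=True,
    partialFilterExpression={'email': {'$exists': True}})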

Warning

dropDups is not supported by MongoDB 3.0 or newer. The option is silently ignored by the server and unique index builds using the option will fail if a duplicate value is detected.

Note

The write_concern of this collection is automatically applied to this operation when using MongoDB >= 3.4.

Parameters:
  • keys: a single key or a list of (key, direction) pairs specifying the index to create
  • session (optional): a ClientSession.
  • **kwargs (optional): any additional index creation options (see the above list) should be passed as keyword arguments

Changed in version 3.6: Added session parameter. Added support for passing maxTimeMS in kwargs.

Changed in version 3.4: Apply this collection’s write concern automatically to this operation when connected to MongoDB >= 3.4. Support the collation option.

Changed in version 3.2: Added partialFilterExpression to support partial indexes.

Changed in version 3.0: Renamed key_or_list to keys. Removed the cache_for option. create_index() no longer caches index names. Removed support for the drop_dups and bucket_size aliases.

See also

The MongoDB documentation on

indexes

create_indexes(indexes, session=None, **kwargs)

Create one or more indexes on this collection.

>>> from pymongo import IndexModel, ASCENDING, DESCENDING
>>> index1 = IndexModel([("hello", DESCENDING),
...                      ("world", ASCENDING)], name="hello_world")
>>> index2 = IndexModel([("goodbye", DESCENDING)])
>>> db.test.create_indexes([index1, index2])
["hello_world", "goodbye_-1"]
Parameters:
  • indexes: A list of IndexModel instances.
  • session (optional): a ClientSession.
  • **kwargs (optional): optional arguments to the createIndexes command (like maxTimeMS) can be passed as keyword arguments.

Note

create_indexes uses the createIndexes command introduced in MongoDB 2.6 and cannot be used with earlier versions.

Note

The write_concern of this collection is automatically applied to this operation when using MongoDB >= 3.4.

Changed in version 3.6: Added session parameter. Added support for arbitrary keyword arguments.

Changed in version 3.4: Apply this collection’s write concern automatically to this operation when connected to MongoDB >= 3.4.

New in version 3.0.

drop_index(index_or_name, session=None, **kwargs)

Drops the specified index on this collection.

Can be used on non-existent collections or collections with no indexes. Raises OperationFailure on an error (e.g. trying to drop an index that does not exist). index_or_name can be either an index name (as returned by create_index), or an index specifier (as passed to create_index). An index specifier should be a list of (key, direction) pairs. Raises TypeError if index is not an instance of (str, unicode, list).

Warning

if a custom name was used on index creation (by passing the name parameter to create_index() or ensure_index()) the index must be dropped by name.

Parameters:
  • index_or_name: index (or name of index) to drop
  • session (optional): a ClientSession.
  • **kwargs (optional): optional arguments to the dropIndexes command (like maxTimeMS) can be passed as keyword arguments.

Note

The write_concern of this collection is automatically applied to this operation when using MongoDB >= 3.4.

Changed in version 3.6: Added session parameter. Added support for arbitrary keyword arguments.

Changed in version 3.4: Apply this collection’s write concern automatically to this operation when connected to MongoDB >= 3.4.

drop_indexes(session=None, **kwargs)

Drops all indexes on this collection.

Can be used on non-existent collections or collections with no indexes. Raises OperationFailure on an error.

Parameters:
  • session (optional): a ClientSession.
  • **kwargs (optional): optional arguments to the dropIndexes command (like maxTimeMS) can be passed as keyword arguments.

Note

The write_concern of this collection is automatically applied to this operation when using MongoDB >= 3.4.

Changed in version 3.6: Added session parameter. Added support for arbitrary keyword arguments.

Changed in version 3.4: Apply this collection’s write concern automatically to this operation when connected to MongoDB >= 3.4.

reindex(session=None, **kwargs)

Rebuilds all indexes on this collection.

Parameters:
  • session (optional): a ClientSession.
  • **kwargs (optional): optional arguments to the reIndex command (like maxTimeMS) can be passed as keyword arguments.

Warning

reindex blocks all other operations (indexes are built in the foreground) and will be slow for large collections.

Changed in version 3.6: Added session parameter. Added support for arbitrary keyword arguments.

Changed in version 3.5: We no longer apply this collection’s write concern to this operation. MongoDB 3.4 silently ignored the write concern. MongoDB 3.6+ returns an error if we include the write concern.

Changed in version 3.4: Apply this collection’s write concern automatically to this operation when connected to MongoDB >= 3.4.

list_indexes(session=None)

Get a cursor over the index documents for this collection.

>>> for index in db.test.list_indexes():
...     print(index)
...
SON([(u'v', 1), (u'key', SON([(u'_id', 1)])),
     (u'name', u'_id_'), (u'ns', u'test.test')])
Parameters:
  • session (optional): a ClientSession.
Returns:

An instance of CommandCursor.

Changed in version 3.6: Added session parameter.

New in version 3.0.

index_information(session=None)

Get information on this collection’s indexes.

Returns a dictionary where the keys are index names (as returned by create_index()) and the values are dictionaries containing information about each index. The dictionary is guaranteed to contain at least a single key, "key" which is a list of (key, direction) pairs specifying the index (as passed to create_index()). It will also contain any other metadata about the indexes, except for the "ns" and "name" keys, which are cleaned. Example output might look like this:

>>> db.test.create_index("x", unique=True)
u'x_1'
>>> db.test.index_information()
{u'_id_': {u'key': [(u'_id', 1)]},
 u'x_1': {u'unique': True, u'key': [(u'x', 1)]}}
Parameters:
  • session (optional): a ClientSession.

Changed in version 3.6: Added session parameter.

drop(session=None)

Alias for drop_collection().

Parameters:
  • session (optional): a ClientSession.

The following two calls are equivalent:

>>> db.foo.drop()
>>> db.drop_collection("foo")

Changed in version 3.7: drop() now respects this Collection’s write_concern.

Changed in version 3.6: Added session parameter.

rename(new_name, session=None, **kwargs)

Rename this collection.

If operating in auth mode, client must be authorized as an admin to perform this operation. Raises TypeError if new_name is not an instance of basestring (str in python 3). Raises InvalidName if new_name is not a valid collection name.

Parameters:
  • new_name: new name for this collection
  • session (optional): a ClientSession.
  • **kwargs (optional): additional arguments to the rename command may be passed as keyword arguments to this helper method (e.g. dropTarget=True; see the example below)
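
A minimal sketch, assuming hypothetical collection names:

>>> db.old_name.rename("new_name", dropTarget=True)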

Note

The write_concern of this collection is automatically applied to this operation when using MongoDB >= 3.4.

Changed in version 3.6: Added session parameter.

Changed in version 3.4: Apply this collection’s write concern automatically to this operation when connected to MongoDB >= 3.4.

options(session=None)

Get the options set on this collection.

Returns a dictionary of options and their values - see create_collection() for more information on the possible options. Returns an empty dictionary if the collection has not been created yet.

Parameters:
  • session (optional): a ClientSession.

Changed in version 3.6: Added session parameter.

map_reduce(map, reduce, out, full_response=False, session=None, **kwargs)

Perform a map/reduce operation on this collection.

If full_response is False (default) returns a Collection instance containing the results of the operation. Otherwise, returns the full response from the server to the map reduce command.

Parameters:
  • map: map function (as a JavaScript string)

  • reduce: reduce function (as a JavaScript string)

  • out: output collection name or out object (dict). See the map reduce command documentation for available options. Note: out options are order sensitive. SON can be used to specify multiple options. e.g. SON([('replace', <collection name>), ('db', <database name>)])

  • full_response (optional): if True, return full response to this command - otherwise just return the result collection

  • session (optional): a ClientSession.

  • **kwargs (optional): additional arguments to the map reduce command may be passed as keyword arguments to this helper method, e.g.:

    >>> db.test.map_reduce(map, reduce, "myresults", limit=2)
    

Note

The map_reduce() method does not obey the read_preference of this Collection. To run mapReduce on a secondary use the inline_map_reduce() method instead.

Note

The write_concern of this collection is automatically applied to this operation (if the output is not inline) when using MongoDB >= 3.4.

Changed in version 3.6: Added session parameter.

Changed in version 3.4: Apply this collection’s write concern automatically to this operation when connected to MongoDB >= 3.4.

Changed in version 3.4: Added the collation option.

Changed in version 2.2: Removed deprecated arguments: merge_output and reduce_output

See also

The MongoDB documentation on

mapreduce

inline_map_reduce(map, reduce, full_response=False, session=None, **kwargs)

Perform an inline map/reduce operation on this collection.

Perform the map/reduce operation on the server in RAM. A result collection is not created. The result set is returned as a list of documents.

If full_response is False (default) returns the result documents in a list. Otherwise, returns the full response from the server to the map reduce command.

The inline_map_reduce() method obeys the read_preference of this Collection.

Parameters:
  • map: map function (as a JavaScript string)

  • reduce: reduce function (as a JavaScript string)

  • full_response (optional): if True, return full response to this command - otherwise just return the result documents in a list

  • session (optional): a ClientSession.

  • **kwargs (optional): additional arguments to the map reduce command may be passed as keyword arguments to this helper method, e.g.:

    >>> db.test.inline_map_reduce(map, reduce, limit=2)
    

Changed in version 3.6: Added session parameter.

Changed in version 3.4: Added the collation option.

parallel_scan(num_cursors, session=None, **kwargs)

DEPRECATED: Scan this entire collection in parallel.

Returns a list of up to num_cursors cursors that can be iterated concurrently. As long as the collection is not modified during scanning, each document appears once in one of the cursors’ result sets.

For example, to process each document in a collection using some thread-safe process_document() function:

>>> def process_cursor(cursor):
...     for document in cursor:
...         # Some thread-safe processing function:
...         process_document(document)
>>>
>>> # Get up to 4 cursors.
...
>>> cursors = collection.parallel_scan(4)
>>> threads = [
...     threading.Thread(target=process_cursor, args=(cursor,))
...     for cursor in cursors]
>>>
>>> for thread in threads:
...     thread.start()
>>>
>>> for thread in threads:
...     thread.join()
>>>
>>> # All documents have now been processed.

The parallel_scan() method obeys the read_preference of this Collection.

Parameters:
  • num_cursors: the number of cursors to return
  • session (optional): a ClientSession.
  • **kwargs: additional options for the parallelCollectionScan command can be passed as keyword arguments.

Note

Requires server version >= 2.5.5.

Changed in version 3.7: Deprecated.

Changed in version 3.6: Added session parameter.

Changed in version 3.4: Added back support for arbitrary keyword arguments. MongoDB 3.4 adds support for maxTimeMS as an option to the parallelCollectionScan command.

Changed in version 3.0: Removed support for arbitrary keyword arguments, since the parallelCollectionScan command has no optional arguments.

initialize_unordered_bulk_op(bypass_document_validation=False)

DEPRECATED - Initialize an unordered batch of write operations.

Operations will be performed on the server in arbitrary order, possibly in parallel. All operations will be attempted.

Parameters:
  • bypass_document_validation: (optional) If True, allows the write to opt out of document level validation. Default is False.

Returns a BulkOperationBuilder instance.

See Unordered Bulk Write Operations for examples.

Note

bypass_document_validation requires server version >= 3.2

Changed in version 3.5: Deprecated. Use bulk_write() instead.

Changed in version 3.2: Added bypass_document_validation support

New in version 2.7.

initialize_ordered_bulk_op(bypass_document_validation=False)

DEPRECATED - Initialize an ordered batch of write operations.

Operations will be performed on the server serially, in the order provided. If an error occurs all remaining operations are aborted.

Parameters:
  • bypass_document_validation: (optional) If True, allows the write to opt out of document level validation. Default is False.

Returns a BulkOperationBuilder instance.

See Ordered Bulk Write Operations for examples.

Note

bypass_document_validation requires server version >= 3.2

Changed in version 3.5: Deprecated. Use bulk_write() instead.

Changed in version 3.2: Added bypass_document_validation support

New in version 2.7.

group(key, condition, initial, reduce, finalize=None, **kwargs)

Perform a query similar to an SQL group by operation.

DEPRECATED - The group command was deprecated in MongoDB 3.4. The group() method is deprecated and will be removed in PyMongo 4.0. Use aggregate() with the $group stage or map_reduce() instead.

Changed in version 3.5: Deprecated the group method.

Changed in version 3.4: Added the collation option.

Changed in version 2.2: Removed deprecated argument: command

count(filter=None, session=None, **kwargs)

DEPRECATED - Get the number of documents in this collection.

The count() method is deprecated and not supported in a transaction. Please use count_documents() or estimated_document_count() instead.

All optional count parameters should be passed as keyword arguments to this method. Valid options include:

  • skip (int): The number of matching documents to skip before returning results.
  • limit (int): The maximum number of documents to count. A limit of 0 (the default) is equivalent to setting no limit.
  • maxTimeMS (int): The maximum amount of time to allow the count command to run, in milliseconds.
  • collation (optional): An instance of Collation. This option is only supported on MongoDB 3.4 and above.
  • hint (string or list of tuples): The index to use. Specify either the index name as a string or the index specification as a list of tuples (e.g. [('a', pymongo.ASCENDING), ('b', pymongo.ASCENDING)]).

The count() method obeys the read_preference of this Collection.

Note

When migrating from count() to count_documents() the following query operators must be replaced:

Operator     Replacement
$where       $expr
$near        $geoWithin with $center
$nearSphere  $geoWithin with $centerSphere

$expr requires MongoDB 3.6+
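
For instance, a $where query could be migrated as follows (the field names are hypothetical; the $expr form needs MongoDB 3.6+):

>>> # Deprecated:
>>> collection.count({"$where": "this.a == this.b"})
>>> # Replacement using $expr:
>>> collection.count_documents({"$expr": {"$eq": ["$a", "$b"]}})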

Parameters:
  • filter (optional): A query document that selects which documents to count in the collection.
  • session (optional): a ClientSession.
  • **kwargs (optional): See list of options above.

Changed in version 3.7: Deprecated.

Changed in version 3.6: Added session parameter.

Changed in version 3.4: Support the collation option.

insert(doc_or_docs, manipulate=True, check_keys=True, continue_on_error=False, **kwargs)

Insert a document or documents into this collection.

DEPRECATED - Use insert_one() or insert_many() instead.

Changed in version 3.0: Removed the safe parameter. Pass w=0 for unacknowledged write operations.

save(to_save, manipulate=True, check_keys=True, **kwargs)

Save a document in this collection.

DEPRECATED - Use insert_one() or replace_one() instead.

Changed in version 3.0: Removed the safe parameter. Pass w=0 for unacknowledged write operations.

update(spec, document, upsert=False, manipulate=False, multi=False, check_keys=True, **kwargs)

Update a document or documents in this collection.

DEPRECATED - Use replace_one(), update_one(), or update_many() instead.

Changed in version 3.0: Removed the safe parameter. Pass w=0 for unacknowledged write operations.

remove(spec_or_id=None, multi=True, **kwargs)

Remove a document or documents from this collection.

DEPRECATED - Use delete_one() or delete_many() instead.

Changed in version 3.0: Removed the safe parameter. Pass w=0 for unacknowledged write operations.

find_and_modify(query={}, update=None, upsert=False, sort=None, full_response=False, manipulate=False, **kwargs)

Update and return an object.

DEPRECATED - Use find_one_and_delete(), find_one_and_replace(), or find_one_and_update() instead.

ensure_index(key_or_list, cache_for=300, **kwargs)

DEPRECATED - Ensures that an index exists on this collection.

Changed in version 3.0: DEPRECATED

command_cursor – Tools for iterating over MongoDB command results

CommandCursor class to iterate over command results.

class pymongo.command_cursor.CommandCursor(collection, cursor_info, address, retrieved=0, batch_size=0, max_await_time_ms=None, session=None, explicit_session=False)

Create a new command cursor.

The parameter ‘retrieved’ is unused.

address

The (host, port) of the server used, or None.

New in version 3.0.

alive

Does this cursor have the potential to return more data?

Even if alive is True, next() can raise StopIteration. Best to use a for loop:

for doc in collection.aggregate(pipeline):
    print(doc)

Note

alive can be True while iterating a cursor from a failed server. In this case alive will return False after next() fails to retrieve the next batch of results from the server.

batch_size(batch_size)

Limits the number of documents returned in one batch. Each batch requires a round trip to the server. It can be adjusted to optimize performance and limit data transfer.

Note

batch_size cannot override MongoDB’s internal limits on the amount of data it will return to the client in a single batch (i.e. if you set batch size to 1,000,000,000, MongoDB will currently only return 4-16MB of results per batch).

Raises TypeError if batch_size is not an integer. Raises ValueError if batch_size is less than 0.

Parameters:
  • batch_size: The size of each batch of results requested.
close()

Explicitly close / kill this cursor.

cursor_id

Returns the id of the cursor.

next()

Advance the cursor.

session

The cursor’s ClientSession, or None.

New in version 3.6.

class pymongo.command_cursor.RawBatchCommandCursor(collection, cursor_info, address, retrieved=0, batch_size=0, max_await_time_ms=None, session=None, explicit_session=False)

Create a new cursor / iterator over raw batches of BSON data.

Should not be called directly by application developers - see aggregate_raw_batches() instead.

See also

The MongoDB documentation on

cursors

cursor – Tools for iterating over MongoDB query results

Cursor class to iterate over Mongo query results.

class pymongo.cursor.CursorType
NON_TAILABLE

The standard cursor type.

TAILABLE

The tailable cursor type.

Tailable cursors are only for use with capped collections. They are not closed when the last data is retrieved but are kept open and the cursor location marks the final document position. If more data is received iteration of the cursor will continue from the last document received.

TAILABLE_AWAIT

A tailable cursor with the await option set.

Creates a tailable cursor that will wait for a few seconds after returning the full result set so that it can capture and return additional data added during the query.

EXHAUST

An exhaust cursor.

MongoDB will stream batched results to the client without waiting for the client to request each batch, reducing latency.

class pymongo.cursor.Cursor(collection, filter=None, projection=None, skip=0, limit=0, no_cursor_timeout=False, cursor_type=CursorType.NON_TAILABLE, sort=None, allow_partial_results=False, oplog_replay=False, modifiers=None, batch_size=0, manipulate=True, collation=None, hint=None, max_scan=None, max_time_ms=None, max=None, min=None, return_key=False, show_record_id=False, snapshot=False, comment=None)

Create a new cursor.

Should not be called directly by application developers - see find() instead.

See also

The MongoDB documentation on

cursors

c[index]

See __getitem__().

__getitem__(index)

Get a single document or a slice of documents from this cursor.

Raises InvalidOperation if this cursor has already been used.

To get a single document use an integral index, e.g.:

>>> db.test.find()[50]

An IndexError will be raised if the index is negative or greater than the number of documents in this cursor. Any limit previously applied to this cursor will be ignored.

To get a slice of documents use a slice index, e.g.:

>>> db.test.find()[20:25]

This will return this cursor with a limit of 5 and skip of 20 applied. Using a slice index will override any prior limits or skips applied to this cursor (including those applied through previous calls to this method). Raises IndexError when the slice has a step, a negative start value, or a stop value less than or equal to the start value.

Parameters:
  • index: An integer or slice index to be applied to this cursor
add_option(mask)

Set arbitrary query flags using a bitmask.

To set the tailable flag: cursor.add_option(2)

address

The (host, port) of the server used, or None.

Changed in version 3.0: Renamed from “conn_id”.

alive

Does this cursor have the potential to return more data?

This is mostly useful with tailable cursors since they will stop iterating even though they may return more results in the future.

With regular cursors, simply use a for loop instead of alive:

for doc in collection.find():
    print(doc)

Note

Even if alive is True, next() can raise StopIteration. alive can also be True while iterating a cursor from a failed server. In this case alive will return False after next() fails to retrieve the next batch of results from the server.

batch_size(batch_size)

Limits the number of documents returned in one batch. Each batch requires a round trip to the server. It can be adjusted to optimize performance and limit data transfer.

Note

batch_size cannot override MongoDB’s internal limits on the amount of data it will return to the client in a single batch (i.e. if you set batch size to 1,000,000,000, MongoDB will currently only return 4-16MB of results per batch).

Raises TypeError if batch_size is not an integer. Raises ValueError if batch_size is less than 0. Raises InvalidOperation if this Cursor has already been used. The last batch_size applied to this cursor takes precedence.

Parameters:
  • batch_size: The size of each batch of results requested.
clone()

Get a clone of this cursor.

Returns a new Cursor instance with options matching those that have been set on the current instance. The clone will be completely unevaluated, even if the current instance has been partially or completely evaluated.

close()

Explicitly close / kill this cursor.

collation(collation)

Adds a Collation to this query.

This option is only supported on MongoDB 3.4 and above.

Raises TypeError if collation is not an instance of Collation or a dict. Raises InvalidOperation if this Cursor has already been used. Only the last collation applied to this cursor has any effect.

Parameters:
  • collation: An instance of Collation.
collection

The Collection that this Cursor is iterating.

comment(comment)

Adds a ‘comment’ to the cursor.

http://docs.mongodb.org/manual/reference/operator/comment/

Parameters:
  • comment: A string to attach to the query to help interpret and trace the operation in the server logs and in profile data.

New in version 2.7.

count(with_limit_and_skip=False)

DEPRECATED - Get the size of the results set for this query.

The count() method is deprecated and not supported in a transaction. Please use count_documents() instead.

Returns the number of documents in the results set for this query. Does not take limit() and skip() into account by default - set with_limit_and_skip to True if that is the desired behavior. Raises OperationFailure on a database error.

When used with MongoDB >= 2.6, count() uses any hint() applied to the query. In the following example the hint is passed to the count command:

collection.find({'field': 'value'}).hint('field_1').count()

The count() method obeys the read_preference of the Collection instance on which find() was called.

Parameters:
  • with_limit_and_skip (optional): take any limit() or skip() that has been applied to this cursor into account when getting the count

Note

The with_limit_and_skip parameter requires server version >= 1.1.4

Changed in version 3.7: Deprecated.

Changed in version 2.8: The count() method now supports hint().

cursor_id

Returns the id of the cursor

Useful if you need to manage cursor ids and want to handle killing cursors manually using kill_cursors()

New in version 2.2.

distinct(key)

Get a list of distinct values for key among all documents in the result set of this query.

Raises TypeError if key is not an instance of basestring (str in python 3).

The distinct() method obeys the read_preference of the Collection instance on which find() was called.

Parameters:
  • key: name of key for which we want to get the distinct values
explain()

Returns an explain plan record for this cursor.

Note

Starting with MongoDB 3.2 explain() uses the default verbosity mode of the explain command, allPlansExecution. To use a different verbosity use command() to run the explain command directly.

See also

The MongoDB documentation on

explain

hint(index)

Adds a ‘hint’, telling Mongo the proper index to use for the query.

Judicious use of hints can greatly improve query performance. When doing a query on multiple fields (at least one of which is indexed) pass the indexed field as a hint to the query. Raises OperationFailure if the provided hint requires an index that does not exist on this collection, and raises InvalidOperation if this cursor has already been used.

index should be an index as passed to create_index() (e.g. [('field', ASCENDING)]) or the name of the index. If index is None any existing hint for this query is cleared. The last hint applied to this cursor takes precedence over all others.

Parameters:
  • index: index to hint on (as an index specifier)

Changed in version 2.8: The hint() method accepts the name of the index.

limit(limit)

Limits the number of results to be returned by this cursor.

Raises TypeError if limit is not an integer. Raises InvalidOperation if this Cursor has already been used. The last limit applied to this cursor takes precedence. A limit of 0 is equivalent to no limit.

Parameters:
  • limit: the number of results to return

See also

The MongoDB documentation on

limit

max(spec)

Adds the max operator, which specifies the upper bound for a specific index.

When using max, hint() should also be configured to ensure the query uses the expected index; starting in MongoDB 4.2, hint() will be required.

Parameters:
  • spec: a list of field, limit pairs specifying the exclusive upper bound for all keys of a specific index in order.

Changed in version 3.8: Deprecated cursors that use max without a hint().

New in version 2.7.

max_await_time_ms(max_await_time_ms)

Specifies a time limit for a getMore operation on a TAILABLE_AWAIT cursor. For all other types of cursor max_await_time_ms is ignored.

Raises TypeError if max_await_time_ms is not an integer or None. Raises InvalidOperation if this Cursor has already been used.

Note

max_await_time_ms requires server version >= 3.2

Parameters:
  • max_await_time_ms: the time limit after which the operation is aborted

New in version 3.2.

max_scan(max_scan)

DEPRECATED - Limit the number of documents to scan when performing the query.

Raises InvalidOperation if this cursor has already been used. Only the last max_scan() applied to this cursor has any effect.

Parameters:
  • max_scan: the maximum number of documents to scan

Changed in version 3.7: Deprecated max_scan(). Support for this option is deprecated in MongoDB 4.0. Use max_time_ms() instead to limit server side execution time.

max_time_ms(max_time_ms)

Specifies a time limit for a query operation. If the specified time is exceeded, the operation will be aborted and ExecutionTimeout is raised. If max_time_ms is None no limit is applied.

Raises TypeError if max_time_ms is not an integer or None. Raises InvalidOperation if this Cursor has already been used.

Parameters:
  • max_time_ms: the time limit after which the operation is aborted
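
A short sketch of handling the resulting timeout (the 500 millisecond limit is arbitrary):

>>> from pymongo.errors import ExecutionTimeout
>>> try:
...     docs = list(collection.find().max_time_ms(500))
... except ExecutionTimeout:
...     print("query exceeded the 500ms time limit")
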
min(spec)

Adds the min operator, which specifies the lower bound for a specific index.

When using min, hint() should also be configured to ensure the query uses the expected index; starting in MongoDB 4.2, hint() will be required.

Parameters:
  • spec: a list of field, limit pairs specifying the inclusive lower bound for all keys of a specific index in order.

Changed in version 3.8: Deprecated cursors that use min without a hint().

New in version 2.7.
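
A combined sketch of min(), max(), and the hint() they should be paired with (the 'age' index and the bounds are hypothetical):

>>> from pymongo import ASCENDING
>>> cursor = (collection.find()
...           .min([("age", 21)])
...           .max([("age", 65)])
...           .hint([("age", ASCENDING)]))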

next()

Advance the cursor.

remove_option(mask)

Unset arbitrary query flags using a bitmask.

To unset the tailable flag: cursor.remove_option(2)

retrieved

The number of documents retrieved so far.

rewind()

Rewind this cursor to its unevaluated state.

Reset this cursor if it has been partially or completely evaluated. Any options that are present on the cursor will remain in effect. Future iterating performed on this cursor will cause new queries to be sent to the server, even if the resultant data has already been retrieved by this cursor.

session

The cursor’s ClientSession, or None.

New in version 3.6.

skip(skip)

Skips the first skip results of this cursor.

Raises TypeError if skip is not an integer. Raises ValueError if skip is less than 0. Raises InvalidOperation if this Cursor has already been used. The last skip applied to this cursor takes precedence.

Parameters:
  • skip: the number of results to skip
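
skip() is often combined with limit() for paging; a minimal sketch (the page numbering scheme is hypothetical):

>>> page, page_size = 3, 20
>>> for doc in collection.find().skip(page * page_size).limit(page_size):
...     print(doc)
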
sort(key_or_list, direction=None)

Sorts this cursor’s results.

Pass a field name and a direction, either ASCENDING or DESCENDING:

for doc in collection.find().sort('field', pymongo.ASCENDING):
    print(doc)

To sort by multiple fields, pass a list of (key, direction) pairs:

for doc in collection.find().sort([
        ('field1', pymongo.ASCENDING),
        ('field2', pymongo.DESCENDING)]):
    print(doc)

Beginning with MongoDB version 2.6, text search results can be sorted by relevance:

cursor = db.test.find(
    {'$text': {'$search': 'some words'}},
    {'score': {'$meta': 'textScore'}})

# Sort by 'score' field.
cursor.sort([('score', {'$meta': 'textScore'})])

for doc in cursor:
    print(doc)

Raises InvalidOperation if this cursor has already been used. Only the last sort() applied to this cursor has any effect.

Parameters:
  • key_or_list: a single key or a list of (key, direction) pairs specifying the keys to sort on
  • direction (optional): only used if key_or_list is a single key, if not given ASCENDING is assumed
where(code)

Adds a $where clause to this query.

The code argument must be an instance of basestring (str in python 3) or Code containing a JavaScript expression. This expression will be evaluated for each document scanned. Only those documents for which the expression evaluates to true will be returned as results. The keyword this refers to the object currently being scanned.

Raises TypeError if code is not an instance of basestring (str in python 3). Raises InvalidOperation if this Cursor has already been used. Only the last call to where() applied to a Cursor has any effect.

Parameters:
  • code: JavaScript expression to use as a filter
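
A minimal sketch, assuming hypothetical fields 'a' and 'b':

>>> for doc in db.test.find().where("this.a == this.b"):
...     print(doc)
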
class pymongo.cursor.RawBatchCursor(collection, filter=None, projection=None, skip=0, limit=0, no_cursor_timeout=False, cursor_type=CursorType.NON_TAILABLE, sort=None, allow_partial_results=False, oplog_replay=False, modifiers=None, batch_size=0, collation=None, hint=None, max_scan=None, max_time_ms=None, max=None, min=None, return_key=False, show_record_id=False, snapshot=False, comment=None)

Create a new cursor / iterator over raw batches of BSON data.

Should not be called directly by application developers - see find_raw_batches() instead.

See also

The MongoDB documentation on

cursors

cursor_manager – Managers to handle when cursors are killed after being closed

DEPRECATED - A manager to handle when cursors are killed after they are closed.

New cursor managers should be defined as subclasses of CursorManager and can be installed on a client by calling set_cursor_manager().

Changed in version 3.3: Deprecated, for real this time.

Changed in version 3.0: Undeprecated. close() now requires an address argument. The BatchCursorManager class is removed.

class pymongo.cursor_manager.CursorManager(client)

Instantiate the manager.

Parameters:
  • client: a MongoClient
close(cursor_id, address)

Kill a cursor.

Raises TypeError if cursor_id is not an instance of (int, long).

Parameters:
  • cursor_id: cursor id to close
  • address: the cursor’s server’s (host, port) pair

Changed in version 3.0: Now requires an address argument.

database – Database level operations

Database level operations.

pymongo.auth.MECHANISMS = frozenset({'PLAIN', 'DEFAULT', 'SCRAM-SHA-1', 'MONGODB-CR', 'SCRAM-SHA-256', 'MONGODB-X509', 'GSSAPI'})

The authentication mechanisms supported by PyMongo.

pymongo.OFF = 0

No database profiling.

pymongo.SLOW_ONLY = 1

Only profile slow operations.

pymongo.ALL = 2

Profile all operations.

class pymongo.database.Database(client, name, codec_options=None, read_preference=None, write_concern=None, read_concern=None)

Get a database by client and name.

Raises TypeError if name is not an instance of basestring (str in python 3). Raises InvalidName if name is not a valid database name.

Parameters:
  • client: A MongoClient instance.
  • name: The database name.
  • codec_options (optional): An instance of CodecOptions. If None (the default) client.codec_options is used.
  • read_preference (optional): The read preference to use. If None (the default) client.read_preference is used.
  • write_concern (optional): An instance of WriteConcern. If None (the default) client.write_concern is used.
  • read_concern (optional): An instance of ReadConcern. If None (the default) client.read_concern is used.

See also

The MongoDB documentation on

databases

Changed in version 3.2: Added the read_concern option.

Changed in version 3.0: Added the codec_options, read_preference, and write_concern options. Database no longer returns an instance of Collection for attribute names with leading underscores. You must use dict-style lookups instead:

db['__my_collection__']

Not:

db.__my_collection__

db[collection_name] || db.collection_name

Get the collection_name Collection of Database db.

Raises InvalidName if an invalid collection name is used.

Note

Use dictionary style access if collection_name is an attribute of the Database class, e.g. db[collection_name].

codec_options

Read only access to the CodecOptions of this instance.

read_preference

Read only access to the read preference of this instance.

Changed in version 3.0: The read_preference attribute is now read only.

write_concern

Read only access to the WriteConcern of this instance.

Changed in version 3.0: The write_concern attribute is now read only.

read_concern

Read only access to the ReadConcern of this instance.

New in version 3.2.

add_son_manipulator(manipulator)

Add a new son manipulator to this database.

DEPRECATED - add_son_manipulator is deprecated.

Changed in version 3.0: Deprecated add_son_manipulator.

add_user(name, password=None, read_only=None, session=None, **kwargs)

DEPRECATED: Create user name with password password.

Add a new user with permissions for this Database.

Note

Will change the password if user name already exists.

Note

add_user is deprecated and will be removed in PyMongo 4.0. Starting with MongoDB 2.6 user management is handled with four database commands, createUser, usersInfo, updateUser, and dropUser.

To create a user:

db.command("createUser", "admin", pwd="password", roles=["root"])

To create a read-only user:

db.command("createUser", "user", pwd="password", roles=["read"])

To change a password:

db.command("updateUser", "user", pwd="newpassword")

Or change roles:

db.command("updateUser", "user", roles=["readWrite"])

Warning

Never create or modify users over an insecure network without the use of TLS. See TLS/SSL and PyMongo for more information.

Parameters:
  • name: the name of the user to create
  • password (optional): the password of the user to create. Can not be used with the userSource argument.
  • read_only (optional): if True the user will be read only
  • **kwargs (optional): optional fields for the user document (e.g. userSource, otherDBRoles, or roles). See http://docs.mongodb.org/manual/reference/privilege-documents for more information.
  • session (optional): a ClientSession.

Changed in version 3.7: Added support for SCRAM-SHA-256 users with MongoDB 4.0 and later.

Changed in version 3.6: Added session parameter. Deprecated add_user.

Changed in version 2.5: Added kwargs support for optional fields introduced in MongoDB 2.4

Changed in version 2.2: Added support for read only users

aggregate(pipeline, session=None, **kwargs)

Perform a database-level aggregation.

See the aggregation pipeline documentation for a list of stages that are supported.

Introduced in MongoDB 3.6.

# Lists all operations currently running on the server.
with client.admin.aggregate([{"$currentOp": {}}]) as cursor:
    for operation in cursor:
        print(operation)

All optional aggregate command parameters should be passed as keyword arguments to this method. Valid options include, but are not limited to:

  • allowDiskUse (bool): Enables writing to temporary files. When set to True, aggregation stages can write data to the _tmp subdirectory of the --dbpath directory. The default is False.
  • maxTimeMS (int): The maximum amount of time to allow the operation to run in milliseconds.
  • batchSize (int): The maximum number of documents to return per batch. Ignored if the connected mongod or mongos does not support returning aggregate results using a cursor.
  • collation (optional): An instance of Collation.

The aggregate() method obeys the read_preference of this Database, except when $out or $merge are used, in which case PRIMARY is used.

Note

This method does not support the ‘explain’ option. Please use command() instead.

Note

The write_concern of this database is automatically applied to this operation.

Parameters:
  • pipeline: a list of aggregation pipeline stages
  • session (optional): a ClientSession.
  • **kwargs (optional): See list of options above.
Returns:

A CommandCursor over the result set.

New in version 3.9.

authenticate(name=None, password=None, source=None, mechanism='DEFAULT', **kwargs)

DEPRECATED: Authenticate to use this database.

Warning

Starting in MongoDB 3.6, calling authenticate() invalidates all existing cursors. It may also leave logical sessions open on the server for up to 30 minutes until they time out.

Authentication lasts for the life of the underlying client instance, or until logout() is called.

Raises TypeError if (required) name, (optional) password, or (optional) source is not an instance of basestring (str in python 3).

Note

  • This method authenticates the current connection, and will also cause all new socket connections in the underlying client instance to be authenticated automatically.
  • Authenticating more than once on the same database with different credentials is not supported. You must call logout() before authenticating with new credentials.
  • When sharing a client instance between multiple threads, all threads will share the authentication. If you need different authentication profiles for different purposes you must use distinct client instances.
Parameters:
  • name: the name of the user to authenticate. Optional when mechanism is MONGODB-X509 and the MongoDB server version is >= 3.4.
  • password (optional): the password of the user to authenticate. Not used with GSSAPI or MONGODB-X509 authentication.
  • source (optional): the database to authenticate on. If not specified the current database is used.
  • mechanism (optional): See MECHANISMS for options. If no mechanism is specified, PyMongo automatically uses MONGODB-CR when connected to a pre-3.0 version of MongoDB, SCRAM-SHA-1 when connected to MongoDB 3.0 through 3.6, and negotiates the mechanism to use (SCRAM-SHA-1 or SCRAM-SHA-256) when connected to MongoDB 4.0+.
  • authMechanismProperties (optional): Used to specify authentication mechanism specific options. To specify the service name for GSSAPI authentication pass authMechanismProperties='SERVICE_NAME:<service name>'

Changed in version 3.7: Added support for SCRAM-SHA-256 with MongoDB 4.0 and later.

Changed in version 3.5: Deprecated. Authenticating multiple users conflicts with support for logical sessions in MongoDB 3.6. To authenticate as multiple users, create multiple instances of MongoClient.

New in version 2.8: Use SCRAM-SHA-1 with MongoDB 3.0 and later.

Changed in version 2.5: Added the source and mechanism parameters. authenticate() now raises a subclass of PyMongoError if authentication fails due to invalid credentials or configuration issues.

See also

The MongoDB documentation on

authenticate

client

The client instance for this Database.

collection_names(include_system_collections=True, session=None)

DEPRECATED: Get a list of all the collection names in this database.

Parameters:
  • include_system_collections (optional): if False list will not include system collections (e.g system.indexes)
  • session (optional): a ClientSession.

Changed in version 3.7: Deprecated. Use list_collection_names() instead.

Changed in version 3.6: Added session parameter.

command(command, value=1, check=True, allowable_errors=None, read_preference=None, codec_options=CodecOptions(document_class=dict, tz_aware=False, uuid_representation=PYTHON_LEGACY, unicode_decode_error_handler='strict', tzinfo=None, type_registry=TypeRegistry(type_codecs=[], fallback_encoder=None)), session=None, **kwargs)

Issue a MongoDB command.

Send command command to the database and return the response. If command is an instance of basestring (str in python 3) then the command {command: value} will be sent. Otherwise, command must be an instance of dict and will be sent as is.

Any additional keyword arguments will be added to the final command document before it is sent.

For example, a command like {buildinfo: 1} can be sent using:

>>> db.command("buildinfo")

For a command where the value matters, like {collstats: collection_name} we can do:

>>> db.command("collstats", collection_name)

For commands that take additional arguments we can use kwargs. So {filemd5: object_id, root: file_root} becomes:

>>> db.command("filemd5", object_id, root=file_root)
Parameters:
  • command: document representing the command to be issued, or the name of the command (for simple commands only).

    Note

    the order of keys in the command document is significant (the “verb” must come first), so commands which require multiple keys (e.g. findandmodify) should use an instance of SON or a string and kwargs instead of a Python dict.

  • value (optional): value to use for the command verb when command is passed as a string

  • check (optional): check the response for errors, raising OperationFailure if there are any

  • allowable_errors: if check is True, error messages in this list will be ignored by error-checking

  • read_preference (optional): The read preference for this operation. See read_preferences for options. If the provided session is in a transaction, defaults to the read preference configured for the transaction. Otherwise, defaults to PRIMARY.

  • codec_options: A CodecOptions instance.

  • session (optional): A ClientSession.

  • **kwargs (optional): additional keyword arguments will be added to the command document before it is sent

Note

command() does not obey this Database’s read_preference or codec_options. You must use the read_preference and codec_options parameters instead.

Note

command() does not apply any custom TypeDecoders when decoding the command response.

Changed in version 3.6: Added session parameter.

Changed in version 3.0: Removed the as_class, fields, uuid_subtype, tag_sets, and secondary_acceptable_latency_ms option. Removed compile_re option: PyMongo now always represents BSON regular expressions as Regex objects. Use try_compile() to attempt to convert from a BSON regular expression to a Python regular expression object. Added the codec_options parameter.

Changed in version 2.7: Added compile_re option. If set to False, PyMongo represented BSON regular expressions as Regex objects instead of attempting to compile BSON regular expressions as Python native regular expressions, thus preventing errors for some incompatible patterns, see PYTHON-500.

Changed in version 2.3: Added tag_sets and secondary_acceptable_latency_ms options.

Changed in version 2.2: Added support for as_class - the class you want to use for the resulting documents

See also

The MongoDB documentation on

commands

create_collection(name, codec_options=None, read_preference=None, write_concern=None, read_concern=None, session=None, **kwargs)

Create a new Collection in this database.

Normally collection creation is automatic. This method should only be used to specify options on creation. CollectionInvalid will be raised if the collection already exists.

Options should be passed as keyword arguments to this method. Supported options vary with MongoDB release. Some examples include:

  • “size”: desired initial size for the collection (in bytes). For capped collections this size is the max size of the collection.
  • “capped”: if True, this is a capped collection
  • “max”: maximum number of objects if capped (optional)

See the MongoDB documentation for a full list of supported options by server version.
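
For example, a minimal sketch creating a capped collection (the name and size limits are hypothetical):

>>> db.create_collection("log", capped=True, size=1024 * 1024, max=1000)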

Parameters:
  • name: the name of the collection to create
  • codec_options (optional): An instance of CodecOptions. If None (the default) the codec_options of this Database is used.
  • read_preference (optional): The read preference to use. If None (the default) the read_preference of this Database is used.
  • write_concern (optional): An instance of WriteConcern. If None (the default) the write_concern of this Database is used.
  • read_concern (optional): An instance of ReadConcern. If None (the default) the read_concern of this Database is used.
  • collation (optional): An instance of Collation.
  • session (optional): a ClientSession.
  • **kwargs (optional): additional keyword arguments will be passed as options for the create collection command

Changed in version 3.6: Added session parameter.

Changed in version 3.4: Added the collation option.

Changed in version 3.0: Added the codec_options, read_preference, and write_concern options.

Changed in version 2.2: Removed deprecated argument: options

current_op(include_all=False, session=None)

DEPRECATED: Get information on operations currently running.

Starting with MongoDB 3.6 this helper is obsolete. The functionality provided by this helper is available in MongoDB 3.6+ using the $currentOp aggregation pipeline stage, which can be used with aggregate(). Note that, while this helper can only return a single document limited to a 16MB result, aggregate() returns a cursor avoiding that limitation.

Users of MongoDB versions older than 3.6 can use the currentOp command directly:

# MongoDB 3.2 and 3.4
client.admin.command("currentOp")

Or query the “inprog” virtual collection:

# MongoDB 2.6 and 3.0
client.admin["$cmd.sys.inprog"].find_one()
Parameters:
  • include_all (optional): if True also list currently idle operations in the result
  • session (optional): a ClientSession.

Changed in version 3.9: Deprecated.

Changed in version 3.6: Added session parameter.

dereference(dbref, session=None, **kwargs)

Dereference a DBRef, getting the document it points to.

Raises TypeError if dbref is not an instance of DBRef. Returns a document, or None if the reference does not point to a valid document. Raises ValueError if dbref has a database specified that is different from the current database.

Parameters:
  • dbref: the reference
  • session (optional): a ClientSession.
  • **kwargs (optional): any additional keyword arguments are the same as the arguments to find().

Changed in version 3.6: Added session parameter.

drop_collection(name_or_collection, session=None)

Drop a collection.

Parameters:
  • name_or_collection: the name of a collection to drop or the collection object itself
  • session (optional): a ClientSession.

Note

The write_concern of this database is automatically applied to this operation when using MongoDB >= 3.4.

Changed in version 3.6: Added session parameter.

Changed in version 3.4: Apply this database’s write concern automatically to this operation when connected to MongoDB >= 3.4.

error()

DEPRECATED: Get the error if one occurred on the last operation.

This method is obsolete: all MongoDB write operations (insert, update, remove, and so on) use the write concern w=1 and report their errors by default.

Changed in version 2.8: Deprecated.

eval(code, *args)

DEPRECATED: Evaluate a JavaScript expression in MongoDB.

Parameters:
  • code: string representation of JavaScript code to be evaluated
  • args (optional): additional positional arguments are passed to the code being evaluated

Warning

the eval command is deprecated in MongoDB 3.0 and will be removed in a future server version.

get_collection(name, codec_options=None, read_preference=None, write_concern=None, read_concern=None)

Get a Collection with the given name and options.

Useful for creating a Collection with different codec options, read preference, and/or write concern from this Database.

>>> db.read_preference
Primary()
>>> coll1 = db.test
>>> coll1.read_preference
Primary()
>>> from pymongo import ReadPreference
>>> coll2 = db.get_collection(
...     'test', read_preference=ReadPreference.SECONDARY)
>>> coll2.read_preference
Secondary(tag_sets=None)
Parameters:
  • name: The name of the collection - a string.
  • codec_options (optional): An instance of CodecOptions. If None (the default) the codec_options of this Database is used.
  • read_preference (optional): The read preference to use. If None (the default) the read_preference of this Database is used.
  • write_concern (optional): An instance of WriteConcern. If None (the default) the write_concern of this Database is used.
  • read_concern (optional): An instance of ReadConcern. If None (the default) the read_concern of this Database is used.

incoming_copying_manipulators

DEPRECATED: All incoming SON copying manipulators.

Changed in version 3.5: Deprecated.

New in version 2.0.

incoming_manipulators

DEPRECATED: All incoming SON manipulators.

Changed in version 3.5: Deprecated.

New in version 2.0.

last_status()

DEPRECATED: Get status information from the last operation.

This method is obsolete: all MongoDB write operations (insert, update, remove, and so on) use the write concern w=1 and report their errors by default.

Returns a SON object with status information.

Changed in version 2.8: Deprecated.

list_collection_names(session=None, filter=None, **kwargs)

Get a list of all the collection names in this database.

For example, to list all non-system collections:

filter = {"name": {"$regex": r"^(?!system\.)"}}
db.list_collection_names(filter=filter)
Parameters:
  • session (optional): a ClientSession.
  • filter (optional): A query document to filter the list of collections returned from the listCollections command.
  • **kwargs (optional): Optional parameters of the listCollections command can be passed as keyword arguments to this method. The supported options differ by server version.

Changed in version 3.8: Added the filter and **kwargs parameters.

New in version 3.6.

list_collections(session=None, filter=None, **kwargs)

Get a cursor over the collections of this database.

Parameters:
  • session (optional): a ClientSession.
  • filter (optional): A query document to filter the list of collections returned from the listCollections command.
  • **kwargs (optional): Optional parameters of the listCollections command can be passed as keyword arguments to this method. The supported options differ by server version.
Returns:

An instance of CommandCursor.

New in version 3.6.

logout()

DEPRECATED: Deauthorize use of this database.

Warning

Starting in MongoDB 3.6, calling logout() invalidates all existing cursors. It may also leave logical sessions open on the server for up to 30 minutes until they time out.

name

The name of this Database.

outgoing_copying_manipulators

DEPRECATED: All outgoing SON copying manipulators.

Changed in version 3.5: Deprecated.

New in version 2.0.

outgoing_manipulators

DEPRECATED: All outgoing SON manipulators.

Changed in version 3.5: Deprecated.

New in version 2.0.

previous_error()

DEPRECATED: Get the most recent error on this database.

This method is obsolete: all MongoDB write operations (insert, update, remove, and so on) use the write concern w=1 and report their errors by default.

Only returns errors that have occurred since the last call to reset_error_history(). Returns None if no such errors have occurred.

Changed in version 2.8: Deprecated.

profiling_info(session=None)

Returns a list containing current profiling information.

Parameters:
  • session (optional): a ClientSession.

Changed in version 3.6: Added session parameter.

See also

The MongoDB documentation on

profiling

profiling_level(session=None)

Get the database’s current profiling level.

Returns one of (OFF, SLOW_ONLY, ALL).

Parameters:
  • session (optional): a ClientSession.

Changed in version 3.6: Added session parameter.

See also

The MongoDB documentation on

profiling

remove_user(name, session=None)

DEPRECATED: Remove user name from this Database.

User name will no longer have permissions to access this Database.

Note

remove_user is deprecated and will be removed in PyMongo 4.0. Use the dropUser command instead:

db.command("dropUser", "user")
Parameters:
  • name: the name of the user to remove
  • session (optional): a ClientSession.

Changed in version 3.6: Added session parameter. Deprecated remove_user.

reset_error_history()

DEPRECATED: Reset the error history of this database.

This method is obsolete: all MongoDB write operations (insert, update, remove, and so on) use the write concern w=1 and report their errors by default.

Calls to previous_error() will only return errors that have occurred since the most recent call to this method.

Changed in version 2.8: Deprecated.

set_profiling_level(level, slow_ms=None, session=None)

Set the database’s profiling level.

Parameters:
  • level: Specifies a profiling level, see list of possible values below.
  • slow_ms: Optionally modify the threshold above which the profiler considers a query or operation slow. Even if the profiler is off, queries slower than the slow_ms threshold will be written to the logs.
  • session (optional): a ClientSession.

Possible level values:

Level      Setting
OFF        Off. No profiling.
SLOW_ONLY  On. Only includes slow operations.
ALL        On. Includes all operations.

Raises ValueError if level is not one of (OFF, SLOW_ONLY, ALL).
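
For example, to profile only operations slower than 200 milliseconds (the threshold is illustrative):

>>> db.set_profiling_level(pymongo.SLOW_ONLY, slow_ms=200)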

Changed in version 3.6: Added session parameter.

See also

The MongoDB documentation on

profiling

system_js

DEPRECATED: SystemJS helper for this Database.

See the documentation for SystemJS for more details.

validate_collection(name_or_collection, scandata=False, full=False, session=None)

Validate a collection.

Returns a dict of validation info. Raises CollectionInvalid if validation fails.

Parameters:
  • name_or_collection: A Collection object or the name of a collection to validate.
  • scandata: Do extra checks beyond checking the overall structure of the collection.
  • full: Have the server do a more thorough scan of the collection. Use with scandata for a thorough scan of the structure of the collection and the individual documents.
  • session (optional): a ClientSession.

Changed in version 3.6: Added session parameter.

watch(pipeline=None, full_document=None, resume_after=None, max_await_time_ms=None, batch_size=None, collation=None, start_at_operation_time=None, session=None, start_after=None)

Watch changes on this database.

Performs an aggregation with an implicit initial $changeStream stage and returns a DatabaseChangeStream cursor which iterates over changes on all collections in this database.

Introduced in MongoDB 4.0.

with db.watch() as stream:
    for change in stream:
        print(change)

The DatabaseChangeStream iterable blocks until the next change document is returned or an error is raised. If the next() method encounters a network error when retrieving a batch from the server, it will automatically attempt to recreate the cursor such that no change events are missed. Any error encountered during the resume attempt indicates there may be an outage and will be raised.

try:
    with db.watch(
            [{'$match': {'operationType': 'insert'}}]) as stream:
        for insert_change in stream:
            print(insert_change)
except pymongo.errors.PyMongoError:
    # The ChangeStream encountered an unrecoverable error or the
    # resume attempt failed to recreate the cursor.
    logging.error('...')

For a precise description of the resume process see the change streams specification.

Parameters:
  • pipeline (optional): A list of aggregation pipeline stages to append to an initial $changeStream stage. Not all pipeline stages are valid after a $changeStream stage, see the MongoDB documentation on change streams for the supported stages.
  • full_document (optional): The fullDocument to pass as an option to the $changeStream stage. Allowed values: ‘updateLookup’. When set to ‘updateLookup’, the change notification for partial updates will include both a delta describing the changes to the document, as well as a copy of the entire document that was changed from some time after the change occurred.
  • resume_after (optional): A resume token. If provided, the change stream will start returning changes that occur directly after the operation specified in the resume token. A resume token is the _id value of a change document.
  • max_await_time_ms (optional): The maximum time in milliseconds for the server to wait for changes before responding to a getMore operation.
  • batch_size (optional): The maximum number of documents to return per batch.
  • collation (optional): The Collation to use for the aggregation.
  • start_at_operation_time (optional): If provided, the resulting change stream will only return changes that occurred at or after the specified Timestamp. Requires MongoDB >= 4.0.
  • session (optional): a ClientSession.
  • start_after (optional): The same as resume_after except that start_after can resume notifications after an invalidate event. This option and resume_after are mutually exclusive.
Returns:

A DatabaseChangeStream cursor.

Changed in version 3.9: Added the start_after parameter.

New in version 3.7.

See also

The MongoDB documentation on

changeStreams

with_options(codec_options=None, read_preference=None, write_concern=None, read_concern=None)

Get a clone of this database changing the specified settings.

>>> db1.read_preference
Primary()
>>> from pymongo import ReadPreference
>>> db2 = db1.with_options(read_preference=ReadPreference.SECONDARY)
>>> db1.read_preference
Primary()
>>> db2.read_preference
Secondary(tag_sets=None)
Parameters:
  • codec_options (optional): An instance of CodecOptions. If None (the default) the codec_options of this Database is used.
  • read_preference (optional): The read preference to use. If None (the default) the read_preference of this Database is used. See read_preferences for options.
  • write_concern (optional): An instance of WriteConcern. If None (the default) the write_concern of this Database is used.
  • read_concern (optional): An instance of ReadConcern. If None (the default) the read_concern of this Database is used.

New in version 3.8.

class pymongo.database.SystemJS(database)

DEPRECATED: Get a system js helper for the database database.

SystemJS will be removed in PyMongo 4.0.

list()

Get a list of the names of the functions stored in this database.

driver_info

Advanced options for MongoDB drivers implemented on top of PyMongo.

class pymongo.driver_info.DriverInfo(name=None, version=None, platform=None)

Info about a driver wrapping PyMongo.

The MongoDB server logs PyMongo’s name, version, and platform whenever PyMongo establishes a connection. A driver implemented on top of PyMongo can add its own info to this log message. Initialize with three strings like ‘MyDriver’, ‘1.2.3’, ‘some platform info’. Any of these strings may be None to accept PyMongo’s default.
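
For example, a hypothetical wrapper library named "MyDriver" could identify itself like this:

from pymongo import MongoClient
from pymongo.driver_info import DriverInfo

client = MongoClient(
    driver=DriverInfo(name='MyDriver', version='1.2.3', platform='CPython 3.7'))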

encryption – Client side encryption

Support for explicit client side encryption.

Support for client side encryption is in beta. Backwards-breaking changes may be made before the final release.

class pymongo.encryption.Algorithm

An enum that defines the supported encryption algorithms.

class pymongo.encryption.ClientEncryption(kms_providers, key_vault_namespace, key_vault_client, codec_options)

Explicit client side encryption.

The ClientEncryption class encapsulates explicit operations on a key vault collection that cannot be done directly on a MongoClient. Similar to configuring auto encryption on a MongoClient, it is constructed with a MongoClient (to a MongoDB cluster containing the key vault collection), KMS provider configuration, and keyVaultNamespace. It provides an API for explicitly encrypting and decrypting values, and creating data keys. It does not provide an API to query keys from the key vault collection, as this can be done directly on the MongoClient.

Note

Support for client side encryption is in beta. Backwards-breaking changes may be made before the final release.

Parameters:
  • kms_providers: Map of KMS provider options. Two KMS providers are supported: “aws” and “local”. The kmsProviders map values differ by provider:

    • aws: Map with “accessKeyId” and “secretAccessKey” as strings. These are the AWS access key ID and AWS secret access key used to generate KMS messages.
    • local: Map with “key” as a 96-byte array or string. “key” is the master key used to encrypt/decrypt data keys. This key should be generated and stored as securely as possible.
  • key_vault_namespace: The namespace for the key vault collection. The key vault collection contains all data keys used for encryption and decryption. Data keys are stored as documents in this MongoDB collection. Data keys are protected with encryption by a KMS provider.

  • key_vault_client: A MongoClient connected to a MongoDB cluster containing the key_vault_namespace collection.

  • codec_options: An instance of CodecOptions to use when encoding a value for encryption and decoding the decrypted BSON value.

New in version 3.9.
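
A minimal construction sketch using the "local" KMS provider. The 96-byte master key is generated on the fly here for illustration only; a real application must generate it once and store it securely:

import os

from bson.codec_options import CodecOptions
from pymongo import MongoClient
from pymongo.encryption import ClientEncryption

key_vault_client = MongoClient()
# Illustrative only: a real master key must be persisted and protected.
kms_providers = {"local": {"key": os.urandom(96)}}
client_encryption = ClientEncryption(
    kms_providers, "admin.datakeys", key_vault_client, CodecOptions())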

close()

Release resources.

Note that using this class in a with-statement will automatically call close():

with ClientEncryption(...) as client_encryption:
    encrypted = client_encryption.encrypt(value, ...)
    decrypted = client_encryption.decrypt(encrypted)
create_data_key(kms_provider, master_key=None, key_alt_names=None)

Create and insert a new data key into the key vault collection.

Parameters:
  • kms_provider: The KMS provider to use. Supported values are “aws” and “local”.

  • master_key: The master_key identifies a KMS-specific key used to encrypt the new data key. If the kmsProvider is “local” the master_key is not applicable and may be omitted. If the kms_provider is “aws”, master_key is required and must have the following fields:

    • region (string): The AWS region as a string.
    • key (string): The Amazon Resource Name (ARN) to the AWS customer master key (CMK).
  • key_alt_names (optional): An optional list of string alternate names used to reference a key. If a key is created with alternate names, then encryption may refer to the key by the unique alternate name instead of by key_id. The following example shows creating and referring to a data key by alternate name:

    client_encryption.create_data_key("local", key_alt_names=["name1"])
    # reference the key with the alternate name
    client_encryption.encrypt("457-55-5462", key_alt_name="name1",
                              algorithm=Algorithm.AEAD_AES_256_CBC_HMAC_SHA_512_Random)
    
Returns:

The _id of the created data key document.

decrypt(value)

Decrypt an encrypted value.

Parameters:
  • value (Binary): The encrypted value, a Binary with subtype 6.
Returns:

The decrypted BSON value.

encrypt(value, algorithm, key_id=None, key_alt_name=None)

Encrypt a BSON value with a given key and algorithm.

Note that exactly one of key_id or key_alt_name must be provided.

Parameters:
  • value: The BSON value to encrypt.
  • algorithm (string): The encryption algorithm to use. See Algorithm for some valid options.
  • key_id: Identifies a data key by _id which must be a Binary with subtype 4 ( UUID_SUBTYPE).
  • key_alt_name: Identifies a key vault document by ‘keyAltName’.
Returns:

The encrypted value, a Binary with subtype 6.
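
A round-trip sketch, continuing the client_encryption instance from the construction example above:

from pymongo.encryption import Algorithm

key_id = client_encryption.create_data_key("local")
encrypted = client_encryption.encrypt(
    "457-55-5462", Algorithm.AEAD_AES_256_CBC_HMAC_SHA_512_Deterministic,
    key_id=key_id)
assert client_encryption.decrypt(encrypted) == "457-55-5462"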

encryption_options – Support for automatic client side encryption

Support for automatic client side encryption.

Support for client side encryption is in beta. Backwards-breaking changes may be made before the final release.

class pymongo.encryption_options.AutoEncryptionOpts(kms_providers, key_vault_namespace, key_vault_client=None, schema_map=None, bypass_auto_encryption=False, mongocryptd_uri='mongodb://localhost:27020', mongocryptd_bypass_spawn=False, mongocryptd_spawn_path='mongocryptd', mongocryptd_spawn_args=None)

Options to configure automatic encryption.

Automatic encryption is an enterprise-only feature that only applies to operations on a collection. Automatic encryption is not supported for operations on a database or view and will result in an error. To bypass automatic encryption (but enable automatic decryption), set bypass_auto_encryption=True in AutoEncryptionOpts.

Explicit encryption/decryption and automatic decryption is a community feature. A MongoClient configured with bypassAutoEncryption=true will still automatically decrypt.

Note

Support for client side encryption is in beta. Backwards-breaking changes may be made before the final release.

Parameters:
  • kms_providers: Map of KMS provider options. Two KMS providers are supported: “aws” and “local”. The kmsProviders map values differ by provider:

    • aws: Map with “accessKeyId” and “secretAccessKey” as strings. These are the AWS access key ID and AWS secret access key used to generate KMS messages.
    • local: Map with “key” as a 96-byte array or string. “key” is the master key used to encrypt/decrypt data keys. This key should be generated and stored as securely as possible.
  • key_vault_namespace: The namespace for the key vault collection. The key vault collection contains all data keys used for encryption and decryption. Data keys are stored as documents in this MongoDB collection. Data keys are protected with encryption by a KMS provider.

  • key_vault_client (optional): By default the key vault collection is assumed to reside in the same MongoDB cluster as the encrypted MongoClient. Use this option to route data key queries to a separate MongoDB cluster.

  • schema_map (optional): Map of collection namespace (“db.coll”) to JSON Schema. By default, a collection’s JSONSchema is periodically polled with the listCollections command. But a JSONSchema may be specified locally with the schemaMap option.

    Supplying a schema_map provides more security than relying on JSON Schemas obtained from the server. It protects against a malicious server advertising a false JSON Schema, which could trick the client into sending unencrypted data that should be encrypted.

    Schemas supplied in the schemaMap only apply to configuring automatic encryption for client side encryption. Other validation rules in the JSON schema will not be enforced by the driver and will result in an error.

  • bypass_auto_encryption (optional): If True, automatic encryption will be disabled but automatic decryption will still be enabled. Defaults to False.

  • mongocryptd_uri (optional): The MongoDB URI used to connect to the local mongocryptd process. Defaults to 'mongodb://localhost:27020'.

  • mongocryptd_bypass_spawn (optional): If True, the encrypted MongoClient will not attempt to spawn the mongocryptd process. Defaults to False.

  • mongocryptd_spawn_path (optional): Used for spawning the mongocryptd process. Defaults to 'mongocryptd' and spawns mongocryptd from the system path.

  • mongocryptd_spawn_args (optional): A list of string arguments to use when spawning the mongocryptd process. Defaults to ['--idleShutdownTimeoutSecs=60']. If the list does not include the idleShutdownTimeoutSecs option then '--idleShutdownTimeoutSecs=60' will be added.

New in version 3.9.
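
A minimal sketch of enabling automatic encryption, assuming local_master_key is a securely stored 96-byte key and a mongocryptd binary is available on the system path:

from pymongo import MongoClient
from pymongo.encryption_options import AutoEncryptionOpts

kms_providers = {"local": {"key": local_master_key}}
auto_encryption_opts = AutoEncryptionOpts(kms_providers, "admin.datakeys")
client = MongoClient(auto_encryption_opts=auto_encryption_opts)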

errors – Exceptions raised by the pymongo package

Exceptions raised by PyMongo.

exception pymongo.errors.AutoReconnect(message='', errors=None)

Raised when a connection to the database is lost and an attempt to auto-reconnect will be made.

In order to auto-reconnect you must handle this exception, recognizing that the operation which caused it has not necessarily succeeded. Future operations will attempt to open a new connection to the database (and will continue to raise this exception until the first successful connection is made).

Subclass of ConnectionFailure.
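
One common handling pattern is a bounded retry loop. Because the interrupted operation may have actually succeeded on the server, only writes that are safe to repeat, such as the idempotent upsert in this sketch, should be retried blindly:

from pymongo.errors import AutoReconnect

for attempt in range(3):
    try:
        db.test.replace_one({"_id": 1}, {"_id": 1, "n": 1}, upsert=True)
        break
    except AutoReconnect:
        # The write may or may not have been applied; replaying this
        # particular upsert is harmless, so simply try again.
        continue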

exception pymongo.errors.BulkWriteError(results)

Exception class for bulk write errors.

New in version 2.7.

exception pymongo.errors.CollectionInvalid(message='', error_labels=None)

Raised when collection validation fails.

exception pymongo.errors.ConfigurationError(message='', error_labels=None)

Raised when something is incorrectly configured.

exception pymongo.errors.ConnectionFailure(message='', error_labels=None)

Raised when a connection to the database cannot be made or is lost.

exception pymongo.errors.CursorNotFound(error, code=None, details=None)

Raised while iterating query results if the cursor is invalidated on the server.

New in version 2.7.

exception pymongo.errors.DocumentTooLarge

Raised when an encoded document is too large for the connected server.

exception pymongo.errors.DuplicateKeyError(error, code=None, details=None)

Raised when an insert or update fails due to a duplicate key error.

exception pymongo.errors.EncryptionError(cause)

Raised when encryption or decryption fails.

This error always wraps another exception which can be retrieved via the cause property.

New in version 3.9.

cause

The exception that caused this encryption or decryption error.

exception pymongo.errors.ExceededMaxWaiters(message='', error_labels=None)

Raised when a thread tries to get a connection from a pool and maxPoolSize * waitQueueMultiple threads are already waiting.

New in version 2.6.

exception pymongo.errors.ExecutionTimeout(error, code=None, details=None)

Raised when a database operation times out, exceeding the $maxTimeMS set in the query or command option.

Note

Requires server version >= 2.6.0

New in version 2.7.

exception pymongo.errors.InvalidName(message='', error_labels=None)

Raised when an invalid name is used.

exception pymongo.errors.InvalidOperation(message='', error_labels=None)

Raised when a client attempts to perform an invalid operation.

exception pymongo.errors.InvalidURI(message='', error_labels=None)

Raised when trying to parse an invalid mongodb URI.

exception pymongo.errors.NetworkTimeout(message='', errors=None)

An operation on an open connection exceeded socketTimeoutMS.

The remaining connections in the pool stay open. In the case of a write operation, you cannot know whether it succeeded or failed.

Subclass of AutoReconnect.

exception pymongo.errors.NotMasterError(message='', errors=None)

The server responded “not master” or “node is recovering”.

These errors result from a query, write, or command. The operation failed because the client thought it was using the primary but the primary has stepped down, or the client thought it was using a healthy secondary but the secondary is stale and trying to recover.

The client launches a refresh operation on a background thread, to update its view of the server as soon as possible after throwing this exception.

Subclass of AutoReconnect.

exception pymongo.errors.OperationFailure(error, code=None, details=None)

Raised when a database operation fails.

New in version 2.7: The details attribute.

code

The error code returned by the server, if any.

details

The complete error document returned by the server.

Depending on the error that occurred, the error document may include useful information beyond just the error message. When connected to a mongos the error document may contain one or more subdocuments if errors occurred on multiple shards.

exception pymongo.errors.ProtocolError(message='', error_labels=None)

Raised for failures related to the wire protocol.

exception pymongo.errors.PyMongoError(message='', error_labels=None)

Base class for all PyMongo exceptions.

has_error_label(label)

Return True if this error contains the given label.

New in version 3.7.
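
For example, transaction code can check for the "TransientTransactionError" label to decide whether a whole transaction is worth retrying. run_transaction below is a hypothetical application function:

from pymongo.errors import PyMongoError

try:
    run_transaction(session)  # hypothetical application function
except PyMongoError as exc:
    if exc.has_error_label("TransientTransactionError"):
        run_transaction(session)  # retry the transaction once
    else:
        raise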

exception pymongo.errors.ServerSelectionTimeoutError(message='', errors=None)

Thrown when no MongoDB server is available for an operation.

If there is no suitable server for an operation PyMongo tries for serverSelectionTimeoutMS (default 30 seconds) to find one, then throws this exception. For example, it is thrown after attempting an operation when PyMongo cannot connect to any server, or if you attempt an insert into a replica set that has no primary and does not elect one within the timeout window, or if you attempt to query with a Read Preference that the replica set cannot satisfy.

exception pymongo.errors.WTimeoutError(error, code=None, details=None)

Raised when a database operation times out (i.e. wtimeout expires) before replication completes.

With newer versions of MongoDB the details attribute may include write concern fields like ‘n’, ‘updatedExisting’, or ‘writtenTo’.

New in version 2.7.

exception pymongo.errors.WriteConcernError(error, code=None, details=None)

Base exception type for errors raised due to write concern.

New in version 3.0.

exception pymongo.errors.WriteError(error, code=None, details=None)

Base exception type for errors raised during write operations.

New in version 3.0.

message – Tools for creating messages to be sent to MongoDB

Tools for creating messages to be sent to MongoDB.

Note

This module is for internal use and is generally not needed by application developers.

pymongo.message.delete(collection_name, spec, safe, last_error_args, opts, flags=0, ctx=None)

Get a delete message.

opts is a CodecOptions. flags is a bit vector that may contain the SingleRemove flag or not:

http://docs.mongodb.org/meta-driver/latest/legacy/mongodb-wire-protocol/#op-delete

pymongo.message.get_more(collection_name, num_to_return, cursor_id, ctx=None)

Get a getMore message.

pymongo.message.insert(collection_name, docs, check_keys, safe, last_error_args, continue_on_error, opts, ctx=None)

Get an insert message.

pymongo.message.kill_cursors(cursor_ids)

Get a killCursors message.

pymongo.message.query(options, collection_name, num_to_skip, num_to_return, query, field_selector, opts, check_keys=False, ctx=None)

Get a query message.

pymongo.message.update(collection_name, upsert, multi, spec, doc, safe, last_error_args, check_keys, opts, ctx=None)

Get an update message.

mongo_client – Tools for connecting to MongoDB

Tools for connecting to MongoDB.

See also

High Availability and PyMongo for examples of connecting to replica sets or sets of mongos servers.

To get a Database instance from a MongoClient use either dictionary-style or attribute-style access:

>>> from pymongo import MongoClient
>>> c = MongoClient()
>>> c.test_database
Database(MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True), u'test_database')
>>> c['test-database']
Database(MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True), u'test-database')
class pymongo.mongo_client.MongoClient(host='localhost', port=27017, document_class=dict, tz_aware=False, connect=True, **kwargs)

Client for a MongoDB instance, a replica set, or a set of mongoses.

The client object is thread-safe and has connection-pooling built in. If an operation fails because of a network error, ConnectionFailure is raised and the client reconnects in the background. Application code should handle this exception (recognizing that the operation failed) and then continue to execute.

The host parameter can be a full mongodb URI, in addition to a simple hostname. It can also be a list of hostnames or URIs. Any port specified in the host string(s) will override the port parameter. If multiple mongodb URIs containing database or auth information are passed, the last database, username, and password present will be used. For usernames and passwords, reserved characters like ‘:’, ‘/’, ‘+’ and ‘@’ must be percent encoded following RFC 2396:

try:
    # Python 3.x
    from urllib.parse import quote_plus
except ImportError:
    # Python 2.x
    from urllib import quote_plus

uri = "mongodb://%s:%s@%s" % (
    quote_plus(user), quote_plus(password), host)
client = MongoClient(uri)

Unix domain sockets are also supported. The socket path must be percent encoded in the URI:

uri = "mongodb://%s:%s@%s" % (
    quote_plus(user), quote_plus(password), quote_plus(socket_path))
client = MongoClient(uri)

But not when passed as a simple hostname:

client = MongoClient('/tmp/mongodb-27017.sock')

Starting with version 3.6, PyMongo supports mongodb+srv:// URIs. The URI must include one, and only one, hostname. The hostname will be resolved to one or more DNS SRV records which will be used as the seed list for connecting to the MongoDB deployment. When using SRV URIs, the authSource and replicaSet configuration options can be specified using TXT records. See the Initial DNS Seedlist Discovery spec for more details. Note that the use of SRV URIs implicitly enables TLS support. Pass tls=false in the URI to override.

Note

MongoClient creation will block waiting for answers from DNS when mongodb+srv:// URIs are used.

Note

Starting with version 3.0 the MongoClient constructor no longer blocks while connecting to the server or servers, and it no longer raises ConnectionFailure if they are unavailable, nor ConfigurationError if the user’s credentials are wrong. Instead, the constructor returns immediately and launches the connection process on background threads. You can check if the server is available like this:

from pymongo.errors import ConnectionFailure
client = MongoClient()
try:
    # The ismaster command is cheap and does not require auth.
    client.admin.command('ismaster')
except ConnectionFailure:
    print("Server not available")

Warning

When using PyMongo in a multiprocessing context, please read Using PyMongo with Multiprocessing first.

Note

Many of the following options can be passed using a MongoDB URI or keyword parameters. If the same option is passed in a URI and as a keyword parameter the keyword parameter takes precedence.

Parameters:
  • host (optional): hostname or IP address or Unix domain socket path of a single mongod or mongos instance to connect to, or a mongodb URI, or a list of hostnames / mongodb URIs. If host is an IPv6 literal it must be enclosed in ‘[‘ and ‘]’ characters following the RFC2732 URL syntax (e.g. ‘[::1]’ for localhost). Multihomed and round robin DNS addresses are not supported.
  • port (optional): port number on which to connect
  • document_class (optional): default class to use for documents returned from queries on this client
  • type_registry (optional): instance of TypeRegistry to enable encoding and decoding of custom types.
  • tz_aware (optional): if True, datetime instances returned as values in a document by this MongoClient will be timezone aware (otherwise they will be naive)
  • connect (optional): if True (the default), immediately begin connecting to MongoDB in the background. Otherwise connect on the first operation.
Other optional parameters can be passed as keyword arguments:
  • maxPoolSize (optional): The maximum allowable number of concurrent connections to each connected server. Requests to a server will block if there are maxPoolSize outstanding connections to the requested server. Defaults to 100. Cannot be 0.

  • minPoolSize (optional): The minimum required number of concurrent connections that the pool will maintain to each connected server. Default is 0.

  • maxIdleTimeMS (optional): The maximum number of milliseconds that a connection can remain idle in the pool before being removed and replaced. Defaults to None (no limit).

  • socketTimeoutMS: (integer or None) Controls how long (in milliseconds) the driver will wait for a response after sending an ordinary (non-monitoring) database operation before concluding that a network error has occurred. Defaults to None (no timeout).

  • connectTimeoutMS: (integer or None) Controls how long (in milliseconds) the driver will wait during server monitoring when connecting a new socket to a server before concluding the server is unavailable. Defaults to 20000 (20 seconds).

  • server_selector: (callable or None) Optional, user-provided function that augments server selection rules. The function should accept as an argument a list of ServerDescription objects and return a list of server descriptions that should be considered suitable for the desired operation.

  • serverSelectionTimeoutMS: (integer) Controls how long (in milliseconds) the driver will wait to find an available, appropriate server to carry out a database operation; while it is waiting, multiple server monitoring operations may be carried out, each controlled by connectTimeoutMS. Defaults to 30000 (30 seconds).

  • waitQueueTimeoutMS: (integer or None) How long (in milliseconds) a thread will wait for a socket from the pool if the pool has no free sockets. Defaults to None (no timeout).

  • waitQueueMultiple: (integer or None) Multiplied by maxPoolSize to give the number of threads allowed to wait for a socket at one time. Defaults to None (no limit).

  • heartbeatFrequencyMS: (optional) The number of milliseconds between periodic server checks, or None to accept the default frequency of 10 seconds.

  • appname: (string or None) The name of the application that created this MongoClient instance. MongoDB 3.4 and newer will print this value in the server log upon establishing each connection. It is also recorded in the slow query log and profile collections.

  • driver: (pair or None) A driver implemented on top of PyMongo can pass a DriverInfo to add its name, version, and platform to the message printed in the server log when establishing a connection.

  • event_listeners: a list or tuple of event listeners. See monitoring for details.

  • retryWrites: (boolean) Whether supported write operations executed within this MongoClient will be retried once after a network error on MongoDB 3.6+. Defaults to True. The supported write operations are: insert_one(), update_one(), replace_one(), delete_one(), find_one_and_delete(), find_one_and_replace(), find_one_and_update(), insert_many(), and bulk_write() as long as UpdateMany or DeleteMany are not included.

    Unsupported write operations include, but are not limited to, aggregate() using the $out pipeline operator and any operation with an unacknowledged write concern (e.g. {w: 0}). See https://github.com/mongodb/specifications/blob/master/source/retryable-writes/retryable-writes.rst

  • retryReads: (boolean) Whether supported read operations executed within this MongoClient will be retried once after a network error on MongoDB 3.6+. Defaults to True. The supported read operations are: find(), find_one(), aggregate() without $out, distinct(), count(), estimated_document_count(), count_documents(), pymongo.collection.Collection.watch(), list_indexes(), pymongo.database.Database.watch(), list_collections(), pymongo.mongo_client.MongoClient.watch(), and list_databases().

    Unsupported read operations include, but are not limited to: map_reduce(), inline_map_reduce(), command(), and any getMore operation on a cursor.

    Enabling retryable reads makes applications more resilient to transient errors such as network failures, database upgrades, and replica set failovers. For an exact definition of which errors trigger a retry, see the retryable reads specification.

  • socketKeepAlive: (boolean) DEPRECATED Whether to send periodic keep-alive packets on connected sockets. Defaults to True. Disabling it is not recommended, see https://docs.mongodb.com/manual/faq/diagnostics/#does-tcp-keepalive-time-affect-mongodb-deployments

  • compressors: Comma separated list of compressors for wire protocol compression. The list is used to negotiate a compressor with the server. Currently supported options are “snappy”, “zlib” and “zstd”. Support for snappy requires the python-snappy package. zlib support requires the Python standard library zlib module. zstd requires the zstandard package. By default no compression is used. Compression support must also be enabled on the server. MongoDB 3.4+ supports snappy compression. MongoDB 3.6 adds support for zlib. MongoDB 4.2 adds support for zstd.

  • zlibCompressionLevel: (int) The zlib compression level to use when zlib is used as the wire protocol compressor. Supported values are -1 through 9. -1 tells the zlib library to use its default compression level (usually 6). 0 means no compression. 1 is best speed. 9 is best compression. Defaults to -1.

  • uuidRepresentation: The BSON representation to use when encoding from and decoding to instances of UUID. Valid values are pythonLegacy (the default), javaLegacy, csharpLegacy and standard. New applications should consider setting this to standard for cross language compatibility.

Write Concern options:
(Only set if passed. No default values.)
  • w: (integer or string) If this is a replica set, write operations will block until they have been replicated to the specified number or tagged set of servers. w=<int> always includes the replica set primary (e.g. w=3 means write to the primary and wait until replicated to two secondaries). Passing w=0 disables write acknowledgement and all other write concern options.
  • wTimeoutMS: (integer) Used in conjunction with w. Specify a value in milliseconds to control how long to wait for write propagation to complete. If replication does not complete in the given timeframe, a timeout exception is raised. Passing wTimeoutMS=0 will cause write operations to wait indefinitely.
  • journal: If True block until write operations have been committed to the journal. Cannot be used in combination with fsync. Prior to MongoDB 2.6 this option was ignored if the server was running without journaling. Starting with MongoDB 2.6 write operations will fail with an exception if this option is used when the server is running without journaling.
  • fsync: If True and the server is running without journaling, blocks until the server has synced all data files to disk. If the server is running with journaling, this acts the same as the j option, blocking until write operations have been committed to the journal. Cannot be used in combination with j.
Replica set keyword arguments for connecting with a replica set - either directly or via a mongos:
  • replicaSet: (string or None) The name of the replica set to connect to. The driver will verify that all servers it connects to match this name. Implies that the hosts specified are a seed list and the driver should attempt to find all members of the set. Defaults to None.
Read Preference:
  • readPreference: The replica set read preference for this client. One of primary, primaryPreferred, secondary, secondaryPreferred, or nearest. Defaults to primary.
  • readPreferenceTags: Specifies a tag set as a comma-separated list of colon-separated key-value pairs. For example dc:ny,rack:1. Defaults to None.
  • maxStalenessSeconds: (integer) The maximum estimated length of time a replica set secondary can fall behind the primary in replication before it will no longer be selected for operations. Defaults to -1, meaning no maximum. If maxStalenessSeconds is set, it must be a positive integer greater than or equal to 90 seconds.
Authentication:
  • username: A string.

  • password: A string.

    Although username and password must be percent-escaped in a MongoDB URI, they must not be percent-escaped when passed as parameters. In this example, both the space and slash special characters are passed as-is:

    MongoClient(username="user name", password="pass/word")
    
  • authSource: The database to authenticate on. Defaults to the database specified in the URI, if provided, or to “admin”.

  • authMechanism: See MECHANISMS for options. If no mechanism is specified, PyMongo automatically uses MONGODB-CR when connected to a pre-3.0 version of MongoDB, SCRAM-SHA-1 when connected to MongoDB 3.0 through 3.6, and negotiates the mechanism to use (SCRAM-SHA-1 or SCRAM-SHA-256) when connected to MongoDB 4.0+.

  • authMechanismProperties: Used to specify authentication mechanism specific options. To specify the service name for GSSAPI authentication pass authMechanismProperties=’SERVICE_NAME:<service name>’

TLS/SSL configuration:
  • tls: (boolean) If True, create the connection to the server using transport layer security. Defaults to False.
  • tlsInsecure: (boolean) Specify whether TLS constraints should be relaxed as much as possible. Setting tlsInsecure=True implies tlsAllowInvalidCertificates=True and tlsAllowInvalidHostnames=True. Defaults to False. Think very carefully before setting this to True as it dramatically reduces the security of TLS.
  • tlsAllowInvalidCertificates: (boolean) If True, continues the TLS handshake regardless of the outcome of the certificate verification process. If this is False, and a value is not provided for tlsCAFile, PyMongo will attempt to load system provided CA certificates. If the python version in use does not support loading system CA certificates then the tlsCAFile parameter must point to a file of CA certificates. tlsAllowInvalidCertificates=False implies tls=True. Defaults to False. Think very carefully before setting this to True as that could make your application vulnerable to man-in-the-middle attacks.
  • tlsAllowInvalidHostnames: (boolean) If True, disables TLS hostname verification. tlsAllowInvalidHostnames=False implies tls=True. Defaults to False. Think very carefully before setting this to True as that could make your application vulnerable to man-in-the-middle attacks.
  • tlsCAFile: A file containing a single or a bundle of “certification authority” certificates, which are used to validate certificates passed from the other end of the connection. Implies tls=True. Defaults to None.
  • tlsCertificateKeyFile: A file containing the client certificate and private key. If you want to pass the certificate and private key as separate files, use the ssl_certfile and ssl_keyfile options instead. Implies tls=True. Defaults to None.
  • tlsCRLFile: A file containing a PEM or DER formatted certificate revocation list. Only supported by python 2.7.9+ (pypy 2.5.1+). Implies tls=True. Defaults to None.
  • tlsCertificateKeyFilePassword: The password or passphrase for decrypting the private key in tlsCertificateKeyFile or ssl_keyfile. Only necessary if the private key is encrypted. Only supported by python 2.7.9+ (pypy 2.5.1+) and 3.3+. Defaults to None.
  • ssl: (boolean) Alias for tls.
  • ssl_certfile: The certificate file used to identify the local connection against mongod. Implies tls=True. Defaults to None.
  • ssl_keyfile: The private keyfile used to identify the local connection against mongod. Can be omitted if the keyfile is included with the tlsCertificateKeyFile. Implies tls=True. Defaults to None.
Read Concern options:
(If not set explicitly, this will use the server default)
  • readConcernLevel: (string) The read concern level specifies the level of isolation for read operations. For example, a read operation using a read concern level of majority will only return data that has been written to a majority of nodes. If the level is left unspecified, the server default will be used.
Client side encryption options:
(If not set explicitly, client side encryption will not be enabled.)
  • auto_encryption_opts: A AutoEncryptionOpts which configures this client to automatically encrypt collection commands and automatically decrypt results. Support for client side encryption is in beta. Backwards-breaking changes may be made before the final release.

See also

The MongoDB documentation on

connections

Changed in version 3.9: Added the retryReads keyword argument and URI option. Added the tlsInsecure keyword argument and URI option. The following keyword arguments and URI options were deprecated:

  • wTimeout was deprecated in favor of wTimeoutMS.
  • j was deprecated in favor of journal.
  • ssl_cert_reqs was deprecated in favor of tlsAllowInvalidCertificates.
  • ssl_match_hostname was deprecated in favor of tlsAllowInvalidHostnames.
  • ssl_ca_certs was deprecated in favor of tlsCAFile.
  • ssl_certfile was deprecated in favor of tlsCertificateKeyFile.
  • ssl_crlfile was deprecated in favor of tlsCRLFile.
  • ssl_pem_passphrase was deprecated in favor of tlsCertificateKeyFilePassword.

Changed in version 3.9: retryWrites now defaults to True.

Changed in version 3.8: Added the server_selector keyword argument. Added the type_registry keyword argument.

Changed in version 3.7: Added the driver keyword argument.

Changed in version 3.6: Added support for mongodb+srv:// URIs. Added the retryWrites keyword argument and URI option.

Changed in version 3.5: Add username and password options. Document the authSource, authMechanism, and authMechanismProperties options. Deprecated the socketKeepAlive keyword argument and URI option. socketKeepAlive now defaults to True.

Changed in version 3.0: MongoClient is now the one and only client class for a standalone server, mongos, or replica set. It includes the functionality that had been split into MongoReplicaSetClient: it can connect to a replica set, discover all its members, and monitor the set for stepdowns, elections, and reconfigs.

The MongoClient constructor no longer blocks while connecting to the server or servers, and it no longer raises ConnectionFailure if they are unavailable, nor ConfigurationError if the user’s credentials are wrong. Instead, the constructor returns immediately and launches the connection process on background threads.

Therefore the alive method is removed since it no longer provides meaningful information; even if the client is disconnected, it may discover a server in time to fulfill the next operation.

In PyMongo 2.x, MongoClient accepted a list of standalone MongoDB servers and used the first it could connect to:

MongoClient(['host1.com:27017', 'host2.com:27017'])

A list of multiple standalones is no longer supported; if multiple servers are listed they must be members of the same replica set, or mongoses in the same sharded cluster.

The behavior for a list of mongoses is changed from “high availability” to “load balancing”. Before, the client connected to the lowest-latency mongos in the list, and used it until a network error prompted it to re-evaluate all mongoses’ latencies and reconnect to one of them. In PyMongo 3, the client monitors its network latency to all the mongoses continuously, and distributes operations evenly among those with the lowest latency. See mongos Load Balancing for more information.

The connect option is added.

The start_request, in_request, and end_request methods are removed, as well as the auto_start_request option.

The copy_database method is removed, see the copy_database examples for alternatives.

The MongoClient.disconnect() method is removed; it was a synonym for close().

MongoClient no longer returns an instance of Database for attribute names with leading underscores. You must use dict-style lookups instead:

client['__my_database__']

Not:

client.__my_database__
close()

Cleanup client resources and disconnect from MongoDB.

On MongoDB >= 3.6, end all server sessions created by this client by sending one or more endSessions commands.

Close all sockets in the connection pools and stop the monitor threads. If this instance is used again it will be automatically re-opened and the threads restarted unless auto encryption is enabled. A client enabled with auto encryption cannot be used again after being closed; any attempt will raise InvalidOperation.

Changed in version 3.6: End all server sessions created by this client.

c[db_name] || c.db_name

Get the db_name Database on MongoClient c.

Raises InvalidName if an invalid database name is used.

event_listeners

The event listeners registered for this client.

See monitoring for details.

address

(host, port) of the current standalone, primary, or mongos, or None.

Accessing address raises InvalidOperation if the client is load-balancing among mongoses, since there is no single address. Use nodes instead.

If the client is not connected, this will block until a connection is established or raise ServerSelectionTimeoutError if no server is available.

New in version 3.0.

primary

The (host, port) of the current primary of the replica set.

Returns None if this client is not connected to a replica set, there is no primary, or this client was created without the replicaSet option.

New in version 3.0: MongoClient gained this property in version 3.0 when MongoReplicaSetClient’s functionality was merged in.

secondaries

The secondary members known to this client.

A sequence of (host, port) pairs. Empty if this client is not connected to a replica set, there are no visible secondaries, or this client was created without the replicaSet option.

New in version 3.0: MongoClient gained this property in version 3.0 when MongoReplicaSetClient’s functionality was merged in.

arbiters

Arbiters in the replica set.

A sequence of (host, port) pairs. Empty if this client is not connected to a replica set, there are no arbiters, or this client was created without the replicaSet option.

is_primary

If this client is connected to a server that can accept writes.

True if the current server is a standalone, mongos, or the primary of a replica set. If the client is not connected, this will block until a connection is established or raise ServerSelectionTimeoutError if no server is available.

is_mongos

If this client is connected to mongos. If the client is not connected, this will block until a connection is established or raise ServerSelectionTimeoutError if no server is available.

max_pool_size

The maximum allowable number of concurrent connections to each connected server. Requests to a server will block if there are maxPoolSize outstanding connections to the requested server. Defaults to 100. Cannot be 0.

When a server’s pool has reached max_pool_size, operations for that server block waiting for a socket to be returned to the pool. If waitQueueTimeoutMS is set, a blocked operation will raise ConnectionFailure after a timeout. By default waitQueueTimeoutMS is not set.

min_pool_size

The minimum required number of concurrent connections that the pool will maintain to each connected server. Default is 0.

max_idle_time_ms

The maximum number of milliseconds that a connection can remain idle in the pool before being removed and replaced. Defaults to None (no limit).

nodes

Set of all currently connected servers.

Warning

When connected to a replica set the value of nodes can change over time as MongoClient’s view of the replica set changes. nodes can also be an empty set when MongoClient is first instantiated and hasn’t yet connected to any servers, or a network partition causes it to lose connection to all servers.

max_bson_size

The largest BSON object the connected server accepts in bytes.

If the client is not connected, this will block until a connection is established or raise ServerSelectionTimeoutError if no server is available.

max_message_size

The largest message the connected server accepts in bytes.

If the client is not connected, this will block until a connection is established or raise ServerSelectionTimeoutError if no server is available.

max_write_batch_size

The maxWriteBatchSize reported by the server.

If the client is not connected, this will block until a connection is established or raise ServerSelectionTimeoutError if no server is available.

Returns a default value when connected to server versions prior to MongoDB 2.6.

local_threshold_ms

The local threshold for this instance.

server_selection_timeout

The server selection timeout for this instance in seconds.

codec_options

Read only access to the CodecOptions of this instance.

read_preference

Read only access to the read preference of this instance.

Changed in version 3.0: The read_preference attribute is now read only.

write_concern

Read only access to the WriteConcern of this instance.

Changed in version 3.0: The write_concern attribute is now read only.

read_concern

Read only access to the ReadConcern of this instance.

New in version 3.2.

is_locked

Is this server locked? While locked, all write operations are blocked, although read operations may still be allowed. Use unlock() to unlock.

start_session(causal_consistency=True, default_transaction_options=None)

Start a logical session.

This method takes the same parameters as SessionOptions. See the client_session module for details and examples.

Requires MongoDB 3.6. It is an error to call start_session() if this client has been authenticated to multiple databases using the deprecated method authenticate().

A ClientSession may only be used with the MongoClient that started it.

Returns:

An instance of ClientSession.

New in version 3.6.
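
A minimal usage sketch:

with client.start_session(causal_consistency=True) as session:
    collection = client.test.collection
    collection.insert_one({'x': 1}, session=session)
    collection.find_one({'x': 1}, session=session)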

list_databases(session=None, **kwargs)

Get a cursor over the databases of the connected server.

Parameters:
  • session (optional): a ClientSession.
  • **kwargs (optional): Optional parameters of the listDatabases command can be passed as keyword arguments to this method. The supported options differ by server version.
Returns:

An instance of CommandCursor.

New in version 3.6.
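
For example, to print the name and on-disk size of each database (sizeOnDisk is a field reported by the listDatabases command):

for db_info in client.list_databases():
    print(db_info["name"], db_info.get("sizeOnDisk"))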

list_database_names(session=None)

Get a list of the names of all databases on the connected server.

Parameters:
  • session (optional): a ClientSession.

New in version 3.6.

database_names(session=None)

DEPRECATED: Get a list of the names of all databases on the connected server.

Parameters:
  • session (optional): a ClientSession.

Changed in version 3.7: Deprecated. Use list_database_names() instead.

Changed in version 3.6: Added session parameter.

drop_database(name_or_database, session=None)

Drop a database.

Raises TypeError if name_or_database is not an instance of basestring (str in python 3) or Database.

Parameters:
  • name_or_database: the name of a database to drop, or a Database instance representing the database to drop
  • session (optional): a ClientSession.

Changed in version 3.6: Added session parameter.

Note

The write_concern of this client is automatically applied to this operation when using MongoDB >= 3.4.

Changed in version 3.4: Apply this client’s write concern automatically to this operation when connected to MongoDB >= 3.4.
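
Either form below drops the same database:

client.drop_database("test-database")          # by name
client.drop_database(client["test-database"])  # by Database instance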

get_default_database(default=None, codec_options=None, read_preference=None, write_concern=None, read_concern=None)

Get the database named in the MongoDB connection URI.

>>> uri = 'mongodb://host/my_database'
>>> client = MongoClient(uri)
>>> db = client.get_default_database()
>>> assert db.name == 'my_database'
>>> db = client.get_database()
>>> assert db.name == 'my_database'

Useful in scripts where you want to choose which database to use based only on the URI in a configuration file.

Parameters:
  • default (optional): the database name to use if no database name was provided in the MongoDB connection URI.
  • codec_options (optional): An instance of CodecOptions. If None (the default) the codec_options of this MongoClient is used.
  • read_preference (optional): The read preference to use. If None (the default) the read_preference of this MongoClient is used. See read_preferences for options.
  • write_concern (optional): An instance of WriteConcern. If None (the default) the write_concern of this MongoClient is used.
  • read_concern (optional): An instance of ReadConcern. If None (the default) the read_concern of this MongoClient is used.

Changed in version 3.8: Undeprecated. Added the default, codec_options, read_preference, write_concern and read_concern parameters.

Changed in version 3.5: Deprecated, use get_database() instead.

get_database(name=None, codec_options=None, read_preference=None, write_concern=None, read_concern=None)

Get a Database with the given name and options.

Useful for creating a Database with different codec options, read preference, and/or write concern from this MongoClient.

>>> client.read_preference
Primary()
>>> db1 = client.test
>>> db1.read_preference
Primary()
>>> from pymongo import ReadPreference
>>> db2 = client.get_database(
...     'test', read_preference=ReadPreference.SECONDARY)
>>> db2.read_preference
Secondary(tag_sets=None)
Parameters:
  • name (optional): The name of the database - a string. If None (the default) the database named in the MongoDB connection URI is returned.
  • codec_options (optional): An instance of CodecOptions. If None (the default) the codec_options of this MongoClient is used.
  • read_preference (optional): The read preference to use. If None (the default) the read_preference of this MongoClient is used. See read_preferences for options.
  • write_concern (optional): An instance of WriteConcern. If None (the default) the write_concern of this MongoClient is used.
  • read_concern (optional): An instance of ReadConcern. If None (the default) the read_concern of this MongoClient is used.

Changed in version 3.5: The name parameter is now optional, defaulting to the database named in the MongoDB connection URI.

server_info(session=None)

Get information about the MongoDB server we’re connected to.

Parameters:
  • session (optional): a ClientSession.

Changed in version 3.6: Added session parameter.

close_cursor(cursor_id, address=None)

DEPRECATED - Send a kill cursors message soon with the given id.

Raises TypeError if cursor_id is not an instance of (int, long). What closing the cursor actually means depends on this client’s cursor manager.

This method may be called from a Cursor destructor during garbage collection, so it isn’t safe to take a lock or do network I/O. Instead, we schedule the cursor to be closed soon on a background thread.

Parameters:
  • cursor_id: id of cursor to close
  • address (optional): (host, port) pair of the cursor’s server. If it is not provided, the client attempts to close the cursor on the primary or standalone, or a mongos server.

Changed in version 3.7: Deprecated.

Changed in version 3.0: Added address parameter.

kill_cursors(cursor_ids, address=None)

DEPRECATED - Send a kill cursors message soon with the given ids.

Raises TypeError if cursor_ids is not an instance of list.

Parameters:
  • cursor_ids: list of cursor ids to kill
  • address (optional): (host, port) pair of the cursor’s server. If it is not provided, the client attempts to close the cursor on the primary or standalone, or a mongos server.

Changed in version 3.3: Deprecated.

Changed in version 3.0: Now accepts an address argument. Schedules the cursors to be closed on a background thread instead of sending the message immediately.

set_cursor_manager(manager_class)

DEPRECATED - Set this client’s cursor manager.

Raises TypeError if manager_class is not a subclass of CursorManager. A cursor manager handles closing cursors. Different managers can implement different policies in terms of when to actually kill a cursor that has been closed.

Parameters:
  • manager_class: cursor manager to use

Changed in version 3.3: Deprecated, for real this time.

Changed in version 3.0: Undeprecated.

watch(pipeline=None, full_document=None, resume_after=None, max_await_time_ms=None, batch_size=None, collation=None, start_at_operation_time=None, session=None, start_after=None)

Watch changes on this cluster.

Performs an aggregation with an implicit initial $changeStream stage and returns a ClusterChangeStream cursor which iterates over changes on all databases on this cluster.

Introduced in MongoDB 4.0.

with client.watch() as stream:
    for change in stream:
        print(change)

The ClusterChangeStream iterable blocks until the next change document is returned or an error is raised. If the next() method encounters a network error when retrieving a batch from the server, it will automatically attempt to recreate the cursor such that no change events are missed. Any error encountered during the resume attempt indicates there may be an outage and will be raised.

try:
    with client.watch(
            [{'$match': {'operationType': 'insert'}}]) as stream:
        for insert_change in stream:
            print(insert_change)
except pymongo.errors.PyMongoError:
    # The ChangeStream encountered an unrecoverable error or the
    # resume attempt failed to recreate the cursor.
    logging.error('...')

For a precise description of the resume process see the change streams specification.

Parameters:
  • pipeline (optional): A list of aggregation pipeline stages to append to an initial $changeStream stage. Not all pipeline stages are valid after a $changeStream stage, see the MongoDB documentation on change streams for the supported stages.
  • full_document (optional): The fullDocument to pass as an option to the $changeStream stage. Allowed values: ‘updateLookup’. When set to ‘updateLookup’, the change notification for partial updates will include both a delta describing the changes to the document, as well as a copy of the entire document that was changed from some time after the change occurred.
  • resume_after (optional): A resume token. If provided, the change stream will start returning changes that occur directly after the operation specified in the resume token. A resume token is the _id value of a change document.
  • max_await_time_ms (optional): The maximum time in milliseconds for the server to wait for changes before responding to a getMore operation.
  • batch_size (optional): The maximum number of documents to return per batch.
  • collation (optional): The Collation to use for the aggregation.
  • start_at_operation_time (optional): If provided, the resulting change stream will only return changes that occurred at or after the specified Timestamp. Requires MongoDB >= 4.0.
  • session (optional): a ClientSession.
  • start_after (optional): The same as resume_after except that start_after can resume notifications after an invalidate event. This option and resume_after are mutually exclusive.
Returns:

A ClusterChangeStream cursor.

Changed in version 3.9: Added the start_after parameter.

New in version 3.7.

See also

The MongoDB documentation on

changeStreams

fsync(**kwargs)

Flush all pending writes to datafiles.

Optional parameters can be passed as keyword arguments:
  • lock: If True lock the server to disallow writes.
  • async: If True don’t block while synchronizing.
  • session (optional): a ClientSession.

Note

Starting with Python 3.7 async is a reserved keyword. The async option to the fsync command can be passed using a dictionary instead:

options = {'async': True}
client.fsync(**options)

Changed in version 3.6: Added session parameter.

Warning

async and lock can not be used together.

Warning

MongoDB does not support the async option on Windows and will raise an exception on that platform.

unlock(session=None)

Unlock a previously locked server.

Parameters:
  • session (optional): a ClientSession.

Changed in version 3.6: Added session parameter.
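
A lock/unlock round trip might look like this (a sketch; see fsync() above for the lock option):

client.fsync(lock=True)     # block writes server-wide
assert client.is_locked
client.unlock()             # allow writes again
assert not client.is_locked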

mongo_replica_set_client – Tools for connecting to a MongoDB replica set

Deprecated. See High Availability and PyMongo.

class pymongo.mongo_replica_set_client.MongoReplicaSetClient(hosts_or_uri, document_class=dict, tz_aware=False, connect=True, **kwargs)

Deprecated alias for MongoClient.

MongoReplicaSetClient will be removed in a future version of PyMongo.

Changed in version 3.0: MongoClient is now the one and only client class for a standalone server, mongos, or replica set. It includes the functionality that had been split into MongoReplicaSetClient: it can connect to a replica set, discover all its members, and monitor the set for stepdowns, elections, and reconfigs.

The refresh method is removed from MongoReplicaSetClient, as are the seeds and hosts properties.

close()

Cleanup client resources and disconnect from MongoDB.

On MongoDB >= 3.6, end all server sessions created by this client by sending one or more endSessions commands.

Close all sockets in the connection pools and stop the monitor threads. If this instance is used again it will be automatically re-opened and the threads restarted unless auto encryption is enabled. A client enabled with auto encryption cannot be used again after being closed; any attempt will raise InvalidOperation.

Changed in version 3.6: End all server sessions created by this client.

c[db_name] || c.db_name

Get the db_name Database on MongoReplicaSetClient c.

Raises InvalidName if an invalid database name is used.

primary

The (host, port) of the current primary of the replica set.

Returns None if this client is not connected to a replica set, there is no primary, or this client was created without the replicaSet option.

New in version 3.0: MongoClient gained this property in version 3.0 when MongoReplicaSetClient’s functionality was merged in.

secondaries

The secondary members known to this client.

A sequence of (host, port) pairs. Empty if this client is not connected to a replica set, there are no visible secondaries, or this client was created without the replicaSet option.

New in version 3.0: MongoClient gained this property in version 3.0 when MongoReplicaSetClient’s functionality was merged in.

arbiters

Arbiters in the replica set.

A sequence of (host, port) pairs. Empty if this client is not connected to a replica set, there are no arbiters, or this client was created without the replicaSet option.

max_pool_size

The maximum allowable number of concurrent connections to each connected server. Requests to a server will block if there are maxPoolSize outstanding connections to the requested server. Defaults to 100. Cannot be 0.

When a server’s pool has reached max_pool_size, operations for that server block waiting for a socket to be returned to the pool. If waitQueueTimeoutMS is set, a blocked operation will raise ConnectionFailure after a timeout. By default waitQueueTimeoutMS is not set.

max_bson_size

The largest BSON object the connected server accepts in bytes.

If the client is not connected, this will block until a connection is established or raise ServerSelectionTimeoutError if no server is available.

max_message_size

The largest message the connected server accepts in bytes.

If the client is not connected, this will block until a connection is established or raise ServerSelectionTimeoutError if no server is available.

local_threshold_ms

The local threshold for this instance.

codec_options

Read only access to the CodecOptions of this instance.

read_preference

Read only access to the read preference of this instance.

Changed in version 3.0: The read_preference attribute is now read only.

write_concern

Read only access to the WriteConcern of this instance.

Changed in version 3.0: The write_concern attribute is now read only.

database_names(session=None)

DEPRECATED: Get a list of the names of all databases on the connected server.

Parameters:
  • session (optional): a ClientSession.

Changed in version 3.7: Deprecated. Use list_database_names() instead.

Changed in version 3.6: Added session parameter.
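
For example, the non-deprecated equivalent (a minimal sketch; the names returned depend on your deployment):

>>> client.list_database_names()
['admin', 'local']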

drop_database(name_or_database, session=None)

Drop a database.

Raises TypeError if name_or_database is not an instance of basestring (str in python 3) or Database.

Parameters:
  • name_or_database: the name of a database to drop, or a Database instance representing the database to drop
  • session (optional): a ClientSession.

Changed in version 3.6: Added session parameter.

Note

The write_concern of this client is automatically applied to this operation when using MongoDB >= 3.4.

Changed in version 3.4: Apply this client’s write concern automatically to this operation when connected to MongoDB >= 3.4.
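
For example (a minimal sketch; 'test_db' is a placeholder database name):

>>> client.drop_database('test_db')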

get_database(name=None, codec_options=None, read_preference=None, write_concern=None, read_concern=None)

Get a Database with the given name and options.

Useful for creating a Database with different codec options, read preference, and/or write concern from this MongoClient.

>>> client.read_preference
Primary()
>>> db1 = client.test
>>> db1.read_preference
Primary()
>>> from pymongo import ReadPreference
>>> db2 = client.get_database(
...     'test', read_preference=ReadPreference.SECONDARY)
>>> db2.read_preference
Secondary(tag_sets=None)
Parameters:
  • name (optional): The name of the database - a string. If None (the default) the database named in the MongoDB connection URI is returned.
  • codec_options (optional): An instance of CodecOptions. If None (the default) the codec_options of this MongoClient is used.
  • read_preference (optional): The read preference to use. If None (the default) the read_preference of this MongoClient is used. See read_preferences for options.
  • write_concern (optional): An instance of WriteConcern. If None (the default) the write_concern of this MongoClient is used.
  • read_concern (optional): An instance of ReadConcern. If None (the default) the read_concern of this MongoClient is used.

Changed in version 3.5: The name parameter is now optional, defaulting to the database named in the MongoDB connection URI.

close_cursor(cursor_id, address=None)

DEPRECATED - Send a kill cursors message soon with the given id.

Raises TypeError if cursor_id is not an instance of (int, long). What closing the cursor actually means depends on this client’s cursor manager.

This method may be called from a Cursor destructor during garbage collection, so it isn’t safe to take a lock or do network I/O. Instead, we schedule the cursor to be closed soon on a background thread.

Parameters:
  • cursor_id: id of cursor to close
  • address (optional): (host, port) pair of the cursor’s server. If it is not provided, the client attempts to close the cursor on the primary or standalone, or a mongos server.

Changed in version 3.7: Deprecated.

Changed in version 3.0: Added address parameter.

kill_cursors(cursor_ids, address=None)

DEPRECATED - Send a kill cursors message soon with the given ids.

Raises TypeError if cursor_ids is not an instance of list.

Parameters:
  • cursor_ids: list of cursor ids to kill
  • address (optional): (host, port) pair of the cursor’s server. If it is not provided, the client attempts to close the cursor on the primary or standalone, or a mongos server.

Changed in version 3.3: Deprecated.

Changed in version 3.0: Now accepts an address argument. Schedules the cursors to be closed on a background thread instead of sending the message immediately.

set_cursor_manager(manager_class)

DEPRECATED - Set this client’s cursor manager.

Raises TypeError if manager_class is not a subclass of CursorManager. A cursor manager handles closing cursors. Different managers can implement different policies in terms of when to actually kill a cursor that has been closed.

Parameters:
  • manager_class: cursor manager to use

Changed in version 3.3: Deprecated, for real this time.

Changed in version 3.0: Undeprecated.

get_default_database(default=None, codec_options=None, read_preference=None, write_concern=None, read_concern=None)

Get the database named in the MongoDB connection URI.

>>> uri = 'mongodb://host/my_database'
>>> client = MongoClient(uri)
>>> db = client.get_default_database()
>>> assert db.name == 'my_database'
>>> db = client.get_database()
>>> assert db.name == 'my_database'

Useful in scripts where you want to choose which database to use based only on the URI in a configuration file.

Parameters:
  • default (optional): the database name to use if no database name was provided in the URI.
  • codec_options (optional): An instance of CodecOptions. If None (the default) the codec_options of this MongoClient is used.
  • read_preference (optional): The read preference to use. If None (the default) the read_preference of this MongoClient is used. See read_preferences for options.
  • write_concern (optional): An instance of WriteConcern. If None (the default) the write_concern of this MongoClient is used.
  • read_concern (optional): An instance of ReadConcern. If None (the default) the read_concern of this MongoClient is used.

Changed in version 3.8: Undeprecated. Added the default, codec_options, read_preference, write_concern and read_concern parameters.

Changed in version 3.5: Deprecated, use get_database() instead.

monitoring – Tools for monitoring driver events.

Tools to monitor driver events.

New in version 3.1.

Use register() to register global listeners for specific events. Listeners must inherit from one of the abstract classes below and implement the correct functions for that class.

For example, a simple command logger might be implemented like this:

import logging

from pymongo import monitoring

class CommandLogger(monitoring.CommandListener):

    def started(self, event):
        logging.info("Command {0.command_name} with request id "
                     "{0.request_id} started on server "
                     "{0.connection_id}".format(event))

    def succeeded(self, event):
        logging.info("Command {0.command_name} with request id "
                     "{0.request_id} on server {0.connection_id} "
                     "succeeded in {0.duration_micros} "
                     "microseconds".format(event))

    def failed(self, event):
        logging.info("Command {0.command_name} with request id "
                     "{0.request_id} on server {0.connection_id} "
                     "failed in {0.duration_micros} "
                     "microseconds".format(event))

monitoring.register(CommandLogger())

Server discovery and monitoring events are also available. For example:

class ServerLogger(monitoring.ServerListener):

    def opened(self, event):
        logging.info("Server {0.server_address} added to topology "
                     "{0.topology_id}".format(event))

    def description_changed(self, event):
        previous_server_type = event.previous_description.server_type
        new_server_type = event.new_description.server_type
        if new_server_type != previous_server_type:
            # server_type_name was added in PyMongo 3.4
            logging.info(
                "Server {0.server_address} changed type from "
                "{0.previous_description.server_type_name} to "
                "{0.new_description.server_type_name}".format(event))

    def closed(self, event):
        logging.warning("Server {0.server_address} removed from topology "
                        "{0.topology_id}".format(event))


class HeartbeatLogger(monitoring.ServerHeartbeatListener):

    def started(self, event):
        logging.info("Heartbeat sent to server "
                     "{0.connection_id}".format(event))

    def succeeded(self, event):
        # The reply.document attribute was added in PyMongo 3.4.
        logging.info("Heartbeat to server {0.connection_id} "
                     "succeeded with reply "
                     "{0.reply.document}".format(event))

    def failed(self, event):
        logging.warning("Heartbeat to server {0.connection_id} "
                        "failed with error {0.reply}".format(event))

class TopologyLogger(monitoring.TopologyListener):

    def opened(self, event):
        logging.info("Topology with id {0.topology_id} "
                     "opened".format(event))

    def description_changed(self, event):
        logging.info("Topology description updated for "
                     "topology id {0.topology_id}".format(event))
        previous_topology_type = event.previous_description.topology_type
        new_topology_type = event.new_description.topology_type
        if new_topology_type != previous_topology_type:
            # topology_type_name was added in PyMongo 3.4
            logging.info(
                "Topology {0.topology_id} changed type from "
                "{0.previous_description.topology_type_name} to "
                "{0.new_description.topology_type_name}".format(event))
        # The has_writable_server and has_readable_server methods
        # were added in PyMongo 3.4.
        if not event.new_description.has_writable_server():
            logging.warning("No writable servers available.")
        if not event.new_description.has_readable_server():
            logging.warning("No readable servers available.")

    def closed(self, event):
        logging.info("Topology with id {0.topology_id} "
                     "closed".format(event))

Connection monitoring and pooling events are also available. For example:

class ConnectionPoolLogger(monitoring.ConnectionPoolListener):

    def pool_created(self, event):
        logging.info("[pool {0.address}] pool created".format(event))

    def pool_cleared(self, event):
        logging.info("[pool {0.address}] pool cleared".format(event))

    def pool_closed(self, event):
        logging.info("[pool {0.address}] pool closed".format(event))

    def connection_created(self, event):
        logging.info("[pool {0.address}][conn #{0.connection_id}] "
                     "connection created".format(event))

    def connection_ready(self, event):
        logging.info("[pool {0.address}][conn #{0.connection_id}] "
                     "connection setup succeeded".format(event))

    def connection_closed(self, event):
        logging.info("[pool {0.address}][conn #{0.connection_id}] "
                     "connection closed, reason: "
                     "{0.reason}".format(event))

    def connection_check_out_started(self, event):
        logging.info("[pool {0.address}] connection check out "
                     "started".format(event))

    def connection_check_out_failed(self, event):
        logging.info("[pool {0.address}] connection check out "
                     "failed, reason: {0.reason}".format(event))

    def connection_checked_out(self, event):
        logging.info("[pool {0.address}][conn #{0.connection_id}] "
                     "connection checked out of pool".format(event))

    def connection_checked_in(self, event):
        logging.info("[pool {0.address}][conn #{0.connection_id}] "
                     "connection checked into pool".format(event))

Event listeners can also be registered per instance of MongoClient:

client = MongoClient(event_listeners=[CommandLogger()])

Note that previously registered global listeners are automatically included when configuring per client event listeners. Registering a new global listener will not add that listener to existing client instances.

Note

Events are delivered synchronously. Application threads block waiting for event handlers (e.g. started()) to return. Care must be taken to ensure that your event handlers are efficient enough to not adversely affect overall application performance.

Warning

The command documents published through this API are not copies. If you intend to modify them in any way you must copy them in your event handler first.

pymongo.monitoring.register(listener)

Register a global event listener.

Parameters:
  • listener: A subclass of CommandListener, ServerHeartbeatListener, ServerListener, TopologyListener, or ConnectionPoolListener.
class pymongo.monitoring.CommandListener

Abstract base class for command listeners.

Handles CommandStartedEvent, CommandSucceededEvent, and CommandFailedEvent.

failed(event)

Abstract method to handle a CommandFailedEvent.

Parameters:
  • event: An instance of CommandFailedEvent.
started(event)

Abstract method to handle a CommandStartedEvent.

Parameters:
  • event: An instance of CommandStartedEvent.
succeeded(event)

Abstract method to handle a CommandSucceededEvent.

Parameters:
  • event: An instance of CommandSucceededEvent.
class pymongo.monitoring.ServerListener

Abstract base class for server listeners. Handles ServerOpeningEvent, ServerDescriptionChangedEvent, and ServerClosedEvent.

New in version 3.3.

closed(event)

Abstract method to handle a ServerClosedEvent.

Parameters:
  • event: An instance of ServerClosedEvent.
description_changed(event)

Abstract method to handle a ServerDescriptionChangedEvent.

Parameters:
  • event: An instance of ServerDescriptionChangedEvent.
opened(event)

Abstract method to handle a ServerOpeningEvent.

Parameters:
  • event: An instance of ServerOpeningEvent.
class pymongo.monitoring.ServerHeartbeatListener

Abstract base class for server heartbeat listeners.

Handles ServerHeartbeatStartedEvent, ServerHeartbeatSucceededEvent, and ServerHeartbeatFailedEvent.

New in version 3.3.

failed(event)

Abstract method to handle a ServerHeartbeatFailedEvent.

Parameters:
  • event: An instance of ServerHeartbeatFailedEvent.
started(event)

Abstract method to handle a ServerHeartbeatStartedEvent.

Parameters:
  • event: An instance of ServerHeartbeatStartedEvent.
succeeded(event)

Abstract method to handle a ServerHeartbeatSucceededEvent.

Parameters:
  • event: An instance of ServerHeartbeatSucceededEvent.
class pymongo.monitoring.TopologyListener

Abstract base class for topology monitoring listeners. Handles TopologyOpenedEvent, TopologyDescriptionChangedEvent, and TopologyClosedEvent.

New in version 3.3.

closed(event)

Abstract method to handle a TopologyClosedEvent.

Parameters:
  • event: An instance of TopologyClosedEvent.
description_changed(event)

Abstract method to handle a TopologyDescriptionChangedEvent.

Parameters:
  • event: An instance of TopologyDescriptionChangedEvent.
opened(event)

Abstract method to handle a TopologyOpenedEvent.

Parameters:
  • event: An instance of TopologyOpenedEvent.
class pymongo.monitoring.ConnectionPoolListener

Abstract base class for connection pool listeners.

Handles all of the connection pool events defined in the Connection Monitoring and Pooling Specification: PoolCreatedEvent, PoolClearedEvent, PoolClosedEvent, ConnectionCreatedEvent, ConnectionReadyEvent, ConnectionClosedEvent, ConnectionCheckOutStartedEvent, ConnectionCheckOutFailedEvent, ConnectionCheckedOutEvent, and ConnectionCheckedInEvent.

New in version 3.9.

connection_check_out_failed(event)

Abstract method to handle a ConnectionCheckOutFailedEvent.

Emitted when the driver’s attempt to check out a connection fails.

Parameters:
  • event: An instance of ConnectionCheckOutFailedEvent.
connection_check_out_started(event)

Abstract method to handle a ConnectionCheckOutStartedEvent.

Emitted when the driver starts attempting to check out a connection.

Parameters:
  • event: An instance of ConnectionCheckOutStartedEvent.
connection_checked_in(event)

Abstract method to handle a ConnectionCheckedInEvent.

Emitted when the driver checks in a Connection back to the Connection Pool.

Parameters:
  • event: An instance of ConnectionCheckedInEvent.
connection_checked_out(event)

Abstract method to handle a ConnectionCheckedOutEvent.

Emitted when the driver successfully checks out a Connection.

Parameters:
  • event: An instance of ConnectionCheckedOutEvent.
connection_closed(event)

Abstract method to handle a ConnectionClosedEvent.

Emitted when a Connection Pool closes a Connection.

Parameters:
  • event: An instance of ConnectionClosedEvent.
connection_created(event)

Abstract method to handle a ConnectionCreatedEvent.

Emitted when a Connection Pool creates a Connection object.

Parameters:
  • event: An instance of ConnectionCreatedEvent.
connection_ready(event)

Abstract method to handle a ConnectionReadyEvent.

Emitted when a Connection has finished its setup, and is now ready to use.

Parameters:
  • event: An instance of ConnectionReadyEvent.
pool_cleared(event)

Abstract method to handle a PoolClearedEvent.

Emitted when a Connection Pool is cleared.

Parameters:
  • event: An instance of PoolClearedEvent.
pool_closed(event)

Abstract method to handle a PoolClosedEvent.

Emitted when a Connection Pool is closed.

Parameters:
  • event: An instance of PoolClosedEvent.
pool_created(event)

Abstract method to handle a PoolCreatedEvent.

Emitted when a Connection Pool is created.

Parameters:
  • event: An instance of PoolCreatedEvent.
class pymongo.monitoring.CommandStartedEvent(command, database_name, *args)

Event published when a command starts.

Parameters:
  • command: The command document.
  • database_name: The name of the database this command was run against.
  • request_id: The request id for this operation.
  • connection_id: The address (host, port) of the server this command was sent to.
  • operation_id: An optional identifier for a series of related events.
command

The command document.

command_name

The command name.

connection_id

The address (host, port) of the server this command was sent to.

database_name

The name of the database this command was run against.

operation_id

An id for this series of events or None.

request_id

The request id for this operation.

class pymongo.monitoring.CommandSucceededEvent(duration, reply, command_name, request_id, connection_id, operation_id)

Event published when a command succeeds.

Parameters:
  • duration: The command duration as a datetime.timedelta.
  • reply: The server reply document.
  • command_name: The command name.
  • request_id: The request id for this operation.
  • connection_id: The address (host, port) of the server this command was sent to.
  • operation_id: An optional identifier for a series of related events.
command_name

The command name.

connection_id

The address (host, port) of the server this command was sent to.

duration_micros

The duration of this operation in microseconds.

operation_id

An id for this series of events or None.

reply

The server reply document for this operation.

request_id

The request id for this operation.

class pymongo.monitoring.CommandFailedEvent(duration, failure, *args)

Event published when a command fails.

Parameters:
  • duration: The command duration as a datetime.timedelta.
  • failure: The server reply document.
  • command_name: The command name.
  • request_id: The request id for this operation.
  • connection_id: The address (host, port) of the server this command was sent to.
  • operation_id: An optional identifier for a series of related events.
command_name

The command name.

connection_id

The address (host, port) of the server this command was sent to.

duration_micros

The duration of this operation in microseconds.

failure

The server failure document for this operation.

operation_id

An id for this series of events or None.

request_id

The request id for this operation.

class pymongo.monitoring.ServerDescriptionChangedEvent(previous_description, new_description, *args)

Published when server description changes.

New in version 3.3.

new_description

The new ServerDescription.

previous_description

The previous ServerDescription.

server_address

The address (host, port) pair of the server

topology_id

A unique identifier for the topology this server is a part of.

class pymongo.monitoring.ServerOpeningEvent(server_address, topology_id)

Published when server is initialized.

New in version 3.3.

server_address

The address (host, port) pair of the server

topology_id

A unique identifier for the topology this server is a part of.

class pymongo.monitoring.ServerClosedEvent(server_address, topology_id)

Published when server is closed.

New in version 3.3.

server_address

The address (host, port) pair of the server

topology_id

A unique identifier for the topology this server is a part of.

class pymongo.monitoring.TopologyDescriptionChangedEvent(previous_description, new_description, *args)

Published when the topology description changes.

New in version 3.3.

new_description

The new TopologyDescription.

previous_description

The previous TopologyDescription.

topology_id

A unique identifier for the topology this server is a part of.

class pymongo.monitoring.TopologyOpenedEvent(topology_id)

Published when the topology is initialized.

New in version 3.3.

topology_id

A unique identifier for the topology this server is a part of.

class pymongo.monitoring.TopologyClosedEvent(topology_id)

Published when the topology is closed.

New in version 3.3.

topology_id

A unique identifier for the topology this server is a part of.

class pymongo.monitoring.ServerHeartbeatStartedEvent(connection_id)

Published when a heartbeat is started.

New in version 3.3.

connection_id

The address (host, port) of the server this heartbeat was sent to.

class pymongo.monitoring.ServerHeartbeatSucceededEvent(duration, reply, *args)

Fired when the server heartbeat succeeds.

New in version 3.3.

connection_id

The address (host, port) of the server this heartbeat was sent to.

duration

The duration of this heartbeat in microseconds.

reply

An instance of IsMaster.

class pymongo.monitoring.ServerHeartbeatFailedEvent(duration, reply, *args)

Fired when the server heartbeat fails, either with an “ok: 0” or a socket exception.

New in version 3.3.

connection_id

The address (host, port) of the server this heartbeat was sent to.

duration

The duration of this heartbeat in microseconds.

reply

A subclass of Exception.

class pymongo.monitoring.PoolCreatedEvent(address, options)

Published when a Connection Pool is created.

Parameters:
  • address: The address (host, port) pair of the server this Pool is attempting to connect to.
  • options: Any non-default pool options that were set on this Connection Pool.

New in version 3.9.

address

The address (host, port) pair of the server the pool is attempting to connect to.

options

Any non-default pool options that were set on this Connection Pool.

class pymongo.monitoring.PoolClearedEvent(address)

Published when a Connection Pool is cleared.

Parameters:
  • address: The address (host, port) pair of the server this Pool is attempting to connect to.

New in version 3.9.

address

The address (host, port) pair of the server the pool is attempting to connect to.

class pymongo.monitoring.PoolClosedEvent(address)

Published when a Connection Pool is closed.

Parameters:
  • address: The address (host, port) pair of the server this Pool is attempting to connect to.

New in version 3.9.

address

The address (host, port) pair of the server the pool is attempting to connect to.

class pymongo.monitoring.ConnectionCreatedEvent(address, connection_id)

Published when a Connection Pool creates a Connection object.

NOTE: This connection is not ready for use until the ConnectionReadyEvent is published.

Parameters:
  • address: The address (host, port) pair of the server this Connection is attempting to connect to.
  • connection_id: The integer ID of the Connection in this Pool.

New in version 3.9.

address

The address (host, port) pair of the server this connection is attempting to connect to.

connection_id

The ID of the Connection.

class pymongo.monitoring.ConnectionReadyEvent(address, connection_id)

Published when a Connection has finished its setup, and is ready to use.

Parameters:
  • address: The address (host, port) pair of the server this Connection is attempting to connect to.
  • connection_id: The integer ID of the Connection in this Pool.

New in version 3.9.

address

The address (host, port) pair of the server this connection is attempting to connect to.

connection_id

The ID of the Connection.

class pymongo.monitoring.ConnectionClosedReason

An enum that defines values for reason on a ConnectionClosedEvent.

New in version 3.9.

ERROR = 'error'

The connection experienced an error, making it no longer valid.

IDLE = 'idle'

The connection became stale by being idle for too long (maxIdleTimeMS).

POOL_CLOSED = 'poolClosed'

The pool was closed, making the connection no longer valid.

STALE = 'stale'

The pool was cleared, making the connection no longer valid.

class pymongo.monitoring.ConnectionClosedEvent(address, connection_id, reason)

Published when a Connection is closed.

Parameters:
  • address: The address (host, port) pair of the server this Connection is attempting to connect to.
  • connection_id: The integer ID of the Connection in this Pool.
  • reason: A reason explaining why this connection was closed.

New in version 3.9.

address

The address (host, port) pair of the server this connection is attempting to connect to.

connection_id

The ID of the Connection.

reason

A reason explaining why this connection was closed.

The reason must be one of the strings from the ConnectionClosedReason enum.

class pymongo.monitoring.ConnectionCheckOutStartedEvent(address)

Published when the driver starts attempting to check out a connection.

Parameters:
  • address: The address (host, port) pair of the server this Connection is attempting to connect to.

New in version 3.9.

address

The address (host, port) pair of the server this connection is attempting to connect to.

class pymongo.monitoring.ConnectionCheckOutFailedReason

An enum that defines values for reason on a ConnectionCheckOutFailedEvent.

New in version 3.9.

CONN_ERROR = 'connectionError'

The connection check out attempt experienced an error while setting up a new connection.

POOL_CLOSED = 'poolClosed'

The pool was previously closed, and cannot provide new connections.

TIMEOUT = 'timeout'

The connection check out attempt exceeded the specified timeout.

class pymongo.monitoring.ConnectionCheckOutFailedEvent(address, reason)

Published when the driver’s attempt to check out a connection fails.

Parameters:
  • address: The address (host, port) pair of the server this Connection is attempting to connect to.
  • reason: A reason explaining why connection check out failed.

New in version 3.9.

address

The address (host, port) pair of the server this connection is attempting to connect to.

reason

A reason explaining why connection check out failed.

The reason must be one of the strings from the ConnectionCheckOutFailedReason enum.

class pymongo.monitoring.ConnectionCheckedOutEvent(address, connection_id)

Published when the driver successfully checks out a Connection.

Parameters:
  • address: The address (host, port) pair of the server this Connection is attempting to connect to.
  • connection_id: The integer ID of the Connection in this Pool.

New in version 3.9.

address

The address (host, port) pair of the server this connection is attempting to connect to.

connection_id

The ID of the Connection.

class pymongo.monitoring.ConnectionCheckedInEvent(address, connection_id)

Published when the driver checks in a Connection into the Pool.

Parameters:
  • address: The address (host, port) pair of the server this Connection is attempting to connect to.
  • connection_id: The integer ID of the Connection in this Pool.

New in version 3.9.

address

The address (host, port) pair of the server this connection is attempting to connect to.

connection_id

The ID of the Connection.

operations – Operation class definitions

Operation class definitions.

class pymongo.operations.DeleteMany(filter, collation=None)

Create a DeleteMany instance.

For use with bulk_write().

Parameters:
  • filter: A query that matches the documents to delete.
  • collation (optional): An instance of Collation. This option is only supported on MongoDB 3.4 and above.

Changed in version 3.5: Added the collation option.

class pymongo.operations.DeleteOne(filter, collation=None)

Create a DeleteOne instance.

For use with bulk_write().

Parameters:
  • filter: A query that matches the document to delete.
  • collation (optional): An instance of Collation. This option is only supported on MongoDB 3.4 and above.

Changed in version 3.5: Added the collation option.

class pymongo.operations.IndexModel(keys, **kwargs)

Create an Index instance.

For use with create_indexes().

Takes either a single key or a list of (key, direction) pairs. The key(s) must be an instance of basestring (str in python 3), and the direction(s) must be one of (ASCENDING, DESCENDING, GEO2D, GEOHAYSTACK, GEOSPHERE, HASHED, TEXT).

Valid options include, but are not limited to:

  • name: custom name to use for this index - if none is given, a name will be generated.
  • unique: if True creates a uniqueness constraint on the index.
  • background: if True this index should be created in the background.
  • sparse: if True, omit from the index any documents that lack the indexed field.
  • bucketSize: for use with geoHaystack indexes. Number of documents to group together within a certain proximity to a given longitude and latitude.
  • min: minimum value for keys in a GEO2D index.
  • max: maximum value for keys in a GEO2D index.
  • expireAfterSeconds: <int> Used to create an expiring (TTL) collection. MongoDB will automatically delete documents from this collection after <int> seconds. The indexed field must be a UTC datetime or the data will not expire.
  • partialFilterExpression: A document that specifies a filter for a partial index. Requires server version >= 3.2.
  • collation: An instance of Collation that specifies the collation to use in MongoDB >= 3.4.
  • wildcardProjection: Allows users to include or exclude specific field paths from a wildcard index using the {"$**": 1} key pattern. Requires server version >= 4.2.

See the MongoDB documentation for a full list of supported options by server version.

Parameters:
  • keys: a single key or a list of (key, direction) pairs specifying the index to create
  • **kwargs (optional): any additional index creation options (see the above list) should be passed as keyword arguments

Changed in version 3.2: Added partialFilterExpression to support partial indexes.

document

An index document suitable for passing to the createIndexes command.
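
For example, creating two indexes in one create_indexes() call (a minimal sketch; db is an existing Database and the field names are placeholders):

>>> from pymongo import IndexModel, ASCENDING, DESCENDING
>>> index1 = IndexModel([("hello", DESCENDING),
...                      ("world", ASCENDING)], name="hello_world")
>>> index2 = IndexModel([("goodbye", DESCENDING)])
>>> db.test.create_indexes([index1, index2])
["hello_world", "goodbye_-1"]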

class pymongo.operations.InsertOne(document)

Create an InsertOne instance.

For use with bulk_write().

Parameters:
  • document: The document to insert. If the document is missing an _id field one will be added.
class pymongo.operations.ReplaceOne(filter, replacement, upsert=False, collation=None)

Create a ReplaceOne instance.

For use with bulk_write().

Parameters:
  • filter: A query that matches the document to replace.
  • replacement: The new document.
  • upsert (optional): If True, perform an insert if no documents match the filter.
  • collation (optional): An instance of Collation. This option is only supported on MongoDB 3.4 and above.

Changed in version 3.5: Added the collation option.

class pymongo.operations.UpdateMany(filter, update, upsert=False, collation=None, array_filters=None)

Create an UpdateMany instance.

For use with bulk_write().

Parameters:
  • filter: A query that matches the documents to update.
  • update: The modifications to apply.
  • upsert (optional): If True, perform an insert if no documents match the filter.
  • collation (optional): An instance of Collation. This option is only supported on MongoDB 3.4 and above.
  • array_filters (optional): A list of filters specifying which array elements an update should apply. Requires MongoDB 3.6+.

Changed in version 3.9: Added the ability to accept a pipeline as the update.

Changed in version 3.6: Added the array_filters option.

Changed in version 3.5: Added the collation option.

class pymongo.operations.UpdateOne(filter, update, upsert=False, collation=None, array_filters=None)

Represents an update_one operation.

For use with bulk_write().

Parameters:
  • filter: A query that matches the document to update.
  • update: The modifications to apply.
  • upsert (optional): If True, perform an insert if no documents match the filter.
  • collation (optional): An instance of Collation. This option is only supported on MongoDB 3.4 and above.
  • array_filters (optional): A list of filters specifying which array elements an update should apply. Requires MongoDB 3.6+.

Changed in version 3.9: Added the ability to accept a pipeline as the update.

Changed in version 3.6: Added the array_filters option.

Changed in version 3.5: Added the collation option.
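
For example, several of these operations can be mixed in a single bulk_write() call (a minimal sketch; db is an existing Database, and the count shown assumes the illustrative documents above):

>>> from pymongo import InsertOne, DeleteOne, ReplaceOne
>>> requests = [InsertOne({'y': 1}), DeleteOne({'x': 1}),
...             ReplaceOne({'w': 1}, {'z': 1}, upsert=True)]
>>> result = db.test.bulk_write(requests)
>>> result.inserted_count
1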

pool – Pool module for use with a MongoDB client.
class pymongo.pool.SocketInfo(sock, pool, address, id)

Store a socket with some metadata.

Parameters:
  • sock: a raw socket object
  • pool: a Pool instance
  • address: the server’s (host, port)
  • id: the id of this socket in its pool
authenticate(credentials)

Log in to the server and store these credentials in authset.

Can raise ConnectionFailure or OperationFailure.

Parameters:
  • credentials: A MongoCredential.
check_auth(all_credentials)

Update this socket’s authentication.

Log in or out to bring this socket’s credentials up to date with those provided. Can raise ConnectionFailure or OperationFailure.

Parameters:
  • all_credentials: dict, maps auth source to MongoCredential.
close_socket(reason)

Close this connection with a reason.

command(dbname, spec, slave_ok=False, read_preference=Primary(), codec_options=CodecOptions(document_class=dict, tz_aware=False, uuid_representation=PYTHON_LEGACY, unicode_decode_error_handler='strict', tzinfo=None, type_registry=TypeRegistry(type_codecs=[], fallback_encoder=None)), check=True, allowable_errors=None, check_keys=False, read_concern=None, write_concern=None, parse_write_concern_error=False, collation=None, session=None, client=None, retryable_write=False, publish_events=True, user_fields=None)

Execute a command or raise an error.

Parameters:
  • dbname: name of the database on which to run the command
  • spec: a command document as a dict, SON, or mapping object
  • slave_ok: whether to set the SlaveOkay wire protocol bit
  • read_preference: a read preference
  • codec_options: a CodecOptions instance
  • check: raise OperationFailure if there are errors
  • allowable_errors: errors to ignore if check is True
  • check_keys: if True, check spec for invalid keys
  • read_concern: The read concern for this command.
  • write_concern: The write concern for this command.
  • parse_write_concern_error: Whether to parse the writeConcernError field in the command response.
  • collation: The collation for this command.
  • session: optional ClientSession instance.
  • client: optional MongoClient for gossipping $clusterTime.
  • retryable_write: True if this command is a retryable write.
  • publish_events: Should we publish events for this command?
  • user_fields (optional): Response fields that should be decoded using the TypeDecoders from codec_options, passed to bson._decode_all_selective.
idle_time_seconds()

Seconds since this socket was last checked into its pool.

legacy_write(request_id, msg, max_doc_size, with_last_error)

Send OP_INSERT, etc., optionally returning response as a dict.

Can raise ConnectionFailure or OperationFailure.

Parameters:
  • request_id: an int.
  • msg: bytes, an OP_INSERT, OP_UPDATE, or OP_DELETE message, perhaps with a getlasterror command appended.
  • max_doc_size: size in bytes of the largest document in msg.
  • with_last_error: True if a getlasterror command is appended.
receive_message(request_id)

Receive a raw BSON message or raise ConnectionFailure.

If any exception is raised, the socket is closed.

send_cluster_time(command, session, client)

Add cluster time for MongoDB >= 3.6.

send_message(message, max_doc_size)

Send a raw BSON message or raise ConnectionFailure.

If a network exception is raised, the socket is closed.

validate_session(client, session)

Validate this session before use with client.

Raises error if this session is logged in as a different user or the client is not the one that created the session.

write_command(request_id, msg)

Send “insert” etc. command, returning response as a dict.

Can raise ConnectionFailure or OperationFailure.

Parameters:
  • request_id: an int.
  • msg: bytes, the command message.
read_concern – Tools for working with read concern.

Tools for working with read concerns.

class pymongo.read_concern.ReadConcern(level=None)
Parameters:
  • level: (string) The read concern level specifies the level of isolation for read operations. For example, a read operation using a read concern level of majority will only return data that has been written to a majority of nodes. If the level is left unspecified, the server default will be used.

New in version 3.2.

document

The document representation of this read concern.

Note

ReadConcern is immutable. Mutating the value of document does not mutate this ReadConcern.

level

The read concern level.

ok_for_legacy

Return True if this read concern is compatible with old wire protocol versions.
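
For example, reading with "majority" read concern (a minimal sketch; db is an existing Database):

>>> from pymongo.read_concern import ReadConcern
>>> coll = db.get_collection('test', read_concern=ReadConcern('majority'))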

read_preferences – Utilities for choosing which member of a replica set to read from.

Utilities for choosing which member of a replica set to read from.

class pymongo.read_preferences.Primary

Primary read preference.

  • When directly connected to one mongod queries are allowed if the server is standalone or a replica set primary.
  • When connected to a mongos queries are sent to the primary of a shard.
  • When connected to a replica set queries are sent to the primary of the replica set.
document

Read preference as a document.

mode

The mode of this read preference instance.

name

The name of this read preference.

class pymongo.read_preferences.PrimaryPreferred(tag_sets=None, max_staleness=-1)

PrimaryPreferred read preference.

  • When directly connected to one mongod queries are allowed to standalone servers, to a replica set primary, or to replica set secondaries.
  • When connected to a mongos queries are sent to the primary of a shard if available, otherwise a shard secondary.
  • When connected to a replica set queries are sent to the primary if available, otherwise a secondary.
Parameters:
  • tag_sets: The tag_sets to use if the primary is not available.
  • max_staleness: (integer, in seconds) The maximum estimated length of time a replica set secondary can fall behind the primary in replication before it will no longer be selected for operations. Default -1, meaning no maximum. If it is set, it must be at least 90 seconds.
document

Read preference as a document.

max_staleness

The maximum estimated length of time (in seconds) a replica set secondary can fall behind the primary in replication before it will no longer be selected for operations, or -1 for no maximum.

min_wire_version

The wire protocol version the server must support.

Some read preferences impose version requirements on all servers (e.g. maxStalenessSeconds requires MongoDB 3.4 / maxWireVersion 5).

All servers’ maxWireVersion must be at least this read preference’s min_wire_version, or the driver raises ConfigurationError.

mode

The mode of this read preference instance.

mongos_mode

The mongos mode of this read preference.

name

The name of this read preference.

tag_sets

Set tag_sets to a list of dictionaries like [{'dc': 'ny'}] to read only from members whose dc tag has the value "ny". To specify a priority-order for tag sets, provide a list of tag sets: [{'dc': 'ny'}, {'dc': 'la'}, {}]. A final, empty tag set, {}, means “read from any member that matches the mode, ignoring tags.” The client tries each set of tags in turn until it finds a set of tags with at least one matching member.

class pymongo.read_preferences.Secondary(tag_sets=None, max_staleness=-1)

Secondary read preference.

  • When directly connected to one mongod queries are allowed to standalone servers, to a replica set primary, or to replica set secondaries.
  • When connected to a mongos queries are distributed among shard secondaries. An error is raised if no secondaries are available.
  • When connected to a replica set queries are distributed among secondaries. An error is raised if no secondaries are available.
Parameters:
  • tag_sets: The tag_sets for this read preference.
  • max_staleness: (integer, in seconds) The maximum estimated length of time a replica set secondary can fall behind the primary in replication before it will no longer be selected for operations. Default -1, meaning no maximum. If it is set, it must be at least 90 seconds.
document

Read preference as a document.

max_staleness

The maximum estimated length of time (in seconds) a replica set secondary can fall behind the primary in replication before it will no longer be selected for operations, or -1 for no maximum.

min_wire_version

The wire protocol version the server must support.

Some read preferences impose version requirements on all servers (e.g. maxStalenessSeconds requires MongoDB 3.4 / maxWireVersion 5).

All servers’ maxWireVersion must be at least this read preference’s min_wire_version, or the driver raises ConfigurationError.

mode

The mode of this read preference instance.

mongos_mode

The mongos mode of this read preference.

name

The name of this read preference.

tag_sets

Set tag_sets to a list of dictionaries like [{'dc': 'ny'}] to read only from members whose dc tag has the value "ny". To specify a priority-order for tag sets, provide a list of tag sets: [{'dc': 'ny'}, {'dc': 'la'}, {}]. A final, empty tag set, {}, means “read from any member that matches the mode, ignoring tags.” The client tries each set of tags in turn until it finds a set of tags with at least one matching member.

class pymongo.read_preferences.SecondaryPreferred(tag_sets=None, max_staleness=-1)

SecondaryPreferred read preference.

  • When directly connected to one mongod queries are allowed to standalone servers, to a replica set primary, or to replica set secondaries.
  • When connected to a mongos queries are distributed among shard secondaries, or the shard primary if no secondary is available.
  • When connected to a replica set queries are distributed among secondaries, or the primary if no secondary is available.
Parameters:
  • tag_sets: The tag_sets for this read preference.
  • max_staleness: (integer, in seconds) The maximum estimated length of time a replica set secondary can fall behind the primary in replication before it will no longer be selected for operations. Default -1, meaning no maximum. If it is set, it must be at least 90 seconds.
document

Read preference as a document.

max_staleness

The maximum estimated length of time (in seconds) a replica set secondary can fall behind the primary in replication before it will no longer be selected for operations, or -1 for no maximum.

min_wire_version

The wire protocol version the server must support.

Some read preferences impose version requirements on all servers (e.g. maxStalenessSeconds requires MongoDB 3.4 / maxWireVersion 5).

All servers’ maxWireVersion must be at least this read preference’s min_wire_version, or the driver raises ConfigurationError.

mode

The mode of this read preference instance.

mongos_mode

The mongos mode of this read preference.

name

The name of this read preference.

tag_sets

Set tag_sets to a list of dictionaries like [{'dc': 'ny'}] to read only from members whose dc tag has the value "ny". To specify a priority-order for tag sets, provide a list of tag sets: [{'dc': 'ny'}, {'dc': 'la'}, {}]. A final, empty tag set, {}, means “read from any member that matches the mode, ignoring tags.” The client tries each set of tags in turn until it finds a set of tags with at least one matching member.

class pymongo.read_preferences.Nearest(tag_sets=None, max_staleness=-1)

Nearest read preference.

  • When directly connected to one mongod queries are allowed to standalone servers, to a replica set primary, or to replica set secondaries.
  • When connected to a mongos queries are distributed among all members of a shard.
  • When connected to a replica set queries are distributed among all members.
Parameters:
  • tag_sets: The tag_sets for this read preference.
  • max_staleness: (integer, in seconds) The maximum estimated length of time a replica set secondary can fall behind the primary in replication before it will no longer be selected for operations. Default -1, meaning no maximum. If it is set, it must be at least 90 seconds.
document

Read preference as a document.

max_staleness

The maximum estimated length of time (in seconds) a replica set secondary can fall behind the primary in replication before it will no longer be selected for operations, or -1 for no maximum.

min_wire_version

The wire protocol version the server must support.

Some read preferences impose version requirements on all servers (e.g. maxStalenessSeconds requires MongoDB 3.4 / maxWireVersion 5).

All servers’ maxWireVersion must be at least this read preference’s min_wire_version, or the driver raises ConfigurationError.

mode

The mode of this read preference instance.

mongos_mode

The mongos mode of this read preference.

name

The name of this read preference.

tag_sets

Set tag_sets to a list of dictionaries like [{'dc': 'ny'}] to read only from members whose dc tag has the value "ny". To specify a priority-order for tag sets, provide a list of tag sets: [{'dc': 'ny'}, {'dc': 'la'}, {}]. A final, empty tag set, {}, means “read from any member that matches the mode, ignoring tags.” The client tries each set of tags in turn until it finds a set of tags with at least one matching member.

class pymongo.read_preferences.ReadPreference

An enum that defines the read preference modes supported by PyMongo.

See High Availability and PyMongo for code examples.

A read preference is used in three cases:

MongoClient connected to a single mongod:

  • PRIMARY: Queries are allowed if the server is standalone or a replica set primary.
  • All other modes allow queries to standalone servers, to a replica set primary, or to replica set secondaries.

MongoClient initialized with the replicaSet option:

  • PRIMARY: Read from the primary. This is the default, and provides the strongest consistency. If no primary is available, raise AutoReconnect.
  • PRIMARY_PREFERRED: Read from the primary if available, or if there is none, read from a secondary.
  • SECONDARY: Read from a secondary. If no secondary is available, raise AutoReconnect.
  • SECONDARY_PREFERRED: Read from a secondary if available, otherwise from the primary.
  • NEAREST: Read from any member.

MongoClient connected to a mongos, with a sharded cluster of replica sets:

  • PRIMARY: Read from the primary of the shard, or raise OperationFailure if there is none. This is the default.
  • PRIMARY_PREFERRED: Read from the primary of the shard, or if there is none, read from a secondary of the shard.
  • SECONDARY: Read from a secondary of the shard, or raise OperationFailure if there is none.
  • SECONDARY_PREFERRED: Read from a secondary of the shard if available, otherwise from the shard primary.
  • NEAREST: Read from any shard member.
PRIMARY = Primary()
PRIMARY_PREFERRED = PrimaryPreferred(tag_sets=None, max_staleness=-1)
SECONDARY = Secondary(tag_sets=None, max_staleness=-1)
SECONDARY_PREFERRED = SecondaryPreferred(tag_sets=None, max_staleness=-1)
NEAREST = Nearest(tag_sets=None, max_staleness=-1)
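
For example, a mode class with tag sets and a staleness bound can be passed wherever a read preference is accepted (a minimal sketch; db is an existing Database and the 'dc' tags are placeholders):

>>> from pymongo.read_preferences import SecondaryPreferred
>>> pref = SecondaryPreferred(tag_sets=[{'dc': 'ny'}, {}], max_staleness=120)
>>> coll = db.get_collection('test', read_preference=pref)
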
results – Result class definitions

Result class definitions.

class pymongo.results.BulkWriteResult(bulk_api_result, acknowledged)

Create a BulkWriteResult instance.

Parameters:
  • bulk_api_result: A result dict from the bulk API
  • acknowledged: Was this write result acknowledged? If False then all properties of this object will raise InvalidOperation.
acknowledged

Is this the result of an acknowledged write operation?

The acknowledged attribute will be False when using WriteConcern(w=0), otherwise True.

Note

If the acknowledged attribute is False, all other attributes of this class will raise InvalidOperation when accessed. Values for other attributes cannot be determined if the write operation was unacknowledged.

See also

WriteConcern

bulk_api_result

The raw bulk API result.

deleted_count

The number of documents deleted.

inserted_count

The number of documents inserted.

matched_count

The number of documents matched for an update.

modified_count

The number of documents modified.

Note

modified_count is only reported by MongoDB 2.6 and later. When connected to an earlier server version, or in certain mixed version sharding configurations, this attribute will be set to None.

upserted_count

The number of documents upserted.

upserted_ids

A map of operation index to the _id of the upserted document.

class pymongo.results.DeleteResult(raw_result, acknowledged)

The return type for delete_one() and delete_many()

acknowledged

Is this the result of an acknowledged write operation?

The acknowledged attribute will be False when using WriteConcern(w=0), otherwise True.

Note

If the acknowledged attribute is False, all other attributes of this class will raise InvalidOperation when accessed. Values for other attributes cannot be determined if the write operation was unacknowledged.

See also

WriteConcern

deleted_count

The number of documents deleted.

raw_result

The raw result document returned by the server.

class pymongo.results.InsertManyResult(inserted_ids, acknowledged)

The return type for insert_many().

acknowledged

Is this the result of an acknowledged write operation?

The acknowledged attribute will be False when using WriteConcern(w=0), otherwise True.

Note

If the acknowledged attribute is False, all other attributes of this class will raise InvalidOperation when accessed. Values for other attributes cannot be determined if the write operation was unacknowledged.

See also

WriteConcern

inserted_ids

A list of _ids of the inserted documents, in the order provided.

Note

If False is passed for the ordered parameter to insert_many() the server may have inserted the documents in a different order than what is presented here.

class pymongo.results.InsertOneResult(inserted_id, acknowledged)

The return type for insert_one().

acknowledged

Is this the result of an acknowledged write operation?

The acknowledged attribute will be False when using WriteConcern(w=0), otherwise True.

Note

If the acknowledged attribute is False, all other attributes of this class will raise InvalidOperation when accessed. Values for other attributes cannot be determined if the write operation was unacknowledged.

See also

WriteConcern

inserted_id

The inserted document’s _id.

class pymongo.results.UpdateResult(raw_result, acknowledged)

The return type for update_one(), update_many(), and replace_one().

acknowledged

Is this the result of an acknowledged write operation?

The acknowledged attribute will be False when using WriteConcern(w=0), otherwise True.

Note

If the acknowledged attribute is False, all other attributes of this class will raise InvalidOperation when accessed. Values for other attributes cannot be determined if the write operation was unacknowledged.

See also

WriteConcern

matched_count

The number of documents matched for this update.

modified_count

The number of documents modified.

Note

modified_count is only reported by MongoDB 2.6 and later. When connected to an earlier server version, or in certain mixed version sharding configurations, this attribute will be set to None.

raw_result

The raw result document returned by the server.

upserted_id

The _id of the inserted document if an upsert took place. Otherwise None.
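
For example, inspecting a write result (a minimal sketch; db is an existing Database and the counts shown assume one matching document exists):

>>> result = db.test.update_one({'x': 1}, {'$inc': {'x': 3}})
>>> result.acknowledged
True
>>> result.matched_count
1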

son_manipulator – Manipulators that can edit SON documents as they are saved or retrieved

DEPRECATED: Manipulators that can edit SON objects as they enter and exit a database.

The SONManipulator API has limitations as a technique for transforming your data. Instead, it is more flexible and straightforward to transform outgoing documents in your own code before passing them to PyMongo, and transform incoming documents after receiving them from PyMongo. SON Manipulators will be removed from PyMongo in 4.0.

PyMongo does not apply SON manipulators to documents passed to the modern methods bulk_write(), insert_one(), insert_many(), update_one(), or update_many(). SON manipulators are not applied to documents returned by the modern methods find_one_and_delete(), find_one_and_replace(), and find_one_and_update().
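
As recommended above, transform documents in your own code instead (a minimal sketch; add_timestamp is a hypothetical helper, not part of PyMongo):

import datetime

def add_timestamp(doc):
    # Transform the outgoing document before handing it to PyMongo.
    doc = dict(doc)
    doc.setdefault('created', datetime.datetime.utcnow())
    return doc

db.test.insert_one(add_timestamp({'x': 1}))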

class pymongo.son_manipulator.AutoReference(db)

Transparently reference and de-reference already saved embedded objects.

This manipulator should probably only be used when the NamespaceInjector is also being used, otherwise it doesn’t make too much sense - documents can only be auto-referenced if they have an _ns field.

NOTE: this will behave poorly if you have a circular reference.

TODO: this only works for documents that are in the same database. To fix this we’ll need to add a DatabaseInjector that adds _db and then make use of the optional database support for DBRefs.

transform_incoming(son, collection)

Replace embedded documents with DBRefs.

transform_outgoing(son, collection)

Replace DBRefs with embedded documents.

will_copy()

We need to copy so the user’s document doesn’t get transformed refs.

class pymongo.son_manipulator.NamespaceInjector

A son manipulator that adds the _ns field.

transform_incoming(son, collection)

Add the _ns field to the incoming object

class pymongo.son_manipulator.ObjectIdInjector

A son manipulator that adds the _id field if it is missing.

Changed in version 2.7: ObjectIdInjector is no longer used by PyMongo, but remains in this module for backwards compatibility.

transform_incoming(son, collection)

Add an _id field if it is missing.

class pymongo.son_manipulator.ObjectIdShuffler

A son manipulator that moves _id to the first position.

transform_incoming(son, collection)

Move _id to the front if it’s there.

will_copy()

We need to copy to be sure that we are dealing with SON, not a dict.

class pymongo.son_manipulator.SONManipulator

A base son manipulator.

This manipulator just saves and restores objects without changing them.

transform_incoming(son, collection)

Manipulate an incoming SON object.

Parameters:
  • son: the SON object to be inserted into the database
  • collection: the collection the object is being inserted into
transform_outgoing(son, collection)

Manipulate an outgoing SON object.

Parameters:
  • son: the SON object being retrieved from the database
  • collection: the collection this object was stored in
will_copy()

Will this SON manipulator make a copy of the incoming document?

Derived classes that do need to make a copy should override this method, returning True instead of False. All non-copying manipulators will be applied first (so that the user’s document will be updated appropriately), followed by copying manipulators.

uri_parser – Tools to parse and validate a MongoDB URI

Tools to parse and validate a MongoDB URI.

pymongo.uri_parser.parse_host(entity, default_port=27017)

Validates a host string.

Returns a 2-tuple of host followed by port where port is default_port if it wasn’t specified in the string.

Parameters:
  • entity: A host or host:port string where host could be a
    hostname or IP address.
  • default_port: The port number to use when one wasn’t
    specified in entity.
pymongo.uri_parser.parse_ipv6_literal_host(entity, default_port)

Validates an IPv6 literal host:port string.

Returns a 2-tuple of IPv6 literal followed by port where port is default_port if it wasn’t specified in entity.

Parameters:
  • entity: A string that represents an IPv6 literal enclosed
    in braces (e.g. ‘[::1]’ or ‘[::1]:27017’).
  • default_port: The port number to use when one wasn’t
    specified in entity.
pymongo.uri_parser.parse_uri(uri, default_port=27017, validate=True, warn=False, normalize=True, connect_timeout=None)

Parse and validate a MongoDB URI.

Returns a dict of the form:

{
    'nodelist': <list of (host, port) tuples>,
    'username': <username> or None,
    'password': <password> or None,
    'database': <database name> or None,
    'collection': <collection name> or None,
    'options': <dict of MongoDB URI options>,
    'fqdn': <fqdn of the MongoDB+SRV URI> or None
}

If the URI scheme is “mongodb+srv://” DNS SRV and TXT lookups will be done to build nodelist and options.

Parameters:
  • uri: The MongoDB URI to parse.
  • default_port: The port number to use when one wasn’t specified for a host in the URI.
  • validate (optional): If True (the default), validate and normalize all options. Default: True.
  • warn (optional): When validating, if True, warn the user about any invalid options or values and then ignore them. If False, validation will error when options are unsupported or values are invalid. Default: False.
  • normalize (optional): If True, convert names of URI options to their internally-used names. Default: True.
  • connect_timeout (optional): The maximum time in milliseconds to wait for a response from the DNS server.

Changed in version 3.9: Added the normalize parameter.

Changed in version 3.6: Added support for mongodb+srv:// URIs.

Changed in version 3.5: Return the original value of the readPreference MongoDB URI option instead of the validated read preference mode.

Changed in version 3.1: warn added so invalid options can be ignored.
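
For example (a minimal sketch; the output shown is abbreviated):

>>> from pymongo import uri_parser
>>> res = uri_parser.parse_uri('mongodb://localhost:27017/my_database')
>>> res['nodelist']
[('localhost', 27017)]
>>> res['database']
'my_database'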

pymongo.uri_parser.parse_userinfo(userinfo)

Validates the format of user information in a MongoDB URI. Reserved characters like ‘:’, ‘/’, ‘+’ and ‘@’ must be escaped following RFC 3986.

Returns a 2-tuple containing the unescaped username followed by the unescaped password.

Parameters:
  • userinfo: A string of the form <username>:<password>

Changed in version 2.2: Now uses urllib.unquote_plus so + characters must be escaped.

pymongo.uri_parser.split_hosts(hosts, default_port=27017)

Takes a string of the form host1[:port],host2[:port]… and splits it into (host, port) tuples. If [:port] isn’t present the default_port is used.

Returns a set of 2-tuples containing the host name (or IP) followed by port number.

Parameters:
  • hosts: A string of the form host1[:port],host2[:port],…
  • default_port: The port number to use when one wasn’t specified for a host.
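
For example (host names are placeholders):

from pymongo import uri_parser

uri_parser.split_hosts("alpha.example.com,beta.example.com:27018")
# 2-tuples: alpha on the default port 27017, beta on 27018
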
pymongo.uri_parser.split_options(opts, validate=True, warn=False, normalize=True)

Takes the options portion of a MongoDB URI, validates each option and returns the options in a dictionary.

Parameters:
  • opts: A string representing MongoDB URI options.
  • validate: If True (the default), validate and normalize all options.
  • warn: If True, warn about invalid options and ignore them; if False (the default), invalid options raise errors.
  • normalize: If True (the default), renames all options to their internally-used names.
pymongo.uri_parser.validate_options(opts, warn=False)

Validates and normalizes options passed in a MongoDB URI.

Returns a new dictionary of validated and normalized options. If warn is False then errors will be raised for invalid options; otherwise they will be ignored and a warning will be issued.

Parameters:
  • opts: A dict of MongoDB URI options.
  • warn (optional): If True then warnings will be logged and invalid options will be ignored. Otherwise invalid options will cause errors.
write_concern – Tools for specifying write concern

Tools for working with write concerns.

class pymongo.write_concern.WriteConcern(w=None, wtimeout=None, j=None, fsync=None)
Parameters:
  • w: (integer or string) Used with replication, write operations will block until they have been replicated to the specified number or tagged set of servers. w=<integer> always includes the replica set primary (e.g. w=3 means write to the primary and wait until replicated to two secondaries). w=0 disables acknowledgement of write operations and cannot be used with other write concern options.
  • wtimeout: (integer) Used in conjunction with w. Specify a value in milliseconds to control how long to wait for write propagation to complete. If replication does not complete in the given timeframe, a timeout exception is raised.
  • j: If True block until write operations have been committed to the journal. Cannot be used in combination with fsync. Prior to MongoDB 2.6 this option was ignored if the server was running without journaling. Starting with MongoDB 2.6 write operations will fail with an exception if this option is used when the server is running without journaling.
  • fsync: If True and the server is running without journaling, blocks until the server has synced all data files to disk. If the server is running with journaling, this acts the same as the j option, blocking until write operations have been committed to the journal. Cannot be used in combination with j.
acknowledged

If True write operations will wait for acknowledgement before returning.

document

The document representation of this write concern.

Note

WriteConcern is immutable. Mutating the value of document does not mutate this WriteConcern.

is_server_default

Does this WriteConcern match the server default.
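
As a minimal sketch, a WriteConcern can be constructed once and applied when obtaining a database or collection (the database and collection names here are placeholders):

from pymongo import MongoClient
from pymongo.write_concern import WriteConcern

client = MongoClient()
# Wait for acknowledgement from a majority of replica set members,
# allowing at most 5 seconds for replication.
wc = WriteConcern(w="majority", wtimeout=5000)
db = client.get_database("test", write_concern=wc)
db.things.insert_one({"x": 1})

wc.acknowledged  # True
wc.document      # {'w': 'majority', 'wtimeout': 5000}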

gridfs – Tools for working with GridFS

GridFS is a specification for storing large objects in Mongo.

The gridfs package is an implementation of GridFS on top of pymongo, exposing a file-like interface.

See also

The MongoDB documentation on

gridfs

class gridfs.GridFS(database, collection='fs', disable_md5=False)

Create a new instance of GridFS.

Raises TypeError if database is not an instance of Database.

Parameters:
  • database: database to use
  • collection (optional): root collection to use
  • disable_md5 (optional): When True, MD5 checksums will not be computed for uploaded files. Useful in environments where MD5 cannot be used for regulatory or other reasons. Defaults to False.

Changed in version 3.1: Indexes are only ensured on the first write to the DB.

Changed in version 3.0: database must use an acknowledged write_concern

See also

The MongoDB documentation on

gridfs
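
A minimal write/read/delete round trip might look like the following (the database name is a placeholder):

from pymongo import MongoClient
import gridfs

db = MongoClient().test
fs = gridfs.GridFS(db)

file_id = fs.put(b"hello world", filename="hello.txt")
assert fs.get(file_id).read() == b"hello world"
fs.delete(file_id)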

delete(file_id, session=None)

Delete a file from GridFS by "_id".

Deletes all data belonging to the file with "_id": file_id.

Warning

Any processes/threads reading from the file while this method is executing will likely see an invalid/corrupt file. Care should be taken to avoid concurrent reads to a file while it is being deleted.

Note

Deletes of non-existent files are considered successful since the end result is the same: no file with that _id remains.

Parameters:
  • file_id: "_id" of the file to delete
  • session (optional): a ClientSession

Changed in version 3.6: Added session parameter.

Changed in version 3.1: delete no longer ensures indexes.

exists(document_or_id=None, session=None, **kwargs)

Check if a file exists in this instance of GridFS.

The file to check for can be specified by the value of its _id key, or by passing in a query document. A query document can be passed in as dictionary, or by using keyword arguments. Thus, the following three calls are equivalent:

>>> fs.exists(file_id)
>>> fs.exists({"_id": file_id})
>>> fs.exists(_id=file_id)

As are the following two calls:

>>> fs.exists({"filename": "mike.txt"})
>>> fs.exists(filename="mike.txt")

And the following two:

>>> fs.exists({"foo": {"$gt": 12}})
>>> fs.exists(foo={"$gt": 12})

Returns True if a matching file exists, False otherwise. Calls to exists() will not automatically create appropriate indexes; application developers should be sure to create indexes if needed and as appropriate.

Parameters:
  • document_or_id (optional): query document, or _id of the document to check for
  • session (optional): a ClientSession
  • **kwargs (optional): keyword arguments are used as a query document, if they’re present.

Changed in version 3.6: Added session parameter.

find(*args, **kwargs)

Query GridFS for files.

Returns a cursor that iterates across files matching arbitrary queries on the files collection. Can be combined with other modifiers for additional control. For example:

for grid_out in fs.find({"filename": "lisa.txt"},
                        no_cursor_timeout=True):
    data = grid_out.read()

would iterate through all versions of “lisa.txt” stored in GridFS. Note that setting no_cursor_timeout to True may be important to prevent the cursor from timing out during long multi-file processing work.

As another example, the call:

most_recent_three = fs.find().sort("uploadDate", -1).limit(3)

would return a cursor to the three most recently uploaded files in GridFS.

Follows a similar interface to find() in Collection.

If a ClientSession is passed to find(), all returned GridOut instances are associated with that session.

Parameters:
  • filter (optional): a SON object specifying elements which must be present for a document to be included in the result set
  • skip (optional): the number of files to omit (from the start of the result set) when returning the results
  • limit (optional): the maximum number of results to return
  • no_cursor_timeout (optional): if False (the default), any returned cursor is closed by the server after 10 minutes of inactivity. If set to True, the returned cursor will never time out on the server. Care should be taken to ensure that cursors with no_cursor_timeout turned on are properly closed.
  • sort (optional): a list of (key, direction) pairs specifying the sort order for this query. See sort() for details.

Raises TypeError if any of the arguments are of improper type. Returns an instance of GridOutCursor corresponding to this query.

Changed in version 3.0: Removed the read_preference, tag_sets, and secondary_acceptable_latency_ms options.

New in version 2.7.

See also

The MongoDB documentation on

find

find_one(filter=None, session=None, *args, **kwargs)

Get a single file from gridfs.

All arguments to find() are also valid arguments for find_one(), although any limit argument will be ignored. Returns a single GridOut, or None if no matching file is found. For example:

file = fs.find_one({"filename": "lisa.txt"})
Parameters:
  • filter (optional): a dictionary specifying the query to be performed, OR any other type to be used as the value for a query for "_id" in the files collection.
  • *args (optional): any additional positional arguments are the same as the arguments to find().
  • session (optional): a ClientSession
  • **kwargs (optional): any additional keyword arguments are the same as the arguments to find().

Changed in version 3.6: Added session parameter.

get(file_id, session=None)

Get a file from GridFS by "_id".

Returns an instance of GridOut, which provides a file-like interface for reading.

Parameters:
  • file_id: "_id" of the file to get
  • session (optional): a ClientSession

Changed in version 3.6: Added session parameter.

get_last_version(filename=None, session=None, **kwargs)

Get the most recent version of a file in GridFS by "filename" or metadata fields.

Equivalent to calling get_version() with the default version (-1).

Parameters:
  • filename: "filename" of the file to get, or None
  • session (optional): a ClientSession
  • **kwargs (optional): find files by custom metadata.

Changed in version 3.6: Added session parameter.

get_version(filename=None, version=-1, session=None, **kwargs)

Get a file from GridFS by "filename" or metadata fields.

Returns a version of the file in GridFS whose filename matches filename and whose metadata fields match the supplied keyword arguments, as an instance of GridOut.

Version numbering is a convenience atop the GridFS API provided by MongoDB. If more than one file matches the query (either by filename alone, by metadata fields, or by a combination of both), then version -1 will be the most recently uploaded matching file, -2 the second most recently uploaded, etc. Version 0 will be the first version uploaded, 1 the second version, etc. So if three versions have been uploaded, then version 0 is the same as version -3, version 1 is the same as version -2, and version 2 is the same as version -1.

Raises NoFile if no such version of that file exists.

Parameters:
  • filename: "filename" of the file to get, or None
  • version (optional): version of the file to get (defaults to -1, the most recent version uploaded)
  • session (optional): a ClientSession
  • **kwargs (optional): find files by custom metadata.

Changed in version 3.6: Added session parameter.

Changed in version 3.1: get_version no longer ensures indexes.
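
For example, assuming fs is a GridFS instance, the version numbering works as follows:

# Upload three versions of the same filename.
for n in range(3):
    fs.put(("version %d" % n).encode("utf-8"), filename="log.txt")

fs.get_version("log.txt", 0).read()    # b'version 0' (the oldest)
fs.get_version("log.txt", -1).read()   # b'version 2' (the newest)
fs.get_last_version("log.txt").read()  # also b'version 2'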

list(session=None)

List the names of all files stored in this instance of GridFS.

Parameters:
  • session (optional): a ClientSession

Changed in version 3.6: Added session parameter.

Changed in version 3.1: list no longer ensures indexes.

new_file(**kwargs)

Create a new file in GridFS.

Returns a new GridIn instance to which data can be written. Any keyword arguments will be passed through to GridIn().

If the "_id" of the file is manually specified, it must not already exist in GridFS. Otherwise FileExists is raised.

Parameters:
  • **kwargs (optional): keyword arguments for file creation
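
For example, assuming fs is a GridFS instance:

f = fs.new_file(filename="notes.txt", contentType="text/plain")
try:
    f.write(b"first line\n")
    f.write(b"second line\n")
finally:
    f.close()  # the file is committed to GridFS on close
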
put(data, **kwargs)

Put data in GridFS as a new file.

Equivalent to doing:

f = new_file(**kwargs)
try:
    f.write(data)
finally:
    f.close()

data can be either an instance of str (bytes in python 3) or a file-like object providing a read() method. If an encoding keyword argument is passed, data can also be a unicode (str in python 3) instance, which will be encoded as encoding before being written. Any keyword arguments will be passed through to the created file - see GridIn() for possible arguments. Returns the "_id" of the created file.

If the "_id" of the file is manually specified, it must not already exist in GridFS. Otherwise FileExists is raised.

Parameters:
  • data: data to be written as a file.
  • **kwargs (optional): keyword arguments for file creation

Changed in version 3.0: w=0 writes to GridFS are now prohibited.

class gridfs.GridFSBucket(db, bucket_name='fs', chunk_size_bytes=261120, write_concern=None, read_preference=None, disable_md5=False)

Create a new instance of GridFSBucket.

Raises TypeError if db is not an instance of Database.

Raises ConfigurationError if write_concern is not acknowledged.

Parameters:
  • db: database to use.
  • bucket_name (optional): The name of the bucket. Defaults to ‘fs’.
  • chunk_size_bytes (optional): The chunk size in bytes. Defaults to 255KB.
  • write_concern (optional): The WriteConcern to use. If None (the default) db.write_concern is used.
  • read_preference (optional): The read preference to use. If None (the default) db.read_preference is used.
  • disable_md5 (optional): When True, MD5 checksums will not be computed for uploaded files. Useful in environments where MD5 cannot be used for regulatory or other reasons. Defaults to False.

New in version 3.1.

See also

The MongoDB documentation on

gridfs

delete(file_id, session=None)

Given a file_id, delete this stored file’s files collection document and associated chunks from a GridFS bucket.

For example:

from pymongo import MongoClient
from gridfs import GridFSBucket

my_db = MongoClient().test
fs = GridFSBucket(my_db)
# Get _id of file to delete
file_id = fs.upload_from_stream("test_file", b"data I want to store!")
fs.delete(file_id)

Raises NoFile if no file with file_id exists.

Parameters:
  • file_id: The _id of the file to be deleted.
  • session (optional): a ClientSession

Changed in version 3.6: Added session parameter.

download_to_stream(file_id, destination, session=None)

Downloads the contents of the stored file specified by file_id and writes the contents to destination.

For example:

my_db = MongoClient().test
fs = GridFSBucket(my_db)
# Get _id of file to read
file_id = fs.upload_from_stream("test_file", b"data I want to store!")
# Get file to write to
file = open('myfile','wb+')
fs.download_to_stream(file_id, file)
file.seek(0)
contents = file.read()

Raises NoFile if no file with file_id exists.

Parameters:
  • file_id: The _id of the file to be downloaded.
  • destination: a file-like object implementing write().
  • session (optional): a ClientSession

Changed in version 3.6: Added session parameter.

download_to_stream_by_name(filename, destination, revision=-1, session=None)

Write the contents of filename (with optional revision) to destination.

For example:

my_db = MongoClient().test
fs = GridFSBucket(my_db)
# Get file to write to
file = open('myfile','wb')
fs.download_to_stream_by_name("test_file", file)

Raises NoFile if no such version of that file exists.

Raises ValueError if filename is not a string.

Parameters:
  • filename: The name of the file to read from.
  • destination: A file-like object that implements write().
  • revision (optional): Which revision (documents with the same filename and different uploadDate) of the file to retrieve. Defaults to -1 (the most recent revision).
  • session (optional): a ClientSession
Note:

Revision numbers are defined as follows:

  • 0 = the original stored file
  • 1 = the first revision
  • 2 = the second revision
  • etc…
  • -2 = the second most recent revision
  • -1 = the most recent revision

Changed in version 3.6: Added session parameter.

find(*args, **kwargs)

Find and return the files collection documents that match filter

Returns a cursor that iterates across files matching arbitrary queries on the files collection. Can be combined with other modifiers for additional control.

For example:

for grid_data in fs.find({"filename": "lisa.txt"},
                         no_cursor_timeout=True):
    data = grid_data.read()

would iterate through all versions of “lisa.txt” stored in GridFS. Note that setting no_cursor_timeout to True may be important to prevent the cursor from timing out during long multi-file processing work.

As another example, the call:

most_recent_three = fs.find().sort("uploadDate", -1).limit(3)

would return a cursor to the three most recently uploaded files in GridFS.

Follows a similar interface to find() in Collection.

If a ClientSession is passed to find(), all returned GridOut instances are associated with that session.

Parameters:
  • filter: Search query.
  • batch_size (optional): The number of documents to return per batch.
  • limit (optional): The maximum number of documents to return.
  • no_cursor_timeout (optional): The server normally times out idle cursors after an inactivity period (10 minutes) to prevent excess memory use. Set this option to True to prevent that.
  • skip (optional): The number of documents to skip before returning.
  • sort (optional): The order by which to sort results. Defaults to None.
open_download_stream(file_id, session=None)

Opens a Stream from which the application can read the contents of the stored file specified by file_id.

For example:

my_db = MongoClient().test
fs = GridFSBucket(my_db)
# get _id of file to read.
file_id = fs.upload_from_stream("test_file", b"data I want to store!")
grid_out = fs.open_download_stream(file_id)
contents = grid_out.read()

Returns an instance of GridOut.

Raises NoFile if no file with file_id exists.

Parameters:
  • file_id: The _id of the file to be downloaded.
  • session (optional): a ClientSession

Changed in version 3.6: Added session parameter.

open_download_stream_by_name(filename, revision=-1, session=None)

Opens a Stream from which the application can read the contents of filename and optional revision.

For example:

my_db = MongoClient().test
fs = GridFSBucket(my_db)
grid_out = fs.open_download_stream_by_name("test_file")
contents = grid_out.read()

Returns an instance of GridOut.

Raises NoFile if no such version of that file exists.

Raises ValueError if filename is not a string.

Parameters:
  • filename: The name of the file to read from.
  • revision (optional): Which revision (documents with the same filename and different uploadDate) of the file to retrieve. Defaults to -1 (the most recent revision).
  • session (optional): a ClientSession
Note:

Revision numbers are defined as follows:

  • 0 = the original stored file
  • 1 = the first revision
  • 2 = the second revision
  • etc…
  • -2 = the second most recent revision
  • -1 = the most recent revision

Changed in version 3.6: Added session parameter.

open_upload_stream(filename, chunk_size_bytes=None, metadata=None, session=None)

Opens a Stream that the application can write the contents of the file to.

The user must specify the filename, and can choose to add any additional information in the metadata field of the file document or modify the chunk size. For example:

my_db = MongoClient().test
fs = GridFSBucket(my_db)
grid_in = fs.open_upload_stream(
      "test_file", chunk_size_bytes=4,
      metadata={"contentType": "text/plain"})
grid_in.write(b"data I want to store!")
grid_in.close()  # uploaded on close

Returns an instance of GridIn.

Raises ValueError if filename is not a string.

Parameters:
  • filename: The name of the file to upload.
  • chunk_size_bytes (optional): The number of bytes per chunk of this file. Defaults to the chunk_size_bytes in GridFSBucket.
  • metadata (optional): User data for the ‘metadata’ field of the files collection document. If not provided the metadata field will be omitted from the files collection document.
  • session (optional): a ClientSession

Changed in version 3.6: Added session parameter.

open_upload_stream_with_id(file_id, filename, chunk_size_bytes=None, metadata=None, session=None)

Opens a Stream that the application can write the contents of the file to.

The user must specify the file id and filename, and can choose to add any additional information in the metadata field of the file document or modify the chunk size. For example:

from bson.objectid import ObjectId

my_db = MongoClient().test
fs = GridFSBucket(my_db)
grid_in = fs.open_upload_stream_with_id(
      ObjectId(),
      "test_file",
      chunk_size_bytes=4,
      metadata={"contentType": "text/plain"})
grid_in.write(b"data I want to store!")
grid_in.close()  # uploaded on close

Returns an instance of GridIn.

Raises ValueError if filename is not a string.

Parameters:
  • file_id: The id to use for this file. The id must not have already been used for another file.
  • filename: The name of the file to upload.
  • chunk_size_bytes (optional): The number of bytes per chunk of this file. Defaults to the chunk_size_bytes in GridFSBucket.
  • metadata (optional): User data for the ‘metadata’ field of the files collection document. If not provided the metadata field will be omitted from the files collection document.
  • session (optional): a ClientSession

Changed in version 3.6: Added session parameter.

rename(file_id, new_filename, session=None)

Renames the stored file with the specified file_id.

For example:

my_db = MongoClient().test
fs = GridFSBucket(my_db)
# Get _id of file to rename
file_id = fs.upload_from_stream("test_file", b"data I want to store!")
fs.rename(file_id, "new_test_name")

Raises NoFile if no file with file_id exists.

Parameters:
  • file_id: The _id of the file to be renamed.
  • new_filename: The new name of the file.
  • session (optional): a ClientSession

Changed in version 3.6: Added session parameter.

upload_from_stream(filename, source, chunk_size_bytes=None, metadata=None, session=None)

Uploads a user file to a GridFS bucket.

Reads the contents of the user file from source and uploads it to the file filename. Source can be a string or file-like object. For example:

my_db = MongoClient().test
fs = GridFSBucket(my_db)
file_id = fs.upload_from_stream(
    "test_file",
    "data I want to store!",
    chunk_size_bytes=4,
    metadata={"contentType": "text/plain"})

Returns the _id of the uploaded file.

Raises ValueError if filename is not a string.

Parameters:
  • filename: The name of the file to upload.
  • source: The source stream of the content to be uploaded. Must be a file-like object that implements read() or a string.
  • chunk_size_bytes (optional): The number of bytes per chunk of this file. Defaults to the chunk_size_bytes of GridFSBucket.
  • metadata (optional): User data for the ‘metadata’ field of the files collection document. If not provided the metadata field will be omitted from the files collection document.
  • session (optional): a ClientSession

Changed in version 3.6: Added session parameter.

upload_from_stream_with_id(file_id, filename, source, chunk_size_bytes=None, metadata=None, session=None)

Uploads a user file to a GridFS bucket with a custom file id.

Reads the contents of the user file from source and uploads it to the file filename. Source can be a string or file-like object. For example:

my_db = MongoClient().test
fs = GridFSBucket(my_db)
file_id = ObjectId()
fs.upload_from_stream_with_id(
    file_id,
    "test_file",
    b"data I want to store!",
    chunk_size_bytes=4,
    metadata={"contentType": "text/plain"})

Raises ValueError if filename is not a string.

Parameters:
  • file_id: The id to use for this file. The id must not have already been used for another file.
  • filename: The name of the file to upload.
  • source: The source stream of the content to be uploaded. Must be a file-like object that implements read() or a string.
  • chunk_size_bytes (optional): The number of bytes per chunk of this file. Defaults to the chunk_size_bytes of GridFSBucket.
  • metadata (optional): User data for the ‘metadata’ field of the files collection document. If not provided the metadata field will be omitted from the files collection document.
  • session (optional): a ClientSession

Changed in version 3.6: Added session parameter.

Sub-modules:

errors – Exceptions raised by the gridfs package

Exceptions raised by the gridfs package

exception gridfs.errors.CorruptGridFile(message='', error_labels=None)

Raised when a file in GridFS is malformed.

exception gridfs.errors.FileExists(message='', error_labels=None)

Raised when trying to create a file that already exists.

exception gridfs.errors.GridFSError(message='', error_labels=None)

Base class for all GridFS exceptions.

exception gridfs.errors.NoFile(message='', error_labels=None)

Raised when trying to read from a non-existent file.

grid_file – Tools for representing files stored in GridFS

Tools for representing files stored in GridFS.

class gridfs.grid_file.GridIn(root_collection, session=None, disable_md5=False, **kwargs)

Write a file to GridFS

Application developers should generally not need to instantiate this class directly - instead see the methods provided by GridFS.

Raises TypeError if root_collection is not an instance of Collection.

Any of the file level options specified in the GridFS Spec may be passed as keyword arguments. Any additional keyword arguments will be set as additional fields on the file document. Valid keyword arguments include:

  • "_id": unique ID for this file (default: ObjectId) - this "_id" must not have already been used for another file
  • "filename": human name for the file
  • "contentType" or "content_type": valid mime-type for the file
  • "chunkSize" or "chunk_size": size of each of the chunks, in bytes (default: 255 kb)
  • "encoding": encoding used for this file. In Python 2, any unicode that is written to the file will be converted to a str. In Python 3, any str that is written to the file will be converted to bytes.
Parameters:
  • root_collection: root collection to write to
  • session (optional): a ClientSession to use for all commands
  • disable_md5 (optional): When True, an MD5 checksum will not be computed for the uploaded file. Useful in environments where MD5 cannot be used for regulatory or other reasons. Defaults to False.
  • **kwargs (optional): file level options (see above)

Changed in version 3.6: Added session parameter.

Changed in version 3.0: root_collection must use an acknowledged write_concern

_id

The '_id' value for this file.

This attribute is read-only.

abort()

Remove all chunks/files that may have been uploaded and close.

chunk_size

Chunk size for this file.

This attribute is read-only.

close()

Flush the file and close it.

A closed file cannot be written any more. Calling close() more than once is allowed.

closed

Is this file closed?

content_type

Mime-type for this file.

filename

Name of this file.

length

Length (in bytes) of this file.

This attribute is read-only and can only be read after close() has been called.

md5

MD5 of the contents of this file if an md5 sum was created.

This attribute is read-only and can only be read after close() has been called.

name

Alias for filename.

upload_date

Date that this file was uploaded.

This attribute is read-only and can only be read after close() has been called.

write(data)

Write data to the file. There is no return value.

data can be either a string of bytes or a file-like object (implementing read()). If the file has an encoding attribute, data can also be a unicode (str in python 3) instance, which will be encoded as encoding before being written.

Due to buffering, the data may not actually be written to the database until the close() method is called. Raises ValueError if this file is already closed. Raises TypeError if data is not an instance of str (bytes in python 3), a file-like object, or an instance of unicode (str in python 3). Unicode data is only allowed if the file has an encoding attribute.

Parameters:
  • data: string of bytes or file-like object to be written to the file
writelines(sequence)

Write a sequence of strings to the file.

Does not add separators.

class gridfs.grid_file.GridOut(root_collection, file_id=None, file_document=None, session=None)

Read a file from GridFS

Application developers should generally not need to instantiate this class directly - instead see the methods provided by GridFS.

Either file_id or file_document must be specified; file_document will be given priority if present. Raises TypeError if root_collection is not an instance of Collection.

Parameters:
  • root_collection: root collection to read from
  • file_id (optional): value of "_id" for the file to read
  • file_document (optional): file document from root_collection.files
  • session (optional): a ClientSession to use for all commands

Changed in version 3.8: For better performance and to better follow the GridFS spec, GridOut now uses a single cursor to read all the chunks in the file.

Changed in version 3.6: Added session parameter.

Changed in version 3.0: Creating a GridOut does not immediately retrieve the file metadata from the server. Metadata is fetched when first needed.

_id

The '_id' value for this file.

This attribute is read-only.

__iter__()

Return an iterator over all of this file’s data.

The iterator will return chunk-sized instances of str (bytes in python 3). This can be useful when serving files using a webserver that handles such an iterator efficiently.

Note

This is different from io.IOBase which iterates over lines in the file. Use GridOut.readline() to read line by line instead of chunk by chunk.

Changed in version 3.8: The iterator now raises CorruptGridFile when encountering any truncated, missing, or extra chunk in a file. The previous behavior was to only raise CorruptGridFile on a missing chunk.
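
For example, this sketch streams a stored file chunk by chunk into an in-memory buffer (fs and file_id are assumed from the earlier GridFS examples):

import io

destination = io.BytesIO()
grid_out = fs.get(file_id)
for chunk in grid_out:
    # each iteration yields one chunk-sized bytes object
    destination.write(chunk)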

aliases

List of aliases for this file.

This attribute is read-only.

chunk_size

Chunk size for this file.

This attribute is read-only.

close()

Make GridOut more generically file-like.

content_type

Mime-type for this file.

This attribute is read-only.

filename

Name of this file.

This attribute is read-only.

length

Length (in bytes) of this file.

This attribute is read-only.

md5

MD5 of the contents of this file if an md5 sum was created.

This attribute is read-only.

metadata

Metadata attached to this file.

This attribute is read-only.

name

Alias for filename.

This attribute is read-only.

read(size=-1)

Read at most size bytes from the file (less if there isn’t enough data).

The bytes are returned as an instance of str (bytes in python 3). If size is negative or omitted all data is read.

Parameters:
  • size (optional): the number of bytes to read

Changed in version 3.8: This method now only checks for extra chunks after reading the entire file. Previously, this method would check for extra chunks on every call.

readchunk()

Reads a chunk at a time. If the current position is within a chunk the remainder of the chunk is returned.

readline(size=-1)

Read one line or up to size bytes from the file.

Parameters:
  • size (optional): the maximum number of bytes to read
seek(pos, whence=0)

Set the current position of this file.

Parameters:
  • pos: the position (or offset if using relative positioning) to seek to
  • whence (optional): where to seek from. os.SEEK_SET (0) for absolute file positioning, os.SEEK_CUR (1) to seek relative to the current position, os.SEEK_END (2) to seek relative to the file’s end.
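
For example, a sketch that uses seek() and tell() to find the file’s length and reread its tail (fs and file_id are assumed from earlier examples, and the file is assumed to be at least 10 bytes long):

import os

grid_out = fs.get(file_id)
grid_out.seek(0, os.SEEK_END)    # jump to the end of the file
size = grid_out.tell()           # position now equals the file length
grid_out.seek(-10, os.SEEK_END)  # back up 10 bytes from the end
tail = grid_out.read()
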
tell()

Return the current position of this file.

upload_date

Date that this file was first uploaded.

This attribute is read-only.

class gridfs.grid_file.GridOutCursor(collection, filter=None, skip=0, limit=0, no_cursor_timeout=False, sort=None, batch_size=0, session=None)

Create a new cursor, similar to the normal Cursor.

Should not be called directly by application developers - see the GridFS method find() instead.

See also

The MongoDB documentation on

cursors

add_option(*args, **kwargs)

Set arbitrary query flags using a bitmask.

To set the tailable flag: cursor.add_option(2)

next()

Get next GridOut object from cursor.

remove_option(*args, **kwargs)

Unset arbitrary query flags using a bitmask.

To unset the tailable flag: cursor.remove_option(2)

Tools

Many tools have been written for working with PyMongo. If you know of or have created a tool for working with MongoDB from Python please list it here.

Note

We try to keep this list current. As such, projects that have not been updated recently or appear to be unmaintained will occasionally be removed from the list or moved to the back (to keep the list from becoming too intimidating).

If a project gets removed that is still being developed or is in active use please let us know or add it back.

ORM-like Layers

Some people have found that they prefer to work with a layer that has more features than PyMongo provides. Often, things like models and validation are desired. To that end, several different ORM-like layers have been written by various authors.

It is our recommendation that new users begin by working directly with PyMongo, as described in the rest of this documentation. Many people have found that the features of PyMongo are enough for their needs. Even if you eventually come to the decision to use one of these layers, the time spent working directly with the driver will have increased your understanding of how MongoDB actually works.

PyMODM
PyMODM is an ORM-like framework on top of PyMongo. PyMODM is maintained by engineers at MongoDB, Inc. and is quick to adopt new MongoDB features. PyMODM is a “core” ODM, meaning that it provides simple, extensible functionality that can be leveraged by other libraries to target platforms like Django. At the same time, PyMODM is powerful enough to be used for developing applications on its own. Complete documentation is available on readthedocs in addition to a Gitter channel for discussing the project.
Humongolus
Humongolus is a lightweight ORM framework for Python and MongoDB. The name comes from the combination of MongoDB and Homunculus (the concept of a miniature though fully formed human body). Humongolus allows you to create models/schemas with robust validation. It attempts to be as pythonic as possible and exposes the pymongo cursor objects whenever possible. The code is available for download at GitHub. Tutorials and usage examples are also available at GitHub.
Ming
Ming (the Merciless) is a library that allows you to enforce schemas on a MongoDB database in your Python application. It was developed by SourceForge in the course of their migration to MongoDB. See the introductory blog post for more details.
MongoEngine
MongoEngine is another ORM-like layer on top of PyMongo. It allows you to define schemas for documents and query collections using syntax inspired by the Django ORM. The code is available on GitHub; for more information, see the tutorial.
MotorEngine
MotorEngine is a port of MongoEngine to Motor, for asynchronous access with Tornado. It implements the same modeling APIs to be data-portable, meaning that a model defined in MongoEngine can be read in MotorEngine. The source is available on GitHub.
uMongo
uMongo is a Python MongoDB ODM. Its inception comes from two needs: the lack of an async ODM and the difficulty of doing document (un)serialization with existing ODMs. It works with multiple drivers: PyMongo, TxMongo, motor_asyncio, and mongomock. The source is available on GitHub.
No longer maintained
MongoKit
The MongoKit framework is an ORM-like layer on top of PyMongo. There is also a MongoKit google group.
MongoAlchemy
MongoAlchemy is another ORM-like layer on top of PyMongo. Its API is inspired by SQLAlchemy. The code is available on GitHub; for more information, see the tutorial.
Minimongo
minimongo is a lightweight, pythonic interface to MongoDB. It retains pymongo’s query and update API, and provides a number of additional features, including a simple document-oriented interface, connection pooling, index management, and collection & database naming helpers. The source is on GitHub.
Manga
Manga aims to be a simpler ORM-like layer on top of PyMongo. The syntax for defining schema is inspired by the Django ORM, but Pymongo’s query language is maintained. The source is on GitHub.

Framework Tools

This section lists tools and adapters that have been designed to work with various Python frameworks and libraries.

Alternative Drivers

These are alternatives to PyMongo.

  • Motor is a full-featured, non-blocking MongoDB driver for Python Tornado applications.
  • TxMongo is an asynchronous Twisted Python driver for MongoDB.
  • MongoMock is a small library to help testing Python code that interacts with MongoDB via Pymongo.

Contributors

The following is a list of people who have contributed to PyMongo. If you belong here and are missing please let us know (or send a pull request after adding yourself to the list):

  • Mike Dirolf (mdirolf)
  • Jeff Jenkins (jeffjenkins)
  • Jim Jones
  • Eliot Horowitz (erh)
  • Michael Stephens (mikejs)
  • Joakim Sernbrant (serbaut)
  • Alexander Artemenko (svetlyak40wt)
  • Mathias Stearn (RedBeard0531)
  • Fajran Iman Rusadi (fajran)
  • Brad Clements (bkc)
  • Andrey Fedorov (andreyf)
  • Joshua Roesslein (joshthecoder)
  • Gregg Lind (gregglind)
  • Michael Schurter (schmichael)
  • Daniel Lundin
  • Michael Richardson (mtrichardson)
  • Dan McKinley (mcfunley)
  • David Wolever (wolever)
  • Carlos Valiente (carletes)
  • Jehiah Czebotar (jehiah)
  • Drew Perttula (drewp)
  • Carl Baatz (c-w-b)
  • Johan Bergstrom (jbergstroem)
  • Jonas Haag (jonashaag)
  • Kristina Chodorow (kchodorow)
  • Andrew Sibley (sibsibsib)
  • Flavio Percoco Premoli (FlaPer87)
  • Ken Kurzweil (kurzweil)
  • Christian Wyglendowski (dowski)
  • James Murty (jmurty)
  • Brendan W. McAdams (bwmcadams)
  • Bernie Hackett (behackett)
  • Reed O’Brien (reedobrien)
  • Francisco Souza (fsouza)
  • Alexey I. Froloff (raorn)
  • Steve Lacy (slacy)
  • Richard Shea (shearic)
  • Vladimir Sidorenko (gearheart)
  • Aaron Westendorf (awestendorf)
  • Dan Crosta (dcrosta)
  • Ryan Smith-Roberts (rmsr)
  • David Pisoni (gefilte)
  • Abhay Vardhan (abhayv)
  • Alexey Borzenkov (snaury)
  • Kostya Rybnikov (k-bx)
  • A Jesse Jiryu Davis (ajdavis)
  • Samuel Clay (samuelclay)
  • Ross Lawley (rozza)
  • Wouter Bolsterlee (wbolster)
  • Alex Grönholm (agronholm)
  • Christoph Simon (kalanzun)
  • Chris Tompkinson (tompko)
  • Mike O’Brien (mpobrien)
  • T Dampier (dampier)
  • Michael Henson (hensom)
  • Craig Hobbs (craigahobbs)
  • Emily Stolfo (estolfo)
  • Sam Helman (shelman)
  • Justin Patrin (reversefold)
  • Xiuming Chen (cxmcc)
  • Tyler Jones (thomascirca)
  • Amalia Hawkins (hawka)
  • Yuchen Ying (yegle)
  • Kyle Erf (3rf)
  • Luke Lovett (lovett89)
  • Jaroslav Semančík (girogiro)
  • Don Mitchell (dmitchell)
  • Ximing (armnotstrong)
  • Can Zhang (cannium)
  • Sergey Azovskov (last-g)
  • Heewa Barfchin (heewa)
  • Anna Herlihy (aherlihy)
  • Len Buckens (buckensl)
  • ultrabug
  • Shane Harvey (ShaneHarvey)
  • Cao Siyang (caosiyang)
  • Zhecong Kwok (gzcf)
  • TaoBeier (tao12345666333)
  • Jagrut Trivedi (Jagrut)
  • Shrey Batra (shreybatra)
  • Felipe Rodrigues (fbidu)
  • Terence Honles (terencehonles)

Changelog

Changes in Version 3.9.0

Version 3.9 adds support for MongoDB 4.2. Highlights include:

  • Support for MongoDB 4.2 sharded transactions. Sharded transactions have the same API as replica set transactions. See Transactions.

  • New method pymongo.client_session.ClientSession.with_transaction() to support conveniently running a transaction in a session with automatic retries and at-most-once semantics.

  • Initial support for client side field level encryption. See the docstring for MongoClient, AutoEncryptionOpts, and encryption for details. Note: Support for client side encryption is in beta. Backwards-breaking changes may be made before the final release.

  • Added the max_commit_time_ms parameter to start_transaction().

  • Implement the URI options specification in the MongoClient() constructor. Consequently, there are a number of changes in connection options:

    • The tlsInsecure option has been added.
    • The tls option has been added. The older ssl option has been retained as an alias to the new tls option.
    • wTimeout has been deprecated in favor of wTimeoutMS.
    • wTimeoutMS now overrides wTimeout if the user provides both.
    • j has been deprecated in favor of journal.
    • journal now overrides j if the user provides both.
    • ssl_cert_reqs has been deprecated in favor of tlsAllowInvalidCertificates. Instead of ssl.CERT_NONE, ssl.CERT_OPTIONAL and ssl.CERT_REQUIRED, the new option expects a boolean value - True is equivalent to ssl.CERT_NONE, while False is equivalent to ssl.CERT_REQUIRED.
    • ssl_match_hostname has been deprecated in favor of tlsAllowInvalidHostnames.
    • ssl_ca_certs has been deprecated in favor of tlsCAFile.
    • ssl_certfile has been deprecated in favor of tlsCertificateKeyFile.
    • ssl_pem_passphrase has been deprecated in favor of tlsCertificateKeyFilePassword.
    • waitQueueMultiple has been deprecated without replacement. This option was a poor solution for putting an upper bound on queuing since it didn’t affect queuing in other parts of the driver.
  • The retryWrites URI option now defaults to True. Supported write operations that fail with a retryable error will automatically be retried one time, with at-most-once semantics.

  • Support for retryable reads and the retryReads URI option which is enabled by default. See the MongoClient documentation for details. Now that supported operations are retried automatically and transparently, users should consider adjusting any custom retry logic to prevent an application from inadvertently retrying for too long.

  • Support zstandard for wire protocol compression.

  • Support for periodically polling DNS SRV records to update the mongos proxy list without having to change client configuration.

  • New method pymongo.database.Database.aggregate() to support running database level aggregations.

  • Support for publishing Connection Monitoring and Pooling events via the new ConnectionPoolListener class. See monitoring for an example.

  • pymongo.collection.Collection.aggregate() and pymongo.database.Database.aggregate() now support the $merge pipeline stage and use read preference PRIMARY if the $out or $merge pipeline stages are used.

  • Support for specifying a pipeline or document in update_one(), update_many(), find_one_and_update(), UpdateOne(), and UpdateMany().

  • New BSON utility functions encode() and decode()

  • Binary now supports any bytes-like type that implements the buffer protocol.

  • Resume tokens can now be accessed from a ChangeStream cursor using the resume_token attribute.

  • Connections now survive primary step-down when using MongoDB 4.2+. Applications should expect less socket connection turnover during replica set elections.

Unavoidable breaking changes:

  • Applications that use MongoDB with the MMAPv1 storage engine must now explicitly disable retryable writes via the connection string (e.g. MongoClient("mongodb://my.mongodb.cluster/db?retryWrites=false")) or the MongoClient constructor’s keyword argument (e.g. MongoClient("mongodb://my.mongodb.cluster/db", retryWrites=False)) to avoid running into OperationFailure exceptions during write operations. The MMAPv1 storage engine is deprecated and does not support retryable writes which are now turned on by default.
  • In order to ensure that the connectTimeoutMS URI option is honored when connecting to clusters with a mongodb+srv:// connection string, the minimum required version of the optional dnspython dependency has been bumped to 1.16.0. This is a breaking change for applications that use PyMongo’s SRV support with a version of dnspython older than 1.16.0.
Issues Resolved

See the PyMongo 3.9 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 3.8.0

Warning

PyMongo no longer supports Python 2.6. RHEL 6 users should install Python 2.7 or newer from Red Hat Software Collections. CentOS 6 users should install Python 2.7 or newer from SCL.

Warning

PyMongo no longer supports PyPy3 versions older than 3.5. Users must upgrade to PyPy3.5+.

  • ObjectId now implements the ObjectID specification version 0.2.
  • For better performance and to better follow the GridFS spec, GridOut now uses a single cursor to read all the chunks in the file. Previously, each chunk in the file was queried individually using find_one().
  • gridfs.grid_file.GridOut.read() now only checks for extra chunks after reading the entire file. Previously, this method would check for extra chunks on every call.
  • current_op() now always uses the Database’s codec_options when decoding the command response. Previously the codec_options was only used when the MongoDB server version was <= 3.0.
  • Undeprecated get_default_database() and added the default parameter.
  • TLS Renegotiation is now disabled when possible.
  • Custom types can now be directly encoded to, and decoded from MongoDB using the TypeCodec and TypeRegistry APIs. For more information, see the custom type example.
  • Attempting a multi-document transaction on a sharded cluster now raises a ConfigurationError.
  • pymongo.cursor.Cursor.distinct() and pymongo.cursor.Cursor.count() now send the Cursor’s comment() as the “comment” top-level command option instead of “$comment”. Also, note that “comment” must be a string.
  • Add the filter parameter to list_collection_names().
  • Changes can now be requested from a ChangeStream cursor without blocking indefinitely using the new pymongo.change_stream.ChangeStream.try_next() method.
  • Fixed a reference leak bug when splitting a batched write command based on maxWriteBatchSize or the max message size.
  • Deprecated running find queries that set min() and/or max() but do not also set a hint() of which index to use. The find command is expected to require a hint() when using min/max starting in MongoDB 4.2.
  • Documented support for the uuidRepresentation URI option, which has been supported since PyMongo 2.7. Valid values are pythonLegacy (the default), javaLegacy, csharpLegacy and standard. New applications should consider setting this to standard for cross language compatibility.
  • RawBSONDocument now validates that the bson_bytes passed in represent a single bson document. Earlier versions would mistakenly accept multiple bson documents.
  • Iterating over a RawBSONDocument now maintains the same field order of the underlying raw BSON document.
  • Applications can now register a custom server selector. For more information see the server selector example.
  • The connection pool now implements a LIFO policy.

Unavoidable breaking changes:

  • In order to follow the ObjectID Spec version 0.2, an ObjectId’s 3-byte machine identifier and 2-byte process id have been replaced with a single 5-byte random value generated per process. This is a breaking change for any application that attempts to interpret those bytes.
Issues Resolved

See the PyMongo 3.8 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 3.7.2

Version 3.7.2 fixes a few issues discovered since the release of 3.7.1.

Issues Resolved

See the PyMongo 3.7.2 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 3.7.1

Version 3.7.1 fixes a few issues discovered since the release of 3.7.0.

  • Calling authenticate() more than once with the same credentials results in OperationFailure.
  • Authentication fails when SCRAM-SHA-1 is used to authenticate users with only MONGODB-CR credentials.
  • A millisecond rounding problem when decoding datetimes in the pure Python BSON decoder on 32 bit systems and AWS lambda.
Issues Resolved

See the PyMongo 3.7.1 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 3.7.0

Version 3.7 adds support for MongoDB 4.0.

Unavoidable breaking changes:

  • Commands that fail with server error codes 10107, 13435, 13436, 11600, 11602, 189, 91 (NotMaster, NotMasterNoSlaveOk, NotMasterOrSecondary, InterruptedAtShutdown, InterruptedDueToReplStateChange, PrimarySteppedDown, ShutdownInProgress respectively) now always raise NotMasterError instead of OperationFailure.
  • parallel_scan() no longer uses an implicit session. Explicit sessions are still supported.
  • Unacknowledged writes (w=0) with an explicit session parameter now raise a client side error. Since PyMongo does not wait for a response for an unacknowledged write, two unacknowledged writes run serially by the client may be executed simultaneously on the server. However, the server requires a single session must not be used simultaneously by more than one operation. Therefore explicit sessions cannot support unacknowledged writes. Unacknowledged writes without a session parameter are still supported.
Issues Resolved

See the PyMongo 3.7 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 3.6.1

Version 3.6.1 fixes bugs reported since the release of 3.6.0:

  • Fix regression in PyMongo 3.5.0 that causes idle sockets to be closed almost instantly when maxIdleTimeMS is set. Idle sockets are now closed after maxIdleTimeMS milliseconds.
  • pymongo.mongo_client.MongoClient.max_idle_time_ms now returns milliseconds instead of seconds.
  • Properly import and use the monotonic library for monotonic time when it is installed.
  • aggregate() now ignores the batchSize argument when running a pipeline with a $out stage.
  • Always send handshake metadata for new connections.
Issues Resolved

See the PyMongo 3.6.1 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 3.6.0

Version 3.6 adds support for MongoDB 3.6, drops support for CPython 3.3 (PyPy3 is still supported), and drops support for MongoDB versions older than 2.6. If connecting to a MongoDB 2.4 server or older, PyMongo now throws a ConfigurationError.

Deprecations:

  • The useCursor option for aggregate() is deprecated. The option was only necessary when upgrading from MongoDB 2.4 to MongoDB 2.6. MongoDB 2.4 is no longer supported.
  • The add_user() and remove_user() methods are deprecated. See the method docstrings for alternatives.

Unavoidable breaking changes:

  • Starting in MongoDB 3.6, the deprecated methods authenticate() and logout() now invalidate all previously created cursors. Instead of using these methods to change credentials, pass credentials for one user to the MongoClient at construction time, and either grant access to several databases to one user account, or use a distinct client object for each user.
  • BSON binary subtype 4 is decoded using RFC-4122 byte order regardless of the UUID representation. This is a change in behavior for applications that use UUID representation bson.binary.JAVA_LEGACY or bson.binary.CSHARP_LEGACY to decode BSON binary subtype 4. Other UUID representations, bson.binary.PYTHON_LEGACY (the default) and bson.binary.STANDARD, and the decoding of BSON binary subtype 3 are unchanged.
Issues Resolved

See the PyMongo 3.6 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 3.5.1

Version 3.5.1 fixes bugs reported since the release of 3.5.0:

  • Work around socket.getsockopt issue with NetBSD.
  • pymongo.command_cursor.CommandCursor.close() now closes the cursor synchronously instead of deferring to a background thread.
  • Fix documentation build warnings with Sphinx 1.6.x.
Issues Resolved

See the PyMongo 3.5.1 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 3.5

Version 3.5 implements a number of improvements and bug fixes.

Issues Resolved

See the PyMongo 3.5 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 3.4

Version 3.4 implements the new server features introduced in MongoDB 3.4 and a whole lot more:

Highlights include:

  • Complete support for MongoDB 3.4.
  • Improved support for logging server discovery and monitoring events. See monitoring for examples.
  • Support for matching iPAddress subjectAltName values for TLS certificate verification.
  • TLS compression is now explicitly disabled when possible.
  • The Server Name Indication (SNI) TLS extension is used when possible.
  • Finer control over JSON encoding/decoding with JSONOptions.
  • Allow Code objects to have a scope of None, signifying no scope. Also allow encoding Code objects with an empty scope (i.e. {}).

Warning

Starting in PyMongo 3.4, bson.code.Code.scope may return None, as the default scope is None instead of {}.

Note

PyMongo 3.4+ attempts to create sockets non-inheritable when possible (i.e. it sets the close-on-exec flag on socket file descriptors). Support is limited to a subset of POSIX operating systems (not including Windows) and the flag usually cannot be set in a single atomic operation. CPython 3.4+ implements PEP 446, creating all file descriptors non-inheritable by default. Users that require this behavior are encouraged to upgrade to CPython 3.4+.

Since 3.4rc0, the max staleness option has been renamed from maxStalenessMS to maxStalenessSeconds, its smallest value has changed from twice heartbeatFrequencyMS to 90 seconds, and its default value has changed from None or 0 to -1.

Issues Resolved

See the PyMongo 3.4 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 3.3.1

Version 3.3.1 fixes a memory leak when decoding elements inside of a RawBSONDocument.

Issues Resolved

See the PyMongo 3.3.1 release notes in Jira for the list of resolved issues in this release.

Changes in Version 3.3

Version 3.3 adds the following major new features:

  • C extensions support on big endian systems.
  • Kerberos authentication support on Windows using WinKerberos.
  • A new ssl_crlfile option to support certificate revocation lists.
  • A new ssl_pem_passphrase option to support encrypted key files.
  • Support for publishing server discovery and monitoring events. See monitoring for details.
  • New connection pool options minPoolSize and maxIdleTimeMS.
  • New heartbeatFrequencyMS option controls the rate at which background monitoring threads re-check servers. Default is once every 10 seconds.

Warning

PyMongo 3.3 drops support for MongoDB versions older than 2.4. It also drops support for python 3.2 (pypy3 continues to be supported).

Issues Resolved

See the PyMongo 3.3 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 3.2.2

Version 3.2.2 fixes a few issues reported since the release of 3.2.1, including a fix for using the connect option in the MongoDB URI and support for setting the batch size for a query to 1 when using MongoDB 3.2+.

Issues Resolved

See the PyMongo 3.2.2 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 3.2.1

Version 3.2.1 fixes a few issues reported since the release of 3.2, including running the mapreduce command twice when calling the inline_map_reduce() method and a TypeError being raised when calling download_to_stream(). This release also improves error messaging around BSON decoding.

Issues Resolved

See the PyMongo 3.2.1 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 3.2

Version 3.2 implements the new server features introduced in MongoDB 3.2.

Note

Certain MongoClient properties now block until a connection is established or raise ServerSelectionTimeoutError if no server is available. See MongoClient for details.

Issues Resolved

See the PyMongo 3.2 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 3.1.1

Version 3.1.1 fixes a few issues reported since the release of 3.1, including a regression in error handling for oversize command documents and interrupt handling issues in the C extensions.

Issues Resolved

See the PyMongo 3.1.1 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 3.1

Version 3.1 implements a few new features and fixes bugs reported since the release of 3.0.3.

Highlights include:

  • Command monitoring support. See monitoring for details.
  • Configurable error handling for UnicodeDecodeError. See the unicode_decode_error_handler option of CodecOptions.
  • Optional automatic timezone conversion when decoding BSON datetime. See the tzinfo option of CodecOptions.
  • An implementation of GridFSBucket from the new GridFS spec.
  • Compliance with the new Connection String spec.
  • Reduced idle CPU usage in Python 2.
Changes in internal classes

The private PeriodicExecutor class no longer takes a condition_class option, and the private thread_util.Event class is removed.

Issues Resolved

See the PyMongo 3.1 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 3.0.3

Version 3.0.3 fixes issues reported since the release of 3.0.2, including a feature breaking bug in the GSSAPI implementation.

Issues Resolved

See the PyMongo 3.0.3 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 3.0.2

Version 3.0.2 fixes issues reported since the release of 3.0.1, most importantly a bug that could route operations to replica set members that are not in primary or secondary state when using PrimaryPreferred or Nearest. It is a recommended upgrade for all users of PyMongo 3.0.x.

Issues Resolved

See the PyMongo 3.0.2 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 3.0.1

Version 3.0.1 fixes issues reported since the release of 3.0, most importantly a bug in GridFS.delete that could prevent file chunks from actually being deleted.

Issues Resolved

See the PyMongo 3.0.1 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 3.0

PyMongo 3.0 is a partial rewrite of PyMongo bringing a large number of improvements:

  • A unified client class. MongoClient is the one and only client class for connecting to a standalone mongod, replica set, or sharded cluster. Migrating from a standalone, to a replica set, to a sharded cluster can be accomplished with only a simple URI change.
  • MongoClient is much more responsive to configuration changes in your MongoDB deployment. All connected servers are monitored in a non-blocking manner. Slow to respond or down servers no longer block server discovery, reducing application startup time and time to respond to new or reconfigured servers and replica set failovers.
  • A unified CRUD API. All official MongoDB drivers now implement a standard CRUD API allowing polyglot developers to move from language to language with ease.
  • Single source support for Python 2.x and 3.x. PyMongo no longer relies on 2to3 to support Python 3.
  • A rewritten pure Python BSON implementation, improving performance with PyPy and CPython deployments without support for C extensions.
  • Better support for greenlet based async frameworks including eventlet.
  • Immutable client, database, and collection classes, avoiding a host of thread safety issues in client applications.

PyMongo 3.0 brings a large number of API changes. Be sure to read the changes listed below before upgrading from PyMongo 2.x.

Warning

PyMongo no longer supports Python 2.4, 2.5, or 3.1. If you must use PyMongo with these versions of Python, the 2.x branch of PyMongo will be minimally supported for some time.

SONManipulator changes

The SONManipulator API has limitations as a technique for transforming your data. Instead, it is more flexible and straightforward to transform outgoing documents in your own code before passing them to PyMongo, and transform incoming documents after receiving them from PyMongo.

Thus the add_son_manipulator() method is deprecated. PyMongo 3’s new CRUD API does not apply SON manipulators to documents passed to bulk_write(), insert_one(), insert_many(), update_one(), or update_many(). SON manipulators are not applied to documents returned by the new methods find_one_and_delete(), find_one_and_replace(), and find_one_and_update().

SSL/TLS changes

When ssl is True, the ssl_cert_reqs option now defaults to ssl.CERT_REQUIRED if not provided. PyMongo will attempt to load OS-provided CA certificates to verify the server, raising ConfigurationError if it cannot.
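
A sketch of the new default and of opting out of verification (the hostname is a placeholder; disabling verification is not recommended):

import ssl
from pymongo import MongoClient

# Certificate verification is now the default when ssl=True.
verified = MongoClient("db.example.com", ssl=True)

# Pass ssl.CERT_NONE explicitly to restore the old, unverified behavior.
unverified = MongoClient("db.example.com", ssl=True, ssl_cert_reqs=ssl.CERT_NONE)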

Gevent Support

In previous versions, PyMongo supported Gevent in two modes: you could call gevent.monkey.patch_socket() and pass use_greenlets=True to MongoClient, or you could simply call gevent.monkey.patch_all() and omit the use_greenlets argument.

In PyMongo 3.0, the use_greenlets option is gone. To use PyMongo with Gevent simply call gevent.monkey.patch_all().
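
A minimal sketch; the only requirement is that patch_all() runs before PyMongo is imported:

from gevent import monkey
monkey.patch_all()  # must run before importing pymongo

from pymongo import MongoClient

client = MongoClient()  # use_greenlets is no longer needed (or accepted)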

For more information, see PyMongo’s Gevent documentation.

MongoClient changes

MongoClient is now the one and only client class for a standalone server, mongos, or replica set. It includes the functionality that had been split into MongoReplicaSetClient: it can connect to a replica set, discover all its members, and monitor the set for stepdowns, elections, and reconfigs. MongoClient now also supports the full ReadPreference API.

The obsolete classes MasterSlaveConnection, Connection, and ReplicaSetConnection are removed.

The MongoClient constructor no longer blocks while connecting to the server or servers, and it no longer raises ConnectionFailure if they are unavailable, nor ConfigurationError if the user’s credentials are wrong. Instead, the constructor returns immediately and launches the connection process on background threads. The connect option is added to control whether these threads are started immediately, or when the client is first used.

Therefore the alive method is removed since it no longer provides meaningful information; even if the client is disconnected, it may discover a server in time to fulfill the next operation.
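
A minimal sketch of the new behavior; serverSelectionTimeoutMS here just shortens the wait for demonstration:

from pymongo import MongoClient
from pymongo.errors import ServerSelectionTimeoutError

# The constructor returns immediately, even if no server is reachable;
# connect=False defers the background connection until the client is first used.
client = MongoClient("localhost", 27017, connect=False,
                     serverSelectionTimeoutMS=2000)
try:
    client.admin.command("ismaster")  # the first operation triggers discovery
except ServerSelectionTimeoutError:
    print("no server available")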

In PyMongo 2.x, MongoClient accepted a list of standalone MongoDB servers and used the first it could connect to:

MongoClient(['host1.com:27017', 'host2.com:27017'])

A list of multiple standalones is no longer supported; if multiple servers are listed they must be members of the same replica set, or mongoses in the same sharded cluster.

The behavior for a list of mongoses is changed from “high availability” to “load balancing”. Before, the client connected to the lowest-latency mongos in the list, and used it until a network error prompted it to re-evaluate all mongoses’ latencies and reconnect to one of them. In PyMongo 3, the client monitors its network latency to all the mongoses continuously, and distributes operations evenly among those with the lowest latency. See mongos Load Balancing for more information.

The client methods start_request, in_request, and end_request are removed, and so is the auto_start_request option. Requests were designed to make read-your-writes consistency more likely with the w=0 write concern. Additionally, a thread in a request used the same member for all secondary reads in a replica set. To ensure read-your-writes consistency in PyMongo 3.0, do not override the default write concern with w=0, and do not override the default read preference of PRIMARY.

Support for the slaveOk (or slave_okay), safe, and network_timeout options has been removed. Use SECONDARY_PREFERRED instead of slave_okay. Accept the default write concern, acknowledged writes, instead of setting safe=True. Use socketTimeoutMS in place of network_timeout (note that network_timeout was in seconds, whereas socketTimeoutMS is in milliseconds).
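
A sketch of the replacements described above (the hostname is a placeholder):

from pymongo import MongoClient

client = MongoClient(
    "db.example.com",
    socketTimeoutMS=5000,                 # replaces network_timeout=5 (seconds)
    readPreference="secondaryPreferred")  # replaces slave_okay=True
# Acknowledged writes are the default; do not pass safe=True.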

The max_pool_size option has been removed. It is replaced by maxPoolSize, which is now a supported MongoDB URI option in PyMongo and can also be passed as a keyword argument.

The copy_database method is removed, see the copy_database examples for alternatives.

The disconnect method is removed. Use close() instead.

The get_document_class method is removed. Use codec_options instead.

The get_lasterror_options, set_lasterror_options, and unset_lasterror_options methods are removed. Write concern options can be passed to MongoClient as keyword arguments or MongoDB URI options.

The get_database() method is added for getting a Database instance with its options configured differently than the MongoClient’s.

The following read-only attributes have been added:

The following attributes are now read-only:

The following attributes have been removed:

The following attributes have been renamed:

Cursor changes

The conn_id property is renamed to address.

Cursor management changes

CursorManager and set_cursor_manager() are no longer deprecated. If you subclass CursorManager, your implementation of close() must now take a second parameter, address. The BatchCursorManager class is removed.

The second parameter to close_cursor() is renamed from _conn_id to address. kill_cursors() now accepts an address parameter.
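
A minimal sketch of a CursorManager subclass using the new two-parameter close() (the subclass name and print statement are illustrative):

from pymongo import MongoClient
from pymongo.cursor_manager import CursorManager

class LoggingCursorManager(CursorManager):
    def close(self, cursor_id, address):
        # address is the (host, port) pair of the server owning the cursor.
        print("closing cursor %r on %r" % (cursor_id, address))
        super(LoggingCursorManager, self).close(cursor_id, address)

client = MongoClient()
client.set_cursor_manager(LoggingCursorManager)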

Database changes

The connection property is renamed to client.

The following read-only attributes have been added:

The following attributes are now read-only:

Use get_database() for getting a Database instance with its options configured differently than the MongoClient’s.

The following attributes have been removed:

  • safe
  • secondary_acceptable_latency_ms
  • slave_okay
  • tag_sets

The following methods have been added:

The following methods have been changed:

  • command(). Support for as_class, uuid_subtype, tag_sets, and secondary_acceptable_latency_ms has been removed. You can instead pass an instance of CodecOptions as codec_options and an instance of a read preference class from read_preferences as read_preference. The fields and compile_re options are also removed. The fields option was undocumented and never really worked. Regular expressions are always decoded to Regex.
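
For example, a sketch of the new command() parameters (the database and command are placeholders):

from bson.codec_options import CodecOptions
from bson.son import SON
from pymongo import MongoClient
from pymongo.read_preferences import ReadPreference

db = MongoClient().test

# codec_options replaces as_class; read_preference replaces tag_sets and
# secondary_acceptable_latency_ms.
result = db.command(
    "dbstats",
    codec_options=CodecOptions(document_class=SON),
    read_preference=ReadPreference.SECONDARY_PREFERRED)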

The following methods have been deprecated:

The following methods have been removed:

The get_lasterror_options, set_lasterror_options, and unset_lasterror_options methods have been removed. Use WriteConcern with get_database() instead.

Collection changes

The following read-only attributes have been added:

The following attributes are now read-only:

Use get_collection() or with_options() for getting a Collection instance with its options configured differently than the Database’s.

The following attributes have been removed:

  • safe
  • secondary_acceptable_latency_ms
  • slave_okay
  • tag_sets

The following methods have been added:

The following methods have changed:

  • aggregate() now always returns an instance of CommandCursor. See the documentation for all options.
  • count() now optionally takes a filter argument, as well as other options supported by the count command.
  • distinct() now optionally takes a filter argument.
  • create_index() no longer caches indexes, therefore the cache_for parameter has been removed. It also no longer supports the bucket_size and drop_dups aliases for bucketSize and dropDups.
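
A short sketch of the changed methods above (collection and field names are placeholders):

from pymongo import MongoClient

coll = MongoClient().test.things

n = coll.count({"status": "active"})              # count with a filter
tags = coll.distinct("tag", {"qty": {"$gt": 0}})  # distinct with a filter
coll.create_index([("status", 1)])                # no cache_for parameter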

The following methods are deprecated:

The following methods have been removed:

The get_lasterror_options, set_lasterror_options, and unset_lasterror_options methods have been removed. Use WriteConcern with with_options() instead.

Changes to find() and find_one()

The following find/find_one options have been renamed:

These renames only affect your code if you passed these options as keyword arguments, like find(fields=['fieldname']). If you passed only positional parameters, these changes do not affect your application.

  • spec -> filter
  • fields -> projection
  • partial -> allow_partial_results

The following find/find_one options have been added:

  • cursor_type (see CursorType for values)
  • oplog_replay
  • modifiers

The following find/find_one options have been removed:

  • network_timeout (use max_time_ms() instead)
  • slave_okay (use one of the read preference classes from read_preferences and with_options() instead)
  • read_preference (use with_options() instead)
  • tag_sets (use one of the read preference classes from read_preferences and with_options() instead)
  • secondary_acceptable_latency_ms (use the localThresholdMS URI option instead)
  • max_scan (use the new modifiers option instead)
  • snapshot (use the new modifiers option instead)
  • tailable (use the new cursor_type option instead)
  • await_data (use the new cursor_type option instead)
  • exhaust (use the new cursor_type option instead)
  • as_class (use with_options() with CodecOptions instead)
  • compile_re (BSON regular expressions are always decoded to Regex)

The following find/find_one options are deprecated:

  • manipulate

The following renames need special handling.

  • timeout -> no_cursor_timeout - The default for timeout was True. The default for no_cursor_timeout is False. If you were previously passing False for timeout you must pass True for no_cursor_timeout to keep the previous behavior.

errors changes

The exception classes UnsupportedOption and TimeoutError are deleted.

gridfs changes

Since PyMongo 1.6, methods open and close of GridFS raised an UnsupportedAPI exception, as did the entire GridFile class. The unsupported methods, the class, and the exception are all deleted.

bson changes

The compile_re option is removed from all methods that accepted it in bson and json_util. Additionally, it is removed from find(), find_one(), aggregate(), command(), and so on. PyMongo now always represents BSON regular expressions as Regex objects. This prevents errors for incompatible patterns, see PYTHON-500. Use try_compile() to attempt to convert from a BSON regular expression to a Python regular expression object.
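
A minimal sketch of converting a decoded BSON regular expression to a Python pattern:

from bson.regex import Regex

regex = Regex(r"^mongo", "i")  # as decoded from a BSON document
pattern = regex.try_compile()  # a Python re pattern, if the flags are compatible
assert pattern.match("MongoDB")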

PyMongo now decodes the BSON int64 type to Int64, a trivial wrapper around long (in Python 2) or int (in Python 3). This allows BSON int64 to be round-tripped without losing type information in Python 3. Note that if you store a Python long (or a Python int larger than 4 bytes) it will be returned from PyMongo as Int64.
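
A short illustration:

from bson.int64 import Int64

big = Int64(2 ** 40)
print(isinstance(big, int))  # True on Python 3; Int64 subclasses int/long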

The as_class, tz_aware, and uuid_subtype options are removed from all BSON encoding and decoding methods. Use CodecOptions to configure these options. The APIs affected are:

This is a breaking change for any application that uses the BSON API directly and changes any of the named parameter defaults. No changes are required for applications that use the default values for these options. The behavior remains the same.

Issues Resolved

See the PyMongo 3.0 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 2.9.5

Version 2.9.5 works around ssl module deprecations in Python 3.6, and expected future ssl module deprecations. It also fixes bugs found since the release of 2.9.4.

  • Use ssl.SSLContext and ssl.PROTOCOL_TLS_CLIENT when available.
  • Fixed a C extensions build issue when the interpreter was built with -std=c99.
  • Fixed various build issues with MinGW32.
  • Fixed a write concern bug in add_user() and remove_user() when connected to MongoDB 3.2+.
  • Fixed various test failures related to changes in gevent, MongoDB, and our CI test environment.

Issues Resolved

See the PyMongo 2.9.5 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 2.9.4

Version 2.9.4 fixes issues reported since the release of 2.9.3.

Issues Resolved

See the PyMongo 2.9.4 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 2.9.3

Version 2.9.3 fixes a few issues reported since the release of 2.9.2 including thread safety issues in ensure_index(), drop_index(), and drop_indexes().

Issues Resolved

See the PyMongo 2.9.3 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 2.9.2

Version 2.9.2 restores Python 3.1 support, which was broken in PyMongo 2.8. It improves an error message when decoding BSON and fixes a couple of other issues, including aggregate() ignoring codec_options and command() raising a superfluous DeprecationWarning.

Issues Resolved

See the PyMongo 2.9.2 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 2.9.1

Version 2.9.1 fixes two interrupt handling issues in the C extensions and adapts a test case for a behavior change in MongoDB 3.2.

Issues Resolved

See the PyMongo 2.9.1 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 2.9

Version 2.9 provides an upgrade path to PyMongo 3.x. Most of the API changes from PyMongo 3.0 have been backported in a backward compatible way, allowing applications to be written against PyMongo >= 2.9, rather than PyMongo 2.x or PyMongo 3.x. See the PyMongo 3 Migration Guide for detailed examples.

Note

There are a number of new deprecations in this release for features that were removed in PyMongo 3.0.

MongoClient:
  • host
  • port
  • use_greenlets
  • document_class
  • tz_aware
  • secondary_acceptable_latency_ms
  • tag_sets
  • uuid_subtype
  • disconnect()
  • alive()
MongoReplicaSetClient:
  • use_greenlets
  • document_class
  • tz_aware
  • secondary_acceptable_latency_ms
  • tag_sets
  • uuid_subtype
  • alive()
Database:
  • secondary_acceptable_latency_ms
  • tag_sets
  • uuid_subtype
Collection:
  • secondary_acceptable_latency_ms
  • tag_sets
  • uuid_subtype

Warning

In previous versions of PyMongo, changing the value of document_class changed the behavior of all existing instances of Collection:

>>> coll = client.test.test
>>> coll.find_one()
{u'_id': ObjectId('5579dc7cfba5220cc14d9a18')}
>>> from bson.son import SON
>>> client.document_class = SON
>>> coll.find_one()
SON([(u'_id', ObjectId('5579dc7cfba5220cc14d9a18'))])

The document_class setting is now configurable at the client, database, collection, and per-operation level. This required breaking the existing behavior. To change the document class per operation in a forward compatible way use with_options():

>>> coll.find_one()
{u'_id': ObjectId('5579dc7cfba5220cc14d9a18')}
>>> from bson.codec_options import CodecOptions
>>> coll.with_options(CodecOptions(SON)).find_one()
SON([(u'_id', ObjectId('5579dc7cfba5220cc14d9a18'))])

Issues Resolved

See the PyMongo 2.9 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 2.8.1

Version 2.8.1 fixes a number of issues reported since the release of PyMongo 2.8. It is a recommended upgrade for all users of PyMongo 2.x.

Issues Resolved

See the PyMongo 2.8.1 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 2.8

Version 2.8 is a major release that provides full support for MongoDB 3.0 and fixes a number of bugs.

Special thanks to Don Mitchell, Ximing, Can Zhang, Sergey Azovskov, and Heewa Barfchin for their contributions to this release.

Highlights include:

  • Support for the SCRAM-SHA-1 authentication mechanism (new in MongoDB 3.0).
  • JSON decoder support for the new $numberLong and $undefined types.
  • JSON decoder support for the $date type as an ISO-8601 string.
  • Support passing an index name to hint().
  • The count() method will use a hint if one has been provided through hint().
  • A new socketKeepAlive option for the connection pool.
  • New generator based BSON decode functions, decode_iter() and decode_file_iter().
  • Internal changes to support alternative storage engines like WiredTiger.

Note

There are a number of deprecations in this release for features that will be removed in PyMongo 3.0. These include:

The JSON format for Timestamp has changed from '{"t": <int>, "i": <int>}' to '{"$timestamp": {"t": <int>, "i": <int>}}'. This new format will be decoded to an instance of Timestamp. The old format will continue to be decoded to a Python dict as before. Encoding to the old format is no longer supported as it was never correct and loses type information.
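
For example, the new format decodes to a Timestamp while the old format still yields a plain dict:

from bson.json_util import loads

print(loads('{"ts": {"$timestamp": {"t": 1, "i": 1}}}'))  # {'ts': Timestamp(1, 1)}
print(loads('{"ts": {"t": 1, "i": 1}}'))                  # {'ts': {'t': 1, 'i': 1}}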

Issues Resolved

See the PyMongo 2.8 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 2.7.2

Version 2.7.2 includes fixes for upsert reporting in the bulk API for MongoDB versions older than 2.6, a regression in how SON manipulators are applied in insert(), a few obscure connection pool semaphore leaks, and a few other minor issues. See the list of issues resolved for full details.

Issues Resolved

See the PyMongo 2.7.2 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 2.7.1

Version 2.7.1 fixes a number of issues reported since the release of 2.7, most importantly a fix for creating indexes and manipulating users through mongos versions older than 2.4.0.

Issues Resolved

See the PyMongo 2.7.1 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 2.7

PyMongo 2.7 is a major release with a large number of new features and bug fixes. Highlights include:

Breaking changes

Version 2.7 drops support for replica sets running MongoDB versions older than 1.6.2.

Issues Resolved

See the PyMongo 2.7 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 2.6.3

Version 2.6.3 fixes issues reported since the release of 2.6.2, most importantly a semaphore leak when a connection to the server fails.

Issues Resolved

See the PyMongo 2.6.3 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 2.6.2

Version 2.6.2 fixes a TypeError problem when max_pool_size=None is used in Python 3.

Issues Resolved

See the PyMongo 2.6.2 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 2.6.1

Version 2.6.1 fixes a reference leak in the insert() method.

Issues Resolved

See the PyMongo 2.6.1 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 2.6

Version 2.6 includes some frequently requested improvements and adds support for some early MongoDB 2.6 features.

Special thanks go to Justin Patrin for his work on the connection pool in this release.

Important new features:

Warning

SIGNIFICANT BEHAVIOR CHANGE in 2.6. Previously, max_pool_size would limit only the idle sockets the pool would hold onto, not the number of open sockets. The default has also changed, from 10 to 100. If you pass a value for max_pool_size make sure it is large enough for the expected load. (Sockets are only opened when needed, so there is no cost to having a max_pool_size larger than necessary. Err towards a larger value.) If your application accepts the default, continue to do so.

See How does connection pooling work in PyMongo? for more information.
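
A sketch under PyMongo 2.6.x (max_pool_size was later replaced by maxPoolSize in PyMongo 3):

from pymongo import MongoClient

# max_pool_size now bounds the total number of open sockets, not just the
# idle ones, so size it for peak concurrency.
client = MongoClient("localhost", 27017, max_pool_size=200)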

Issues Resolved

See the PyMongo 2.6 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 2.5.2

Version 2.5.2 fixes a NULL pointer dereference issue when decoding an invalid DBRef.

Issues Resolved

See the PyMongo 2.5.2 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 2.5.1

Version 2.5.1 is a minor release that fixes issues discovered after the release of 2.5. Most importantly, this release addresses some race conditions in replica set monitoring.

Issues Resolved

See the PyMongo 2.5.1 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 2.5

Version 2.5 includes changes to support new features in MongoDB 2.4.

Important new features:

  • Support for GSSAPI (Kerberos) authentication.
  • Support for SSL certificate validation with hostname matching.
  • Support for delegated and role based authentication.
  • New GEOSPHERE (2dsphere) and HASHED index constants.

Note

authenticate() now raises a subclass of PyMongoError if authentication fails due to invalid credentials or configuration issues.

Issues Resolved

See the PyMongo 2.5 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 2.4.2

Version 2.4.2 is a minor release that fixes issues discovered after the release of 2.4.1. Most importantly, PyMongo will no longer select a replica set member for read operations that is not in primary or secondary state.

Issues Resolved

See the PyMongo 2.4.2 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 2.4.1

Version 2.4.1 is a minor release that fixes issues discovered after the release of 2.4. Most importantly, this release fixes a regression using aggregate(), and possibly other commands, with mongos.

Issues Resolved

See the PyMongo 2.4.1 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 2.4

Version 2.4 includes a few important new features and a large number of bug fixes.

Important new features:

  • New MongoClient and MongoReplicaSetClient classes - these connection classes do acknowledged write operations (previously referred to as ‘safe’ writes) by default. Connection and ReplicaSetConnection are deprecated but still support the old default fire-and-forget behavior.
  • A new write concern API implemented as a write_concern attribute on the connection, Database, or Collection classes.
  • MongoClient (and Connection) now support Unix Domain Sockets.
  • Cursor can be copied with functions from the copy module.
  • The set_profiling_level() method now supports a slow_ms option.
  • The replica set monitor task (used by MongoReplicaSetClient and ReplicaSetConnection) is a daemon thread once again, meaning you won’t have to call close() before exiting the Python interactive shell.

Warning

The constructors for MongoClient, MongoReplicaSetClient, Connection, and ReplicaSetConnection now raise ConnectionFailure instead of its subclass AutoReconnect if the server is unavailable. Applications that expect to catch AutoReconnect should now catch ConnectionFailure while creating a new connection.

Issues Resolved

See the PyMongo 2.4 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 2.3

Version 2.3 adds support for new features and behavior changes in MongoDB 2.2.

Important New Features:

  • Support for expanded read preferences including directing reads to tagged servers - See Secondary Reads for more information.
  • Support for mongos failover.
  • A new aggregate() method to support MongoDB’s new aggregation framework.
  • Support for legacy Java and C# byte order when encoding and decoding UUIDs.
  • Support for connecting directly to an arbiter.

Warning

Starting with MongoDB 2.2 the getLastError command requires authentication when the server’s authentication features are enabled. Changes to PyMongo were required to support this behavior change. Users of authentication must upgrade to PyMongo 2.3 (or newer) for “safe” write operations to function correctly.

Issues Resolved

See the PyMongo 2.3 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 2.2.1

Version 2.2.1 is a minor release that fixes issues discovered after the release of 2.2. Most importantly, this release fixes an incompatibility with mod_wsgi 2.x that could cause connections to leak. Users of mod_wsgi 2.x are strongly encouraged to upgrade from PyMongo 2.2.

Issues Resolved

See the PyMongo 2.2.1 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 2.2

Version 2.2 adds a few more frequently requested features and fixes a number of bugs.

Special thanks go to Alex Grönholm for his contributions to Python 3 support and maintaining the original pymongo3 port. Christoph Simon, Wouter Bolsterlee, Mike O’Brien, and Chris Tompkinson also contributed to this release.

Important New Features:

  • Support for Python 3 - See the Python 3 FAQ for more information.
  • Support for Gevent - See Gevent for more information.
  • Improved connection pooling. See PYTHON-287.

Warning

A number of methods and method parameters that were deprecated in PyMongo 1.9 or older versions have been removed in this release. The full list of changes can be found in the following JIRA ticket:

https://jira.mongodb.org/browse/PYTHON-305

BSON module aliases from the pymongo package that were deprecated in PyMongo 1.9 have also been removed in this release. See the following JIRA ticket for details:

https://jira.mongodb.org/browse/PYTHON-304

As a result of this cleanup some minor code changes may be required to use this release.

Issues Resolved

See the PyMongo 2.2 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 2.1.1

Version 2.1.1 is a minor release that fixes a few issues discovered after the release of 2.1. You can now use ReplicaSetConnection to run inline map reduce commands on secondaries. See inline_map_reduce() for details.

Special thanks go to Samuel Clay and Ross Lawley for their contributions to this release.

Issues Resolved

See the PyMongo 2.1.1 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 2.1

Version 2.1 adds a few frequently requested features and includes the usual round of bug fixes and improvements.

Special thanks go to Alexey Borzenkov, Dan Crosta, Kostya Rybnikov, Flavio Percoco Premoli, Jonas Haag, and Jesse Davis for their contributions to this release.

Important New Features:

  • ReplicaSetConnection - ReplicaSetConnection can be used to distribute reads to secondaries in a replica set. It supports automatic failover handling and periodically checks the state of the replica set to handle issues like primary stepdown or secondaries being removed for backup operations. Read preferences are defined through ReadPreference.
  • PyMongo supports the new BSON binary subtype 4 for UUIDs. The default subtype to use can be set through uuid_subtype. The current default remains OLD_UUID_SUBTYPE but will be changed to UUID_SUBTYPE in a future release.
  • The getLastError option ‘w’ can be set to a string, allowing for options like “majority” available in newer versions of MongoDB.
  • Added support for the MongoDB URI options socketTimeoutMS and connectTimeoutMS.
  • Added support for the ContinueOnError insert flag.
  • Added basic SSL support.
  • Added basic support for Jython.
  • Secondaries can be used for count(), distinct(), group(), and querying GridFS.
  • Added document_class and tz_aware options to MasterSlaveConnection.

Issues Resolved

See the PyMongo 2.1 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 2.0.1

Version 2.0.1 fixes a regression in GridIn when writing pre-chunked strings. Thanks go to Alexey Borzenkov for reporting the issue and submitting a patch.

Issues Resolved
  • PYTHON-271: Regression in GridFS leads to serious loss of data.

Changes in Version 2.0

Version 2.0 adds a large number of features and fixes a number of issues.

Special thanks go to James Murty, Abhay Vardhan, David Pisoni, Ryan Smith-Roberts, Andrew Pendleton, Mher Movsisyan, Reed O’Brien, Michael Schurter, Josip Delic and Jonas Haag for their contributions to this release.

Important New Features:

  • PyMongo now performs automatic per-socket database authentication. You no longer have to re-authenticate for each new thread or after a replica set failover. Authentication credentials are cached by the driver until the application calls logout().
  • slave_okay can be set independently at the connection, database, collection or query level. Each level will inherit the slave_okay setting from the previous level and each level can override the previous level’s setting.
  • safe and getLastError options (e.g. w, wtimeout, etc.) can be set independently at the connection, database, collection or query level. Each level will inherit settings from the previous level and each level can override the previous level’s setting.
  • PyMongo now supports the await_data and partial cursor flags. If the await_data flag is set on a tailable cursor the server will block for some extra time waiting for more data to return. The partial flag tells a mongos to return partial data for a query if not all shards are available.
  • map_reduce() will accept a dict or instance of SON as the out parameter.
  • The URI parser has been moved into its own module and can be used directly by application code.
  • AutoReconnect exception now provides information about the error that actually occurred instead of a generic failure message.
  • A number of new helper methods have been added with options for setting and unsetting cursor flags, re-indexing a collection, fsync and locking a server, and getting the server’s current operations.

API changes:

  • If only one host:port pair is specified Connection will make a direct connection to only that host. Please note that slave_okay must be True in order to query from a secondary.
  • If more than one host:port pair is specified or the replicaset option is used PyMongo will treat the specified host:port pair(s) as a seed list and connect using replica set behavior.

Warning

The default subtype for Binary has changed from OLD_BINARY_SUBTYPE (2) to BINARY_SUBTYPE (0).

Issues Resolved

See the PyMongo 2.0 release notes in JIRA for the list of resolved issues in this release.

Changes in Version 1.11

Version 1.11 adds a few new features and fixes a few more bugs.

New Features:

  • Basic IPv6 support: pymongo prefers IPv4 but will try IPv6. You can also specify an IPv6 address literal in the host parameter or a MongoDB URI, provided it is enclosed in '[' and ']'.
  • max_pool_size option: previously pymongo had a hard coded pool size of 10 connections. With this change you can specify a different pool size as a parameter to Connection (max_pool_size=<integer>) or in the MongoDB URI (maxPoolSize=<integer>).
  • Find by metadata in GridFS: You can now specify query fields as keyword parameters for get_version() and get_last_version().
  • Per-query slave_okay option: slave_okay=True is now a valid keyword argument for find() and find_one().

API changes:

  • validate_collection() now returns a dict instead of a string. This change was required to deal with an API change on the server. This method also now takes the optional scandata and full parameters. See the documentation for more details.

Warning

The pool_size, auto_start_request, and timeout parameters for Connection have been completely removed in this release. They were deprecated in pymongo-1.4 and have had no effect since then. Please make sure that your code doesn’t currently pass these parameters when creating a Connection instance.

Issues resolved
  • PYTHON-241: Support setting slaveok at the cursor level.
  • PYTHON-240: Queries can sometimes permanently fail after a replica set fail over.
  • PYTHON-238: error after few million requests
  • PYTHON-237: Basic IPv6 support.
  • PYTHON-236: Restore option to specify pool size in Connection.
  • PYTHON-212: pymongo does not recover after stale config
  • PYTHON-138: Find method for GridFS

Changes in Version 1.10.1

Version 1.10.1 is primarily a bugfix release. It fixes a regression in version 1.10 that broke pickling of ObjectIds. A number of other bugs have been fixed as well.

There are two behavior changes to be aware of:

  • If a read slave raises AutoReconnect MasterSlaveConnection will now retry the query on each slave until it is successful or all slaves have raised AutoReconnect. Any other exception will immediately be raised. The order that the slaves are tried is random. Previously the read would be sent to one randomly chosen slave and AutoReconnect was immediately raised in case of a connection failure.
  • A Python long is now always BSON encoded as an int64. Previously the encoding was based only on the value of the field and a long with a value less than 2147483648 or greater than -2147483649 would always be BSON encoded as an int32.

Issues resolved
  • PYTHON-234: Fix setup.py to raise exception if any when building extensions
  • PYTHON-233: Add information to build and test with extensions on windows
  • PYTHON-232: Traceback when hashing a DBRef instance
  • PYTHON-231: Traceback when pickling a DBRef instance
  • PYTHON-230: Pickled ObjectIds are not compatible between pymongo 1.9 and 1.10
  • PYTHON-228: Cannot pickle bson.ObjectId
  • PYTHON-227: Traceback when calling find() on system.js
  • PYTHON-216: MasterSlaveConnection is missing disconnect() method
  • PYTHON-186: When storing integers, type is selected according to value instead of type
  • PYTHON-173: as_class option is not propagated by Cursor.clone
  • PYTHON-113: Redundancy in MasterSlaveConnection

Changes in Version 1.10

Version 1.10 includes changes to support new features in MongoDB 1.8.x. Highlights include a modified map/reduce API including an inline map/reduce helper method, a new find_and_modify helper, and the ability to query the server for the maximum BSON document size it supports.

Warning

MongoDB versions greater than 1.7.4 no longer generate temporary collections for map/reduce results. An output collection name must be provided and the output will replace any existing output collection with the same name. map_reduce() now requires the out parameter.
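
A sketch of the now-required out parameter, written against the current client API (the collection and function bodies are illustrative):

from bson.code import Code
from pymongo import MongoClient

db = MongoClient().test
mapper = Code("function () { emit(this.tag, 1); }")
reducer = Code("function (key, values) { return Array.sum(values); }")

# out is now required; the named output collection is replaced on each run.
result = db.things.map_reduce(mapper, reducer, out="tag_counts")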

Issues resolved
  • PYTHON-225: ObjectId class definition should use __slots__.
  • PYTHON-223: Documentation fix.
  • PYTHON-220: Documentation fix.
  • PYTHON-219: KeyError in find_and_modify()
  • PYTHON-213: Query server for maximum BSON document size.
  • PYTHON-208: Fix Connection __repr__.
  • PYTHON-207: Changes to Map/Reduce API.
  • PYTHON-205: Accept slaveOk in the URI to match the URI docs.
  • PYTHON-203: When slave_okay=True and we only specify one host don’t autodetect other set members.
  • PYTHON-194: Show size when whining about a document being too large.
  • PYTHON-184: Raise DuplicateKeyError for duplicate keys in capped collections.
  • PYTHON-178: Don’t segfault when trying to encode a recursive data structure.
  • PYTHON-177: Don’t segfault when decoding dicts with broken iterators.
  • PYTHON-172: Fix a typo.
  • PYTHON-170: Add find_and_modify().
  • PYTHON-169: Support deepcopy of DBRef.
  • PYTHON-167: Duplicate of PYTHON-166.
  • PYTHON-166: Fixes a concurrency issue.
  • PYTHON-158: Add code and err string to db assertion messages.

Changes in Version 1.9

Version 1.9 adds a new package to the PyMongo distribution, bson. bson contains all of the BSON encoding and decoding logic, and the BSON types that were formerly in the pymongo package. The following modules have been renamed:

In addition, the following exception classes have been renamed:

The above exceptions now inherit from bson.errors.BSONError rather than pymongo.errors.PyMongoError.

Note

All of the renamed modules and exceptions above have aliases created with the old names, so these changes should not break existing code. The old names will eventually be deprecated and then removed, so users should begin migrating towards the new names now.

Warning

The change to the exception hierarchy mentioned above is possibly breaking. If your code is catching PyMongoError, then the exceptions raised by bson will not be caught, even though they would have been caught previously. Before upgrading, it is recommended that users check for any cases like this.

  • the C extension now shares buffer.c/h with the Ruby driver
  • bson no longer raises InvalidName, all occurrences have been replaced with InvalidDocument.
  • renamed bson._to_dicts() to decode_all().
  • renamed from_dict() to encode() and to_dict() to decode().
  • added batch_size().
  • allow updating (some) file metadata after a GridIn instance has been closed.
  • performance improvements for reading from GridFS.
  • special cased slice with the same start and stop to return an empty cursor.
  • allow writing unicode to GridFS if an encoding attribute has been specified for the file.
  • added gridfs.GridFS.get_version().
  • scope variables for Code can now be specified as keyword arguments.
  • added readline() to GridOut.
  • make a best effort to transparently auto-reconnect if a Connection has been idle for a while.
  • added list() to SystemJS.
  • added file_document argument to GridOut() to allow initializing from an existing file document.
  • raise TimeoutError even if the getLastError command was run manually and not through “safe” mode.
  • added uuid support to json_util.

Changes in Version 1.8.1

  • fixed a typo in the C extension that could cause safe-mode operations to report a failure (SystemError) even when none occurred.
  • added a __ne__() implementation to any class where we define __eq__().

Changes in Version 1.8

Version 1.8 adds support for connecting to replica sets, specifying per-operation values for w and wtimeout, and decoding to timezone-aware datetimes.

  • fixed a reference leak in the C extension when decoding a DBRef.
  • added support for w, wtimeout, and fsync (and any other options for getLastError) to “safe mode” operations.
  • added nodes property.
  • added a maximum pool size of 10 sockets.
  • added support for replica sets.
  • DEPRECATED from_uri() and paired(), both are supplanted by extended functionality in Connection().
  • added tz aware support for datetimes in ObjectId, Timestamp and json_util methods.
  • added drop() helper.
  • reuse the socket used for finding the master when a Connection is first created.
  • added support for MinKey, MaxKey and Timestamp to json_util.
  • added support for decoding datetimes as aware (UTC) - it is highly recommended to enable this by setting the tz_aware parameter to Connection() to True.
  • added network_timeout option for individual calls to find() and find_one().
  • added exists() to check if a file exists in GridFS.
  • added support for additional keys in DBRef instances.
  • added code attribute to OperationFailure exceptions.
  • fixed serialization of int and float subclasses in the C extension.

Changes in Version 1.7

Version 1.7 is a recommended upgrade for all PyMongo users. The full release notes are below, and some more in-depth discussion of the highlights is here.

  • no longer attempt to build the C extension on big-endian systems.
  • added MinKey and MaxKey.
  • use unsigned for Timestamp in BSON encoder/decoder.
  • support True as "ok" in command responses, in addition to 1.0 - necessary for server versions >= 1.5.X
  • BREAKING change to index_information() to add support for querying unique status and other index information.
  • added document_class, to specify class for returned documents.
  • added as_class argument for find(), and in the BSON decoder.
  • added support for creating Timestamp instances using a datetime.
  • allow dropTarget argument for rename.
  • handle aware datetime instances, by converting to UTC.
  • added support for max_scan.
  • raise FileExists exception when creating a duplicate GridFS file.
  • use y2038 for time handling in the C extension - eliminates 2038 problems when extension is installed.
  • added sort parameter to find()
  • finalized deprecation of changes from versions <= 1.4
  • take any non-dict as an "_id" query for find_one() or remove()
  • added ability to pass a dict for fields argument to find() (supports "$slice" and field negation)
  • simplified code to find master, since paired setups don’t always have a remote
  • fixed bug in C encoder for certain invalid types (like Collection instances).
  • don’t transparently map "filename" key to name attribute for GridFS.

Changes in Version 1.6

The biggest change in version 1.6 is a complete re-implementation of gridfs with a lot of improvements over the old implementation. There are many details and examples of using the new API in this blog post. The old API has been removed in this version, so existing code will need to be modified before upgrading to 1.6.

  • fixed issue where connection pool was being shared across Connection instances.
  • more improvements to Python code caching in C extension - should improve behavior on mod_wsgi.
  • added from_datetime().
  • complete rewrite of gridfs support.
  • improvements to the command() API.
  • fixed drop_indexes() behavior on non-existent collections.
  • disallow empty bulk inserts.

Changes in Version 1.5.2

  • fixed response handling to ignore unknown response flags in queries.
  • handle server versions containing ‘-pre-‘.

Changes in Version 1.5.1

  • added _id property for GridFile instances.
  • fix for making a Connection (with slave_okay set) directly to a slave in a replica pair.
  • accept kwargs for create_index() and ensure_index() to support all indexing options.
  • add pymongo.GEO2D and support for geo indexing.
  • improvements to Python code caching in C extension - should improve behavior on mod_wsgi.

Changes in Version 1.5

  • added subtype constants to binary module.
  • DEPRECATED options argument to Collection() and create_collection() in favor of kwargs.
  • added has_c() to check for C extension.
  • added copy_database().
  • added alive to tell when a cursor might have more data to return (useful for tailable cursors).
  • added Timestamp to better support dealing with internal MongoDB timestamps.
  • added name argument for create_index() and ensure_index().
  • fixed connection pooling w/ fork
  • paired() takes all kwargs that are allowed for Connection().
  • insert() returns list for bulk inserts of size one.
  • fixed handling of datetime.datetime instances in json_util.
  • added from_uri() to support MongoDB connection uri scheme.
  • fixed chunk number calculation when unaligned in gridfs.
  • command() takes a string for simple commands.
  • added system_js helper for dealing with server-side JS.
  • don’t wrap queries containing "$query" (support manual use of "$min", etc.).
  • added GridFSError as base class for gridfs exceptions.

Changes in Version 1.4

Perhaps the most important change in version 1.4 is that we have decided to no longer support Python 2.3. The most immediate reason for this is to allow some improvements to connection pooling. This will also allow us to use some new (as in Python 2.4 ;) idioms and will help begin the path towards supporting Python 3.0. If you need to use Python 2.3 you should consider using version 1.3 of this driver, although that will no longer be actively supported.

Other changes:

  • move "_id" to front only for top-level documents (fixes some corner cases).
  • update() and remove() return the entire response to the lastError command when safe is True.
  • completed removal of things that were deprecated in version 1.2 or earlier.
  • enforce that collection names do not contain the NULL byte.
  • fix to allow using UTF-8 collection names with the C extension.
  • added PyMongoError as base exception class for all errors. This changes the exception hierarchy somewhat, and is a BREAKING change if you depend on ConnectionFailure being an IOError or InvalidBSON being a ValueError, for example.
  • added DuplicateKeyError for calls to insert() or update() with safe set to True.
  • removed thread_util.
  • added add_user() and remove_user() helpers.
  • fix for authenticate() when using non-UTF-8 names or passwords.
  • minor fixes for MasterSlaveConnection.
  • clean up all cases where ConnectionFailure is raised.
  • simplification of connection pooling - makes driver ~2x faster for simple benchmarks. see How does connection pooling work in PyMongo? for more information.
  • DEPRECATED pool_size, auto_start_request and timeout parameters to Connection. DEPRECATED start_request().
  • use socket.sendall().
  • removed from_xml() as it was only being used for some internal testing - also eliminates dependency on elementtree.
  • implementation of update() in C.
  • deprecate _command() in favor of command().
  • send all commands without wrapping as {"query": ...}.
  • support string as key argument to group() (keyf) and run all groups as commands.
  • support for equality testing for Code instances.
  • allow the NULL byte in strings and disallow it in key names or regex patterns

Changes in Version 1.3

  • DEPRECATED running group() as eval(), also changed default for group() to running as a command
  • remove pymongo.cursor.Cursor.__len__(), which was deprecated in 1.1.1 - needed to do this aggressively due to its presence breaking Django template for loops
  • DEPRECATED host(), port(), connection(), name(), database(), name() and full_name() in favor of host, port, connection, name, database, name and full_name, respectively. The deprecation schedule for this change will probably be faster than usual, as it carries some performance implications.
  • added disconnect()

Changes in Version 1.2.1

  • added Changelog to docs
  • added setup.py doc --test to run doctests for tutorial, examples
  • moved most examples to Sphinx docs (and remove from examples/ directory)
  • raise InvalidId instead of TypeError when passing a 24 character string to ObjectId that contains non-hexadecimal characters
  • allow unicode instances for ObjectId init

Changes in Version 1.2

  • spec parameter for remove() is now optional to allow for deleting all documents in a Collection
  • always wrap queries with {query: ...} even when no special options - get around some issues with queries on fields named query
  • enforce 4MB document limit on the client side
  • added map_reduce() helper - see example
  • added distinct() method on Cursor instances to allow distinct with queries
  • fix for __getitem__() after skip()
  • allow any UTF-8 string in BSON encoder, not just ASCII subset
  • added generation_time
  • removed support for legacy ObjectId format - pretty sure this was never used, and is just confusing
  • DEPRECATED url_encode() and url_decode() in favor of str() and ObjectId(), respectively
  • allow oplog.$main as a valid collection name
  • some minor fixes for installation process
  • added support for datetime and regex in json_util

Changes in Version 1.1.2

  • improvements to insert() speed (using C for insert message creation)
  • use random number for request_id
  • fix some race conditions with AutoReconnect

Changes in Version 1.1.1

  • added multi parameter for update()
  • fix unicode regex patterns with C extension
  • added distinct()
  • added database support for DBRef
  • added json_util with helpers for encoding / decoding special types to JSON
  • DEPRECATED pymongo.cursor.Cursor.__len__() in favor of count() with with_limit_and_skip set to True due to a performance regression
  • switch documentation to Sphinx

Changes in Version 1.1

  • added __hash__() for DBRef and ObjectId
  • bulk insert() works with any iterable
  • fix ObjectId generation when using multiprocessing
  • added collection
  • added network_timeout parameter for Connection()
  • DEPRECATED slave_okay parameter for individual queries
  • fix for safe mode when multi-threaded
  • added safe parameter for remove()
  • added tailable parameter for find()

Changes in Version 1.0

Changes in Version 0.16

Changes in Version 0.15.2

  • documentation changes only

Changes in Version 0.15.1

  • various performance improvements
  • API CHANGE no longer need to specify direction for create_index() and ensure_index() when indexing a single key
  • support for encoding tuple instances as list instances

Changes in Version 0.15

  • fix string representation of ObjectId instances
  • added timeout parameter for find()
  • allow scope for reduce function in group()

Changes in Version 0.14.2

  • minor bugfixes

Changes in Version 0.14.1

  • seek() and tell() for (read mode) GridFile instances

Changes in Version 0.14

Changes in Version 0.13

  • better MasterSlaveConnection support
  • API CHANGE insert() and save() both return inserted _id
  • DEPRECATED passing an index name to hint()

Changes in Version 0.12

Changes in Version 0.11.3

  • don’t allow NULL bytes in string encoder
  • fixes for Python 2.3

Changes in Version 0.11.2

  • PEP 8
  • updates for group()
  • VS build

Changes in Version 0.11.1

  • fix for connection pooling under Python 2.5

Changes in Version 0.11

  • better build failure detection
  • driver support for selecting fields in sub-documents
  • disallow insertion of invalid key names
  • added timeout parameter for Connection()

Changes in Version 0.10.3

  • fix bug with large limit()
  • better exception when modules get reloaded out from underneath the C extension
  • better exception messages when calling a Collection or Database instance

Changes in Version 0.10.2

  • support subclasses of dict in C encoder

Changes in Version 0.10.1

  • alias Connection as pymongo.Connection
  • raise an exception rather than silently overflowing in encoder

Changes in Version 0.10

Changes in Version 0.9.7

  • allow sub-collections of $cmd as valid Collection names
  • add version as pymongo.version
  • add --no_ext command line option to setup.py

Python 3 FAQ
What Python 3 versions are supported?

PyMongo supports CPython 3.4+ and PyPy3.5+.

Are there any PyMongo behavior changes with Python 3?

There is only one intentional change: instances of bytes are encoded as BSON type 5 (Binary data) with subtype 0. In Python 3 they are decoded back to bytes. In Python 2 they are decoded to Binary with subtype 0.

For example, let’s insert a bytes instance using Python 3 and then read it back. Notice the byte string is decoded back to bytes:

Python 3.6.1 (v3.6.1:69c0db5050, Mar 21 2017, 01:21:04)
[GCC 4.9.3] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pymongo
>>> c = pymongo.MongoClient()
>>> c.test.bintest.insert_one({'binary': b'this is a byte string'}).inserted_id
ObjectId('4f9086b1fba5222021000000')
>>> c.test.bintest.find_one()
{'binary': b'this is a byte string', '_id': ObjectId('4f9086b1fba5222021000000')}

Now retrieve the same document in Python 2. Notice the byte string is decoded to Binary:

Python 2.7.6 (default, Feb 26 2014, 10:36:22)
[GCC 4.7.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pymongo
>>> c = pymongo.MongoClient()
>>> c.test.bintest.find_one()
{u'binary': Binary('this is a byte string', 0), u'_id': ObjectId('4f9086b1fba5222021000000')}

There is a similar change in behavior in parsing JSON binary with subtype 0. In Python 3 they are decoded into bytes. In Python 2 they are decoded to Binary with subtype 0.

For example, let’s decode a JSON binary subtype 0 using Python 3. Notice the byte string is decoded to bytes:

Python 3.6.1 (v3.6.1:69c0db5050, Mar 21 2017, 01:21:04)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from bson.json_util import loads
>>> loads('{"b": {"$binary": "dGhpcyBpcyBhIGJ5dGUgc3RyaW5n", "$type": "00"}}')
{'b': b'this is a byte string'}

Now decode the same JSON in Python 2. Notice the byte string is decoded to Binary:

Python 2.7.10 (default, Feb  7 2017, 00:08:15)
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.34)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from bson.json_util import loads
>>> loads('{"b": {"$binary": "dGhpcyBpcyBhIGJ5dGUgc3RyaW5n", "$type": "00"}}')
{u'b': Binary('this is a byte string', 0)}

Why can’t I share pickled ObjectIds between some versions of Python 2 and 3?

Instances of ObjectId pickled using Python 2 can always be unpickled using Python 3.

If you pickled an ObjectId using Python 2 and want to unpickle it using Python 3, you must pass encoding='latin-1' to pickle.loads:

Python 2.7.6 (default, Feb 26 2014, 10:36:22)
[GCC 4.7.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle
>>> from bson.objectid import ObjectId
>>> oid = ObjectId()
>>> oid
ObjectId('4f919ba2fba5225b84000000')
>>> pickle.dumps(oid)
'ccopy_reg\n_reconstructor\np0\n(cbson.objectid\...'

Python 3.6.1 (v3.6.1:69c0db5050, Mar 21 2017, 01:21:04)
[GCC 4.9.3] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle
>>> pickle.loads(b'ccopy_reg\n_reconstructor\np0\n(cbson.objectid\...', encoding='latin-1')
ObjectId('4f919ba2fba5225b84000000')

If you need to pickle ObjectIds using Python 3 and unpickle them using Python 2, you must use protocol <= 2:

Python 3.6.5 (default, Jun 21 2018, 15:09:09)
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle
>>> from bson.objectid import ObjectId
>>> oid = ObjectId()
>>> oid
ObjectId('4f96f20c430ee6bd06000000')
>>> pickle.dumps(oid, protocol=2)
b'\x80\x02cbson.objectid\nObjectId\nq\x00)\x81q\x01c_codecs\nencode\...'

Python 2.7.15 (default, Jun 21 2018, 15:00:48)
[GCC 7.3.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle
>>> pickle.loads('\x80\x02cbson.objectid\nObjectId\nq\x00)\x81q\x01c_codecs\nencode\...')
ObjectId('4f96f20c430ee6bd06000000')

PyMongo 3 Migration Guide

PyMongo 3 is a partial rewrite bringing a large number of improvements. It also brings a number of backward breaking changes. This guide provides a roadmap for migrating an existing application from PyMongo 2.x to 3.x or writing libraries that will work with both PyMongo 2.x and 3.x.

PyMongo 2.9

The first step in any successful migration involves upgrading to, or requiring, at least PyMongo 2.9. If your project has a requirements.txt file, add the line “pymongo >= 2.9, < 3.0” until you have completely migrated to PyMongo 3. Most of the key new methods and options from PyMongo 3.0 are backported in PyMongo 2.9, making migration much easier.

Enable Deprecation Warnings

Starting with PyMongo 2.9, DeprecationWarning is raised by most methods removed in PyMongo 3.0. Make sure you enable runtime warnings to see where deprecated functions and methods are being used in your application:

python -Wd <your application>

Warnings can also be changed to errors:

python -Wd -Werror <your application>
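
If you cannot control how the interpreter is invoked, the same filters can be enabled programmatically with the standard library’s warnings module; a minimal sketch:

>>> import warnings
>>> warnings.simplefilter("default", DeprecationWarning)  # show warnings
>>> # or, to turn them into errors:
>>> warnings.simplefilter("error", DeprecationWarning)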

Note

Not all deprecated features raise DeprecationWarning when used. For example, the find() options renamed in PyMongo 3.0 do not raise DeprecationWarning when used in PyMongo 2.x. See also Removed features with no migration path.

CRUD API

Changes to find() and find_one()
“spec” renamed “filter”

The spec option has been renamed to filter. Code like this:

>>> cursor = collection.find(spec={"a": 1})

can be changed to this with PyMongo 2.9 or later:

>>> cursor = collection.find(filter={"a": 1})

or this with any version of PyMongo:

>>> cursor = collection.find({"a": 1})
“fields” renamed “projection”

The fields option has been renamed to projection. Code like this:

>>> cursor = collection.find({"a": 1}, fields={"_id": False})

can be changed to this with PyMongo 2.9 or later:

>>> cursor = collection.find({"a": 1}, projection={"_id": False})

or this with any version of PyMongo:

>>> cursor = collection.find({"a": 1}, {"_id": False})
“partial” renamed “allow_partial_results”

The partial option has been renamed to allow_partial_results. Code like this:

>>> cursor = collection.find({"a": 1}, partial=True)

can be changed to this with PyMongo 2.9 or later:

>>> cursor = collection.find({"a": 1}, allow_partial_results=True)
“timeout” replaced by “no_cursor_timeout”

The timeout option has been replaced by no_cursor_timeout. Code like this:

>>> cursor = collection.find({"a": 1}, timeout=False)

can be changed to this with PyMongo 2.9 or later:

>>> cursor = collection.find({"a": 1}, no_cursor_timeout=True)
“network_timeout” is removed

The network_timeout option has been removed. This option was always the wrong solution for timing out long-running queries and should never be used in production. Starting with MongoDB 2.6 you can use the $maxTimeMS query modifier. Code like this:

# Set a 5 second select() timeout.
>>> cursor = collection.find({"a": 1}, network_timeout=5)

can be changed to this with PyMongo 2.9 or later:

# Set a 5 second (5000 millisecond) server side query timeout.
>>> cursor = collection.find({"a": 1}, modifiers={"$maxTimeMS": 5000})

or with PyMongo 3.5 or later:

>>> cursor = collection.find({"a": 1}, max_time_ms=5000)

or with any version of PyMongo:

>>> cursor = collection.find({"$query": {"a": 1}, "$maxTimeMS": 5000})

See also

$maxTimeMS

Tailable cursors

The tailable and await_data options have been replaced by cursor_type. Code like this:

>>> cursor = collection.find({"a": 1}, tailable=True)
>>> cursor = collection.find({"a": 1}, tailable=True, await_data=True)

can be changed to this with PyMongo 2.9 or later:

>>> from pymongo import CursorType
>>> cursor = collection.find({"a": 1}, cursor_type=CursorType.TAILABLE)
>>> cursor = collection.find({"a": 1}, cursor_type=CursorType.TAILABLE_AWAIT)
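
For example, a tailable cursor can be used to follow inserts into a capped collection; a minimal sketch, assuming a capped collection named “events”:

>>> import time
>>> cursor = db.events.find(cursor_type=CursorType.TAILABLE_AWAIT)
>>> while cursor.alive:
...     for doc in cursor:
...         print(doc)
...     time.sleep(1)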
Other removed options

The slave_okay, read_preference, tag_sets, and secondary_acceptable_latency_ms options have been removed. See the Read Preferences section for solutions.

The aggregate method always returns a cursor

PyMongo 2.6 added an option to return an iterable cursor from aggregate(). In PyMongo 3 aggregate() always returns a cursor. Use the cursor option for consistent behavior with PyMongo 2.9 and later:

>>> for result in collection.aggregate([], cursor={}):
...     pass

Read Preferences

The “slave_okay” option is removed

The slave_okay option is removed from PyMongo’s API. The secondaryPreferred read preference provides the same behavior. Code like this:

>>> client = MongoClient(slave_okay=True)

can be changed to this with PyMongo 2.9 or later:

>>> client = MongoClient(readPreference="secondaryPreferred")
The “read_preference” attribute is immutable

Code like this:

>>> from pymongo import ReadPreference
>>> db = client.my_database
>>> db.read_preference = ReadPreference.SECONDARY

can be changed to this with PyMongo 2.9 or later:

>>> db = client.get_database("my_database",
...                          read_preference=ReadPreference.SECONDARY)

Code like this:

>>> cursor = collection.find({"a": 1},
...                          read_preference=ReadPreference.SECONDARY)

can be changed to this with PyMongo 2.9 or later:

>>> coll2 = collection.with_options(read_preference=ReadPreference.SECONDARY)
>>> cursor = coll2.find({"a": 1})

See also

get_collection()

The “tag_sets” option and attribute are removed

The tag_sets MongoClient option is removed. The read_preference option can be used instead. Code like this:

>>> client = MongoClient(
...     read_preference=ReadPreference.SECONDARY,
...     tag_sets=[{"dc": "ny"}, {"dc": "sf"}])

can be changed to this with PyMongo 2.9 or later:

>>> from pymongo.read_preferences import Secondary
>>> client = MongoClient(read_preference=Secondary([{"dc": "ny"}]))

To change the tag sets for a Database or Collection, code like this:

>>> db = client.my_database
>>> db.read_preference = ReadPreference.SECONDARY
>>> db.tag_sets = [{"dc": "ny"}]

can be changed to this with PyMongo 2.9 or later:

>>> db = client.get_database("my_database",
...                          read_preference=Secondary([{"dc": "ny"}]))

Code like this:

>>> cursor = collection.find(
...     {"a": 1},
...     read_preference=ReadPreference.SECONDARY,
...     tag_sets=[{"dc": "ny"}])

can be changed to this with PyMongo 2.9 or later:

>>> from pymongo.read_preferences import Secondary
>>> coll2 = collection.with_options(
...     read_preference=Secondary([{"dc": "ny"}]))
>>> cursor = coll2.find({"a": 1})

See also

get_collection()

The “secondary_acceptable_latency_ms” option and attribute are removed

PyMongo 2.x supports secondary_acceptable_latency_ms as an option to methods throughout the driver, but mongos only supports a global latency option. PyMongo 3.x has changed to match the behavior of mongos, allowing migration from a single server, to a replica set, to a sharded cluster without a surprising change in server selection behavior. A new option, localThresholdMS, is available through MongoClient and should be used in place of secondaryAcceptableLatencyMS. Code like this:

>>> client = MongoClient(readPreference="nearest",
...                      secondaryAcceptableLatencyMS=100)

can be changed to this with PyMongo 2.9 or later:

>>> client = MongoClient(readPreference="nearest",
...                      localThresholdMS=100)

Write Concern

The “safe” option is removed

In PyMongo 3 the safe option is removed from the entire API. MongoClient has always defaulted to acknowledged write operations and continues to do so in PyMongo 3.

The “write_concern” attribute is immutable

The write_concern attribute is immutable in PyMongo 3. Code like this:

>>> client = MongoClient()
>>> client.write_concern = {"w": "majority"}

can be changed to this with any version of PyMongo:

>>> client = MongoClient(w="majority")

Code like this:

>>> db = client.my_database
>>> db.write_concern = {"w": "majority"}

can be changed to this with PyMongo 2.9 or later:

>>> from pymongo import WriteConcern
>>> db = client.get_database("my_database",
...                          write_concern=WriteConcern(w="majority"))

The new CRUD API write methods do not accept write concern options. Code like this:

>>> oid = collection.insert({"a": 2}, w="majority")

can be changed to this with PyMongo 2.9 or later:

>>> from pymongo import WriteConcern
>>> coll2 = collection.with_options(
...     write_concern=WriteConcern(w="majority"))
>>> oid = coll2.insert({"a": 2})

See also

get_collection()

Codec Options

The “document_class” attribute is removed

Code like this:

>>> from bson.son import SON
>>> client = MongoClient()
>>> client.document_class = SON

can be replaced by this in any version of PyMongo:

>>> from bson.son import SON
>>> client = MongoClient(document_class=SON)

or to change the document_class for a Database with PyMongo 2.9 or later:

>>> from bson.codec_options import CodecOptions
>>> from bson.son import SON
>>> db = client.get_database("my_database", CodecOptions(SON))
The “uuid_subtype” option and attribute are removed

Code like this:

>>> from bson.binary import JAVA_LEGACY
>>> db = client.my_database
>>> db.uuid_subtype = JAVA_LEGACY

can be replaced by this with PyMongo 2.9 or later:

>>> from bson.binary import JAVA_LEGACY
>>> from bson.codec_options import CodecOptions
>>> db = client.get_database("my_database",
...                          CodecOptions(uuid_representation=JAVA_LEGACY))

MongoClient

MongoClient connects asynchronously

In PyMongo 3, the MongoClient constructor no longer blocks while connecting to the server or servers, and it no longer raises ConnectionFailure if they are unavailable, nor ConfigurationError if the user’s credentials are wrong. Instead, the constructor returns immediately and launches the connection process on background threads. The connect option is added to control whether these threads are started immediately, or when the client is first used.

For consistent behavior in PyMongo 2.x and PyMongo 3.x, code like this:

>>> from pymongo.errors import ConnectionFailure
>>> try:
...     client = MongoClient()
... except ConnectionFailure:
...     print("Server not available")
>>>

can be changed to this with PyMongo 2.9 or later:

>>> from pymongo.errors import ConnectionFailure
>>> client = MongoClient(connect=False)
>>> try:
...     result = client.admin.command("ismaster")
... except ConnectionFailure:
...     print("Server not available")
>>>

Any operation can be used to determine if the server is available; we choose the “ismaster” command here because it is cheap and does not require auth.

The max_pool_size parameter is removed

PyMongo 3 replaced the max_pool_size parameter with support for the MongoDB URI maxPoolSize option. Code like this:

>>> client = MongoClient(max_pool_size=10)

can be replaced by this with PyMongo 2.9 or later:

>>> client = MongoClient(maxPoolSize=10)
>>> client = MongoClient("mongodb://localhost:27017/?maxPoolSize=10")
The “disconnect” method is removed

Code like this:

>>> client.disconnect()

can be replaced by this with PyMongo 2.9 or later:

>>> client.close()
The host and port attributes are removed

Code like this:

>>> host = client.host
>>> port = client.port

can be replaced by this with PyMongo 2.9 or later:

>>> address = client.address
>>> host, port = address or (None, None)

BSON

“as_class”, “tz_aware”, and “uuid_subtype” are removed

The as_class, tz_aware, and uuid_subtype parameters have been removed from the functions provided in bson. Code like this:

>>> from bson import BSON
>>> from bson.son import SON
>>> encoded = BSON.encode({"a": 1}, as_class=SON)

can be replaced by this in PyMongo 2.9 or later:

>>> from bson import BSON
>>> from bson.codec_options import CodecOptions
>>> from bson.son import SON
>>> encoded = BSON.encode({"a": 1}, codec_options=CodecOptions(SON))

Removed features with no migration path

MasterSlaveConnection is removed

Master-slave deployments are deprecated in MongoDB. Starting with MongoDB 3.0, a replica set can have up to 50 members, and that limit is likely to be removed in later releases. We recommend migrating to replica sets instead.

Requests are removed

The client methods start_request, in_request, and end_request are removed. Requests were designed to make read-your-writes consistency more likely with the w=0 write concern. Additionally, a thread in a request used the same member for all secondary reads in a replica set. To ensure read-your-writes consistency in PyMongo 3.0, do not override the default write concern with w=0, and do not override the default read preference of PRIMARY.
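
In other words, the driver’s defaults already provide read-your-writes consistency; a minimal sketch (database and collection names are illustrative):

>>> client = MongoClient()             # acknowledged writes (w=1) by default
>>> collection = client.test.events
>>> result = collection.insert_one({"a": 1})  # acknowledged by the primary
>>> doc = collection.find_one({"a": 1})       # read preference is PRIMARY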

The “compile_re” option is removed

In PyMongo 3 regular expressions are never compiled to Python pattern objects.
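
Instead, BSON regular expressions are decoded to bson.Regex instances. If your application needs a compiled Python pattern object, convert explicitly, for example:

>>> from bson.regex import Regex
>>> regex = Regex("^mongo", "i")
>>> pattern = regex.try_compile()  # compile to a Python pattern object
>>> bool(pattern.match("MongoDB"))
True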

The “use_greenlets” option is removed

The use_greenlets option was meant to allow use of PyMongo with Gevent without the use of gevent.monkey.patch_threads(). This option caused a lot of confusion and made it difficult to support alternative asynchronous I/O libraries like Eventlet. Users of Gevent should use gevent.monkey.patch_all() instead.
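
For example, patch the standard library as early as possible, before pymongo or any other modules are imported:

>>> from gevent import monkey
>>> monkey.patch_all()  # must run before other imports
>>> import pymongo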

See also

Gevent

Developer Guide

Technical guide for contributors to PyMongo.

Periodic Executors

PyMongo implements a PeriodicExecutor for two purposes: as the background thread for Monitor, and to regularly check if there are OP_KILL_CURSORS messages that must be sent to the server.

Killing Cursors

An incompletely iterated Cursor on the client represents an open cursor object on the server. In code like this, we lose a reference to the cursor before finishing iteration:

for doc in collection.find():
    raise Exception()

We try to send an OP_KILL_CURSORS to the server to tell it to clean up the server-side cursor. But we must not take any locks directly from the cursor’s destructor (see PYTHON-799), so we cannot safely use the PyMongo data structures required to send a message. The solution is to add the cursor’s id to an array on the MongoClient without taking any locks.

Each client has a PeriodicExecutor devoted to checking the array for cursor ids. Any ids it sees belong to cursors that were freed while the server-side cursor was still open. The executor can safely take the locks it needs in order to send the OP_KILL_CURSORS message.
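
A simplified sketch of this hand-off; the class and method names here are hypothetical, not PyMongo’s actual internals:

class Client:
    def __init__(self):
        # A plain list: appends are atomic under the GIL, so the
        # cursor's destructor never needs to take a lock.
        self._kill_cursors_queue = []

    def _close_cursor_soon(self, cursor_id):
        # Called from Cursor.__del__: just record the id, lock-free.
        self._kill_cursors_queue.append(cursor_id)

    def _process_kill_cursors(self):
        # Called by the periodic executor, which may safely take locks.
        while self._kill_cursors_queue:
            cursor_id = self._kill_cursors_queue.pop()
            self._send_kill_cursors(cursor_id)  # may take locks safely

    def _send_kill_cursors(self, cursor_id):
        pass  # placeholder: build and send the OP_KILL_CURSORS message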

Stopping Executors

Just as Cursor must not take any locks from its destructor, neither can MongoClient and Topology. Thus, although the client calls close() on its kill-cursors thread, and the topology calls close() on all its monitor threads, the close() method cannot actually call wake() on the executor, since wake() takes a lock.

Instead, executors wake periodically to check if self.close is set, and if so they exit.

A thread can log spurious errors if it wakes late in the Python interpreter’s shutdown sequence, so we try to join threads before then. Each periodic executor (either a monitor or a kill-cursors thread) adds a weakref to itself to a set called _EXECUTORS, in the periodic_executor module.

An exit handler runs on shutdown and tells all executors to stop, then tries (with a short timeout) to join all executor threads.
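
A sketch of that shutdown sequence, with illustrative names:

import atexit
import weakref

_EXECUTORS = set()  # weakrefs to all live periodic executors

def _register_executor(executor):
    # Dead weakrefs remove themselves from the set via the callback.
    _EXECUTORS.add(weakref.ref(executor, _EXECUTORS.discard))

@atexit.register
def _shutdown_executors():
    executors = [ref() for ref in list(_EXECUTORS)]
    for executor in executors:
        if executor is not None:
            executor.close()          # tell the thread to stop
    for executor in executors:
        if executor is not None:
            executor.join(timeout=1)  # join with a short timeout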

Monitoring

For each server in the topology, Topology uses a periodic executor to launch a monitor thread. This thread must not prevent the topology from being freed, so it weakrefs the topology. Furthermore, it uses a weakref callback to terminate itself soon after the topology is freed.
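
A sketch of the weakref pattern, with hypothetical names:

import weakref

class Monitor:
    def __init__(self, topology, executor):
        self._executor = executor

        def on_topology_freed(ref, executor=executor):
            # Runs soon after the topology is garbage collected.
            # close() only sets a flag, so no locks are taken here.
            executor.close()

        # A weak reference, so the monitor never keeps the topology alive.
        self._topology_ref = weakref.ref(topology, on_topology_freed)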

Solid lines represent strong references, dashed lines weak ones:

[Figure: periodic-executor-refs.png]

See Stopping Executors above for an explanation of the _EXECUTORS set.

It is a requirement of the Server Discovery And Monitoring Spec that a sleeping monitor can be awakened early. Aside from infrequent wakeups to do their appointed chores, and occasional early interruptions, periodic executors also wake periodically to check whether they should terminate.

Our first implementation of this idea was the obvious one: use the Python standard library’s threading.Condition.wait with a timeout. Another thread wakes the executor early by signaling the condition variable.

A topology cannot signal the condition variable to tell the executor to terminate, because it would risk a deadlock in the garbage collector: no destructor or weakref callback can take a lock to signal the condition variable (see PYTHON-863); thus the only way for a dying object to terminate a periodic executor is to set its “stopped” flag and let the executor see the flag next time it wakes.

We erred on the side of prompt cleanup, and set the check interval at 100ms. We assumed that checking a flag and going back to sleep 10 times a second was cheap on modern machines.

Starting in Python 3.2, the builtin C implementation of lock.acquire takes a timeout parameter, so Python 3.2+ Condition variables sleep simply by calling lock.acquire; they are implemented as efficiently as expected.

But in Python 2, lock.acquire has no timeout. To wait with a timeout, a Python 2 condition variable sleeps a millisecond, tries to acquire the lock, sleeps twice as long, and tries again. This exponential backoff reaches a maximum sleep time of 50ms.
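
A simplified sketch of that polling loop, modeled on CPython 2’s threading.Condition.wait (details approximate):

import time

def wait_with_timeout(waiter_lock, timeout):
    # waiter_lock is held until the condition is signaled; poll it
    # with exponentially growing sleeps instead of blocking.
    endtime = time.time() + timeout
    delay = 0.001  # start around a millisecond
    while True:
        if waiter_lock.acquire(False):  # non-blocking attempt
            return True
        remaining = endtime - time.time()
        if remaining <= 0:
            return False  # timed out
        delay = min(delay * 2, remaining, 0.05)  # cap at 50ms
        time.sleep(delay)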

If PyMongo calls the condition variable’s “wait” method with a short timeout, the exponential backoff is restarted frequently. Overall, the condition variable is not waking a few times a second, but hundreds of times. (See PYTHON-983.)

Thus the current design of periodic executors is surprisingly simple: they do a simple time.sleep for a half-second, check if it is time to wake or terminate, and sleep again.
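
A minimal sketch of that loop, with names and details simplified from PyMongo’s actual implementation:

import threading
import time

class SimplePeriodicExecutor:
    def __init__(self, interval, target):
        self._interval = interval  # seconds between runs of target
        self._target = target      # the periodic chore to perform
        self._stopped = False

    def open(self):
        thread = threading.Thread(target=self._run)
        thread.daemon = True
        thread.start()

    def close(self):
        # Safe to call from destructors and weakref callbacks:
        # it only sets a flag and never takes a lock.
        self._stopped = True

    def _run(self):
        deadline = 0.0
        while not self._stopped:
            now = time.time()
            if now >= deadline:
                self._target()
                deadline = now + self._interval
            time.sleep(0.5)  # wake every half-second to re-check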