bson – BSON (Binary JSON) Encoding and Decoding

BSON (Binary JSON) encoding and decoding.

The mapping from Python types to BSON types is as follows:

Python Type               BSON Type        Supported Direction
========================  ===============  ===================
None                      null             both
bool                      boolean          both
int [1]                   int32 / int64    py -> bson
bson.int64.Int64          int64            both
float                     number (real)    both
str                       string           both
list                      array            both
dict / SON                object           both
datetime.datetime [2][3]  date             both
bson.regex.Regex          regex            both
compiled re [4]           regex            py -> bson
bson.binary.Binary        binary           both
bson.objectid.ObjectId    oid              both
bson.dbref.DBRef          dbref            both
None                      undefined        bson -> py
bson.code.Code            code             both
str                       symbol           bson -> py
bytes [5]                 binary           both

[1] A Python int will be saved as a BSON int32 or BSON int64 depending on its size. A BSON int32 will always decode to a Python int. A BSON int64 will always decode to an Int64.

[2] datetime.datetime instances will be rounded to the nearest millisecond when saved.

[3] All datetime.datetime instances are treated as naive. Clients should always use UTC.

[4] Regex instances and regular expression objects from re.compile() are both saved as BSON regular expressions. BSON regular expressions are decoded as Regex instances.

[5] The bytes type is encoded as BSON binary with subtype 0. It will be decoded back to bytes.
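The size rule for Python int in the notes above can be sketched without the bson package. The helper name bson_int_type is hypothetical and is not part of the library; it assumes the usual signed 32-bit and 64-bit ranges and only reports which BSON integer type a value of that size would need.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1
INT64_MIN, INT64_MAX = -2**63, 2**63 - 1


def bson_int_type(value: int) -> str:
    """Return the BSON integer type a plain Python int of this size needs.

    Hypothetical helper for illustration; not part of the bson package.
    """
    # bool is a subclass of int in Python, but maps to BSON boolean instead.
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError("expected a plain int")
    if INT32_MIN <= value <= INT32_MAX:
        return "int32"
    if INT64_MIN <= value <= INT64_MAX:
        return "int64"
    # Assumption: values wider than 8 bytes cannot be stored as a BSON integer.
    raise OverflowError("value does not fit in 8 bytes")


print(bson_int_type(1))      # int32
print(bson_int_type(2**31))  # int64
```

Wrapping a small value in Int64 is the documented way to force the int64 encoding when the size rule would otherwise pick int32.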

class bson.BSON

BSON (Binary JSON) data.

Warning

Using this class to encode and decode BSON adds a performance cost. For better performance use the module level functions encode() and decode() instead.

decode(codec_options: CodecOptions[_DocumentType] = CodecOptions(document_class=dict, tz_aware=False, uuid_representation=UuidRepresentation.UNSPECIFIED, unicode_decode_error_handler='strict', tzinfo=None, type_registry=TypeRegistry(type_codecs=[], fallback_encoder=None), datetime_conversion=DatetimeConversion.DATETIME)) bson.codec_options._DocumentType

Decode this BSON data.

By default, returns a BSON document represented as a Python dict. To use a different MutableMapping class, configure a CodecOptions:

>>> import collections  # From Python standard library.
>>> import bson
>>> from bson.codec_options import CodecOptions
>>> data = bson.BSON.encode({'a': 1})
>>> decoded_doc = bson.BSON(data).decode()
>>> type(decoded_doc)
<class 'dict'>
>>> options = CodecOptions(document_class=collections.OrderedDict)
>>> decoded_doc = bson.BSON(data).decode(codec_options=options)
>>> type(decoded_doc)
<class 'collections.OrderedDict'>

Parameters
  • codec_options (optional): An instance of CodecOptions.

Changed in version 3.0: Removed compile_re option: PyMongo now always represents BSON regular expressions as Regex objects. Use try_compile() to attempt to convert from a BSON regular expression to a Python regular expression object.

Replaced as_class, tz_aware, and uuid_subtype options with codec_options.

classmethod encode(document: Mapping[str, Any], check_keys: bool = False, codec_options: bson.codec_options.CodecOptions = CodecOptions(document_class=dict, tz_aware=False, uuid_representation=UuidRepresentation.UNSPECIFIED, unicode_decode_error_handler='strict', tzinfo=None, type_registry=TypeRegistry(type_codecs=[], fallback_encoder=None), datetime_conversion=DatetimeConversion.DATETIME)) bson.BSON

Encode a document to a new BSON instance.

A document can be any mapping type (like dict).

Raises TypeError if document is not a mapping type, or contains keys that are not instances of basestring (str in Python 3). Raises InvalidDocument if document cannot be converted to BSON.

Parameters
  • document: mapping type representing a document

  • check_keys (optional): check if keys start with ‘$’ or contain ‘.’, raising InvalidDocument in either case

  • codec_options (optional): An instance of CodecOptions.

Changed in version 3.0: Replaced uuid_subtype option with codec_options.

class bson.Binary(data: Union[memoryview, bytes, _mmap, _array], subtype: int = 0)

Representation of BSON binary data.

This is necessary because we want to represent Python strings as the BSON string type. We need to wrap binary data so we can tell the difference between what should be considered binary data and what should be considered a string when we encode to BSON.

Raises TypeError if data is not an instance of bytes (str in Python 2) or subtype is not an instance of int. Raises ValueError if subtype is not in [0, 256).

Note

In Python 3, instances of Binary with subtype 0 will be decoded directly to bytes.

Parameters
  • data: the binary data to represent. Can be any bytes-like type that implements the buffer protocol.

  • subtype (optional): the binary subtype to use

Changed in version 3.9: Support any bytes-like type that implements the buffer protocol.

as_uuid(uuid_representation: int = 4) uuid.UUID

Create a Python UUID from this BSON Binary object.

Decodes this binary object as a native uuid.UUID instance with the provided uuid_representation.

Raises ValueError if this Binary instance does not contain a UUID.

Parameters
  • uuid_representation: A member of UuidRepresentation. Default: STANDARD.

New in version 3.11.

classmethod from_uuid(uuid: uuid.UUID, uuid_representation: int = 4) bson.binary.Binary

Create a BSON Binary object from a Python UUID.

Creates a Binary object from a uuid.UUID instance. Assumes that the native uuid.UUID instance uses the byte-order implied by the provided uuid_representation.

Raises TypeError if uuid is not an instance of UUID.

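Parameters
  • uuid: A uuid.UUID instance.

  • uuid_representation: A member of UuidRepresentation. Default: STANDARD.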

New in version 3.11.
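Why the byte order matters can be seen with the standard library alone. The sketch below does not call Binary.from_uuid() or as_uuid(). It only shows three 16-byte layouts of the same uuid.UUID. Which layout belongs to which UuidRepresentation member is an assumption stated in the comments, not something this page specifies.

```python
import uuid

u = uuid.UUID("00112233-4455-6677-8899-aabbccddeeff")

# Big-endian (RFC 4122) byte order, as exposed by uuid.UUID.bytes.
# Assumption: this is the layout used by the standard representation.
standard = u.bytes

# The first three fields in little-endian order, as exposed by bytes_le.
# Assumption: this is the layout a C#-style legacy representation expects.
csharp_style = u.bytes_le

# Assumption: a Java-style legacy layout reverses each 8-byte half.
java_style = u.bytes[:8][::-1] + u.bytes[8:][::-1]

print(standard.hex())      # 00112233445566778899aabbccddeeff
print(csharp_style.hex())  # 33221100554477668899aabbccddeeff
print(java_style.hex())    # 7766554433221100ffeeddccbbaa9988

# Reading bytes back with the wrong byte order yields a different UUID.
misread = uuid.UUID(bytes=csharp_style)
print(misread == u)        # False
```

The same 16 bytes decode to different UUID values under different representations, which is why the uuid_representation passed when decoding should match the one used when the data was written.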

property subtype: int

Subtype of this binary data.

class bson.Code(code: Union[str, bson.code.Code], scope: Optional[Mapping[str, Any]] = None, **kwargs: Any)

BSON’s JavaScript code type.

Raises TypeError if code is not an instance of basestring (str in Python 3), or if scope is neither None nor an instance of dict.

Scope variables can be set by passing a dictionary as the scope argument or by using keyword arguments. If a variable is set as a keyword argument it will override any setting for that variable in the scope dictionary.

Parameters
  • code: A string containing JavaScript code to be evaluated or another instance of Code. In the latter case, the scope of code becomes this Code’s scope.

  • scope (optional): dictionary representing the scope in which code should be evaluated - a mapping from identifiers (as strings) to values. Defaults to None. This is applied after any scope associated with a given code above.

  • **kwargs (optional): scope variables can also be passed as keyword arguments. These are applied after scope and code.
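The order in which the three sources of scope variables are applied can be sketched with plain dictionaries. The function merge_scope is hypothetical and only models the precedence described above (scope of an existing Code first, then scope, then keyword arguments); it is not the library's implementation.

```python
from typing import Any, Mapping, Optional


def merge_scope(
    code_scope: Optional[Mapping[str, Any]],
    scope: Optional[Mapping[str, Any]],
    **kwargs: Any,
) -> Optional[dict]:
    """Combine scope variables; later sources override earlier ones.

    Hypothetical helper for illustration; not part of the bson package.
    """
    # Assumption: with no scope information at all, the result stays None.
    if code_scope is None and scope is None and not kwargs:
        return None
    merged: dict = {}
    merged.update(code_scope or {})  # scope carried by an existing Code
    merged.update(scope or {})       # then the explicit scope argument
    merged.update(kwargs)            # keyword arguments are applied last
    return merged


result = merge_scope({"x": 1, "y": 1}, {"y": 2, "z": 2}, z=3)
print(result)  # {'x': 1, 'y': 2, 'z': 3}
```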

Changed in version 3.4: The default value for scope is None instead of {}.

property scope: Optional[Mapping[str, Any]]

Scope dictionary for this instance or None.

class bson.CodecOptions(document_class: Optional[Type[Mapping[str, Any]]] = None, tz_aware: bool = False, uuid_representation: Optional[int] = 0, unicode_decode_error_handler: str = 'strict', tzinfo: Optional[datetime.tzinfo] = None, type_registry: Optional[bson.codec_options.TypeRegistry] = None, datetime_conversion: Optional[bson.codec_options.DatetimeConversion] = DatetimeConversion.DATETIME)

Create new instance of _BaseCodecOptions(document_class, tz_aware, uuid_representation, unicode_decode_error_handler, tzinfo, type_registry, datetime_conversion)

with_options(**kwargs: Any) bson.codec_options.CodecOptions

Make a copy of this CodecOptions, overriding some options:

>>> from bson.codec_options import DEFAULT_CODEC_OPTIONS
>>> DEFAULT_CODEC_OPTIONS.tz_aware
False
>>> options = DEFAULT_CODEC_OPTIONS.with_options(tz_aware=True)
>>> options.tz_aware
True

New in version 3.5.

class bson.DBRef(collection: str, id: Any, database: Optional[str] = None, _extra: Optional[Mapping[str, Any]] = None, **kwargs: Any)

Initialize a new DBRef.

Raises TypeError if collection or database is not an instance of basestring (str in Python 3). database is optional and allows references to documents to work across databases. Any additional keyword arguments will create additional fields in the resultant embedded document.

Parameters
  • collection: name of the collection the document is stored in

  • id: the value of the document’s "_id" field

  • database (optional): name of the database to reference

  • **kwargs (optional): additional keyword arguments will create additional, custom fields

See also

The MongoDB documentation on dbrefs.

as_doc() bson.son.SON[str, Any]

Get the SON document representation of this DBRef.

Generally not needed by application developers.
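As a rough picture of the document as_doc() produces, the sketch below builds an ordered mapping from the same inputs. The function dbref_as_doc is hypothetical, a plain dict stands in for SON, and the field names "$ref", "$id" and "$db" are assumed from the MongoDB DBRef convention rather than taken from this page.

```python
from typing import Any, Optional


def dbref_as_doc(
    collection: str,
    ref_id: Any,
    database: Optional[str] = None,
    **kwargs: Any,
) -> dict:
    """Build a DBRef-style document (a plain dict stands in for SON).

    Hypothetical helper for illustration; not part of the bson package.
    """
    doc = {"$ref": collection, "$id": ref_id}
    if database is not None:
        doc["$db"] = database  # only present when a database is given
    doc.update(kwargs)         # extra keyword arguments become custom fields
    return doc


doc = dbref_as_doc("users", 42, database="app", note="primary")
print(list(doc))  # ['$ref', '$id', '$db', 'note']
```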

property collection: str

Get the name of this DBRef’s collection.

property database: Optional[str]

Get the name of this DBRef’s database.

Returns None if this DBRef doesn’t specify a database.

property id: Any

Get this DBRef’s _id.

class bson.DatetimeConversion(value)

Options for decoding BSON datetimes.

DATETIME = 1

Decode a BSON UTC datetime as a datetime.datetime.

BSON UTC datetimes that cannot be represented as a datetime will raise an OverflowError or a ValueError.

DATETIME_AUTO = 4

Decode a BSON UTC datetime as a datetime.datetime if possible, and a DatetimeMS if not.

DATETIME_CLAMP = 2

Decode a BSON UTC datetime as a datetime.datetime, clamping to min and max.

DATETIME_MS = 3

Decode a BSON UTC datetime as a DatetimeMS object.
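The difference between the four options shows up only for values outside the range of datetime.datetime. The sketch below models them with the standard library: decode_millis is a hypothetical function, the mode strings are illustrative names, a plain int stands in for DatetimeMS, and clamping to datetime.min and datetime.max is a simplification.

```python
import datetime as dt

EPOCH = dt.datetime(1970, 1, 1)  # naive, interpreted as UTC
MODES = ("datetime", "datetime_clamp", "datetime_ms", "datetime_auto")


def decode_millis(millis: int, mode: str = "datetime"):
    """Turn milliseconds since the Unix epoch into a value, per mode.

    Hypothetical model for illustration; not part of the bson package.
    """
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if mode == "datetime_ms":
        return millis  # always keep the raw millisecond count
    try:
        return EPOCH + dt.timedelta(milliseconds=millis)
    except OverflowError:
        if mode == "datetime_clamp":
            return dt.datetime.max if millis > 0 else dt.datetime.min
        if mode == "datetime_auto":
            return millis  # fall back to the raw millisecond count
        raise  # "datetime": out-of-range values are an error


print(decode_millis(1500))                      # 1970-01-01 00:00:01.500000
print(decode_millis(2**62, "datetime_clamp"))   # 9999-12-31 23:59:59.999999
print(decode_millis(2**62, "datetime_auto"))    # 4611686018427387904
```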

class bson.DatetimeMS(value: Union[int, datetime.datetime])

Represents a BSON UTC datetime.

BSON UTC datetimes are defined as an int64 of milliseconds since the Unix epoch. The principal use of DatetimeMS is to represent datetimes outside the range of the Python builtin datetime class when encoding/decoding BSON.

To decode UTC datetimes as a DatetimeMS, datetime_conversion in CodecOptions must be set to ‘datetime_ms’ or ‘datetime_auto’. See Handling out of range datetimes for details.

Parameters
  • value: An instance of datetime.datetime to be represented as milliseconds since the Unix epoch, or int of milliseconds since the Unix epoch.

as_datetime(codec_options: bson.codec_options.CodecOptions = CodecOptions(document_class=dict, tz_aware=False, uuid_representation=UuidRepresentation.UNSPECIFIED, unicode_decode_error_handler='strict', tzinfo=None, type_registry=TypeRegistry(type_codecs=[], fallback_encoder=None), datetime_conversion=DatetimeConversion.DATETIME)) datetime.datetime

Create a Python datetime from this DatetimeMS object.

Parameters
  • codec_options: A CodecOptions instance specifying how the resulting datetime is created, using its tz_aware and tzinfo options. Defaults to DEFAULT_CODEC_OPTIONS.

class bson.Decimal128(value: Union[decimal.Decimal, float, str, Tuple[int, Sequence[int], int]])

BSON Decimal128 type:

>>> Decimal128(Decimal("0.0005"))
Decimal128('0.0005')
>>> Decimal128("0.0005")
Decimal128('0.0005')
>>> Decimal128((3474527112516337664, 5))
Decimal128('0.0005')
Parameters
  • value: An instance of decimal.Decimal, string, or tuple of (high bits, low bits) from Binary Integer Decimal (BID) format.
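To see why the tuple (3474527112516337664, 5) in the example above represents 0.0005, the two 64-bit words can be decoded by hand. The sketch below is a simplified reading of the BID layout that covers only the common finite form, with a sign bit, a 14-bit exponent biased by 6176, and a coefficient spread over the remaining bits. The function decode_bid_words is hypothetical and does not handle NaN, Infinity, or the alternate encoding.

```python
import decimal

EXPONENT_BIAS = 6176


def decode_bid_words(high: int, low: int) -> decimal.Decimal:
    """Decode the common finite form of a (high bits, low bits) pair.

    Hypothetical helper for illustration; not part of the bson package.
    """
    # When the two bits below the sign are both set, the value is either
    # special (NaN, Infinity) or uses an alternate layout. Not handled here.
    if ((high >> 61) & 0b11) == 0b11:
        raise NotImplementedError("special or alternate encodings not handled")
    sign = (high >> 63) & 1
    exponent = ((high >> 49) & 0x3FFF) - EXPONENT_BIAS
    # The low 49 bits of the high word are the top of the coefficient.
    coefficient = ((high & 0x1FFFFFFFFFFFF) << 64) | low
    digits = tuple(int(d) for d in str(coefficient))
    return decimal.Decimal((sign, digits, exponent))


print(decode_bid_words(3474527112516337664, 5))  # 0.0005
```

Here the high word is 0x3038000000000000, so the biased exponent is 6172, the true exponent is -4, and the coefficient is 5.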

Note

Decimal128 uses an instance of decimal.Context configured for IEEE-754 Decimal128 when validating parameters. Signals like decimal.InvalidOperation, decimal.Inexact, and decimal.Overflow are trapped and raised as exceptions:

>>> Decimal128(".13.1")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  ...
decimal.InvalidOperation: [<class 'decimal.ConversionSyntax'>]
>>>
>>> Decimal128("1E-6177")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  ...
decimal.Inexact: [<class 'decimal.Inexact'>]
>>>
>>> Decimal128("1E6145")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  ...
decimal.Overflow: [<class 'decimal.Overflow'>, <class 'decimal.Rounded'>]

To ensure the result of a calculation can always be stored as BSON Decimal128 use the context returned by create_decimal128_context():

>>> import decimal
>>> from bson.decimal128 import Decimal128, create_decimal128_context
>>> decimal128_ctx = create_decimal128_context()
>>> with decimal.localcontext(decimal128_ctx) as ctx:
...     Decimal128(ctx.create_decimal(".13.3"))
...
Decimal128('NaN')
>>>
>>> with decimal.localcontext(decimal128_ctx) as ctx:
...     Decimal128(ctx.create_decimal("1E-6177"))
...
Decimal128('0E-6176')
>>>
>>> with decimal.localcontext(decimal128_ctx) as ctx:
...     Decimal128(ctx.create_decimal("1E6145"))
...
Decimal128('Infinity')
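The behavior shown above comes from a decimal.Context that has the Decimal128 limits but traps no signals. A comparable context can be built with the standard library alone. The precision and exponent limits below are the IEEE 754-2008 decimal128 values (34 digits, exponents from -6143 to 6144). The function name is hypothetical, and the context returned by create_decimal128_context() may differ in other settings.

```python
import decimal


def make_decimal128_like_context() -> decimal.Context:
    """Build a non-trapping context with decimal128 limits.

    Hypothetical helper for illustration; not part of the bson package.
    """
    return decimal.Context(
        prec=34,
        rounding=decimal.ROUND_HALF_EVEN,
        Emin=-6143,
        Emax=6144,
        clamp=1,
        traps=[],  # nothing is trapped, so results degrade instead of raising
    )


ctx = make_decimal128_like_context()
print(ctx.create_decimal("1E6145"))   # Infinity
print(ctx.create_decimal("1E-6177"))  # 0E-6176
print(ctx.create_decimal(".13.3"))    # NaN
```

Because no signal is trapped, overflow becomes Infinity, underflow rounds to zero at the smallest exponent, and invalid syntax becomes NaN, matching the three results in the example above.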

To match the behavior of MongoDB’s Decimal128 implementation str(Decimal(value)) may not match str(Decimal128(value)) for NaN values:

>>> Decimal128(Decimal('NaN'))
Decimal128('NaN')
>>> Decimal128(Decimal('-NaN'))
Decimal128('NaN')
>>> Decimal128(Decimal('sNaN'))
Decimal128('NaN')
>>> Decimal128(Decimal('-sNaN'))
Decimal128('NaN')

However, to_decimal() will return the exact value:

>>> Decimal128(Decimal('NaN')).to_decimal()
Decimal('NaN')
>>> Decimal128(Decimal('-NaN')).to_decimal()
Decimal('-NaN')
>>> Decimal128(Decimal('sNaN')).to_decimal()
Decimal('sNaN')
>>> Decimal128(Decimal('-sNaN')).to_decimal()
Decimal('-sNaN')

Two instances of Decimal128 compare equal if their Binary Integer Decimal encodings are equal:

>>> Decimal128('NaN') == Decimal128('NaN')
True
>>> Decimal128('NaN').bid == Decimal128('NaN').bid
True

This differs from decimal.Decimal comparisons for NaN:

>>> Decimal('NaN') == Decimal('NaN')
False
property bid: bytes

The Binary Integer Decimal (BID) encoding of this instance.

classmethod from_bid(value: bytes) bson.decimal128.Decimal128

Create an instance of Decimal128 from Binary Integer Decimal string.

Parameters
  • value: 16 byte string (128-bit IEEE 754-2008 decimal floating point in Binary Integer Decimal (BID) format).

to_decimal() decimal.Decimal

Returns an instance of decimal.Decimal for this Decimal128.

class bson.Int64

Representation of the BSON int64 type.

This is necessary because every integral number is an int in Python 3. Small integral numbers are encoded to BSON int32 by default, but Int64 numbers will always be encoded to BSON int64.

Parameters
  • value: the numeric value to represent

exception bson.InvalidBSON

Raised when trying to create a BSON object from invalid data.

exception bson.InvalidDocument

Raised when trying to create a BSON object from an invalid document.

exception bson.InvalidStringData

Raised when trying to encode a string containing non-UTF8 data.

class bson.MaxKey

MongoDB internal MaxKey type.

class bson.MinKey

MongoDB internal MinKey type.

class bson.ObjectId(oid: Optional[Union[str, bson.objectid.ObjectId, bytes]] = None)

Initialize a new ObjectId.

An ObjectId is a 12-byte unique identifier consisting of:

  • a 4-byte value representing the seconds since the Unix epoch,

  • a 5-byte random value,

  • a 3-byte counter, starting with a random value.
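The 12-byte layout listed above can be assembled and taken apart with the standard library. This sketch does not use ObjectId itself. The helper names are hypothetical, and big-endian encoding of the timestamp and counter is assumed.

```python
import datetime as dt
import os
import struct


def make_object_id_bytes(timestamp: int, random_value: bytes, counter: int) -> bytes:
    """Assemble 4-byte timestamp + 5-byte random value + 3-byte counter.

    Hypothetical helper for illustration; not part of the bson package.
    """
    if len(random_value) != 5:
        raise ValueError("random_value must be exactly 5 bytes")
    if not 0 <= counter < 2**24:
        raise ValueError("counter must fit in 3 bytes")
    return struct.pack(">I", timestamp) + random_value + counter.to_bytes(3, "big")


def split_object_id_bytes(raw: bytes):
    """Return (timestamp, random_value, counter) from 12 raw bytes."""
    if len(raw) != 12:
        raise ValueError("expected exactly 12 bytes")
    (timestamp,) = struct.unpack(">I", raw[:4])
    return timestamp, raw[4:9], int.from_bytes(raw[9:], "big")


raw = make_object_id_bytes(1262304000, os.urandom(5), 7)
timestamp, random_value, counter = split_object_id_bytes(raw)
print(len(raw))                                               # 12
print(dt.datetime.fromtimestamp(timestamp, dt.timezone.utc))  # 2010-01-01 00:00:00+00:00
```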

By default, ObjectId() creates a new unique identifier. The optional parameter oid can be an ObjectId, or any 12 bytes.

For example, the 12 bytes b'foo-bar-quux' do not follow the ObjectId specification but they are acceptable input:

>>> ObjectId(b'foo-bar-quux')
ObjectId('666f6f2d6261722d71757578')

oid can also be a str of 24 hex digits:

>>> ObjectId('0123456789ab0123456789ab')
ObjectId('0123456789ab0123456789ab')

Raises InvalidId if oid is neither 12 bytes nor 24 hex digits, or TypeError if oid is not an accepted type.

Parameters
  • oid (optional): a valid ObjectId.

See also

The MongoDB documentation on ObjectIds.

Changed in version 3.8: ObjectId now implements the ObjectID specification version 0.2.

property binary: bytes

12-byte binary representation of this ObjectId.

classmethod from_datetime(generation_time: datetime.datetime) bson.objectid.ObjectId

Create a dummy ObjectId instance with a specific generation time.

This method is useful for doing range queries on a field containing ObjectId instances.

Warning

It is not safe to insert a document containing an ObjectId generated using this method. This method deliberately eliminates the uniqueness guarantee that ObjectIds generally provide. ObjectIds generated with this method should be used exclusively in queries.

generation_time will be converted to UTC. Naive datetime instances will be treated as though they already contain UTC.

An example using this helper to get documents where "_id" was generated before January 1, 2010 would be:

>>> import datetime
>>> from bson.objectid import ObjectId
>>> gen_time = datetime.datetime(2010, 1, 1)
>>> dummy_id = ObjectId.from_datetime(gen_time)
>>> result = collection.find({"_id": {"$lt": dummy_id}})

Parameters
  • generation_time: datetime to be used as the generation time for the resulting ObjectId.
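Why such an identifier works as a range boundary, and why it is not unique, can be sketched with raw bytes. The helper boundary_id_bytes is hypothetical. It assumes the boundary is the 4-byte big-endian timestamp followed by eight zero bytes, so any identifier generated in an earlier second compares lower byte by byte.

```python
import calendar
import datetime as dt
import struct


def boundary_id_bytes(generation_time: dt.datetime) -> bytes:
    """Return 12 bytes: the UTC timestamp followed by eight zero bytes.

    Hypothetical helper for illustration; not part of the bson package.
    """
    offset = generation_time.utcoffset()
    if offset is not None:
        # Shift aware datetimes so their fields read as UTC.
        generation_time = generation_time - offset
    # Naive datetimes are treated as though they already contain UTC.
    seconds = calendar.timegm(generation_time.timetuple())
    return struct.pack(">I", seconds) + b"\x00" * 8


boundary = boundary_id_bytes(dt.datetime(2010, 1, 1))
print(boundary.hex())  # 4b3d3b000000000000000000

# An id from one second earlier sorts below the boundary, whatever its tail.
earlier = struct.pack(">I", 1262303999) + b"\xff" * 8
print(earlier < boundary)  # True
```

Every call with the same second produces identical bytes, which is the loss of uniqueness the warning above refers to.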

property generation_time: datetime.datetime

A datetime.datetime instance representing the time of generation for this ObjectId.

The datetime.datetime is timezone aware, and represents the generation time in UTC. It is precise to the second.

classmethod is_valid(oid: Any) bool

Checks if an oid string is valid or not.

Parameters
  • oid: the object id to validate

New in version 2.3.

bson.RE_TYPE

alias of re.Pattern

class bson.Regex(pattern: bson.regex._T, flags: Union[str, int] = 0)

BSON regular expression data.

This class is useful to store and retrieve regular expressions that are incompatible with Python’s regular expression dialect.

Parameters
  • pattern: string

  • flags: (optional) an integer bitmask, or a string of flag characters like “im” for IGNORECASE and MULTILINE
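The relationship between the two flag forms can be sketched with the re module. The function flag_string_to_bitmask and its letter table are hypothetical. Only "i" and "m" are confirmed by the text above, the other letters are assumed to follow the usual regular expression flag names, and raising on an unknown letter is a choice made here, not documented behavior.

```python
import re

# Assumption: letters other than "i" and "m" follow common flag naming.
FLAG_LETTERS = {
    "i": re.IGNORECASE,
    "l": re.LOCALE,
    "m": re.MULTILINE,
    "s": re.DOTALL,
    "u": re.UNICODE,
    "x": re.VERBOSE,
}


def flag_string_to_bitmask(flag_string: str) -> int:
    """Translate a string such as "im" into an integer bitmask for re.

    Hypothetical helper for illustration; not part of the bson package.
    """
    bitmask = 0
    for letter in flag_string:
        if letter not in FLAG_LETTERS:
            raise ValueError(f"unknown flag letter: {letter!r}")
        bitmask |= int(FLAG_LETTERS[letter])
    return bitmask


mask = flag_string_to_bitmask("im")
print(mask == int(re.IGNORECASE | re.MULTILINE))       # True
print(bool(re.compile("^b", mask).search("A\nB")))     # True
```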

classmethod from_native(regex: Pattern[bson.regex._T]) bson.regex.Regex[bson.regex._T]

Convert a Python regular expression into a Regex instance.

Note that in Python 3, a regular expression compiled from a str has the re.UNICODE flag set. If it is undesirable to store this flag in a BSON regular expression, unset it first:

>>> import re
>>> from bson.regex import Regex
>>> pattern = re.compile('.*')
>>> regex = Regex.from_native(pattern)
>>> regex.flags ^= re.UNICODE
>>> db.collection.insert_one({'pattern': regex})

Parameters
  • regex: A regular expression object from re.compile().

Warning

Python regular expressions use a different syntax and different set of flags than MongoDB, which uses PCRE. A regular expression retrieved from the server may not compile in Python, or may match a different set of strings in Python than when used in a MongoDB query.

try_compile() Pattern[bson.regex._T]

Compile this Regex as a Python regular expression.

Warning

Python regular expressions use a different syntax and different set of flags than MongoDB, which uses PCRE. A regular expression retrieved from the server may not compile in Python, or may match a different set of strings in Python than when used in a MongoDB query. try_compile() may raise re.error.

class bson.SON(*args: Any, **kwargs: Any)

SON data.

A subclass of dict that maintains ordering of keys and provides a few extra niceties for dealing with SON. SON provides an API similar to collections.OrderedDict.

clear() None

Remove all items from D.

copy()

Return a shallow copy of D.

get(key: bson.son._Key, default: Optional[Union[bson.son._Value, bson.son._T]] = None) Optional[Union[bson.son._Value, bson.son._T]]

Return the value for key if key is in the dictionary, else default.

pop(k[, d])

Remove the specified key and return the corresponding value.

If the key is not found, d is returned if given; otherwise KeyError is raised.

popitem() Tuple[bson.son._Key, bson.son._Value]

Remove and return a (key, value) pair as a 2-tuple.

Pairs are returned in LIFO (last-in, first-out) order. Raises KeyError if the dict is empty.

setdefault(key: bson.son._Key, default: bson.son._Value) bson.son._Value

Insert key with a value of default if key is not in the dictionary.

Return the value for key if key is in the dictionary, else default.

to_dict() Dict[bson.son._Key, bson.son._Value]

Convert a SON document to a normal Python dictionary instance.

This is trickier than just dict(…) because it needs to be recursive.
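The difference between a shallow dict(...) call and a recursive conversion can be shown with collections.OrderedDict standing in for SON. The function to_plain_dict is a hypothetical sketch of the idea, not the library's to_dict() implementation.

```python
from collections import OrderedDict
from collections.abc import Mapping


def to_plain_dict(value):
    """Recursively replace mappings with plain dicts, descending into lists.

    Hypothetical helper for illustration; not part of the bson package.
    """
    if isinstance(value, Mapping):
        return {key: to_plain_dict(item) for key, item in value.items()}
    if isinstance(value, list):
        return [to_plain_dict(item) for item in value]
    return value


nested = OrderedDict([("a", OrderedDict([("b", [OrderedDict([("c", 1)])])]))])

shallow = dict(nested)       # only the outermost mapping is converted
deep = to_plain_dict(nested)

print(type(shallow["a"]).__name__)       # OrderedDict
print(type(deep["a"]).__name__)          # dict
print(type(deep["a"]["b"][0]).__name__)  # dict
```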

update([E, ]**F) None

Update D from dict/iterable E and F.

If E is present and has a .keys() method, then does: for k in E: D[k] = E[k]
If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v
In either case, this is followed by: for k in F: D[k] = F[k]

values()

Return an object providing a view on D's values.

class bson.Timestamp(time: Union[datetime.datetime, int], inc: int)

Create a new Timestamp.

This class is only for use with the MongoDB oplog. If you need to store a regular timestamp, please use a datetime.

Raises TypeError if time is not an instance of int or datetime, or inc is not an instance of int. Raises ValueError if time or inc is not in [0, 2**32).

Parameters
  • time: time in seconds since epoch UTC, or a naive UTC datetime, or an aware datetime

  • inc: the incrementing counter
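The range [0, 2**32) for both fields follows from each one occupying 32 bits of a 64-bit value. The sketch below assumes the time sits in the high half and the counter in the low half. The helpers pack_timestamp and unpack_timestamp are hypothetical and accept only integer seconds, not datetime instances.

```python
UINT32_LIMIT = 2**32


def pack_timestamp(time: int, inc: int) -> int:
    """Combine seconds and counter into one 64-bit integer.

    Hypothetical helper for illustration; not part of the bson package.
    """
    for name, value in (("time", time), ("inc", inc)):
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError(f"{name} must be an int")
        if not 0 <= value < UINT32_LIMIT:
            raise ValueError(f"{name} must be in [0, 2**32)")
    # Assumption: time occupies the high 32 bits, inc the low 32 bits.
    return (time << 32) | inc


def unpack_timestamp(packed: int):
    """Split a packed value back into (time, inc)."""
    return packed >> 32, packed & 0xFFFFFFFF


packed = pack_timestamp(1262304000, 7)
print(unpack_timestamp(packed))  # (1262304000, 7)
```

With this layout, comparing packed values orders entries by time first and by counter second.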

as_datetime() datetime.datetime

Return a datetime instance corresponding to the time portion of this Timestamp.

The returned datetime’s timezone is UTC.

property inc: int

Get the inc portion of this Timestamp.

property time: int

Get the time portion of this Timestamp.

bson.decode(data: Union[bytes, memoryview, mmap, array], codec_options: Optional[CodecOptions[_DocumentType]] = None) bson.codec_options._DocumentType

Decode BSON to a document.

By default, returns a BSON document represented as a Python dict. To use a different MutableMapping class, configure a CodecOptions:

>>> import collections  # From Python standard library.
>>> import bson
>>> from bson.codec_options import CodecOptions
>>> data = bson.encode({'a': 1})
>>> decoded_doc = bson.decode(data)
>>> type(decoded_doc)
<class 'dict'>
>>> options = CodecOptions(document_class=collections.OrderedDict)
>>> decoded_doc = bson.decode(data, codec_options=options)
>>> type(decoded_doc)
<class 'collections.OrderedDict'>

Parameters
  • data: the BSON to decode. Any bytes-like object that implements the buffer protocol.

  • codec_options (optional): An instance of CodecOptions.

New in version 3.9.

bson.decode_all(data: Union[bytes, memoryview, mmap, array], codec_options: Optional[CodecOptions[_DocumentType]] = None) List[bson.codec_options._DocumentType]

Decode BSON data to multiple documents.

data must be a bytes-like object implementing the buffer protocol that provides concatenated, valid, BSON-encoded documents.

Parameters
  • data: BSON data

  • codec_options (optional): An instance of CodecOptions.

Changed in version 3.9: Supports bytes-like objects that implement the buffer protocol.

Changed in version 3.0: Removed compile_re option: PyMongo now always represents BSON regular expressions as Regex objects. Use try_compile() to attempt to convert from a BSON regular expression to a Python regular expression object.

Replaced as_class, tz_aware, and uuid_subtype options with codec_options.

bson.decode_file_iter(file_obj: Union[BinaryIO, IO], codec_options: Optional[CodecOptions[_DocumentType]] = None) Iterator[bson.codec_options._DocumentType]

Decode bson data from a file to multiple documents as a generator.

Works similarly to the decode_all function, but reads from the file object in chunks and parses bson in chunks, yielding one document at a time.

Parameters
  • file_obj: A file object containing BSON data.

  • codec_options (optional): An instance of CodecOptions.

Changed in version 3.0: Replaced as_class, tz_aware, and uuid_subtype options with codec_options.

New in version 2.8.

bson.decode_iter(data: bytes, codec_options: Optional[CodecOptions[_DocumentType]] = None) Iterator[bson.codec_options._DocumentType]

Decode BSON data to multiple documents as a generator.

Works similarly to the decode_all function, but yields one document at a time.

data must be a bytes object containing concatenated, valid, BSON-encoded documents.

Parameters
  • data: BSON data

  • codec_options (optional): An instance of CodecOptions.
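Yielding one document at a time from concatenated data is possible because each BSON document starts with its own total length as a little-endian int32 and ends with a zero byte. The generator below is a hypothetical stdlib-only sketch that splits the input into raw documents without decoding their contents. The two sample documents are hand-written bytes, not output from this module.

```python
import struct
from typing import Iterator


def iter_raw_documents(data: bytes) -> Iterator[bytes]:
    """Yield each length-prefixed document from concatenated BSON bytes.

    Hypothetical helper for illustration; not part of the bson package.
    """
    position = 0
    end = len(data)
    while position < end:
        if end - position < 5:
            raise ValueError("truncated document header")
        # The first four bytes hold the total document size, little-endian.
        (length,) = struct.unpack_from("<i", data, position)
        if length < 5 or position + length > end:
            raise ValueError("invalid document length")
        if data[position + length - 1] != 0:
            raise ValueError("document does not end with a zero byte")
        yield data[position:position + length]
        position += length


EMPTY_DOC = b"\x05\x00\x00\x00\x00"                          # {}
A_IS_ONE = b"\x0c\x00\x00\x00\x10a\x00\x01\x00\x00\x00\x00"  # {'a': 1}

documents = list(iter_raw_documents(EMPTY_DOC + A_IS_ONE))
print(len(documents))  # 2
```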

Changed in version 3.0: Replaced as_class, tz_aware, and uuid_subtype options with codec_options.

New in version 2.8.

bson.encode(document: Mapping[str, Any], check_keys: bool = False, codec_options: bson.codec_options.CodecOptions = CodecOptions(document_class=dict, tz_aware=False, uuid_representation=UuidRepresentation.UNSPECIFIED, unicode_decode_error_handler='strict', tzinfo=None, type_registry=TypeRegistry(type_codecs=[], fallback_encoder=None), datetime_conversion=DatetimeConversion.DATETIME)) bytes

Encode a document to BSON.

A document can be any mapping type (like dict).

Raises TypeError if document is not a mapping type, or contains keys that are not instances of basestring (str in Python 3). Raises InvalidDocument if document cannot be converted to BSON.

Parameters
  • document: mapping type representing a document

  • check_keys (optional): check if keys start with ‘$’ or contain ‘.’, raising InvalidDocument in either case

  • codec_options (optional): An instance of CodecOptions.
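The check_keys rule can be illustrated on its own. The function below is a hypothetical stand-in: it raises ValueError where the library raises InvalidDocument, and descending into nested mappings is an assumption rather than documented behavior.

```python
from collections.abc import Mapping


def check_keys(document: Mapping) -> None:
    """Reject keys that start with '$' or contain '.'.

    Hypothetical helper for illustration; not part of the bson package.
    """
    if not isinstance(document, Mapping):
        raise TypeError("document must be a mapping type")
    for key, value in document.items():
        if not isinstance(key, str):
            raise TypeError(f"key {key!r} is not a str")
        if key.startswith("$"):
            raise ValueError(f"key {key!r} must not start with '$'")
        if "." in key:
            raise ValueError(f"key {key!r} must not contain '.'")
        if isinstance(value, Mapping):
            check_keys(value)  # assumption: nested documents are checked too


check_keys({"name": "ok", "nested": {"value": 1}})  # passes silently

for bad in ({"$set": 1}, {"a.b": 1}):
    try:
        check_keys(bad)
    except ValueError as exc:
        print(exc)
```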

New in version 3.9.

bson.gen_list_name() Generator[bytes, None, None]

Generate "keys" for encoded lists in the sequence b"0", b"1", b"2", …

The first 1000 keys are returned from a pre-built cache. All subsequent keys are generated on the fly.

bson.has_c() bool

Is the C extension installed?

bson.is_valid(bson: bytes) bool

Check that the given string represents valid BSON data.

Raises TypeError if bson is not an instance of str (bytes in Python 3). Returns True if bson is valid BSON, False otherwise.

Parameters
  • bson: the data to be validated

Sub-modules: