bson – BSON (Binary JSON) Encoding and Decoding¶
BSON (Binary JSON) encoding and decoding.
The mapping from Python types to BSON types is as follows:
Python Type | BSON Type | Supported Direction |
---|---|---|
None | null | both |
bool | boolean | both |
int [1] | int32 / int64 | py -> bson |
bson.int64.Int64 | int64 | both |
float | number (real) | both |
str | string | both |
list | array | both |
dict / SON | object | both |
datetime.datetime [2] [3] | date | both |
bson.regex.Regex | regex | both |
compiled re [4] | regex | py -> bson |
bson.binary.Binary | binary | both |
bson.objectid.ObjectId | oid | both |
bson.dbref.DBRef | dbref | both |
None | undefined | bson -> py |
bson.code.Code | code | both |
str | symbol | bson -> py |
bytes [5] | binary | both |

- [1] A Python int will be saved as a BSON int32 or BSON int64 depending on its size. A BSON int32 will always decode to a Python int. A BSON int64 will always decode to an Int64.
- [2] datetime.datetime instances will be rounded to the nearest millisecond when saved.
- [3] All datetime.datetime instances are treated as naive. Clients should always use UTC.
- [4] Regex instances and regular expression objects from re.compile() are both saved as BSON regular expressions. BSON regular expressions are decoded as Regex instances.
- [5] The bytes type is encoded as BSON binary with subtype 0. It will be decoded back to bytes.
- class bson.BSON¶
BSON (Binary JSON) data.
Warning
Using this class to encode and decode BSON adds a performance cost. For better performance use the module level functions encode() and decode() instead.
- decode(codec_options: bson.codec_options.CodecOptions = CodecOptions(document_class=dict, tz_aware=False, uuid_representation=UuidRepresentation.UNSPECIFIED, unicode_decode_error_handler='strict', tzinfo=None, type_registry=TypeRegistry(type_codecs=[], fallback_encoder=None))) bson.codec_options._DocumentType ¶
Decode this BSON data.
By default, returns a BSON document represented as a Python dict. To use a different MutableMapping class, configure a CodecOptions:
>>> import collections  # From Python standard library.
>>> import bson
>>> from bson.codec_options import CodecOptions
>>> data = bson.BSON.encode({'a': 1})
>>> decoded_doc = bson.BSON(data).decode()
>>> type(decoded_doc)
<class 'dict'>
>>> options = CodecOptions(document_class=collections.OrderedDict)
>>> decoded_doc = bson.BSON(data).decode(codec_options=options)
>>> type(decoded_doc)
<class 'collections.OrderedDict'>
- Parameters
codec_options (optional): An instance of CodecOptions.
Changed in version 3.0: Removed compile_re option: PyMongo now always represents BSON regular expressions as Regex objects. Use try_compile() to attempt to convert from a BSON regular expression to a Python regular expression object. Replaced as_class, tz_aware, and uuid_subtype options with codec_options.
- classmethod encode(document: Mapping[str, Any], check_keys: bool = False, codec_options: bson.codec_options.CodecOptions = CodecOptions(document_class=dict, tz_aware=False, uuid_representation=UuidRepresentation.UNSPECIFIED, unicode_decode_error_handler='strict', tzinfo=None, type_registry=TypeRegistry(type_codecs=[], fallback_encoder=None))) bson.BSON ¶
Encode a document to a new BSON instance.
A document can be any mapping type (like dict).
Raises TypeError if document is not a mapping type, or contains keys that are not instances of basestring (str in Python 3). Raises InvalidDocument if document cannot be converted to BSON.
- Parameters
document: mapping type representing a document
check_keys (optional): check if keys start with ‘$’ or contain ‘.’, raising InvalidDocument in either case
codec_options (optional): An instance of CodecOptions.
Changed in version 3.0: Replaced uuid_subtype option with codec_options.
- class bson.Binary(data: Union[memoryview, bytes, _mmap, _array], subtype: int = 0)¶
Representation of BSON binary data.
This is necessary because we want to represent Python strings as the BSON string type. We need to wrap binary data so we can tell the difference between what should be considered binary data and what should be considered a string when we encode to BSON.
Raises TypeError if data is not an instance of bytes (str in Python 2) or subtype is not an instance of int. Raises ValueError if subtype is not in [0, 256).
Note
In Python 3 instances of Binary with subtype 0 will be decoded directly to bytes.
- Parameters
data: the binary data to represent. Can be any bytes-like type that implements the buffer protocol.
subtype (optional): the binary subtype to use
Changed in version 3.9: Support any bytes-like type that implements the buffer protocol.
- as_uuid(uuid_representation: int = 4) uuid.UUID ¶
Create a Python UUID from this BSON Binary object.
Decodes this binary object as a native uuid.UUID instance with the provided uuid_representation.
Raises ValueError if this Binary instance does not contain a UUID.
- Parameters
uuid_representation: A member of UuidRepresentation. Default: STANDARD. See Handling UUID Data for details.
New in version 3.11.
- classmethod from_uuid(uuid: uuid.UUID, uuid_representation: int = 4) bson.binary.Binary ¶
Create a BSON Binary object from a Python UUID.
Creates a Binary object from a uuid.UUID instance. Assumes that the native uuid.UUID instance uses the byte-order implied by the provided uuid_representation.
Raises TypeError if uuid is not an instance of UUID.
- Parameters
uuid: A uuid.UUID instance.
uuid_representation: A member of UuidRepresentation. Default: STANDARD. See Handling UUID Data for details.
New in version 3.11.
- class bson.Code(code: Union[str, bson.code.Code], scope: Optional[Mapping[str, Any]] = None, **kwargs: Any)¶
BSON’s JavaScript code type.
Raises TypeError if code is not an instance of basestring (str in Python 3) or scope is not None or an instance of dict.
Scope variables can be set by passing a dictionary as the scope argument or by using keyword arguments. If a variable is set as a keyword argument it will override any setting for that variable in the scope dictionary.
- Parameters
code: A string containing JavaScript code to be evaluated or another instance of Code. In the latter case, the scope of code becomes this Code’s scope.
scope (optional): dictionary representing the scope in which code should be evaluated - a mapping from identifiers (as strings) to values. Defaults to None. This is applied after any scope associated with a given code above.
**kwargs (optional): scope variables can also be passed as keyword arguments. These are applied after scope and code.
Changed in version 3.4: The default value for scope is None instead of {}.
- class bson.CodecOptions(document_class: Optional[Type[Mapping[str, Any]]] = None, tz_aware: bool = False, uuid_representation: Optional[int] = 0, unicode_decode_error_handler: str = 'strict', tzinfo: Optional[datetime.tzinfo] = None, type_registry: Optional[bson.codec_options.TypeRegistry] = None)¶
Create new instance of _BaseCodecOptions(document_class, tz_aware, uuid_representation, unicode_decode_error_handler, tzinfo, type_registry)
- with_options(**kwargs: Any) bson.codec_options.CodecOptions ¶
Make a copy of this CodecOptions, overriding some options:
>>> from bson.codec_options import DEFAULT_CODEC_OPTIONS
>>> DEFAULT_CODEC_OPTIONS.tz_aware
False
>>> options = DEFAULT_CODEC_OPTIONS.with_options(tz_aware=True)
>>> options.tz_aware
True
New in version 3.5.
- class bson.DBRef(collection: str, id: Any, database: Optional[str] = None, _extra: Optional[Mapping[str, Any]] = None, **kwargs: Any)¶
Initialize a new DBRef.
Raises TypeError if collection or database is not an instance of basestring (str in Python 3). database is optional and allows references to documents to work across databases. Any additional keyword arguments will create additional fields in the resultant embedded document.
- Parameters
collection: name of the collection the document is stored in
id: the value of the document’s "_id" field
database (optional): name of the database to reference
**kwargs (optional): additional keyword arguments will create additional, custom fields
See also
The MongoDB documentation on dbrefs.
- as_doc() bson.son.SON[str, Any] ¶
Get the SON document representation of this DBRef.
Generally not needed by application developers
- property database: Optional[str]¶
Get the name of this DBRef’s database.
Returns None if this DBRef doesn’t specify a database.
- property id: Any¶
Get this DBRef’s _id.
- class bson.Decimal128(value: Union[decimal.Decimal, float, str, Tuple[int, Sequence[int], int]])¶
BSON Decimal128 type:
>>> Decimal128(Decimal("0.0005"))
Decimal128('0.0005')
>>> Decimal128("0.0005")
Decimal128('0.0005')
>>> Decimal128((3474527112516337664, 5))
Decimal128('0.0005')
- Parameters
value: An instance of decimal.Decimal, string, or tuple of (high bits, low bits) from Binary Integer Decimal (BID) format.
Note
Decimal128 uses an instance of decimal.Context configured for IEEE-754 Decimal128 when validating parameters. Signals like decimal.InvalidOperation, decimal.Inexact, and decimal.Overflow are trapped and raised as exceptions:
>>> Decimal128(".13.1")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  ...
decimal.InvalidOperation: [<class 'decimal.ConversionSyntax'>]
>>>
>>> Decimal128("1E-6177")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  ...
decimal.Inexact: [<class 'decimal.Inexact'>]
>>>
>>> Decimal128("1E6145")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  ...
decimal.Overflow: [<class 'decimal.Overflow'>, <class 'decimal.Rounded'>]
To ensure the result of a calculation can always be stored as BSON Decimal128 use the context returned by create_decimal128_context():
>>> import decimal
>>> decimal128_ctx = create_decimal128_context()
>>> with decimal.localcontext(decimal128_ctx) as ctx:
...     Decimal128(ctx.create_decimal(".13.3"))
...
Decimal128('NaN')
>>>
>>> with decimal.localcontext(decimal128_ctx) as ctx:
...     Decimal128(ctx.create_decimal("1E-6177"))
...
Decimal128('0E-6176')
>>>
>>> with decimal.localcontext(decimal128_ctx) as ctx:
...     Decimal128(ctx.create_decimal("1E6145"))
...
Decimal128('Infinity')
To match the behavior of MongoDB’s Decimal128 implementation str(Decimal(value)) may not match str(Decimal128(value)) for NaN values:
>>> Decimal128(Decimal('NaN'))
Decimal128('NaN')
>>> Decimal128(Decimal('-NaN'))
Decimal128('NaN')
>>> Decimal128(Decimal('sNaN'))
Decimal128('NaN')
>>> Decimal128(Decimal('-sNaN'))
Decimal128('NaN')
However, to_decimal() will return the exact value:
>>> Decimal128(Decimal('NaN')).to_decimal()
Decimal('NaN')
>>> Decimal128(Decimal('-NaN')).to_decimal()
Decimal('-NaN')
>>> Decimal128(Decimal('sNaN')).to_decimal()
Decimal('sNaN')
>>> Decimal128(Decimal('-sNaN')).to_decimal()
Decimal('-sNaN')
Two instances of Decimal128 compare equal if their Binary Integer Decimal encodings are equal:
>>> Decimal128('NaN') == Decimal128('NaN')
True
>>> Decimal128('NaN').bid == Decimal128('NaN').bid
True
This differs from decimal.Decimal comparisons for NaN:
>>> Decimal('NaN') == Decimal('NaN')
False
- classmethod from_bid(value: bytes) bson.decimal128.Decimal128 ¶
Create an instance of Decimal128 from Binary Integer Decimal string.
- Parameters
value: 16 byte string (128-bit IEEE 754-2008 decimal floating point in Binary Integer Decimal (BID) format).
- to_decimal() decimal.Decimal ¶
Returns an instance of decimal.Decimal for this Decimal128.
- class bson.Int64¶
Representation of the BSON int64 type.
This is necessary because every integral number is an int in Python 3. Small integral numbers are encoded to BSON int32 by default, but Int64 numbers will always be encoded to BSON int64.
- Parameters
value: the numeric value to represent
- exception bson.InvalidBSON¶
Raised when trying to create a BSON object from invalid data.
- exception bson.InvalidDocument¶
Raised when trying to create a BSON object from an invalid document.
- exception bson.InvalidStringData¶
Raised when trying to encode a string containing non-UTF8 data.
- class bson.MaxKey¶
MongoDB internal MaxKey type.
- class bson.MinKey¶
MongoDB internal MinKey type.
- class bson.ObjectId(oid: Optional[Union[str, bson.objectid.ObjectId, bytes]] = None)¶
Initialize a new ObjectId.
An ObjectId is a 12-byte unique identifier consisting of:
a 4-byte value representing the seconds since the Unix epoch,
a 5-byte random value,
a 3-byte counter, starting with a random value.
By default, ObjectId() creates a new unique identifier. The optional parameter oid can be an ObjectId, or any 12 bytes.
For example, the 12 bytes b'foo-bar-quux' do not follow the ObjectId specification but they are acceptable input:
>>> ObjectId(b'foo-bar-quux')
ObjectId('666f6f2d6261722d71757578')
oid can also be a str of 24 hex digits:
>>> ObjectId('0123456789ab0123456789ab')
ObjectId('0123456789ab0123456789ab')
Raises InvalidId if oid is not 12 bytes nor 24 hex digits, or TypeError if oid is not an accepted type.
- Parameters
oid (optional): a valid ObjectId.
See also
The MongoDB documentation on ObjectIds.
Changed in version 3.8: ObjectId now implements the ObjectID specification version 0.2.
- classmethod from_datetime(generation_time: datetime.datetime) bson.objectid.ObjectId ¶
Create a dummy ObjectId instance with a specific generation time.
This method is useful for doing range queries on a field containing ObjectId instances.
Warning
It is not safe to insert a document containing an ObjectId generated using this method. This method deliberately eliminates the uniqueness guarantee that ObjectIds generally provide. ObjectIds generated with this method should be used exclusively in queries.
generation_time will be converted to UTC. Naive datetime instances will be treated as though they already contain UTC.
An example using this helper to get documents where "_id" was generated before January 1, 2010 would be:
>>> gen_time = datetime.datetime(2010, 1, 1)
>>> dummy_id = ObjectId.from_datetime(gen_time)
>>> result = collection.find({"_id": {"$lt": dummy_id}})
- Parameters
generation_time: datetime to be used as the generation time for the resulting ObjectId.
- property generation_time: datetime.datetime¶
A datetime.datetime instance representing the time of generation for this ObjectId.
The datetime.datetime is timezone aware, and represents the generation time in UTC. It is precise to the second.
- bson.RE_TYPE¶
alias of re.Pattern
- class bson.Regex(pattern: bson.regex._T, flags: Union[str, int] = 0)¶
BSON regular expression data.
This class is useful to store and retrieve regular expressions that are incompatible with Python’s regular expression dialect.
- Parameters
pattern: string
flags: (optional) an integer bitmask, or a string of flag characters like “im” for IGNORECASE and MULTILINE
- classmethod from_native(regex: Pattern[bson.regex._T]) bson.regex.Regex[bson.regex._T] ¶
Convert a Python regular expression into a Regex instance.
Note that in Python 3, a regular expression compiled from a str has the re.UNICODE flag set. If it is undesirable to store this flag in a BSON regular expression, unset it first:
>>> pattern = re.compile('.*')
>>> regex = Regex.from_native(pattern)
>>> regex.flags ^= re.UNICODE
>>> db.collection.insert_one({'pattern': regex})
- Parameters
regex: A regular expression object from re.compile().
Warning
Python regular expressions use a different syntax and different set of flags than MongoDB, which uses PCRE. A regular expression retrieved from the server may not compile in Python, or may match a different set of strings in Python than when used in a MongoDB query.
- try_compile() Pattern[bson.regex._T] ¶
Compile this Regex as a Python regular expression.
Warning
Python regular expressions use a different syntax and different set of flags than MongoDB, which uses PCRE. A regular expression retrieved from the server may not compile in Python, or may match a different set of strings in Python than when used in a MongoDB query.
try_compile() may raise re.error.
- class bson.SON(*args: Any, **kwargs: Any)¶
SON data.
A subclass of dict that maintains ordering of keys and provides a few extra niceties for dealing with SON. SON provides an API similar to collections.OrderedDict.
- clear() None. Remove all items from D. ¶
- copy() a shallow copy of D ¶
- get(key: bson.son._Key, default: Optional[Union[bson.son._Value, bson.son._T]] = None) Optional[Union[bson.son._Value, bson.son._T]] ¶
Return the value for key if key is in the dictionary, else default.
- pop(k[, d]) v, remove specified key and return the corresponding value. ¶
If key is not found, d is returned if given, otherwise KeyError is raised
- popitem() Tuple[bson.son._Key, bson.son._Value] ¶
Remove and return a (key, value) pair as a 2-tuple.
Pairs are returned in LIFO (last-in, first-out) order. Raises KeyError if the dict is empty.
- setdefault(key: bson.son._Key, default: bson.son._Value) bson.son._Value ¶
Insert key with a value of default if key is not in the dictionary.
Return the value for key if key is in the dictionary, else default.
- to_dict() Dict[bson.son._Key, bson.son._Value] ¶
Convert a SON document to a normal Python dictionary instance.
This is trickier than just dict(…) because it needs to be recursive.
- update([E, ]**F) None. Update D from dict/iterable E and F. ¶
If E is present and has a .keys() method, then does: for k in E: D[k] = E[k]
If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v
In either case, this is followed by: for k in F: D[k] = F[k]
- values() an object providing a view on D's values ¶
- class bson.Timestamp(time: Union[datetime.datetime, int], inc: int)¶
Create a new Timestamp.
This class is only for use with the MongoDB opLog. If you need to store a regular timestamp, please use a datetime.
Raises TypeError if time is not an instance of int or datetime, or inc is not an instance of int. Raises ValueError if time or inc is not in [0, 2**32).
- Parameters
- as_datetime() datetime.datetime ¶
Return a datetime instance corresponding to the time portion of this Timestamp.
The returned datetime’s timezone is UTC.
- bson.decode(data: Union[bytes, memoryview, mmap, array], codec_options: Optional[CodecOptions[_DocumentType]] = None) bson.codec_options._DocumentType ¶
Decode BSON to a document.
By default, returns a BSON document represented as a Python dict. To use a different MutableMapping class, configure a CodecOptions:
>>> import collections  # From Python standard library.
>>> import bson
>>> from bson.codec_options import CodecOptions
>>> data = bson.encode({'a': 1})
>>> decoded_doc = bson.decode(data)
>>> type(decoded_doc)
<class 'dict'>
>>> options = CodecOptions(document_class=collections.OrderedDict)
>>> decoded_doc = bson.decode(data, codec_options=options)
>>> type(decoded_doc)
<class 'collections.OrderedDict'>
- Parameters
data: the BSON to decode. Any bytes-like object that implements the buffer protocol.
codec_options (optional): An instance of CodecOptions.
New in version 3.9.
- bson.decode_all(data: Union[bytes, memoryview, mmap, array], codec_options: Optional[CodecOptions[_DocumentType]] = None) List[bson.codec_options._DocumentType] ¶
Decode BSON data to multiple documents.
data must be a bytes-like object implementing the buffer protocol that provides concatenated, valid, BSON-encoded documents.
- Parameters
data: BSON data
codec_options (optional): An instance of CodecOptions.
Changed in version 3.9: Supports bytes-like objects that implement the buffer protocol.
Changed in version 3.0: Removed compile_re option: PyMongo now always represents BSON regular expressions as Regex objects. Use try_compile() to attempt to convert from a BSON regular expression to a Python regular expression object. Replaced as_class, tz_aware, and uuid_subtype options with codec_options.
- bson.decode_file_iter(file_obj: Union[BinaryIO, IO], codec_options: Optional[CodecOptions[_DocumentType]] = None) Iterator[bson.codec_options._DocumentType] ¶
Decode bson data from a file to multiple documents as a generator.
Works similarly to the decode_all function, but reads from the file object in chunks and parses bson in chunks, yielding one document at a time.
- Parameters
file_obj: A file object containing BSON data.
codec_options (optional): An instance of CodecOptions.
Changed in version 3.0: Replaced as_class, tz_aware, and uuid_subtype options with codec_options.
New in version 2.8.
- bson.decode_iter(data: bytes, codec_options: Optional[CodecOptions[_DocumentType]] = None) Iterator[bson.codec_options._DocumentType] ¶
Decode BSON data to multiple documents as a generator.
Works similarly to the decode_all function, but yields one document at a time.
data must be a string of concatenated, valid, BSON-encoded documents.
- Parameters
data: BSON data
codec_options (optional): An instance of CodecOptions.
Changed in version 3.0: Replaced as_class, tz_aware, and uuid_subtype options with codec_options.
New in version 2.8.
- bson.encode(document: Mapping[str, Any], check_keys: bool = False, codec_options: bson.codec_options.CodecOptions = CodecOptions(document_class=dict, tz_aware=False, uuid_representation=UuidRepresentation.UNSPECIFIED, unicode_decode_error_handler='strict', tzinfo=None, type_registry=TypeRegistry(type_codecs=[], fallback_encoder=None))) bytes ¶
Encode a document to BSON.
A document can be any mapping type (like dict).
Raises TypeError if document is not a mapping type, or contains keys that are not instances of basestring (str in Python 3). Raises InvalidDocument if document cannot be converted to BSON.
- Parameters
document: mapping type representing a document
check_keys (optional): check if keys start with ‘$’ or contain ‘.’, raising InvalidDocument in either case
codec_options (optional): An instance of CodecOptions.
New in version 3.9.
- bson.gen_list_name() Generator[bytes, None, None] ¶
Generate “keys” for encoded lists in the sequence b"0", b"1", b"2", …
The first 1000 keys are returned from a pre-built cache. All subsequent keys are generated on the fly.
- bson.is_valid(bson: bytes) bool ¶
Check that the given string represents valid BSON data.
Raises TypeError if bson is not an instance of str (bytes in Python 3). Returns True if bson is valid BSON, False otherwise.
- Parameters
bson: the data to be validated
Sub-modules:
binary – Tools for representing binary data to be stored in MongoDB
code – Tools for representing JavaScript code
codec_options – Tools for specifying BSON codec options
dbref – Tools for manipulating DBRefs (references to documents stored in MongoDB)
decimal128 – Support for BSON Decimal128
errors – Exceptions raised by the bson package
int64 – Tools for representing BSON int64
json_util – Tools for using Python’s json module with BSON documents
max_key – Representation for the MongoDB internal MaxKey type
min_key – Representation for the MongoDB internal MinKey type
objectid – Tools for working with MongoDB ObjectIds
raw_bson – Tools for representing raw BSON documents.
regex – Tools for representing MongoDB regular expressions
son – Tools for working with SON, an ordered mapping
timestamp – Tools for representing MongoDB internal Timestamps
tz_util – Utilities for dealing with timezones in Python