binary – Tools for representing binary data to be stored in MongoDB

bson.binary.BINARY_SUBTYPE = 0

BSON binary subtype for binary data.

This is the default subtype for binary data.

bson.binary.FUNCTION_SUBTYPE = 1

BSON binary subtype for functions.

bson.binary.OLD_BINARY_SUBTYPE = 2

Old BSON binary subtype for binary data.

This is the old default subtype; the current default is BINARY_SUBTYPE.

bson.binary.OLD_UUID_SUBTYPE = 3

Old BSON binary subtype for a UUID.

uuid.UUID instances will automatically be encoded by bson using this subtype when using UuidRepresentation.PYTHON_LEGACY, UuidRepresentation.JAVA_LEGACY, or UuidRepresentation.CSHARP_LEGACY.

Added in version 2.1.

bson.binary.UUID_SUBTYPE = 4

BSON binary subtype for a UUID.

This is the standard BSON binary subtype for UUIDs. uuid.UUID instances will automatically be encoded by bson using this subtype when using UuidRepresentation.STANDARD.

bson.binary.STANDARD = 4

An alias for UuidRepresentation.STANDARD.

Added in version 3.0.

bson.binary.PYTHON_LEGACY = 3

An alias for UuidRepresentation.PYTHON_LEGACY.

Added in version 3.0.

bson.binary.JAVA_LEGACY = 5

An alias for UuidRepresentation.JAVA_LEGACY.

Changed in version 3.6: BSON binary subtype 4 is decoded using RFC-4122 byte order.

Added in version 2.3.

bson.binary.CSHARP_LEGACY = 6

An alias for UuidRepresentation.CSHARP_LEGACY.

Changed in version 3.6: BSON binary subtype 4 is decoded using RFC-4122 byte order.

Added in version 2.3.

bson.binary.MD5_SUBTYPE = 5

BSON binary subtype for an MD5 hash.

bson.binary.COLUMN_SUBTYPE = 7

BSON binary subtype for columns.

Added in version 4.0.

bson.binary.SENSITIVE_SUBTYPE = 8

BSON binary subtype for sensitive data.

Added in version 4.5.

bson.binary.USER_DEFINED_SUBTYPE = 128

BSON binary subtype for any user defined structure.

class bson.binary.UuidRepresentation

CSHARP_LEGACY = 6

The C#/.NET legacy UUID representation.

uuid.UUID instances will automatically be encoded to and decoded from BSON binary subtype OLD_UUID_SUBTYPE, using the C# driver’s legacy byte order.

See CSHARP_LEGACY for details.

Added in version 3.11.

JAVA_LEGACY = 5

The Java legacy UUID representation.

uuid.UUID instances will automatically be encoded to and decoded from BSON binary subtype OLD_UUID_SUBTYPE, using the Java driver’s legacy byte order.

See JAVA_LEGACY for details.

Added in version 3.11.

PYTHON_LEGACY = 3

The Python legacy UUID representation.

uuid.UUID instances will automatically be encoded to and decoded from BSON binary, using RFC-4122 byte order with binary subtype OLD_UUID_SUBTYPE.

See PYTHON_LEGACY for details.

Added in version 3.11.

STANDARD = 4

The standard UUID representation.

uuid.UUID instances will automatically be encoded to and decoded from BSON binary, using RFC-4122 byte order with binary subtype UUID_SUBTYPE.

See STANDARD for details.

Added in version 3.11.

UNSPECIFIED = 0

An unspecified UUID representation.

When configured, uuid.UUID instances will not be automatically encoded to or decoded from Binary. When encoding a uuid.UUID instance, an error will be raised. To encode a uuid.UUID instance with this configuration, it must be wrapped in the Binary class by the application code. When decoding a BSON binary field with a UUID subtype, a Binary instance will be returned instead of a uuid.UUID instance.

See UNSPECIFIED for details.

Added in version 3.11.

class bson.binary.BinaryVectorDtype(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: Enum

(BETA) Datatypes of vector subtype.

Parameters:
  • FLOAT32 – (0x27) Pack list of float as float32

  • INT8 – (0x03) Pack list of int in [-128, 127] as signed int8

  • PACKED_BIT – (0x10) Pack list of int in [0, 255] as unsigned uint8

PACKED_BIT is a special case: each vector element can take only one of two values (0 or 1), and the elements are packed together in groups of 8 into a byte. In Python, each packed byte is displayed as an int in the range [0, 255].

Each value is of type bytes with a length of one.

Added in version 4.10.

class bson.binary.BinaryVector(data, dtype, padding=0)

Parameters:
  • data (Sequence[float | int]) – Sequence of numbers representing the mathematical vector.

  • dtype (BinaryVectorDtype) – The data type stored in binary

  • padding (int) – The number of bits in the final byte that are to be ignored when a vector element’s size is less than a byte and the length of the vector is not a multiple of 8.
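To show where a non-zero padding comes from, here is a standard-library sketch that packs individual bits into bytes and reports how many bits of the final byte are filler. `pack_bits` is a hypothetical helper, not part of bson, and the most-significant-bit-first ordering is an assumption made for the illustration.

```python
def pack_bits(bits):
    """Pack 0/1 values into byte-sized ints (assumed MSB first).

    Returns (packed, padding), where padding is the number of filler bits
    appended to complete the final byte.
    """
    padding = (-len(bits)) % 8
    padded = list(bits) + [0] * padding
    packed = []
    for start in range(0, len(padded), 8):
        byte = 0
        for bit in padded[start:start + 8]:
            byte = (byte << 1) | bit
        packed.append(byte)
    return packed, padding


# Ten bits do not fill two bytes, so six bits of the last byte are ignored.
packed, padding = pack_bits([1, 0, 1, 1, 0, 1, 0, 0, 1, 1])
print(packed, padding)  # [180, 192] 6
```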

class bson.binary.Binary(data, subtype=BINARY_SUBTYPE)

Bases: bytes

Representation of BSON binary data.

Python strings are encoded as the BSON string type. Binary data must therefore be wrapped in Binary so that the encoder can tell what should be treated as binary data from what should be treated as a string.

(BETA) Subtype 9 provides a space-efficient representation of 1-dimensional vector data. Its data is prepended with two bytes of metadata. The first (dtype) describes its data type, such as float32 or int8. The second (padding) prescribes the number of bits to ignore in the final byte. This is relevant when the element size of the dtype, in bits, is not a multiple of 8.

Raises TypeError if subtype is not an instance of int. Raises ValueError if subtype is not in [0, 256).

Note

Instances of Binary with subtype 0 will be decoded directly to bytes.

Parameters:
  • data (Union[memoryview, bytes, _mmap, _array[Any]]) – the binary data to represent. Can be any bytes-like type that implements the buffer protocol.

  • subtype (int) – the binary subtype to use

Return type:

Binary

Changed in version 3.9: Support any bytes-like type that implements the buffer protocol.

Changed in version 4.10: (BETA) Addition of vector subtype.

as_uuid(uuid_representation=4)

Create a Python UUID from this BSON Binary object.

Decodes this binary object as a native uuid.UUID instance with the provided uuid_representation.

Raises ValueError if this Binary instance does not contain a UUID.

Parameters:

uuid_representation (int) – A member of UuidRepresentation. Default: STANDARD. See Handling UUID Data for details.

Return type:

UUID

Added in version 3.11.

as_vector()

(BETA) From the Binary, create a list of numbers, along with dtype and padding.

Returns:

BinaryVector

Return type:

BinaryVector

Added in version 4.10.

classmethod from_uuid(uuid, uuid_representation=4)

Create a BSON Binary object from a Python UUID.

Creates a Binary object from a uuid.UUID instance. Assumes that the native uuid.UUID instance uses the byte-order implied by the provided uuid_representation.

Raises TypeError if uuid is not an instance of UUID.

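Parameters:
  • uuid (UUID) – A uuid.UUID instance.

  • uuid_representation (int) – A member of UuidRepresentation. Default: STANDARD. See Handling UUID Data for details.

Return type: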

Binary

Added in version 3.11.

classmethod from_vector(vector, dtype, padding=0)

(BETA) Create a BSON Binary of vector subtype from a list of numbers.

To interpret the representation of the numbers, a data type must be included. See BinaryVectorDtype for available types and descriptions.

The dtype and padding are prepended to the binary data’s value.

Parameters:
  • vector (list[int | float]) – List of values

  • dtype (BinaryVectorDtype) – Data type of the values

  • padding (int) – For fractional bytes, number of bits to ignore at end of vector.

Returns:

Binary packed data identified by dtype and padding.

Return type:

Binary

Added in version 4.10.

property subtype: int

Subtype of this binary data.