badger_batcher package

Submodules

badger_batcher.core module

Core functionality

class badger_batcher.core.Batcher(records, max_batch_len=None, max_record_size=None, max_batch_size=None, size_calc_fn=None, when_record_size_exceeded='raises')

Bases: object

Utility for batching iterables; the main interface of badger_batcher.

Example usage with max_batch_len, max_record_size and max_batch_size, getting the results as a list of lists:

>>> records = (f"record: {rec}" for rec in range(21))
>>> batcher = Batcher(records, max_batch_len=5)
>>> batched_records = batcher.batches()
>>> len(batched_records)
5
>>> records = [f"record: {rec}" for rec in range(5)]
>>> batcher = Batcher(records, max_batch_len=2)
>>> batcher.batches()
[['record: 0', 'record: 1'], ['record: 2', 'record: 3'], ['record: 4']]
>>> records = [b"aaaa", b"bb", b"ccccc", b"d"]
>>> batcher = Batcher(
... records,
... max_batch_len=2,
... max_record_size=4,
... size_calc_fn=len,
... when_record_size_exceeded="skip",
... )
>>> batcher.batches()
[[b'aaaa', b'bb'], [b'd']]
>>> records = [b"a", b"a", b"a", b"b", b"ccc", b"toolargeforbatch", b"dd", b"e"]
>>> batcher = Batcher(
... records,
... max_batch_len=3,
... max_batch_size=5,
... size_calc_fn=len,
... when_record_size_exceeded="skip",
... )
>>> batcher.batches()
[[b'a', b'a', b'a'], [b'b', b'ccc'], [b'dd', b'e']]

Iterating over the results one batch at a time:

>>> records = (f"record: {rec}" for rec in range(21))
>>> batcher = Batcher(records, max_batch_len=2)
>>> for batch in batcher:
...       # do something
...       first_batch = batch
...       break
>>> first_batch
['record: 0', 'record: 1']

When processing large amounts of data, consider passing an iterator (such as a generator) and iterating over the Batcher, which does not store intermediate results:

>>> import sys
>>> records = (f"record: {rec}" for rec in range(sys.maxsize))
>>> batcher = Batcher(records, max_batch_len=2)
>>> for batch in batcher:
...       first_batch = batch
...       break
>>> first_batch
['record: 0', 'record: 1']
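To illustrate why iterating is memory-friendly, here is a small stand-in that batches lazily with itertools.islice, pulling only the records needed for the batch being requested. lazy_batches and counting_records are hypothetical helpers for this sketch; Batcher's own internals may differ.

```python
import itertools
import sys
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def lazy_batches(records: Iterable[T], max_batch_len: int) -> Iterator[List[T]]:
    """Hypothetical helper: yield fixed-length batches without materializing the input."""
    iterator = iter(records)
    while True:
        # islice pulls at most max_batch_len items and does not advance further.
        batch = list(itertools.islice(iterator, max_batch_len))
        if not batch:
            return
        yield batch


def counting_records(counter: List[int]) -> Iterator[str]:
    """Generator over a huge range that records how many items were produced."""
    for rec in range(sys.maxsize):
        counter[0] += 1
        yield f"record: {rec}"


counter = [0]
first_batch = next(lazy_batches(counting_records(counter), max_batch_len=2))
print(first_batch)  # ['record: 0', 'record: 1']
print(counter[0])  # 2
```

Only two records are ever generated, even though the source range has sys.maxsize elements.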
batches() -> List[List[Any]]

Get all batches.

Loads all batches into memory; when batching large sequences, consider iterating over a Batcher instance instead.

Returns

Batches of records as a list of lists.

max_batch_len: Optional[int]
max_batch_size: Optional[int]
max_record_size: Optional[int]
records: Iterable[Any]
size_calc_fn: Optional[Callable[[Any], int]]
when_record_size_exceeded: Optional[str]
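Because size_calc_fn is typed as Callable[[Any], int], any function mapping a record to an integer size can be supplied. The two functions below are hypothetical examples, not part of badger_batcher: one measures str records in UTF-8 bytes rather than characters, the other measures the size of a record serialized as compact JSON.

```python
import json
from typing import Any


def utf8_size(record: str) -> int:
    """Hypothetical size_calc_fn: size of a str record in UTF-8 bytes."""
    return len(record.encode("utf-8"))


def json_size(record: Any) -> int:
    """Hypothetical size_calc_fn: size of a record once serialized as compact JSON."""
    return len(json.dumps(record, separators=(",", ":")).encode("utf-8"))


print(len("héllo"), utf8_size("héllo"))  # 5 6
print(json_size({"id": 1}))  # 8
```

Passing one of these, e.g. Batcher(records, max_batch_size=1024, size_calc_fn=json_size), would make the size limits count serialized bytes, assuming Batcher applies size_calc_fn to each record when checking max_record_size and max_batch_size.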

badger_batcher.errors module

exception badger_batcher.errors.RecordSizeExceeded

Bases: Exception
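Judging by its name and the default when_record_size_exceeded='raises', this exception is presumably raised when a record's size exceeds max_record_size; this page does not confirm the exact trigger. The sketch below shows such a policy check using a local stand-in class and a hypothetical keep_record helper; it is illustrative only and does not import badger_batcher.

```python
from typing import Any, Callable, Optional


class RecordSizeExceeded(Exception):
    """Local stand-in mirroring badger_batcher.errors.RecordSizeExceeded."""


def keep_record(
    record: Any,
    max_record_size: Optional[int],
    size_calc_fn: Callable[[Any], int] = len,
    when_record_size_exceeded: str = "raises",
) -> bool:
    """Hypothetical check: True to keep, False to skip, or raise for oversized records."""
    if max_record_size is None or size_calc_fn(record) <= max_record_size:
        return True
    if when_record_size_exceeded == "skip":
        return False
    if when_record_size_exceeded == "raises":
        raise RecordSizeExceeded(
            f"record size {size_calc_fn(record)} exceeds max_record_size={max_record_size}"
        )
    raise ValueError(f"unknown policy: {when_record_size_exceeded!r}")


print(keep_record(b"ccccc", 4, when_record_size_exceeded="skip"))  # False
try:
    keep_record(b"ccccc", 4)
except RecordSizeExceeded as exc:
    print(exc)  # record size 5 exceeds max_record_size=4
```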

Module contents

Top-level package for Badger Batcher.