badger_batcher package
Subpackages
Submodules
badger_batcher.core module
Core functionality
class badger_batcher.core.Batcher(records, max_batch_len=None, max_record_size=None, max_batch_size=None, size_calc_fn=None, when_record_size_exceeded='raises')
Bases: object
Utility that helps with batching iterables; the main interface for badger_batcher.
Example usage with max_batch_len, getting the results as a list of lists:
>>> records = (f"record: {rec}" for rec in range(21))
>>> batcher = Batcher(records, max_batch_len=5)
>>> batched_records = batcher.batches()
>>> len(batched_records)
5

>>> records = [f"record: {rec}" for rec in range(5)]
>>> batcher = Batcher(records, max_batch_len=2)
>>> batcher.batches()
[['record: 0', 'record: 1'], ['record: 2', 'record: 3'], ['record: 4']]

>>> records = [b"aaaa", b"bb", b"ccccc", b"d"]
>>> batcher = Batcher(
...     records,
...     max_batch_len=2,
...     max_record_size=4,
...     size_calc_fn=len,
...     when_record_size_exceeded="skip",
... )
>>> batcher.batches()
[[b'aaaa', b'bb'], [b'd']]

>>> records = [b"a", b"a", b"a", b"b", b"ccc", b"toolargeforbatch", b"dd", b"e"]
>>> batcher = Batcher(
...     records,
...     max_batch_len=3,
...     max_batch_size=5,
...     size_calc_fn=len,
...     when_record_size_exceeded="skip",
... )
>>> batcher.batches()
[[b'a', b'a', b'a'], [b'b', b'ccc'], [b'dd', b'e']]
Iterating the results one batch at a time:
>>> records = (f"record: {rec}" for rec in range(21))
>>> batcher = Batcher(records, max_batch_len=2)
>>> for batch in batcher:
...     # do something
...     first_batch = batch
...     break
>>> first_batch
['record: 0', 'record: 1']
When processing large amounts of data, consider iterating over the Batcher instead, as it does not store the intermediate results of records:
>>> import sys
>>> records = (f"record: {rec}" for rec in range(sys.maxsize))
>>> batcher = Batcher(records, max_batch_len=2)
>>> for batch in batcher:
...     first_batch = batch
...     break
>>> first_batch
['record: 0', 'record: 1']
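The lazy behaviour shown in the iteration examples can be illustrated with a small standalone sketch. It is not badger_batcher's implementation: the helper name iter_batches_by_len is hypothetical, and it covers only the max_batch_len limit, using itertools.islice to pull one batch at a time from the input.

```python
from itertools import islice
from typing import Any, Iterable, Iterator, List


def iter_batches_by_len(records: Iterable[Any], max_batch_len: int) -> Iterator[List[Any]]:
    """Yield lists of at most max_batch_len records (hypothetical helper)."""
    if max_batch_len < 1:
        raise ValueError("max_batch_len must be at least 1")
    iterator = iter(records)
    while True:
        # islice consumes only the next max_batch_len items, so the rest
        # of the input is never materialised ahead of time.
        batch = list(islice(iterator, max_batch_len))
        if not batch:
            return
        yield batch


records = (f"record: {rec}" for rec in range(21))
batches = iter_batches_by_len(records, max_batch_len=2)
first_batch = next(batches)
print(first_batch)  # ['record: 0', 'record: 1']
```

Because only one batch is held at a time, memory use depends on max_batch_len rather than on the length of the input.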
batches() → List[List[Any]]
Get all batches.
Loads all batches into memory. When batching large sequences, consider iterating over a Batcher instance instead.
Returns
batches of records in a list of lists
max_batch_len: Optional[int]
max_batch_size
alias of Optional[int]
max_record_size
alias of Optional[int]
records: Iterable[Any]
size_calc_fn
alias of Optional[Callable[[Any], int]]
when_record_size_exceeded
alias of Optional[str]
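How the length and size limits interact can be sketched in plain Python. This is an illustration under assumptions, not the library's code: the function name iter_batches_by_size is hypothetical, ValueError stands in for whatever exception the package actually raises, len is used as the default size function, and a record larger than max_batch_size is treated the same way as one that exceeds max_record_size. With those assumptions the sketch reproduces the "skip" examples in the Batcher docstring.

```python
from typing import Any, Callable, Iterable, Iterator, List, Optional


def iter_batches_by_size(
    records: Iterable[Any],
    max_batch_len: Optional[int] = None,
    max_record_size: Optional[int] = None,
    max_batch_size: Optional[int] = None,
    size_calc_fn: Callable[[Any], int] = len,
    when_record_size_exceeded: str = "raises",
) -> Iterator[List[Any]]:
    """Yield batches that respect the length and size limits (hypothetical helper)."""
    batch: List[Any] = []
    batch_size = 0
    for record in records:
        size = size_calc_fn(record)
        # Assumption: a record that cannot fit in any batch counts as oversized.
        too_big = (
            (max_record_size is not None and size > max_record_size)
            or (max_batch_size is not None and size > max_batch_size)
        )
        if too_big:
            if when_record_size_exceeded == "skip":
                continue
            # Assumption: the real package may raise its own error type.
            raise ValueError(f"record of size {size} exceeds the configured limit")
        len_full = max_batch_len is not None and len(batch) >= max_batch_len
        size_full = max_batch_size is not None and batch_size + size > max_batch_size
        if batch and (len_full or size_full):
            # Close the current batch before adding the record that would overflow it.
            yield batch
            batch, batch_size = [], 0
        batch.append(record)
        batch_size += size
    if batch:
        yield batch


records = [b"a", b"a", b"a", b"b", b"ccc", b"toolargeforbatch", b"dd", b"e"]
result = list(
    iter_batches_by_size(
        records,
        max_batch_len=3,
        max_batch_size=5,
        when_record_size_exceeded="skip",
    )
)
print(result)  # [[b'a', b'a', b'a'], [b'b', b'ccc'], [b'dd', b'e']]
```

With the default when_record_size_exceeded="raises", the same oversized record would stop processing with an error instead of being dropped.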
badger_batcher.errors module
Module contents
Top-level package for Badger Batcher.