parsl.dataflow.memoization.BasicMemoizer
- class parsl.dataflow.memoization.BasicMemoizer(*, checkpoint_files: Sequence[str] | None = None, checkpoint_period: str | None = None, checkpoint_mode: Literal['task_exit', 'periodic', 'dfk_exit', 'manual'] | None = None, memoize: bool = True)[source]
Memoizer is responsible for ensuring that identical work is not repeated.
When a task is repeated, i.e., the same function is called with the same exact arguments, the result from a previous execution is reused. wiki
The memoizer implementation here does not collapse duplicate calls at call time, but works only when the result of a previous call is available at the time the duplicate call is made.
For instance:
No advantage from Memoization helps memoization here: here: TaskA TaskB | TaskA | | | TaskA done (TaskB) | | | (TaskB) done | | done | done
The memoizer creates a lookup table by hashing the function name and its inputs, and storing the results of the function.
When a task is ready for launch, i.e., all of its arguments have resolved, we add its hash to the task datastructure.
- __init__(*, checkpoint_files: Sequence[str] | None = None, checkpoint_period: str | None = None, checkpoint_mode: Literal['task_exit', 'periodic', 'dfk_exit', 'manual'] | None = None, memoize: bool = True)[source]
Initialize the memoizer.
KWargs:
- checkpoint_filessequence of str, optional
List of paths to checkpoint files to load. By default, all checkpoints from the run directory will be restored. This is usually the right behaviour, but this parameter allows that behaviour to be overridden. See
parsl.utils.get_all_checkpoints()andparsl.utils.get_last_checkpoint()for helpers.
- checkpoint_periodstr, optional
Time interval (in “HH:MM:SS”) at which to checkpoint completed tasks. Only has an effect if
checkpoint_mode='periodic'.
- checkpoint_modestr, optional
Checkpoint mode to use, can be
'dfk_exit','task_exit','periodic'or'manual'. If set toNone, checkpointing will be disabled. Default is None.
memoize : str, enable memoization or not.
Methods
__init__(*[, checkpoint_files, ...])Initialize the memoizer.
check_memo(task)Create a hash of the task and its inputs and check the lookup table for this hash.
This is the user-facing interface to manual checkpointing.
checkpoint_one(cc)Checkpoint a single task to a checkpoint file.
Checkpoint all tasks registered in self.checkpointable_tasks.
close()Called at DFK shutdown.
load_checkpoints(checkpointDirs)Load checkpoints from the checkpoint files into a dictionary.
start(*, run_dir, config_run_dir)Called by the DFK when it starts up.
update_memo_exception(task, e)Called by the DFK when a task completes with an exception.
update_memo_result(task, r)Called by the DFK when a task completes with a successful result.
Attributes
- check_memo(task: TaskRecord) Future[Any] | None[source]
Create a hash of the task and its inputs and check the lookup table for this hash.
If present, the results are returned.
- Parameters:
task (-) – task from the dfk.tasks table
- Returns:
Result of the function if present in table, wrapped in a Future
This call will also set task[‘hashsum’] to the unique hashsum for the func+inputs.
- checkpoint_one(cc: CheckpointCommand) None[source]
Checkpoint a single task to a checkpoint file.
By default the checkpoints are written to the RUNDIR of the current run under RUNDIR/checkpoints/tasks.pkl
- Kwargs:
task : A task to checkpoint.
Note
Checkpointing only works if memoization is enabled
- checkpoint_queue() None[source]
Checkpoint all tasks registered in self.checkpointable_tasks.
By default the checkpoints are written to the RUNDIR of the current run under RUNDIR/checkpoints/tasks.pkl
Note
Checkpointing only works if memoization is enabled
- close() None[source]
Called at DFK shutdown. This gives the checkpoint system an opportunity for graceful shutdown.
- load_checkpoints(checkpointDirs: Sequence[str] | None) Dict[str, Future][source]
Load checkpoints from the checkpoint files into a dictionary.
The results are used to pre-populate the memoizer’s lookup_table
- Kwargs:
checkpointDirs (list) : List of run folder to use as checkpoints Eg. [‘runinfo/001’, ‘runinfo/002’]
- Returns:
dict containing, hashed -> future mappings
- start(*, run_dir: str, config_run_dir: str) None[source]
Called by the DFK when it starts up.
This is an opportunity for the memoization/checkpoint system to initialize itself.
The path to the per-run run directory and the base run directory are passed as parameters.
- update_memo_exception(task: TaskRecord, e: BaseException) None[source]
Called by the DFK when a task completes with an exception.
On every task completion, either this method or
update_memo_resultwill be called, but not both.This is an opportunity for the memoization/checkpoint system to record the outcome of a task for later discovery by a call to check_memo.
- update_memo_result(task: TaskRecord, r: Any) None[source]
Called by the DFK when a task completes with a successful result.
On every task completion, either this method or
update_memo_exceptionwill be called, but not both.This is an opportunity for the memoization/checkpoint system to record the outcome of a task for later discovery by a call to check_memo.