.. _task-configuration:

==================
Task configuration
==================

When you maintain a complete distribution like Debian or one of its
derivatives, you have to deal with special cases and exceptions, for
example:

* disable build/autopkgtest/etc. of a package on a specific architecture
  because it kills the workers
* restrict the build/autopkgtest/etc. of a package to specific workers
  where the build is known to succeed
* etc.

As a derivative, you might want to make opinionated choices and change some
of the build parameters by using a specific build profile on some
packages.

Those tweaks and exceptions are recorded in a
:collection:`debusine:task-configuration` collection, and
then later used to feed the relevant workflows and work requests.

This collection is meant to store configuration data for tasks, as "key/value
pairs" that are fed into the ``task_data`` field of
:ref:`explanation-work-requests`. Tasks can be of any :ref:`type of task
<explanation-tasks>`, but for all practical purposes, ``Worker``
and ``Workflow`` tasks are the most likely targets for configuration.

The final configured task data is generated by merging multiple snippets of
configuration, each stored in its :bare-data:`debusine:task-configuration`
entry, and each applying at different levels of granularity.


Looking up task configuration entries for a task
------------------------------------------------

To provide fine-grained control of the configuration, we consider
that a *subject* is being processed by a task and that the task can
have a *configuration context*. The *configuration context* is typically
another parameter of the task that can usefully be leveraged to apply some
consistent configuration across all work requests sharing the same
*configuration context*.

.. todo:: Once we are able to use tags for matching task configuration
   entries, we will effectively have the equivalent of supporting multiple
   configuration context entries, and we may replace the configuration
   context entirely.

Those two values are used to look up the various snippets of configuration.
The snippets are retrieved and processed in the following order:

* global (subject=None, context=None)
* context level (subject=None, context != None)
* subject level (subject != None, context=None)
* specific-combination level (subject != None, context != None)

The collection can host partial or full configuration data, but it is
expected to be mainly useful for storing overrides, i.e. variations compared
to the defaults provided by the task or its containing workflow.

For example, for the ``debian-pipeline`` workflow, ``subject`` would typically be
the source package name while ``context`` would be the name of the target
suite.
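
The lookup levels above can be sketched as follows. This is a hypothetical
illustration, not the actual debusine query code: the in-memory ``entries``
mapping, the package name ``hello``, and the suite ``trixie`` are all made up
for the example.

```python
def lookup_order(subject, context):
    """Yield (subject, context) lookup keys from least to most specific."""
    yield (None, None)        # global
    yield (None, context)     # context level
    yield (subject, None)     # subject level
    yield (subject, context)  # specific-combination level


def merge_configuration(entries, subject, context):
    """Merge matching entries; more specific levels win over earlier ones."""
    merged = {}
    for key in lookup_order(subject, context):
        merged.update(entries.get(key, {}))
    return merged


# Example: configuring the build of "hello" targeting the "trixie" suite
entries = {
    (None, None): {"build_timeout": 3600},
    (None, "trixie"): {"backend": "unshare"},
    ("hello", "trixie"): {"build_timeout": 7200},
}
print(merge_configuration(entries, "hello", "trixie"))
# {'build_timeout': 7200, 'backend': 'unshare'}
```

Note how the subject/context-specific entry replaces the global
``build_timeout`` while the context-level ``backend`` is preserved.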


About templates
---------------

Template entries follow the same structure as other entries, but they are
only used indirectly, when a normal configuration entry refers
to them as part of its ``use_templates`` field.

This mechanism makes it possible to share common configuration across
multiple similar packages.

Example::

    template:uefi-sign:
      default_values:
        enable_make_signed_source: True
        make_signed_source_purpose: uefi

    template:uefi-sign-with-fwupd-key:
      use_templates:
        - uefi-sign
      default_values:
        make_signed_source_key: AEC1234

    template:uefi-sign-with-grub-key:
      use_templates:
        - uefi-sign
      default_values:
        make_signed_source_key: CBD3214

    Workflow:debian-pipeline:fwupd-efi::
      use_templates:
        - uefi-sign-with-fwupd-key

    Workflow:debian-pipeline:fwupdate::
      use_templates:
        - uefi-sign-with-fwupd-key

    Workflow:debian-pipeline:grub2::
      use_templates:
        - uefi-sign-with-grub-key
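
The resolution of nested templates can be sketched like this. This is an
illustrative sketch only, not the actual debusine implementation; the
``TEMPLATES`` dictionary simply mirrors the example above:

```python
TEMPLATES = {
    "uefi-sign": {
        "default_values": {
            "enable_make_signed_source": True,
            "make_signed_source_purpose": "uefi",
        },
    },
    "uefi-sign-with-fwupd-key": {
        "use_templates": ["uefi-sign"],
        "default_values": {"make_signed_source_key": "AEC1234"},
    },
}


def resolve_default_values(entry):
    """Collect default_values from an entry and the templates it references,
    recursively following nested use_templates fields."""
    values = dict(entry.get("default_values", {}))
    for name in entry.get("use_templates", []):
        values.update(resolve_default_values(TEMPLATES[name]))
    return values


entry = {"use_templates": ["uefi-sign-with-fwupd-key"]}
print(resolve_default_values(entry))
# {'make_signed_source_key': 'AEC1234',
#  'enable_make_signed_source': True,
#  'make_signed_source_purpose': 'uefi'}
```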


Reducing workflow complexity
----------------------------

Being able to store overrides at the worker task level saves us from adding
too many configuration parameters to the workflows: the only required
workflow parameters are those needed to control the orchestration step.

For example, we can have configuration for the sbuild worker
task next to the configuration for the debian-pipeline workflow::

    Workflow:debian-pipeline:::
      default_values:
        ...

    Worker:sbuild::stretch:
      override_values:
        backend: incus-lxc

This shows how the ``sbuild_backend`` parameter may no longer be a required
input for the ``debian-pipeline`` workflow, though it remains available.
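
A minimal sketch of the default/override semantics used in this example
(hypothetical values; the actual merge is specified in the "Algorithm to
apply the configuration" section):

```python
# Data submitted with the work request
task_data = {"backend": "unshare", "host_architecture": "amd64"}
# Hypothetical configuration entries
default_values = {"build_components": ["any", "all"]}  # fills gaps only
override_values = {"backend": "incus-lxc"}             # always wins

configured = dict(task_data)
# Defaults only apply where the submitted data has no (non-None) value
for key, value in default_values.items():
    if configured.get(key) is None:
        configured[key] = value
# Overrides replace the submitted data unconditionally
configured.update(override_values)

print(configured)
# {'backend': 'incus-lxc', 'host_architecture': 'amd64',
#  'build_components': ['any', 'all']}
```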


Integration with tasks
----------------------

To be able to apply changes to the submitted ``task_data`` configuration,
we need to be able to know the *subject* and the *context*, which may depend on
information not available when the task is created. For example, the subject
may be derived from an artifact that is the output of a previous work request
in a workflow.

Task configuration can thus be applied only when a task becomes pending, and
the subject and context are generated at that time using the task's
:py:meth:`debusine.db.tasks.DBTask.get_task_configuration_subject_context`
method.
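
Purely as an illustration of the concept, a subject/context derivation for a
build task might look like the following. The function name, the
``task_data`` keys, and the derivation logic are all hypothetical; the real
logic lives in the method named above and may differ entirely:

```python
def get_subject_and_context(task_data):
    """Derive (subject, context) once the task's inputs are known,
    e.g. after an input artifact from a previous work request exists."""
    subject = task_data.get("source_package_name")  # e.g. from an artifact
    context = task_data.get("distribution")         # e.g. the target suite
    return subject, context


print(get_subject_and_context(
    {"source_package_name": "hello", "distribution": "trixie"}
))
# ('hello', 'trixie')
```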


Algorithm to apply the configuration
------------------------------------

Looking up relevant configuration entries
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The set of possible entries to apply to a task is queried in the database by
task type and task name. The resulting entries are filtered according to the
task's subject, context, provided tags, and required tags.

Ordering matching entries
~~~~~~~~~~~~~~~~~~~~~~~~~

The entries about to be applied to a task are sorted according to:

* The value of the ``path`` and ``position`` fields, if present

* Task type, name, subject and context, in this order:

    * ``task_type:task_name::``
    * ``task_type:task_name::context``
    * ``task_type:task_name:subject:``
    * ``task_type:task_name:subject:context``

* The database ID of the entry, as a tie-breaker to make ordering deterministic

If an entry uses templates in ``use_templates``, the referenced template
entries are placed *immediately after* the entry in the resulting ordered list.
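
The ordering rules above can be sketched as follows. The entry
representation and field names are illustrative, and the ``path``/``position``
sorting is omitted for brevity; only the specificity ordering, the
database-ID tie-breaker, and the template placement are shown:

```python
def sort_key(entry):
    """Global first, then context-only, subject-only, and the specific
    combination; ties are broken by database ID."""
    return (
        entry["subject"] is not None,
        entry["context"] is not None,
        entry["id"],
    )


def ordered_with_templates(entries):
    """Sort the entries, splicing each referenced template in immediately
    after the entry that references it."""
    result = []
    for entry in sorted(entries, key=sort_key):
        result.append(entry["name"])
        for template in entry.get("use_templates", []):
            result.append("template:" + template)
    return result


entries = [
    {"id": 3, "subject": "grub2", "context": "trixie", "name": "both"},
    {"id": 1, "subject": None, "context": None, "name": "global",
     "use_templates": ["uefi-sign"]},
    {"id": 2, "subject": "grub2", "context": None, "name": "subject"},
]
print(ordered_with_templates(entries))
# ['global', 'template:uefi-sign', 'subject', 'both']
```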

Building the set of changes to apply
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Once the entries are sorted, they are processed to build a set of default
values and override values, with the following operations::

    default_values = dict()
    override_values = dict()
    locked_values = set()
    provide_tags = set()
    require_tags = set()

    for config_item in all_items:
        # Drop all the entries referenced in `delete_values` (except
        # locked values)
        for key in config_item.delete_values:
            if key in locked_values:
                continue
            default_values.pop(key, None)
            override_values.pop(key, None)

        # Merge the default/override values into the result
        # (except locked values)
        for key, value in config_item.default_values.items():
            if key in locked_values:
                continue
            default_values[key] = value
        for key, value in config_item.override_values.items():
            if key in locked_values:
                continue
            override_values[key] = value

        # Update the set of locked values
        locked_values.update(config_item.lock_values)

        # Update the sets of provided/required tags
        provide_tags.update(config_item.provide_tags)
        require_tags.update(config_item.require_tags)

    return (default_values, override_values, provide_tags, require_tags)

Applying changes
~~~~~~~~~~~~~~~~

Once we have a set of ``default_values``, ``override_values``, and tags to
provide/require, they get applied to the data available in ``task_data``::

    new_task_data = task_data.copy()
    (
        default_values,
        override_values,
        provide_tags,
        require_tags,
    ) = get_merged_task_configuration()

    # Apply default values (add missing values, but also replace explicit
    # None values)
    for k, v in default_values.items():
        if new_task_data.get(k) is None:
            new_task_data[k] = v

    # Apply overrides
    new_task_data.update(override_values)

    # Record the provided/required tags
    tags_provided.update(provide_tags)
    tags_required.update(require_tags)

The result is stored in :py:attr:`WorkRequest.configured_task_data`, which will
be used from that point on as the task's data, while
:py:attr:`WorkRequest.task_data` remains untouched as documentation for the
initial task input.
