Task structure

Root path: src/tasks/<task_id>.

Each task should contain a data processor (to transform common datasets into task-specific datasets), methods, control methods (for quality control), and metrics (Figure 1).

graph LR
  common_dataset[Common<br/>dataset]:::anndata
  subgraph task_specific[Task-specific workflow]
    dataset_processor[/Dataset<br/>processor/]:::component
    solution[Solution]:::anndata
    masked_data[Dataset]:::anndata
    method[/Method/]:::component
    control_method[/Control<br/>method/]:::component
    output[Output]:::anndata
    metric[/Metric/]:::component
    score[Score]:::anndata
  end
  common_dataset --- dataset_processor --> masked_data & solution
  masked_data --- method --> output
  masked_data & solution --- control_method --> output
  solution & output --- metric --> score

Figure 1: Overview of a typical benchmarking workflow in an OpenProblems task. Legend: Grey rectangles are AnnData .h5ad files, purple rhomboids are Viash components.

Process dataset (src/process_dataset): This components processes common components into task-specific dataset objects. In supervised tasks, this component will usually output a solution, a training dataset and a test dataset. In unsupervised tasks, this component usually output a solution and a masked dataset.
Control methods (src/control_methods): This folder contains control (or control) components for the task. These components have the same interface as the regular methods but also receive the solution object as input. It serves as a starting point to test the relative accuracy of new methods in the task, and also as a quality control for the metrics defined in the task. A control method can either be a positive control or a negative control, which set a maximum and minimum threshold for performance, so any new method should perform better than the negative control methods and worse than the positive control method.
- A positive control is a method where the expected results are known, thus resulting in the best possible value for any metric outcome measure.
- A negative control is a simple, naive, or random method that does not rely on any sophisticated techniques or domain knowledge.
Methods (src/methods): This folder contains method components. Each method component outputs a method output object given the training and test datasets (when applicable).
Metrics (src/metrics): This folder contains metric components. Each metric component outputs one or more metric results given a solution object and a method output object.
Benchmarking pipeline (src/workflows): This folder contains a Nextflow pipeline defining the benchmarking workflow for this task.
File and component formats (src/api): This folder contains task-specific file formats and component interfaces. More specifically:
- task_info.yaml: The task name, description and other metadata.
- file_*.yaml: File format specifications for the task.
- comp_*.yaml: Component interface specifications for the task.
- thumbnail.svg: A thumbnail image for the task.
Resource generation scripts (create_datasets/resources.sh): This folder contains scripts for generating benchmarking resources required for the task.
Test resource generation scripts (create_datasets/test_resources.sh): This folder contains scripts for generating test resources for the task.