Task structure
Before defining a new task in OpenProblems, it’s important to understand the typical structure of an OpenProblems task (Figure 1).
A task typically consists of data processors, methods, control methods and metrics. Each component has a well-defined input-output interface, and the format of each AnnData file passed between components is also described.
File and component formats
Path: src/api
This folder contains task-specific file formats and component interfaces. More specifically:
file_*.yaml
: File format specifications for the task.
comp_*.yaml
: Component interface specifications for the task.
Data processors
Path: src/data_processors
This folder contains components that transform a common dataset into task-specific dataset objects. The data processor component has been provided by default. In supervised tasks, this component will usually output a solution, a training dataset and a test dataset. In unsupervised tasks, this component usually outputs a solution and a masked dataset.
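The supervised case can be illustrated with a minimal sketch. It is not the OpenProblems implementation: real components read and write AnnData files, whereas this example uses plain dictionaries as stand-ins, and the function name `process_dataset`, the keys `cells` and `labels`, and the `test_fraction` parameter are all hypothetical.

```python
import random


def process_dataset(cells, labels, test_fraction=0.25, seed=0):
    """Split a common dataset into train, test and solution objects.

    Hypothetical layout: each object is a plain dict standing in for
    an AnnData file.
    """
    rng = random.Random(seed)
    indices = list(range(len(cells)))
    rng.shuffle(indices)

    # Hold out a fraction of the cells for testing (at least one cell).
    n_test = max(1, int(len(indices) * test_fraction))
    test_idx = sorted(indices[:n_test])
    train_idx = sorted(indices[n_test:])

    # The training dataset keeps its labels.
    train = {
        "cells": [cells[i] for i in train_idx],
        "labels": [labels[i] for i in train_idx],
    }
    # The test dataset has its labels removed.
    test = {"cells": [cells[i] for i in test_idx]}
    # The solution holds the withheld labels for the test cells.
    solution = {
        "cells": [cells[i] for i in test_idx],
        "labels": [labels[i] for i in test_idx],
    }
    return train, test, solution


cells = [f"cell_{i}" for i in range(8)]
labels = ["T", "B", "T", "B", "T", "T", "B", "T"]
train, test, solution = process_dataset(cells, labels)
```

The point of the split is that methods only ever see `train` and `test`, while `solution` is reserved for control methods and metrics.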
Methods
Path: src/methods
This folder contains method components. Each method component outputs a prediction given the training and test datasets (when applicable).
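As a rough illustration of this interface, the sketch below implements a one-nearest-neighbour classifier. The dictionary layout and the names `features`, `labels` and `label_pred` are assumptions made for this example and stand in for the AnnData objects a real method component would exchange.

```python
def nearest_neighbor_method(train, test):
    """Predict a label for each test sample from its closest training sample.

    Hypothetical interface: `train` has "features" and "labels",
    `test` has only "features".
    """
    predictions = []
    for x in test["features"]:
        # Squared Euclidean distance to every training sample.
        dists = [
            sum((a - b) ** 2 for a, b in zip(x, y))
            for y in train["features"]
        ]
        nearest = dists.index(min(dists))
        predictions.append(train["labels"][nearest])
    return {"label_pred": predictions}


train = {
    "features": [[0, 0], [0, 1], [5, 5], [6, 5]],
    "labels": ["A", "A", "B", "B"],
}
test = {"features": [[0.2, 0.4], [5.5, 4.8]]}
prediction = nearest_neighbor_method(train, test)
# prediction == {"label_pred": ["A", "B"]}
```

Note that the method never receives the solution object; it sees only the training and test datasets.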
Control methods
Path: src/control_methods
This folder contains control methods for the task. These components have the same interface as the regular methods but also receive the solution object as input. They serve as a baseline for testing the relative accuracy of new methods in the task, and also as a quality control for the metrics defined in the task. A control method is either a positive control or a negative control. Together these set an upper and a lower bound on performance, so any new method should perform better than the negative control methods and worse than the positive control methods.
A positive control is a method where the expected results are known, thus resulting in the best possible value for any metric.
A negative control is a simple, naive, or random method that does not rely on any sophisticated techniques or domain knowledge.
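The two kinds of control can be sketched as follows. This is an illustrative example under assumed names, not the components shipped with any task: the dictionaries stand in for AnnData objects, and `positive_control`, `negative_control` and the keys `labels`, `cells` and `label_pred` are hypothetical.

```python
import random


def positive_control(train, test, solution):
    """Return the known answers, giving the best possible score."""
    return {"label_pred": list(solution["labels"])}


def negative_control(train, test, solution, seed=0):
    """Return labels drawn at random from those seen in training."""
    rng = random.Random(seed)
    candidates = sorted(set(train["labels"]))
    return {"label_pred": [rng.choice(candidates) for _ in test["cells"]]}


train = {"cells": ["c1", "c2", "c3"], "labels": ["A", "B", "A"]}
test = {"cells": ["c4", "c5"]}
solution = {"cells": ["c4", "c5"], "labels": ["B", "A"]}

best = positive_control(train, test, solution)
worst = negative_control(train, test, solution)
```

Both functions take the same arguments as a regular method plus the solution object. The negative control accepts the solution only to keep the interface uniform and does not use it.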
Metrics
Path: src/metrics
This folder contains metric components. Each metric component outputs one or more metric results given a solution object and a method output object.
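A metric component can be pictured as a function of the solution and a method's output. The sketch below computes classification accuracy. The dictionary keys `labels`, `label_pred`, `metric_ids` and `metric_values` are assumptions chosen for this example rather than a confirmed file format.

```python
def accuracy_metric(solution, prediction):
    """Return the fraction of predicted labels that match the solution."""
    truth = solution["labels"]
    pred = prediction["label_pred"]
    if len(truth) != len(pred):
        raise ValueError("solution and prediction differ in length")
    correct = sum(t == p for t, p in zip(truth, pred))
    # One metric here, but lists leave room for several results.
    return {
        "metric_ids": ["accuracy"],
        "metric_values": [correct / len(truth)],
    }


solution = {"labels": ["A", "B", "A", "B"]}
prediction = {"label_pred": ["A", "B", "B", "B"]}
result = accuracy_metric(solution, prediction)
# result == {"metric_ids": ["accuracy"], "metric_values": [0.75]}
```

Passing the solution labels themselves as the prediction, as a positive control would, yields an accuracy of 1.0, which is a quick sanity check that the metric behaves as intended.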
Benchmarking pipeline
Path: src/workflows
This folder contains a Nextflow workflow that defines the benchmarking pipeline for this task.