A metric is a quantitative measure used to evaluate how well the different methods solve the task's problem.
This guide shows you how to create a new metric as a Viash component, with examples in both Python and R. The task template repository is used throughout the guide, so make sure to replace any occurrence of "task_template" with the name of your task.
Make sure you have followed the “Getting started” guide.
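Use the create_*_metric.sh scripts in the scripts/create_component directory of the repository to start creating a new metric: scripts/create_component/create_python_metric.sh for Python or create_r_metric.sh for R. Open the script, update the --name parameter to the desired name of your metric, and run it. For Python, the script prints: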
Check inputs
Check language
Check API file
Read API file
Create output dir
Create config
Create script
Done!
This will create a new folder at src/metrics/my_python_metric containing a Viash config and a script:

```
src/metrics/my_python_metric
├── script.py          Script for running the metric.
├── config.vsh.yaml    Config file for the metric.
└── ...                Optional additional resources.
```
For R, the script prints:
Check inputs
Check language
Check API file
Read API file
Create output dir
Create config
Create script
Done!
This will create a new folder at src/metrics/my_r_metric containing a Viash config and a script:

```
src/metrics/my_r_metric
├── script.R           Script for running the metric.
├── config.vsh.yaml    Config file for the metric.
└── ...                Optional additional resources.
```
Change the value of --name to a unique name for your metric. It must match the regex [a-z][a-z0-9_]* (snake_case).
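To check a candidate name before running the script, you can test it against the same pattern. A minimal sketch; the helper is_valid_component_name is hypothetical and not part of the OpenProblems scripts:

```python
import re

# Pattern from this guide: a lowercase letter followed by lowercase letters,
# digits, or underscores (snake_case).
NAME_PATTERN = re.compile(r"[a-z][a-z0-9_]*")


def is_valid_component_name(name: str) -> bool:
    """Return True if the whole name matches the pattern (hypothetical helper)."""
    return NAME_PATTERN.fullmatch(name) is not None


for candidate in ["my_python_metric", "MyMetric", "2nd_metric", "my-metric"]:
    print(candidate, is_valid_component_name(candidate))
```

Using fullmatch rather than match matters here: match would accept "my-metric" because its prefix "my" already satisfies the pattern.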
Some tasks have multiple metric subtypes (e.g. batch_integration), which requires you to use a different value for --type that corresponds to the desired metric subtype.
The Viash config contains your metric's metadata, the script used to run it, and its required dependencies.
This is what the config.vsh.yaml generated by the create_component component looks like:
config.vsh.yaml (Python)

```yaml
# The API specifies which type of component this is.
# It contains specifications for:
#   - The input/output files
#   - Common parameters
#   - A unit test
__merge__: ../../api/comp_metric.yaml

# A unique identifier for your component (required).
# Can contain only lowercase letters or underscores.
name: my_python_metric

# Metadata for your component
info:
  metrics:
    # A unique identifier for your metric (required).
    # Can contain only lowercase letters or underscores.
    - name: my_python_metric
      # A relatively short label, used when rendering visualisations (required)
      label: My Python Metric
      # A one sentence summary of how this metric works (required). Used when
      # rendering summary tables.
      summary: "FILL IN: A one sentence summary of this metric."
      # A multi-line description of how this component works (required). Used
      # when rendering reference documentation.
      description: |
        FILL IN: A (multi-line) description of how this metric works.
      # A reference key from the bibtex library at src/common/library.bib (required).
      reference: bibtex_reference_key
      # URL to the documentation for this metric (required).
      documentation_url: https://url.to/the/documentation
      # URL to the code repository for this metric (required).
      repository_url: https://github.com/organisation/repository
      # The minimum possible value for this metric (required)
      min: 0
      # The maximum possible value for this metric (required)
      max: 1
      # Whether a higher value represents a 'better' solution (required)
      maximize: true

# Component-specific parameters (optional)
# arguments:
#   - name: "--n_neighbors"
#     type: "integer"
#     default: 5
#     description: Number of neighbors to use.

# Resources required to run the component
resources:
  # The script of your component (required)
  - type: python_script
    path: script.py
  # Additional resources your script needs (optional)
  # - type: file
  #   path: weights.pt

engines:
  # Specifications for the Docker image for this component.
  - type: docker
    image: ghcr.io/openproblems-bio/base_images/python:1.1.0
    # Add custom dependencies here (optional). For more information, see
    # https://viash.io/reference/config/engines/docker/#setup .
    # setup:
    #   - type: python
    #     packages: scib==1.1.5

runners:
  # This platform allows running the component natively
  - type: executable
  # Allows turning the component into a Nextflow module / pipeline.
  - type: nextflow
    directives:
      label: [midtime,midmem,midcpu]
```
config.vsh.yaml (R)

```yaml
# The API specifies which type of component this is.
# It contains specifications for:
#   - The input/output files
#   - Common parameters
#   - A unit test
__merge__: ../../api/comp_metric.yaml

# A unique identifier for your component (required).
# Can contain only lowercase letters or underscores.
name: my_r_metric

# Metadata for your component
info:
  metrics:
    # A unique identifier for your metric (required).
    # Can contain only lowercase letters or underscores.
    - name: my_r_metric
      # A relatively short label, used when rendering visualisations (required)
      label: My R Metric
      # A one sentence summary of how this metric works (required). Used when
      # rendering summary tables.
      summary: "FILL IN: A one sentence summary of this metric."
      # A multi-line description of how this component works (required). Used
      # when rendering reference documentation.
      description: |
        FILL IN: A (multi-line) description of how this metric works.
      # A reference key from the bibtex library at src/common/library.bib (required).
      reference: bibtex_reference_key
      # URL to the documentation for this metric (required).
      documentation_url: https://url.to/the/documentation
      # URL to the code repository for this metric (required).
      repository_url: https://github.com/organisation/repository
      # The minimum possible value for this metric (required)
      min: 0
      # The maximum possible value for this metric (required)
      max: 1
      # Whether a higher value represents a 'better' solution (required)
      maximize: true

# Component-specific parameters (optional)
# arguments:
#   - name: "--n_neighbors"
#     type: "integer"
#     default: 5
#     description: Number of neighbors to use.

# Resources required to run the component
resources:
  # The script of your component (required)
  - type: r_script
    path: script.R
  # Additional resources your script needs (optional)
  # - type: file
  #   path: weights.pt

engines:
  # Specifications for the Docker image for this component.
  - type: docker
    image: ghcr.io/openproblems-bio/base_images/r:1.1.0
    # Add custom dependencies here (optional). For more information, see
    # https://viash.io/reference/config/engines/docker/#setup .
    # setup:
    #   - type: r
    #     packages: tidyverse

runners:
  # This platform allows running the component natively
  - type: executable
  # Allows turning the component into a Nextflow module / pipeline.
  - type: nextflow
    directives:
      label: [midtime,midmem,midcpu]
```
Make sure that all fields marked as required in the config file are filled in. A metric component can report several metric values; each one is listed as a separate entry under info.metrics.
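Before opening a pull request, it can help to scan the parsed config for fields that still hold template values. The sketch below works on a config that has already been loaded into a dict (for example with PyYAML's yaml.safe_load); the helper find_metric_problems and its placeholder list are hypothetical and not part of OpenProblems or Viash, and the field list is taken from the "(required)" comments in the generated config.

```python
# Fields marked "(required)" for each entry under info.metrics in the
# config.vsh.yaml generated by create_component.
REQUIRED_METRIC_FIELDS = [
    "name", "label", "summary", "description", "reference",
    "documentation_url", "repository_url", "min", "max", "maximize",
]

# Placeholder values written by the template that still need replacing.
TEMPLATE_PLACEHOLDERS = {
    "bibtex_reference_key",
    "https://url.to/the/documentation",
    "https://github.com/organisation/repository",
}


def find_metric_problems(config: dict) -> list:
    """Return readable problems with info.metrics (hypothetical helper)."""
    metrics = config.get("info", {}).get("metrics", [])
    if not metrics:
        return ["info.metrics is empty"]
    problems = []
    for i, metric in enumerate(metrics):
        label = metric.get("name", f"metric #{i}")
        for field in REQUIRED_METRIC_FIELDS:
            value = metric.get(field)
            if value is None:
                problems.append(f"{label}: '{field}' is missing")
            elif isinstance(value, str) and (
                value.strip().startswith("FILL IN") or value in TEMPLATE_PLACEHOLDERS
            ):
                problems.append(f"{label}: '{field}' still has its template value")
        # min and max should describe a valid range.
        low, high = metric.get("min"), metric.get("max")
        if isinstance(low, (int, float)) and isinstance(high, (int, float)) and low > high:
            problems.append(f"{label}: 'min' is larger than 'max'")
        # maximize must be a YAML boolean (true/false).
        if "maximize" in metric and not isinstance(metric["maximize"], bool):
            problems.append(f"{label}: 'maximize' should be true or false")
    return problems
```

Run on the unedited template output, this reports the summary, description, reference, and both URLs as still holding their template values.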
Each component has its own set of dependencies, because different components might have conflicting dependencies.
For your convenience, several base images are available for Python and R scripts. They can be found in the OpenProblems base_images Docker repository; click on a package there to view the image URL you need to use. You are not required to use these images, but if you use a different one, install the required packages listed below to make sure OpenProblems works properly.
openproblems/base_python: Base image for Python scripts.
openproblems/base_r: Base image for R scripts.
openproblems/base_pytorch_nvidia: Base image for scripts that use PyTorch with NVIDIA GPU support.
openproblems/base_tensorflow_nvidia: Base image for scripts that use TensorFlow with NVIDIA GPU support.
Update the setup definition in the engines section of the config file. This section describes the packages that need to be installed in the Docker image for your metric to run.
If you’re using a custom image, use the following minimum setup:

```yaml
engines:
  - type: docker
    image: your custom image
    setup:
      - type: apt
        packages:
          - procps
          - libhdf5-dev
          - libgeos-dev
          - python3
          - python3-pip
          - python3-dev
          - python-is-python3
      - type: python
        packages:
          - rpy2
          - anndata~=0.10.0
          - scanpy~=1.10.0
          - pyyaml
          - requests
          - jsonschema
        github: openproblems-bio/core#subdirectory=packages/python/openproblems
      - type: r
        packages:
          - anndata
          - BiocManager
          - reticulate
          - bit64
        github:
          - openproblems-bio/core/packages/r/openproblems
```
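To confirm that a custom image provides the Python side of this minimum setup, you could run a short check inside the container. A minimal sketch, assuming the import names listed below (note that pyyaml is imported as yaml); it does not check the apt or R packages, nor the openproblems package installed from GitHub, and missing_modules is a hypothetical helper:

```python
import importlib.util

# Import names for the PyPI packages in the minimum setup above.
MINIMUM_MODULES = ["rpy2", "anndata", "scanpy", "yaml", "requests", "jsonschema"]


def missing_modules(modules: list) -> list:
    """Return the top-level modules that cannot be found in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(MINIMUM_MODULES)
    if missing:
        print("Missing Python modules:", ", ".join(missing))
    else:
        print("All minimum Python modules are installed.")
```

find_spec only locates a module without importing it, so the check stays fast even for heavy packages such as scanpy.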
Please check out this guide for more information on how to add extra package dependencies.
Tip: after making changes to a component’s dependencies, you need to rebuild its Docker container. When the rebuild starts, Viash reports it as follows:
[notice] Building container 'ghcr.io/openproblems-bio/task_template/metrics/my_python_metric:dev' with Dockerfile
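A component’s script typically has five sections: imports, the Viash code block, reading the input data, computing the metric, and writing the output.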
This is what the script generated by the create_component component looks like:
script.py

```python
import anndata as ad

## VIASH START
# Note: this section is auto-generated by viash at runtime. To edit it, make changes
# in config.vsh.yaml and then run `viash config inject config.vsh.yaml`.
par = {
    'input_solution': 'resources_test/.../solution.h5ad',
    'input_prediction': 'resources_test/.../prediction.h5ad',
    'output': 'output.h5ad'
}
meta = {
    'name': 'my_python_metric'
}
## VIASH END

print('Reading input files', flush=True)
input_solution = ad.read_h5ad(par['input_solution'])
input_prediction = ad.read_h5ad(par['input_prediction'])

print('Compute metrics', flush=True)
# metric_ids and metric_values can have length > 1
# but should be of equal length
uns_metric_ids = ['my_python_metric']
uns_metric_values = [0.5]

print("Write output AnnData to file", flush=True)
output = ad.AnnData(
    uns={
        # Identifiers copied from the prediction (assumed to be present in its .uns)
        'dataset_id': input_prediction.uns['dataset_id'],
        'normalization_id': input_prediction.uns['normalization_id'],
        'method_id': input_prediction.uns['method_id'],
        'metric_ids': uns_metric_ids,
        'metric_values': uns_metric_values
    }
)
output.write_h5ad(par['output'], compression='gzip')
```
script.R

```r
library(anndata)

## VIASH START
par <- list(
  input_solution = "resources_test/.../solution.h5ad",
  input_prediction = "resources_test/.../prediction.h5ad",
  output = "output.h5ad"
)
meta <- list(
  name = "my_r_metric"
)
## VIASH END

cat("Reading input files\n")
input_solution <- anndata::read_h5ad(par[["input_solution"]])
input_prediction <- anndata::read_h5ad(par[["input_prediction"]])

cat("Compute metrics\n")
# metric_ids and metric_values can have length > 1
# but should be of equal length
uns_metric_ids <- c("my_r_metric")
uns_metric_values <- c(0.5)

cat("Write output AnnData to file\n")
output <- anndata::AnnData(
  uns = list(
    # Identifiers copied from the prediction (assumed to be present in its uns)
    dataset_id = input_prediction$uns[["dataset_id"]],
    normalization_id = input_prediction$uns[["normalization_id"]],
    method_id = input_prediction$uns[["method_id"]],
    metric_ids = uns_metric_ids,
    metric_values = uns_metric_values
  )
)
output$write_h5ad(par[["output"]], compression = "gzip")
```
In the top section of the script you define which packages or libraries the metric needs. If you add a new or different package, also add the dependency to the setup field of config.vsh.yaml (see above).
The Viash code block facilitates prototyping: it lets you execute the script directly by running python script.py (or Rscript script.R for R users). Note that everything between “VIASH START” and “VIASH END” is removed and replaced with a CLI argument parser when Viash builds the component.
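To make clear which lines are affected, the sketch below swaps everything from the "## VIASH START" line through the "## VIASH END" line for other code. It is only an illustration, not how Viash actually generates its parser; replace_viash_block is a hypothetical helper and parse_cli_arguments in the example stands in for the generated code.

```python
import re

# From the "## VIASH START" line through the "## VIASH END" line (inclusive).
VIASH_BLOCK = re.compile(
    r"^## VIASH START$.*?^## VIASH END$\n?", re.DOTALL | re.MULTILINE
)


def replace_viash_block(script: str, replacement: str) -> str:
    """Swap the prototyping block for `replacement` (illustration only)."""
    # A function as repl keeps backslashes in `replacement` from being interpreted.
    return VIASH_BLOCK.sub(lambda _match: replacement, script, count=1)


prototype = (
    "import anndata as ad\n"
    "## VIASH START\n"
    "par = {'output': 'output.h5ad'}\n"
    "## VIASH END\n"
    "print(par['output'])\n"
)
print(replace_viash_block(prototype, "par = parse_cli_arguments()\n"))
```

The imports above the block and the code below it are left untouched, which is why values used by the rest of the script should come from par rather than be hard-coded.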
Within the Viash code block, the par dictionary contains all the arguments defined in the config.vsh.yaml file (including those from the file referenced by __merge__). When adding an argument to the par dictionary, also add it to the arguments section of config.vsh.yaml.
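Viash exposes an argument named --n_neighbors as par["n_neighbors"], so keys added to par while prototyping can be compared with the declared arguments. A minimal sketch; undeclared_par_keys is a hypothetical helper, and the argument lists below are assumed for illustration (the real API arguments come from the file referenced by __merge__):

```python
def undeclared_par_keys(par: dict, declared_arguments: list) -> list:
    """Return par keys with no matching argument in the config (hypothetical helper)."""
    # "--n_neighbors" in the config corresponds to par["n_neighbors"].
    declared = {spec["name"].lstrip("-") for spec in declared_arguments}
    return sorted(key for key in par if key not in declared)


# Assumed: the arguments merged in from the API file.
api_arguments = [
    {"name": "--input_solution", "type": "file"},
    {"name": "--input_prediction", "type": "file"},
    {"name": "--output", "type": "file"},
]
# The optional argument from the commented-out example in config.vsh.yaml.
component_arguments = [
    {"name": "--n_neighbors", "type": "integer", "default": 5},
]
# A VIASH START block that adds n_neighbors while prototyping.
par = {
    "input_solution": "resources_test/.../solution.h5ad",
    "input_prediction": "resources_test/.../prediction.h5ad",
    "output": "output.h5ad",
    "n_neighbors": 5,
}

print(undeclared_par_keys(par, api_arguments))  # ['n_neighbors']
print(undeclared_par_keys(par, api_arguments + component_arguments))  # []
```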
The next section reads any input AnnData files passed to the component.
The compute section is the most important part of your script, as it defines the core functionality of the component: it processes the input data to compute the metric values for the task at hand.
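As an illustration, here is a standard-library-only sketch of what the compute section could do for a label-accuracy metric. The inputs are plain dicts mapping cell IDs to labels; in a real script they would be extracted from the AnnData objects, and which fields hold the labels depends on your task's API, so none are assumed here. The accuracy helper and the example labels are hypothetical.

```python
def accuracy(true_labels: dict, predicted_labels: dict) -> float:
    """Fraction of cells whose predicted label equals the true label."""
    if not true_labels:
        raise ValueError("no cells to evaluate")
    missing = set(true_labels) - set(predicted_labels)
    if missing:
        raise ValueError(f"{len(missing)} cells have no prediction")
    correct = sum(predicted_labels[cell] == label for cell, label in true_labels.items())
    return correct / len(true_labels)


# Hypothetical labels, keyed by cell ID so the order of cells does not matter.
solution = {"cell_1": "B cell", "cell_2": "T cell", "cell_3": "T cell", "cell_4": "NK cell"}
prediction = {"cell_1": "B cell", "cell_2": "T cell", "cell_3": "NK cell", "cell_4": "NK cell"}

# One id per value; both lists must have the same length.
uns_metric_ids = ["accuracy"]
uns_metric_values = [accuracy(solution, prediction)]
print(uns_metric_ids, uns_metric_values)  # ['accuracy'] [0.75]
```

Matching cells by ID rather than by position guards against the solution and prediction files listing cells in a different order.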
Finally, the output is stored in an AnnData object and written to an .h5ad file. Its format is specified by the API file referenced in the __merge__ field of the config file.
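The unit-test output for the task template's accuracy metric lists dataset_id, method_id, metric_ids, metric_values, and normalization_id in the output's uns. A quick check of the uns dictionary before writing can catch a missing field or mismatched lists early. A minimal sketch; check_metric_uns is a hypothetical helper, and the API file remains the authoritative specification:

```python
# uns fields seen in the unit-test output of a metric component.
REQUIRED_UNS_KEYS = [
    "dataset_id", "normalization_id", "method_id", "metric_ids", "metric_values",
]


def check_metric_uns(uns: dict) -> None:
    """Raise ValueError if `uns` is not shaped like a metric output (hypothetical)."""
    missing = [key for key in REQUIRED_UNS_KEYS if key not in uns]
    if missing:
        raise ValueError(f"missing uns keys: {missing}")
    if len(uns["metric_ids"]) != len(uns["metric_values"]):
        raise ValueError("metric_ids and metric_values must have equal length")
    for value in uns["metric_values"]:
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"metric value {value!r} is not a number")
```

For example, call check_metric_uns on the dictionary you pass as uns to ad.AnnData, just before write_h5ad.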
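Your component’s API file contains the unit tests that check whether your component works and whether its output is in the correct format.
You can test your component with the viash test command, passing the path to its config file, for example viash test src/metrics/my_python_metric/config.vsh.yaml. The output below comes from testing the task template's accuracy metric: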
Running tests in temporary directory: '/tmp/viash_test_accuracy_18442488607226942578'
====================================================================
+/tmp/viash_test_accuracy_18442488607226942578/build_engine_environment/accuracy ---verbosity 6 ---setup cachedbuild ---engine docker
[notice] Building container 'ghcr.io/openproblems-bio/task_template/metrics/accuracy:test' with Dockerfile
[info] docker build -t 'ghcr.io/openproblems-bio/task_template/metrics/accuracy:test' '/tmp/viash_test_accuracy_18442488607226942578/build_engine_environment' -f '/tmp/viash_test_accuracy_18442488607226942578/build_engine_environment/tmp/dockerbuild-accuracy-8B6yA6/Dockerfile'
#0 building with "default" instance using docker driver
#1 [internal] load build definition from Dockerfile
#1 transferring dockerfile: 565B done
#1 DONE 0.0s
#2 [internal] load metadata for docker.io/openproblems/base_python:1.0.0
#2 DONE 0.1s
#3 [internal] load .dockerignore
#3 transferring context: 2B done
#3 DONE 0.0s
#4 [1/2] FROM docker.io/openproblems/base_python:1.0.0@sha256:bbb0a093e275498bf905237c8cd26124d436f5d35d0f8dd1749c06d0a0a2e88c
#4 DONE 0.0s
#5 [2/2] RUN pip install --upgrade pip && pip install --upgrade --no-cache-dir "scikit-learn"
#5 CACHED
#6 exporting to image
#6 exporting layers done
#6 writing image sha256:30c71a3eaf5a667e179cb53d2b093ed945dd6133e92d64d00bad37d76c8ca2a4 done
#6 naming to ghcr.io/openproblems-bio/task_template/metrics/accuracy:test done
#6 DONE 0.0s
====================================================================
+/tmp/viash_test_accuracy_18442488607226942578/test_run_and_check_output/test_executable
>> Running test 'run'
>> Checking whether input files exist
>> Running script as test
Reading input files
Encode labels
Compute metrics
Write output AnnData to file
>> Checking whether output file exists
>> Reading h5ad files and checking formats
Reading and checking output
AnnData object with n_obs × n_vars = 0 × 0
uns: 'dataset_id', 'method_id', 'metric_ids', 'metric_values', 'normalization_id'
All checks succeeded!
====================================================================
+/tmp/viash_test_accuracy_18442488607226942578/test_check_config/test_executable
Load config data
Check .namespace
Check .info.type
Check component metadata
Check references fields
Check Nextflow runner
All checks succeeded!
====================================================================
SUCCESS! All 2 out of 2 test scripts succeeded!
Cleaning up temporary directory
Visit “Run tests” for more information on running unit tests and how to interpret common error messages.
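You can also run your component on local files using the viash run command. For example (replace the placeholder paths with your own files):
viash run src/metrics/my_python_metric/config.vsh.yaml -- --input_solution path/to/solution.h5ad --input_prediction path/to/prediction.h5ad --output output.h5ad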
If your component works, please create a pull request.