Dataset workflows
The dataset processing pipeline uses dataset loaders to create raw dataset files (Figure 1). The raw dataset files are then processed to generate common dataset files. Common dataset files are used in one or more tasks.
Directory structure
- Dataset file and component formats (src/datasets/api): This folder contains specifications for dataset file formats and component interfaces. This documentation page was generated mostly by reading in these files.
- Dataset loader (src/datasets/loaders): This folder contains components to load and format datasets from various sources.
- Dataset normalization (src/datasets/normalization): This folder contains various dataset normalization methods.
- Dataset processors (src/datasets/processors): This folder contains components for processing datasets, such as computing a kNN graph, a PCA embedding or an HVG selection, or subsetting.
- Resource generation scripts (src/common/resources_scripts): This folder contains scripts for generating the datasets using the dataset loaders, normalization methods and processors.
- Test resource generation scripts (src/common/resources_test_scripts): This folder contains scripts for generating test resources.
Component type: Dataset loader
Path: src/datasets/loaders
A component which generates a “Raw dataset”.
Arguments:
Name | Type | Description |
---|---|---|
--output | file | (Output) An unprocessed dataset as output by a dataset loader. |
File format: Raw dataset
An unprocessed dataset as output by a dataset loader.
Example file: resources_test/common/pancreas/raw.h5ad
Description:
This dataset contains raw counts and metadata as output by a dataset loader.
The format of this file is derived from the CELLxGENE schema v4.0.0.
Format:
AnnData object
obs: 'dataset_id', 'assay', 'assay_ontology_term_id', 'cell_type', 'cell_type_ontology_term_id', 'development_stage', 'development_stage_ontology_term_id', 'disease', 'disease_ontology_term_id', 'donor_id', 'is_primary_data', 'organism', 'organism_ontology_term_id', 'self_reported_ethnicity', 'self_reported_ethnicity_ontology_term_id', 'sex', 'sex_ontology_term_id', 'suspension_type', 'tissue', 'tissue_ontology_term_id', 'tissue_general', 'tissue_general_ontology_term_id', 'batch', 'soma_joinid'
var: 'feature_id', 'feature_name', 'soma_joinid'
layers: 'counts'
uns: 'dataset_id', 'dataset_name', 'dataset_url', 'dataset_reference', 'dataset_summary', 'dataset_description', 'dataset_organism'
Slot description:
Slot | Type | Description |
---|---|---|
obs["dataset_id"] | string | (Optional) Identifier for the dataset from which the cell data is derived, useful for tracking and referencing purposes. |
obs["assay"] | string | (Optional) Type of assay used to generate the cell data, indicating the methodology or technique employed. |
obs["assay_ontology_term_id"] | string | (Optional) Experimental Factor Ontology (EFO:) term identifier for the assay, providing a standardized reference to the assay type. |
obs["cell_type"] | string | (Optional) Classification of the cell type based on its characteristics and function within the tissue or organism. |
obs["cell_type_ontology_term_id"] | string | (Optional) Cell Ontology (CL:) term identifier for the cell type, offering a standardized reference to the specific cell classification. |
obs["development_stage"] | string | (Optional) Stage of development of the organism or tissue from which the cell is derived, indicating its maturity or developmental phase. |
obs["development_stage_ontology_term_id"] | string | (Optional) Ontology term identifier for the developmental stage, providing a standardized reference to the organism’s developmental phase. If the organism is human (organism_ontology_term_id == 'NCBITaxon:9606'), then the Human Developmental Stages (HsapDv:) ontology is used. If the organism is mouse (organism_ontology_term_id == 'NCBITaxon:10090'), then the Mouse Developmental Stages (MmusDv:) ontology is used. Otherwise, the Uberon (UBERON:) ontology is used. |
obs["disease"] | string | (Optional) Information on any disease or pathological condition associated with the cell or donor. |
obs["disease_ontology_term_id"] | string | (Optional) Ontology term identifier for the disease, enabling standardized disease classification and referencing. Must be a term from the Mondo Disease Ontology (MONDO:), or PATO:0000461 from the Phenotype And Trait Ontology (PATO:). |
obs["donor_id"] | string | (Optional) Identifier for the donor from whom the cell sample is obtained. |
obs["is_primary_data"] | boolean | (Optional) Indicates whether the data is primary (directly obtained from experiments) or has been computationally derived from other primary data. |
obs["organism"] | string | (Optional) Organism from which the cell sample is obtained. |
obs["organism_ontology_term_id"] | string | (Optional) Ontology term identifier for the organism, providing a standardized reference for the organism. Must be a term from the NCBI Taxonomy Ontology (NCBITaxon:) which is a child of NCBITaxon:33208. |
obs["self_reported_ethnicity"] | string | (Optional) Ethnicity of the donor as self-reported, relevant for studies considering genetic diversity and population-specific traits. |
obs["self_reported_ethnicity_ontology_term_id"] | string | (Optional) Ontology term identifier for the self-reported ethnicity, providing a standardized reference for ethnic classifications. If the organism is human (organism_ontology_term_id == 'NCBITaxon:9606'), then the Human Ancestry Ontology (HANCESTRO:) is used. |
obs["sex"] | string | (Optional) Biological sex of the donor or source organism, crucial for studies involving sex-specific traits or conditions. |
obs["sex_ontology_term_id"] | string | (Optional) Ontology term identifier for the biological sex, ensuring standardized classification of sex. Only PATO:0000383, PATO:0000384 and PATO:0001340 are allowed. |
obs["suspension_type"] | string | (Optional) Type of suspension or medium in which the cells were stored or processed, important for understanding cell handling and conditions. |
obs["tissue"] | string | (Optional) Specific tissue from which the cells were derived, key for context and specificity in cell studies. |
obs["tissue_ontology_term_id"] | string | (Optional) Ontology term identifier for the tissue, providing a standardized reference for the tissue type. For organoid or tissue samples, the Uber-anatomy ontology (UBERON:) is used. Each term ID must be a child term of UBERON:0001062 (anatomical entity). For cell cultures, the Cell Ontology (CL:) is used. The term ID cannot be CL:0000255, CL:0000257 or CL:0000548. |
obs["tissue_general"] | string | (Optional) General category or classification of the tissue, useful for broader grouping and comparison of cell data. |
obs["tissue_general_ontology_term_id"] | string | (Optional) Ontology term identifier for the general tissue category, aiding in standardizing and grouping tissue types. For organoid or tissue samples, the Uber-anatomy ontology (UBERON:) is used. Each term ID must be a child term of UBERON:0001062 (anatomical entity). For cell cultures, the Cell Ontology (CL:) is used. The term ID cannot be CL:0000255, CL:0000257 or CL:0000548. |
obs["batch"] | string | (Optional) A batch identifier. This label is very context-dependent and may be a combination of the tissue, assay, donor, etc. |
obs["soma_joinid"] | integer | (Optional) If the dataset was retrieved from CELLxGENE Census, this is a unique identifier for the cell. |
var["feature_id"] | string | (Optional) Unique identifier for the feature, usually an Ensembl gene ID. |
var["feature_name"] | string | A human-readable name for the feature, usually a gene symbol. |
var["soma_joinid"] | integer | (Optional) If the dataset was retrieved from CELLxGENE Census, this is a unique identifier for the feature. |
layers["counts"] | integer | Raw counts. |
uns["dataset_id"] | string | A unique identifier for the dataset. This is different from the obs.dataset_id field, which is the identifier for the dataset from which the cell data is derived. |
uns["dataset_name"] | string | A human-readable name for the dataset. |
uns["dataset_url"] | string | (Optional) Link to the original source of the dataset. |
uns["dataset_reference"] | string | (Optional) BibTeX reference of the paper in which the dataset was published. |
uns["dataset_summary"] | string | Short description of the dataset. |
uns["dataset_description"] | string | Long description of the dataset. |
uns["dataset_organism"] | string | (Optional) The organism of the sample in the dataset. |
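The slots not marked (Optional) in the table above can be checked mechanically. The following is a minimal sketch, not the project's own validation code: the function name and the stand-in object are assumptions made for illustration, and the example values are made up. It relies only on the slot containers supporting `key in container`, which holds for plain dicts and should also hold for the `var`, `layers` and `uns` attributes of an AnnData object.

```python
from types import SimpleNamespace

# Slots that are NOT marked "(Optional)" in the raw dataset slot table.
REQUIRED_RAW_SLOTS = {
    "var": ["feature_name"],
    "layers": ["counts"],
    "uns": ["dataset_id", "dataset_name", "dataset_summary", "dataset_description"],
}


def missing_required_slots(adata, required=REQUIRED_RAW_SLOTS):
    """Return 'container["key"]' strings for every required slot that is absent."""
    missing = []
    for container, keys in required.items():
        present = getattr(adata, container)
        for key in keys:
            # `key in present` works for dicts and for DataFrame columns alike.
            if key not in present:
                missing.append(f'{container}["{key}"]')
    return missing


# Hypothetical stand-in with dict-like slots and made-up values.
example = SimpleNamespace(
    var={"feature_id": [], "feature_name": []},
    layers={"counts": []},
    uns={"dataset_id": "pancreas", "dataset_name": "Pancreas"},
)
print(missing_required_slots(example))
# ['uns["dataset_summary"]', 'uns["dataset_description"]']
```

An empty list means every required slot is present; the check says nothing about types or contents.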
Component type: Dataset normalization
Path: src/datasets/normalization
A normalization method which processes the raw counts into a normalized dataset.
Arguments:
Name | Type | Description |
---|---|---|
--input | file | An unprocessed dataset as output by a dataset loader. |
--output | file | (Output) A normalized dataset. |
--normalization_id | string | (Optional) The normalization id to store in the dataset metadata. If not specified, the functionality name will be used. |
--layer_output | string | (Optional) The name of the layer in which to store the normalized data. Default: normalized. |
--obs_size_factors | string | (Optional) In which .obs slot to store the size factors (if any). Default: size_factors. |
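To make the defaults in the table above concrete, here is a small sketch that mirrors the arguments with Python's `argparse`. This is only an illustration of the interface: how the components actually parse their arguments is not stated here, and the component name `log_cp10k` and the file names are hypothetical.

```python
import argparse


def build_parser(functionality_name):
    """Build a parser mirroring the normalization arguments listed in the table."""
    parser = argparse.ArgumentParser(prog=functionality_name)
    parser.add_argument("--input", required=True, help="Raw dataset")
    parser.add_argument("--output", required=True, help="Normalized dataset")
    # Falls back to the functionality name when not specified.
    parser.add_argument("--normalization_id", default=functionality_name)
    parser.add_argument("--layer_output", default="normalized")
    parser.add_argument("--obs_size_factors", default="size_factors")
    return parser


# "log_cp10k" is a hypothetical component name used for illustration only.
args = build_parser("log_cp10k").parse_args(
    ["--input", "raw.h5ad", "--output", "normalized.h5ad"]
)
print(args.normalization_id, args.layer_output, args.obs_size_factors)
# log_cp10k normalized size_factors
```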
File format: Normalized dataset
A normalized dataset.
Example file: resources_test/common/pancreas/normalized.h5ad
Description:
This dataset contains raw counts, normalized expression values and metadata.
The format of this file is derived from the CELLxGENE schema v4.0.0.
Format:
AnnData object
obs: 'dataset_id', 'assay', 'assay_ontology_term_id', 'cell_type', 'cell_type_ontology_term_id', 'development_stage', 'development_stage_ontology_term_id', 'disease', 'disease_ontology_term_id', 'donor_id', 'is_primary_data', 'organism', 'organism_ontology_term_id', 'self_reported_ethnicity', 'self_reported_ethnicity_ontology_term_id', 'sex', 'sex_ontology_term_id', 'suspension_type', 'tissue', 'tissue_ontology_term_id', 'tissue_general', 'tissue_general_ontology_term_id', 'batch', 'soma_joinid', 'size_factors'
var: 'feature_id', 'feature_name', 'soma_joinid'
layers: 'counts', 'normalized'
uns: 'dataset_id', 'dataset_name', 'dataset_url', 'dataset_reference', 'dataset_summary', 'dataset_description', 'dataset_organism', 'normalization_id'
Slot description:
Slot | Type | Description |
---|---|---|
obs["dataset_id"] | string | (Optional) Identifier for the dataset from which the cell data is derived, useful for tracking and referencing purposes. |
obs["assay"] | string | (Optional) Type of assay used to generate the cell data, indicating the methodology or technique employed. |
obs["assay_ontology_term_id"] | string | (Optional) Experimental Factor Ontology (EFO:) term identifier for the assay, providing a standardized reference to the assay type. |
obs["cell_type"] | string | (Optional) Classification of the cell type based on its characteristics and function within the tissue or organism. |
obs["cell_type_ontology_term_id"] | string | (Optional) Cell Ontology (CL:) term identifier for the cell type, offering a standardized reference to the specific cell classification. |
obs["development_stage"] | string | (Optional) Stage of development of the organism or tissue from which the cell is derived, indicating its maturity or developmental phase. |
obs["development_stage_ontology_term_id"] | string | (Optional) Ontology term identifier for the developmental stage, providing a standardized reference to the organism’s developmental phase. If the organism is human (organism_ontology_term_id == 'NCBITaxon:9606'), then the Human Developmental Stages (HsapDv:) ontology is used. If the organism is mouse (organism_ontology_term_id == 'NCBITaxon:10090'), then the Mouse Developmental Stages (MmusDv:) ontology is used. Otherwise, the Uberon (UBERON:) ontology is used. |
obs["disease"] | string | (Optional) Information on any disease or pathological condition associated with the cell or donor. |
obs["disease_ontology_term_id"] | string | (Optional) Ontology term identifier for the disease, enabling standardized disease classification and referencing. Must be a term from the Mondo Disease Ontology (MONDO:), or PATO:0000461 from the Phenotype And Trait Ontology (PATO:). |
obs["donor_id"] | string | (Optional) Identifier for the donor from whom the cell sample is obtained. |
obs["is_primary_data"] | boolean | (Optional) Indicates whether the data is primary (directly obtained from experiments) or has been computationally derived from other primary data. |
obs["organism"] | string | (Optional) Organism from which the cell sample is obtained. |
obs["organism_ontology_term_id"] | string | (Optional) Ontology term identifier for the organism, providing a standardized reference for the organism. Must be a term from the NCBI Taxonomy Ontology (NCBITaxon:) which is a child of NCBITaxon:33208. |
obs["self_reported_ethnicity"] | string | (Optional) Ethnicity of the donor as self-reported, relevant for studies considering genetic diversity and population-specific traits. |
obs["self_reported_ethnicity_ontology_term_id"] | string | (Optional) Ontology term identifier for the self-reported ethnicity, providing a standardized reference for ethnic classifications. If the organism is human (organism_ontology_term_id == 'NCBITaxon:9606'), then the Human Ancestry Ontology (HANCESTRO:) is used. |
obs["sex"] | string | (Optional) Biological sex of the donor or source organism, crucial for studies involving sex-specific traits or conditions. |
obs["sex_ontology_term_id"] | string | (Optional) Ontology term identifier for the biological sex, ensuring standardized classification of sex. Only PATO:0000383, PATO:0000384 and PATO:0001340 are allowed. |
obs["suspension_type"] | string | (Optional) Type of suspension or medium in which the cells were stored or processed, important for understanding cell handling and conditions. |
obs["tissue"] | string | (Optional) Specific tissue from which the cells were derived, key for context and specificity in cell studies. |
obs["tissue_ontology_term_id"] | string | (Optional) Ontology term identifier for the tissue, providing a standardized reference for the tissue type. For organoid or tissue samples, the Uber-anatomy ontology (UBERON:) is used. Each term ID must be a child term of UBERON:0001062 (anatomical entity). For cell cultures, the Cell Ontology (CL:) is used. The term ID cannot be CL:0000255, CL:0000257 or CL:0000548. |
obs["tissue_general"] | string | (Optional) General category or classification of the tissue, useful for broader grouping and comparison of cell data. |
obs["tissue_general_ontology_term_id"] | string | (Optional) Ontology term identifier for the general tissue category, aiding in standardizing and grouping tissue types. For organoid or tissue samples, the Uber-anatomy ontology (UBERON:) is used. Each term ID must be a child term of UBERON:0001062 (anatomical entity). For cell cultures, the Cell Ontology (CL:) is used. The term ID cannot be CL:0000255, CL:0000257 or CL:0000548. |
obs["batch"] | string | (Optional) A batch identifier. This label is very context-dependent and may be a combination of the tissue, assay, donor, etc. |
obs["soma_joinid"] | integer | (Optional) If the dataset was retrieved from CELLxGENE Census, this is a unique identifier for the cell. |
obs["size_factors"] | double | (Optional) The size factors created by the normalisation method, if any. |
var["feature_id"] | string | (Optional) Unique identifier for the feature, usually an Ensembl gene ID. |
var["feature_name"] | string | A human-readable name for the feature, usually a gene symbol. |
var["soma_joinid"] | integer | (Optional) If the dataset was retrieved from CELLxGENE Census, this is a unique identifier for the feature. |
layers["counts"] | integer | Raw counts. |
layers["normalized"] | double | Normalised expression values. |
uns["dataset_id"] | string | A unique identifier for the dataset. This is different from the obs.dataset_id field, which is the identifier for the dataset from which the cell data is derived. |
uns["dataset_name"] | string | A human-readable name for the dataset. |
uns["dataset_url"] | string | (Optional) Link to the original source of the dataset. |
uns["dataset_reference"] | string | (Optional) BibTeX reference of the paper in which the dataset was published. |
uns["dataset_summary"] | string | Short description of the dataset. |
uns["dataset_description"] | string | Long description of the dataset. |
uns["dataset_organism"] | string | (Optional) The organism of the sample in the dataset. |
uns["normalization_id"] | string | Which normalization was used. |
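Several of the ontology rules in the table above depend on other fields, for instance the developmental stage ontology depends on the organism. The sketch below encodes two of these rules as plain functions. The function names are assumptions made for illustration, the identifier `HsapDv:0000087` is used only as an example string, and the check looks at the prefix alone; whether a term actually exists in the ontology is not verified.

```python
# Allowed values for obs["sex_ontology_term_id"], as listed in the table.
ALLOWED_SEX_TERMS = {"PATO:0000383", "PATO:0000384", "PATO:0001340"}


def development_stage_prefix(organism_ontology_term_id):
    """Return the ontology prefix expected for development_stage_ontology_term_id."""
    if organism_ontology_term_id == "NCBITaxon:9606":  # human
        return "HsapDv"
    if organism_ontology_term_id == "NCBITaxon:10090":  # mouse
        return "MmusDv"
    return "UBERON"  # every other organism


def has_valid_development_stage_prefix(term_id, organism_ontology_term_id):
    """Check only the prefix of the term, not its existence in the ontology."""
    prefix = development_stage_prefix(organism_ontology_term_id)
    return term_id.startswith(prefix + ":")


def is_valid_sex_term(term_id):
    """Check a sex ontology term against the three allowed PATO identifiers."""
    return term_id in ALLOWED_SEX_TERMS


print(development_stage_prefix("NCBITaxon:9606"))
# HsapDv
print(has_valid_development_stage_prefix("HsapDv:0000087", "NCBITaxon:10090"))
# False
```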
Component type: PCA
Path: src/datasets/processors
Computes a PCA embedding of the normalized data.
Arguments:
Name | Type | Description |
---|---|---|
--input | file | A normalised dataset with a PCA embedding and HVG selection. |
--input_layer | string | (Optional) Which layer to use as input. Default: normalized. |
--input_var_features | string | (Optional) Column name in .var matrix that will be used to select which genes to run the PCA on. Default: hvg. |
--output | file | (Output) A normalised dataset with a PCA embedding. |
--obsm_embedding | string | (Optional) In which .obsm slot to store the resulting embedding. Default: X_pca. |
--varm_loadings | string | (Optional) In which .varm slot to store the resulting loadings matrix. Default: pca_loadings. |
--uns_variance | string | (Optional) In which .uns slot to store the resulting variance objects. Default: pca_variance. |
--num_components | integer | (Optional) Number of principal components to compute. Defaults to 50, or to the minimum dimension size of the selected representation minus 1 if that is smaller. |
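The default for --num_components can be read as "50, capped at one less than the smaller dimension of the input matrix". The sketch below assumes that reading; the function name and the `requested` parameter are hypothetical, and the component's own implementation may differ.

```python
def default_num_components(n_obs, n_vars, requested=None, default=50):
    """Pick the number of principal components for an n_obs x n_vars matrix."""
    if requested is not None:
        return requested
    # Assumed reading of the default: cap at the smaller dimension minus one.
    return min(default, min(n_obs, n_vars) - 1)


print(default_num_components(2000, 1000))
# 50
print(default_num_components(30, 1000))
# 29
```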
Component type: HVG
Path: src/datasets/processors
Computes the highly variable genes scores.
Arguments:
Name | Type | Description |
---|---|---|
--input |
file |
A normalized dataset. |
--input_layer |
string |
(Optional) Which layer to use as input. Default: normalized . |
--output |
file |
(Output) A normalised dataset with a PCA embedding and HVG selection. |
--var_hvg |
string |
(Optional) In which .var slot to store whether a feature is considered to be hvg. Default: hvg . |
--var_hvg_score |
string |
(Optional) In which .var slot to store the gene variance score (normalized dispersion). Default: hvg_score . |
--num_features |
integer |
(Optional) The number of HVG to select. Default: 1000 . |
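The relationship between the two output columns can be sketched as follows: given one score per feature (the hvg_score column), the num_features highest-scoring features are flagged in the hvg column. This is a simplified illustration in plain Python; the function name is hypothetical and the computation of the scores themselves (normalized dispersion) is not shown.

```python
def select_hvg(scores, num_features=1000):
    """Flag the `num_features` highest-scoring features as highly variable."""
    n = min(num_features, len(scores))
    # Indices sorted by descending score; sorted() is stable, so ties keep input order.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    selected = set(ranked[:n])
    return [i in selected for i in range(len(scores))]


scores = [0.2, 1.5, -0.3, 0.9]  # made-up scores for four features
print(select_hvg(scores, num_features=2))
# [False, True, False, True]
```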
Component type: KNN
Path: src/datasets/processors
Computes the k-nearest-neighbours for each cell.
Arguments:
Name | Type | Description |
---|---|---|
--input | file | A normalised dataset with a PCA embedding. |
--input_layer | string | (Optional) Which layer to use as input. Default: normalized. |
--output | file | (Output) A normalised dataset with a PCA embedding, HVG selection and a kNN graph. |
--key_added | string | (Optional) The neighbors data is added to .uns[key_added], distances are stored in .obsp[key_added+'_distances'] and connectivities in .obsp[key_added+'_connectivities']. Default: knn. |
--num_neighbors | integer | (Optional) The size of local neighborhood (in terms of number of neighboring data points) used for manifold approximation. Default: 15. |
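The naming scheme driven by --key_added can be illustrated with a tiny exact kNN computation. The sketch below is a brute-force stand-in written in plain Python, not the method the component uses: it stores binary connectivities (a simplification), and the contents of the `uns` entry are hypothetical. Only the slot names follow the table above.

```python
import math


def brute_force_knn(embedding, num_neighbors=15, key_added="knn"):
    """Compute an exact kNN graph and return it under the documented slot names."""
    n = len(embedding)
    k = min(num_neighbors, n - 1)
    distances = [[0.0] * n for _ in range(n)]
    connectivities = [[0.0] * n for _ in range(n)]
    for i, point in enumerate(embedding):
        # Distances from cell i to every other cell, nearest first.
        others = sorted(
            (math.dist(point, other), j) for j, other in enumerate(embedding) if j != i
        )
        for dist, j in others[:k]:
            distances[i][j] = dist
            connectivities[i][j] = 1.0  # binary connectivity, a simplification
    # Hypothetical content for the supplementary neighbors data.
    uns = {key_added: {"params": {"num_neighbors": k}}}
    obsp = {
        key_added + "_distances": distances,
        key_added + "_connectivities": connectivities,
    }
    return uns, obsp


uns, obsp = brute_force_knn([(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)], num_neighbors=1)
print(sorted(obsp))
# ['knn_connectivities', 'knn_distances']
print(obsp["knn_distances"][2])
# [0.0, 4.0, 0.0]
```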
File format: Common dataset
A dataset processed by the common dataset processing pipeline.
Example file: resources_test/common/pancreas/dataset.h5ad
Description:
This dataset contains both raw counts and normalized data matrices, as well as a PCA embedding, HVG selection and a kNN graph.
Format:
AnnData object
obs: 'dataset_id', 'assay', 'assay_ontology_term_id', 'cell_type', 'cell_type_ontology_term_id', 'development_stage', 'development_stage_ontology_term_id', 'disease', 'disease_ontology_term_id', 'donor_id', 'is_primary_data', 'organism', 'organism_ontology_term_id', 'self_reported_ethnicity', 'self_reported_ethnicity_ontology_term_id', 'sex', 'sex_ontology_term_id', 'suspension_type', 'tissue', 'tissue_ontology_term_id', 'tissue_general', 'tissue_general_ontology_term_id', 'batch', 'soma_joinid', 'size_factors'
var: 'feature_id', 'feature_name', 'soma_joinid', 'hvg', 'hvg_score'
obsm: 'X_pca'
obsp: 'knn_distances', 'knn_connectivities'
varm: 'pca_loadings'
layers: 'counts', 'normalized'
uns: 'dataset_id', 'dataset_name', 'dataset_url', 'dataset_reference', 'dataset_summary', 'dataset_description', 'dataset_organism', 'normalization_id', 'pca_variance', 'knn'
Slot description:
Slot | Type | Description |
---|---|---|
obs["dataset_id"] | string | (Optional) Identifier for the dataset from which the cell data is derived, useful for tracking and referencing purposes. |
obs["assay"] | string | (Optional) Type of assay used to generate the cell data, indicating the methodology or technique employed. |
obs["assay_ontology_term_id"] | string | (Optional) Experimental Factor Ontology (EFO:) term identifier for the assay, providing a standardized reference to the assay type. |
obs["cell_type"] | string | (Optional) Classification of the cell type based on its characteristics and function within the tissue or organism. |
obs["cell_type_ontology_term_id"] | string | (Optional) Cell Ontology (CL:) term identifier for the cell type, offering a standardized reference to the specific cell classification. |
obs["development_stage"] | string | (Optional) Stage of development of the organism or tissue from which the cell is derived, indicating its maturity or developmental phase. |
obs["development_stage_ontology_term_id"] | string | (Optional) Ontology term identifier for the developmental stage, providing a standardized reference to the organism’s developmental phase. If the organism is human (organism_ontology_term_id == 'NCBITaxon:9606'), then the Human Developmental Stages (HsapDv:) ontology is used. If the organism is mouse (organism_ontology_term_id == 'NCBITaxon:10090'), then the Mouse Developmental Stages (MmusDv:) ontology is used. Otherwise, the Uberon (UBERON:) ontology is used. |
obs["disease"] | string | (Optional) Information on any disease or pathological condition associated with the cell or donor. |
obs["disease_ontology_term_id"] | string | (Optional) Ontology term identifier for the disease, enabling standardized disease classification and referencing. Must be a term from the Mondo Disease Ontology (MONDO:), or PATO:0000461 from the Phenotype And Trait Ontology (PATO:). |
obs["donor_id"] | string | (Optional) Identifier for the donor from whom the cell sample is obtained. |
obs["is_primary_data"] | boolean | (Optional) Indicates whether the data is primary (directly obtained from experiments) or has been computationally derived from other primary data. |
obs["organism"] | string | (Optional) Organism from which the cell sample is obtained. |
obs["organism_ontology_term_id"] | string | (Optional) Ontology term identifier for the organism, providing a standardized reference for the organism. Must be a term from the NCBI Taxonomy Ontology (NCBITaxon:) which is a child of NCBITaxon:33208. |
obs["self_reported_ethnicity"] | string | (Optional) Ethnicity of the donor as self-reported, relevant for studies considering genetic diversity and population-specific traits. |
obs["self_reported_ethnicity_ontology_term_id"] | string | (Optional) Ontology term identifier for the self-reported ethnicity, providing a standardized reference for ethnic classifications. If the organism is human (organism_ontology_term_id == 'NCBITaxon:9606'), then the Human Ancestry Ontology (HANCESTRO:) is used. |
obs["sex"] | string | (Optional) Biological sex of the donor or source organism, crucial for studies involving sex-specific traits or conditions. |
obs["sex_ontology_term_id"] | string | (Optional) Ontology term identifier for the biological sex, ensuring standardized classification of sex. Only PATO:0000383, PATO:0000384 and PATO:0001340 are allowed. |
obs["suspension_type"] | string | (Optional) Type of suspension or medium in which the cells were stored or processed, important for understanding cell handling and conditions. |
obs["tissue"] | string | (Optional) Specific tissue from which the cells were derived, key for context and specificity in cell studies. |
obs["tissue_ontology_term_id"] | string | (Optional) Ontology term identifier for the tissue, providing a standardized reference for the tissue type. For organoid or tissue samples, the Uber-anatomy ontology (UBERON:) is used. Each term ID must be a child term of UBERON:0001062 (anatomical entity). For cell cultures, the Cell Ontology (CL:) is used. The term ID cannot be CL:0000255, CL:0000257 or CL:0000548. |
obs["tissue_general"] | string | (Optional) General category or classification of the tissue, useful for broader grouping and comparison of cell data. |
obs["tissue_general_ontology_term_id"] | string | (Optional) Ontology term identifier for the general tissue category, aiding in standardizing and grouping tissue types. For organoid or tissue samples, the Uber-anatomy ontology (UBERON:) is used. Each term ID must be a child term of UBERON:0001062 (anatomical entity). For cell cultures, the Cell Ontology (CL:) is used. The term ID cannot be CL:0000255, CL:0000257 or CL:0000548. |
obs["batch"] | string | (Optional) A batch identifier. This label is very context-dependent and may be a combination of the tissue, assay, donor, etc. |
obs["soma_joinid"] | integer | (Optional) If the dataset was retrieved from CELLxGENE Census, this is a unique identifier for the cell. |
obs["size_factors"] | double | (Optional) The size factors created by the normalisation method, if any. |
var["feature_id"] | string | (Optional) Unique identifier for the feature, usually an Ensembl gene ID. |
var["feature_name"] | string | A human-readable name for the feature, usually a gene symbol. |
var["soma_joinid"] | integer | (Optional) If the dataset was retrieved from CELLxGENE Census, this is a unique identifier for the feature. |
var["hvg"] | boolean | Whether or not the feature is considered to be a ‘highly variable gene’. |
var["hvg_score"] | double | A score for the feature indicating how highly variable it is. |
obsm["X_pca"] | double | The resulting PCA embedding. |
obsp["knn_distances"] | double | K nearest neighbors distance matrix. |
obsp["knn_connectivities"] | double | K nearest neighbors connectivities matrix. |
varm["pca_loadings"] | double | The PCA loadings matrix. |
layers["counts"] | integer | Raw counts. |
layers["normalized"] | double | Normalised expression values. |
uns["dataset_id"] | string | A unique identifier for the dataset. This is different from the obs.dataset_id field, which is the identifier for the dataset from which the cell data is derived. |
uns["dataset_name"] | string | A human-readable name for the dataset. |
uns["dataset_url"] | string | (Optional) Link to the original source of the dataset. |
uns["dataset_reference"] | string | (Optional) BibTeX reference of the paper in which the dataset was published. |
uns["dataset_summary"] | string | Short description of the dataset. |
uns["dataset_description"] | string | Long description of the dataset. |
uns["dataset_organism"] | string | (Optional) The organism of the sample in the dataset. |
uns["normalization_id"] | string | Which normalization was used. |
uns["pca_variance"] | double | The PCA variance objects. |
uns["knn"] | object | Supplementary K nearest neighbors data. |
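Compared with the raw format, the common dataset gains slots from the normalization, HVG, PCA and KNN components. The sketch below maps each component to the slots it adds under the default argument values documented on this page, and reports which of them are absent from a given slot listing. The mapping name and function name are assumptions made for illustration; the optional obs["size_factors"] slot is deliberately left out.

```python
# Slots each component adds when run with its documented default arguments.
STEP_OUTPUTS = {
    "normalization": {("layers", "normalized"), ("uns", "normalization_id")},
    "hvg": {("var", "hvg"), ("var", "hvg_score")},
    "pca": {("obsm", "X_pca"), ("varm", "pca_loadings"), ("uns", "pca_variance")},
    "knn": {("uns", "knn"), ("obsp", "knn_distances"), ("obsp", "knn_connectivities")},
}


def steps_missing_output(slots):
    """Return, per component, the expected (container, key) pairs absent from `slots`."""
    present = set(slots)
    report = {}
    for step, expected in STEP_OUTPUTS.items():
        missing = sorted(expected - present)
        if missing:
            report[step] = missing
    return report


# Slot listing of a file that has only been normalized so far.
normalized_only = [("layers", "counts"), ("layers", "normalized"), ("uns", "normalization_id")]
print(list(steps_missing_output(normalized_only)))
# ['hvg', 'pca', 'knn']
```

An empty report indicates that all four components have left their default outputs in the file.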
Component type: Subset
Path: src/datasets/processors
Sample cells and genes randomly.
Arguments:
Name | Type | Description |
---|---|---|
--input | file | A dataset processed by the common dataset processing pipeline. |
--input_mod2 | file | (Optional) A dataset processed by the common dataset processing pipeline. |
--output | file | (Output) A dataset processed by the common dataset processing pipeline. |
--output_mod2 | file | (Optional, Output) A dataset processed by the common dataset processing pipeline. |
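Random sampling of cells and genes can be sketched with the standard library alone. In the example below, the function name and the parameters `keep_obs`, `keep_vars` and `seed` are hypothetical; the argument table above does not list how the subset sizes are specified.

```python
import random


def sample_indices(n_obs, n_vars, keep_obs, keep_vars, seed=0):
    """Pick random cell and gene indices without replacement."""
    rng = random.Random(seed)  # seeded so the same subset can be reproduced
    obs_idx = sorted(rng.sample(range(n_obs), min(keep_obs, n_obs)))
    var_idx = sorted(rng.sample(range(n_vars), min(keep_vars, n_vars)))
    return obs_idx, var_idx


obs_idx, var_idx = sample_indices(n_obs=100, n_vars=50, keep_obs=10, keep_vars=5)
print(len(obs_idx), len(var_idx))
# 10 5
```

With an AnnData object, such index lists could be applied as `adata[obs_idx, var_idx]`. When a second modality is passed via --input_mod2, reusing the same cell indices for both files would keep the cells aligned, though whether the component does this is an assumption.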