Human immune – Open Problems in Single Cell Analysis

Info

Dataset ID: openproblems_v1/immune_cells
Reference: Luecken et al. (2021)
Size: 1.18 GiB
Date: 02-02-2024
Dimensions (n_obs × n_vars): 33506 × 12303

Description

Human immune cells from peripheral blood and bone marrow taken from 5 datasets comprising 10 batches across technologies (10X, Smart-seq2).

Preview

dataset is an AnnData object with n_obs × n_vars = 33506 × 12303 with slots:

obs: batch, size_factors, tissue, cell_type
var: feature_name, hvg, hvg_score
obsp: knn_connectivities, knn_distances
obsm: X_pca
varm: pca_loadings
layers: counts, normalized
uns: dataset_description, dataset_id, dataset_name, dataset_organism, dataset_reference, dataset_summary, dataset_url, knn, normalization_id, pca_variance
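
The slot listing above can serve as a checklist after loading the file. The following is a minimal sketch, assuming the dataset has been downloaded as an `.h5ad` file and that the third-party `anndata` package is installed; `EXPECTED_SLOTS`, `missing_slots`, and `load_and_check` are hypothetical names introduced here for illustration, not part of the dataset or of any library.

```python
# Slot names copied from the preview above.
EXPECTED_SLOTS = {
    "obs": ["batch", "size_factors", "tissue", "cell_type"],
    "var": ["feature_name", "hvg", "hvg_score"],
    "obsp": ["knn_connectivities", "knn_distances"],
    "obsm": ["X_pca"],
    "varm": ["pca_loadings"],
    "layers": ["counts", "normalized"],
    "uns": [
        "dataset_description", "dataset_id", "dataset_name",
        "dataset_organism", "dataset_reference", "dataset_summary",
        "dataset_url", "knn", "normalization_id", "pca_variance",
    ],
}


def missing_slots(adata, expected=EXPECTED_SLOTS):
    """Return "slot/key" strings for every expected entry that is absent."""
    missing = []
    for slot, keys in expected.items():
        container = getattr(adata, slot)
        # `in` tests column names for DataFrames (obs, var)
        # and keys for mapping-like slots (obsm, layers, uns, ...).
        missing.extend(f"{slot}/{key}" for key in keys if key not in container)
    return missing


def load_and_check(path):
    """Read an .h5ad file and raise if an expected slot is absent."""
    import anndata as ad  # third-party; imported lazily on purpose

    adata = ad.read_h5ad(path)
    absent = missing_slots(adata)
    if absent:
        raise ValueError(f"missing slots: {absent}")
    return adata
```

A call such as `dataset = load_and_check("immune_cells.h5ad")` would then return the object described above; the file name is a placeholder for wherever the download was saved.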

Reference

Name	Description	Type	Data type	Size
obs
`batch`	A batch identifier. This label is very context-dependent and may be a combination of the tissue, assay, donor, etc.	`vector`	`category`	33506
`cell_type`	Classification of the cell type based on its characteristics and function within the tissue or organism.	`vector`	`category`	33506
`size_factors`	The size factors created by the normalisation method, if any.	`vector`	`float32`	33506
`tissue`	Specific tissue from which the cells were derived, key for context and specificity in cell studies.	`vector`	`category`	33506
var
`feature_name`	A human-readable name for the feature, usually a gene symbol.	`vector`	`object`	12303
`hvg`	Whether or not the feature is considered to be a ‘highly variable gene’.	`vector`	`bool`	12303
`hvg_score`	A score used to rank the features by how highly variable they are.	`vector`	`float64`	12303
obsp
`knn_connectivities`	K nearest neighbors connectivities matrix.	`sparsematrix`	`float32`	33506 × 33506
`knn_distances`	K nearest neighbors distance matrix.	`sparsematrix`	`float64`	33506 × 33506
obsm
`X_pca`	The resulting PCA embedding.	`densematrix`	`float32`	33506 × 50
varm
`pca_loadings`	The PCA loadings matrix.	`densematrix`	`float32`	12303 × 50
layers
`counts`	Raw counts.	`sparsematrix`	`float32`	33506 × 12303
`normalized`	Normalised expression values.	`sparsematrix`	`float32`	33506 × 12303
uns
`dataset_description`	Long description of the dataset.	`atomic`	`str`	1
`dataset_id`	A unique identifier for the dataset. This is different from the `obs.dataset_id` field, which is the identifier for the dataset from which the cell data is derived.	`atomic`	`str`	1
`dataset_name`	A human-readable name for the dataset.	`atomic`	`str`	1
`dataset_organism`	The organism of the sample in the dataset.	`atomic`	`str`	1
`dataset_reference`	Bibtex reference of the paper in which the dataset was published.	`atomic`	`str`	1
`dataset_summary`	Short description of the dataset.	`atomic`	`str`	1
`dataset_url`	Link to the original source of the dataset.	`atomic`	`str`	1
`knn`	Supplementary K nearest neighbors data.	`dict`		3
`normalization_id`	Identifier of the normalization method that was used.	`atomic`	`str`	1
`pca_variance`	The PCA variance objects.	`dict`		2
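
The shapes and data types in the table allow a quick back-of-the-envelope check of why the large matrices are stored as `sparsematrix`. The sketch below is illustrative arithmetic only: `dense_gib` is a hypothetical helper name, and 4 and 8 bytes per value are the standard widths of `float32` and `float64`. It says nothing about the actual on-disk size of the sparse matrices, which depends on how many non-zero entries they hold.

```python
def dense_gib(n_rows, n_cols, bytes_per_value):
    """Size in GiB of a dense n_rows x n_cols matrix."""
    return n_rows * n_cols * bytes_per_value / 2**30


N_OBS, N_VARS = 33506, 12303  # shape taken from the table above

# float32 values take 4 bytes each, float64 values take 8 bytes each.
layer_dense = dense_gib(N_OBS, N_VARS, 4)    # `counts` or `normalized`
knn_conn_dense = dense_gib(N_OBS, N_OBS, 4)  # `knn_connectivities`
knn_dist_dense = dense_gib(N_OBS, N_OBS, 8)  # `knn_distances`

print(f"one dense layer:          {layer_dense:.2f} GiB")
print(f"dense knn_connectivities: {knn_conn_dense:.2f} GiB")
print(f"dense knn_distances:      {knn_dist_dense:.2f} GiB")
```

Under these assumptions a single dense `float32` layer would take about 1.54 GiB, already more than the 1.18 GiB listed for the whole file, and a dense `knn_connectivities` matrix would take about 4.18 GiB.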

Slot cross-reference data

`dataset.layers['counts']`

In R: dataset$layers[["counts"]]

Type: sparsematrix, data type: float32, shape: 33506 × 12303

Raw counts.

`dataset.layers['normalized']`

In R: dataset$layers[["normalized"]]

Type: sparsematrix, data type: float32, shape: 33506 × 12303

Normalised expression values.

`dataset.obs['batch']`

In R: dataset$obs[["batch"]]

Type: vector, data type: category, shape: 33506

A batch identifier. This label is very context-dependent and may be a combination of the tissue, assay, donor, etc.

`dataset.obs['size_factors']`

In R: dataset$obs[["size_factors"]]

Type: vector, data type: float32, shape: 33506

The size factors created by the normalisation method, if any.

`dataset.obs['tissue']`

In R: dataset$obs[["tissue"]]

Type: vector, data type: category, shape: 33506

Specific tissue from which the cells were derived, key for context and specificity in cell studies.

`dataset.obs['cell_type']`

In R: dataset$obs[["cell_type"]]

Type: vector, data type: category, shape: 33506

Classification of the cell type based on its characteristics and function within the tissue or organism.

`dataset.obsm['X_pca']`

In R: dataset$obsm[["X_pca"]]

Type: densematrix, data type: float32, shape: 33506 × 50

The resulting PCA embedding.

`dataset.obsp['knn_connectivities']`

In R: dataset$obsp[["knn_connectivities"]]

Type: sparsematrix, data type: float32, shape: 33506 × 33506

K nearest neighbors connectivities matrix.

`dataset.obsp['knn_distances']`

In R: dataset$obsp[["knn_distances"]]

Type: sparsematrix, data type: float64, shape: 33506 × 33506

K nearest neighbors distance matrix.

`dataset.uns['dataset_description']`

In R: dataset$uns[["dataset_description"]]

Type: atomic, data type: str, shape: 1

Long description of the dataset.

`dataset.uns['dataset_id']`

In R: dataset$uns[["dataset_id"]]

Type: atomic, data type: str, shape: 1

A unique identifier for the dataset. This is different from the `obs.dataset_id` field, which is the identifier for the dataset from which the cell data is derived.

`dataset.uns['dataset_name']`

In R: dataset$uns[["dataset_name"]]

Type: atomic, data type: str, shape: 1

A human-readable name for the dataset.

`dataset.uns['dataset_organism']`

In R: dataset$uns[["dataset_organism"]]

Type: atomic, data type: str, shape: 1

The organism of the sample in the dataset.

`dataset.uns['dataset_reference']`

In R: dataset$uns[["dataset_reference"]]

Type: atomic, data type: str, shape: 1

Bibtex reference of the paper in which the dataset was published.

`dataset.uns['dataset_summary']`

In R: dataset$uns[["dataset_summary"]]

Type: atomic, data type: str, shape: 1

Short description of the dataset.

`dataset.uns['dataset_url']`

In R: dataset$uns[["dataset_url"]]

Type: atomic, data type: str, shape: 1

Link to the original source of the dataset.

`dataset.uns['knn']`

In R: dataset$uns[["knn"]]

Type: dict, shape: 3

Supplementary K nearest neighbors data.

`dataset.uns['normalization_id']`

In R: dataset$uns[["normalization_id"]]

Type: atomic, data type: str, shape: 1

Identifier of the normalization method that was used.

`dataset.uns['pca_variance']`

In R: dataset$uns[["pca_variance"]]

Type: dict, shape: 2

The PCA variance objects.

`dataset.var['feature_name']`

In R: dataset$var[["feature_name"]]

Type: vector, data type: object, shape: 12303

A human-readable name for the feature, usually a gene symbol.

`dataset.var['hvg']`

In R: dataset$var[["hvg"]]

Type: vector, data type: bool, shape: 12303

Whether or not the feature is considered to be a ‘highly variable gene’.

`dataset.var['hvg_score']`

In R: dataset$var[["hvg_score"]]

Type: vector, data type: float64, shape: 12303

A score used to rank the features by how highly variable they are.

`dataset.varm['pca_loadings']`

In R: dataset$varm[["pca_loadings"]]

Type: densematrix, data type: float32, shape: 12303 × 50

The PCA loadings matrix.
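
Every entry above pairs a Python accessor with its R counterpart following the same pattern. As a small sketch of that pattern, the two helpers below build both strings from a slot name and a key; `python_accessor` and `r_accessor` are hypothetical names used only for illustration, and the variable name `dataset` is taken from the listing.

```python
def python_accessor(slot, key, obj="dataset"):
    """Build the Python form, e.g. dataset.layers['counts']."""
    return f"{obj}.{slot}['{key}']"


def r_accessor(slot, key, obj="dataset"):
    """Build the R form, e.g. dataset$layers[["counts"]]."""
    return f'{obj}${slot}[["{key}"]]'


# Print both forms for a few of the slots listed above.
for slot, key in [("layers", "counts"), ("obs", "batch"), ("obsm", "X_pca")]:
    print(python_accessor(slot, key), "|", r_accessor(slot, key))
```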

References

Luecken, Malte D., M. Büttner, K. Chaichoompu, A. Danese, M. Interlandi, M. F. Mueller, D. C. Strobl, et al. 2021. “Benchmarking Atlas-Level Data Integration in Single-Cell Genomics.” Nature Methods 19 (1): 41–50. https://doi.org/10.1038/s41592-021-01336-8.