NGS principles: single-cell transcriptome sequencing. A benchmark of 13 single-cell and single-nucleus RNA sequencing methods

Reference A

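Source: Benchmarking single-cell RNA-sequencing protocols for cell atlas projects

The paper compares 13 commonly used scRNA-seq and single-nucleus RNA-seq methods. Each has its own strengths, and the results differ from method to method. When choosing a protocol, weigh it against the needs of your own project rather than picking one arbitrarily.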

Abstract

The test sample is a mixture of human, mouse, and dog cells, used to evaluate 13 single-cell sequencing protocols. The resulting reads were mapped to human, mouse, and canine references, and gene expression was quantified separately for each species and each sequencing method.
The reference sample consists of human PBMCs (60%) and HEK293T cells (6%), mouse colon (30%) and NIH3T3 cells (3%), and dog MDCK cells (1%). The sample was prepared in one single batch, cryopreserved, and sequenced by 13 different sc/snRNA-seq methods. Sequences were uniformly mapped to a joint human, mouse, and canine reference, and then separately to produce gene expression counts for each sequencing method.

a, Color legend of sc/snRNA-seq protocols.
b, UMAP of 30,807 cells from the human reference sample (Chromium) colored by cell-type annotation.
c, UMAP of 19,749 cells from the mouse reference (Chromium) colored by cell-type annotation.
d, Boxplots displaying the minimum, the first, second and third quantiles, and the maximum number of genes detected across the protocols, in down-sampled (20,000) HEK293T cells, monocytes and B cells. Cell identities were defined by combining the clustering of each dataset and cell projection on to the reference.
e, Number of detected genes at stepwise, down-sampled sequencing depths. Points represent the average number of detected genes as a fraction of all cells of the corresponding cell type at the corresponding sequencing depth.
f, Dropout probabilities as a function of expression magnitude, for each protocol and cell type, calculated on down-sampled data (20,000) for 50 randomly selected cells.
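
Panels d to f all rely on down-sampling each cell to a fixed depth (20,000) before comparing protocols. The sketch below illustrates that idea on a single cell's gene counts, using only the Python standard library. It is a minimal illustration, not the paper's pipeline: the function names, the toy counts, and the choice to sample counts without replacement are assumptions made here.

```python
import random
from collections import Counter

def downsample_counts(counts, target, seed=0):
    """Draw `target` reads without replacement from one cell's gene counts.

    `counts` maps gene name -> read count. Cells that already have
    `target` reads or fewer are returned unchanged.
    """
    total = sum(counts.values())
    if total <= target:
        return dict(counts)
    rng = random.Random(seed)
    # Expand to one list entry per read, then sample without replacement.
    reads = [gene for gene, n in counts.items() for _ in range(n)]
    return dict(Counter(rng.sample(reads, target)))

def genes_detected(counts):
    """Number of genes with at least one read."""
    return sum(1 for n in counts.values() if n > 0)

# Toy cell (made-up numbers): 100 reads over three genes, down-sampled to 40.
cell = {"geneA": 50, "geneB": 30, "geneC": 20}
shallow = downsample_counts(cell, target=40)
```

Comparing `genes_detected` on cells brought to the same depth removes sequencing depth as a confounder, which is the point of the down-sampling in these panels.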

a,b, Principal component analysis on down-sampled data (20,000) using highly variable genes between protocols, separated into HEK293T cells, monocytes and B cells, and color coded by protocol (a) and number of detected genes per cell (b).
c, Pearson’s correlation plots across protocols using expression of common genes. For a fair comparison, cells were down-sampled to the same number for each method (B cells, n = 32; monocytes, n = 57; HEK293T cells, n = 55). Protocols are ordered by agglomerative hierarchical clustering.
d, Average log(expression) values of cell-type-specific reference markers for down-sampled (20,000) HEK293T cells, monocytes and B cells.
e, Log(expression) values of reference markers on down-sampled data (20,000) for HEK293T cells, monocytes and B cells (maximum of 50 random cells per technique).
f, Cumulative gene counts per protocol as the average of 100 randomly sampled HEK293T cells, monocytes and B cells, separately on down-sampled data (20,000).
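
Panel c compares protocols by Pearson's correlation over the genes they have in common. A minimal sketch of that comparison follows, assuming each protocol is summarized as a mapping from gene to average expression. The protocol names, gene names, and values are hypothetical, and the hierarchical clustering step used to order the protocols is not reproduced here.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (non-zero variance assumed)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def common_genes(profiles):
    """Genes present in every protocol's profile, in sorted order."""
    return sorted(set.intersection(*(set(p) for p in profiles.values())))

def protocol_correlations(profiles):
    """Pairwise Pearson correlation between protocols, restricted to common genes."""
    genes = common_genes(profiles)
    names = sorted(profiles)
    return {
        (a, b): pearson([profiles[a][g] for g in genes],
                        [profiles[b][g] for g in genes])
        for a in names for b in names
    }

# Hypothetical average expression profiles for three protocols.
profiles = {
    "P1": {"g1": 1.0, "g2": 2.0, "g3": 3.0, "gX": 9.0},  # gX is missing elsewhere
    "P2": {"g1": 2.0, "g2": 4.0, "g3": 6.0},
    "P3": {"g1": 3.0, "g2": 2.0, "g3": 1.0},
}
corr = protocol_correlations(profiles)
```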

a, The tSNE visualizations of unsupervised clustering in human samples from 13 different methods. Each dataset was analyzed separately after down-sampling to 20,000 reads per cell. Cells are colored by cell type inferred by matchSCore2 before down-sampling. Cells that did not achieve a probability score of 0.5 for any cell type were considered unclassified.
b, Clustering accuracy and ASW for clusters in each protocol.
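
ASW in panel b stands for average silhouette width, a standard measure of how well separated clusters are. The sketch below computes it from scratch for points in a low-dimensional embedding, assuming Euclidean distance. The coordinates are toy values, and the paper's exact inputs (for example, which embedding was used) are not stated in this excerpt.

```python
import math

def average_silhouette_width(points, labels):
    """Mean silhouette width over all points (Euclidean distance).

    For each point: a = mean distance to the other members of its cluster,
    b = smallest mean distance to any other cluster, width = (b - a) / max(a, b).
    Points in singleton clusters get width 0.
    """
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")

    widths = []
    for p, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            widths.append(0.0)
            continue
        # The distance from p to itself is 0, so dividing by len(own) - 1
        # gives the mean over the other members only.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_label, other in clusters.items()
            if other_label != label
        )
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return sum(widths) / len(widths)

# Toy embedding: two tight, well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = average_silhouette_width(points, [0, 0, 1, 1])
bad = average_silhouette_width(points, [0, 1, 0, 1])
```

Values near 1 indicate compact, well-separated clusters. Values near 0 or below indicate overlapping or mis-assigned clusters.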

a–d, UMAP visualization of cells after integrating technologies for 18,034 human (a,b) and 7,902 mouse (c,d) cells. Cells are colored by cell type (a,c) and sc/snRNA-seq protocol (b,d).
e,f, Barplots showing normalized and method-corrected (integrated) expression scores of cell-type-specific signatures for human HEK293T cells, monocytes, B cells (e), and mouse secretory and TA cells (f). Bars represent cells and colors methods.
g,h, Evaluation of method integratability in human (g) and mouse (h) cells. Protocols are compared according to their ability to group cell types into clusters (after integration) and mix with other technologies within the same clusters. Points are colored by sequencing method.

Methods are scored by key analytical metrics, characterizing protocols according to their ability to recapitulate the original structure of complex tissues, and their suitability for cell atlas projects. The methods are ordered by their overall benchmarking score, which is computed by averaging the scores across metrics assessed from the human datasets.
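
The overall benchmarking score described above is an average of per-metric scores. As a minimal sketch, assuming every metric is already scaled to a comparable range, the ranking could be computed as follows. The method names, metric names, and numbers are placeholders, not results from the paper.

```python
def overall_scores(metric_scores):
    """Average each method's per-metric scores and rank methods best-first."""
    averages = {
        method: sum(scores.values()) / len(scores)
        for method, scores in metric_scores.items()
    }
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)

# Placeholder scores, for illustration only.
toy_scores = {
    "method_A": {"gene_detection": 0.9, "clusterability": 0.7, "integratability": 0.8},
    "method_B": {"gene_detection": 0.5, "clusterability": 0.6, "integratability": 0.7},
}
ranking = overall_scores(toy_scores)
```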

Reference:

Elisabetta Mereu, Atefeh Lafzi, ..., Holger Heyn*
Nature Biotechnology, volume 38, pages 747–755 (2020)

10X single-cell transcriptome integration and transcriptome + ATAC integration analysis with VIPCCA

Reference A

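Single-cell sequencing has been revolutionary for studying gene regulation, cell differentiation, and cellular diversity. With the marked technical improvements of recent years, the number of single cells profiled per experiment has grown exponentially, and the datasets produced by large-scale studies are growing and accumulating rapidly. A major computational challenge in current single-cell research is therefore to standardize measurements from multiple samples, or across platforms and data types, for effective integrative and comparative analysis. Such integrative analysis requires single-cell data alignment methods that remove batch effects and account for technical noise across datasets.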

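Many single-cell data alignment methods have been developed recently. Most of them, with a few notable exceptions such as the recent iNMF, target small and medium-sized datasets. These existing methods fall into four categories: (i) reference-based methods, such as Scmap-cluster and scAlign, which align a new query dataset against a well-annotated reference dataset; (ii) clustering-based methods, such as Harmony and DESC, which remove batch effects and align samples in an embedding space by iteratively optimizing a clustering objective function; (iii) matching-based methods, such as MNN and Scanorama, which apply a mutual nearest neighbor strategy to identify overlapping cells across datasets; and (iv) projection-based methods, which use statistical models to project individual cells from different datasets into a lower-dimensional space. The last group includes Seurat, which applies canonical correlation analysis for projection; LIGER, which projects using latent factors from non-negative matrix factorization; and scVI and others, which use variational techniques for projection.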
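
To make the matching-based category concrete, here is a minimal sketch of a mutual nearest neighbor search between two batches of cells, assuming cells are already represented as points in a shared space and using Euclidean distance. It shows only the pair-finding idea. The function names and coordinates are illustrative, and this is not the MNN or Scanorama implementation.

```python
import math

def k_nearest(query, reference, k):
    """Indices of the k points in `reference` closest to `query`."""
    order = sorted(range(len(reference)),
                   key=lambda j: math.dist(query, reference[j]))
    return set(order[:k])

def mutual_nearest_pairs(batch_a, batch_b, k=1):
    """Pairs (i, j) where b_j is among a_i's k nearest neighbors in batch B
    and a_i is among b_j's k nearest neighbors in batch A."""
    a_to_b = [k_nearest(p, batch_b, k) for p in batch_a]
    b_to_a = [k_nearest(q, batch_a, k) for q in batch_b]
    return [
        (i, j)
        for i, neighbors in enumerate(a_to_b)
        for j in sorted(neighbors)
        if i in b_to_a[j]
    ]

# Toy batches: the third cell of batch A has no counterpart in batch B.
batch_a = [(0.0, 0.0), (5.0, 5.0), (9.0, 9.0)]
batch_b = [(0.5, 0.0), (5.0, 5.5)]
pairs = mutual_nearest_pairs(batch_a, batch_b, k=1)
```

Cells that appear in a mutual pair are treated as the same population observed in both batches. Cells without a mutual partner, like the third cell of `batch_a` here, are left unmatched.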

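However, most existing alignment methods have inherent limitations that prevent them from being applied successfully to large datasets. Specifically, alignment by reference-based methods is limited by the size of the reference data and by the pre-selected cell-type annotations available in the reference, so the chance of missing novel discoveries can grow as data size increases. Matching-based methods such as MNN use a round-trip walk strategy that requires generating all pairwise alignments for datasets with more than two samples, which is time-consuming when there are many samples. Methods with complex parametric models (such as LIGER and scAlign) or complex post-hoc data processing (such as Seurat) are also difficult to scale to large datasets. ZINB-based methods (such as scVI) may be less efficient at capturing complex expression features across multiple datasets. Although some recent methods do scale to large datasets, their complex parametric models mean they may still align cells inaccurately. There is therefore a pressing need for alignment methods that are both accurate and computationally efficient.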

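Beyond the need for scalable alignment methods, another obstacle is that the performance of current alignment methods is usually benchmarked and optimized only on single-cell RNA sequencing (scRNA-seq) data. As a result, most existing methods are not suited to integrating other single-cell data types, such as the single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq). In addition, the results returned by existing alignment methods such as Seurat preserve only the true cell-to-cell relationships (or similarities) and do not represent gene expression levels, so they are unsuitable for downstream analyses such as differential expression or enrichment analysis.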

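To address these challenges, the authors propose VIPCCA, a unified computational framework based on nonlinear probabilistic canonical correlation analysis for effective and scalable single-cell data alignment. VIPCCA uses deep neural networks to model single-cell data nonlinearly, allowing users to capture complex biological structure by integrating multiple single-cell datasets across technologies, data types, conditions, and modalities. VIPCCA also relies on variational inference for scalable computation, which lets it efficiently integrate large-scale single-cell datasets with millions of cells. Importantly, VIPCCA can transform multiple modalities into a lower-dimensional space without any post-hoc data processing, a distinctive and desirable feature that contrasts directly with existing alignment methods.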
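
For orientation, the classical linear probabilistic CCA model and the generic variational lower bound can be written as below. This is the textbook formulation, given only as background for the terms "probabilistic canonical correlation analysis" and "variational inference". The exact VIPCCA likelihood and network architecture are not given in this text, so the symbols here ($W_k$, $\mu_k$, $\Psi_k$, $q$) should not be read as the authors' notation.

```latex
% Linear probabilistic CCA: two views x_1, x_2 share a latent variable z
\begin{aligned}
z &\sim \mathcal{N}(0, I_d), \\
x_1 \mid z &\sim \mathcal{N}(W_1 z + \mu_1,\ \Psi_1), \\
x_2 \mid z &\sim \mathcal{N}(W_2 z + \mu_2,\ \Psi_2).
\end{aligned}

% Evidence lower bound maximized in variational inference,
% with q(z | x_1, x_2) an approximate posterior
\log p(x_1, x_2) \;\ge\;
\mathbb{E}_{q(z \mid x_1, x_2)}\!\big[\log p(x_1 \mid z) + \log p(x_2 \mid z)\big]
- \mathrm{KL}\!\big(q(z \mid x_1, x_2)\,\|\,p(z)\big).
```

A nonlinear variant replaces the linear maps $W_k z + \mu_k$ with neural networks. In that case the exact posterior over $z$ is no longer available in closed form, which is where the variational bound comes in.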

Loading data

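This function only works on AnnData objects produced by training with fit_integrate(). It randomly selects 2,000 positions in the gene expression matrix. The x axis shows the expression values at those positions in the original data, and the y axis shows the expression values at the same positions in the VIPCCA-integrated data.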
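
The comparison described above can be sketched independently of VIPCCA. The example below draws random positions from two same-shaped matrices and returns the (original, integrated) value pairs that would become the x and y coordinates of the scatter plot. It uses plain nested lists and a hypothetical function name. It does not call the VIPCCA API, and the toy matrices are made up.

```python
import random

def sample_matrix_pairs(original, integrated, n_positions=2000, seed=0):
    """Pick random (row, column) positions and pair the two matrices' values.

    Both matrices are lists of equal-length rows with identical shape.
    Returns a list of (original_value, integrated_value) tuples.
    """
    n_rows, n_cols = len(original), len(original[0])
    rng = random.Random(seed)
    # Never request more positions than the matrix contains.
    n = min(n_positions, n_rows * n_cols)
    flat_indices = rng.sample(range(n_rows * n_cols), n)
    return [
        (original[i // n_cols][i % n_cols], integrated[i // n_cols][i % n_cols])
        for i in flat_indices
    ]

# Toy 3 x 4 matrices in which the "integrated" values are exactly doubled.
original = [[float(r * 4 + c) for c in range(4)] for r in range(3)]
integrated = [[2.0 * value for value in row] for row in original]
pairs = sample_matrix_pairs(original, integrated, n_positions=5)
```

Plotting the first element of each pair against the second gives the scatter described above. Points close to a straight line suggest that integration preserved the original expression values.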

After filtering, we convert the Seurat object to AnnData via h5Seurat using R packages. In this case, the atac.h5ad file is generated in the corresponding path.
