How to get `skbio` PCoA (Principal Coordinate Analysis) results?

Posted 2023-02-23

[Originally asked]: 2016-07-14 21:10:57

[Question]:

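I am looking at the attributes of skbio's PCoA method (listed below). I am new to this API, and I want to get the eigenvectors and the original points projected onto the new axes, similar to .fit_transform in sklearn.decomposition.PCA, so that I can create PC_1 vs PC_2 style plots. I figured out how to get eigvals and proportion_explained, but features comes back as None.

Is that because it is in beta?

If there are any tutorials that use it, pointers would be greatly appreciated. I am a big fan of scikit-learn and would like to start using more of the scikit packages.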

|  Attributes
 |  ----------
 |  short_method_name : str
 |      Abbreviated ordination method name.
 |  long_method_name : str
 |      Ordination method name.
 |  eigvals : pd.Series
 |      The resulting eigenvalues.  The index corresponds to the ordination
 |      axis labels
 |  samples : pd.DataFrame
 |      The position of the samples in the ordination space, row-indexed by the
 |      sample id.
 |  features : pd.DataFrame
 |      The position of the features in the ordination space, row-indexed by
 |      the feature id.
 |  biplot_scores : pd.DataFrame
 |      Correlation coefficients of the samples with respect to the features.
 |  sample_constraints : pd.DataFrame
 |      Site constraints (linear combinations of constraining variables):
 |      coordinates of the sites in the space of the explanatory variables X.
 |      These are the fitted site scores
 |  proportion_explained : pd.Series
 |      Proportion explained by each of the dimensions in the ordination space.
 |      The index corresponds to the ordination axis labels

Here is the code I use to generate the principal coordinate analysis (PCoA) object.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn import decomposition
import seaborn as sns; sns.set_style("whitegrid", {'axes.grid': False})
import skbio
from scipy.spatial import distance

%matplotlib inline
np.random.seed(0)

# Iris dataset
DF_data = pd.DataFrame(load_iris().data, 
                       index = ["iris_%d" % i for i in range(load_iris().data.shape[0])],
                       columns = load_iris().feature_names)
n,m = DF_data.shape
# print(n,m)
# 150 4

Se_targets = pd.Series(load_iris().target, 
                       index = ["iris_%d" % i for i in range(load_iris().data.shape[0])], 
                       name = "Species")

# Scaling mean = 0, var = 1
DF_standard = pd.DataFrame(StandardScaler().fit_transform(DF_data), 
                           index = DF_data.index,
                           columns = DF_data.columns)

# Distance Matrix
Ar_dist = distance.squareform(distance.pdist(DF_standard.T, metric="braycurtis")) # (m x m) distance measure
DM_dist = skbio.stats.distance.DistanceMatrix(Ar_dist, ids=DF_standard.columns)
PCoA = skbio.stats.ordination.pcoa(DM_dist)

[Answer 1]:

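You can access the transformed sample coordinates with OrdinationResults.samples. This returns a pandas.DataFrame row-indexed by sample ID (i.e. the IDs in your distance matrix). Since principal coordinate analysis operates on a distance matrix of samples, transformed feature coordinates (OrdinationResults.features) are not available. Other ordination methods in scikit-bio that accept a sample x feature table as input do provide transformed feature coordinates (e.g. CA, CCA, RDA).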

Side note: the distance.squareform call is unnecessary, because skbio.DistanceMatrix accepts arrays in either square or vector form.

[Comments]:

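I believe .samples returned nothing. I can try again, and I will make sure my skbio is up to date. I have been reading about PCoA and a lot of the resources are fairly cryptic. Compared with PCA, are the steps the same, except that the eigendecomposition is done on the distance matrix instead of the covariance matrix?

.samples is a required attribute of the OrdinationResults produced by pcoa. If you are still getting None, could you post an issue on the scikit-bio issue tracker? My understanding is that PCoA is applied to a distance matrix, which allows non-Euclidean distance metrics, whereas PCA is applied to a feature table and uses Euclidean distance. Running PCoA on a Euclidean distance matrix is therefore equivalent to PCA. Here's a useful resource on ordination methods.

DF = skbio.OrdinationResults(long_method_name="TESTING",short_method_name="test",eigvals=PCoA.eigvals, samples=DF_data)
DF.samples

This returns the untransformed original data to me. Am I doing it wrong?

Yes. You do not need to construct the skbio.OrdinationResults object directly; it only holds the results of an ordination method. Each ordination method in scikit-bio creates this results object for you, and you read the results from it. Use the skbio.stats.ordination.pcoa function to run PCoA on a skbio.DistanceMatrix object. You will get back a skbio.OrdinationResults object whose .samples attribute holds the transformed sample coordinates.

No problem, happy to help!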
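To make the PCA vs PCoA comment above concrete, here is a small NumPy-only sketch. It uses the classical textbook formulation of PCoA (double-centre the squared distances, then eigendecompose), not scikit-bio's actual code, and the random data and variable names are made up for illustration. On a Euclidean distance matrix the resulting coordinates should match the PCA scores up to the sign of each axis.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))          # hypothetical data: 30 samples x 4 features
Xc = X - X.mean(axis=0)               # centre each feature

# PCA via SVD of the centred table.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
pca_scores = U * S                    # sample coordinates on the principal components

# Classical PCoA on the Euclidean distance matrix of the same samples.
diff = X[:, None, :] - X[None, :, :]
D = np.sqrt((diff ** 2).sum(axis=-1))         # (n x n) pairwise distances
n = D.shape[0]
J = np.eye(n) - np.ones((n, n)) / n           # centring matrix
B = -0.5 * J @ (D ** 2) @ J                   # double-centred matrix
eigvals, eigvecs = np.linalg.eigh(B)          # eigenvalues in ascending order
order = np.argsort(eigvals)[::-1][:4]         # indices of the 4 largest eigenvalues
pcoa_scores = eigvecs[:, order] * np.sqrt(eigvals[order])

# Each axis is only defined up to sign, so compare absolute values.
same = np.allclose(np.abs(pca_scores), np.abs(pcoa_scores), atol=1e-6)
```

With a non-Euclidean metric such as Bray-Curtis the double-centred matrix can have negative eigenvalues, so the two methods are no longer interchangeable.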