Package 'HTSCluster'

Title: Clustering High-Throughput Transcriptome Sequencing (HTS) Data
Description: A Poisson mixture model is implemented to cluster genes from high-throughput transcriptome sequencing (RNA-seq) data. Parameter estimation is performed using either the EM or CEM algorithm, and the slope heuristics are used for model selection (i.e., to choose the number of clusters).
Authors: Andrea Rau, Gilles Celeux, Marie-Laure Martin-Magniette, Cathy Maugis-Rabusseau
Maintainer: Andrea Rau <[email protected]>
License: GPL (>= 3)
Version: 2.0.11
Built: 2024-10-30 05:16:39 UTC
Source: https://github.com/andreamrau/htscluster

Help Index


Clustering high throughput sequencing (HTS) data

Description

A Poisson mixture model is implemented to cluster genes from high-throughput transcriptome sequencing (RNA-seq) data. Parameter estimation is performed using either the EM or CEM algorithm, and the slope heuristics are used for model selection (i.e., to choose the number of clusters).

Author(s)

Andrea Rau, Gilles Celeux, Marie-Laure Martin-Magniette, Cathy Maugis-Rabusseau

Maintainer: Andrea Rau

References

Rau, A., Maugis-Rabusseau, C., Martin-Magniette, M.-L., Celeux G. (2015). Co-expression analysis of high-throughput transcriptome sequencing data with Poisson mixture models. Bioinformatics, 31(9):1420-1427.

Rau, A., Celeux, G., Martin-Magniette, M.-L., Maugis-Rabusseau, C. (2011) Clustering high-throughput sequencing data with Poisson mixture models. Inria Research Report 7786. Available at https://inria.hal.science/inria-00638082.

Examples

set.seed(12345)

## Simulate data as shown in Rau et al. (2011)
## Library size setting "A", high cluster separation
## n = 200 observations

simulate <- PoisMixSim(n = 200, libsize = "A", separation = "high")
y <- simulate$y
conds <- simulate$conditions

## Run the PMM model for g = 3
## "TC" library size estimate, EM algorithm

run <- PoisMixClus(y, g=3, conds=conds, norm="TC") 

## Estimates of pi and lambda for the selected model

pi.est <- run$pi
lambda.est <- run$lambda


## Not run: PMM for 4 total clusters, with one fixed class
## "TC" library size estimate, EM algorithm
##
## run <- PoisMixClus(y, g = 3, norm = "TC", conds = conds, 
##    fixed.lambda = list(c(1,1,1))) 
##
##
## Not run: PMM model for 4 clusters, with equal proportions
## "TC" library size estimate, EM algorithm
##
## run <- PoisMixClus(y, g = 4, norm = "TC", conds = conds, 
##     equal.proportions = TRUE) 
##
##
## Not run: PMM model for g = 1, ..., 10 clusters, Split Small-EM init
##
## run1.10 <- PoisMixClusWrapper(y, gmin = 1, gmax = 10, conds = conds, 
##	norm = "TC")
##
##
## Not run: PMM model for g = 1, ..., 10 clusters, Small-EM init
##
## run1.10bis <- PoisMixClusWrapper(y, gmin = 1, gmax = 10, conds = conds, 
##	norm = "TC", split.init = FALSE)
##
##
## Not run: previous model equivalent to the following
##
## for(K in 1:10) {
##	run <- PoisMixClus(y, g = K, conds = conds, norm = "TC")
## }

Calculate ARI for high-dimensional data via data splits

Description

This function is used to calculate Adjusted Rand Index (ARI) values for high-dimensional data.

Usage

highDimensionARI(x, y, splits = 2, verbose = FALSE)

Arguments

x

Vector of classification labels

y

Vector of classification labels

splits

Number of subsets the data should be split into

verbose

TRUE if verbose output is desired

Value

Value of Adjusted Rand Index for samples x and y

Author(s)

Andrea Rau

References

Rau, A., Maugis-Rabusseau, C., Martin-Magniette, M.-L., Celeux G. (2015). Co-expression analysis of high-throughput transcriptome sequencing data with Poisson mixture models. Bioinformatics, 31(9):1420-1427.

Rau, A., Celeux, G., Martin-Magniette, M.-L., Maugis-Rabusseau, C. (2011). Clustering high-throughput sequencing data with Poisson mixture models. Inria Research Report 7786. Available at https://inria.hal.science/inria-00638082.
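
A minimal usage sketch (an illustration, not part of the original help page), comparing two arbitrary vectors of cluster labels:

set.seed(12345)
## Two labelings of the same 1000 genes (purely illustrative)
labels1 <- sample(1:4, 1000, replace = TRUE)
labels2 <- sample(1:4, 1000, replace = TRUE)
## ARI between the two labelings, computed over 2 data splits
highDimensionARI(labels1, labels2, splits = 2)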


View HTSCluster User's Guide

Description

Finds the location of the HTSCluster User's Guide and optionally opens it.

Usage

HTSClusterUsersGuide(view=TRUE)

Arguments

view

logical, should the document be opened using the default PDF document reader?

Details

The function vignette("HTSCluster") will find the short HTSCluster Vignette which describes how to obtain the HTSCluster User's Guide. The User's Guide is not itself a true vignette because it is not automatically generated using Sweave during the package build process. This means that it cannot be found using vignette, hence the need for this special function.

If the operating system is other than Windows, then the PDF viewer used is that given by Sys.getenv("R_PDFVIEWER"). The PDF viewer can be changed using Sys.putenv(R_PDFVIEWER=).
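
For example (a sketch only; Sys.setenv is the current replacement for Sys.putenv in recent versions of R, and the viewer path below is hypothetical):

Sys.getenv("R_PDFVIEWER")                       ## inspect the current PDF viewer
## Sys.setenv(R_PDFVIEWER = "/usr/bin/evince")  ## point it at a different viewer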

Note that this function was adapted from that defined by Gordon Smyth in the edgeR package.

Value

Character string giving the file location. If view=TRUE, the PDF document reader is started and the User's Guide is opened, as a side effect.

Author(s)

Gordon Smyth

See Also

system

Examples

# To get the location:
HTSClusterUsersGuide(view=FALSE)
# To open in pdf viewer:
## Not run: HTSClusterUsersGuide()

Parameter initialization for a Poisson mixture model.

Description

These functions implement a variety of initialization methods for the parameters of a Poisson mixture model: the Small EM initialization strategy (emInit) described in Rau et al. (2011), a K-means initialization strategy (kmeanInit) that is itself used to initialize the small EM strategy, the splitting small-EM initialization strategy (splitEMInit) based on that described in Papastamoulis et al. (2014), and a function to initialize a small-EM strategy using the posterior probabilities (probaPostInit) obtained from a previous run with one fewer cluster following the splitting strategy.

Usage

emInit(y, g, conds, norm, alg.type = "EM", 
    init.runs, init.iter, fixed.lambda, equal.proportions, verbose)

kmeanInit(y, g, conds, norm, fixed.lambda,
    equal.proportions)

splitEMInit(y, g, conds, norm, alg.type, fixed.lambda,
    equal.proportions, prev.labels, prev.probaPost, init.runs, 
    init.iter, verbose)

probaPostInit(y, g, conds, norm, alg.type = "EM", 
    fixed.lambda, equal.proportions, probaPost.init, init.iter,
    verbose)

Arguments

y

(n x q) matrix of observed counts for n observations and q variables

g

Number of clusters. If fixed.lambda contains a list of lambda values to be fixed, g corresponds to the number of clusters in addition to those fixed.

conds

Vector of length q defining the condition (treatment group) for each variable (column) in y

norm

The type of estimator to be used to normalize for differences in library size: “TC” for total count, “UQ” for upper quantile, “Med” for median, “DESeq” for the normalization method in the DESeq package, and “TMM” for the TMM normalization method (Robinson and Oshlack, 2010). Can also be a vector (of length q) containing pre-estimated library size estimates for each sample.

alg.type

Algorithm to be used for parameter estimation (“EM” or “CEM” for the EM or CEM algorithms, respectively)

init.runs

In the case of the Small-EM algorithm, the number of independent runs to be performed. In the case of the splitting Small-EM algorithm, the number of cluster splits to be performed in the splitting small-EM initialization.

init.iter

The number of iterations to run within each Small-EM algorithm

fixed.lambda

If one (or more) clusters with fixed values of lambda is desired, a list containing vectors of length d (the number of conditions) specifying the fixed values of lambda for each such cluster. Note that the values of lambda chosen must satisfy the constraint noted in the technical report.

equal.proportions

If TRUE, the cluster proportions are set to be equal for all clusters. Default is FALSE (unequal cluster proportions)

prev.labels

A vector of length n of cluster labels obtained from the previous run (g-1 clusters)

prev.probaPost

An n x (g-1) matrix of the conditional probabilities of each observation belonging to each of the g-1 clusters from the previous run

probaPost.init

An n x g matrix of the conditional probabilities of each observation belonging to each of the g clusters following the splitting strategy in the splitEMInit function

verbose

If TRUE, include verbose output

Details

In practice, the user will not directly call the initialization functions described here; they are indirectly called for a single number of clusters through the PoisMixClus function (via init.type) or via the PoisMixClusWrapper function for a sequence of cluster numbers (via gmin.init.type and split.init).

To initialize parameter values for the EM and CEM algorithms, for the Small-EM strategy (Biernacki et al., 2003) we use the emInit function as follows. For a given number of independent runs (given by init.runs), the following procedure is used to obtain parameter values: first, a K-means algorithm (MacQueen, 1967) is run to partition the data into g clusters (\hat{\boldsymbol{z}}^{(0)}). Second, initial parameter values \boldsymbol{\pi}^{(0)} and \boldsymbol{\lambda}^{(0)} are calculated (see Rau et al. (2011) for details). Third, a given number of iterations of an EM algorithm are run (defined by init.iter), using \boldsymbol{\pi}^{(0)} and \boldsymbol{\lambda}^{(0)} as initial values. Finally, among the init.runs sets of parameter values, we use \hat{\boldsymbol{\lambda}} and \hat{\boldsymbol{\pi}} corresponding to the highest log likelihood or completed log likelihood to initialize the subsequent full EM or CEM algorithms, respectively.

For the splitting small EM initialization strategy, we implement an approach similar to that described in Papastamoulis et al. (2014), where the cluster from the previous run (with g-1 clusters) with the largest entropy is chosen to be split into two new clusters, followed by a small EM run as described above.
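
As a hedged illustration (not part of the original help page) of how these strategies are invoked indirectly, using only argument names documented for PoisMixClusWrapper and assuming y and conds are defined as in the examples below:

## Not run: small-EM initialization for gmin, then splitting small-EM
## initialization for g = gmin+1, ..., gmax
##
## run <- PoisMixClusWrapper(y, gmin = 1, gmax = 6, conds = conds,
##     norm = "TC", gmin.init.type = "small-em", split.init = TRUE,
##     init.runs = 20, init.iter = 10)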

Value

pi.init

Vector of length g containing the estimate for \hat{\boldsymbol{\pi}} corresponding to the highest log likelihood (or completed log likelihood) from the chosen initialization strategy.

lambda.init

(d x g) matrix containing the estimate of \hat{\boldsymbol{\lambda}} corresponding to the highest log likelihood (or completed log likelihood) from the chosen initialization strategy, where d is the number of conditions and g is the number of clusters.

lambda

(d x g) matrix containing the estimate of \hat{\boldsymbol{\lambda}} arising from the splitting initialization and small EM run for a single split, where d is the number of conditions and g is the number of clusters.

pi

Vector of length g containing the estimate for \hat{\boldsymbol{\pi}} arising from the splitting initialization and small EM run for a single split, where g is the number of clusters.

log.like

Log likelihood arising from the splitting initialization and small EM run for a single split.

Author(s)

Andrea Rau

References

Anders, S. and Huber, W. (2010) Differential expression analysis for sequence count data. Genome Biology, 11(R106), 1-28.

Biernacki, C., Celeux, G., Govaert, G. (2003) Choosing starting values for the EM algorithm for getting the highest likelihood in multivariate Gaussian mixture models. Computational Statistics and Data Analysis, 41(1), 561-575.

MacQueen, J. B. (1967) Some methods for classification and analysis of multivariate observations. In Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, number 1, pages 281-297. Berkeley, University of California Press.

Papastamoulis, P., Martin-Magniette, M.-L., and Maugis-Rabusseau, C. (2014). On the estimation of mixtures of Poisson regression models with large number of components. Computational Statistics and Data Analysis: 3rd special Issue on Advances in Mixture Models, DOI: 10.1016/j.csda.2014.07.005.

Rau, A., Celeux, G., Martin-Magniette, M.-L., Maugis-Rabusseau, C. (2011). Clustering high-throughput sequencing data with Poisson mixture models. Inria Research Report 7786. Available at https://inria.hal.science/inria-00638082.

Rau, A., Maugis-Rabusseau, C., Martin-Magniette, M.-L., Celeux G. (2015). Co-expression analysis of high-throughput transcriptome sequencing data with Poisson mixture models. Bioinformatics, 31(9):1420-1427.

Robinson, M. D. and Oshlack, A. (2010) A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biology, 11(R25).

See Also

PoisMixClus for Poisson mixture model estimation for a given number of clusters, PoisMixClusWrapper for Poisson mixture model estimation and model selection for a sequence of cluster numbers.

Examples

set.seed(12345)

## Simulate data as shown in Rau et al. (2011)
## Library size setting "A", high cluster separation
## n = 500 observations

simulate <- PoisMixSim(n = 500, libsize = "A", separation = "high")
y <- simulate$y
conds <- simulate$conditions

## Calculate initial values for lambda and pi using the Small-EM
## initialization (4 classes, PMM-II model with "TC" library size)
##
## init.values <- emInit(y, g = 4, conds, 
##    norm = "TC", alg.type = "EM", 
##    init.runs = 50, init.iter = 10, fixed.lambda = NA,
##    equal.proportions = FALSE, verbose = FALSE)
## pi.init <- init.values$pi.init
## lambda.init <- init.values$lambda.init

Log likelihood calculation for a Poisson mixture model

Description

Functions to calculate the log likelihood for a Poisson mixture model, the difference in log likelihoods for two different sets of parameters of a Poisson mixture model or the log-likelihood for each observation.

Usage

logLikePoisMix(y, mean, pi)
logLikePoisMixDiff(y, mean.new, pi.new, mean.old, pi.old)
mylogLikePoisMixObs(y, conds, s, lambda, pi)

Arguments

y

(n x q) matrix of observed counts for n observations and q variables

mean

List of length g containing the (n x q) matrices of conditional mean expression for all observations, as calculated by the PoisMixMean function, where g represents the number of clusters

mean.new

List of length g containing the (n x q) matrices of conditional mean expression for all observations for one set of parameters, as calculated by the PoisMixMean function, where g represents the number of clusters

mean.old

List of length g containing the (n x q) matrices of conditional mean expression for all observations for another set of parameters, as calculated by the PoisMixMean function, where g represents the number of clusters

pi.new

Vector of length g containing one estimate for \hat{\boldsymbol{\pi}}

pi.old

Vector of length g containing another estimate for \hat{\boldsymbol{\pi}}

pi

Vector of length g containing the estimate for \hat{\boldsymbol{\pi}}

conds

Vector of length q defining the condition (treatment group) for each variable (column) in y

s

Estimate of normalized per-variable library size

lambda

(d x g) matrix containing the current estimate of lambda, where d is the number of conditions (treatment groups) and g is the number of clusters

Details

The logLikePoisMixDiff function is used to calculate the difference in log likelihood for two different sets of parameters in a Poisson mixture model; it is used to determine convergence in the EM algorithm run by the PoisMixClus function. The logLikePoisMix function (taken largely from the mylogLikePoisMix function from the poisson.glm.mix R package) calculates the log likelihood for a given set of parameters in a Poisson mixture model and is used in the PoisMixClus function for the calculation of the BIC and ICL. The mylogLikePoisMixObs function calculates the log likelihood per observation for a given set of parameters in a Poisson mixture model.

Value

ll

(Depending on the context), the log likelihood, difference in log likelihoods for two different sets of parameters, or per-observation log-likelihood

Note

In the logLikePoisMixDiff function, we make use of the alternative mass function for a Poisson density proposed by Loader (2000) to avoid computational difficulties. The logLikePoisMixDiff function returns a default value of 100 if one or both of the log likelihoods associated with the two parameter sets takes on a value of -\infty.

Author(s)

Andrea Rau

References

Loader, C. (2000) Fast and accurate computation of binomial probabilities. Available at https://lists.gnu.org/archive/html/octave-maintainers/2011-09/pdfK0uKOST642.pdf.

Rau, A., Maugis-Rabusseau, C., Martin-Magniette, M.-L., Celeux G. (2015). Co-expression analysis of high-throughput transcriptome sequencing data with Poisson mixture models. Bioinformatics, 31(9):1420-1427.

Rau, A., Celeux, G., Martin-Magniette, M.-L., Maugis-Rabusseau, C. (2011) Clustering high-throughput sequencing data with Poisson mixture models. Inria Research Report 7786. Available at https://inria.hal.science/inria-00638082.

See Also

PoisMixClus for Poisson mixture model estimation and model selection; PoisMixMean to calculate the per-cluster conditional mean of each observation

Examples

set.seed(12345)

## Simulate data as shown in Rau et al. (2011)
## Library size setting "A", low cluster separation
## n = 200 observations

simulate <- PoisMixSim(n = 200, libsize = "A", separation = "low")
y <- simulate$y
conds <- simulate$conditions
w <- rowSums(y)               ## Estimate of w
r <- table(conds)             ## Number of replicates per condition
d <- length(unique(conds))    ## Number of conditions
s <- colSums(y) / sum(y)      ## TC estimate of lib size
s.dot <- rep(NA, d)           ## Summing lib size within conditions
for(j in 1:d) s.dot[j] <- sum(s[which(conds == unique(conds)[j])]);

## Initial guess for pi and lambda
g.true <- 4
pi.guess <- simulate$pi
## Recalibrate so that (s.dot * lambda.guess) = 1
lambda.sim <- simulate$lambda
lambda.guess <- matrix(NA, nrow = d, ncol = g.true)
for(k in 1:g.true) {
    tmp <- lambda.sim[,k]/sum(lambda.sim[,k])
    lambda.guess[,k] <- tmp/s.dot
}

## Run the PMM-II model for g = 4
## with EM algorithm and "TC" library size parameter
run <- PoisMixClus(y, g = 4, norm = "TC", conds = conds) 
pi.est <- run$pi
lambda.est <- run$lambda

## Mean values for each of the parameter sets
mean.guess <- PoisMixMean(y, 4, conds, s, lambda.guess)
mean.est <- PoisMixMean(y, 4, conds, s, lambda.est)

## Difference in log likelihoods       
LL.diff <- logLikePoisMixDiff(y, mean.guess, pi.guess, mean.est, pi.est)
LL.diff             ## -12841.11
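
## A hedged follow-up sketch (not part of the original example): the
## overall log likelihood and the per-observation log likelihoods for
## the estimated parameters, using the usage given above
## LL.est <- logLikePoisMix(y, mean.est, pi.est)
## LL.obs <- mylogLikePoisMixObs(y, conds, s, lambda.est, pi.est)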

Visualize results from clustering using a Poisson mixture model

Description

A function to visualize the clustering results obtained from a Poisson mixture model.

Usage

## S3 method for class 'HTSCluster'
plot(x, file.name = FALSE, 
    graphs = c("map", "map.bycluster", "lambda"), data=NA, ...)
## S3 method for class 'HTSClusterWrapper'
plot(x, file.name = FALSE,
    graphs = c("capushe", "ICL", "BIC"), capushe.validation=NA, ...)

Arguments

x

An object of class "HTSCluster" or "HTSClusterWrapper"

file.name

Optional file name if plots are to be saved in a PDF file.

graphs

Type of graph to be included in plots. May be equal to "map", "map.bycluster", "weighted.histograms", and/or "lambda" for objects of class "HTSCluster" and "capushe", "ICL", and/or "BIC" for objects of class "HTSClusterWrapper"

capushe.validation

Optional number of clusters to use for capushe validation (should be less than the maximum number of clusters specified in the "HTSClusterWrapper" object).

data

(n x q) matrix of observed counts for n observations and q variables (only required for the plotting of weighted histograms)

...

Additional arguments (mainly useful for plotting)

Details

For objects of class "HTSCluster", the plotting function provides the possibility for the following visualizations:

1) A histogram of maximum conditional probabilities across all clusters.

2) Per-cluster boxplots of maximum conditional probabilities.

3) Weighted histograms of observation profiles (with weights equal to the corresponding conditional probability for each observation in each cluster), plotted independently for each variable. Fitted densities after fitting the Poisson mixture model are overlaid in red.

4) A global view of \boldsymbol{\lambda} and \boldsymbol{\pi} values for the selected model. When the number of conditions <= 2, bar heights represent the value of \lambda_{k} for each cluster, and bar width corresponds to the value of \pi_k.

For objects of class "HTSClusterWrapper", the plotting function provides the possibility for one or all of the following visualizations:

1) ICL plot for all fitted models.

2) BIC plot for all fitted models.

3) Capushe diagnostic plots.

Author(s)

Andrea Rau

References

Rau, A., Maugis-Rabusseau, C., Martin-Magniette, M.-L., Celeux G. (2015). Co-expression analysis of high-throughput transcriptome sequencing data with Poisson mixture models. Bioinformatics, 31(9):1420-1427.

Rau, A., Celeux, G., Martin-Magniette, M.-L., Maugis-Rabusseau, C. (2011). Clustering high-throughput sequencing data with Poisson mixture models. Inria Research Report 7786. Available at https://inria.hal.science/inria-00638082.

See Also

PoisMixClus, PoisMixClusWrapper

Examples

set.seed(12345)

## Simulate data as shown in Rau et al. (2011)
## Library size setting "A", high cluster separation
## n = 200 observations
simulate <- PoisMixSim(n = 200, libsize = "A", separation = "high")
y <- simulate$y
conds <- simulate$conditions

## Run the PMM-II model for g = 3
## "TC" library size estimate, EM algorithm
run <- PoisMixClus(y, g = 3, 
    norm = "TC", conds = conds, init.type = "small-em")

## Visualization of results (not run):
## plot(run)
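
## A hedged sketch of selecting specific graphs (argument values as
## documented above; weighted histograms additionally need the data):
## plot(run, graphs = "lambda")
## plot(run, graphs = "weighted.histograms", data = y)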

Poisson mixture model estimation and model selection

Description

These functions implement the EM and CEM algorithms for parameter estimation in a Poisson mixture model for clustering high throughput sequencing observations (e.g., genes) for a single number of clusters (PoisMixClus) or a sequence of cluster numbers (PoisMixClusWrapper). Parameters are initialized using a Small-EM strategy as described in Rau et al. (2011) or the splitting small-EM strategy described in Papastamoulis et al. (2014), and model selection is performed using the ICL criterion. Note that these functions implement the PMM-I and PMM-II models described in Rau et al. (2011).

Usage

PoisMixClus(y, g, conds, norm = "TMM", 
    init.type = "small-em", init.runs = 1, init.iter = 10,
    alg.type = "EM", cutoff = 10e-6, iter = 1000, fixed.lambda = NA,
    equal.proportions = FALSE, prev.labels = NA, 
    prev.probaPost = NA, verbose = FALSE, interpretation = "sum",
	EM.verbose = FALSE, wrapper = FALSE, subset.index = NA)

PoisMixClusWrapper(y, gmin = 1, gmax, conds, 
    norm = "TMM", gmin.init.type = "small-em",
    init.runs = 1, init.iter = 10, split.init = TRUE, alg.type = "EM", 
    cutoff = 10e-6, iter = 1000, fixed.lambda = NA, 
    equal.proportions = FALSE, verbose = FALSE, interpretation = "sum",
	EM.verbose = FALSE, subset.index = NA)

Arguments

y

(n x q) matrix of observed counts for n observations and q variables

g

Number of clusters (a single value). If fixed.lambda contains a list of lambda values to be fixed, g corresponds to the number of clusters in addition to those fixed.

gmin

The minimum number of clusters in a sequence to be tested. In cases where clusters are included with a fixed value of lambda, gmin corresponds to the minimum number of clusters in addition to those that are fixed.

gmax

The maximum number of clusters in a sequence to be tested. In cases where clusters are included with a fixed value of lambda, gmax corresponds to the maximum number of clusters in addition to those that are fixed.

conds

Vector of length q defining the condition (treatment group) for each variable (column) in y

norm

The type of estimator to be used to normalize for differences in library size: “TC” for total count, “UQ” for upper quantile, “Med” for median, “DESeq” for the normalization method in the DESeq package, and “TMM” for the TMM normalization method (Robinson and Oshlack, 2010). Can also be a vector (of length q) containing pre-estimated library size estimates for each sample. Note that if the user provides pre-calculated normalization factors, the package will make use of norm/sum(norm) as normalization factors.

init.type

Type of initialization strategy to be used (“small-em” for the Small-EM strategy described in Rau et al. (2011), and “kmeans” for a simple K-means initialization)

gmin.init.type

Type of initialization strategy to be used for the minimum number of clusters in a sequence (gmin): (“small-em” for the Small-EM strategy described in Rau et al. (2011), and “kmeans” for a simple K-means initialization)

init.runs

Number of runs to be used for the Small-EM strategy described in Rau et al. (2011), with a default value of 1

init.iter

Number of iterations to be used within each run for the Small-EM strategy, with a default value of 10

split.init

If TRUE, the splitting initialization strategy of Papastamoulis et al. (2014) will be used for cluster sizes (gmin+1, ..., gmax). If FALSE, the initialization strategy specified in gmin.init.type is used for all cluster sizes in the sequence.

alg.type

Algorithm to be used for parameter estimation (“EM” or “CEM”)

cutoff

Cutoff to declare algorithm convergence (in terms of differences in log likelihoods from one iteration to the next)

iter

Maximum number of iterations to be run for the chosen algorithm

fixed.lambda

If one (or more) clusters with fixed values of lambda is desired, a list containing vectors of length d (the number of conditions) specifying the fixed values of lambda for each fixed cluster.

equal.proportions

If TRUE, the cluster proportions are set to be equal for all clusters. Default is FALSE (unequal cluster proportions).

prev.labels

A vector of length n of cluster labels obtained from the previous run (g-1 clusters) to be used with the splitting small-EM strategy described in Papastamoulis et al. (2014). For other initialization strategies, this parameter takes the value NA

prev.probaPost

An n x (g-1) matrix of the conditional probabilities of each observation belonging to each of the g-1 clusters from the previous run, to be used with the splitting small-EM strategy described in Papastamoulis et al. (2014). For other initialization strategies, this parameter takes the value NA

verbose

If TRUE, include verbose output

interpretation

If "sum", cluster behavior is interpreted with respect to overall gene expression level (sums per gene), otherwise for "mean", cluster behavior is interpreted with respect to mean gene expression (means per gene).

EM.verbose

If TRUE, more informative output is printed about the EM algorithm, including the number of iterations run and the difference between log-likelihoods at the last and penultimate iterations.

subset.index

Optional vector providing the indices of a subset of genes that should be used for the co-expression analysis (i.e., row indices of the data matrix y).

wrapper

TRUE if the PoisMixClus function is run from within the PoisMixClusWrapper main function, and FALSE otherwise. This mainly helps to avoid repeatedly recalculating parameters that are used throughout the algorithm (e.g., library sizes).

Details

Output of PoisMixClus is an S3 object of class HTSCluster, and output of PoisMixClusWrapper is an S3 object of class HTSClusterWrapper.

In a Poisson mixture model, the data \mathbf{y} are assumed to come from g distinct subpopulations (clusters), each of which is modeled separately; the overall population is thus a mixture of these subpopulations. In the case of a Poisson mixture model with g components, the model may be written as

f(\mathbf{y};g,\boldsymbol{\Psi}_g) = \prod_{i=1}^n \sum_{k=1}^g \pi_k \prod_{j=1}^{d} \prod_{l=1}^{r_j} P(y_{ijl} ; \boldsymbol{\theta}_k)

for i = 1, \ldots, n observations in l = 1, \ldots, r_j replicates of j = 1, \ldots, d conditions (treatment groups), where P(\cdot) is the standard Poisson density, \boldsymbol{\Psi}_g = (\pi_1, \ldots, \pi_{g-1}, \boldsymbol{\theta}^\prime), \boldsymbol{\theta}^\prime contains all of the parameters in \boldsymbol{\theta}_1, \ldots, \boldsymbol{\theta}_g assumed to be distinct, and \boldsymbol{\pi} = (\pi_1, \ldots, \pi_g)^\prime are the mixing proportions such that \pi_k is in (0,1) for all k and \sum_k \pi_k = 1.

We consider the following parameterization for the mean \boldsymbol{\theta}_k = (\mu_{ijlk}):

\mu_{ijlk} = w_i s_{jl} \lambda_{jk}

where w_i corresponds to the expression level of observation i, \boldsymbol{\lambda}_k = (\lambda_{1k}, \ldots, \lambda_{dk}) corresponds to the clustering parameters that define the profiles of the genes in cluster k across all variables, and s_{jl} is the normalized library size (a fixed constant) for replicate l of condition j.

There are two approaches to estimating the parameters of a finite mixture model and obtaining a clustering of the data: the estimation approach (via the EM algorithm) and the clustering approach (via the CEM algorithm). Parameter initialization is done using a Small-EM strategy as described in Rau et al. (2011) via the emInit function. Model selection may be performed using the BIC or ICL criteria, or the slope heuristics.

Value

lambda

(d x g) matrix containing the estimate of \hat{\boldsymbol{\lambda}}

pi

Vector of length g containing the estimate of \hat{\boldsymbol{\pi}}

labels

Vector of length n containing the cluster assignments of the n observations

probaPost

Matrix containing the conditional probabilities of belonging to each cluster for all observations

log.like

Value of log likelihood

BIC

Value of BIC criterion

ICL

Value of ICL criterion

alg.type

Estimation algorithm used; matches the argument alg.type above

norm

Library size normalization factors used

conds

Conditions specified by user

iterations

Number of iterations run

logLikeDiff

Difference in log-likelihood between the last and penultimate iterations of the algorithm

subset.index

If provided by the user, the indices of the subset of genes used for co-expression analyses

loglike.all

Log likelihoods calculated for each of the fitted models for cluster sizes gmin, ..., gmax

capushe

Results of capushe model selection, an object of class "Capushe"

ICL.all

ICL values calculated for each of the fitted models for cluster sizes gmin, ..., gmax

ICL.results

Object of class HTSCluster giving the results from the model chosen via the ICL criterion

BIC.results

Object of class HTSCluster giving the results from the model chosen via the BIC

DDSE.results

Object of class HTSCluster giving the results from the model chosen via the DDSE slope heuristics criterion

Djump.results

Object of class HTSCluster giving the results from the model chosen via the Djump slope heuristics criterion

all.results

List of objects of class HTSCluster giving the results for all models for cluster sizes gmin, ..., gmax

model.selection

Type of criteria used for model selection, equal to NA for direct calls to PoisMixClus or "DDSE", "Djump", "BIC", or "ICL" for the respective selected models for calls to PoisMixClusWrapper

Note

Note that the fixed.lambda argument is primarily intended to be used in the case when a single cluster is fixed to have equal clustering parameters lambda across all conditions (i.e., \lambda_{j1} = \lambda_{1} = 1); this is particularly useful when identifying genes with non-differential expression across all conditions (see the HTSDiff R package for more details). Alternatively, this argument could be used to specify a cluster for which genes are only expressed in a single condition (e.g., \lambda_{11} = 1 and \lambda_{j1} = 0 for all j > 1). Other possibilities could be considered, but note that the fixed values of lambda must satisfy the constraint \sum_j \lambda_{jk} s_{j.} = 1 for all k imposed in the model; if this is not the case, a warning message will be printed.
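
A hedged sketch of checking this constraint for a candidate fixed lambda vector (y and conds are assumed to be defined as in the examples below, with "TC" library sizes used for illustration):

## Normalized library sizes and their per-condition sums
## s <- colSums(y) / sum(y)
## s.dot <- tapply(s, conds, sum)
## lambda.fixed <- c(1, 1, 1)        ## candidate for d = 3 conditions
## sum(lambda.fixed * s.dot)         ## should equal 1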

Author(s)

Andrea Rau

References

Anders, S. and Huber, W. (2010) Differential expression analysis for sequence count data. Genome Biology, 11(R106), 1-28.

Papastamoulis, P., Martin-Magniette, M.-L., and Maugis-Rabusseau, C. (2014). On the estimation of mixtures of Poisson regression models with large number of components. Computational Statistics and Data Analysis: 3rd special Issue on Advances in Mixture Models, DOI: 10.1016/j.csda.2014.07.005.

Rau, A., Maugis-Rabusseau, C., Martin-Magniette, M.-L., Celeux G. (2015). Co-expression analysis of high-throughput transcriptome sequencing data with Poisson mixture models. Bioinformatics, 31(9):1420-1427.

Rau, A., Celeux, G., Martin-Magniette, M.-L., Maugis-Rabusseau, C. (2011). Clustering high-throughput sequencing data with Poisson mixture models. Inria Research Report 7786. Available at https://inria.hal.science/inria-00638082.

See Also

probaPost for the calculation of the conditional probability of belonging to a cluster; PoisMixMean for the calculation of the per-cluster conditional mean of each observation; logLikePoisMixDiff for the calculation of the log likelihood of a Poisson mixture model; emInit and kmeanInit for the Small-EM parameter initialization strategy

Examples

set.seed(12345)

## Simulate data as shown in Rau et al. (2011)
## Library size setting "A", high cluster separation
## n = 200 observations

simulate <- PoisMixSim(n = 200, libsize = "A", separation = "high")
y <- simulate$y
conds <- simulate$conditions

## Run the PMM model for g = 3
## "TC" library size estimate, EM algorithm

run <- PoisMixClus(y, g = 3, conds = conds, norm = "TC") 

## Estimates of pi and lambda for the selected model

pi.est <- run$pi
lambda.est <- run$lambda


## Not run: PMM for 4 total clusters, with one fixed class
## "TC" library size estimate, EM algorithm
##
## run <- PoisMixClus(y, g = 3, norm = "TC", conds = conds, 
##    fixed.lambda = list(c(1,1,1))) 
##
##
## Not run: PMM model for 4 clusters, with equal proportions
## "TC" library size estimate, EM algorithm
##
## run <- PoisMixClus(y, g = 4, norm = "TC", conds = conds, 
##     equal.proportions = TRUE) 
##
##
## Not run: PMM model for g = 1, ..., 10 clusters, Split Small-EM init
##
## run1.10 <- PoisMixClusWrapper(y, gmin = 1, gmax = 10, conds = conds, 
##	norm = "TC")
##
##
## Not run: PMM model for g = 1, ..., 10 clusters, Small-EM init
##
## run1.10bis <- PoisMixClusWrapper(y, gmin = 1, gmax = 10, conds = conds, 
##	norm = "TC", split.init = FALSE)
##
##
## Not run: previous model equivalent to the following
##
## for(K in 1:10) {
##	run <- PoisMixClus(y, g = K, conds = conds, norm = "TC")
## }
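
## Not run: a hedged sketch of extracting the selected models from a
## PoisMixClusWrapper fit (run1.10 as above; field names as documented
## in the Value section)
##
## summary(run1.10)
## ICL.run <- run1.10$ICL.results       ## model chosen by ICL
## DDSE.run <- run1.10$DDSE.results     ## model chosen by DDSE slope heuristics
## table(ICL.run$labels)                ## cluster sizes for the ICL choice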

Calculate the conditional per-cluster mean of each observation

Description

This function is used to calculate the conditional per-cluster mean expression for all observations. This value corresponds to \boldsymbol{\mu} = (\mu_{ijlk}) = (\hat{w}_i \hat{\lambda}_{jk}) for the PMM-I model and \boldsymbol{\mu} = (\mu_{ijlk}) = (\hat{w}_i s_{jl} \hat{\lambda}_{jk}) for the PMM-II model.

Usage

PoisMixMean(y, g, conds, s, lambda)

Arguments

y

(n x q) matrix of observed counts for n observations and q variables

g

Number of clusters

conds

Vector of length q defining the condition (treatment group) for each variable (column) in y

s

Estimate of normalized per-variable library size

lambda

(d x g) matrix containing the current estimate of lambda, where d is the number of conditions (treatment groups) and g is the number of clusters

Value

A list of length g containing the (n x q) matrices of mean expression for all observations, conditioned on each of the g clusters

Author(s)

Andrea Rau

References

Rau, A., Maugis-Rabusseau, C., Martin-Magniette, M.-L., Celeux G. (2015). Co-expression analysis of high-throughput transcriptome sequencing data with Poisson mixture models. Bioinformatics, 31(9):1420-1427.

Rau, A., Celeux, G., Martin-Magniette, M.-L., Maugis-Rabusseau, C. (2011). Clustering high-throughput sequencing data with Poisson mixture models. Inria Research Report 7786. Available at https://inria.hal.science/inria-00638082.

See Also

PoisMixClus for Poisson mixture model estimation and model selection

Examples

set.seed(12345)

## Simulate data as shown in Rau et al. (2011)
## Library size setting "A", high cluster separation
## n = 200 observations

simulate <- PoisMixSim(n = 200, libsize = "A", separation = "high")
y <- simulate$y
conds <- simulate$conditions
s <- colSums(y) / sum(y) 	## TC estimate of lib size

## Run the PMM-II model for g = 3
## "TC" library size estimate, EM algorithm

run <- PoisMixClus(y, g = 3, norm = "TC", conds = conds) 
pi.est <- run$pi
lambda.est <- run$lambda

## Calculate the per-cluster mean for each observation
means <- PoisMixMean(y, g = 3, conds, s, lambda.est)

Simulate data from a Poisson mixture model

Description

This function simulates data from a Poisson mixture model, as described by Rau et al. (2011). Data are simulated with varying expression level (w_i) for 4 clusters. Clusters may be simulated with “high” or “low” separation, and three different options are available for the library size setting: “equal”, “A”, and “B”, as described by Rau et al. (2011).

Usage

PoisMixSim(n = 2000, libsize, separation)

Arguments

n

Number of observations

libsize

The type of library size difference to be simulated (“equal”, “A”, or “B”, as described by Rau et al. (2011))

separation

Cluster separation (“high” or “low”, as described by Rau et al. (2011))

Value

y

(n x q) matrix of simulated counts for n observations and q variables

labels

Vector of length n defining the true cluster labels of the simulated data

pi

Vector of length 4 (the number of clusters) containing the true value of \boldsymbol{\pi}

lambda

(d x 4) matrix of \boldsymbol{\lambda} values for d conditions (3 in the case of libsize = “equal” or “A”, and 2 otherwise) in 4 clusters (see note below)

w

Row sums of y (estimate of \hat{w})

conditions

Vector of length q defining the condition (treatment group) for each variable (column) in y

Note

If one or more observations are simulated such that all variables have a value of 0, those rows are removed from the data matrix; as such, in some cases the simulated data y may have fewer than n rows.

The PMM-I model includes the parameter constraint \sum_j \lambda_{jk} r_j = 1 for all k, where r_j is the number of replicates in condition (treatment group) j. Similarly, the parameter constraint in the PMM-II model is \sum_j \sum_l \lambda_{jk} s_{jl} = 1, where s_{jl} is the library size for replicate l of condition j. The value of lambda corresponds to that used to generate the simulated data, where the library sizes were set as described in Table 2 of Rau et al. (2011). However, due to variability in the simulation process, the actual library sizes of the data y are not exactly equal to these values; this means that the value of lambda may not be directly compared to an estimate of \hat{\boldsymbol{\lambda}} obtained from the PoisMixClus function.

Author(s)

Andrea Rau

References

Rau, A., Celeux, G., Martin-Magniette, M.-L., Maugis-Rabusseau, C. (2011). Clustering high-throughput sequencing data with Poisson mixture models. Inria Research Report 7786. Available at https://inria.hal.science/inria-00638082.

Examples

set.seed(12345)

## Simulate data as shown in Rau et al. (2011)
## Library size setting "A", high cluster separation
## n = 200 observations

simulate <- PoisMixSim(n = 200, libsize = "A", separation = "high")
y <- simulate$y
conds <- simulate$conditions
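
## A hedged follow-up sketch (not part of the original example):
## cross-tabulate the true simulated labels against labels estimated
## with PoisMixClus (4 clusters, "TC" library sizes)
## run <- PoisMixClus(y, g = 4, conds = conds, norm = "TC")
## table(simulate$labels, run$labels)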

Calculate the conditional probability of belonging to each cluster in a Poisson mixture model

Description

This function computes the conditional probabilities t_{ik} that an observation i arises from the k^{\mathrm{th}} component for the current value of the mixture parameters.

Usage

probaPost(y, g, conds, pi, s, lambda)

Arguments

y

(n x q) matrix of observed counts for n observations and q variables

g

Number of clusters

conds

Vector of length q defining the condition (treatment group) for each variable (column) in y

pi

Vector of length g containing the current estimate of \hat{\boldsymbol{\pi}}

s

Vector of length q containing the estimates for the normalized library size parameters for each of the q variables in y

lambda

(d x g) matrix containing the current estimate of \boldsymbol{\lambda}, where d is the number of conditions (treatment groups)

Value

t

(n x g) matrix made up of the conditional probability of each observation belonging to each of the g clusters

Note

If all values of t_{ik} are 0 (or nearly zero), the observation is assigned with probability one to belong to the cluster with the closest mean (in terms of the Euclidean distance from the observation). To avoid calculation difficulties, extreme values of t_{ik} are smoothed, such that those smaller than 1e-10 or larger than 1-1e-10 are set equal to 1e-10 and 1-1e-10, respectively.

Author(s)

Andrea Rau

References

Rau, A., Maugis-Rabusseau, C., Martin-Magniette, M.-L., Celeux G. (2015). Co-expression analysis of high-throughput transcriptome sequencing data with Poisson mixture models. Bioinformatics, 31(9):1420-1427.

Rau, A., Celeux, G., Martin-Magniette, M.-L., Maugis-Rabusseau, C. (2011). Clustering high-throughput sequencing data with Poisson mixture models. Inria Research Report 7786. Available at https://inria.hal.science/inria-00638082.

See Also

PoisMixClus for Poisson mixture model estimation and model selection; PoisMixMean to calculate the conditional per-cluster mean of each observation

Examples

set.seed(12345)

## Simulate data as shown in Rau et al. (2011)
## Library size setting "A", high cluster separation
## n = 200 observations

simulate <- PoisMixSim(n = 200, libsize = "A", separation = "high")
y <- simulate$y
conds <- simulate$conditions
s <- colSums(y) / sum(y)     ## TC estimate of lib size

## Run the PMM-II model for g = 3
## "TC" library size estimate, EM algorithm

run <- PoisMixClus(y, g = 3, norm = "TC",
 	conds = conds) 
pi.est <- run$pi
lambda.est <- run$lambda

## Calculate the conditional probability of belonging to each cluster
proba <- probaPost(y, g = 3, conds = conds, pi = pi.est, s = s, 
	lambda = lambda.est)

## head(round(proba,2))
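
## A hedged follow-up (not part of the original example): maximum a
## posteriori cluster assignments derived from the conditional probabilities
## map.labels <- apply(proba, 1, which.max)
## table(map.labels, run$labels)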

Summarize results from clustering using a Poisson mixture model

Description

A function to summarize the clustering results obtained from a Poisson mixture model.

Usage

## S3 method for class 'HTSCluster'
summary(object, ...)
## S3 method for class 'HTSClusterWrapper'
summary(object, ...)

Arguments

object

An object of class "HTSCluster" or "HTSClusterWrapper"

...

Additional arguments

Details

The summary function for an object of class "HTSCluster" provides the following summary of results:

1) Number of clusters and model selection criterion used, if applicable.

2) Number of observations across all clusters with a maximum conditional probability greater than 90% for the selected model.

3) Number of observations per cluster with a maximum conditional probability greater than 90% for the selected model.

4) \boldsymbol{\lambda} values for the selected model.

5) \boldsymbol{\pi} values for the selected model.

The summary function for an object of class "HTSClusterWrapper" provides the number of clusters selected for the BIC, ICL, DDSE, and Djump model selection approaches.

Author(s)

Andrea Rau

References

Rau, A., Maugis-Rabusseau, C., Martin-Magniette, M.-L., Celeux G. (2015). Co-expression analysis of high-throughput transcriptome sequencing data with Poisson mixture models. Bioinformatics, 31(9):1420-1427.

Rau, A., Celeux, G., Martin-Magniette, M.-L., Maugis-Rabusseau, C. (2011). Clustering high-throughput sequencing data with Poisson mixture models. Inria Research Report 7786. Available at https://inria.hal.science/inria-00638082.

See Also

PoisMixClus, PoisMixClusWrapper

Examples

set.seed(12345)

## Simulate data as shown in Rau et al. (2011)
## Library size setting "A", high cluster separation
## n = 200 observations
simulate <- PoisMixSim(n = 200, libsize = "A", separation = "high")
y <- simulate$y
conds <- simulate$conditions

## Run the PMM-II model for g = 3
## "TC" library size estimate, EM algorithm
run <- PoisMixClus(y, g = 3, 
    norm = "TC", conds = conds, init.type = "small-em")

## Summary of results:
summary(run)
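
## Not run: a hedged sketch for an "HTSClusterWrapper" object (e.g.,
## run1.10 from the PoisMixClus examples); summary() reports the number
## of clusters selected by BIC, ICL, DDSE, and Djump
##
## summary(run1.10)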