Principal component analysis for bigmemory::big.matrix
inputs
Source: R/pca_interface.R
, R/pca_methods.R
pca_bigmatrix.Rd
Perform principal component analysis (PCA) directly on a
bigmemory::big.matrix
without copying the data into R memory. The
exported helpers mirror the structure of base R's prcomp()
while avoiding
the need to materialise large matrices.
Usage
resolve_big_pointer(x, arg, allow_null = FALSE)
pca_scores_bigmatrix(
xpMat,
rotation,
center,
scale,
ncomp = -1L,
block_size = 1024L
)
pca_variable_loadings(rotation, sdev)
pca_variable_correlations(rotation, sdev, column_sd, scale = NULL)
pca_variable_contributions(loadings)
pca_individual_contributions(scores, sdev, total_weight = NA_real_)
pca_individual_cos2(scores)
pca_variable_cos2(correlations)
# S3 method for class 'bigpca'
summary(object, ...)
# S3 method for class 'summary.bigpca'
print(x, digits = max(3, getOption("digits") - 3), ...)
# S3 method for class 'bigpca'
plot(
x,
y,
type = c("scree", "contributions", "correlation_circle", "biplot"),
max_components = 25L,
component = 1L,
top_n = 20L,
components = c(1L, 2L),
data = NULL,
draw = TRUE,
...
)
Arguments
- x
A
summary.bigpca
object.- arg
Character string naming the argument being validated. Used to construct informative error messages.
- allow_null
Logical flag indicating whether
NULL
is accepted for the argument. WhenTRUE
, aNULL
input is returned unchanged.- xpMat
Either a
bigmemory::big.matrix
or an external pointer such asmat@address
that references the sourcebig.matrix
.- rotation
A rotation matrix such as the
rotation
element returned bypca_bigmatrix()
.- center
For
pca_scores_bigmatrix()
, a numeric vector of column means (optional).- scale
Optional numeric vector of scaling factors returned by
pca_bigmatrix()
. If supplied, it indicates the PCA was performed on standardised variables.- ncomp
Number of components to retain. Use a non-positive value to keep all components returned by the decomposition.
- block_size
Number of rows to process per block when streaming data through BLAS kernels. Larger values improve throughput at the cost of additional memory.
- sdev
A numeric vector of component standard deviations, typically the
sdev
element frompca_bigmatrix()
.- column_sd
A numeric vector with the marginal standard deviation of each original variable. When
scale
is supplied, correlations are computed on the standardised scale without rescaling bycolumn_sd
.- loadings
A numeric matrix such as the result of
pca_variable_loadings()
.- scores
For
pca_individual_contributions()
andpca_individual_cos2()
, a numeric matrix of component scores where rows correspond to observations and columns to components.- total_weight
Optional positive scalar giving the effective number of observations to use when computing contributions. Defaults to the number of rows in
scores
.- correlations
For
pca_variable_cos2()
, a numeric matrix of correlations between variables and components.- object
A
bigpca
object created bypca_bigmatrix()
,pca_stream_bigmatrix()
, or related helpers.- ...
Additional arguments passed to plotting helpers.
- digits
Number of significant digits to display when printing importance metrics.
- y
Currently unused.
- type
The plot to draw. Options include "scree" (variance explained), "contributions" (top contributing variables), "correlation_circle" (variable correlations with selected components), and "biplot" (joint display of scores and loadings).
- max_components
Maximum number of components to display in scree plots.
- component
Component index to highlight when drawing contribution plots.
- top_n
Number of variables to display in contribution plots.
- components
Length-two integer vector selecting the components for correlation circle and biplot views.
- data
Optional data source (matrix, data frame,
bigmemory::big.matrix
, or external pointer) used to compute scores for biplots whenx$scores
is unavailable.- draw
Logical; if
FALSE
, return the data prepared for the selected plot instead of drawing it.
Value
For pca_bigmatrix()
, a bigpca
object mirroring a prcomp
result
with elements sdev
, rotation
, optional center
and scale
vectors,
column_sd
, eigenvalues
, explained_variance
, cumulative_variance
, and
the sample covariance matrix. The object participates in S3 generics such as
summary()
and plot()
.
A numeric matrix of scores with rows corresponding to observations and columns to retained components.
A numeric matrix containing variable loadings for each component.
A numeric matrix of correlations between variables and components.
A numeric matrix where each entry represents the contribution of a variable to a component.
For summary.bigpca()
, a summary.bigpca
object containing
component importance measures.
Functions
pca_scores_bigmatrix()
: Project observations into principal component space while streaming from abig.matrix
.pca_variable_loadings()
: Compute variable loadings (covariances between original variables and components).pca_variable_correlations()
: Compute variable-component correlations given column standard deviations.pca_variable_contributions()
: Derive the relative contribution of each variable to the retained components.pca_individual_contributions()
: Compute the relative contribution of individual observations to each component.pca_individual_cos2()
: Compute squared cosine values measuring the quality of representation for individual observations.pca_variable_cos2()
: Compute squared cosine values measuring the quality of representation for variables.summary(bigpca)
: Summarise the component importance metrics for abigpca
result.print(summary.bigpca)
: Print the component importance summary produced bysummary.bigpca()
.plot(bigpca)
: Visualise PCA diagnostics such as scree, correlation circle, contribution, and biplot displays.
See also
bigpca, pca_scores_bigmatrix()
, pca_variable_loadings()
,
pca_variable_correlations()
, pca_variable_contributions()
, and the
streaming variants pca_stream_bigmatrix()
and companions.
Examples
set.seed(123)
mat <- bigmemory::as.big.matrix(matrix(rnorm(40), nrow = 10))
pca <- pca_bigmatrix(mat, center = TRUE, scale = TRUE, ncomp = 3)
scores <- pca_scores_bigmatrix(mat, pca$rotation, pca$center, pca$scale, ncomp = 3)
loadings <- pca_variable_loadings(pca$rotation, pca$sdev)
correlations <- pca_variable_correlations(pca$rotation, pca$sdev, pca$column_sd, pca$scale)
contributions <- pca_variable_contributions(loadings)
list(scores = scores, loadings = loadings, correlations = correlations,
contributions = contributions)
#> $scores
#> [,1] [,2] [,3]
#> [1,] 0.54325339 -0.91493729 -0.41315014
#> [2,] -0.77428157 -0.84150780 0.40023111
#> [3,] 1.77362948 0.79996096 0.05039649
#> [4,] 0.61761772 0.59591822 -0.57031347
#> [5,] 0.23218256 0.95503144 -0.72268699
#> [6,] 2.66813521 -0.45268284 0.33775240
#> [7,] -0.08448606 0.81203899 1.15087152
#> [8,] -2.43891736 0.54722825 -0.96050148
#> [9,] -0.40827305 -1.59689839 -0.36789931
#> [10,] -2.12886032 0.09584844 1.09529988
#>
#> $loadings
#> [,1] [,2] [,3]
#> [1,] 0.8567704 0.2711956 0.3439932
#> [2,] 0.7678169 -0.5050498 0.3324763
#> [3,] -0.7573969 0.3708264 0.5203357
#> [4,] 0.7581891 0.5754452 -0.2056261
#>
#> $correlations
#> [,1] [,2] [,3]
#> [1,] 0.8567704 0.2711956 0.3439932
#> [2,] 0.7678169 -0.5050498 0.3324763
#> [3,] -0.7573969 0.3708264 0.5203357
#> [4,] 0.7581891 0.5754452 -0.2056261
#>
#> $contributions
#> [,1] [,2] [,3]
#> [1,] 0.2969361 0.09224839 0.21836250
#> [2,] 0.2384786 0.31993524 0.20398568
#> [3,] 0.2320498 0.17247851 0.49962661
#> [4,] 0.2325354 0.41533786 0.07802521
#>