Variants of the PCA helpers that stream results directly into
bigmemory::big.matrix objects, enabling file-backed workflows without
materialising dense R matrices.
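The streaming idea can be shown with a minimal base-R sketch. It assumes scores are computed as in stats::prcomp(): centre, optionally scale, then multiply by the rotation. Rows are processed in blocks of block_size and each block is written into a preallocated destination. The helper stream_scores_sketch() is hypothetical and is not part of the package. An ordinary matrix stands in for the destination big.matrix, so the sketch returns the filled copy instead of writing in place.

```r
# Hypothetical helper (not package API): compute PCA scores block by block
# and write each block into a preallocated destination matrix.
stream_scores_sketch <- function(x, dest, rotation, center, scale,
                                 block_size = 1024L) {
  n <- nrow(x)
  for (start in seq(1L, n, by = block_size)) {
    rows <- start:min(start + block_size - 1L, n)
    block <- x[rows, , drop = FALSE]
    # Centre and scale only the rows of the current block.
    if (is.numeric(center)) block <- sweep(block, 2L, center, "-")
    if (is.numeric(scale)) block <- sweep(block, 2L, scale, "/")
    # Project onto the retained axes and fill the matching destination rows.
    dest[rows, ] <- block %*% rotation
  }
  # A plain R matrix is copied on modification, so return the filled copy;
  # a big.matrix destination would instead be updated in place.
  dest
}

# Usage: compare with the scores from stats::prcomp().
set.seed(1)
x <- matrix(rnorm(40), nrow = 10)
fit <- prcomp(x, center = TRUE, scale. = TRUE)
scores <- stream_scores_sketch(x, matrix(0, nrow(x), ncol(fit$rotation)),
                               fit$rotation, fit$center, fit$scale,
                               block_size = 3L)
```

Only one block of rows is centred and scaled at a time. This is why the block_size trade-off between throughput and memory arises.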
Usage
pca_spca_stream_bigmatrix(
xpMat,
xpRotation = NULL,
center = TRUE,
scale = FALSE,
ncomp = -1L,
block_size = 2048L,
max_iter = 50L,
tol = 1e-04,
seed = NULL,
return_scores = FALSE,
verbose = FALSE
)
pca_scores_stream_bigmatrix(
xpMat,
xpDest,
rotation,
center,
scale,
ncomp = -1L,
block_size = 1024L
)
pca_variable_loadings_stream_bigmatrix(xpRotation, sdev, xpDest)
pca_variable_correlations_stream_bigmatrix(
xpRotation,
sdev,
column_sd,
scale = NULL,
xpDest
)
pca_variable_contributions_stream_bigmatrix(xpLoadings, xpDest)
Arguments
- xpMat
Either a bigmemory::big.matrix or an external pointer, such as mat@address, that references the source big.matrix.
- xpRotation
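For pca_spca_stream_bigmatrix(), an optional big.matrix or external pointer that receives the rotation matrix. For pca_variable_loadings_stream_bigmatrix() and pca_variable_correlations_stream_bigmatrix(), a big.matrix or external pointer containing the rotation matrix to stream from.
- center
For pca_spca_stream_bigmatrix(), a logical flag indicating whether columns are centred. For pca_scores_stream_bigmatrix(), a numeric vector of column means (optional).
- scale
For pca_spca_stream_bigmatrix(), a logical flag indicating whether columns are scaled to unit variance. For the other helpers, an optional numeric vector of scaling factors returned by pca_stream_bigmatrix() or pca_bigmatrix(). In pca_variable_correlations_stream_bigmatrix(), when scale is supplied, correlations are reported on the scaled data without dividing by column_sd.
- ncomp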
Number of components to retain. Use a non-positive value to keep all components returned by the decomposition.
- block_size
Number of rows to process per block when streaming data through BLAS kernels. Larger values improve throughput at the cost of additional memory.
- max_iter
Maximum number of block power iterations.
- tol
Convergence tolerance applied to the Frobenius norm of the difference between successive subspace projectors.
- seed
Optional integer seed used to initialise the random starting basis.
- return_scores
Logical; when TRUE, principal component scores are computed in a final streaming pass over the data.
- verbose
Logical; when TRUE, diagnostic messages describing the iteration progress are emitted.
- xpDest
Either a big.matrix or an external pointer referencing the destination big.matrix that stores the computed quantity.
- rotation
A rotation matrix, such as the rotation element returned by pca_bigmatrix().
- sdev
A numeric vector of component standard deviations, typically the sdev element from pca_bigmatrix().
- column_sd
A numeric vector of variable standard deviations used to scale the correlations when the PCA was performed on unscaled data.
- xpLoadings
For pca_variable_contributions_stream_bigmatrix(), the loadings matrix supplied as a big.matrix or external pointer.
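The max_iter, tol, seed and block_size arguments describe a block power (subspace) iteration. The base-R sketch below illustrates that technique under these assumptions:

- The data fit in an ordinary matrix that is read in row blocks.
- Convergence is measured, as documented for tol, by the Frobenius norm of the change in the subspace projector.
- A final Rayleigh-Ritz step recovers the individual axes. This step is an assumption, not something stated in this page.

block_power_pca_sketch() is hypothetical and is not the package's implementation.

```r
# Hypothetical sketch (not the package's implementation): block power
# (subspace) iteration for the leading ncomp principal axes. Each iteration
# streams over row blocks, so the p x p covariance matrix is never formed.
block_power_pca_sketch <- function(x, ncomp, block_size = 2048L,
                                   max_iter = 50L, tol = 1e-4, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  n <- nrow(x)
  p <- ncol(x)
  mu <- colMeans(x)
  starts <- seq(1L, n, by = block_size)
  # Multiply the centred cross-product matrix Xc' Xc by q, one block at a time.
  cross_times <- function(q) {
    z <- matrix(0, p, ncol(q))
    for (start in starts) {
      rows <- start:min(start + block_size - 1L, n)
      xb <- sweep(x[rows, , drop = FALSE], 2L, mu, "-")
      z <- z + crossprod(xb, xb %*% q)
    }
    z
  }
  # Random orthonormal starting basis (reproducible when seed is given).
  q <- qr.Q(qr(matrix(rnorm(p * ncomp), p, ncomp)))
  proj_old <- tcrossprod(q)
  iter <- 0L
  for (iter in seq_len(max_iter)) {
    q <- qr.Q(qr(cross_times(q)))
    proj_new <- tcrossprod(q)
    # Stop when the subspace projector changes by less than tol (Frobenius).
    delta <- norm(proj_new - proj_old, type = "F")
    proj_old <- proj_new
    if (delta < tol) break
  }
  # Rayleigh-Ritz step: diagonalise the ncomp x ncomp projected covariance.
  small <- crossprod(q, cross_times(q)) / (n - 1)
  eig <- eigen(small, symmetric = TRUE)
  list(rotation = q %*% eig$vectors,
       sdev = sqrt(pmax(eig$values, 0)),
       center = mu,
       iterations = iter)
}

# Usage: well-separated variances so the leading subspace converges quickly.
set.seed(42)
x <- matrix(rnorm(200 * 5), 200, 5) %*% diag(c(5, 3, 2, 1, 0.5))
fit <- block_power_pca_sketch(x, ncomp = 2, block_size = 64L,
                              max_iter = 500L, tol = 1e-10, seed = 1)
ref <- prcomp(x)
```

Comparing projectors rather than basis vectors makes the stopping rule insensitive to sign flips and rotations within the subspace.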
Value
For pca_stream_bigmatrix(), the same bigpca object as pca_bigmatrix(), with the addition of a rotation_stream_bigmatrix element referencing the populated big.matrix when xpRotation is supplied. For pca_spca_stream_bigmatrix(), the same scalable PCA structure as pca_spca(), with the rotation destination populated when xpRotation is supplied.
For pca_scores_stream_bigmatrix() and the pca_variable_*_stream_bigmatrix() helpers, the external pointer supplied in xpDest, returned invisibly.
Functions
- pca_scores_stream_bigmatrix(): Stream PCA scores into a destination big.matrix.
- pca_variable_loadings_stream_bigmatrix(): Stream variable loadings, derived from the rotation and the component standard deviations, into a destination big.matrix.
- pca_variable_correlations_stream_bigmatrix(): Stream variable correlations into a destination big.matrix.
- pca_variable_contributions_stream_bigmatrix(): Stream variable contributions into a destination big.matrix.
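The variable diagnostics can be spelled out in base R, assuming they follow the usual PCA definitions:

- Loadings scale each rotation column by its component standard deviation.
- Correlations additionally divide each variable's row by that variable's standard deviation. This is skipped when the PCA was fitted on scaled data, mirroring the description of the scale argument.
- Contributions express squared loadings as a share of each component's total.

The *_sketch() helpers are hypothetical. Whether the package reports contributions as fractions or as percentages is not stated on this page, so the fraction convention used here is an assumption.

```r
# Hypothetical base-R helpers (illustrative names, not package API).
loadings_sketch <- function(rotation, sdev) {
  # Column k of the rotation is multiplied by the k-th component sdev.
  sweep(rotation, 2L, sdev, "*")
}

correlations_sketch <- function(rotation, sdev, column_sd, scale = NULL) {
  loadings <- loadings_sketch(rotation, sdev)
  # On scaled data every variable has unit variance, so no division is needed.
  if (!is.null(scale)) return(loadings)
  # Otherwise divide row j (variable j) by that variable's standard deviation.
  sweep(loadings, 1L, column_sd, "/")
}

contributions_sketch <- function(loadings) {
  # Squared loadings as a fraction of each component's total (columns sum to 1).
  sq <- loadings^2
  sweep(sq, 2L, colSums(sq), "/")
}

# Usage on unscaled and scaled PCA fits from stats::prcomp().
set.seed(456)
x <- matrix(rnorm(60), nrow = 12)
p <- prcomp(x, center = TRUE, scale. = FALSE)
ld <- loadings_sketch(p$rotation, p$sdev)
cr <- correlations_sketch(p$rotation, p$sdev, apply(x, 2, sd))
ct <- contributions_sketch(ld)
p2 <- prcomp(x, center = TRUE, scale. = TRUE)
cr2 <- correlations_sketch(p2$rotation, p2$sdev, NULL, scale = p2$scale)
```

In both cases the correlations should equal cor(x, scores). For the unscaled fit, this is why column_sd is needed.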
Examples
set.seed(456)
# A small big.matrix used as the source data.
mat <- bigmemory::as.big.matrix(matrix(rnorm(30), nrow = 6))
ncomp <- 2
# Preallocate a p x ncomp destination for the rotation, then fit the PCA.
rotation_store <- bigmemory::big.matrix(ncol(mat), ncomp, type = "double")
pca_stream <- pca_stream_bigmatrix(mat, xpRotation = rotation_store, ncomp = ncomp)
# Stream the n x ncomp scores into a preallocated destination.
score_store <- bigmemory::big.matrix(nrow(mat), ncomp, type = "double")
pca_scores_stream_bigmatrix(
mat,
score_store,
pca_stream$rotation,
pca_stream$center,
pca_stream$scale,
ncomp = ncomp
)
#> <pointer: 0x1290c8a40>
# Stream variable loadings from the rotation populated by pca_stream_bigmatrix().
loadings_store <- bigmemory::big.matrix(ncol(mat), ncomp, type = "double")
pca_variable_loadings_stream_bigmatrix(
pca_stream$rotation_stream_bigmatrix,
pca_stream$sdev,
loadings_store
)
#> <pointer: 0x109ff98c0>
# Stream variable correlations; column_sd is used only when scale is not supplied.
correlation_store <- bigmemory::big.matrix(ncol(mat), ncomp, type = "double")
pca_variable_correlations_stream_bigmatrix(
pca_stream$rotation_stream_bigmatrix,
pca_stream$sdev,
pca_stream$column_sd,
pca_stream$scale,
correlation_store
)
#> <pointer: 0x109fa1190>
# Contributions are computed from the loadings streamed above.
contribution_store <- bigmemory::big.matrix(ncol(mat), ncomp, type = "double")
pca_variable_contributions_stream_bigmatrix(
loadings_store,
contribution_store
)
#> <pointer: 0x109f8ff40>