darfix.decomposition.ipca.IPCA#

class darfix.decomposition.ipca.IPCA(data, chunksize, num_components=None, whiten=False, indices=None, rowvar=True)[source]#

Bases: Base

Compute PCA in chunks, using the incremental principal component analysis implementation in scikit-learn. To compute W, the rows are partially fitted in chunks (a reduced number of images at a time). To compute H, dimensionality reduction is applied to every chunk, and the projections are horizontally stacked into H.

Parameters:
  • data (array_like) – array of shape (n_samples, n_features). See rowvar.

  • chunksize (int) – Size of every group of samples to which PCA is applied. PCA is fit with arrays of shape (chunksize, n_features), where n_features is the number of features per sample. Depending on rowvar, the chunks are taken from the rows or from the columns.

  • num_components (Union[None,int], optional) – Number of components to keep, defaults to None.

  • whiten (bool, optional) – If True, whitening is applied to the components.

  • indices (Union[None,array_like], optional) – The indices of the samples to use, defaults to None. If rowvar is False, corresponds to the indices of the features to use.

  • rowvar (bool, optional) – If rowvar is True (default), then each row represents a sample, with features in the columns. Otherwise, the relationship is transposed: each column represents a sample, while the rows contain features.

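The chunked two-pass pattern described above can be sketched directly with scikit-learn's IncrementalPCA (the array shapes, chunk size, and the `projections`/`components` names below are illustrative assumptions, not the darfix API; darfix's exact W/H convention may differ):

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

rng = np.random.default_rng(0)
data = rng.random((100, 50))  # (n_samples, n_features), as with rowvar=True
chunksize = 20                # must be >= num_components for partial_fit
num_components = 5

ipca = IncrementalPCA(n_components=num_components)

# Pass 1: partially fit the model, one chunk of rows at a time.
for start in range(0, data.shape[0], chunksize):
    ipca.partial_fit(data[start:start + chunksize])

# Pass 2: project every chunk and stack the projections.
projections = np.vstack([
    ipca.transform(data[start:start + chunksize])
    for start in range(0, data.shape[0], chunksize)
])
components = ipca.components_

print(projections.shape, components.shape)  # (100, 5) (5, 50)
```

Each partial_fit call updates the running mean and component estimates, so the full data matrix never needs to be in memory at once.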
property data#
fit_transform(max_iter=1, error_step=None, W=None, H=None)[source]#

Fit to data, then transform it

Parameters:
  • max_iter (int, optional) – Maximum number of iterations, defaults to 1

  • error_step (Union[None,int], optional) – If None (default), the error is not computed. Otherwise, the error is computed every error_step iterations.

  • W (Union[None,array_like], optional) – If given, initial value for the matrix W, defaults to None

  • H (Union[None,array_like], optional) – If given, initial value for the matrix H, defaults to None

frobenius_norm(chunks=200)#

Frobenius norm (||data - WH||) of the data matrix and the low-rank approximation given by WH. Minimizing the Frobenius norm is the most common optimization criterion for matrix factorization methods.

Returns:

frobenius norm: F = ||data - WH||
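The chunked computation this method performs can be sketched in plain NumPy; the factors W and H below are hypothetical stand-ins for the matrices produced by fit_transform:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.random((6, 4))
# Hypothetical low-rank factors standing in for the W and H computed by IPCA.
W = rng.random((6, 2))
H = rng.random((2, 4))

# Accumulate the squared residual chunk by chunk over the rows,
# then take the square root to get the Frobenius norm.
chunks = 3
sq = 0.0
for start in range(0, data.shape[0], chunks):
    residual = data[start:start + chunks] - W[start:start + chunks] @ H
    sq += np.sum(residual ** 2)
F = np.sqrt(sq)

# Chunked result matches the one-shot Frobenius norm.
assert np.isclose(F, np.linalg.norm(data - W @ H))
```

Processing the residual in row chunks keeps memory usage bounded by the chunk size rather than by the full data matrix.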

property indices#
property num_components#
property num_features#
property num_samples#
property singular_values#

The singular values corresponding to each of the selected components.

Returns:

array, shape (n_components,)

squared_frobenius_norm(chunks=200)#

Squared Frobenius norm (||data - WH||²) of the data matrix and the low-rank approximation given by WH. Minimizing the Frobenius norm is the most common optimization criterion for matrix factorization methods.

Returns:

squared frobenius norm: F² = ||data - WH||²
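The squared norm is simply the sum of squared residuals, i.e. the square of the value returned by frobenius_norm; a minimal check with hypothetical factors:

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.random((5, 3))
# Hypothetical low-rank factors, as above.
W = rng.random((5, 2))
H = rng.random((2, 3))

F = np.linalg.norm(data - W @ H)        # Frobenius norm
F_squared = np.sum((data - W @ H) ** 2)  # squared Frobenius norm

assert np.isclose(F_squared, F ** 2)
```

The squared variant avoids the final square root, which is convenient when the norm is only compared across iterations.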