PyTorch PCA API

Main module for PCA.

class PCA(n_components=None, *, whiten=False, svd_solver='auto', iterated_power='auto', n_oversamples=10, power_iteration_normalizer='auto', random_state=None)

Bases: object

Principal Component Analysis (PCA).

Works with PyTorch tensors. API similar to sklearn.decomposition.PCA.

Parameters:
  • n_components (int | float | str | None, optional) –

    Number of components to keep.

    • If int, number of components to keep.

    • If float (should be between 0.0 and 1.0), the number of components to keep is determined by the cumulative percentage of variance explained by the components until the proportion is reached.

    • If “mle”, the number of components is selected using Minka’s MLE.

    • If None, all components are kept: n_components = min(n_samples, n_features).

    By default, n_components=None.

  • svd_solver (str, optional) –

    One of {‘auto’, ‘full’, ‘covariance_eigh’, ‘randomized’}

    • ’auto’: the solver is selected automatically based on the shape of the input.

    • ’full’: Run exact full SVD with torch.linalg.svd

    • ’covariance_eigh’: Compute the covariance matrix and take its eigenvalue decomposition with torch.linalg.eigh. Most efficient for small n_features and large n_samples.

    • ’randomized’: Compute the randomized SVD by the method of Halko et al.

    By default, svd_solver=’auto’.

  • whiten (bool, optional) – If True, the components_ vectors are divided by sqrt(n_samples - 1) and scaled by the singular values to ensure uncorrelated outputs with unit component-wise variances. By default, False.

  • iterated_power (int | str, optional) – Integer or ‘auto’. Number of power-method iterations used by the randomized SVD. Must be >= 0. Ignored if svd_solver!=’randomized’. By default, ‘auto’.

  • n_oversamples (int, optional) – Additional number of random vectors to sample the range of input data in randomized solver to ensure proper conditioning. Ignored if svd_solver!=’randomized’. By default, 10.

  • power_iteration_normalizer (str, optional) – One of ‘auto’, ‘QR’, ‘LU’, ‘none’. Power iteration normalizer for randomized SVD solver. Ignored if svd_solver!=’randomized’. By default, ‘auto’.

  • random_state (int | None, optional) – Seed of randomized SVD solver. Ignored if svd_solver!=’randomized’. By default, None.
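The float form of n_components can be illustrated with a minimal sketch. It is not the library's code: the helper name components_for_variance is hypothetical, and treating a cumulative ratio exactly equal to the threshold as "reached" is an assumption, since tie handling is not specified above.

```python
import torch


def components_for_variance(explained_variance_ratio: torch.Tensor, threshold: float) -> int:
    """Smallest k whose cumulative explained variance ratio reaches `threshold`.

    Hypothetical helper for illustration; assumes the ratios are sorted
    in decreasing order, as PCA produces them.
    """
    cumulative = torch.cumsum(explained_variance_ratio, dim=0)
    # Count the prefixes that still fall short of the threshold, then add one.
    k = int((cumulative < threshold).sum().item()) + 1
    # Never return more components than are available.
    return min(k, explained_variance_ratio.numel())


ratios = torch.tensor([0.5, 0.3, 0.15, 0.05], dtype=torch.float64)
k = components_for_variance(ratios, 0.9)  # cumulative sums: 0.5, 0.8, 0.95, 1.0
```

With these ratios, two components explain 80% of the variance and three explain 95%, so a threshold of 0.9 keeps three components.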

components_: Optional[Tensor]

Principal axes in feature space.

explained_variance_: Optional[Tensor]

The amount of variance explained by each of the selected components.

explained_variance_ratio_: Optional[Tensor]

Percentage of variance explained by each of the selected components.

mean_: Optional[Tensor]

Mean of the input data during fit.

n_components_: Union[int, float, None, str]

Number of components to keep.

n_features_in_: int

Number of features in the input data.

n_samples_: int

Number of samples seen during fit.

noise_variance_: Optional[Tensor]

The estimated noise covariance.

singular_values_: Optional[Tensor]

Singular values corresponding to each of the selected components.

whiten: bool

Whether the data is whitened or not.

svd_solver_: str

Solver to use for the PCA computation.
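The fitted attributes above are related through the SVD of the centered data. The following sketch uses plain torch.linalg.svd rather than the PCA class, so the variable names only mirror the attributes; it assumes the usual unbiased (n_samples - 1) normalization.

```python
import torch

torch.manual_seed(0)
inputs = torch.randn(100, 5, dtype=torch.float64)
n_samples = inputs.shape[0]

# Counterpart of mean_: per-feature mean of the data seen during fit.
mean = inputs.mean(dim=0)
centered = inputs - mean

# Thin SVD of the centered data; the rows of `vh` play the role of components_.
_, singular_values, vh = torch.linalg.svd(centered, full_matrices=False)

# Counterparts of explained_variance_ and explained_variance_ratio_.
explained_variance = singular_values**2 / (n_samples - 1)
explained_variance_ratio = explained_variance / explained_variance.sum()
```

When every component is kept, the explained variances sum to the total variance of the data and the ratios sum to 1.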

fit_transform(inputs, *, determinist=True)

Fit the PCA model and apply the dimensionality reduction.

Parameters:
  • inputs (Tensor) – Input data of shape (n_samples, n_features).

  • determinist (bool, optional) – If True, the SVD solver is deterministic but the gradient cannot be computed through the PCA fit (the PCA transform is always differentiable though). If False, the SVD can be non-deterministic but the gradient can be computed through the PCA fit. By default, determinist=True.

Returns:

transformed – Transformed data.

Return type:

Tensor

fit(inputs, *, determinist=True)

Fit the PCA model and return it.

Parameters:
  • inputs (Tensor) – Input data of shape (n_samples, n_features).

  • determinist (bool, optional) – If True, the SVD solver is deterministic but the gradient cannot be computed through the PCA fit (the PCA transform is always differentiable though). If False, the SVD can be non-deterministic but the gradient can be computed through the PCA fit. By default, determinist=True.

Returns:

The PCA model fitted on the input data.

Return type:

PCA

transform(inputs, center='fit')

Apply dimensionality reduction to the input data.

Parameters:
  • inputs (Tensor) – Input data of shape (n_samples, n_features).

  • center (str) –

    One of ‘fit’, ‘input’ or ‘none’. Specifies how to center the data.

    • ’fit’: center the data using the mean fitted during fit (default).

    • ’input’: center the data using the mean of the input data.

    • ’none’: do not center the data.

    By default, ‘fit’ (as in the sklearn PCA implementation).

Returns:

transformed – Transformed data of shape (n_samples, n_components).

Return type:

Tensor
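The three centering modes can be sketched in plain PyTorch. The function project below is a hypothetical stand-in for transform, written without whitening; it is meant to show what each mode subtracts, not how the library implements it.

```python
import torch


def project(inputs, components, fit_mean, center="fit"):
    """Project `inputs` onto `components` (hypothetical helper, no whitening)."""
    if center == "fit":
        inputs = inputs - fit_mean  # mean stored during fit
    elif center == "input":
        inputs = inputs - inputs.mean(dim=0)  # mean of the data being transformed
    elif center != "none":
        raise ValueError(f"Unknown center mode: {center!r}")
    return inputs @ components.T


torch.manual_seed(0)
train = torch.randn(50, 4, dtype=torch.float64)
fit_mean = train.mean(dim=0)
_, _, vh = torch.linalg.svd(train - fit_mean, full_matrices=False)
components = vh[:2]  # keep two principal axes

# New data whose mean differs from the training mean.
new_data = torch.randn(10, 4, dtype=torch.float64) + 3.0
by_fit = project(new_data, components, fit_mean, "fit")
by_input = project(new_data, components, fit_mean, "input")
uncentered = project(new_data, components, fit_mean, "none")
```

With ‘input’, the projected data always has zero mean; with ‘fit’ and ‘none’, the two results differ by the projection of the fitted mean.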

inverse_transform(inputs)

De-transform transformed data.

Parameters:

inputs (Tensor) – Transformed data of shape (n_samples, n_components).

Returns:

de_transformed – De-transformed data of shape (n_samples, n_features) where n_features is the number of features in the input data before applying transform.

Return type:

Tensor
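As a rough illustration of what de-transforming means, the sketch below maps projected data back to feature space with plain tensor operations. The helper reconstruct is hypothetical and assumes no whitening and centering with the fitted mean.

```python
import torch

torch.manual_seed(0)
inputs = torch.randn(30, 4, dtype=torch.float64)
mean = inputs.mean(dim=0)
_, _, vh = torch.linalg.svd(inputs - mean, full_matrices=False)


def reconstruct(transformed, components, mean):
    """Map projected data back to feature space (hypothetical helper)."""
    return transformed @ components + mean


# With all components kept, the round trip recovers the input.
full = (inputs - mean) @ vh.T
restored = reconstruct(full, vh, mean)

# With only two components, the result is a lower-rank approximation.
truncated = (inputs - mean) @ vh[:2].T
approx = reconstruct(truncated, vh[:2], mean)
```

The output always has n_features columns; only the amount of information lost depends on how many components were kept.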

get_covariance()

Compute data covariance with the generative model.

Return type:

Tensor

get_exp_variance_diff()

Get explained variance difference (from noise).

Return type:

Tuple[Tensor, Tensor]

get_precision()

Compute data precision matrix with the generative model.

It is the inverse of the covariance matrix, but the method is more efficient than inverting the covariance matrix directly.

Return type:

Tensor
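The generative-model covariance can be sketched under the probabilistic PCA convention used by sklearn, which this sketch assumes the library follows: the noise variance is the mean of the discarded variances, and the covariance is the low-rank signal part plus isotropic noise. The direct inversion is for clarity only and is exactly what get_precision() is said to avoid.

```python
import torch

torch.manual_seed(0)
inputs = torch.randn(200, 4, dtype=torch.float64)
n_samples, n_features = inputs.shape
n_components = 2

centered = inputs - inputs.mean(dim=0)
_, s, vh = torch.linalg.svd(centered, full_matrices=False)
variances = s**2 / (n_samples - 1)

components = vh[:n_components]
explained_variance = variances[:n_components]
# Assumed convention: noise variance is the mean of the discarded variances.
noise_variance = variances[n_components:].mean()

# Signal part: variance explained beyond the noise floor, never negative.
signal = torch.clamp(explained_variance - noise_variance, min=0.0)
identity = torch.eye(n_features, dtype=inputs.dtype)
covariance = (components.T * signal) @ components + noise_variance * identity

# Direct inverse, shown only to define the precision matrix.
precision = torch.linalg.inv(covariance)
```

Under this model the trace of the covariance equals the total variance of the data, because the discarded variances are redistributed evenly as noise.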

score_samples(inputs)

Compute score of each sample based on log-likelihood.

Returns:

log_likelihood – Log-likelihood of each sample under the current model, of shape (n_samples,)

Return type:

Tensor

score(inputs)

Return the average score (log-likelihood) of all samples.

Return type:

Tensor

to(*args, **kwargs)

Move the model to the specified device/dtype.

Call the native PyTorch .to() method on all tensors, parameters and NN modules to move the model to the specified device and/or dtype.

Parameters:
  • args (Any) – Positional arguments to pass to the .to() method.

  • kwargs (Any) –

    Keyword arguments to pass to the .to() method. They can be:

    • device (DeviceLikeType) – Device to move the model to.

    • dtype (torch.dtype) – Data type to move the model to.

    • non_blocking (bool, optional) – If True, the operation will be non-blocking. By default, False.

    • copy (bool, optional)

    • memory_format (torch.memory_format, optional)

Return type:

None

Note

By default, the dtype and device of the parameters are the same as those of the input data during the fit. Use this method if you want to change the dtype and/or device of the model after the fit, for instance if you fit the model on GPU and want to run inference on CPU.

Warning

Requires the model to be fitted first.