PCAVectorModel

class menpo.model.PCAVectorModel(samples, centre=True, n_samples=None, max_n_components=None, inplace=True)[source]

Bases: MeanLinearVectorModel

A MeanLinearVectorModel where the components are Principal Components.

Principal Component Analysis (PCA) by eigenvalue decomposition of the data’s scatter matrix. For details of the implementation of PCA, see pca.

Parameters
  • samples (ndarray or list or iterable of ndarray) – List or iterable of numpy arrays to build the model from, or an existing data matrix.

  • centre (bool, optional) – When True (default) PCA is performed after mean centering the data. If False the data is assumed to be centred, and the mean will be 0.

  • n_samples (int, optional) – If provided then samples must be an iterator that yields n_samples samples. If not provided then samples has to be a list (so we know how large the data matrix needs to be).

  • max_n_components (int, optional) – The maximum number of components to keep in the model. Any components beyond this number are discarded.

  • inplace (bool, optional) – If True the data matrix is modified in place. Otherwise, the data matrix is copied.
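As an illustration of what "PCA by eigenvalue decomposition of the scatter matrix" means, the following is a minimal NumPy sketch. It is not menpo's implementation: the variable names and the normalisation by n_samples - 1 are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
# 50 samples with 4 features each; rows are samples.
X = rng.normal(size=(50, 4)) * np.array([3.0, 2.0, 1.0, 0.5])

mean = X.mean(axis=0)
Xc = X - mean  # corresponds to centre=True: subtract the mean first

# Scatter matrix of the centred data (normalisation is an assumption).
S = Xc.T @ Xc / (X.shape[0] - 1)

# eigh returns eigenvalues in ascending order, so reorder to descending.
eigenvalues, eigenvectors = np.linalg.eigh(S)
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
components = eigenvectors[:, order].T  # shape (n_components, n_features)

# Keep only the leading components, as max_n_components would.
max_n_components = 2
components = components[:max_n_components]
eigenvalues = eigenvalues[:max_n_components]
```

Each row of components is a unit-length principal direction, and the matching entry of eigenvalues is the variance of the data along it.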

component(index, with_mean=True, scale=1.0)[source]

A particular component of the model, in vectorized form.

Parameters
  • index (int) – The component that is to be returned

  • with_mean (bool, optional) – If True, the component will be blended with the mean vector before being returned. If not, the component is returned on its own.

  • scale (float, optional) – A scale factor that should be applied to the component. Only valid in the case where with_mean is True. The scale is applied in units of standard deviations (so a scale of 1.0 with_mean visualizes the mean plus 1 std. dev of the component in question).

Returns

component_vector ((n_features,) ndarray) – The component vector of the given index.
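The interaction between with_mean and scale can be sketched as follows. This is a hedged illustration with toy arrays, assuming that one standard deviation along a component equals the square root of its eigenvalue; the standalone function below is hypothetical and only mirrors the documented signature.

```python
import numpy as np

mean = np.array([1.0, 2.0, 3.0])
components = np.array([[1.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0]])
eigenvalues = np.array([4.0, 1.0])  # variance along each component

def component(index, with_mean=True, scale=1.0):
    """Return a component, optionally blended with the mean."""
    if with_mean:
        # scale is measured in standard deviations of the component.
        std = np.sqrt(eigenvalues[index])
        return mean + scale * std * components[index]
    return components[index]

plus_one_std = component(0)               # mean + 1 std. dev. along component 0
minus_two_std = component(0, scale=-2.0)  # mean - 2 std. dev. along component 0
raw = component(1, with_mean=False)       # the bare component
```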

copy()

Generate an efficient copy of this object.

Note that Numpy arrays and other Copyable objects on self will be deeply copied. Dictionaries and sets will be shallow copied, and everything else will be assigned (no copy will be made).

Classes that store state other than numpy arrays and immutable types should override this method to ensure all state is copied.

Returns

type(self) – A copy of this object

eigenvalues_cumulative_ratio()[source]

Returns the cumulative ratio between the variance captured by the active components and the total amount of variance present in the original samples.

Returns

eigenvalues_cumulative_ratio ((n_active_components,) ndarray) – Array of cumulative eigenvalues.

eigenvalues_ratio()[source]

Returns the ratio between the variance captured by each active component and the total amount of variance present in the original samples.

Returns

eigenvalues_ratio ((n_active_components,) ndarray) – The active eigenvalues array scaled by the original variance.
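The two ratio methods above reduce to simple array arithmetic. The sketch below uses made-up eigenvalues and assumes the original variance is the sum of all eigenvalues, active or not.

```python
import numpy as np

all_eigenvalues = np.array([6.0, 3.0, 1.0])  # every component of the model
n_active = 2                                 # only the first two are active

original_variance = all_eigenvalues.sum()
active = all_eigenvalues[:n_active]

# Share of the original variance captured by each active component.
eigenvalues_ratio = active / original_variance

# Running total of those shares.
eigenvalues_cumulative_ratio = np.cumsum(eigenvalues_ratio)
```

The last entry of the cumulative ratio is the fraction of the original variance retained by the active components taken together.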

increment(data, n_samples=None, forgetting_factor=1.0, verbose=False)[source]

Update the eigenvectors, eigenvalues and mean vector of this model by performing incremental PCA on the given samples.

Parameters
  • data (list of Vectorizable) – List of new samples to update the model from.

  • n_samples (int, optional) – If provided then data must be an iterator that yields n_samples samples. If not provided then data has to be a list (so we know how large the data matrix needs to be).

  • forgetting_factor ([0.0, 1.0] float, optional) – Forgetting factor that weights the relative contribution of new samples vs old samples. If 1.0, all samples are weighted equally and, hence, the result is exactly the same as performing batch PCA on the concatenated list of old and new samples. If <1.0, more emphasis is put on the new samples. See [1] for details.

References

[1] David Ross, Jongwoo Lim, Ruei-Sung Lin, Ming-Hsuan Yang. “Incremental Learning for Robust Visual Tracking”. IJCV, 2007.

classmethod init_from_components(components, eigenvalues, mean, n_samples, centred, max_n_components=None)[source]

Build the Principal Component Analysis (PCA) using the provided components (eigenvectors) and eigenvalues.

Parameters
  • components ((n_components, n_features) ndarray) – The eigenvectors to be used.

  • eigenvalues ((n_components, ) ndarray) – The corresponding eigenvalues.

  • mean ((n_features, ) ndarray) – The mean vector.

  • n_samples (int) – The number of samples used to generate the eigenvectors.

  • centred (bool) – When True we assume that the data were centered before computing the eigenvectors.

  • max_n_components (int, optional) – The maximum number of components to keep in the model. Any components beyond this number are discarded.

classmethod init_from_covariance_matrix(C, mean, n_samples, centred=True, is_inverse=False, max_n_components=None)[source]

Build the Principal Component Analysis (PCA) by eigenvalue decomposition of the provided covariance/scatter matrix. For details of the implementation of PCA, see pcacov.

Parameters
  • C ((n_features, n_features) ndarray or scipy.sparse) – The Covariance/Scatter matrix. If it is a precision matrix (inverse covariance), then set is_inverse=True.

  • mean ((n_features, ) ndarray) – The mean vector.

  • n_samples (int) – The number of samples used to generate the covariance matrix.

  • centred (bool, optional) – When True we assume that the data were centered before computing the covariance matrix.

  • is_inverse (bool, optional) – If True, then it is assumed that C is a precision matrix (inverse covariance). Thus, the eigenvalues will be inverted. If False, then it is assumed that C is a covariance matrix.

  • max_n_components (int, optional) – The maximum number of components to keep in the model. Any components beyond this number are discarded.
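The role of is_inverse can be illustrated with a short NumPy sketch. A precision matrix shares its eigenvectors with the covariance matrix and has reciprocal eigenvalues, so inverting the eigenvalues recovers the same model. The helper pca_from_matrix is hypothetical and is not menpo's pcacov.

```python
import numpy as np

def pca_from_matrix(C, is_inverse=False, max_n_components=None):
    """Eigendecompose a symmetric matrix and sort by descending variance."""
    eigenvalues, eigenvectors = np.linalg.eigh(C)
    if is_inverse:
        # Eigenvalues of a precision matrix are reciprocals of the variances.
        eigenvalues = 1.0 / eigenvalues
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order][:max_n_components]
    components = eigenvectors[:, order].T[:max_n_components]
    return components, eigenvalues

C = np.array([[4.0, 1.0],
              [1.0, 3.0]])
comp_cov, eig_cov = pca_from_matrix(C)
comp_prec, eig_prec = pca_from_matrix(np.linalg.inv(C), is_inverse=True)
```

Both calls yield the same eigenvalues, and the same components up to sign.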

instance(weights, normalized_weights=False)[source]

Creates a new vector instance of the model by weighting together the components.

Parameters
  • weights ((n_weights,) ndarray or list) –

    The weightings for the first n_weights components that should be used.

    weights[j] is the linear contribution of the j’th principal component to the instance vector.

  • normalized_weights (bool, optional) – If True, the weights are assumed to be normalized w.r.t the eigenvalues. Normalized weights are more interpretable, which can make it easier to create distinct instances.

Returns

vector ((n_features,) ndarray) – The instance vector for the weighting provided.

instance_vectors(weights, normalized_weights=False)[source]

Creates new vectorized instances of the model using the first components in a particular weighting.

Parameters
  • weights ((n_vectors, n_weights) ndarray or list of lists) –

    The weightings for the first n_weights components that should be used per instance that is to be produced.

    weights[i, j] is the linear contribution of the j’th principal component to the i’th instance vector produced. Note that if n_weights < n_components, only the first n_weights components are used in the reconstruction (i.e. unspecified weights are implicitly 0).

  • normalized_weights (bool, optional) – If True, the weights are assumed to be normalized w.r.t the eigenvalues. Normalized weights are more interpretable, which can make it easier to create distinct instances.

Returns

vectors ((n_vectors, n_features) ndarray) – The instance vectors for the weighting provided.

Raises

ValueError – If n_weights > n_components
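The weighting described for instance and instance_vectors can be sketched in NumPy as below. It assumes that "normalized w.r.t the eigenvalues" means the weights are expressed in standard deviations, i.e. multiplied by the square root of each eigenvalue before use; the standalone function is hypothetical.

```python
import numpy as np

mean = np.zeros(3)
components = np.eye(3)[:2]          # 2 components, 3 features
eigenvalues = np.array([4.0, 1.0])

def instance_vectors(weights, normalized_weights=False):
    """Blend the first n_weights components into new vectors."""
    weights = np.atleast_2d(np.asarray(weights, dtype=float))
    n_weights = weights.shape[1]
    if n_weights > components.shape[0]:
        raise ValueError("more weights than components")
    if normalized_weights:
        # Assumed convention: weights are in units of standard deviations.
        weights = weights * np.sqrt(eigenvalues[:n_weights])
    # Missing trailing weights are implicitly zero.
    return mean + weights @ components[:n_weights]

vectors = instance_vectors([[1.0, 1.0], [2.0, 0.0]])
normalized = instance_vectors([[1.0, 1.0]], normalized_weights=True)
partial = instance_vectors([[3.0]])  # only the first component is used
```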

inverse_noise_variance()[source]

Returns the inverse of the noise variance.

Returns

inverse_noise_variance (float) – Inverse of the noise variance.

Raises

ValueError – If noise_variance() == 0

mean()

Return the mean of the model.

Type

ndarray

noise_variance()[source]

Returns the average variance captured by the inactive components, i.e. the sample noise assumed in a Probabilistic PCA formulation.

If all components are active, then noise_variance == 0.0.

Returns

noise_variance (float) – The mean variance of the inactive components.
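A minimal sketch of the noise variance and its inverse, assuming the noise variance is the plain mean of the eigenvalues of the inactive components. The function names mirror the methods, but the standalone functions themselves are hypothetical.

```python
import numpy as np

def noise_variance(eigenvalues, n_active):
    """Mean eigenvalue of the inactive (discarded) components."""
    inactive = np.asarray(eigenvalues, dtype=float)[n_active:]
    return float(inactive.mean()) if inactive.size else 0.0

def inverse_noise_variance(eigenvalues, n_active):
    noise = noise_variance(eigenvalues, n_active)
    if noise == 0.0:
        raise ValueError("noise variance is zero; its inverse is undefined")
    return 1.0 / noise

all_eigenvalues = [5.0, 3.0, 1.5, 0.5]
noise = noise_variance(all_eigenvalues, n_active=2)
inverse = inverse_noise_variance(all_eigenvalues, n_active=2)
no_noise = noise_variance(all_eigenvalues, n_active=4)  # all components active
```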

noise_variance_ratio()[source]

Returns the ratio between the noise variance and the total amount of variance present in the original samples.

Returns

noise_variance_ratio (float) – The ratio between the noise variance and the variance present in the original samples.

original_variance()[source]

Returns the total amount of variance captured by the original model, i.e. the amount of variance present in the original samples.

Returns

original_variance (float) – The variance captured by the model.

orthonormalize_against_inplace(linear_model)[source]

Enforces that this model’s components and those of another model are mutually orthonormal.

Note that the model passed in is guaranteed not to have its number of available components changed. This model, however, may lose some dimensionality due to reaching a degenerate state.

The removed components will always be trimmed from the end of components (i.e. the components which capture the least variance). If trimming is performed, n_components and n_available_components would be altered - see trim_components() for details.

Parameters

linear_model (LinearModel) – A second linear model to orthonormalize this against.

orthonormalize_inplace()

Enforces that this model’s components are orthonormalized, s.t. component_vector(i).dot(component_vector(j)) is 1 if i == j and 0 otherwise (the Kronecker delta).
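One common way to orthonormalize a set of row vectors is a QR decomposition. The sketch below uses it to illustrate the orthonormality condition; the choice of QR is an assumption for the example and is not a statement about how menpo does it internally.

```python
import numpy as np

rng = np.random.default_rng(1)
components = rng.normal(size=(3, 5))  # 3 non-orthogonal components, 5 features

# QR of the transpose: the columns of Q form an orthonormal basis
# spanning the same space as the original components.
Q, _ = np.linalg.qr(components.T)
orthonormal = Q.T                      # back to (n_components, n_features)

# Gram matrix of pairwise dot products; should be the identity.
gram = orthonormal @ orthonormal.T
```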

plot_eigenvalues(figure_id=None, new_figure=False, render_lines=True, line_colour='b', line_style='-', line_width=2, render_markers=True, marker_style='o', marker_size=6, marker_face_colour='b', marker_edge_colour='k', marker_edge_width=1.0, render_axes=True, axes_font_name='sans-serif', axes_font_size=10, axes_font_style='normal', axes_font_weight='normal', figure_size=(10, 6), render_grid=True, grid_line_style='--', grid_line_width=0.5)[source]

Plot of the eigenvalues.

Parameters
  • figure_id (object, optional) – The id of the figure to be used.

  • new_figure (bool, optional) – If True, a new figure is created.

  • render_lines (bool, optional) – If True, the line will be rendered.

  • line_colour (See Below, optional) –

    The colour of the lines. Example options

    {``r``, ``g``, ``b``, ``c``, ``m``, ``k``, ``w``}
    or
    ``(3, )`` `ndarray`
    or
    `list` of length ``3``
    

  • line_style ({-, --, -., :}, optional) – The style of the lines.

  • line_width (float, optional) – The width of the lines.

  • render_markers (bool, optional) – If True, the markers will be rendered.

  • marker_style (See Below, optional) –

    The style of the markers. Example options

    {``.``, ``,``, ``o``, ``v``, ``^``, ``<``, ``>``, ``+``,
     ``x``, ``D``, ``d``, ``s``, ``p``, ``*``, ``h``, ``H``,
     ``1``, ``2``, ``3``, ``4``, ``8``}
    

  • marker_size (int, optional) – The size of the markers in points.

  • marker_face_colour (See Below, optional) –

    The face (filling) colour of the markers. Example options

    {``r``, ``g``, ``b``, ``c``, ``m``, ``k``, ``w``}
    or
    ``(3, )`` `ndarray`
    or
    `list` of length ``3``
    

  • marker_edge_colour (See Below, optional) –

    The edge colour of the markers. Example options

    {``r``, ``g``, ``b``, ``c``, ``m``, ``k``, ``w``}
    or
    ``(3, )`` `ndarray`
    or
    `list` of length ``3``
    

  • marker_edge_width (float, optional) – The width of the markers’ edge.

  • render_axes (bool, optional) – If True, the axes will be rendered.

  • axes_font_name (See Below, optional) –

    The font of the axes. Example options

    {``serif``, ``sans-serif``, ``cursive``, ``fantasy``,
     ``monospace``}
    

  • axes_font_size (int, optional) – The font size of the axes.

  • axes_font_style ({normal, italic, oblique}, optional) – The font style of the axes.

  • axes_font_weight (See Below, optional) –

    The font weight of the axes. Example options

    {``ultralight``, ``light``, ``normal``, ``regular``,
     ``book``, ``medium``, ``roman``, ``semibold``,
     ``demibold``, ``demi``, ``bold``, ``heavy``,
     ``extra bold``, ``black``}
    

  • figure_size ((float, float) or None, optional) – The size of the figure in inches.

  • render_grid (bool, optional) – If True, the grid will be rendered.

  • grid_line_style ({-, --, -., :}, optional) – The style of the grid lines.

  • grid_line_width (float, optional) – The width of the grid lines.

Returns

viewer (MatplotlibRenderer) – The viewer object.

plot_eigenvalues_cumulative_ratio(figure_id=None, new_figure=False, render_lines=True, line_colour='b', line_style='-', line_width=2, render_markers=True, marker_style='o', marker_size=6, marker_face_colour='b', marker_edge_colour='k', marker_edge_width=1.0, render_axes=True, axes_font_name='sans-serif', axes_font_size=10, axes_font_style='normal', axes_font_weight='normal', figure_size=(10, 6), render_grid=True, grid_line_style='--', grid_line_width=0.5)[source]

Plot of the cumulative variance ratio captured by the eigenvalues.

Parameters
  • figure_id (object, optional) – The id of the figure to be used.

  • new_figure (bool, optional) – If True, a new figure is created.

  • render_lines (bool, optional) – If True, the line will be rendered.

  • line_colour (See Below, optional) –

    The colour of the lines. Example options

    {``r``, ``g``, ``b``, ``c``, ``m``, ``k``, ``w``}
    or
    ``(3, )`` `ndarray`
    or
    `list` of length ``3``
    

  • line_style ({-, --, -., :}, optional) – The style of the lines.

  • line_width (float, optional) – The width of the lines.

  • render_markers (bool, optional) – If True, the markers will be rendered.

  • marker_style (See Below, optional) –

    The style of the markers. Example options

    {``.``, ``,``, ``o``, ``v``, ``^``, ``<``, ``>``, ``+``,
     ``x``, ``D``, ``d``, ``s``, ``p``, ``*``, ``h``, ``H``,
     ``1``, ``2``, ``3``, ``4``, ``8``}
    

  • marker_size (int, optional) – The size of the markers in points.

  • marker_face_colour (See Below, optional) –

    The face (filling) colour of the markers. Example options

    {``r``, ``g``, ``b``, ``c``, ``m``, ``k``, ``w``}
    or
    ``(3, )`` `ndarray`
    or
    `list` of length ``3``
    

  • marker_edge_colour (See Below, optional) –

    The edge colour of the markers. Example options

    {``r``, ``g``, ``b``, ``c``, ``m``, ``k``, ``w``}
    or
    ``(3, )`` `ndarray`
    or
    `list` of length ``3``
    

  • marker_edge_width (float, optional) – The width of the markers’ edge.

  • render_axes (bool, optional) – If True, the axes will be rendered.

  • axes_font_name (See Below, optional) –

    The font of the axes. Example options

    {``serif``, ``sans-serif``, ``cursive``, ``fantasy``,
     ``monospace``}
    

  • axes_font_size (int, optional) – The font size of the axes.

  • axes_font_style ({normal, italic, oblique}, optional) – The font style of the axes.

  • axes_font_weight (See Below, optional) –

    The font weight of the axes. Example options

    {``ultralight``, ``light``, ``normal``, ``regular``,
     ``book``, ``medium``, ``roman``, ``semibold``,
     ``demibold``, ``demi``, ``bold``, ``heavy``,
     ``extra bold``, ``black``}
    

  • figure_size ((float, float) or None, optional) – The size of the figure in inches.

  • render_grid (bool, optional) – If True, the grid will be rendered.

  • grid_line_style ({-, --, -., :}, optional) – The style of the grid lines.

  • grid_line_width (float, optional) – The width of the grid lines.

Returns

viewer (MatplotlibRenderer) – The viewer object.

plot_eigenvalues_ratio(figure_id=None, new_figure=False, render_lines=True, line_colour='b', line_style='-', line_width=2, render_markers=True, marker_style='o', marker_size=6, marker_face_colour='b', marker_edge_colour='k', marker_edge_width=1.0, render_axes=True, axes_font_name='sans-serif', axes_font_size=10, axes_font_style='normal', axes_font_weight='normal', figure_size=(10, 6), render_grid=True, grid_line_style='--', grid_line_width=0.5)[source]

Plot of the variance ratio captured by the eigenvalues.

Parameters
  • figure_id (object, optional) – The id of the figure to be used.

  • new_figure (bool, optional) – If True, a new figure is created.

  • render_lines (bool, optional) – If True, the line will be rendered.

  • line_colour (See Below, optional) –

    The colour of the lines. Example options

    {``r``, ``g``, ``b``, ``c``, ``m``, ``k``, ``w``}
    or
    ``(3, )`` `ndarray`
    or
    `list` of length ``3``
    

  • line_style ({-, --, -., :}, optional) – The style of the lines.

  • line_width (float, optional) – The width of the lines.

  • render_markers (bool, optional) – If True, the markers will be rendered.

  • marker_style (See Below, optional) –

    The style of the markers. Example options

    {``.``, ``,``, ``o``, ``v``, ``^``, ``<``, ``>``, ``+``,
     ``x``, ``D``, ``d``, ``s``, ``p``, ``*``, ``h``, ``H``,
     ``1``, ``2``, ``3``, ``4``, ``8``}
    

  • marker_size (int, optional) – The size of the markers in points.

  • marker_face_colour (See Below, optional) –

    The face (filling) colour of the markers. Example options

    {``r``, ``g``, ``b``, ``c``, ``m``, ``k``, ``w``}
    or
    ``(3, )`` `ndarray`
    or
    `list` of length ``3``
    

  • marker_edge_colour (See Below, optional) –

    The edge colour of the markers. Example options

    {``r``, ``g``, ``b``, ``c``, ``m``, ``k``, ``w``}
    or
    ``(3, )`` `ndarray`
    or
    `list` of length ``3``
    

  • marker_edge_width (float, optional) – The width of the markers’ edge.

  • render_axes (bool, optional) – If True, the axes will be rendered.

  • axes_font_name (See Below, optional) –

    The font of the axes. Example options

    {``serif``, ``sans-serif``, ``cursive``, ``fantasy``,
     ``monospace``}
    

  • axes_font_size (int, optional) – The font size of the axes.

  • axes_font_style ({normal, italic, oblique}, optional) – The font style of the axes.

  • axes_font_weight (See Below, optional) –

    The font weight of the axes. Example options

    {``ultralight``, ``light``, ``normal``, ``regular``,
     ``book``, ``medium``, ``roman``, ``semibold``,
     ``demibold``, ``demi``, ``bold``, ``heavy``,
     ``extra bold``, ``black``}
    

  • figure_size ((float, float) or None, optional) – The size of the figure in inches.

  • render_grid (bool, optional) – If True, the grid will be rendered.

  • grid_line_style ({-, --, -., :}, optional) – The style of the grid lines.

  • grid_line_width (float, optional) – The width of the grid lines.

Returns

viewer (MatplotlibRenderer) – The viewer object.

project(vector)

Projects the vector onto the model, retrieving the optimal linear reconstruction weights.

Parameters

vector ((n_features,) ndarray) – A vectorized novel instance.

Returns

weights ((n_components,) ndarray) – A vector of optimal linear weights.
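For orthonormal components, the optimal linear weights are the dot products of the mean-subtracted vector with each component. A minimal sketch with toy arrays, where the standalone project function is hypothetical:

```python
import numpy as np

mean = np.array([1.0, 1.0, 1.0])
components = np.array([[1.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0]])  # orthonormal rows

def project(vector):
    """Least-squares weights of a vector on the orthonormal components."""
    return components @ (vector - mean)

weights = project(np.array([3.0, 0.0, 5.0]))
```

The third feature is outside the span of the two components, so it does not contribute to the weights.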

project_out(vector)

Returns a version of vector where all the bases of the model have been projected out.

Parameters

vector ((n_features,) ndarray) – A novel vector.

Returns

projected_out ((n_features,) ndarray) – A copy of vector with all bases of the model projected out.

project_out_vectors(vectors)

Returns a version of vectors where all the bases of the model have been projected out.

Parameters

vectors ((n_vectors, n_features) ndarray) – A matrix of novel vectors.

Returns

projected_out ((n_vectors, n_features) ndarray) – A copy of vectors with all bases of the model projected out.
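Projecting out the bases removes the part of each vector that lies in the span of the components, leaving the orthogonal residual. The sketch below assumes orthonormal components and, for simplicity, ignores any mean handling; the function is a hypothetical stand-in.

```python
import numpy as np

components = np.array([[1.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0]])  # orthonormal rows

def project_out_vectors(vectors):
    """Remove the component of each row that lies in the model subspace."""
    weights = vectors @ components.T
    return vectors - weights @ components

vectors = np.array([[3.0, 4.0, 5.0],
                    [1.0, 0.0, -2.0]])
residual = project_out_vectors(vectors)
```

Projecting the residual back onto the components gives zero weights, which is a quick way to check the result.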

project_vectors(vectors)

Projects each of the vectors onto the model, retrieving the optimal linear reconstruction weights for each instance.

Parameters

vectors ((n_samples, n_features) ndarray) – Array of vectorized novel instances.

Returns

projected ((n_samples, n_components) ndarray) – The matrix of optimal linear weights.

project_whitened(vector_instance)[source]

Projects the vector_instance onto the whitened components, retrieving the whitened linear weightings.

Parameters

vector_instance ((n_features,) ndarray) – A novel vector.

Returns

projected ((n_features,) ndarray) – A vector of whitened linear weightings
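In the textbook formulation, whitening divides each component by the square root of its eigenvalue, so that projected training data has unit variance along every component. The sketch below shows that formulation only; menpo's exact scaling (for example, any noise variance term) may differ.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 3)) * np.array([5.0, 2.0, 0.5])
mean = X.mean(axis=0)
Xc = X - mean

# PCA via eigendecomposition of the sample covariance matrix.
eigenvalues, eigenvectors = np.linalg.eigh(Xc.T @ Xc / (len(X) - 1))
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
components = eigenvectors[:, order].T

# Textbook whitening: scale each component by 1 / sqrt(eigenvalue).
whitened_components = components / np.sqrt(eigenvalues)[:, None]

# Whitened weights of the training data have unit variance per component.
whitened_weights = Xc @ whitened_components.T
variances = whitened_weights.var(axis=0, ddof=1)
```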

reconstruct(vector)

Project a vector onto the linear space and rebuild from the weights found.

Parameters

vector ((n_features, ) ndarray) – A vectorized novel instance to project.

Returns

reconstructed ((n_features,) ndarray) – The reconstructed vector.

reconstruct_vectors(vectors)

Projects the vectors onto the linear space and rebuilds vectors from the weights found.

Parameters

vectors ((n_vectors, n_features) ndarray) – A set of vectors to project.

Returns

reconstructed ((n_vectors, n_features) ndarray) – The reconstructed vectors.
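Reconstruction is projection followed by instance generation. A minimal sketch with toy arrays, assuming orthonormal components; the standalone function is hypothetical.

```python
import numpy as np

mean = np.array([1.0, 1.0, 1.0])
components = np.array([[1.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0]])  # orthonormal rows

def reconstruct_vectors(vectors):
    """Project onto the model, then rebuild from the weights."""
    weights = (vectors - mean) @ components.T
    return mean + weights @ components

vectors = np.array([[3.0, 0.0, 5.0]])
reconstructed = reconstruct_vectors(vectors)
again = reconstruct_vectors(reconstructed)
```

Features outside the model subspace fall back to the mean, and reconstructing an already reconstructed vector changes nothing.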

trim_components(n_components=None)[source]

Permanently trims the components down to a given number. The number of active components will be automatically reset to this value.

This will reduce self.n_components down to n_components (if None, self.n_active_components will be used), freeing up memory in the process.

Once the model is trimmed, the trimmed components cannot be recovered.

Parameters

n_components (int >= 1 or float > 0.0 or None, optional) – The number of components that are kept or else the amount (ratio) of variance that is kept. If None, self.n_active_components is used.

Notes

In case n_components is greater than the total number of components or greater than the amount of variance currently kept, this method does not perform any action.
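The int-or-float convention for n_components can be sketched as follows. This is an assumption-laden illustration: the helper n_components_to_keep is hypothetical, and it measures the variance ratio against the sum of the eigenvalues passed in, which may not match the reference total menpo uses.

```python
import numpy as np

def n_components_to_keep(eigenvalues, n_components):
    """Resolve an int count or a float variance ratio to a component count."""
    eigenvalues = np.asarray(eigenvalues, dtype=float)
    total = eigenvalues.shape[0]
    if isinstance(n_components, float):
        # Smallest k whose cumulative variance ratio reaches the target.
        cumulative = np.cumsum(eigenvalues) / eigenvalues.sum()
        k = int(np.searchsorted(cumulative, n_components)) + 1
    else:
        k = int(n_components)
    # Asking for more than is available leaves the model unchanged.
    return min(k, total)

eigenvalues = [6.0, 3.0, 1.0]
by_count = n_components_to_keep(eigenvalues, 2)
by_ratio = n_components_to_keep(eigenvalues, 0.8)
too_many = n_components_to_keep(eigenvalues, 10)
```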

variance()[source]

Returns the total amount of variance retained by the active components.

Returns

variance (float) – Total variance captured by the active components.

variance_ratio()[source]

Returns the ratio between the amount of variance retained by the active components and the total amount of variance present in the original samples.

Returns

variance_ratio (float) – Ratio of active components variance and total variance present in original samples.

whitened_components()[source]

Returns the active components of the model, whitened.

Returns

whitened_components ((n_active_components, n_features) ndarray) – The whitened components.

property components

Returns the active components of the model.

Type

(n_active_components, n_features) ndarray

property eigenvalues

Returns the eigenvalues associated with the active components of the model, i.e. the amount of variance captured by each active component, sorted from largest to smallest.

Type

(n_active_components,) ndarray

property n_active_components

The number of components currently in use on this model.

Type

int

property n_components

The number of bases of the model.

Type

int

property n_features

The number of elements in each linear component.

Type

int