Classification

This module provides functions to train image classification models, such as Logistic Regressors and Conditional Random Fields. Its feature extraction functions are largely based on the scikit-image toolbox, and its model training and optimization functions are largely based on the pystruct and scikit-learn toolboxes.

Models

class flamingo.classification.models.ConditionalRandomField(model, max_iter=10000, C=1.0, check_constraints=False, verbose=0, negativity_constraint=None, n_jobs=1, break_on_bad=False, show_loss_every=0, tol=0.001, inference_cache=0, inactive_threshold=1e-05, inactive_window=50, logger=None, cache_tol='auto', switch_to=None, clist=None)[source]

Conditional Random Field

Equal to pystruct.learners.OneSlackSSVM (inherited), but also takes string class labels as input.

Parameters: clist (list) – List with possible class names

Notes

Only arguments additional to pystruct.learners.OneSlackSSVM are listed.

class flamingo.classification.models.ConditionalRandomFieldPerceptron(model, max_iter=100, verbose=0, batch=False, decay_exponent=0.0, decay_t0=0.0, average=False, clist=None)[source]

Conditional Random Field

Equal to pystruct.learners.StructuredPerceptron (inherited), but also takes string class labels as input.

Parameters: clist (list) – List with possible class names

Notes

Only arguments additional to pystruct.learners.StructuredPerceptron are listed.

class flamingo.classification.models.LogisticRegression(penalty='l2', dual=False, tol=0.0001, C=1.0, fit_intercept=True, intercept_scaling=1, class_weight=None, random_state=None)[source]

Logistic Regressor

Equal to sklearn.linear_model.LogisticRegression (inherited), but also takes non-linearized structured data as input.

class flamingo.classification.models.LogisticRegressionRLP(penalty='l2', dual=False, tol=0.0001, C=1.0, fit_intercept=True, intercept_scaling=1, class_weight=None, random_state=None, rlp_maps=None, rlp_stats=None)[source]

Logistic Regressor with support for Relative Location Priors

Equal to flamingo.classification.models.LogisticRegression (inherited), but supports the use of Relative Location Priors as proposed by [Gould2008].

Parameters:
  • rlp_maps (dict) – Dictionary with for each class an np.ndarray with a relative location map
  • rlp_stats (dict) – Dictionary with for each relative location feature a mean and standard deviation for normalizing to standard normal space

Notes

Only arguments additional to flamingo.classification.models.LogisticRegression are listed.
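
As a rough illustration of the normalization that rlp_stats describes, the sketch below rescales a relative location feature to zero mean and unit standard deviation. The dictionary layout (keys 'mean' and 'std' per feature name), the feature name 'sand' and the helper normalize_feature are assumptions made for this example; they are not taken from the library.

```python
def normalize_feature(values, stats):
    """Rescale values to standard normal space using a precomputed mean/std."""
    mean, std = stats['mean'], stats['std']
    if std == 0:
        # a constant feature carries no information; map it to zero
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


# hypothetical statistics for a single relative location feature
rlp_stats = {'sand': {'mean': 2.0, 'std': 0.5}}
normalized = normalize_feature([1.5, 2.0, 3.0], rlp_stats['sand'])
print(normalized)  # [-1.0, 0.0, 2.0]
```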

[Gould2008] Stephen Gould, Jim Rodgers, David Cohen, Gal Elidan, Daphne Koller (2008). Multi-Class Segmentation with Relative Location Prior. International Journal of Computer Vision. doi:10.1007/s11263-008-0140-x

class flamingo.classification.models.SupportVectorMachine(penalty='l2', loss='l2', dual=True, tol=0.0001, C=1.0, multi_class='ovr', fit_intercept=True, intercept_scaling=1, class_weight=None, verbose=0, random_state=None)[source]

Support Vector Machine

Equal to sklearn.svm.LinearSVC (inherited), but also takes non-linearized structured data as input.

flamingo.classification.models.get_model(model_type='LR', n_states=None, n_features=None, rlp_maps=None, rlp_stats=None, C=1.0)[source]

Returns a bare model object

Parameters:

model_type (string, optional) – String indicating the type of model to be constructed. LR = Logistic Regressor (default), LR_RLP = Logistic Regressor with Relative Location Prior, SVM = Support Vector Machine, CRF = Conditional Random Field

Returns:

Bare model object

Return type:

object

Other Parameters:
 
  • n_states (integer) – Number of classes (CRF only)
  • n_features (integer) – Number of features (CRF only)
flamingo.classification.models.predict_model(model, X)[source]

Run class prediction of image with a single model

Parameters:
  • model (object) – Trained model object. Model object should have a score() method.
  • X (list or numpy.ndarray) – 2D array containing training data. Each row is a training instance, while each column is a feature.
Returns:

Class prediction for image

Return type:

np.ndarray

flamingo.classification.models.predict_models(models, sets)[source]

Run class predictions for a set of trained models and corresponding data

Parameters:
  • models (list) – List of lists with each item a trained instance of a model.
  • sets (list) – List of tuples containing data corresponding to the model list.
Returns:

List of lists containing np.ndarrays with class predictions for each image and each model.

Return type:

list

Notes

Models should be trained. Model and set lists should be of equal length. In case of N models and M training sets the models should be organized in a N-length list of M-length lists.
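
The N-by-M organization described in the notes can be sketched as follows. This is a minimal stand-alone illustration, not the library's implementation: ConstantModel and predict_grid are hypothetical names, and the assumption that the first item of each set tuple holds the feature data is made for this example only.

```python
class ConstantModel:
    """Hypothetical stand-in for a trained model: always predicts one label."""

    def __init__(self, label):
        self.label = label

    def predict(self, X):
        return [self.label for _ in X]


def predict_grid(models, sets):
    """Apply models[i][j] to the data in sets[j] for every model i and set j."""
    predictions = []
    for row in models:
        if len(row) != len(sets):
            raise ValueError('each model row needs one trained model per set')
        predictions.append([m.predict(X) for m, (X, _) in zip(row, sets)])
    return predictions


# N = 2 model types, M = 3 data sets of (X, Y) tuples
sets = [([[0.1], [0.2]], ['a', 'b']),
        ([[0.3]], ['a']),
        ([[0.4], [0.5], [0.6]], ['b', 'b', 'a'])]
models = [[ConstantModel('a') for _ in sets],
          [ConstantModel('b') for _ in sets]]

predictions = predict_grid(models, sets)
print(len(predictions), [len(p) for p in predictions[0]])  # 2 [2, 1, 3]
```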

flamingo.classification.models.score_model(model, X_train, Y_train, X_test, Y_test)[source]

Scores a single model using a train and test set

Parameters:
  • model (object) – Trained model object. Model object should have a score() method.
  • X_train (list or numpy.ndarray) – 2D array containing training data. Each row is a training instance, while each column is a feature.
  • Y_train (list or numpy.ndarray) – Array containing class annotations for each training instance.
  • X_test, Y_test (list or numpy.ndarray) – Similar to X_train and Y_train, but with test data.
Returns:

  • score_train (float) – Training score
  • score_test (float) – Test score

flamingo.classification.models.score_models(models, train_sets, test_sets, **kwargs)[source]

Compute train/test scores for a set of trained models

Parameters:
  • models (list) – List of lists with each item a trained instance of a model.
  • train_sets (list) – List of tuples containing training data corresponding to the model list.
  • test_sets (list) – List of tuples containing test data corresponding to the model list.
  • **kwargs

    Additional arguments passed to the scoring function

Returns:

MultiIndex DataFrame containing training and test scores. Indices “model” and “set” indicate the model and training set number used. Columns “train” and “test” contain the train and test scores respectively.

Return type:

pandas.DataFrame

Notes

Models should be trained. Model and set lists should be of equal length. In case of N models and M training sets the models should be organized in a N-length list of M-length lists. The train and test sets should both be M-length lists.

Examples

>>> from flamingo.classification import models
>>> model_list = [models.get_model(model_type='LR'),
...               models.get_model(model_type='CRF', n_states=5, n_features=10)]
>>> models_trained = models.train_models(model_list, [(X_train, Y_train)])
>>> scores = models.score_models(models_trained, [(X_train, Y_train)], [(X_test, Y_test)])

flamingo.classification.models.train_model(model, X_train, Y_train, X_train_prior=None)[source]

Trains a single model against a single training set

Parameters:
  • model (object) – Bare model object. Model object should have a fit() method.
  • X_train (list or numpy.ndarray) – 2D array containing training data. Each row is a training instance, while each column is a feature.
  • Y_train (list or numpy.ndarray) – Array containing class annotations for each training instance.
  • X_train_prior (list or numpy.ndarray, optional) – 2D array containing prior data. Each row is a training instance, while each column is a feature.

Notes

Models are passed by reference and trained without copying.

flamingo.classification.models.train_models(models, train_sets, prior_sets=None, callback=None)[source]

Trains a set of models against a series of training sets

Parameters:
  • models (list) – List of model objects. Model objects should have a fit() method.
  • train_sets (list) – List of tuples containing training data. The first item in a tuple is a 2D array. Each row is a training instance, while each column is a feature. The second item in a tuple is an array containing class annotations for each training instance.
  • prior_sets (list, optional) – List of 2D arrays containing prior data. Similar to first tuple item in train_sets. Each item is a 2D array. Each row is a training instance, while each column is a feature.
  • callback (function, optional) – Callback function that is called after training of a model finished. Function accepts two parameters: the model object and a tuple with location indices in the resulting model matrix.
Returns:

List of lists with each item a trained instance of one of the models.

Return type:

list
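
The relation between the model list, the training sets and the callback can be sketched as below. MajorityModel and train_grid are hypothetical names used only for this illustration, and copying each model before fitting is an assumption; the library may handle model instances differently.

```python
import copy


class MajorityModel:
    """Hypothetical model with a fit() method: remembers the most common label."""

    def __init__(self):
        self.label = None

    def fit(self, X, Y):
        self.label = max(sorted(set(Y)), key=list(Y).count)
        return self


def train_grid(models, train_sets, callback=None):
    """Train a copy of every model on every training set (N x M result)."""
    trained = []
    for i, model in enumerate(models):
        row = []
        for j, (X, Y) in enumerate(train_sets):
            fitted = copy.deepcopy(model).fit(X, Y)  # copying is an assumption
            if callback is not None:
                # second argument: location in the resulting model matrix
                callback(fitted, (i, j))
            row.append(fitted)
        trained.append(row)
    return trained


finished = []
train_sets = [([[0], [1], [2]], ['sand', 'sand', 'sea']),
              ([[3], [4]], ['sky', 'sky'])]
trained = train_grid([MajorityModel()], train_sets,
                     callback=lambda model, idx: finished.append(idx))
print(finished, [m.label for m in trained[0]])
```

Here one model and two training sets yield a 1-by-2 matrix of trained models, and the callback is invoked once per cell.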

Features

flamingo.classification.features.features.linearize(features)[source]

Convert all items in each matrix feature into individual features

Blocks

flamingo.classification.features.blocks.extract_blocks(data, segments, colorspace='rgb', blocks=None, blocks_params={})[source]

Extract all blocks in right order

flamingo.classification.features.blocks.list_blocks()[source]

List all block extraction functions in module

Scale invariant features

Normalizing features

Relative location prior

flamingo.classification.features.relativelocation.compute_prior(annotations, centroids, image_size, superpixel_grid, n=100)[source]

Compute relative location prior according to Gould et al. (2008)

Parameters: ds (string) – String indicating the dataset to be used.
Returns: maps – 4D panel containing the relative location prior maps: maps[<other class>][<given class>] gives an n*n dataframe representing the dimensionless image map
Return type: pandas.Panel4D
Other Parameters:
 n (integer) – Half the size of the dimensionless image map
flamingo.classification.features.relativelocation.smooth_maps(maps, sigma=2)[source]

Convolve relative location prior maps with a Gaussian filter for smoothing purposes

Parameters:
  • ds (string) – String indicating the dataset to be used.
  • maps (pandas.Panel4D) – 4D panel containing the relative location prior maps: maps[<other class>][<given class>] gives an n*n dataframe representing the dimensionless image map
Returns:

maps – 4D panel containing the smoothed relative location prior maps.

Return type:

pandas.Panel4D

Other Parameters:
 sigma (integer) – Size of the Gaussian kernel that is to be convolved with the relative location prior maps

flamingo.classification.features.relativelocation.vote_image(Y, maps, centroids=None, img_size=None, winner_takes_all_mode=False)[source]

Class voting based on 1st order prediction and relative location prior maps

Parameters:
  • ds (string) – String indicating the dataset to be used.
  • Ipred (list of lists of tuple of lists of arrays with size [n_models][n_partitions](training,testing)[n_images]) – Arrays contain the 1st order prediction of the labelled images.
Returns:

  • votes (pandas.Panel) – Panel containing the votes for all classes and superpixels: votes[<class>] gives an nx*ny dataframe representing the probability of every superpixel to be <class>
  • Ivote (np.array) – Labelled image based on classes in votes with maximum probability for every superpixel

Channels

flamingo.classification.channels.add_channels(img, colorspace='rgb', methods=['gabor', 'gaussian', 'sobel'], methods_params=None)[source]

Add artificial channels to an image

Parameters:
  • img (np.ndarray) – NxMx3 array with colored image data
  • colorspace (str, optional) – String indicating colorspace of img (rgb/hsv/etc.)
  • methods (list, optional) – List of strings indicating channels to be added
  • methods_params (dict, optional) – Dictionary with named options for channel functions

Notes

Currently implemented channels are:
  • gabor, with options frequencies and thetas
  • gaussian, with option sigmas
  • sobel, without any options

Returns: NxMx(3+P) array with image data with extra channels, where P is the number of channels added
Return type: np.ndarray

flamingo.classification.channels.get_channel_bounds(methods=['gabor', 'gaussian', 'sobel'], methods_params=None)[source]

Get theoretical bounds of channel values

Parameters:
  • methods (list, optional) – List of strings indicating channels to be added
  • methods_params (dict, optional) – Dictionary with named options for channel functions
Returns:

List of dicts with keys min and max indicating the theoretical boundaries of the channel values

Return type:

list

flamingo.classification.channels.get_number_channels(methods=['gabor', 'gaussian', 'sobel'], methods_params=None)[source]

Get number of artificial channels

Parameters:
  • methods (list, optional) – List of strings indicating channels to be added
  • methods_params (dict, optional) – Dictionary with named options for channel functions
Returns:

Number of channels added when using the specified settings

Return type:

int

flamingo.classification.channels.normalize_channel(channel, channelstats)[source]

Scale channel to uint8 based on maximum possible filter response

Parameters:
  • channel (np.ndarray) – Array with channel values
  • channelstats (dict) – Dictionary with fields min and max indicating the channel value bounds of the dataset and used for normalization
Returns:

Array with normalized channel values

Return type:

np.ndarray
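
A minimal sketch of min/max scaling to the uint8 range is given below, using plain nested lists instead of np.ndarray so that it runs without dependencies. The helper name scale_to_uint8, the clipping step and the rounding rule are assumptions for this illustration; the library's exact behavior may differ.

```python
def scale_to_uint8(channel, channelstats):
    """Map values from [min, max] onto the integer range 0..255."""
    lo, hi = channelstats['min'], channelstats['max']
    if hi <= lo:
        raise ValueError('max must be larger than min')
    scaled = []
    for row in channel:
        # clip to the bounds first so that outliers cannot overflow
        clipped = [min(max(v, lo), hi) for v in row]
        scaled.append([int(round(255 * (v - lo) / (hi - lo))) for v in clipped])
    return scaled


channel = [[-1.0, 0.0], [0.5, 1.0], [-3.0, 7.0]]
result = scale_to_uint8(channel, {'min': -1.0, 'max': 1.0})
print(result)
```

The lower bound maps to 0 and the upper bound to 255; values outside the bounds are clipped to those extremes.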

Test

flamingo.classification.test.aggregate_scores(scores)[source]

Aggregate model scores over training and test sets

Parameters: scores (pandas.DataFrame) – DataFrame with test scores for different models and training sets. Should have at least one level index named “model”.
Returns: DataFrame averaged over all indices except “model”.
Return type: pandas.DataFrame

flamingo.classification.test.compute_confusion_matrix(models, test_sets)[source]

Computes confusion matrix for combinations of models and training/test sets

Parameters:
  • models (list) – List of model objects. Model objects should have a fit() and score() method.
  • test_sets (list) – List of tuples containing test data corresponding to the model list.
Returns:

  • mtxs (list) – List of lists with np.ndarrays that contain confusion matrices for each combination of test set and model
  • classes (list) – List with all unique classes in test sets and the axis labels of the confusion matrices

flamingo.classification.test.compute_learning_curve(models, train_sets, test_sets, step=10, **kwargs)[source]

Computes learning curves for combinations of models and training/test sets

Parameters:
  • models (list) – List of model objects. Model objects should have a fit() and score() method.
  • train_sets (list) – List of tuples containing training data corresponding to the model list.
  • test_sets (list) – List of tuples containing test data corresponding to the model list.
  • step (integer, optional) – Step size of learning curve (default: 10)
  • **kwargs

    All other named arguments are redirected to the function models.train_models()

Returns:

  • all_scores (pandas.DataFrame) – MultiIndex DataFrame containing training and test scores. Indices “model” and “set” indicate the model and training set number used. Index “n” indicates the number of samples used during training. Columns “train” and “test” contain the train and test scores respectively.
  • all_models (list) – List with trained models. Each item corresponds to a single point on the learning curve and can consist of several models organized in a NxM matrix where N is the original number of models trained and M is the number of training sets used.
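
The idea of a learning curve with a given step size can be sketched for one model and one train/test pair as below. Training on the first n samples, scoring the training data on that same subset, and the names MajorityModel and learning_curve are all assumptions for this illustration rather than a description of the library's procedure.

```python
class MajorityModel:
    """Hypothetical model exposing fit() and score()."""

    def fit(self, X, Y):
        self.label = max(sorted(set(Y)), key=list(Y).count)
        return self

    def score(self, X, Y):
        return sum(1 for y in Y if y == self.label) / len(Y)


def learning_curve(model, train_set, test_set, step=10):
    """Train on the first n samples for n = step, 2*step, ... and record scores."""
    X_train, Y_train = train_set
    X_test, Y_test = test_set
    scores = []
    for n in range(step, len(X_train) + 1, step):
        model.fit(X_train[:n], Y_train[:n])
        scores.append({'n': n,
                       'train': model.score(X_train[:n], Y_train[:n]),
                       'test': model.score(X_test, Y_test)})
    return scores


X_train = [[i] for i in range(6)]
Y_train = ['a', 'a', 'b', 'b', 'b', 'b']
curve = learning_curve(MajorityModel(), (X_train, Y_train),
                       ([[9], [9]], ['b', 'b']), step=2)
for point in curve:
    print(point)
```

Each dictionary corresponds to one point on the curve, keyed by the number of training samples n.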

Visualization

flamingo.classification.plot.plot_learning_curve(scores, ylim=(0.75, 1), filename=None)[source]

Plots learning curves

Parameters:
  • scores (pandas.DataFrame) – DataFrame containing all scores used to plot one or more learning curves. Should at least have the index “n” indicating the number of training samples used.
  • ylim (2-tuple, optional) – Vertical axis limit for learning curve plots.
  • filename (string, optional) – If given, plots are saved to indicated file path.
Returns:

  • list – List with figure handles for all plots
  • list – List with axes handles for all plots

flamingo.classification.plot.save_figure(fig, filename, ext='', figsize=None, dpi=30, **kwargs)[source]

Save figure to file

Parameters:
  • fig (object) – Figure object
  • filename (string) – Path to output file
  • ext (string, optional) – String to be added to the filename before the file extension

Notes

Saves a Matplotlib figure as an image without borders or frames. The source docstring additionally lists the following arguments:

  • fileName (str) – String that ends in .png etc.
  • fig (Matplotlib figure instance) – Figure you want to save as the image
  • orig_size (tuple, keyword argument) – Width and height of the original image, used to maintain aspect ratio

Utils

flamingo.classification.utils.aggregate_classes(Y, aggregation=None)[source]

Aggregate class labels into a subsection of class labels

Replaces all class labels in Y with substitutes from the dictionary aggregation.

Parameters:
  • Y (tuple, list or np.ndarray) – Array containing class labels
  • aggregation (dict, optional) – Dictionary containing class replacements, where each key is the replacement value for all classes in the corresponding list
Returns:

Aggregated class labels

Return type:

np.ndarray
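
The replacement scheme can be illustrated with a short sketch in which each dictionary key replaces all labels in its list. The helper name aggregate and the example class names are made up for this illustration, and plain lists are used instead of np.ndarray.

```python
def aggregate(Y, aggregation=None):
    """Replace every label listed under a key of `aggregation` by that key."""
    if not aggregation:
        return list(Y)
    # invert {replacement: [originals]} into {original: replacement}
    lookup = {old: new for new, olds in aggregation.items() for old in olds}
    return [lookup.get(label, label) for label in Y]


Y = ['dune', 'beach', 'sea', 'sky', 'wet sand']
aggregation = {'sand': ['dune', 'beach', 'wet sand'], 'water': ['sea']}
aggregated = aggregate(Y, aggregation)
print(aggregated)  # ['sand', 'sand', 'water', 'sky', 'sand']
```

Labels that do not occur in any list, such as 'sky' here, are left unchanged in this sketch.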

flamingo.classification.utils.check_sets(train_sets, test_sets, models=None)[source]

Checks if train sets, test sets and models have matching dimensions

Parameters:
  • train_sets (list) – List of tuples containing training data corresponding to the model list.
  • test_sets (list) – List of tuples containing test data corresponding to the model list.
  • models (list) – List of lists with each item a trained instance of a model.
Raises:

ValueError

flamingo.classification.utils.delinearize_data(Y, X)[source]

De-linearizes structured label data

Transforms linearized label data suitable for use with LR and SVM into structured data for use with CRF and SSVM.

Parameters:
  • Y (list) – List with np.ndarrays for each image with label data in one dimension (u*v)
  • X (list) – List with np.ndarray for each image with feature data in two dimensions (u*v and feature number)
Returns:

Delinearized Y data

Return type:

list

flamingo.classification.utils.get_classes(Y)[source]

Get list of unique classes in Y

Returns a list of unique classes in Y with all None values removed and regardless of the shape and type of Y.

Parameters: Y (list or np.ndarray) – List with np.ndarrays or np.ndarray with class labels
Returns: Array with unique class labels in Y not being None
Return type: np.ndarray
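
Collecting unique labels regardless of nesting can be sketched with plain Python containers as follows. The helper unique_classes is hypothetical and returns a sorted list; the ordering and return type of the library function are not implied by it.

```python
def unique_classes(Y):
    """Collect the unique non-None labels from arbitrarily nested lists."""
    found = set()
    stack = [Y]
    while stack:
        item = stack.pop()
        if isinstance(item, (list, tuple)):
            stack.extend(item)  # descend into nested containers
        elif item is not None:
            found.add(item)
    return sorted(found)


Y = [[['sand', None], ['sea', 'sand']], ['sky', None]]
classes = unique_classes(Y)
print(classes)  # ['sand', 'sea', 'sky']
```
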
flamingo.classification.utils.int2labels(Y, classes=None)[source]

Transforms class numbers into string class labels

Parameters:
  • Y (list) – List with np.ndarrays with class numbers
  • classes (list, optional) – List with unique class labels possibly in Y
Returns:

Array with class labels rather than numbers

Return type:

np.ndarray

flamingo.classification.utils.labels2image(Y, seg, classes=None)[source]

Transforms class labels and segmentation into class image

Parameters:
  • Y (list) – List with np.ndarrays with class labels
  • seg (np.ndarray) – MxN array with superpixel numbers
  • classes (list, optional) – List with unique class labels possibly in Y
Returns:

Unnormalized single-channel image of class assignments

Return type:

np.ndarray
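
How per-superpixel labels and a segmentation combine into a class image can be sketched as below. For simplicity the example assumes that Y is a flat list indexed by superpixel number and that classes are encoded by their position in the classes list; both are assumptions, as is the helper name labels_to_image.

```python
def labels_to_image(Y, seg, classes):
    """Paint every pixel with the class index of the superpixel it belongs to."""
    index = {c: i for i, c in enumerate(classes)}
    return [[index[Y[superpixel]] for superpixel in row] for row in seg]


seg = [[0, 0, 1],
       [2, 2, 1]]            # 2 x 3 image divided into superpixels 0, 1 and 2
Y = ['sky', 'sea', 'sand']   # label for superpixels 0, 1 and 2
classes = ['sand', 'sea', 'sky']
image = labels_to_image(Y, seg, classes)
print(image)  # [[2, 2, 1], [0, 0, 1]]
```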

flamingo.classification.utils.labels2int(Y, classes=None)[source]

Transforms string class labels into numbers

Parameters:
  • Y (list) – List with np.ndarrays with class labels
  • classes (list, optional) – List with unique class labels possibly in Y
Returns:

Array with class numbers rather than labels

Return type:

np.ndarray
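
The mapping between string labels and class numbers, and its inverse as performed by int2labels, can be sketched with two small helpers. Using the position in the classes list as the class number is an assumption of this example, and to_int and to_labels are hypothetical names.

```python
def to_int(Y, classes):
    """Replace each label by its position in `classes`."""
    index = {c: i for i, c in enumerate(classes)}
    return [[index[label] for label in y] for y in Y]


def to_labels(Y, classes):
    """Replace each class number by the label at that position in `classes`."""
    return [[classes[number] for number in y] for y in Y]


classes = ['sand', 'sea', 'sky']
Y = [['sky', 'sea'], ['sand', 'sand', 'sky']]  # one list per image
numbers = to_int(Y, classes)
print(numbers)                      # [[2, 1], [0, 0, 2]]
print(to_labels(numbers, classes))  # round trip gives back Y
```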

flamingo.classification.utils.linearize_data(X=None, Y=None)[source]

Linearizes structured data

Transforms structured data suitable for use with CRF and SSVM into non-structured data with a single dimension suitable for use with LR or SVM.

Parameters:
  • X (list, optional) – List with np.ndarray for each image with feature data in three dimensions (u, v and feature number)
  • Y (list, optional) – List with np.ndarrays for each image with label data in two dimensions (u and v)
Returns:

Either linearized X, linearized Y or both are returned depending on the input

Return type:

np.ndarray or 2-tuple
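
The flattening step can be sketched with nested lists as follows: every (u, v) position of every image becomes one row of features and one label. Concatenating all images into a single sequence is an assumption of this sketch, and linearize is a stand-alone illustration rather than the library function.

```python
def linearize(X=None, Y=None):
    """Flatten per-image grids into one long list of instances."""
    X_lin = Y_lin = None
    if X is not None:
        X_lin = [features for image in X for row in image for features in row]
    if Y is not None:
        Y_lin = [label for image in Y for row in image for label in row]
    if X_lin is not None and Y_lin is not None:
        return X_lin, Y_lin
    return X_lin if X_lin is not None else Y_lin


# one image of 2 x 2 superpixels with 3 features each
X = [[[[1, 2, 3], [4, 5, 6]],
      [[7, 8, 9], [10, 11, 12]]]]
Y = [[['sand', 'sea'],
      ['sand', 'sky']]]
X_lin, Y_lin = linearize(X, Y)
print(len(X_lin), len(X_lin[0]), Y_lin)  # 4 3 ['sand', 'sea', 'sand', 'sky']
```

As in the documented function, passing only X or only Y returns just that linearized result.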
