API Reference

ActiveLearningLoop

class tf_al.ActiveLearningLoop(model: tf_al.wrapper.model.Model, dataset: tf_al.dataset.Dataset, query_fn, step_size: int = 1, max_rounds: Optional[int] = None, pseudo: bool = True, verbose: bool = False, **kwargs)[source]

Creates an active learning loop. The loop accumulates metrics during training in a dictionary that is returned.

To use with tqdm:

from tqdm import tqdm

for i in tqdm(my_iterable):
    do_something()
Parameters
  • model (Model) – A model wrapped into a Model type object.

  • dataset (Dataset) – The dataset to use (inputs, targets)

  • query_fn (list(str)|str) – The query function to use.

  • step_size (int) – How many new datapoints to add per active learning round. (default=1)

  • max_rounds (int) – The max. number of rounds to execute the active learning loop. If None, run until the unlabeled data pool is empty. (default=None)

  • pseudo (bool) – Whether or not to execute the loop in pseudo mode. Pseudo mode uses already existing labels to perform experiments. (default=True)

  • verbose (bool) – Whether or not to generate logging output. (default=False)
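
A minimal construction sketch (the Keras model, the data arrays and the acquisition-function name "max_entropy" are placeholders for your own setup):

from tf_al import ActiveLearningLoop, Dataset
from tf_al.wrapper import McDropout

model = McDropout(keras_model)        # wrap an existing Keras model
dataset = Dataset(x, y, init_size=10)

loop = ActiveLearningLoop(model, dataset, "max_entropy", step_size=10, max_rounds=50)
metrics = loop.run()                  # runs until max_rounds or the pool is exhausted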

collect_meta_params()[source]

Collect meta information about the experiment to be written into .meta.json.

Returns

(dict) with all meta information.

has_next()[source]

Can another step of the active learning loop be performed?

is_done()[source]

Whether the active learning loop has finished executing.

Returns

(bool) whether or not the loop has executed.

run(experiment_name=None, metrics_handler=None)[source]

Runs the active learning loop till the end.

Parameters
  • experiment_name (str) – The name of the file to write to.

  • metrics_handler (ExperimentSuitMetrics) – Metrics handler for write/read operations.

step()[source]

Perform a step of the active learning loop.
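
For finer control the loop can be advanced manually instead of calling run(); a sketch assuming a configured loop:

while loop.has_next():
    loop.step()   # one active learning round: query, annotate, fit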

Oracle

class tf_al.Oracle(callback=None, pseudo_mode=False)[source]

Oracle handles the labeling process for input values.

Parameters
  • callback (Callback) – Function called to obtain labels for input values. Receives (pool, indices).

  • pseudo_mode (bool) – Whether the active learning environment is in pseudo mode.
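
A sketch of a labeling callback; the exact callback contract (whether it annotates the pool itself or returns labels) may differ in your version, and ask_human_for_label is hypothetical:

def labeling_callback(pool, indices):
    # Fetch the inputs that need labels and write annotations back into the pool.
    inputs = pool.get_inputs_by(indices)
    labels = [ask_human_for_label(x) for x in inputs]
    pool.annotate(indices, labels)

oracle = Oracle(callback=labeling_callback)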

annotate(pool, indices, pseudo_mode=None)[source]

Create annotations for given indices and update the pool.

Parameters
  • pool (Pool) – The pool holding information about already annotated inputs.

  • indices (numpy.ndarray|list(int)) – Indices indicating which inputs to annotate.

init(pool, size, pseudo_mode=None)[source]

Initialize the pool with a given number of samples.

Parameters
  • pool (Pool) – The pool holding information about already labeled targets.

  • size (int) – The number of elements to initialize the pool with.

  • pseudo_mode (bool) – Whether or not to pseudo-label the inputs. (Only applicable when the pool was initialized with targets.)

is_pseudo(mode=None)[source]

Is the oracle in pseudo labeling mode? If so, and the pool is also in pseudo mode, labels are set automatically from the already known labels.

Pool

class tf_al.Pool(inputs, targets=None, target_shape=None)[source]

Pool that holds information about labeled and unlabeled inputs. The attribute ‘indices’ holds information about the labeled inputs.

Each value of self.indices can take one of the following states:
  • (value == -1) The corresponding input is labeled.
  • (value != -1) The corresponding input is not labeled.

Parameters
  • inputs (numpy.ndarray) – Inputs to the network.

  • targets (numpy.ndarray) – Already known targets, used for experimental runs. (default=None)

  • target_shape (tuple()) – The shape of the targets; if None, defaults to len(inputs). (default=None)
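
A small sketch of constructing a pool with known targets (usable in pseudo mode), assuming numpy is available:

import numpy as np
from tf_al import Pool

x = np.random.randn(100, 10)
y = np.random.randint(0, 2, 100)

pool = Pool(x, y)                    # targets known: usable in pseudo mode
print(pool.get_length_unlabeled())   # 100, nothing labeled yet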

annotate(indices, targets=None)[source]

Annotate inputs of given indices with given targets.

Parameters
  • indices (numpy.ndarray) – The indices to annotate.

  • targets (numpy.ndarray) – The labels to set for the given annotations.

get_indices()[source]

Returns the current labeling state.

Returns

(numpy.ndarray) the indices state, where -1 indicates a labeled input.

get_inputs_by(indices)[source]

Get inputs by indices.

Parameters

indices (numpy.ndarray) – The indices at which to access the data.

Returns

(numpy.ndarray) the data at given indices.

get_labeled_data()[source]

Get data and indices of datapoints which are currently labeled.

Returns

(tuple(numpy.ndarray, numpy.ndarray)) inputs and corresponding targets.
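
Continuing the pool sketch above, annotating a few indices and reading back the labeled data:

pool.annotate([0, 1, 2], [1, 0, 1])        # label three datapoints
inputs, targets = pool.get_labeled_data()
print(pool.get_length_labeled())           # 3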

get_labeled_indices()[source]

Get the indices of labeled datapoints.

Returns

(numpy.ndarray) indices of datapoints that have already been labeled.

get_length_labeled()[source]

Get the number of labeled inputs.

Returns

(int) The number of labeled inputs.

get_length_unlabeled()[source]

Get the number of unlabeled inputs.

Returns

(int) The number of unlabeled inputs.

get_targets_by(indices)[source]

Get targets by indices.

get_unlabeled_data()[source]

Get the data and indices of datapoints which are currently unlabeled.

Returns

(tuple(numpy.ndarray, numpy.ndarray)) The inputs and their indices in the pool.

get_unlabeled_indices()[source]

Get all unlabeled indices for this pool.

Returns

(numpy.ndarray) an array of indices.

has_labeled()[source]

Does the pool have labeled inputs?

Returns

(bool) whether or not there are labeled inputs.

has_unlabeled()[source]

Does the pool have any unlabeled inputs?

Returns

(bool) whether or not unlabeled data exists.

init(size)[source]

Initialize the pool with a specific number of labels. Only applicable when the pool is in pseudo mode.

Parameters

size (int|list|np.ndarray) – Either the number of datapoints to initialize or an explicit list/array of indices to initialize.
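
In pseudo mode the pool can be seeded from the already known targets; a sketch:

pool.init(10)                # label 10 datapoints using the known targets
# or: pool.init([0, 5, 42])  # label explicit indices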

is_pseudo()[source]

Is the pool in pseudo mode, i.e. are the true target labels already known?

Returns

(bool) whether or not true labels exist.

Dataset

class tf_al.Dataset(inputs, targets, test=None, val=None, init_size=0, init_indices=None)[source]

Splits a dataset into three parts: train/test/validation. The train split is used as the pool from which new datapoints are selected.

Parameters
  • inputs (numpy.ndarray) – The model inputs.

  • targets (numpy.ndarray) – The targets, labels or values.

  • init_size (int) – The initial size of labeled inputs in the pool.

  • train_size (float|int) – Size of the train split.

  • test_size (float|int) – Size of the test split.

  • val_size (float|int) – Size of the validation split.
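
A construction sketch, assuming the test and val arguments accept (inputs, targets) tuples and that x_train, y_train, x_test, y_test already exist:

from tf_al import Dataset

dataset = Dataset(
    x_train, y_train,
    test=(x_test, y_test),   # optional held-out test split
    init_size=20,            # number of initially labeled datapoints
)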

check_float_range(value)[source]

Is the float within a valid percentage range (between 0 and 1)?

Parameters

value (float) – The value to perform the check on.

check_int_in_range(value)[source]

get_split_ratio()[source]

Returns

(int, int, int) the split ratio between the (train, test, eval) sets.

percentage_of(total_number, part)[source]

Calculates the percentage that a part takes of a given total number.

Parameters
  • total_number (int) – The total number from which to calculate the percentage.

  • part (int) – The part for which to calculate the percentage.

Returns

(float) the percentage of the given part in the total number.

Metrics

class tf_al.Metrics(base_path, keys=['accuracy', 'loss'])[source]

Uses the given path to prepare and write metrics into a CSV file.

Parameters
  • base_path (str) – The base path where to save the metrics.

  • keys (list(str)) – A list of keys.

collect(values, keys=None)[source]

Collect metric values from a dictionary of values.

Parameters

values (dict) – A collection of values collected during training.

Returns

(dict) A subset of metrics extracted from the values.

read(filename)[source]

Read a .csv file of metrics.

Parameters

filename (str) – The filename to read in.

Returns

(list(dict)) a list of metric values, one per training iteration.

write(filename, values)[source]

Write given values into a csv file.

Parameters
  • filename (str) – The name of the file.

  • values (list(dict)) – A list of dictionaries of metrics/values to write into a .csv file.
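
A write/read round-trip sketch (file naming conventions may differ in your version):

from tf_al import Metrics

metrics = Metrics("./metrics", keys=["accuracy", "loss"])
row = metrics.collect({"accuracy": 0.91, "loss": 0.3, "lr": 1e-3})  # keeps only configured keys
metrics.write("experiment_1", [row])
history = metrics.read("experiment_1")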

ExperimentSuit

class tf_al.ExperimentSuit(models, query_fns, dataset, step_size=1, max_rounds=None, runs=1, seed=None, no_save_state=False, acceptance_timeout=None, metrics_handler=None, metrics_accumulator=None, verbose=False)[source]

Performs a number of experiments, iterating over the given models and query functions.

Parameters
  • models (list(Model)) – The models to iterate over.

  • query_fns (list(str)|list(AcquisitionFunction)|str|AcquisitionFunction) – A list of query functions to use.

  • dataset (Dataset) – A dataset for experiment execution.

  • step_size (int) – The number of new datapoints to select after each query. (default=1)

  • max_rounds (int) – The max. number of rounds to query for datapoints per experiment run. If not set, query as long as unlabeled data remains. (default=None)

  • seed (int|list(int)) – A single seed or multiple seeds over which to run the experiment configurations. (default=None)

  • no_save_state (bool) – Whether to re-initialize the model with new weights after each active learning round (fresh training) or to load the previous weights.

  • acceptance_timeout (int) – Timeout in seconds within which the experiment can be continued or aborted after a successful (model, query function) iteration. None proceeds automatically. (default=None)

  • metrics_handler (ExperimentSuitMetrics) – A configured metrics handler to use. (default=None)

  • verbose (bool) – Whether to print log messages. (default=False)
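
A sketch of running several configurations in one suite (the wrapped models and the query-function names are placeholders):

from tf_al import ExperimentSuit, ExperimentSuitMetrics

metrics_handler = ExperimentSuitMetrics("./results")
suit = ExperimentSuit(
    [model_a, model_b],           # wrapped models to compare
    ["random", "max_entropy"],    # query functions to compare
    dataset,
    step_size=10,
    max_rounds=100,
    metrics_handler=metrics_handler,
)
suit.start()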

start()[source]

Starts the experiment suite. Runs an experiment for each acquisition function and model combination.

Todo

[x] Last iteration, even when there are no other experiments to run, prompts a proceeding request.
[ ] Implement run/seed handling. Run seeded experiments n times.

ExperimentSuitMetrics

class tf_al.ExperimentSuitMetrics(base_path, verbose=False)[source]

Uses the given path to write and read experiment metrics and meta information.

If the last segment of the path does not exist, it will be created.

Creating a new object pointing to an already existing metrics path will reconstruct all metrics files that were written.

WARNING: The reconstructed files will be locked against appending and writing. They can be unlocked using the unlock() method.

Parameters
  • base_path (str) – Where to save the experiments. Directories are not created recursively.

  • verbose (bool) – Enable debug mode?
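
A sketch of reconstructing an existing metrics directory and unlocking it for further writes:

from tf_al import ExperimentSuitMetrics

metrics = ExperimentSuitMetrics("./results")  # reconstructs existing metric files (locked)
metrics.unlock_all()                          # allow appending again
meta = metrics.read_meta()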

add_dataset_meta(name, path, train_size, test_size=None, val_size=None)[source]

Adds meta information about the dataset used for the experiments.

Parameters
  • name (str) – The name of the dataset.

  • path (str) – The path to the dataset used.

  • train_size (float|int) – Similar to sklearn.model_selection.train_test_split.

  • test_size (float|int) – the size of the test set.

  • val_size (float|int) – the size of the validation set.

add_experiment_meta(experiment_name, model_name, query_fn, params)[source]

Adds meta information about an experiment to the meta file.

Parameters
  • experiment_name (str) – The name of the experiment.

  • model_name (str) – Name of the model used.

  • query_fn (str) – Name of the acquisition function.

  • params (dict) – Dictionary of additional parameters to be saved, like step_size, iterations, …

get_dataset_info()[source]

Reads the dataset meta information.

Returns

(dict) meta information about the dataset used for the experiment.

get_experiment_meta(experiment_name)[source]

Reads meta information for a specific experiment.

Parameters

experiment_name (str) – The name of the experiment.

overwrite(experiment_name)[source]

Mark reconstructed experiment metrics to be overwritten.

Parameters

experiment_name (str) – Name of the experiment to mark for overwriting.

read(experiment_name)[source]

Read metrics from a specific experiment.

Parameters

experiment_name (str) – The experiment to read from.

Returns

(list(dict)) of accumulated experiment metrics.

read_meta()[source]

Reads the meta information from the .meta.json file.

Returns

(dict) of meta information.

unlock(experiment_name)[source]

Unlocks a reconstructed file so it can be written to again.

Parameters

experiment_name (str) – Name of the experiment to unlock for appending.

unlock_all()[source]

Unlocks all locked files, so they can be appended to again.

write_line(experiment_name, values, filter_keys=None, filter_nan=True)[source]

Writes a new line into one of the experiment files, creating the file if it does not already exist.

Parameters
  • experiment_name (str) – The name of the experiment performed.

  • values (dict) – A dictionary of values to write to the experiment file.

  • filter_keys (list(str)) – A list of keys used to filter the given values dictionary.

write_meta(content)[source]

Writes a dictionary to .meta.json.

Parameters

content (dict) – The meta information to be written to .meta.json

Model Wrapper

Model

class tf_al.wrapper.Model(model, config=None, name=None, model_type=None, checkpoint=None, verbose=False, checkpoint_path=None, **kwargs)[source]

Base wrapper for deep learning models to interface with the active learning environment.

_model

TensorFlow or PyTorch module.

Type

tf.Model

_config

Model configuration

Type

Config

_mode

The mode the model is in: ‘train’ or ‘test’/’eval’.

Type

Mode

_model_type

The model type

Type

str

_checkpoints

Created checkpoints.

Type

Checkpoint

Parameters
  • model (tf.Model) – The tensorflow model to be used.

  • config (Config) – Configuration object for the model. (default=None)

  • is_binary (bool) –

  • classification (bool) –
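
A sketch of wrapping a plain Keras model (the layer definitions are illustrative, and compile() is assumed to forward to the underlying Keras model):

import tensorflow as tf
from tf_al.wrapper import Model

keras_model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])

wrapped = Model(keras_model, name="base_dense")
wrapped.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])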

batch_prediction(inputs, batch_size=1, **kwargs)[source]
Parameters
  • inputs (numpy.ndarray) – Inputs going into the model.

  • n_times (int) – How many times to sample from the posterior?

  • batch_size (int) – In how many batches to split the data?

compile(*args, **kwargs)[source]

Compile the model if needed.

disable_batch_norm()[source]

Disable batch normalization for activation of dropout during prediction.


evaluate(inputs, targets, **kwargs)[source]

Evaluate a model on given input data and targets.

Parameters
  • inputs (numpy.ndarray) – The inputs to evaluate on.

  • targets (numpy.ndarray) – The corresponding targets.

Returns

(list) A list with two values: [loss, accuracy].

fit(*args, **kwargs)[source]

Fit the model to the given data.

Parameters
  • x (numpy.ndarray) – The inputs to train the model on. (default=None)

  • y (numpy.ndarray) – The targets to fit the model to. (default=None)

  • batch_size (int) – The size of each individual batch.

Returns

A record of the training procedure.

get_model_name(prefix=True)[source]

Returns the model name.

Parameters

prefix (bool) – Prefix the model name with model type?

Returns

(str) the model name.

get_query_fn(name)[source]

Get model specific acquisition function.

Parameters

name (str) – The name of the acquisition function to return.

Returns

(function) the acquisition function to use.

optimize(inputs, targets)[source]

Used to perform optimization during the active learning loop.

predict(inputs, **kwargs)[source]

Approximate predictive distribution.

Parameters

inputs (numpy.ndarray) – The inputs for the approximation.

reset(pool, dataset)[source]

Used to reset states, weights and other internals after each active learning loop iteration.

Parameters
  • pool (Pool) – The pool managing labeled and unlabeled indices.

  • dataset (Dataset) – The dataset containing the different splits.

McDropout

class tf_al.wrapper.McDropout(model, config=None, **kwargs)[source]

Wrapper class for neural networks using Monte Carlo dropout.

compile(*args, **kwargs)[source]

Compile the model if needed.

evaluate(inputs, targets, sample_size=10, **kwargs)[source]

Evaluate a model on given input data and targets.

expectation(predictions)[source]

Calculate the mean of the output distribution.

Returns

(numpy.ndarray) The expectation per datapoint

get_query_fn(name)[source]

Get model specific acquisition function.

Parameters

name (str) – The name of the acquisition function to return.

Returns

(function) the acquisition function to use.

std(predictions)[source]

Calculate the standard deviation.

Returns

(numpy.ndarray) The standard deviation per datapoint and target

variance(predictions)[source]

Calculate the variance of the distribution.

Returns

(numpy.ndarray) The variance per datapoint and target
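
A sketch of deriving uncertainty estimates from the sampled predictions; x_unlabeled is a placeholder, and passing sample_size to predict() mirrors evaluate() and is an assumption:

mc_model = McDropout(keras_model)
predictions = mc_model.predict(x_unlabeled, sample_size=25)  # stochastic forward passes

mean = mc_model.expectation(predictions)   # expectation per datapoint
sigma = mc_model.std(predictions)          # standard deviation per datapoint and target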

Utils

Logger

tf_al.utils.logger.setup_logger(debug, name='Runner', log_level=10, default_log_level=50)[source]

Set up a logger for the active learning loop.

Parameters
  • debug (bool) – Activate logging output in the console?

  • name (str) – The name of the logger to use. (default=’Runner’)

  • log_level (logging.level) – The log level to use when debug==True. (default=logging.DEBUG)

  • default_log_level (logging.level) – The default log level to use when debug==False. (default=logging.CRITICAL)

Returns

(logging.Logger) a configured logger object.
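
Usage sketch:

from tf_al.utils.logger import setup_logger

logger = setup_logger(True, name="Runner")  # DEBUG output in the console
logger.debug("Starting active learning loop")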

Tensorflow

tf_al.utils.tf.disable_tf_logs()[source]

Disable tensorflow log messages.

tf_al.utils.tf.set_tf_log_level(level='2')[source]

Set a log level for tensorflow logging messages.

Parameters

level (str) – The log level, one of ['0', '1', '2', '3'].

tf_al.utils.tf.setup_growth()[source]

Set up GPU memory growth. See tf.config.experimental.set_memory_growth for reference.
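
Typical setup before running experiments; a short sketch:

from tf_al.utils.tf import disable_tf_logs, setup_growth

disable_tf_logs()   # silence TensorFlow log messages
setup_growth()      # allocate GPU memory on demand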