API Reference

ActiveLearningLoop

class tf_al.ActiveLearningLoop(model: tf_al.wrapper.model.Model, dataset: tf_al.dataset.Dataset, query_fn, step_size: int = 1, max_rounds: Optional[int] = None, pseudo: bool = True, verbose: bool = False, **kwargs)[source]

Creates an active learning loop. The loop accumulates metrics during training in a dictionary that is returned.

To use with tqdm:

from tqdm import tqdm

for i in tqdm(my_iterable):
    do_something()
Parameters
  • model (Model) – A model wrapped into a Model type object.

  • dataset (Dataset) – The dataset to use (inputs, targets)

  • query_fn (list(str)|str) – The query function to use.

  • step_size (int) – How many new datapoints to add per active learning round. (default=1)

  • max_rounds (int) – The max. number of rounds to execute the active learning loop. If None, run until the unlabeled data pool is empty. (default=None)

  • pseudo (bool) – Whether or not to execute the loop in pseudo mode. Pseudo mode uses already existing labels to perform experiments. (default=True)

  • verbose (bool) – Whether or not to generate logging output. (default=False)
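
A minimal construction sketch (the Keras model, the data arrays and the acquisition-function name "max_entropy" are placeholders for your own setup):

from tf_al import ActiveLearningLoop, Dataset
from tf_al.wrapper import McDropout

model = McDropout(keras_model)        # wrap an existing Keras model
dataset = Dataset(x, y, init_size=10)

loop = ActiveLearningLoop(model, dataset, "max_entropy", step_size=10, max_rounds=50)
metrics = loop.run()                  # runs until max_rounds or the pool is exhausted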

collect_meta_params()[source]

Collect meta information about the experiment to be written into .meta.json.

Returns

(dict) with all meta information.

has_next()[source]

Can another step of the active learning loop be performed?

is_done()[source]

Whether the active learning loop has finished executing.

Returns

(bool) whether or not the loop has executed.

run(experiment_name=None, metrics_handler=None)[source]

Runs the active learning loop till the end.

Parameters
  • experiment_name (str) – The name of the file to write to.

  • metrics_handler (ExperimentSuitMetrics) – Metrics handler for write/read operations.

step()[source]

Perform a step of the active learning loop.
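
For finer control the loop can be advanced manually instead of calling run(); a sketch assuming a configured loop:

while loop.has_next():
    loop.step()   # one active learning round: query, annotate, fit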

Oracle

class tf_al.Oracle(callback=None, pseudo_mode=False)[source]

Oracle handles the labeling process for input values.

Parameters
  • callback (Callback) – Function called to obtain labels for input values. Receives (pool, indices).

  • pseudo_mode (bool) – Whether the active learning environment is in pseudo mode.
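
A sketch of a labeling callback; the exact callback contract (whether it annotates the pool itself or returns labels) may differ in your version, and ask_human_for_label is hypothetical:

def labeling_callback(pool, indices):
    # Fetch the inputs that need labels and write annotations back into the pool.
    inputs = pool.get_inputs_by(indices)
    labels = [ask_human_for_label(x) for x in inputs]
    pool.annotate(indices, labels)

oracle = Oracle(callback=labeling_callback)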

annotate(pool, indices, pseudo_mode=None)[source]

Create annotations for given indices and update the pool.

Parameters
  • pool (Pool) – The pool holding information about already annotated inputs.

  • indices (numpy.ndarray|list(int)) – Indices indicating which inputs to annotate.

init(pool, size, pseudo_mode=None)[source]

Initialize the pool with a given number of samples.

Parameters
  • pool (Pool) – The pool holding information about already labeled targets.

  • size (int) – The number of elements to initialize the pool with.

  • pseudo_mode (bool) – Whether or not to pseudo-label the inputs. (Only applicable when the pool was initialized with targets.)

is_pseudo(mode=None)[source]

Is the oracle in pseudo labeling mode? If so, and the pool is also in pseudo mode, labels are set automatically from the already known labels.

Pool

class tf_al.Pool(inputs, targets=None, target_shape=None)[source]

Pool that holds information about labeled and unlabeled inputs. The attribute ‘indices’ holds information about the labeled inputs.

Each value of self.indices can take one of the following states:
  • (value == -1) The corresponding input is labeled.
  • (value != -1) The corresponding input is not labeled.

Parameters
  • inputs (numpy.ndarray) – Inputs to the network.

  • targets (numpy.ndarray) – Already known targets, used for experimental runs. (default=None)

  • target_shape (tuple()) – The shape of the targets; if None, defaults to len(inputs). (default=None)
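
A small sketch of constructing a pool with known targets (usable in pseudo mode), assuming numpy is available:

import numpy as np
from tf_al import Pool

x = np.random.randn(100, 10)
y = np.random.randint(0, 2, 100)

pool = Pool(x, y)                    # targets known: usable in pseudo mode
print(pool.get_length_unlabeled())   # 100, nothing labeled yet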

annotate(indices, targets=None)[source]

Annotate inputs of given indices with given targets.

Parameters
  • indices (numpy.ndarray) – The indices to annotate.

  • targets (numpy.ndarray) – The labels to set for the given annotations.

get_indices()[source]

Returns the current labeling state.

Returns

(numpy.ndarray) the indices state, where -1 indicates a labeled input.

get_inputs_by(indices)[source]

Get inputs by indices.

Parameters

indices (numpy.ndarray) – The indices at which to access the data.

Returns

(numpy.ndarray) the data at given indices.

get_labeled_data()[source]

Get data and indices of datapoints which are currently labeled.

Returns

(tuple(numpy.ndarray, numpy.ndarray)) inputs and corresponding targets.
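
Continuing the pool sketch above, annotating a few indices and reading back the labeled data:

pool.annotate([0, 1, 2], [1, 0, 1])        # label three datapoints
inputs, targets = pool.get_labeled_data()
print(pool.get_length_labeled())           # 3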

get_labeled_indices()[source]

Get the indices of labeled datapoints.

Returns

(numpy.ndarray) indices of datapoints that have already been labeled.

get_length_labeled()[source]

Get the number of labeled inputs.

Returns

(int) The number of labeled inputs.

get_length_unlabeled()[source]

Get the number of unlabeled inputs.

Returns

(int) The number of unlabeled inputs.

get_targets_by(indices)[source]

Get targets by indices.

get_unlabeled_data()[source]

Get the data and indices of datapoints which are currently unlabeled.

Returns

(tuple(numpy.ndarray, numpy.ndarray)) The inputs and their indices in the pool.

get_unlabeled_indices()[source]

Get all unlabeled indices for this pool.

Returns

(numpy.ndarray) an array of indices.

has_labeled()[source]

Does the pool have labeled inputs?

Returns

(bool) whether or not there are labeled inputs.

has_unlabeled()[source]

Does the pool have any unlabeled inputs?

Returns

(bool) whether or not unlabeled data exists.

init(size)[source]

Initialize the pool with a specific number of labels. Only applicable when the pool is in pseudo mode.

Parameters

size (int|list|np.ndarray) – Either the number of datapoints to initialize or an explicit list/array of indices to initialize.
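
In pseudo mode the pool can be seeded from the already known targets; a sketch:

pool.init(10)                # label 10 datapoints using the known targets
# or: pool.init([0, 5, 42])  # label explicit indices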

is_pseudo()[source]

Is the pool in pseudo mode, i.e. are the true target labels already known?

Returns

(bool) whether or not true labels exist.

Dataset

class tf_al.Dataset(inputs, targets, test=None, val=None, init_size=0, init_indices=None)[source]

Splits a dataset into three parts: train/test/validation. The train split is used as the pool from which new datapoints are selected.

Parameters
  • inputs (numpy.ndarray) – The model inputs.

  • targets (numpy.ndarray) – The targets, labels or values.

  • init_size (int) – The initial size of labeled inputs in the pool.

  • train_size (float|int) – Size of the train split.

  • test_size (float|int) – Size of the test split.

  • val_size (float|int) – Size of the validation split.
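
A construction sketch, assuming the test and val arguments accept (inputs, targets) tuples and that x_train, y_train, x_test, y_test already exist:

from tf_al import Dataset

dataset = Dataset(
    x_train, y_train,
    test=(x_test, y_test),   # optional held-out test split
    init_size=20,            # number of initially labeled datapoints
)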

check_float_range(value)[source]

Is the float within a valid percentage range (between 0 and 1)?

Parameters

value (float) – The value to perform the check on.

check_int_in_range(value)[source]

get_split_ratio()[source]

Returns

(int, int, int) the split ratio between the (train, test, eval) sets.

percentage_of(total_number, part)[source]

Calculates the percentage that a part takes of a given total number.

Parameters
  • total_number (int) – The total number from which to calculate the percentage.

  • part (int) – The part for which to calculate the percentage.

Returns

(float) the percentage of the given part in the total number.

Metrics

class tf_al.Metrics(base_path, keys=['accuracy', 'loss'])[source]

Uses the given path to prepare and write metrics into a CSV file.

Parameters
  • base_path (str) – The base path where to save the metrics.

  • keys (list(str)) – A list of keys.

collect(values, keys=None)[source]

Collect metric values from a dictionary of values.

Parameters

values (dict) – A collection of values collected during training.

Returns

(dict) A subset of metrics extracted from the values.

read(filename)[source]

Read a .csv file of metrics.

Parameters

filename (str) – The filename to read in.

Returns

(list(dict)) a list of metric values, one per training iteration.

write(filename, values)[source]

Write given values into a csv file.

Parameters
  • filename (str) – The name of the file.

  • values (list(dict)) – A list of dictionaries of metrics/values to write into a .csv file.
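
A write/read round-trip sketch (file naming conventions may differ in your version):

from tf_al import Metrics

metrics = Metrics("./metrics", keys=["accuracy", "loss"])
row = metrics.collect({"accuracy": 0.91, "loss": 0.3, "lr": 1e-3})  # keeps only configured keys
metrics.write("experiment_1", [row])
history = metrics.read("experiment_1")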

ExperimentSuit

class tf_al.ExperimentSuit(models, query_fns, dataset, step_size=1, max_rounds=None, runs=1, seed=None, no_save_state=False, acceptance_timeout=None, metrics_handler=None, metrics_accumulator=None, verbose=False)[source]

Performs a number of experiments, iterating over the given models and query functions.

Parameters
  • models (list(Model)) – The models to iterate over.

  • query_fns (list(str)|list(AcquisitionFunction)|str|AcquisitionFunction) – A list of query functions to use.

  • dataset (Dataset) – A dataset for experiment execution.

  • step_size (int) – The number of new datapoints to select after each query. (default=1)

  • max_rounds (int) – The max. number of rounds to query for datapoints per experiment run. If not set, query as long as unlabeled data remains. (default=None)

  • seed (int|list(int)) – A single seed or multiple seeds over which to run the experiment configurations. (default=None)

  • no_save_state (bool) – Whether to re-initialize the model with new weights after each active learning round (fresh training) or to load the previous weights.

  • acceptance_timeout (int) – Timeout in seconds within which the experiment can be continued or aborted after a successful (model, query function) iteration. None proceeds automatically. (default=None)

  • metrics_handler (ExperimentSuitMetrics) – A configured metrics handler to use. (default=None)

  • verbose (bool) – Whether to print log messages. (default=False)
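
A sketch of running several configurations in one suite (the wrapped models and the query-function names are placeholders):

from tf_al import ExperimentSuit, ExperimentSuitMetrics

metrics_handler = ExperimentSuitMetrics("./results")
suit = ExperimentSuit(
    [model_a, model_b],           # wrapped models to compare
    ["random", "max_entropy"],    # query functions to compare
    dataset,
    step_size=10,
    max_rounds=100,
    metrics_handler=metrics_handler,
)
suit.start()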

start()[source]

Starts the experiment suite. Runs an experiment for each acquisition function and model combination.

Todo

[x] Last iteration, even when there are no other experiments to run, prompts a proceeding request.
[ ] Implement run/seed handling. Run seeded experiments n times.

ExperimentSuitMetrics

class tf_al.ExperimentSuitMetrics(base_path, verbose=False)[source]

Uses the given path to write and read experiment metrics and meta information.

If the last segment of the path does not exist, it will be created.

Creating a new object pointing to an already existing metrics path will reconstruct all metrics files that were written.

WARNING: The reconstructed files will be locked against appending and writing. They can be unlocked using the unlock() method.

Parameters
  • base_path (str) – Where to save the experiments. Directories are not created recursively.

  • verbose (bool) – Enable debug mode?
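
A sketch of reconstructing an existing metrics directory and unlocking it for further writes:

from tf_al import ExperimentSuitMetrics

metrics = ExperimentSuitMetrics("./results")  # reconstructs existing metric files (locked)
metrics.unlock_all()                          # allow appending again
meta = metrics.read_meta()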

add_dataset_meta(name, path, train_size, test_size=None, val_size=None)[source]

Adds meta information about the dataset used for the experiments.

Parameters
  • name (str) – The name of the dataset.

  • path (str) – The path to the dataset used.

  • train_size (float|int) – Similar to sklearn.model_selection.train_test_split.

  • test_size (float|int) – the size of the test set.

  • val_size (float|int) – the size of the validation set.

add_experiment_meta(experiment_name, model_name, query_fn, params)[source]

Adds meta information about an experiment to the meta file.

Parameters
  • experiment_name (str) – The name of the experiment.

  • model_name (str) – Name of the model used.

  • query_fn (str) – Name of the acquisition function.

  • params (dict) – Dictionary of additional parameters to be saved, like step_size, iterations, …

get_dataset_info()[source]

Reads the dataset meta information.

Returns

(dict) meta information about the dataset used for the experiment.

get_experiment_meta(experiment_name)[source]

Reads meta information for a specific experiment.

Parameters

experiment_name (str) – The name of the experiment.

overwrite(experiment_name)[source]

Mark reconstructed experiment metrics to be overwritten.

Parameters

experiment_name (str) – Name of the experiment to mark for overwriting.

read(experiment_name)[source]

Read metrics from a specific experiment.

Parameters

experiment_name (str) – The experiment to read from.

Returns

(list(dict)) of accumulated experiment metrics.

read_meta()[source]

Reads the meta information from the .meta.json file.

Returns

(dict) of meta information.

unlock(experiment_name)[source]

Unlocks a reconstructed file so it can be written to again.

Parameters

experiment_name (str) – Name of the experiment to unlock for appending.

unlock_all()[source]

Unlocks all locked files, so they can be appended to again.

write_line(experiment_name, values, filter_keys=None, filter_nan=True)[source]

Writes a new line into one of the experiment files, creating the file if it does not already exist.

Parameters
  • experiment_name (str) – The name of the experiment performed.

  • values (dict) – A dictionary of values to write to the experiment file.

  • filter_keys (list(str)) – A list of keys used to filter the given values dictionary.

write_meta(content)[source]

Writes a dictionary to .meta.json.

Parameters

content (dict) – The meta information to be written to .meta.json

Model Wrapper

Model

class tf_al.wrapper.Model(model, config=None, name=None, model_type=None, checkpoint=None, verbose=False, checkpoint_path=None, **kwargs)[source]

Base wrapper for deep learning models to interface with the active learning environment.

_model

TensorFlow or PyTorch module.

Type

tf.Model

_config

Model configuration

Type

Config

_mode

The mode the model is in: ‘train’ or ‘test’/’eval’.

Type

Mode

_model_type

The model type

Type

str

_checkpoints

Created checkpoints.

Type

Checkpoint

Parameters
  • model (tf.Model) – The tensorflow model to be used.

  • config (Config) – Configuration object for the model. (default=None)

  • is_binary (bool) –

  • classification (bool) –
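
A sketch of wrapping a plain Keras model (the layer definitions are illustrative, and compile() is assumed to forward to the underlying Keras model):

import tensorflow as tf
from tf_al.wrapper import Model

keras_model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])

wrapped = Model(keras_model, name="base_dense")
wrapped.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])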

batch_prediction(inputs, batch_size=1, **kwargs)[source]
Parameters
  • inputs (numpy.ndarray) – Inputs going into the model.

  • n_times (int) – How many times to sample from the posterior?

  • batch_size (int) – In how many batches to split the data?

compile(*args, **kwargs)[source]

Compile the model if needed.

disable_batch_norm()[source]

Disable batch normalization for activation of dropout during prediction.


evaluate(inputs, targets, **kwargs)[source]

Evaluate a model on given input data and targets.

Parameters
  • inputs (numpy.ndarray) – The inputs to evaluate on.

  • targets (numpy.ndarray) – The corresponding targets.

Returns

(list) A list with two values: [loss, accuracy].

fit(*args, **kwargs)[source]

Fit the model to the given data.

Parameters
  • x (numpy.ndarray) – The inputs to train the model on. (default=None)

  • y (numpy.ndarray) – The targets to fit the model to. (default=None)

  • batch_size (int) – The size of each individual batch.

Returns

A record of the training procedure.

get_model_name(prefix=True)[source]

Returns the model name.

Parameters

prefix (bool) – Prefix the model name with model type?

Returns

(str) the model name.

get_query_fn(name)[source]

Get model specific acquisition function.

Parameters

name (str) – The name of the acquisition function to return.

Returns

(function) the acquisition function to use.

optimize(inputs, targets)[source]

Used to perform optimization during the active learning loop.

predict(inputs, **kwargs)[source]

Approximate predictive distribution.

Parameters

inputs (numpy.ndarray) – The inputs for the approximation.

reset(pool, dataset)[source]

Used to reset states, weights and other internals after each active learning loop iteration.

Parameters
  • pool (Pool) – The pool managing labeled and unlabeled indices.

  • dataset (Dataset) – The dataset containing the different splits.

McDropout

class tf_al.wrapper.McDropout(model, config=None, **kwargs)[source]

Wrapper class for neural networks using Monte Carlo dropout.

compile(*args, **kwargs)[source]

Compile the model if needed.

evaluate(inputs, targets, sample_size=10, **kwargs)[source]

Evaluate a model on given input data and targets.

expectation(predictions)[source]

Calculate the mean of the output distribution.

Returns

(numpy.ndarray) The expectation per datapoint

get_query_fn(name)[source]

Get model specific acquisition function.

Parameters

name (str) – The name of the acquisition function to return.

Returns

(function) the acquisition function to use.

std(predictions)[source]

Calculate the standard deviation.

Returns

(numpy.ndarray) The standard deviation per datapoint and target

variance(predictions)[source]

Calculate the variance of the distribution.

Returns

(numpy.ndarray) The variance per datapoint and target
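
A sketch of deriving uncertainty estimates from the sampled predictions; x_unlabeled is a placeholder, and passing sample_size to predict() mirrors evaluate() and is an assumption:

mc_model = McDropout(keras_model)
predictions = mc_model.predict(x_unlabeled, sample_size=25)  # stochastic forward passes

mean = mc_model.expectation(predictions)   # expectation per datapoint
sigma = mc_model.std(predictions)          # standard deviation per datapoint and target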

Utils

Logger

tf_al.utils.logger.setup_logger(debug, name='Runner', log_level=10, default_log_level=50)[source]

Set up a logger for the active learning loop.

Parameters
  • debug (bool) – Activate logging output in the console?

  • name (str) – The name of the logger to use. (default=’Runner’)

  • log_level (logging.level) – The log level to use when debug==True. (default=logging.DEBUG)

  • default_log_level (logging.level) – The default log level to use when debug==False. (default=logging.CRITICAL)

Returns

(logging.Logger) a configured logger object.
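
Usage sketch:

from tf_al.utils.logger import setup_logger

logger = setup_logger(True, name="Runner")  # DEBUG output in the console
logger.debug("Starting active learning loop")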

Tensorflow

tf_al.utils.tf.disable_tf_logs()[source]

Disable tensorflow log messages.

tf_al.utils.tf.set_tf_log_level(level='2')[source]

Set a log level for tensorflow logging messages.

Parameters

level (str) – The log level, one of ['0', '1', '2', '3'].

tf_al.utils.tf.setup_growth()[source]

Set up GPU memory growth. See tf.config.experimental.set_memory_growth for reference.
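
Typical setup before running experiments; a short sketch:

from tf_al.utils.tf import disable_tf_logs, setup_growth

disable_tf_logs()   # silence TensorFlow log messages
setup_growth()      # allocate GPU memory on demand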