API Reference
ActiveLearningLoop
class tf_al.ActiveLearningLoop(model: tf_al.wrapper.model.Model, dataset: tf_al.dataset.Dataset, query_fn, step_size: int = 1, max_rounds: Optional[int] = None, pseudo: bool = True, verbose: bool = False, **kwargs)
Creates an active learning loop. The loop accumulates metrics during training in a dictionary that is returned.
To use with tqdm:
    from tqdm import tqdm

    for i in tqdm(my_iterable):
        do_something()
- Parameters
model (Model) – A model wrapped into a Model type object.
dataset (Dataset) – The dataset to use (inputs, targets)
query_fn (list(str)|str) – The query function to use.
step_size (int) – How many new datapoints to add per active learning round. (default=1)
max_rounds (int) – The maximum number of rounds to execute the active learning loop. If None, the loop runs until the unlabeled data pool is empty. (default=None)
pseudo (bool) – Whether to execute the loop in pseudo mode. Pseudo mode uses already existing labels to perform experiments. (default=True)
verbose (bool) – Whether to generate logging output. (default=False)
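How step_size and max_rounds bound the loop can be illustrated with a minimal pure-Python sketch. It is not the tf_al implementation. The function simulate_loop is hypothetical, and random selection stands in for a real query_fn.

```python
import random
from typing import List, Optional


def simulate_loop(n_unlabeled: int, step_size: int = 1,
                  max_rounds: Optional[int] = None, seed: int = 0) -> List[int]:
    """Hypothetical sketch: return the number of datapoints labeled in each round."""
    rng = random.Random(seed)
    unlabeled = list(range(n_unlabeled))
    labeled_per_round: List[int] = []
    # Stop when the pool is empty or when max_rounds (if given) is reached.
    while unlabeled and (max_rounds is None or len(labeled_per_round) < max_rounds):
        # Stand-in for query_fn: select up to step_size random indices.
        picked = set(rng.sample(unlabeled, min(step_size, len(unlabeled))))
        unlabeled = [i for i in unlabeled if i not in picked]
        labeled_per_round.append(len(picked))
    return labeled_per_round


print(simulate_loop(10, step_size=3))                # [3, 3, 3, 1]
print(simulate_loop(10, step_size=3, max_rounds=2))  # [3, 3]
```

When max_rounds is None, the final round may add fewer than step_size datapoints because the pool runs out.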
collect_meta_params()
Collect meta information about the experiment to be written into .meta.json.
- Returns
(dict) with all meta information.
is_done()
Check whether the active learning loop has finished executing.
- Returns
(bool) whether the loop has finished executing.
run(experiment_name=None, metrics_handler=None)
Runs the active learning loop until the end.
- Parameters
experiment_name (str) – The name of the file to write to
metrics_handler (ExperimentSuitMetrics) – Metrics handler for write/read operations.
Oracle
class tf_al.Oracle(callback=None, pseudo_mode=False)
Oracle handles the labeling process for input values.
- Parameters
callback (Callback) – Function called to obtain user labels for the input values. The function receives (pool, indices).
pseudo_mode (bool) – Whether the active learning environment runs in pseudo mode.
annotate(pool, indices, pseudo_mode=None)
Create annotations for the given indices and update the pool.
- Parameters
pool (Pool) – The pool holding information about already annotated inputs.
indices (numpy.ndarray|list(int)) – Indices indicating which inputs to annotate.
init(pool, size, pseudo_mode=None)
Initialize the pool with the given number of samples.
- Parameters
pool (Pool) – The pool holding information about already labeled targets.
size (int) – The number of elements to initialize the pool with.
pseudo_mode (bool) – Whether to use pseudo labeling of the inputs. (Only applicable when the pool was initialized with targets)
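In pseudo mode no user input is needed, because the targets are already known. The sketch below shows one way an initialization step could work under that assumption. The function pseudo_init is hypothetical, and random selection of the initial samples is an assumption, not documented tf_al behavior.

```python
import random
from typing import Dict, List, Sequence


def pseudo_init(indices: List[int], known_targets: Sequence[int], size: int,
                seed: int = 0) -> Dict[int, int]:
    """Hypothetical sketch of pseudo-mode initialization (not tf_al code).

    Picks `size` unlabeled positions at random, marks them as labeled (-1) in
    `indices`, and returns the labels copied from the already known targets.
    """
    rng = random.Random(seed)
    unlabeled = [pos for pos, value in enumerate(indices) if value != -1]
    chosen = rng.sample(unlabeled, size)
    labels: Dict[int, int] = {}
    for pos in chosen:
        indices[pos] = -1                   # -1 marks a labeled input (Pool convention)
        labels[pos] = known_targets[pos]    # pseudo mode: no user input, reuse the target
    return labels


indices = list(range(6))
targets = [0, 1, 0, 1, 1, 0]
labels = pseudo_init(indices, targets, size=2)
print(indices.count(-1))  # 2
```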
Pool
class tf_al.Pool(inputs, targets=None, target_shape=None)
Pool that holds information about labeled and unlabeled inputs. The attribute ‘indices’ holds information about the labeled inputs.
Each value of self.indices can take one of the following states:
(value == -1) the corresponding input is labeled
(value != -1) the corresponding input is not labeled
- Parameters
inputs (numpy.ndarray) – Inputs to the network.
targets (numpy.ndarray) – Already known targets, used for experimental runs. (default=None)
target_shape (tuple()) – The shape of the target. If None, it equals len(inputs). (default=None)
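The -1 convention for self.indices can be illustrated with plain Python lists. This sketch assumes a fresh pool starts with states 0..n-1 (any value other than -1 would work). The helper names labeled_positions and unlabeled_positions are hypothetical.

```python
from typing import List


def labeled_positions(indices: List[int]) -> List[int]:
    """Positions whose state is -1, i.e. labeled inputs."""
    return [pos for pos, value in enumerate(indices) if value == -1]


def unlabeled_positions(indices: List[int]) -> List[int]:
    """Positions whose state is not -1, i.e. inputs still waiting for a label."""
    return [pos for pos, value in enumerate(indices) if value != -1]


# Assumed fresh state for five inputs: no value is -1, so nothing is labeled yet.
indices = list(range(5))
# Annotating positions 1 and 3 flips their state to -1.
for pos in (1, 3):
    indices[pos] = -1

print(indices)                       # [0, -1, 2, -1, 4]
print(labeled_positions(indices))    # [1, 3]
print(unlabeled_positions(indices))  # [0, 2, 4]
```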
annotate(indices, targets=None)
Annotate the inputs at the given indices with the given targets.
- Parameters
indices (numpy.ndarray) – The indices to annotate.
targets (numpy.ndarray) – The labels to set for the given annotations.
get_indices()
Returns the current labeling state.
- Returns
(numpy.ndarray) the indices state, where -1 indicates a labeled input.
get_inputs_by(indices)
Get inputs by indices.
- Parameters
indices (numpy.ndarray) – The indices at which to access the data.
- Returns
(numpy.ndarray) the data at given indices.
get_labeled_data()
Get the inputs and targets of the datapoints that are currently labeled.
- Returns
(tuple(numpy.ndarray, numpy.ndarray)) inputs and corresponding targets.
get_labeled_indices()
Get the indices of labeled datapoints.
- Returns
(numpy.ndarray) the indices of datapoints that have already been labeled.
get_length_labeled()
Get the number of labeled inputs.
- Returns
(int) The number of labeled inputs.
get_length_unlabeled()
Get the number of unlabeled inputs.
- Returns
(int) the number of unlabeled inputs.
get_unlabeled_data()
Get the inputs and their indices for datapoints that are currently not labeled.
- Returns
(tuple(numpy.ndarray, numpy.ndarray)) The inputs and their indices in the pool
get_unlabeled_indices()
Get all unlabeled indices for this pool.
- Returns
(numpy.ndarray) an array of indices.
has_labeled()
Check whether the pool has any labeled inputs.
- Returns
(bool) True if there are labeled inputs, otherwise False.
has_unlabeled()
Check whether the pool has any unlabeled inputs.
- Returns
(bool) True if unlabeled inputs exist, otherwise False.
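The accessors above fit together as shown in this list-based sketch. ToyPool is a hypothetical stand-in, not tf_al.Pool. It follows the documented return shapes: get_labeled_data returns (inputs, targets) and get_unlabeled_data returns (inputs, indices). It uses lists instead of numpy.ndarray.

```python
from typing import List, Optional, Sequence, Tuple


class ToyPool:
    """Hypothetical, list-based stand-in for tf_al.Pool (not the library code)."""

    def __init__(self, inputs: Sequence, targets: Optional[Sequence] = None):
        self.inputs = list(inputs)
        self.targets = list(targets) if targets is not None else [None] * len(self.inputs)
        self.indices = list(range(len(self.inputs)))  # -1 marks a labeled input

    def annotate(self, indices: Sequence[int], targets: Optional[Sequence] = None) -> None:
        for k, i in enumerate(indices):
            if targets is not None:
                self.targets[i] = targets[k]
            self.indices[i] = -1

    def get_labeled_indices(self) -> List[int]:
        return [i for i, v in enumerate(self.indices) if v == -1]

    def get_unlabeled_indices(self) -> List[int]:
        return [i for i, v in enumerate(self.indices) if v != -1]

    def get_labeled_data(self) -> Tuple[list, list]:
        idx = self.get_labeled_indices()
        return [self.inputs[i] for i in idx], [self.targets[i] for i in idx]

    def get_unlabeled_data(self) -> Tuple[list, List[int]]:
        idx = self.get_unlabeled_indices()
        return [self.inputs[i] for i in idx], idx

    def get_length_labeled(self) -> int:
        return len(self.get_labeled_indices())

    def get_length_unlabeled(self) -> int:
        return len(self.get_unlabeled_indices())

    def has_labeled(self) -> bool:
        return self.get_length_labeled() > 0

    def has_unlabeled(self) -> bool:
        return self.get_length_unlabeled() > 0


pool = ToyPool(inputs=["a", "b", "c", "d"])
pool.annotate([0, 2], targets=[1, 0])
print(pool.get_labeled_data())                   # (['a', 'c'], [1, 0])
print(pool.get_unlabeled_data())                 # (['b', 'd'], [1, 3])
print(pool.has_labeled(), pool.has_unlabeled())  # True True
```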
Dataset
class tf_al.Dataset(inputs, targets, test=None, val=None, init_size=0, init_indices=None)
Splits a dataset into three parts: train, test and validation. The train split is used for selection of
- Parameters
inputs (numpy.ndarray) – The model inputs.
targets (numpy.ndarray) – The targets, labels or values.
init_size (int) – The initial size of labeled inputs in the pool.
train_size (float|int) – Size of the train split.
test_size (float|int) – Size of the test split.
val_size (float|int) – Size of the validation split.
check_float_range(value)
Check whether a float lies within the percentage range.
- Parameters
value (float) – The value to perform the check on.
get_split_ratio()
- Returns
(int, int, int) the split ratio between the (train, test, eval) sets.
percentage_of(total_number, part)
Calculates the percentage that a part makes up of the given total number.
- Parameters
total_number (int) – The total number from which to calculate the percentage.
part (int) – The part of which to calculate the percentage.
- Returns
(float) the percentage of the given part in the total number.
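The split sizes are typed float|int. The docstrings do not say whether percentages are fractions in [0, 1] or values in [0, 100]. The sketch below assumes fractions, following how sklearn.model_selection.train_test_split treats float sizes (a proportion) versus int sizes (an absolute count). All three helpers are hypothetical re-implementations, and rounding down is an assumption.

```python
from typing import Union


def check_float_range(value: float) -> bool:
    """Hypothetical: treat a float as a valid percentage if 0 <= value <= 1."""
    return 0.0 <= value <= 1.0


def percentage_of(total_number: int, part: int) -> float:
    """Hypothetical: fraction of total_number that part makes up."""
    return part / total_number


def resolve_size(size: Union[float, int], total: int) -> int:
    """Turn a float|int split size into an absolute number of datapoints."""
    if isinstance(size, float):
        if not check_float_range(size):
            raise ValueError(f"fractional size must lie in [0, 1], got {size}")
        return int(size * total)   # fraction of the dataset, rounded down
    return size                    # already an absolute count


total = 1000
print(resolve_size(0.2, total))   # 200
print(resolve_size(150, total))   # 150
print(percentage_of(total, 250))  # 0.25
```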
Metrics
class tf_al.Metrics(base_path, keys=['accuracy', 'loss'])
Prepares and writes metrics into a CSV file at the given path.
- Parameters
base_path (str) – The base path where to save the metrics.
keys (list(str)) – A list of keys.
collect(values, keys=None)
Collect metric values from a dictionary of values.
- Parameters
values (dict) – A collection of values collected during training.
- Returns
(dict) A subset of metrics extracted from the values.
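Collecting a subset of metrics and writing it as a CSV row could look like the sketch below. The collect function here is a hypothetical re-implementation that falls back to the documented default keys ['accuracy', 'loss']. The CSV step uses an in-memory buffer in place of a file under base_path.

```python
import csv
import io
from typing import Dict, Iterable, Optional

DEFAULT_KEYS = ("accuracy", "loss")  # mirrors the documented default of Metrics


def collect(values: Dict[str, float], keys: Optional[Iterable[str]] = None) -> Dict[str, float]:
    """Hypothetical sketch: keep only the tracked metric keys present in values."""
    keys = list(keys) if keys is not None else list(DEFAULT_KEYS)
    return {k: values[k] for k in keys if k in values}


history = {"accuracy": 0.91, "loss": 0.27, "lr": 0.001}
row = collect(history)
print(row)  # {'accuracy': 0.91, 'loss': 0.27}

# Write the collected row as CSV (in-memory buffer instead of a file).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(DEFAULT_KEYS))
writer.writeheader()
writer.writerow(row)
print(buffer.getvalue())
```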
ExperimentSuit
class tf_al.ExperimentSuit(models, query_fns, dataset, step_size=1, max_rounds=None, runs=1, seed=None, no_save_state=False, acceptance_timeout=None, metrics_handler=None, metrics_accumulator=None, verbose=False)
Performs a number of experiments, iterating over the given models and query functions.
- Parameters
models (list(Model)) – The models to iterate over.
query_fns (list(str)|list(AcquisitionFunction)|str|AcquisitionFunction) – A list of query functions to use
dataset (Dataset) – A dataset for experiment execution.
step_size (int) – The number of new datapoints to select after each query. (default=1)
max_rounds (int) – The max. number of rounds to query for datapoints per experiment run. If not set, perform query operation as long as there is data. (default=None)
seed (int|list(int)) – A single or multiple seeds to perform the experiment configurations over. (default=None)
no_save_state (bool) – If True, initialize the model with new weights after each active learning round and start training from scratch; otherwise, load the previous weights. (default=False)
acceptance_timeout (int) – Timeout in seconds during which the experiment can be continued or aborted after each successful (model, query function) iteration. If None, the experiment proceeds automatically. (default=None)
metrics_handler (ExperimentSuitMetrics) – A configured metrics handler to use. (default=None)
verbose (bool) – Printing log messages? (default=False)
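Iterating over models, query functions and seeds amounts to a Cartesian product. The sketch below enumerates those configurations. The function experiment_grid and the names model_a, random and max_entropy are hypothetical placeholders, and the iteration order is illustrative rather than the order ExperimentSuit actually uses.

```python
import itertools
from typing import List, Optional, Sequence, Tuple, Union


def experiment_grid(models: Sequence[str], query_fns: Union[str, Sequence[str]],
                    seed: Union[None, int, Sequence[int]] = None
                    ) -> List[Tuple[str, str, Optional[int]]]:
    """Hypothetical sketch: enumerate (model, query_fn, seed) combinations."""
    if isinstance(query_fns, str):
        query_fns = [query_fns]   # a single query function is also accepted
    if seed is None or isinstance(seed, int):
        seeds = [seed]            # a single seed (or none) gives one run per pair
    else:
        seeds = list(seed)
    return list(itertools.product(models, query_fns, seeds))


grid = experiment_grid(["model_a", "model_b"], ["random", "max_entropy"], seed=[1, 2])
print(len(grid))  # 8
print(grid[0])    # ('model_a', 'random', 1)
```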
ExperimentSuitMetrics
class tf_al.ExperimentSuitMetrics(base_path, verbose=False)
Uses the given path to write and read experiment metrics and meta information.
If the last segment of the path does not exist, it will be created.
Creating a new object that points to an already existing metrics path reconstructs all metrics files that were previously written.
WARNING: The reconstructed files are locked for appending and writing. They can be unlocked with the unlock() method.
- Parameters
base_path (str) – Where to save the experiments. Directories are not created recursively.
verbose (bool) – Whether to enable debug mode.
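Creating only the last segment of base_path corresponds to the difference between os.mkdir (non-recursive) and os.makedirs (recursive) in the Python standard library. Whether ExperimentSuitMetrics calls os.mkdir internally is an assumption. The helper create_base_path is hypothetical and only demonstrates the non-recursive behavior.

```python
import os
import tempfile


def create_base_path(base_path: str) -> bool:
    """Create only the last path segment; return False if a parent is missing."""
    if os.path.isdir(base_path):
        return True
    try:
        os.mkdir(base_path)  # non-recursive, unlike os.makedirs
        return True
    except FileNotFoundError:
        return False


with tempfile.TemporaryDirectory() as root:
    # Only the last segment is missing: it gets created.
    created = create_base_path(os.path.join(root, "metrics"))
    # Two segments are missing: a non-recursive mkdir cannot create them.
    nested_created = create_base_path(os.path.join(root, "missing", "metrics"))

print(created, nested_created)  # True False
```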
add_dataset_meta(name, path, train_size, test_size=None, val_size=None)
Add meta information about the dataset used for the experiments.
- Parameters
name (str) – The name of the dataset.
path (str) – The path to the dataset used.
train_size (float|int) – Similar to the train_size parameter of sklearn.model_selection.train_test_split.
test_size (float|int) – The size of the test set.
val_size (float|int) – The size of the validation set.
add_experiment_meta(experiment_name, model_name, query_fn, params)
Add meta information about an experiment to the meta file.
- Parameters
experiment_name (str) – The name of the experiment.
model_name (str) – The name of the model used.
query_fn (str) – The name of the acquisition function.
params (dict) – Dictionary of additional parameters to be saved, such as step_size, iterations, …
get_dataset_info()
Read the meta information about the dataset.
- Returns
(dict) containing meta information about the dataset used for the experiment.
get_experiment_meta(experiment_name)
Read the meta information of the given experiment.
- Parameters
experiment_name (str) – The name of the experiment.
overwrite(experiment_name)
Mark reconstructed experiment metrics to be overwritten.
- Parameters
experiment_name (str) – Name of the experiment to mark for overwriting.
read(experiment_name)
Read metrics from a specific experiment.
- Parameters
experiment_name (str) – The experiment to read from.
- Returns
(list(dict)) of accumulated experiment metrics.
read_meta()
Reads the meta information from the .meta.json file.
- Returns
(dict) of meta information.
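The meta information gathered by add_dataset_meta and add_experiment_meta ends up in a .meta.json file that read_meta loads back as a dict. The JSON round trip below is a sketch under an assumed schema. The field names mirror the documented parameters, but the real file layout and the example values (example_dataset, model_a, max_entropy) are hypothetical.

```python
import json
import os
import tempfile

# Hypothetical layout: field names mirror the add_dataset_meta and
# add_experiment_meta parameters, but the real .meta.json schema may differ.
meta = {
    "dataset": {"name": "example_dataset", "path": "data/example", "train_size": 0.8,
                "test_size": 0.1, "val_size": 0.1},
    "experiments": {
        "exp_1": {"model_name": "model_a", "query_fn": "max_entropy",
                  "params": {"step_size": 10, "iterations": 100}},
    },
}

with tempfile.TemporaryDirectory() as base_path:
    meta_file = os.path.join(base_path, ".meta.json")
    with open(meta_file, "w") as f:
        json.dump(meta, f, indent=2)  # what the add_*_meta calls might persist
    with open(meta_file) as f:
        loaded = json.load(f)         # what a read_meta()-style call would return

print(loaded["experiments"]["exp_1"]["query_fn"])  # max_entropy
```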
unlock(experiment_name)
Unlocks a reconstructed file so that it can be written again.
- Parameters
experiment_name (str) – Name of the experiment to unlock for appending.
write_line(experiment_name, values, filter_keys=None, filter_nan=True)
Writes a new line into one of the experiment files, creating the experiment file if it does not already exist.
- Parameters
experiment_name (str) – The name of the experiment performed.
values (dict) – A dictionary of values to write to the experiment file.
filter_keys (list(str)) – A list of keys used to filter the given values dictionary.
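Appending filtered metric rows and reading them back as a list of dicts (the documented return type of read) can be sketched with the csv module. The functions append_metrics_row and read_metrics are hypothetical. The file format is assumed to be CSV, and filter_nan is assumed to drop NaN entries, which are then written as empty cells.

```python
import csv
import math
import os
import tempfile
from typing import Dict, List, Optional


def append_metrics_row(path: str, values: Dict[str, float],
                       filter_keys: Optional[List[str]] = None,
                       filter_nan: bool = True) -> None:
    """Hypothetical sketch: append one row, creating the file (with header) if needed."""
    if filter_keys is not None:
        values = {k: v for k, v in values.items() if k in filter_keys}
    if filter_nan:
        # Assumed semantics: drop NaN entries; missing columns become empty cells.
        values = {k: v for k, v in values.items()
                  if not (isinstance(v, float) and math.isnan(v))}
    if os.path.exists(path):
        with open(path, newline="") as f:
            header = next(csv.reader(f))  # keep the columns of the existing file
        is_new = False
    else:
        header = list(filter_keys) if filter_keys is not None else sorted(values)
        is_new = True
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=header, restval="", extrasaction="ignore")
        if is_new:
            writer.writeheader()
        writer.writerow(values)


def read_metrics(path: str) -> List[Dict[str, str]]:
    """Read all rows back as a list of dicts (values are strings)."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


with tempfile.TemporaryDirectory() as root:
    path = os.path.join(root, "exp_1.csv")
    keys = ["accuracy", "loss"]
    append_metrics_row(path, {"accuracy": 0.8, "loss": 0.5, "lr": 0.01}, filter_keys=keys)
    append_metrics_row(path, {"accuracy": 0.85, "loss": float("nan")}, filter_keys=keys)
    rows = read_metrics(path)

print(rows)  # [{'accuracy': '0.8', 'loss': '0.5'}, {'accuracy': '0.85', 'loss': ''}]
```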
Model Wrapper
Model
class tf_al.wrapper.Model(model, config=None, name=None, model_type=None, checkpoint=None, verbose=False, checkpoint_path=None, **kwargs)
Base wrapper for deep learning models to interface with the active learning environment.
_model – TensorFlow or PyTorch module.
- Type
tf.Model
_config – Model configuration.
- Type
Config
_mode – The mode the model is in: ‘train’ or ‘test’/‘eval’.
- Type
Mode
_model_type – The model type.
- Type
str
_checkpoints – Created checkpoints.
- Type
Checkpoint
- Parameters
model (tf.Model) – The tensorflow model to be used.
config (Config) – Configuration object for the model. (default=None)
is_binary (bool) –
classification (bool) –
batch_prediction(inputs, batch_size=1, **kwargs)
- Parameters
inputs (numpy.ndarray) – Inputs going into the model.
n_times (int) – How many times to sample from the posterior.
batch_size (int) – In how many batches to split the data.
disable_batch_norm()
Disable batch normalization so that dropout can be activated during prediction.
- Parameters
model (-) –
evaluate(inputs, targets, **kwargs)
Evaluate a model on given input data and targets.
- Parameters
inputs (numpy.ndarray) – The inputs to evaluate the model on.
targets (numpy.ndarray) – The targets corresponding to the inputs.
- Returns
(list) A list with two values: [loss, accuracy].
fit(*args, **kwargs)
Fit the model to the given data.
- Parameters
x (numpy.ndarray) – The inputs to train the model on. (default=None)
y (numpy.ndarray) – The targets to fit the model to. (default=None)
batch_size (int) – The size of each individual batch.
- Returns
A record of the training procedure.
get_model_name(prefix=True)
Returns the model name.
- Parameters
prefix (bool) – Whether to prefix the model name with the model type.
- Returns
(str) the model name.
get_query_fn(name)
Get a model-specific acquisition function.
- Parameters
name (str) – The name of the acquisition function to return.
- Returns
(function) the acquisition function to use.
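Looking up an acquisition function by name is essentially a registry lookup. The sketch below uses a plain dict. The names max_entropy and random, the call signature (probs, step_size), and the helper names are hypothetical. max_entropy implements the standard maximum-entropy criterion: it selects the datapoints whose predicted class distribution has the highest entropy.

```python
import math
import random
from typing import Callable, Dict, List, Sequence


def max_entropy(probs: Sequence[Sequence[float]], step_size: int) -> List[int]:
    """Select the step_size datapoints with the highest predictive entropy."""
    entropies = [-sum(p * math.log(p) for p in row if p > 0) for row in probs]
    ranked = sorted(range(len(probs)), key=lambda i: entropies[i], reverse=True)
    return ranked[:step_size]


def random_selection(probs: Sequence[Sequence[float]], step_size: int,
                     seed: int = 0) -> List[int]:
    """Baseline: select step_size datapoints uniformly at random."""
    return random.Random(seed).sample(range(len(probs)), step_size)


QUERY_FNS: Dict[str, Callable] = {"max_entropy": max_entropy, "random": random_selection}


def get_query_fn(name: str) -> Callable:
    """Hypothetical sketch: look up an acquisition function by name."""
    try:
        return QUERY_FNS[name]
    except KeyError:
        raise ValueError(f"unknown acquisition function: {name!r}") from None


probs = [[0.9, 0.1], [0.5, 0.5], [0.6, 0.4]]
print(get_query_fn("max_entropy")(probs, step_size=2))  # [1, 2]
```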
McDropout
class tf_al.wrapper.McDropout(model, config=None, **kwargs)
Wrapper class for neural networks that use Monte Carlo dropout.
evaluate(inputs, targets, sample_size=10, **kwargs)
Evaluate a model on given input data and targets.
expectation(predictions)
Calculate the mean of the output distribution.
- Returns
(numpy.ndarray) The expectation per datapoint
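Monte Carlo dropout keeps dropout active at prediction time and averages several stochastic forward passes. The expectation is the per-datapoint mean over those passes. The sketch assumes the layout predictions[datapoint][sample][class], which is not documented here, and uses lists instead of numpy arrays.

```python
from typing import List


def expectation(predictions: List[List[List[float]]]) -> List[List[float]]:
    """Average class probabilities over the dropout samples of each datapoint.

    Assumed layout: predictions[datapoint][sample][class].
    """
    means = []
    for samples in predictions:
        n = len(samples)
        # zip(*samples) groups the values of each class across all samples.
        means.append([sum(col) / n for col in zip(*samples)])
    return means


# Two datapoints, three stochastic forward passes each, two classes.
predictions = [
    [[0.75, 0.25], [0.5, 0.5], [1.0, 0.0]],
    [[0.0, 1.0], [0.25, 0.75], [0.5, 0.5]],
]
print(expectation(predictions))  # [[0.75, 0.25], [0.25, 0.75]]
```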
get_query_fn(name)
Get a model-specific acquisition function.
- Parameters
name (str) – The name of the acquisition function to return.
- Returns
(function) the acquisition function to use.
Utils
Logger
tf_al.utils.logger.setup_logger(debug, name='Runner', log_level=10, default_log_level=50)
Set up a logger for the active learning loop.
- Parameters
debug (bool) – Whether to activate logging output in the console.
name (str) – The name of the logger to use. (default=’Runner’)
log_level (logging.level) – The log level to use when debug==True. (default=logging.DEBUG)
default_log_level (logging.level) – The default log level to use when debug==False. (default=logging.CRITICAL)
- Returns
(logging.Logger) a configured logger object.
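The level selection described above maps directly onto the standard logging module. logging.DEBUG is 10 and logging.CRITICAL is 50, matching the defaults in the signature. The function make_logger is a hypothetical sketch, not tf_al's setup_logger, and the console handler and format string are assumptions.

```python
import logging


def make_logger(debug: bool, name: str = "Runner", log_level: int = logging.DEBUG,
                default_log_level: int = logging.CRITICAL) -> logging.Logger:
    """Sketch: use log_level when debugging, otherwise default_log_level."""
    logger = logging.getLogger(name)
    logger.setLevel(log_level if debug else default_log_level)
    if debug and not logger.handlers:
        handler = logging.StreamHandler()  # console output
        handler.setFormatter(logging.Formatter("%(name)s - %(levelname)s - %(message)s"))
        logger.addHandler(handler)
    return logger


debug_logger = make_logger(True, name="sketch_debug")
quiet_logger = make_logger(False, name="sketch_quiet")
print(debug_logger.level, quiet_logger.level)  # 10 50
```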