Getting started—Benchmarking an algorithm (generating data) with Python
For benchmarking an algorithm in Python, we use the coco-experiment (cocoex) Python package.
Installation (assuming Python is present)
From a system shell, execute
python -m pip install coco-experiment cocopp
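To verify the installation, the following two commands (which only assume a working Python and pip) should terminate without an error:

python -m pip show coco-experiment cocopp
python -c "import cocoex, cocopp"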
Using cocoex
Depending on the algorithm, we have to choose the appropriate benchmark suite:
>>> import cocoex  # install(ed) with "pip install coco-experiment"
>>> cocoex.known_suite_names
['bbob',
 'bbob-biobj',
 'bbob-biobj-ext',
 'bbob-constrained',
 'bbob-largescale',
 'bbob-mixint',
 'bbob-biobj-mixint']
see also here.
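Any of these names can be passed to cocoex.Suite to instantiate the corresponding suite. A short sketch, continuing the session above, for inspecting a suite before running an experiment (the bbob-biobj suite is an arbitrary example, outputs are omitted):

>>> suite = cocoex.Suite("bbob-biobj", "", "")  # same constructor for every suite name
>>> len(suite)  # number of problems = functions x instances x dimensions
...
>>> problem = suite.get_problem(0)  # problems can also be accessed without a loop
>>> problem.id, problem.dimension, problem.number_of_objectives
...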
An impractical (oversimplistic) example
To get an idea of the ultimately necessary code, we first show a working but oversimplistic example for running an experiment that benchmarks scipy.optimize.fmin
on the bbob
suite. Including boilerplate overhead code, it has 2 + 2 + 2 + 3 = 9 lines of (effective) code:
import cocoex # experimentation module
import scipy # to define the solver to be benchmarked
### input
= "bbob"
suite_name = scipy.optimize.fmin # optimizer to be benchmarked
fmin
### prepare
= cocoex.Suite(suite_name, "", "")
suite = cocoex.Observer(suite_name, "")
observer
### go
for problem in suite: # this loop may take several minutes or more
# generates the data for cocopp
problem.observe_with(observer) =False) fmin(problem, problem.initial_solution, disp
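Even these data can already be post-processed; a minimal follow-up call (optional, see also the practical example below):

import cocopp  # post-processing module
cocopp.main(observer.result_folder)  # writes browsable output to ./ppdata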
A practical example
A practical, still short-ish example code that benchmarks scipy.optimize.fmin
on the bbob
suite with a given experimentation budget looks like this:
import cocoex # experimentation module
import cocopp # post-processing module (not strictly necessary)
import scipy # to define the solver to be benchmarked
### input
= "bbob"
suite_name = scipy.optimize.fmin # optimizer to be benchmarked
fmin = 1 # x dimension, increase to 3, 10, 30,...
budget_multiplier
### prepare
= cocoex.Suite(suite_name, "", "") # see https://numbbo.github.io/coco-doc/C/#suite-parameters
suite = '{}_of_{}_{}D_on_{}'.format(
output_folder __name__, fmin.__module__ or '', int(budget_multiplier), suite_name)
fmin.= cocoex.Observer(suite_name, "result_folder: " + output_folder)
observer = cocoex.ExperimentRepeater(budget_multiplier) # 0 == no repetitions
repeater = cocoex.utilities.MiniPrint()
minimal_print
### go
while not repeater.done(): # while budget is left and successes are few
for problem in suite: # loop takes 2-3 minutes x budget_multiplier
if repeater.done(problem):
continue # skip this problem
# generate data for cocopp
problem.observe_with(observer) * [0]) # for better comparability
problem(problem.dimension = fmin(problem, repeater.initial_solution_proposal(problem),
xopt =False)
disp# make sure the returned solution is evaluated
problem(xopt) # track evaluations and final_target_hit
repeater.track(problem) # show progress
minimal_print(problem)
### post-process data
+ ' bfgs!'); # re-run folders look like "...-001" etc cocopp.main(observer.result_folder
The benchmarking data are written to a subfolder in the exdata folder (see also here). The last line post-processes the obtained data and compares the result with BFGS (see also Getting started—Analysing and comparing results (postprocess)). The resulting figures and tables can be browsed from the file ./ppdata/index.html.
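Which comparison data are available can be queried from within cocopp; a short sketch (the 'bfgs' substring is only an example, output omitted, and the archives attribute is assumed to be present in the installed cocopp version):

>>> import cocopp
>>> cocopp.archives.bbob.find('bfgs')  # names of archived bbob data sets matching 'bfgs'
...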
For running the benchmarking in several batches, a somewhat expanded example, example_experiment_complete.py, can be found in this subfolder.
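The underlying batching idea can also be hand-rolled into the simple loop above: each batch processes only every number_of_batches-th problem and is run as a separate process. A sketch, reusing suite, observer and fmin from the oversimplistic example, where current_batch and number_of_batches are hypothetical inputs (the downloaded example provides this functionality for you):

current_batch, number_of_batches = 1, 4  # hypothetical: run the first of four batches
for index, problem in enumerate(suite):
    if index % number_of_batches != current_batch - 1:
        continue  # this problem belongs to another batch
    problem.observe_with(observer)  # generates the data for cocopp
    fmin(problem, problem.initial_solution, disp=False)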
Benchmarking “your own” algorithm
For benchmarking another algorithm,

1. copy-paste the above code or (preferably) download the linked batching example example_experiment_complete.py and the mkl_bugfix.py file
2. revise the lines following ### input, in particular
   - reassign fmin to run the desired algorithm (see also the sketch after this list)
   - adapt the calling code fmin(problem, ...) respectively
   - some algorithms may need a maximal budget, like problem.dimension * algorithm_budget_multiplier, as input parameter
3. expect to iterate over the below points many times and start with a low budget to reduce waiting time! For first experiments, the suite can be filtered to contain fewer instances and dimensions, like suite = cocoex.Suite(suite_name, "instances: 1-5", "dimensions: 2,3,5,10,20") (for the bbob suite)
4. execute the file from a system shell like
   python example_experiment_complete.py 1  # under *nix, consider using "nohup nice ..."
   or from an ipython shell or notebook like
   %run example_experiment_complete.py 1
   where the "1" is used as the above budget_multiplier when using example_experiment_complete.py
5. inspect the figures (and tables) generated by cocopp.main, see also Getting started—Analysing and comparing results (postprocess)
6. when using the downloaded example, ideally add an algorithm to compare with in the call to cocopp.main as shown above (last line). The choice depends on the used test suite and preference
7. repeat from 4. with an increased budget_multiplier argument
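As a sketch for point 2., fmin can be reassigned to a small wrapper that gives another optimizer the fmin(problem, x0, disp=False) calling convention used in the examples above. Nelder-Mead via scipy.optimize.minimize serves here only as a stand-in for "your own" algorithm, and algorithm_budget_multiplier is a hypothetical parameter:

import scipy.optimize

algorithm_budget_multiplier = 100  # hypothetical; distinct from the experimentation budget

def fmin(problem, x0, disp=False):
    """run the algorithm to be benchmarked with the calling convention used above"""
    result = scipy.optimize.minimize(
        problem, x0, method='Nelder-Mead',
        options={'maxfev': problem.dimension * algorithm_budget_multiplier,
                 'disp': disp})
    return result.x  # the returned solution is (re-)evaluated in the benchmarking loop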
Practical hints
Benchmarking is usually an iterative process of repeatedly running experiments and inspecting data. The recommended approach is to successively increase the above budget_multiplier, starting with short(er) experiments (which at first take only minutes, then hours, then…), and to correct for bugs or setup inaccuracies on the way by carefully inspecting the resulting data.
Parameters that influence the termination conditions of the algorithm may have a significant impact on the performance results. It is advisable to check whether these parameters are appropriate for the test suite and the experimental conditions. More specifically,
- check the termination status of the algorithm, at least on a random test basis (see the sketch after this list)
- termination is usually advisable when the algorithm has stalled for some period of time. Long stagnations in the empirical runtime distributions indicate too late termination, whereas a still increasing distribution graph right before the budget is exhausted indicates too early termination.[1]
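For scipy.optimize.fmin, as used above, the termination status can be spot-checked from its full_output return values; a sketch, assuming the imports and the problem loop of the examples above:

xout, fout, iterations, funcalls, warnflag = scipy.optimize.fmin(
    problem, problem.initial_solution, full_output=True, disp=False)
if warnflag:  # 1: maximum function evaluations, 2: maximum iterations reached
    print(problem.id, 'terminated on its evaluation or iteration limit')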
The above budget_multiplier for the experiment is ideally not used as input budget to the algorithm, because it is meant to be interpreted as an experimental condition and not as an algorithm parameter. For algorithms without any other termination criteria, however, it may be used. When the experimental budget is smaller than the budget that the algorithm uses, the latter will determine the outcome. When the algorithm terminates earlier unsuccessfully, another experiment sweep is invoked by repeater.
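For an algorithm that accepts an evaluation budget, such as scipy.optimize.fmin with its maxfun argument, passing the experimental budget on could look like this (a sketch based on the practical example above):

xopt = fmin(problem, repeater.initial_solution_proposal(problem),
            maxfun=problem.dimension * budget_multiplier, disp=False)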
In order to get some results as quickly as possible, and besides running batches in parallel, it can be useful to omit in the first experiments some instances and the largest dimension of the suite (in exchange for a larger budget_multiplier), using the suite parameters options. Running only the first 5 instances and passing min_successes=5 to ExperimentRepeater has been found to be a useful practical approach.
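In code, these two measures could look like this (a sketch that assumes the ExperimentRepeater shown in the practical example accepts a min_successes keyword argument):

suite = cocoex.Suite(suite_name, "instances: 1-5", "")  # first 5 instances only
repeater = cocoex.ExperimentRepeater(budget_multiplier, min_successes=5)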
Footnotes
[1] The budget crosses in the figures indicate the actually used budget (median), which can be (quite) different from the budget given here.