How to Publish Your Data Within COCO

There are two ways to make data easily accessible to the community: proposing its inclusion into the COCO data archive, or hosting an archive yourself. Both are described below.

In both cases, first, the data need to be prepared. For this, for each dataset (that is, each benchmarked algorithm variant):

  1. Zip the data folder (see the sketch after this list).
    A data zipfile contains a single folder under which all data from a single full experiment was collected. The folder can contain subfolders (or subsub…folders), for example with data from different (sub)batches of the complete experiment. Valid formats are .gzip, .tgz, and .zip.
  2. Rename the zipfile.
    The name of the zipfile defines the name of the dataset. The name should represent the benchmarked algorithm and may contain the authors' names (but preferably not the name of the test suite). The name can have any length, but the first ten-or-so characters should be a meaningful algorithm abbreviation.
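
As an illustrative sketch (not part of COCO itself), both steps can be done at once with the Python standard library; the folder exdata/my_experiment and the dataset name MY-ALGORITHM below are placeholders:

import shutil

# create MY-ALGORITHM.zip, containing the single folder my_experiment/
# that sits under the local directory exdata/
shutil.make_archive('MY-ALGORITHM', 'zip', root_dir='exdata', base_dir='my_experiment')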

Propose Inclusion to the COCO Data Archive¹

This option is available if one or several datasets were used in a publication or in a preprint available, for example, on arXiv or HAL. For this:

  1. Upload the above data zipfile(s) to a file sharing site or to an accessible URL.
  2. Ask for the inclusion into cocopp.archives by opening an issue on GitHub with
    • the publication reference and a link to the paper
    • a very short description of each dataset including the name of
      • the algorithm
      • the test suite
      • the zip file
    • a link to the dataset zip file(s)
    • (optional but encouraged) a link to the source code to reproduce the dataset

Host an Archive

Hosting an archive means putting one or several data zipfiles, together with an added “archive definition text file”, online in a dedicated folder that can be accessed under a URL, like http://lq-cma.gforge.inria.fr/data-archives/lq-gecco2019. For example, any folder under a personal homepage root will do.

For this:

  1. Move the above data zipfile(s) into a clean folder, possibly with subfolders.

    The folder name is only used as part of the URL and can be changed after creating the archive. If desired, subfolders can be created whose names become part of the names of the datasets they contain. These cannot be changed without repeating the following creation procedure.

  2. Create the archive (two lines of Python code).

    Assume the data zipfiles are in the folder elisa_2020 or its subfolders and cocopp is installed (pip install cocopp). In a Python shell, it suffices to type:

    import cocopp
    cocopp.archiving.create('elisa_2020')

    thereby “creating” the archive locally by adding an archive definition file to the folder elisa_2020. Archives can contain other archives as subfolders or, the other way around, additional subarchives can be created in any archive subfolder. This is how https://numbbo.github.io/data-archive/ is organized.

    Alternative code (from a system shell):

    python -c "import cocopp; cocopp.archiving.create('elisa_2020')"
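
    As mentioned above, additional subarchives can be created in archive subfolders in the same way; the following is a minimal sketch, where the subfolder name 2024-data is only a placeholder:

    import cocopp
    cocopp.archiving.create('elisa_2020')            # the top-level archive
    cocopp.archiving.create('elisa_2020/2024-data')  # an optional subarchive in a subfolder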

  3. Upload the archive folder and its content to where it can be accessed via a URL. The archive is now accessible with cocopp.archiving.get('URL') (see the example below).

  4. Open an issue at the GitHub repository of COCO (you need to have a GitHub account), stating the URL of the archive together with a short description of the dataset(s) in the archive.

Example of a resulting archive

For example, the bbob-mixint archive on GitHub contains five datasets. The folder structure for these datasets looks like this:

bbob-mixint/
|-- 2019-gecco-benchmark/
|   |-- CMA-ES-pycma.tgz
|   |-- DE-scipy.tgz
|   |-- RANDOMSEARCH.tgz
|   `-- TPE-hyperopt.tgz
|-- 2022/
|   `-- CMA-ESwM_Hamano.tgz
`-- coco_archive_definition.txt

and the corresponding coco_archive_definition.txt file looks like

[('2019-gecco-benchmark/CMA-ES-pycma.tgz', '0d8e7f2c77f4e43176bc9424ee8f9a0bfe8e7f66fabc95b15ea7a56ad8b1d667', 38514), 
 ('2019-gecco-benchmark/DE-scipy.tgz', '494483b1bce9185f8977ce9abf6f6eac3a660efd6fa09321e305dfb79296cd18', 35401), 
 ('2019-gecco-benchmark/RANDOMSEARCH.tgz', '14b237093fd1f393871c578b6b28b6f9a6c3d8dc8921e3bdb024b3cc7cdd287d', 26006), 
 ('2019-gecco-benchmark/TPE-hyperopt.tgz', '34fede46a00c8adef4c388565c3b759c07a7d7d83366e115632b407764e64bf6', 19633), 
 ('2022/CMA-ESwM_Hamano.tgz', 'caaf35f552822bc8376716c6af9f41aaceeebc1e63fece386fa12929c53338ca', 16406)]

with a hash code and the file size as additional entries for each data zipfile.
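
Since the definition file is a plain Python literal, it can also be read programmatically. Below is a minimal sketch (not part of cocopp) that checks locally present zipfiles against the recorded entries; treating the hash as SHA-256 and the size as a byte count are assumptions based on the format shown above.

import ast
import hashlib
import os

with open('bbob-mixint/coco_archive_definition.txt') as file:
    entries = ast.literal_eval(file.read())  # a list of (filename, hash, size) tuples

for name, digest, size in entries:
    path = os.path.join('bbob-mixint', name)
    if not os.path.isfile(path):
        continue  # this zipfile is not present locally
    with open(path, 'rb') as data:
        hash_ok = hashlib.sha256(data.read()).hexdigest() == digest  # assumed SHA-256
    size_ok = os.path.getsize(path) == size  # assumed to be the size in bytes
    print(name, 'hash ok:', hash_ok, 'size ok:', size_ok)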

Using a self-hosted archive in cocopp

Here’s an example of how to use a self-hosted archive with the cocopp package for postprocessing.

import cocopp

url = 'http://lq-cma.gforge.inria.fr/data-archives/lq-gecco2019'
arch = cocopp.archiving.get(url)
print(arch)  # `arch` "is" a `list` of relative filenames
['CMA-ES__2019-gecco-surr.tgz',
 'SLSQP+CMA_2019-gecco-surr.tgz',
 'SLSQP-11_2019-gecco-surr.tgz',
 'lq-CMA-ES_2019-gecco-surr.tgz']

# compare local result with data from lq-cma archive
# and from the cocopp.archives.bbob archive
cocopp.main([# 'exdata/my_local_results',  # in case
    arch.get('SLSQP-11'),  # downloads if necessary
    cocopp.archives.bbob.get_first('2010/IPOP-CMA'),
    arch.get('CMA-ES_2019')])

Footnotes

  1. Currently requires a GitHub account. If this is an issue for you, please contact one of the BBOBies via email.