Usage
datasets.json
The root element is a dictionary with the following fields:
- ``author``: string containing the name of the author
- ``year``: int containing the year
- ``install_dir``: string containing the install directory
- ``datasets``: list of dataset objects
- ``decompress_path``: the path where files are decompressed
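
As a rough illustration, the root dictionary could look like the following when built in Python. All values (author name, year, paths) are hypothetical placeholders and the ``datasets`` list is left empty; only the field names come from the description above. The type expected for ``decompress_path`` (a string) and the helper ``check_root`` are assumptions made for this sketch.

```python
import json

# Hypothetical root element of datasets.json: every value is a placeholder,
# only the field names are taken from the documentation.
root = {
    "author": "Jane Doe",
    "year": 2020,
    "install_dir": "/path/to/install_dir",
    "datasets": [],
    "decompress_path": "/tmp/decompressed",
}

# Expected type of each root field (the type of decompress_path is assumed).
EXPECTED_FIELDS = {
    "author": str,
    "year": int,
    "install_dir": str,
    "datasets": list,
    "decompress_path": str,
}


def check_root(root):
    """Return the names of root fields that are missing or have an unexpected type."""
    problems = []
    for name, expected_type in EXPECTED_FIELDS.items():
        if name not in root or not isinstance(root[name], expected_type):
            problems.append(name)
    return problems


print(json.dumps(root, indent=2))
print(check_root(root))
```

An empty list from ``check_root`` means that all five documented fields are present with the assumed types.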
Definitions
Each dataset is described by a JSON file. Each dataset has the following fields:
- ``ensemble``: ``true`` if the dataset contains multiple instruments, ``false`` otherwise
- ``groups``: list of strings representing the groups contained in this dataset; the default name ``all`` must always be present
- ``instruments``: the list of the instruments contained in the dataset
- ``sources``:
  - ``format``: the format of the audio recordings of the single source-separated tracks
- ``recording``:
  - ``format``: the format of the audio recordings of the mixed tracks
- ``ground_truth``: N.B. each ground_truth has an ``int`` value, indicating ``0`` -> false, ``1`` -> true (manual or mechanical - Disklavier - annotation), ``2`` -> true (automatic annotation with state-of-the-art algorithms)
  - ``[group-name]``: a dictionary representing the ground truth contained by each dataset group
    - ``misaligned``: if artificially misaligned scores are provided
    - ``score``: if original scores are provided
    - ``broad_alignment``: if broad_alignment scores are provided
    - ``precise_alignment``: if precisely aligned scores are provided
    - ``velocities``: if velocities are provided
    - ``f0``: if f0 values are provided
    - ``sustain``: if sustain values are provided
    - ``soft``: if soft-pedal values are provided
    - ``sostenuto``: if sostenuto-pedal values are provided
- ``songs``: the list of songs in the dataset
  - ``composer``: the composer family name
  - ``instruments``: list of instruments in the song
  - ``recording``: dictionary
    - ``path``: a list of paths to be mixed for reconstructing the full track (usually only one)
  - ``sources``: dictionary
    - ``path``: a list of paths to the single instrument tracks in the same order as ``instruments``
  - ``ground_truth``: list of paths to the ground_truth JSON files. One ground_truth path per instrument is always provided. The order of the ground_truth paths is the same as that of ``sources`` and ``instruments``. Note that some ground_truth paths can be identical (as in PHENICX, to indicate that violin1 and violin2 are playing exactly the same thing).
  - ``groups``: list of strings representing the groups the song belongs to. The group ``all`` must always be there; any other string is possible and should be exposed in the ``groups`` field at dataset level
- ``install``: where information for the installation process is stored
  - ``url``: the URL to download the dataset, including the protocol
  - ``post-process``: a list of shell commands to be executed to prepare the dataset; they can be lists themselves to allow the use of references to the installation directory with the syntax ``&install_dir``: every occurrence of ``&install_dir`` will be replaced with the value of ``install_dir`` in ``datasets.json``; the final slash doesn't matter
  - ``unpack``: ``true`` if the URL needs to be unpacked (untar, unzip, …)
  - ``login``: ``true`` if a login is needed - not used anymore, but maybe useful in the future
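
The ``&install_dir`` substitution described for ``post-process`` can be sketched as follows. This is only one possible reading of the description: ``expand_command`` is a hypothetical helper, a command given as a list is assumed to be concatenated into a single string, and a trailing slash in ``install_dir`` is assumed to be dropped so that both spellings behave the same. The URL and paths are placeholders.

```python
def expand_command(command, install_dir):
    """Build a shell command string, replacing every `&install_dir` occurrence.

    Assumptions (not confirmed by the documentation): a command given as a
    list is joined into one string, and a trailing slash in `install_dir`
    is removed before the substitution.
    """
    if isinstance(command, (list, tuple)):
        command = "".join(command)
    # Keep "/" itself intact if install_dir is the filesystem root.
    normalized = install_dir.rstrip("/") or "/"
    return command.replace("&install_dir", normalized)


# Hypothetical `install` section of a dataset definition.
install_section = {
    "url": "https://example.com/dataset.zip",
    "post-process": [
        "echo done",
        ["mv ", "&install_dir", "/raw ", "&install_dir", "/audio"],
    ],
    "unpack": True,
    "login": False,
}

commands = [
    expand_command(command, "/data/datasets/")
    for command in install_section["post-process"]
]
print(commands)
```

The sketch only builds the command strings; how they are then run (for example through ``subprocess``) is left out on purpose.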
In general, I maintained the following principles:
- if a list of files is provided where you would logically expect one file, you should ‘sum’ the files in the list, whatever this means according to that type of file; this typically happens in the ``ground_truth`` files, or in the recording where only the single sources are available.
- all the fields can have the value ‘unknown’ to indicate that it is not available in that dataset; if you treat ‘unknown’ with the meaning of unavailable everything will be fine; however, in some cases it can mean that the data are available but that information is not documented.
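
A minimal sketch of how the ‘unknown’ convention and the ``0``/``1``/``2`` ground_truth flags might be handled together when reading a dataset definition. The helper ``describe_ground_truth`` and the example dataset are hypothetical, and the nesting ``ground_truth -> group name -> ground-truth type`` is an interpretation of the field list, not a confirmed schema.

```python
# Meaning of the integer flags, as stated in the field description.
GROUND_TRUTH_KINDS = {
    0: "not available",
    1: "manual or mechanical annotation",
    2: "automatic annotation",
}


def describe_ground_truth(dataset, group="all"):
    """Map each ground-truth type of a group to a readable description.

    The value 'unknown' is treated as 'not available', following the
    principle that 'unknown' can safely be read as unavailable.
    """
    description = {}
    for name, flag in dataset["ground_truth"][group].items():
        if flag == "unknown":
            description[name] = GROUND_TRUTH_KINDS[0]
        else:
            description[name] = GROUND_TRUTH_KINDS[flag]
    return description


# Hypothetical fragment of a dataset definition.
dataset = {
    "ground_truth": {
        "all": {
            "score": 1,
            "precise_alignment": 2,
            "f0": 0,
            "velocities": "unknown",
        }
    }
}

print(describe_ground_truth(dataset))
```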
Ground-truth JSON format
The ground truth is contained in JSON files indexed in each definition file. Each ground-truth file describes only one instrument, in a dictionary with the following structure:
- ``score``:
  - ``onsets``: onsets in seconds; if BPM is not available, timings are computed using 60 BPM
  - ``offsets``: offsets in seconds; if BPM is not available, timings are computed using 60 BPM
  - ``pitches``: list of MIDI pitches in onset ascending order and range [0-127]
  - ``notes``: list of note names in onset ascending order
  - ``velocities``: list of velocities in onset ascending order and range [0-127]
  - ``beats``: list of times at which there was a beat in the original score; use this to reconstruct instant BPM
- ``misaligned``:
  - ``onsets``: onsets in seconds
  - ``offsets``: offsets in seconds
  - ``pitches``: list of MIDI pitches in onset ascending order and range [0-127]
  - ``notes``: list of note names in onset ascending order
  - ``velocities``: list of velocities in onset ascending order and range [0-127]
- ``precise_alignment``:
  - ``onsets``: onsets in seconds
  - ``offsets``: offsets in seconds
  - ``pitches``: list of MIDI pitches in onset ascending order and range [0-127]
  - ``notes``: list of note names in onset ascending order
  - ``velocities``: list of velocities in onset ascending order and range [0-127]
- ``broad_alignment``: alignment which does not consider the asynchronies between simultaneous notes
  - ``onsets``: onsets in seconds
  - ``offsets``: offsets in seconds
  - ``pitches``: list of MIDI pitches in onset ascending order and range [0-127]
  - ``notes``: list of note names in onset ascending order
  - ``velocities``: list of velocities in onset ascending order and range [0-127]
- ``missing``: list of boolean values indicating which notes are missing in the score (i.e. notes that you can consider as being played but not in the score); use this value to mask the performance/score
- ``extra``: list of boolean values indicating which notes are extra in the score (i.e. notes that you can consider as not being played but in the score); use this value to mask the performance/score
- ``f0``: list of f0 frequencies, frame by frame; the duration of each frame should be 46 ms with a hop of 10 ms
- ``sustain``:
  - ``values``: list of sustain changes; each value is a number between 0 and 127, where values < 63 mean sustain OFF and values >= 63 mean sustain ON, but intermediate values can be used (e.g. for half-pedaling)
  - ``times``: list of floats representing the time of each sustain change in seconds
- ``soft``:
  - ``values``: list of soft-pedal changes; each value is a number between 0 and 127, where values < 63 mean soft pedal OFF and values >= 63 mean soft pedal ON, but intermediate values can be used (e.g. for half-pedaling)
  - ``times``: list of floats representing the time of each soft-pedal change in seconds
- ``sostenuto``:
  - ``values``: list of sostenuto-pedal changes; each value is a number between 0 and 127, where values < 63 mean sostenuto pedal OFF and values >= 63 mean sostenuto pedal ON, but intermediate values can be used (e.g. for half-pedaling)
  - ``times``: list of floats representing the time of each sostenuto-pedal change in seconds
- ``instrument``: General MIDI program number associated with this instrument, starting from 0. 128 indicates a drum kit (should be synthesized on channel 8 with a program number of your choice, usually 0). 255 indicates no instrument specified.
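
Two of the fields above lend themselves to a short sketch: reconstructing the instant BPM from ``beats`` and reading pedal ``values`` as ON/OFF states. Both helpers are hypothetical. The BPM is assumed here to be 60 divided by the interval between consecutive beats, which is one plausible way of "reconstructing" it, and the pedal helper applies the documented threshold of 63 while ignoring half-pedaling. The ground-truth fragment is made up.

```python
def instant_bpm(beats):
    """Estimate the BPM between each pair of consecutive beat times (in seconds).

    Pairs with a non-positive interval are skipped to avoid dividing by zero.
    """
    return [60.0 / (b - a) for a, b in zip(beats, beats[1:]) if b > a]


def pedal_is_on(values, threshold=63):
    """Turn pedal values (0-127) into booleans: values >= threshold mean ON."""
    return [value >= threshold for value in values]


# Hypothetical fragment of a ground-truth dictionary.
ground_truth = {
    "score": {"beats": [0.0, 0.5, 1.0, 2.0]},
    "sustain": {"values": [0, 100, 62, 63], "times": [0.0, 1.2, 2.5, 3.0]},
}

print(instant_bpm(ground_truth["score"]["beats"]))
print(pedal_is_on(ground_truth["sustain"]["values"]))
```

The same ``pedal_is_on`` helper would apply unchanged to the ``soft`` and ``sostenuto`` values, since they share the 63 threshold.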
Note that the JSON ground_truth files have the extension .json.gz,
indicating that they are compressed using the gzip Python
module. Thus, you need to decompress them:

```python
import gzip
import json

# Open the gzip-compressed file in text mode and parse its JSON content
with gzip.open('ground_truth.json.gz', 'rt') as f:
    ground_truth = json.load(f)
print(ground_truth)
```
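
Once a file is loaded, the parallel lists of one alignment level can be combined into per-note tuples. The sketch below first writes a small, entirely made-up ground-truth file to a temporary directory so that it runs on its own; ``load_ground_truth`` and ``notes_of`` are hypothetical helper names, not part of any documented API.

```python
import gzip
import json
import os
import tempfile


def load_ground_truth(path):
    """Load a gzip-compressed JSON ground-truth file."""
    with gzip.open(path, "rt") as f:
        return json.load(f)


def notes_of(ground_truth, level="precise_alignment"):
    """Return (onset, offset, pitch) tuples for one alignment level."""
    data = ground_truth[level]
    return list(zip(data["onsets"], data["offsets"], data["pitches"]))


# Made-up ground truth with two notes, used only for this illustration.
example = {
    "precise_alignment": {
        "onsets": [0.0, 0.5],
        "offsets": [0.5, 1.0],
        "pitches": [60, 64],
    },
    "instrument": 0,
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "ground_truth.json.gz")
    with gzip.open(path, "wt") as f:
        json.dump(example, f)
    loaded = load_ground_truth(path)

notes = notes_of(loaded)
print(notes)
```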