ASMD: Audio-Score Meta Dataset¶
Installation¶
I suggest cloning this repo and using it with python >= 3.6. If you
need to use it in multiple projects (or folders), just clone the code and
fix the ``install_dir`` in ``datasets.json``, so that you can keep only
one copy of the huge datasets.
The following describes how to install the dependencies needed to use the dataset API. I suggest using poetry to manage different versions of python and virtual environments with an efficient dependency resolver.
During the installation, the provided ground-truth will be extracted; however, you can recreate them from scratch for tweaking parameters. The next section will explain how you can achieve this.
The easy way¶
- ``pip install asmd``
- Install ``wget`` if you want the SMD dataset (the next release will remove this dependency)
- Run ``python -m asmd.install`` and follow the steps
The hard way (if you want to contribute)¶
Once you have cloned the repo, follow these steps:
Install poetry, pyenv and python¶
- Install ``wget`` if you want the SMD dataset (the next release will remove this dependency)
- Install python 3
- Install poetry
- Install pyenv and fix your ``.bashrc`` (optional)
- ``pyenv install 3.6.9`` (optional, recommended python >= 3.6.9)
- ``poetry new myproject``
- ``cd myproject``
- ``pyenv local 3.6.9`` (optional, recommended python >= 3.6.9)
Setup new project or testing¶
- ``git clone https://gitlab.di.unimi.it/federicosimonetta/asmd/``
- ``poetry add asmd/``
- Execute ``poetry run python -m asmd.install``; alternatively, run ``poetry shell`` and then ``python -m asmd.install``
- Follow the steps

Now you can start developing in the parent directory (``myproject``) and you can use ``from asmd import audioscoredataset as asd``. Use ``poetry`` to manage the packages of your project.
Alternative way¶
- Clone the project.
- To build the modules in place, run ``poetry run python setup.py build_ext --inplace``
- Create an ad-hoc directory for testing anywhere:
  - copy there the original ``pyproject.toml``
  - install the needed dependencies with ``poetry update``
  - export the ``PYTHONPATH`` environment variable, e.g. ``export PYTHONPATH="path/to/asmd"``
You’re now ready to use ASMD without downloading it from PyPI.
To create the python package, just run ``poetry build``.
Reproduce from scratch¶
To recreate the ground truth in our format you have to convert the annotations
using the script ``generate_ground_truth.py``.
N.B. You should have ``wget`` installed in your system, otherwise the SMD dataset can't be downloaded.
You can run the script with python 3. You can also skip the already
existing datasets by using the ``--blacklist`` and ``--whitelist``
arguments. If you do this, their ground truth will not be added to the final
archive; thus, remember to back up the previous one and to merge the archives.
Generate misaligned data¶
If you want, you can generate misaligned data using the ``--train`` and
``--misalign`` options of ``generate_ground_truth.py``. It will run
``alignment_stats.py``, which collects data about the datasets with real
non-aligned scores and saves the stats in an ``_alignment_stats.pkl`` file in the ASMD
module directory. Then, it runs ``generate_ground_truth.py`` using the collected
statistics: it will generate misaligned data by using the same deviation
distribution as the available non-aligned data.
Note that misaligned data should be annotated as ``2`` in the ``ground_truth``
value of the dataset group descriptions (see ASMD: Audio-Score Meta Dataset), otherwise no
misaligned value will be added to the ``misaligned`` field. Moreover, the
dataset group data should have ``precise_alignment`` or ``broad_alignment`` filled
by the annotation conversion step, otherwise errors can be raised during the
misalignment procedure.
For more info, see ``python -m asmd.generate_ground_truth -h``.
A usual pipeline is:
- Generate music score data and all other ground truth except the artificial one: ``python -m asmd.generate_ground_truth --normal``
- Train a statistical model (you can skip this): ``python -m asmd.generate_ground_truth --train``
- Generate misalignments using the trained model (trains it if not available): ``python -m asmd.generate_ground_truth --misalign``
Usage¶
datasets.json¶
The root element is a dictionary with the following fields:
- ``author``: string containing the name of the author
- ``year``: int containing the year
- ``install_dir``: string containing the install directory
- ``datasets``: list of dataset objects
- ``decompress_path``: the path where files are decompressed
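A minimal sketch of such a file might look like the following; all values here are hypothetical placeholders, not the shipped defaults:

```json
{
  "author": "...",
  "year": 2021,
  "install_dir": "/path/to/installed/datasets",
  "decompress_path": "/tmp/asmd_decompress",
  "datasets": []
}
```

In the real file, ``datasets`` contains the dataset objects described in the next section.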
Definitions¶
Each dataset is described by a JSON definition file. Each dataset has the following fields:
- ``ensemble``: ``true`` if the dataset contains multiple instruments, ``false`` otherwise
- ``groups``: list of strings representing the groups contained in this dataset; the default name ``all`` must always be present
- ``instruments``: the list of the instruments contained in the dataset
- ``sources``:
  - ``format``: the format of the audio recordings of the single source-separated tracks
- ``recording``:
  - ``format``: the format of the audio recordings of the mixed tracks
- ``ground_truth``: N.B. each ground_truth entry has an ``int`` value, indicating ``0`` -> false, ``1`` -> true (manual or mechanical - Disklavier - annotation), ``2`` -> true (automatic annotation with state-of-the-art algorithms)
  - ``[group-name]``: a dictionary representing the ground truth contained in each dataset group
    - ``misaligned``: if artificially misaligned scores are provided
    - ``score``: if original scores are provided
    - ``broad_alignment``: if broad-alignment scores are provided
    - ``precise_alignment``: if precisely aligned scores are provided
    - ``velocities``: if velocities are provided
    - ``f0``: if f0 values are provided
    - ``sustain``: if sustain-pedal values are provided
    - ``soft``: if soft-pedal values are provided
    - ``sostenuto``: if sostenuto-pedal values are provided
- ``songs``: the list of songs in the dataset
  - ``composer``: the composer's family name
  - ``instruments``: list of instruments in the song
  - ``recording``: dictionary
    - ``path``: a list of paths to be mixed for reconstructing the full track (usually only one)
  - ``sources``: dictionary
    - ``path``: a list of paths to the single instrument tracks, in the same order as ``instruments``
  - ``ground_truth``: list of paths to the ground_truth JSON files. One ground_truth path per instrument is always provided. The order of the ground_truth paths is the same as in sources and instruments. Note that some ground_truth paths can be identical (as in PHENICX, for indicating that violin1 and violin2 are playing exactly the same thing).
  - ``groups``: list of strings representing a group of the dataset. The group ``all`` must always be there; any other string is possible and should be exposed in the ``groups`` field at dataset level
- ``install``: where information for the installation process is stored
  - ``url``: the url to download the dataset, including the protocol
  - ``post-process``: a list of shell commands to be executed to prepare the dataset; they can be lists themselves to allow the use of references to the installation directory with the syntax ``&install_dir``: every occurrence of ``&install_dir`` will be replaced with the value of ``install_dir`` in ``datasets.json``; a final slash doesn't matter
  - ``unpack``: ``true`` if the url needs to be unpacked (untar, unzip, …)
  - ``login``: ``true`` if a login is needed - not used anymore, but maybe useful in the future
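Putting the fields above together, a minimal hypothetical definition might look like the following; name, paths and URL are invented for illustration:

```json
{
  "name": "MyDataset",
  "ensemble": false,
  "instruments": ["piano"],
  "groups": ["all"],
  "sources": { "format": "unknown" },
  "recording": { "format": "wav" },
  "ground_truth": {
    "all": {
      "misaligned": 0, "score": 1, "broad_alignment": 0,
      "precise_alignment": 2, "velocities": 2, "f0": 0,
      "sustain": 0, "soft": 0, "sostenuto": 0
    }
  },
  "songs": [
    {
      "composer": "Mozart",
      "instruments": ["piano"],
      "recording": { "path": ["MyDataset/song1/mix.wav"] },
      "sources": { "path": ["MyDataset/song1/piano.wav"] },
      "ground_truth": ["MyDataset/song1/gt-0.json.gz"],
      "groups": ["all"]
    }
  ],
  "install": {
    "url": "https://example.com/mydataset.zip",
    "post-process": [],
    "unpack": true,
    "login": false
  }
}
```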
In general, I maintained the following principles:
- If a list of files is provided where you would logically expect one file, you should 'sum' the files in the list, whatever this means for that type of file; this typically happens in the ``ground_truth`` files, or in the recording when only the single sources are available.
- All the fields can have the value 'unknown' to indicate that the information is not available in that dataset; if you treat 'unknown' as meaning unavailable, everything will be fine; however, in some cases it can mean that the data are available but that information is not documented.
Ground-truth json format¶
The ground truth is contained in JSON files indexed in each definition file. Each ground truth file contains only one instrument, in a dictionary with the following structure:
- ``score``:
  - ``onsets``: onsets in seconds; if BPM is not available, timings are computed using 60 BPM
  - ``offsets``: offsets in seconds; if BPM is not available, timings are computed using 60 BPM
  - ``pitches``: list of midi pitches in onset-ascending order and range [0, 127]
  - ``notes``: list of note names in onset-ascending order
  - ``velocities``: list of velocities in onset-ascending order and range [0, 127]
  - ``beats``: list of times at which there was a beat in the original score; use this to reconstruct the instant BPM
- ``misaligned``:
  - ``onsets``: onsets in seconds
  - ``offsets``: offsets in seconds
  - ``pitches``: list of midi pitches in onset-ascending order and range [0, 127]
  - ``notes``: list of note names in onset-ascending order
  - ``velocities``: list of velocities in onset-ascending order and range [0, 127]
- ``precise_alignment``:
  - ``onsets``: onsets in seconds
  - ``offsets``: offsets in seconds
  - ``pitches``: list of midi pitches in onset-ascending order and range [0, 127]
  - ``notes``: list of note names in onset-ascending order
  - ``velocities``: list of velocities in onset-ascending order and range [0, 127]
- ``broad_alignment``: alignment which does not consider the asynchronies between simultaneous notes
  - ``onsets``: onsets in seconds
  - ``offsets``: offsets in seconds
  - ``pitches``: list of midi pitches in onset-ascending order and range [0, 127]
  - ``notes``: list of note names in onset-ascending order
  - ``velocities``: list of velocities in onset-ascending order and range [0, 127]
- ``missing``: list of boolean values indicating which notes are missing in the score (i.e. notes that you can consider as being played but not in the score); use this value to mask the performance/score
- ``extra``: list of boolean values indicating which notes are extra in the score (i.e. notes that you can consider as not being played but present in the score); use this value to mask the performance/score
- ``f0``: list of f0 frequencies, frame by frame; the duration of each frame should be 46 ms with a 10 ms hop.
- ``sustain``:
  - ``values``: list of sustain changes; each value is a number between 0 and 127, where values < 63 mean sustain OFF and values >= 63 mean sustain ON, but intermediate values can be used (e.g. for half-pedaling).
  - ``times``: list of floats representing the time of each sustain change in seconds.
- ``soft``:
  - ``values``: list of soft-pedal changes; each value is a number between 0 and 127, where values < 63 mean soft pedal OFF and values >= 63 mean soft pedal ON, but intermediate values can be used (e.g. for half-pedaling).
  - ``times``: list of floats representing the time of each soft pedal change in seconds.
- ``sostenuto``:
  - ``values``: list of sostenuto-pedal changes; each value is a number between 0 and 127, where values < 63 mean sostenuto pedal OFF and values >= 63 mean sostenuto pedal ON, but intermediate values can be used (e.g. for half-pedaling).
  - ``times``: list of floats representing the time of each sostenuto pedal change in seconds.
- ``instrument``: General MIDI program number associated with this instrument, starting from 0. 128 indicates a drum kit (should be synthesized on channel 8 with a program number of your choice, usually 0). 255 indicates no instrument specified.
Note that the ground_truth JSON files have extension ``.json.gz``,
indicating that they are compressed using the ``gzip`` Python
module. Thus, you need to decompress them:
import gzip
import json
ground_truth = json.load(gzip.open('ground_truth.json.gz', 'rt'))
print(ground_truth)
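As a worked example, the pedal encoding above (values < 63 = OFF, values >= 63 = ON) can be turned into pedal-ON time segments with a few lines. The ground-truth excerpt below is invented for illustration:

```python
def sustain_segments(sustain):
    """Return (start, end) pairs, in seconds, where the pedal is ON (value >= 63)."""
    segments = []
    start = None
    for value, time in zip(sustain["values"], sustain["times"]):
        if value >= 63 and start is None:
            start = time                     # pedal goes down
        elif value < 63 and start is not None:
            segments.append((start, time))   # pedal released
            start = None
    if start is not None:                    # pedal still down at the last event
        segments.append((start, sustain["times"][-1]))
    return segments

# Hypothetical excerpt from a decompressed ground-truth file
gt = {"sustain": {"values": [127, 0, 64, 90, 0],
                  "times": [0.5, 1.2, 2.0, 2.5, 3.1]}}
print(sustain_segments(gt["sustain"]))  # [(0.5, 1.2), (2.0, 3.1)]
```

The same function works unchanged for the ``soft`` and ``sostenuto`` fields, since they share the values/times structure.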
Adding new datasets¶
In order to add new datasets, you have to create the corresponding definition in a JSON file. The definitions can be in any directory, but you have to provide this path to the API and to the installation script (you will be asked for it, so you can't get it wrong).
The dataset files, instead, should be in the installation directory and the paths in the definition should not take into account the installation directory.
If you also want to add the new dataset to the installation procedure, you should:
- Provide a conversion function for the ground truth
- Add the conversion function with all its parameters to the JSON definition (section ``install>conversion``)
- Rerun the ``install.py`` and ``convert_gt.py`` scripts
Adding new definitions¶
The most important thing is that one ground-truth file is provided for each instrument.
If you want to add datasets to the installation procedure, taking advantage of
the artificial misalignment, add the paths to the files (ground truth, audio,
etc.), even if they do not exist yet, because ``convert_gt.py`` relies on
those paths to create the files. It is important to provide an index starting
with ``-`` at the end of the path (see the other sections as an example), so that
it is possible to distinguish among multiple instruments (for instance, PHENICX
provides one ground-truth file for all the violins of a song, even if there are
4 different violins). The index allows ``convert_gt`` to better handle
different files and to pick the wanted ground truth.
It is mandatory to provide a url, a name and so on. Also, provide a
composer and an instrument list. Please, do not use new words for
instruments that already exist (for instance, do not use ``saxophone`` if
``sax`` already exists in other datasets).
Provide a conversion function¶
Docs available at Utilities to convert from ground-truth
The conversion function takes as input the name of the file in the original dataset. You can also use the bundled conversion functions (see docs).
- Use ``deepcopy(gt)`` to create the output ground truth.
- Use the decorator ``@convert`` to provide the input file extensions and parameters.
You should consider three possible cases for creating the conversion function:
- There is a bijective relationship between instruments and the ground_truth files you have, that is, you already have one conversion file per instrument and you should just convert all of them (1-to-1 relationship)
- In your dataset, all the instruments are inside just one ground-truth file (n-to-1 relationship)
- Just one ground-truth file is provided and it is replicated for multiple instruments (one ground truth for all the violins, as if they were a single instrument, 1-to-n relationship)
Here is a brief description of how your conversion function should work to tackle these three different situations:
- In the 1st case, you can just output a list with only one dictionary.
- In the 2nd case, you can output a list with all the dictionaries inside it, in the same order as the ground-truth file paths you added to ``datasets.json``. The script will repeatedly convert them, and each time it will pick a different element of the list.
- In the 3rd case, you can still output a single-element list.

If you want to output a list with only one dict, you can also output the dict itself. The decorator will take care of handling file names and of putting the output dict inside a list.
Finally, you can also use multiple conversion functions if your ground truth is split among multiple files, but note that the final ground truth is produced as the sum of all the elements of all the dictionaries created.
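As a sketch of the 1-to-1 case, the converter below parses a hypothetical text format (one "onset offset pitch" triple per line) into the ground-truth structure. In real code you would import ``convert`` and ``prototype_gt`` from ``asmd.convert_from_file`` and decorate the function, e.g. with ``@convert(['.txt'])``; here a trimmed-down local prototype keeps the sketch self-contained:

```python
from copy import deepcopy

# Stand-in for asmd.convert_from_file.prototype_gt (trimmed down for the sketch)
prototype_gt = {
    "precise_alignment": {"onsets": [], "offsets": [], "pitches": [],
                          "notes": [], "velocities": []},
    "instrument": 255,
}

def from_my_format(input_fn):
    """Hypothetical converter: each line of input_fn is 'onset offset pitch'."""
    out = deepcopy(prototype_gt)   # never mutate the shared prototype
    with open(input_fn) as f:
        for line in f:
            onset, offset, pitch = line.split()
            out["precise_alignment"]["onsets"].append(float(onset))
            out["precise_alignment"]["offsets"].append(float(offset))
            out["precise_alignment"]["pitches"].append(int(pitch))
    return out                     # a bare dict is also accepted by @convert
```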
Add your function to the JSON definition¶
In the JSON definitions, you should declare the functions to be used
for converting the ground truth, together with their parameters. The section
where you can do this is ``install>conversion``.
Here, you should put a list like the following:
[
[
"module1.function1", {
"argument1_name": argument1_value,
"argument2_name": argument2_value
}
],
[
"module2.function2", {
"argument1_name": argument1_value,
"argument2_name": argument2_value
}
]
]
Note that you have to provide the name of the function, which will be
evaluated with the ``eval`` python function. You can use any
function in any module, including the bundled functions - in that case,
use just the function name without the module.
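The dispatch mechanism can be sketched as follows. This is a simplified illustration of how such a declaration might be applied, not ASMD's actual installer code; ``str.upper`` is just a stand-in for a real conversion function:

```python
# Hypothetical declaration, shaped like the install>conversion section above
conversion = [
    ["str.upper", {}],   # [dotted function name, keyword arguments]
]

def apply_conversions(conversion, value):
    """Resolve each declared name with eval() and apply it with its parameters."""
    out = value
    for func_name, params in conversion:
        func = eval(func_name)    # this is why the declared name must be eval-able
        out = func(out, **params)
    return out

print(apply_conversions(conversion, "abc"))  # ABC
```

Because the names are resolved with ``eval``, only declare functions from trusted definition files.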
Utilities to convert from ground-truth¶
-
asmd.convert_from_file.
_sort_lists
(*lists)[source]¶ Sort multiple lists in-place with reference to the first one
-
asmd.convert_from_file.
change_ext
(input_fn, new_ext, no_dot=False, remove_player=False)[source]¶ Return the input path input_fn with new_ext as extension and the part after the last '-' removed. If no_dot is True, it will not add a dot before the extension, otherwise it will add one if not present. remove_player can be used to remove the name of the player in the last part of the file name: use this for the traditional_flute dataset; it will remove the last part after '_'.
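The described behavior corresponds roughly to the following sketch (an approximate reimplementation for illustration, not the library code):

```python
import os

def change_ext_sketch(input_fn, new_ext, no_dot=False, remove_player=False):
    """Approximate the documented behavior of asmd.convert_from_file.change_ext."""
    root = os.path.splitext(input_fn)[0]
    if "-" in os.path.basename(root):
        root = root[:root.rfind("-")]     # drop the trailing '-<index>' part
    if remove_player and "_" in os.path.basename(root):
        root = root[:root.rfind("_")]     # drop the player name after the last '_'
    if not no_dot and not new_ext.startswith("."):
        new_ext = "." + new_ext           # add the dot only if not present
    return root + new_ext

print(change_ext_sketch("data/song_player-0.mid", "json.gz"))
# data/song_player.json.gz
```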
-
asmd.convert_from_file.
convert
(exts, no_dot=True, remove_player=False)[source]¶ This function is designed to be used as a decorator for functions which convert from a filetype to our JSON format.
Example of usage:
>>> @convert(['.myext'], no_dot=True, remove_player=False)
... def function_which_converts(...):
...     pass
Parameters: - exts (*) – the possible extensions of the ground truths to be converted, e.g. ['.mid', '.midi']. You can also use this parameter to remove exceeding parts at the end of the filename (see from_bach10_mat and from_bach10_f0 source code)
- no_dot (*) – if True, don't add a dot before the extension; if False, add it if not present; this is useful if you are using the extension to remove other parts in the file name (see exts).
- remove_player (*) – if True, remove the name of the player in the last part of the file name: use this for the traditional_flute dataset; it will remove the part after the last '_'.
-
asmd.convert_from_file.
from_bach10_f0
(nmat_fn, sources=range(0, 4))[source]¶ Open a matlab mat file nmat_fn in the MIREX format (Bach10) for frame evaluation and convert it to our ground_truth representation. This fills: f0. sources is an iterable containing the indices of the sources to be considered, where the first source is 0. Returns a list of dictionaries, one per source.
-
asmd.convert_from_file.
from_bach10_mat
(mat_fn, sources=range(0, 4))[source]¶ Open a matlab mat file mat_fn in the MIREX format (Bach10) and convert it to our ground_truth representation. This fills: precise_alignment, pitches. sources is an iterable containing the indices of the sources to be considered, where the first source is 0. Returns a list of dictionaries, one per source.
-
asmd.convert_from_file.
from_midi
(midi_fn, alignment='precise_alignment', pitches=True, velocities=True, merge=True, beats=False)[source]¶ Open a midi file midi_fn and convert it to our ground_truth representation. This fills velocities, pitches, beats, sustain, soft, sostenuto and the specified alignment (default: precise_alignment). Returns a list containing a dictionary. alignment can also be None or False; in that case, no alignment is filled. If merge is False, the returned list will contain a dictionary for each track. Beats are filled according to tempo changes.
This function is decorated with 3 different sets of parameters:
- from_midi is the decorated version with remove_player=False
- from_midi_remove_player is the decorated version with remove_player=True
- from_midi_asap is the decorated version which accepts the extension '.score.mid', used in the script to import scores from ASAP
N.B. To allow having some annotations for subgroups of a dataset, this function returns None when it cannot find the specified midi file; in this way, that file is not taken into account while merging the various annotations (e.g. the asap group inside the Maestro dataset)
-
asmd.convert_from_file.
from_midi_asap
(midi_fn, alignment='precise_alignment', pitches=True, velocities=True, merge=True, beats=False)¶ Open a midi file midi_fn and convert it to our ground_truth representation. This fills velocities, pitches, beats, sustain, soft, sostenuto and the specified alignment (default: precise_alignment). Returns a list containing a dictionary. alignment can also be None or False; in that case, no alignment is filled. If merge is False, the returned list will contain a dictionary for each track. Beats are filled according to tempo changes.
This function is decorated with 3 different sets of parameters:
- from_midi is the decorated version with remove_player=False
- from_midi_remove_player is the decorated version with remove_player=True
- from_midi_asap is the decorated version which accepts the extension '.score.mid', used in the script to import scores from ASAP
N.B. To allow having some annotations for subgroups of a dataset, this function returns None when it cannot find the specified midi file; in this way, that file is not taken into account while merging the various annotations (e.g. the asap group inside the Maestro dataset)
-
asmd.convert_from_file.
from_midi_remove_player
(midi_fn, alignment='precise_alignment', pitches=True, velocities=True, merge=True, beats=False)¶ Open a midi file midi_fn and convert it to our ground_truth representation. This fills velocities, pitches, beats, sustain, soft, sostenuto and the specified alignment (default: precise_alignment). Returns a list containing a dictionary. alignment can also be None or False; in that case, no alignment is filled. If merge is False, the returned list will contain a dictionary for each track. Beats are filled according to tempo changes.
This function is decorated with 3 different sets of parameters:
- from_midi is the decorated version with remove_player=False
- from_midi_remove_player is the decorated version with remove_player=True
- from_midi_asap is the decorated version which accepts the extension '.score.mid', used in the script to import scores from ASAP
N.B. To allow having some annotations for subgroups of a dataset, this function returns None when it cannot find the specified midi file; in this way, that file is not taken into account while merging the various annotations (e.g. the asap group inside the Maestro dataset)
-
asmd.convert_from_file.
from_musicnet_csv
(csv_fn, sr=44100.0)[source]¶ Open a csv file csv_fn and convert it to our ground_truth representation. This fills: broad_alignment, score, pitches. This returns a list containing only one dict. sr is the samplerate of the audio files (MusicNet csv files contain the sample number as onset and offset of each note) and it should be a float.
N.B. MusicNet contains wav files with a samplerate of 44100 Hz. N.B. The lowest pitch in MusicNet is 21, so we assume that pitches are counted starting from 0, as in the midi.org standard. N.B. Score times are provided with BPM 60 for all the scores
-
asmd.convert_from_file.
from_phenicx_txt
(txt_fn)[source]¶ Open a txt file txt_fn in the PHENICX format and convert it to our ground_truth representation. This fills: broad_alignment.
-
asmd.convert_from_file.
from_sonic_visualizer
(gt_fn, alignment='precise_alignment')[source]¶ Takes the filename of a Sonic Visualiser output file exported as 'csv' and fills the specified alignment
-
asmd.convert_from_file.
prototype_gt
= {'broad_alignment': {'notes': [], 'offsets': [], 'onsets': [], 'pitches': [], 'velocities': []}, 'extra': [], 'f0': [], 'instrument': 255, 'misaligned': {'notes': [], 'offsets': [], 'onsets': [], 'pitches': [], 'velocities': []}, 'missing': [], 'precise_alignment': {'notes': [], 'offsets': [], 'onsets': [], 'pitches': [], 'velocities': []}, 'score': {'beats': [], 'notes': [], 'offsets': [], 'onsets': [], 'pitches': [], 'velocities': []}, 'soft': {'times': [], 'values': []}, 'sostenuto': {'times': [], 'values': []}, 'sustain': {'times': [], 'values': []}}¶ The dictionary prototype for containing the ground_truth. use:
>>> from copy import deepcopy
>>> from convert_from_file import prototype_gt
>>> prototype_gt = deepcopy(prototype_gt)
>>> prototype_gt { "precise_alignment": { "onsets": [], "offsets": [], "pitches": [], "notes": [], "velocities": [] }, "misaligned": { "onsets": [], "offsets": [], "pitches": [], "notes": [], "velocities": [] }, "score": { "onsets": [], "offsets": [], "pitches": [], "notes": [], "velocities": [], "beats": [] }, "broad_alignment": { "onsets": [], "offsets": [], "pitches": [], "notes": [], "velocities": [] }, "f0": [], "soft": { "values": [], "times": [] }, "sostenuto": { "values": [], "times": [] }, "sustain": { "values": [], "times": [] }, "instrument": 255, }
Note: ``pitches``, ``velocities``, ``sustain``, ``sostenuto``, ``soft``, and (if available) ``instrument`` must be in range [0, 128)
Python API¶
Intro¶
This project also provides a few APIs for filtering the datasets according to some specified prerequisites and getting the data in a convenient format.
Python¶
Import ``audioscoredataset`` and create a ``Dataset`` object, giving the
path of the ``datasets.json`` file in this directory as argument to the
constructor. Then, you can use the ``filter`` method to filter data
according to your needs (you can also re-filter them later without
reloading ``datasets.json``).
You will find a field ``paths`` in your ``Dataset`` instance containing
the correct paths to the files you are requesting.
Moreover, the method ``get_item`` returns an array of audio values and a
structured_array representing the ground truth as loaded from the JSON
file.
Example:
from asmd import asmd
from asmd import dataset_utils
d = asmd.Dataset()
# d = asmd.Dataset(paths=['path_to_my_definitions', 'path_to_default_definitions'])
d.filter(instrument='piano', ensemble=False, composer='Mozart', ground_truth=['precise_alignment'])
audio_array, sources_array, ground_truth_array = d.get_item(1)
audio_array = d.get_mix(2)
source_array = d.get_source(2)
ground_truth_list = d.get_gts(2)
mat = dataset_utils.get_score_mat(d, 2, score_type=['precise_alignment'])
Note that you can inherit from ``asmd.Dataset`` and
``torch.utils.data.Dataset`` to create a PyTorch-compatible dataset which only
loads audio files when they are accessed. You will just need to implement the
``__getitem__`` method.
Documentation¶
-
asmd.asmd.
load_definitions
(path)[source]¶ Given a path to a directory, returns a list of dictionaries containing the definitions found in that directory (not recursive search)
-
class
asmd.asmd.
Dataset
(paths=['default_path'], metadataset_path=['default_path'])[source]¶ -
__init__
(definitions=['/home/docs/checkouts/readthedocs.org/user_builds/asmd/checkouts/latest/asmd/definitions/'], metadataset_path='/home/docs/checkouts/readthedocs.org/user_builds/asmd/checkouts/latest/asmd/datasets.json', empty=False)[source]¶ Load the dataset description and populate the paths
This object has a fundamental field named paths, which is a list; each entry contains another list of 3 values representing the paths to, respectively: the mixed recording, the single-source audio files, and the ground-truth files (one per source)
Parameters: - definitions (*) – paths where json dataset definitions are stored; if empty, the default definitions are used
- metadataset_path (*) – the path where the generic information about where this dataset is installed is stored
- empty (*) – if True, no definition is loaded
Returns: instance of the class
Return type: * AudioScoreDataset
-
__weakref__
¶ list of weak references to the object (if defined)
-
get_audio
(idx, sources=None)[source]¶ Get the mixed audio of certain sources or of the mix
Parameters: - idx (int) – The index of the song to retrieve.
- sources (list or None) – A list containing the indices of sources to be mixed and returned. If None, no sources will be mixed and the global mix will be returned.
Returns: - numpy.ndarray – A (n x 1) array which represents the mixed audio.
- int – The sampling rate of the audio array
-
get_audio_data
(idx)[source]¶ Returns audio data of a specific item without loading the full audio.
N.B. see essentia.standard.MetadataReader!
Returns: - list of tuples – each tuple is referred to a source and contains the following
- int – duration in seconds
- int – bitrate (kb/s)
- int – sample rate
- int – number of channels
-
get_beats
(idx)[source]¶ Get a list of beat positions in seconds, to be used together with the score data.
Parameters: idx (int) – The index of the song to retrieve. Returns: each row contains beat positions of each ground truth Return type: numpy.ndarray
-
get_gts
(idx)[source]¶ Return the ground-truth of the wanted item
Parameters: idx (int) – the index of the wanted item Returns: list of dictionary representing the ground truth of each single source Return type: list
-
get_gts_paths
(idx) → List[str][source]¶ Return paths to the ground-truth files, one for each source
Returns list of string
-
get_initial_bpm
(idx) → Optional[float][source]¶ Return the initial bpm of the first source if score alignment type is available at index idx, otherwise returns None
-
get_item
(idx)[source]¶ Returns the mixed audio, sources and ground truths of the specified item.
Parameters: idx (int) – the index of the wanted item Returns: - numpy.ndarray – audio of the mixed sources
- list – a list of numpy.ndarray representing the audio of each source
- list – list of dictionary representing the ground truth of each single source
-
get_missing_extra_notes
(idx, kind: str) → List[numpy.ndarray][source]¶ Returns the missing or extra notes of a song. For each source, an array of boolean values is returned. If you want the missing/extra notes for the whole song, use
dataset_utils.get_score_mat
kind can be ‘extra’ or ‘missing’
-
get_mix
(idx, sr=None)[source]¶ Returns the audio array of the mixed song
Parameters: - idx (int) – the index of the wanted item
- sr (int or None) – the sampling rate at which the audio will be returned (if needed, a resampling is performed). If None, no resampling is performed
Returns: - mix (numpy.ndarray) – the audio waveform of the mixed song
- int – The sampling rate of the audio array
-
get_mix_paths
(idx) → List[str][source]¶ Return paths to the mixed recording if available
Returns list of string (usually only one)
-
get_pianoroll
(idx, score_type=['misaligned'], resolution=0.25, onsets=False, velocity=True)[source]¶ Create pianoroll from list of pitches, onsets and offsets (in this order).
Parameters: - idx (int) – The index of the song to retrieve.
- score_type (list of str) – The key to retrieve the list of notes from the ground_truths. see chose_score_type for explanation
- resolution (float) – The duration of each column (in seconds)
- onsets (bool) – If True, the value '-1' is put on each onset
- velocity (bool) – if True, the value of each note is its velocity (except the first frame, if onsets is used)
Returns: A (128 x n) array where rows represent pitches and columns are time instants sampled with resolution provided as argument.
Return type: numpy.ndarray
Note
In the midi.org standard, pitches start counting from 0; however, sometimes people count pitches from 1. Depending on the dataset that you are using, verify how pitches are counted. In the ASMD default ground truths, pitches use 0-based indexing.
In case your dataset does not start counting pitches from 0, you should correct the output of this function.
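For instance, a minimal correction for a hypothetical 1-based dataset:

```python
def to_zero_based(pitches):
    """Shift 1-based MIDI pitches to the 0-based indexing used by ASMD ground truths."""
    return [p - 1 for p in pitches]

print(to_zero_based([61, 65, 68]))  # [60, 64, 67]
```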
-
get_score_duration
(idx)[source]¶ Returns the duration of the most aligned score available for a specific item
-
get_source
(idx)[source]¶ Returns the sources at the specified index
Parameters: idx (int) – the index of the wanted item Returns: - list – a list of numpy.ndarray representing the audio of each source
- int – The sampling rate of the audio array
-
get_sources_paths
(idx) → List[str][source]¶ Return paths to the single-source audio recordings, one for each source
Returns list of string
-
idx_chunk_to_whole
(name, idx)[source]¶ Given a dataset name and an idx or a list of idx relative to the input dataset, returns the idx relative to this whole dataset.
Use this method if you need, for instance the index of a song for which you have the index in a single dataset.
-
parallel
(func, *args, **kwargs)[source]¶ Applies a function to all items in paths in parallel using joblib.Parallel.
You can pass any argument to joblib.Parallel by using keyword arguments.
Parameters: func (callable) – the function that will be called; it must accept two arguments that are the index of the song and the dataset. Then, it can accept all args and kwargs that are passed to this function:
>>> def myfunc(i, dataset, pinco, pal=lino):
...     # do not use `filter` and `chunks` here
...     print(pinco, pal)
...     print(dataset.paths[i])
>>> marco, etto = 4, 5
>>> d = Dataset().filter(datasets='Bach10')
>>> d.parallel(myfunc, marco, n_jobs=8, pal=etto)
filter and chunks shouldn’t be used.
Returns: The list of objects returned by each func Return type: list
-
-
asmd.dataset_utils.
_check_consistency
(dataset, fix=False)[source]¶ Checks that if a dataset is included, then at least one of its songs is included, and that if a dataset is excluded, then all of its songs are excluded.
If fix is True, it fixes the dataset inclusion; otherwise it raises a RuntimeError
-
asmd.dataset_utils.
_compare_dataset
(compare_func, dataset1, dataset2, **kwargs)[source]¶ Returns a new dataset where each song and dataset is included only if compare_func is True for each corresponding couple of songs and datasets
-
asmd.dataset_utils.
choice
(dataset, p=[0.6, 0.2, 0.2], random_state=None)[source]¶ Returns N non-overlapping datasets randomly sampled from dataset, where N is len(p); each song belongs to a dataset according to the probability distribution p. Note that p is always normalized to sum to 1.
random_state is an int or a np.random.RandomState object.
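The sampling logic can be sketched with numpy as follows (this is not the library code — choice itself returns Dataset objects, while this toy version returns index arrays):

```python
import numpy as np

def split_indices(n_songs, p=[0.6, 0.2, 0.2], random_state=None):
    """Assign each song to one of len(p) non-overlapping groups
    according to the (normalized) probability distribution p."""
    rng = np.random.RandomState(random_state)
    p = np.asarray(p, dtype=float)
    p = p / p.sum()  # p is always normalized to sum to 1
    labels = rng.choice(len(p), size=n_songs, p=p)
    return [np.where(labels == i)[0] for i in range(len(p))]
```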
-
asmd.dataset_utils.
chose_score_type
(score_type, gts)[source]¶ Return the proper score type according to the following rules
Parameters: - score_type (list of str) – The key to retrieve the list of notes from the ground_truths. If multiple keys are provided, only one is retrieved using the following criteria: if precise_alignment is in the list of keys and in the ground truth, use that; otherwise, if broad_alignment is in the list of keys and in the ground truth, use that; otherwise, if misaligned is in the list of keys and in the ground truth, use that; otherwise use score.
- gts (list of dict) – The list of ground truths from which you want to choose a score_type
-
asmd.dataset_utils.
complement
(dataset, **kwargs)[source]¶ Takes one dataset and returns a new dataset representing the complement of the input
This function calls filter to populate the paths and returns them with all the sources. However, you can pass any argument to filter, e.g. the sources argument
-
asmd.dataset_utils.
filter
(dataset, instruments=[], ensemble=None, mixed=True, sources=False, all=False, composer='', datasets=[], groups=[], ground_truth=[], copy=False)[source]¶ Filter the paths of the songs which satisfy the filters described by the arguments. If this dataset was already filtered, only those paths that are already included are filtered further.
For advanced usage:
So that a dataset can be filtered, it must have the following keys:
- songs
- name
- included
All the attributes are checked at the song level, except for:
- ensemble: this is checked at the dataset level (i.e. each dataset can be for ensemble or not). This may change in future releases
- ground_truth: this is checked at group level (i.e. each subgroup can have different annotations)
Similarly, each song must have the key
included
and optionally the other keys that you want to filter, as described by the arguments of this function. Parameters: - instruments (list of str) – a list of strings representing the instruments that you want to select (exact match with song)
- ensemble (bool) – if loading songs which are composed for an ensemble of instruments. If None, the ensemble field is not checked and both are selected (default None)
- mixed (bool) – if returning the mixed track for ensemble songs (default True )
- sources (bool) – if returning the source tracks for ensemble recordings which provide them (default False )
- all (bool) – only valid if sources is True : if True , all sources (audio and ground-truth) are returned, if False, only the first target instrument is returned. Default False.
- composer (string) – the surname of the composer to filter
- groups (list of strings) – a list of strings containing the name of the groups that you want to retrieve with a logic ‘AND’ among them. If empty, all groups are used. Example of groups are: ‘train’, ‘validation’, ‘test’. The available groups depend on the dataset. Only Maestro dataset supported for now.
- datasets (list of strings) – a list of strings containing the name of the datasets to be used. If empty, all datasets are used. See License for the list of default datasets. The matching is case insensitive.
- ground_truth (dict[str, int]) – a dictionary (string, int) representing the type of ground-truths needed (logical AND among list elements). Each entry has the form needed_ground_truth_type as key and level_of_truth as value, where needed_ground_truth_type is the key of the ground_truth dictionary and level_of_truth is an int ranging from 0 to 2 (0->False, 1->True (manual annotation), 2->True(automatic annotation)). If only part of a dataset contains a certain ground-truth type, you should use the group attribute to only select those songs.
- copy (bool) – If True, a new Dataset object is returned, and the calling one is left untouched
Returns: - The input dataset as modified (d = Dataset().filter(…))
- If copy is True, a new Dataset object is returned.
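As a toy sketch of how the included flags described above interact with filtering (hypothetical data layout and helper, mirroring the songs/name/included keys; the real filter checks many more attributes):

```python
# hypothetical dataset definition with the keys required for filtering
dataset = {
    'name': 'Bach10',
    'included': True,
    'songs': [
        {'composer': 'Bach', 'instruments': ['violin'], 'included': True},
        {'composer': 'Bach', 'instruments': ['bassoon'], 'included': True},
    ],
}

def filter_by_instrument(dataset, instruments):
    """Exclude songs that don't contain all requested instruments."""
    for song in dataset['songs']:
        if instruments and not set(instruments) <= set(song['instruments']):
            song['included'] = False
    # a dataset stays included only if at least one of its songs is included
    dataset['included'] = any(s['included'] for s in dataset['songs'])
    return dataset
```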
-
asmd.dataset_utils.
get_pedaling_mat
(dataset, idx, frame_based=False, winlen=0.046, hop=0.01)[source]¶ Get data about pedaling
Parameters: - idx (int) – The index of the song to retrieve.
- frame_based (bool) – If True, the output will contain one row per frame, otherwise one row per control changes event. Frames are deduced from winlen and hop.
- winlen (float) – The duration of a frame in seconds; only used if frame_based is True.
- hop (float) – The amount of hop-size in seconds; only used if frame_based is True.
Returns: list of 2d-arrays, each listing all the control change events in a track. Rows represent control changes or frames (according to the frame_based option) while columns represent (time, sustain value, sostenuto value, soft value).
If frame_based is used, time is the central time of the frame and frames are computed using the most aligned score available for this item.
If frame_based is False, value -1 is used for pedaling type not affected in a certain control change (i.e. a control change affects one type of pedaling, so the other two will have value -1).
The output is sorted by time.
Return type: list[np.ndarray]
-
asmd.dataset_utils.
get_score_mat
(dataset, idx, score_type=['misaligned'], return_notes='')[source]¶ Get the score of a certain item, with times given by score_type
Parameters: - idx (int) – The index of the song to retrieve.
- score_type (list of str) – The key to retrieve the list of notes from the ground_truths. see chose_score_type for explanation
- return_notes (str) –
'missing'
,'extra'
or'both'
; the notes that will be returned together with the score; seeasmd.asmd.Dataset.get_missing_extra_notes
for more info
Returns: - numpy.ndarray – A (n x 6) array where columns represent pitches, onsets (seconds), offsets (seconds), velocities, MIDI program instrument and number of the instrument. Ordered by onsets. If some information is not available, value -255 is used. The array is sorted by onset, pitch and offset (in this order)
- numpy.ndarray – A boolean array with True if the note is missing or extra (depending on
return_notes
); only ifreturn_notes is not None
- numpy.ndarray – Another boolean array with True if the note is missing or extra (depending on
return_notes
); only ifreturn_notes == 'both'
-
asmd.dataset_utils.
intersect
(*datasets, **kwargs)[source]¶ Takes datasets and returns a new dataset representing their intersection. The datasets must have the same order of datasets and songs (e.g. two datasets initialized in the same way and only filtered)
This function calls filter to populate the paths and returns them with all the sources. However, you can pass any argument to filter, e.g. the sources argument
-
asmd.dataset_utils.
union
(*datasets, **kwargs)[source]¶ Takes datasets and returns a new dataset representing their union. The datasets must have the same order of datasets and songs (e.g. two datasets initialized in the same way and only filtered)
This function calls filter to populate the paths and returns them with all the sources. However, you can pass any argument to filter, e.g. the sources argument
General Utilities¶
-
asmd.utils.
f0_to_midi_pitch
(f0)[source]¶ Return a midi pitch (in 0-127) given a frequency value in Hz
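This is presumably the standard conversion with A4 = 440 Hz mapped to pitch 69; a self-contained sketch:

```python
import math

def f0_to_midi_pitch_sketch(f0):
    """Convert a frequency in Hz to a MIDI pitch number (A4 = 440 Hz = 69)."""
    return int(round(69 + 12 * math.log2(f0 / 440.0)))
```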
-
asmd.utils.
frame2time
(frame: int, hop_size=3072, win_len=4096) → float[source]¶ Takes a frame index (int) and returns the corresponding central sample. The output will use the same unit of measure as
hop_size
andwin_len
(e.g. samples or seconds). Indices start from 0. Returns a float!
-
asmd.utils.
mat2midipath
(mat, path)[source]¶ Writes a midi file from a mat like asmd:
pitch, start (sec), end (sec), velocity
If mat is empty, just do nothing.
-
asmd.utils.
mat_stretch
(mat, target)[source]¶ Changes times of mat in-place so that it has the same average BPM and initial time as target.
Returns mat changed in-place.
-
asmd.utils.
midipath2mat
(path)[source]¶ Open a midi file with one instrument track and construct a mat like asmd:
pitch, start (sec), end (sec), velocity
Rows are sorted by onset, pitch and offset (in this order)
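This row ordering can be reproduced on any asmd-like mat with numpy.lexsort, which sorts by its last key first (a sketch with hypothetical values):

```python
import numpy as np

# hypothetical mat with columns (pitch, onset, offset, velocity)
mat = np.array([[64., 0.5, 1.0, 70.],
                [60., 0.0, 1.0, 80.],
                [60., 0.0, 0.5, 90.]])

# np.lexsort sorts by the LAST key first: onset, then pitch, then offset
order = np.lexsort((mat[:, 2], mat[:, 0], mat[:, 1]))
mat = mat[order]
```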
-
asmd.utils.
nframes
(dur, hop_size=3072, win_len=4096) → float[source]¶ Compute the number of frames given a total duration, the hop size and the window length. The output unit of measure will be the same as the inputs' unit of measure (e.g. samples or seconds).
N.B. This returns a float!
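Assuming the usual framing convention (first window starting at 0, successive windows spaced by hop_size), the count can be sketched as:

```python
def nframes_sketch(dur, hop_size=3072, win_len=4096):
    """Number of windows of length win_len, spaced by hop_size,
    fitting in a total duration; returns a float, like asmd.utils.nframes."""
    return (dur - win_len) / hop_size + 1
```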
-
asmd.utils.
open_audio
(audio_fn: Union[str, pathlib.Path]) → Tuple[numpy.ndarray, int][source]¶ Open the audio file in audio_fn and return a numpy array containing it, one row for each channel (only mono is supported for now), and the original sample_rate
-
asmd.utils.
time2frame
(time, hop_size=3072, win_len=4096) → int[source]¶ Takes a time position and outputs the best frame representing it. The input must use the same unit of measure for
time
,hop_size
, andwin_len
(e.g. samples or seconds). Indices start from 0. Returns an int!
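Under the assumption that frames are centered at frame * hop_size + win_len / 2, frame2time and time2frame are approximately inverse operations; a self-contained sketch:

```python
def frame2time_sketch(frame, hop_size=3072, win_len=4096):
    """Central position of a 0-indexed frame (float)."""
    return frame * hop_size + win_len / 2

def time2frame_sketch(time, hop_size=3072, win_len=4096):
    """Nearest 0-indexed frame whose center best represents `time` (int)."""
    return int(round((time - win_len / 2) / hop_size))
```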
Utilities for statistical analysis¶
Scientific notes¶
Artificial misalignment¶
This dataset tries to overcome the problem of needing manual alignment of scores to audio for training models which exploit audio and scores at the same time. The underlying idea is that we have many scores and a lot of audio, and users of trained models could easily take advantage of such multimodality (the ability of the model to exploit both scores and audio). The main problem is the annotation stage: we have quite a lot of aligned data, but we miss the corresponding scores, and when we have the scores, we almost always miss the aligned performance.
The approach used is to statistically analyze the available manual annotations and to reproduce their features. Indeed, by misaligned data I mean data which try to reproduce the statistical features of the difference between scores and aligned data.
New description¶
You can evaluate the various approaches by running python -m
asmd.alignment_stats
. The script uses Eita Nakamura's method to match notes between the score and
the performance and collects statistics only on the matched notes; it then
computes the distance between the misaligned score onset/offset sequence and
the real score onset sequence, considering only the matching notes, using the
L1 error between matching notes. The evaluation uses vienna_corpus,
traditional_flute, MusicNet, Bach10 and the asap group from the Maestro
dataset, for a total of 875 scores, split into train and test sets with a
70-30 proportion, resulting in 641 songs for training and 234 songs for
testing.
However, since Eita's method takes a long time on some scores, I removed the scores for which it runs longer than 20 seconds; this resulted in a total of 347 songs for training and ~143 songs for testing (~54% and ~61% of the total number of songs with an available score).
Both compared methods are based on randomly choosing a standard deviation and a mean for the whole song according to the collected distributions of standard deviations and means. Statistics are collected for onset differences and duration ratios between performance and score. After the estimation of new onsets and offsets, onsets are sorted and offsets are made lower than the next onset with the same pitch.
The two methods differ in how the standardized misalignment is computed/generated:
- the old method randomly chooses it according to the collected distribution
- the new method uses an HMM with Gaussian mixture emissions instead of a simple distribution
Moreover, the misaligned data are computed with models trained on the stretched scores, so that the training data consists of scores at the same average BPM as the performance; the misaligned data, then, consists of times at that average BPM.
The following table summarizes the results of the comparison:
| | Ons | Offs |
|---|---|---|
| HMM | 18.6 ± 49.7 | 20.7 ± 50.6 |
| Hist | 7.43 ± 15.5 | 8.95 ± 15.5 |
Misaligned data are finally created by training the Hist model on all 875 scores (~481 when considering only the songs for which Eita's method takes less than 20 sec). Misaligned data are more similar to a new performance than to a symbolic score; for most MIR applications, however, misaligned data are enough for both training and evaluation.
BPM for score alignment¶
Previously, the BPM was always forced to 20 so that, even if the BPM was not available, note durations could still be expressed in seconds.
Since 0.5, the BPM is simply set to 60 if not available; however, positions of
beats are always provided, so that the user can reconstruct the instant BPM.
The function get_initial_bpm
from the Python API also provides a way to
retrieve the initial instant BPM from the score.
An easy way to get an approximate BPM is to stretch the score to the duration of the corresponding performance. This can also be done for the beats and, consequently, for the instant BPM. For instance, let T_0 and T_1 be the initial and ending times of the performance, and let t_0 and t_1 be the initial and ending times of the score. Then, the stretched times of the score at the average performance BPM are given by:
t_new = (t_old - t_0) * (T_1 - T_0) / (t_1 - t_0) + T_0
where t_old is an original time instant in the score and t_new is the new time
instant after the stretching. Applying this formula to the beat times can help
you to compute the new instant BPM while keeping the same average BPM as the
performance. This functionality is provided by asmd.utils.mat_stretch
for
onsets and offsets, but not for beats yet.
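In code, the intended stretching reads as follows (a sketch with numpy; note the normalization by the score duration t_1 - t_0, which makes t_1 map exactly to T_1):

```python
import numpy as np

def stretch_times(times, t0, t1, T0, T1):
    """Map score times in [t0, t1] onto performance times in [T0, T1],
    preserving relative positions (i.e. matching the average performance BPM)."""
    return (np.asarray(times, dtype=float) - t0) * (T1 - T0) / (t1 - t0) + T0
```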
License¶
Ground-truth annotations¶
All ground-truth annotations we used are originally released under Creative Commons licenses. We release our adaptations under Creative Commons.
They are retrieved starting from the following projects:
Datasets | Name used in the default definitions |
---|---|
Bach10 | Bach10 |
Maestro | Maestro |
MusicNet | MusicNet |
PHENICX - Anechoic | PHENICX |
Saarland Music Dataset (SMD) | SMD |
Traditional flute dataset | traditional_flute |
TRIOS dataset | TRIOS (removed: link is dead) |
Vienna Corpus | vienna_corpus |
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Code¶
All the code is released under MIT license:
Copyright 2020 Federico Simonetta https://federicosimonetta.eu.org
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
This paper describes an open-source Python framework for handling datasets for music processing tasks, built with the aim of improving the reproducibility of research projects in music computing and assessing the generalization abilities of machine learning models. The framework enables the automatic download and installation of several commonly used datasets for multimodal music processing. Specifically, we provide a Python API to access the datasets through Boolean set operations based on particular attributes, such as intersections and unions of composers, instruments, and so on. The framework is designed to ease the inclusion of new datasets and the respective ground-truth annotations so that one can build, convert, and extend one’s own collection as well as distribute it by means of a compliant format to take advantage of the API. All code and ground-truth are released under suitable open licenses.
For a gentle introduction, see our paper [1]
TODO¶
- add automatic matching of songs among multiple datasets based on metadata (and maybe audio ID?)
- change the filter function for each level of filtering which takes keyword and value and filter that keyword at that level
- describe datasets provided by default
- generic description of the framework
- improve “adding_datasets” with a full example
- add section “examples”
- move wget to curl
- support Windows systems
Cite us¶
[1] Simonetta, Federico ; Ntalampiras, Stavros ; Avanzini, Federico: ASMD: an automatic framework for compiling multimodal datasets. In: Proceedings of the 17th Sound and Music Computing Conference. Torino, 2020 arXiv:2003.01958
—
Federico Simonetta