Python API

Intro

This project also provides a Python API for filtering the datasets according to specified prerequisites and for getting the data in a convenient format.

Python

Import asmd and create a Dataset object, giving the path of the dataset definition files as argument to the constructor. Then, you can use the filter method to filter data according to your needs (you can also re-filter later without reloading the definitions).

You will find an attribute paths in your Dataset instance containing the correct paths to the files you are requesting.

Moreover, the method get_item returns the audio of the mix, the audio of the single sources, and a list of structured arrays representing the ground truth as loaded from the JSON files.

Example:

from asmd import asmd, dataset_utils

d = asmd.Dataset()
# d = asmd.Dataset(definitions=['path_to_my_definitions', 'path_to_default_definitions'])
d.filter(instruments=['piano'], ensemble=False, composer='Mozart', ground_truth=['precise_alignment'])

audio_array, sources_array, ground_truth_array = d.get_item(1)

audio_array = d.get_mix(2)
source_array = d.get_source(2)
ground_truth_list = d.get_gts(2)

mat = dataset_utils.get_score_mat(d, 2, score_type=['precise_alignment'])

Note that you can inherit from both asmd.Dataset and torch.utils.data.Dataset to create a PyTorch-compatible dataset which only loads audio files when they are accessed. You just need to implement the __getitem__ method.
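
A minimal sketch of the idea (TorchAsmdDataset is a hypothetical name; a real implementation would inherit from both asmd.Dataset and torch.utils.data.Dataset, while this sketch wraps an already-filtered instance so it stays runnable without torch installed):

```python
class TorchAsmdDataset:
    """Sketch of a map-style dataset: torch only requires __len__ and
    __getitem__, so audio loading can be deferred to item access."""

    def __init__(self, inner):
        self.inner = inner  # an already-filtered asmd.Dataset instance

    def __len__(self):
        # one entry in `paths` per song
        return len(self.inner.paths)

    def __getitem__(self, i):
        # audio is only loaded here, i.e. when the item is accessed
        return self.inner.get_mix(i), self.inner.get_gts(i)
```

A torch DataLoader could then iterate over such an object without ever loading the whole dataset into memory.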

Documentation

asmd.asmd.load_definitions(path)[source]

Given a path to a directory, returns a list of dictionaries containing the definitions found in that directory (not recursive search)

class asmd.asmd.Dataset(paths=['default_path'], metadataset_path=['default_path'])[source]
__init__(definitions=['/home/docs/checkouts/readthedocs.org/user_builds/asmd/checkouts/latest/asmd/definitions/'], metadataset_path='/home/docs/checkouts/readthedocs.org/user_builds/asmd/checkouts/latest/asmd/datasets.json', empty=False)[source]

Load the dataset description and populate the paths

This object has a fundamental attribute named paths, which is a list; each entry contains another list of 3 values representing the paths to, respectively: the mixed recording, the single-source audio files, and the ground-truth file of each source

Parameters:
  • definitions (*) – paths where json dataset definitions are stored; if empty, the default definitions are used
  • metadataset_path (*) – the path where the generic information about where this dataset is installed is stored
  • empty (*) – if True, no definition is loaded
Returns:

instance of the class

Return type:

* AudioScoreDataset

get_audio(idx, sources=None)[source]

Get the mixed audio of certain sources or of the mix

Parameters:
  • idx (int) – The index of the song to retrieve.
  • sources (list or None) – A list containing the indices of sources to be mixed and returned. If None, no sources will be mixed and the global mix will be returned.
Returns:

  • numpy.ndarray – A (n x 1) array which represents the mixed audio.
  • int – The sampling rate of the audio array

get_audio_data(idx)[source]

Returns audio data of a specific item without loading the full audio.

N.B. see essentia.standard.MetadataReader!

Returns:
  • list of tuples – each tuple refers to a source and contains the following values:
  • int – duration in seconds
  • int – bitrate (kb/s)
  • int – sample rate
  • int – number of channels
get_beats(idx)[source]

Get a list of beat position in seconds, to be used together with the score data.

Parameters:idx (int) – The index of the song to retrieve.
Returns:each row contains beat positions of each ground truth
Return type:numpy.ndarray
get_gts(idx)[source]

Return the ground-truth of the wanted item

Parameters:idx (int) – the index of the wanted item
Returns:list of dictionary representing the ground truth of each single source
Return type:list
get_gts_paths(idx) → List[str][source]

Return paths to the ground-truth files, one for each source

Returns list of string

get_initial_bpm(idx) → Optional[float][source]

Return the initial bpm of the first source if score alignment type is available at index idx, otherwise returns None

get_item(idx)[source]

Returns the mixed audio, sources and ground truths of the specified item.

Parameters:idx (int) – the index of the wanted item
Returns:
  • numpy.ndarray – audio of the mixed sources
  • list – a list of numpy.ndarray representing the audio of each source
  • list – list of dictionary representing the ground truth of each single source
get_missing_extra_notes(idx, kind: str) → List[numpy.ndarray][source]

Returns the missing or extra notes of a song. For each source, an array of boolean values is returned. If you want the missing/extra notes for the whole song, use dataset_utils.get_score_mat

kind can be ‘extra’ or ‘missing’

get_mix(idx, sr=None)[source]

Returns the audio array of the mixed song

Parameters:
  • idx (int) – the index of the wanted item
  • sr (int or None) – the sampling rate at which the audio will be returned (if needed, a resampling is performed). If None, no resampling is performed
Returns:

  • mix (numpy.ndarray) – the audio waveform of the mixed song
  • int – The sampling rate of the audio array

get_mix_paths(idx) → List[str][source]

Return paths to the mixed recording if available

Returns list of string (usually only one)

get_pianoroll(idx, score_type=['misaligned'], resolution=0.25, onsets=False, velocity=True)[source]

Create pianoroll from list of pitches, onsets and offsets (in this order).

Parameters:
  • idx (int) – The index of the song to retrieve.
  • score_type (list of str) – The key to retrieve the list of notes from the ground_truths. see chose_score_type for explanation
  • resolution (float) – The duration of each column (in seconds)
  • onsets (bool) – If True, the value -1 is put on each onset
  • velocity (bool) – if True, the value of each note is its velocity (except the first frame if onsets is used)
Returns:

A (128 x n) array where rows represent pitches and columns are time instants sampled with resolution provided as argument.

Return type:

numpy.ndarray

Note

In the midi.org standard, pitches are counted from 0; however, some sources count pitches from 1. Depending on the dataset that you are using, verify how pitches are counted. In the ASMD default ground-truths, pitches use 0-based indexing.

In case your dataset does not start counting pitches from 0, you should correct the output of this function.
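
For instance, a sketch of such a correction, assuming a hypothetical pianoroll whose dataset stored pitches with 1-based indexing:

```python
import numpy as np

# Hypothetical pianoroll with 1-based pitch annotations: a note of MIDI
# pitch 60 was stored in row 61.
pianoroll = np.zeros((128, 4))
pianoroll[61, 0] = 80

# Shift all rows down by one so that row 60 holds pitch 60 (0-based).
fixed = np.roll(pianoroll, -1, axis=0)
```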

get_score_duration(idx)[source]

Returns the duration of the most aligned score available for a specific item

get_songs()[source]

Returns a list of dict, each representing a song

get_source(idx)[source]

Returns the sources at the specified index

Parameters:idx (int) – the index of the wanted item
Returns:
  • list – a list of numpy.ndarray representing the audio of each source
  • int – The sampling rate of the audio array
get_sources_paths(idx) → List[str][source]

Return paths to the single-source audio recordings, one for each source

Returns list of string

idx_chunk_to_whole(name, idx)[source]

Given a dataset name and an idx or a list of idx relative to the input dataset, returns the idx relative to this whole dataset.

Use this method if you need, for instance, the whole-dataset index of a song for which you have the index within a single dataset.
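
A sketch of the underlying idea, with hypothetical dataset names and sizes (the real method reads them from the loaded definitions):

```python
# The whole-dataset index is the per-dataset index plus the number of
# songs in all datasets that precede it.
sizes = {'DatasetA': 10, 'DatasetB': 100}
order = ['DatasetA', 'DatasetB']

def chunk_to_whole(name, idx):
    offset = 0
    for n in order:
        if n == name:
            return offset + idx
        offset += sizes[n]
    raise KeyError(name)
```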

parallel(func, *args, **kwargs)[source]

Applies a function to all items in paths in parallel using joblib.Parallel.

You can pass any argument to joblib.Parallel by using keyword arguments.

Parameters:func (callable) –

the function that will be called; it must accept two arguments that are the index of the song and the dataset. Then, it can accept all args and kwargs that are passed to this function:

>>> def myfunc(i, dataset, pinco, pal=2):
...     # do not use `filter` and `chunks` here
...     print(pinco, pal)
...     print(dataset.paths[i])
>>> marco, etto = 4, 5
>>> d = Dataset().filter(datasets=['Bach10'])
>>> d.parallel(myfunc, marco, n_jobs=8, pal=etto)

filter and chunks shouldn’t be used.

Returns:The list of objects returned by each func
Return type:list
asmd.dataset_utils._check_consistency(dataset, fix=False)[source]

Checks that if a dataset is included, then at least one of its songs is included, and that if a dataset is excluded, then all of its songs are excluded.

If fix is True, it fixes the dataset inclusions; otherwise, a RuntimeError is raised

asmd.dataset_utils._compare_dataset(compare_func, dataset1, dataset2, **kwargs)[source]

Returns a new dataset where each song and dataset is included only if compare_func returns True for each corresponding couple of songs and datasets

asmd.dataset_utils.choice(dataset, p=[0.6, 0.2, 0.2], random_state=None)[source]

Returns N non-overlapping datasets randomly sampled from dataset, where N is len(p); each song belongs to one of the output datasets according to the probability distribution p. Note that p is always normalized to sum to 1.

random_state is an int or a np.random.RandomState object.
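
A sketch of how the probabilities behave (not the real implementation; the counts are illustrative):

```python
import numpy as np

p = np.asarray([3.0, 1.0, 1.0])
p = p / p.sum()  # choice always normalizes p to sum to 1 -> [0.6, 0.2, 0.2]

rng = np.random.RandomState(42)
# which of the 3 output datasets each of 100 songs would land in
assignment = rng.choice(len(p), size=100, p=p)
```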

asmd.dataset_utils.chose_score_type(score_type, gts)[source]

Return the proper score type according to the following rules

Parameters:
  • score_type (list of str) – The key to retrieve the list of notes from the ground_truths. If multiple keys are provided, only one is retrieved by using the following criteria: if there is precise_alignment in the list of keys and in the ground truth, use that; otherwise, if there is broad_alignment in the list of keys and in the ground truth, use that; otherwise, if there is misaligned in the list of keys and in the ground truth, use that; otherwise, use score.
  • gts (list of dict) – The list of ground truths from which you want to choose a score_type
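
The rules above can be sketched as follows (pick_score_type is a hypothetical helper, not the real implementation; a ground truth is represented here as a plain dict whose keys are the available score types):

```python
# Priority order of the selection rule described above
PRIORITY = ['precise_alignment', 'broad_alignment', 'misaligned', 'score']

def pick_score_type(score_type, gt):
    # return the first key that is both requested and available
    for key in PRIORITY:
        if key in score_type and key in gt:
            return key
    return 'score'
```
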
asmd.dataset_utils.complement(dataset, **kwargs)[source]

Takes one dataset and returns a new dataset representing the complement of the input

This function calls filter to populate the paths and returns them with all the sources. However, you can pass any argument to filter, e.g. the sources argument

asmd.dataset_utils.filter(dataset, instruments=[], ensemble=None, mixed=True, sources=False, all=False, composer='', datasets=[], groups=[], ground_truth=[], copy=False)[source]

Filter the paths of the songs which satisfy the filters described by the arguments. If this dataset was already filtered, only the paths that are already included are filtered further.

For advanced usage:

So that a dataset can be filtered, it must have the following keys:

  • songs
  • name
  • included

All the attributes are checked at the song level, except for:

  • ensemble: this is checked at the dataset level (i.e. each dataset can be for ensemble or not). This may change in future releases
  • ground_truth: this is checked at the group level (i.e. each subgroup can have different annotations)

Similarly, each song must have the key included and optionally the other keys that you want to filter, as described by the arguments of this function.

Parameters:
  • instruments (list of str) – a list of strings representing the instruments that you want to select (exact match with song)
  • ensemble (bool) – if True, select songs which are composed for an ensemble of instruments. If None, the ensemble field is not checked and both kinds are selected (default None)
  • mixed (bool) – if True, return the mixed track for ensemble songs (default True)
  • sources (bool) – if True, return the source tracks for ensemble recordings which provide them (default False)
  • all (bool) – only valid if sources is True: if True, all sources (audio and ground-truth) are returned; if False, only the first target instrument is returned. Default False.
  • composer (string) – the surname of the composer to filter
  • groups (list of strings) – a list of strings containing the name of the groups that you want to retrieve with a logic ‘AND’ among them. If empty, all groups are used. Example of groups are: ‘train’, ‘validation’, ‘test’. The available groups depend on the dataset. Only Maestro dataset supported for now.
  • datasets (list of strings) – a list of strings containing the name of the datasets to be used. If empty, all datasets are used. See License for the list of default datasets. The matching is case insensitive.
  • ground_truth (dict[str, int]) – a dictionary (string, int) representing the type of ground-truths needed (logical AND among list elements). Each entry has the form needed_ground_truth_type as key and level_of_truth as value, where needed_ground_truth_type is the key of the ground_truth dictionary and level_of_truth is an int ranging from 0 to 2 (0->False, 1->True (manual annotation), 2->True(automatic annotation)). If only part of a dataset contains a certain ground-truth type, you should use the group attribute to only select those songs.
  • copy (bool) – If True, a new Dataset object is returned, and the calling one is left untouched
Returns:

  • The input dataset as modified (d = Dataset().filter(…))
  • If copy is True, return a new Dataset object.
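
The ground_truth matching can be sketched as follows (gt_matches is a hypothetical helper; the real matching logic may differ, e.g. in how levels are compared):

```python
# Levels as described above: 0 -> absent, 1 -> manual, 2 -> automatic.
def gt_matches(song_levels, required):
    # logical AND among the required entries (exact-level match assumed)
    return all(song_levels.get(k, 0) == lvl for k, lvl in required.items())
```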

asmd.dataset_utils.get_pedaling_mat(dataset, idx, frame_based=False, winlen=0.046, hop=0.01)[source]

Get data about pedaling

Parameters:
  • idx (int) – The index of the song to retrieve.
  • frame_based (bool) – If True, the output will contain one row per frame, otherwise one row per control changes event. Frames are deduced from winlen and hop.
  • winlen (float) – The duration of a frame in seconds; only used if frame_based is True.
  • hop (float) – The amount of hop-size in seconds; only used if frame_based is True.
Returns:

list of 2d-arrays each listing all the control changes events in a track. Rows represent control changes or frames (according to frame_based_option) while columns represent (time, sustain value, sostenuto value, soft value).

If frame_based is used, time is the central time of the frame and frames are computed using the most aligned score available for this item.

If frame_based is False, value -1 is used for pedaling type not affected in a certain control change (i.e. a control change affects one type of pedaling, so the other two will have value -1).

The output is sorted by time.

Return type:

list[np.ndarray]
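
If frame_based is True, the frame times can be sketched as follows, assuming frame i covers [i * hop, i * hop + winlen] (this layout is an assumption, not taken from the source):

```python
winlen, hop = 0.046, 0.01  # defaults from the signature above

# Hypothetical central time of each frame under the assumed layout
n_frames = 4
times = [i * hop + winlen / 2 for i in range(n_frames)]
```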

asmd.dataset_utils.get_score_mat(dataset, idx, score_type=['misaligned'], return_notes='')[source]

Get the score of a certain item, with times of score_type

Parameters:
  • idx (int) – The index of the song to retrieve.
  • score_type (list of str) – The key to retrieve the list of notes from the ground_truths. see chose_score_type for explanation
  • return_notes (str) – 'missing', 'extra' or 'both'; the notes that will be returned together with the score; see asmd.asmd.Dataset.get_missing_extra_notes for more info
Returns:

  • numpy.ndarray – A (n x 6) array where columns represent pitches, onsets (seconds), offsets (seconds), velocities, MIDI program instrument and number of the instrument. Ordered by onsets. If some information is not available, value -255 is used. The array is sorted by onset, pitch and offset (in this order)
  • numpy.ndarray – A boolean array with True if the note is missing or extra (depending on return_notes); only if return_notes is not None
  • numpy.ndarray – Another boolean array with True if the note is missing or extra (depending on return_notes); only if return_notes == 'both'
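
The sort order can be sketched with numpy.lexsort, whose keys are given in reverse order of significance (the matrix values are illustrative):

```python
import numpy as np

# Hypothetical 2-note score matrix; columns are pitch, onset, offset,
# velocity, MIDI program and instrument number (-255 marks missing values).
mat = np.array([
    [60.0, 0.5, 1.0, 80.0, 0.0, 0.0],
    [64.0, 0.0, 0.4, 90.0, 0.0, 0.0],
])

# Sort by onset, then pitch, then offset: lexsort uses its last key as
# the primary one.
order = np.lexsort((mat[:, 2], mat[:, 0], mat[:, 1]))
mat = mat[order]
```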

asmd.dataset_utils.intersect(*datasets, **kwargs)[source]

Takes datasets and returns a new dataset representing their intersection. The datasets must have the same order of datasets and songs (e.g. two datasets initialized in the same way and only filtered)

This function calls filter to populate the paths and returns them with all the sources. However, you can pass any argument to filter, e.g. the sources argument

asmd.dataset_utils.union(*datasets, **kwargs)[source]

Takes datasets and returns a new dataset representing their union. The datasets must have the same order of datasets and songs (e.g. two datasets initialized in the same way and only filtered)

This function calls filter to populate the paths and returns them with all the sources. However, you can pass any argument to filter, e.g. the sources argument