FMA: A Dataset For Music Analysis

Kirell Benzi, Michaël Defferrard, Pierre Vandergheynst, Xavier Bresson, EPFL LTS2.

Note that this is a beta release and that this repository as well as the
paper and data are subject to change. Stay tuned!

Data

The dataset is a dump of the Free Music Archive.
It comes in several sizes:

Small: 4,000 clips of 30 seconds, 10 balanced genres (GTZAN-like) (~3.4 GiB)
Medium: 14,511 clips of 30 seconds, 20 unbalanced genres (~12.2 GiB)
Large (available soon): 77,643 clips of 30 seconds, 68 unbalanced genres (~90 GiB)
Huge (subject to distribution constraints): 77,643 untrimmed clips, 68 unbalanced genres (~900 GiB)

Notes:

All datasets come with MP3 audio (128 kbps, 44.1 kHz, stereo) of all clips.
All datasets come with the following metadata about each clip: artist, title, list of genres (and top genre), play count.
Metadata about all clips is stored in a JSON file that can be loaded as a pandas DataFrame.
As additional audio metadata, each clip of the Small and Medium datasets comes with all Echonest features.
Please see the paper for a description of how the data was collected and cleaned.
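
As a rough illustration of working with the per-clip metadata, the sketch below counts clips per top genre and checks whether the genres are balanced. It uses only the standard library and a tiny made-up sample. The column name `top_genre` and the column-oriented layout (column name, then clip id, then value) are assumptions rather than the documented format of the metadata file; if the file has that layout, `pandas.read_json` should be able to load it into a DataFrame directly.

```python
import json
from collections import Counter


def top_genre_counts(metadata):
    """Count clips per top genre in column-oriented metadata.

    `metadata` maps column name -> {clip id -> value}.
    The "top_genre" column name is hypothetical.
    """
    return Counter(metadata["top_genre"].values())


def is_balanced(counts):
    """Return True when every genre has the same number of clips."""
    return len(set(counts.values())) <= 1


# Made-up sample for illustration only, not real FMA data.
sample = json.loads("""
{
  "title": {"1": "a", "2": "b", "3": "c", "4": "d"},
  "top_genre": {"1": "Rock", "2": "Rock", "3": "Folk", "4": "Folk"}
}
""")

counts = top_genre_counts(sample)
print(dict(counts))
print("balanced:", is_balanced(counts))
```

On the real data, a check like this would be expected to pass for the balanced dataset and fail for the unbalanced ones.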

Code

This repository features the following notebooks:

Generation: generation of the datasets.
Analysis: loading and basic analysis of the data.
Baselines: baseline models for various tasks.
Usage: how to load the datasets and train your own models.

Installation

# Install Python 3.6 and create a virtual environment.
pyenv install 3.6.0
pyenv virtualenv 3.6.0 fma
pyenv activate fma

# Clone the repository.
git clone https://github.com/mdeff/fma.git
cd fma

# Install the dependencies.
make install

# Fill in the configuration.
cat .env
DATA_DIR=/path/to/fma_small

# Open the Jupyter notebook.
jupyter-notebook

# Or run a notebook.
make fma_baselines.ipynb

External dependencies: ffmpeg.
To train on a GPU, install CUDA; see TensorFlow's installation instructions.
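
The `.env` file shown in the installation steps holds a single `DATA_DIR=...` entry. As a minimal sketch of how such a file could be read from Python, the helper below parses `KEY=VALUE` lines and lets an environment variable of the same name take precedence. The function names `read_env_file` and `resolve_data_dir` are hypothetical and this is not necessarily how the notebooks load their configuration; a dedicated library may be used instead.

```python
import os
import tempfile
from pathlib import Path


def read_env_file(path):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_data_dir(env_path=".env"):
    """Prefer the DATA_DIR environment variable, then the .env file."""
    if "DATA_DIR" in os.environ:
        return os.environ["DATA_DIR"]
    if Path(env_path).is_file():
        return read_env_file(env_path).get("DATA_DIR")
    return None


# Demonstration with a temporary file standing in for the real .env.
with tempfile.TemporaryDirectory() as tmp:
    demo_env = Path(tmp) / ".env"
    demo_env.write_text("# Fill in the configuration.\nDATA_DIR=/path/to/fma_small\n")
    print(read_env_file(demo_env))
```

Letting the environment variable win makes it easy to point a single run at a different dataset size without editing the file.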

License

Please cite our paper if you use our code or data.
The code is released under the terms of the MIT license.
The dataset is meant for research only.
We are grateful to SWITCH and EPFL for hosting the dataset within the context of the SCALE-UP project, funded in part by the swissuniversities SUC P-2 program.
