Downloading publicly available datasets
We provide scripts to download and process the following publicly available datasets:
- AN4 - alphanumeric database
- LibriSpeech - read English audiobooks
- TED-LIUM 3 (ted3) - TED talks
- VoxForge
- Common Voice (old version)
Run the corresponding script in sonosco > datasets > download_datasets
with the output path flag (written as --target-dir in the example below).
The script downloads and processes the dataset, then creates
a manifest file for it.
For example, to download AN4 into temp/data/an4:
python an4.py --target-dir temp/data/an4
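
Once a script has finished, it can be useful to check that the manifest it created points at files that actually exist. The following is a minimal sketch, not part of sonosco: it assumes the manifest is a plain text file with one comma-separated `audio_path,transcript_path` pair per line, which this page does not confirm. The helper names `read_manifest` and `missing_files` are hypothetical.

```python
import csv
from pathlib import Path


def read_manifest(manifest_path):
    """Return (audio_path, transcript_path) pairs from a manifest file.

    ASSUMPTION: one comma-separated pair per line. Adjust the parsing
    if the manifest produced by the download script is laid out differently.
    """
    entries = []
    with open(manifest_path, newline="", encoding="utf-8") as handle:
        for row in csv.reader(handle):
            if len(row) < 2:
                continue  # skip blank or malformed lines
            entries.append((Path(row[0].strip()), Path(row[1].strip())))
    return entries


def missing_files(entries):
    """List every path referenced by the manifest that is not on disk."""
    return [path for pair in entries for path in pair if not path.exists()]
```

Calling `missing_files(read_manifest(...))` with the path of the generated manifest returns an empty list when every referenced audio and transcript file is present.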