List of datasets for machine-learning research

These datasets are used in machine learning (ML) research and have been cited in peer-reviewed academic journals.

Although they do not need to be labeled, high-quality datasets for unsupervised learning can also be difficult and costly to produce.

Data portals suited to a specific subtype of machine-learning application are listed in the subsequent sections.

Further details are provided in the project's GitHub repository and the respective Hugging Face dataset card.

This section includes datasets that ... Taskmaster-2: 17,289 dialogs in seven domains (restaurants, food ordering, movies, hotels, flights, music, and sports).

Further information is provided in the GitHub repository of the project and the Hugging Face data card.

The scripts to process the data are available in the GitHub repository mentioned in the paper: https://github.com/google-research/FLAN/tree/main/flan.