Dataset Concepts Overview
The Datasets module lets you manage and organize machine learning datasets. You can work with dataframes (single CSV files) or with full datasets made up of multiple files.
1. Dataset Client
You interact with this module through a dynamic API client, which lets you:
- Upload, download, or modify dataset files.
- Create and manage dataset versions for iterative improvements.
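The client's actual endpoints and method names are not covered on this page, so the following is only a hypothetical, in-memory sketch of the operations listed above. The class `HypotheticalDatasetClient` and its methods `upload`, `download`, `modify`, and `create_version` are illustrative assumptions, not the module's real API. Here, creating a version simply stores a copy of the current files so earlier contents stay readable.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class HypotheticalDatasetClient:
    """In-memory stand-in for a dataset API client (illustrative only)."""

    # Working copy: file path -> file contents
    files: dict[str, bytes] = field(default_factory=dict)
    # Committed versions: each entry is a copy of the working files at commit time
    versions: list[dict[str, bytes]] = field(default_factory=list)

    def upload(self, path: str, data: bytes) -> None:
        """Add a new file; refuse to silently overwrite an existing one."""
        if path in self.files:
            raise FileExistsError(path)
        self.files[path] = data

    def modify(self, path: str, data: bytes) -> None:
        """Replace the contents of a file that already exists."""
        if path not in self.files:
            raise FileNotFoundError(path)
        self.files[path] = data

    def download(self, path: str, version: int | None = None) -> bytes:
        """Read a file from the working copy, or from a committed version."""
        source = self.files if version is None else self.versions[version]
        return source[path]

    def create_version(self) -> int:
        """Copy the working files into a new version and return its number."""
        self.versions.append(dict(self.files))
        return len(self.versions) - 1


# Usage: upload a file, commit, modify it, and commit again.
client = HypotheticalDatasetClient()
client.upload("train.csv", b"x,y\n1,2\n")
v0 = client.create_version()
client.modify("train.csv", b"x,y\n1,2\n3,4\n")
v1 = client.create_version()
original = client.download("train.csv", version=v0)  # contents as of v0
```

Keeping versions separate from the working copy means later edits never change what an earlier version returns.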
2. Dataframes
In this module, a dataframe corresponds to a single CSV file. This lets you:
- Quickly inspect and manipulate structured data.
- Version small or medium-sized datasets without extra overhead.
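Because a dataframe is a single CSV file, ordinary CSV tooling is enough to inspect and filter it locally. The sketch below uses only Python's standard `csv` module; the sample data, the `label` and `score` columns, and the helper functions `load_rows`, `label_counts`, and `to_csv` are hypothetical and not part of the module.

```python
from __future__ import annotations

import csv
import io

# Hypothetical sample data standing in for one dataframe (a single CSV file).
CSV_TEXT = """id,label,score
1,cat,0.91
2,dog,0.47
3,cat,0.78
"""


def load_rows(text: str) -> list[dict[str, str]]:
    """Parse CSV text into a list of rows keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def label_counts(rows: list[dict[str, str]]) -> dict[str, int]:
    """Count how many rows carry each value of the `label` column."""
    counts: dict[str, int] = {}
    for row in rows:
        counts[row["label"]] = counts.get(row["label"], 0) + 1
    return counts


def to_csv(rows: list[dict[str, str]], fieldnames: list[str]) -> str:
    """Serialize rows back to CSV text, for example before storing a new version."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = load_rows(CSV_TEXT)
columns = list(rows[0].keys())                              # inspect the schema
high_conf = [r for r in rows if float(r["score"]) >= 0.75]  # filter rows
updated_csv = to_csv(high_conf, columns)
```

All values come back from `csv.DictReader` as strings, which is why the filter converts `score` with `float` before comparing.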
3. Key Benefits
- Accessibility: Access data from any location or machine, boosting availability across teams.
- Storage Optimization: Use a hierarchical structure to inherit files from parent datasets, reducing redundancy.
- Versioning: Create multiple dataset versions to track changes and streamline updates or file removals.
Next Steps
- Datasets UI – Learn how to visually manage dataframes and dataset files.
- Dataset Client REST API – Automate dataset creation and versioning through API calls.
- Dataframes – Explore single CSV file handling in the “Dataframes” module.