As organizations generate larger volumes of research, operational, AI, and business data, managing how people identify and reference that data becomes increasingly important.
Historically, data collections have often been referenced by technical storage locations, cloud platforms, folder structures, bucket names, or file paths. While this may work for technical teams, it creates confusion for researchers, analysts, business users, and external collaborators who simply need access to the correct data.
MLADU Data Sets solve this challenge by providing a standardized way to define, organize, and reference collections of data regardless of where the data resides.
By separating data identification from technical connectivity, MLADU helps organizations transfer data faster, reduce errors, and improve collaboration across internal teams and external partners.
A MLADU Data Set represents a collection of data that exists at a specific Data Station.
A Data Set may contain:
Rather than identifying data by its storage location, MLADU allows organizations to define the data collection itself as a uniquely identifiable asset.
Each Data Set is assigned:
This allows users to quickly locate and reference the correct collection of data without needing to understand how or where the data is stored.
A Data Set exists at a Data Station, but it serves a different purpose.
A Data Station defines how MLADU connects to a data storage platform or environment.
A Data Set defines the data collection itself.
Think of it this way:
This separation creates a more intuitive experience for users while allowing technical administrators to manage connectivity independently.
Example: Multiple Researchers Using the Same Data Station
Imagine two researchers working within the same AWS S3 environment.
Both researchers access the same Data Station:
Data Station
However, they use different Data Sets:
Data Set #1
Data Set #2
Data Set #3
Although the data resides within the same storage platform, the Data Sets allow users to clearly distinguish between different collections of information without confusion.
Example: CRO Data Delivery
Contract Research Organizations (CROs) frequently deliver data to sponsors and clients.
Instead of asking clients to locate data within a storage platform, the CRO can define and publish a clearly named Data Set such as:
Data Set Name
Description
The client can immediately identify the correct Data Set without needing to understand the underlying storage architecture.
MLADU Data Sets were designed around a simple principle:
People who need data should not have to think about technical connectivity, infrastructure, integration platforms, or security configurations.
They simply want access to the correct data as quickly as possible.
Simplifying Data Identification
As organizations adopt more cloud platforms, AI systems, research repositories, and analytics environments, it becomes increasingly difficult to identify data based solely on technical storage locations.
Consider the difference between these two references:
Traditional Reference
Azure Storage Container XYZ, Folder ABC, Snapshot March 2026
MLADU Reference
XYZ March 2026
The second example is dramatically easier to understand, communicate, and use.
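The idea behind the MLADU Reference can be sketched as a catalog lookup: users refer to a Data Set by its name, and only the catalog maps that name to the Data Station and location behind it. This is a minimal, hypothetical Python illustration built from the example above; `StorageLocation`, `CATALOG`, and `resolve` are invented names, not part of any MLADU API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageLocation:
    station: str  # the Data Station (connectivity), e.g. an Azure Storage container
    path: str     # where the collection lives at that station


# Hypothetical catalog: users reference Data Sets by name, and only the
# catalog needs to know where each collection is physically stored.
CATALOG: dict[str, StorageLocation] = {
    "XYZ March 2026": StorageLocation(
        station="Azure Storage Container XYZ",
        path="ABC/Snapshot March 2026",
    ),
}


def resolve(data_set_name: str) -> StorageLocation:
    """Translate a human-friendly Data Set name into its storage location."""
    try:
        return CATALOG[data_set_name]
    except KeyError:
        raise KeyError(f"Unknown Data Set: {data_set_name!r}") from None


if __name__ == "__main__":
    location = resolve("XYZ March 2026")
    print(location.station, location.path)
```

Because the name is the only thing users exchange, an administrator could move the collection to a different container or path by updating a single catalog entry, without changing how anyone refers to the Data Set.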
Eliminating Naming Confusion
One of the most common challenges in large organizations is that different teams often refer to the same data collection using different names.
Examples include:
All of these may refer to the same information.
MLADU Data Sets create a single authoritative definition that everyone can reference consistently.
Faster Data Transfers
Because users can quickly identify the correct Data Set, transfer setup becomes significantly faster.
Instead of navigating technical paths and verifying storage locations, users simply select the desired Data Set and proceed with the transfer.
Over hundreds or thousands of transfers, these efficiencies save substantial time and reduce operational complexity.
MLADU uses role-based access controls to ensure Data Sets are managed consistently and securely.
Only two roles can modify Data Set definitions and metadata.
Data Owner
The Data Owner is responsible for managing the Data Set catalog within the organization.
Think of the Data Owner as a data steward, curator, or chief librarian.
Responsibilities include:
The Data Owner helps ensure that users can easily locate and understand available data collections.
Portal Admin
The Portal Admin serves as the MLADU account superuser.
As a backup administrator, the Portal Admin can also create, modify, and manage Data Sets when necessary.
Standard Users
All other MLADU users are limited to viewing:
This approach allows users to discover and leverage available data while maintaining centralized governance.
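The modification rule above can be modeled as a simple role check. The following is a minimal, hypothetical Python sketch; the `Role` enum and the `can_modify_data_set` function are illustrative names, not part of any MLADU API.

```python
from enum import Enum


class Role(Enum):
    """Hypothetical role names mirroring the roles described above."""
    DATA_OWNER = "Data Owner"
    PORTAL_ADMIN = "Portal Admin"
    STANDARD_USER = "Standard User"


# Only these two roles may create or modify Data Set definitions and metadata.
EDITOR_ROLES = frozenset({Role.DATA_OWNER, Role.PORTAL_ADMIN})


def can_modify_data_set(role: Role) -> bool:
    """Return True if the role may change Data Set definitions or metadata."""
    return role in EDITOR_ROLES


if __name__ == "__main__":
    for role in Role:
        print(f"{role.value}: modify={can_modify_data_set(role)}")
```

Keeping the editor roles in one set makes the governance rule explicit: adding or removing a role with edit rights is a single, auditable change.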
To support governance and compliance requirements, MLADU maintains a detailed audit history of Data Set modifications.
Organizations can track:
This visibility supports regulatory compliance, operational accountability, and data governance initiatives.
MLADU provides flexible visibility controls to support both internal collaboration and external data sharing.
Private Data Sets
By default, all Data Sets are private.
Private Data Sets are visible only within your MLADU organization and can be selected by authorized users during transfer creation.
This model supports most enterprise and research use cases while maintaining strong control over data visibility.
Public Data Sets
Organizations that distribute data to external parties can choose to make Data Sets publicly available.
For a Data Set to be public, it must exist at a Data Station that is also configured as public.
Once a Data Set is made public, any MLADU user can discover it by viewing the associated public Data Station.
External users can see:
This allows organizations to advertise available data collections without exposing technical connectivity details.
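The visibility rules in this section can be expressed as a small check. This is a minimal Python sketch under stated assumptions: the `DataStation` and `DataSet` classes, the `owner` field, and `is_visible_to` are hypothetical names chosen for illustration, not MLADU's actual data model or API.

```python
from dataclasses import dataclass


@dataclass
class DataStation:
    name: str
    public: bool = False


@dataclass
class DataSet:
    name: str
    owner: str             # the MLADU organization that defines this Data Set (assumed field)
    station: DataStation
    public: bool = False   # Data Sets are private by default

    def make_public(self) -> None:
        # A Data Set can only be public if its Data Station is also public.
        if not self.station.public:
            raise ValueError(f"Data Station {self.station.name!r} is not public")
        self.public = True


def is_visible_to(data_set: DataSet, viewer_organization: str) -> bool:
    """Return True if users in viewer_organization can discover the Data Set."""
    # Private Data Sets are visible only within the owning organization.
    if data_set.owner == viewer_organization:
        return True
    # Outside that organization, both the Data Set and its Data Station must be public.
    return data_set.public and data_set.station.public
```

Checking the Data Station's setting at lookup time, not only when `make_public` is called, means that making a Data Station private again would also hide its Data Sets from external users in this model.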
Ideal for Data Vendors and Research Organizations
Public Data Sets are particularly valuable for:
For example, a research consortium could publish:
Data Set Name
Description
Researchers can easily discover and request the appropriate data without requiring manual coordination.
The number of Data Sets available to your organization is determined by two factors.
1. Your MLADU Subscription Plan
Every MLADU subscription includes a predefined number of Data Sets.
This allows organizations to start with a cost-effective solution that meets their initial requirements.
2. Additional Data Set Capacity
Organizations with larger data catalogs can purchase additional Data Sets as needed.
This flexible approach allows customers to expand their data catalog without overpaying for unused capacity.
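Assuming the two factors simply add together, the resulting allowance can be sketched as below. The per-plan quantities are not specified here, so the numbers in the usage example are placeholders, and `DataSetCapacity` and its fields are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSetCapacity:
    plan_included: int             # Data Sets included with the subscription plan
    purchased_additional: int = 0  # extra Data Sets purchased separately

    @property
    def total(self) -> int:
        """Total allowance, assuming included and purchased capacity add together."""
        return self.plan_included + self.purchased_additional

    def can_create(self, current_count: int) -> bool:
        """Return True if one more Data Set fits within the total allowance."""
        return current_count < self.total


if __name__ == "__main__":
    # Placeholder numbers, not actual MLADU plan limits.
    capacity = DataSetCapacity(plan_included=10, purchased_additional=5)
    print(capacity.total, capacity.can_create(14), capacity.can_create(15))
```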
Whether you manage dozens of Data Sets or tens of thousands of research collections, MLADU provides a scalable framework for organizing and transferring data efficiently.
MLADU Data Sets provide a smarter way to organize, identify, and transfer data.
By separating data collections from technical connectivity, organizations can improve user experiences, reduce confusion, accelerate data transfers, and strengthen data governance.
Whether you are managing clinical trial data, genomic research, AI training datasets, financial information, or operational records, MLADU Data Sets make it easier for users to find the right data and move it where it needs to go.
Ready to simplify how your organization manages and transfers data?
Start a free MLADU trial and experience how Data Sets and Data Stations work together to create a faster, more secure, and more scalable data transfer platform.
Prefer a guided walkthrough?
Schedule a personalized MLADU demonstration and see how organizations are using MLADU to organize, discover, and transfer terabytes and petabytes of data with confidence.