- NannyML Cloud is deployed as a containerized application on a dedicated Kubernetes cluster, making it more scalable, robust and resource-efficient.
- Application configuration and model inputs/outputs are stored in databases running in the cluster. The volumes are persisted using Azure Managed Disks.
- Network traffic is routed through an Azure Load Balancer (ALB). The ALB instance has a dynamic public IP assigned to it. It is also exposed using a prefix of your choosing as
Managed application architecture
- Only a NannyML support team member can access the cluster, after explicit approval has been granted by you.
- We’re in the process of implementing an automated backup solution.
- NannyML is not intended to serve as your primary storage for model inputs/outputs. Please safeguard that data using alternative storage (e.g. Azure Delta Lake).
- NannyML integrates with Azure Active Directory (Azure AD) for user authentication.
- After you register NannyML as an application in Azure AD, you can control user access to the NannyML Cloud application using Azure AD users and groups.
- A subscription and/or resource group in Azure where you have sufficient permissions to create resources.
- An application identifier and tenant ID, obtained from performing an “app registration” in Azure AD.
The infrastructure cost of NannyML consists of three components: compute cost, network traffic cost and storage cost. Compute contributes the most to the overall cost. You can influence it by adjusting the compute infrastructure NannyML uses. The factors you can adjust are:
- Cluster size: set a fixed number of nodes for strict cost control, or increase robustness and scalability by enabling cluster auto-scaling. Fewer nodes are cheaper; more nodes increase your computation power and failure tolerance.
- Node type: select the VM size that best matches your workload. NannyML computations are typically not strictly time-bound but might use a lot of memory for larger datasets. Balance cost against a future-proof setup when selecting CPU and memory sizes. Keep in mind that workloads in Kubernetes fail when they run out of memory but are only throttled when they hit CPU limits, so undersized memory is the riskier shortfall.
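To get a feel for how the three cost components combine, the following is a rough sketch of a monthly estimate. All names (`CostInputs`, `estimate_monthly_cost`) and every rate passed in are hypothetical placeholders, not NannyML parameters or actual Azure prices; look up current rates for your region before relying on any figure.

```python
from dataclasses import dataclass

# Common approximation for the number of billable hours in a month.
HOURS_PER_MONTH = 730


@dataclass
class CostInputs:
    """Hypothetical inputs; all rates are placeholders, not real Azure prices."""
    node_count: int               # nodes running in the cluster
    node_hourly_rate: float       # price per node per hour
    storage_gb: float             # provisioned managed disk capacity
    storage_rate_per_gb: float    # price per GB per month
    egress_gb: float              # outbound network traffic per month
    egress_rate_per_gb: float     # price per GB of outbound traffic


def estimate_monthly_cost(c: CostInputs) -> dict:
    """Split a monthly estimate into compute, storage and network parts."""
    compute = c.node_count * c.node_hourly_rate * HOURS_PER_MONTH
    storage = c.storage_gb * c.storage_rate_per_gb
    network = c.egress_gb * c.egress_rate_per_gb
    return {
        "compute": compute,
        "storage": storage,
        "network": network,
        "total": compute + storage + network,
    }


if __name__ == "__main__":
    # Compare a single-node cluster with the same cluster scaled to 3 nodes.
    for nodes in (1, 3):
        estimate = estimate_monthly_cost(
            CostInputs(nodes, 0.2, 100, 0.1, 50, 0.08)  # placeholder rates
        )
        print(nodes, estimate)
```

Because compute scales linearly with the node count while storage and network stay constant, comparing the one-node and three-node estimates gives a quick bound on what auto-scaling can cost.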
This is what a minimal NannyML setup looks like:
- Auto-scaling cluster, minimum of 1 node, maximum of 3 nodes. Start off at a single node, favoring lower cost over robustness.
- Nodes of a “General Purpose” type like the Azure Dsv3 series. For a small number of models, sizes such as Standard_D4s_v3 are excellent options.
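The minimal setup above can be written down as a small, checkable record, which is handy when you keep deployment choices under version control. This is only an illustrative sketch: `ClusterSetup` and its field names are hypothetical and do not correspond to actual NannyML or Azure deployment parameters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterSetup:
    """Hypothetical description of a cluster sizing choice."""
    autoscaling: bool
    min_nodes: int
    max_nodes: int
    vm_size: str

    def validate(self) -> None:
        """Raise ValueError if the node bounds are inconsistent."""
        if self.min_nodes < 1:
            raise ValueError("at least one node is required")
        if self.max_nodes < self.min_nodes:
            raise ValueError("max_nodes must be >= min_nodes")
        if not self.autoscaling and self.min_nodes != self.max_nodes:
            raise ValueError("a fixed-size cluster needs min_nodes == max_nodes")


# The minimal setup described in the text: auto-scaling from 1 to 3 nodes
# on a General Purpose VM size.
MINIMAL_SETUP = ClusterSetup(
    autoscaling=True,
    min_nodes=1,
    max_nodes=3,
    vm_size="Standard_D4s_v3",
)
MINIMAL_SETUP.validate()
```

A fixed-size cluster for strict cost control would instead set `autoscaling=False` with equal `min_nodes` and `max_nodes`.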