Azure Storage Account Data Lake
Data Lake Storage Gen2 makes Azure Storage the foundation for building enterprise data lakes on Azure. Designed from the start to service multiple petabytes of information while sustaining hundreds of gigabits of throughput, Data Lake Storage Gen2 allows you to easily manage massive amounts of data.
Made by: Massdriver
Official: Yes
Azure Data Lake Storage Account
Azure Data Lake Storage (ADLS) is a highly scalable and secure data lake that allows enterprises to capture data of any size, type, and ingestion speed for operational and analytics workloads. ADLS Gen2 is built on Azure Blob Storage and adds a hierarchical namespace, which enables fine-grained access controls, high-performance data access, and integration with other Azure services such as Azure Synapse Analytics and Azure Databricks.
Design Decisions
- Enable Data Lake: The hierarchical namespace is enabled on the storage account, which provides Data Lake capabilities and efficient, directory-based data access.
- Replication and Redundancy: Configurable replication types based on the need for redundancy and disaster recovery. Options include LRS, ZRS, GRS, and RA-GRS.
- Access Tiers: Includes support for hot and cool access tiers to optimize costs based on data access patterns.
- Blob Properties: Includes configuration for delete retention policies to prevent accidental deletions.
- Monitoring and Alarms: Automatic configuration of monitoring alerts for availability and latency to ensure system health and performance.
- IAM Roles: Configures appropriate IAM roles for read and read/write access using Azure RBAC.
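The settings listed above can be inspected on a deployed account with the Azure CLI. The following is a minimal sketch rather than part of the bundle: the function names `show_account_settings` and `show_delete_retention` are hypothetical, and the arguments stand in for your own storage account and resource group.

```shell
# Hypothetical helper: summarize the hierarchical namespace flag,
# replication SKU, and access tier of a storage account.
# Usage: show_account_settings <storage_account_name> <resource_group>
show_account_settings() {
  az storage account show \
    --name "$1" \
    --resource-group "$2" \
    --query "{hierarchicalNamespace:isHnsEnabled, replication:sku.name, accessTier:accessTier}" \
    --output table
}

# Hypothetical helper: show the blob delete retention policy.
# Usage: show_delete_retention <storage_account_name> <resource_group>
show_delete_retention() {
  az storage account blob-service-properties show \
    --account-name "$1" \
    --resource-group "$2" \
    --query deleteRetentionPolicy
}
```

For example, `show_account_settings <storage_account_name> <resource_group>` should report the hierarchical namespace as enabled for an account created with Data Lake support.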
Runbook
Checking Storage Account Availability
If the storage account is not accessible, verify its availability status.
az storage account show --name <storage_account_name> --resource-group <resource_group>
Check the statusOfPrimary value to ensure it is available.
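To read only statusOfPrimary instead of the full JSON document, the same command can be narrowed with a JMESPath query. This is a minimal sketch; the function names `primary_status` and `is_primary_available` are hypothetical, and the arguments are placeholders for your own account and resource group.

```shell
# Hypothetical helper: print only the status of the primary location.
# Usage: primary_status <storage_account_name> <resource_group>
primary_status() {
  az storage account show \
    --name "$1" \
    --resource-group "$2" \
    --query statusOfPrimary \
    --output tsv
}

# Hypothetical helper: succeed only when the primary location reports "available".
# Usage: is_primary_available <storage_account_name> <resource_group>
is_primary_available() {
  [ "$(primary_status "$1" "$2")" = "available" ]
}
```

A check such as `is_primary_available <storage_account_name> <resource_group> || echo "primary location is not available"` can then be used in a script.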
Access Issues with Storage Account
If there are permission or access issues, verify the IAM role assignments.
az role assignment list --scope /subscriptions/<subscription_id>/resourceGroups/<resource_group>/providers/Microsoft.Storage/storageAccounts/<storage_account_name>
Ensure that the required Storage Blob Data Reader and Storage Blob Data Contributor roles are assigned.
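The role check can be narrowed to just those two roles. The sketch below assumes the roles may also be granted at a parent scope (resource group or subscription), so it passes `--include-inherited`; the function names `storage_scope` and `list_blob_data_roles` are hypothetical.

```shell
# Hypothetical helper: build the storage account resource ID used as the scope.
# Usage: storage_scope <subscription_id> <resource_group> <storage_account_name>
storage_scope() {
  printf '/subscriptions/%s/resourceGroups/%s/providers/Microsoft.Storage/storageAccounts/%s\n' \
    "$1" "$2" "$3"
}

# Hypothetical helper: list principals holding either blob data role on the account,
# including assignments inherited from parent scopes.
# Usage: list_blob_data_roles <subscription_id> <resource_group> <storage_account_name>
list_blob_data_roles() {
  az role assignment list \
    --scope "$(storage_scope "$1" "$2" "$3")" \
    --include-inherited \
    --query "[?roleDefinitionName=='Storage Blob Data Reader' || roleDefinitionName=='Storage Blob Data Contributor'].{principal:principalName, role:roleDefinitionName}" \
    --output table
}
```

An empty table suggests that neither role is assigned at or above the storage account scope.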
Monitoring Alerts for Latency and Availability
To troubleshoot latency or availability issues, check the configured metric alerts.
az monitor metrics alert list --resource-group <resource_group>
Verify that alerts for availability and latency are set up correctly and check any triggered alerts.
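When a resource group holds alerts for several resources, the list can be filtered to the alerts whose scopes include the storage account. This is a sketch under the assumption that the resource ID is passed with the same casing that the alert stores in its `scopes` array; the function name `alerts_for_resource` is hypothetical.

```shell
# Hypothetical helper: show metric alerts in a resource group whose scopes
# include the given resource ID.
# Usage: alerts_for_resource <resource_group> <resource_id>
alerts_for_resource() {
  az monitor metrics alert list \
    --resource-group "$1" \
    --query "[?contains(scopes, '$2')].{name:name, enabled:enabled, severity:severity}" \
    --output table
}
```

The `enabled` column gives a quick way to spot an alert that exists but has been switched off.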
Data Access Latency
If end-to-end (E2E) or server latency is high, check the corresponding metrics.
az monitor metrics list --resource /subscriptions/<subscription_id>/resourceGroups/<resource_group>/providers/Microsoft.Storage/storageAccounts/<storage_account_name> --metrics SuccessE2ELatency SuccessServerLatency
Review the metrics values and compare them against the configured thresholds to identify if there are any performance issues.
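The comparison against a threshold can be scripted. The sketch below requests the average of both latency metrics using the space-separated `--metrics` form and adds a small numeric comparison; the function names `latency_metrics` and `exceeds_threshold` are hypothetical, and the threshold you pass in is an assumption to replace with the value configured on your alert.

```shell
# Hypothetical helper: print average E2E and server latency for a storage account.
# Usage: latency_metrics <resource_id>
latency_metrics() {
  az monitor metrics list \
    --resource "$1" \
    --metrics SuccessE2ELatency SuccessServerLatency \
    --aggregation Average \
    --output table
}

# Hypothetical helper: succeed when a latency value is above a threshold.
# Both arguments are numbers in milliseconds; decimals are allowed.
# Usage: exceeds_threshold <value_ms> <threshold_ms>
exceeds_threshold() {
  awk -v v="$1" -v t="$2" 'BEGIN { exit !(v + 0 > t + 0) }'
}
```

A large gap between SuccessE2ELatency and SuccessServerLatency usually points at the network or the client rather than the storage service, since server latency excludes time spent outside Azure Storage.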
Ensure appropriate actions are taken based on the alerts and metrics observations. Use the Azure portal or CLI to adjust configurations as needed.
Variable | Type | Description |
---|---|---|
account.access_tier | string | How frequently the data will be accessed. Hot data is accessed frequently: it costs less to access but more to store. Cool data is accessed less frequently: it costs more to access but less to store. |
account.region | string | The region where the storage account will be created. Cannot be changed after deployment. |
account.tier | string | The performance tier of the storage account. Premium storage accounts do not support geo-replication. Cannot be changed after deployment. |
monitoring.mode | string | Enable and customize storage account metric alarms. |
redundancy.data_protection | integer | Set the number of days to allow data recovery if data is deleted (minimum 1, maximum 365). |
redundancy.replication_type | string | The replication type for the storage account (for example LRS, ZRS, GRS, or RA-GRS). |
redundancy.zone_redundancy | boolean | Enable zone redundancy for the storage account. Cannot be changed after deployment. |