AWS EKS Cluster
Amazon Elastic Kubernetes Service (EKS) is a managed service for running Kubernetes, an open source container orchestration platform that automates many of the manual processes involved in deploying, managing, and scaling containerized applications.
Made by: Massdriver
Official: Yes
Clouds
Tags
aws-eks-cluster
AWS EKS (Elastic Kubernetes Service) is Amazon's managed Kubernetes service. It simplifies deploying, operating, and scaling containerized applications, and offers automatic scaling of worker nodes, automated upgrades and patching, integration with other AWS services, and access to the Kubernetes community and ecosystem.
Use Cases
Container orchestration
Kubernetes is a widely adopted container orchestrator that makes it easier to deploy, scale, and manage containerized applications.
Microservices architecture
Kubernetes can be used to build and manage microservices-based applications, allowing for flexibility and scalability in a distributed architecture.
Big Data and Machine Learning
Kubernetes can be used to deploy and manage big data and machine learning workloads, providing scalability and flexibility for processing and analyzing large data sets.
Internet of Things (IoT)
Kubernetes can be used to manage and orchestrate IoT applications, providing robust management and scaling capabilities for distributed IoT devices and gateways.
Design
EKS provides a "barebones" Kubernetes control plane, meaning that it only includes the essential components required to run a Kubernetes cluster. These components include the Kubernetes API server, etcd (a distributed key-value store for storing Kubernetes cluster data), the controller manager and the scheduler.
To simplify deploying and operating a Kubernetes cluster, this bundle includes numerous optional addons to deliver a fully capable and feature-rich cluster that's ready for production workloads. Some of these addons are listed below.
Cluster Autoscaler
A cluster autoscaler is installed into every EKS cluster to automatically scale the number of nodes based on workload demand, adding nodes when pods cannot be scheduled and removing nodes that are underutilized. This provides numerous benefits such as cost efficiency, higher availability, and better resource utilization.
NGINX Ingress Controller
Users can optionally install the "official" Kubernetes NGINX ingress controller (not to be confused with the separate ingress controller maintained by NGINX, which is also offered in a paid NGINX Plus edition) into their cluster, which allows workloads in the EKS cluster to be accessible from the internet.
External-DNS and Cert-Manager
If users associate one or more Route53 domains with their EKS cluster, this bundle will automatically install external-dns and cert-manager in the cluster, allowing the cluster to automatically create and manage DNS records and TLS certificates for internet-accessible workloads.
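As a rough sketch of how a workload might opt in to this automation, the helper below prints an Ingress manifest that requests a certificate through a cert-manager annotation. The function name, host, service name, port, and the "nginx" ingress class are hypothetical placeholders, and whether external-dns picks up the host depends on how it is configured in your cluster. The issuer name "letsencrypt-prod" is taken from the runbook commands in this document.

```shell
# Print an Ingress manifest for a service exposed through the ingress controller.
# Assumptions: the ingress class is named "nginx", the service listens on port 80,
# and a ClusterIssuer named "letsencrypt-prod" exists in the cluster.
ingress_manifest() {
  host="$1"
  service="$2"
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ${service}
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - ${host}
      secretName: ${service}-tls
  rules:
    - host: ${host}
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: ${service}
                port:
                  number: 80
EOF
}

# Usage (requires kubectl access to the cluster; names are placeholders):
#   ingress_manifest app.example.com my-app | kubectl apply -f -
```

Printing the manifest instead of applying it directly makes it easy to review the output before it reaches the cluster.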
EBS CSI Driver
Beginning in Kubernetes version 1.23, EKS no longer comes with the default EBS provisioner. In order to allow users to continue using the default gp2 storage class, this bundle includes the EBS CSI Driver, which replaces the deprecated EBS provisioner.
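To illustrate how a workload might request EBS-backed storage through the gp2 storage class, here is a minimal sketch that prints a PersistentVolumeClaim manifest. The function name, claim name, and size are hypothetical; only the gp2 class name comes from the text above.

```shell
# Print a PersistentVolumeClaim that requests a volume from the gp2 storage class.
# EBS volumes attach to a single node at a time, hence ReadWriteOnce.
pvc_manifest() {
  name="$1"
  size="$2"
  cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ${name}
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: gp2
  resources:
    requests:
      storage: ${size}
EOF
}

# Usage (requires kubectl access to the cluster; values are placeholders):
#   pvc_manifest data 10Gi | kubectl apply -f -
```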
EFS CSI Driver
Optionally, users can also install the EFS CSI Driver, which allows the EKS cluster to attach EFS volumes to cluster workloads for persistent storage. EFS volumes offer some benefits over EBS volumes, such as allowing multiple pods to use the volume simultaneously (ReadWriteMany) and not being locked to a single AWS availability zone, but these benefits come with higher storage costs and increased latency.
Fargate
Fargate can be enabled to allow AWS to provide on-demand, right-sized compute capacity for running containers on EKS without managing node pools or clusters of EC2 instances.
For workloads that require high uptime, it's recommended to keep some node pools populated even when enabling Fargate to ensure compute is always available during surges.
Fargate has notable limitations: for example, DaemonSets, privileged containers, and GPUs are not supported on Fargate. Refer to the AWS Fargate documentation for the full list.
Currently only namespace selectors are implemented. If you need label selectors, please file an issue.
Best Practices
Managed Node Groups
Worker nodes in the cluster are provisioned as managed node groups.
Secure Networking
The cluster follows AWS's EKS networking best practices, including deploying nodes in private subnets and only deploying public load balancers into public subnets.
Cluster Autoscaler
A cluster autoscaler is automatically installed to provide node autoscaling as workload demand increases.
OpenID Connect (OIDC) Provider
The cluster is pre-configured for out-of-the-box support of IAM Roles for Service Accounts (IRSA).
Security
Nodes Deployed into Private Subnets
Worker nodes are provisioned into private subnets for security.
IAM Roles for Service Accounts
IRSA allows Kubernetes pods to assume AWS IAM roles, removing the need for static credentials to access AWS services.
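In practice, IRSA links a service account to a role through the eks.amazonaws.com/role-arn annotation. The sketch below builds that annotation; the function name, account ID, role name, and service account name are hypothetical, and the role's trust policy must separately allow the cluster's OIDC provider.

```shell
# Build the annotation that links a Kubernetes service account to an IAM role.
irsa_annotation() {
  account_id="$1"
  role_name="$2"
  printf 'eks.amazonaws.com/role-arn=arn:aws:iam::%s:role/%s\n' "$account_id" "$role_name"
}

# Usage (requires kubectl access to the cluster; names are placeholders):
#   kubectl annotate serviceaccount my-app -n default \
#     "$(irsa_annotation 111122223333 my-app-role)"
```

Pods that use the annotated service account receive temporary credentials for the role instead of long-lived access keys.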
Secret Encryption
An AWS KMS key is created and associated with the cluster to enable encryption of secrets at rest.
IMDSv2 Required on Node Groups
Instance Metadata Service version 2 (IMDSv2) is required on all EKS node groups. IMDSv1, which was exploited in the 2019 Capital One data breach, is disabled on all node groups.
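For context, IMDSv2 is session-oriented: a caller first obtains a token with a PUT request and then presents it on every metadata request, which makes the service harder to abuse through simple request-forwarding flaws. A minimal sketch of that flow follows; the function name is hypothetical, and it only works from inside an EC2 instance.

```shell
# Query instance metadata the IMDSv2 way: fetch a session token with a PUT
# request, then send it as a header on the metadata request.
imds_get() {
  path="$1"
  token="$(curl -sf --connect-timeout 2 -X PUT "http://169.254.169.254/latest/api/token" \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")" || return 1
  curl -sf --connect-timeout 2 -H "X-aws-ec2-metadata-token: ${token}" \
    "http://169.254.169.254/latest/${path}"
}

# Usage (from an EC2 instance):
#   imds_get meta-data/instance-id
```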
Connecting
After you have deployed a Kubernetes cluster through Massdriver, you may want to interact with the cluster using the powerful kubectl command line tool.
Install Kubectl
You will first need to install kubectl to interact with the Kubernetes cluster. Installation instructions for Windows, Mac, and Linux can be found in the official Kubernetes documentation.
Note: While kubectl generally has forwards and backwards compatibility of core capabilities, it is best if your kubectl client version matches your Kubernetes cluster version. This ensures the best stability and compatibility for your client.
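One way to check how far apart the client and cluster versions are is to compare their minor version numbers. The helpers below are a small sketch with hypothetical names; upstream Kubernetes documents kubectl as supported within one minor version of the API server.

```shell
# Extract the minor version number from strings like "v1.29.3" or "1.30".
minor_of() {
  echo "$1" | sed -E 's/^v?[0-9]+\.([0-9]+).*/\1/'
}

# Print the absolute difference between the minor versions of two releases.
minor_skew() {
  a="$(minor_of "$1")"
  b="$(minor_of "$2")"
  if [ "$a" -ge "$b" ]; then echo $((a - b)); else echo $((b - a)); fi
}

# Usage: read both versions from `kubectl version -o json` (the clientVersion
# and serverVersion gitVersion fields), then compare them, for example:
#   minor_skew v1.29.3 v1.30.1
```

A result greater than 1 suggests installing a kubectl release closer to the cluster version.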
Download the Kubeconfig File
The standard way to manage connection and authentication details for Kubernetes clusters is through a configuration file called a kubeconfig file. The kubernetes-cluster artifact that is created when you make a Kubernetes cluster in Massdriver contains the basic information needed to create a kubeconfig file. Because of this, Massdriver makes it very easy for you to download a kubeconfig file that will allow you to use kubectl to query and administer your cluster.
To download a kubeconfig file for your cluster, navigate to the project and target where the Kubernetes cluster is deployed and hover the mouse over the artifact connection port. This opens a window that allows you to download the artifact as raw JSON or as a kubeconfig YAML file. Select "Kube Config" from the dropdown and click the button. This will download the kubeconfig for the Kubernetes cluster to your local system.
Use the Kubeconfig File
Once the kubeconfig file is downloaded, you can move it to your desired location. By default, kubectl will look for a file named config located in the $HOME/.kube directory. If you would like this to be your default configuration, you can rename and move the file to $HOME/.kube/config.
A single kubeconfig file can hold multiple cluster configurations, and you can select your desired cluster through the use of contexts. Alternatively, you can have multiple kubeconfig files and select your desired file through the KUBECONFIG environment variable or the --kubeconfig flag in kubectl.
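The options above might look like the following in a shell session. This is a sketch: the helper name, file paths, and context name are placeholders. KUBECONFIG accepts several paths separated by ":" on Linux and macOS, and kubectl merges the files it lists.

```shell
# Join several kubeconfig paths into a single KUBECONFIG value.
join_kubeconfigs() {
  joined=""
  for f in "$@"; do
    joined="${joined:+${joined}:}${f}"
  done
  echo "$joined"
}

# Usage (requires kubectl; paths and context names are placeholders):
#   export KUBECONFIG="$(join_kubeconfigs "$HOME/.kube/config" "$HOME/.kube/eks-prod.yaml")"
#   kubectl config get-contexts             # list the contexts kubectl can see
#   kubectl config use-context my-context   # switch the active context
#   kubectl --kubeconfig "$HOME/.kube/eks-prod.yaml" get nodes   # one-off override
```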
Once you've configured your environment properly, you should be able to run kubectl commands.
Runbook
Troubleshooting EKS Cluster Connectivity Issues
The Kubernetes cluster endpoint might be unreachable. Verify connectivity and authentication.
Check Cluster Endpoint
aws eks describe-cluster --name your-cluster-name --query "cluster.endpoint"
Ensure the endpoint is reachable from your network.
Check Authentication Token
aws eks get-token --cluster-name your-cluster-name
Verify that the token is generated without errors.
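Kubernetes API Server Logs
aws logs tail /aws/eks/your-cluster-name/cluster --log-stream-name-prefix kube-apiserver --since 1h
EKS runs the API server on the AWS-managed control plane, so its pods are not visible to kubectl. If API server logging is enabled for the cluster, this command shows recent API server logs from CloudWatch for debugging potential issues.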
Certificate Issues with Cert-Manager
Cert-Manager might fail to issue certificates due to misconfigurations or API rate limits.
Check Cert-Manager Logs
kubectl logs -n md-core-services -l app=cert-manager
Look for errors indicating why certificates might be failing.
Validate ClusterIssuer
kubectl describe clusterissuer letsencrypt-prod
Ensure that the ClusterIssuer configuration is correct and the ACME server is reachable.
DNS Resolution Problems with External-DNS
DNS records might fail to update in Route 53.
Check External-DNS Logs
kubectl logs -n md-core-services -l app=external-dns
Identify any error messages related to DNS updates or API limits.
Verify Route 53 Hosted Zones
aws route53 list-hosted-zones
Ensure that the hosted zone IDs and names match the Route 53 domains associated with the cluster.
EBS CSI Driver Storage Issues
Persistent Volumes may fail to provision or attach to nodes.
Check EBS CSI Driver Logs
kubectl logs -n kube-system -l app=ebs-csi-controller
Review logs to identify issues with volume provisioning or attachment.
Manually Describe a Volume
aws ec2 describe-volumes --volume-ids vol-xxxxxxx
Verify the status and details of the problematic volume directly.
Pod Scheduling Problems (Cluster Autoscaler)
Pods might remain in "Pending" state due to lack of resources or other scheduling issues.
Check Cluster Autoscaler Logs
kubectl logs -n kube-system -l app=cluster-autoscaler
Look for reasons why the autoscaler might not be scaling up nodes.
Verify Node Resources
kubectl describe node <node-name>
Check node capacity and allocations to identify resource issues.
Metrics and Monitoring Issues
Metrics collection or visualization with Prometheus and Grafana might fail.
Check Prometheus Operator Logs
kubectl logs -n md-observability -l app.kubernetes.io/name=prometheus-operator
Identify potential issues with Prometheus scraping or alerting configurations.
Access Grafana UI
kubectl port-forward svc/grafana -n md-observability 3000:3000
Verify that Grafana is accessible and that dashboards display the expected metrics.
By utilizing these runbook commands and tools, you can troubleshoot and manage your AWS EKS resources effectively.
AWS Access
If you would like to manage access to your EKS cluster through AWS IAM principals, you can do so via the aws-auth ConfigMap. This will allow the desired AWS IAM principals to view cluster status in the AWS console, as well as generate short-lived credentials for kubectl access. Refer to the AWS documentation for more details.
Note: In order to connect to the EKS cluster to view or modify the aws-auth ConfigMap, you'll need to download the kubeconfig file and use kubectl as discussed earlier.
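As an illustration, an IAM role is usually granted access by adding an entry under mapRoles in the aws-auth ConfigMap, which lives in the kube-system namespace. The helper below prints such an entry; the function name, role ARN, username, and group are hypothetical, and system:masters grants full cluster admin rights, so a narrower group is often a better choice.

```shell
# Print a mapRoles entry for the aws-auth ConfigMap.
map_role_entry() {
  role_arn="$1"
  username="$2"
  group="$3"
  cat <<EOF
- rolearn: ${role_arn}
  username: ${username}
  groups:
    - ${group}
EOF
}

# Usage (requires kubectl access to the cluster; values are placeholders):
#   kubectl -n kube-system get configmap aws-auth -o yaml   # inspect current mappings
#   map_role_entry arn:aws:iam::111122223333:role/ops-admin ops-admin system:masters
#   kubectl -n kube-system edit configmap aws-auth          # add the entry under data.mapRoles
```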
Variable | Type | Description |
---|---|---|
core_services.enable_efs_csi | boolean | Enabling this will install the AWS EFS storage controller into your cluster, allowing you to provision persistent volumes backed by EFS file systems. |
core_services.enable_ingress | boolean | Enabling this will create an nginx ingress controller in the cluster, allowing internet traffic to flow into web accessible services within the cluster |
core_services.route53_hosted_zones[] | array(string) | No description |
fargate.enabled | boolean | Enables EKS Fargate |
k8s_version | string | The version of Kubernetes to run. WARNING: Upgrading Kubernetes version must be done one minor version at a time. For example, upgrading from 1.28 to 1.30 requires upgrading to 1.29 first. |
monitoring.control_plane_log_retention | integer | Duration to retain control plane logs in AWS CloudWatch (Note: control plane logs do not contain application or container logs) |
monitoring.prometheus.grafana_enabled | boolean | Install Grafana into the cluster to provide a metric visualizer |
monitoring.prometheus.persistence_enabled | boolean | This setting will enable persistence of Prometheus data via EBS volumes. However, in small clusters (fewer than 5 nodes) this can create pod scheduling and placement problems due to EBS volumes being zonally-locked, and thus should be disabled. |
node_groups[].advanced_configuration_enabled | boolean | No description |
node_groups[].instance_type | string | Instance type to use in the node group |
node_groups[].max_size | integer | Maximum number of instances in the node group |
node_groups[].min_size | integer | Minimum number of instances in the node group |
node_groups[].name_suffix | string | The name of the node group |
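The k8s_version warning above means a multi-version upgrade is applied as a sequence of single-step changes. The sketch below lists the intermediate values to set, in order; the function name is hypothetical and it assumes plain major.minor inputs such as 1.28.

```shell
# Print each version to step through when upgrading between two releases,
# one minor version per line. Inputs must be in major.minor form.
upgrade_path() {
  major="${1%%.*}"
  from_minor="${1#*.}"
  to_minor="${2#*.}"
  m=$((from_minor + 1))
  while [ "$m" -le "$to_minor" ]; do
    echo "${major}.${m}"
    m=$((m + 1))
  done
}

# Usage: set k8s_version to each printed value in turn, deploying between steps.
#   upgrade_path 1.28 1.30
```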