GCP GKE Cluster
GKE is a managed Kubernetes service in GCP that implements the full Kubernetes API and provides 4-way autoscaling, release channels, and multi-cluster support.
Google Kubernetes Engine (GKE)
Google Kubernetes Engine (GKE) is a managed, production-ready environment for deploying containerized applications. It provides a managed Kubernetes service that simplifies the tasks of managing, scaling, and upgrading containerized applications.
Design Decisions
- Node Pools Management: We utilize the GKE managed node pools feature, allowing different machine types and auto-scaling configurations.
- Workload Identity: Enabled to improve security by allowing Kubernetes workloads to authenticate as Google Cloud service accounts.
- Private Cluster: Private nodes are enabled to ensure that nodes do not have public IP addresses.
- Add-ons Configuration: Add-ons like horizontal pod autoscaling, HTTP load balancing, and DNS cache are configured.
- Logging and Monitoring: Logging and monitoring are configured for both system and workload components through GKE's logging and monitoring services.
- Security: Shielded nodes are enabled for enhanced protection against rootkits and bootkits.
- Custom Service Accounts: Node pools are configured to use custom service accounts with specific IAM roles for enhanced security.
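As a rough sketch (not the exact provisioning code this bundle uses), the gcloud command below shows how several of these decisions map onto cluster-creation flags; the cluster name, region, CIDR, and service account are placeholders:
# Illustrative only: a private, shielded cluster with Workload Identity and the listed add-ons
gcloud container clusters create my-cluster \
  --region us-central1 \
  --enable-ip-alias \
  --enable-private-nodes \
  --master-ipv4-cidr 172.16.0.0/28 \
  --workload-pool <project-id>.svc.id.goog \
  --enable-shielded-nodes \
  --addons HorizontalPodAutoscaling,HttpLoadBalancing \
  --service-account <node-sa>@<project-id>.iam.gserviceaccount.com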
Install Kubectl
You will first need to install kubectl to interact with the Kubernetes cluster. Installation instructions for Windows, Mac, and Linux can be found here.
Note: While kubectl generally has forwards and backwards compatibility of core capabilities, it is best if your kubectl client version matches your Kubernetes cluster version. This ensures the best stability and compatibility for your client.
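You can check both versions with kubectl itself:
# Print the kubectl client version
kubectl version --client
# Once connected, print the client and cluster (server) versions together
kubectl version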
Download the Kubeconfig File
The standard way to manage connection and authentication details for Kubernetes clusters is through a configuration file called a kubeconfig file. The kubernetes-cluster artifact that is created when you make a Kubernetes cluster in Massdriver contains the basic information needed to create a kubeconfig file. Because of this, Massdriver makes it very easy for you to download a kubeconfig file that will allow you to use kubectl to query and administer your cluster.
For more information on downloading your kubeconfig file, check out our documentation.
Use the Kubeconfig File
Once the kubeconfig file is downloaded, you can move it to your desired location. By default, kubectl will look for a file named config located in the $HOME/.kube directory. If you would like this to be your default configuration, you can rename and move the file to $HOME/.kube/config.
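For example, assuming the downloaded file is named kubeconfig.yaml (the actual filename may differ):
# Create the default kubectl config directory if it does not already exist
mkdir -p $HOME/.kube
# Move and rename the downloaded file into the default location
mv ./kubeconfig.yaml $HOME/.kube/config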
A single kubeconfig file can hold multiple cluster configurations, and you can select your desired cluster through the use of contexts. Alternatively, you can have multiple kubeconfig files and select your desired file through the KUBECONFIG environment variable or the --kubeconfig flag in kubectl.
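For example (the file path is a placeholder):
# List the contexts available in your kubeconfig
kubectl config get-contexts
# Switch to a specific context
kubectl config use-context <context-name>
# Point kubectl at an alternate kubeconfig file for the current shell...
export KUBECONFIG=$HOME/other-kubeconfig.yaml
# ...or for a single command
kubectl --kubeconfig $HOME/other-kubeconfig.yaml get nodes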
Runbook
Unable to Connect to GKE Cluster
If you are unable to connect to your GKE cluster, you might need to reconfigure your Kubernetes context.
# Retrieve cluster credentials
gcloud container clusters get-credentials <cluster-name> --region <region> --project <project-id>
This command configures kubectl to use the cluster's credentials.
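You can then verify that the connection works:
# Confirm the active context points at your GKE cluster
kubectl config current-context
# A successful node listing confirms authentication is working
kubectl get nodes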
Troubleshooting Pod Issues
Sometimes, your pods might not behave as expected. You can describe a pod and inspect its logs:
# Describe the pod
kubectl describe pod <pod-name> -n <namespace>
# Check pod logs
kubectl logs <pod-name> -n <namespace>
Use these commands to get detailed information about the pod's state and any recent log output.
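If the logs are inconclusive, you can open a shell inside a running container, assuming its image ships with a shell such as /bin/sh:
# Open an interactive shell in the pod's default container
kubectl exec -it <pod-name> -n <namespace> -- /bin/sh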
Checking Cluster and Node Health
To check the overall health of your GKE cluster and the nodes, you can use the following commands:
# Get cluster details
gcloud container clusters describe <cluster-name> --region <region> --project <project-id>
# List all nodes
kubectl get nodes
These commands provide an overview of the cluster's configuration and the status of all nodes within it.
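For a closer look at a single node, or at resource usage across nodes (GKE provides the metrics that kubectl top relies on):
# Inspect conditions, capacity, and recent events for one node
kubectl describe node <node-name>
# Show CPU and memory usage per node
kubectl top nodes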
Debugging Service Issues
If a service is not behaving as expected, you can describe the service and its endpoints:
# Describe the service
kubectl describe service <service-name> -n <namespace>
# Check endpoints
kubectl get endpoints <service-name> -n <namespace>
These commands give insights into the service configuration and which pods are backing it.
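Two common follow-ups are confirming that running pods actually match the service's label selector, and bypassing the service with a port-forward to test it directly (the ports shown are placeholders):
# List pods matching the service's selector (copy the selector from the describe output)
kubectl get pods -n <namespace> -l <selector-key>=<selector-value>
# Forward a local port to the service to test it directly
kubectl port-forward service/<service-name> 8080:80 -n <namespace>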
Pod Scheduling Issues
If your pods are not scheduling, there might be a resource constraint or taints/tolerations issue:
# Check scheduler events
kubectl get events -n <namespace>
# Describe the nodes
kubectl describe nodes
Reviewing events and node descriptions can help identify why pods are not being scheduled.
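To narrow the search, you can filter events down to scheduling failures and check for taints on the nodes:
# Show only failed-scheduling events in the namespace
kubectl get events -n <namespace> --field-selector reason=FailedScheduling
# List any taints configured on the nodes
kubectl describe nodes | grep -i taints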
Node Pool Autoscaling Issues
If node pools are not autoscaling as expected, verify the autoscaling configuration:
# Describe the node pool
gcloud container node-pools describe <node-pool-name> --region <region> --cluster <cluster-name> --project <project-id>
Check the current settings and history to diagnose discrepancies.
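If the bounds are wrong, they can be adjusted on the node pool; it also helps to list pods stuck in Pending, since those are what trigger a scale-up (the min/max values below are examples):
# Update the autoscaling bounds on the node pool
gcloud container node-pools update <node-pool-name> --region <region> --cluster <cluster-name> --project <project-id> --enable-autoscaling --min-nodes 1 --max-nodes 5
# List pods waiting for capacity anywhere in the cluster
kubectl get pods --all-namespaces --field-selector status.phase=Pending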
Container Crashes
If a container within a pod crashes or restarts frequently:
# Describe the pod to see container status and reason for restarts
kubectl describe pod <pod-name> -n <namespace>
# Get logs, including previous instance if it crashed
kubectl logs <pod-name> -n <namespace> --previous
Assess the logs and container status to determine the cause of the crashes.
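The last termination reason is often the quickest signal (for example, OOMKilled means the container exceeded its memory limit):
# Print the reason the container last terminated
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'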
Params
Variable | Type | Description
---|---|---
cluster_networking.cluster_ipv4_cidr_block | string | CIDR block to use for Kubernetes pods. Set to /netmask (e.g. /16) to have a range chosen with a specific netmask. Set to a CIDR notation (e.g. 10.96.0.0/14) from the RFC-1918 private networks (e.g. 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16) to pick a specific range to use.
cluster_networking.master_ipv4_cidr_block | string | CIDR block to use for the Kubernetes control plane. The mask for this must be exactly /28. Must be from the RFC-1918 private networks (e.g. 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16), and should not conflict with other ranges in use. It is recommended to use consecutive /28 blocks from the 172.16.0.0/16 range for all your GKE clusters (172.16.0.0/28 for the first cluster, 172.16.0.16/28 for the second, etc.).
cluster_networking.services_ipv4_cidr_block | string | CIDR block to use for Kubernetes services. Set to /netmask (e.g. /20) to have a range chosen with a specific netmask. Set to a CIDR notation (e.g. 10.96.0.0/14) from the RFC-1918 private networks (e.g. 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16) to pick a specific range to use.
core_services.cloud_dns_managed_zones[] | array(string) | No description
core_services.enable_ingress | boolean | Enabling this will create an nginx ingress controller in the cluster, allowing internet traffic to flow into web-accessible services within the cluster.
node_groups[].is_spot | boolean | Spot instances are more affordable, but can be preempted at any time.
node_groups[].machine_type | string | Machine type to use in the node group.
node_groups[].max_size | number | Maximum number of instances in the node group.
node_groups[].min_size | number | Minimum number of instances in the node group.
node_groups[].name | string | The name of the node group.