Metricbeat is a lightweight agent that collects metrics from your systems and services and sends them to Elasticsearch or Logstash. It provides valuable insights into the health and performance of your infrastructure, making it an essential tool for monitoring and observability. By the end of this tutorial, you will have Metricbeat set up and running, allowing you to effectively monitor your GKE clusters.
1. Creating a Cloud Storage bucket for exporting files with K8s metrics collection
1.1 Create a bucket in Cloud Storage with the desired settings
Configure the bucket with uniform bucket-level access, so that access to it is controlled through IAM. Reference:
https://cloud.google.com/storage/docs/uniform-bucket-level-access#should-you-use
gcloud example:
gcloud storage buckets create gs://<NEW_BUCKET_NAME> --project=<YOUR_PROJECT_ID> --default-storage-class=STANDARD --location=<YOUR_REGION> --uniform-bucket-level-access
This bucket (<NEW_BUCKET_NAME>) will store the exported files containing the GKE cluster metrics collected by the metricbeat component (configured in steps 3 and 4).
1.2 Configure bucket permissions
Get the cluster's service account, which is needed for the bucket IAM access settings. Example using gcloud:
gcloud container clusters describe <CLUSTER_NAME> --location <CLUSTER_ZONE_REGION> --format 'json(serviceaccount, nodeConfig.serviceAccount)'
- <CLUSTER_ZONE_REGION> : the cluster's region or zone, depending on its location type, for example us-east1 or us-east1-c
Example output:
{
  "nodeConfig": {
    "serviceAccount": "<SERVICE_ACCOUNT_NAME>@<YOUR_PROJECT_ID>.iam.gserviceaccount.com"
  }
}
1.3 Add the “storage.objectAdmin” role to the service account
Reference:
https://cloud.google.com/storage/docs/access-control/iam-roles#:~:text=Storage%20Object%20Admin
Example using gcloud:
gcloud storage buckets add-iam-policy-binding gs://<NEW_BUCKET_NAME> --role "roles/storage.objectAdmin" --member "serviceAccount:<SERVICE_ACCOUNT_NAME>@<YOUR_PROJECT_ID>.iam.gserviceaccount.com"
- <NEW_BUCKET_NAME> : bucket created in step 1.1
- <SERVICE_ACCOUNT_NAME> : service account of the cluster, obtained in step 1.2
2. GKE Metadata
2.1 Add GKE_METADATA metadata to the cluster
Example using gcloud:
gcloud container clusters update <CLUSTER_NAME> --location=<CLUSTER_ZONE_REGION> --workload-pool=<YOUR_PROJECT_ID>.svc.id.goog
- <CLUSTER_ZONE_REGION> : the cluster's region or zone, depending on its location type, for example us-east1 or us-east1-c
3. Enable Integration Add-ons on the GKE cluster
3.1 GcsFuseCsiDriver add-on on cluster
Example using gcloud:
gcloud container clusters update <CLUSTER_NAME> --update-addons GcsFuseCsiDriver=ENABLED --location <CLUSTER_ZONE_REGION>
- <CLUSTER_ZONE_REGION> : the cluster's region or zone, depending on its location type, for example us-east1 or us-east1-c
4. Configure metricbeat deployment with file export to bucket in Cloud Storage
4.1 kube-state-metrics deployment
Get the kube-state-metrics template and deploy it:
https://kube-state-metrics-template.s3.amazonaws.com/kube-state-metrics-template.yml
4.2 Deployment of metricbeat
Get the metricbeat template:
Manually adjust the following parameters in the template:
- <BUCKET_NAME> : name of the bucket for integrating and exporting files, created in step 1.1.
Adjust the only-dir=gke/<YOUR_REGION>/<CLUSTER_NAME> parameter:
- <YOUR_REGION> : region where the cluster is located
- <CLUSTER_NAME> : name of the cluster whose metrics metricbeat will collect and send to the integration
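As a small illustration of how the only-dir value is composed from the region and cluster name, the sketch below builds the option string. The function name only_dir_option and the sample values are hypothetical and are not part of the template.

```shell
# Hypothetical helper (not part of the template): prints the only-dir
# mount option for a given region and cluster name.
only_dir_option() {
  region="$1"    # e.g. us-east1
  cluster="$2"   # e.g. my-cluster
  printf 'only-dir=gke/%s/%s\n' "$region" "$cluster"
}
```

For example, only_dir_option us-east1 my-cluster prints only-dir=gke/us-east1/my-cluster, which is the value to place in the template.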
The template already defines the cluster objects that metricbeat needs to work:
- ServiceAccount : used to run the metricbeat service;
- ClusterRole : read-only access to the k8s API and object configurations;
- Roles and ClusterRoleBinding : additional permissions for metricbeat to read the k8s APIs;
- ConfigMaps : parameters and configurations for integrating metricbeat with kubernetes;
- DaemonSet : the metricbeat service that collects metrics and exports files to the bucket.
4.3 Metricbeat Service Account
By default, the Service Account used by the deployment in step 4.2 is named metricbeat.
Link the metricbeat Service Account to the IAM service account so that it can access the integration bucket:
Example:
gcloud iam service-accounts add-iam-policy-binding <SERVICE_ACCOUNT_NAME>@<YOUR_PROJECT_ID>.iam.gserviceaccount.com --role roles/iam.workloadIdentityUser --member "serviceAccount:<YOUR_PROJECT_ID>.svc.id.goog[kube-system/metricbeat]"
- <SERVICE_ACCOUNT_NAME> : service account of the cluster, obtained in step 1.2
4.4 Metricbeat Service Account Integration Bucket Permissions
Grant the metricbeat Service Account linked in step 4.3 the same IAM role on the integration bucket.
Example:
gcloud storage buckets add-iam-policy-binding gs://<BUCKET_NAME> --role "roles/storage.objectAdmin" --member "serviceAccount:<YOUR_PROJECT_ID>.svc.id.goog[kube-system/metricbeat]"
- <BUCKET_NAME> : name of the bucket for integrating and exporting files, created in step 1.1.
5. Check the metricbeat deployment and the export
5.1 Verify that metricbeat pods are running
Example:
kubectl get pods -n kube-system -o wide
NOTE: Metricbeat runs one pod per node to collect metrics.
5.2 Check the pod logs to see if metrics collection events are being generated
Example:
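One possible way to inspect the logs is sketched below. It assumes the metricbeat pods run in kube-system (as in the steps above) and carry the label k8s-app=metricbeat; that label is an assumption, so check the labels in your template and override SELECTOR if they differ. The function name metricbeat_logs is hypothetical.

```shell
# Sketch: print recent log lines from the metricbeat pods.
# ASSUMPTION: the pods are labelled k8s-app=metricbeat; override SELECTOR
# (and NAMESPACE) if your deployment uses different values.
metricbeat_logs() {
  lines="${1:-50}"   # number of trailing log lines per pod
  kubectl logs -n "${NAMESPACE:-kube-system}" -l "${SELECTOR:-k8s-app=metricbeat}" --tail="$lines"
}
```

Calling metricbeat_logs 100 would show the last 100 lines of each matching pod; look for entries indicating that events are being published.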
5.3 After the pods have been running for a few minutes, check that files are being exported to the integration bucket
Example:
- <BUCKET_NAME> : bucket name
- <YOUR_REGION> : region set in the prefix in step 4.2
- <CLUSTER_NAME> : name of the cluster configured in the prefix in step 4.2
- <CLUSTER_NAME_PREFIX> : cluster name, according to the prefix settings in step 4.2
- <NODE_NAME> : name of the node running the pod that exported the metrics
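A minimal sketch of such a check, assuming the exported files land under the gke/<region>/<cluster>/ prefix configured through only-dir in step 4.2. The exact layout below that prefix is not specified here, so the listing is recursive. The function name list_exported_files is hypothetical.

```shell
# Sketch: recursively list objects under the cluster's prefix in the bucket.
# ASSUMPTION: files are written under gke/<region>/<cluster>/ as set by only-dir.
list_exported_files() {
  bucket="$1"    # integration bucket name
  region="$2"    # region used in the only-dir prefix
  cluster="$3"   # cluster name used in the only-dir prefix
  gcloud storage ls --recursive "gs://${bucket}/gke/${region}/${cluster}/"
}
```

For example, list_exported_files my-bucket us-east1 my-cluster lists everything exported for that cluster; an empty result after several minutes suggests the export is not working.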
6. File export
Due to a metricbeat limitation, at most 1024 exported files are preserved. For the system to function correctly, the files from at least the last 7 days must be preserved; however, we recommend keeping them for at least 35 days.
Because file rotation can only be configured by size, not by time, we recommend the following:
- Leave the default setting (10 MB per file) in place for 1 day.
- After exactly 24 hours, check the number of files generated:
  - If more than 145 files were generated, please let us know, as the bucket will not retain files for a week.
  - If 29 or more files were generated, your configuration is correct.
  - If fewer than 29 files were generated, apply the following formula:
FILESIZE = 10240 / 29 * QUANTITY
For example, if 5 files were generated:
FILESIZE = 10240 / 29 * 5 ≈ 1765
Then, inside the metricbeat-deployment-template-gke-bucket.yml file, set data -> metricbeat.yml -> output.file -> rotate_every_kb to 1765 instead of 10240.
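The sizing rule can be expressed as a short shell function. This is only a sketch of the thresholds and formula stated in this section; the function name rotate_every_kb is chosen for illustration, and the result is rounded down to a whole number of KB, as in the worked example.

```shell
# Sketch of the sizing rule in this section.
# Input: number of files generated in 24 hours with the default 10240 KB setting.
rotate_every_kb() {
  quantity="$1"
  if [ "$quantity" -gt 145 ]; then
    # More than 145 files per day: a week of files cannot be retained.
    echo "more than 145 files/day: files will not be retained for a week" >&2
    return 1
  elif [ "$quantity" -ge 29 ]; then
    # 29 or more files per day: keep the default.
    echo 10240
  else
    # FILESIZE = 10240 / 29 * QUANTITY; multiply first so integer division
    # does not truncate 10240 / 29 before scaling.
    echo $(( 10240 * quantity / 29 ))
  fi
}
```

For example, rotate_every_kb 5 prints 1765, matching the calculation above, and rotate_every_kb 40 prints 10240 because no change is needed.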