Loki and Grafana for Logs


I wanted to aggregate logs from a few different servers I'm responsible for: some in the cloud, some on my own network. Since it's all personal stuff, I didn't want to pay for Splunk or similar to do the ingestion and log aggregation, so I opted for Grafana Loki as a self-hosted solution.

Components

A Grafana Loki setup is actually made up of a few different components:

  1. Loki - responsible for ingesting, managing and querying logs
  2. Grafana - querying/dashboarding for logs (by calling Loki with queries and processing results)
  3. Promtail - gathers logs from various sources and sends them to Loki

Loki doesn't care what shape the logs are in; it just handles the storage and querying of those strings. It is also architected to support different storage backends, and the example Docker Compose scripts use a mock Amazon S3 blob store.
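To illustrate the "just strings" point, here is a minimal sketch of the JSON body that Loki's HTTP push endpoint (/loki/api/v1/push) accepts. The helper name, the labels and the sample log line are my own placeholders, and nothing is actually sent over the network.

```python
import json
import time


def build_push_payload(labels, lines, timestamp_ns=None):
    """Build a JSON-serialisable body for Loki's /loki/api/v1/push endpoint."""
    if timestamp_ns is None:
        timestamp_ns = time.time_ns()
    # Each value is a [timestamp, line] pair. The timestamp is a string of
    # nanoseconds since the Unix epoch; the line is an opaque string.
    values = [[str(timestamp_ns), line] for line in lines]
    return {"streams": [{"stream": dict(labels), "values": values}]}


# Hypothetical labels and log line, purely for illustration.
payload = build_push_payload(
    {"job": "nginx", "host": "example-host"},
    ['203.0.113.7 - - [01/Jan/2024:00:00:00 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/8.0"'],
    timestamp_ns=1704067200000000000,
)
print(json.dumps(payload, indent=2))
```

The log line goes in untouched; only the stream labels are structured, which is why the parsing has to happen later, at query time.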

Log Patterns

Log patterns are only relevant when querying logs. Loki stores log messages as plain strings, so it is at query time (for example, from Grafana) that you give them meaning, using a pattern expression to extract labels. The patterns below are the ones I use to query various log sources.

Standard Nginx Log Pattern

This is the pattern for the standard, out-of-the-box Nginx access log format (the "combined" format).

pattern `<ip> - - <_> "<method> <uri> <_>" <status> <size> "<_>" "<agent>"`
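If it helps to see what the pattern does, here is a rough Python approximation of the idea: each <name> becomes a named capture, each <_> is skipped, and everything else must match literally. This is only a sketch of the concept, not how Loki implements its pattern parser, and the sample log line is made up.

```python
import re


def pattern_to_regex(pattern):
    """Translate a LogQL-style pattern expression into a compiled regex."""
    # re.split with one capture group alternates: literal, name, literal, ...
    tokens = re.split(r"<([A-Za-z_][A-Za-z0-9_]*)>", pattern)
    parts = []
    for i, token in enumerate(tokens):
        if i % 2 == 0:
            # Even indexes are literal text that must match exactly.
            parts.append(re.escape(token))
            continue
        # A capture at the very end of the pattern takes the rest of the line.
        is_last = i == len(tokens) - 2 and tokens[-1] == ""
        body = ".*" if is_last else ".*?"
        parts.append(body if token == "_" else f"(?P<{token}>{body})")
    return re.compile("".join(parts))


NGINX_PATTERN = '<ip> - - <_> "<method> <uri> <_>" <status> <size> "<_>" "<agent>"'

# Hypothetical access log line in the default "combined" format.
line = '203.0.113.7 - - [01/Jan/2024:00:00:00 +0000] "GET /blog/ HTTP/1.1" 200 612 "-" "curl/8.0"'

labels = pattern_to_regex(NGINX_PATTERN).match(line).groupdict()
print(labels)
```

Every extracted value is a string, which is why the later queries need unwrap before doing arithmetic on a label.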

Nginx Proxy Manager Log Format

The nginx.conf for the Nginx Proxy Manager defines a "proxy" format.

pattern `[<_>] <_> <upstream_status> <status> - <request_method> <scheme> <host_header> "<request_uri>" [Client <remote_addr>] [Length <body_bytes_sent>] [Gzip <_>] [Sent-to <_>] "<http_user_agent>" "<_>"`

Amazon CloudFront Logs

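See the "Standard log file format" page in the Amazon CloudFront documentation. The fields in these logs are separated by tab characters, so take care to preserve the tabs when copying the pattern.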

pattern `<date>	<time>	<edge_location>	<sc_bytes>	<c_ip>	<cs_method>	<cs_host>	<cs_uri_stem>	<sc_status>	<cs_referer>	<cs_user_agent>	<cs_uri_query>	<cs_cookie>	<edge_result_type>	<edge_request_id>	<host_header>	<cs_protocol>	<cs_bytes>	<time_taken>	<forwarded_for>	<ssl_protocol>	<ssl_cipher>	<edge_response_result_type>	<cs_protocol_version>	<fle_status>	<fle_encrypted_fields>	<c_port>	<time_to_first_byte>	<edge_detailed_result_type>	<sc_content_type>	<sc_content_len>	<sc_range_start>	<sc_range_end>`
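Since tabs are easy to lose when copying and pasting, here is a small sketch that rebuilds the pattern from a list of field names. The field names are the ones from the pattern above; the helper name is my own.

```python
CLOUDFRONT_FIELDS = [
    "date", "time", "edge_location", "sc_bytes", "c_ip", "cs_method",
    "cs_host", "cs_uri_stem", "sc_status", "cs_referer", "cs_user_agent",
    "cs_uri_query", "cs_cookie", "edge_result_type", "edge_request_id",
    "host_header", "cs_protocol", "cs_bytes", "time_taken", "forwarded_for",
    "ssl_protocol", "ssl_cipher", "edge_response_result_type",
    "cs_protocol_version", "fle_status", "fle_encrypted_fields", "c_port",
    "time_to_first_byte", "edge_detailed_result_type", "sc_content_type",
    "sc_content_len", "sc_range_start", "sc_range_end",
]


def build_pattern(fields):
    """Join the field names into a tab-separated LogQL pattern expression."""
    body = "\t".join(f"<{name}>" for name in fields)
    return f"pattern `{body}`"


cloudfront_pattern = build_pattern(CLOUDFRONT_FIELDS)
print(cloudfront_pattern)
```

The printed line can be pasted into a query after the stream selector and a pipe character.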

Example Queries

Here are a few example queries, using the log patterns above:

CloudFront: URIs and Status Codes Over Time

sum by (uri) (
    count_over_time(
        {__aws_log_type="s3_cloudfront"} | pattern `<date>	<time>	<edge_location>	<sc_bytes>	<c_ip>	<cs_method>	<cs_host>	<cs_uri_stem>	<sc_status>	<cs_referer>	<cs_user_agent>	<cs_uri_query>	<cs_cookie>	<edge_result_type>	<edge_request_id>	<host_header>	<cs_protocol>	<cs_bytes>	<time_taken>	<forwarded_for>	<ssl_protocol>	<ssl_cipher>	<edge_response_result_type>	<cs_protocol_version>	<fle_status>	<fle_encrypted_fields>	<c_port>	<time_to_first_byte>	<edge_detailed_result_type>	<sc_content_type>	<sc_content_len>	<sc_range_start>	<sc_range_end>`
        | label_format uri=`{{ .cs_method }} {{ .cs_protocol }}://{{ .host_header }}{{ .cs_uri_stem }} ({{ .sc_status }})`
        [$__interval]
    )
)

CloudFront: Airport Codes Over Time

sum by (airport) (
    count_over_time(
        {__aws_log_type="s3_cloudfront"} | pattern `<date>	<time>	<edge_location>	<sc_bytes>	<c_ip>	<cs_method>	<cs_host>	<cs_uri_stem>	<sc_status>	<cs_referer>	<cs_user_agent>	<cs_uri_query>	<cs_cookie>	<edge_result_type>	<edge_request_id>	<host_header>	<cs_protocol>	<cs_bytes>	<time_taken>	<forwarded_for>	<ssl_protocol>	<ssl_cipher>	<edge_response_result_type>	<cs_protocol_version>	<fle_status>	<fle_encrypted_fields>	<c_port>	<time_to_first_byte>	<edge_detailed_result_type>	<sc_content_type>	<sc_content_len>	<sc_range_start>	<sc_range_end>`
        | label_format airport=`{{ regexReplaceAll "([A-Z]+)([0-9]+)-(.*)" .edge_location "$1" }}`
        [$__interval]
    )
)
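The regexReplaceAll call in that query keeps only the leading letters of the edge location. Here is the same regular expression tried out in Python as a quick sanity check; the sample edge location values are hypothetical.

```python
import re

# Same expression as in the query: letters, digits, a hyphen, then anything.
EDGE_LOCATION_RE = re.compile(r"([A-Z]+)([0-9]+)-(.*)")


def airport_code(edge_location):
    """Return the leading airport code of a CloudFront edge location."""
    # r"\1" is Python's spelling of the "$1" replacement used in the query.
    return EDGE_LOCATION_RE.sub(r"\1", edge_location)


for sample in ["LHR62-C2", "SFO5-P1", "unknown"]:
    print(sample, "->", airport_code(sample))
```

Values that don't match the expression pass through unchanged, so an unexpected edge location format shows up as its own series rather than disappearing.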

CloudFront: Average Time to First Byte Over Time

avg by (host_header) (
    avg_over_time({__aws_log_type="s3_cloudfront"}
    | pattern `<date>	<time>	<edge_location>	<sc_bytes>	<c_ip>	<cs_method>	<cs_host>	<cs_uri_stem>	<sc_status>	<cs_referer>	<cs_user_agent>	<cs_uri_query>	<cs_cookie>	<edge_result_type>	<edge_request_id>	<host_header>	<cs_protocol>	<cs_bytes>	<time_taken>	<forwarded_for>	<ssl_protocol>	<ssl_cipher>	<edge_response_result_type>	<cs_protocol_version>	<fle_status>	<fle_encrypted_fields>	<c_port>	<time_to_first_byte>	<edge_detailed_result_type>	<sc_content_type>	<sc_content_len>	<sc_range_start>	<sc_range_end>`
    | unwrap time_to_first_byte [$__auto])
)

CloudFront: Bytes Transferred By Edge Result Type Over Time

sum by (edge_detailed_result_type) (
    sum_over_time (
        {__aws_log_type="s3_cloudfront"} | pattern `<date>	<time>	<edge_location>	<sc_bytes>	<c_ip>	<cs_method>	<cs_host>	<cs_uri_stem>	<sc_status>	<cs_referer>	<cs_user_agent>	<cs_uri_query>	<cs_cookie>	<edge_result_type>	<edge_request_id>	<host_header>	<cs_protocol>	<cs_bytes>	<time_taken>	<forwarded_for>	<ssl_protocol>	<ssl_cipher>	<edge_response_result_type>	<cs_protocol_version>	<fle_status>	<fle_encrypted_fields>	<c_port>	<time_to_first_byte>	<edge_detailed_result_type>	<sc_content_type>	<sc_content_len>	<sc_range_start>	<sc_range_end>`
        | label_format total_bytes=`{{ add .sc_bytes .cs_bytes }}`
        | unwrap total_bytes
        [$__interval]
    )
)

CloudFront: Specific URL Parts Over Time

sum by (post) (
    count_over_time(
        {__aws_log_type="s3_cloudfront"} | pattern `<date>	<time>	<edge_location>	<sc_bytes>	<c_ip>	<cs_method>	<cs_host>	<cs_uri_stem>	<sc_status>	<cs_referer>	<cs_user_agent>	<cs_uri_query>	<cs_cookie>	<edge_result_type>	<edge_request_id>	<host_header>	<cs_protocol>	<cs_bytes>	<time_taken>	<forwarded_for>	<ssl_protocol>	<ssl_cipher>	<edge_response_result_type>	<cs_protocol_version>	<fle_status>	<fle_encrypted_fields>	<c_port>	<time_to_first_byte>	<edge_detailed_result_type>	<sc_content_type>	<sc_content_len>	<sc_range_start>	<sc_range_end>`
        | host_header = "alanedwardes.com"
        | cs_uri_stem =~ "/blog/posts/(.*)/"
        | label_format post=`{{ .cs_method }} {{ regexReplaceAll "/blog/posts/(.*)/" .cs_uri_stem "${1}" }} ({{ .sc_status }})`
        | sc_status >= 200 and sc_status < 400
        [$__interval]
    )
)

Promtail on Lambda

To allow CloudFront logs to be piped from Amazon S3 into Loki, there is a handy CloudFormation template which uses a Docker container hosted on AWS Lambda. There are a few steps to get this working, however, which are documented below.

Mirroring the Docker Image on a Private Registry

Unfortunately, AWS Lambda functions can only use Docker images hosted in a private ECR registry in the same region as the function, meaning we must mirror the official image. First, pull the latest lambda-promtail image:

docker pull public.ecr.aws/grafana/lambda-promtail:main

Ensure you have created a private ECR repo and authenticated your machine against it. Then tag the lambda-promtail Docker image with your private ECR repo URI:

docker tag public.ecr.aws/grafana/lambda-promtail:main <aws-account-id>.dkr.ecr.<aws-region>.amazonaws.com/<ecr-repo-name>:<your-tag>

Push the image to your private registry:

docker push <aws-account-id>.dkr.ecr.<aws-region>.amazonaws.com/<ecr-repo-name>:<your-tag>
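To keep the placeholders consistent across the steps, here is a small sketch that prints the full set of commands for a given account, region and repo. The account ID, region and repo name below are made-up examples, and the first command is the usual AWS CLI way of authenticating Docker against a private ECR registry, which assumes the AWS CLI is installed and configured.

```python
SOURCE_IMAGE = "public.ecr.aws/grafana/lambda-promtail:main"


def mirror_commands(account_id, region, repo, tag, source=SOURCE_IMAGE):
    """Return the shell commands to mirror the image into a private ECR repo."""
    registry = f"{account_id}.dkr.ecr.{region}.amazonaws.com"
    target = f"{registry}/{repo}:{tag}"
    return [
        # Authenticate Docker against the private registry.
        f"aws ecr get-login-password --region {region}"
        f" | docker login --username AWS --password-stdin {registry}",
        f"docker pull {source}",
        f"docker tag {source} {target}",
        f"docker push {target}",
    ]


# Hypothetical values, purely for illustration.
commands = mirror_commands("123456789012", "eu-west-1", "lambda-promtail", "main")
print("\n".join(commands))
```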

Enable S3 Bucket Events to EventBridge

Under the "Properties" tab for the S3 bucket in the AWS console, there is an option to send event notifications to Amazon EventBridge. This needs to be enabled.
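If you would rather script this than click through the console, something like the sketch below should do the same thing with boto3. It assumes boto3 is installed and credentials are configured; the function names are my own. Because the put call replaces the bucket's whole notification configuration, the sketch reads the existing configuration first and merges the EventBridge setting into it.

```python
def eventbridge_notification_config(existing=None):
    """Return a notification configuration with EventBridge delivery enabled."""
    # Drop the response metadata that boto3 adds to every API response.
    config = {k: v for k, v in (existing or {}).items() if k != "ResponseMetadata"}
    # An empty dict is how the S3 API expresses "send events to EventBridge".
    config["EventBridgeConfiguration"] = {}
    return config


def enable_eventbridge(bucket_name):
    """Enable EventBridge notifications on a bucket, keeping existing ones."""
    import boto3  # third-party; imported here so the helper above works without it

    s3 = boto3.client("s3")
    existing = s3.get_bucket_notification_configuration(Bucket=bucket_name)
    s3.put_bucket_notification_configuration(
        Bucket=bucket_name,
        NotificationConfiguration=eventbridge_notification_config(existing),
    )


print(eventbridge_notification_config())
```

Calling enable_eventbridge with the name of the bucket that receives the CloudFront logs would then apply the change.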
