Prometheus Metrics with SpringBoot + GRPC Services


Topher Lamey | July 30, 2020

SpringBoot has great built-in Micrometer support for RestControllers, which lets you expose useful metrics through the Prometheus Actuator endpoint. We rely on it for our REST-based edge services, where it drives our monitoring and alerting.

However, all of our internal services communicate using LogNet's awesome SpringBoot GRPC library, and that library has no native Micrometer support. GRPC itself does collect internal metrics, but the library doesn't yet expose them to Spring. Since we are a tiny startup with limited resources, we did a few simple things to hook Micrometer up to our GRPC services and get some basic metrics.


Micrometer Setup

To set up Micrometer, we added the Prometheus registry dependency to each service's build file:

implementation("io.micrometer:micrometer-registry-prometheus")
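
That line assumes the Spring Boot Actuator starter is already on the classpath, since the Actuator is what serves /actuator/prometheus. As a sketch (not our exact build file), a build.gradle.kts dependencies block might look like this, assuming versions come from Spring Boot's dependency management. The AOP starter is our assumption about what the TimedAspect used in the Method Timings section needs, since that aspect relies on AspectJ support.

```kotlin
// build.gradle.kts (sketch): versions are assumed to come from Spring Boot's dependency management.
dependencies {
    // Provides the /actuator/* endpoints, including /actuator/prometheus.
    implementation("org.springframework.boot:spring-boot-starter-actuator")
    // Prometheus registry that backs the /actuator/prometheus output.
    implementation("io.micrometer:micrometer-registry-prometheus")
    // AspectJ-based AOP support, typically needed for Micrometer's TimedAspect and @Timed.
    implementation("org.springframework.boot:spring-boot-starter-aop")
}
```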

Since these are internal services, we exposed all Actuator web endpoints:

YAML
management:
  endpoints:
    web:
      exposure:
        include: "*"

With that in place, every service exposes the HTTP endpoints $HOST:$PORT/actuator/metrics and $HOST:$PORT/actuator/prometheus.

Prometheus Configuration

We run things in Kubernetes, so we first add the following annotations to our service pods to make them discoverable by Prometheus.

YAML
metadata:
  annotations: 
    prometheus.io/scrape: "true"
    prometheus.io/path: "/actuator/prometheus"
    prometheus.io/port: "<port>"

We then add the following job to the Prometheus server's prometheus.yml so it discovers and scrapes the annotated pods:

YAML
scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - action: keep
        regex: true
        source_labels:
          - __meta_kubernetes_pod_annotation_prometheus_io_scrape
      - action: replace
        regex: (.+)
        source_labels:
          - __meta_kubernetes_pod_annotation_prometheus_io_path
        target_label: __metrics_path__
      - action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        source_labels:
          - __address__
          - __meta_kubernetes_pod_annotation_prometheus_io_port
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - action: replace
        source_labels:
          - __meta_kubernetes_namespace
        target_label: kubernetes_namespace
      - action: replace
        source_labels:
          - __meta_kubernetes_pod_name
        target_label: kubernetes_pod_name

This job is already included by default with the Prometheus Helm chart.
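
The address rewrite is the least obvious rule in that job. Prometheus joins the source_labels values with ";" and matches the regex against the whole joined string, so a pod address such as 10.1.2.3:8080 combined with a port annotation of 8081 becomes 10.1.2.3:8081. The Kotlin sketch below mimics just that one rule with kotlin.text.Regex; rewriteAddress is a hypothetical helper and the addresses are made up, so treat it as an illustration rather than Prometheus' own code.

```kotlin
// Sketch of the address-rewriting relabel rule above; not Prometheus' implementation.
// Prometheus joins the source_labels values with ";" (the default separator) and
// requires the regex to match the whole joined string, hence matchEntire.
val addressRule = Regex("""([^:]+)(?::\d+)?;(\d+)""")

// Returns the new __address__, or null when the rule does not match
// (in which case Prometheus leaves __address__ unchanged).
fun rewriteAddress(address: String, portAnnotation: String): String? {
    val match = addressRule.matchEntire("$address;$portAnnotation") ?: return null
    val (host, port) = match.destructured
    return "$host:$port" // corresponds to replacement: $1:$2
}
```

For example, rewriteAddress("10.1.2.3:8080", "8081") yields "10.1.2.3:8081". If a pod has no port annotation, the joined string ends in ";", the regex cannot match, and the original address is kept.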

Method Timings

For this, we used the standard Spring/Micrometer approach for timing arbitrary methods: a TimedAspect bean plus the @Timed annotation. The upside is that it was trivial to implement; the downside is that we have to remember to annotate each GRPC method.

In a @Configuration class, we added a TimedAspect bean:

Kotlin
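import io.micrometer.core.aop.TimedAspect
import io.micrometer.core.instrument.MeterRegistry
import org.springframework.context.annotation.Bean
// Inside a @Configuration class: lets @Timed methods record Micrometer timers.
@Bean
fun timedAspect(registry: MeterRegistry): TimedAspect {
    return TimedAspect(registry)
}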

Then we add a @Timed annotation to every GRPC method:

Kotlin
import io.grpc.stub.StreamObserver
import io.micrometer.core.annotation.Timed

@Timed
override fun getFoo(request: FooService.GetFooRequest,
                    responseObserver: StreamObserver<FooService.FooResponse>) {
    [...]
}

This then adds the GRPC method metrics to the Prometheus actuator at the /actuator/prometheus endpoint:

Plain Text
# HELP method_timed_seconds  
# TYPE method_timed_seconds summary
method_timed_seconds_count{class="com.stackhawk.FooService",exception="none",method="createFoo",} 3.0
method_timed_seconds_sum{class="com.stackhawk.FooService",exception="none",method="createFoo",} 0.0344318
# HELP method_timed_seconds_max  
# TYPE method_timed_seconds_max gauge
method_timed_seconds_max{class="com.stackhawk.FooService",exception="none",method="createFoo",} 0.0272329
method_timed_seconds_max{class="com.stackhawk.FooService",exception="none",method="updateFoo",} 0.0181494
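
Each label set (class, exception, method) gets its own timer, which tracks a call count, a running total of seconds, and a maximum duration. The sketch below is a deliberately simplified, hypothetical model of that bookkeeping in plain Kotlin: TimerStats and timed are illustrative names, not Micrometer APIs, and this is not how Micrometer implements timers. In particular, Micrometer's max is kept over a rolling time window, which this sketch ignores. Tagging failures with the exception's simple class name reflects our reading of TimedAspect; treat that detail as an assumption.

```kotlin
// Hypothetical, simplified model of the data behind method_timed_seconds_{count,sum,max}.
class TimerStats {
    var count = 0L
        private set
    var totalSeconds = 0.0
        private set
    var maxSeconds = 0.0
        private set

    fun record(seconds: Double) {
        count++
        totalSeconds += seconds
        maxSeconds = maxOf(maxSeconds, seconds) // no rolling window in this sketch
    }
}

// One timer per (class, method, exception) label set, like the scrape output above.
val timers = mutableMapOf<Triple<String, String, String>, TimerStats>()

// "Around" advice in miniature: time the call, tag the outcome, rethrow failures.
fun <T> timed(className: String, method: String, block: () -> T): T {
    val start = System.nanoTime()
    var exceptionTag = "none"
    try {
        return block()
    } catch (e: Exception) {
        exceptionTag = e.javaClass.simpleName // assumption: mirrors TimedAspect's tag value
        throw e
    } finally {
        val seconds = (System.nanoTime() - start) / 1e9
        timers.getOrPut(Triple(className, method, exceptionTag)) { TimerStats() }.record(seconds)
    }
}
```

Successful and failing calls land in separate label sets, which is why the exception tag shows up next to class and method in the output.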

Once Prometheus scrapes these series, we can compute things like the average duration of each GRPC call with PromQL:

PromQL
rate(method_timed_seconds_sum[1m]) / rate(method_timed_seconds_count[1m])
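
Both rate() calls divide by the same window, so the window length cancels out and the ratio is roughly the increase in total seconds divided by the increase in calls, i.e. the mean duration per call over the last minute. The worked sketch below uses the createFoo count and sum shown above as the earlier scrape and a made-up later scrape; SummarySample and meanSecondsPerCall are hypothetical names, and the sketch ignores counter resets and rate()'s extrapolation.

```kotlin
// One scrape of a timer's _count and _sum series for a single label set.
data class SummarySample(val count: Double, val sumSeconds: Double)

// rate(x_sum[w]) / rate(x_count[w]) ~= (delta sum) / (delta count) = mean seconds per call.
fun meanSecondsPerCall(earlier: SummarySample, later: SummarySample): Double {
    val calls = later.count - earlier.count
    val seconds = later.sumSeconds - earlier.sumSeconds
    return seconds / calls // 0.0 / 0.0 is NaN, much like PromQL when no calls happened
}
```

With the earlier sample (3 calls, 0.0344318 s) and a hypothetical later sample of 13 calls and 0.1344318 s, ten calls took 0.1 s in total, so the mean is about 0.01 s per call.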

Exception Metrics

For exceptions, we hooked a Micrometer counter into our existing generic GRPC exception handler, which lives in an internal shared library that every GRPC service pulls in automatically through our common Gradle platform.

All we did here was add a MeterRegistry constructor parameter, which gets its value from the Spring context. In the catch block, we then use that registry to increment a counter tagged with the exception's full class name.

Kotlin
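import io.grpc.ForwardingServerCallListener
import io.grpc.Metadata
import io.grpc.ServerCall
import io.grpc.ServerCallHandler
import io.grpc.ServerInterceptor
import io.grpc.Status
import io.micrometer.core.instrument.MeterRegistry
import io.micrometer.core.instrument.Tags
import org.slf4j.Logger
import org.slf4j.LoggerFactory

class GlobalGrpcExceptionHandler(private val registry: MeterRegistry? = null) : ServerInterceptor {

    private val logger: Logger = LoggerFactory.getLogger(GlobalGrpcExceptionHandler::class.java)

    override fun <ReqT : Any?, RespT : Any?> interceptCall(
        call: ServerCall<ReqT, RespT>,
        headers: Metadata,
        next: ServerCallHandler<ReqT, RespT>
    ): ServerCall.Listener<ReqT> {
        val delegate = next.startCall(call, headers)
        return object : ForwardingServerCallListener.SimpleForwardingServerCallListener<ReqT>(delegate) {
            // For unary calls, the service method runs inside onHalfClose().
            override fun onHalfClose() {
                try {
                    super.onHalfClose()
                } catch (e: Exception) {
                    // Count the failure, tagged with the exception's fully qualified class name.
                    // (javaClass.name is never null, unlike canonicalName for anonymous classes.)
                    registry?.counter("grpc.exception.counter", Tags.of("type", e.javaClass.name))?.increment()

                    logger.error(e.message, e)
                    // Report the failure to the client as INTERNAL instead of letting it escape.
                    call.close(Status.INTERNAL
                            .withCause(e)
                            .withDescription(e.message), Metadata())
                }
            }
        }
    }
}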

Each service then has Spring inject the context's MeterRegistry into a configuration class constructor and passes it to the exception handler bean:

Kotlin
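import io.micrometer.core.instrument.MeterRegistry
import org.lognet.springboot.grpc.GRpcGlobalInterceptor
import org.springframework.context.annotation.Bean
import org.springframework.context.annotation.Configuration

@Configuration
class FooConfig(private val meterRegistry: MeterRegistry) {

    // @GRpcGlobalInterceptor applies this interceptor to every GRPC service in the app.
    @Bean
    @GRpcGlobalInterceptor
    fun globalGrpcExceptionHandler(): GlobalGrpcExceptionHandler {
        return GlobalGrpcExceptionHandler(meterRegistry)
    }
}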

With those in place, the /actuator/prometheus endpoint now has a new counter with the full class name of the exception as a tag:

Plain Text
# HELP grpc_exception_counter_total  
# TYPE grpc_exception_counter_total counter
grpc_exception_counter_total{type="software.amazon.awssdk.core.exception.SdkClientException",} 1.0
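
The counter was registered as grpc.exception.counter but appears as grpc_exception_counter_total because Micrometer's Prometheus naming convention rewrites meter names for Prometheus (the same mechanism turns the timer name method.timed into method_timed_seconds). The sketch below is a simplified, hypothetical stand-in covering only the counter case; prometheusCounterName is an illustrative name written in plain Kotlin, not a call into Micrometer's PrometheusNamingConvention.

```kotlin
// Hypothetical, simplified model of how a dotted counter name becomes a Prometheus series name.
fun prometheusCounterName(micrometerName: String): String {
    // Prometheus metric names allow [a-zA-Z0-9_:]; anything else (e.g. dots) becomes "_".
    val sanitized = micrometerName.replace(Regex("[^a-zA-Z0-9_:]"), "_")
    // Counters carry a "_total" suffix in the exposition format.
    return if (sanitized.endsWith("_total")) sanitized else sanitized + "_total"
}
```

So prometheusCounterName("grpc.exception.counter") gives "grpc_exception_counter_total", the name to use in PromQL queries.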

In PromQL, that lets us do things like track the per-second rate of each exception type:

PromQL
rate(grpc_exception_counter_total[1m])
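
rate() also copes with counter resets, which matter here because the counter lives in each pod's memory and starts again at zero when the pod restarts. The sketch below is a simplified model of that reset handling (perSecondRate is a hypothetical name); real rate() additionally extrapolates to the edges of the window, which this sketch skips.

```kotlin
// Simplified model of rate(counter[window]): sum the increases between consecutive
// samples, treating any drop as a reset (the counter restarted from zero).
fun perSecondRate(samples: List<Double>, windowSeconds: Double): Double {
    var increase = 0.0
    for (i in 1 until samples.size) {
        val delta = samples[i] - samples[i - 1]
        // After a reset, the new value itself is the increase since the restart.
        increase += if (delta >= 0.0) delta else samples[i]
    }
    return increase / windowSeconds
}
```

For samples 1, 3, 1, 2 within a 60 s window, the drop from 3 to 1 is read as a reset, so the increase is 2 + 1 + 1 = 4 and the rate is about 0.067 exceptions per second.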
