Prometheus Metrics with SpringBoot + GRPC Services

Topher Lamey

SpringBoot has lots of great built-in Micrometer support for RestControllers that allows you to expose useful metrics via the Prometheus Actuator. We make use of those for our REST-based Edge services and are able to do cool things around monitoring and alerting.

However, all of our internal services communicate over GRPC using LogNet's awesome SpringBoot GRPC library, and that library has no native Micrometer support. GRPC itself does have internal metrics, but they aren't yet exposed to Spring through that library. Since we are a tiny startup with limited resources, we did some simple things to get Micrometer hooked up to our GRPC services for some basic metrics.

Micrometer Setup

Our Micrometer setup was to include the dependency in our service’s build file:

implementation("io.micrometer:micrometer-registry-prometheus")

And since these are internal services, we exposed everything:

management:
  endpoints:
    web:
      exposure:
        include: "*"

Then for every service, we have the HTTP endpoints $HOST:$PORT/actuator/metrics and $HOST:$PORT/actuator/prometheus available for use.

Prometheus Configuration

We run things in Kubernetes, so we first add the following annotations to our service pods to make them discoverable by Prometheus.

metadata:
  annotations: 
    prometheus.io/scrape: "true"
    prometheus.io/path: "/actuator/prometheus"
    prometheus.io/port: "<port>"

And we add the following job to Prometheus Server’s prometheus.yml to discover and scrape pods.

scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - action: keep
        regex: true
        source_labels:
          - __meta_kubernetes_pod_annotation_prometheus_io_scrape
      - action: replace
        regex: (.+)
        source_labels:
          - __meta_kubernetes_pod_annotation_prometheus_io_path
        target_label: __metrics_path__
      - action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        source_labels:
          - __address__
          - __meta_kubernetes_pod_annotation_prometheus_io_port
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - action: replace
        source_labels:
          - __meta_kubernetes_namespace
        target_label: kubernetes_namespace
      - action: replace
        source_labels:
          - __meta_kubernetes_pod_name
        target_label: kubernetes_pod_name

This job is already included by default with the Prometheus Helm chart.
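The address rewrite rule is the least obvious part of that job, so here is a rough Kotlin sketch of what it does to a target. It is only an illustration, not how Prometheus is implemented: the function name rewriteAddress is made up for this example, and it assumes the usual relabeling behavior where source label values are joined with ";" and the regex must match the whole joined string.

```kotlin
// Illustration only: mimics the __address__ relabel rule with Kotlin's Regex.
// Same pattern as in the scrape job: host, optional ":port", then ";" and the annotated port.
val addressRule = Regex("""([^:]+)(?::\d+)?;(\d+)""")

/**
 * Returns the scrape address after the rule is applied.
 * When the rule does not match (for example, no port annotation), the address is left alone.
 */
fun rewriteAddress(address: String, portAnnotation: String): String {
    // Assumption: source label values are joined with ";" before matching.
    val joined = "$address;$portAnnotation"
    // matchEntire mirrors the fully anchored matching used for relabel regexes.
    val match = addressRule.matchEntire(joined) ?: return address
    val (host, port) = match.destructured
    return "$host:$port"
}
```

With this sketch, a pod discovered at 10.1.2.3:8080 and annotated with port 9090 ends up being scraped at 10.1.2.3:9090, while a pod without the port annotation keeps its discovered address.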

Method Timings

We went with the standard Spring/Micrometer generic method timing approach for this. The upside is that it was trivial to implement; the downside is that we have to remember to annotate each GRPC method.

In a @Configuration class, we added a TimedAspect bean:

@Bean
fun timedAspect(registry: MeterRegistry): TimedAspect {
   return TimedAspect(registry)
}
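One thing that can trip people up: TimedAspect is an AspectJ-style aspect, so Spring only applies it when AOP support is on the classpath. If annotated methods never show up in the metrics, it may be worth checking for a dependency along these lines. This is an assumption about a typical Spring Boot Gradle build rather than part of the setup described here, and the starter name may differ between Spring Boot versions.

```kotlin
// build.gradle.kts (assumed layout, shown for illustration)
dependencies {
    // Pulls in Spring AOP and AspectJ weaving so that @Timed methods get proxied.
    implementation("org.springframework.boot:spring-boot-starter-aop")
}
```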

And then for every GRPC call, we throw on a @Timed annotation.

@Timed
override fun getFoo(request: FooService.GetFooRequest,
                               responseObserver: StreamObserver<FooService.FooResponse>) {
[...]
}

This then adds the GRPC method metrics to the Prometheus actuator under the /actuator/prometheus endpoint:

# HELP method_timed_seconds  
# TYPE method_timed_seconds summary
method_timed_seconds_count{class="com.stackhawk.FooService",exception="none",method="createFoo",} 3.0
method_timed_seconds_sum{class="com.stackhawk.FooService",exception="none",method="createFoo",} 0.0344318
# HELP method_timed_seconds_max  
# TYPE method_timed_seconds_max gauge
method_timed_seconds_max{class="com.stackhawk.FooService",exception="none",method="createFoo",} 0.0272329
method_timed_seconds_max{class="com.stackhawk.FooService",exception="none",method="updateFoo",} 0.0181494

With that getting pulled into Prometheus, we can then do things like get the average duration per GRPC call using PromQL like so:

rate(method_timed_seconds_sum[1m]) / rate(method_timed_seconds_count[1m])
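To see why dividing the two rates gives an average, here is a small Kotlin sketch of the arithmetic between two scrapes of the same timer. It is a simplification: the TimerSample type and averageSecondsPerCall function are made up for this example, and the real rate() function also handles counter resets and extrapolates over the window, which this ignores.

```kotlin
/** One scrape of a timer: when it was taken, cumulative seconds spent, cumulative call count. */
data class TimerSample(val timestampSeconds: Double, val sumSeconds: Double, val count: Double)

/**
 * Average seconds per call between two scrapes.
 * Returns null when no time passed or no calls were recorded in between.
 */
fun averageSecondsPerCall(earlier: TimerSample, later: TimerSample): Double? {
    val elapsed = later.timestampSeconds - earlier.timestampSeconds
    val calls = later.count - earlier.count
    if (elapsed <= 0.0 || calls <= 0.0) return null

    // Per-second growth of the _sum and _count series, like rate() on each one.
    val sumRate = (later.sumSeconds - earlier.sumSeconds) / elapsed
    val countRate = calls / elapsed

    // The elapsed time cancels out: this is seconds spent divided by number of calls.
    return sumRate / countRate
}
```

Using the numbers from the output above (3 calls taking 0.0344318 seconds in total) across a 60 second window, this works out to roughly 11.5 milliseconds per call.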

Exception Metrics

For this, we decided to hook a Micrometer counter into our existing generic GRPC exception handler, which lives in an internal shared library that all GRPC services automatically pull in via our common Gradle platform.

All we did here was to add the MeterRegistry to the constructor, so it gets set by the Spring context.  Then we use that MeterRegistry instance to increment a counter with the full class name as a Tag in the catch block.

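import io.grpc.ForwardingServerCallListener
import io.grpc.Metadata
import io.grpc.ServerCall
import io.grpc.ServerCallHandler
import io.grpc.ServerInterceptor
import io.grpc.Status
import io.micrometer.core.instrument.MeterRegistry
import io.micrometer.core.instrument.Tags
import org.slf4j.Logger
import org.slf4j.LoggerFactory

class GlobalGrpcExceptionHandler(private val registry: MeterRegistry? = null) : ServerInterceptor {

   private val logger: Logger = LoggerFactory.getLogger(GlobalGrpcExceptionHandler::class.java)

   override fun <ReqT : Any?, RespT : Any?> interceptCall(call: ServerCall<ReqT, RespT>?, headers: Metadata?, next: ServerCallHandler<ReqT, RespT>?): ServerCall.Listener<ReqT> {
       val delegate = next?.startCall(call, headers)
       return object : ForwardingServerCallListener.SimpleForwardingServerCallListener<ReqT>(delegate) {
           // The service method runs from onHalfClose, so this is where its exceptions surface.
           override fun onHalfClose() {
               try {
                   super.onHalfClose()
               } catch (e: Exception) {
                   // Count the failure, tagged with the exception's full class name.
                   registry?.counter("grpc.exception.counter", Tags.of("type", e.javaClass.canonicalName))?.increment()

                   logger.error(e.message, e)
                   call?.close(Status.INTERNAL
                           .withCause(e)
                           .withDescription(e.message), Metadata())
               }
           }
       }
   }
}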
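One edge case with using the canonical name as the tag value: for anonymous and local classes, canonicalName is null, and a tag value is expected to be non-null. A possible hardening, sketched here as an assumption rather than something the handler above does, is to fall back to the binary class name. The helper name exceptionTypeTag is made up for this example.

```kotlin
/**
 * Picks a value for the "type" tag of an exception.
 * Prefers the canonical name and falls back to the binary name when there is none.
 */
fun exceptionTypeTag(e: Throwable): String =
    e.javaClass.canonicalName ?: e.javaClass.name
```

For ordinary exceptions this yields the same tag as before, for example java.lang.IllegalStateException, so existing dashboards and alerts would not need to change.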

Then each service gets the context’s MeterRegistry autowired into a config constructor and just sets it on the exception handler bean:

@Configuration
class FooConfig(private val meterRegistry: MeterRegistry) {

   @Bean
   @GRpcGlobalInterceptor
   fun globalGrpcExceptionHandler(): GlobalGrpcExceptionHandler {
      return GlobalGrpcExceptionHandler(meterRegistry)
   }
}

With those in place, the /actuator/prometheus endpoint now has a new counter with the full class name of the exception as a tag:

# HELP grpc_exception_counter_total  
# TYPE grpc_exception_counter_total counter
grpc_exception_counter_total{type="software.amazon.awssdk.core.exception.SdkClientException",} 1.0

In PromQL, that lets you do things like track the per-second exception rate over the last minute:

rate(grpc_exception_counter_total[1m])
