Prometheus Metrics with SpringBoot + GRPC Services

Topher Lamey   |   Jul 30, 2020


All of our internal services use LogNet’s awesome SpringBoot GRPC library to communicate, but there’s no native Micrometer support. GRPC itself does have internal metrics, but that library doesn’t yet expose them to Spring. Since we are a tiny startup with limited resources, we did some simple things to get Micrometer hooked up to our GRPC services for some basic metrics.


Micrometer Setup

Our Micrometer setup was to include the dependency in our service’s build file:

implementation("io.micrometer:micrometer-registry-prometheus")
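That registry only produces an HTTP endpoint when Spring Boot Actuator is also present. If it isn’t already on the classpath, something along these lines is needed (a sketch; Spring Boot’s dependency management supplies the version):

```kotlin
dependencies {
    // Actuator provides the /actuator/* HTTP endpoints that Micrometer publishes to
    implementation("org.springframework.boot:spring-boot-starter-actuator")
}
```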

And since these are internal services, we exposed everything:

management:
  endpoints:
    web:
      exposure:
        include: "*"

Then every service has the HTTP endpoints $HOST:$PORT/actuator/metrics and $HOST:$PORT/actuator/prometheus available for use.

Prometheus Configuration

We run things in Kubernetes, so we first add the following annotations to our service pods to make them discoverable by Prometheus.

metadata:
  annotations: 
    prometheus.io/scrape: "true"
    prometheus.io/path: "/actuator/prometheus"
    prometheus.io/port: "<port>"
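In a Deployment manifest, those annotations belong on the pod template, not on the Deployment itself. A sketch of where they land (the name, image, and port 8080 are placeholders, not values from our setup):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: foo-service          # placeholder name
spec:
  selector:
    matchLabels:
      app: foo-service
  template:
    metadata:
      labels:
        app: foo-service
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/path: "/actuator/prometheus"
        prometheus.io/port: "8080"   # placeholder: your service's HTTP port
    spec:
      containers:
        - name: foo-service
          image: foo-service:latest  # placeholder image
          ports:
            - containerPort: 8080
```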

And we add the following job to Prometheus Server’s prometheus.yml to discover and scrape pods.

scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - action: keep
        regex: true
        source_labels:
          - __meta_kubernetes_pod_annotation_prometheus_io_scrape
      - action: replace
        regex: (.+)
        source_labels:
          - __meta_kubernetes_pod_annotation_prometheus_io_path
        target_label: __metrics_path__
      - action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        source_labels:
          - __address__
          - __meta_kubernetes_pod_annotation_prometheus_io_port
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - action: replace
        source_labels:
          - __meta_kubernetes_namespace
        target_label: kubernetes_namespace
      - action: replace
        source_labels:
          - __meta_kubernetes_pod_name
        target_label: kubernetes_pod_name

This job is already included by default in the Prometheus Helm chart.

Method Timings

We went with the standard Spring/Micrometer generic method timing approach for this. The upside is that it’s trivial to implement; the downside is that we have to remember to annotate each GRPC method.

In a @Configuration class, we added a TimedAspect bean:

@Bean
fun timedAspect(registry: MeterRegistry): TimedAspect {
    return TimedAspect(registry)
}
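One caveat: TimedAspect is an AspectJ aspect, so @Timed on arbitrary Spring beans only takes effect when Spring AOP support is on the classpath. If it isn’t already pulled in transitively, a dependency like this is needed (a sketch; the version comes from Spring Boot’s dependency management):

```kotlin
dependencies {
    // Spring AOP / AspectJ support, required for TimedAspect to intercept @Timed methods
    implementation("org.springframework.boot:spring-boot-starter-aop")
}
```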

And then for every GRPC call, we throw on a @Timed annotation:

@Timed
override fun getFoo(request: FooService.GetFooRequest,
                    responseObserver: StreamObserver<FooService.FooResponse>) {
    [...]
}
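@Timed also accepts a metric name and extra options if the default method_timed name isn’t what you want. For example (a sketch; the grpc.server.calls name and percentile choices are our own illustration, not a convention from the library):

```kotlin
// Publishes the timer as grpc_server_calls_* with client-side percentiles
@Timed(value = "grpc.server.calls", percentiles = [0.5, 0.95, 0.99])
override fun getFoo(request: FooService.GetFooRequest,
                    responseObserver: StreamObserver<FooService.FooResponse>) {
    [...]
}
```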

This then adds the GRPC method metrics under the /actuator/prometheus endpoint:

# HELP method_timed_seconds  
# TYPE method_timed_seconds summary
method_timed_seconds_count{class="com.stackhawk.FooService",exception="none",method="createFoo",} 3.0
method_timed_seconds_sum{class="com.stackhawk.FooService",exception="none",method="createFoo",} 0.0344318
# HELP method_timed_seconds_max  
# TYPE method_timed_seconds_max gauge
method_timed_seconds_max{class="com.stackhawk.FooService",exception="none",method="createFoo",} 0.0272329
method_timed_seconds_max{class="com.stackhawk.FooService",exception="none",method="updateFoo",} 0.0181494

With that getting pulled into Prometheus, we can then do things like compute the average latency per GRPC call using PromQL like so:

rate(method_timed_seconds_sum[1m]) / rate(method_timed_seconds_count[1m])
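To break that average out per service class and method (using the class and method labels shown in the sample output above), a query along these lines works:

```
sum by (class, method) (rate(method_timed_seconds_sum[5m]))
/
sum by (class, method) (rate(method_timed_seconds_count[5m]))
```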

Exception Metrics

For this, we decided to hook a Micrometer registry counter into our existing generic GRPC exception handler, which lives in an internal shared library that all GRPC services automatically pull in via our common Gradle platform.

All we did here was add the MeterRegistry to the constructor, so it gets set by the Spring context. Then we use that MeterRegistry instance to increment a counter, with the exception’s full class name as a Tag, in the catch block.

class GlobalGrpcExceptionHandler(private val registry: MeterRegistry? = null) : ServerInterceptor {

   private val logger: Logger = LoggerFactory.getLogger(GlobalGrpcExceptionHandler::class.java)

   override fun <ReqT : Any?, RespT : Any?> interceptCall(call: ServerCall<ReqT, RespT>?, headers: Metadata?, next: ServerCallHandler<ReqT, RespT>?): ServerCall.Listener<ReqT> {
       val delegate = next?.startCall(call, headers)
       return object : ForwardingServerCallListener.SimpleForwardingServerCallListener<ReqT>(delegate) {
           override fun onHalfClose() {
               try {
                   super.onHalfClose()
                } catch (e: Exception) {
                    // Count the exception, tagged with its fully qualified class name
                    registry?.counter("grpc.exception.counter", Tags.of("type", e.javaClass.canonicalName))?.increment()

                    logger.error(e.message, e)
                    // Close the call with an INTERNAL status so the client sees the failure
                    call?.close(Status.INTERNAL
                            .withCause(e)
                            .withDescription(e.message), Metadata())
               }
           }
       }
   }
}

Then each service gets the context’s MeterRegistry autowired into a config constructor and just sets it on the exception handler bean:

@Configuration
class FooConfig(private val meterRegistry: MeterRegistry) {

    @Bean
    @GRpcGlobalInterceptor
    fun globalGrpcExceptionHandler(): GlobalGrpcExceptionHandler {
        return GlobalGrpcExceptionHandler(meterRegistry)
    }
}

With those in place, the /actuator/prometheus endpoint now has a new counter with the full class name of the exception as a tag:

# HELP grpc_exception_counter_total  
# TYPE grpc_exception_counter_total counter
grpc_exception_counter_total{type="software.amazon.awssdk.core.exception.SdkClientException",} 1.0

In PromQL, that then lets you do things like:

rate(grpc_exception_counter_total[1m])
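And to watch exception rates broken out per exception type (using the type tag added by the interceptor above):

```
sum by (type) (rate(grpc_exception_counter_total[5m]))
```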
