Quarkus Logging Splunk

Introduction

Splunk is a middleware solution that receives, stores and indexes the logs of an application, and finally makes them available for analysis.

This Quarkus extension provides support for the official Splunk client library to index log events through the HTTP Event Collector (HEC), provided by the Splunk Enterprise solution.

  • The official client is an open-source library.

  • The HTTP Event Collector is documented on the Splunk documentation site.

Installation

To use this extension, add the quarkus-logging-splunk dependency to your pom.xml file:

<dependency>
    <groupId>io.quarkiverse.logging.splunk</groupId>
    <artifactId>quarkus-logging-splunk</artifactId>
</dependency>

Features

The extension can be used transparently with any logging API supported by Quarkus (Log4j, SLF4J, …).

Log message formatting

In all cases, the log message format defaults to the same pattern as the Quarkus console handler:

quarkus.log.handler.splunk.format=%d{yyyy-MM-dd HH:mm:ss,SSS} %-5p [%c{3.}] (%t) %s%e%n

You can adapt this format to avoid duplicating metadata that is already sent in a structured way.

Log event metadata

The type of metadata depends on the serialization format.

If quarkus.log.handler.splunk.raw is enabled or quarkus.log.handler.splunk.serialization is set to raw, no per-event metadata is sent. Only a few global metadata, shared by all events of a batch, are sent via HTTP headers and query parameters.

Otherwise, the extension uses structured logging with JSON serialization. Two structured formats are supported:

  • The nested serialization is the default format of the Splunk HEC Java client and defines names for some pre-defined metadata. Combined with quarkus.log.handler.splunk.format=%s%e, it also supports log messages that are themselves JSON.

  • The flat serialization is a simpler and more generic format, also used by the OpenTelemetry Splunk HEC exporter.

Some metadata can be indexed by Splunk (see the Splunk documentation on indexed fields). The default _json source type indexes the metadata passed in the fields object.

The extension supports resolving MDC-scoped properties, as defined by the formatters supported by JBoss Logging.

Metadata by serialization format

HEC metadata (nested and flat)

time and host are always sent. source, sourcetype and index are sent if not empty.

Pre-defined metadata with nested serialization

Only event.severity is sent by default. Other metadata can be added:

  • event.thread via quarkus.log.handler.splunk.include-thread-name

  • event.exception via quarkus.log.handler.splunk.include-exception

  • event.logger via quarkus.log.handler.splunk.include-logger-name

Pre-defined metadata with flat serialization

Only fields.severity is sent by default. The metadata name can be customized via quarkus.log.handler.splunk.metadata-severity-field-name. Other metadata can be added:

  • fields.thread via quarkus.log.handler.splunk.include-thread-name

  • fields.exception via quarkus.log.handler.splunk.include-exception

  • fields.logger via quarkus.log.handler.splunk.include-logger-name

MDC properties

  • nested: passed via event.properties

  • flat: passed via fields

Static metadata (nested and flat)

Passed via fields.

A structured request to Splunk HEC looks like this:

curl -k -v -X POST https://localhost:8088/services/collector/event/1.0 \
  -H "Content-type: application/json; profile=\"urn:splunk:event:1.0\"; charset=utf-8" \
  -H "Authorization: Splunk 29fe2838-cab6-4d17-a392-37b7b8f41f75" \
  -d @events.json
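For illustration, the request above can also be built with the JDK 11+ java.net.http client. The following is a minimal sketch, not the extension's implementation. The class HecRequestSketch and its methods are hypothetical. The sketch only builds the request: it does not send it, and it does not disable certificate validation the way curl -k does. The endpoint selection follows the rule given for the HEC base URL in the configuration reference.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical sketch: builds (but does not send) an HEC request like the curl example. */
final class HecRequestSketch {

    // Raw events target /services/collector/raw; nested or flat JSON events
    // target /services/collector/event/1.0.
    static URI endpoint(String baseUrl, boolean raw) {
        String base = baseUrl.endsWith("/") ? baseUrl : baseUrl + "/";
        return URI.create(base + (raw ? "services/collector/raw" : "services/collector/event/1.0"));
    }

    // Structured (JSON) request carrying the same headers as the curl command.
    static HttpRequest eventRequest(String baseUrl, String token, String jsonBody) {
        return HttpRequest.newBuilder(endpoint(baseUrl, false))
                .header("Content-Type", "application/json; profile=\"urn:splunk:event:1.0\"; charset=utf-8")
                .header("Authorization", "Splunk " + token)
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = eventRequest("https://localhost:8088/",
                "29fe2838-cab6-4d17-a392-37b7b8f41f75", "{\"event\":\"hello\"}");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

To send the request, pass it to HttpClient.send with a client configured for your TLS setup.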

Nested serialization example
{
  "time": "1673001538.042",
  "host": "hostname",
  "source": "mysource",
  "sourcetype": "_json",
  "index": "main",
  "event": {
    "message": "2023-01-06 ERROR The log message",
    "logger": "com.acme.MyClass",
    "severity": "ERROR",
    "exception": "java.lang.NullPointerException",
    "properties": {
      "mdc-key": "mdc-value"
    }
  },
  "fields": {
    "key": "static-value"
  }
}
Flat serialization example
{
  "time": "1673001538.042",
  "host": "hostname",
  "source": "mysource",
  "index": "main",
  "event": "2023-01-06 ERROR The log message",
  "fields": {
    "severity": "ERROR",
    "mdc-key": "mdc-value",
    "key": "static-value"
  }
}
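The two payload shapes above can be sketched in Java as follows. This is a hypothetical illustration, and HecPayloadSketch is not part of the extension. The JSON is concatenated by hand, values are assumed not to need escaping, and optional metadata such as source, index, logger or exception is omitted. The severity key is passed as a parameter to the flat variant because its name is configurable. The default key is severity.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical sketch of the nested and flat HEC payload shapes. */
final class HecPayloadSketch {

    // Renders {"k":"v",...}; keys and values are assumed to need no JSON escaping.
    static String object(Map<String, String> entries) {
        return entries.entrySet().stream()
                .map(e -> "\"" + e.getKey() + "\":\"" + e.getValue() + "\"")
                .collect(Collectors.joining(",", "{", "}"));
    }

    // Nested: message and severity inside "event", MDC under "event.properties",
    // static metadata under "fields".
    static String nested(String time, String host, String message, String severity,
                         Map<String, String> mdc, Map<String, String> staticFields) {
        return "{\"time\":\"" + time + "\",\"host\":\"" + host + "\",\"sourcetype\":\"_json\","
                + "\"event\":{\"message\":\"" + message + "\",\"severity\":\"" + severity + "\","
                + "\"properties\":" + object(mdc) + "},"
                + "\"fields\":" + object(staticFields) + "}";
    }

    // Flat: the message is the "event" string; severity, MDC and static metadata
    // are merged into "fields".
    static String flat(String time, String host, String message, String severityKey, String severity,
                       Map<String, String> mdc, Map<String, String> staticFields) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put(severityKey, severity);
        fields.putAll(mdc);
        fields.putAll(staticFields);
        return "{\"time\":\"" + time + "\",\"host\":\"" + host + "\","
                + "\"event\":\"" + message + "\",\"fields\":" + object(fields) + "}";
    }

    public static void main(String[] args) {
        Map<String, String> mdc = Map.of("mdc-key", "mdc-value");
        Map<String, String> statics = Map.of("key", "static-value");
        System.out.println(nested("1673001538.042", "hostname", "The log message", "ERROR", mdc, statics));
        System.out.println(flat("1673001538.042", "hostname", "The log message", "severity", "ERROR", mdc, statics));
    }
}
```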

Connectivity failures

Batched events that cannot be sent to the Splunk indexer are logged to stdout:

  • Formatted using console handler settings if the console handler is enabled

  • Formatted using splunk handler settings otherwise

In both cases, the root cause of the failure is logged to stderr.
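The fallback rule can be sketched as follows. FallbackSketch and its parameters are hypothetical names chosen for illustration. For simplicity, the two formatters are modeled as plain functions over an already-rendered message string, and the error line printed to stderr is an invented example.

```java
import java.io.PrintStream;
import java.util.List;
import java.util.function.UnaryOperator;

/** Hypothetical sketch of the stdout/stderr fallback for undelivered batches. */
final class FallbackSketch {

    // Writes each undelivered event to `out` using the console format when the
    // console handler is enabled, the Splunk handler format otherwise, then
    // reports the root cause on `err`.
    static void onSendFailure(List<String> batch, Throwable cause, boolean consoleEnabled,
                              UnaryOperator<String> consoleFormat, UnaryOperator<String> splunkFormat,
                              PrintStream out, PrintStream err) {
        UnaryOperator<String> format = consoleEnabled ? consoleFormat : splunkFormat;
        batch.forEach(event -> out.println(format.apply(event)));
        err.println("Failed to send events to Splunk HEC: " + cause);
    }

    public static void main(String[] args) {
        onSendFailure(List.of("event 1", "event 2"), new java.io.IOException("Connection refused"),
                true, e -> "[console] " + e, e -> "[splunk] " + e, System.out, System.err);
    }
}
```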

Asynchronous handler

By default, the log handler is synchronous, and only the HTTP requests to the HEC endpoint are sent asynchronously:

[Figure: synchronous handler]

This can be an issue because the Splunk library #send method is synchronized. As a result, any preprocessing of the batch HTTP request happens on the application thread whose log event made the batch full, by reaching either quarkus.log.handler.splunk.batch-size-count or quarkus.log.handler.splunk.batch-size-bytes.
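A minimal sketch of the batch-full rule follows. It assumes that a batch is flushed as soon as either limit is reached, and that a limit of 0 disables batching, as stated in the configuration reference. BatchSketch is hypothetical and ignores the time-based batch interval. Its add method runs on the caller's thread, so in synchronous mode the thread that fills the batch is the one that pays for flushing it.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of count- and size-based batching. */
final class BatchSketch {
    private final long maxCount; // like batch-size-count; 0 disables batching
    private final long maxBytes; // like batch-size-bytes; 0 disables batching
    private final List<String> pending = new ArrayList<>();
    private long pendingBytes;

    BatchSketch(long maxCount, long maxBytes) {
        this.maxCount = maxCount;
        this.maxBytes = maxBytes;
    }

    // Runs on the thread that logs the event. Returns the batch to send when it
    // becomes full, or an empty list while the batch is still filling up.
    List<String> add(String event) {
        pending.add(event);
        pendingBytes += event.getBytes(StandardCharsets.UTF_8).length;
        boolean full = maxCount == 0 || maxBytes == 0
                || pending.size() >= maxCount || pendingBytes >= maxBytes;
        if (!full) {
            return List.of();
        }
        List<String> batch = new ArrayList<>(pending);
        pending.clear();
        pendingBytes = 0;
        return batch;
    }

    public static void main(String[] args) {
        BatchSketch batch = new BatchSketch(3, 10_240);
        for (String e : List.of("a", "b", "c", "d")) {
            System.out.println(e + " -> " + batch.add(e));
        }
    }
}
```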

Setting quarkus.log.handler.splunk.async=true adds an intermediate event queue, which decouples flushing the batch from the application threads:

[Figure: asynchronous handler]

By default, quarkus.log.handler.splunk.async.overflow=block, so application threads block once the queue holds quarkus.log.handler.splunk.async.queue-length events.

There’s no link between quarkus.log.handler.splunk.async.queue-length and quarkus.log.handler.splunk.batch-size-count.
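The overflow policy of the intermediate queue can be sketched with a bounded java.util.concurrent.ArrayBlockingQueue. With this queue, put waits for free space (block), while offer returns false and drops the event (discard). AsyncQueueSketch is a hypothetical illustration of the behavior described above, not the handler's actual queue.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical sketch of the async queue with block/discard overflow. */
final class AsyncQueueSketch {

    enum Overflow { BLOCK, DISCARD }

    private final BlockingQueue<String> queue;
    private final Overflow overflow;

    AsyncQueueSketch(int queueLength, Overflow overflow) {
        this.queue = new ArrayBlockingQueue<>(queueLength);
        this.overflow = overflow;
    }

    // Called on the application thread. Returns false if the event was dropped.
    boolean publish(String event) {
        if (overflow == Overflow.DISCARD) {
            return queue.offer(event); // full queue: the event is discarded
        }
        try {
            queue.put(event); // full queue: the caller waits for free space
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Called by the background consumer that feeds the batch; null if empty.
    String poll() {
        return queue.poll();
    }

    int size() {
        return queue.size();
    }

    public static void main(String[] args) {
        AsyncQueueSketch q = new AsyncQueueSketch(2, Overflow.DISCARD);
        System.out.println(q.publish("e1") + " " + q.publish("e2") + " " + q.publish("e3"));
    }
}
```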

Sequential and parallel modes

The number of events kept in memory for batching is not limited. If, after tuning quarkus.log.handler.splunk.batch-size-count and quarkus.log.handler.splunk.batch-size-bytes, the HEC endpoint still cannot keep up with the batch throughput, using multiple HTTP connections may help reduce memory usage on the client.

With quarkus.log.handler.splunk.send-mode=parallel, multiple batches are sent over the wire in parallel, potentially increasing throughput with the HEC endpoint.
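The difference between the two send modes can be sketched with executors. A single-threaded executor runs batches one after another in submission order. A thread pool lets several batches be in flight at once, so their completion order is not guaranteed. SendModeSketch and its pool size are hypothetical, and the sketch does not model HTTP connections themselves.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical sketch of sequential vs parallel batch sending. */
final class SendModeSketch {

    // sequential: one sender thread, so batches are sent in submission order.
    // parallel: a pool of senders (pool size is an assumption of this sketch).
    static ExecutorService senders(String sendMode, int poolSize) {
        return "parallel".equals(sendMode)
                ? Executors.newFixedThreadPool(poolSize)
                : Executors.newSingleThreadExecutor();
    }

    // Submits every batch, then waits for all sends to finish.
    static boolean sendAll(ExecutorService senders, List<String> batches, Consumer<String> send) {
        batches.forEach(batch -> senders.execute(() -> send.accept(batch)));
        senders.shutdown();
        try {
            return senders.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        List<String> sent = Collections.synchronizedList(new ArrayList<>());
        sendAll(senders("parallel", 4), List.of("batch-1", "batch-2", "batch-3"), sent::add);
        System.out.println(sent); // same batches; order not guaranteed in parallel mode
    }
}
```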

Extension Configuration Reference

This extension follows the log handler configuration domain defined by Quarkus: every configuration property of this extension belongs to the quarkus.log.handler.splunk configuration root.

When present, this extension is enabled by default. The client then expects a valid connection to a Splunk indexer and otherwise prints an error message for every log event created by the application.

In a local environment, the log handler can therefore be disabled with the following property:

quarkus.log.handler.splunk.enabled=false

Every configuration property of the extension is overridable at runtime.


Whether to enable the handler.
Type: boolean
Default: true

The Splunk handler log level. By default, it is no stricter than the root handler level.
Type: Level
Default: ALL

Splunk HEC endpoint base URL. With raw events, the targeted endpoint is /services/collector/raw. With flat or nested JSON events, the targeted endpoint is /services/collector/event/1.0.
Type: string
Default: https://localhost:8088/

Disables TLS certificate validation with the HEC endpoint.
Type: boolean
Default: false

The application token used to authenticate with HEC. The token is mandatory if the extension is enabled (see https://docs.splunk.com/Documentation/Splunk/latest/Data/FormateventsforHTTPEventCollector#HEC_token).
Type: string

The strategy used to send events to HEC. In sequential mode, there is only one HTTP connection to HEC and the order of events is preserved, but performance is lower. In parallel mode, event batches are sent asynchronously over multiple HTTP connections, and events with the same timestamp (timestamps have a 1 millisecond resolution) may be indexed out of order by Splunk.
Type: sequential, parallel
Default: sequential

A GUID that identifies an HEC client and guarantees isolation at the HEC level in case of slow clients (see https://docs.splunk.com/Documentation/Splunk/latest/Data/AboutHECIDXAck#About_channels_and_sending_data).
Type: string

Batching delay before sending a group of events. If 0, events are sent immediately.
Type: Duration
Default: 10S

Maximum number of events in a batch. If 0, batching is disabled.
Type: long
Default: 10

Maximum total size in bytes of the events in a batch. By default 10KB; if 0, batching is disabled.
Type: long
Default: 10

Maximum number of retries in case of I/O exceptions with the HEC connection.
Type: long
Default: 0

The log format, defining which metadata is inlined in the main log payload. Specific metadata (hostname, category, thread name, …), as well as the MDC key/value map, can also be sent in a structured way.
Type: string
Default: %d{yyyy-MM-dd HH:mm:ss,SSS} %-5p [%c{3.}] (%t) %s%e%n

Whether to send the thrown exception message as structured metadata of the log event. Unlike %e in a formatted message, it does not include the exception name or stack trace. Only applicable to 'nested' serialization.
Type: boolean
Default: false

Whether to send the logger name as structured metadata of the log event (equivalent of %c in a formatted message). Only applicable to 'nested' serialization.
Type: boolean
Default: false

Whether to send the thread name as structured metadata of the log event (equivalent of %t in a formatted message). Only applicable to 'nested' serialization.
Type: boolean
Default: false

Overrides the host name metadata value.
Type: string
Default: the equivalent of %h in a formatted message

The source value to assign to the event data. For example, if you’re sending data from an app you’re developing, you could set this key to the name of the app (see https://docs.splunk.com/Documentation/Splunk/latest/Data/FormateventsforHTTPEventCollector#Event_metadata).
Type: string

The optional format of the events, used to enable some parsing on the Splunk side (see https://docs.splunk.com/Documentation/Splunk/latest/Data/FormateventsforHTTPEventCollector#Event_metadata). A given source type may have indexed field extraction enabled, as is the case for the built-in _json source type used for nested serialization.
Type: string
Default: _json for nested serialization, not set otherwise

The optional name of the index in which the event data is stored. If set, it must be within the list of allowed indexes of the token, if the token has the indexes parameter set (see https://docs.splunk.com/Documentation/Splunk/latest/Data/FormateventsforHTTPEventCollector#Event_metadata).
Type: string

The name of the key used to convey the severity / log level in the metadata fields. Only applicable to 'flat' serialization. With 'nested' serialization, there is already a 'severity' field.
Type: string
Default: severity

The format of the payload:

  • With raw serialization, the log message is sent as is in the HTTP body. Metadata can only be common to a whole batch and is sent via HTTP parameters.

  • With nested serialization, the log message is sent in a 'message' field of a JSON structure that also contains dynamic metadata.

  • With flat serialization, the log message is sent in the root 'event' field. Dynamic metadata is sent via the 'fields' root object.

Type: raw, nested, flat
Default: nested

Whether to log asynchronously.
Type: boolean
Default: false

The maximum number of events held in the asynchronous queue.
Type: int
Default: 512

Whether to block the publisher (rather than drop the message) when the queue is full.
Type: block, discard
Default: block

Optional static key/value pairs used to populate the "fields" key of the event metadata. Not applicable to raw serialization (see https://docs.splunk.com/Documentation/Splunk/latest/Data/FormateventsforHTTPEventCollector#Event_metadata).
Type: Map<String,String>

About the Duration format

The format for durations uses the standard java.time.Duration format. You can learn more about it in the Duration#parse() javadoc.

You can also provide duration values starting with a number. In this case, if the value consists only of a number, the converter treats the value as seconds. Otherwise, PT is implicitly prepended to the value to obtain a standard java.time.Duration format.
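The conversion rule above can be sketched as follows. DurationSketch is a hypothetical illustration of the documented behavior, not the Quarkus converter itself. With this rule, the default batch interval 10S becomes PT10S, that is, 10 seconds.

```java
import java.time.Duration;

/** Hypothetical sketch of the documented duration conversion rule. */
final class DurationSketch {

    // "10" -> 10 seconds; "10S", "1H30M" -> "PT" prepended; "PT10S" parsed as-is.
    static Duration parse(String value) {
        String v = value.trim();
        if (v.matches("\\d+")) {
            return Duration.ofSeconds(Long.parseLong(v)); // bare number: seconds
        }
        if (!v.isEmpty() && Character.isDigit(v.charAt(0))) {
            return Duration.parse("PT" + v); // starts with a number: prepend PT
        }
        return Duration.parse(v); // already a standard ISO-8601 duration
    }

    public static void main(String[] args) {
        for (String v : new String[] {"10", "10S", "1H30M", "PT0.5S"}) {
            System.out.println(v + " -> " + parse(v));
        }
    }
}
```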