Quarkus logging splunk
Introduction
Splunk is a middleware solution that receives, stores, and indexes the logs of an application, and makes them available for searching and analysis.
This Quarkus extension uses the official Splunk client library to index log events through the HTTP Event Collector (HEC) provided by Splunk Enterprise.
Installation
To use this extension, first add the quarkus-logging-splunk extension to your project.
In your pom.xml file, add:
<dependency>
  <groupId>io.quarkiverse.logging.splunk</groupId>
  <artifactId>quarkus-logging-splunk</artifactId>
</dependency>
Features
The extension works transparently with any log frontend used by Quarkus (Log4j, SLF4J, …).
Log message formatting
By default, the log message format matches that of the Quarkus console handler:
quarkus.log.handler.splunk.format="%d{yyyy-MM-dd HH:mm:ss,SSS} %-5p [%c{3.}] (%t) %s%e%n"
The format can be changed to avoid duplicating metadata that is already passed in a structured way.
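As an illustration, here is a minimal sketch of such an adaptation. It assumes that the timestamp, level, logger, and thread name are already sent as structured metadata, so only the message and the exception remain in the formatted payload:

```properties
# Keep only the message (%s) and the exception (%e) in the formatted payload.
# Assumption: the other details are sent as structured metadata instead.
quarkus.log.handler.splunk.format=%s%e
```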
Log event metadata
The type of metadata depends on the serialization format.
If quarkus.log.handler.splunk.raw is enabled or quarkus.log.handler.splunk.serialization is raw, there is no per-event metadata.
Only a few global metadata values, shared by all events of a batch, are sent via HTTP headers and query parameters.
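A raw setup could be configured as in the following sketch, which only uses the serialization property named above:

```properties
# Send each log message "as is" in the HTTP body, without per-event metadata
quarkus.log.handler.splunk.serialization=raw
```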
In other cases, the extension uses structured logging, via JSON serialization. There are two supported structured formats:
- The nested serialization is the default format of the Splunk HEC Java client and defines the names of some pre-defined metadata. Combined with quarkus.log.handler.splunk.format=%s%e, it also supports log messages that are themselves JSON.
- The flat serialization is a simpler and more generic format, also used by the OpenTelemetry Splunk HEC exporter.
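The two structured formats are selected through the same serialization property. The sketch below shows the nested format combined with the pass-through message format described in the list; replacing nested with flat would select the other format:

```properties
# Structured JSON serialization; the value is either nested or flat
quarkus.log.handler.splunk.serialization=nested
# Pass the message through unformatted, so that a log message that is
# itself JSON is not wrapped in a timestamp/level prefix
quarkus.log.handler.splunk.format=%s%e
```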
Some metadata can be indexed by Splunk, see indexed fields.
The default _json source type indexes metadata passed in the fields object.
The extension supports the resolution of MDC scoped properties, as defined in the JBoss supported formatters.
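As a sketch of MDC resolution in the formatter: Quarkus log formats accept the %X{...} symbol to print an MDC value. The key name request-id below is hypothetical and only serves as an illustration; the rest of the line is the default format shown earlier.

```properties
# %X{request-id} prints the MDC value stored under the hypothetical key "request-id"
quarkus.log.handler.splunk.format=%d{yyyy-MM-dd HH:mm:ss,SSS} %-5p [%c{3.}] (%t) [%X{request-id}] %s%e%n
```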
Serialization format | nested | flat |
---|---|---|
HEC metadata | | |
Pre-defined metadata | Only | Only |
MDC properties | Passed via | Passed via |
Static metadata | Passed via | |
A structured request to Splunk HEC looks like this:
curl -k -v -X POST https://localhost:8080/services/collector/event/1.0 -H "Content-type: application/json; profile=\"urn:splunk:event:1.0\"; charset=utf-8" -H "Authorization: Splunk 29fe2838-cab6-4d17-a392-37b7b8f41f75" -d@events.json
With nested serialization, the content of events.json looks like:
{
  "time": "1673001538.042",
  "host": "hostname",
  "source": "mysource",
  "sourcetype": "_json",
  "index": "main",
  "event": {
    "message": "2023-01-06 ERROR The log message",
    "logger": "com.acme.MyClass",
    "severity": "ERROR",
    "exception": "java.lang.NullPointerException",
    "properties": {
      "mdc-key": "mdc-value"
    }
  },
  "fields": {
    "key": "static-value"
  }
}
With flat serialization, it looks like:
{
  "time": "1673001538.042",
  "host": "hostname",
  "source": "mysource",
  "index": "main",
  "event": "2023-01-06 ERROR The log message",
  "fields": {
    "severity": "ERROR",
    "mdc-key": "mdc-value",
    "key": "static-value"
  }
}
Connectivity failures
Batched events that cannot be sent to the Splunk indexer are logged to stdout:
- Formatted using the console handler settings, if the console handler is enabled
- Formatted using the Splunk handler settings otherwise
In either case, the root cause of the failure is always logged to stderr.
Asynchronous handler
By default, the log handler is synchronous, and only the HTTP requests to the HEC endpoint are made asynchronously.
This can be an issue because the Splunk library's #send method is synchronized, so any preprocessing of the batch HTTP request happens on the application thread of the log event that filled the batch (by reaching either quarkus.log.handler.splunk.batch-size-count or quarkus.log.handler.splunk.batch-size-bytes).
Setting quarkus.log.handler.splunk.async=true adds an intermediate event queue, which decouples the flushing of the batch from any application thread.
By default, quarkus.log.handler.splunk.async.overflow=block, so application threads block once the queue reaches the limit set by quarkus.log.handler.splunk.async.queue-length.
There is no link between quarkus.log.handler.splunk.async.queue-length and quarkus.log.handler.splunk.batch-size-count.
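A minimal asynchronous setup might look like the sketch below. The queue length value is illustrative only, not a recommendation:

```properties
# Decouple batch flushing from application threads with an intermediate queue
quarkus.log.handler.splunk.async=true
# Illustrative queue size; it is independent of batch-size-count
quarkus.log.handler.splunk.async.queue-length=1024
```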
Sequential and parallel modes
The number of events kept in memory for batching purposes is not limited.
If the HEC endpoint still cannot keep up with the batch throughput after tuning quarkus.log.handler.splunk.batch-size-count and quarkus.log.handler.splunk.batch-size-bytes, using multiple HTTP connections might help reduce memory usage on the client.
With quarkus.log.handler.splunk.send-mode=parallel, multiple batches are sent over the wire in parallel, potentially increasing throughput with the HEC endpoint.
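The tuning described here could be sketched as follows. The batch limits are illustrative values chosen for the example, to be adjusted to the actual load:

```properties
# Illustrative batch limits, to be tuned first
quarkus.log.handler.splunk.batch-size-count=50
quarkus.log.handler.splunk.batch-size-bytes=65536
# Send batches over multiple HTTP connections
quarkus.log.handler.splunk.send-mode=parallel
```

As noted in the configuration reference, parallel mode trades ordering for throughput: events with the same timestamp may be indexed out of order by Splunk.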
Extension Configuration Reference
This extension follows the log handlers configuration domain defined by Quarkus: every configuration property of this extension belongs to the configuration root quarkus.log.handler.splunk.
When present, this extension is enabled by default, meaning the client expects a valid connection to a Splunk indexer and prints an error message for every log created by the application.
In a local environment, the log handler can therefore be disabled with the following property:
quarkus.log.handler.splunk.enabled=false
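To disable the handler only outside production, one option is to scope the property to a profile. This is a sketch that assumes the standard Quarkus profile prefixes (%dev for dev mode, %test for tests):

```properties
# Disable the Splunk handler in dev mode and in tests only;
# other profiles keep the default (enabled)
%dev.quarkus.log.handler.splunk.enabled=false
%test.quarkus.log.handler.splunk.enabled=false
```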
Every configuration property of the extension is overridable at runtime.
Configuration property fixed at build time - All other configuration properties are overridable at runtime
Description | Type | Default |
---|---|---|
Determine whether to enable the handler | boolean | |
The splunk handler log level. By default, it is no more strict than the root handler level. | | |
Splunk HEC endpoint base url. With raw events, the endpoint targeted is /services/collector/raw. With flat or nested JSON events, the endpoint targeted is /services/collector/event/1.0. | string | |
Disable TLS certificate validation with HEC endpoint | boolean | |
The application token to authenticate with HEC, the token is mandatory if the extension is enabled https://docs.splunk.com/Documentation/Splunk/latest/Data/FormateventsforHTTPEventCollector#HEC_token | string | |
The strategy to send events to HEC. In sequential mode, there is only one HTTP connection to HEC and the order of events is preserved, but performance is lower. In parallel mode, event batches are sent asynchronously over multiple HTTP connections, and events with the same timestamp (that has 1 millisecond resolution) may be indexed out of order by Splunk. | | |
A GUID to identify an HEC client and guarantee isolation at HEC level in case of slow clients. https://docs.splunk.com/Documentation/Splunk/latest/Data/AboutHECIDXAck#About_channels_and_sending_data | string | |
Batching delay before sending a group of events. If 0, the events are sent immediately. | | |
Maximum number of events in a batch. By default 10, if 0 no batching. | long | |
Maximum total size in bytes of events in a batch. By default 10KB, if 0 no batching. | long | |
Maximum number of retries in case of I/O exceptions with HEC connection. | long | |
The log format, defining which metadata are inlined inside the log main payload. Specific metadata (hostname, category, thread name, …), as well as MDC key/value map, can also be sent in a structured way. | string | |
Whether to send the thrown exception message as a structured metadata of the log event (as opposed to %e in a formatted message, it does not include the exception name or stacktrace). Only applicable to 'nested' serialization. | boolean | |
Whether to send the logger name as a structured metadata of the log event (equivalent of %c in a formatted message). Only applicable to 'nested' serialization. | boolean | |
Whether to send the thread name as a structured metadata of the log event (equivalent of %t in a formatted message). Only applicable to 'nested' serialization. | boolean | |
Overrides the host name metadata value. | string | |
The source value to assign to the event data. For example, if you’re sending data from an app you’re developing, you could set this key to the name of the app. https://docs.splunk.com/Documentation/Splunk/latest/Data/FormateventsforHTTPEventCollector#Event_metadata | string | |
The optional format of the events, to enable some parsing on Splunk side. https://docs.splunk.com/Documentation/Splunk/latest/Data/FormateventsforHTTPEventCollector#Event_metadata A given source type may have indexed fields extraction enabled, which is the case of the built-in _json used for nested serialization. | string | |
The optional name of the index by which the event data is to be stored. If set, it must be within the list of allowed indexes of the token (if it has the indexes parameter set). https://docs.splunk.com/Documentation/Splunk/latest/Data/FormateventsforHTTPEventCollector#Event_metadata | string | |
The name of the key used to convey the severity / log level in the metadata fields. Only applicable to 'flat' serialization. With 'nested' serialization, there is already a 'severity' field. | string | |
The format of the payload. With raw serialization, the log message is sent 'as is' in the HTTP body. Metadata can only be common to a whole batch and are sent via HTTP parameters. With nested serialization, the log message is sent into a 'message' field of a JSON structure which also contains dynamic metadata. With flat serialization, the log message is sent into the root 'event' field. Dynamic metadata is sent via the 'fields' root object. | | |
Indicates whether to log asynchronously | boolean | |
The queue length to use before flushing writing | int | |
Determine whether to block the publisher (rather than drop the message) when the queue is full | | |
Optional static key/value pairs to populate the "fields" key of event metadata. This isn’t applicable to raw serialization. https://docs.splunk.com/Documentation/Splunk/latest/Data/FormateventsforHTTPEventCollector#Event_metadata | | |
About the Duration format
The format for durations uses the standard java.time.Duration format. You can also provide duration values starting with a number.
In this case, if the value consists only of a number, the converter treats the value as seconds.
Otherwise, |