Single log statement per API call — the good, the bad, and the next steps

Anant Agarwal
4 min read · Feb 24, 2022

# Why do we log?

> Bruce Wayne : Why do we log Alfred?

> Alfred Pennyworth : So that we can learn to pick ourselves up.

At Standard Chartered nexus we take our superheroes pretty seriously 😉

Logging is the act of recording events that occur in an application; messages are either written to a file (referred to as a log file) or streamed. Such messages are typically unstructured and are composed of info, warnings, and errors.

When we think of logs, the first thing that comes to mind is something like the following.

192.168.1.6 - - [15/Feb/2022:08:56:43 +0000] "GET /index.html HTTP/1.1" 200 43 "-" "system"
192.168.1.101 - - [15/Feb/2022:08:56:45 +0000] "GET /status.html HTTP/1.1" 200 4 "-" "probe"

Writing a good log message is an art in itself.

Debugging an issue can consume a lot of a developer’s time, and it is harder to get to the root cause if the log messages are cryptic, unstructured, and missing relevant supporting data.

One seldom reads the logs until there is an issue that needs to be identified or dealt with.

Although there is plenty of software available today that does log processing, at the end of the day, when something goes wrong, it is a human who reads the logs and makes sense of what’s going on, with the necessary support of the tools at hand. The application log, in essence, is for both human and machine consumption.

Logs tend to be technical in nature and often miss the business context in which the messages were logged.

Let’s take the example of a REST API call made to the system: it may go through a filter, a controller, a service, and probably a persistence layer, or call another API or publish an event.

*High-level trace flow*

A lot is going on as the request traverses these different layers. On a good day we may have around 10–30 info log lines, depending on the use-case. Typically, we use some form of trace-id to stitch these logs together, and using this trace-id we can make the log lines appear in chronological order in our favourite logging tool.
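For context, the conventional approach might look like the sketch below (an illustration, not nexus code): SLF4J’s MDC attaches the trace-id to every line, and each of those lines independently repeats the same context.

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.slf4j.MDC;

public class ChargeOffService {

    private static final Logger log = LoggerFactory.getLogger(ChargeOffService.class);

    public void chargeOff(String accountId) {
        // A web filter or tracing library would normally set the trace-id;
        // it is hard-coded here purely for illustration.
        MDC.put("traceId", "a7116967f5085046");
        try {
            log.info("Sending request to initiate bpm");
            log.info("Validating request parameters [{}]", accountId);
            log.info("Committing process to database");
            // ...and so on: one request can easily produce 10-30 such lines
        } finally {
            MDC.remove("traceId");
        }
    }
}
```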

There are a few issues with this approach:

  • Common information is repeated in each of these log lines, which results in unnecessary storage and bandwidth usage.
  • Going through these logs takes time, and building dashboards over such requests is harder when we want to derive business sense from them.

At Standard Chartered nexus, we wanted to have a good mix of both human and machine-readable logs that made sense in terms of answering both technical and business questions.

The question we asked ourselves: why can’t we have a single log line per request that carries all the relevant information we need for debugging, with dashboards to support it?

That’s when Standard Chartered nexus Tracer was born.

Each request can be modelled as a single log that should answer:

  • who triggered the request, and when
  • technical info on the request; the request here could be HTTP, gRPC, an event, or a Camunda workflow
  • the business context in which the request was made
  • what business steps were executed, and the time taken
  • the response code, along with a stack trace if any
  • errors and their categories
  • the response time for the request to complete

Having a single log line means

  • maintaining state in memory till the request completes its journey in either a success or an error state. We achieved this using a common set of web-filters; a minimal sketch of the idea follows this list.
  • providing dedicated nexus-tracer APIs for devs to use to fill in the necessary data.
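Here is that minimal sketch, assuming a standard servlet filter; the names (TracerFilter, Trace, step) and the jakarta.servlet API are our assumptions for illustration, not the actual nexus-tracer code.

```java
import jakarta.servlet.Filter;
import jakarta.servlet.FilterChain;
import jakarta.servlet.ServletException;
import jakarta.servlet.ServletRequest;
import jakarta.servlet.ServletResponse;
import jakarta.servlet.http.HttpServletRequest;
import jakarta.servlet.http.HttpServletResponse;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical filter: per-request state is held in a ThreadLocal until the
// request completes, then flushed as a single log entry.
public class TracerFilter implements Filter {

    // The accumulating per-request state, kept deliberately simple here.
    public static final class Trace {
        final long startTime = System.currentTimeMillis();
        final List<String> steps = new ArrayList<>();
        String outcome = "SUCCESS";
    }

    private static final ThreadLocal<Trace> CURRENT = new ThreadLocal<>();

    // Called by application code (service layer etc.) to record a business step.
    public static void step(String msg) {
        Trace t = CURRENT.get();
        if (t != null) t.steps.add(msg);
    }

    @Override
    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        HttpServletRequest request = (HttpServletRequest) req;
        HttpServletResponse response = (HttpServletResponse) res;
        Trace trace = new Trace();
        CURRENT.set(trace);
        try {
            chain.doFilter(req, res);
        } catch (Exception e) {
            trace.outcome = "ERROR"; // the error category and stack trace would be captured here
            throw e;
        } finally {
            long responseTime = System.currentTimeMillis() - trace.startTime;
            // In the real system the full state is marshalled to the JSON structure shown below.
            System.out.printf("{\"req\":\"%s %s\",\"steps\":%s,\"outcome\":\"%s\",\"status\":%d,\"responseTime\":%d}%n",
                    request.getMethod(), request.getRequestURI(),
                    trace.steps, trace.outcome, response.getStatus(), responseTime);
            CURRENT.remove(); // avoid leaking state across pooled threads
        }
    }
}
```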

We ended up with the structure below. As you can see, a single JSON document gives us a snapshot of what actually happened along the request path, and we can easily extend this model to add more contextual data.

"tracer": {
"exec": {
"traceId": "a7116967f5085046",
"spanId": "a7116967f5085046",
"type": "API",
"userId": "Nexus-user-11"
},
"req": {
"resource": "/v1/chargeoff/21",
"version": "v1",
"httpMethod": "POST",
"startTime": 1615261486743
},
"actn": {
"name": "Initializing bpm charge-off process",
"steps": [
{
"idx": 1,
"msg": "Sending request to initiate bpm"
},
{
"idx": 2,
"msg": "Validating request parameters [21]"
},
{
"idx": 3,
"msg": "Committing process to database"
}
],
"outcome": "SUCCESS"
},
"resp": {
"status": 200,
"responseTime": 251
}
}
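To give a feel for the developer-facing side, here is the same hypothetical service from the earlier sketch, rewritten against the tracer’s step helper; the actual nexus-tracer API may look quite different.

```java
public class ChargeOffService {

    public void initiateChargeOff(long id) {
        // Each call appends a step to the in-memory trace for the current request;
        // nothing is written to the log until the web filter flushes at the end.
        TracerFilter.step("Sending request to initiate bpm");
        TracerFilter.step("Validating request parameters [" + id + "]");
        TracerFilter.step("Committing process to database");
    }
}
```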

# Good Parts

  • Single JSON log entry that answers the above-mentioned questions.
  • Easy dashboards that give a good technical and business overview.
  • Easier for devs to debug issues, as they can see exactly which steps were executed before erroring out.
  • Devs can fine-tune the business steps that take more time.
  • Redundant logs were removed, improving our bandwidth and storage usage.
  • We successfully deployed this solution to production.

# Bad Parts

  • Maintaining state does come at a cost, and the cost here is memory and CPU usage.
  • Marshalling an object to JSON is an expensive operation (one common mitigation is sketched after this list).
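On the marshalling cost, one common mitigation (our suggestion, not necessarily what nexus does) is to reuse a single Jackson ObjectMapper: it is expensive to construct but thread-safe once configured, so a shared instance amortises most of the cost.

```java
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.ObjectMapper;

public final class TracerJson {

    // One shared, pre-configured instance; creating a new ObjectMapper per
    // request is far more expensive than reusing this one.
    private static final ObjectMapper MAPPER = new ObjectMapper();

    public static String toJson(Object trace) {
        try {
            return MAPPER.writeValueAsString(trace);
        } catch (JsonProcessingException e) {
            // Never fail the request just because tracing failed.
            return "{\"tracer\":\"serialization-error\"}";
        }
    }

    private TracerJson() {}
}
```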

# Next steps

  • Continue to optimize the logs and remove redundancy.
  • Find the right balance of log information that is performant, concise, and serves the purpose.
