One of our products, Hermes, is an audit log service. Hermes is currently in the prototype phase and uses a Go REST API server to ingest audit logs and send them to Loki.
We have been trying out different databases, ingesters, and tools to see which are best suited for Hermes. The right combination must scale under high traffic without losing a single audit log and must search through large amounts of data efficiently.
We decided to benchmark different combinations of ingesters (Vector, Fluentd, Fluent Bit, etc.) and storage and query tools (MongoDB, ClickHouse, Elasticsearch, etc.).
The first round of benchmarks is lightweight; more extensive benchmarks will follow once we pick the right tools for Hermes.
The following tests and benchmarks were performed on a MacBook Pro (14-inch, 2021) with an Apple M1 Pro and 16 GB RAM. The tools under test were dockerized, with Docker Desktop configured with 4 GB memory, 4 CPUs, and 1 GB swap.
Fluent Bit is a super-fast, lightweight, highly scalable logging and metrics processor and forwarder. It is the preferred choice for cloud and containerized environments. (Source: Fluent Bit website)
ClickHouse is the fastest OLAP database on earth. ClickHouse works 100–1000x faster than traditional approaches. Companies like Uber, Cloudflare, Spotify, and eBay use ClickHouse. (Source: ClickHouse website)
A few pointers before we go ahead:
Fluent Bit is fast at ingesting logs/data, processing them, and sending them to a destination.
ClickHouse is efficient at handling and querying data.
Fluent Bit does not support ClickHouse as an output by default.
The Fluent Bit ecosystem lets users write their own plugins in Go to add whatever support they need.
For faster querying in ClickHouse, the table needs an efficient schema with suitable indexes, compression, and so on.
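To make the last pointer concrete, here is a minimal sketch of what such a schema could look like, wrapped in a small Go helper. This is not the schema used for Hermes. The table name, the columns, the codec choices, and the bloom filter index are all hypothetical and only illustrate where ordering, compression, and a data-skipping index go in a MergeTree table definition.

```go
package main

import "fmt"

// createTableSQL returns a hypothetical ClickHouse DDL statement for an
// audit log table. Every column, codec, and index below is an assumption
// made for illustration; it is not the schema used in these benchmarks.
func createTableSQL(table string) string {
	return fmt.Sprintf(`CREATE TABLE IF NOT EXISTS %s
(
    timestamp DateTime64(3) CODEC(Delta, ZSTD(1)),
    actor     LowCardinality(String),
    action    LowCardinality(String),
    resource  String CODEC(ZSTD(1)),
    payload   String CODEC(ZSTD(3)),
    INDEX idx_resource resource TYPE bloom_filter(0.01) GRANULARITY 4
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(timestamp)
ORDER BY (actor, action, timestamp)`, table)
}

func main() {
	// Print the statement so it can be pasted into clickhouse-client.
	fmt.Println(createTableSQL("audit_logs"))
}
```

The ORDER BY key is the main lever here: queries that filter on the leading key columns can skip most of the data, so the key should follow the filters your queries actually use.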
Clickhouse plugin for fluent-bit
I developed a fluent-bit output plugin for Clickhouse.
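The plugin source is not reproduced in this post. As a rough illustration of the core job of such an output plugin, the sketch below turns a batch of records into an insert request for ClickHouse. It assumes ClickHouse's HTTP interface on port 8123 and the JSONEachRow input format; the actual plugin may well use a different protocol or client library. The `AuditLog` struct, its fields, and the `audit_logs` table name are hypothetical.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/url"
)

// AuditLog is a hypothetical record shape; the real Hermes fields are not
// shown in this post.
type AuditLog struct {
	Timestamp string `json:"timestamp"`
	Actor     string `json:"actor"`
	Action    string `json:"action"`
}

// insertURL builds the URL for an insert over ClickHouse's HTTP interface.
// The INSERT statement travels in the "query" parameter and the rows go in
// the request body.
func insertURL(base, table string) string {
	q := url.Values{}
	q.Set("query", fmt.Sprintf("INSERT INTO %s FORMAT JSONEachRow", table))
	return base + "/?" + q.Encode()
}

// encodeBatch serializes records as one JSON object per line, which is the
// layout the JSONEachRow format expects.
func encodeBatch(records []AuditLog) ([]byte, error) {
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf)
	for _, r := range records {
		if err := enc.Encode(r); err != nil { // Encode appends a newline
			return nil, err
		}
	}
	return buf.Bytes(), nil
}

func main() {
	batch := []AuditLog{
		{Timestamp: "2023-01-01 00:00:00", Actor: "alice", Action: "login"},
		{Timestamp: "2023-01-01 00:00:01", Actor: "bob", Action: "delete"},
	}
	body, err := encodeBatch(batch)
	if err != nil {
		panic(err)
	}
	// A real plugin would POST body to this URL; here we only print both.
	fmt.Println(insertURL("http://localhost:8123", "audit_logs"))
	fmt.Print(string(body))
}
```

Sending one request per batch rather than one per record matters on the ClickHouse side, since it is designed for fewer, larger inserts.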
This config makes Fluent Bit ingest data through an HTTP server listening on port 8888 and send it to ClickHouse using the stated configuration.
I ramped up the number of concurrent requests/queries by modifying the config.xml. After multiple tests, I finalized the following config.
Load testing tool
I developed a load testing tool with Node.js that can be used to benchmark REST API-based endpoints of Fluent-bit.
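The original tool is written in Node.js and is not reproduced here. The sketch below shows the same idea in Go instead, to stay in the language of the plugin. It assumes that a load such as (50 X 50) means 50 concurrent requests repeated for 50 rounds, which is my reading of the notation rather than something the tool defines. The `/audit` path and the JSON payload are hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"sync"
	"sync/atomic"
	"time"
)

// Result summarizes one load-test run.
type Result struct {
	Sent    int           // total requests attempted
	OK      int64         // requests that received a 2xx response
	Elapsed time.Duration // wall-clock time for the whole run
}

// ReqPerSec returns attempted requests per second of wall-clock time.
func (r Result) ReqPerSec() float64 {
	if r.Elapsed <= 0 {
		return 0
	}
	return float64(r.Sent) / r.Elapsed.Seconds()
}

// runLoad calls send `concurrency` times in parallel, waits for all calls
// to finish, and repeats that for `rounds` rounds.
func runLoad(send func() bool, concurrency, rounds int) Result {
	var ok int64
	start := time.Now()
	for i := 0; i < rounds; i++ {
		var wg sync.WaitGroup
		for j := 0; j < concurrency; j++ {
			wg.Add(1)
			go func() {
				defer wg.Done()
				if send() {
					atomic.AddInt64(&ok, 1)
				}
			}()
		}
		wg.Wait()
	}
	return Result{Sent: concurrency * rounds, OK: ok, Elapsed: time.Since(start)}
}

// httpSender returns a send function that POSTs payload as JSON to target
// and reports whether the response status was 2xx.
func httpSender(client *http.Client, target string, payload []byte) func() bool {
	return func() bool {
		resp, err := client.Post(target, "application/json", bytes.NewReader(payload))
		if err != nil {
			return false
		}
		defer resp.Body.Close()
		io.Copy(io.Discard, resp.Body) // drain so the connection can be reused
		return resp.StatusCode >= 200 && resp.StatusCode < 300
	}
}

func main() {
	client := &http.Client{Timeout: 5 * time.Second}
	// Assumed target: the Fluent Bit HTTP input on port 8888.
	send := httpSender(client, "http://localhost:8888/audit",
		[]byte(`{"actor":"alice","action":"login"}`))
	res := runLoad(send, 50, 50)
	fmt.Printf("sent=%d ok=%d elapsed=%s req/sec=%.1f\n",
		res.Sent, res.OK, res.Elapsed, res.ReqPerSec())
}
```

Keeping the sender as a plain function makes it easy to point the same loop at a different endpoint, or at a stub when checking the tool itself.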
A second tool load tests the querying side of ClickHouse.
These results depend on the RAM allocated to the Docker engine, which in my case was 4 GiB.
Fluent Bit can handle loads of up to 2000 req/sec, but with bigger batches, (200 X 10) and (300 X 10), the speed drops drastically.
With long-running light batches (10 X 1000), Fluent Bit performs consistently.
Under average loads (50 X 50), Fluent Bit performs at average speeds.
ClickHouse shows its best req/sec performance under an average load (50 X 50).
ClickHouse's performance was also satisfactory across all the tested record counts in the database (1.1 million, 50k, 25k, 10k, 2k, and 1k).
ClickHouse handled both short-term high loads (100 X 10) and long-term light loads (10 X 5000) efficiently.
We will be posting more blogs regarding benchmarks, tools, etc., as we go on to build Hermes and many other dev tools. Please leave comments below.