TLDR: Beats by Elastic are a fantastic product, but they are really meant to solve one type of problem, which was not mine here. For a task like polling an API in a loop and storing the results in Elasticsearch, writing a poller from scratch is much simpler.
Cryptocurrencies are everywhere today. Everyone wants their piece of the cake. And… so did I. As a data scientist, my first reflex was to gather data about the crypto market. So I started building a poller for the Kraken platform. The price, volume, and timestamp of each transaction on each fiat trading pair (XXX/USD and XXX/EUR) were to be polled and stored. And what better database to store that kind of data than Elasticsearch (referred to as ES from here on)?
This article is not about why I chose ES over any other database, but in case anyone wants to know:
- Easy installation (I used Docker)
- Time series indexing. As my data is keyed on a transaction timestamp, this was perfect.
- Aggregations. Why implement time-based averages, percentiles, or whatever advanced metric you want, when the database does it for you?
- Kibana, which gives us a very nice dashboard tool for free
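To give an idea of what "the database does it for you" means for aggregations, here is a minimal sketch of a search body that asks ES for the average and a couple of percentiles of the price per time bucket. The field names `price` and `timestamp`, the aggregation names, and the helper `build_price_stats_query` are assumptions for illustration. The sketch only builds the JSON body and does not talk to a cluster.

```python
import json


def build_price_stats_query(interval="1h", percents=(50, 95)):
    """Build a search body: per-bucket average and percentiles of `price`."""
    return {
        "size": 0,  # return only aggregation results, no individual hits
        "aggs": {
            "per_interval": {
                # `fixed_interval` is the ES 7.2+ spelling; older versions used `interval`
                "date_histogram": {"field": "timestamp", "fixed_interval": interval},
                "aggs": {
                    "avg_price": {"avg": {"field": "price"}},
                    "price_percentiles": {
                        "percentiles": {"field": "price", "percents": list(percents)}
                    },
                },
            }
        },
    }


body = build_price_stats_query("15m")
print(json.dumps(body, indent=2))
```

The resulting dictionary can be sent as the body of a search request with whatever client you use; all the bucketing and statistics then happen on the ES side.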
I needed a poller for Elasticsearch. After some searching, I found Elastic Beats. Beats are small programs written in Go and designed around a main polling loop. After looking at the documentation and the community list of Beats, it started to look like the perfect tool for me.
And you know what? It worked! I created KrakenBeat using the Beats stack. You can still find KrakenBeat in the community list of Beats, by the way.
But why this article, then? After some time, I chose to completely rewrite KrakenBeat in my favorite language, Python. This article is my humble report on why I chose to restart from scratch and why my poller is now better than ever, at least for me…
Please check this blog post for an in-depth example of the creation of a Beat. I will only share my experience with certain parts of this process and how it compares to a “from scratch” approach.
To create a Beat, you need to install Python (only 2.7, seriously Elastic?!) and use a makefile. My two computers run Windows. I used Bash for Windows (which is an excellent piece of work), but that complicates the task of bootstrapping the poller. For my Python poller, I needed… Python.
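To show how little is needed to bootstrap a "from scratch" poller, here is a minimal sketch of the main loop using only the standard library. It is not the actual code of my poller: `poll`, `fetch`, `store`, and the cursor convention are hypothetical names chosen for the example. `fetch` stands for "call the exchange API" and `store` for "index the trades into ES".

```python
import time


def poll(fetch, store, interval_s=5.0, max_rounds=None, sleep=time.sleep):
    """Call `fetch(cursor)` in a loop and hand new trades to `store`.

    `fetch` must return (trades, next_cursor). Both callables are
    hypothetical placeholders, injected so the loop stays testable.
    """
    cursor = None
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        try:
            trades, next_cursor = fetch(cursor)
        except Exception as exc:  # keep polling on transient errors
            print(f"fetch failed, will retry: {exc}")
        else:
            if trades:
                store(trades)
            cursor = next_cursor  # only advance after a successful fetch
        rounds += 1
        if max_rounds is None or rounds < max_rounds:
            sleep(interval_s)
    return cursor


def fake_fetch(cursor):
    """Stand-in for a real API call: returns two made-up trades per round."""
    start = 0 if cursor is None else cursor
    trades = [
        {"price": 100.0 + i, "volume": 0.1, "timestamp": 1_500_000_000 + i}
        for i in range(start, start + 2)
    ]
    return trades, start + 2


stored = []
last = poll(fake_fetch, stored.extend, interval_s=0, max_rounds=3, sleep=lambda s: None)
print(len(stored), last)  # 6 6
```

Passing the fetch and store functions in as arguments keeps the loop independent of both the exchange and the database, so it can be exercised with fakes like the one above.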
A trade on Kraken, and in my database, is an object with a price, a volume and a timestamp. On one side, with KrakenBeat, I had to create a .yaml file and again use `make`. Beats needs us to instantiate a trade as a map of key: value pairs. There is no validation in code… On the other side, in pure Python, I have access to the official Elasticsearch ORM, Elasticsearch-dsl, which lets me describe the trade as a short class with typed fields.
And that’s all.
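To make the contrast with an unvalidated map concrete, here is a minimal sketch of the "declare the fields once" style. It deliberately uses a standard-library dataclass as a stand-in rather than an Elasticsearch-dsl class, so it runs without any dependency. The `Trade` name, the `pair` field, the example values, and the validation rules are assumptions for illustration, not the actual KrakenBeat or poller code.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass(frozen=True)
class Trade:
    """One trade, with its fields and types declared in a single place."""

    pair: str
    price: float
    volume: float
    timestamp: datetime

    def __post_init__(self):
        # Validation lives in code, next to the field definitions.
        if self.price <= 0 or self.volume <= 0:
            raise ValueError("price and volume must be positive")
        if self.timestamp.tzinfo is None:
            raise ValueError("timestamp must be timezone-aware")

    def to_doc(self):
        """Return a JSON-friendly dict, ready to be indexed."""
        doc = asdict(self)
        doc["timestamp"] = self.timestamp.isoformat()
        return doc


trade = Trade("XBT/EUR", 6500.0, 0.25, datetime(2018, 1, 1, tzinfo=timezone.utc))
print(trade.to_doc())
```

A malformed trade, for example one with a negative price, fails immediately at construction time instead of silently ending up in the index.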
In the end, there were all sorts of little things that made maintenance and further development more complex. Beats are not badly designed, but the workflow feels over-complicated for my needs, and surely for the needs of something like 80% of people polling the web.
I will not list all the differences between the two pollers, but I think you get the idea. Even if Beats is “official”, why do something complex? As I read recently, programs should scale for humans first, then for the machine. That is exactly the point here. Beats can scale very well, thanks to the use of Go, and has some nice features. But for a simple job, please do something simple. Today everything is complex. Why?
Another example as a conclusion. I’ve seen people go for a full Angular stack, with TypeScript and all, just to fill an HTML template. Guys, if you want to render a template, please use a rendering engine. You could have done that in about 5 minutes with jinja2 and gotten server-side rendering for free along the way. Of course, Angular is nice if you want a lot of features, but you will have time to scale your code later if needed. Start small, grow faster!