Over the last year or so I've been looking quite a bit at Elasticsearch for use as a general purpose time series database for operational data.

Whilst there are definitely a lot of options in this space, most notably InfluxDB, I keep coming back to Elasticsearch when we're talking about large volumes of data where you're doing a lot of analytical workload.

More than a few times, I've been asked to explain what Elasticsearch looks like to the would-be developer/operations person. This isn't too surprising: the documentation isn't great at giving a real world architectural overview, which can really help contextualise the rest of the documentation.

Having been asked again today, I've decided to write one up here - so I can save some time explaining the same thing over and over.

It's important to note for the would-be reader that this is my imperfect understanding of Elasticsearch. If you spot any glaringly obvious errors, please let me know and I will update this accordingly. You'll have my eternal thanks for helping me grok this a little better.

So, as they say: the show must go on!

Elasticsearch concepts

Elasticsearch is a search engine built on top of Apache Lucene. It's great for full text search (duh) as well as analytical workloads with ad hoc queries and aggregations.

In NoSQL parlance, it would be classified as a document-oriented database, and that's primarily how your application will work with it.

You insert your DOCUMENT into an INDEX.

  • An index is a namespace for documents and is analogous to a database table
  • Unlike a table, there is no schema per se
  • You define what fields are indexed for search at this level
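To make the document/index relationship concrete, here is a minimal sketch of what an indexing request could look like. It only builds the request rather than sending it. The index name, the document fields and the `build_index_request` helper are made up for illustration, and the `/_doc/` path assumes a recent Elasticsearch version (older releases put a type name in that position).

```python
import json


def build_index_request(index, doc_id, document):
    """Return (method, path, body) for storing one document in an index.

    Hypothetical helper: it only describes the HTTP request, it does not
    send it. The /_doc/ path is an assumption about the server version.
    """
    path = "/{}/_doc/{}".format(index, doc_id)
    body = json.dumps(document, sort_keys=True)
    return "PUT", path, body


# Example document: a single operational data point (made-up fields).
method, path, body = build_index_request(
    "metrics-2015.06.01",
    "web01-cpu-1433116800",
    {"host": "web01", "metric": "cpu.user", "value": 12.5},
)
print(method, path)
print(body)
```

Note that nothing here declares a schema up front: the document is just JSON, and the index is simply the name it is filed under.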

An INDEX is backed by one or more SHARDS

  • When you create an index, you specify the number of primary shards your data will be split across.
  • Elasticsearch uses a hash function to determine what shard your document is stored in and accessed from.
  • You cannot change the number of primary shards after the index is created.
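The routing idea above can be sketched in a few lines. This is an illustration only: `pick_shard` is a hypothetical helper and `zlib.crc32` is a stand-in, not the hash function Elasticsearch actually uses. It does show why the primary shard count is fixed: the shard number depends on it.

```python
import zlib


def pick_shard(doc_id, number_of_primary_shards):
    """Map a document id to a shard number.

    zlib.crc32 is a stand-in hash for illustration only; it is not the
    hash function Elasticsearch itself uses.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_primary_shards


doc_ids = ["doc-{}".format(i) for i in range(1000)]

# The same id always lands on the same shard, so reads know where to look.
assert pick_shard("doc-42", 5) == pick_shard("doc-42", 5)

# Change the shard count and many ids now map to a different shard, which
# is why the number of primary shards is fixed at index creation.
moved = sum(1 for d in doc_ids if pick_shard(d, 5) != pick_shard(d, 6))
print("{} of {} documents would move".format(moved, len(doc_ids)))
```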

A SHARD can have zero or more REPLICA SHARDS

  • For each primary shard in your index you will have X replica copies (defined in the index settings)
  • If the node hosting the primary shard fails, a replica shard will be promoted to primary
  • Replica shards are used for scaling out read performance - the more replica shards you have, the more reads you can service.
  • Unlike primary shards, you can change the number of replica shards any time
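As a rough sketch of the arithmetic, the snippet below counts how many shard copies a cluster has to host, and builds the kind of JSON body you would send to an index's settings endpoint to change the replica count later. The helper names are hypothetical; `number_of_replicas` is the index setting involved.

```python
import json


def total_shard_copies(primary_shards, replicas_per_primary):
    """Each primary shard plus its replicas is one set of copies."""
    return primary_shards * (1 + replicas_per_primary)


def replica_settings_body(replicas_per_primary):
    """JSON body for changing the replica count on an existing index."""
    return json.dumps({"index": {"number_of_replicas": replicas_per_primary}})


print(total_shard_copies(5, 1))
print(replica_settings_body(2))
```

So an index with 5 primaries and 1 replica each means 10 shards for the cluster to place, and bumping the replica count is a settings change rather than a rebuild.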

ELASTICSEARCH runs as a CLUSTER made up of NODES

  • Nodes automatically form a cluster when correctly configured
  • Elasticsearch will automatically distribute (and move) shards as needed by your index configuration
  • Your application can talk to any node in the cluster and they will forward your request to a node with the data to service your request
  • If you have a busy cluster, you can deploy proxy nodes (Elasticsearch calls these client or coordinating-only nodes) - these are nodes that don't store shards and can be used to direct incoming requests
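Because any node can accept a request, a client can simply spread its requests across the nodes it knows about. Here is a minimal sketch of that, assuming three made-up hostnames on the default HTTP port; `NodePool` is a hypothetical helper, not part of any Elasticsearch client library.

```python
import itertools


class NodePool:
    """Hand out cluster node URLs in round-robin order."""

    def __init__(self, node_urls):
        if not node_urls:
            raise ValueError("need at least one node URL")
        self._nodes = itertools.cycle(list(node_urls))

    def next_node(self):
        return next(self._nodes)


# Hypothetical hostnames; substitute the nodes in your own cluster.
pool = NodePool([
    "http://es-node-1:9200",
    "http://es-node-2:9200",
    "http://es-node-3:9200",
])
picked = [pool.next_node() for _ in range(4)]
print(picked)
```

A real client would also want to skip nodes that stop responding, which this sketch leaves out.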

A SHARD is a Lucene index

  • Every time your query needs to access a shard, the Lucene engine needs to be running for that data
  • Don't confuse a Lucene index (a shard) with an Elasticsearch Index (a collection of shards)

Care and feeding of your cluster

I won't cover setting up and maintaining quorum in the cluster, because that's pretty well covered elsewhere. If you're running on AWS there's even a managed product available which helps simplify things a lot.

For the keen observer, that last bullet point raises some interesting constraints when managing your cluster.

Elasticsearch manages Lucene for you, but remember that the two don't use memory in the same way: Elasticsearch itself lives in the Java heap, while Lucene leans on the operating system's file system cache for its index files. Make sure you leave enough memory free outside the heap for your index data (Lucene).

Don't allocate more than 32GB of heap to Elasticsearch. Due to how Java addresses memory (above roughly that size the JVM can no longer use compressed object pointers), this will slow things down heaps (see what I did there?).
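A rough way to turn those two constraints into a number is sketched below. The "half of RAM" split is a commonly quoted rule of thumb rather than something stated in this post, and the 31GB ceiling is an assumed safety margin under the 32GB limit, so treat `suggest_heap_gb` as a starting point only.

```python
def suggest_heap_gb(total_ram_gb, ceiling_gb=31):
    """Rule-of-thumb heap size: half of RAM, capped below 32GB.

    The other half is left to the operating system for Lucene's index
    files. The 31GB ceiling is an assumed safety margin, not an exact limit.
    """
    return min(total_ram_gb // 2, ceiling_gb)


for ram in (16, 64, 128):
    print("{}GB RAM -> {}GB heap".format(ram, suggest_heap_gb(ram)))
```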

Read this. No really.

Working with Elasticsearch

In short, deploying Elasticsearch for your search application requires some careful planning on both the ingest and query side.

Primarily you want your cluster to have enough memory so your busy shards (read or write) stay resident in physical memory. Otherwise your nodes will spend all their time paging data in and out of disk, which defeats the point!

Read the designing for scale part of the Elasticsearch guide.

Your best bet is to split the data across indexes by a meaningful criterion. Time series data is a natural fit for this, because you can split by time period (for example, one index per day).

Elasticsearch has a feature called index templates, which are super useful for dynamically creating indexes with specific settings. Writes are directed to the correct index and you can automatically have the new index added to an index alias for reads.
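Here is a small sketch of how that could fit together for daily indexes. The `metrics` prefix, the shard and replica counts and both helper functions are assumptions for illustration. The template body uses the older template format; newer Elasticsearch versions name the pattern field differently, so check the documentation for the version you run.

```python
import json
from datetime import datetime


def index_for(timestamp, prefix="metrics"):
    """Name of the daily index a data point belongs to, e.g. metrics-2015.06.01."""
    return "{}-{}".format(prefix, timestamp.strftime("%Y.%m.%d"))


def template_body(prefix="metrics", shards=2, replicas=1):
    """Index template applied to every new index matching '<prefix>-*'.

    Uses the older template format; newer Elasticsearch versions name the
    pattern field differently, so treat the exact keys as an assumption.
    """
    return {
        "template": prefix + "-*",
        "settings": {"number_of_shards": shards, "number_of_replicas": replicas},
        # Every matching index is added to this alias, which reads can target.
        "aliases": {prefix: {}},
    }


print(index_for(datetime(2015, 6, 1, 13, 30)))
print(json.dumps(template_body(), indent=2))
```

Writes go to the index named by `index_for`, while queries can use the alias and never need to know which daily indexes exist.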

Conclusion

Elasticsearch is a great tool, but it requires you to plan ahead. Hopefully I've given you a good introduction to how things hang together and where the sharp edges are.

I highly recommend reading through the entire Elasticsearch: The Definitive Guide. With the above information in mind ahead of time, I think it makes for a much more cohesive read.

Good luck and happy hacking!

Thanks to @ZacharyTong for pointing out that Lucene does in fact support paging index segments (statement removed), and @warkolm for spotting a mistake regarding number of replicas and an opportunity to clarify!

After a suggestion by someone, I got it in my mind that a certain group could really do with an NNTP (Usenet) caching proxy. NNTP proxies and caches do already exist, but none of them support cache hierarchies - that is, trying to resolve articles from peer caches before talking upstream.

The use case here is WACAN, a wireless network where each participant may want to offer their article cache for use by members on the network.

So after talking about it for a little while, I wrote one.
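The peer-first lookup described above can be sketched roughly like this. It is not the proxy's actual code: the caches are plain dicts, `fetch_upstream` is any callable, and the message-ids are made up, all standing in for real network lookups.

```python
def resolve_article(message_id, local_cache, peer_caches, fetch_upstream):
    """Look for an article locally, then in peers, then upstream.

    Caches are plain dicts here and fetch_upstream is any callable; both
    are stand-ins for real network lookups.
    """
    if message_id in local_cache:
        return local_cache[message_id], "local"
    for peer in peer_caches:
        if message_id in peer:
            local_cache[message_id] = peer[message_id]  # keep a copy
            return peer[message_id], "peer"
    article = fetch_upstream(message_id)
    local_cache[message_id] = article
    return article, "upstream"


local = {}
peers = [{"<a@example>": "article A"}]
upstream_calls = []


def fake_upstream(message_id):
    """Pretend upstream server that records what it was asked for."""
    upstream_calls.append(message_id)
    return "article from upstream"


print(resolve_article("<a@example>", local, peers, fake_upstream))
print(resolve_article("<a@example>", local, peers, fake_upstream))
print(resolve_article("<b@example>", local, peers, fake_upstream))
```

Only the article that no peer holds ever reaches the upstream server, which is the whole point of the hierarchy.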

It's pretty terrible code actually (surprise!) - I've hand-written the parser, and for now it supports the bare minimum set of the protocol needed for SABnzbd. NZBGet won't work, but could with some minor additional command support.

This ended up being useful to me for some of my other forever projects, so I've had a chance to look back at this recently and am considering revisiting it if I have time.

If that happens, I'll be looking to rewrite the parser using Ragel and targeting either Go (even though it has the pretty useful net/textproto package) or C++.

The new implementation will be a lot simpler, backing Usenet requests onto stateless HTTP requests - leaving the implementation as a fairly flexible and pluggable exercise. I've done a bit of testing with this already, specifically trying to use the news and nntp schemes over HTTP - though library support for this (I'm looking at you, net/http and libcurl) is pretty average.
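To give a feel for what backing NNTP onto HTTP might look like, here is a small sketch that translates a few article-retrieval commands into HTTP requests. The URL layout and the `nntp_to_http` helper are entirely hypothetical, and it is written in Python purely for brevity rather than in the Go or C++ mentioned above.

```python
from urllib.parse import quote


def nntp_to_http(command_line, base_path="/articles"):
    """Translate one NNTP command line into an HTTP (method, path) pair.

    Only a few article-retrieval commands are handled. The URL layout is
    hypothetical, not something the original proxy defines.
    """
    parts = command_line.strip().split(None, 1)
    if len(parts) != 2:
        raise ValueError("expected '<COMMAND> <message-id>'")
    command, message_id = parts[0].upper(), parts[1].strip()
    views = {"ARTICLE": "", "HEAD": "/head", "BODY": "/body"}
    if command not in views:
        raise ValueError("unsupported command: " + command)
    # Percent-encode the message-id so '<', '>' and '@' are URL-safe.
    path = "{}/{}{}".format(base_path, quote(message_id, safe=""), views[command])
    return "GET", path


print(nntp_to_http("ARTICLE <abc@example.com>"))
print(nntp_to_http("body <abc@example.com>"))
```

Since each request carries everything needed to answer it, the HTTP side can sit behind ordinary caches and load balancers.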

Watch this space, I guess.

Having been writing apps (poorly) with AngularJS for a while now, I was pretty excited to realise that by combining this with PhoneGap/Cordova I could start writing portable mobile apps!

For the uninitiated, Cordova is an open source mobile app development platform. Basically, it bundles your app for your device and runs an embedded (lightweight) webserver and runs your app in a fullscreen browser control. Nifty. PhoneGap is the Adobe commercial version, and they kindly donated the core of it to the Apache Software Foundation (thanks Adobe!), which is the bit known as Cordova.

Now what project to learn with? Hmm... aha!

puush is a pretty cool image sharing service run by a guy who I used to LAN with, and is pretty much entirely funded by their insanely popular rhythm game, osu!. It's a hobby project.

One of the problems with being a hobby project is that it doesn't get a lot of love. Specifically, their mobile app for iOS no longer works unless you still have an iPhone 3 (how quaint). I use the app /a lot/, and would love it to work on mobile without carrying an extra device around.

After vague promises of maybe adopting my code for their official app, I decided to see how hard it would be to replicate this on iOS.

As it turns out, not hard at all! After a week of development I had a fully functioning prototype with most (not all) of the features implemented. This ended up being a fantastic starter project for both size and combination of native features (camera, local storage, modal dialogs, etc).

Whilst the puush guys haven't adopted the code yet, there's nothing stopping anyone with an Apple developer account from building and publishing to the App Store. Which I've decided to do, after I polish it up a bit and make it ready for prime time.

Probably the biggest change I need to make is converting it to a Grunt project so it bundles and minifies all the code locally instead of including it from cloud CDNs... Don't look at me like that - it was development code!

Anyway, check it out: puush-phonegap on GitHub

I had to split up my posts to avoid my hiatus post from being an unreadable mess.

This post is a bit of a recap of some projects I've worked on or am working on. There are another two posts following this one covering some larger projects.

Startup weekend mentoring

I haven't participated in another Startup Weekend in Perth since my first one. I don't think I ever wrote it up in its full glory either. Needless to say, it was pretty intense, and despite not having the experience I was after, I learnt a bunch and would recommend it to anyone who is interested in that kind of thing.

In preparation for the SWPerth7 event the organisers did a call out for mentors. This is something I'm interested in, but I honestly didn't expect to get accepted. Heh, suckers.

Mentoring was a great experience, and I genuinely hope I was able to help the teams with their planning and validation - my feedback seems to suggest so. I was a bit worried I'd commit a cardinal sin and be prescriptive about things ("You should do this or that"), but I was able to keep it to leading questions and answering specific advice questions - so yay for that.

In the future, I'd love to do this again and aim to get some of the pitch coaching mentor timeslots. I think I might have more to help on this side of things.

Distillation automation

I managed to get most of the parts together to completely automate my still for the purposes of, uh... extracting essential oils and distilled water.

The hardest part so far is the temperature probe, which is an annoying 5mm in diameter in stainless steel. I was hoping to use a 1-Wire digital probe, however the smallest package for these is 5mm in diameter - leaving no room for the stainless steel shroud.

I'll have to order some K-type sensors, and once they arrive and the shed is cleaned - I should have some updates on that particular project.

Video streaming

I've been pretty interested in getting involved in the video streaming project over at the Perth Linux Users Group for a while now.

They're looking to move from DV capture to HDMI over USB and have been waiting on some custom hardware to get made, which looks like a good path forwards.

But in the meantime my work has given me some budget to put together a solution with off the shelf parts. I've happily spent the entire budget and have a nice pile of bits ready to go.

Whilst I'm still keen on the open source solution, it's going to have to wait a while for me to play with these new toys.

Expect a post on this soon.

WACAN

I'm happy to be a founding member of this organisation. A bunch of guys involved in WAFreeNet incorporated an association to further the goals of building a community operated wireless network across the region.

The incorporation (or rather, some of the people involved on either side of the should-we-incorporate fence) has caused some division in the community, but has also achieved some really great stuff. Specifically, the organisation has a relationship with WAIA which has helped secure a tenancy for a great core node at QV1 as well as access to some CDN traffic over the network.

I've perpetually been unable to participate in these networks - since 2003(?) I've been testing line of sight to each house I've lived in with no luck. More recently I've had perfect LOS to QV1, but at the wrong angle for the sector panel - but it looks like our new house should be able to manage an ok connection.

My involvement in this group has primarily been as a member. I'm not really interested in committees any more, but I've organised a few public meetings (none of which I've been able to attend, heh).

I think I'll keep doing that for a while.

Pelican Modules

I promised a while back to put the code up for some of the Pelican modules I wrote to support this site. As of sometime last year, they're now available on GitHub here and here.

... Oh, hello there. Hello? Hi. Is this thing on? Well, uhh.. welcome back. It's been a smidge over two years since I last posted on here, so I figure I should pop in an update - things have been busy but at the same time not really.

Along with the new content I've given the site a fresh lick of paint. I'm not going to spend ages getting it where I want because, frankly, the longer I spend on it, the less I'll be happy with it.

Real life things (tm)

I don't normally talk about real life stuff on here, but things have been busy and some pretty big and meaningful changes have been afoot so buckle up.

I quit my job. Twice. Without going into too much detail, I'm trying to sort out where my career is going. Being "on the tools" is making me feel like shit and killing what interest I do have, though part of this is environmental/situational and even more of it is a me problem.

Anyway, this is an ongoing thing and I need to sort things out still. At this point I'm considering moving to a whiteboard position (solutions architecture) or making a dramatic jump into something completely not IT related (bookkeeping seems reasonable at this point). Updates as they come, but we'll see where I land in the next 6-12 months.

Around two months after my last post, I proposed to my gal of 11 years. She said Yes! (thank god). Despite being stressful, I'm as happy about this as ever and it looks like it mightn't be anywhere near as stressful as planning a wedding :)

Oh, and we bought a house. And an adorable puppy named Frank(ie)!

So things in the real life department are pretty awesome. And expensive. So expensive. But mostly awesome :)

Projects

Nope! This post is getting too long already. You're going to have to check out the next three posts covering this section. Doing this big update makes me feel a little better, I don't feel like I've been so lazy for the last two years now.

Go me :)

Company

Last year I founded my second real company, Meta Technology (website coming soon, for now please enjoy the SSL warning and webmail links).

I've been doing some contract work on and off for a handful of tech startups here in Perth, and this company will house that. This is part of my longer term goal of having a legal entity to fund some of my projects, support any that become viable and provide me with some level of protection for those that might not go so well.

So far, this is a happy story - the company has paid back its loan to me and exceeded its goal of breaking even. So far. With some additional work lined up to pad out the rest of the financial year, things are looking good.

So if you know anyone looking for a solutions architect for hire (specialising in applications at scale and the clouuuud), feel free to refer them to me.

I'll have a website real soon, promise ;)

This is the personal website of Will Dowling, a DevOps Engineer hailing from Perth, Western Australia.
