Wednesday, February 18, 2015

Why Your Early Start-Up Needs an Infrastructure Engineer

Most early stage start-ups dive straight into hiring as many top-notch Software Engineers as they can get their hands on, and then watch the river of code that flows from their fingertips. What they might not realize is that their engineering teams are missing an important component - an Infrastructure Engineer! To elaborate...

The Situation

Your tech start-up has a marginally functional prototype of your latest brainchild, and it just helped you land your first major round of funding. Whoo!!! Now you're on the hunt for scary good talent to start banging out code so as to commence the Journey to Alpha. It's a good day.

The Need

Rockstar 10x Software Engineers are an absolute must-have. You are, after all, in the business of writing software - who better to hire than, ya know, the people who write software? These fine folks are going to translate your vision from pure thought to actual, usable stuff that you can sell to customers. So, that's your recruitment strategy: go for the super devs and call it a day.

The Actual Need

A group of engineers who can collaboratively hammer out a product that is worthy of putting in front of your potential customers is the real goal here - not simply gathering a collection of awesome Software Engineers. Building out your team early on with nothing but SE's tends to leave you lopsided, weighted too heavily toward the "crank out product code" side. To balance out this early engineering collective, you really want to include an Infrastructure Engineer, which I'll fondly refer to as an InfraNerd from here on out (because, honestly, no one wants to be called an "IE").

What Usually Happens

Your start-up ramps up to a team of 5-10 SE's, and they get busy building out your product to (mostly) the spec they're handed. Here are a few generalizations about how things might operate in the early stages of product development:
  • Each SE is likely using the toolchain and build routine that has always treated them well in the past
  • Some write tests to cover a respectable percentage of their code
  • The rest write somewhere between 0 and ∅ tests
  • More SE's come on board
  • Everyone pushes to a dev branch, which is merged to master inconsistently at best
  • Master breaks. Every. Damn. Time.
  • Moar SE's!
  • Codebase begins to win recognition as the best Italian food in town
  • Alpha (and possibly Beta) products fall flat on their faces when deployed for customer POC / demo
  • Massive architecture rework is considered, then immediately ignored in the interest of Getting Shit Done
  • You realize that the product isn't even remotely Production-ready and begin attempts to recruit the mythical "DevOps Engineer"
  • After determining that "DevOps Engineers" are mostly made up of dreams and disappointment, you end up with an Ops Person (who may or may not be an InfraNerd)
The nastiest pain point of "the usual way" is generally when the Ops Person is given the unenviable task of "fixing" all the things. It's all too easy for ill will to be felt toward the person who is coming in and telling a sizable group of intelligent, competent, experienced SE's that they need to stop doing X and start doing Y. Right now. Because their stuff is BUSTED. 

This is generally a bad time for everyone involved.

How an InfraNerd Changes the Equation

Your friendly neighborhood InfraNerd can, given the appropriate care and feeding, help you avoid a lot of the heartache-inducing moments enumerated above. Bringing an InfraNerd on-board early in the life of your start-up means you get to deal with an engineering team moving faster than the founders can keep up with, rather than trying to un-break the world.

Here's the straight dope: your SE's are immensely smart folks, and they can work freaking magic at the keyboard pouring forth rivers of feature-packed and shockingly cool software. They casually tweak algorithms that the masses only speak of in hushed reverent tones, they grok unnervingly complex systems that most people couldn't navigate with a GPS and a tour guide, and they build amazing things out of thin air. They know their world forward and backward. But they (usually) don't know the world that sits just below theirs.

Your InfraNerd, however, will know that shadowy world intimately, and they'll draw on that knowledge to augment your engineering team such that it will be able to sustain a much higher level of Awesome. It's not that an InfraNerd can do things that an SE can't grasp, but rather that they think of solutions that your SE's are not accustomed to thinking of (and often don't have time for).

What an InfraNerd Brings to the Table

Without an InfraNerd, your SE's will very likely have something like Jenkins in place to handle some ad-hoc test runs and the like, but I can almost promise it will be broken and underutilized. After all, who has time to fuss with automating the build pipeline? Answer: your InfraNerd does! Their purpose in life is to automate the world, and your team's velocity will ramp up quickly because of it.

An InfraNerd is going to bring in tools of which your SE's may not even be aware, all for the sake of automating the world. They'll help avoid the nightmare of shady shell scripts being used for deployment by calling on one or more of the many available Configuration Management & Orchestration tools (Ansible, Fabric, Chef, Capistrano, Puppet, SaltStack, etc). There's a good chance your SE's know about such things, but have never really seen a need to use them in their day-to-day workflow. There's a 100% chance your SE's will fall in love with such things when they see how easily they can deploy their code changes to arbitrary environments with minimum fuss.

Your InfraNerd will do their damnedest to reduce the amount of code your SE's have to write and maintain. SE's tend to be inventors by nature, and with that nature comes the tendency to see a wheel-shaped hole and immediately set out to find a suitable chunk of rock from which to sculpt a mighty fine wheel. They know the problem and they're capable of building what's needed to fix it, so they set out to do so. Usually the problem that needs solving has already been solved by some existing tools, and your InfraNerd will jump at the chance to use those tools to reduce the team's impromptu Wheel Re-Invention Drills. The real beauty here is that most of the tools an InfraNerd ushers in will quite often open the door to some really cool enhancements and features that might never have crossed anyone's mind up to now.

In the midst of all their other work, your InfraNerd will also weave in tasks to build up what is undeniably the single most important part of your systems: infrastructure you can use to measure everything. They're going to find every logfile your SE's have ever dumped anywhere and get them automatically ingested into a real-time, centralized, and searchable format by way of a service like ELK or Splunk. If there's a metric that can be extracted from anything, they're going to find a way to get to it and pump it into a time series data store like InfluxDB or Graphite. Then they'll take all those data and craft dashboards to display them in all their unapologetic and illuminating glory with something like Grafana. They'll use all of this to tell you stories about your product that you'd never imagined possible. It's kinda great.
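To make the "pump it into a time series data store" step a bit more concrete, here's a minimal sketch of pushing a metric into Graphite over its plaintext protocol (one `path value timestamp` line per metric, sent to Carbon over TCP). The metric path, the `localhost` host, and port 2003 are assumptions for illustration, so swap in whatever your own setup uses.

```python
import socket
import time


def graphite_line(path, value, timestamp=None):
    """Format one metric as a Graphite plaintext line: 'path value timestamp\\n'."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return f"{path} {value} {ts}\n"


def send_to_graphite(lines, host="localhost", port=2003, timeout=5.0):
    """Send already-formatted lines to a Carbon plaintext listener.

    host and port are assumptions; 2003 is the usual plaintext port.
    """
    payload = "".join(lines).encode("ascii")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)


if __name__ == "__main__":
    # Hypothetical metric name, purely for illustration.
    line = graphite_line("app01.requests.count", 42)
    send_to_graphite([line])
```

Once something like this is feeding the data store, pointing a Grafana dashboard at it is mostly a matter of picking the metric paths you care about.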

The Wrap-Up

In a young start-up looking to hire on Engineer #10, it's going to feel pretty wrong to fill that seat with anyone other than a Software Engineer. Chances are that it's going to feel wrong anytime before #50, to be honest. You should consider, however, that as expensive as that #10 slot might seem, filling it with an InfraNerd can really be a game-changer. You'll end up with an engineering team that turns out a better product in less time and likely doesn't need to rebuild everything from the ground up before you can ship Beta. You'll be faster to market with a superior product and happy SE's - and that slot will suddenly seem so cheap that you'll want 2 or 3 more InfraNerds before you know it.

So now you need to find some InfraNerds and get them on the payroll! How do you find them, and what should they be doing once they're on-board? Tune in next time!

Tuesday, February 3, 2015

Getting gmond metrics into InfluxDB

I'm a huge fan of InfluxDB + Grafana. InfluxDB is on a good path toward making metrics not suck, and Grafana has a great vision for interactive and exceedingly scriptable dashboards.

One problem I've run into recently is that while I can get all kinds of cool custom metrics into InfluxDB without much struggle, I don't have a tool that will spit out nice system metrics into InfluxDB.

My favorite tool for getting useful system metrics so far has been Ganglia's gmond. It gives some decent disk and memory data, but my favorite bit is the built-in CPU % utilization and network bytes in/out metrics. You'd be surprised how few tools actually give that information out-of-the-box.

However, I don't really want to use Ganglia as the back-end for gmond. What it does, it does exceedingly well. What I want it to do, it does terribly. So I either needed to adjust my expectations, or find some way to get gmond metrics into InfluxDB. Turns out there were tools available to get data from most other monitoring and metrics tools into InfluxDB (collectd, statsd, fluentd, graphite, etc.) but nothing that would work with gmond.

Now, you can make gmetad output data in Graphite format and point it at InfluxDB's Graphite input plugin. I don't much care for that approach, for a few reasons:

  • it forces an awkward (and generally performance-hindering) data layout in InfluxDB
  • it requires you to run gmetad, which feels a bit heavy when it would just act as a proxy
  • it introduces more layers when I'd much rather simplify things

I'd much rather have some simple tool that polls gmond and puts the metrics into InfluxDB in a sane way.

So, I built one. Get it at:

It's not *quite* where I want it to be yet, but it should be of some use right now. I've listed some enhancements I want to make in the Readme.
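If you're curious what "poll gmond and reshape the metrics" looks like in general, here's a rough sketch. It is not the tool mentioned above, just an illustration of the idea. It assumes gmond is dumping its XML on a TCP accept channel (port 8649 is the usual default), and the point layout (`measurement`, `tags`, `time`, `value`) is a made-up intermediate shape that you'd still have to map onto whatever write format your InfluxDB version expects.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_gmond_xml(host="localhost", port=8649, timeout=5.0):
    """Connect to gmond and read the XML dump it sends before closing.

    host and port are assumptions; adjust to your tcp_accept_channel.
    """
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


def gmond_xml_to_points(xml_data):
    """Turn gmond XML (bytes or str) into a list of tagged numeric points."""
    root = ET.fromstring(xml_data)
    points = []
    for cluster in root.iter("CLUSTER"):
        for host in cluster.iter("HOST"):
            reported = host.get("REPORTED")
            for metric in host.iter("METRIC"):
                # Skip string metrics (OS name and the like).
                if metric.get("TYPE") == "string":
                    continue
                try:
                    value = float(metric.get("VAL"))
                except (TypeError, ValueError):
                    continue
                points.append({
                    "measurement": metric.get("NAME"),
                    # Cluster and host become tags instead of path segments.
                    "tags": {"cluster": cluster.get("NAME"),
                             "host": host.get("NAME")},
                    "time": int(reported) if reported else None,
                    "value": value,
                })
    return points


if __name__ == "__main__":
    for point in gmond_xml_to_points(fetch_gmond_xml()):
        print(point)
```

Keeping the cluster and host as separate fields, rather than baking them into a dotted metric name, is the sort of layout that the Graphite-format route makes awkward.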

My thinking behind this tool (and any others I might build) is that, as an infrastructure tool, it should be easy to get started with, easy to automate once you've got the hang of it, and easy to forget about once you've handed it off to your config management tools.

To that end, if any one of those qualities is not present, consider it a bug and file an issue on GitHub - or send me a pull request, I love those things.

Happy graphing!