Monday, April 18, 2016

Let's Build Something: Elixir, Part 5a - Data Ingest, Consumption, and Validation

Whew! Been a little while, but let's keep cruisin'! This installment will tackle defining a custom data type for our data points, providing a means by which we can queue up ingested data and consume it continuously, and validating it as we go along. We'll also write some tests to keep our sanity.

First let's define what our data points should look like. We'll keep it simple for now, and can expand later. Open up a new file at lib/stats_yard/data_point.ex:


There are a couple of things happening here that you might be interested in:

  • Line 4: We define a type for our DataPoint struct. Note that __MODULE__ is just a safe way to reference the name of the current module (StatsYard.DataPoint). We can later reference this as StatsYard.DataPoint.t when the need arises.
  • Line 5: Define our basic DataPoint struct. Note that structs always inherit the name of the module in which they are defined. If we wanted it to be referenced as something other than %StatsYard.DataPoint{} we would need to define a module within the outer module, such as StatsYard.DataPoint.Thing. There's no real need for that in this case.
  • Line 7-10: Set up a validation function that will only work when the argument passed is one of our shiny new structs, and the various keys therein pass our guards. Specifically we want the metric and entity fields to be strings (or binaries, in Elixir/Erlang land), and the value to be some type of number.
  • Line 11-12: If we end up in this version of the validate function, log a message and return a tuple to indicate success and spit back out the provided struct.
  • Line 15: Define the "fall-through" validate function that will match on any argument that doesn't pass the above guards. In this case, log a warning and return an :invalid tuple with the provided value included.
This module is intended to wrap up the structure of the data we want to use and the functions that are relevant in inspecting and validating it.
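
Here's a minimal sketch of a module matching that description. It's a reconstruction rather than the original listing, so the line numbers referenced above won't line up exactly, and the log message wording is an assumption. The field names (metric, entity, value) and the :ok / :invalid tuples come from the walkthrough above; Logger.warning/1 assumes a reasonably recent Elixir.

```elixir
defmodule StatsYard.DataPoint do
  require Logger

  # Referenced elsewhere as StatsYard.DataPoint.t
  @type t :: %__MODULE__{metric: String.t(), entity: String.t(), value: number()}
  defstruct metric: nil, entity: nil, value: nil

  # Only matches one of our structs whose fields pass the guards.
  def validate(%__MODULE__{metric: metric, entity: entity, value: value} = data_point)
      when is_binary(metric) and is_binary(entity) and is_number(value) do
    Logger.debug("Valid DataPoint: #{inspect(data_point)}")
    {:ok, data_point}
  end

  # Fall-through clause: anything that didn't match above is invalid.
  def validate(data_point) do
    Logger.warning("Invalid DataPoint: #{inspect(data_point)}")
    {:invalid, data_point}
  end
end
```

With this in place, validating a struct with string metric and entity fields and a numeric value returns an :ok tuple, while anything else (a bare atom, a struct with an atom for metric, etc.) comes back as an :invalid tuple.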

Next up, let's add something to let us queue up incoming data points. I like Joe Kain's BlockingQueue. It's a statically-sized GenServer'd FIFO queue that blocks when full or empty. Super simple, and very effective. NB: Joe also has a super awesome blog called Learning Elixir that you really should check out.

First up we need to add it to our deps list in mix.exs:


Then follow that up with a mix deps.get, and we're ready to roll.

First let's walk through the idea here. I want to have a BlockingQueue GenServer that catches our ingested DataPoints, and a separate consumer GenServer that will pop those DataPoints off the ingest queue and validate them.  The most important part of all this is that I don't want to do things in batches, nor do I want to have to explicitly trigger the consumption of data from the ingest queue. Enter supervision and streams!

BlockingQueue's API gives two functions for popping events off a queue: pop/1 and pop_stream/1. As you might have guessed, pop/1 removes and returns the oldest single value from the queue, while pop_stream/1 returns a Stream function - specifically, Stream.repeatedly/1. If you're unfamiliar with Streams, they're effectively a safe mechanism by which you can perform actions on arbitrarily large data sets without having to pull the entire thing into memory. I'm not the best to describe these in great detail, but Joe Kain and Elixir's Getting Started guide have some good descriptions and applications.
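
If the laziness part is fuzzy, here's a small sketch of how Stream.repeatedly/1 behaves. It doesn't touch BlockingQueue at all: the Agent-backed counter and the StreamDemo module are hypothetical stand-ins, just to show that nothing is pulled until something enumerates the stream.

```elixir
defmodule StreamDemo do
  # Build a lazy stream whose elements come from a counter held in an Agent.
  # Nothing is fetched until something enumerates the stream.
  def counter_stream(agent) do
    Stream.repeatedly(fn ->
      # Return the current count and bump it for the next caller.
      Agent.get_and_update(agent, fn n -> {n, n + 1} end)
    end)
  end
end

{:ok, agent} = Agent.start_link(fn -> 0 end)
stream = StreamDemo.counter_stream(agent)

# Each enumeration pulls only as many values as it asks for.
IO.inspect(Enum.take(stream, 3))
IO.inspect(Enum.take(stream, 2))
```

The second take picks up where the first left off, because the state lives in the Agent rather than in the stream itself - much like a queue process holding the items while the stream just asks for "the next one."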

So the layout of these bits is going to be something like this:

  1. Start up a named and supervised BlockingQueue GenServer
  2. Start up a named and supervised consumer GenServer that can find the above BlockingQueue process
  3. Consumer process grabs hold of the Stream function returned by BlockingQueue.pop_stream/1 and proceeds to validate every DataPoint that gets pushed into the queue

Here's our consumer:


So here's what's going on in this module:

  1. Line 6: This is my nifty way of telling this function "keep an eye out for things popping up in this stream of data, and take appropriate action when you see a new item." We'll test this out shortly in iex
  2. Line 7: Try to validate, as a DataPoint, every item that comes off the stream
  3. Line 8-9: If the validate succeeds, return the DataPoint, otherwise discard it entirely (remember that our StatsYard.DataPoint.validate/1 function will log a warning when a value fails validation)
  4. Line 18: Note that our public start_link/2 function expects to receive the PID of our ingest queue, which we'll provide in lib/stats_yard.ex when we set up our supervision tree
  5. Line 23-25: Start up a linked process that will kick off our queue consumption loop in consume_data_points/1
  6. Line 27: Set our GenServer's state to the PID of our ingest queue, and we're done!
Notice that this is a very simple GenServer - so simple, in fact, that it doesn't even have any direct means of interaction. For now, this is more than sufficient - we just want something that we can supervise and organize appropriately, with the option to extend it for more robust behavior in the future. (For the studious, you're right - there's always Elixir's GenEvent, but that's for future posts!)
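
To make the "watch the stream and validate whatever shows up" idea concrete without pulling in any dependencies, here's a rough sketch. It is not the consumer module described above: the process mailbox stands in for BlockingQueue.pop_stream/1, validate/1 stands in for StatsYard.DataPoint.validate/1, and the ConsumerSketch name and the {:push, item} / {:valid, point} message shapes are made up for illustration.

```elixir
defmodule ConsumerSketch do
  # Stand-in for a blocking pop stream: each element is the next
  # {:push, item} message in the enumerating process's mailbox.
  # `receive` blocks until one arrives.
  def mailbox_stream do
    Stream.repeatedly(fn ->
      receive do
        {:push, item} -> item
      end
    end)
  end

  # Stand-in for the real validation function.
  def validate(%{metric: m, entity: e, value: v} = point)
      when is_binary(m) and is_binary(e) and is_number(v),
      do: {:ok, point}

  def validate(other), do: {:invalid, other}

  # Spawn a linked process that validates everything pushed to it,
  # forwards valid points to `report_to`, and discards the rest.
  def start_consumer(report_to) do
    spawn_link(fn ->
      mailbox_stream()
      |> Stream.map(&validate/1)
      |> Stream.each(fn
        {:ok, point} -> send(report_to, {:valid, point})
        {:invalid, _other} -> :discarded
      end)
      |> Stream.run()
    end)
  end
end
```

Stream.run/1 never returns here since the stream is infinite, which is exactly the point: the consumer sits and waits, and handles each item the moment it arrives, with no batching and no explicit trigger.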

Now let's rig all this up in our supervision tree, and then we'll poke around in iex to see if it's all working as expected. Notice that this has been cleaned up a bit to accommodate our TimestampWriter bits without getting too cluttered:


(I know that's a big chunk of code to dump into a blog post - my apologies. I mostly wanted to be sure to point out that the structure of this stuff changed significantly. Newer changes will be limited to just the diffs. :-) )

Nothing super exciting here, other than a few things to note:
  1. All supervisors are now started up in their own independent functions, which are called from the pared-down start/2 function
  2. Our supervisors are now named appropriately (Lines 20 and 36)
  3. Our start_main_ingest/0 function lists two workers to be started up under the appropriate supervisor, which will start in the order listed (this will be on the quiz)
  4. Atoms used to name our GenServer processes are pulled out and returned from simple functions at the bottom of the file, so as to avoid headaches later
Enough work, let's play with it! Fire up iex and we'll see if our stuff works:


Cool! We're able to push things into our BlockingQueue without having to know much about it, and our IngestConsumer immediately received the pushed values and attempted to validate them, the results of which are spit back out via log messages.

Now for that quiz I mentioned earlier: in what order were our two ingest GenServers started? Yup, the order listed in our supervision tree definition - the queue first, then the consumer. Why does this matter? 

There's a failure case that we need to recognize and accommodate. Specifically, if our ingest queue process dies, it will indeed be restarted by the supervisor... but our consumer process will merrily chug along holding onto a Stream function that references a now-dead process! That sounds like bad news, but let's verify that I'm not making stuff up:


I know that's a bit dense, but the gist (har har) of it is that we used Supervisor.which_children/1 to see what the PIDs of our two GenServers were, stopped the main_ingest_queue process (rather rudely, too, a la :kill), then checked to see that the expected PID had indeed updated in the supervisor's state. Then we tried to push a value into the main ingest queue, which did indeed work since it had been restarted, but our ingest consumer process never knew about it, because it's waiting for events to flow in from a dead process. That's lame, so let's fix it!

Turns out, this is a super simple one-line fix, but reading the docs is a must in order to understand why this fix is appropriate (head over to the Supervisor docs, then search for "rest_for_one"). In lib/stats_yard.ex:


And now to test it out in iex:


Woohoo! Worked like a charm. What's happening here? First, read the docs. :-) Second, in a nutshell, using the strategy rest_for_one causes the supervisor to consider the position of any process that dies under its supervision, restart it, and also kill and restart all supervised processes that were started subsequent to the original dead process. In our case, the queue process is the first one, so if it dies, everything in the supervision tree of our MainIngestSupervisor is restarted. If it were, for example, the 3rd process started by this supervisor, then the 4th, 5th, ..., nth processes would be restarted along with it, while the 1st and 2nd processes would be left alone. Super handy stuff here!
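
If you'd like to watch rest_for_one do its thing in isolation, here's a small self-contained sketch. It's not the StatsYard supervision tree: the three Agents, their ids (:a, :b, :c), and the RestForOneDemo module are all hypothetical, and the little polling loop is just one way to wait until the supervisor has finished restarting things.

```elixir
defmodule RestForOneDemo do
  # Start three Agents as children :a, :b and :c, in that order.
  def start do
    children =
      for id <- [:a, :b, :c] do
        %{id: id, start: {Agent, :start_link, [fn -> id end]}}
      end

    Supervisor.start_link(children, strategy: :rest_for_one)
  end

  # Map each child id to its current pid.
  def pids(sup) do
    for {id, pid, _type, _modules} <- Supervisor.which_children(sup), into: %{} do
      {id, pid}
    end
  end

  # Rudely kill one child, then wait until the supervisor has replaced it.
  def kill_and_wait(sup, id) do
    old_pid = Map.fetch!(pids(sup), id)
    Process.exit(old_pid, :kill)
    await_restart(sup, id, old_pid)
  end

  defp await_restart(sup, id, old_pid) do
    case Map.get(pids(sup), id) do
      pid when is_pid(pid) and pid != old_pid ->
        :ok

      _not_restarted_yet ->
        Process.sleep(10)
        await_restart(sup, id, old_pid)
    end
  end
end
```

Start it, grab the pids, call kill_and_wait(sup, :b), and grab the pids again: :a keeps its original pid, while :b and :c both come back with new ones.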

To Be Continued...

So now we're in a good place from a supervision point of view. This post is already pretty lengthy, so I'm going to title it "Part 5a," and we'll continue with some unit tests and documentation in Part 5b.

Til next time!

Thursday, April 14, 2016

Let's Build Something: Elixir, Part 4 - Better Tests, TypeSpecs, and Docs

We left off with our first test case working, but less-than-ideal. Specifically, it's leaving the timestamp file it writes sitting on the disk, and in an inappropriate location (the root of our project). This is super lame, and we should fix that.

Enter ExUnit callbacks and tags! ExUnit allows us to pass configuration data into and out of our tests by means of a dictionary, usually referred to as the "context". We can make good use of this context data by way of setup callbacks and tags. These are described well in the docs, and we'll lean on their examples for what we need to accomplish here.

So our test is currently testing TimestampWriter's ability to... ya know... write timestamps. And it works great, other than leaving the temp file sitting in the root of our project. While we could just add some code to our test to explicitly handle this, a better (and less-repetitious) approach is to modify our overall test case to do some setup and tear-down tasks for us automatically!

First, remove the junk file leftover at stats_yard/tstamp_write.test, and then we'll add a setup callback to our test case that will force our writes to happen in an appropriate directory:


Here we're exercising a one-way communication from our test to the setup callback by way of a tag. What's happening here is that ExUnit will call our setup callback before execution of every test within our test case. By preceding a test definition with @tag, we are specifying that a context dictionary should be passed to our setup callback that contains a key-value pair of { cd: "fixtures" }.

This is mostly copy-pasta'd straight out of the ExUnit docs, but a bit of explanation can't hurt:
  • Line 2: We need to make sure our tests don't run in parallel since we're going to be switching directories. It sucks, but it's the nature of the beast
  • Line 4: Define our setup callback, which will be executed prior to every test that is run
  • Line 5: Check to see if our cd tag is present in the callback's current context dict. This is necessary because the same callback is executed for every test, but not every test will necessarily use this particular tag
  • Line 6-7: Store the current directory and switch to the directory specified in the context
  • Line 8: When the test exits (whether success or fail), switch back to our original directory
  • Line 14: Our handy-dandy tag for the test that immediately follows
Let's see if it works!


Nope! ExUnit apparently doesn't create directories for you. Oops. Easy fix, and again:


Much better. Now we should do some cleanup after the fact, because let's face it - no one wants temp files committed to their repo.


A quick rundown of the updates (slightly out of order):

  • Line 23: Add a `tempfile` tag to our test to indicate that we're going to (attempt to) write a transient file
  • Line 24: Make our test accept a dict argument called `context` (which will contain our `tempfile` key)
  • Line 25: To keep things DRY, refer to the context's value for :tempfile instead of repeating the filename explicitly
  • Line 9: When the test is done, check to see if a tempfile was specified for the test that's being set up
  • Line 10: Make sure the tempfile actually got written, otherwise Line 11 will blow up
  • Line 11: Call the "dirty" (raising) version of File.rm/1, File.rm!/1, so that any weird permissions issues that prevent deletion of the file blow up loudly instead of passing silently
So now we should be able to run our test, and see precisely zero remnants of it:


Perfect! Now we can write more tests in here and gain some nice organization and cleanup bits without having to provide anything beyond a couple of appropriate tags. (And after all that, yes, I do realize that a tempfile doesn't necessarily need to go into its own directory if it's just going to be immediately deleted. This just makes me feel better.)
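
Pulling the pieces together, a test case along these lines might look like the sketch below. It follows the pattern from the ExUnit docs, but it's a standalone approximation rather than the project's actual test file: the module name is made up, a scratch directory under the system temp dir stands in for "fixtures", and creating the directory inside the setup callback is just one assumed way to handle the missing-directory problem we hit earlier.

```elixir
# Standalone sketch; in a Mix project test_helper.exs calls ExUnit.start()
# and `mix test` runs the case. Here, call ExUnit.run() afterwards instead.
ExUnit.start(autorun: false)

defmodule TagCleanupSketchTest do
  # Changing directories is global to the VM, so keep this case synchronous.
  use ExUnit.Case, async: false

  # Scratch directory standing in for the post's "fixtures" directory.
  @fixtures Path.join(System.tmp_dir!(), "tag_cleanup_sketch_fixtures")

  setup context do
    if dir = context[:cd] do
      original = File.cwd!()
      File.mkdir_p!(dir)
      File.cd!(dir)

      on_exit(fn ->
        # Still inside `dir` here, so the relative tempfile path resolves.
        if (file = context[:tempfile]) && File.exists?(file), do: File.rm!(file)
        File.cd!(original)
      end)
    end

    :ok
  end

  @tag cd: @fixtures
  @tag tempfile: "tstamp_write.test"
  test "writes a timestamp to the tagged tempfile", context do
    File.write!(context[:tempfile], to_string(System.system_time(:second)))
    assert File.exists?(context[:tempfile])
  end
end
```

The test itself only knows about its tags; all of the directory switching and cleanup lives in the setup callback, so any new test gets the same behavior by adding the same two tags.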

To wrap up, let's make our GenServer's module and API a bit more legit with a typespec and docstrings:


The @moduledoc and @doc directives are pretty straightforward - wrap up your docstrings in the """ markers, and get yo' docs on. Keep in mind that the docs are in markdown, so you can (and really should) make them look pretty.

The @spec directive on Line 22 is simply a way to specify the types of the arguments our function can accept, and the type of value it will return. Easy stuff, and super helpful when we start looking into static analysis - it can help iron out a ton of bugs early on.
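
For a feel of how the three directives sit together, here's a tiny sketch. It is not the TimestampWriter module: DocSketch and the shape of its write_timestamp/2 (a path plus an integer timestamp) are assumptions made purely for illustration.

```elixir
defmodule DocSketch do
  @moduledoc """
  Hypothetical stand-in used to illustrate `@moduledoc`, `@doc` and `@spec`.

  Docstrings are *markdown*, so lists, `code` and emphasis all render nicely.
  """

  @doc """
  Writes `timestamp` (followed by a newline) to the file at `path`.

  Returns `:ok` on success or `{:error, reason}` if the write fails.
  """
  @spec write_timestamp(Path.t(), integer()) :: :ok | {:error, File.posix()}
  def write_timestamp(path, timestamp) when is_integer(timestamp) do
    File.write(path, "#{timestamp}\n")
  end
end
```

The spec reads left to right: two arguments (a path and an integer), and a return value that is either :ok or an error tuple - the same shape File.write/2 itself returns.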

Next Time

Now that we've spent some time on some of the basics that we'll be seeing over and over again, the next post will get into more of the meat of our project and start doing stuff that's more interesting than writing a timestamp to a file. Specifically, we'll define the first iteration of our data format and figure out a way to represent that in code such that we can validate incoming requests for appropriate structure.

Until then, feel free to peruse the source for this post at: https://github.com/strofcon/stats-yard/tree/lets-build-4

Monday, April 4, 2016

Let's Build Something: Elixir, Part 3 - Getting Started with ExUnit for Testing

NOTE: Before you get too far into this one, I want to mention that I realized I wasn't following convention in my Elixir file names, so there's a commit at the beginning of this post's branch that fixes it (and it's been merged to master as the others have as well). Just a heads-up in case it seems weird all of a sudden. :-)

Last time we made our TimestampWriter GenServer a supervised process to make it more resilient to bad inputs and other process-killing events. Now it's time to protect our GenServer from a much more sneaky and persistent assailant - us! This seems like a good time to get familiar with ExUnit and build our first test case for StatsYard.

Defining ExUnit test cases is pretty similar to defining any other module in our Elixir project. If you pop into the stats_yard/test directory and take a look at stats_yard_test.exs, you'll see a simple example test:


Running this test is as easy as a quick mix test:


Let's break that test down just a bit:
  • Line 1: As mentioned above, a test case is simply an Elixir module
  • Line 2: Pull in ExUnit's test case bits
  • Line 3: This line will cause ExUnit to do some magic that we'll discuss at a later date
  • Line 5: Defines a unit test with an arbitrary string label
  • Line 6: Makes the bold assertion that 1 + 1 does in fact equal 2
    • assert is basically just shorthand (or more accurately, a macro) that says "everything after the keyword 'assert' better be true, otherwise I'm gonna blow up and fail spectacularly". (There's a bit more to it, and we'll tackle that next.)
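
For reference, a stripped-down standalone version of that kind of test looks roughly like this. It's a sketch, not the generated file: the doctest line is left out (so the line numbers above won't match), and the module name is made up.

```elixir
# Standalone sketch; under `mix test`, test_helper.exs takes care of
# ExUnit.start() for you. Here, call ExUnit.run() afterwards to run it.
ExUnit.start(autorun: false)

defmodule TruthSketchTest do
  # Pull in ExUnit's test case bits (the `test` and `assert` macros, etc.).
  use ExUnit.Case

  # A unit test with an arbitrary string label.
  test "the truth" do
    assert 1 + 1 == 2
  end
end
```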
To see assert do its thing in a less-than-true situation, we can just change the 2 to a 3 on Line 6 and run  mix test:


In a nutshell, what happened here is that we insisted 1 + 1 = 3, and assert totally called us on it. What were we thinking???

There's some interesting stuff in that output block. First it tells us which test failed ("the truth"), what file that test lives in (test/stats_yard_test.exs), and the line number that the test definition starts on (:5). After that, it tells us the general type of 'thing' we were trying to do (assert with ==) and shows us the specific assertion that failed.

Next up are two interesting and very helpful lines: lhs and rhs. These abbreviations stand for "left-hand side" and "right-hand side" respectively, and these lines actually give us some insight into the way the test actually works under the hood. If you haven't encountered them before, lhs and rhs are hyper-relevant to one of Elixir's most powerful features: pattern matching!

These two lines are telling us that ExUnit took our assert expression and made an attempted pattern match expression out of it, with the actual evaluated value on the left-hand side of the match, and the asserted value on the right-hand side, like so:


In this iex session we can see an example of both of the test attempts we've tried so far - the first one being the successful test, and the second being the intentional failure. Hopefully this provides a bit of clarity around how ExUnit is actually accomplishing this particular test.

So that's all fine and dandy, but we really should work on testing our TimestampWriter. Go ahead and switch that pesky 3 back to a 2, and we'll get started!

First let's create a directory that will hold our tests - it's cleaner, and seems to be the convention used in most projects. Then we'll create a file in there to hold our first real test case (note that test files have to be named "<stuff>_test.exs", otherwise the mix test task will skip them):


In timestamp_writer_test.exs we'll start out with the bare-bones first increment of our test, BUT we'll try to make it fail first by passing a bad argument to our public API function write_timestamp/2:


(Note that I stopped naming these modules StatsYardTest.*, no point to it as far as I can tell.)

And a quick run to see what's up:


Huh... well that's... um... awesome? Not really. The GenServer process did indeed fail like expected, but the tests still technically passed. What gives?

As it turns out, we tested our public API function, write_timestamp/2, not so much our GenServer. Our function is simply calling GenServer.cast/2 which then asynchronously sends a message to our TimestampWriter process. That send is indeed successful and returns the :ok atom - even though our process dies shortly thereafter - and that's exactly how GenServer.cast/2 is intended to operate.

So how do we fix that? Well to be entirely honest, I don't know yet. BUT! There is a silver lining - this experience has made me re-think whether or not this particular activity is best handled as a cast or a call, which basically boils down to "should it be asynchronous with no response, or synchronous with a response?" Given the intended purpose of this particular function, I think a call is more appropriate: we're going to need some manner of acknowledgement that our data has indeed been written to disk before moving on to whatever our next task might be.
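
Here's a small sketch of the difference, separate from the actual TimestampWriter code. The CastVsCallSketch module and its function names are hypothetical; the point is only that cast hands back :ok no matter what happens in the server, while call hands back whatever the server replies with.

```elixir
defmodule CastVsCallSketch do
  use GenServer

  def start, do: GenServer.start(__MODULE__, :ok)

  # Fire-and-forget: returns :ok as soon as the message is sent.
  def write_async(pid, path), do: GenServer.cast(pid, {:write, path})

  # Synchronous: blocks until the server replies with the write result.
  def write_sync(pid, path), do: GenServer.call(pid, {:write, path})

  @impl true
  def init(:ok), do: {:ok, nil}

  @impl true
  def handle_cast({:write, path}, state) do
    # Whatever happens here, the caller never hears about it.
    _ignored = write_timestamp(path)
    {:noreply, state}
  end

  @impl true
  def handle_call({:write, path}, _from, state) do
    # The result of the write goes straight back to the caller.
    {:reply, write_timestamp(path), state}
  end

  defp write_timestamp(path) do
    File.write(path, Integer.to_string(System.system_time(:second)))
  end
end
```

Point both functions at a path inside a directory that doesn't exist: write_async/2 cheerfully returns :ok, while write_sync/2 returns the error tuple from File.write/2, which is exactly the kind of acknowledgement a test can assert on.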

So! Back to our TimestampWriter code:


To recap the changes here:
  • Line 9: Switch from GenServer.cast/2 to GenServer.call/3 (the third argument, a timeout, is optional)
  • Line 24: Switch the callback from handle_cast/2 to handle_call/3, which adds an (unused) argument, _from, identifying the caller (we don't particularly need this right now, hence the underscore to keep the compiler happy)
  • Line 25: Bind the result of our file write operation to result
  • Line 26: Use the appropriate response from a call, which is to return the :reply atom, a result of some sort and the new state of the GenServer
Notice a cool thing here, too: our interface to the GenServer didn't change at all, so we don't need to update our test! We should be able to run mix test and see our test fail appropriately:

(Note: There will still be some extra output after this as a result of our GenServer tanking on bad inputs. We'll try to fix that another time.)

Perfect! Now if we stop passing a known-bad argument to our public function in the test, we should get a nice passing test:


Success! Whew. That was a bit of a runaround to get a simple test in place, but I learned a lot, so it doesn't seem like a wasted effort to me.

As a final cleanup step (for now), I'm going to remove the simple truth test from the out-of-the-box test file that mix creates, because I don't really care for the clutter.

Experiment

We left a bit of an unpleasant side effect in place with our test. Hint: our timestamp writer spits out a timestamp somewhere. Figure out where it's landing, then peruse the ExUnit docs and see if you can figure out how to make that stop happening. No need for that clutter! The next blog post will cover how to fix this.

Next Time

We're not quite doing TDD, but hey, it's a start! Next time 'round we'll clean all of this up a bit more (as mentioned in the Experiment above) with some typespecs and docs. Exciting stuff, eh? At the end of the day, it's worlds easier to do these things up front, rather than trying to retrofit them later - plus we can make use of them for some testing convenience (or at least that's the hope!)

For now, you can peruse the source for this post at: https://github.com/strofcon/stats-yard/tree/lets-build-3