NoisyCode

f(x) -> ~dac

MLFE v0.2.4 Released

MLFE version 0.2.4 is now available and includes a few significant changes:

  • fixes for type aliases and their use in union types and type constructors thanks to Daniel Abrahamsson’s work.
  • early support for records with row polymorphism.
  • some fixes for how maps are typed.

Type Aliases

Up until Daniel’s fixes, the following wouldn’t type correctly:

module circles

export new/1, area/1

type radius = float

type circle = Circle radius

new r =
  Circle r

area c =
  let pi = 3.14159 in
  match c with
    Circle r -> pi *. (r *. r)

The issue was that the typer wasn’t looking up radius’s members when it tried to type a circle; further, type variables used in nested types weren’t being handled correctly either. Big thanks to Daniel for his work on this.

Records

Basic records with row polymorphism are now supported in MLFE (not compatible with Erlang’s records):

module records_with_x

export make_xy/2, make_xyz/3, get_x/1, get_x_and_the_record/1

make_xy x y = {x=x, y=y}

make_xyz x y z = {x=x, y=y, z=z}

get_x rec =
  match rec with
    {x=x} -> x

{- No matter what type of record you pass in, the record in the
   returned tuple will always remember the full set of its fields
   in the return type.  E.g. if you pass in {x=1, y=2, z=3}, the
   return type won't forget about y and z, the full type returned
   by this function will be (if you pass that exact record):

   (int, {x: int, y: int, z: int})

-}
get_x_and_the_record rec =
  match rec with
    {x=x} -> (x, rec)

It’s worth noting that the above module of course constructs polymorphic records. You could call both make_xy "foo" "bar" and make_xy 1 2.0, and the two results would have distinct types:

{x="foo", y="bar"}  -- {x: string, y: string}
{x=1, y=2.0}        -- {x: int, y: float}

My understanding of row polymorphism so far has been driven by a presentation by Neel Krishnaswami as well as skimming a few papers and some of Elm’s history.

Currently missing:

  • adding fields to a record or updating/replacing them
  • accessing individual record fields

I’m not currently planning on adding the removal of record fields: although it poses no particular technical challenge as far as I can tell, it seems like it could get problematic and/or confusing quickly.

To add fields to a record I’d like the following to work:

-- This takes {x: 'a} and returns ('a, {x: 'a, hello: string})
get_x_and_add_hello rec =
  match rec with
    {x=x} -> (x, {rec | hello="world!"})

Accessing individual members should look pretty simple, e.g.

-- should type to (int, string):
let tuple = get_x_and_add_hello {x=1} in
  match tuple with
    (x, rec) -> (rec.x, rec.hello)

For the Curious

MLFE’s typer does use row variables to capture the extra fields in a record being passed around. The row variable is constructed and added to a record the first time it’s typed; the function unify_records/4 in mlfe_typer.erl is probably also of interest.

Records are currently compiled to maps, simply because that makes the implementation far easier right now than compiling them to tuples. As such we need a way to distinguish between records and maps in pattern matches, so code generation adds a '__struct__' key to both, with the value either 'record' or 'map' to tell the two apart (e.g. the record {x=1} compiles to something like the map #{'__struct__' => 'record', x => 1}). Compiled pattern matches of course leverage this as well.

Contributing and Discussing

As always contributions are welcome subject to the licence and code of conduct as specified in the project’s repository. Please join the IRC channel #mlfe on freenode for help or general discussion (governed by the same code of conduct as the project). I’m on IRC as j14159 and Twitter with the same username.

MLFE v0.2.0 Released

ML-flavoured Erlang v0.2.0 is now ready for use. This version adds a few major pieces that were missing from the first v0.1.0 release, including:

  • support for maps, both constructing literal maps and pattern matching
  • binary support
  • UTF-8 binary strings

This release also includes some very early support for writing unit tests inside the modules they’re concerned with and running them via EUnit and rebar3 with the help of Tristan Sloughter’s plugin. If you’re using the plugin, you’ll be able to run tests defined in .mlfe files with rebar3 eunit.

Here’s the example module from the new tour of the language:

module my_module

// one function that takes a single argument will be publicly accessible:
export double/1

/* Our double function defines an "add" function inside of itself.
   Comments for now are C-style.
 */
double x =
  let add a b = a + b in
  add x x

test "doubling 2 is 4" = some_test_checker (double 2) 4

// Basic ADT with type constructors:
type even_or_odd = Even int | Odd int

// A function from integers to our ADT:
even_or_odd x =
  let rem = x % 2 in
  match rem with
      0 -> Even x
    | _ -> Odd x 

The full changelog is available here, and there’s a new language tour document that explains the use of the language in a little more detail for anyone new to it.

What’s Next

For version 0.3.0 my main goals are:

  • probable project rename
  • type annotations/ascriptions
  • records. I’d like some basic row polymorphism here (I believe this fits with records in OCaml and Elm) but need to do some more reading and research. In all likelihood records will get syntax similar to OCaml’s and be compiled as Erlang maps to start.

Nice-to-haves include:

  • quoted atoms
  • the |> “pipe” operator
  • do notation/side effects

Contributing and Discussing

Contributions are welcome subject to the licence and code of conduct as specified in the project repository. Please join the IRC channel #mlfe on freenode for help or general discussion (governed by the same code of conduct as the project). I’m on IRC as j14159 and Twitter with the same username.

Introducing MLFE

What

ML-Flavoured Erlang is a statically typed language with basic parametric polymorphism (generics) for the Erlang VM that I’ve been hacking away at in my spare time since roughly the beginning of 2016. I’m making the repository public now for version 0.1.0 with the Apache 2.0 license as I think I’ve hit the point of it being “not entirely useless”. Currently every type is inferred as there are no type annotations, and the error messages leave much to be desired, but it works. Here’s a simple example module:

module example_module

/* A function that adds 2 to an integer.  If you try to call this
   with a float you'll get a type error.
*/
add2 x = x + 2

// a basic ADT, just no-argument type constructors:
type even_odd = Even | Odd

// Convert an integer to the basic ADT:
even_or_odd x = match x % 2 with
    0 -> Even
  | _ -> Odd

It’s written in Erlang using leex and yecc for parsing and the cerl module to build a Core Erlang AST for compilation by Erlang’s compile module. I had to build the docs for cerl directly as they don’t seem to be generally hosted anywhere. If you’re curious and want to explore, you can find the source for cerl in your Erlang distribution’s compiler lib. For OSX, Erlang 18.3’s compiler source installed by Homebrew on my machine lives in the following folder:

/usr/local/Cellar/erlang/18.3/lib/erlang/lib/compiler-6.0.3

I used edoc to generate API docs and keep them around locally for reference.

The MLFE repository has some basic getting started help in the README but the user experience is rough around the edges to put it mildly. Those of you familiar already with Erlang will note the distinct (temporary) absence of binaries and maps as well.

Why

I started working on this for two reasons: I wanted something like the various forms of ML ADTs to be available on the Erlang VM along with static typing (I like the concision and the help), and I wanted to learn more about how type systems are implemented.

Much of the ADT requirement can be satisfied by atoms and tuples in Erlang proper but it always ends up feeling a bit cumbersome, and while dialyzer is great, it feels like I have to work pretty hard to properly constrain contracts to what I really want. None of this stops me from reaching for Erlang when it’s a good fit; it’s just a friction point I wanted to try to smooth out.

How

Critical to this happening at all were the following resources:

What’s Next

For version 0.2.0 it’s all mostly low-hanging fruit:

  • binaries
  • maps
  • quoted atoms
  • a test form/macro
  • do notation/side effects

I’d like to write a basic rebar3 plugin for it as well to simplify working with Erlang since so much functionality is going to depend on mixing with it for the time being. Longer term the type inferencer needs to be substantially reworked and possibly even rewritten. Specific goals there:

  • logging of typing decisions tied to reference cells and binding points (function and variable names)
  • proper garbage collection of reference cells, probably abandoning processes for this and using something like ETS
  • always generalizing type variables

The incomplete short-list of things I’d like to see in 1.0.0:

  • records with structural pattern matching. I want to be able to match on a subset of fields.
  • type annotations
  • anonymous functions
  • pattern matching in function definitions, e.g. f (x, y) = x + y to destructure a tuple without an explicit match ... with ...
  • an Emacs mode so I don’t have to manually indent things
  • something like go fmt. My intention is to integrate comments with the AST partially to enable this.

Contributing and Discussing

Contributions are welcome subject to the licence and code of conduct as specified in the project repository. I lurk in #erlounge on freenode as j14159 and I’m on Twitter with the same username.

How I Made Broken ADTs

I’ve made some progress on ADTs in my toy BEAM language and have discovered an issue that initially turned up with recursive types but would be exhibited by any type that references another user-defined ADT. Spoiler: the answer is to have a set of bindings from type names to their instantiated types and use these by deep-copying as with functions (h/t Pierce’s TAPL for the bit about making copies). Briefly, the following works:

type my_list 'x = Nil | Cons ('x, my_list 'x)

f x = match x with
    Nil                         -> :nil
  | Cons (i, Nil), is_integer i -> :length_one
  | Cons (i, _)                 -> :more_than_one_item

While this does not:

type recursive_tuples = int
                      | float
                      | (string, recursive_tuples)

f x = match x with
    i, is_integer i             -> :int
  | f, is_float f               -> :float
  | (key, value)                -> :pair
  | (key, (another_key, value)) -> :nested_pair

The nested tuple (another_key, value) fails to unify with the type recursive_tuples. This is because the AST node for the type definition gets instantiated to this record:

#adt{name="recursive_tuples",
     vars=[],
     members=[t_int,
              t_float,
              {t_tuple, [t_string,
                         #adt{name="recursive_tuples",
                              vars=[],
                              members=[]}]}]}

The first example works because the typer maintains a proplist from type constructors to AST type nodes, so any reference to a type constructor will be able to find a type to instantiate. The second example fails to unify because once the typer sees the tuple, it uses the nested ADT reference’s enclosed members to try to type the nested tuple. Note that the defined ADT specifies its members while the later reference to the ADT does not: there’s nothing to unify with as it appears to be an empty type.

Referring to a set of bindings for the types available to the module should solve this, along with deep copying for multiple instances. I’m tempted to just use generalization and instantiation but it may ultimately be simpler to copy and make new reference cells.

ADTs and Unification

I’ve been tinkering with a statically-typed language for the Erlang VM for the past few months; up until recently it was little more than an ML-shaped front end for Core Erlang with a type inferencer and incomplete support for basic Erlang stuff. Before fleshing out some of the required bits (like binaries), I’m working through implementing simple abstract data types. I’m aiming for something roughly analogous to OCaml’s variants or Haskell’s GADTs. Essentially I want something like the following to work:

type my_list 'x = Nil | Cons ('x, my_list 'x)

but also:

type json_subset =  int
                  | string
                  | (string, json_subset)
                  | Array (list json_subset)

It’s the latter that makes type unification a bit squirrelly and this post is mostly thinking through how I’m going to attempt it at first. It’s entirely possible I don’t fully understand the problem I’m trying to tackle yet, but as this entire project is an attempt to better understand type inference and checking I’m pretty OK with that.

For clarity, the unification function gets called currently in the following situations:

  • for items being cons’d together into a list
  • across all pattern match clauses to type the match (all patterns must unify with each other, all result types must unify as well)
  • inside each clause to ensure the types forced by guard functions match the types in the result expression
  • inside each guard, sort of a special case of function application that I can likely generalize better.
  • Erlang FFI returns. These look like pattern matches but only the result portion of each clause is checked.
  • function application.

A cleaner example to describe the problem is based on the following types:

type t = int | A int
type u = int | float
type v = int | float | (string, float)

Inferencing of basic matches already works, e.g.

f x = match x with
    0 -> :zero
  | 1 -> :one
  | _ -> :more_than_one

will type to {t_arrow, [{t_int}], t_atom} meaning “a function from integers to atoms”. What I’d like is for

f x = match x with
    i, is_integer i -> :int
  | A _             -> :got_an_A

to type to {t_arrow, [{t_adt, "t"}], t_atom} but this won’t be too hard either.

The following is a little more difficult:

f x = match x with
    i, is_integer i                 -> :int_but_could_be_one_of_three_types
  | f, is_float f                   -> :could_be_u_or_v
  | (s, f), is_string s, is_float f -> :definitely_v

The first two clauses (when unified) are shaped like a u while the third makes it a v. Unification currently works by trying to unify only two types at a time, so in a naive implementation the first and second clauses will unify to {t_arrow, [{t_adt, "u"}], t_atom} and the third will cause a unification failure because there is no type encapsulating both u and v. I think I’ve got two options, aside from saying “reorder your clauses”:

  1. Rework unify to take a list of expressions instead of two.
  2. Require a union type covering v and u in order to type the match

I think #1 is better (“just works”) but I’m going to choose #2, as at the moment the list of problems to solve before opening up the source is a bit nasty, including but not limited to:

  • type errors are nothing more than {error, {cannot_unify, Type1, Type2}} and don’t even include a line number. “User-hostile” is an understatement.
  • documentation and testing facilities are nonexistent. I’ve got some particular directions I want to go with both (mostly selfish learning stuff) before I take the lid off.
  • a truly scary number of dialyzer errors. That my type-checking language doesn’t type-check is a source of mild amusement for me at the moment.
  • pretty much zero support for compiling anything but a list of files.
  • some trivial-to-implement stuff like sending and receiving messages.
  • haven’t even begun to tackle strings-vs-lists, etc.

I’ll probably make the repository public once at least the first three items of the above list are done as right now those are the bits I’m most embarrassed about.

Extra bits for the curious:

  • implemented using leex and yecc with the cerl and compile modules to create Core Erlang ASTs and generate .beam files respectively.
  • has a basic FFI to Erlang; I haven’t yet tried calling dialyzer, so an FFI call requires a set of clauses to type its result.
  • type checker was based initially on How OCaml type checker works.
  • not much of a design behind the language at the moment. Bits of syntax come from Haskell, OCaml, and Elm; it’s grown as I’ve learned.

Recent Influences

Over a lunch the other day with Arthur I let fly a number of references that I’m sure I’ve mentioned to others as well, and thought I should collect a few of them in one place. Some of these talks, posts, and books are more recent than others but collectively they are some of the bigger influences on my developing view of what constitutes “professionalism” and on generally improving some different aspects of my life. I’ve tried to give credit to the people from whom I saw them where I remember.

By no means is this a complete list, it’s simply those pieces that immediately came to mind as being influential for me within the last couple of years.

On Being A Senior Engineer

If memory serves, I first saw John Allspaw’s post mentioned in a tweet from Coda Hale. My first reading was the cause of some immediate and serious introspection as I compared how I acted when first holding this title against Allspaw’s ideal and found myself wanting. Much of what is in his post is the current foundation of what I would consider to be professionalism and definitely contributes to some aspirational goals.

Death By 1000 Paper Cuts

I don’t recall where I first saw Julie Pagano’s post but it’s one that made me start watching how I interact with my different communities, work included. This was an excellent prompt for me to start examining how my approach at any given point is harming or helping the people around me. I still have a lot of room for improvement but this was a great beginning kick in the ass.

Leaders Eat Last

Simon Sinek’s talk (he has a book by the same name that I’ve not yet read) was recommended to me by Tyler. The way Sinek explains what makes a leader and connects that into how we form groups and work together most efficiently is what initially struck me. Tying the workings of our brain chemistry into our interactions also changed some of my thinking about how to better support and participate in the various communities I value. I’ve worked in the dysfunctional type of organization he describes and I want to work in and build the opposite.

Aurelius’ Meditations

A fairly well known book that I’ll likely be re-reading every year or two for the rest of my life. The main impacts my first reading had were making me more aware of my frustrated reactions and even anger, helping me manage them and better understand what actually matters. The results (aided by my notebook system for reflection) have been less fundamentally irrelevant frustration and better focus. A large part of what Aurelius seemed to be struggling with in my first reading was finding balance with all of the different people and pressures surrounding him. The translation I read was introduced to me by the Farnam Street blog – that Amazon link is an affiliate one for the blog (thanks to Tyler for introducing me to Farnam Street).

Speaking While Female

This NYT piece has made me much more cognizant of how much I interrupt people and I’m trying to rein this in. I’m not actually participating in a community if all I do is broadcast my opinion and try to talk over people.

Burnout and Ops

This presentation from Lars Lehtonen probably deserves watching a few times. This one is fantastic for getting some perspective on what’s actually valuable. I’m not sure who exactly I saw reference this at first but it was definitely one of Vince, Chris, or Michael.

Keeping Reasonable Hours

This post was a prompt to consider two things: what is actually valuable to me, and how can I focus on being more efficient and effective? Since restricting the volume of hours I spend directly on work and bursting only when necessary, my effectiveness definitely seems to have increased. It leaves more time for my brain to recuperate, my relationships (which also provide a form of recovery), and for me to work on generally improving things related to my career (e.g. reading computer science papers) and personal life.

Cluster Reading List

At work we use Mesos to manage a dynamic pool of resources for our Spark jobs and while I’ve also dabbled with Hadoop in the past, I’ve found my understanding of the mechanics and initial design concerns of all of the above a bit lacking. Rather than parroting the standard marketing lines, I wanted to understand the fundamental reasons and motivation behind treating a pool of machines (virtual or otherwise) as an anonymized pile of aggregate CPUs and RAM for many different types of work. A non-exhaustive list of the things I want to understand:

  • How are larger jobs broken down and parallelized?
  • How are parts of jobs made to coexist in an environment of constantly changing demands?
  • How do we ensure full utilization of extant resources and do we need to in a cloud environment as opposed to simply scaling the resources?
  • While I understand the basics of being concerned with “data locality” (get the computation as close as possible to the data), what are some practical ways to achieve this in a general way, applicable to many applications?

In hopes of answering these (and other) questions, I started to compile this reading list of papers that seemed like they would help answer them. An earlier form of this list was posted as a gist here.

At present I have completed the Google File System, MapReduce, and Microsoft Dryad papers, all of which I have thoroughly enjoyed reading and they have already contributed to a better understanding of how I should be approaching Spark.

The List

This list was built primarily by crawling through the references sections of a few different papers. Some of these have come by way of suggestions from Omar and Michael as noted below. In pursuit of a reasonable reading and study plan for myself, I tried to lay the material out in something of a logical path from early successes to more recent ideas and implementations.

If you feel I’ve left out anything particularly awesome or am incorrect in any assumptions/descriptions with respect to the later papers, I’d love to hear about it and will update my list accordingly.

Fundamentals

A brief list of papers and ideas that I thought would give me something of a foundation in what the main problems, concerns, and early solutions were.

  1. Google File System – most of the later material references this paper and it seems like this enabled a lot of other ideas later on. The distribution of chunks of a file around a cluster enables a lot of interesting data locality stuff.
  2. Google MapReduce – a pretty lucid and well thought out divide-and-conquer approach to parallelizing a huge job. The functional programming part of me loved the LISP shout-out/inspiration.
  3. Dryad for a more fine-grained/raw approach to programming against a cluster.
  4. Quincy is, from what I understand so far, a more detailed look at an alternative scheduler for Dryad.
  5. Delay Scheduling for a detailed look at a different Hadoop scheduler. Note the involvement of Matei Zaharia who later worked on both the Mesos and Spark projects.

Cluster Schedulers

Just the two big ones here, unfortunately there don’t seem to be any papers out there on Google’s Borg project.

  1. Mesos as mentioned above is what I currently use to manage Spark resources.
  2. Omega is (was?) a project whose aim is/was to fix shortcomings in Google’s existing cluster scheduling/management system (assuming Borg).

More Programming Models

  1. DryadLINQ is a higher-level programming model built on top of Dryad. Composition FTW.
  2. Pregel a graph processing paper from Google. References Dryad as well but I’m not sure how close it is as I haven’t read it yet and it seems to be less about dataflow.
  3. MapReduce Online for an iterative MapReduce.
  4. Distributed GraphLab sounds like another graph-like computation model but one that embraces asynchrony and parallelism a bit more. Possibly belongs in the next category but I’m not sure yet.

Data Flow

  1. MillWheel as recommended by Omar talks about stream processing but does appear to tackle the problem with a form of computation graph that looks like a dataflow approach.
  2. Naiad is relatively recent from Microsoft Research (2013) and intriguingly mentions “cyclic dataflow” (also via Omar).
  3. Derflow is about “deterministic dataflow” with Erlang. Sounds too good to pass up.

Big Ideas

  1. Condor is referenced both by MapReduce and Dryad as an inspiration for cluster management approaches so I’ve added it to get a bit more perspective at the end.
  2. The Datacenter as a Computer (thanks to Michael for the reminder) is a bit over 100 pages but seems to cover a lot more of Google’s general approach to clusters and more.

IX and the Throughput/latency Tradeoff

Since Saem put me onto the Coursera “Learning How To Learn” course and Tavis (as well as Tyler iirc) pointed me at Scott Young, I’ve been trying to figure out how to better approach learning all of the material I try to throw at myself.

This coming Monday for the Polyglot reading group we’re covering the IX protected dataplane OS paper which describes a set of tradeoffs that operating systems must make when deciding how to handle IO (network in this case).

While initially skimming the paper to try to figure out how I might chunk it to build a more cohesive understanding, I realized that I had an intuitive grip on the tradeoff between throughput and latency but I’d never tried to properly define it for myself. Being able to clearly define something is, I think, a mark of actual understanding and builds the ability to leverage it. I figured this would be a good exercise in that spirit and both Dr Oakley and Scott Young seem to agree from what I’ve seen so far.

My Attempt

Describe throughput, latency, and the tradeoff between them as generally as possible but with a view to how they pertain to the IX paper.

Throughput

Is a measure of how much uninterrupted work can be done.

Latency

Is a measure of how long it takes from submission of some work until its completion.

I think “submission” is likely interchangeable with any of the following if you prefer:

  • definition
  • creation
  • identification

The Tradeoff

In a system with limited capacity to do work, latency is proportional to the ability to interrupt existing work. If interruption of current work is not permitted, meaning existing work proceeds with no breaks, the latency is equivalent to the time it takes to complete the pre-existing work because the system cannot do the new work until it finishes the existing work. If interruptions are permitted without restriction, throughput is entirely bound by the frequency of interruptions.

In a system that preempts work, the balance is struck by limiting the frequency of interruptions.

In a cooperative system, the balance is struck by yielding control after an allotted amount of work is done (note that this depends on altruistic workers).

Next

I’m going to try to apply this same sort of approach for simpler base concepts. I think it’s roughly in line with the Feynman technique albeit simplified for the moment perhaps.

If you disagree with my definition with respect to the IX paper, I’d love to hear about it. As mentioned I’ve only skimmed the paper so far and so it’s entirely likely my definition and/or understanding will change a bit as I dig in deeper.

Update

Thanks to Saem poking holes in the above definition, I’ve started to refine it a bit. The primary issue he turned up is that nowhere did I specify a cost for the interruptions required by latency and this further turned up an error in my definition of latency. I’ve adjusted the definition of latency and the tradeoff as follows:

Latency

Is a measure of the time from submission of work until some acknowledgement that the work is proceeding. Servicing latency requires a non-zero amount of work because even the minimum expected response, the acknowledgement itself, is work.

Note that this definition is primarily concerned with responsiveness rather than completeness of the submitted work and that latency consumes some measure of available throughput since it requires work.

The Tradeoff

A system with a limited capacity to do work, like a computer, is one with a bounded or static throughput. In order for this system to provide the lowest possible latency, it must be able to interrupt existing work without restriction and consume throughput for at least acknowledgement of the receipt of new work. On the other hand, to provide the highest possible throughput the system must prevent interruptions that also wish to consume its limited resource in favour of any existing work’s use of it.

Becoming Better

What harms us is persisting in self-deceit and ignorance.

Marcus Aurelius

This post is an attempt by me to describe a few techniques/practices I’m trying to use in the pursuit of becoming better. What I’m trying to improve is my knowledge, its application, my general behaviour, and consistency in all of these things.

The root of this endeavour can probably be traced back to Saem asking me which notebook system I use, as we and a few other friends suffer from a minor notebook addiction (Paper-ya is dangerous). At the time I had never heard of bullet journalling nor any of the other systems, and none especially matched how I tended to approach note taking. After more recently discussing the concept of the non-zero day with Tyler, he sent me a link to a modern copy of Benjamin Franklin’s journal. Reading the Farnam Street post on common place books was the final missing piece for me.

I’m roughly 3 weeks into this new process and am finding it quite useful. After ranting to several people about it, Omar suggested I document what I’m doing, and here we are.

Better?

To be more specific, I’m defining a “better me” as one who:

  • creates more things (efficiently, valuing completeness)
  • wants to hear opposing view points and ideas in order to find the truth faster
  • has more knowledge
  • has more patience
  • is basically kinder and more supportive.

I think the last point is something many of us might define differently. I do not at all think that kindness means blanket acceptance of everything somebody else desires and neverending sympathy but rather showing everyone common decency, basic respect, and an interest in understanding where another person is coming from (their experiences, goals, limits).

The Books

In this pursuit of a better me, I use three different notebooks.

The Franklin Journal

As the reproduction I mentioned above sold out quickly and I already have more notebooks than I know what to do with, I’m using a small pocket notebook for this. The format is essentially the same as Franklin’s with some small modifications, one of which is that each week gets a page for “lessons learned last week”. This is just a space for me to detail any large overarching themes or things I discovered. The chart tracking my failures at the 13 virtues on a daily basis is the same as in the reproduction.

As in Franklin’s original, each day gets two dedicated pages:

  1. The Plan, a list of non-work goals I have for the day. This is another difference in that I don’t use a set schedule as Franklin did; between the tools for my job and Google Calendar I have enough to track and schedule based on hours, etc.
  2. A set of “goodness” goals for the day.

The Plan

Everything in my daily plan is related directly to my own personal goals and improvement. This can include everything from dedicating time to a project (this blog post has been in a couple of Plans) to exercise and basic housework that needs doing. What occurs in every Plan is reading.

Each daily plan has a goal of at least 40 pages read from the book I’m currently working on as well as at least one longer form blog post or paper. This has been very helpful in keeping my pace up and Charles recently told me he similarly blocks off reading time in his calendar. For blog posts and papers in this context, I try to look for things outside my normal areas of experience, such as articles on Model View Culture or even completely different areas of study such as (today) Equality of What. That last one is a bit longer and more difficult for me so I’m dedicating a few days to completely digest it.

Each item I complete gets a check-mark and each item I fail to complete gets an X.

Doing Good

As in Franklin’s journal, this page is always titled “what good will I do today?”. I’m treating it as an exercise in reflecting on how I can make a positive contribution to the communities I participate in. Based on an article on gratitude posted recently by Geordie, I’ve lately been trying to integrate expressing gratitude to others as part of this.

At the end of the day I add “what good did I do?” to the middle of this page and list what actions I have taken to positively impact the communities and people around me.

Failing

All of the above likely sounds relatively strict. An important part of this so far for me has been to set the bar a bit high and forgive myself for failing. It’s an ongoing process of learning about how I spend my time and what I prioritize. The Franklin journal helps me measure my progress so that I can see when I’m lapsing and when I’m improving – a continuous self-intervention. The great thing about the Plan is that even if I don’t complete everything, it’s an easy way to make sure I have a non-zero day (or non-one, non-two, etc).

While I try to start every day by setting a Plan and Goodness goals, I have failed to do this a couple of times as well. When this occurs, I borrow Omar’s practice of performing a personal retrospective at the end of the day. As in the agile software development side of things, I simply describe 2-3 things I thought went well and 2-3 things I know I could have done better. The key for me has been to keep the aspects of reflection and measurement unbroken.

My Common Place Book

I originally came across mention of this in Farnam Street’s post “How Do People Get New Ideas?” which linked to a post on combinatorial creativity. Before I knew the term “common place book”, this notebook started as a place for me to make general observations and notes about the papers I read and rough outlines of ideas and presentations.

It still serves these purposes but has become more focused on the following as well:

  • Specific ideas and quotes from texts. The quote I opened this post with is an entry, along with a few thoughts about it and some of the surrounding text.
  • My thoughts on the portions of texts I extract or sections of things I’ve read. If I disagree with something I read in a blog post, I expand on it in this book and come back to link other supporting or dissenting ideas and opinions to it.
  • Trying to synthesize a larger understanding of these items I latch onto. Reading Meditations was where this practice started to really take shape for me but elements of it helped earlier this year when reading AIM-349. I suspect this sort of thing is second nature for those with more of a background in serious study.

My Project Book

This is a bit of a side channel for possible product ideas and specific projects I want to plan in a bit more depth, all software related at present. Here I try to describe ideas for projects more fully, sometimes mocking/sketching out what aspects of them might look like. These descriptions and sketches are a minor form of private feasibility study or even gut check and have led to me discarding some ideas before wasting too much time on them. Because this book often distills thoughts and research items from my common place book, it tends to fill at a slower pace than the other two notebooks.

Tools

For those also suffering from notebook addictions, here’s what I’m actually using:

  • My Franklin journal is a lined Leuchtturm pocket notebook. The size is pretty much perfect for the amount I write.
  • My common place book is (embarrassingly?) a swag notebook from F5 made by Journalbooks, I believe this is the model. As it’s almost full now, I have a dotted Leuchtturm replacement ready to go.
  • My project book is a grid Works notebook. I like the rubberized cover but the Leuchtturm and Rhodia papers are nicer.

If you care about pens, I’m pretty simple. I stick to the Zebra F-301 currently with 1.0mm refills.

Akka and JDBC to Services

I started mapping out this post over a month ago after I saw another question about how to handle blocking database calls on the Akka mailing list. I’ve been working towards most of what follows over the last year and after discussing the basics with a few others in the Vancouver Polyglot community with some additional prompting from Saem, I finally put this together (project link for the lazy).

This represents my current thinking (as of roughly July 2014) about a reasonable approach to the problems of state, blocking, and failure for data store interaction within the confines of Scala and Akka. This post specifically focuses on JDBC connections but is roughly applicable to any system that provides a blocking API. I define “reasonable approach” to be one that has significant enough upsides to combat its explicitly acknowledged weaknesses and negatives (I’ll get to those towards the end). The particular approach I’m outlining here is a way to structure an application such that you have some reasonable flexibility later on and some clear benefits up front, yielding a future-viable design.

Actors And Databases

The initial question often seems to be “how do I handle JDBC/blocking calls with Akka”, the answer to which almost always involves “bulkheading” which we’ll get to a bit later in this post. The issue I want to tackle first is one of state.

Database connections:

  • Are state
  • Have a lifecycle (connected, in-transaction, disconnected, etc)
  • Are prone to errors (timeouts, network partitions, etc)

Database connection lifecycles should be tied directly to an actor’s lifecycle, with errors on the connection handled by supervision. A 1:1 connection-to-DAO-actor relationship is specifically what I am advocating. I don’t think the DAO approach suffers from lack of acceptance or understanding so I’m not going to get into specifics about it here. The basics around actor-connection coupling are as follows (a minimal sketch follows the list):

  • Actors get a connection factory method with the most basic type being () => java.sql.Connection.

    This method can be constructed elsewhere, closing over the necessary immutable state required to instantiate a connection, e.g. JDBC URL.

  • On any crash, the connection simply gets closed and the replacement actor instance will lazily recreate its required connection. Supervision strategies and preRestart() behaviour can be easily tailored to not always kill the connection as this default behaviour is obviously not always desirable. Tearing down the connection and starting a new actor instance on a column index error is clearly extreme.
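
As a rough sketch of what that coupling can look like: the UserDao actor, FindUser message, and query below are hypothetical placeholders, not code from the example project, but they show the connection factory and the lazy reconnect-after-crash behaviour.

import java.sql.Connection

import akka.actor.Actor

// Hypothetical message for this sketch; a fuller protocol appears later in the post.
case class FindUser(id: Long)

/* A DAO-style actor that owns exactly one JDBC connection, created lazily from a
   factory and closed when the actor stops. The default preRestart calls postStop,
   so a crash also tears the connection down and the replacement instance lazily
   reconnects on its next message. */
class UserDao(connect: () => Connection) extends Actor {
  private var conn: Option[Connection] = None

  private def connection: Connection = conn.getOrElse {
    val c = connect()   // the factory closes over immutable config such as the JDBC URL
    conn = Some(c)
    c
  }

  override def postStop(): Unit = {
    conn.foreach { c => try c.close() catch { case _: Exception => () } }
    conn = None
  }

  def receive = {
    case FindUser(id) =>
      val stmt = connection.prepareStatement("SELECT name FROM users WHERE id = ?")
      try {
        stmt.setLong(1, id)
        val rs = stmt.executeQuery()
        sender() ! (if (rs.next()) Some(rs.getString("name")) else None)
      } finally stmt.close()
  }
}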

Anyone who has deployed Erlang/OTP or Akka applications into production will have noticed the problem that I have just introduced: Akka actors (like processes and gen_servers in Erlang) by default have an unbounded mailbox. Since we’re making blocking calls inside the database actor, when a burst of requests occurs they will all queue inside the actor’s mailbox without any sort of back pressure. Parallelism partially helps to solve this problem.

Parallelism and Pooling

Release It! advocates against writing one’s own connection pool and in general I think Mr Nygard puts it fairly well:

…I oppose anyone rolling their own connection pool class. It is always more difficult than you think to make a reliable, safe, high-performance connection pool.

Having said that, I will now happily violate his basic rule as well as Chris Stucchio’s position by saying that the use of a router and the previously mentioned supervision strategy applied to a pool of actors is reasonably sound, safer due to less sharing of mutable state (compared with a traditional connection pool), and performant enough (with caveats to follow).

UPDATE: James Roper was kind enough to correct some mistaken assumptions on my part with respect to connection pooling below. I’ll be giving some thought over the next while as to a good way to leverage connection pooling within this approach.

Using supervision strategies to enable both configurable and adaptable responses to errors is better than an opaque connection pool. Seth Falcon’s pooler project (hat tip to James for suggesting it to me) is where I first started to understand how useful and fundamentally sane this approach can be. While I think it is safe to assume that JVM connection pool implementations these days are fast enough and avoid things like synchronized and heavy locks, passing around a reference to such an object containing so much mutable state is at odds with much of what makes both Scala and Akka effective.

It is important to note that a realistic supervision strategy would not cause the actor in question to restart on every exception, discarding the existing connection and creating another. If the connection itself has no problems, “Resume” is a perfectly legitimate response from the strategy in many cases. Your strategy must be tailored to your particular problem and requirements.
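
As a rough sketch of such a tailored strategy, a parent of the DAO actors might look like the following; the exception choices here are purely illustrative and not a recommendation for every driver.

import java.sql.{SQLException, SQLRecoverableException}

import scala.concurrent.duration._

import akka.actor.{Actor, OneForOneStrategy, Props}
import akka.actor.SupervisorStrategy.{Restart, Resume}

/* Only connection-level failures restart the child (discarding its connection via
   postStop); anything else resumes it with the connection intact. */
class DaoSupervisor(daoProps: Props) extends Actor {
  override val supervisorStrategy =
    OneForOneStrategy(maxNrOfRetries = 10, withinTimeRange = 1.minute) {
      case _: SQLRecoverableException => Restart // connection is suspect: rebuild it
      case _: SQLException            => Resume  // query-level problem: keep the connection
    }

  private val dao = context.actorOf(daoProps, "dao")

  def receive = { case msg => dao.forward(msg) }
}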

Those in the audience paying attention will note that even though we now have parallel handling of requests, we have still not solved the actual queue problem. Two of the simplest solutions are:

  • Limit the number of requests in flight with a different actor mailbox type: akka.dispatch.BoundedMailbox. The downside here is that message sends are now blocking, which may add to throughput issues (a config sketch follows this list).
  • Use proper timeouts with the “ask” pattern and a circuit breaker to fail faster and back off. More on this approach later.
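
A sketch of the bounded-mailbox option, with an illustrative name and sizes; remember the caveat above that a send to a full bounded mailbox blocks for up to the push timeout.

import com.typesafe.config.ConfigFactory

object BoundedMailboxExample {
  // This section needs to be part of the ActorSystem's configuration, as with the
  // dispatcher configuration later in the post.
  val config = ConfigFactory.parseString(
    """
      |jdbc-bounded-mailbox {
      |  mailbox-type = "akka.dispatch.BoundedMailbox"
      |  mailbox-capacity = 100
      |  mailbox-push-timeout-time = 5s
      |}
    """.stripMargin)

  // Attach it to the DAO actors from the earlier sketch, e.g.:
  //   system.actorOf(Props(classOf[UserDao], connect).withMailbox("jdbc-bounded-mailbox"))
}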

Bulkheading

“Bulkheading” is a pattern used to protect one set of components from negative influence by one or more different sets of components in your system. Here we’re specifically concerned with preventing the set of actors making JDBC calls from using up all of the compute resources (threads) used for other operations, e.g. responding to web requests, recording metrics, reporting on JVM stats, etc.

As an example, let’s suppose you have a simple application supporting a couple of basic GET requests:

  • /user/:id to get a specific user object from the database by ID
  • /ping to check if the system is alive

Further, let’s assume you’re using something like the Spray library and thus have actors servicing HTTP requests. If your single Akka dispatcher has a maximum of N threads and M parallel user requests are in flight at the same time, then whenever M >= N any call to the /ping endpoint will block without a response until one or more of the user requests complete. Lest you think that success or failure of these calls is guaranteed to be fast (or even to occur at all), I encourage you to read Release It! at a minimum.

In order to bulkhead everything else from the JDBC actors, we create a different dispatcher for the database actors and spawn them only on this separate dispatcher. Now the non-database components will not be affected when the database actors have blocked all of their threads. Note that this is not a panacea – should all of the database actor threads be blocked, any component that relies on this pool of actors will be prevented from making progress until earlier calls complete.

Dispatcher configuration:
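
Something along these lines: a dedicated, bounded thread pool for the blocking JDBC actors. The dispatcher name and pool sizes here are illustrative only, not recommendations.

import akka.actor.ActorSystem
import com.typesafe.config.ConfigFactory

object JdbcDispatcherConfig {
  val config = ConfigFactory.parseString(
    """
      |jdbc-dispatcher {
      |  type = Dispatcher
      |  executor = "thread-pool-executor"
      |  thread-pool-executor {
      |    core-pool-size-min = 4
      |    core-pool-size-max = 16
      |  }
      |  throughput = 1
      |}
    """.stripMargin)

  // Merge the dispatcher section into whatever configuration the system already loads.
  val system = ActorSystem("example", config.withFallback(ConfigFactory.load()))
}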

And a minor change to our pool with a router:
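
A sketch of the shape this takes, reusing the hypothetical UserDao and the jdbc-dispatcher name from above; the JDBC URL and pool size are illustrative.

import java.sql.DriverManager

import akka.actor.{ActorRef, ActorSystem, Props}
import akka.routing.RoundRobinPool

object JdbcPool {
  def create(system: ActorSystem): ActorRef = {
    val connect = () => DriverManager.getConnection("jdbc:postgresql://localhost/app")
    // The routees run on the dedicated jdbc-dispatcher so their blocking calls
    // cannot starve anything else in the system.
    system.actorOf(
      RoundRobinPool(8).props(
        Props(classOf[UserDao], connect).withDispatcher("jdbc-dispatcher")),
      name = "user-dao-pool")
  }
}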

This is just the beginning of isolating and protecting different pools of actors and tasks (e.g. futures). While I don’t want to get too deep into tuning dispatchers and prioritizing access for different components, you should be aware that there are a variety of ways to affect quality-of-service for clients of your actors. Here are some of the simplest places to start:

  • The number of threads per dispatcher (with high and low thresholds) can be adjusted as needed. If you need more throughput, make more actors and give them correspondingly more threads. Don’t forget that running more connections has its own penalties and limitations.
  • A “pinned” dispatcher can be used wherein each actor gets its own thread all to itself.
  • For higher-priority clients, give them their own larger pool of actors on yet another dispatcher if necessary (almost last resort I think).

DAOs

Those of you with some experience will note that I’m actually conflating DAO and Aggregate Roots for reasons that will become more clear shortly.

Each aggregate root gets an actor implementation that provides all of the necessary CRUD functionality or whatever subset is needed (a DAO with a message-passing interface). In addition I favour creating unique case classes for each type of request that the DAO actor services rather than more ambiguous strings or tuples. All interaction with the actor is performed via the ask pattern.
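
For example, the message protocol for a hypothetical user DAO might look like the following; the User model and the set of operations are placeholders for whatever your aggregate root actually needs.

/* One explicit case class per operation the DAO actor services, instead of
   stringly-typed or tuple requests. */
case class User(id: Long, name: String)

sealed trait UserDaoRequest
case class CreateUser(name: String)               extends UserDaoRequest
case class FindUser(id: Long)                     extends UserDaoRequest // as in the earlier actor sketch
case class UpdateUserName(id: Long, name: String) extends UserDaoRequest
case class DeleteUser(id: Long)                   extends UserDaoRequest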

Hiding CRUD mechanics is important because SQL or some DSL analogue appearing throughout your code and packages is a leak. Data store details are details about serialization and with very few exceptions should not be leaking into your business logic. “Save this thing”, “find things that match these criteria”, etc are not specific mechanics, they’re closer to goals that need to be satisfied. “INSERT x INTO TABLE y” is a particular that is not germane to your business logic.

Failure as a First Class Citizen

Rather than using the raw ask pattern everywhere, and informed somewhat by common Erlang approaches to gen_server interfaces, I always create a client trait in front of the DAO actors that returns Futures and can easily be turned into a standalone class or object where necessary (e.g. testing).
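
A minimal sketch of that client trait, reusing the hypothetical FindUser message from the earlier sketches; the names and return types are placeholders.

import scala.concurrent.Future

import akka.actor.ActorRef
import akka.pattern.ask
import akka.util.Timeout

/* Callers see only Futures, so the actor behind the trait can later be swapped for
   a remote implementation (or a stub in tests) without touching call sites. */
trait UserRepository {
  def findName(id: Long): Future[Option[String]]
}

class ActorBackedUserRepository(dao: ActorRef)(implicit timeout: Timeout) extends UserRepository {
  def findName(id: Long): Future[Option[String]] =
    (dao ? FindUser(id)).mapTo[Option[String]]
}

// Failure handling stays front and centre at the call site, e.g.:
//   repo.findName(42L).recover { case _: akka.pattern.AskTimeoutException => None }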

Why Futures?

  • They keep failure handling front and centre via recover() and recoverWith() (I believe Finagle may be responsible for this).
  • Explicitly asynchronous.
  • Easy to combine and parallelize via map(), flatMap(), zip(), for-comprehensions, etc.

The first point bears emphasis: futures in Scala and Finagle expose failure and doing something with that failure as a primary concern. For the simplified starting point described here, timeouts are the primary failure we’re initially concerned with. To that end, we can combine sensible timeouts with circuit breakers to help solve the queue growth problem and keep our systems responsive.

On sensible timeout values, I will defer to Pat Helland:

Some application developers may push for no timeout and argue it is OK to wait indefinitely. I typically propose they set the timeout to 30 years. That, in turn, generates a response that I need to be reasonable and not silly. Why is 30 years silly but infinity is reasonable? I have yet to see a messaging application that really wants to wait for an unbounded period of time…

By embracing failures as legitimate responses, we can begin to build systems that degrade gracefully and continue to function in the face of most problems that occur (see most of what has been written by the Netflix team). This point cannot be overstated: when talking to a data store, you are explicitly making calls to an external and very probably remote system of some sort and thus failure must be accounted for and must be handled.

Smells Like A Service

When constructed with the above approach, an application has effectively been built as a set of internal services. Since all of the clients talk to the internal services via a mechanism that expects failure and encourages its proper handling, you can simply swap out the implementation behind the scenes for remote services whenever such growth is necessary, e.g.

  • Dispatch HTTP calls already return futures.
  • Netty or Akka TCP clients writing to a Promise.
  • Actors talking to external services via RabbitMQ, 0MQ, etc (though not what I would generally recommend).

It is far simpler to move functionality to an external application (separate service) in this manner than with a more traditional DAO or ORM approach.
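
As a sketch of that swap, the actor-backed repository from earlier could be replaced by something HTTP-backed without changing the trait; the get function here stands in for whatever async HTTP client is in use (Dispatch, etc.).

import scala.concurrent.{ExecutionContext, Future}

class HttpUserRepository(get: String => Future[String])(implicit ec: ExecutionContext)
    extends UserRepository {
  // Callers are unchanged: they still receive a Future and still handle failure with
  // recover/recoverWith; only the failure types differ (HTTP errors and timeouts
  // instead of ask timeouts).
  def findName(id: Long): Future[Option[String]] =
    get(s"/users/$id/name").map(name => Option(name))
}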

Explicit Negatives

  • Using the ask pattern means that responses from the actor are typed as Any, which effectively bypasses type checking. One solution with its own issues is to send an already constructed promise with your message to the DAO actor using tell/!. The DAO actor could then write a success or failure to this promise, yielding proper type safety but complicating timeouts.

    If you rely entirely on the Akka circuit breaker implementation, the problem can be mostly solved. Since the circuit breaker will return a failed future after a timeout you specify, you can send an already constructed Promise to the actor and give the circuit breaker the result of Promise.future. Because the circuit breaker does not have access to the promise you constructed, you should have no problems with a race to write a result to it on low timeouts and/or slow actors (see the example project for this method; a rough sketch also follows this list).

  • It should be fairly obvious that with complex problem domains and a lot of models the actor and thus connection count can get high. When this occurs, you already have an interface that looks like a service and provides reasonable support for failure handling. Move larger and/or high-traffic aggregate roots out of the application into actual separate services, swapping out the functionality behind the Future-based API without changing the interface. The only difference to callers should be at most some different exception classes on failure and completion times.

    While I think externalizing services like this is generally a great idea, exercise caution. Move bounded contexts out first and don’t forget that when you move to an external service, you add a set of latencies in serialization, deserialization, and additional network traffic. Ensure that you both measure and account for this.

  • Connections are hidden so the functionality provided by your DAO actors must be complete (no arbitrary SQL). You can get around this by building an actor that takes closures or something along those lines as transaction blocks. See Querulous for this sort of idea, but I think this is missing the point in the context of this particular discussion.
  • Transactions that span multiple aggregate roots are essentially impossible in any traditional sense (those that are handled entirely inside the database), as they also are in most honest distributed system designs. People like Pat Helland and Peter Bailis amongst many others are doing very interesting work in this area (apologies to the many deserving people I’ve left out here, too many to name).
  • Timeouts are not guaranteed to trigger failure exactly when you want them to (it may take longer).
  • The router + actor approach introduces aspects of queue growth as described above so there is an obvious need for back pressure and capacity planning. Circuit breakers help mitigate this but they’re not a silver bullet.
  • Boilerplate.
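
A rough sketch of the promise-plus-breaker approach mentioned in the first item above; the message and class names are hypothetical and the real version lives in the example project. The DAO actor would complete the supplied promise instead of replying to sender().

import scala.concurrent.{Future, Promise}

import akka.actor.ActorRef
import akka.pattern.CircuitBreaker

case class FindUserWithReply(id: Long, reply: Promise[Option[String]])

class CircuitBrokenUserClient(dao: ActorRef, breaker: CircuitBreaker) {
  def findName(id: Long): Future[Option[String]] = {
    val reply = Promise[Option[String]]()
    dao ! FindUserWithReply(id, reply)
    // The breaker never writes to our promise, so there is no race over who completes
    // it; on timeout (or while open) it hands the caller an already-failed future.
    breaker.withCircuitBreaker(reply.future)
  }
}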

The Immediate Benefits

There are three particular benefits for any system, especially if done in the early stages of its life:

  • APIs that are asynchronous, composable, and failure-centric.
  • Easily configurable parallelism and limits.
  • Comprehensible subcomponents (the most important benefit of service oriented architectures whether micro or otherwise).

At present I think the above strengths combined with the relative ease of partitioning a system into other more granular services later on (not to mention CQRS and event sourcing) make this a reasonable approach to designing a system. While I’m not 100% happy with the boilerplate feeling I get at times, the concrete isolation of the models into their initially internal services has definitely helped me already to keep things clean and understandable. Having said that, it is far from lost on me how much less code it takes to do the same thing in Erlang with the caveat that I have yet to see something comparable to futures in for-comprehensions there (please correct me if I’m missing something like this!). If you missed the two links above, my example project using the send-in-promises approach is available here.

Thanks to Saem and Ed for proofreading and pointing out where more clarity was necessary. Comments, criticism, and general discussion are welcome in the comments or on Twitter.