FinagleCon

FinagleCon was held at Twitter HQ in San Francisco. It is refreshing to see a nice working atmosphere with free food and drinks. Now for the contents.

Twitter’s RPC framework, Finagle, has been in production since August 2010 and has over 140 contributors. In addition to Twitter, it has been adopted by many large companies such as SoundCloud. Initially written in Java with FP constructs (monads, maps, etc.) all over, it was soon after rewritten in Scala.

Finagle is based on three core concepts: Simplicity, Composability, and Separation of Concerns. These concepts are shown through three primitive building blocks: Future, Service, and Filter.

• Futures provide an easy interface to create asynchronous computation and to model sequential or asynchronous data-flows.
• Services are functions that return futures, used to abstract away (possibly remote) service calls.
• Filters are essentially decorators and are meant to contain modular blocks of re-usable, non-business logic. Example usages are LoggingFilter and RetryingFilter.

The use of Futures makes it easy to test asynchronous computations. Services and filters can each be created separately, each containing its own specialized logic. This modularity makes it easy to test and reason about them in isolation. Services and filters compose easily, just like functions, which makes it convenient to test chains. Services and filters are meant to separate behaviour from domain logic.
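The composition described above can be sketched without Finagle itself. What follows is a minimal Haskell sketch, not Finagle's Scala API: IO stands in for Future, and the names lengthService, loggingFilter, countingFilter, and demo are hypothetical.

```haskell
import Data.IORef (IORef, modifyIORef, newIORef, readIORef)

-- A service is a function from a request to an asynchronous reply.
-- Assumption of this sketch: IO plays the role of Finagle's Future.
type Service req rep = req -> IO rep

-- A filter decorates a service and yields a new service.
type Filter req rep = Service req rep -> Service req rep

-- Hypothetical service holding the "business logic".
lengthService :: Service String Int
lengthService = pure . length

-- Hypothetical filter: records each request, then delegates.
loggingFilter :: IORef [String] -> Filter String rep
loggingFilter logRef service req = do
  modifyIORef logRef (req :)
  service req

-- Hypothetical filter: counts calls, independent of request and reply types.
countingFilter :: IORef Int -> Filter req rep
countingFilter countRef service req = do
  modifyIORef countRef (+ 1)
  service req

-- Filters stack like ordinary function application.
demo :: IO (Int, [String], Int)
demo = do
  logRef   <- newIORef []
  countRef <- newIORef 0
  let service = loggingFilter logRef (countingFilter countRef lengthService)
  reply  <- service "ping"
  logged <- readIORef logRef
  count  <- readIORef countRef
  pure (reply, logged, count)
```

Because each filter only sees a service and returns a service, either filter can be tested on its own against a stub service.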

As amazing as Finagle is, there are some things one should be aware of. To create a really resilient application with Finagle one has to be an expert in its internals. Many configuration parameters influence each other, e.g. queue size and time-outs. With a properly tuned setup Finagle is fast and resilient (the defaults are good as well, mind you). As most data centres are heterogeneous in their setup, faster machines are added to the pool, and other conditions change, one has to pay continuous attention to the tuning in order to maintain optimal performance.

Some general advice: watch out for traffic amplification due to retries, and keep your timeouts low enough that a retry is useful, but not so low that you introduce spurious timeouts.
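To get a feeling for how quickly retries amplify traffic, here is a small back-of-the-envelope sketch in Haskell. It assumes a simplified worst case that was not part of the talk: a call passes through several layers of services, every layer retries independently, and every attempt fails until the last one. The function name worstCaseCalls is hypothetical.

```haskell
-- Worst-case number of calls that reach the deepest service when each of
-- `depth` layers makes one attempt plus up to `retries` retries.
worstCaseCalls :: Integer -> Integer -> Integer
worstCaseCalls retries depth = (1 + retries) ^ depth
```

Under these assumptions, two retries per layer across three layers already gives worstCaseCalls 2 3 = 27 calls at the bottom for a single call at the top.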

For extra points, keep hammering your application until it breaks, find out why it breaks, fix it, and repeat.

The future

In addition to this heads-up we were also given a nice insight into what is coming up for Finagle.

In order to make more informed decisions, we will get a new Failure type which contains more information instead of ‘just’ a Throwable. In this new Failure, an added field indicates whether it is safe to retry.

There are several issues with the current way of fine-tuning Finagle. As mentioned, you need to be an expert to use all the configuration parameters properly. In addition, the configuration is static and doesn’t take into account changing environments and the behaviour of downstream services. Finally, because the tuning of the parameters is tightly coupled with the implementation of Finagle, it is hard to change the implementation significantly without significant re-tuning.

In order to battle the last two points, Finagle will introduce Service Level Objectives (SLOs). An SLO is a higher-level goal that Finagle should strive to reach instead of low-level hardcoded parameters. What these SLOs will be exactly is not yet known.

The community

The Finagle team will synchronize the internal Finagle repository with the GitHub repository every Monday. They will strive to publish a snapshot version of the change as well.

For someone looking to write his own protocol to connect to his service, finagle-serial is a nice project to start with. It is small enough to grasp within a day but big enough to be non-trivial.

It was found that the ParGCCardsPerStrideChunk garbage collection option, available from 7u40, can halve GC times on large heaps. It is recommended to try this parameter. Tuning seems to be hard to do and is generally done by copying a ‘known good set’ of parameters.

Scrooge is a good utility to use for Thrift and Scala as it is aware of Scala features such as Traits and Objects and can generate relevant transformations for them.

When you want to connect to multiple data-centres from a single data-centre, you can use LatencyCompensation to account for the latency between them.

Orchestration

The philosophy behind docker is that in order to be solved, a large problem has to be divided into its root problems. One can then proceed by solving every one of these problems step by step. Additionally all elements of the solution need to communicate through a common app.

Docker has always been a tool with a single purpose: the creation, transport, and running of images. Until today there were several issues with docker that made using it somewhat trying at times. It lacked capabilities for orchestration, which can be categorized as:

1. Installation of a docker host from scratch;
2. Clustering of multiple docker hosts to spread resource utilization over the cluster;
3. Managing inter-container dependencies at runtime.

Today this changed as Docker Inc. announced a new set of tools.

Provisioning: Machine

Machine provides a one step installer for creating a new docker host on your local machine, a publicly hosted cloud, or a private cloud. It will automatically provision a new machine and set the environment variables such that any following docker command runs on the newly created host. This is very similar to what boot2docker provides.

There are several engines for provisioning in different platforms such as:

• VirtualBox
• VMware
• AWS
• Microsoft Hyper-V

Clustering: Swarm

Ideally you want to control a cluster of docker hosts with the same interface as you control a single host. In other words the interface needs to be transparent or standardized. With swarm you can.

All existing commands on docker work with the swarm as well. Just point your docker binary to the swarm proxy and you are controlling the cluster instead of one single machine. Swarm is location/data center aware and also incorporates resource management. The default strategy is to use as few hosts as possible. The strategy places several lighter containers on the same node in order to reserve other nodes for heavier containers.

The main features are:

1. Resource management
2. Scheduling honoring constraints
3. Health checks on the cluster and nodes
4. Supporting the entire docker interface

Additionally, Mesos can be used to provide the scheduling. Docker Inc. also announced that Mesos will be a first class citizen in Docker. The goal is to be able to run docker containers alongside other Mesos jobs in the Mesos cluster.

It appears that swarm is not supported yet by machine, sadly.

Managing inter-container dependencies: Compose

Setting up applications that require multiple containers to function correctly is difficult. Keeping them running is even harder. Docker proposes Docker Compose.

Traditionally, docker ran on a single machine and, until today, orchestration needed to be done manually or through external tools.

Docker Hub

Docker Inc. also announced an enterprise version of the Docker Hub. It is able to run wherever the enterprise needs it to run and comes with safe 1-click upgrades. Enterprises are adopting containers as development is up to 30 times faster with half the error rate.

Some fun facts:

• 100000 contributors to docker hub
• 157 TB of data transmitted each month
• 50 TB of data stored

The timeline for 2015:

1. Increase performance of pulls
2. Increase transparency by adding and improving on status pages
3. Engage in partnership with Microsoft. Most notably this will result in being able to run Linux on Microsoft Azure.

Notes on the Advanced Akka Course

The Advanced Akka course is provided by Typesafe and is aimed at teaching advanced usages of Akka. The course covers the basics of Akka, Remoting, Clustering, Routers, CRDTs, Cluster Sharding and Akka Persistence. The following post starts with a general introduction to Akka and presents the takeaways from the course as we experienced them.

A general overview of Akka

Readers who are already familiar with Akka can skip this section.

According to the Akka site this is Akka:

Akka is a toolkit and runtime for building highly concurrent, distributed, and fault tolerant event-driven applications on the JVM.

Akka achieves this by using Actors.

Actors are very lightweight concurrent entities.

Each Actor has a corresponding mailbox stored separately from the Actor. The Actors together with their mailboxes reside in an ActorSystem. Additionally, the ActorSystem contains the Dispatcher which executes the handling of a message by an actor. Each Actor only handles a single message at a time.

In Akka everything is remote by design and philosophy. In practice this means that each Actor is identified by its ActorRef. This is a reference to the actor which provides Location Transparency.

Actors communicate with each other by sending messages to another Actor through an ActorRef. This sending of the message takes virtually no time.

In addition to ActorRef there is also ActorSelection, which contains a path to one or more actors. Each time a message is sent, the path is traversed until the actor is found or the path is exhausted. However, no message is sent back when the actor is not found.

An actor moves through the states Started, Stopped, and Terminated. If an actor enters the Stopped state, it first stops its child actors before entering the Terminated state.

Best-practices

Import the context.dispatcher instead of the global Scala ExecutionContext. It is the ExecutionContext managed by Akka. Using the global context causes the Actors to be run in the global Thread pool.

You should not use PoisonPill as it will be removed from future versions of Akka since it is not specific enough. Roll your own message to make sure the appropriate actions for graceful shutdown are done. Use context.stop to stop your actor.

Place your business logic in a separate trait and mix it in to the actor. This increases testability as you can easily unit test the trait containing the business logic. Also, you should put the creation of any child actors inside a separate method so the creation can be overridden from tests.

Remoting

With the Remoting extension it is possible to communicate with other Actor Systems. This communication is often done through ActorSelections instead of ActorRef.

Remoting uses Java serialisation by default which is slow and fragile in light of changing definitions. It is possible and recommended to use another mechanism such as Google Protobuf.

Clustering

Akka has a simple perspective on cluster management with regards to split-brain scenarios. Nodes become dead when they are observed as dead and they cannot resurrect. The only way a node can come up again is if it registers itself again.

When a net split happens the other nodes are marked as unreachable. When using a Singleton, this means that only the nodes that can reach the singleton will access it. The others will not decide on a new Singleton in order to prevent a split-brain scenario.

Another measure against split-brain is that the seed nodes are contacted in order, and the first seed node is required to be up.

FSM

There is a library for writing finite state machines called FSM. For larger actors it can be useful to use the FSM. Otherwise stick to pure become and unbecome.

FSM also has an interval timer for scheduling messages. However, the use of stay() resets the interval timer, so you could end up never executing what is scheduled at the end of the timer.

Routers

There are two different kinds of routers: Pools and Groups. Pools are in charge of their own children, which are created and killed by the pool. Groups are configured with an ActorSelection that defines the actors to which the group should send its messages. There are several implementations: Consistent Hash, Random, Round Robin, Broadcast, Scatter-Gather First, and Smallest Mailbox. The names are self-explanatory.

Synchronisation of data with CRDTs

Synchronising data between multiple nodes can be done by choosing your data type so that merging never leads to conflicts. If the timestamps and events are generated in one place, no duplicate entries occur. Merging a map from a different node into your own map is then easily done by copying the entries you don’t already have into your own data.
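The merge step can be illustrated with a small sketch. It is written in Haskell as a neutral illustration, not with Akka's API, and the names EventId, Replica, and merge are hypothetical. It relies on the assumption stated above: an entry for a given key is generated in one place, so two replicas never hold different values for the same key.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical types: each event has a unique id generated in one place.
type EventId = Int
type Replica = Map.Map EventId String

-- Copy over the entries we do not have yet. Map.union keeps our own value
-- when a key exists on both sides, which is harmless under the assumption
-- that both sides hold the same value for that key.
merge :: Replica -> Replica -> Replica
merge = Map.union

replicaA, replicaB :: Replica
replicaA = Map.fromList [(1, "created"), (2, "renamed")]
replicaB = Map.fromList [(2, "renamed"), (3, "deleted")]
```

With these two replicas, merging in either direction gives the same three entries, and merging a replica with itself changes nothing. These are the properties that let nodes exchange data in any order and still converge.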

This can be implemented by letting each member node broadcast which data-points they have. Each node can then detect which information is lacking and request the specific data from the node that claimed to have the data. At some future point in time all nodes will be in sync. This is called eventual consistency.

Singleton

If you have a singleton cluster manager proxy it only starts when the cluster is formed. A cluster is formed if a member connects. The proxy will then pass on the buffered messages.

Cluster Sharding

Sharding is a way to split up a group of actors in a cluster. This can be useful if the group is too large to fit in the memory of a single machine. The Cluster Sharding feature takes care of the partitioning of the actors using a hash you have to define with a function shardResolver. The sharded actors can be messaged with a unique identifier using ClusterSharding(system).shardRegion("Counter") which proxies the message to the correct actor. ClusterSharding.start is what the Manager is to Singletons.

It is recommended to put the sharding functions into a singleton object for easy re-use of your shards, containing the functions to start the sharding extension, proxy to the shard, etc. It is also convenient to add tell and initialise helper functions to respectively send a message to the actor and initialise it by its unique id.

Akka Persistence

Akka persistence uses a Journal to store which messages were processed. One of the supported storage mechanisms is Cassandra. It is also possible to use a file-based journal which, of course, is not recommended.

In the current version of Akka there are two approaches to persistence: command sourcing and event sourcing. Simply put, in command sourcing each message is first persisted and then offered to the actor to do with as it pleases, whereas in event sourcing only the results of actions are persisted. The latter is preferred and will be the only remaining method in following versions.

Both methods support storing a snapshot of the current state and recovering from it.

Command Sourcing

The main problem with command sourcing is that all messages are replayed. This includes requests for information from dead actors, which wastes resources. Moreover, in case of errors, the last message that killed the actor is also replayed, probably killing the actor again in the process.

Event Sourcing

With event sourcing one only stores state changing events. Events are received by the receiveRecover method. External side-effects should be performed in the receive method. The code for the internal side-effect of the event should be the same in both the receive and receiveRecover methods. The actor or trait for this will be named PersistentActor.
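The reason the state-changing code should be shared between normal handling and recovery can be shown with a small sketch. It is written in Haskell as a neutral illustration rather than with Akka's PersistentActor API; the Event type and the names applyEvent and recover are hypothetical.

```haskell
import Data.List (foldl')

-- Hypothetical state-changing events of a simple counter.
data Event = Incremented Int | Reset
  deriving (Show, Eq)

type Counter = Int

-- The single place where an event changes the state. Both live handling
-- and recovery should go through this function.
applyEvent :: Counter -> Event -> Counter
applyEvent n (Incremented k) = n + k
applyEvent _ Reset           = 0

-- Recovery replays the journal on top of a starting state, which is either
-- the initial state or a stored snapshot.
recover :: Counter -> [Event] -> Counter
recover = foldl' applyEvent
```

For example, recover 0 [Incremented 2, Incremented 3, Reset, Incremented 5] yields 5, and recover 10 [Incremented 1] resumes from a snapshot of 10 and yields 11. No external side-effects are repeated because replay only touches the state.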

One can use Akka Persistence to “pause” long living actors, e.g. actors that have seen no activity lately. This frees up memory. When the actor is needed again it can be safely restored from the persistence layer.

Tidbits

Akka 3 is to be released “not super soon”. It will contain typed actors. The consequence of this is that the sender field will be removed from the actor. Therefore, for request-response, the ActorRef should be added to the request itself.

Concluding

The Advanced Akka course gives a lot of insights and concrete examples of how to use the advanced Akka features of clustering, sharding and persisting data across multiple nodes in order to create a system that really is highly available, resilient and scalable. It also touches on the bleeding edge functionalities, the ideas and concepts around it and what to expect next in this growing ecosystem.

The Difference Between Shallow and Deep Embedding

Deep and shallow embedding are terms associated with Domain Specific Languages (DSLs). A DSL is a language geared toward a specific domain. The dot language is an example of such a DSL, used for describing graphs. Conceptually, a shallow embedding captures the semantics of the data of the domain in a data type and provides a fixed interpretation of the data, whereas a deep embedding goes beyond this and captures the semantics of the operations on the domain, enabling variable interpretations.

We will illustrate this difference by embedding a simple expression language with summation, multiplication and constants in Haskell. Haskell is especially well-suited for and often used as a host language for embedded DSLs.

We express our language with the following interface. A type synonym Exp for normal Ints and three separate functions representing summation, multiplication, and constants.

We embedded the data of the domain in Haskell and provided functions for constructing the model. We can now easily represent the calculation of an expression such as $4 + 6 * 8$ with the following lines of Haskell:
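A minimal sketch of what such a shallow embedding could look like is given here. The function names constant, plus, and times are chosen for illustration and are assumptions, not necessarily the names used originally.

```haskell
-- The expression type is just a synonym for Int: an expression *is* its value.
type Exp = Int

-- Constants are embedded as themselves.
constant :: Int -> Exp
constant = id

-- Summation and multiplication compute directly on the values.
plus :: Exp -> Exp -> Exp
plus = (+)

times :: Exp -> Exp -> Exp
times = (*)

-- The expression 4 + 6 * 8.
example :: Exp
example = plus (constant 4) (times (constant 6) (constant 8))
```

Evaluating example gives 52. There is nothing else to inspect: the structure of the expression is gone as soon as it is built.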

The advantage of this embedding is that calculating the value of our expression is very fast. Other than the value, however, we cannot determine anything else regarding our expression. This becomes more problematic when we add variables to our language.

We change our type to contain binding information and add two functions to represent the assignment and usage of variables.

And in our naivety we can write the expression $x + 6 * 8$ as follows:

Obviously, evaluating this creates havoc! What is the value of x? We should, of course, have introduced it first:

Now we have assigned a value to x and we can safely use it in our expression.
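The steps with variables can be sketched as follows. This is one possible shallow embedding with bindings, under the assumption that an expression becomes a function from an environment to a value; the names Env, var, assign, run, unboundExample, and boundExample are hypothetical.

```haskell
import qualified Data.Map.Strict as Map

-- Binding information: which variable currently has which value.
type Env = Map.Map String Int

-- An expression is now a function from the bindings to its value.
type Exp = Env -> Int

constant :: Int -> Exp
constant n = \_ -> n

plus, times :: Exp -> Exp -> Exp
plus a b  = \env -> a env + b env
times a b = \env -> a env * b env

-- Usage of a variable: fails at evaluation time when it is unbound.
var :: String -> Exp
var name = \env -> case Map.lookup name env of
  Just value -> value
  Nothing    -> error ("unbound variable: " ++ name)

-- Assignment: bind a name to the value of one expression inside another.
assign :: String -> Exp -> Exp -> Exp
assign name value body = \env -> body (Map.insert name (value env) env)

-- x + 6 * 8 without introducing x first; evaluating it raises an error.
unboundExample :: Exp
unboundExample = plus (var "x") (times (constant 6) (constant 8))

-- The same expression with x assigned the value 4 first.
boundExample :: Exp
boundExample = assign "x" (constant 4) unboundExample

-- Evaluate an expression starting from no bindings.
run :: Exp -> Int
run e = e Map.empty
```

Here run boundExample gives 52, while run unboundExample only fails once it is evaluated. Nothing in this representation lets us detect the missing binding beforehand.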

Had we used a deep embedding we could have prevented the cataclysmic error by first checking whether each variable is assigned before it is used. We create a deep embedding of our expression by using a Haskell data type.

Note that we do not specify how the bindings should be stored, only that such a thing exists. We now define a function that checks whether we use a variable before it is defined (see note 1).
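A sketch of such a data type and check is given here. The constructor names and the function name wellFormed are assumptions made for illustration, and the check uses direct recursion for readability.

```haskell
import qualified Data.Set as Set

-- The deep embedding: an expression is a tree of constructors, not a value.
data Exp
  = Const Int
  | Plus Exp Exp
  | Times Exp Exp
  | Assign String Exp Exp   -- bind a name to a value within a body
  | Var String
  deriving (Show, Eq)

-- True when every variable is assigned before it is used.
wellFormed :: Exp -> Bool
wellFormed = go Set.empty
  where
    go _     (Const _)             = True
    go bound (Plus a b)            = go bound a && go bound b
    go bound (Times a b)           = go bound a && go bound b
    go bound (Assign x value body) = go bound value && go (Set.insert x bound) body
    go bound (Var x)               = x `Set.member` bound

-- x + 6 * 8 without a binding for x.
unboundExample :: Exp
unboundExample = Plus (Var "x") (Times (Const 6) (Const 8))

-- The same expression with x assigned the value 4 first.
boundExample :: Exp
boundExample = Assign "x" (Const 4) unboundExample
```

With these definitions wellFormed unboundExample is False and wellFormed boundExample is True, so the missing binding is found without evaluating anything.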

With the function above we can check whether an expression is well-formed. With our deep embedding we can even define transformations of our expression; e.g. differentiate with respect to a variable.

Deep embedding allows us to utilize the semantics of our model by defining multiple interpretations of our DSL. The downside is that just calculating the value of our expression has become slower due to the added overhead of the constructors, whereas the shallow embedding can be evaluated by only using Ints.

In short:

• Shallow embedding should be used when you only need a single interpretation or when you are in a hurry.
• Deep embedding should be used in all other cases.

More reading material on this subject:

1. Most often you should use folds (2) instead of this direct recursion.

Combining Graphviz (Dot) and TikZ With Dot2tex

We all want to create good looking documents and good looking documents need good looking images. Because we want consistency and because we are lazy we want to do this as automatically as possible. That is why we use LaTeX: it creates beautifully typeset documents without much manual effort.

Similarly, we use graphviz to generate our graphs for us. Its automatic layout is the best in the field and the (declarative) dot language is easy to understand and compact to write. We can either include the PDFs generated by dot in our document using \includegraphics or, remembering that we are lazy, use the LaTeX graphviz package. Either way, we can easily get the image from our first example into our PDF.

There is a shadow side to using Graphviz/dot as well. There are two problems. Firstly, the image just looks a bit out of place around the nicely smoothed text in a PDF. Secondly, we lack the ability to use TeX code in our graph. This means we are limited to the formatting by dot and the graphs could therefore appear out of style with other figures in our document.

No worries, with TikZ it is possible to create very fancy graphs and images in general, but you have to do all the positioning manually! Imagine inserting a node and having to reorder everything!

Enter dot2tex: it brings all the love of graphviz/dot to TeX/TikZ. Using dot2tex has many advantages:

1. Write your graphs in familiar dot syntax;
2. Let dot – or whichever layout engine you prefer – determine the placement of your nodes and arrows;
3. Style your nodes however you want by using TikZ styles;
4. Optionally, fine-tune the graph by adding extra tikz drawings.

Rather than manually calling dot2tex for every dot file you have, use the dot2texi package. It is the interface to dot2tex and, when used as follows, generates the image displayed in Figure 2.

For more TikZ goodness check out the example site.

Happy writing!

Why You Should Switch to Declarative Programming

We are reaching limits of what is feasible with imperative languages and we should move to declarative languages.

When applications written in imperative languages grow, the code becomes convoluted. Why? Imperatively programmed applications contain statements such as if X do Y else do Z. As Y and Z contain invisible side-effects the correctness of the program relies on some implicit invariant. This invariant has to be maintained by the programmer or else the code will break. Thus each time a new feature is added to an application or a bug is fixed the code for the application gets more complex as keeping the invariant intact becomes harder. After a while the code becomes spaghetti-code and bugs are introduced as the programmer fails to maintain the invariant. This is going to happen despite the best intentions of the programmer to keep things clean. Why is this?

Software and Building Architecture Compared

People tend to only understand what they can see. For most people it is difficult to grasp more abstract matters without somehow visualizing them. Software is an example of such an abstract matter. Let us visit the process of developing software through a comparison with developing a building.

JCU App Installation Script

With this script it should be possible to install and build the JCU app in a local directory. It does not build the UHC for you. If that is wanted, the option could of course be built in!

Getting Rid of Programming JavaScript With Haskell

For my Experimentation Project at Utrecht University I ported the “JCU” application to Haskell. The JCU application is used to give Dutch High school students the opportunity to taste Prolog.

The project uses the Utrecht Haskell Compiler and its JavaScript backend. The UHC translates Haskell to Core and then translates this Core language to JavaScript. For more information on this see the blog of the creator of the UHC JavaScript backend.

Please read my report on this project. The project is hosted on GitHub in the following repositories:

update 28-01-2012: The keyword jscript in the UHC has been changed to js in order to avoid association with Microsoft’s JScript. Also new Object syntax is now available in the foreign import directives.