# The difference between shallow and deep embedding

## Jul 13, 2013 23:06 · 791 words · 4 minutes read

Deep and shallow embedding are terms associated with Domain Specific Languages
(DSL). A DSL is a language geared toward a specific domain. The dot language{:target=”_blank”} is
an example of such a DSL for describing Graphs. Conceptually, a shallow
embedding captures the semantics of the data of the domain in a data type and
provides a *fixed* interpretation of the data, whereas a deep embedding goes
beyond this and captures the semantics of the operations on the domain enabling
*variable* interpretations.

We will illustrate this difference by embedding a simple expression language with summation, multiplication and constants in Haskell. Haskell is especially well-suited for and often used as a host language for embedded DSLs.

We express our language with the following interface. A type synonym `Exp`

for
normal `Int`

s and three separate functions representing summation,
multiplication, and constants.

We embedded the *data* of the domain in Haskell and provided functions for
construction of the model and we can easily represent the calculation of an
expression as $4 + 6 * 8$ with the following lines of Haskell:

The advantage of this embedding that calculating the value of our expression is very fast. Other than the value we cannot determine anything else regarding our expression. This becomes more problematic when we add variables to our language.

We change our type to contain binding information and add two functions to represent the assignment and usage of variables.

type Exp = ([String ⊨ Int], Int) assign :: String -> Int -> Exp var :: String -> ExpAnd in our naivity we can write the expression $x + 6 * 8$ as follows:

val = var "x" `plus` (const 6 `times` const 8)Obviously, evaluating this creates havoc! What is the value of `x`

? We should,
of course, have introduced it first:

Now we have assigned a value to `x`

and we can safely use it in our
expression.

Had we used a deep embedding we could have prevented the cataclysmic error by first checking whether each variable is assigned before it is used. We create a deep embedding of our expression by using a Haskell data type.

data Exp where Plus :: Exp -> Exp -> Exp -- plus Times :: Exp -> Exp -> Exp -- times Const :: Int -> Exp -- const Assign :: String -> Int -> Exp -- assign Var :: String -> Exp -- varNote that we do not specify *how* the bindings should be stored, only that
such a thing exists. We now define a function that checks whether we use a
variable before it is defined.^{1}

With the function above we can *check* whether an expression is well-formed.
With our deep embedding we can even define transformations of our expression;
e.g. differentiate with respect to a variable.

Deep embedding allows us to utilize the semantics of our model by defining
multiple interpretations of our DSL. The downside is that just calculating the
value of our expression has become slower due to the added overhead of the
constructors, whereas the shallow embedding can be evaluated by only using
`Int`

s.

In short:

**Shallow embedding**should be used when you only need a single interpretation or when you are in a hurry.**Deep embedding**should be used in all other cases.

More reading material on this subject:

- This presentation by Josef Svenningsson.
- Combining Deep and Shallow Embedding for EDSL (Josef Svenningsson and Emil Axelsson, 2012)
- Certifying Machine Code Safety: Shallow versus Deep Embedding (Martin Wildmoser and Tobias Nipkow, 2004)
- Deep versus Shallow embeddings in Coq

- Most often you should use folds (2) instead of this direct recursion.
^{[return]}