Sudoku solver in C++11

TL;DR Show me the code : Gist
Tell me how fast it runs : Follows the same trend reported in the original post. For instance, the difficult problem that was solved in 72.27 seconds by the Python code took 45.34 seconds.

EDIT :
After being chastised by many insightful comments on Reddit, I changed the code to use {cpp}auto const&{/cpp} in places that matter, compiled with -O3, and re-ran both Python and C++ versions, and changed the times above to reflect those experiments. I had reported the time from Norvig’s webpage for the Python version which is not at all fair as that was probably on years old machine.

This post chronicles my effort to translate this Sudoku solver written in Python to C++11. (Yes, I know, I should call it C++, since the standard has been ratified, but C++11 feels like a completely new language, so I mentally separate C++03 and C++11). The first part sets up some lists to hold a grid configuration. Please refer to the original post to find out what each data structure means. I will show the Python code followed by the C++ with some explanation.
[python]
def cross(A, B):
“Cross product of elements in A and elements in B.”
return [a+b for a in A for b in B]
[/python]
The above code is doing a cartesian product of two lists. I am going to use {cpp}std::vector<>{/cpp} for representing lists. So here is the corresponding C++11 code for doing the cartesian product.
[cpp]
template
auto cross(const vector& a, const vector& b) ->
vector {
vector output;
for (auto _a : a) {
for (auto _b : b) {
output.push_back(_a+_b);
}
}
return output;
}
[/cpp]
Note that the Python code is generic with respect to the type of the output list. The type of the output list is determined by the type of the expression {python}a+b{/python}. We can emulate the same in C++ using {cpp}decltype{/cpp} and late-specified return type as shown.
[python]
digits = ’123456789′
rows = ‘ABCDEFGHI’
cols = digits
squares = cross(rows, cols)
[/python]
These are pretty straightforward :
[cpp]
vector digits = {“1″, “2″, “3″, “4″, “5″, “6″, “7″, “8″, “9″};
vector rows = {“A”, “B”, “C”, “D”, “E”, “F”, “G”, “H”, “I”};
vector cols = digits;
vector squares = cross(rows, cols);
[/cpp]

[python]
unitlist = ([cross(rows, c) for c in cols] +
[cross(r, cols) for r in rows] +
[cross(rs, cs) for rs in ('ABC','DEF','GHI') for cs in ('123','456','789')])
[/python]
{python}unitlist{/python} is a list of lists of strings in Python. I chose {cpp}vector>{/cpp} to represent it.
[cpp]
vector> unitlist;
[/cpp]
The code populates {python}unitlist{/python} with 3 lists. Let’s look at the last list first.
[python]
[cross(rs, cs) for rs in ('ABC','DEF','GHI') for cs in ('123','456','789')]
[/python]
It’s hard to get this concise in C++, but we can try. Python uniformly treats a string as a list of characters. However C++ doesn’t. So the first order of business is to write something that can convert a string literal like {cpp}”123″{/cpp} to a {cpp}vector{/cpp}. To make it look more natural, I am going to use the C++11 user-defined literal facility. I am going to convert something like {cpp}”123″_vec{/vec} to a {cpp}vector{/cpp}. Here is the code to do it :
[cpp]
vector operator”" _vec(char const* str, size_t N) {
vector output;
for_each(str, str+N, [=, &output] (char c) {
output.push_back(string(1, c));
});
return output;
}
[/cpp]
Now, here is the translation for the last of the three lists that make up {python}unitlist{/python} :
[cpp]
for (vector rs : {“ABC”_vec, “DEF”_vec, “GHI”_vec}) {
for (vector cs : {“123″_vec, “456″_vec, “789″_vec}) {
unitlist.push_back(cross(rs, cs));
}
}
[/cpp]
Almost as concise, eh? The translations for the other two lists are easier.
[cpp]
for (auto _c : cols) {
vector c = {_c};
unitlist.push_back(cross(rows, c));
}
for (auto _r : rows) {
vector r = {_r};
unitlist.push_back(cross(r, cols));
}
[/cpp]
Let’s look at the next two data structures :
[python]
units = dict((s, [u for u in unitlist if s in u])
for s in squares)
peers = dict((s, set(sum(units[s],[]))-set([s]))
for s in squares)
[/python]
A {cpp}map{/cpp} is the straightforward translation for a Python dictionary.
[cpp]
map>> units;
map> peers;
[/cpp]
{python}units{/python} is easy to translate:
[cpp]
for (string s : squares) {
for (vector u : unitlist) {
if (find(u.begin(), u.end(), s) != u.end())
units[s].push_back(u);
}
}
[/cpp]
However, the way {python}peers{/python} is populated is a bit tricky. Python {python}sum{/python} function converts a list of lists into a list by simply appending the lists together. The equivalent can be achieved with {cpp}std::accumulate{/cpp}. That is not the most efficient way to achieve the flattening of the nested list in C++, but I am just going to use this as it is not in any inner loop. Once I get the flattened vector, I am going to conditionally copy it to the output set, instead of doing a set difference.
[cpp]
for (auto s : squares) {
vector temp =
accumulate(units[s].begin(), units[s].end(), vector(),
[=](const vector& a1, const vector& a2) -> vector {
vector ret;
for (auto e : a1) ret.push_back(e);
for (auto e : a2) ret.push_back(e);
return ret;
});
set& output = peers[s];
copy_if(temp.begin(), temp.end(), inserter(output, output.end()),
[=] (string t) { return !(t == s); });
}
[/cpp]
Next, let’s tackle the {python}assign, eliminate, parse_grid, and grid_values{/python} functions. All of either return the grid representation, which is a dictionary mapping strings to strings ({cpp}map{/cpp}), or {python}False{/python}, if the grid was inconsistent. Since C++ functions can have only one return type, I chose to represent the return type as follows :
[cpp]
template
struct maybe {
bool valid;
static unsigned created, destroyed;
union {
shared_ptr value;
};

maybe() : valid(false) {}

maybe(T *_v) : valid(true), value(_v) {}
maybe(const shared_ptr& _v) : valid(true), value(_v) {
++created;
}

maybe(const maybe& other) : valid(other.valid),
value(other.value) { ++created; }
operator T& () {
if (valid) return *value;
assert(false);
}

~maybe() {
if (valid) {
value.reset();
++destroyed;
}
}

bool is_valid() {
return valid;
}
};
typedef map value_t;
[/cpp]
The {cpp}maybe{/cpp} type will either contain a pointer to a data item of type {cpp}T{/cpp}, in which case {cpp}valid{/cpp} will be {cpp}true{/cpp}, or it will be empty. That way, the C++ functions can return {cpp}maybe{/cpp}.
assign :
[python]
def assign(values, s, d):
“”"Eliminate all the other values (except d) from values[s] and propagate.
Return values, except return False if a contradiction is detected.”"”
other_values = values[s].replace(d, ”)
if all(eliminate(values, s, d2) for d2 in other_values):
return values
else:
return False
[/python]

[cpp]
maybe assign(maybe _values, string s, string d) {
value_t& values = _values;

size_t pos = values[s].find(d);
assert(pos != string::npos);

string temp = values[s];
temp.replace(pos, 1, “”);
string other_values = temp;

for (auto d2 : other_values) {
if (eliminate(_values, s, string(1, d2)).is_valid() == false)
return maybe();
}

return _values;
}
[/cpp]

eliminate :
[python]
def eliminate(values, s, d):
“”"Eliminate d from values[s]; propagate when values or places <= 2.
Return values, except return False if a contradiction is detected."""
if d not in values[s]:
return values ## Already eliminated
values[s] = values[s].replace(d,'')
## (1) If a square s is reduced to one value d2, then eliminate d2 from the peers.
if len(values[s]) == 0:
return False ## Contradiction: removed last value
elif len(values[s]) == 1:
d2 = values[s]
if not all(eliminate(values, s2, d2) for s2 in peers[s]):
return False
## (2) If a unit u is reduced to only one place for a value d, then put it there.
for u in units[s]:
dplaces = [s for s in u if d in values[s]]
if len(dplaces) == 0:
return False ## Contradiction: no place for this value
elif len(dplaces) == 1:
# d can only be in one place in unit; assign it there
if not assign(values, dplaces[0], d):
return False
return values
[/python]

[cpp]
maybe eliminate(maybe _values, string s, string d) {
value_t& values = _values;

size_t pos = values[s].find(d);
if (pos == string::npos)
return _values;

values[s].replace(pos, 1, “”);

if (values[s].length() == 0)
return maybe();
else if (values[s].length() == 1) {
string d2 = values[s];
for (auto s2 : peers[s]) {
if (eliminate(_values, s2, d2).is_valid() == false)
return maybe();
}
}

for (auto u : units[s]) {
vector dplaces;
for (auto _s : u) {
if (values[_s].find(d) != string::npos)
dplaces.push_back(_s);
}
if (dplaces.size() == 0)
return maybe();
if (dplaces.size() == 1)
if (assign(_values, dplaces[0], d).is_valid() == false)
return maybe();
}

return _values;
}
[/cpp]
The only notable thing is returning {cpp}maybe(){/cpp} in place of Python’s {python}False{/python}. Otherwise, the C++ code has almost one-to-one correspondence with the Python code. The final piece of the puzzle is searching for a solution.
[python]
def search(values):
“Using depth-first search and propagation, try all possible values.”
if values is False:
return False ## Failed earlier
if all(len(values[s]) == 1 for s in squares):
return values ## Solved!
## Chose the unfilled square s with the fewest possibilities
n,s = min((len(values[s]), s) for s in squares if len(values[s]) > 1)
return some(search(assign(values.copy(), s, d))
for d in values[s])
[/python]

The C++ code is here :
[cpp]

maybe search(maybe _values) {
if (_values.is_valid() == false)
return maybe();

value_t& values = _values;

value_t temp;

copy_if(values.begin(), values.end(),
inserter(temp, temp.begin()),
[=, &values] (value_t::value_type v) -> bool {
return v.second.length() > 1;
});

if (temp.empty())
return _values;

auto min_s = min_element(temp.begin(), temp.end(),
[=, &values] (value_t::value_type v1,
value_t::value_type v2) -> bool {
return v1.second.length() < v2.second.length();
});

string s = min_s->first;

for (auto d : values[s]) {
maybe v_copy(make_shared(values));
maybe retval = search(assign(v_copy, s, string(1, d)));
if (retval.is_valid()) return retval;
}
return maybe();
}
[/cpp]

The code slightly deviates from the Python implementation, but remains similar in spirit. First, I copy the elements of the grid that have more than one valid entry in them to a temporary vector. If the temporary vector is empty, then we are done. If not, I pick the entry with smallest number of valid elements. That is what the {cpp}min_element{/cpp} is doing. Then I recurse by setting that grid cell to each possible valid entry. The gist has a complete runnable program that takes the input grid as a command line argument. An example run :
[cpp]
./a.out 003020600900305001001806400008102900700000008006708200002609500800203009005010300
13666 ms
4 8 3 | 9 2 1 | 6 5 7
9 6 7 | 3 4 5 | 8 2 1
2 5 1 | 8 7 6 | 4 9 3
———+———+———
5 4 8 | 1 3 2 | 9 7 6
7 2 9 | 5 6 4 | 1 3 8
1 3 6 | 7 9 8 | 2 4 5
———+———+———
3 7 2 | 6 8 9 | 5 1 4
8 1 4 | 2 5 3 | 7 6 9
6 9 5 | 4 1 7 | 3 8 2
[/cpp]
Time wise, I found that the C++ implementation followed the same trend as the Python implementation. That is, most problems were solved in less than few milliseconds. The hard one that took 72.27 seconds by Python was solved in 45.34 seconds by the C++ code.

Yet Another SFINAE tool in C++11

SFINAE in C++03 typically uses the {cpp}sizeof{/cpp} operator to determine which overload for a function was valid, and hence whether the type under consideration has a particular property, usually a {cpp}typedef{/cpp}. Here is the prototypical C++03 code for finding if a type {cpp}T{/cpp} has a {cpp}typedef foo{/cpp} inside it.
[cpp]
template
struct has_typedef_foo {
typedef struct {} yes;
typedef yes no[2];

template
static yes& test(typename C::foo*);

template
static no& test(…);

static const bool value = (sizeof(test(0)) == sizeof(yes));
};
[/cpp]

With C++11, we can use another unevaluated context, namely {cpp}decltype{/cpp} to check for the return type of the function that was valid. The {cpp}value{/cpp} variable can now be
[cpp]
static const bool value = std::is_same(0)), yes&>::value;
[/cpp]
There is yet another unevaluated context in C++11, namely the {cpp}noexcept{/cpp} operator. The operator is passed an expression, and it returns {cpp}true{/cpp} or {cpp}false{/cpp} depending on whether the expression is expected NOT to throw an exception or IS expected to throw an exception, respectively. We can make use of that fact, and define the two versions of the {cpp}test{/cpp} function to throw and not throw. Here is a version that uses the {cpp}noexcept{/cpp} operator, and doesn’t have the helper {cpp}yes/no{/cpp} types.
[cpp]
template
struct has_typedef_foo {
template
static void test(typename C::foo*) noexcept;

template
static void test(…) noexcept(false);

static const bool value = noexcept(test(0));
};
[/cpp]

Lazy Lists in Different Languages

Here is the canonical implementation of Fibonacci series in Haskell. The lazy evaluation semantics of Haskell is used to build the Fibonacci series as a lazy list. I thought about implementing lazy lists in languages that don’t have lazy evaluation semantics.

fibs = 0 : 1 : zipWith (+) fibs (tail fibs)

The above code uses the zipWith function, which operates on lazy lists. I didn’t want to implement that yet, so I wrote the following simpler implementation in Haskell, which still result in lazy lists. The first one, “from”, is just an infinite sequence of integers starting at n.

from n = [n] ++ (from (n+1))

The second one is the infinite Fibonacci series, but without using zipWith.

fibs_helper m n = m : (fibs_helper n (m+n))
fibs = fibs_helper 0 1

Lazy Lists in Python

The first language I decided to try was Python. The representation for the lazy list I chose was a pair of objects. The first, called “first”, is the first element in the list. The second, called “rest”, is a function, that returns the rest of the lazy list. The “first” of the “rest” is the second element, and so on. I just decided to put the pair of objects in a dictionary. So if I had a lazy list ‘L’, L['first'] would give me the first element, and L['rest']() would give me the rest of the lazy list. The first order of business was to write the “Take” function, that returns the first ‘n’ elements of a lazy list as a regular list. Here is the recursive implementation of “Take”. Take(0, l) is the empty list. Take(n, l) is the first element of l (l['first']) appended with Take(n-1, l['rest']()). Easy enough.

def Take(n, l):
  if (n==0):
    return []
  else:
    return [l['first']] + Take(n-1, l['rest']())

Now, we are ready to implement the ‘from’ lazy list. Obviously, the first element of from(n) is ‘n’. What about the ‘rest’ part? Remember, the ‘rest’ is supposed to be a function that should return the lazy list corresponding to the rest of the list. We know the rest of the list is just From(n+1). So we just create an anonymous function (lambda) and set ‘rest’ to that anonymous function. The key observation here is that the lambda is capturing n+1. So when From(n)['rest'] is called, it knows to return From(n+1).

def From(n):
  return {'first':n,
          'rest': lambda: From(n+1) }

The implementation of fibs is as follows:

def fibs_helper(m, n):
  return {'first':m,
          'rest': lambda: fibs_helper(n, m+n)}
 
fibs=fibs_helper(0, 1)

Since an element in a Fibonacci sequence depends on the previous two elements in the sequence, I make the helper function take 2 arguments, namely two consecutive elements in the sequence. The function fibs_helper(m, n) returns the lazy list starting at m. Again, the ‘first’ is m. The ‘rest’ is recursively defined by calling fibs_helper with n and n+m, as they are the next two elements in the sequence. Go ahead, try out

Take(10, From(42))
Take(20, fibs)

Lazy Lists in R

Next, I tried to do it in R, that also has strict evaluation semantics. This is pretty straightforward, and looks identical to the Python implementation.

take <- function(n, l) {
  if (n == 0) c()
  else c(l$first, take(n-1, l$rest()))
}
from <- function(n) {
  list(first=n, rest=function() { from(n+1) })
}
fibs_helper <- function(m, n) {
  list(first=m, rest=function() { fibs_helper(n, m+n) })
}

fibs <- fibs_helper(0, 1)

Lazy Lists in C++

Next, I tried to implement it in C++. First, the “take” function template is shown below. It take a lazy list, and as long as that class has a “first” member and a “rest” method, the “take” function is happy. It just recursively calls itself and returns a std::vector.

#include <vector>
using namespace std;
template<typename F>
vector<int> take(int n, F f)
{
  if (n==0)
    return vector<int>();
  else {
    vector<int> retval;
    retval.push_back(f.first);
    vector<int> temp = take(n-1, f.rest());
    std::copy(temp.begin(), temp.end(),
              std::inserter(retval, retval.end()));
    return retval;
  }
}

Now, let’s look at the “from” lazy list. It is a class with the “first” member, which is set in the constructor. The “rest” is a method that returns a new object of type “from”, but constructed with n+1. Simple enough, again.

struct from {
  int first;
  from(int n) : first(n) {}
  from rest() { return from(first+1); }
};

fibs is similar in spirit, but since we know the infinite list always begins at 0, I just declared an object called fibs, constructed with (0,1).

struct fibs {
  int m, n;
  int first;
  fibs(int _m, int _n) : m(_m), n(_n), first(_m) {}
  fibs rest() { return fibs(n, m+n); }
} fibs(0, 1);

Here is the main program that calls “take” on fibs, to get the first 10 elements, and prints them out.

int main()
{
  vector<int> t = take(10, fibs);
  for (unsigned i=0, e=t.size(); i!=e; ++i)
    cout << t[i] << "\n";
  return 0;
}

Lazy Lists in C++0x

Or C++11, as it is known now. I just changed the implementation of “rest”. Instead of a being a class method, it is just a class member of type std::function<from()>, (or std::function<fibs()>). I set it in the constructor to a lambda expression, just like the Python or R implementation.

struct from {
  int first;
  function<from()> rest;
  from(int n) : first(n), rest([=] ()->from {return from(n+1);}) {}
};

struct fibs {
  int first;
  function<fibs()> rest;
  fibs(int m, int n) : first(m), rest([=] ()->fibs {return fibs(n, m+n);}) {}
} fibs(0, 1);

Deconstructing Intel’s Array Building Blocks

Intel recently released a beta of their “Array Building Blocks” C++ library (ArBB from now on). I downloaded it and started looking at it, not from the perspective of using it to write programs, but from the perspective of understanding its design and implementation. According to Intel’s website, ArBB is a “sophisticated library for vector parallelism”. Using ArBB, one can implement data-parallel algorithms that are common in domains such as image processing, financial analytics, and computer graphics. The promise of ArBB is that, the implementations will be :

  • Portable : the same program can be run on a wide variety of platforms. The programmer writes the program without worrying about low-level hardware details like how many cores are present, whether the cores support SIMD extensions like SSE, AltiVec etc.
  • Scalable : ArBB programs scale nicely with increasing number of cores in current generation processors. And as newer chips come out with more cores and better SIMD extensions, ArBB programs will automatically utilize these features without needing to rewrite the programs.

Ok, so how do they do it? Let’s look at the basics of ArBB. ArBB exposes 3 “concepts” to the user of the library :

  • Data types: Scalar data types like i8, i16, u32, f32, f64 etc. Vector data types like dense, nested etc. where T can be ArBB scalar data types or user-defined data types. One can also have matrices by providing the dimension as a second template parameter, for eg., dense.
  • Functions: Plain user functions, but having one or more scalar or vector data types as parameters, and returning void. Here’s are some examples :
    void scalar_function(f32& result, f32 a, f32 b)
    {
      result = a + b;
    }
    void vec_function(dense<f32> &result, dense<f32> A, dense<f32> B, f32 C)
    {
      result = A+B/C;
    }

    Even though they don’t look special, they are. I will get into that later, but for the impatient, think extensive operator overloading to generate abstract syntax tree for expressions.

  • Function invoke mechanisms: The above functions can be invoked on scalar or vector data with special library “built-in” functions called “call” and “map”, like so:
    f32 t1=40.0f;
    f32 ans;
    call(scalar_function)(ans, t1, 2.0f);
    dense<f32> res, v1, v2;
    call(vec_function)(res, v1, v2, ans);
    map(scalar_function)(res, v1, res);

    Notice that the special ArBB functions are invoked using the extra “call” function, and NOT as a regular C++ function like so:

    scalar_function(ans, t1, 2.0f); //Wrong

So, what’s going on here? With the “call” function, the programmer is expressing to the library that a function has to be performed on each element of a vector. The ArBB library takes over the job of generating code for the function at run-time, and uses the ArBB run time support system of run the code on multiple cores. To do this, the library has to “capture” the computation that needs to be performed. The way it does this, is using C++ operator overloading. Here is what I meant when I said these functions are “special”. When you see a statement like

result = a+b;

the “=” and “+” operator functions kick-in, and produce abstract syntax tree (AST) fragments that get added to the current “context”. The current “context” is set up by the “call” function. I can imagine the “call” function to look something like this :

template<typename F>
F arbb::call(F f)
{
  if(code = lookup_cache(f)) return code;
  setup_AST_context();
  f();
  finish_context();
  code = generate_code_from_current_context(); // Memoize if already generated
  // code is a function object with the same signature as original f,
  // so that call(f)(...original f arguments...) will work
  return code;
}

Note that one can also perform operations on the vectors outside of these special functions, something like :

int main()
{
  dense<i32> a,b,c;
  c = a+b;
  ...
}

These work as expected, i.e., ‘c’ gets the value of ‘a+b’ immediately after the statement. Since these are not called using the “call” function, I believe the “+” operator (and other operators) really has 2 modes of execution. One inside the context of a “call” function, where it generates AST fragments, and the other outside the context, where it does plain element-wise operations on the vector operands. And the “call” function sets a global flag that switches the behavior to AST generation. I also believe the element-wise code is plain sequential code. I wrote the following simple program for vector addition :

#include "arbb.hpp"
#include "util.h"
#include <iostream>
#include <vector>

using namespace arbb;
using namespace std;

#define N 1024

#define NUM_RUNS 16384

void add(dense<f32> &res, dense<f32> a, dense<f32> b, dense<f32> c)
{
  res = a+b+c;
}

int main()
{
  dense<f32> a(N), b(N), c(N), d(N);

  vector<float> va, vb, vc, vd(N);
  ULONG64 begin, end;

  for(unsigned i=0; i<N; ++i) {
    va.push_back((float)i);
    vb.push_back((float)(2*i));
    vc.push_back((float)(3*i));
  }
  bind(a, &va[0], N);
  bind(b, &vb[0], N);
  bind(c, &vc[0], N);
  bind(d, &vd[0], N);

  begin = readTime();
  for(unsigned i=0; i<NUM_RUNS; ++i) {
    call(add)(d, a, b, c); // VERSION 1
    //d = a+b+c;           // VERSION 2
  }
  end = readTime();
  cout << "time = " << (double)(end-begin)/MICRO << "\n";
  return 0;
}

There are 2 versions of vector addition, one using the “call” on the “add” function (VERSION 1), and the other using inline addition of the vectors (VERSION 2). The vector sizes are 1024, and I perform the addition 16384 times. On my Core i7 machine, VERSION 1 takes 0.202 seconds, and VERSION 2 takes 9.329 seconds. This tells us that the “call” function generates optimized version of the code, and also memoizes the code (and probably puts the code in hash table, indexed by the function pointer) so that subsequent “call”s don’t invoke the run-time compiler again. Looking at the huge time difference between VERSIONs 1 and 2, the “immediate” execution is probably sequential.

Now, more needs to be said about the special functions. It is straight-forward to think about the AST generation when the function is a sequence of assignments. But what about control flow? What about if-then-else and loops? If you think about it a little, it is easy to see that the native C “if-else”, “for”, or “while” statements cannot be used to express control flow to the library. To express, say, an if-else construct to the library, the programmer has to explicitly spell out what the if-condition is, what the then-part is, and what the else-part is. Something like the following is conceivable :

void func(dense<f32> &result, dense<f32> a, dense<f32> b, bool c)
{
  // Want to express the following :
  // if(c) result = a+b;
  // else result = a-b;
  begin_if_context();
  set_condition(c);
    begin_then_part();
    result = a+b;
    end_the_part();
    begin_else_part();
    result = a-b;
    end_else_part();
  end_if_context();
}

Note that the begin and end functions need to be there so that the AST fragments that are getting built by the operator overload functions can be attached to the correct basic blocks. Of course, the above gets quite cumbersome. ArBB provides macros which are named to resemble the C keywords, like _if, _for, _while, etc., that take care of the cumbersome stuff. Look at the following ArBB code :

void mandelMp(std::complex<f32> c, f32& ret)
{
  i32 count = 0;
  std::complex<f32>  z = std::complex<f32> (0.f, 0.f);

  i32 i;
  _for (i = 0, i < MAX_COUNT, i++) {
    _if (abs(z) >= 2.0f) {
      _break;
    } _end_if;
    z = z * z + (std::complex<f32>)c;
    count ++;
  } _end_for;

  ret = (f32)count;
}

This is part of the Mandelbrot program in the provided as samples along with ArBB beta. “g++ -E” on the above produces the following:

void mandelMp(std::complex<f32> c, f32& ret)
{
  i32 count = 0;
  std::complex<f32> z = std::complex<f32> (0.f, 0.f);

  i32 i;
  {{{{ arbb_1::detail::loop_stack().push(true); arbb_1::detail::loop_stack().top().for_begin(); i = 0; if (arbb_1::detail::capturing()) { arbb_1::detail::loop_stack().top().for_end_init(); { arbb_1::boolean c = arbb_1::detail::accept_empty_cond(i < MAX_COUNT); arbb_1::detail::loop_stack().top().for_end_cond(c); } i++; arbb_1::detail::loop_stack().top().for_end_step(); } bool forever = true; while (forever) {{ if (!arbb_1::detail::capturing()) { if (!arbb_1::detail::loop_stack().top().first_iteration()) { i++; } if (!value(arbb_1::detail::accept_empty_cond(i < MAX_COUNT))) { break; } } {
    {{{{ arbb_1::detail::if_stack().push(true); arbb_1::detail::if_stack().top().begin(abs(z) >= 2.0f); if (arbb_1::detail::capturing() || arbb_1::detail::if_stack().top().if_value()) {{ {
      { if (arbb_1::detail::capturing()) { arbb_break(arbb_1::detail::function::current(), arbb_1::detail::throw_on_error_details()); } else { break; } };
    } }} arbb_1::detail::if_stack().top().end(); arbb_1::detail::if_stack().pop(); }}}};
    z = z * z + (std::complex<f32>)c;
    count ++;
  } if (arbb_1::detail::capturing()) { break; } }} arbb_1::detail::loop_stack().top().for_end(); arbb_1::detail::loop_stack().pop(); }}}};

  ret = (f32)count;
}

The expansions for _if, _end_if, _for, etc., look quite hairy, but what they are really doing is setting up the context so that statements following the macros get attached to the correct basic blocks.

After understanding some of the internals of ArBB, I began ruminating on the merits of the library a bit. I would like to call this approach to data-parallel programming, “stylized” parallel programming. What the library writers are really saying is that, write your programs in a particular style, that clearly tells us what the parallel computation is, and we will take care of the grunt work for you. This is probably a pragmatic approach. Instead of inventing a new language, the library writers are asking the programmers to write plain C++, but just in a particular style. However, it does have some pitfalls. Novice users may not understand the style completely. For example, statements inside the special functions that don’t use the special data types get executed only the first time “call” is invoked. If they refer to global variables, the second “call” won’t reflect intermediate updates to the global variable. The programmer has to use special “capture” function to force re-capture and code generation from AST again. The macro mechanism could be quite error-prone. I am not sure what kind of errors will result if say, I forgot to put an _end_if. I am guessing it will be an obscure run-time error, which is not as nice as a compile time error for forgetting to end a scope with “}”.

The library is really embedding a special domain specific language (DSL) in C++. Programming language purists may frown at the flaky and fragile macros used for embedding the DSL. Something else that I thought about was, the operator overloading and macro tricks are quite straight-forward, and anyone with decent C++ knowledge and enough time should be able to reproduce that aspect of the library quite easily. So where is the value in ArBB? I believe it is the run-time support, and the optimizing compiler that generates code at run-time. BTW, LLVM has been used in similar contexts before, so I won’t be surprised if ArBB is using LLVM too, although I didn’t bother to dig deeper to verify that.

9 Digit Problem in Haskell

I was led from here to here to here, and got intrigued by the 9-digit problem. Here is the description, copied verbatim from here :

Find a number consisting of 9 digits in which each of the digits from 1 to 9 appears only once. This number must also satisfy these divisibility requirements:

  1. The number should be divisible by 9.
  2. If the rightmost digit is removed, the remaining number should be divisible by 8.
  3. If the rightmost digit of the new number is removed, the remaining number should be divisible by 7.
  4. And so on, until there’s only one digit (which will necessarily be divisible by 1).

There are solutions in many languages in those webpages, but I wanted to write my own in Haskell, just for kicks. Here it is :

thelist = [1..9]

check [] = False
check xs = (foldl (\x -> \y -> 10*x+y) 0 xs) `mod` (length xs) == 0

onemore [] = [[x] | x <- thelist]
onemore list = [list ++ [l] | l<-thelist,elem l list == False, check (list ++ [l]) == True]

testmore [] = []
testmore (x:xs) = (onemore x) ++ (testmore xs)

result = foldl (.) id (replicate 8 testmore) (onemore [])

Here is another version in Haskell, just for comparison.

Church Numerals Using C++0x Lambda Expressions

The upcoming C++ standard has added lambda expressions to the language. I thought it would an interesting (and arguably pointless) exercise to try and represent church numerals using lambda expressions. I only partly succeeded in the amount of time I spent thinking about it. Let’s refresh the definition of Church numerals. Church numeral C(N) for integer N is a function with the following properties :

  • It takes as input, another function f.
  • Its result is a function g.
  • The property of g is such that g(x) = f(f(… N times …(x))

The definition is intentionally silent on the argument and return types of f and g. In fact, we don’t need to know the types for the definition. For a non-strict language like Haskell, Church numerals comes naturally. Unfortunately, C++0x lambda expressions are monomorphic, i.e., you have to say upfront what the argument and return types are. (I know, I know, you can put it inside a template class and use the template parameters as the types, but that’s not the same thing). So, as a first step, I started off defining the functions in a simple manner. They take one integer, and return one integer.

#include<functional>
#include<iostream>

using namespace std;

typedef function<int(int)> F;

Now, I define church numeral to be a function that takes an argument of type ‘F’ and returns a result of type ‘F’. Here is the code for a function called ‘church’ that takes an unsigned integer and returns the church numeral representation.

static const F id = [=](int x) { return x; };

function<F(F)> church(unsigned int i)
{
  if(i == 0) {
    return [=] (F f) { return id; };
  }
 
  return [=] (F f) {
    F tmp = [=](int x) { return f(church(i-1)(f)(x)); };
    return tmp;
  };
}

When ‘i’ is zero, ‘church’ simply returns the identity (id) function. For other values, church(i) uses church(i-1) recursively. It passes the incoming function ‘f’ to church(i-1). The resulting function is applied on ‘x’ and the incoming function ‘f’ is applied on this result. This entire operation is wrapped inside a lambda expression which is the return value of church(i). Here is the reverse operation, namely unchurch, that takes a church numeral and return an unsigned int.

static const F plusone = [=](int x) { return x+1; };

unsigned int unchurch(function<F(F)> f)
{
  return f(plusone)(0);
}

plusone just returns the incoming argument incremented by one. unchurch just passes plusone to the incoming church numeral and applies the resulting function on 0. Now let us define Plus, that takes 2 church numerals and returns a church numeral that represents the sum of the incoming 2.

function<F(F)> Plus(function <F(F)> M, function <F(F)> N)
{
  return [=] (F f) {
    F tmp = [=](int x) { return M(f)(N(f)(x)); };
    return tmp;
  };
}

Similarly Mult, that returns the product.

function<F(F)> Mult(function <F(F)> M, function <F(F)> N)
{
  return [=] (F f) {
    return N(M(f));
  };
}

Go ahead try out the following (works with gcc 4.5):

int main()
{
  cout << unchurch(church(42)) << "\n";
  cout << unchurch(Plus(church(41), church(1))) << "\n";
  cout << unchurch(Mult(church(6), church(7))) << "\n";
  return 0;
}

So far so good. The difficulty arises when we try to define exponentiation. In Haskell it is quite simple :

exp m n = n m

What I want to say in C++ is :

// Does not compile
function<F(F)> Exp(function <F(F)> M, function <F(F)> N)
{
  return N(M);
}

Of course, I can’t do that because ‘F’ is defined to be function<int(int)>. It really needs to be function<anything(anything)>. The monomorphic nature of lambda expressions is really limiting the expression of exponentiation here. I would really be interested in seeing a different way to define ‘F’ such that exponentiation can be expressed as naturally as in Haskell.

Church Numerals in C++

Church numerals are a representation of natural numbers {0,1,2,…}. The representation encodes each number as a function. Let’s call the function corresponding to natural number N as C(N). C(N) has the following properties :

  • It takes as input, another function f.
  • Its result is a function g.
  • The property of g is such that g(x) = f(f(… N times …(x))

In other words, C(N) takes as input a function and returns another function that just applies the input function N times. It’s easy to represent C(N) in Haskell. Here is a Haskell function ‘church’, that takes an integer ‘n’ as input and produces the Church numeral for ‘n’.

type Church a = (a -> a) -> a -> a

church :: Integer -> Church a
church 0 = \f -> \x -> x
church n = \f -> \x -> f (church (n-1) f x)

The reverse operation, namely taking a church numeral and producing an integer, is also straightforward. Just pass the function f(x)=x+1 to the church numeral (remember, a church numeral itself is a function taking another function as input), and to the resulting function, pass 0. The result of the application of the latter function should be the integer corresponding to the Church numeral.

unchurch :: Church Integer -> Integer
unchurch n = n (\x -> x + 1) 0

Now, how can we represent Church numerals in C++? First, we need a suitable representation for a “function”. I am sure that there are many ways to do this. I choose to use the “Metafunction class” method in the Abrahams-Gurtovoy book. I represent a function as a C++ (non-templated) class, that has a nested templated class called “apply”. The “apply” class defines a nested type called “result”. As an illustration, here is the “identity” function (f(x) = x) :

struct identity {
  template <typename X>
  struct apply {
    typedef X result;
  };
};

Now, its easy to see that the “apply” of the Church encoding of zero should be the identity function. Here is zero, represented as a C++ Church numeral:

struct Zero {
  template <typename F>
  struct apply {
    typedef identity result;
  };
};

Note the subtlety here. The apply of Zero takes as input, a “function” F, and produces the identity function as result. How do we represent one? The apply function in Church encoding of one should just apply the input function once. So apply can just define result to be the input F as shown below :

struct One {
  template <typename F>
  struct apply {
    typedef F result;
  };
};

It gets interesting for two. The apply of Church encoding of two should apply the input function twice. In this case, instead of using a typedef, we have to define the result explicitly, as shown below :

struct Two {
  template <typename F>
  struct apply {
    struct result {
      template<typename X>
      struct apply {
        typedef typename One::template apply<F>::result::template apply<X>::result prev;
        typedef typename F::template apply<prev>::result result;
      };
    };
  };
};

As Michael Witten points out in the comments, ‘prev’ above can also be simplified to

typedef typename F::template apply<X>::result prev;

But I showed it the long way because it naturally leads to the definition of Church<N>. Just define prev from Church<N-1> as below.

Now we are ready to define the generic version of Church encoding (namely Church<N>), that takes the number to be encoded as a template parameter. We can just extrapolate Two. Instead of applying “One” first to define prev, we can recursively apply Church<N-1> :

template<int N>
struct Church {
  template <typename F>
  struct apply {
    struct result {
      template<typename X>
      struct apply {
        typedef typename Church<N-1>::template apply<F>::result::template apply<X>::result prev;
        typedef typename F::template apply<prev>::result result;
      };
    };
  };
};

The case to terminate recursion, namely Church<0> can be defined as

template<>
struct Church<0> : Zero {};

We need the corresponding Unchurch too, to actually make sense out of the representation. I choose to use the Boost library’s MPL int for this purpose. First, I define the function ‘plusone’, whose apply just increments its input mpl::int_<>:

struct plusone {
  template <typename T>
  struct apply {
    typedef mpl::plus<T, mpl::int_<1> > result;
  };
};

I make Unchurch contain a nested enum that just holds the integer corresponding to the Church numeral ‘N’.

template <typename N>
struct Unchurch {
  enum {value = N::template apply<plusone>::result::template apply<mpl::int_<0> >::result::value };
};

Here is how Unchurch works :

  • First, ‘plusone’ is passed to N to get back the result.
  • result now is a function that applies ‘plusone’ ‘N’ times.
  • I apply this result function on mpl::int_<0>.
  • I define value of Unchurch as the result::value got from the previous application.

Simple, no? With this representation of Church numerals in C++, operations like addition, multiplication, and exponentiation become straightforward, almost one to one mapping from the Haskell implementation. Here is the addition representation is Haskell :

plus m n = \f -> \x -> m f (n f x)

and the implementation in C++ :

template <typename M, typename N>
struct Plus {
  template <typename F>
  struct apply {
    struct result {
      template<typename X>
      struct apply {
        typedef typename N::template apply<F>::result::template apply<X>::result tmp1;
        typedef typename M::template apply<F>::result::template apply<tmp1>::result result;
      };
    };
  };
};

Similarly, the implementations of multiply and exponentiation in Haskell :

mult m n = \f -> n (m f)
exp m n = n m

and in C++ :

template <typename M, typename N>
struct Mult {
  template <typename F>
  struct apply : N::template apply<typename M::template apply<F>::result> {};
};

template <typename M, typename N>
struct Exp : N::template apply<M>::result {};

Spend some time looking at Mult and Exp. It is amazing how functional programming is quite straightforward in C++, once you have the right representation of things. Go ahead and try out the following :

cout << Unchurch<Church<42> >::value << endl;
cout << Unchurch<Plus<Church<1>, Church<41> > >::value << endl;
cout << Unchurch<Mult<Church<6>, Church<7> > >::value << endl;
cout << Unchurch<Exp<Church<2>, Church<9> > >::value << endl;

You might have to pass -ftemplate-depth-1024 to g++ to get this compiled.

Integers from scratch

We take integers ({… -3, -2, -1, 0, 1, 2, 3, …}) for granted in a programming language. For example, in C or C++, we just say

int i;

and we get to use ‘i’ as a 32-bit integer. But in this post, I am going to create a representation for integers from scratch. Let’s call it Pint, ‘P’ for Peano, on whose axioms I base the representation on, or maybe the pint of beer I drank before I thought of this. Here is a Haskell data type representing integers :

data Pint = Zero | Succ Pint | Pred Pint

And here is the same representation using C++ templates :

struct Zero {};

template<typename>
struct Succ {};

template<typename>
struct Pred {};

In plain English, this means :

  • Zero is a Pint.
  • Anything of the form Succ x is a Pint.
  • Anything of the form Pred x is a Pint.

My intention is to represent, say 3, as

Succ Succ Succ Zero

or in C++

Succ<Succ<Succ<Zero> > >

Negative numbers are represented symmetrically using Pred. -2 is represented as

Pred Pred Zero

or in C++

Pred<Pred<Zero> >

It will nice to see see what a Pint actually represents in terms of normal integers. So here is a transform that converts Pint to int. In Haskell, it is a function that takes Pint and produces a Integer.

pint2int Zero = 0
pint2int (Succ p) = 1 + (pint2int p)
pint2int (Pred p) = (pint2int p) - 1

The code above is quite straightforward. It establishes the base case first, namely saying pint2int of Zero is 0. Then it says pint2int of Succ p is whatever pint2int p returned, plus 1. Simple, no? The conversion for Pred p is symmetric. In C++, I am going to make pint2int a templated class, with a nested enum value, which will hold the integer value corresponding to the Pint given as the template parameter.

template<typename>
struct Pint2Int;

template<>
struct Pint2Int<Zero> {
enum { value = 0 };
};

template<typename N>
struct Pint2Int<Succ<N> > {
enum { value = Pint2Int<N>::value+1 };
};

template<typename N>
struct Pint2Int<Pred<N> > {
enum { value = Pint2Int<N>::value-1 };
};

The specializations of Pint2Int define the nested value recursively, and in the base case Zero, the value is 0. Go ahead, try out

pint2int (Succ (Succ Zero))

or

cout << Pint2Int<Pred<Pred<Pred<Zero> > > >::value << endl;

You should see 2 and -3 respectively. Now, it gets cumbersome to define Pint’s in terms of Zero always. The representation gets quite long. It might be nice to have a conversion from int to Pint. Here is one for Haskell.

int2pint 0 = Zero
int2pint n = if (n > 0)
then (Succ (int2pint (n-1)))
else (Pred (int2pint (n+1)))

We first establish the ‘Zero’ case, i.e., int2pint 0 is Zero. For a positive number ‘n’, we can recursively call int2pint (n-1) and return Succ of the return value. We are sure the recursion will terminate because ‘n-1′ will eventually become 0. However, this won’t work for a negative ‘n’. So the other case recursively calls int2pint on ‘n+1′, which we are sure will eventually reach 0.
The same code is a little more involved in C++. My intention is to provide a template class Int2Pint with a nested ‘type’. For example, Int2Pint<2>::type will be Succ < Succ <Zero> >. Here is the definition of Int2Pint.

template<int N, bool= (N>0)>;
struct Int2Pint;

template<>
struct Int2Pint<0> {
typedef Zero type;
};

template<int N>
struct Int2Pint<N, true> {
typedef Succ<typename Int2Pint<N-1>::type> type;
};

template<int N>
struct Int2Pint<N, false> {
typedef Pred<typename Int2Pint<N+1>::type> type;
};

There are 2 template arguments, one the integer we want to convert to Pint, and the other to tell us whether the given integer is positive or not. The default value for the bool template argument is ‘(N>0)’, which itself depends on the first argument ‘N’. Again, we establish the Zero case. Then we do template specialization for the true and false values for the second arguments. The specializations define ‘type’ by invoking Int2Pint recursively on N-1 and N+1 respectively. Try

pint2int (int2pint -4242)

and

cout << Pint2Int<Int2Pint<480>::type>::value << endl;

You should see -4242 and 480 respectively, although g++ will take a long time to compile the program, because of all the recursive template instantiations.

Now that we have the representation down pat, we can play some fun games with it. How about defining the basic operations like +, -, etc. The only rule is, we have to implementation in the Pint domain and we cannot use the integer arithmetic provided by the language. In other words, we simply cannot convert Pint to int, perform the operations on int, and then convert back to Pint. Why? Because it is more fun and challenging that way..

Let’s define ‘plus’. Here is the Haskell definition.

plus Zero n = n
plus (Succ n1) n2 = Succ (plus n1 n2)
plus (Pred n1) n2 = Pred (plus n1 n2)

This again follows the principle of defining the Zero case, then defining the other cases recursively, making sure the recursion will eventually reach Zero.
In C++, I define Plus<>, with the intention of having a nested type. So Plus<N1,N2>::type will be a Pint whose value will be N1+N2, N1 and N2 being Pint’s themselves.

template<typename, typename>
struct Plus;

template<typename N>
struct Plus<Zero, N> {
  typedef N type;
};

template<typename N1, typename N2>
struct Plus<Succ<N1>, N2> {
  typedef Succ<typename Plus<N1, N2>::type> type;
};

template<typename N1, typename N2>
struct Plus<Pred<N1>, N2> {
  typedef Pred<typename Plus<N1, N2>::type> type;

Here is ‘sub’ and ‘mul’ in Haskell.

negatepint Zero = Zero
negatepint (Succ n) = Pred (negatepint n)
negatepint (Pred n) = Succ (negatepint n)

sub n1 n2 = plus n1 (negatepint n2)

mul (Succ Zero) n = n
mul (Succ n1) n2 = plus (mul n1 n2) n2
mul (Pred n1) n2 = plus (mul n1 n2) (negatepint n2)

For ‘sub’, I make use of ‘negatepint’, and then just call plus after negating the second operand to sub. Here is the code in C++, again with nested ‘type’s holding the result of the operation.

template<typename N>
struct Negate;

template<>
struct Negate<Zero> {
  typedef Zero type;
};

template<typename N>
struct Negate<Succ<N> > {
  typedef Pred<typename Negate<N>::type> type;
};

template<typename N>
struct Negate<Pred<N> > {
  typedef Succ<typename Negate<N>::type> type;
};

template<typename N1, typename N2>
struct Minus {
  typedef typename Plus<N1, typename Negate<N2>::type>::type type;
};

template <typename, typename>
struct Multiply;

template<typename N>
struct Multiply<Zero, N> {
  typedef Zero type;
};

template<typename N>
struct Multiply<_1, N> {
  typedef N type;
};

template<typename N1, typename N2>
struct Multiply<Succ<N1>, N2> {
  typedef typename Plus<typename Multiply<N1, N2>::type, N2>::type type;
};

template<typename N1, typename N2>
struct Multiply<Pred<N1>, N2> {
  typedef typename Plus<typename Multiply<N1, N2>::type, typename Negate<N2>::type>::type type;
};

Now, how do we do division? Well, that’s another post.

C++ Expression Templates

C++ expression templates encode all the information about an expression in its type. Or, in compiler parlance, it encodes the parse tree of an expression in its type. Let’s explore this concise definition in detail. First of all, what are expressions, and what are types? The following snippets are all expressions :

a
a+b
(c=a+b, d=x-y)

The following is also an expression, just a complicated one.

if_(a<b) [
  c = d+e,
  r = s+t
].else_[
  x = y+f,
  z = d-e
]

In the following code

int a,x; float b; double d;
struct foo { ... };
foo bar;
a-x
b+a;

the type of a and x are int, the type of bar is foo, the type of the expression a-x is int, and the type of the expression b+a is float. We have template classes in C++, which can lead to some fancy types. For example,

template<class T>
struct foo { ... };
template<class T>
struct bar { ... };
/* The following are all types */
foo<int>
bar<double>
foo<bar<int> >
bar<std::vector<float> >
bar<foo<bar<foo<int> > > >

The mechanism used in C++ expression templates is a combination of these fancy types and some clever operator overloading. Let’s start by defining some simple classes.

struct var {};
struct plus {};
struct minus {};
struct multiplies {};
struct divides {};

Don’t be surprised by the empty classes. I am defining them this way to just illustrate expression templates. To make the expression templates do useful work, these classes should probably contain some methods and member variables. But that’s another post. Now let’s define the “expression” class.

template<class Op, class LHS, class RHS>
struct expr {};

There are many ways to define this class, different tutorials on expression templates on the web do it differently. But the way I have it, is a straightforward way to think about it. The expr class is has three template arguments. The Op argument represents the operation being performed, the LHS and RHS arguments represent the operands of that operation. The intention is to represent and expression a+b, where a and b are of type var, using the following instantiation of expr :

expr<plus, var, var>

So, how do we do that? Well, we are free to overload the ‘+’ operator, and are free to choose what the return type of the operator. We can go ahead and do this :

expr<plus, var, var>
operator +(const var &, const var &)
{
  return expr<plus, var, var>();
}

Now, whenever we do a+b, where a and b are of type var, the above function is invoked and a new object of type expr<plus, var, var> is constructed. The promise of expression templates is met, i.e., the type of the expression encodes the parse tree of the expression.

We are not done yet. What about the expression a+b+c? The compiler evaluates a+b first, so it would call the above function. But that function returns an object of the type expr<plus, var, var>. Then the compiler has to evaluate (a+b) + c, which means it would look for an operator overload that takes expr<plus, var, var> and var. So, we have to make one for that. As the expressions get bigger, the type of operands to the ‘+’ operator get more and more fancy. But the good news is, we can make the C++ compiler do the work for us, by templating the operator ‘+’ overload function itself. Let’s make the type of the two operands to ‘+’ as the template parameters as follows :

template<class LHS, class RHS>
expr<plus, LHS, RHS>
operator +(const LHS &, const RHS &)
{
  return expr<plus, LHS, RHS>();
}

Ponder the above function for a while. The return type is expr<plus, LHS, RHS>. Its operands are of type LHS and RHS. Let’s go back to a+b+c. First, the compiler tries to evaluate a+b. It looks for an operator overload. There is a match above, with LHS=var, RHS=var. It instantiates such a function, calls it and returns expr<plus, var, var>. Now it tries to evaluate (a+b) + c. Again, it looks for an operator overload. There is a match, with LHS=expr<plus, var, var> and RHS=var. It goes through the drill again, and returns an object of the following type :

expr<plus, expr<plus, var, var>, var>

Let’s complete the picture by defining the overloads for -, *, and / operators.

template<class LHS, class RHS>
expr<plus, LHS, RHS>
operator +(const LHS &, const RHS &)
{
  return expr<plus, LHS, RHS>();
}
template<class LHS, class RHS>
expr<minus, LHS, RHS>
operator -(const LHS &, const RHS &)
{
  return expr<minus, LHS, RHS>();
}
template<class LHS, class RHS>
expr<multiplies, LHS, RHS>
operator *(const LHS &, const RHS &)
{
  return expr<multiplies, LHS, RHS>();
}
template<class LHS, class RHS>
expr<divides, LHS, RHS>
operator /(const LHS &, const RHS &)
{
  return expr<divides, LHS, RHS>();
}

Imagine now, what happens when the compiler encounters an expression like a+b-c*d/e. Every time it needs to evaluate an operator, it would create an instance of one of the above functions (if not already created), and return the relevant type. That’s a lot of instantiations, and a lot of new types. But ultimately, the type of the expression contains all the information about the expression.

Hello world!

This is my first post.