Type-safe scripting with C++ (and other weird explorations)

The why and the what(-the-f***)

Let’s start this short tale with some background.

For reasons unclear, I’ve started working on my n-th ~~abandonable~~ side-project. Much detail isn’t necessary here: it’s basically a C++ library for performing simple math operations (averaging, sum, standard deviation, autocorrelation and the like). Each operation is implemented as a functor: instantiate it, call its operator(), and done! For a concrete example, consider the SumFunctor below (ignore the inheritance for now), which offers overloads to sum two integers or sum (concatenate) two strings:

// sumfunctor.h
struct SumFunctor : AbstractFunctor {
    int operator()(int a, int b) const { 
        int sum = a + b; // sum integers
        std::cout << "SumFunctor::operator(int, int) = " << sum << "\n";
        return sum;
    }
    std::string operator()(const std::string& a, const std::string& b) const { 
        std::string sum = a + b; // concatenate strings
        std::cout << "SumFunctor::operator(string, string) = " << sum << "\n";
        return sum;
    }
};

However, there’s a catch: I also want to be able to invoke these functors from some sort of scripting language. The exact language isn’t relevant (though I have my eye on Tcl); suffice it to say that it will, as usual, be dynamically typed. The goal here is to ~~suffer~~ be able to play around with these building blocks somewhat faster, potentially being able to integrate them more easily with other languages. Ultimately, I want to be able to write something like the following:

# myscript.xyz
# My hypothetical scripting language
a = 30
b = Sum(a, 12) # call SumFunctor

Now, here come the juicy questions:

1) Since data types in the script will only be known at runtime (i.e., once the script is being executed), how can a hypothetical interpreter going over myscript.xyz dispatch the calls to the correct overloads in the C++ functor?
2) How can I make a functor’s operator()s invocable in the most type-safe way possible?

Ideally, I’d like to have something akin to Qt’s Q_INVOKABLE tag, but I most certainly don’t want to write a Meta-Object Compiler from scratch, and I’d like to avoid devilish macros as much as possible. So, strap in for some weird templated exploration.

Boundary conditions

To better understand where this is headed, let’s define some constraints and solidify some assumptions.

I want a hypothetical interpreter to transparently deal with available functors, regardless of their implementation specifics. Furthermore, I might want to load/unload functions dynamically, e.g. via some sort of import statement. This can be solved quite classically, by having all functors inherit from the same abstract base class (like the AbstractFunctor in the above snippet), which will provide a cohesive minimal API (OOP haters intensify).

How such an interpreter accesses the functors isn’t relevant here: they might be stored in a map, vector, list, etc., from which the right one is picked and called. However, there needs to be a cohesive way to pass arguments to the functor, without knowing either their types or their number beforehand.
To approach this, we can take some cues from Python: when writing custom C extensions, data is passed back and forth using PyObjects, which effectively work as containers for runtime data. This is not far from Qt’s QVariant (also a generic runtime data container), which appears quite often in the interface between Qt/C++ and QML/JS.
To avoid rolling our own polymorphic container at first, let’s start off by using C++17’s freshly added std::variant instead.

“But”, I hear you say, “std::variant is not runtime polymorphic; all possible types need to be known at compilation time!”. That’s right. std::variant indeed defines a compile-time sum type. However, it allows for some limited runtime introspection and provides value semantics, all of which come in handy. This will temporarily restrict the data types our functors can deal with, but that’s a constraint we shall lift later. For now, let’s define a simple variant_t,

using variant_t = std::variant<int, double, std::string>;

that packages a value coming from the interpreter. Let’s start off by accepting int, double and string arguments, which shall be bundled in a vector.
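
To make the “limited runtime introspection” a bit more concrete, here is a minimal sketch of the queries std::variant offers at runtime. The typeName helper and the variantDemo function are made up for illustration and are not part of the library described in this post.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <variant>
#include <vector>

using variant_t = std::variant<int, double, std::string>;

// Hypothetical helper: names the alternative a variant_t currently holds.
inline std::string typeName(const variant_t& v) {
    // std::visit calls the lambda with whichever alternative is active.
    return std::visit([](const auto& value) -> std::string {
        using T = std::decay_t<decltype(value)>;
        if constexpr (std::is_same_v<T, int>) return "int";
        else if constexpr (std::is_same_v<T, double>) return "double";
        else return "string";
    }, v);
}

// Hypothetical demo: a bundle of arguments as the interpreter might build it.
inline void variantDemo() {
    const std::vector<variant_t> args{30, 1.5, std::string("twelve")};
    assert(std::holds_alternative<int>(args[0]));          // yes/no query
    assert(std::get_if<std::string>(&args[0]) == nullptr); // wrong type: null pointer
    assert(args[1].index() == 1);                          // position in the type list
    assert(typeName(args[2]) == "string");
}
```

None of these calls need to know the held type in advance, which is what an interpreter working with runtime data requires.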

Our AbstractFunctor is thus bound to have a method such as void AbstractFunctor::invoke(const std::vector<variant_t>&), through which it can be, well, invoked. Notice that the functors’ return values are being ignored (our invoke method is void, after all). That’s yet another temporary simplification. We’ll get to that. Eventually.

Verifying a variant vector’s veridical validity

At this point, a first step would be asking the question: given a list of arguments (std::vector<variant_t>) and a certain function signature (operator(A, B, C, ...)), how can we verify that the argument types match said signature?

This turned out to be an interesting point, since it lies somewhere between compile-time constraints and run-time requirements. And, while I could likely concoct some solution exploiting typeid/type_info, I figured it’d make sense to explore a path using templates. That way, I could offload as much type-checking as possible to the compilation phase.
So, for the SumFunctor::operator(int,int) above, I’d want to write something as matches<int, int>(arglist), which verifies if arglist contains two ints. We’re thus searching for something of the form

template<typename... Args> bool matches(const std::vector<variant_t>& arglist);

where the typename... Args declaration makes use of C++11’s template parameter packs. So, strap in for a brief jump into recursive template meta-programming territory!

Tiny trip through templated territory

Template meta-programming is basically functional programming with a cumbersome syntax. In its recursive form, it follows the principles of mathematical induction quite closely: define a base case (a proof for $n = 0$) and an inductive step (a proof that whatever holds for $n = k$ also holds for $n = k + 1$), and voilà, the compiler will do the expansions for you.

For example: in a language better suited for such task, like Haskell, we could compactly define the recursive sum of a list of integers:

sum [x] = x -- base case: sum of a list with one element, [x], is itself
sum (x:xs) = x + sum xs -- inductive step: peel off x from list + sum the rest (xs)

Calling something like sum [1, 2, 3] would essentially expand to 1 + sum [2, 3] -> 1 + 2 + sum [3] -> 1 + 2 + 3, unsurprisingly yielding 6. To write something similar in C++, we end up being a bit more verbose:

constexpr int sum(int x) 
{ return x; } // base case: sum of a single element is itself

template <typename ...Args> constexpr int sum(int x, Args... xs) 
{ return x + sum(xs...); } // inductive step: peel off x from list + sum the rest (xs...)

While the syntax is unfortunately less readable, the base/inductive case pattern is very recognizable. Calling sum(1, 2, 3) spits out 6 as expected. The added constexprs aren’t mandatory, but allow for compile-time computation when possible, which is very neat.
By expanding this C++ example a bit, the sum function can be made to work with basically any combination of types for which operator+ is defined:

template <typename Tx> constexpr decltype(auto) sum(Tx&& x) noexcept
{ return std::forward<Tx>(x); } // base case: hand the single element back unchanged

template <typename Tx, typename ...Txs> constexpr decltype(auto) sum(Tx&& x, Txs&&... xs) noexcept
{ return std::forward<Tx>(x) + sum(std::forward<Txs>(xs)...); } // inductive step

Sure, this introduces C++14’s decltype(auto), adds an overkill noexcept for funsies, and throws some perfect forwarding via std::forward into the mix (std::forward rather than std::move, so that lvalue arguments aren’t silently moved from), but hey, you get to recursively add apples and oranges together at compile time*. Tasty**!

* Provided operator+(apple, orange) is defined. YMMV.
** Could’ve been even tastier if written with a fold expression, but let’s leave that aside.
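
For the curious, the fold-expression variant mentioned in the second footnote could look roughly like the sketch below. The names sumFold and sumFoldDemo are hypothetical; the sketch uses a C++17 unary right fold, which groups the additions the same way the recursive version does, and returns by value to keep things simple.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Unary right fold: expands to x1 + (x2 + (... + xN)).
template <typename... Txs>
constexpr auto sumFold(Txs&&... xs) {
    static_assert(sizeof...(Txs) > 0, "sumFold needs at least one argument");
    return (std::forward<Txs>(xs) + ...);
}

// Evaluated entirely at compile time.
static_assert(sumFold(1, 2, 3) == 6);

// Hypothetical demo: works for any types with a suitable operator+.
inline void sumFoldDemo() {
    assert(sumFold(std::string("foo"), "bar") == "foobar");
}
```

No base case is needed here: the compiler expands the pack in one go, and the static_assert rejects the empty pack, for which a fold over + is ill-formed anyway.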

Ok, back to our main point.

Returning to the route of recursive random ramblings

In light of this recently acquired knowledge (and after banging my head against a wall for some time), a fleshed out bool matches<...>(...) function looks like the following:

/* Base case: check that the N-th argument (counting from the back) is of type 'Type' */
template <size_t N, typename Type>
bool checkArgumentTypeAt(const std::vector<variant_t>& arglist) {
    if constexpr(N > 0)
        return std::get_if<Type>(&arglist.at(arglist.size() - N)) != nullptr;
    return true;
}
/* Inductive step: peel off First type from 'Types' */
template <size_t N, typename First, typename Second, typename... Types>
bool checkArgumentTypeAt(const std::vector<variant_t>& arglist) { 
    if constexpr(N > 0)
        return checkArgumentTypeAt<N, First>(arglist) 
                    && checkArgumentTypeAt<N-1, Second, Types...>(arglist);
    else
        return true;
}
/* Entry point; the matches() function. */
template <typename... Types>
bool matches(const std::vector<variant_t>& arglist) noexcept {
    const size_t N = sizeof...(Types);
    if constexpr(N > 0)
        return arglist.size() == N && checkArgumentTypeAt<N, Types...>(arglist);
    else
        return arglist.empty(); // for a function w/ no args: list should be empty
}

First off, there are a few language and library features to address here: std::get_if lets me check at runtime whether a std::variant currently holds a given type (it returns a null pointer if it doesn’t). Next, the sizeof... operator returns the length of a template parameter pack, so we know how many Types were provided as template arguments. Last, but not least, if constexpr is a C++17 godsend that enables proper conditional branching at compile time.

In the snippet, matches acts as an entry point for our recursive checkArgumentTypeAt, which, as the name indicates, checks if the element at the $N$-th position of the arglist (counting from the back) has the expected type. The base case and inductive step mechanics are the same as those defined in the previous section. Note, however, that unlike our recursive sum example, the First and Second types must be explicitly peeled off from the template parameter pack. This is because both checkArgumentTypeAt overloads take exactly the same function arguments, so only their template parameter lists can tell them apart: requiring a Second type rules out the inductive overload when a single type is supplied, which lets the SFINAE rules kick in correctly. If we instead had written

template <size_t N, typename First>
bool checkArgumentTypeAt(const std::vector<variant_t>& arglist) { /* ... */ }
template <size_t N, typename First, typename... Types> // omitted 'typename Second' from before
bool checkArgumentTypeAt(const std::vector<variant_t>& arglist) { /* ... */ }

since typename... Types can be empty, a call such as checkArgumentTypeAt<1, int>(arglist) would’ve been ambiguous* — as GCC happily confirms:

Error: call of overloaded 'checkArgumentTypeAt(...)' is ambiguous
             return arglist.size() == N && checkArgumentTypeAt<N, Args...>(arglist);
                                           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~

* Some other type deduction rules are also lurking in the shadows here. For more, here’s a mandatory Scott Meyers talk.
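
As an aside, the overload ambiguity can be sidestepped altogether by replacing the recursion with a pack expansion. The sketch below is an alternative to the code above, not what this post uses: matchesFold, matchesImpl and matchesFoldDemo are hypothetical names, and it relies on C++14’s std::index_sequence plus a C++17 fold over &&.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <variant>
#include <vector>

using variant_t = std::variant<int, double, std::string>;

// Is... is 0, 1, ..., N-1 and gets expanded in lockstep with Types...
template <typename... Types, std::size_t... Is>
bool matchesImpl(const std::vector<variant_t>& arglist, std::index_sequence<Is...>) {
    // True only if every argument holds the expected alternative
    // (an empty fold over && is true, which covers zero-argument signatures).
    return (std::holds_alternative<Types>(arglist[Is]) && ...);
}

template <typename... Types>
bool matchesFold(const std::vector<variant_t>& arglist) {
    // The size check comes first, so arglist[Is] never goes out of bounds.
    return arglist.size() == sizeof...(Types)
        && matchesImpl<Types...>(arglist, std::index_sequence_for<Types...>{});
}

// Hypothetical demo
inline void matchesFoldDemo() {
    assert((matchesFold<int, int>({1, 5})));
    assert((!matchesFold<int, int>({1, 2.0}))); // second argument is a double
}
```

Since there is only one overload per function, there is nothing left for the compiler to find ambiguous.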

Deterministically dispatching to designated definitions: dope!

Once the matches function verifies the argument list, I want to be able to dispatch it to a corresponding method. At this point, I figured it’d make sense to define the concept of a Dispatcher, i.e., an object responsible for calling a particular operator() overload on our target functor.

The Dispatcher can thus be templated according to the target functor and the signature of the operator() that will be called:

struct AbstractDispatcher
{
    virtual ~AbstractDispatcher() = default;
    virtual bool matches(const std::vector<variant_t>& arglist) const noexcept = 0;
    virtual void dispatch(const std::vector<variant_t>& arglist) = 0;
};

template <class Functor, typename... Types>
struct Dispatcher : public AbstractDispatcher
{
    Dispatcher(Functor _f) : f(_f) {}
    bool matches(const std::vector<variant_t>& arglist) const noexcept override;
    void dispatch(const std::vector<variant_t> &arglist) override { /* what goes in here? */ }
private:
    Functor f; // the target functor
    void operator()(const Types&... args) {
        if constexpr(std::is_pointer<Functor>::value)
            (*f)(args...); // dereference if f is a pointer
        else
            f(args...);
    }
    template <size_t N, typename First, typename Second, typename... RemainingTypes>
    bool checkArgumentTypeAt(const std::vector<variant_t>& arglist) const;
    template <size_t N, typename Type>
    bool checkArgumentTypeAt(const std::vector<variant_t>& arglist) const;
};

To please the OOP gods (and allow runtime polymorphism), Dispatcher inherits from an AbstractDispatcher with a minimal API. The matches and checkArgumentTypeAt from before are also present, now as const member functions.

Note that Dispatcher has a member f of type Functor. That allows storing the target functor however it is best suited: reference, pointer or RAII member (which provides support for lambdas!). Right below it is an operator(), which will basically invoke the Functor’s operator() by correctly dereferencing f. In essence, Dispatcher::operator() is just an indirection to leverage C++’s static dispatch rules, and select the correct target Functor::operator() at compile time.

The missing piece of the puzzle is the elusive dispatch method. There are two problems here: 1) how can the $N$-th element in the arglist be converted to the correct type and, 2) how can Dispatcher::operator() be invoked with the right number of arguments?

Problem 1 is fairly straightforward. If the matches<...>(...) function returned true, it is known that the $N$-th element in the list has the same type as the $N$-th type in the Types... template pack. To extract it, one can leverage std::tuple in a very amusing way:

template<size_t N> using TypeAt = typename std::tuple_element<N, std::tuple<Types...>>::type;

This TypeAt expression wraps Types... in a std::tuple, and uses std::tuple_element to fetch the type of the $N$-th entry — all at compile time. Now, within the Dispatcher, retrieving and converting the $N$-th argument in arglist is just a matter of writing std::get<TypeAt<N>>(arglist.at(N)). Neat!
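
To see the trick in isolation, here is a small sketch that works outside of Dispatcher. TypeAtOf, argAt and typeAtDemo are hypothetical names: unlike TypeAt above, the type list is passed as explicit template arguments, since there is no enclosing class template to take Types... from.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <tuple>
#include <type_traits>
#include <variant>
#include <vector>

using variant_t = std::variant<int, double, std::string>;

// N-th type of the list Types..., resolved at compile time.
template <std::size_t N, typename... Types>
using TypeAtOf = std::tuple_element_t<N, std::tuple<Types...>>;

static_assert(std::is_same_v<TypeAtOf<0, int, std::string>, int>);
static_assert(std::is_same_v<TypeAtOf<1, int, std::string>, std::string>);

// Fetches the N-th argument as the N-th type of the signature.
// std::get throws std::bad_variant_access if the variant holds something else.
template <std::size_t N, typename... Types>
const TypeAtOf<N, Types...>& argAt(const std::vector<variant_t>& arglist) {
    return std::get<TypeAtOf<N, Types...>>(arglist.at(N));
}

// Hypothetical demo
inline void typeAtDemo() {
    const std::vector<variant_t> args{30, std::string("twelve")};
    assert((argAt<0, int, std::string>(args) == 30));
    assert((argAt<1, int, std::string>(args) == "twelve"));
}
```

The exception is the reason matches has to be consulted first: the conversion is only safe once the argument types are known to line up with the signature.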

A solution for problem 2 builds on top of problem 1, and our complete dispatch method takes form. Unfortunately, it ends up being way less elegant than desired:

template <class Functor, typename... Types> 
struct Dispatcher : public AbstractDispatcher 
{
    // ...
    void dispatch(const std::vector<variant_t>& arglist) override {
        const size_t N = sizeof...(Types);
        if constexpr(N == 0) {
            (*this)();
        } else if constexpr(N == 1) {
            (*this)(std::get<TypeAt<0>>(arglist.at(0)));
        } else if constexpr(N == 2) {
            (*this)(std::get<TypeAt<0>>(arglist.at(0)),
                    std::get<TypeAt<1>>(arglist.at(1)));
        } // add other Ns ad nauseam
    }
    // ...
};

Once again, we combine sizeof... and if constexpr so that only the matching branch gets compiled. Unfortunately, every supported argument count has to be written out by hand, which is rather disappointing. I haven’t found a way around this within this design (and the fact that even Qt has done something similar made me suspect there isn’t an easy one), although a pack expansion over C++14’s std::index_sequence looks like a promising candidate for generating the indices automatically. If any bright soul sees a better alternative here, send a pigeon!
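
One possible way to avoid enumerating the argument counts, sketched here under the assumption that a free function is an acceptable stand-in for Dispatcher::dispatch: let std::index_sequence generate the indices 0..N-1 and expand them together with Types.... The names dispatchAll, dispatchImpl, Recorder and dispatchDemo are hypothetical and do not appear in the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <variant>
#include <vector>

using variant_t = std::variant<int, double, std::string>;

// Expands to f(std::get<T0>(arglist.at(0)), std::get<T1>(arglist.at(1)), ...).
template <typename... Types, typename Functor, std::size_t... Is>
void dispatchImpl(Functor& f, const std::vector<variant_t>& arglist,
                  std::index_sequence<Is...>) {
    f(std::get<Types>(arglist.at(Is))...);
}

// Types... is given explicitly; Functor is deduced from the argument.
template <typename... Types, typename Functor>
void dispatchAll(Functor& f, const std::vector<variant_t>& arglist) {
    dispatchImpl<Types...>(f, arglist, std::index_sequence_for<Types...>{});
}

// Hypothetical functor that records what it was called with.
struct Recorder {
    int lastInt = 0;
    std::string lastString;
    void operator()(int a, int b) { lastInt = a + b; }
    void operator()(const std::string& a, const std::string& b) { lastString = a + b; }
};

// Hypothetical demo
inline void dispatchDemo() {
    Recorder r;
    dispatchAll<int, int>(r, {1, 5});
    assert(r.lastInt == 6);
}
```

As in the hand-written version, the argument types are assumed to have been checked beforehand; otherwise std::get or at() will throw.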

Remaining remarks

When developing a new functor, I’d like to define which methods will be “invocable” via the dispatcher-shenanigans conjured up in the last sections. The aforementioned mythical (but not forgotten) interpreter should see an invoke method that does all the black magic internally. So, let’s circle back to the first code snippet, and define what our AbstractFunctor could look like:

struct AbstractFunctor
{
private:
    std::vector<std::shared_ptr<AbstractDispatcher>> dispatchers;
public:
    virtual ~AbstractFunctor() = 0;

    bool invoke(const std::vector<variant_t>& arglist) {
        for(auto& d : dispatchers) {
            if(d->matches(arglist)) {
                d->dispatch(arglist);
                return true;
            }
        }
        return false;
    }
protected:
    template<typename... Args, class Functor>
    void makeInvocable(Functor f) {
        dispatchers.emplace_back(std::make_shared<Dispatcher<Functor, Args...>>(f));
    }
};
AbstractFunctor::~AbstractFunctor() {} // Keep class abstract, despite lacking pure virtuals

Neat little detail: unlike (primary) class templates, whose parameter pack must come last, function templates allow a pack to appear earlier in the template parameter list, so long as every parameter after it can be deduced from the function arguments. This can be exploited to write a little makeInvocable helper method that doesn’t require manually specifying the Functor type.
I can then add it to the constructor of our old friend, SumFunctor:

struct SumFunctor : AbstractFunctor
{
    // ...
    SumFunctor() {
        makeInvocable<int, int>(this); // invoke operator(int, int)
        makeInvocable<std::string, std::string>(this); // invoke operator(string, string)
    }
    //...
};
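
The pack-comes-first trick behind makeInvocable can also be tried out on its own. In the sketch below, countSignatureArgs and packOrderDemo are hypothetical names that merely mirror makeInvocable’s template parameter list.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Args... is listed first and spelled out by the caller;
// Functor comes after the pack and is deduced from the function argument.
template <typename... Args, class Functor>
constexpr std::size_t countSignatureArgs(const Functor&) {
    return sizeof...(Args);
}

// All explicit template arguments land in Args...; int is deduced as Functor.
static_assert(countSignatureArgs<std::string, std::string>(0) == 2);

// Hypothetical demo
inline void packOrderDemo() {
    auto f = [](int, int) {};
    assert((countSignatureArgs<int, int>(f) == 2)); // Functor = the lambda's type
    assert(countSignatureArgs<>(f) == 0);           // an empty pack is fine too
}
```

Swapping the two template parameters would force every caller to name the functor type explicitly, which is impossible for lambdas.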

Let’s finally put everything together:

int main(int, char**)
{
    SumFunctor sumf;
    sumf.invoke({1, 5});
    sumf.invoke({"foo", "bar"});
}

This yields the very unsurprising — but exciting — output:

SumFunctor::operator(int, int) = 6
SumFunctor::operator(string, string) = foobar

If you want to play around with this, check/modify/compile the full source-code at Wandbox (or check the GitHub Gist). Have at it!

Conclusions*

While this was a fairly long write-up, it ended up being a fairly shallow exploration of both C++ template madness and the scripting problem outlined in the introduction. There are still quite a few open problems: how to deal with return values in the functors? How to circumvent the limitations of the introduced variant_t? Can we add support for default arguments? The list goes on.

While I’ve got a few of those worked out already, others are largely unresolved. In any case, I’ll save all these ramblings for a post to come yet in this decade.

Edit: I absolutely cannot NOT mention ChaiScript, which is everything this post wishes to be when it grows up: a fully developed, type-safe scripting language for C++. Absolutely fantastic code by Jason Turner & Co. If all this nonsense piqued your interest, take a look at their GitHub!

’til next time!

*Alas, additional alliterations aren’t available anymore.