A simple C++ serialization framework

I implemented a simple C++ binary serialization framework that makes a structure or class serializable by adding a macro, usually a single line of code, as shown in the example below:

struct A
{
    bool x;
    int y;

    AWL_SERIALIZABLE(x, y)
};

There is also a macro that makes a structure or a class equatable:

AWL_MEMBERWISE_EQUATABLE(A)

The framework supports built-in types and standard C++ containers, so it can serialize not only structure A but also the following class B:

class B
{
public:

    B() : m_aggregated{}, m_set { 0, 1, 2 }, m_v{ 3, 4 }, m_a{ 'a', 'b', 'c' }, m_bs(3ul)
    {
    }

    AWL_SERIALIZABLE(m_aggregated, m_set, m_v, m_a, m_bs, m_opt, m_flag)

private:

    A m_aggregated;
    std::set<int> m_set;
    std::vector<int> m_v;
    std::array<char, 3> m_a;
    std::bitset<3> m_bs;
    std::optional<uint32_t> m_opt = 25u;
    bool m_flag = true;
};

AWL_MEMBERWISE_EQUATABLE(B)

All the source code is available on GitHub as part of the AWL library, whose IO module provides the sequential stream interface that we use for serialization:

class SequentialInputStream
{
public:

    virtual bool End() = 0;
                
    virtual size_t Read(uint8_t * buffer, size_t count) = 0;

    virtual ~SequentialInputStream() = default;
};

class SequentialOutputStream
{
public:

    virtual void Write(const uint8_t * buffer, size_t count) = 0;

    virtual ~SequentialOutputStream() = default;
};

The code below demonstrates how to serialize and deserialize class B using an implementation of the sequential stream interface based on std::vector:

    //an instance of class B to serialize
    B sample;

    std::vector<uint8_t> reusable_v;

    {
        VectorOutputStream out(reusable_v);

        Write(out, sample);
    }

    //an instance of class B to deserialize
    B result;
    
    {
        VectorInputStream in(reusable_v);

        Read(in, result);

        //ensure we have read all the data from the input stream
        //(checked here because 'in' is visible only inside this block)
        assert(in.End());
    }

    //this is where the AWL_MEMBERWISE_EQUATABLE macro comes into play
    assert(sample == result);

There is also an implementation of the sequential stream interface built on standard C++ streams, and an implementation that supports hashing and buffering; both can be used in the same way.

As you will see below, AWL uses static polymorphism, so a custom stream can be any class whose member function signatures are compatible with SequentialInputStream and SequentialOutputStream; it does not have to derive from them.
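To illustrate the static polymorphism point, here is a minimal sketch of an output stream that does not derive from SequentialOutputStream but still offers a Write member with a compatible signature. CountingOutputStream, WriteRaw and MeasureTwoInts are hypothetical names invented for this example, not part of AWL; WriteRaw only stands in for a library function that is a template over the stream type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stream: it has no base class, only a Write member with a
// signature compatible with SequentialOutputStream::Write.
class CountingOutputStream
{
public:

    void Write(const std::uint8_t * buffer, std::size_t count)
    {
        (void)buffer; // the data itself is ignored, only its size is counted
        m_count += count;
    }

    std::size_t GetCount() const { return m_count; }

private:

    std::size_t m_count = 0;
};

// Hypothetical stand-in for a library Write overload. Because the stream is
// a template parameter, any class with a suitable Write member is accepted.
template <class Stream>
void WriteRaw(Stream & s, int val)
{
    s.Write(reinterpret_cast<const std::uint8_t *>(&val), sizeof(val));
}

// Returns the number of bytes that two int values occupy in the stream.
inline std::size_t MeasureTwoInts()
{
    CountingOutputStream out;
    WriteRaw(out, 1);
    WriteRaw(out, 2);
    assert(out.GetCount() % sizeof(int) == 0);
    return out.GetCount();
}
```

A stream like this could be used to measure the size of a serialized object before allocating a buffer, with no virtual calls involved.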

How does it work?

It definitely works; below I’ll explain how.

The AWL_SERIALIZABLE macro is pretty simple: it defines member functions that return a tuple of references to the class members. For class A from the example above, it expands into this:

struct A
{
    bool x;
    int y;

    constexpr auto as_const_tuple() const
    {
        return std::tie(x, y);
    }

    constexpr auto as_tuple() const
    {
        return as_const_tuple();
    }

    constexpr auto as_tuple()
    {
        return std::tie(x, y);
    }
};

The macro reduces the serialization of a class to the serialization of a tuple of references, so we define the Read and Write functions as follows:

template <class Stream, typename T>
inline typename std::enable_if<is_tuplizable_v<T>, void>::type Read(Stream & s, T & val)
{
    Read(s, object_as_tuple(val));
}

template <class Stream, typename T>
inline typename std::enable_if<is_tuplizable_v<T>, void>::type Write(Stream & s, const T & val)
{
    Write(s, object_as_tuple(val));
}

As you have probably noticed, AWL does not call the as_tuple member function directly but goes through the object_as_tuple adapter function, which enables the serialization of third-party classes that cannot be changed (see an example):

template <class T>
inline constexpr auto object_as_tuple(T & val)
{
    return val.as_tuple();
}

template <class T>
inline constexpr auto object_as_tuple(const T & val)
{
    return val.as_tuple();
}
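As a rough illustration of the adapter idea, the sketch below opts a class that we cannot modify into the tuple protocol from the outside. Everything in it is an assumption made for the example: vendor::Point plays the role of a third-party class, and the sketch namespace defines its own is_tuplizable_v and object_as_tuple, whose real definitions in AWL may look different. It requires C++17.

```cpp
#include <cassert>
#include <tuple>

// Stands in for a class from another library that we cannot modify.
namespace vendor
{
    struct Point
    {
        int x;
        int y;
    };
}

namespace sketch
{
    // Assumed shape of the trait: false unless a type opts in.
    template <class T>
    inline constexpr bool is_tuplizable_v = false;

    // Opt the third-party type in without touching its definition.
    template <>
    inline constexpr bool is_tuplizable_v<vendor::Point> = true;

    // Non-member adapters that build the tuples of references.
    inline constexpr auto object_as_tuple(vendor::Point & val)
    {
        return std::tie(val.x, val.y);
    }

    inline constexpr auto object_as_tuple(const vendor::Point & val)
    {
        return std::tie(val.x, val.y);
    }
}

static_assert(sketch::is_tuplizable_v<vendor::Point>, "Point has opted in");

// Returns true if a write through the tuple of references reaches the object.
inline bool AdapterDemo()
{
    vendor::Point p{ 1, 2 };
    auto t = sketch::object_as_tuple(p);
    std::get<0>(t) = 10; // assigns to p.x through the reference
    assert(p.y == 2);

    const vendor::Point q{ 10, 2 };
    return sketch::object_as_tuple(p) == sketch::object_as_tuple(q);
}
```

With adapters like these in place, generic code that works on tuples of references can treat the third-party class like any class that carries the macro.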

To implement the Read and Write overloads for tuples, we use the for_each helper function, which iterates over the elements of a tuple using a C++17 fold expression:

template <typename... Args, typename Func, std::size_t... index>
inline constexpr void for_each(const std::tuple<Args...>& t, Func&& f, std::index_sequence<index...>)
{
    (f(std::get<index>(t)), ...);
}

template <typename... Args, typename Func>
inline constexpr void for_each(const std::tuple<Args...>& t, Func&& f)
{
    for_each(t, f, std::index_sequence_for<Args...>{});
}

and pass the lambdas that read and write tuple elements as its argument:

template<class Stream, typename ... Fields>
inline void Read(Stream & s, std::tuple<Fields& ...> val)
{
    for_each(val, [&s](auto& field) { Read(s, field); });
}

template<class Stream, typename ... Fields>
inline void Write(Stream & s, const std::tuple<Fields& ...> & val)
{
    for_each(val, [&s](auto& field) { Write(s, field); });
}

The rest of the work is to define Read and Write overloads for all the types we serialize, including built-in types, standard C++ containers (AWL supports most of them) and any types that require nontrivial serialization.
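To give an idea of what such overloads might look like, here is a minimal sketch for arithmetic types and std::vector. It is not AWL's actual code: the sketch namespace, the MemoryOutputStream and MemoryInputStream classes and the 64-bit element count prefix are all assumptions made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <type_traits>
#include <utility>
#include <vector>

namespace sketch
{
    // Hypothetical in-memory streams with sequential-stream-style members.
    class MemoryOutputStream
    {
    public:
        explicit MemoryOutputStream(std::vector<std::uint8_t> & v) : m_v(v) {}
        void Write(const std::uint8_t * buffer, std::size_t count)
        {
            m_v.insert(m_v.end(), buffer, buffer + count);
        }
    private:
        std::vector<std::uint8_t> & m_v;
    };

    class MemoryInputStream
    {
    public:
        explicit MemoryInputStream(const std::vector<std::uint8_t> & v) : m_v(v) {}
        bool End() { return m_pos == m_v.size(); }
        std::size_t Read(std::uint8_t * buffer, std::size_t count)
        {
            const std::size_t n = std::min(count, m_v.size() - m_pos);
            std::copy_n(m_v.begin() + static_cast<std::ptrdiff_t>(m_pos), n, buffer);
            m_pos += n;
            return n;
        }
    private:
        const std::vector<std::uint8_t> & m_v;
        std::size_t m_pos = 0;
    };

    // Arithmetic types: copy the in-memory representation as is.
    template <class Stream, typename T>
    typename std::enable_if<std::is_arithmetic<T>::value, void>::type Write(Stream & s, const T & val)
    {
        s.Write(reinterpret_cast<const std::uint8_t *>(&val), sizeof(T));
    }

    template <class Stream, typename T>
    typename std::enable_if<std::is_arithmetic<T>::value, void>::type Read(Stream & s, T & val)
    {
        if (s.Read(reinterpret_cast<std::uint8_t *>(&val), sizeof(T)) != sizeof(T))
            throw std::runtime_error("unexpected end of stream");
    }

    // std::vector: an element count (assumed 64-bit prefix), then the elements.
    template <class Stream, typename T>
    void Write(Stream & s, const std::vector<T> & v)
    {
        const std::uint64_t size = v.size();
        Write(s, size);
        for (const T & elem : v)
            Write(s, elem);
    }

    template <class Stream, typename T>
    void Read(Stream & s, std::vector<T> & v)
    {
        std::uint64_t size = 0;
        Read(s, size);
        v.clear();
        for (std::uint64_t i = 0; i < size; ++i)
        {
            T elem{};
            Read(s, elem);
            v.push_back(std::move(elem));
        }
    }
}

// Writes a vector to a byte buffer, reads it back and compares the two.
inline bool VectorRoundTrip()
{
    const std::vector<int> sample{ 3, 4, 5 };
    std::vector<std::uint8_t> bytes;
    sketch::MemoryOutputStream out(bytes);
    sketch::Write(out, sample);

    std::vector<int> result;
    sketch::MemoryInputStream in(bytes);
    sketch::Read(in, result);
    assert(in.End()); // all the written bytes have been consumed
    return sample == result;
}
```

Because the vector overloads call Write and Read for each element, nested containers such as std::vector<std::vector<int>> work in this sketch without extra code.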

As you have probably guessed, the AWL_MEMBERWISE_EQUATABLE macro compares two tuples and expands into this:

inline bool operator == (const A & left, const A & right)
{
    return awl::object_as_tuple(left) == awl::object_as_tuple(right);
}

inline bool operator != (const A & left, const A & right)
{
    return awl::object_as_tuple(left) != awl::object_as_tuple(right);
}

A similar technique is demonstrated on cppreference.com.

Definitions of the terms used

  • A type is serializable if both Read and Write function overloads are defined for that type. The AWL_SERIALIZABLE macro makes a class tuplizable, which in turn makes the class serializable, because Read and Write function overloads are defined for tuplizable classes.
  • Custom serialization of a type is implemented by defining Read and Write function overloads for that type explicitly.
  • A third-party class is a class that we cannot add the AWL_SERIALIZABLE macro to, or cannot change at all. For such classes there are two options: make the class tuplizable by implementing the object_as_tuple adapter function and the is_tuplizable_v inline variable, or implement the Read and Write function overloads explicitly.

Advantages and limitations

The advantages of this serialization technique are that it is simple, intuitive and has zero overhead, because it does not use additional data structures. On the other hand, supporting serializable types that change from one version of an application to the next is problematic. For example, if a class was modified by adding extra fields, the behavior of both

  • an older version of the application deserializing a newer version of the class
  • a newer version of the application deserializing an older version of the class

is undefined, and these scenarios will probably cause a fatal application error.

Another limitation is that this technique can serialize only a tree of objects (not a graph, not even a directed acyclic graph), because there is no mechanism that prevents an object from being serialized twice (we do not compare object references as the .NET or Java serialization engines do), so the serialization of a type like std::shared_ptr<A> can be problematic.
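The effect on shared objects can be shown with a small sketch. It does not use AWL; Node, WritePointee and ReadPointee are hypothetical, and a vector of int stands in for a byte stream to keep the example short. Two pointers that share one object before serialization end up pointing to two separate copies afterwards.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct Node
{
    int value;
};

// Naive "by value" serialization of a pointer: only the pointee is written,
// the identity of the object is not recorded anywhere.
inline void WritePointee(std::vector<int> & out, const std::shared_ptr<Node> & p)
{
    out.push_back(p->value);
}

// Every read creates a fresh object.
inline std::shared_ptr<Node> ReadPointee(const std::vector<int> & in, std::size_t & pos)
{
    return std::make_shared<Node>(Node{ in.at(pos++) });
}

// Returns true if sharing is lost after a write/read round trip.
inline bool SharingIsLost()
{
    const auto a = std::make_shared<Node>(Node{ 42 });
    const auto b = a; // a and b share one Node
    assert(a == b);

    std::vector<int> data;
    WritePointee(data, a);
    WritePointee(data, b); // the same Node is written a second time

    std::size_t pos = 0;
    const auto a2 = ReadPointee(data, pos);
    const auto b2 = ReadPointee(data, pos);

    // Equal values, but two distinct objects.
    return data.size() == 2 && a2->value == b2->value && a2 != b2;
}
```

Preserving the sharing would require assigning identifiers to objects during writing and resolving them during reading, which is the mechanism this technique does not have.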

Also, in the current implementation, the serialization format is not platform independent. For example, the size of the int type and the byte order are assumed to be the same on the writing and the reading side. In theory, a platform-independent format can be supported in future versions.
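One common way to make an integer encoding independent of the host platform is to use a fixed-width type and to write its bytes in a fixed order with shifts rather than copying memory. The sketch below shows this for a 32-bit value in little-endian order; the function names are hypothetical and this is not how AWL currently works.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Encodes a 32-bit value as four bytes, least significant byte first.
// The shifts operate on the value, not on memory, so the result does not
// depend on the byte order of the host.
inline std::array<std::uint8_t, 4> EncodeLittleEndian(std::uint32_t val)
{
    return {{
        static_cast<std::uint8_t>(val & 0xFFu),
        static_cast<std::uint8_t>((val >> 8) & 0xFFu),
        static_cast<std::uint8_t>((val >> 16) & 0xFFu),
        static_cast<std::uint8_t>((val >> 24) & 0xFFu)
    }};
}

// Rebuilds the value from four bytes in the same fixed order.
inline std::uint32_t DecodeLittleEndian(const std::array<std::uint8_t, 4> & bytes)
{
    return static_cast<std::uint32_t>(bytes[0])
        | (static_cast<std::uint32_t>(bytes[1]) << 8)
        | (static_cast<std::uint32_t>(bytes[2]) << 16)
        | (static_cast<std::uint32_t>(bytes[3]) << 24);
}

// Returns true if a value survives the encode/decode round trip.
inline bool LittleEndianRoundTrip(std::uint32_t val)
{
    const auto bytes = EncodeLittleEndian(val);
    assert(bytes.size() == 4);
    return DecodeLittleEndian(bytes) == val;
}
```

The price is a few shifts per value instead of a plain memory copy, so the approach trades a little of the zero-overhead property for portability.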

Version tolerant serialization

Currently I have a prototype of a version-tolerant serialization engine that uses similar template metaprogramming techniques and supports scenarios in which a field of a class is added, deleted or renamed, or its type is changed.

Use AWL for free

If you need AWL, feel free to fork it on GitHub, but keep in mind that version compatibility is not guaranteed and there is no warranty of any kind.
