Version tolerant serialization in C++

Lately I have been working on a C++ binary serialization framework that lets you serialize simple data structures with a few lines of code. First, you add the AWL_REFLECT macro to each of your structures, as follows:

#include "Awl/Reflection.h"
#include <string>
#include <vector>
#include <set>

struct A
{
    int a;
    bool b;
    std::string c;
    double d;

    AWL_REFLECT(a, b, c, d)
};

struct C
{
    int x;
    A a;

    AWL_REFLECT(x, a)
};

struct B
{
    A a;
    A b;
    int x;
    bool y;
    std::vector<A> v;
    std::set<C> v1;

    AWL_REFLECT(a, b, x, y, v, v1)
};
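
To give a feel for what a reflection macro of this kind has to provide, here is a minimal hand-written sketch. It assumes that the macro generates an as_tuple-style member returning references to the fields, plus a list of field names. The names Point, as_const_tuple, get_member_names and field_count are made up for illustration and are not taken from the AWL sources, so the real expansion of AWL_REFLECT may look quite different.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <tuple>
#include <utility>

// Hypothetical stand-in for what a reflection macro could generate.
struct Point
{
    int x;
    bool flag;
    std::string label;

    // References to all fields, in declaration order.
    auto as_tuple() { return std::tie(x, flag, label); }
    auto as_const_tuple() const { return std::tie(x, flag, label); }

    // Field names, in the same order as the tuple elements.
    static constexpr std::array<std::string_view, 3> get_member_names()
    {
        return { "x", "flag", "label" };
    }
};

// Comparisons fall out of the tuple representation for free.
inline bool operator==(const Point& a, const Point& b)
{
    return a.as_const_tuple() == b.as_const_tuple();
}

inline bool operator<(const Point& a, const Point& b)
{
    return a.as_const_tuple() < b.as_const_tuple();
}

// Number of reflected fields, computed from the tuple type.
template <class T>
constexpr std::size_t field_count()
{
    return std::tuple_size_v<decltype(std::declval<const T&>().as_const_tuple())>;
}

static_assert(field_count<Point>() == 3);
```

With a tuple of references and a parallel list of names, a serializer can walk the fields generically, and types such as std::set can order the structures without hand-written operators.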

Then you define Reader and Writer:

#include "Awl/Io/Vts.h"

// Generate std::variant containing all the data types participating in the serialization.
using V = awl::io::helpers::variant_from_structs<A, B>;

// Define the reader.
template <class IStream>
using Reader = awl::io::Reader<V, IStream>;

// Define the writer.
template <class OStream>
using Writer = awl::io::Writer<V, OStream>;

and serialize structures A and B with the following code:

#include <cassert>
#include <cstdint>
#include <iostream>
#include "Awl/Io/VectorStream.h"

int main()
{
    // std::vector that will contain serialized data.
    std::vector<uint8_t> v;

    const A a_expected = { 1, true, "abc", 2.0 };
    const C c_expected = { 7, a_expected };
    const B b_expected = { a_expected, a_expected, 1, true, std::vector<A>{ a_expected, a_expected, a_expected }, { c_expected } };

    try
    {
        // Write A and B.
        {
            // A stream that writes into std::vector.
            awl::io::VectorOutputStream out(v);

            // Serialization context.
            Writer<awl::io::VectorOutputStream> ctx;

            // Write automatically generated meta information first.
            ctx.WriteNewPrototypes(out);

            // Write A and B.
            ctx.WriteV(out, a_expected);
            ctx.WriteV(out, b_expected);
        }

        // Read A and B.
        {
            // A stream that reads from std::vector.
            awl::io::VectorInputStream in(v);

            // Serialization context.
            Reader<awl::io::VectorInputStream> ctx;

            // Read the meta information first.
            ctx.ReadOldPrototypes(in);

            // Read A.
            A a;
            ctx.ReadV(in, a);
            assert(a == a_expected);

            // Read B.
            B b;
            ctx.ReadV(in, b);
            assert(b == b_expected);

            // Ensure we have read the entire stream.
            assert(in.End());
        }
    }
    catch (const awl::io::IoException& e)
    {
        std::cout << "IO error: " << e.What() << std::endl;

        return 1;
    }

    return 0;
}

The serialization is version tolerant: if you write structures A and B to a file and then, in a new version of your software, add new fields to them or delete some fields, you can still read the new A and B from that file. The only inconvenience with deletion is that the type of the deleted field must still be included in the std::variant. For example, if you delete field c from structure A, you define the std::variant in the new version of your software as follows:

// Include std::string in the variant to make the serialization engine aware of the type of the deleted field.
using V = awl::io::helpers::variant_from_structs<A, B, std::string>;

However, if you delete field b from structure A, you do not need to add bool to the std::variant, because another field of type bool still exists in structure B, so bool is included automatically. Likewise, if you delete v1 from B, you do not need to add std::set<C> to the std::variant, because std::vector<C> and std::set<C> are identical at the metadata level: both are sequence<C>.
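
The idea of version tolerance can be illustrated with a toy example that does not use AWL at all. It assumes a made-up wire format in which the writer stores a list of field names and type tags (a "prototype") before the values; OldA, NewA, FieldType, WriteOld and ReadNew are hypothetical names. The sketch also shows why the reader must know the type of a deleted field: it still has to skip that field's bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Toy type tags; a real framework would derive type names from the C++ types.
enum class FieldType : std::uint8_t { Int, String };

struct FieldInfo
{
    std::string name;
    FieldType type = FieldType::Int;
};

void WriteInt(Bytes& out, std::int32_t v)
{
    std::uint8_t raw[sizeof(v)];
    std::memcpy(raw, &v, sizeof(v));
    out.insert(out.end(), raw, raw + sizeof(v));
}

std::int32_t ReadInt(const Bytes& in, std::size_t& pos)
{
    std::int32_t v = 0;
    std::memcpy(&v, in.data() + pos, sizeof(v));
    pos += sizeof(v);
    return v;
}

void WriteString(Bytes& out, const std::string& s)
{
    WriteInt(out, static_cast<std::int32_t>(s.size()));
    out.insert(out.end(), s.begin(), s.end());
}

std::string ReadString(const Bytes& in, std::size_t& pos)
{
    const auto n = static_cast<std::size_t>(ReadInt(in, pos));
    std::string s(reinterpret_cast<const char*>(in.data()) + pos, n);
    pos += n;
    return s;
}

// Version 1 of the structure, and version 2 with `c` deleted and `e` added.
struct OldA { std::int32_t a; std::string c; std::int32_t d; };
struct NewA { std::int32_t a = 0; std::int32_t d = 0; std::int32_t e = 42; };

// Writes the field list ("prototype") first, then the field values.
Bytes WriteOld(const OldA& v)
{
    const std::vector<FieldInfo> proto = {
        {"a", FieldType::Int}, {"c", FieldType::String}, {"d", FieldType::Int} };
    Bytes out;
    WriteInt(out, static_cast<std::int32_t>(proto.size()));
    for (const FieldInfo& f : proto)
    {
        WriteString(out, f.name);
        out.push_back(static_cast<std::uint8_t>(f.type));
    }
    WriteInt(out, v.a);
    WriteString(out, v.c);
    WriteInt(out, v.d);
    return out;
}

// Reads version 1 data into the version 2 structure, guided by the stored prototype.
NewA ReadNew(const Bytes& in)
{
    std::size_t pos = 0;
    std::vector<FieldInfo> old_proto(static_cast<std::size_t>(ReadInt(in, pos)));
    for (FieldInfo& f : old_proto)
    {
        f.name = ReadString(in, pos);
        f.type = static_cast<FieldType>(in[pos++]);
    }
    NewA v; // fields missing from the stream keep their defaults
    for (const FieldInfo& f : old_proto)
    {
        if (f.type == FieldType::String)
        {
            ReadString(in, pos); // NewA has no string fields: skip the value
            continue;
        }
        const std::int32_t x = ReadInt(in, pos);
        if (f.name == "a") v.a = x;
        else if (f.name == "d") v.d = x;
    }
    return v;
}
```

Reading the bytes produced by WriteOld({1, "abc", 2}) yields a NewA with a and d restored from the stream, the deleted string skipped, and the new field e left at its default value.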

You can even rename a field by specializing the FieldMap class template. The code below renames B::v1 to B::v2:

#include <string_view>

namespace awl::io
{
    template <>
    class FieldMap<B>
    {
    public:

        static std::string_view GetNewName(std::string_view old_name)
        {
            using namespace std::literals;

            if (old_name == "v1"sv)
            {
                return "v2"sv;
            }

            return old_name;
        }
    };
}
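
How such a name map might be consumed can be sketched without the framework. The snippet below assumes that the reader matches each field name found in the old data against the field list of the new structure after passing it through the rename rule. FindNewIndex and new_fields are hypothetical names, and the free function GetNewName only mirrors the shape of the member function above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical rename rule, shaped like the GetNewName function above.
constexpr std::string_view GetNewName(std::string_view old_name)
{
    return old_name == "v1" ? std::string_view("v2") : old_name;
}

// Finds the position of an old field in the new field list, after renaming.
// Returns std::nullopt if the field does not exist in the new version.
template <std::size_t N>
constexpr std::optional<std::size_t> FindNewIndex(
    const std::array<std::string_view, N>& new_names, std::string_view old_name)
{
    const std::string_view name = GetNewName(old_name);

    for (std::size_t i = 0; i < N; ++i)
    {
        if (new_names[i] == name)
        {
            return i;
        }
    }

    return std::nullopt;
}

// Field names of a new structure version: v1 was renamed to v2 and y was deleted.
constexpr std::array<std::string_view, 3> new_fields = { "x", "v", "v2" };

static_assert(FindNewIndex(new_fields, "v1") == 2);
static_assert(FindNewIndex(new_fields, "x") == 0);
static_assert(!FindNewIndex(new_fields, "y").has_value());
```

Because the lookup is constexpr, the mapping for a known pair of versions can be checked at compile time, as the static_assert lines do.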

If you add a new field, no action is required as long as the type of the new field is known to the framework. If it is not, you specialize its type descriptor and overload the Read and Write functions, as the code below does for std::optional:

#include <optional>

namespace awl::io
{
    template <class T>
    struct type_descriptor<std::optional<T>>
    {
        static constexpr auto name()
        {
            return fixed_string("optional<") + make_type_name<T>() + fixed_string(">");
        }
    };

    static_assert(make_type_name<std::optional<std::string>>() == fixed_string("optional<sequence<int8_t>>"));

    template <class Stream, typename T, class Context = FakeContext>
        requires sequential_input_stream<Stream>
    void Read(Stream & s, std::optional<T>& opt_val, const Context & ctx = {})
    {
        bool has_value = false;

        Read(s, has_value, ctx);

        if (has_value)
        {
            T val;

            Read(s, val, ctx);

            opt_val = std::move(val);
        }
        else
        {
            // Clear any value left over from before the read.
            opt_val.reset();
        }
    }

    template <class Stream, typename T, class Context = FakeContext>
        requires sequential_output_stream<Stream>
    void Write(Stream & s, const std::optional<T>& opt_val, const Context & ctx = {})
    {
        const bool has_value = opt_val.has_value();

        Write(s, has_value, ctx);

        if (has_value)
        {
            Write(s, opt_val.value(), ctx);
        }
    }
}

Advantages

  • This serialization technique is simple and intuitive, has close to zero overhead, and performs comparably to std::memmove.
  • It works directly with C++ structures and does not require additional wrappers or code generators, as Protobuf does, for example, and so it can also serialize class templates.

Limitations

  • It is neither cross-language nor cross-platform. For example, the representation of arithmetic types depends on the platform, because the framework simply accesses their bytes by casting their addresses to uint8_t* with reinterpret_cast.
  • It can serialize only a tree of objects (not a graph, not even a directed graph), because there is no mechanism that prevents an object from being serialized twice (object references are not compared, as the .NET and Java serialization engines do), so serializing a type like std::shared_ptr can be problematic.
  • It requires all the types participating in the serialization to be default constructible, which can be problematic in certain scenarios, for example, when a std::vector must be initialized with an allocator instance or a std::set with a comparator instance.

Future Improvements

Non-default constructible types

We need to invent a mechanism for handling types that are not default constructible. Assume we have two sets of the same type with different comparators:

#include <set>
#include <vector>

class Compare
{
public:

    Compare(bool less) : m_less(less) {}

    bool operator () (int a, int b) const
    {
        if (m_less)
        {
            return a < b;
        }

        return b < a;
    }

private:

    const bool m_less;
};

using Set = std::set<int, Compare>;

struct A
{
    Set s;
    std::vector<Set> v;

    AWL_REFLECT(s, v)
};

int main()
{
    A a { Set(Compare(true)), std::vector{Set(Compare(false))}};

    // ...

    awl::io::Write(<some-stream>, a);

    // ..

    // How to read it?

    return 0;
}

How do we read structure A from a stream? Should the comparators be serialized or not? If they should, the next question is: what about the allocators?
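
One possible direction, given here only as an assumption and not as the framework's design, is to leave the comparator out of the stream and read into a container that the caller has already constructed. In the sketch below, Serialized, WriteSet and ReadSet are hypothetical, and a std::vector<int> stands in for the serialized sequence.

```cpp
#include <cassert>
#include <set>
#include <vector>

class Compare
{
public:

    explicit Compare(bool less) : m_less(less) {}

    bool operator () (int a, int b) const
    {
        return m_less ? a < b : b < a;
    }

private:

    bool m_less;
};

using Set = std::set<int, Compare>;

// Stand-in for the serialized form of a set: just the sequence of elements.
using Serialized = std::vector<int>;

Serialized WriteSet(const Set& s)
{
    return Serialized(s.begin(), s.end());
}

// Refills a set that was constructed by the caller with the right comparator.
// The comparator itself is never written to or read from the data.
void ReadSet(const Serialized& data, Set& s)
{
    s.clear();
    s.insert(data.begin(), data.end());
}
```

This covers the member s, which its owner can construct in advance, but not the elements of std::vector<Set>, which the reader has to create itself, so the question above remains open for nested containers.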

Further usage of C++20 concepts

Another improvement is to turn is_tuplizable_v and is_reflectable_v from boolean variables into concepts, as in the sample code below:

#include <concepts>
#include <iostream>

class A
{
public:

    int foo() { return 25; }
};

template <class T>
concept self_fooable = requires(T& t)
{
    t.foo();
};

static_assert(self_fooable<A>);

template <class T> requires self_fooable<T>
constexpr auto object_as_foo(T& val)
{
    return val.foo();
}

template <class T>
concept fooable = requires(T & t)
{
    object_as_foo(t);
};

static_assert(fooable<A>);

class B {};

constexpr auto object_as_foo(B&)
{
    return 1;
}

static_assert(fooable<B>);

int main()
{
    A a;

    std::cout << object_as_foo(a) << std::endl;
    
    return 0;
}

We need a separate self_fooable concept because as_tuple cannot be turned from a member function into a friend function: that would require passing the structure name to the AWL_REFLECT macro as a parameter.

It is not yet clear how to define a serializable concept. For example, the code below is not quite correct, because Read and Write accept different stream types:

template <class Stream, class T>
concept serializable = requires(Stream& s, T& val)
{
    Read(s, val);
    Write(s, std::as_const(val));
};

Reflection for C++26

In the distant future, when C++ supports reflection, we will probably get rid of the AWL_REFLECT macro.

Source Code

The framework is part of the AWL library, available on GitHub. Feel free to clone and test it.

Links:

1 Response to Version tolerant serialization in C++

  1. dmitriano says:

    JSON and other formats
    https://github.com/getml/reflect-cpp
