Sample C++ code demonstrating why int is not atomic

The code below demonstrates that a 4-byte value being written by another thread is not guaranteed to be read as either its original or its final value; it can also be read “partially written”:

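// Place a 4-byte value so that it starts 2 bytes before the end of a
// 64-byte-aligned block and therefore straddles the 64-byte boundary.
static constexpr int offset=2;
alignas(64) char vars[64+4-offset];
static volatile unsigned * const p = reinterpret_cast<unsigned *>(&vars[64-offset]);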

unsigned getVar()
{
    return *p;
}

void loop()
{
    while(true)
    {
        *p = -1;
        *p = 0;
    }
}

#include <thread>
#include <iostream>
#include <iomanip>
#include <cstdlib>
#include <map>

int main()
{
    std::thread thread(loop);
    std::map<unsigned,int> xs;
    for(int i=0;i<10000000;++i)
    {
        const auto x=getVar();
        ++xs[x];
    }
    for(const auto& x : xs)
        std::cout << std::setfill('0') << std::setw(8) << std::hex << x.first << ": " << std::dec << x.second << " times\n";
    std::exit(0); // exit, killing the thread without abnormal termination via std::terminate
}

The code can be compiled with the following command, for example:

g++-5 -std=c++11 -O3 -Wall -pedantic-errors test.cpp -pthread -o test

Although loop() writes only zeros and minus ones, we also read values that are half ones and half zeros:

00000000: 5002914 times
0000ffff: 1478 times
ffff0000: 1975 times
ffffffff: 4993633 times

The key point here is that the value straddles a cache-line boundary (a cache line is 64 bytes on most architectures). The volatile keyword prevents the stores in loop() from being optimized away.
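To make the boundary condition concrete, here is a minimal sketch that checks whether an object of a given size crosses a cache-line boundary. The helper names straddlesCacheLine and pointerStraddlesCacheLine are hypothetical (they are not part of the program above), and the 64-byte line size is an assumption that should be adjusted for the target CPU.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true if the bytes [address, address + size) span
// more than one cache line. lineSize = 64 is an assumption, not a constant
// guaranteed by the C++ standard.
constexpr bool straddlesCacheLine(std::uintptr_t address, std::size_t size,
                                  std::size_t lineSize = 64)
{
    // Compare the line index of the first byte with that of the last byte.
    return address / lineSize != (address + size - 1) / lineSize;
}

// Same check for a real pointer (not constexpr because of the cast).
inline bool pointerStraddlesCacheLine(const void* p, std::size_t size,
                                      std::size_t lineSize = 64)
{
    assert(size > 0 && lineSize > 0);
    return straddlesCacheLine(reinterpret_cast<std::uintptr_t>(p), size, lineSize);
}

// With offset=2, the value occupies bytes 62..65 of the aligned block.
static_assert(straddlesCacheLine(62, 4), "bytes 62..65 cross the 64-byte boundary");
static_assert(!straddlesCacheLine(60, 4), "bytes 60..63 stay within one line");
```

Calling pointerStraddlesCacheLine(&vars[64-offset], 4) on the array from the program above should return true with offset=2. With offset=4 the value fits entirely in the first line, which is the case where one would not expect to see the half-written values on typical x86 hardware.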

If we use inline assembly to ensure that the value is read and written with a single instruction, we get the same effect:

static constexpr int offset=2;
alignas(64) char vars[64+4-offset];

unsigned getVar()
{
    unsigned result;
    asm volatile("mov %0, dword ptr [%1]\n":"=r"(result):"r"(&vars[64-offset]));
    return result;
}

void loop()
{
    asm volatile(R"(
    1:
        mov dword ptr [%0],-1
        mov dword ptr [%0],0
        jmp 1b
    )"::"r"(&vars[64-offset]));
}

#include <thread>
#include <iostream>
#include <iomanip>
#include <cstdlib>
#include <map>

int main()
{
    std::thread thread(loop);
    std::map<unsigned,int> xs;
    for(int i=0;i<10000000;++i)
    {
        const auto x=getVar();
        ++xs[x];
    }
    for(const auto& x : xs)
        std::cout << std::setfill('0') << std::setw(8) << std::hex << x.first << ": " << std::dec << x.second << " times\n";
    std::exit(0); // exit, killing the thread without abnormal termination via std::terminate
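}

This version can be compiled with the following command (note -masm=intel, since the inline assembly uses Intel syntax):

g++-5 -std=c++11 -O3 -Wall -pedantic-errors test.cpp -masm=intel -pthread -o test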

The number of half-zero and half-one reads can vary with both variants:

00000000: 4394110 times
0000ffff: 1151 times
ffff0000: 501387 times
ffffffff: 5103352 times

In C#, by contrast, reads and writes of primitive types that are 32 bits or smaller are guaranteed to be atomic.
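In C++, the comparable guarantee comes from std::atomic rather than from plain int. The following is a minimal sketch of the same experiment using std::atomic<unsigned>. The function sampleAtomic is a hypothetical helper, not part of the programs above, and relaxed memory ordering is chosen here only because the experiment cares about tearing, not about ordering.

```cpp
#include <atomic>
#include <cassert>
#include <map>
#include <thread>

// Hypothetical helper: counts the values a reader observes while a writer
// thread toggles a properly aligned std::atomic<unsigned> between all-ones
// and zero.
std::map<unsigned, int> sampleAtomic(int reads)
{
    assert(reads > 0);
    std::atomic<unsigned> var{0};
    std::atomic<bool> stop{false};

    std::thread writer([&] {
        while (!stop.load(std::memory_order_relaxed))
        {
            var.store(~0u, std::memory_order_relaxed);
            var.store(0u, std::memory_order_relaxed);
        }
    });

    std::map<unsigned, int> xs;
    for (int i = 0; i < reads; ++i)
        ++xs[var.load(std::memory_order_relaxed)];

    // Stop the writer cleanly instead of killing it with std::exit.
    stop.store(true, std::memory_order_relaxed);
    writer.join();
    return xs;
}
```

Printing the result of sampleAtomic(10000000) in the same way as in main() above should list only the all-zeros and all-ones values, because every load of a std::atomic object returns a value that some store actually wrote. Like the original programs, it needs -pthread when built with g++.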
