The code below demonstrates that a 4-byte value being written by another thread is not guaranteed to be read as either the original or the final value: it can also be read "partially written":
    #include <cstdlib>
    #include <iomanip>
    #include <iostream>
    #include <map>
    #include <thread>

    static constexpr int offset = 2;
    alignas(64) char vars[64 + 4 - offset];
    static volatile unsigned* const p = reinterpret_cast<unsigned*>(&vars[64 - offset]);

    unsigned getVar() { return *p; }

    void loop()
    {
        while (true) {
            *p = -1;
            *p = 0;
        }
    }

    int main()
    {
        std::thread thread(loop);
        std::map<unsigned, int> xs;
        for (int i = 0; i < 10000000; ++i) {
            const auto x = getVar();
            ++xs[x];
        }
        for (const auto& x : xs)
            std::cout << std::setfill('0') << std::setw(8) << std::hex << x.first
                      << ": " << std::dec << x.second << " times\n";
        std::exit(0); // exit, killing the thread without abnormal termination via std::terminate
    }
The code can be compiled with the following command, for example:
    g++-5 -std=c++11 -O3 -Wall -pedantic-errors test.cpp -pthread -o test
Although loop() writes only 0 and -1, the reader also sees values that are half ones and half zeros:

    00000000: 5002914 times
    0000ffff: 1478 times
    ffff0000: 1975 times
    ffffffff: 4993633 times
The key point here is that the value straddles a cache-line boundary (a cache line is 64 bytes on most architectures), so the 4-byte access touches two cache lines. The volatile keyword prevents the compiler from optimizing the stores in loop() away.
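To make the boundary arithmetic concrete, here is a minimal sketch that checks whether an access crosses a cache line. The names `kCacheLine` and `straddles` are hypothetical helpers, not part of the code above, and the 64-byte line size is an assumption that holds on typical x86-64 hardware but not everywhere.

```cpp
#include <cstddef>

// Assumed cache-line size; 64 bytes is typical on x86-64 but not universal.
constexpr std::size_t kCacheLine = 64;

// Hypothetical helper: returns true if an object of `size` bytes starting at
// byte offset `start` (measured from a cache-line-aligned base) crosses a
// cache-line boundary.
constexpr bool straddles(std::size_t start, std::size_t size) {
    return size != 0 && (start / kCacheLine) != ((start + size - 1) / kCacheLine);
}

// offset = 2 as in the example: bytes 62..65 span two cache lines.
static_assert(straddles(64 - 2, 4), "bytes 62..65 cross the boundary");
// offset = 4 would keep the value inside one line: bytes 60..63.
static_assert(!straddles(64 - 4, 4), "bytes 60..63 stay in one line");
```

With offset=2 the value occupies the last two bytes of one line and the first two of the next, which matches the 0000ffff and ffff0000 patterns in the output.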
If we use inline assembly to ensure the value is read and written with a single instruction, we get the same effect:
    #include <cstdlib>
    #include <iomanip>
    #include <iostream>
    #include <map>
    #include <thread>

    static constexpr int offset = 2;
    alignas(64) char vars[64 + 4 - offset];

    unsigned getVar()
    {
        unsigned result;
        asm volatile("mov %0, dword ptr [%1]\n" : "=r"(result) : "r"(&vars[64 - offset]));
        return result;
    }

    void loop()
    {
        asm volatile(R"(
        1:
            mov dword ptr [%0],-1
            mov dword ptr [%0],0
            jmp 1b
        )" ::"r"(&vars[64 - offset]));
    }

    int main()
    {
        std::thread thread(loop);
        std::map<unsigned, int> xs;
        for (int i = 0; i < 10000000; ++i) {
            const auto x = getVar();
            ++xs[x];
        }
        for (const auto& x : xs)
            std::cout << std::setfill('0') << std::setw(8) << std::hex << x.first
                      << ": " << std::dec << x.second << " times\n";
        std::exit(0); // exit, killing the thread without abnormal termination via std::terminate
    }
This version needs -masm=intel, because the inline assembly is written in Intel syntax:

    g++-5 -std=c++11 -O3 -Wall -pedantic-errors test.cpp -masm=intel -pthread -o test
The number of half-zero, half-one reads varies from run to run with both versions:

    00000000: 4394110 times
    0000ffff: 1151 times
    ffff0000: 501387 times
    ffffffff: 5103352 times
In C#, for example, reads and writes of primitive types of 32 bits or smaller are atomic.
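For comparison, C++ gives a similar guarantee only through std::atomic. The following is a minimal sketch, not part of the experiment above: `count_torn_reads` is a hypothetical helper that repeats the same write/read pattern on a std::atomic<unsigned> and counts reads that are neither 0 nor 0xffffffff. Because atomic loads and stores are indivisible, the count is expected to be zero.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper (not from the code above): counts reads of a
// std::atomic<unsigned> that are neither 0 nor 0xffffffff while another
// thread keeps toggling it between those two values.
int count_torn_reads(int iterations) {
    assert(iterations >= 0);
    std::atomic<unsigned> var{0};   // the implementation aligns this suitably
    std::atomic<bool> stop{false};  // lets the writer thread finish cleanly

    std::thread writer([&] {
        while (!stop.load(std::memory_order_relaxed)) {
            var.store(~0u, std::memory_order_relaxed);
            var.store(0u, std::memory_order_relaxed);
        }
    });

    int torn = 0;
    for (int i = 0; i < iterations; ++i) {
        const unsigned x = var.load(std::memory_order_relaxed);
        if (x != 0u && x != ~0u) ++torn;  // a partially written value
    }

    stop.store(true, std::memory_order_relaxed);
    writer.join();
    return torn;
}
```

Calling, say, `count_torn_reads(10000000)` from main() mirrors the loop in the examples above. Relaxed ordering is enough here because only the indivisibility of each access matters, not its ordering relative to other memory operations.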