The following C++ code compares the performance of std::atomic and std::mutex:
#include <atomic>
#include <mutex>
#include <iostream>
#include <chrono>
#include <thread>
const size_t size = 100000000;
std::mutex mutex;
bool var = false;
typedef std::chrono::high_resolution_clock Clock;
void testA()
{
    std::atomic<bool> sync(true);
    const auto start_time = Clock::now();
    for (size_t counter = 0; counter < size; counter++)
    {
        var = sync.load();
        //sync.store(true);
        //sync.exchange(true);
    }
    const auto end_time = Clock::now();
    std::cout << 1e-6*std::chrono::duration_cast<std::chrono::microseconds>(end_time - start_time).count() << " s\n";
}
void testB()
{
    const auto start_time = Clock::now();
    for (size_t counter = 0; counter < size; counter++)
    {
        std::unique_lock<std::mutex> lock(mutex);
        var = !var;
    }
    const auto end_time = Clock::now();
    std::cout << 1e-6*std::chrono::duration_cast<std::chrono::microseconds>(end_time - start_time).count() << " s\n";
}
int main()
{
    std::thread t1(testA);
    t1.join();
    std::thread t2(testB);
    t2.join();
}
The code can be compiled with GCC using the following command:
g++ -std=c++11 -pthread -O3 test.cpp -o test
On an x86-64 platform (Intel(R) Core(TM) i5-4460 CPU @ 3.20GHz, family 6, model 60, 4 cores, Ubuntu 14.04.5 LTS, Trusty Tahr), the output is:
0.044558 s
1.90761 s
So, in this uncontended, single-threaded test, std::atomic::load() is roughly 43 times faster than locking and unlocking a std::mutex. Interestingly, the -O3 option makes a significant difference; without it the output is:
0.584058 s
3.091 s
std::lock_guard and std::unique_lock take the same time, and std::mutex and std::recursive_mutex are also very close.
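The comparison between locking styles can be reproduced with a small harness. The following is only a sketch, not the code used for the measurements above: the names time_loop, LockTimings, and compare_lock_styles are hypothetical, and std::chrono::steady_clock is swapped in for high_resolution_clock as an assumption about what suits interval timing.

```cpp
#include <chrono>
#include <cstddef>
#include <mutex>

// Hypothetical helper (not from the original post): calls `body`
// `iterations` times and returns the elapsed time in seconds.
template <typename Body>
double time_loop(std::size_t iterations, Body body)
{
    using Clock = std::chrono::steady_clock;
    const auto start = Clock::now();
    for (std::size_t i = 0; i < iterations; ++i)
        body();
    return std::chrono::duration<double>(Clock::now() - start).count();
}

// Hypothetical result type: seconds spent per locking style.
struct LockTimings
{
    double lock_guard_s;
    double unique_lock_s;
    double manual_s;
};

// Toggles `flag` under the same mutex using three locking styles.
LockTimings compare_lock_styles(std::size_t iterations, bool& flag)
{
    std::mutex m;
    LockTimings t;
    t.lock_guard_s = time_loop(iterations, [&] {
        std::lock_guard<std::mutex> guard(m);
        flag = !flag;
    });
    t.unique_lock_s = time_loop(iterations, [&] {
        std::unique_lock<std::mutex> lock(m);
        flag = !flag;
    });
    t.manual_s = time_loop(iterations, [&] {
        m.lock();
        flag = !flag;
        m.unlock();
    });
    return t;
}
```

Calling compare_lock_styles(size, var) and printing the three fields would give one timing per style. Like the test above, this runs on a single thread, so it measures only uncontended locking.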


I’m afraid you’re comparing two routines that don’t do the same thing.
The line ‘var = sync.load();’ doesn’t change the state of ‘var’ the way the other test does.
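One way to make the two loops do the same work is to have both toggle a flag, once with an atomic read-modify-write and once under a mutex. This is a sketch under assumptions, not the post's code: the function names are hypothetical, and std::atomic<unsigned> is used instead of std::atomic<bool> because fetch_xor is only provided for integral atomics.

```cpp
#include <atomic>
#include <chrono>
#include <cstddef>
#include <mutex>

// Hypothetical: toggles an atomic flag `iterations` times, stores the
// elapsed seconds in `seconds`, and returns the final flag value (0 or 1).
unsigned toggle_atomic(std::size_t iterations, double& seconds)
{
    std::atomic<unsigned> flag(0);
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iterations; ++i)
        flag.fetch_xor(1u);  // atomic read-modify-write, not just a load
    seconds = std::chrono::duration<double>(
                  std::chrono::steady_clock::now() - start).count();
    return flag.load();
}

// Hypothetical: the same toggle, protected by a mutex instead.
unsigned toggle_mutex(std::size_t iterations, double& seconds)
{
    std::mutex m;
    unsigned flag = 0;
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iterations; ++i)
    {
        std::lock_guard<std::mutex> guard(m);
        flag ^= 1u;
    }
    seconds = std::chrono::duration<double>(
                  std::chrono::steady_clock::now() - start).count();
    return flag;
}
```

Both functions perform one state change per iteration, so their timings compare like with like. An atomic read-modify-write is generally more expensive than a plain load, so the gap may be smaller than the one reported in the post.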
Theoretically ‘var’ can be removed from the code, but I probably used it to prevent something from being optimized away.
It’s a bit unfair to use std::unique_lock, because it has to go through an additional indirection to reach the mutex. std::lock_guard or direct use of std::mutex::lock/unlock would make a better comparison.
I am not sure; you are probably right, but read to the end of the post. I used both and did not notice a significant difference.