C++11 CLH Lock Implementation
As a hobbyist programmer I have been reading "The Art of Multiprocessor Programming" (Herlihy, Maurice; Shavit, Nir) and converting the Java spin locks into C++ - I have enjoyed this and learned a lot. I have implemented the CLH spin queue lock (due to Travis Craig, Erik Hagersten, and Anders Landin), but I feel unhappy about the way my raw pointers interact with smart pointers, and would be grateful for code review advice.
As I understand it, I must use a smart pointer, in this case std::unique_ptr, for my thread-local node pointer, or the nodes will never be deallocated and the clh_lock will leak. But...
I also have no choice but to use a raw atomic tail queue pointer in order to access the atomic exchange member function.
Therefore, I have to come up with some combination, raw or smart, for the prior node pointer and shuttle it between the raw tail and the smart local_node pointers. (I settled on a raw prior_node pointer, as this gave the least ugly - though still ugly - code in the unlock member function.) The juggling boils down to the hand-off sketched below.
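In isolation, the hand-off looks like this (a sketch reusing the names from the full code further down; the comments are mine):

    // In lock(): std::atomic<node_t*> trades only raw pointers, so exchange takes
    // the raw view of our owned node and hands back the predecessor's raw node.
    prior_node = tail.exchange(local_node.get());

    // In unlock(): recycle the predecessor's now-unused node as our next node.
    node_t* tmp = prior_node;
    prior_node = local_node.release();  // our node stays live in the queue for our successor
    local_node.reset(tmp);              // adopt the recycled node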
Or am I missing something?
I have verified the lock on a refurbished HP Z600 workstation with 24 logical cores and measured its performance. It has proved correct, and also efficient when compared to std::mutex. Further, according to the MSVC profiler, it has no memory leaks.
The implementation uses two general-purpose header files:
A cross-platform PAUSE function, to avoid the undesirable effects of an empty spin-wait loop such as branch mis-prediction, thread starvation on hyper-threaded processors, and overly high power consumption.
A (naive) cache line information file.
Here is the cpu_relax header file:
    namespace sync {

    #if (COMPILER == LLVM)  // other compiler branches elided in this excerpt
        inline static void cpu_relax()
        {
            // PAUSE hints to the core that this is a spin-wait loop; the "memory"
            // clobber keeps the compiler from hoisting the spin variable out of the loop.
            asm volatile("pause\n" : : : "memory");
        }
    #endif

    }
Here is the cache_info header file:
    namespace sync {

    // Almost uniformly 64 bytes on Intel and AMD processors at all cache levels;
    // it will do for now.
    #define CACHELINE_SIZE 64

    }
Here is the lock's header file:
    #include <atomic>
    #include <cstdint>
    #include <memory>
    #include <utility>
    // plus the cpu_relax and cache_info headers shown above

    namespace sync {

    /**
     * @brief The Craig-Landin-Hagersten (CLH) lock is due to Travis Craig, Erik Hagersten,
     * and Anders Landin [2,3]. Like the Anderson lock it uses the queue-lock strategy;
     * however, unlike Anderson's bounded queue, it uses an unbounded and, interestingly,
     * implicit linked list of synchronisation variables. Being queue based, it also offers
     * the guarantees of first-come-first-served fairness and no thread starvation.
     *
     * [2] T. Craig. Building FIFO and priority-queueing spin locks from atomic swap.
     *     Technical Report TR 93-02-02, University of Washington, Department of
     *     Computer Science, February 1993.
     * [3] P. Magnussen, A. Landin, and E. Hagersten. Queue locks on cache coherent
     *     multiprocessors. In Proc. of the Eighth International Symposium on Parallel
     *     Processing (IPPS), pp. 165-171, Vancouver, April 1994. IEEE Computer Society.
     */
    class clh_lock
    {
        // Neat cacheline-padding trick by David Geier (geidav.wordpress.com).
        // Cacheline padding is essential to avoid false sharing: each node's
        // flag occupies its own cache line.
        using node_t = std::pair<bool, uint8_t[CACHELINE_SIZE - sizeof(bool)]>;
        static_assert(sizeof(node_t) == CACHELINE_SIZE, "node_t must fill exactly one cache line");

    public:
        clh_lock() : tail(new node_t) {}

        void lock()
        {
            local_node->first = true;                      // announce: I hold or want the lock
            prior_node = tail.exchange(local_node.get());  // splice our node onto the implicit queue
            while (prior_node->first)                      // spin on the predecessor's flag
                cpu_relax();
        }

        void unlock()
        {
            local_node->first = false;  // release our successor
            // Recycle the predecessor's (now unused) node as our next local node.
            node_t* tmp = prior_node;
            prior_node = local_node.release();
            local_node.reset(tmp);
        }

        ~clh_lock()
        {
            delete tail.load();  // tail is std::atomic<node_t*>, so load the raw pointer first
        }

    private:
        std::atomic<node_t*> tail;
        static thread_local std::unique_ptr<node_t> local_node;
        static thread_local node_t* prior_node;
    };

    }
Here is the implementation file:
    namespace sync {

    thread_local std::unique_ptr<clh_lock::node_t> clh_lock::local_node(new clh_lock::node_t);
    thread_local clh_lock::node_t* clh_lock::prior_node = nullptr;

    }
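For reference, a minimal usage sketch (illustrative only, not part of the reviewed code): since clh_lock exposes lock() and unlock(), it satisfies BasicLockable and can be dropped straight into std::lock_guard:

    #include <mutex>   // std::lock_guard
    #include <thread>

    // Illustrative driver: two threads bump a shared counter under the CLH lock.
    int main()
    {
        sync::clh_lock lock;
        long counter = 0;

        auto worker = [&]
        {
            for (int i = 0; i < 100000; ++i)
            {
                std::lock_guard<sync::clh_lock> guard(lock);  // calls lock()/unlock() via RAII
                ++counter;
            }
        };

        std::thread a(worker), b(worker);
        a.join();
        b.join();
        // counter == 200000 exactly, if the lock provides mutual exclusion
    }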
Here is part of the verification and timing code, which performs a number of thread-thrashing timing runs and saves the data out as a CSV file for RStudio:
    #include <atomic>
    #include <cassert>
    #include <fstream>
    #include <iostream>
    #include <thread>
    #include <vector>
    // plus the clh_lock header and the clocks::stopwatch utility

    // Thread-thrashing tests.
    clocks::stopwatch<> timer;
    int samples{100};
    int nthreads{25};
    int ftor{1000};
    std::ofstream df{"clh_lock.csv"};

    using lock_t = sync::clh_lock;  // the lock under test
    lock_t l;
    std::vector<int> N;
    std::atomic_bool go{false};

    auto f = [&](int n) -> void
    {
        while (!go);                 // hold all threads at the start line
        for (int i{0}; i < ftor; ++i)
        {
            l.lock();
            N.push_back(n);          // the critical section
            l.unlock();
        }
    };

    // CSV header row: one column per thread count.
    for (int j{2}; j < nthreads; ++j)
        df << j << ((j < nthreads - 1) ? "," : "\n");

    for (int n{0}; n < samples; ++n)
    {
        std::cout << n << " ";
        for (int j{2}; j < nthreads; ++j)
        {
            std::vector<std::thread> v;
            int thds{j};
            std::cout << thds << ",";
            N.clear();
            go = false;              // re-arm the start line for this run
            for (int i{0}; i < thds; ++i) v.emplace_back(f, i + 1);
            timer.start();
            go = true;
            for (auto& t : v) t.join();
            timer.stop();

            // Verify no lost updates: thread i pushed ftor copies of i,
            // so the total must be ftor * (1 + 2 + ... + thds).
            int s{0};
            for (auto& m : N) s += m;
            int sum = ((thds * (thds + 1)) / 2) * ftor;
            assert(s == sum);

            df << timer.elapsed() << (j < nthreads - 1 ? "," : "\n");
        }
        std::cout << "\n";
    }
    std::cout << "\n";
c++ c++11 thread-safety pointers locking
asked Jan 5 at 23:48 by headwedge; edited Jan 5 at 23:53 by Jamal