A fast and (Ideally) thread safe object pool in C

Clash Royale CLAN TAG#URR8PPP
.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty margin-bottom:0;
up vote
4
down vote
favorite
I have a helper struct called Scratch, that is a calculation 'Scratch Pad'. Generally, I need only one of these objects, but if I have nested calculations I'll need one scratch per level of nesting. And if I want to make the calculation thread safe then each thread will need one or more Scratch objects.
For simple cases a singleton Scratch object works fine and I'm able to achieve 38 steps per second. If I change from using the singleton to just instantiating a Scratch object each time I drop to 10 steps per second.
Using a non thread safe version of the Pool below (commenting out the mutex locks and unlocks) got me just about back up to 35-38ish steps per second. But the thread safe version as listed below knocks me down to 16 steps per second. So, in order to make threading worth it, I'd have to overcome that initial performance blow.
I guess theoretically two cores should be able to almost breakeven and 3 or more should be able to make the lock cost worth it, but I haven't tried that yet.
Any suggestions, comments, criticisms greatly appreciated.
typedef struct Pool
Scratch** scratches;
int n;
int i;
pthread_mutex_t mutex;
Pool;
Pool* pool_;
Pool* AEPoolCreate()
Pool* pool = (Pool*)malloc(sizeof(Pool));
pool->scratches = (Scratch**)malloc(sizeof(Scratch*));
pool->scratches[0] = AEScratchCreate();
pool->n = 1;
pool->i = 0;
pthread_mutex_init(&pool->mutex, NULL);
return pool;
Scratch* AEPoolGet(Pool* pool)
pthread_mutex_lock(&pool->mutex);
if (pool->i == pool->n)
pool->n *= 2;
printf("[ Pool ===================== ]n");
printf(" increased to a size of %dn", pool->n);
printf("[ =========================== ]nn");
pool->scratches = (Scratch**)realloc(pool->scratches, sizeof(Scratch*)*pool->n);
for (int i=pool->i;i<pool->n;i++)
pool->scratches[i] = AEScratchCreate();
Scratch* scratch = pool->scratches[pool->i++];
pthread_mutex_unlock(&pool->mutex);
return scratch;
void AEPoolPut(Pool* pool, Scratch* scratch)
pthread_mutex_lock(&pool->mutex);
scratch->cp = 0;
scratch->vp = 0;
scratch->sp = 0;
pool->i--;
pool->scratches[pool->i] = scratch;
pthread_mutex_unlock(&pool->mutex);
Here is the Scratch object as requested:
// Scratch =
typedef struct Scratch
Obj* stack;
byte sp;
byte cp;
byte vp;
Scratch;
Scratch* AEScratchCreate()
Scratch* scratch = (Scratch*)malloc(sizeof(Scratch));
scratch->stack = (Obj*)malloc(sizeof(Obj)*10);
scratch->cp = 0;
scratch->vp = 0;
scratch->sp = 0;
return scratch;
void AEScratchRelease(Scratch* scratch)
if (scratch == 0) return;
free(scratch->stack);
free(scratch);
// Dim =====
typedef union
long n;
double x;
void* p;
Dim;
// Obj =====
typedef struct
Dim a;
Dim b;
Dim c;
// byte type;
Obj;
performance c multithreading
add a comment |Â
up vote
4
down vote
favorite
I have a helper struct called Scratch, that is a calculation 'Scratch Pad'. Generally, I need only one of these objects, but if I have nested calculations I'll need one scratch per level of nesting. And if I want to make the calculation thread safe then each thread will need one or more Scratch objects.
For simple cases a singleton Scratch object works fine and I'm able to achieve 38 steps per second. If I change from using the singleton to just instantiating a Scratch object each time I drop to 10 steps per second.
Using a non thread safe version of the Pool below (commenting out the mutex locks and unlocks) got me just about back up to 35-38ish steps per second. But the thread safe version as listed below knocks me down to 16 steps per second. So, in order to make threading worth it, I'd have to overcome that initial performance blow.
I guess theoretically two cores should be able to almost breakeven and 3 or more should be able to make the lock cost worth it, but I haven't tried that yet.
Any suggestions, comments, criticisms greatly appreciated.
typedef struct Pool
Scratch** scratches;
int n;
int i;
pthread_mutex_t mutex;
Pool;
Pool* pool_;
Pool* AEPoolCreate()
Pool* pool = (Pool*)malloc(sizeof(Pool));
pool->scratches = (Scratch**)malloc(sizeof(Scratch*));
pool->scratches[0] = AEScratchCreate();
pool->n = 1;
pool->i = 0;
pthread_mutex_init(&pool->mutex, NULL);
return pool;
Scratch* AEPoolGet(Pool* pool)
pthread_mutex_lock(&pool->mutex);
if (pool->i == pool->n)
pool->n *= 2;
printf("[ Pool ===================== ]n");
printf(" increased to a size of %dn", pool->n);
printf("[ =========================== ]nn");
pool->scratches = (Scratch**)realloc(pool->scratches, sizeof(Scratch*)*pool->n);
for (int i=pool->i;i<pool->n;i++)
pool->scratches[i] = AEScratchCreate();
Scratch* scratch = pool->scratches[pool->i++];
pthread_mutex_unlock(&pool->mutex);
return scratch;
void AEPoolPut(Pool* pool, Scratch* scratch)
pthread_mutex_lock(&pool->mutex);
scratch->cp = 0;
scratch->vp = 0;
scratch->sp = 0;
pool->i--;
pool->scratches[pool->i] = scratch;
pthread_mutex_unlock(&pool->mutex);
Here is the Scratch object as requested:
// Scratch =
typedef struct Scratch
Obj* stack;
byte sp;
byte cp;
byte vp;
Scratch;
Scratch* AEScratchCreate()
Scratch* scratch = (Scratch*)malloc(sizeof(Scratch));
scratch->stack = (Obj*)malloc(sizeof(Obj)*10);
scratch->cp = 0;
scratch->vp = 0;
scratch->sp = 0;
return scratch;
void AEScratchRelease(Scratch* scratch)
if (scratch == 0) return;
free(scratch->stack);
free(scratch);
// Dim =====
typedef union
long n;
double x;
void* p;
Dim;
// Obj =====
typedef struct
Dim a;
Dim b;
Dim c;
// byte type;
Obj;
performance c multithreading
Can you also show theScratchtypedef? How big is it? Did you realize thatAEScratchCreate()is making a copy of that struct?
â JS1
Apr 9 at 9:38
@JS1 It's a few bytes. Yes, AEScratchCreate() is my Scratch constructor. I have added the Scratch definition to the question above. When I simply create and release a new Scratch object on each run, it drops to 10 steps per second, so it's pretty expensive. Thanks.
â aepryus
Apr 9 at 12:23
How often areAEPoolGetandAEPoolPutcalled per step?
â 1201ProgramAlarm
Apr 9 at 15:33
@1201ProgramAlarm Rough estimate.. IâÂÂd say about a million times per step. IâÂÂm running this on an iPhone 7. This is for a generalized function parser, but in this case IâÂÂm testing it on a 400x400 Game of Life grid. A scratch is needed for each expression and there are perhaps about 5 expressions per cell.
â aepryus
Apr 9 at 15:43
add a comment |Â
up vote
4
down vote
favorite
up vote
4
down vote
favorite
I have a helper struct called Scratch, that is a calculation 'Scratch Pad'. Generally, I need only one of these objects, but if I have nested calculations I'll need one scratch per level of nesting. And if I want to make the calculation thread safe then each thread will need one or more Scratch objects.
For simple cases a singleton Scratch object works fine and I'm able to achieve 38 steps per second. If I change from using the singleton to just instantiating a Scratch object each time I drop to 10 steps per second.
Using a non thread safe version of the Pool below (commenting out the mutex locks and unlocks) got me just about back up to 35-38ish steps per second. But the thread safe version as listed below knocks me down to 16 steps per second. So, in order to make threading worth it, I'd have to overcome that initial performance blow.
I guess theoretically two cores should be able to almost breakeven and 3 or more should be able to make the lock cost worth it, but I haven't tried that yet.
Any suggestions, comments, criticisms greatly appreciated.
typedef struct Pool
Scratch** scratches;
int n;
int i;
pthread_mutex_t mutex;
Pool;
Pool* pool_;
Pool* AEPoolCreate()
Pool* pool = (Pool*)malloc(sizeof(Pool));
pool->scratches = (Scratch**)malloc(sizeof(Scratch*));
pool->scratches[0] = AEScratchCreate();
pool->n = 1;
pool->i = 0;
pthread_mutex_init(&pool->mutex, NULL);
return pool;
Scratch* AEPoolGet(Pool* pool)
pthread_mutex_lock(&pool->mutex);
if (pool->i == pool->n)
pool->n *= 2;
printf("[ Pool ===================== ]n");
printf(" increased to a size of %dn", pool->n);
printf("[ =========================== ]nn");
pool->scratches = (Scratch**)realloc(pool->scratches, sizeof(Scratch*)*pool->n);
for (int i=pool->i;i<pool->n;i++)
pool->scratches[i] = AEScratchCreate();
Scratch* scratch = pool->scratches[pool->i++];
pthread_mutex_unlock(&pool->mutex);
return scratch;
void AEPoolPut(Pool* pool, Scratch* scratch)
pthread_mutex_lock(&pool->mutex);
scratch->cp = 0;
scratch->vp = 0;
scratch->sp = 0;
pool->i--;
pool->scratches[pool->i] = scratch;
pthread_mutex_unlock(&pool->mutex);
Here is the Scratch object as requested:
// Scratch =
typedef struct Scratch
Obj* stack;
byte sp;
byte cp;
byte vp;
Scratch;
Scratch* AEScratchCreate()
Scratch* scratch = (Scratch*)malloc(sizeof(Scratch));
scratch->stack = (Obj*)malloc(sizeof(Obj)*10);
scratch->cp = 0;
scratch->vp = 0;
scratch->sp = 0;
return scratch;
void AEScratchRelease(Scratch* scratch)
if (scratch == 0) return;
free(scratch->stack);
free(scratch);
// Dim =====
typedef union
long n;
double x;
void* p;
Dim;
// Obj =====
typedef struct
Dim a;
Dim b;
Dim c;
// byte type;
Obj;
performance c multithreading
I have a helper struct called Scratch, that is a calculation 'Scratch Pad'. Generally, I need only one of these objects, but if I have nested calculations I'll need one scratch per level of nesting. And if I want to make the calculation thread safe then each thread will need one or more Scratch objects.
For simple cases a singleton Scratch object works fine and I'm able to achieve 38 steps per second. If I change from using the singleton to just instantiating a Scratch object each time I drop to 10 steps per second.
Using a non thread safe version of the Pool below (commenting out the mutex locks and unlocks) got me just about back up to 35-38ish steps per second. But the thread safe version as listed below knocks me down to 16 steps per second. So, in order to make threading worth it, I'd have to overcome that initial performance blow.
I guess theoretically two cores should be able to almost breakeven and 3 or more should be able to make the lock cost worth it, but I haven't tried that yet.
Any suggestions, comments, criticisms greatly appreciated.
typedef struct Pool
Scratch** scratches;
int n;
int i;
pthread_mutex_t mutex;
Pool;
Pool* pool_;
Pool* AEPoolCreate()
Pool* pool = (Pool*)malloc(sizeof(Pool));
pool->scratches = (Scratch**)malloc(sizeof(Scratch*));
pool->scratches[0] = AEScratchCreate();
pool->n = 1;
pool->i = 0;
pthread_mutex_init(&pool->mutex, NULL);
return pool;
Scratch* AEPoolGet(Pool* pool)
pthread_mutex_lock(&pool->mutex);
if (pool->i == pool->n)
pool->n *= 2;
printf("[ Pool ===================== ]n");
printf(" increased to a size of %dn", pool->n);
printf("[ =========================== ]nn");
pool->scratches = (Scratch**)realloc(pool->scratches, sizeof(Scratch*)*pool->n);
for (int i=pool->i;i<pool->n;i++)
pool->scratches[i] = AEScratchCreate();
Scratch* scratch = pool->scratches[pool->i++];
pthread_mutex_unlock(&pool->mutex);
return scratch;
void AEPoolPut(Pool* pool, Scratch* scratch)
pthread_mutex_lock(&pool->mutex);
scratch->cp = 0;
scratch->vp = 0;
scratch->sp = 0;
pool->i--;
pool->scratches[pool->i] = scratch;
pthread_mutex_unlock(&pool->mutex);
Here is the Scratch object as requested:
// Scratch =
typedef struct Scratch
Obj* stack;
byte sp;
byte cp;
byte vp;
Scratch;
Scratch* AEScratchCreate()
Scratch* scratch = (Scratch*)malloc(sizeof(Scratch));
scratch->stack = (Obj*)malloc(sizeof(Obj)*10);
scratch->cp = 0;
scratch->vp = 0;
scratch->sp = 0;
return scratch;
void AEScratchRelease(Scratch* scratch)
if (scratch == 0) return;
free(scratch->stack);
free(scratch);
// Dim =====
typedef union
long n;
double x;
void* p;
Dim;
// Obj =====
typedef struct
Dim a;
Dim b;
Dim c;
// byte type;
Obj;
performance c multithreading
edited Apr 9 at 12:20
asked Apr 8 at 1:13
aepryus
1235
1235
Can you also show theScratchtypedef? How big is it? Did you realize thatAEScratchCreate()is making a copy of that struct?
â JS1
Apr 9 at 9:38
@JS1 It's a few bytes. Yes, AEScratchCreate() is my Scratch constructor. I have added the Scratch definition to the question above. When I simply create and release a new Scratch object on each run, it drops to 10 steps per second, so it's pretty expensive. Thanks.
â aepryus
Apr 9 at 12:23
How often areAEPoolGetandAEPoolPutcalled per step?
â 1201ProgramAlarm
Apr 9 at 15:33
@1201ProgramAlarm Rough estimate.. IâÂÂd say about a million times per step. IâÂÂm running this on an iPhone 7. This is for a generalized function parser, but in this case IâÂÂm testing it on a 400x400 Game of Life grid. A scratch is needed for each expression and there are perhaps about 5 expressions per cell.
â aepryus
Apr 9 at 15:43
add a comment |Â
Can you also show theScratchtypedef? How big is it? Did you realize thatAEScratchCreate()is making a copy of that struct?
â JS1
Apr 9 at 9:38
@JS1 It's a few bytes. Yes, AEScratchCreate() is my Scratch constructor. I have added the Scratch definition to the question above. When I simply create and release a new Scratch object on each run, it drops to 10 steps per second, so it's pretty expensive. Thanks.
â aepryus
Apr 9 at 12:23
How often areAEPoolGetandAEPoolPutcalled per step?
â 1201ProgramAlarm
Apr 9 at 15:33
@1201ProgramAlarm Rough estimate.. IâÂÂd say about a million times per step. IâÂÂm running this on an iPhone 7. This is for a generalized function parser, but in this case IâÂÂm testing it on a 400x400 Game of Life grid. A scratch is needed for each expression and there are perhaps about 5 expressions per cell.
â aepryus
Apr 9 at 15:43
Can you also show the
Scratch typedef? How big is it? Did you realize that AEScratchCreate() is making a copy of that struct?â JS1
Apr 9 at 9:38
Can you also show the
Scratch typedef? How big is it? Did you realize that AEScratchCreate() is making a copy of that struct?â JS1
Apr 9 at 9:38
@JS1 It's a few bytes. Yes, AEScratchCreate() is my Scratch constructor. I have added the Scratch definition to the question above. When I simply create and release a new Scratch object on each run, it drops to 10 steps per second, so it's pretty expensive. Thanks.
â aepryus
Apr 9 at 12:23
@JS1 It's a few bytes. Yes, AEScratchCreate() is my Scratch constructor. I have added the Scratch definition to the question above. When I simply create and release a new Scratch object on each run, it drops to 10 steps per second, so it's pretty expensive. Thanks.
â aepryus
Apr 9 at 12:23
How often are
AEPoolGet and AEPoolPut called per step?â 1201ProgramAlarm
Apr 9 at 15:33
How often are
AEPoolGet and AEPoolPut called per step?â 1201ProgramAlarm
Apr 9 at 15:33
@1201ProgramAlarm Rough estimate.. IâÂÂd say about a million times per step. IâÂÂm running this on an iPhone 7. This is for a generalized function parser, but in this case IâÂÂm testing it on a 400x400 Game of Life grid. A scratch is needed for each expression and there are perhaps about 5 expressions per cell.
â aepryus
Apr 9 at 15:43
@1201ProgramAlarm Rough estimate.. IâÂÂd say about a million times per step. IâÂÂm running this on an iPhone 7. This is for a generalized function parser, but in this case IâÂÂm testing it on a 400x400 Game of Life grid. A scratch is needed for each expression and there are perhaps about 5 expressions per cell.
â aepryus
Apr 9 at 15:43
add a comment |Â
1 Answer
1
active
oldest
votes
up vote
1
down vote
accepted
One small improvement would be to not acquire the lock in AEPoolPut until right before you access pool (i.e., clear out the fields in scratch before acquiring the lock).
There isn't really anything else I see that can change here that would help (all assuming the number of scratches required is more-or-less stable, so that AEPoolGet will quickly reach the point where it won't have to allocate any more scratches. There are some tweaks that might be applicable if you continually grow the number of pools.
In code the threads that use this, you can implement some sort of thread local scratch caching mechanism, so that if a thread finishes using a scratch object it'll hang on to it locally and reuse it. This will bypass all this code. If there are too many of these "free" scratch objects, you can return the excess back to the pool. Depending on your usage, you might only need to cache one scratch object.
Ya, generally the number of scratches created is going to be very small. For my current test only one gets created. But I think youâÂÂve triggered the solution for me. I just need to store one Pool per thread and than I can remove the locks while maintaining thread safety and speed. Thanks!
â aepryus
Apr 9 at 16:00
add a comment |Â
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
1
down vote
accepted
One small improvement would be to not acquire the lock in AEPoolPut until right before you access pool (i.e., clear out the fields in scratch before acquiring the lock).
There isn't really anything else I see that can change here that would help (all assuming the number of scratches required is more-or-less stable, so that AEPoolGet will quickly reach the point where it won't have to allocate any more scratches. There are some tweaks that might be applicable if you continually grow the number of pools.
In code the threads that use this, you can implement some sort of thread local scratch caching mechanism, so that if a thread finishes using a scratch object it'll hang on to it locally and reuse it. This will bypass all this code. If there are too many of these "free" scratch objects, you can return the excess back to the pool. Depending on your usage, you might only need to cache one scratch object.
Ya, generally the number of scratches created is going to be very small. For my current test only one gets created. But I think youâÂÂve triggered the solution for me. I just need to store one Pool per thread and than I can remove the locks while maintaining thread safety and speed. Thanks!
â aepryus
Apr 9 at 16:00
add a comment |Â
up vote
1
down vote
accepted
One small improvement would be to not acquire the lock in AEPoolPut until right before you access pool (i.e., clear out the fields in scratch before acquiring the lock).
There isn't really anything else I see that can change here that would help (all assuming the number of scratches required is more-or-less stable, so that AEPoolGet will quickly reach the point where it won't have to allocate any more scratches. There are some tweaks that might be applicable if you continually grow the number of pools.
In code the threads that use this, you can implement some sort of thread local scratch caching mechanism, so that if a thread finishes using a scratch object it'll hang on to it locally and reuse it. This will bypass all this code. If there are too many of these "free" scratch objects, you can return the excess back to the pool. Depending on your usage, you might only need to cache one scratch object.
Ya, generally the number of scratches created is going to be very small. For my current test only one gets created. But I think youâÂÂve triggered the solution for me. I just need to store one Pool per thread and than I can remove the locks while maintaining thread safety and speed. Thanks!
â aepryus
Apr 9 at 16:00
add a comment |Â
up vote
1
down vote
accepted
up vote
1
down vote
accepted
One small improvement would be to not acquire the lock in AEPoolPut until right before you access pool (i.e., clear out the fields in scratch before acquiring the lock).
There isn't really anything else I see that can change here that would help (all assuming the number of scratches required is more-or-less stable, so that AEPoolGet will quickly reach the point where it won't have to allocate any more scratches. There are some tweaks that might be applicable if you continually grow the number of pools.
In code the threads that use this, you can implement some sort of thread local scratch caching mechanism, so that if a thread finishes using a scratch object it'll hang on to it locally and reuse it. This will bypass all this code. If there are too many of these "free" scratch objects, you can return the excess back to the pool. Depending on your usage, you might only need to cache one scratch object.
One small improvement would be to not acquire the lock in AEPoolPut until right before you access pool (i.e., clear out the fields in scratch before acquiring the lock).
There isn't really anything else I see that can change here that would help (all assuming the number of scratches required is more-or-less stable, so that AEPoolGet will quickly reach the point where it won't have to allocate any more scratches. There are some tweaks that might be applicable if you continually grow the number of pools.
In code the threads that use this, you can implement some sort of thread local scratch caching mechanism, so that if a thread finishes using a scratch object it'll hang on to it locally and reuse it. This will bypass all this code. If there are too many of these "free" scratch objects, you can return the excess back to the pool. Depending on your usage, you might only need to cache one scratch object.
answered Apr 9 at 15:51
1201ProgramAlarm
2,4752618
2,4752618
Ya, generally the number of scratches created is going to be very small. For my current test only one gets created. But I think youâÂÂve triggered the solution for me. I just need to store one Pool per thread and than I can remove the locks while maintaining thread safety and speed. Thanks!
â aepryus
Apr 9 at 16:00
add a comment |Â
Ya, generally the number of scratches created is going to be very small. For my current test only one gets created. But I think youâÂÂve triggered the solution for me. I just need to store one Pool per thread and than I can remove the locks while maintaining thread safety and speed. Thanks!
â aepryus
Apr 9 at 16:00
Ya, generally the number of scratches created is going to be very small. For my current test only one gets created. But I think youâÂÂve triggered the solution for me. I just need to store one Pool per thread and than I can remove the locks while maintaining thread safety and speed. Thanks!
â aepryus
Apr 9 at 16:00
Ya, generally the number of scratches created is going to be very small. For my current test only one gets created. But I think youâÂÂve triggered the solution for me. I just need to store one Pool per thread and than I can remove the locks while maintaining thread safety and speed. Thanks!
â aepryus
Apr 9 at 16:00
add a comment |Â
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f191504%2fa-fast-and-ideally-thread-safe-object-pool-in-c%23new-answer', 'question_page');
);
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Can you also show the
Scratchtypedef? How big is it? Did you realize thatAEScratchCreate()is making a copy of that struct?â JS1
Apr 9 at 9:38
@JS1 It's a few bytes. Yes, AEScratchCreate() is my Scratch constructor. I have added the Scratch definition to the question above. When I simply create and release a new Scratch object on each run, it drops to 10 steps per second, so it's pretty expensive. Thanks.
â aepryus
Apr 9 at 12:23
How often are
AEPoolGetandAEPoolPutcalled per step?â 1201ProgramAlarm
Apr 9 at 15:33
@1201ProgramAlarm Rough estimate.. IâÂÂd say about a million times per step. IâÂÂm running this on an iPhone 7. This is for a generalized function parser, but in this case IâÂÂm testing it on a 400x400 Game of Life grid. A scratch is needed for each expression and there are perhaps about 5 expressions per cell.
â aepryus
Apr 9 at 15:43