Proper way to process the huge amount of data from a sensor using Parallel.For [closed]

Clash Royale CLAN TAG#URR8PPP
.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty margin-bottom:0;
up vote
1
down vote
favorite
I am getting the huge amount of data from a sensor. In order to speed up data processing, I thought of using Parallel.For. Below is the sample code:
object sync = new object();
private Random random = new Random();
private List<byte> container = new List<byte>();
private readonly int FRAME_WIDTH = 1000;
private readonly int DATA_LENGTH = 217088;
private readonly int FRAME_HEIGHT = 800;
private void getPointsFromSensor(out Point2D points2d, out Point3D points3d)
points2d = new Point2D[DATA_LENGTH];
points3d = new Point3D[DATA_LENGTH];
for (var index = 0; index < DATA_LENGTH; index++)
points2d[index] = new Point2D(random.Next(-index, index), random.Next(-index, index));
points3d[index] = new Point3D(random.NextDouble(), random.NextDouble(), random.NextDouble());
private void OnDataArrived(object sendor, DataArrivedEventArgs e)
getPointsFromSensor(out Point2D points2d, out Point3D points3d);
int validDataCount = 0;
container.Clear(); // Remove old points
// reserving 4 bytes for storing 'validDataCount' and we are going to modify it later
container.AddRange(BitConverter.GetBytes(validDataCount));
Parallel.For(0, DATA_LENGTH, index =>
int points2dX = points2d[index].X;
int points2dY = points2d[index].Y;
if (points2dX >= 0 && points2dX < FRAME_WIDTH && points2dY >= 0 && points2dY < FRAME_HEIGHT)
lock (sync)
container.AddRange(BitConverter.GetBytes(points3d[index].X));
container.AddRange(BitConverter.GetBytes(points3d[index].Y));
container.AddRange(BitConverter.GetBytes(points3d[index].Z));
Interlocked.Increment(ref validDataCount);
);
var validDataCountBytes = BitConverter.GetBytes(validDataCount);//4 bytes
container[0] = validDataCountBytes[0]; container[1] = validDataCountBytes[1];
container[2] = validDataCountBytes[2]; container[3] = validDataCountBytes[3];
The Point2D and Point3D are defined as follows:
class Point2D
public int X;
public int Y;
public Point2D(int x, int y)
X = x;
Y = y;
class Point3D
public double X;
public double Y;
public double Z;
public Point3D(double x, double y, double z)
X = x;
Y = y;
Z = z;
It turns out that the Parallel.For version is taking much time compared to simple for loop.
I am wondering if there exists is a better way to use Parallel.For inside a callback function in C#.
c#
closed as off-topic by Heslacher, Hosch250, Toby Speight, t3chb0t, Mast Jan 20 at 22:28
This question appears to be off-topic. The users who voted to close gave this specific reason:
- "Questions must involve real code that you own or maintain. Pseudocode, hypothetical code, or stub code should be replaced by a concrete implementation. Questions seeking an explanation of someone else's code are also off-topic." â Heslacher, Hosch250, Toby Speight, t3chb0t, Mast
add a comment |Â
up vote
1
down vote
favorite
I am getting the huge amount of data from a sensor. In order to speed up data processing, I thought of using Parallel.For. Below is the sample code:
object sync = new object();
private Random random = new Random();
private List<byte> container = new List<byte>();
private readonly int FRAME_WIDTH = 1000;
private readonly int DATA_LENGTH = 217088;
private readonly int FRAME_HEIGHT = 800;
private void getPointsFromSensor(out Point2D points2d, out Point3D points3d)
points2d = new Point2D[DATA_LENGTH];
points3d = new Point3D[DATA_LENGTH];
for (var index = 0; index < DATA_LENGTH; index++)
points2d[index] = new Point2D(random.Next(-index, index), random.Next(-index, index));
points3d[index] = new Point3D(random.NextDouble(), random.NextDouble(), random.NextDouble());
private void OnDataArrived(object sendor, DataArrivedEventArgs e)
getPointsFromSensor(out Point2D points2d, out Point3D points3d);
int validDataCount = 0;
container.Clear(); // Remove old points
// reserving 4 bytes for storing 'validDataCount' and we are going to modify it later
container.AddRange(BitConverter.GetBytes(validDataCount));
Parallel.For(0, DATA_LENGTH, index =>
int points2dX = points2d[index].X;
int points2dY = points2d[index].Y;
if (points2dX >= 0 && points2dX < FRAME_WIDTH && points2dY >= 0 && points2dY < FRAME_HEIGHT)
lock (sync)
container.AddRange(BitConverter.GetBytes(points3d[index].X));
container.AddRange(BitConverter.GetBytes(points3d[index].Y));
container.AddRange(BitConverter.GetBytes(points3d[index].Z));
Interlocked.Increment(ref validDataCount);
);
var validDataCountBytes = BitConverter.GetBytes(validDataCount);//4 bytes
container[0] = validDataCountBytes[0]; container[1] = validDataCountBytes[1];
container[2] = validDataCountBytes[2]; container[3] = validDataCountBytes[3];
The Point2D and Point3D are defined as follows:
class Point2D
public int X;
public int Y;
public Point2D(int x, int y)
X = x;
Y = y;
class Point3D
public double X;
public double Y;
public double Z;
public Point3D(double x, double y, double z)
X = x;
Y = y;
Z = z;
It turns out that the Parallel.For version is taking much time compared to simple for loop.
I am wondering if there exists is a better way to use Parallel.For inside a callback function in C#.
c#
closed as off-topic by Heslacher, Hosch250, Toby Speight, t3chb0t, Mast Jan 20 at 22:28
This question appears to be off-topic. The users who voted to close gave this specific reason:
- "Questions must involve real code that you own or maintain. Pseudocode, hypothetical code, or stub code should be replaced by a concrete implementation. Questions seeking an explanation of someone else's code are also off-topic." â Heslacher, Hosch250, Toby Speight, t3chb0t, Mast
3
You should post your real code otherwise we can't really help you. Take the code directly from your IDE and paste it into your question.// Random values just in this code snippetjust makes your question bordering off topic.
â Heslacher
Jan 19 at 10:57
1
I have now voted to close because of example code. Two comments from you that this isn't your real actual code is reason enough.
â Heslacher
Jan 19 at 15:17
2
Now you aren't allowed anymore because you already received answer. See: what you may and may not do after receiving answers.
â Heslacher
Jan 19 at 15:23
The current question title, which states your concerns about the code, applies to too many questions on this site to be useful. Please edit to the site standard, which is for the title to simply state the task accomplished by the code. Please see How to get the best value out of Code Review: Asking Questions for guidance on writing good question titles.
â Toby Speight
Jan 19 at 16:37
add a comment |Â
up vote
1
down vote
favorite
up vote
1
down vote
favorite
I am getting the huge amount of data from a sensor. In order to speed up data processing, I thought of using Parallel.For. Below is the sample code:
object sync = new object();
private Random random = new Random();
private List<byte> container = new List<byte>();
private readonly int FRAME_WIDTH = 1000;
private readonly int DATA_LENGTH = 217088;
private readonly int FRAME_HEIGHT = 800;
private void getPointsFromSensor(out Point2D points2d, out Point3D points3d)
points2d = new Point2D[DATA_LENGTH];
points3d = new Point3D[DATA_LENGTH];
for (var index = 0; index < DATA_LENGTH; index++)
points2d[index] = new Point2D(random.Next(-index, index), random.Next(-index, index));
points3d[index] = new Point3D(random.NextDouble(), random.NextDouble(), random.NextDouble());
private void OnDataArrived(object sendor, DataArrivedEventArgs e)
getPointsFromSensor(out Point2D points2d, out Point3D points3d);
int validDataCount = 0;
container.Clear(); // Remove old points
// reserving 4 bytes for storing 'validDataCount' and we are going to modify it later
container.AddRange(BitConverter.GetBytes(validDataCount));
Parallel.For(0, DATA_LENGTH, index =>
int points2dX = points2d[index].X;
int points2dY = points2d[index].Y;
if (points2dX >= 0 && points2dX < FRAME_WIDTH && points2dY >= 0 && points2dY < FRAME_HEIGHT)
lock (sync)
container.AddRange(BitConverter.GetBytes(points3d[index].X));
container.AddRange(BitConverter.GetBytes(points3d[index].Y));
container.AddRange(BitConverter.GetBytes(points3d[index].Z));
Interlocked.Increment(ref validDataCount);
);
var validDataCountBytes = BitConverter.GetBytes(validDataCount);//4 bytes
container[0] = validDataCountBytes[0]; container[1] = validDataCountBytes[1];
container[2] = validDataCountBytes[2]; container[3] = validDataCountBytes[3];
The Point2D and Point3D are defined as follows:
class Point2D
public int X;
public int Y;
public Point2D(int x, int y)
X = x;
Y = y;
class Point3D
public double X;
public double Y;
public double Z;
public Point3D(double x, double y, double z)
X = x;
Y = y;
Z = z;
It turns out that the Parallel.For version is taking much time compared to simple for loop.
I am wondering if there exists is a better way to use Parallel.For inside a callback function in C#.
c#
I am getting the huge amount of data from a sensor. In order to speed up data processing, I thought of using Parallel.For. Below is the sample code:
object sync = new object();
private Random random = new Random();
private List<byte> container = new List<byte>();
private readonly int FRAME_WIDTH = 1000;
private readonly int DATA_LENGTH = 217088;
private readonly int FRAME_HEIGHT = 800;
private void getPointsFromSensor(out Point2D points2d, out Point3D points3d)
points2d = new Point2D[DATA_LENGTH];
points3d = new Point3D[DATA_LENGTH];
for (var index = 0; index < DATA_LENGTH; index++)
points2d[index] = new Point2D(random.Next(-index, index), random.Next(-index, index));
points3d[index] = new Point3D(random.NextDouble(), random.NextDouble(), random.NextDouble());
private void OnDataArrived(object sendor, DataArrivedEventArgs e)
getPointsFromSensor(out Point2D points2d, out Point3D points3d);
int validDataCount = 0;
container.Clear(); // Remove old points
// reserving 4 bytes for storing 'validDataCount' and we are going to modify it later
container.AddRange(BitConverter.GetBytes(validDataCount));
Parallel.For(0, DATA_LENGTH, index =>
int points2dX = points2d[index].X;
int points2dY = points2d[index].Y;
if (points2dX >= 0 && points2dX < FRAME_WIDTH && points2dY >= 0 && points2dY < FRAME_HEIGHT)
lock (sync)
container.AddRange(BitConverter.GetBytes(points3d[index].X));
container.AddRange(BitConverter.GetBytes(points3d[index].Y));
container.AddRange(BitConverter.GetBytes(points3d[index].Z));
Interlocked.Increment(ref validDataCount);
);
var validDataCountBytes = BitConverter.GetBytes(validDataCount);//4 bytes
container[0] = validDataCountBytes[0]; container[1] = validDataCountBytes[1];
container[2] = validDataCountBytes[2]; container[3] = validDataCountBytes[3];
The Point2D and Point3D are defined as follows:
class Point2D
public int X;
public int Y;
public Point2D(int x, int y)
X = x;
Y = y;
class Point3D
public double X;
public double Y;
public double Z;
public Point3D(double x, double y, double z)
X = x;
Y = y;
Z = z;
It turns out that the Parallel.For version is taking much time compared to simple for loop.
I am wondering if there exists is a better way to use Parallel.For inside a callback function in C#.
c#
edited Jan 26 at 8:30
Nikita B
12.3k11652
12.3k11652
asked Jan 19 at 10:21
Ravi Joshi
18516
18516
closed as off-topic by Heslacher, Hosch250, Toby Speight, t3chb0t, Mast Jan 20 at 22:28
This question appears to be off-topic. The users who voted to close gave this specific reason:
- "Questions must involve real code that you own or maintain. Pseudocode, hypothetical code, or stub code should be replaced by a concrete implementation. Questions seeking an explanation of someone else's code are also off-topic." â Heslacher, Hosch250, Toby Speight, t3chb0t, Mast
closed as off-topic by Heslacher, Hosch250, Toby Speight, t3chb0t, Mast Jan 20 at 22:28
This question appears to be off-topic. The users who voted to close gave this specific reason:
- "Questions must involve real code that you own or maintain. Pseudocode, hypothetical code, or stub code should be replaced by a concrete implementation. Questions seeking an explanation of someone else's code are also off-topic." â Heslacher, Hosch250, Toby Speight, t3chb0t, Mast
3
You should post your real code otherwise we can't really help you. Take the code directly from your IDE and paste it into your question.// Random values just in this code snippetjust makes your question bordering off topic.
â Heslacher
Jan 19 at 10:57
1
I have now voted to close because of example code. Two comments from you that this isn't your real actual code is reason enough.
â Heslacher
Jan 19 at 15:17
2
Now you aren't allowed anymore because you already received answer. See: what you may and may not do after receiving answers.
â Heslacher
Jan 19 at 15:23
The current question title, which states your concerns about the code, applies to too many questions on this site to be useful. Please edit to the site standard, which is for the title to simply state the task accomplished by the code. Please see How to get the best value out of Code Review: Asking Questions for guidance on writing good question titles.
â Toby Speight
Jan 19 at 16:37
add a comment |Â
3
You should post your real code otherwise we can't really help you. Take the code directly from your IDE and paste it into your question.// Random values just in this code snippetjust makes your question bordering off topic.
â Heslacher
Jan 19 at 10:57
1
I have now voted to close because of example code. Two comments from you that this isn't your real actual code is reason enough.
â Heslacher
Jan 19 at 15:17
2
Now you aren't allowed anymore because you already received answer. See: what you may and may not do after receiving answers.
â Heslacher
Jan 19 at 15:23
The current question title, which states your concerns about the code, applies to too many questions on this site to be useful. Please edit to the site standard, which is for the title to simply state the task accomplished by the code. Please see How to get the best value out of Code Review: Asking Questions for guidance on writing good question titles.
â Toby Speight
Jan 19 at 16:37
3
3
You should post your real code otherwise we can't really help you. Take the code directly from your IDE and paste it into your question.
// Random values just in this code snippet just makes your question bordering off topic.â Heslacher
Jan 19 at 10:57
You should post your real code otherwise we can't really help you. Take the code directly from your IDE and paste it into your question.
// Random values just in this code snippet just makes your question bordering off topic.â Heslacher
Jan 19 at 10:57
1
1
I have now voted to close because of example code. Two comments from you that this isn't your real actual code is reason enough.
â Heslacher
Jan 19 at 15:17
I have now voted to close because of example code. Two comments from you that this isn't your real actual code is reason enough.
â Heslacher
Jan 19 at 15:17
2
2
Now you aren't allowed anymore because you already received answer. See: what you may and may not do after receiving answers.
â Heslacher
Jan 19 at 15:23
Now you aren't allowed anymore because you already received answer. See: what you may and may not do after receiving answers.
â Heslacher
Jan 19 at 15:23
The current question title, which states your concerns about the code, applies to too many questions on this site to be useful. Please edit to the site standard, which is for the title to simply state the task accomplished by the code. Please see How to get the best value out of Code Review: Asking Questions for guidance on writing good question titles.
â Toby Speight
Jan 19 at 16:37
The current question title, which states your concerns about the code, applies to too many questions on this site to be useful. Please edit to the site standard, which is for the title to simply state the task accomplished by the code. Please see How to get the best value out of Code Review: Asking Questions for guidance on writing good question titles.
â Toby Speight
Jan 19 at 16:37
add a comment |Â
3 Answers
3
active
oldest
votes
up vote
5
down vote
Usually it makes absolutely no sense to use a parallel loop with a lock inside it. The performance loss you are talking about comes most probably from it. They cannot run in parallel because they have to wait for each other. The lock also costs some time and if it happens to often, the parallel loop might actually be slower then the normal one.
You might gain some performance if you redesign it in a way that does not requrie synchronization.
add a comment |Â
up vote
5
down vote
There are three rules I find useful when working with parallel loops:
Do not lock inside the loop. (already explained in t3chb0t's answer)
Do not write to a shared buffer inside a loop, unless you can do it in a thread-safe manner without any additional synchronization (including concurrent collections or other objects that use locks internally). In case you can't, use one of the overloads that supports local states, and write to that local state without a lock.
Do not iterate over every element of array unless a significant amount of work is required to process each element. In general, you should split your initial data into reasonably-sized chunks (using
ArraySegmentclass, for example), and useParallel.Forto iterate over array of those chunks.
Following those three rules should give you satisfactory results.
I understand. Thanks. I need to collect the bytes, hence I used list. However since it is not thread safe, I used a lock. I also tried usingConcurrentBagbut didn't see any performance improvements. Any suggestions?
â Ravi Joshi
Jan 19 at 12:09
Concurrent collections have internal synchronization. This does count as additional synchronization that I mentioned in (2), so they are a no-go. In your code you always get 8 bytes per iteration. So you can simply create abyte[DATA_LENGTH*8]array and write directly into it using indexes without locks. Calculate write indexes asindex*8, so they do not overlap.
â Nikita B
Jan 19 at 12:17
I see. I got your point but there is one problem. Some of the elements of the arraybyte[DATA_LENGTH * 8]may be empty, since someindexcan generate invalid value and hence the conditional statementifis going to returnfalse.
â Ravi Joshi
Jan 19 at 12:30
Another idea is to create a separate results collection per chunk, which can then be combined into the final results when all chunks have been processed.
â Pieter Witvoet
Jan 19 at 12:39
Yes, Pieter is right. If empty spaces in resulting array are unacceptable, you have to fill local buffers instead. See the documentation to overloadedParallel.Formethod that I linked in my answer.
â Nikita B
Jan 19 at 12:45
 |Â
show 2 more comments
up vote
2
down vote
In addition to the other answers, it's worth noting that you're doing very little work inside the parallel.for - grabbing two random values, performing a few comparison, getting some bytes and performing some add-range and additions.
These are very fast, very cheap operations. Parallel.For adds in a ton of overhead that likely exceeds the cost of the operations.
Then its worth noting, as other answers have, that you have locked some of the slower operations - meaning that it'll run that part at no faster than single thread speed. Having a lock inside a parallel operation isn't necessarily bad (e.g. if it's only a very small fraction of that operations work) but by definition it adds a bottleneck.
When working on small, higher performance loops like this I try to do the following:
1) Have each thread work on its own data, and synchronize at the end (or in a thread-safe, high-speed manner like interlocked)
2) Batch the data into appropriately sized chunks before passing to the thread.
For example:
var threadCount = Enviroment.ProcessorCount;
var threadContainers = new Enumerable.Range(0,threadCount).Select(_ => new List<byte>()).ToArray();
Parallel.For(0, threadCount, thread=>
//review, can easily be off by one due to division
for (int index = 0; index<( DataLength/threadCount); index++)
// Random values just in this code snippet
int config_1 = random.Next(-index, index);
int config_2 = random.Next(-index, index);
if (config_1 >= 0 && config_1 < FRAME_WIDTH && config_2 >= 0 && config_2 < FRAME_HEIGHT)
//this *may* be threadsafe, but worth reviewing to be sure.
threadContainers[thread].AddRange(BitConverter.GetBytes(config_1));
threadContainers[thread].AddRange(BitConverter.GetBytes(config_2));
Interlocked.Increment(ref validDataCount);
);
var container = containers.SelectMany(x=>x).ToList();
container.AddRange(BitConverter.GetBytes(validDataCount));
Of course, this can get tedious, So do consider other parallel tools and patterns to assist. For example PLinq might be a good example in this case, because it can do the batching behind the scenes and help manage the creation and flow of state between the threads (whereas we had to do that manually here.
Amazing. However, I am not able to understand it completely. Before asking my query, let me first explain the real scenario. The lineint config_1 = random.Next(-index, index)isint config_1 = sensor_data(index)in the real code, where the length ofsensor_dataisDataLength. In order to get allint config_1, I iterate over completeDataLength. Hence0 < index < DATA_LENGTH. Ques: I see that in your suggestion0 < index < (DATA_LENGTH/threadCount). So I suspect that instead of looking at complete data, it is looking only over(DATA_LENGTH/threadCount)portion of data.
â Ravi Joshi
Jan 19 at 15:11
Updated the code in question for better understanding. Please have a look. Apologies for the inconvenience. Thank you again.
â Ravi Joshi
Jan 19 at 16:06
The exact chunking logic is situation dependent - for example in the demo while the count of the loops was important, the order is next. You may find in reality the order and position is important - leading to something like:for (int index = (length/threadcount)*thread; index<(length/threadcount)*(thread+1); index++)(please doublecheck, esp on unusual values). What I often do is write a Method that takes in array, and splits it into n chunks to make life simpler e.g. stackoverflow.com/questions/419019/â¦
â NPSF3000
Jan 19 at 16:11
make sense. i will check the values and get back to you later. thank you very much.
â Ravi Joshi
Jan 19 at 16:31
add a comment |Â
3 Answers
3
active
oldest
votes
3 Answers
3
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
5
down vote
Usually it makes absolutely no sense to use a parallel loop with a lock inside it. The performance loss you are talking about comes most probably from it. They cannot run in parallel because they have to wait for each other. The lock also costs some time and if it happens to often, the parallel loop might actually be slower then the normal one.
You might gain some performance if you redesign it in a way that does not requrie synchronization.
add a comment |Â
up vote
5
down vote
Usually it makes absolutely no sense to use a parallel loop with a lock inside it. The performance loss you are talking about comes most probably from it. They cannot run in parallel because they have to wait for each other. The lock also costs some time and if it happens to often, the parallel loop might actually be slower then the normal one.
You might gain some performance if you redesign it in a way that does not requrie synchronization.
add a comment |Â
up vote
5
down vote
up vote
5
down vote
Usually it makes absolutely no sense to use a parallel loop with a lock inside it. The performance loss you are talking about comes most probably from it. They cannot run in parallel because they have to wait for each other. The lock also costs some time and if it happens to often, the parallel loop might actually be slower then the normal one.
You might gain some performance if you redesign it in a way that does not requrie synchronization.
Usually it makes absolutely no sense to use a parallel loop with a lock inside it. The performance loss you are talking about comes most probably from it. They cannot run in parallel because they have to wait for each other. The lock also costs some time and if it happens to often, the parallel loop might actually be slower then the normal one.
You might gain some performance if you redesign it in a way that does not requrie synchronization.
answered Jan 19 at 10:51
t3chb0t
32.1k54195
32.1k54195
add a comment |Â
add a comment |Â
up vote
5
down vote
There are three rules I find useful when working with parallel loops:
Do not lock inside the loop. (already explained in t3chb0t's answer)
Do not write to a shared buffer inside a loop, unless you can do it in a thread-safe manner without any additional synchronization (including concurrent collections or other objects that use locks internally). In case you can't, use one of the overloads that supports local states, and write to that local state without a lock.
Do not iterate over every element of array unless a significant amount of work is required to process each element. In general, you should split your initial data into reasonably-sized chunks (using
ArraySegmentclass, for example), and useParallel.Forto iterate over array of those chunks.
Following those three rules should give you satisfactory results.
I understand. Thanks. I need to collect the bytes, hence I used list. However since it is not thread safe, I used a lock. I also tried usingConcurrentBagbut didn't see any performance improvements. Any suggestions?
â Ravi Joshi
Jan 19 at 12:09
Concurrent collections have internal synchronization. This does count as additional synchronization that I mentioned in (2), so they are a no-go. In your code you always get 8 bytes per iteration. So you can simply create abyte[DATA_LENGTH*8]array and write directly into it using indexes without locks. Calculate write indexes asindex*8, so they do not overlap.
â Nikita B
Jan 19 at 12:17
I see. I got your point but there is one problem. Some of the elements of the arraybyte[DATA_LENGTH * 8]may be empty, since someindexcan generate invalid value and hence the conditional statementifis going to returnfalse.
â Ravi Joshi
Jan 19 at 12:30
Another idea is to create a separate results collection per chunk, which can then be combined into the final results when all chunks have been processed.
â Pieter Witvoet
Jan 19 at 12:39
Yes, Pieter is right. If empty spaces in resulting array are unacceptable, you have to fill local buffers instead. See the documentation to overloadedParallel.Formethod that I linked in my answer.
â Nikita B
Jan 19 at 12:45
 |Â
show 2 more comments
up vote
5
down vote
There are three rules I find useful when working with parallel loops:
Do not lock inside the loop. (already explained in t3chb0t's answer)
Do not write to a shared buffer inside a loop, unless you can do it in a thread-safe manner without any additional synchronization (including concurrent collections or other objects that use locks internally). In case you can't, use one of the overloads that supports local states, and write to that local state without a lock.
Do not iterate over every element of array unless a significant amount of work is required to process each element. In general, you should split your initial data into reasonably-sized chunks (using
ArraySegmentclass, for example), and useParallel.Forto iterate over array of those chunks.
Following those three rules should give you satisfactory results.
I understand. Thanks. I need to collect the bytes, hence I used list. However since it is not thread safe, I used a lock. I also tried usingConcurrentBagbut didn't see any performance improvements. Any suggestions?
â Ravi Joshi
Jan 19 at 12:09
Concurrent collections have internal synchronization. This does count as additional synchronization that I mentioned in (2), so they are a no-go. In your code you always get 8 bytes per iteration. So you can simply create abyte[DATA_LENGTH*8]array and write directly into it using indexes without locks. Calculate write indexes asindex*8, so they do not overlap.
â Nikita B
Jan 19 at 12:17
I see. I got your point but there is one problem. Some of the elements of the arraybyte[DATA_LENGTH * 8]may be empty, since someindexcan generate invalid value and hence the conditional statementifis going to returnfalse.
â Ravi Joshi
Jan 19 at 12:30
Another idea is to create a separate results collection per chunk, which can then be combined into the final results when all chunks have been processed.
â Pieter Witvoet
Jan 19 at 12:39
Yes, Pieter is right. If empty spaces in resulting array are unacceptable, you have to fill local buffers instead. See the documentation to overloadedParallel.Formethod that I linked in my answer.
â Nikita B
Jan 19 at 12:45
 |Â
show 2 more comments
up vote
5
down vote
up vote
5
down vote
There are three rules I find useful when working with parallel loops:
Do not lock inside the loop. (already explained in t3chb0t's answer)
Do not write to a shared buffer inside a loop, unless you can do it in a thread-safe manner without any additional synchronization (including concurrent collections or other objects that use locks internally). In case you can't, use one of the overloads that supports local states, and write to that local state without a lock.
Do not iterate over every element of array unless a significant amount of work is required to process each element. In general, you should split your initial data into reasonably-sized chunks (using
ArraySegmentclass, for example), and useParallel.Forto iterate over array of those chunks.
Following those three rules should give you satisfactory results.
There are three rules I find useful when working with parallel loops:
Do not lock inside the loop. (already explained in t3chb0t's answer)
Do not write to a shared buffer inside a loop, unless you can do it in a thread-safe manner without any additional synchronization (including concurrent collections or other objects that use locks internally). In case you can't, use one of the overloads that supports local states, and write to that local state without a lock.
Do not iterate over every element of array unless a significant amount of work is required to process each element. In general, you should split your initial data into reasonably-sized chunks (using
ArraySegmentclass, for example), and useParallel.Forto iterate over array of those chunks.
Following those three rules should give you satisfactory results.
edited Jan 19 at 13:09
Konrad Rudolph
4,8531431
4,8531431
answered Jan 19 at 11:04
Nikita B
12.3k11652
12.3k11652
I understand. Thanks. I need to collect the bytes, hence I used list. However since it is not thread safe, I used a lock. I also tried usingConcurrentBagbut didn't see any performance improvements. Any suggestions?
â Ravi Joshi
Jan 19 at 12:09
Concurrent collections have internal synchronization. This does count as additional synchronization that I mentioned in (2), so they are a no-go. In your code you always get 8 bytes per iteration. So you can simply create abyte[DATA_LENGTH*8]array and write directly into it using indexes without locks. Calculate write indexes asindex*8, so they do not overlap.
â Nikita B
Jan 19 at 12:17
I see. I got your point but there is one problem. Some of the elements of the arraybyte[DATA_LENGTH * 8]may be empty, since someindexcan generate invalid value and hence the conditional statementifis going to returnfalse.
â Ravi Joshi
Jan 19 at 12:30
Another idea is to create a separate results collection per chunk, which can then be combined into the final results when all chunks have been processed.
â Pieter Witvoet
Jan 19 at 12:39
Yes, Pieter is right. If empty spaces in resulting array are unacceptable, you have to fill local buffers instead. See the documentation to overloadedParallel.Formethod that I linked in my answer.
â Nikita B
Jan 19 at 12:45
 |Â
show 2 more comments
I understand. Thanks. I need to collect the bytes, hence I used list. However since it is not thread safe, I used a lock. I also tried usingConcurrentBagbut didn't see any performance improvements. Any suggestions?
â Ravi Joshi
Jan 19 at 12:09
Concurrent collections have internal synchronization. This does count as additional synchronization that I mentioned in (2), so they are a no-go. In your code you always get 8 bytes per iteration. So you can simply create abyte[DATA_LENGTH*8]array and write directly into it using indexes without locks. Calculate write indexes asindex*8, so they do not overlap.
â Nikita B
Jan 19 at 12:17
I see. I got your point but there is one problem. Some of the elements of the arraybyte[DATA_LENGTH * 8]may be empty, since someindexcan generate invalid value and hence the conditional statementifis going to returnfalse.
â Ravi Joshi
Jan 19 at 12:30
Another idea is to create a separate results collection per chunk, which can then be combined into the final results when all chunks have been processed.
â Pieter Witvoet
Jan 19 at 12:39
Yes, Pieter is right. If empty spaces in resulting array are unacceptable, you have to fill local buffers instead. See the documentation to overloadedParallel.Formethod that I linked in my answer.
â Nikita B
Jan 19 at 12:45
I understand. Thanks. I need to collect the bytes, hence I used list. However since it is not thread safe, I used a lock. I also tried using
ConcurrentBag but didn't see any performance improvements. Any suggestions?â Ravi Joshi
Jan 19 at 12:09
I understand. Thanks. I need to collect the bytes, hence I used list. However since it is not thread safe, I used a lock. I also tried using
ConcurrentBag but didn't see any performance improvements. Any suggestions?â Ravi Joshi
Jan 19 at 12:09
Concurrent collections have internal synchronization. This does count as additional synchronization that I mentioned in (2), so they are a no-go. In your code you always get 8 bytes per iteration. So you can simply create a
byte[DATA_LENGTH*8] array and write directly into it using indexes without locks. Calculate write indexes as index*8, so they do not overlap.â Nikita B
Jan 19 at 12:17
Concurrent collections have internal synchronization. This does count as additional synchronization that I mentioned in (2), so they are a no-go. In your code you always get 8 bytes per iteration. So you can simply create a
byte[DATA_LENGTH*8] array and write directly into it using indexes without locks. Calculate write indexes as index*8, so they do not overlap.â Nikita B
Jan 19 at 12:17
I see. I got your point but there is one problem. Some of the elements of the array
byte[DATA_LENGTH * 8] may be empty, since some index can generate invalid value and hence the conditional statement if is going to return false.â Ravi Joshi
Jan 19 at 12:30
I see. I got your point but there is one problem. Some of the elements of the array
byte[DATA_LENGTH * 8] may be empty, since some index can generate invalid value and hence the conditional statement if is going to return false.â Ravi Joshi
Jan 19 at 12:30
Another idea is to create a separate results collection per chunk, which can then be combined into the final results when all chunks have been processed.
â Pieter Witvoet
Jan 19 at 12:39
Another idea is to create a separate results collection per chunk, which can then be combined into the final results when all chunks have been processed.
â Pieter Witvoet
Jan 19 at 12:39
Yes, Pieter is right. If empty spaces in resulting array are unacceptable, you have to fill local buffers instead. See the documentation to overloaded
Parallel.For method that I linked in my answer.â Nikita B
Jan 19 at 12:45
Yes, Pieter is right. If empty spaces in resulting array are unacceptable, you have to fill local buffers instead. See the documentation to overloaded
Parallel.For method that I linked in my answer.â Nikita B
Jan 19 at 12:45
 |Â
show 2 more comments
up vote
2
down vote
In addition to the other answers, it's worth noting that you're doing very little work inside the parallel.for - grabbing two random values, performing a few comparison, getting some bytes and performing some add-range and additions.
These are very fast, very cheap operations. Parallel.For adds in a ton of overhead that likely exceeds the cost of the operations.
Then its worth noting, as other answers have, that you have locked some of the slower operations - meaning that it'll run that part at no faster than single thread speed. Having a lock inside a parallel operation isn't necessarily bad (e.g. if it's only a very small fraction of that operations work) but by definition it adds a bottleneck.
When working on small, higher performance loops like this I try to do the following:
1) Have each thread work on its own data, and synchronize at the end (or in a thread-safe, high-speed manner like interlocked)
2) Batch the data into appropriately sized chunks before passing to the thread.
For example:
var threadCount = Enviroment.ProcessorCount;
var threadContainers = new Enumerable.Range(0,threadCount).Select(_ => new List<byte>()).ToArray();
Parallel.For(0, threadCount, thread=>
//review, can easily be off by one due to division
for (int index = 0; index<( DataLength/threadCount); index++)
// Random values just in this code snippet
int config_1 = random.Next(-index, index);
int config_2 = random.Next(-index, index);
if (config_1 >= 0 && config_1 < FRAME_WIDTH && config_2 >= 0 && config_2 < FRAME_HEIGHT)
//this *may* be threadsafe, but worth reviewing to be sure.
threadContainers[thread].AddRange(BitConverter.GetBytes(config_1));
threadContainers[thread].AddRange(BitConverter.GetBytes(config_2));
Interlocked.Increment(ref validDataCount);
);
var container = containers.SelectMany(x=>x).ToList();
container.AddRange(BitConverter.GetBytes(validDataCount));
Of course, this can get tedious, So do consider other parallel tools and patterns to assist. For example PLinq might be a good example in this case, because it can do the batching behind the scenes and help manage the creation and flow of state between the threads (whereas we had to do that manually here.
Amazing. However, I am not able to understand it completely. Before asking my query, let me first explain the real scenario. The lineint config_1 = random.Next(-index, index)isint config_1 = sensor_data(index)in the real code, where the length ofsensor_dataisDataLength. In order to get allint config_1, I iterate over completeDataLength. Hence0 < index < DATA_LENGTH. Ques: I see that in your suggestion0 < index < (DATA_LENGTH/threadCount). So I suspect that instead of looking at complete data, it is looking only over(DATA_LENGTH/threadCount)portion of data.
â Ravi Joshi
Jan 19 at 15:11
Updated the code in question for better understanding. Please have a look. Apologies for the inconvenience. Thank you again.
â Ravi Joshi
Jan 19 at 16:06
The exact chunking logic is situation dependent - for example in the demo while the count of the loops was important, the order is next. You may find in reality the order and position is important - leading to something like:for (int index = (length/threadcount)*thread; index<(length/threadcount)*(thread+1); index++)(please doublecheck, esp on unusual values). What I often do is write a Method that takes in array, and splits it into n chunks to make life simpler e.g. stackoverflow.com/questions/419019/â¦
â NPSF3000
Jan 19 at 16:11
make sense. i will check the values and get back to you later. thank you very much.
â Ravi Joshi
Jan 19 at 16:31
add a comment |Â
up vote
2
down vote
In addition to the other answers, it's worth noting that you're doing very little work inside the parallel.for - grabbing two random values, performing a few comparison, getting some bytes and performing some add-range and additions.
These are very fast, very cheap operations. Parallel.For adds in a ton of overhead that likely exceeds the cost of the operations.
Then its worth noting, as other answers have, that you have locked some of the slower operations - meaning that it'll run that part at no faster than single thread speed. Having a lock inside a parallel operation isn't necessarily bad (e.g. if it's only a very small fraction of that operations work) but by definition it adds a bottleneck.
When working on small, higher performance loops like this I try to do the following:
1) Have each thread work on its own data, and synchronize at the end (or in a thread-safe, high-speed manner like interlocked)
2) Batch the data into appropriately sized chunks before passing to the thread.
For example:
var threadCount = Enviroment.ProcessorCount;
var threadContainers = new Enumerable.Range(0,threadCount).Select(_ => new List<byte>()).ToArray();
Parallel.For(0, threadCount, thread=>
//review, can easily be off by one due to division
for (int index = 0; index<( DataLength/threadCount); index++)
// Random values just in this code snippet
int config_1 = random.Next(-index, index);
int config_2 = random.Next(-index, index);
if (config_1 >= 0 && config_1 < FRAME_WIDTH && config_2 >= 0 && config_2 < FRAME_HEIGHT)
//this *may* be threadsafe, but worth reviewing to be sure.
threadContainers[thread].AddRange(BitConverter.GetBytes(config_1));
threadContainers[thread].AddRange(BitConverter.GetBytes(config_2));
Interlocked.Increment(ref validDataCount);
);
var container = containers.SelectMany(x=>x).ToList();
container.AddRange(BitConverter.GetBytes(validDataCount));
Of course, this can get tedious, So do consider other parallel tools and patterns to assist. For example PLinq might be a good example in this case, because it can do the batching behind the scenes and help manage the creation and flow of state between the threads (whereas we had to do that manually here.
Amazing. However, I am not able to understand it completely. Before asking my query, let me first explain the real scenario. The lineint config_1 = random.Next(-index, index)isint config_1 = sensor_data(index)in the real code, where the length ofsensor_dataisDataLength. In order to get allint config_1, I iterate over completeDataLength. Hence0 < index < DATA_LENGTH. Ques: I see that in your suggestion0 < index < (DATA_LENGTH/threadCount). So I suspect that instead of looking at complete data, it is looking only over(DATA_LENGTH/threadCount)portion of data.
â Ravi Joshi
Jan 19 at 15:11
Updated the code in question for better understanding. Please have a look. Apologies for the inconvenience. Thank you again.
â Ravi Joshi
Jan 19 at 16:06
The exact chunking logic is situation dependent - for example in the demo while the count of the loops was important, the order is next. You may find in reality the order and position is important - leading to something like:for (int index = (length/threadcount)*thread; index<(length/threadcount)*(thread+1); index++)(please doublecheck, esp on unusual values). What I often do is write a Method that takes in array, and splits it into n chunks to make life simpler e.g. stackoverflow.com/questions/419019/â¦
â NPSF3000
Jan 19 at 16:11
make sense. i will check the values and get back to you later. thank you very much.
â Ravi Joshi
Jan 19 at 16:31
add a comment |Â
up vote
2
down vote
up vote
2
down vote
In addition to the other answers, it's worth noting that you're doing very little work inside the parallel.for - grabbing two random values, performing a few comparison, getting some bytes and performing some add-range and additions.
These are very fast, very cheap operations. Parallel.For adds in a ton of overhead that likely exceeds the cost of the operations.
Then its worth noting, as other answers have, that you have locked some of the slower operations - meaning that it'll run that part at no faster than single thread speed. Having a lock inside a parallel operation isn't necessarily bad (e.g. if it's only a very small fraction of that operations work) but by definition it adds a bottleneck.
When working on small, higher performance loops like this I try to do the following:
1) Have each thread work on its own data, and synchronize at the end (or in a thread-safe, high-speed manner like interlocked)
2) Batch the data into appropriately sized chunks before passing to the thread.
For example:
var threadCount = Enviroment.ProcessorCount;
var threadContainers = new Enumerable.Range(0,threadCount).Select(_ => new List<byte>()).ToArray();
Parallel.For(0, threadCount, thread=>
//review, can easily be off by one due to division
for (int index = 0; index<( DataLength/threadCount); index++)
// Random values just in this code snippet
int config_1 = random.Next(-index, index);
int config_2 = random.Next(-index, index);
if (config_1 >= 0 && config_1 < FRAME_WIDTH && config_2 >= 0 && config_2 < FRAME_HEIGHT)
//this *may* be threadsafe, but worth reviewing to be sure.
threadContainers[thread].AddRange(BitConverter.GetBytes(config_1));
threadContainers[thread].AddRange(BitConverter.GetBytes(config_2));
Interlocked.Increment(ref validDataCount);
);
var container = containers.SelectMany(x=>x).ToList();
container.AddRange(BitConverter.GetBytes(validDataCount));
Of course, this can get tedious, So do consider other parallel tools and patterns to assist. For example PLinq might be a good example in this case, because it can do the batching behind the scenes and help manage the creation and flow of state between the threads (whereas we had to do that manually here.
In addition to the other answers, it's worth noting that you're doing very little work inside the parallel.for - grabbing two random values, performing a few comparison, getting some bytes and performing some add-range and additions.
These are very fast, very cheap operations. Parallel.For adds in a ton of overhead that likely exceeds the cost of the operations.
Then its worth noting, as other answers have, that you have locked some of the slower operations - meaning that it'll run that part at no faster than single thread speed. Having a lock inside a parallel operation isn't necessarily bad (e.g. if it's only a very small fraction of that operations work) but by definition it adds a bottleneck.
When working on small, higher performance loops like this I try to do the following:
1) Have each thread work on its own data, and synchronize at the end (or in a thread-safe, high-speed manner like interlocked)
2) Batch the data into appropriately sized chunks before passing to the thread.
For example:
var threadCount = Enviroment.ProcessorCount;
var threadContainers = new Enumerable.Range(0,threadCount).Select(_ => new List<byte>()).ToArray();
Parallel.For(0, threadCount, thread=>
//review, can easily be off by one due to division
for (int index = 0; index<( DataLength/threadCount); index++)
// Random values just in this code snippet
int config_1 = random.Next(-index, index);
int config_2 = random.Next(-index, index);
if (config_1 >= 0 && config_1 < FRAME_WIDTH && config_2 >= 0 && config_2 < FRAME_HEIGHT)
//this *may* be threadsafe, but worth reviewing to be sure.
threadContainers[thread].AddRange(BitConverter.GetBytes(config_1));
threadContainers[thread].AddRange(BitConverter.GetBytes(config_2));
Interlocked.Increment(ref validDataCount);
);
var container = containers.SelectMany(x=>x).ToList();
container.AddRange(BitConverter.GetBytes(validDataCount));
Of course, this can get tedious, So do consider other parallel tools and patterns to assist. For example PLinq might be a good example in this case, because it can do the batching behind the scenes and help manage the creation and flow of state between the threads (whereas we had to do that manually here.
answered Jan 19 at 14:24
NPSF3000
60539
60539
Amazing. However, I am not able to understand it completely. Before asking my query, let me first explain the real scenario. The lineint config_1 = random.Next(-index, index)isint config_1 = sensor_data(index)in the real code, where the length ofsensor_dataisDataLength. In order to get allint config_1, I iterate over completeDataLength. Hence0 < index < DATA_LENGTH. Ques: I see that in your suggestion0 < index < (DATA_LENGTH/threadCount). So I suspect that instead of looking at complete data, it is looking only over(DATA_LENGTH/threadCount)portion of data.
â Ravi Joshi
Jan 19 at 15:11
Updated the code in question for better understanding. Please have a look. Apologies for the inconvenience. Thank you again.
â Ravi Joshi
Jan 19 at 16:06
The exact chunking logic is situation dependent - for example in the demo while the count of the loops was important, the order is next. You may find in reality the order and position is important - leading to something like:for (int index = (length/threadcount)*thread; index<(length/threadcount)*(thread+1); index++)(please doublecheck, esp on unusual values). What I often do is write a Method that takes in array, and splits it into n chunks to make life simpler e.g. stackoverflow.com/questions/419019/â¦
â NPSF3000
Jan 19 at 16:11
make sense. i will check the values and get back to you later. thank you very much.
â Ravi Joshi
Jan 19 at 16:31
add a comment |Â
Amazing. However, I am not able to understand it completely. Before asking my query, let me first explain the real scenario. The lineint config_1 = random.Next(-index, index)isint config_1 = sensor_data(index)in the real code, where the length ofsensor_dataisDataLength. In order to get allint config_1, I iterate over completeDataLength. Hence0 < index < DATA_LENGTH. Ques: I see that in your suggestion0 < index < (DATA_LENGTH/threadCount). So I suspect that instead of looking at complete data, it is looking only over(DATA_LENGTH/threadCount)portion of data.
â Ravi Joshi
Jan 19 at 15:11
Updated the code in question for better understanding. Please have a look. Apologies for the inconvenience. Thank you again.
â Ravi Joshi
Jan 19 at 16:06
The exact chunking logic is situation dependent - for example in the demo while the count of the loops was important, the order is next. You may find in reality the order and position is important - leading to something like:for (int index = (length/threadcount)*thread; index<(length/threadcount)*(thread+1); index++)(please doublecheck, esp on unusual values). What I often do is write a Method that takes in array, and splits it into n chunks to make life simpler e.g. stackoverflow.com/questions/419019/â¦
â NPSF3000
Jan 19 at 16:11
make sense. i will check the values and get back to you later. thank you very much.
â Ravi Joshi
Jan 19 at 16:31
Amazing. However, I am not able to understand it completely. Before asking my query, let me first explain the real scenario. The line
int config_1 = random.Next(-index, index) is int config_1 = sensor_data(index) in the real code, where the length of sensor_data is DataLength. In order to get all int config_1, I iterate over complete DataLength. Hence 0 < index < DATA_LENGTH. Ques: I see that in your suggestion 0 < index < (DATA_LENGTH/threadCount). So I suspect that instead of looking at complete data, it is looking only over (DATA_LENGTH/threadCount) portion of data.â Ravi Joshi
Jan 19 at 15:11
Amazing. However, I am not able to understand it completely. Before asking my query, let me first explain the real scenario. The line
int config_1 = random.Next(-index, index) is int config_1 = sensor_data(index) in the real code, where the length of sensor_data is DataLength. In order to get all int config_1, I iterate over complete DataLength. Hence 0 < index < DATA_LENGTH. Ques: I see that in your suggestion 0 < index < (DATA_LENGTH/threadCount). So I suspect that instead of looking at complete data, it is looking only over (DATA_LENGTH/threadCount) portion of data.â Ravi Joshi
Jan 19 at 15:11
Updated the code in question for better understanding. Please have a look. Apologies for the inconvenience. Thank you again.
â Ravi Joshi
Jan 19 at 16:06
Updated the code in question for better understanding. Please have a look. Apologies for the inconvenience. Thank you again.
â Ravi Joshi
Jan 19 at 16:06
The exact chunking logic is situation dependent - for example in the demo while the count of the loops was important, the order is next. You may find in reality the order and position is important - leading to something like:
for (int index = (length/threadcount)*thread; index<(length/threadcount)*(thread+1); index++) (please doublecheck, esp on unusual values). What I often do is write a Method that takes in array, and splits it into n chunks to make life simpler e.g. stackoverflow.com/questions/419019/â¦â NPSF3000
Jan 19 at 16:11
The exact chunking logic is situation dependent - for example in the demo while the count of the loops was important, the order is next. You may find in reality the order and position is important - leading to something like:
for (int index = (length/threadcount)*thread; index<(length/threadcount)*(thread+1); index++) (please doublecheck, esp on unusual values). What I often do is write a Method that takes in array, and splits it into n chunks to make life simpler e.g. stackoverflow.com/questions/419019/â¦â NPSF3000
Jan 19 at 16:11
make sense. i will check the values and get back to you later. thank you very much.
â Ravi Joshi
Jan 19 at 16:31
make sense. i will check the values and get back to you later. thank you very much.
â Ravi Joshi
Jan 19 at 16:31
add a comment |Â
3
You should post your real code otherwise we can't really help you. Take the code directly from your IDE and paste it into your question.
// Random values just in this code snippetjust makes your question bordering off topic.â Heslacher
Jan 19 at 10:57
1
I have now voted to close because of example code. Two comments from you that this isn't your real actual code is reason enough.
â Heslacher
Jan 19 at 15:17
2
Now you aren't allowed anymore because you already received answer. See: what you may and may not do after receiving answers.
â Heslacher
Jan 19 at 15:23
The current question title, which states your concerns about the code, applies to too many questions on this site to be useful. Please edit to the site standard, which is for the title to simply state the task accomplished by the code. Please see How to get the best value out of Code Review: Asking Questions for guidance on writing good question titles.
â Toby Speight
Jan 19 at 16:37