Proper way to process the huge amount of data from a sensor using Parallel.For [closed]

.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty margin-bottom:0;

up vote
1
down vote

favorite

I am getting the huge amount of data from a sensor. In order to speed up data processing, I thought of using Parallel.For. Below is the sample code:

object sync = new object();
private Random random = new Random();
private List<byte> container = new List<byte>();
private readonly int FRAME_WIDTH = 1000;
private readonly int DATA_LENGTH = 217088;
private readonly int FRAME_HEIGHT = 800;

private void getPointsFromSensor(out Point2D points2d, out Point3D points3d)

 points2d = new Point2D[DATA_LENGTH];
 points3d = new Point3D[DATA_LENGTH];
 for (var index = 0; index < DATA_LENGTH; index++)
 
 points2d[index] = new Point2D(random.Next(-index, index), random.Next(-index, index));
 points3d[index] = new Point3D(random.NextDouble(), random.NextDouble(), random.NextDouble());
 


private void OnDataArrived(object sendor, DataArrivedEventArgs e)

 getPointsFromSensor(out Point2D points2d, out Point3D points3d);

 int validDataCount = 0;

 container.Clear(); // Remove old points
 // reserving 4 bytes for storing 'validDataCount' and we are going to modify it later
 container.AddRange(BitConverter.GetBytes(validDataCount));

 Parallel.For(0, DATA_LENGTH, index =>
 
 int points2dX = points2d[index].X;
 int points2dY = points2d[index].Y;

 if (points2dX >= 0 && points2dX < FRAME_WIDTH && points2dY >= 0 && points2dY < FRAME_HEIGHT)
 
 lock (sync)
 
 container.AddRange(BitConverter.GetBytes(points3d[index].X));
 container.AddRange(BitConverter.GetBytes(points3d[index].Y));
 container.AddRange(BitConverter.GetBytes(points3d[index].Z));
 
 Interlocked.Increment(ref validDataCount);
 
 );

 var validDataCountBytes = BitConverter.GetBytes(validDataCount);//4 bytes
 container[0] = validDataCountBytes[0]; container[1] = validDataCountBytes[1];
 container[2] = validDataCountBytes[2]; container[3] = validDataCountBytes[3];

The Point2D and Point3D are defined as follows:

class Point2D

 public int X;
 public int Y;

 public Point2D(int x, int y)
 
 X = x;
 Y = y;
 


class Point3D

 public double X;
 public double Y;
 public double Z;

 public Point3D(double x, double y, double z)
 
 X = x;
 Y = y;
 Z = z;

It turns out that the Parallel.For version is taking much time compared to simple for loop.

I am wondering if there exists is a better way to use Parallel.For inside a callback function in C#.

edited Jan 26 at 8:30

Nikita B

12.3k11652

asked Jan 19 at 10:21

Ravi Joshi

18516

closed as off-topic by Heslacher, Hosch250, Toby Speight, t3chb0t, Mast Jan 20 at 22:28

This question appears to be off-topic. The users who voted to close gave this specific reason:

"Questions must involve real code that you own or maintain. Pseudocode, hypothetical code, or stub code should be replaced by a concrete implementation. Questions seeking an explanation of someone else's code are also off-topic." â€“ Heslacher, Hosch250, Toby Speight, t3chb0t, Mast

If this question can be reworded to fit the rules in the help center, please edit the question.

3

You should post your real code otherwise we can't really help you. Take the code directly from your IDE and paste it into your question. // Random values just in this code snippet just makes your question bordering off topic.
â€“Â Heslacher
Jan 19 at 10:57

1

I have now voted to close because of example code. Two comments from you that this isn't your real actual code is reason enough.
â€“Â Heslacher
Jan 19 at 15:17

2

Now you aren't allowed anymore because you already received answer. See: what you may and may not do after receiving answers.
â€“Â Heslacher
Jan 19 at 15:23

The current question title, which states your concerns about the code, applies to too many questions on this site to be useful. Please edit to the site standard, which is for the title to simply state the task accomplished by the code. Please see How to get the best value out of Code Review: Asking Questions for guidance on writing good question titles.
â€“Â Toby Speight
Jan 19 at 16:37

add a commentÂ |Â

up vote
1
down vote

favorite

I am getting the huge amount of data from a sensor. In order to speed up data processing, I thought of using Parallel.For. Below is the sample code:

object sync = new object();
private Random random = new Random();
private List<byte> container = new List<byte>();
private readonly int FRAME_WIDTH = 1000;
private readonly int DATA_LENGTH = 217088;
private readonly int FRAME_HEIGHT = 800;

private void getPointsFromSensor(out Point2D points2d, out Point3D points3d)

 points2d = new Point2D[DATA_LENGTH];
 points3d = new Point3D[DATA_LENGTH];
 for (var index = 0; index < DATA_LENGTH; index++)
 
 points2d[index] = new Point2D(random.Next(-index, index), random.Next(-index, index));
 points3d[index] = new Point3D(random.NextDouble(), random.NextDouble(), random.NextDouble());
 


private void OnDataArrived(object sendor, DataArrivedEventArgs e)

 getPointsFromSensor(out Point2D points2d, out Point3D points3d);

 int validDataCount = 0;

 container.Clear(); // Remove old points
 // reserving 4 bytes for storing 'validDataCount' and we are going to modify it later
 container.AddRange(BitConverter.GetBytes(validDataCount));

 Parallel.For(0, DATA_LENGTH, index =>
 
 int points2dX = points2d[index].X;
 int points2dY = points2d[index].Y;

 if (points2dX >= 0 && points2dX < FRAME_WIDTH && points2dY >= 0 && points2dY < FRAME_HEIGHT)
 
 lock (sync)
 
 container.AddRange(BitConverter.GetBytes(points3d[index].X));
 container.AddRange(BitConverter.GetBytes(points3d[index].Y));
 container.AddRange(BitConverter.GetBytes(points3d[index].Z));
 
 Interlocked.Increment(ref validDataCount);
 
 );

 var validDataCountBytes = BitConverter.GetBytes(validDataCount);//4 bytes
 container[0] = validDataCountBytes[0]; container[1] = validDataCountBytes[1];
 container[2] = validDataCountBytes[2]; container[3] = validDataCountBytes[3];

The Point2D and Point3D are defined as follows:

class Point2D

 public int X;
 public int Y;

 public Point2D(int x, int y)
 
 X = x;
 Y = y;
 


class Point3D

 public double X;
 public double Y;
 public double Z;

 public Point3D(double x, double y, double z)
 
 X = x;
 Y = y;
 Z = z;

It turns out that the Parallel.For version is taking much time compared to simple for loop.

I am wondering if there exists is a better way to use Parallel.For inside a callback function in C#.

edited Jan 26 at 8:30

Nikita B

12.3k11652

asked Jan 19 at 10:21

Ravi Joshi

18516

closed as off-topic by Heslacher, Hosch250, Toby Speight, t3chb0t, Mast Jan 20 at 22:28

This question appears to be off-topic. The users who voted to close gave this specific reason:

"Questions must involve real code that you own or maintain. Pseudocode, hypothetical code, or stub code should be replaced by a concrete implementation. Questions seeking an explanation of someone else's code are also off-topic." â€“ Heslacher, Hosch250, Toby Speight, t3chb0t, Mast

If this question can be reworded to fit the rules in the help center, please edit the question.

3

You should post your real code otherwise we can't really help you. Take the code directly from your IDE and paste it into your question. // Random values just in this code snippet just makes your question bordering off topic.
â€“Â Heslacher
Jan 19 at 10:57

1

I have now voted to close because of example code. Two comments from you that this isn't your real actual code is reason enough.
â€“Â Heslacher
Jan 19 at 15:17

2

Now you aren't allowed anymore because you already received answer. See: what you may and may not do after receiving answers.
â€“Â Heslacher
Jan 19 at 15:23

The current question title, which states your concerns about the code, applies to too many questions on this site to be useful. Please edit to the site standard, which is for the title to simply state the task accomplished by the code. Please see How to get the best value out of Code Review: Asking Questions for guidance on writing good question titles.
â€“Â Toby Speight
Jan 19 at 16:37

add a commentÂ |Â

up vote
1
down vote

favorite

I am getting the huge amount of data from a sensor. In order to speed up data processing, I thought of using Parallel.For. Below is the sample code:

object sync = new object();
private Random random = new Random();
private List<byte> container = new List<byte>();
private readonly int FRAME_WIDTH = 1000;
private readonly int DATA_LENGTH = 217088;
private readonly int FRAME_HEIGHT = 800;

private void getPointsFromSensor(out Point2D points2d, out Point3D points3d)

 points2d = new Point2D[DATA_LENGTH];
 points3d = new Point3D[DATA_LENGTH];
 for (var index = 0; index < DATA_LENGTH; index++)
 
 points2d[index] = new Point2D(random.Next(-index, index), random.Next(-index, index));
 points3d[index] = new Point3D(random.NextDouble(), random.NextDouble(), random.NextDouble());
 


private void OnDataArrived(object sendor, DataArrivedEventArgs e)

 getPointsFromSensor(out Point2D points2d, out Point3D points3d);

 int validDataCount = 0;

 container.Clear(); // Remove old points
 // reserving 4 bytes for storing 'validDataCount' and we are going to modify it later
 container.AddRange(BitConverter.GetBytes(validDataCount));

 Parallel.For(0, DATA_LENGTH, index =>
 
 int points2dX = points2d[index].X;
 int points2dY = points2d[index].Y;

 if (points2dX >= 0 && points2dX < FRAME_WIDTH && points2dY >= 0 && points2dY < FRAME_HEIGHT)
 
 lock (sync)
 
 container.AddRange(BitConverter.GetBytes(points3d[index].X));
 container.AddRange(BitConverter.GetBytes(points3d[index].Y));
 container.AddRange(BitConverter.GetBytes(points3d[index].Z));
 
 Interlocked.Increment(ref validDataCount);
 
 );

 var validDataCountBytes = BitConverter.GetBytes(validDataCount);//4 bytes
 container[0] = validDataCountBytes[0]; container[1] = validDataCountBytes[1];
 container[2] = validDataCountBytes[2]; container[3] = validDataCountBytes[3];

The Point2D and Point3D are defined as follows:

class Point2D

 public int X;
 public int Y;

 public Point2D(int x, int y)
 
 X = x;
 Y = y;
 


class Point3D

 public double X;
 public double Y;
 public double Z;

 public Point3D(double x, double y, double z)
 
 X = x;
 Y = y;
 Z = z;

It turns out that the Parallel.For version is taking much time compared to simple for loop.

I am wondering if there exists is a better way to use Parallel.For inside a callback function in C#.

edited Jan 26 at 8:30

Nikita B

12.3k11652

asked Jan 19 at 10:21

Ravi Joshi

18516

I am getting the huge amount of data from a sensor. In order to speed up data processing, I thought of using Parallel.For. Below is the sample code:

object sync = new object();
private Random random = new Random();
private List<byte> container = new List<byte>();
private readonly int FRAME_WIDTH = 1000;
private readonly int DATA_LENGTH = 217088;
private readonly int FRAME_HEIGHT = 800;

private void getPointsFromSensor(out Point2D points2d, out Point3D points3d)

 points2d = new Point2D[DATA_LENGTH];
 points3d = new Point3D[DATA_LENGTH];
 for (var index = 0; index < DATA_LENGTH; index++)
 
 points2d[index] = new Point2D(random.Next(-index, index), random.Next(-index, index));
 points3d[index] = new Point3D(random.NextDouble(), random.NextDouble(), random.NextDouble());
 


private void OnDataArrived(object sendor, DataArrivedEventArgs e)

 getPointsFromSensor(out Point2D points2d, out Point3D points3d);

 int validDataCount = 0;

 container.Clear(); // Remove old points
 // reserving 4 bytes for storing 'validDataCount' and we are going to modify it later
 container.AddRange(BitConverter.GetBytes(validDataCount));

 Parallel.For(0, DATA_LENGTH, index =>
 
 int points2dX = points2d[index].X;
 int points2dY = points2d[index].Y;

 if (points2dX >= 0 && points2dX < FRAME_WIDTH && points2dY >= 0 && points2dY < FRAME_HEIGHT)
 
 lock (sync)
 
 container.AddRange(BitConverter.GetBytes(points3d[index].X));
 container.AddRange(BitConverter.GetBytes(points3d[index].Y));
 container.AddRange(BitConverter.GetBytes(points3d[index].Z));
 
 Interlocked.Increment(ref validDataCount);
 
 );

 var validDataCountBytes = BitConverter.GetBytes(validDataCount);//4 bytes
 container[0] = validDataCountBytes[0]; container[1] = validDataCountBytes[1];
 container[2] = validDataCountBytes[2]; container[3] = validDataCountBytes[3];

The Point2D and Point3D are defined as follows:

class Point2D

 public int X;
 public int Y;

 public Point2D(int x, int y)
 
 X = x;
 Y = y;
 


class Point3D

 public double X;
 public double Y;
 public double Z;

 public Point3D(double x, double y, double z)
 
 X = x;
 Y = y;
 Z = z;

It turns out that the Parallel.For version is taking much time compared to simple for loop.

I am wondering if there exists is a better way to use Parallel.For inside a callback function in C#.

edited Jan 26 at 8:30

Nikita B

12.3k11652

asked Jan 19 at 10:21

Ravi Joshi

18516

edited Jan 26 at 8:30

Nikita B

12.3k11652

edited Jan 26 at 8:30

Nikita B

12.3k11652

edited Jan 26 at 8:30

Nikita B

12.3k11652

asked Jan 19 at 10:21

Ravi Joshi

18516

asked Jan 19 at 10:21

Ravi Joshi

18516

asked Jan 19 at 10:21

Ravi Joshi

18516

closed as off-topic by Heslacher, Hosch250, Toby Speight, t3chb0t, Mast Jan 20 at 22:28

This question appears to be off-topic. The users who voted to close gave this specific reason:

"Questions must involve real code that you own or maintain. Pseudocode, hypothetical code, or stub code should be replaced by a concrete implementation. Questions seeking an explanation of someone else's code are also off-topic." â€“ Heslacher, Hosch250, Toby Speight, t3chb0t, Mast

If this question can be reworded to fit the rules in the help center, please edit the question.

closed as off-topic by Heslacher, Hosch250, Toby Speight, t3chb0t, Mast Jan 20 at 22:28

This question appears to be off-topic. The users who voted to close gave this specific reason:

"Questions must involve real code that you own or maintain. Pseudocode, hypothetical code, or stub code should be replaced by a concrete implementation. Questions seeking an explanation of someone else's code are also off-topic." â€“ Heslacher, Hosch250, Toby Speight, t3chb0t, Mast

If this question can be reworded to fit the rules in the help center, please edit the question.

3

You should post your real code otherwise we can't really help you. Take the code directly from your IDE and paste it into your question. // Random values just in this code snippet just makes your question bordering off topic.
â€“Â Heslacher
Jan 19 at 10:57

1

I have now voted to close because of example code. Two comments from you that this isn't your real actual code is reason enough.
â€“Â Heslacher
Jan 19 at 15:17

2

Now you aren't allowed anymore because you already received answer. See: what you may and may not do after receiving answers.
â€“Â Heslacher
Jan 19 at 15:23

The current question title, which states your concerns about the code, applies to too many questions on this site to be useful. Please edit to the site standard, which is for the title to simply state the task accomplished by the code. Please see How to get the best value out of Code Review: Asking Questions for guidance on writing good question titles.
â€“Â Toby Speight
Jan 19 at 16:37

add a commentÂ |Â

3

You should post your real code otherwise we can't really help you. Take the code directly from your IDE and paste it into your question. // Random values just in this code snippet just makes your question bordering off topic.
â€“Â Heslacher
Jan 19 at 10:57

1

I have now voted to close because of example code. Two comments from you that this isn't your real actual code is reason enough.
â€“Â Heslacher
Jan 19 at 15:17

2

Now you aren't allowed anymore because you already received answer. See: what you may and may not do after receiving answers.
â€“Â Heslacher
Jan 19 at 15:23

The current question title, which states your concerns about the code, applies to too many questions on this site to be useful. Please edit to the site standard, which is for the title to simply state the task accomplished by the code. Please see How to get the best value out of Code Review: Asking Questions for guidance on writing good question titles.
â€“Â Toby Speight
Jan 19 at 16:37

You should post your real code otherwise we can't really help you. Take the code directly from your IDE and paste it into your question. // Random values just in this code snippet just makes your question bordering off topic.
â€“Â Heslacher
Jan 19 at 10:57

I have now voted to close because of example code. Two comments from you that this isn't your real actual code is reason enough.
â€“Â Heslacher
Jan 19 at 15:17

Now you aren't allowed anymore because you already received answer. See: what you may and may not do after receiving answers.
â€“Â Heslacher
Jan 19 at 15:23

The current question title, which states your concerns about the code, applies to too many questions on this site to be useful. Please edit to the site standard, which is for the title to simply state the task accomplished by the code. Please see How to get the best value out of Code Review: Asking Questions for guidance on writing good question titles.
â€“Â Toby Speight
Jan 19 at 16:37

add a commentÂ |Â

3 Answers
3

active

oldest

votes

up vote
5
down vote

Usually it makes absolutely no sense to use a parallel loop with a lock inside it. The performance loss you are talking about comes most probably from it. They cannot run in parallel because they have to wait for each other. The lock also costs some time and if it happens to often, the parallel loop might actually be slower then the normal one.

You might gain some performance if you redesign it in a way that does not requrie synchronization.

answered Jan 19 at 10:51

t3chb0t

32.1k54195

add a commentÂ |Â

up vote
5
down vote

There are three rules I find useful when working with parallel loops:

Do not lock inside the loop. (already explained in t3chb0t's answer)

Do not write to a shared buffer inside a loop, unless you can do it in a thread-safe manner without any additional synchronization (including concurrent collections or other objects that use locks internally). In case you can't, use one of the overloads that supports local states, and write to that local state without a lock.

Do not iterate over every element of array unless a significant amount of work is required to process each element. In general, you should split your initial data into reasonably-sized chunks (using ArraySegment class, for example), and use Parallel.For to iterate over array of those chunks.

Following those three rules should give you satisfactory results.

edited Jan 19 at 13:09

Konrad Rudolph

4,8531431

answered Jan 19 at 11:04

Nikita B

12.3k11652

I understand. Thanks. I need to collect the bytes, hence I used list. However since it is not thread safe, I used a lock. I also tried using ConcurrentBag but didn't see any performance improvements. Any suggestions?
â€“Â Ravi Joshi
Jan 19 at 12:09

Concurrent collections have internal synchronization. This does count as additional synchronization that I mentioned in (2), so they are a no-go. In your code you always get 8 bytes per iteration. So you can simply create a byte[DATA_LENGTH*8] array and write directly into it using indexes without locks. Calculate write indexes as index*8, so they do not overlap.
â€“Â Nikita B
Jan 19 at 12:17

I see. I got your point but there is one problem. Some of the elements of the array byte[DATA_LENGTH * 8] may be empty, since some index can generate invalid value and hence the conditional statement if is going to return false.
â€“Â Ravi Joshi
Jan 19 at 12:30

Another idea is to create a separate results collection per chunk, which can then be combined into the final results when all chunks have been processed.
â€“Â Pieter Witvoet
Jan 19 at 12:39

Yes, Pieter is right. If empty spaces in resulting array are unacceptable, you have to fill local buffers instead. See the documentation to overloaded Parallel.For method that I linked in my answer.
â€“Â Nikita B
Jan 19 at 12:45

Â |Â
show 2 more comments

up vote
2
down vote

In addition to the other answers, it's worth noting that you're doing very little work inside the parallel.for - grabbing two random values, performing a few comparison, getting some bytes and performing some add-range and additions.

These are very fast, very cheap operations. Parallel.For adds in a ton of overhead that likely exceeds the cost of the operations.

Then its worth noting, as other answers have, that you have locked some of the slower operations - meaning that it'll run that part at no faster than single thread speed. Having a lock inside a parallel operation isn't necessarily bad (e.g. if it's only a very small fraction of that operations work) but by definition it adds a bottleneck.

When working on small, higher performance loops like this I try to do the following:

1) Have each thread work on its own data, and synchronize at the end (or in a thread-safe, high-speed manner like interlocked)

2) Batch the data into appropriately sized chunks before passing to the thread.

For example:

var threadCount = Enviroment.ProcessorCount;
var threadContainers = new Enumerable.Range(0,threadCount).Select(_ => new List<byte>()).ToArray();
Parallel.For(0, threadCount, thread=>
 
 //review, can easily be off by one due to division
 for (int index = 0; index<( DataLength/threadCount); index++)
 
 // Random values just in this code snippet
 int config_1 = random.Next(-index, index); 
 int config_2 = random.Next(-index, index);

 if (config_1 >= 0 && config_1 < FRAME_WIDTH && config_2 >= 0 && config_2 < FRAME_HEIGHT)
 
 //this *may* be threadsafe, but worth reviewing to be sure.
 threadContainers[thread].AddRange(BitConverter.GetBytes(config_1));
 threadContainers[thread].AddRange(BitConverter.GetBytes(config_2));
 Interlocked.Increment(ref validDataCount);
 
 
);
var container = containers.SelectMany(x=>x).ToList(); 
container.AddRange(BitConverter.GetBytes(validDataCount));

Of course, this can get tedious, So do consider other parallel tools and patterns to assist. For example PLinq might be a good example in this case, because it can do the batching behind the scenes and help manage the creation and flow of state between the threads (whereas we had to do that manually here.

answered Jan 19 at 14:24

NPSF3000

60539

Amazing. However, I am not able to understand it completely. Before asking my query, let me first explain the real scenario. The line int config_1 = random.Next(-index, index) is int config_1 = sensor_data(index) in the real code, where the length of sensor_data is DataLength. In order to get all int config_1, I iterate over complete DataLength. Hence 0 < index < DATA_LENGTH. Ques: I see that in your suggestion 0 < index < (DATA_LENGTH/threadCount). So I suspect that instead of looking at complete data, it is looking only over (DATA_LENGTH/threadCount) portion of data.
â€“Â Ravi Joshi
Jan 19 at 15:11

Updated the code in question for better understanding. Please have a look. Apologies for the inconvenience. Thank you again.
â€“Â Ravi Joshi
Jan 19 at 16:06

The exact chunking logic is situation dependent - for example in the demo while the count of the loops was important, the order is next. You may find in reality the order and position is important - leading to something like: for (int index = (length/threadcount)*thread; index<(length/threadcount)*(thread+1); index++) (please doublecheck, esp on unusual values). What I often do is write a Method that takes in array, and splits it into n chunks to make life simpler e.g. stackoverflow.com/questions/419019/â€¦
â€“Â NPSF3000
Jan 19 at 16:11

make sense. i will check the values and get back to you later. thank you very much.
â€“Â Ravi Joshi
Jan 19 at 16:31

add a commentÂ |Â

3 Answers
3

active

oldest

votes

3 Answers
3

active

oldest

votes

up vote
5
down vote

You might gain some performance if you redesign it in a way that does not requrie synchronization.

answered Jan 19 at 10:51

t3chb0t

32.1k54195

add a commentÂ |Â

up vote
5
down vote

You might gain some performance if you redesign it in a way that does not requrie synchronization.

answered Jan 19 at 10:51

t3chb0t

32.1k54195

add a commentÂ |Â

up vote
5
down vote

You might gain some performance if you redesign it in a way that does not requrie synchronization.

answered Jan 19 at 10:51

t3chb0t

32.1k54195

You might gain some performance if you redesign it in a way that does not requrie synchronization.

answered Jan 19 at 10:51

t3chb0t

32.1k54195

answered Jan 19 at 10:51

t3chb0t

32.1k54195

answered Jan 19 at 10:51

t3chb0t

32.1k54195

answered Jan 19 at 10:51

t3chb0t

32.1k54195

add a commentÂ |Â

up vote
5
down vote

There are three rules I find useful when working with parallel loops:

Do not lock inside the loop. (already explained in t3chb0t's answer)

Do not write to a shared buffer inside a loop, unless you can do it in a thread-safe manner without any additional synchronization (including concurrent collections or other objects that use locks internally). In case you can't, use one of the overloads that supports local states, and write to that local state without a lock.

Do not iterate over every element of array unless a significant amount of work is required to process each element. In general, you should split your initial data into reasonably-sized chunks (using ArraySegment class, for example), and use Parallel.For to iterate over array of those chunks.

Following those three rules should give you satisfactory results.

edited Jan 19 at 13:09

Konrad Rudolph

4,8531431

answered Jan 19 at 11:04

Nikita B

12.3k11652

I understand. Thanks. I need to collect the bytes, hence I used list. However since it is not thread safe, I used a lock. I also tried using ConcurrentBag but didn't see any performance improvements. Any suggestions?
â€“Â Ravi Joshi
Jan 19 at 12:09

Concurrent collections have internal synchronization. This does count as additional synchronization that I mentioned in (2), so they are a no-go. In your code you always get 8 bytes per iteration. So you can simply create a byte[DATA_LENGTH*8] array and write directly into it using indexes without locks. Calculate write indexes as index*8, so they do not overlap.
â€“Â Nikita B
Jan 19 at 12:17

I see. I got your point but there is one problem. Some of the elements of the array byte[DATA_LENGTH * 8] may be empty, since some index can generate invalid value and hence the conditional statement if is going to return false.
â€“Â Ravi Joshi
Jan 19 at 12:30

Another idea is to create a separate results collection per chunk, which can then be combined into the final results when all chunks have been processed.
â€“Â Pieter Witvoet
Jan 19 at 12:39

Yes, Pieter is right. If empty spaces in resulting array are unacceptable, you have to fill local buffers instead. See the documentation to overloaded Parallel.For method that I linked in my answer.
â€“Â Nikita B
Jan 19 at 12:45

Â |Â
show 2 more comments

up vote
5
down vote

There are three rules I find useful when working with parallel loops:

Do not lock inside the loop. (already explained in t3chb0t's answer)

Do not write to a shared buffer inside a loop, unless you can do it in a thread-safe manner without any additional synchronization (including concurrent collections or other objects that use locks internally). In case you can't, use one of the overloads that supports local states, and write to that local state without a lock.

Do not iterate over every element of array unless a significant amount of work is required to process each element. In general, you should split your initial data into reasonably-sized chunks (using ArraySegment class, for example), and use Parallel.For to iterate over array of those chunks.

Following those three rules should give you satisfactory results.

edited Jan 19 at 13:09

Konrad Rudolph

4,8531431

answered Jan 19 at 11:04

Nikita B

12.3k11652

I understand. Thanks. I need to collect the bytes, hence I used list. However since it is not thread safe, I used a lock. I also tried using ConcurrentBag but didn't see any performance improvements. Any suggestions?
â€“Â Ravi Joshi
Jan 19 at 12:09

Concurrent collections have internal synchronization. This does count as additional synchronization that I mentioned in (2), so they are a no-go. In your code you always get 8 bytes per iteration. So you can simply create a byte[DATA_LENGTH*8] array and write directly into it using indexes without locks. Calculate write indexes as index*8, so they do not overlap.
â€“Â Nikita B
Jan 19 at 12:17

I see. I got your point but there is one problem. Some of the elements of the array byte[DATA_LENGTH * 8] may be empty, since some index can generate invalid value and hence the conditional statement if is going to return false.
â€“Â Ravi Joshi
Jan 19 at 12:30

Another idea is to create a separate results collection per chunk, which can then be combined into the final results when all chunks have been processed.
â€“Â Pieter Witvoet
Jan 19 at 12:39

Yes, Pieter is right. If empty spaces in resulting array are unacceptable, you have to fill local buffers instead. See the documentation to overloaded Parallel.For method that I linked in my answer.
â€“Â Nikita B
Jan 19 at 12:45

Â |Â
show 2 more comments

up vote
5
down vote

There are three rules I find useful when working with parallel loops:

Do not lock inside the loop. (already explained in t3chb0t's answer)

Do not write to a shared buffer inside a loop, unless you can do it in a thread-safe manner without any additional synchronization (including concurrent collections or other objects that use locks internally). In case you can't, use one of the overloads that supports local states, and write to that local state without a lock.

Do not iterate over every element of array unless a significant amount of work is required to process each element. In general, you should split your initial data into reasonably-sized chunks (using ArraySegment class, for example), and use Parallel.For to iterate over array of those chunks.

Following those three rules should give you satisfactory results.

edited Jan 19 at 13:09

Konrad Rudolph

4,8531431

answered Jan 19 at 11:04

Nikita B

12.3k11652

There are three rules I find useful when working with parallel loops:

Do not lock inside the loop. (already explained in t3chb0t's answer)

Do not write to a shared buffer inside a loop, unless you can do it in a thread-safe manner without any additional synchronization (including concurrent collections or other objects that use locks internally). In case you can't, use one of the overloads that supports local states, and write to that local state without a lock.

Do not iterate over every element of array unless a significant amount of work is required to process each element. In general, you should split your initial data into reasonably-sized chunks (using ArraySegment class, for example), and use Parallel.For to iterate over array of those chunks.

Following those three rules should give you satisfactory results.

edited Jan 19 at 13:09

Konrad Rudolph

4,8531431

answered Jan 19 at 11:04

Nikita B

12.3k11652

edited Jan 19 at 13:09

Konrad Rudolph

4,8531431

edited Jan 19 at 13:09

Konrad Rudolph

4,8531431

edited Jan 19 at 13:09

Konrad Rudolph

4,8531431

answered Jan 19 at 11:04

Nikita B

12.3k11652

answered Jan 19 at 11:04

Nikita B

12.3k11652

answered Jan 19 at 11:04

Nikita B

12.3k11652

I understand. Thanks. I need to collect the bytes, hence I used list. However since it is not thread safe, I used a lock. I also tried using ConcurrentBag but didn't see any performance improvements. Any suggestions?
â€“Â Ravi Joshi
Jan 19 at 12:09

Concurrent collections have internal synchronization. This does count as additional synchronization that I mentioned in (2), so they are a no-go. In your code you always get 8 bytes per iteration. So you can simply create a byte[DATA_LENGTH*8] array and write directly into it using indexes without locks. Calculate write indexes as index*8, so they do not overlap.
â€“Â Nikita B
Jan 19 at 12:17

I see. I got your point but there is one problem. Some of the elements of the array byte[DATA_LENGTH * 8] may be empty, since some index can generate invalid value and hence the conditional statement if is going to return false.
â€“Â Ravi Joshi
Jan 19 at 12:30

Another idea is to create a separate results collection per chunk, which can then be combined into the final results when all chunks have been processed.
â€“Â Pieter Witvoet
Jan 19 at 12:39

Yes, Pieter is right. If empty spaces in resulting array are unacceptable, you have to fill local buffers instead. See the documentation to overloaded Parallel.For method that I linked in my answer.
â€“Â Nikita B
Jan 19 at 12:45

Â |Â
show 2 more comments

I understand. Thanks. I need to collect the bytes, hence I used list. However since it is not thread safe, I used a lock. I also tried using ConcurrentBag but didn't see any performance improvements. Any suggestions?
â€“Â Ravi Joshi
Jan 19 at 12:09

Concurrent collections have internal synchronization. This does count as additional synchronization that I mentioned in (2), so they are a no-go. In your code you always get 8 bytes per iteration. So you can simply create a byte[DATA_LENGTH*8] array and write directly into it using indexes without locks. Calculate write indexes as index*8, so they do not overlap.
â€“Â Nikita B
Jan 19 at 12:17

I see. I got your point but there is one problem. Some of the elements of the array byte[DATA_LENGTH * 8] may be empty, since some index can generate invalid value and hence the conditional statement if is going to return false.
â€“Â Ravi Joshi
Jan 19 at 12:30

Another idea is to create a separate results collection per chunk, which can then be combined into the final results when all chunks have been processed.
â€“Â Pieter Witvoet
Jan 19 at 12:39

Yes, Pieter is right. If empty spaces in resulting array are unacceptable, you have to fill local buffers instead. See the documentation to overloaded Parallel.For method that I linked in my answer.
â€“Â Nikita B
Jan 19 at 12:45

I understand. Thanks. I need to collect the bytes, hence I used list. However since it is not thread safe, I used a lock. I also tried using ConcurrentBag but didn't see any performance improvements. Any suggestions?
â€“Â Ravi Joshi
Jan 19 at 12:09

Concurrent collections have internal synchronization. This does count as additional synchronization that I mentioned in (2), so they are a no-go. In your code you always get 8 bytes per iteration. So you can simply create a byte[DATA_LENGTH*8] array and write directly into it using indexes without locks. Calculate write indexes as index*8, so they do not overlap.
â€“Â Nikita B
Jan 19 at 12:17

I see. I got your point but there is one problem. Some of the elements of the array byte[DATA_LENGTH * 8] may be empty, since some index can generate invalid value and hence the conditional statement if is going to return false.
â€“Â Ravi Joshi
Jan 19 at 12:30

Another idea is to create a separate results collection per chunk, which can then be combined into the final results when all chunks have been processed.
â€“Â Pieter Witvoet
Jan 19 at 12:39

Yes, Pieter is right. If empty spaces in resulting array are unacceptable, you have to fill local buffers instead. See the documentation to overloaded Parallel.For method that I linked in my answer.
â€“Â Nikita B
Jan 19 at 12:45

Â |Â
show 2 more comments

up vote
2
down vote

These are very fast, very cheap operations. Parallel.For adds in a ton of overhead that likely exceeds the cost of the operations.

When working on small, higher performance loops like this I try to do the following:

For example:

var threadCount = Enviroment.ProcessorCount;
var threadContainers = new Enumerable.Range(0,threadCount).Select(_ => new List<byte>()).ToArray();
Parallel.For(0, threadCount, thread=>
 
 //review, can easily be off by one due to division
 for (int index = 0; index<( DataLength/threadCount); index++)
 
 // Random values just in this code snippet
 int config_1 = random.Next(-index, index); 
 int config_2 = random.Next(-index, index);

 if (config_1 >= 0 && config_1 < FRAME_WIDTH && config_2 >= 0 && config_2 < FRAME_HEIGHT)
 
 //this *may* be threadsafe, but worth reviewing to be sure.
 threadContainers[thread].AddRange(BitConverter.GetBytes(config_1));
 threadContainers[thread].AddRange(BitConverter.GetBytes(config_2));
 Interlocked.Increment(ref validDataCount);
 
 
);
var container = containers.SelectMany(x=>x).ToList(); 
container.AddRange(BitConverter.GetBytes(validDataCount));

answered Jan 19 at 14:24

NPSF3000

60539

Amazing. However, I am not able to understand it completely. Before asking my query, let me first explain the real scenario. The line int config_1 = random.Next(-index, index) is int config_1 = sensor_data(index) in the real code, where the length of sensor_data is DataLength. In order to get all int config_1, I iterate over complete DataLength. Hence 0 < index < DATA_LENGTH. Ques: I see that in your suggestion 0 < index < (DATA_LENGTH/threadCount). So I suspect that instead of looking at complete data, it is looking only over (DATA_LENGTH/threadCount) portion of data.
â€“Â Ravi Joshi
Jan 19 at 15:11

Updated the code in question for better understanding. Please have a look. Apologies for the inconvenience. Thank you again.
â€“Â Ravi Joshi
Jan 19 at 16:06

The exact chunking logic is situation dependent - for example in the demo while the count of the loops was important, the order is next. You may find in reality the order and position is important - leading to something like: for (int index = (length/threadcount)*thread; index<(length/threadcount)*(thread+1); index++) (please doublecheck, esp on unusual values). What I often do is write a Method that takes in array, and splits it into n chunks to make life simpler e.g. stackoverflow.com/questions/419019/â€¦
â€“Â NPSF3000
Jan 19 at 16:11

make sense. i will check the values and get back to you later. thank you very much.
â€“Â Ravi Joshi
Jan 19 at 16:31

add a commentÂ |Â

up vote
2
down vote

These are very fast, very cheap operations. Parallel.For adds in a ton of overhead that likely exceeds the cost of the operations.

When working on small, higher performance loops like this I try to do the following:

For example:

var threadCount = Enviroment.ProcessorCount;
var threadContainers = new Enumerable.Range(0,threadCount).Select(_ => new List<byte>()).ToArray();
Parallel.For(0, threadCount, thread=>
 
 //review, can easily be off by one due to division
 for (int index = 0; index<( DataLength/threadCount); index++)
 
 // Random values just in this code snippet
 int config_1 = random.Next(-index, index); 
 int config_2 = random.Next(-index, index);

 if (config_1 >= 0 && config_1 < FRAME_WIDTH && config_2 >= 0 && config_2 < FRAME_HEIGHT)
 
 //this *may* be threadsafe, but worth reviewing to be sure.
 threadContainers[thread].AddRange(BitConverter.GetBytes(config_1));
 threadContainers[thread].AddRange(BitConverter.GetBytes(config_2));
 Interlocked.Increment(ref validDataCount);
 
 
);
var container = containers.SelectMany(x=>x).ToList(); 
container.AddRange(BitConverter.GetBytes(validDataCount));

answered Jan 19 at 14:24

NPSF3000

60539

Amazing. However, I am not able to understand it completely. Before asking my query, let me first explain the real scenario. The line int config_1 = random.Next(-index, index) is int config_1 = sensor_data(index) in the real code, where the length of sensor_data is DataLength. In order to get all int config_1, I iterate over complete DataLength. Hence 0 < index < DATA_LENGTH. Ques: I see that in your suggestion 0 < index < (DATA_LENGTH/threadCount). So I suspect that instead of looking at complete data, it is looking only over (DATA_LENGTH/threadCount) portion of data.
â€“Â Ravi Joshi
Jan 19 at 15:11

Updated the code in question for better understanding. Please have a look. Apologies for the inconvenience. Thank you again.
â€“Â Ravi Joshi
Jan 19 at 16:06

The exact chunking logic is situation dependent - for example in the demo while the count of the loops was important, the order is next. You may find in reality the order and position is important - leading to something like: for (int index = (length/threadcount)*thread; index<(length/threadcount)*(thread+1); index++) (please doublecheck, esp on unusual values). What I often do is write a Method that takes in array, and splits it into n chunks to make life simpler e.g. stackoverflow.com/questions/419019/â€¦
â€“Â NPSF3000
Jan 19 at 16:11

make sense. i will check the values and get back to you later. thank you very much.
â€“Â Ravi Joshi
Jan 19 at 16:31

add a commentÂ |Â

up vote
2
down vote

These are very fast, very cheap operations. Parallel.For adds in a ton of overhead that likely exceeds the cost of the operations.

When working on small, higher performance loops like this I try to do the following:

For example:

var threadCount = Enviroment.ProcessorCount;
var threadContainers = new Enumerable.Range(0,threadCount).Select(_ => new List<byte>()).ToArray();
Parallel.For(0, threadCount, thread=>
 
 //review, can easily be off by one due to division
 for (int index = 0; index<( DataLength/threadCount); index++)
 
 // Random values just in this code snippet
 int config_1 = random.Next(-index, index); 
 int config_2 = random.Next(-index, index);

 if (config_1 >= 0 && config_1 < FRAME_WIDTH && config_2 >= 0 && config_2 < FRAME_HEIGHT)
 
 //this *may* be threadsafe, but worth reviewing to be sure.
 threadContainers[thread].AddRange(BitConverter.GetBytes(config_1));
 threadContainers[thread].AddRange(BitConverter.GetBytes(config_2));
 Interlocked.Increment(ref validDataCount);
 
 
);
var container = containers.SelectMany(x=>x).ToList(); 
container.AddRange(BitConverter.GetBytes(validDataCount));

answered Jan 19 at 14:24

NPSF3000

60539

These are very fast, very cheap operations. Parallel.For adds in a ton of overhead that likely exceeds the cost of the operations.

When working on small, higher performance loops like this I try to do the following:

For example:

var threadCount = Enviroment.ProcessorCount;
var threadContainers = new Enumerable.Range(0,threadCount).Select(_ => new List<byte>()).ToArray();
Parallel.For(0, threadCount, thread=>
 
 //review, can easily be off by one due to division
 for (int index = 0; index<( DataLength/threadCount); index++)
 
 // Random values just in this code snippet
 int config_1 = random.Next(-index, index); 
 int config_2 = random.Next(-index, index);

 if (config_1 >= 0 && config_1 < FRAME_WIDTH && config_2 >= 0 && config_2 < FRAME_HEIGHT)
 
 //this *may* be threadsafe, but worth reviewing to be sure.
 threadContainers[thread].AddRange(BitConverter.GetBytes(config_1));
 threadContainers[thread].AddRange(BitConverter.GetBytes(config_2));
 Interlocked.Increment(ref validDataCount);
 
 
);
var container = containers.SelectMany(x=>x).ToList(); 
container.AddRange(BitConverter.GetBytes(validDataCount));

answered Jan 19 at 14:24

NPSF3000

60539

answered Jan 19 at 14:24

NPSF3000

60539

answered Jan 19 at 14:24

NPSF3000

60539

answered Jan 19 at 14:24

NPSF3000

60539

Amazing. However, I am not able to understand it completely. Before asking my query, let me first explain the real scenario. The line int config_1 = random.Next(-index, index) is int config_1 = sensor_data(index) in the real code, where the length of sensor_data is DataLength. In order to get all int config_1, I iterate over complete DataLength. Hence 0 < index < DATA_LENGTH. Ques: I see that in your suggestion 0 < index < (DATA_LENGTH/threadCount). So I suspect that instead of looking at complete data, it is looking only over (DATA_LENGTH/threadCount) portion of data.
â€“Â Ravi Joshi
Jan 19 at 15:11

Updated the code in question for better understanding. Please have a look. Apologies for the inconvenience. Thank you again.
â€“Â Ravi Joshi
Jan 19 at 16:06

The exact chunking logic is situation dependent - for example in the demo while the count of the loops was important, the order is next. You may find in reality the order and position is important - leading to something like: for (int index = (length/threadcount)*thread; index<(length/threadcount)*(thread+1); index++) (please doublecheck, esp on unusual values). What I often do is write a Method that takes in array, and splits it into n chunks to make life simpler e.g. stackoverflow.com/questions/419019/â€¦
â€“Â NPSF3000
Jan 19 at 16:11

make sense. i will check the values and get back to you later. thank you very much.
â€“Â Ravi Joshi
Jan 19 at 16:31

add a commentÂ |Â

Amazing. However, I am not able to understand it completely. Before asking my query, let me first explain the real scenario. The line int config_1 = random.Next(-index, index) is int config_1 = sensor_data(index) in the real code, where the length of sensor_data is DataLength. In order to get all int config_1, I iterate over complete DataLength. Hence 0 < index < DATA_LENGTH. Ques: I see that in your suggestion 0 < index < (DATA_LENGTH/threadCount). So I suspect that instead of looking at complete data, it is looking only over (DATA_LENGTH/threadCount) portion of data.
â€“Â Ravi Joshi
Jan 19 at 15:11

Updated the code in question for better understanding. Please have a look. Apologies for the inconvenience. Thank you again.
â€“Â Ravi Joshi
Jan 19 at 16:06

The exact chunking logic is situation dependent - for example in the demo while the count of the loops was important, the order is next. You may find in reality the order and position is important - leading to something like: for (int index = (length/threadcount)*thread; index<(length/threadcount)*(thread+1); index++) (please doublecheck, esp on unusual values). What I often do is write a Method that takes in array, and splits it into n chunks to make life simpler e.g. stackoverflow.com/questions/419019/â€¦
â€“Â NPSF3000
Jan 19 at 16:11

make sense. i will check the values and get back to you later. thank you very much.
â€“Â Ravi Joshi
Jan 19 at 16:31

Amazing. However, I am not able to understand it completely. Before asking my query, let me first explain the real scenario. The line int config_1 = random.Next(-index, index) is int config_1 = sensor_data(index) in the real code, where the length of sensor_data is DataLength. In order to get all int config_1, I iterate over complete DataLength. Hence 0 < index < DATA_LENGTH. Ques: I see that in your suggestion 0 < index < (DATA_LENGTH/threadCount). So I suspect that instead of looking at complete data, it is looking only over (DATA_LENGTH/threadCount) portion of data.
â€“Â Ravi Joshi
Jan 19 at 15:11

Updated the code in question for better understanding. Please have a look. Apologies for the inconvenience. Thank you again.
â€“Â Ravi Joshi
Jan 19 at 16:06

The exact chunking logic is situation dependent - for example in the demo while the count of the loops was important, the order is next. You may find in reality the order and position is important - leading to something like: for (int index = (length/threadcount)*thread; index<(length/threadcount)*(thread+1); index++) (please doublecheck, esp on unusual values). What I often do is write a Method that takes in array, and splits it into n chunks to make life simpler e.g. stackoverflow.com/questions/419019/â€¦
â€“Â NPSF3000
Jan 19 at 16:11

make sense. i will check the values and get back to you later. thank you very much.
â€“Â Ravi Joshi
Jan 19 at 16:31

add a commentÂ |Â

搜尋此網誌

trjhtr