Build clusters from connections (virus propagation)
Clash Royale CLAN TAG#URR8PPP
.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty margin-bottom:0;
up vote
8
down vote
favorite
The scenario is the following: There is a list of persons and places where those persons visit. Each person is connected to a single or multiple places. See the picture below, where O[i] represents objects (persons) and P[i] places, respectively.
I need to get the list of places connected by those persons.
Codes:
class Program
static void Main(string args)
var Connections = new List<Tuple<string, List<string>>>()
new Tuple<string, List<string>>("O1", new List<string>() "P1", "P2" ),
new Tuple<string, List<string>>("O2", new List<string>() "P2", "P3", "P4" ),
new Tuple<string, List<string>>("O3", new List<string>() "P8" ),
new Tuple<string, List<string>>("O4", new List<string>() "P5", "P11" ),
new Tuple<string, List<string>>("O5", new List<string>() "P2", "P3" ),
new Tuple<string, List<string>>("O6", new List<string>() "P4"),
new Tuple<string, List<string>>("O7", new List<string>() "P7", "P10" ),
new Tuple<string, List<string>>("O8", new List<string>() "P5", "P6", "P7", "P12" ),
new Tuple<string, List<string>>("O9", new List<string>() "P7", "P11" ),
new Tuple<string, List<string>>("O10", new List<string>() "P13" ),
new Tuple<string, List<string>>("O11", new List<string>() "P8", "P9" ),
;
var Clusters = Connections.Select(x => x.Item2).ToList();
bool proceed = true;
do
var pairs = Clusters.SelectMany(x => Clusters, (x, y) =>
Tuple.Create(x, y)).Where(x => x.Item1 != x.Item2);
proceed = false;
foreach (var pair in pairs)
var cluster1 = pair.Item1;
var cluster2 = pair.Item2;
if (cluster1.Intersect(cluster2).Any())
cluster1.AddRange(cluster2.Where(x => !cluster1.Contains(x)));
Clusters.Remove(cluster2);
proceed = true;
break;
while (proceed);
foreach (var cluster in Clusters.OrderByDescending(x => x.Count()))
Console.WriteLine(cluster.OrderBy(x => x).Aggregate((i, j) => i + ", " + j));
Console.Read();
Output:
P10, P11, P12, P5, P6, P7
P1, P2, P3, P4
P8, P9
P13
Questions:
Is this optimal way to solve the given problem for larger data?
How can I estimate the computational complexity of those operations?
Am I doing here something potentially dangerous or excessive with the collections?
Any other concerns or remarks?
Any feedback would be appreciated.
c# performance linq collections
add a comment |Â
up vote
8
down vote
favorite
The scenario is the following: There is a list of persons and places where those persons visit. Each person is connected to a single or multiple places. See the picture below, where O[i] represents objects (persons) and P[i] places, respectively.
I need to get the list of places connected by those persons.
Codes:
class Program
static void Main(string args)
var Connections = new List<Tuple<string, List<string>>>()
new Tuple<string, List<string>>("O1", new List<string>() "P1", "P2" ),
new Tuple<string, List<string>>("O2", new List<string>() "P2", "P3", "P4" ),
new Tuple<string, List<string>>("O3", new List<string>() "P8" ),
new Tuple<string, List<string>>("O4", new List<string>() "P5", "P11" ),
new Tuple<string, List<string>>("O5", new List<string>() "P2", "P3" ),
new Tuple<string, List<string>>("O6", new List<string>() "P4"),
new Tuple<string, List<string>>("O7", new List<string>() "P7", "P10" ),
new Tuple<string, List<string>>("O8", new List<string>() "P5", "P6", "P7", "P12" ),
new Tuple<string, List<string>>("O9", new List<string>() "P7", "P11" ),
new Tuple<string, List<string>>("O10", new List<string>() "P13" ),
new Tuple<string, List<string>>("O11", new List<string>() "P8", "P9" ),
;
var Clusters = Connections.Select(x => x.Item2).ToList();
bool proceed = true;
do
var pairs = Clusters.SelectMany(x => Clusters, (x, y) =>
Tuple.Create(x, y)).Where(x => x.Item1 != x.Item2);
proceed = false;
foreach (var pair in pairs)
var cluster1 = pair.Item1;
var cluster2 = pair.Item2;
if (cluster1.Intersect(cluster2).Any())
cluster1.AddRange(cluster2.Where(x => !cluster1.Contains(x)));
Clusters.Remove(cluster2);
proceed = true;
break;
while (proceed);
foreach (var cluster in Clusters.OrderByDescending(x => x.Count()))
Console.WriteLine(cluster.OrderBy(x => x).Aggregate((i, j) => i + ", " + j));
Console.Read();
Output:
P10, P11, P12, P5, P6, P7
P1, P2, P3, P4
P8, P9
P13
Questions:
Is this optimal way to solve the given problem for larger data?
How can I estimate the computational complexity of those operations?
Am I doing here something potentially dangerous or excessive with the collections?
Any other concerns or remarks?
Any feedback would be appreciated.
c# performance linq collections
Please do not update the code in your question to incorporate feedback from answers, doing so goes against the Question + Answer style of Code Review. This is not a forum where you should keep the most updated version in your question. Please see what you may and may not do after receiving answers.
â Vogel612â¦
Jun 29 at 10:54
As Vogel612 says, if you want to ask for futher help, you'll need a different question, possibly on a different Stack Exchange site. However, I did link to a Disjoints Set implementation I wrote as part of my answer (which is here) (which you are welcome to use, but I can't promise it's perfect). There is also mathblog.dk/disjoint-set-data-structure which is linked from the Wikipedia article.
â VisualMelon
Jun 29 at 12:46
add a comment |Â
up vote
8
down vote
favorite
up vote
8
down vote
favorite
The scenario is the following: There is a list of persons and places where those persons visit. Each person is connected to a single or multiple places. See the picture below, where O[i] represents objects (persons) and P[i] places, respectively.
I need to get the list of places connected by those persons.
Codes:
class Program
static void Main(string args)
var Connections = new List<Tuple<string, List<string>>>()
new Tuple<string, List<string>>("O1", new List<string>() "P1", "P2" ),
new Tuple<string, List<string>>("O2", new List<string>() "P2", "P3", "P4" ),
new Tuple<string, List<string>>("O3", new List<string>() "P8" ),
new Tuple<string, List<string>>("O4", new List<string>() "P5", "P11" ),
new Tuple<string, List<string>>("O5", new List<string>() "P2", "P3" ),
new Tuple<string, List<string>>("O6", new List<string>() "P4"),
new Tuple<string, List<string>>("O7", new List<string>() "P7", "P10" ),
new Tuple<string, List<string>>("O8", new List<string>() "P5", "P6", "P7", "P12" ),
new Tuple<string, List<string>>("O9", new List<string>() "P7", "P11" ),
new Tuple<string, List<string>>("O10", new List<string>() "P13" ),
new Tuple<string, List<string>>("O11", new List<string>() "P8", "P9" ),
;
var Clusters = Connections.Select(x => x.Item2).ToList();
bool proceed = true;
do
var pairs = Clusters.SelectMany(x => Clusters, (x, y) =>
Tuple.Create(x, y)).Where(x => x.Item1 != x.Item2);
proceed = false;
foreach (var pair in pairs)
var cluster1 = pair.Item1;
var cluster2 = pair.Item2;
if (cluster1.Intersect(cluster2).Any())
cluster1.AddRange(cluster2.Where(x => !cluster1.Contains(x)));
Clusters.Remove(cluster2);
proceed = true;
break;
while (proceed);
foreach (var cluster in Clusters.OrderByDescending(x => x.Count()))
Console.WriteLine(cluster.OrderBy(x => x).Aggregate((i, j) => i + ", " + j));
Console.Read();
Output:
P10, P11, P12, P5, P6, P7
P1, P2, P3, P4
P8, P9
P13
Questions:
Is this optimal way to solve the given problem for larger data?
How can I estimate the computational complexity of those operations?
Am I doing here something potentially dangerous or excessive with the collections?
Any other concerns or remarks?
Any feedback would be appreciated.
c# performance linq collections
The scenario is the following: There is a list of persons and places where those persons visit. Each person is connected to a single or multiple places. See the picture below, where O[i] represents objects (persons) and P[i] places, respectively.
I need to get the list of places connected by those persons.
Codes:
class Program
static void Main(string args)
var Connections = new List<Tuple<string, List<string>>>()
new Tuple<string, List<string>>("O1", new List<string>() "P1", "P2" ),
new Tuple<string, List<string>>("O2", new List<string>() "P2", "P3", "P4" ),
new Tuple<string, List<string>>("O3", new List<string>() "P8" ),
new Tuple<string, List<string>>("O4", new List<string>() "P5", "P11" ),
new Tuple<string, List<string>>("O5", new List<string>() "P2", "P3" ),
new Tuple<string, List<string>>("O6", new List<string>() "P4"),
new Tuple<string, List<string>>("O7", new List<string>() "P7", "P10" ),
new Tuple<string, List<string>>("O8", new List<string>() "P5", "P6", "P7", "P12" ),
new Tuple<string, List<string>>("O9", new List<string>() "P7", "P11" ),
new Tuple<string, List<string>>("O10", new List<string>() "P13" ),
new Tuple<string, List<string>>("O11", new List<string>() "P8", "P9" ),
;
var Clusters = Connections.Select(x => x.Item2).ToList();
bool proceed = true;
do
var pairs = Clusters.SelectMany(x => Clusters, (x, y) =>
Tuple.Create(x, y)).Where(x => x.Item1 != x.Item2);
proceed = false;
foreach (var pair in pairs)
var cluster1 = pair.Item1;
var cluster2 = pair.Item2;
if (cluster1.Intersect(cluster2).Any())
cluster1.AddRange(cluster2.Where(x => !cluster1.Contains(x)));
Clusters.Remove(cluster2);
proceed = true;
break;
while (proceed);
foreach (var cluster in Clusters.OrderByDescending(x => x.Count()))
Console.WriteLine(cluster.OrderBy(x => x).Aggregate((i, j) => i + ", " + j));
Console.Read();
Output:
P10, P11, P12, P5, P6, P7
P1, P2, P3, P4
P8, P9
P13
Questions:
Is this optimal way to solve the given problem for larger data?
How can I estimate the computational complexity of those operations?
Am I doing here something potentially dangerous or excessive with the collections?
Any other concerns or remarks?
Any feedback would be appreciated.
c# performance linq collections
edited Jun 29 at 10:53
Vogel612â¦
20.9k345124
20.9k345124
asked Jun 25 at 0:03
L_J
1837
1837
Please do not update the code in your question to incorporate feedback from answers, doing so goes against the Question + Answer style of Code Review. This is not a forum where you should keep the most updated version in your question. Please see what you may and may not do after receiving answers.
â Vogel612â¦
Jun 29 at 10:54
As Vogel612 says, if you want to ask for futher help, you'll need a different question, possibly on a different Stack Exchange site. However, I did link to a Disjoints Set implementation I wrote as part of my answer (which is here) (which you are welcome to use, but I can't promise it's perfect). There is also mathblog.dk/disjoint-set-data-structure which is linked from the Wikipedia article.
â VisualMelon
Jun 29 at 12:46
add a comment |Â
Please do not update the code in your question to incorporate feedback from answers, doing so goes against the Question + Answer style of Code Review. This is not a forum where you should keep the most updated version in your question. Please see what you may and may not do after receiving answers.
â Vogel612â¦
Jun 29 at 10:54
As Vogel612 says, if you want to ask for futher help, you'll need a different question, possibly on a different Stack Exchange site. However, I did link to a Disjoints Set implementation I wrote as part of my answer (which is here) (which you are welcome to use, but I can't promise it's perfect). There is also mathblog.dk/disjoint-set-data-structure which is linked from the Wikipedia article.
â VisualMelon
Jun 29 at 12:46
Please do not update the code in your question to incorporate feedback from answers, doing so goes against the Question + Answer style of Code Review. This is not a forum where you should keep the most updated version in your question. Please see what you may and may not do after receiving answers.
â Vogel612â¦
Jun 29 at 10:54
Please do not update the code in your question to incorporate feedback from answers, doing so goes against the Question + Answer style of Code Review. This is not a forum where you should keep the most updated version in your question. Please see what you may and may not do after receiving answers.
â Vogel612â¦
Jun 29 at 10:54
As Vogel612 says, if you want to ask for futher help, you'll need a different question, possibly on a different Stack Exchange site. However, I did link to a Disjoints Set implementation I wrote as part of my answer (which is here) (which you are welcome to use, but I can't promise it's perfect). There is also mathblog.dk/disjoint-set-data-structure which is linked from the Wikipedia article.
â VisualMelon
Jun 29 at 12:46
As Vogel612 says, if you want to ask for futher help, you'll need a different question, possibly on a different Stack Exchange site. However, I did link to a Disjoints Set implementation I wrote as part of my answer (which is here) (which you are welcome to use, but I can't promise it's perfect). There is also mathblog.dk/disjoint-set-data-structure which is linked from the Wikipedia article.
â VisualMelon
Jun 29 at 12:46
add a comment |Â
3 Answers
3
active
oldest
votes
up vote
5
down vote
accepted
I always feel odd reviewing code like this, because there is no API. You should really pull out the clustering code into it's own method, with a nice signature, something like:
IEnumerable<IReadOnlyCollection<string>> Cluster(IEnumerable<Person> people);
This will make it reusable (and so testable).
You ask "Am I doing here something potentially dangerous or excessive with the collections?". I'm not entirely sure what you mean... but I think no. Collections are efficient, and it's hard to do anything dangerous in C# by accident. If anything, you should probably be using more powerful collections.
Tuples
What is Tuple<string, List<string>>
? What are Item1
and Item2
? Give it a meaningful type, with meaningful member names!
public class Person
public Person(string name, IReadOnlyCollection<string> places)
Name = name;
Places = places;
public string Name get;
public IReadOnlyCollection<string> Places get;
Complexity
Never been much good at complexity... but here's an attempt at answering your second question. Take it with spoonful of salt. Hopefully someone will poke holes in it promptly.
First you need to identify the parameters you care about. Let's say n
for the number of people and 'b' number of places. You iterate over pairs of people, which gives you n^2
(just to enumerate the pairs).
Inside that loop you are looking for an intersection between 2 sets of up to b
elements, which can be no better than O(b)
(that's the case where the hashes turn out perfectly, assuming Intersect
converts one of the lists to a HashSet
internally). cluster1.AddRange(cluster2.Where(x => !cluster1.Contains(x)))
is O(b^2)
. You ought instead be using HashSet
s from the outset, which will get the inner working down to something more like O(b)
again (i.e. HashSet<T>.UnionWith(HashSet<T>)
is kind of just about linear most of the time). ... but then that matching can only happen n
times... so it's a separate concern.
Then there is the outer most loop... which I don't want to think about... but in the worst case it will be run n
times, giving you a worst-case complexity of O(n*n*n*b + n*n*b*b)
, or just about O(n*n*n*b)
if you use HashSets.
I can't think how the space complexity of your code could be anything other than the input space complexity of O(b + n)
.
Technically, your printing out at the end has third order time complexity: O(m*k*k)
in number of clusters m
, and number of places per cluster k
. Consider using a StringBuilder
or string.Join
instead of Aggregate
and repeated concatenation (that will make it O(m*k)
).
Disjoint Sets
Any problem like this I would solve with disjoint sets, and even a poor implementation should get you O(n*b)
time-complexity (max tree-depth is O(n)
) on this problem.
The basic idea is that you maintain a table pointers which describe a set of trees, each tree being an independent (disjoint) set. You can follow the pointers from any member of a set to the root member, and by assigning the parent of one root node to the root of another you join the two sets together. You can throw an implementation together in no time, with very little code.
Some very unscientific benchmarking suggests that a somewhat untested disjoint sets implementation I just threw together is a couple of orders of magnitude faster than your implementation (using HashSets and with reduced Pair count, as suggested in Paparazzo's answer) on randomly generated problems. Your implementation is taking over 10seconds on my machine to solve a problem with b=4000
, which the disjoint sets solve in a few milliseconds. The disjoint set version seems to be linear in n*b
(easier to scale than just b
in random problems); yours is taking a while for larger problems larger than n*b=5000
, and I wouldn't wish to comment on the experimental time-complexity with these few (noisy) results.
A gist of the disjoint sets implementation and cluster method is provided so that people can find juicy bugs in it: 1 bug fixed so far (2018-06-26)
add a comment |Â
up vote
3
down vote
Note that a decent amount of this is preference but I do provide reasoning for it below (I try to encourage improved code readability).
Consistency
I would make bool proceed = false;
into var proceed = false;
to keep consistent with your usage of var
.
Naming
Booleans usually have a prefix of is
, has
, will
, etc. So perhaps willProceed
is more fitting. Ultimately, it might be best to name it something that describes the scenario in which we will proceed. In this case, I believe you proceed as long has you have found intersecting pairs/clusters. So maybe hasIntersectingPairs
or hasIntersectingClusters
it's not as short, but it definitely is more clear on its intentions.
Casing
Typically you would make local variables such as Connections
and Clusters
lower-case (i.e. var connections = ...;
and var clusters = ...;
).
Unnecessary Foreach
I might be wrong, but I'm pretty sure your foreach statement can be replaced with LINQs FirstOrDefault
and an if-statement.
var cluster = pairs.FirstOrDefault(p => p.Item1.Intersect(p.Item2).Any());
if (cluster != null)
cluster.Item1.AddRange(cluster.Item2.Where(x => !cluster.Item1.Contains(x)));
clusters.Remove(cluster.Item2);
willProceed = true;
No need for a break
and it's a bit more clear what the possible paths your program can take, since your foreach loop will either find an intersecting pair and stop (via break
) or will find no intersecting pair and continue until out of pairs
.
Additionally naming it cluster
(instead of pair
) makes the naming line up with cluster1 -> cluster.Item1
and cluster2 -> cluster.Item2
without having to make unnecessary variables. If you prefer the extra variables I would then suggest you take VisualMelon's advice and make a meaningful type instead of using Tuple<>
.
do/while vs while
You declare bool proceed = true;
then use a do/while loop as if it was initialized as false, but then immediately set it to false inside the loop. The following should be a bit more straight forward and accomplish the same thing,
var hasIntersectingPairs = true;
while (hasIntersectingPairs)
hasIntersectingPairs = false;
...
Putting My Suggestions Together
static void Main(string args)
var connections = new List<Tuple<string, List<string>>>()
new Tuple<string, List<string>>("O1", new List<string>() "P1", "P2" ),
new Tuple<string, List<string>>("O2", new List<string>() "P2", "P3", "P4" ),
new Tuple<string, List<string>>("O3", new List<string>() "P8" ),
new Tuple<string, List<string>>("O4", new List<string>() "P5", "P11" ),
new Tuple<string, List<string>>("O5", new List<string>() "P2", "P3" ),
new Tuple<string, List<string>>("O6", new List<string>() "P4"),
new Tuple<string, List<string>>("O7", new List<string>() "P7", "P10" ),
new Tuple<string, List<string>>("O8", new List<string>() "P5", "P6", "P7", "P12" ),
new Tuple<string, List<string>>("O9", new List<string>() "P7", "P11" ),
new Tuple<string, List<string>>("O10", new List<string>() "P13" ),
new Tuple<string, List<string>>("O11", new List<string>() "P8", "P9" ),
;
var clusters = connections.Select(x => x.Item2).ToList();
var hasIntersectingPairs = true;
while (hasIntersectingPairs)
hasIntersectingPairs = false;
var pairs = clusters.SelectMany(x => clusters, (x, y) => Tuple.Create(x, y)).Where(x => x.Item1 != x.Item2);
var cluster = pairs.FirstOrDefault(p => p.Item1.Intersect(p.Item2).Any());
if (cluster != null)
cluster.Item1.AddRange(cluster.Item2.Where(x => !cluster.Item1.Contains(x)));
clusters.Remove(cluster.Item2);
hasIntersectingPairs = true;
foreach (var cluster in clusters.OrderByDescending(x => x.Count()))
Console.WriteLine(cluster.OrderBy(x => x).Aggregate((i, j) => i + ", " + j));
Console.Read();
Output still Matches
P10, P11, P12, P5, P6, P7
P1, P2, P3, P4
P8, P9
P13
add a comment |Â
up vote
1
down vote
Look good to me.
Not sure it matters but you are creating all pairs. (A,B) and (B,A). If you only created unique combinations you would cut the count in 1/2.
add a comment |Â
3 Answers
3
active
oldest
votes
3 Answers
3
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
5
down vote
accepted
I always feel odd reviewing code like this, because there is no API. You should really pull out the clustering code into it's own method, with a nice signature, something like:
IEnumerable<IReadOnlyCollection<string>> Cluster(IEnumerable<Person> people);
This will make it reusable (and so testable).
You ask "Am I doing here something potentially dangerous or excessive with the collections?". I'm not entirely sure what you mean... but I think no. Collections are efficient, and it's hard to do anything dangerous in C# by accident. If anything, you should probably be using more powerful collections.
Tuples
What is Tuple<string, List<string>>
? What are Item1
and Item2
? Give it a meaningful type, with meaningful member names!
public class Person
public Person(string name, IReadOnlyCollection<string> places)
Name = name;
Places = places;
public string Name get;
public IReadOnlyCollection<string> Places get;
Complexity
Never been much good at complexity... but here's an attempt at answering your second question. Take it with spoonful of salt. Hopefully someone will poke holes in it promptly.
First you need to identify the parameters you care about. Let's say n
for the number of people and 'b' number of places. You iterate over pairs of people, which gives you n^2
(just to enumerate the pairs).
Inside that loop you are looking for an intersection between 2 sets of up to b
elements, which can be no better than O(b)
(that's the case where the hashes turn out perfectly, assuming Intersect
converts one of the lists to a HashSet
internally). cluster1.AddRange(cluster2.Where(x => !cluster1.Contains(x)))
is O(b^2)
. You ought instead be using HashSet
s from the outset, which will get the inner working down to something more like O(b)
again (i.e. HashSet<T>.UnionWith(HashSet<T>)
is kind of just about linear most of the time). ... but then that matching can only happen n
times... so it's a separate concern.
Then there is the outer most loop... which I don't want to think about... but in the worst case it will be run n
times, giving you a worst-case complexity of O(n*n*n*b + n*n*b*b)
, or just about O(n*n*n*b)
if you use HashSets.
I can't think how the space complexity of your code could be anything other than the input space complexity of O(b + n)
.
Technically, your printing out at the end has third order time complexity: O(m*k*k)
in number of clusters m
, and number of places per cluster k
. Consider using a StringBuilder
or string.Join
instead of Aggregate
and repeated concatenation (that will make it O(m*k)
).
Disjoint Sets
Any problem like this I would solve with disjoint sets, and even a poor implementation should get you O(n*b)
time-complexity (max tree-depth is O(n)
) on this problem.
The basic idea is that you maintain a table pointers which describe a set of trees, each tree being an independent (disjoint) set. You can follow the pointers from any member of a set to the root member, and by assigning the parent of one root node to the root of another you join the two sets together. You can throw an implementation together in no time, with very little code.
Some very unscientific benchmarking suggests that a somewhat untested disjoint sets implementation I just threw together is a couple of orders of magnitude faster than your implementation (using HashSets and with reduced Pair count, as suggested in Paparazzo's answer) on randomly generated problems. Your implementation is taking over 10seconds on my machine to solve a problem with b=4000
, which the disjoint sets solve in a few milliseconds. The disjoint set version seems to be linear in n*b
(easier to scale than just b
in random problems); yours is taking a while for larger problems larger than n*b=5000
, and I wouldn't wish to comment on the experimental time-complexity with these few (noisy) results.
A gist of the disjoint sets implementation and cluster method is provided so that people can find juicy bugs in it: 1 bug fixed so far (2018-06-26)
add a comment |Â
up vote
5
down vote
accepted
I always feel odd reviewing code like this, because there is no API. You should really pull out the clustering code into it's own method, with a nice signature, something like:
IEnumerable<IReadOnlyCollection<string>> Cluster(IEnumerable<Person> people);
This will make it reusable (and so testable).
You ask "Am I doing here something potentially dangerous or excessive with the collections?". I'm not entirely sure what you mean... but I think no. Collections are efficient, and it's hard to do anything dangerous in C# by accident. If anything, you should probably be using more powerful collections.
Tuples
What is Tuple<string, List<string>>
? What are Item1
and Item2
? Give it a meaningful type, with meaningful member names!
public class Person
public Person(string name, IReadOnlyCollection<string> places)
Name = name;
Places = places;
public string Name get;
public IReadOnlyCollection<string> Places get;
Complexity
Never been much good at complexity... but here's an attempt at answering your second question. Take it with spoonful of salt. Hopefully someone will poke holes in it promptly.
First you need to identify the parameters you care about. Let's say n
for the number of people and 'b' number of places. You iterate over pairs of people, which gives you n^2
(just to enumerate the pairs).
Inside that loop you are looking for an intersection between 2 sets of up to b
elements, which can be no better than O(b)
(that's the case where the hashes turn out perfectly, assuming Intersect
converts one of the lists to a HashSet
internally). cluster1.AddRange(cluster2.Where(x => !cluster1.Contains(x)))
is O(b^2)
. You ought instead be using HashSet
s from the outset, which will get the inner working down to something more like O(b)
again (i.e. HashSet<T>.UnionWith(HashSet<T>)
is kind of just about linear most of the time). ... but then that matching can only happen n
times... so it's a separate concern.
Then there is the outer most loop... which I don't want to think about... but in the worst case it will be run n
times, giving you a worst-case complexity of O(n*n*n*b + n*n*b*b)
, or just about O(n*n*n*b)
if you use HashSets.
I can't think how the space complexity of your code could be anything other than the input space complexity of O(b + n)
.
Technically, your printing out at the end has third order time complexity: O(m*k*k)
in number of clusters m
, and number of places per cluster k
. Consider using a StringBuilder
or string.Join
instead of Aggregate
and repeated concatenation (that will make it O(m*k)
).
Disjoint Sets
Any problem like this I would solve with disjoint sets, and even a poor implementation should get you O(n*b)
time-complexity (max tree-depth is O(n)
) on this problem.
The basic idea is that you maintain a table pointers which describe a set of trees, each tree being an independent (disjoint) set. You can follow the pointers from any member of a set to the root member, and by assigning the parent of one root node to the root of another you join the two sets together. You can throw an implementation together in no time, with very little code.
Some very unscientific benchmarking suggests that a somewhat untested disjoint sets implementation I just threw together is a couple of orders of magnitude faster than your implementation (using HashSets and with reduced Pair count, as suggested in Paparazzo's answer) on randomly generated problems. Your implementation is taking over 10seconds on my machine to solve a problem with b=4000
, which the disjoint sets solve in a few milliseconds. The disjoint set version seems to be linear in n*b
(easier to scale than just b
in random problems); yours is taking a while for larger problems larger than n*b=5000
, and I wouldn't wish to comment on the experimental time-complexity with these few (noisy) results.
A gist of the disjoint sets implementation and cluster method is provided so that people can find juicy bugs in it: 1 bug fixed so far (2018-06-26)
add a comment |Â
up vote
5
down vote
accepted
up vote
5
down vote
accepted
I always feel odd reviewing code like this, because there is no API. You should really pull out the clustering code into it's own method, with a nice signature, something like:
IEnumerable<IReadOnlyCollection<string>> Cluster(IEnumerable<Person> people);
This will make it reusable (and so testable).
You ask "Am I doing here something potentially dangerous or excessive with the collections?". I'm not entirely sure what you mean... but I think no. Collections are efficient, and it's hard to do anything dangerous in C# by accident. If anything, you should probably be using more powerful collections.
Tuples
What is Tuple<string, List<string>>
? What are Item1
and Item2
? Give it a meaningful type, with meaningful member names!
public class Person
public Person(string name, IReadOnlyCollection<string> places)
Name = name;
Places = places;
public string Name get;
public IReadOnlyCollection<string> Places get;
Complexity
Never been much good at complexity... but here's an attempt at answering your second question. Take it with spoonful of salt. Hopefully someone will poke holes in it promptly.
First you need to identify the parameters you care about. Let's say n
for the number of people and 'b' number of places. You iterate over pairs of people, which gives you n^2
(just to enumerate the pairs).
Inside that loop you are looking for an intersection between 2 sets of up to b
elements, which can be no better than O(b)
(that's the case where the hashes turn out perfectly, assuming Intersect
converts one of the lists to a HashSet
internally). cluster1.AddRange(cluster2.Where(x => !cluster1.Contains(x)))
is O(b^2)
. You ought instead be using HashSet
s from the outset, which will get the inner working down to something more like O(b)
again (i.e. HashSet<T>.UnionWith(HashSet<T>)
is kind of just about linear most of the time). ... but then that matching can only happen n
times... so it's a separate concern.
Then there is the outer most loop... which I don't want to think about... but in the worst case it will be run n
times, giving you a worst-case complexity of O(n*n*n*b + n*n*b*b)
, or just about O(n*n*n*b)
if you use HashSets.
I can't think how the space complexity of your code could be anything other than the input space complexity of O(b + n)
.
Technically, your printing out at the end has third order time complexity: O(m*k*k)
in number of clusters m
, and number of places per cluster k
. Consider using a StringBuilder
or string.Join
instead of Aggregate
and repeated concatenation (that will make it O(m*k)
).
Disjoint Sets
Any problem like this I would solve with disjoint sets, and even a poor implementation should get you O(n*b)
time-complexity (max tree-depth is O(n)
) on this problem.
The basic idea is that you maintain a table pointers which describe a set of trees, each tree being an independent (disjoint) set. You can follow the pointers from any member of a set to the root member, and by assigning the parent of one root node to the root of another you join the two sets together. You can throw an implementation together in no time, with very little code.
Some very unscientific benchmarking suggests that a somewhat untested disjoint sets implementation I just threw together is a couple of orders of magnitude faster than your implementation (using HashSets and with reduced Pair count, as suggested in Paparazzo's answer) on randomly generated problems. Your implementation is taking over 10seconds on my machine to solve a problem with b=4000
, which the disjoint sets solve in a few milliseconds. The disjoint set version seems to be linear in n*b
(easier to scale than just b
in random problems); yours is taking a while for larger problems larger than n*b=5000
, and I wouldn't wish to comment on the experimental time-complexity with these few (noisy) results.
A gist of the disjoint sets implementation and cluster method is provided so that people can find juicy bugs in it: 1 bug fixed so far (2018-06-26)
I always feel odd reviewing code like this, because there is no API. You should really pull out the clustering code into it's own method, with a nice signature, something like:
IEnumerable<IReadOnlyCollection<string>> Cluster(IEnumerable<Person> people);
This will make it reusable (and so testable).
You ask "Am I doing here something potentially dangerous or excessive with the collections?". I'm not entirely sure what you mean... but I think no. Collections are efficient, and it's hard to do anything dangerous in C# by accident. If anything, you should probably be using more powerful collections.
Tuples
What is Tuple<string, List<string>>
? What are Item1
and Item2
? Give it a meaningful type, with meaningful member names!
public class Person
public Person(string name, IReadOnlyCollection<string> places)
Name = name;
Places = places;
public string Name get;
public IReadOnlyCollection<string> Places get;
Complexity
Never been much good at complexity... but here's an attempt at answering your second question. Take it with spoonful of salt. Hopefully someone will poke holes in it promptly.
First you need to identify the parameters you care about. Let's say n
for the number of people and 'b' number of places. You iterate over pairs of people, which gives you n^2
(just to enumerate the pairs).
Inside that loop you are looking for an intersection between 2 sets of up to b
elements, which can be no better than O(b)
(that's the case where the hashes turn out perfectly, assuming Intersect
converts one of the lists to a HashSet
internally). cluster1.AddRange(cluster2.Where(x => !cluster1.Contains(x)))
is O(b^2)
. You ought instead be using HashSet
s from the outset, which will get the inner working down to something more like O(b)
again (i.e. HashSet<T>.UnionWith(HashSet<T>)
is kind of just about linear most of the time). ... but then that matching can only happen n
times... so it's a separate concern.
Then there is the outer most loop... which I don't want to think about... but in the worst case it will be run n
times, giving you a worst-case complexity of O(n*n*n*b + n*n*b*b)
, or just about O(n*n*n*b)
if you use HashSets.
I can't think how the space complexity of your code could be anything other than the input space complexity of O(b + n)
.
Technically, your printing out at the end has third order time complexity: O(m*k*k)
in number of clusters m
, and number of places per cluster k
. Consider using a StringBuilder
or string.Join
instead of Aggregate
and repeated concatenation (that will make it O(m*k)
).
Disjoint Sets
Any problem like this I would solve with disjoint sets, and even a poor implementation should get you O(n*b)
time-complexity (max tree-depth is O(n)
) on this problem.
The basic idea is that you maintain a table pointers which describe a set of trees, each tree being an independent (disjoint) set. You can follow the pointers from any member of a set to the root member, and by assigning the parent of one root node to the root of another you join the two sets together. You can throw an implementation together in no time, with very little code.
Some very unscientific benchmarking suggests that a somewhat untested disjoint sets implementation I just threw together is a couple of orders of magnitude faster than your implementation (using HashSets and with reduced Pair count, as suggested in Paparazzo's answer) on randomly generated problems. Your implementation is taking over 10seconds on my machine to solve a problem with b=4000
, which the disjoint sets solve in a few milliseconds. The disjoint set version seems to be linear in n*b
(easier to scale than just b
in random problems); yours is taking a while for larger problems larger than n*b=5000
, and I wouldn't wish to comment on the experimental time-complexity with these few (noisy) results.
A gist of the disjoint sets implementation and cluster method is provided so that people can find juicy bugs in it: 1 bug fixed so far (2018-06-26)
edited Jun 26 at 9:59
answered Jun 25 at 17:44
VisualMelon
2,105716
2,105716
add a comment |Â
add a comment |Â
up vote
3
down vote
Note that a decent amount of this is preference but I do provide reasoning for it below (I try to encourage improved code readability).
Consistency
I would make bool proceed = false;
into var proceed = false;
to keep consistent with your usage of var
.
Naming
Booleans usually have a prefix of is
, has
, will
, etc. So perhaps willProceed
is more fitting. Ultimately, it might be best to name it something that describes the scenario in which we will proceed. In this case, I believe you proceed as long has you have found intersecting pairs/clusters. So maybe hasIntersectingPairs
or hasIntersectingClusters
it's not as short, but it definitely is more clear on its intentions.
Casing
Typically you would make local variables such as Connections
and Clusters
lower-case (i.e. var connections = ...;
and var clusters = ...;
).
Unnecessary Foreach
I might be wrong, but I'm pretty sure your foreach statement can be replaced with LINQs FirstOrDefault
and an if-statement.
var cluster = pairs.FirstOrDefault(p => p.Item1.Intersect(p.Item2).Any());
if (cluster != null)
cluster.Item1.AddRange(cluster.Item2.Where(x => !cluster.Item1.Contains(x)));
clusters.Remove(cluster.Item2);
willProceed = true;
No need for a break
and it's a bit more clear what the possible paths your program can take, since your foreach loop will either find an intersecting pair and stop (via break
) or will find no intersecting pair and continue until out of pairs
.
Additionally naming it cluster
(instead of pair
) makes the naming line up with cluster1 -> cluster.Item1
and cluster2 -> cluster.Item2
without having to make unnecessary variables. If you prefer the extra variables I would then suggest you take VisualMelon's advice and make a meaningful type instead of using Tuple<>
.
do/while vs while
You declare bool proceed = true;
then use a do/while loop as if it was initialized as false, but then immediately set it to false inside the loop. The following should be a bit more straight forward and accomplish the same thing,
var hasIntersectingPairs = true;
while (hasIntersectingPairs)
hasIntersectingPairs = false;
...
Putting My Suggestions Together
static void Main(string args)
var connections = new List<Tuple<string, List<string>>>()
new Tuple<string, List<string>>("O1", new List<string>() "P1", "P2" ),
new Tuple<string, List<string>>("O2", new List<string>() "P2", "P3", "P4" ),
new Tuple<string, List<string>>("O3", new List<string>() "P8" ),
new Tuple<string, List<string>>("O4", new List<string>() "P5", "P11" ),
new Tuple<string, List<string>>("O5", new List<string>() "P2", "P3" ),
new Tuple<string, List<string>>("O6", new List<string>() "P4"),
new Tuple<string, List<string>>("O7", new List<string>() "P7", "P10" ),
new Tuple<string, List<string>>("O8", new List<string>() "P5", "P6", "P7", "P12" ),
new Tuple<string, List<string>>("O9", new List<string>() "P7", "P11" ),
new Tuple<string, List<string>>("O10", new List<string>() "P13" ),
new Tuple<string, List<string>>("O11", new List<string>() "P8", "P9" ),
;
var clusters = connections.Select(x => x.Item2).ToList();
var hasIntersectingPairs = true;
while (hasIntersectingPairs)
hasIntersectingPairs = false;
var pairs = clusters.SelectMany(x => clusters, (x, y) => Tuple.Create(x, y)).Where(x => x.Item1 != x.Item2);
var cluster = pairs.FirstOrDefault(p => p.Item1.Intersect(p.Item2).Any());
if (cluster != null)
cluster.Item1.AddRange(cluster.Item2.Where(x => !cluster.Item1.Contains(x)));
clusters.Remove(cluster.Item2);
hasIntersectingPairs = true;
foreach (var cluster in clusters.OrderByDescending(x => x.Count()))
Console.WriteLine(cluster.OrderBy(x => x).Aggregate((i, j) => i + ", " + j));
Console.Read();
Output still Matches
P10, P11, P12, P5, P6, P7
P1, P2, P3, P4
P8, P9
P13
add a comment |Â
up vote
3
down vote
Note that a decent amount of this is preference but I do provide reasoning for it below (I try to encourage improved code readability).
Consistency
I would make bool proceed = false;
into var proceed = false;
to keep consistent with your usage of var
.
Naming
Booleans usually have a prefix of is
, has
, will
, etc. So perhaps willProceed
is more fitting. Ultimately, it might be best to name it something that describes the scenario in which we will proceed. In this case, I believe you proceed as long has you have found intersecting pairs/clusters. So maybe hasIntersectingPairs
or hasIntersectingClusters
it's not as short, but it definitely is more clear on its intentions.
Casing
Typically you would make local variables such as Connections
and Clusters
lower-case (i.e. var connections = ...;
and var clusters = ...;
).
Unnecessary Foreach
I might be wrong, but I'm pretty sure your foreach statement can be replaced with LINQs FirstOrDefault
and an if-statement.
var cluster = pairs.FirstOrDefault(p => p.Item1.Intersect(p.Item2).Any());
if (cluster != null)
cluster.Item1.AddRange(cluster.Item2.Where(x => !cluster.Item1.Contains(x)));
clusters.Remove(cluster.Item2);
willProceed = true;
No need for a break
and it's a bit more clear what the possible paths your program can take, since your foreach loop will either find an intersecting pair and stop (via break
) or will find no intersecting pair and continue until out of pairs
.
Additionally naming it cluster
(instead of pair
) makes the naming line up with cluster1 -> cluster.Item1
and cluster2 -> cluster.Item2
without having to make unnecessary variables. If you prefer the extra variables I would then suggest you take VisualMelon's advice and make a meaningful type instead of using Tuple<>
.
do/while vs while
You declare bool proceed = true;
then use a do/while loop as if it was initialized as false, but then immediately set it to false inside the loop. The following should be a bit more straight forward and accomplish the same thing,
var hasIntersectingPairs = true;
while (hasIntersectingPairs)
hasIntersectingPairs = false;
...
Putting My Suggestions Together
static void Main(string args)
var connections = new List<Tuple<string, List<string>>>()
new Tuple<string, List<string>>("O1", new List<string>() "P1", "P2" ),
new Tuple<string, List<string>>("O2", new List<string>() "P2", "P3", "P4" ),
new Tuple<string, List<string>>("O3", new List<string>() "P8" ),
new Tuple<string, List<string>>("O4", new List<string>() "P5", "P11" ),
new Tuple<string, List<string>>("O5", new List<string>() "P2", "P3" ),
new Tuple<string, List<string>>("O6", new List<string>() "P4"),
new Tuple<string, List<string>>("O7", new List<string>() "P7", "P10" ),
new Tuple<string, List<string>>("O8", new List<string>() "P5", "P6", "P7", "P12" ),
new Tuple<string, List<string>>("O9", new List<string>() "P7", "P11" ),
new Tuple<string, List<string>>("O10", new List<string>() "P13" ),
new Tuple<string, List<string>>("O11", new List<string>() "P8", "P9" ),
;
var clusters = connections.Select(x => x.Item2).ToList();
var hasIntersectingPairs = true;
while (hasIntersectingPairs)
hasIntersectingPairs = false;
var pairs = clusters.SelectMany(x => clusters, (x, y) => Tuple.Create(x, y)).Where(x => x.Item1 != x.Item2);
var cluster = pairs.FirstOrDefault(p => p.Item1.Intersect(p.Item2).Any());
if (cluster != null)
cluster.Item1.AddRange(cluster.Item2.Where(x => !cluster.Item1.Contains(x)));
clusters.Remove(cluster.Item2);
hasIntersectingPairs = true;
foreach (var cluster in clusters.OrderByDescending(x => x.Count()))
Console.WriteLine(cluster.OrderBy(x => x).Aggregate((i, j) => i + ", " + j));
Console.Read();
Output still Matches
P10, P11, P12, P5, P6, P7
P1, P2, P3, P4
P8, P9
P13
add a comment |Â
up vote
3
down vote
up vote
3
down vote
Note that a decent amount of this is preference but I do provide reasoning for it below (I try to encourage improved code readability).
Consistency
I would make bool proceed = false;
into var proceed = false;
to keep consistent with your usage of var
.
Naming
Booleans usually have a prefix of is
, has
, will
, etc. So perhaps willProceed
is more fitting. Ultimately, it might be best to name it something that describes the scenario in which we will proceed. In this case, I believe you proceed as long has you have found intersecting pairs/clusters. So maybe hasIntersectingPairs
or hasIntersectingClusters
it's not as short, but it definitely is more clear on its intentions.
Casing
Typically you would make local variables such as Connections
and Clusters
lower-case (i.e. var connections = ...;
and var clusters = ...;
).
Unnecessary Foreach
I might be wrong, but I'm pretty sure your foreach statement can be replaced with LINQs FirstOrDefault
and an if-statement.
var cluster = pairs.FirstOrDefault(p => p.Item1.Intersect(p.Item2).Any());
if (cluster != null)
cluster.Item1.AddRange(cluster.Item2.Where(x => !cluster.Item1.Contains(x)));
clusters.Remove(cluster.Item2);
willProceed = true;
No need for a break
and it's a bit more clear what the possible paths your program can take, since your foreach loop will either find an intersecting pair and stop (via break
) or will find no intersecting pair and continue until out of pairs
.
Additionally naming it cluster
(instead of pair
) makes the naming line up with cluster1 -> cluster.Item1
and cluster2 -> cluster.Item2
without having to make unnecessary variables. If you prefer the extra variables I would then suggest you take VisualMelon's advice and make a meaningful type instead of using Tuple<>
.
do/while vs while
You declare bool proceed = true;
then use a do/while loop as if it was initialized as false, but then immediately set it to false inside the loop. The following should be a bit more straight forward and accomplish the same thing,
var hasIntersectingPairs = true;
while (hasIntersectingPairs)
hasIntersectingPairs = false;
...
Putting My Suggestions Together
static void Main(string args)
var connections = new List<Tuple<string, List<string>>>()
new Tuple<string, List<string>>("O1", new List<string>() "P1", "P2" ),
new Tuple<string, List<string>>("O2", new List<string>() "P2", "P3", "P4" ),
new Tuple<string, List<string>>("O3", new List<string>() "P8" ),
new Tuple<string, List<string>>("O4", new List<string>() "P5", "P11" ),
new Tuple<string, List<string>>("O5", new List<string>() "P2", "P3" ),
new Tuple<string, List<string>>("O6", new List<string>() "P4"),
new Tuple<string, List<string>>("O7", new List<string>() "P7", "P10" ),
new Tuple<string, List<string>>("O8", new List<string>() "P5", "P6", "P7", "P12" ),
new Tuple<string, List<string>>("O9", new List<string>() "P7", "P11" ),
new Tuple<string, List<string>>("O10", new List<string>() "P13" ),
new Tuple<string, List<string>>("O11", new List<string>() "P8", "P9" ),
;
var clusters = connections.Select(x => x.Item2).ToList();
var hasIntersectingPairs = true;
while (hasIntersectingPairs)
hasIntersectingPairs = false;
var pairs = clusters.SelectMany(x => clusters, (x, y) => Tuple.Create(x, y)).Where(x => x.Item1 != x.Item2);
var cluster = pairs.FirstOrDefault(p => p.Item1.Intersect(p.Item2).Any());
if (cluster != null)
cluster.Item1.AddRange(cluster.Item2.Where(x => !cluster.Item1.Contains(x)));
clusters.Remove(cluster.Item2);
hasIntersectingPairs = true;
foreach (var cluster in clusters.OrderByDescending(x => x.Count()))
Console.WriteLine(cluster.OrderBy(x => x).Aggregate((i, j) => i + ", " + j));
Console.Read();
Output still Matches
P10, P11, P12, P5, P6, P7
P1, P2, P3, P4
P8, P9
P13
Note that a decent amount of this is preference but I do provide reasoning for it below (I try to encourage improved code readability).
Consistency
I would make bool proceed = false;
into var proceed = false;
to keep consistent with your usage of var
.
Naming
Booleans usually have a prefix of is
, has
, will
, etc. So perhaps willProceed
is more fitting. Ultimately, it might be best to name it something that describes the scenario in which we will proceed. In this case, I believe you proceed as long has you have found intersecting pairs/clusters. So maybe hasIntersectingPairs
or hasIntersectingClusters
it's not as short, but it definitely is more clear on its intentions.
Casing
Typically you would make local variables such as Connections
and Clusters
lower-case (i.e. var connections = ...;
and var clusters = ...;
).
Unnecessary Foreach
I might be wrong, but I'm pretty sure your foreach statement can be replaced with LINQs FirstOrDefault
and an if-statement.
var cluster = pairs.FirstOrDefault(p => p.Item1.Intersect(p.Item2).Any());
if (cluster != null)
cluster.Item1.AddRange(cluster.Item2.Where(x => !cluster.Item1.Contains(x)));
clusters.Remove(cluster.Item2);
willProceed = true;
No need for a break
and it's a bit more clear what the possible paths your program can take, since your foreach loop will either find an intersecting pair and stop (via break
) or will find no intersecting pair and continue until out of pairs
.
Additionally naming it cluster
(instead of pair
) makes the naming line up with cluster1 -> cluster.Item1
and cluster2 -> cluster.Item2
without having to make unnecessary variables. If you prefer the extra variables I would then suggest you take VisualMelon's advice and make a meaningful type instead of using Tuple<>
.
do/while vs while
You declare bool proceed = true;
then use a do/while loop as if it was initialized as false, but then immediately set it to false inside the loop. The following should be a bit more straight forward and accomplish the same thing,
var hasIntersectingPairs = true;
while (hasIntersectingPairs)
hasIntersectingPairs = false;
...
Putting My Suggestions Together
static void Main(string args)
var connections = new List<Tuple<string, List<string>>>()
new Tuple<string, List<string>>("O1", new List<string>() "P1", "P2" ),
new Tuple<string, List<string>>("O2", new List<string>() "P2", "P3", "P4" ),
new Tuple<string, List<string>>("O3", new List<string>() "P8" ),
new Tuple<string, List<string>>("O4", new List<string>() "P5", "P11" ),
new Tuple<string, List<string>>("O5", new List<string>() "P2", "P3" ),
new Tuple<string, List<string>>("O6", new List<string>() "P4"),
new Tuple<string, List<string>>("O7", new List<string>() "P7", "P10" ),
new Tuple<string, List<string>>("O8", new List<string>() "P5", "P6", "P7", "P12" ),
new Tuple<string, List<string>>("O9", new List<string>() "P7", "P11" ),
new Tuple<string, List<string>>("O10", new List<string>() "P13" ),
new Tuple<string, List<string>>("O11", new List<string>() "P8", "P9" ),
;
var clusters = connections.Select(x => x.Item2).ToList();
var hasIntersectingPairs = true;
while (hasIntersectingPairs)
hasIntersectingPairs = false;
var pairs = clusters.SelectMany(x => clusters, (x, y) => Tuple.Create(x, y)).Where(x => x.Item1 != x.Item2);
var cluster = pairs.FirstOrDefault(p => p.Item1.Intersect(p.Item2).Any());
if (cluster != null)
cluster.Item1.AddRange(cluster.Item2.Where(x => !cluster.Item1.Contains(x)));
clusters.Remove(cluster.Item2);
hasIntersectingPairs = true;
foreach (var cluster in clusters.OrderByDescending(x => x.Count()))
Console.WriteLine(cluster.OrderBy(x => x).Aggregate((i, j) => i + ", " + j));
Console.Read();
Output still Matches
P10, P11, P12, P5, P6, P7
P1, P2, P3, P4
P8, P9
P13
answered Jun 25 at 18:20
Shelby115
1,354516
1,354516
add a comment |Â
add a comment |Â
up vote
1
down vote
Look good to me.
Not sure it matters but you are creating all pairs. (A,B) and (B,A). If you only created unique combinations you would cut the count in 1/2.
add a comment |Â
up vote
1
down vote
Look good to me.
Not sure it matters but you are creating all pairs. (A,B) and (B,A). If you only created unique combinations you would cut the count in 1/2.
add a comment |Â
up vote
1
down vote
up vote
1
down vote
Look good to me.
Not sure it matters but you are creating all pairs. (A,B) and (B,A). If you only created unique combinations you would cut the count in 1/2.
Look good to me.
Not sure it matters but you are creating all pairs. (A,B) and (B,A). If you only created unique combinations you would cut the count in 1/2.
edited Jun 25 at 15:58
answered Jun 25 at 13:23
paparazzo
4,8131730
4,8131730
add a comment |Â
add a comment |Â
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f197189%2fbuild-clusters-from-connections-virus-propagation%23new-answer', 'question_page');
);
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Please do not update the code in your question to incorporate feedback from answers, doing so goes against the Question + Answer style of Code Review. This is not a forum where you should keep the most updated version in your question. Please see what you may and may not do after receiving answers.
â Vogel612â¦
Jun 29 at 10:54
As Vogel612 says, if you want to ask for futher help, you'll need a different question, possibly on a different Stack Exchange site. However, I did link to a Disjoints Set implementation I wrote as part of my answer (which is here) (which you are welcome to use, but I can't promise it's perfect). There is also mathblog.dk/disjoint-set-data-structure which is linked from the Wikipedia article.
â VisualMelon
Jun 29 at 12:46