Build clusters from connections (virus propagation)

.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty margin-bottom:0;

up vote
8
down vote

favorite

The scenario is the following: There is a list of persons and places where those persons visit. Each person is connected to a single or multiple places. See the picture below, where O[i] represents objects (persons) and P[i] places, respectively.

I need to get the list of places connected by those persons.

Here **O** represents objects (persons) and **P** places, respectively

Codes:

class Program

 static void Main(string args)
 
 var Connections = new List<Tuple<string, List<string>>>()
 
 new Tuple<string, List<string>>("O1", new List<string>() "P1", "P2" ),
 new Tuple<string, List<string>>("O2", new List<string>() "P2", "P3", "P4" ),
 new Tuple<string, List<string>>("O3", new List<string>() "P8" ),
 new Tuple<string, List<string>>("O4", new List<string>() "P5", "P11" ),
 new Tuple<string, List<string>>("O5", new List<string>() "P2", "P3" ),
 new Tuple<string, List<string>>("O6", new List<string>() "P4"),
 new Tuple<string, List<string>>("O7", new List<string>() "P7", "P10" ),
 new Tuple<string, List<string>>("O8", new List<string>() "P5", "P6", "P7", "P12" ),
 new Tuple<string, List<string>>("O9", new List<string>() "P7", "P11" ),
 new Tuple<string, List<string>>("O10", new List<string>() "P13" ),
 new Tuple<string, List<string>>("O11", new List<string>() "P8", "P9" ),
 ;

 var Clusters = Connections.Select(x => x.Item2).ToList();

 bool proceed = true;
 do
 
 var pairs = Clusters.SelectMany(x => Clusters, (x, y) => 
 Tuple.Create(x, y)).Where(x => x.Item1 != x.Item2);

 proceed = false;
 foreach (var pair in pairs)
 
 var cluster1 = pair.Item1;
 var cluster2 = pair.Item2;

 if (cluster1.Intersect(cluster2).Any())
 
 cluster1.AddRange(cluster2.Where(x => !cluster1.Contains(x)));
 Clusters.Remove(cluster2);
 proceed = true;
 break;
 
 
 while (proceed);

 foreach (var cluster in Clusters.OrderByDescending(x => x.Count()))
 
 Console.WriteLine(cluster.OrderBy(x => x).Aggregate((i, j) => i + ", " + j));
 
 Console.Read();

Output:

P10, P11, P12, P5, P6, P7
P1, P2, P3, P4
P8, P9
P13

Questions:

Is this optimal way to solve the given problem for larger data?

How can I estimate the computational complexity of those operations?

Am I doing here something potentially dangerous or excessive with the collections?

Any other concerns or remarks?

Any feedback would be appreciated.

edited Jun 29 at 10:53

Vogel612â™¦

20.9k345124

asked Jun 25 at 0:03

L_J

1837

Please do not update the code in your question to incorporate feedback from answers, doing so goes against the Question + Answer style of Code Review. This is not a forum where you should keep the most updated version in your question. Please see what you may and may not do after receiving answers.
â€“Â Vogel612â™¦
Jun 29 at 10:54

As Vogel612 says, if you want to ask for futher help, you'll need a different question, possibly on a different Stack Exchange site. However, I did link to a Disjoints Set implementation I wrote as part of my answer (which is here) (which you are welcome to use, but I can't promise it's perfect). There is also mathblog.dk/disjoint-set-data-structure which is linked from the Wikipedia article.
â€“Â VisualMelon
Jun 29 at 12:46

add a commentÂ |Â

up vote
8
down vote

favorite

I need to get the list of places connected by those persons.

Here **O** represents objects (persons) and **P** places, respectively

Codes:

class Program

 static void Main(string args)
 
 var Connections = new List<Tuple<string, List<string>>>()
 
 new Tuple<string, List<string>>("O1", new List<string>() "P1", "P2" ),
 new Tuple<string, List<string>>("O2", new List<string>() "P2", "P3", "P4" ),
 new Tuple<string, List<string>>("O3", new List<string>() "P8" ),
 new Tuple<string, List<string>>("O4", new List<string>() "P5", "P11" ),
 new Tuple<string, List<string>>("O5", new List<string>() "P2", "P3" ),
 new Tuple<string, List<string>>("O6", new List<string>() "P4"),
 new Tuple<string, List<string>>("O7", new List<string>() "P7", "P10" ),
 new Tuple<string, List<string>>("O8", new List<string>() "P5", "P6", "P7", "P12" ),
 new Tuple<string, List<string>>("O9", new List<string>() "P7", "P11" ),
 new Tuple<string, List<string>>("O10", new List<string>() "P13" ),
 new Tuple<string, List<string>>("O11", new List<string>() "P8", "P9" ),
 ;

 var Clusters = Connections.Select(x => x.Item2).ToList();

 bool proceed = true;
 do
 
 var pairs = Clusters.SelectMany(x => Clusters, (x, y) => 
 Tuple.Create(x, y)).Where(x => x.Item1 != x.Item2);

 proceed = false;
 foreach (var pair in pairs)
 
 var cluster1 = pair.Item1;
 var cluster2 = pair.Item2;

 if (cluster1.Intersect(cluster2).Any())
 
 cluster1.AddRange(cluster2.Where(x => !cluster1.Contains(x)));
 Clusters.Remove(cluster2);
 proceed = true;
 break;
 
 
 while (proceed);

 foreach (var cluster in Clusters.OrderByDescending(x => x.Count()))
 
 Console.WriteLine(cluster.OrderBy(x => x).Aggregate((i, j) => i + ", " + j));
 
 Console.Read();

Output:

P10, P11, P12, P5, P6, P7
P1, P2, P3, P4
P8, P9
P13

Questions:

Is this optimal way to solve the given problem for larger data?

How can I estimate the computational complexity of those operations?

Am I doing here something potentially dangerous or excessive with the collections?

Any other concerns or remarks?

Any feedback would be appreciated.

edited Jun 29 at 10:53

Vogel612â™¦

20.9k345124

asked Jun 25 at 0:03

L_J

1837

Please do not update the code in your question to incorporate feedback from answers, doing so goes against the Question + Answer style of Code Review. This is not a forum where you should keep the most updated version in your question. Please see what you may and may not do after receiving answers.
â€“Â Vogel612â™¦
Jun 29 at 10:54

As Vogel612 says, if you want to ask for futher help, you'll need a different question, possibly on a different Stack Exchange site. However, I did link to a Disjoints Set implementation I wrote as part of my answer (which is here) (which you are welcome to use, but I can't promise it's perfect). There is also mathblog.dk/disjoint-set-data-structure which is linked from the Wikipedia article.
â€“Â VisualMelon
Jun 29 at 12:46

add a commentÂ |Â

up vote
8
down vote

favorite

I need to get the list of places connected by those persons.

Here **O** represents objects (persons) and **P** places, respectively

Codes:

class Program

 static void Main(string args)
 
 var Connections = new List<Tuple<string, List<string>>>()
 
 new Tuple<string, List<string>>("O1", new List<string>() "P1", "P2" ),
 new Tuple<string, List<string>>("O2", new List<string>() "P2", "P3", "P4" ),
 new Tuple<string, List<string>>("O3", new List<string>() "P8" ),
 new Tuple<string, List<string>>("O4", new List<string>() "P5", "P11" ),
 new Tuple<string, List<string>>("O5", new List<string>() "P2", "P3" ),
 new Tuple<string, List<string>>("O6", new List<string>() "P4"),
 new Tuple<string, List<string>>("O7", new List<string>() "P7", "P10" ),
 new Tuple<string, List<string>>("O8", new List<string>() "P5", "P6", "P7", "P12" ),
 new Tuple<string, List<string>>("O9", new List<string>() "P7", "P11" ),
 new Tuple<string, List<string>>("O10", new List<string>() "P13" ),
 new Tuple<string, List<string>>("O11", new List<string>() "P8", "P9" ),
 ;

 var Clusters = Connections.Select(x => x.Item2).ToList();

 bool proceed = true;
 do
 
 var pairs = Clusters.SelectMany(x => Clusters, (x, y) => 
 Tuple.Create(x, y)).Where(x => x.Item1 != x.Item2);

 proceed = false;
 foreach (var pair in pairs)
 
 var cluster1 = pair.Item1;
 var cluster2 = pair.Item2;

 if (cluster1.Intersect(cluster2).Any())
 
 cluster1.AddRange(cluster2.Where(x => !cluster1.Contains(x)));
 Clusters.Remove(cluster2);
 proceed = true;
 break;
 
 
 while (proceed);

 foreach (var cluster in Clusters.OrderByDescending(x => x.Count()))
 
 Console.WriteLine(cluster.OrderBy(x => x).Aggregate((i, j) => i + ", " + j));
 
 Console.Read();

Output:

P10, P11, P12, P5, P6, P7
P1, P2, P3, P4
P8, P9
P13

Questions:

Is this optimal way to solve the given problem for larger data?

How can I estimate the computational complexity of those operations?

Am I doing here something potentially dangerous or excessive with the collections?

Any other concerns or remarks?

Any feedback would be appreciated.

edited Jun 29 at 10:53

Vogel612â™¦

20.9k345124

asked Jun 25 at 0:03

L_J

1837

I need to get the list of places connected by those persons.

Here **O** represents objects (persons) and **P** places, respectively

Codes:

class Program

 static void Main(string args)
 
 var Connections = new List<Tuple<string, List<string>>>()
 
 new Tuple<string, List<string>>("O1", new List<string>() "P1", "P2" ),
 new Tuple<string, List<string>>("O2", new List<string>() "P2", "P3", "P4" ),
 new Tuple<string, List<string>>("O3", new List<string>() "P8" ),
 new Tuple<string, List<string>>("O4", new List<string>() "P5", "P11" ),
 new Tuple<string, List<string>>("O5", new List<string>() "P2", "P3" ),
 new Tuple<string, List<string>>("O6", new List<string>() "P4"),
 new Tuple<string, List<string>>("O7", new List<string>() "P7", "P10" ),
 new Tuple<string, List<string>>("O8", new List<string>() "P5", "P6", "P7", "P12" ),
 new Tuple<string, List<string>>("O9", new List<string>() "P7", "P11" ),
 new Tuple<string, List<string>>("O10", new List<string>() "P13" ),
 new Tuple<string, List<string>>("O11", new List<string>() "P8", "P9" ),
 ;

 var Clusters = Connections.Select(x => x.Item2).ToList();

 bool proceed = true;
 do
 
 var pairs = Clusters.SelectMany(x => Clusters, (x, y) => 
 Tuple.Create(x, y)).Where(x => x.Item1 != x.Item2);

 proceed = false;
 foreach (var pair in pairs)
 
 var cluster1 = pair.Item1;
 var cluster2 = pair.Item2;

 if (cluster1.Intersect(cluster2).Any())
 
 cluster1.AddRange(cluster2.Where(x => !cluster1.Contains(x)));
 Clusters.Remove(cluster2);
 proceed = true;
 break;
 
 
 while (proceed);

 foreach (var cluster in Clusters.OrderByDescending(x => x.Count()))
 
 Console.WriteLine(cluster.OrderBy(x => x).Aggregate((i, j) => i + ", " + j));
 
 Console.Read();

Output:

P10, P11, P12, P5, P6, P7
P1, P2, P3, P4
P8, P9
P13

Questions:

Is this optimal way to solve the given problem for larger data?

How can I estimate the computational complexity of those operations?

Am I doing here something potentially dangerous or excessive with the collections?

Any other concerns or remarks?

Any feedback would be appreciated.

edited Jun 29 at 10:53

Vogel612â™¦

20.9k345124

asked Jun 25 at 0:03

L_J

1837

edited Jun 29 at 10:53

Vogel612â™¦

20.9k345124

edited Jun 29 at 10:53

Vogel612â™¦

20.9k345124

edited Jun 29 at 10:53

Vogel612â™¦

20.9k345124

asked Jun 25 at 0:03

L_J

1837

asked Jun 25 at 0:03

L_J

1837

asked Jun 25 at 0:03

L_J

1837

Please do not update the code in your question to incorporate feedback from answers, doing so goes against the Question + Answer style of Code Review. This is not a forum where you should keep the most updated version in your question. Please see what you may and may not do after receiving answers.
â€“Â Vogel612â™¦
Jun 29 at 10:54

As Vogel612 says, if you want to ask for futher help, you'll need a different question, possibly on a different Stack Exchange site. However, I did link to a Disjoints Set implementation I wrote as part of my answer (which is here) (which you are welcome to use, but I can't promise it's perfect). There is also mathblog.dk/disjoint-set-data-structure which is linked from the Wikipedia article.
â€“Â VisualMelon
Jun 29 at 12:46

add a commentÂ |Â

Please do not update the code in your question to incorporate feedback from answers, doing so goes against the Question + Answer style of Code Review. This is not a forum where you should keep the most updated version in your question. Please see what you may and may not do after receiving answers.
â€“Â Vogel612â™¦
Jun 29 at 10:54

As Vogel612 says, if you want to ask for futher help, you'll need a different question, possibly on a different Stack Exchange site. However, I did link to a Disjoints Set implementation I wrote as part of my answer (which is here) (which you are welcome to use, but I can't promise it's perfect). There is also mathblog.dk/disjoint-set-data-structure which is linked from the Wikipedia article.
â€“Â VisualMelon
Jun 29 at 12:46

Please do not update the code in your question to incorporate feedback from answers, doing so goes against the Question + Answer style of Code Review. This is not a forum where you should keep the most updated version in your question. Please see what you may and may not do after receiving answers.
â€“Â Vogel612â™¦
Jun 29 at 10:54

As Vogel612 says, if you want to ask for futher help, you'll need a different question, possibly on a different Stack Exchange site. However, I did link to a Disjoints Set implementation I wrote as part of my answer (which is here) (which you are welcome to use, but I can't promise it's perfect). There is also mathblog.dk/disjoint-set-data-structure which is linked from the Wikipedia article.
â€“Â VisualMelon
Jun 29 at 12:46

add a commentÂ |Â

3 Answers
3

active

oldest

votes

up vote
5
down vote

accepted

I always feel odd reviewing code like this, because there is no API. You should really pull out the clustering code into it's own method, with a nice signature, something like:

IEnumerable<IReadOnlyCollection<string>> Cluster(IEnumerable<Person> people);

This will make it reusable (and so testable).

You ask "Am I doing here something potentially dangerous or excessive with the collections?". I'm not entirely sure what you mean... but I think no. Collections are efficient, and it's hard to do anything dangerous in C# by accident. If anything, you should probably be using more powerful collections.

Tuples

What is Tuple<string, List<string>>? What are Item1 and Item2? Give it a meaningful type, with meaningful member names!

public class Person

 public Person(string name, IReadOnlyCollection<string> places)
 
 Name = name;
 Places = places;
 

 public string Name get; 
 public IReadOnlyCollection<string> Places get;

Complexity

Never been much good at complexity... but here's an attempt at answering your second question. Take it with spoonful of salt. Hopefully someone will poke holes in it promptly.

First you need to identify the parameters you care about. Let's say n for the number of people and 'b' number of places. You iterate over pairs of people, which gives you n^2 (just to enumerate the pairs).

Inside that loop you are looking for an intersection between 2 sets of up to b elements, which can be no better than O(b) (that's the case where the hashes turn out perfectly, assuming Intersect converts one of the lists to a HashSet internally). cluster1.AddRange(cluster2.Where(x => !cluster1.Contains(x))) is O(b^2). You ought instead be using HashSets from the outset, which will get the inner working down to something more like O(b) again (i.e. HashSet<T>.UnionWith(HashSet<T>) is kind of just about linear most of the time). ... but then that matching can only happen n times... so it's a separate concern.

Then there is the outer most loop... which I don't want to think about... but in the worst case it will be run n times, giving you a worst-case complexity of O(n*n*n*b + n*n*b*b), or just about O(n*n*n*b) if you use HashSets.

I can't think how the space complexity of your code could be anything other than the input space complexity of O(b + n).

Technically, your printing out at the end has third order time complexity: O(m*k*k) in number of clusters m, and number of places per cluster k. Consider using a StringBuilder or string.Join instead of Aggregate and repeated concatenation (that will make it O(m*k)).

Disjoint Sets

Any problem like this I would solve with disjoint sets, and even a poor implementation should get you O(n*b) time-complexity (max tree-depth is O(n)) on this problem.

The basic idea is that you maintain a table pointers which describe a set of trees, each tree being an independent (disjoint) set. You can follow the pointers from any member of a set to the root member, and by assigning the parent of one root node to the root of another you join the two sets together. You can throw an implementation together in no time, with very little code.

Some very unscientific benchmarking suggests that a somewhat untested disjoint sets implementation I just threw together is a couple of orders of magnitude faster than your implementation (using HashSets and with reduced Pair count, as suggested in Paparazzo's answer) on randomly generated problems. Your implementation is taking over 10seconds on my machine to solve a problem with b=4000, which the disjoint sets solve in a few milliseconds. The disjoint set version seems to be linear in n*b (easier to scale than just b in random problems); yours is taking a while for larger problems larger than n*b=5000, and I wouldn't wish to comment on the experimental time-complexity with these few (noisy) results.

A gist of the disjoint sets implementation and cluster method is provided so that people can find juicy bugs in it: 1 bug fixed so far (2018-06-26)

edited Jun 26 at 9:59

answered Jun 25 at 17:44

VisualMelon

2,105716

add a commentÂ |Â

up vote
3
down vote

Note that a decent amount of this is preference but I do provide reasoning for it below (I try to encourage improved code readability).

Consistency

I would make bool proceed = false; into var proceed = false; to keep consistent with your usage of var.

Naming

Booleans usually have a prefix of is, has, will, etc. So perhaps willProceed is more fitting. Ultimately, it might be best to name it something that describes the scenario in which we will proceed. In this case, I believe you proceed as long has you have found intersecting pairs/clusters. So maybe hasIntersectingPairs or hasIntersectingClusters it's not as short, but it definitely is more clear on its intentions.

Casing

Typically you would make local variables such as Connections and Clusters lower-case (i.e. var connections = ...; and var clusters = ...;).

Unnecessary Foreach

I might be wrong, but I'm pretty sure your foreach statement can be replaced with LINQs FirstOrDefault and an if-statement.

var cluster = pairs.FirstOrDefault(p => p.Item1.Intersect(p.Item2).Any());
if (cluster != null)

 cluster.Item1.AddRange(cluster.Item2.Where(x => !cluster.Item1.Contains(x)));
 clusters.Remove(cluster.Item2);
 willProceed = true;

No need for a break and it's a bit more clear what the possible paths your program can take, since your foreach loop will either find an intersecting pair and stop (via break) or will find no intersecting pair and continue until out of pairs.

Additionally naming it cluster (instead of pair) makes the naming line up with cluster1 -> cluster.Item1 and cluster2 -> cluster.Item2 without having to make unnecessary variables. If you prefer the extra variables I would then suggest you take VisualMelon's advice and make a meaningful type instead of using Tuple<>.

do/while vs while

You declare bool proceed = true; then use a do/while loop as if it was initialized as false, but then immediately set it to false inside the loop. The following should be a bit more straight forward and accomplish the same thing,

var hasIntersectingPairs = true;
while (hasIntersectingPairs)

 hasIntersectingPairs = false;
 ...

Putting My Suggestions Together

static void Main(string args)

 var connections = new List<Tuple<string, List<string>>>()
 
 new Tuple<string, List<string>>("O1", new List<string>() "P1", "P2" ),
 new Tuple<string, List<string>>("O2", new List<string>() "P2", "P3", "P4" ),
 new Tuple<string, List<string>>("O3", new List<string>() "P8" ),
 new Tuple<string, List<string>>("O4", new List<string>() "P5", "P11" ),
 new Tuple<string, List<string>>("O5", new List<string>() "P2", "P3" ),
 new Tuple<string, List<string>>("O6", new List<string>() "P4"),
 new Tuple<string, List<string>>("O7", new List<string>() "P7", "P10" ),
 new Tuple<string, List<string>>("O8", new List<string>() "P5", "P6", "P7", "P12" ),
 new Tuple<string, List<string>>("O9", new List<string>() "P7", "P11" ),
 new Tuple<string, List<string>>("O10", new List<string>() "P13" ),
 new Tuple<string, List<string>>("O11", new List<string>() "P8", "P9" ),
 ;
 var clusters = connections.Select(x => x.Item2).ToList();

 var hasIntersectingPairs = true;
 while (hasIntersectingPairs)
 
 hasIntersectingPairs = false;
 var pairs = clusters.SelectMany(x => clusters, (x, y) => Tuple.Create(x, y)).Where(x => x.Item1 != x.Item2);

 var cluster = pairs.FirstOrDefault(p => p.Item1.Intersect(p.Item2).Any());
 if (cluster != null)
 
 cluster.Item1.AddRange(cluster.Item2.Where(x => !cluster.Item1.Contains(x)));
 clusters.Remove(cluster.Item2);
 hasIntersectingPairs = true;
 
 

 foreach (var cluster in clusters.OrderByDescending(x => x.Count()))
 
 Console.WriteLine(cluster.OrderBy(x => x).Aggregate((i, j) => i + ", " + j));
 
 Console.Read();

Output still Matches

P10, P11, P12, P5, P6, P7
P1, P2, P3, P4
P8, P9
P13

answered Jun 25 at 18:20

Shelby115

1,354516

add a commentÂ |Â

up vote
1
down vote

Look good to me.

Not sure it matters but you are creating all pairs. (A,B) and (B,A). If you only created unique combinations you would cut the count in 1/2.

edited Jun 25 at 15:58

answered Jun 25 at 13:23

paparazzo

4,8131730

add a commentÂ |Â

Your Answer

StackExchange.ifUsing("editor", function ()
return StackExchange.using("mathjaxEditing", function ()
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
);
);
, "mathjax-editing");

StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "196"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
convertImagesToLinks: false,
noModals: false,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f197189%2fbuild-clusters-from-connections-virus-propagation%23new-answer', 'question_page');

);

Post as a guest

Name

3 Answers
3

active

oldest

votes

3 Answers
3

active

oldest

votes

up vote
5
down vote

accepted

I always feel odd reviewing code like this, because there is no API. You should really pull out the clustering code into it's own method, with a nice signature, something like:

IEnumerable<IReadOnlyCollection<string>> Cluster(IEnumerable<Person> people);

This will make it reusable (and so testable).

Tuples

What is Tuple<string, List<string>>? What are Item1 and Item2? Give it a meaningful type, with meaningful member names!

public class Person

 public Person(string name, IReadOnlyCollection<string> places)
 
 Name = name;
 Places = places;
 

 public string Name get; 
 public IReadOnlyCollection<string> Places get;

Complexity

Never been much good at complexity... but here's an attempt at answering your second question. Take it with spoonful of salt. Hopefully someone will poke holes in it promptly.

I can't think how the space complexity of your code could be anything other than the input space complexity of O(b + n).

Disjoint Sets

Any problem like this I would solve with disjoint sets, and even a poor implementation should get you O(n*b) time-complexity (max tree-depth is O(n)) on this problem.

A gist of the disjoint sets implementation and cluster method is provided so that people can find juicy bugs in it: 1 bug fixed so far (2018-06-26)

edited Jun 26 at 9:59

answered Jun 25 at 17:44

VisualMelon

2,105716

add a commentÂ |Â

up vote
5
down vote

accepted

I always feel odd reviewing code like this, because there is no API. You should really pull out the clustering code into it's own method, with a nice signature, something like:

IEnumerable<IReadOnlyCollection<string>> Cluster(IEnumerable<Person> people);

This will make it reusable (and so testable).

Tuples

What is Tuple<string, List<string>>? What are Item1 and Item2? Give it a meaningful type, with meaningful member names!

public class Person

 public Person(string name, IReadOnlyCollection<string> places)
 
 Name = name;
 Places = places;
 

 public string Name get; 
 public IReadOnlyCollection<string> Places get;

Complexity

Never been much good at complexity... but here's an attempt at answering your second question. Take it with spoonful of salt. Hopefully someone will poke holes in it promptly.

I can't think how the space complexity of your code could be anything other than the input space complexity of O(b + n).

Disjoint Sets

Any problem like this I would solve with disjoint sets, and even a poor implementation should get you O(n*b) time-complexity (max tree-depth is O(n)) on this problem.

A gist of the disjoint sets implementation and cluster method is provided so that people can find juicy bugs in it: 1 bug fixed so far (2018-06-26)

edited Jun 26 at 9:59

answered Jun 25 at 17:44

VisualMelon

2,105716

add a commentÂ |Â

up vote
5
down vote

accepted

I always feel odd reviewing code like this, because there is no API. You should really pull out the clustering code into it's own method, with a nice signature, something like:

IEnumerable<IReadOnlyCollection<string>> Cluster(IEnumerable<Person> people);

This will make it reusable (and so testable).

Tuples

What is Tuple<string, List<string>>? What are Item1 and Item2? Give it a meaningful type, with meaningful member names!

public class Person

 public Person(string name, IReadOnlyCollection<string> places)
 
 Name = name;
 Places = places;
 

 public string Name get; 
 public IReadOnlyCollection<string> Places get;

Complexity

Never been much good at complexity... but here's an attempt at answering your second question. Take it with spoonful of salt. Hopefully someone will poke holes in it promptly.

I can't think how the space complexity of your code could be anything other than the input space complexity of O(b + n).

Disjoint Sets

Any problem like this I would solve with disjoint sets, and even a poor implementation should get you O(n*b) time-complexity (max tree-depth is O(n)) on this problem.

A gist of the disjoint sets implementation and cluster method is provided so that people can find juicy bugs in it: 1 bug fixed so far (2018-06-26)

edited Jun 26 at 9:59

answered Jun 25 at 17:44

VisualMelon

2,105716

I always feel odd reviewing code like this, because there is no API. You should really pull out the clustering code into it's own method, with a nice signature, something like:

IEnumerable<IReadOnlyCollection<string>> Cluster(IEnumerable<Person> people);

This will make it reusable (and so testable).

Tuples

What is Tuple<string, List<string>>? What are Item1 and Item2? Give it a meaningful type, with meaningful member names!

public class Person

 public Person(string name, IReadOnlyCollection<string> places)
 
 Name = name;
 Places = places;
 

 public string Name get; 
 public IReadOnlyCollection<string> Places get;

Complexity

Never been much good at complexity... but here's an attempt at answering your second question. Take it with spoonful of salt. Hopefully someone will poke holes in it promptly.

I can't think how the space complexity of your code could be anything other than the input space complexity of O(b + n).

Disjoint Sets

Any problem like this I would solve with disjoint sets, and even a poor implementation should get you O(n*b) time-complexity (max tree-depth is O(n)) on this problem.

A gist of the disjoint sets implementation and cluster method is provided so that people can find juicy bugs in it: 1 bug fixed so far (2018-06-26)

edited Jun 26 at 9:59

answered Jun 25 at 17:44

VisualMelon

2,105716

edited Jun 26 at 9:59

answered Jun 25 at 17:44

VisualMelon

2,105716

answered Jun 25 at 17:44

VisualMelon

2,105716

answered Jun 25 at 17:44

VisualMelon

2,105716

add a commentÂ |Â

up vote
3
down vote

Note that a decent amount of this is preference but I do provide reasoning for it below (I try to encourage improved code readability).

Consistency

I would make bool proceed = false; into var proceed = false; to keep consistent with your usage of var.

Naming

Casing

Typically you would make local variables such as Connections and Clusters lower-case (i.e. var connections = ...; and var clusters = ...;).

Unnecessary Foreach

I might be wrong, but I'm pretty sure your foreach statement can be replaced with LINQs FirstOrDefault and an if-statement.

var cluster = pairs.FirstOrDefault(p => p.Item1.Intersect(p.Item2).Any());
if (cluster != null)

 cluster.Item1.AddRange(cluster.Item2.Where(x => !cluster.Item1.Contains(x)));
 clusters.Remove(cluster.Item2);
 willProceed = true;

do/while vs while

var hasIntersectingPairs = true;
while (hasIntersectingPairs)

 hasIntersectingPairs = false;
 ...

Putting My Suggestions Together

static void Main(string args)

 var connections = new List<Tuple<string, List<string>>>()
 
 new Tuple<string, List<string>>("O1", new List<string>() "P1", "P2" ),
 new Tuple<string, List<string>>("O2", new List<string>() "P2", "P3", "P4" ),
 new Tuple<string, List<string>>("O3", new List<string>() "P8" ),
 new Tuple<string, List<string>>("O4", new List<string>() "P5", "P11" ),
 new Tuple<string, List<string>>("O5", new List<string>() "P2", "P3" ),
 new Tuple<string, List<string>>("O6", new List<string>() "P4"),
 new Tuple<string, List<string>>("O7", new List<string>() "P7", "P10" ),
 new Tuple<string, List<string>>("O8", new List<string>() "P5", "P6", "P7", "P12" ),
 new Tuple<string, List<string>>("O9", new List<string>() "P7", "P11" ),
 new Tuple<string, List<string>>("O10", new List<string>() "P13" ),
 new Tuple<string, List<string>>("O11", new List<string>() "P8", "P9" ),
 ;
 var clusters = connections.Select(x => x.Item2).ToList();

 var hasIntersectingPairs = true;
 while (hasIntersectingPairs)
 
 hasIntersectingPairs = false;
 var pairs = clusters.SelectMany(x => clusters, (x, y) => Tuple.Create(x, y)).Where(x => x.Item1 != x.Item2);

 var cluster = pairs.FirstOrDefault(p => p.Item1.Intersect(p.Item2).Any());
 if (cluster != null)
 
 cluster.Item1.AddRange(cluster.Item2.Where(x => !cluster.Item1.Contains(x)));
 clusters.Remove(cluster.Item2);
 hasIntersectingPairs = true;
 
 

 foreach (var cluster in clusters.OrderByDescending(x => x.Count()))
 
 Console.WriteLine(cluster.OrderBy(x => x).Aggregate((i, j) => i + ", " + j));
 
 Console.Read();

Output still Matches

P10, P11, P12, P5, P6, P7
P1, P2, P3, P4
P8, P9
P13

answered Jun 25 at 18:20

Shelby115

1,354516

add a commentÂ |Â

up vote
3
down vote

Note that a decent amount of this is preference but I do provide reasoning for it below (I try to encourage improved code readability).

Consistency

I would make bool proceed = false; into var proceed = false; to keep consistent with your usage of var.

Naming

Casing

Typically you would make local variables such as Connections and Clusters lower-case (i.e. var connections = ...; and var clusters = ...;).

Unnecessary Foreach

I might be wrong, but I'm pretty sure your foreach statement can be replaced with LINQs FirstOrDefault and an if-statement.

var cluster = pairs.FirstOrDefault(p => p.Item1.Intersect(p.Item2).Any());
if (cluster != null)

 cluster.Item1.AddRange(cluster.Item2.Where(x => !cluster.Item1.Contains(x)));
 clusters.Remove(cluster.Item2);
 willProceed = true;

do/while vs while

var hasIntersectingPairs = true;
while (hasIntersectingPairs)

 hasIntersectingPairs = false;
 ...

Putting My Suggestions Together

static void Main(string args)

 var connections = new List<Tuple<string, List<string>>>()
 
 new Tuple<string, List<string>>("O1", new List<string>() "P1", "P2" ),
 new Tuple<string, List<string>>("O2", new List<string>() "P2", "P3", "P4" ),
 new Tuple<string, List<string>>("O3", new List<string>() "P8" ),
 new Tuple<string, List<string>>("O4", new List<string>() "P5", "P11" ),
 new Tuple<string, List<string>>("O5", new List<string>() "P2", "P3" ),
 new Tuple<string, List<string>>("O6", new List<string>() "P4"),
 new Tuple<string, List<string>>("O7", new List<string>() "P7", "P10" ),
 new Tuple<string, List<string>>("O8", new List<string>() "P5", "P6", "P7", "P12" ),
 new Tuple<string, List<string>>("O9", new List<string>() "P7", "P11" ),
 new Tuple<string, List<string>>("O10", new List<string>() "P13" ),
 new Tuple<string, List<string>>("O11", new List<string>() "P8", "P9" ),
 ;
 var clusters = connections.Select(x => x.Item2).ToList();

 var hasIntersectingPairs = true;
 while (hasIntersectingPairs)
 
 hasIntersectingPairs = false;
 var pairs = clusters.SelectMany(x => clusters, (x, y) => Tuple.Create(x, y)).Where(x => x.Item1 != x.Item2);

 var cluster = pairs.FirstOrDefault(p => p.Item1.Intersect(p.Item2).Any());
 if (cluster != null)
 
 cluster.Item1.AddRange(cluster.Item2.Where(x => !cluster.Item1.Contains(x)));
 clusters.Remove(cluster.Item2);
 hasIntersectingPairs = true;
 
 

 foreach (var cluster in clusters.OrderByDescending(x => x.Count()))
 
 Console.WriteLine(cluster.OrderBy(x => x).Aggregate((i, j) => i + ", " + j));
 
 Console.Read();

Output still Matches

P10, P11, P12, P5, P6, P7
P1, P2, P3, P4
P8, P9
P13

answered Jun 25 at 18:20

Shelby115

1,354516

add a commentÂ |Â

up vote
3
down vote

Note that a decent amount of this is preference but I do provide reasoning for it below (I try to encourage improved code readability).

Consistency

I would make bool proceed = false; into var proceed = false; to keep consistent with your usage of var.

Naming

Casing

Typically you would make local variables such as Connections and Clusters lower-case (i.e. var connections = ...; and var clusters = ...;).

Unnecessary Foreach

I might be wrong, but I'm pretty sure your foreach statement can be replaced with LINQs FirstOrDefault and an if-statement.

var cluster = pairs.FirstOrDefault(p => p.Item1.Intersect(p.Item2).Any());
if (cluster != null)

 cluster.Item1.AddRange(cluster.Item2.Where(x => !cluster.Item1.Contains(x)));
 clusters.Remove(cluster.Item2);
 willProceed = true;

do/while vs while

var hasIntersectingPairs = true;
while (hasIntersectingPairs)

 hasIntersectingPairs = false;
 ...

Putting My Suggestions Together

static void Main(string args)

 var connections = new List<Tuple<string, List<string>>>()
 
 new Tuple<string, List<string>>("O1", new List<string>() "P1", "P2" ),
 new Tuple<string, List<string>>("O2", new List<string>() "P2", "P3", "P4" ),
 new Tuple<string, List<string>>("O3", new List<string>() "P8" ),
 new Tuple<string, List<string>>("O4", new List<string>() "P5", "P11" ),
 new Tuple<string, List<string>>("O5", new List<string>() "P2", "P3" ),
 new Tuple<string, List<string>>("O6", new List<string>() "P4"),
 new Tuple<string, List<string>>("O7", new List<string>() "P7", "P10" ),
 new Tuple<string, List<string>>("O8", new List<string>() "P5", "P6", "P7", "P12" ),
 new Tuple<string, List<string>>("O9", new List<string>() "P7", "P11" ),
 new Tuple<string, List<string>>("O10", new List<string>() "P13" ),
 new Tuple<string, List<string>>("O11", new List<string>() "P8", "P9" ),
 ;
 var clusters = connections.Select(x => x.Item2).ToList();

 var hasIntersectingPairs = true;
 while (hasIntersectingPairs)
 
 hasIntersectingPairs = false;
 var pairs = clusters.SelectMany(x => clusters, (x, y) => Tuple.Create(x, y)).Where(x => x.Item1 != x.Item2);

 var cluster = pairs.FirstOrDefault(p => p.Item1.Intersect(p.Item2).Any());
 if (cluster != null)
 
 cluster.Item1.AddRange(cluster.Item2.Where(x => !cluster.Item1.Contains(x)));
 clusters.Remove(cluster.Item2);
 hasIntersectingPairs = true;
 
 

 foreach (var cluster in clusters.OrderByDescending(x => x.Count()))
 
 Console.WriteLine(cluster.OrderBy(x => x).Aggregate((i, j) => i + ", " + j));
 
 Console.Read();

Output still Matches

P10, P11, P12, P5, P6, P7
P1, P2, P3, P4
P8, P9
P13

answered Jun 25 at 18:20

Shelby115

1,354516

Note that a decent amount of this is preference but I do provide reasoning for it below (I try to encourage improved code readability).

Consistency

I would make bool proceed = false; into var proceed = false; to keep consistent with your usage of var.

Naming

Casing

Typically you would make local variables such as Connections and Clusters lower-case (i.e. var connections = ...; and var clusters = ...;).

Unnecessary Foreach

I might be wrong, but I'm pretty sure your foreach statement can be replaced with LINQs FirstOrDefault and an if-statement.

var cluster = pairs.FirstOrDefault(p => p.Item1.Intersect(p.Item2).Any());
if (cluster != null)

 cluster.Item1.AddRange(cluster.Item2.Where(x => !cluster.Item1.Contains(x)));
 clusters.Remove(cluster.Item2);
 willProceed = true;

do/while vs while

var hasIntersectingPairs = true;
while (hasIntersectingPairs)

 hasIntersectingPairs = false;
 ...

Putting My Suggestions Together

static void Main(string args)

 var connections = new List<Tuple<string, List<string>>>()
 
 new Tuple<string, List<string>>("O1", new List<string>() "P1", "P2" ),
 new Tuple<string, List<string>>("O2", new List<string>() "P2", "P3", "P4" ),
 new Tuple<string, List<string>>("O3", new List<string>() "P8" ),
 new Tuple<string, List<string>>("O4", new List<string>() "P5", "P11" ),
 new Tuple<string, List<string>>("O5", new List<string>() "P2", "P3" ),
 new Tuple<string, List<string>>("O6", new List<string>() "P4"),
 new Tuple<string, List<string>>("O7", new List<string>() "P7", "P10" ),
 new Tuple<string, List<string>>("O8", new List<string>() "P5", "P6", "P7", "P12" ),
 new Tuple<string, List<string>>("O9", new List<string>() "P7", "P11" ),
 new Tuple<string, List<string>>("O10", new List<string>() "P13" ),
 new Tuple<string, List<string>>("O11", new List<string>() "P8", "P9" ),
 ;
 var clusters = connections.Select(x => x.Item2).ToList();

 var hasIntersectingPairs = true;
 while (hasIntersectingPairs)
 
 hasIntersectingPairs = false;
 var pairs = clusters.SelectMany(x => clusters, (x, y) => Tuple.Create(x, y)).Where(x => x.Item1 != x.Item2);

 var cluster = pairs.FirstOrDefault(p => p.Item1.Intersect(p.Item2).Any());
 if (cluster != null)
 
 cluster.Item1.AddRange(cluster.Item2.Where(x => !cluster.Item1.Contains(x)));
 clusters.Remove(cluster.Item2);
 hasIntersectingPairs = true;
 
 

 foreach (var cluster in clusters.OrderByDescending(x => x.Count()))
 
 Console.WriteLine(cluster.OrderBy(x => x).Aggregate((i, j) => i + ", " + j));
 
 Console.Read();

Output still Matches

P10, P11, P12, P5, P6, P7
P1, P2, P3, P4
P8, P9
P13

answered Jun 25 at 18:20

Shelby115

1,354516

answered Jun 25 at 18:20

Shelby115

1,354516

answered Jun 25 at 18:20

Shelby115

1,354516

answered Jun 25 at 18:20

Shelby115

1,354516

add a commentÂ |Â

up vote
1
down vote

Look good to me.

Not sure it matters but you are creating all pairs. (A,B) and (B,A). If you only created unique combinations you would cut the count in 1/2.

edited Jun 25 at 15:58

answered Jun 25 at 13:23

paparazzo

4,8131730

add a commentÂ |Â

up vote
1
down vote

Look good to me.

Not sure it matters but you are creating all pairs. (A,B) and (B,A). If you only created unique combinations you would cut the count in 1/2.

edited Jun 25 at 15:58

answered Jun 25 at 13:23

paparazzo

4,8131730

add a commentÂ |Â

up vote
1
down vote

Look good to me.

Not sure it matters but you are creating all pairs. (A,B) and (B,A). If you only created unique combinations you would cut the count in 1/2.

edited Jun 25 at 15:58

answered Jun 25 at 13:23

paparazzo

4,8131730

Look good to me.

Not sure it matters but you are creating all pairs. (A,B) and (B,A). If you only created unique combinations you would cut the count in 1/2.

edited Jun 25 at 15:58

answered Jun 25 at 13:23

paparazzo

4,8131730

edited Jun 25 at 15:58

answered Jun 25 at 13:23

paparazzo

4,8131730

answered Jun 25 at 13:23

paparazzo

4,8131730

answered Jun 25 at 13:23

paparazzo

4,8131730

add a commentÂ |Â

draft saved

draft discarded

draft saved

draft discarded

Post as a guest

Name

Ft1pi3qg26pAfl9oRvuZhg3CDFxWaVcx,51RcFyUJUV40O3Qpb,2G ErAzI

搜尋此網誌

trjhtr