Performance of C++ Union-Find solution is strangely poor

The name of the pictureThe name of the pictureThe name of the pictureClash Royale CLAN TAG#URR8PPP





.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty margin-bottom:0;







up vote
7
down vote

favorite
2












Problem Summary (USACO Gold 3 February 2016)




Given a rectangular coordinate plane of dimensions (A,B), there are N vertical fences, and M horizontal fences. Clearly, the intersection of these fences within the larger rectangle creates (n+1)(m+1) regions.



Given A,B,N,M, the x-coordinates of the N vertical fences and the y coordinates of the M horizontal fences, find the minimum amount of fence to remove to create a minimum spanning tree (connect all regions). In order to remove a fence and thus union two adjacent regions, the entire length of fence between them must be removed.




Solution



This is clearly a disjoint-set union problem (union-find/Kruskals), but the tricky part is building the edge-list and tree to union. Each edge is weighted by the length of fence between the two nodes that it connects.



Here is a quick outline of the code below:



  1. Read in x,y coords of vertical, horizontal fences (also add 0,0,A,B as "fences")


  2. For each intersection of the fences (some special cases at ends), add the left-facing and up-facing edges to edge list, and additionally add the region between the two edges as a node to the union-find tree.


  3. Run Kruskals, keeping track of distance.


This is an O(ElogV) algorithm, which should be more than fast enough since N and M are less than 2000. Note that the given solution (linked above) is fast enough, where this takes to long (failure) for the last 4/10 test cases.



Download Test Cases (10)



Where is the performance bottleneck? How can this be improved?



#include <iostream>
#include <fstream>
#include <vector>
#include <set>
#include <algorithm>

#define ll long long

using namespace std;

const string PROJ_NAME = "fencedin";

struct edge
ll dist;
int start;
int finish;
bool operator< (const edge &rhs) const
;

struct uf_node
int parent;
int level;
;

ll A, B;
int N, M;
set<edge> edge_list;
vector<uf_node> tree;

void make_edge_list(ifstream &fin)
fin >> A >> B >> N >> M;

vector<int> vert(N+2);
vert[0] = 0;
for(int i = 1; i<=N; i++) fin >> vert[i];
vert[N+1] = A;
N++;

vector<int> hori(M+2);
hori[0] = 0;
for(int i = 1; i<=M; i++) fin >> hori[i];
hori[M+1] = B;
M++;

sort(vert.begin(), vert.end());
sort(hori.begin(), hori.end());

tree.resize(N*M);
for(int i = 1; i<=N; i++)
for(int j = 1; j<=M; j++)
int curr = (i-1)*M+(j-1);

//Add node to DSU tree
uf_node n = .parent = curr, .level = 0;
tree[curr] = n;

//Add leftward edge if not at bottom
if(i != N)
edge left = .dist = hori[j]-hori[j-1], .start=curr, .finish=curr+M;
edge_list.insert(left);


//Add upward edge if not at right
if(j != M)
edge up = .dist = vert[i]-vert[i-1], .start=curr, .finish=curr+1;
edge_list.insert(up);





int find_par(int i)
if(i != tree[i].parent)
tree[i].parent = find_par(tree[i].parent);

return tree[i].parent;


void do_union(int i, int j)
int r = find_par(i);
int s = find_par(j);
if(r == s) return;
else if (tree[r].level > tree[s].level)
tree[r].parent = s;
else if (tree[s].level > tree[r].level)
tree[s].parent = r;
else
tree[r].parent = s;
tree[r].level += 1;



int main()

ifstream fin (PROJ_NAME + ".in");
ofstream fout (PROJ_NAME + ".out");

make_edge_list(fin);

int total_dist = 0;
for(edge e: edge_list)
if(find_par(e.start) != find_par(e.finish))
do_union(e.start, e.finish);
total_dist += e.dist;



fout << total_dist << endl;
return 0;







share|improve this question

















  • 1




    tree[r].level > tree[r].level - a typo?
    – vnp
    Jan 30 at 1:22










  • @vnp Edited, did not fix problem. Good catch.
    – Evan Weissburg
    Jan 30 at 1:26











  • Should the if in find_par be if (i == tree[i].parent)?
    – 1201ProgramAlarm
    Jan 30 at 1:48






  • 1




    Do you have any more input cases to test with than the example on the linked website? That particular one runs in less time than the profiler has to sample the program!
    – user1118321
    Jan 30 at 3:47






  • 1




    @user1118321 Yes: usaco.org/current/data/fencedin_gold_feb16.zip (added as link in OP)
    – Evan Weissburg
    Jan 30 at 3:49
















up vote
7
down vote

favorite
2












Problem Summary (USACO Gold 3 February 2016)




Given a rectangular coordinate plane of dimensions (A,B), there are N vertical fences, and M horizontal fences. Clearly, the intersection of these fences within the larger rectangle creates (n+1)(m+1) regions.



Given A,B,N,M, the x-coordinates of the N vertical fences and the y coordinates of the M horizontal fences, find the minimum amount of fence to remove to create a minimum spanning tree (connect all regions). In order to remove a fence and thus union two adjacent regions, the entire length of fence between them must be removed.




Solution



This is clearly a disjoint-set union problem (union-find/Kruskals), but the tricky part is building the edge-list and tree to union. Each edge is weighted by the length of fence between the two nodes that it connects.



Here is a quick outline of the code below:



  1. Read in x,y coords of vertical, horizontal fences (also add 0,0,A,B as "fences")


  2. For each intersection of the fences (some special cases at ends), add the left-facing and up-facing edges to edge list, and additionally add the region between the two edges as a node to the union-find tree.


  3. Run Kruskals, keeping track of distance.


This is an O(ElogV) algorithm, which should be more than fast enough since N and M are less than 2000. Note that the given solution (linked above) is fast enough, where this takes to long (failure) for the last 4/10 test cases.



Download Test Cases (10)



Where is the performance bottleneck? How can this be improved?



#include <iostream>
#include <fstream>
#include <vector>
#include <set>
#include <algorithm>

#define ll long long

using namespace std;

const string PROJ_NAME = "fencedin";

struct edge
ll dist;
int start;
int finish;
bool operator< (const edge &rhs) const
;

struct uf_node
int parent;
int level;
;

ll A, B;
int N, M;
set<edge> edge_list;
vector<uf_node> tree;

void make_edge_list(ifstream &fin)
fin >> A >> B >> N >> M;

vector<int> vert(N+2);
vert[0] = 0;
for(int i = 1; i<=N; i++) fin >> vert[i];
vert[N+1] = A;
N++;

vector<int> hori(M+2);
hori[0] = 0;
for(int i = 1; i<=M; i++) fin >> hori[i];
hori[M+1] = B;
M++;

sort(vert.begin(), vert.end());
sort(hori.begin(), hori.end());

tree.resize(N*M);
for(int i = 1; i<=N; i++)
for(int j = 1; j<=M; j++)
int curr = (i-1)*M+(j-1);

//Add node to DSU tree
uf_node n = .parent = curr, .level = 0;
tree[curr] = n;

//Add leftward edge if not at bottom
if(i != N)
edge left = .dist = hori[j]-hori[j-1], .start=curr, .finish=curr+M;
edge_list.insert(left);


//Add upward edge if not at right
if(j != M)
edge up = .dist = vert[i]-vert[i-1], .start=curr, .finish=curr+1;
edge_list.insert(up);





int find_par(int i)
if(i != tree[i].parent)
tree[i].parent = find_par(tree[i].parent);

return tree[i].parent;


void do_union(int i, int j)
int r = find_par(i);
int s = find_par(j);
if(r == s) return;
else if (tree[r].level > tree[s].level)
tree[r].parent = s;
else if (tree[s].level > tree[r].level)
tree[s].parent = r;
else
tree[r].parent = s;
tree[r].level += 1;



int main()

ifstream fin (PROJ_NAME + ".in");
ofstream fout (PROJ_NAME + ".out");

make_edge_list(fin);

int total_dist = 0;
for(edge e: edge_list)
if(find_par(e.start) != find_par(e.finish))
do_union(e.start, e.finish);
total_dist += e.dist;



fout << total_dist << endl;
return 0;







share|improve this question

















  • 1




    tree[r].level > tree[r].level - a typo?
    – vnp
    Jan 30 at 1:22










  • @vnp Edited, did not fix problem. Good catch.
    – Evan Weissburg
    Jan 30 at 1:26











  • Should the if in find_par be if (i == tree[i].parent)?
    – 1201ProgramAlarm
    Jan 30 at 1:48






  • 1




    Do you have any more input cases to test with than the example on the linked website? That particular one runs in less time than the profiler has to sample the program!
    – user1118321
    Jan 30 at 3:47






  • 1




    @user1118321 Yes: usaco.org/current/data/fencedin_gold_feb16.zip (added as link in OP)
    – Evan Weissburg
    Jan 30 at 3:49












up vote
7
down vote

favorite
2









up vote
7
down vote

favorite
2






2





Problem Summary (USACO Gold 3 February 2016)




Given a rectangular coordinate plane of dimensions (A,B), there are N vertical fences, and M horizontal fences. Clearly, the intersection of these fences within the larger rectangle creates (n+1)(m+1) regions.



Given A,B,N,M, the x-coordinates of the N vertical fences and the y coordinates of the M horizontal fences, find the minimum amount of fence to remove to create a minimum spanning tree (connect all regions). In order to remove a fence and thus union two adjacent regions, the entire length of fence between them must be removed.




Solution



This is clearly a disjoint-set union problem (union-find/Kruskals), but the tricky part is building the edge-list and tree to union. Each edge is weighted by the length of fence between the two nodes that it connects.



Here is a quick outline of the code below:



  1. Read in x,y coords of vertical, horizontal fences (also add 0,0,A,B as "fences")


  2. For each intersection of the fences (some special cases at ends), add the left-facing and up-facing edges to edge list, and additionally add the region between the two edges as a node to the union-find tree.


  3. Run Kruskals, keeping track of distance.


This is an O(ElogV) algorithm, which should be more than fast enough since N and M are less than 2000. Note that the given solution (linked above) is fast enough, where this takes to long (failure) for the last 4/10 test cases.



Download Test Cases (10)



Where is the performance bottleneck? How can this be improved?



#include <iostream>
#include <fstream>
#include <vector>
#include <set>
#include <algorithm>

#define ll long long

using namespace std;

const string PROJ_NAME = "fencedin";

struct edge
ll dist;
int start;
int finish;
bool operator< (const edge &rhs) const
;

struct uf_node
int parent;
int level;
;

ll A, B;
int N, M;
set<edge> edge_list;
vector<uf_node> tree;

void make_edge_list(ifstream &fin)
fin >> A >> B >> N >> M;

vector<int> vert(N+2);
vert[0] = 0;
for(int i = 1; i<=N; i++) fin >> vert[i];
vert[N+1] = A;
N++;

vector<int> hori(M+2);
hori[0] = 0;
for(int i = 1; i<=M; i++) fin >> hori[i];
hori[M+1] = B;
M++;

sort(vert.begin(), vert.end());
sort(hori.begin(), hori.end());

tree.resize(N*M);
for(int i = 1; i<=N; i++)
for(int j = 1; j<=M; j++)
int curr = (i-1)*M+(j-1);

//Add node to DSU tree
uf_node n = .parent = curr, .level = 0;
tree[curr] = n;

//Add leftward edge if not at bottom
if(i != N)
edge left = .dist = hori[j]-hori[j-1], .start=curr, .finish=curr+M;
edge_list.insert(left);


//Add upward edge if not at right
if(j != M)
edge up = .dist = vert[i]-vert[i-1], .start=curr, .finish=curr+1;
edge_list.insert(up);





int find_par(int i)
if(i != tree[i].parent)
tree[i].parent = find_par(tree[i].parent);

return tree[i].parent;


void do_union(int i, int j)
int r = find_par(i);
int s = find_par(j);
if(r == s) return;
else if (tree[r].level > tree[s].level)
tree[r].parent = s;
else if (tree[s].level > tree[r].level)
tree[s].parent = r;
else
tree[r].parent = s;
tree[r].level += 1;



int main()

ifstream fin (PROJ_NAME + ".in");
ofstream fout (PROJ_NAME + ".out");

make_edge_list(fin);

int total_dist = 0;
for(edge e: edge_list)
if(find_par(e.start) != find_par(e.finish))
do_union(e.start, e.finish);
total_dist += e.dist;



fout << total_dist << endl;
return 0;







share|improve this question













Problem Summary (USACO Gold 3 February 2016)




Given a rectangular coordinate plane of dimensions (A,B), there are N vertical fences, and M horizontal fences. Clearly, the intersection of these fences within the larger rectangle creates (n+1)(m+1) regions.



Given A,B,N,M, the x-coordinates of the N vertical fences and the y coordinates of the M horizontal fences, find the minimum amount of fence to remove to create a minimum spanning tree (connect all regions). In order to remove a fence and thus union two adjacent regions, the entire length of fence between them must be removed.




Solution



This is clearly a disjoint-set union problem (union-find/Kruskals), but the tricky part is building the edge-list and tree to union. Each edge is weighted by the length of fence between the two nodes that it connects.



Here is a quick outline of the code below:



  1. Read in x,y coords of vertical, horizontal fences (also add 0,0,A,B as "fences")


  2. For each intersection of the fences (some special cases at ends), add the left-facing and up-facing edges to edge list, and additionally add the region between the two edges as a node to the union-find tree.


  3. Run Kruskals, keeping track of distance.


This is an O(ElogV) algorithm, which should be more than fast enough since N and M are less than 2000. Note that the given solution (linked above) is fast enough, where this takes to long (failure) for the last 4/10 test cases.



Download Test Cases (10)



Where is the performance bottleneck? How can this be improved?



#include <iostream>
#include <fstream>
#include <vector>
#include <set>
#include <algorithm>

#define ll long long

using namespace std;

const string PROJ_NAME = "fencedin";

struct edge
ll dist;
int start;
int finish;
bool operator< (const edge &rhs) const
;

struct uf_node
int parent;
int level;
;

ll A, B;
int N, M;
set<edge> edge_list;
vector<uf_node> tree;

void make_edge_list(ifstream &fin)
fin >> A >> B >> N >> M;

vector<int> vert(N+2);
vert[0] = 0;
for(int i = 1; i<=N; i++) fin >> vert[i];
vert[N+1] = A;
N++;

vector<int> hori(M+2);
hori[0] = 0;
for(int i = 1; i<=M; i++) fin >> hori[i];
hori[M+1] = B;
M++;

sort(vert.begin(), vert.end());
sort(hori.begin(), hori.end());

tree.resize(N*M);
for(int i = 1; i<=N; i++)
for(int j = 1; j<=M; j++)
int curr = (i-1)*M+(j-1);

//Add node to DSU tree
uf_node n = .parent = curr, .level = 0;
tree[curr] = n;

//Add leftward edge if not at bottom
if(i != N)
edge left = .dist = hori[j]-hori[j-1], .start=curr, .finish=curr+M;
edge_list.insert(left);


//Add upward edge if not at right
if(j != M)
edge up = .dist = vert[i]-vert[i-1], .start=curr, .finish=curr+1;
edge_list.insert(up);





int find_par(int i)
if(i != tree[i].parent)
tree[i].parent = find_par(tree[i].parent);

return tree[i].parent;


void do_union(int i, int j)
int r = find_par(i);
int s = find_par(j);
if(r == s) return;
else if (tree[r].level > tree[s].level)
tree[r].parent = s;
else if (tree[s].level > tree[r].level)
tree[s].parent = r;
else
tree[r].parent = s;
tree[r].level += 1;



int main()

ifstream fin (PROJ_NAME + ".in");
ofstream fout (PROJ_NAME + ".out");

make_edge_list(fin);

int total_dist = 0;
for(edge e: edge_list)
if(find_par(e.start) != find_par(e.finish))
do_union(e.start, e.finish);
total_dist += e.dist;



fout << total_dist << endl;
return 0;









share|improve this question












share|improve this question




share|improve this question








edited Jan 31 at 5:01









Heslacher

43.9k359152




43.9k359152









asked Jan 29 at 23:55









Evan Weissburg

1364




1364







  • 1




    tree[r].level > tree[r].level - a typo?
    – vnp
    Jan 30 at 1:22










  • @vnp Edited, did not fix problem. Good catch.
    – Evan Weissburg
    Jan 30 at 1:26











  • Should the if in find_par be if (i == tree[i].parent)?
    – 1201ProgramAlarm
    Jan 30 at 1:48






  • 1




    Do you have any more input cases to test with than the example on the linked website? That particular one runs in less time than the profiler has to sample the program!
    – user1118321
    Jan 30 at 3:47






  • 1




    @user1118321 Yes: usaco.org/current/data/fencedin_gold_feb16.zip (added as link in OP)
    – Evan Weissburg
    Jan 30 at 3:49












  • 1




    tree[r].level > tree[r].level - a typo?
    – vnp
    Jan 30 at 1:22










  • @vnp Edited, did not fix problem. Good catch.
    – Evan Weissburg
    Jan 30 at 1:26











  • Should the if in find_par be if (i == tree[i].parent)?
    – 1201ProgramAlarm
    Jan 30 at 1:48






  • 1




    Do you have any more input cases to test with than the example on the linked website? That particular one runs in less time than the profiler has to sample the program!
    – user1118321
    Jan 30 at 3:47






  • 1




    @user1118321 Yes: usaco.org/current/data/fencedin_gold_feb16.zip (added as link in OP)
    – Evan Weissburg
    Jan 30 at 3:49







1




1




tree[r].level > tree[r].level - a typo?
– vnp
Jan 30 at 1:22




tree[r].level > tree[r].level - a typo?
– vnp
Jan 30 at 1:22












@vnp Edited, did not fix problem. Good catch.
– Evan Weissburg
Jan 30 at 1:26





@vnp Edited, did not fix problem. Good catch.
– Evan Weissburg
Jan 30 at 1:26













Should the if in find_par be if (i == tree[i].parent)?
– 1201ProgramAlarm
Jan 30 at 1:48




Should the if in find_par be if (i == tree[i].parent)?
– 1201ProgramAlarm
Jan 30 at 1:48




1




1




Do you have any more input cases to test with than the example on the linked website? That particular one runs in less time than the profiler has to sample the program!
– user1118321
Jan 30 at 3:47




Do you have any more input cases to test with than the example on the linked website? That particular one runs in less time than the profiler has to sample the program!
– user1118321
Jan 30 at 3:47




1




1




@user1118321 Yes: usaco.org/current/data/fencedin_gold_feb16.zip (added as link in OP)
– Evan Weissburg
Jan 30 at 3:49




@user1118321 Yes: usaco.org/current/data/fencedin_gold_feb16.zip (added as link in OP)
– Evan Weissburg
Jan 30 at 3:49










1 Answer
1






active

oldest

votes

















up vote
5
down vote













This is a really interesting problem! Here are some things I noticed about your implementation.



Performance



Looking at a profile of your code (compiled with llvm on macOS High Sierra with Xcode 9.2), it looks like the majority of time is spent in these 3 places:



  • 85% in make_edge_list(), particularly the 2 calls to set::insert()

  • 7% in find_par()

  • 7% in std::tree::iterator::operator++() (so incrementing an iterator on your edge list, I believe)

What this means, I think, is that std::set is maybe not the best choice of container in this case. Since you can't preallocate space for items in a set like you can with a vector or array, it has to do the allocations on-the-fly, which can hamper performance. Also, choosing where to insert things seems to be part of the time sink of insert() as well. With a vector you can preallocate memory, and then push_back() just adds it to the end. It's not clear what you're getting by using a set here, so I say ditch it. The linked solution uses only a single vector and a static array of ~2000 x 2000 elements and does some sorting. So that might be a winning strategy here.



Naming



Macros have all sorts of problems. But even when you manage to write a macro that won't have any of those problems, as you did here, if there's a better way, you should use it. (For example, what if someone modifying this code makes a variable named ll or writes a string with the word "all" in it?) If you want your own shorter name for long long, then do:



using ll = long long;


or at the very least:



typedef long long ll;


But really - too lazy to type long long? Just type it out. It's a well-known type.



I'm not a fan of 1-letter variable names in most cases. The values A and B are the dimensions. I'd name them width and height. N and M are the number of vertical and horizontal fences, so why not name them num_vert_fences and num_horz_fences? Or even just rows and columns?



Avoid using namespace std



You should really avoid using using namespace std for all the reasons pointed out there.



Globals



The use of globals made it hard to read your code. Down in do_union(), I had to scroll all the way back up to the top of the file to see what type tree was. You should define these variables in main() and pass them to the functions that use them. That will make it much easier to understand the flow of the program and avoid problems where some function changes a variable you didn't pass it while you're using the same variable in another function.



I mentioned in my comment that the globals screwed up my performance analysis. This is a great example of the unintended consequences of globals. Because the code was running too fast to analyze, I took main(), renamed it main2(), and then wrote a new main() to simply call it 100 times in a loop. But because the tree and edge_list are global, they didn't get cleared out when main2() exited as they would have if they had been local. So that resulted in the numbers of my analysis being incorrect because the containers can have different characteristics when there are many more items in them than when there are fewer. Lesson learned. I should have made the variables local before running my performance test.



Be Careful with Less-Well-Known Tools



I really like this method of initializing the members of a struct or union:



uf_node n = .parent = curr, .level = 0;


I recently started using it and coworkers with many years of experience have been expressing surprise and concern over it! So just know that you might confuse some readers with it. It's the type of thing I'd like to see catch on, though, as it improves readability once you understand it.






share|improve this answer























  • Geometric scientific applications make wide use of quad-trees (2D) and oct-trees (3D) for improved insertion and search times. Here is an explanation of these structures for a GPU oriented application (the figures in the article help to understand how they work).
    – Jorge Bellón
    Jan 30 at 9:33











  • This is code for a (past) programming competition, some time counts especially with long long. Thanks for the advice, I'll test out the optimizations tonight.
    – Evan Weissburg
    Jan 30 at 12:07










  • Actually, I screwed up the performance analysis because of the globals. I'll re-run it tonight and post an update. Sorry about that. It just occurred to me this morning.
    – user1118321
    Jan 30 at 16:32










  • OK, I've redone it. It's not drastically different, in that the same functions are implicated. (Not surprising, since the code is so small.) But the amounts are pretty different. 85% vs 56% for the highest cost function.
    – user1118321
    Jan 31 at 2:43










  • It's actually still too slow even when using a preallocated vector -- have you looked at the linked solution in the OP? Maybe that will give you more of a hint than it gave me about why my solution is too slow. I added my updated code to the OP as well.
    – Evan Weissburg
    Jan 31 at 3:30











Your Answer




StackExchange.ifUsing("editor", function ()
return StackExchange.using("mathjaxEditing", function ()
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
);
);
, "mathjax-editing");

StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "196"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
convertImagesToLinks: false,
noModals: false,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);



);








 

draft saved


draft discarded


















StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f186290%2fperformance-of-c-union-find-solution-is-strangely-poor%23new-answer', 'question_page');

);

Post as a guest






























1 Answer
1






active

oldest

votes








1 Answer
1






active

oldest

votes









active

oldest

votes






active

oldest

votes








up vote
5
down vote













This is a really interesting problem! Here are some things I noticed about your implementation.



Performance



Looking at a profile of your code (compiled with llvm on macOS High Sierra with Xcode 9.2), it looks like the majority of time is spent in these 3 places:



  • 85% in make_edge_list(), particularly the 2 calls to set::insert()

  • 7% in find_par()

  • 7% in std::tree::iterator::operator++() (so incrementing an iterator on your edge list, I believe)

What this means, I think, is that std::set is maybe not the best choice of container in this case. Since you can't preallocate space for items in a set like you can with a vector or array, it has to do the allocations on-the-fly, which can hamper performance. Also, choosing where to insert things seems to be part of the time sink of insert() as well. With a vector you can preallocate memory, and then push_back() just adds it to the end. It's not clear what you're getting by using a set here, so I say ditch it. The linked solution uses only a single vector and a static array of ~2000 x 2000 elements and does some sorting. So that might be a winning strategy here.



Naming



Macros have all sorts of problems. But even when you manage to write a macro that won't have any of those problems, as you did here, if there's a better way, you should use it. (For example, what if someone modifying this code makes a variable named ll or writes a string with the word "all" in it?) If you want your own shorter name for long long, then do:



using ll = long long;


or at the very least:



typedef long long ll;


But really - too lazy to type long long? Just type it out. It's a well-known type.



I'm not a fan of 1-letter variable names in most cases. The values A and B are the dimensions. I'd name them width and height. N and M are the number of vertical and horizontal fences, so why not name them num_vert_fences and num_horz_fences? Or even just rows and columns?



Avoid using namespace std



You should really avoid using using namespace std for all the reasons pointed out there.



Globals



The use of globals made it hard to read your code. Down in do_union(), I had to scroll all the way back up to the top of the file to see what type tree was. You should define these variables in main() and pass them to the functions that use them. That will make it much easier to understand the flow of the program and avoid problems where some function changes a variable you didn't pass it while you're using the same variable in another function.



I mentioned in my comment that the globals screwed up my performance analysis. This is a great example of the unintended consequences of globals. Because the code was running too fast to analyze, I took main(), renamed it main2(), and then wrote a new main() to simply call it 100 times in a loop. But because the tree and edge_list are global, they didn't get cleared out when main2() exited as they would have if they had been local. So that resulted in the numbers of my analysis being incorrect because the containers can have different characteristics when there are many more items in them than when there are fewer. Lesson learned. I should have made the variables local before running my performance test.



Be Careful with Less-Well-Known Tools



I really like this method of initializing the members of a struct or union:



uf_node n = .parent = curr, .level = 0;


I recently started using it and coworkers with many years of experience have been expressing surprise and concern over it! So just know that you might confuse some readers with it. It's the type of thing I'd like to see catch on, though, as it improves readability once you understand it.






share|improve this answer























  • Geometric scientific applications make wide use of quad-trees (2D) and oct-trees (3D) for improved insertion and search times. Here is an explanation of these structures for a GPU oriented application (the figures in the article help to understand how they work).
    – Jorge Bellón
    Jan 30 at 9:33











  • This is code for a (past) programming competition, some time counts especially with long long. Thanks for the advice, I'll test out the optimizations tonight.
    – Evan Weissburg
    Jan 30 at 12:07










  • Actually, I screwed up the performance analysis because of the globals. I'll re-run it tonight and post an update. Sorry about that. It just occurred to me this morning.
    – user1118321
    Jan 30 at 16:32










  • OK, I've redone it. It's not drastically different, in that the same functions are implicated. (Not surprising, since the code is so small.) But the amounts are pretty different. 85% vs 56% for the highest cost function.
    – user1118321
    Jan 31 at 2:43










  • It's actually still too slow even when using a preallocated vector -- have you looked at the linked solution in the OP? Maybe that will give you more of a hint than it gave me about why my solution is too slow. I added my updated code to the OP as well.
    – Evan Weissburg
    Jan 31 at 3:30















up vote
5
down vote













This is a really interesting problem! Here are some things I noticed about your implementation.



Performance



Looking at a profile of your code (compiled with llvm on macOS High Sierra with Xcode 9.2), it looks like the majority of time is spent in these 3 places:



  • 85% in make_edge_list(), particularly the 2 calls to set::insert()

  • 7% in find_par()

  • 7% in std::tree::iterator::operator++() (so incrementing an iterator on your edge list, I believe)

What this means, I think, is that std::set is maybe not the best choice of container in this case. Since you can't preallocate space for items in a set like you can with a vector or array, it has to do the allocations on-the-fly, which can hamper performance. Also, choosing where to insert things seems to be part of the time sink of insert() as well. With a vector you can preallocate memory, and then push_back() just adds it to the end. It's not clear what you're getting by using a set here, so I say ditch it. The linked solution uses only a single vector and a static array of ~2000 x 2000 elements and does some sorting. So that might be a winning strategy here.



Naming



Macros have all sorts of problems. But even when you manage to write a macro that won't have any of those problems, as you did here, if there's a better way, you should use it. (For example, what if someone modifying this code makes a variable named ll or writes a string with the word "all" in it?) If you want your own shorter name for long long, then do:



using ll = long long;


or at the very least:



typedef long long ll;


But really - too lazy to type long long? Just type it out. It's a well-known type.



I'm not a fan of 1-letter variable names in most cases. The values A and B are the dimensions. I'd name them width and height. N and M are the number of vertical and horizontal fences, so why not name them num_vert_fences and num_horz_fences? Or even just rows and columns?



Avoid using namespace std



You should really avoid using using namespace std for all the reasons pointed out there.



Globals



The use of globals made it hard to read your code. Down in do_union(), I had to scroll all the way back up to the top of the file to see what type tree was. You should define these variables in main() and pass them to the functions that use them. That will make it much easier to understand the flow of the program and avoid problems where some function changes a variable you didn't pass it while you're using the same variable in another function.



I mentioned in my comment that the globals screwed up my performance analysis. This is a great example of the unintended consequences of globals. Because the code was running too fast to analyze, I took main(), renamed it main2(), and then wrote a new main() to simply call it 100 times in a loop. But because the tree and edge_list are global, they didn't get cleared out when main2() exited as they would have if they had been local. So that resulted in the numbers of my analysis being incorrect because the containers can have different characteristics when there are many more items in them than when there are fewer. Lesson learned. I should have made the variables local before running my performance test.



Be Careful with Less-Well-Known Tools



I really like this method of initializing the members of a struct or union:



uf_node n = .parent = curr, .level = 0;


I recently started using it and coworkers with many years of experience have been expressing surprise and concern over it! So just know that you might confuse some readers with it. It's the type of thing I'd like to see catch on, though, as it improves readability once you understand it.






share|improve this answer























  • Geometric scientific applications make wide use of quad-trees (2D) and oct-trees (3D) for improved insertion and search times. Here is an explanation of these structures for a GPU oriented application (the figures in the article help to understand how they work).
    – Jorge Bellón
    Jan 30 at 9:33











  • This is code for a (past) programming competition, some time counts especially with long long. Thanks for the advice, I'll test out the optimizations tonight.
    – Evan Weissburg
    Jan 30 at 12:07










  • Actually, I screwed up the performance analysis because of the globals. I'll re-run it tonight and post an update. Sorry about that. It just occurred to me this morning.
    – user1118321
    Jan 30 at 16:32










  • OK, I've redone it. It's not drastically different, in that the same functions are implicated. (Not surprising, since the code is so small.) But the amounts are pretty different. 85% vs 56% for the highest cost function.
    – user1118321
    Jan 31 at 2:43










  • It's actually still too slow even when using a preallocated vector -- have you looked at the linked solution in the OP? Maybe that will give you more of a hint than it gave me about why my solution is too slow. I added my updated code to the OP as well.
    – Evan Weissburg
    Jan 31 at 3:30













up vote
5
down vote










up vote
5
down vote









This is a really interesting problem! Here are some things I noticed about your implementation.



Performance



Looking at a profile of your code (compiled with llvm on macOS High Sierra with Xcode 9.2), it looks like the majority of time is spent in these 3 places:



  • 85% in make_edge_list(), particularly the 2 calls to set::insert()

  • 7% in find_par()

  • 7% in std::tree::iterator::operator++() (so incrementing an iterator on your edge list, I believe)

What this means, I think, is that std::set is maybe not the best choice of container in this case. Since you can't preallocate space for items in a set like you can with a vector or array, it has to do the allocations on-the-fly, which can hamper performance. Also, choosing where to insert things seems to be part of the time sink of insert() as well. With a vector you can preallocate memory, and then push_back() just adds it to the end. It's not clear what you're getting by using a set here, so I say ditch it. The linked solution uses only a single vector and a static array of ~2000 x 2000 elements and does some sorting. So that might be a winning strategy here.



Naming



Macros have all sorts of problems. But even when you manage to write a macro that won't have any of those problems, as you did here, if there's a better way, you should use it. (For example, what if someone modifying this code makes a variable named ll or writes a string with the word "all" in it?) If you want your own shorter name for long long, then do:



using ll = long long;


or at the very least:



typedef long long ll;


But really - too lazy to type long long? Just type it out. It's a well-known type.



I'm not a fan of 1-letter variable names in most cases. The values A and B are the dimensions. I'd name them width and height. N and M are the number of vertical and horizontal fences, so why not name them num_vert_fences and num_horz_fences? Or even just rows and columns?



Avoid using namespace std



You should really avoid using using namespace std for all the reasons pointed out there.



Globals



The use of globals made it hard to read your code. Down in do_union(), I had to scroll all the way back up to the top of the file to see what type tree was. You should define these variables in main() and pass them to the functions that use them. That will make it much easier to understand the flow of the program and avoid problems where some function changes a variable you didn't pass it while you're using the same variable in another function.



I mentioned in my comment that the globals screwed up my performance analysis. This is a great example of the unintended consequences of globals. Because the code was running too fast to analyze, I took main(), renamed it main2(), and then wrote a new main() to simply call it 100 times in a loop. But because the tree and edge_list are global, they didn't get cleared out when main2() exited as they would have if they had been local. So that resulted in the numbers of my analysis being incorrect because the containers can have different characteristics when there are many more items in them than when there are fewer. Lesson learned. I should have made the variables local before running my performance test.



Be Careful with Less-Well-Known Tools



I really like this method of initializing the members of a struct or union:



uf_node n = .parent = curr, .level = 0;


I recently started using it and coworkers with many years of experience have been expressing surprise and concern over it! So just know that you might confuse some readers with it. It's the type of thing I'd like to see catch on, though, as it improves readability once you understand it.






share|improve this answer















This is a really interesting problem! Here are some things I noticed about your implementation.



Performance



Looking at a profile of your code (compiled with llvm on macOS High Sierra with Xcode 9.2), it looks like the majority of time is spent in these 3 places:



  • 85% in make_edge_list(), particularly the 2 calls to set::insert()

  • 7% in find_par()

  • 7% in std::tree::iterator::operator++() (so incrementing an iterator on your edge list, I believe)

What this means, I think, is that std::set is maybe not the best choice of container in this case. Since you can't preallocate space for items in a set like you can with a vector or array, it has to do the allocations on-the-fly, which can hamper performance. Also, choosing where to insert things seems to be part of the time sink of insert() as well. With a vector you can preallocate memory, and then push_back() just adds it to the end. It's not clear what you're getting by using a set here, so I say ditch it. The linked solution uses only a single vector and a static array of ~2000 x 2000 elements and does some sorting. So that might be a winning strategy here.



Naming



Macros have all sorts of problems. But even when you manage to write a macro that won't have any of those problems, as you did here, if there's a better way, you should use it. (For example, what if someone modifying this code makes a variable named ll or writes a string with the word "all" in it?) If you want your own shorter name for long long, then do:



using ll = long long;


or at the very least:



typedef long long ll;


But really - too lazy to type long long? Just type it out. It's a well-known type.



I'm not a fan of 1-letter variable names in most cases. The values A and B are the dimensions. I'd name them width and height. N and M are the number of vertical and horizontal fences, so why not name them num_vert_fences and num_horz_fences? Or even just rows and columns?



Avoid using namespace std



You should really avoid using using namespace std for all the reasons pointed out there.



Globals



The use of globals made it hard to read your code. Down in do_union(), I had to scroll all the way back up to the top of the file to see what type tree was. You should define these variables in main() and pass them to the functions that use them. That will make it much easier to understand the flow of the program and avoid problems where some function changes a variable you didn't pass it while you're using the same variable in another function.



I mentioned in my comment that the globals screwed up my performance analysis. This is a great example of the unintended consequences of globals. Because the code was running too fast to analyze, I took main(), renamed it main2(), and then wrote a new main() to simply call it 100 times in a loop. But because the tree and edge_list are global, they didn't get cleared out when main2() exited as they would have if they had been local. So that resulted in the numbers of my analysis being incorrect because the containers can have different characteristics when there are many more items in them than when there are fewer. Lesson learned. I should have made the variables local before running my performance test.



Be Careful with Less-Well-Known Tools



I really like this method of initializing the members of a struct or union:



uf_node n = .parent = curr, .level = 0;


I recently started using it and coworkers with many years of experience have been expressing surprise and concern over it! So just know that you might confuse some readers with it. It's the type of thing I'd like to see catch on, though, as it improves readability once you understand it.







share|improve this answer















share|improve this answer



share|improve this answer








edited Jan 31 at 2:41


























answered Jan 30 at 4:45









user1118321

10.2k11144




10.2k11144











  • Geometric scientific applications make wide use of quad-trees (2D) and oct-trees (3D) for improved insertion and search times. Here is an explanation of these structures for a GPU oriented application (the figures in the article help to understand how they work).
    – Jorge Bellón
    Jan 30 at 9:33











  • This is code for a (past) programming competition, some time counts especially with long long. Thanks for the advice, I'll test out the optimizations tonight.
    – Evan Weissburg
    Jan 30 at 12:07










  • Actually, I screwed up the performance analysis because of the globals. I'll re-run it tonight and post an update. Sorry about that. It just occurred to me this morning.
    – user1118321
    Jan 30 at 16:32










  • OK, I've redone it. It's not drastically different, in that the same functions are implicated. (Not surprising, since the code is so small.) But the amounts are pretty different. 85% vs 56% for the highest cost function.
    – user1118321
    Jan 31 at 2:43










  • It's actually still too slow even when using a preallocated vector -- have you looked at the linked solution in the OP? Maybe that will give you more of a hint than it gave me about why my solution is too slow. I added my updated code to the OP as well.
    – Evan Weissburg
    Jan 31 at 3:30

















  • Geometric scientific applications make wide use of quad-trees (2D) and oct-trees (3D) for improved insertion and search times. Here is an explanation of these structures for a GPU oriented application (the figures in the article help to understand how they work).
    – Jorge Bellón
    Jan 30 at 9:33











  • This is code for a (past) programming competition, some time counts especially with long long. Thanks for the advice, I'll test out the optimizations tonight.
    – Evan Weissburg
    Jan 30 at 12:07










  • Actually, I screwed up the performance analysis because of the globals. I'll re-run it tonight and post an update. Sorry about that. It just occurred to me this morning.
    – user1118321
    Jan 30 at 16:32










  • OK, I've redone it. It's not drastically different, in that the same functions are implicated. (Not surprising, since the code is so small.) But the amounts are pretty different. 85% vs 56% for the highest cost function.
    – user1118321
    Jan 31 at 2:43










  • It's actually still too slow even when using a preallocated vector -- have you looked at the linked solution in the OP? Maybe that will give you more of a hint than it gave me about why my solution is too slow. I added my updated code to the OP as well.
    – Evan Weissburg
    Jan 31 at 3:30
















Geometric scientific applications make wide use of quad-trees (2D) and oct-trees (3D) for improved insertion and search times. Here is an explanation of these structures for a GPU oriented application (the figures in the article help to understand how they work).
– Jorge Bellón
Jan 30 at 9:33





Geometric scientific applications make wide use of quad-trees (2D) and oct-trees (3D) for improved insertion and search times. Here is an explanation of these structures for a GPU oriented application (the figures in the article help to understand how they work).
– Jorge Bellón
Jan 30 at 9:33













This is code for a (past) programming competition, some time counts especially with long long. Thanks for the advice, I'll test out the optimizations tonight.
– Evan Weissburg
Jan 30 at 12:07




This is code for a (past) programming competition, some time counts especially with long long. Thanks for the advice, I'll test out the optimizations tonight.
– Evan Weissburg
Jan 30 at 12:07












Actually, I screwed up the performance analysis because of the globals. I'll re-run it tonight and post an update. Sorry about that. It just occurred to me this morning.
– user1118321
Jan 30 at 16:32




Actually, I screwed up the performance analysis because of the globals. I'll re-run it tonight and post an update. Sorry about that. It just occurred to me this morning.
– user1118321
Jan 30 at 16:32












OK, I've redone it. It's not drastically different, in that the same functions are implicated. (Not surprising, since the code is so small.) But the amounts are pretty different. 85% vs 56% for the highest cost function.
– user1118321
Jan 31 at 2:43




OK, I've redone it. It's not drastically different, in that the same functions are implicated. (Not surprising, since the code is so small.) But the amounts are pretty different. 85% vs 56% for the highest cost function.
– user1118321
Jan 31 at 2:43












It's actually still too slow even when using a preallocated vector -- have you looked at the linked solution in the OP? Maybe that will give you more of a hint than it gave me about why my solution is too slow. I added my updated code to the OP as well.
– Evan Weissburg
Jan 31 at 3:30





It's actually still too slow even when using a preallocated vector -- have you looked at the linked solution in the OP? Maybe that will give you more of a hint than it gave me about why my solution is too slow. I added my updated code to the OP as well.
– Evan Weissburg
Jan 31 at 3:30













 

draft saved


draft discarded


























 


draft saved


draft discarded














StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f186290%2fperformance-of-c-union-find-solution-is-strangely-poor%23new-answer', 'question_page');

);

Post as a guest













































































Popular posts from this blog

Python Lists

Aion

JavaScript Array Iteration Methods