Performance of C++ Union-Find solution is strangely poor

.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty margin-bottom:0;

up vote
7
down vote

favorite

Problem Summary (USACO Gold 3 February 2016)

Given a rectangular coordinate plane of dimensions (A,B), there are N vertical fences, and M horizontal fences. Clearly, the intersection of these fences within the larger rectangle creates (n+1)(m+1) regions.

Given A,B,N,M, the x-coordinates of the N vertical fences and the y coordinates of the M horizontal fences, find the minimum amount of fence to remove to create a minimum spanning tree (connect all regions). In order to remove a fence and thus union two adjacent regions, the entire length of fence between them must be removed.

Solution

This is clearly a disjoint-set union problem (union-find/Kruskals), but the tricky part is building the edge-list and tree to union. Each edge is weighted by the length of fence between the two nodes that it connects.

Here is a quick outline of the code below:

Read in x,y coords of vertical, horizontal fences (also add 0,0,A,B as "fences")

For each intersection of the fences (some special cases at ends), add the left-facing and up-facing edges to edge list, and additionally add the region between the two edges as a node to the union-find tree.

Run Kruskals, keeping track of distance.

This is an O(ElogV) algorithm, which should be more than fast enough since N and M are less than 2000. Note that the given solution (linked above) is fast enough, where this takes to long (failure) for the last 4/10 test cases.

Download Test Cases (10)

Where is the performance bottleneck? How can this be improved?

#include <iostream>
#include <fstream>
#include <vector>
#include <set>
#include <algorithm>

#define ll long long

using namespace std;

const string PROJ_NAME = "fencedin";

struct edge 
 ll dist;
 int start;
 int finish;
 bool operator< (const edge &rhs) const 
;

struct uf_node 
 int parent;
 int level;
;

ll A, B;
int N, M;
set<edge> edge_list;
vector<uf_node> tree;

void make_edge_list(ifstream &fin)
 fin >> A >> B >> N >> M;

 vector<int> vert(N+2);
 vert[0] = 0;
 for(int i = 1; i<=N; i++) fin >> vert[i];
 vert[N+1] = A;
 N++;

 vector<int> hori(M+2);
 hori[0] = 0;
 for(int i = 1; i<=M; i++) fin >> hori[i];
 hori[M+1] = B;
 M++;

 sort(vert.begin(), vert.end());
 sort(hori.begin(), hori.end());

 tree.resize(N*M);
 for(int i = 1; i<=N; i++)
 for(int j = 1; j<=M; j++)
 int curr = (i-1)*M+(j-1);

 //Add node to DSU tree
 uf_node n = .parent = curr, .level = 0;
 tree[curr] = n;

 //Add leftward edge if not at bottom
 if(i != N)
 edge left = .dist = hori[j]-hori[j-1], .start=curr, .finish=curr+M;
 edge_list.insert(left);
 

 //Add upward edge if not at right
 if(j != M)
 edge up = .dist = vert[i]-vert[i-1], .start=curr, .finish=curr+1;
 edge_list.insert(up);
 
 
 


int find_par(int i) 
 if(i != tree[i].parent) 
 tree[i].parent = find_par(tree[i].parent);
 
 return tree[i].parent;


void do_union(int i, int j) 
 int r = find_par(i);
 int s = find_par(j);
 if(r == s) return;
 else if (tree[r].level > tree[s].level) 
 tree[r].parent = s;
 else if (tree[s].level > tree[r].level) 
 tree[s].parent = r;
 else 
 tree[r].parent = s;
 tree[r].level += 1;
 


int main()

 ifstream fin (PROJ_NAME + ".in");
 ofstream fout (PROJ_NAME + ".out");

 make_edge_list(fin);

 int total_dist = 0;
 for(edge e: edge_list)
 if(find_par(e.start) != find_par(e.finish))
 do_union(e.start, e.finish);
 total_dist += e.dist;
 
 

 fout << total_dist << endl;
 return 0;

edited Jan 31 at 5:01

Heslacher

43.9k359152

asked Jan 29 at 23:55

Evan Weissburg

1364

1

tree[r].level > tree[r].level - a typo?
â€“Â vnp
Jan 30 at 1:22

@vnp Edited, did not fix problem. Good catch.
â€“Â Evan Weissburg
Jan 30 at 1:26

Should the if in find_par be if (i == tree[i].parent)?
â€“Â 1201ProgramAlarm
Jan 30 at 1:48

1

Do you have any more input cases to test with than the example on the linked website? That particular one runs in less time than the profiler has to sample the program!
â€“Â user1118321
Jan 30 at 3:47

1

@user1118321 Yes: usaco.org/current/data/fencedin_gold_feb16.zip (added as link in OP)
â€“Â Evan Weissburg
Jan 30 at 3:49

Â |Â
show 5 more comments

up vote
7
down vote

favorite

Problem Summary (USACO Gold 3 February 2016)

Given a rectangular coordinate plane of dimensions (A,B), there are N vertical fences, and M horizontal fences. Clearly, the intersection of these fences within the larger rectangle creates (n+1)(m+1) regions.

Given A,B,N,M, the x-coordinates of the N vertical fences and the y coordinates of the M horizontal fences, find the minimum amount of fence to remove to create a minimum spanning tree (connect all regions). In order to remove a fence and thus union two adjacent regions, the entire length of fence between them must be removed.

Solution

Here is a quick outline of the code below:

Read in x,y coords of vertical, horizontal fences (also add 0,0,A,B as "fences")

For each intersection of the fences (some special cases at ends), add the left-facing and up-facing edges to edge list, and additionally add the region between the two edges as a node to the union-find tree.

Run Kruskals, keeping track of distance.

Download Test Cases (10)

Where is the performance bottleneck? How can this be improved?

#include <iostream>
#include <fstream>
#include <vector>
#include <set>
#include <algorithm>

#define ll long long

using namespace std;

const string PROJ_NAME = "fencedin";

struct edge 
 ll dist;
 int start;
 int finish;
 bool operator< (const edge &rhs) const 
;

struct uf_node 
 int parent;
 int level;
;

ll A, B;
int N, M;
set<edge> edge_list;
vector<uf_node> tree;

void make_edge_list(ifstream &fin)
 fin >> A >> B >> N >> M;

 vector<int> vert(N+2);
 vert[0] = 0;
 for(int i = 1; i<=N; i++) fin >> vert[i];
 vert[N+1] = A;
 N++;

 vector<int> hori(M+2);
 hori[0] = 0;
 for(int i = 1; i<=M; i++) fin >> hori[i];
 hori[M+1] = B;
 M++;

 sort(vert.begin(), vert.end());
 sort(hori.begin(), hori.end());

 tree.resize(N*M);
 for(int i = 1; i<=N; i++)
 for(int j = 1; j<=M; j++)
 int curr = (i-1)*M+(j-1);

 //Add node to DSU tree
 uf_node n = .parent = curr, .level = 0;
 tree[curr] = n;

 //Add leftward edge if not at bottom
 if(i != N)
 edge left = .dist = hori[j]-hori[j-1], .start=curr, .finish=curr+M;
 edge_list.insert(left);
 

 //Add upward edge if not at right
 if(j != M)
 edge up = .dist = vert[i]-vert[i-1], .start=curr, .finish=curr+1;
 edge_list.insert(up);
 
 
 


int find_par(int i) 
 if(i != tree[i].parent) 
 tree[i].parent = find_par(tree[i].parent);
 
 return tree[i].parent;


void do_union(int i, int j) 
 int r = find_par(i);
 int s = find_par(j);
 if(r == s) return;
 else if (tree[r].level > tree[s].level) 
 tree[r].parent = s;
 else if (tree[s].level > tree[r].level) 
 tree[s].parent = r;
 else 
 tree[r].parent = s;
 tree[r].level += 1;
 


int main()

 ifstream fin (PROJ_NAME + ".in");
 ofstream fout (PROJ_NAME + ".out");

 make_edge_list(fin);

 int total_dist = 0;
 for(edge e: edge_list)
 if(find_par(e.start) != find_par(e.finish))
 do_union(e.start, e.finish);
 total_dist += e.dist;
 
 

 fout << total_dist << endl;
 return 0;

edited Jan 31 at 5:01

Heslacher

43.9k359152

asked Jan 29 at 23:55

Evan Weissburg

1364

1

tree[r].level > tree[r].level - a typo?
â€“Â vnp
Jan 30 at 1:22

@vnp Edited, did not fix problem. Good catch.
â€“Â Evan Weissburg
Jan 30 at 1:26

Should the if in find_par be if (i == tree[i].parent)?
â€“Â 1201ProgramAlarm
Jan 30 at 1:48

1

Do you have any more input cases to test with than the example on the linked website? That particular one runs in less time than the profiler has to sample the program!
â€“Â user1118321
Jan 30 at 3:47

1

@user1118321 Yes: usaco.org/current/data/fencedin_gold_feb16.zip (added as link in OP)
â€“Â Evan Weissburg
Jan 30 at 3:49

Â |Â
show 5 more comments

up vote
7
down vote

favorite

Problem Summary (USACO Gold 3 February 2016)

Given a rectangular coordinate plane of dimensions (A,B), there are N vertical fences, and M horizontal fences. Clearly, the intersection of these fences within the larger rectangle creates (n+1)(m+1) regions.

Given A,B,N,M, the x-coordinates of the N vertical fences and the y coordinates of the M horizontal fences, find the minimum amount of fence to remove to create a minimum spanning tree (connect all regions). In order to remove a fence and thus union two adjacent regions, the entire length of fence between them must be removed.

Solution

Here is a quick outline of the code below:

Read in x,y coords of vertical, horizontal fences (also add 0,0,A,B as "fences")

For each intersection of the fences (some special cases at ends), add the left-facing and up-facing edges to edge list, and additionally add the region between the two edges as a node to the union-find tree.

Run Kruskals, keeping track of distance.

Download Test Cases (10)

Where is the performance bottleneck? How can this be improved?

#include <iostream>
#include <fstream>
#include <vector>
#include <set>
#include <algorithm>

#define ll long long

using namespace std;

const string PROJ_NAME = "fencedin";

struct edge 
 ll dist;
 int start;
 int finish;
 bool operator< (const edge &rhs) const 
;

struct uf_node 
 int parent;
 int level;
;

ll A, B;
int N, M;
set<edge> edge_list;
vector<uf_node> tree;

void make_edge_list(ifstream &fin)
 fin >> A >> B >> N >> M;

 vector<int> vert(N+2);
 vert[0] = 0;
 for(int i = 1; i<=N; i++) fin >> vert[i];
 vert[N+1] = A;
 N++;

 vector<int> hori(M+2);
 hori[0] = 0;
 for(int i = 1; i<=M; i++) fin >> hori[i];
 hori[M+1] = B;
 M++;

 sort(vert.begin(), vert.end());
 sort(hori.begin(), hori.end());

 tree.resize(N*M);
 for(int i = 1; i<=N; i++)
 for(int j = 1; j<=M; j++)
 int curr = (i-1)*M+(j-1);

 //Add node to DSU tree
 uf_node n = .parent = curr, .level = 0;
 tree[curr] = n;

 //Add leftward edge if not at bottom
 if(i != N)
 edge left = .dist = hori[j]-hori[j-1], .start=curr, .finish=curr+M;
 edge_list.insert(left);
 

 //Add upward edge if not at right
 if(j != M)
 edge up = .dist = vert[i]-vert[i-1], .start=curr, .finish=curr+1;
 edge_list.insert(up);
 
 
 


int find_par(int i) 
 if(i != tree[i].parent) 
 tree[i].parent = find_par(tree[i].parent);
 
 return tree[i].parent;


void do_union(int i, int j) 
 int r = find_par(i);
 int s = find_par(j);
 if(r == s) return;
 else if (tree[r].level > tree[s].level) 
 tree[r].parent = s;
 else if (tree[s].level > tree[r].level) 
 tree[s].parent = r;
 else 
 tree[r].parent = s;
 tree[r].level += 1;
 


int main()

 ifstream fin (PROJ_NAME + ".in");
 ofstream fout (PROJ_NAME + ".out");

 make_edge_list(fin);

 int total_dist = 0;
 for(edge e: edge_list)
 if(find_par(e.start) != find_par(e.finish))
 do_union(e.start, e.finish);
 total_dist += e.dist;
 
 

 fout << total_dist << endl;
 return 0;

edited Jan 31 at 5:01

Heslacher

43.9k359152

asked Jan 29 at 23:55

Evan Weissburg

1364

Problem Summary (USACO Gold 3 February 2016)

Given a rectangular coordinate plane of dimensions (A,B), there are N vertical fences, and M horizontal fences. Clearly, the intersection of these fences within the larger rectangle creates (n+1)(m+1) regions.

Given A,B,N,M, the x-coordinates of the N vertical fences and the y coordinates of the M horizontal fences, find the minimum amount of fence to remove to create a minimum spanning tree (connect all regions). In order to remove a fence and thus union two adjacent regions, the entire length of fence between them must be removed.

Solution

Here is a quick outline of the code below:

Read in x,y coords of vertical, horizontal fences (also add 0,0,A,B as "fences")

For each intersection of the fences (some special cases at ends), add the left-facing and up-facing edges to edge list, and additionally add the region between the two edges as a node to the union-find tree.

Run Kruskals, keeping track of distance.

Download Test Cases (10)

Where is the performance bottleneck? How can this be improved?

#include <iostream>
#include <fstream>
#include <vector>
#include <set>
#include <algorithm>

#define ll long long

using namespace std;

const string PROJ_NAME = "fencedin";

struct edge 
 ll dist;
 int start;
 int finish;
 bool operator< (const edge &rhs) const 
;

struct uf_node 
 int parent;
 int level;
;

ll A, B;
int N, M;
set<edge> edge_list;
vector<uf_node> tree;

void make_edge_list(ifstream &fin)
 fin >> A >> B >> N >> M;

 vector<int> vert(N+2);
 vert[0] = 0;
 for(int i = 1; i<=N; i++) fin >> vert[i];
 vert[N+1] = A;
 N++;

 vector<int> hori(M+2);
 hori[0] = 0;
 for(int i = 1; i<=M; i++) fin >> hori[i];
 hori[M+1] = B;
 M++;

 sort(vert.begin(), vert.end());
 sort(hori.begin(), hori.end());

 tree.resize(N*M);
 for(int i = 1; i<=N; i++)
 for(int j = 1; j<=M; j++)
 int curr = (i-1)*M+(j-1);

 //Add node to DSU tree
 uf_node n = .parent = curr, .level = 0;
 tree[curr] = n;

 //Add leftward edge if not at bottom
 if(i != N)
 edge left = .dist = hori[j]-hori[j-1], .start=curr, .finish=curr+M;
 edge_list.insert(left);
 

 //Add upward edge if not at right
 if(j != M)
 edge up = .dist = vert[i]-vert[i-1], .start=curr, .finish=curr+1;
 edge_list.insert(up);
 
 
 


int find_par(int i) 
 if(i != tree[i].parent) 
 tree[i].parent = find_par(tree[i].parent);
 
 return tree[i].parent;


void do_union(int i, int j) 
 int r = find_par(i);
 int s = find_par(j);
 if(r == s) return;
 else if (tree[r].level > tree[s].level) 
 tree[r].parent = s;
 else if (tree[s].level > tree[r].level) 
 tree[s].parent = r;
 else 
 tree[r].parent = s;
 tree[r].level += 1;
 


int main()

 ifstream fin (PROJ_NAME + ".in");
 ofstream fout (PROJ_NAME + ".out");

 make_edge_list(fin);

 int total_dist = 0;
 for(edge e: edge_list)
 if(find_par(e.start) != find_par(e.finish))
 do_union(e.start, e.finish);
 total_dist += e.dist;
 
 

 fout << total_dist << endl;
 return 0;

edited Jan 31 at 5:01

Heslacher

43.9k359152

asked Jan 29 at 23:55

Evan Weissburg

1364

edited Jan 31 at 5:01

Heslacher

43.9k359152

edited Jan 31 at 5:01

Heslacher

43.9k359152

edited Jan 31 at 5:01

Heslacher

43.9k359152

asked Jan 29 at 23:55

Evan Weissburg

1364

asked Jan 29 at 23:55

Evan Weissburg

1364

asked Jan 29 at 23:55

Evan Weissburg

1364

1

tree[r].level > tree[r].level - a typo?
â€“Â vnp
Jan 30 at 1:22

@vnp Edited, did not fix problem. Good catch.
â€“Â Evan Weissburg
Jan 30 at 1:26

Should the if in find_par be if (i == tree[i].parent)?
â€“Â 1201ProgramAlarm
Jan 30 at 1:48

1

Do you have any more input cases to test with than the example on the linked website? That particular one runs in less time than the profiler has to sample the program!
â€“Â user1118321
Jan 30 at 3:47

1

@user1118321 Yes: usaco.org/current/data/fencedin_gold_feb16.zip (added as link in OP)
â€“Â Evan Weissburg
Jan 30 at 3:49

Â |Â
show 5 more comments

1

tree[r].level > tree[r].level - a typo?
â€“Â vnp
Jan 30 at 1:22

@vnp Edited, did not fix problem. Good catch.
â€“Â Evan Weissburg
Jan 30 at 1:26

Should the if in find_par be if (i == tree[i].parent)?
â€“Â 1201ProgramAlarm
Jan 30 at 1:48

1

Do you have any more input cases to test with than the example on the linked website? That particular one runs in less time than the profiler has to sample the program!
â€“Â user1118321
Jan 30 at 3:47

1

@user1118321 Yes: usaco.org/current/data/fencedin_gold_feb16.zip (added as link in OP)
â€“Â Evan Weissburg
Jan 30 at 3:49

tree[r].level > tree[r].level - a typo?
â€“Â vnp
Jan 30 at 1:22

@vnp Edited, did not fix problem. Good catch.
â€“Â Evan Weissburg
Jan 30 at 1:26

Should the if in find_par be if (i == tree[i].parent)?
â€“Â 1201ProgramAlarm
Jan 30 at 1:48

Do you have any more input cases to test with than the example on the linked website? That particular one runs in less time than the profiler has to sample the program!
â€“Â user1118321
Jan 30 at 3:47

@user1118321 Yes: usaco.org/current/data/fencedin_gold_feb16.zip (added as link in OP)
â€“Â Evan Weissburg
Jan 30 at 3:49

Â |Â
show 5 more comments

1 Answer
1

active

oldest

votes

up vote
5
down vote

This is a really interesting problem! Here are some things I noticed about your implementation.

Performance

Looking at a profile of your code (compiled with llvm on macOS High Sierra with Xcode 9.2), it looks like the majority of time is spent in these 3 places:

85% in make_edge_list(), particularly the 2 calls to set::insert()

7% in find_par()

7% in std::tree::iterator::operator++() (so incrementing an iterator on your edge list, I believe)

What this means, I think, is that std::set is maybe not the best choice of container in this case. Since you can't preallocate space for items in a set like you can with a vector or array, it has to do the allocations on-the-fly, which can hamper performance. Also, choosing where to insert things seems to be part of the time sink of insert() as well. With a vector you can preallocate memory, and then push_back() just adds it to the end. It's not clear what you're getting by using a set here, so I say ditch it. The linked solution uses only a single vector and a static array of ~2000 x 2000 elements and does some sorting. So that might be a winning strategy here.

Naming

Macros have all sorts of problems. But even when you manage to write a macro that won't have any of those problems, as you did here, if there's a better way, you should use it. (For example, what if someone modifying this code makes a variable named ll or writes a string with the word "all" in it?) If you want your own shorter name for long long, then do:

using ll = long long;

or at the very least:

typedef long long ll;

But really - too lazy to type long long? Just type it out. It's a well-known type.

I'm not a fan of 1-letter variable names in most cases. The values A and B are the dimensions. I'd name them width and height. N and M are the number of vertical and horizontal fences, so why not name them num_vert_fences and num_horz_fences? Or even just rows and columns?

Avoid `using namespace std`

You should really avoid using using namespace std for all the reasons pointed out there.

Globals

The use of globals made it hard to read your code. Down in do_union(), I had to scroll all the way back up to the top of the file to see what type tree was. You should define these variables in main() and pass them to the functions that use them. That will make it much easier to understand the flow of the program and avoid problems where some function changes a variable you didn't pass it while you're using the same variable in another function.

I mentioned in my comment that the globals screwed up my performance analysis. This is a great example of the unintended consequences of globals. Because the code was running too fast to analyze, I took main(), renamed it main2(), and then wrote a new main() to simply call it 100 times in a loop. But because the tree and edge_list are global, they didn't get cleared out when main2() exited as they would have if they had been local. So that resulted in the numbers of my analysis being incorrect because the containers can have different characteristics when there are many more items in them than when there are fewer. Lesson learned. I should have made the variables local before running my performance test.

Be Careful with Less-Well-Known Tools

I really like this method of initializing the members of a struct or union:

uf_node n = .parent = curr, .level = 0;

I recently started using it and coworkers with many years of experience have been expressing surprise and concern over it! So just know that you might confuse some readers with it. It's the type of thing I'd like to see catch on, though, as it improves readability once you understand it.

edited Jan 31 at 2:41

answered Jan 30 at 4:45

user1118321

10.2k11144

Geometric scientific applications make wide use of quad-trees (2D) and oct-trees (3D) for improved insertion and search times. Here is an explanation of these structures for a GPU oriented application (the figures in the article help to understand how they work).
â€“Â Jorge BellÃ³n
Jan 30 at 9:33

This is code for a (past) programming competition, some time counts especially with long long. Thanks for the advice, I'll test out the optimizations tonight.
â€“Â Evan Weissburg
Jan 30 at 12:07

Actually, I screwed up the performance analysis because of the globals. I'll re-run it tonight and post an update. Sorry about that. It just occurred to me this morning.
â€“Â user1118321
Jan 30 at 16:32

OK, I've redone it. It's not drastically different, in that the same functions are implicated. (Not surprising, since the code is so small.) But the amounts are pretty different. 85% vs 56% for the highest cost function.
â€“Â user1118321
Jan 31 at 2:43

It's actually still too slow even when using a preallocated vector -- have you looked at the linked solution in the OP? Maybe that will give you more of a hint than it gave me about why my solution is too slow. I added my updated code to the OP as well.
â€“Â Evan Weissburg
Jan 31 at 3:30

Â |Â
show 2 more comments

Your Answer

StackExchange.ifUsing("editor", function ()
return StackExchange.using("mathjaxEditing", function ()
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
);
);
, "mathjax-editing");

StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "196"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
convertImagesToLinks: false,
noModals: false,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f186290%2fperformance-of-c-union-find-solution-is-strangely-poor%23new-answer', 'question_page');

);

Post as a guest

Name

1 Answer
1

active

oldest

votes

1 Answer
1

active

oldest

votes

up vote
5
down vote

This is a really interesting problem! Here are some things I noticed about your implementation.

Performance

Looking at a profile of your code (compiled with llvm on macOS High Sierra with Xcode 9.2), it looks like the majority of time is spent in these 3 places:

85% in make_edge_list(), particularly the 2 calls to set::insert()

7% in find_par()

7% in std::tree::iterator::operator++() (so incrementing an iterator on your edge list, I believe)

Naming

using ll = long long;

or at the very least:

typedef long long ll;

But really - too lazy to type long long? Just type it out. It's a well-known type.

Avoid `using namespace std`

You should really avoid using using namespace std for all the reasons pointed out there.

Globals

Be Careful with Less-Well-Known Tools

I really like this method of initializing the members of a struct or union:

uf_node n = .parent = curr, .level = 0;

edited Jan 31 at 2:41

answered Jan 30 at 4:45

user1118321

10.2k11144

Geometric scientific applications make wide use of quad-trees (2D) and oct-trees (3D) for improved insertion and search times. Here is an explanation of these structures for a GPU oriented application (the figures in the article help to understand how they work).
â€“Â Jorge BellÃ³n
Jan 30 at 9:33

This is code for a (past) programming competition, some time counts especially with long long. Thanks for the advice, I'll test out the optimizations tonight.
â€“Â Evan Weissburg
Jan 30 at 12:07

Actually, I screwed up the performance analysis because of the globals. I'll re-run it tonight and post an update. Sorry about that. It just occurred to me this morning.
â€“Â user1118321
Jan 30 at 16:32

OK, I've redone it. It's not drastically different, in that the same functions are implicated. (Not surprising, since the code is so small.) But the amounts are pretty different. 85% vs 56% for the highest cost function.
â€“Â user1118321
Jan 31 at 2:43

It's actually still too slow even when using a preallocated vector -- have you looked at the linked solution in the OP? Maybe that will give you more of a hint than it gave me about why my solution is too slow. I added my updated code to the OP as well.
â€“Â Evan Weissburg
Jan 31 at 3:30

Â |Â
show 2 more comments

up vote
5
down vote

This is a really interesting problem! Here are some things I noticed about your implementation.

Performance

Looking at a profile of your code (compiled with llvm on macOS High Sierra with Xcode 9.2), it looks like the majority of time is spent in these 3 places:

85% in make_edge_list(), particularly the 2 calls to set::insert()

7% in find_par()

7% in std::tree::iterator::operator++() (so incrementing an iterator on your edge list, I believe)

Naming

using ll = long long;

or at the very least:

typedef long long ll;

But really - too lazy to type long long? Just type it out. It's a well-known type.

Avoid `using namespace std`

You should really avoid using using namespace std for all the reasons pointed out there.

Globals

Be Careful with Less-Well-Known Tools

I really like this method of initializing the members of a struct or union:

uf_node n = .parent = curr, .level = 0;

edited Jan 31 at 2:41

answered Jan 30 at 4:45

user1118321

10.2k11144

Geometric scientific applications make wide use of quad-trees (2D) and oct-trees (3D) for improved insertion and search times. Here is an explanation of these structures for a GPU oriented application (the figures in the article help to understand how they work).
â€“Â Jorge BellÃ³n
Jan 30 at 9:33

This is code for a (past) programming competition, some time counts especially with long long. Thanks for the advice, I'll test out the optimizations tonight.
â€“Â Evan Weissburg
Jan 30 at 12:07

Actually, I screwed up the performance analysis because of the globals. I'll re-run it tonight and post an update. Sorry about that. It just occurred to me this morning.
â€“Â user1118321
Jan 30 at 16:32

OK, I've redone it. It's not drastically different, in that the same functions are implicated. (Not surprising, since the code is so small.) But the amounts are pretty different. 85% vs 56% for the highest cost function.
â€“Â user1118321
Jan 31 at 2:43

It's actually still too slow even when using a preallocated vector -- have you looked at the linked solution in the OP? Maybe that will give you more of a hint than it gave me about why my solution is too slow. I added my updated code to the OP as well.
â€“Â Evan Weissburg
Jan 31 at 3:30

Â |Â
show 2 more comments

up vote
5
down vote

This is a really interesting problem! Here are some things I noticed about your implementation.

Performance

Looking at a profile of your code (compiled with llvm on macOS High Sierra with Xcode 9.2), it looks like the majority of time is spent in these 3 places:

85% in make_edge_list(), particularly the 2 calls to set::insert()

7% in find_par()

7% in std::tree::iterator::operator++() (so incrementing an iterator on your edge list, I believe)

Naming

using ll = long long;

or at the very least:

typedef long long ll;

But really - too lazy to type long long? Just type it out. It's a well-known type.

Avoid `using namespace std`

You should really avoid using using namespace std for all the reasons pointed out there.

Globals

Be Careful with Less-Well-Known Tools

I really like this method of initializing the members of a struct or union:

uf_node n = .parent = curr, .level = 0;

edited Jan 31 at 2:41

answered Jan 30 at 4:45

user1118321

10.2k11144

This is a really interesting problem! Here are some things I noticed about your implementation.

Performance

Looking at a profile of your code (compiled with llvm on macOS High Sierra with Xcode 9.2), it looks like the majority of time is spent in these 3 places:

85% in make_edge_list(), particularly the 2 calls to set::insert()

7% in find_par()

7% in std::tree::iterator::operator++() (so incrementing an iterator on your edge list, I believe)

Naming

using ll = long long;

or at the very least:

typedef long long ll;

But really - too lazy to type long long? Just type it out. It's a well-known type.

Avoid `using namespace std`

You should really avoid using using namespace std for all the reasons pointed out there.

Globals

Be Careful with Less-Well-Known Tools

I really like this method of initializing the members of a struct or union:

uf_node n = .parent = curr, .level = 0;

edited Jan 31 at 2:41

answered Jan 30 at 4:45

user1118321

10.2k11144

edited Jan 31 at 2:41

answered Jan 30 at 4:45

user1118321

10.2k11144

answered Jan 30 at 4:45

user1118321

10.2k11144

answered Jan 30 at 4:45

user1118321

10.2k11144

Geometric scientific applications make wide use of quad-trees (2D) and oct-trees (3D) for improved insertion and search times. Here is an explanation of these structures for a GPU oriented application (the figures in the article help to understand how they work).
â€“Â Jorge BellÃ³n
Jan 30 at 9:33

This is code for a (past) programming competition, some time counts especially with long long. Thanks for the advice, I'll test out the optimizations tonight.
â€“Â Evan Weissburg
Jan 30 at 12:07

Actually, I screwed up the performance analysis because of the globals. I'll re-run it tonight and post an update. Sorry about that. It just occurred to me this morning.
â€“Â user1118321
Jan 30 at 16:32

OK, I've redone it. It's not drastically different, in that the same functions are implicated. (Not surprising, since the code is so small.) But the amounts are pretty different. 85% vs 56% for the highest cost function.
â€“Â user1118321
Jan 31 at 2:43

It's actually still too slow even when using a preallocated vector -- have you looked at the linked solution in the OP? Maybe that will give you more of a hint than it gave me about why my solution is too slow. I added my updated code to the OP as well.
â€“Â Evan Weissburg
Jan 31 at 3:30

Â |Â
show 2 more comments

Geometric scientific applications make wide use of quad-trees (2D) and oct-trees (3D) for improved insertion and search times. Here is an explanation of these structures for a GPU oriented application (the figures in the article help to understand how they work).
â€“Â Jorge BellÃ³n
Jan 30 at 9:33

This is code for a (past) programming competition, some time counts especially with long long. Thanks for the advice, I'll test out the optimizations tonight.
â€“Â Evan Weissburg
Jan 30 at 12:07

Actually, I screwed up the performance analysis because of the globals. I'll re-run it tonight and post an update. Sorry about that. It just occurred to me this morning.
â€“Â user1118321
Jan 30 at 16:32

OK, I've redone it. It's not drastically different, in that the same functions are implicated. (Not surprising, since the code is so small.) But the amounts are pretty different. 85% vs 56% for the highest cost function.
â€“Â user1118321
Jan 31 at 2:43

It's actually still too slow even when using a preallocated vector -- have you looked at the linked solution in the OP? Maybe that will give you more of a hint than it gave me about why my solution is too slow. I added my updated code to the OP as well.
â€“Â Evan Weissburg
Jan 31 at 3:30

Geometric scientific applications make wide use of quad-trees (2D) and oct-trees (3D) for improved insertion and search times. Here is an explanation of these structures for a GPU oriented application (the figures in the article help to understand how they work).
â€“Â Jorge BellÃ³n
Jan 30 at 9:33

This is code for a (past) programming competition, some time counts especially with long long. Thanks for the advice, I'll test out the optimizations tonight.
â€“Â Evan Weissburg
Jan 30 at 12:07

Actually, I screwed up the performance analysis because of the globals. I'll re-run it tonight and post an update. Sorry about that. It just occurred to me this morning.
â€“Â user1118321
Jan 30 at 16:32

OK, I've redone it. It's not drastically different, in that the same functions are implicated. (Not surprising, since the code is so small.) But the amounts are pretty different. 85% vs 56% for the highest cost function.
â€“Â user1118321
Jan 31 at 2:43

It's actually still too slow even when using a preallocated vector -- have you looked at the linked solution in the OP? Maybe that will give you more of a hint than it gave me about why my solution is too slow. I added my updated code to the OP as well.
â€“Â Evan Weissburg
Jan 31 at 3:30

Â |Â
show 2 more comments

draft saved

draft discarded

draft saved

draft discarded

Post as a guest

Name

搜尋此網誌

trjhtr

Performance of C++ Union-Find solution is strangely poor

1 Answer
1

Performance

Naming

Avoid `using namespace std`

Globals

Be Careful with Less-Well-Known Tools

Your Answer

Post as a guest

1 Answer
1

1 Answer
1

Performance

Naming

Avoid `using namespace std`

Globals

Be Careful with Less-Well-Known Tools

Performance

Naming

Avoid `using namespace std`

Globals

Be Careful with Less-Well-Known Tools

Performance

Naming

Avoid `using namespace std`

Globals

Be Careful with Less-Well-Known Tools

Performance

Naming

Avoid `using namespace std`

Globals

Be Careful with Less-Well-Known Tools

Post as a guest

Popular posts from this blog

Python Lists

Aion

JavaScript Array Iteration Methods

Performance of C++ Union-Find solution is strangely poor

1 Answer 1

Performance

Naming

Avoid using namespace std

Globals

Be Careful with Less-Well-Known Tools

Your Answer

Sign up or log in

Post as a guest

Post as a guest

1 Answer 1

1 Answer 1

Performance

Naming

Avoid using namespace std

Globals

Be Careful with Less-Well-Known Tools

Performance

Naming

Avoid using namespace std

Globals

Be Careful with Less-Well-Known Tools

Performance

Naming

Avoid using namespace std

Globals

Be Careful with Less-Well-Known Tools

Performance

Naming

Avoid using namespace std

Globals

Be Careful with Less-Well-Known Tools

Sign up or log in

Post as a guest

Post as a guest

Sign up or log in

Post as a guest

Sign up or log in

Post as a guest

Sign up or log in

Post as a guest

Popular posts from this blog

Python Lists

Aion

JavaScript Array Iteration Methods

1 Answer
1

Avoid `using namespace std`

1 Answer
1

1 Answer
1

Avoid `using namespace std`

Avoid `using namespace std`

Avoid `using namespace std`

Avoid `using namespace std`