trjhtr

Question

Coming from a Java background, and being fairly new to C++ as well as code review, I am trying to get accustomed to C++11/14 concepts. I have been reading Modern Effective C++ by Scott Meyers to internalize the concepts but I would love to hear some feedback on the implementation(s) below. I want to also add that the algorithm for finding the strongly connected components works, and I have tested it. However, before moving on I want to ensure I am writing proper C++ code.

Vertex class:

enum status:intunvisited,visiting,visited;
template <class T>
class vertex

 using vertex_sp = std::shared_ptr<vertex<T>>;
 using vertex_wp = std::weak_ptr<vertex<T>>;
 using neighbors = std::vector<vertex_wp>;
 public:
 static int time;

 explicit vertex(T l) : label_(l), pre_(0), post_(0), status_(status::unvisited) 
 const T& get_label() const return label_; 
 const neighbors& get_neighbors() const return neighbors_; 
 bool is_unvisited() const return status_ == unvisited; 
 bool is_visiting() const return status_ == visiting; 
 bool is_visited() const return status_ == visited; 
 int get_pre() const return pre_; 
 int get_post() const return post_; 

 void add_neighbor(const vertex_sp& v) neighbors_.emplace_back(v); 
 void set_unvisited() status_ = unvisited; 
 void set_status(status s)
 
 status_ = s;
 if (s == visiting) pre_ = ++time;
 if (s == visited) post_ = ++time;
 

 friend std::ostream& operator<<(std::ostream& out, const vertex_wp& v) return out << v.lock()->label_; 
 friend std::ostream& operator<<(std::ostream& out, const vertex_sp& v) return out << v->label_; 

 template <class E> friend class graph;
 template <class E> friend class edge;
 private:
 T label_;
 neighbors neighbors_;
 int pre_, post_;
 status status_;
;

Edge Class:

template <class T>
class edge

 using vertex_sp = std::shared_ptr<vertex<T>>;
 public:
 edge(const vertex_sp& from, const vertex_sp& to, const int cost=1): 
 from_(from), to_(to), cost_(cost) 
 const vertex_sp& get_from() const return from_; 
 const vertex_sp& get_to() const return to_; 
 int get_cost() const return cost_; 
 friend std::ostream& operator<<(std::ostream& out, const edge& v) return out << "(" << v.from_ << "," << v.to_ << "," << v.cost_ << ")"; 

 template <class E> friend class graph;
 private:
 const vertex_sp& from_;
 const vertex_sp& to_;
 const int cost_;
;

Graph Class:

enum class type:intdirected,undirected;

template <class T>
class graph

 using vertex_sp = std::shared_ptr<vertex<T>>;
 using vertices = std::unordered_set<vertex_sp>;
 using edges = std::vector<edge<T>>;
 using adj_list = std::unordered_map<vertex_sp, edges>;
public:
 graph() = default;
 explicit graph(const type t=type::undirected): type_(t) 

 const vertices& get_vertices() const return vertices_; 
 const edges& get_edges() const return edges_; 
 const edges& adj(const vertex_sp& v) const return adj_.at(v); 
 const adj_list& get_adj_list() return adj_; 

 void add_edge(const vertex_sp& from, const vertex_sp& to, const int cost=1)
 
 from->add_neighbor(to);
 vertices_.insert(from), vertices_.insert(to);
 edges_.emplace_back(from, to, cost);
 adj_[from].emplace_back(from, to, cost);

 if (type_ == type::undirected)
 
 to->add_neighbor(from);
 edges_.emplace_back(to, from, cost);
 adj_[from].emplace_back(to, from, cost);
 
 

 std::unique_ptr<graph<T>> get_transpose()
 
 for (const auto& v : vertices_) v->neighbors_.clear(); 
 auto graph_t = std::make_unique<graph<T>>(type_);
 for (const auto& edge : edges_)
 
 if (edge.from_ && edge.to_)
 
 edge.from_->set_status(unvisited);
 edge.to_->set_status(unvisited);
 graph_t->add_edge(std::move(edge.to_), std::move(edge.from_), edge.cost_);
 
 
 return graph_t;
 
private:
 vertices vertices_;
 edges edges_;
 adj_list adj_;
 type type_;
;

Kosaraju Implementation:

template <class T>
int vertex<T>::time = 0;

template <class T>
void time_stamps(const vertex_sp<T>& vertex_ptr, stack<vertex_sp<T>>* s)

 auto *u = vertex_ptr.get();
 u->set_status(visiting);

 for (const auto& v : vertex_ptr->get_neighbors())
 
 auto v_sp = v.lock(); 
 if (v_sp->is_unvisited()) 
 
 time_stamps(v_sp, s);
 
 

 u->set_status(visited);
 s->push(vertex_ptr);
 

template <class T>
void dfs(const vertex_sp<T>& vertex_ptr, vector<T>* partial)

 auto *u = vertex_ptr.get();
 u->set_status(visiting);
 partial->emplace_back(u->get_label());

 for (const auto& v : vertex_ptr->get_neighbors())
 
 auto v_sp = v.lock();
 if (v_sp->is_unvisited())
 
 dfs(v_sp, partial);
 
 

 u->set_status(visited);
 

template <class T>
vector<vector<T>> get_scc(const graph_up<T>& graph_ptr)

 vector<vector<T>> sccs;
 stack<vertex_sp<T>> times;
 for (const auto &v : graph_ptr->get_vertices())
 
 if (v->is_unvisited())
 
 time_stamps(v, &times);
 
 
 auto graph_t_ptr = graph_ptr->get_transpose();

 while (!times.empty())
 
 const auto& curr = times.top();
 times.pop();

 vector<T> partial;
 if (curr->is_unvisited())
 
 dfs(curr, &partial);
 sccs.emplace_back(partial);
 
 
 return sccs;

My main concerns are:

I am writing C++ like Java code.

I am using a shared pointer where a unique pointer would suffice. I thought about it for a while before using a shared pointer and opted for shared pointer because I keep several pointers to the vertices since they get copied to several containers and thus having a shared ownership.

I am not transposing the graph efficiently, I have read that std::move has 0 overhead so I opted to use that instead of creating additional copies.

Being new to operator overloading, are my uses above correct? Am I using the friend keyword properly?

What tips do you all have on improving my C++ style in general, mainly on how I can improve efficiency or more importantly understand when dynamic allocation is necessary and when mere objects suffice (I want to ensure I am not abusing dynamic allocation).

Thanks in advance!

Giel 1161 · Accepted Answer · 2018-03-20 10:59:42Z

There are several issues that I see:

First of all: you're using a lot of type aliases (using T =, aka typedefs). This might be considered to be just a personal preference but I'll mention it anyway. Having to constantly refer back to your type aliases for basic stuff like std::shared_ptr<T>, std::weak_ptr<T> and std::vector<T> makes for very slow reading. This is because the original type names are easily recognized by anyone with some experience in C++ and it's standard library. You're custom type names however, don't have this familiarity advantage.

Smart pointers by reference

You're passing (smart) pointers by (const) reference. Sometimes to take ownership of the pointed to object, sometimes just to temporarily operate on it.

E.g.:

template <class T>
void dfs(const vertex_sp<T>& vertex_ptr, ...) /* just using vertex_ptr */

vs:

void add_neighbor(const vertex_sp& v) neighbors_.emplace_back(v);

In general you want to pass (smart) pointers by value when you're transferring ownership and std::move them to the place that you're intending to store them. E.g. the neighbors_ vector in this case, but also edge's constructor and probably a few other locations.

Transferring ownership by passing by value has the additional advantage, if you're consistently using std::move, that it can avoid a single extra copy when the function is called with a temporary. For std::shared_ptr<T> this avoids an expensive atomic increment/decrement cycle (expensive due to inter-core synchronization).

When you're just accessing an object you generally don't want to access it by smart pointer at all. Just use references to the object itself instead. If you optionally want to pass an object then there's about two generally acceptable approaches for that (when limiting to the standard library):

pass by pointer T*

pass by opt_ref<T> with:

template
using opt_ref = std::optional>;

The latter can also be accomplished with boost::optional<T&> if you don't mind using Boost. (Some of the involved reasoning and history gets explained in C++ Can't Abandon Raw Pointers ... yet).

Using smart_ptr.get()

You're using the .get() member function of std::shared_ptr<T> inside time_stamps and dfs: just don't. Using operator-> will have the exact same performance on release builds and on debug builds, with most implementations of the standard library, add an extra assertion that the pointer isn't NULL. I.e. you'll get a small extra safety net without any performance hit in release builds.

Raw pointer parameter for required by-reference passing

The second parameters to time_stamps and dfs are passed by raw pointer. Generally the only case where you want to use raw pointers as parameters are as optional references (see above). This is not one of those cases: just use a regular (l-value, non-const) reference.

Another option is to take that parameter by-value, move into it and, after modification, return it. This has the advantage of being a pure function, that's generally easier to reason about, with the same performance profile. This would look like this:

vector<T> append_something(vector<T> v)

 auto lv = std::move(v); // need to move to local variable to ensure compiler
 // will move from it by itself, and from C++14
 // onwards be forced to do copy-elision
 lv.emplace_back(/* something */);
 return lv;


vector<T> v;
while (/* something */)

 v = append_something(std::move(v));

Enum with specific storage specifier

You're explicitly specifying the storage specifier for the enum class that you're using. Generally you don't want to do this unless you have to deal with serialization or have ABI backwards/forwards compatibility concerns. If none of this applies this just distracts. Especially since you're only specifying what is already the default.

Additionally you have made the type enum scoped (enum class) while you haven't done so for the status enum. I would recommend that you make the latter scoped too.

Ownership

You declare a weak_ptr alias, indicating you intend to use it, but you never do. The most straightforward improvement would seem to be to use it for the from node of your edge class.

You're storing references to smart pointers inside your edge class. You almost definitely don't want to do this and store by value instead.

Inside get_scc you get a reference to the top of your times stack and then immediately pop it: using that reference afterwards is undefined behavior. I recommend this minor modification:

auto curr = std::move(times.stop());
times.pop();

Misc

You explicitly ask the compiler to generate a default constructor for graph and then subvert it by defaulting the only parameter of your only other constructor. I suggest to give your type_ member variable a default value instead. You can then delete the default parameter value from your custom constructor.

In operator<<(..., const vertex_wp&) you're unconditionally dereferencing the return value from std::weak_ptr::lock. This can result in undefined behaviour when the pointed-to-object no longer exists. I suggest to assign to local variable first (auto p = v.lock();).

For the other operator<<(..., const vertex_sp&) you're also referencing unconditionally: I suggest to do so conditionally, just like for the above. As an alternative you can output something like "(null)", or whatever suits your situation best.

You're use of get_ is inconsistent. Most of the time it's to indicate a non-destructive action: returning some read-only property of the object. But in the case of get_transpose() it's most definitely a destructive action. For returning read-only information I generally prefer to use just a noun without any get prefix, because "to get" (in the English language) generally means to take something away. For mutating (i.e. destructive) actions I prefer a verb form that describes the change.

You're using _t as a suffix for a variable (graph_t): this suffix is used as a convention in the standard library for type names. You may want to avoid this confusion.

You're storing algorithm state in global variables and as part of the graph. E.g. the (class scoped) global variable vertex<T>::time and the pre_, post_ and status_ member variables. You may wish to store this outside of the graph instead. E.g. as a per-vertex struct for the member variables stored in an unordered_map indexed by the vertices (like adj_list). This will make your data classes very simple data holders, without any algorithm specific behaviour. I.e. separation of concerns. Additionally this will make your algorithm read-only with regards to the processed data structure. This in turn, potentially, opens the door for parallelization (assuming this algorithm is suited to that).

Your questions

Most of my remarks are in the category of those that I have to give to people with C++ as their primary language too. I didn't see any of the Java-isms I'm used to seeing.

If your graph is a-cyclic then you can use std::unique_ptr<T> for the to nodes in your edge and a raw pointer for your from nodes. That way ownership flows in one direction. But I'm only familiar with using this model in tree structures where the edges are stored as pointers inside the nodes. I.e. the child pointers would be std::vector<std::unique_ptr<node<T>>> while the parent pointer would be a single raw pointer node<T>*. This is safe because in this model children get destroyed before parents (including a child's pointer to its parent). Additionally, when using the default, compiler generated, destructor this has a destructor calling depth equal to your tree depth which may overflow your stack for unbalanced or very large trees. It's trivial to use iteration instead of recursion instead, but you need to be aware of it.

std::move is tricky: the function itself is basically just annotation for the compiler and as such has indeed no runtime overhead. However, what it does do is tell the compiler to consider, during overload resolution, functions taking the parameter by r-value reference. These functions are generally expected to perform a move-from instead of a copy, but are not guaranteed to do so. I.e. in some cases a move constructor or move assignment operator might not be available, in which case you still get a copy. Even when a move constructor is present it is still not for free. Usually little more than the swapping of several pointers (ranging from 1 to 3 for most types in most standard library implementations), which is still cheap, just not for free. But in some special cases (e.g. moving from an array, either built-in or std::array) it happens that containers cannot move their memory. In those cases the individual members are moved instead. Turning what would otherwise be an O(1) operation (pointer swap) into an O(n) operation.

I'm assuming you're referring to your streaming operator<< implementations: this is indeed the general approach for implementing this operator. Although you might wish to have one overload taking the object itself by reference instead of a pointer to it. Let the code using this deal with the pointers it has instead.

I suggest you watch Herb Sutter's Ã¢Â€ÂœLeak-Freedom in C++... By Default.Ã¢Â€Â presentation at CppCon 2016. It's main topic is dealing with memory safety issues in the presence of data structures in growing complexity. Including a solution, in the shape of a library, developed by him, to storing cyclic graphs without needing shared_ptr.

Thank you so much! This helps a lot! Additionally, that video you suggested cleared up a lot of additional questions I had. — Mar 20 at 19:11

Giel 1161 · Accepted Answer · 2018-03-20 10:59:42Z

There are several issues that I see:

First of all: you're using a lot of type aliases (using T =, aka typedefs). This might be considered to be just a personal preference but I'll mention it anyway. Having to constantly refer back to your type aliases for basic stuff like std::shared_ptr<T>, std::weak_ptr<T> and std::vector<T> makes for very slow reading. This is because the original type names are easily recognized by anyone with some experience in C++ and it's standard library. You're custom type names however, don't have this familiarity advantage.

Smart pointers by reference

You're passing (smart) pointers by (const) reference. Sometimes to take ownership of the pointed to object, sometimes just to temporarily operate on it.

E.g.:

template <class T>
void dfs(const vertex_sp<T>& vertex_ptr, ...) /* just using vertex_ptr */

vs:

void add_neighbor(const vertex_sp& v) neighbors_.emplace_back(v);

In general you want to pass (smart) pointers by value when you're transferring ownership and std::move them to the place that you're intending to store them. E.g. the neighbors_ vector in this case, but also edge's constructor and probably a few other locations.

Transferring ownership by passing by value has the additional advantage, if you're consistently using std::move, that it can avoid a single extra copy when the function is called with a temporary. For std::shared_ptr<T> this avoids an expensive atomic increment/decrement cycle (expensive due to inter-core synchronization).

When you're just accessing an object you generally don't want to access it by smart pointer at all. Just use references to the object itself instead. If you optionally want to pass an object then there's about two generally acceptable approaches for that (when limiting to the standard library):

pass by pointer T*

pass by opt_ref<T> with:

template
using opt_ref = std::optional>;

The latter can also be accomplished with boost::optional<T&> if you don't mind using Boost. (Some of the involved reasoning and history gets explained in C++ Can't Abandon Raw Pointers ... yet).

Using smart_ptr.get()

You're using the .get() member function of std::shared_ptr<T> inside time_stamps and dfs: just don't. Using operator-> will have the exact same performance on release builds and on debug builds, with most implementations of the standard library, add an extra assertion that the pointer isn't NULL. I.e. you'll get a small extra safety net without any performance hit in release builds.

Raw pointer parameter for required by-reference passing

The second parameters to time_stamps and dfs are passed by raw pointer. Generally the only case where you want to use raw pointers as parameters are as optional references (see above). This is not one of those cases: just use a regular (l-value, non-const) reference.

Another option is to take that parameter by-value, move into it and, after modification, return it. This has the advantage of being a pure function, that's generally easier to reason about, with the same performance profile. This would look like this:

vector<T> append_something(vector<T> v)

 auto lv = std::move(v); // need to move to local variable to ensure compiler
 // will move from it by itself, and from C++14
 // onwards be forced to do copy-elision
 lv.emplace_back(/* something */);
 return lv;


vector<T> v;
while (/* something */)

 v = append_something(std::move(v));

Enum with specific storage specifier

You're explicitly specifying the storage specifier for the enum class that you're using. Generally you don't want to do this unless you have to deal with serialization or have ABI backwards/forwards compatibility concerns. If none of this applies this just distracts. Especially since you're only specifying what is already the default.

Additionally you have made the type enum scoped (enum class) while you haven't done so for the status enum. I would recommend that you make the latter scoped too.

Ownership

You declare a weak_ptr alias, indicating you intend to use it, but you never do. The most straightforward improvement would seem to be to use it for the from node of your edge class.

You're storing references to smart pointers inside your edge class. You almost definitely don't want to do this and store by value instead.

Inside get_scc you get a reference to the top of your times stack and then immediately pop it: using that reference afterwards is undefined behavior. I recommend this minor modification:

auto curr = std::move(times.stop());
times.pop();

Misc

You explicitly ask the compiler to generate a default constructor for graph and then subvert it by defaulting the only parameter of your only other constructor. I suggest to give your type_ member variable a default value instead. You can then delete the default parameter value from your custom constructor.

In operator<<(..., const vertex_wp&) you're unconditionally dereferencing the return value from std::weak_ptr::lock. This can result in undefined behaviour when the pointed-to-object no longer exists. I suggest to assign to local variable first (auto p = v.lock();).

For the other operator<<(..., const vertex_sp&) you're also referencing unconditionally: I suggest to do so conditionally, just like for the above. As an alternative you can output something like "(null)", or whatever suits your situation best.

You're use of get_ is inconsistent. Most of the time it's to indicate a non-destructive action: returning some read-only property of the object. But in the case of get_transpose() it's most definitely a destructive action. For returning read-only information I generally prefer to use just a noun without any get prefix, because "to get" (in the English language) generally means to take something away. For mutating (i.e. destructive) actions I prefer a verb form that describes the change.

You're using _t as a suffix for a variable (graph_t): this suffix is used as a convention in the standard library for type names. You may want to avoid this confusion.

You're storing algorithm state in global variables and as part of the graph. E.g. the (class scoped) global variable vertex<T>::time and the pre_, post_ and status_ member variables. You may wish to store this outside of the graph instead. E.g. as a per-vertex struct for the member variables stored in an unordered_map indexed by the vertices (like adj_list). This will make your data classes very simple data holders, without any algorithm specific behaviour. I.e. separation of concerns. Additionally this will make your algorithm read-only with regards to the processed data structure. This in turn, potentially, opens the door for parallelization (assuming this algorithm is suited to that).

Your questions

Most of my remarks are in the category of those that I have to give to people with C++ as their primary language too. I didn't see any of the Java-isms I'm used to seeing.

If your graph is a-cyclic then you can use std::unique_ptr<T> for the to nodes in your edge and a raw pointer for your from nodes. That way ownership flows in one direction. But I'm only familiar with using this model in tree structures where the edges are stored as pointers inside the nodes. I.e. the child pointers would be std::vector<std::unique_ptr<node<T>>> while the parent pointer would be a single raw pointer node<T>*. This is safe because in this model children get destroyed before parents (including a child's pointer to its parent). Additionally, when using the default, compiler generated, destructor this has a destructor calling depth equal to your tree depth which may overflow your stack for unbalanced or very large trees. It's trivial to use iteration instead of recursion instead, but you need to be aware of it.

std::move is tricky: the function itself is basically just annotation for the compiler and as such has indeed no runtime overhead. However, what it does do is tell the compiler to consider, during overload resolution, functions taking the parameter by r-value reference. These functions are generally expected to perform a move-from instead of a copy, but are not guaranteed to do so. I.e. in some cases a move constructor or move assignment operator might not be available, in which case you still get a copy. Even when a move constructor is present it is still not for free. Usually little more than the swapping of several pointers (ranging from 1 to 3 for most types in most standard library implementations), which is still cheap, just not for free. But in some special cases (e.g. moving from an array, either built-in or std::array) it happens that containers cannot move their memory. In those cases the individual members are moved instead. Turning what would otherwise be an O(1) operation (pointer swap) into an O(n) operation.

I'm assuming you're referring to your streaming operator<< implementations: this is indeed the general approach for implementing this operator. Although you might wish to have one overload taking the object itself by reference instead of a pointer to it. Let the code using this deal with the pointers it has instead.

I suggest you watch Herb Sutter's Ã¢Â€ÂœLeak-Freedom in C++... By Default.Ã¢Â€Â presentation at CppCon 2016. It's main topic is dealing with memory safety issues in the presence of data structures in growing complexity. Including a solution, in the shape of a library, developed by him, to storing cyclic graphs without needing shared_ptr.

Thank you so much! This helps a lot! Additionally, that video you suggested cleared up a lot of additional questions I had. — Mar 20 at 19:11

Graph Representation using smart pointers. Kosaraju's algorithm implementation

1 Answer 1

Smart pointers by reference

Using smart_ptr.get()

Raw pointer parameter for required by-reference passing

Enum with specific storage specifier

Ownership

Misc

Your questions

Your Answer

Sign up or log in

Post as a guest

Post as a guest

1 Answer 1

1 Answer 1

Smart pointers by reference

Using smart_ptr.get()

Raw pointer parameter for required by-reference passing

Enum with specific storage specifier

Ownership

Misc

Your questions

Smart pointers by reference

Using smart_ptr.get()

Raw pointer parameter for required by-reference passing

Enum with specific storage specifier

Ownership

Misc

Your questions

Smart pointers by reference

Using smart_ptr.get()

Raw pointer parameter for required by-reference passing

Enum with specific storage specifier

Ownership

Misc

Your questions

Smart pointers by reference

Using smart_ptr.get()

Raw pointer parameter for required by-reference passing

Enum with specific storage specifier

Ownership

Misc

Your questions

Sign up or log in

Post as a guest

Post as a guest

Sign up or log in

Post as a guest

Sign up or log in

Post as a guest

Sign up or log in

Post as a guest

Popular posts from this blog

Chat program with C++ and SFML

Read an image with ADNS2610 optical sensor and Arduino Uno

Read files from a directory using Promises

1 Answer
1

1 Answer
1

1 Answer
1