Segment road map data with heuristics

The name of the pictureThe name of the pictureThe name of the pictureClash Royale CLAN TAG#URR8PPP





.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty margin-bottom:0;







up vote
2
down vote

favorite












Backstory



A little while ago I was assigned with the task of clustering some road segments using open street map data. After parsing a huge XML (maybe that should be another review) I ended up with the following data format:



struct node 
std::string id;
double lat;
double lon;
int refs;
node();
void print();
;

struct road
std::string id;
std::string type;
std::vector<int> nodes;
;


So a node has the 2 coordinates plus an id and a ref number, because later I'm supposed to keep track when a node is part of 2 different roads. A road has an id and a type and the vector has the indexes of its nodes in a vector<node> I'm using to keep the nodes. Also this vector doesn't change in size, so it's safe to keep indexes like that.



Segmenting



My actual task is to make segments out of these roads according to the following criteria:



  • When the curvature of the segment's circumcircle exceeds a heuristic value

  • When the size of a segment exceeds a heuristic value

  • When a node is also part of another road; in this case the segment ends after all the nodes that are part of other roads.

The output should be a text file with the following format:



segment_id, way_id, no_of_nodes, node1_lat, node_1_lon, .... lastnode_lat, lastnode_lon


My solution is this



inline double curvature(double l1, double l2, double l3)
return l1*l2*l3/sqrt((l1+l2+l3)*(l2+l3-l1)*(l3+l1-l2)*(l1+l2-l3));


void write_segment(ofstream &out, int &id, string way_id, vector<double> &coords,
int &minsize, int &maxsize, int &tot_n)
out << id << ", " << way_id << ", " << coords.size()/2 << ", ";
for(size_t i=0; i<coords.size()-1; i++)
out << coords[i] << ", ";

out << coords.back() << 'n';
//check if we have new max/min
if(coords.size()/2 < minsize)
minsize = coords.size()/2;
else if(coords.size()/2 > maxsize)
maxsize = coords.size()/2;
//ingrement the id
id++;
tot_n += coords.size()/2;
//keep last node because it's also a part of the next segment
coords.erase(coords.begin(), coords.end()-2);


void make_segments(const vector<road> &roads, const vector<node> &nodes,
const string &out_s)
ofstream out(out_s);
bool writefalse;
//lengths for every 3 points, total and temporary curvatures
double l1, l2, l3, curb, curv;
double thrs0.03; //curvature threshold over which a segment breaks
int segid; //serial segment id for every new segment
//count for average curvature, tot_n -> nodes written
int count, tot_n, totalnodes;
//min and max segment sizes for the statistics
int maxsize, minsizestd::numeric_limits<int>::max();
size_t nthrs200, minthrs4; //min and max threshold for segment size
vector<double> coords; //keep all the coordinates until written to file

cout << "Segmenting:n";
for(size_t i=0; i<roads.size(); i++)
totalnodes += roads[i].nodes.size();
coords.push_back(nodes[roads[i].nodes[0]].lat);
coords.push_back(nodes[roads[i].nodes[0]].lon);
//special case for roads with one node
if(roads[i].nodes.size() ==1)
out << segid << ", " << roads[i].id << ", " << 1 << ", ";
out << coords[0] << ", " << coords[1] << 'n';
segid++;
tot_n++;
coords.clear();
continue;

//make one segment for small roads (according to minthrs)
if(roads[i].nodes.size() <= minthrs)
for(size_t j=1; j<roads[i].nodes.size(); j++)
coords.push_back(nodes[roads[i].nodes[j]].lat);
coords.push_back(nodes[roads[i].nodes[j]].lon);

write_segment(out, segid, roads[i].id, coords, minsize, maxsize, tot_n);
continue;

//for every other road segment it according to heuristics
for(size_t j=1; j<roads[i].nodes.size()-minthrs; j++)

for(size_t j=roads[i].nodes.size()-minthrs; j<roads[i].nodes.size(); j++)
coords.push_back(nodes[roads[i].nodes[j]].lat);
coords.push_back(nodes[roads[i].nodes[j]].lon);

write_segment(out, segid, roads[i].id, coords, minsize, maxsize, tot_n);
coords.clear();

cout << "Maxsize: " << maxsize << 'n';
cout << "Minsize: " << minsize << 'n';
cout << "Segs: " << segid << 'n';
cout << "Nodes written: " << tot_n << 'n';
cout << "Total nodes were: " << totalnodes << 'n';
cout << "Average curvature: " << double(curv/count) << endl;



I'm interested in good and bad practices this code follows, readability, especially when documenting what each variable stands for and any performance issues.







share|improve this question



























    up vote
    2
    down vote

    favorite












    Backstory



    A little while ago I was assigned with the task of clustering some road segments using open street map data. After parsing a huge XML (maybe that should be another review) I ended up with the following data format:



    struct node 
    std::string id;
    double lat;
    double lon;
    int refs;
    node();
    void print();
    ;

    struct road
    std::string id;
    std::string type;
    std::vector<int> nodes;
    ;


    So a node has the 2 coordinates plus an id and a ref number, because later I'm supposed to keep track when a node is part of 2 different roads. A road has an id and a type and the vector has the indexes of its nodes in a vector<node> I'm using to keep the nodes. Also this vector doesn't change in size, so it's safe to keep indexes like that.



    Segmenting



    My actual task is to make segments out of these roads according to the following criteria:



    • When the curvature of the segment's circumcircle exceeds a heuristic value

    • When the size of a segment exceeds a heuristic value

    • When a node is also part of another road; in this case the segment ends after all the nodes that are part of other roads.

    The output should be a text file with the following format:



    segment_id, way_id, no_of_nodes, node1_lat, node_1_lon, .... lastnode_lat, lastnode_lon


    My solution is this



    inline double curvature(double l1, double l2, double l3)
    return l1*l2*l3/sqrt((l1+l2+l3)*(l2+l3-l1)*(l3+l1-l2)*(l1+l2-l3));


    void write_segment(ofstream &out, int &id, string way_id, vector<double> &coords,
    int &minsize, int &maxsize, int &tot_n)
    out << id << ", " << way_id << ", " << coords.size()/2 << ", ";
    for(size_t i=0; i<coords.size()-1; i++)
    out << coords[i] << ", ";

    out << coords.back() << 'n';
    //check if we have new max/min
    if(coords.size()/2 < minsize)
    minsize = coords.size()/2;
    else if(coords.size()/2 > maxsize)
    maxsize = coords.size()/2;
    //ingrement the id
    id++;
    tot_n += coords.size()/2;
    //keep last node because it's also a part of the next segment
    coords.erase(coords.begin(), coords.end()-2);


    void make_segments(const vector<road> &roads, const vector<node> &nodes,
    const string &out_s)
    ofstream out(out_s);
    bool writefalse;
    //lengths for every 3 points, total and temporary curvatures
    double l1, l2, l3, curb, curv;
    double thrs0.03; //curvature threshold over which a segment breaks
    int segid; //serial segment id for every new segment
    //count for average curvature, tot_n -> nodes written
    int count, tot_n, totalnodes;
    //min and max segment sizes for the statistics
    int maxsize, minsizestd::numeric_limits<int>::max();
    size_t nthrs200, minthrs4; //min and max threshold for segment size
    vector<double> coords; //keep all the coordinates until written to file

    cout << "Segmenting:n";
    for(size_t i=0; i<roads.size(); i++)
    totalnodes += roads[i].nodes.size();
    coords.push_back(nodes[roads[i].nodes[0]].lat);
    coords.push_back(nodes[roads[i].nodes[0]].lon);
    //special case for roads with one node
    if(roads[i].nodes.size() ==1)
    out << segid << ", " << roads[i].id << ", " << 1 << ", ";
    out << coords[0] << ", " << coords[1] << 'n';
    segid++;
    tot_n++;
    coords.clear();
    continue;

    //make one segment for small roads (according to minthrs)
    if(roads[i].nodes.size() <= minthrs)
    for(size_t j=1; j<roads[i].nodes.size(); j++)
    coords.push_back(nodes[roads[i].nodes[j]].lat);
    coords.push_back(nodes[roads[i].nodes[j]].lon);

    write_segment(out, segid, roads[i].id, coords, minsize, maxsize, tot_n);
    continue;

    //for every other road segment it according to heuristics
    for(size_t j=1; j<roads[i].nodes.size()-minthrs; j++)

    for(size_t j=roads[i].nodes.size()-minthrs; j<roads[i].nodes.size(); j++)
    coords.push_back(nodes[roads[i].nodes[j]].lat);
    coords.push_back(nodes[roads[i].nodes[j]].lon);

    write_segment(out, segid, roads[i].id, coords, minsize, maxsize, tot_n);
    coords.clear();

    cout << "Maxsize: " << maxsize << 'n';
    cout << "Minsize: " << minsize << 'n';
    cout << "Segs: " << segid << 'n';
    cout << "Nodes written: " << tot_n << 'n';
    cout << "Total nodes were: " << totalnodes << 'n';
    cout << "Average curvature: " << double(curv/count) << endl;



    I'm interested in good and bad practices this code follows, readability, especially when documenting what each variable stands for and any performance issues.







    share|improve this question























      up vote
      2
      down vote

      favorite









      up vote
      2
      down vote

      favorite











      Backstory



      A little while ago I was assigned with the task of clustering some road segments using open street map data. After parsing a huge XML (maybe that should be another review) I ended up with the following data format:



      struct node 
      std::string id;
      double lat;
      double lon;
      int refs;
      node();
      void print();
      ;

      struct road
      std::string id;
      std::string type;
      std::vector<int> nodes;
      ;


      So a node has the 2 coordinates plus an id and a ref number, because later I'm supposed to keep track when a node is part of 2 different roads. A road has an id and a type and the vector has the indexes of its nodes in a vector<node> I'm using to keep the nodes. Also this vector doesn't change in size, so it's safe to keep indexes like that.



      Segmenting



      My actual task is to make segments out of these roads according to the following criteria:



      • When the curvature of the segment's circumcircle exceeds a heuristic value

      • When the size of a segment exceeds a heuristic value

      • When a node is also part of another road; in this case the segment ends after all the nodes that are part of other roads.

      The output should be a text file with the following format:



      segment_id, way_id, no_of_nodes, node1_lat, node_1_lon, .... lastnode_lat, lastnode_lon


      My solution is this



      inline double curvature(double l1, double l2, double l3)
      return l1*l2*l3/sqrt((l1+l2+l3)*(l2+l3-l1)*(l3+l1-l2)*(l1+l2-l3));


      void write_segment(ofstream &out, int &id, string way_id, vector<double> &coords,
      int &minsize, int &maxsize, int &tot_n)
      out << id << ", " << way_id << ", " << coords.size()/2 << ", ";
      for(size_t i=0; i<coords.size()-1; i++)
      out << coords[i] << ", ";

      out << coords.back() << 'n';
      //check if we have new max/min
      if(coords.size()/2 < minsize)
      minsize = coords.size()/2;
      else if(coords.size()/2 > maxsize)
      maxsize = coords.size()/2;
      //ingrement the id
      id++;
      tot_n += coords.size()/2;
      //keep last node because it's also a part of the next segment
      coords.erase(coords.begin(), coords.end()-2);


      void make_segments(const vector<road> &roads, const vector<node> &nodes,
      const string &out_s)
      ofstream out(out_s);
      bool writefalse;
      //lengths for every 3 points, total and temporary curvatures
      double l1, l2, l3, curb, curv;
      double thrs0.03; //curvature threshold over which a segment breaks
      int segid; //serial segment id for every new segment
      //count for average curvature, tot_n -> nodes written
      int count, tot_n, totalnodes;
      //min and max segment sizes for the statistics
      int maxsize, minsizestd::numeric_limits<int>::max();
      size_t nthrs200, minthrs4; //min and max threshold for segment size
      vector<double> coords; //keep all the coordinates until written to file

      cout << "Segmenting:n";
      for(size_t i=0; i<roads.size(); i++)
      totalnodes += roads[i].nodes.size();
      coords.push_back(nodes[roads[i].nodes[0]].lat);
      coords.push_back(nodes[roads[i].nodes[0]].lon);
      //special case for roads with one node
      if(roads[i].nodes.size() ==1)
      out << segid << ", " << roads[i].id << ", " << 1 << ", ";
      out << coords[0] << ", " << coords[1] << 'n';
      segid++;
      tot_n++;
      coords.clear();
      continue;

      //make one segment for small roads (according to minthrs)
      if(roads[i].nodes.size() <= minthrs)
      for(size_t j=1; j<roads[i].nodes.size(); j++)
      coords.push_back(nodes[roads[i].nodes[j]].lat);
      coords.push_back(nodes[roads[i].nodes[j]].lon);

      write_segment(out, segid, roads[i].id, coords, minsize, maxsize, tot_n);
      continue;

      //for every other road segment it according to heuristics
      for(size_t j=1; j<roads[i].nodes.size()-minthrs; j++)

      for(size_t j=roads[i].nodes.size()-minthrs; j<roads[i].nodes.size(); j++)
      coords.push_back(nodes[roads[i].nodes[j]].lat);
      coords.push_back(nodes[roads[i].nodes[j]].lon);

      write_segment(out, segid, roads[i].id, coords, minsize, maxsize, tot_n);
      coords.clear();

      cout << "Maxsize: " << maxsize << 'n';
      cout << "Minsize: " << minsize << 'n';
      cout << "Segs: " << segid << 'n';
      cout << "Nodes written: " << tot_n << 'n';
      cout << "Total nodes were: " << totalnodes << 'n';
      cout << "Average curvature: " << double(curv/count) << endl;



      I'm interested in good and bad practices this code follows, readability, especially when documenting what each variable stands for and any performance issues.







      share|improve this question













      Backstory



      A little while ago I was assigned with the task of clustering some road segments using open street map data. After parsing a huge XML (maybe that should be another review) I ended up with the following data format:



      struct node 
      std::string id;
      double lat;
      double lon;
      int refs;
      node();
      void print();
      ;

      struct road
      std::string id;
      std::string type;
      std::vector<int> nodes;
      ;


      So a node has the 2 coordinates plus an id and a ref number, because later I'm supposed to keep track when a node is part of 2 different roads. A road has an id and a type and the vector has the indexes of its nodes in a vector<node> I'm using to keep the nodes. Also this vector doesn't change in size, so it's safe to keep indexes like that.



      Segmenting



      My actual task is to make segments out of these roads according to the following criteria:



      • When the curvature of the segment's circumcircle exceeds a heuristic value

      • When the size of a segment exceeds a heuristic value

      • When a node is also part of another road; in this case the segment ends after all the nodes that are part of other roads.

      The output should be a text file with the following format:



      segment_id, way_id, no_of_nodes, node1_lat, node_1_lon, .... lastnode_lat, lastnode_lon


      My solution is this



      inline double curvature(double l1, double l2, double l3)
      return l1*l2*l3/sqrt((l1+l2+l3)*(l2+l3-l1)*(l3+l1-l2)*(l1+l2-l3));


      void write_segment(ofstream &out, int &id, string way_id, vector<double> &coords,
      int &minsize, int &maxsize, int &tot_n)
      out << id << ", " << way_id << ", " << coords.size()/2 << ", ";
      for(size_t i=0; i<coords.size()-1; i++)
      out << coords[i] << ", ";

      out << coords.back() << 'n';
      //check if we have new max/min
      if(coords.size()/2 < minsize)
      minsize = coords.size()/2;
      else if(coords.size()/2 > maxsize)
      maxsize = coords.size()/2;
      //ingrement the id
      id++;
      tot_n += coords.size()/2;
      //keep last node because it's also a part of the next segment
      coords.erase(coords.begin(), coords.end()-2);


      void make_segments(const vector<road> &roads, const vector<node> &nodes,
      const string &out_s)
      ofstream out(out_s);
      bool writefalse;
      //lengths for every 3 points, total and temporary curvatures
      double l1, l2, l3, curb, curv;
      double thrs0.03; //curvature threshold over which a segment breaks
      int segid; //serial segment id for every new segment
      //count for average curvature, tot_n -> nodes written
      int count, tot_n, totalnodes;
      //min and max segment sizes for the statistics
      int maxsize, minsizestd::numeric_limits<int>::max();
      size_t nthrs200, minthrs4; //min and max threshold for segment size
      vector<double> coords; //keep all the coordinates until written to file

      cout << "Segmenting:n";
      for(size_t i=0; i<roads.size(); i++)
      totalnodes += roads[i].nodes.size();
      coords.push_back(nodes[roads[i].nodes[0]].lat);
      coords.push_back(nodes[roads[i].nodes[0]].lon);
      //special case for roads with one node
      if(roads[i].nodes.size() ==1)
      out << segid << ", " << roads[i].id << ", " << 1 << ", ";
      out << coords[0] << ", " << coords[1] << 'n';
      segid++;
      tot_n++;
      coords.clear();
      continue;

      //make one segment for small roads (according to minthrs)
      if(roads[i].nodes.size() <= minthrs)
      for(size_t j=1; j<roads[i].nodes.size(); j++)
      coords.push_back(nodes[roads[i].nodes[j]].lat);
      coords.push_back(nodes[roads[i].nodes[j]].lon);

      write_segment(out, segid, roads[i].id, coords, minsize, maxsize, tot_n);
      continue;

      //for every other road segment it according to heuristics
      for(size_t j=1; j<roads[i].nodes.size()-minthrs; j++)

      for(size_t j=roads[i].nodes.size()-minthrs; j<roads[i].nodes.size(); j++)
      coords.push_back(nodes[roads[i].nodes[j]].lat);
      coords.push_back(nodes[roads[i].nodes[j]].lon);

      write_segment(out, segid, roads[i].id, coords, minsize, maxsize, tot_n);
      coords.clear();

      cout << "Maxsize: " << maxsize << 'n';
      cout << "Minsize: " << minsize << 'n';
      cout << "Segs: " << segid << 'n';
      cout << "Nodes written: " << tot_n << 'n';
      cout << "Total nodes were: " << totalnodes << 'n';
      cout << "Average curvature: " << double(curv/count) << endl;



      I'm interested in good and bad practices this code follows, readability, especially when documenting what each variable stands for and any performance issues.









      share|improve this question












      share|improve this question




      share|improve this question








      edited Jan 9 at 2:33









      Jamal♦

      30.1k11114225




      30.1k11114225









      asked Jan 8 at 14:56









      Konstantinoscs

      2511




      2511




















          1 Answer
          1






          active

          oldest

          votes

















          up vote
          1
          down vote



          accepted










          Prefer Clear Naming over using namespace std



          According to the MSDN website:




          Namespaces are used to organize code into logical groups and to prevent name collisions that can occur especially when your code base includes multiple libraries.




          A collision is when 2 different functions have the same name, the same argument types and a similar functionality (this is why they have the same name). Someone developing software may want to override a function such as std::cout, std::cin or they may want to override the functionality of a class such as std::vector or std::stack. Namespaces allow these constructs to be overridden.



          The use of the programming statement:



          using namespace std;


          hides the fact that cin, cout, vector and stack are coming from the namespace std where cin, cout,
          vector and stack are used in the code. This can cause confusion of where the code is actually
          coming from.



          As the software becomes more complex and uses more libraries this becomes a bigger problem.



          For a more detailed discussion of why it is a bad idea to use using namespace std; see this stackoverflow question and stackoverflow question.



          Reduce Complexity, Follow SRP

          The Single Responsibility Principle states that every module or class should have responsibility over a single part of the functionality provided by the software, and that responsibility should be entirely encapsulated by the class. All its services should be narrowly aligned with that responsibility.




          Robert C. Martin expresses the principle as follows:
          A class should have only one reason to change.




          While this is primarily targeted at classes in object oriented languages it applies to functions and subroutines well.



          The void make_segments(const std::vector<road> &roads, const std::vector<node> &nodes, const std::string &out_s) function could be broken up into at multiple functions. The following code is a good candidate for a function to be called by make_segments:



           //for every other road segment it according to heuristics
          for (size_t j = 1; j<roads[i].nodes.size() - minthrs; j++) write)
          //check for curvature or maxsize
          write_segment(out, segid, roads[i].id, coords, minsize, maxsize, tot_n);
          write = false;




          The more separate functions there are the easier it is to understand or read the code. This also makes it easier for any programmer to maintain or debug the code.



          Prefer Iterators Over Indexes for Container Classes

          The function void write_segment(std::ofstream &out, int &id, std::string way_id, std::vector<double> &coords, int &minsize, int &maxsize, int &tot_n) contains the following code:



           for (size_t i = 0; i<coords.size() - 1; i++) 
          out << coords[i] << ", ";



          A more modern and less C programming language like construct would be:



           for (auto coord_iterator: coords) 
          out << coord_iterator << ",";



          Using iterators in this manner makes it much easier to program correctly.



          Declare Variables as Close to Use as Possible

          In the loop defined as a possible candidate for a function there are 3 variables that should be defined within the loop rather than at the top of the function. The variables l1, l2 and l3 are only assigned values and used within the loop.



           //find the 3 sides of the triangle for every 3 points
          double l1 = euclid_dist(nodes[roads[i].nodes[j - 1]].lat, nodes[roads[i].nodes[j - 1]].lon,
          nodes[roads[i].nodes[j]].lat, nodes[roads[i].nodes[j]].lon);
          double l2 = euclid_dist(nodes[roads[i].nodes[j]].lat, nodes[roads[i].nodes[j]].lon,
          nodes[roads[i].nodes[j + 1]].lat, nodes[roads[i].nodes[j + 1]].lon);
          double l3 = euclid_dist(nodes[roads[i].nodes[j + 1]].lat, nodes[roads[i].nodes[j + 1]].lon,
          nodes[roads[i].nodes[j - 1]].lat, nodes[roads[i].nodes[j - 1]].lon);
          //compute circumcircle radius (curvature)
          curb = curvature(l1, l2, l3);
          count++;





          share|improve this answer

















          • 1




            Your for auto loop writes an extra comma at the end, which the original code does not.
            – Roland Illig
            Jan 9 at 0:57










          • @RolandIllig true.
            – pacmaninbw
            Jan 9 at 1:18










          • I thought about declaring the l1/2/3 variables inside the for loop but isn't the constant memory allocation bad for performance? Putting std:: I think it's going to be very hard for me but I read about its advantages.
            – Konstantinoscs
            Jan 9 at 2:58










          • @Konstantinoscs If you are using an optimizing compiler at the O3 level performance shouldn't be affected by these declarations. You actually have a nice little function there called find_the_triangle if you want it, if curb = find_the_triangle().
            – pacmaninbw
            Jan 9 at 3:23











          • Can you elaborate on the optimization a bit? Are there rules of thumb about what to not care about if you put -O?
            – Konstantinoscs
            Jan 9 at 3:42










          Your Answer




          StackExchange.ifUsing("editor", function ()
          return StackExchange.using("mathjaxEditing", function ()
          StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
          StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
          );
          );
          , "mathjax-editing");

          StackExchange.ifUsing("editor", function ()
          StackExchange.using("externalEditor", function ()
          StackExchange.using("snippets", function ()
          StackExchange.snippets.init();
          );
          );
          , "code-snippets");

          StackExchange.ready(function()
          var channelOptions =
          tags: "".split(" "),
          id: "196"
          ;
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          StackExchange.using("externalEditor", function()
          // Have to fire editor after snippets, if snippets enabled
          if (StackExchange.settings.snippets.snippetsEnabled)
          StackExchange.using("snippets", function()
          createEditor();
          );

          else
          createEditor();

          );

          function createEditor()
          StackExchange.prepareEditor(
          heartbeatType: 'answer',
          convertImagesToLinks: false,
          noModals: false,
          showLowRepImageUploadWarning: true,
          reputationToPostImages: null,
          bindNavPrevention: true,
          postfix: "",
          onDemand: true,
          discardSelector: ".discard-answer"
          ,immediatelyShowMarkdownHelp:true
          );



          );








           

          draft saved


          draft discarded


















          StackExchange.ready(
          function ()
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f184588%2fsegment-road-map-data-with-heuristics%23new-answer', 'question_page');

          );

          Post as a guest






























          1 Answer
          1






          active

          oldest

          votes








          1 Answer
          1






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes








          up vote
          1
          down vote



          accepted










          Prefer Clear Naming over using namespace std



          According to the MSDN website:




          Namespaces are used to organize code into logical groups and to prevent name collisions that can occur especially when your code base includes multiple libraries.




          A collision is when 2 different functions have the same name, the same argument types and a similar functionality (this is why they have the same name). Someone developing software may want to override a function such as std::cout, std::cin or they may want to override the functionality of a class such as std::vector or std::stack. Namespaces allow these constructs to be overridden.



          The use of the programming statement:



          using namespace std;


          hides the fact that cin, cout, vector and stack are coming from the namespace std where cin, cout,
          vector and stack are used in the code. This can cause confusion of where the code is actually
          coming from.



          As the software becomes more complex and uses more libraries this becomes a bigger problem.



          For a more detailed discussion of why it is a bad idea to use using namespace std; see this stackoverflow question and stackoverflow question.



          Reduce Complexity, Follow SRP

          The Single Responsibility Principle states that every module or class should have responsibility over a single part of the functionality provided by the software, and that responsibility should be entirely encapsulated by the class. All its services should be narrowly aligned with that responsibility.




          Robert C. Martin expresses the principle as follows:
          A class should have only one reason to change.




          While this is primarily targeted at classes in object oriented languages it applies to functions and subroutines well.



          The void make_segments(const std::vector<road> &roads, const std::vector<node> &nodes, const std::string &out_s) function could be broken up into at multiple functions. The following code is a good candidate for a function to be called by make_segments:



           //for every other road segment it according to heuristics
          for (size_t j = 1; j<roads[i].nodes.size() - minthrs; j++) write)
          //check for curvature or maxsize
          write_segment(out, segid, roads[i].id, coords, minsize, maxsize, tot_n);
          write = false;




          The more separate functions there are the easier it is to understand or read the code. This also makes it easier for any programmer to maintain or debug the code.



          Prefer Iterators Over Indexes for Container Classes

          The function void write_segment(std::ofstream &out, int &id, std::string way_id, std::vector<double> &coords, int &minsize, int &maxsize, int &tot_n) contains the following code:



           for (size_t i = 0; i<coords.size() - 1; i++) 
          out << coords[i] << ", ";



          A more modern and less C programming language like construct would be:



           for (auto coord_iterator: coords) 
          out << coord_iterator << ",";



          Using iterators in this manner makes it much easier to program correctly.



          Declare Variables as Close to Use as Possible

          In the loop defined as a possible candidate for a function there are 3 variables that should be defined within the loop rather than at the top of the function. The variables l1, l2 and l3 are only assigned values and used within the loop.



           //find the 3 sides of the triangle for every 3 points
          double l1 = euclid_dist(nodes[roads[i].nodes[j - 1]].lat, nodes[roads[i].nodes[j - 1]].lon,
          nodes[roads[i].nodes[j]].lat, nodes[roads[i].nodes[j]].lon);
          double l2 = euclid_dist(nodes[roads[i].nodes[j]].lat, nodes[roads[i].nodes[j]].lon,
          nodes[roads[i].nodes[j + 1]].lat, nodes[roads[i].nodes[j + 1]].lon);
          double l3 = euclid_dist(nodes[roads[i].nodes[j + 1]].lat, nodes[roads[i].nodes[j + 1]].lon,
          nodes[roads[i].nodes[j - 1]].lat, nodes[roads[i].nodes[j - 1]].lon);
          //compute circumcircle radius (curvature)
          curb = curvature(l1, l2, l3);
          count++;





          share|improve this answer

















          • 1




            Your for auto loop writes an extra comma at the end, which the original code does not.
            – Roland Illig
            Jan 9 at 0:57










          • @RolandIllig true.
            – pacmaninbw
            Jan 9 at 1:18










          • I thought about declaring the l1/2/3 variables inside the for loop but isn't the constant memory allocation bad for performance? Putting std:: I think it's going to be very hard for me but I read about its advantages.
            – Konstantinoscs
            Jan 9 at 2:58










          • @Konstantinoscs If you are using an optimizing compiler at the O3 level performance shouldn't be affected by these declarations. You actually have a nice little function there called find_the_triangle if you want it, if curb = find_the_triangle().
            – pacmaninbw
            Jan 9 at 3:23











          • Can you elaborate on the optimization a bit? Are there rules of thumb about what to not care about if you put -O?
            – Konstantinoscs
            Jan 9 at 3:42














          up vote
          1
          down vote



          accepted










          Prefer Clear Naming over using namespace std



          According to the MSDN website:




          Namespaces are used to organize code into logical groups and to prevent name collisions that can occur especially when your code base includes multiple libraries.




          A collision is when 2 different functions have the same name, the same argument types and a similar functionality (this is why they have the same name). Someone developing software may want to override a function such as std::cout, std::cin or they may want to override the functionality of a class such as std::vector or std::stack. Namespaces allow these constructs to be overridden.



          The use of the programming statement:



          using namespace std;


          hides the fact that cin, cout, vector and stack are coming from the namespace std where cin, cout,
          vector and stack are used in the code. This can cause confusion of where the code is actually
          coming from.



          As the software becomes more complex and uses more libraries this becomes a bigger problem.



          For a more detailed discussion of why it is a bad idea to use using namespace std; see this stackoverflow question and stackoverflow question.



          Reduce Complexity, Follow SRP

          The Single Responsibility Principle states that every module or class should have responsibility over a single part of the functionality provided by the software, and that responsibility should be entirely encapsulated by the class. All its services should be narrowly aligned with that responsibility.




          Robert C. Martin expresses the principle as follows:
          A class should have only one reason to change.




          While this is primarily targeted at classes in object oriented languages it applies to functions and subroutines well.



          The void make_segments(const std::vector<road> &roads, const std::vector<node> &nodes, const std::string &out_s) function could be broken up into at multiple functions. The following code is a good candidate for a function to be called by make_segments:



           //for every other road segment it according to heuristics
          for (size_t j = 1; j<roads[i].nodes.size() - minthrs; j++) write)
          //check for curvature or maxsize
          write_segment(out, segid, roads[i].id, coords, minsize, maxsize, tot_n);
          write = false;




          The more separate functions there are the easier it is to understand or read the code. This also makes it easier for any programmer to maintain or debug the code.



          Prefer Iterators Over Indexes for Container Classes

          The function void write_segment(std::ofstream &out, int &id, std::string way_id, std::vector<double> &coords, int &minsize, int &maxsize, int &tot_n) contains the following code:



           for (size_t i = 0; i<coords.size() - 1; i++) 
          out << coords[i] << ", ";



          A more modern and less C programming language like construct would be:



           for (auto coord_iterator: coords) 
          out << coord_iterator << ",";



          Using iterators in this manner makes it much easier to program correctly.



          Declare Variables as Close to Use as Possible

          In the loop defined as a possible candidate for a function there are 3 variables that should be defined within the loop rather than at the top of the function. The variables l1, l2 and l3 are only assigned values and used within the loop.



           //find the 3 sides of the triangle for every 3 points
          double l1 = euclid_dist(nodes[roads[i].nodes[j - 1]].lat, nodes[roads[i].nodes[j - 1]].lon,
          nodes[roads[i].nodes[j]].lat, nodes[roads[i].nodes[j]].lon);
          double l2 = euclid_dist(nodes[roads[i].nodes[j]].lat, nodes[roads[i].nodes[j]].lon,
          nodes[roads[i].nodes[j + 1]].lat, nodes[roads[i].nodes[j + 1]].lon);
          double l3 = euclid_dist(nodes[roads[i].nodes[j + 1]].lat, nodes[roads[i].nodes[j + 1]].lon,
          nodes[roads[i].nodes[j - 1]].lat, nodes[roads[i].nodes[j - 1]].lon);
          //compute circumcircle radius (curvature)
          curb = curvature(l1, l2, l3);
          count++;





          share|improve this answer

















          • 1




            Your for auto loop writes an extra comma at the end, which the original code does not.
            – Roland Illig
            Jan 9 at 0:57










          • @RolandIllig true.
            – pacmaninbw
            Jan 9 at 1:18










          • I thought about declaring the l1/2/3 variables inside the for loop but isn't the constant memory allocation bad for performance? Putting std:: I think it's going to be very hard for me but I read about its advantages.
            – Konstantinoscs
            Jan 9 at 2:58










          • @Konstantinoscs If you are using an optimizing compiler at the O3 level performance shouldn't be affected by these declarations. You actually have a nice little function there called find_the_triangle if you want it, if curb = find_the_triangle().
            – pacmaninbw
            Jan 9 at 3:23











          • Can you elaborate on the optimization a bit? Are there rules of thumb about what to not care about if you put -O?
            – Konstantinoscs
            Jan 9 at 3:42












          up vote
          1
          down vote



          accepted







          up vote
          1
          down vote



          accepted






          Prefer Clear Naming over using namespace std



          According to the MSDN website:




          Namespaces are used to organize code into logical groups and to prevent name collisions that can occur especially when your code base includes multiple libraries.




          A collision is when 2 different functions have the same name, the same argument types and a similar functionality (this is why they have the same name). Someone developing software may want to override a function such as std::cout, std::cin or they may want to override the functionality of a class such as std::vector or std::stack. Namespaces allow these constructs to be overridden.



          The use of the programming statement:



          using namespace std;


          hides the fact that cin, cout, vector and stack are coming from the namespace std where cin, cout,
          vector and stack are used in the code. This can cause confusion of where the code is actually
          coming from.



          As the software becomes more complex and uses more libraries this becomes a bigger problem.



          For a more detailed discussion of why it is a bad idea to use using namespace std; see this stackoverflow question and stackoverflow question.



          Reduce Complexity, Follow SRP

          The Single Responsibility Principle states that every module or class should have responsibility over a single part of the functionality provided by the software, and that responsibility should be entirely encapsulated by the class. All its services should be narrowly aligned with that responsibility.




          Robert C. Martin expresses the principle as follows:
          A class should have only one reason to change.




          While this is primarily targeted at classes in object oriented languages it applies to functions and subroutines well.



          The void make_segments(const std::vector<road> &roads, const std::vector<node> &nodes, const std::string &out_s) function could be broken up into at multiple functions. The following code is a good candidate for a function to be called by make_segments:



           //for every other road segment it according to heuristics
          for (size_t j = 1; j<roads[i].nodes.size() - minthrs; j++) write)
          //check for curvature or maxsize
          write_segment(out, segid, roads[i].id, coords, minsize, maxsize, tot_n);
          write = false;




          The more separate functions there are the easier it is to understand or read the code. This also makes it easier for any programmer to maintain or debug the code.



          Prefer Iterators Over Indexes for Container Classes

          The function void write_segment(std::ofstream &out, int &id, std::string way_id, std::vector<double> &coords, int &minsize, int &maxsize, int &tot_n) contains the following code:



           for (size_t i = 0; i<coords.size() - 1; i++) 
          out << coords[i] << ", ";



          A more modern and less C programming language like construct would be:



           for (auto coord_iterator: coords) 
          out << coord_iterator << ",";



          Using iterators in this manner makes it much easier to program correctly.



          Declare Variables as Close to Use as Possible

          In the loop defined as a possible candidate for a function there are 3 variables that should be defined within the loop rather than at the top of the function. The variables l1, l2 and l3 are only assigned values and used within the loop.



           //find the 3 sides of the triangle for every 3 points
          double l1 = euclid_dist(nodes[roads[i].nodes[j - 1]].lat, nodes[roads[i].nodes[j - 1]].lon,
          nodes[roads[i].nodes[j]].lat, nodes[roads[i].nodes[j]].lon);
          double l2 = euclid_dist(nodes[roads[i].nodes[j]].lat, nodes[roads[i].nodes[j]].lon,
          nodes[roads[i].nodes[j + 1]].lat, nodes[roads[i].nodes[j + 1]].lon);
          double l3 = euclid_dist(nodes[roads[i].nodes[j + 1]].lat, nodes[roads[i].nodes[j + 1]].lon,
          nodes[roads[i].nodes[j - 1]].lat, nodes[roads[i].nodes[j - 1]].lon);
          //compute circumcircle radius (curvature)
          curb = curvature(l1, l2, l3);
          count++;





          share|improve this answer













          Prefer Clear Naming over using namespace std



          According to the MSDN website:




          Namespaces are used to organize code into logical groups and to prevent name collisions that can occur especially when your code base includes multiple libraries.




          A collision is when 2 different functions have the same name, the same argument types and a similar functionality (this is why they have the same name). Someone developing software may want to override a function such as std::cout, std::cin or they may want to override the functionality of a class such as std::vector or std::stack. Namespaces allow these constructs to be overridden.



          The use of the programming statement:



          using namespace std;


          hides the fact that cin, cout, vector and stack are coming from the namespace std where cin, cout,
          vector and stack are used in the code. This can cause confusion of where the code is actually
          coming from.



          As the software becomes more complex and uses more libraries this becomes a bigger problem.



          For a more detailed discussion of why it is a bad idea to use using namespace std; see this stackoverflow question and stackoverflow question.



          Reduce Complexity, Follow SRP

          The Single Responsibility Principle states that every module or class should have responsibility over a single part of the functionality provided by the software, and that responsibility should be entirely encapsulated by the class. All its services should be narrowly aligned with that responsibility.




          Robert C. Martin expresses the principle as follows:
          A class should have only one reason to change.




          While this is primarily targeted at classes in object oriented languages it applies to functions and subroutines well.



          The void make_segments(const std::vector<road> &roads, const std::vector<node> &nodes, const std::string &out_s) function could be broken up into at multiple functions. The following code is a good candidate for a function to be called by make_segments:



           //for every other road segment it according to heuristics
          for (size_t j = 1; j<roads[i].nodes.size() - minthrs; j++) write)
          //check for curvature or maxsize
          write_segment(out, segid, roads[i].id, coords, minsize, maxsize, tot_n);
          write = false;




          The more separate functions there are the easier it is to understand or read the code. This also makes it easier for any programmer to maintain or debug the code.



          Prefer Iterators Over Indexes for Container Classes

          The function void write_segment(std::ofstream &out, int &id, std::string way_id, std::vector<double> &coords, int &minsize, int &maxsize, int &tot_n) contains the following code:



           for (size_t i = 0; i<coords.size() - 1; i++) 
          out << coords[i] << ", ";



          A more modern and less C programming language like construct would be:



           for (auto coord_iterator: coords) 
          out << coord_iterator << ",";



          Using iterators in this manner makes it much easier to program correctly.



          Declare Variables as Close to Use as Possible

          In the loop defined as a possible candidate for a function there are 3 variables that should be defined within the loop rather than at the top of the function. The variables l1, l2 and l3 are only assigned values and used within the loop.



           //find the 3 sides of the triangle for every 3 points
          double l1 = euclid_dist(nodes[roads[i].nodes[j - 1]].lat, nodes[roads[i].nodes[j - 1]].lon,
          nodes[roads[i].nodes[j]].lat, nodes[roads[i].nodes[j]].lon);
          double l2 = euclid_dist(nodes[roads[i].nodes[j]].lat, nodes[roads[i].nodes[j]].lon,
          nodes[roads[i].nodes[j + 1]].lat, nodes[roads[i].nodes[j + 1]].lon);
          double l3 = euclid_dist(nodes[roads[i].nodes[j + 1]].lat, nodes[roads[i].nodes[j + 1]].lon,
          nodes[roads[i].nodes[j - 1]].lat, nodes[roads[i].nodes[j - 1]].lon);
          //compute circumcircle radius (curvature)
          curb = curvature(l1, l2, l3);
          count++;






          share|improve this answer













          share|improve this answer



          share|improve this answer











          answered Jan 9 at 0:17









          pacmaninbw

          4,99321436




          4,99321436







          • 1




            Your for auto loop writes an extra comma at the end, which the original code does not.
            – Roland Illig
            Jan 9 at 0:57










          • @RolandIllig true.
            – pacmaninbw
            Jan 9 at 1:18










          • I thought about declaring the l1/2/3 variables inside the for loop but isn't the constant memory allocation bad for performance? Putting std:: I think it's going to be very hard for me but I read about its advantages.
            – Konstantinoscs
            Jan 9 at 2:58










          • @Konstantinoscs If you are using an optimizing compiler at the O3 level performance shouldn't be affected by these declarations. You actually have a nice little function there called find_the_triangle if you want it, if curb = find_the_triangle().
            – pacmaninbw
            Jan 9 at 3:23











          • Can you elaborate on the optimization a bit? Are there rules of thumb about what to not care about if you put -O?
            – Konstantinoscs
            Jan 9 at 3:42












          • 1




            Your for auto loop writes an extra comma at the end, which the original code does not.
            – Roland Illig
            Jan 9 at 0:57










          • @RolandIllig true.
            – pacmaninbw
            Jan 9 at 1:18










          • I thought about declaring the l1/2/3 variables inside the for loop but isn't the constant memory allocation bad for performance? Putting std:: I think it's going to be very hard for me but I read about its advantages.
            – Konstantinoscs
            Jan 9 at 2:58










          • @Konstantinoscs If you are using an optimizing compiler at the O3 level performance shouldn't be affected by these declarations. You actually have a nice little function there called find_the_triangle if you want it, if curb = find_the_triangle().
            – pacmaninbw
            Jan 9 at 3:23











          • Can you elaborate on the optimization a bit? Are there rules of thumb about what to not care about if you put -O?
            – Konstantinoscs
            Jan 9 at 3:42







          1




          1




          Your for auto loop writes an extra comma at the end, which the original code does not.
          – Roland Illig
          Jan 9 at 0:57




          Your for auto loop writes an extra comma at the end, which the original code does not.
          – Roland Illig
          Jan 9 at 0:57












          @RolandIllig true.
          – pacmaninbw
          Jan 9 at 1:18




          @RolandIllig true.
          – pacmaninbw
          Jan 9 at 1:18












          I thought about declaring the l1/2/3 variables inside the for loop but isn't the constant memory allocation bad for performance? Putting std:: I think it's going to be very hard for me but I read about its advantages.
          – Konstantinoscs
          Jan 9 at 2:58




          I thought about declaring the l1/2/3 variables inside the for loop but isn't the constant memory allocation bad for performance? Putting std:: I think it's going to be very hard for me but I read about its advantages.
          – Konstantinoscs
          Jan 9 at 2:58












          @Konstantinoscs If you are using an optimizing compiler at the O3 level performance shouldn't be affected by these declarations. You actually have a nice little function there called find_the_triangle if you want it, if curb = find_the_triangle().
          – pacmaninbw
          Jan 9 at 3:23





          @Konstantinoscs If you are using an optimizing compiler at the O3 level performance shouldn't be affected by these declarations. You actually have a nice little function there called find_the_triangle if you want it, if curb = find_the_triangle().
          – pacmaninbw
          Jan 9 at 3:23













          Can you elaborate on the optimization a bit? Are there rules of thumb about what to not care about if you put -O?
          – Konstantinoscs
          Jan 9 at 3:42




          Can you elaborate on the optimization a bit? Are there rules of thumb about what to not care about if you put -O?
          – Konstantinoscs
          Jan 9 at 3:42












           

          draft saved


          draft discarded


























           


          draft saved


          draft discarded














          StackExchange.ready(
          function ()
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f184588%2fsegment-road-map-data-with-heuristics%23new-answer', 'question_page');

          );

          Post as a guest













































































          Popular posts from this blog

          Python Lists

          Aion

          JavaScript Array Iteration Methods