Speed up a simple edge connecting algorithm

The name of the pictureThe name of the pictureThe name of the pictureClash Royale CLAN TAG#URR8PPP





.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty margin-bottom:0;







up vote
3
down vote

favorite












I have a list of edges and a similar list of links (linking edges) telling which edges to connect (how to order them). Connected edges form polygons and there can be many disconnected polygons (ordering of which doesn't matter). For example:



edges = [((2, 1), (3, 1)), ((4, 1), (6, 1)), ((6, 4), (4, 4)), ((3, 6), (2, 6))]
links = [((2, 6), (2, 1)), ((3, 1), (3, 6)), ((4, 4), (4, 1)), ((6, 1), (6, 4))]


We would connect the edge ((2, 1), (3, 1)) with the edge ((3, 6), (2, 6)) because the linking edge ((3, 1), (3, 6)) is in links, etc. The result would be ordering of edges as:



[((2, 1), (3, 1)), ((3, 6), (2, 6)), ((4, 1), (6, 1)), ((6, 4), (4, 4))]


where ((2, 1), (3, 1)), ((3, 6), (2, 6)) forms a polygon and ((4, 1), (6, 1)), ((6, 4), (4, 4)) another one.



So the idea is to check if (edge[i][1], edge[j][0]) is in links, for all i!=j. If true, then connect the edge[i] with edge[j] by putting the latter after the former, and repeat with (edge[j][1], edge[k][0]) etc.



I use the following function to do it:



def connect_edges(edges, links):
edges = dict(edges)
links = dict(links)
seen = set()
edges_connected =
for e in edges.items():
while e not in seen:
seen.add(e)
edges_connected.append(e)
e = (links[e[1]], edges[links[e[1]]])
return edges_connected


It is quite fast, but for a very large number of edges (e.g. 1 million) it takes almost 20 seconds on my old laptop. Is there a way to speed it up? I think it is more or less optimal and maybe could be cynthonized for a gain in speed, but the problem is that it is very pythonic (having lists of tuples, dictionaries, sets) and I don't know if it will work (my knowledge of cython is very limited). Simply cythonizing without declaring types gives minuscule speed-up.







share|improve this question





















  • You're talking about graphs, so are edges really vertices and are links actually edges? Are links directed? Can they only connect in one direction? When you write "polygon", do you mean "polyline"? Polygons should be rings, with the last vertex being the first one.
    – Eric Duminil
    Mar 3 at 10:07






  • 1




    edges and links are both lists of edges (horizontal edges and vertical edges of a rectilinear polygon, respectively). For example, in ((2, 1), (3, 1)) -- ((3, 6), (2, 6)), the edges ((2, 1), (3, 1)) and ((3, 6), (2, 6)) are horizontal edges fromedges and the ((3, 1), (3, 6)) is a vertical edge from links. If it is confusing, you indeed can think of elements of edgesas nodes and the elements of links as edges connecting the nodes of a graph.
    – Andyk
    Mar 3 at 10:39











  • Technically they form polylines as you say, but later on I add the first vertex to the end to close it (I didn't show it here in order to make the code simpler). And yes, they are directed in the sense that they form a polygon if the vertices are connected in the given order e.g. (2, 1) -- (3, 1) -- (3, 6) -- (2, 6) -- (2,1). Sorry for the confusion.
    – Andyk
    Mar 3 at 10:40










  • Thanks for the answer. Just curious : why do you need it? Don't you need more information, e.g. each polyline grouped together instead of being concatenated into a single list, without separator? Aren't there multiple, possible orderings?
    – Eric Duminil
    Mar 3 at 22:36
















up vote
3
down vote

favorite












I have a list of edges and a similar list of links (linking edges) telling which edges to connect (how to order them). Connected edges form polygons and there can be many disconnected polygons (ordering of which doesn't matter). For example:



edges = [((2, 1), (3, 1)), ((4, 1), (6, 1)), ((6, 4), (4, 4)), ((3, 6), (2, 6))]
links = [((2, 6), (2, 1)), ((3, 1), (3, 6)), ((4, 4), (4, 1)), ((6, 1), (6, 4))]


We would connect the edge ((2, 1), (3, 1)) with the edge ((3, 6), (2, 6)) because the linking edge ((3, 1), (3, 6)) is in links, etc. The result would be ordering of edges as:



[((2, 1), (3, 1)), ((3, 6), (2, 6)), ((4, 1), (6, 1)), ((6, 4), (4, 4))]


where ((2, 1), (3, 1)), ((3, 6), (2, 6)) forms a polygon and ((4, 1), (6, 1)), ((6, 4), (4, 4)) another one.



So the idea is to check if (edge[i][1], edge[j][0]) is in links, for all i!=j. If true, then connect the edge[i] with edge[j] by putting the latter after the former, and repeat with (edge[j][1], edge[k][0]) etc.



I use the following function to do it:



def connect_edges(edges, links):
edges = dict(edges)
links = dict(links)
seen = set()
edges_connected =
for e in edges.items():
while e not in seen:
seen.add(e)
edges_connected.append(e)
e = (links[e[1]], edges[links[e[1]]])
return edges_connected


It is quite fast, but for a very large number of edges (e.g. 1 million) it takes almost 20 seconds on my old laptop. Is there a way to speed it up? I think it is more or less optimal and maybe could be cynthonized for a gain in speed, but the problem is that it is very pythonic (having lists of tuples, dictionaries, sets) and I don't know if it will work (my knowledge of cython is very limited). Simply cythonizing without declaring types gives minuscule speed-up.







share|improve this question





















  • You're talking about graphs, so are edges really vertices and are links actually edges? Are links directed? Can they only connect in one direction? When you write "polygon", do you mean "polyline"? Polygons should be rings, with the last vertex being the first one.
    – Eric Duminil
    Mar 3 at 10:07






  • 1




    edges and links are both lists of edges (horizontal edges and vertical edges of a rectilinear polygon, respectively). For example, in ((2, 1), (3, 1)) -- ((3, 6), (2, 6)), the edges ((2, 1), (3, 1)) and ((3, 6), (2, 6)) are horizontal edges fromedges and the ((3, 1), (3, 6)) is a vertical edge from links. If it is confusing, you indeed can think of elements of edgesas nodes and the elements of links as edges connecting the nodes of a graph.
    – Andyk
    Mar 3 at 10:39











  • Technically they form polylines as you say, but later on I add the first vertex to the end to close it (I didn't show it here in order to make the code simpler). And yes, they are directed in the sense that they form a polygon if the vertices are connected in the given order e.g. (2, 1) -- (3, 1) -- (3, 6) -- (2, 6) -- (2,1). Sorry for the confusion.
    – Andyk
    Mar 3 at 10:40










  • Thanks for the answer. Just curious : why do you need it? Don't you need more information, e.g. each polyline grouped together instead of being concatenated into a single list, without separator? Aren't there multiple, possible orderings?
    – Eric Duminil
    Mar 3 at 22:36












up vote
3
down vote

favorite









up vote
3
down vote

favorite











I have a list of edges and a similar list of links (linking edges) telling which edges to connect (how to order them). Connected edges form polygons and there can be many disconnected polygons (ordering of which doesn't matter). For example:



edges = [((2, 1), (3, 1)), ((4, 1), (6, 1)), ((6, 4), (4, 4)), ((3, 6), (2, 6))]
links = [((2, 6), (2, 1)), ((3, 1), (3, 6)), ((4, 4), (4, 1)), ((6, 1), (6, 4))]


We would connect the edge ((2, 1), (3, 1)) with the edge ((3, 6), (2, 6)) because the linking edge ((3, 1), (3, 6)) is in links, etc. The result would be ordering of edges as:



[((2, 1), (3, 1)), ((3, 6), (2, 6)), ((4, 1), (6, 1)), ((6, 4), (4, 4))]


where ((2, 1), (3, 1)), ((3, 6), (2, 6)) forms a polygon and ((4, 1), (6, 1)), ((6, 4), (4, 4)) another one.



So the idea is to check if (edge[i][1], edge[j][0]) is in links, for all i!=j. If true, then connect the edge[i] with edge[j] by putting the latter after the former, and repeat with (edge[j][1], edge[k][0]) etc.



I use the following function to do it:



def connect_edges(edges, links):
edges = dict(edges)
links = dict(links)
seen = set()
edges_connected =
for e in edges.items():
while e not in seen:
seen.add(e)
edges_connected.append(e)
e = (links[e[1]], edges[links[e[1]]])
return edges_connected


It is quite fast, but for a very large number of edges (e.g. 1 million) it takes almost 20 seconds on my old laptop. Is there a way to speed it up? I think it is more or less optimal and maybe could be cynthonized for a gain in speed, but the problem is that it is very pythonic (having lists of tuples, dictionaries, sets) and I don't know if it will work (my knowledge of cython is very limited). Simply cythonizing without declaring types gives minuscule speed-up.







share|improve this question













I have a list of edges and a similar list of links (linking edges) telling which edges to connect (how to order them). Connected edges form polygons and there can be many disconnected polygons (ordering of which doesn't matter). For example:



edges = [((2, 1), (3, 1)), ((4, 1), (6, 1)), ((6, 4), (4, 4)), ((3, 6), (2, 6))]
links = [((2, 6), (2, 1)), ((3, 1), (3, 6)), ((4, 4), (4, 1)), ((6, 1), (6, 4))]


We would connect the edge ((2, 1), (3, 1)) with the edge ((3, 6), (2, 6)) because the linking edge ((3, 1), (3, 6)) is in links, etc. The result would be ordering of edges as:



[((2, 1), (3, 1)), ((3, 6), (2, 6)), ((4, 1), (6, 1)), ((6, 4), (4, 4))]


where ((2, 1), (3, 1)), ((3, 6), (2, 6)) forms a polygon and ((4, 1), (6, 1)), ((6, 4), (4, 4)) another one.



So the idea is to check if (edge[i][1], edge[j][0]) is in links, for all i!=j. If true, then connect the edge[i] with edge[j] by putting the latter after the former, and repeat with (edge[j][1], edge[k][0]) etc.



I use the following function to do it:



def connect_edges(edges, links):
edges = dict(edges)
links = dict(links)
seen = set()
edges_connected =
for e in edges.items():
while e not in seen:
seen.add(e)
edges_connected.append(e)
e = (links[e[1]], edges[links[e[1]]])
return edges_connected


It is quite fast, but for a very large number of edges (e.g. 1 million) it takes almost 20 seconds on my old laptop. Is there a way to speed it up? I think it is more or less optimal and maybe could be cynthonized for a gain in speed, but the problem is that it is very pythonic (having lists of tuples, dictionaries, sets) and I don't know if it will work (my knowledge of cython is very limited). Simply cythonizing without declaring types gives minuscule speed-up.









share|improve this question












share|improve this question




share|improve this question








edited Mar 2 at 16:34
























asked Mar 2 at 16:21









Andyk

1166




1166











  • You're talking about graphs, so are edges really vertices and are links actually edges? Are links directed? Can they only connect in one direction? When you write "polygon", do you mean "polyline"? Polygons should be rings, with the last vertex being the first one.
    – Eric Duminil
    Mar 3 at 10:07






  • 1




    edges and links are both lists of edges (horizontal edges and vertical edges of a rectilinear polygon, respectively). For example, in ((2, 1), (3, 1)) -- ((3, 6), (2, 6)), the edges ((2, 1), (3, 1)) and ((3, 6), (2, 6)) are horizontal edges fromedges and the ((3, 1), (3, 6)) is a vertical edge from links. If it is confusing, you indeed can think of elements of edgesas nodes and the elements of links as edges connecting the nodes of a graph.
    – Andyk
    Mar 3 at 10:39











  • Technically they form polylines as you say, but later on I add the first vertex to the end to close it (I didn't show it here in order to make the code simpler). And yes, they are directed in the sense that they form a polygon if the vertices are connected in the given order e.g. (2, 1) -- (3, 1) -- (3, 6) -- (2, 6) -- (2,1). Sorry for the confusion.
    – Andyk
    Mar 3 at 10:40










  • Thanks for the answer. Just curious : why do you need it? Don't you need more information, e.g. each polyline grouped together instead of being concatenated into a single list, without separator? Aren't there multiple, possible orderings?
    – Eric Duminil
    Mar 3 at 22:36
















  • You're talking about graphs, so are edges really vertices and are links actually edges? Are links directed? Can they only connect in one direction? When you write "polygon", do you mean "polyline"? Polygons should be rings, with the last vertex being the first one.
    – Eric Duminil
    Mar 3 at 10:07






  • 1




    edges and links are both lists of edges (horizontal edges and vertical edges of a rectilinear polygon, respectively). For example, in ((2, 1), (3, 1)) -- ((3, 6), (2, 6)), the edges ((2, 1), (3, 1)) and ((3, 6), (2, 6)) are horizontal edges fromedges and the ((3, 1), (3, 6)) is a vertical edge from links. If it is confusing, you indeed can think of elements of edgesas nodes and the elements of links as edges connecting the nodes of a graph.
    – Andyk
    Mar 3 at 10:39











  • Technically they form polylines as you say, but later on I add the first vertex to the end to close it (I didn't show it here in order to make the code simpler). And yes, they are directed in the sense that they form a polygon if the vertices are connected in the given order e.g. (2, 1) -- (3, 1) -- (3, 6) -- (2, 6) -- (2,1). Sorry for the confusion.
    – Andyk
    Mar 3 at 10:40










  • Thanks for the answer. Just curious : why do you need it? Don't you need more information, e.g. each polyline grouped together instead of being concatenated into a single list, without separator? Aren't there multiple, possible orderings?
    – Eric Duminil
    Mar 3 at 22:36















You're talking about graphs, so are edges really vertices and are links actually edges? Are links directed? Can they only connect in one direction? When you write "polygon", do you mean "polyline"? Polygons should be rings, with the last vertex being the first one.
– Eric Duminil
Mar 3 at 10:07




You're talking about graphs, so are edges really vertices and are links actually edges? Are links directed? Can they only connect in one direction? When you write "polygon", do you mean "polyline"? Polygons should be rings, with the last vertex being the first one.
– Eric Duminil
Mar 3 at 10:07




1




1




edges and links are both lists of edges (horizontal edges and vertical edges of a rectilinear polygon, respectively). For example, in ((2, 1), (3, 1)) -- ((3, 6), (2, 6)), the edges ((2, 1), (3, 1)) and ((3, 6), (2, 6)) are horizontal edges fromedges and the ((3, 1), (3, 6)) is a vertical edge from links. If it is confusing, you indeed can think of elements of edgesas nodes and the elements of links as edges connecting the nodes of a graph.
– Andyk
Mar 3 at 10:39





edges and links are both lists of edges (horizontal edges and vertical edges of a rectilinear polygon, respectively). For example, in ((2, 1), (3, 1)) -- ((3, 6), (2, 6)), the edges ((2, 1), (3, 1)) and ((3, 6), (2, 6)) are horizontal edges fromedges and the ((3, 1), (3, 6)) is a vertical edge from links. If it is confusing, you indeed can think of elements of edgesas nodes and the elements of links as edges connecting the nodes of a graph.
– Andyk
Mar 3 at 10:39













Technically they form polylines as you say, but later on I add the first vertex to the end to close it (I didn't show it here in order to make the code simpler). And yes, they are directed in the sense that they form a polygon if the vertices are connected in the given order e.g. (2, 1) -- (3, 1) -- (3, 6) -- (2, 6) -- (2,1). Sorry for the confusion.
– Andyk
Mar 3 at 10:40




Technically they form polylines as you say, but later on I add the first vertex to the end to close it (I didn't show it here in order to make the code simpler). And yes, they are directed in the sense that they form a polygon if the vertices are connected in the given order e.g. (2, 1) -- (3, 1) -- (3, 6) -- (2, 6) -- (2,1). Sorry for the confusion.
– Andyk
Mar 3 at 10:40












Thanks for the answer. Just curious : why do you need it? Don't you need more information, e.g. each polyline grouped together instead of being concatenated into a single list, without separator? Aren't there multiple, possible orderings?
– Eric Duminil
Mar 3 at 22:36




Thanks for the answer. Just curious : why do you need it? Don't you need more information, e.g. each polyline grouped together instead of being concatenated into a single list, without separator? Aren't there multiple, possible orderings?
– Eric Duminil
Mar 3 at 22:36










1 Answer
1






active

oldest

votes

















up vote
1
down vote













Just a collection of thoughts...



Terminology



Since you're talking about graphs, it might help the readers if you talk about nodes (your edges) and edges (your links).



Your graph is directed, and as far as I can tell, you're looking for the "strongly connected components".



Use simpler objects



Lists of pairs of pairs aren't very readable IMHO. When showing examples, you can replace them with lists of strings:



edges = ['AB', 'CD', 'EF', 'GH']
links = ['HA', 'BG', 'FC', 'DE']


Python strings are basically tuples of characters, so you don't have to change much in your code. The above example is isomorphic to yours and is much more concise.



KeyError



When calling edges[links[e[1]], you're hoping there won't be any KeyError. Your example would fail with :



edges = ['AB', 'CD', 'EF', 'GH']
links = ['HA', 'BG', 'FC']


Links between multiple edges



What happens in the case of link connecting more than 2 edges, for example



edges = ['AB', 'GH', 'AC']
links = ['HA']


? Your current algorithm doesn't seem to take it into account. A defaultdict(list) might help, for example to keep a list of nodes beginning with A and another list for nodes finishing with H.



Output format



If you only sort the edges, without grouping them in connected components, you're losing information which you have to recover later. You could return a list of sets of edges instead of a flat list of edges.



['AB', 'GH', 'CD', 'EF']


networkx



If you work with graph theory in Python, you should take a look at networkx. It's fast, easy to use and offers many algorithms. All you need to do is to preprocess your data and feed it to a nx.DiGraph.



Your example could become:



import networkx as nx
from collections import defaultdict
import matplotlib

edges = ['AB', 'CD', 'EF', 'GH']
links = ['HA', 'BG', 'FC', 'DE']

G = nx.DiGraph()

from_dict = defaultdict(list)
to_dict = defaultdict(list)

for edge in edges:
node = ''.join(edge)
G.add_node(node)
to_dict[node[0]].append(node)
from_dict[node[1]].append(node)

for link in links:
for node1 in from_dict[link[0]]:
for node2 in to_dict[link[1]]:
G.add_edge(node1, node2)

list(nx.strongly_connected_components(G))
# ['AB', 'GH', 'CD', 'EF']
nx.draw(G, with_labels = True)


It displays:



enter image description here



You get more information, it's more robust, you get diagrams if you want, it can handle more complex cases and (who knows?) it might be faster than your solution for large datasets.






share|improve this answer





















    Your Answer




    StackExchange.ifUsing("editor", function ()
    return StackExchange.using("mathjaxEditing", function ()
    StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
    StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
    );
    );
    , "mathjax-editing");

    StackExchange.ifUsing("editor", function ()
    StackExchange.using("externalEditor", function ()
    StackExchange.using("snippets", function ()
    StackExchange.snippets.init();
    );
    );
    , "code-snippets");

    StackExchange.ready(function()
    var channelOptions =
    tags: "".split(" "),
    id: "196"
    ;
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function()
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled)
    StackExchange.using("snippets", function()
    createEditor();
    );

    else
    createEditor();

    );

    function createEditor()
    StackExchange.prepareEditor(
    heartbeatType: 'answer',
    convertImagesToLinks: false,
    noModals: false,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: null,
    bindNavPrevention: true,
    postfix: "",
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    );



    );








     

    draft saved


    draft discarded


















    StackExchange.ready(
    function ()
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f188685%2fspeed-up-a-simple-edge-connecting-algorithm%23new-answer', 'question_page');

    );

    Post as a guest






























    1 Answer
    1






    active

    oldest

    votes








    1 Answer
    1






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes








    up vote
    1
    down vote













    Just a collection of thoughts...



    Terminology



    Since you're talking about graphs, it might help the readers if you talk about nodes (your edges) and edges (your links).



    Your graph is directed, and as far as I can tell, you're looking for the "strongly connected components".



    Use simpler objects



    Lists of pairs of pairs aren't very readable IMHO. When showing examples, you can replace them with lists of strings:



    edges = ['AB', 'CD', 'EF', 'GH']
    links = ['HA', 'BG', 'FC', 'DE']


    Python strings are basically tuples of characters, so you don't have to change much in your code. The above example is isomorphic to yours and is much more concise.



    KeyError



    When calling edges[links[e[1]], you're hoping there won't be any KeyError. Your example would fail with :



    edges = ['AB', 'CD', 'EF', 'GH']
    links = ['HA', 'BG', 'FC']


    Links between multiple edges



    What happens in the case of link connecting more than 2 edges, for example



    edges = ['AB', 'GH', 'AC']
    links = ['HA']


    ? Your current algorithm doesn't seem to take it into account. A defaultdict(list) might help, for example to keep a list of nodes beginning with A and another list for nodes finishing with H.



    Output format



    If you only sort the edges, without grouping them in connected components, you're losing information which you have to recover later. You could return a list of sets of edges instead of a flat list of edges.



    ['AB', 'GH', 'CD', 'EF']


    networkx



    If you work with graph theory in Python, you should take a look at networkx. It's fast, easy to use and offers many algorithms. All you need to do is to preprocess your data and feed it to a nx.DiGraph.



    Your example could become:



    import networkx as nx
    from collections import defaultdict
    import matplotlib

    edges = ['AB', 'CD', 'EF', 'GH']
    links = ['HA', 'BG', 'FC', 'DE']

    G = nx.DiGraph()

    from_dict = defaultdict(list)
    to_dict = defaultdict(list)

    for edge in edges:
    node = ''.join(edge)
    G.add_node(node)
    to_dict[node[0]].append(node)
    from_dict[node[1]].append(node)

    for link in links:
    for node1 in from_dict[link[0]]:
    for node2 in to_dict[link[1]]:
    G.add_edge(node1, node2)

    list(nx.strongly_connected_components(G))
    # ['AB', 'GH', 'CD', 'EF']
    nx.draw(G, with_labels = True)


    It displays:



    enter image description here



    You get more information, it's more robust, you get diagrams if you want, it can handle more complex cases and (who knows?) it might be faster than your solution for large datasets.






    share|improve this answer

























      up vote
      1
      down vote













      Just a collection of thoughts...



      Terminology



      Since you're talking about graphs, it might help the readers if you talk about nodes (your edges) and edges (your links).



      Your graph is directed, and as far as I can tell, you're looking for the "strongly connected components".



      Use simpler objects



      Lists of pairs of pairs aren't very readable IMHO. When showing examples, you can replace them with lists of strings:



      edges = ['AB', 'CD', 'EF', 'GH']
      links = ['HA', 'BG', 'FC', 'DE']


      Python strings are basically tuples of characters, so you don't have to change much in your code. The above example is isomorphic to yours and is much more concise.



      KeyError



      When calling edges[links[e[1]], you're hoping there won't be any KeyError. Your example would fail with :



      edges = ['AB', 'CD', 'EF', 'GH']
      links = ['HA', 'BG', 'FC']


      Links between multiple edges



      What happens in the case of link connecting more than 2 edges, for example



      edges = ['AB', 'GH', 'AC']
      links = ['HA']


      ? Your current algorithm doesn't seem to take it into account. A defaultdict(list) might help, for example to keep a list of nodes beginning with A and another list for nodes finishing with H.



      Output format



      If you only sort the edges, without grouping them in connected components, you're losing information which you have to recover later. You could return a list of sets of edges instead of a flat list of edges.



      ['AB', 'GH', 'CD', 'EF']


      networkx



      If you work with graph theory in Python, you should take a look at networkx. It's fast, easy to use and offers many algorithms. All you need to do is to preprocess your data and feed it to a nx.DiGraph.



      Your example could become:



      import networkx as nx
      from collections import defaultdict
      import matplotlib

      edges = ['AB', 'CD', 'EF', 'GH']
      links = ['HA', 'BG', 'FC', 'DE']

      G = nx.DiGraph()

      from_dict = defaultdict(list)
      to_dict = defaultdict(list)

      for edge in edges:
      node = ''.join(edge)
      G.add_node(node)
      to_dict[node[0]].append(node)
      from_dict[node[1]].append(node)

      for link in links:
      for node1 in from_dict[link[0]]:
      for node2 in to_dict[link[1]]:
      G.add_edge(node1, node2)

      list(nx.strongly_connected_components(G))
      # ['AB', 'GH', 'CD', 'EF']
      nx.draw(G, with_labels = True)


      It displays:



      enter image description here



      You get more information, it's more robust, you get diagrams if you want, it can handle more complex cases and (who knows?) it might be faster than your solution for large datasets.






      share|improve this answer























        up vote
        1
        down vote










        up vote
        1
        down vote









        Just a collection of thoughts...



        Terminology



        Since you're talking about graphs, it might help the readers if you talk about nodes (your edges) and edges (your links).



        Your graph is directed, and as far as I can tell, you're looking for the "strongly connected components".



        Use simpler objects



        Lists of pairs of pairs aren't very readable IMHO. When showing examples, you can replace them with lists of strings:



        edges = ['AB', 'CD', 'EF', 'GH']
        links = ['HA', 'BG', 'FC', 'DE']


        Python strings are basically tuples of characters, so you don't have to change much in your code. The above example is isomorphic to yours and is much more concise.



        KeyError



        When calling edges[links[e[1]], you're hoping there won't be any KeyError. Your example would fail with :



        edges = ['AB', 'CD', 'EF', 'GH']
        links = ['HA', 'BG', 'FC']


        Links between multiple edges



        What happens in the case of link connecting more than 2 edges, for example



        edges = ['AB', 'GH', 'AC']
        links = ['HA']


        ? Your current algorithm doesn't seem to take it into account. A defaultdict(list) might help, for example to keep a list of nodes beginning with A and another list for nodes finishing with H.



        Output format



        If you only sort the edges, without grouping them in connected components, you're losing information which you have to recover later. You could return a list of sets of edges instead of a flat list of edges.



        ['AB', 'GH', 'CD', 'EF']


        networkx



        If you work with graph theory in Python, you should take a look at networkx. It's fast, easy to use and offers many algorithms. All you need to do is to preprocess your data and feed it to a nx.DiGraph.



        Your example could become:



        import networkx as nx
        from collections import defaultdict
        import matplotlib

        edges = ['AB', 'CD', 'EF', 'GH']
        links = ['HA', 'BG', 'FC', 'DE']

        G = nx.DiGraph()

        from_dict = defaultdict(list)
        to_dict = defaultdict(list)

        for edge in edges:
        node = ''.join(edge)
        G.add_node(node)
        to_dict[node[0]].append(node)
        from_dict[node[1]].append(node)

        for link in links:
        for node1 in from_dict[link[0]]:
        for node2 in to_dict[link[1]]:
        G.add_edge(node1, node2)

        list(nx.strongly_connected_components(G))
        # ['AB', 'GH', 'CD', 'EF']
        nx.draw(G, with_labels = True)


        It displays:



        enter image description here



        You get more information, it's more robust, you get diagrams if you want, it can handle more complex cases and (who knows?) it might be faster than your solution for large datasets.






        share|improve this answer













        Just a collection of thoughts...



        Terminology



        Since you're talking about graphs, it might help the readers if you talk about nodes (your edges) and edges (your links).



        Your graph is directed, and as far as I can tell, you're looking for the "strongly connected components".



        Use simpler objects



        Lists of pairs of pairs aren't very readable IMHO. When showing examples, you can replace them with lists of strings:



        edges = ['AB', 'CD', 'EF', 'GH']
        links = ['HA', 'BG', 'FC', 'DE']


        Python strings are basically tuples of characters, so you don't have to change much in your code. The above example is isomorphic to yours and is much more concise.



        KeyError



        When calling edges[links[e[1]], you're hoping there won't be any KeyError. Your example would fail with :



        edges = ['AB', 'CD', 'EF', 'GH']
        links = ['HA', 'BG', 'FC']


        Links between multiple edges



        What happens in the case of link connecting more than 2 edges, for example



        edges = ['AB', 'GH', 'AC']
        links = ['HA']


        ? Your current algorithm doesn't seem to take it into account. A defaultdict(list) might help, for example to keep a list of nodes beginning with A and another list for nodes finishing with H.



        Output format



        If you only sort the edges, without grouping them in connected components, you're losing information which you have to recover later. You could return a list of sets of edges instead of a flat list of edges.



        ['AB', 'GH', 'CD', 'EF']


        networkx



        If you work with graph theory in Python, you should take a look at networkx. It's fast, easy to use and offers many algorithms. All you need to do is to preprocess your data and feed it to a nx.DiGraph.



        Your example could become:



        import networkx as nx
        from collections import defaultdict
        import matplotlib

        edges = ['AB', 'CD', 'EF', 'GH']
        links = ['HA', 'BG', 'FC', 'DE']

        G = nx.DiGraph()

        from_dict = defaultdict(list)
        to_dict = defaultdict(list)

        for edge in edges:
        node = ''.join(edge)
        G.add_node(node)
        to_dict[node[0]].append(node)
        from_dict[node[1]].append(node)

        for link in links:
        for node1 in from_dict[link[0]]:
        for node2 in to_dict[link[1]]:
        G.add_edge(node1, node2)

        list(nx.strongly_connected_components(G))
        # ['AB', 'GH', 'CD', 'EF']
        nx.draw(G, with_labels = True)


        It displays:



        enter image description here



        You get more information, it's more robust, you get diagrams if you want, it can handle more complex cases and (who knows?) it might be faster than your solution for large datasets.







        share|improve this answer













        share|improve this answer



        share|improve this answer











        answered Mar 3 at 23:56









        Eric Duminil

        1,8501613




        1,8501613






















             

            draft saved


            draft discarded


























             


            draft saved


            draft discarded














            StackExchange.ready(
            function ()
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f188685%2fspeed-up-a-simple-edge-connecting-algorithm%23new-answer', 'question_page');

            );

            Post as a guest













































































            Popular posts from this blog

            Greedy Best First Search implementation in Rust

            Function to Return a JSON Like Objects Using VBA Collections and Arrays

            C++11 CLH Lock Implementation