Speed up a simple edge connecting algorithm
Clash Royale CLAN TAG#URR8PPP
.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty margin-bottom:0;
up vote
3
down vote
favorite
I have a list of edges and a similar list of links (linking edges) telling which edges to connect (how to order them). Connected edges form polygons and there can be many disconnected polygons (ordering of which doesn't matter). For example:
edges = [((2, 1), (3, 1)), ((4, 1), (6, 1)), ((6, 4), (4, 4)), ((3, 6), (2, 6))]
links = [((2, 6), (2, 1)), ((3, 1), (3, 6)), ((4, 4), (4, 1)), ((6, 1), (6, 4))]
We would connect the edge ((2, 1), (3, 1))
with the edge ((3, 6), (2, 6))
because the linking edge ((3, 1), (3, 6))
is in links
, etc. The result would be ordering of edges
as:
[((2, 1), (3, 1)), ((3, 6), (2, 6)), ((4, 1), (6, 1)), ((6, 4), (4, 4))]
where ((2, 1), (3, 1)), ((3, 6), (2, 6))
forms a polygon and ((4, 1), (6, 1)), ((6, 4), (4, 4))
another one.
So the idea is to check if (edge[i][1], edge[j][0])
is in links
, for all i!=j
. If true, then connect the edge[i]
with edge[j]
by putting the latter after the former, and repeat with (edge[j][1], edge[k][0])
etc.
I use the following function to do it:
def connect_edges(edges, links):
edges = dict(edges)
links = dict(links)
seen = set()
edges_connected =
for e in edges.items():
while e not in seen:
seen.add(e)
edges_connected.append(e)
e = (links[e[1]], edges[links[e[1]]])
return edges_connected
It is quite fast, but for a very large number of edges (e.g. 1 million) it takes almost 20 seconds on my old laptop. Is there a way to speed it up? I think it is more or less optimal and maybe could be cynthonized for a gain in speed, but the problem is that it is very pythonic (having lists of tuples, dictionaries, sets) and I don't know if it will work (my knowledge of cython is very limited). Simply cythonizing without declaring types gives minuscule speed-up.
python performance algorithm graph cython
add a comment |Â
up vote
3
down vote
favorite
I have a list of edges and a similar list of links (linking edges) telling which edges to connect (how to order them). Connected edges form polygons and there can be many disconnected polygons (ordering of which doesn't matter). For example:
edges = [((2, 1), (3, 1)), ((4, 1), (6, 1)), ((6, 4), (4, 4)), ((3, 6), (2, 6))]
links = [((2, 6), (2, 1)), ((3, 1), (3, 6)), ((4, 4), (4, 1)), ((6, 1), (6, 4))]
We would connect the edge ((2, 1), (3, 1))
with the edge ((3, 6), (2, 6))
because the linking edge ((3, 1), (3, 6))
is in links
, etc. The result would be ordering of edges
as:
[((2, 1), (3, 1)), ((3, 6), (2, 6)), ((4, 1), (6, 1)), ((6, 4), (4, 4))]
where ((2, 1), (3, 1)), ((3, 6), (2, 6))
forms a polygon and ((4, 1), (6, 1)), ((6, 4), (4, 4))
another one.
So the idea is to check if (edge[i][1], edge[j][0])
is in links
, for all i!=j
. If true, then connect the edge[i]
with edge[j]
by putting the latter after the former, and repeat with (edge[j][1], edge[k][0])
etc.
I use the following function to do it:
def connect_edges(edges, links):
edges = dict(edges)
links = dict(links)
seen = set()
edges_connected =
for e in edges.items():
while e not in seen:
seen.add(e)
edges_connected.append(e)
e = (links[e[1]], edges[links[e[1]]])
return edges_connected
It is quite fast, but for a very large number of edges (e.g. 1 million) it takes almost 20 seconds on my old laptop. Is there a way to speed it up? I think it is more or less optimal and maybe could be cynthonized for a gain in speed, but the problem is that it is very pythonic (having lists of tuples, dictionaries, sets) and I don't know if it will work (my knowledge of cython is very limited). Simply cythonizing without declaring types gives minuscule speed-up.
python performance algorithm graph cython
You're talking about graphs, so areedges
really vertices and arelinks
actually edges? Arelinks
directed? Can they only connect in one direction? When you write "polygon", do you mean "polyline"? Polygons should be rings, with the last vertex being the first one.
â Eric Duminil
Mar 3 at 10:07
1
edges
andlinks
are both lists of edges (horizontal edges and vertical edges of a rectilinear polygon, respectively). For example, in ((2, 1), (3, 1)) -- ((3, 6), (2, 6)), the edges ((2, 1), (3, 1)) and ((3, 6), (2, 6)) are horizontal edges fromedges
and the ((3, 1), (3, 6)) is a vertical edge fromlinks
. If it is confusing, you indeed can think of elements ofedges
as nodes and the elements oflinks
as edges connecting the nodes of a graph.
â Andyk
Mar 3 at 10:39
Technically they form polylines as you say, but later on I add the first vertex to the end to close it (I didn't show it here in order to make the code simpler). And yes, they are directed in the sense that they form a polygon if the vertices are connected in the given order e.g. (2, 1) -- (3, 1) -- (3, 6) -- (2, 6) -- (2,1). Sorry for the confusion.
â Andyk
Mar 3 at 10:40
Thanks for the answer. Just curious : why do you need it? Don't you need more information, e.g. each polyline grouped together instead of being concatenated into a single list, without separator? Aren't there multiple, possible orderings?
â Eric Duminil
Mar 3 at 22:36
add a comment |Â
up vote
3
down vote
favorite
up vote
3
down vote
favorite
I have a list of edges and a similar list of links (linking edges) telling which edges to connect (how to order them). Connected edges form polygons and there can be many disconnected polygons (ordering of which doesn't matter). For example:
edges = [((2, 1), (3, 1)), ((4, 1), (6, 1)), ((6, 4), (4, 4)), ((3, 6), (2, 6))]
links = [((2, 6), (2, 1)), ((3, 1), (3, 6)), ((4, 4), (4, 1)), ((6, 1), (6, 4))]
We would connect the edge ((2, 1), (3, 1))
with the edge ((3, 6), (2, 6))
because the linking edge ((3, 1), (3, 6))
is in links
, etc. The result would be ordering of edges
as:
[((2, 1), (3, 1)), ((3, 6), (2, 6)), ((4, 1), (6, 1)), ((6, 4), (4, 4))]
where ((2, 1), (3, 1)), ((3, 6), (2, 6))
forms a polygon and ((4, 1), (6, 1)), ((6, 4), (4, 4))
another one.
So the idea is to check if (edge[i][1], edge[j][0])
is in links
, for all i!=j
. If true, then connect the edge[i]
with edge[j]
by putting the latter after the former, and repeat with (edge[j][1], edge[k][0])
etc.
I use the following function to do it:
def connect_edges(edges, links):
edges = dict(edges)
links = dict(links)
seen = set()
edges_connected =
for e in edges.items():
while e not in seen:
seen.add(e)
edges_connected.append(e)
e = (links[e[1]], edges[links[e[1]]])
return edges_connected
It is quite fast, but for a very large number of edges (e.g. 1 million) it takes almost 20 seconds on my old laptop. Is there a way to speed it up? I think it is more or less optimal and maybe could be cynthonized for a gain in speed, but the problem is that it is very pythonic (having lists of tuples, dictionaries, sets) and I don't know if it will work (my knowledge of cython is very limited). Simply cythonizing without declaring types gives minuscule speed-up.
python performance algorithm graph cython
I have a list of edges and a similar list of links (linking edges) telling which edges to connect (how to order them). Connected edges form polygons and there can be many disconnected polygons (ordering of which doesn't matter). For example:
edges = [((2, 1), (3, 1)), ((4, 1), (6, 1)), ((6, 4), (4, 4)), ((3, 6), (2, 6))]
links = [((2, 6), (2, 1)), ((3, 1), (3, 6)), ((4, 4), (4, 1)), ((6, 1), (6, 4))]
We would connect the edge ((2, 1), (3, 1))
with the edge ((3, 6), (2, 6))
because the linking edge ((3, 1), (3, 6))
is in links
, etc. The result would be ordering of edges
as:
[((2, 1), (3, 1)), ((3, 6), (2, 6)), ((4, 1), (6, 1)), ((6, 4), (4, 4))]
where ((2, 1), (3, 1)), ((3, 6), (2, 6))
forms a polygon and ((4, 1), (6, 1)), ((6, 4), (4, 4))
another one.
So the idea is to check if (edge[i][1], edge[j][0])
is in links
, for all i!=j
. If true, then connect the edge[i]
with edge[j]
by putting the latter after the former, and repeat with (edge[j][1], edge[k][0])
etc.
I use the following function to do it:
def connect_edges(edges, links):
edges = dict(edges)
links = dict(links)
seen = set()
edges_connected =
for e in edges.items():
while e not in seen:
seen.add(e)
edges_connected.append(e)
e = (links[e[1]], edges[links[e[1]]])
return edges_connected
It is quite fast, but for a very large number of edges (e.g. 1 million) it takes almost 20 seconds on my old laptop. Is there a way to speed it up? I think it is more or less optimal and maybe could be cynthonized for a gain in speed, but the problem is that it is very pythonic (having lists of tuples, dictionaries, sets) and I don't know if it will work (my knowledge of cython is very limited). Simply cythonizing without declaring types gives minuscule speed-up.
python performance algorithm graph cython
edited Mar 2 at 16:34
asked Mar 2 at 16:21
Andyk
1166
1166
You're talking about graphs, so areedges
really vertices and arelinks
actually edges? Arelinks
directed? Can they only connect in one direction? When you write "polygon", do you mean "polyline"? Polygons should be rings, with the last vertex being the first one.
â Eric Duminil
Mar 3 at 10:07
1
edges
andlinks
are both lists of edges (horizontal edges and vertical edges of a rectilinear polygon, respectively). For example, in ((2, 1), (3, 1)) -- ((3, 6), (2, 6)), the edges ((2, 1), (3, 1)) and ((3, 6), (2, 6)) are horizontal edges fromedges
and the ((3, 1), (3, 6)) is a vertical edge fromlinks
. If it is confusing, you indeed can think of elements ofedges
as nodes and the elements oflinks
as edges connecting the nodes of a graph.
â Andyk
Mar 3 at 10:39
Technically they form polylines as you say, but later on I add the first vertex to the end to close it (I didn't show it here in order to make the code simpler). And yes, they are directed in the sense that they form a polygon if the vertices are connected in the given order e.g. (2, 1) -- (3, 1) -- (3, 6) -- (2, 6) -- (2,1). Sorry for the confusion.
â Andyk
Mar 3 at 10:40
Thanks for the answer. Just curious : why do you need it? Don't you need more information, e.g. each polyline grouped together instead of being concatenated into a single list, without separator? Aren't there multiple, possible orderings?
â Eric Duminil
Mar 3 at 22:36
add a comment |Â
You're talking about graphs, so areedges
really vertices and arelinks
actually edges? Arelinks
directed? Can they only connect in one direction? When you write "polygon", do you mean "polyline"? Polygons should be rings, with the last vertex being the first one.
â Eric Duminil
Mar 3 at 10:07
1
edges
andlinks
are both lists of edges (horizontal edges and vertical edges of a rectilinear polygon, respectively). For example, in ((2, 1), (3, 1)) -- ((3, 6), (2, 6)), the edges ((2, 1), (3, 1)) and ((3, 6), (2, 6)) are horizontal edges fromedges
and the ((3, 1), (3, 6)) is a vertical edge fromlinks
. If it is confusing, you indeed can think of elements ofedges
as nodes and the elements oflinks
as edges connecting the nodes of a graph.
â Andyk
Mar 3 at 10:39
Technically they form polylines as you say, but later on I add the first vertex to the end to close it (I didn't show it here in order to make the code simpler). And yes, they are directed in the sense that they form a polygon if the vertices are connected in the given order e.g. (2, 1) -- (3, 1) -- (3, 6) -- (2, 6) -- (2,1). Sorry for the confusion.
â Andyk
Mar 3 at 10:40
Thanks for the answer. Just curious : why do you need it? Don't you need more information, e.g. each polyline grouped together instead of being concatenated into a single list, without separator? Aren't there multiple, possible orderings?
â Eric Duminil
Mar 3 at 22:36
You're talking about graphs, so are
edges
really vertices and are links
actually edges? Are links
directed? Can they only connect in one direction? When you write "polygon", do you mean "polyline"? Polygons should be rings, with the last vertex being the first one.â Eric Duminil
Mar 3 at 10:07
You're talking about graphs, so are
edges
really vertices and are links
actually edges? Are links
directed? Can they only connect in one direction? When you write "polygon", do you mean "polyline"? Polygons should be rings, with the last vertex being the first one.â Eric Duminil
Mar 3 at 10:07
1
1
edges
and links
are both lists of edges (horizontal edges and vertical edges of a rectilinear polygon, respectively). For example, in ((2, 1), (3, 1)) -- ((3, 6), (2, 6)), the edges ((2, 1), (3, 1)) and ((3, 6), (2, 6)) are horizontal edges fromedges
and the ((3, 1), (3, 6)) is a vertical edge from links
. If it is confusing, you indeed can think of elements of edges
as nodes and the elements of links
as edges connecting the nodes of a graph.â Andyk
Mar 3 at 10:39
edges
and links
are both lists of edges (horizontal edges and vertical edges of a rectilinear polygon, respectively). For example, in ((2, 1), (3, 1)) -- ((3, 6), (2, 6)), the edges ((2, 1), (3, 1)) and ((3, 6), (2, 6)) are horizontal edges fromedges
and the ((3, 1), (3, 6)) is a vertical edge from links
. If it is confusing, you indeed can think of elements of edges
as nodes and the elements of links
as edges connecting the nodes of a graph.â Andyk
Mar 3 at 10:39
Technically they form polylines as you say, but later on I add the first vertex to the end to close it (I didn't show it here in order to make the code simpler). And yes, they are directed in the sense that they form a polygon if the vertices are connected in the given order e.g. (2, 1) -- (3, 1) -- (3, 6) -- (2, 6) -- (2,1). Sorry for the confusion.
â Andyk
Mar 3 at 10:40
Technically they form polylines as you say, but later on I add the first vertex to the end to close it (I didn't show it here in order to make the code simpler). And yes, they are directed in the sense that they form a polygon if the vertices are connected in the given order e.g. (2, 1) -- (3, 1) -- (3, 6) -- (2, 6) -- (2,1). Sorry for the confusion.
â Andyk
Mar 3 at 10:40
Thanks for the answer. Just curious : why do you need it? Don't you need more information, e.g. each polyline grouped together instead of being concatenated into a single list, without separator? Aren't there multiple, possible orderings?
â Eric Duminil
Mar 3 at 22:36
Thanks for the answer. Just curious : why do you need it? Don't you need more information, e.g. each polyline grouped together instead of being concatenated into a single list, without separator? Aren't there multiple, possible orderings?
â Eric Duminil
Mar 3 at 22:36
add a comment |Â
1 Answer
1
active
oldest
votes
up vote
1
down vote
Just a collection of thoughts...
Terminology
Since you're talking about graphs, it might help the readers if you talk about nodes (your edges
) and edges (your links
).
Your graph is directed, and as far as I can tell, you're looking for the "strongly connected components".
Use simpler objects
Lists of pairs of pairs aren't very readable IMHO. When showing examples, you can replace them with lists of strings:
edges = ['AB', 'CD', 'EF', 'GH']
links = ['HA', 'BG', 'FC', 'DE']
Python strings are basically tuples of characters, so you don't have to change much in your code. The above example is isomorphic to yours and is much more concise.
KeyError
When calling edges[links[e[1]]
, you're hoping there won't be any KeyError
. Your example would fail with :
edges = ['AB', 'CD', 'EF', 'GH']
links = ['HA', 'BG', 'FC']
Links between multiple edges
What happens in the case of link connecting more than 2 edges, for example
edges = ['AB', 'GH', 'AC']
links = ['HA']
? Your current algorithm doesn't seem to take it into account. A defaultdict(list)
might help, for example to keep a list of nodes beginning with A
and another list for nodes finishing with H
.
Output format
If you only sort the edges, without grouping them in connected components, you're losing information which you have to recover later. You could return a list of sets of edges instead of a flat list of edges.
['AB', 'GH', 'CD', 'EF']
networkx
If you work with graph theory in Python, you should take a look at networkx
. It's fast, easy to use and offers many algorithms. All you need to do is to preprocess your data and feed it to a nx.DiGraph
.
Your example could become:
import networkx as nx
from collections import defaultdict
import matplotlib
edges = ['AB', 'CD', 'EF', 'GH']
links = ['HA', 'BG', 'FC', 'DE']
G = nx.DiGraph()
from_dict = defaultdict(list)
to_dict = defaultdict(list)
for edge in edges:
node = ''.join(edge)
G.add_node(node)
to_dict[node[0]].append(node)
from_dict[node[1]].append(node)
for link in links:
for node1 in from_dict[link[0]]:
for node2 in to_dict[link[1]]:
G.add_edge(node1, node2)
list(nx.strongly_connected_components(G))
# ['AB', 'GH', 'CD', 'EF']
nx.draw(G, with_labels = True)
It displays:
You get more information, it's more robust, you get diagrams if you want, it can handle more complex cases and (who knows?) it might be faster than your solution for large datasets.
add a comment |Â
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
1
down vote
Just a collection of thoughts...
Terminology
Since you're talking about graphs, it might help the readers if you talk about nodes (your edges
) and edges (your links
).
Your graph is directed, and as far as I can tell, you're looking for the "strongly connected components".
Use simpler objects
Lists of pairs of pairs aren't very readable IMHO. When showing examples, you can replace them with lists of strings:
edges = ['AB', 'CD', 'EF', 'GH']
links = ['HA', 'BG', 'FC', 'DE']
Python strings are basically tuples of characters, so you don't have to change much in your code. The above example is isomorphic to yours and is much more concise.
KeyError
When calling edges[links[e[1]]
, you're hoping there won't be any KeyError
. Your example would fail with :
edges = ['AB', 'CD', 'EF', 'GH']
links = ['HA', 'BG', 'FC']
Links between multiple edges
What happens in the case of link connecting more than 2 edges, for example
edges = ['AB', 'GH', 'AC']
links = ['HA']
? Your current algorithm doesn't seem to take it into account. A defaultdict(list)
might help, for example to keep a list of nodes beginning with A
and another list for nodes finishing with H
.
Output format
If you only sort the edges, without grouping them in connected components, you're losing information which you have to recover later. You could return a list of sets of edges instead of a flat list of edges.
['AB', 'GH', 'CD', 'EF']
networkx
If you work with graph theory in Python, you should take a look at networkx
. It's fast, easy to use and offers many algorithms. All you need to do is to preprocess your data and feed it to a nx.DiGraph
.
Your example could become:
import networkx as nx
from collections import defaultdict
import matplotlib
edges = ['AB', 'CD', 'EF', 'GH']
links = ['HA', 'BG', 'FC', 'DE']
G = nx.DiGraph()
from_dict = defaultdict(list)
to_dict = defaultdict(list)
for edge in edges:
node = ''.join(edge)
G.add_node(node)
to_dict[node[0]].append(node)
from_dict[node[1]].append(node)
for link in links:
for node1 in from_dict[link[0]]:
for node2 in to_dict[link[1]]:
G.add_edge(node1, node2)
list(nx.strongly_connected_components(G))
# ['AB', 'GH', 'CD', 'EF']
nx.draw(G, with_labels = True)
It displays:
You get more information, it's more robust, you get diagrams if you want, it can handle more complex cases and (who knows?) it might be faster than your solution for large datasets.
add a comment |Â
up vote
1
down vote
Just a collection of thoughts...
Terminology
Since you're talking about graphs, it might help the readers if you talk about nodes (your edges
) and edges (your links
).
Your graph is directed, and as far as I can tell, you're looking for the "strongly connected components".
Use simpler objects
Lists of pairs of pairs aren't very readable IMHO. When showing examples, you can replace them with lists of strings:
edges = ['AB', 'CD', 'EF', 'GH']
links = ['HA', 'BG', 'FC', 'DE']
Python strings are basically tuples of characters, so you don't have to change much in your code. The above example is isomorphic to yours and is much more concise.
KeyError
When calling edges[links[e[1]]
, you're hoping there won't be any KeyError
. Your example would fail with :
edges = ['AB', 'CD', 'EF', 'GH']
links = ['HA', 'BG', 'FC']
Links between multiple edges
What happens in the case of link connecting more than 2 edges, for example
edges = ['AB', 'GH', 'AC']
links = ['HA']
? Your current algorithm doesn't seem to take it into account. A defaultdict(list)
might help, for example to keep a list of nodes beginning with A
and another list for nodes finishing with H
.
Output format
If you only sort the edges, without grouping them in connected components, you're losing information which you have to recover later. You could return a list of sets of edges instead of a flat list of edges.
['AB', 'GH', 'CD', 'EF']
networkx
If you work with graph theory in Python, you should take a look at networkx
. It's fast, easy to use and offers many algorithms. All you need to do is to preprocess your data and feed it to a nx.DiGraph
.
Your example could become:
import networkx as nx
from collections import defaultdict
import matplotlib
edges = ['AB', 'CD', 'EF', 'GH']
links = ['HA', 'BG', 'FC', 'DE']
G = nx.DiGraph()
from_dict = defaultdict(list)
to_dict = defaultdict(list)
for edge in edges:
node = ''.join(edge)
G.add_node(node)
to_dict[node[0]].append(node)
from_dict[node[1]].append(node)
for link in links:
for node1 in from_dict[link[0]]:
for node2 in to_dict[link[1]]:
G.add_edge(node1, node2)
list(nx.strongly_connected_components(G))
# ['AB', 'GH', 'CD', 'EF']
nx.draw(G, with_labels = True)
It displays:
You get more information, it's more robust, you get diagrams if you want, it can handle more complex cases and (who knows?) it might be faster than your solution for large datasets.
add a comment |Â
up vote
1
down vote
up vote
1
down vote
Just a collection of thoughts...
Terminology
Since you're talking about graphs, it might help the readers if you talk about nodes (your edges
) and edges (your links
).
Your graph is directed, and as far as I can tell, you're looking for the "strongly connected components".
Use simpler objects
Lists of pairs of pairs aren't very readable IMHO. When showing examples, you can replace them with lists of strings:
edges = ['AB', 'CD', 'EF', 'GH']
links = ['HA', 'BG', 'FC', 'DE']
Python strings are basically tuples of characters, so you don't have to change much in your code. The above example is isomorphic to yours and is much more concise.
KeyError
When calling edges[links[e[1]]
, you're hoping there won't be any KeyError
. Your example would fail with :
edges = ['AB', 'CD', 'EF', 'GH']
links = ['HA', 'BG', 'FC']
Links between multiple edges
What happens in the case of link connecting more than 2 edges, for example
edges = ['AB', 'GH', 'AC']
links = ['HA']
? Your current algorithm doesn't seem to take it into account. A defaultdict(list)
might help, for example to keep a list of nodes beginning with A
and another list for nodes finishing with H
.
Output format
If you only sort the edges, without grouping them in connected components, you're losing information which you have to recover later. You could return a list of sets of edges instead of a flat list of edges.
['AB', 'GH', 'CD', 'EF']
networkx
If you work with graph theory in Python, you should take a look at networkx
. It's fast, easy to use and offers many algorithms. All you need to do is to preprocess your data and feed it to a nx.DiGraph
.
Your example could become:
import networkx as nx
from collections import defaultdict
import matplotlib
edges = ['AB', 'CD', 'EF', 'GH']
links = ['HA', 'BG', 'FC', 'DE']
G = nx.DiGraph()
from_dict = defaultdict(list)
to_dict = defaultdict(list)
for edge in edges:
node = ''.join(edge)
G.add_node(node)
to_dict[node[0]].append(node)
from_dict[node[1]].append(node)
for link in links:
for node1 in from_dict[link[0]]:
for node2 in to_dict[link[1]]:
G.add_edge(node1, node2)
list(nx.strongly_connected_components(G))
# ['AB', 'GH', 'CD', 'EF']
nx.draw(G, with_labels = True)
It displays:
You get more information, it's more robust, you get diagrams if you want, it can handle more complex cases and (who knows?) it might be faster than your solution for large datasets.
Just a collection of thoughts...
Terminology
Since you're talking about graphs, it might help the readers if you talk about nodes (your edges
) and edges (your links
).
Your graph is directed, and as far as I can tell, you're looking for the "strongly connected components".
Use simpler objects
Lists of pairs of pairs aren't very readable IMHO. When showing examples, you can replace them with lists of strings:
edges = ['AB', 'CD', 'EF', 'GH']
links = ['HA', 'BG', 'FC', 'DE']
Python strings are basically tuples of characters, so you don't have to change much in your code. The above example is isomorphic to yours and is much more concise.
KeyError
When calling edges[links[e[1]]
, you're hoping there won't be any KeyError
. Your example would fail with :
edges = ['AB', 'CD', 'EF', 'GH']
links = ['HA', 'BG', 'FC']
Links between multiple edges
What happens in the case of link connecting more than 2 edges, for example
edges = ['AB', 'GH', 'AC']
links = ['HA']
? Your current algorithm doesn't seem to take it into account. A defaultdict(list)
might help, for example to keep a list of nodes beginning with A
and another list for nodes finishing with H
.
Output format
If you only sort the edges, without grouping them in connected components, you're losing information which you have to recover later. You could return a list of sets of edges instead of a flat list of edges.
['AB', 'GH', 'CD', 'EF']
networkx
If you work with graph theory in Python, you should take a look at networkx
. It's fast, easy to use and offers many algorithms. All you need to do is to preprocess your data and feed it to a nx.DiGraph
.
Your example could become:
import networkx as nx
from collections import defaultdict
import matplotlib
edges = ['AB', 'CD', 'EF', 'GH']
links = ['HA', 'BG', 'FC', 'DE']
G = nx.DiGraph()
from_dict = defaultdict(list)
to_dict = defaultdict(list)
for edge in edges:
node = ''.join(edge)
G.add_node(node)
to_dict[node[0]].append(node)
from_dict[node[1]].append(node)
for link in links:
for node1 in from_dict[link[0]]:
for node2 in to_dict[link[1]]:
G.add_edge(node1, node2)
list(nx.strongly_connected_components(G))
# ['AB', 'GH', 'CD', 'EF']
nx.draw(G, with_labels = True)
It displays:
You get more information, it's more robust, you get diagrams if you want, it can handle more complex cases and (who knows?) it might be faster than your solution for large datasets.
answered Mar 3 at 23:56
Eric Duminil
1,8501613
1,8501613
add a comment |Â
add a comment |Â
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f188685%2fspeed-up-a-simple-edge-connecting-algorithm%23new-answer', 'question_page');
);
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
You're talking about graphs, so are
edges
really vertices and arelinks
actually edges? Arelinks
directed? Can they only connect in one direction? When you write "polygon", do you mean "polyline"? Polygons should be rings, with the last vertex being the first one.â Eric Duminil
Mar 3 at 10:07
1
edges
andlinks
are both lists of edges (horizontal edges and vertical edges of a rectilinear polygon, respectively). For example, in ((2, 1), (3, 1)) -- ((3, 6), (2, 6)), the edges ((2, 1), (3, 1)) and ((3, 6), (2, 6)) are horizontal edges fromedges
and the ((3, 1), (3, 6)) is a vertical edge fromlinks
. If it is confusing, you indeed can think of elements ofedges
as nodes and the elements oflinks
as edges connecting the nodes of a graph.â Andyk
Mar 3 at 10:39
Technically they form polylines as you say, but later on I add the first vertex to the end to close it (I didn't show it here in order to make the code simpler). And yes, they are directed in the sense that they form a polygon if the vertices are connected in the given order e.g. (2, 1) -- (3, 1) -- (3, 6) -- (2, 6) -- (2,1). Sorry for the confusion.
â Andyk
Mar 3 at 10:40
Thanks for the answer. Just curious : why do you need it? Don't you need more information, e.g. each polyline grouped together instead of being concatenated into a single list, without separator? Aren't there multiple, possible orderings?
â Eric Duminil
Mar 3 at 22:36