Computing the resilience of a network presented as an undirected graph in Python
The idea is to compute the resilience of a network presented as an undirected graph in the form {node: (set of its neighbors) for each node in the graph}. The function removes nodes from the graph in the specified order, one by one, and calculates the size of the largest remaining connected component. The function returns a list whose k+1th entry is the size of the largest connected component after the removal of the first k nodes. The helper function bfs_visited() returns the set of nodes that are still connected to the given node.
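For illustration, here is a small graph in this format and a call to the function below (a made-up example of mine, not part of the original post):

graph = {
    1: {2, 3},
    2: {1, 3},
    3: {1, 2, 4},
    4: {3, 5},
    5: {4},
}
print(compute_resilience(graph, [3, 5]))  # [5, 2, 2]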
How can I improve the implementation of the algorithm (currently it is very slow)?
from collections import deque

def bfs_visited(graph, node):
    """Undirected graph {vertex: set of neighbors}.
    Returns the set of all nodes visited by the algorithm."""
    queue = deque()
    queue.append(node)
    visited = set([node])
    while queue:
        current_node = queue.popleft()
        for neighbor in graph[current_node]:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
    return visited

def cc_visited(graph):
    """Undirected graph {vertex: set of neighbors}.
    Returns a list of sets of connected components."""
    remaining_nodes = set(graph.keys())
    connected_components = []
    for node in remaining_nodes:
        visited = bfs_visited(graph, node)
        if visited not in connected_components:
            connected_components.append(visited)
            remaining_nodes = remaining_nodes - visited
            # print(node, remaining_nodes)
    return connected_components

def largest_cc_size(ugraph):
    """Returns the size (an integer) of the largest connected component in ugraph."""
    if not ugraph:
        return 0
    res = [(len(ccc), ccc) for ccc in cc_visited(ugraph)]
    res.sort()
    return res[-1][0]

def compute_resilience(ugraph, attack_order):
    """
    Input: an undirected graph {vertex: set of neighbors} and a list of nodes to remove.
    Returns a list whose k+1th entry is the size of the largest connected
    component after the removal of the first k nodes.
    """
    res = [len(ugraph)]
    for node in attack_order:
        neighbors = ugraph[node]
        for neighbor in neighbors:
            ugraph[neighbor].remove(node)
        ugraph.pop(node)
        res.append(largest_cc_size(ugraph))
    return res
Tags: python, performance, algorithm, graph
There is a stray `}` in the first sentence of the post. Where does that belong? – mkrieger1, Jan 6 at 15:10

@mkrieger1, it was supposed to be: node: (set of its neighbors) for each node in the graph. Thank you for pointing out the error. – Ayenn Ul, Jan 6 at 15:16
1 Answer
1. Review
1. The docstring for `bfs_visited` should explain the `node` argument.

2. The docstring for `compute_resilience` should explain that the `ugraph` argument gets modified. Alternatively, the function could take a copy of the graph so that the original is not modified.

3. In `bfs_visited` the lines:

queue = deque()
queue.append(node)

can be simplified to:

queue = deque([node])
4. The function `largest_cc_size` builds a list of pairs:

res = [(len(ccc), ccc) for ccc in cc_visited(ugraph)]
res.sort()
return res[-1][0]

But you can see that it only ever uses the first element of each pair (the size of the component). So you could simplify it by not building the pairs:

res = [len(ccc) for ccc in cc_visited(ugraph)]
res.sort()
return res[-1]

Since only the size of the largest component is needed, there is no need to build the whole list. Instead you could use `max` to find the largest:

if ugraph:
    return max(map(len, cc_visited(ugraph)))
else:
    return 0

If you are using Python 3.4 or later, this can be further simplified using the `default` argument to `max`:

return max(map(len, cc_visited(ugraph)), default=0)

This is now so simple that it probably doesn't need its own function.
5. This line:

remaining_nodes = set(graph.keys())

can be written more simply:

remaining_nodes = set(graph)
6. There is a loop over the set `remaining_nodes` where on each loop iteration you update `remaining_nodes`:

for node in remaining_nodes:
    visited = bfs_visited(graph, node)
    if visited not in connected_components:
        connected_components.append(visited)
        remaining_nodes = remaining_nodes - visited

It looks as if the intention of the code is to avoid iterating over the nodes in `visited` by removing them from `remaining_nodes`, but this doesn't work! The problem is that the `for` statement:

for node in remaining_nodes:

only evaluates the expression `remaining_nodes` once, at the start of the loop. So when the code creates a new set and assigns it to `remaining_nodes`:

remaining_nodes = remaining_nodes - visited

this has no effect on the nodes being iterated over.
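A minimal demonstration of this behaviour (my own illustration, not from the original answer):

# Rebinding the name inside the loop has no effect on the set being iterated over.
remaining = {1, 2, 3, 4}
seen = []
for item in remaining:
    seen.append(item)
    remaining = remaining - {item}  # binds the name to a new set object
print(sorted(seen))  # [1, 2, 3, 4] -- the loop still visits every element
print(remaining)     # set() -- the name now refers to a different, empty set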
You might imagine trying to fix this by using the `difference_update` method to adjust the set being iterated over:

remaining_nodes.difference_update(visited)

but this would be a bad idea because then you would be iterating over a set and modifying it within the loop, which is not safe. Instead, you need to write the loop as follows:

while remaining_nodes:
    node = remaining_nodes.pop()
    visited = bfs_visited(graph, node)
    if visited not in connected_components:
        connected_components.append(visited)
        remaining_nodes.difference_update(visited)

Using `while` and `pop` is the standard idiom in Python for consuming a data structure while modifying it; you do something similar in `bfs_visited`.

There is now no need for the test:

if visited not in connected_components:

since each component is produced exactly once.
7. In `compute_resilience` the first line is:

res = [len(ugraph)]

but this only works if the graph is a single connected component to start with. To handle the general case, the first line should be:

res = [largest_cc_size(ugraph)]
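To see the difference, consider a graph that starts out with two components (a made-up example of mine):

g = {1: {2}, 2: {1}, 3: set()}
print(len(g))              # 3 -- counts all nodes in the graph
print(largest_cc_size(g))  # 2 -- the size of the largest component, {1, 2}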
8. For each node in attack order, `compute_resilience` calls:

res.append(largest_cc_size(ugraph))

But this doesn't take advantage of the work that was previously done. When we remove `node` from the graph, all connected components remain the same, except for the connected component containing `node`. So we can potentially save some work if we only do a breadth-first search over that component, and not over the whole graph. (Whether this actually saves any work depends on how resilient the graph is. For highly resilient graphs it won't make much difference.)

In order to do this we'll need to redesign the data structures so that we can efficiently find the component containing a node, and efficiently remove that component from the collection of components.

This answer is already quite long, so I won't explain in detail how to redesign the data structures, I'll just present the revised code and let you figure it out for yourself.
def connected_components(graph, nodes):
    """Given an undirected graph represented as a mapping from nodes to
    the set of their neighbours, and a set of nodes, find the
    connected components in the graph containing those nodes.

    Returns:
    - mapping from nodes to the canonical node of the connected
      component they belong to
    - mapping from canonical nodes to connected components

    """
    canonical = {}
    components = {}
    while nodes:
        node = nodes.pop()
        component = bfs_visited(graph, node)
        components[node] = component
        nodes.difference_update(component)
        for n in component:
            canonical[n] = node
    return canonical, components
def resilience(graph, attack_order):
    """Given an undirected graph represented as a mapping from nodes to
    an iterable of their neighbours, and an iterable of nodes, generate
    integers such that the k-th result is the size of the largest
    connected component after the removal of the first k-1 nodes.

    """
    # Take a copy of the graph so that we can destructively modify it.
    graph = {node: set(neighbours) for node, neighbours in graph.items()}
    canonical, components = connected_components(graph, set(graph))
    largest = lambda: max(map(len, components.values()), default=0)
    yield largest()
    for node in attack_order:
        # Find connected component containing node.
        component = components.pop(canonical.pop(node))
        # Remove node from graph.
        for neighbor in graph[node]:
            graph[neighbor].remove(node)
        graph.pop(node)
        component.remove(node)
        # Component may have been split by removal of node, so search
        # it for new connected components and update data structures
        # accordingly.
        canon, comp = connected_components(graph, component)
        canonical.update(canon)
        components.update(comp)
        yield largest()
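For a quick sanity check, here is how the revised generator might be called on a small hand-made graph (the node labels are my own, not from the original answer):

graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}, 4: {5}, 5: {4}, 6: set()}
print(list(resilience(graph, [3, 5, 1])))  # [3, 2, 2, 1]
# The caller's graph is left unmodified, because resilience works on a copy.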
In the revised code, the `max` operation has to iterate over all the remaining connected components in order to find the largest one. It would be possible to improve the efficiency of this step by storing the connected components in a priority queue so that the largest one can be found in time that's logarithmic in the number of components.

I doubt that this part of the algorithm is a bottleneck in practice, so it's probably not worth the extra code, but if you need to do this, then there are some Priority Queue Implementation Notes in the Python documentation.
2. Performance comparison
Here's a useful function for making test cases:
from itertools import combinations
from random import random

def random_graph(n, p):
    """Return a random undirected graph with n nodes and each edge chosen
    independently with probability p.

    """
    assert 0 <= p <= 1
    graph = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        if random() <= p:
            graph[i].add(j)
            graph[j].add(i)
    return graph
Now, a quick performance comparison between the revised and original code. Note that we have to run the revised code first, because the original code destructively modifies the graph, as noted in §1.2 above.
>>> from timeit import timeit
>>> G = random_graph(300, 0.2)
>>> timeit(lambda:list(resilience(G, list(G))), number=1) # revised
0.28782312001567334
>>> timeit(lambda:compute_resilience(G, list(G)), number=1) # original
59.46968446299434
So the revised code is about 200 times faster on this test case.
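As a rough correctness check (my own addition, not part of the original answer), the two implementations can be compared on copies of the same random graph. All entries after the first must agree; the first entries differ exactly when the graph starts out disconnected, which is the bug discussed in §1.7 above.

import copy

G = random_graph(50, 0.1)
order = list(G)
revised = list(resilience(copy.deepcopy(G), order))     # resilience also copies internally
original = compute_resilience(copy.deepcopy(G), order)  # compute_resilience mutates its input
assert revised[1:] == original[1:]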
thank you for a tremendously valuable answer! I have learned a lot of new things from the answer, including the default argument in the `max` function and the proper way to iterate over a modified sequence. I am also going to change my docstrings taking into account your comments. – Ayenn Ul, Jan 6 at 15:05

Your way of copying the graph, `graph = {node: set(neighbours) for node, neighbours in graph.items()}`, looks very concise and neat. As far as I understand, you created a local variable `graph` which is used instead of the global variable with the same name. Could this technique be used everywhere, or are there some limitations? – Ayenn Ul, Jan 7 at 14:06

`graph` isn't a global variable; it's one of the function parameters. But yes, you can always make a local variable with whatever name you like, even if there is a global variable with the same name. This is in the Python FAQ: "when you make an assignment to a variable in a scope, that variable becomes local to that scope and shadows any similarly named variable in the outer scope". – Gareth Rees, Jan 7 at 17:38
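A tiny illustration of that scoping rule (my own example, not from the thread):

x = 10          # outer binding

def bump(x):
    x = x + 1   # assignment makes x local to bump; the outer x is untouched
    return x

print(bump(x), x)  # 11 10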