Computing resilience of the network presented as an undirected graph in Python

The idea is to compute the resilience of a network presented as an undirected graph in the form {node: (set of its neighbors) for each node in the graph}. The function removes nodes from the graph in the specified order, one by one, and calculates the size of the largest remaining connected component. The function returns a list whose (k+1)th entry is the size of the largest connected component after the removal of the first k nodes. The helper function bfs_visited() returns the set of nodes that are still connected to the given node.



How can I improve the implementation of the algorithm (currently it is very slow)?



from collections import deque

def bfs_visited(graph, node):
    """undirected graph {vertex: neighbors}
    Returns the set of all nodes visited by the algorithm"""
    queue = deque()
    queue.append(node)
    visited = set([node])
    while queue:
        current_node = queue.popleft()
        for neighbor in graph[current_node]:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
    return visited

def cc_visited(graph):
    """undirected graph {vertex: neighbors}
    Returns a list of sets of connected components"""
    remaining_nodes = set(graph.keys())
    connected_components = []
    for node in remaining_nodes:
        visited = bfs_visited(graph, node)
        if visited not in connected_components:
            connected_components.append(visited)
            remaining_nodes = remaining_nodes - visited
    return connected_components

def largest_cc_size(ugraph):
    """returns the size (an integer) of the largest connected component in the ugraph."""
    if not ugraph:
        return 0
    res = [(len(ccc), ccc) for ccc in cc_visited(ugraph)]
    res.sort()
    return res[-1][0]

def compute_resilience(ugraph, attack_order):
    """
    input: a graph {V: N}

    returns a list whose (k+1)th entry is the size of the largest connected
    component after the removal of the first k nodes
    """
    res = [len(ugraph)]
    for node in attack_order:
        neighbors = ugraph[node]
        for neighbor in neighbors:
            ugraph[neighbor].remove(node)
        ugraph.pop(node)
        res.append(largest_cc_size(ugraph))
    return res
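To make the intended behaviour concrete, here is a small self-contained worked example (the path graph and attack order are made up for illustration, and the helpers are condensed versions of the functions above, not the exact code): attacking the middle node of a path graph splits it in two, and the resilience list records the shrinking size of the largest component.

```python
from collections import deque

def bfs_visited(graph, node):
    """Set of nodes reachable from node in a {node: set(neighbours)} graph."""
    queue, visited = deque([node]), {node}
    while queue:
        for neighbour in graph[queue.popleft()]:
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(neighbour)
    return visited

def largest_cc_size(graph):
    """Size of the largest connected component (0 for an empty graph)."""
    sizes, remaining = [], set(graph)
    while remaining:
        component = bfs_visited(graph, remaining.pop())
        sizes.append(len(component))
        remaining -= component
    return max(sizes, default=0)

# Path graph 0-1-2-3; attacking the middle node first splits the graph.
graph = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
result = [largest_cc_size(graph)]
for node in [1, 3, 0, 2]:
    for neighbour in graph.pop(node):
        graph[neighbour].remove(node)
    result.append(largest_cc_size(graph))
print(result)  # [4, 2, 1, 1, 0]
```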






  • There is a stray } in the first sentence of the post. Where does that belong?
    – mkrieger1
    Jan 6 at 15:10










  • @mkrieger1, it was supposed to be: node: (set of its neighbors) for each node in the graph. Thank you for pointing out the error
    – Ayenn Ul
    Jan 6 at 15:16
















edited Jan 6 at 14:15 by Gareth Rees

asked Jan 5 at 20:41 by Ayenn Ul

1 Answer
accepted










1. Review



  1. The docstring for bfs_visited should explain the node argument.


  2. The docstring for compute_resilience should explain that the ugraph argument gets modified. Alternatively, the function could take a copy of the graph so that the original is not modified.



  3. In bfs_visited the lines:



    queue = deque()
    queue.append(node)


    can be simplified to:



    queue = deque([node]) 



  4. The function largest_cc_size builds a list of pairs:



    res = [(len(ccc), ccc) for ccc in cc_visited(ugraph)]
    res.sort()
    return res[-1][0]


    But you can see that it only ever uses the first element of each pair (the size of the component). So you could simplify it by not building the pairs:



    res = [len(ccc) for ccc in cc_visited(ugraph)]
    res.sort()
    return res[-1]



  5. Since only the size of the largest component is needed, there is no need to build the whole list. Instead you could use max to find the largest:



    if ugraph:
        return max(map(len, cc_visited(ugraph)))
    else:
        return 0



  6. If you are using Python 3.4 or later, this can be further simplified using the default argument to max:



    return max(map(len, cc_visited(ugraph)), default=0)


    This is now so simple that it probably doesn't need its own function.
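    A quick sketch of why the default matters here (with a made-up list of components): on an empty graph there are no components, and max over an empty sequence raises ValueError unless a default is given.

```python
components = [{1, 2, 3}, {4, 5}]
assert max(map(len, components), default=0) == 3

# With no components left, plain max() raises ValueError...
try:
    max(map(len, []))
except ValueError:
    pass
# ...but the default argument (Python 3.4+) returns the fallback instead.
assert max(map(len, []), default=0) == 0
```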




  7. This line:



    remaining_nodes = set(graph.keys())


    can be written more simply:



    remaining_nodes = set(graph)
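    This works because iterating over a dict yields its keys; a quick check with a toy graph:

```python
graph = {'a': {'b'}, 'b': {'a'}, 'c': set()}
# Iterating a dict yields its keys, so both spellings build the same set.
assert set(graph) == set(graph.keys()) == {'a', 'b', 'c'}
```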



  8. There is a loop over the set remaining_nodes where on each loop iteration you update remaining_nodes:



    for node in remaining_nodes:
        visited = bfs_visited(graph, node)
        if visited not in connected_components:
            connected_components.append(visited)
            remaining_nodes = remaining_nodes - visited


    It looks as if the intention of the code is to avoid iterating over the nodes in visited by removing them from remaining_nodes, but this doesn't work! The problem is that the for statement:



    for node in remaining_nodes:


    only evaluates the expression remaining_nodes once, at the start of the loop. So when the code creates a new set and assigns it to remaining_nodes:



    remaining_nodes = remaining_nodes - visited


    this has no effect on the nodes being iterated over.
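    A tiny experiment (with toy values) makes this snapshot behaviour visible: rebinding the name inside the loop does not change the set object the for statement is iterating over.

```python
nodes = {1, 2, 3, 4}
seen = []
for node in nodes:
    seen.append(node)
    # Rebinds the name `nodes` to a new set; the iterator still
    # walks the original set object.
    nodes = nodes - {1, 2, 3, 4}
assert sorted(seen) == [1, 2, 3, 4]  # every node was still visited
```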




  9. You might imagine trying to fix this by using the difference_update method to adjust the set being iterated over:



    remaining_nodes.difference_update(visited)


    but this would be a bad idea because then you would be iterating over a set and modifying it within the loop, which is not safe. Instead, you need to write the loop as follows:



    while remaining_nodes:
        node = remaining_nodes.pop()
        visited = bfs_visited(graph, node)
        if visited not in connected_components:
            connected_components.append(visited)
        remaining_nodes.difference_update(visited)


    Using while and pop is the standard idiom in Python for consuming a data structure while modifying it — you do something similar in bfs_visited.




  10. There is now no need for the test:



    if visited not in connected_components:


    since each component is produced exactly once.




  11. In compute_resilience the first line is:



    res = [len(ugraph)]


    but this only works if the graph is a single connected component to start with. To handle the general case, the first line should be:



    res = [largest_cc_size(ugraph)]



  12. For each node in attack order, compute_resilience calls:



    res.append(largest_cc_size(ugraph))


    But this doesn't take advantage of the work that was previously done. When we remove node from the graph, all connected components remain the same, except for the connected component containing node. So we can potentially save some work if we only do a breadth-first search over that component, and not over the whole graph. (Whether this actually saves any work depends on how resilient the graph is. For highly resilient graphs it won't make much difference.)



    In order to do this we'll need to redesign the data structures so that we can efficiently find the component containing a node, and efficiently remove that component from the collection of components.



    This answer is already quite long, so I won't explain in detail how to redesign the data structures, I'll just present the revised code and let you figure it out for yourself.



    def connected_components(graph, nodes):
    """Given an undirected graph represented as a mapping from nodes to
    the set of their neighbours, and a set of nodes, find the
    connected components in the graph containing those nodes.

    Returns:
    - mapping from nodes to the canonical node of the connected
      component they belong to
    - mapping from canonical nodes to connected components

    """
        canonical = {}
        components = {}
        while nodes:
            node = nodes.pop()
            component = bfs_visited(graph, node)
            components[node] = component
            nodes.difference_update(component)
            for n in component:
                canonical[n] = node
        return canonical, components

    def resilience(graph, attack_order):
    """Given an undirected graph represented as a mapping from nodes to
    an iterable of their neighbours, and an iterable of nodes, generate
    integers such that the k-th result is the size of the largest
    connected component after the removal of the first k-1 nodes.

    """
        # Take a copy of the graph so that we can destructively modify it.
        graph = {node: set(neighbours) for node, neighbours in graph.items()}

        canonical, components = connected_components(graph, set(graph))
        largest = lambda: max(map(len, components.values()), default=0)
        yield largest()
        for node in attack_order:
            # Find connected component containing node.
            component = components.pop(canonical.pop(node))

            # Remove node from graph.
            for neighbor in graph[node]:
                graph[neighbor].remove(node)
            graph.pop(node)
            component.remove(node)

            # Component may have been split by removal of node, so search
            # it for new connected components and update data structures
            # accordingly.
            canon, comp = connected_components(graph, component)
            canonical.update(canon)
            components.update(comp)
            yield largest()



  13. In the revised code, the max operation has to iterate over all the remaining connected components in order to find the largest one. It would be possible to improve the efficiency of this step by storing the connected components in a priority queue so that the largest one can be found in time that's logarithmic in the number of components.



    I doubt that this part of the algorithm is a bottleneck in practice, so it's probably not worth the extra code, but if you need to do this, then there are some Priority Queue Implementation Notes in the Python documentation.



2. Performance comparison



Here's a useful function for making test cases:



from itertools import combinations
from random import random

def random_graph(n, p):
    """Return a random undirected graph with n nodes and each edge chosen
    independently with probability p.

    """
    assert 0 <= p <= 1
    graph = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        if random() <= p:
            graph[i].add(j)
            graph[j].add(i)
    return graph


Now, a quick performance comparison between the revised and original code. Note that we have to run the revised code first, because the original code destructively modifies the graph, as noted in §1.2 above.



>>> from timeit import timeit
>>> G = random_graph(300, 0.2)
>>> timeit(lambda:list(resilience(G, list(G))), number=1) # revised
0.28782312001567334
>>> timeit(lambda:compute_resilience(G, list(G)), number=1) # original
59.46968446299434


So the revised code is about 200 times faster on this test case.






  • Thank you for a tremendously valuable answer! I have learned a lot of new things from the answer, including the default argument in the max function and the proper way to iterate over a modified sequence. I am also going to change my docstrings taking into account your comments.
    – Ayenn Ul
    Jan 6 at 15:05











  • Your way of copying the graph graph = {node: set(neighbours) for node, neighbours in graph.items()} looks very concise and neat. As far as I understand, you created a local variable graph which is used instead of the global variable with the same name. Could this technique be used everywhere, or are there some limitations?
    – Ayenn Ul
    Jan 7 at 14:06











  • graph isn't a global variable — it's one of the function parameters. But yes, you can always make a local variable with whatever name you like, even if there is a global variable with the same name. This is in the Python FAQ — "when you make an assignment to a variable in a scope, that variable becomes local to that scope and shadows any similarly named variable in the outer scope".
    – Gareth Rees
    Jan 7 at 17:38










Your Answer




StackExchange.ifUsing("editor", function ()
return StackExchange.using("mathjaxEditing", function ()
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
);
);
, "mathjax-editing");

StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "196"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
convertImagesToLinks: false,
noModals: false,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);



);








 

draft saved


draft discarded


















StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f184392%2fcomputing-resilience-of-the-network-presented-as-an-undirected-graph-in-python%23new-answer', 'question_page');

);

Post as a guest






























1 Answer
1






active

oldest

votes








1 Answer
1






active

oldest

votes









active

oldest

votes






active

oldest

votes








up vote
2
down vote



accepted










1. Review



  1. The docstring for bfs_visited should explain the node argument.


  2. The docstring for compute_resilience should explain that the ugraph argument gets modified. Alternatively, the function could take a copy of the graph so that the original is not modified.



  3. In bfs_visited the lines:



    queue = deque()
    queue.append(node)


    can be simplified to:



    queue = deque([node]) 



  4. The function largest_cc_size builds a list of pairs:



    res = [(len(ccc), ccc) for ccc in cc_visited(ugraph)]
    res.sort()
    return res[-1][0]


    But you can see that it only ever uses the first element of each pair (the size of the component). So you could simplify it by not building the pairs:



    res = [len(ccc) for ccc in cc_visited(ugraph)]
    res.sort()
    return res[-1]



  5. Since only the size of the largest component is needed, there is no need to build the whole list. Instead you could use max to find the largest:



    if ugraph:
    return max(map(len, cc_visited(ugraph)))
    else:
    return 0



  6. If you are using Python 3.4 or later, this can be further simplified using the default argument to max:



    return max(map(len, cc_visited(ugraph)), default=0)


    This is now so simple that it probably doesn't need its own function.




  7. This line:



    remaining_nodes = set(graph.keys())


    can be written more simply:



    remaining_nodes = set(graph)



  8. There is a loop over the set remaining_nodes where on each loop iteration you update remaining_nodes:



    for node in remaining_nodes:
    visited = bfs_visited(graph, node)
    if visited not in connected_components:
    connected_components.append(visited)
    remaining_nodes = remaining_nodes - visited


    It looks as if the intention of the code to avoid iterating over the nodes in visited by removing them from remaining_nodes, but this doesn't work! The problem is that the for statement:



    for node in remaining_nodes:


    only evaluates the expression remaining_nodes once, at the start of the loop. So when the code creates a new set and assigns it to remaining_nodes:



    remaining_nodes = remaining_nodes - visited


    this has no effect on the nodes being iterated over.




  9. You might imagine trying to fix this by using the difference_update method to adjust the set being iterated over:



    remaining_nodes.difference_update(visited)


    but this would be a bad idea because then you would be iterating over a set and modifying it within the loop, which is not safe. Instead, you need to write the loop as follows:



    while remaining_nodes:
    node = remaining_nodes.pop()
    visited = bfs_visited(graph, node)
    if visited not in connected_components:
    connected_components.append(visited)
    remaining_nodes.difference_update(visited)


    Using while and pop is the standard idiom in Python for consuming a data structure while modifying it — you do something similar in bfs_visited.




  10. There is now no need for the test:



    if visited not in connected_components:


    since each component is produced exactly once.




  11. In compute_resilience the first line is:



    res = [len(ugraph)]


    but this only works if the graph is a single connected component to start with. To handle the general case, the first line should be:



    res = [largest_cc_size(ugraph)]



  12. For each node in attack order, compute_resilience calls:



    res.append(largest_cc_size(ugraph))


    But this doesn't take advantage of the work that was previously done. When we remove node from the graph, all connected components remain the same, except for the connected component containing node. So we can potentially save some work if we only do a breadth-first search over that component, and not over the whole graph. (Whether this actually saves any work depends on how resilient the graph is. For highly resilient graphs it won't make much difference.)



    In order to do this we'll need to redesign the data structures so that we can efficiently find the component containing a node, and efficiently remove that component from the collection of components.



    This answer is already quite long, so I won't explain in detail how to redesign the data structures, I'll just present the revised code and let you figure it out for yourself.



    def connected_components(graph, nodes):
    """Given an undirected graph represented as a mapping from nodes to
    the set of their neighbours, and a set of nodes, find the
    connected components in the graph containing those nodes.

    Returns:
    - mapping from nodes to the canonical node of the connected
    component they belong to
    - mapping from canonical nodes to connected components

    """
    canonical =
    components =
    while nodes:
    node = nodes.pop()
    component = bfs_visited(graph, node)
    components[node] = component
    nodes.difference_update(component)
    for n in component:
    canonical[n] = node
    return canonical, components

    def resilience(graph, attack_order):
    """Given an undirected graph represented as a mapping from nodes to
    an iterable of their neighbours, and an iterable of nodes, generate
    integers such that the the k-th result is the size of the largest
    connected component after the removal of the first k-1 nodes.

    """
    # Take a copy of the graph so that we can destructively modify it.
    graph = node: set(neighbours) for node, neighbours in graph.items()

    canonical, components = connected_components(graph, set(graph))
    largest = lambda: max(map(len, components.values()), default=0)
    yield largest()
    for node in attack_order:
    # Find connected component containing node.
    component = components.pop(canonical.pop(node))

    # Remove node from graph.
    for neighbor in graph[node]:
    graph[neighbor].remove(node)
    graph.pop(node)
    component.remove(node)

    # Component may have been split by removal of node, so search
    # it for new connected components and update data structures
    # accordingly.
    canon, comp = connected_components(graph, component)
    canonical.update(canon)
    components.update(comp)
    yield largest()



  13. In the revised code, the max operation has to iterate over all the remaining connected components in order to find the largest one. It would be possible to improve the efficiency of this step by storing the connected components in a priority queue so that the largest one can be found in time that's logarithmic in the number of components.



    I doubt that this part of the algorithm is a bottleneck in practice, so it's probably not worth the extra code, but if you need to do this, then there are some Priority Queue Implementation Notes in the Python documentation.



2. Performance comparison



Here's a useful function for making test cases:



from itertools import combinations
from random import random

def random_graph(n, p):
"""Return a random undirected graph with n nodes and each edge chosen
independently with probability p.

"""
assert 0 <= p <= 1
graph = i: set() for i in range(n)
for i, j in combinations(range(n), 2):
if random() <= p:
graph[i].add(j)
graph[j].add(i)
return graph


Now, a quick performance comparison between the revised and original code. Note that we have to run the revised code first, because the original code destructively modifies the graph, as noted in §1.2 above.



>>> from timeit import timeit
>>> G = random_graph(300, 0.2)
>>> timeit(lambda:list(resilience(G, list(G))), number=1) # revised
0.28782312001567334
>>> timeit(lambda:compute_resilience(G, list(G)), number=1) # original
59.46968446299434


So the revised code is about 200 times faster on this test case.






share|improve this answer























  • thank you for a tremendously valuable answer! I have learned a lot of new thing from the answer, including the default argument in the max function and the proper way to iterate over a modified sequence. I also going to change my docstrings taking into account your comments.
    – Ayenn Ul
    Jan 6 at 15:05











  • You way of copying the graph graph = node: set(neighbours) for node, neighbours in graph.items()looks very concise and neat. As far as I understand, you created a local variable graph which is used instead of the global variable with the same name. Could this technique be used everywhere, or is there some limitations?
    – Ayenn Ul
    Jan 7 at 14:06











  • graph isn't a global variable — it's one of the function parameters. But yes, you can always make a local variable with whatever name you like, even if there is a global variable with the same name. This is in the Python FAQ — "when you make an assignment to a variable in a scope, that variable becomes local to that scope and shadows any similarly named variable in the outer scope".
    – Gareth Rees
    Jan 7 at 17:38














up vote
2
down vote



accepted










1. Review



  1. The docstring for bfs_visited should explain the node argument.


  2. The docstring for compute_resilience should explain that the ugraph argument gets modified. Alternatively, the function could take a copy of the graph so that the original is not modified.



  3. In bfs_visited the lines:



    queue = deque()
    queue.append(node)


    can be simplified to:



    queue = deque([node]) 



  4. The function largest_cc_size builds a list of pairs:



    res = [(len(ccc), ccc) for ccc in cc_visited(ugraph)]
    res.sort()
    return res[-1][0]


    But you can see that it only ever uses the first element of each pair (the size of the component). So you could simplify it by not building the pairs:



    res = [len(ccc) for ccc in cc_visited(ugraph)]
    res.sort()
    return res[-1]



  5. Since only the size of the largest component is needed, there is no need to build the whole list. Instead you could use max to find the largest:



    if ugraph:
    return max(map(len, cc_visited(ugraph)))
    else:
    return 0



  6. If you are using Python 3.4 or later, this can be further simplified using the default argument to max:



    return max(map(len, cc_visited(ugraph)), default=0)


    This is now so simple that it probably doesn't need its own function.




  7. This line:



    remaining_nodes = set(graph.keys())


    can be written more simply:



    remaining_nodes = set(graph)



  8. There is a loop over the set remaining_nodes where on each loop iteration you update remaining_nodes:



    for node in remaining_nodes:
        visited = bfs_visited(graph, node)
        if visited not in connected_components:
            connected_components.append(visited)
        remaining_nodes = remaining_nodes - visited


    It looks as if the intention of the code is to avoid iterating over the nodes in visited by removing them from remaining_nodes, but this doesn't work! The problem is that the for statement:



    for node in remaining_nodes:


    only evaluates the expression remaining_nodes once, at the start of the loop. So when the code creates a new set and assigns it to remaining_nodes:



    remaining_nodes = remaining_nodes - visited


    this has no effect on the nodes being iterated over.




  9. You might imagine trying to fix this by using the difference_update method to adjust the set being iterated over:



    remaining_nodes.difference_update(visited)


    but this would be a bad idea because then you would be iterating over a set and modifying it within the loop, which is not safe. Instead, you need to write the loop as follows:



    while remaining_nodes:
        node = remaining_nodes.pop()
        visited = bfs_visited(graph, node)
        if visited not in connected_components:
            connected_components.append(visited)
        remaining_nodes.difference_update(visited)


    Using while and pop is the standard idiom in Python for consuming a data structure while modifying it — you do something similar in bfs_visited.




  10. There is now no need for the test:



    if visited not in connected_components:


    since each component is produced exactly once.
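
    Putting points 8, 9 and 10 together, a corrected cc_visited might look like this (a sketch that repeats bfs_visited from the question so it can run on its own):

    ```python
    from collections import deque

    def bfs_visited(graph, node):
        """Return the set of nodes reachable from node in the undirected graph."""
        queue = deque([node])
        visited = {node}
        while queue:
            current = queue.popleft()
            for neighbor in graph[current]:
                if neighbor not in visited:
                    visited.add(neighbor)
                    queue.append(neighbor)
        return visited

    def cc_visited(graph):
        """Return a list of the connected components (as sets of nodes)."""
        remaining = set(graph)
        components = []
        while remaining:
            node = remaining.pop()
            component = bfs_visited(graph, node)
            components.append(component)  # each component is found exactly once
            remaining.difference_update(component)
        return components
    ```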




  11. In compute_resilience the first line is:



    res = [len(ugraph)]


    but this only works if the graph is a single connected component to start with. To handle the general case, the first line should be:



    res = [largest_cc_size(ugraph)]



  12. For each node in attack order, compute_resilience calls:



    res.append(largest_cc_size(ugraph))


    But this doesn't take advantage of the work that was previously done. When we remove node from the graph, all connected components remain the same, except for the connected component containing node. So we can potentially save some work if we only do a breadth-first search over that component, and not over the whole graph. (Whether this actually saves any work depends on how resilient the graph is. For highly resilient graphs it won't make much difference.)



    In order to do this we'll need to redesign the data structures so that we can efficiently find the component containing a node, and efficiently remove that component from the collection of components.



    This answer is already quite long, so I won't explain in detail how to redesign the data structures, I'll just present the revised code and let you figure it out for yourself.



    def connected_components(graph, nodes):
        """Given an undirected graph represented as a mapping from nodes to
        the set of their neighbours, and a set of nodes, find the
        connected components in the graph containing those nodes.

        Returns:
        - mapping from nodes to the canonical node of the connected
          component they belong to
        - mapping from canonical nodes to connected components

        """
        canonical = {}
        components = {}
        while nodes:
            node = nodes.pop()
            component = bfs_visited(graph, node)
            components[node] = component
            nodes.difference_update(component)
            for n in component:
                canonical[n] = node
        return canonical, components

    def resilience(graph, attack_order):
        """Given an undirected graph represented as a mapping from nodes to
        an iterable of their neighbours, and an iterable of nodes, generate
        integers such that the k-th result is the size of the largest
        connected component after the removal of the first k-1 nodes.

        """
        # Take a copy of the graph so that we can destructively modify it.
        graph = {node: set(neighbours) for node, neighbours in graph.items()}

        canonical, components = connected_components(graph, set(graph))
        largest = lambda: max(map(len, components.values()), default=0)
        yield largest()
        for node in attack_order:
            # Find connected component containing node.
            component = components.pop(canonical.pop(node))

            # Remove node from graph.
            for neighbor in graph[node]:
                graph[neighbor].remove(node)
            graph.pop(node)
            component.remove(node)

            # Component may have been split by removal of node, so search
            # it for new connected components and update data structures
            # accordingly.
            canon, comp = connected_components(graph, component)
            canonical.update(canon)
            components.update(comp)
            yield largest()
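
    As a sanity check on any implementation, here is a small hand-worked case (my own illustration, not part of the original answer): a path graph a-b-c-d attacked at b and then c. Initially the largest component has 4 nodes; removing b leaves {a} and {c, d}; removing c leaves only singletons. A naive, self-contained reference computation:

    ```python
    from collections import deque

    def bfs_visited(graph, node):
        """Set of nodes reachable from node."""
        queue, visited = deque([node]), {node}
        while queue:
            for neighbor in graph[queue.popleft()]:
                if neighbor not in visited:
                    visited.add(neighbor)
                    queue.append(neighbor)
        return visited

    def naive_resilience(graph, attack_order):
        """Recompute the largest component from scratch after each removal."""
        graph = {n: set(nbrs) for n, nbrs in graph.items()}  # work on a copy

        def largest():
            remaining, best = set(graph), 0
            while remaining:
                component = bfs_visited(graph, remaining.pop())
                remaining.difference_update(component)
                best = max(best, len(component))
            return best

        result = [largest()]
        for node in attack_order:
            for neighbor in graph.pop(node):
                graph[neighbor].remove(node)
            result.append(largest())
        return result

    path = {'a': {'b'}, 'b': {'a', 'c'}, 'c': {'b', 'd'}, 'd': {'c'}}
    print(naive_resilience(path, ['b', 'c']))  # [4, 2, 1]
    ```

    This brute-force version does the redundant work described above, but it is easy to trust, which makes it useful for cross-checking the incremental version on small random graphs.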



  13. In the revised code, the max operation has to iterate over all the remaining connected components in order to find the largest one. It would be possible to improve the efficiency of this step by storing the connected components in a priority queue so that the largest one can be found in time that's logarithmic in the number of components.



    I doubt that this part of the algorithm is a bottleneck in practice, so it's probably not worth the extra code, but if you need to do this, then there are some Priority Queue Implementation Notes in the Python documentation.
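
    For the curious, here is one hedged sketch of that idea (my own illustration, not part of the answer above): keep a heap of (negated size, canonical node) pairs, and lazily discard entries that no longer match the components mapping. In a full implementation, each new component produced after an attack would also need to be pushed onto the heap.

    ```python
    import heapq

    def make_heap(components):
        """Heap of (-size, canonical) pairs; the largest component sits on top."""
        heap = [(-len(comp), key) for key, comp in components.items()]
        heapq.heapify(heap)
        return heap

    def largest(components, heap):
        """Pop stale entries until the top of the heap matches a live component."""
        while heap:
            neg_size, key = heap[0]
            if key in components and len(components[key]) == -neg_size:
                return -neg_size
            heapq.heappop(heap)  # stale: component was removed or resized
        return 0

    components = {'a': {'a', 'b', 'c'}, 'd': {'d'}}
    heap = make_heap(components)
    print(largest(components, heap))  # 3
    del components['a']               # component destroyed by an attack
    print(largest(components, heap))  # 1
    ```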



2. Performance comparison



Here's a useful function for making test cases:



from itertools import combinations
from random import random

def random_graph(n, p):
    """Return a random undirected graph with n nodes and each edge chosen
    independently with probability p.

    """
    assert 0 <= p <= 1
    graph = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        if random() <= p:
            graph[i].add(j)
            graph[j].add(i)
    return graph
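
A quick property check on the generator (repeated here so the snippet stands on its own): every edge should appear in both adjacency sets, and p = 1 should give a complete graph.

```python
from itertools import combinations
from random import random

def random_graph(n, p):
    """Random undirected graph on n nodes; each edge present with probability p."""
    assert 0 <= p <= 1
    graph = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        if random() <= p:
            graph[i].add(j)
            graph[j].add(i)
    return graph

g = random_graph(20, 0.3)
# Undirected: each edge is recorded in both endpoints' neighbour sets.
assert all(node in g[nbr] for node, nbrs in g.items() for nbr in nbrs)
# With p = 1 every edge is chosen, so every node has degree n - 1.
complete = random_graph(5, 1)
assert all(len(nbrs) == 4 for nbrs in complete.values())
```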


Now, a quick performance comparison between the revised and original code. Note that we have to run the revised code first, because the original code destructively modifies the graph, as noted in §1.2 above.



>>> from timeit import timeit
>>> G = random_graph(300, 0.2)
>>> timeit(lambda:list(resilience(G, list(G))), number=1) # revised
0.28782312001567334
>>> timeit(lambda:compute_resilience(G, list(G)), number=1) # original
59.46968446299434


So the revised code is about 200 times faster on this test case.






  • Thank you for a tremendously valuable answer! I have learned a lot of new things from it, including the default argument of the max function and the proper way to iterate over a sequence while modifying it. I am also going to revise my docstrings taking your comments into account.
    – Ayenn Ul
    Jan 6 at 15:05











  • Your way of copying the graph, graph = {node: set(neighbours) for node, neighbours in graph.items()}, looks very concise and neat. As far as I understand, you created a local variable graph which is used instead of the global variable with the same name. Could this technique be used everywhere, or are there some limitations?
    – Ayenn Ul
    Jan 7 at 14:06











  • graph isn't a global variable — it's one of the function parameters. But yes, you can always make a local variable with whatever name you like, even if there is a global variable with the same name. This is in the Python FAQ — "when you make an assignment to a variable in a scope, that variable becomes local to that scope and shadows any similarly named variable in the outer scope".
    – Gareth Rees
    Jan 7 at 17:38












edited Jan 6 at 12:33


























answered Jan 6 at 11:19









Gareth Rees
