Computing resilience of the network presented as an undirected graph in Python

The idea is to compute the resilience of a network presented as an undirected graph in the form {node: (set of its neighbors) for each node in the graph}. The function removes nodes from the graph in the specified order, one by one, and calculates the size of the largest remaining connected component. The function returns a list whose (k+1)th entry is the size of the largest connected component after the removal of the first k nodes. The helper function bfs_visited() returns the set of nodes that are still connected to the given node.



How can I improve the implementation of the algorithm (currently it is very slow)?



from collections import deque

def bfs_visited(graph, node):
    """undirected graph {vertex: neighbors}
    Returns the set of all nodes visited by the algorithm"""
    queue = deque()
    queue.append(node)
    visited = set([node])
    while queue:
        current_node = queue.popleft()
        for neighbor in graph[current_node]:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
    return visited

def cc_visited(graph):
    """undirected graph {vertex: neighbors}
    Returns a list of sets of connected components"""
    remaining_nodes = set(graph.keys())
    connected_components = []
    for node in remaining_nodes:
        visited = bfs_visited(graph, node)
        if visited not in connected_components:
            connected_components.append(visited)
            remaining_nodes = remaining_nodes - visited
    return connected_components

def largest_cc_size(ugraph):
    """returns the size (an integer) of the largest connected component in the ugraph."""
    if not ugraph:
        return 0
    res = [(len(ccc), ccc) for ccc in cc_visited(ugraph)]
    res.sort()
    return res[-1][0]

def compute_resilience(ugraph, attack_order):
    """
    input: a graph {V: N}

    returns a list whose (k+1)th entry is the size of the largest connected
    component after the removal of the first k nodes
    """
    res = [len(ugraph)]
    for node in attack_order:
        neighbors = ugraph[node]
        for neighbor in neighbors:
            ugraph[neighbor].remove(node)
        ugraph.pop(node)
        res.append(largest_cc_size(ugraph))
    return res
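To make the intended behaviour concrete, here is a small self-contained worked example (the path graph and attack order are made up for illustration, and the helpers are condensed versions of the functions above, not the exact code): attacking the middle node of a path graph splits it in two, and the resilience list records the shrinking size of the largest component.

```python
from collections import deque

def bfs_visited(graph, node):
    """Set of nodes reachable from node in a {node: set(neighbours)} graph."""
    queue, visited = deque([node]), {node}
    while queue:
        for neighbour in graph[queue.popleft()]:
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(neighbour)
    return visited

def largest_cc_size(graph):
    """Size of the largest connected component (0 for an empty graph)."""
    sizes, remaining = [], set(graph)
    while remaining:
        component = bfs_visited(graph, remaining.pop())
        sizes.append(len(component))
        remaining -= component
    return max(sizes, default=0)

# Path graph 0-1-2-3; attacking the middle node first splits the graph.
graph = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
result = [largest_cc_size(graph)]
for node in [1, 3, 0, 2]:
    for neighbour in graph.pop(node):
        graph[neighbour].remove(node)
    result.append(largest_cc_size(graph))
print(result)  # [4, 2, 1, 1, 0]
```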






  • There is a stray } in the first sentence of the post. Where does that belong?
    – mkrieger1
    Jan 6 at 15:10










  • @mkrieger1, it was supposed to be: node: (set of its neighbors) for each node in the graph. Thank you for pointing out the error
    – Ayenn Ul
    Jan 6 at 15:16
















edited Jan 6 at 14:15 by Gareth Rees

asked Jan 5 at 20:41 by Ayenn Ul

1 Answer
accepted










1. Review



  1. The docstring for bfs_visited should explain the node argument.


  2. The docstring for compute_resilience should explain that the ugraph argument gets modified. Alternatively, the function could take a copy of the graph so that the original is not modified.



  3. In bfs_visited the lines:



    queue = deque()
    queue.append(node)


    can be simplified to:



    queue = deque([node]) 



  4. The function largest_cc_size builds a list of pairs:



    res = [(len(ccc), ccc) for ccc in cc_visited(ugraph)]
    res.sort()
    return res[-1][0]


    But you can see that it only ever uses the first element of each pair (the size of the component). So you could simplify it by not building the pairs:



    res = [len(ccc) for ccc in cc_visited(ugraph)]
    res.sort()
    return res[-1]



  5. Since only the size of the largest component is needed, there is no need to build the whole list. Instead you could use max to find the largest:



    if ugraph:
        return max(map(len, cc_visited(ugraph)))
    else:
        return 0



  6. If you are using Python 3.4 or later, this can be further simplified using the default argument to max:



    return max(map(len, cc_visited(ugraph)), default=0)


    This is now so simple that it probably doesn't need its own function.
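    A quick sketch of why the default matters here (with a made-up list of components): on an empty graph there are no components, and max over an empty sequence raises ValueError unless a default is given.

```python
components = [{1, 2, 3}, {4, 5}]
assert max(map(len, components), default=0) == 3

# With no components left, plain max() raises ValueError...
try:
    max(map(len, []))
except ValueError:
    pass
# ...but the default argument (Python 3.4+) returns the fallback instead.
assert max(map(len, []), default=0) == 0
```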




  7. This line:



    remaining_nodes = set(graph.keys())


    can be written more simply:



    remaining_nodes = set(graph)
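    This works because iterating over a dict yields its keys; a quick check with a toy graph:

```python
graph = {'a': {'b'}, 'b': {'a'}, 'c': set()}
# Iterating a dict yields its keys, so both spellings build the same set.
assert set(graph) == set(graph.keys()) == {'a', 'b', 'c'}
```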



  8. There is a loop over the set remaining_nodes where on each loop iteration you update remaining_nodes:



    for node in remaining_nodes:
        visited = bfs_visited(graph, node)
        if visited not in connected_components:
            connected_components.append(visited)
            remaining_nodes = remaining_nodes - visited


    It looks as if the intention of the code is to avoid iterating over the nodes in visited by removing them from remaining_nodes, but this doesn't work! The problem is that the for statement:



    for node in remaining_nodes:


    only evaluates the expression remaining_nodes once, at the start of the loop. So when the code creates a new set and assigns it to remaining_nodes:



    remaining_nodes = remaining_nodes - visited


    this has no effect on the nodes being iterated over.
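    A tiny experiment (with toy values) makes this snapshot behaviour visible: rebinding the name inside the loop does not change the set object the for statement is iterating over.

```python
nodes = {1, 2, 3, 4}
seen = []
for node in nodes:
    seen.append(node)
    # Rebinds the name `nodes` to a new set; the iterator still
    # walks the original set object.
    nodes = nodes - {1, 2, 3, 4}
assert sorted(seen) == [1, 2, 3, 4]  # every node was still visited
```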




  9. You might imagine trying to fix this by using the difference_update method to adjust the set being iterated over:



    remaining_nodes.difference_update(visited)


    but this would be a bad idea because then you would be iterating over a set and modifying it within the loop, which is not safe. Instead, you need to write the loop as follows:



    while remaining_nodes:
        node = remaining_nodes.pop()
        visited = bfs_visited(graph, node)
        if visited not in connected_components:
            connected_components.append(visited)
        remaining_nodes.difference_update(visited)


    Using while and pop is the standard idiom in Python for consuming a data structure while modifying it — you do something similar in bfs_visited.




  10. There is now no need for the test:



    if visited not in connected_components:


    since each component is produced exactly once.




  11. In compute_resilience the first line is:



    res = [len(ugraph)]


    but this only works if the graph is a single connected component to start with. To handle the general case, the first line should be:



    res = [largest_cc_size(ugraph)]



  12. For each node in attack order, compute_resilience calls:



    res.append(largest_cc_size(ugraph))


    But this doesn't take advantage of the work that was previously done. When we remove node from the graph, all connected components remain the same, except for the connected component containing node. So we can potentially save some work if we only do a breadth-first search over that component, and not over the whole graph. (Whether this actually saves any work depends on how resilient the graph is. For highly resilient graphs it won't make much difference.)



    In order to do this we'll need to redesign the data structures so that we can efficiently find the component containing a node, and efficiently remove that component from the collection of components.



    This answer is already quite long, so I won't explain in detail how to redesign the data structures, I'll just present the revised code and let you figure it out for yourself.



    def connected_components(graph, nodes):
    """Given an undirected graph represented as a mapping from nodes to
    the set of their neighbours, and a set of nodes, find the
    connected components in the graph containing those nodes.

    Returns:
    - mapping from nodes to the canonical node of the connected
      component they belong to
    - mapping from canonical nodes to connected components

    """
        canonical = {}
        components = {}
        while nodes:
            node = nodes.pop()
            component = bfs_visited(graph, node)
            components[node] = component
            nodes.difference_update(component)
            for n in component:
                canonical[n] = node
        return canonical, components

    def resilience(graph, attack_order):
    """Given an undirected graph represented as a mapping from nodes to
    an iterable of their neighbours, and an iterable of nodes, generate
    integers such that the k-th result is the size of the largest
    connected component after the removal of the first k-1 nodes.

    """
        # Take a copy of the graph so that we can destructively modify it.
        graph = {node: set(neighbours) for node, neighbours in graph.items()}

        canonical, components = connected_components(graph, set(graph))
        largest = lambda: max(map(len, components.values()), default=0)
        yield largest()
        for node in attack_order:
            # Find connected component containing node.
            component = components.pop(canonical.pop(node))

            # Remove node from graph.
            for neighbor in graph[node]:
                graph[neighbor].remove(node)
            graph.pop(node)
            component.remove(node)

            # Component may have been split by removal of node, so search
            # it for new connected components and update data structures
            # accordingly.
            canon, comp = connected_components(graph, component)
            canonical.update(canon)
            components.update(comp)
            yield largest()



  13. In the revised code, the max operation has to iterate over all the remaining connected components in order to find the largest one. It would be possible to improve the efficiency of this step by storing the connected components in a priority queue so that the largest one can be found in time that's logarithmic in the number of components.



    I doubt that this part of the algorithm is a bottleneck in practice, so it's probably not worth the extra code, but if you need to do this, then there are some Priority Queue Implementation Notes in the Python documentation.



2. Performance comparison



Here's a useful function for making test cases:



from itertools import combinations
from random import random

def random_graph(n, p):
    """Return a random undirected graph with n nodes and each edge chosen
    independently with probability p.

    """
    assert 0 <= p <= 1
    graph = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        if random() <= p:
            graph[i].add(j)
            graph[j].add(i)
    return graph


Now, a quick performance comparison between the revised and original code. Note that we have to run the revised code first, because the original code destructively modifies the graph, as noted in §1.2 above.



>>> from timeit import timeit
>>> G = random_graph(300, 0.2)
>>> timeit(lambda:list(resilience(G, list(G))), number=1) # revised
0.28782312001567334
>>> timeit(lambda:compute_resilience(G, list(G)), number=1) # original
59.46968446299434


So the revised code is about 200 times faster on this test case.






  • Thank you for a tremendously valuable answer! I have learned a lot of new things from the answer, including the default argument in the max function and the proper way to iterate over a modified sequence. I am also going to change my docstrings taking into account your comments.
    – Ayenn Ul
    Jan 6 at 15:05











  • Your way of copying the graph graph = {node: set(neighbours) for node, neighbours in graph.items()} looks very concise and neat. As far as I understand, you created a local variable graph which is used instead of the global variable with the same name. Could this technique be used everywhere, or are there some limitations?
    – Ayenn Ul
    Jan 7 at 14:06











  • graph isn't a global variable — it's one of the function parameters. But yes, you can always make a local variable with whatever name you like, even if there is a global variable with the same name. This is in the Python FAQ — "when you make an assignment to a variable in a scope, that variable becomes local to that scope and shadows any similarly named variable in the outer scope".
    – Gareth Rees
    Jan 7 at 17:38










Your Answer




StackExchange.ifUsing("editor", function ()
return StackExchange.using("mathjaxEditing", function ()
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
);
);
, "mathjax-editing");

StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "196"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
convertImagesToLinks: false,
noModals: false,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);



);








 

draft saved


draft discarded


















StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f184392%2fcomputing-resilience-of-the-network-presented-as-an-undirected-graph-in-python%23new-answer', 'question_page');

);

Post as a guest






























1 Answer
1






active

oldest

votes








1 Answer
1






active

oldest

votes









active

oldest

votes






active

oldest

votes








up vote
2
down vote



accepted










1. Review



  1. The docstring for bfs_visited should explain the node argument.


  2. The docstring for compute_resilience should explain that the ugraph argument gets modified. Alternatively, the function could take a copy of the graph so that the original is not modified.



  3. In bfs_visited the lines:



    queue = deque()
    queue.append(node)


    can be simplified to:



    queue = deque([node]) 



  4. The function largest_cc_size builds a list of pairs:



    res = [(len(ccc), ccc) for ccc in cc_visited(ugraph)]
    res.sort()
    return res[-1][0]


    But you can see that it only ever uses the first element of each pair (the size of the component). So you could simplify it by not building the pairs:



    res = [len(ccc) for ccc in cc_visited(ugraph)]
    res.sort()
    return res[-1]



  5. Since only the size of the largest component is needed, there is no need to build the whole list. Instead you could use max to find the largest:



    if ugraph:
    return max(map(len, cc_visited(ugraph)))
    else:
    return 0



  6. If you are using Python 3.4 or later, this can be further simplified using the default argument to max:



    return max(map(len, cc_visited(ugraph)), default=0)


    This is now so simple that it probably doesn't need its own function.




  7. This line:



    remaining_nodes = set(graph.keys())


    can be written more simply:



    remaining_nodes = set(graph)



  8. There is a loop over the set remaining_nodes where on each loop iteration you update remaining_nodes:



    for node in remaining_nodes:
    visited = bfs_visited(graph, node)
    if visited not in connected_components:
    connected_components.append(visited)
    remaining_nodes = remaining_nodes - visited


    It looks as if the intention of the code to avoid iterating over the nodes in visited by removing them from remaining_nodes, but this doesn't work! The problem is that the for statement:



    for node in remaining_nodes:


    only evaluates the expression remaining_nodes once, at the start of the loop. So when the code creates a new set and assigns it to remaining_nodes:



    remaining_nodes = remaining_nodes - visited


    this has no effect on the nodes being iterated over.




  9. You might imagine trying to fix this by using the difference_update method to adjust the set being iterated over:



    remaining_nodes.difference_update(visited)


    but this would be a bad idea because then you would be iterating over a set and modifying it within the loop, which is not safe. Instead, you need to write the loop as follows:



    while remaining_nodes:
    node = remaining_nodes.pop()
    visited = bfs_visited(graph, node)
    if visited not in connected_components:
    connected_components.append(visited)
    remaining_nodes.difference_update(visited)


    Using while and pop is the standard idiom in Python for consuming a data structure while modifying it — you do something similar in bfs_visited.




  10. There is now no need for the test:



    if visited not in connected_components:


    since each component is produced exactly once.




  11. In compute_resilience the first line is:



    res = [len(ugraph)]


    but this only works if the graph is a single connected component to start with. To handle the general case, the first line should be:



    res = [largest_cc_size(ugraph)]



  12. For each node in attack order, compute_resilience calls:



    res.append(largest_cc_size(ugraph))


    But this doesn't take advantage of the work that was previously done. When we remove node from the graph, all connected components remain the same, except for the connected component containing node. So we can potentially save some work if we only do a breadth-first search over that component, and not over the whole graph. (Whether this actually saves any work depends on how resilient the graph is. For highly resilient graphs it won't make much difference.)



    In order to do this we'll need to redesign the data structures so that we can efficiently find the component containing a node, and efficiently remove that component from the collection of components.



    This answer is already quite long, so I won't explain in detail how to redesign the data structures, I'll just present the revised code and let you figure it out for yourself.



    def connected_components(graph, nodes):
    """Given an undirected graph represented as a mapping from nodes to
    the set of their neighbours, and a set of nodes, find the
    connected components in the graph containing those nodes.

    Returns:
    - mapping from nodes to the canonical node of the connected
    component they belong to
    - mapping from canonical nodes to connected components

    """
    canonical =
    components =
    while nodes:
    node = nodes.pop()
    component = bfs_visited(graph, node)
    components[node] = component
    nodes.difference_update(component)
    for n in component:
    canonical[n] = node
    return canonical, components

    def resilience(graph, attack_order):
    """Given an undirected graph represented as a mapping from nodes to
    an iterable of their neighbours, and an iterable of nodes, generate
    integers such that the the k-th result is the size of the largest
    connected component after the removal of the first k-1 nodes.

    """
    # Take a copy of the graph so that we can destructively modify it.
    graph = node: set(neighbours) for node, neighbours in graph.items()

    canonical, components = connected_components(graph, set(graph))
    largest = lambda: max(map(len, components.values()), default=0)
    yield largest()
    for node in attack_order:
    # Find connected component containing node.
    component = components.pop(canonical.pop(node))

    # Remove node from graph.
    for neighbor in graph[node]:
    graph[neighbor].remove(node)
    graph.pop(node)
    component.remove(node)

    # Component may have been split by removal of node, so search
    # it for new connected components and update data structures
    # accordingly.
    canon, comp = connected_components(graph, component)
    canonical.update(canon)
    components.update(comp)
    yield largest()



  13. In the revised code, the max operation has to iterate over all the remaining connected components in order to find the largest one. It would be possible to improve the efficiency of this step by storing the connected components in a priority queue so that the largest one can be found in time that's logarithmic in the number of components.



    I doubt that this part of the algorithm is a bottleneck in practice, so it's probably not worth the extra code, but if you need to do this, then there are some Priority Queue Implementation Notes in the Python documentation.



2. Performance comparison



Here's a useful function for making test cases:



from itertools import combinations
from random import random

def random_graph(n, p):
"""Return a random undirected graph with n nodes and each edge chosen
independently with probability p.

"""
assert 0 <= p <= 1
graph = i: set() for i in range(n)
for i, j in combinations(range(n), 2):
if random() <= p:
graph[i].add(j)
graph[j].add(i)
return graph


Now, a quick performance comparison between the revised and original code. Note that we have to run the revised code first, because the original code destructively modifies the graph, as noted in §1.2 above.



>>> from timeit import timeit
>>> G = random_graph(300, 0.2)
>>> timeit(lambda:list(resilience(G, list(G))), number=1) # revised
0.28782312001567334
>>> timeit(lambda:compute_resilience(G, list(G)), number=1) # original
59.46968446299434


So the revised code is about 200 times faster on this test case.






share|improve this answer























  • thank you for a tremendously valuable answer! I have learned a lot of new thing from the answer, including the default argument in the max function and the proper way to iterate over a modified sequence. I also going to change my docstrings taking into account your comments.
    – Ayenn Ul
    Jan 6 at 15:05











  • You way of copying the graph graph = node: set(neighbours) for node, neighbours in graph.items()looks very concise and neat. As far as I understand, you created a local variable graph which is used instead of the global variable with the same name. Could this technique be used everywhere, or is there some limitations?
    – Ayenn Ul
    Jan 7 at 14:06











  • graph isn't a global variable — it's one of the function parameters. But yes, you can always make a local variable with whatever name you like, even if there is a global variable with the same name. This is in the Python FAQ — "when you make an assignment to a variable in a scope, that variable becomes local to that scope and shadows any similarly named variable in the outer scope".
    – Gareth Rees
    Jan 7 at 17:38














up vote
2
down vote



accepted










1. Review



  1. The docstring for bfs_visited should explain the node argument.


  2. The docstring for compute_resilience should explain that the ugraph argument gets modified. Alternatively, the function could take a copy of the graph so that the original is not modified.



  3. In bfs_visited the lines:



    queue = deque()
    queue.append(node)


    can be simplified to:



    queue = deque([node]) 



  4. The function largest_cc_size builds a list of pairs:



    res = [(len(ccc), ccc) for ccc in cc_visited(ugraph)]
    res.sort()
    return res[-1][0]


    But you can see that it only ever uses the first element of each pair (the size of the component). So you could simplify it by not building the pairs:



    res = [len(ccc) for ccc in cc_visited(ugraph)]
    res.sort()
    return res[-1]



  5. Since only the size of the largest component is needed, there is no need to build the whole list. Instead you could use max to find the largest:



    if ugraph:
    return max(map(len, cc_visited(ugraph)))
    else:
    return 0



  6. If you are using Python 3.4 or later, this can be further simplified using the default argument to max:



    return max(map(len, cc_visited(ugraph)), default=0)


    This is now so simple that it probably doesn't need its own function.




  7. This line:



    remaining_nodes = set(graph.keys())


    can be written more simply:



    remaining_nodes = set(graph)



  8. There is a loop over the set remaining_nodes where on each loop iteration you update remaining_nodes:



    for node in remaining_nodes:
        visited = bfs_visited(graph, node)
        if visited not in connected_components:
            connected_components.append(visited)
        remaining_nodes = remaining_nodes - visited


    It looks as if the intention of the code is to avoid iterating over the nodes in visited by removing them from remaining_nodes, but this doesn't work! The problem is that the for statement:



    for node in remaining_nodes:


    only evaluates the expression remaining_nodes once, at the start of the loop. So when the code creates a new set and assigns it to remaining_nodes:



    remaining_nodes = remaining_nodes - visited


    this has no effect on the nodes being iterated over.




  9. You might imagine trying to fix this by using the difference_update method to adjust the set being iterated over:



    remaining_nodes.difference_update(visited)


    but this would be a bad idea because then you would be iterating over a set and modifying it within the loop, which is not safe. Instead, you need to write the loop as follows:



    while remaining_nodes:
        node = remaining_nodes.pop()
        visited = bfs_visited(graph, node)
        if visited not in connected_components:
            connected_components.append(visited)
        remaining_nodes.difference_update(visited)


    Using while and pop is the standard idiom in Python for consuming a data structure while modifying it — you do something similar in bfs_visited.




  10. There is now no need for the test:



    if visited not in connected_components:


    since each component is produced exactly once.
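
    Putting points 8, 9 and 10 together, a corrected cc_visited might look like this (a sketch that repeats bfs_visited from the question so it can run on its own):

    ```python
    from collections import deque

    def bfs_visited(graph, node):
        """Return the set of nodes reachable from node in the undirected graph."""
        queue = deque([node])
        visited = {node}
        while queue:
            current = queue.popleft()
            for neighbor in graph[current]:
                if neighbor not in visited:
                    visited.add(neighbor)
                    queue.append(neighbor)
        return visited

    def cc_visited(graph):
        """Return a list of the connected components (as sets of nodes)."""
        remaining = set(graph)
        components = []
        while remaining:
            node = remaining.pop()
            component = bfs_visited(graph, node)
            components.append(component)  # each component is found exactly once
            remaining.difference_update(component)
        return components
    ```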




  11. In compute_resilience the first line is:



    res = [len(ugraph)]


    but this only works if the graph is a single connected component to start with. To handle the general case, the first line should be:



    res = [largest_cc_size(ugraph)]



  12. For each node in attack order, compute_resilience calls:



    res.append(largest_cc_size(ugraph))


    But this doesn't take advantage of the work that was previously done. When we remove node from the graph, all connected components remain the same, except for the connected component containing node. So we can potentially save some work if we only do a breadth-first search over that component, and not over the whole graph. (Whether this actually saves any work depends on how resilient the graph is. For highly resilient graphs it won't make much difference.)



    In order to do this we'll need to redesign the data structures so that we can efficiently find the component containing a node, and efficiently remove that component from the collection of components.



    This answer is already quite long, so I won't explain in detail how to redesign the data structures, I'll just present the revised code and let you figure it out for yourself.



    def connected_components(graph, nodes):
        """Given an undirected graph represented as a mapping from nodes to
        the set of their neighbours, and a set of nodes, find the
        connected components in the graph containing those nodes.

        Returns:
        - mapping from nodes to the canonical node of the connected
          component they belong to
        - mapping from canonical nodes to connected components

        """
        canonical = {}
        components = {}
        while nodes:
            node = nodes.pop()
            component = bfs_visited(graph, node)
            components[node] = component
            nodes.difference_update(component)
            for n in component:
                canonical[n] = node
        return canonical, components

    def resilience(graph, attack_order):
        """Given an undirected graph represented as a mapping from nodes to
        an iterable of their neighbours, and an iterable of nodes, generate
        integers such that the k-th result is the size of the largest
        connected component after the removal of the first k-1 nodes.

        """
        # Take a copy of the graph so that we can destructively modify it.
        graph = {node: set(neighbours) for node, neighbours in graph.items()}

        canonical, components = connected_components(graph, set(graph))
        largest = lambda: max(map(len, components.values()), default=0)
        yield largest()
        for node in attack_order:
            # Find connected component containing node.
            component = components.pop(canonical.pop(node))

            # Remove node from graph.
            for neighbor in graph[node]:
                graph[neighbor].remove(node)
            graph.pop(node)
            component.remove(node)

            # Component may have been split by removal of node, so search
            # it for new connected components and update data structures
            # accordingly.
            canon, comp = connected_components(graph, component)
            canonical.update(canon)
            components.update(comp)
            yield largest()
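
    As a sanity check on any implementation, here is a small hand-worked case (my own illustration, not part of the original answer): a path graph a-b-c-d attacked at b and then c. Initially the largest component has 4 nodes; removing b leaves {a} and {c, d}; removing c leaves only singletons. A naive, self-contained reference computation:

    ```python
    from collections import deque

    def bfs_visited(graph, node):
        """Set of nodes reachable from node."""
        queue, visited = deque([node]), {node}
        while queue:
            for neighbor in graph[queue.popleft()]:
                if neighbor not in visited:
                    visited.add(neighbor)
                    queue.append(neighbor)
        return visited

    def naive_resilience(graph, attack_order):
        """Recompute the largest component from scratch after each removal."""
        graph = {n: set(nbrs) for n, nbrs in graph.items()}  # work on a copy

        def largest():
            remaining, best = set(graph), 0
            while remaining:
                component = bfs_visited(graph, remaining.pop())
                remaining.difference_update(component)
                best = max(best, len(component))
            return best

        result = [largest()]
        for node in attack_order:
            for neighbor in graph.pop(node):
                graph[neighbor].remove(node)
            result.append(largest())
        return result

    path = {'a': {'b'}, 'b': {'a', 'c'}, 'c': {'b', 'd'}, 'd': {'c'}}
    print(naive_resilience(path, ['b', 'c']))  # [4, 2, 1]
    ```

    This brute-force version does the redundant work described above, but it is easy to trust, which makes it useful for cross-checking the incremental version on small random graphs.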



  13. In the revised code, the max operation has to iterate over all the remaining connected components in order to find the largest one. It would be possible to improve the efficiency of this step by storing the connected components in a priority queue so that the largest one can be found in time that's logarithmic in the number of components.



    I doubt that this part of the algorithm is a bottleneck in practice, so it's probably not worth the extra code, but if you need to do this, then there are some Priority Queue Implementation Notes in the Python documentation.
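
    For the curious, here is one hedged sketch of that idea (my own illustration, not part of the answer above): keep a heap of (negated size, canonical node) pairs, and lazily discard entries that no longer match the components mapping. In a full implementation, each new component produced after an attack would also need to be pushed onto the heap.

    ```python
    import heapq

    def make_heap(components):
        """Heap of (-size, canonical) pairs; the largest component sits on top."""
        heap = [(-len(comp), key) for key, comp in components.items()]
        heapq.heapify(heap)
        return heap

    def largest(components, heap):
        """Pop stale entries until the top of the heap matches a live component."""
        while heap:
            neg_size, key = heap[0]
            if key in components and len(components[key]) == -neg_size:
                return -neg_size
            heapq.heappop(heap)  # stale: component was removed or resized
        return 0

    components = {'a': {'a', 'b', 'c'}, 'd': {'d'}}
    heap = make_heap(components)
    print(largest(components, heap))  # 3
    del components['a']               # component destroyed by an attack
    print(largest(components, heap))  # 1
    ```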



2. Performance comparison



Here's a useful function for making test cases:



from itertools import combinations
from random import random

def random_graph(n, p):
    """Return a random undirected graph with n nodes and each edge chosen
    independently with probability p.

    """
    assert 0 <= p <= 1
    graph = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        if random() <= p:
            graph[i].add(j)
            graph[j].add(i)
    return graph
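
A quick property check on the generator (repeated here so the snippet stands on its own): every edge should appear in both adjacency sets, and p = 1 should give a complete graph.

```python
from itertools import combinations
from random import random

def random_graph(n, p):
    """Random undirected graph on n nodes; each edge present with probability p."""
    assert 0 <= p <= 1
    graph = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        if random() <= p:
            graph[i].add(j)
            graph[j].add(i)
    return graph

g = random_graph(20, 0.3)
# Undirected: each edge is recorded in both endpoints' neighbour sets.
assert all(node in g[nbr] for node, nbrs in g.items() for nbr in nbrs)
# With p = 1 every edge is chosen, so every node has degree n - 1.
complete = random_graph(5, 1)
assert all(len(nbrs) == 4 for nbrs in complete.values())
```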


Now, a quick performance comparison between the revised and original code. Note that we have to run the revised code first, because the original code destructively modifies the graph, as noted in §1.2 above.



>>> from timeit import timeit
>>> G = random_graph(300, 0.2)
>>> timeit(lambda:list(resilience(G, list(G))), number=1) # revised
0.28782312001567334
>>> timeit(lambda:compute_resilience(G, list(G)), number=1) # original
59.46968446299434


So the revised code is about 200 times faster on this test case.






  • Thank you for a tremendously valuable answer! I have learned a lot of new things from it, including the default argument of the max function and the proper way to iterate over a sequence while modifying it. I am also going to revise my docstrings taking your comments into account.
    – Ayenn Ul
    Jan 6 at 15:05











  • Your way of copying the graph, graph = {node: set(neighbours) for node, neighbours in graph.items()}, looks very concise and neat. As far as I understand, you created a local variable graph which is used instead of the global variable with the same name. Could this technique be used everywhere, or are there some limitations?
    – Ayenn Ul
    Jan 7 at 14:06











  • graph isn't a global variable — it's one of the function parameters. But yes, you can always make a local variable with whatever name you like, even if there is a global variable with the same name. This is in the Python FAQ — "when you make an assignment to a variable in a scope, that variable becomes local to that scope and shadows any similarly named variable in the outer scope".
    – Gareth Rees
    Jan 7 at 17:38












edited Jan 6 at 12:33


























answered Jan 6 at 11:19









Gareth Rees
