Increase the efficiency of a matching algorithm

I'm trying to create an algorithm that will match one set of objects with another. Each of the objects from one group can be matched with some of the objects from another group. Once an item has been matched, it cannot be matched to another item. My current implementation iterates over each of the objects and attempts to find a solution where each object has a match. I've done this recursively, but this method quickly becomes unusable as the processing time increases exponentially. Here is a trivial example of what I've written:



def demo_match():
    possible_matches = {
        "x": {1, 2},
        "y": {2, 3},
        "z": {1, 2},
    }

    used_numbers = set()
    matched = {}

    def match_items(remaining_items_to_match):
        for letter in remaining_items_to_match:
            possible_numbers = possible_matches[letter]
            available_numbers = possible_numbers.difference(used_numbers)
            print(f"can match '{letter}' with {available_numbers}")
            if not available_numbers:
                raise ValueError(f"there are no possible matches for {letter}")
            for possible_match in available_numbers:
                try:
                    used_numbers.add(possible_match)
                    matched[letter] = possible_match
                    print(f"'{letter}' has been matched with {possible_match}")
                    if len(remaining_items_to_match) == 1:
                        return matched
                    # remove this item and process the remaining
                    next_remaining_items = remaining_items_to_match[1:]
                    return match_items(next_remaining_items)
                except ValueError:
                    print(f"***** found a dead end, unmatching '{letter}' and {possible_match} *****")
                    used_numbers.remove(possible_match)
                    del matched[letter]
            raise ValueError("ran out of available numbers to match")

    print(f"\nMatch Solution: {match_items(list(possible_matches.keys()))}")


The actual implementation assigns weights to the possible matches, meaning that one particular match is preferred first, but if it leads to a dead end the next-best match is tried. I feel like there is a better implementation out there, maybe involving sets to find where items overlap and determining the critical path. I just haven't been able to come up with a decent alternative other than this exhaustive search.



------- Update ------



After researching the algorithms suggested in the answers, I decided to go with the Munkres (Hungarian) algorithm. It allows a weight to be assigned to each possible pairing.



It accepts a cost matrix in which each row represents an item from one group, each column represents an item from the other group, and the cell where they intersect holds the cost of pairing those two items. The common example uses people and jobs as the items. The algorithm finds the assignment of rows to columns with the least total cost. To prevent two items from being paired, set their cost in the matrix to float('inf'),
e.g.:



        Janitor  Teacher  Doctor
Bob        9        5        1
Dick       1        4        6
Sally      5        3        7


would be represented by:



[
    [9, 5, 1],
    [1, 4, 6],
    [5, 3, 7],
]


and compute() would return the list of (row, column) index pairs [(0, 2), (1, 0), (2, 1)], where each pair gives a row index and the index of the column matched with that row.
I ended up converting these index pairs into a dictionary of people to their corresponding jobs via



>>> def match():
...     from munkres import Munkres
...     people = ["bob", "dick", "sally"]
...     jobs = ["janitor", "teacher", "doctor"]
...     matrix = [
...         [9, 5, 1],
...         [1, 4, 6],
...         [5, 3, 7],
...     ]
...     m = Munkres()
...     indexes = m.compute(matrix)
...     matches = {}
...     for p, j in indexes:
...         person = people[p]
...         job = jobs[j]
...         matches[person] = job
...     return matches
...
>>> match()
{'bob': 'doctor', 'dick': 'janitor', 'sally': 'teacher'}


matches is a dictionary mapping each person to their matched job.
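Two follow-up details, sketched here as assumptions rather than taken from the munkres documentation: preference scores (higher is better) can be converted into costs by subtracting each score from the largest one, and a particular pairing can be forbidden by overwriting its cell, per the float('inf') note above. The score matrix below is hypothetical:

from munkres import Munkres

# Hypothetical preference scores: higher means more preferred.
scores = [
    [1, 5, 9],
    [9, 6, 4],
    [5, 7, 3],
]

# Convert preferences into costs so that minimizing cost maximizes preference.
best = max(max(row) for row in scores)
costs = [[best - s for s in row] for row in scores]

# Forbid pairing row 0 with column 0 (per the float('inf') suggestion above).
costs[0][0] = float('inf')

print(Munkres().compute(costs))  # a list of (row, column) index pairs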







– user2682863, asked Jan 26 at 19:40, edited Jan 28 at 18:43





















  • The problem you are trying to solve smells NP-hard. I am almost sure it is equivalent to a vertex cover problem on a dual graph. The "almost" clause makes it a comment rather than an answer.
    – vnp
    Jan 27 at 5:34

















2 Answers



























1. The Hopcroft–Karp algorithm



The problem you're trying to solve here is to find a maximum matching in an unweighted bipartite graph. This can be solved in $O(\sqrt{V}\,E)$ time by the Hopcroft–Karp algorithm.



Here's a sketch of how the Hopcroft–Karp algorithm works. Suppose that we have a bipartite unweighted graph, like the one you use as an example in the post:



[Figure: bipartite graph with vertices x, y, z and 1, 2, 3]



Because it's bipartite, there are two sets of vertices which we'll call $U$ and $V$:



[Figure: U contains vertices x, y, z; V contains vertices 1, 2, 3]



The Hopcroft–Karp algorithm works by repeatedly augmenting the matching, increasing the number of edges in the matching by one. Suppose that we have added one edge to the matching so far, between $y$ and $2$, drawn as a red edge:



[Figure: y matched with 2]



The algorithm now searches for an augmenting path. This is a path that starts at an unmatched vertex in $U$, crosses to $V$ on an unmatched (black) edge, crosses back to $U$ on a matched (red) edge, and so on, finally ending in $V$. Here's an augmenting path, starting at $x$, following an unmatched edge to $2$, following a matched edge to $y$, and following an unmatched edge to $3$:



[Figure: augmenting path x–2–y–3]



An augmenting path has an odd number of edges (because it starts in $U$ and finishes in $V$) and the edges along the path alternate between unmatched and matched. This means that if we swap all the edges along the augmenting path (matched becoming unmatched and vice versa), we get a new matching with one more edge, like this:



[Figure: x matched with 2 and y matched with 3]



Now carrying on with the algorithm, there is another augmenting path, $z$–$2$–$x$–$1$:



[Figure: augmenting path z–2–x–1]



Swapping the edges along this path, we increase the matching again:



[Figure: x matched with 1, y with 3 and z with 2]



Now there are no more augmenting paths (because there are no unmatched vertices in $U$) and we are done.



How do we find the augmenting paths? Well, we use a combination of breadth-first search and depth-first search. At each stage we do a breadth-first search to divide the vertices into layers according to their shortest distance from an unmatched vertex along an augmenting path. Suppose we are at this stage of the algorithm, where $y$ has been matched with $2$:



[Figure: y matched with 2]



Now the breadth-first search divides the vertices into the following layers:



[Figure: the four layers]



Using this structure, an augmenting path can be found efficiently (linear time) by a depth-first search that only follows edges from lower-numbered layers to higher-numbered layers.



2. Implementation



This is a direct translation of the pseudocode from Wikipedia into Python. To help with book-keeping, this implementation introduces a distinguished node nil, with all vertices being artificially matched with nil to start with. When there are no more augmenting paths ending at nil, we know the matching has maximum size.



from collections import deque

def maximum_matching(graph):
    """Find a maximum unweighted matching in a bipartite graph. The input
    must be a dictionary mapping vertices in one partition to sets of
    vertices in the other partition. Return a dictionary mapping
    vertices in one partition to their matched vertex in the other.

    """
    # The two partitions of the graph.
    U = set(graph.keys())
    V = set.union(*graph.values())

    # A distinguished vertex not belonging to the graph.
    nil = object()

    # Current pairing for vertices in each partition.
    pair_u = dict.fromkeys(U, nil)
    pair_v = dict.fromkeys(V, nil)

    # Distance to each vertex along augmenting paths.
    dist = {}
    inf = float('inf')

    def bfs():
        # Breadth-first search of the graph starting with unmatched
        # vertices in U and following "augmenting paths" alternating
        # between edges from U->V not in the matching and edges from
        # V->U in the matching. This partitions the vertexes into
        # layers according to their distance along augmenting paths.
        queue = deque()
        for u in U:
            if pair_u[u] is nil:
                dist[u] = 0
                queue.append(u)
            else:
                dist[u] = inf
        dist[nil] = inf
        while queue:
            u = queue.popleft()  # FIFO order, so layers get shortest distances
            if dist[u] < dist[nil]:
                # Follow edges from U->V not in the matching.
                for v in graph[u]:
                    # Follow edge from V->U in the matching, if any.
                    uu = pair_v[v]
                    if dist[uu] == inf:
                        dist[uu] = dist[u] + 1
                        queue.append(uu)
        return dist[nil] is not inf

    def dfs(u):
        # Depth-first search along "augmenting paths" starting at
        # u. If we can find such a path that goes all the way from
        # u to nil, then we can swap matched/unmatched edges all
        # along the path, getting one additional matched edge
        # (since the path must have an odd number of edges).
        if u is not nil:
            for v in graph[u]:
                uu = pair_v[v]
                if dist[uu] == dist[u] + 1:  # uu in next layer
                    if dfs(uu):
                        # Found augmenting path all the way to nil. In
                        # this path v was matched with uu, so after
                        # swapping v must now be matched with u.
                        pair_v[v] = u
                        pair_u[u] = v
                        return True
            dist[u] = inf
            return False
        return True

    while bfs():
        for u in U:
            if pair_u[u] is nil:
                dfs(u)
    return {u: v for u, v in pair_u.items() if v is not nil}
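As a quick sanity check (my example, not part of the original answer), applying this to the graph from the question yields a perfect matching. Which of the two possible perfect matchings you get can vary from run to run, since the partitions are stored as unordered sets:

>>> maximum_matching({"x": {1, 2}, "y": {2, 3}, "z": {1, 2}})
{'x': 1, 'y': 3, 'z': 2}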





– Gareth Rees, answered Jan 27 at 11:38, edited Jan 27 at 11:44























  • excellent answer and explanation - I'm still working on comprehending the algorithm- there's no way to assign a weight or to sort the preferred matches correct?
    – user2682863
    Jan 27 at 17:08










  • I don't know what you mean by "assign a weight" or "sort the preferred matches". Do you mean that some matches score better than others, and you would like to maximize the sum of the scores of the matches? You didn't mention that in the post, and so I didn't consider it in my answer. But if you are in that situation, you have a variant of the assignment problem, which can be solved by the Hungarian algorithm.
    – Gareth Rees
    Jan 27 at 17:13










  • I mentioned it in the last paragraph- eg: "x" can be paired with 1, 2, or 3, but matching it with 2 is preferred, followed by 3, and then 1. I'll check out the Hungarian algorithm
    – user2682863
    Jan 27 at 17:22










  • I can't see any evidence of that in the post — the implementation does not assign weights or preferences. When you write "actual implementation" do you mean that you have another implementation that you didn't show us?
    – Gareth Rees
    Jan 27 at 17:29











  • I simplified the demo implementation for clarity (perhaps too much) thinking it would be easy to adapt an answer to prioritize matches- this will probably be the accepted answer- thanks
    – user2682863
    Jan 27 at 17:34






























Right off the bat, this occurs to me:




  • Build a frequency map of the possible match values:



    import collections

    freqs = collections.Counter()
    for k, v in possible_matches.items():
        freqs.update(v)



  • Weight the match keys by the sum of all possible replacements:



    weights = {k: sum(freqs[replacement] for replacement in v)
               for k, v in possible_matches.items()}



  • Use the weights to order the possible matches with the lowest weight first. These will be the matches with the fewest possibilities:



    order_to_try = sorted(possible_matches.keys(), key=lambda k: weights[k])



  • Sort the remaining_items_to_match using the order above. This will tend to assign the hardest-to-fit replacements first, so it should prune early:



    ritm = sorted(remaining_items_to_match, key=lambda k: weights[k])


Note that, technically, when you "allocate" a match you change the dynamics of the weighting. So you could pass a modified replacements table down recursively, so that used replacement values are removed from the weight computations:



new_freqs = collections.Counter()
for k, v in possible_matches.items():
    new_freqs.update(v - used_numbers)  # discarding used numbers


And if it matters, you should be intersecting your possible_matches with the set of inputs (remaining_items_to_match) to make sure you're not carrying any extra baggage...
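Putting the steps above together as a minimal sketch (the helper name and the example call are mine, not from the answer):

import collections

def order_by_difficulty(possible_matches, used_numbers=frozenset()):
    # Count how often each replacement value appears among the candidates.
    freqs = collections.Counter()
    for candidates in possible_matches.values():
        freqs.update(candidates - used_numbers)

    # A key whose candidates are rare overall gets a low weight.
    weights = {k: sum(freqs[r] for r in v - used_numbers)
               for k, v in possible_matches.items()}

    # Hardest-to-fit keys come first, so dead ends are discovered early.
    return sorted(possible_matches, key=lambda k: weights[k])

print(order_by_difficulty({"x": {1, 2}, "y": {2, 3}, "z": {1, 2}}))
# -> ['y', 'x', 'z'] here: y's candidates are collectively the scarcest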






– Austin Hastings, answered Jan 26 at 20:16





















  • this is a good suggestion - I should have mentioned that in the actual implementation I iterate in order of items that have the fewest possible matches which I think would produce similar results to this
    – user2682863
    Jan 26 at 20:36









