Increase the efficiency of a matching algorithm

I'm trying to create an algorithm that will match one set of objects with another. Each of the objects from one group can be matched with some of the objects from another group. Once an item has been matched, it cannot be matched to another item. My current implementation iterates over each of the objects and attempts to find a solution where each object has a match. I've done this recursively, but this method quickly becomes unusable as the processing time increases exponentially. Here is a trivial example of what I've written:



def demo_match():
    possible_matches = {
        "x": {1, 2},
        "y": {2, 3},
        "z": {1, 2},
    }

    used_numbers = set()
    matched = {}

    def match_items(remaining_items_to_match):
        for letter in remaining_items_to_match:
            possible_numbers = possible_matches[letter]
            available_numbers = possible_numbers.difference(used_numbers)
            print(f"can match '{letter}' with {available_numbers}")
            if not available_numbers:
                raise ValueError(f"there are no possible matches for {letter}")
            for possible_match in available_numbers:
                try:
                    used_numbers.add(possible_match)
                    matched[letter] = possible_match
                    print(f"'{letter}' has been matched with {possible_match}")
                    if len(remaining_items_to_match) == 1:
                        return matched
                    # remove this item and process the remaining
                    next_remaining_items = remaining_items_to_match[1:]
                    return match_items(next_remaining_items)
                except ValueError:
                    print(f"***** found a dead end, unmatching '{letter}' and {possible_match} *****")
                    used_numbers.remove(possible_match)
                    del matched[letter]
            raise ValueError("ran out of available numbers to match")

    print(f"\nMatch Solution: {match_items(list(possible_matches.keys()))}")


The actual implementation assigns weights to the possible matches, meaning that one particular match is preferred first, but if it leads to a dead end the next-best match is tried. I feel like there is a better implementation out there, maybe involving sets to find where items overlap and determining the critical path. I just haven't been able to come up with a decent alternative other than this exhaustive search.



------- Update ------



After researching the algorithms suggested in the answers, I decided to go with the Munkres (Hungarian) algorithm. It allows a weight to be assigned to each possible pairing.



It accepts a cost matrix in which each row represents an item from one group, each column represents an item from the other group, and the cell where they intersect holds the cost of pairing those two items. The common example uses people and jobs as the items. The algorithm finds the assignment of rows to columns with the least total cost. To prevent two items from being paired, set their cost in the matrix to float('inf'),
e.g.:



        Janitor  Teacher  Doctor
Bob        9        5        1
Dick       1        4        6
Sally      5        3        7


would be represented by:



[
    [9, 5, 1],
    [1, 4, 6],
    [5, 3, 7],
]


and compute() would return the list of (row, column) index pairs [(0, 2), (1, 0), (2, 1)], where each pair gives a row index and the index of the column matched with that row.
I ended up converting these index pairs into a dictionary of people to their corresponding jobs via



>>> def match():
...     from munkres import Munkres
...     people = ["bob", "dick", "sally"]
...     jobs = ["janitor", "teacher", "doctor"]
...     matrix = [
...         [9, 5, 1],
...         [1, 4, 6],
...         [5, 3, 7],
...     ]
...     m = Munkres()
...     indexes = m.compute(matrix)
...     matches = {}
...     for p, j in indexes:
...         person = people[p]
...         job = jobs[j]
...         matches[person] = job
...     return matches
...
>>> match()
{'bob': 'doctor', 'dick': 'janitor', 'sally': 'teacher'}


matches is a dictionary mapping each person to their matched job.
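Two follow-up details, sketched here as assumptions rather than taken from the munkres documentation: preference scores (higher is better) can be converted into costs by subtracting each score from the largest one, and a particular pairing can be forbidden by overwriting its cell, per the float('inf') note above. The score matrix below is hypothetical:

from munkres import Munkres

# Hypothetical preference scores: higher means more preferred.
scores = [
    [1, 5, 9],
    [9, 6, 4],
    [5, 7, 3],
]

# Convert preferences into costs so that minimizing cost maximizes preference.
best = max(max(row) for row in scores)
costs = [[best - s for s in row] for row in scores]

# Forbid pairing row 0 with column 0 (per the float('inf') suggestion above).
costs[0][0] = float('inf')

print(Munkres().compute(costs))  # a list of (row, column) index pairs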







– user2682863, asked Jan 26 at 19:40, edited Jan 28 at 18:43





















  • The problem you are trying to solve smells NP-hard. I am almost sure it is equivalent to a vertex cover problem on a dual graph. The "almost" clause makes it a comment rather than an answer.
    – vnp
    Jan 27 at 5:34

















2 Answers



























1. The Hopcroft–Karp algorithm



The problem you're trying to solve here is to find a maximum matching in an unweighted bipartite graph. This can be solved in $O(\sqrt{V}\,E)$ time by the Hopcroft–Karp algorithm.



Here's a sketch of how the Hopcroft–Karp algorithm works. Suppose that we have a bipartite unweighted graph, like the one you use as an example in the post:



[Figure: bipartite graph with vertices x, y, z and 1, 2, 3]



Because it's bipartite, there are two sets of vertices which we'll call $U$ and $V$:



[Figure: U contains vertices x, y, z; V contains vertices 1, 2, 3]



The Hopcroft–Karp algorithm works by repeatedly augmenting the matching, increasing the number of edges in the matching by one. Suppose that we have added one edge to the matching so far, between $y$ and $2$, drawn as a red edge:



[Figure: y matched with 2]



The algorithm now searches for an augmenting path. This is a path that starts at an unmatched vertex in $U$, crosses to $V$ on an unmatched (black) edge, crosses back to $U$ on a matched (red) edge, and so on, finally ending in $V$. Here's an augmenting path, starting at $x$, following an unmatched edge to $2$, following a matched edge to $y$, and following an unmatched edge to $3$:



[Figure: augmenting path x–2–y–3]



An augmenting path has an odd number of edges (because it starts in $U$ and finishes in $V$) and the edges along the path alternate between unmatched and matched. This means that if we swap all the edges along the augmenting path (matched becoming unmatched and vice versa), we get a new matching with one more edge, like this:



[Figure: x matched with 2 and y matched with 3]



Now carrying on with the algorithm, there is another augmenting path, $z$–$2$–$x$–$1$:



[Figure: augmenting path z–2–x–1]



Swapping the edges along this path, we increase the matching again:



[Figure: x matched with 1, y with 3 and z with 2]



Now there are no more augmenting paths (because there are no unmatched vertices in $U$) and we are done.



How do we find the augmenting paths? Well, we use a combination of breadth-first search and depth-first search. At each stage we do a breadth-first search to divide the vertices into layers according to their shortest distance from an unmatched vertex along an augmenting path. Suppose we are at this stage of the algorithm, where $y$ has been matched with $2$:



[Figure: y matched with 2]



Now the breadth-first search divides the vertices into the following layers:



[Figure: the four layers]



Using this structure, an augmenting path can be found efficiently (linear time) by a depth-first search that only follows edges from lower-numbered layers to higher-numbered layers.



2. Implementation



This is a direct translation of the pseudocode from Wikipedia into Python. To help with book-keeping, this implementation introduces a distinguished node nil, with all vertices being artificially matched with nil to start with. When there are no more augmenting paths ending at nil, we know the matching has maximum size.



from collections import deque

def maximum_matching(graph):
    """Find a maximum unweighted matching in a bipartite graph. The input
    must be a dictionary mapping vertices in one partition to sets of
    vertices in the other partition. Return a dictionary mapping
    vertices in one partition to their matched vertex in the other.

    """
    # The two partitions of the graph.
    U = set(graph.keys())
    V = set.union(*graph.values())

    # A distinguished vertex not belonging to the graph.
    nil = object()

    # Current pairing for vertices in each partition.
    pair_u = dict.fromkeys(U, nil)
    pair_v = dict.fromkeys(V, nil)

    # Distance to each vertex along augmenting paths.
    dist = {}
    inf = float('inf')

    def bfs():
        # Breadth-first search of the graph starting with unmatched
        # vertices in U and following "augmenting paths" alternating
        # between edges from U->V not in the matching and edges from
        # V->U in the matching. This partitions the vertexes into
        # layers according to their distance along augmenting paths.
        queue = deque()
        for u in U:
            if pair_u[u] is nil:
                dist[u] = 0
                queue.append(u)
            else:
                dist[u] = inf
        dist[nil] = inf
        while queue:
            u = queue.popleft()  # FIFO order, so layers get shortest distances
            if dist[u] < dist[nil]:
                # Follow edges from U->V not in the matching.
                for v in graph[u]:
                    # Follow edge from V->U in the matching, if any.
                    uu = pair_v[v]
                    if dist[uu] == inf:
                        dist[uu] = dist[u] + 1
                        queue.append(uu)
        return dist[nil] is not inf

    def dfs(u):
        # Depth-first search along "augmenting paths" starting at
        # u. If we can find such a path that goes all the way from
        # u to nil, then we can swap matched/unmatched edges all
        # along the path, getting one additional matched edge
        # (since the path must have an odd number of edges).
        if u is not nil:
            for v in graph[u]:
                uu = pair_v[v]
                if dist[uu] == dist[u] + 1:  # uu in next layer
                    if dfs(uu):
                        # Found augmenting path all the way to nil. In
                        # this path v was matched with uu, so after
                        # swapping v must now be matched with u.
                        pair_v[v] = u
                        pair_u[u] = v
                        return True
            dist[u] = inf
            return False
        return True

    while bfs():
        for u in U:
            if pair_u[u] is nil:
                dfs(u)
    return {u: v for u, v in pair_u.items() if v is not nil}
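As a quick sanity check (my example, not part of the original answer), applying this to the graph from the question yields a perfect matching. Which of the two possible perfect matchings you get can vary from run to run, since the partitions are stored as unordered sets:

>>> maximum_matching({"x": {1, 2}, "y": {2, 3}, "z": {1, 2}})
{'x': 1, 'y': 3, 'z': 2}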





– Gareth Rees, answered Jan 27 at 11:38, edited Jan 27 at 11:44























  • excellent answer and explanation - I'm still working on comprehending the algorithm- there's no way to assign a weight or to sort the preferred matches correct?
    – user2682863
    Jan 27 at 17:08










  • I don't know what you mean by "assign a weight" or "sort the preferred matches". Do you mean that some matches score better than others, and you would like to maximize the sum of the scores of the matches? You didn't mention that in the post, and so I didn't consider it in my answer. But if you are in that situation, you have a variant of the assignment problem, which can be solved by the Hungarian algorithm.
    – Gareth Rees
    Jan 27 at 17:13










  • I mentioned it in the last paragraph- eg: "x" can be paired with 1, 2, or 3, but matching it with 2 is preferred, followed by 3, and then 1. I'll check out the Hungarian algorithm
    – user2682863
    Jan 27 at 17:22










  • I can't see any evidence of that in the post — the implementation does not assign weights or preferences. When you write "actual implementation" do you mean that you have another implementation that you didn't show us?
    – Gareth Rees
    Jan 27 at 17:29











  • I simplified the demo implementation for clarity (perhaps too much) thinking it would be easy to adapt an answer to prioritize matches- this will probably be the accepted answer- thanks
    – user2682863
    Jan 27 at 17:34






























Right off the bat, this occurs to me:




  • Build a frequency map of the possible match values:



    import collections

    freqs = collections.Counter()
    for k, v in possible_matches.items():
        freqs.update(v)



  • Weight the match keys by the sum of all possible replacements:



    weights = {k: sum(freqs[replacement] for replacement in v)
               for k, v in possible_matches.items()}



  • Use the weights to order the possible matches with the lowest weight first. These will be the matches with the fewest possibilities:



    order_to_try = sorted(possible_matches.keys(), key=lambda k: weights[k])



  • Sort the remaining_items_to_match using the order above. This will tend to assign the hardest-to-fit replacements first, so it should prune early:



    ritm = sorted(remaining_items_to_match, key=lambda k: weights[k])


Note that, technically, when you "allocate" a match you change the dynamics of the weighting. So you could pass a modified replacements table down recursively, so that used replacement values are removed from the weight computations:



new_freqs = collections.Counter()
for k, v in possible_matches.items():
    new_freqs.update(v - used_numbers)  # discarding used numbers


And if it matters, you should be intersecting your possible_matches with the set of inputs (remaining_items_to_match) to make sure you're not carrying any extra baggage...
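Putting the steps above together as a minimal sketch (the helper name and the example call are mine, not from the answer):

import collections

def order_by_difficulty(possible_matches, used_numbers=frozenset()):
    # Count how often each replacement value appears among the candidates.
    freqs = collections.Counter()
    for candidates in possible_matches.values():
        freqs.update(candidates - used_numbers)

    # A key whose candidates are rare overall gets a low weight.
    weights = {k: sum(freqs[r] for r in v - used_numbers)
               for k, v in possible_matches.items()}

    # Hardest-to-fit keys come first, so dead ends are discovered early.
    return sorted(possible_matches, key=lambda k: weights[k])

print(order_by_difficulty({"x": {1, 2}, "y": {2, 3}, "z": {1, 2}}))
# -> ['y', 'x', 'z'] here: y's candidates are collectively the scarcest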






– Austin Hastings, answered Jan 26 at 20:16





















  • this is a good suggestion - I should have mentioned that in the actual implementation I iterate in order of items that have the fewest possible matches which I think would produce similar results to this
    – user2682863
    Jan 26 at 20:36









