Calculate median of a matrix

I am solving interview questions from here.




Problem: Given an N × M matrix in which each row is sorted, find the overall median of the matrix. Assume N*M is odd.



Note: No extra memory is allowed.
For example:



Matrix = [1, 3, 5]
         [2, 6, 9]
         [3, 6, 9]


A = [1, 2, 3, 3, 5, 6, 6, 9, 9]
Median is 5. So, output is 5.




This is my approach:



def find_median( A):
    """Returns the median value from given list"""
    for i in range(1,len(A)):
        A[0].extend(A[i])

    return (sorted(A[0])).pop(len(A[0])/2)


Test Cases:



assert find_median([[1,3,5],[2,5,9],[3,6,11]]) == 5
assert find_median([[0,1,1],[2,6,10],[3,5,9]]) == 3
assert find_median([[1,3,4,12,14],[1,6,9,10,15],[0,1,3,3,4]]) == 4


I am able to solve the problem, but I would like to know whether there is a better approach.







  • 1




    Presumably by "no extra memory is allowed" they actually mean "$O(1)$ extra memory is allowed" (otherwise there would be no solution). But the code in the post uses $\Theta(NM)$ extra memory and so does not solve the problem.
    – Gareth Rees
    May 24 at 17:39











  • @Gareth Rees Updated the code to meet the constraint "no extra memory allocation required".
    – Latika Agarwal
    May 24 at 18:56







  • 2




    It's not there yet: the calls to A[0].extend(A[i]) have to allocate $\Theta(MN)$ extra memory in order to extend the list. If you're having trouble telling how much extra memory you are using, it might help to use the __sizeof__ method to determine how much memory a particular object is using. For example, A[0].__sizeof__() tells you the memory used by the list A[0] in bytes.
    – Gareth Rees
    May 25 at 9:07







  • 1




    I'm pretty sure this can be solved without allocating any memory on the heap. You will need some allocation to keep track of some counters (namely $O(n)$). You know on which index the median will be (because you know the dimensions of the matrix). Since all rows are sorted, you can make use of a merge-sort-like iteration of the rows to get to that index. You'll need a "current" counter for each row, though.
    – Vogel612♦
    May 25 at 9:49











  • I have been educated in chat. The solution is not what you have here, but it's not pretty on the time complexity level :/
    – Vogel612♦
    May 25 at 10:28
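The comments hint that a constant-extra-memory solution exists but do not spell it out. One well-known technique that fits the constraint, offered here only as a sketch and not as the approach the commenters had in mind, is to binary search on the value rather than on a position: for a candidate value, count how many elements are less than or equal to it with one binary search per row. The function name `median_by_value_search` is hypothetical, and the sketch assumes integer entries and an odd element count.

```python
from bisect import bisect_right


def median_by_value_search(matrix):
    """Median of a matrix with sorted rows, using O(1) extra memory.

    Assumes integer entries and an odd number of elements.
    """
    n, m = len(matrix), len(matrix[0])
    target = n * m // 2 + 1  # the median is the target-th smallest element
    low = min(row[0] for row in matrix)
    high = max(row[-1] for row in matrix)
    while low < high:
        mid = (low + high) // 2
        # Count the elements <= mid with one binary search per row.
        count = sum(bisect_right(row, mid) for row in matrix)
        if count < target:
            low = mid + 1
        else:
            high = mid
    # low is the smallest value with at least `target` elements <= it.
    return low
```

The matrix is never modified or copied. The running time is O(N log M) per candidate value, multiplied by the logarithm of the value range, which may be what "not pretty on the time complexity level" refers to.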
















edited May 25 at 8:30 by t3chb0t

asked May 24 at 17:32 by Latika Agarwal

















3 Answers



accepted










Sorting the contents of the matrix and then picking the index with the median value is a good approach. Let's see if we can do it with constant extra memory.




for i in range(1,len(A)):
    A[0].extend(A[i])


This extends the first row of the matrix to contain every row in a flat list. Before, the matrix held N * M elements; now it holds N * M (the first row) + (N - 1) * M (all the other rows). Subtracting the original size from this tells us how much extra memory we are using: (N - 1) * M additional elements, or in other words O(NM) extra memory. This is not what we want.
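To see the growth directly, here is a small sketch that uses the `__sizeof__` method mentioned in the comments. The helper name `first_row_growth` is made up for illustration, and the byte counts it reports depend on the interpreter and platform, so only their relative size is meaningful.

```python
def first_row_growth(matrix):
    """Return the byte size of matrix[0] before and after the extend loop."""
    size_before = matrix[0].__sizeof__()
    for i in range(1, len(matrix)):
        matrix[0].extend(matrix[i])
    size_after = matrix[0].__sizeof__()
    return size_before, size_after


before, after = first_row_growth([[1, 3, 5], [2, 6, 9], [3, 6, 9]])
print(before, after)  # exact numbers vary; the second is larger
```

Note that the helper mutates its argument in the same way the original code does.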



The reason to put all the elements in one list is to make sorting easy. Let's see if we can sort without needing a flattened (1D) list. There are many sorts that don't require extra memory; they are called "in-place" sorting algorithms. For simplicity we will modify selection sort to work for our case.



Selection sort picks the smallest element in the list and puts it at the front. Then it finds the next smallest element and puts it second, and so forth. To implement this, we find the smallest element in the whole list and swap it with the first element. Then we find the smallest element of the list while skipping the first slot, swap it into the second slot, and so on.



def index_of_smallest(numbers, starting_index):
    # Assume numbers is not empty.
    smallest, index = numbers[starting_index], starting_index
    # Walk the indices directly: slicing numbers[starting_index:] would copy the tail.
    for i in range(starting_index + 1, len(numbers)):
        if numbers[i] < smallest:
            smallest, index = numbers[i], i

    return index


def selection_sort(numbers):
    size = len(numbers)
    for i in range(size):
        index = index_of_smallest(numbers, i)
        numbers[i], numbers[index] = numbers[index], numbers[i]

    # Don't return anything, we are modifying it in place.


Now we need this process to work on a matrix instead of a flat list. This is straightforward enough: we can loop over the matrix (left to right, top to bottom) and ignore cells we have already dealt with. In the code below, x is the row coordinate and y is the column coordinate.



def coordinates_of_smallest(matrix, starting_x, starting_y):
    smallest, smallest_x, smallest_y = matrix[starting_x][starting_y], starting_x, starting_y
    for x, row in enumerate(matrix):
        for y, cell in enumerate(row):
            if x < starting_x or (x == starting_x and y < starting_y):
                continue
            if cell < smallest:
                smallest, smallest_x, smallest_y = cell, x, y

    return smallest_x, smallest_y


def selection_sort(matrix):
    # Assume the matrix is not empty.
    n, m = len(matrix), len(matrix[0])
    for x in range(n):
        for y in range(m):
            smallest_x, smallest_y = coordinates_of_smallest(matrix, x, y)
            matrix[x][y], matrix[smallest_x][smallest_y] = matrix[smallest_x][smallest_y], matrix[x][y]

>>> matrix = [[1, 3, 5], [2, 6, 9], [3, 6, 9]]
>>> selection_sort(matrix)
>>> print(matrix) # [[1, 2, 3], [3, 5, 6], [6, 9, 9]]



Now getting the median is a piece of cake: it will be in the middle slot of the middle row! Since N * M is odd, both N and M must be odd. Therefore the median is at matrix[N // 2][M // 2].




There is a little room for improvement here. While we only use constant extra memory, our time complexity has gone up from O(nm log(nm)) to O((nm)**2). For a better time complexity, I would recommend an in-place quicksort, which brings us back to O(nm log(nm)) on average.
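As an illustration of a faster in-place sort over the matrix, here is a sketch that swaps in heapsort rather than the quicksort recommended above, because heapsort needs no recursion stack and has an O(nm log(nm)) worst case. The function name `heapsort_matrix` and its helpers are hypothetical; the matrix is treated as one flat sequence by mapping flat index k to cell (k // m, k % m).

```python
def heapsort_matrix(matrix):
    """Sort all cells of a rectangular matrix in row-major order, in place."""
    n, m = len(matrix), len(matrix[0])
    size = n * m

    def get(k):
        # Flat index k corresponds to row k // m, column k % m.
        return matrix[k // m][k % m]

    def swap(a, b):
        ax, ay = divmod(a, m)
        bx, by = divmod(b, m)
        matrix[ax][ay], matrix[bx][by] = matrix[bx][by], matrix[ax][ay]

    def sift_down(root, end):
        # Restore the max-heap property for the subtree at root within [0, end).
        while True:
            child = 2 * root + 1
            if child >= end:
                return
            if child + 1 < end and get(child) < get(child + 1):
                child += 1
            if get(root) >= get(child):
                return
            swap(root, child)
            root = child

    for start in range(size // 2 - 1, -1, -1):  # build the max-heap
        sift_down(start, size)
    for end in range(size - 1, 0, -1):  # move the current maximum to the end
        swap(0, end)
        sift_down(0, end)
```

After the call, the median sits at matrix[N // 2][M // 2], exactly as with the selection sort version.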



Another point is that we are doing too much work. Once we have worked our way up to row N // 2 and slot M // 2, we are actually done! We have put the median element in its place, and we can stop. This is a simple enough check to add, and it can cut the actual running time of the code roughly in half.
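The early stop can be sketched as follows. This is one possible way to write it, not code from the answer: the name `find_median_in_place` is hypothetical, and the two helper functions are folded into a single loop over row-major positions.

```python
def find_median_in_place(matrix):
    """Partial selection sort: reorders matrix in place and stops at the median cell."""
    n, m = len(matrix), len(matrix[0])
    median_position = n * m // 2  # row-major index of the median cell
    for position in range(median_position + 1):
        x, y = divmod(position, m)
        best_x, best_y = x, y
        # Find the smallest cell at or after the current position.
        for later in range(position + 1, n * m):
            i, j = divmod(later, m)
            if matrix[i][j] < matrix[best_x][best_y]:
                best_x, best_y = i, j
        matrix[x][y], matrix[best_x][best_y] = matrix[best_x][best_y], matrix[x][y]
    return matrix[median_position // m][median_position % m]
```

Only a handful of integers are kept besides the matrix itself, but the rows are no longer sorted individually afterwards, so the caller must be happy to have the input rearranged.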






  • this makes no use of the fact that the numbers in the row are in order, and the in-place sorting is unnecessary
    – Maarten Fabré
    May 28 at 8:38










  • @MaartenFabré The in-place sorting is critical for it to use constant additional memory. That's the number one priority, not time complexity. Yes, it is not necessary to do a full sort, but I don't think the additional code complexity is worth it. Also, yes, finding the smallest element can be done quicker by taking into account that the rows are sorted, but again it is additional code complexity for something that isn't a priority.
    – spyr03
    May 29 at 13:00
































  • Follow PEP8

    • A is a bad variable name; use, say, matrix.

    • You should remove the space in front of the function argument.

    • You should put spaces after ,.

    • You don't need the () surrounding sorted.

    • You could add some space around your division.

  • You can use // rather than / to make your code compatible with both Python 2 and Python 3.

  • You don't need to use pop; normal indexing will work too.

def find_median(matrix):
    """Returns the median value from given matrix"""
    for i in range(1, len(matrix)):
        matrix[0].extend(matrix[i])
    return sorted(matrix[0])[len(matrix[0]) // 2]



Your code doesn't do what the challenge asks, because it uses extra space. If you add print(matrix) before the return, you'll see that the first row has grown to hold every element:



[[1, 3, 5, 2, 6, 9, 3, 6, 9], [2, 6, 9], [3, 6, 9]]





  • I think you could be a little clearer as to why the original code doesn't work; mentioning "you've used extra space" would be clearer.
    – spyr03
    May 25 at 14:39






























making one list



Name your variables correctly. When you look back at your code in a few months, you will need a few minutes to figure out what you did. You'll have to work out that A[0] is the list with all the values of the rows appended, and that len(A[0])/2 is the index of the median.



PS. This code will fail in Python 3, where / is true division and len(A[0])/2 is a float, which cannot be used as a list index. If you really need floor division, use //, which is clear in both Python 2 and 3.
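A minimal sketch of the difference, assuming Python 3; the two function names are made up for this illustration:

```python
def middle_with_true_division(values):
    # In Python 3, len(values) / 2 is a float (4.5 for nine values),
    # and indexing a list with a float raises TypeError.
    return sorted(values)[len(values) / 2]


def middle_with_floor_division(values):
    # // always yields an int here, so it is a valid index in Python 2 and 3.
    return sorted(values)[len(values) // 2]


values = [1, 3, 5, 2, 6, 9, 3, 6, 9]
print(middle_with_floor_division(values))
try:
    middle_with_true_division(values)
except TypeError as error:
    print("true division failed:", error)
```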



instead of



for i in range(1,len(A)):
    A[0].extend(A[i])


at least you can do



all_elements = []
for row in A:
    all_elements.extend(row)


or even better, use itertools.chain.from_iterable



from itertools import chain

def find_median(A):
    all_elements = chain.from_iterable(A)
    median_index = len(A) * len(A[0]) // 2
    return sorted(all_elements)[median_index]


alternative approach



In your solution, you'll have 3 copies of the whole matrix (+ whatever sorted uses internally):



  1. A[0] contains a copy of each element of the matrix because you append them all there.

  2. The rest of the rows of A also still exist, so they also contain an extra copy of each element (apart from the first row).


  3. sorted generates a third list.

By using chain you eliminate the first, so you are left with 2 copies.



The easiest way to do this without copying the whole matrix into a sorted list is to use a sorted queue of iterators over the different rows, ordered by each iterator's next value, and to pop and reinsert on this queue until you reach the median. I use bisect.insort_left to insert in order.



from bisect import insort_left
from collections import deque

def find_median(matrix):
    """
    finds the median in a matrix with sorted rows
    """
    median_index = len(matrix) * len(matrix[0]) // 2
    iterators = map(iter, matrix)
    # Each queue entry is (next value, row number, iterator); the queue stays sorted.
    iterators = deque(sorted((next(it), row, it) for (row, it) in enumerate(iterators)))
    idx = 0
    while idx <= median_index:
        value, row, iterator = iterators.popleft()
        try:
            item = next(iterator), row, iterator
            insort_left(iterators, item)
        except StopIteration:
            pass
        idx += 1
        # print(value, idx)
    return value


The deque consumes some extra memory, but only $O(N)$ instead of $O(NM)$.
This can also be done using a list of length N that holds the current index into each row, doing the iteration over the different rows yourself.



The row is added to the item as tiebreaker when there are multiple rows with the same value because iterators are not sortable.
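The index-list variant mentioned above might look like the following sketch. The name `find_median_with_cursors` is hypothetical. It keeps one "next unread column" counter per row and repeatedly takes the smallest head, which is the merge-like iteration also suggested in the question's comments.

```python
def find_median_with_cursors(matrix):
    """Median of a matrix with sorted rows, using one cursor per row (O(N) extra memory)."""
    n, m = len(matrix), len(matrix[0])
    median_index = n * m // 2
    cursors = [0] * n  # cursors[row] is the next unread column in that row
    value = None
    for _ in range(median_index + 1):
        best_row = None
        # Pick the row whose next unread element is smallest.
        for row in range(n):
            if cursors[row] < m and (
                best_row is None
                or matrix[row][cursors[row]] < matrix[best_row][cursors[best_row]]
            ):
                best_row = row
        value = matrix[best_row][cursors[best_row]]
        cursors[best_row] += 1
    return value
```

This version scans all N row heads for every element it consumes, so it trades the sorted queue for a simpler structure at the cost of O(N) work per step. The matrix itself is left untouched.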



standard library



I found out that heapq.merge does the same as what I do with the deque of iterators, so this works too:



from heapq import merge
from itertools import islice

def find_median_heapq(matrix):
    median_index = len(matrix) * len(matrix[0]) // 2
    all_items = merge(*matrix)
    return next(islice(all_items, median_index, None))





  • A[0] doesn't copy, what do you mean by it? "The original A, A[0] and"
    – Peilonrayz
    May 25 at 12:09











  • A[0] contains a copy of each element of the matrix because OP appends them all there. The rest of the rows A also still exist, so they also contain an extra copy of each element (apart from the first row), and sorted generates a third list
    – Maarten Fabré
    May 25 at 12:18










  • Please can you clarify that in your answer, as it currently reads as if indexing a list returns a copy.
    – Peilonrayz
    May 25 at 12:20










  • Shouldn't "1st" be used instead of "1nd"?
    – Mathias Ettinger
    May 25 at 13:30










Your Answer




StackExchange.ifUsing("editor", function ()
return StackExchange.using("mathjaxEditing", function ()
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
);
);
, "mathjax-editing");

StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "196"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
convertImagesToLinks: false,
noModals: false,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);



);








 

draft saved


draft discarded


















StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f195104%2fcalculate-median-of-a-matrix%23new-answer', 'question_page');

);

Post as a guest






























3 Answers
3






active

oldest

votes








3 Answers
3






active

oldest

votes









active

oldest

votes






active

oldest

votes








up vote
1
down vote



accepted










Sorting the contents of the matrix and then picking the index with the median value is a good approach. Lets see if we can do it with constant extra memory.




for i in range(1,len(A)):
A[0].extend(A[i])


This extends the first row of the matrix to contain every row in a flat list. Before the matrix was of size N * M, whereas now it is N * M (the first row + (N - 1) * M (all the other rows). Subtracting the original size from this tells us how much extra memory we are using. We use (N - 1) * M additional memory or in other words O(NM) extra memory. This is not what we want.



The reason to put all the elements in one list is to make sorting easy. Lets see if we can sort without needing a flatten (1d) list. There are many sorts that don't require extra memory, they are called "inplace" sorting algorithms. For simplicity we will modify selection sort to work for our case.



How selection sort works is it picks the smallest element in the list, and puts it at the front. Then it finds the next smallest element, and puts it second, and so forth. To implement this, we can find the smallest in the whole list, and swap it with the first element. Then we can find the smallest of the list skipping the first slot.



def index_of_smallest(numbers, starting_index):
# Assume numbers is not empty.
smallest, index = numbers[starting_index], starting_index
for i, number in enumerate(numbers[starting_index:], starting_index):
if number < smallest:
smallest, index = number, i

return index


def selection_sort(numbers):
size = len(numbers)
for i in range(size):
index = index_of_smallest(numbers, i)
numbers[i], numbers[index] = numbers[index], numbers[i]

# Don't return anything, we are modifying it inplace.


Now, we need this process to work on a matrix instead of a flat list. This is straightforward enough, we can loop over the matrix (left to right, top to bottom) and ignore cells we have already dealt with. In the below code x is the row coordinate, and y is the column coordinate.



def coordinates_of_smallest(matrix, starting_x, starting_y):
smallest, smallest_x, smallest_y = matrix[starting_x][starting_y], starting_x, starting_y
for x, row in enumerate(matrix):
for y, cell in enumerate(row):
if x < starting_x or (x == starting_x and y < starting_y):
continue
if cell < smallest:
smallest, smallest_x, smallest_y = cell, x, y

return smallest_x, smallest_y


def selection_sort(matrix):
# Assume the matrix is not empty.
n, m = len(matrix), len(matrix[0])
for x in range(n):
for y in range(m):
smallest_x, smallest_y = coordinates_of_smallest(matrix, x, y)
matrix[x][y], matrix[smallest_x][smallest_y] = matrix[smallest_x][smallest_y], matrix[x][y]

>>> matrix = [[1, 3, 5], [2, 6, 9], [3, 6, 9]]
>>> selection_sort(matrix)
>>> print(matrix) # [[1, 2, 3], [3, 5, 6], [6, 9, 9]]



Now getting the median of this is a piece of cake, it will be in the middle slot of the middle row! Since N * M is odd, both N and M must be odd. Therefore the median is at matrix[N // 2][M // 2].




There is a little room for improvement here. While we only use constant extra memory, our time complexity has gone up from O(nm lognm) to O((nm)**2). For a better time complexity, I would recommend using inplace quicksort which brings us back to O(nm lognm).



Another point is that we are doing too much work. Once we have worked our way up to the row N // 2 and the slot M // 2, we are actually done! We have put the median element in it's place, and we can stop. This is a simple enough check to add, but can cut the actual running time of the code in half.






share|improve this answer























  • this makes no use of the fact that the numbers in the row are in order, and the in-place sorting is unnecessary
    – Maarten Fabré
    May 28 at 8:38










  • @MaartenFabré The inplace sorting is critical for it to use constant additional memory. That's the number one priority, not time complexity. Yes it it not necessary to do a full sort, but I don't think the additional code complexity is worth it. Also yes finding the smallest element can be done quicker by taking into account the rows are sorted, but again it is additional code complexity for something that isn't a priority.
    – spyr03
    May 29 at 13:00















up vote
1
down vote



accepted










Sorting the contents of the matrix and then picking the index with the median value is a good approach. Lets see if we can do it with constant extra memory.




for i in range(1,len(A)):
A[0].extend(A[i])


This extends the first row of the matrix to contain every row in a flat list. Before the matrix was of size N * M, whereas now it is N * M (the first row + (N - 1) * M (all the other rows). Subtracting the original size from this tells us how much extra memory we are using. We use (N - 1) * M additional memory or in other words O(NM) extra memory. This is not what we want.



The reason to put all the elements in one list is to make sorting easy. Lets see if we can sort without needing a flatten (1d) list. There are many sorts that don't require extra memory, they are called "inplace" sorting algorithms. For simplicity we will modify selection sort to work for our case.



How selection sort works is it picks the smallest element in the list, and puts it at the front. Then it finds the next smallest element, and puts it second, and so forth. To implement this, we can find the smallest in the whole list, and swap it with the first element. Then we can find the smallest of the list skipping the first slot.



def index_of_smallest(numbers, starting_index):
# Assume numbers is not empty.
smallest, index = numbers[starting_index], starting_index
for i, number in enumerate(numbers[starting_index:], starting_index):
if number < smallest:
smallest, index = number, i

return index


def selection_sort(numbers):
size = len(numbers)
for i in range(size):
index = index_of_smallest(numbers, i)
numbers[i], numbers[index] = numbers[index], numbers[i]

# Don't return anything, we are modifying it inplace.


Now, we need this process to work on a matrix instead of a flat list. This is straightforward enough, we can loop over the matrix (left to right, top to bottom) and ignore cells we have already dealt with. In the below code x is the row coordinate, and y is the column coordinate.



def coordinates_of_smallest(matrix, starting_x, starting_y):
smallest, smallest_x, smallest_y = matrix[starting_x][starting_y], starting_x, starting_y
for x, row in enumerate(matrix):
for y, cell in enumerate(row):
if x < starting_x or (x == starting_x and y < starting_y):
continue
if cell < smallest:
smallest, smallest_x, smallest_y = cell, x, y

return smallest_x, smallest_y


def selection_sort(matrix):
# Assume the matrix is not empty.
n, m = len(matrix), len(matrix[0])
for x in range(n):
for y in range(m):
smallest_x, smallest_y = coordinates_of_smallest(matrix, x, y)
matrix[x][y], matrix[smallest_x][smallest_y] = matrix[smallest_x][smallest_y], matrix[x][y]

>>> matrix = [[1, 3, 5], [2, 6, 9], [3, 6, 9]]
>>> selection_sort(matrix)
>>> print(matrix) # [[1, 2, 3], [3, 5, 6], [6, 9, 9]]



Now getting the median of this is a piece of cake, it will be in the middle slot of the middle row! Since N * M is odd, both N and M must be odd. Therefore the median is at matrix[N // 2][M // 2].




There is a little room for improvement here. While we only use constant extra memory, our time complexity has gone up from O(nm lognm) to O((nm)**2). For a better time complexity, I would recommend using inplace quicksort which brings us back to O(nm lognm).



Another point is that we are doing too much work. Once we have worked our way up to the row N // 2 and the slot M // 2, we are actually done! We have put the median element in it's place, and we can stop. This is a simple enough check to add, but can cut the actual running time of the code in half.






share|improve this answer























  • this makes no use of the fact that the numbers in the row are in order, and the in-place sorting is unnecessary
    – Maarten Fabré
    May 28 at 8:38










  • @MaartenFabré The inplace sorting is critical for it to use constant additional memory. That's the number one priority, not time complexity. Yes it it not necessary to do a full sort, but I don't think the additional code complexity is worth it. Also yes finding the smallest element can be done quicker by taking into account the rows are sorted, but again it is additional code complexity for something that isn't a priority.
    – spyr03
    May 29 at 13:00
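For reference, here is one way to use the sorted rows while still keeping constant extra memory. It is not proposed by anyone in this thread: it is a binary search on the value of the median, counting with bisect how many elements are less than or equal to each guess. The sketch assumes the matrix holds integers, and the name find_median_by_counting is made up for illustration.

```python
from bisect import bisect_right


def find_median_by_counting(matrix):
    """Median of a matrix of integers whose rows are sorted, without modifying it."""
    n, m = len(matrix), len(matrix[0])
    target = n * m // 2 + 1  # 1-based rank of the median among all n * m elements
    low = min(row[0] for row in matrix)    # smallest element overall
    high = max(row[-1] for row in matrix)  # largest element overall
    while low < high:
        mid = (low + high) // 2
        # bisect_right gives the number of elements <= mid in a sorted row.
        count = sum(bisect_right(row, mid) for row in matrix)
        if count >= target:
            high = mid      # the median is mid or smaller
        else:
            low = mid + 1   # too few elements <= mid, so the median is larger
    return low


print(find_median_by_counting([[1, 3, 5], [2, 6, 9], [3, 6, 9]]))  # 5
```

Each guess costs O(N log M), and the number of guesses grows with the logarithm of the value range, so this avoids both the sorting and the mutation of the input.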













answered May 25 at 14:36, edited May 25 at 14:53
– spyr03

  • Follow PEP8:

    • A is a bad variable name; use something like matrix.

    • Remove the space in front of the function argument.

    • Put a space after each ,.

    • You don't need the () surrounding sorted.

    • You could add some space around your division.

  • You can use // rather than / to make your code compatible with both Python 2 and Python 3.

  • You don't need to use pop; normal indexing works too.

def find_median(matrix):
    """Returns the median value from given matrix"""
    for i in range(1, len(matrix)):
        matrix[0].extend(matrix[i])
    return sorted(matrix[0])[len(matrix[0]) // 2]

Your code doesn't do what the challenge asks, because it uses extra memory by extending the first row. If you add print(matrix) before the return, you'll see:

[[1, 3, 5, 2, 6, 9, 3, 6, 9], [2, 6, 9], [3, 6, 9]]
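To see the extra memory directly, the size of the first row can be measured before and after the call, in the spirit of the __sizeof__ suggestion in the comments on the question. The sketch below uses sys.getsizeof; the exact byte counts depend on the Python build, so only the comparison is printed.

```python
import sys


def find_median(matrix):
    """The approach from the question, kept only to observe its side effects."""
    for i in range(1, len(matrix)):
        matrix[0].extend(matrix[i])
    return sorted(matrix[0])[len(matrix[0]) // 2]


matrix = [[1, 3, 5], [2, 6, 9], [3, 6, 9]]
size_before = sys.getsizeof(matrix[0])  # size of the first row's list object, in bytes
median = find_median(matrix)
size_after = sys.getsizeof(matrix[0])

print(median)                    # 5
print(len(matrix[0]))            # 9: the first row now holds every element
print(size_after > size_before)  # True: the list had to grow
```

The call to sorted then builds another list of N * M elements on top of that.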





  • I think you could be a little clearer as to why the original code doesn't work; mentioning "you've used extra space" would be clearer.
    – spyr03
    May 25 at 14:39














answered May 25 at 10:47
– Peilonrayz

making one list



Name your variables correctly. When you look back at your code in a few months, it will take you a few minutes to figure out what you did. You'll have to work out that A[0] is the list with the values of all the rows appended, and that len(A[0])/2 is the index of the median.

PS. This code will fail in Python 3, where / returns a float that cannot be used as a list index. If you need floor division, use //, which behaves the same in both Python 2 and 3.



Instead of

for i in range(1, len(A)):
    A[0].extend(A[i])

at least you can do

all_elements = []
for row in A:
    all_elements.extend(row)

or even better, use itertools.chain.from_iterable:

from itertools import chain

def find_median(A):
    all_elements = chain.from_iterable(A)
    median_index = len(A) * len(A[0]) // 2
    return sorted(all_elements)[median_index]


alternative approach



In your solution, you'll have 3 copies of the whole matrix (plus whatever sorted uses internally):

  1. A[0] contains a copy of each element of the matrix, because the code appends them all there.

  2. The other rows of A still exist, so they hold an extra copy of each element (apart from the first row).

  3. sorted generates a third list.

By using chain you eliminate the first, so you are left with 2 copies.

The easiest way to do this without copying the whole matrix into a sorted list is to use a sorted queue of iterators over the different rows, ordered by their next value, and to pop and reinsert on this queue until you reach the median. I use bisect.insort_left to keep the queue in order.



from bisect import insort_left
from collections import deque

def find_median(matrix):
    """
    finds the median in a matrix with sorted rows
    """
    median_index = len(matrix) * len(matrix[0]) // 2
    iterators = map(iter, matrix)
    # Each entry is (next value, row number, iterator over the rest of that row).
    iterators = deque(sorted((next(it), row, it) for (row, it) in enumerate(iterators)))
    idx = 0
    while idx <= median_index:
        value, row, iterator = iterators.popleft()
        try:
            item = next(iterator), row, iterator
            # insort_left calls deque.insert, which needs Python 3.5 or later.
            insort_left(iterators, item)
        except StopIteration:
            pass
        idx += 1
    return value


The deque consumes some extra memory, but only $O(N)$ instead of $O(NM)$. This can also be done using a list of length N that holds the current index into each row, doing the iteration over the different rows yourself.

The row is added to the item as a tiebreaker for when multiple rows have the same next value, because iterators are not orderable.
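The variant with a list of per-row indices might look something like this. It is a sketch only: the name find_median_with_counters is made up for illustration, and it picks the next row with a plain linear scan rather than a sorted queue, which keeps the code short at the cost of O(N) work per step.

```python
def find_median_with_counters(matrix):
    """Median of a matrix with sorted rows, using one position counter per row."""
    n, m = len(matrix), len(matrix[0])
    median_index = n * m // 2
    positions = [0] * n  # positions[r] is the index of the next unused element in row r
    value = None
    for _ in range(median_index + 1):
        best_row = None
        for row in range(n):
            if positions[row] == m:
                continue  # this row is exhausted
            if best_row is None or matrix[row][positions[row]] < matrix[best_row][positions[best_row]]:
                best_row = row
        # Consume the smallest of the rows' next elements, as in a merge step.
        value = matrix[best_row][positions[best_row]]
        positions[best_row] += 1
    return value


print(find_median_with_counters([[1, 3, 5], [2, 6, 9], [3, 6, 9]]))  # 5
```

The input matrix is left untouched, and the only allocation is the list of N counters.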



standard library



I found out that heapq.merge does the same as what I do with the deque of iterators, so this works too:



from heapq import merge
from itertools import islice

def find_median_heapq(matrix):
    median_index = len(matrix) * len(matrix[0]) // 2
    all_items = merge(*matrix)
    return next(islice(all_items, median_index, None))
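As a quick sanity check, the heapq.merge version can be run against the test cases from the question. The sketch repeats the definition so that it is self-contained; the test_cases list and the loop are only for illustration.

```python
from heapq import merge
from itertools import islice


def find_median_heapq(matrix):
    median_index = len(matrix) * len(matrix[0]) // 2
    # merge is lazy: it yields the elements of all rows in sorted order, one at a time.
    all_items = merge(*matrix)
    # Skip the first median_index items and take the next one.
    return next(islice(all_items, median_index, None))


test_cases = [
    ([[1, 3, 5], [2, 5, 9], [3, 6, 11]], 5),
    ([[0, 1, 1], [2, 6, 10], [3, 5, 9]], 3),
    ([[1, 3, 4, 12, 14], [1, 6, 9, 10, 15], [0, 1, 3, 3, 4]], 4),
]
for matrix, expected in test_cases:
    assert find_median_heapq(matrix) == expected
print("all test cases pass")
```

Because merge only looks at one pending element per row at a time, the extra memory stays proportional to the number of rows rather than to the whole matrix.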





  • A[0] doesn't copy, what do you mean by it? "The original A, A[0] and"
    – Peilonrayz
    May 25 at 12:09











  • A[0] contains a copy of each element of the matrix because OP appends them all there. The rest of the rows A also still exist, so they also contain an extra copy of each element (apart from the first row), and sorted generates a third list
    – Maarten Fabré
    May 25 at 12:18










  • Please can you clarify that in your answer, as it currently reads as if indexing a list returns a copy.
    – Peilonrayz
    May 25 at 12:20










  • Shouldn't "1st" be used instead of "1nd"?
    – Mathias Ettinger
    May 25 at 13:30














answered May 25 at 11:55, edited May 28 at 6:41
– Maarten Fabré
