Heap Sort Implementation in Python

The name of the pictureThe name of the pictureThe name of the pictureClash Royale CLAN TAG#URR8PPP





.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty margin-bottom:0;







up vote
2
down vote

favorite













Objective: Create a heap sort that returns a unsorted list to sorted from greatest to least.




Does this implementation look correct? Do you have any tips for optimizing this?



'''HeapSort. Best: n log(n) Average: n log(n) Worst: n log(n) '''

#Calls heapify and returns sorted list from greatest to least.
def heapSort(list):
result =
while list:
result.append(heapify(list))
return result

def heapify(struct):
for x in range(len(struct),0,-1):
#Base Case: Returns last integer when sequence reaches below 2
if len(struct) < 2:
return struct.pop(0)

#Compares Tree Root Node vs Children.
if struct[0] < struct[1]:
temp = struct[0]
struct[0] = struct[1]
struct[1] = temp
else:
#Parent: x/2. Child: x-1. Compares child nodes vs parent nodes in Tree.
if struct[x-1] > struct[x/2]:
temp = struct[x-1]
struct[x-1] = struct[x/2]
struct[x/2] = temp
return struct.pop(0)






share|improve this question

















  • 1




    Does the code function correctly? If not, it isn't ready for review (see help center) and the question may be deleted. If you've tested it, I recommend that you edit to add a summary of the testing (ideally as reproducible unit-test code).
    – Toby Speight
    May 10 at 10:19










  • Code does work properly. Bug is for the what-if scenario of inserting a sorted list inside such as 1, 2, 3. Question asks specifically for only a unsorted list, but the tip can be used for optimizing.
    – Johnathan
    May 10 at 23:46
















up vote
2
down vote

favorite













Objective: Create a heap sort that returns a unsorted list to sorted from greatest to least.




Does this implementation look correct? Do you have any tips for optimizing this?



'''HeapSort. Best: n log(n) Average: n log(n) Worst: n log(n) '''

#Calls heapify and returns sorted list from greatest to least.
def heapSort(list):
result =
while list:
result.append(heapify(list))
return result

def heapify(struct):
for x in range(len(struct),0,-1):
#Base Case: Returns last integer when sequence reaches below 2
if len(struct) < 2:
return struct.pop(0)

#Compares Tree Root Node vs Children.
if struct[0] < struct[1]:
temp = struct[0]
struct[0] = struct[1]
struct[1] = temp
else:
#Parent: x/2. Child: x-1. Compares child nodes vs parent nodes in Tree.
if struct[x-1] > struct[x/2]:
temp = struct[x-1]
struct[x-1] = struct[x/2]
struct[x/2] = temp
return struct.pop(0)






share|improve this question

















  • 1




    Does the code function correctly? If not, it isn't ready for review (see help center) and the question may be deleted. If you've tested it, I recommend that you edit to add a summary of the testing (ideally as reproducible unit-test code).
    – Toby Speight
    May 10 at 10:19










  • Code does work properly. Bug is for the what-if scenario of inserting a sorted list inside such as 1, 2, 3. Question asks specifically for only a unsorted list, but the tip can be used for optimizing.
    – Johnathan
    May 10 at 23:46












up vote
2
down vote

favorite









up vote
2
down vote

favorite












Objective: Create a heap sort that returns a unsorted list to sorted from greatest to least.




Does this implementation look correct? Do you have any tips for optimizing this?



'''HeapSort. Best: n log(n) Average: n log(n) Worst: n log(n) '''

#Calls heapify and returns sorted list from greatest to least.
def heapSort(list):
result =
while list:
result.append(heapify(list))
return result

def heapify(struct):
for x in range(len(struct),0,-1):
#Base Case: Returns last integer when sequence reaches below 2
if len(struct) < 2:
return struct.pop(0)

#Compares Tree Root Node vs Children.
if struct[0] < struct[1]:
temp = struct[0]
struct[0] = struct[1]
struct[1] = temp
else:
#Parent: x/2. Child: x-1. Compares child nodes vs parent nodes in Tree.
if struct[x-1] > struct[x/2]:
temp = struct[x-1]
struct[x-1] = struct[x/2]
struct[x/2] = temp
return struct.pop(0)






share|improve this question














Objective: Create a heap sort that returns a unsorted list to sorted from greatest to least.




Does this implementation look correct? Do you have any tips for optimizing this?



'''HeapSort. Best: n log(n) Average: n log(n) Worst: n log(n) '''

#Calls heapify and returns sorted list from greatest to least.
def heapSort(list):
result =
while list:
result.append(heapify(list))
return result

def heapify(struct):
for x in range(len(struct),0,-1):
#Base Case: Returns last integer when sequence reaches below 2
if len(struct) < 2:
return struct.pop(0)

#Compares Tree Root Node vs Children.
if struct[0] < struct[1]:
temp = struct[0]
struct[0] = struct[1]
struct[1] = temp
else:
#Parent: x/2. Child: x-1. Compares child nodes vs parent nodes in Tree.
if struct[x-1] > struct[x/2]:
temp = struct[x-1]
struct[x-1] = struct[x/2]
struct[x/2] = temp
return struct.pop(0)








share|improve this question












share|improve this question




share|improve this question








edited May 10 at 7:30









Gareth Rees

41.1k394166




41.1k394166









asked May 9 at 23:01









Johnathan

162




162







  • 1




    Does the code function correctly? If not, it isn't ready for review (see help center) and the question may be deleted. If you've tested it, I recommend that you edit to add a summary of the testing (ideally as reproducible unit-test code).
    – Toby Speight
    May 10 at 10:19










  • Code does work properly. Bug is for the what-if scenario of inserting a sorted list inside such as 1, 2, 3. Question asks specifically for only a unsorted list, but the tip can be used for optimizing.
    – Johnathan
    May 10 at 23:46












  • 1




    Does the code function correctly? If not, it isn't ready for review (see help center) and the question may be deleted. If you've tested it, I recommend that you edit to add a summary of the testing (ideally as reproducible unit-test code).
    – Toby Speight
    May 10 at 10:19










  • Code does work properly. Bug is for the what-if scenario of inserting a sorted list inside such as 1, 2, 3. Question asks specifically for only a unsorted list, but the tip can be used for optimizing.
    – Johnathan
    May 10 at 23:46







1




1




Does the code function correctly? If not, it isn't ready for review (see help center) and the question may be deleted. If you've tested it, I recommend that you edit to add a summary of the testing (ideally as reproducible unit-test code).
– Toby Speight
May 10 at 10:19




Does the code function correctly? If not, it isn't ready for review (see help center) and the question may be deleted. If you've tested it, I recommend that you edit to add a summary of the testing (ideally as reproducible unit-test code).
– Toby Speight
May 10 at 10:19












Code does work properly. Bug is for the what-if scenario of inserting a sorted list inside such as 1, 2, 3. Question asks specifically for only a unsorted list, but the tip can be used for optimizing.
– Johnathan
May 10 at 23:46




Code does work properly. Bug is for the what-if scenario of inserting a sorted list inside such as 1, 2, 3. Question asks specifically for only a unsorted list, but the tip can be used for optimizing.
– Johnathan
May 10 at 23:46










2 Answers
2






active

oldest

votes

















up vote
5
down vote



accepted










See §1.9 for the cause of the bug identified by janos.



1. Review



  1. The code only runs on Python 2.7, because it uses the / operator for floor division. But support for Python 2.7 is scheduled to end in 2020. It is long past the time when it was a good idea to start writing new projects in Python 2.7. Even if you really can't use Python 3, it is still a good idea to use the floor division operator // to make the code portable.


  2. There are no docstrings. What do these functions do? In the case of heapSort there is a comment that could be turned into a docstring.



  3. heapSort modifies its input. If I run:



    >>> a = [6, 7, 2, 4, 1, 0, 3, 9, 8, 5]
    >>> heapSort(a)
    [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]


    Then a is now empty:



    >>> a



    This is likely to be surprising to someone using this code. It would be better to either leave the input unchanged (taking a copy if necessary), like the built-in function sorted, or to modify the input in-place, like the method list.sort.



  4. The name struct could be improved: for experienced Python programmers this name suggests a fixed-size object of the kind that is manipulated by the struct module. A name like list or seq (for "sequence") would be clearer.


  5. The base case does not depend on x and so it should be outside the loop.



  6. Two values can be swapped without needing a temporary variable, if you use a tuple assignment:



    struct[x-1], struct[x//2] = struct[x//2], struct[x-1]



  7. The numbers x-1 and x//2 are computed three times. The repetition of x-1 can be avoided by changing the loop from:



    range(len(seq),0,-1)


    to:



    range(len(seq) - 1, -1, -1)


    which can be simplified to:



    reversed(range(len(seq)))


    and the repetition of x//2 can be avoided by assigning a local variable:



    parent = (x + 1) // 2


    (It need to be (x + 1) // 2 rather than x // 2 because of the change in the loop range.)



  8. It's conventional use names i, j and so on for index variables (leaving x, y and so on for coordinates).



  9. Normally the way heaps are implemented is that element $i$ is the parent of elements $2i + 1$ and $2i + 2$. However, in the code in the post, element $i$ is the parent of elements $2i$ and $2i + 1$. But this requires element 0 to be its own parent, which makes no sense! Presumably this is why you felt the need to add the special case:



    if struct[0] < struct[1]:


    But this special case led to the bug.



    If you changed the implementation so that it used the usual convention for the layout of the heap, then this special case would be unnecessary. Moreover, if you did this then the special case:



    if len(struct) < 2:


    would also be unnecessary.



  10. heapify currently does two things: it converts the input sequence into a max-heap, in-place, and it pops the first (maximum) item. It would be clearer, I think, to separate these two features, so that heapify only does the heap-conversion, leaving the popping to the caller.


2. Revised code



def heap_sorted(iterable):
"Return new list of all items from iterable in descending order."
result =
seq = list(iterable)
while seq:
heapify(seq)
result.append(seq.pop(0))
return result

def heapify(seq):
"Transform sequence into a max-heap, in-place."
for i in reversed(range(1, len(seq))):
parent = (i - 1) // 2
if seq[i] > seq[parent]:
seq[i], seq[parent] = seq[parent], seq[i]


3. Performance



The docstring says:




HeapSort. Worst: $n log n$




But it is easy to see that this is not the case. That's because heapify is called $n$ times, and on the $i$th call heapify loops over all the remaining $n-i$ elements of the sequence, and calls struct.pop(0) which also takes time proportional to the length of struct. So the overall runtime is proportional to $$ sum_0le i< n n - i = n(n + 1)over2 $$ which is $Θ(n^2)$.



To get the runtime down to $O(n log n)$, you need to heapify just once, and then instead of extracting the maximum item by popping the list, swap the maximum item with the last item in the list, and then apply the "sift down" procedure to repair the heap condition in time $O(log n)$.






share|improve this answer























  • Incredible answer, thanks! Would your implementation work in n log n time under a worst case scenario?
    – Johnathan
    May 10 at 23:48











  • @Johnathan: I'm not sure what you mean by "my implementation". The revised code in §2 has quadratic runtime, as explained in §3.
    – Gareth Rees
    May 11 at 8:08

















up vote
4
down vote













The implementation has a bug



For example it will not sort correctly 1, 2, 3.



Isn't it suspicious that in the main loop for decreasing indexes, first it compares struct[0] < struct[1], and only when that's false it compares struct[x-1] > struct[x/2]?



Yes it is. For a list of 1000 elements, preforming this comparison 999 times makes no sense. If it's true for the first time, it will swap the first two values and then it will be always false in the remaining iterations.



When something is suspicious like that, it's good to take a closer look. On closer look, notice that if that condition is true for the first time, then the last two elements of the list don't get compared. That's why the example I gave above doesn't get sorted correctly.



Extract constant conditions from loops



The check on length will have the same result for all iterations of the loop. It should be done before the loop.



Style



There are some style issues I would improve.



  • I find it a bit surprising that a method named heapify removes an element from the list. I would rename that to heapify_and_pop to avoid confusion.


  • The naming and spacing doesn't follow PEP8 conventions.


  • The code works only with Python 2. It would be trivial to make it work with Python 3 too. That would make it more usable.


  • The name struct is not great for a list.






share|improve this answer























  • Interesting. This bug pops up when the list is already sorted. It works on all unsorted combinations. Any sorted combination besides [1, 2, 3] like [1, 2, 3, 4] would also create the same bug I believe on the last two indexes. A simple fix is to check within the heap method if it the list is already sorted, but I understand your point. Thank you for pointing this out.
    – Johnathan
    May 10 at 23:36











  • @Johnathan the bug affects any list where A[0]< A[1] and A[-1] = max(A), not only sorted lists. And the "simple fix" you describe is an unacceptable dirty hack.
    – janos
    May 11 at 5:01










Your Answer




StackExchange.ifUsing("editor", function ()
return StackExchange.using("mathjaxEditing", function ()
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
);
);
, "mathjax-editing");

StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "196"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
convertImagesToLinks: false,
noModals: false,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);



);








 

draft saved


draft discarded


















StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f194062%2fheap-sort-implementation-in-python%23new-answer', 'question_page');

);

Post as a guest






























2 Answers
2






active

oldest

votes








2 Answers
2






active

oldest

votes









active

oldest

votes






active

oldest

votes








up vote
5
down vote



accepted










See §1.9 for the cause of the bug identified by janos.



1. Review



  1. The code only runs on Python 2.7, because it uses the / operator for floor division. But support for Python 2.7 is scheduled to end in 2020. It is long past the time when it was a good idea to start writing new projects in Python 2.7. Even if you really can't use Python 3, it is still a good idea to use the floor division operator // to make the code portable.


  2. There are no docstrings. What do these functions do? In the case of heapSort there is a comment that could be turned into a docstring.



  3. heapSort modifies its input. If I run:



    >>> a = [6, 7, 2, 4, 1, 0, 3, 9, 8, 5]
    >>> heapSort(a)
    [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]


    Then a is now empty:



    >>> a



    This is likely to be surprising to someone using this code. It would be better to either leave the input unchanged (taking a copy if necessary), like the built-in function sorted, or to modify the input in-place, like the method list.sort.



  4. The name struct could be improved: for experienced Python programmers this name suggests a fixed-size object of the kind that is manipulated by the struct module. A name like list or seq (for "sequence") would be clearer.


  5. The base case does not depend on x and so it should be outside the loop.



  6. Two values can be swapped without needing a temporary variable, if you use a tuple assignment:



    struct[x-1], struct[x//2] = struct[x//2], struct[x-1]



  7. The numbers x-1 and x//2 are computed three times. The repetition of x-1 can be avoided by changing the loop from:



    range(len(seq),0,-1)


    to:



    range(len(seq) - 1, -1, -1)


    which can be simplified to:



    reversed(range(len(seq)))


    and the repetition of x//2 can be avoided by assigning a local variable:



    parent = (x + 1) // 2


    (It need to be (x + 1) // 2 rather than x // 2 because of the change in the loop range.)



  8. It's conventional use names i, j and so on for index variables (leaving x, y and so on for coordinates).



  9. Normally the way heaps are implemented is that element $i$ is the parent of elements $2i + 1$ and $2i + 2$. However, in the code in the post, element $i$ is the parent of elements $2i$ and $2i + 1$. But this requires element 0 to be its own parent, which makes no sense! Presumably this is why you felt the need to add the special case:



    if struct[0] < struct[1]:


    But this special case led to the bug.



    If you changed the implementation so that it used the usual convention for the layout of the heap, then this special case would be unnecessary. Moreover, if you did this then the special case:



    if len(struct) < 2:


    would also be unnecessary.



  10. heapify currently does two things: it converts the input sequence into a max-heap, in-place, and it pops the first (maximum) item. It would be clearer, I think, to separate these two features, so that heapify only does the heap-conversion, leaving the popping to the caller.


2. Revised code



def heap_sorted(iterable):
"Return new list of all items from iterable in descending order."
result =
seq = list(iterable)
while seq:
heapify(seq)
result.append(seq.pop(0))
return result

def heapify(seq):
"Transform sequence into a max-heap, in-place."
for i in reversed(range(1, len(seq))):
parent = (i - 1) // 2
if seq[i] > seq[parent]:
seq[i], seq[parent] = seq[parent], seq[i]


3. Performance



The docstring says:




HeapSort. Worst: $n log n$




But it is easy to see that this is not the case. That's because heapify is called $n$ times, and on the $i$th call heapify loops over all the remaining $n-i$ elements of the sequence, and calls struct.pop(0) which also takes time proportional to the length of struct. So the overall runtime is proportional to $$ sum_0le i< n n - i = n(n + 1)over2 $$ which is $Θ(n^2)$.



To get the runtime down to $O(n log n)$, you need to heapify just once, and then instead of extracting the maximum item by popping the list, swap the maximum item with the last item in the list, and then apply the "sift down" procedure to repair the heap condition in time $O(log n)$.






share|improve this answer























  • Incredible answer, thanks! Would your implementation work in n log n time under a worst case scenario?
    – Johnathan
    May 10 at 23:48











  • @Johnathan: I'm not sure what you mean by "my implementation". The revised code in §2 has quadratic runtime, as explained in §3.
    – Gareth Rees
    May 11 at 8:08














up vote
5
down vote



accepted










See §1.9 for the cause of the bug identified by janos.



1. Review



  1. The code only runs on Python 2.7, because it uses the / operator for floor division. But support for Python 2.7 is scheduled to end in 2020. It is long past the time when it was a good idea to start writing new projects in Python 2.7. Even if you really can't use Python 3, it is still a good idea to use the floor division operator // to make the code portable.


  2. There are no docstrings. What do these functions do? In the case of heapSort there is a comment that could be turned into a docstring.



  3. heapSort modifies its input. If I run:



    >>> a = [6, 7, 2, 4, 1, 0, 3, 9, 8, 5]
    >>> heapSort(a)
    [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]


    Then a is now empty:



    >>> a



    This is likely to be surprising to someone using this code. It would be better to either leave the input unchanged (taking a copy if necessary), like the built-in function sorted, or to modify the input in-place, like the method list.sort.



  4. The name struct could be improved: for experienced Python programmers this name suggests a fixed-size object of the kind that is manipulated by the struct module. A name like list or seq (for "sequence") would be clearer.


  5. The base case does not depend on x and so it should be outside the loop.



  6. Two values can be swapped without needing a temporary variable, if you use a tuple assignment:



    struct[x-1], struct[x//2] = struct[x//2], struct[x-1]



  7. The numbers x-1 and x//2 are computed three times. The repetition of x-1 can be avoided by changing the loop from:



    range(len(seq),0,-1)


    to:



    range(len(seq) - 1, -1, -1)


    which can be simplified to:



    reversed(range(len(seq)))


    and the repetition of x//2 can be avoided by assigning a local variable:



    parent = (x + 1) // 2


    (It need to be (x + 1) // 2 rather than x // 2 because of the change in the loop range.)



  8. It's conventional use names i, j and so on for index variables (leaving x, y and so on for coordinates).



  9. Normally the way heaps are implemented is that element $i$ is the parent of elements $2i + 1$ and $2i + 2$. However, in the code in the post, element $i$ is the parent of elements $2i$ and $2i + 1$. But this requires element 0 to be its own parent, which makes no sense! Presumably this is why you felt the need to add the special case:



    if struct[0] < struct[1]:


    But this special case led to the bug.



    If you changed the implementation so that it used the usual convention for the layout of the heap, then this special case would be unnecessary. Moreover, if you did this then the special case:



    if len(struct) < 2:


    would also be unnecessary.



  10. heapify currently does two things: it converts the input sequence into a max-heap, in-place, and it pops the first (maximum) item. It would be clearer, I think, to separate these two features, so that heapify only does the heap-conversion, leaving the popping to the caller.


2. Revised code



def heap_sorted(iterable):
"Return new list of all items from iterable in descending order."
result =
seq = list(iterable)
while seq:
heapify(seq)
result.append(seq.pop(0))
return result

def heapify(seq):
"Transform sequence into a max-heap, in-place."
for i in reversed(range(1, len(seq))):
parent = (i - 1) // 2
if seq[i] > seq[parent]:
seq[i], seq[parent] = seq[parent], seq[i]


3. Performance



The docstring says:




HeapSort. Worst: $n log n$




But it is easy to see that this is not the case. That's because heapify is called $n$ times, and on the $i$th call heapify loops over all the remaining $n-i$ elements of the sequence, and calls struct.pop(0) which also takes time proportional to the length of struct. So the overall runtime is proportional to $$ sum_0le i< n n - i = n(n + 1)over2 $$ which is $Θ(n^2)$.



To get the runtime down to $O(n log n)$, you need to heapify just once, and then instead of extracting the maximum item by popping the list, swap the maximum item with the last item in the list, and then apply the "sift down" procedure to repair the heap condition in time $O(log n)$.






share|improve this answer























  • Incredible answer, thanks! Would your implementation work in n log n time under a worst case scenario?
    – Johnathan
    May 10 at 23:48











  • @Johnathan: I'm not sure what you mean by "my implementation". The revised code in §2 has quadratic runtime, as explained in §3.
    – Gareth Rees
    May 11 at 8:08












up vote
5
down vote



accepted







up vote
5
down vote



accepted






See §1.9 for the cause of the bug identified by janos.



1. Review



  1. The code only runs on Python 2.7, because it uses the / operator for floor division. But support for Python 2.7 is scheduled to end in 2020. It is long past the time when it was a good idea to start writing new projects in Python 2.7. Even if you really can't use Python 3, it is still a good idea to use the floor division operator // to make the code portable.


  2. There are no docstrings. What do these functions do? In the case of heapSort there is a comment that could be turned into a docstring.



  3. heapSort modifies its input. If I run:



    >>> a = [6, 7, 2, 4, 1, 0, 3, 9, 8, 5]
    >>> heapSort(a)
    [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]


    Then a is now empty:



    >>> a



    This is likely to be surprising to someone using this code. It would be better to either leave the input unchanged (taking a copy if necessary), like the built-in function sorted, or to modify the input in-place, like the method list.sort.



  4. The name struct could be improved: for experienced Python programmers this name suggests a fixed-size object of the kind that is manipulated by the struct module. A name like list or seq (for "sequence") would be clearer.


  5. The base case does not depend on x and so it should be outside the loop.



  6. Two values can be swapped without needing a temporary variable, if you use a tuple assignment:



    struct[x-1], struct[x//2] = struct[x//2], struct[x-1]



  7. The numbers x-1 and x//2 are computed three times. The repetition of x-1 can be avoided by changing the loop from:



    range(len(seq),0,-1)


    to:



    range(len(seq) - 1, -1, -1)


    which can be simplified to:



    reversed(range(len(seq)))


    and the repetition of x//2 can be avoided by assigning a local variable:



    parent = (x + 1) // 2


    (It need to be (x + 1) // 2 rather than x // 2 because of the change in the loop range.)



  8. It's conventional use names i, j and so on for index variables (leaving x, y and so on for coordinates).



  9. Normally the way heaps are implemented is that element $i$ is the parent of elements $2i + 1$ and $2i + 2$. However, in the code in the post, element $i$ is the parent of elements $2i$ and $2i + 1$. But this requires element 0 to be its own parent, which makes no sense! Presumably this is why you felt the need to add the special case:



    if struct[0] < struct[1]:


    But this special case led to the bug.



    If you changed the implementation so that it used the usual convention for the layout of the heap, then this special case would be unnecessary. Moreover, if you did this then the special case:



    if len(struct) < 2:


    would also be unnecessary.



  10. heapify currently does two things: it converts the input sequence into a max-heap, in-place, and it pops the first (maximum) item. It would be clearer, I think, to separate these two features, so that heapify only does the heap-conversion, leaving the popping to the caller.


2. Revised code



def heap_sorted(iterable):
"Return new list of all items from iterable in descending order."
result =
seq = list(iterable)
while seq:
heapify(seq)
result.append(seq.pop(0))
return result

def heapify(seq):
"Transform sequence into a max-heap, in-place."
for i in reversed(range(1, len(seq))):
parent = (i - 1) // 2
if seq[i] > seq[parent]:
seq[i], seq[parent] = seq[parent], seq[i]


3. Performance



The docstring says:




HeapSort. Worst: $n log n$




But it is easy to see that this is not the case. That's because heapify is called $n$ times, and on the $i$th call heapify loops over all the remaining $n-i$ elements of the sequence, and calls struct.pop(0) which also takes time proportional to the length of struct. So the overall runtime is proportional to $$ sum_0le i< n n - i = n(n + 1)over2 $$ which is $Θ(n^2)$.



To get the runtime down to $O(n log n)$, you need to heapify just once, and then instead of extracting the maximum item by popping the list, swap the maximum item with the last item in the list, and then apply the "sift down" procedure to repair the heap condition in time $O(log n)$.






share|improve this answer















See §1.9 for the cause of the bug identified by janos.



1. Review



  1. The code only runs on Python 2.7, because it uses the / operator for floor division. But support for Python 2.7 is scheduled to end in 2020. It is long past the time when it was a good idea to start writing new projects in Python 2.7. Even if you really can't use Python 3, it is still a good idea to use the floor division operator // to make the code portable.


  2. There are no docstrings. What do these functions do? In the case of heapSort there is a comment that could be turned into a docstring.



  3. heapSort modifies its input. If I run:



    >>> a = [6, 7, 2, 4, 1, 0, 3, 9, 8, 5]
    >>> heapSort(a)
    [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]


    Then a is now empty:



    >>> a



    This is likely to be surprising to someone using this code. It would be better to either leave the input unchanged (taking a copy if necessary), like the built-in function sorted, or to modify the input in-place, like the method list.sort.



  4. The name struct could be improved: for experienced Python programmers this name suggests a fixed-size object of the kind that is manipulated by the struct module. A name like list or seq (for "sequence") would be clearer.


  5. The base case does not depend on x and so it should be outside the loop.



  6. Two values can be swapped without needing a temporary variable, if you use a tuple assignment:



    struct[x-1], struct[x//2] = struct[x//2], struct[x-1]



  7. The numbers x-1 and x//2 are computed three times. The repetition of x-1 can be avoided by changing the loop from:



    range(len(seq),0,-1)


    to:



    range(len(seq) - 1, -1, -1)


    which can be simplified to:



    reversed(range(len(seq)))


    and the repetition of x//2 can be avoided by assigning a local variable:



    parent = (x + 1) // 2


    (It need to be (x + 1) // 2 rather than x // 2 because of the change in the loop range.)



  8. It's conventional use names i, j and so on for index variables (leaving x, y and so on for coordinates).



  9. Normally the way heaps are implemented is that element $i$ is the parent of elements $2i + 1$ and $2i + 2$. However, in the code in the post, element $i$ is the parent of elements $2i$ and $2i + 1$. But this requires element 0 to be its own parent, which makes no sense! Presumably this is why you felt the need to add the special case:



    if struct[0] < struct[1]:


    But this special case led to the bug.



    If you changed the implementation so that it used the usual convention for the layout of the heap, then this special case would be unnecessary. Moreover, if you did this then the special case:



    if len(struct) < 2:


    would also be unnecessary.



  10. heapify currently does two things: it converts the input sequence into a max-heap, in-place, and it pops the first (maximum) item. It would be clearer, I think, to separate these two features, so that heapify only does the heap-conversion, leaving the popping to the caller.


2. Revised code



def heap_sorted(iterable):
"Return new list of all items from iterable in descending order."
result =
seq = list(iterable)
while seq:
heapify(seq)
result.append(seq.pop(0))
return result

def heapify(seq):
"Transform sequence into a max-heap, in-place."
for i in reversed(range(1, len(seq))):
parent = (i - 1) // 2
if seq[i] > seq[parent]:
seq[i], seq[parent] = seq[parent], seq[i]


3. Performance



The docstring says:




HeapSort. Worst: $n log n$




But it is easy to see that this is not the case. That's because heapify is called $n$ times, and on the $i$th call heapify loops over all the remaining $n-i$ elements of the sequence, and calls struct.pop(0) which also takes time proportional to the length of struct. So the overall runtime is proportional to $$ sum_0le i< n n - i = n(n + 1)over2 $$ which is $Θ(n^2)$.



To get the runtime down to $O(n log n)$, you need to heapify just once, and then instead of extracting the maximum item by popping the list, swap the maximum item with the last item in the list, and then apply the "sift down" procedure to repair the heap condition in time $O(log n)$.







share|improve this answer















share|improve this answer



share|improve this answer








edited May 10 at 10:53


























answered May 10 at 10:14









Gareth Rees

41.1k394166




41.1k394166











  • Incredible answer, thanks! Would your implementation work in n log n time under a worst case scenario?
    – Johnathan
    May 10 at 23:48











  • @Johnathan: I'm not sure what you mean by "my implementation". The revised code in §2 has quadratic runtime, as explained in §3.
    – Gareth Rees
    May 11 at 8:08
















  • Incredible answer, thanks! Would your implementation work in n log n time under a worst case scenario?
    – Johnathan
    May 10 at 23:48











  • @Johnathan: I'm not sure what you mean by "my implementation". The revised code in §2 has quadratic runtime, as explained in §3.
    – Gareth Rees
    May 11 at 8:08















Incredible answer, thanks! Would your implementation work in n log n time under a worst case scenario?
– Johnathan
May 10 at 23:48





Incredible answer, thanks! Would your implementation work in n log n time under a worst case scenario?
– Johnathan
May 10 at 23:48













@Johnathan: I'm not sure what you mean by "my implementation". The revised code in §2 has quadratic runtime, as explained in §3.
– Gareth Rees
May 11 at 8:08




@Johnathan: I'm not sure what you mean by "my implementation". The revised code in §2 has quadratic runtime, as explained in §3.
– Gareth Rees
May 11 at 8:08












up vote
4
down vote













The implementation has a bug



For example it will not sort correctly 1, 2, 3.



Isn't it suspicious that in the main loop for decreasing indexes, first it compares struct[0] < struct[1], and only when that's false it compares struct[x-1] > struct[x/2]?



Yes it is. For a list of 1000 elements, preforming this comparison 999 times makes no sense. If it's true for the first time, it will swap the first two values and then it will be always false in the remaining iterations.



When something is suspicious like that, it's good to take a closer look. On closer look, notice that if that condition is true for the first time, then the last two elements of the list don't get compared. That's why the example I gave above doesn't get sorted correctly.



Extract constant conditions from loops



The check on length will have the same result for all iterations of the loop. It should be done before the loop.



Style



There are some style issues I would improve.



  • I find it a bit surprising that a method named heapify removes an element from the list. I would rename that to heapify_and_pop to avoid confusion.


  • The naming and spacing doesn't follow PEP8 conventions.


  • The code works only with Python 2. It would be trivial to make it work with Python 3 too. That would make it more usable.


  • The name struct is not great for a list.






share|improve this answer























  • Interesting. This bug pops up when the list is already sorted. It works on all unsorted combinations. Any sorted combination besides [1, 2, 3] like [1, 2, 3, 4] would also create the same bug I believe on the last two indexes. A simple fix is to check within the heap method if it the list is already sorted, but I understand your point. Thank you for pointing this out.
    – Johnathan
    May 10 at 23:36











  • @Johnathan the bug affects any list where A[0]< A[1] and A[-1] = max(A), not only sorted lists. And the "simple fix" you describe is an unacceptable dirty hack.
    – janos
    May 11 at 5:01














up vote
4
down vote













The implementation has a bug



For example it will not sort correctly 1, 2, 3.



Isn't it suspicious that in the main loop for decreasing indexes, first it compares struct[0] < struct[1], and only when that's false it compares struct[x-1] > struct[x/2]?



Yes it is. For a list of 1000 elements, preforming this comparison 999 times makes no sense. If it's true for the first time, it will swap the first two values and then it will be always false in the remaining iterations.



When something is suspicious like that, it's good to take a closer look. On closer look, notice that if that condition is true for the first time, then the last two elements of the list don't get compared. That's why the example I gave above doesn't get sorted correctly.



Extract constant conditions from loops



The check on length will have the same result for all iterations of the loop. It should be done before the loop.



Style



There are some style issues I would improve.



  • I find it a bit surprising that a method named heapify removes an element from the list. I would rename that to heapify_and_pop to avoid confusion.


  • The naming and spacing doesn't follow PEP8 conventions.


  • The code works only with Python 2. It would be trivial to make it work with Python 3 too. That would make it more usable.


  • The name struct is not great for a list.






share|improve this answer























  • Interesting. This bug pops up when the list is already sorted. It works on all unsorted combinations. Any sorted combination besides [1, 2, 3] like [1, 2, 3, 4] would also create the same bug I believe on the last two indexes. A simple fix is to check within the heap method if it the list is already sorted, but I understand your point. Thank you for pointing this out.
    – Johnathan
    May 10 at 23:36











  • @Johnathan the bug affects any list where A[0]< A[1] and A[-1] = max(A), not only sorted lists. And the "simple fix" you describe is an unacceptable dirty hack.
    – janos
    May 11 at 5:01












up vote
4
down vote










up vote
4
down vote









The implementation has a bug



For example it will not sort correctly 1, 2, 3.



Isn't it suspicious that in the main loop for decreasing indexes, first it compares struct[0] < struct[1], and only when that's false it compares struct[x-1] > struct[x/2]?



Yes it is. For a list of 1000 elements, preforming this comparison 999 times makes no sense. If it's true for the first time, it will swap the first two values and then it will be always false in the remaining iterations.



When something is suspicious like that, it's good to take a closer look. On closer look, notice that if that condition is true for the first time, then the last two elements of the list don't get compared. That's why the example I gave above doesn't get sorted correctly.



Extract constant conditions from loops



The check on length will have the same result for all iterations of the loop. It should be done before the loop.



Style



There are some style issues I would improve.



  • I find it a bit surprising that a method named heapify removes an element from the list. I would rename that to heapify_and_pop to avoid confusion.


  • The naming and spacing doesn't follow PEP8 conventions.


  • The code works only with Python 2. It would be trivial to make it work with Python 3 too. That would make it more usable.


  • The name struct is not great for a list.






share|improve this answer















The implementation has a bug



For example it will not sort correctly 1, 2, 3.



Isn't it suspicious that in the main loop for decreasing indexes, first it compares struct[0] < struct[1], and only when that's false it compares struct[x-1] > struct[x/2]?



Yes it is. For a list of 1000 elements, preforming this comparison 999 times makes no sense. If it's true for the first time, it will swap the first two values and then it will be always false in the remaining iterations.



When something is suspicious like that, it's good to take a closer look. On closer look, notice that if that condition is true for the first time, then the last two elements of the list don't get compared. That's why the example I gave above doesn't get sorted correctly.



Extract constant conditions from loops



The check on length will have the same result for all iterations of the loop. It should be done before the loop.



Style



There are some style issues I would improve.



  • I find it a bit surprising that a method named heapify removes an element from the list. I would rename that to heapify_and_pop to avoid confusion.


  • The naming and spacing doesn't follow PEP8 conventions.


  • The code works only with Python 2. It would be trivial to make it work with Python 3 too. That would make it more usable.


  • The name struct is not great for a list.







share|improve this answer















share|improve this answer



share|improve this answer








edited May 10 at 9:11


























answered May 10 at 9:01









janos

95.4k12119342




95.4k12119342











  • Interesting. This bug pops up when the list is already sorted. It works on all unsorted combinations. Any sorted combination besides [1, 2, 3] like [1, 2, 3, 4] would also create the same bug I believe on the last two indexes. A simple fix is to check within the heap method if it the list is already sorted, but I understand your point. Thank you for pointing this out.
    – Johnathan
    May 10 at 23:36











  • @Johnathan the bug affects any list where A[0]< A[1] and A[-1] = max(A), not only sorted lists. And the "simple fix" you describe is an unacceptable dirty hack.
    – janos
    May 11 at 5:01
















  • Interesting. This bug pops up when the list is already sorted. It works on all unsorted combinations. Any sorted combination besides [1, 2, 3] like [1, 2, 3, 4] would also create the same bug I believe on the last two indexes. A simple fix is to check within the heap method if it the list is already sorted, but I understand your point. Thank you for pointing this out.
    – Johnathan
    May 10 at 23:36











  • @Johnathan the bug affects any list where A[0]< A[1] and A[-1] = max(A), not only sorted lists. And the "simple fix" you describe is an unacceptable dirty hack.
    – janos
    May 11 at 5:01















Interesting. This bug pops up when the list is already sorted. It works on all unsorted combinations. Any sorted combination besides [1, 2, 3] like [1, 2, 3, 4] would also create the same bug I believe on the last two indexes. A simple fix is to check within the heap method if it the list is already sorted, but I understand your point. Thank you for pointing this out.
– Johnathan
May 10 at 23:36





Interesting. This bug pops up when the list is already sorted. It works on all unsorted combinations. Any sorted combination besides [1, 2, 3] like [1, 2, 3, 4] would also create the same bug I believe on the last two indexes. A simple fix is to check within the heap method if it the list is already sorted, but I understand your point. Thank you for pointing this out.
– Johnathan
May 10 at 23:36













@Johnathan the bug affects any list where A[0]< A[1] and A[-1] = max(A), not only sorted lists. And the "simple fix" you describe is an unacceptable dirty hack.
– janos
May 11 at 5:01




@Johnathan the bug affects any list where A[0]< A[1] and A[-1] = max(A), not only sorted lists. And the "simple fix" you describe is an unacceptable dirty hack.
– janos
May 11 at 5:01












 

draft saved


draft discarded


























 


draft saved


draft discarded














StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f194062%2fheap-sort-implementation-in-python%23new-answer', 'question_page');

);

Post as a guest













































































Popular posts from this blog

Chat program with C++ and SFML

Function to Return a JSON Like Objects Using VBA Collections and Arrays

Will my employers contract hold up in court?