Heap Sort Implementation in Python
Clash Royale CLAN TAG#URR8PPP
.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty margin-bottom:0;
up vote
2
down vote
favorite
Objective: Create a heap sort that returns a unsorted list to sorted from greatest to least.
Does this implementation look correct? Do you have any tips for optimizing this?
'''HeapSort. Best: n log(n) Average: n log(n) Worst: n log(n) '''
#Calls heapify and returns sorted list from greatest to least.
def heapSort(list):
result =
while list:
result.append(heapify(list))
return result
def heapify(struct):
for x in range(len(struct),0,-1):
#Base Case: Returns last integer when sequence reaches below 2
if len(struct) < 2:
return struct.pop(0)
#Compares Tree Root Node vs Children.
if struct[0] < struct[1]:
temp = struct[0]
struct[0] = struct[1]
struct[1] = temp
else:
#Parent: x/2. Child: x-1. Compares child nodes vs parent nodes in Tree.
if struct[x-1] > struct[x/2]:
temp = struct[x-1]
struct[x-1] = struct[x/2]
struct[x/2] = temp
return struct.pop(0)
python algorithm python-2.7 reinventing-the-wheel heap-sort
add a comment |Â
up vote
2
down vote
favorite
Objective: Create a heap sort that returns a unsorted list to sorted from greatest to least.
Does this implementation look correct? Do you have any tips for optimizing this?
'''HeapSort. Best: n log(n) Average: n log(n) Worst: n log(n) '''
#Calls heapify and returns sorted list from greatest to least.
def heapSort(list):
result =
while list:
result.append(heapify(list))
return result
def heapify(struct):
for x in range(len(struct),0,-1):
#Base Case: Returns last integer when sequence reaches below 2
if len(struct) < 2:
return struct.pop(0)
#Compares Tree Root Node vs Children.
if struct[0] < struct[1]:
temp = struct[0]
struct[0] = struct[1]
struct[1] = temp
else:
#Parent: x/2. Child: x-1. Compares child nodes vs parent nodes in Tree.
if struct[x-1] > struct[x/2]:
temp = struct[x-1]
struct[x-1] = struct[x/2]
struct[x/2] = temp
return struct.pop(0)
python algorithm python-2.7 reinventing-the-wheel heap-sort
1
Does the code function correctly? If not, it isn't ready for review (see help center) and the question may be deleted. If you've tested it, I recommend that you edit to add a summary of the testing (ideally as reproducible unit-test code).
â Toby Speight
May 10 at 10:19
Code does work properly. Bug is for the what-if scenario of inserting a sorted list inside such as 1, 2, 3. Question asks specifically for only a unsorted list, but the tip can be used for optimizing.
â Johnathan
May 10 at 23:46
add a comment |Â
up vote
2
down vote
favorite
up vote
2
down vote
favorite
Objective: Create a heap sort that returns a unsorted list to sorted from greatest to least.
Does this implementation look correct? Do you have any tips for optimizing this?
'''HeapSort. Best: n log(n) Average: n log(n) Worst: n log(n) '''
#Calls heapify and returns sorted list from greatest to least.
def heapSort(list):
result =
while list:
result.append(heapify(list))
return result
def heapify(struct):
for x in range(len(struct),0,-1):
#Base Case: Returns last integer when sequence reaches below 2
if len(struct) < 2:
return struct.pop(0)
#Compares Tree Root Node vs Children.
if struct[0] < struct[1]:
temp = struct[0]
struct[0] = struct[1]
struct[1] = temp
else:
#Parent: x/2. Child: x-1. Compares child nodes vs parent nodes in Tree.
if struct[x-1] > struct[x/2]:
temp = struct[x-1]
struct[x-1] = struct[x/2]
struct[x/2] = temp
return struct.pop(0)
python algorithm python-2.7 reinventing-the-wheel heap-sort
Objective: Create a heap sort that returns a unsorted list to sorted from greatest to least.
Does this implementation look correct? Do you have any tips for optimizing this?
'''HeapSort. Best: n log(n) Average: n log(n) Worst: n log(n) '''
#Calls heapify and returns sorted list from greatest to least.
def heapSort(list):
result =
while list:
result.append(heapify(list))
return result
def heapify(struct):
for x in range(len(struct),0,-1):
#Base Case: Returns last integer when sequence reaches below 2
if len(struct) < 2:
return struct.pop(0)
#Compares Tree Root Node vs Children.
if struct[0] < struct[1]:
temp = struct[0]
struct[0] = struct[1]
struct[1] = temp
else:
#Parent: x/2. Child: x-1. Compares child nodes vs parent nodes in Tree.
if struct[x-1] > struct[x/2]:
temp = struct[x-1]
struct[x-1] = struct[x/2]
struct[x/2] = temp
return struct.pop(0)
python algorithm python-2.7 reinventing-the-wheel heap-sort
edited May 10 at 7:30
Gareth Rees
41.1k394166
41.1k394166
asked May 9 at 23:01
Johnathan
162
162
1
Does the code function correctly? If not, it isn't ready for review (see help center) and the question may be deleted. If you've tested it, I recommend that you edit to add a summary of the testing (ideally as reproducible unit-test code).
â Toby Speight
May 10 at 10:19
Code does work properly. Bug is for the what-if scenario of inserting a sorted list inside such as 1, 2, 3. Question asks specifically for only a unsorted list, but the tip can be used for optimizing.
â Johnathan
May 10 at 23:46
add a comment |Â
1
Does the code function correctly? If not, it isn't ready for review (see help center) and the question may be deleted. If you've tested it, I recommend that you edit to add a summary of the testing (ideally as reproducible unit-test code).
â Toby Speight
May 10 at 10:19
Code does work properly. Bug is for the what-if scenario of inserting a sorted list inside such as 1, 2, 3. Question asks specifically for only a unsorted list, but the tip can be used for optimizing.
â Johnathan
May 10 at 23:46
1
1
Does the code function correctly? If not, it isn't ready for review (see help center) and the question may be deleted. If you've tested it, I recommend that you edit to add a summary of the testing (ideally as reproducible unit-test code).
â Toby Speight
May 10 at 10:19
Does the code function correctly? If not, it isn't ready for review (see help center) and the question may be deleted. If you've tested it, I recommend that you edit to add a summary of the testing (ideally as reproducible unit-test code).
â Toby Speight
May 10 at 10:19
Code does work properly. Bug is for the what-if scenario of inserting a sorted list inside such as 1, 2, 3. Question asks specifically for only a unsorted list, but the tip can be used for optimizing.
â Johnathan
May 10 at 23:46
Code does work properly. Bug is for the what-if scenario of inserting a sorted list inside such as 1, 2, 3. Question asks specifically for only a unsorted list, but the tip can be used for optimizing.
â Johnathan
May 10 at 23:46
add a comment |Â
2 Answers
2
active
oldest
votes
up vote
5
down vote
accepted
See ç1.9 for the cause of the bug identified by janos.
1. Review
The code only runs on Python 2.7, because it uses the
/
operator for floor division. But support for Python 2.7 is scheduled to end in 2020. It is long past the time when it was a good idea to start writing new projects in Python 2.7. Even if you really can't use Python 3, it is still a good idea to use the floor division operator//
to make the code portable.There are no docstrings. What do these functions do? In the case of
heapSort
there is a comment that could be turned into a docstring.heapSort
modifies its input. If I run:>>> a = [6, 7, 2, 4, 1, 0, 3, 9, 8, 5]
>>> heapSort(a)
[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]Then
a
is now empty:>>> a
This is likely to be surprising to someone using this code. It would be better to either leave the input unchanged (taking a copy if necessary), like the built-in function
sorted
, or to modify the input in-place, like the methodlist.sort
.The name
struct
could be improved: for experienced Python programmers this name suggests a fixed-size object of the kind that is manipulated by thestruct
module. A name likelist
orseq
(for "sequence") would be clearer.The base case does not depend on
x
and so it should be outside the loop.Two values can be swapped without needing a temporary variable, if you use a tuple assignment:
struct[x-1], struct[x//2] = struct[x//2], struct[x-1]
The numbers
x-1
andx//2
are computed three times. The repetition ofx-1
can be avoided by changing the loop from:range(len(seq),0,-1)
to:
range(len(seq) - 1, -1, -1)
which can be simplified to:
reversed(range(len(seq)))
and the repetition of
x//2
can be avoided by assigning a local variable:parent = (x + 1) // 2
(It need to be
(x + 1) // 2
rather thanx // 2
because of the change in the loop range.)It's conventional use names
i
,j
and so on for index variables (leavingx
,y
and so on for coordinates).Normally the way heaps are implemented is that element $i$ is the parent of elements $2i + 1$ and $2i + 2$. However, in the code in the post, element $i$ is the parent of elements $2i$ and $2i + 1$. But this requires element 0 to be its own parent, which makes no sense! Presumably this is why you felt the need to add the special case:
if struct[0] < struct[1]:
But this special case led to the bug.
If you changed the implementation so that it used the usual convention for the layout of the heap, then this special case would be unnecessary. Moreover, if you did this then the special case:
if len(struct) < 2:
would also be unnecessary.
heapify
currently does two things: it converts the input sequence into a max-heap, in-place, and it pops the first (maximum) item. It would be clearer, I think, to separate these two features, so thatheapify
only does the heap-conversion, leaving the popping to the caller.
2. Revised code
def heap_sorted(iterable):
"Return new list of all items from iterable in descending order."
result =
seq = list(iterable)
while seq:
heapify(seq)
result.append(seq.pop(0))
return result
def heapify(seq):
"Transform sequence into a max-heap, in-place."
for i in reversed(range(1, len(seq))):
parent = (i - 1) // 2
if seq[i] > seq[parent]:
seq[i], seq[parent] = seq[parent], seq[i]
3. Performance
The docstring says:
HeapSort. Worst: $n log n$
But it is easy to see that this is not the case. That's because heapify
is called $n$ times, and on the $i$th call heapify
loops over all the remaining $n-i$ elements of the sequence, and calls struct.pop(0)
which also takes time proportional to the length of struct
. So the overall runtime is proportional to $$ sum_0le i< n n - i = n(n + 1)over2 $$ which is $ÃÂ(n^2)$.
To get the runtime down to $O(n log n)$, you need to heapify just once, and then instead of extracting the maximum item by popping the list, swap the maximum item with the last item in the list, and then apply the "sift down" procedure to repair the heap condition in time $O(log n)$.
Incredible answer, thanks! Would your implementation work in n log n time under a worst case scenario?
â Johnathan
May 10 at 23:48
@Johnathan: I'm not sure what you mean by "my implementation". The revised code in §2 has quadratic runtime, as explained in §3.
â Gareth Rees
May 11 at 8:08
add a comment |Â
up vote
4
down vote
The implementation has a bug
For example it will not sort correctly 1, 2, 3.
Isn't it suspicious that in the main loop for decreasing indexes, first it compares struct[0] < struct[1]
, and only when that's false it compares struct[x-1] > struct[x/2]
?
Yes it is. For a list of 1000 elements, preforming this comparison 999 times makes no sense. If it's true for the first time, it will swap the first two values and then it will be always false in the remaining iterations.
When something is suspicious like that, it's good to take a closer look. On closer look, notice that if that condition is true for the first time, then the last two elements of the list don't get compared. That's why the example I gave above doesn't get sorted correctly.
Extract constant conditions from loops
The check on length will have the same result for all iterations of the loop. It should be done before the loop.
Style
There are some style issues I would improve.
I find it a bit surprising that a method named
heapify
removes an element from the list. I would rename that toheapify_and_pop
to avoid confusion.The naming and spacing doesn't follow PEP8 conventions.
The code works only with Python 2. It would be trivial to make it work with Python 3 too. That would make it more usable.
The name
struct
is not great for a list.
Interesting. This bug pops up when the list is already sorted. It works on all unsorted combinations. Any sorted combination besides [1, 2, 3] like [1, 2, 3, 4] would also create the same bug I believe on the last two indexes. A simple fix is to check within the heap method if it the list is already sorted, but I understand your point. Thank you for pointing this out.
â Johnathan
May 10 at 23:36
@Johnathan the bug affects any list whereA[0]< A[1]
andA[-1] = max(A)
, not only sorted lists. And the "simple fix" you describe is an unacceptable dirty hack.
â janos
May 11 at 5:01
add a comment |Â
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
5
down vote
accepted
See ç1.9 for the cause of the bug identified by janos.
1. Review
The code only runs on Python 2.7, because it uses the
/
operator for floor division. But support for Python 2.7 is scheduled to end in 2020. It is long past the time when it was a good idea to start writing new projects in Python 2.7. Even if you really can't use Python 3, it is still a good idea to use the floor division operator//
to make the code portable.There are no docstrings. What do these functions do? In the case of
heapSort
there is a comment that could be turned into a docstring.heapSort
modifies its input. If I run:>>> a = [6, 7, 2, 4, 1, 0, 3, 9, 8, 5]
>>> heapSort(a)
[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]Then
a
is now empty:>>> a
This is likely to be surprising to someone using this code. It would be better to either leave the input unchanged (taking a copy if necessary), like the built-in function
sorted
, or to modify the input in-place, like the methodlist.sort
.The name
struct
could be improved: for experienced Python programmers this name suggests a fixed-size object of the kind that is manipulated by thestruct
module. A name likelist
orseq
(for "sequence") would be clearer.The base case does not depend on
x
and so it should be outside the loop.Two values can be swapped without needing a temporary variable, if you use a tuple assignment:
struct[x-1], struct[x//2] = struct[x//2], struct[x-1]
The numbers
x-1
andx//2
are computed three times. The repetition ofx-1
can be avoided by changing the loop from:range(len(seq),0,-1)
to:
range(len(seq) - 1, -1, -1)
which can be simplified to:
reversed(range(len(seq)))
and the repetition of
x//2
can be avoided by assigning a local variable:parent = (x + 1) // 2
(It need to be
(x + 1) // 2
rather thanx // 2
because of the change in the loop range.)It's conventional use names
i
,j
and so on for index variables (leavingx
,y
and so on for coordinates).Normally the way heaps are implemented is that element $i$ is the parent of elements $2i + 1$ and $2i + 2$. However, in the code in the post, element $i$ is the parent of elements $2i$ and $2i + 1$. But this requires element 0 to be its own parent, which makes no sense! Presumably this is why you felt the need to add the special case:
if struct[0] < struct[1]:
But this special case led to the bug.
If you changed the implementation so that it used the usual convention for the layout of the heap, then this special case would be unnecessary. Moreover, if you did this then the special case:
if len(struct) < 2:
would also be unnecessary.
heapify
currently does two things: it converts the input sequence into a max-heap, in-place, and it pops the first (maximum) item. It would be clearer, I think, to separate these two features, so thatheapify
only does the heap-conversion, leaving the popping to the caller.
2. Revised code
def heap_sorted(iterable):
"Return new list of all items from iterable in descending order."
result =
seq = list(iterable)
while seq:
heapify(seq)
result.append(seq.pop(0))
return result
def heapify(seq):
"Transform sequence into a max-heap, in-place."
for i in reversed(range(1, len(seq))):
parent = (i - 1) // 2
if seq[i] > seq[parent]:
seq[i], seq[parent] = seq[parent], seq[i]
3. Performance
The docstring says:
HeapSort. Worst: $n log n$
But it is easy to see that this is not the case. That's because heapify
is called $n$ times, and on the $i$th call heapify
loops over all the remaining $n-i$ elements of the sequence, and calls struct.pop(0)
which also takes time proportional to the length of struct
. So the overall runtime is proportional to $$ sum_0le i< n n - i = n(n + 1)over2 $$ which is $ÃÂ(n^2)$.
To get the runtime down to $O(n log n)$, you need to heapify just once, and then instead of extracting the maximum item by popping the list, swap the maximum item with the last item in the list, and then apply the "sift down" procedure to repair the heap condition in time $O(log n)$.
Incredible answer, thanks! Would your implementation work in n log n time under a worst case scenario?
â Johnathan
May 10 at 23:48
@Johnathan: I'm not sure what you mean by "my implementation". The revised code in §2 has quadratic runtime, as explained in §3.
â Gareth Rees
May 11 at 8:08
add a comment |Â
up vote
5
down vote
accepted
See ç1.9 for the cause of the bug identified by janos.
1. Review
The code only runs on Python 2.7, because it uses the
/
operator for floor division. But support for Python 2.7 is scheduled to end in 2020. It is long past the time when it was a good idea to start writing new projects in Python 2.7. Even if you really can't use Python 3, it is still a good idea to use the floor division operator//
to make the code portable.There are no docstrings. What do these functions do? In the case of
heapSort
there is a comment that could be turned into a docstring.heapSort
modifies its input. If I run:>>> a = [6, 7, 2, 4, 1, 0, 3, 9, 8, 5]
>>> heapSort(a)
[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]Then
a
is now empty:>>> a
This is likely to be surprising to someone using this code. It would be better to either leave the input unchanged (taking a copy if necessary), like the built-in function
sorted
, or to modify the input in-place, like the methodlist.sort
.The name
struct
could be improved: for experienced Python programmers this name suggests a fixed-size object of the kind that is manipulated by thestruct
module. A name likelist
orseq
(for "sequence") would be clearer.The base case does not depend on
x
and so it should be outside the loop.Two values can be swapped without needing a temporary variable, if you use a tuple assignment:
struct[x-1], struct[x//2] = struct[x//2], struct[x-1]
The numbers
x-1
andx//2
are computed three times. The repetition ofx-1
can be avoided by changing the loop from:range(len(seq),0,-1)
to:
range(len(seq) - 1, -1, -1)
which can be simplified to:
reversed(range(len(seq)))
and the repetition of
x//2
can be avoided by assigning a local variable:parent = (x + 1) // 2
(It need to be
(x + 1) // 2
rather thanx // 2
because of the change in the loop range.)It's conventional use names
i
,j
and so on for index variables (leavingx
,y
and so on for coordinates).Normally the way heaps are implemented is that element $i$ is the parent of elements $2i + 1$ and $2i + 2$. However, in the code in the post, element $i$ is the parent of elements $2i$ and $2i + 1$. But this requires element 0 to be its own parent, which makes no sense! Presumably this is why you felt the need to add the special case:
if struct[0] < struct[1]:
But this special case led to the bug.
If you changed the implementation so that it used the usual convention for the layout of the heap, then this special case would be unnecessary. Moreover, if you did this then the special case:
if len(struct) < 2:
would also be unnecessary.
heapify
currently does two things: it converts the input sequence into a max-heap, in-place, and it pops the first (maximum) item. It would be clearer, I think, to separate these two features, so thatheapify
only does the heap-conversion, leaving the popping to the caller.
2. Revised code
def heap_sorted(iterable):
"Return new list of all items from iterable in descending order."
result =
seq = list(iterable)
while seq:
heapify(seq)
result.append(seq.pop(0))
return result
def heapify(seq):
"Transform sequence into a max-heap, in-place."
for i in reversed(range(1, len(seq))):
parent = (i - 1) // 2
if seq[i] > seq[parent]:
seq[i], seq[parent] = seq[parent], seq[i]
3. Performance
The docstring says:
HeapSort. Worst: $n log n$
But it is easy to see that this is not the case. That's because heapify
is called $n$ times, and on the $i$th call heapify
loops over all the remaining $n-i$ elements of the sequence, and calls struct.pop(0)
which also takes time proportional to the length of struct
. So the overall runtime is proportional to $$ sum_0le i< n n - i = n(n + 1)over2 $$ which is $ÃÂ(n^2)$.
To get the runtime down to $O(n log n)$, you need to heapify just once, and then instead of extracting the maximum item by popping the list, swap the maximum item with the last item in the list, and then apply the "sift down" procedure to repair the heap condition in time $O(log n)$.
Incredible answer, thanks! Would your implementation work in n log n time under a worst case scenario?
â Johnathan
May 10 at 23:48
@Johnathan: I'm not sure what you mean by "my implementation". The revised code in §2 has quadratic runtime, as explained in §3.
â Gareth Rees
May 11 at 8:08
add a comment |Â
up vote
5
down vote
accepted
up vote
5
down vote
accepted
See ç1.9 for the cause of the bug identified by janos.
1. Review
The code only runs on Python 2.7, because it uses the
/
operator for floor division. But support for Python 2.7 is scheduled to end in 2020. It is long past the time when it was a good idea to start writing new projects in Python 2.7. Even if you really can't use Python 3, it is still a good idea to use the floor division operator//
to make the code portable.There are no docstrings. What do these functions do? In the case of
heapSort
there is a comment that could be turned into a docstring.heapSort
modifies its input. If I run:>>> a = [6, 7, 2, 4, 1, 0, 3, 9, 8, 5]
>>> heapSort(a)
[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]Then
a
is now empty:>>> a
This is likely to be surprising to someone using this code. It would be better to either leave the input unchanged (taking a copy if necessary), like the built-in function
sorted
, or to modify the input in-place, like the methodlist.sort
.The name
struct
could be improved: for experienced Python programmers this name suggests a fixed-size object of the kind that is manipulated by thestruct
module. A name likelist
orseq
(for "sequence") would be clearer.The base case does not depend on
x
and so it should be outside the loop.Two values can be swapped without needing a temporary variable, if you use a tuple assignment:
struct[x-1], struct[x//2] = struct[x//2], struct[x-1]
The numbers
x-1
andx//2
are computed three times. The repetition ofx-1
can be avoided by changing the loop from:range(len(seq),0,-1)
to:
range(len(seq) - 1, -1, -1)
which can be simplified to:
reversed(range(len(seq)))
and the repetition of
x//2
can be avoided by assigning a local variable:parent = (x + 1) // 2
(It need to be
(x + 1) // 2
rather thanx // 2
because of the change in the loop range.)It's conventional use names
i
,j
and so on for index variables (leavingx
,y
and so on for coordinates).Normally the way heaps are implemented is that element $i$ is the parent of elements $2i + 1$ and $2i + 2$. However, in the code in the post, element $i$ is the parent of elements $2i$ and $2i + 1$. But this requires element 0 to be its own parent, which makes no sense! Presumably this is why you felt the need to add the special case:
if struct[0] < struct[1]:
But this special case led to the bug.
If you changed the implementation so that it used the usual convention for the layout of the heap, then this special case would be unnecessary. Moreover, if you did this then the special case:
if len(struct) < 2:
would also be unnecessary.
heapify
currently does two things: it converts the input sequence into a max-heap, in-place, and it pops the first (maximum) item. It would be clearer, I think, to separate these two features, so thatheapify
only does the heap-conversion, leaving the popping to the caller.
2. Revised code
def heap_sorted(iterable):
"Return new list of all items from iterable in descending order."
result =
seq = list(iterable)
while seq:
heapify(seq)
result.append(seq.pop(0))
return result
def heapify(seq):
"Transform sequence into a max-heap, in-place."
for i in reversed(range(1, len(seq))):
parent = (i - 1) // 2
if seq[i] > seq[parent]:
seq[i], seq[parent] = seq[parent], seq[i]
3. Performance
The docstring says:
HeapSort. Worst: $n log n$
But it is easy to see that this is not the case. That's because heapify
is called $n$ times, and on the $i$th call heapify
loops over all the remaining $n-i$ elements of the sequence, and calls struct.pop(0)
which also takes time proportional to the length of struct
. So the overall runtime is proportional to $$ sum_0le i< n n - i = n(n + 1)over2 $$ which is $ÃÂ(n^2)$.
To get the runtime down to $O(n log n)$, you need to heapify just once, and then instead of extracting the maximum item by popping the list, swap the maximum item with the last item in the list, and then apply the "sift down" procedure to repair the heap condition in time $O(log n)$.
See ç1.9 for the cause of the bug identified by janos.
1. Review
The code only runs on Python 2.7, because it uses the
/
operator for floor division. But support for Python 2.7 is scheduled to end in 2020. It is long past the time when it was a good idea to start writing new projects in Python 2.7. Even if you really can't use Python 3, it is still a good idea to use the floor division operator//
to make the code portable.There are no docstrings. What do these functions do? In the case of
heapSort
there is a comment that could be turned into a docstring.heapSort
modifies its input. If I run:>>> a = [6, 7, 2, 4, 1, 0, 3, 9, 8, 5]
>>> heapSort(a)
[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]Then
a
is now empty:>>> a
This is likely to be surprising to someone using this code. It would be better to either leave the input unchanged (taking a copy if necessary), like the built-in function
sorted
, or to modify the input in-place, like the methodlist.sort
.The name
struct
could be improved: for experienced Python programmers this name suggests a fixed-size object of the kind that is manipulated by thestruct
module. A name likelist
orseq
(for "sequence") would be clearer.The base case does not depend on
x
and so it should be outside the loop.Two values can be swapped without needing a temporary variable, if you use a tuple assignment:
struct[x-1], struct[x//2] = struct[x//2], struct[x-1]
The numbers
x-1
andx//2
are computed three times. The repetition ofx-1
can be avoided by changing the loop from:range(len(seq),0,-1)
to:
range(len(seq) - 1, -1, -1)
which can be simplified to:
reversed(range(len(seq)))
and the repetition of
x//2
can be avoided by assigning a local variable:parent = (x + 1) // 2
(It need to be
(x + 1) // 2
rather thanx // 2
because of the change in the loop range.)It's conventional use names
i
,j
and so on for index variables (leavingx
,y
and so on for coordinates).Normally the way heaps are implemented is that element $i$ is the parent of elements $2i + 1$ and $2i + 2$. However, in the code in the post, element $i$ is the parent of elements $2i$ and $2i + 1$. But this requires element 0 to be its own parent, which makes no sense! Presumably this is why you felt the need to add the special case:
if struct[0] < struct[1]:
But this special case led to the bug.
If you changed the implementation so that it used the usual convention for the layout of the heap, then this special case would be unnecessary. Moreover, if you did this then the special case:
if len(struct) < 2:
would also be unnecessary.
heapify
currently does two things: it converts the input sequence into a max-heap, in-place, and it pops the first (maximum) item. It would be clearer, I think, to separate these two features, so thatheapify
only does the heap-conversion, leaving the popping to the caller.
2. Revised code
def heap_sorted(iterable):
"Return new list of all items from iterable in descending order."
result =
seq = list(iterable)
while seq:
heapify(seq)
result.append(seq.pop(0))
return result
def heapify(seq):
"Transform sequence into a max-heap, in-place."
for i in reversed(range(1, len(seq))):
parent = (i - 1) // 2
if seq[i] > seq[parent]:
seq[i], seq[parent] = seq[parent], seq[i]
3. Performance
The docstring says:
HeapSort. Worst: $n log n$
But it is easy to see that this is not the case. That's because heapify
is called $n$ times, and on the $i$th call heapify
loops over all the remaining $n-i$ elements of the sequence, and calls struct.pop(0)
which also takes time proportional to the length of struct
. So the overall runtime is proportional to $$ sum_0le i< n n - i = n(n + 1)over2 $$ which is $ÃÂ(n^2)$.
To get the runtime down to $O(n log n)$, you need to heapify just once, and then instead of extracting the maximum item by popping the list, swap the maximum item with the last item in the list, and then apply the "sift down" procedure to repair the heap condition in time $O(log n)$.
edited May 10 at 10:53
answered May 10 at 10:14
Gareth Rees
41.1k394166
41.1k394166
Incredible answer, thanks! Would your implementation work in n log n time under a worst case scenario?
â Johnathan
May 10 at 23:48
@Johnathan: I'm not sure what you mean by "my implementation". The revised code in §2 has quadratic runtime, as explained in §3.
â Gareth Rees
May 11 at 8:08
add a comment |Â
Incredible answer, thanks! Would your implementation work in n log n time under a worst case scenario?
â Johnathan
May 10 at 23:48
@Johnathan: I'm not sure what you mean by "my implementation". The revised code in §2 has quadratic runtime, as explained in §3.
â Gareth Rees
May 11 at 8:08
Incredible answer, thanks! Would your implementation work in n log n time under a worst case scenario?
â Johnathan
May 10 at 23:48
Incredible answer, thanks! Would your implementation work in n log n time under a worst case scenario?
â Johnathan
May 10 at 23:48
@Johnathan: I'm not sure what you mean by "my implementation". The revised code in §2 has quadratic runtime, as explained in §3.
â Gareth Rees
May 11 at 8:08
@Johnathan: I'm not sure what you mean by "my implementation". The revised code in §2 has quadratic runtime, as explained in §3.
â Gareth Rees
May 11 at 8:08
add a comment |Â
up vote
4
down vote
The implementation has a bug
For example it will not sort correctly 1, 2, 3.
Isn't it suspicious that in the main loop for decreasing indexes, first it compares struct[0] < struct[1]
, and only when that's false it compares struct[x-1] > struct[x/2]
?
Yes it is. For a list of 1000 elements, preforming this comparison 999 times makes no sense. If it's true for the first time, it will swap the first two values and then it will be always false in the remaining iterations.
When something is suspicious like that, it's good to take a closer look. On closer look, notice that if that condition is true for the first time, then the last two elements of the list don't get compared. That's why the example I gave above doesn't get sorted correctly.
Extract constant conditions from loops
The check on length will have the same result for all iterations of the loop. It should be done before the loop.
Style
There are some style issues I would improve.
I find it a bit surprising that a method named
heapify
removes an element from the list. I would rename that toheapify_and_pop
to avoid confusion.The naming and spacing doesn't follow PEP8 conventions.
The code works only with Python 2. It would be trivial to make it work with Python 3 too. That would make it more usable.
The name
struct
is not great for a list.
Interesting. This bug pops up when the list is already sorted. It works on all unsorted combinations. Any sorted combination besides [1, 2, 3] like [1, 2, 3, 4] would also create the same bug I believe on the last two indexes. A simple fix is to check within the heap method if it the list is already sorted, but I understand your point. Thank you for pointing this out.
â Johnathan
May 10 at 23:36
@Johnathan the bug affects any list whereA[0]< A[1]
andA[-1] = max(A)
, not only sorted lists. And the "simple fix" you describe is an unacceptable dirty hack.
â janos
May 11 at 5:01
add a comment |Â
up vote
4
down vote
The implementation has a bug
For example it will not sort correctly 1, 2, 3.
Isn't it suspicious that in the main loop for decreasing indexes, first it compares struct[0] < struct[1]
, and only when that's false it compares struct[x-1] > struct[x/2]
?
Yes it is. For a list of 1000 elements, preforming this comparison 999 times makes no sense. If it's true for the first time, it will swap the first two values and then it will be always false in the remaining iterations.
When something is suspicious like that, it's good to take a closer look. On closer look, notice that if that condition is true for the first time, then the last two elements of the list don't get compared. That's why the example I gave above doesn't get sorted correctly.
Extract constant conditions from loops
The check on length will have the same result for all iterations of the loop. It should be done before the loop.
Style
There are some style issues I would improve.
I find it a bit surprising that a method named
heapify
removes an element from the list. I would rename that toheapify_and_pop
to avoid confusion.The naming and spacing doesn't follow PEP8 conventions.
The code works only with Python 2. It would be trivial to make it work with Python 3 too. That would make it more usable.
The name
struct
is not great for a list.
Interesting. This bug pops up when the list is already sorted. It works on all unsorted combinations. Any sorted combination besides [1, 2, 3] like [1, 2, 3, 4] would also create the same bug I believe on the last two indexes. A simple fix is to check within the heap method if it the list is already sorted, but I understand your point. Thank you for pointing this out.
â Johnathan
May 10 at 23:36
@Johnathan the bug affects any list whereA[0]< A[1]
andA[-1] = max(A)
, not only sorted lists. And the "simple fix" you describe is an unacceptable dirty hack.
â janos
May 11 at 5:01
add a comment |Â
up vote
4
down vote
up vote
4
down vote
The implementation has a bug
For example it will not sort correctly 1, 2, 3.
Isn't it suspicious that in the main loop for decreasing indexes, first it compares struct[0] < struct[1]
, and only when that's false it compares struct[x-1] > struct[x/2]
?
Yes it is. For a list of 1000 elements, preforming this comparison 999 times makes no sense. If it's true for the first time, it will swap the first two values and then it will be always false in the remaining iterations.
When something is suspicious like that, it's good to take a closer look. On closer look, notice that if that condition is true for the first time, then the last two elements of the list don't get compared. That's why the example I gave above doesn't get sorted correctly.
Extract constant conditions from loops
The check on length will have the same result for all iterations of the loop. It should be done before the loop.
Style
There are some style issues I would improve.
I find it a bit surprising that a method named
heapify
removes an element from the list. I would rename that toheapify_and_pop
to avoid confusion.The naming and spacing doesn't follow PEP8 conventions.
The code works only with Python 2. It would be trivial to make it work with Python 3 too. That would make it more usable.
The name
struct
is not great for a list.
The implementation has a bug
For example it will not sort correctly 1, 2, 3.
Isn't it suspicious that in the main loop for decreasing indexes, first it compares struct[0] < struct[1]
, and only when that's false it compares struct[x-1] > struct[x/2]
?
Yes it is. For a list of 1000 elements, preforming this comparison 999 times makes no sense. If it's true for the first time, it will swap the first two values and then it will be always false in the remaining iterations.
When something is suspicious like that, it's good to take a closer look. On closer look, notice that if that condition is true for the first time, then the last two elements of the list don't get compared. That's why the example I gave above doesn't get sorted correctly.
Extract constant conditions from loops
The check on length will have the same result for all iterations of the loop. It should be done before the loop.
Style
There are some style issues I would improve.
I find it a bit surprising that a method named
heapify
removes an element from the list. I would rename that toheapify_and_pop
to avoid confusion.The naming and spacing doesn't follow PEP8 conventions.
The code works only with Python 2. It would be trivial to make it work with Python 3 too. That would make it more usable.
The name
struct
is not great for a list.
edited May 10 at 9:11
answered May 10 at 9:01
janos
95.4k12119342
95.4k12119342
Interesting. This bug pops up when the list is already sorted. It works on all unsorted combinations. Any sorted combination besides [1, 2, 3] like [1, 2, 3, 4] would also create the same bug I believe on the last two indexes. A simple fix is to check within the heap method if it the list is already sorted, but I understand your point. Thank you for pointing this out.
â Johnathan
May 10 at 23:36
@Johnathan the bug affects any list whereA[0]< A[1]
andA[-1] = max(A)
, not only sorted lists. And the "simple fix" you describe is an unacceptable dirty hack.
â janos
May 11 at 5:01
add a comment |Â
Interesting. This bug pops up when the list is already sorted. It works on all unsorted combinations. Any sorted combination besides [1, 2, 3] like [1, 2, 3, 4] would also create the same bug I believe on the last two indexes. A simple fix is to check within the heap method if it the list is already sorted, but I understand your point. Thank you for pointing this out.
â Johnathan
May 10 at 23:36
@Johnathan the bug affects any list whereA[0]< A[1]
andA[-1] = max(A)
, not only sorted lists. And the "simple fix" you describe is an unacceptable dirty hack.
â janos
May 11 at 5:01
Interesting. This bug pops up when the list is already sorted. It works on all unsorted combinations. Any sorted combination besides [1, 2, 3] like [1, 2, 3, 4] would also create the same bug I believe on the last two indexes. A simple fix is to check within the heap method if it the list is already sorted, but I understand your point. Thank you for pointing this out.
â Johnathan
May 10 at 23:36
Interesting. This bug pops up when the list is already sorted. It works on all unsorted combinations. Any sorted combination besides [1, 2, 3] like [1, 2, 3, 4] would also create the same bug I believe on the last two indexes. A simple fix is to check within the heap method if it the list is already sorted, but I understand your point. Thank you for pointing this out.
â Johnathan
May 10 at 23:36
@Johnathan the bug affects any list where
A[0]< A[1]
and A[-1] = max(A)
, not only sorted lists. And the "simple fix" you describe is an unacceptable dirty hack.â janos
May 11 at 5:01
@Johnathan the bug affects any list where
A[0]< A[1]
and A[-1] = max(A)
, not only sorted lists. And the "simple fix" you describe is an unacceptable dirty hack.â janos
May 11 at 5:01
add a comment |Â
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f194062%2fheap-sort-implementation-in-python%23new-answer', 'question_page');
);
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
1
Does the code function correctly? If not, it isn't ready for review (see help center) and the question may be deleted. If you've tested it, I recommend that you edit to add a summary of the testing (ideally as reproducible unit-test code).
â Toby Speight
May 10 at 10:19
Code does work properly. Bug is for the what-if scenario of inserting a sorted list inside such as 1, 2, 3. Question asks specifically for only a unsorted list, but the tip can be used for optimizing.
â Johnathan
May 10 at 23:46