Median Filter Implementation In Python

The name of the pictureThe name of the pictureThe name of the pictureClash Royale CLAN TAG#URR8PPP





.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty margin-bottom:0;







up vote
4
down vote

favorite












I implemented median filter in Python in order to remove the salt & pepper noise from the images. It is working fine and all but I would love to hear your advice or opinions.



def median_filter(data, filter_size):
temp =
indexer = filter_size // 2
for i in range(len(data)):

for j in range(len(data[0])):

for z in range(filter_size):
if i + z - indexer < 0 or i + z - indexer > len(data) - 1:
for c in range(filter_size):
temp.append(0)
else:
if j + z - indexer < 0 or j + indexer > len(data[0]) - 1:
temp.append(0)
else:
for k in range(filter_size):
temp.append(data[i + z - indexer][j + k - indexer])

temp.sort()
data[i][j] = temp[len(temp) // 2]
temp =
return data






share|improve this question

















  • 1




    Your code doesn't make the complete sense to me. Could you explain a little what it does? Why do you loop through filter_size on two of the three options in the for z in range(filter_size) loop?
    – Peilonrayz
    Apr 13 at 14:19











  • Sure, Median filter is usually used to reduce noise in an image. My code basically takes the array of the image which is corrupted by salt and pepper noise and remove the noise. I loop through "filter_size" because there are different sized median filters, like 3x3, 5x5. So there is more pixels that need to be considered. You can take a look at the GitHub repo, github.com/MeteHanC/Python-Median-Filter
    – MeteHan
    Apr 13 at 14:37











  • could you answer my second question? I don't really understand how you handle those median filters going out of bounds.
    – Peilonrayz
    Apr 13 at 14:40










  • I check the rows and columns going out of bounds with different if statements. So if a columns value is negative, I append zeros to the temp array. Adding zeros in the situation of an out bound, is a common approach. And if row value is not negative , I check the column value. And append zeros if out of bounds again. And about "i+z-indexer", with this way I put the current pixel in the hotspot(basically center of the array) and I check its surrounding pixels. For 3x3 median filter, lets say we got the [0][0] pixel. So program will check [-1][-1], [-1][0], [-1][1], [0][-1], [0][0], [0][1] so on
    – MeteHan
    Apr 13 at 14:47
















up vote
4
down vote

favorite












I implemented median filter in Python in order to remove the salt & pepper noise from the images. It is working fine and all but I would love to hear your advice or opinions.



def median_filter(data, filter_size):
temp =
indexer = filter_size // 2
for i in range(len(data)):

for j in range(len(data[0])):

for z in range(filter_size):
if i + z - indexer < 0 or i + z - indexer > len(data) - 1:
for c in range(filter_size):
temp.append(0)
else:
if j + z - indexer < 0 or j + indexer > len(data[0]) - 1:
temp.append(0)
else:
for k in range(filter_size):
temp.append(data[i + z - indexer][j + k - indexer])

temp.sort()
data[i][j] = temp[len(temp) // 2]
temp =
return data






share|improve this question

















  • 1




    Your code doesn't make the complete sense to me. Could you explain a little what it does? Why do you loop through filter_size on two of the three options in the for z in range(filter_size) loop?
    – Peilonrayz
    Apr 13 at 14:19











  • Sure, Median filter is usually used to reduce noise in an image. My code basically takes the array of the image which is corrupted by salt and pepper noise and remove the noise. I loop through "filter_size" because there are different sized median filters, like 3x3, 5x5. So there is more pixels that need to be considered. You can take a look at the GitHub repo, github.com/MeteHanC/Python-Median-Filter
    – MeteHan
    Apr 13 at 14:37











  • could you answer my second question? I don't really understand how you handle those median filters going out of bounds.
    – Peilonrayz
    Apr 13 at 14:40










  • I check the rows and columns going out of bounds with different if statements. So if a columns value is negative, I append zeros to the temp array. Adding zeros in the situation of an out bound, is a common approach. And if row value is not negative , I check the column value. And append zeros if out of bounds again. And about "i+z-indexer", with this way I put the current pixel in the hotspot(basically center of the array) and I check its surrounding pixels. For 3x3 median filter, lets say we got the [0][0] pixel. So program will check [-1][-1], [-1][0], [-1][1], [0][-1], [0][0], [0][1] so on
    – MeteHan
    Apr 13 at 14:47












up vote
4
down vote

favorite









up vote
4
down vote

favorite











I implemented median filter in Python in order to remove the salt & pepper noise from the images. It is working fine and all but I would love to hear your advice or opinions.



def median_filter(data, filter_size):
temp =
indexer = filter_size // 2
for i in range(len(data)):

for j in range(len(data[0])):

for z in range(filter_size):
if i + z - indexer < 0 or i + z - indexer > len(data) - 1:
for c in range(filter_size):
temp.append(0)
else:
if j + z - indexer < 0 or j + indexer > len(data[0]) - 1:
temp.append(0)
else:
for k in range(filter_size):
temp.append(data[i + z - indexer][j + k - indexer])

temp.sort()
data[i][j] = temp[len(temp) // 2]
temp =
return data






share|improve this question













I implemented median filter in Python in order to remove the salt & pepper noise from the images. It is working fine and all but I would love to hear your advice or opinions.



def median_filter(data, filter_size):
temp =
indexer = filter_size // 2
for i in range(len(data)):

for j in range(len(data[0])):

for z in range(filter_size):
if i + z - indexer < 0 or i + z - indexer > len(data) - 1:
for c in range(filter_size):
temp.append(0)
else:
if j + z - indexer < 0 or j + indexer > len(data[0]) - 1:
temp.append(0)
else:
for k in range(filter_size):
temp.append(data[i + z - indexer][j + k - indexer])

temp.sort()
data[i][j] = temp[len(temp) // 2]
temp =
return data








share|improve this question












share|improve this question




share|improve this question








edited Apr 14 at 6:58









200_success

123k14142399




123k14142399









asked Apr 13 at 14:04









MeteHan

615




615







  • 1




    Your code doesn't make the complete sense to me. Could you explain a little what it does? Why do you loop through filter_size on two of the three options in the for z in range(filter_size) loop?
    – Peilonrayz
    Apr 13 at 14:19











  • Sure, Median filter is usually used to reduce noise in an image. My code basically takes the array of the image which is corrupted by salt and pepper noise and remove the noise. I loop through "filter_size" because there are different sized median filters, like 3x3, 5x5. So there is more pixels that need to be considered. You can take a look at the GitHub repo, github.com/MeteHanC/Python-Median-Filter
    – MeteHan
    Apr 13 at 14:37











  • could you answer my second question? I don't really understand how you handle those median filters going out of bounds.
    – Peilonrayz
    Apr 13 at 14:40










  • I check the rows and columns going out of bounds with different if statements. So if a columns value is negative, I append zeros to the temp array. Adding zeros in the situation of an out bound, is a common approach. And if row value is not negative , I check the column value. And append zeros if out of bounds again. And about "i+z-indexer", with this way I put the current pixel in the hotspot(basically center of the array) and I check its surrounding pixels. For 3x3 median filter, lets say we got the [0][0] pixel. So program will check [-1][-1], [-1][0], [-1][1], [0][-1], [0][0], [0][1] so on
    – MeteHan
    Apr 13 at 14:47












  • 1




    Your code doesn't make the complete sense to me. Could you explain a little what it does? Why do you loop through filter_size on two of the three options in the for z in range(filter_size) loop?
    – Peilonrayz
    Apr 13 at 14:19











  • Sure, Median filter is usually used to reduce noise in an image. My code basically takes the array of the image which is corrupted by salt and pepper noise and remove the noise. I loop through "filter_size" because there are different sized median filters, like 3x3, 5x5. So there is more pixels that need to be considered. You can take a look at the GitHub repo, github.com/MeteHanC/Python-Median-Filter
    – MeteHan
    Apr 13 at 14:37











  • could you answer my second question? I don't really understand how you handle those median filters going out of bounds.
    – Peilonrayz
    Apr 13 at 14:40










  • I check the rows and columns going out of bounds with different if statements. So if a columns value is negative, I append zeros to the temp array. Adding zeros in the situation of an out bound, is a common approach. And if row value is not negative , I check the column value. And append zeros if out of bounds again. And about "i+z-indexer", with this way I put the current pixel in the hotspot(basically center of the array) and I check its surrounding pixels. For 3x3 median filter, lets say we got the [0][0] pixel. So program will check [-1][-1], [-1][0], [-1][1], [0][-1], [0][0], [0][1] so on
    – MeteHan
    Apr 13 at 14:47







1




1




Your code doesn't make the complete sense to me. Could you explain a little what it does? Why do you loop through filter_size on two of the three options in the for z in range(filter_size) loop?
– Peilonrayz
Apr 13 at 14:19





Your code doesn't make the complete sense to me. Could you explain a little what it does? Why do you loop through filter_size on two of the three options in the for z in range(filter_size) loop?
– Peilonrayz
Apr 13 at 14:19













Sure, Median filter is usually used to reduce noise in an image. My code basically takes the array of the image which is corrupted by salt and pepper noise and remove the noise. I loop through "filter_size" because there are different sized median filters, like 3x3, 5x5. So there is more pixels that need to be considered. You can take a look at the GitHub repo, github.com/MeteHanC/Python-Median-Filter
– MeteHan
Apr 13 at 14:37





Sure, Median filter is usually used to reduce noise in an image. My code basically takes the array of the image which is corrupted by salt and pepper noise and remove the noise. I loop through "filter_size" because there are different sized median filters, like 3x3, 5x5. So there is more pixels that need to be considered. You can take a look at the GitHub repo, github.com/MeteHanC/Python-Median-Filter
– MeteHan
Apr 13 at 14:37













could you answer my second question? I don't really understand how you handle those median filters going out of bounds.
– Peilonrayz
Apr 13 at 14:40




could you answer my second question? I don't really understand how you handle those median filters going out of bounds.
– Peilonrayz
Apr 13 at 14:40












I check the rows and columns going out of bounds with different if statements. So if a columns value is negative, I append zeros to the temp array. Adding zeros in the situation of an out bound, is a common approach. And if row value is not negative , I check the column value. And append zeros if out of bounds again. And about "i+z-indexer", with this way I put the current pixel in the hotspot(basically center of the array) and I check its surrounding pixels. For 3x3 median filter, lets say we got the [0][0] pixel. So program will check [-1][-1], [-1][0], [-1][1], [0][-1], [0][0], [0][1] so on
– MeteHan
Apr 13 at 14:47




I check the rows and columns going out of bounds with different if statements. So if a columns value is negative, I append zeros to the temp array. Adding zeros in the situation of an out bound, is a common approach. And if row value is not negative , I check the column value. And append zeros if out of bounds again. And about "i+z-indexer", with this way I put the current pixel in the hotspot(basically center of the array) and I check its surrounding pixels. For 3x3 median filter, lets say we got the [0][0] pixel. So program will check [-1][-1], [-1][0], [-1][1], [0][-1], [0][0], [0][1] so on
– MeteHan
Apr 13 at 14:47










2 Answers
2






active

oldest

votes

















up vote
3
down vote



accepted










Code review



Peilonrays points out a mixup with the out-of-bounds testing that is valid. The statement if j ... must be within the loop for k .... One of the results is that you add a different number of elements to temp depending on which boundary you're at. But there are better ways to avoid out-of-bounds indexing, see below.



Your biggest bug, however, is that you write the result of the filter into the image you are processing. Median filtering cannot be done in-place. When you update data[i][j], you'll be reading the updated value to compute data[i][j+1]. You need to allocate a new image, and write the result there.



I would suggest not adding zeros for out-of-bounds pixels at all, because it introduces a bias to the output pixels near the boundary. The clearest example is for the pixels close to any of the corners. At the corner pixel, with a 3x3 kernel, you'll have 4 image pixels covered by the kernel. Adding 5 zeros for the out-of-bounds pixels guarantees that the output will be 0. For larger kernels this happens in more pixels of course. Instead, it is easy to simply remove the temp.append(0) statements, leading to a non-biased result. Other options are to read values from elsewhere in the image, for example mirroring the image at the boundary or extending the image by extrapolation. For median filtering this has little effect, IMO.



You set temp = at the very beginning of your function, then reset it when you're done using it, in preparation for the next loop. Instead, initialize it once inside the main double-loop over the pixels:



for i in range(len(data)):
for j in range(len(data[0])):
temp =
# ...


You're looping over i and j as image indices, then over z and c or k for filter kernel indices. c and k have the same function in two different loops, I would suggest using the same variable for that. z doesn't really fit in with either c or k. I would pick two names that are related in the way that i and j are, such as m and n. The choice of variable names is always very limited if it's just one letter. Using longer names would make this code clearer: for example img_column, img_row, kernel_column, kernel_row.




Out-of-bounds checking



This concludes my comments on your code. Now I'd like to offer some alternatives for out-of-bounds checking. These tests are rather expensive when performed for every pixel -- it's a test that is done $n k$ times (with $n$ pixels in the image and $k$ pixels in the kernel). Maybe in Python the added cost is relatively small, it's an interpreted language after all, but for a compiled language these tests can easily amount to doubling processing time. There are 3 common alternatives that I know of. I will use border = filter_size // 2, and presume filter_size is odd. It is possible to adjust all 3 methods to even-sized filters.



Separate loops for image border pixels



The idea here is that the loop over the first and last border pixels along each dimension are handled separately from the loop over the core of the image. This avoids all tests. But it does require some code duplication (all in the name of speed!).



for i in range(border):
# here we loop over the kernel from -i to border+1
for i in range(border, len(data)-border):
# here we loop over the full kernel
for i in range(len(data)-border, len(data)):
# here we loop over the kernel from -border to len(data)-i


Of course, within each of those loops, a similar set of 3 loops is necessary to loop over j. The filter logic is thus repeated 9 times. In a compiled language, where this is the most efficient method, code duplication can be avoided with inlined functions or macros. I don't know how a Python function call compares to a bunch of tests for out-of-bounds access, so can't comment on the usefulness of this method in Python.



A separate code path for border pixels



The idea here is to do out-of-bounds checking only for those pixels that are close to the image boundary. For pixels within the border, you use a version of the filtering logic with out-of-bounds checking. For the pixels in the core of the image (which is the big majority of pixels), you use a second version of the logic without out-of-bounds checking.



for i in range(len(data)):
i_border = i < border or i >= len(data)-border
for j in range(len(data[0])):
j_border = j < border or j >= len(data)-border
if i_border or j_border:
# filtering with bounds checking
else:
# filtering without bounds checking


Padding the image



The simplest solution, and also the most flexible one, is to create a temporary image that is larger than the input image by 2*border along each dimension, and copy the input image into it. The "new" pixels can be filled with zeros (to replicate what OP intended to do), or with values taken from the input image (for example by mirroring the image at the boundary or extrapolating in some other way).



The filter now never needs to check for out-of-bounds reads. When the filtering kernel is placed over any of the input image pixels, all samples fall within the padded image.



Since for this type of filtering it is necessary to create a new output image anyway (it is not possible to compute it in-place, as I mentioned before), this is not a huge cost: the original input image can now be re-used as output image.



This solution leads to the simplest code, allows for all sorts of boundary extension methods without complicating the filtering code, and often results in the fastest code too.






share|improve this answer




























    up vote
    3
    down vote













    You seem to have a few bugs.





    1. if i + z - indexer < 0 or i + z - indexer > len(data) - 1:



      If i and z are 0, where indexer is 1, then you'll have 0 + 0 - 1 < 0. This would mean that you'd replace the data in (-1, j), (0, j) and (1, j) to 0. Since 0 and 1 probably do contain data this is just plain wrong.





    2. if j + z - indexer < 0 or j + indexer > len(data[0]) - 1:
      temp.append(0)



      This removes some data, meaning that the median is shifted. Say you should have (0, 0, 0, 1, 2, 3), however you removed the first three because of this you'd have (0, 1, 2, 3). Now the median is 1 rather than 0.




    Your code would be simpler if you:



    1. Made a window list, that contained all the indexes that you want to move to.

    2. Have an if to check if the data in that index is out of bounds.

    3. If it's out of bounds default to 0.

    4. If it's not out of bounds use the data.

    This could become:



    def median_filter(data, filter_size):
    temp =
    indexer = filter_size // 2
    window = [
    (i, j)
    for i in range(-indexer, filter_size-indexer)
    for j in range(-indexer, filter_size-indexer)
    ]
    index = len(window) // 2
    for i in range(len(data)):
    for j in range(len(data[0])):
    data[i][j] = sorted(
    0 if (
    min(i+a, j+b) < 0
    or len(data) <= i+a
    or len(data[0]) <= j+b
    ) else data[i+a][j+b]
    for a, b in window
    )[index]
    return data





    share|improve this answer





















    • That is the whole point, replacing those values with 0. As I said adding zeros, it is very common approach. Probably I couldnt make myself clear to you in the comment section, sorry about that Thank you for your comments
      – MeteHan
      Apr 13 at 15:15











    • @M.Han Please read my answer again. You're changing too many values to black. If changing valid data to black was 'the whole point', then you'd want to just change the code to data[i][j] = 0...
      – Peilonrayz
      Apr 13 at 15:17










    • @M.Han Is (0, 0) meant to be replaced to black?
      – Peilonrayz
      Apr 13 at 15:23










    • The window list not only makes the code simpler, it also allows for arbitrary window shapes. A circular window has much nicer properties than a square one. It is easy to implement that with this code. Nice!
      – Cris Luengo
      Apr 14 at 6:36










    Your Answer




    StackExchange.ifUsing("editor", function ()
    return StackExchange.using("mathjaxEditing", function ()
    StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
    StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
    );
    );
    , "mathjax-editing");

    StackExchange.ifUsing("editor", function ()
    StackExchange.using("externalEditor", function ()
    StackExchange.using("snippets", function ()
    StackExchange.snippets.init();
    );
    );
    , "code-snippets");

    StackExchange.ready(function()
    var channelOptions =
    tags: "".split(" "),
    id: "196"
    ;
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function()
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled)
    StackExchange.using("snippets", function()
    createEditor();
    );

    else
    createEditor();

    );

    function createEditor()
    StackExchange.prepareEditor(
    heartbeatType: 'answer',
    convertImagesToLinks: false,
    noModals: false,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: null,
    bindNavPrevention: true,
    postfix: "",
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    );



    );








     

    draft saved


    draft discarded


















    StackExchange.ready(
    function ()
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f191974%2fmedian-filter-implementation-in-python%23new-answer', 'question_page');

    );

    Post as a guest






























    2 Answers
    2






    active

    oldest

    votes








    2 Answers
    2






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes








    up vote
    3
    down vote



    accepted










    Code review



    Peilonrays points out a mixup with the out-of-bounds testing that is valid. The statement if j ... must be within the loop for k .... One of the results is that you add a different number of elements to temp depending on which boundary you're at. But there are better ways to avoid out-of-bounds indexing, see below.



    Your biggest bug, however, is that you write the result of the filter into the image you are processing. Median filtering cannot be done in-place. When you update data[i][j], you'll be reading the updated value to compute data[i][j+1]. You need to allocate a new image, and write the result there.



    I would suggest not adding zeros for out-of-bounds pixels at all, because it introduces a bias to the output pixels near the boundary. The clearest example is for the pixels close to any of the corners. At the corner pixel, with a 3x3 kernel, you'll have 4 image pixels covered by the kernel. Adding 5 zeros for the out-of-bounds pixels guarantees that the output will be 0. For larger kernels this happens in more pixels of course. Instead, it is easy to simply remove the temp.append(0) statements, leading to a non-biased result. Other options are to read values from elsewhere in the image, for example mirroring the image at the boundary or extending the image by extrapolation. For median filtering this has little effect, IMO.



    You set temp = at the very beginning of your function, then reset it when you're done using it, in preparation for the next loop. Instead, initialize it once inside the main double-loop over the pixels:



    for i in range(len(data)):
    for j in range(len(data[0])):
    temp =
    # ...


    You're looping over i and j as image indices, then over z and c or k for filter kernel indices. c and k have the same function in two different loops, I would suggest using the same variable for that. z doesn't really fit in with either c or k. I would pick two names that are related in the way that i and j are, such as m and n. The choice of variable names is always very limited if it's just one letter. Using longer names would make this code clearer: for example img_column, img_row, kernel_column, kernel_row.




    Out-of-bounds checking



    This concludes my comments on your code. Now I'd like to offer some alternatives for out-of-bounds checking. These tests are rather expensive when performed for every pixel -- it's a test that is done $n k$ times (with $n$ pixels in the image and $k$ pixels in the kernel). Maybe in Python the added cost is relatively small, it's an interpreted language after all, but for a compiled language these tests can easily amount to doubling processing time. There are 3 common alternatives that I know of. I will use border = filter_size // 2, and presume filter_size is odd. It is possible to adjust all 3 methods to even-sized filters.



    Separate loops for image border pixels



    The idea here is that the loop over the first and last border pixels along each dimension are handled separately from the loop over the core of the image. This avoids all tests. But it does require some code duplication (all in the name of speed!).



    for i in range(border):
    # here we loop over the kernel from -i to border+1
    for i in range(border, len(data)-border):
    # here we loop over the full kernel
    for i in range(len(data)-border, len(data)):
    # here we loop over the kernel from -border to len(data)-i


    Of course, within each of those loops, a similar set of 3 loops is necessary to loop over j. The filter logic is thus repeated 9 times. In a compiled language, where this is the most efficient method, code duplication can be avoided with inlined functions or macros. I don't know how a Python function call compares to a bunch of tests for out-of-bounds access, so can't comment on the usefulness of this method in Python.



    A separate code path for border pixels



    The idea here is to do out-of-bounds checking only for those pixels that are close to the image boundary. For pixels within the border, you use a version of the filtering logic with out-of-bounds checking. For the pixels in the core of the image (which is the big majority of pixels), you use a second version of the logic without out-of-bounds checking.



    for i in range(len(data)):
    i_border = i < border or i >= len(data)-border
    for j in range(len(data[0])):
    j_border = j < border or j >= len(data)-border
    if i_border or j_border:
    # filtering with bounds checking
    else:
    # filtering without bounds checking


    Padding the image



    The simplest solution, and also the most flexible one, is to create a temporary image that is larger than the input image by 2*border along each dimension, and copy the input image into it. The "new" pixels can be filled with zeros (to replicate what OP intended to do), or with values taken from the input image (for example by mirroring the image at the boundary or extrapolating in some other way).



    The filter now never needs to check for out-of-bounds reads. When the filtering kernel is placed over any of the input image pixels, all samples fall within the padded image.



    Since for this type of filtering it is necessary to create a new output image anyway (it is not possible to compute it in-place, as I mentioned before), this is not a huge cost: the original input image can now be re-used as output image.



    This solution leads to the simplest code, allows for all sorts of boundary extension methods without complicating the filtering code, and often results in the fastest code too.






    share|improve this answer

























      up vote
      3
      down vote



      accepted










      Code review



      Peilonrays points out a mixup with the out-of-bounds testing that is valid. The statement if j ... must be within the loop for k .... One of the results is that you add a different number of elements to temp depending on which boundary you're at. But there are better ways to avoid out-of-bounds indexing, see below.



      Your biggest bug, however, is that you write the result of the filter into the image you are processing. Median filtering cannot be done in-place. When you update data[i][j], you'll be reading the updated value to compute data[i][j+1]. You need to allocate a new image, and write the result there.



      I would suggest not adding zeros for out-of-bounds pixels at all, because it introduces a bias to the output pixels near the boundary. The clearest example is for the pixels close to any of the corners. At the corner pixel, with a 3x3 kernel, you'll have 4 image pixels covered by the kernel. Adding 5 zeros for the out-of-bounds pixels guarantees that the output will be 0. For larger kernels this happens in more pixels of course. Instead, it is easy to simply remove the temp.append(0) statements, leading to a non-biased result. Other options are to read values from elsewhere in the image, for example mirroring the image at the boundary or extending the image by extrapolation. For median filtering this has little effect, IMO.



      You set temp = at the very beginning of your function, then reset it when you're done using it, in preparation for the next loop. Instead, initialize it once inside the main double-loop over the pixels:



      for i in range(len(data)):
      for j in range(len(data[0])):
      temp =
      # ...


      You're looping over i and j as image indices, then over z and c or k for filter kernel indices. c and k have the same function in two different loops, I would suggest using the same variable for that. z doesn't really fit in with either c or k. I would pick two names that are related in the way that i and j are, such as m and n. The choice of variable names is always very limited if it's just one letter. Using longer names would make this code clearer: for example img_column, img_row, kernel_column, kernel_row.




      Out-of-bounds checking



      This concludes my comments on your code. Now I'd like to offer some alternatives for out-of-bounds checking. These tests are rather expensive when performed for every pixel -- it's a test that is done $n k$ times (with $n$ pixels in the image and $k$ pixels in the kernel). Maybe in Python the added cost is relatively small, it's an interpreted language after all, but for a compiled language these tests can easily amount to doubling processing time. There are 3 common alternatives that I know of. I will use border = filter_size // 2, and presume filter_size is odd. It is possible to adjust all 3 methods to even-sized filters.



      Separate loops for image border pixels



      The idea here is that the loop over the first and last border pixels along each dimension are handled separately from the loop over the core of the image. This avoids all tests. But it does require some code duplication (all in the name of speed!).



      for i in range(border):
      # here we loop over the kernel from -i to border+1
      for i in range(border, len(data)-border):
      # here we loop over the full kernel
      for i in range(len(data)-border, len(data)):
      # here we loop over the kernel from -border to len(data)-i


      Of course, within each of those loops, a similar set of 3 loops is necessary to loop over j. The filter logic is thus repeated 9 times. In a compiled language, where this is the most efficient method, code duplication can be avoided with inlined functions or macros. I don't know how a Python function call compares to a bunch of tests for out-of-bounds access, so can't comment on the usefulness of this method in Python.



      A separate code path for border pixels



      The idea here is to do out-of-bounds checking only for those pixels that are close to the image boundary. For pixels within the border, you use a version of the filtering logic with out-of-bounds checking. For the pixels in the core of the image (which is the big majority of pixels), you use a second version of the logic without out-of-bounds checking.



      for i in range(len(data)):
      i_border = i < border or i >= len(data)-border
      for j in range(len(data[0])):
      j_border = j < border or j >= len(data)-border
      if i_border or j_border:
      # filtering with bounds checking
      else:
      # filtering without bounds checking


      Padding the image



      The simplest solution, and also the most flexible one, is to create a temporary image that is larger than the input image by 2*border along each dimension, and copy the input image into it. The "new" pixels can be filled with zeros (to replicate what OP intended to do), or with values taken from the input image (for example by mirroring the image at the boundary or extrapolating in some other way).



      The filter now never needs to check for out-of-bounds reads. When the filtering kernel is placed over any of the input image pixels, all samples fall within the padded image.



      Since for this type of filtering it is necessary to create a new output image anyway (it is not possible to compute it in-place, as I mentioned before), this is not a huge cost: the original input image can now be re-used as output image.



      This solution leads to the simplest code, allows for all sorts of boundary extension methods without complicating the filtering code, and often results in the fastest code too.






      share|improve this answer























        up vote
        3
        down vote



        accepted







        up vote
        3
        down vote



        accepted






        Code review



        Peilonrays points out a mixup with the out-of-bounds testing that is valid. The statement if j ... must be within the loop for k .... One of the results is that you add a different number of elements to temp depending on which boundary you're at. But there are better ways to avoid out-of-bounds indexing, see below.



        Your biggest bug, however, is that you write the result of the filter into the image you are processing. Median filtering cannot be done in-place. When you update data[i][j], you'll be reading the updated value to compute data[i][j+1]. You need to allocate a new image, and write the result there.



        I would suggest not adding zeros for out-of-bounds pixels at all, because it introduces a bias to the output pixels near the boundary. The clearest example is for the pixels close to any of the corners. At the corner pixel, with a 3x3 kernel, you'll have 4 image pixels covered by the kernel. Adding 5 zeros for the out-of-bounds pixels guarantees that the output will be 0. For larger kernels this happens in more pixels of course. Instead, it is easy to simply remove the temp.append(0) statements, leading to a non-biased result. Other options are to read values from elsewhere in the image, for example mirroring the image at the boundary or extending the image by extrapolation. For median filtering this has little effect, IMO.



        You set temp = at the very beginning of your function, then reset it when you're done using it, in preparation for the next loop. Instead, initialize it once inside the main double-loop over the pixels:



        for i in range(len(data)):
        for j in range(len(data[0])):
        temp =
        # ...


        You're looping over i and j as image indices, then over z and c or k for filter kernel indices. c and k have the same function in two different loops, I would suggest using the same variable for that. z doesn't really fit in with either c or k. I would pick two names that are related in the way that i and j are, such as m and n. The choice of variable names is always very limited if it's just one letter. Using longer names would make this code clearer: for example img_column, img_row, kernel_column, kernel_row.




        Out-of-bounds checking



        This concludes my comments on your code. Now I'd like to offer some alternatives for out-of-bounds checking. These tests are rather expensive when performed for every pixel -- it's a test that is done $n k$ times (with $n$ pixels in the image and $k$ pixels in the kernel). Maybe in Python the added cost is relatively small, it's an interpreted language after all, but for a compiled language these tests can easily amount to doubling processing time. There are 3 common alternatives that I know of. I will use border = filter_size // 2, and presume filter_size is odd. It is possible to adjust all 3 methods to even-sized filters.



        Separate loops for image border pixels



        The idea here is that the loop over the first and last border pixels along each dimension are handled separately from the loop over the core of the image. This avoids all tests. But it does require some code duplication (all in the name of speed!).



        for i in range(border):
        # here we loop over the kernel from -i to border+1
        for i in range(border, len(data)-border):
        # here we loop over the full kernel
        for i in range(len(data)-border, len(data)):
        # here we loop over the kernel from -border to len(data)-i


        Of course, within each of those loops, a similar set of 3 loops is necessary to loop over j. The filter logic is thus repeated 9 times. In a compiled language, where this is the most efficient method, code duplication can be avoided with inlined functions or macros. I don't know how a Python function call compares to a bunch of tests for out-of-bounds access, so can't comment on the usefulness of this method in Python.



        A separate code path for border pixels



        The idea here is to do out-of-bounds checking only for those pixels that are close to the image boundary. For pixels within the border, you use a version of the filtering logic with out-of-bounds checking. For the pixels in the core of the image (which is the big majority of pixels), you use a second version of the logic without out-of-bounds checking.



        for i in range(len(data)):
        i_border = i < border or i >= len(data)-border
        for j in range(len(data[0])):
        j_border = j < border or j >= len(data)-border
        if i_border or j_border:
        # filtering with bounds checking
        else:
        # filtering without bounds checking


        Padding the image



        The simplest solution, and also the most flexible one, is to create a temporary image that is larger than the input image by 2*border along each dimension, and copy the input image into it. The "new" pixels can be filled with zeros (to replicate what OP intended to do), or with values taken from the input image (for example by mirroring the image at the boundary or extrapolating in some other way).



        The filter now never needs to check for out-of-bounds reads. When the filtering kernel is placed over any of the input image pixels, all samples fall within the padded image.



        Since for this type of filtering it is necessary to create a new output image anyway (it is not possible to compute it in-place, as I mentioned before), this is not a huge cost: the original input image can now be re-used as output image.



        This solution leads to the simplest code, allows for all sorts of boundary extension methods without complicating the filtering code, and often results in the fastest code too.






        share|improve this answer













        Code review



        Peilonrays points out a mixup with the out-of-bounds testing that is valid. The statement if j ... must be within the loop for k .... One of the results is that you add a different number of elements to temp depending on which boundary you're at. But there are better ways to avoid out-of-bounds indexing, see below.



        Your biggest bug, however, is that you write the result of the filter into the image you are processing. Median filtering cannot be done in-place. When you update data[i][j], you'll be reading the updated value to compute data[i][j+1]. You need to allocate a new image, and write the result there.



        I would suggest not adding zeros for out-of-bounds pixels at all, because it introduces a bias to the output pixels near the boundary. The clearest example is for the pixels close to any of the corners. At the corner pixel, with a 3x3 kernel, you'll have 4 image pixels covered by the kernel. Adding 5 zeros for the out-of-bounds pixels guarantees that the output will be 0. For larger kernels this happens in more pixels of course. Instead, it is easy to simply remove the temp.append(0) statements, leading to a non-biased result. Other options are to read values from elsewhere in the image, for example mirroring the image at the boundary or extending the image by extrapolation. For median filtering this has little effect, IMO.



        You set temp = at the very beginning of your function, then reset it when you're done using it, in preparation for the next loop. Instead, initialize it once inside the main double-loop over the pixels:



        for i in range(len(data)):
        for j in range(len(data[0])):
        temp =
        # ...


        You're looping over i and j as image indices, then over z and c or k for filter kernel indices. c and k have the same function in two different loops, I would suggest using the same variable for that. z doesn't really fit in with either c or k. I would pick two names that are related in the way that i and j are, such as m and n. The choice of variable names is always very limited if it's just one letter. Using longer names would make this code clearer: for example img_column, img_row, kernel_column, kernel_row.




        Out-of-bounds checking



        This concludes my comments on your code. Now I'd like to offer some alternatives for out-of-bounds checking. These tests are rather expensive when performed for every pixel -- it's a test that is done $n k$ times (with $n$ pixels in the image and $k$ pixels in the kernel). Maybe in Python the added cost is relatively small, it's an interpreted language after all, but for a compiled language these tests can easily amount to doubling processing time. There are 3 common alternatives that I know of. I will use border = filter_size // 2, and presume filter_size is odd. It is possible to adjust all 3 methods to even-sized filters.



        Separate loops for image border pixels



        The idea here is that the loop over the first and last border pixels along each dimension are handled separately from the loop over the core of the image. This avoids all tests. But it does require some code duplication (all in the name of speed!).



        for i in range(border):
        # here we loop over the kernel from -i to border+1
        for i in range(border, len(data)-border):
        # here we loop over the full kernel
        for i in range(len(data)-border, len(data)):
        # here we loop over the kernel from -border to len(data)-i


        Of course, within each of those loops, a similar set of 3 loops is necessary to loop over j. The filter logic is thus repeated 9 times. In a compiled language, where this is the most efficient method, code duplication can be avoided with inlined functions or macros. I don't know how a Python function call compares to a bunch of tests for out-of-bounds access, so can't comment on the usefulness of this method in Python.



        A separate code path for border pixels



        The idea here is to do out-of-bounds checking only for those pixels that are close to the image boundary. For pixels within the border, you use a version of the filtering logic with out-of-bounds checking. For the pixels in the core of the image (which is the big majority of pixels), you use a second version of the logic without out-of-bounds checking.



        for i in range(len(data)):
        i_border = i < border or i >= len(data)-border
        for j in range(len(data[0])):
        j_border = j < border or j >= len(data)-border
        if i_border or j_border:
        # filtering with bounds checking
        else:
        # filtering without bounds checking


        Padding the image



        The simplest solution, and also the most flexible one, is to create a temporary image that is larger than the input image by 2*border along each dimension, and copy the input image into it. The "new" pixels can be filled with zeros (to replicate what OP intended to do), or with values taken from the input image (for example by mirroring the image at the boundary or extrapolating in some other way).



        The filter now never needs to check for out-of-bounds reads. When the filtering kernel is placed over any of the input image pixels, all samples fall within the padded image.



        Since for this type of filtering it is necessary to create a new output image anyway (it is not possible to compute it in-place, as I mentioned before), this is not a huge cost: the original input image can now be re-used as output image.



        This solution leads to the simplest code, allows for all sorts of boundary extension methods without complicating the filtering code, and often results in the fastest code too.







        share|improve this answer













        share|improve this answer



        share|improve this answer











        answered Apr 14 at 6:35









        Cris Luengo

        1,877215




        1,877215






















            up vote
            3
            down vote













            You seem to have a few bugs.





            1. if i + z - indexer < 0 or i + z - indexer > len(data) - 1:



              If i and z are 0, where indexer is 1, then you'll have 0 + 0 - 1 < 0. This would mean that you'd replace the data in (-1, j), (0, j) and (1, j) to 0. Since 0 and 1 probably do contain data this is just plain wrong.





            2. if j + z - indexer < 0 or j + indexer > len(data[0]) - 1:
              temp.append(0)



              This removes some data, meaning that the median is shifted. Say you should have (0, 0, 0, 1, 2, 3), however you removed the first three because of this you'd have (0, 1, 2, 3). Now the median is 1 rather than 0.




            Your code would be simpler if you:



            1. Made a window list, that contained all the indexes that you want to move to.

            2. Have an if to check if the data in that index is out of bounds.

            3. If it's out of bounds default to 0.

            4. If it's not out of bounds use the data.

            This could become:



            def median_filter(data, filter_size):
            temp =
            indexer = filter_size // 2
            window = [
            (i, j)
            for i in range(-indexer, filter_size-indexer)
            for j in range(-indexer, filter_size-indexer)
            ]
            index = len(window) // 2
            for i in range(len(data)):
            for j in range(len(data[0])):
            data[i][j] = sorted(
            0 if (
            min(i+a, j+b) < 0
            or len(data) <= i+a
            or len(data[0]) <= j+b
            ) else data[i+a][j+b]
            for a, b in window
            )[index]
            return data





            share|improve this answer





















            • That is the whole point, replacing those values with 0. As I said adding zeros, it is very common approach. Probably I couldnt make myself clear to you in the comment section, sorry about that Thank you for your comments
              – MeteHan
              Apr 13 at 15:15











            • @M.Han Please read my answer again. You're changing too many values to black. If changing valid data to black was 'the whole point', then you'd want to just change the code to data[i][j] = 0...
              – Peilonrayz
              Apr 13 at 15:17










            • @M.Han Is (0, 0) meant to be replaced to black?
              – Peilonrayz
              Apr 13 at 15:23










            • The window list not only makes the code simpler, it also allows for arbitrary window shapes. A circular window has much nicer properties than a square one. It is easy to implement that with this code. Nice!
              – Cris Luengo
              Apr 14 at 6:36














            up vote
            3
            down vote













            You seem to have a few bugs.





            1. if i + z - indexer < 0 or i + z - indexer > len(data) - 1:



              If i and z are 0, where indexer is 1, then you'll have 0 + 0 - 1 < 0. This would mean that you'd replace the data in (-1, j), (0, j) and (1, j) to 0. Since 0 and 1 probably do contain data this is just plain wrong.





            2. if j + z - indexer < 0 or j + indexer > len(data[0]) - 1:
              temp.append(0)



              This removes some data, meaning that the median is shifted. Say you should have (0, 0, 0, 1, 2, 3), however you removed the first three because of this you'd have (0, 1, 2, 3). Now the median is 1 rather than 0.




            Your code would be simpler if you:



            1. Made a window list, that contained all the indexes that you want to move to.

            2. Have an if to check if the data in that index is out of bounds.

            3. If it's out of bounds default to 0.

            4. If it's not out of bounds use the data.

            This could become:



            def median_filter(data, filter_size):
            temp =
            indexer = filter_size // 2
            window = [
            (i, j)
            for i in range(-indexer, filter_size-indexer)
            for j in range(-indexer, filter_size-indexer)
            ]
            index = len(window) // 2
            for i in range(len(data)):
            for j in range(len(data[0])):
            data[i][j] = sorted(
            0 if (
            min(i+a, j+b) < 0
            or len(data) <= i+a
            or len(data[0]) <= j+b
            ) else data[i+a][j+b]
            for a, b in window
            )[index]
            return data





            share|improve this answer





















            • That is the whole point, replacing those values with 0. As I said adding zeros, it is very common approach. Probably I couldnt make myself clear to you in the comment section, sorry about that Thank you for your comments
              – MeteHan
              Apr 13 at 15:15











            • @M.Han Please read my answer again. You're changing too many values to black. If changing valid data to black was 'the whole point', then you'd want to just change the code to data[i][j] = 0...
              – Peilonrayz
              Apr 13 at 15:17










            • @M.Han Is (0, 0) meant to be replaced to black?
              – Peilonrayz
              Apr 13 at 15:23










            • The window list not only makes the code simpler, it also allows for arbitrary window shapes. A circular window has much nicer properties than a square one. It is easy to implement that with this code. Nice!
              – Cris Luengo
              Apr 14 at 6:36












            up vote
            3
            down vote










            up vote
            3
            down vote









            You seem to have a few bugs.





            1. if i + z - indexer < 0 or i + z - indexer > len(data) - 1:



              If i and z are 0, where indexer is 1, then you'll have 0 + 0 - 1 < 0. This would mean that you'd replace the data in (-1, j), (0, j) and (1, j) to 0. Since 0 and 1 probably do contain data this is just plain wrong.





            2. if j + z - indexer < 0 or j + indexer > len(data[0]) - 1:
              temp.append(0)



              This removes some data, meaning that the median is shifted. Say you should have (0, 0, 0, 1, 2, 3), however you removed the first three because of this you'd have (0, 1, 2, 3). Now the median is 1 rather than 0.




            Your code would be simpler if you:



            1. Made a window list, that contained all the indexes that you want to move to.

            2. Have an if to check if the data in that index is out of bounds.

            3. If it's out of bounds default to 0.

            4. If it's not out of bounds use the data.

            This could become:



            def median_filter(data, filter_size):
            temp =
            indexer = filter_size // 2
            window = [
            (i, j)
            for i in range(-indexer, filter_size-indexer)
            for j in range(-indexer, filter_size-indexer)
            ]
            index = len(window) // 2
            for i in range(len(data)):
            for j in range(len(data[0])):
            data[i][j] = sorted(
            0 if (
            min(i+a, j+b) < 0
            or len(data) <= i+a
            or len(data[0]) <= j+b
            ) else data[i+a][j+b]
            for a, b in window
            )[index]
            return data





            share|improve this answer













            You seem to have a few bugs.





            1. if i + z - indexer < 0 or i + z - indexer > len(data) - 1:



              If i and z are 0, where indexer is 1, then you'll have 0 + 0 - 1 < 0. This would mean that you'd replace the data in (-1, j), (0, j) and (1, j) to 0. Since 0 and 1 probably do contain data this is just plain wrong.





            2. if j + z - indexer < 0 or j + indexer > len(data[0]) - 1:
              temp.append(0)



              This removes some data, meaning that the median is shifted. Say you should have (0, 0, 0, 1, 2, 3), however you removed the first three because of this you'd have (0, 1, 2, 3). Now the median is 1 rather than 0.




            Your code would be simpler if you:



            1. Made a window list, that contained all the indexes that you want to move to.

            2. Have an if to check if the data in that index is out of bounds.

            3. If it's out of bounds default to 0.

            4. If it's not out of bounds use the data.

            This could become:



            def median_filter(data, filter_size):
            temp =
            indexer = filter_size // 2
            window = [
            (i, j)
            for i in range(-indexer, filter_size-indexer)
            for j in range(-indexer, filter_size-indexer)
            ]
            index = len(window) // 2
            for i in range(len(data)):
            for j in range(len(data[0])):
            data[i][j] = sorted(
            0 if (
            min(i+a, j+b) < 0
            or len(data) <= i+a
            or len(data[0]) <= j+b
            ) else data[i+a][j+b]
            for a, b in window
            )[index]
            return data






            share|improve this answer













            share|improve this answer



            share|improve this answer











            answered Apr 13 at 15:11









            Peilonrayz

            24.3k336102




            24.3k336102











            • That is the whole point, replacing those values with 0. As I said adding zeros, it is very common approach. Probably I couldnt make myself clear to you in the comment section, sorry about that Thank you for your comments
              – MeteHan
              Apr 13 at 15:15











            • @M.Han Please read my answer again. You're changing too many values to black. If changing valid data to black was 'the whole point', then you'd want to just change the code to data[i][j] = 0...
              – Peilonrayz
              Apr 13 at 15:17










            • @M.Han Is (0, 0) meant to be replaced to black?
              – Peilonrayz
              Apr 13 at 15:23










            • The window list not only makes the code simpler, it also allows for arbitrary window shapes. A circular window has much nicer properties than a square one. It is easy to implement that with this code. Nice!
              – Cris Luengo
              Apr 14 at 6:36
















            • That is the whole point, replacing those values with 0. As I said adding zeros, it is very common approach. Probably I couldnt make myself clear to you in the comment section, sorry about that Thank you for your comments
              – MeteHan
              Apr 13 at 15:15











            • @M.Han Please read my answer again. You're changing too many values to black. If changing valid data to black was 'the whole point', then you'd want to just change the code to data[i][j] = 0...
              – Peilonrayz
              Apr 13 at 15:17










            • @M.Han Is (0, 0) meant to be replaced to black?
              – Peilonrayz
              Apr 13 at 15:23










            • The window list not only makes the code simpler, it also allows for arbitrary window shapes. A circular window has much nicer properties than a square one. It is easy to implement that with this code. Nice!
              – Cris Luengo
              Apr 14 at 6:36















            That is the whole point, replacing those values with 0. As I said adding zeros, it is very common approach. Probably I couldnt make myself clear to you in the comment section, sorry about that Thank you for your comments
            – MeteHan
            Apr 13 at 15:15





            That is the whole point, replacing those values with 0. As I said adding zeros, it is very common approach. Probably I couldnt make myself clear to you in the comment section, sorry about that Thank you for your comments
            – MeteHan
            Apr 13 at 15:15













            @M.Han Please read my answer again. You're changing too many values to black. If changing valid data to black was 'the whole point', then you'd want to just change the code to data[i][j] = 0...
            – Peilonrayz
            Apr 13 at 15:17




            @M.Han Please read my answer again. You're changing too many values to black. If changing valid data to black was 'the whole point', then you'd want to just change the code to data[i][j] = 0...
            – Peilonrayz
            Apr 13 at 15:17












            @M.Han Is (0, 0) meant to be replaced to black?
            – Peilonrayz
            Apr 13 at 15:23




            @M.Han Is (0, 0) meant to be replaced to black?
            – Peilonrayz
            Apr 13 at 15:23












            The window list not only makes the code simpler, it also allows for arbitrary window shapes. A circular window has much nicer properties than a square one. It is easy to implement that with this code. Nice!
            – Cris Luengo
            Apr 14 at 6:36




            The window list not only makes the code simpler, it also allows for arbitrary window shapes. A circular window has much nicer properties than a square one. It is easy to implement that with this code. Nice!
            – Cris Luengo
            Apr 14 at 6:36












             

            draft saved


            draft discarded


























             


            draft saved


            draft discarded














            StackExchange.ready(
            function ()
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f191974%2fmedian-filter-implementation-in-python%23new-answer', 'question_page');

            );

            Post as a guest













































































            Popular posts from this blog

            Chat program with C++ and SFML

            Function to Return a JSON Like Objects Using VBA Collections and Arrays

            Will my employers contract hold up in court?