Finding the closest pair of points divide-and-conquer speed improvement
Clash Royale CLAN TAG#URR8PPP
.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty margin-bottom:0;
up vote
2
down vote
favorite
This is a follow up to a previously asked Finding the closest pair of points divide-and-conquer question . The original code changed significantly based on answers from @AJNeufeld therefore I created a new question per codereview guidelines for additional help.
After incorporating suggestions from answers to the original question linked above, the run-time improved by 30% which is significant. Nevertheless, the code is still slow and I think there is room for further improvements. My hunch is that find_min_distance_in_rec
is slowing the code down. Below is the current code. Again, the code has been stress tested so I am confident that it's correct, however, it's slow.
#Uses python3
import math
import statistics as stats
# helper functions:
def two_point_distance(p0,p1):
# returns distance between two (x,y) pairs
return math.sqrt( ((p0[0]-p1[0])*(p0[0]-p1[0])) +
((p0[1] - p1[1])*(p0[1] - p1[1])) )
def combine_xy(x_arr,y_arr):
# combine x_arr and y_arr to combined list of (x,y) tuples
return list(zip(x_arr,y_arr))
def find_closest_distance_brute(xy_arr):
# brute force approach to find closest distance
dmin = math.inf
for i, pnt_i in enumerate(xy_arr[:-1]):
dis_storage_min = min( two_point_distance(pnt_i, pnt_j) for pnt_j in xy_arr[i+1:])
if dis_storage_min < dmin:
dmin = dis_storage_min
return dmin
def calc_median_x(xy_arr):
# return median of x values in list of (x,y) points
return stats.median( val[0] for val in xy_arr )
def filter_set(xy_arr_y_sorted, median, distance):
# filter initial set such than |x-median|<= distance
return [ val for val in xy_arr_y_sorted if abs(val[0] - median) <= distance ]
def x_sort(xy_arr):
# sort array according to x value
return sorted(xy_arr, key=lambda val: val[0])
def y_sort(xy_arr):
# sort array according to y value
return sorted(xy_arr, key=lambda val: val[1])
def split_array(arr_x_sorted, arr_y_sorted,median):
# split array of size n to two arrays of n/2
# input is the same array twice, one sorted wrt x, the other wrt y
leq_arr_x_sorted = [ val for val in arr_x_sorted if val[0] < median ]
geq_arr_x_sorted = [ val for val in arr_x_sorted if val[0] > median ]
eq_arr_x = [ val for val in arr_x_sorted if val[0] == median ]
n = len(eq_arr_x)//2
leq_arr_x_sorted = leq_arr_x_sorted + eq_arr_x[:n]
geq_arr_x_sorted = eq_arr_x[n:] + geq_arr_x_sorted
leq_arr_y_sorted = [ val for val in arr_y_sorted if val[0] < median ]
geq_arr_y_sorted = [ val for val in arr_y_sorted if val[0] > median ]
eq_arr_y = [ val for val in arr_y_sorted if val[0] == median ]
n = len(eq_arr_y)//2
leq_arr_y_sorted = leq_arr_y_sorted + eq_arr_y[:n]
geq_arr_y_sorted = eq_arr_y[n:] + geq_arr_y_sorted
return leq_arr_x_sorted, leq_arr_y_sorted, geq_arr_x_sorted, geq_arr_y_sorted
def find_min_distance_in_rec(xy_arr_y_sorted,dmin):
# takes in array sorted in y, and minimum distance of n/2 halves
# for each point it computes distance to 7 subsequent points
# output min distance encountered
dmin_rec = dmin
if len(xy_arr_y_sorted) == 1:
return math.inf
if len(xy_arr_y_sorted) > 7:
for i, pnt_i in enumerate(xy_arr_y_sorted[:-7]):
dis_storage_min = min(two_point_distance(pnt_i, pnt_j)
for pnt_j in xy_arr_y_sorted[i+1:i+1+7])
if dis_storage_min < dmin_rec:
dmin_rec = dis_storage_min
dis_storage_min = find_closest_distance_brute(xy_arr_y_sorted[-7:])
if dis_storage_min < dmin_rec:
dmin_rec = dis_storage_min
else:
for k, pnt_k in enumerate(xy_arr_y_sorted[:-1]):
dis_storage_min = min( two_point_distance(pnt_k, pnt_l)
for pnt_l in xy_arr_y_sorted[k+1:])
if dis_storage_min < dmin_rec:
dmin_rec = dis_storage_min
return dmin_rec
def find_closest_distance_recur(xy_arr_x_sorted, xy_arr_y_sorted):
# recursive function to find closest distance between points
if len(xy_arr_x_sorted) <=3 :
return find_closest_distance_brute(xy_arr_x_sorted)
median = calc_median_x(xy_arr_x_sorted)
leq_arr_x_sorted, leq_arr_y_sorted , grt_arr_x_sorted, grt_arr_y_sorted = split_array(xy_arr_x_sorted, xy_arr_y_sorted, median)
distance_left = find_closest_distance_recur(leq_arr_x_sorted, leq_arr_y_sorted)
distance_right = find_closest_distance_recur(grt_arr_x_sorted, grt_arr_y_sorted)
distance_min = min(distance_left, distance_right)
filt_out = filter_set(xy_arr_y_sorted, median, distance_min)
distance_filt = find_min_distance_in_rec(filt_out, distance_min)
return min(distance_min, distance_filt)
def find_closest_point(x_arr, y_arr):
# input is x,y points in two arrays, all x's in x_arr, all y's in y_arr
xy_arr = combine_xy(x_arr,y_arr)
xy_arr_x_sorted = x_sort(xy_arr)
xy_arr_y_sored = y_sort(xy_arr)
min_distance = find_closest_distance_recur(xy_arr_x_sorted, xy_arr_y_sored)
return min_distance
python recursion computational-geometry divide-and-conquer
add a comment |Â
up vote
2
down vote
favorite
This is a follow up to a previously asked Finding the closest pair of points divide-and-conquer question . The original code changed significantly based on answers from @AJNeufeld therefore I created a new question per codereview guidelines for additional help.
After incorporating suggestions from answers to the original question linked above, the run-time improved by 30% which is significant. Nevertheless, the code is still slow and I think there is room for further improvements. My hunch is that find_min_distance_in_rec
is slowing the code down. Below is the current code. Again, the code has been stress tested so I am confident that it's correct, however, it's slow.
#Uses python3
import math
import statistics as stats
# helper functions:
def two_point_distance(p0,p1):
# returns distance between two (x,y) pairs
return math.sqrt( ((p0[0]-p1[0])*(p0[0]-p1[0])) +
((p0[1] - p1[1])*(p0[1] - p1[1])) )
def combine_xy(x_arr,y_arr):
# combine x_arr and y_arr to combined list of (x,y) tuples
return list(zip(x_arr,y_arr))
def find_closest_distance_brute(xy_arr):
# brute force approach to find closest distance
dmin = math.inf
for i, pnt_i in enumerate(xy_arr[:-1]):
dis_storage_min = min( two_point_distance(pnt_i, pnt_j) for pnt_j in xy_arr[i+1:])
if dis_storage_min < dmin:
dmin = dis_storage_min
return dmin
def calc_median_x(xy_arr):
# return median of x values in list of (x,y) points
return stats.median( val[0] for val in xy_arr )
def filter_set(xy_arr_y_sorted, median, distance):
# filter initial set such than |x-median|<= distance
return [ val for val in xy_arr_y_sorted if abs(val[0] - median) <= distance ]
def x_sort(xy_arr):
# sort array according to x value
return sorted(xy_arr, key=lambda val: val[0])
def y_sort(xy_arr):
# sort array according to y value
return sorted(xy_arr, key=lambda val: val[1])
def split_array(arr_x_sorted, arr_y_sorted,median):
# split array of size n to two arrays of n/2
# input is the same array twice, one sorted wrt x, the other wrt y
leq_arr_x_sorted = [ val for val in arr_x_sorted if val[0] < median ]
geq_arr_x_sorted = [ val for val in arr_x_sorted if val[0] > median ]
eq_arr_x = [ val for val in arr_x_sorted if val[0] == median ]
n = len(eq_arr_x)//2
leq_arr_x_sorted = leq_arr_x_sorted + eq_arr_x[:n]
geq_arr_x_sorted = eq_arr_x[n:] + geq_arr_x_sorted
leq_arr_y_sorted = [ val for val in arr_y_sorted if val[0] < median ]
geq_arr_y_sorted = [ val for val in arr_y_sorted if val[0] > median ]
eq_arr_y = [ val for val in arr_y_sorted if val[0] == median ]
n = len(eq_arr_y)//2
leq_arr_y_sorted = leq_arr_y_sorted + eq_arr_y[:n]
geq_arr_y_sorted = eq_arr_y[n:] + geq_arr_y_sorted
return leq_arr_x_sorted, leq_arr_y_sorted, geq_arr_x_sorted, geq_arr_y_sorted
def find_min_distance_in_rec(xy_arr_y_sorted,dmin):
# takes in array sorted in y, and minimum distance of n/2 halves
# for each point it computes distance to 7 subsequent points
# output min distance encountered
dmin_rec = dmin
if len(xy_arr_y_sorted) == 1:
return math.inf
if len(xy_arr_y_sorted) > 7:
for i, pnt_i in enumerate(xy_arr_y_sorted[:-7]):
dis_storage_min = min(two_point_distance(pnt_i, pnt_j)
for pnt_j in xy_arr_y_sorted[i+1:i+1+7])
if dis_storage_min < dmin_rec:
dmin_rec = dis_storage_min
dis_storage_min = find_closest_distance_brute(xy_arr_y_sorted[-7:])
if dis_storage_min < dmin_rec:
dmin_rec = dis_storage_min
else:
for k, pnt_k in enumerate(xy_arr_y_sorted[:-1]):
dis_storage_min = min( two_point_distance(pnt_k, pnt_l)
for pnt_l in xy_arr_y_sorted[k+1:])
if dis_storage_min < dmin_rec:
dmin_rec = dis_storage_min
return dmin_rec
def find_closest_distance_recur(xy_arr_x_sorted, xy_arr_y_sorted):
# recursive function to find closest distance between points
if len(xy_arr_x_sorted) <=3 :
return find_closest_distance_brute(xy_arr_x_sorted)
median = calc_median_x(xy_arr_x_sorted)
leq_arr_x_sorted, leq_arr_y_sorted , grt_arr_x_sorted, grt_arr_y_sorted = split_array(xy_arr_x_sorted, xy_arr_y_sorted, median)
distance_left = find_closest_distance_recur(leq_arr_x_sorted, leq_arr_y_sorted)
distance_right = find_closest_distance_recur(grt_arr_x_sorted, grt_arr_y_sorted)
distance_min = min(distance_left, distance_right)
filt_out = filter_set(xy_arr_y_sorted, median, distance_min)
distance_filt = find_min_distance_in_rec(filt_out, distance_min)
return min(distance_min, distance_filt)
def find_closest_point(x_arr, y_arr):
# input is x,y points in two arrays, all x's in x_arr, all y's in y_arr
xy_arr = combine_xy(x_arr,y_arr)
xy_arr_x_sorted = x_sort(xy_arr)
xy_arr_y_sored = y_sort(xy_arr)
min_distance = find_closest_distance_recur(xy_arr_x_sorted, xy_arr_y_sored)
return min_distance
python recursion computational-geometry divide-and-conquer
add a comment |Â
up vote
2
down vote
favorite
up vote
2
down vote
favorite
This is a follow up to a previously asked Finding the closest pair of points divide-and-conquer question . The original code changed significantly based on answers from @AJNeufeld therefore I created a new question per codereview guidelines for additional help.
After incorporating suggestions from answers to the original question linked above, the run-time improved by 30% which is significant. Nevertheless, the code is still slow and I think there is room for further improvements. My hunch is that find_min_distance_in_rec
is slowing the code down. Below is the current code. Again, the code has been stress tested so I am confident that it's correct, however, it's slow.
#Uses python3
import math
import statistics as stats
# helper functions:
def two_point_distance(p0,p1):
# returns distance between two (x,y) pairs
return math.sqrt( ((p0[0]-p1[0])*(p0[0]-p1[0])) +
((p0[1] - p1[1])*(p0[1] - p1[1])) )
def combine_xy(x_arr,y_arr):
# combine x_arr and y_arr to combined list of (x,y) tuples
return list(zip(x_arr,y_arr))
def find_closest_distance_brute(xy_arr):
# brute force approach to find closest distance
dmin = math.inf
for i, pnt_i in enumerate(xy_arr[:-1]):
dis_storage_min = min( two_point_distance(pnt_i, pnt_j) for pnt_j in xy_arr[i+1:])
if dis_storage_min < dmin:
dmin = dis_storage_min
return dmin
def calc_median_x(xy_arr):
# return median of x values in list of (x,y) points
return stats.median( val[0] for val in xy_arr )
def filter_set(xy_arr_y_sorted, median, distance):
# filter initial set such than |x-median|<= distance
return [ val for val in xy_arr_y_sorted if abs(val[0] - median) <= distance ]
def x_sort(xy_arr):
# sort array according to x value
return sorted(xy_arr, key=lambda val: val[0])
def y_sort(xy_arr):
# sort array according to y value
return sorted(xy_arr, key=lambda val: val[1])
def split_array(arr_x_sorted, arr_y_sorted,median):
# split array of size n to two arrays of n/2
# input is the same array twice, one sorted wrt x, the other wrt y
leq_arr_x_sorted = [ val for val in arr_x_sorted if val[0] < median ]
geq_arr_x_sorted = [ val for val in arr_x_sorted if val[0] > median ]
eq_arr_x = [ val for val in arr_x_sorted if val[0] == median ]
n = len(eq_arr_x)//2
leq_arr_x_sorted = leq_arr_x_sorted + eq_arr_x[:n]
geq_arr_x_sorted = eq_arr_x[n:] + geq_arr_x_sorted
leq_arr_y_sorted = [ val for val in arr_y_sorted if val[0] < median ]
geq_arr_y_sorted = [ val for val in arr_y_sorted if val[0] > median ]
eq_arr_y = [ val for val in arr_y_sorted if val[0] == median ]
n = len(eq_arr_y)//2
leq_arr_y_sorted = leq_arr_y_sorted + eq_arr_y[:n]
geq_arr_y_sorted = eq_arr_y[n:] + geq_arr_y_sorted
return leq_arr_x_sorted, leq_arr_y_sorted, geq_arr_x_sorted, geq_arr_y_sorted
def find_min_distance_in_rec(xy_arr_y_sorted,dmin):
# takes in array sorted in y, and minimum distance of n/2 halves
# for each point it computes distance to 7 subsequent points
# output min distance encountered
dmin_rec = dmin
if len(xy_arr_y_sorted) == 1:
return math.inf
if len(xy_arr_y_sorted) > 7:
for i, pnt_i in enumerate(xy_arr_y_sorted[:-7]):
dis_storage_min = min(two_point_distance(pnt_i, pnt_j)
for pnt_j in xy_arr_y_sorted[i+1:i+1+7])
if dis_storage_min < dmin_rec:
dmin_rec = dis_storage_min
dis_storage_min = find_closest_distance_brute(xy_arr_y_sorted[-7:])
if dis_storage_min < dmin_rec:
dmin_rec = dis_storage_min
else:
for k, pnt_k in enumerate(xy_arr_y_sorted[:-1]):
dis_storage_min = min( two_point_distance(pnt_k, pnt_l)
for pnt_l in xy_arr_y_sorted[k+1:])
if dis_storage_min < dmin_rec:
dmin_rec = dis_storage_min
return dmin_rec
def find_closest_distance_recur(xy_arr_x_sorted, xy_arr_y_sorted):
# recursive function to find closest distance between points
if len(xy_arr_x_sorted) <=3 :
return find_closest_distance_brute(xy_arr_x_sorted)
median = calc_median_x(xy_arr_x_sorted)
leq_arr_x_sorted, leq_arr_y_sorted , grt_arr_x_sorted, grt_arr_y_sorted = split_array(xy_arr_x_sorted, xy_arr_y_sorted, median)
distance_left = find_closest_distance_recur(leq_arr_x_sorted, leq_arr_y_sorted)
distance_right = find_closest_distance_recur(grt_arr_x_sorted, grt_arr_y_sorted)
distance_min = min(distance_left, distance_right)
filt_out = filter_set(xy_arr_y_sorted, median, distance_min)
distance_filt = find_min_distance_in_rec(filt_out, distance_min)
return min(distance_min, distance_filt)
def find_closest_point(x_arr, y_arr):
# input is x,y points in two arrays, all x's in x_arr, all y's in y_arr
xy_arr = combine_xy(x_arr,y_arr)
xy_arr_x_sorted = x_sort(xy_arr)
xy_arr_y_sored = y_sort(xy_arr)
min_distance = find_closest_distance_recur(xy_arr_x_sorted, xy_arr_y_sored)
return min_distance
python recursion computational-geometry divide-and-conquer
This is a follow up to a previously asked Finding the closest pair of points divide-and-conquer question . The original code changed significantly based on answers from @AJNeufeld therefore I created a new question per codereview guidelines for additional help.
After incorporating suggestions from answers to the original question linked above, the run-time improved by 30% which is significant. Nevertheless, the code is still slow and I think there is room for further improvements. My hunch is that find_min_distance_in_rec
is slowing the code down. Below is the current code. Again, the code has been stress tested so I am confident that it's correct, however, it's slow.
#Uses python3
import math
import statistics as stats
# helper functions:
def two_point_distance(p0,p1):
# returns distance between two (x,y) pairs
return math.sqrt( ((p0[0]-p1[0])*(p0[0]-p1[0])) +
((p0[1] - p1[1])*(p0[1] - p1[1])) )
def combine_xy(x_arr,y_arr):
# combine x_arr and y_arr to combined list of (x,y) tuples
return list(zip(x_arr,y_arr))
def find_closest_distance_brute(xy_arr):
# brute force approach to find closest distance
dmin = math.inf
for i, pnt_i in enumerate(xy_arr[:-1]):
dis_storage_min = min( two_point_distance(pnt_i, pnt_j) for pnt_j in xy_arr[i+1:])
if dis_storage_min < dmin:
dmin = dis_storage_min
return dmin
def calc_median_x(xy_arr):
# return median of x values in list of (x,y) points
return stats.median( val[0] for val in xy_arr )
def filter_set(xy_arr_y_sorted, median, distance):
# filter initial set such than |x-median|<= distance
return [ val for val in xy_arr_y_sorted if abs(val[0] - median) <= distance ]
def x_sort(xy_arr):
# sort array according to x value
return sorted(xy_arr, key=lambda val: val[0])
def y_sort(xy_arr):
# sort array according to y value
return sorted(xy_arr, key=lambda val: val[1])
def split_array(arr_x_sorted, arr_y_sorted,median):
# split array of size n to two arrays of n/2
# input is the same array twice, one sorted wrt x, the other wrt y
leq_arr_x_sorted = [ val for val in arr_x_sorted if val[0] < median ]
geq_arr_x_sorted = [ val for val in arr_x_sorted if val[0] > median ]
eq_arr_x = [ val for val in arr_x_sorted if val[0] == median ]
n = len(eq_arr_x)//2
leq_arr_x_sorted = leq_arr_x_sorted + eq_arr_x[:n]
geq_arr_x_sorted = eq_arr_x[n:] + geq_arr_x_sorted
leq_arr_y_sorted = [ val for val in arr_y_sorted if val[0] < median ]
geq_arr_y_sorted = [ val for val in arr_y_sorted if val[0] > median ]
eq_arr_y = [ val for val in arr_y_sorted if val[0] == median ]
n = len(eq_arr_y)//2
leq_arr_y_sorted = leq_arr_y_sorted + eq_arr_y[:n]
geq_arr_y_sorted = eq_arr_y[n:] + geq_arr_y_sorted
return leq_arr_x_sorted, leq_arr_y_sorted, geq_arr_x_sorted, geq_arr_y_sorted
def find_min_distance_in_rec(xy_arr_y_sorted,dmin):
# takes in array sorted in y, and minimum distance of n/2 halves
# for each point it computes distance to 7 subsequent points
# output min distance encountered
dmin_rec = dmin
if len(xy_arr_y_sorted) == 1:
return math.inf
if len(xy_arr_y_sorted) > 7:
for i, pnt_i in enumerate(xy_arr_y_sorted[:-7]):
dis_storage_min = min(two_point_distance(pnt_i, pnt_j)
for pnt_j in xy_arr_y_sorted[i+1:i+1+7])
if dis_storage_min < dmin_rec:
dmin_rec = dis_storage_min
dis_storage_min = find_closest_distance_brute(xy_arr_y_sorted[-7:])
if dis_storage_min < dmin_rec:
dmin_rec = dis_storage_min
else:
for k, pnt_k in enumerate(xy_arr_y_sorted[:-1]):
dis_storage_min = min( two_point_distance(pnt_k, pnt_l)
for pnt_l in xy_arr_y_sorted[k+1:])
if dis_storage_min < dmin_rec:
dmin_rec = dis_storage_min
return dmin_rec
def find_closest_distance_recur(xy_arr_x_sorted, xy_arr_y_sorted):
# recursive function to find closest distance between points
if len(xy_arr_x_sorted) <=3 :
return find_closest_distance_brute(xy_arr_x_sorted)
median = calc_median_x(xy_arr_x_sorted)
leq_arr_x_sorted, leq_arr_y_sorted , grt_arr_x_sorted, grt_arr_y_sorted = split_array(xy_arr_x_sorted, xy_arr_y_sorted, median)
distance_left = find_closest_distance_recur(leq_arr_x_sorted, leq_arr_y_sorted)
distance_right = find_closest_distance_recur(grt_arr_x_sorted, grt_arr_y_sorted)
distance_min = min(distance_left, distance_right)
filt_out = filter_set(xy_arr_y_sorted, median, distance_min)
distance_filt = find_min_distance_in_rec(filt_out, distance_min)
return min(distance_min, distance_filt)
def find_closest_point(x_arr, y_arr):
# input is x,y points in two arrays, all x's in x_arr, all y's in y_arr
xy_arr = combine_xy(x_arr,y_arr)
xy_arr_x_sorted = x_sort(xy_arr)
xy_arr_y_sored = y_sort(xy_arr)
min_distance = find_closest_distance_recur(xy_arr_x_sorted, xy_arr_y_sored)
return min_distance
python recursion computational-geometry divide-and-conquer
edited Jul 18 at 19:59
asked Jul 17 at 17:49
DRmo
585
585
add a comment |Â
add a comment |Â
2 Answers
2
active
oldest
votes
up vote
2
down vote
The original code changed significantly based on answers
I think that the plural is inappropriate there: the changes seem to be solely based on AJNeufeld's answer. Not a single one of the issues I pointed out has been addressed, not even the bug:
- My understanding is that the end result should be four arrays such that
leq_arr_x_sorted
andleq_arr_y_sorted
contain the same points
sorted on different projections. But thedmy_x
/dmy_y
separation
of points whosex
coordinate ismedian
doesn't seem to guarantee
that those points will be sent to the same half in thex
-projection
and they
-projection. In order to guarantee that an additional
pre-condition is needed: most obviously that inarr_x_sorted
ties
are broken byy
; but that pre-condition is not produced byx_sort
.
This bug can be demonstrated easily by adding
print(leq_arr_x_sorted, leq_arr_y_sorted, geq_arr_x_sorted, geq_arr_y_sorted)
before the return
of split_array
and calling with the simple test case find_closest_point([1, 2, 2, 3], [4, 3, 2, 1])
.
I edited the question to give credit to AJNeufeld, thanks for the suggestion. Regarding your points for improvement, I commented on your answer under original question
â DRmo
Jul 18 at 20:32
add a comment |Â
up vote
2
down vote
I guess I didn't make this clear in my answer on the original question.
The minimum of the square-root of the square distances is the square-root of the minimum of the square distances. You can save time by calculating the square-root only when needed.
Replace this (and all calls to it):
def two_point_distance(p0,p1):
# returns distance between two (x,y) pairs
return math.sqrt( ((p0[0]-p1[0])*(p0[0]-p1[0])) +
((p0[1] - p1[1])*(p0[1] - p1[1])) )
with this:
def square_distance(p0, p1):
dx = p0[0] - p1[0]
dy = p0[1] - p1[1]
return dx * dx + dy * dy
For example, find_closest_distance_brute()
becomes:
def find_closest_distance_brute(xy_arr):
# brute force approach to find closest distance
dist_sqr_min = math.inf
for i, pnt_i in enumerate(xy_arr[:-1]):
d_sqr_min = min( square_distance(pnt_i, pnt_j) for pnt_j in xy_arr[i+1:])
if d_sqr_min < dist_sqr_min:
dist_sqr_min = d_sqr_min
return math.sqrt(dist_sqr_min) # Only calculate square-root of final value
Or more pythonically:
def find_closest_distance_brute(xy_arr):
# brute force approach to find closest distance
dist_sqr_min = min((square_distance(pnt_i, pnt_j) for i, pnt_i in enumerate(xy_arr[:-1])
for pnt_j in xy_arr[i+1:]),
default=math.inf)
return math.sqrt(dist_sqr_min) # Only calculate square-root of final value
Regarding find_min_distance_in_rect()
:
def find_min_distance_in_rec(xy_arr_y_sorted,dmin):
dmin_rec = dmin
if len(xy_arr_y_sorted) == 1:
return math.inf
What happens if len(xy_arr_y_sorted) == 0
? A better test would be <= 1
.
if len(xy_arr_y_sorted) > 7:
# ... complicated code involving lots of 7's.
else:
for k, pnt_k in enumerate(xy_arr_y_sorted[:-1]):
dis_storage_min = min( two_point_distance(pnt_k, pnt_l)
for pnt_l in xy_arr_y_sorted[k+1:])
if dis_storage_min < dmin_rec:
dmin_rec = dis_storage_min
This else:
code looks exactly like find_closest_distance_brute()
code. In fact, you can replace it with a call to that function.
if len(xy_arr_y_sorted) > 7:
# ... complicated code involving lots of 7's.
else:
dmin_rec = find_closest_distance_brute(xy_arr_y_sorted)
I don't like # ... complicated code involving lots of 7's.
. I'll rework it in a later edit.
See Peter Taylor's answer ... especially the part about median = calc_median_x(xy_arr_x_sorted)
, for an O(n) to O(1) improvement.
As pointed out a second time by Peter Taylor, your code has a bug; xy_arr_x_sorted
and xy_arr_y_sorted
are supposed to have the same content, but due to the bug, may not. He has suggested a change to fix the bug. I have a different approach: eliminate xy_arr_x_sorted
entirely.
In find_closest_distance_recur()
, you have:
if len(xy_arr_x_sorted) <=3 :
return find_closest_distance_brute(xy_arr_x_sorted)
Since the arrays are supposed to contain the same content, you could replace this with:
if len(xy_arr_y_sorted) <= 3:
return find_closest_distance_brute(xy_arr_y_sorted)
Once you do that, you no longer have any useful references to the [1]
members of the contents of xy_arr_x_sorted
. Instead, you could replace it with a sorted list of just the x-coordinates.
Once that is done, the partitioning code should be revisited. There is no difference between duplicate median values, so a simple slice operation can divide the x-coordinate list in two with zero copying required.
add a comment |Â
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
2
down vote
The original code changed significantly based on answers
I think that the plural is inappropriate there: the changes seem to be solely based on AJNeufeld's answer. Not a single one of the issues I pointed out has been addressed, not even the bug:
- My understanding is that the end result should be four arrays such that
leq_arr_x_sorted
andleq_arr_y_sorted
contain the same points
sorted on different projections. But thedmy_x
/dmy_y
separation
of points whosex
coordinate ismedian
doesn't seem to guarantee
that those points will be sent to the same half in thex
-projection
and they
-projection. In order to guarantee that an additional
pre-condition is needed: most obviously that inarr_x_sorted
ties
are broken byy
; but that pre-condition is not produced byx_sort
.
This bug can be demonstrated easily by adding
print(leq_arr_x_sorted, leq_arr_y_sorted, geq_arr_x_sorted, geq_arr_y_sorted)
before the return
of split_array
and calling with the simple test case find_closest_point([1, 2, 2, 3], [4, 3, 2, 1])
.
I edited the question to give credit to AJNeufeld, thanks for the suggestion. Regarding your points for improvement, I commented on your answer under original question
â DRmo
Jul 18 at 20:32
add a comment |Â
up vote
2
down vote
The original code changed significantly based on answers
I think that the plural is inappropriate there: the changes seem to be solely based on AJNeufeld's answer. Not a single one of the issues I pointed out has been addressed, not even the bug:
- My understanding is that the end result should be four arrays such that
leq_arr_x_sorted
andleq_arr_y_sorted
contain the same points
sorted on different projections. But thedmy_x
/dmy_y
separation
of points whosex
coordinate ismedian
doesn't seem to guarantee
that those points will be sent to the same half in thex
-projection
and they
-projection. In order to guarantee that an additional
pre-condition is needed: most obviously that inarr_x_sorted
ties
are broken byy
; but that pre-condition is not produced byx_sort
.
This bug can be demonstrated easily by adding
print(leq_arr_x_sorted, leq_arr_y_sorted, geq_arr_x_sorted, geq_arr_y_sorted)
before the return
of split_array
and calling with the simple test case find_closest_point([1, 2, 2, 3], [4, 3, 2, 1])
.
I edited the question to give credit to AJNeufeld, thanks for the suggestion. Regarding your points for improvement, I commented on your answer under original question
â DRmo
Jul 18 at 20:32
add a comment |Â
up vote
2
down vote
up vote
2
down vote
The original code changed significantly based on answers
I think that the plural is inappropriate there: the changes seem to be solely based on AJNeufeld's answer. Not a single one of the issues I pointed out has been addressed, not even the bug:
- My understanding is that the end result should be four arrays such that
leq_arr_x_sorted
andleq_arr_y_sorted
contain the same points
sorted on different projections. But thedmy_x
/dmy_y
separation
of points whosex
coordinate ismedian
doesn't seem to guarantee
that those points will be sent to the same half in thex
-projection
and they
-projection. In order to guarantee that an additional
pre-condition is needed: most obviously that inarr_x_sorted
ties
are broken byy
; but that pre-condition is not produced byx_sort
.
This bug can be demonstrated easily by adding
print(leq_arr_x_sorted, leq_arr_y_sorted, geq_arr_x_sorted, geq_arr_y_sorted)
before the return
of split_array
and calling with the simple test case find_closest_point([1, 2, 2, 3], [4, 3, 2, 1])
.
The original code changed significantly based on answers
I think that the plural is inappropriate there: the changes seem to be solely based on AJNeufeld's answer. Not a single one of the issues I pointed out has been addressed, not even the bug:
- My understanding is that the end result should be four arrays such that
leq_arr_x_sorted
andleq_arr_y_sorted
contain the same points
sorted on different projections. But thedmy_x
/dmy_y
separation
of points whosex
coordinate ismedian
doesn't seem to guarantee
that those points will be sent to the same half in thex
-projection
and they
-projection. In order to guarantee that an additional
pre-condition is needed: most obviously that inarr_x_sorted
ties
are broken byy
; but that pre-condition is not produced byx_sort
.
This bug can be demonstrated easily by adding
print(leq_arr_x_sorted, leq_arr_y_sorted, geq_arr_x_sorted, geq_arr_y_sorted)
before the return
of split_array
and calling with the simple test case find_closest_point([1, 2, 2, 3], [4, 3, 2, 1])
.
answered Jul 18 at 9:42
Peter Taylor
14k2454
14k2454
I edited the question to give credit to AJNeufeld, thanks for the suggestion. Regarding your points for improvement, I commented on your answer under original question
â DRmo
Jul 18 at 20:32
add a comment |Â
I edited the question to give credit to AJNeufeld, thanks for the suggestion. Regarding your points for improvement, I commented on your answer under original question
â DRmo
Jul 18 at 20:32
I edited the question to give credit to AJNeufeld, thanks for the suggestion. Regarding your points for improvement, I commented on your answer under original question
â DRmo
Jul 18 at 20:32
I edited the question to give credit to AJNeufeld, thanks for the suggestion. Regarding your points for improvement, I commented on your answer under original question
â DRmo
Jul 18 at 20:32
add a comment |Â
up vote
2
down vote
I guess I didn't make this clear in my answer on the original question.
The minimum of the square-root of the square distances is the square-root of the minimum of the square distances. You can save time by calculating the square-root only when needed.
Replace this (and all calls to it):
def two_point_distance(p0,p1):
# returns distance between two (x,y) pairs
return math.sqrt( ((p0[0]-p1[0])*(p0[0]-p1[0])) +
((p0[1] - p1[1])*(p0[1] - p1[1])) )
with this:
def square_distance(p0, p1):
dx = p0[0] - p1[0]
dy = p0[1] - p1[1]
return dx * dx + dy * dy
For example, find_closest_distance_brute()
becomes:
def find_closest_distance_brute(xy_arr):
# brute force approach to find closest distance
dist_sqr_min = math.inf
for i, pnt_i in enumerate(xy_arr[:-1]):
d_sqr_min = min( square_distance(pnt_i, pnt_j) for pnt_j in xy_arr[i+1:])
if d_sqr_min < dist_sqr_min:
dist_sqr_min = d_sqr_min
return math.sqrt(dist_sqr_min) # Only calculate square-root of final value
Or more pythonically:
def find_closest_distance_brute(xy_arr):
# brute force approach to find closest distance
dist_sqr_min = min((square_distance(pnt_i, pnt_j) for i, pnt_i in enumerate(xy_arr[:-1])
for pnt_j in xy_arr[i+1:]),
default=math.inf)
return math.sqrt(dist_sqr_min) # Only calculate square-root of final value
Regarding find_min_distance_in_rect()
:
def find_min_distance_in_rec(xy_arr_y_sorted,dmin):
dmin_rec = dmin
if len(xy_arr_y_sorted) == 1:
return math.inf
What happens if len(xy_arr_y_sorted) == 0
? A better test would be <= 1
.
if len(xy_arr_y_sorted) > 7:
# ... complicated code involving lots of 7's.
else:
for k, pnt_k in enumerate(xy_arr_y_sorted[:-1]):
dis_storage_min = min( two_point_distance(pnt_k, pnt_l)
for pnt_l in xy_arr_y_sorted[k+1:])
if dis_storage_min < dmin_rec:
dmin_rec = dis_storage_min
This else:
code looks exactly like find_closest_distance_brute()
code. In fact, you can replace it with a call to that function.
if len(xy_arr_y_sorted) > 7:
# ... complicated code involving lots of 7's.
else:
dmin_rec = find_closest_distance_brute(xy_arr_y_sorted)
I don't like # ... complicated code involving lots of 7's.
. I'll rework it in a later edit.
See Peter Taylor's answer ... especially the part about median = calc_median_x(xy_arr_x_sorted)
, for an O(n) to O(1) improvement.
As pointed out a second time by Peter Taylor, your code has a bug; xy_arr_x_sorted
and xy_arr_y_sorted
are supposed to have the same content, but due to the bug, may not. He has suggested a change to fix the bug. I have a different approach: eliminate xy_arr_x_sorted
entirely.
In find_closest_distance_recur()
, you have:
if len(xy_arr_x_sorted) <=3 :
return find_closest_distance_brute(xy_arr_x_sorted)
Since the arrays are supposed to contain the same content, you could replace this with:
if len(xy_arr_y_sorted) <= 3:
return find_closest_distance_brute(xy_arr_y_sorted)
Once you do that, you no longer have any useful references to the [1]
members of the contents of xy_arr_x_sorted
. Instead, you could replace it with a sorted list of just the x-coordinates.
Once that is done, the partitioning code should be revisited. There is no difference between duplicate median values, so a simple slice operation can divide the x-coordinate list in two with zero copying required.
add a comment |Â
up vote
2
down vote
I guess I didn't make this clear in my answer on the original question.
The minimum of the square-root of the square distances is the square-root of the minimum of the square distances. You can save time by calculating the square-root only when needed.
Replace this (and all calls to it):
def two_point_distance(p0,p1):
# returns distance between two (x,y) pairs
return math.sqrt( ((p0[0]-p1[0])*(p0[0]-p1[0])) +
((p0[1] - p1[1])*(p0[1] - p1[1])) )
with this:
def square_distance(p0, p1):
dx = p0[0] - p1[0]
dy = p0[1] - p1[1]
return dx * dx + dy * dy
For example, find_closest_distance_brute()
becomes:
def find_closest_distance_brute(xy_arr):
# brute force approach to find closest distance
dist_sqr_min = math.inf
for i, pnt_i in enumerate(xy_arr[:-1]):
d_sqr_min = min( square_distance(pnt_i, pnt_j) for pnt_j in xy_arr[i+1:])
if d_sqr_min < dist_sqr_min:
dist_sqr_min = d_sqr_min
return math.sqrt(dist_sqr_min) # Only calculate square-root of final value
Or more pythonically:
def find_closest_distance_brute(xy_arr):
# brute force approach to find closest distance
dist_sqr_min = min((square_distance(pnt_i, pnt_j) for i, pnt_i in enumerate(xy_arr[:-1])
for pnt_j in xy_arr[i+1:]),
default=math.inf)
return math.sqrt(dist_sqr_min) # Only calculate square-root of final value
Regarding find_min_distance_in_rect()
:
def find_min_distance_in_rec(xy_arr_y_sorted,dmin):
dmin_rec = dmin
if len(xy_arr_y_sorted) == 1:
return math.inf
What happens if len(xy_arr_y_sorted) == 0
? A better test would be <= 1
.
if len(xy_arr_y_sorted) > 7:
# ... complicated code involving lots of 7's.
else:
for k, pnt_k in enumerate(xy_arr_y_sorted[:-1]):
dis_storage_min = min( two_point_distance(pnt_k, pnt_l)
for pnt_l in xy_arr_y_sorted[k+1:])
if dis_storage_min < dmin_rec:
dmin_rec = dis_storage_min
This else:
code looks exactly like find_closest_distance_brute()
code. In fact, you can replace it with a call to that function.
if len(xy_arr_y_sorted) > 7:
# ... complicated code involving lots of 7's.
else:
dmin_rec = find_closest_distance_brute(xy_arr_y_sorted)
I don't like # ... complicated code involving lots of 7's.
. I'll rework it in a later edit.
See Peter Taylor's answer ... especially the part about median = calc_median_x(xy_arr_x_sorted)
, for an O(n) to O(1) improvement.
As pointed out a second time by Peter Taylor, your code has a bug; xy_arr_x_sorted
and xy_arr_y_sorted
are supposed to have the same content, but due to the bug, may not. He has suggested a change to fix the bug. I have a different approach: eliminate xy_arr_x_sorted
entirely.
In find_closest_distance_recur()
, you have:
if len(xy_arr_x_sorted) <=3 :
return find_closest_distance_brute(xy_arr_x_sorted)
Since the arrays are supposed to contain the same content, you could replace this with:
if len(xy_arr_y_sorted) <= 3:
return find_closest_distance_brute(xy_arr_y_sorted)
Once you do that, you no longer have any useful references to the [1]
members of the contents of xy_arr_x_sorted
. Instead, you could replace it with a sorted list of just the x-coordinates.
Once that is done, the partitioning code should be revisited. There is no difference between duplicate median values, so a simple slice operation can divide the x-coordinate list in two with zero copying required.
add a comment |Â
up vote
2
down vote
up vote
2
down vote
I guess I didn't make this clear in my answer on the original question.
The minimum of the square-root of the square distances is the square-root of the minimum of the square distances. You can save time by calculating the square-root only when needed.
Replace this (and all calls to it):
def two_point_distance(p0,p1):
# returns distance between two (x,y) pairs
return math.sqrt( ((p0[0]-p1[0])*(p0[0]-p1[0])) +
((p0[1] - p1[1])*(p0[1] - p1[1])) )
with this:
def square_distance(p0, p1):
dx = p0[0] - p1[0]
dy = p0[1] - p1[1]
return dx * dx + dy * dy
For example, find_closest_distance_brute()
becomes:
def find_closest_distance_brute(xy_arr):
# brute force approach to find closest distance
dist_sqr_min = math.inf
for i, pnt_i in enumerate(xy_arr[:-1]):
d_sqr_min = min( square_distance(pnt_i, pnt_j) for pnt_j in xy_arr[i+1:])
if d_sqr_min < dist_sqr_min:
dist_sqr_min = d_sqr_min
return math.sqrt(dist_sqr_min) # Only calculate square-root of final value
Or more pythonically:
def find_closest_distance_brute(xy_arr):
# brute force approach to find closest distance
dist_sqr_min = min((square_distance(pnt_i, pnt_j) for i, pnt_i in enumerate(xy_arr[:-1])
for pnt_j in xy_arr[i+1:]),
default=math.inf)
return math.sqrt(dist_sqr_min) # Only calculate square-root of final value
Regarding find_min_distance_in_rect()
:
def find_min_distance_in_rec(xy_arr_y_sorted,dmin):
dmin_rec = dmin
if len(xy_arr_y_sorted) == 1:
return math.inf
What happens if len(xy_arr_y_sorted) == 0
? A better test would be <= 1
.
if len(xy_arr_y_sorted) > 7:
# ... complicated code involving lots of 7's.
else:
for k, pnt_k in enumerate(xy_arr_y_sorted[:-1]):
dis_storage_min = min( two_point_distance(pnt_k, pnt_l)
for pnt_l in xy_arr_y_sorted[k+1:])
if dis_storage_min < dmin_rec:
dmin_rec = dis_storage_min
This else:
code looks exactly like find_closest_distance_brute()
code. In fact, you can replace it with a call to that function.
if len(xy_arr_y_sorted) > 7:
# ... complicated code involving lots of 7's.
else:
dmin_rec = find_closest_distance_brute(xy_arr_y_sorted)
I don't like # ... complicated code involving lots of 7's.
. I'll rework it in a later edit.
See Peter Taylor's answer ... especially the part about median = calc_median_x(xy_arr_x_sorted)
, for an O(n) to O(1) improvement.
As pointed out a second time by Peter Taylor, your code has a bug; xy_arr_x_sorted
and xy_arr_y_sorted
are supposed to have the same content, but due to the bug, may not. He has suggested a change to fix the bug. I have a different approach: eliminate xy_arr_x_sorted
entirely.
In find_closest_distance_recur()
, you have:
if len(xy_arr_x_sorted) <=3 :
return find_closest_distance_brute(xy_arr_x_sorted)
Since the arrays are supposed to contain the same content, you could replace this with:
if len(xy_arr_y_sorted) <= 3:
return find_closest_distance_brute(xy_arr_y_sorted)
Once you do that, you no longer have any useful references to the [1]
members of the contents of xy_arr_x_sorted
. Instead, you could replace it with a sorted list of just the x-coordinates.
Once that is done, the partitioning code should be revisited. There is no difference between duplicate median values, so a simple slice operation can divide the x-coordinate list in two with zero copying required.
I guess I didn't make this clear in my answer on the original question.
The minimum of the square-root of the square distances is the square-root of the minimum of the square distances. You can save time by calculating the square-root only when needed.
Replace this (and all calls to it):
def two_point_distance(p0,p1):
# returns distance between two (x,y) pairs
return math.sqrt( ((p0[0]-p1[0])*(p0[0]-p1[0])) +
((p0[1] - p1[1])*(p0[1] - p1[1])) )
with this:
def square_distance(p0, p1):
dx = p0[0] - p1[0]
dy = p0[1] - p1[1]
return dx * dx + dy * dy
For example, find_closest_distance_brute()
becomes:
def find_closest_distance_brute(xy_arr):
# brute force approach to find closest distance
dist_sqr_min = math.inf
for i, pnt_i in enumerate(xy_arr[:-1]):
d_sqr_min = min( square_distance(pnt_i, pnt_j) for pnt_j in xy_arr[i+1:])
if d_sqr_min < dist_sqr_min:
dist_sqr_min = d_sqr_min
return math.sqrt(dist_sqr_min) # Only calculate square-root of final value
Or more pythonically:
def find_closest_distance_brute(xy_arr):
# brute force approach to find closest distance
dist_sqr_min = min((square_distance(pnt_i, pnt_j) for i, pnt_i in enumerate(xy_arr[:-1])
for pnt_j in xy_arr[i+1:]),
default=math.inf)
return math.sqrt(dist_sqr_min) # Only calculate square-root of final value
Regarding find_min_distance_in_rect()
:
def find_min_distance_in_rec(xy_arr_y_sorted,dmin):
dmin_rec = dmin
if len(xy_arr_y_sorted) == 1:
return math.inf
What happens if len(xy_arr_y_sorted) == 0
? A better test would be <= 1
.
if len(xy_arr_y_sorted) > 7:
# ... complicated code involving lots of 7's.
else:
for k, pnt_k in enumerate(xy_arr_y_sorted[:-1]):
dis_storage_min = min( two_point_distance(pnt_k, pnt_l)
for pnt_l in xy_arr_y_sorted[k+1:])
if dis_storage_min < dmin_rec:
dmin_rec = dis_storage_min
This else:
code looks exactly like find_closest_distance_brute()
code. In fact, you can replace it with a call to that function.
if len(xy_arr_y_sorted) > 7:
# ... complicated code involving lots of 7's.
else:
dmin_rec = find_closest_distance_brute(xy_arr_y_sorted)
I don't like # ... complicated code involving lots of 7's.
. I'll rework it in a later edit.
See Peter Taylor's answer ... especially the part about median = calc_median_x(xy_arr_x_sorted)
, for an O(n) to O(1) improvement.
As pointed out a second time by Peter Taylor, your code has a bug; xy_arr_x_sorted
and xy_arr_y_sorted
are supposed to have the same content, but due to the bug, may not. He has suggested a change to fix the bug. I have a different approach: eliminate xy_arr_x_sorted
entirely.
In find_closest_distance_recur()
, you have:
if len(xy_arr_x_sorted) <=3 :
return find_closest_distance_brute(xy_arr_x_sorted)
Since the arrays are supposed to contain the same content, you could replace this with:
if len(xy_arr_y_sorted) <= 3:
return find_closest_distance_brute(xy_arr_y_sorted)
Once you do that, you no longer have any useful references to the [1]
members of the contents of xy_arr_x_sorted
. Instead, you could replace it with a sorted list of just the x-coordinates.
Once that is done, the partitioning code should be revisited. There is no difference between duplicate median values, so a simple slice operation can divide the x-coordinate list in two with zero copying required.
edited Jul 18 at 16:09
answered Jul 18 at 3:48
AJNeufeld
1,358312
1,358312
add a comment |Â
add a comment |Â
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f199703%2ffinding-the-closest-pair-of-points-divide-and-conquer-speed-improvement%23new-answer', 'question_page');
);
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password