trjhtr

Question

I am solving interview question from here.

Problem : Given a set of digits (A) in sorted order, find how many numbers of length B are possible whose value is less than number C.

Constraints: 1 Ã¢Â‰Â¤ B Ã¢Â‰Â¤ 9, 0 Ã¢Â‰Â¤ C Ã¢Â‰Â¤ 1e9 & 0 Ã¢Â‰Â¤ A[i] Ã¢Â‰Â¤ 9
Input: A = [ 0 1 5], B= 1 , C = 2 ; Output: 2 (0 and 1 are possible) 
Input: A = 0 1 2 5 , B = 2 , C = 21 ; Output: 5 (10, 11, 12, 15, 20 are possible)

This is my approach

from itertools import product
from itertools import ifilter
def solve(A, B, C):
 if A == or B > len(str(C)):
 return 0

 elif B < len(str(C)):
 #constraint is B
 if B == 1:
 new_list = A
 return len(new_list)
 else:
 new_list = list((product((''.join(str(i)for i in A)),repeat = B)))
 b = [''.join(num) for num in new_list]
 c = list(ifilter(lambda x: x[0]!='0' , b))
 return len(c)


 elif B == len(str(C)):
 #constraint is C 
 if B == 1:
 new_list = [i for i in A if i< C]
 return len(new_list)
 else:
 new_list = list((product((''.join(str(i)for i in A)),repeat = B)))
 b = [''.join(num) for num in new_list]
 c = list(ifilter(lambda x: x[0]!='0' and int(x) < C , b))
 return len(c)

Test cases:

assert solve([2],5,51345) == 1
assert solve(,1,1) == 0
assert solve([ 2, 3, 5, 6, 7, 9 ],5,42950) == 2592
assert solve([0],1,5) == 1
assert solve([0,1,2,5],1,123) == 4
assert solve([0,1,5],1,2) == 2
assert solve([ 3 ],5, 26110) == 0
assert solve([0,1,2,5],2,21) == 5

How can I optimize this code in terms of memory usage?

score 7 · Accepted Answer · 2018-05-24 08:44:07Z

Optimise memory usage

You could optimise memory usage by not converting your iterators into list and by avoiding non-required steps (like join).

Changing a few others details (formatting, adding tests, etc), you'd get something like:

from itertools import product
from itertools import ifilter

def solve(A, B, C):
 c_len = len(str(C))
 if A == or B > c_len:
 return 0
 elif B < c_len:
 # Constraint is B
 if B == 1:
 return len(A)
 else:
 candidates = product((str(i) for i in A), repeat = B)
 return sum(x[0] != '0' for x in candidates)
 else:
 assert B == c_len
 # Constraint is C
 if B == 1:
 return sum(i < C for i in A)
 else:
 candidates = product((str(i) for i in A), repeat = B)
 return sum(x[0] != '0' and int(''.join(x)) < C for x in candidates)

assert solve([2],5,51345) == 1
assert solve(,1,1) == 0
assert solve([2, 3, 5, 6, 7, 9],4,42950) == 1296
assert solve([2, 3, 5, 6, 7, 9],5,42950) == 2592
assert solve([0],1,5) == 1
assert solve([0,1,2,5],1,123) == 4
assert solve([0,1,5],1,2) == 2
assert solve([3],5, 26110) == 0
assert solve([0,1,2,5],2,21) == 5

Another algorithm

I'm pretty sure the whole thing can be optimised further by not generating the various numbers to count them but just using mathematical tricks to get the solution with no counting.

The easiest case to handle is B < c_len:

elif B < c_len:
 # All combinations of B elements are valid
 return len(set(A)) ** B

Actually, as mentionned by Maarten FabrÃƒÂ©, this does not handle 0s perfectly. The code below is updated to handle it better.

The last case is trickier. We can try to use recursion to solve smaller versions of the problem. I didn't manage to make this work properly...

from itertools import product, ifilter, dropwhile, product, takewhile
import timeit

def solve_naive(A, B, C):
 A = set(str(A))
 mini = 10 ** (B-1)
 maxi = min(10 * mini, C)
 cand = [str(i) for i in (['0'] if B == 1 else ) + range(mini, maxi)]
 valid = [i for i in cand if all(c in A for c in i)]
 return len(valid)


def solve_op(A, B, C):
 # print(A, B, C)
 c_len = len(str(C))
 if A == or B > c_len:
 return 0
 elif B < c_len:
 # Constraint is B
 if B == 1:
 return len(A)
 else:
 candidates = product((str(i) for i in A), repeat = B)
 return sum(x[0] != '0' for x in candidates)
 else:
 assert B == c_len
 # Constraint is C
 if B == 1:
 return sum(i < C for i in A)
 else:
 candidates = product((str(i) for i in A), repeat = B)
 return sum(x[0] != '0' and int(''.join(x)) < C for x in candidates)


def solve_maarten(A, B, C):
 if A == or B > len(str(C)):
 return 0
 c_tuple = tuple(map(int, str(C)))
 combinations = product(A, repeat=B)
 if B != 1:
 combinations = dropwhile(lambda x: x[0] == 0, combinations)
 if B == len(c_tuple):
 combinations = takewhile(lambda x: x < c_tuple, combinations)
 combinations = list(combinations)
 return sum(1 for _ in combinations)


def solve(A, B, C):
 c_str = str(C)
 c_len = len(c_str)
 if A == or B > c_len:
 return 0
 if B < c_len:
 a_len = len(set(A))
 if B == 1:
 return a_len
 non_0_len = a_len - (0 in A)
 return non_0_len * (a_len ** (B-1))
 assert B == c_len # Constraint is C
 head, tail = int(c_str[0]), c_str[1:]
 nb_first_dig_cand = sum(i < head for i in A)
 if not tail or not nb_first_dig_cand:
 return nb_first_dig_cand
 if head in A: # TODO: This case is not handled properly...
 # It should involve ret and solve(A, B-1, int(tail)) or something like that
 return solve_maarten(A, B, C)
 solve_c = solve(A, B-1, C)
 ret = nb_first_dig_cand * solve_c
 return ret


tests = [
 ([2], 4, 51345, 1),
 ([2], 5, 51345, 1),
 (, 1, 1, 0),
 ([2, 3, 5, 6, 7, 9], 4, 42950, 1296),
 ([2, 3, 5, 6, 7, 9], 5, 42950, 2592),
 ([0], 1, 5, 1),
 ([0, 1, 2, 5], 1, 123, 4),
 ([0, 1, 5], 1, 2, 2),
 ([3], 5, 26110, 0),
 ([0, 1, 2, 5], 1, 21, 4),
 ([0, 1, 2, 5], 2, 21, 5),
 ([0, 1, 2, 5], 2, 201, 12),
 ([0, 1, 2, 5], 3, 2010, 48),
 ([0, 1, 2, 5], 4, 20108, 192),
 ([0, 1, 2, 5], 5, 201089, 768),
 ([0, 1, 2, 3, 4, 5, 7, 8], 5, 201089, 28672),
 ([0, 1, 2, 3, 4, 5, 7, 8], 6, 201089, 33344),
 ([0, 1, 2, 3, 4, 5, 7, 8, 9], 6, 200000, 59049),
 ([0, 1, 2, 3, 4, 5, 7, 8, 9], 6, 999999, 472391),
 ([1, 2, 3, 4, 5, 7, 8, 9], 6, 200000, 32768),
 ([1, 2, 3, 4, 5, 7, 8, 9], 6, 999999, 262143),
]
funcs = [solve, solve_op, solve_maarten, solve_naive]

for func in funcs:
 start = timeit.default_timer()
 for (A, B, C, exp) in tests:
 ret = func(A, B, C)
 if ret != exp:
 print "%s(%s, %d, %d): ret=%d, exp:%d" % (func.__name__, str(A), B, C, ret, exp)
 end = timeit.default_timer()
 print("Time for %s: %f" % (func.__name__, end - start))




def solve2(A, B, C):
 c_str = str(C)
 c_len = len(c_str)
 if A == or B > c_len:
 return 0
 if B < c_len:
 a_len = len(set(A))
 if B == 1:
 return a_len
 non_0_len = a_len - (0 in A)
 return non_0_len * (a_len ** (B-1))
 assert B == c_len # Constraint is C
 head, last_dig = divmod(C, 10)
 nb_last_dig_cand = sum(i < last_dig for i in A)
 if head == 0:
 return nb_last_dig_cand
 ret = solve_naive(A, B-1, head - 1) * len(A)
 ret_dummy = solve_naive(A, B, C)
 print(ret - ret_dummy, A, B, C)
 return ret_dummy

if A == or B > c_len: could be if A or B > c_len: (even if some newbies may think that it tests > c_len for both A & B :) — May 22 at 11:58
in the case of B < c_len, and B > 1 you must discard the ones that start with a 0, or use len(<A without 0>) * len(A)**(B-1) — May 22 at 12:59
Oh, I guess I missed an edge case. Would you have a test case for this ? — May 22 at 13:00

score 5 · Answer 2 · 2018-05-24 09:07:25Z

DRY

The code you use for the special cases is independent from each other, and has no impact on the further handling of the data, so it is unnecessary to make a complete code path for each of the combinations edge cases, since this will explode the number of code quickly.

generators

You seem to use iterators, but instantiate the list intermediately at every step, while this is not needed. Even counting how much items are in a iterator can be done more memory efficient with sum(1 for _ in iterator) instead of len(list(iterator))

comparison

instead of converting all the combinations to str and then back to int, why not keep it in the shape of a tuple and use tuple comparison.

Ordered

Since the list of digits is ordered, the products will be ordered too. So, instead of using ifilter, you can use takewhile and dropwhile, this will limit the number of checks to be done

My code:

from itertools import dropwhile, product, takewhile

def solve(A, B, C):
 if A == or B > len(str(C)):
 return 0
 c_tuple = tuple(map(int, str(C)))
 combinations = product(A, repeat=B)
 if B != 1:
 combinations = dropwhile(lambda x: x[0] == 0, combinations)
 if B == len(c_tuple):
 combinations = takewhile(lambda x: x < c_tuple, combinations)
 return sum(1 for _ in combinations)

alternative implementation

Since apparently this isn't fast enough, I was looking for another way around this, without generating all the possibilities:

from bisect import bisect_left
def solve_fast(A, B, C):
 c_tuple = tuple(map(int, str(C)))
 if A == or B > len(c_tuple) or c_tuple[0] < A[0]:
 return 0
 if A == [0]:
 return B == 1 and C != 0
 if B == 1:
 return sum(1 for a in A if a < C)
 len_a, len_c = len(A), len(c_tuple)
 A_0 = not A[0]
 if B < len_c or c_tuple[0] > A[-1]:
 return len_a ** (B-A_0) * (len_a-A_0)**A_0

 idx_c = bisect_left(A, c_tuple[0]) - A_0
# idx_c = sum(0 < a < c_tuple[0] for a in A)
 result_smaller = idx_c * len_a**(len_c - 1) 
 # number of combinations starting with a smaller digit that the first digit of C which is not 0,

 result_same = 0
 c_equal = c_tuple[0] in A
 for i, c in enumerate(c_tuple[1:], 2):
 if not c_equal: # as long as there is a digit in A which is the same as the next digit in C, continue
 break
 idx_c = bisect_left(A, c) # numbers of digits smaller than c
# idx_c = sum(a < c for a in A) # numbers of digits smaller than c
 if A[idx_c] != c:
 c_equal = False
 if idx_c:
 result_same += idx_c * len_a ** (len_c - i)
 return result_same + result_smaller

This code is far less elegant than the other one, but it is faster.

If I look closer to it, it has the same backbone as josay's algorithm, but without the recursion, and converting C to a tuple of ints, instead of keeping it as a str.

timings:

def test_function(func, tests):
 flag = True
 for test in tests:
 A, B, C, expected = test
 result = func(A, B, C)
 if expected != result:
 print(f'func.__name__: test failed: result')
 flag = False
 return flag
funcs = [solve_josay, solve_op, solve_maarten, solve_dummy, solve_fast]
all(test_function(func, tests) for func in funcs)

all passed

timings:

import timeit
for func in funcs:
 global_dict='func': func, 'tests': tests, 'test_function': test_function
 time = timeit.timeit('test_function(func, tests)', globals=global_dict, number = 1000)
 print(func.__name__, time)

print(func.__name__, time)

solve_josay 0.036541963709623815
solve_op 4.350994605536243
solve_maarten 0.7999383794617643
solve_fast 0.03256370566714395
solve_dummy 113.19599720861424

shows the performance is pretty close to Josay's, but 20+ times faster than my original attempt

It looks good. arguments shouldn't be upper case, though, should they? — May 22 at 14:48
There might be some potential code review, then? You already have an excellent answer, BTW. — May 22 at 15:01
In this case the OP inherited the signature from an external source, so I found it more important to stay consistent with that, than change the signature — May 23 at 8:50
@MaartenFabrtÃ© thanks for the review but now the source says time limit exceeded , so it still need to be optimised . — May 24 at 5:35

score 7 · Accepted Answer · 2018-05-24 08:44:07Z

Optimise memory usage

You could optimise memory usage by not converting your iterators into list and by avoiding non-required steps (like join).

Changing a few others details (formatting, adding tests, etc), you'd get something like:

from itertools import product
from itertools import ifilter

def solve(A, B, C):
 c_len = len(str(C))
 if A == or B > c_len:
 return 0
 elif B < c_len:
 # Constraint is B
 if B == 1:
 return len(A)
 else:
 candidates = product((str(i) for i in A), repeat = B)
 return sum(x[0] != '0' for x in candidates)
 else:
 assert B == c_len
 # Constraint is C
 if B == 1:
 return sum(i < C for i in A)
 else:
 candidates = product((str(i) for i in A), repeat = B)
 return sum(x[0] != '0' and int(''.join(x)) < C for x in candidates)

assert solve([2],5,51345) == 1
assert solve(,1,1) == 0
assert solve([2, 3, 5, 6, 7, 9],4,42950) == 1296
assert solve([2, 3, 5, 6, 7, 9],5,42950) == 2592
assert solve([0],1,5) == 1
assert solve([0,1,2,5],1,123) == 4
assert solve([0,1,5],1,2) == 2
assert solve([3],5, 26110) == 0
assert solve([0,1,2,5],2,21) == 5

Another algorithm

I'm pretty sure the whole thing can be optimised further by not generating the various numbers to count them but just using mathematical tricks to get the solution with no counting.

The easiest case to handle is B < c_len:

elif B < c_len:
 # All combinations of B elements are valid
 return len(set(A)) ** B

Actually, as mentionned by Maarten FabrÃƒÂ©, this does not handle 0s perfectly. The code below is updated to handle it better.

The last case is trickier. We can try to use recursion to solve smaller versions of the problem. I didn't manage to make this work properly...

from itertools import product, ifilter, dropwhile, product, takewhile
import timeit

def solve_naive(A, B, C):
 A = set(str(A))
 mini = 10 ** (B-1)
 maxi = min(10 * mini, C)
 cand = [str(i) for i in (['0'] if B == 1 else ) + range(mini, maxi)]
 valid = [i for i in cand if all(c in A for c in i)]
 return len(valid)


def solve_op(A, B, C):
 # print(A, B, C)
 c_len = len(str(C))
 if A == or B > c_len:
 return 0
 elif B < c_len:
 # Constraint is B
 if B == 1:
 return len(A)
 else:
 candidates = product((str(i) for i in A), repeat = B)
 return sum(x[0] != '0' for x in candidates)
 else:
 assert B == c_len
 # Constraint is C
 if B == 1:
 return sum(i < C for i in A)
 else:
 candidates = product((str(i) for i in A), repeat = B)
 return sum(x[0] != '0' and int(''.join(x)) < C for x in candidates)


def solve_maarten(A, B, C):
 if A == or B > len(str(C)):
 return 0
 c_tuple = tuple(map(int, str(C)))
 combinations = product(A, repeat=B)
 if B != 1:
 combinations = dropwhile(lambda x: x[0] == 0, combinations)
 if B == len(c_tuple):
 combinations = takewhile(lambda x: x < c_tuple, combinations)
 combinations = list(combinations)
 return sum(1 for _ in combinations)


def solve(A, B, C):
 c_str = str(C)
 c_len = len(c_str)
 if A == or B > c_len:
 return 0
 if B < c_len:
 a_len = len(set(A))
 if B == 1:
 return a_len
 non_0_len = a_len - (0 in A)
 return non_0_len * (a_len ** (B-1))
 assert B == c_len # Constraint is C
 head, tail = int(c_str[0]), c_str[1:]
 nb_first_dig_cand = sum(i < head for i in A)
 if not tail or not nb_first_dig_cand:
 return nb_first_dig_cand
 if head in A: # TODO: This case is not handled properly...
 # It should involve ret and solve(A, B-1, int(tail)) or something like that
 return solve_maarten(A, B, C)
 solve_c = solve(A, B-1, C)
 ret = nb_first_dig_cand * solve_c
 return ret


tests = [
 ([2], 4, 51345, 1),
 ([2], 5, 51345, 1),
 (, 1, 1, 0),
 ([2, 3, 5, 6, 7, 9], 4, 42950, 1296),
 ([2, 3, 5, 6, 7, 9], 5, 42950, 2592),
 ([0], 1, 5, 1),
 ([0, 1, 2, 5], 1, 123, 4),
 ([0, 1, 5], 1, 2, 2),
 ([3], 5, 26110, 0),
 ([0, 1, 2, 5], 1, 21, 4),
 ([0, 1, 2, 5], 2, 21, 5),
 ([0, 1, 2, 5], 2, 201, 12),
 ([0, 1, 2, 5], 3, 2010, 48),
 ([0, 1, 2, 5], 4, 20108, 192),
 ([0, 1, 2, 5], 5, 201089, 768),
 ([0, 1, 2, 3, 4, 5, 7, 8], 5, 201089, 28672),
 ([0, 1, 2, 3, 4, 5, 7, 8], 6, 201089, 33344),
 ([0, 1, 2, 3, 4, 5, 7, 8, 9], 6, 200000, 59049),
 ([0, 1, 2, 3, 4, 5, 7, 8, 9], 6, 999999, 472391),
 ([1, 2, 3, 4, 5, 7, 8, 9], 6, 200000, 32768),
 ([1, 2, 3, 4, 5, 7, 8, 9], 6, 999999, 262143),
]
funcs = [solve, solve_op, solve_maarten, solve_naive]

for func in funcs:
 start = timeit.default_timer()
 for (A, B, C, exp) in tests:
 ret = func(A, B, C)
 if ret != exp:
 print "%s(%s, %d, %d): ret=%d, exp:%d" % (func.__name__, str(A), B, C, ret, exp)
 end = timeit.default_timer()
 print("Time for %s: %f" % (func.__name__, end - start))




def solve2(A, B, C):
 c_str = str(C)
 c_len = len(c_str)
 if A == or B > c_len:
 return 0
 if B < c_len:
 a_len = len(set(A))
 if B == 1:
 return a_len
 non_0_len = a_len - (0 in A)
 return non_0_len * (a_len ** (B-1))
 assert B == c_len # Constraint is C
 head, last_dig = divmod(C, 10)
 nb_last_dig_cand = sum(i < last_dig for i in A)
 if head == 0:
 return nb_last_dig_cand
 ret = solve_naive(A, B-1, head - 1) * len(A)
 ret_dummy = solve_naive(A, B, C)
 print(ret - ret_dummy, A, B, C)
 return ret_dummy

if A == or B > c_len: could be if A or B > c_len: (even if some newbies may think that it tests > c_len for both A & B :) — May 22 at 11:58
in the case of B < c_len, and B > 1 you must discard the ones that start with a 0, or use len(<A without 0>) * len(A)**(B-1) — May 22 at 12:59
Oh, I guess I missed an edge case. Would you have a test case for this ? — May 22 at 13:00

score 5 · Answer 4 · 2018-05-24 09:07:25Z

DRY

The code you use for the special cases is independent from each other, and has no impact on the further handling of the data, so it is unnecessary to make a complete code path for each of the combinations edge cases, since this will explode the number of code quickly.

generators

You seem to use iterators, but instantiate the list intermediately at every step, while this is not needed. Even counting how much items are in a iterator can be done more memory efficient with sum(1 for _ in iterator) instead of len(list(iterator))

comparison

instead of converting all the combinations to str and then back to int, why not keep it in the shape of a tuple and use tuple comparison.

Ordered

Since the list of digits is ordered, the products will be ordered too. So, instead of using ifilter, you can use takewhile and dropwhile, this will limit the number of checks to be done

My code:

from itertools import dropwhile, product, takewhile

def solve(A, B, C):
 if A == or B > len(str(C)):
 return 0
 c_tuple = tuple(map(int, str(C)))
 combinations = product(A, repeat=B)
 if B != 1:
 combinations = dropwhile(lambda x: x[0] == 0, combinations)
 if B == len(c_tuple):
 combinations = takewhile(lambda x: x < c_tuple, combinations)
 return sum(1 for _ in combinations)

alternative implementation

Since apparently this isn't fast enough, I was looking for another way around this, without generating all the possibilities:

from bisect import bisect_left
def solve_fast(A, B, C):
 c_tuple = tuple(map(int, str(C)))
 if A == or B > len(c_tuple) or c_tuple[0] < A[0]:
 return 0
 if A == [0]:
 return B == 1 and C != 0
 if B == 1:
 return sum(1 for a in A if a < C)
 len_a, len_c = len(A), len(c_tuple)
 A_0 = not A[0]
 if B < len_c or c_tuple[0] > A[-1]:
 return len_a ** (B-A_0) * (len_a-A_0)**A_0

 idx_c = bisect_left(A, c_tuple[0]) - A_0
# idx_c = sum(0 < a < c_tuple[0] for a in A)
 result_smaller = idx_c * len_a**(len_c - 1) 
 # number of combinations starting with a smaller digit that the first digit of C which is not 0,

 result_same = 0
 c_equal = c_tuple[0] in A
 for i, c in enumerate(c_tuple[1:], 2):
 if not c_equal: # as long as there is a digit in A which is the same as the next digit in C, continue
 break
 idx_c = bisect_left(A, c) # numbers of digits smaller than c
# idx_c = sum(a < c for a in A) # numbers of digits smaller than c
 if A[idx_c] != c:
 c_equal = False
 if idx_c:
 result_same += idx_c * len_a ** (len_c - i)
 return result_same + result_smaller

This code is far less elegant than the other one, but it is faster.

If I look closer to it, it has the same backbone as josay's algorithm, but without the recursion, and converting C to a tuple of ints, instead of keeping it as a str.

timings:

def test_function(func, tests):
 flag = True
 for test in tests:
 A, B, C, expected = test
 result = func(A, B, C)
 if expected != result:
 print(f'func.__name__: test failed: result')
 flag = False
 return flag
funcs = [solve_josay, solve_op, solve_maarten, solve_dummy, solve_fast]
all(test_function(func, tests) for func in funcs)

all passed

timings:

import timeit
for func in funcs:
 global_dict='func': func, 'tests': tests, 'test_function': test_function
 time = timeit.timeit('test_function(func, tests)', globals=global_dict, number = 1000)
 print(func.__name__, time)

print(func.__name__, time)

solve_josay 0.036541963709623815
solve_op 4.350994605536243
solve_maarten 0.7999383794617643
solve_fast 0.03256370566714395
solve_dummy 113.19599720861424

shows the performance is pretty close to Josay's, but 20+ times faster than my original attempt

It looks good. arguments shouldn't be upper case, though, should they? — May 22 at 14:48
There might be some potential code review, then? You already have an excellent answer, BTW. — May 22 at 15:01
In this case the OP inherited the signature from an external source, so I found it more important to stay consistent with that, than change the signature — May 23 at 8:50
@MaartenFabrtÃ© thanks for the review but now the source says time limit exceeded , so it still need to be optimised . — May 24 at 5:35

Numbers of length N and value less than K

2 Answers 2

DRY

generators

comparison

Ordered

My code:

alternative implementation

timings:

timings:

Your Answer

Sign up or log in

Post as a guest

Post as a guest

2 Answers 2

2 Answers 2

DRY

generators

comparison

Ordered

My code:

alternative implementation

timings:

timings:

DRY

generators

comparison

Ordered

My code:

alternative implementation

timings:

timings:

DRY

generators

comparison

Ordered

My code:

alternative implementation

timings:

timings:

DRY

generators

comparison

Ordered

My code:

alternative implementation

timings:

timings:

Sign up or log in

Post as a guest

Post as a guest

Sign up or log in

Post as a guest

Sign up or log in

Post as a guest

Sign up or log in

Post as a guest

Popular posts from this blog

Greedy Best First Search implementation in Rust

Function to Return a JSON Like Objects Using VBA Collections and Arrays

C++11 CLH Lock Implementation

2 Answers
2

2 Answers
2

2 Answers
2