Variant of 2-SUM Problem using Hashtable with multi-threading

.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty margin-bottom:0;

up vote
4
down vote

favorite

I am doing a Coursera assignment on Hashtable and my solution is taking too long to run. Here is the problem as described in the assignment:

The goal of this problem is to implement a variant of the 2-SUM algorithm covered in this week's lectures.

The file contains 1 million integers, both positive and negative (there might be some repetitions!).This is your array of integers, with the ith row of the file specifying the ith entry of the array.

Your task is to compute the number of target values t in the interval [-10000,10000] (inclusive) such that there are distinct numbers x,y in the input file that satisfy x+y=t

Here is my entire Python code:

from collections import defaultdict
import threading

def hash_fn(n):
 M=1999993
 return n%M

def two_sum_present(target):
 global ans
 bool=True
 for x in nums:
 y = target-x
 if y in my_hash_table[hash_fn(y)] and y!=x:
 bool=False
 ans+=1
 return 1
 if bool == True:
 return 0
f = open('_6ec67df2804ff4b58ab21c12edcb21f8_algo1-programming_prob-2sum.txt')
nums=
my_hash_table = defaultdict(list) 
for line in f.readlines():
 num = int(line.split('n')[0])
 nums.append(num)
 my_hash_table[hash_fn(num)].append(num)

ans=0
for nr in range(-10000, 10001):
 thr = threading.Thread(target = two_sum_present, args=(nr,))
 thr.start()

print(ans)

Notes:

I found a page for the same problem here, but this solution uses neither hashing nor binary search, so it should take longer than my solution. This solution is essentially this:
```
ans = sum(any(n-x in nums and 2*x != n for x in nums) for n in range(-10000, 10001))
```

The text file can be found here. if someone is interested. But think of it as a big text file with 1M lines having 1 integer(+ve or -ve) on each line.

'distinct' as mentioned here only means that x+x=t is not a valid 2-sum pair; but two (x, y) pairs where x and y are not same constitute a valid pair

edited Apr 14 at 8:08

asked Apr 5 at 23:00

Piyush Singh

317

Please fix the indentation in your code. The hash_fn is misaligned for eg.
â€“Â hjpotter92
Apr 5 at 23:04

Sorry for bad indentation there. Fixed it.
â€“Â Piyush Singh
Apr 5 at 23:09

Welcome to Code Review! I hope you get some great answers.
â€“Â Phrancis
Apr 5 at 23:29

Your hash table will have collisions (you are not checking for equality before overwriting an existing entry) and your hash function is not really doing much for you. Try this trick, change your hash function to return the number you pass in to it, is it faster? I suspect it will be.
â€“Â Turksarama
Apr 6 at 2:31

1

@PiyushSingh I removed the link that you changed in your last edit because strange things happened when I clicked the link. Maybe the site is infected.
â€“Â miracle173
Apr 11 at 8:25

Â |Â
show 8 more comments

up vote
4
down vote

favorite

I am doing a Coursera assignment on Hashtable and my solution is taking too long to run. Here is the problem as described in the assignment:

The goal of this problem is to implement a variant of the 2-SUM algorithm covered in this week's lectures.

The file contains 1 million integers, both positive and negative (there might be some repetitions!).This is your array of integers, with the ith row of the file specifying the ith entry of the array.

Your task is to compute the number of target values t in the interval [-10000,10000] (inclusive) such that there are distinct numbers x,y in the input file that satisfy x+y=t

Here is my entire Python code:

from collections import defaultdict
import threading

def hash_fn(n):
 M=1999993
 return n%M

def two_sum_present(target):
 global ans
 bool=True
 for x in nums:
 y = target-x
 if y in my_hash_table[hash_fn(y)] and y!=x:
 bool=False
 ans+=1
 return 1
 if bool == True:
 return 0
f = open('_6ec67df2804ff4b58ab21c12edcb21f8_algo1-programming_prob-2sum.txt')
nums=
my_hash_table = defaultdict(list) 
for line in f.readlines():
 num = int(line.split('n')[0])
 nums.append(num)
 my_hash_table[hash_fn(num)].append(num)

ans=0
for nr in range(-10000, 10001):
 thr = threading.Thread(target = two_sum_present, args=(nr,))
 thr.start()

print(ans)

Notes:

I found a page for the same problem here, but this solution uses neither hashing nor binary search, so it should take longer than my solution. This solution is essentially this:
```
ans = sum(any(n-x in nums and 2*x != n for x in nums) for n in range(-10000, 10001))
```

The text file can be found here. if someone is interested. But think of it as a big text file with 1M lines having 1 integer(+ve or -ve) on each line.

'distinct' as mentioned here only means that x+x=t is not a valid 2-sum pair; but two (x, y) pairs where x and y are not same constitute a valid pair

edited Apr 14 at 8:08

asked Apr 5 at 23:00

Piyush Singh

317

Please fix the indentation in your code. The hash_fn is misaligned for eg.
â€“Â hjpotter92
Apr 5 at 23:04

Sorry for bad indentation there. Fixed it.
â€“Â Piyush Singh
Apr 5 at 23:09

Welcome to Code Review! I hope you get some great answers.
â€“Â Phrancis
Apr 5 at 23:29

Your hash table will have collisions (you are not checking for equality before overwriting an existing entry) and your hash function is not really doing much for you. Try this trick, change your hash function to return the number you pass in to it, is it faster? I suspect it will be.
â€“Â Turksarama
Apr 6 at 2:31

1

@PiyushSingh I removed the link that you changed in your last edit because strange things happened when I clicked the link. Maybe the site is infected.
â€“Â miracle173
Apr 11 at 8:25

Â |Â
show 8 more comments

up vote
4
down vote

favorite

I am doing a Coursera assignment on Hashtable and my solution is taking too long to run. Here is the problem as described in the assignment:

The goal of this problem is to implement a variant of the 2-SUM algorithm covered in this week's lectures.

The file contains 1 million integers, both positive and negative (there might be some repetitions!).This is your array of integers, with the ith row of the file specifying the ith entry of the array.

Your task is to compute the number of target values t in the interval [-10000,10000] (inclusive) such that there are distinct numbers x,y in the input file that satisfy x+y=t

Here is my entire Python code:

from collections import defaultdict
import threading

def hash_fn(n):
 M=1999993
 return n%M

def two_sum_present(target):
 global ans
 bool=True
 for x in nums:
 y = target-x
 if y in my_hash_table[hash_fn(y)] and y!=x:
 bool=False
 ans+=1
 return 1
 if bool == True:
 return 0
f = open('_6ec67df2804ff4b58ab21c12edcb21f8_algo1-programming_prob-2sum.txt')
nums=
my_hash_table = defaultdict(list) 
for line in f.readlines():
 num = int(line.split('n')[0])
 nums.append(num)
 my_hash_table[hash_fn(num)].append(num)

ans=0
for nr in range(-10000, 10001):
 thr = threading.Thread(target = two_sum_present, args=(nr,))
 thr.start()

print(ans)

Notes:

I found a page for the same problem here, but this solution uses neither hashing nor binary search, so it should take longer than my solution. This solution is essentially this:
```
ans = sum(any(n-x in nums and 2*x != n for x in nums) for n in range(-10000, 10001))
```

The text file can be found here. if someone is interested. But think of it as a big text file with 1M lines having 1 integer(+ve or -ve) on each line.

'distinct' as mentioned here only means that x+x=t is not a valid 2-sum pair; but two (x, y) pairs where x and y are not same constitute a valid pair

edited Apr 14 at 8:08

asked Apr 5 at 23:00

Piyush Singh

317

I am doing a Coursera assignment on Hashtable and my solution is taking too long to run. Here is the problem as described in the assignment:

The goal of this problem is to implement a variant of the 2-SUM algorithm covered in this week's lectures.

The file contains 1 million integers, both positive and negative (there might be some repetitions!).This is your array of integers, with the ith row of the file specifying the ith entry of the array.

Your task is to compute the number of target values t in the interval [-10000,10000] (inclusive) such that there are distinct numbers x,y in the input file that satisfy x+y=t

Here is my entire Python code:

from collections import defaultdict
import threading

def hash_fn(n):
 M=1999993
 return n%M

def two_sum_present(target):
 global ans
 bool=True
 for x in nums:
 y = target-x
 if y in my_hash_table[hash_fn(y)] and y!=x:
 bool=False
 ans+=1
 return 1
 if bool == True:
 return 0
f = open('_6ec67df2804ff4b58ab21c12edcb21f8_algo1-programming_prob-2sum.txt')
nums=
my_hash_table = defaultdict(list) 
for line in f.readlines():
 num = int(line.split('n')[0])
 nums.append(num)
 my_hash_table[hash_fn(num)].append(num)

ans=0
for nr in range(-10000, 10001):
 thr = threading.Thread(target = two_sum_present, args=(nr,))
 thr.start()

print(ans)

Notes:

I found a page for the same problem here, but this solution uses neither hashing nor binary search, so it should take longer than my solution. This solution is essentially this:
```
ans = sum(any(n-x in nums and 2*x != n for x in nums) for n in range(-10000, 10001))
```

The text file can be found here. if someone is interested. But think of it as a big text file with 1M lines having 1 integer(+ve or -ve) on each line.

'distinct' as mentioned here only means that x+x=t is not a valid 2-sum pair; but two (x, y) pairs where x and y are not same constitute a valid pair

edited Apr 14 at 8:08

asked Apr 5 at 23:00

Piyush Singh

317

edited Apr 14 at 8:08

asked Apr 5 at 23:00

Piyush Singh

317

asked Apr 5 at 23:00

Piyush Singh

317

asked Apr 5 at 23:00

Piyush Singh

317

Please fix the indentation in your code. The hash_fn is misaligned for eg.
â€“Â hjpotter92
Apr 5 at 23:04

Sorry for bad indentation there. Fixed it.
â€“Â Piyush Singh
Apr 5 at 23:09

Welcome to Code Review! I hope you get some great answers.
â€“Â Phrancis
Apr 5 at 23:29

Your hash table will have collisions (you are not checking for equality before overwriting an existing entry) and your hash function is not really doing much for you. Try this trick, change your hash function to return the number you pass in to it, is it faster? I suspect it will be.
â€“Â Turksarama
Apr 6 at 2:31

1

@PiyushSingh I removed the link that you changed in your last edit because strange things happened when I clicked the link. Maybe the site is infected.
â€“Â miracle173
Apr 11 at 8:25

Â |Â
show 8 more comments

Please fix the indentation in your code. The hash_fn is misaligned for eg.
â€“Â hjpotter92
Apr 5 at 23:04

Sorry for bad indentation there. Fixed it.
â€“Â Piyush Singh
Apr 5 at 23:09

Welcome to Code Review! I hope you get some great answers.
â€“Â Phrancis
Apr 5 at 23:29

Your hash table will have collisions (you are not checking for equality before overwriting an existing entry) and your hash function is not really doing much for you. Try this trick, change your hash function to return the number you pass in to it, is it faster? I suspect it will be.
â€“Â Turksarama
Apr 6 at 2:31

1

@PiyushSingh I removed the link that you changed in your last edit because strange things happened when I clicked the link. Maybe the site is infected.
â€“Â miracle173
Apr 11 at 8:25

Please fix the indentation in your code. The hash_fn is misaligned for eg.
â€“Â hjpotter92
Apr 5 at 23:04

Sorry for bad indentation there. Fixed it.
â€“Â Piyush Singh
Apr 5 at 23:09

Welcome to Code Review! I hope you get some great answers.
â€“Â Phrancis
Apr 5 at 23:29

Your hash table will have collisions (you are not checking for equality before overwriting an existing entry) and your hash function is not really doing much for you. Try this trick, change your hash function to return the number you pass in to it, is it faster? I suspect it will be.
â€“Â Turksarama
Apr 6 at 2:31

@PiyushSingh I removed the link that you changed in your last edit because strange things happened when I clicked the link. Maybe the site is infected.
â€“Â miracle173
Apr 11 at 8:25

Â |Â
show 8 more comments

2 Answers
2

active

oldest

votes

up vote
2
down vote

functions

split the work you want to do in logical functions. I try to limit the work that is done in the general script (and thus the global namespace) as much as possible. Lookups of local variable are supposed to be faster, but it make the code easier to read and test as well, so win-win.

hashing

Why device your own hashing? just dumping all the numbers in a set will eliminate all the duplicates, and 1M integers is a limited amount of memory. Certainly a lot less that the defaultdict you use

global variable ans

Instead of passing the global variable ans, the easiest way would be to use the fact that in a numeric context (like sum), True equals 1 and False equals 0, so you can just return True or False in two_sum_present

my take on this:

read the file

Using the fact that a file is an iterator, I would do something like this

def read_file(filename):
 with open(filename, 'r') as file:
 return set(map(int, file))

find the target

About the same as your two_sum_present with my remarks incorporated

def find_target(numbers, target):
 for i in numbers:
 if target - i in numbers and 2 * i != target:
 return True
 return False

iterate over the interval

def find_sums(numbers, start=-10000, end=10000):
 for target in range(start, end + 1):
# print(target)
 yield find_target(numbers, target)

putting it together

def main(filename, start=-10000, end=10000):
 numbers = read_file(filename)
# print(f'numbers found: len(numbers)')
 return sum(find_sums(numbers, start, end))

if __name__ == '__main__':
 filename = 'data/_6ec67df2804ff4b58ab21c12edcb21f8_algo1-programming_prob-2sum.txt'
 print(main(filename, -100, 100))

All in all, it took 43 seconds to scan those 201 numbers on my system, to find 5 matches. Reading the file took about .6s of that time

This solution is essentially the same as sum(any(n-x in numbers and 2*x != n for x in numbers) for n in range(-100, 101)), but that one took 59s

Multithreading

If you want to, you can divide out all the call to find_target to different workers/cores/... In this case, a frozenset instead of a set might be more appropriate since it's immutable, and will give less problems with concurrency

distinct

As noted by Gareth Rees, the original question is ambiguous on what is mean with distinct numbers x,y. To cover the second interpretation, You can change the set to a collections.Counter, and change the test criterium slightly

from collections import Counter

def read_file_distinct(filename):
 with open(filename, 'r') as file:
 return Counter(map(int, file))

def find_sums_distinct(numbers, start=-10000, end=10000):
 for target in range(start, end + 1):
# print(target)
 yield find_target_distinct(numbers, target)

def find_target_distinct(numbers, target):
 for i in numbers:
 if target - i in numbers and (2 * i != target or numbers[i] > 1):
 return True
 return False

def main_distinct(filename, start=-10000, end=10000):
 numbers = read_file_distinct(filename)
# print(f'numbers found: len(numbers)')
 return sum(find_sums_distinct(numbers, start, end))

This has an effect on speed, though

%timeit main(filename, -10, 10)

4.51 s Ã‚Â± 32.5 ms per loop (mean Ã‚Â± std. dev. of 7 runs, 1 loop each)

%timeit main_distinct(filename, -10, 10)

6.42 s Ã‚Â± 48.4 ms per loop (mean Ã‚Â± std. dev. of 7 runs, 1 loop each)

distinct 2

A slightly different, faster approach can be used to tackle the 2nd interpretation

def find_sums_distinct2(numbers, repetitions,start=-10000, end=10000):
 for target in range(start, end + 1):
# print(target)
 yield find_target_distinct2(numbers, repetitions, target)

def find_target_distinct2(numbers, repetitions, target):
 for i in numbers:
 if target - i in numbers and (2 * i != target or i in repetitions):
 return True
 return False

def main_distinct2(filename, start=-10000, end=10000):
 numbers = read_file_distinct(filename)
 repetitions = k for k, v in numbers.items() if ((v > 1) and (start < k < end))
# print(repetitions)
 numbers = set(numbers)
# print(f'len(numbers) numbers found, len(repetitions) repetitions')
 return sum(find_sums_distinct2(numbers, repetitions, start, end))

%timeit main_distinct2(filename, -10, 10)

4.92 s Ã‚Â± 160 ms per loop (mean Ã‚Â± std. dev. of 7 runs, 1 loop each)

edited Apr 9 at 15:04

answered Apr 9 at 12:25

Maarten FabrÃ©

3,204214

This doesn't seem to handle the possibility of repetitions in the input.
â€“Â Gareth Rees
Apr 9 at 14:08

does it need to? The question mentions such that there are distinct numbers x,y in the input file that satisfy x+y=t. If instead by distinct they also mean 2 repetitions of the same number, then a Counter might be more appropriate, with a slightly revised test like if target - i in numbers and (2 * i != target or numbers[i] > 1)
â€“Â Maarten FabrÃ©
Apr 9 at 14:39

The question is not exactly clear on this point. From a pragmatic point of view, why note "there might be some repetitions" unless this is relevant somehow? But on the other hand, you're right that "distinct" suggests $x ne y$.
â€“Â Gareth Rees
Apr 9 at 14:44

the question is unclear on this point indeed, and I selected one interpretation. I will try to adapt the answer to include both interpretations
â€“Â Maarten FabrÃ©
Apr 9 at 14:45

@MaartenFabrÃ© I have added the meaning of 'distinct' in the question. Also in find_target function you should search only in the range [-10000-target, 10000-target], because, that's where any relevant complementary number for that number might lie; if any. Also you don't need to search for 10001 because we are not interested if any number pair add up to 10001. For completion, I am adding my own answer which ran in 2 seconds on my PC.
â€“Â Piyush Singh
Apr 9 at 17:19

add a commentÂ |Â

up vote
1
down vote

import bisect
num_set = set()
with open('_6ec67df2804ff4b58ab21c12edcb21f8_algo1-programming_prob-2sum.txt') as file:
 for num in file.read().splitlines():
 num = int(num)
 num_set.add(num)

num_list = sorted(num_set)
targets = set()
for num in num_list:
 low = bisect.bisect_left(num_list, -10000-num)
 high = bisect.bisect_right(num_list, 10000-num)
 for num_2 in num_list[low:high]:
 if num_2 != num:
 targets.add(num+num_2)

print(len(targets))

Using Maarten's code, I wrote this; but I limited my search range for each number to [-10000-number, 10000-number] because that's where any relevant number for that number will lie; if any. This helped greatly with the running time.

edited Apr 10 at 12:26

Billal BEGUERADJ

answered Apr 9 at 17:21

Piyush Singh

317

add a commentÂ |Â

Your Answer

StackExchange.ifUsing("editor", function ()
return StackExchange.using("mathjaxEditing", function ()
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
);
);
, "mathjax-editing");

StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "196"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
convertImagesToLinks: false,
noModals: false,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f191365%2fvariant-of-2-sum-problem-using-hashtable-with-multi-threading%23new-answer', 'question_page');

);

Post as a guest

Name

2 Answers
2

active

oldest

votes

2 Answers
2

active

oldest

votes

up vote
2
down vote

functions

hashing

global variable ans

my take on this:

read the file

Using the fact that a file is an iterator, I would do something like this

def read_file(filename):
 with open(filename, 'r') as file:
 return set(map(int, file))

find the target

About the same as your two_sum_present with my remarks incorporated

def find_target(numbers, target):
 for i in numbers:
 if target - i in numbers and 2 * i != target:
 return True
 return False

iterate over the interval

def find_sums(numbers, start=-10000, end=10000):
 for target in range(start, end + 1):
# print(target)
 yield find_target(numbers, target)

putting it together

def main(filename, start=-10000, end=10000):
 numbers = read_file(filename)
# print(f'numbers found: len(numbers)')
 return sum(find_sums(numbers, start, end))

if __name__ == '__main__':
 filename = 'data/_6ec67df2804ff4b58ab21c12edcb21f8_algo1-programming_prob-2sum.txt'
 print(main(filename, -100, 100))

All in all, it took 43 seconds to scan those 201 numbers on my system, to find 5 matches. Reading the file took about .6s of that time

This solution is essentially the same as sum(any(n-x in numbers and 2*x != n for x in numbers) for n in range(-100, 101)), but that one took 59s

Multithreading

distinct

from collections import Counter

def read_file_distinct(filename):
 with open(filename, 'r') as file:
 return Counter(map(int, file))

def find_sums_distinct(numbers, start=-10000, end=10000):
 for target in range(start, end + 1):
# print(target)
 yield find_target_distinct(numbers, target)

def find_target_distinct(numbers, target):
 for i in numbers:
 if target - i in numbers and (2 * i != target or numbers[i] > 1):
 return True
 return False

def main_distinct(filename, start=-10000, end=10000):
 numbers = read_file_distinct(filename)
# print(f'numbers found: len(numbers)')
 return sum(find_sums_distinct(numbers, start, end))

This has an effect on speed, though

%timeit main(filename, -10, 10)

4.51 s Ã‚Â± 32.5 ms per loop (mean Ã‚Â± std. dev. of 7 runs, 1 loop each)

%timeit main_distinct(filename, -10, 10)

6.42 s Ã‚Â± 48.4 ms per loop (mean Ã‚Â± std. dev. of 7 runs, 1 loop each)

distinct 2

A slightly different, faster approach can be used to tackle the 2nd interpretation

def find_sums_distinct2(numbers, repetitions,start=-10000, end=10000):
 for target in range(start, end + 1):
# print(target)
 yield find_target_distinct2(numbers, repetitions, target)

def find_target_distinct2(numbers, repetitions, target):
 for i in numbers:
 if target - i in numbers and (2 * i != target or i in repetitions):
 return True
 return False

def main_distinct2(filename, start=-10000, end=10000):
 numbers = read_file_distinct(filename)
 repetitions = k for k, v in numbers.items() if ((v > 1) and (start < k < end))
# print(repetitions)
 numbers = set(numbers)
# print(f'len(numbers) numbers found, len(repetitions) repetitions')
 return sum(find_sums_distinct2(numbers, repetitions, start, end))

%timeit main_distinct2(filename, -10, 10)

4.92 s Ã‚Â± 160 ms per loop (mean Ã‚Â± std. dev. of 7 runs, 1 loop each)

edited Apr 9 at 15:04

answered Apr 9 at 12:25

Maarten FabrÃ©

3,204214

This doesn't seem to handle the possibility of repetitions in the input.
â€“Â Gareth Rees
Apr 9 at 14:08

does it need to? The question mentions such that there are distinct numbers x,y in the input file that satisfy x+y=t. If instead by distinct they also mean 2 repetitions of the same number, then a Counter might be more appropriate, with a slightly revised test like if target - i in numbers and (2 * i != target or numbers[i] > 1)
â€“Â Maarten FabrÃ©
Apr 9 at 14:39

The question is not exactly clear on this point. From a pragmatic point of view, why note "there might be some repetitions" unless this is relevant somehow? But on the other hand, you're right that "distinct" suggests $x ne y$.
â€“Â Gareth Rees
Apr 9 at 14:44

the question is unclear on this point indeed, and I selected one interpretation. I will try to adapt the answer to include both interpretations
â€“Â Maarten FabrÃ©
Apr 9 at 14:45

@MaartenFabrÃ© I have added the meaning of 'distinct' in the question. Also in find_target function you should search only in the range [-10000-target, 10000-target], because, that's where any relevant complementary number for that number might lie; if any. Also you don't need to search for 10001 because we are not interested if any number pair add up to 10001. For completion, I am adding my own answer which ran in 2 seconds on my PC.
â€“Â Piyush Singh
Apr 9 at 17:19

add a commentÂ |Â

up vote
2
down vote

functions

hashing

global variable ans

my take on this:

read the file

Using the fact that a file is an iterator, I would do something like this

def read_file(filename):
 with open(filename, 'r') as file:
 return set(map(int, file))

find the target

About the same as your two_sum_present with my remarks incorporated

def find_target(numbers, target):
 for i in numbers:
 if target - i in numbers and 2 * i != target:
 return True
 return False

iterate over the interval

def find_sums(numbers, start=-10000, end=10000):
 for target in range(start, end + 1):
# print(target)
 yield find_target(numbers, target)

putting it together

def main(filename, start=-10000, end=10000):
 numbers = read_file(filename)
# print(f'numbers found: len(numbers)')
 return sum(find_sums(numbers, start, end))

if __name__ == '__main__':
 filename = 'data/_6ec67df2804ff4b58ab21c12edcb21f8_algo1-programming_prob-2sum.txt'
 print(main(filename, -100, 100))

All in all, it took 43 seconds to scan those 201 numbers on my system, to find 5 matches. Reading the file took about .6s of that time

This solution is essentially the same as sum(any(n-x in numbers and 2*x != n for x in numbers) for n in range(-100, 101)), but that one took 59s

Multithreading

distinct

from collections import Counter

def read_file_distinct(filename):
 with open(filename, 'r') as file:
 return Counter(map(int, file))

def find_sums_distinct(numbers, start=-10000, end=10000):
 for target in range(start, end + 1):
# print(target)
 yield find_target_distinct(numbers, target)

def find_target_distinct(numbers, target):
 for i in numbers:
 if target - i in numbers and (2 * i != target or numbers[i] > 1):
 return True
 return False

def main_distinct(filename, start=-10000, end=10000):
 numbers = read_file_distinct(filename)
# print(f'numbers found: len(numbers)')
 return sum(find_sums_distinct(numbers, start, end))

This has an effect on speed, though

%timeit main(filename, -10, 10)

4.51 s Ã‚Â± 32.5 ms per loop (mean Ã‚Â± std. dev. of 7 runs, 1 loop each)

%timeit main_distinct(filename, -10, 10)

6.42 s Ã‚Â± 48.4 ms per loop (mean Ã‚Â± std. dev. of 7 runs, 1 loop each)

distinct 2

A slightly different, faster approach can be used to tackle the 2nd interpretation

def find_sums_distinct2(numbers, repetitions,start=-10000, end=10000):
 for target in range(start, end + 1):
# print(target)
 yield find_target_distinct2(numbers, repetitions, target)

def find_target_distinct2(numbers, repetitions, target):
 for i in numbers:
 if target - i in numbers and (2 * i != target or i in repetitions):
 return True
 return False

def main_distinct2(filename, start=-10000, end=10000):
 numbers = read_file_distinct(filename)
 repetitions = k for k, v in numbers.items() if ((v > 1) and (start < k < end))
# print(repetitions)
 numbers = set(numbers)
# print(f'len(numbers) numbers found, len(repetitions) repetitions')
 return sum(find_sums_distinct2(numbers, repetitions, start, end))

%timeit main_distinct2(filename, -10, 10)

4.92 s Ã‚Â± 160 ms per loop (mean Ã‚Â± std. dev. of 7 runs, 1 loop each)

edited Apr 9 at 15:04

answered Apr 9 at 12:25

Maarten FabrÃ©

3,204214

This doesn't seem to handle the possibility of repetitions in the input.
â€“Â Gareth Rees
Apr 9 at 14:08

does it need to? The question mentions such that there are distinct numbers x,y in the input file that satisfy x+y=t. If instead by distinct they also mean 2 repetitions of the same number, then a Counter might be more appropriate, with a slightly revised test like if target - i in numbers and (2 * i != target or numbers[i] > 1)
â€“Â Maarten FabrÃ©
Apr 9 at 14:39

The question is not exactly clear on this point. From a pragmatic point of view, why note "there might be some repetitions" unless this is relevant somehow? But on the other hand, you're right that "distinct" suggests $x ne y$.
â€“Â Gareth Rees
Apr 9 at 14:44

the question is unclear on this point indeed, and I selected one interpretation. I will try to adapt the answer to include both interpretations
â€“Â Maarten FabrÃ©
Apr 9 at 14:45

@MaartenFabrÃ© I have added the meaning of 'distinct' in the question. Also in find_target function you should search only in the range [-10000-target, 10000-target], because, that's where any relevant complementary number for that number might lie; if any. Also you don't need to search for 10001 because we are not interested if any number pair add up to 10001. For completion, I am adding my own answer which ran in 2 seconds on my PC.
â€“Â Piyush Singh
Apr 9 at 17:19

add a commentÂ |Â

up vote
2
down vote

functions

hashing

global variable ans

my take on this:

read the file

Using the fact that a file is an iterator, I would do something like this

def read_file(filename):
 with open(filename, 'r') as file:
 return set(map(int, file))

find the target

About the same as your two_sum_present with my remarks incorporated

def find_target(numbers, target):
 for i in numbers:
 if target - i in numbers and 2 * i != target:
 return True
 return False

iterate over the interval

def find_sums(numbers, start=-10000, end=10000):
 for target in range(start, end + 1):
# print(target)
 yield find_target(numbers, target)

putting it together

def main(filename, start=-10000, end=10000):
 numbers = read_file(filename)
# print(f'numbers found: len(numbers)')
 return sum(find_sums(numbers, start, end))

if __name__ == '__main__':
 filename = 'data/_6ec67df2804ff4b58ab21c12edcb21f8_algo1-programming_prob-2sum.txt'
 print(main(filename, -100, 100))

All in all, it took 43 seconds to scan those 201 numbers on my system, to find 5 matches. Reading the file took about .6s of that time

This solution is essentially the same as sum(any(n-x in numbers and 2*x != n for x in numbers) for n in range(-100, 101)), but that one took 59s

Multithreading

distinct

from collections import Counter

def read_file_distinct(filename):
 with open(filename, 'r') as file:
 return Counter(map(int, file))

def find_sums_distinct(numbers, start=-10000, end=10000):
 for target in range(start, end + 1):
# print(target)
 yield find_target_distinct(numbers, target)

def find_target_distinct(numbers, target):
 for i in numbers:
 if target - i in numbers and (2 * i != target or numbers[i] > 1):
 return True
 return False

def main_distinct(filename, start=-10000, end=10000):
 numbers = read_file_distinct(filename)
# print(f'numbers found: len(numbers)')
 return sum(find_sums_distinct(numbers, start, end))

This has an effect on speed, though

%timeit main(filename, -10, 10)

4.51 s Ã‚Â± 32.5 ms per loop (mean Ã‚Â± std. dev. of 7 runs, 1 loop each)

%timeit main_distinct(filename, -10, 10)

6.42 s Ã‚Â± 48.4 ms per loop (mean Ã‚Â± std. dev. of 7 runs, 1 loop each)

distinct 2

A slightly different, faster approach can be used to tackle the 2nd interpretation

def find_sums_distinct2(numbers, repetitions,start=-10000, end=10000):
 for target in range(start, end + 1):
# print(target)
 yield find_target_distinct2(numbers, repetitions, target)

def find_target_distinct2(numbers, repetitions, target):
 for i in numbers:
 if target - i in numbers and (2 * i != target or i in repetitions):
 return True
 return False

def main_distinct2(filename, start=-10000, end=10000):
 numbers = read_file_distinct(filename)
 repetitions = k for k, v in numbers.items() if ((v > 1) and (start < k < end))
# print(repetitions)
 numbers = set(numbers)
# print(f'len(numbers) numbers found, len(repetitions) repetitions')
 return sum(find_sums_distinct2(numbers, repetitions, start, end))

%timeit main_distinct2(filename, -10, 10)

4.92 s Ã‚Â± 160 ms per loop (mean Ã‚Â± std. dev. of 7 runs, 1 loop each)

edited Apr 9 at 15:04

answered Apr 9 at 12:25

Maarten FabrÃ©

3,204214

functions

hashing

global variable ans

my take on this:

read the file

Using the fact that a file is an iterator, I would do something like this

def read_file(filename):
 with open(filename, 'r') as file:
 return set(map(int, file))

find the target

About the same as your two_sum_present with my remarks incorporated

def find_target(numbers, target):
 for i in numbers:
 if target - i in numbers and 2 * i != target:
 return True
 return False

iterate over the interval

def find_sums(numbers, start=-10000, end=10000):
 for target in range(start, end + 1):
# print(target)
 yield find_target(numbers, target)

putting it together

def main(filename, start=-10000, end=10000):
 numbers = read_file(filename)
# print(f'numbers found: len(numbers)')
 return sum(find_sums(numbers, start, end))

if __name__ == '__main__':
 filename = 'data/_6ec67df2804ff4b58ab21c12edcb21f8_algo1-programming_prob-2sum.txt'
 print(main(filename, -100, 100))

All in all, it took 43 seconds to scan those 201 numbers on my system, to find 5 matches. Reading the file took about .6s of that time

This solution is essentially the same as sum(any(n-x in numbers and 2*x != n for x in numbers) for n in range(-100, 101)), but that one took 59s

Multithreading

distinct

from collections import Counter

def read_file_distinct(filename):
 with open(filename, 'r') as file:
 return Counter(map(int, file))

def find_sums_distinct(numbers, start=-10000, end=10000):
 for target in range(start, end + 1):
# print(target)
 yield find_target_distinct(numbers, target)

def find_target_distinct(numbers, target):
 for i in numbers:
 if target - i in numbers and (2 * i != target or numbers[i] > 1):
 return True
 return False

def main_distinct(filename, start=-10000, end=10000):
 numbers = read_file_distinct(filename)
# print(f'numbers found: len(numbers)')
 return sum(find_sums_distinct(numbers, start, end))

This has an effect on speed, though

%timeit main(filename, -10, 10)

4.51 s Ã‚Â± 32.5 ms per loop (mean Ã‚Â± std. dev. of 7 runs, 1 loop each)

%timeit main_distinct(filename, -10, 10)

6.42 s Ã‚Â± 48.4 ms per loop (mean Ã‚Â± std. dev. of 7 runs, 1 loop each)

distinct 2

A slightly different, faster approach can be used to tackle the 2nd interpretation

def find_sums_distinct2(numbers, repetitions,start=-10000, end=10000):
 for target in range(start, end + 1):
# print(target)
 yield find_target_distinct2(numbers, repetitions, target)

def find_target_distinct2(numbers, repetitions, target):
 for i in numbers:
 if target - i in numbers and (2 * i != target or i in repetitions):
 return True
 return False

def main_distinct2(filename, start=-10000, end=10000):
 numbers = read_file_distinct(filename)
 repetitions = k for k, v in numbers.items() if ((v > 1) and (start < k < end))
# print(repetitions)
 numbers = set(numbers)
# print(f'len(numbers) numbers found, len(repetitions) repetitions')
 return sum(find_sums_distinct2(numbers, repetitions, start, end))

%timeit main_distinct2(filename, -10, 10)

4.92 s Ã‚Â± 160 ms per loop (mean Ã‚Â± std. dev. of 7 runs, 1 loop each)

edited Apr 9 at 15:04

answered Apr 9 at 12:25

Maarten FabrÃ©

3,204214

edited Apr 9 at 15:04

answered Apr 9 at 12:25

Maarten FabrÃ©

3,204214

answered Apr 9 at 12:25

Maarten FabrÃ©

3,204214

answered Apr 9 at 12:25

Maarten FabrÃ©

3,204214

This doesn't seem to handle the possibility of repetitions in the input.
â€“Â Gareth Rees
Apr 9 at 14:08

does it need to? The question mentions such that there are distinct numbers x,y in the input file that satisfy x+y=t. If instead by distinct they also mean 2 repetitions of the same number, then a Counter might be more appropriate, with a slightly revised test like if target - i in numbers and (2 * i != target or numbers[i] > 1)
â€“Â Maarten FabrÃ©
Apr 9 at 14:39

The question is not exactly clear on this point. From a pragmatic point of view, why note "there might be some repetitions" unless this is relevant somehow? But on the other hand, you're right that "distinct" suggests $x ne y$.
â€“Â Gareth Rees
Apr 9 at 14:44

the question is unclear on this point indeed, and I selected one interpretation. I will try to adapt the answer to include both interpretations
â€“Â Maarten FabrÃ©
Apr 9 at 14:45

@MaartenFabrÃ© I have added the meaning of 'distinct' in the question. Also in find_target function you should search only in the range [-10000-target, 10000-target], because, that's where any relevant complementary number for that number might lie; if any. Also you don't need to search for 10001 because we are not interested if any number pair add up to 10001. For completion, I am adding my own answer which ran in 2 seconds on my PC.
â€“Â Piyush Singh
Apr 9 at 17:19

add a commentÂ |Â

This doesn't seem to handle the possibility of repetitions in the input.
â€“Â Gareth Rees
Apr 9 at 14:08

does it need to? The question mentions such that there are distinct numbers x,y in the input file that satisfy x+y=t. If instead by distinct they also mean 2 repetitions of the same number, then a Counter might be more appropriate, with a slightly revised test like if target - i in numbers and (2 * i != target or numbers[i] > 1)
â€“Â Maarten FabrÃ©
Apr 9 at 14:39

The question is not exactly clear on this point. From a pragmatic point of view, why note "there might be some repetitions" unless this is relevant somehow? But on the other hand, you're right that "distinct" suggests $x ne y$.
â€“Â Gareth Rees
Apr 9 at 14:44

the question is unclear on this point indeed, and I selected one interpretation. I will try to adapt the answer to include both interpretations
â€“Â Maarten FabrÃ©
Apr 9 at 14:45

@MaartenFabrÃ© I have added the meaning of 'distinct' in the question. Also in find_target function you should search only in the range [-10000-target, 10000-target], because, that's where any relevant complementary number for that number might lie; if any. Also you don't need to search for 10001 because we are not interested if any number pair add up to 10001. For completion, I am adding my own answer which ran in 2 seconds on my PC.
â€“Â Piyush Singh
Apr 9 at 17:19

This doesn't seem to handle the possibility of repetitions in the input.
â€“Â Gareth Rees
Apr 9 at 14:08

does it need to? The question mentions such that there are distinct numbers x,y in the input file that satisfy x+y=t. If instead by distinct they also mean 2 repetitions of the same number, then a Counter might be more appropriate, with a slightly revised test like if target - i in numbers and (2 * i != target or numbers[i] > 1)
â€“Â Maarten FabrÃ©
Apr 9 at 14:39

The question is not exactly clear on this point. From a pragmatic point of view, why note "there might be some repetitions" unless this is relevant somehow? But on the other hand, you're right that "distinct" suggests $x ne y$.
â€“Â Gareth Rees
Apr 9 at 14:44

the question is unclear on this point indeed, and I selected one interpretation. I will try to adapt the answer to include both interpretations
â€“Â Maarten FabrÃ©
Apr 9 at 14:45

@MaartenFabrÃ© I have added the meaning of 'distinct' in the question. Also in find_target function you should search only in the range [-10000-target, 10000-target], because, that's where any relevant complementary number for that number might lie; if any. Also you don't need to search for 10001 because we are not interested if any number pair add up to 10001. For completion, I am adding my own answer which ran in 2 seconds on my PC.
â€“Â Piyush Singh
Apr 9 at 17:19

add a commentÂ |Â

up vote
1
down vote

import bisect
num_set = set()
with open('_6ec67df2804ff4b58ab21c12edcb21f8_algo1-programming_prob-2sum.txt') as file:
 for num in file.read().splitlines():
 num = int(num)
 num_set.add(num)

num_list = sorted(num_set)
targets = set()
for num in num_list:
 low = bisect.bisect_left(num_list, -10000-num)
 high = bisect.bisect_right(num_list, 10000-num)
 for num_2 in num_list[low:high]:
 if num_2 != num:
 targets.add(num+num_2)

print(len(targets))

edited Apr 10 at 12:26

Billal BEGUERADJ

answered Apr 9 at 17:21

Piyush Singh

317

add a commentÂ |Â

up vote
1
down vote

import bisect
num_set = set()
with open('_6ec67df2804ff4b58ab21c12edcb21f8_algo1-programming_prob-2sum.txt') as file:
 for num in file.read().splitlines():
 num = int(num)
 num_set.add(num)

num_list = sorted(num_set)
targets = set()
for num in num_list:
 low = bisect.bisect_left(num_list, -10000-num)
 high = bisect.bisect_right(num_list, 10000-num)
 for num_2 in num_list[low:high]:
 if num_2 != num:
 targets.add(num+num_2)

print(len(targets))

edited Apr 10 at 12:26

Billal BEGUERADJ

answered Apr 9 at 17:21

Piyush Singh

317

add a commentÂ |Â

up vote
1
down vote

import bisect
num_set = set()
with open('_6ec67df2804ff4b58ab21c12edcb21f8_algo1-programming_prob-2sum.txt') as file:
 for num in file.read().splitlines():
 num = int(num)
 num_set.add(num)

num_list = sorted(num_set)
targets = set()
for num in num_list:
 low = bisect.bisect_left(num_list, -10000-num)
 high = bisect.bisect_right(num_list, 10000-num)
 for num_2 in num_list[low:high]:
 if num_2 != num:
 targets.add(num+num_2)

print(len(targets))

edited Apr 10 at 12:26

Billal BEGUERADJ

answered Apr 9 at 17:21

Piyush Singh

317

import bisect
num_set = set()
with open('_6ec67df2804ff4b58ab21c12edcb21f8_algo1-programming_prob-2sum.txt') as file:
 for num in file.read().splitlines():
 num = int(num)
 num_set.add(num)

num_list = sorted(num_set)
targets = set()
for num in num_list:
 low = bisect.bisect_left(num_list, -10000-num)
 high = bisect.bisect_right(num_list, 10000-num)
 for num_2 in num_list[low:high]:
 if num_2 != num:
 targets.add(num+num_2)

print(len(targets))

edited Apr 10 at 12:26

Billal BEGUERADJ

answered Apr 9 at 17:21

Piyush Singh

317

edited Apr 10 at 12:26

Billal BEGUERADJ

edited Apr 10 at 12:26

Billal BEGUERADJ

edited Apr 10 at 12:26

Billal BEGUERADJ

answered Apr 9 at 17:21

Piyush Singh

317

answered Apr 9 at 17:21

Piyush Singh

317

answered Apr 9 at 17:21

Piyush Singh

317

add a commentÂ |Â

draft saved

draft discarded

draft saved

draft discarded

Post as a guest

Name

Variant of 2-SUM Problem using Hashtable with multi-threading

2 Answers 2

functions

hashing

global variable ans

my take on this:

read the file

find the target

iterate over the interval

putting it together

Multithreading

distinct

distinct 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

2 Answers 2

2 Answers 2

functions

hashing

global variable ans

my take on this:

read the file

find the target

iterate over the interval

putting it together

Multithreading

distinct

distinct 2

functions

hashing

global variable ans

my take on this:

read the file

find the target

iterate over the interval

putting it together

Multithreading

distinct

distinct 2

functions

hashing

global variable ans

my take on this:

read the file

find the target

iterate over the interval

putting it together

Multithreading

distinct

distinct 2

functions

hashing

global variable ans

my take on this:

read the file

find the target

iterate over the interval

putting it together

Multithreading

distinct

distinct 2

Sign up or log in

Post as a guest

Post as a guest

Sign up or log in

Post as a guest

Sign up or log in

Post as a guest

Sign up or log in

Post as a guest

Popular posts from this blog

Chat program with C++ and SFML

Read an image with ADNS2610 optical sensor and Arduino Uno

Read files from a directory using Promises

2 Answers
2

2 Answers
2

2 Answers
2