Python methods to reduce running times KTNS algorithm
Clash Royale CLAN TAG#URR8PPP
.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty margin-bottom:0;
up vote
7
down vote
favorite
I have been trying to develop an algorithm called keep the tool needed soonest (for more information about the KTNS explanation (page 3)) but during the simulations, I have realized that it takes too much time to run.
I want to decrease the running times and checking other previous questions about how to fast python coding Is Python slower than Java/C#? [closed] I have found several solutions, but I don't know how to implement them in my code.
On my computer it takes 0.004999876022338867 seconds, but the main problem is that for the whole program this function is called 13,000 times. I would like to reduce as much as possible the times but I don't know how. The fact of adapting the dictionaries to list is a good idea, but I don't know how to keep doing the same calculations. I have tried to ask in StackOverFlow but I've been said that here is a suitable place to do this kind of question. Here I attach my whole code, if you have any suggestion to improve it please don't hesitate to share with me.
import sets
import numpy
import copy
import time
J= 1: [6, 7],2: [2, 3], 3: [1, 6, 9], 4: [1, 5, 9], 5: [5, 8, 10], 6: [1, 3, 6, 8], 7: [5, 6, 8, 9], 8: [5, 7, 8], 9: [1, 4, 5, 8], 10: [1, 2, 4, 10]
def KTNS(sigma=[10, 9, 4, 1, 6, 3, 7, 5, 2, 8], Jobs=J, m=10 ,capacity=4 ):
t0=time.time()
Tools =
Lin=
n=len(sigma)
for i in range(1,m+1):
for e in sigma:
if i in Jobs[e]:
Tools[i]=sets.Set()
count = 1
available_tools=sets.Set()
for e in sigma:
for i in Jobs[e]:
Tools[i].add(count)
available_tools.add(i)
count+=1
Tin=copy.deepcopy(Tools)
for e in Tin:
Lin[e]=min(Tin[e])
count=1
J = numpy.array([0] *m)
W = numpy.array([[0] * m] * n)
while count<=len(sigma):
for e in Tools:
if len(available_tools)<capacity:
reference=len(available_tools)
else:
reference=capacity
while numpy.count_nonzero(J == 1) <reference:
min_value = min(Lin.itervalues())
min_keys = [k for k in Lin if Lin[k] == min_value]
temp = min_keys[0] #min(Lin, key=Lin.get)
if min_value>count:
if len(min_keys)>=2:
if count==1:
J[temp - 1] = 1
Lin[temp] = '-'
else:
J0=W[count-2]
k=0
for elements in min_keys: #tested
if numpy.count_nonzero(J == 1) < reference:
if J0[elements-1]==1:
J[elements-1]=1
Lin[elements]='-'
k=1
else:
pass
else:
pass
if k==0:
J[temp - 1] = 1
Lin[temp] = '-'
else:
J[temp - 1] = 1
Lin[temp] = '-'
else:
J[temp-1]=1
Lin[temp] = '-'
Tin[e].discard(count)
for element in Tin:
try:
Lin[element] = min(Tin[element])
except ValueError:
Tin[element]=sets.Set([len(sigma)+1])
Lin[element]=len(sigma)+1
W[count-1]=J
J= numpy.array([0] *m)
count+=1
Cost=0
for e in range(1,len(sigma)):
temp=W[e]-W[e-1]
temp[temp < 0] = 0
Cost+=sum(temp)
return Cost+capacity,time.time()-t0
python algorithm python-2.7 genetic-algorithm
add a comment |Â
up vote
7
down vote
favorite
I have been trying to develop an algorithm called keep the tool needed soonest (for more information about the KTNS explanation (page 3)) but during the simulations, I have realized that it takes too much time to run.
I want to decrease the running times and checking other previous questions about how to fast python coding Is Python slower than Java/C#? [closed] I have found several solutions, but I don't know how to implement them in my code.
On my computer it takes 0.004999876022338867 seconds, but the main problem is that for the whole program this function is called 13,000 times. I would like to reduce as much as possible the times but I don't know how. The fact of adapting the dictionaries to list is a good idea, but I don't know how to keep doing the same calculations. I have tried to ask in StackOverFlow but I've been said that here is a suitable place to do this kind of question. Here I attach my whole code, if you have any suggestion to improve it please don't hesitate to share with me.
import sets
import numpy
import copy
import time
J= 1: [6, 7],2: [2, 3], 3: [1, 6, 9], 4: [1, 5, 9], 5: [5, 8, 10], 6: [1, 3, 6, 8], 7: [5, 6, 8, 9], 8: [5, 7, 8], 9: [1, 4, 5, 8], 10: [1, 2, 4, 10]
def KTNS(sigma=[10, 9, 4, 1, 6, 3, 7, 5, 2, 8], Jobs=J, m=10 ,capacity=4 ):
t0=time.time()
Tools =
Lin=
n=len(sigma)
for i in range(1,m+1):
for e in sigma:
if i in Jobs[e]:
Tools[i]=sets.Set()
count = 1
available_tools=sets.Set()
for e in sigma:
for i in Jobs[e]:
Tools[i].add(count)
available_tools.add(i)
count+=1
Tin=copy.deepcopy(Tools)
for e in Tin:
Lin[e]=min(Tin[e])
count=1
J = numpy.array([0] *m)
W = numpy.array([[0] * m] * n)
while count<=len(sigma):
for e in Tools:
if len(available_tools)<capacity:
reference=len(available_tools)
else:
reference=capacity
while numpy.count_nonzero(J == 1) <reference:
min_value = min(Lin.itervalues())
min_keys = [k for k in Lin if Lin[k] == min_value]
temp = min_keys[0] #min(Lin, key=Lin.get)
if min_value>count:
if len(min_keys)>=2:
if count==1:
J[temp - 1] = 1
Lin[temp] = '-'
else:
J0=W[count-2]
k=0
for elements in min_keys: #tested
if numpy.count_nonzero(J == 1) < reference:
if J0[elements-1]==1:
J[elements-1]=1
Lin[elements]='-'
k=1
else:
pass
else:
pass
if k==0:
J[temp - 1] = 1
Lin[temp] = '-'
else:
J[temp - 1] = 1
Lin[temp] = '-'
else:
J[temp-1]=1
Lin[temp] = '-'
Tin[e].discard(count)
for element in Tin:
try:
Lin[element] = min(Tin[element])
except ValueError:
Tin[element]=sets.Set([len(sigma)+1])
Lin[element]=len(sigma)+1
W[count-1]=J
J= numpy.array([0] *m)
count+=1
Cost=0
for e in range(1,len(sigma)):
temp=W[e]-W[e-1]
temp[temp < 0] = 0
Cost+=sum(temp)
return Cost+capacity,time.time()-t0
python algorithm python-2.7 genetic-algorithm
4
Welcome to Code Review! I hope you get some great answers.
â Phrancis
May 1 at 17:55
Do you use the returned elapsed time in your whole program or is it just to know roughly how much time the function takes on your machine?
â Mathias Ettinger
May 4 at 9:40
@MathiasEttinger it is only to know how much time it takes. I want to decrease the time because nowadays due to the usage of if statements the python loop isn't optimal. Using less time the main algorithm will be able to do more iterations in less time. My main problem is that I don't know how to do it without the ifs, because I don't know how to define the set of Tools for every step without comparing with an if statement.
â A.Piquer
May 4 at 9:54
add a comment |Â
up vote
7
down vote
favorite
up vote
7
down vote
favorite
I have been trying to develop an algorithm called keep the tool needed soonest (for more information about the KTNS explanation (page 3)) but during the simulations, I have realized that it takes too much time to run.
I want to decrease the running times and checking other previous questions about how to fast python coding Is Python slower than Java/C#? [closed] I have found several solutions, but I don't know how to implement them in my code.
On my computer it takes 0.004999876022338867 seconds, but the main problem is that for the whole program this function is called 13,000 times. I would like to reduce as much as possible the times but I don't know how. The fact of adapting the dictionaries to list is a good idea, but I don't know how to keep doing the same calculations. I have tried to ask in StackOverFlow but I've been said that here is a suitable place to do this kind of question. Here I attach my whole code, if you have any suggestion to improve it please don't hesitate to share with me.
import sets
import numpy
import copy
import time
J= 1: [6, 7],2: [2, 3], 3: [1, 6, 9], 4: [1, 5, 9], 5: [5, 8, 10], 6: [1, 3, 6, 8], 7: [5, 6, 8, 9], 8: [5, 7, 8], 9: [1, 4, 5, 8], 10: [1, 2, 4, 10]
def KTNS(sigma=[10, 9, 4, 1, 6, 3, 7, 5, 2, 8], Jobs=J, m=10 ,capacity=4 ):
t0=time.time()
Tools =
Lin=
n=len(sigma)
for i in range(1,m+1):
for e in sigma:
if i in Jobs[e]:
Tools[i]=sets.Set()
count = 1
available_tools=sets.Set()
for e in sigma:
for i in Jobs[e]:
Tools[i].add(count)
available_tools.add(i)
count+=1
Tin=copy.deepcopy(Tools)
for e in Tin:
Lin[e]=min(Tin[e])
count=1
J = numpy.array([0] *m)
W = numpy.array([[0] * m] * n)
while count<=len(sigma):
for e in Tools:
if len(available_tools)<capacity:
reference=len(available_tools)
else:
reference=capacity
while numpy.count_nonzero(J == 1) <reference:
min_value = min(Lin.itervalues())
min_keys = [k for k in Lin if Lin[k] == min_value]
temp = min_keys[0] #min(Lin, key=Lin.get)
if min_value>count:
if len(min_keys)>=2:
if count==1:
J[temp - 1] = 1
Lin[temp] = '-'
else:
J0=W[count-2]
k=0
for elements in min_keys: #tested
if numpy.count_nonzero(J == 1) < reference:
if J0[elements-1]==1:
J[elements-1]=1
Lin[elements]='-'
k=1
else:
pass
else:
pass
if k==0:
J[temp - 1] = 1
Lin[temp] = '-'
else:
J[temp - 1] = 1
Lin[temp] = '-'
else:
J[temp-1]=1
Lin[temp] = '-'
Tin[e].discard(count)
for element in Tin:
try:
Lin[element] = min(Tin[element])
except ValueError:
Tin[element]=sets.Set([len(sigma)+1])
Lin[element]=len(sigma)+1
W[count-1]=J
J= numpy.array([0] *m)
count+=1
Cost=0
for e in range(1,len(sigma)):
temp=W[e]-W[e-1]
temp[temp < 0] = 0
Cost+=sum(temp)
return Cost+capacity,time.time()-t0
python algorithm python-2.7 genetic-algorithm
I have been trying to develop an algorithm called keep the tool needed soonest (for more information about the KTNS explanation (page 3)) but during the simulations, I have realized that it takes too much time to run.
I want to decrease the running times and checking other previous questions about how to fast python coding Is Python slower than Java/C#? [closed] I have found several solutions, but I don't know how to implement them in my code.
On my computer it takes 0.004999876022338867 seconds, but the main problem is that for the whole program this function is called 13,000 times. I would like to reduce as much as possible the times but I don't know how. The fact of adapting the dictionaries to list is a good idea, but I don't know how to keep doing the same calculations. I have tried to ask in StackOverFlow but I've been said that here is a suitable place to do this kind of question. Here I attach my whole code, if you have any suggestion to improve it please don't hesitate to share with me.
import sets
import numpy
import copy
import time
J= 1: [6, 7],2: [2, 3], 3: [1, 6, 9], 4: [1, 5, 9], 5: [5, 8, 10], 6: [1, 3, 6, 8], 7: [5, 6, 8, 9], 8: [5, 7, 8], 9: [1, 4, 5, 8], 10: [1, 2, 4, 10]
def KTNS(sigma=[10, 9, 4, 1, 6, 3, 7, 5, 2, 8], Jobs=J, m=10 ,capacity=4 ):
t0=time.time()
Tools =
Lin=
n=len(sigma)
for i in range(1,m+1):
for e in sigma:
if i in Jobs[e]:
Tools[i]=sets.Set()
count = 1
available_tools=sets.Set()
for e in sigma:
for i in Jobs[e]:
Tools[i].add(count)
available_tools.add(i)
count+=1
Tin=copy.deepcopy(Tools)
for e in Tin:
Lin[e]=min(Tin[e])
count=1
J = numpy.array([0] *m)
W = numpy.array([[0] * m] * n)
while count<=len(sigma):
for e in Tools:
if len(available_tools)<capacity:
reference=len(available_tools)
else:
reference=capacity
while numpy.count_nonzero(J == 1) <reference:
min_value = min(Lin.itervalues())
min_keys = [k for k in Lin if Lin[k] == min_value]
temp = min_keys[0] #min(Lin, key=Lin.get)
if min_value>count:
if len(min_keys)>=2:
if count==1:
J[temp - 1] = 1
Lin[temp] = '-'
else:
J0=W[count-2]
k=0
for elements in min_keys: #tested
if numpy.count_nonzero(J == 1) < reference:
if J0[elements-1]==1:
J[elements-1]=1
Lin[elements]='-'
k=1
else:
pass
else:
pass
if k==0:
J[temp - 1] = 1
Lin[temp] = '-'
else:
J[temp - 1] = 1
Lin[temp] = '-'
else:
J[temp-1]=1
Lin[temp] = '-'
Tin[e].discard(count)
for element in Tin:
try:
Lin[element] = min(Tin[element])
except ValueError:
Tin[element]=sets.Set([len(sigma)+1])
Lin[element]=len(sigma)+1
W[count-1]=J
J= numpy.array([0] *m)
count+=1
Cost=0
for e in range(1,len(sigma)):
temp=W[e]-W[e-1]
temp[temp < 0] = 0
Cost+=sum(temp)
return Cost+capacity,time.time()-t0
python algorithm python-2.7 genetic-algorithm
asked May 1 at 17:36
A.Piquer
505
505
4
Welcome to Code Review! I hope you get some great answers.
â Phrancis
May 1 at 17:55
Do you use the returned elapsed time in your whole program or is it just to know roughly how much time the function takes on your machine?
â Mathias Ettinger
May 4 at 9:40
@MathiasEttinger it is only to know how much time it takes. I want to decrease the time because nowadays due to the usage of if statements the python loop isn't optimal. Using less time the main algorithm will be able to do more iterations in less time. My main problem is that I don't know how to do it without the ifs, because I don't know how to define the set of Tools for every step without comparing with an if statement.
â A.Piquer
May 4 at 9:54
add a comment |Â
4
Welcome to Code Review! I hope you get some great answers.
â Phrancis
May 1 at 17:55
Do you use the returned elapsed time in your whole program or is it just to know roughly how much time the function takes on your machine?
â Mathias Ettinger
May 4 at 9:40
@MathiasEttinger it is only to know how much time it takes. I want to decrease the time because nowadays due to the usage of if statements the python loop isn't optimal. Using less time the main algorithm will be able to do more iterations in less time. My main problem is that I don't know how to do it without the ifs, because I don't know how to define the set of Tools for every step without comparing with an if statement.
â A.Piquer
May 4 at 9:54
4
4
Welcome to Code Review! I hope you get some great answers.
â Phrancis
May 1 at 17:55
Welcome to Code Review! I hope you get some great answers.
â Phrancis
May 1 at 17:55
Do you use the returned elapsed time in your whole program or is it just to know roughly how much time the function takes on your machine?
â Mathias Ettinger
May 4 at 9:40
Do you use the returned elapsed time in your whole program or is it just to know roughly how much time the function takes on your machine?
â Mathias Ettinger
May 4 at 9:40
@MathiasEttinger it is only to know how much time it takes. I want to decrease the time because nowadays due to the usage of if statements the python loop isn't optimal. Using less time the main algorithm will be able to do more iterations in less time. My main problem is that I don't know how to do it without the ifs, because I don't know how to define the set of Tools for every step without comparing with an if statement.
â A.Piquer
May 4 at 9:54
@MathiasEttinger it is only to know how much time it takes. I want to decrease the time because nowadays due to the usage of if statements the python loop isn't optimal. Using less time the main algorithm will be able to do more iterations in less time. My main problem is that I don't know how to do it without the ifs, because I don't know how to define the set of Tools for every step without comparing with an if statement.
â A.Piquer
May 4 at 9:54
add a comment |Â
3 Answers
3
active
oldest
votes
up vote
6
down vote
accepted
1. Bug
The code in the post does not compute the correct sequence of tool insertions for the Keep Tool Needed Soonest (KTNS) algorithm.
This is easiest to see if we print out the tools in the machine at each stage, by adding a line like
print(numpy.nonzero(J)[0] + 1)
near the end of the main loop. The first three lines of output are:
[ 1 2 4 10]
[1 4 5 8]
[1 5 6 9]
The third job in the schedule is job 4, which requires tools number 1, 5, and 9. So why did the code insert tool number 6? This is not needed by job 4, and so according to the KTNS algorithm it should not be inserted into the machine. Something has gone wrong.
This bug causes KTNS
to return the wrong result for the example: it returns a cost of 17, but the KTNS algorithm only requires 16 tool insertions for this schedule.
2. Argument
I started to write a review of the code in the post, but the major problem is that the wrong data structures have been selected, so I think it will be more helpful to explain how to approach this kind of problem from first principles.
Tang and Denardo (1986) give a mathematical description of KTNS procedure in section 3 of their paper. Mathematical notation is used in scientific papers because of its compactness and lack of ambiguity. But that doesn't mean that it is a good idea to translate the notation directly into source code. Before writing code, you need to solve two problems:
Choosing good names for variables. In the context of a paper, if you don't know what $L(i, n)$ is supposed to mean, then you can refer to its formal definition
$L(i, n) = minleftm:mâÂÂT(i, n)right$
or to its informal description:
the integer $L(i, n)$ is the first instant at or after instant $n$ at which tool $i$ is needed.
But in the context of source code it is not so easy to refer to these definitions, and so naming a variable
Lin
is only helpful to the reader who is following along with a copy of the paper. It is better to choose variable names that directly communicate the meaning, and to add comments (if necessary) to make that meaning unambiguous, for example:# Mapping from tool to the next instant at which it will be needed.
next_needed = ...Choosing data structures. The mathematical notation $L(i, n)$ doesn't give any clues as to how this should be implemented. Should you precompute a table of values for all tools $i$ and instants $n$? Or should you store for each tool the instants where that tool is needed in a sorted list and use bisection to find the first instant at or after $n$? Or should you store for each tool the instants where that tool is needed in a sorted queue, and pop its queue when a tool is used, so that the first element of queue is always the next instant when the tool will be needed? Or what? The choice of data structure is vital both in getting the most efficient implementation and in making the code simple and clear to the reader.
3. Decode
The key to finding the best data structures is to have a high level understanding of the algorithm, so that you are not trapped by the notation. The idea is to read the mathematics and decode it into concepts that you can work with. So when the paper says "Set $J_i = 1$" you figure out that this means "Insert tool $i$ into the machine".
This is the procedure:
Step 1. Set $J_i = 1$ for $C$ values of $i$ having minimal values of $L(i, 0)$. Break the ties arbitrarily. Set $J_i = 0$ for the remaining $M - C$ values of $i$. Set $n = 1$.
Step 2. Set $W_n = J$. Stop if $n = N$.
Step 3. If each $i$ having $L(i, n) = n$ also has $J_i = 1$, set $n = n + 1$ and go to Step 2.
Step 4. Pick $i$ having $L(i, n) = n$ and $J_i = 0$. Set $J_i = 1$.
Step 5. Set $J_k = 0$ for a $k$ that maximizes $L(p, n)$ over $p: J_p = 1$. Go to Step 3.
Here are my thoughts as I read the procedure:
- Steps 2 to step 5 are a loop over $n$.
- $n$ represents the current instant (job) in the schedule.
- $n$ starts at 1 (the first job) and goes up to $N$ (the last job). When implementing this in code, we will need to change this so that instants run from 0 up to (but not including) $N$, to avoid having to subtract 1 all the time.
- Step 1 is initialization: we fill the machine with the tools that will needed the soonest.
- In step 2, $W_n = J$ is just recording the set of tools that are now in the machine, which we don't care about since we are only going to be computing the cost.
- Step 3 is asking "does the machine contain every tool that is needed by the current job?" If so, it can go on to the next job.
- Step 4 only gets run if the machine lacks some tools that are needed. So we pick one of them, tool $i$, and insert it into the machine.
- Step 5 picks a tool in the machine that is not currently needed and removes it from the machine. The tool is the one that won't be needed for the longest.
4. Pseudo-code
Now that you have a good high-level understanding of the algorithm, the next step is to describe it in pseudo-code. Something like this:
While the machine is under capacity:
- Insert the tool that is needed soonest.
For each job in the schedule:
For each tool needed by the job:
- Insert the tool into the machine.
While the machine is over capacity:
- Remove the tool that won't be needed for the longest.
5. Data structures
Now examine the pseudo-code and figure out the data structures that are needed.
During initialization (but not afterwards), we need to be able to find the tool that is needed soonest.
We need to store the collection of tools that are currently in the machine.
We need to be able to find the size of this collection, so that we can determine whether the machine is over or under capacity.
This collection needs to support efficient insertion and deletion.
We need to be able to efficiently find the tool in the machine that won't be needed for the longest.
Point 1 is easy enough: for each tool, we can find the jobs that require that tool, and so find the first job for each tool, and use heapq.nsmallest
to pick the tools we want.
But points 2 to 5 are hard to satisfy. Here are three alternative implementation strategies:
The requirement to repeatedly and efficiently extract the minimum or maximum element (point 5) suggests some kind of priority queue. In the Python standard library the closest thing we have to a priority queue is a heap and heaps don't support efficient deletion. However, there is a workaround for deleting items from heaps, namely to leave the item in the heap but mark it as deleted, and ignore deleted items when they are extracted. But in order to satisfy point 3 we would need to ignore the deleted items so we would need to maintain a separate count.
Use a data structure from a third-party library. It looks as if a
SortedSet
from thesortedcontainers
library would meet our requirements.Use an ordinary data structure like a
set
, and accept some inefficiency in point 5 by callingmax
(taking time proportional to the capacity of the machine). When the capacity is small, this is acceptable.
I'm going to use strategy 3 because it's the simplest and because in your example the capacity is 4, which is small enough for the inefficiency not to matter.
6. Implementation
from collections import defaultdict, deque
from heapq import nsmallest
def schedule_cost(schedule, requirements, capacity):
"""Return the cost of the schedule, that is, the number of tool insertions.
Parameters:
schedule -- Sequence of jobs.
requirements -- Mapping from job to iterable of required tools.
capacity -- Number of tools that the machine can hold at once.
Jobs and tools may be any hashable objects.
"""
# Mapping from tool to deque of instants when the tool will be needed.
tool_instants = defaultdict(deque)
for i, job in enumerate(schedule):
assert len(requirements[job]) <= capacity
for tool in requirements[job]:
tool_instants[tool].append(i)
for instants in tool_instants.values():
instants.append(len(schedule))
# Point in schedule when tool will next be needed.
def next_needed(tool):
return tool_instants[tool][0]
# Set of tools in the machine.
machine = set(nsmallest(capacity, tool_instants, key=next_needed))
cost = len(machine)
for job in schedule:
# Set of tools that need to be inserted.
new_tools = tool for tool in requirements[job] if tool not in machine
if new_tools:
cost += len(new_tools)
# Remove tools if necessary to avoid going over capacity.
for _ in range(len(machine) + len(new_tools) - capacity):
machine.remove(max(machine, key=next_needed))
machine.update(new_tools)
for tool in requirements[job]:
tool_instants[tool].popleft()
return cost
7. Performance
The revised code in ç6 is about 20 times faster than the code in the post:
>>> test = lambda:schedule_cost([10, 9, 4, 1, 6, 3, 7, 5, 2, 8], J, 4)
>>> from timeit import timeit
>>> KTNS()[1] / timeit(test, number=1)
20.335097001763668
8. Tool decisions
If you also want to return the decisions about which tools to insert (in addition to the cost), then that's easy. The newly inserted tools at each instant are in the new_tools
set, and the tools in the machine at each instant are in the machine
set. Take a copy of whichever one of these you need to keep.
For example, if you want to keep copies of machine
then you would start with something like:
machine_record =
and then at the end of each loop, append a copy of machine
to the record:
machine_record.append(set(machine))
and findally return the record along with the cost:
return cost, machine_record
Well... I don't know how to thank you because five hours ago I realized that my code was inefficient in some cases. I've been trying a while to look for the problem and I couldn't find it. Your code is hundreds of times faster than my original version and provides the exact amounts of changes. All the literature is inside the code and you understood perfectly the idea of the problem. Btw how can I get the whole sequence W? I have tried to copy from the machine set updated with new_tools but I couldn't.
â A.Piquer
May 5 at 23:16
1
I only focused on the given code for my answer and wanted to take a closer look at the paper today⦠Guess I won't have to anymore :)
â Mathias Ettinger
May 7 at 8:02
1
@A.Piquer: See updated answer.
â Gareth Rees
May 7 at 9:12
add a comment |Â
up vote
4
down vote
For starter, you should get a consistent coding style. Naming and spacing are usually off and makes the code harder to read. You should get yourself familiar with PEP8 and apply it to make your Python code look like Python code to others. As regard to variable names, I get that Tin
, Lin
, W
, and J
come from the literature youâÂÂre using; so keeping them should be fine. Other variables are way more obscure, such as m
or n
(are they the amount of jobs and the amount of tools?) or event count
for that matter.
Second, you shouldn't need to worry about timing in the very function you want to time. Use the timeit
module instead. And also wrap the timing code in an if __name__ == '__main__'
guard.
Then we will need to examine the code more closely:
import sets
is deprecated since Python 2.6, use the builtinset
instead;copy.deepcopy()
is very heavy, better build a new dictionnary by only copying the containedset
s;- the usual import of
numpy
isimport numpy as np
; - you don't need empty
else
s clauses, they are just noise to the reader; - since you copy the
Tools
dictionnary before your loop, you don't need to build it at each function call, extract that part to an initialisation function and provide the dictionnary directly toKTNS
rather thanJobs
andsigma
; - your
while
loop would read better asfor count in range(1, n + 1):
; - once
J
reaches its requirements, it is not modified for subsequentse
inTools
, thus you can extract that part from the innerfor
loop; reference
is always the same in every iterations, better compute it once at the start of the function;- you can avoid using a
try:â¦except ValueError:
by addingn + 1
in each set ofTin
before the main loop: youâÂÂre guaranteed that this element will never be discarded; - you can compute the final cost more easily by using the potential of numpy for vectorized operations, fancy indexing, and
np.sum
; - using an other type of value (such as
'-'
instead of a number) to denote an absence of an element is prone to error and misunderstanding, you're better off deleting the key since the last loop will put it back in place anyway.
Proposed improvements follows, a word of caution though, I may have inverted the meaning of m
and n
since it is very difficult to know which is which (again, better naming would solve the problem):
from collections import defaultdict
import numpy as np
JOBS =
1: [6, 7],
2: [2, 3],
3: [1, 6, 9],
4: [1, 5, 9],
5: [5, 8, 10],
6: [1, 3, 6, 8],
7: [5, 6, 8, 9],
8: [5, 7, 8],
9: [1, 4, 5, 8],
10: [1, 2, 4, 10],
def prepare_tools(jobs, sigma=(10, 9, 4, 1, 6, 3, 7, 5, 2, 8)):
tools = defaultdict(set)
for count, e in enumerate(sigma, 1):
for i in jobs[e]:
tools[i].add(count)
max_value = max(sigma) + 1
for counts_set in tools.itervalues():
counts_set.add(max_value)
return tools
def keep_the_tool_needed_soonest(tools, n, capacity=4):
Lin = e: min(counts) for e, counts in tools.iteritems()
Tin = e: counts.copy() for e, counts in tools.iteritems()
reference = min(n, capacity)
m = len(tools)
W = np.zeros((n, m), dtype=int)
for count in range(n):
job_id = count + 1
J = np.zeros(m, dtype=bool)
while J.sum() < reference:
min_value = min(Lin.itervalues())
min_keys = [k for k, v in Lin.iteritems() if v == min_value]
should_update = True
if min_value > job_id and len(min_keys) > 1 and count:
J0 = W[count - 1]
for element in min_keys:
if J0[element - 1]:
J[element - 1] = True
del Lin[element]
should_update = False
if J.sum() >= reference:
break
if should_update:
key = min_keys[0]
J[key - 1] = True
del Lin[key]
for jobs in Tin.itervalues():
jobs.discard(job_id)
for element, jobs in Tin.iteritems():
Lin[element] = min(jobs)
W[count] = J
cost = W[1:] - W[:n - 1]
return np.sum(cost[cost > 0]) + capacity
With the testing code being:
if __name__ == '__main__':
from timeit import timeit
number_of_calls = 10000
time = timeit('KTNS(JOBS)', setup='from __main__ import KTNS, JOBS', number=number_of_calls)
result = KTNS(Jobs=JOBS)
print('BEFORE', result, time / number_of_calls)
setup = """
from __main__ import JOBS, prepare_tools, keep_the_tool_needed_soonest
tools = prepare_tools(JOBS)"""
time = timeit('keep_the_tool_needed_soonest(tools, len(JOBS))', setup=setup, number=number_of_calls)
result = keep_the_tool_needed_soonest(prepare_tools(JOBS), len(JOBS))
print('AFTER', result, time / number_of_calls)
I obtain the following results:
$ python2 ktns.py
('BEFORE', 17, 0.0009421438932418824)
('AFTER', 17, 0.00019730830192565919)
Which is roughly a 5ÃÂ improvement, not thrilling but somehow acceptable. Granted the code reads way better now.
However, this code has a major flaw in which it only works with jobs and tools numbered from 1 to m
or n
. If, by any chance, your code allows for jobs being [3, 8, 12] and tools numbered from 100 to 123 for instance, your various loops and updates to J will just not work.
As you have commented the variables m and n are the number of tools (m) and the number of jobs (n, to denominate the number of steps to finish the algorithm). Talking about the try/except method, it must be done because the variable Lin is always updated with a new value, and when there aren't more values in Tin it should be updated with n+1 (to avoid the set of the tool). About your last comment, the jobs and the tools are always numbered from 1 to m or from 1 to n, but sometimes the sigma list doesn't have all the elements (e.g [2,5,6,7] as a partial solution of sigma). Any more comments?
â A.Piquer
May 4 at 16:21
@A.Piquer thetry/except
is taken care of by puttingn+1
in each set oftools
inprepare_tools
, so they are already inTin
and the set will never be empty. As forsigma
being less than the whole set of jobs⦠well, I didn't test extensivelly. It may or may not workâ¦
â Mathias Ettinger
May 4 at 16:27
But the fact is that prepare_tools doesn't do what it should. The fact is that the aim of prepare_tools is to set a Dictionary (Tools) which will agrupate for each tool in what job of the sequence sigma is needed. For example for Jobs:1:[1,2,3], 2:[3,4,5], 3:[1,3],4:[1,5],5:[10] and sigma=[1,2,4] , the Tools dictonary of sets would be: 1:[1,4], 2:[1], 3: [1,2], 4: [2] ,5:[2,4] . Do you know how to adapt the code to achieve this? I did in the way I post because I didn't know how to do faster, but your version is pretty better. Thank you
â A.Piquer
May 4 at 16:35
@A.Piquer I don't get it, the kids are the same:for e in sigma: for i in jobs[e]
I just handledcount
using enumerate and initialization using a defaultdict but the results are similar.
â Mathias Ettinger
May 4 at 17:21
I have checked another time and you are right. In the other hand, do you think that it exists a better way to do the steps of the article I attached in the question? I don't know how to achieve the steps 3 and 4, and they are the core of the algorithm. Looking for the previous steps in W it's complicated because it is needed to look for previous values. If you know a better way I would really appreciate.
â A.Piquer
May 4 at 17:28
add a comment |Â
up vote
1
down vote
Through the changes proposed by Mathias I have changed my code and nowadays works perfectly. The speed has improved a lot and the results are correct. Here I attach the whole code for the people who may be interested in the code optimization. I have added the prepare_tools function inside the keep_the_tool_needed_soonest in order to call only one function with the different arguments (the function must be called by sigma, Jobs, m, c). Any more improvements which can be done don't hesitate to share.
from collections import defaultdict
import numpy as np
def KTNS(sigma, jobs, m, capacity):
tools = defaultdict(set)
count = 1
for e in sigma:
for i in jobs[e]:
tools[i].add(count)
count += 1
steps=len(sigma)
Lin = e: min(counts) for e, counts in tools.iteritems()
Tin = e: counts.copy() for e, counts in tools.iteritems()
reference = min(len(Lin), capacity)
W = np.zeros((steps, m), dtype=int)
for count in range(steps):
job_id = count+1
J = np.zeros(m, dtype=bool)
while J.sum() < reference:
min_value = min(Lin.itervalues())
min_keys = [k for k, v in Lin.iteritems() if v == min_value]
should_update = True
if min_value > job_id : #and len(min_keys) > 1
J0 = W[count - 1]
for element in min_keys:
if J0[element - 1]:
J[element - 1] = True
Lin[element]='-'
should_update = False
if J.sum() >= reference:
break
if should_update:
key = min_keys[0]
J[key-1] = True
Lin[key]='-'
for jobs in Tin:
Tin[jobs].discard(job_id)
for element, jobs in Tin.iteritems():
try:
Lin[element] = min(jobs)
except ValueError:
Tin[element].add(steps+1)
Lin[element]=steps+1
W[count] = J
cost = W[1:] - W[:steps - 1]
return np.sum(cost[cost > 0])
add a comment |Â
3 Answers
3
active
oldest
votes
3 Answers
3
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
6
down vote
accepted
1. Bug
The code in the post does not compute the correct sequence of tool insertions for the Keep Tool Needed Soonest (KTNS) algorithm.
This is easiest to see if we print out the tools in the machine at each stage, by adding a line like
print(numpy.nonzero(J)[0] + 1)
near the end of the main loop. The first three lines of output are:
[ 1 2 4 10]
[1 4 5 8]
[1 5 6 9]
The third job in the schedule is job 4, which requires tools number 1, 5, and 9. So why did the code insert tool number 6? This is not needed by job 4, and so according to the KTNS algorithm it should not be inserted into the machine. Something has gone wrong.
This bug causes KTNS
to return the wrong result for the example: it returns a cost of 17, but the KTNS algorithm only requires 16 tool insertions for this schedule.
2. Argument
I started to write a review of the code in the post, but the major problem is that the wrong data structures have been selected, so I think it will be more helpful to explain how to approach this kind of problem from first principles.
Tang and Denardo (1986) give a mathematical description of KTNS procedure in section 3 of their paper. Mathematical notation is used in scientific papers because of its compactness and lack of ambiguity. But that doesn't mean that it is a good idea to translate the notation directly into source code. Before writing code, you need to solve two problems:
Choosing good names for variables. In the context of a paper, if you don't know what $L(i, n)$ is supposed to mean, then you can refer to its formal definition
$L(i, n) = minleftm:mâÂÂT(i, n)right$
or to its informal description:
the integer $L(i, n)$ is the first instant at or after instant $n$ at which tool $i$ is needed.
But in the context of source code it is not so easy to refer to these definitions, and so naming a variable
Lin
is only helpful to the reader who is following along with a copy of the paper. It is better to choose variable names that directly communicate the meaning, and to add comments (if necessary) to make that meaning unambiguous, for example:# Mapping from tool to the next instant at which it will be needed.
next_needed = ...Choosing data structures. The mathematical notation $L(i, n)$ doesn't give any clues as to how this should be implemented. Should you precompute a table of values for all tools $i$ and instants $n$? Or should you store for each tool the instants where that tool is needed in a sorted list and use bisection to find the first instant at or after $n$? Or should you store for each tool the instants where that tool is needed in a sorted queue, and pop its queue when a tool is used, so that the first element of queue is always the next instant when the tool will be needed? Or what? The choice of data structure is vital both in getting the most efficient implementation and in making the code simple and clear to the reader.
3. Decode
The key to finding the best data structures is to have a high level understanding of the algorithm, so that you are not trapped by the notation. The idea is to read the mathematics and decode it into concepts that you can work with. So when the paper says "Set $J_i = 1$" you figure out that this means "Insert tool $i$ into the machine".
This is the procedure:
Step 1. Set $J_i = 1$ for $C$ values of $i$ having minimal values of $L(i, 0)$. Break the ties arbitrarily. Set $J_i = 0$ for the remaining $M - C$ values of $i$. Set $n = 1$.
Step 2. Set $W_n = J$. Stop if $n = N$.
Step 3. If each $i$ having $L(i, n) = n$ also has $J_i = 1$, set $n = n + 1$ and go to Step 2.
Step 4. Pick $i$ having $L(i, n) = n$ and $J_i = 0$. Set $J_i = 1$.
Step 5. Set $J_k = 0$ for a $k$ that maximizes $L(p, n)$ over $p: J_p = 1$. Go to Step 3.
Here are my thoughts as I read the procedure:
- Steps 2 to step 5 are a loop over $n$.
- $n$ represents the current instant (job) in the schedule.
- $n$ starts at 1 (the first job) and goes up to $N$ (the last job). When implementing this in code, we will need to change this so that instants run from 0 up to (but not including) $N$, to avoid having to subtract 1 all the time.
- Step 1 is initialization: we fill the machine with the tools that will needed the soonest.
- In step 2, $W_n = J$ is just recording the set of tools that are now in the machine, which we don't care about since we are only going to be computing the cost.
- Step 3 is asking "does the machine contain every tool that is needed by the current job?" If so, it can go on to the next job.
- Step 4 only gets run if the machine lacks some tools that are needed. So we pick one of them, tool $i$, and insert it into the machine.
- Step 5 picks a tool in the machine that is not currently needed and removes it from the machine. The tool is the one that won't be needed for the longest.
4. Pseudo-code
Now that you have a good high-level understanding of the algorithm, the next step is to describe it in pseudo-code. Something like this:
While the machine is under capacity:
- Insert the tool that is needed soonest.
For each job in the schedule:
For each tool needed by the job:
- Insert the tool into the machine.
While the machine is over capacity:
- Remove the tool that won't be needed for the longest.
5. Data structures
Now examine the pseudo-code and figure out the data structures that are needed.
During initialization (but not afterwards), we need to be able to find the tool that is needed soonest.
We need to store the collection of tools that are currently in the machine.
We need to be able to find the size of this collection, so that we can determine whether the machine is over or under capacity.
This collection needs to support efficient insertion and deletion.
We need to be able to efficiently find the tool in the machine that won't be needed for the longest.
Point 1 is easy enough: for each tool, we can find the jobs that require that tool, and so find the first job for each tool, and use heapq.nsmallest
to pick the tools we want.
But points 2 to 5 are hard to satisfy. Here are three alternative implementation strategies:
The requirement to repeatedly and efficiently extract the minimum or maximum element (point 5) suggests some kind of priority queue. In the Python standard library the closest thing we have to a priority queue is a heap and heaps don't support efficient deletion. However, there is a workaround for deleting items from heaps, namely to leave the item in the heap but mark it as deleted, and ignore deleted items when they are extracted. But in order to satisfy point 3 we would need to ignore the deleted items so we would need to maintain a separate count.
Use a data structure from a third-party library. It looks as if a
SortedSet
from thesortedcontainers
library would meet our requirements.Use an ordinary data structure like a
set
, and accept some inefficiency in point 5 by callingmax
(taking time proportional to the capacity of the machine). When the capacity is small, this is acceptable.
I'm going to use strategy 3 because it's the simplest and because in your example the capacity is 4, which is small enough for the inefficiency not to matter.
6. Implementation
from collections import defaultdict, deque
from heapq import nsmallest
def schedule_cost(schedule, requirements, capacity):
"""Return the cost of the schedule, that is, the number of tool insertions.
Parameters:
schedule -- Sequence of jobs.
requirements -- Mapping from job to iterable of required tools.
capacity -- Number of tools that the machine can hold at once.
Jobs and tools may be any hashable objects.
"""
# Mapping from tool to deque of instants when the tool will be needed.
tool_instants = defaultdict(deque)
for i, job in enumerate(schedule):
assert len(requirements[job]) <= capacity
for tool in requirements[job]:
tool_instants[tool].append(i)
for instants in tool_instants.values():
instants.append(len(schedule))
# Point in schedule when tool will next be needed.
def next_needed(tool):
return tool_instants[tool][0]
# Set of tools in the machine.
machine = set(nsmallest(capacity, tool_instants, key=next_needed))
cost = len(machine)
for job in schedule:
# Set of tools that need to be inserted.
new_tools = tool for tool in requirements[job] if tool not in machine
if new_tools:
cost += len(new_tools)
# Remove tools if necessary to avoid going over capacity.
for _ in range(len(machine) + len(new_tools) - capacity):
machine.remove(max(machine, key=next_needed))
machine.update(new_tools)
for tool in requirements[job]:
tool_instants[tool].popleft()
return cost
7. Performance
The revised code in ç6 is about 20 times faster than the code in the post:
>>> test = lambda:schedule_cost([10, 9, 4, 1, 6, 3, 7, 5, 2, 8], J, 4)
>>> from timeit import timeit
>>> KTNS()[1] / timeit(test, number=1)
20.335097001763668
8. Tool decisions
If you also want to return the decisions about which tools to insert (in addition to the cost), then that's easy. The newly inserted tools at each instant are in the new_tools
set, and the tools in the machine at each instant are in the machine
set. Take a copy of whichever one of these you need to keep.
For example, if you want to keep copies of machine
then you would start with something like:
machine_record =
and then at the end of each loop, append a copy of machine
to the record:
machine_record.append(set(machine))
and findally return the record along with the cost:
return cost, machine_record
Well... I don't know how to thank you because five hours ago I realized that my code was inefficient in some cases. I've been trying a while to look for the problem and I couldn't find it. Your code is hundreds of times faster than my original version and provides the exact amounts of changes. All the literature is inside the code and you understood perfectly the idea of the problem. Btw how can I get the whole sequence W? I have tried to copy from the machine set updated with new_tools but I couldn't.
â A.Piquer
May 5 at 23:16
1
I only focused on the given code for my answer and wanted to take a closer look at the paper today⦠Guess I won't have to anymore :)
â Mathias Ettinger
May 7 at 8:02
1
@A.Piquer: See updated answer.
â Gareth Rees
May 7 at 9:12
add a comment |Â
up vote
6
down vote
accepted
1. Bug
The code in the post does not compute the correct sequence of tool insertions for the Keep Tool Needed Soonest (KTNS) algorithm.
This is easiest to see if we print out the tools in the machine at each stage, by adding a line like
print(numpy.nonzero(J)[0] + 1)
near the end of the main loop. The first three lines of output are:
[ 1 2 4 10]
[1 4 5 8]
[1 5 6 9]
The third job in the schedule is job 4, which requires tools number 1, 5, and 9. So why did the code insert tool number 6? This is not needed by job 4, and so according to the KTNS algorithm it should not be inserted into the machine. Something has gone wrong.
This bug causes KTNS
to return the wrong result for the example: it returns a cost of 17, but the KTNS algorithm only requires 16 tool insertions for this schedule.
2. Argument
I started to write a review of the code in the post, but the major problem is that the wrong data structures have been selected, so I think it will be more helpful to explain how to approach this kind of problem from first principles.
Tang and Denardo (1986) give a mathematical description of KTNS procedure in section 3 of their paper. Mathematical notation is used in scientific papers because of its compactness and lack of ambiguity. But that doesn't mean that it is a good idea to translate the notation directly into source code. Before writing code, you need to solve two problems:
Choosing good names for variables. In the context of a paper, if you don't know what $L(i, n)$ is supposed to mean, then you can refer to its formal definition
$L(i, n) = minleftm:mâÂÂT(i, n)right$
or to its informal description:
the integer $L(i, n)$ is the first instant at or after instant $n$ at which tool $i$ is needed.
But in the context of source code it is not so easy to refer to these definitions, and so naming a variable
Lin
is only helpful to the reader who is following along with a copy of the paper. It is better to choose variable names that directly communicate the meaning, and to add comments (if necessary) to make that meaning unambiguous, for example:# Mapping from tool to the next instant at which it will be needed.
next_needed = ...Choosing data structures. The mathematical notation $L(i, n)$ doesn't give any clues as to how this should be implemented. Should you precompute a table of values for all tools $i$ and instants $n$? Or should you store for each tool the instants where that tool is needed in a sorted list and use bisection to find the first instant at or after $n$? Or should you store for each tool the instants where that tool is needed in a sorted queue, and pop its queue when a tool is used, so that the first element of queue is always the next instant when the tool will be needed? Or what? The choice of data structure is vital both in getting the most efficient implementation and in making the code simple and clear to the reader.
3. Decode
The key to finding the best data structures is to have a high level understanding of the algorithm, so that you are not trapped by the notation. The idea is to read the mathematics and decode it into concepts that you can work with. So when the paper says "Set $J_i = 1$" you figure out that this means "Insert tool $i$ into the machine".
This is the procedure:
Step 1. Set $J_i = 1$ for $C$ values of $i$ having minimal values of $L(i, 0)$. Break the ties arbitrarily. Set $J_i = 0$ for the remaining $M - C$ values of $i$. Set $n = 1$.
Step 2. Set $W_n = J$. Stop if $n = N$.
Step 3. If each $i$ having $L(i, n) = n$ also has $J_i = 1$, set $n = n + 1$ and go to Step 2.
Step 4. Pick $i$ having $L(i, n) = n$ and $J_i = 0$. Set $J_i = 1$.
Step 5. Set $J_k = 0$ for a $k$ that maximizes $L(p, n)$ over $p: J_p = 1$. Go to Step 3.
Here are my thoughts as I read the procedure:
- Steps 2 to step 5 are a loop over $n$.
- $n$ represents the current instant (job) in the schedule.
- $n$ starts at 1 (the first job) and goes up to $N$ (the last job). When implementing this in code, we will need to change this so that instants run from 0 up to (but not including) $N$, to avoid having to subtract 1 all the time.
- Step 1 is initialization: we fill the machine with the tools that will needed the soonest.
- In step 2, $W_n = J$ is just recording the set of tools that are now in the machine, which we don't care about since we are only going to be computing the cost.
- Step 3 is asking "does the machine contain every tool that is needed by the current job?" If so, it can go on to the next job.
- Step 4 only gets run if the machine lacks some tools that are needed. So we pick one of them, tool $i$, and insert it into the machine.
- Step 5 picks a tool in the machine that is not currently needed and removes it from the machine. The tool is the one that won't be needed for the longest.
4. Pseudo-code
Now that you have a good high-level understanding of the algorithm, the next step is to describe it in pseudo-code. Something like this:
While the machine is under capacity:
- Insert the tool that is needed soonest.
For each job in the schedule:
For each tool needed by the job:
- Insert the tool into the machine.
While the machine is over capacity:
- Remove the tool that won't be needed for the longest.
5. Data structures
Now examine the pseudo-code and figure out the data structures that are needed.
During initialization (but not afterwards), we need to be able to find the tool that is needed soonest.
We need to store the collection of tools that are currently in the machine.
We need to be able to find the size of this collection, so that we can determine whether the machine is over or under capacity.
This collection needs to support efficient insertion and deletion.
We need to be able to efficiently find the tool in the machine that won't be needed for the longest.
Point 1 is easy enough: for each tool, we can find the jobs that require that tool, and so find the first job for each tool, and use heapq.nsmallest
to pick the tools we want.
But points 2 to 5 are hard to satisfy. Here are three alternative implementation strategies:
The requirement to repeatedly and efficiently extract the minimum or maximum element (point 5) suggests some kind of priority queue. In the Python standard library the closest thing we have to a priority queue is a heap and heaps don't support efficient deletion. However, there is a workaround for deleting items from heaps, namely to leave the item in the heap but mark it as deleted, and ignore deleted items when they are extracted. But in order to satisfy point 3 we would need to ignore the deleted items so we would need to maintain a separate count.
Use a data structure from a third-party library. It looks as if a
SortedSet
from thesortedcontainers
library would meet our requirements.Use an ordinary data structure like a
set
, and accept some inefficiency in point 5 by callingmax
(taking time proportional to the capacity of the machine). When the capacity is small, this is acceptable.
I'm going to use strategy 3 because it's the simplest and because in your example the capacity is 4, which is small enough for the inefficiency not to matter.
6. Implementation
from collections import defaultdict, deque
from heapq import nsmallest
def schedule_cost(schedule, requirements, capacity):
"""Return the cost of the schedule, that is, the number of tool insertions.
Parameters:
schedule -- Sequence of jobs.
requirements -- Mapping from job to iterable of required tools.
capacity -- Number of tools that the machine can hold at once.
Jobs and tools may be any hashable objects.
"""
# Mapping from tool to deque of instants when the tool will be needed.
tool_instants = defaultdict(deque)
for i, job in enumerate(schedule):
assert len(requirements[job]) <= capacity
for tool in requirements[job]:
tool_instants[tool].append(i)
for instants in tool_instants.values():
instants.append(len(schedule))
# Point in schedule when tool will next be needed.
def next_needed(tool):
return tool_instants[tool][0]
# Set of tools in the machine.
machine = set(nsmallest(capacity, tool_instants, key=next_needed))
cost = len(machine)
for job in schedule:
# Set of tools that need to be inserted.
new_tools = tool for tool in requirements[job] if tool not in machine
if new_tools:
cost += len(new_tools)
# Remove tools if necessary to avoid going over capacity.
for _ in range(len(machine) + len(new_tools) - capacity):
machine.remove(max(machine, key=next_needed))
machine.update(new_tools)
for tool in requirements[job]:
tool_instants[tool].popleft()
return cost
7. Performance
The revised code in ç6 is about 20 times faster than the code in the post:
>>> test = lambda:schedule_cost([10, 9, 4, 1, 6, 3, 7, 5, 2, 8], J, 4)
>>> from timeit import timeit
>>> KTNS()[1] / timeit(test, number=1)
20.335097001763668
8. Tool decisions
If you also want to return the decisions about which tools to insert (in addition to the cost), then that's easy. The newly inserted tools at each instant are in the new_tools
set, and the tools in the machine at each instant are in the machine
set. Take a copy of whichever one of these you need to keep.
For example, if you want to keep copies of machine
then you would start with something like:
machine_record =
and then at the end of each loop, append a copy of machine
to the record:
machine_record.append(set(machine))
and findally return the record along with the cost:
return cost, machine_record
Well... I don't know how to thank you because five hours ago I realized that my code was inefficient in some cases. I've been trying a while to look for the problem and I couldn't find it. Your code is hundreds of times faster than my original version and provides the exact amounts of changes. All the literature is inside the code and you understood perfectly the idea of the problem. Btw how can I get the whole sequence W? I have tried to copy from the machine set updated with new_tools but I couldn't.
â A.Piquer
May 5 at 23:16
1
I only focused on the given code for my answer and wanted to take a closer look at the paper today⦠Guess I won't have to anymore :)
â Mathias Ettinger
May 7 at 8:02
1
@A.Piquer: See updated answer.
â Gareth Rees
May 7 at 9:12
add a comment |Â
up vote
6
down vote
accepted
up vote
6
down vote
accepted
1. Bug
The code in the post does not compute the correct sequence of tool insertions for the Keep Tool Needed Soonest (KTNS) algorithm.
This is easiest to see if we print out the tools in the machine at each stage, by adding a line like
print(numpy.nonzero(J)[0] + 1)
near the end of the main loop. The first three lines of output are:
[ 1 2 4 10]
[1 4 5 8]
[1 5 6 9]
The third job in the schedule is job 4, which requires tools number 1, 5, and 9. So why did the code insert tool number 6? This is not needed by job 4, and so according to the KTNS algorithm it should not be inserted into the machine. Something has gone wrong.
This bug causes KTNS
to return the wrong result for the example: it returns a cost of 17, but the KTNS algorithm only requires 16 tool insertions for this schedule.
2. Argument
I started to write a review of the code in the post, but the major problem is that the wrong data structures have been selected, so I think it will be more helpful to explain how to approach this kind of problem from first principles.
Tang and Denardo (1986) give a mathematical description of KTNS procedure in section 3 of their paper. Mathematical notation is used in scientific papers because of its compactness and lack of ambiguity. But that doesn't mean that it is a good idea to translate the notation directly into source code. Before writing code, you need to solve two problems:
Choosing good names for variables. In the context of a paper, if you don't know what $L(i, n)$ is supposed to mean, then you can refer to its formal definition
$L(i, n) = minleftm:mâÂÂT(i, n)right$
or to its informal description:
the integer $L(i, n)$ is the first instant at or after instant $n$ at which tool $i$ is needed.
But in the context of source code it is not so easy to refer to these definitions, and so naming a variable
Lin
is only helpful to the reader who is following along with a copy of the paper. It is better to choose variable names that directly communicate the meaning, and to add comments (if necessary) to make that meaning unambiguous, for example:# Mapping from tool to the next instant at which it will be needed.
next_needed = ...Choosing data structures. The mathematical notation $L(i, n)$ doesn't give any clues as to how this should be implemented. Should you precompute a table of values for all tools $i$ and instants $n$? Or should you store for each tool the instants where that tool is needed in a sorted list and use bisection to find the first instant at or after $n$? Or should you store for each tool the instants where that tool is needed in a sorted queue, and pop its queue when a tool is used, so that the first element of queue is always the next instant when the tool will be needed? Or what? The choice of data structure is vital both in getting the most efficient implementation and in making the code simple and clear to the reader.
3. Decode
The key to finding the best data structures is to have a high level understanding of the algorithm, so that you are not trapped by the notation. The idea is to read the mathematics and decode it into concepts that you can work with. So when the paper says "Set $J_i = 1$" you figure out that this means "Insert tool $i$ into the machine".
This is the procedure:
Step 1. Set $J_i = 1$ for $C$ values of $i$ having minimal values of $L(i, 0)$. Break the ties arbitrarily. Set $J_i = 0$ for the remaining $M - C$ values of $i$. Set $n = 1$.
Step 2. Set $W_n = J$. Stop if $n = N$.
Step 3. If each $i$ having $L(i, n) = n$ also has $J_i = 1$, set $n = n + 1$ and go to Step 2.
Step 4. Pick $i$ having $L(i, n) = n$ and $J_i = 0$. Set $J_i = 1$.
Step 5. Set $J_k = 0$ for a $k$ that maximizes $L(p, n)$ over $p: J_p = 1$. Go to Step 3.
Here are my thoughts as I read the procedure:
- Steps 2 to step 5 are a loop over $n$.
- $n$ represents the current instant (job) in the schedule.
- $n$ starts at 1 (the first job) and goes up to $N$ (the last job). When implementing this in code, we will need to change this so that instants run from 0 up to (but not including) $N$, to avoid having to subtract 1 all the time.
- Step 1 is initialization: we fill the machine with the tools that will needed the soonest.
- In step 2, $W_n = J$ is just recording the set of tools that are now in the machine, which we don't care about since we are only going to be computing the cost.
- Step 3 is asking "does the machine contain every tool that is needed by the current job?" If so, it can go on to the next job.
- Step 4 only gets run if the machine lacks some tools that are needed. So we pick one of them, tool $i$, and insert it into the machine.
- Step 5 picks a tool in the machine that is not currently needed and removes it from the machine. The tool is the one that won't be needed for the longest.
4. Pseudo-code
Now that you have a good high-level understanding of the algorithm, the next step is to describe it in pseudo-code. Something like this:
While the machine is under capacity:
- Insert the tool that is needed soonest.
For each job in the schedule:
For each tool needed by the job:
- Insert the tool into the machine.
While the machine is over capacity:
- Remove the tool that won't be needed for the longest.
5. Data structures
Now examine the pseudo-code and figure out the data structures that are needed.
During initialization (but not afterwards), we need to be able to find the tool that is needed soonest.
We need to store the collection of tools that are currently in the machine.
We need to be able to find the size of this collection, so that we can determine whether the machine is over or under capacity.
This collection needs to support efficient insertion and deletion.
We need to be able to efficiently find the tool in the machine that won't be needed for the longest.
Point 1 is easy enough: for each tool, we can find the jobs that require that tool, and so find the first job for each tool, and use heapq.nsmallest
to pick the tools we want.
But points 2 to 5 are hard to satisfy. Here are three alternative implementation strategies:
The requirement to repeatedly and efficiently extract the minimum or maximum element (point 5) suggests some kind of priority queue. In the Python standard library the closest thing we have to a priority queue is a heap and heaps don't support efficient deletion. However, there is a workaround for deleting items from heaps, namely to leave the item in the heap but mark it as deleted, and ignore deleted items when they are extracted. But in order to satisfy point 3 we would need to ignore the deleted items so we would need to maintain a separate count.
Use a data structure from a third-party library. It looks as if a
SortedSet
from thesortedcontainers
library would meet our requirements.Use an ordinary data structure like a
set
, and accept some inefficiency in point 5 by callingmax
(taking time proportional to the capacity of the machine). When the capacity is small, this is acceptable.
I'm going to use strategy 3 because it's the simplest and because in your example the capacity is 4, which is small enough for the inefficiency not to matter.
6. Implementation
from collections import defaultdict, deque
from heapq import nsmallest
def schedule_cost(schedule, requirements, capacity):
"""Return the cost of the schedule, that is, the number of tool insertions.
Parameters:
schedule -- Sequence of jobs.
requirements -- Mapping from job to iterable of required tools.
capacity -- Number of tools that the machine can hold at once.
Jobs and tools may be any hashable objects.
"""
# Mapping from tool to deque of instants when the tool will be needed.
tool_instants = defaultdict(deque)
for i, job in enumerate(schedule):
assert len(requirements[job]) <= capacity
for tool in requirements[job]:
tool_instants[tool].append(i)
for instants in tool_instants.values():
instants.append(len(schedule))
# Point in schedule when tool will next be needed.
def next_needed(tool):
return tool_instants[tool][0]
# Set of tools in the machine.
machine = set(nsmallest(capacity, tool_instants, key=next_needed))
cost = len(machine)
for job in schedule:
# Set of tools that need to be inserted.
new_tools = tool for tool in requirements[job] if tool not in machine
if new_tools:
cost += len(new_tools)
# Remove tools if necessary to avoid going over capacity.
for _ in range(len(machine) + len(new_tools) - capacity):
machine.remove(max(machine, key=next_needed))
machine.update(new_tools)
for tool in requirements[job]:
tool_instants[tool].popleft()
return cost
7. Performance
The revised code in ç6 is about 20 times faster than the code in the post:
>>> test = lambda:schedule_cost([10, 9, 4, 1, 6, 3, 7, 5, 2, 8], J, 4)
>>> from timeit import timeit
>>> KTNS()[1] / timeit(test, number=1)
20.335097001763668
8. Tool decisions
If you also want to return the decisions about which tools to insert (in addition to the cost), then that's easy. The newly inserted tools at each instant are in the new_tools
set, and the tools in the machine at each instant are in the machine
set. Take a copy of whichever one of these you need to keep.
For example, if you want to keep copies of machine
then you would start with something like:
machine_record =
and then at the end of each loop, append a copy of machine
to the record:
machine_record.append(set(machine))
and findally return the record along with the cost:
return cost, machine_record
1. Bug
The code in the post does not compute the correct sequence of tool insertions for the Keep Tool Needed Soonest (KTNS) algorithm.
This is easiest to see if we print out the tools in the machine at each stage, by adding a line like
print(numpy.nonzero(J)[0] + 1)
near the end of the main loop. The first three lines of output are:
[ 1 2 4 10]
[1 4 5 8]
[1 5 6 9]
The third job in the schedule is job 4, which requires tools number 1, 5, and 9. So why did the code insert tool number 6? This is not needed by job 4, and so according to the KTNS algorithm it should not be inserted into the machine. Something has gone wrong.
This bug causes KTNS
to return the wrong result for the example: it returns a cost of 17, but the KTNS algorithm only requires 16 tool insertions for this schedule.
2. Argument
I started to write a review of the code in the post, but the major problem is that the wrong data structures have been selected, so I think it will be more helpful to explain how to approach this kind of problem from first principles.
Tang and Denardo (1986) give a mathematical description of KTNS procedure in section 3 of their paper. Mathematical notation is used in scientific papers because of its compactness and lack of ambiguity. But that doesn't mean that it is a good idea to translate the notation directly into source code. Before writing code, you need to solve two problems:
Choosing good names for variables. In the context of a paper, if you don't know what $L(i, n)$ is supposed to mean, then you can refer to its formal definition
$L(i, n) = minleftm:mâÂÂT(i, n)right$
or to its informal description:
the integer $L(i, n)$ is the first instant at or after instant $n$ at which tool $i$ is needed.
But in the context of source code it is not so easy to refer to these definitions, and so naming a variable
Lin
is only helpful to the reader who is following along with a copy of the paper. It is better to choose variable names that directly communicate the meaning, and to add comments (if necessary) to make that meaning unambiguous, for example:# Mapping from tool to the next instant at which it will be needed.
next_needed = ...Choosing data structures. The mathematical notation $L(i, n)$ doesn't give any clues as to how this should be implemented. Should you precompute a table of values for all tools $i$ and instants $n$? Or should you store for each tool the instants where that tool is needed in a sorted list and use bisection to find the first instant at or after $n$? Or should you store for each tool the instants where that tool is needed in a sorted queue, and pop its queue when a tool is used, so that the first element of queue is always the next instant when the tool will be needed? Or what? The choice of data structure is vital both in getting the most efficient implementation and in making the code simple and clear to the reader.
3. Decode
The key to finding the best data structures is to have a high level understanding of the algorithm, so that you are not trapped by the notation. The idea is to read the mathematics and decode it into concepts that you can work with. So when the paper says "Set $J_i = 1$" you figure out that this means "Insert tool $i$ into the machine".
This is the procedure:
Step 1. Set $J_i = 1$ for $C$ values of $i$ having minimal values of $L(i, 0)$. Break the ties arbitrarily. Set $J_i = 0$ for the remaining $M - C$ values of $i$. Set $n = 1$.
Step 2. Set $W_n = J$. Stop if $n = N$.
Step 3. If each $i$ having $L(i, n) = n$ also has $J_i = 1$, set $n = n + 1$ and go to Step 2.
Step 4. Pick $i$ having $L(i, n) = n$ and $J_i = 0$. Set $J_i = 1$.
Step 5. Set $J_k = 0$ for a $k$ that maximizes $L(p, n)$ over $p: J_p = 1$. Go to Step 3.
Here are my thoughts as I read the procedure:
- Steps 2 to step 5 are a loop over $n$.
- $n$ represents the current instant (job) in the schedule.
- $n$ starts at 1 (the first job) and goes up to $N$ (the last job). When implementing this in code, we will need to change this so that instants run from 0 up to (but not including) $N$, to avoid having to subtract 1 all the time.
- Step 1 is initialization: we fill the machine with the tools that will needed the soonest.
- In step 2, $W_n = J$ is just recording the set of tools that are now in the machine, which we don't care about since we are only going to be computing the cost.
- Step 3 is asking "does the machine contain every tool that is needed by the current job?" If so, it can go on to the next job.
- Step 4 only gets run if the machine lacks some tools that are needed. So we pick one of them, tool $i$, and insert it into the machine.
- Step 5 picks a tool in the machine that is not currently needed and removes it from the machine. The tool is the one that won't be needed for the longest.
4. Pseudo-code
Now that you have a good high-level understanding of the algorithm, the next step is to describe it in pseudo-code. Something like this:
While the machine is under capacity:
- Insert the tool that is needed soonest.
For each job in the schedule:
For each tool needed by the job:
- Insert the tool into the machine.
While the machine is over capacity:
- Remove the tool that won't be needed for the longest.
5. Data structures
Now examine the pseudo-code and figure out the data structures that are needed.
During initialization (but not afterwards), we need to be able to find the tool that is needed soonest.
We need to store the collection of tools that are currently in the machine.
We need to be able to find the size of this collection, so that we can determine whether the machine is over or under capacity.
This collection needs to support efficient insertion and deletion.
We need to be able to efficiently find the tool in the machine that won't be needed for the longest.
Point 1 is easy enough: for each tool, we can find the jobs that require that tool, and so find the first job for each tool, and use heapq.nsmallest
to pick the tools we want.
But points 2 to 5 are hard to satisfy. Here are three alternative implementation strategies:
The requirement to repeatedly and efficiently extract the minimum or maximum element (point 5) suggests some kind of priority queue. In the Python standard library the closest thing we have to a priority queue is a heap and heaps don't support efficient deletion. However, there is a workaround for deleting items from heaps, namely to leave the item in the heap but mark it as deleted, and ignore deleted items when they are extracted. But in order to satisfy point 3 we would need to ignore the deleted items so we would need to maintain a separate count.
Use a data structure from a third-party library. It looks as if a
SortedSet
from thesortedcontainers
library would meet our requirements.Use an ordinary data structure like a
set
, and accept some inefficiency in point 5 by callingmax
(taking time proportional to the capacity of the machine). When the capacity is small, this is acceptable.
I'm going to use strategy 3 because it's the simplest and because in your example the capacity is 4, which is small enough for the inefficiency not to matter.
6. Implementation
from collections import defaultdict, deque
from heapq import nsmallest
def schedule_cost(schedule, requirements, capacity):
"""Return the cost of the schedule, that is, the number of tool insertions.
Parameters:
schedule -- Sequence of jobs.
requirements -- Mapping from job to iterable of required tools.
capacity -- Number of tools that the machine can hold at once.
Jobs and tools may be any hashable objects.
"""
# Mapping from tool to deque of instants when the tool will be needed.
tool_instants = defaultdict(deque)
for i, job in enumerate(schedule):
assert len(requirements[job]) <= capacity
for tool in requirements[job]:
tool_instants[tool].append(i)
for instants in tool_instants.values():
instants.append(len(schedule))
# Point in schedule when tool will next be needed.
def next_needed(tool):
return tool_instants[tool][0]
# Set of tools in the machine.
machine = set(nsmallest(capacity, tool_instants, key=next_needed))
cost = len(machine)
for job in schedule:
# Set of tools that need to be inserted.
new_tools = tool for tool in requirements[job] if tool not in machine
if new_tools:
cost += len(new_tools)
# Remove tools if necessary to avoid going over capacity.
for _ in range(len(machine) + len(new_tools) - capacity):
machine.remove(max(machine, key=next_needed))
machine.update(new_tools)
for tool in requirements[job]:
tool_instants[tool].popleft()
return cost
7. Performance
The revised code in ç6 is about 20 times faster than the code in the post:
>>> test = lambda:schedule_cost([10, 9, 4, 1, 6, 3, 7, 5, 2, 8], J, 4)
>>> from timeit import timeit
>>> KTNS()[1] / timeit(test, number=1)
20.335097001763668
8. Tool decisions
If you also want to return the decisions about which tools to insert (in addition to the cost), then that's easy. The newly inserted tools at each instant are in the new_tools
set, and the tools in the machine at each instant are in the machine
set. Take a copy of whichever one of these you need to keep.
For example, if you want to keep copies of machine
then you would start with something like:
machine_record =
and then at the end of each loop, append a copy of machine
to the record:
machine_record.append(set(machine))
and findally return the record along with the cost:
return cost, machine_record
edited May 7 at 9:08
answered May 5 at 19:11
Gareth Rees
41.1k394166
41.1k394166
Well... I don't know how to thank you because five hours ago I realized that my code was inefficient in some cases. I've been trying a while to look for the problem and I couldn't find it. Your code is hundreds of times faster than my original version and provides the exact amounts of changes. All the literature is inside the code and you understood perfectly the idea of the problem. Btw how can I get the whole sequence W? I have tried to copy from the machine set updated with new_tools but I couldn't.
â A.Piquer
May 5 at 23:16
1
I only focused on the given code for my answer and wanted to take a closer look at the paper today⦠Guess I won't have to anymore :)
â Mathias Ettinger
May 7 at 8:02
1
@A.Piquer: See updated answer.
â Gareth Rees
May 7 at 9:12
add a comment |Â
Well... I don't know how to thank you because five hours ago I realized that my code was inefficient in some cases. I've been trying a while to look for the problem and I couldn't find it. Your code is hundreds of times faster than my original version and provides the exact amounts of changes. All the literature is inside the code and you understood perfectly the idea of the problem. Btw how can I get the whole sequence W? I have tried to copy from the machine set updated with new_tools but I couldn't.
â A.Piquer
May 5 at 23:16
1
I only focused on the given code for my answer and wanted to take a closer look at the paper today⦠Guess I won't have to anymore :)
â Mathias Ettinger
May 7 at 8:02
1
@A.Piquer: See updated answer.
â Gareth Rees
May 7 at 9:12
Well... I don't know how to thank you because five hours ago I realized that my code was inefficient in some cases. I've been trying a while to look for the problem and I couldn't find it. Your code is hundreds of times faster than my original version and provides the exact amounts of changes. All the literature is inside the code and you understood perfectly the idea of the problem. Btw how can I get the whole sequence W? I have tried to copy from the machine set updated with new_tools but I couldn't.
â A.Piquer
May 5 at 23:16
Well... I don't know how to thank you because five hours ago I realized that my code was inefficient in some cases. I've been trying a while to look for the problem and I couldn't find it. Your code is hundreds of times faster than my original version and provides the exact amounts of changes. All the literature is inside the code and you understood perfectly the idea of the problem. Btw how can I get the whole sequence W? I have tried to copy from the machine set updated with new_tools but I couldn't.
â A.Piquer
May 5 at 23:16
1
1
I only focused on the given code for my answer and wanted to take a closer look at the paper today⦠Guess I won't have to anymore :)
â Mathias Ettinger
May 7 at 8:02
I only focused on the given code for my answer and wanted to take a closer look at the paper today⦠Guess I won't have to anymore :)
â Mathias Ettinger
May 7 at 8:02
1
1
@A.Piquer: See updated answer.
â Gareth Rees
May 7 at 9:12
@A.Piquer: See updated answer.
â Gareth Rees
May 7 at 9:12
add a comment |Â
up vote
4
down vote
For starter, you should get a consistent coding style. Naming and spacing are usually off and makes the code harder to read. You should get yourself familiar with PEP8 and apply it to make your Python code look like Python code to others. As regard to variable names, I get that Tin
, Lin
, W
, and J
come from the literature youâÂÂre using; so keeping them should be fine. Other variables are way more obscure, such as m
or n
(are they the amount of jobs and the amount of tools?) or event count
for that matter.
Second, you shouldn't need to worry about timing in the very function you want to time. Use the timeit
module instead. And also wrap the timing code in an if __name__ == '__main__'
guard.
Then we will need to examine the code more closely:
import sets
is deprecated since Python 2.6, use the builtinset
instead;copy.deepcopy()
is very heavy, better build a new dictionnary by only copying the containedset
s;- the usual import of
numpy
isimport numpy as np
; - you don't need empty
else
s clauses, they are just noise to the reader; - since you copy the
Tools
dictionnary before your loop, you don't need to build it at each function call, extract that part to an initialisation function and provide the dictionnary directly toKTNS
rather thanJobs
andsigma
; - your
while
loop would read better asfor count in range(1, n + 1):
; - once
J
reaches its requirements, it is not modified for subsequentse
inTools
, thus you can extract that part from the innerfor
loop; reference
is always the same in every iterations, better compute it once at the start of the function;- you can avoid using a
try:â¦except ValueError:
by addingn + 1
in each set ofTin
before the main loop: youâÂÂre guaranteed that this element will never be discarded; - you can compute the final cost more easily by using the potential of numpy for vectorized operations, fancy indexing, and
np.sum
; - using an other type of value (such as
'-'
instead of a number) to denote an absence of an element is prone to error and misunderstanding, you're better off deleting the key since the last loop will put it back in place anyway.
Proposed improvements follows, a word of caution though, I may have inverted the meaning of m
and n
since it is very difficult to know which is which (again, better naming would solve the problem):
from collections import defaultdict
import numpy as np
JOBS =
1: [6, 7],
2: [2, 3],
3: [1, 6, 9],
4: [1, 5, 9],
5: [5, 8, 10],
6: [1, 3, 6, 8],
7: [5, 6, 8, 9],
8: [5, 7, 8],
9: [1, 4, 5, 8],
10: [1, 2, 4, 10],
def prepare_tools(jobs, sigma=(10, 9, 4, 1, 6, 3, 7, 5, 2, 8)):
tools = defaultdict(set)
for count, e in enumerate(sigma, 1):
for i in jobs[e]:
tools[i].add(count)
max_value = max(sigma) + 1
for counts_set in tools.itervalues():
counts_set.add(max_value)
return tools
def keep_the_tool_needed_soonest(tools, n, capacity=4):
Lin = e: min(counts) for e, counts in tools.iteritems()
Tin = e: counts.copy() for e, counts in tools.iteritems()
reference = min(n, capacity)
m = len(tools)
W = np.zeros((n, m), dtype=int)
for count in range(n):
job_id = count + 1
J = np.zeros(m, dtype=bool)
while J.sum() < reference:
min_value = min(Lin.itervalues())
min_keys = [k for k, v in Lin.iteritems() if v == min_value]
should_update = True
if min_value > job_id and len(min_keys) > 1 and count:
J0 = W[count - 1]
for element in min_keys:
if J0[element - 1]:
J[element - 1] = True
del Lin[element]
should_update = False
if J.sum() >= reference:
break
if should_update:
key = min_keys[0]
J[key - 1] = True
del Lin[key]
for jobs in Tin.itervalues():
jobs.discard(job_id)
for element, jobs in Tin.iteritems():
Lin[element] = min(jobs)
W[count] = J
cost = W[1:] - W[:n - 1]
return np.sum(cost[cost > 0]) + capacity
With the testing code being:
if __name__ == '__main__':
from timeit import timeit
number_of_calls = 10000
time = timeit('KTNS(JOBS)', setup='from __main__ import KTNS, JOBS', number=number_of_calls)
result = KTNS(Jobs=JOBS)
print('BEFORE', result, time / number_of_calls)
setup = """
from __main__ import JOBS, prepare_tools, keep_the_tool_needed_soonest
tools = prepare_tools(JOBS)"""
time = timeit('keep_the_tool_needed_soonest(tools, len(JOBS))', setup=setup, number=number_of_calls)
result = keep_the_tool_needed_soonest(prepare_tools(JOBS), len(JOBS))
print('AFTER', result, time / number_of_calls)
I obtain the following results:
$ python2 ktns.py
('BEFORE', 17, 0.0009421438932418824)
('AFTER', 17, 0.00019730830192565919)
Which is roughly a 5ÃÂ improvement, not thrilling but somehow acceptable. Granted the code reads way better now.
However, this code has a major flaw in which it only works with jobs and tools numbered from 1 to m
or n
. If, by any chance, your code allows for jobs being [3, 8, 12] and tools numbered from 100 to 123 for instance, your various loops and updates to J will just not work.
As you have commented the variables m and n are the number of tools (m) and the number of jobs (n, to denominate the number of steps to finish the algorithm). Talking about the try/except method, it must be done because the variable Lin is always updated with a new value, and when there aren't more values in Tin it should be updated with n+1 (to avoid the set of the tool). About your last comment, the jobs and the tools are always numbered from 1 to m or from 1 to n, but sometimes the sigma list doesn't have all the elements (e.g [2,5,6,7] as a partial solution of sigma). Any more comments?
â A.Piquer
May 4 at 16:21
@A.Piquer thetry/except
is taken care of by puttingn+1
in each set oftools
inprepare_tools
, so they are already inTin
and the set will never be empty. As forsigma
being less than the whole set of jobs⦠well, I didn't test extensivelly. It may or may not workâ¦
â Mathias Ettinger
May 4 at 16:27
But the fact is that prepare_tools doesn't do what it should. The fact is that the aim of prepare_tools is to set a Dictionary (Tools) which will agrupate for each tool in what job of the sequence sigma is needed. For example for Jobs:1:[1,2,3], 2:[3,4,5], 3:[1,3],4:[1,5],5:[10] and sigma=[1,2,4] , the Tools dictonary of sets would be: 1:[1,4], 2:[1], 3: [1,2], 4: [2] ,5:[2,4] . Do you know how to adapt the code to achieve this? I did in the way I post because I didn't know how to do faster, but your version is pretty better. Thank you
â A.Piquer
May 4 at 16:35
@A.Piquer I don't get it, the kids are the same:for e in sigma: for i in jobs[e]
I just handledcount
using enumerate and initialization using a defaultdict but the results are similar.
â Mathias Ettinger
May 4 at 17:21
I have checked another time and you are right. In the other hand, do you think that it exists a better way to do the steps of the article I attached in the question? I don't know how to achieve the steps 3 and 4, and they are the core of the algorithm. Looking for the previous steps in W it's complicated because it is needed to look for previous values. If you know a better way I would really appreciate.
â A.Piquer
May 4 at 17:28
add a comment |Â
up vote
4
down vote
For starter, you should get a consistent coding style. Naming and spacing are usually off and makes the code harder to read. You should get yourself familiar with PEP8 and apply it to make your Python code look like Python code to others. As regard to variable names, I get that Tin
, Lin
, W
, and J
come from the literature youâÂÂre using; so keeping them should be fine. Other variables are way more obscure, such as m
or n
(are they the amount of jobs and the amount of tools?) or event count
for that matter.
Second, you shouldn't need to worry about timing in the very function you want to time. Use the timeit
module instead. And also wrap the timing code in an if __name__ == '__main__'
guard.
Then we will need to examine the code more closely:
import sets
is deprecated since Python 2.6, use the builtinset
instead;copy.deepcopy()
is very heavy, better build a new dictionnary by only copying the containedset
s;- the usual import of
numpy
isimport numpy as np
; - you don't need empty
else
s clauses, they are just noise to the reader; - since you copy the
Tools
dictionnary before your loop, you don't need to build it at each function call, extract that part to an initialisation function and provide the dictionnary directly toKTNS
rather thanJobs
andsigma
; - your
while
loop would read better asfor count in range(1, n + 1):
; - once
J
reaches its requirements, it is not modified for subsequentse
inTools
, thus you can extract that part from the innerfor
loop; reference
is always the same in every iterations, better compute it once at the start of the function;- you can avoid using a
try:â¦except ValueError:
by addingn + 1
in each set ofTin
before the main loop: youâÂÂre guaranteed that this element will never be discarded; - you can compute the final cost more easily by using the potential of numpy for vectorized operations, fancy indexing, and
np.sum
; - using an other type of value (such as
'-'
instead of a number) to denote an absence of an element is prone to error and misunderstanding, you're better off deleting the key since the last loop will put it back in place anyway.
Proposed improvements follows, a word of caution though, I may have inverted the meaning of m
and n
since it is very difficult to know which is which (again, better naming would solve the problem):
from collections import defaultdict
import numpy as np
JOBS =
1: [6, 7],
2: [2, 3],
3: [1, 6, 9],
4: [1, 5, 9],
5: [5, 8, 10],
6: [1, 3, 6, 8],
7: [5, 6, 8, 9],
8: [5, 7, 8],
9: [1, 4, 5, 8],
10: [1, 2, 4, 10],
def prepare_tools(jobs, sigma=(10, 9, 4, 1, 6, 3, 7, 5, 2, 8)):
tools = defaultdict(set)
for count, e in enumerate(sigma, 1):
for i in jobs[e]:
tools[i].add(count)
max_value = max(sigma) + 1
for counts_set in tools.itervalues():
counts_set.add(max_value)
return tools
def keep_the_tool_needed_soonest(tools, n, capacity=4):
Lin = e: min(counts) for e, counts in tools.iteritems()
Tin = e: counts.copy() for e, counts in tools.iteritems()
reference = min(n, capacity)
m = len(tools)
W = np.zeros((n, m), dtype=int)
for count in range(n):
job_id = count + 1
J = np.zeros(m, dtype=bool)
while J.sum() < reference:
min_value = min(Lin.itervalues())
min_keys = [k for k, v in Lin.iteritems() if v == min_value]
should_update = True
if min_value > job_id and len(min_keys) > 1 and count:
J0 = W[count - 1]
for element in min_keys:
if J0[element - 1]:
J[element - 1] = True
del Lin[element]
should_update = False
if J.sum() >= reference:
break
if should_update:
key = min_keys[0]
J[key - 1] = True
del Lin[key]
for jobs in Tin.itervalues():
jobs.discard(job_id)
for element, jobs in Tin.iteritems():
Lin[element] = min(jobs)
W[count] = J
cost = W[1:] - W[:n - 1]
return np.sum(cost[cost > 0]) + capacity
With the testing code being:
if __name__ == '__main__':
from timeit import timeit
number_of_calls = 10000
time = timeit('KTNS(JOBS)', setup='from __main__ import KTNS, JOBS', number=number_of_calls)
result = KTNS(Jobs=JOBS)
print('BEFORE', result, time / number_of_calls)
setup = """
from __main__ import JOBS, prepare_tools, keep_the_tool_needed_soonest
tools = prepare_tools(JOBS)"""
time = timeit('keep_the_tool_needed_soonest(tools, len(JOBS))', setup=setup, number=number_of_calls)
result = keep_the_tool_needed_soonest(prepare_tools(JOBS), len(JOBS))
print('AFTER', result, time / number_of_calls)
I obtain the following results:
$ python2 ktns.py
('BEFORE', 17, 0.0009421438932418824)
('AFTER', 17, 0.00019730830192565919)
Which is roughly a 5ÃÂ improvement, not thrilling but somehow acceptable. Granted the code reads way better now.
However, this code has a major flaw in which it only works with jobs and tools numbered from 1 to m
or n
. If, by any chance, your code allows for jobs being [3, 8, 12] and tools numbered from 100 to 123 for instance, your various loops and updates to J will just not work.
As you have commented the variables m and n are the number of tools (m) and the number of jobs (n, to denominate the number of steps to finish the algorithm). Talking about the try/except method, it must be done because the variable Lin is always updated with a new value, and when there aren't more values in Tin it should be updated with n+1 (to avoid the set of the tool). About your last comment, the jobs and the tools are always numbered from 1 to m or from 1 to n, but sometimes the sigma list doesn't have all the elements (e.g [2,5,6,7] as a partial solution of sigma). Any more comments?
â A.Piquer
May 4 at 16:21
@A.Piquer thetry/except
is taken care of by puttingn+1
in each set oftools
inprepare_tools
, so they are already inTin
and the set will never be empty. As forsigma
being less than the whole set of jobs⦠well, I didn't test extensivelly. It may or may not workâ¦
â Mathias Ettinger
May 4 at 16:27
But the fact is that prepare_tools doesn't do what it should. The fact is that the aim of prepare_tools is to set a Dictionary (Tools) which will agrupate for each tool in what job of the sequence sigma is needed. For example for Jobs:1:[1,2,3], 2:[3,4,5], 3:[1,3],4:[1,5],5:[10] and sigma=[1,2,4] , the Tools dictonary of sets would be: 1:[1,4], 2:[1], 3: [1,2], 4: [2] ,5:[2,4] . Do you know how to adapt the code to achieve this? I did in the way I post because I didn't know how to do faster, but your version is pretty better. Thank you
â A.Piquer
May 4 at 16:35
@A.Piquer I don't get it, the kids are the same:for e in sigma: for i in jobs[e]
I just handledcount
using enumerate and initialization using a defaultdict but the results are similar.
â Mathias Ettinger
May 4 at 17:21
I have checked another time and you are right. In the other hand, do you think that it exists a better way to do the steps of the article I attached in the question? I don't know how to achieve the steps 3 and 4, and they are the core of the algorithm. Looking for the previous steps in W it's complicated because it is needed to look for previous values. If you know a better way I would really appreciate.
â A.Piquer
May 4 at 17:28
add a comment |Â
up vote
4
down vote
up vote
4
down vote
For starter, you should get a consistent coding style. Naming and spacing are usually off and makes the code harder to read. You should get yourself familiar with PEP8 and apply it to make your Python code look like Python code to others. As regard to variable names, I get that Tin
, Lin
, W
, and J
come from the literature youâÂÂre using; so keeping them should be fine. Other variables are way more obscure, such as m
or n
(are they the amount of jobs and the amount of tools?) or event count
for that matter.
Second, you shouldn't need to worry about timing in the very function you want to time. Use the timeit
module instead. And also wrap the timing code in an if __name__ == '__main__'
guard.
Then we will need to examine the code more closely:
import sets
is deprecated since Python 2.6, use the builtinset
instead;copy.deepcopy()
is very heavy, better build a new dictionnary by only copying the containedset
s;- the usual import of
numpy
isimport numpy as np
; - you don't need empty
else
s clauses, they are just noise to the reader; - since you copy the
Tools
dictionnary before your loop, you don't need to build it at each function call, extract that part to an initialisation function and provide the dictionnary directly toKTNS
rather thanJobs
andsigma
; - your
while
loop would read better asfor count in range(1, n + 1):
; - once
J
reaches its requirements, it is not modified for subsequentse
inTools
, thus you can extract that part from the innerfor
loop; reference
is always the same in every iterations, better compute it once at the start of the function;- you can avoid using a
try:â¦except ValueError:
by addingn + 1
in each set ofTin
before the main loop: youâÂÂre guaranteed that this element will never be discarded; - you can compute the final cost more easily by using the potential of numpy for vectorized operations, fancy indexing, and
np.sum
; - using an other type of value (such as
'-'
instead of a number) to denote an absence of an element is prone to error and misunderstanding, you're better off deleting the key since the last loop will put it back in place anyway.
Proposed improvements follows, a word of caution though, I may have inverted the meaning of m
and n
since it is very difficult to know which is which (again, better naming would solve the problem):
from collections import defaultdict
import numpy as np
JOBS =
1: [6, 7],
2: [2, 3],
3: [1, 6, 9],
4: [1, 5, 9],
5: [5, 8, 10],
6: [1, 3, 6, 8],
7: [5, 6, 8, 9],
8: [5, 7, 8],
9: [1, 4, 5, 8],
10: [1, 2, 4, 10],
def prepare_tools(jobs, sigma=(10, 9, 4, 1, 6, 3, 7, 5, 2, 8)):
tools = defaultdict(set)
for count, e in enumerate(sigma, 1):
for i in jobs[e]:
tools[i].add(count)
max_value = max(sigma) + 1
for counts_set in tools.itervalues():
counts_set.add(max_value)
return tools
def keep_the_tool_needed_soonest(tools, n, capacity=4):
Lin = e: min(counts) for e, counts in tools.iteritems()
Tin = e: counts.copy() for e, counts in tools.iteritems()
reference = min(n, capacity)
m = len(tools)
W = np.zeros((n, m), dtype=int)
for count in range(n):
job_id = count + 1
J = np.zeros(m, dtype=bool)
while J.sum() < reference:
min_value = min(Lin.itervalues())
min_keys = [k for k, v in Lin.iteritems() if v == min_value]
should_update = True
if min_value > job_id and len(min_keys) > 1 and count:
J0 = W[count - 1]
for element in min_keys:
if J0[element - 1]:
J[element - 1] = True
del Lin[element]
should_update = False
if J.sum() >= reference:
break
if should_update:
key = min_keys[0]
J[key - 1] = True
del Lin[key]
for jobs in Tin.itervalues():
jobs.discard(job_id)
for element, jobs in Tin.iteritems():
Lin[element] = min(jobs)
W[count] = J
cost = W[1:] - W[:n - 1]
return np.sum(cost[cost > 0]) + capacity
With the testing code being:
if __name__ == '__main__':
from timeit import timeit
number_of_calls = 10000
time = timeit('KTNS(JOBS)', setup='from __main__ import KTNS, JOBS', number=number_of_calls)
result = KTNS(Jobs=JOBS)
print('BEFORE', result, time / number_of_calls)
setup = """
from __main__ import JOBS, prepare_tools, keep_the_tool_needed_soonest
tools = prepare_tools(JOBS)"""
time = timeit('keep_the_tool_needed_soonest(tools, len(JOBS))', setup=setup, number=number_of_calls)
result = keep_the_tool_needed_soonest(prepare_tools(JOBS), len(JOBS))
print('AFTER', result, time / number_of_calls)
I obtain the following results:
$ python2 ktns.py
('BEFORE', 17, 0.0009421438932418824)
('AFTER', 17, 0.00019730830192565919)
Which is roughly a 5ÃÂ improvement, not thrilling but somehow acceptable. Granted the code reads way better now.
However, this code has a major flaw in which it only works with jobs and tools numbered from 1 to m
or n
. If, by any chance, your code allows for jobs being [3, 8, 12] and tools numbered from 100 to 123 for instance, your various loops and updates to J will just not work.
For starter, you should get a consistent coding style. Naming and spacing are usually off and makes the code harder to read. You should get yourself familiar with PEP8 and apply it to make your Python code look like Python code to others. As regard to variable names, I get that Tin
, Lin
, W
, and J
come from the literature youâÂÂre using; so keeping them should be fine. Other variables are way more obscure, such as m
or n
(are they the amount of jobs and the amount of tools?) or event count
for that matter.
Second, you shouldn't need to worry about timing in the very function you want to time. Use the timeit
module instead. And also wrap the timing code in an if __name__ == '__main__'
guard.
Then we will need to examine the code more closely:
import sets
is deprecated since Python 2.6, use the builtinset
instead;copy.deepcopy()
is very heavy, better build a new dictionnary by only copying the containedset
s;- the usual import of
numpy
isimport numpy as np
; - you don't need empty
else
s clauses, they are just noise to the reader; - since you copy the
Tools
dictionnary before your loop, you don't need to build it at each function call, extract that part to an initialisation function and provide the dictionnary directly toKTNS
rather thanJobs
andsigma
; - your
while
loop would read better asfor count in range(1, n + 1):
; - once
J
reaches its requirements, it is not modified for subsequentse
inTools
, thus you can extract that part from the innerfor
loop; reference
is always the same in every iterations, better compute it once at the start of the function;- you can avoid using a
try:â¦except ValueError:
by addingn + 1
in each set ofTin
before the main loop: youâÂÂre guaranteed that this element will never be discarded; - you can compute the final cost more easily by using the potential of numpy for vectorized operations, fancy indexing, and
np.sum
; - using an other type of value (such as
'-'
instead of a number) to denote an absence of an element is prone to error and misunderstanding, you're better off deleting the key since the last loop will put it back in place anyway.
Proposed improvements follows, a word of caution though, I may have inverted the meaning of m
and n
since it is very difficult to know which is which (again, better naming would solve the problem):
from collections import defaultdict
import numpy as np
JOBS =
1: [6, 7],
2: [2, 3],
3: [1, 6, 9],
4: [1, 5, 9],
5: [5, 8, 10],
6: [1, 3, 6, 8],
7: [5, 6, 8, 9],
8: [5, 7, 8],
9: [1, 4, 5, 8],
10: [1, 2, 4, 10],
def prepare_tools(jobs, sigma=(10, 9, 4, 1, 6, 3, 7, 5, 2, 8)):
tools = defaultdict(set)
for count, e in enumerate(sigma, 1):
for i in jobs[e]:
tools[i].add(count)
max_value = max(sigma) + 1
for counts_set in tools.itervalues():
counts_set.add(max_value)
return tools
def keep_the_tool_needed_soonest(tools, n, capacity=4):
Lin = e: min(counts) for e, counts in tools.iteritems()
Tin = e: counts.copy() for e, counts in tools.iteritems()
reference = min(n, capacity)
m = len(tools)
W = np.zeros((n, m), dtype=int)
for count in range(n):
job_id = count + 1
J = np.zeros(m, dtype=bool)
while J.sum() < reference:
min_value = min(Lin.itervalues())
min_keys = [k for k, v in Lin.iteritems() if v == min_value]
should_update = True
if min_value > job_id and len(min_keys) > 1 and count:
J0 = W[count - 1]
for element in min_keys:
if J0[element - 1]:
J[element - 1] = True
del Lin[element]
should_update = False
if J.sum() >= reference:
break
if should_update:
key = min_keys[0]
J[key - 1] = True
del Lin[key]
for jobs in Tin.itervalues():
jobs.discard(job_id)
for element, jobs in Tin.iteritems():
Lin[element] = min(jobs)
W[count] = J
cost = W[1:] - W[:n - 1]
return np.sum(cost[cost > 0]) + capacity
With the testing code being:
if __name__ == '__main__':
from timeit import timeit
number_of_calls = 10000
time = timeit('KTNS(JOBS)', setup='from __main__ import KTNS, JOBS', number=number_of_calls)
result = KTNS(Jobs=JOBS)
print('BEFORE', result, time / number_of_calls)
setup = """
from __main__ import JOBS, prepare_tools, keep_the_tool_needed_soonest
tools = prepare_tools(JOBS)"""
time = timeit('keep_the_tool_needed_soonest(tools, len(JOBS))', setup=setup, number=number_of_calls)
result = keep_the_tool_needed_soonest(prepare_tools(JOBS), len(JOBS))
print('AFTER', result, time / number_of_calls)
I obtain the following results:
$ python2 ktns.py
('BEFORE', 17, 0.0009421438932418824)
('AFTER', 17, 0.00019730830192565919)
Which is roughly a 5ÃÂ improvement, not thrilling but somehow acceptable. Granted the code reads way better now.
However, this code has a major flaw in which it only works with jobs and tools numbered from 1 to m
or n
. If, by any chance, your code allows for jobs being [3, 8, 12] and tools numbered from 100 to 123 for instance, your various loops and updates to J will just not work.
answered May 4 at 15:10
Mathias Ettinger
21.8k32876
21.8k32876
As you have commented the variables m and n are the number of tools (m) and the number of jobs (n, to denominate the number of steps to finish the algorithm). Talking about the try/except method, it must be done because the variable Lin is always updated with a new value, and when there aren't more values in Tin it should be updated with n+1 (to avoid the set of the tool). About your last comment, the jobs and the tools are always numbered from 1 to m or from 1 to n, but sometimes the sigma list doesn't have all the elements (e.g [2,5,6,7] as a partial solution of sigma). Any more comments?
â A.Piquer
May 4 at 16:21
@A.Piquer thetry/except
is taken care of by puttingn+1
in each set oftools
inprepare_tools
, so they are already inTin
and the set will never be empty. As forsigma
being less than the whole set of jobs⦠well, I didn't test extensivelly. It may or may not workâ¦
â Mathias Ettinger
May 4 at 16:27
But the fact is that prepare_tools doesn't do what it should. The fact is that the aim of prepare_tools is to set a Dictionary (Tools) which will agrupate for each tool in what job of the sequence sigma is needed. For example for Jobs:1:[1,2,3], 2:[3,4,5], 3:[1,3],4:[1,5],5:[10] and sigma=[1,2,4] , the Tools dictonary of sets would be: 1:[1,4], 2:[1], 3: [1,2], 4: [2] ,5:[2,4] . Do you know how to adapt the code to achieve this? I did in the way I post because I didn't know how to do faster, but your version is pretty better. Thank you
â A.Piquer
May 4 at 16:35
@A.Piquer I don't get it, the kids are the same:for e in sigma: for i in jobs[e]
I just handledcount
using enumerate and initialization using a defaultdict but the results are similar.
â Mathias Ettinger
May 4 at 17:21
I have checked another time and you are right. In the other hand, do you think that it exists a better way to do the steps of the article I attached in the question? I don't know how to achieve the steps 3 and 4, and they are the core of the algorithm. Looking for the previous steps in W it's complicated because it is needed to look for previous values. If you know a better way I would really appreciate.
â A.Piquer
May 4 at 17:28
add a comment |Â
As you have commented the variables m and n are the number of tools (m) and the number of jobs (n, to denominate the number of steps to finish the algorithm). Talking about the try/except method, it must be done because the variable Lin is always updated with a new value, and when there aren't more values in Tin it should be updated with n+1 (to avoid the set of the tool). About your last comment, the jobs and the tools are always numbered from 1 to m or from 1 to n, but sometimes the sigma list doesn't have all the elements (e.g [2,5,6,7] as a partial solution of sigma). Any more comments?
â A.Piquer
May 4 at 16:21
@A.Piquer thetry/except
is taken care of by puttingn+1
in each set oftools
inprepare_tools
, so they are already inTin
and the set will never be empty. As forsigma
being less than the whole set of jobs⦠well, I didn't test extensivelly. It may or may not workâ¦
â Mathias Ettinger
May 4 at 16:27
But the fact is that prepare_tools doesn't do what it should. The fact is that the aim of prepare_tools is to set a Dictionary (Tools) which will agrupate for each tool in what job of the sequence sigma is needed. For example for Jobs:1:[1,2,3], 2:[3,4,5], 3:[1,3],4:[1,5],5:[10] and sigma=[1,2,4] , the Tools dictonary of sets would be: 1:[1,4], 2:[1], 3: [1,2], 4: [2] ,5:[2,4] . Do you know how to adapt the code to achieve this? I did in the way I post because I didn't know how to do faster, but your version is pretty better. Thank you
â A.Piquer
May 4 at 16:35
@A.Piquer I don't get it, the kids are the same:for e in sigma: for i in jobs[e]
I just handledcount
using enumerate and initialization using a defaultdict but the results are similar.
â Mathias Ettinger
May 4 at 17:21
I have checked another time and you are right. In the other hand, do you think that it exists a better way to do the steps of the article I attached in the question? I don't know how to achieve the steps 3 and 4, and they are the core of the algorithm. Looking for the previous steps in W it's complicated because it is needed to look for previous values. If you know a better way I would really appreciate.
â A.Piquer
May 4 at 17:28
As you have commented the variables m and n are the number of tools (m) and the number of jobs (n, to denominate the number of steps to finish the algorithm). Talking about the try/except method, it must be done because the variable Lin is always updated with a new value, and when there aren't more values in Tin it should be updated with n+1 (to avoid the set of the tool). About your last comment, the jobs and the tools are always numbered from 1 to m or from 1 to n, but sometimes the sigma list doesn't have all the elements (e.g [2,5,6,7] as a partial solution of sigma). Any more comments?
â A.Piquer
May 4 at 16:21
As you have commented the variables m and n are the number of tools (m) and the number of jobs (n, to denominate the number of steps to finish the algorithm). Talking about the try/except method, it must be done because the variable Lin is always updated with a new value, and when there aren't more values in Tin it should be updated with n+1 (to avoid the set of the tool). About your last comment, the jobs and the tools are always numbered from 1 to m or from 1 to n, but sometimes the sigma list doesn't have all the elements (e.g [2,5,6,7] as a partial solution of sigma). Any more comments?
â A.Piquer
May 4 at 16:21
@A.Piquer the
try/except
is taken care of by putting n+1
in each set of tools
in prepare_tools
, so they are already in Tin
and the set will never be empty. As for sigma
being less than the whole set of jobs⦠well, I didn't test extensivelly. It may or may not workâ¦â Mathias Ettinger
May 4 at 16:27
@A.Piquer the
try/except
is taken care of by putting n+1
in each set of tools
in prepare_tools
, so they are already in Tin
and the set will never be empty. As for sigma
being less than the whole set of jobs⦠well, I didn't test extensivelly. It may or may not workâ¦â Mathias Ettinger
May 4 at 16:27
But the fact is that prepare_tools doesn't do what it should. The fact is that the aim of prepare_tools is to set a Dictionary (Tools) which will agrupate for each tool in what job of the sequence sigma is needed. For example for Jobs:1:[1,2,3], 2:[3,4,5], 3:[1,3],4:[1,5],5:[10] and sigma=[1,2,4] , the Tools dictonary of sets would be: 1:[1,4], 2:[1], 3: [1,2], 4: [2] ,5:[2,4] . Do you know how to adapt the code to achieve this? I did in the way I post because I didn't know how to do faster, but your version is pretty better. Thank you
â A.Piquer
May 4 at 16:35
But the fact is that prepare_tools doesn't do what it should. The fact is that the aim of prepare_tools is to set a Dictionary (Tools) which will agrupate for each tool in what job of the sequence sigma is needed. For example for Jobs:1:[1,2,3], 2:[3,4,5], 3:[1,3],4:[1,5],5:[10] and sigma=[1,2,4] , the Tools dictonary of sets would be: 1:[1,4], 2:[1], 3: [1,2], 4: [2] ,5:[2,4] . Do you know how to adapt the code to achieve this? I did in the way I post because I didn't know how to do faster, but your version is pretty better. Thank you
â A.Piquer
May 4 at 16:35
@A.Piquer I don't get it, the kids are the same:
for e in sigma: for i in jobs[e]
I just handled count
using enumerate and initialization using a defaultdict but the results are similar.â Mathias Ettinger
May 4 at 17:21
@A.Piquer I don't get it, the kids are the same:
for e in sigma: for i in jobs[e]
I just handled count
using enumerate and initialization using a defaultdict but the results are similar.â Mathias Ettinger
May 4 at 17:21
I have checked another time and you are right. In the other hand, do you think that it exists a better way to do the steps of the article I attached in the question? I don't know how to achieve the steps 3 and 4, and they are the core of the algorithm. Looking for the previous steps in W it's complicated because it is needed to look for previous values. If you know a better way I would really appreciate.
â A.Piquer
May 4 at 17:28
I have checked another time and you are right. In the other hand, do you think that it exists a better way to do the steps of the article I attached in the question? I don't know how to achieve the steps 3 and 4, and they are the core of the algorithm. Looking for the previous steps in W it's complicated because it is needed to look for previous values. If you know a better way I would really appreciate.
â A.Piquer
May 4 at 17:28
add a comment |Â
up vote
1
down vote
Through the changes proposed by Mathias I have changed my code and nowadays works perfectly. The speed has improved a lot and the results are correct. Here I attach the whole code for the people who may be interested in the code optimization. I have added the prepare_tools function inside the keep_the_tool_needed_soonest in order to call only one function with the different arguments (the function must be called by sigma, Jobs, m, c). Any more improvements which can be done don't hesitate to share.
from collections import defaultdict
import numpy as np
def KTNS(sigma, jobs, m, capacity):
tools = defaultdict(set)
count = 1
for e in sigma:
for i in jobs[e]:
tools[i].add(count)
count += 1
steps=len(sigma)
Lin = e: min(counts) for e, counts in tools.iteritems()
Tin = e: counts.copy() for e, counts in tools.iteritems()
reference = min(len(Lin), capacity)
W = np.zeros((steps, m), dtype=int)
for count in range(steps):
job_id = count+1
J = np.zeros(m, dtype=bool)
while J.sum() < reference:
min_value = min(Lin.itervalues())
min_keys = [k for k, v in Lin.iteritems() if v == min_value]
should_update = True
if min_value > job_id : #and len(min_keys) > 1
J0 = W[count - 1]
for element in min_keys:
if J0[element - 1]:
J[element - 1] = True
Lin[element]='-'
should_update = False
if J.sum() >= reference:
break
if should_update:
key = min_keys[0]
J[key-1] = True
Lin[key]='-'
for jobs in Tin:
Tin[jobs].discard(job_id)
for element, jobs in Tin.iteritems():
try:
Lin[element] = min(jobs)
except ValueError:
Tin[element].add(steps+1)
Lin[element]=steps+1
W[count] = J
cost = W[1:] - W[:steps - 1]
return np.sum(cost[cost > 0])
add a comment |Â
up vote
1
down vote
Through the changes proposed by Mathias I have changed my code and nowadays works perfectly. The speed has improved a lot and the results are correct. Here I attach the whole code for the people who may be interested in the code optimization. I have added the prepare_tools function inside the keep_the_tool_needed_soonest in order to call only one function with the different arguments (the function must be called by sigma, Jobs, m, c). Any more improvements which can be done don't hesitate to share.
from collections import defaultdict
import numpy as np
def KTNS(sigma, jobs, m, capacity):
tools = defaultdict(set)
count = 1
for e in sigma:
for i in jobs[e]:
tools[i].add(count)
count += 1
steps=len(sigma)
Lin = e: min(counts) for e, counts in tools.iteritems()
Tin = e: counts.copy() for e, counts in tools.iteritems()
reference = min(len(Lin), capacity)
W = np.zeros((steps, m), dtype=int)
for count in range(steps):
job_id = count+1
J = np.zeros(m, dtype=bool)
while J.sum() < reference:
min_value = min(Lin.itervalues())
min_keys = [k for k, v in Lin.iteritems() if v == min_value]
should_update = True
if min_value > job_id : #and len(min_keys) > 1
J0 = W[count - 1]
for element in min_keys:
if J0[element - 1]:
J[element - 1] = True
Lin[element]='-'
should_update = False
if J.sum() >= reference:
break
if should_update:
key = min_keys[0]
J[key-1] = True
Lin[key]='-'
for jobs in Tin:
Tin[jobs].discard(job_id)
for element, jobs in Tin.iteritems():
try:
Lin[element] = min(jobs)
except ValueError:
Tin[element].add(steps+1)
Lin[element]=steps+1
W[count] = J
cost = W[1:] - W[:steps - 1]
return np.sum(cost[cost > 0])
add a comment |Â
up vote
1
down vote
up vote
1
down vote
Through the changes proposed by Mathias I have changed my code and nowadays works perfectly. The speed has improved a lot and the results are correct. Here I attach the whole code for the people who may be interested in the code optimization. I have added the prepare_tools function inside the keep_the_tool_needed_soonest in order to call only one function with the different arguments (the function must be called by sigma, Jobs, m, c). Any more improvements which can be done don't hesitate to share.
from collections import defaultdict
import numpy as np
def KTNS(sigma, jobs, m, capacity):
tools = defaultdict(set)
count = 1
for e in sigma:
for i in jobs[e]:
tools[i].add(count)
count += 1
steps=len(sigma)
Lin = e: min(counts) for e, counts in tools.iteritems()
Tin = e: counts.copy() for e, counts in tools.iteritems()
reference = min(len(Lin), capacity)
W = np.zeros((steps, m), dtype=int)
for count in range(steps):
job_id = count+1
J = np.zeros(m, dtype=bool)
while J.sum() < reference:
min_value = min(Lin.itervalues())
min_keys = [k for k, v in Lin.iteritems() if v == min_value]
should_update = True
if min_value > job_id : #and len(min_keys) > 1
J0 = W[count - 1]
for element in min_keys:
if J0[element - 1]:
J[element - 1] = True
Lin[element]='-'
should_update = False
if J.sum() >= reference:
break
if should_update:
key = min_keys[0]
J[key-1] = True
Lin[key]='-'
for jobs in Tin:
Tin[jobs].discard(job_id)
for element, jobs in Tin.iteritems():
try:
Lin[element] = min(jobs)
except ValueError:
Tin[element].add(steps+1)
Lin[element]=steps+1
W[count] = J
cost = W[1:] - W[:steps - 1]
return np.sum(cost[cost > 0])
Through the changes proposed by Mathias I have changed my code and nowadays works perfectly. The speed has improved a lot and the results are correct. Here I attach the whole code for the people who may be interested in the code optimization. I have added the prepare_tools function inside the keep_the_tool_needed_soonest in order to call only one function with the different arguments (the function must be called by sigma, Jobs, m, c). Any more improvements which can be done don't hesitate to share.
from collections import defaultdict
import numpy as np
def KTNS(sigma, jobs, m, capacity):
tools = defaultdict(set)
count = 1
for e in sigma:
for i in jobs[e]:
tools[i].add(count)
count += 1
steps=len(sigma)
Lin = e: min(counts) for e, counts in tools.iteritems()
Tin = e: counts.copy() for e, counts in tools.iteritems()
reference = min(len(Lin), capacity)
W = np.zeros((steps, m), dtype=int)
for count in range(steps):
job_id = count+1
J = np.zeros(m, dtype=bool)
while J.sum() < reference:
min_value = min(Lin.itervalues())
min_keys = [k for k, v in Lin.iteritems() if v == min_value]
should_update = True
if min_value > job_id : #and len(min_keys) > 1
J0 = W[count - 1]
for element in min_keys:
if J0[element - 1]:
J[element - 1] = True
Lin[element]='-'
should_update = False
if J.sum() >= reference:
break
if should_update:
key = min_keys[0]
J[key-1] = True
Lin[key]='-'
for jobs in Tin:
Tin[jobs].discard(job_id)
for element, jobs in Tin.iteritems():
try:
Lin[element] = min(jobs)
except ValueError:
Tin[element].add(steps+1)
Lin[element]=steps+1
W[count] = J
cost = W[1:] - W[:steps - 1]
return np.sum(cost[cost > 0])
answered May 5 at 9:17
A.Piquer
505
505
add a comment |Â
add a comment |Â
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f193371%2fpython-methods-to-reduce-running-times-ktns-algorithm%23new-answer', 'question_page');
);
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
4
Welcome to Code Review! I hope you get some great answers.
â Phrancis
May 1 at 17:55
Do you use the returned elapsed time in your whole program or is it just to know roughly how much time the function takes on your machine?
â Mathias Ettinger
May 4 at 9:40
@MathiasEttinger it is only to know how much time it takes. I want to decrease the time because nowadays due to the usage of if statements the python loop isn't optimal. Using less time the main algorithm will be able to do more iterations in less time. My main problem is that I don't know how to do it without the ifs, because I don't know how to define the set of Tools for every step without comparing with an if statement.
â A.Piquer
May 4 at 9:54