Genetic algorithm for Traveling Salesman

.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty margin-bottom:0;

up vote
8
down vote

favorite

I have implemented a genetic algorithm in python 3 for a programming assignment, and I think all the logic is correct. A friend of mine has also implemented one which carries out similar logic, however his was done in Java. It took his around 5 seconds to complete 5000 iterations, whereas mine is taking nearly four minutes!

I have tried to determine the bottleneck using the Python profiler (as described in this answer), but all of them seem to be taking quite long:

Are there any optimisations I can make which will improve the duration of my program? Here is the code.

import sys, math, random, heapq
import matplotlib.pyplot as plt
from itertools import chain

if sys.version_info < (3, 0):
 sys.exit("""Sorry, requires Python 3.x, not Python 2.x.""")

class Graph:

 def __init__(self, vertices):
 self.vertices = vertices
 self.n = len(vertices)

 def x(self, v):
 return self.vertices[v][0]

 def y(self, v):
 return self.vertices[v][1]

 # Lookup table for distances
 _d_lookup = 

 def d(self, u, v):
 """Euclidean Metric d_2((x1, y1), (x2, y2))"""

 # Check if the distance was computed before
 if (u, v) in self._d_lookup:
 return self._d_lookup[(u, v)]

 # Otherwise compute it
 _distance = math.sqrt((u[0] - v[0])**2 + (u[1] - v[1])**2)

 # Add to dictionary
 self._d_lookup[(u, v)], self._d_lookup[(v, u)] = _distance, _distance
 return _distance

 def plot(self, tour=None):
 """Plots the cities and superimposes given tour"""

 if tour is None:
 tour = Tour(self, )

 _vertices = [self.vertices[0]]

 for i in tour.vertices:
 _vertices.append(self.vertices[i])

 _vertices.append(self.vertices[0])

 plt.title("Cost = " + str(tour.cost()))
 plt.plot(*zip(*_vertices), '-r')
 plt.scatter(*zip(*self.vertices), c="b", s=10, marker="s")
 plt.show()


class Tour:

 def __init__(self, g, vertices = None):
 """Generate random tour in given graph g"""

 self.g = g

 if vertices is None:
 self.vertices = list(range(1, g.n))
 random.shuffle(self.vertices)
 else:
 self.vertices = vertices

 self.__cost = None

 def cost(self):
 """Return total edge-cost of tour"""

 if self.__cost is None:
 self.__cost = 0
 for i, j in zip([0] + self.vertices, self.vertices + [0]):
 self.__cost += self.g.d(self.g.vertices[i], self.g.vertices[j])
 return self.__cost


class GeneticAlgorithm:

 def __init__(self, g, population_size, k=5, elite_mating_rate=0.5,
 mutation_rate=0.015, mutation_swap_rate=0.2):
 """Initialises algorithm parameters"""

 self.g = g

 self.population = 
 for _ in range(population_size):
 self.population.append(Tour(g))

 self.population_size = population_size
 self.k = k
 self.elite_mating_rate = elite_mating_rate
 self.mutation_rate = mutation_rate
 self.mutation_swap_rate = mutation_swap_rate

 def crossover(self, mum, dad):
 """Implements ordered crossover"""

 size = len(mum.vertices)

 # Choose random start/end position for crossover
 alice, bob = [-1] * size, [-1] * size
 start, end = sorted([random.randrange(size) for _ in range(2)])

 # Replicate mum's sequence for alice, dad's sequence for bob
 for i in range(start, end + 1):
 alice[i] = mum.vertices[i]
 bob[i] = dad.vertices[i]

 # Fill the remaining position with the other parents' entries
 current_dad_position, current_mum_position = 0, 0

 for i in chain(range(start), range(end + 1, size)):

 while dad.vertices[current_dad_position] in alice:
 current_dad_position += 1

 while mum.vertices[current_mum_position] in bob:
 current_mum_position += 1

 alice[i] = dad.vertices[current_dad_position]
 bob[i] = mum.vertices[current_mum_position]

 # Return twins
 return Tour(self.g, alice), Tour(self.g, bob)

 def mutate(self, tour):
 """Randomly swaps pairs of cities in a given tour according to mutation rate"""

 # Decide whether to mutate
 if random.random() < self.mutation_rate:

 # For each vertex
 for i in range(len(tour.vertices)):

 # Randomly decide whether to swap
 if random.random() < self.mutation_swap_rate:

 # Randomly choose other city position
 j = random.randrange(len(tour.vertices))

 # Swap
 tour.vertices[i], tour.vertices[j] = tour.vertices[j], tour.vertices[i]

 def select_parent(self, k):
 """Implements k-tournament selection to choose parents"""
 tournament = random.sample(self.population, k)
 return max(tournament, key=lambda t: t.cost())

 def evolve(self):
 """Executes one iteration of the genetic algorithm to obtain a new generation"""

 new_population = 

 for _ in range(self.population_size):

 # K-tournament for parents
 mum, dad = self.select_parent(self.k), self.select_parent(self.k)
 alice, bob = self.crossover(mum, dad)

 # Mate in an elite fashion according to the elitism_rate
 if random.random() < self.elite_mating_rate:
 if alice.cost() < mum.cost() or alice.cost() < dad.cost():
 new_population.append(alice)
 if bob.cost() < mum.cost() or bob.cost() < dad.cost():
 new_population.append(bob)

 else:
 self.mutate(alice)
 self.mutate(bob)
 new_population += [alice, bob]

 # Add new population to old
 self.population += new_population

 # Retain fittest
 self.population = heapq.nsmallest(self.population_size, self.population, key=lambda t: t.cost())


 def run(self, iterations=5000):
 for _ in range(iterations):
 self.evolve()

 def best(self):
 return max(self.population, key=lambda t: t.cost())


# Test on berlin52: http://elib.zib.de/pub/mp-testdata/tsp/tsplib/tsp/berlin52.tsp
g = Graph([(565.0, 575.0), (25.0, 185.0), (345.0, 750.0), (945.0, 685.0), 
 (845.0, 655.0), (880.0, 660.0), (25.0, 230.0), (525.0, 1000.0), 
 (580.0, 1175.0), (650.0, 1130.0), (1605.0, 620.0), (1220.0, 580.0), 
 (1465.0, 200.0), (1530.0, 5.0), (845.0, 680.0), (725.0, 370.0), 
 (145.0, 665.0), (415.0, 635.0), (510.0, 875.0), (560.0, 365.0), 
 (300.0, 465.0), (520.0, 585.0), (480.0, 415.0), (835.0, 625.0), 
 (975.0, 580.0), (1215.0, 245.0), (1320.0, 315.0), (1250.0, 400.0), 
 (660.0, 180.0), (410.0, 250.0), (420.0, 555.0), (575.0, 665.0), 
 (1150.0, 1160.0), (700.0, 580.0), (685.0, 595.0), (685.0, 610.0), 
 (770.0, 610.0), (795.0, 645.0), (720.0, 635.0), (760.0, 650.0), 
 (475.0, 960.0), (95.0, 260.0), (875.0, 920.0), (700.0, 500.0), 
 (555.0, 815.0), (830.0, 485.0), (1170.0, 65.0), (830.0, 610.0), 
 (605.0, 625.0), (595.0, 360.0), (1340.0, 725.0), (1740.0, 245.0)])

ga = GeneticAlgorithm(g, 100)
ga.run()

best_tour = ga.best()
g.plot(best_tour)

edited May 22 at 14:05

200_success

123k14143399

asked May 22 at 14:03

Luke Collins

1413

2

This is an excellent first question, welcome on CodeReview! :)
â€“Â IEatBagels
May 22 at 14:31

3

@TopinFrassi Thank you! It gives friendlier vibes than StackOverflow.
â€“Â Luke Collins
May 22 at 14:36

add a commentÂ |Â

up vote
8
down vote

favorite

I have tried to determine the bottleneck using the Python profiler (as described in this answer), but all of them seem to be taking quite long:

Are there any optimisations I can make which will improve the duration of my program? Here is the code.

import sys, math, random, heapq
import matplotlib.pyplot as plt
from itertools import chain

if sys.version_info < (3, 0):
 sys.exit("""Sorry, requires Python 3.x, not Python 2.x.""")

class Graph:

 def __init__(self, vertices):
 self.vertices = vertices
 self.n = len(vertices)

 def x(self, v):
 return self.vertices[v][0]

 def y(self, v):
 return self.vertices[v][1]

 # Lookup table for distances
 _d_lookup = 

 def d(self, u, v):
 """Euclidean Metric d_2((x1, y1), (x2, y2))"""

 # Check if the distance was computed before
 if (u, v) in self._d_lookup:
 return self._d_lookup[(u, v)]

 # Otherwise compute it
 _distance = math.sqrt((u[0] - v[0])**2 + (u[1] - v[1])**2)

 # Add to dictionary
 self._d_lookup[(u, v)], self._d_lookup[(v, u)] = _distance, _distance
 return _distance

 def plot(self, tour=None):
 """Plots the cities and superimposes given tour"""

 if tour is None:
 tour = Tour(self, )

 _vertices = [self.vertices[0]]

 for i in tour.vertices:
 _vertices.append(self.vertices[i])

 _vertices.append(self.vertices[0])

 plt.title("Cost = " + str(tour.cost()))
 plt.plot(*zip(*_vertices), '-r')
 plt.scatter(*zip(*self.vertices), c="b", s=10, marker="s")
 plt.show()


class Tour:

 def __init__(self, g, vertices = None):
 """Generate random tour in given graph g"""

 self.g = g

 if vertices is None:
 self.vertices = list(range(1, g.n))
 random.shuffle(self.vertices)
 else:
 self.vertices = vertices

 self.__cost = None

 def cost(self):
 """Return total edge-cost of tour"""

 if self.__cost is None:
 self.__cost = 0
 for i, j in zip([0] + self.vertices, self.vertices + [0]):
 self.__cost += self.g.d(self.g.vertices[i], self.g.vertices[j])
 return self.__cost


class GeneticAlgorithm:

 def __init__(self, g, population_size, k=5, elite_mating_rate=0.5,
 mutation_rate=0.015, mutation_swap_rate=0.2):
 """Initialises algorithm parameters"""

 self.g = g

 self.population = 
 for _ in range(population_size):
 self.population.append(Tour(g))

 self.population_size = population_size
 self.k = k
 self.elite_mating_rate = elite_mating_rate
 self.mutation_rate = mutation_rate
 self.mutation_swap_rate = mutation_swap_rate

 def crossover(self, mum, dad):
 """Implements ordered crossover"""

 size = len(mum.vertices)

 # Choose random start/end position for crossover
 alice, bob = [-1] * size, [-1] * size
 start, end = sorted([random.randrange(size) for _ in range(2)])

 # Replicate mum's sequence for alice, dad's sequence for bob
 for i in range(start, end + 1):
 alice[i] = mum.vertices[i]
 bob[i] = dad.vertices[i]

 # Fill the remaining position with the other parents' entries
 current_dad_position, current_mum_position = 0, 0

 for i in chain(range(start), range(end + 1, size)):

 while dad.vertices[current_dad_position] in alice:
 current_dad_position += 1

 while mum.vertices[current_mum_position] in bob:
 current_mum_position += 1

 alice[i] = dad.vertices[current_dad_position]
 bob[i] = mum.vertices[current_mum_position]

 # Return twins
 return Tour(self.g, alice), Tour(self.g, bob)

 def mutate(self, tour):
 """Randomly swaps pairs of cities in a given tour according to mutation rate"""

 # Decide whether to mutate
 if random.random() < self.mutation_rate:

 # For each vertex
 for i in range(len(tour.vertices)):

 # Randomly decide whether to swap
 if random.random() < self.mutation_swap_rate:

 # Randomly choose other city position
 j = random.randrange(len(tour.vertices))

 # Swap
 tour.vertices[i], tour.vertices[j] = tour.vertices[j], tour.vertices[i]

 def select_parent(self, k):
 """Implements k-tournament selection to choose parents"""
 tournament = random.sample(self.population, k)
 return max(tournament, key=lambda t: t.cost())

 def evolve(self):
 """Executes one iteration of the genetic algorithm to obtain a new generation"""

 new_population = 

 for _ in range(self.population_size):

 # K-tournament for parents
 mum, dad = self.select_parent(self.k), self.select_parent(self.k)
 alice, bob = self.crossover(mum, dad)

 # Mate in an elite fashion according to the elitism_rate
 if random.random() < self.elite_mating_rate:
 if alice.cost() < mum.cost() or alice.cost() < dad.cost():
 new_population.append(alice)
 if bob.cost() < mum.cost() or bob.cost() < dad.cost():
 new_population.append(bob)

 else:
 self.mutate(alice)
 self.mutate(bob)
 new_population += [alice, bob]

 # Add new population to old
 self.population += new_population

 # Retain fittest
 self.population = heapq.nsmallest(self.population_size, self.population, key=lambda t: t.cost())


 def run(self, iterations=5000):
 for _ in range(iterations):
 self.evolve()

 def best(self):
 return max(self.population, key=lambda t: t.cost())


# Test on berlin52: http://elib.zib.de/pub/mp-testdata/tsp/tsplib/tsp/berlin52.tsp
g = Graph([(565.0, 575.0), (25.0, 185.0), (345.0, 750.0), (945.0, 685.0), 
 (845.0, 655.0), (880.0, 660.0), (25.0, 230.0), (525.0, 1000.0), 
 (580.0, 1175.0), (650.0, 1130.0), (1605.0, 620.0), (1220.0, 580.0), 
 (1465.0, 200.0), (1530.0, 5.0), (845.0, 680.0), (725.0, 370.0), 
 (145.0, 665.0), (415.0, 635.0), (510.0, 875.0), (560.0, 365.0), 
 (300.0, 465.0), (520.0, 585.0), (480.0, 415.0), (835.0, 625.0), 
 (975.0, 580.0), (1215.0, 245.0), (1320.0, 315.0), (1250.0, 400.0), 
 (660.0, 180.0), (410.0, 250.0), (420.0, 555.0), (575.0, 665.0), 
 (1150.0, 1160.0), (700.0, 580.0), (685.0, 595.0), (685.0, 610.0), 
 (770.0, 610.0), (795.0, 645.0), (720.0, 635.0), (760.0, 650.0), 
 (475.0, 960.0), (95.0, 260.0), (875.0, 920.0), (700.0, 500.0), 
 (555.0, 815.0), (830.0, 485.0), (1170.0, 65.0), (830.0, 610.0), 
 (605.0, 625.0), (595.0, 360.0), (1340.0, 725.0), (1740.0, 245.0)])

ga = GeneticAlgorithm(g, 100)
ga.run()

best_tour = ga.best()
g.plot(best_tour)

edited May 22 at 14:05

200_success

123k14143399

asked May 22 at 14:03

Luke Collins

1413

2

This is an excellent first question, welcome on CodeReview! :)
â€“Â IEatBagels
May 22 at 14:31

3

@TopinFrassi Thank you! It gives friendlier vibes than StackOverflow.
â€“Â Luke Collins
May 22 at 14:36

add a commentÂ |Â

up vote
8
down vote

favorite

I have tried to determine the bottleneck using the Python profiler (as described in this answer), but all of them seem to be taking quite long:

Are there any optimisations I can make which will improve the duration of my program? Here is the code.

import sys, math, random, heapq
import matplotlib.pyplot as plt
from itertools import chain

if sys.version_info < (3, 0):
 sys.exit("""Sorry, requires Python 3.x, not Python 2.x.""")

class Graph:

 def __init__(self, vertices):
 self.vertices = vertices
 self.n = len(vertices)

 def x(self, v):
 return self.vertices[v][0]

 def y(self, v):
 return self.vertices[v][1]

 # Lookup table for distances
 _d_lookup = 

 def d(self, u, v):
 """Euclidean Metric d_2((x1, y1), (x2, y2))"""

 # Check if the distance was computed before
 if (u, v) in self._d_lookup:
 return self._d_lookup[(u, v)]

 # Otherwise compute it
 _distance = math.sqrt((u[0] - v[0])**2 + (u[1] - v[1])**2)

 # Add to dictionary
 self._d_lookup[(u, v)], self._d_lookup[(v, u)] = _distance, _distance
 return _distance

 def plot(self, tour=None):
 """Plots the cities and superimposes given tour"""

 if tour is None:
 tour = Tour(self, )

 _vertices = [self.vertices[0]]

 for i in tour.vertices:
 _vertices.append(self.vertices[i])

 _vertices.append(self.vertices[0])

 plt.title("Cost = " + str(tour.cost()))
 plt.plot(*zip(*_vertices), '-r')
 plt.scatter(*zip(*self.vertices), c="b", s=10, marker="s")
 plt.show()


class Tour:

 def __init__(self, g, vertices = None):
 """Generate random tour in given graph g"""

 self.g = g

 if vertices is None:
 self.vertices = list(range(1, g.n))
 random.shuffle(self.vertices)
 else:
 self.vertices = vertices

 self.__cost = None

 def cost(self):
 """Return total edge-cost of tour"""

 if self.__cost is None:
 self.__cost = 0
 for i, j in zip([0] + self.vertices, self.vertices + [0]):
 self.__cost += self.g.d(self.g.vertices[i], self.g.vertices[j])
 return self.__cost


class GeneticAlgorithm:

 def __init__(self, g, population_size, k=5, elite_mating_rate=0.5,
 mutation_rate=0.015, mutation_swap_rate=0.2):
 """Initialises algorithm parameters"""

 self.g = g

 self.population = 
 for _ in range(population_size):
 self.population.append(Tour(g))

 self.population_size = population_size
 self.k = k
 self.elite_mating_rate = elite_mating_rate
 self.mutation_rate = mutation_rate
 self.mutation_swap_rate = mutation_swap_rate

 def crossover(self, mum, dad):
 """Implements ordered crossover"""

 size = len(mum.vertices)

 # Choose random start/end position for crossover
 alice, bob = [-1] * size, [-1] * size
 start, end = sorted([random.randrange(size) for _ in range(2)])

 # Replicate mum's sequence for alice, dad's sequence for bob
 for i in range(start, end + 1):
 alice[i] = mum.vertices[i]
 bob[i] = dad.vertices[i]

 # Fill the remaining position with the other parents' entries
 current_dad_position, current_mum_position = 0, 0

 for i in chain(range(start), range(end + 1, size)):

 while dad.vertices[current_dad_position] in alice:
 current_dad_position += 1

 while mum.vertices[current_mum_position] in bob:
 current_mum_position += 1

 alice[i] = dad.vertices[current_dad_position]
 bob[i] = mum.vertices[current_mum_position]

 # Return twins
 return Tour(self.g, alice), Tour(self.g, bob)

 def mutate(self, tour):
 """Randomly swaps pairs of cities in a given tour according to mutation rate"""

 # Decide whether to mutate
 if random.random() < self.mutation_rate:

 # For each vertex
 for i in range(len(tour.vertices)):

 # Randomly decide whether to swap
 if random.random() < self.mutation_swap_rate:

 # Randomly choose other city position
 j = random.randrange(len(tour.vertices))

 # Swap
 tour.vertices[i], tour.vertices[j] = tour.vertices[j], tour.vertices[i]

 def select_parent(self, k):
 """Implements k-tournament selection to choose parents"""
 tournament = random.sample(self.population, k)
 return max(tournament, key=lambda t: t.cost())

 def evolve(self):
 """Executes one iteration of the genetic algorithm to obtain a new generation"""

 new_population = 

 for _ in range(self.population_size):

 # K-tournament for parents
 mum, dad = self.select_parent(self.k), self.select_parent(self.k)
 alice, bob = self.crossover(mum, dad)

 # Mate in an elite fashion according to the elitism_rate
 if random.random() < self.elite_mating_rate:
 if alice.cost() < mum.cost() or alice.cost() < dad.cost():
 new_population.append(alice)
 if bob.cost() < mum.cost() or bob.cost() < dad.cost():
 new_population.append(bob)

 else:
 self.mutate(alice)
 self.mutate(bob)
 new_population += [alice, bob]

 # Add new population to old
 self.population += new_population

 # Retain fittest
 self.population = heapq.nsmallest(self.population_size, self.population, key=lambda t: t.cost())


 def run(self, iterations=5000):
 for _ in range(iterations):
 self.evolve()

 def best(self):
 return max(self.population, key=lambda t: t.cost())


# Test on berlin52: http://elib.zib.de/pub/mp-testdata/tsp/tsplib/tsp/berlin52.tsp
g = Graph([(565.0, 575.0), (25.0, 185.0), (345.0, 750.0), (945.0, 685.0), 
 (845.0, 655.0), (880.0, 660.0), (25.0, 230.0), (525.0, 1000.0), 
 (580.0, 1175.0), (650.0, 1130.0), (1605.0, 620.0), (1220.0, 580.0), 
 (1465.0, 200.0), (1530.0, 5.0), (845.0, 680.0), (725.0, 370.0), 
 (145.0, 665.0), (415.0, 635.0), (510.0, 875.0), (560.0, 365.0), 
 (300.0, 465.0), (520.0, 585.0), (480.0, 415.0), (835.0, 625.0), 
 (975.0, 580.0), (1215.0, 245.0), (1320.0, 315.0), (1250.0, 400.0), 
 (660.0, 180.0), (410.0, 250.0), (420.0, 555.0), (575.0, 665.0), 
 (1150.0, 1160.0), (700.0, 580.0), (685.0, 595.0), (685.0, 610.0), 
 (770.0, 610.0), (795.0, 645.0), (720.0, 635.0), (760.0, 650.0), 
 (475.0, 960.0), (95.0, 260.0), (875.0, 920.0), (700.0, 500.0), 
 (555.0, 815.0), (830.0, 485.0), (1170.0, 65.0), (830.0, 610.0), 
 (605.0, 625.0), (595.0, 360.0), (1340.0, 725.0), (1740.0, 245.0)])

ga = GeneticAlgorithm(g, 100)
ga.run()

best_tour = ga.best()
g.plot(best_tour)

edited May 22 at 14:05

200_success

123k14143399

asked May 22 at 14:03

Luke Collins

1413

I have tried to determine the bottleneck using the Python profiler (as described in this answer), but all of them seem to be taking quite long:

Are there any optimisations I can make which will improve the duration of my program? Here is the code.

import sys, math, random, heapq
import matplotlib.pyplot as plt
from itertools import chain

if sys.version_info < (3, 0):
 sys.exit("""Sorry, requires Python 3.x, not Python 2.x.""")

class Graph:

 def __init__(self, vertices):
 self.vertices = vertices
 self.n = len(vertices)

 def x(self, v):
 return self.vertices[v][0]

 def y(self, v):
 return self.vertices[v][1]

 # Lookup table for distances
 _d_lookup = 

 def d(self, u, v):
 """Euclidean Metric d_2((x1, y1), (x2, y2))"""

 # Check if the distance was computed before
 if (u, v) in self._d_lookup:
 return self._d_lookup[(u, v)]

 # Otherwise compute it
 _distance = math.sqrt((u[0] - v[0])**2 + (u[1] - v[1])**2)

 # Add to dictionary
 self._d_lookup[(u, v)], self._d_lookup[(v, u)] = _distance, _distance
 return _distance

 def plot(self, tour=None):
 """Plots the cities and superimposes given tour"""

 if tour is None:
 tour = Tour(self, )

 _vertices = [self.vertices[0]]

 for i in tour.vertices:
 _vertices.append(self.vertices[i])

 _vertices.append(self.vertices[0])

 plt.title("Cost = " + str(tour.cost()))
 plt.plot(*zip(*_vertices), '-r')
 plt.scatter(*zip(*self.vertices), c="b", s=10, marker="s")
 plt.show()


class Tour:

 def __init__(self, g, vertices = None):
 """Generate random tour in given graph g"""

 self.g = g

 if vertices is None:
 self.vertices = list(range(1, g.n))
 random.shuffle(self.vertices)
 else:
 self.vertices = vertices

 self.__cost = None

 def cost(self):
 """Return total edge-cost of tour"""

 if self.__cost is None:
 self.__cost = 0
 for i, j in zip([0] + self.vertices, self.vertices + [0]):
 self.__cost += self.g.d(self.g.vertices[i], self.g.vertices[j])
 return self.__cost


class GeneticAlgorithm:

 def __init__(self, g, population_size, k=5, elite_mating_rate=0.5,
 mutation_rate=0.015, mutation_swap_rate=0.2):
 """Initialises algorithm parameters"""

 self.g = g

 self.population = 
 for _ in range(population_size):
 self.population.append(Tour(g))

 self.population_size = population_size
 self.k = k
 self.elite_mating_rate = elite_mating_rate
 self.mutation_rate = mutation_rate
 self.mutation_swap_rate = mutation_swap_rate

 def crossover(self, mum, dad):
 """Implements ordered crossover"""

 size = len(mum.vertices)

 # Choose random start/end position for crossover
 alice, bob = [-1] * size, [-1] * size
 start, end = sorted([random.randrange(size) for _ in range(2)])

 # Replicate mum's sequence for alice, dad's sequence for bob
 for i in range(start, end + 1):
 alice[i] = mum.vertices[i]
 bob[i] = dad.vertices[i]

 # Fill the remaining position with the other parents' entries
 current_dad_position, current_mum_position = 0, 0

 for i in chain(range(start), range(end + 1, size)):

 while dad.vertices[current_dad_position] in alice:
 current_dad_position += 1

 while mum.vertices[current_mum_position] in bob:
 current_mum_position += 1

 alice[i] = dad.vertices[current_dad_position]
 bob[i] = mum.vertices[current_mum_position]

 # Return twins
 return Tour(self.g, alice), Tour(self.g, bob)

 def mutate(self, tour):
 """Randomly swaps pairs of cities in a given tour according to mutation rate"""

 # Decide whether to mutate
 if random.random() < self.mutation_rate:

 # For each vertex
 for i in range(len(tour.vertices)):

 # Randomly decide whether to swap
 if random.random() < self.mutation_swap_rate:

 # Randomly choose other city position
 j = random.randrange(len(tour.vertices))

 # Swap
 tour.vertices[i], tour.vertices[j] = tour.vertices[j], tour.vertices[i]

 def select_parent(self, k):
 """Implements k-tournament selection to choose parents"""
 tournament = random.sample(self.population, k)
 return max(tournament, key=lambda t: t.cost())

 def evolve(self):
 """Executes one iteration of the genetic algorithm to obtain a new generation"""

 new_population = 

 for _ in range(self.population_size):

 # K-tournament for parents
 mum, dad = self.select_parent(self.k), self.select_parent(self.k)
 alice, bob = self.crossover(mum, dad)

 # Mate in an elite fashion according to the elitism_rate
 if random.random() < self.elite_mating_rate:
 if alice.cost() < mum.cost() or alice.cost() < dad.cost():
 new_population.append(alice)
 if bob.cost() < mum.cost() or bob.cost() < dad.cost():
 new_population.append(bob)

 else:
 self.mutate(alice)
 self.mutate(bob)
 new_population += [alice, bob]

 # Add new population to old
 self.population += new_population

 # Retain fittest
 self.population = heapq.nsmallest(self.population_size, self.population, key=lambda t: t.cost())


 def run(self, iterations=5000):
 for _ in range(iterations):
 self.evolve()

 def best(self):
 return max(self.population, key=lambda t: t.cost())


# Test on berlin52: http://elib.zib.de/pub/mp-testdata/tsp/tsplib/tsp/berlin52.tsp
g = Graph([(565.0, 575.0), (25.0, 185.0), (345.0, 750.0), (945.0, 685.0), 
 (845.0, 655.0), (880.0, 660.0), (25.0, 230.0), (525.0, 1000.0), 
 (580.0, 1175.0), (650.0, 1130.0), (1605.0, 620.0), (1220.0, 580.0), 
 (1465.0, 200.0), (1530.0, 5.0), (845.0, 680.0), (725.0, 370.0), 
 (145.0, 665.0), (415.0, 635.0), (510.0, 875.0), (560.0, 365.0), 
 (300.0, 465.0), (520.0, 585.0), (480.0, 415.0), (835.0, 625.0), 
 (975.0, 580.0), (1215.0, 245.0), (1320.0, 315.0), (1250.0, 400.0), 
 (660.0, 180.0), (410.0, 250.0), (420.0, 555.0), (575.0, 665.0), 
 (1150.0, 1160.0), (700.0, 580.0), (685.0, 595.0), (685.0, 610.0), 
 (770.0, 610.0), (795.0, 645.0), (720.0, 635.0), (760.0, 650.0), 
 (475.0, 960.0), (95.0, 260.0), (875.0, 920.0), (700.0, 500.0), 
 (555.0, 815.0), (830.0, 485.0), (1170.0, 65.0), (830.0, 610.0), 
 (605.0, 625.0), (595.0, 360.0), (1340.0, 725.0), (1740.0, 245.0)])

ga = GeneticAlgorithm(g, 100)
ga.run()

best_tour = ga.best()
g.plot(best_tour)

edited May 22 at 14:05

200_success

123k14143399

asked May 22 at 14:03

Luke Collins

1413

edited May 22 at 14:05

200_success

123k14143399

edited May 22 at 14:05

200_success

123k14143399

edited May 22 at 14:05

200_success

123k14143399

asked May 22 at 14:03

Luke Collins

1413

asked May 22 at 14:03

Luke Collins

1413

asked May 22 at 14:03

Luke Collins

1413

2

This is an excellent first question, welcome on CodeReview! :)
â€“Â IEatBagels
May 22 at 14:31

3

@TopinFrassi Thank you! It gives friendlier vibes than StackOverflow.
â€“Â Luke Collins
May 22 at 14:36

add a commentÂ |Â

2

This is an excellent first question, welcome on CodeReview! :)
â€“Â IEatBagels
May 22 at 14:31

3

@TopinFrassi Thank you! It gives friendlier vibes than StackOverflow.
â€“Â Luke Collins
May 22 at 14:36

This is an excellent first question, welcome on CodeReview! :)
â€“Â IEatBagels
May 22 at 14:31

@TopinFrassi Thank you! It gives friendlier vibes than StackOverflow.
â€“Â Luke Collins
May 22 at 14:36

add a commentÂ |Â

2 Answers
2

active

oldest

votes

up vote
3
down vote

 def x(self, v):
 return self.vertices[v][0]

 def y(self, v):
 return self.vertices[v][1]

I think these two methods are dead code and could be deleted.

 # Lookup table for distances
 _d_lookup = 

 def d(self, u, v):
 """Euclidean Metric d_2((x1, y1), (x2, y2))"""

 # Check if the distance was computed before
 if (u, v) in self._d_lookup:
 return self._d_lookup[(u, v)]

 # Otherwise compute it
 _distance = math.sqrt((u[0] - v[0])**2 + (u[1] - v[1])**2)

 # Add to dictionary
 self._d_lookup[(u, v)], self._d_lookup[(v, u)] = _distance, _distance
 return _distance

Would be good to have better documentation: what's the relationship between u and (x1, y1)?

In my testing, the lookup didn't actually provide any speedup. I think this is mainly because it's such a big key. If I change u and v to be indices into self.vertices, there is a speed saving. Obviously this also means changing Tour.cost, which is the only method which calls Graph.d.

 if vertices is None:
 self.vertices = list(range(1, g.n))
 random.shuffle(self.vertices)
 else:
 self.vertices = vertices

I had to reverse engineer from the if case what the meaning of vertices is. A comment explaining it would have been helpful.

 def cost(self):
 """Return total edge-cost of tour"""

 if self.__cost is None:
 self.__cost = 0
 for i, j in zip([0] + self.vertices, self.vertices + [0]):
 self.__cost += self.g.d(self.g.vertices[i], self.g.vertices[j])

Yakym's proposed code change was buggy (although it's now fixed), but the point about avoiding creating new lists is a good one. An alternative way of fixing Yakym's code which takes into account the goal of avoiding list creation and is further adapted to correspond to my point about using indices as arguments to Graph.d is

 self.__cost = self.g.d(0, self.vertices[0]) + 
 sum(map(self.g.d, self.vertices, self.vertices[1:])) + 
 self.g.d(self.vertices[-1], 0)

Separate point on cost: since you're always going to call cost() (if nothing else then in the heapq selection), is there any benefit to making it lazy?

 def crossover(self, mum, dad):
 """Implements ordered crossover"""

 size = len(mum.vertices)

Why not this?

 size = g.n - 1

 # Choose random start/end position for crossover
 alice, bob = [-1] * size, [-1] * size
 start, end = sorted([random.randrange(size) for _ in range(2)])

 # Replicate mum's sequence for alice, dad's sequence for bob
 for i in range(start, end + 1):
 alice[i] = mum.vertices[i]
 bob[i] = dad.vertices[i]

 # Fill the remaining position with the other parents' entries
 current_dad_position, current_mum_position = 0, 0

 for i in chain(range(start), range(end + 1, size)):

 while dad.vertices[current_dad_position] in alice:
 current_dad_position += 1

 while mum.vertices[current_mum_position] in bob:
 current_mum_position += 1

 alice[i] = dad.vertices[current_dad_position]
 bob[i] = mum.vertices[current_mum_position]

Here we have a familiar culprit when code runs too slow: in list. Testing whether a list contains a value takes linear time: if you want a fast in test, use a set:

 skipalice, skipbob = set(alice), set(bob)
 for i in chain(range(start), range(end + 1, size)):

 while dad.vertices[current_dad_position] in skipalice:
 current_dad_position += 1

 while mum.vertices[current_mum_position] in skipbob:
 current_mum_position += 1

 alice[i] = dad.vertices[current_dad_position]
 bob[i] = mum.vertices[current_mum_position]
 current_dad_position += 1
 current_mum_position += 1

But actually a faster approach seems to be to use a comprehension and advanced indexing:

 def crossover(self, mum, dad):
 """Implements ordered crossover"""

 size = g.n - 1

 # Choose random start/end position for crossover
 start, end = sorted([random.randrange(size) for _ in range(2)])

 # Identify the elements from mum's sequence which end up in alice,
 # and from dad's which end up in bob 
 mumxo = set(mum.vertices[start:end+1])
 dadxo = set(dad.vertices[start:end+1])

 # Take the other elements in their original order
 alice = [i for i in dad.vertices if not i in mumxo]
 bob = [i for i in mum.vertices if not i in dadxo]

 # Insert selected elements of mum's sequence for alice, dad's for bob
 alice[start:start] = mum.vertices[start:end+1]
 bob[start:start] = dad.vertices[start:end+1]

 # Return twins
 return Tour(self.g, alice), Tour(self.g, bob)

 def mutate(self, tour):

I get a moderate speedup by pulling out n = len(tour.vertices), saving up to $n^2$ calls of len.

 def select_parent(self, k):
 """Implements k-tournament selection to choose parents"""
 tournament = random.sample(self.population, k)
 return max(tournament, key=lambda t: t.cost())

I haven't tried this, but I wonder whether it might be faster to sort self.population and then sample from range(self.population_size).

 # Retain fittest
 self.population = heapq.nsmallest(self.population_size, self.population, key=lambda t: t.cost())

The documentation for heapq.nsmallest says

[It performs] best for smaller values of n. For larger values, it is more efficient to use the sorted() function.

Here n is len(self.population)/2, and I see a small speedup from

 self.population = sorted(self.population, key=lambda t: t.cost())[0:self.population_size]

It might be worth going for a linear quickselect approach instead.

edited May 23 at 16:03

answered May 23 at 15:52

Peter Taylor

14k2454

NB I've been doing performance testing on the smaller test case elib.zib.de/pub/mp-testdata/tsp/tsplib/tsp/burma14.tsp because I was using an online code execution environment with a 60-second time limit; but on that test case the changes mentioned above got the runtime down from 40 seconds to 28 seconds.
â€“Â Peter Taylor
May 23 at 15:56

Thank you very much for all these! I'll be sure to read through them soon and give you feedback.
â€“Â Luke Collins
May 23 at 17:42

Thank you very much for all your optimisations, I have implemented all of them, together with some others (such as replacing the cost lookup table with an numpy matrix). Unfortunately however, I still haven't managed to decrease runtime to under a minute (you said it took you 28 seconds). Is there something I'm missing here? This is the updated code with all changes: ideone.com/oedbmG.
â€“Â Luke Collins
May 24 at 15:03

You're probably missing the part where that 28 seconds was for a 14-city tour, not a 52-city one.
â€“Â Peter Taylor
May 24 at 15:13

Oh right, sorry about that. Is it normal for such algorithms to take this long though? Like I said, my friend managed to compile in around 5 seconds in Java.
â€“Â Luke Collins
May 24 at 15:15

Â |Â
show 3 more comments

up vote
1
down vote

An extended comment rather than an answer. You can optimize the cost function as follows:

def cost(self):
 """Return total edge-cost of tour"""
 if self.__cost is None:
 pts = [self.g.vertices[i] for i in self.vertices]
 pts.append(self.g.vertices[0])
 self.__cost = sum(map(g.d, pts, pts[1:])) + g.d(pts[0], pts[-1])
 return self.__cost

First, you avoid looking up each point in g.vertices twice. More importantly, each of [0] + self.vertices and self.vertices + [0] creates a new list, necessarily making a copy of self.vertices. This shaves of about 15% of runtime on my machine.

edited May 23 at 15:55

answered May 23 at 14:09

Yakym Pirozhenko

1113

1

Also, this introduces a bug. The original code uses [0] + self.vertices and self.vertices + [0] because 0 is not in self.vertices, but does need to be taken into account in the tour.
â€“Â Peter Taylor
May 23 at 15:03

@PeterTaylor dist I think is referring to graph.d, the metric function. And as you rightly pointed out, the [0] is excluded for uniqueness of representation (otherwise tours can have multiple representations: each tour is assumed to start from zero)
â€“Â Luke Collins
May 23 at 15:14

@LukeCollins you are right, I was referring to graph.d. Also, since computing Euclidean distance is pretty cheap, memoizing the distances does save much. It ended up being only a tiny bit faster than recomputing the distance each time.
â€“Â Yakym Pirozhenko
May 23 at 15:53

@PeterTaylor absolutely right, I missed the point about 0 being present in every tour.
â€“Â Yakym Pirozhenko
May 23 at 15:57

1

Given that the point is to save list creation, IMO a better fix is the one I've given in my answer; although reducing from two list creations to one is an improvement. @LukeCollins, for true uniqueness of representation you should enforce t.vertices[0] <= t.vertices[-1] for a tour, since following the tour backwards doesn't fundamentally change it.
â€“Â Peter Taylor
May 23 at 16:01

add a commentÂ |Â

Your Answer

StackExchange.ifUsing("editor", function ()
return StackExchange.using("mathjaxEditing", function ()
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
);
);
, "mathjax-editing");

StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "196"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
convertImagesToLinks: false,
noModals: false,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f194947%2fgenetic-algorithm-for-traveling-salesman%23new-answer', 'question_page');

);

Post as a guest

Name

2 Answers
2

active

oldest

votes

2 Answers
2

active

oldest

votes

up vote
3
down vote

 def x(self, v):
 return self.vertices[v][0]

 def y(self, v):
 return self.vertices[v][1]

I think these two methods are dead code and could be deleted.

 # Lookup table for distances
 _d_lookup = 

 def d(self, u, v):
 """Euclidean Metric d_2((x1, y1), (x2, y2))"""

 # Check if the distance was computed before
 if (u, v) in self._d_lookup:
 return self._d_lookup[(u, v)]

 # Otherwise compute it
 _distance = math.sqrt((u[0] - v[0])**2 + (u[1] - v[1])**2)

 # Add to dictionary
 self._d_lookup[(u, v)], self._d_lookup[(v, u)] = _distance, _distance
 return _distance

Would be good to have better documentation: what's the relationship between u and (x1, y1)?

 if vertices is None:
 self.vertices = list(range(1, g.n))
 random.shuffle(self.vertices)
 else:
 self.vertices = vertices

I had to reverse engineer from the if case what the meaning of vertices is. A comment explaining it would have been helpful.

 def cost(self):
 """Return total edge-cost of tour"""

 if self.__cost is None:
 self.__cost = 0
 for i, j in zip([0] + self.vertices, self.vertices + [0]):
 self.__cost += self.g.d(self.g.vertices[i], self.g.vertices[j])

 self.__cost = self.g.d(0, self.vertices[0]) + 
 sum(map(self.g.d, self.vertices, self.vertices[1:])) + 
 self.g.d(self.vertices[-1], 0)

Separate point on cost: since you're always going to call cost() (if nothing else then in the heapq selection), is there any benefit to making it lazy?

 def crossover(self, mum, dad):
 """Implements ordered crossover"""

 size = len(mum.vertices)

Why not this?

 size = g.n - 1

 # Choose random start/end position for crossover
 alice, bob = [-1] * size, [-1] * size
 start, end = sorted([random.randrange(size) for _ in range(2)])

 # Replicate mum's sequence for alice, dad's sequence for bob
 for i in range(start, end + 1):
 alice[i] = mum.vertices[i]
 bob[i] = dad.vertices[i]

 # Fill the remaining position with the other parents' entries
 current_dad_position, current_mum_position = 0, 0

 for i in chain(range(start), range(end + 1, size)):

 while dad.vertices[current_dad_position] in alice:
 current_dad_position += 1

 while mum.vertices[current_mum_position] in bob:
 current_mum_position += 1

 alice[i] = dad.vertices[current_dad_position]
 bob[i] = mum.vertices[current_mum_position]

Here we have a familiar culprit when code runs too slow: in list. Testing whether a list contains a value takes linear time: if you want a fast in test, use a set:

 skipalice, skipbob = set(alice), set(bob)
 for i in chain(range(start), range(end + 1, size)):

 while dad.vertices[current_dad_position] in skipalice:
 current_dad_position += 1

 while mum.vertices[current_mum_position] in skipbob:
 current_mum_position += 1

 alice[i] = dad.vertices[current_dad_position]
 bob[i] = mum.vertices[current_mum_position]
 current_dad_position += 1
 current_mum_position += 1

But actually a faster approach seems to be to use a comprehension and advanced indexing:

 def crossover(self, mum, dad):
 """Implements ordered crossover"""

 size = g.n - 1

 # Choose random start/end position for crossover
 start, end = sorted([random.randrange(size) for _ in range(2)])

 # Identify the elements from mum's sequence which end up in alice,
 # and from dad's which end up in bob 
 mumxo = set(mum.vertices[start:end+1])
 dadxo = set(dad.vertices[start:end+1])

 # Take the other elements in their original order
 alice = [i for i in dad.vertices if not i in mumxo]
 bob = [i for i in mum.vertices if not i in dadxo]

 # Insert selected elements of mum's sequence for alice, dad's for bob
 alice[start:start] = mum.vertices[start:end+1]
 bob[start:start] = dad.vertices[start:end+1]

 # Return twins
 return Tour(self.g, alice), Tour(self.g, bob)

 def mutate(self, tour):

I get a moderate speedup by pulling out n = len(tour.vertices), saving up to $n^2$ calls of len.

 def select_parent(self, k):
 """Implements k-tournament selection to choose parents"""
 tournament = random.sample(self.population, k)
 return max(tournament, key=lambda t: t.cost())

I haven't tried this, but I wonder whether it might be faster to sort self.population and then sample from range(self.population_size).

 # Retain fittest
 self.population = heapq.nsmallest(self.population_size, self.population, key=lambda t: t.cost())

The documentation for heapq.nsmallest says

[It performs] best for smaller values of n. For larger values, it is more efficient to use the sorted() function.

Here n is len(self.population)/2, and I see a small speedup from

 self.population = sorted(self.population, key=lambda t: t.cost())[0:self.population_size]

It might be worth going for a linear quickselect approach instead.

edited May 23 at 16:03

answered May 23 at 15:52

Peter Taylor

14k2454

NB I've been doing performance testing on the smaller test case elib.zib.de/pub/mp-testdata/tsp/tsplib/tsp/burma14.tsp because I was using an online code execution environment with a 60-second time limit; but on that test case the changes mentioned above got the runtime down from 40 seconds to 28 seconds.
â€“Â Peter Taylor
May 23 at 15:56

Thank you very much for all these! I'll be sure to read through them soon and give you feedback.
â€“Â Luke Collins
May 23 at 17:42

Thank you very much for all your optimisations, I have implemented all of them, together with some others (such as replacing the cost lookup table with an numpy matrix). Unfortunately however, I still haven't managed to decrease runtime to under a minute (you said it took you 28 seconds). Is there something I'm missing here? This is the updated code with all changes: ideone.com/oedbmG.
â€“Â Luke Collins
May 24 at 15:03

You're probably missing the part where that 28 seconds was for a 14-city tour, not a 52-city one.
â€“Â Peter Taylor
May 24 at 15:13

Oh right, sorry about that. Is it normal for such algorithms to take this long though? Like I said, my friend managed to compile in around 5 seconds in Java.
â€“Â Luke Collins
May 24 at 15:15

Â |Â
show 3 more comments

up vote
3
down vote

 def x(self, v):
 return self.vertices[v][0]

 def y(self, v):
 return self.vertices[v][1]

I think these two methods are dead code and could be deleted.

 # Lookup table for distances
 _d_lookup = 

 def d(self, u, v):
 """Euclidean Metric d_2((x1, y1), (x2, y2))"""

 # Check if the distance was computed before
 if (u, v) in self._d_lookup:
 return self._d_lookup[(u, v)]

 # Otherwise compute it
 _distance = math.sqrt((u[0] - v[0])**2 + (u[1] - v[1])**2)

 # Add to dictionary
 self._d_lookup[(u, v)], self._d_lookup[(v, u)] = _distance, _distance
 return _distance

Would be good to have better documentation: what's the relationship between u and (x1, y1)?

 if vertices is None:
 self.vertices = list(range(1, g.n))
 random.shuffle(self.vertices)
 else:
 self.vertices = vertices

I had to reverse engineer from the if case what the meaning of vertices is. A comment explaining it would have been helpful.

 def cost(self):
 """Return total edge-cost of tour"""

 if self.__cost is None:
 self.__cost = 0
 for i, j in zip([0] + self.vertices, self.vertices + [0]):
 self.__cost += self.g.d(self.g.vertices[i], self.g.vertices[j])

 self.__cost = self.g.d(0, self.vertices[0]) + 
 sum(map(self.g.d, self.vertices, self.vertices[1:])) + 
 self.g.d(self.vertices[-1], 0)

Separate point on cost: since you're always going to call cost() (if nothing else then in the heapq selection), is there any benefit to making it lazy?

 def crossover(self, mum, dad):
 """Implements ordered crossover"""

 size = len(mum.vertices)

Why not this?

 size = g.n - 1

 # Choose random start/end position for crossover
 alice, bob = [-1] * size, [-1] * size
 start, end = sorted([random.randrange(size) for _ in range(2)])

 # Replicate mum's sequence for alice, dad's sequence for bob
 for i in range(start, end + 1):
 alice[i] = mum.vertices[i]
 bob[i] = dad.vertices[i]

 # Fill the remaining position with the other parents' entries
 current_dad_position, current_mum_position = 0, 0

 for i in chain(range(start), range(end + 1, size)):

 while dad.vertices[current_dad_position] in alice:
 current_dad_position += 1

 while mum.vertices[current_mum_position] in bob:
 current_mum_position += 1

 alice[i] = dad.vertices[current_dad_position]
 bob[i] = mum.vertices[current_mum_position]

Here we have a familiar culprit when code runs too slow: in list. Testing whether a list contains a value takes linear time: if you want a fast in test, use a set:

 skipalice, skipbob = set(alice), set(bob)
 for i in chain(range(start), range(end + 1, size)):

 while dad.vertices[current_dad_position] in skipalice:
 current_dad_position += 1

 while mum.vertices[current_mum_position] in skipbob:
 current_mum_position += 1

 alice[i] = dad.vertices[current_dad_position]
 bob[i] = mum.vertices[current_mum_position]
 current_dad_position += 1
 current_mum_position += 1

But actually a faster approach seems to be to use a comprehension and advanced indexing:

 def crossover(self, mum, dad):
 """Implements ordered crossover"""

 size = g.n - 1

 # Choose random start/end position for crossover
 start, end = sorted([random.randrange(size) for _ in range(2)])

 # Identify the elements from mum's sequence which end up in alice,
 # and from dad's which end up in bob 
 mumxo = set(mum.vertices[start:end+1])
 dadxo = set(dad.vertices[start:end+1])

 # Take the other elements in their original order
 alice = [i for i in dad.vertices if not i in mumxo]
 bob = [i for i in mum.vertices if not i in dadxo]

 # Insert selected elements of mum's sequence for alice, dad's for bob
 alice[start:start] = mum.vertices[start:end+1]
 bob[start:start] = dad.vertices[start:end+1]

 # Return twins
 return Tour(self.g, alice), Tour(self.g, bob)

 def mutate(self, tour):

I get a moderate speedup by pulling out n = len(tour.vertices), saving up to $n^2$ calls of len.

 def select_parent(self, k):
 """Implements k-tournament selection to choose parents"""
 tournament = random.sample(self.population, k)
 return max(tournament, key=lambda t: t.cost())

I haven't tried this, but I wonder whether it might be faster to sort self.population and then sample from range(self.population_size).

 # Retain fittest
 self.population = heapq.nsmallest(self.population_size, self.population, key=lambda t: t.cost())

The documentation for heapq.nsmallest says

[It performs] best for smaller values of n. For larger values, it is more efficient to use the sorted() function.

Here n is len(self.population)/2, and I see a small speedup from

 self.population = sorted(self.population, key=lambda t: t.cost())[0:self.population_size]

It might be worth going for a linear quickselect approach instead.

edited May 23 at 16:03

answered May 23 at 15:52

Peter Taylor

14k2454

NB I've been doing performance testing on the smaller test case elib.zib.de/pub/mp-testdata/tsp/tsplib/tsp/burma14.tsp because I was using an online code execution environment with a 60-second time limit; but on that test case the changes mentioned above got the runtime down from 40 seconds to 28 seconds.
â€“Â Peter Taylor
May 23 at 15:56

Thank you very much for all these! I'll be sure to read through them soon and give you feedback.
â€“Â Luke Collins
May 23 at 17:42

Thank you very much for all your optimisations, I have implemented all of them, together with some others (such as replacing the cost lookup table with an numpy matrix). Unfortunately however, I still haven't managed to decrease runtime to under a minute (you said it took you 28 seconds). Is there something I'm missing here? This is the updated code with all changes: ideone.com/oedbmG.
â€“Â Luke Collins
May 24 at 15:03

You're probably missing the part where that 28 seconds was for a 14-city tour, not a 52-city one.
â€“Â Peter Taylor
May 24 at 15:13

Oh right, sorry about that. Is it normal for such algorithms to take this long though? Like I said, my friend managed to compile in around 5 seconds in Java.
â€“Â Luke Collins
May 24 at 15:15

Â |Â
show 3 more comments

up vote
3
down vote

 def x(self, v):
 return self.vertices[v][0]

 def y(self, v):
 return self.vertices[v][1]

I think these two methods are dead code and could be deleted.

 # Lookup table for distances
 _d_lookup = 

 def d(self, u, v):
 """Euclidean Metric d_2((x1, y1), (x2, y2))"""

 # Check if the distance was computed before
 if (u, v) in self._d_lookup:
 return self._d_lookup[(u, v)]

 # Otherwise compute it
 _distance = math.sqrt((u[0] - v[0])**2 + (u[1] - v[1])**2)

 # Add to dictionary
 self._d_lookup[(u, v)], self._d_lookup[(v, u)] = _distance, _distance
 return _distance

Would be good to have better documentation: what's the relationship between u and (x1, y1)?

 if vertices is None:
 self.vertices = list(range(1, g.n))
 random.shuffle(self.vertices)
 else:
 self.vertices = vertices

I had to reverse engineer from the if case what the meaning of vertices is. A comment explaining it would have been helpful.

 def cost(self):
 """Return total edge-cost of tour"""

 if self.__cost is None:
 self.__cost = 0
 for i, j in zip([0] + self.vertices, self.vertices + [0]):
 self.__cost += self.g.d(self.g.vertices[i], self.g.vertices[j])

 self.__cost = self.g.d(0, self.vertices[0]) + 
 sum(map(self.g.d, self.vertices, self.vertices[1:])) + 
 self.g.d(self.vertices[-1], 0)

Separate point on cost: since you're always going to call cost() (if nothing else then in the heapq selection), is there any benefit to making it lazy?

 def crossover(self, mum, dad):
 """Implements ordered crossover"""

 size = len(mum.vertices)

Why not this?

 size = g.n - 1

 # Choose random start/end position for crossover
 alice, bob = [-1] * size, [-1] * size
 start, end = sorted([random.randrange(size) for _ in range(2)])

 # Replicate mum's sequence for alice, dad's sequence for bob
 for i in range(start, end + 1):
 alice[i] = mum.vertices[i]
 bob[i] = dad.vertices[i]

 # Fill the remaining position with the other parents' entries
 current_dad_position, current_mum_position = 0, 0

 for i in chain(range(start), range(end + 1, size)):

 while dad.vertices[current_dad_position] in alice:
 current_dad_position += 1

 while mum.vertices[current_mum_position] in bob:
 current_mum_position += 1

 alice[i] = dad.vertices[current_dad_position]
 bob[i] = mum.vertices[current_mum_position]

Here we have a familiar culprit when code runs too slow: in list. Testing whether a list contains a value takes linear time: if you want a fast in test, use a set:

 skipalice, skipbob = set(alice), set(bob)
 for i in chain(range(start), range(end + 1, size)):

 while dad.vertices[current_dad_position] in skipalice:
 current_dad_position += 1

 while mum.vertices[current_mum_position] in skipbob:
 current_mum_position += 1

 alice[i] = dad.vertices[current_dad_position]
 bob[i] = mum.vertices[current_mum_position]
 current_dad_position += 1
 current_mum_position += 1

But actually a faster approach seems to be to use a comprehension and advanced indexing:

 def crossover(self, mum, dad):
 """Implements ordered crossover"""

 size = g.n - 1

 # Choose random start/end position for crossover
 start, end = sorted([random.randrange(size) for _ in range(2)])

 # Identify the elements from mum's sequence which end up in alice,
 # and from dad's which end up in bob 
 mumxo = set(mum.vertices[start:end+1])
 dadxo = set(dad.vertices[start:end+1])

 # Take the other elements in their original order
 alice = [i for i in dad.vertices if not i in mumxo]
 bob = [i for i in mum.vertices if not i in dadxo]

 # Insert selected elements of mum's sequence for alice, dad's for bob
 alice[start:start] = mum.vertices[start:end+1]
 bob[start:start] = dad.vertices[start:end+1]

 # Return twins
 return Tour(self.g, alice), Tour(self.g, bob)

 def mutate(self, tour):

I get a moderate speedup by pulling out n = len(tour.vertices), saving up to $n^2$ calls of len.

 def select_parent(self, k):
 """Implements k-tournament selection to choose parents"""
 tournament = random.sample(self.population, k)
 return max(tournament, key=lambda t: t.cost())

I haven't tried this, but I wonder whether it might be faster to sort self.population and then sample from range(self.population_size).

 # Retain fittest
 self.population = heapq.nsmallest(self.population_size, self.population, key=lambda t: t.cost())

The documentation for heapq.nsmallest says

[It performs] best for smaller values of n. For larger values, it is more efficient to use the sorted() function.

Here n is len(self.population)/2, and I see a small speedup from

 self.population = sorted(self.population, key=lambda t: t.cost())[0:self.population_size]

It might be worth going for a linear quickselect approach instead.

edited May 23 at 16:03

answered May 23 at 15:52

Peter Taylor

14k2454

 def x(self, v):
 return self.vertices[v][0]

 def y(self, v):
 return self.vertices[v][1]

I think these two methods are dead code and could be deleted.

 # Lookup table for distances
 _d_lookup = 

 def d(self, u, v):
 """Euclidean Metric d_2((x1, y1), (x2, y2))"""

 # Check if the distance was computed before
 if (u, v) in self._d_lookup:
 return self._d_lookup[(u, v)]

 # Otherwise compute it
 _distance = math.sqrt((u[0] - v[0])**2 + (u[1] - v[1])**2)

 # Add to dictionary
 self._d_lookup[(u, v)], self._d_lookup[(v, u)] = _distance, _distance
 return _distance

Would be good to have better documentation: what's the relationship between u and (x1, y1)?

 if vertices is None:
 self.vertices = list(range(1, g.n))
 random.shuffle(self.vertices)
 else:
 self.vertices = vertices

I had to reverse engineer from the if case what the meaning of vertices is. A comment explaining it would have been helpful.

 def cost(self):
 """Return total edge-cost of tour"""

 if self.__cost is None:
 self.__cost = 0
 for i, j in zip([0] + self.vertices, self.vertices + [0]):
 self.__cost += self.g.d(self.g.vertices[i], self.g.vertices[j])

 self.__cost = self.g.d(0, self.vertices[0]) + 
 sum(map(self.g.d, self.vertices, self.vertices[1:])) + 
 self.g.d(self.vertices[-1], 0)

Separate point on cost: since you're always going to call cost() (if nothing else then in the heapq selection), is there any benefit to making it lazy?

 def crossover(self, mum, dad):
 """Implements ordered crossover"""

 size = len(mum.vertices)

Why not this?

 size = g.n - 1

 # Choose random start/end position for crossover
 alice, bob = [-1] * size, [-1] * size
 start, end = sorted([random.randrange(size) for _ in range(2)])

 # Replicate mum's sequence for alice, dad's sequence for bob
 for i in range(start, end + 1):
 alice[i] = mum.vertices[i]
 bob[i] = dad.vertices[i]

 # Fill the remaining position with the other parents' entries
 current_dad_position, current_mum_position = 0, 0

 for i in chain(range(start), range(end + 1, size)):

 while dad.vertices[current_dad_position] in alice:
 current_dad_position += 1

 while mum.vertices[current_mum_position] in bob:
 current_mum_position += 1

 alice[i] = dad.vertices[current_dad_position]
 bob[i] = mum.vertices[current_mum_position]

Here we have a familiar culprit when code runs too slow: in list. Testing whether a list contains a value takes linear time: if you want a fast in test, use a set:

 skipalice, skipbob = set(alice), set(bob)
 for i in chain(range(start), range(end + 1, size)):

 while dad.vertices[current_dad_position] in skipalice:
 current_dad_position += 1

 while mum.vertices[current_mum_position] in skipbob:
 current_mum_position += 1

 alice[i] = dad.vertices[current_dad_position]
 bob[i] = mum.vertices[current_mum_position]
 current_dad_position += 1
 current_mum_position += 1

But actually a faster approach seems to be to use a comprehension and advanced indexing:

 def crossover(self, mum, dad):
 """Implements ordered crossover"""

 size = g.n - 1

 # Choose random start/end position for crossover
 start, end = sorted([random.randrange(size) for _ in range(2)])

 # Identify the elements from mum's sequence which end up in alice,
 # and from dad's which end up in bob 
 mumxo = set(mum.vertices[start:end+1])
 dadxo = set(dad.vertices[start:end+1])

 # Take the other elements in their original order
 alice = [i for i in dad.vertices if not i in mumxo]
 bob = [i for i in mum.vertices if not i in dadxo]

 # Insert selected elements of mum's sequence for alice, dad's for bob
 alice[start:start] = mum.vertices[start:end+1]
 bob[start:start] = dad.vertices[start:end+1]

 # Return twins
 return Tour(self.g, alice), Tour(self.g, bob)

 def mutate(self, tour):

I get a moderate speedup by pulling out n = len(tour.vertices), saving up to $n^2$ calls of len.

 def select_parent(self, k):
 """Implements k-tournament selection to choose parents"""
 tournament = random.sample(self.population, k)
 return max(tournament, key=lambda t: t.cost())

I haven't tried this, but I wonder whether it might be faster to sort self.population and then sample from range(self.population_size).

 # Retain fittest
 self.population = heapq.nsmallest(self.population_size, self.population, key=lambda t: t.cost())

The documentation for heapq.nsmallest says

[It performs] best for smaller values of n. For larger values, it is more efficient to use the sorted() function.

Here n is len(self.population)/2, and I see a small speedup from

 self.population = sorted(self.population, key=lambda t: t.cost())[0:self.population_size]

It might be worth going for a linear quickselect approach instead.

edited May 23 at 16:03

answered May 23 at 15:52

Peter Taylor

14k2454

edited May 23 at 16:03

answered May 23 at 15:52

Peter Taylor

14k2454

answered May 23 at 15:52

Peter Taylor

14k2454

answered May 23 at 15:52

Peter Taylor

14k2454

NB I've been doing performance testing on the smaller test case elib.zib.de/pub/mp-testdata/tsp/tsplib/tsp/burma14.tsp because I was using an online code execution environment with a 60-second time limit; but on that test case the changes mentioned above got the runtime down from 40 seconds to 28 seconds.
â€“Â Peter Taylor
May 23 at 15:56

Thank you very much for all these! I'll be sure to read through them soon and give you feedback.
â€“Â Luke Collins
May 23 at 17:42

Thank you very much for all your optimisations, I have implemented all of them, together with some others (such as replacing the cost lookup table with an numpy matrix). Unfortunately however, I still haven't managed to decrease runtime to under a minute (you said it took you 28 seconds). Is there something I'm missing here? This is the updated code with all changes: ideone.com/oedbmG.
â€“Â Luke Collins
May 24 at 15:03

You're probably missing the part where that 28 seconds was for a 14-city tour, not a 52-city one.
â€“Â Peter Taylor
May 24 at 15:13

Oh right, sorry about that. Is it normal for such algorithms to take this long though? Like I said, my friend managed to compile in around 5 seconds in Java.
â€“Â Luke Collins
May 24 at 15:15

Â |Â
show 3 more comments

NB I've been doing performance testing on the smaller test case elib.zib.de/pub/mp-testdata/tsp/tsplib/tsp/burma14.tsp because I was using an online code execution environment with a 60-second time limit; but on that test case the changes mentioned above got the runtime down from 40 seconds to 28 seconds.
â€“Â Peter Taylor
May 23 at 15:56

Thank you very much for all these! I'll be sure to read through them soon and give you feedback.
â€“Â Luke Collins
May 23 at 17:42

Thank you very much for all your optimisations, I have implemented all of them, together with some others (such as replacing the cost lookup table with an numpy matrix). Unfortunately however, I still haven't managed to decrease runtime to under a minute (you said it took you 28 seconds). Is there something I'm missing here? This is the updated code with all changes: ideone.com/oedbmG.
â€“Â Luke Collins
May 24 at 15:03

You're probably missing the part where that 28 seconds was for a 14-city tour, not a 52-city one.
â€“Â Peter Taylor
May 24 at 15:13

Oh right, sorry about that. Is it normal for such algorithms to take this long though? Like I said, my friend managed to compile in around 5 seconds in Java.
â€“Â Luke Collins
May 24 at 15:15

NB I've been doing performance testing on the smaller test case elib.zib.de/pub/mp-testdata/tsp/tsplib/tsp/burma14.tsp because I was using an online code execution environment with a 60-second time limit; but on that test case the changes mentioned above got the runtime down from 40 seconds to 28 seconds.
â€“Â Peter Taylor
May 23 at 15:56

Thank you very much for all these! I'll be sure to read through them soon and give you feedback.
â€“Â Luke Collins
May 23 at 17:42

Thank you very much for all your optimisations, I have implemented all of them, together with some others (such as replacing the cost lookup table with an numpy matrix). Unfortunately however, I still haven't managed to decrease runtime to under a minute (you said it took you 28 seconds). Is there something I'm missing here? This is the updated code with all changes: ideone.com/oedbmG.
â€“Â Luke Collins
May 24 at 15:03

You're probably missing the part where that 28 seconds was for a 14-city tour, not a 52-city one.
â€“Â Peter Taylor
May 24 at 15:13

Oh right, sorry about that. Is it normal for such algorithms to take this long though? Like I said, my friend managed to compile in around 5 seconds in Java.
â€“Â Luke Collins
May 24 at 15:15

Â |Â
show 3 more comments

up vote
1
down vote

An extended comment rather than an answer. You can optimize the cost function as follows:

def cost(self):
 """Return total edge-cost of tour"""
 if self.__cost is None:
 pts = [self.g.vertices[i] for i in self.vertices]
 pts.append(self.g.vertices[0])
 self.__cost = sum(map(g.d, pts, pts[1:])) + g.d(pts[0], pts[-1])
 return self.__cost

edited May 23 at 15:55

answered May 23 at 14:09

Yakym Pirozhenko

1113

1

Also, this introduces a bug. The original code uses [0] + self.vertices and self.vertices + [0] because 0 is not in self.vertices, but does need to be taken into account in the tour.
â€“Â Peter Taylor
May 23 at 15:03

@PeterTaylor dist I think is referring to graph.d, the metric function. And as you rightly pointed out, the [0] is excluded for uniqueness of representation (otherwise tours can have multiple representations: each tour is assumed to start from zero)
â€“Â Luke Collins
May 23 at 15:14

@LukeCollins you are right, I was referring to graph.d. Also, since computing Euclidean distance is pretty cheap, memoizing the distances does save much. It ended up being only a tiny bit faster than recomputing the distance each time.
â€“Â Yakym Pirozhenko
May 23 at 15:53

@PeterTaylor absolutely right, I missed the point about 0 being present in every tour.
â€“Â Yakym Pirozhenko
May 23 at 15:57

1

Given that the point is to save list creation, IMO a better fix is the one I've given in my answer; although reducing from two list creations to one is an improvement. @LukeCollins, for true uniqueness of representation you should enforce t.vertices[0] <= t.vertices[-1] for a tour, since following the tour backwards doesn't fundamentally change it.
â€“Â Peter Taylor
May 23 at 16:01

add a commentÂ |Â

up vote
1
down vote

An extended comment rather than an answer. You can optimize the cost function as follows:

def cost(self):
 """Return total edge-cost of tour"""
 if self.__cost is None:
 pts = [self.g.vertices[i] for i in self.vertices]
 pts.append(self.g.vertices[0])
 self.__cost = sum(map(g.d, pts, pts[1:])) + g.d(pts[0], pts[-1])
 return self.__cost

edited May 23 at 15:55

answered May 23 at 14:09

Yakym Pirozhenko

1113

1

Also, this introduces a bug. The original code uses [0] + self.vertices and self.vertices + [0] because 0 is not in self.vertices, but does need to be taken into account in the tour.
â€“Â Peter Taylor
May 23 at 15:03

@PeterTaylor dist I think is referring to graph.d, the metric function. And as you rightly pointed out, the [0] is excluded for uniqueness of representation (otherwise tours can have multiple representations: each tour is assumed to start from zero)
â€“Â Luke Collins
May 23 at 15:14

@LukeCollins you are right, I was referring to graph.d. Also, since computing Euclidean distance is pretty cheap, memoizing the distances does save much. It ended up being only a tiny bit faster than recomputing the distance each time.
â€“Â Yakym Pirozhenko
May 23 at 15:53

@PeterTaylor absolutely right, I missed the point about 0 being present in every tour.
â€“Â Yakym Pirozhenko
May 23 at 15:57

1

Given that the point is to save list creation, IMO a better fix is the one I've given in my answer; although reducing from two list creations to one is an improvement. @LukeCollins, for true uniqueness of representation you should enforce t.vertices[0] <= t.vertices[-1] for a tour, since following the tour backwards doesn't fundamentally change it.
â€“Â Peter Taylor
May 23 at 16:01

add a commentÂ |Â

up vote
1
down vote

An extended comment rather than an answer. You can optimize the cost function as follows:

def cost(self):
 """Return total edge-cost of tour"""
 if self.__cost is None:
 pts = [self.g.vertices[i] for i in self.vertices]
 pts.append(self.g.vertices[0])
 self.__cost = sum(map(g.d, pts, pts[1:])) + g.d(pts[0], pts[-1])
 return self.__cost

edited May 23 at 15:55

answered May 23 at 14:09

Yakym Pirozhenko

1113

An extended comment rather than an answer. You can optimize the cost function as follows:

def cost(self):
 """Return total edge-cost of tour"""
 if self.__cost is None:
 pts = [self.g.vertices[i] for i in self.vertices]
 pts.append(self.g.vertices[0])
 self.__cost = sum(map(g.d, pts, pts[1:])) + g.d(pts[0], pts[-1])
 return self.__cost

edited May 23 at 15:55

answered May 23 at 14:09

Yakym Pirozhenko

1113

edited May 23 at 15:55

answered May 23 at 14:09

Yakym Pirozhenko

1113

answered May 23 at 14:09

Yakym Pirozhenko

1113

answered May 23 at 14:09

Yakym Pirozhenko

1113

1

Also, this introduces a bug. The original code uses [0] + self.vertices and self.vertices + [0] because 0 is not in self.vertices, but does need to be taken into account in the tour.
â€“Â Peter Taylor
May 23 at 15:03

@PeterTaylor dist I think is referring to graph.d, the metric function. And as you rightly pointed out, the [0] is excluded for uniqueness of representation (otherwise tours can have multiple representations: each tour is assumed to start from zero)
â€“Â Luke Collins
May 23 at 15:14

@LukeCollins you are right, I was referring to graph.d. Also, since computing Euclidean distance is pretty cheap, memoizing the distances does save much. It ended up being only a tiny bit faster than recomputing the distance each time.
â€“Â Yakym Pirozhenko
May 23 at 15:53

@PeterTaylor absolutely right, I missed the point about 0 being present in every tour.
â€“Â Yakym Pirozhenko
May 23 at 15:57

1

Given that the point is to save list creation, IMO a better fix is the one I've given in my answer; although reducing from two list creations to one is an improvement. @LukeCollins, for true uniqueness of representation you should enforce t.vertices[0] <= t.vertices[-1] for a tour, since following the tour backwards doesn't fundamentally change it.
â€“Â Peter Taylor
May 23 at 16:01

add a commentÂ |Â

1

Also, this introduces a bug. The original code uses [0] + self.vertices and self.vertices + [0] because 0 is not in self.vertices, but does need to be taken into account in the tour.
â€“Â Peter Taylor
May 23 at 15:03

@PeterTaylor dist I think is referring to graph.d, the metric function. And as you rightly pointed out, the [0] is excluded for uniqueness of representation (otherwise tours can have multiple representations: each tour is assumed to start from zero)
â€“Â Luke Collins
May 23 at 15:14

@LukeCollins you are right, I was referring to graph.d. Also, since computing Euclidean distance is pretty cheap, memoizing the distances does save much. It ended up being only a tiny bit faster than recomputing the distance each time.
â€“Â Yakym Pirozhenko
May 23 at 15:53

@PeterTaylor absolutely right, I missed the point about 0 being present in every tour.
â€“Â Yakym Pirozhenko
May 23 at 15:57

1

Given that the point is to save list creation, IMO a better fix is the one I've given in my answer; although reducing from two list creations to one is an improvement. @LukeCollins, for true uniqueness of representation you should enforce t.vertices[0] <= t.vertices[-1] for a tour, since following the tour backwards doesn't fundamentally change it.
â€“Â Peter Taylor
May 23 at 16:01

Also, this introduces a bug. The original code uses [0] + self.vertices and self.vertices + [0] because 0 is not in self.vertices, but does need to be taken into account in the tour.
â€“Â Peter Taylor
May 23 at 15:03

@PeterTaylor dist I think is referring to graph.d, the metric function. And as you rightly pointed out, the [0] is excluded for uniqueness of representation (otherwise tours can have multiple representations: each tour is assumed to start from zero)
â€“Â Luke Collins
May 23 at 15:14

@LukeCollins you are right, I was referring to graph.d. Also, since computing Euclidean distance is pretty cheap, memoizing the distances does save much. It ended up being only a tiny bit faster than recomputing the distance each time.
â€“Â Yakym Pirozhenko
May 23 at 15:53

@PeterTaylor absolutely right, I missed the point about 0 being present in every tour.
â€“Â Yakym Pirozhenko
May 23 at 15:57

Given that the point is to save list creation, IMO a better fix is the one I've given in my answer; although reducing from two list creations to one is an improvement. @LukeCollins, for true uniqueness of representation you should enforce t.vertices[0] <= t.vertices[-1] for a tour, since following the tour backwards doesn't fundamentally change it.
â€“Â Peter Taylor
May 23 at 16:01

add a commentÂ |Â

draft saved

draft discarded

draft saved

draft discarded

Post as a guest

Name

搜尋此網誌

trjhtr