Python: Speeding up vector and matrix operations
The core of my program is the following lines:
import numpy as np
import time

I, J = np.random.randn(20000,800), np.random.randn(10000,800)
d = np.zeros((len(I),1))
for j in range(len(J)):
    t0 = time.time()
    # Compute distance
    matrix_j = np.array([J[j]]).T*np.array([J[j]])
    for i in range(len(I)):
        matrix_i = np.array([I[i]]).T*np.array([I[i]])
        d[i] = np.sum((matrix_i-matrix_j)**2)
    print(time.time()-t0)
    # Do stuff with vector d
I and J are some matrices with data. I basically compare vectors of the two matrices: I first extract a vector and then multiply it with its transposed version to get a matrix, matrix_i and matrix_j respectively. Afterwards I subtract both matrices from each other and sum the squares of all elements. But it takes ages even for such a small data set.
What can I do to speed things up a little bit? Any ideas?
python performance numpy
My advice to start with is to work out what d[i] is in terms of I and J, mathematically; you might spot some better matrix operations. I'm not sure what the matrix squaring is supposed to be doing?
– gbartonowen
May 1 at 8:55
@gbartonowen I am squaring the matrices so that the elements do not cancel each other out.
– Samuel
May 1 at 9:01
The code in the post looks wrong to me, because d has shape (len(J), 1) but the assignments are to indices d[0], ..., d[len(I)-1], and for each iteration over j these overwrite all the assignments from the previous iteration. This makes no sense. Are you sure that this corresponds to your real code? Here at Code Review, we prefer to see your real code (not some hypothetical version of it) precisely because of issues like this.
– Gareth Rees
May 1 at 9:12
@GarethRees You are right! I corrected the error. This is exactly the core of an algorithm I use. Only the data sets are random.
– Samuel
May 1 at 9:15
Thank you. But this does not address the problem that the assignments to d get overwritten on each iteration. Please fix.
– Gareth Rees
May 1 at 9:16
asked May 1 at 8:17 by Samuel; edited May 1 at 10:56 by Latika Agarwal
2 Answers
Accepted answer – LutzL (answered May 1 at 9:23, edited May 1 at 11:56)
What you compute with the presented code is
matrix_j = array([ J[j,k]**2 for k in range(N) ])
(where N=800 is the vector length), so that
d[i] = sum( (J[j,k]**2 - I[i,k]**2)**2 for k in range(N) )
This does not make much sense geometrically.
What you could have intended, judging by the formulas and variable names, is to have J[j] and I[i] as actual row vectors, which can be achieved by converting the tables I, J into matrices,
I, J = np.matrix(I), np.matrix(J)
so that d[i] is then the square of the Frobenius norm of J[j].T*J[j] - I[i].T*I[i], which is also the sum of the squares of all matrix elements. As this matrix is symmetric, this quantity can be computed as
d[i] = np.sum(np.diag( (J[j].T*J[j]-I[i].T*I[i])**2 ))
which has the form, for row vectors $a, b$ of equal dimension,
$$\begin{align}\|a^Ta-b^Tb\|_F^2 &= \text{trace}((a^Ta-b^Tb)^2)\\ &=\text{trace}((a^Ta-b^Tb)(a^Ta-b^Tb))\\ &=\text{trace}(a^Taa^Ta-a^Tab^Tb-b^Tba^Ta+b^Tbb^Tb)\\ &=(aa^T)^2-2(ab^T)^2+(bb^T)^2\\ &=\|a\|^4-2\langle a,b\rangle^2+\|b\|^4\end{align}$$
Re-expressed in terms of the rows of the original matrices (which should remain 2D arrays, not converted to matrix objects), this is
d[i] = np.dot(I[i],I[i])**2 - 2*np.dot(I[i],J[j])**2 + np.dot(J[j],J[j])**2
For a component-wise derivation, see the answer by Gareth Rees.
These norm squares and scalar products should be faster to compute than the original formula. Also, pre-computing the norm squares np.dot(I[i],I[i])
allows re-use so that all len(I)+len(J)
norm squares are only computed once and the main computational cost is from the len(I)*len(J)
scalar products.
scipy.linalg.norm might be faster at computing the norm, as squaring can be faster than multiplying a number by itself, but I'm not sure how that translates across the several layers of interpretation and data encapsulation.
The scalar products are the elements of the matrix product of I and J.T, so a compact computation proceeds as follows (all objects are np.array):
dotIJ = np.dot(I, J.T)
normI2 = [ np.dot(row, row) for row in I ]
normJ2 = [ np.dot(row, row) for row in J ]
d = np.zeros((len(I), 1))
for j in range(len(J)):
    d[:,0] = [ normI2[i]**2 + normJ2[j]**2 - 2*dotIJ[i,j]**2 for i in range(len(I)) ]
    # process values of d
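The remaining loop over i can be removed as well by broadcasting the three terms over both indices at once, at the cost of holding the full len(I) × len(J) table in memory. The following is only a rough sketch along those lines (with hypothetical, much smaller array sizes so that it runs quickly), not a benchmarked replacement:

import numpy as np

# Hypothetical, smaller stand-ins for the question's I and J.
I = np.random.randn(2000, 80)
J = np.random.randn(1000, 80)

# normI2[i] = <I[i],I[i]>, normJ2[j] = <J[j],J[j]>, dotIJ[i,j] = <I[i],J[j]>.
normI2 = np.einsum('ik,ik->i', I, I)
normJ2 = np.einsum('jk,jk->j', J, J)
dotIJ = np.dot(I, J.T)

# D[i,j] = normI2[i]**2 + normJ2[j]**2 - 2*dotIJ[i,j]**2 for all pairs at once.
D = normI2[:, None]**2 + normJ2[None, :]**2 - 2*dotIJ**2

for j in range(len(J)):
    d = D[:, j:j+1]   # same (len(I), 1) shape as d in the question
    # process values of d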
These can be further optimised by using np.sum with an axis argument, I suspect, as an extra win!
– gbartonowen
May 1 at 9:35
I'm not sure if that is faster than np.dot, especially as there are still element-wise operations to perform first. If that results in an extra loop and data object creation, it should be slower.
– LutzL
May 1 at 9:53
@LutzL I am not sure if I get your answer right, but when I compute d[i] = np.dot(J[j],J[j])**2 - 2*np.dot(J[j],I[i])**2 + np.dot(I[i],I[i])**2 it is much faster, but I get different results compared to my approach. Do you see why?
– Samuel
May 1 at 10:06
I've added a first paragraph on why I think your computation might be wrong relative to what you expect to compute. Please check yourself, especially whether the format of matrix_i, matrix_j is as expected. As you named them "matrix", one might think that you expect the format of an 800x800 2D array.
– LutzL
May 1 at 11:12
Answer – Gareth Rees (answered May 1 at 11:25)
Here's an explicit working-out of the maths needed for the answer by LutzL.
Let's start by writing $I$ and $J$ for the two vectors I[i] and J[j] respectively, and $n$ for the length of these vectors (800 in the example in the post). Then the distance $d$ that we want to compute is $$\sum_k \sum_l \left(I_k I_l - J_k J_l\right)^2$$ where $k$ and $l$ range over the indices from $0$ to $n-1$. Expand the square: $$\sum_k \sum_l \left(I_k^2 I_l^2 + J_k^2 J_l^2 - 2 I_k I_l J_k J_l\right).$$ Extract the sub-expressions that don't depend on $l$: $$\sum_k \left(I_k^2 \sum_l I_l^2 + J_k^2 \sum_l J_l^2 - 2 I_k J_k \sum_l I_l J_l\right).$$ Multiply out: $$\left(\sum_k I_k^2\right) \left(\sum_l I_l^2\right) + \left(\sum_k J_k^2\right) \left(\sum_l J_l^2\right) - 2 \left(\sum_k I_k J_k\right) \left(\sum_l I_l J_l\right).$$ Now there are no nested summations, so rename $l$ to $k$ and then combine identical sums: $$\left(\sum_k I_k^2\right)^2 + \left(\sum_k J_k^2\right)^2 - 2 \left(\sum_k I_k J_k\right)^2.$$ Finally, rewrite the sums as dot products: $$\left(I \cdot I\right)^2 + \left(J \cdot J\right)^2 - 2\left(I \cdot J\right)^2.$$
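Not part of the original derivation, but a quick numerical sanity check of the final identity, using small hypothetical vectors in place of I[i] and J[j], could look like this:

import numpy as np

n = 5
a = np.random.randn(n)   # stands in for the vector I[i]
b = np.random.randn(n)   # stands in for the vector J[j]

# Brute-force double sum from the start of the derivation.
brute = sum((a[k]*a[l] - b[k]*b[l])**2 for k in range(n) for l in range(n))

# Closed form from the end of the derivation.
closed = np.dot(a, a)**2 + np.dot(b, b)**2 - 2*np.dot(a, b)**2

print(np.isclose(brute, closed))   # expected: True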