Python: Speeding up vector and matrix operations

The core of my program is the following lines:



import numpy as np
import time

I, J = np.random.randn(20000,800), np.random.randn(10000,800)

d = np.zeros((len(I),1))
for j in range(len(J)):
    t0 = time.time()
    # Compute distance
    matrix_j = np.array([J[j]]).T*np.array([J[j]])
    for i in range(len(I)):
        matrix_i = np.array([I[i]]).T*np.array([I[i]])
        d[i] = np.sum((matrix_i-matrix_j)**2)
    print(time.time()-t0)
    # Do stuff with vector d


I and J are some matrices with data. I basically compare vectors of the two matrices: I first extract a vector and then multiply it with its transposed version to get a matrix, matrix_i and matrix_j respectively. Afterwards I subtract both matrices from each other, square the result element-wise, and sum all elements. But it takes ages even for such a small data set.
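For reference, a minimal sketch that times one pass of the innermost computation in isolation (sizes reduced here purely for a quick check; the one_pair helper is only illustrative, not part of the posted program):

import timeit
import numpy as np

I = np.random.randn(200, 800)   # reduced from 20000 rows just for a quick check
J = np.random.randn(100, 800)   # reduced from 10000 rows

def one_pair(i, j):
    # The per-pair work from the loops above
    matrix_i = np.array([I[i]]).T*np.array([I[i]])
    matrix_j = np.array([J[j]]).T*np.array([J[j]])
    return np.sum((matrix_i - matrix_j)**2)

print(timeit.timeit(lambda: one_pair(0, 0), number=100))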



What can I do to speed things up a little bit? Any ideas?







– Samuel, asked May 1 at 8:17; edited May 1 at 10:56 by Latika Agarwal





















  • My advice to start with is to work out what d[i] is in terms of I and J mathematically; you might spot some better matrix operations. I'm not sure what the matrix squaring is supposed to be doing?
    – gbartonowen
    May 1 at 8:55










  • @gbartonowen I am squaring the matrices such that the elements do not cancel each other out.
    – Samuel
    May 1 at 9:01






  • The code in the post looks wrong to me, because d has shape (len(J), 1) but the assignments are to indices d[0], ..., d[len(I)-1] and for each iteration over j these overwrite all the assignments from the previous iteration. This makes no sense. Are you sure that this corresponds to your real code? Here at Code Review, we prefer to see your real code (not some hypothetical version of it) precisely because of issues like this.
    – Gareth Rees
    May 1 at 9:12











  • @GarethRees You are right! I corrected the error. This is exactly the core of an algorithm I use. Only the data sets are random.
    – Samuel
    May 1 at 9:15











  • Thank you. But this does not address the problem that the assignments to d get overwritten on each iteration. Please fix.
    – Gareth Rees
    May 1 at 9:16

















2 Answers



























What you compute with the presented code is



matrix_j = array([ J[j,k]**2 for k in range(N) ]) 


(where N=800 is the vector length) so that



d[i] = sum( (J[j,k]**2 - I[i,k]**2)**2 for k in range(N) )


This does not make much sense geometrically.




What you could have intended, judging by the formulas and variable names, is to have J[j] and I[i] as actual row vectors, which can be achieved by converting the tables I, J into matrices,



I,J = np.matrix(I), np.matrix(J)


so that d[i] is then the square of the Frobenius norm of J[j].T*J[j]-I[i].T*I[i], which is also the sum of the squares of all matrix elements. As this matrix is symmetric, this quantity can be computed as



d[i] = np.sum(np.diag( (J[j].T*J[j]-I[i].T*I[i])**2 ))


which has the form for row vectors $a,b$ of equal dimension
\begin{align}\|a^Ta-b^Tb\|_F^2 &= \text{trace}((a^Ta-b^Tb)^2)\\ &=\text{trace}((a^Ta-b^Tb)(a^Ta-b^Tb))\\ &=\text{trace}(a^Taa^Ta-a^Tab^Tb-b^Tba^Ta+b^Tbb^Tb)\\ &=(aa^T)^2-2(ab^T)^2+(bb^T)^2\\ &=\|a\|^4-2\langle a,b\rangle^2+\|b\|^4 \end{align}



Re-expressed in terms of the rows of the original matrices (which should stay 2D arrays, not be converted to matrix objects), this is



d[i] = np.dot(I[i],I[i])**2 - 2*np.dot(I[i],J[j])**2 + np.dot(J[j],J[j])**2


For a component-wise derivation see the answer by Gareth Rees.




These norm squares and scalar products should be faster to compute than the original formula. Also, pre-computing the norm squares np.dot(I[i],I[i]) allows re-use so that all len(I)+len(J) norm squares are only computed once and the main computational cost is from the len(I)*len(J) scalar products.
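As an illustration of that pre-computation, all row-norm squares can be formed in one vectorised call before the loops (a sketch; np.einsum is just one option, np.sum(I*I, axis=1) gives the same result):

import numpy as np

I = np.random.randn(20000, 800)
J = np.random.randn(10000, 800)

# Squared Euclidean norm of every row, computed once and reused for all pairs
normI2 = np.einsum('ij,ij->i', I, I)   # shape (20000,)
normJ2 = np.einsum('ij,ij->i', J, J)   # shape (10000,)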



scipy.linalg.norm might be faster in computing the norm, as squaring can be faster than multiplying the same number by itself, but I'm not sure how that translates over the several layers of interpretation and data encapsulation.
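Whether that pays off is easy to measure directly; a minimal sketch comparing the two variants on a single 800-element vector (the actual timings depend on the machine and the NumPy/SciPy build):

import timeit
import numpy as np
from scipy.linalg import norm

v = np.random.randn(800)

# Both expressions evaluate the squared Euclidean norm of v
t_dot  = timeit.timeit(lambda: np.dot(v, v), number=100000)
t_norm = timeit.timeit(lambda: norm(v)**2,   number=100000)
print(t_dot, t_norm)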




The scalar products are elements of the matrix product of I and J.T, so that a compact computation proceeds as follows (all objects are np.array):



dotIJ = np.dot(I, J.T)
normI2 = [np.dot(row, row) for row in I]
normJ2 = [np.dot(row, row) for row in J]
d = np.zeros((len(I), 1))
for j in range(len(J)):
    d[:, 0] = [normI2[i]**2 + normJ2[j]**2 - 2*dotIJ[i, j]**2 for i in range(len(I))]
    # process values of d
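If memory permits, the remaining loop over j can also be removed entirely with broadcasting; a sketch (note that dotIJ and the full result D are each 20000×10000 float64 arrays, roughly 1.6 GB apiece, so the per-column loop above may be the more practical variant):

import numpy as np

I = np.random.randn(20000, 800)
J = np.random.randn(10000, 800)

normI2 = np.einsum('ij,ij->i', I, I)   # ||I[i]||**2 for every row of I
normJ2 = np.einsum('ij,ij->i', J, J)   # ||J[j]||**2 for every row of J
dotIJ = np.dot(I, J.T)                 # all scalar products <I[i], J[j]> at once

# D[i, j] = ||I[i]||**4 + ||J[j]||**4 - 2*<I[i], J[j]>**2
D = normI2[:, None]**2 + normJ2[None, :]**2 - 2*dotIJ**2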





– LutzL, answered May 1 at 9:23; edited May 1 at 11:56 (accepted answer)























  • These can be further optimised by using np.sum with an axis argument, I suspect, as an extra win!
    – gbartonowen
    May 1 at 9:35










  • I'm not sure if that is faster than np.dot, especially as there are still element-wise operations to perform first. If that results in an extra loop and data object creation, it should be slower.
    – LutzL
    May 1 at 9:53










  • @LutzL I am not sure if I get your answer right, but when I compute d[i]=np.dot(J[j],J[j])**2-2*np.dot(J[j],I[i])**2+np.dot(I[i],I[i])**2 it is much faster but I get different results compared to my approach. Do you see why?
    – Samuel
    May 1 at 10:06







  • I've added a first paragraph on why I think your computation might be wrong relative to what you expect to compute. Please check for yourself, especially whether the format of matrix_i, matrix_j is as expected. As you named them "matrix", one might think that you expect the format of an 800x800 2D-array.
    – LutzL
    May 1 at 11:12






























Here's an explicit working-out of the maths needed for the answer by LutzL.



Let's start by writing $I$ and $J$ for the two vectors I[i] and J[j] respectively and $n$ for the length of these vectors (800 in the example in the post). Then the distance $d$ that we want to compute is $$\sum_k \sum_l \left(I_k I_l − J_k J_l\right)^2$$ where $k$ and $l$ range over the indices from $0$ to $n-1$. Expand the square: $$\sum_k \sum_l \left(I_k^2 I_l^2 + J_k^2 J_l^2 − 2 I_k I_l J_k J_l\right).$$ Extract the sub-expressions that don't depend on $l$: $$\sum_k \left(I_k^2 \sum_l I_l^2 + J_k^2 \sum_l J_l^2 − 2 I_k J_k \sum_l I_l J_l\right).$$ Multiply out: $$\left(\sum_k I_k^2\right) \left(\sum_l I_l^2\right) + \left(\sum_k J_k^2\right) \left(\sum_l J_l^2\right) − 2 \left(\sum_k I_k J_k\right) \left(\sum_l I_l J_l\right).$$ Now there are no nested summations, so rename $l$ to $k$ and then combine identical sums: $$\left(\sum_k I_k^2\right)^2 + \left(\sum_k J_k^2\right)^2 - 2 \left(\sum_k I_k J_k\right)^2.$$ Finally, rewrite the sums as dot products: $$\left(I·I\right)^2 + \left(J·J\right)^2 - 2\left(I·J\right)^2.$$
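The identity is easy to check numerically; a small sketch comparing the double sum with the dot-product form for one random pair of vectors (the names I_vec and J_vec are only used here to avoid clashing with the matrices from the question):

import numpy as np

rng = np.random.default_rng(0)
I_vec = rng.standard_normal(50)
J_vec = rng.standard_normal(50)

# Brute-force double sum over k and l, exactly as in the first formula above
brute = sum((I_vec[k]*I_vec[l] - J_vec[k]*J_vec[l])**2
            for k in range(50) for l in range(50))

# Closed form: (I.I)^2 + (J.J)^2 - 2*(I.J)^2
closed = (np.dot(I_vec, I_vec)**2 + np.dot(J_vec, J_vec)**2
          - 2*np.dot(I_vec, J_vec)**2)

print(np.isclose(brute, closed))   # expected: True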






– Gareth Rees, answered May 1 at 11:25




















