Sub that compares 2 columns fast

The name of the pictureThe name of the pictureThe name of the pictureClash Royale CLAN TAG#URR8PPP





.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty margin-bottom:0;







up vote
4
down vote

favorite












This is an answer to this SO question: Vba comparing 2 columns of data.



In that question the OP code is very slow (also not working properly but the problem is fixed)



I spent some time to implement a version with decent performance at 1M rows+



Note: I know Activesheet is discouraged (and I avoid it) but the Sub is quite generic (and it can be made even more so); the little utility can be convenient to users for quick checks - simply compare values on current sheet




Description



  • Sets up 2 dictionaries (late-bound)

  • Sets up the main UsedRange (ur) variable

  • Deletes the results columns C and D


  • Setups 4 arrays




    • arrA for the first column to be compared (Col A)

    • arrB for the second column to be compared (Col B)


    • arrC for the 1st result column C (from Col A, which shows vals not found in col B)



    • arrD for the 2nd result column D (from Col B, showing vals not found in col A)

      • (Each column can be a constant, or Optional Param with default vals)




  • Load data to dictionaries:




    • For loop 1: adds all elements in Col A as keys, in dictionary dColA


    • For loop 2: adds all elements in Col B as keys, in dictionary dColB


  • Compare dictionary items


    • For loop 3: for each itm in dColA checks their existence in dColB

      • If an item is not found, it places it in the 3th array (arrC)



    • For loop 4: for each itm in dColB checks their existence in dColA

      • If an item is not found, it places it in the 4th array (arrD)



  • At the end, it places both arr3 and arr4 back on the range (Col C and D respectively)


Option Explicit

Public Sub Compare2Cols()
Dim dColA As Object: Set dColA = CreateObject("Scripting.Dictionary")
Dim dColB As Object: Set dColB = CreateObject("Scripting.Dictionary")

Dim ur As Range: Set ur = ActiveSheet.UsedRange

ur.Columns("C:D").Delete
Dim arrA As Variant: arrA = ur.Columns("A")
Dim arrB As Variant: arrB = ur.Columns("B")
Dim arrC As Variant: arrC = ur.Columns("C")
Dim arrD As Variant: arrD = ur.Columns("D")

Dim itm As Variant

For Each itm In arrA
dColA(itm) = 0
Next
For Each itm In arrB
dColB(itm) = 0
Next
Dim r As Long
For Each itm In dColA
r = r + 1
If Not dColB.Exists(itm) Then arrC(r, 1) = itm 'Find Col 1 in 2
Next
r = 0
For Each itm In dColB
r = r + 1
If Not dColA.Exists(itm) Then arrD(r, 1) = itm 'Find Col 2 in 1
Next
ur.Columns("C") = arrC
ur.Columns("D") = arrD
End Sub



Test results - dictionaries:



Compare2ColsUsingDictionaries() - Rows: 10,001; Time: 0.047 sec
Compare2ColsUsingDictionaries() - Rows: 20,001; Time: 0.109 sec
Compare2ColsUsingDictionaries() - Rows: 30,001; Time: 0.156 sec
Compare2ColsUsingDictionaries() - Rows: 40,001; Time: 0.234 sec
Compare2ColsUsingDictionaries() - Rows: 50,001; Time: 0.296 sec
Compare2ColsUsingDictionaries() - Rows: 100,001; Time: 1.232 sec
Compare2ColsUsingDictionaries() - Rows: 500,001; Time: 31.934 sec
Compare2ColsUsingDictionaries() - Rows: 1,048,576; Time: 126.797 sec


Test results - collections (suggested by @juvian, about 4 times faster for large data)



Compare2ColsUsingCollections() - Rows: 10,001; Time: 0.262 sec
Compare2ColsUsingCollections() - Rows: 20,001; Time: 0.539 sec
Compare2ColsUsingCollections() - Rows: 30,001; Time: 0.813 sec
Compare2ColsUsingCollections() - Rows: 40,001; Time: 1.109 sec
Compare2ColsUsingCollections() - Rows: 50,001; Time: 1.410 sec
Compare2ColsUsingCollections() - Rows: 100,001; Time: 2.945 sec
Compare2ColsUsingCollections() - Rows: 500,001; Time: 15.730 sec
Compare2ColsUsingCollections() - Rows: 1,048,576; Time: 33.719 sec



Image of test results



TestData




.



Other versions (combinations of reduced For loops) I've already tried:



.



V2 (combined For 1 and 2) - significant increase in execution time from 2 min to 3




Public Sub Compare2ColsV2()
Dim dColA As Object: Set dColA = CreateObject("Scripting.Dictionary")
Dim dColB As Object: Set dColB = CreateObject("Scripting.Dictionary")

Dim ur As Range: Set ur = ActiveSheet.UsedRange

ur.Columns("C:D").Delete

Dim arrA As Variant: arrA = ur.Columns("A")
Dim arrB As Variant: arrB = ur.Columns("B")
Dim arrC As Variant: arrC = ur.Columns("C")
Dim arrD As Variant: arrD = ur.Columns("D")

Dim itm As Variant, r As Long

For r = 1 To UBound(arrA) 'based on UsedRange (same LastRow)
dColA(arrA(r, 1)) = 0
dColB(arrB(r, 1)) = 0
Next

r = 0
For Each itm In dColA
r = r + 1
If Not dColB.Exists(itm) Then arrC(r, 1) = itm 'Find Col 1 in 2
Next

r = 0
For Each itm In dColB
r = r + 1
If Not dColA.Exists(itm) Then arrD(r, 1) = itm 'Find Col 2 in 1
Next

ur.Columns("C") = arrC
ur.Columns("D") = arrD
End Sub



.



V3 (combined For 2 and 4) - No improvement (slower by about 3 secs);




Public Sub Compare2ColsV3()
Dim dColA As Object: Set dColA = CreateObject("Scripting.Dictionary")
Dim dColB As Object: Set dColB = CreateObject("Scripting.Dictionary")

Dim ur As Range: Set ur = ActiveSheet.UsedRange

ur.Columns("C:D").Delete

Dim arrA As Variant: arrA = ur.Columns("A")
Dim arrB As Variant: arrB = ur.Columns("B")
Dim arrC As Variant: arrC = ur.Columns("C")
Dim arrD As Variant: arrD = ur.Columns("D")

Dim itm As Variant, r As Long

For Each itm In arrA
dColA(itm) = 0
Next
For Each itm In arrB
dColB(itm) = 0
r = r + 1
If Not dColA.Exists(itm) Then arrD(r, 1) = itm 'Find Col 2 in 1
Next
r = 0
For Each itm In dColA
r = r + 1
If Not dColB.Exists(itm) Then arrC(r, 1) = itm 'Find Col 1 in 2
Next

ur.Columns("C") = arrC
ur.Columns("D") = arrD
End Sub



Any performance improvements or suggestions are appreciated



PS. I also tested 2 types of loops over variant arrays, using For r = 1 To UBound(arrA) vs. For Each itm In arrA with no other changes - there is no noticeable difference in speed between them







share|improve this question



























    up vote
    4
    down vote

    favorite












    This is an answer to this SO question: Vba comparing 2 columns of data.



    In that question the OP code is very slow (also not working properly but the problem is fixed)



    I spent some time to implement a version with decent performance at 1M rows+



    Note: I know Activesheet is discouraged (and I avoid it) but the Sub is quite generic (and it can be made even more so); the little utility can be convenient to users for quick checks - simply compare values on current sheet




    Description



    • Sets up 2 dictionaries (late-bound)

    • Sets up the main UsedRange (ur) variable

    • Deletes the results columns C and D


    • Setups 4 arrays




      • arrA for the first column to be compared (Col A)

      • arrB for the second column to be compared (Col B)


      • arrC for the 1st result column C (from Col A, which shows vals not found in col B)



      • arrD for the 2nd result column D (from Col B, showing vals not found in col A)

        • (Each column can be a constant, or Optional Param with default vals)




    • Load data to dictionaries:




      • For loop 1: adds all elements in Col A as keys, in dictionary dColA


      • For loop 2: adds all elements in Col B as keys, in dictionary dColB


    • Compare dictionary items


      • For loop 3: for each itm in dColA checks their existence in dColB

        • If an item is not found, it places it in the 3th array (arrC)



      • For loop 4: for each itm in dColB checks their existence in dColA

        • If an item is not found, it places it in the 4th array (arrD)



    • At the end, it places both arr3 and arr4 back on the range (Col C and D respectively)


    Option Explicit

    Public Sub Compare2Cols()
    Dim dColA As Object: Set dColA = CreateObject("Scripting.Dictionary")
    Dim dColB As Object: Set dColB = CreateObject("Scripting.Dictionary")

    Dim ur As Range: Set ur = ActiveSheet.UsedRange

    ur.Columns("C:D").Delete
    Dim arrA As Variant: arrA = ur.Columns("A")
    Dim arrB As Variant: arrB = ur.Columns("B")
    Dim arrC As Variant: arrC = ur.Columns("C")
    Dim arrD As Variant: arrD = ur.Columns("D")

    Dim itm As Variant

    For Each itm In arrA
    dColA(itm) = 0
    Next
    For Each itm In arrB
    dColB(itm) = 0
    Next
    Dim r As Long
    For Each itm In dColA
    r = r + 1
    If Not dColB.Exists(itm) Then arrC(r, 1) = itm 'Find Col 1 in 2
    Next
    r = 0
    For Each itm In dColB
    r = r + 1
    If Not dColA.Exists(itm) Then arrD(r, 1) = itm 'Find Col 2 in 1
    Next
    ur.Columns("C") = arrC
    ur.Columns("D") = arrD
    End Sub



    Test results - dictionaries:



    Compare2ColsUsingDictionaries() - Rows: 10,001; Time: 0.047 sec
    Compare2ColsUsingDictionaries() - Rows: 20,001; Time: 0.109 sec
    Compare2ColsUsingDictionaries() - Rows: 30,001; Time: 0.156 sec
    Compare2ColsUsingDictionaries() - Rows: 40,001; Time: 0.234 sec
    Compare2ColsUsingDictionaries() - Rows: 50,001; Time: 0.296 sec
    Compare2ColsUsingDictionaries() - Rows: 100,001; Time: 1.232 sec
    Compare2ColsUsingDictionaries() - Rows: 500,001; Time: 31.934 sec
    Compare2ColsUsingDictionaries() - Rows: 1,048,576; Time: 126.797 sec


    Test results - collections (suggested by @juvian, about 4 times faster for large data)



    Compare2ColsUsingCollections() - Rows: 10,001; Time: 0.262 sec
    Compare2ColsUsingCollections() - Rows: 20,001; Time: 0.539 sec
    Compare2ColsUsingCollections() - Rows: 30,001; Time: 0.813 sec
    Compare2ColsUsingCollections() - Rows: 40,001; Time: 1.109 sec
    Compare2ColsUsingCollections() - Rows: 50,001; Time: 1.410 sec
    Compare2ColsUsingCollections() - Rows: 100,001; Time: 2.945 sec
    Compare2ColsUsingCollections() - Rows: 500,001; Time: 15.730 sec
    Compare2ColsUsingCollections() - Rows: 1,048,576; Time: 33.719 sec



    Image of test results



    TestData




    .



    Other versions (combinations of reduced For loops) I've already tried:



    .



    V2 (combined For 1 and 2) - significant increase in execution time from 2 min to 3




    Public Sub Compare2ColsV2()
    Dim dColA As Object: Set dColA = CreateObject("Scripting.Dictionary")
    Dim dColB As Object: Set dColB = CreateObject("Scripting.Dictionary")

    Dim ur As Range: Set ur = ActiveSheet.UsedRange

    ur.Columns("C:D").Delete

    Dim arrA As Variant: arrA = ur.Columns("A")
    Dim arrB As Variant: arrB = ur.Columns("B")
    Dim arrC As Variant: arrC = ur.Columns("C")
    Dim arrD As Variant: arrD = ur.Columns("D")

    Dim itm As Variant, r As Long

    For r = 1 To UBound(arrA) 'based on UsedRange (same LastRow)
    dColA(arrA(r, 1)) = 0
    dColB(arrB(r, 1)) = 0
    Next

    r = 0
    For Each itm In dColA
    r = r + 1
    If Not dColB.Exists(itm) Then arrC(r, 1) = itm 'Find Col 1 in 2
    Next

    r = 0
    For Each itm In dColB
    r = r + 1
    If Not dColA.Exists(itm) Then arrD(r, 1) = itm 'Find Col 2 in 1
    Next

    ur.Columns("C") = arrC
    ur.Columns("D") = arrD
    End Sub



    .



    V3 (combined For 2 and 4) - No improvement (slower by about 3 secs);




    Public Sub Compare2ColsV3()
    Dim dColA As Object: Set dColA = CreateObject("Scripting.Dictionary")
    Dim dColB As Object: Set dColB = CreateObject("Scripting.Dictionary")

    Dim ur As Range: Set ur = ActiveSheet.UsedRange

    ur.Columns("C:D").Delete

    Dim arrA As Variant: arrA = ur.Columns("A")
    Dim arrB As Variant: arrB = ur.Columns("B")
    Dim arrC As Variant: arrC = ur.Columns("C")
    Dim arrD As Variant: arrD = ur.Columns("D")

    Dim itm As Variant, r As Long

    For Each itm In arrA
    dColA(itm) = 0
    Next
    For Each itm In arrB
    dColB(itm) = 0
    r = r + 1
    If Not dColA.Exists(itm) Then arrD(r, 1) = itm 'Find Col 2 in 1
    Next
    r = 0
    For Each itm In dColA
    r = r + 1
    If Not dColB.Exists(itm) Then arrC(r, 1) = itm 'Find Col 1 in 2
    Next

    ur.Columns("C") = arrC
    ur.Columns("D") = arrD
    End Sub



    Any performance improvements or suggestions are appreciated



    PS. I also tested 2 types of loops over variant arrays, using For r = 1 To UBound(arrA) vs. For Each itm In arrA with no other changes - there is no noticeable difference in speed between them







    share|improve this question























      up vote
      4
      down vote

      favorite









      up vote
      4
      down vote

      favorite











      This is an answer to this SO question: Vba comparing 2 columns of data.



      In that question the OP code is very slow (also not working properly but the problem is fixed)



      I spent some time to implement a version with decent performance at 1M rows+



      Note: I know Activesheet is discouraged (and I avoid it) but the Sub is quite generic (and it can be made even more so); the little utility can be convenient to users for quick checks - simply compare values on current sheet




      Description



      • Sets up 2 dictionaries (late-bound)

      • Sets up the main UsedRange (ur) variable

      • Deletes the results columns C and D


      • Setups 4 arrays




        • arrA for the first column to be compared (Col A)

        • arrB for the second column to be compared (Col B)


        • arrC for the 1st result column C (from Col A, which shows vals not found in col B)



        • arrD for the 2nd result column D (from Col B, showing vals not found in col A)

          • (Each column can be a constant, or Optional Param with default vals)




      • Load data to dictionaries:




        • For loop 1: adds all elements in Col A as keys, in dictionary dColA


        • For loop 2: adds all elements in Col B as keys, in dictionary dColB


      • Compare dictionary items


        • For loop 3: for each itm in dColA checks their existence in dColB

          • If an item is not found, it places it in the 3th array (arrC)



        • For loop 4: for each itm in dColB checks their existence in dColA

          • If an item is not found, it places it in the 4th array (arrD)



      • At the end, it places both arr3 and arr4 back on the range (Col C and D respectively)


      Option Explicit

      Public Sub Compare2Cols()
      Dim dColA As Object: Set dColA = CreateObject("Scripting.Dictionary")
      Dim dColB As Object: Set dColB = CreateObject("Scripting.Dictionary")

      Dim ur As Range: Set ur = ActiveSheet.UsedRange

      ur.Columns("C:D").Delete
      Dim arrA As Variant: arrA = ur.Columns("A")
      Dim arrB As Variant: arrB = ur.Columns("B")
      Dim arrC As Variant: arrC = ur.Columns("C")
      Dim arrD As Variant: arrD = ur.Columns("D")

      Dim itm As Variant

      For Each itm In arrA
      dColA(itm) = 0
      Next
      For Each itm In arrB
      dColB(itm) = 0
      Next
      Dim r As Long
      For Each itm In dColA
      r = r + 1
      If Not dColB.Exists(itm) Then arrC(r, 1) = itm 'Find Col 1 in 2
      Next
      r = 0
      For Each itm In dColB
      r = r + 1
      If Not dColA.Exists(itm) Then arrD(r, 1) = itm 'Find Col 2 in 1
      Next
      ur.Columns("C") = arrC
      ur.Columns("D") = arrD
      End Sub



      Test results - dictionaries:



      Compare2ColsUsingDictionaries() - Rows: 10,001; Time: 0.047 sec
      Compare2ColsUsingDictionaries() - Rows: 20,001; Time: 0.109 sec
      Compare2ColsUsingDictionaries() - Rows: 30,001; Time: 0.156 sec
      Compare2ColsUsingDictionaries() - Rows: 40,001; Time: 0.234 sec
      Compare2ColsUsingDictionaries() - Rows: 50,001; Time: 0.296 sec
      Compare2ColsUsingDictionaries() - Rows: 100,001; Time: 1.232 sec
      Compare2ColsUsingDictionaries() - Rows: 500,001; Time: 31.934 sec
      Compare2ColsUsingDictionaries() - Rows: 1,048,576; Time: 126.797 sec


      Test results - collections (suggested by @juvian, about 4 times faster for large data)



      Compare2ColsUsingCollections() - Rows: 10,001; Time: 0.262 sec
      Compare2ColsUsingCollections() - Rows: 20,001; Time: 0.539 sec
      Compare2ColsUsingCollections() - Rows: 30,001; Time: 0.813 sec
      Compare2ColsUsingCollections() - Rows: 40,001; Time: 1.109 sec
      Compare2ColsUsingCollections() - Rows: 50,001; Time: 1.410 sec
      Compare2ColsUsingCollections() - Rows: 100,001; Time: 2.945 sec
      Compare2ColsUsingCollections() - Rows: 500,001; Time: 15.730 sec
      Compare2ColsUsingCollections() - Rows: 1,048,576; Time: 33.719 sec



      Image of test results



      TestData




      .



      Other versions (combinations of reduced For loops) I've already tried:



      .



      V2 (combined For 1 and 2) - significant increase in execution time from 2 min to 3




      Public Sub Compare2ColsV2()
      Dim dColA As Object: Set dColA = CreateObject("Scripting.Dictionary")
      Dim dColB As Object: Set dColB = CreateObject("Scripting.Dictionary")

      Dim ur As Range: Set ur = ActiveSheet.UsedRange

      ur.Columns("C:D").Delete

      Dim arrA As Variant: arrA = ur.Columns("A")
      Dim arrB As Variant: arrB = ur.Columns("B")
      Dim arrC As Variant: arrC = ur.Columns("C")
      Dim arrD As Variant: arrD = ur.Columns("D")

      Dim itm As Variant, r As Long

      For r = 1 To UBound(arrA) 'based on UsedRange (same LastRow)
      dColA(arrA(r, 1)) = 0
      dColB(arrB(r, 1)) = 0
      Next

      r = 0
      For Each itm In dColA
      r = r + 1
      If Not dColB.Exists(itm) Then arrC(r, 1) = itm 'Find Col 1 in 2
      Next

      r = 0
      For Each itm In dColB
      r = r + 1
      If Not dColA.Exists(itm) Then arrD(r, 1) = itm 'Find Col 2 in 1
      Next

      ur.Columns("C") = arrC
      ur.Columns("D") = arrD
      End Sub



      .



      V3 (combined For 2 and 4) - No improvement (slower by about 3 secs);




      Public Sub Compare2ColsV3()
      Dim dColA As Object: Set dColA = CreateObject("Scripting.Dictionary")
      Dim dColB As Object: Set dColB = CreateObject("Scripting.Dictionary")

      Dim ur As Range: Set ur = ActiveSheet.UsedRange

      ur.Columns("C:D").Delete

      Dim arrA As Variant: arrA = ur.Columns("A")
      Dim arrB As Variant: arrB = ur.Columns("B")
      Dim arrC As Variant: arrC = ur.Columns("C")
      Dim arrD As Variant: arrD = ur.Columns("D")

      Dim itm As Variant, r As Long

      For Each itm In arrA
      dColA(itm) = 0
      Next
      For Each itm In arrB
      dColB(itm) = 0
      r = r + 1
      If Not dColA.Exists(itm) Then arrD(r, 1) = itm 'Find Col 2 in 1
      Next
      r = 0
      For Each itm In dColA
      r = r + 1
      If Not dColB.Exists(itm) Then arrC(r, 1) = itm 'Find Col 1 in 2
      Next

      ur.Columns("C") = arrC
      ur.Columns("D") = arrD
      End Sub



      Any performance improvements or suggestions are appreciated



      PS. I also tested 2 types of loops over variant arrays, using For r = 1 To UBound(arrA) vs. For Each itm In arrA with no other changes - there is no noticeable difference in speed between them







      share|improve this question













      This is an answer to this SO question: Vba comparing 2 columns of data.



      In that question the OP code is very slow (also not working properly but the problem is fixed)



      I spent some time to implement a version with decent performance at 1M rows+



      Note: I know Activesheet is discouraged (and I avoid it) but the Sub is quite generic (and it can be made even more so); the little utility can be convenient to users for quick checks - simply compare values on current sheet




      Description



      • Sets up 2 dictionaries (late-bound)

      • Sets up the main UsedRange (ur) variable

      • Deletes the results columns C and D


      • Setups 4 arrays




        • arrA for the first column to be compared (Col A)

        • arrB for the second column to be compared (Col B)


        • arrC for the 1st result column C (from Col A, which shows vals not found in col B)



        • arrD for the 2nd result column D (from Col B, showing vals not found in col A)

          • (Each column can be a constant, or Optional Param with default vals)




      • Load data to dictionaries:




        • For loop 1: adds all elements in Col A as keys, in dictionary dColA


        • For loop 2: adds all elements in Col B as keys, in dictionary dColB


      • Compare dictionary items


        • For loop 3: for each itm in dColA checks their existence in dColB

          • If an item is not found, it places it in the 3th array (arrC)



        • For loop 4: for each itm in dColB checks their existence in dColA

          • If an item is not found, it places it in the 4th array (arrD)



      • At the end, it places both arr3 and arr4 back on the range (Col C and D respectively)


      Option Explicit

      Public Sub Compare2Cols()
      Dim dColA As Object: Set dColA = CreateObject("Scripting.Dictionary")
      Dim dColB As Object: Set dColB = CreateObject("Scripting.Dictionary")

      Dim ur As Range: Set ur = ActiveSheet.UsedRange

      ur.Columns("C:D").Delete
      Dim arrA As Variant: arrA = ur.Columns("A")
      Dim arrB As Variant: arrB = ur.Columns("B")
      Dim arrC As Variant: arrC = ur.Columns("C")
      Dim arrD As Variant: arrD = ur.Columns("D")

      Dim itm As Variant

      For Each itm In arrA
      dColA(itm) = 0
      Next
      For Each itm In arrB
      dColB(itm) = 0
      Next
      Dim r As Long
      For Each itm In dColA
      r = r + 1
      If Not dColB.Exists(itm) Then arrC(r, 1) = itm 'Find Col 1 in 2
      Next
      r = 0
      For Each itm In dColB
      r = r + 1
      If Not dColA.Exists(itm) Then arrD(r, 1) = itm 'Find Col 2 in 1
      Next
      ur.Columns("C") = arrC
      ur.Columns("D") = arrD
      End Sub



      Test results - dictionaries:



      Compare2ColsUsingDictionaries() - Rows: 10,001; Time: 0.047 sec
      Compare2ColsUsingDictionaries() - Rows: 20,001; Time: 0.109 sec
      Compare2ColsUsingDictionaries() - Rows: 30,001; Time: 0.156 sec
      Compare2ColsUsingDictionaries() - Rows: 40,001; Time: 0.234 sec
      Compare2ColsUsingDictionaries() - Rows: 50,001; Time: 0.296 sec
      Compare2ColsUsingDictionaries() - Rows: 100,001; Time: 1.232 sec
      Compare2ColsUsingDictionaries() - Rows: 500,001; Time: 31.934 sec
      Compare2ColsUsingDictionaries() - Rows: 1,048,576; Time: 126.797 sec


      Test results - collections (suggested by @juvian, about 4 times faster for large data)



      Compare2ColsUsingCollections() - Rows: 10,001; Time: 0.262 sec
      Compare2ColsUsingCollections() - Rows: 20,001; Time: 0.539 sec
      Compare2ColsUsingCollections() - Rows: 30,001; Time: 0.813 sec
      Compare2ColsUsingCollections() - Rows: 40,001; Time: 1.109 sec
      Compare2ColsUsingCollections() - Rows: 50,001; Time: 1.410 sec
      Compare2ColsUsingCollections() - Rows: 100,001; Time: 2.945 sec
      Compare2ColsUsingCollections() - Rows: 500,001; Time: 15.730 sec
      Compare2ColsUsingCollections() - Rows: 1,048,576; Time: 33.719 sec



      Image of test results



      TestData




      .



      Other versions (combinations of reduced For loops) I've already tried:



      .



      V2 (combined For 1 and 2) - significant increase in execution time from 2 min to 3




      Public Sub Compare2ColsV2()
      Dim dColA As Object: Set dColA = CreateObject("Scripting.Dictionary")
      Dim dColB As Object: Set dColB = CreateObject("Scripting.Dictionary")

      Dim ur As Range: Set ur = ActiveSheet.UsedRange

      ur.Columns("C:D").Delete

      Dim arrA As Variant: arrA = ur.Columns("A")
      Dim arrB As Variant: arrB = ur.Columns("B")
      Dim arrC As Variant: arrC = ur.Columns("C")
      Dim arrD As Variant: arrD = ur.Columns("D")

      Dim itm As Variant, r As Long

      For r = 1 To UBound(arrA) 'based on UsedRange (same LastRow)
      dColA(arrA(r, 1)) = 0
      dColB(arrB(r, 1)) = 0
      Next

      r = 0
      For Each itm In dColA
      r = r + 1
      If Not dColB.Exists(itm) Then arrC(r, 1) = itm 'Find Col 1 in 2
      Next

      r = 0
      For Each itm In dColB
      r = r + 1
      If Not dColA.Exists(itm) Then arrD(r, 1) = itm 'Find Col 2 in 1
      Next

      ur.Columns("C") = arrC
      ur.Columns("D") = arrD
      End Sub



      .



      V3 (combined For 2 and 4) - No improvement (slower by about 3 secs);




      Public Sub Compare2ColsV3()
      Dim dColA As Object: Set dColA = CreateObject("Scripting.Dictionary")
      Dim dColB As Object: Set dColB = CreateObject("Scripting.Dictionary")

      Dim ur As Range: Set ur = ActiveSheet.UsedRange

      ur.Columns("C:D").Delete

      Dim arrA As Variant: arrA = ur.Columns("A")
      Dim arrB As Variant: arrB = ur.Columns("B")
      Dim arrC As Variant: arrC = ur.Columns("C")
      Dim arrD As Variant: arrD = ur.Columns("D")

      Dim itm As Variant, r As Long

      For Each itm In arrA
      dColA(itm) = 0
      Next
      For Each itm In arrB
      dColB(itm) = 0
      r = r + 1
      If Not dColA.Exists(itm) Then arrD(r, 1) = itm 'Find Col 2 in 1
      Next
      r = 0
      For Each itm In dColA
      r = r + 1
      If Not dColB.Exists(itm) Then arrC(r, 1) = itm 'Find Col 1 in 2
      Next

      ur.Columns("C") = arrC
      ur.Columns("D") = arrD
      End Sub



      Any performance improvements or suggestions are appreciated



      PS. I also tested 2 types of loops over variant arrays, using For r = 1 To UBound(arrA) vs. For Each itm In arrA with no other changes - there is no noticeable difference in speed between them









      share|improve this question












      share|improve this question




      share|improve this question








      edited Apr 18 at 0:36
























      asked Apr 16 at 6:27









      paul bica

      1,059613




      1,059613




















          3 Answers
          3






          active

          oldest

          votes

















          up vote
          3
          down vote



          accepted










          It´s worth a shot trying with Collections instead of Dictionary. Sounds counter intuitive but have read in several places that for big data it is in fact faster.



          One of them is : Count distinct or unique values



          It is true that collections don´t have an exists method, but you can take advantage that it throws an error if you try to access with a key that does not exist and make your own method: Check if key exists in collection



          Another place worth checking is Get unique values from array



          Implementation (pb)




          Option Explicit

          Public Sub Compare2ColsUsingCollections()
          Dim t As Double, tr As String: t = Timer
          Dim dColA As Collection: Set dColA = New Collection
          Dim dColB As Collection: Set dColB = New Collection
          Dim ur As Range: Set ur = ActiveSheet.UsedRange
          ur.Columns("C:D").Delete
          Dim arrA As Variant: arrA = ur.Columns("A")
          Dim arrB As Variant: arrB = ur.Columns("B")
          Dim arrC As Variant: arrC = ur.Columns("C")
          Dim arrD As Variant: arrD = ur.Columns("D")

          Dim itm As Variant, r As Long, s As String
          For Each itm In arrA
          s = CStr(itm)
          If IsMissing(dColA, s) Then dColA.Add s, s
          Next
          For Each itm In arrB
          s = CStr(itm)
          If IsMissing(dColB, s) Then dColB.Add s, s
          Next
          For Each itm In dColA
          r = r + 1
          If IsMissing(dColB, itm) Then arrC(r, 1) = itm 'Find Col 1 in 2
          Next
          r = 0
          For Each itm In dColB
          r = r + 1
          If IsMissing(dColA, itm) Then arrD(r, 1) = itm 'Find Col 1 in 2
          Next
          ur.Columns("C") = arrC: ur.Columns("D") = arrD
          tr = "Compare2ColsUsingCollections() - Rows: " & Format(ur.Rows.Count, "#,###") & "; "
          Debug.Print tr & "Time: " & Format(Timer - t, "0.000") & " sec"
          End Sub



          Private Function IsMissing(ByRef c As Collection, ByVal fld As String) As Boolean
          On Error GoTo IsMissingError
          Dim v As Variant
          v = c(fld)
          Exit Function
          IsMissingError:
          IsMissing = True
          End Function



          Test results



          Compare2ColsUsingCollections() - Rows: 1,048,576; Time: 33.770 sec
          Compare2ColsUsingCollections() - Rows: 500,001; Time: 15.730 sec
          Compare2ColsUsingCollections() - Rows: 100,001; Time: 2.945 sec
          Compare2ColsUsingCollections() - Rows: 50,001; Time: 1.410 sec
          Compare2ColsUsingCollections() - Rows: 40,001; Time: 1.109 sec
          Compare2ColsUsingCollections() - Rows: 30,001; Time: 0.813 sec
          Compare2ColsUsingCollections() - Rows: 20,001; Time: 0.539 sec
          Compare2ColsUsingCollections() - Rows: 10,001; Time: 0.262 sec



          Same results with same data:



          Dictionaries



          Dictionaries



          Collections



          Collections






          share|improve this answer























          • Thanks juvian, this actually addresses the main performance issue - it's 4 times faster than dictionaries, for large data!
            – paul bica
            Apr 16 at 20:18











          • @paulbica do you plan to make it generic or only for integers?
            – juvian
            Apr 16 at 20:31










          • Generic, but if you have any other suggestions for improving it specifically for Longs, you can add them to your answer as well (doesn't hurt)
            – paul bica
            Apr 16 at 20:33










          • Great! I'll leave the question open for a few days, in case someone else can improve the performance (unlikely), so I'll probably accept it. Thanks again!
            – paul bica
            Apr 16 at 21:00











          • @paul Integers - integers are obsolete. According to msdn VBA silently converts all integers to long.
            – Raystafarian
            Apr 16 at 21:27

















          up vote
          6
          down vote













          I'll be up-front, I detest colons in VBA. I just do. I feel like it doesn't make it any more clear and it seems more cluttered, especially to those who aren't familiar with them. So, that's my statement for that, and I won't include it in the review.




          Confusion



          That being said, why are you using entire columns? Because you've already defined the last row based on the UsedRange? It's a bit misleading, but that's okay. Maybe rename ur to be more clear. UserRange for instance. And why create variants of columns you just deleted?




          ur.Columns("C:D").Delete
          Dim arrC As Variant: arrC = ur.Columns("C")
          Dim arrD As Variant: arrD = ur.Columns("D")



          If you need some empty variants sized, you can do that explicitly using the UBound of one of the other arrays. I know it seems excessive, but someone may expect there to be data in a variant that was set to a range, right?



          And I know you were going for flexible, but arrA arrB arrC arrD aren't doing me any favors. What are they? Is arrC a result of arrA? If there's a relationship, maybe make that clear. I know it's rough trying to stay generic and flexible, but I couldn't not say anything.



          Also, what if my UsedRange doesn't match your expectation? What if it's more than 4 columns, or less that 4 columns? What if it's in D:G - the A to D will look strange to me.




          Dictionaries



          Nice work with the dictionaries. I love using them for removing duplicates and I think you did a good job. However, maybe rename itm to key to be more clear. I also enjoy the late binding much more than early binding, but that's my opinion.




          Populating



          When you populate your arrays here




          For Each itm In dColA
          r = r + 1
          If Not dColB.Exists(itm) Then arrC(r, 1) = itm 'Find Col 1 in 2
          Next



          You're increasing the row and leaving it blank if the if condition isn't met. Is that on purpose, to leave blanks? But what happens with the count of the dictionary is less than the bounds of the array? Do you need to shift up or leave blanks? It's not explicitly stated what the goal is there, in the code. I think your best bet would to create a function with a name as to what it does, pass your dictionary to it and return the new variant. Your name of the function should be able to tell us a lot.




          Overall, I think this is a pretty solid procedure, but it could use some refactoring and a little bit more clarity and maybe some error handling.






          share|improve this answer





















          • Wow! Thanks for the quick response! To answer your point: Colons are a personal preference (I know they're not favored) but I use them extremely carefully for visual alignment - to me I Dim and I Set1 - done (I know where to look fast). 2. why are you using entire columns? - Where am I using entire columns (I'm using the entire columns of the UsedRange). 3. And why create variants of columns you just deleted those are the results columns - they need to be empty before every run. 4. arrA, arrB, arrC, arrD as parameters would become vArrColToCompare1, vArrColToCompare2, etc
            – paul bica
            Apr 16 at 7:02










          • The UsedRange is debatable: what if the user has empty cells .End(xlDown) will eliminate the rest of the cells bellow. This utility is not fully generic yet - I just wanted to see if I can optimize it first. Lastly: You're increasing the row and leaving it blank if the if condition isn't met - that's my intent - I meant to show the position where the value was not found in the other column; But what happens with the count of the dictionary is less than the bounds of the array? good point - I need to fix this... (I was moving between dictionaries and arrays quite a bit)
            – paul bica
            Apr 16 at 7:06











          • Can you think of any performance improvements at all?
            – paul bica
            Apr 16 at 7:08










          • I just meant if you're starting with usedrange then you know the last row. And it looks like you're using entire columns, I know you aren't. The only thing I might do differently is use one variant instead of 4, but I'm not sure that would be much of an improvement. Maybe there's a better algorithm, but I don't know it.
            – Raystafarian
            Apr 16 at 7:08


















          up vote
          1
          down vote













          If you are looking for fast performance and you are only dealing with countable numbers (e.g. Longs), this approach would be considerably faster. My code below assumes you'd only be working with Longs.



          Basically, the way this works is enumerating all possible numbers you expect to see, then flips them to true when it's been seen. I do this over each range, then dump the results to Long arrays.



          This really only works if you know what the min/max range you'd be working in, so it has limited usefulness, however it might be a good fit for your specific use case.




          Option Explicit

          Public Sub Compare2NumColsUsingBoolArrays()
          Dim t As Double, tr As String: t = Timer
          Dim appFuncs As WorksheetFunction: Set appFuncs = Application.WorksheetFunction
          Dim ur As Range: Set ur = ActiveSheet.UsedRange: ur.Columns("C:D").Delete
          Set ur = ur.Resize(ur.Rows.Count - 1): Set ur = ur.Offset(1)

          Dim arr As Variant: arr = ur.Columns("A:D")
          Dim lbA As Long: lbA = appFuncs.Min(ur.Columns("A"))
          Dim ubA As Long: ubA = appFuncs.Max(ur.Columns("A"))
          Dim lbB As Long: lbB = appFuncs.Min(ur.Columns("B"))
          Dim ubB As Long: ubB = appFuncs.Max(ur.Columns("B"))
          Dim minN As Long: minN = IIf(lbA < lbB, lbA, lbB)
          Dim maxN As Long: maxN = IIf(ubA > ubB, ubA, ubB)
          Dim nArrA() As Boolean: ReDim nArrA(minN To maxN)
          Dim nArrB() As Boolean: ReDim nArrB(minN To maxN)

          Dim r As Long, rC As Long, rD As Long
          For r = 1 To UBound(arr) 'Arr index = value (true/false)
          nArrA(arr(r, 1)) = True
          nArrB(arr(r, 2)) = True
          Next

          For r = minN To maxN
          If nArrA(r) And Not nArrB(r) Then 'In ColA, not in ColB
          rC = rC + 1: arr(rC, 3) = r
          ElseIf Not nArrA(r) And nArrB(r) Then 'In ColB, not in ColA
          rD = rD + 1: arr(rD, 4) = r
          End If
          Next
          ur.Columns("A:D") = arr
          tr = "Compare2NumColsUsingBoolArrays() - Rows:" & Format(ur.Rows.Count, "#,###") & ";"
          Debug.Print tr & "Time: " & Format(Timer - t, "0.000") & " sec"
          End Sub



          .



          Test results



          Compare2ColsUsingCollections() - Rows: 1,048,575; Time: 32.563 sec
          Compare2NumColsUsingBoolArrays() - Rows: 1,048,575; Time: 3.139 sec



          Test results




          .



          Notes:



          • Adapted code to exclude Headers, and place results back in Col C and D

          • Test results don't show the positioning of missing elements





          share|improve this answer























          • At first I was a bit skeptic, but for strictly comparing Longs, this is an interesting approach, and also very fast - 3 seconds! You have my vote. I edited your answer to determine the lower and upper bounds of the boolean arrays dynamically (it accepts negatives as well), and place the results back on the Range. I also tested it with the same data set and included the result measurement
            – paul bica
            Apr 25 at 9:10











          • Hopefully it helps!
            – Ryan Wildry
            Apr 25 at 13:17










          Your Answer




          StackExchange.ifUsing("editor", function ()
          return StackExchange.using("mathjaxEditing", function ()
          StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
          StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
          );
          );
          , "mathjax-editing");

          StackExchange.ifUsing("editor", function ()
          StackExchange.using("externalEditor", function ()
          StackExchange.using("snippets", function ()
          StackExchange.snippets.init();
          );
          );
          , "code-snippets");

          StackExchange.ready(function()
          var channelOptions =
          tags: "".split(" "),
          id: "196"
          ;
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          StackExchange.using("externalEditor", function()
          // Have to fire editor after snippets, if snippets enabled
          if (StackExchange.settings.snippets.snippetsEnabled)
          StackExchange.using("snippets", function()
          createEditor();
          );

          else
          createEditor();

          );

          function createEditor()
          StackExchange.prepareEditor(
          heartbeatType: 'answer',
          convertImagesToLinks: false,
          noModals: false,
          showLowRepImageUploadWarning: true,
          reputationToPostImages: null,
          bindNavPrevention: true,
          postfix: "",
          onDemand: true,
          discardSelector: ".discard-answer"
          ,immediatelyShowMarkdownHelp:true
          );



          );








           

          draft saved


          draft discarded


















          StackExchange.ready(
          function ()
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f192166%2fsub-that-compares-2-columns-fast%23new-answer', 'question_page');

          );

          Post as a guest






























          3 Answers
          3






          active

          oldest

          votes








          3 Answers
          3






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes








          up vote
          3
          down vote



          accepted










          It´s worth a shot trying with Collections instead of Dictionary. Sounds counter intuitive but have read in several places that for big data it is in fact faster.



          One of them is : Count distinct or unique values



          It is true that collections don´t have an exists method, but you can take advantage that it throws an error if you try to access with a key that does not exist and make your own method: Check if key exists in collection



          Another place worth checking is Get unique values from array



          Implementation (pb)




          Option Explicit

          Public Sub Compare2ColsUsingCollections()
          Dim t As Double, tr As String: t = Timer
          Dim dColA As Collection: Set dColA = New Collection
          Dim dColB As Collection: Set dColB = New Collection
          Dim ur As Range: Set ur = ActiveSheet.UsedRange
          ur.Columns("C:D").Delete
          Dim arrA As Variant: arrA = ur.Columns("A")
          Dim arrB As Variant: arrB = ur.Columns("B")
          Dim arrC As Variant: arrC = ur.Columns("C")
          Dim arrD As Variant: arrD = ur.Columns("D")

          Dim itm As Variant, r As Long, s As String
          For Each itm In arrA
          s = CStr(itm)
          If IsMissing(dColA, s) Then dColA.Add s, s
          Next
          For Each itm In arrB
          s = CStr(itm)
          If IsMissing(dColB, s) Then dColB.Add s, s
          Next
          For Each itm In dColA
          r = r + 1
          If IsMissing(dColB, itm) Then arrC(r, 1) = itm 'Find Col 1 in 2
          Next
          r = 0
          For Each itm In dColB
          r = r + 1
          If IsMissing(dColA, itm) Then arrD(r, 1) = itm 'Find Col 1 in 2
          Next
          ur.Columns("C") = arrC: ur.Columns("D") = arrD
          tr = "Compare2ColsUsingCollections() - Rows: " & Format(ur.Rows.Count, "#,###") & "; "
          Debug.Print tr & "Time: " & Format(Timer - t, "0.000") & " sec"
          End Sub



          Private Function IsMissing(ByRef c As Collection, ByVal fld As String) As Boolean
          On Error GoTo IsMissingError
          Dim v As Variant
          v = c(fld)
          Exit Function
          IsMissingError:
          IsMissing = True
          End Function



          Test results



          Compare2ColsUsingCollections() - Rows: 1,048,576; Time: 33.770 sec
          Compare2ColsUsingCollections() - Rows: 500,001; Time: 15.730 sec
          Compare2ColsUsingCollections() - Rows: 100,001; Time: 2.945 sec
          Compare2ColsUsingCollections() - Rows: 50,001; Time: 1.410 sec
          Compare2ColsUsingCollections() - Rows: 40,001; Time: 1.109 sec
          Compare2ColsUsingCollections() - Rows: 30,001; Time: 0.813 sec
          Compare2ColsUsingCollections() - Rows: 20,001; Time: 0.539 sec
          Compare2ColsUsingCollections() - Rows: 10,001; Time: 0.262 sec



          Same results with same data:



          Dictionaries



          Dictionaries



          Collections



          Collections






          share|improve this answer























          • Thanks juvian, this actually addresses the main performance issue - it's 4 times faster than dictionaries, for large data!
            – paul bica
            Apr 16 at 20:18











          • @paulbica do you plan to make it generic or only for integers?
            – juvian
            Apr 16 at 20:31










          • Generic, but if you have any other suggestions for improving it specifically for Longs, you can add them to your answer as well (doesn't hurt)
            – paul bica
            Apr 16 at 20:33










          • Great! I'll leave the question open for a few days, in case someone else can improve the performance (unlikely), so I'll probably accept it. Thanks again!
            – paul bica
            Apr 16 at 21:00











          • @paul Integers - integers are obsolete. According to msdn VBA silently converts all integers to long.
            – Raystafarian
            Apr 16 at 21:27














          up vote
          3
          down vote



          accepted










          It´s worth a shot trying with Collections instead of Dictionary. Sounds counter intuitive but have read in several places that for big data it is in fact faster.



          One of them is : Count distinct or unique values



          It is true that collections don´t have an exists method, but you can take advantage that it throws an error if you try to access with a key that does not exist and make your own method: Check if key exists in collection



          Another place worth checking is Get unique values from array



          Implementation (pb)




          Option Explicit

          Public Sub Compare2ColsUsingCollections()
          Dim t As Double, tr As String: t = Timer
          Dim dColA As Collection: Set dColA = New Collection
          Dim dColB As Collection: Set dColB = New Collection
          Dim ur As Range: Set ur = ActiveSheet.UsedRange
          ur.Columns("C:D").Delete
          Dim arrA As Variant: arrA = ur.Columns("A")
          Dim arrB As Variant: arrB = ur.Columns("B")
          Dim arrC As Variant: arrC = ur.Columns("C")
          Dim arrD As Variant: arrD = ur.Columns("D")

          Dim itm As Variant, r As Long, s As String
          For Each itm In arrA
          s = CStr(itm)
          If IsMissing(dColA, s) Then dColA.Add s, s
          Next
          For Each itm In arrB
          s = CStr(itm)
          If IsMissing(dColB, s) Then dColB.Add s, s
          Next
          For Each itm In dColA
          r = r + 1
          If IsMissing(dColB, itm) Then arrC(r, 1) = itm 'Find Col 1 in 2
          Next
          r = 0
          For Each itm In dColB
          r = r + 1
          If IsMissing(dColA, itm) Then arrD(r, 1) = itm 'Find Col 1 in 2
          Next
          ur.Columns("C") = arrC: ur.Columns("D") = arrD
          tr = "Compare2ColsUsingCollections() - Rows: " & Format(ur.Rows.Count, "#,###") & "; "
          Debug.Print tr & "Time: " & Format(Timer - t, "0.000") & " sec"
          End Sub



          Private Function IsMissing(ByRef c As Collection, ByVal fld As String) As Boolean
          On Error GoTo IsMissingError
          Dim v As Variant
          v = c(fld)
          Exit Function
          IsMissingError:
          IsMissing = True
          End Function



          Test results



          Compare2ColsUsingCollections() - Rows: 1,048,576; Time: 33.770 sec
          Compare2ColsUsingCollections() - Rows: 500,001; Time: 15.730 sec
          Compare2ColsUsingCollections() - Rows: 100,001; Time: 2.945 sec
          Compare2ColsUsingCollections() - Rows: 50,001; Time: 1.410 sec
          Compare2ColsUsingCollections() - Rows: 40,001; Time: 1.109 sec
          Compare2ColsUsingCollections() - Rows: 30,001; Time: 0.813 sec
          Compare2ColsUsingCollections() - Rows: 20,001; Time: 0.539 sec
          Compare2ColsUsingCollections() - Rows: 10,001; Time: 0.262 sec



          Same results with same data:



          Dictionaries



          Dictionaries



          Collections



          Collections






          share|improve this answer























          • Thanks juvian, this actually addresses the main performance issue - it's 4 times faster than dictionaries, for large data!
            – paul bica
            Apr 16 at 20:18











          • @paulbica do you plan to make it generic or only for integers?
            – juvian
            Apr 16 at 20:31










          • Generic, but if you have any other suggestions for improving it specifically for Longs, you can add them to your answer as well (doesn't hurt)
            – paul bica
            Apr 16 at 20:33










          • Great! I'll leave the question open for a few days, in case someone else can improve the performance (unlikely), so I'll probably accept it. Thanks again!
            – paul bica
            Apr 16 at 21:00











          • @paul Integers - integers are obsolete. According to msdn VBA silently converts all integers to long.
            – Raystafarian
            Apr 16 at 21:27












          up vote
          3
          down vote



          accepted







          up vote
          3
          down vote



          accepted






          It´s worth a shot trying with Collections instead of Dictionary. Sounds counter intuitive but have read in several places that for big data it is in fact faster.



          One of them is : Count distinct or unique values



          It is true that collections don´t have an exists method, but you can take advantage that it throws an error if you try to access with a key that does not exist and make your own method: Check if key exists in collection



          Another place worth checking is Get unique values from array



          Implementation (pb)




          Option Explicit

          Public Sub Compare2ColsUsingCollections()
          Dim t As Double, tr As String: t = Timer
          Dim dColA As Collection: Set dColA = New Collection
          Dim dColB As Collection: Set dColB = New Collection
          Dim ur As Range: Set ur = ActiveSheet.UsedRange
          ur.Columns("C:D").Delete
          Dim arrA As Variant: arrA = ur.Columns("A")
          Dim arrB As Variant: arrB = ur.Columns("B")
          Dim arrC As Variant: arrC = ur.Columns("C")
          Dim arrD As Variant: arrD = ur.Columns("D")

          Dim itm As Variant, r As Long, s As String
          For Each itm In arrA
          s = CStr(itm)
          If IsMissing(dColA, s) Then dColA.Add s, s
          Next
          For Each itm In arrB
          s = CStr(itm)
          If IsMissing(dColB, s) Then dColB.Add s, s
          Next
          For Each itm In dColA
          r = r + 1
          If IsMissing(dColB, itm) Then arrC(r, 1) = itm 'Find Col 1 in 2
          Next
          r = 0
          For Each itm In dColB
          r = r + 1
          If IsMissing(dColA, itm) Then arrD(r, 1) = itm 'Find Col 1 in 2
          Next
          ur.Columns("C") = arrC: ur.Columns("D") = arrD
          tr = "Compare2ColsUsingCollections() - Rows: " & Format(ur.Rows.Count, "#,###") & "; "
          Debug.Print tr & "Time: " & Format(Timer - t, "0.000") & " sec"
          End Sub



          Private Function IsMissing(ByRef c As Collection, ByVal fld As String) As Boolean
          On Error GoTo IsMissingError
          Dim v As Variant
          v = c(fld)
          Exit Function
          IsMissingError:
          IsMissing = True
          End Function



          Test results



          Compare2ColsUsingCollections() - Rows: 1,048,576; Time: 33.770 sec
          Compare2ColsUsingCollections() - Rows: 500,001; Time: 15.730 sec
          Compare2ColsUsingCollections() - Rows: 100,001; Time: 2.945 sec
          Compare2ColsUsingCollections() - Rows: 50,001; Time: 1.410 sec
          Compare2ColsUsingCollections() - Rows: 40,001; Time: 1.109 sec
          Compare2ColsUsingCollections() - Rows: 30,001; Time: 0.813 sec
          Compare2ColsUsingCollections() - Rows: 20,001; Time: 0.539 sec
          Compare2ColsUsingCollections() - Rows: 10,001; Time: 0.262 sec



          Same results with same data:



          Dictionaries



          Dictionaries



          Collections



          Collections






          share|improve this answer















          It´s worth a shot trying with Collections instead of Dictionary. Sounds counter intuitive but have read in several places that for big data it is in fact faster.



          One of them is : Count distinct or unique values



          It is true that collections don´t have an exists method, but you can take advantage that it throws an error if you try to access with a key that does not exist and make your own method: Check if key exists in collection



          Another place worth checking is Get unique values from array



          Implementation (pb)




          Option Explicit

          Public Sub Compare2ColsUsingCollections()
          Dim t As Double, tr As String: t = Timer
          Dim dColA As Collection: Set dColA = New Collection
          Dim dColB As Collection: Set dColB = New Collection
          Dim ur As Range: Set ur = ActiveSheet.UsedRange
          ur.Columns("C:D").Delete
          Dim arrA As Variant: arrA = ur.Columns("A")
          Dim arrB As Variant: arrB = ur.Columns("B")
          Dim arrC As Variant: arrC = ur.Columns("C")
          Dim arrD As Variant: arrD = ur.Columns("D")

          Dim itm As Variant, r As Long, s As String
          For Each itm In arrA
          s = CStr(itm)
          If IsMissing(dColA, s) Then dColA.Add s, s
          Next
          For Each itm In arrB
          s = CStr(itm)
          If IsMissing(dColB, s) Then dColB.Add s, s
          Next
          For Each itm In dColA
          r = r + 1
          If IsMissing(dColB, itm) Then arrC(r, 1) = itm 'Find Col 1 in 2
          Next
          r = 0
          For Each itm In dColB
          r = r + 1
          If IsMissing(dColA, itm) Then arrD(r, 1) = itm 'Find Col 1 in 2
          Next
          ur.Columns("C") = arrC: ur.Columns("D") = arrD
          tr = "Compare2ColsUsingCollections() - Rows: " & Format(ur.Rows.Count, "#,###") & "; "
          Debug.Print tr & "Time: " & Format(Timer - t, "0.000") & " sec"
          End Sub



          Private Function IsMissing(ByRef c As Collection, ByVal fld As String) As Boolean
          On Error GoTo IsMissingError
          Dim v As Variant
          v = c(fld)
          Exit Function
          IsMissingError:
          IsMissing = True
          End Function



          Test results



          Compare2ColsUsingCollections() - Rows: 1,048,576; Time: 33.770 sec
          Compare2ColsUsingCollections() - Rows: 500,001; Time: 15.730 sec
          Compare2ColsUsingCollections() - Rows: 100,001; Time: 2.945 sec
          Compare2ColsUsingCollections() - Rows: 50,001; Time: 1.410 sec
          Compare2ColsUsingCollections() - Rows: 40,001; Time: 1.109 sec
          Compare2ColsUsingCollections() - Rows: 30,001; Time: 0.813 sec
          Compare2ColsUsingCollections() - Rows: 20,001; Time: 0.539 sec
          Compare2ColsUsingCollections() - Rows: 10,001; Time: 0.262 sec



          Same results with same data:



          Dictionaries



          Dictionaries



          Collections



          Collections







          share|improve this answer















          share|improve this answer



          share|improve this answer








          edited Apr 16 at 20:54


























          answered Apr 16 at 19:58









          juvian

          85838




          85838











          • Thanks juvian, this actually addresses the main performance issue - it's 4 times faster than dictionaries, for large data!
            – paul bica
            Apr 16 at 20:18











          • @paulbica do you plan to make it generic or only for integers?
            – juvian
            Apr 16 at 20:31










          • Generic, but if you have any other suggestions for improving it specifically for Longs, you can add them to your answer as well (doesn't hurt)
            – paul bica
            Apr 16 at 20:33










          • Great! I'll leave the question open for a few days, in case someone else can improve the performance (unlikely), so I'll probably accept it. Thanks again!
            – paul bica
            Apr 16 at 21:00











          • @paul Integers - integers are obsolete. According to msdn VBA silently converts all integers to long.
            – Raystafarian
            Apr 16 at 21:27
















          • Thanks juvian, this actually addresses the main performance issue - it's 4 times faster than dictionaries, for large data!
            – paul bica
            Apr 16 at 20:18











          • @paulbica do you plan to make it generic or only for integers?
            – juvian
            Apr 16 at 20:31










          • Generic, but if you have any other suggestions for improving it specifically for Longs, you can add them to your answer as well (doesn't hurt)
            – paul bica
            Apr 16 at 20:33










          • Great! I'll leave the question open for a few days, in case someone else can improve the performance (unlikely), so I'll probably accept it. Thanks again!
            – paul bica
            Apr 16 at 21:00











          • @paul Integers - integers are obsolete. According to msdn VBA silently converts all integers to long.
            – Raystafarian
            Apr 16 at 21:27















          Thanks juvian, this actually addresses the main performance issue - it's 4 times faster than dictionaries, for large data!
          – paul bica
          Apr 16 at 20:18





          Thanks juvian, this actually addresses the main performance issue - it's 4 times faster than dictionaries, for large data!
          – paul bica
          Apr 16 at 20:18













          @paulbica do you plan to make it generic or only for integers?
          – juvian
          Apr 16 at 20:31




          @paulbica do you plan to make it generic or only for integers?
          – juvian
          Apr 16 at 20:31












          Generic, but if you have any other suggestions for improving it specifically for Longs, you can add them to your answer as well (doesn't hurt)
          – paul bica
          Apr 16 at 20:33




          Generic, but if you have any other suggestions for improving it specifically for Longs, you can add them to your answer as well (doesn't hurt)
          – paul bica
          Apr 16 at 20:33












          Great! I'll leave the question open for a few days, in case someone else can improve the performance (unlikely), so I'll probably accept it. Thanks again!
          – paul bica
          Apr 16 at 21:00





          Great! I'll leave the question open for a few days, in case someone else can improve the performance (unlikely), so I'll probably accept it. Thanks again!
          – paul bica
          Apr 16 at 21:00













          @paul Integers - integers are obsolete. According to msdn VBA silently converts all integers to long.
          – Raystafarian
          Apr 16 at 21:27




          @paul Integers - integers are obsolete. According to msdn VBA silently converts all integers to long.
          – Raystafarian
          Apr 16 at 21:27












          up vote
          6
          down vote













          I'll be up-front, I detest colons in VBA. I just do. I feel like it doesn't make it any more clear and it seems more cluttered, especially to those who aren't familiar with them. So, that's my statement for that, and I won't include it in the review.




          Confusion



          That being said, why are you using entire columns? Because you've already defined the last row based on the UsedRange? It's a bit misleading, but that's okay. Maybe rename ur to be more clear. UserRange for instance. And why create variants of columns you just deleted?




          ur.Columns("C:D").Delete
          Dim arrC As Variant: arrC = ur.Columns("C")
          Dim arrD As Variant: arrD = ur.Columns("D")



          If you need some empty variants sized, you can do that explicitly using the UBound of one of the other arrays. I know it seems excessive, but someone may expect there to be data in a variant that was set to a range, right?



          And I know you were going for flexible, but arrA arrB arrC arrD aren't doing me any favors. What are they? Is arrC a result of arrA? If there's a relationship, maybe make that clear. I know it's rough trying to stay generic and flexible, but I couldn't not say anything.



          Also, what if my UsedRange doesn't match your expectation? What if it's more than 4 columns, or less that 4 columns? What if it's in D:G - the A to D will look strange to me.




          Dictionaries



          Nice work with the dictionaries. I love using them for removing duplicates and I think you did a good job. However, maybe rename itm to key to be more clear. I also enjoy the late binding much more than early binding, but that's my opinion.




          Populating



          When you populate your arrays here




          For Each itm In dColA
          r = r + 1
          If Not dColB.Exists(itm) Then arrC(r, 1) = itm 'Find Col 1 in 2
          Next



          You're increasing the row and leaving it blank if the if condition isn't met. Is that on purpose, to leave blanks? But what happens with the count of the dictionary is less than the bounds of the array? Do you need to shift up or leave blanks? It's not explicitly stated what the goal is there, in the code. I think your best bet would to create a function with a name as to what it does, pass your dictionary to it and return the new variant. Your name of the function should be able to tell us a lot.




          Overall, I think this is a pretty solid procedure, but it could use some refactoring and a little bit more clarity and maybe some error handling.






          share|improve this answer





















          • Wow! Thanks for the quick response! To answer your point: Colons are a personal preference (I know they're not favored) but I use them extremely carefully for visual alignment - to me I Dim and I Set1 - done (I know where to look fast). 2. why are you using entire columns? - Where am I using entire columns (I'm using the entire columns of the UsedRange). 3. And why create variants of columns you just deleted those are the results columns - they need to be empty before every run. 4. arrA, arrB, arrC, arrD as parameters would become vArrColToCompare1, vArrColToCompare2, etc
            – paul bica
            Apr 16 at 7:02










          • The UsedRange is debatable: what if the user has empty cells .End(xlDown) will eliminate the rest of the cells bellow. This utility is not fully generic yet - I just wanted to see if I can optimize it first. Lastly: You're increasing the row and leaving it blank if the if condition isn't met - that's my intent - I meant to show the position where the value was not found in the other column; But what happens with the count of the dictionary is less than the bounds of the array? good point - I need to fix this... (I was moving between dictionaries and arrays quite a bit)
            – paul bica
            Apr 16 at 7:06











          • Can you think of any performance improvements at all?
            – paul bica
            Apr 16 at 7:08










          • I just meant if you're starting with usedrange then you know the last row. And it looks like you're using entire columns, I know you aren't. The only thing I might do differently is use one variant instead of 4, but I'm not sure that would be much of an improvement. Maybe there's a better algorithm, but I don't know it.
            – Raystafarian
            Apr 16 at 7:08















          up vote
          6
          down vote













          I'll be up-front, I detest colons in VBA. I just do. I feel like it doesn't make it any more clear and it seems more cluttered, especially to those who aren't familiar with them. So, that's my statement for that, and I won't include it in the review.




          Confusion



          That being said, why are you using entire columns? Because you've already defined the last row based on the UsedRange? It's a bit misleading, but that's okay. Maybe rename ur to be more clear. UserRange for instance. And why create variants of columns you just deleted?




          ur.Columns("C:D").Delete
          Dim arrC As Variant: arrC = ur.Columns("C")
          Dim arrD As Variant: arrD = ur.Columns("D")



          If you need some empty variants sized, you can do that explicitly using the UBound of one of the other arrays. I know it seems excessive, but someone may expect there to be data in a variant that was set to a range, right?



          And I know you were going for flexible, but arrA arrB arrC arrD aren't doing me any favors. What are they? Is arrC a result of arrA? If there's a relationship, maybe make that clear. I know it's rough trying to stay generic and flexible, but I couldn't not say anything.



          Also, what if my UsedRange doesn't match your expectation? What if it's more than 4 columns, or less that 4 columns? What if it's in D:G - the A to D will look strange to me.




          Dictionaries



          Nice work with the dictionaries. I love using them for removing duplicates and I think you did a good job. However, maybe rename itm to key to be more clear. I also enjoy the late binding much more than early binding, but that's my opinion.




          Populating



          When you populate your arrays here




          For Each itm In dColA
          r = r + 1
          If Not dColB.Exists(itm) Then arrC(r, 1) = itm 'Find Col 1 in 2
          Next



          You're increasing the row and leaving it blank if the if condition isn't met. Is that on purpose, to leave blanks? But what happens with the count of the dictionary is less than the bounds of the array? Do you need to shift up or leave blanks? It's not explicitly stated what the goal is there, in the code. I think your best bet would to create a function with a name as to what it does, pass your dictionary to it and return the new variant. Your name of the function should be able to tell us a lot.




          Overall, I think this is a pretty solid procedure, but it could use some refactoring and a little bit more clarity and maybe some error handling.






          share|improve this answer





















          • Wow! Thanks for the quick response! To answer your point: Colons are a personal preference (I know they're not favored) but I use them extremely carefully for visual alignment - to me I Dim and I Set1 - done (I know where to look fast). 2. why are you using entire columns? - Where am I using entire columns (I'm using the entire columns of the UsedRange). 3. And why create variants of columns you just deleted those are the results columns - they need to be empty before every run. 4. arrA, arrB, arrC, arrD as parameters would become vArrColToCompare1, vArrColToCompare2, etc
            – paul bica
            Apr 16 at 7:02










          • The UsedRange is debatable: what if the user has empty cells .End(xlDown) will eliminate the rest of the cells bellow. This utility is not fully generic yet - I just wanted to see if I can optimize it first. Lastly: You're increasing the row and leaving it blank if the if condition isn't met - that's my intent - I meant to show the position where the value was not found in the other column; But what happens with the count of the dictionary is less than the bounds of the array? good point - I need to fix this... (I was moving between dictionaries and arrays quite a bit)
            – paul bica
            Apr 16 at 7:06











          • Can you think of any performance improvements at all?
            – paul bica
            Apr 16 at 7:08










          • I just meant if you're starting with usedrange then you know the last row. And it looks like you're using entire columns, I know you aren't. The only thing I might do differently is use one variant instead of 4, but I'm not sure that would be much of an improvement. Maybe there's a better algorithm, but I don't know it.
            – Raystafarian
            Apr 16 at 7:08













          up vote
          6
          down vote










          up vote
          6
          down vote









          I'll be up-front, I detest colons in VBA. I just do. I feel like it doesn't make it any more clear and it seems more cluttered, especially to those who aren't familiar with them. So, that's my statement for that, and I won't include it in the review.




          Confusion



          That being said, why are you using entire columns? Because you've already defined the last row based on the UsedRange? It's a bit misleading, but that's okay. Maybe rename ur to be more clear. UserRange for instance. And why create variants of columns you just deleted?




          ur.Columns("C:D").Delete
          Dim arrC As Variant: arrC = ur.Columns("C")
          Dim arrD As Variant: arrD = ur.Columns("D")



          If you need some empty variants sized, you can do that explicitly using the UBound of one of the other arrays. I know it seems excessive, but someone may expect there to be data in a variant that was set to a range, right?



          And I know you were going for flexible, but arrA arrB arrC arrD aren't doing me any favors. What are they? Is arrC a result of arrA? If there's a relationship, maybe make that clear. I know it's rough trying to stay generic and flexible, but I couldn't not say anything.



          Also, what if my UsedRange doesn't match your expectation? What if it's more than 4 columns, or less that 4 columns? What if it's in D:G - the A to D will look strange to me.




          Dictionaries



          Nice work with the dictionaries. I love using them for removing duplicates and I think you did a good job. However, maybe rename itm to key to be more clear. I also enjoy the late binding much more than early binding, but that's my opinion.




          Populating



          When you populate your arrays here




          For Each itm In dColA
          r = r + 1
          If Not dColB.Exists(itm) Then arrC(r, 1) = itm 'Find Col 1 in 2
          Next



          You're increasing the row and leaving it blank if the if condition isn't met. Is that on purpose, to leave blanks? But what happens with the count of the dictionary is less than the bounds of the array? Do you need to shift up or leave blanks? It's not explicitly stated what the goal is there, in the code. I think your best bet would to create a function with a name as to what it does, pass your dictionary to it and return the new variant. Your name of the function should be able to tell us a lot.




          Overall, I think this is a pretty solid procedure, but it could use some refactoring and a little bit more clarity and maybe some error handling.






          share|improve this answer













          I'll be up-front, I detest colons in VBA. I just do. I feel like it doesn't make it any more clear and it seems more cluttered, especially to those who aren't familiar with them. So, that's my statement for that, and I won't include it in the review.




          Confusion



          That being said, why are you using entire columns? Because you've already defined the last row based on the UsedRange? It's a bit misleading, but that's okay. Maybe rename ur to be more clear. UserRange for instance. And why create variants of columns you just deleted?




          ur.Columns("C:D").Delete
          Dim arrC As Variant: arrC = ur.Columns("C")
          Dim arrD As Variant: arrD = ur.Columns("D")



          If you need some empty variants sized, you can do that explicitly using the UBound of one of the other arrays. I know it seems excessive, but someone may expect there to be data in a variant that was set to a range, right?



          And I know you were going for flexible, but arrA arrB arrC arrD aren't doing me any favors. What are they? Is arrC a result of arrA? If there's a relationship, maybe make that clear. I know it's rough trying to stay generic and flexible, but I couldn't not say anything.



          Also, what if my UsedRange doesn't match your expectation? What if it's more than 4 columns, or less that 4 columns? What if it's in D:G - the A to D will look strange to me.




          Dictionaries



          Nice work with the dictionaries. I love using them for removing duplicates and I think you did a good job. However, maybe rename itm to key to be more clear. I also enjoy the late binding much more than early binding, but that's my opinion.




          Populating



          When you populate your arrays here




          For Each itm In dColA
          r = r + 1
          If Not dColB.Exists(itm) Then arrC(r, 1) = itm 'Find Col 1 in 2
          Next



          You're increasing the row and leaving it blank if the if condition isn't met. Is that on purpose, to leave blanks? But what happens with the count of the dictionary is less than the bounds of the array? Do you need to shift up or leave blanks? It's not explicitly stated what the goal is there, in the code. I think your best bet would to create a function with a name as to what it does, pass your dictionary to it and return the new variant. Your name of the function should be able to tell us a lot.




          Overall, I think this is a pretty solid procedure, but it could use some refactoring and a little bit more clarity and maybe some error handling.







          share|improve this answer













          share|improve this answer



          share|improve this answer











          answered Apr 16 at 6:49









          Raystafarian

          5,4331046




          5,4331046











          • Wow! Thanks for the quick response! To answer your point: Colons are a personal preference (I know they're not favored) but I use them extremely carefully for visual alignment - to me I Dim and I Set1 - done (I know where to look fast). 2. why are you using entire columns? - Where am I using entire columns (I'm using the entire columns of the UsedRange). 3. And why create variants of columns you just deleted those are the results columns - they need to be empty before every run. 4. arrA, arrB, arrC, arrD as parameters would become vArrColToCompare1, vArrColToCompare2, etc
            – paul bica
            Apr 16 at 7:02










          • The UsedRange is debatable: what if the user has empty cells .End(xlDown) will eliminate the rest of the cells bellow. This utility is not fully generic yet - I just wanted to see if I can optimize it first. Lastly: You're increasing the row and leaving it blank if the if condition isn't met - that's my intent - I meant to show the position where the value was not found in the other column; But what happens with the count of the dictionary is less than the bounds of the array? good point - I need to fix this... (I was moving between dictionaries and arrays quite a bit)
            – paul bica
            Apr 16 at 7:06











          • Can you think of any performance improvements at all?
            – paul bica
            Apr 16 at 7:08










          • I just meant if you're starting with usedrange then you know the last row. And it looks like you're using entire columns, I know you aren't. The only thing I might do differently is use one variant instead of 4, but I'm not sure that would be much of an improvement. Maybe there's a better algorithm, but I don't know it.
            – Raystafarian
            Apr 16 at 7:08

















          • Wow! Thanks for the quick response! To answer your point: Colons are a personal preference (I know they're not favored) but I use them extremely carefully for visual alignment - to me I Dim and I Set1 - done (I know where to look fast). 2. why are you using entire columns? - Where am I using entire columns (I'm using the entire columns of the UsedRange). 3. And why create variants of columns you just deleted those are the results columns - they need to be empty before every run. 4. arrA, arrB, arrC, arrD as parameters would become vArrColToCompare1, vArrColToCompare2, etc
            – paul bica
            Apr 16 at 7:02










          • The UsedRange is debatable: what if the user has empty cells .End(xlDown) will eliminate the rest of the cells bellow. This utility is not fully generic yet - I just wanted to see if I can optimize it first. Lastly: You're increasing the row and leaving it blank if the if condition isn't met - that's my intent - I meant to show the position where the value was not found in the other column; But what happens with the count of the dictionary is less than the bounds of the array? good point - I need to fix this... (I was moving between dictionaries and arrays quite a bit)
            – paul bica
            Apr 16 at 7:06











          • Can you think of any performance improvements at all?
            – paul bica
            Apr 16 at 7:08










          • I just meant if you're starting with usedrange then you know the last row. And it looks like you're using entire columns, I know you aren't. The only thing I might do differently is use one variant instead of 4, but I'm not sure that would be much of an improvement. Maybe there's a better algorithm, but I don't know it.
            – Raystafarian
            Apr 16 at 7:08
















          Wow! Thanks for the quick response! To answer your point: Colons are a personal preference (I know they're not favored) but I use them extremely carefully for visual alignment - to me I Dim and I Set1 - done (I know where to look fast). 2. why are you using entire columns? - Where am I using entire columns (I'm using the entire columns of the UsedRange). 3. And why create variants of columns you just deleted those are the results columns - they need to be empty before every run. 4. arrA, arrB, arrC, arrD as parameters would become vArrColToCompare1, vArrColToCompare2, etc
          – paul bica
          Apr 16 at 7:02




          Wow! Thanks for the quick response! To answer your point: Colons are a personal preference (I know they're not favored) but I use them extremely carefully for visual alignment - to me I Dim and I Set1 - done (I know where to look fast). 2. why are you using entire columns? - Where am I using entire columns (I'm using the entire columns of the UsedRange). 3. And why create variants of columns you just deleted those are the results columns - they need to be empty before every run. 4. arrA, arrB, arrC, arrD as parameters would become vArrColToCompare1, vArrColToCompare2, etc
          – paul bica
          Apr 16 at 7:02












          The UsedRange is debatable: what if the user has empty cells .End(xlDown) will eliminate the rest of the cells bellow. This utility is not fully generic yet - I just wanted to see if I can optimize it first. Lastly: You're increasing the row and leaving it blank if the if condition isn't met - that's my intent - I meant to show the position where the value was not found in the other column; But what happens with the count of the dictionary is less than the bounds of the array? good point - I need to fix this... (I was moving between dictionaries and arrays quite a bit)
          – paul bica
          Apr 16 at 7:06





          The UsedRange is debatable: what if the user has empty cells .End(xlDown) will eliminate the rest of the cells bellow. This utility is not fully generic yet - I just wanted to see if I can optimize it first. Lastly: You're increasing the row and leaving it blank if the if condition isn't met - that's my intent - I meant to show the position where the value was not found in the other column; But what happens with the count of the dictionary is less than the bounds of the array? good point - I need to fix this... (I was moving between dictionaries and arrays quite a bit)
          – paul bica
          Apr 16 at 7:06













          Can you think of any performance improvements at all?
          – paul bica
          Apr 16 at 7:08




          Can you think of any performance improvements at all?
          – paul bica
          Apr 16 at 7:08












          I just meant if you're starting with usedrange then you know the last row. And it looks like you're using entire columns, I know you aren't. The only thing I might do differently is use one variant instead of 4, but I'm not sure that would be much of an improvement. Maybe there's a better algorithm, but I don't know it.
          – Raystafarian
          Apr 16 at 7:08





          I just meant if you're starting with usedrange then you know the last row. And it looks like you're using entire columns, I know you aren't. The only thing I might do differently is use one variant instead of 4, but I'm not sure that would be much of an improvement. Maybe there's a better algorithm, but I don't know it.
          – Raystafarian
          Apr 16 at 7:08











          up vote
          1
          down vote













          If you are looking for fast performance and you are only dealing with countable numbers (e.g. Longs), this approach would be considerably faster. My code below assumes you'd only be working with Longs.



          Basically, the way this works is enumerating all possible numbers you expect to see, then flips them to true when it's been seen. I do this over each range, then dump the results to Long arrays.



          This really only works if you know what the min/max range you'd be working in, so it has limited usefulness, however it might be a good fit for your specific use case.




          Option Explicit

          Public Sub Compare2NumColsUsingBoolArrays()
          Dim t As Double, tr As String: t = Timer
          Dim appFuncs As WorksheetFunction: Set appFuncs = Application.WorksheetFunction
          Dim ur As Range: Set ur = ActiveSheet.UsedRange: ur.Columns("C:D").Delete
          Set ur = ur.Resize(ur.Rows.Count - 1): Set ur = ur.Offset(1)

          Dim arr As Variant: arr = ur.Columns("A:D")
          Dim lbA As Long: lbA = appFuncs.Min(ur.Columns("A"))
          Dim ubA As Long: ubA = appFuncs.Max(ur.Columns("A"))
          Dim lbB As Long: lbB = appFuncs.Min(ur.Columns("B"))
          Dim ubB As Long: ubB = appFuncs.Max(ur.Columns("B"))
          Dim minN As Long: minN = IIf(lbA < lbB, lbA, lbB)
          Dim maxN As Long: maxN = IIf(ubA > ubB, ubA, ubB)
          Dim nArrA() As Boolean: ReDim nArrA(minN To maxN)
          Dim nArrB() As Boolean: ReDim nArrB(minN To maxN)

          Dim r As Long, rC As Long, rD As Long
          For r = 1 To UBound(arr) 'Arr index = value (true/false)
          nArrA(arr(r, 1)) = True
          nArrB(arr(r, 2)) = True
          Next

          For r = minN To maxN
          If nArrA(r) And Not nArrB(r) Then 'In ColA, not in ColB
          rC = rC + 1: arr(rC, 3) = r
          ElseIf Not nArrA(r) And nArrB(r) Then 'In ColB, not in ColA
          rD = rD + 1: arr(rD, 4) = r
          End If
          Next
          ur.Columns("A:D") = arr
          tr = "Compare2NumColsUsingBoolArrays() - Rows:" & Format(ur.Rows.Count, "#,###") & ";"
          Debug.Print tr & "Time: " & Format(Timer - t, "0.000") & " sec"
          End Sub



          .



          Test results



          Compare2ColsUsingCollections() - Rows: 1,048,575; Time: 32.563 sec
          Compare2NumColsUsingBoolArrays() - Rows: 1,048,575; Time: 3.139 sec



          Test results




          .



          Notes:



          • Adapted code to exclude Headers, and place results back in Col C and D

          • Test results don't show the positioning of missing elements





          share|improve this answer























          • At first I was a bit skeptic, but for strictly comparing Longs, this is an interesting approach, and also very fast - 3 seconds! You have my vote. I edited your answer to determine the lower and upper bounds of the boolean arrays dynamically (it accepts negatives as well), and place the results back on the Range. I also tested it with the same data set and included the result measurement
            – paul bica
            Apr 25 at 9:10











          • Hopefully it helps!
            – Ryan Wildry
            Apr 25 at 13:17














          up vote
          1
          down vote













          If you are looking for fast performance and you are only dealing with countable numbers (e.g. Longs), this approach would be considerably faster. My code below assumes you'd only be working with Longs.



          Basically, the way this works is enumerating all possible numbers you expect to see, then flips them to true when it's been seen. I do this over each range, then dump the results to Long arrays.



          This really only works if you know what the min/max range you'd be working in, so it has limited usefulness, however it might be a good fit for your specific use case.




          Option Explicit

          Public Sub Compare2NumColsUsingBoolArrays()
          Dim t As Double, tr As String: t = Timer
          Dim appFuncs As WorksheetFunction: Set appFuncs = Application.WorksheetFunction
          Dim ur As Range: Set ur = ActiveSheet.UsedRange: ur.Columns("C:D").Delete
          Set ur = ur.Resize(ur.Rows.Count - 1): Set ur = ur.Offset(1)

          Dim arr As Variant: arr = ur.Columns("A:D")
          Dim lbA As Long: lbA = appFuncs.Min(ur.Columns("A"))
          Dim ubA As Long: ubA = appFuncs.Max(ur.Columns("A"))
          Dim lbB As Long: lbB = appFuncs.Min(ur.Columns("B"))
          Dim ubB As Long: ubB = appFuncs.Max(ur.Columns("B"))
          Dim minN As Long: minN = IIf(lbA < lbB, lbA, lbB)
          Dim maxN As Long: maxN = IIf(ubA > ubB, ubA, ubB)
          Dim nArrA() As Boolean: ReDim nArrA(minN To maxN)
          Dim nArrB() As Boolean: ReDim nArrB(minN To maxN)

          Dim r As Long, rC As Long, rD As Long
          For r = 1 To UBound(arr) 'Arr index = value (true/false)
          nArrA(arr(r, 1)) = True
          nArrB(arr(r, 2)) = True
          Next

          For r = minN To maxN
          If nArrA(r) And Not nArrB(r) Then 'In ColA, not in ColB
          rC = rC + 1: arr(rC, 3) = r
          ElseIf Not nArrA(r) And nArrB(r) Then 'In ColB, not in ColA
          rD = rD + 1: arr(rD, 4) = r
          End If
          Next
          ur.Columns("A:D") = arr
          tr = "Compare2NumColsUsingBoolArrays() - Rows:" & Format(ur.Rows.Count, "#,###") & ";"
          Debug.Print tr & "Time: " & Format(Timer - t, "0.000") & " sec"
          End Sub



          .



          Test results



          Compare2ColsUsingCollections() - Rows: 1,048,575; Time: 32.563 sec
          Compare2NumColsUsingBoolArrays() - Rows: 1,048,575; Time: 3.139 sec



          Test results




          .



          Notes:



          • Adapted code to exclude Headers, and place results back in Col C and D

          • Test results don't show the positioning of missing elements





          share|improve this answer























          • At first I was a bit skeptic, but for strictly comparing Longs, this is an interesting approach, and also very fast - 3 seconds! You have my vote. I edited your answer to determine the lower and upper bounds of the boolean arrays dynamically (it accepts negatives as well), and place the results back on the Range. I also tested it with the same data set and included the result measurement
            – paul bica
            Apr 25 at 9:10











          • Hopefully it helps!
            – Ryan Wildry
            Apr 25 at 13:17












          up vote
          1
          down vote










          up vote
          1
          down vote









          If you are looking for fast performance and you are only dealing with countable numbers (e.g. Longs), this approach would be considerably faster. My code below assumes you'd only be working with Longs.



          Basically, the way this works is enumerating all possible numbers you expect to see, then flips them to true when it's been seen. I do this over each range, then dump the results to Long arrays.



          This really only works if you know what the min/max range you'd be working in, so it has limited usefulness, however it might be a good fit for your specific use case.




          Option Explicit

          Public Sub Compare2NumColsUsingBoolArrays()
          Dim t As Double, tr As String: t = Timer
          Dim appFuncs As WorksheetFunction: Set appFuncs = Application.WorksheetFunction
          Dim ur As Range: Set ur = ActiveSheet.UsedRange: ur.Columns("C:D").Delete
          Set ur = ur.Resize(ur.Rows.Count - 1): Set ur = ur.Offset(1)

          Dim arr As Variant: arr = ur.Columns("A:D")
          Dim lbA As Long: lbA = appFuncs.Min(ur.Columns("A"))
          Dim ubA As Long: ubA = appFuncs.Max(ur.Columns("A"))
          Dim lbB As Long: lbB = appFuncs.Min(ur.Columns("B"))
          Dim ubB As Long: ubB = appFuncs.Max(ur.Columns("B"))
          Dim minN As Long: minN = IIf(lbA < lbB, lbA, lbB)
          Dim maxN As Long: maxN = IIf(ubA > ubB, ubA, ubB)
          Dim nArrA() As Boolean: ReDim nArrA(minN To maxN)
          Dim nArrB() As Boolean: ReDim nArrB(minN To maxN)

          Dim r As Long, rC As Long, rD As Long
          For r = 1 To UBound(arr) 'Arr index = value (true/false)
          nArrA(arr(r, 1)) = True
          nArrB(arr(r, 2)) = True
          Next

          For r = minN To maxN
          If nArrA(r) And Not nArrB(r) Then 'In ColA, not in ColB
          rC = rC + 1: arr(rC, 3) = r
          ElseIf Not nArrA(r) And nArrB(r) Then 'In ColB, not in ColA
          rD = rD + 1: arr(rD, 4) = r
          End If
          Next
          ur.Columns("A:D") = arr
          tr = "Compare2NumColsUsingBoolArrays() - Rows:" & Format(ur.Rows.Count, "#,###") & ";"
          Debug.Print tr & "Time: " & Format(Timer - t, "0.000") & " sec"
          End Sub



          .



          Test results



          Compare2ColsUsingCollections() - Rows: 1,048,575; Time: 32.563 sec
          Compare2NumColsUsingBoolArrays() - Rows: 1,048,575; Time: 3.139 sec



          Test results




          .



          Notes:



          • Adapted code to exclude Headers, and place results back in Col C and D

          • Test results don't show the positioning of missing elements





          share|improve this answer















          If you are looking for fast performance and you are only dealing with countable numbers (e.g. Longs), this approach would be considerably faster. My code below assumes you'd only be working with Longs.



          Basically, the way this works is enumerating all possible numbers you expect to see, then flips them to true when it's been seen. I do this over each range, then dump the results to Long arrays.



          This really only works if you know what the min/max range you'd be working in, so it has limited usefulness, however it might be a good fit for your specific use case.




          Option Explicit

          Public Sub Compare2NumColsUsingBoolArrays()
          Dim t As Double, tr As String: t = Timer
          Dim appFuncs As WorksheetFunction: Set appFuncs = Application.WorksheetFunction
          Dim ur As Range: Set ur = ActiveSheet.UsedRange: ur.Columns("C:D").Delete
          Set ur = ur.Resize(ur.Rows.Count - 1): Set ur = ur.Offset(1)

          Dim arr As Variant: arr = ur.Columns("A:D")
          Dim lbA As Long: lbA = appFuncs.Min(ur.Columns("A"))
          Dim ubA As Long: ubA = appFuncs.Max(ur.Columns("A"))
          Dim lbB As Long: lbB = appFuncs.Min(ur.Columns("B"))
          Dim ubB As Long: ubB = appFuncs.Max(ur.Columns("B"))
          Dim minN As Long: minN = IIf(lbA < lbB, lbA, lbB)
          Dim maxN As Long: maxN = IIf(ubA > ubB, ubA, ubB)
          Dim nArrA() As Boolean: ReDim nArrA(minN To maxN)
          Dim nArrB() As Boolean: ReDim nArrB(minN To maxN)

          Dim r As Long, rC As Long, rD As Long
          For r = 1 To UBound(arr) 'Arr index = value (true/false)
          nArrA(arr(r, 1)) = True
          nArrB(arr(r, 2)) = True
          Next

          For r = minN To maxN
          If nArrA(r) And Not nArrB(r) Then 'In ColA, not in ColB
          rC = rC + 1: arr(rC, 3) = r
          ElseIf Not nArrA(r) And nArrB(r) Then 'In ColB, not in ColA
          rD = rD + 1: arr(rD, 4) = r
          End If
          Next
          ur.Columns("A:D") = arr
          tr = "Compare2NumColsUsingBoolArrays() - Rows:" & Format(ur.Rows.Count, "#,###") & ";"
          Debug.Print tr & "Time: " & Format(Timer - t, "0.000") & " sec"
          End Sub



          .



          Test results



          Compare2ColsUsingCollections() - Rows: 1,048,575; Time: 32.563 sec
          Compare2NumColsUsingBoolArrays() - Rows: 1,048,575; Time: 3.139 sec



          Test results




          .



          Notes:



          • Adapted code to exclude Headers, and place results back in Col C and D

          • Test results don't show the positioning of missing elements






          share|improve this answer















          share|improve this answer



          share|improve this answer








          edited Apr 25 at 8:40









          paul bica

          1,059613




          1,059613











          answered Apr 24 at 1:27









          Ryan Wildry

          211210




          211210











          • At first I was a bit skeptic, but for strictly comparing Longs, this is an interesting approach, and also very fast - 3 seconds! You have my vote. I edited your answer to determine the lower and upper bounds of the boolean arrays dynamically (it accepts negatives as well), and place the results back on the Range. I also tested it with the same data set and included the result measurement
            – paul bica
            Apr 25 at 9:10











          • Hopefully it helps!
            – Ryan Wildry
            Apr 25 at 13:17
















          • At first I was a bit skeptic, but for strictly comparing Longs, this is an interesting approach, and also very fast - 3 seconds! You have my vote. I edited your answer to determine the lower and upper bounds of the boolean arrays dynamically (it accepts negatives as well), and place the results back on the Range. I also tested it with the same data set and included the result measurement
            – paul bica
            Apr 25 at 9:10











          • Hopefully it helps!
            – Ryan Wildry
            Apr 25 at 13:17















          At first I was a bit skeptic, but for strictly comparing Longs, this is an interesting approach, and also very fast - 3 seconds! You have my vote. I edited your answer to determine the lower and upper bounds of the boolean arrays dynamically (it accepts negatives as well), and place the results back on the Range. I also tested it with the same data set and included the result measurement
          – paul bica
          Apr 25 at 9:10





          At first I was a bit skeptic, but for strictly comparing Longs, this is an interesting approach, and also very fast - 3 seconds! You have my vote. I edited your answer to determine the lower and upper bounds of the boolean arrays dynamically (it accepts negatives as well), and place the results back on the Range. I also tested it with the same data set and included the result measurement
          – paul bica
          Apr 25 at 9:10













          Hopefully it helps!
          – Ryan Wildry
          Apr 25 at 13:17




          Hopefully it helps!
          – Ryan Wildry
          Apr 25 at 13:17












           

          draft saved


          draft discarded


























           


          draft saved


          draft discarded














          StackExchange.ready(
          function ()
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f192166%2fsub-that-compares-2-columns-fast%23new-answer', 'question_page');

          );

          Post as a guest













































































          Popular posts from this blog

          Greedy Best First Search implementation in Rust

          Function to Return a JSON Like Objects Using VBA Collections and Arrays

          C++11 CLH Lock Implementation