Sub that compares 2 columns fast
Clash Royale CLAN TAG#URR8PPP
.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty margin-bottom:0;
up vote
4
down vote
favorite
This is an answer to this SO question: Vba comparing 2 columns of data.
In that question the OP code is very slow (also not working properly but the problem is fixed)
I spent some time to implement a version with decent performance at 1M rows+
Note: I know Activesheet
is discouraged (and I avoid it) but the Sub is quite generic (and it can be made even more so); the little utility can be convenient to users for quick checks - simply compare values on current sheet
Description
- Sets up 2 dictionaries (late-bound)
- Sets up the main UsedRange (ur) variable
- Deletes the results columns C and D
Setups 4 arrays
arrA
for the first column to be compared (Col A)arrB
for the second column to be compared (Col B)arrC
for the 1st result column C (from Col A, which shows vals not found in col B)arrD
for the 2nd result column D (from Col B, showing vals not found in col A)- (Each column can be a constant, or Optional Param with default vals)
Load data to dictionaries:
For
loop 1: adds all elements in Col A as keys, in dictionary dColAFor
loop 2: adds all elements in Col B as keys, in dictionary dColB
- Compare dictionary items
For
loop 3: for each itm in dColA checks their existence in dColB- If an item is not found, it places it in the 3th array (arrC)
For
loop 4: for each itm in dColB checks their existence in dColA- If an item is not found, it places it in the 4th array (arrD)
- At the end, it places both
arr3
andarr4
back on the range (Col C and D respectively)
Option Explicit
Public Sub Compare2Cols()
Dim dColA As Object: Set dColA = CreateObject("Scripting.Dictionary")
Dim dColB As Object: Set dColB = CreateObject("Scripting.Dictionary")
Dim ur As Range: Set ur = ActiveSheet.UsedRange
ur.Columns("C:D").Delete
Dim arrA As Variant: arrA = ur.Columns("A")
Dim arrB As Variant: arrB = ur.Columns("B")
Dim arrC As Variant: arrC = ur.Columns("C")
Dim arrD As Variant: arrD = ur.Columns("D")
Dim itm As Variant
For Each itm In arrA
dColA(itm) = 0
Next
For Each itm In arrB
dColB(itm) = 0
Next
Dim r As Long
For Each itm In dColA
r = r + 1
If Not dColB.Exists(itm) Then arrC(r, 1) = itm 'Find Col 1 in 2
Next
r = 0
For Each itm In dColB
r = r + 1
If Not dColA.Exists(itm) Then arrD(r, 1) = itm 'Find Col 2 in 1
Next
ur.Columns("C") = arrC
ur.Columns("D") = arrD
End Sub
Test results - dictionaries:
Compare2ColsUsingDictionaries() - Rows: 10,001; Time: 0.047 sec
Compare2ColsUsingDictionaries() - Rows: 20,001; Time: 0.109 sec
Compare2ColsUsingDictionaries() - Rows: 30,001; Time: 0.156 sec
Compare2ColsUsingDictionaries() - Rows: 40,001; Time: 0.234 sec
Compare2ColsUsingDictionaries() - Rows: 50,001; Time: 0.296 sec
Compare2ColsUsingDictionaries() - Rows: 100,001; Time: 1.232 sec
Compare2ColsUsingDictionaries() - Rows: 500,001; Time: 31.934 sec
Compare2ColsUsingDictionaries() - Rows: 1,048,576; Time: 126.797 sec
Test results - collections (suggested by @juvian, about 4 times faster for large data)
Compare2ColsUsingCollections() - Rows: 10,001; Time: 0.262 sec
Compare2ColsUsingCollections() - Rows: 20,001; Time: 0.539 sec
Compare2ColsUsingCollections() - Rows: 30,001; Time: 0.813 sec
Compare2ColsUsingCollections() - Rows: 40,001; Time: 1.109 sec
Compare2ColsUsingCollections() - Rows: 50,001; Time: 1.410 sec
Compare2ColsUsingCollections() - Rows: 100,001; Time: 2.945 sec
Compare2ColsUsingCollections() - Rows: 500,001; Time: 15.730 sec
Compare2ColsUsingCollections() - Rows: 1,048,576; Time: 33.719 sec
Image of test results
.
Other versions (combinations of reduced For
loops) I've already tried:
.
V2 (combined For 1 and 2) - significant increase in execution time from 2 min to 3
Public Sub Compare2ColsV2()
Dim dColA As Object: Set dColA = CreateObject("Scripting.Dictionary")
Dim dColB As Object: Set dColB = CreateObject("Scripting.Dictionary")
Dim ur As Range: Set ur = ActiveSheet.UsedRange
ur.Columns("C:D").Delete
Dim arrA As Variant: arrA = ur.Columns("A")
Dim arrB As Variant: arrB = ur.Columns("B")
Dim arrC As Variant: arrC = ur.Columns("C")
Dim arrD As Variant: arrD = ur.Columns("D")
Dim itm As Variant, r As Long
For r = 1 To UBound(arrA) 'based on UsedRange (same LastRow)
dColA(arrA(r, 1)) = 0
dColB(arrB(r, 1)) = 0
Next
r = 0
For Each itm In dColA
r = r + 1
If Not dColB.Exists(itm) Then arrC(r, 1) = itm 'Find Col 1 in 2
Next
r = 0
For Each itm In dColB
r = r + 1
If Not dColA.Exists(itm) Then arrD(r, 1) = itm 'Find Col 2 in 1
Next
ur.Columns("C") = arrC
ur.Columns("D") = arrD
End Sub
.
V3 (combined For 2 and 4) - No improvement (slower by about 3 secs);
Public Sub Compare2ColsV3()
Dim dColA As Object: Set dColA = CreateObject("Scripting.Dictionary")
Dim dColB As Object: Set dColB = CreateObject("Scripting.Dictionary")
Dim ur As Range: Set ur = ActiveSheet.UsedRange
ur.Columns("C:D").Delete
Dim arrA As Variant: arrA = ur.Columns("A")
Dim arrB As Variant: arrB = ur.Columns("B")
Dim arrC As Variant: arrC = ur.Columns("C")
Dim arrD As Variant: arrD = ur.Columns("D")
Dim itm As Variant, r As Long
For Each itm In arrA
dColA(itm) = 0
Next
For Each itm In arrB
dColB(itm) = 0
r = r + 1
If Not dColA.Exists(itm) Then arrD(r, 1) = itm 'Find Col 2 in 1
Next
r = 0
For Each itm In dColA
r = r + 1
If Not dColB.Exists(itm) Then arrC(r, 1) = itm 'Find Col 1 in 2
Next
ur.Columns("C") = arrC
ur.Columns("D") = arrD
End Sub
Any performance improvements or suggestions are appreciated
PS. I also tested 2 types of loops over variant arrays, using For r = 1 To UBound(arrA)
vs. For Each itm In arrA
with no other changes - there is no noticeable difference in speed between them
performance algorithm vba excel comparative-review
add a comment |Â
up vote
4
down vote
favorite
This is an answer to this SO question: Vba comparing 2 columns of data.
In that question the OP code is very slow (also not working properly but the problem is fixed)
I spent some time to implement a version with decent performance at 1M rows+
Note: I know Activesheet
is discouraged (and I avoid it) but the Sub is quite generic (and it can be made even more so); the little utility can be convenient to users for quick checks - simply compare values on current sheet
Description
- Sets up 2 dictionaries (late-bound)
- Sets up the main UsedRange (ur) variable
- Deletes the results columns C and D
Setups 4 arrays
arrA
for the first column to be compared (Col A)arrB
for the second column to be compared (Col B)arrC
for the 1st result column C (from Col A, which shows vals not found in col B)arrD
for the 2nd result column D (from Col B, showing vals not found in col A)- (Each column can be a constant, or Optional Param with default vals)
Load data to dictionaries:
For
loop 1: adds all elements in Col A as keys, in dictionary dColAFor
loop 2: adds all elements in Col B as keys, in dictionary dColB
- Compare dictionary items
For
loop 3: for each itm in dColA checks their existence in dColB- If an item is not found, it places it in the 3th array (arrC)
For
loop 4: for each itm in dColB checks their existence in dColA- If an item is not found, it places it in the 4th array (arrD)
- At the end, it places both
arr3
andarr4
back on the range (Col C and D respectively)
Option Explicit
Public Sub Compare2Cols()
Dim dColA As Object: Set dColA = CreateObject("Scripting.Dictionary")
Dim dColB As Object: Set dColB = CreateObject("Scripting.Dictionary")
Dim ur As Range: Set ur = ActiveSheet.UsedRange
ur.Columns("C:D").Delete
Dim arrA As Variant: arrA = ur.Columns("A")
Dim arrB As Variant: arrB = ur.Columns("B")
Dim arrC As Variant: arrC = ur.Columns("C")
Dim arrD As Variant: arrD = ur.Columns("D")
Dim itm As Variant
For Each itm In arrA
dColA(itm) = 0
Next
For Each itm In arrB
dColB(itm) = 0
Next
Dim r As Long
For Each itm In dColA
r = r + 1
If Not dColB.Exists(itm) Then arrC(r, 1) = itm 'Find Col 1 in 2
Next
r = 0
For Each itm In dColB
r = r + 1
If Not dColA.Exists(itm) Then arrD(r, 1) = itm 'Find Col 2 in 1
Next
ur.Columns("C") = arrC
ur.Columns("D") = arrD
End Sub
Test results - dictionaries:
Compare2ColsUsingDictionaries() - Rows: 10,001; Time: 0.047 sec
Compare2ColsUsingDictionaries() - Rows: 20,001; Time: 0.109 sec
Compare2ColsUsingDictionaries() - Rows: 30,001; Time: 0.156 sec
Compare2ColsUsingDictionaries() - Rows: 40,001; Time: 0.234 sec
Compare2ColsUsingDictionaries() - Rows: 50,001; Time: 0.296 sec
Compare2ColsUsingDictionaries() - Rows: 100,001; Time: 1.232 sec
Compare2ColsUsingDictionaries() - Rows: 500,001; Time: 31.934 sec
Compare2ColsUsingDictionaries() - Rows: 1,048,576; Time: 126.797 sec
Test results - collections (suggested by @juvian, about 4 times faster for large data)
Compare2ColsUsingCollections() - Rows: 10,001; Time: 0.262 sec
Compare2ColsUsingCollections() - Rows: 20,001; Time: 0.539 sec
Compare2ColsUsingCollections() - Rows: 30,001; Time: 0.813 sec
Compare2ColsUsingCollections() - Rows: 40,001; Time: 1.109 sec
Compare2ColsUsingCollections() - Rows: 50,001; Time: 1.410 sec
Compare2ColsUsingCollections() - Rows: 100,001; Time: 2.945 sec
Compare2ColsUsingCollections() - Rows: 500,001; Time: 15.730 sec
Compare2ColsUsingCollections() - Rows: 1,048,576; Time: 33.719 sec
Image of test results
.
Other versions (combinations of reduced For
loops) I've already tried:
.
V2 (combined For 1 and 2) - significant increase in execution time from 2 min to 3
Public Sub Compare2ColsV2()
Dim dColA As Object: Set dColA = CreateObject("Scripting.Dictionary")
Dim dColB As Object: Set dColB = CreateObject("Scripting.Dictionary")
Dim ur As Range: Set ur = ActiveSheet.UsedRange
ur.Columns("C:D").Delete
Dim arrA As Variant: arrA = ur.Columns("A")
Dim arrB As Variant: arrB = ur.Columns("B")
Dim arrC As Variant: arrC = ur.Columns("C")
Dim arrD As Variant: arrD = ur.Columns("D")
Dim itm As Variant, r As Long
For r = 1 To UBound(arrA) 'based on UsedRange (same LastRow)
dColA(arrA(r, 1)) = 0
dColB(arrB(r, 1)) = 0
Next
r = 0
For Each itm In dColA
r = r + 1
If Not dColB.Exists(itm) Then arrC(r, 1) = itm 'Find Col 1 in 2
Next
r = 0
For Each itm In dColB
r = r + 1
If Not dColA.Exists(itm) Then arrD(r, 1) = itm 'Find Col 2 in 1
Next
ur.Columns("C") = arrC
ur.Columns("D") = arrD
End Sub
.
V3 (combined For 2 and 4) - No improvement (slower by about 3 secs);
Public Sub Compare2ColsV3()
Dim dColA As Object: Set dColA = CreateObject("Scripting.Dictionary")
Dim dColB As Object: Set dColB = CreateObject("Scripting.Dictionary")
Dim ur As Range: Set ur = ActiveSheet.UsedRange
ur.Columns("C:D").Delete
Dim arrA As Variant: arrA = ur.Columns("A")
Dim arrB As Variant: arrB = ur.Columns("B")
Dim arrC As Variant: arrC = ur.Columns("C")
Dim arrD As Variant: arrD = ur.Columns("D")
Dim itm As Variant, r As Long
For Each itm In arrA
dColA(itm) = 0
Next
For Each itm In arrB
dColB(itm) = 0
r = r + 1
If Not dColA.Exists(itm) Then arrD(r, 1) = itm 'Find Col 2 in 1
Next
r = 0
For Each itm In dColA
r = r + 1
If Not dColB.Exists(itm) Then arrC(r, 1) = itm 'Find Col 1 in 2
Next
ur.Columns("C") = arrC
ur.Columns("D") = arrD
End Sub
Any performance improvements or suggestions are appreciated
PS. I also tested 2 types of loops over variant arrays, using For r = 1 To UBound(arrA)
vs. For Each itm In arrA
with no other changes - there is no noticeable difference in speed between them
performance algorithm vba excel comparative-review
add a comment |Â
up vote
4
down vote
favorite
up vote
4
down vote
favorite
This is an answer to this SO question: Vba comparing 2 columns of data.
In that question the OP code is very slow (also not working properly but the problem is fixed)
I spent some time to implement a version with decent performance at 1M rows+
Note: I know Activesheet
is discouraged (and I avoid it) but the Sub is quite generic (and it can be made even more so); the little utility can be convenient to users for quick checks - simply compare values on current sheet
Description
- Sets up 2 dictionaries (late-bound)
- Sets up the main UsedRange (ur) variable
- Deletes the results columns C and D
Setups 4 arrays
arrA
for the first column to be compared (Col A)arrB
for the second column to be compared (Col B)arrC
for the 1st result column C (from Col A, which shows vals not found in col B)arrD
for the 2nd result column D (from Col B, showing vals not found in col A)- (Each column can be a constant, or Optional Param with default vals)
Load data to dictionaries:
For
loop 1: adds all elements in Col A as keys, in dictionary dColAFor
loop 2: adds all elements in Col B as keys, in dictionary dColB
- Compare dictionary items
For
loop 3: for each itm in dColA checks their existence in dColB- If an item is not found, it places it in the 3th array (arrC)
For
loop 4: for each itm in dColB checks their existence in dColA- If an item is not found, it places it in the 4th array (arrD)
- At the end, it places both
arr3
andarr4
back on the range (Col C and D respectively)
Option Explicit
Public Sub Compare2Cols()
Dim dColA As Object: Set dColA = CreateObject("Scripting.Dictionary")
Dim dColB As Object: Set dColB = CreateObject("Scripting.Dictionary")
Dim ur As Range: Set ur = ActiveSheet.UsedRange
ur.Columns("C:D").Delete
Dim arrA As Variant: arrA = ur.Columns("A")
Dim arrB As Variant: arrB = ur.Columns("B")
Dim arrC As Variant: arrC = ur.Columns("C")
Dim arrD As Variant: arrD = ur.Columns("D")
Dim itm As Variant
For Each itm In arrA
dColA(itm) = 0
Next
For Each itm In arrB
dColB(itm) = 0
Next
Dim r As Long
For Each itm In dColA
r = r + 1
If Not dColB.Exists(itm) Then arrC(r, 1) = itm 'Find Col 1 in 2
Next
r = 0
For Each itm In dColB
r = r + 1
If Not dColA.Exists(itm) Then arrD(r, 1) = itm 'Find Col 2 in 1
Next
ur.Columns("C") = arrC
ur.Columns("D") = arrD
End Sub
Test results - dictionaries:
Compare2ColsUsingDictionaries() - Rows: 10,001; Time: 0.047 sec
Compare2ColsUsingDictionaries() - Rows: 20,001; Time: 0.109 sec
Compare2ColsUsingDictionaries() - Rows: 30,001; Time: 0.156 sec
Compare2ColsUsingDictionaries() - Rows: 40,001; Time: 0.234 sec
Compare2ColsUsingDictionaries() - Rows: 50,001; Time: 0.296 sec
Compare2ColsUsingDictionaries() - Rows: 100,001; Time: 1.232 sec
Compare2ColsUsingDictionaries() - Rows: 500,001; Time: 31.934 sec
Compare2ColsUsingDictionaries() - Rows: 1,048,576; Time: 126.797 sec
Test results - collections (suggested by @juvian, about 4 times faster for large data)
Compare2ColsUsingCollections() - Rows: 10,001; Time: 0.262 sec
Compare2ColsUsingCollections() - Rows: 20,001; Time: 0.539 sec
Compare2ColsUsingCollections() - Rows: 30,001; Time: 0.813 sec
Compare2ColsUsingCollections() - Rows: 40,001; Time: 1.109 sec
Compare2ColsUsingCollections() - Rows: 50,001; Time: 1.410 sec
Compare2ColsUsingCollections() - Rows: 100,001; Time: 2.945 sec
Compare2ColsUsingCollections() - Rows: 500,001; Time: 15.730 sec
Compare2ColsUsingCollections() - Rows: 1,048,576; Time: 33.719 sec
Image of test results
.
Other versions (combinations of reduced For
loops) I've already tried:
.
V2 (combined For 1 and 2) - significant increase in execution time from 2 min to 3
Public Sub Compare2ColsV2()
Dim dColA As Object: Set dColA = CreateObject("Scripting.Dictionary")
Dim dColB As Object: Set dColB = CreateObject("Scripting.Dictionary")
Dim ur As Range: Set ur = ActiveSheet.UsedRange
ur.Columns("C:D").Delete
Dim arrA As Variant: arrA = ur.Columns("A")
Dim arrB As Variant: arrB = ur.Columns("B")
Dim arrC As Variant: arrC = ur.Columns("C")
Dim arrD As Variant: arrD = ur.Columns("D")
Dim itm As Variant, r As Long
For r = 1 To UBound(arrA) 'based on UsedRange (same LastRow)
dColA(arrA(r, 1)) = 0
dColB(arrB(r, 1)) = 0
Next
r = 0
For Each itm In dColA
r = r + 1
If Not dColB.Exists(itm) Then arrC(r, 1) = itm 'Find Col 1 in 2
Next
r = 0
For Each itm In dColB
r = r + 1
If Not dColA.Exists(itm) Then arrD(r, 1) = itm 'Find Col 2 in 1
Next
ur.Columns("C") = arrC
ur.Columns("D") = arrD
End Sub
.
V3 (combined For 2 and 4) - No improvement (slower by about 3 secs);
Public Sub Compare2ColsV3()
Dim dColA As Object: Set dColA = CreateObject("Scripting.Dictionary")
Dim dColB As Object: Set dColB = CreateObject("Scripting.Dictionary")
Dim ur As Range: Set ur = ActiveSheet.UsedRange
ur.Columns("C:D").Delete
Dim arrA As Variant: arrA = ur.Columns("A")
Dim arrB As Variant: arrB = ur.Columns("B")
Dim arrC As Variant: arrC = ur.Columns("C")
Dim arrD As Variant: arrD = ur.Columns("D")
Dim itm As Variant, r As Long
For Each itm In arrA
dColA(itm) = 0
Next
For Each itm In arrB
dColB(itm) = 0
r = r + 1
If Not dColA.Exists(itm) Then arrD(r, 1) = itm 'Find Col 2 in 1
Next
r = 0
For Each itm In dColA
r = r + 1
If Not dColB.Exists(itm) Then arrC(r, 1) = itm 'Find Col 1 in 2
Next
ur.Columns("C") = arrC
ur.Columns("D") = arrD
End Sub
Any performance improvements or suggestions are appreciated
PS. I also tested 2 types of loops over variant arrays, using For r = 1 To UBound(arrA)
vs. For Each itm In arrA
with no other changes - there is no noticeable difference in speed between them
performance algorithm vba excel comparative-review
This is an answer to this SO question: Vba comparing 2 columns of data.
In that question the OP code is very slow (also not working properly but the problem is fixed)
I spent some time to implement a version with decent performance at 1M rows+
Note: I know Activesheet
is discouraged (and I avoid it) but the Sub is quite generic (and it can be made even more so); the little utility can be convenient to users for quick checks - simply compare values on current sheet
Description
- Sets up 2 dictionaries (late-bound)
- Sets up the main UsedRange (ur) variable
- Deletes the results columns C and D
Setups 4 arrays
arrA
for the first column to be compared (Col A)arrB
for the second column to be compared (Col B)arrC
for the 1st result column C (from Col A, which shows vals not found in col B)arrD
for the 2nd result column D (from Col B, showing vals not found in col A)- (Each column can be a constant, or Optional Param with default vals)
Load data to dictionaries:
For
loop 1: adds all elements in Col A as keys, in dictionary dColAFor
loop 2: adds all elements in Col B as keys, in dictionary dColB
- Compare dictionary items
For
loop 3: for each itm in dColA checks their existence in dColB- If an item is not found, it places it in the 3th array (arrC)
For
loop 4: for each itm in dColB checks their existence in dColA- If an item is not found, it places it in the 4th array (arrD)
- At the end, it places both
arr3
andarr4
back on the range (Col C and D respectively)
Option Explicit
Public Sub Compare2Cols()
Dim dColA As Object: Set dColA = CreateObject("Scripting.Dictionary")
Dim dColB As Object: Set dColB = CreateObject("Scripting.Dictionary")
Dim ur As Range: Set ur = ActiveSheet.UsedRange
ur.Columns("C:D").Delete
Dim arrA As Variant: arrA = ur.Columns("A")
Dim arrB As Variant: arrB = ur.Columns("B")
Dim arrC As Variant: arrC = ur.Columns("C")
Dim arrD As Variant: arrD = ur.Columns("D")
Dim itm As Variant
For Each itm In arrA
dColA(itm) = 0
Next
For Each itm In arrB
dColB(itm) = 0
Next
Dim r As Long
For Each itm In dColA
r = r + 1
If Not dColB.Exists(itm) Then arrC(r, 1) = itm 'Find Col 1 in 2
Next
r = 0
For Each itm In dColB
r = r + 1
If Not dColA.Exists(itm) Then arrD(r, 1) = itm 'Find Col 2 in 1
Next
ur.Columns("C") = arrC
ur.Columns("D") = arrD
End Sub
Test results - dictionaries:
Compare2ColsUsingDictionaries() - Rows: 10,001; Time: 0.047 sec
Compare2ColsUsingDictionaries() - Rows: 20,001; Time: 0.109 sec
Compare2ColsUsingDictionaries() - Rows: 30,001; Time: 0.156 sec
Compare2ColsUsingDictionaries() - Rows: 40,001; Time: 0.234 sec
Compare2ColsUsingDictionaries() - Rows: 50,001; Time: 0.296 sec
Compare2ColsUsingDictionaries() - Rows: 100,001; Time: 1.232 sec
Compare2ColsUsingDictionaries() - Rows: 500,001; Time: 31.934 sec
Compare2ColsUsingDictionaries() - Rows: 1,048,576; Time: 126.797 sec
Test results - collections (suggested by @juvian, about 4 times faster for large data)
Compare2ColsUsingCollections() - Rows: 10,001; Time: 0.262 sec
Compare2ColsUsingCollections() - Rows: 20,001; Time: 0.539 sec
Compare2ColsUsingCollections() - Rows: 30,001; Time: 0.813 sec
Compare2ColsUsingCollections() - Rows: 40,001; Time: 1.109 sec
Compare2ColsUsingCollections() - Rows: 50,001; Time: 1.410 sec
Compare2ColsUsingCollections() - Rows: 100,001; Time: 2.945 sec
Compare2ColsUsingCollections() - Rows: 500,001; Time: 15.730 sec
Compare2ColsUsingCollections() - Rows: 1,048,576; Time: 33.719 sec
Image of test results
.
Other versions (combinations of reduced For
loops) I've already tried:
.
V2 (combined For 1 and 2) - significant increase in execution time from 2 min to 3
Public Sub Compare2ColsV2()
Dim dColA As Object: Set dColA = CreateObject("Scripting.Dictionary")
Dim dColB As Object: Set dColB = CreateObject("Scripting.Dictionary")
Dim ur As Range: Set ur = ActiveSheet.UsedRange
ur.Columns("C:D").Delete
Dim arrA As Variant: arrA = ur.Columns("A")
Dim arrB As Variant: arrB = ur.Columns("B")
Dim arrC As Variant: arrC = ur.Columns("C")
Dim arrD As Variant: arrD = ur.Columns("D")
Dim itm As Variant, r As Long
For r = 1 To UBound(arrA) 'based on UsedRange (same LastRow)
dColA(arrA(r, 1)) = 0
dColB(arrB(r, 1)) = 0
Next
r = 0
For Each itm In dColA
r = r + 1
If Not dColB.Exists(itm) Then arrC(r, 1) = itm 'Find Col 1 in 2
Next
r = 0
For Each itm In dColB
r = r + 1
If Not dColA.Exists(itm) Then arrD(r, 1) = itm 'Find Col 2 in 1
Next
ur.Columns("C") = arrC
ur.Columns("D") = arrD
End Sub
.
V3 (combined For 2 and 4) - No improvement (slower by about 3 secs);
Public Sub Compare2ColsV3()
Dim dColA As Object: Set dColA = CreateObject("Scripting.Dictionary")
Dim dColB As Object: Set dColB = CreateObject("Scripting.Dictionary")
Dim ur As Range: Set ur = ActiveSheet.UsedRange
ur.Columns("C:D").Delete
Dim arrA As Variant: arrA = ur.Columns("A")
Dim arrB As Variant: arrB = ur.Columns("B")
Dim arrC As Variant: arrC = ur.Columns("C")
Dim arrD As Variant: arrD = ur.Columns("D")
Dim itm As Variant, r As Long
For Each itm In arrA
dColA(itm) = 0
Next
For Each itm In arrB
dColB(itm) = 0
r = r + 1
If Not dColA.Exists(itm) Then arrD(r, 1) = itm 'Find Col 2 in 1
Next
r = 0
For Each itm In dColA
r = r + 1
If Not dColB.Exists(itm) Then arrC(r, 1) = itm 'Find Col 1 in 2
Next
ur.Columns("C") = arrC
ur.Columns("D") = arrD
End Sub
Any performance improvements or suggestions are appreciated
PS. I also tested 2 types of loops over variant arrays, using For r = 1 To UBound(arrA)
vs. For Each itm In arrA
with no other changes - there is no noticeable difference in speed between them
performance algorithm vba excel comparative-review
edited Apr 18 at 0:36
asked Apr 16 at 6:27
paul bica
1,059613
1,059613
add a comment |Â
add a comment |Â
3 Answers
3
active
oldest
votes
up vote
3
down vote
accepted
Itôs worth a shot trying with Collections instead of Dictionary. Sounds counter intuitive but have read in several places that for big data it is in fact faster.
One of them is : Count distinct or unique values
It is true that collections donôt have an exists method, but you can take advantage that it throws an error if you try to access with a key that does not exist and make your own method: Check if key exists in collection
Another place worth checking is Get unique values from array
Implementation (pb)
Option Explicit
Public Sub Compare2ColsUsingCollections()
Dim t As Double, tr As String: t = Timer
Dim dColA As Collection: Set dColA = New Collection
Dim dColB As Collection: Set dColB = New Collection
Dim ur As Range: Set ur = ActiveSheet.UsedRange
ur.Columns("C:D").Delete
Dim arrA As Variant: arrA = ur.Columns("A")
Dim arrB As Variant: arrB = ur.Columns("B")
Dim arrC As Variant: arrC = ur.Columns("C")
Dim arrD As Variant: arrD = ur.Columns("D")
Dim itm As Variant, r As Long, s As String
For Each itm In arrA
s = CStr(itm)
If IsMissing(dColA, s) Then dColA.Add s, s
Next
For Each itm In arrB
s = CStr(itm)
If IsMissing(dColB, s) Then dColB.Add s, s
Next
For Each itm In dColA
r = r + 1
If IsMissing(dColB, itm) Then arrC(r, 1) = itm 'Find Col 1 in 2
Next
r = 0
For Each itm In dColB
r = r + 1
If IsMissing(dColA, itm) Then arrD(r, 1) = itm 'Find Col 1 in 2
Next
ur.Columns("C") = arrC: ur.Columns("D") = arrD
tr = "Compare2ColsUsingCollections() - Rows: " & Format(ur.Rows.Count, "#,###") & "; "
Debug.Print tr & "Time: " & Format(Timer - t, "0.000") & " sec"
End Sub
Private Function IsMissing(ByRef c As Collection, ByVal fld As String) As Boolean
On Error GoTo IsMissingError
Dim v As Variant
v = c(fld)
Exit Function
IsMissingError:
IsMissing = True
End Function
Test results
Compare2ColsUsingCollections() - Rows: 1,048,576; Time: 33.770 sec
Compare2ColsUsingCollections() - Rows: 500,001; Time: 15.730 sec
Compare2ColsUsingCollections() - Rows: 100,001; Time: 2.945 sec
Compare2ColsUsingCollections() - Rows: 50,001; Time: 1.410 sec
Compare2ColsUsingCollections() - Rows: 40,001; Time: 1.109 sec
Compare2ColsUsingCollections() - Rows: 30,001; Time: 0.813 sec
Compare2ColsUsingCollections() - Rows: 20,001; Time: 0.539 sec
Compare2ColsUsingCollections() - Rows: 10,001; Time: 0.262 sec
Same results with same data:
Dictionaries
Collections
Thanks juvian, this actually addresses the main performance issue - it's 4 times faster than dictionaries, for large data!
â paul bica
Apr 16 at 20:18
@paulbica do you plan to make it generic or only for integers?
â juvian
Apr 16 at 20:31
Generic, but if you have any other suggestions for improving it specifically for Longs, you can add them to your answer as well (doesn't hurt)
â paul bica
Apr 16 at 20:33
Great! I'll leave the question open for a few days, in case someone else can improve the performance (unlikely), so I'll probably accept it. Thanks again!
â paul bica
Apr 16 at 21:00
@paul Integers - integers are obsolete. According to msdn VBA silently converts all integers tolong
.
â Raystafarian
Apr 16 at 21:27
 |Â
show 1 more comment
up vote
6
down vote
I'll be up-front, I detest colons in VBA. I just do. I feel like it doesn't make it any more clear and it seems more cluttered, especially to those who aren't familiar with them. So, that's my statement for that, and I won't include it in the review.
Confusion
That being said, why are you using entire columns? Because you've already defined the last row based on the UsedRange
? It's a bit misleading, but that's okay. Maybe rename ur
to be more clear. UserRange
for instance. And why create variants of columns you just deleted?
ur.Columns("C:D").Delete
Dim arrC As Variant: arrC = ur.Columns("C")
Dim arrD As Variant: arrD = ur.Columns("D")
If you need some empty variants sized, you can do that explicitly using the UBound
of one of the other arrays. I know it seems excessive, but someone may expect there to be data in a variant that was set to a range, right?
And I know you were going for flexible, but arrA arrB arrC arrD
aren't doing me any favors. What are they? Is arrC
a result of arrA
? If there's a relationship, maybe make that clear. I know it's rough trying to stay generic and flexible, but I couldn't not say anything.
Also, what if my UsedRange
doesn't match your expectation? What if it's more than 4 columns, or less that 4 columns? What if it's in D:G
- the A
to D
will look strange to me.
Dictionaries
Nice work with the dictionaries. I love using them for removing duplicates and I think you did a good job. However, maybe rename itm
to key
to be more clear. I also enjoy the late binding much more than early binding, but that's my opinion.
Populating
When you populate your arrays here
For Each itm In dColA
r = r + 1
If Not dColB.Exists(itm) Then arrC(r, 1) = itm 'Find Col 1 in 2
Next
You're increasing the row and leaving it blank if the if
condition isn't met. Is that on purpose, to leave blanks? But what happens with the count of the dictionary is less than the bounds of the array? Do you need to shift up or leave blanks? It's not explicitly stated what the goal is there, in the code. I think your best bet would to create a function with a name as to what it does, pass your dictionary to it and return the new variant. Your name of the function should be able to tell us a lot.
Overall, I think this is a pretty solid procedure, but it could use some refactoring and a little bit more clarity and maybe some error handling.
Wow! Thanks for the quick response! To answer your point: Colons are a personal preference (I know they're not favored) but I use them extremely carefully for visual alignment - to me IDim
and ISet1
- done (I know where to look fast). 2. why are you using entire columns? - Where am I using entire columns (I'm using the entire columns of theUsedRange
). 3. And why create variants of columns you just deleted those are the results columns - they need to be empty before every run. 4.arrA
,arrB
,arrC
,arrD
as parameters would becomevArrColToCompare1
,vArrColToCompare2
, etc
â paul bica
Apr 16 at 7:02
TheUsedRange
is debatable: what if the user has empty cells.End(xlDown)
will eliminate the rest of the cells bellow. This utility is not fully generic yet - I just wanted to see if I can optimize it first. Lastly: You're increasing the row and leaving it blank if the if condition isn't met - that's my intent - I meant to show the position where the value was not found in the other column; But what happens with the count of the dictionary is less than the bounds of the array? good point - I need to fix this... (I was moving between dictionaries and arrays quite a bit)
â paul bica
Apr 16 at 7:06
Can you think of any performance improvements at all?
â paul bica
Apr 16 at 7:08
I just meant if you're starting withusedrange
then you know thelast row
. And it looks like you're using entire columns, I know you aren't. The only thing I might do differently is use one variant instead of 4, but I'm not sure that would be much of an improvement. Maybe there's a better algorithm, but I don't know it.
â Raystafarian
Apr 16 at 7:08
add a comment |Â
up vote
1
down vote
If you are looking for fast performance and you are only dealing with countable numbers (e.g. Longs), this approach would be considerably faster. My code below assumes you'd only be working with Longs
.
Basically, the way this works is enumerating all possible numbers you expect to see, then flips them to true when it's been seen. I do this over each range, then dump the results to Long
arrays.
This really only works if you know what the min/max range you'd be working in, so it has limited usefulness, however it might be a good fit for your specific use case.
Option Explicit
Public Sub Compare2NumColsUsingBoolArrays()
Dim t As Double, tr As String: t = Timer
Dim appFuncs As WorksheetFunction: Set appFuncs = Application.WorksheetFunction
Dim ur As Range: Set ur = ActiveSheet.UsedRange: ur.Columns("C:D").Delete
Set ur = ur.Resize(ur.Rows.Count - 1): Set ur = ur.Offset(1)
Dim arr As Variant: arr = ur.Columns("A:D")
Dim lbA As Long: lbA = appFuncs.Min(ur.Columns("A"))
Dim ubA As Long: ubA = appFuncs.Max(ur.Columns("A"))
Dim lbB As Long: lbB = appFuncs.Min(ur.Columns("B"))
Dim ubB As Long: ubB = appFuncs.Max(ur.Columns("B"))
Dim minN As Long: minN = IIf(lbA < lbB, lbA, lbB)
Dim maxN As Long: maxN = IIf(ubA > ubB, ubA, ubB)
Dim nArrA() As Boolean: ReDim nArrA(minN To maxN)
Dim nArrB() As Boolean: ReDim nArrB(minN To maxN)
Dim r As Long, rC As Long, rD As Long
For r = 1 To UBound(arr) 'Arr index = value (true/false)
nArrA(arr(r, 1)) = True
nArrB(arr(r, 2)) = True
Next
For r = minN To maxN
If nArrA(r) And Not nArrB(r) Then 'In ColA, not in ColB
rC = rC + 1: arr(rC, 3) = r
ElseIf Not nArrA(r) And nArrB(r) Then 'In ColB, not in ColA
rD = rD + 1: arr(rD, 4) = r
End If
Next
ur.Columns("A:D") = arr
tr = "Compare2NumColsUsingBoolArrays() - Rows:" & Format(ur.Rows.Count, "#,###") & ";"
Debug.Print tr & "Time: " & Format(Timer - t, "0.000") & " sec"
End Sub
.
Test results
Compare2ColsUsingCollections() - Rows: 1,048,575; Time: 32.563 sec
Compare2NumColsUsingBoolArrays() - Rows: 1,048,575; Time: 3.139 sec
.
Notes:
- Adapted code to exclude Headers, and place results back in Col C and D
- Test results don't show the positioning of missing elements
At first I was a bit skeptic, but for strictly comparing Longs, this is an interesting approach, and also very fast - 3 seconds! You have my vote. I edited your answer to determine the lower and upper bounds of the boolean arrays dynamically (it accepts negatives as well), and place the results back on the Range. I also tested it with the same data set and included the result measurement
â paul bica
Apr 25 at 9:10
Hopefully it helps!
â Ryan Wildry
Apr 25 at 13:17
add a comment |Â
3 Answers
3
active
oldest
votes
3 Answers
3
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
3
down vote
accepted
Itôs worth a shot trying with Collections instead of Dictionary. Sounds counter intuitive but have read in several places that for big data it is in fact faster.
One of them is : Count distinct or unique values
It is true that collections donôt have an exists method, but you can take advantage that it throws an error if you try to access with a key that does not exist and make your own method: Check if key exists in collection
Another place worth checking is Get unique values from array
Implementation (pb)
Option Explicit
Public Sub Compare2ColsUsingCollections()
Dim t As Double, tr As String: t = Timer
Dim dColA As Collection: Set dColA = New Collection
Dim dColB As Collection: Set dColB = New Collection
Dim ur As Range: Set ur = ActiveSheet.UsedRange
ur.Columns("C:D").Delete
Dim arrA As Variant: arrA = ur.Columns("A")
Dim arrB As Variant: arrB = ur.Columns("B")
Dim arrC As Variant: arrC = ur.Columns("C")
Dim arrD As Variant: arrD = ur.Columns("D")
Dim itm As Variant, r As Long, s As String
For Each itm In arrA
s = CStr(itm)
If IsMissing(dColA, s) Then dColA.Add s, s
Next
For Each itm In arrB
s = CStr(itm)
If IsMissing(dColB, s) Then dColB.Add s, s
Next
For Each itm In dColA
r = r + 1
If IsMissing(dColB, itm) Then arrC(r, 1) = itm 'Find Col 1 in 2
Next
r = 0
For Each itm In dColB
r = r + 1
If IsMissing(dColA, itm) Then arrD(r, 1) = itm 'Find Col 1 in 2
Next
ur.Columns("C") = arrC: ur.Columns("D") = arrD
tr = "Compare2ColsUsingCollections() - Rows: " & Format(ur.Rows.Count, "#,###") & "; "
Debug.Print tr & "Time: " & Format(Timer - t, "0.000") & " sec"
End Sub
Private Function IsMissing(ByRef c As Collection, ByVal fld As String) As Boolean
On Error GoTo IsMissingError
Dim v As Variant
v = c(fld)
Exit Function
IsMissingError:
IsMissing = True
End Function
Test results
Compare2ColsUsingCollections() - Rows: 1,048,576; Time: 33.770 sec
Compare2ColsUsingCollections() - Rows: 500,001; Time: 15.730 sec
Compare2ColsUsingCollections() - Rows: 100,001; Time: 2.945 sec
Compare2ColsUsingCollections() - Rows: 50,001; Time: 1.410 sec
Compare2ColsUsingCollections() - Rows: 40,001; Time: 1.109 sec
Compare2ColsUsingCollections() - Rows: 30,001; Time: 0.813 sec
Compare2ColsUsingCollections() - Rows: 20,001; Time: 0.539 sec
Compare2ColsUsingCollections() - Rows: 10,001; Time: 0.262 sec
Same results with same data:
Dictionaries
Collections
Thanks juvian, this actually addresses the main performance issue - it's 4 times faster than dictionaries, for large data!
â paul bica
Apr 16 at 20:18
@paulbica do you plan to make it generic or only for integers?
â juvian
Apr 16 at 20:31
Generic, but if you have any other suggestions for improving it specifically for Longs, you can add them to your answer as well (doesn't hurt)
â paul bica
Apr 16 at 20:33
Great! I'll leave the question open for a few days, in case someone else can improve the performance (unlikely), so I'll probably accept it. Thanks again!
â paul bica
Apr 16 at 21:00
@paul Integers - integers are obsolete. According to msdn VBA silently converts all integers tolong
.
â Raystafarian
Apr 16 at 21:27
 |Â
show 1 more comment
up vote
3
down vote
accepted
Itôs worth a shot trying with Collections instead of Dictionary. Sounds counter intuitive but have read in several places that for big data it is in fact faster.
One of them is : Count distinct or unique values
It is true that collections donôt have an exists method, but you can take advantage that it throws an error if you try to access with a key that does not exist and make your own method: Check if key exists in collection
Another place worth checking is Get unique values from array
Implementation (pb)
Option Explicit
Public Sub Compare2ColsUsingCollections()
Dim t As Double, tr As String: t = Timer
Dim dColA As Collection: Set dColA = New Collection
Dim dColB As Collection: Set dColB = New Collection
Dim ur As Range: Set ur = ActiveSheet.UsedRange
ur.Columns("C:D").Delete
Dim arrA As Variant: arrA = ur.Columns("A")
Dim arrB As Variant: arrB = ur.Columns("B")
Dim arrC As Variant: arrC = ur.Columns("C")
Dim arrD As Variant: arrD = ur.Columns("D")
Dim itm As Variant, r As Long, s As String
For Each itm In arrA
s = CStr(itm)
If IsMissing(dColA, s) Then dColA.Add s, s
Next
For Each itm In arrB
s = CStr(itm)
If IsMissing(dColB, s) Then dColB.Add s, s
Next
For Each itm In dColA
r = r + 1
If IsMissing(dColB, itm) Then arrC(r, 1) = itm 'Find Col 1 in 2
Next
r = 0
For Each itm In dColB
r = r + 1
If IsMissing(dColA, itm) Then arrD(r, 1) = itm 'Find Col 1 in 2
Next
ur.Columns("C") = arrC: ur.Columns("D") = arrD
tr = "Compare2ColsUsingCollections() - Rows: " & Format(ur.Rows.Count, "#,###") & "; "
Debug.Print tr & "Time: " & Format(Timer - t, "0.000") & " sec"
End Sub
Private Function IsMissing(ByRef c As Collection, ByVal fld As String) As Boolean
On Error GoTo IsMissingError
Dim v As Variant
v = c(fld)
Exit Function
IsMissingError:
IsMissing = True
End Function
Test results
Compare2ColsUsingCollections() - Rows: 1,048,576; Time: 33.770 sec
Compare2ColsUsingCollections() - Rows: 500,001; Time: 15.730 sec
Compare2ColsUsingCollections() - Rows: 100,001; Time: 2.945 sec
Compare2ColsUsingCollections() - Rows: 50,001; Time: 1.410 sec
Compare2ColsUsingCollections() - Rows: 40,001; Time: 1.109 sec
Compare2ColsUsingCollections() - Rows: 30,001; Time: 0.813 sec
Compare2ColsUsingCollections() - Rows: 20,001; Time: 0.539 sec
Compare2ColsUsingCollections() - Rows: 10,001; Time: 0.262 sec
Same results with same data:
Dictionaries
Collections
Thanks juvian, this actually addresses the main performance issue - it's 4 times faster than dictionaries, for large data!
â paul bica
Apr 16 at 20:18
@paulbica do you plan to make it generic or only for integers?
â juvian
Apr 16 at 20:31
Generic, but if you have any other suggestions for improving it specifically for Longs, you can add them to your answer as well (doesn't hurt)
â paul bica
Apr 16 at 20:33
Great! I'll leave the question open for a few days, in case someone else can improve the performance (unlikely), so I'll probably accept it. Thanks again!
â paul bica
Apr 16 at 21:00
@paul Integers - integers are obsolete. According to msdn VBA silently converts all integers tolong
.
â Raystafarian
Apr 16 at 21:27
 |Â
show 1 more comment
up vote
3
down vote
accepted
up vote
3
down vote
accepted
Itôs worth a shot trying with Collections instead of Dictionary. Sounds counter intuitive but have read in several places that for big data it is in fact faster.
One of them is : Count distinct or unique values
It is true that collections donôt have an exists method, but you can take advantage that it throws an error if you try to access with a key that does not exist and make your own method: Check if key exists in collection
Another place worth checking is Get unique values from array
Implementation (pb)
Option Explicit
Public Sub Compare2ColsUsingCollections()
Dim t As Double, tr As String: t = Timer
Dim dColA As Collection: Set dColA = New Collection
Dim dColB As Collection: Set dColB = New Collection
Dim ur As Range: Set ur = ActiveSheet.UsedRange
ur.Columns("C:D").Delete
Dim arrA As Variant: arrA = ur.Columns("A")
Dim arrB As Variant: arrB = ur.Columns("B")
Dim arrC As Variant: arrC = ur.Columns("C")
Dim arrD As Variant: arrD = ur.Columns("D")
Dim itm As Variant, r As Long, s As String
For Each itm In arrA
s = CStr(itm)
If IsMissing(dColA, s) Then dColA.Add s, s
Next
For Each itm In arrB
s = CStr(itm)
If IsMissing(dColB, s) Then dColB.Add s, s
Next
For Each itm In dColA
r = r + 1
If IsMissing(dColB, itm) Then arrC(r, 1) = itm 'Find Col 1 in 2
Next
r = 0
For Each itm In dColB
r = r + 1
If IsMissing(dColA, itm) Then arrD(r, 1) = itm 'Find Col 1 in 2
Next
ur.Columns("C") = arrC: ur.Columns("D") = arrD
tr = "Compare2ColsUsingCollections() - Rows: " & Format(ur.Rows.Count, "#,###") & "; "
Debug.Print tr & "Time: " & Format(Timer - t, "0.000") & " sec"
End Sub
Private Function IsMissing(ByRef c As Collection, ByVal fld As String) As Boolean
On Error GoTo IsMissingError
Dim v As Variant
v = c(fld)
Exit Function
IsMissingError:
IsMissing = True
End Function
Test results
Compare2ColsUsingCollections() - Rows: 1,048,576; Time: 33.770 sec
Compare2ColsUsingCollections() - Rows: 500,001; Time: 15.730 sec
Compare2ColsUsingCollections() - Rows: 100,001; Time: 2.945 sec
Compare2ColsUsingCollections() - Rows: 50,001; Time: 1.410 sec
Compare2ColsUsingCollections() - Rows: 40,001; Time: 1.109 sec
Compare2ColsUsingCollections() - Rows: 30,001; Time: 0.813 sec
Compare2ColsUsingCollections() - Rows: 20,001; Time: 0.539 sec
Compare2ColsUsingCollections() - Rows: 10,001; Time: 0.262 sec
Same results with same data:
Dictionaries
Collections
Itôs worth a shot trying with Collections instead of Dictionary. Sounds counter intuitive but have read in several places that for big data it is in fact faster.
One of them is : Count distinct or unique values
It is true that collections donôt have an exists method, but you can take advantage that it throws an error if you try to access with a key that does not exist and make your own method: Check if key exists in collection
Another place worth checking is Get unique values from array
Implementation (pb)
Option Explicit
Public Sub Compare2ColsUsingCollections()
Dim t As Double, tr As String: t = Timer
Dim dColA As Collection: Set dColA = New Collection
Dim dColB As Collection: Set dColB = New Collection
Dim ur As Range: Set ur = ActiveSheet.UsedRange
ur.Columns("C:D").Delete
Dim arrA As Variant: arrA = ur.Columns("A")
Dim arrB As Variant: arrB = ur.Columns("B")
Dim arrC As Variant: arrC = ur.Columns("C")
Dim arrD As Variant: arrD = ur.Columns("D")
Dim itm As Variant, r As Long, s As String
For Each itm In arrA
s = CStr(itm)
If IsMissing(dColA, s) Then dColA.Add s, s
Next
For Each itm In arrB
s = CStr(itm)
If IsMissing(dColB, s) Then dColB.Add s, s
Next
For Each itm In dColA
r = r + 1
If IsMissing(dColB, itm) Then arrC(r, 1) = itm 'Find Col 1 in 2
Next
r = 0
For Each itm In dColB
r = r + 1
If IsMissing(dColA, itm) Then arrD(r, 1) = itm 'Find Col 1 in 2
Next
ur.Columns("C") = arrC: ur.Columns("D") = arrD
tr = "Compare2ColsUsingCollections() - Rows: " & Format(ur.Rows.Count, "#,###") & "; "
Debug.Print tr & "Time: " & Format(Timer - t, "0.000") & " sec"
End Sub
Private Function IsMissing(ByRef c As Collection, ByVal fld As String) As Boolean
On Error GoTo IsMissingError
Dim v As Variant
v = c(fld)
Exit Function
IsMissingError:
IsMissing = True
End Function
Test results
Compare2ColsUsingCollections() - Rows: 1,048,576; Time: 33.770 sec
Compare2ColsUsingCollections() - Rows: 500,001; Time: 15.730 sec
Compare2ColsUsingCollections() - Rows: 100,001; Time: 2.945 sec
Compare2ColsUsingCollections() - Rows: 50,001; Time: 1.410 sec
Compare2ColsUsingCollections() - Rows: 40,001; Time: 1.109 sec
Compare2ColsUsingCollections() - Rows: 30,001; Time: 0.813 sec
Compare2ColsUsingCollections() - Rows: 20,001; Time: 0.539 sec
Compare2ColsUsingCollections() - Rows: 10,001; Time: 0.262 sec
Same results with same data:
Dictionaries
Collections
edited Apr 16 at 20:54
answered Apr 16 at 19:58
juvian
85838
85838
Thanks juvian, this actually addresses the main performance issue - it's 4 times faster than dictionaries, for large data!
â paul bica
Apr 16 at 20:18
@paulbica do you plan to make it generic or only for integers?
â juvian
Apr 16 at 20:31
Generic, but if you have any other suggestions for improving it specifically for Longs, you can add them to your answer as well (doesn't hurt)
â paul bica
Apr 16 at 20:33
Great! I'll leave the question open for a few days, in case someone else can improve the performance (unlikely), so I'll probably accept it. Thanks again!
â paul bica
Apr 16 at 21:00
@paul Integers - integers are obsolete. According to msdn VBA silently converts all integers tolong
.
â Raystafarian
Apr 16 at 21:27
 |Â
show 1 more comment
Thanks juvian, this actually addresses the main performance issue - it's 4 times faster than dictionaries, for large data!
â paul bica
Apr 16 at 20:18
@paulbica do you plan to make it generic or only for integers?
â juvian
Apr 16 at 20:31
Generic, but if you have any other suggestions for improving it specifically for Longs, you can add them to your answer as well (doesn't hurt)
â paul bica
Apr 16 at 20:33
Great! I'll leave the question open for a few days, in case someone else can improve the performance (unlikely), so I'll probably accept it. Thanks again!
â paul bica
Apr 16 at 21:00
@paul Integers - integers are obsolete. According to msdn VBA silently converts all integers tolong
.
â Raystafarian
Apr 16 at 21:27
Thanks juvian, this actually addresses the main performance issue - it's 4 times faster than dictionaries, for large data!
â paul bica
Apr 16 at 20:18
Thanks juvian, this actually addresses the main performance issue - it's 4 times faster than dictionaries, for large data!
â paul bica
Apr 16 at 20:18
@paulbica do you plan to make it generic or only for integers?
â juvian
Apr 16 at 20:31
@paulbica do you plan to make it generic or only for integers?
â juvian
Apr 16 at 20:31
Generic, but if you have any other suggestions for improving it specifically for Longs, you can add them to your answer as well (doesn't hurt)
â paul bica
Apr 16 at 20:33
Generic, but if you have any other suggestions for improving it specifically for Longs, you can add them to your answer as well (doesn't hurt)
â paul bica
Apr 16 at 20:33
Great! I'll leave the question open for a few days, in case someone else can improve the performance (unlikely), so I'll probably accept it. Thanks again!
â paul bica
Apr 16 at 21:00
Great! I'll leave the question open for a few days, in case someone else can improve the performance (unlikely), so I'll probably accept it. Thanks again!
â paul bica
Apr 16 at 21:00
@paul Integers - integers are obsolete. According to msdn VBA silently converts all integers to
long
.â Raystafarian
Apr 16 at 21:27
@paul Integers - integers are obsolete. According to msdn VBA silently converts all integers to
long
.â Raystafarian
Apr 16 at 21:27
 |Â
show 1 more comment
up vote
6
down vote
I'll be up-front, I detest colons in VBA. I just do. I feel like it doesn't make it any more clear and it seems more cluttered, especially to those who aren't familiar with them. So, that's my statement for that, and I won't include it in the review.
Confusion
That being said, why are you using entire columns? Because you've already defined the last row based on the UsedRange
? It's a bit misleading, but that's okay. Maybe rename ur
to be more clear. UserRange
for instance. And why create variants of columns you just deleted?
ur.Columns("C:D").Delete
Dim arrC As Variant: arrC = ur.Columns("C")
Dim arrD As Variant: arrD = ur.Columns("D")
If you need some empty variants sized, you can do that explicitly using the UBound
of one of the other arrays. I know it seems excessive, but someone may expect there to be data in a variant that was set to a range, right?
And I know you were going for flexible, but arrA arrB arrC arrD
aren't doing me any favors. What are they? Is arrC
a result of arrA
? If there's a relationship, maybe make that clear. I know it's rough trying to stay generic and flexible, but I couldn't not say anything.
Also, what if my UsedRange
doesn't match your expectation? What if it's more than 4 columns, or less that 4 columns? What if it's in D:G
- the A
to D
will look strange to me.
Dictionaries
Nice work with the dictionaries. I love using them for removing duplicates and I think you did a good job. However, maybe rename itm
to key
to be more clear. I also enjoy the late binding much more than early binding, but that's my opinion.
Populating
When you populate your arrays here
For Each itm In dColA
r = r + 1
If Not dColB.Exists(itm) Then arrC(r, 1) = itm 'Find Col 1 in 2
Next
You're increasing the row and leaving it blank if the if
condition isn't met. Is that on purpose, to leave blanks? But what happens with the count of the dictionary is less than the bounds of the array? Do you need to shift up or leave blanks? It's not explicitly stated what the goal is there, in the code. I think your best bet would to create a function with a name as to what it does, pass your dictionary to it and return the new variant. Your name of the function should be able to tell us a lot.
Overall, I think this is a pretty solid procedure, but it could use some refactoring and a little bit more clarity and maybe some error handling.
Wow! Thanks for the quick response! To answer your point: Colons are a personal preference (I know they're not favored) but I use them extremely carefully for visual alignment - to me IDim
and ISet1
- done (I know where to look fast). 2. why are you using entire columns? - Where am I using entire columns (I'm using the entire columns of theUsedRange
). 3. And why create variants of columns you just deleted those are the results columns - they need to be empty before every run. 4.arrA
,arrB
,arrC
,arrD
as parameters would becomevArrColToCompare1
,vArrColToCompare2
, etc
â paul bica
Apr 16 at 7:02
TheUsedRange
is debatable: what if the user has empty cells.End(xlDown)
will eliminate the rest of the cells bellow. This utility is not fully generic yet - I just wanted to see if I can optimize it first. Lastly: You're increasing the row and leaving it blank if the if condition isn't met - that's my intent - I meant to show the position where the value was not found in the other column; But what happens with the count of the dictionary is less than the bounds of the array? good point - I need to fix this... (I was moving between dictionaries and arrays quite a bit)
â paul bica
Apr 16 at 7:06
Can you think of any performance improvements at all?
â paul bica
Apr 16 at 7:08
I just meant if you're starting withusedrange
then you know thelast row
. And it looks like you're using entire columns, I know you aren't. The only thing I might do differently is use one variant instead of 4, but I'm not sure that would be much of an improvement. Maybe there's a better algorithm, but I don't know it.
â Raystafarian
Apr 16 at 7:08
add a comment |Â
up vote
6
down vote
I'll be up-front, I detest colons in VBA. I just do. I feel like it doesn't make it any more clear and it seems more cluttered, especially to those who aren't familiar with them. So, that's my statement for that, and I won't include it in the review.
Confusion
That being said, why are you using entire columns? Because you've already defined the last row based on the UsedRange
? It's a bit misleading, but that's okay. Maybe rename ur
to be more clear. UserRange
for instance. And why create variants of columns you just deleted?
ur.Columns("C:D").Delete
Dim arrC As Variant: arrC = ur.Columns("C")
Dim arrD As Variant: arrD = ur.Columns("D")
If you need some empty variants sized, you can do that explicitly using the UBound
of one of the other arrays. I know it seems excessive, but someone may expect there to be data in a variant that was set to a range, right?
And I know you were going for flexible, but arrA arrB arrC arrD
aren't doing me any favors. What are they? Is arrC
a result of arrA
? If there's a relationship, maybe make that clear. I know it's rough trying to stay generic and flexible, but I couldn't not say anything.
Also, what if my UsedRange
doesn't match your expectation? What if it's more than 4 columns, or less that 4 columns? What if it's in D:G
- the A
to D
will look strange to me.
Dictionaries
Nice work with the dictionaries. I love using them for removing duplicates and I think you did a good job. However, maybe rename itm
to key
to be more clear. I also enjoy the late binding much more than early binding, but that's my opinion.
Populating
When you populate your arrays here
For Each itm In dColA
r = r + 1
If Not dColB.Exists(itm) Then arrC(r, 1) = itm 'Find Col 1 in 2
Next
You're increasing the row and leaving it blank if the if
condition isn't met. Is that on purpose, to leave blanks? But what happens with the count of the dictionary is less than the bounds of the array? Do you need to shift up or leave blanks? It's not explicitly stated what the goal is there, in the code. I think your best bet would to create a function with a name as to what it does, pass your dictionary to it and return the new variant. Your name of the function should be able to tell us a lot.
Overall, I think this is a pretty solid procedure, but it could use some refactoring and a little bit more clarity and maybe some error handling.
Wow! Thanks for the quick response! To answer your point: Colons are a personal preference (I know they're not favored) but I use them extremely carefully for visual alignment - to me IDim
and ISet1
- done (I know where to look fast). 2. why are you using entire columns? - Where am I using entire columns (I'm using the entire columns of theUsedRange
). 3. And why create variants of columns you just deleted those are the results columns - they need to be empty before every run. 4.arrA
,arrB
,arrC
,arrD
as parameters would becomevArrColToCompare1
,vArrColToCompare2
, etc
â paul bica
Apr 16 at 7:02
TheUsedRange
is debatable: what if the user has empty cells.End(xlDown)
will eliminate the rest of the cells bellow. This utility is not fully generic yet - I just wanted to see if I can optimize it first. Lastly: You're increasing the row and leaving it blank if the if condition isn't met - that's my intent - I meant to show the position where the value was not found in the other column; But what happens with the count of the dictionary is less than the bounds of the array? good point - I need to fix this... (I was moving between dictionaries and arrays quite a bit)
â paul bica
Apr 16 at 7:06
Can you think of any performance improvements at all?
â paul bica
Apr 16 at 7:08
I just meant if you're starting withusedrange
then you know thelast row
. And it looks like you're using entire columns, I know you aren't. The only thing I might do differently is use one variant instead of 4, but I'm not sure that would be much of an improvement. Maybe there's a better algorithm, but I don't know it.
â Raystafarian
Apr 16 at 7:08
add a comment |Â
up vote
6
down vote
up vote
6
down vote
I'll be up-front, I detest colons in VBA. I just do. I feel like it doesn't make it any more clear and it seems more cluttered, especially to those who aren't familiar with them. So, that's my statement for that, and I won't include it in the review.
Confusion
That being said, why are you using entire columns? Because you've already defined the last row based on the UsedRange
? It's a bit misleading, but that's okay. Maybe rename ur
to be more clear. UserRange
for instance. And why create variants of columns you just deleted?
ur.Columns("C:D").Delete
Dim arrC As Variant: arrC = ur.Columns("C")
Dim arrD As Variant: arrD = ur.Columns("D")
If you need some empty variants sized, you can do that explicitly using the UBound
of one of the other arrays. I know it seems excessive, but someone may expect there to be data in a variant that was set to a range, right?
And I know you were going for flexible, but arrA arrB arrC arrD
aren't doing me any favors. What are they? Is arrC
a result of arrA
? If there's a relationship, maybe make that clear. I know it's rough trying to stay generic and flexible, but I couldn't not say anything.
Also, what if my UsedRange
doesn't match your expectation? What if it's more than 4 columns, or less that 4 columns? What if it's in D:G
- the A
to D
will look strange to me.
Dictionaries
Nice work with the dictionaries. I love using them for removing duplicates and I think you did a good job. However, maybe rename itm
to key
to be more clear. I also enjoy the late binding much more than early binding, but that's my opinion.
Populating
When you populate your arrays here
For Each itm In dColA
r = r + 1
If Not dColB.Exists(itm) Then arrC(r, 1) = itm 'Find Col 1 in 2
Next
You're increasing the row and leaving it blank if the if
condition isn't met. Is that on purpose, to leave blanks? But what happens with the count of the dictionary is less than the bounds of the array? Do you need to shift up or leave blanks? It's not explicitly stated what the goal is there, in the code. I think your best bet would to create a function with a name as to what it does, pass your dictionary to it and return the new variant. Your name of the function should be able to tell us a lot.
Overall, I think this is a pretty solid procedure, but it could use some refactoring and a little bit more clarity and maybe some error handling.
I'll be up-front, I detest colons in VBA. I just do. I feel like it doesn't make it any more clear and it seems more cluttered, especially to those who aren't familiar with them. So, that's my statement for that, and I won't include it in the review.
Confusion
That being said, why are you using entire columns? Because you've already defined the last row based on the UsedRange
? It's a bit misleading, but that's okay. Maybe rename ur
to be more clear. UserRange
for instance. And why create variants of columns you just deleted?
ur.Columns("C:D").Delete
Dim arrC As Variant: arrC = ur.Columns("C")
Dim arrD As Variant: arrD = ur.Columns("D")
If you need some empty variants sized, you can do that explicitly using the UBound
of one of the other arrays. I know it seems excessive, but someone may expect there to be data in a variant that was set to a range, right?
And I know you were going for flexible, but arrA arrB arrC arrD
aren't doing me any favors. What are they? Is arrC
a result of arrA
? If there's a relationship, maybe make that clear. I know it's rough trying to stay generic and flexible, but I couldn't not say anything.
Also, what if my UsedRange
doesn't match your expectation? What if it's more than 4 columns, or less that 4 columns? What if it's in D:G
- the A
to D
will look strange to me.
Dictionaries
Nice work with the dictionaries. I love using them for removing duplicates and I think you did a good job. However, maybe rename itm
to key
to be more clear. I also enjoy the late binding much more than early binding, but that's my opinion.
Populating
When you populate your arrays here
For Each itm In dColA
r = r + 1
If Not dColB.Exists(itm) Then arrC(r, 1) = itm 'Find Col 1 in 2
Next
You're increasing the row and leaving it blank if the if
condition isn't met. Is that on purpose, to leave blanks? But what happens with the count of the dictionary is less than the bounds of the array? Do you need to shift up or leave blanks? It's not explicitly stated what the goal is there, in the code. I think your best bet would to create a function with a name as to what it does, pass your dictionary to it and return the new variant. Your name of the function should be able to tell us a lot.
Overall, I think this is a pretty solid procedure, but it could use some refactoring and a little bit more clarity and maybe some error handling.
answered Apr 16 at 6:49
Raystafarian
5,4331046
5,4331046
Wow! Thanks for the quick response! To answer your point: Colons are a personal preference (I know they're not favored) but I use them extremely carefully for visual alignment - to me IDim
and ISet1
- done (I know where to look fast). 2. why are you using entire columns? - Where am I using entire columns (I'm using the entire columns of theUsedRange
). 3. And why create variants of columns you just deleted those are the results columns - they need to be empty before every run. 4.arrA
,arrB
,arrC
,arrD
as parameters would becomevArrColToCompare1
,vArrColToCompare2
, etc
â paul bica
Apr 16 at 7:02
TheUsedRange
is debatable: what if the user has empty cells.End(xlDown)
will eliminate the rest of the cells bellow. This utility is not fully generic yet - I just wanted to see if I can optimize it first. Lastly: You're increasing the row and leaving it blank if the if condition isn't met - that's my intent - I meant to show the position where the value was not found in the other column; But what happens with the count of the dictionary is less than the bounds of the array? good point - I need to fix this... (I was moving between dictionaries and arrays quite a bit)
â paul bica
Apr 16 at 7:06
Can you think of any performance improvements at all?
â paul bica
Apr 16 at 7:08
I just meant if you're starting withusedrange
then you know thelast row
. And it looks like you're using entire columns, I know you aren't. The only thing I might do differently is use one variant instead of 4, but I'm not sure that would be much of an improvement. Maybe there's a better algorithm, but I don't know it.
â Raystafarian
Apr 16 at 7:08
add a comment |Â
Wow! Thanks for the quick response! To answer your point: Colons are a personal preference (I know they're not favored) but I use them extremely carefully for visual alignment - to me IDim
and ISet1
- done (I know where to look fast). 2. why are you using entire columns? - Where am I using entire columns (I'm using the entire columns of theUsedRange
). 3. And why create variants of columns you just deleted those are the results columns - they need to be empty before every run. 4.arrA
,arrB
,arrC
,arrD
as parameters would becomevArrColToCompare1
,vArrColToCompare2
, etc
â paul bica
Apr 16 at 7:02
TheUsedRange
is debatable: what if the user has empty cells.End(xlDown)
will eliminate the rest of the cells bellow. This utility is not fully generic yet - I just wanted to see if I can optimize it first. Lastly: You're increasing the row and leaving it blank if the if condition isn't met - that's my intent - I meant to show the position where the value was not found in the other column; But what happens with the count of the dictionary is less than the bounds of the array? good point - I need to fix this... (I was moving between dictionaries and arrays quite a bit)
â paul bica
Apr 16 at 7:06
Can you think of any performance improvements at all?
â paul bica
Apr 16 at 7:08
I just meant if you're starting withusedrange
then you know thelast row
. And it looks like you're using entire columns, I know you aren't. The only thing I might do differently is use one variant instead of 4, but I'm not sure that would be much of an improvement. Maybe there's a better algorithm, but I don't know it.
â Raystafarian
Apr 16 at 7:08
Wow! Thanks for the quick response! To answer your point: Colons are a personal preference (I know they're not favored) but I use them extremely carefully for visual alignment - to me I
Dim
and I Set1
- done (I know where to look fast). 2. why are you using entire columns? - Where am I using entire columns (I'm using the entire columns of the UsedRange
). 3. And why create variants of columns you just deleted those are the results columns - they need to be empty before every run. 4. arrA
, arrB
, arrC
, arrD
as parameters would become vArrColToCompare1
, vArrColToCompare2
, etcâ paul bica
Apr 16 at 7:02
Wow! Thanks for the quick response! To answer your point: Colons are a personal preference (I know they're not favored) but I use them extremely carefully for visual alignment - to me I
Dim
and I Set1
- done (I know where to look fast). 2. why are you using entire columns? - Where am I using entire columns (I'm using the entire columns of the UsedRange
). 3. And why create variants of columns you just deleted those are the results columns - they need to be empty before every run. 4. arrA
, arrB
, arrC
, arrD
as parameters would become vArrColToCompare1
, vArrColToCompare2
, etcâ paul bica
Apr 16 at 7:02
The
UsedRange
is debatable: what if the user has empty cells .End(xlDown)
will eliminate the rest of the cells bellow. This utility is not fully generic yet - I just wanted to see if I can optimize it first. Lastly: You're increasing the row and leaving it blank if the if condition isn't met - that's my intent - I meant to show the position where the value was not found in the other column; But what happens with the count of the dictionary is less than the bounds of the array? good point - I need to fix this... (I was moving between dictionaries and arrays quite a bit)â paul bica
Apr 16 at 7:06
The
UsedRange
is debatable: what if the user has empty cells .End(xlDown)
will eliminate the rest of the cells bellow. This utility is not fully generic yet - I just wanted to see if I can optimize it first. Lastly: You're increasing the row and leaving it blank if the if condition isn't met - that's my intent - I meant to show the position where the value was not found in the other column; But what happens with the count of the dictionary is less than the bounds of the array? good point - I need to fix this... (I was moving between dictionaries and arrays quite a bit)â paul bica
Apr 16 at 7:06
Can you think of any performance improvements at all?
â paul bica
Apr 16 at 7:08
Can you think of any performance improvements at all?
â paul bica
Apr 16 at 7:08
I just meant if you're starting with
usedrange
then you know the last row
. And it looks like you're using entire columns, I know you aren't. The only thing I might do differently is use one variant instead of 4, but I'm not sure that would be much of an improvement. Maybe there's a better algorithm, but I don't know it.â Raystafarian
Apr 16 at 7:08
I just meant if you're starting with
usedrange
then you know the last row
. And it looks like you're using entire columns, I know you aren't. The only thing I might do differently is use one variant instead of 4, but I'm not sure that would be much of an improvement. Maybe there's a better algorithm, but I don't know it.â Raystafarian
Apr 16 at 7:08
add a comment |Â
up vote
1
down vote
If you are looking for fast performance and you are only dealing with countable numbers (e.g. Longs), this approach would be considerably faster. My code below assumes you'd only be working with Longs
.
Basically, the way this works is enumerating all possible numbers you expect to see, then flips them to true when it's been seen. I do this over each range, then dump the results to Long
arrays.
This really only works if you know what the min/max range you'd be working in, so it has limited usefulness, however it might be a good fit for your specific use case.
Option Explicit
Public Sub Compare2NumColsUsingBoolArrays()
Dim t As Double, tr As String: t = Timer
Dim appFuncs As WorksheetFunction: Set appFuncs = Application.WorksheetFunction
Dim ur As Range: Set ur = ActiveSheet.UsedRange: ur.Columns("C:D").Delete
Set ur = ur.Resize(ur.Rows.Count - 1): Set ur = ur.Offset(1)
Dim arr As Variant: arr = ur.Columns("A:D")
Dim lbA As Long: lbA = appFuncs.Min(ur.Columns("A"))
Dim ubA As Long: ubA = appFuncs.Max(ur.Columns("A"))
Dim lbB As Long: lbB = appFuncs.Min(ur.Columns("B"))
Dim ubB As Long: ubB = appFuncs.Max(ur.Columns("B"))
Dim minN As Long: minN = IIf(lbA < lbB, lbA, lbB)
Dim maxN As Long: maxN = IIf(ubA > ubB, ubA, ubB)
Dim nArrA() As Boolean: ReDim nArrA(minN To maxN)
Dim nArrB() As Boolean: ReDim nArrB(minN To maxN)
Dim r As Long, rC As Long, rD As Long
For r = 1 To UBound(arr) 'Arr index = value (true/false)
nArrA(arr(r, 1)) = True
nArrB(arr(r, 2)) = True
Next
For r = minN To maxN
If nArrA(r) And Not nArrB(r) Then 'In ColA, not in ColB
rC = rC + 1: arr(rC, 3) = r
ElseIf Not nArrA(r) And nArrB(r) Then 'In ColB, not in ColA
rD = rD + 1: arr(rD, 4) = r
End If
Next
ur.Columns("A:D") = arr
tr = "Compare2NumColsUsingBoolArrays() - Rows:" & Format(ur.Rows.Count, "#,###") & ";"
Debug.Print tr & "Time: " & Format(Timer - t, "0.000") & " sec"
End Sub
.
Test results
Compare2ColsUsingCollections() - Rows: 1,048,575; Time: 32.563 sec
Compare2NumColsUsingBoolArrays() - Rows: 1,048,575; Time: 3.139 sec
.
Notes:
- Adapted code to exclude Headers, and place results back in Col C and D
- Test results don't show the positioning of missing elements
At first I was a bit skeptic, but for strictly comparing Longs, this is an interesting approach, and also very fast - 3 seconds! You have my vote. I edited your answer to determine the lower and upper bounds of the boolean arrays dynamically (it accepts negatives as well), and place the results back on the Range. I also tested it with the same data set and included the result measurement
â paul bica
Apr 25 at 9:10
Hopefully it helps!
â Ryan Wildry
Apr 25 at 13:17
add a comment |Â
up vote
1
down vote
If you are looking for fast performance and you are only dealing with countable numbers (e.g. Longs), this approach would be considerably faster. My code below assumes you'd only be working with Longs
.
Basically, the way this works is enumerating all possible numbers you expect to see, then flips them to true when it's been seen. I do this over each range, then dump the results to Long
arrays.
This really only works if you know what the min/max range you'd be working in, so it has limited usefulness, however it might be a good fit for your specific use case.
Option Explicit
Public Sub Compare2NumColsUsingBoolArrays()
Dim t As Double, tr As String: t = Timer
Dim appFuncs As WorksheetFunction: Set appFuncs = Application.WorksheetFunction
Dim ur As Range: Set ur = ActiveSheet.UsedRange: ur.Columns("C:D").Delete
Set ur = ur.Resize(ur.Rows.Count - 1): Set ur = ur.Offset(1)
Dim arr As Variant: arr = ur.Columns("A:D")
Dim lbA As Long: lbA = appFuncs.Min(ur.Columns("A"))
Dim ubA As Long: ubA = appFuncs.Max(ur.Columns("A"))
Dim lbB As Long: lbB = appFuncs.Min(ur.Columns("B"))
Dim ubB As Long: ubB = appFuncs.Max(ur.Columns("B"))
Dim minN As Long: minN = IIf(lbA < lbB, lbA, lbB)
Dim maxN As Long: maxN = IIf(ubA > ubB, ubA, ubB)
Dim nArrA() As Boolean: ReDim nArrA(minN To maxN)
Dim nArrB() As Boolean: ReDim nArrB(minN To maxN)
Dim r As Long, rC As Long, rD As Long
For r = 1 To UBound(arr) 'Arr index = value (true/false)
nArrA(arr(r, 1)) = True
nArrB(arr(r, 2)) = True
Next
For r = minN To maxN
If nArrA(r) And Not nArrB(r) Then 'In ColA, not in ColB
rC = rC + 1: arr(rC, 3) = r
ElseIf Not nArrA(r) And nArrB(r) Then 'In ColB, not in ColA
rD = rD + 1: arr(rD, 4) = r
End If
Next
ur.Columns("A:D") = arr
tr = "Compare2NumColsUsingBoolArrays() - Rows:" & Format(ur.Rows.Count, "#,###") & ";"
Debug.Print tr & "Time: " & Format(Timer - t, "0.000") & " sec"
End Sub
.
Test results
Compare2ColsUsingCollections() - Rows: 1,048,575; Time: 32.563 sec
Compare2NumColsUsingBoolArrays() - Rows: 1,048,575; Time: 3.139 sec
.
Notes:
- Adapted code to exclude Headers, and place results back in Col C and D
- Test results don't show the positioning of missing elements
At first I was a bit skeptic, but for strictly comparing Longs, this is an interesting approach, and also very fast - 3 seconds! You have my vote. I edited your answer to determine the lower and upper bounds of the boolean arrays dynamically (it accepts negatives as well), and place the results back on the Range. I also tested it with the same data set and included the result measurement
â paul bica
Apr 25 at 9:10
Hopefully it helps!
â Ryan Wildry
Apr 25 at 13:17
add a comment |Â
up vote
1
down vote
up vote
1
down vote
If you are looking for fast performance and you are only dealing with countable numbers (e.g. Longs), this approach would be considerably faster. My code below assumes you'd only be working with Longs
.
Basically, the way this works is enumerating all possible numbers you expect to see, then flips them to true when it's been seen. I do this over each range, then dump the results to Long
arrays.
This really only works if you know what the min/max range you'd be working in, so it has limited usefulness, however it might be a good fit for your specific use case.
Option Explicit
Public Sub Compare2NumColsUsingBoolArrays()
Dim t As Double, tr As String: t = Timer
Dim appFuncs As WorksheetFunction: Set appFuncs = Application.WorksheetFunction
Dim ur As Range: Set ur = ActiveSheet.UsedRange: ur.Columns("C:D").Delete
Set ur = ur.Resize(ur.Rows.Count - 1): Set ur = ur.Offset(1)
Dim arr As Variant: arr = ur.Columns("A:D")
Dim lbA As Long: lbA = appFuncs.Min(ur.Columns("A"))
Dim ubA As Long: ubA = appFuncs.Max(ur.Columns("A"))
Dim lbB As Long: lbB = appFuncs.Min(ur.Columns("B"))
Dim ubB As Long: ubB = appFuncs.Max(ur.Columns("B"))
Dim minN As Long: minN = IIf(lbA < lbB, lbA, lbB)
Dim maxN As Long: maxN = IIf(ubA > ubB, ubA, ubB)
Dim nArrA() As Boolean: ReDim nArrA(minN To maxN)
Dim nArrB() As Boolean: ReDim nArrB(minN To maxN)
Dim r As Long, rC As Long, rD As Long
For r = 1 To UBound(arr) 'Arr index = value (true/false)
nArrA(arr(r, 1)) = True
nArrB(arr(r, 2)) = True
Next
For r = minN To maxN
If nArrA(r) And Not nArrB(r) Then 'In ColA, not in ColB
rC = rC + 1: arr(rC, 3) = r
ElseIf Not nArrA(r) And nArrB(r) Then 'In ColB, not in ColA
rD = rD + 1: arr(rD, 4) = r
End If
Next
ur.Columns("A:D") = arr
tr = "Compare2NumColsUsingBoolArrays() - Rows:" & Format(ur.Rows.Count, "#,###") & ";"
Debug.Print tr & "Time: " & Format(Timer - t, "0.000") & " sec"
End Sub
.
Test results
Compare2ColsUsingCollections() - Rows: 1,048,575; Time: 32.563 sec
Compare2NumColsUsingBoolArrays() - Rows: 1,048,575; Time: 3.139 sec
.
Notes:
- Adapted code to exclude Headers, and place results back in Col C and D
- Test results don't show the positioning of missing elements
If you are looking for fast performance and you are only dealing with countable numbers (e.g. Longs), this approach would be considerably faster. My code below assumes you'd only be working with Longs
.
Basically, the way this works is enumerating all possible numbers you expect to see, then flips them to true when it's been seen. I do this over each range, then dump the results to Long
arrays.
This really only works if you know what the min/max range you'd be working in, so it has limited usefulness, however it might be a good fit for your specific use case.
Option Explicit
Public Sub Compare2NumColsUsingBoolArrays()
Dim t As Double, tr As String: t = Timer
Dim appFuncs As WorksheetFunction: Set appFuncs = Application.WorksheetFunction
Dim ur As Range: Set ur = ActiveSheet.UsedRange: ur.Columns("C:D").Delete
Set ur = ur.Resize(ur.Rows.Count - 1): Set ur = ur.Offset(1)
Dim arr As Variant: arr = ur.Columns("A:D")
Dim lbA As Long: lbA = appFuncs.Min(ur.Columns("A"))
Dim ubA As Long: ubA = appFuncs.Max(ur.Columns("A"))
Dim lbB As Long: lbB = appFuncs.Min(ur.Columns("B"))
Dim ubB As Long: ubB = appFuncs.Max(ur.Columns("B"))
Dim minN As Long: minN = IIf(lbA < lbB, lbA, lbB)
Dim maxN As Long: maxN = IIf(ubA > ubB, ubA, ubB)
Dim nArrA() As Boolean: ReDim nArrA(minN To maxN)
Dim nArrB() As Boolean: ReDim nArrB(minN To maxN)
Dim r As Long, rC As Long, rD As Long
For r = 1 To UBound(arr) 'Arr index = value (true/false)
nArrA(arr(r, 1)) = True
nArrB(arr(r, 2)) = True
Next
For r = minN To maxN
If nArrA(r) And Not nArrB(r) Then 'In ColA, not in ColB
rC = rC + 1: arr(rC, 3) = r
ElseIf Not nArrA(r) And nArrB(r) Then 'In ColB, not in ColA
rD = rD + 1: arr(rD, 4) = r
End If
Next
ur.Columns("A:D") = arr
tr = "Compare2NumColsUsingBoolArrays() - Rows:" & Format(ur.Rows.Count, "#,###") & ";"
Debug.Print tr & "Time: " & Format(Timer - t, "0.000") & " sec"
End Sub
.
Test results
Compare2ColsUsingCollections() - Rows: 1,048,575; Time: 32.563 sec
Compare2NumColsUsingBoolArrays() - Rows: 1,048,575; Time: 3.139 sec
.
Notes:
- Adapted code to exclude Headers, and place results back in Col C and D
- Test results don't show the positioning of missing elements
edited Apr 25 at 8:40
paul bica
1,059613
1,059613
answered Apr 24 at 1:27
Ryan Wildry
211210
211210
At first I was a bit skeptic, but for strictly comparing Longs, this is an interesting approach, and also very fast - 3 seconds! You have my vote. I edited your answer to determine the lower and upper bounds of the boolean arrays dynamically (it accepts negatives as well), and place the results back on the Range. I also tested it with the same data set and included the result measurement
â paul bica
Apr 25 at 9:10
Hopefully it helps!
â Ryan Wildry
Apr 25 at 13:17
add a comment |Â
At first I was a bit skeptic, but for strictly comparing Longs, this is an interesting approach, and also very fast - 3 seconds! You have my vote. I edited your answer to determine the lower and upper bounds of the boolean arrays dynamically (it accepts negatives as well), and place the results back on the Range. I also tested it with the same data set and included the result measurement
â paul bica
Apr 25 at 9:10
Hopefully it helps!
â Ryan Wildry
Apr 25 at 13:17
At first I was a bit skeptic, but for strictly comparing Longs, this is an interesting approach, and also very fast - 3 seconds! You have my vote. I edited your answer to determine the lower and upper bounds of the boolean arrays dynamically (it accepts negatives as well), and place the results back on the Range. I also tested it with the same data set and included the result measurement
â paul bica
Apr 25 at 9:10
At first I was a bit skeptic, but for strictly comparing Longs, this is an interesting approach, and also very fast - 3 seconds! You have my vote. I edited your answer to determine the lower and upper bounds of the boolean arrays dynamically (it accepts negatives as well), and place the results back on the Range. I also tested it with the same data set and included the result measurement
â paul bica
Apr 25 at 9:10
Hopefully it helps!
â Ryan Wildry
Apr 25 at 13:17
Hopefully it helps!
â Ryan Wildry
Apr 25 at 13:17
add a comment |Â
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f192166%2fsub-that-compares-2-columns-fast%23new-answer', 'question_page');
);
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password