Macro to convert an indented hierarchy to a database format
Clash Royale CLAN TAG#URR8PPP
.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty margin-bottom:0;
up vote
4
down vote
favorite
Our business intelligence suite outputs data as an indented hierarchy, as below:
Level 1 1000
Level 2a 600
Level 3a 500
Level 3b 100
Level 2b 400
Level 3c 400
I've written a macro that converts this into a 'database' format, where only rows with the base (most granular) level are preserved, and the parents are listed to the left as below (this way the # column is summable):
1 2 3 #
Level 1 Level 2a Level 3a 500
Level 1 Level 2a Level 3b 100
Level 1 Level 2b Level 3c 400
The problem I've been running into is that it takes 5-10 minutes to process a file with ~8000 rows. Although my code works, I'm convinced there's a faster way. See below for my code:
Sub Database()
Application.ScreenUpdating = False
Dim WS As Worksheet
Dim SR As Range
Dim Rows As Integer
Dim Indent As Integer
Dim TR As Integer
Dim BR As Integer
Set WS = ActiveWorkbook.ActiveSheet
'StartCell is a function that returns the address of the first cell in the hierarchy
Set SR = WS.Range(StartCell())
Rows = SR.End(xlDown).Row - SR.Row
BR = SR.End(xlDown).Row
TR = SR.Row
'Insert 4 columns & add headers (Level 1, Level 2, etc.)
For x = 0 To 3
SR.EntireColumn.Insert
SR.Offset(-1, -1) = "Level " & x + 3
Next x
x = 0
q = 0
'The main code
Do While x < Rows + 1
'Identifies a row with base-level indentation & sets indent value to this level
If Left(SR.Offset(x, 0), 5) = "P_PC7" Then
Indent = SR.Offset(x, 0).IndentLevel
End If
i = 0
'Loop while the indentation level is greater than one
Do While Indent > 1
'Move upwards and check whether indentation of new cell is one less than initial cell
If SR.Offset(x - i, 0).IndentLevel = Indent - 1 Then
'If so, this cell is the parent of the initial cell - copy it into the appropriate spot to the left of the base level cell
SR.Offset(x - i, 0).Copy SR.Offset(x, -1 * (6 - Indent))
'Set new indent level - the next loop will now look for the parent of the new cell
Indent = SR.Offset(x - i, 0).IndentLevel
'If indent level is not one less than initial cell, continue moving upward
Else: i = i + 1
End If
Loop
x = x + 1
Loop
'Remove all rows that are not base-level
For q = BR To TR Step -1
If WS.Cells(q, 6).IndentLevel <> 5 Then
WS.Cells(q, 6).EntireRow.Delete
End If
Next q
WS.UsedRange.IndentLevel = 0
Application.ScreenUpdating = True
End Sub
performance vba excel tree
add a comment |Â
up vote
4
down vote
favorite
Our business intelligence suite outputs data as an indented hierarchy, as below:
Level 1 1000
Level 2a 600
Level 3a 500
Level 3b 100
Level 2b 400
Level 3c 400
I've written a macro that converts this into a 'database' format, where only rows with the base (most granular) level are preserved, and the parents are listed to the left as below (this way the # column is summable):
1 2 3 #
Level 1 Level 2a Level 3a 500
Level 1 Level 2a Level 3b 100
Level 1 Level 2b Level 3c 400
The problem I've been running into is that it takes 5-10 minutes to process a file with ~8000 rows. Although my code works, I'm convinced there's a faster way. See below for my code:
Sub Database()
Application.ScreenUpdating = False
Dim WS As Worksheet
Dim SR As Range
Dim Rows As Integer
Dim Indent As Integer
Dim TR As Integer
Dim BR As Integer
Set WS = ActiveWorkbook.ActiveSheet
'StartCell is a function that returns the address of the first cell in the hierarchy
Set SR = WS.Range(StartCell())
Rows = SR.End(xlDown).Row - SR.Row
BR = SR.End(xlDown).Row
TR = SR.Row
'Insert 4 columns & add headers (Level 1, Level 2, etc.)
For x = 0 To 3
SR.EntireColumn.Insert
SR.Offset(-1, -1) = "Level " & x + 3
Next x
x = 0
q = 0
'The main code
Do While x < Rows + 1
'Identifies a row with base-level indentation & sets indent value to this level
If Left(SR.Offset(x, 0), 5) = "P_PC7" Then
Indent = SR.Offset(x, 0).IndentLevel
End If
i = 0
'Loop while the indentation level is greater than one
Do While Indent > 1
'Move upwards and check whether indentation of new cell is one less than initial cell
If SR.Offset(x - i, 0).IndentLevel = Indent - 1 Then
'If so, this cell is the parent of the initial cell - copy it into the appropriate spot to the left of the base level cell
SR.Offset(x - i, 0).Copy SR.Offset(x, -1 * (6 - Indent))
'Set new indent level - the next loop will now look for the parent of the new cell
Indent = SR.Offset(x - i, 0).IndentLevel
'If indent level is not one less than initial cell, continue moving upward
Else: i = i + 1
End If
Loop
x = x + 1
Loop
'Remove all rows that are not base-level
For q = BR To TR Step -1
If WS.Cells(q, 6).IndentLevel <> 5 Then
WS.Cells(q, 6).EntireRow.Delete
End If
Next q
WS.UsedRange.IndentLevel = 0
Application.ScreenUpdating = True
End Sub
performance vba excel tree
add a comment |Â
up vote
4
down vote
favorite
up vote
4
down vote
favorite
Our business intelligence suite outputs data as an indented hierarchy, as below:
Level 1 1000
Level 2a 600
Level 3a 500
Level 3b 100
Level 2b 400
Level 3c 400
I've written a macro that converts this into a 'database' format, where only rows with the base (most granular) level are preserved, and the parents are listed to the left as below (this way the # column is summable):
1 2 3 #
Level 1 Level 2a Level 3a 500
Level 1 Level 2a Level 3b 100
Level 1 Level 2b Level 3c 400
The problem I've been running into is that it takes 5-10 minutes to process a file with ~8000 rows. Although my code works, I'm convinced there's a faster way. See below for my code:
Sub Database()
Application.ScreenUpdating = False
Dim WS As Worksheet
Dim SR As Range
Dim Rows As Integer
Dim Indent As Integer
Dim TR As Integer
Dim BR As Integer
Set WS = ActiveWorkbook.ActiveSheet
'StartCell is a function that returns the address of the first cell in the hierarchy
Set SR = WS.Range(StartCell())
Rows = SR.End(xlDown).Row - SR.Row
BR = SR.End(xlDown).Row
TR = SR.Row
'Insert 4 columns & add headers (Level 1, Level 2, etc.)
For x = 0 To 3
SR.EntireColumn.Insert
SR.Offset(-1, -1) = "Level " & x + 3
Next x
x = 0
q = 0
'The main code
Do While x < Rows + 1
'Identifies a row with base-level indentation & sets indent value to this level
If Left(SR.Offset(x, 0), 5) = "P_PC7" Then
Indent = SR.Offset(x, 0).IndentLevel
End If
i = 0
'Loop while the indentation level is greater than one
Do While Indent > 1
'Move upwards and check whether indentation of new cell is one less than initial cell
If SR.Offset(x - i, 0).IndentLevel = Indent - 1 Then
'If so, this cell is the parent of the initial cell - copy it into the appropriate spot to the left of the base level cell
SR.Offset(x - i, 0).Copy SR.Offset(x, -1 * (6 - Indent))
'Set new indent level - the next loop will now look for the parent of the new cell
Indent = SR.Offset(x - i, 0).IndentLevel
'If indent level is not one less than initial cell, continue moving upward
Else: i = i + 1
End If
Loop
x = x + 1
Loop
'Remove all rows that are not base-level
For q = BR To TR Step -1
If WS.Cells(q, 6).IndentLevel <> 5 Then
WS.Cells(q, 6).EntireRow.Delete
End If
Next q
WS.UsedRange.IndentLevel = 0
Application.ScreenUpdating = True
End Sub
performance vba excel tree
Our business intelligence suite outputs data as an indented hierarchy, as below:
Level 1 1000
Level 2a 600
Level 3a 500
Level 3b 100
Level 2b 400
Level 3c 400
I've written a macro that converts this into a 'database' format, where only rows with the base (most granular) level are preserved, and the parents are listed to the left as below (this way the # column is summable):
1 2 3 #
Level 1 Level 2a Level 3a 500
Level 1 Level 2a Level 3b 100
Level 1 Level 2b Level 3c 400
The problem I've been running into is that it takes 5-10 minutes to process a file with ~8000 rows. Although my code works, I'm convinced there's a faster way. See below for my code:
Sub Database()
Application.ScreenUpdating = False
Dim WS As Worksheet
Dim SR As Range
Dim Rows As Integer
Dim Indent As Integer
Dim TR As Integer
Dim BR As Integer
Set WS = ActiveWorkbook.ActiveSheet
'StartCell is a function that returns the address of the first cell in the hierarchy
Set SR = WS.Range(StartCell())
Rows = SR.End(xlDown).Row - SR.Row
BR = SR.End(xlDown).Row
TR = SR.Row
'Insert 4 columns & add headers (Level 1, Level 2, etc.)
For x = 0 To 3
SR.EntireColumn.Insert
SR.Offset(-1, -1) = "Level " & x + 3
Next x
x = 0
q = 0
'The main code
Do While x < Rows + 1
'Identifies a row with base-level indentation & sets indent value to this level
If Left(SR.Offset(x, 0), 5) = "P_PC7" Then
Indent = SR.Offset(x, 0).IndentLevel
End If
i = 0
'Loop while the indentation level is greater than one
Do While Indent > 1
'Move upwards and check whether indentation of new cell is one less than initial cell
If SR.Offset(x - i, 0).IndentLevel = Indent - 1 Then
'If so, this cell is the parent of the initial cell - copy it into the appropriate spot to the left of the base level cell
SR.Offset(x - i, 0).Copy SR.Offset(x, -1 * (6 - Indent))
'Set new indent level - the next loop will now look for the parent of the new cell
Indent = SR.Offset(x - i, 0).IndentLevel
'If indent level is not one less than initial cell, continue moving upward
Else: i = i + 1
End If
Loop
x = x + 1
Loop
'Remove all rows that are not base-level
For q = BR To TR Step -1
If WS.Cells(q, 6).IndentLevel <> 5 Then
WS.Cells(q, 6).EntireRow.Delete
End If
Next q
WS.UsedRange.IndentLevel = 0
Application.ScreenUpdating = True
End Sub
performance vba excel tree
edited Aug 3 at 9:12
200_success
123k14143398
123k14143398
asked Aug 2 at 15:16
Theodore Rapanu
211
211
add a comment |Â
add a comment |Â
2 Answers
2
active
oldest
votes
up vote
2
down vote
Thanks very much to JNevill for his solution - it is indeed significantly faster than my original code. I had to make some changes to accommodate more than one # column, as well as headers to the left of the indented hierarchy column i.e.:
Region Base Level Account 1 Account 2
USA Level 1 500 800
USA Level 2a 300 400
USA Level 2b 200 400
Here is my new code based on JNevill's framework:
Sub HierarchyConvert()
Application.ScreenUpdating = False
Application.Calculation = xlCalculationManual
Dim WS As Worksheet
Dim SR As Range
Dim LastRow As Long
Dim rngReadCell As Range
Dim rngWriteRow As Range
Dim Indent As Integer
Dim LastIndent As Integer
Dim MaxIndent As Integer
Dim ValueArray(0 To 19) As Variant
Set WS = ActiveWorkbook.ActiveSheet
Set SR = WS.Range(StartCell())
LastRow = SR.End(xlDown).Row
MaxIndent = 5
Set rngWriteRow = WS.Rows(SR.Row)
For x = 0 To 4
SR.Offset(0, 1).EntireColumn.Insert
SR.Offset(-1, 1) = "Level " & 7 - x
Next x
SR.Offset(-1, 0) = "Level 2"
SR.Offset(-1, 5) = "PC"
For Each rngReadCell In WS.Range(SR.Address & ":B" & LastRow)
Indent = rngReadCell.IndentLevel
If Indent <= LastIndent And LastIndent <> 0 Then
Set rngWriteRow = rngWriteRow.Offset(1)
For i = 1 To Indent
rngWriteRow.Cells(1, i + 1).Value = rngWriteRow.Cells(1, i + 1).Offset(-1).Value
Next i
End If
rngWriteRow.Cells(Indent + 2).Value = Trim(rngReadCell.Value)
If Indent = MaxIndent Then
'Copies leftmost header from base-level row to top left of write-row
rngWriteRow.Cells(1) = rngReadCell.Offset(, -1).Value
'Copies data to right of base-level row to the write-row
For Z = 0 To 19
ValueArray(Z) = rngReadCell.Offset(, Z + 6).Value
Next Z
For M = 0 To 19
rngWriteRow.Cells(Indent + M + 3).Value = ValueArray(M)
Next M
End If
LastIndent = Indent
Next rngReadCell
Range("A" & SR.Offset(0, 1).End(xlDown).Row + 1 & ":Z" & LastRow + 1).ClearContents
WS.UsedRange.IndentLevel = 0
Application.ScreenUpdating = True
Application.Calculation = xlCalculationAutomatic
End Sub
add a comment |Â
up vote
0
down vote
In reviewing your code first, there are several things you can do to make your code more consistent.
- Always use
Option Explicit
. Please. - When you're looking at performance, you can do more than just disable
ScreenUpdating
. See this answer for my usual solution if you feel you still need it. - Typical professional developers will use start variable names with lower case letters. Functions/Subs will start with upper case. CamelCaseVariableOrSubNames is also most common.
- Your "main code" loop uses
Rows
, which implies the rows on the active worksheet. "Implying" which rows you're referencing will get you into loads of trouble. Always declare variables to specifically reference which worksheet or range that you're using and it's easier to keep it straight.
As with many of the performance questions in Code Review, your need for speed will be solved with a memory-based array. But one of the stumbling blocks you have is detecting the indent level for each of the rows in your source data. My quick solution is to create a "helper column" of data next to your source that uses a simple User Defined Function (UDF) to identify the indent level. The UDF is a single line:
Public Function GetIndent(ByRef target As Range) As Long
'--- UDF to return the numeric indent level of the target cell
' handles the multi-cell case and will return the indent
' level of ONLY the top left cell of the range
GetIndent = target.Resize(1, 1).IndentLevel
End Function
Using this function in the first column to the right of your data (=GetIndent(A1)
) now turns my source data into this:
I had to do this because if I pull your original source data into an array, the array loses the indent level information. Otherwise, I'd have to continually refer back to the worksheet which is taking the bulk of your processing time.
A quick side note on how I am defining and accessing columns of data in my code. (I deeply regret I've lost track of which user on SO/CR I lifted this tip from. Whoever you are, mad props!) I find that much of my column-based data can change "shape" over a period of development and use. Columns can be added or deleted or switched in order. Hard-coding the column number in code then becomes problematic and forces lots of code changes to keep up. For the longest time I defined a set of Const
declarations to keep track of column numbers, such as
Const COL_FIRST_NAME As Long = 1
Const COL_LAST_NAME As Long = 2
Const COL_ADDRESS As Long = 3
And this works just fine, but the names get tedious and it's easy to lose track of which constant to use for which range of data. So from one of the many things I've learned here, I now create an Enum
to define column indexes that can more specifically be tied to a set of data. In the course of your solution, I have created
'--- convenience declarations for accessing data columns
Private Enum SrcColumns
ID = 1
Number = 2
Indent = 3
End Enum
Private Enum DstColumns
L1 = 1
L2 = 2
L3 = 3
Number = 4
End Enum
You'll see how they are used below.
First get your data into your memory-based array:
Dim srcWS As Worksheet
Dim dstWS As Worksheet
Set srcWS = ThisWorkbook.Sheets("Sheet1")
Set dstWS = ThisWorkbook.Sheets("Sheet2")
'--- get our source data into an array
Dim srcRange As Range
Dim srcData As Variant
Set srcRange = srcWS.UsedRange
srcData = srcRange
We have to next figure out how many rows we'll need in our resulting database. This turns out to be straightforward by counting the number of times the maximum indent level appears in the data. In this case, the max indent level is 2. So:
Const MAX_LEVEL = 2
Dim i As Long
Dim maxDBRows As Long
For i = 1 To UBound(srcData, 1)
If srcData(i, SrcColumns.Indent) = MAX_LEVEL Then
maxDBRows = maxDBRows + 1
End If
Next i
Optionally (ideally), you can dynamically determine the maximum indent level instead of creating a Const
. You could use a WorksheetFunction
to accomplish the same thing if you'd prefer.
In order to create your database, I'm making a strict assumption that you will always encounter previous indent levels before reaching the maximum indent. This means that inside my loop I can capture all the level labels up to the maximum level and keep them. So creating the database now becomes a simple loop:
For i = 1 To UBound(srcData, 1)
Select Case srcData(i, SrcColumns.Indent)
Case 0
level1 = srcData(i, SrcColumns.ID)
Case 1
level2 = srcData(i, SrcColumns.ID)
Case 2
level3 = srcData(i, SrcColumns.ID)
dstData(newDBRow, DstColumns.L1) = level1
dstData(newDBRow, DstColumns.L2) = level2
dstData(newDBRow, DstColumns.L3) = level3
dstData(newDBRow, DstColumns.Number) = srcData(i, SrcColumns.Number)
newDBRow = newDBRow + 1
End Select
Next i
And finally, it's a quick copy to get the database array out to the destination:
Dim dstRange As Range
Set dstRange = dstWS.Range("A1").Resize(UBound(dstData, 1), UBound(dstData, 2))
dstRange = dstData
This runs very fast. Here's the entire module:
Option Explicit
'--- convenience declarations for accessing data columns
Private Enum SrcColumns
ID = 1
Number = 2
Indent = 3
End Enum
Private Enum DstColumns
L1 = 1
L2 = 2
L3 = 3
Number = 4
End Enum
Public Function GetIndent(ByRef target As Range) As Long
'--- UDF to return the numeric indent level of the target cell
GetIndent = target.IndentLevel
End Function
Sub ConvertToDatabase()
Dim srcWS As Worksheet
Dim dstWS As Worksheet
Set srcWS = ThisWorkbook.Sheets("Sheet1")
Set dstWS = ThisWorkbook.Sheets("Sheet2")
'--- get our source data into an array
Dim srcRange As Range
Dim srcData As Variant
Set srcRange = srcWS.UsedRange
srcData = srcRange
'--- we can determine how many rows in the destination database
' by getting a count of the highest indent level in the array
Const MAX_LEVEL = 2
Dim i As Long
Dim maxDBRows As Long
For i = 1 To UBound(srcData, 1)
If srcData(i, SrcColumns.Indent) = MAX_LEVEL Then
maxDBRows = maxDBRows + 1
End If
Next i
'--- establish an empty database
Dim dstData() As Variant
ReDim dstData(1 To maxDBRows, 1 To 4)
'--- load up the database
Dim level1 As String
Dim level2 As String
Dim level3 As String
Dim newDBRow As Long
newDBRow = 1
For i = 1 To UBound(srcData, 1)
Select Case srcData(i, SrcColumns.Indent)
Case 0
level1 = srcData(i, SrcColumns.ID)
Case 1
level2 = srcData(i, SrcColumns.ID)
Case 2
level3 = srcData(i, SrcColumns.ID)
dstData(newDBRow, DstColumns.L1) = level1
dstData(newDBRow, DstColumns.L2) = level2
dstData(newDBRow, DstColumns.L3) = level3
dstData(newDBRow, DstColumns.Number) = srcData(i, SrcColumns.Number)
newDBRow = newDBRow + 1
End Select
Next i
'--- finally copy the array out to the destination
Dim dstRange As Range
Set dstRange = dstWS.Range("A1").Resize(UBound(dstData, 1), UBound(dstData, 2))
dstRange = dstData
End Sub
Named Ranges can also be used to stabilise code instead of fixed column numbers. I am working on a code base at the moment where I am using "magic numbers" similar to your constants (e.g.COL_SOMETHING1_SOMETHING2
) but converting between column letter and column number when fixing the template is tedious. In the process of changing to named columns which means that the code base would never change again (). _() for various definitions of "never"_
â AJD
Aug 3 at 21:05
1
With yourIndentLevel
UDF: This works fine on an single cell, but can fail if multiple cells are passed. There was once an MSDN article that explained this but the new MSDN format is a lot blander and doesn't contain this useful in-depth information anymore.
â AJD
Aug 3 at 21:07
You make a good point about multiple cells being passed to the UDF. I've modified the code to handle that case -- simplistically assuming that the indent level of the first cell (upper left) of the range will be returned.
â PeterT
yesterday
add a comment |Â
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
2
down vote
Thanks very much to JNevill for his solution - it is indeed significantly faster than my original code. I had to make some changes to accommodate more than one # column, as well as headers to the left of the indented hierarchy column i.e.:
Region Base Level Account 1 Account 2
USA Level 1 500 800
USA Level 2a 300 400
USA Level 2b 200 400
Here is my new code based on JNevill's framework:
Sub HierarchyConvert()
Application.ScreenUpdating = False
Application.Calculation = xlCalculationManual
Dim WS As Worksheet
Dim SR As Range
Dim LastRow As Long
Dim rngReadCell As Range
Dim rngWriteRow As Range
Dim Indent As Integer
Dim LastIndent As Integer
Dim MaxIndent As Integer
Dim ValueArray(0 To 19) As Variant
Set WS = ActiveWorkbook.ActiveSheet
Set SR = WS.Range(StartCell())
LastRow = SR.End(xlDown).Row
MaxIndent = 5
Set rngWriteRow = WS.Rows(SR.Row)
For x = 0 To 4
SR.Offset(0, 1).EntireColumn.Insert
SR.Offset(-1, 1) = "Level " & 7 - x
Next x
SR.Offset(-1, 0) = "Level 2"
SR.Offset(-1, 5) = "PC"
For Each rngReadCell In WS.Range(SR.Address & ":B" & LastRow)
Indent = rngReadCell.IndentLevel
If Indent <= LastIndent And LastIndent <> 0 Then
Set rngWriteRow = rngWriteRow.Offset(1)
For i = 1 To Indent
rngWriteRow.Cells(1, i + 1).Value = rngWriteRow.Cells(1, i + 1).Offset(-1).Value
Next i
End If
rngWriteRow.Cells(Indent + 2).Value = Trim(rngReadCell.Value)
If Indent = MaxIndent Then
'Copies leftmost header from base-level row to top left of write-row
rngWriteRow.Cells(1) = rngReadCell.Offset(, -1).Value
'Copies data to right of base-level row to the write-row
For Z = 0 To 19
ValueArray(Z) = rngReadCell.Offset(, Z + 6).Value
Next Z
For M = 0 To 19
rngWriteRow.Cells(Indent + M + 3).Value = ValueArray(M)
Next M
End If
LastIndent = Indent
Next rngReadCell
Range("A" & SR.Offset(0, 1).End(xlDown).Row + 1 & ":Z" & LastRow + 1).ClearContents
WS.UsedRange.IndentLevel = 0
Application.ScreenUpdating = True
Application.Calculation = xlCalculationAutomatic
End Sub
add a comment |Â
up vote
2
down vote
Thanks very much to JNevill for his solution - it is indeed significantly faster than my original code. I had to make some changes to accommodate more than one # column, as well as headers to the left of the indented hierarchy column i.e.:
Region Base Level Account 1 Account 2
USA Level 1 500 800
USA Level 2a 300 400
USA Level 2b 200 400
Here is my new code based on JNevill's framework:
Sub HierarchyConvert()
Application.ScreenUpdating = False
Application.Calculation = xlCalculationManual
Dim WS As Worksheet
Dim SR As Range
Dim LastRow As Long
Dim rngReadCell As Range
Dim rngWriteRow As Range
Dim Indent As Integer
Dim LastIndent As Integer
Dim MaxIndent As Integer
Dim ValueArray(0 To 19) As Variant
Set WS = ActiveWorkbook.ActiveSheet
Set SR = WS.Range(StartCell())
LastRow = SR.End(xlDown).Row
MaxIndent = 5
Set rngWriteRow = WS.Rows(SR.Row)
For x = 0 To 4
SR.Offset(0, 1).EntireColumn.Insert
SR.Offset(-1, 1) = "Level " & 7 - x
Next x
SR.Offset(-1, 0) = "Level 2"
SR.Offset(-1, 5) = "PC"
For Each rngReadCell In WS.Range(SR.Address & ":B" & LastRow)
Indent = rngReadCell.IndentLevel
If Indent <= LastIndent And LastIndent <> 0 Then
Set rngWriteRow = rngWriteRow.Offset(1)
For i = 1 To Indent
rngWriteRow.Cells(1, i + 1).Value = rngWriteRow.Cells(1, i + 1).Offset(-1).Value
Next i
End If
rngWriteRow.Cells(Indent + 2).Value = Trim(rngReadCell.Value)
If Indent = MaxIndent Then
'Copies leftmost header from base-level row to top left of write-row
rngWriteRow.Cells(1) = rngReadCell.Offset(, -1).Value
'Copies data to right of base-level row to the write-row
For Z = 0 To 19
ValueArray(Z) = rngReadCell.Offset(, Z + 6).Value
Next Z
For M = 0 To 19
rngWriteRow.Cells(Indent + M + 3).Value = ValueArray(M)
Next M
End If
LastIndent = Indent
Next rngReadCell
Range("A" & SR.Offset(0, 1).End(xlDown).Row + 1 & ":Z" & LastRow + 1).ClearContents
WS.UsedRange.IndentLevel = 0
Application.ScreenUpdating = True
Application.Calculation = xlCalculationAutomatic
End Sub
add a comment |Â
up vote
2
down vote
up vote
2
down vote
Thanks very much to JNevill for his solution - it is indeed significantly faster than my original code. I had to make some changes to accommodate more than one # column, as well as headers to the left of the indented hierarchy column i.e.:
Region Base Level Account 1 Account 2
USA Level 1 500 800
USA Level 2a 300 400
USA Level 2b 200 400
Here is my new code based on JNevill's framework:
Sub HierarchyConvert()
Application.ScreenUpdating = False
Application.Calculation = xlCalculationManual
Dim WS As Worksheet
Dim SR As Range
Dim LastRow As Long
Dim rngReadCell As Range
Dim rngWriteRow As Range
Dim Indent As Integer
Dim LastIndent As Integer
Dim MaxIndent As Integer
Dim ValueArray(0 To 19) As Variant
Set WS = ActiveWorkbook.ActiveSheet
Set SR = WS.Range(StartCell())
LastRow = SR.End(xlDown).Row
MaxIndent = 5
Set rngWriteRow = WS.Rows(SR.Row)
For x = 0 To 4
SR.Offset(0, 1).EntireColumn.Insert
SR.Offset(-1, 1) = "Level " & 7 - x
Next x
SR.Offset(-1, 0) = "Level 2"
SR.Offset(-1, 5) = "PC"
For Each rngReadCell In WS.Range(SR.Address & ":B" & LastRow)
Indent = rngReadCell.IndentLevel
If Indent <= LastIndent And LastIndent <> 0 Then
Set rngWriteRow = rngWriteRow.Offset(1)
For i = 1 To Indent
rngWriteRow.Cells(1, i + 1).Value = rngWriteRow.Cells(1, i + 1).Offset(-1).Value
Next i
End If
rngWriteRow.Cells(Indent + 2).Value = Trim(rngReadCell.Value)
If Indent = MaxIndent Then
'Copies leftmost header from base-level row to top left of write-row
rngWriteRow.Cells(1) = rngReadCell.Offset(, -1).Value
'Copies data to right of base-level row to the write-row
For Z = 0 To 19
ValueArray(Z) = rngReadCell.Offset(, Z + 6).Value
Next Z
For M = 0 To 19
rngWriteRow.Cells(Indent + M + 3).Value = ValueArray(M)
Next M
End If
LastIndent = Indent
Next rngReadCell
Range("A" & SR.Offset(0, 1).End(xlDown).Row + 1 & ":Z" & LastRow + 1).ClearContents
WS.UsedRange.IndentLevel = 0
Application.ScreenUpdating = True
Application.Calculation = xlCalculationAutomatic
End Sub
Thanks very much to JNevill for his solution - it is indeed significantly faster than my original code. I had to make some changes to accommodate more than one # column, as well as headers to the left of the indented hierarchy column i.e.:
Region Base Level Account 1 Account 2
USA Level 1 500 800
USA Level 2a 300 400
USA Level 2b 200 400
Here is my new code based on JNevill's framework:
Sub HierarchyConvert()
Application.ScreenUpdating = False
Application.Calculation = xlCalculationManual
Dim WS As Worksheet
Dim SR As Range
Dim LastRow As Long
Dim rngReadCell As Range
Dim rngWriteRow As Range
Dim Indent As Integer
Dim LastIndent As Integer
Dim MaxIndent As Integer
Dim ValueArray(0 To 19) As Variant
Set WS = ActiveWorkbook.ActiveSheet
Set SR = WS.Range(StartCell())
LastRow = SR.End(xlDown).Row
MaxIndent = 5
Set rngWriteRow = WS.Rows(SR.Row)
For x = 0 To 4
SR.Offset(0, 1).EntireColumn.Insert
SR.Offset(-1, 1) = "Level " & 7 - x
Next x
SR.Offset(-1, 0) = "Level 2"
SR.Offset(-1, 5) = "PC"
For Each rngReadCell In WS.Range(SR.Address & ":B" & LastRow)
Indent = rngReadCell.IndentLevel
If Indent <= LastIndent And LastIndent <> 0 Then
Set rngWriteRow = rngWriteRow.Offset(1)
For i = 1 To Indent
rngWriteRow.Cells(1, i + 1).Value = rngWriteRow.Cells(1, i + 1).Offset(-1).Value
Next i
End If
rngWriteRow.Cells(Indent + 2).Value = Trim(rngReadCell.Value)
If Indent = MaxIndent Then
'Copies leftmost header from base-level row to top left of write-row
rngWriteRow.Cells(1) = rngReadCell.Offset(, -1).Value
'Copies data to right of base-level row to the write-row
For Z = 0 To 19
ValueArray(Z) = rngReadCell.Offset(, Z + 6).Value
Next Z
For M = 0 To 19
rngWriteRow.Cells(Indent + M + 3).Value = ValueArray(M)
Next M
End If
LastIndent = Indent
Next rngReadCell
Range("A" & SR.Offset(0, 1).End(xlDown).Row + 1 & ":Z" & LastRow + 1).ClearContents
WS.UsedRange.IndentLevel = 0
Application.ScreenUpdating = True
Application.Calculation = xlCalculationAutomatic
End Sub
answered Aug 3 at 15:05
Theodore Rapanu
211
211
add a comment |Â
add a comment |Â
up vote
0
down vote
In reviewing your code first, there are several things you can do to make your code more consistent.
- Always use
Option Explicit
. Please. - When you're looking at performance, you can do more than just disable
ScreenUpdating
. See this answer for my usual solution if you feel you still need it. - Typical professional developers will use start variable names with lower case letters. Functions/Subs will start with upper case. CamelCaseVariableOrSubNames is also most common.
- Your "main code" loop uses
Rows
, which implies the rows on the active worksheet. "Implying" which rows you're referencing will get you into loads of trouble. Always declare variables to specifically reference which worksheet or range that you're using and it's easier to keep it straight.
As with many of the performance questions in Code Review, your need for speed will be solved with a memory-based array. But one of the stumbling blocks you have is detecting the indent level for each of the rows in your source data. My quick solution is to create a "helper column" of data next to your source that uses a simple User Defined Function (UDF) to identify the indent level. The UDF is a single line:
Public Function GetIndent(ByRef target As Range) As Long
'--- UDF to return the numeric indent level of the target cell
' handles the multi-cell case and will return the indent
' level of ONLY the top left cell of the range
GetIndent = target.Resize(1, 1).IndentLevel
End Function
Using this function in the first column to the right of your data (=GetIndent(A1)
) now turns my source data into this:
I had to do this because if I pull your original source data into an array, the array loses the indent level information. Otherwise, I'd have to continually refer back to the worksheet which is taking the bulk of your processing time.
A quick side note on how I am defining and accessing columns of data in my code. (I deeply regret I've lost track of which user on SO/CR I lifted this tip from. Whoever you are, mad props!) I find that much of my column-based data can change "shape" over a period of development and use. Columns can be added or deleted or switched in order. Hard-coding the column number in code then becomes problematic and forces lots of code changes to keep up. For the longest time I defined a set of Const
declarations to keep track of column numbers, such as
Const COL_FIRST_NAME As Long = 1
Const COL_LAST_NAME As Long = 2
Const COL_ADDRESS As Long = 3
And this works just fine, but the names get tedious and it's easy to lose track of which constant to use for which range of data. So from one of the many things I've learned here, I now create an Enum
to define column indexes that can more specifically be tied to a set of data. In the course of your solution, I have created
'--- convenience declarations for accessing data columns
Private Enum SrcColumns
ID = 1
Number = 2
Indent = 3
End Enum
Private Enum DstColumns
L1 = 1
L2 = 2
L3 = 3
Number = 4
End Enum
You'll see how they are used below.
First get your data into your memory-based array:
Dim srcWS As Worksheet
Dim dstWS As Worksheet
Set srcWS = ThisWorkbook.Sheets("Sheet1")
Set dstWS = ThisWorkbook.Sheets("Sheet2")
'--- get our source data into an array
Dim srcRange As Range
Dim srcData As Variant
Set srcRange = srcWS.UsedRange
srcData = srcRange
We have to next figure out how many rows we'll need in our resulting database. This turns out to be straightforward by counting the number of times the maximum indent level appears in the data. In this case, the max indent level is 2. So:
Const MAX_LEVEL = 2
Dim i As Long
Dim maxDBRows As Long
For i = 1 To UBound(srcData, 1)
If srcData(i, SrcColumns.Indent) = MAX_LEVEL Then
maxDBRows = maxDBRows + 1
End If
Next i
Optionally (ideally), you can dynamically determine the maximum indent level instead of creating a Const
. You could use a WorksheetFunction
to accomplish the same thing if you'd prefer.
In order to create your database, I'm making a strict assumption that you will always encounter previous indent levels before reaching the maximum indent. This means that inside my loop I can capture all the level labels up to the maximum level and keep them. So creating the database now becomes a simple loop:
For i = 1 To UBound(srcData, 1)
Select Case srcData(i, SrcColumns.Indent)
Case 0
level1 = srcData(i, SrcColumns.ID)
Case 1
level2 = srcData(i, SrcColumns.ID)
Case 2
level3 = srcData(i, SrcColumns.ID)
dstData(newDBRow, DstColumns.L1) = level1
dstData(newDBRow, DstColumns.L2) = level2
dstData(newDBRow, DstColumns.L3) = level3
dstData(newDBRow, DstColumns.Number) = srcData(i, SrcColumns.Number)
newDBRow = newDBRow + 1
End Select
Next i
And finally, it's a quick copy to get the database array out to the destination:
Dim dstRange As Range
Set dstRange = dstWS.Range("A1").Resize(UBound(dstData, 1), UBound(dstData, 2))
dstRange = dstData
This runs very fast. Here's the entire module:
Option Explicit
'--- convenience declarations for accessing data columns
Private Enum SrcColumns
ID = 1
Number = 2
Indent = 3
End Enum
Private Enum DstColumns
L1 = 1
L2 = 2
L3 = 3
Number = 4
End Enum
Public Function GetIndent(ByRef target As Range) As Long
'--- UDF to return the numeric indent level of the target cell
GetIndent = target.IndentLevel
End Function
Sub ConvertToDatabase()
Dim srcWS As Worksheet
Dim dstWS As Worksheet
Set srcWS = ThisWorkbook.Sheets("Sheet1")
Set dstWS = ThisWorkbook.Sheets("Sheet2")
'--- get our source data into an array
Dim srcRange As Range
Dim srcData As Variant
Set srcRange = srcWS.UsedRange
srcData = srcRange
'--- we can determine how many rows in the destination database
' by getting a count of the highest indent level in the array
Const MAX_LEVEL = 2
Dim i As Long
Dim maxDBRows As Long
For i = 1 To UBound(srcData, 1)
If srcData(i, SrcColumns.Indent) = MAX_LEVEL Then
maxDBRows = maxDBRows + 1
End If
Next i
'--- establish an empty database
Dim dstData() As Variant
ReDim dstData(1 To maxDBRows, 1 To 4)
'--- load up the database
Dim level1 As String
Dim level2 As String
Dim level3 As String
Dim newDBRow As Long
newDBRow = 1
For i = 1 To UBound(srcData, 1)
Select Case srcData(i, SrcColumns.Indent)
Case 0
level1 = srcData(i, SrcColumns.ID)
Case 1
level2 = srcData(i, SrcColumns.ID)
Case 2
level3 = srcData(i, SrcColumns.ID)
dstData(newDBRow, DstColumns.L1) = level1
dstData(newDBRow, DstColumns.L2) = level2
dstData(newDBRow, DstColumns.L3) = level3
dstData(newDBRow, DstColumns.Number) = srcData(i, SrcColumns.Number)
newDBRow = newDBRow + 1
End Select
Next i
'--- finally copy the array out to the destination
Dim dstRange As Range
Set dstRange = dstWS.Range("A1").Resize(UBound(dstData, 1), UBound(dstData, 2))
dstRange = dstData
End Sub
Named Ranges can also be used to stabilise code instead of fixed column numbers. I am working on a code base at the moment where I am using "magic numbers" similar to your constants (e.g.COL_SOMETHING1_SOMETHING2
) but converting between column letter and column number when fixing the template is tedious. In the process of changing to named columns which means that the code base would never change again (). _() for various definitions of "never"_
â AJD
Aug 3 at 21:05
1
With yourIndentLevel
UDF: This works fine on an single cell, but can fail if multiple cells are passed. There was once an MSDN article that explained this but the new MSDN format is a lot blander and doesn't contain this useful in-depth information anymore.
â AJD
Aug 3 at 21:07
You make a good point about multiple cells being passed to the UDF. I've modified the code to handle that case -- simplistically assuming that the indent level of the first cell (upper left) of the range will be returned.
â PeterT
yesterday
add a comment |Â
up vote
0
down vote
In reviewing your code first, there are several things you can do to make your code more consistent.
- Always use
Option Explicit
. Please. - When you're looking at performance, you can do more than just disable
ScreenUpdating
. See this answer for my usual solution if you feel you still need it. - Typical professional developers will use start variable names with lower case letters. Functions/Subs will start with upper case. CamelCaseVariableOrSubNames is also most common.
- Your "main code" loop uses
Rows
, which implies the rows on the active worksheet. "Implying" which rows you're referencing will get you into loads of trouble. Always declare variables to specifically reference which worksheet or range that you're using and it's easier to keep it straight.
As with many of the performance questions in Code Review, your need for speed will be solved with a memory-based array. But one of the stumbling blocks you have is detecting the indent level for each of the rows in your source data. My quick solution is to create a "helper column" of data next to your source that uses a simple User Defined Function (UDF) to identify the indent level. The UDF is a single line:
Public Function GetIndent(ByRef target As Range) As Long
'--- UDF to return the numeric indent level of the target cell
' handles the multi-cell case and will return the indent
' level of ONLY the top left cell of the range
GetIndent = target.Resize(1, 1).IndentLevel
End Function
Using this function in the first column to the right of your data (=GetIndent(A1)
) now turns my source data into this:
I had to do this because if I pull your original source data into an array, the array loses the indent level information. Otherwise, I'd have to continually refer back to the worksheet which is taking the bulk of your processing time.
A quick side note on how I am defining and accessing columns of data in my code. (I deeply regret I've lost track of which user on SO/CR I lifted this tip from. Whoever you are, mad props!) I find that much of my column-based data can change "shape" over a period of development and use. Columns can be added or deleted or switched in order. Hard-coding the column number in code then becomes problematic and forces lots of code changes to keep up. For the longest time I defined a set of Const
declarations to keep track of column numbers, such as
Const COL_FIRST_NAME As Long = 1
Const COL_LAST_NAME As Long = 2
Const COL_ADDRESS As Long = 3
And this works just fine, but the names get tedious and it's easy to lose track of which constant to use for which range of data. So from one of the many things I've learned here, I now create an Enum
to define column indexes that can more specifically be tied to a set of data. In the course of your solution, I have created
'--- convenience declarations for accessing data columns
Private Enum SrcColumns
ID = 1
Number = 2
Indent = 3
End Enum
Private Enum DstColumns
L1 = 1
L2 = 2
L3 = 3
Number = 4
End Enum
You'll see how they are used below.
First get your data into your memory-based array:
Dim srcWS As Worksheet
Dim dstWS As Worksheet
Set srcWS = ThisWorkbook.Sheets("Sheet1")
Set dstWS = ThisWorkbook.Sheets("Sheet2")
'--- get our source data into an array
Dim srcRange As Range
Dim srcData As Variant
Set srcRange = srcWS.UsedRange
srcData = srcRange
We have to next figure out how many rows we'll need in our resulting database. This turns out to be straightforward by counting the number of times the maximum indent level appears in the data. In this case, the max indent level is 2. So:
Const MAX_LEVEL = 2
Dim i As Long
Dim maxDBRows As Long
For i = 1 To UBound(srcData, 1)
If srcData(i, SrcColumns.Indent) = MAX_LEVEL Then
maxDBRows = maxDBRows + 1
End If
Next i
Optionally (ideally), you can dynamically determine the maximum indent level instead of creating a Const
. You could use a WorksheetFunction
to accomplish the same thing if you'd prefer.
In order to create your database, I'm making a strict assumption that you will always encounter previous indent levels before reaching the maximum indent. This means that inside my loop I can capture all the level labels up to the maximum level and keep them. So creating the database now becomes a simple loop:
For i = 1 To UBound(srcData, 1)
Select Case srcData(i, SrcColumns.Indent)
Case 0
level1 = srcData(i, SrcColumns.ID)
Case 1
level2 = srcData(i, SrcColumns.ID)
Case 2
level3 = srcData(i, SrcColumns.ID)
dstData(newDBRow, DstColumns.L1) = level1
dstData(newDBRow, DstColumns.L2) = level2
dstData(newDBRow, DstColumns.L3) = level3
dstData(newDBRow, DstColumns.Number) = srcData(i, SrcColumns.Number)
newDBRow = newDBRow + 1
End Select
Next i
And finally, it's a quick copy to get the database array out to the destination:
Dim dstRange As Range
Set dstRange = dstWS.Range("A1").Resize(UBound(dstData, 1), UBound(dstData, 2))
dstRange = dstData
This runs very fast. Here's the entire module:
Option Explicit
'--- convenience declarations for accessing data columns
Private Enum SrcColumns
ID = 1
Number = 2
Indent = 3
End Enum
Private Enum DstColumns
L1 = 1
L2 = 2
L3 = 3
Number = 4
End Enum
Public Function GetIndent(ByRef target As Range) As Long
'--- UDF to return the numeric indent level of the target cell
GetIndent = target.IndentLevel
End Function
Sub ConvertToDatabase()
Dim srcWS As Worksheet
Dim dstWS As Worksheet
Set srcWS = ThisWorkbook.Sheets("Sheet1")
Set dstWS = ThisWorkbook.Sheets("Sheet2")
'--- get our source data into an array
Dim srcRange As Range
Dim srcData As Variant
Set srcRange = srcWS.UsedRange
srcData = srcRange
'--- we can determine how many rows in the destination database
' by getting a count of the highest indent level in the array
Const MAX_LEVEL = 2
Dim i As Long
Dim maxDBRows As Long
For i = 1 To UBound(srcData, 1)
If srcData(i, SrcColumns.Indent) = MAX_LEVEL Then
maxDBRows = maxDBRows + 1
End If
Next i
'--- establish an empty database
Dim dstData() As Variant
ReDim dstData(1 To maxDBRows, 1 To 4)
'--- load up the database
Dim level1 As String
Dim level2 As String
Dim level3 As String
Dim newDBRow As Long
newDBRow = 1
For i = 1 To UBound(srcData, 1)
Select Case srcData(i, SrcColumns.Indent)
Case 0
level1 = srcData(i, SrcColumns.ID)
Case 1
level2 = srcData(i, SrcColumns.ID)
Case 2
level3 = srcData(i, SrcColumns.ID)
dstData(newDBRow, DstColumns.L1) = level1
dstData(newDBRow, DstColumns.L2) = level2
dstData(newDBRow, DstColumns.L3) = level3
dstData(newDBRow, DstColumns.Number) = srcData(i, SrcColumns.Number)
newDBRow = newDBRow + 1
End Select
Next i
'--- finally copy the array out to the destination
Dim dstRange As Range
Set dstRange = dstWS.Range("A1").Resize(UBound(dstData, 1), UBound(dstData, 2))
dstRange = dstData
End Sub
Named Ranges can also be used to stabilise code instead of fixed column numbers. I am working on a code base at the moment where I am using "magic numbers" similar to your constants (e.g.COL_SOMETHING1_SOMETHING2
) but converting between column letter and column number when fixing the template is tedious. In the process of changing to named columns which means that the code base would never change again (). _() for various definitions of "never"_
â AJD
Aug 3 at 21:05
1
With yourIndentLevel
UDF: This works fine on an single cell, but can fail if multiple cells are passed. There was once an MSDN article that explained this but the new MSDN format is a lot blander and doesn't contain this useful in-depth information anymore.
â AJD
Aug 3 at 21:07
You make a good point about multiple cells being passed to the UDF. I've modified the code to handle that case -- simplistically assuming that the indent level of the first cell (upper left) of the range will be returned.
â PeterT
yesterday
add a comment |Â
up vote
0
down vote
up vote
0
down vote
In reviewing your code first, there are several things you can do to make your code more consistent.
- Always use
Option Explicit
. Please. - When you're looking at performance, you can do more than just disable
ScreenUpdating
. See this answer for my usual solution if you feel you still need it. - Typical professional developers will use start variable names with lower case letters. Functions/Subs will start with upper case. CamelCaseVariableOrSubNames is also most common.
- Your "main code" loop uses
Rows
, which implies the rows on the active worksheet. "Implying" which rows you're referencing will get you into loads of trouble. Always declare variables to specifically reference which worksheet or range that you're using and it's easier to keep it straight.
As with many of the performance questions in Code Review, your need for speed will be solved with a memory-based array. But one of the stumbling blocks you have is detecting the indent level for each of the rows in your source data. My quick solution is to create a "helper column" of data next to your source that uses a simple User Defined Function (UDF) to identify the indent level. The UDF is a single line:
Public Function GetIndent(ByRef target As Range) As Long
'--- UDF to return the numeric indent level of the target cell
' handles the multi-cell case and will return the indent
' level of ONLY the top left cell of the range
GetIndent = target.Resize(1, 1).IndentLevel
End Function
Using this function in the first column to the right of your data (=GetIndent(A1)
) now turns my source data into this:
I had to do this because if I pull your original source data into an array, the array loses the indent level information. Otherwise, I'd have to continually refer back to the worksheet which is taking the bulk of your processing time.
A quick side note on how I am defining and accessing columns of data in my code. (I deeply regret I've lost track of which user on SO/CR I lifted this tip from. Whoever you are, mad props!) I find that much of my column-based data can change "shape" over a period of development and use. Columns can be added or deleted or switched in order. Hard-coding the column number in code then becomes problematic and forces lots of code changes to keep up. For the longest time I defined a set of Const
declarations to keep track of column numbers, such as
Const COL_FIRST_NAME As Long = 1
Const COL_LAST_NAME As Long = 2
Const COL_ADDRESS As Long = 3
And this works just fine, but the names get tedious and it's easy to lose track of which constant to use for which range of data. So from one of the many things I've learned here, I now create an Enum
to define column indexes that can more specifically be tied to a set of data. In the course of your solution, I have created
'--- convenience declarations for accessing data columns
Private Enum SrcColumns
ID = 1
Number = 2
Indent = 3
End Enum
Private Enum DstColumns
L1 = 1
L2 = 2
L3 = 3
Number = 4
End Enum
You'll see how they are used below.
First get your data into your memory-based array:
Dim srcWS As Worksheet
Dim dstWS As Worksheet
Set srcWS = ThisWorkbook.Sheets("Sheet1")
Set dstWS = ThisWorkbook.Sheets("Sheet2")
'--- get our source data into an array
Dim srcRange As Range
Dim srcData As Variant
Set srcRange = srcWS.UsedRange
srcData = srcRange
We have to next figure out how many rows we'll need in our resulting database. This turns out to be straightforward by counting the number of times the maximum indent level appears in the data. In this case, the max indent level is 2. So:
Const MAX_LEVEL = 2
Dim i As Long
Dim maxDBRows As Long
For i = 1 To UBound(srcData, 1)
If srcData(i, SrcColumns.Indent) = MAX_LEVEL Then
maxDBRows = maxDBRows + 1
End If
Next i
Optionally (ideally), you can dynamically determine the maximum indent level instead of creating a Const
. You could use a WorksheetFunction
to accomplish the same thing if you'd prefer.
In order to create your database, I'm making a strict assumption that you will always encounter previous indent levels before reaching the maximum indent. This means that inside my loop I can capture all the level labels up to the maximum level and keep them. So creating the database now becomes a simple loop:
For i = 1 To UBound(srcData, 1)
Select Case srcData(i, SrcColumns.Indent)
Case 0
level1 = srcData(i, SrcColumns.ID)
Case 1
level2 = srcData(i, SrcColumns.ID)
Case 2
level3 = srcData(i, SrcColumns.ID)
dstData(newDBRow, DstColumns.L1) = level1
dstData(newDBRow, DstColumns.L2) = level2
dstData(newDBRow, DstColumns.L3) = level3
dstData(newDBRow, DstColumns.Number) = srcData(i, SrcColumns.Number)
newDBRow = newDBRow + 1
End Select
Next i
And finally, it's a quick copy to get the database array out to the destination:
Dim dstRange As Range
Set dstRange = dstWS.Range("A1").Resize(UBound(dstData, 1), UBound(dstData, 2))
dstRange = dstData
This runs very fast. Here's the entire module:
Option Explicit
'--- convenience declarations for accessing data columns
Private Enum SrcColumns
ID = 1
Number = 2
Indent = 3
End Enum
Private Enum DstColumns
L1 = 1
L2 = 2
L3 = 3
Number = 4
End Enum
Public Function GetIndent(ByRef target As Range) As Long
'--- UDF to return the numeric indent level of the target cell
GetIndent = target.IndentLevel
End Function
Sub ConvertToDatabase()
Dim srcWS As Worksheet
Dim dstWS As Worksheet
Set srcWS = ThisWorkbook.Sheets("Sheet1")
Set dstWS = ThisWorkbook.Sheets("Sheet2")
'--- get our source data into an array
Dim srcRange As Range
Dim srcData As Variant
Set srcRange = srcWS.UsedRange
srcData = srcRange
'--- we can determine how many rows in the destination database
' by getting a count of the highest indent level in the array
Const MAX_LEVEL = 2
Dim i As Long
Dim maxDBRows As Long
For i = 1 To UBound(srcData, 1)
If srcData(i, SrcColumns.Indent) = MAX_LEVEL Then
maxDBRows = maxDBRows + 1
End If
Next i
'--- establish an empty database
Dim dstData() As Variant
ReDim dstData(1 To maxDBRows, 1 To 4)
'--- load up the database
Dim level1 As String
Dim level2 As String
Dim level3 As String
Dim newDBRow As Long
newDBRow = 1
For i = 1 To UBound(srcData, 1)
Select Case srcData(i, SrcColumns.Indent)
Case 0
level1 = srcData(i, SrcColumns.ID)
Case 1
level2 = srcData(i, SrcColumns.ID)
Case 2
level3 = srcData(i, SrcColumns.ID)
dstData(newDBRow, DstColumns.L1) = level1
dstData(newDBRow, DstColumns.L2) = level2
dstData(newDBRow, DstColumns.L3) = level3
dstData(newDBRow, DstColumns.Number) = srcData(i, SrcColumns.Number)
newDBRow = newDBRow + 1
End Select
Next i
'--- finally copy the array out to the destination
Dim dstRange As Range
Set dstRange = dstWS.Range("A1").Resize(UBound(dstData, 1), UBound(dstData, 2))
dstRange = dstData
End Sub
In reviewing your code first, there are several things you can do to make your code more consistent.
- Always use
Option Explicit
. Please. - When you're looking at performance, you can do more than just disable
ScreenUpdating
. See this answer for my usual solution if you feel you still need it. - Typical professional developers will use start variable names with lower case letters. Functions/Subs will start with upper case. CamelCaseVariableOrSubNames is also most common.
- Your "main code" loop uses
Rows
, which implies the rows on the active worksheet. "Implying" which rows you're referencing will get you into loads of trouble. Always declare variables to specifically reference which worksheet or range that you're using and it's easier to keep it straight.
As with many of the performance questions in Code Review, your need for speed will be solved with a memory-based array. But one of the stumbling blocks you have is detecting the indent level for each of the rows in your source data. My quick solution is to create a "helper column" of data next to your source that uses a simple User Defined Function (UDF) to identify the indent level. The UDF is a single line:
Public Function GetIndent(ByRef target As Range) As Long
'--- UDF to return the numeric indent level of the target cell
' handles the multi-cell case and will return the indent
' level of ONLY the top left cell of the range
GetIndent = target.Resize(1, 1).IndentLevel
End Function
Using this function in the first column to the right of your data (=GetIndent(A1)
) now turns my source data into this:
I had to do this because if I pull your original source data into an array, the array loses the indent level information. Otherwise, I'd have to continually refer back to the worksheet which is taking the bulk of your processing time.
A quick side note on how I am defining and accessing columns of data in my code. (I deeply regret I've lost track of which user on SO/CR I lifted this tip from. Whoever you are, mad props!) I find that much of my column-based data can change "shape" over a period of development and use. Columns can be added or deleted or switched in order. Hard-coding the column number in code then becomes problematic and forces lots of code changes to keep up. For the longest time I defined a set of Const
declarations to keep track of column numbers, such as
Const COL_FIRST_NAME As Long = 1
Const COL_LAST_NAME As Long = 2
Const COL_ADDRESS As Long = 3
And this works just fine, but the names get tedious and it's easy to lose track of which constant to use for which range of data. So from one of the many things I've learned here, I now create an Enum
to define column indexes that can more specifically be tied to a set of data. In the course of your solution, I have created
'--- convenience declarations for accessing data columns
Private Enum SrcColumns
ID = 1
Number = 2
Indent = 3
End Enum
Private Enum DstColumns
L1 = 1
L2 = 2
L3 = 3
Number = 4
End Enum
You'll see how they are used below.
First get your data into your memory-based array:
Dim srcWS As Worksheet
Dim dstWS As Worksheet
Set srcWS = ThisWorkbook.Sheets("Sheet1")
Set dstWS = ThisWorkbook.Sheets("Sheet2")
'--- get our source data into an array
Dim srcRange As Range
Dim srcData As Variant
Set srcRange = srcWS.UsedRange
srcData = srcRange
We have to next figure out how many rows we'll need in our resulting database. This turns out to be straightforward by counting the number of times the maximum indent level appears in the data. In this case, the max indent level is 2. So:
Const MAX_LEVEL = 2
Dim i As Long
Dim maxDBRows As Long
For i = 1 To UBound(srcData, 1)
If srcData(i, SrcColumns.Indent) = MAX_LEVEL Then
maxDBRows = maxDBRows + 1
End If
Next i
Optionally (ideally), you can dynamically determine the maximum indent level instead of creating a Const
. You could use a WorksheetFunction
to accomplish the same thing if you'd prefer.
In order to create your database, I'm making a strict assumption that you will always encounter previous indent levels before reaching the maximum indent. This means that inside my loop I can capture all the level labels up to the maximum level and keep them. So creating the database now becomes a simple loop:
For i = 1 To UBound(srcData, 1)
Select Case srcData(i, SrcColumns.Indent)
Case 0
level1 = srcData(i, SrcColumns.ID)
Case 1
level2 = srcData(i, SrcColumns.ID)
Case 2
level3 = srcData(i, SrcColumns.ID)
dstData(newDBRow, DstColumns.L1) = level1
dstData(newDBRow, DstColumns.L2) = level2
dstData(newDBRow, DstColumns.L3) = level3
dstData(newDBRow, DstColumns.Number) = srcData(i, SrcColumns.Number)
newDBRow = newDBRow + 1
End Select
Next i
And finally, it's a quick copy to get the database array out to the destination:
Dim dstRange As Range
Set dstRange = dstWS.Range("A1").Resize(UBound(dstData, 1), UBound(dstData, 2))
dstRange = dstData
This runs very fast. Here's the entire module:
Option Explicit
'--- convenience declarations for accessing data columns
Private Enum SrcColumns
ID = 1
Number = 2
Indent = 3
End Enum
Private Enum DstColumns
L1 = 1
L2 = 2
L3 = 3
Number = 4
End Enum
Public Function GetIndent(ByRef target As Range) As Long
'--- UDF to return the numeric indent level of the target cell
GetIndent = target.IndentLevel
End Function
Sub ConvertToDatabase()
Dim srcWS As Worksheet
Dim dstWS As Worksheet
Set srcWS = ThisWorkbook.Sheets("Sheet1")
Set dstWS = ThisWorkbook.Sheets("Sheet2")
'--- get our source data into an array
Dim srcRange As Range
Dim srcData As Variant
Set srcRange = srcWS.UsedRange
srcData = srcRange
'--- we can determine how many rows in the destination database
' by getting a count of the highest indent level in the array
Const MAX_LEVEL = 2
Dim i As Long
Dim maxDBRows As Long
For i = 1 To UBound(srcData, 1)
If srcData(i, SrcColumns.Indent) = MAX_LEVEL Then
maxDBRows = maxDBRows + 1
End If
Next i
'--- establish an empty database
Dim dstData() As Variant
ReDim dstData(1 To maxDBRows, 1 To 4)
'--- load up the database
Dim level1 As String
Dim level2 As String
Dim level3 As String
Dim newDBRow As Long
newDBRow = 1
For i = 1 To UBound(srcData, 1)
Select Case srcData(i, SrcColumns.Indent)
Case 0
level1 = srcData(i, SrcColumns.ID)
Case 1
level2 = srcData(i, SrcColumns.ID)
Case 2
level3 = srcData(i, SrcColumns.ID)
dstData(newDBRow, DstColumns.L1) = level1
dstData(newDBRow, DstColumns.L2) = level2
dstData(newDBRow, DstColumns.L3) = level3
dstData(newDBRow, DstColumns.Number) = srcData(i, SrcColumns.Number)
newDBRow = newDBRow + 1
End Select
Next i
'--- finally copy the array out to the destination
Dim dstRange As Range
Set dstRange = dstWS.Range("A1").Resize(UBound(dstData, 1), UBound(dstData, 2))
dstRange = dstData
End Sub
edited yesterday
answered Aug 3 at 15:02
PeterT
1,153210
1,153210
Named Ranges can also be used to stabilise code instead of fixed column numbers. I am working on a code base at the moment where I am using "magic numbers" similar to your constants (e.g.COL_SOMETHING1_SOMETHING2
) but converting between column letter and column number when fixing the template is tedious. In the process of changing to named columns which means that the code base would never change again (). _() for various definitions of "never"_
â AJD
Aug 3 at 21:05
1
With yourIndentLevel
UDF: This works fine on an single cell, but can fail if multiple cells are passed. There was once an MSDN article that explained this but the new MSDN format is a lot blander and doesn't contain this useful in-depth information anymore.
â AJD
Aug 3 at 21:07
You make a good point about multiple cells being passed to the UDF. I've modified the code to handle that case -- simplistically assuming that the indent level of the first cell (upper left) of the range will be returned.
â PeterT
yesterday
add a comment |Â
Named Ranges can also be used to stabilise code instead of fixed column numbers. I am working on a code base at the moment where I am using "magic numbers" similar to your constants (e.g.COL_SOMETHING1_SOMETHING2
) but converting between column letter and column number when fixing the template is tedious. In the process of changing to named columns which means that the code base would never change again (). _() for various definitions of "never"_
â AJD
Aug 3 at 21:05
1
With yourIndentLevel
UDF: This works fine on an single cell, but can fail if multiple cells are passed. There was once an MSDN article that explained this but the new MSDN format is a lot blander and doesn't contain this useful in-depth information anymore.
â AJD
Aug 3 at 21:07
You make a good point about multiple cells being passed to the UDF. I've modified the code to handle that case -- simplistically assuming that the indent level of the first cell (upper left) of the range will be returned.
â PeterT
yesterday
Named Ranges can also be used to stabilise code instead of fixed column numbers. I am working on a code base at the moment where I am using "magic numbers" similar to your constants (e.g.
COL_SOMETHING1_SOMETHING2
) but converting between column letter and column number when fixing the template is tedious. In the process of changing to named columns which means that the code base would never change again (). _() for various definitions of "never"_â AJD
Aug 3 at 21:05
Named Ranges can also be used to stabilise code instead of fixed column numbers. I am working on a code base at the moment where I am using "magic numbers" similar to your constants (e.g.
COL_SOMETHING1_SOMETHING2
) but converting between column letter and column number when fixing the template is tedious. In the process of changing to named columns which means that the code base would never change again (). _() for various definitions of "never"_â AJD
Aug 3 at 21:05
1
1
With your
IndentLevel
UDF: This works fine on an single cell, but can fail if multiple cells are passed. There was once an MSDN article that explained this but the new MSDN format is a lot blander and doesn't contain this useful in-depth information anymore.â AJD
Aug 3 at 21:07
With your
IndentLevel
UDF: This works fine on an single cell, but can fail if multiple cells are passed. There was once an MSDN article that explained this but the new MSDN format is a lot blander and doesn't contain this useful in-depth information anymore.â AJD
Aug 3 at 21:07
You make a good point about multiple cells being passed to the UDF. I've modified the code to handle that case -- simplistically assuming that the indent level of the first cell (upper left) of the range will be returned.
â PeterT
yesterday
You make a good point about multiple cells being passed to the UDF. I've modified the code to handle that case -- simplistically assuming that the indent level of the first cell (upper left) of the range will be returned.
â PeterT
yesterday
add a comment |Â
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f200827%2fmacro-to-convert-an-indented-hierarchy-to-a-database-format%23new-answer', 'question_page');
);
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password