Macro to convert an indented hierarchy to a database format

The name of the pictureThe name of the pictureThe name of the pictureClash Royale CLAN TAG#URR8PPP





.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty margin-bottom:0;







up vote
4
down vote

favorite












Our business intelligence suite outputs data as an indented hierarchy, as below:



Level 1 1000
Level 2a 600
Level 3a 500
Level 3b 100
Level 2b 400
Level 3c 400


I've written a macro that converts this into a 'database' format, where only rows with the base (most granular) level are preserved, and the parents are listed to the left as below (this way the # column is summable):



 1 2 3 #
Level 1 Level 2a Level 3a 500
Level 1 Level 2a Level 3b 100
Level 1 Level 2b Level 3c 400


The problem I've been running into is that it takes 5-10 minutes to process a file with ~8000 rows. Although my code works, I'm convinced there's a faster way. See below for my code:



Sub Database()

Application.ScreenUpdating = False

Dim WS As Worksheet
Dim SR As Range
Dim Rows As Integer
Dim Indent As Integer
Dim TR As Integer
Dim BR As Integer

Set WS = ActiveWorkbook.ActiveSheet
'StartCell is a function that returns the address of the first cell in the hierarchy
Set SR = WS.Range(StartCell())
Rows = SR.End(xlDown).Row - SR.Row
BR = SR.End(xlDown).Row
TR = SR.Row

'Insert 4 columns & add headers (Level 1, Level 2, etc.)
For x = 0 To 3
SR.EntireColumn.Insert
SR.Offset(-1, -1) = "Level " & x + 3
Next x

x = 0
q = 0

'The main code
Do While x < Rows + 1
'Identifies a row with base-level indentation & sets indent value to this level
If Left(SR.Offset(x, 0), 5) = "P_PC7" Then
Indent = SR.Offset(x, 0).IndentLevel
End If
i = 0
'Loop while the indentation level is greater than one
Do While Indent > 1
'Move upwards and check whether indentation of new cell is one less than initial cell
If SR.Offset(x - i, 0).IndentLevel = Indent - 1 Then
'If so, this cell is the parent of the initial cell - copy it into the appropriate spot to the left of the base level cell
SR.Offset(x - i, 0).Copy SR.Offset(x, -1 * (6 - Indent))
'Set new indent level - the next loop will now look for the parent of the new cell
Indent = SR.Offset(x - i, 0).IndentLevel
'If indent level is not one less than initial cell, continue moving upward
Else: i = i + 1
End If
Loop
x = x + 1
Loop

'Remove all rows that are not base-level
For q = BR To TR Step -1
If WS.Cells(q, 6).IndentLevel <> 5 Then
WS.Cells(q, 6).EntireRow.Delete
End If
Next q

WS.UsedRange.IndentLevel = 0

Application.ScreenUpdating = True

End Sub






share|improve this question



























    up vote
    4
    down vote

    favorite












    Our business intelligence suite outputs data as an indented hierarchy, as below:



    Level 1 1000
    Level 2a 600
    Level 3a 500
    Level 3b 100
    Level 2b 400
    Level 3c 400


    I've written a macro that converts this into a 'database' format, where only rows with the base (most granular) level are preserved, and the parents are listed to the left as below (this way the # column is summable):



     1 2 3 #
    Level 1 Level 2a Level 3a 500
    Level 1 Level 2a Level 3b 100
    Level 1 Level 2b Level 3c 400


    The problem I've been running into is that it takes 5-10 minutes to process a file with ~8000 rows. Although my code works, I'm convinced there's a faster way. See below for my code:



    Sub Database()

    Application.ScreenUpdating = False

    Dim WS As Worksheet
    Dim SR As Range
    Dim Rows As Integer
    Dim Indent As Integer
    Dim TR As Integer
    Dim BR As Integer

    Set WS = ActiveWorkbook.ActiveSheet
    'StartCell is a function that returns the address of the first cell in the hierarchy
    Set SR = WS.Range(StartCell())
    Rows = SR.End(xlDown).Row - SR.Row
    BR = SR.End(xlDown).Row
    TR = SR.Row

    'Insert 4 columns & add headers (Level 1, Level 2, etc.)
    For x = 0 To 3
    SR.EntireColumn.Insert
    SR.Offset(-1, -1) = "Level " & x + 3
    Next x

    x = 0
    q = 0

    'The main code
    Do While x < Rows + 1
    'Identifies a row with base-level indentation & sets indent value to this level
    If Left(SR.Offset(x, 0), 5) = "P_PC7" Then
    Indent = SR.Offset(x, 0).IndentLevel
    End If
    i = 0
    'Loop while the indentation level is greater than one
    Do While Indent > 1
    'Move upwards and check whether indentation of new cell is one less than initial cell
    If SR.Offset(x - i, 0).IndentLevel = Indent - 1 Then
    'If so, this cell is the parent of the initial cell - copy it into the appropriate spot to the left of the base level cell
    SR.Offset(x - i, 0).Copy SR.Offset(x, -1 * (6 - Indent))
    'Set new indent level - the next loop will now look for the parent of the new cell
    Indent = SR.Offset(x - i, 0).IndentLevel
    'If indent level is not one less than initial cell, continue moving upward
    Else: i = i + 1
    End If
    Loop
    x = x + 1
    Loop

    'Remove all rows that are not base-level
    For q = BR To TR Step -1
    If WS.Cells(q, 6).IndentLevel <> 5 Then
    WS.Cells(q, 6).EntireRow.Delete
    End If
    Next q

    WS.UsedRange.IndentLevel = 0

    Application.ScreenUpdating = True

    End Sub






    share|improve this question























      up vote
      4
      down vote

      favorite









      up vote
      4
      down vote

      favorite











      Our business intelligence suite outputs data as an indented hierarchy, as below:



      Level 1 1000
      Level 2a 600
      Level 3a 500
      Level 3b 100
      Level 2b 400
      Level 3c 400


      I've written a macro that converts this into a 'database' format, where only rows with the base (most granular) level are preserved, and the parents are listed to the left as below (this way the # column is summable):



       1 2 3 #
      Level 1 Level 2a Level 3a 500
      Level 1 Level 2a Level 3b 100
      Level 1 Level 2b Level 3c 400


      The problem I've been running into is that it takes 5-10 minutes to process a file with ~8000 rows. Although my code works, I'm convinced there's a faster way. See below for my code:



      Sub Database()

      Application.ScreenUpdating = False

      Dim WS As Worksheet
      Dim SR As Range
      Dim Rows As Integer
      Dim Indent As Integer
      Dim TR As Integer
      Dim BR As Integer

      Set WS = ActiveWorkbook.ActiveSheet
      'StartCell is a function that returns the address of the first cell in the hierarchy
      Set SR = WS.Range(StartCell())
      Rows = SR.End(xlDown).Row - SR.Row
      BR = SR.End(xlDown).Row
      TR = SR.Row

      'Insert 4 columns & add headers (Level 1, Level 2, etc.)
      For x = 0 To 3
      SR.EntireColumn.Insert
      SR.Offset(-1, -1) = "Level " & x + 3
      Next x

      x = 0
      q = 0

      'The main code
      Do While x < Rows + 1
      'Identifies a row with base-level indentation & sets indent value to this level
      If Left(SR.Offset(x, 0), 5) = "P_PC7" Then
      Indent = SR.Offset(x, 0).IndentLevel
      End If
      i = 0
      'Loop while the indentation level is greater than one
      Do While Indent > 1
      'Move upwards and check whether indentation of new cell is one less than initial cell
      If SR.Offset(x - i, 0).IndentLevel = Indent - 1 Then
      'If so, this cell is the parent of the initial cell - copy it into the appropriate spot to the left of the base level cell
      SR.Offset(x - i, 0).Copy SR.Offset(x, -1 * (6 - Indent))
      'Set new indent level - the next loop will now look for the parent of the new cell
      Indent = SR.Offset(x - i, 0).IndentLevel
      'If indent level is not one less than initial cell, continue moving upward
      Else: i = i + 1
      End If
      Loop
      x = x + 1
      Loop

      'Remove all rows that are not base-level
      For q = BR To TR Step -1
      If WS.Cells(q, 6).IndentLevel <> 5 Then
      WS.Cells(q, 6).EntireRow.Delete
      End If
      Next q

      WS.UsedRange.IndentLevel = 0

      Application.ScreenUpdating = True

      End Sub






      share|improve this question













      Our business intelligence suite outputs data as an indented hierarchy, as below:



      Level 1 1000
      Level 2a 600
      Level 3a 500
      Level 3b 100
      Level 2b 400
      Level 3c 400


      I've written a macro that converts this into a 'database' format, where only rows with the base (most granular) level are preserved, and the parents are listed to the left as below (this way the # column is summable):



       1 2 3 #
      Level 1 Level 2a Level 3a 500
      Level 1 Level 2a Level 3b 100
      Level 1 Level 2b Level 3c 400


      The problem I've been running into is that it takes 5-10 minutes to process a file with ~8000 rows. Although my code works, I'm convinced there's a faster way. See below for my code:



      Sub Database()

      Application.ScreenUpdating = False

      Dim WS As Worksheet
      Dim SR As Range
      Dim Rows As Integer
      Dim Indent As Integer
      Dim TR As Integer
      Dim BR As Integer

      Set WS = ActiveWorkbook.ActiveSheet
      'StartCell is a function that returns the address of the first cell in the hierarchy
      Set SR = WS.Range(StartCell())
      Rows = SR.End(xlDown).Row - SR.Row
      BR = SR.End(xlDown).Row
      TR = SR.Row

      'Insert 4 columns & add headers (Level 1, Level 2, etc.)
      For x = 0 To 3
      SR.EntireColumn.Insert
      SR.Offset(-1, -1) = "Level " & x + 3
      Next x

      x = 0
      q = 0

      'The main code
      Do While x < Rows + 1
      'Identifies a row with base-level indentation & sets indent value to this level
      If Left(SR.Offset(x, 0), 5) = "P_PC7" Then
      Indent = SR.Offset(x, 0).IndentLevel
      End If
      i = 0
      'Loop while the indentation level is greater than one
      Do While Indent > 1
      'Move upwards and check whether indentation of new cell is one less than initial cell
      If SR.Offset(x - i, 0).IndentLevel = Indent - 1 Then
      'If so, this cell is the parent of the initial cell - copy it into the appropriate spot to the left of the base level cell
      SR.Offset(x - i, 0).Copy SR.Offset(x, -1 * (6 - Indent))
      'Set new indent level - the next loop will now look for the parent of the new cell
      Indent = SR.Offset(x - i, 0).IndentLevel
      'If indent level is not one less than initial cell, continue moving upward
      Else: i = i + 1
      End If
      Loop
      x = x + 1
      Loop

      'Remove all rows that are not base-level
      For q = BR To TR Step -1
      If WS.Cells(q, 6).IndentLevel <> 5 Then
      WS.Cells(q, 6).EntireRow.Delete
      End If
      Next q

      WS.UsedRange.IndentLevel = 0

      Application.ScreenUpdating = True

      End Sub








      share|improve this question












      share|improve this question




      share|improve this question








      edited Aug 3 at 9:12









      200_success

      123k14143398




      123k14143398









      asked Aug 2 at 15:16









      Theodore Rapanu

      211




      211




















          2 Answers
          2






          active

          oldest

          votes

















          up vote
          2
          down vote













          Thanks very much to JNevill for his solution - it is indeed significantly faster than my original code. I had to make some changes to accommodate more than one # column, as well as headers to the left of the indented hierarchy column i.e.:



          Region Base Level Account 1 Account 2 

          USA Level 1 500 800
          USA Level 2a 300 400
          USA Level 2b 200 400


          Here is my new code based on JNevill's framework:



          Sub HierarchyConvert()


          Application.ScreenUpdating = False
          Application.Calculation = xlCalculationManual

          Dim WS As Worksheet
          Dim SR As Range
          Dim LastRow As Long
          Dim rngReadCell As Range
          Dim rngWriteRow As Range
          Dim Indent As Integer
          Dim LastIndent As Integer
          Dim MaxIndent As Integer
          Dim ValueArray(0 To 19) As Variant

          Set WS = ActiveWorkbook.ActiveSheet
          Set SR = WS.Range(StartCell())
          LastRow = SR.End(xlDown).Row
          MaxIndent = 5

          Set rngWriteRow = WS.Rows(SR.Row)

          For x = 0 To 4
          SR.Offset(0, 1).EntireColumn.Insert
          SR.Offset(-1, 1) = "Level " & 7 - x
          Next x

          SR.Offset(-1, 0) = "Level 2"
          SR.Offset(-1, 5) = "PC"

          For Each rngReadCell In WS.Range(SR.Address & ":B" & LastRow)
          Indent = rngReadCell.IndentLevel
          If Indent <= LastIndent And LastIndent <> 0 Then
          Set rngWriteRow = rngWriteRow.Offset(1)
          For i = 1 To Indent
          rngWriteRow.Cells(1, i + 1).Value = rngWriteRow.Cells(1, i + 1).Offset(-1).Value
          Next i
          End If
          rngWriteRow.Cells(Indent + 2).Value = Trim(rngReadCell.Value)
          If Indent = MaxIndent Then
          'Copies leftmost header from base-level row to top left of write-row
          rngWriteRow.Cells(1) = rngReadCell.Offset(, -1).Value
          'Copies data to right of base-level row to the write-row
          For Z = 0 To 19
          ValueArray(Z) = rngReadCell.Offset(, Z + 6).Value
          Next Z
          For M = 0 To 19
          rngWriteRow.Cells(Indent + M + 3).Value = ValueArray(M)
          Next M
          End If
          LastIndent = Indent
          Next rngReadCell

          Range("A" & SR.Offset(0, 1).End(xlDown).Row + 1 & ":Z" & LastRow + 1).ClearContents
          WS.UsedRange.IndentLevel = 0

          Application.ScreenUpdating = True
          Application.Calculation = xlCalculationAutomatic

          End Sub





          share|improve this answer




























            up vote
            0
            down vote













            In reviewing your code first, there are several things you can do to make your code more consistent.



            1. Always use Option Explicit. Please.

            2. When you're looking at performance, you can do more than just disable ScreenUpdating. See this answer for my usual solution if you feel you still need it.

            3. Typical professional developers will use start variable names with lower case letters. Functions/Subs will start with upper case. CamelCaseVariableOrSubNames is also most common.

            4. Your "main code" loop uses Rows, which implies the rows on the active worksheet. "Implying" which rows you're referencing will get you into loads of trouble. Always declare variables to specifically reference which worksheet or range that you're using and it's easier to keep it straight.

            As with many of the performance questions in Code Review, your need for speed will be solved with a memory-based array. But one of the stumbling blocks you have is detecting the indent level for each of the rows in your source data. My quick solution is to create a "helper column" of data next to your source that uses a simple User Defined Function (UDF) to identify the indent level. The UDF is a single line:



            Public Function GetIndent(ByRef target As Range) As Long
            '--- UDF to return the numeric indent level of the target cell
            ' handles the multi-cell case and will return the indent
            ' level of ONLY the top left cell of the range
            GetIndent = target.Resize(1, 1).IndentLevel
            End Function


            Using this function in the first column to the right of your data (=GetIndent(A1)) now turns my source data into this:



            enter image description here



            I had to do this because if I pull your original source data into an array, the array loses the indent level information. Otherwise, I'd have to continually refer back to the worksheet which is taking the bulk of your processing time.



            A quick side note on how I am defining and accessing columns of data in my code. (I deeply regret I've lost track of which user on SO/CR I lifted this tip from. Whoever you are, mad props!) I find that much of my column-based data can change "shape" over a period of development and use. Columns can be added or deleted or switched in order. Hard-coding the column number in code then becomes problematic and forces lots of code changes to keep up. For the longest time I defined a set of Const declarations to keep track of column numbers, such as



            Const COL_FIRST_NAME As Long = 1
            Const COL_LAST_NAME As Long = 2
            Const COL_ADDRESS As Long = 3


            And this works just fine, but the names get tedious and it's easy to lose track of which constant to use for which range of data. So from one of the many things I've learned here, I now create an Enum to define column indexes that can more specifically be tied to a set of data. In the course of your solution, I have created



            '--- convenience declarations for accessing data columns
            Private Enum SrcColumns
            ID = 1
            Number = 2
            Indent = 3
            End Enum

            Private Enum DstColumns
            L1 = 1
            L2 = 2
            L3 = 3
            Number = 4
            End Enum


            You'll see how they are used below.



            First get your data into your memory-based array:



            Dim srcWS As Worksheet
            Dim dstWS As Worksheet
            Set srcWS = ThisWorkbook.Sheets("Sheet1")
            Set dstWS = ThisWorkbook.Sheets("Sheet2")

            '--- get our source data into an array
            Dim srcRange As Range
            Dim srcData As Variant
            Set srcRange = srcWS.UsedRange
            srcData = srcRange


            We have to next figure out how many rows we'll need in our resulting database. This turns out to be straightforward by counting the number of times the maximum indent level appears in the data. In this case, the max indent level is 2. So:



            Const MAX_LEVEL = 2
            Dim i As Long
            Dim maxDBRows As Long
            For i = 1 To UBound(srcData, 1)
            If srcData(i, SrcColumns.Indent) = MAX_LEVEL Then
            maxDBRows = maxDBRows + 1
            End If
            Next i


            Optionally (ideally), you can dynamically determine the maximum indent level instead of creating a Const. You could use a WorksheetFunction to accomplish the same thing if you'd prefer.



            In order to create your database, I'm making a strict assumption that you will always encounter previous indent levels before reaching the maximum indent. This means that inside my loop I can capture all the level labels up to the maximum level and keep them. So creating the database now becomes a simple loop:



            For i = 1 To UBound(srcData, 1)
            Select Case srcData(i, SrcColumns.Indent)
            Case 0
            level1 = srcData(i, SrcColumns.ID)
            Case 1
            level2 = srcData(i, SrcColumns.ID)
            Case 2
            level3 = srcData(i, SrcColumns.ID)
            dstData(newDBRow, DstColumns.L1) = level1
            dstData(newDBRow, DstColumns.L2) = level2
            dstData(newDBRow, DstColumns.L3) = level3
            dstData(newDBRow, DstColumns.Number) = srcData(i, SrcColumns.Number)
            newDBRow = newDBRow + 1
            End Select
            Next i


            And finally, it's a quick copy to get the database array out to the destination:



            Dim dstRange As Range
            Set dstRange = dstWS.Range("A1").Resize(UBound(dstData, 1), UBound(dstData, 2))
            dstRange = dstData


            This runs very fast. Here's the entire module:



            Option Explicit

            '--- convenience declarations for accessing data columns
            Private Enum SrcColumns
            ID = 1
            Number = 2
            Indent = 3
            End Enum

            Private Enum DstColumns
            L1 = 1
            L2 = 2
            L3 = 3
            Number = 4
            End Enum

            Public Function GetIndent(ByRef target As Range) As Long
            '--- UDF to return the numeric indent level of the target cell
            GetIndent = target.IndentLevel
            End Function

            Sub ConvertToDatabase()
            Dim srcWS As Worksheet
            Dim dstWS As Worksheet
            Set srcWS = ThisWorkbook.Sheets("Sheet1")
            Set dstWS = ThisWorkbook.Sheets("Sheet2")

            '--- get our source data into an array
            Dim srcRange As Range
            Dim srcData As Variant
            Set srcRange = srcWS.UsedRange
            srcData = srcRange

            '--- we can determine how many rows in the destination database
            ' by getting a count of the highest indent level in the array
            Const MAX_LEVEL = 2
            Dim i As Long
            Dim maxDBRows As Long
            For i = 1 To UBound(srcData, 1)
            If srcData(i, SrcColumns.Indent) = MAX_LEVEL Then
            maxDBRows = maxDBRows + 1
            End If
            Next i

            '--- establish an empty database
            Dim dstData() As Variant
            ReDim dstData(1 To maxDBRows, 1 To 4)

            '--- load up the database
            Dim level1 As String
            Dim level2 As String
            Dim level3 As String
            Dim newDBRow As Long
            newDBRow = 1
            For i = 1 To UBound(srcData, 1)
            Select Case srcData(i, SrcColumns.Indent)
            Case 0
            level1 = srcData(i, SrcColumns.ID)
            Case 1
            level2 = srcData(i, SrcColumns.ID)
            Case 2
            level3 = srcData(i, SrcColumns.ID)
            dstData(newDBRow, DstColumns.L1) = level1
            dstData(newDBRow, DstColumns.L2) = level2
            dstData(newDBRow, DstColumns.L3) = level3
            dstData(newDBRow, DstColumns.Number) = srcData(i, SrcColumns.Number)
            newDBRow = newDBRow + 1
            End Select
            Next i

            '--- finally copy the array out to the destination
            Dim dstRange As Range
            Set dstRange = dstWS.Range("A1").Resize(UBound(dstData, 1), UBound(dstData, 2))
            dstRange = dstData

            End Sub





            share|improve this answer























            • Named Ranges can also be used to stabilise code instead of fixed column numbers. I am working on a code base at the moment where I am using "magic numbers" similar to your constants (e.g. COL_SOMETHING1_SOMETHING2) but converting between column letter and column number when fixing the template is tedious. In the process of changing to named columns which means that the code base would never change again (). _() for various definitions of "never"_
              – AJD
              Aug 3 at 21:05






            • 1




              With your IndentLevel UDF: This works fine on an single cell, but can fail if multiple cells are passed. There was once an MSDN article that explained this but the new MSDN format is a lot blander and doesn't contain this useful in-depth information anymore.
              – AJD
              Aug 3 at 21:07










            • You make a good point about multiple cells being passed to the UDF. I've modified the code to handle that case -- simplistically assuming that the indent level of the first cell (upper left) of the range will be returned.
              – PeterT
              yesterday










            Your Answer




            StackExchange.ifUsing("editor", function ()
            return StackExchange.using("mathjaxEditing", function ()
            StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
            StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
            );
            );
            , "mathjax-editing");

            StackExchange.ifUsing("editor", function ()
            StackExchange.using("externalEditor", function ()
            StackExchange.using("snippets", function ()
            StackExchange.snippets.init();
            );
            );
            , "code-snippets");

            StackExchange.ready(function()
            var channelOptions =
            tags: "".split(" "),
            id: "196"
            ;
            initTagRenderer("".split(" "), "".split(" "), channelOptions);

            StackExchange.using("externalEditor", function()
            // Have to fire editor after snippets, if snippets enabled
            if (StackExchange.settings.snippets.snippetsEnabled)
            StackExchange.using("snippets", function()
            createEditor();
            );

            else
            createEditor();

            );

            function createEditor()
            StackExchange.prepareEditor(
            heartbeatType: 'answer',
            convertImagesToLinks: false,
            noModals: false,
            showLowRepImageUploadWarning: true,
            reputationToPostImages: null,
            bindNavPrevention: true,
            postfix: "",
            onDemand: true,
            discardSelector: ".discard-answer"
            ,immediatelyShowMarkdownHelp:true
            );



            );








             

            draft saved


            draft discarded


















            StackExchange.ready(
            function ()
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f200827%2fmacro-to-convert-an-indented-hierarchy-to-a-database-format%23new-answer', 'question_page');

            );

            Post as a guest






























            2 Answers
            2






            active

            oldest

            votes








            2 Answers
            2






            active

            oldest

            votes









            active

            oldest

            votes






            active

            oldest

            votes








            up vote
            2
            down vote













            Thanks very much to JNevill for his solution - it is indeed significantly faster than my original code. I had to make some changes to accommodate more than one # column, as well as headers to the left of the indented hierarchy column i.e.:



            Region Base Level Account 1 Account 2 

            USA Level 1 500 800
            USA Level 2a 300 400
            USA Level 2b 200 400


            Here is my new code based on JNevill's framework:



            Sub HierarchyConvert()


            Application.ScreenUpdating = False
            Application.Calculation = xlCalculationManual

            Dim WS As Worksheet
            Dim SR As Range
            Dim LastRow As Long
            Dim rngReadCell As Range
            Dim rngWriteRow As Range
            Dim Indent As Integer
            Dim LastIndent As Integer
            Dim MaxIndent As Integer
            Dim ValueArray(0 To 19) As Variant

            Set WS = ActiveWorkbook.ActiveSheet
            Set SR = WS.Range(StartCell())
            LastRow = SR.End(xlDown).Row
            MaxIndent = 5

            Set rngWriteRow = WS.Rows(SR.Row)

            For x = 0 To 4
            SR.Offset(0, 1).EntireColumn.Insert
            SR.Offset(-1, 1) = "Level " & 7 - x
            Next x

            SR.Offset(-1, 0) = "Level 2"
            SR.Offset(-1, 5) = "PC"

            For Each rngReadCell In WS.Range(SR.Address & ":B" & LastRow)
            Indent = rngReadCell.IndentLevel
            If Indent <= LastIndent And LastIndent <> 0 Then
            Set rngWriteRow = rngWriteRow.Offset(1)
            For i = 1 To Indent
            rngWriteRow.Cells(1, i + 1).Value = rngWriteRow.Cells(1, i + 1).Offset(-1).Value
            Next i
            End If
            rngWriteRow.Cells(Indent + 2).Value = Trim(rngReadCell.Value)
            If Indent = MaxIndent Then
            'Copies leftmost header from base-level row to top left of write-row
            rngWriteRow.Cells(1) = rngReadCell.Offset(, -1).Value
            'Copies data to right of base-level row to the write-row
            For Z = 0 To 19
            ValueArray(Z) = rngReadCell.Offset(, Z + 6).Value
            Next Z
            For M = 0 To 19
            rngWriteRow.Cells(Indent + M + 3).Value = ValueArray(M)
            Next M
            End If
            LastIndent = Indent
            Next rngReadCell

            Range("A" & SR.Offset(0, 1).End(xlDown).Row + 1 & ":Z" & LastRow + 1).ClearContents
            WS.UsedRange.IndentLevel = 0

            Application.ScreenUpdating = True
            Application.Calculation = xlCalculationAutomatic

            End Sub





            share|improve this answer

























              up vote
              2
              down vote













              Thanks very much to JNevill for his solution - it is indeed significantly faster than my original code. I had to make some changes to accommodate more than one # column, as well as headers to the left of the indented hierarchy column i.e.:



              Region Base Level Account 1 Account 2 

              USA Level 1 500 800
              USA Level 2a 300 400
              USA Level 2b 200 400


              Here is my new code based on JNevill's framework:



              Sub HierarchyConvert()


              Application.ScreenUpdating = False
              Application.Calculation = xlCalculationManual

              Dim WS As Worksheet
              Dim SR As Range
              Dim LastRow As Long
              Dim rngReadCell As Range
              Dim rngWriteRow As Range
              Dim Indent As Integer
              Dim LastIndent As Integer
              Dim MaxIndent As Integer
              Dim ValueArray(0 To 19) As Variant

              Set WS = ActiveWorkbook.ActiveSheet
              Set SR = WS.Range(StartCell())
              LastRow = SR.End(xlDown).Row
              MaxIndent = 5

              Set rngWriteRow = WS.Rows(SR.Row)

              For x = 0 To 4
              SR.Offset(0, 1).EntireColumn.Insert
              SR.Offset(-1, 1) = "Level " & 7 - x
              Next x

              SR.Offset(-1, 0) = "Level 2"
              SR.Offset(-1, 5) = "PC"

              For Each rngReadCell In WS.Range(SR.Address & ":B" & LastRow)
              Indent = rngReadCell.IndentLevel
              If Indent <= LastIndent And LastIndent <> 0 Then
              Set rngWriteRow = rngWriteRow.Offset(1)
              For i = 1 To Indent
              rngWriteRow.Cells(1, i + 1).Value = rngWriteRow.Cells(1, i + 1).Offset(-1).Value
              Next i
              End If
              rngWriteRow.Cells(Indent + 2).Value = Trim(rngReadCell.Value)
              If Indent = MaxIndent Then
              'Copies leftmost header from base-level row to top left of write-row
              rngWriteRow.Cells(1) = rngReadCell.Offset(, -1).Value
              'Copies data to right of base-level row to the write-row
              For Z = 0 To 19
              ValueArray(Z) = rngReadCell.Offset(, Z + 6).Value
              Next Z
              For M = 0 To 19
              rngWriteRow.Cells(Indent + M + 3).Value = ValueArray(M)
              Next M
              End If
              LastIndent = Indent
              Next rngReadCell

              Range("A" & SR.Offset(0, 1).End(xlDown).Row + 1 & ":Z" & LastRow + 1).ClearContents
              WS.UsedRange.IndentLevel = 0

              Application.ScreenUpdating = True
              Application.Calculation = xlCalculationAutomatic

              End Sub





              share|improve this answer























                up vote
                2
                down vote










                up vote
                2
                down vote









                Thanks very much to JNevill for his solution - it is indeed significantly faster than my original code. I had to make some changes to accommodate more than one # column, as well as headers to the left of the indented hierarchy column i.e.:



                Region Base Level Account 1 Account 2 

                USA Level 1 500 800
                USA Level 2a 300 400
                USA Level 2b 200 400


                Here is my new code based on JNevill's framework:



                Sub HierarchyConvert()


                Application.ScreenUpdating = False
                Application.Calculation = xlCalculationManual

                Dim WS As Worksheet
                Dim SR As Range
                Dim LastRow As Long
                Dim rngReadCell As Range
                Dim rngWriteRow As Range
                Dim Indent As Integer
                Dim LastIndent As Integer
                Dim MaxIndent As Integer
                Dim ValueArray(0 To 19) As Variant

                Set WS = ActiveWorkbook.ActiveSheet
                Set SR = WS.Range(StartCell())
                LastRow = SR.End(xlDown).Row
                MaxIndent = 5

                Set rngWriteRow = WS.Rows(SR.Row)

                For x = 0 To 4
                SR.Offset(0, 1).EntireColumn.Insert
                SR.Offset(-1, 1) = "Level " & 7 - x
                Next x

                SR.Offset(-1, 0) = "Level 2"
                SR.Offset(-1, 5) = "PC"

                For Each rngReadCell In WS.Range(SR.Address & ":B" & LastRow)
                Indent = rngReadCell.IndentLevel
                If Indent <= LastIndent And LastIndent <> 0 Then
                Set rngWriteRow = rngWriteRow.Offset(1)
                For i = 1 To Indent
                rngWriteRow.Cells(1, i + 1).Value = rngWriteRow.Cells(1, i + 1).Offset(-1).Value
                Next i
                End If
                rngWriteRow.Cells(Indent + 2).Value = Trim(rngReadCell.Value)
                If Indent = MaxIndent Then
                'Copies leftmost header from base-level row to top left of write-row
                rngWriteRow.Cells(1) = rngReadCell.Offset(, -1).Value
                'Copies data to right of base-level row to the write-row
                For Z = 0 To 19
                ValueArray(Z) = rngReadCell.Offset(, Z + 6).Value
                Next Z
                For M = 0 To 19
                rngWriteRow.Cells(Indent + M + 3).Value = ValueArray(M)
                Next M
                End If
                LastIndent = Indent
                Next rngReadCell

                Range("A" & SR.Offset(0, 1).End(xlDown).Row + 1 & ":Z" & LastRow + 1).ClearContents
                WS.UsedRange.IndentLevel = 0

                Application.ScreenUpdating = True
                Application.Calculation = xlCalculationAutomatic

                End Sub





                share|improve this answer













                Thanks very much to JNevill for his solution - it is indeed significantly faster than my original code. I had to make some changes to accommodate more than one # column, as well as headers to the left of the indented hierarchy column i.e.:



                Region Base Level Account 1 Account 2 

                USA Level 1 500 800
                USA Level 2a 300 400
                USA Level 2b 200 400


                Here is my new code based on JNevill's framework:



                Sub HierarchyConvert()


                Application.ScreenUpdating = False
                Application.Calculation = xlCalculationManual

                Dim WS As Worksheet
                Dim SR As Range
                Dim LastRow As Long
                Dim rngReadCell As Range
                Dim rngWriteRow As Range
                Dim Indent As Integer
                Dim LastIndent As Integer
                Dim MaxIndent As Integer
                Dim ValueArray(0 To 19) As Variant

                Set WS = ActiveWorkbook.ActiveSheet
                Set SR = WS.Range(StartCell())
                LastRow = SR.End(xlDown).Row
                MaxIndent = 5

                Set rngWriteRow = WS.Rows(SR.Row)

                For x = 0 To 4
                SR.Offset(0, 1).EntireColumn.Insert
                SR.Offset(-1, 1) = "Level " & 7 - x
                Next x

                SR.Offset(-1, 0) = "Level 2"
                SR.Offset(-1, 5) = "PC"

                For Each rngReadCell In WS.Range(SR.Address & ":B" & LastRow)
                Indent = rngReadCell.IndentLevel
                If Indent <= LastIndent And LastIndent <> 0 Then
                Set rngWriteRow = rngWriteRow.Offset(1)
                For i = 1 To Indent
                rngWriteRow.Cells(1, i + 1).Value = rngWriteRow.Cells(1, i + 1).Offset(-1).Value
                Next i
                End If
                rngWriteRow.Cells(Indent + 2).Value = Trim(rngReadCell.Value)
                If Indent = MaxIndent Then
                'Copies leftmost header from base-level row to top left of write-row
                rngWriteRow.Cells(1) = rngReadCell.Offset(, -1).Value
                'Copies data to right of base-level row to the write-row
                For Z = 0 To 19
                ValueArray(Z) = rngReadCell.Offset(, Z + 6).Value
                Next Z
                For M = 0 To 19
                rngWriteRow.Cells(Indent + M + 3).Value = ValueArray(M)
                Next M
                End If
                LastIndent = Indent
                Next rngReadCell

                Range("A" & SR.Offset(0, 1).End(xlDown).Row + 1 & ":Z" & LastRow + 1).ClearContents
                WS.UsedRange.IndentLevel = 0

                Application.ScreenUpdating = True
                Application.Calculation = xlCalculationAutomatic

                End Sub






                share|improve this answer













                share|improve this answer



                share|improve this answer











                answered Aug 3 at 15:05









                Theodore Rapanu

                211




                211






















                    up vote
                    0
                    down vote













                    In reviewing your code first, there are several things you can do to make your code more consistent.



                    1. Always use Option Explicit. Please.

                    2. When you're looking at performance, you can do more than just disable ScreenUpdating. See this answer for my usual solution if you feel you still need it.

                    3. Typical professional developers will use start variable names with lower case letters. Functions/Subs will start with upper case. CamelCaseVariableOrSubNames is also most common.

                    4. Your "main code" loop uses Rows, which implies the rows on the active worksheet. "Implying" which rows you're referencing will get you into loads of trouble. Always declare variables to specifically reference which worksheet or range that you're using and it's easier to keep it straight.

                    As with many of the performance questions in Code Review, your need for speed will be solved with a memory-based array. But one of the stumbling blocks you have is detecting the indent level for each of the rows in your source data. My quick solution is to create a "helper column" of data next to your source that uses a simple User Defined Function (UDF) to identify the indent level. The UDF is a single line:



                    Public Function GetIndent(ByRef target As Range) As Long
                    '--- UDF to return the numeric indent level of the target cell
                    ' handles the multi-cell case and will return the indent
                    ' level of ONLY the top left cell of the range
                    GetIndent = target.Resize(1, 1).IndentLevel
                    End Function


                    Using this function in the first column to the right of your data (=GetIndent(A1)) now turns my source data into this:



                    enter image description here



                    I had to do this because if I pull your original source data into an array, the array loses the indent level information. Otherwise, I'd have to continually refer back to the worksheet which is taking the bulk of your processing time.



                    A quick side note on how I am defining and accessing columns of data in my code. (I deeply regret I've lost track of which user on SO/CR I lifted this tip from. Whoever you are, mad props!) I find that much of my column-based data can change "shape" over a period of development and use. Columns can be added or deleted or switched in order. Hard-coding the column number in code then becomes problematic and forces lots of code changes to keep up. For the longest time I defined a set of Const declarations to keep track of column numbers, such as



                    Const COL_FIRST_NAME As Long = 1
                    Const COL_LAST_NAME As Long = 2
                    Const COL_ADDRESS As Long = 3


                    And this works just fine, but the names get tedious and it's easy to lose track of which constant to use for which range of data. So from one of the many things I've learned here, I now create an Enum to define column indexes that can more specifically be tied to a set of data. In the course of your solution, I have created



                    '--- convenience declarations for accessing data columns
                    Private Enum SrcColumns
                    ID = 1
                    Number = 2
                    Indent = 3
                    End Enum

                    Private Enum DstColumns
                    L1 = 1
                    L2 = 2
                    L3 = 3
                    Number = 4
                    End Enum


                    You'll see how they are used below.



                    First get your data into your memory-based array:



                    Dim srcWS As Worksheet
                    Dim dstWS As Worksheet
                    Set srcWS = ThisWorkbook.Sheets("Sheet1")
                    Set dstWS = ThisWorkbook.Sheets("Sheet2")

                    '--- get our source data into an array
                    Dim srcRange As Range
                    Dim srcData As Variant
                    Set srcRange = srcWS.UsedRange
                    srcData = srcRange


                    We have to next figure out how many rows we'll need in our resulting database. This turns out to be straightforward by counting the number of times the maximum indent level appears in the data. In this case, the max indent level is 2. So:



                    Const MAX_LEVEL = 2
                    Dim i As Long
                    Dim maxDBRows As Long
                    For i = 1 To UBound(srcData, 1)
                    If srcData(i, SrcColumns.Indent) = MAX_LEVEL Then
                    maxDBRows = maxDBRows + 1
                    End If
                    Next i


                    Optionally (ideally), you can dynamically determine the maximum indent level instead of creating a Const. You could use a WorksheetFunction to accomplish the same thing if you'd prefer.



                    In order to create your database, I'm making a strict assumption that you will always encounter previous indent levels before reaching the maximum indent. This means that inside my loop I can capture all the level labels up to the maximum level and keep them. So creating the database now becomes a simple loop:



                    For i = 1 To UBound(srcData, 1)
                    Select Case srcData(i, SrcColumns.Indent)
                    Case 0
                    level1 = srcData(i, SrcColumns.ID)
                    Case 1
                    level2 = srcData(i, SrcColumns.ID)
                    Case 2
                    level3 = srcData(i, SrcColumns.ID)
                    dstData(newDBRow, DstColumns.L1) = level1
                    dstData(newDBRow, DstColumns.L2) = level2
                    dstData(newDBRow, DstColumns.L3) = level3
                    dstData(newDBRow, DstColumns.Number) = srcData(i, SrcColumns.Number)
                    newDBRow = newDBRow + 1
                    End Select
                    Next i


                    And finally, it's a quick copy to get the database array out to the destination:



                    Dim dstRange As Range
                    Set dstRange = dstWS.Range("A1").Resize(UBound(dstData, 1), UBound(dstData, 2))
                    dstRange = dstData


                    This runs very fast. Here's the entire module:



                    Option Explicit

                    '--- convenience declarations for accessing data columns
                    Private Enum SrcColumns
                    ID = 1
                    Number = 2
                    Indent = 3
                    End Enum

                    Private Enum DstColumns
                    L1 = 1
                    L2 = 2
                    L3 = 3
                    Number = 4
                    End Enum

                    Public Function GetIndent(ByRef target As Range) As Long
                    '--- UDF to return the numeric indent level of the target cell
                    GetIndent = target.IndentLevel
                    End Function

                    Sub ConvertToDatabase()
                    Dim srcWS As Worksheet
                    Dim dstWS As Worksheet
                    Set srcWS = ThisWorkbook.Sheets("Sheet1")
                    Set dstWS = ThisWorkbook.Sheets("Sheet2")

                    '--- get our source data into an array
                    Dim srcRange As Range
                    Dim srcData As Variant
                    Set srcRange = srcWS.UsedRange
                    srcData = srcRange

                    '--- we can determine how many rows in the destination database
                    ' by getting a count of the highest indent level in the array
                    Const MAX_LEVEL = 2
                    Dim i As Long
                    Dim maxDBRows As Long
                    For i = 1 To UBound(srcData, 1)
                    If srcData(i, SrcColumns.Indent) = MAX_LEVEL Then
                    maxDBRows = maxDBRows + 1
                    End If
                    Next i

                    '--- establish an empty database
                    Dim dstData() As Variant
                    ReDim dstData(1 To maxDBRows, 1 To 4)

                    '--- load up the database
                    Dim level1 As String
                    Dim level2 As String
                    Dim level3 As String
                    Dim newDBRow As Long
                    newDBRow = 1
                    For i = 1 To UBound(srcData, 1)
                    Select Case srcData(i, SrcColumns.Indent)
                    Case 0
                    level1 = srcData(i, SrcColumns.ID)
                    Case 1
                    level2 = srcData(i, SrcColumns.ID)
                    Case 2
                    level3 = srcData(i, SrcColumns.ID)
                    dstData(newDBRow, DstColumns.L1) = level1
                    dstData(newDBRow, DstColumns.L2) = level2
                    dstData(newDBRow, DstColumns.L3) = level3
                    dstData(newDBRow, DstColumns.Number) = srcData(i, SrcColumns.Number)
                    newDBRow = newDBRow + 1
                    End Select
                    Next i

                    '--- finally copy the array out to the destination
                    Dim dstRange As Range
                    Set dstRange = dstWS.Range("A1").Resize(UBound(dstData, 1), UBound(dstData, 2))
                    dstRange = dstData

                    End Sub





                    share|improve this answer























                    • Named Ranges can also be used to stabilise code instead of fixed column numbers. I am working on a code base at the moment where I am using "magic numbers" similar to your constants (e.g. COL_SOMETHING1_SOMETHING2) but converting between column letter and column number when fixing the template is tedious. In the process of changing to named columns which means that the code base would never change again (). _() for various definitions of "never"_
                      – AJD
                      Aug 3 at 21:05






                    • 1




                      With your IndentLevel UDF: This works fine on an single cell, but can fail if multiple cells are passed. There was once an MSDN article that explained this but the new MSDN format is a lot blander and doesn't contain this useful in-depth information anymore.
                      – AJD
                      Aug 3 at 21:07










                    • You make a good point about multiple cells being passed to the UDF. I've modified the code to handle that case -- simplistically assuming that the indent level of the first cell (upper left) of the range will be returned.
                      – PeterT
                      yesterday














                    up vote
                    0
                    down vote













                    In reviewing your code first, there are several things you can do to make your code more consistent.



                    1. Always use Option Explicit. Please.

                    2. When you're looking at performance, you can do more than just disable ScreenUpdating. See this answer for my usual solution if you feel you still need it.

                    3. Typical professional developers will use start variable names with lower case letters. Functions/Subs will start with upper case. CamelCaseVariableOrSubNames is also most common.

                    4. Your "main code" loop uses Rows, which implies the rows on the active worksheet. "Implying" which rows you're referencing will get you into loads of trouble. Always declare variables to specifically reference which worksheet or range that you're using and it's easier to keep it straight.

                    As with many of the performance questions in Code Review, your need for speed will be solved with a memory-based array. But one of the stumbling blocks you have is detecting the indent level for each of the rows in your source data. My quick solution is to create a "helper column" of data next to your source that uses a simple User Defined Function (UDF) to identify the indent level. The UDF is a single line:



                    Public Function GetIndent(ByRef target As Range) As Long
                    '--- UDF to return the numeric indent level of the target cell
                    ' handles the multi-cell case and will return the indent
                    ' level of ONLY the top left cell of the range
                    GetIndent = target.Resize(1, 1).IndentLevel
                    End Function


                    Using this function in the first column to the right of your data (=GetIndent(A1)) now turns my source data into this:



                    enter image description here



                    I had to do this because if I pull your original source data into an array, the array loses the indent level information. Otherwise, I'd have to continually refer back to the worksheet which is taking the bulk of your processing time.



                    A quick side note on how I am defining and accessing columns of data in my code. (I deeply regret I've lost track of which user on SO/CR I lifted this tip from. Whoever you are, mad props!) I find that much of my column-based data can change "shape" over a period of development and use. Columns can be added or deleted or switched in order. Hard-coding the column number in code then becomes problematic and forces lots of code changes to keep up. For the longest time I defined a set of Const declarations to keep track of column numbers, such as



                    Const COL_FIRST_NAME As Long = 1
                    Const COL_LAST_NAME As Long = 2
                    Const COL_ADDRESS As Long = 3


                    And this works just fine, but the names get tedious and it's easy to lose track of which constant to use for which range of data. So from one of the many things I've learned here, I now create an Enum to define column indexes that can more specifically be tied to a set of data. In the course of your solution, I have created



                    '--- convenience declarations for accessing data columns
                    Private Enum SrcColumns
                    ID = 1
                    Number = 2
                    Indent = 3
                    End Enum

                    Private Enum DstColumns
                    L1 = 1
                    L2 = 2
                    L3 = 3
                    Number = 4
                    End Enum


                    You'll see how they are used below.



                    First get your data into your memory-based array:



                    Dim srcWS As Worksheet
                    Dim dstWS As Worksheet
                    Set srcWS = ThisWorkbook.Sheets("Sheet1")
                    Set dstWS = ThisWorkbook.Sheets("Sheet2")

                    '--- get our source data into an array
                    Dim srcRange As Range
                    Dim srcData As Variant
                    Set srcRange = srcWS.UsedRange
                    srcData = srcRange


                    We have to next figure out how many rows we'll need in our resulting database. This turns out to be straightforward by counting the number of times the maximum indent level appears in the data. In this case, the max indent level is 2. So:



                    Const MAX_LEVEL = 2
                    Dim i As Long
                    Dim maxDBRows As Long
                    For i = 1 To UBound(srcData, 1)
                    If srcData(i, SrcColumns.Indent) = MAX_LEVEL Then
                    maxDBRows = maxDBRows + 1
                    End If
                    Next i


                    Optionally (ideally), you can dynamically determine the maximum indent level instead of creating a Const. You could use a WorksheetFunction to accomplish the same thing if you'd prefer.



                    In order to create your database, I'm making a strict assumption that you will always encounter previous indent levels before reaching the maximum indent. This means that inside my loop I can capture all the level labels up to the maximum level and keep them. So creating the database now becomes a simple loop:



                    For i = 1 To UBound(srcData, 1)
                    Select Case srcData(i, SrcColumns.Indent)
                    Case 0
                    level1 = srcData(i, SrcColumns.ID)
                    Case 1
                    level2 = srcData(i, SrcColumns.ID)
                    Case 2
                    level3 = srcData(i, SrcColumns.ID)
                    dstData(newDBRow, DstColumns.L1) = level1
                    dstData(newDBRow, DstColumns.L2) = level2
                    dstData(newDBRow, DstColumns.L3) = level3
                    dstData(newDBRow, DstColumns.Number) = srcData(i, SrcColumns.Number)
                    newDBRow = newDBRow + 1
                    End Select
                    Next i


                    And finally, it's a quick copy to get the database array out to the destination:



                    Dim dstRange As Range
                    Set dstRange = dstWS.Range("A1").Resize(UBound(dstData, 1), UBound(dstData, 2))
                    dstRange = dstData


                    This runs very fast. Here's the entire module:



                    Option Explicit

                    '--- convenience declarations for accessing data columns
                    Private Enum SrcColumns
                    ID = 1
                    Number = 2
                    Indent = 3
                    End Enum

                    Private Enum DstColumns
                    L1 = 1
                    L2 = 2
                    L3 = 3
                    Number = 4
                    End Enum

                    Public Function GetIndent(ByRef target As Range) As Long
                    '--- UDF to return the numeric indent level of the target cell
                    GetIndent = target.IndentLevel
                    End Function

                    Sub ConvertToDatabase()
                    Dim srcWS As Worksheet
                    Dim dstWS As Worksheet
                    Set srcWS = ThisWorkbook.Sheets("Sheet1")
                    Set dstWS = ThisWorkbook.Sheets("Sheet2")

                    '--- get our source data into an array
                    Dim srcRange As Range
                    Dim srcData As Variant
                    Set srcRange = srcWS.UsedRange
                    srcData = srcRange

                    '--- we can determine how many rows in the destination database
                    ' by getting a count of the highest indent level in the array
                    Const MAX_LEVEL = 2
                    Dim i As Long
                    Dim maxDBRows As Long
                    For i = 1 To UBound(srcData, 1)
                    If srcData(i, SrcColumns.Indent) = MAX_LEVEL Then
                    maxDBRows = maxDBRows + 1
                    End If
                    Next i

                    '--- establish an empty database
                    Dim dstData() As Variant
                    ReDim dstData(1 To maxDBRows, 1 To 4)

                    '--- load up the database
                    Dim level1 As String
                    Dim level2 As String
                    Dim level3 As String
                    Dim newDBRow As Long
                    newDBRow = 1
                    For i = 1 To UBound(srcData, 1)
                    Select Case srcData(i, SrcColumns.Indent)
                    Case 0
                    level1 = srcData(i, SrcColumns.ID)
                    Case 1
                    level2 = srcData(i, SrcColumns.ID)
                    Case 2
                    level3 = srcData(i, SrcColumns.ID)
                    dstData(newDBRow, DstColumns.L1) = level1
                    dstData(newDBRow, DstColumns.L2) = level2
                    dstData(newDBRow, DstColumns.L3) = level3
                    dstData(newDBRow, DstColumns.Number) = srcData(i, SrcColumns.Number)
                    newDBRow = newDBRow + 1
                    End Select
                    Next i

                    '--- finally copy the array out to the destination
                    Dim dstRange As Range
                    Set dstRange = dstWS.Range("A1").Resize(UBound(dstData, 1), UBound(dstData, 2))
                    dstRange = dstData

                    End Sub





                    share|improve this answer























                    • Named Ranges can also be used to stabilise code instead of fixed column numbers. I am working on a code base at the moment where I am using "magic numbers" similar to your constants (e.g. COL_SOMETHING1_SOMETHING2) but converting between column letter and column number when fixing the template is tedious. In the process of changing to named columns which means that the code base would never change again (). _() for various definitions of "never"_
                      – AJD
                      Aug 3 at 21:05






                    • 1




                      With your IndentLevel UDF: This works fine on an single cell, but can fail if multiple cells are passed. There was once an MSDN article that explained this but the new MSDN format is a lot blander and doesn't contain this useful in-depth information anymore.
                      – AJD
                      Aug 3 at 21:07










                    • You make a good point about multiple cells being passed to the UDF. I've modified the code to handle that case -- simplistically assuming that the indent level of the first cell (upper left) of the range will be returned.
                      – PeterT
                      yesterday












                    up vote
                    0
                    down vote










                    up vote
                    0
                    down vote









                    In reviewing your code first, there are several things you can do to make your code more consistent.



                    1. Always use Option Explicit. Please.

                    2. When you're looking at performance, you can do more than just disable ScreenUpdating. See this answer for my usual solution if you feel you still need it.

                    3. Typical professional developers will use start variable names with lower case letters. Functions/Subs will start with upper case. CamelCaseVariableOrSubNames is also most common.

                    4. Your "main code" loop uses Rows, which implies the rows on the active worksheet. "Implying" which rows you're referencing will get you into loads of trouble. Always declare variables to specifically reference which worksheet or range that you're using and it's easier to keep it straight.

                    As with many of the performance questions in Code Review, your need for speed will be solved with a memory-based array. But one of the stumbling blocks you have is detecting the indent level for each of the rows in your source data. My quick solution is to create a "helper column" of data next to your source that uses a simple User Defined Function (UDF) to identify the indent level. The UDF is a single line:



                    Public Function GetIndent(ByRef target As Range) As Long
                    '--- UDF to return the numeric indent level of the target cell
                    ' handles the multi-cell case and will return the indent
                    ' level of ONLY the top left cell of the range
                    GetIndent = target.Resize(1, 1).IndentLevel
                    End Function


                    Using this function in the first column to the right of your data (=GetIndent(A1)) now turns my source data into this:



                    enter image description here



                    I had to do this because if I pull your original source data into an array, the array loses the indent level information. Otherwise, I'd have to continually refer back to the worksheet which is taking the bulk of your processing time.



                    A quick side note on how I am defining and accessing columns of data in my code. (I deeply regret I've lost track of which user on SO/CR I lifted this tip from. Whoever you are, mad props!) I find that much of my column-based data can change "shape" over a period of development and use. Columns can be added or deleted or switched in order. Hard-coding the column number in code then becomes problematic and forces lots of code changes to keep up. For the longest time I defined a set of Const declarations to keep track of column numbers, such as



                    Const COL_FIRST_NAME As Long = 1
                    Const COL_LAST_NAME As Long = 2
                    Const COL_ADDRESS As Long = 3


                    And this works just fine, but the names get tedious and it's easy to lose track of which constant to use for which range of data. So from one of the many things I've learned here, I now create an Enum to define column indexes that can more specifically be tied to a set of data. In the course of your solution, I have created



                    '--- convenience declarations for accessing data columns
                    Private Enum SrcColumns
                    ID = 1
                    Number = 2
                    Indent = 3
                    End Enum

                    Private Enum DstColumns
                    L1 = 1
                    L2 = 2
                    L3 = 3
                    Number = 4
                    End Enum


                    You'll see how they are used below.



                    First get your data into your memory-based array:



                    Dim srcWS As Worksheet
                    Dim dstWS As Worksheet
                    Set srcWS = ThisWorkbook.Sheets("Sheet1")
                    Set dstWS = ThisWorkbook.Sheets("Sheet2")

                    '--- get our source data into an array
                    Dim srcRange As Range
                    Dim srcData As Variant
                    Set srcRange = srcWS.UsedRange
                    srcData = srcRange


                    We have to next figure out how many rows we'll need in our resulting database. This turns out to be straightforward by counting the number of times the maximum indent level appears in the data. In this case, the max indent level is 2. So:



                    Const MAX_LEVEL = 2
                    Dim i As Long
                    Dim maxDBRows As Long
                    For i = 1 To UBound(srcData, 1)
                    If srcData(i, SrcColumns.Indent) = MAX_LEVEL Then
                    maxDBRows = maxDBRows + 1
                    End If
                    Next i


                    Optionally (ideally), you can dynamically determine the maximum indent level instead of creating a Const. You could use a WorksheetFunction to accomplish the same thing if you'd prefer.



                    In order to create your database, I'm making a strict assumption that you will always encounter previous indent levels before reaching the maximum indent. This means that inside my loop I can capture all the level labels up to the maximum level and keep them. So creating the database now becomes a simple loop:



                    For i = 1 To UBound(srcData, 1)
                    Select Case srcData(i, SrcColumns.Indent)
                    Case 0
                    level1 = srcData(i, SrcColumns.ID)
                    Case 1
                    level2 = srcData(i, SrcColumns.ID)
                    Case 2
                    level3 = srcData(i, SrcColumns.ID)
                    dstData(newDBRow, DstColumns.L1) = level1
                    dstData(newDBRow, DstColumns.L2) = level2
                    dstData(newDBRow, DstColumns.L3) = level3
                    dstData(newDBRow, DstColumns.Number) = srcData(i, SrcColumns.Number)
                    newDBRow = newDBRow + 1
                    End Select
                    Next i


                    And finally, it's a quick copy to get the database array out to the destination:



                    Dim dstRange As Range
                    Set dstRange = dstWS.Range("A1").Resize(UBound(dstData, 1), UBound(dstData, 2))
                    dstRange = dstData


                    This runs very fast. Here's the entire module:



                    Option Explicit

                    '--- convenience declarations for accessing data columns
                    Private Enum SrcColumns
                    ID = 1
                    Number = 2
                    Indent = 3
                    End Enum

                    Private Enum DstColumns
                    L1 = 1
                    L2 = 2
                    L3 = 3
                    Number = 4
                    End Enum

                    Public Function GetIndent(ByRef target As Range) As Long
                    '--- UDF to return the numeric indent level of the target cell
                    GetIndent = target.IndentLevel
                    End Function

                    Sub ConvertToDatabase()
                    Dim srcWS As Worksheet
                    Dim dstWS As Worksheet
                    Set srcWS = ThisWorkbook.Sheets("Sheet1")
                    Set dstWS = ThisWorkbook.Sheets("Sheet2")

                    '--- get our source data into an array
                    Dim srcRange As Range
                    Dim srcData As Variant
                    Set srcRange = srcWS.UsedRange
                    srcData = srcRange

                    '--- we can determine how many rows in the destination database
                    ' by getting a count of the highest indent level in the array
                    Const MAX_LEVEL = 2
                    Dim i As Long
                    Dim maxDBRows As Long
                    For i = 1 To UBound(srcData, 1)
                    If srcData(i, SrcColumns.Indent) = MAX_LEVEL Then
                    maxDBRows = maxDBRows + 1
                    End If
                    Next i

                    '--- establish an empty database
                    Dim dstData() As Variant
                    ReDim dstData(1 To maxDBRows, 1 To 4)

                    '--- load up the database
                    Dim level1 As String
                    Dim level2 As String
                    Dim level3 As String
                    Dim newDBRow As Long
                    newDBRow = 1
                    For i = 1 To UBound(srcData, 1)
                    Select Case srcData(i, SrcColumns.Indent)
                    Case 0
                    level1 = srcData(i, SrcColumns.ID)
                    Case 1
                    level2 = srcData(i, SrcColumns.ID)
                    Case 2
                    level3 = srcData(i, SrcColumns.ID)
                    dstData(newDBRow, DstColumns.L1) = level1
                    dstData(newDBRow, DstColumns.L2) = level2
                    dstData(newDBRow, DstColumns.L3) = level3
                    dstData(newDBRow, DstColumns.Number) = srcData(i, SrcColumns.Number)
                    newDBRow = newDBRow + 1
                    End Select
                    Next i

                    '--- finally copy the array out to the destination
                    Dim dstRange As Range
                    Set dstRange = dstWS.Range("A1").Resize(UBound(dstData, 1), UBound(dstData, 2))
                    dstRange = dstData

                    End Sub





                    share|improve this answer















                    In reviewing your code first, there are several things you can do to make your code more consistent.



                    1. Always use Option Explicit. Please.

                    2. When you're looking at performance, you can do more than just disable ScreenUpdating. See this answer for my usual solution if you feel you still need it.

                    3. Typical professional developers will use start variable names with lower case letters. Functions/Subs will start with upper case. CamelCaseVariableOrSubNames is also most common.

                    4. Your "main code" loop uses Rows, which implies the rows on the active worksheet. "Implying" which rows you're referencing will get you into loads of trouble. Always declare variables to specifically reference which worksheet or range that you're using and it's easier to keep it straight.

                    As with many of the performance questions in Code Review, your need for speed will be solved with a memory-based array. But one of the stumbling blocks you have is detecting the indent level for each of the rows in your source data. My quick solution is to create a "helper column" of data next to your source that uses a simple User Defined Function (UDF) to identify the indent level. The UDF is a single line:



                    Public Function GetIndent(ByRef target As Range) As Long
                    '--- UDF to return the numeric indent level of the target cell
                    ' handles the multi-cell case and will return the indent
                    ' level of ONLY the top left cell of the range
                    GetIndent = target.Resize(1, 1).IndentLevel
                    End Function


                    Using this function in the first column to the right of your data (=GetIndent(A1)) now turns my source data into this:



                    enter image description here



                    I had to do this because if I pull your original source data into an array, the array loses the indent level information. Otherwise, I'd have to continually refer back to the worksheet which is taking the bulk of your processing time.



                    A quick side note on how I am defining and accessing columns of data in my code. (I deeply regret I've lost track of which user on SO/CR I lifted this tip from. Whoever you are, mad props!) I find that much of my column-based data can change "shape" over a period of development and use. Columns can be added or deleted or switched in order. Hard-coding the column number in code then becomes problematic and forces lots of code changes to keep up. For the longest time I defined a set of Const declarations to keep track of column numbers, such as



                    Const COL_FIRST_NAME As Long = 1
                    Const COL_LAST_NAME As Long = 2
                    Const COL_ADDRESS As Long = 3


                    And this works just fine, but the names get tedious and it's easy to lose track of which constant to use for which range of data. So from one of the many things I've learned here, I now create an Enum to define column indexes that can more specifically be tied to a set of data. In the course of your solution, I have created



                    '--- convenience declarations for accessing data columns
                    Private Enum SrcColumns
                    ID = 1
                    Number = 2
                    Indent = 3
                    End Enum

                    Private Enum DstColumns
                    L1 = 1
                    L2 = 2
                    L3 = 3
                    Number = 4
                    End Enum


                    You'll see how they are used below.



                    First get your data into your memory-based array:



                    Dim srcWS As Worksheet
                    Dim dstWS As Worksheet
                    Set srcWS = ThisWorkbook.Sheets("Sheet1")
                    Set dstWS = ThisWorkbook.Sheets("Sheet2")

                    '--- get our source data into an array
                    Dim srcRange As Range
                    Dim srcData As Variant
                    Set srcRange = srcWS.UsedRange
                    srcData = srcRange


                    We have to next figure out how many rows we'll need in our resulting database. This turns out to be straightforward by counting the number of times the maximum indent level appears in the data. In this case, the max indent level is 2. So:



                    Const MAX_LEVEL = 2
                    Dim i As Long
                    Dim maxDBRows As Long
                    For i = 1 To UBound(srcData, 1)
                    If srcData(i, SrcColumns.Indent) = MAX_LEVEL Then
                    maxDBRows = maxDBRows + 1
                    End If
                    Next i


                    Optionally (ideally), you can dynamically determine the maximum indent level instead of creating a Const. You could use a WorksheetFunction to accomplish the same thing if you'd prefer.



                    In order to create your database, I'm making a strict assumption that you will always encounter previous indent levels before reaching the maximum indent. This means that inside my loop I can capture all the level labels up to the maximum level and keep them. So creating the database now becomes a simple loop:



                    For i = 1 To UBound(srcData, 1)
                    Select Case srcData(i, SrcColumns.Indent)
                    Case 0
                    level1 = srcData(i, SrcColumns.ID)
                    Case 1
                    level2 = srcData(i, SrcColumns.ID)
                    Case 2
                    level3 = srcData(i, SrcColumns.ID)
                    dstData(newDBRow, DstColumns.L1) = level1
                    dstData(newDBRow, DstColumns.L2) = level2
                    dstData(newDBRow, DstColumns.L3) = level3
                    dstData(newDBRow, DstColumns.Number) = srcData(i, SrcColumns.Number)
                    newDBRow = newDBRow + 1
                    End Select
                    Next i


                    And finally, it's a quick copy to get the database array out to the destination:



                    Dim dstRange As Range
                    Set dstRange = dstWS.Range("A1").Resize(UBound(dstData, 1), UBound(dstData, 2))
                    dstRange = dstData


                    This runs very fast. Here's the entire module:



                    Option Explicit

                    '--- convenience declarations for accessing data columns
                    Private Enum SrcColumns
                    ID = 1
                    Number = 2
                    Indent = 3
                    End Enum

                    Private Enum DstColumns
                    L1 = 1
                    L2 = 2
                    L3 = 3
                    Number = 4
                    End Enum

                    Public Function GetIndent(ByRef target As Range) As Long
                    '--- UDF to return the numeric indent level of the target cell
                    GetIndent = target.IndentLevel
                    End Function

                    Sub ConvertToDatabase()
                    Dim srcWS As Worksheet
                    Dim dstWS As Worksheet
                    Set srcWS = ThisWorkbook.Sheets("Sheet1")
                    Set dstWS = ThisWorkbook.Sheets("Sheet2")

                    '--- get our source data into an array
                    Dim srcRange As Range
                    Dim srcData As Variant
                    Set srcRange = srcWS.UsedRange
                    srcData = srcRange

                    '--- we can determine how many rows in the destination database
                    ' by getting a count of the highest indent level in the array
                    Const MAX_LEVEL = 2
                    Dim i As Long
                    Dim maxDBRows As Long
                    For i = 1 To UBound(srcData, 1)
                    If srcData(i, SrcColumns.Indent) = MAX_LEVEL Then
                    maxDBRows = maxDBRows + 1
                    End If
                    Next i

                    '--- establish an empty database
                    Dim dstData() As Variant
                    ReDim dstData(1 To maxDBRows, 1 To 4)

                    '--- load up the database
                    Dim level1 As String
                    Dim level2 As String
                    Dim level3 As String
                    Dim newDBRow As Long
                    newDBRow = 1
                    For i = 1 To UBound(srcData, 1)
                    Select Case srcData(i, SrcColumns.Indent)
                    Case 0
                    level1 = srcData(i, SrcColumns.ID)
                    Case 1
                    level2 = srcData(i, SrcColumns.ID)
                    Case 2
                    level3 = srcData(i, SrcColumns.ID)
                    dstData(newDBRow, DstColumns.L1) = level1
                    dstData(newDBRow, DstColumns.L2) = level2
                    dstData(newDBRow, DstColumns.L3) = level3
                    dstData(newDBRow, DstColumns.Number) = srcData(i, SrcColumns.Number)
                    newDBRow = newDBRow + 1
                    End Select
                    Next i

                    '--- finally copy the array out to the destination
                    Dim dstRange As Range
                    Set dstRange = dstWS.Range("A1").Resize(UBound(dstData, 1), UBound(dstData, 2))
                    dstRange = dstData

                    End Sub






                    share|improve this answer















                    share|improve this answer



                    share|improve this answer








                    edited yesterday


























                    answered Aug 3 at 15:02









                    PeterT

                    1,153210




                    1,153210











                    • Named Ranges can also be used to stabilise code instead of fixed column numbers. I am working on a code base at the moment where I am using "magic numbers" similar to your constants (e.g. COL_SOMETHING1_SOMETHING2) but converting between column letter and column number when fixing the template is tedious. In the process of changing to named columns which means that the code base would never change again (). _() for various definitions of "never"_
                      – AJD
                      Aug 3 at 21:05






                    • 1




                      With your IndentLevel UDF: This works fine on an single cell, but can fail if multiple cells are passed. There was once an MSDN article that explained this but the new MSDN format is a lot blander and doesn't contain this useful in-depth information anymore.
                      – AJD
                      Aug 3 at 21:07










                    • You make a good point about multiple cells being passed to the UDF. I've modified the code to handle that case -- simplistically assuming that the indent level of the first cell (upper left) of the range will be returned.
                      – PeterT
                      yesterday
















                    • Named Ranges can also be used to stabilise code instead of fixed column numbers. I am working on a code base at the moment where I am using "magic numbers" similar to your constants (e.g. COL_SOMETHING1_SOMETHING2) but converting between column letter and column number when fixing the template is tedious. In the process of changing to named columns which means that the code base would never change again (). _() for various definitions of "never"_
                      – AJD
                      Aug 3 at 21:05






                    • 1




                      With your IndentLevel UDF: This works fine on an single cell, but can fail if multiple cells are passed. There was once an MSDN article that explained this but the new MSDN format is a lot blander and doesn't contain this useful in-depth information anymore.
                      – AJD
                      Aug 3 at 21:07










                    • You make a good point about multiple cells being passed to the UDF. I've modified the code to handle that case -- simplistically assuming that the indent level of the first cell (upper left) of the range will be returned.
                      – PeterT
                      yesterday















                    Named Ranges can also be used to stabilise code instead of fixed column numbers. I am working on a code base at the moment where I am using "magic numbers" similar to your constants (e.g. COL_SOMETHING1_SOMETHING2) but converting between column letter and column number when fixing the template is tedious. In the process of changing to named columns which means that the code base would never change again (). _() for various definitions of "never"_
                    – AJD
                    Aug 3 at 21:05




                    Named Ranges can also be used to stabilise code instead of fixed column numbers. I am working on a code base at the moment where I am using "magic numbers" similar to your constants (e.g. COL_SOMETHING1_SOMETHING2) but converting between column letter and column number when fixing the template is tedious. In the process of changing to named columns which means that the code base would never change again (). _() for various definitions of "never"_
                    – AJD
                    Aug 3 at 21:05




                    1




                    1




                    With your IndentLevel UDF: This works fine on an single cell, but can fail if multiple cells are passed. There was once an MSDN article that explained this but the new MSDN format is a lot blander and doesn't contain this useful in-depth information anymore.
                    – AJD
                    Aug 3 at 21:07




                    With your IndentLevel UDF: This works fine on an single cell, but can fail if multiple cells are passed. There was once an MSDN article that explained this but the new MSDN format is a lot blander and doesn't contain this useful in-depth information anymore.
                    – AJD
                    Aug 3 at 21:07












                    You make a good point about multiple cells being passed to the UDF. I've modified the code to handle that case -- simplistically assuming that the indent level of the first cell (upper left) of the range will be returned.
                    – PeterT
                    yesterday




                    You make a good point about multiple cells being passed to the UDF. I've modified the code to handle that case -- simplistically assuming that the indent level of the first cell (upper left) of the range will be returned.
                    – PeterT
                    yesterday












                     

                    draft saved


                    draft discarded


























                     


                    draft saved


                    draft discarded














                    StackExchange.ready(
                    function ()
                    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f200827%2fmacro-to-convert-an-indented-hierarchy-to-a-database-format%23new-answer', 'question_page');

                    );

                    Post as a guest













































































                    Popular posts from this blog

                    Greedy Best First Search implementation in Rust

                    Function to Return a JSON Like Objects Using VBA Collections and Arrays

                    C++11 CLH Lock Implementation