PowerShell function to search CSV logs for certain regexes

I have a parser that goes through two different logs, both .csv files, and checks for certain lines based on the regexes I have chosen.



This parser works fine; it just takes about a minute to get through about 100 files. It is based on another parser I have that only handles one log type and is incredibly fast: about 200 files in maybe 15 seconds.



This one grabs the IDNumber from the beginning of the filename (1234-randomfile.csv), loads the file's contents into a variable ($Validate), then, based on the regexes, assigns matching lines to certain variables ($Scriptdone, $Updatedone, $Failed) and runs checks to see which ones matched.



Like I said, it works, but it is slow. Any input on how to speed it up, or clean up my code (I am still learning), would be greatly appreciated!



function Get-MR4RES {
    [CmdletBinding()]
    param (
        [Parameter(Position = 0,
            Mandatory = $True)]
        [ValidateNotNullOrEmpty()]
        [ValidateScript({ Test-Path -Path $_ -PathType 'Any' })]
        [String]
        $Files,

        [Parameter(Position = 1,
            Mandatory = $false)]
        [String]
        $CSVPath) # End param

    begin {
        # Setting global variables
        $Scriptcompletedsuccess = '.+Script\scompleted\ssuccessfully.+' # 3:44:15 End function called, Script completed successfully at 3:44:15 on Tue 07/03/2018
        $Updatecomplete = '\w+\s+:\s\[\d+:\d+:\d+\]\s+\w+\scomplete' # STATUS : [03:43:07] Update complete
        $FailedValidaton = '.+checks\sfail.+'
        $Fail1 = 'Validation Failed'
        $Fail2 = 'Failed'
        $Good1 = 'Script completed'
        $Good2 = 'Update completed'
        $array = @('IDNumber, Results')
        $counter = 0
        $FileList = (Get-ChildItem -Path $Files -File -Filter "*.log").FullName
        $Done = ''
    } # End begin

    process {
        # Do the following for every file in the file list.
        # NOTE: the body of this loop was garbled when the question was posted;
        # the lines below are reconstructed from the question text and from the
        # fragments quoted in the answers.
        foreach ($File in $FileList) {
            $IDNumber = (Split-Path -Path $File -Leaf).Split('-')[0] # e.g. 1234 from 1234-randomfile.csv
            $Validate = Get-Content -Path $File

            $Scriptdone = $Validate | Where-Object { $_ -match $Scriptcompletedsuccess }
            $Updatedone = $Validate | Where-Object { $_ -match $Updatecomplete }
            $Failed = $Validate | Where-Object { $_ -match $FailedValidaton }

            # ... checks on $Scriptdone / $Updatedone / $Failed append "ID, Result" rows to $array ...
        } # End foreach
    } # End process

    End {
        # If CSVPath was supplied to the function
        if ($PSBoundParameters.ContainsKey('CSVPath')) {
            # Write the array data to a CSV file
            Add-Content -Path $CSVPath -Value $array -Encoding ascii
        }
        # If no CSVPath was supplied
        else {
            # Output to the console
            Write-Output $array
        } # End of else
    } # End of the End
} # End of function






asked Jul 10 at 16:15 by Just_learning, edited Jul 11 at 19:07 by Jamal♦

  • Please do not update the code in your question to incorporate feedback from answers; doing so goes against the Question + Answer style of Code Review. This is not a forum where you should keep the most updated version in your question. Please see what you may and may not do after receiving answers.
    – Mast
    Jul 11 at 18:51
2 Answers
Just a quick, off-the-cuff suggestion building on what Dangph started, but the triple scan is probably killing performance.



$Scriptdone = $Validate | Where-Object { $_ -match $Scriptcompletedsuccess }
$Updatedone = $Validate | Where-Object { $_ -match $Updatecomplete }
$Failed = $Validate | Where-Object { $_ -match $FailedValidaton }


Each one reads through the entire file to find one thing. One route you can try is a ForEach-Object (you can shorten it to % as I do below) combined with a switch.



$size = $array.Length
# Assumes there's only one line that will match a given regex per file.
# If not, it'll add duplicates, which can be stripped at the end with $array | sort -unique
$validate | % {
    switch -regex ($_) {
        $Scriptcompletedsuccess { $array += "$IDNumber, $Good1"; break }
        $updatecomplete         { $array += "$IDNumber, $Good2"; break }
        $Failedvalidation       { $array += "$IDNumber, $Fail1"; break }
        default                 { }
    }
}

# Checks to see if the array has grown; if it hasn't, no matches were found.
# Bit hacky, and there's probably a better way to do it.
if ($size -eq $array.Length) {
    $array += -join ("$IDNumber", ', ', "$Fail2")
}


Oh yeah, another performance boost (not sure how much of one, though) would be changing the array into an ArrayList, which appends in place, whereas a plain array is resized on every add.
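
As an aside, here is a minimal sketch of that swap (ArrayList and its Add method are standard .NET; the variable names are just taken from the question):

$results = [System.Collections.ArrayList]::new()
[void]$results.Add('IDNumber, Results')    # Add returns the new index, so discard it
# Inside the loop, instead of $array += "...":
[void]$results.Add("$IDNumber, $Good1")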






answered Jul 25 at 1:11 by Veskah, edited Jul 25 at 1:22
    You have to find out where the slow part is before you can speed it up. Just by looking at it, I would say it is either this part where you load the whole file into memory:



    $Validate = Get-Content -Path $File


    Or it's this part where you scan through the in-memory array:



    $Scriptdone = $Validate | Where-Object { $_ -match $Scriptcompletedsuccess }
    $Updatedone = $Validate | Where-Object { $_ -match $Updatecomplete }
    $Failed = $Validate | Where-Object { $_ -match $FailedValidaton }


    Or maybe both contribute to the slowness.



    You should do some experiments to determine which part is slow. You could for instance comment out the Get-Content line, and just load in one of the files once at the beginning. Does that speed it up?



    You can also try commenting out the scanning lines. Does that speed it up?
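
    For those experiments, Measure-Command (a built-in cmdlet) gives exact timings. A minimal sketch, with a placeholder path of my own rather than anything from the question:

    $file = 'C:\logs\1234-randomfile.csv'   # placeholder path
    $Validate = Get-Content -Path $file
    (Measure-Command { Get-Content -Path $file }).TotalMilliseconds
    (Measure-Command { $Validate | Where-Object { $_ -match '.+checks\sfail.+' } }).TotalMilliseconds
    # Compare the two numbers to see which stage dominates.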



    An observation:



    $Scriptdone, $Updatedone, $Failed appear to be mutually exclusive. You don't need to find $Scriptdone if $Failed is true, for instance. You could restructure your code like this to remove the redundant processing:



    $Failed = $Validate | Where-Object { $_ -match $FailedValidaton }

    if ($Failed) {
        # ...
    }
    else {
        $Scriptdone = $Validate | Where-Object { $_ -match $Scriptcompletedsuccess }
        # ...
    }


    Some questions:



    • Where do the lines you are looking for appear in the files? Can they be anywhere, or are they at some particular place (the beginning or the end for instance)?


    • How big are the files?


    Edit:



    Based on the answers to those questions (see the comments), I have to say I don't understand why it is slow.



    I don't think a Get-Content on 100 4MB files should take any time at all. I am sceptical that it is the cause. Since I don't know what the source of the slowness is, I can't really suggest much except to do more experimentation to work out what it is.



    I can just throw some random ideas out there.




    • Try using the -Raw switch on Get-Content to load the whole file in one chunk:



      $Validate = Get-Content -Path $File -Raw



      $Scriptdone = $Validate -match $Scriptcompletedsuccess




    • Try using Select-String to search through the files:



      Note: Delete the Get-Content line for this idea.



      $Scriptdone = Select-String $Scriptcompletedsuccess $File | Select-Object -First 1



      The Select-Object -First 1 is optional, but it should speed things up because the search will stop as soon as the first match is found. (See the sketch after this list for a variant that scans all the files in one call.)



    • My last idea is to try simplifying the regular expressions, just as an experiment. Sometimes some regular expressions can be slow. I don't think that should be the case with yours, but you never know.
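
    Building on the Select-String idea above — a minimal sketch (mine, not part of the original answer): with a wildcard path and -List, one call can scan every log and stop at the first match per file:

    $done = Select-String -Pattern $Scriptcompletedsuccess -Path "$Files\*.log" -List
    # One MatchInfo object per matching file; $done.Filename identifies each log.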


    Ultimately you have to track down the source of the slowness before you can fix the problem.






    answered Jul 11 at 2:44 by Dangph, edited Jul 13 at 9:23

    • Thanks for the answer! I will try what you suggested and let you know of the results. As for your questions, they can appear anywhere in the files, and the files range from 20kb to 4mb.
      – Just_learning
      Jul 11 at 16:48










    • I commented out the 'get-content' and it speeds through everything but outputs it all as failures. I have done as you suggested above and restructured, but it is still slow. Any other suggestions?
      – Just_learning
      Jul 11 at 17:09










    • @Just_learning, please see the edit at the bottom of my answer.
      – Dangph
      Jul 13 at 9:24









