PowerShell function to search CSV logs for certain regexes
I have a parser that goes through two different logs, both .csv files, and checks for certain lines based on the regexes I have chosen.

This parser works fine, it just takes about a minute to parse through about 100 files. It is based on another parser I have that only parses through one log type, and that one is incredibly fast, like 200 files in maybe 15 seconds.

This one grabs the IDNumber from the beginning of the filename (1234-randomfile.csv), loads the file's contents into a variable ($Validate), then, based on the regexes, adds matching lines to certain variables ($Scriptdone, $Updatedone, $Failed) and checks which of them were hit.

Like I said, it works, but it is slow. If you have any input on a way to speed this up, or to clean up my code (I am still learning), it will all be greatly appreciated!
function Get-MR4RES {
    [CmdletBinding()]
    param (
        [Parameter(Position = 0,
            Mandatory = $True)]
        [ValidateNotNullOrEmpty()]
        [ValidateScript( { Test-Path -Path $_ -PathType 'Any' })]
        [String]
        $Files,

        [Parameter(Position = 1,
            Mandatory = $false)]
        [String]
        $CSVPath
    ) # End Param

    begin {
        # Setting Global Variables
        $Scriptcompletedsuccess = '.+Script\scompleted\ssuccessfully.+' # 3:44:15 End function called, Script completed successfully at 3:44:15 on Tue 07/03/2018
        $Updatecomplete = '\w+\s+:\s\[\d+:\d+:\d+\]\s+\w+\scomplete'    # STATUS : [03:43:07] Update complete
        $FailedValidaton = '.+checks\sfail.+'
        $Fail1 = 'Validation Failed'
        $Fail2 = 'Failed'
        $Good1 = 'Script completed'
        $Good2 = 'Update completed'
        $array = @('IDNumber, Results')
        $counter = 0
        $FileList = (Get-ChildItem -Path $Files -File -Filter "*.log").FullName
        $Done = ''
    } # End begin

    process {
        # Do the following code on all the files in the file list
        foreach ($File in $FileList) {
            # Grab the IDNumber from the beginning of the filename (1234-randomfile.csv)
            $IDNumber = (Split-Path -Path $File -Leaf).Split('-')[0]

            # Load the file and collect the lines that match each regex
            $Validate = Get-Content -Path $File
            $Scriptdone = $Validate | Where-Object { $_ -match $Scriptcompletedsuccess }
            $Updatedone = $Validate | Where-Object { $_ -match $Updatecomplete }
            $Failed = $Validate | Where-Object { $_ -match $FailedValidaton }

            # Check which regex hit and record one result row per file
            if ($Failed) {
                $array += "$IDNumber, $Fail1"
            }
            elseif ($Scriptdone) {
                $array += "$IDNumber, $Good1"
            }
            elseif ($Updatedone) {
                $array += "$IDNumber, $Good2"
            }
            else {
                $array += "$IDNumber, $Fail2"
            }
            $counter++
        } # End of foreach
    } # End process section

    end {
        # If CSVPath is used in the call
        if ($PSBoundParameters.ContainsKey('CSVPath')) {
            # Pipe the array data to a CSV
            Add-Content -Path $CSVPath -Value $array -Encoding ascii
        }
        # If no CSVPath is used in the call
        else {
            # Output to console
            Write-Output $array
        } # End of else
    } # End of the End
} # End of function
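For reference, a call looks like this (the paths below are placeholder examples only):

# Hypothetical paths for illustration; -CSVPath is optional and the
# function falls back to console output when it is omitted.
Get-MR4RES -Files 'C:\Logs' -CSVPath 'C:\Temp\results.csv'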
performance regex csv powershell
edited Jul 11 at 19:07 by Jamal
asked Jul 10 at 16:15 by Just_learning
Please do not update the code in your question to incorporate feedback from answers; doing so goes against the Question + Answer style of Code Review. This is not a forum where you should keep the most updated version in your question. Please see what you may and may not do after receiving answers. – Mast, Jul 11 at 18:51
2 Answers
Just a quick, off-the-cuff answer building off of what Dangph started, but the triple scan is probably killing performance.
$Scriptdone = $Validate | Where-Object { $_ -match $Scriptcompletedsuccess }
$Updatedone = $Validate | Where-Object { $_ -match $Updatecomplete }
$Failed = $Validate | Where-Object { $_ -match $FailedValidaton }
Each one reads through the entire file to find one thing. One route you can try out is using a foreach (you can shorthand it with %, as I'll do below) and a switch.
$size = $array.Length
# Assumes there's only one line that will match a given regex per file.
# If not, it'll add duplicates which can be stripped at the end with $array | sort -unique
$Validate | % {
    switch -regex ($_) {
        $Scriptcompletedsuccess { $array += "$IDNumber, $Good1"; break }
        $Updatecomplete         { $array += "$IDNumber, $Good2"; break }
        $FailedValidaton        { $array += "$IDNumber, $Fail1"; break }
        default {
            # Checks to see if the array has grown; if it hasn't, no matches were found.
            # Bit hacky and there's probably a better way to do it.
            if ($size -eq $array.Length) {
                $array += -join ("$IDNumber", ', ', "$Fail2")
            }
        }
    }
}
Oh yeah, another performance boost (not sure how much of one, though) would be changing the array into an ArrayList. An ArrayList can append in place, whereas a plain array is recreated and copied on each add.
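As a minimal sketch of that swap (an ArrayList here; the $IDNumber/$Good1 row format is the same one used in the question):

# Appending to an ArrayList mutates it in place instead of rebuilding the array each time.
$results = [System.Collections.ArrayList]@()
[void]$results.Add('IDNumber, Results')      # [void] discards the index that Add() returns

# Inside the per-file loop, rows are added the same way:
[void]$results.Add("$IDNumber, $Good1")

# And at the end $results can be written out just like $array was:
# Add-Content -Path $CSVPath -Value $results -Encoding ascii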
edited Jul 25 at 1:22
answered Jul 25 at 1:11 by Veskah
You have to find out where the slow part is before you can speed it up. Just by looking at it, I would say it is either this part where you load the whole file into memory:
$Validate = Get-Content -Path $File
Or it's this part where you scan through the in-memory array:
$Scriptdone = $Validate | Where-Object { $_ -match $Scriptcompletedsuccess }
$Updatedone = $Validate | Where-Object { $_ -match $Updatecomplete }
$Failed = $Validate | Where-Object { $_ -match $FailedValidaton }
Or maybe both contribute to the slowness.
You should do some experiments to determine which part is slow. You could, for instance, comment out the Get-Content line and just load one of the files once at the beginning. Does that speed it up?
You can also try commenting out the scanning lines. Does that speed it up?
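One quick way to run that experiment, as a rough sketch (Measure-Command is built in; the sample path is just a placeholder):

# Time the read and the scan separately for one sample file (hypothetical path).
$sample = 'C:\Logs\1234-randomfile.csv'
$readTime = Measure-Command { Get-Content -Path $sample | Out-Null }
$lines = Get-Content -Path $sample
$scanTime = Measure-Command {
    $lines | Where-Object { $_ -match '.+Script\scompleted\ssuccessfully.+' } | Out-Null
}
"Read: $($readTime.TotalMilliseconds) ms, Scan: $($scanTime.TotalMilliseconds) ms"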
An observation: $Scriptdone, $Updatedone, and $Failed appear to be mutually exclusive. You don't need to find $Scriptdone if $Failed is true, for instance. You could restructure your code like this to remove the redundant processing:
$Failed = $Validate | Where-Object { $_ -match $FailedValidaton }
if ($Failed)
{
    # ...
}
else
{
    $Scriptdone = $Validate | Where-Object { $_ -match $Scriptcompletedsuccess }
    # ...
}
Some questions:
Where do the lines you are looking for appear in the files? Can they be anywhere, or are they at some particular place (the beginning or the end for instance)?
How big are the files?
Edit:
Based on the answers to those questions (see the comments), I have to say I don't understand why it is slow.
I don't think a Get-Content on 100 4MB files should take any time at all. I am sceptical that it is the cause. Since I don't know what the source of the slowness is, I can't really suggest much except to do more experimentation to work out what it is.
I can just throw some random ideas out there.
Try using the -Raw switch on the Get-Content to load the whole file in one chunk:
$Validate = Get-Content -Path $File -Raw
$Scriptdone = $Validate -match $Scriptcompletedsuccess
Try using Select-String to search through the files:
Note: Delete the Get-Content line for this idea.
$Scriptdone = Select-String $Scriptcompletedsuccess $File | Select-Object -First 1
The Select-Object -First 1 is optional, but it should speed things up because the search will stop as soon as the first match is found.

My last idea is to try simplifying the regular expressions, just as an experiment. Sometimes regular expressions can be slow. I don't think that should be the case with yours, but you never know.
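As a rough sketch of how the per-file checks could lean on Select-String directly (-Path, -Pattern, and -Quiet are standard Select-String parameters; the variable names are taken from the question):

# -Quiet returns $true/$false for the file as a whole, which is all these checks need,
# and it lets Select-String read the file itself instead of Get-Content loading it first.
$Failed     = Select-String -Path $File -Pattern $FailedValidaton -Quiet
$Scriptdone = Select-String -Path $File -Pattern $Scriptcompletedsuccess -Quiet
$Updatedone = Select-String -Path $File -Pattern $Updatecomplete -Quiet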
Ultimately you have to track down the source of the slowness before you can fix the problem.
edited Jul 13 at 9:23
answered Jul 11 at 2:44 by Dangph
Thanks for the answer! I will try what you suggested and let you know the results. As for your questions, they can appear anywhere in the files, and the files range from 20 KB to 4 MB. – Just_learning, Jul 11 at 16:48

I commented out the Get-Content and it speeds through everything but outputs it all as failures. I have done as you suggested above and restructured, but it is still slow. Any other suggestions? – Just_learning, Jul 11 at 17:09

@Just_learning, please see the edit at the bottom of my answer. – Dangph, Jul 13 at 9:24