Custom double parser optimized for performance
Clash Royale CLAN TAG#URR8PPP
.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty margin-bottom:0;
up vote
7
down vote
favorite
I'm trying to beat the native Double.TryParse
for performance in parsing large multi-million row (simple) CSV files as much as possible. I do not have to support exponential notation, thousand separators, Inf, -Inf, NaN, or anything exotic. Just millions of "0.##
" format doubles.
Here's my best attempt, which is ~350% faster by my tests (64 bit release mode)
My Implementation
This is the setup of the function (mostly for context).
private static readonly char CharNegative = CurrentCulture.NumberFormat.NegativeSign[0];
private static readonly char CharDecimalSeparator =
CurrentCulture.NumberFormat.NumberDecimalSeparator[0];
/// <summary>High performance double parser with rudimentary flexibility.
/// <returns>Returns true only if we can be certain we parsed the string correctly.
/// <remarks>Does not support exponential notation, thousand separators or whitespace.
/// Does not support Infinity, Negative Infinity, NaN, or detect over/underflows.
/// Supports only leading negative signs, no positive signs or trailing signs.</remarks>
public static bool FastTryParseDouble(string input, out double result)
result = 0d;
int length = input.Length;
if (length == 0) return false;
double sign = 1d;
int currentIndex = 0;
char nextChar = input[0];
// Handle a possible negative sign at the beginning of the string.
if (nextChar == CharNegative)
sign = -1d;
++currentIndex;
As you can see, I try to remain culture aware and support negative numbers.
This is the remainder of the method, which I think needs to be optimized for performance:
unchecked
while (true)
// Return now if we have reached the end of the string
if (currentIndex >= length)
result *= sign;
return true;
nextChar = input[currentIndex++];
// Break if the result wasn't a digit between 0 and 9
if (nextChar < '0'
// The next character should be a decimal character, or else it's invalid.
if (nextChar != CharDecimalSeparator) return false;
double fractionalPart = 0d;
int fractionLengh = length - currentIndex;
while (currentIndex < length)
// Add the fractional part to the result, apply sign, and return
if (fractionLengh < NegPow10.Length)
result = (result + fractionalPart * NegPow10[fractionLengh]) * sign;
else
result = (result + fractionalPart * Math.Pow(10, -fractionLengh)) * sign;
return true;
NegPow10
at the end there is just the following array, which has a quick lookup value to cover the first 20 or so values of 10^-n
for quick reference. Anything bigger just falls back to Math.Pow()
/// <summary>A cache of negative powers of 10 for quick magnitude adjustment of parsed
/// decimals up to the maximum number of possible decimal places that might be consumed
/// from a string representation of a double.</summary>
private static readonly double NegPow10 = new double
1d,
0.1,
0.01,
///... you get the idea
0.0000000000000001
;
Test Cases
The following test cases are all passing:
TestSuccess("0", 0d);
TestSuccess("1", 1d);
TestSuccess("-1", -1d);
TestSuccess("123.45", 123.45);
TestSuccess("-123.45", -123.45);
TestSuccess("12345678901234", 12345678901234d);
TestSuccess("-12345678901234", -12345678901234d);
TestSuccess("0.12", 0.12);
TestSuccess("-0.12", -0.12);
TestSuccess("0.00", 0.00);
TestSuccess("-0.00", -0.00);
TestSuccess("1234567890123.01", 1234567890123.01);
TestSuccess("-1234567890123.01", -1234567890123.01);
TestSuccess("123456789000000000000000", 123456789000000000000000d);
TestSuccess("-123456789000000000000000", -123456789000000000000000d);
I also have the unsupported (failure) cases laid out if anyone's interested, but it's basically the limitations mentioned in the remarks above.
Benchmarks
I benchmarked my implementation against native Double.TryParse
to guage the performance difference.
I tested parsing an array of 10 million different strings using:
Double.TryParse(value, NumberStyles.Float, cachedCulture, out _)
Note that I cache the culture instance and pass in explicit NumberStyles to get the native method as fast as possible before comparing it to my own. My method was of course running 10 million strings through:
Parsers.FastTryParseDouble(value, out _)
Results
Native Double.TryParse took ~4500 ms.
Custom Parsers.FastTryParseDouble took ~950 ms.
Performance gain was ~370%
Next Steps
See any other ways I can squeeze out more performance?
Any awful flaws that might cause incorrect results to be returned? Note that I'm always happy to return "false" for unsupported cases if that's what's fastest, but I'm not okay to return true
and a bad result.
c# performance parsing floating-point
 |Â
show 5 more comments
up vote
7
down vote
favorite
I'm trying to beat the native Double.TryParse
for performance in parsing large multi-million row (simple) CSV files as much as possible. I do not have to support exponential notation, thousand separators, Inf, -Inf, NaN, or anything exotic. Just millions of "0.##
" format doubles.
Here's my best attempt, which is ~350% faster by my tests (64 bit release mode)
My Implementation
This is the setup of the function (mostly for context).
private static readonly char CharNegative = CurrentCulture.NumberFormat.NegativeSign[0];
private static readonly char CharDecimalSeparator =
CurrentCulture.NumberFormat.NumberDecimalSeparator[0];
/// <summary>High performance double parser with rudimentary flexibility.
/// <returns>Returns true only if we can be certain we parsed the string correctly.
/// <remarks>Does not support exponential notation, thousand separators or whitespace.
/// Does not support Infinity, Negative Infinity, NaN, or detect over/underflows.
/// Supports only leading negative signs, no positive signs or trailing signs.</remarks>
public static bool FastTryParseDouble(string input, out double result)
result = 0d;
int length = input.Length;
if (length == 0) return false;
double sign = 1d;
int currentIndex = 0;
char nextChar = input[0];
// Handle a possible negative sign at the beginning of the string.
if (nextChar == CharNegative)
sign = -1d;
++currentIndex;
As you can see, I try to remain culture aware and support negative numbers.
This is the remainder of the method, which I think needs to be optimized for performance:
unchecked
while (true)
// Return now if we have reached the end of the string
if (currentIndex >= length)
result *= sign;
return true;
nextChar = input[currentIndex++];
// Break if the result wasn't a digit between 0 and 9
if (nextChar < '0'
// The next character should be a decimal character, or else it's invalid.
if (nextChar != CharDecimalSeparator) return false;
double fractionalPart = 0d;
int fractionLengh = length - currentIndex;
while (currentIndex < length)
// Add the fractional part to the result, apply sign, and return
if (fractionLengh < NegPow10.Length)
result = (result + fractionalPart * NegPow10[fractionLengh]) * sign;
else
result = (result + fractionalPart * Math.Pow(10, -fractionLengh)) * sign;
return true;
NegPow10
at the end there is just the following array, which has a quick lookup value to cover the first 20 or so values of 10^-n
for quick reference. Anything bigger just falls back to Math.Pow()
/// <summary>A cache of negative powers of 10 for quick magnitude adjustment of parsed
/// decimals up to the maximum number of possible decimal places that might be consumed
/// from a string representation of a double.</summary>
private static readonly double NegPow10 = new double
1d,
0.1,
0.01,
///... you get the idea
0.0000000000000001
;
Test Cases
The following test cases are all passing:
TestSuccess("0", 0d);
TestSuccess("1", 1d);
TestSuccess("-1", -1d);
TestSuccess("123.45", 123.45);
TestSuccess("-123.45", -123.45);
TestSuccess("12345678901234", 12345678901234d);
TestSuccess("-12345678901234", -12345678901234d);
TestSuccess("0.12", 0.12);
TestSuccess("-0.12", -0.12);
TestSuccess("0.00", 0.00);
TestSuccess("-0.00", -0.00);
TestSuccess("1234567890123.01", 1234567890123.01);
TestSuccess("-1234567890123.01", -1234567890123.01);
TestSuccess("123456789000000000000000", 123456789000000000000000d);
TestSuccess("-123456789000000000000000", -123456789000000000000000d);
I also have the unsupported (failure) cases laid out if anyone's interested, but it's basically the limitations mentioned in the remarks above.
Benchmarks
I benchmarked my implementation against native Double.TryParse
to guage the performance difference.
I tested parsing an array of 10 million different strings using:
Double.TryParse(value, NumberStyles.Float, cachedCulture, out _)
Note that I cache the culture instance and pass in explicit NumberStyles to get the native method as fast as possible before comparing it to my own. My method was of course running 10 million strings through:
Parsers.FastTryParseDouble(value, out _)
Results
Native Double.TryParse took ~4500 ms.
Custom Parsers.FastTryParseDouble took ~950 ms.
Performance gain was ~370%
Next Steps
See any other ways I can squeeze out more performance?
Any awful flaws that might cause incorrect results to be returned? Note that I'm always happy to return "false" for unsupported cases if that's what's fastest, but I'm not okay to return true
and a bad result.
c# performance parsing floating-point
You missed 3 cases,âÂÂ
,-âÂÂ
andNaN
. These should parse out asDouble.PositiveInfinity
,Double.NegativeInfinity
andDouble.NaN
. Actually there is one more, your tests should be able to parse the positive symbol, ie:+1234.5678
â Ron Beyer
Jul 31 at 1:34
1
You should use aStopwatch
instead of just two time differences when doing benchmarks.
â Ron Beyer
Jul 31 at 1:55
@RonBeyer I just discovered thatStopwatch
usesDateTime.UTCNow
behind the scenes unless you use it in high resolution mode.if (!Stopwatch.IsHighResolution) return DateTime.UtcNow.Ticks;
Looks like it's only necessary if you're timing things at sub-20ms resolution. We're at seconds resolution here, so it should be okay.
â Alain
Jul 31 at 2:47
1
Added comments and test cases to better indicate what formats are and aren't meant to be supported.
â Alain
Jul 31 at 4:22
I found a bug in my 1st while loop that caused an invalid chacater to erronously return a double from all characters up to it. The fix was to usewhile(true)
and have a separate check forif (currentIndex >= length)
. It brought performance gain down by almost 80%, so I'm looking for ways to merge the two again while still handling the failure scenario correctly.
â Alain
Aug 1 at 0:08
 |Â
show 5 more comments
up vote
7
down vote
favorite
up vote
7
down vote
favorite
I'm trying to beat the native Double.TryParse
for performance in parsing large multi-million row (simple) CSV files as much as possible. I do not have to support exponential notation, thousand separators, Inf, -Inf, NaN, or anything exotic. Just millions of "0.##
" format doubles.
Here's my best attempt, which is ~350% faster by my tests (64 bit release mode)
My Implementation
This is the setup of the function (mostly for context).
private static readonly char CharNegative = CurrentCulture.NumberFormat.NegativeSign[0];
private static readonly char CharDecimalSeparator =
CurrentCulture.NumberFormat.NumberDecimalSeparator[0];
/// <summary>High performance double parser with rudimentary flexibility.
/// <returns>Returns true only if we can be certain we parsed the string correctly.
/// <remarks>Does not support exponential notation, thousand separators or whitespace.
/// Does not support Infinity, Negative Infinity, NaN, or detect over/underflows.
/// Supports only leading negative signs, no positive signs or trailing signs.</remarks>
public static bool FastTryParseDouble(string input, out double result)
result = 0d;
int length = input.Length;
if (length == 0) return false;
double sign = 1d;
int currentIndex = 0;
char nextChar = input[0];
// Handle a possible negative sign at the beginning of the string.
if (nextChar == CharNegative)
sign = -1d;
++currentIndex;
As you can see, I try to remain culture aware and support negative numbers.
This is the remainder of the method, which I think needs to be optimized for performance:
unchecked
while (true)
// Return now if we have reached the end of the string
if (currentIndex >= length)
result *= sign;
return true;
nextChar = input[currentIndex++];
// Break if the result wasn't a digit between 0 and 9
if (nextChar < '0'
// The next character should be a decimal character, or else it's invalid.
if (nextChar != CharDecimalSeparator) return false;
double fractionalPart = 0d;
int fractionLengh = length - currentIndex;
while (currentIndex < length)
// Add the fractional part to the result, apply sign, and return
if (fractionLengh < NegPow10.Length)
result = (result + fractionalPart * NegPow10[fractionLengh]) * sign;
else
result = (result + fractionalPart * Math.Pow(10, -fractionLengh)) * sign;
return true;
NegPow10
at the end there is just the following array, which has a quick lookup value to cover the first 20 or so values of 10^-n
for quick reference. Anything bigger just falls back to Math.Pow()
/// <summary>A cache of negative powers of 10 for quick magnitude adjustment of parsed
/// decimals up to the maximum number of possible decimal places that might be consumed
/// from a string representation of a double.</summary>
private static readonly double NegPow10 = new double
1d,
0.1,
0.01,
///... you get the idea
0.0000000000000001
;
Test Cases
The following test cases are all passing:
TestSuccess("0", 0d);
TestSuccess("1", 1d);
TestSuccess("-1", -1d);
TestSuccess("123.45", 123.45);
TestSuccess("-123.45", -123.45);
TestSuccess("12345678901234", 12345678901234d);
TestSuccess("-12345678901234", -12345678901234d);
TestSuccess("0.12", 0.12);
TestSuccess("-0.12", -0.12);
TestSuccess("0.00", 0.00);
TestSuccess("-0.00", -0.00);
TestSuccess("1234567890123.01", 1234567890123.01);
TestSuccess("-1234567890123.01", -1234567890123.01);
TestSuccess("123456789000000000000000", 123456789000000000000000d);
TestSuccess("-123456789000000000000000", -123456789000000000000000d);
I also have the unsupported (failure) cases laid out if anyone's interested, but it's basically the limitations mentioned in the remarks above.
Benchmarks
I benchmarked my implementation against native Double.TryParse
to guage the performance difference.
I tested parsing an array of 10 million different strings using:
Double.TryParse(value, NumberStyles.Float, cachedCulture, out _)
Note that I cache the culture instance and pass in explicit NumberStyles to get the native method as fast as possible before comparing it to my own. My method was of course running 10 million strings through:
Parsers.FastTryParseDouble(value, out _)
Results
Native Double.TryParse took ~4500 ms.
Custom Parsers.FastTryParseDouble took ~950 ms.
Performance gain was ~370%
Next Steps
See any other ways I can squeeze out more performance?
Any awful flaws that might cause incorrect results to be returned? Note that I'm always happy to return "false" for unsupported cases if that's what's fastest, but I'm not okay to return true
and a bad result.
c# performance parsing floating-point
I'm trying to beat the native Double.TryParse
for performance in parsing large multi-million row (simple) CSV files as much as possible. I do not have to support exponential notation, thousand separators, Inf, -Inf, NaN, or anything exotic. Just millions of "0.##
" format doubles.
Here's my best attempt, which is ~350% faster by my tests (64 bit release mode)
My Implementation
This is the setup of the function (mostly for context).
private static readonly char CharNegative = CurrentCulture.NumberFormat.NegativeSign[0];
private static readonly char CharDecimalSeparator =
CurrentCulture.NumberFormat.NumberDecimalSeparator[0];
/// <summary>High performance double parser with rudimentary flexibility.
/// <returns>Returns true only if we can be certain we parsed the string correctly.
/// <remarks>Does not support exponential notation, thousand separators or whitespace.
/// Does not support Infinity, Negative Infinity, NaN, or detect over/underflows.
/// Supports only leading negative signs, no positive signs or trailing signs.</remarks>
public static bool FastTryParseDouble(string input, out double result)
result = 0d;
int length = input.Length;
if (length == 0) return false;
double sign = 1d;
int currentIndex = 0;
char nextChar = input[0];
// Handle a possible negative sign at the beginning of the string.
if (nextChar == CharNegative)
sign = -1d;
++currentIndex;
As you can see, I try to remain culture aware and support negative numbers.
This is the remainder of the method, which I think needs to be optimized for performance:
unchecked
while (true)
// Return now if we have reached the end of the string
if (currentIndex >= length)
result *= sign;
return true;
nextChar = input[currentIndex++];
// Break if the result wasn't a digit between 0 and 9
if (nextChar < '0'
// The next character should be a decimal character, or else it's invalid.
if (nextChar != CharDecimalSeparator) return false;
double fractionalPart = 0d;
int fractionLengh = length - currentIndex;
while (currentIndex < length)
// Add the fractional part to the result, apply sign, and return
if (fractionLengh < NegPow10.Length)
result = (result + fractionalPart * NegPow10[fractionLengh]) * sign;
else
result = (result + fractionalPart * Math.Pow(10, -fractionLengh)) * sign;
return true;
NegPow10
at the end there is just the following array, which has a quick lookup value to cover the first 20 or so values of 10^-n
for quick reference. Anything bigger just falls back to Math.Pow()
/// <summary>A cache of negative powers of 10 for quick magnitude adjustment of parsed
/// decimals up to the maximum number of possible decimal places that might be consumed
/// from a string representation of a double.</summary>
private static readonly double NegPow10 = new double
1d,
0.1,
0.01,
///... you get the idea
0.0000000000000001
;
Test Cases
The following test cases are all passing:
TestSuccess("0", 0d);
TestSuccess("1", 1d);
TestSuccess("-1", -1d);
TestSuccess("123.45", 123.45);
TestSuccess("-123.45", -123.45);
TestSuccess("12345678901234", 12345678901234d);
TestSuccess("-12345678901234", -12345678901234d);
TestSuccess("0.12", 0.12);
TestSuccess("-0.12", -0.12);
TestSuccess("0.00", 0.00);
TestSuccess("-0.00", -0.00);
TestSuccess("1234567890123.01", 1234567890123.01);
TestSuccess("-1234567890123.01", -1234567890123.01);
TestSuccess("123456789000000000000000", 123456789000000000000000d);
TestSuccess("-123456789000000000000000", -123456789000000000000000d);
I also have the unsupported (failure) cases laid out if anyone's interested, but it's basically the limitations mentioned in the remarks above.
Benchmarks
I benchmarked my implementation against native Double.TryParse
to guage the performance difference.
I tested parsing an array of 10 million different strings using:
Double.TryParse(value, NumberStyles.Float, cachedCulture, out _)
Note that I cache the culture instance and pass in explicit NumberStyles to get the native method as fast as possible before comparing it to my own. My method was of course running 10 million strings through:
Parsers.FastTryParseDouble(value, out _)
Results
Native Double.TryParse took ~4500 ms.
Custom Parsers.FastTryParseDouble took ~950 ms.
Performance gain was ~370%
Next Steps
See any other ways I can squeeze out more performance?
Any awful flaws that might cause incorrect results to be returned? Note that I'm always happy to return "false" for unsupported cases if that's what's fastest, but I'm not okay to return true
and a bad result.
c# performance parsing floating-point
edited Aug 2 at 15:44
asked Jul 31 at 0:25
Alain
2831314
2831314
You missed 3 cases,âÂÂ
,-âÂÂ
andNaN
. These should parse out asDouble.PositiveInfinity
,Double.NegativeInfinity
andDouble.NaN
. Actually there is one more, your tests should be able to parse the positive symbol, ie:+1234.5678
â Ron Beyer
Jul 31 at 1:34
1
You should use aStopwatch
instead of just two time differences when doing benchmarks.
â Ron Beyer
Jul 31 at 1:55
@RonBeyer I just discovered thatStopwatch
usesDateTime.UTCNow
behind the scenes unless you use it in high resolution mode.if (!Stopwatch.IsHighResolution) return DateTime.UtcNow.Ticks;
Looks like it's only necessary if you're timing things at sub-20ms resolution. We're at seconds resolution here, so it should be okay.
â Alain
Jul 31 at 2:47
1
Added comments and test cases to better indicate what formats are and aren't meant to be supported.
â Alain
Jul 31 at 4:22
I found a bug in my 1st while loop that caused an invalid chacater to erronously return a double from all characters up to it. The fix was to usewhile(true)
and have a separate check forif (currentIndex >= length)
. It brought performance gain down by almost 80%, so I'm looking for ways to merge the two again while still handling the failure scenario correctly.
â Alain
Aug 1 at 0:08
 |Â
show 5 more comments
You missed 3 cases,âÂÂ
,-âÂÂ
andNaN
. These should parse out asDouble.PositiveInfinity
,Double.NegativeInfinity
andDouble.NaN
. Actually there is one more, your tests should be able to parse the positive symbol, ie:+1234.5678
â Ron Beyer
Jul 31 at 1:34
1
You should use aStopwatch
instead of just two time differences when doing benchmarks.
â Ron Beyer
Jul 31 at 1:55
@RonBeyer I just discovered thatStopwatch
usesDateTime.UTCNow
behind the scenes unless you use it in high resolution mode.if (!Stopwatch.IsHighResolution) return DateTime.UtcNow.Ticks;
Looks like it's only necessary if you're timing things at sub-20ms resolution. We're at seconds resolution here, so it should be okay.
â Alain
Jul 31 at 2:47
1
Added comments and test cases to better indicate what formats are and aren't meant to be supported.
â Alain
Jul 31 at 4:22
I found a bug in my 1st while loop that caused an invalid chacater to erronously return a double from all characters up to it. The fix was to usewhile(true)
and have a separate check forif (currentIndex >= length)
. It brought performance gain down by almost 80%, so I'm looking for ways to merge the two again while still handling the failure scenario correctly.
â Alain
Aug 1 at 0:08
You missed 3 cases,
âÂÂ
, -âÂÂ
and NaN
. These should parse out as Double.PositiveInfinity
, Double.NegativeInfinity
and Double.NaN
. Actually there is one more, your tests should be able to parse the positive symbol, ie: +1234.5678
â Ron Beyer
Jul 31 at 1:34
You missed 3 cases,
âÂÂ
, -âÂÂ
and NaN
. These should parse out as Double.PositiveInfinity
, Double.NegativeInfinity
and Double.NaN
. Actually there is one more, your tests should be able to parse the positive symbol, ie: +1234.5678
â Ron Beyer
Jul 31 at 1:34
1
1
You should use a
Stopwatch
instead of just two time differences when doing benchmarks.â Ron Beyer
Jul 31 at 1:55
You should use a
Stopwatch
instead of just two time differences when doing benchmarks.â Ron Beyer
Jul 31 at 1:55
@RonBeyer I just discovered that
Stopwatch
uses DateTime.UTCNow
behind the scenes unless you use it in high resolution mode.if (!Stopwatch.IsHighResolution) return DateTime.UtcNow.Ticks;
Looks like it's only necessary if you're timing things at sub-20ms resolution. We're at seconds resolution here, so it should be okay.â Alain
Jul 31 at 2:47
@RonBeyer I just discovered that
Stopwatch
uses DateTime.UTCNow
behind the scenes unless you use it in high resolution mode.if (!Stopwatch.IsHighResolution) return DateTime.UtcNow.Ticks;
Looks like it's only necessary if you're timing things at sub-20ms resolution. We're at seconds resolution here, so it should be okay.â Alain
Jul 31 at 2:47
1
1
Added comments and test cases to better indicate what formats are and aren't meant to be supported.
â Alain
Jul 31 at 4:22
Added comments and test cases to better indicate what formats are and aren't meant to be supported.
â Alain
Jul 31 at 4:22
I found a bug in my 1st while loop that caused an invalid chacater to erronously return a double from all characters up to it. The fix was to use
while(true)
and have a separate check for if (currentIndex >= length)
. It brought performance gain down by almost 80%, so I'm looking for ways to merge the two again while still handling the failure scenario correctly.â Alain
Aug 1 at 0:08
I found a bug in my 1st while loop that caused an invalid chacater to erronously return a double from all characters up to it. The fix was to use
while(true)
and have a separate check for if (currentIndex >= length)
. It brought performance gain down by almost 80%, so I'm looking for ways to merge the two again while still handling the failure scenario correctly.â Alain
Aug 1 at 0:08
 |Â
show 5 more comments
4 Answers
4
active
oldest
votes
up vote
2
down vote
Alternatives for OP to try
As OP is looking for performance improvements, consider only 1 loop for both whole and fractional part calculation. Simply iterate through all the digits in one loop and note if and where the decimal point occurred.
// Pseudo code
DP = '.'
significant = 0.0
fractionLengh = 0
for (i=0; i < input.len; i++)
ch = input[i]
if (some_isdigit_test(ch))
significant = significant * 10 + ch - '0'
else if (ch == DP)
DP = '0' // Never match again
fractionLengh = input.len - i - 1
else
return fail;
}
// continue as before
if (fractionLengh < NegPow10.Length) ....
Perhaps integers?
Instead of accumulating result
as some floating point type, accumulate the digits as a 64-bit integer. This, depending on platform, is often significantly faster than double
.
Code could simply count leading zeros (important if there is a '.' there) and then loop onto the minimum of of the remaining text length or 18 (number of 999... digts in a 64-bit integer) and then do a final integer to double
for subsequent calculation.
Not alwasy the best
There are concerns with OP's code about generating the best answer.
Challenging (lengthly) text input eventual causes significant * 10
to round its answer and even perhaps overflow (even with an in range possible result).
With OP's fractionalPart
being rounded and NegPow10[fractionLengh])
also rounded, the product and than addition to result
may be off by 1 or 2 ULP.
To get the best result, additional (slower) code is needed.
-0.0
It appears OP's code will generate the correct result. I suspect unposted test code is insufficient to fully test this case. Perhaps OP is not concerned about this case as "anything exotic".
Range
I'd expect test cases should include the maximum +/-Double
as text and the minimum non-zero value +/-0.01
Positive numbers?
Code test for a leading '-'
. How about a leading '+'
? Research CurrentCulture.NumberFormat.PositiveSign
.
On Integers: I actually have a second (much simpler)FastTryParseInt
- it's ~2x slower (on my 64 bit machine) than than an identical routine usingdoubles
. The answer I found on the webs was that CPUs are optimized to do integer addition quickly, but integer multiplication is slow. Doubles are the opposite - addition takes more ticks than multiplication. The multiplication by 10 in the inner loop appears to really slow down integers. Haven't tested with longs, which should in theory be the same speed on a machine with 64 bit registers, but less risk of overflowing for large doubles.
â Alain
Aug 2 at 17:06
I like the idea of not tracking a separate "fraction part" double, and just continuing to add to result. I'm sure I can work with that. I still think it might be required to maintain two while loops (even though the code is not as nice) so that the branch predictor doesn't have to re-acclimate after the separator is encountered.
â Alain
Aug 2 at 17:14
I applied your pseudo code to my new version below and it did improve performance, thanks! I think I'll make a separate version without support for exponential notation again and see if the single loop approach makes it any faster. I also changed the in-progress result and constants tolong
(until the final result) to see how that went, but it did cut performance almost by double. Quite strange.
â Alain
Aug 3 at 13:52
add a comment |Â
up vote
0
down vote
Here's a slightly improved implementation that saves a few instructions by combining some redundant checks and assignments.
A little more work when checking for negatives:
// Handle a possible negative sign at the beginning of the string.
// We will apply the sign before returning the final result.
if (nextChar == CharNegative)
if (length <= 1) return false;
sign = -1d;
++currentIndex;
nextChar = input[1];
And the first main loop has its while / break conditions altered to remove the need for redundant checks after breaking.
unchecked
// Break on the first non-numeric character
while (nextChar >= '0' && nextChar <= '9')
// Multiply by 10 and add the next digit.
result = result * 10 + (nextChar - '0');
// Return if we have reached the end of the string
if (++currentIndex >= length)
result *= sign;
return true;
nextChar = input[currentIndex];
// The next character should be the decimal separator, or else it's invalid.
if (nextChar != CharDecimalSeparator) return false;
// Perform a similar loop to construct the fractional part of the decimal
double fractionalPart = 0d;
int fractionLengh = length - ++currentIndex;
while (currentIndex < length)
nextChar = input[currentIndex++];
// If we encounter a non-numeric character now, it's an error
if (nextChar < '0'
// Adjust the magnitude and add the fractional part to the result
// Use our power of negative powers of 10 if possible.
if (fractionLengh < NegPow10.Length)
result = (result + fractionalPart * NegPow10[fractionLengh]) * sign;
else
result = (result + fractionalPart * Math.Pow(10, -fractionLengh)) * sign;
return true;
The difference in performance is imperceptible during release mode. I'm guessing comile optimizations and/or instruction pipelining made the updates irrelevant.
I toyed with different ways of doing the final computation (wrapping the power multiplication in a ternary operator, splitting them out into multiple lines, etc) but this one version seems to compile down the best.
add a comment |Â
up vote
0
down vote
Here is a much longer implementation that handles scientific notation, NaN, Infinity, Negative Infinity, and leading positive signs. I also added a lot of comments to help visually break it into chunks.
It manages to be almost as fast as the previous method - most of the logic takes place in the body of previous if
checks that simply returned false before.
I found a few places where I could avoid repeated checks for non-digit characters, and use the first digit character to initialize the result directly to avoid unnecessary additions/multiplcations with zero on the first iteration of the loop.
/// <summary>High performance double parser with rudimentary flexibility.</summary>
/// <returns>Returns true only if we can be certain we parsed the string correctly.</returns>
/// <remarks>Does not support thousand separators or whitespace.</remarks>
/// <remarks>Supports all culture-specific symbols specified by the NumberFormatInfo of the
/// <see cref="CultureInfo.CurrentCulture"/> at the time this static class is instantiated.
/// So long as all culture symbols are a single character in length.
/// TODO: In theory, this class could be made instantiable, take the culture as an argument,
/// and support changing the culture at runtime in case the file the user is uploading
/// was generated on a machine with different culture settings.</remarks>
/// <remarks>Supports leading negative signs and positive signs, scientific notation,
/// as well as Infinity, Negative Infinity, and NaN, string representations.</remarks>
/// <remarks>A string containing only a negative sign (usually "-") intentionally returns a
/// value of zero. This is because it's a common representation of 0 in accounting.</remarks>
public static bool FastTryParseDouble(string input, out double result)
/// <summary>Checks if the string matches one of a few supported special case
/// double strings. If so, assigns the result and returns true.</summary>
public static bool CheckForSpecialCaseDoubleStrings(string input, out double result)
if (input == NumberFormat.PositiveInfinitySymbol)
result = Double.PositiveInfinity;
else if (input == NumberFormat.NegativeInfinitySymbol)
result = Double.NegativeInfinity;
else if (input == NumberFormat.NaNSymbol)
result = Double.NaN;
// Special Case: Excel has been known to format zero as "-".
// We intentionally support it by returning zero now (most parsers would not)
else if (input == NumberFormat.NegativeSign)
result = 0d;
// Special Case: Our organization treats the term "Unlimited" as referring
// to Double.MaxValue (most parsers would not)
else if (input.Equals("unlimited", StringComparison.OrdinalIgnoreCase))
result = Double.MaxValue;
// Anything else is not a valid input
else
result = Double.NaN;
return false;
return true;
/// <summary>The largest exponent (or smallest when negative) that can be given to a Double.</summary>
private const int MaxDoubleExponent = 308;
/// <summary>The number of elements that will be generated in the Pow10 array.</summary>
private const int Pow10Length = MaxDoubleExponent * 2 + 1;
/// <summary>A cache of all possible positive powers of 10 that might be required to
/// apply an exponent to a double (Indices 0-308), as well as the first 308 negative
/// exponents. (Indices 309-616)</summary>
private static readonly double Pow10 =
Enumerable.Range(0, MaxDoubleExponent + 1).Select(i => Math.Pow(10, i))
.Concat(Enumerable.Range(1, MaxDoubleExponent).Select(i => Math.Pow(10, -i)))
.ToArray();
/// <summary>The number of negative powers to pre-compute and store in a small array.</summary>
private const int NegPow10Length = 16;
/// <summary>A cache of the first 15 negative powers of 10 for quick
/// magnitude adjustment of common parsed fractional parts of doubles.</summary>
/// <remarks>Even though this overlaps with the Pow10 array, it is kept separate so that
/// users that don't use scientific notation or extremely long fractional parts
/// might get a speedup by being able to reference the smaller array, which has a better
/// chance of being served out of L1/L2 cache.</remarks>
private static readonly double NegPow10 =
Enumerable.Range(0, NegPow10Length).Select(i => Math.Pow(10, -i)).ToArray();
This new method matches all of the following test cases:
// Numbers without a fractional part
TestSuccess("0", 0d);
TestSuccess("1", 1d);
TestSuccess("-1", -1d);
TestSuccess("12345678901234", 12345678901234d);
TestSuccess("-12345678901234", -12345678901234d);
// Numbers with a fractional part
TestSuccess("123.45678", 123.45678);
TestSuccess("-123.45678", -123.45678);
// Numbers without an integer part
TestSuccess(".12345678901234", 0.12345678901234);
TestSuccess("-.12345678901234", -0.12345678901234);
// Various high-precision numbers
TestSuccess("0.12345678901234", 0.12345678901234);
TestSuccess("-0.12345678901234", -0.12345678901234);
TestSuccess("0.00000987654321", 0.00000987654321);
TestSuccess("-0.00000987654321", -0.00000987654321);
TestSuccess("1234567890123.0123456789", 1234567890123.0123456789);
TestSuccess("-1234567890123.0123456789", -1234567890123.0123456789);
// Numbers with very long fractional parts (more than 16 characters)
TestSuccess("0.00826499999979784", 0.00826499999979784);
TestSuccess("-0.00826499999979784", -0.00826499999979784);
TestSuccess("1.0123456789012345678901234567890", 1.0123456789012345678901234567890);
TestSuccess("-1.0123456789012345678901234567890", -1.0123456789012345678901234567890);
// Numbers with a leading positive sign
TestSuccess("+1", 1d);
TestSuccess("+12345678901234", 12345678901234d);
TestSuccess("+.12345678901234", 0.12345678901234);
TestSuccess("+0.00826499999979784", 0.00826499999979784);
// Very large numbers without scientific notation
TestSuccess("123456789000000000000000", 123456789000000000000000d);
TestSuccess("-123456789000000000000000", -123456789000000000000000d);
// Very small numbers without scientific notation
TestSuccess("0.00000000000000000123456789", 0.00000000000000000123456789);
TestSuccess("-0.00000000000000000123456789", -0.00000000000000000123456789);
// Scientific notation without a sign
TestSuccess("1.2345678e5", 1.2345678e5);
TestSuccess("1.2345678e5", 1.2345678e5);
TestSuccess("-1.2345678e5", -1.2345678e5);
// Scientific notation with a sign
TestSuccess("1.2345678e+25", 1.2345678e25);
TestSuccess("-1.2345678e+25", -1.2345678e25);
TestSuccess("1.2345678e-255", 1.2345678e-255);
TestSuccess("-1.2345678e-255", -1.2345678e-255);
// Epsilon, and other tiny doubles
// TODO: Known "failure" scenarios. Our parsing logic results in a return value of 0
// for these, but the native parser returns Double.Epsilon (smallest number greater
// than zero). I think we can live with this shortcoming.
//TestSuccess("4.94065645841247e-324", 4.94065645841247e-324);
//TestSuccess("-4.94065645841247e-324", -4.94065645841247e-324);
TestSuccess("3.33E-333", 3.33E-333);
TestSuccess("-3.33E-333", -3.33E-333);
TestSuccess("1E-1022", 1E-1022);
TestSuccess("-1E-1022", -1E-1022);
// Boundary cases
TestSuccess("1e0", 1);
TestSuccess("1e1", 10);
TestSuccess("1e-1", 0.1);
TestSuccess("1e-308", 1e-308);
TestSuccess("1e308", 1e308);
// Min and Max Double
TestSuccess("1.7976931348623157E+308", 1.7976931348623157E+308);
TestSuccess("-1.7976931348623157E+308", -1.7976931348623157E+308);
// Large Negative Exponents (Near-epsilon) doubles.
TestSuccess("1.23E-999", 1.23E-999);
TestSuccess("-1.23E-999", -1.23E-999);
// Special keywords
TestSuccess("âÂÂ", Double.PositiveInfinity);
TestSuccess("-âÂÂ", Double.NegativeInfinity);
TestSuccess("NaN", Double.NaN);
// Special case: "Unlimited" is used in our organization to refer to Double.MaxValue
TestSuccess("Unlimited", Double.MaxValue);
// Special case: "-" character only means zero in accounting formats.
TestSuccess("-", 0d);
Benchmark Results
Using a Stopwatch
this time, and ran with 1,000,000,000 (a billion) strings just to quell any debate about timing sensitivity:
Native parser took 26220 ms.
Custom parser took 6471 ms.
Performance gain was 305.19%
@chux I applied your suggestion in this updated version and it took the performance up another ~15% - while still supporting all valid double cases. The results still differ from Native Double.Parse by 1 or 2 ULPs for certain very large or very small numbers, but I don't consider that a meaningful drawback. It just means results aren't always being parsed to precisely the closest possible double representation of the string processed.
â Alain
Aug 3 at 12:36
Code likedo nextChar = input[currentIndex]; if (nextChar > '9' while (++currentIndex < length);
is doing 2 checks (is digit, is at end) per loop to see if it should stop. If able, append a non-digit toindex
and then code only need to to check for digit. After to loop code can test if loop quit due to end of string or not. `
â chux
Aug 3 at 14:20
That's a good idea. If only strings in c# returned the null character as the last element in their character array like c++. Can you think of any way without building a new array?
â Alain
Aug 3 at 14:30
"think of any way without building a new array?" --> No, yet appending to the existing array may not cost much.
â chux
Aug 3 at 14:39
Other idea: consider profiling a table lookupisdigit[nextChar]
instead ofif (nextChar > '9' || nextChar < '0')
.
â chux
Aug 3 at 14:40
 |Â
show 2 more comments
up vote
0
down vote
Here is an alternative algorithm based on a recommendation by @chux.
We append a null character to the end of the string ('') although any non-digit character will do. Now we don't have to check for array boundaries. All of our while loops become super tight:
/// <summary>A buffer used store a copy of input characters so that we can
/// append a null character avoid array index boundary checks while looping.</summary>
private static readonly char characters = new char[256];
public static bool FastTryParseDouble(string input, out double result)
Unfortunately, C# doesn't let you cheaply append an element to an array. We need to copy the data to a new (larger) array in order to add the 'terminating' character. Even if we keep a single, shared "buffer" for re-use to avoid allocating a new array on each call, the copy operation is prohibitively expensive. As such, the performance ends up worse:
Native parser took 21422 ms.
Custom parser took 8684 ms.
Performance gain was 146.68%
I'm sure this solution would be much more optimal in a language like c++, where the strings already contain a null character at the end. In such a case, there would be no need to copy the character array at all, and the algorithm below could be that much faster than one that has to constantly test array index bounds and test for whether the character is a digit character.
I just discovered that I can gofixed (char* cString = input)
in anunsafe
method to get a null-terminated character array pointer directly from a string. I'm going to try again using that, which will remove the need for any memory copying.
â Alain
Aug 3 at 21:02
add a comment |Â
4 Answers
4
active
oldest
votes
4 Answers
4
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
2
down vote
Alternatives for OP to try
As OP is looking for performance improvements, consider only 1 loop for both whole and fractional part calculation. Simply iterate through all the digits in one loop and note if and where the decimal point occurred.
// Pseudo code
DP = '.'
significant = 0.0
fractionLengh = 0
for (i=0; i < input.len; i++)
ch = input[i]
if (some_isdigit_test(ch))
significant = significant * 10 + ch - '0'
else if (ch == DP)
DP = '0' // Never match again
fractionLengh = input.len - i - 1
else
return fail;
}
// continue as before
if (fractionLengh < NegPow10.Length) ....
Perhaps integers?
Instead of accumulating result
as some floating point type, accumulate the digits as a 64-bit integer. This, depending on platform, is often significantly faster than double
.
Code could simply count leading zeros (important if there is a '.' there) and then loop onto the minimum of of the remaining text length or 18 (number of 999... digts in a 64-bit integer) and then do a final integer to double
for subsequent calculation.
Not alwasy the best
There are concerns with OP's code about generating the best answer.
Challenging (lengthly) text input eventual causes significant * 10
to round its answer and even perhaps overflow (even with an in range possible result).
With OP's fractionalPart
being rounded and NegPow10[fractionLengh])
also rounded, the product and than addition to result
may be off by 1 or 2 ULP.
To get the best result, additional (slower) code is needed.
-0.0
It appears OP's code will generate the correct result. I suspect unposted test code is insufficient to fully test this case. Perhaps OP is not concerned about this case as "anything exotic".
Range
I'd expect test cases should include the maximum +/-Double
as text and the minimum non-zero value +/-0.01
Positive numbers?
Code test for a leading '-'
. How about a leading '+'
? Research CurrentCulture.NumberFormat.PositiveSign
.
On Integers: I actually have a second (much simpler)FastTryParseInt
- it's ~2x slower (on my 64 bit machine) than than an identical routine usingdoubles
. The answer I found on the webs was that CPUs are optimized to do integer addition quickly, but integer multiplication is slow. Doubles are the opposite - addition takes more ticks than multiplication. The multiplication by 10 in the inner loop appears to really slow down integers. Haven't tested with longs, which should in theory be the same speed on a machine with 64 bit registers, but less risk of overflowing for large doubles.
â Alain
Aug 2 at 17:06
I like the idea of not tracking a separate "fraction part" double, and just continuing to add to result. I'm sure I can work with that. I still think it might be required to maintain two while loops (even though the code is not as nice) so that the branch predictor doesn't have to re-acclimate after the separator is encountered.
â Alain
Aug 2 at 17:14
I applied your pseudo code to my new version below and it did improve performance, thanks! I think I'll make a separate version without support for exponential notation again and see if the single loop approach makes it any faster. I also changed the in-progress result and constants tolong
(until the final result) to see how that went, but it did cut performance almost by double. Quite strange.
â Alain
Aug 3 at 13:52
add a comment |Â
up vote
2
down vote
Alternatives for OP to try
As OP is looking for performance improvements, consider only 1 loop for both whole and fractional part calculation. Simply iterate through all the digits in one loop and note if and where the decimal point occurred.
// Pseudo code
DP = '.'
significant = 0.0
fractionLengh = 0
for (i=0; i < input.len; i++)
ch = input[i]
if (some_isdigit_test(ch))
significant = significant * 10 + ch - '0'
else if (ch == DP)
DP = '0' // Never match again
fractionLengh = input.len - i - 1
else
return fail;
}
// continue as before
if (fractionLengh < NegPow10.Length) ....
Perhaps integers?
Instead of accumulating result
as some floating point type, accumulate the digits as a 64-bit integer. This, depending on platform, is often significantly faster than double
.
Code could simply count leading zeros (important if there is a '.' there) and then loop onto the minimum of of the remaining text length or 18 (number of 999... digts in a 64-bit integer) and then do a final integer to double
for subsequent calculation.
Not alwasy the best
There are concerns with OP's code about generating the best answer.
Challenging (lengthly) text input eventual causes significant * 10
to round its answer and even perhaps overflow (even with an in range possible result).
With OP's fractionalPart
being rounded and NegPow10[fractionLengh])
also rounded, the product and than addition to result
may be off by 1 or 2 ULP.
To get the best result, additional (slower) code is needed.
-0.0
It appears OP's code will generate the correct result. I suspect unposted test code is insufficient to fully test this case. Perhaps OP is not concerned about this case as "anything exotic".
Range
I'd expect test cases should include the maximum +/-Double
as text and the minimum non-zero value +/-0.01
Positive numbers?
Code test for a leading '-'
. How about a leading '+'
? Research CurrentCulture.NumberFormat.PositiveSign
.
On Integers: I actually have a second (much simpler)FastTryParseInt
- it's ~2x slower (on my 64 bit machine) than than an identical routine usingdoubles
. The answer I found on the webs was that CPUs are optimized to do integer addition quickly, but integer multiplication is slow. Doubles are the opposite - addition takes more ticks than multiplication. The multiplication by 10 in the inner loop appears to really slow down integers. Haven't tested with longs, which should in theory be the same speed on a machine with 64 bit registers, but less risk of overflowing for large doubles.
â Alain
Aug 2 at 17:06
I like the idea of not tracking a separate "fraction part" double, and just continuing to add to result. I'm sure I can work with that. I still think it might be required to maintain two while loops (even though the code is not as nice) so that the branch predictor doesn't have to re-acclimate after the separator is encountered.
â Alain
Aug 2 at 17:14
I applied your pseudo code to my new version below and it did improve performance, thanks! I think I'll make a separate version without support for exponential notation again and see if the single loop approach makes it any faster. I also changed the in-progress result and constants tolong
(until the final result) to see how that went, but it did cut performance almost by double. Quite strange.
â Alain
Aug 3 at 13:52
add a comment |Â
up vote
2
down vote
up vote
2
down vote
Alternatives for OP to try
As OP is looking for performance improvements, consider only 1 loop for both whole and fractional part calculation. Simply iterate through all the digits in one loop and note if and where the decimal point occurred.
// Pseudo code
DP = '.'
significant = 0.0
fractionLengh = 0
for (i=0; i < input.len; i++)
ch = input[i]
if (some_isdigit_test(ch))
significant = significant * 10 + ch - '0'
else if (ch == DP)
DP = '0' // Never match again
fractionLengh = input.len - i - 1
else
return fail;
}
// continue as before
if (fractionLengh < NegPow10.Length) ....
Perhaps integers?
Instead of accumulating result
as some floating point type, accumulate the digits as a 64-bit integer. This, depending on platform, is often significantly faster than double
.
Code could simply count leading zeros (important if there is a '.' there) and then loop onto the minimum of of the remaining text length or 18 (number of 999... digts in a 64-bit integer) and then do a final integer to double
for subsequent calculation.
Not alwasy the best
There are concerns with OP's code about generating the best answer.
Challenging (lengthly) text input eventual causes significant * 10
to round its answer and even perhaps overflow (even with an in range possible result).
With OP's fractionalPart
being rounded and NegPow10[fractionLengh])
also rounded, the product and than addition to result
may be off by 1 or 2 ULP.
To get the best result, additional (slower) code is needed.
-0.0
It appears OP's code will generate the correct result. I suspect unposted test code is insufficient to fully test this case. Perhaps OP is not concerned about this case as "anything exotic".
Range
I'd expect test cases should include the maximum +/-Double
as text and the minimum non-zero value +/-0.01
Positive numbers?
Code test for a leading '-'
. How about a leading '+'
? Research CurrentCulture.NumberFormat.PositiveSign
.
Alternatives for OP to try
As OP is looking for performance improvements, consider only 1 loop for both whole and fractional part calculation. Simply iterate through all the digits in one loop and note if and where the decimal point occurred.
// Pseudo code
DP = '.'
significant = 0.0
fractionLengh = 0
for (i=0; i < input.len; i++)
ch = input[i]
if (some_isdigit_test(ch))
significant = significant * 10 + ch - '0'
else if (ch == DP)
DP = '0' // Never match again
fractionLengh = input.len - i - 1
else
return fail;
}
// continue as before
if (fractionLengh < NegPow10.Length) ....
Perhaps integers?
Instead of accumulating result
as some floating point type, accumulate the digits as a 64-bit integer. This, depending on platform, is often significantly faster than double
.
Code could simply count leading zeros (important if there is a '.' there) and then loop onto the minimum of of the remaining text length or 18 (number of 999... digts in a 64-bit integer) and then do a final integer to double
for subsequent calculation.
Not alwasy the best
There are concerns with OP's code about generating the best answer.
Challenging (lengthly) text input eventual causes significant * 10
to round its answer and even perhaps overflow (even with an in range possible result).
With OP's fractionalPart
being rounded and NegPow10[fractionLengh])
also rounded, the product and than addition to result
may be off by 1 or 2 ULP.
To get the best result, additional (slower) code is needed.
-0.0
It appears OP's code will generate the correct result. I suspect unposted test code is insufficient to fully test this case. Perhaps OP is not concerned about this case as "anything exotic".
Range
I'd expect test cases should include the maximum +/-Double
as text and the minimum non-zero value +/-0.01
Positive numbers?
Code test for a leading '-'
. How about a leading '+'
? Research CurrentCulture.NumberFormat.PositiveSign
.
edited 3 hours ago
answered Aug 2 at 16:50
chux
11.4k11238
11.4k11238
On Integers: I actually have a second (much simpler)FastTryParseInt
- it's ~2x slower (on my 64 bit machine) than than an identical routine usingdoubles
. The answer I found on the webs was that CPUs are optimized to do integer addition quickly, but integer multiplication is slow. Doubles are the opposite - addition takes more ticks than multiplication. The multiplication by 10 in the inner loop appears to really slow down integers. Haven't tested with longs, which should in theory be the same speed on a machine with 64 bit registers, but less risk of overflowing for large doubles.
â Alain
Aug 2 at 17:06
I like the idea of not tracking a separate "fraction part" double, and just continuing to add to result. I'm sure I can work with that. I still think it might be required to maintain two while loops (even though the code is not as nice) so that the branch predictor doesn't have to re-acclimate after the separator is encountered.
â Alain
Aug 2 at 17:14
I applied your pseudo code to my new version below and it did improve performance, thanks! I think I'll make a separate version without support for exponential notation again and see if the single loop approach makes it any faster. I also changed the in-progress result and constants tolong
(until the final result) to see how that went, but it did cut performance almost by double. Quite strange.
â Alain
Aug 3 at 13:52
add a comment |Â
On Integers: I actually have a second (much simpler)FastTryParseInt
- it's ~2x slower (on my 64 bit machine) than than an identical routine usingdoubles
. The answer I found on the webs was that CPUs are optimized to do integer addition quickly, but integer multiplication is slow. Doubles are the opposite - addition takes more ticks than multiplication. The multiplication by 10 in the inner loop appears to really slow down integers. Haven't tested with longs, which should in theory be the same speed on a machine with 64 bit registers, but less risk of overflowing for large doubles.
â Alain
Aug 2 at 17:06
I like the idea of not tracking a separate "fraction part" double, and just continuing to add to result. I'm sure I can work with that. I still think it might be required to maintain two while loops (even though the code is not as nice) so that the branch predictor doesn't have to re-acclimate after the separator is encountered.
â Alain
Aug 2 at 17:14
I applied your pseudo code to my new version below and it did improve performance, thanks! I think I'll make a separate version without support for exponential notation again and see if the single loop approach makes it any faster. I also changed the in-progress result and constants tolong
(until the final result) to see how that went, but it did cut performance almost by double. Quite strange.
â Alain
Aug 3 at 13:52
On Integers: I actually have a second (much simpler)
FastTryParseInt
- it's ~2x slower (on my 64 bit machine) than than an identical routine using doubles
. The answer I found on the webs was that CPUs are optimized to do integer addition quickly, but integer multiplication is slow. Doubles are the opposite - addition takes more ticks than multiplication. The multiplication by 10 in the inner loop appears to really slow down integers. Haven't tested with longs, which should in theory be the same speed on a machine with 64 bit registers, but less risk of overflowing for large doubles.â Alain
Aug 2 at 17:06
On Integers: I actually have a second (much simpler)
FastTryParseInt
- it's ~2x slower (on my 64 bit machine) than than an identical routine using doubles
. The answer I found on the webs was that CPUs are optimized to do integer addition quickly, but integer multiplication is slow. Doubles are the opposite - addition takes more ticks than multiplication. The multiplication by 10 in the inner loop appears to really slow down integers. Haven't tested with longs, which should in theory be the same speed on a machine with 64 bit registers, but less risk of overflowing for large doubles.â Alain
Aug 2 at 17:06
I like the idea of not tracking a separate "fraction part" double, and just continuing to add to result. I'm sure I can work with that. I still think it might be required to maintain two while loops (even though the code is not as nice) so that the branch predictor doesn't have to re-acclimate after the separator is encountered.
â Alain
Aug 2 at 17:14
I like the idea of not tracking a separate "fraction part" double, and just continuing to add to result. I'm sure I can work with that. I still think it might be required to maintain two while loops (even though the code is not as nice) so that the branch predictor doesn't have to re-acclimate after the separator is encountered.
â Alain
Aug 2 at 17:14
I applied your pseudo code to my new version below and it did improve performance, thanks! I think I'll make a separate version without support for exponential notation again and see if the single loop approach makes it any faster. I also changed the in-progress result and constants to
long
(until the final result) to see how that went, but it did cut performance almost by double. Quite strange.â Alain
Aug 3 at 13:52
I applied your pseudo code to my new version below and it did improve performance, thanks! I think I'll make a separate version without support for exponential notation again and see if the single loop approach makes it any faster. I also changed the in-progress result and constants to
long
(until the final result) to see how that went, but it did cut performance almost by double. Quite strange.â Alain
Aug 3 at 13:52
add a comment |Â
up vote
0
down vote
Here's a slightly improved implementation that saves a few instructions by combining some redundant checks and assignments.
A little more work when checking for negatives:
// Handle a possible negative sign at the beginning of the string.
// We will apply the sign before returning the final result.
if (nextChar == CharNegative)
if (length <= 1) return false;
sign = -1d;
++currentIndex;
nextChar = input[1];
And the first main loop has its while / break conditions altered to remove the need for redundant checks after breaking.
unchecked
// Break on the first non-numeric character
while (nextChar >= '0' && nextChar <= '9')
// Multiply by 10 and add the next digit.
result = result * 10 + (nextChar - '0');
// Return if we have reached the end of the string
if (++currentIndex >= length)
result *= sign;
return true;
nextChar = input[currentIndex];
// The next character should be the decimal separator, or else it's invalid.
if (nextChar != CharDecimalSeparator) return false;
// Perform a similar loop to construct the fractional part of the decimal
double fractionalPart = 0d;
int fractionLengh = length - ++currentIndex;
while (currentIndex < length)
nextChar = input[currentIndex++];
// If we encounter a non-numeric character now, it's an error
if (nextChar < '0'
// Adjust the magnitude and add the fractional part to the result
// Use our power of negative powers of 10 if possible.
if (fractionLengh < NegPow10.Length)
result = (result + fractionalPart * NegPow10[fractionLengh]) * sign;
else
result = (result + fractionalPart * Math.Pow(10, -fractionLengh)) * sign;
return true;
The difference in performance is imperceptible during release mode. I'm guessing comile optimizations and/or instruction pipelining made the updates irrelevant.
I toyed with different ways of doing the final computation (wrapping the power multiplication in a ternary operator, splitting them out into multiple lines, etc) but this one version seems to compile down the best.
add a comment |Â
up vote
0
down vote
Here's a slightly improved implementation that saves a few instructions by combining some redundant checks and assignments.
A little more work when checking for negatives:
// Handle a possible negative sign at the beginning of the string.
// We will apply the sign before returning the final result.
if (nextChar == CharNegative)
if (length <= 1) return false;
sign = -1d;
++currentIndex;
nextChar = input[1];
And the first main loop has its while / break conditions altered to remove the need for redundant checks after breaking.
unchecked
// Break on the first non-numeric character
while (nextChar >= '0' && nextChar <= '9')
// Multiply by 10 and add the next digit.
result = result * 10 + (nextChar - '0');
// Return if we have reached the end of the string
if (++currentIndex >= length)
result *= sign;
return true;
nextChar = input[currentIndex];
// The next character should be the decimal separator, or else it's invalid.
if (nextChar != CharDecimalSeparator) return false;
// Perform a similar loop to construct the fractional part of the decimal
double fractionalPart = 0d;
int fractionLengh = length - ++currentIndex;
while (currentIndex < length)
nextChar = input[currentIndex++];
// If we encounter a non-numeric character now, it's an error
if (nextChar < '0'
// Adjust the magnitude and add the fractional part to the result
// Use our power of negative powers of 10 if possible.
if (fractionLengh < NegPow10.Length)
result = (result + fractionalPart * NegPow10[fractionLengh]) * sign;
else
result = (result + fractionalPart * Math.Pow(10, -fractionLengh)) * sign;
return true;
The difference in performance is imperceptible during release mode. I'm guessing comile optimizations and/or instruction pipelining made the updates irrelevant.
I toyed with different ways of doing the final computation (wrapping the power multiplication in a ternary operator, splitting them out into multiple lines, etc) but this one version seems to compile down the best.
add a comment |Â
up vote
0
down vote
up vote
0
down vote
Here's a slightly improved implementation that saves a few instructions by combining some redundant checks and assignments.
A little more work when checking for negatives:
// Handle a possible negative sign at the beginning of the string.
// We will apply the sign before returning the final result.
if (nextChar == CharNegative)
if (length <= 1) return false;
sign = -1d;
++currentIndex;
nextChar = input[1];
And the first main loop has its while / break conditions altered to remove the need for redundant checks after breaking.
unchecked
// Break on the first non-numeric character
while (nextChar >= '0' && nextChar <= '9')
// Multiply by 10 and add the next digit.
result = result * 10 + (nextChar - '0');
// Return if we have reached the end of the string
if (++currentIndex >= length)
result *= sign;
return true;
nextChar = input[currentIndex];
// The next character should be the decimal separator, or else it's invalid.
if (nextChar != CharDecimalSeparator) return false;
// Perform a similar loop to construct the fractional part of the decimal
double fractionalPart = 0d;
int fractionLengh = length - ++currentIndex;
while (currentIndex < length)
nextChar = input[currentIndex++];
// If we encounter a non-numeric character now, it's an error
if (nextChar < '0'
// Adjust the magnitude and add the fractional part to the result
// Use our power of negative powers of 10 if possible.
if (fractionLengh < NegPow10.Length)
result = (result + fractionalPart * NegPow10[fractionLengh]) * sign;
else
result = (result + fractionalPart * Math.Pow(10, -fractionLengh)) * sign;
return true;
The difference in performance is imperceptible during release mode. I'm guessing comile optimizations and/or instruction pipelining made the updates irrelevant.
I toyed with different ways of doing the final computation (wrapping the power multiplication in a ternary operator, splitting them out into multiple lines, etc) but this one version seems to compile down the best.
Here's a slightly improved implementation that saves a few instructions by combining some redundant checks and assignments.
A little more work when checking for negatives:
// Handle a possible negative sign at the beginning of the string.
// We will apply the sign before returning the final result.
if (nextChar == CharNegative)
if (length <= 1) return false;
sign = -1d;
++currentIndex;
nextChar = input[1];
And the first main loop has its while / break conditions altered to remove the need for redundant checks after breaking.
unchecked
// Break on the first non-numeric character
while (nextChar >= '0' && nextChar <= '9')
// Multiply by 10 and add the next digit.
result = result * 10 + (nextChar - '0');
// Return if we have reached the end of the string
if (++currentIndex >= length)
result *= sign;
return true;
nextChar = input[currentIndex];
// The next character should be the decimal separator, or else it's invalid.
if (nextChar != CharDecimalSeparator) return false;
// Perform a similar loop to construct the fractional part of the decimal
double fractionalPart = 0d;
int fractionLengh = length - ++currentIndex;
while (currentIndex < length)
nextChar = input[currentIndex++];
// If we encounter a non-numeric character now, it's an error
if (nextChar < '0'
// Adjust the magnitude and add the fractional part to the result
// Use our power of negative powers of 10 if possible.
if (fractionLengh < NegPow10.Length)
result = (result + fractionalPart * NegPow10[fractionLengh]) * sign;
else
result = (result + fractionalPart * Math.Pow(10, -fractionLengh)) * sign;
return true;
The difference in performance is imperceptible during release mode. I'm guessing comile optimizations and/or instruction pipelining made the updates irrelevant.
I toyed with different ways of doing the final computation (wrapping the power multiplication in a ternary operator, splitting them out into multiple lines, etc) but this one version seems to compile down the best.
answered Aug 1 at 2:25
Alain
2831314
2831314
add a comment |Â
add a comment |Â
up vote
0
down vote
Here is a much longer implementation that handles scientific notation, NaN, Infinity, Negative Infinity, and leading positive signs. I also added a lot of comments to help visually break it into chunks.
It manages to be almost as fast as the previous method - most of the logic takes place in the body of previous if
checks that simply returned false before.
I found a few places where I could avoid repeated checks for non-digit characters, and use the first digit character to initialize the result directly to avoid unnecessary additions/multiplcations with zero on the first iteration of the loop.
/// <summary>High performance double parser with rudimentary flexibility.</summary>
/// <returns>Returns true only if we can be certain we parsed the string correctly.</returns>
/// <remarks>Does not support thousand separators or whitespace.</remarks>
/// <remarks>Supports all culture-specific symbols specified by the NumberFormatInfo of the
/// <see cref="CultureInfo.CurrentCulture"/> at the time this static class is instantiated.
/// So long as all culture symbols are a single character in length.
/// TODO: In theory, this class could be made instantiable, take the culture as an argument,
/// and support changing the culture at runtime in case the file the user is uploading
/// was generated on a machine with different culture settings.</remarks>
/// <remarks>Supports leading negative signs and positive signs, scientific notation,
/// as well as Infinity, Negative Infinity, and NaN, string representations.</remarks>
/// <remarks>A string containing only a negative sign (usually "-") intentionally returns a
/// value of zero. This is because it's a common representation of 0 in accounting.</remarks>
public static bool FastTryParseDouble(string input, out double result)
/// <summary>Checks if the string matches one of a few supported special case
/// double strings. If so, assigns the result and returns true.</summary>
public static bool CheckForSpecialCaseDoubleStrings(string input, out double result)
if (input == NumberFormat.PositiveInfinitySymbol)
result = Double.PositiveInfinity;
else if (input == NumberFormat.NegativeInfinitySymbol)
result = Double.NegativeInfinity;
else if (input == NumberFormat.NaNSymbol)
result = Double.NaN;
// Special Case: Excel has been known to format zero as "-".
// We intentionally support it by returning zero now (most parsers would not)
else if (input == NumberFormat.NegativeSign)
result = 0d;
// Special Case: Our organization treats the term "Unlimited" as referring
// to Double.MaxValue (most parsers would not)
else if (input.Equals("unlimited", StringComparison.OrdinalIgnoreCase))
result = Double.MaxValue;
// Anything else is not a valid input
else
result = Double.NaN;
return false;
return true;
/// <summary>The largest exponent (or smallest when negative) that can be given to a Double.</summary>
private const int MaxDoubleExponent = 308;
/// <summary>The number of elements that will be generated in the Pow10 array.</summary>
private const int Pow10Length = MaxDoubleExponent * 2 + 1;
/// <summary>A cache of all possible positive powers of 10 that might be required to
/// apply an exponent to a double (Indices 0-308), as well as the first 308 negative
/// exponents. (Indices 309-616)</summary>
private static readonly double Pow10 =
Enumerable.Range(0, MaxDoubleExponent + 1).Select(i => Math.Pow(10, i))
.Concat(Enumerable.Range(1, MaxDoubleExponent).Select(i => Math.Pow(10, -i)))
.ToArray();
/// <summary>The number of negative powers to pre-compute and store in a small array.</summary>
private const int NegPow10Length = 16;
/// <summary>A cache of the first 15 negative powers of 10 for quick
/// magnitude adjustment of common parsed fractional parts of doubles.</summary>
/// <remarks>Even though this overlaps with the Pow10 array, it is kept separate so that
/// users that don't use scientific notation or extremely long fractional parts
/// might get a speedup by being able to reference the smaller array, which has a better
/// chance of being served out of L1/L2 cache.</remarks>
private static readonly double NegPow10 =
Enumerable.Range(0, NegPow10Length).Select(i => Math.Pow(10, -i)).ToArray();
This new method matches all of the following test cases:
// Numbers without a fractional part
TestSuccess("0", 0d);
TestSuccess("1", 1d);
TestSuccess("-1", -1d);
TestSuccess("12345678901234", 12345678901234d);
TestSuccess("-12345678901234", -12345678901234d);
// Numbers with a fractional part
TestSuccess("123.45678", 123.45678);
TestSuccess("-123.45678", -123.45678);
// Numbers without an integer part
TestSuccess(".12345678901234", 0.12345678901234);
TestSuccess("-.12345678901234", -0.12345678901234);
// Various high-precision numbers
TestSuccess("0.12345678901234", 0.12345678901234);
TestSuccess("-0.12345678901234", -0.12345678901234);
TestSuccess("0.00000987654321", 0.00000987654321);
TestSuccess("-0.00000987654321", -0.00000987654321);
TestSuccess("1234567890123.0123456789", 1234567890123.0123456789);
TestSuccess("-1234567890123.0123456789", -1234567890123.0123456789);
// Numbers with very long fractional parts (more than 16 characters)
TestSuccess("0.00826499999979784", 0.00826499999979784);
TestSuccess("-0.00826499999979784", -0.00826499999979784);
TestSuccess("1.0123456789012345678901234567890", 1.0123456789012345678901234567890);
TestSuccess("-1.0123456789012345678901234567890", -1.0123456789012345678901234567890);
// Numbers with a leading positive sign
TestSuccess("+1", 1d);
TestSuccess("+12345678901234", 12345678901234d);
TestSuccess("+.12345678901234", 0.12345678901234);
TestSuccess("+0.00826499999979784", 0.00826499999979784);
// Very large numbers without scientific notation
TestSuccess("123456789000000000000000", 123456789000000000000000d);
TestSuccess("-123456789000000000000000", -123456789000000000000000d);
// Very small numbers without scientific notation
TestSuccess("0.00000000000000000123456789", 0.00000000000000000123456789);
TestSuccess("-0.00000000000000000123456789", -0.00000000000000000123456789);
// Scientific notation without a sign
TestSuccess("1.2345678e5", 1.2345678e5);
TestSuccess("1.2345678e5", 1.2345678e5);
TestSuccess("-1.2345678e5", -1.2345678e5);
// Scientific notation with a sign
TestSuccess("1.2345678e+25", 1.2345678e25);
TestSuccess("-1.2345678e+25", -1.2345678e25);
TestSuccess("1.2345678e-255", 1.2345678e-255);
TestSuccess("-1.2345678e-255", -1.2345678e-255);
// Epsilon, and other tiny doubles
// TODO: Known "failure" scenarios. Our parsing logic results in a return value of 0
// for these, but the native parser returns Double.Epsilon (smallest number greater
// than zero). I think we can live with this shortcoming.
//TestSuccess("4.94065645841247e-324", 4.94065645841247e-324);
//TestSuccess("-4.94065645841247e-324", -4.94065645841247e-324);
TestSuccess("3.33E-333", 3.33E-333);
TestSuccess("-3.33E-333", -3.33E-333);
TestSuccess("1E-1022", 1E-1022);
TestSuccess("-1E-1022", -1E-1022);
// Boundary cases
TestSuccess("1e0", 1);
TestSuccess("1e1", 10);
TestSuccess("1e-1", 0.1);
TestSuccess("1e-308", 1e-308);
TestSuccess("1e308", 1e308);
// Min and Max Double
TestSuccess("1.7976931348623157E+308", 1.7976931348623157E+308);
TestSuccess("-1.7976931348623157E+308", -1.7976931348623157E+308);
// Large Negative Exponents (Near-epsilon) doubles.
TestSuccess("1.23E-999", 1.23E-999);
TestSuccess("-1.23E-999", -1.23E-999);
// Special keywords
TestSuccess("âÂÂ", Double.PositiveInfinity);
TestSuccess("-âÂÂ", Double.NegativeInfinity);
TestSuccess("NaN", Double.NaN);
// Special case: "Unlimited" is used in our organization to refer to Double.MaxValue
TestSuccess("Unlimited", Double.MaxValue);
// Special case: "-" character only means zero in accounting formats.
TestSuccess("-", 0d);
Benchmark Results
Using a Stopwatch
this time, and ran with 1,000,000,000 (a billion) strings just to quell any debate about timing sensitivity:
Native parser took 26220 ms.
Custom parser took 6471 ms.
Performance gain was 305.19%
@chux I applied your suggestion in this updated version and it took the performance up another ~15% - while still supporting all valid double cases. The results still differ from Native Double.Parse by 1 or 2 ULPs for certain very large or very small numbers, but I don't consider that a meaningful drawback. It just means results aren't always being parsed to precisely the closest possible double representation of the string processed.
â Alain
Aug 3 at 12:36
Code likedo nextChar = input[currentIndex]; if (nextChar > '9' while (++currentIndex < length);
is doing 2 checks (is digit, is at end) per loop to see if it should stop. If able, append a non-digit toindex
and then code only need to to check for digit. After to loop code can test if loop quit due to end of string or not. `
â chux
Aug 3 at 14:20
That's a good idea. If only strings in c# returned the null character as the last element in their character array like c++. Can you think of any way without building a new array?
â Alain
Aug 3 at 14:30
"think of any way without building a new array?" --> No, yet appending to the existing array may not cost much.
â chux
Aug 3 at 14:39
Other idea: consider profiling a table lookupisdigit[nextChar]
instead ofif (nextChar > '9' || nextChar < '0')
.
â chux
Aug 3 at 14:40
 |Â
show 2 more comments
up vote
0
down vote
Here is a much longer implementation that handles scientific notation, NaN, Infinity, Negative Infinity, and leading positive signs. I also added a lot of comments to help visually break it into chunks.
It manages to be almost as fast as the previous method - most of the logic takes place in the body of previous if
checks that simply returned false before.
I found a few places where I could avoid repeated checks for non-digit characters, and use the first digit character to initialize the result directly to avoid unnecessary additions/multiplcations with zero on the first iteration of the loop.
/// <summary>High performance double parser with rudimentary flexibility.</summary>
/// <returns>Returns true only if we can be certain we parsed the string correctly.</returns>
/// <remarks>Does not support thousand separators or whitespace.</remarks>
/// <remarks>Supports all culture-specific symbols specified by the NumberFormatInfo of the
/// <see cref="CultureInfo.CurrentCulture"/> at the time this static class is instantiated.
/// So long as all culture symbols are a single character in length.
/// TODO: In theory, this class could be made instantiable, take the culture as an argument,
/// and support changing the culture at runtime in case the file the user is uploading
/// was generated on a machine with different culture settings.</remarks>
/// <remarks>Supports leading negative signs and positive signs, scientific notation,
/// as well as Infinity, Negative Infinity, and NaN, string representations.</remarks>
/// <remarks>A string containing only a negative sign (usually "-") intentionally returns a
/// value of zero. This is because it's a common representation of 0 in accounting.</remarks>
public static bool FastTryParseDouble(string input, out double result)
/// <summary>Checks if the string matches one of a few supported special case
/// double strings. If so, assigns the result and returns true.</summary>
public static bool CheckForSpecialCaseDoubleStrings(string input, out double result)
if (input == NumberFormat.PositiveInfinitySymbol)
result = Double.PositiveInfinity;
else if (input == NumberFormat.NegativeInfinitySymbol)
result = Double.NegativeInfinity;
else if (input == NumberFormat.NaNSymbol)
result = Double.NaN;
// Special Case: Excel has been known to format zero as "-".
// We intentionally support it by returning zero now (most parsers would not)
else if (input == NumberFormat.NegativeSign)
result = 0d;
// Special Case: Our organization treats the term "Unlimited" as referring
// to Double.MaxValue (most parsers would not)
else if (input.Equals("unlimited", StringComparison.OrdinalIgnoreCase))
result = Double.MaxValue;
// Anything else is not a valid input
else
result = Double.NaN;
return false;
return true;
/// <summary>The largest exponent (or smallest when negative) that can be given to a Double.</summary>
private const int MaxDoubleExponent = 308;
/// <summary>The number of elements that will be generated in the Pow10 array.</summary>
private const int Pow10Length = MaxDoubleExponent * 2 + 1;
/// <summary>A cache of all possible positive powers of 10 that might be required to
/// apply an exponent to a double (Indices 0-308), as well as the first 308 negative
/// exponents. (Indices 309-616)</summary>
private static readonly double Pow10 =
Enumerable.Range(0, MaxDoubleExponent + 1).Select(i => Math.Pow(10, i))
.Concat(Enumerable.Range(1, MaxDoubleExponent).Select(i => Math.Pow(10, -i)))
.ToArray();
/// <summary>The number of negative powers to pre-compute and store in a small array.</summary>
private const int NegPow10Length = 16;
/// <summary>A cache of the first 15 negative powers of 10 for quick
/// magnitude adjustment of common parsed fractional parts of doubles.</summary>
/// <remarks>Even though this overlaps with the Pow10 array, it is kept separate so that
/// users that don't use scientific notation or extremely long fractional parts
/// might get a speedup by being able to reference the smaller array, which has a better
/// chance of being served out of L1/L2 cache.</remarks>
private static readonly double NegPow10 =
Enumerable.Range(0, NegPow10Length).Select(i => Math.Pow(10, -i)).ToArray();
This new method matches all of the following test cases:
// Numbers without a fractional part
TestSuccess("0", 0d);
TestSuccess("1", 1d);
TestSuccess("-1", -1d);
TestSuccess("12345678901234", 12345678901234d);
TestSuccess("-12345678901234", -12345678901234d);
// Numbers with a fractional part
TestSuccess("123.45678", 123.45678);
TestSuccess("-123.45678", -123.45678);
// Numbers without an integer part
TestSuccess(".12345678901234", 0.12345678901234);
TestSuccess("-.12345678901234", -0.12345678901234);
// Various high-precision numbers
TestSuccess("0.12345678901234", 0.12345678901234);
TestSuccess("-0.12345678901234", -0.12345678901234);
TestSuccess("0.00000987654321", 0.00000987654321);
TestSuccess("-0.00000987654321", -0.00000987654321);
TestSuccess("1234567890123.0123456789", 1234567890123.0123456789);
TestSuccess("-1234567890123.0123456789", -1234567890123.0123456789);
// Numbers with very long fractional parts (more than 16 characters)
TestSuccess("0.00826499999979784", 0.00826499999979784);
TestSuccess("-0.00826499999979784", -0.00826499999979784);
TestSuccess("1.0123456789012345678901234567890", 1.0123456789012345678901234567890);
TestSuccess("-1.0123456789012345678901234567890", -1.0123456789012345678901234567890);
// Numbers with a leading positive sign
TestSuccess("+1", 1d);
TestSuccess("+12345678901234", 12345678901234d);
TestSuccess("+.12345678901234", 0.12345678901234);
TestSuccess("+0.00826499999979784", 0.00826499999979784);
// Very large numbers without scientific notation
TestSuccess("123456789000000000000000", 123456789000000000000000d);
TestSuccess("-123456789000000000000000", -123456789000000000000000d);
// Very small numbers without scientific notation
TestSuccess("0.00000000000000000123456789", 0.00000000000000000123456789);
TestSuccess("-0.00000000000000000123456789", -0.00000000000000000123456789);
// Scientific notation without a sign
TestSuccess("1.2345678e5", 1.2345678e5);
TestSuccess("1.2345678e5", 1.2345678e5);
TestSuccess("-1.2345678e5", -1.2345678e5);
// Scientific notation with a sign
TestSuccess("1.2345678e+25", 1.2345678e25);
TestSuccess("-1.2345678e+25", -1.2345678e25);
TestSuccess("1.2345678e-255", 1.2345678e-255);
TestSuccess("-1.2345678e-255", -1.2345678e-255);
// Epsilon, and other tiny doubles
// TODO: Known "failure" scenarios. Our parsing logic results in a return value of 0
// for these, but the native parser returns Double.Epsilon (smallest number greater
// than zero). I think we can live with this shortcoming.
//TestSuccess("4.94065645841247e-324", 4.94065645841247e-324);
//TestSuccess("-4.94065645841247e-324", -4.94065645841247e-324);
TestSuccess("3.33E-333", 3.33E-333);
TestSuccess("-3.33E-333", -3.33E-333);
TestSuccess("1E-1022", 1E-1022);
TestSuccess("-1E-1022", -1E-1022);
// Boundary cases
TestSuccess("1e0", 1);
TestSuccess("1e1", 10);
TestSuccess("1e-1", 0.1);
TestSuccess("1e-308", 1e-308);
TestSuccess("1e308", 1e308);
// Min and Max Double
TestSuccess("1.7976931348623157E+308", 1.7976931348623157E+308);
TestSuccess("-1.7976931348623157E+308", -1.7976931348623157E+308);
// Large Negative Exponents (Near-epsilon) doubles.
TestSuccess("1.23E-999", 1.23E-999);
TestSuccess("-1.23E-999", -1.23E-999);
// Special keywords
TestSuccess("âÂÂ", Double.PositiveInfinity);
TestSuccess("-âÂÂ", Double.NegativeInfinity);
TestSuccess("NaN", Double.NaN);
// Special case: "Unlimited" is used in our organization to refer to Double.MaxValue
TestSuccess("Unlimited", Double.MaxValue);
// Special case: "-" character only means zero in accounting formats.
TestSuccess("-", 0d);
Benchmark Results
Using a Stopwatch
this time, and ran with 1,000,000,000 (a billion) strings just to quell any debate about timing sensitivity:
Native parser took 26220 ms.
Custom parser took 6471 ms.
Performance gain was 305.19%
@chux I applied your suggestion in this updated version and it took the performance up another ~15% - while still supporting all valid double cases. The results still differ from Native Double.Parse by 1 or 2 ULPs for certain very large or very small numbers, but I don't consider that a meaningful drawback. It just means results aren't always being parsed to precisely the closest possible double representation of the string processed.
â Alain
Aug 3 at 12:36
Code likedo nextChar = input[currentIndex]; if (nextChar > '9' while (++currentIndex < length);
is doing 2 checks (is digit, is at end) per loop to see if it should stop. If able, append a non-digit toindex
and then code only need to to check for digit. After to loop code can test if loop quit due to end of string or not. `
â chux
Aug 3 at 14:20
That's a good idea. If only strings in c# returned the null character as the last element in their character array like c++. Can you think of any way without building a new array?
â Alain
Aug 3 at 14:30
"think of any way without building a new array?" --> No, yet appending to the existing array may not cost much.
â chux
Aug 3 at 14:39
Other idea: consider profiling a table lookupisdigit[nextChar]
instead ofif (nextChar > '9' || nextChar < '0')
.
â chux
Aug 3 at 14:40
 |Â
show 2 more comments
up vote
0
down vote
up vote
0
down vote
Here is a much longer implementation that handles scientific notation, NaN, Infinity, Negative Infinity, and leading positive signs. I also added a lot of comments to help visually break it into chunks.
It manages to be almost as fast as the previous method - most of the logic takes place in the body of previous if
checks that simply returned false before.
I found a few places where I could avoid repeated checks for non-digit characters, and use the first digit character to initialize the result directly to avoid unnecessary additions/multiplcations with zero on the first iteration of the loop.
/// <summary>High performance double parser with rudimentary flexibility.</summary>
/// <returns>Returns true only if we can be certain we parsed the string correctly.</returns>
/// <remarks>Does not support thousand separators or whitespace.</remarks>
/// <remarks>Supports all culture-specific symbols specified by the NumberFormatInfo of the
/// <see cref="CultureInfo.CurrentCulture"/> at the time this static class is instantiated.
/// So long as all culture symbols are a single character in length.
/// TODO: In theory, this class could be made instantiable, take the culture as an argument,
/// and support changing the culture at runtime in case the file the user is uploading
/// was generated on a machine with different culture settings.</remarks>
/// <remarks>Supports leading negative signs and positive signs, scientific notation,
/// as well as Infinity, Negative Infinity, and NaN, string representations.</remarks>
/// <remarks>A string containing only a negative sign (usually "-") intentionally returns a
/// value of zero. This is because it's a common representation of 0 in accounting.</remarks>
public static bool FastTryParseDouble(string input, out double result)
/// <summary>Checks if the string matches one of a few supported special case
/// double strings. If so, assigns the result and returns true.</summary>
public static bool CheckForSpecialCaseDoubleStrings(string input, out double result)
if (input == NumberFormat.PositiveInfinitySymbol)
result = Double.PositiveInfinity;
else if (input == NumberFormat.NegativeInfinitySymbol)
result = Double.NegativeInfinity;
else if (input == NumberFormat.NaNSymbol)
result = Double.NaN;
// Special Case: Excel has been known to format zero as "-".
// We intentionally support it by returning zero now (most parsers would not)
else if (input == NumberFormat.NegativeSign)
result = 0d;
// Special Case: Our organization treats the term "Unlimited" as referring
// to Double.MaxValue (most parsers would not)
else if (input.Equals("unlimited", StringComparison.OrdinalIgnoreCase))
result = Double.MaxValue;
// Anything else is not a valid input
else
result = Double.NaN;
return false;
return true;
/// <summary>The largest exponent (or smallest when negative) that can be given to a Double.</summary>
private const int MaxDoubleExponent = 308;
/// <summary>The number of elements that will be generated in the Pow10 array.</summary>
private const int Pow10Length = MaxDoubleExponent * 2 + 1;
/// <summary>A cache of all possible positive powers of 10 that might be required to
/// apply an exponent to a double (Indices 0-308), as well as the first 308 negative
/// exponents. (Indices 309-616)</summary>
private static readonly double Pow10 =
Enumerable.Range(0, MaxDoubleExponent + 1).Select(i => Math.Pow(10, i))
.Concat(Enumerable.Range(1, MaxDoubleExponent).Select(i => Math.Pow(10, -i)))
.ToArray();
/// <summary>The number of negative powers to pre-compute and store in a small array.</summary>
private const int NegPow10Length = 16;
/// <summary>A cache of the first 15 negative powers of 10 for quick
/// magnitude adjustment of common parsed fractional parts of doubles.</summary>
/// <remarks>Even though this overlaps with the Pow10 array, it is kept separate so that
/// users that don't use scientific notation or extremely long fractional parts
/// might get a speedup by being able to reference the smaller array, which has a better
/// chance of being served out of L1/L2 cache.</remarks>
private static readonly double NegPow10 =
Enumerable.Range(0, NegPow10Length).Select(i => Math.Pow(10, -i)).ToArray();
This new method matches all of the following test cases:
// Numbers without a fractional part
TestSuccess("0", 0d);
TestSuccess("1", 1d);
TestSuccess("-1", -1d);
TestSuccess("12345678901234", 12345678901234d);
TestSuccess("-12345678901234", -12345678901234d);
// Numbers with a fractional part
TestSuccess("123.45678", 123.45678);
TestSuccess("-123.45678", -123.45678);
// Numbers without an integer part
TestSuccess(".12345678901234", 0.12345678901234);
TestSuccess("-.12345678901234", -0.12345678901234);
// Various high-precision numbers
TestSuccess("0.12345678901234", 0.12345678901234);
TestSuccess("-0.12345678901234", -0.12345678901234);
TestSuccess("0.00000987654321", 0.00000987654321);
TestSuccess("-0.00000987654321", -0.00000987654321);
TestSuccess("1234567890123.0123456789", 1234567890123.0123456789);
TestSuccess("-1234567890123.0123456789", -1234567890123.0123456789);
// Numbers with very long fractional parts (more than 16 characters)
TestSuccess("0.00826499999979784", 0.00826499999979784);
TestSuccess("-0.00826499999979784", -0.00826499999979784);
TestSuccess("1.0123456789012345678901234567890", 1.0123456789012345678901234567890);
TestSuccess("-1.0123456789012345678901234567890", -1.0123456789012345678901234567890);
// Numbers with a leading positive sign
TestSuccess("+1", 1d);
TestSuccess("+12345678901234", 12345678901234d);
TestSuccess("+.12345678901234", 0.12345678901234);
TestSuccess("+0.00826499999979784", 0.00826499999979784);
// Very large numbers without scientific notation
TestSuccess("123456789000000000000000", 123456789000000000000000d);
TestSuccess("-123456789000000000000000", -123456789000000000000000d);
// Very small numbers without scientific notation
TestSuccess("0.00000000000000000123456789", 0.00000000000000000123456789);
TestSuccess("-0.00000000000000000123456789", -0.00000000000000000123456789);
// Scientific notation without a sign
TestSuccess("1.2345678e5", 1.2345678e5);
TestSuccess("1.2345678e5", 1.2345678e5);
TestSuccess("-1.2345678e5", -1.2345678e5);
// Scientific notation with a sign
TestSuccess("1.2345678e+25", 1.2345678e25);
TestSuccess("-1.2345678e+25", -1.2345678e25);
TestSuccess("1.2345678e-255", 1.2345678e-255);
TestSuccess("-1.2345678e-255", -1.2345678e-255);
// Epsilon, and other tiny doubles
// TODO: Known "failure" scenarios. Our parsing logic results in a return value of 0
// for these, but the native parser returns Double.Epsilon (smallest number greater
// than zero). I think we can live with this shortcoming.
//TestSuccess("4.94065645841247e-324", 4.94065645841247e-324);
//TestSuccess("-4.94065645841247e-324", -4.94065645841247e-324);
TestSuccess("3.33E-333", 3.33E-333);
TestSuccess("-3.33E-333", -3.33E-333);
TestSuccess("1E-1022", 1E-1022);
TestSuccess("-1E-1022", -1E-1022);
// Boundary cases
TestSuccess("1e0", 1);
TestSuccess("1e1", 10);
TestSuccess("1e-1", 0.1);
TestSuccess("1e-308", 1e-308);
TestSuccess("1e308", 1e308);
// Min and Max Double
TestSuccess("1.7976931348623157E+308", 1.7976931348623157E+308);
TestSuccess("-1.7976931348623157E+308", -1.7976931348623157E+308);
// Large Negative Exponents (Near-epsilon) doubles.
TestSuccess("1.23E-999", 1.23E-999);
TestSuccess("-1.23E-999", -1.23E-999);
// Special keywords
TestSuccess("âÂÂ", Double.PositiveInfinity);
TestSuccess("-âÂÂ", Double.NegativeInfinity);
TestSuccess("NaN", Double.NaN);
// Special case: "Unlimited" is used in our organization to refer to Double.MaxValue
TestSuccess("Unlimited", Double.MaxValue);
// Special case: "-" character only means zero in accounting formats.
TestSuccess("-", 0d);
Benchmark Results
Using a Stopwatch
this time, and ran with 1,000,000,000 (a billion) strings just to quell any debate about timing sensitivity:
Native parser took 26220 ms.
Custom parser took 6471 ms.
Performance gain was 305.19%
Here is a much longer implementation that handles scientific notation, NaN, Infinity, Negative Infinity, and leading positive signs. I also added a lot of comments to help visually break it into chunks.
It manages to be almost as fast as the previous method - most of the logic takes place in the body of previous if
checks that simply returned false before.
I found a few places where I could avoid repeated checks for non-digit characters, and use the first digit character to initialize the result directly to avoid unnecessary additions/multiplcations with zero on the first iteration of the loop.
/// <summary>High performance double parser with rudimentary flexibility.</summary>
/// <returns>Returns true only if we can be certain we parsed the string correctly.</returns>
/// <remarks>Does not support thousand separators or whitespace.</remarks>
/// <remarks>Supports all culture-specific symbols specified by the NumberFormatInfo of the
/// <see cref="CultureInfo.CurrentCulture"/> at the time this static class is instantiated.
/// So long as all culture symbols are a single character in length.
/// TODO: In theory, this class could be made instantiable, take the culture as an argument,
/// and support changing the culture at runtime in case the file the user is uploading
/// was generated on a machine with different culture settings.</remarks>
/// <remarks>Supports leading negative signs and positive signs, scientific notation,
/// as well as Infinity, Negative Infinity, and NaN, string representations.</remarks>
/// <remarks>A string containing only a negative sign (usually "-") intentionally returns a
/// value of zero. This is because it's a common representation of 0 in accounting.</remarks>
public static bool FastTryParseDouble(string input, out double result)
/// <summary>Checks if the string matches one of a few supported special case
/// double strings. If so, assigns the result and returns true.</summary>
public static bool CheckForSpecialCaseDoubleStrings(string input, out double result)
if (input == NumberFormat.PositiveInfinitySymbol)
result = Double.PositiveInfinity;
else if (input == NumberFormat.NegativeInfinitySymbol)
result = Double.NegativeInfinity;
else if (input == NumberFormat.NaNSymbol)
result = Double.NaN;
// Special Case: Excel has been known to format zero as "-".
// We intentionally support it by returning zero now (most parsers would not)
else if (input == NumberFormat.NegativeSign)
result = 0d;
// Special Case: Our organization treats the term "Unlimited" as referring
// to Double.MaxValue (most parsers would not)
else if (input.Equals("unlimited", StringComparison.OrdinalIgnoreCase))
result = Double.MaxValue;
// Anything else is not a valid input
else
result = Double.NaN;
return false;
return true;
/// <summary>The largest exponent (or smallest when negative) that can be given to a Double.</summary>
private const int MaxDoubleExponent = 308;
/// <summary>The number of elements that will be generated in the Pow10 array.</summary>
private const int Pow10Length = MaxDoubleExponent * 2 + 1;
/// <summary>A cache of all possible positive powers of 10 that might be required to
/// apply an exponent to a double (Indices 0-308), as well as the first 308 negative
/// exponents. (Indices 309-616)</summary>
private static readonly double Pow10 =
Enumerable.Range(0, MaxDoubleExponent + 1).Select(i => Math.Pow(10, i))
.Concat(Enumerable.Range(1, MaxDoubleExponent).Select(i => Math.Pow(10, -i)))
.ToArray();
/// <summary>The number of negative powers to pre-compute and store in a small array.</summary>
private const int NegPow10Length = 16;
/// <summary>A cache of the first 15 negative powers of 10 for quick
/// magnitude adjustment of common parsed fractional parts of doubles.</summary>
/// <remarks>Even though this overlaps with the Pow10 array, it is kept separate so that
/// users that don't use scientific notation or extremely long fractional parts
/// might get a speedup by being able to reference the smaller array, which has a better
/// chance of being served out of L1/L2 cache.</remarks>
private static readonly double NegPow10 =
Enumerable.Range(0, NegPow10Length).Select(i => Math.Pow(10, -i)).ToArray();
This new method matches all of the following test cases:
// Numbers without a fractional part
TestSuccess("0", 0d);
TestSuccess("1", 1d);
TestSuccess("-1", -1d);
TestSuccess("12345678901234", 12345678901234d);
TestSuccess("-12345678901234", -12345678901234d);
// Numbers with a fractional part
TestSuccess("123.45678", 123.45678);
TestSuccess("-123.45678", -123.45678);
// Numbers without an integer part
TestSuccess(".12345678901234", 0.12345678901234);
TestSuccess("-.12345678901234", -0.12345678901234);
// Various high-precision numbers
TestSuccess("0.12345678901234", 0.12345678901234);
TestSuccess("-0.12345678901234", -0.12345678901234);
TestSuccess("0.00000987654321", 0.00000987654321);
TestSuccess("-0.00000987654321", -0.00000987654321);
TestSuccess("1234567890123.0123456789", 1234567890123.0123456789);
TestSuccess("-1234567890123.0123456789", -1234567890123.0123456789);
// Numbers with very long fractional parts (more than 16 characters)
TestSuccess("0.00826499999979784", 0.00826499999979784);
TestSuccess("-0.00826499999979784", -0.00826499999979784);
TestSuccess("1.0123456789012345678901234567890", 1.0123456789012345678901234567890);
TestSuccess("-1.0123456789012345678901234567890", -1.0123456789012345678901234567890);
// Numbers with a leading positive sign
TestSuccess("+1", 1d);
TestSuccess("+12345678901234", 12345678901234d);
TestSuccess("+.12345678901234", 0.12345678901234);
TestSuccess("+0.00826499999979784", 0.00826499999979784);
// Very large numbers without scientific notation
TestSuccess("123456789000000000000000", 123456789000000000000000d);
TestSuccess("-123456789000000000000000", -123456789000000000000000d);
// Very small numbers without scientific notation
TestSuccess("0.00000000000000000123456789", 0.00000000000000000123456789);
TestSuccess("-0.00000000000000000123456789", -0.00000000000000000123456789);
// Scientific notation without a sign
TestSuccess("1.2345678e5", 1.2345678e5);
TestSuccess("1.2345678e5", 1.2345678e5);
TestSuccess("-1.2345678e5", -1.2345678e5);
// Scientific notation with a sign
TestSuccess("1.2345678e+25", 1.2345678e25);
TestSuccess("-1.2345678e+25", -1.2345678e25);
TestSuccess("1.2345678e-255", 1.2345678e-255);
TestSuccess("-1.2345678e-255", -1.2345678e-255);
// Epsilon, and other tiny doubles
// TODO: Known "failure" scenarios. Our parsing logic results in a return value of 0
// for these, but the native parser returns Double.Epsilon (smallest number greater
// than zero). I think we can live with this shortcoming.
//TestSuccess("4.94065645841247e-324", 4.94065645841247e-324);
//TestSuccess("-4.94065645841247e-324", -4.94065645841247e-324);
TestSuccess("3.33E-333", 3.33E-333);
TestSuccess("-3.33E-333", -3.33E-333);
TestSuccess("1E-1022", 1E-1022);
TestSuccess("-1E-1022", -1E-1022);
// Boundary cases
TestSuccess("1e0", 1);
TestSuccess("1e1", 10);
TestSuccess("1e-1", 0.1);
TestSuccess("1e-308", 1e-308);
TestSuccess("1e308", 1e308);
// Min and Max Double
TestSuccess("1.7976931348623157E+308", 1.7976931348623157E+308);
TestSuccess("-1.7976931348623157E+308", -1.7976931348623157E+308);
// Large Negative Exponents (Near-epsilon) doubles.
TestSuccess("1.23E-999", 1.23E-999);
TestSuccess("-1.23E-999", -1.23E-999);
// Special keywords
TestSuccess("âÂÂ", Double.PositiveInfinity);
TestSuccess("-âÂÂ", Double.NegativeInfinity);
TestSuccess("NaN", Double.NaN);
// Special case: "Unlimited" is used in our organization to refer to Double.MaxValue
TestSuccess("Unlimited", Double.MaxValue);
// Special case: "-" character only means zero in accounting formats.
TestSuccess("-", 0d);
Benchmark Results
Using a Stopwatch
this time, and ran with 1,000,000,000 (a billion) strings just to quell any debate about timing sensitivity:
Native parser took 26220 ms.
Custom parser took 6471 ms.
Performance gain was 305.19%
edited Aug 3 at 12:33
answered Aug 2 at 16:08
Alain
2831314
2831314
@chux I applied your suggestion in this updated version and it took the performance up another ~15% - while still supporting all valid double cases. The results still differ from Native Double.Parse by 1 or 2 ULPs for certain very large or very small numbers, but I don't consider that a meaningful drawback. It just means results aren't always being parsed to precisely the closest possible double representation of the string processed.
â Alain
Aug 3 at 12:36
Code likedo nextChar = input[currentIndex]; if (nextChar > '9' while (++currentIndex < length);
is doing 2 checks (is digit, is at end) per loop to see if it should stop. If able, append a non-digit toindex
and then code only need to to check for digit. After to loop code can test if loop quit due to end of string or not. `
â chux
Aug 3 at 14:20
That's a good idea. If only strings in c# returned the null character as the last element in their character array like c++. Can you think of any way without building a new array?
â Alain
Aug 3 at 14:30
"think of any way without building a new array?" --> No, yet appending to the existing array may not cost much.
â chux
Aug 3 at 14:39
Other idea: consider profiling a table lookupisdigit[nextChar]
instead ofif (nextChar > '9' || nextChar < '0')
.
â chux
Aug 3 at 14:40
 |Â
show 2 more comments
@chux I applied your suggestion in this updated version and it took the performance up another ~15% - while still supporting all valid double cases. The results still differ from Native Double.Parse by 1 or 2 ULPs for certain very large or very small numbers, but I don't consider that a meaningful drawback. It just means results aren't always being parsed to precisely the closest possible double representation of the string processed.
â Alain
Aug 3 at 12:36
Code likedo nextChar = input[currentIndex]; if (nextChar > '9' while (++currentIndex < length);
is doing 2 checks (is digit, is at end) per loop to see if it should stop. If able, append a non-digit toindex
and then code only need to to check for digit. After to loop code can test if loop quit due to end of string or not. `
â chux
Aug 3 at 14:20
That's a good idea. If only strings in c# returned the null character as the last element in their character array like c++. Can you think of any way without building a new array?
â Alain
Aug 3 at 14:30
"think of any way without building a new array?" --> No, yet appending to the existing array may not cost much.
â chux
Aug 3 at 14:39
Other idea: consider profiling a table lookupisdigit[nextChar]
instead ofif (nextChar > '9' || nextChar < '0')
.
â chux
Aug 3 at 14:40
@chux I applied your suggestion in this updated version and it took the performance up another ~15% - while still supporting all valid double cases. The results still differ from Native Double.Parse by 1 or 2 ULPs for certain very large or very small numbers, but I don't consider that a meaningful drawback. It just means results aren't always being parsed to precisely the closest possible double representation of the string processed.
â Alain
Aug 3 at 12:36
@chux I applied your suggestion in this updated version and it took the performance up another ~15% - while still supporting all valid double cases. The results still differ from Native Double.Parse by 1 or 2 ULPs for certain very large or very small numbers, but I don't consider that a meaningful drawback. It just means results aren't always being parsed to precisely the closest possible double representation of the string processed.
â Alain
Aug 3 at 12:36
Code like
do nextChar = input[currentIndex]; if (nextChar > '9' while (++currentIndex < length);
is doing 2 checks (is digit, is at end) per loop to see if it should stop. If able, append a non-digit to index
and then code only need to to check for digit. After to loop code can test if loop quit due to end of string or not. `â chux
Aug 3 at 14:20
Code like
do nextChar = input[currentIndex]; if (nextChar > '9' while (++currentIndex < length);
is doing 2 checks (is digit, is at end) per loop to see if it should stop. If able, append a non-digit to index
and then code only need to to check for digit. After to loop code can test if loop quit due to end of string or not. `â chux
Aug 3 at 14:20
That's a good idea. If only strings in c# returned the null character as the last element in their character array like c++. Can you think of any way without building a new array?
â Alain
Aug 3 at 14:30
That's a good idea. If only strings in c# returned the null character as the last element in their character array like c++. Can you think of any way without building a new array?
â Alain
Aug 3 at 14:30
"think of any way without building a new array?" --> No, yet appending to the existing array may not cost much.
â chux
Aug 3 at 14:39
"think of any way without building a new array?" --> No, yet appending to the existing array may not cost much.
â chux
Aug 3 at 14:39
Other idea: consider profiling a table lookup
isdigit[nextChar]
instead of if (nextChar > '9' || nextChar < '0')
.â chux
Aug 3 at 14:40
Other idea: consider profiling a table lookup
isdigit[nextChar]
instead of if (nextChar > '9' || nextChar < '0')
.â chux
Aug 3 at 14:40
 |Â
show 2 more comments
up vote
0
down vote
Here is an alternative algorithm based on a recommendation by @chux.
We append a null character to the end of the string ('') although any non-digit character will do. Now we don't have to check for array boundaries. All of our while loops become super tight:
/// <summary>A buffer used store a copy of input characters so that we can
/// append a null character avoid array index boundary checks while looping.</summary>
private static readonly char characters = new char[256];
public static bool FastTryParseDouble(string input, out double result)
Unfortunately, C# doesn't let you cheaply append an element to an array. We need to copy the data to a new (larger) array in order to add the 'terminating' character. Even if we keep a single, shared "buffer" for re-use to avoid allocating a new array on each call, the copy operation is prohibitively expensive. As such, the performance ends up worse:
Native parser took 21422 ms.
Custom parser took 8684 ms.
Performance gain was 146.68%
I'm sure this solution would be much more optimal in a language like c++, where the strings already contain a null character at the end. In such a case, there would be no need to copy the character array at all, and the algorithm below could be that much faster than one that has to constantly test array index bounds and test for whether the character is a digit character.
I just discovered that I can gofixed (char* cString = input)
in anunsafe
method to get a null-terminated character array pointer directly from a string. I'm going to try again using that, which will remove the need for any memory copying.
â Alain
Aug 3 at 21:02
add a comment |Â
up vote
0
down vote
Here is an alternative algorithm based on a recommendation by @chux.
We append a null character to the end of the string ('') although any non-digit character will do. Now we don't have to check for array boundaries. All of our while loops become super tight:
/// <summary>A buffer used store a copy of input characters so that we can
/// append a null character avoid array index boundary checks while looping.</summary>
private static readonly char characters = new char[256];
public static bool FastTryParseDouble(string input, out double result)
Unfortunately, C# doesn't let you cheaply append an element to an array. We need to copy the data to a new (larger) array in order to add the 'terminating' character. Even if we keep a single, shared "buffer" for re-use to avoid allocating a new array on each call, the copy operation is prohibitively expensive. As such, the performance ends up worse:
Native parser took 21422 ms.
Custom parser took 8684 ms.
Performance gain was 146.68%
I'm sure this solution would be much more optimal in a language like c++, where the strings already contain a null character at the end. In such a case, there would be no need to copy the character array at all, and the algorithm below could be that much faster than one that has to constantly test array index bounds and test for whether the character is a digit character.
I just discovered that I can gofixed (char* cString = input)
in anunsafe
method to get a null-terminated character array pointer directly from a string. I'm going to try again using that, which will remove the need for any memory copying.
â Alain
Aug 3 at 21:02
add a comment |Â
up vote
0
down vote
up vote
0
down vote
Here is an alternative algorithm based on a recommendation by @chux.
We append a null character to the end of the string ('') although any non-digit character will do. Now we don't have to check for array boundaries. All of our while loops become super tight:
/// <summary>A buffer used store a copy of input characters so that we can
/// append a null character avoid array index boundary checks while looping.</summary>
private static readonly char characters = new char[256];
public static bool FastTryParseDouble(string input, out double result)
Unfortunately, C# doesn't let you cheaply append an element to an array. We need to copy the data to a new (larger) array in order to add the 'terminating' character. Even if we keep a single, shared "buffer" for re-use to avoid allocating a new array on each call, the copy operation is prohibitively expensive. As such, the performance ends up worse:
Native parser took 21422 ms.
Custom parser took 8684 ms.
Performance gain was 146.68%
I'm sure this solution would be much more optimal in a language like c++, where the strings already contain a null character at the end. In such a case, there would be no need to copy the character array at all, and the algorithm below could be that much faster than one that has to constantly test array index bounds and test for whether the character is a digit character.
Here is an alternative algorithm based on a recommendation by @chux.
We append a null character to the end of the string ('') although any non-digit character will do. Now we don't have to check for array boundaries. All of our while loops become super tight:
/// <summary>A buffer used store a copy of input characters so that we can
/// append a null character avoid array index boundary checks while looping.</summary>
private static readonly char characters = new char[256];
public static bool FastTryParseDouble(string input, out double result)
Unfortunately, C# doesn't let you cheaply append an element to an array. We need to copy the data to a new (larger) array in order to add the 'terminating' character. Even if we keep a single, shared "buffer" for re-use to avoid allocating a new array on each call, the copy operation is prohibitively expensive. As such, the performance ends up worse:
Native parser took 21422 ms.
Custom parser took 8684 ms.
Performance gain was 146.68%
I'm sure this solution would be much more optimal in a language like c++, where the strings already contain a null character at the end. In such a case, there would be no need to copy the character array at all, and the algorithm below could be that much faster than one that has to constantly test array index bounds and test for whether the character is a digit character.
answered Aug 3 at 18:25
Alain
2831314
2831314
I just discovered that I can gofixed (char* cString = input)
in anunsafe
method to get a null-terminated character array pointer directly from a string. I'm going to try again using that, which will remove the need for any memory copying.
â Alain
Aug 3 at 21:02
add a comment |Â
I just discovered that I can gofixed (char* cString = input)
in anunsafe
method to get a null-terminated character array pointer directly from a string. I'm going to try again using that, which will remove the need for any memory copying.
â Alain
Aug 3 at 21:02
I just discovered that I can go
fixed (char* cString = input)
in an unsafe
method to get a null-terminated character array pointer directly from a string. I'm going to try again using that, which will remove the need for any memory copying.â Alain
Aug 3 at 21:02
I just discovered that I can go
fixed (char* cString = input)
in an unsafe
method to get a null-terminated character array pointer directly from a string. I'm going to try again using that, which will remove the need for any memory copying.â Alain
Aug 3 at 21:02
add a comment |Â
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f200627%2fcustom-double-parser-optimized-for-performance%23new-answer', 'question_page');
);
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
You missed 3 cases,
âÂÂ
,-âÂÂ
andNaN
. These should parse out asDouble.PositiveInfinity
,Double.NegativeInfinity
andDouble.NaN
. Actually there is one more, your tests should be able to parse the positive symbol, ie:+1234.5678
â Ron Beyer
Jul 31 at 1:34
1
You should use a
Stopwatch
instead of just two time differences when doing benchmarks.â Ron Beyer
Jul 31 at 1:55
@RonBeyer I just discovered that
Stopwatch
usesDateTime.UTCNow
behind the scenes unless you use it in high resolution mode.if (!Stopwatch.IsHighResolution) return DateTime.UtcNow.Ticks;
Looks like it's only necessary if you're timing things at sub-20ms resolution. We're at seconds resolution here, so it should be okay.â Alain
Jul 31 at 2:47
1
Added comments and test cases to better indicate what formats are and aren't meant to be supported.
â Alain
Jul 31 at 4:22
I found a bug in my 1st while loop that caused an invalid chacater to erronously return a double from all characters up to it. The fix was to use
while(true)
and have a separate check forif (currentIndex >= length)
. It brought performance gain down by almost 80%, so I'm looking for ways to merge the two again while still handling the failure scenario correctly.â Alain
Aug 1 at 0:08