Custom double parser optimized for performance

The name of the pictureThe name of the pictureThe name of the pictureClash Royale CLAN TAG#URR8PPP





.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty margin-bottom:0;







up vote
7
down vote

favorite












I'm trying to beat the native Double.TryParse for performance in parsing large multi-million row (simple) CSV files as much as possible. I do not have to support exponential notation, thousand separators, Inf, -Inf, NaN, or anything exotic. Just millions of "0.##" format doubles.



Here's my best attempt, which is ~350% faster by my tests (64 bit release mode)



My Implementation



This is the setup of the function (mostly for context).



private static readonly char CharNegative = CurrentCulture.NumberFormat.NegativeSign[0];
private static readonly char CharDecimalSeparator =
CurrentCulture.NumberFormat.NumberDecimalSeparator[0];

/// <summary>High performance double parser with rudimentary flexibility.
/// <returns>Returns true only if we can be certain we parsed the string correctly.
/// <remarks>Does not support exponential notation, thousand separators or whitespace.
/// Does not support Infinity, Negative Infinity, NaN, or detect over/underflows.
/// Supports only leading negative signs, no positive signs or trailing signs.</remarks>
public static bool FastTryParseDouble(string input, out double result)

result = 0d;
int length = input.Length;
if (length == 0) return false;
double sign = 1d;
int currentIndex = 0;
char nextChar = input[0];

// Handle a possible negative sign at the beginning of the string.
if (nextChar == CharNegative)

sign = -1d;
++currentIndex;



As you can see, I try to remain culture aware and support negative numbers.
This is the remainder of the method, which I think needs to be optimized for performance:



 unchecked

while (true)

// Return now if we have reached the end of the string
if (currentIndex >= length)

result *= sign;
return true;

nextChar = input[currentIndex++];
// Break if the result wasn't a digit between 0 and 9
if (nextChar < '0'
// The next character should be a decimal character, or else it's invalid.
if (nextChar != CharDecimalSeparator) return false;
double fractionalPart = 0d;
int fractionLengh = length - currentIndex;
while (currentIndex < length)

// Add the fractional part to the result, apply sign, and return
if (fractionLengh < NegPow10.Length)
result = (result + fractionalPart * NegPow10[fractionLengh]) * sign;
else
result = (result + fractionalPart * Math.Pow(10, -fractionLengh)) * sign;

return true;



NegPow10 at the end there is just the following array, which has a quick lookup value to cover the first 20 or so values of 10^-n for quick reference. Anything bigger just falls back to Math.Pow()



/// <summary>A cache of negative powers of 10 for quick magnitude adjustment of parsed
/// decimals up to the maximum number of possible decimal places that might be consumed
/// from a string representation of a double.</summary>
private static readonly double NegPow10 = new double

1d,
0.1,
0.01,
///... you get the idea
0.0000000000000001
;



Test Cases



The following test cases are all passing:



TestSuccess("0", 0d);
TestSuccess("1", 1d);
TestSuccess("-1", -1d);
TestSuccess("123.45", 123.45);
TestSuccess("-123.45", -123.45);
TestSuccess("12345678901234", 12345678901234d);
TestSuccess("-12345678901234", -12345678901234d);
TestSuccess("0.12", 0.12);
TestSuccess("-0.12", -0.12);
TestSuccess("0.00", 0.00);
TestSuccess("-0.00", -0.00);
TestSuccess("1234567890123.01", 1234567890123.01);
TestSuccess("-1234567890123.01", -1234567890123.01);
TestSuccess("123456789000000000000000", 123456789000000000000000d);
TestSuccess("-123456789000000000000000", -123456789000000000000000d);


I also have the unsupported (failure) cases laid out if anyone's interested, but it's basically the limitations mentioned in the remarks above.




Benchmarks



I benchmarked my implementation against native Double.TryParse to guage the performance difference.



I tested parsing an array of 10 million different strings using:



Double.TryParse(value, NumberStyles.Float, cachedCulture, out _)


Note that I cache the culture instance and pass in explicit NumberStyles to get the native method as fast as possible before comparing it to my own. My method was of course running 10 million strings through:



Parsers.FastTryParseDouble(value, out _)


Results




Native Double.TryParse took ~4500 ms.



Custom Parsers.FastTryParseDouble took ~950 ms.



Performance gain was ~370%





Next Steps



See any other ways I can squeeze out more performance?



Any awful flaws that might cause incorrect results to be returned? Note that I'm always happy to return "false" for unsupported cases if that's what's fastest, but I'm not okay to return true and a bad result.







share|improve this question





















  • You missed 3 cases, ∞, -∞ and NaN. These should parse out as Double.PositiveInfinity, Double.NegativeInfinity and Double.NaN. Actually there is one more, your tests should be able to parse the positive symbol, ie: +1234.5678
    – Ron Beyer
    Jul 31 at 1:34







  • 1




    You should use a Stopwatch instead of just two time differences when doing benchmarks.
    – Ron Beyer
    Jul 31 at 1:55










  • @RonBeyer I just discovered that Stopwatch uses DateTime.UTCNow behind the scenes unless you use it in high resolution mode.if (!Stopwatch.IsHighResolution) return DateTime.UtcNow.Ticks; Looks like it's only necessary if you're timing things at sub-20ms resolution. We're at seconds resolution here, so it should be okay.
    – Alain
    Jul 31 at 2:47







  • 1




    Added comments and test cases to better indicate what formats are and aren't meant to be supported.
    – Alain
    Jul 31 at 4:22










  • I found a bug in my 1st while loop that caused an invalid chacater to erronously return a double from all characters up to it. The fix was to use while(true) and have a separate check for if (currentIndex >= length). It brought performance gain down by almost 80%, so I'm looking for ways to merge the two again while still handling the failure scenario correctly.
    – Alain
    Aug 1 at 0:08

















up vote
7
down vote

favorite












I'm trying to beat the native Double.TryParse for performance in parsing large multi-million row (simple) CSV files as much as possible. I do not have to support exponential notation, thousand separators, Inf, -Inf, NaN, or anything exotic. Just millions of "0.##" format doubles.



Here's my best attempt, which is ~350% faster by my tests (64 bit release mode)



My Implementation



This is the setup of the function (mostly for context).



private static readonly char CharNegative = CurrentCulture.NumberFormat.NegativeSign[0];
private static readonly char CharDecimalSeparator =
CurrentCulture.NumberFormat.NumberDecimalSeparator[0];

/// <summary>High performance double parser with rudimentary flexibility.
/// <returns>Returns true only if we can be certain we parsed the string correctly.
/// <remarks>Does not support exponential notation, thousand separators or whitespace.
/// Does not support Infinity, Negative Infinity, NaN, or detect over/underflows.
/// Supports only leading negative signs, no positive signs or trailing signs.</remarks>
public static bool FastTryParseDouble(string input, out double result)

result = 0d;
int length = input.Length;
if (length == 0) return false;
double sign = 1d;
int currentIndex = 0;
char nextChar = input[0];

// Handle a possible negative sign at the beginning of the string.
if (nextChar == CharNegative)

sign = -1d;
++currentIndex;



As you can see, I try to remain culture aware and support negative numbers.
This is the remainder of the method, which I think needs to be optimized for performance:



 unchecked

while (true)

// Return now if we have reached the end of the string
if (currentIndex >= length)

result *= sign;
return true;

nextChar = input[currentIndex++];
// Break if the result wasn't a digit between 0 and 9
if (nextChar < '0'
// The next character should be a decimal character, or else it's invalid.
if (nextChar != CharDecimalSeparator) return false;
double fractionalPart = 0d;
int fractionLengh = length - currentIndex;
while (currentIndex < length)

// Add the fractional part to the result, apply sign, and return
if (fractionLengh < NegPow10.Length)
result = (result + fractionalPart * NegPow10[fractionLengh]) * sign;
else
result = (result + fractionalPart * Math.Pow(10, -fractionLengh)) * sign;

return true;



NegPow10 at the end there is just the following array, which has a quick lookup value to cover the first 20 or so values of 10^-n for quick reference. Anything bigger just falls back to Math.Pow()



/// <summary>A cache of negative powers of 10 for quick magnitude adjustment of parsed
/// decimals up to the maximum number of possible decimal places that might be consumed
/// from a string representation of a double.</summary>
private static readonly double NegPow10 = new double

1d,
0.1,
0.01,
///... you get the idea
0.0000000000000001
;



Test Cases



The following test cases are all passing:



TestSuccess("0", 0d);
TestSuccess("1", 1d);
TestSuccess("-1", -1d);
TestSuccess("123.45", 123.45);
TestSuccess("-123.45", -123.45);
TestSuccess("12345678901234", 12345678901234d);
TestSuccess("-12345678901234", -12345678901234d);
TestSuccess("0.12", 0.12);
TestSuccess("-0.12", -0.12);
TestSuccess("0.00", 0.00);
TestSuccess("-0.00", -0.00);
TestSuccess("1234567890123.01", 1234567890123.01);
TestSuccess("-1234567890123.01", -1234567890123.01);
TestSuccess("123456789000000000000000", 123456789000000000000000d);
TestSuccess("-123456789000000000000000", -123456789000000000000000d);


I also have the unsupported (failure) cases laid out if anyone's interested, but it's basically the limitations mentioned in the remarks above.




Benchmarks



I benchmarked my implementation against native Double.TryParse to guage the performance difference.



I tested parsing an array of 10 million different strings using:



Double.TryParse(value, NumberStyles.Float, cachedCulture, out _)


Note that I cache the culture instance and pass in explicit NumberStyles to get the native method as fast as possible before comparing it to my own. My method was of course running 10 million strings through:



Parsers.FastTryParseDouble(value, out _)


Results




Native Double.TryParse took ~4500 ms.



Custom Parsers.FastTryParseDouble took ~950 ms.



Performance gain was ~370%





Next Steps



See any other ways I can squeeze out more performance?



Any awful flaws that might cause incorrect results to be returned? Note that I'm always happy to return "false" for unsupported cases if that's what's fastest, but I'm not okay to return true and a bad result.







share|improve this question





















  • You missed 3 cases, ∞, -∞ and NaN. These should parse out as Double.PositiveInfinity, Double.NegativeInfinity and Double.NaN. Actually there is one more, your tests should be able to parse the positive symbol, ie: +1234.5678
    – Ron Beyer
    Jul 31 at 1:34







  • 1




    You should use a Stopwatch instead of just two time differences when doing benchmarks.
    – Ron Beyer
    Jul 31 at 1:55










  • @RonBeyer I just discovered that Stopwatch uses DateTime.UTCNow behind the scenes unless you use it in high resolution mode.if (!Stopwatch.IsHighResolution) return DateTime.UtcNow.Ticks; Looks like it's only necessary if you're timing things at sub-20ms resolution. We're at seconds resolution here, so it should be okay.
    – Alain
    Jul 31 at 2:47







  • 1




    Added comments and test cases to better indicate what formats are and aren't meant to be supported.
    – Alain
    Jul 31 at 4:22










  • I found a bug in my 1st while loop that caused an invalid chacater to erronously return a double from all characters up to it. The fix was to use while(true) and have a separate check for if (currentIndex >= length). It brought performance gain down by almost 80%, so I'm looking for ways to merge the two again while still handling the failure scenario correctly.
    – Alain
    Aug 1 at 0:08













up vote
7
down vote

favorite









up vote
7
down vote

favorite











I'm trying to beat the native Double.TryParse for performance in parsing large multi-million row (simple) CSV files as much as possible. I do not have to support exponential notation, thousand separators, Inf, -Inf, NaN, or anything exotic. Just millions of "0.##" format doubles.



Here's my best attempt, which is ~350% faster by my tests (64 bit release mode)



My Implementation



This is the setup of the function (mostly for context).



private static readonly char CharNegative = CurrentCulture.NumberFormat.NegativeSign[0];
private static readonly char CharDecimalSeparator =
CurrentCulture.NumberFormat.NumberDecimalSeparator[0];

/// <summary>High performance double parser with rudimentary flexibility.
/// <returns>Returns true only if we can be certain we parsed the string correctly.
/// <remarks>Does not support exponential notation, thousand separators or whitespace.
/// Does not support Infinity, Negative Infinity, NaN, or detect over/underflows.
/// Supports only leading negative signs, no positive signs or trailing signs.</remarks>
public static bool FastTryParseDouble(string input, out double result)

result = 0d;
int length = input.Length;
if (length == 0) return false;
double sign = 1d;
int currentIndex = 0;
char nextChar = input[0];

// Handle a possible negative sign at the beginning of the string.
if (nextChar == CharNegative)

sign = -1d;
++currentIndex;



As you can see, I try to remain culture aware and support negative numbers.
This is the remainder of the method, which I think needs to be optimized for performance:



 unchecked

while (true)

// Return now if we have reached the end of the string
if (currentIndex >= length)

result *= sign;
return true;

nextChar = input[currentIndex++];
// Break if the result wasn't a digit between 0 and 9
if (nextChar < '0'
// The next character should be a decimal character, or else it's invalid.
if (nextChar != CharDecimalSeparator) return false;
double fractionalPart = 0d;
int fractionLengh = length - currentIndex;
while (currentIndex < length)

// Add the fractional part to the result, apply sign, and return
if (fractionLengh < NegPow10.Length)
result = (result + fractionalPart * NegPow10[fractionLengh]) * sign;
else
result = (result + fractionalPart * Math.Pow(10, -fractionLengh)) * sign;

return true;



NegPow10 at the end there is just the following array, which has a quick lookup value to cover the first 20 or so values of 10^-n for quick reference. Anything bigger just falls back to Math.Pow()



/// <summary>A cache of negative powers of 10 for quick magnitude adjustment of parsed
/// decimals up to the maximum number of possible decimal places that might be consumed
/// from a string representation of a double.</summary>
private static readonly double NegPow10 = new double

1d,
0.1,
0.01,
///... you get the idea
0.0000000000000001
;



Test Cases



The following test cases are all passing:



TestSuccess("0", 0d);
TestSuccess("1", 1d);
TestSuccess("-1", -1d);
TestSuccess("123.45", 123.45);
TestSuccess("-123.45", -123.45);
TestSuccess("12345678901234", 12345678901234d);
TestSuccess("-12345678901234", -12345678901234d);
TestSuccess("0.12", 0.12);
TestSuccess("-0.12", -0.12);
TestSuccess("0.00", 0.00);
TestSuccess("-0.00", -0.00);
TestSuccess("1234567890123.01", 1234567890123.01);
TestSuccess("-1234567890123.01", -1234567890123.01);
TestSuccess("123456789000000000000000", 123456789000000000000000d);
TestSuccess("-123456789000000000000000", -123456789000000000000000d);


I also have the unsupported (failure) cases laid out if anyone's interested, but it's basically the limitations mentioned in the remarks above.




Benchmarks



I benchmarked my implementation against native Double.TryParse to guage the performance difference.



I tested parsing an array of 10 million different strings using:



Double.TryParse(value, NumberStyles.Float, cachedCulture, out _)


Note that I cache the culture instance and pass in explicit NumberStyles to get the native method as fast as possible before comparing it to my own. My method was of course running 10 million strings through:



Parsers.FastTryParseDouble(value, out _)


Results




Native Double.TryParse took ~4500 ms.



Custom Parsers.FastTryParseDouble took ~950 ms.



Performance gain was ~370%





Next Steps



See any other ways I can squeeze out more performance?



Any awful flaws that might cause incorrect results to be returned? Note that I'm always happy to return "false" for unsupported cases if that's what's fastest, but I'm not okay to return true and a bad result.







share|improve this question













I'm trying to beat the native Double.TryParse for performance in parsing large multi-million row (simple) CSV files as much as possible. I do not have to support exponential notation, thousand separators, Inf, -Inf, NaN, or anything exotic. Just millions of "0.##" format doubles.



Here's my best attempt, which is ~350% faster by my tests (64 bit release mode)



My Implementation



This is the setup of the function (mostly for context).



private static readonly char CharNegative = CurrentCulture.NumberFormat.NegativeSign[0];
private static readonly char CharDecimalSeparator =
CurrentCulture.NumberFormat.NumberDecimalSeparator[0];

/// <summary>High performance double parser with rudimentary flexibility.
/// <returns>Returns true only if we can be certain we parsed the string correctly.
/// <remarks>Does not support exponential notation, thousand separators or whitespace.
/// Does not support Infinity, Negative Infinity, NaN, or detect over/underflows.
/// Supports only leading negative signs, no positive signs or trailing signs.</remarks>
public static bool FastTryParseDouble(string input, out double result)

result = 0d;
int length = input.Length;
if (length == 0) return false;
double sign = 1d;
int currentIndex = 0;
char nextChar = input[0];

// Handle a possible negative sign at the beginning of the string.
if (nextChar == CharNegative)

sign = -1d;
++currentIndex;



As you can see, I try to remain culture aware and support negative numbers.
This is the remainder of the method, which I think needs to be optimized for performance:



 unchecked

while (true)

// Return now if we have reached the end of the string
if (currentIndex >= length)

result *= sign;
return true;

nextChar = input[currentIndex++];
// Break if the result wasn't a digit between 0 and 9
if (nextChar < '0'
// The next character should be a decimal character, or else it's invalid.
if (nextChar != CharDecimalSeparator) return false;
double fractionalPart = 0d;
int fractionLengh = length - currentIndex;
while (currentIndex < length)

// Add the fractional part to the result, apply sign, and return
if (fractionLengh < NegPow10.Length)
result = (result + fractionalPart * NegPow10[fractionLengh]) * sign;
else
result = (result + fractionalPart * Math.Pow(10, -fractionLengh)) * sign;

return true;



NegPow10 at the end there is just the following array, which has a quick lookup value to cover the first 20 or so values of 10^-n for quick reference. Anything bigger just falls back to Math.Pow()



/// <summary>A cache of negative powers of 10 for quick magnitude adjustment of parsed
/// decimals up to the maximum number of possible decimal places that might be consumed
/// from a string representation of a double.</summary>
private static readonly double NegPow10 = new double

1d,
0.1,
0.01,
///... you get the idea
0.0000000000000001
;



Test Cases



The following test cases are all passing:



TestSuccess("0", 0d);
TestSuccess("1", 1d);
TestSuccess("-1", -1d);
TestSuccess("123.45", 123.45);
TestSuccess("-123.45", -123.45);
TestSuccess("12345678901234", 12345678901234d);
TestSuccess("-12345678901234", -12345678901234d);
TestSuccess("0.12", 0.12);
TestSuccess("-0.12", -0.12);
TestSuccess("0.00", 0.00);
TestSuccess("-0.00", -0.00);
TestSuccess("1234567890123.01", 1234567890123.01);
TestSuccess("-1234567890123.01", -1234567890123.01);
TestSuccess("123456789000000000000000", 123456789000000000000000d);
TestSuccess("-123456789000000000000000", -123456789000000000000000d);


I also have the unsupported (failure) cases laid out if anyone's interested, but it's basically the limitations mentioned in the remarks above.




Benchmarks



I benchmarked my implementation against native Double.TryParse to guage the performance difference.



I tested parsing an array of 10 million different strings using:



Double.TryParse(value, NumberStyles.Float, cachedCulture, out _)


Note that I cache the culture instance and pass in explicit NumberStyles to get the native method as fast as possible before comparing it to my own. My method was of course running 10 million strings through:



Parsers.FastTryParseDouble(value, out _)


Results




Native Double.TryParse took ~4500 ms.



Custom Parsers.FastTryParseDouble took ~950 ms.



Performance gain was ~370%





Next Steps



See any other ways I can squeeze out more performance?



Any awful flaws that might cause incorrect results to be returned? Note that I'm always happy to return "false" for unsupported cases if that's what's fastest, but I'm not okay to return true and a bad result.









share|improve this question












share|improve this question




share|improve this question








edited Aug 2 at 15:44
























asked Jul 31 at 0:25









Alain

2831314




2831314











  • You missed 3 cases, ∞, -∞ and NaN. These should parse out as Double.PositiveInfinity, Double.NegativeInfinity and Double.NaN. Actually there is one more, your tests should be able to parse the positive symbol, ie: +1234.5678
    – Ron Beyer
    Jul 31 at 1:34







  • 1




    You should use a Stopwatch instead of just two time differences when doing benchmarks.
    – Ron Beyer
    Jul 31 at 1:55










  • @RonBeyer I just discovered that Stopwatch uses DateTime.UTCNow behind the scenes unless you use it in high resolution mode.if (!Stopwatch.IsHighResolution) return DateTime.UtcNow.Ticks; Looks like it's only necessary if you're timing things at sub-20ms resolution. We're at seconds resolution here, so it should be okay.
    – Alain
    Jul 31 at 2:47







  • 1




    Added comments and test cases to better indicate what formats are and aren't meant to be supported.
    – Alain
    Jul 31 at 4:22










  • I found a bug in my 1st while loop that caused an invalid chacater to erronously return a double from all characters up to it. The fix was to use while(true) and have a separate check for if (currentIndex >= length). It brought performance gain down by almost 80%, so I'm looking for ways to merge the two again while still handling the failure scenario correctly.
    – Alain
    Aug 1 at 0:08

















  • You missed 3 cases, ∞, -∞ and NaN. These should parse out as Double.PositiveInfinity, Double.NegativeInfinity and Double.NaN. Actually there is one more, your tests should be able to parse the positive symbol, ie: +1234.5678
    – Ron Beyer
    Jul 31 at 1:34







  • 1




    You should use a Stopwatch instead of just two time differences when doing benchmarks.
    – Ron Beyer
    Jul 31 at 1:55










  • @RonBeyer I just discovered that Stopwatch uses DateTime.UTCNow behind the scenes unless you use it in high resolution mode.if (!Stopwatch.IsHighResolution) return DateTime.UtcNow.Ticks; Looks like it's only necessary if you're timing things at sub-20ms resolution. We're at seconds resolution here, so it should be okay.
    – Alain
    Jul 31 at 2:47







  • 1




    Added comments and test cases to better indicate what formats are and aren't meant to be supported.
    – Alain
    Jul 31 at 4:22










  • I found a bug in my 1st while loop that caused an invalid chacater to erronously return a double from all characters up to it. The fix was to use while(true) and have a separate check for if (currentIndex >= length). It brought performance gain down by almost 80%, so I'm looking for ways to merge the two again while still handling the failure scenario correctly.
    – Alain
    Aug 1 at 0:08
















You missed 3 cases, ∞, -∞ and NaN. These should parse out as Double.PositiveInfinity, Double.NegativeInfinity and Double.NaN. Actually there is one more, your tests should be able to parse the positive symbol, ie: +1234.5678
– Ron Beyer
Jul 31 at 1:34





You missed 3 cases, ∞, -∞ and NaN. These should parse out as Double.PositiveInfinity, Double.NegativeInfinity and Double.NaN. Actually there is one more, your tests should be able to parse the positive symbol, ie: +1234.5678
– Ron Beyer
Jul 31 at 1:34





1




1




You should use a Stopwatch instead of just two time differences when doing benchmarks.
– Ron Beyer
Jul 31 at 1:55




You should use a Stopwatch instead of just two time differences when doing benchmarks.
– Ron Beyer
Jul 31 at 1:55












@RonBeyer I just discovered that Stopwatch uses DateTime.UTCNow behind the scenes unless you use it in high resolution mode.if (!Stopwatch.IsHighResolution) return DateTime.UtcNow.Ticks; Looks like it's only necessary if you're timing things at sub-20ms resolution. We're at seconds resolution here, so it should be okay.
– Alain
Jul 31 at 2:47





@RonBeyer I just discovered that Stopwatch uses DateTime.UTCNow behind the scenes unless you use it in high resolution mode.if (!Stopwatch.IsHighResolution) return DateTime.UtcNow.Ticks; Looks like it's only necessary if you're timing things at sub-20ms resolution. We're at seconds resolution here, so it should be okay.
– Alain
Jul 31 at 2:47





1




1




Added comments and test cases to better indicate what formats are and aren't meant to be supported.
– Alain
Jul 31 at 4:22




Added comments and test cases to better indicate what formats are and aren't meant to be supported.
– Alain
Jul 31 at 4:22












I found a bug in my 1st while loop that caused an invalid chacater to erronously return a double from all characters up to it. The fix was to use while(true) and have a separate check for if (currentIndex >= length). It brought performance gain down by almost 80%, so I'm looking for ways to merge the two again while still handling the failure scenario correctly.
– Alain
Aug 1 at 0:08





I found a bug in my 1st while loop that caused an invalid chacater to erronously return a double from all characters up to it. The fix was to use while(true) and have a separate check for if (currentIndex >= length). It brought performance gain down by almost 80%, so I'm looking for ways to merge the two again while still handling the failure scenario correctly.
– Alain
Aug 1 at 0:08











4 Answers
4






active

oldest

votes

















up vote
2
down vote













Alternatives for OP to try



As OP is looking for performance improvements, consider only 1 loop for both whole and fractional part calculation. Simply iterate through all the digits in one loop and note if and where the decimal point occurred.



// Pseudo code
DP = '.'
significant = 0.0
fractionLengh = 0
for (i=0; i < input.len; i++)
ch = input[i]
if (some_isdigit_test(ch))
significant = significant * 10 + ch - '0'
else if (ch == DP)
DP = '0' // Never match again
fractionLengh = input.len - i - 1
else
return fail;

}

// continue as before
if (fractionLengh < NegPow10.Length) ....


Perhaps integers?



Instead of accumulating result as some floating point type, accumulate the digits as a 64-bit integer. This, depending on platform, is often significantly faster than double.



Code could simply count leading zeros (important if there is a '.' there) and then loop onto the minimum of of the remaining text length or 18 (number of 999... digts in a 64-bit integer) and then do a final integer to double for subsequent calculation.




Not alwasy the best



There are concerns with OP's code about generating the best answer.



Challenging (lengthly) text input eventual causes significant * 10 to round its answer and even perhaps overflow (even with an in range possible result).



With OP's fractionalPart being rounded and NegPow10[fractionLengh]) also rounded, the product and than addition to result may be off by 1 or 2 ULP.



To get the best result, additional (slower) code is needed.



-0.0



It appears OP's code will generate the correct result. I suspect unposted test code is insufficient to fully test this case. Perhaps OP is not concerned about this case as "anything exotic".



Range



I'd expect test cases should include the maximum +/-Double as text and the minimum non-zero value +/-0.01



Positive numbers?



Code test for a leading '-'. How about a leading '+'? Research CurrentCulture.NumberFormat.PositiveSign.






share|improve this answer























  • On Integers: I actually have a second (much simpler) FastTryParseInt - it's ~2x slower (on my 64 bit machine) than than an identical routine using doubles. The answer I found on the webs was that CPUs are optimized to do integer addition quickly, but integer multiplication is slow. Doubles are the opposite - addition takes more ticks than multiplication. The multiplication by 10 in the inner loop appears to really slow down integers. Haven't tested with longs, which should in theory be the same speed on a machine with 64 bit registers, but less risk of overflowing for large doubles.
    – Alain
    Aug 2 at 17:06











  • I like the idea of not tracking a separate "fraction part" double, and just continuing to add to result. I'm sure I can work with that. I still think it might be required to maintain two while loops (even though the code is not as nice) so that the branch predictor doesn't have to re-acclimate after the separator is encountered.
    – Alain
    Aug 2 at 17:14











  • I applied your pseudo code to my new version below and it did improve performance, thanks! I think I'll make a separate version without support for exponential notation again and see if the single loop approach makes it any faster. I also changed the in-progress result and constants to long (until the final result) to see how that went, but it did cut performance almost by double. Quite strange.
    – Alain
    Aug 3 at 13:52

















up vote
0
down vote













Here's a slightly improved implementation that saves a few instructions by combining some redundant checks and assignments.



A little more work when checking for negatives:



// Handle a possible negative sign at the beginning of the string.
// We will apply the sign before returning the final result.
if (nextChar == CharNegative)

if (length <= 1) return false;
sign = -1d;
++currentIndex;
nextChar = input[1];



And the first main loop has its while / break conditions altered to remove the need for redundant checks after breaking.



unchecked

// Break on the first non-numeric character
while (nextChar >= '0' && nextChar <= '9')

// Multiply by 10 and add the next digit.
result = result * 10 + (nextChar - '0');
// Return if we have reached the end of the string
if (++currentIndex >= length)

result *= sign;
return true;

nextChar = input[currentIndex];

// The next character should be the decimal separator, or else it's invalid.
if (nextChar != CharDecimalSeparator) return false;
// Perform a similar loop to construct the fractional part of the decimal
double fractionalPart = 0d;
int fractionLengh = length - ++currentIndex;
while (currentIndex < length)

nextChar = input[currentIndex++];
// If we encounter a non-numeric character now, it's an error
if (nextChar < '0'
// Adjust the magnitude and add the fractional part to the result
// Use our power of negative powers of 10 if possible.
if (fractionLengh < NegPow10.Length)
result = (result + fractionalPart * NegPow10[fractionLengh]) * sign;
else
result = (result + fractionalPart * Math.Pow(10, -fractionLengh)) * sign;

return true;


The difference in performance is imperceptible during release mode. I'm guessing comile optimizations and/or instruction pipelining made the updates irrelevant.



I toyed with different ways of doing the final computation (wrapping the power multiplication in a ternary operator, splitting them out into multiple lines, etc) but this one version seems to compile down the best.






share|improve this answer




























    up vote
    0
    down vote













    Here is a much longer implementation that handles scientific notation, NaN, Infinity, Negative Infinity, and leading positive signs. I also added a lot of comments to help visually break it into chunks.



    It manages to be almost as fast as the previous method - most of the logic takes place in the body of previous if checks that simply returned false before.



    I found a few places where I could avoid repeated checks for non-digit characters, and use the first digit character to initialize the result directly to avoid unnecessary additions/multiplcations with zero on the first iteration of the loop.



    /// <summary>High performance double parser with rudimentary flexibility.</summary>
    /// <returns>Returns true only if we can be certain we parsed the string correctly.</returns>
    /// <remarks>Does not support thousand separators or whitespace.</remarks>
    /// <remarks>Supports all culture-specific symbols specified by the NumberFormatInfo of the
    /// <see cref="CultureInfo.CurrentCulture"/> at the time this static class is instantiated.
    /// So long as all culture symbols are a single character in length.
    /// TODO: In theory, this class could be made instantiable, take the culture as an argument,
    /// and support changing the culture at runtime in case the file the user is uploading
    /// was generated on a machine with different culture settings.</remarks>
    /// <remarks>Supports leading negative signs and positive signs, scientific notation,
    /// as well as Infinity, Negative Infinity, and NaN, string representations.</remarks>
    /// <remarks>A string containing only a negative sign (usually "-") intentionally returns a
    /// value of zero. This is because it's a common representation of 0 in accounting.</remarks>
    public static bool FastTryParseDouble(string input, out double result)


    /// <summary>Checks if the string matches one of a few supported special case
    /// double strings. If so, assigns the result and returns true.</summary>
    public static bool CheckForSpecialCaseDoubleStrings(string input, out double result)

    if (input == NumberFormat.PositiveInfinitySymbol)
    result = Double.PositiveInfinity;
    else if (input == NumberFormat.NegativeInfinitySymbol)
    result = Double.NegativeInfinity;
    else if (input == NumberFormat.NaNSymbol)
    result = Double.NaN;
    // Special Case: Excel has been known to format zero as "-".
    // We intentionally support it by returning zero now (most parsers would not)
    else if (input == NumberFormat.NegativeSign)
    result = 0d;
    // Special Case: Our organization treats the term "Unlimited" as referring
    // to Double.MaxValue (most parsers would not)
    else if (input.Equals("unlimited", StringComparison.OrdinalIgnoreCase))
    result = Double.MaxValue;
    // Anything else is not a valid input
    else

    result = Double.NaN;
    return false;

    return true;


    /// <summary>The largest exponent (or smallest when negative) that can be given to a Double.</summary>
    private const int MaxDoubleExponent = 308;

    /// <summary>The number of elements that will be generated in the Pow10 array.</summary>
    private const int Pow10Length = MaxDoubleExponent * 2 + 1;

    /// <summary>A cache of all possible positive powers of 10 that might be required to
    /// apply an exponent to a double (Indices 0-308), as well as the first 308 negative
    /// exponents. (Indices 309-616)</summary>
    private static readonly double Pow10 =
    Enumerable.Range(0, MaxDoubleExponent + 1).Select(i => Math.Pow(10, i))
    .Concat(Enumerable.Range(1, MaxDoubleExponent).Select(i => Math.Pow(10, -i)))
    .ToArray();

    /// <summary>The number of negative powers to pre-compute and store in a small array.</summary>
    private const int NegPow10Length = 16;

    /// <summary>A cache of the first 15 negative powers of 10 for quick
    /// magnitude adjustment of common parsed fractional parts of doubles.</summary>
    /// <remarks>Even though this overlaps with the Pow10 array, it is kept separate so that
    /// users that don't use scientific notation or extremely long fractional parts
    /// might get a speedup by being able to reference the smaller array, which has a better
    /// chance of being served out of L1/L2 cache.</remarks>
    private static readonly double NegPow10 =
    Enumerable.Range(0, NegPow10Length).Select(i => Math.Pow(10, -i)).ToArray();


    This new method matches all of the following test cases:



    // Numbers without a fractional part
    TestSuccess("0", 0d);
    TestSuccess("1", 1d);
    TestSuccess("-1", -1d);
    TestSuccess("12345678901234", 12345678901234d);
    TestSuccess("-12345678901234", -12345678901234d);
    // Numbers with a fractional part
    TestSuccess("123.45678", 123.45678);
    TestSuccess("-123.45678", -123.45678);
    // Numbers without an integer part
    TestSuccess(".12345678901234", 0.12345678901234);
    TestSuccess("-.12345678901234", -0.12345678901234);
    // Various high-precision numbers
    TestSuccess("0.12345678901234", 0.12345678901234);
    TestSuccess("-0.12345678901234", -0.12345678901234);
    TestSuccess("0.00000987654321", 0.00000987654321);
    TestSuccess("-0.00000987654321", -0.00000987654321);
    TestSuccess("1234567890123.0123456789", 1234567890123.0123456789);
    TestSuccess("-1234567890123.0123456789", -1234567890123.0123456789);
    // Numbers with very long fractional parts (more than 16 characters)
    TestSuccess("0.00826499999979784", 0.00826499999979784);
    TestSuccess("-0.00826499999979784", -0.00826499999979784);
    TestSuccess("1.0123456789012345678901234567890", 1.0123456789012345678901234567890);
    TestSuccess("-1.0123456789012345678901234567890", -1.0123456789012345678901234567890);
    // Numbers with a leading positive sign
    TestSuccess("+1", 1d);
    TestSuccess("+12345678901234", 12345678901234d);
    TestSuccess("+.12345678901234", 0.12345678901234);
    TestSuccess("+0.00826499999979784", 0.00826499999979784);
    // Very large numbers without scientific notation
    TestSuccess("123456789000000000000000", 123456789000000000000000d);
    TestSuccess("-123456789000000000000000", -123456789000000000000000d);
    // Very small numbers without scientific notation
    TestSuccess("0.00000000000000000123456789", 0.00000000000000000123456789);
    TestSuccess("-0.00000000000000000123456789", -0.00000000000000000123456789);
    // Scientific notation without a sign
    TestSuccess("1.2345678e5", 1.2345678e5);
    TestSuccess("1.2345678e5", 1.2345678e5);
    TestSuccess("-1.2345678e5", -1.2345678e5);
    // Scientific notation with a sign
    TestSuccess("1.2345678e+25", 1.2345678e25);
    TestSuccess("-1.2345678e+25", -1.2345678e25);
    TestSuccess("1.2345678e-255", 1.2345678e-255);
    TestSuccess("-1.2345678e-255", -1.2345678e-255);
    // Epsilon, and other tiny doubles
    // TODO: Known "failure" scenarios. Our parsing logic results in a return value of 0
    // for these, but the native parser returns Double.Epsilon (smallest number greater
    // than zero). I think we can live with this shortcoming.
    //TestSuccess("4.94065645841247e-324", 4.94065645841247e-324);
    //TestSuccess("-4.94065645841247e-324", -4.94065645841247e-324);
    TestSuccess("3.33E-333", 3.33E-333);
    TestSuccess("-3.33E-333", -3.33E-333);
    TestSuccess("1E-1022", 1E-1022);
    TestSuccess("-1E-1022", -1E-1022);
    // Boundary cases
    TestSuccess("1e0", 1);
    TestSuccess("1e1", 10);
    TestSuccess("1e-1", 0.1);
    TestSuccess("1e-308", 1e-308);
    TestSuccess("1e308", 1e308);
    // Min and Max Double
    TestSuccess("1.7976931348623157E+308", 1.7976931348623157E+308);
    TestSuccess("-1.7976931348623157E+308", -1.7976931348623157E+308);
    // Large Negative Exponents (Near-epsilon) doubles.
    TestSuccess("1.23E-999", 1.23E-999);
    TestSuccess("-1.23E-999", -1.23E-999);
    // Special keywords
    TestSuccess("∞", Double.PositiveInfinity);
    TestSuccess("-∞", Double.NegativeInfinity);
    TestSuccess("NaN", Double.NaN);
    // Special case: "Unlimited" is used in our organization to refer to Double.MaxValue
    TestSuccess("Unlimited", Double.MaxValue);
    // Special case: "-" character only means zero in accounting formats.
    TestSuccess("-", 0d);


    Benchmark Results



    Using a Stopwatch this time, and ran with 1,000,000,000 (a billion) strings just to quell any debate about timing sensitivity:




    Native parser took 26220 ms.



    Custom parser took 6471 ms.



    Performance gain was 305.19%







    share|improve this answer























    • @chux I applied your suggestion in this updated version and it took the performance up another ~15% - while still supporting all valid double cases. The results still differ from Native Double.Parse by 1 or 2 ULPs for certain very large or very small numbers, but I don't consider that a meaningful drawback. It just means results aren't always being parsed to precisely the closest possible double representation of the string processed.
      – Alain
      Aug 3 at 12:36










    • Code like do nextChar = input[currentIndex]; if (nextChar > '9' while (++currentIndex < length); is doing 2 checks (is digit, is at end) per loop to see if it should stop. If able, append a non-digit to index and then code only need to to check for digit. After to loop code can test if loop quit due to end of string or not. `
      – chux
      Aug 3 at 14:20











    • That's a good idea. If only strings in c# returned the null character as the last element in their character array like c++. Can you think of any way without building a new array?
      – Alain
      Aug 3 at 14:30










    • "think of any way without building a new array?" --> No, yet appending to the existing array may not cost much.
      – chux
      Aug 3 at 14:39










    • Other idea: consider profiling a table lookup isdigit[nextChar] instead of if (nextChar > '9' || nextChar < '0').
      – chux
      Aug 3 at 14:40

















    up vote
    0
    down vote













    Here is an alternative algorithm based on a recommendation by @chux.



    We append a null character to the end of the string ('') although any non-digit character will do. Now we don't have to check for array boundaries. All of our while loops become super tight:



    /// <summary>A buffer used store a copy of input characters so that we can
    /// append a null character avoid array index boundary checks while looping.</summary>
    private static readonly char characters = new char[256];

    public static bool FastTryParseDouble(string input, out double result)



    Unfortunately, C# doesn't let you cheaply append an element to an array. We need to copy the data to a new (larger) array in order to add the 'terminating' character. Even if we keep a single, shared "buffer" for re-use to avoid allocating a new array on each call, the copy operation is prohibitively expensive. As such, the performance ends up worse:




    Native parser took 21422 ms.



    Custom parser took 8684 ms.



    Performance gain was 146.68%




    I'm sure this solution would be much more optimal in a language like c++, where the strings already contain a null character at the end. In such a case, there would be no need to copy the character array at all, and the algorithm below could be that much faster than one that has to constantly test array index bounds and test for whether the character is a digit character.






    share|improve this answer





















    • I just discovered that I can go fixed (char* cString = input) in an unsafe method to get a null-terminated character array pointer directly from a string. I'm going to try again using that, which will remove the need for any memory copying.
      – Alain
      Aug 3 at 21:02










    Your Answer




    StackExchange.ifUsing("editor", function ()
    return StackExchange.using("mathjaxEditing", function ()
    StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
    StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
    );
    );
    , "mathjax-editing");

    StackExchange.ifUsing("editor", function ()
    StackExchange.using("externalEditor", function ()
    StackExchange.using("snippets", function ()
    StackExchange.snippets.init();
    );
    );
    , "code-snippets");

    StackExchange.ready(function()
    var channelOptions =
    tags: "".split(" "),
    id: "196"
    ;
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function()
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled)
    StackExchange.using("snippets", function()
    createEditor();
    );

    else
    createEditor();

    );

    function createEditor()
    StackExchange.prepareEditor(
    heartbeatType: 'answer',
    convertImagesToLinks: false,
    noModals: false,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: null,
    bindNavPrevention: true,
    postfix: "",
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    );



    );








     

    draft saved


    draft discarded


















    StackExchange.ready(
    function ()
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f200627%2fcustom-double-parser-optimized-for-performance%23new-answer', 'question_page');

    );

    Post as a guest






























    4 Answers
    4






    active

    oldest

    votes








    4 Answers
    4






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes








    up vote
    2
    down vote













    Alternatives for OP to try



    As OP is looking for performance improvements, consider only 1 loop for both whole and fractional part calculation. Simply iterate through all the digits in one loop and note if and where the decimal point occurred.



    // Pseudo code
    DP = '.'
    significant = 0.0
    fractionLengh = 0
    for (i=0; i < input.len; i++)
    ch = input[i]
    if (some_isdigit_test(ch))
    significant = significant * 10 + ch - '0'
    else if (ch == DP)
    DP = '0' // Never match again
    fractionLengh = input.len - i - 1
    else
    return fail;

    }

    // continue as before
    if (fractionLengh < NegPow10.Length) ....


    Perhaps integers?



    Instead of accumulating result as some floating point type, accumulate the digits as a 64-bit integer. This, depending on platform, is often significantly faster than double.



    Code could simply count leading zeros (important if there is a '.' there) and then loop onto the minimum of of the remaining text length or 18 (number of 999... digts in a 64-bit integer) and then do a final integer to double for subsequent calculation.




    Not alwasy the best



    There are concerns with OP's code about generating the best answer.



    Challenging (lengthly) text input eventual causes significant * 10 to round its answer and even perhaps overflow (even with an in range possible result).



    With OP's fractionalPart being rounded and NegPow10[fractionLengh]) also rounded, the product and than addition to result may be off by 1 or 2 ULP.



    To get the best result, additional (slower) code is needed.



    -0.0



    It appears OP's code will generate the correct result. I suspect unposted test code is insufficient to fully test this case. Perhaps OP is not concerned about this case as "anything exotic".



    Range



    I'd expect test cases should include the maximum +/-Double as text and the minimum non-zero value +/-0.01



    Positive numbers?



    Code test for a leading '-'. How about a leading '+'? Research CurrentCulture.NumberFormat.PositiveSign.






    share|improve this answer























    • On Integers: I actually have a second (much simpler) FastTryParseInt - it's ~2x slower (on my 64 bit machine) than than an identical routine using doubles. The answer I found on the webs was that CPUs are optimized to do integer addition quickly, but integer multiplication is slow. Doubles are the opposite - addition takes more ticks than multiplication. The multiplication by 10 in the inner loop appears to really slow down integers. Haven't tested with longs, which should in theory be the same speed on a machine with 64 bit registers, but less risk of overflowing for large doubles.
      – Alain
      Aug 2 at 17:06











    • I like the idea of not tracking a separate "fraction part" double, and just continuing to add to result. I'm sure I can work with that. I still think it might be required to maintain two while loops (even though the code is not as nice) so that the branch predictor doesn't have to re-acclimate after the separator is encountered.
      – Alain
      Aug 2 at 17:14











    • I applied your pseudo code to my new version below and it did improve performance, thanks! I think I'll make a separate version without support for exponential notation again and see if the single loop approach makes it any faster. I also changed the in-progress result and constants to long (until the final result) to see how that went, but it did cut performance almost by double. Quite strange.
      – Alain
      Aug 3 at 13:52














    up vote
    2
    down vote













    Alternatives for OP to try



    As OP is looking for performance improvements, consider only 1 loop for both whole and fractional part calculation. Simply iterate through all the digits in one loop and note if and where the decimal point occurred.



    // Pseudo code
    DP = '.'
    significant = 0.0
    fractionLengh = 0
    for (i=0; i < input.len; i++)
    ch = input[i]
    if (some_isdigit_test(ch))
    significant = significant * 10 + ch - '0'
    else if (ch == DP)
    DP = '0' // Never match again
    fractionLengh = input.len - i - 1
    else
    return fail;

    }

    // continue as before
    if (fractionLengh < NegPow10.Length) ....


    Perhaps integers?



    Instead of accumulating result as some floating point type, accumulate the digits as a 64-bit integer. This, depending on platform, is often significantly faster than double.



    Code could simply count leading zeros (important if there is a '.' there) and then loop onto the minimum of of the remaining text length or 18 (number of 999... digts in a 64-bit integer) and then do a final integer to double for subsequent calculation.




    Not alwasy the best



    There are concerns with OP's code about generating the best answer.



    Challenging (lengthly) text input eventual causes significant * 10 to round its answer and even perhaps overflow (even with an in range possible result).



    With OP's fractionalPart being rounded and NegPow10[fractionLengh]) also rounded, the product and than addition to result may be off by 1 or 2 ULP.



    To get the best result, additional (slower) code is needed.



    -0.0



    It appears OP's code will generate the correct result. I suspect unposted test code is insufficient to fully test this case. Perhaps OP is not concerned about this case as "anything exotic".



    Range



    I'd expect test cases should include the maximum +/-Double as text and the minimum non-zero value +/-0.01



    Positive numbers?



    Code test for a leading '-'. How about a leading '+'? Research CurrentCulture.NumberFormat.PositiveSign.






    share|improve this answer























    • On Integers: I actually have a second (much simpler) FastTryParseInt - it's ~2x slower (on my 64 bit machine) than than an identical routine using doubles. The answer I found on the webs was that CPUs are optimized to do integer addition quickly, but integer multiplication is slow. Doubles are the opposite - addition takes more ticks than multiplication. The multiplication by 10 in the inner loop appears to really slow down integers. Haven't tested with longs, which should in theory be the same speed on a machine with 64 bit registers, but less risk of overflowing for large doubles.
      – Alain
      Aug 2 at 17:06











    • I like the idea of not tracking a separate "fraction part" double, and just continuing to add to result. I'm sure I can work with that. I still think it might be required to maintain two while loops (even though the code is not as nice) so that the branch predictor doesn't have to re-acclimate after the separator is encountered.
      – Alain
      Aug 2 at 17:14











    • I applied your pseudo code to my new version below and it did improve performance, thanks! I think I'll make a separate version without support for exponential notation again and see if the single loop approach makes it any faster. I also changed the in-progress result and constants to long (until the final result) to see how that went, but it did cut performance almost by double. Quite strange.
      – Alain
      Aug 3 at 13:52












    up vote
    2
    down vote










    up vote
    2
    down vote









    Alternatives for OP to try



    As OP is looking for performance improvements, consider only 1 loop for both whole and fractional part calculation. Simply iterate through all the digits in one loop and note if and where the decimal point occurred.



    // Pseudo code
    DP = '.'
    significant = 0.0
    fractionLengh = 0
    for (i=0; i < input.len; i++)
    ch = input[i]
    if (some_isdigit_test(ch))
    significant = significant * 10 + ch - '0'
    else if (ch == DP)
    DP = '0' // Never match again
    fractionLengh = input.len - i - 1
    else
    return fail;

    }

    // continue as before
    if (fractionLengh < NegPow10.Length) ....


    Perhaps integers?



    Instead of accumulating result as some floating point type, accumulate the digits as a 64-bit integer. This, depending on platform, is often significantly faster than double.



    Code could simply count leading zeros (important if there is a '.' there) and then loop onto the minimum of of the remaining text length or 18 (number of 999... digts in a 64-bit integer) and then do a final integer to double for subsequent calculation.




    Not alwasy the best



    There are concerns with OP's code about generating the best answer.



    Challenging (lengthly) text input eventual causes significant * 10 to round its answer and even perhaps overflow (even with an in range possible result).



    With OP's fractionalPart being rounded and NegPow10[fractionLengh]) also rounded, the product and than addition to result may be off by 1 or 2 ULP.



    To get the best result, additional (slower) code is needed.



    -0.0



    It appears OP's code will generate the correct result. I suspect unposted test code is insufficient to fully test this case. Perhaps OP is not concerned about this case as "anything exotic".



    Range



    I'd expect test cases should include the maximum +/-Double as text and the minimum non-zero value +/-0.01



    Positive numbers?



    Code test for a leading '-'. How about a leading '+'? Research CurrentCulture.NumberFormat.PositiveSign.






    share|improve this answer















    Alternatives for OP to try



    As OP is looking for performance improvements, consider only 1 loop for both whole and fractional part calculation. Simply iterate through all the digits in one loop and note if and where the decimal point occurred.



    // Pseudo code
    DP = '.'
    significant = 0.0
    fractionLengh = 0
    for (i=0; i < input.len; i++)
    ch = input[i]
    if (some_isdigit_test(ch))
    significant = significant * 10 + ch - '0'
    else if (ch == DP)
    DP = '0' // Never match again
    fractionLengh = input.len - i - 1
    else
    return fail;

    }

    // continue as before
    if (fractionLengh < NegPow10.Length) ....


    Perhaps integers?



    Instead of accumulating result as some floating point type, accumulate the digits as a 64-bit integer. This, depending on platform, is often significantly faster than double.



    Code could simply count leading zeros (important if there is a '.' there) and then loop onto the minimum of of the remaining text length or 18 (number of 999... digts in a 64-bit integer) and then do a final integer to double for subsequent calculation.




    Not alwasy the best



    There are concerns with OP's code about generating the best answer.



    Challenging (lengthly) text input eventual causes significant * 10 to round its answer and even perhaps overflow (even with an in range possible result).



    With OP's fractionalPart being rounded and NegPow10[fractionLengh]) also rounded, the product and than addition to result may be off by 1 or 2 ULP.



    To get the best result, additional (slower) code is needed.



    -0.0



    It appears OP's code will generate the correct result. I suspect unposted test code is insufficient to fully test this case. Perhaps OP is not concerned about this case as "anything exotic".



    Range



    I'd expect test cases should include the maximum +/-Double as text and the minimum non-zero value +/-0.01



    Positive numbers?



    Code test for a leading '-'. How about a leading '+'? Research CurrentCulture.NumberFormat.PositiveSign.







    share|improve this answer















    share|improve this answer



    share|improve this answer








    edited 3 hours ago


























    answered Aug 2 at 16:50









    chux

    11.4k11238




    11.4k11238











    • On Integers: I actually have a second (much simpler) FastTryParseInt - it's ~2x slower (on my 64 bit machine) than than an identical routine using doubles. The answer I found on the webs was that CPUs are optimized to do integer addition quickly, but integer multiplication is slow. Doubles are the opposite - addition takes more ticks than multiplication. The multiplication by 10 in the inner loop appears to really slow down integers. Haven't tested with longs, which should in theory be the same speed on a machine with 64 bit registers, but less risk of overflowing for large doubles.
      – Alain
      Aug 2 at 17:06











    • I like the idea of not tracking a separate "fraction part" double, and just continuing to add to result. I'm sure I can work with that. I still think it might be required to maintain two while loops (even though the code is not as nice) so that the branch predictor doesn't have to re-acclimate after the separator is encountered.
      – Alain
      Aug 2 at 17:14











    • I applied your pseudo code to my new version below and it did improve performance, thanks! I think I'll make a separate version without support for exponential notation again and see if the single loop approach makes it any faster. I also changed the in-progress result and constants to long (until the final result) to see how that went, but it did cut performance almost by double. Quite strange.
      – Alain
      Aug 3 at 13:52
















    • On Integers: I actually have a second (much simpler) FastTryParseInt - it's ~2x slower (on my 64 bit machine) than than an identical routine using doubles. The answer I found on the webs was that CPUs are optimized to do integer addition quickly, but integer multiplication is slow. Doubles are the opposite - addition takes more ticks than multiplication. The multiplication by 10 in the inner loop appears to really slow down integers. Haven't tested with longs, which should in theory be the same speed on a machine with 64 bit registers, but less risk of overflowing for large doubles.
      – Alain
      Aug 2 at 17:06











    • I like the idea of not tracking a separate "fraction part" double, and just continuing to add to result. I'm sure I can work with that. I still think it might be required to maintain two while loops (even though the code is not as nice) so that the branch predictor doesn't have to re-acclimate after the separator is encountered.
      – Alain
      Aug 2 at 17:14











    • I applied your pseudo code to my new version below and it did improve performance, thanks! I think I'll make a separate version without support for exponential notation again and see if the single loop approach makes it any faster. I also changed the in-progress result and constants to long (until the final result) to see how that went, but it did cut performance almost by double. Quite strange.
      – Alain
      Aug 3 at 13:52















    On Integers: I actually have a second (much simpler) FastTryParseInt - it's ~2x slower (on my 64 bit machine) than than an identical routine using doubles. The answer I found on the webs was that CPUs are optimized to do integer addition quickly, but integer multiplication is slow. Doubles are the opposite - addition takes more ticks than multiplication. The multiplication by 10 in the inner loop appears to really slow down integers. Haven't tested with longs, which should in theory be the same speed on a machine with 64 bit registers, but less risk of overflowing for large doubles.
    – Alain
    Aug 2 at 17:06





    On Integers: I actually have a second (much simpler) FastTryParseInt - it's ~2x slower (on my 64 bit machine) than than an identical routine using doubles. The answer I found on the webs was that CPUs are optimized to do integer addition quickly, but integer multiplication is slow. Doubles are the opposite - addition takes more ticks than multiplication. The multiplication by 10 in the inner loop appears to really slow down integers. Haven't tested with longs, which should in theory be the same speed on a machine with 64 bit registers, but less risk of overflowing for large doubles.
    – Alain
    Aug 2 at 17:06













    I like the idea of not tracking a separate "fraction part" double, and just continuing to add to result. I'm sure I can work with that. I still think it might be required to maintain two while loops (even though the code is not as nice) so that the branch predictor doesn't have to re-acclimate after the separator is encountered.
    – Alain
    Aug 2 at 17:14





    I like the idea of not tracking a separate "fraction part" double, and just continuing to add to result. I'm sure I can work with that. I still think it might be required to maintain two while loops (even though the code is not as nice) so that the branch predictor doesn't have to re-acclimate after the separator is encountered.
    – Alain
    Aug 2 at 17:14













    I applied your pseudo code to my new version below and it did improve performance, thanks! I think I'll make a separate version without support for exponential notation again and see if the single loop approach makes it any faster. I also changed the in-progress result and constants to long (until the final result) to see how that went, but it did cut performance almost by double. Quite strange.
    – Alain
    Aug 3 at 13:52




    I applied your pseudo code to my new version below and it did improve performance, thanks! I think I'll make a separate version without support for exponential notation again and see if the single loop approach makes it any faster. I also changed the in-progress result and constants to long (until the final result) to see how that went, but it did cut performance almost by double. Quite strange.
    – Alain
    Aug 3 at 13:52












    up vote
    0
    down vote













    Here's a slightly improved implementation that saves a few instructions by combining some redundant checks and assignments.



    A little more work when checking for negatives:



    // Handle a possible negative sign at the beginning of the string.
    // We will apply the sign before returning the final result.
    if (nextChar == CharNegative)

    if (length <= 1) return false;
    sign = -1d;
    ++currentIndex;
    nextChar = input[1];



    And the first main loop has its while / break conditions altered to remove the need for redundant checks after breaking.



    unchecked

    // Break on the first non-numeric character
    while (nextChar >= '0' && nextChar <= '9')

    // Multiply by 10 and add the next digit.
    result = result * 10 + (nextChar - '0');
    // Return if we have reached the end of the string
    if (++currentIndex >= length)

    result *= sign;
    return true;

    nextChar = input[currentIndex];

    // The next character should be the decimal separator, or else it's invalid.
    if (nextChar != CharDecimalSeparator) return false;
    // Perform a similar loop to construct the fractional part of the decimal
    double fractionalPart = 0d;
    int fractionLengh = length - ++currentIndex;
    while (currentIndex < length)

    nextChar = input[currentIndex++];
    // If we encounter a non-numeric character now, it's an error
    if (nextChar < '0'
    // Adjust the magnitude and add the fractional part to the result
    // Use our power of negative powers of 10 if possible.
    if (fractionLengh < NegPow10.Length)
    result = (result + fractionalPart * NegPow10[fractionLengh]) * sign;
    else
    result = (result + fractionalPart * Math.Pow(10, -fractionLengh)) * sign;

    return true;


    The difference in performance is imperceptible during release mode. I'm guessing comile optimizations and/or instruction pipelining made the updates irrelevant.



    I toyed with different ways of doing the final computation (wrapping the power multiplication in a ternary operator, splitting them out into multiple lines, etc) but this one version seems to compile down the best.






    share|improve this answer

























      up vote
      0
      down vote













      Here's a slightly improved implementation that saves a few instructions by combining some redundant checks and assignments.



      A little more work when checking for negatives:



      // Handle a possible negative sign at the beginning of the string.
      // We will apply the sign before returning the final result.
      if (nextChar == CharNegative)

      if (length <= 1) return false;
      sign = -1d;
      ++currentIndex;
      nextChar = input[1];



      And the first main loop has its while / break conditions altered to remove the need for redundant checks after breaking.



      unchecked

      // Break on the first non-numeric character
      while (nextChar >= '0' && nextChar <= '9')

      // Multiply by 10 and add the next digit.
      result = result * 10 + (nextChar - '0');
      // Return if we have reached the end of the string
      if (++currentIndex >= length)

      result *= sign;
      return true;

      nextChar = input[currentIndex];

      // The next character should be the decimal separator, or else it's invalid.
      if (nextChar != CharDecimalSeparator) return false;
      // Perform a similar loop to construct the fractional part of the decimal
      double fractionalPart = 0d;
      int fractionLengh = length - ++currentIndex;
      while (currentIndex < length)

      nextChar = input[currentIndex++];
      // If we encounter a non-numeric character now, it's an error
      if (nextChar < '0'
      // Adjust the magnitude and add the fractional part to the result
      // Use our power of negative powers of 10 if possible.
      if (fractionLengh < NegPow10.Length)
      result = (result + fractionalPart * NegPow10[fractionLengh]) * sign;
      else
      result = (result + fractionalPart * Math.Pow(10, -fractionLengh)) * sign;

      return true;


      The difference in performance is imperceptible during release mode. I'm guessing comile optimizations and/or instruction pipelining made the updates irrelevant.



      I toyed with different ways of doing the final computation (wrapping the power multiplication in a ternary operator, splitting them out into multiple lines, etc) but this one version seems to compile down the best.






      share|improve this answer























        up vote
        0
        down vote










        up vote
        0
        down vote









        Here's a slightly improved implementation that saves a few instructions by combining some redundant checks and assignments.



        A little more work when checking for negatives:



        // Handle a possible negative sign at the beginning of the string.
        // We will apply the sign before returning the final result.
        if (nextChar == CharNegative)

        if (length <= 1) return false;
        sign = -1d;
        ++currentIndex;
        nextChar = input[1];



        And the first main loop has its while / break conditions altered to remove the need for redundant checks after breaking.



        unchecked

        // Break on the first non-numeric character
        while (nextChar >= '0' && nextChar <= '9')

        // Multiply by 10 and add the next digit.
        result = result * 10 + (nextChar - '0');
        // Return if we have reached the end of the string
        if (++currentIndex >= length)

        result *= sign;
        return true;

        nextChar = input[currentIndex];

        // The next character should be the decimal separator, or else it's invalid.
        if (nextChar != CharDecimalSeparator) return false;
        // Perform a similar loop to construct the fractional part of the decimal
        double fractionalPart = 0d;
        int fractionLengh = length - ++currentIndex;
        while (currentIndex < length)

        nextChar = input[currentIndex++];
        // If we encounter a non-numeric character now, it's an error
        if (nextChar < '0'
        // Adjust the magnitude and add the fractional part to the result
        // Use our power of negative powers of 10 if possible.
        if (fractionLengh < NegPow10.Length)
        result = (result + fractionalPart * NegPow10[fractionLengh]) * sign;
        else
        result = (result + fractionalPart * Math.Pow(10, -fractionLengh)) * sign;

        return true;


        The difference in performance is imperceptible during release mode. I'm guessing comile optimizations and/or instruction pipelining made the updates irrelevant.



        I toyed with different ways of doing the final computation (wrapping the power multiplication in a ternary operator, splitting them out into multiple lines, etc) but this one version seems to compile down the best.






        share|improve this answer













        Here's a slightly improved implementation that saves a few instructions by combining some redundant checks and assignments.



        A little more work when checking for negatives:



        // Handle a possible negative sign at the beginning of the string.
        // We will apply the sign before returning the final result.
        if (nextChar == CharNegative)

        if (length <= 1) return false;
        sign = -1d;
        ++currentIndex;
        nextChar = input[1];



        And the first main loop has its while / break conditions altered to remove the need for redundant checks after breaking.



        unchecked

        // Break on the first non-numeric character
        while (nextChar >= '0' && nextChar <= '9')

        // Multiply by 10 and add the next digit.
        result = result * 10 + (nextChar - '0');
        // Return if we have reached the end of the string
        if (++currentIndex >= length)

        result *= sign;
        return true;

        nextChar = input[currentIndex];

        // The next character should be the decimal separator, or else it's invalid.
        if (nextChar != CharDecimalSeparator) return false;
        // Perform a similar loop to construct the fractional part of the decimal
        double fractionalPart = 0d;
        int fractionLengh = length - ++currentIndex;
        while (currentIndex < length)

        nextChar = input[currentIndex++];
        // If we encounter a non-numeric character now, it's an error
        if (nextChar < '0'
        // Adjust the magnitude and add the fractional part to the result
        // Use our power of negative powers of 10 if possible.
        if (fractionLengh < NegPow10.Length)
        result = (result + fractionalPart * NegPow10[fractionLengh]) * sign;
        else
        result = (result + fractionalPart * Math.Pow(10, -fractionLengh)) * sign;

        return true;


        The difference in performance is imperceptible during release mode. I'm guessing comile optimizations and/or instruction pipelining made the updates irrelevant.



        I toyed with different ways of doing the final computation (wrapping the power multiplication in a ternary operator, splitting them out into multiple lines, etc) but this one version seems to compile down the best.







        share|improve this answer













        share|improve this answer



        share|improve this answer











        answered Aug 1 at 2:25









        Alain

        2831314




        2831314




















            up vote
            0
            down vote













            Here is a much longer implementation that handles scientific notation, NaN, Infinity, Negative Infinity, and leading positive signs. I also added a lot of comments to help visually break it into chunks.



            It manages to be almost as fast as the previous method - most of the logic takes place in the body of previous if checks that simply returned false before.



            I found a few places where I could avoid repeated checks for non-digit characters, and use the first digit character to initialize the result directly to avoid unnecessary additions/multiplcations with zero on the first iteration of the loop.



            /// <summary>High performance double parser with rudimentary flexibility.</summary>
            /// <returns>Returns true only if we can be certain we parsed the string correctly.</returns>
            /// <remarks>Does not support thousand separators or whitespace.</remarks>
            /// <remarks>Supports all culture-specific symbols specified by the NumberFormatInfo of the
            /// <see cref="CultureInfo.CurrentCulture"/> at the time this static class is instantiated.
            /// So long as all culture symbols are a single character in length.
            /// TODO: In theory, this class could be made instantiable, take the culture as an argument,
            /// and support changing the culture at runtime in case the file the user is uploading
            /// was generated on a machine with different culture settings.</remarks>
            /// <remarks>Supports leading negative signs and positive signs, scientific notation,
            /// as well as Infinity, Negative Infinity, and NaN, string representations.</remarks>
            /// <remarks>A string containing only a negative sign (usually "-") intentionally returns a
            /// value of zero. This is because it's a common representation of 0 in accounting.</remarks>
            public static bool FastTryParseDouble(string input, out double result)


            /// <summary>Checks if the string matches one of a few supported special case
            /// double strings. If so, assigns the result and returns true.</summary>
            public static bool CheckForSpecialCaseDoubleStrings(string input, out double result)

            if (input == NumberFormat.PositiveInfinitySymbol)
            result = Double.PositiveInfinity;
            else if (input == NumberFormat.NegativeInfinitySymbol)
            result = Double.NegativeInfinity;
            else if (input == NumberFormat.NaNSymbol)
            result = Double.NaN;
            // Special Case: Excel has been known to format zero as "-".
            // We intentionally support it by returning zero now (most parsers would not)
            else if (input == NumberFormat.NegativeSign)
            result = 0d;
            // Special Case: Our organization treats the term "Unlimited" as referring
            // to Double.MaxValue (most parsers would not)
            else if (input.Equals("unlimited", StringComparison.OrdinalIgnoreCase))
            result = Double.MaxValue;
            // Anything else is not a valid input
            else

            result = Double.NaN;
            return false;

            return true;


            /// <summary>The largest exponent (or smallest when negative) that can be given to a Double.</summary>
            private const int MaxDoubleExponent = 308;

            /// <summary>The number of elements that will be generated in the Pow10 array.</summary>
            private const int Pow10Length = MaxDoubleExponent * 2 + 1;

            /// <summary>A cache of all possible positive powers of 10 that might be required to
            /// apply an exponent to a double (Indices 0-308), as well as the first 308 negative
            /// exponents. (Indices 309-616)</summary>
            private static readonly double Pow10 =
            Enumerable.Range(0, MaxDoubleExponent + 1).Select(i => Math.Pow(10, i))
            .Concat(Enumerable.Range(1, MaxDoubleExponent).Select(i => Math.Pow(10, -i)))
            .ToArray();

            /// <summary>The number of negative powers to pre-compute and store in a small array.</summary>
            private const int NegPow10Length = 16;

            /// <summary>A cache of the first 15 negative powers of 10 for quick
            /// magnitude adjustment of common parsed fractional parts of doubles.</summary>
            /// <remarks>Even though this overlaps with the Pow10 array, it is kept separate so that
            /// users that don't use scientific notation or extremely long fractional parts
            /// might get a speedup by being able to reference the smaller array, which has a better
            /// chance of being served out of L1/L2 cache.</remarks>
            private static readonly double NegPow10 =
            Enumerable.Range(0, NegPow10Length).Select(i => Math.Pow(10, -i)).ToArray();


            This new method matches all of the following test cases:



            // Numbers without a fractional part
            TestSuccess("0", 0d);
            TestSuccess("1", 1d);
            TestSuccess("-1", -1d);
            TestSuccess("12345678901234", 12345678901234d);
            TestSuccess("-12345678901234", -12345678901234d);
            // Numbers with a fractional part
            TestSuccess("123.45678", 123.45678);
            TestSuccess("-123.45678", -123.45678);
            // Numbers without an integer part
            TestSuccess(".12345678901234", 0.12345678901234);
            TestSuccess("-.12345678901234", -0.12345678901234);
            // Various high-precision numbers
            TestSuccess("0.12345678901234", 0.12345678901234);
            TestSuccess("-0.12345678901234", -0.12345678901234);
            TestSuccess("0.00000987654321", 0.00000987654321);
            TestSuccess("-0.00000987654321", -0.00000987654321);
            TestSuccess("1234567890123.0123456789", 1234567890123.0123456789);
            TestSuccess("-1234567890123.0123456789", -1234567890123.0123456789);
            // Numbers with very long fractional parts (more than 16 characters)
            TestSuccess("0.00826499999979784", 0.00826499999979784);
            TestSuccess("-0.00826499999979784", -0.00826499999979784);
            TestSuccess("1.0123456789012345678901234567890", 1.0123456789012345678901234567890);
            TestSuccess("-1.0123456789012345678901234567890", -1.0123456789012345678901234567890);
            // Numbers with a leading positive sign
            TestSuccess("+1", 1d);
            TestSuccess("+12345678901234", 12345678901234d);
            TestSuccess("+.12345678901234", 0.12345678901234);
            TestSuccess("+0.00826499999979784", 0.00826499999979784);
            // Very large numbers without scientific notation
            TestSuccess("123456789000000000000000", 123456789000000000000000d);
            TestSuccess("-123456789000000000000000", -123456789000000000000000d);
            // Very small numbers without scientific notation
            TestSuccess("0.00000000000000000123456789", 0.00000000000000000123456789);
            TestSuccess("-0.00000000000000000123456789", -0.00000000000000000123456789);
            // Scientific notation without a sign
            TestSuccess("1.2345678e5", 1.2345678e5);
            TestSuccess("1.2345678e5", 1.2345678e5);
            TestSuccess("-1.2345678e5", -1.2345678e5);
            // Scientific notation with a sign
            TestSuccess("1.2345678e+25", 1.2345678e25);
            TestSuccess("-1.2345678e+25", -1.2345678e25);
            TestSuccess("1.2345678e-255", 1.2345678e-255);
            TestSuccess("-1.2345678e-255", -1.2345678e-255);
            // Epsilon, and other tiny doubles
            // TODO: Known "failure" scenarios. Our parsing logic results in a return value of 0
            // for these, but the native parser returns Double.Epsilon (smallest number greater
            // than zero). I think we can live with this shortcoming.
            //TestSuccess("4.94065645841247e-324", 4.94065645841247e-324);
            //TestSuccess("-4.94065645841247e-324", -4.94065645841247e-324);
            TestSuccess("3.33E-333", 3.33E-333);
            TestSuccess("-3.33E-333", -3.33E-333);
            TestSuccess("1E-1022", 1E-1022);
            TestSuccess("-1E-1022", -1E-1022);
            // Boundary cases
            TestSuccess("1e0", 1);
            TestSuccess("1e1", 10);
            TestSuccess("1e-1", 0.1);
            TestSuccess("1e-308", 1e-308);
            TestSuccess("1e308", 1e308);
            // Min and Max Double
            TestSuccess("1.7976931348623157E+308", 1.7976931348623157E+308);
            TestSuccess("-1.7976931348623157E+308", -1.7976931348623157E+308);
            // Large Negative Exponents (Near-epsilon) doubles.
            TestSuccess("1.23E-999", 1.23E-999);
            TestSuccess("-1.23E-999", -1.23E-999);
            // Special keywords
            TestSuccess("∞", Double.PositiveInfinity);
            TestSuccess("-∞", Double.NegativeInfinity);
            TestSuccess("NaN", Double.NaN);
            // Special case: "Unlimited" is used in our organization to refer to Double.MaxValue
            TestSuccess("Unlimited", Double.MaxValue);
            // Special case: "-" character only means zero in accounting formats.
            TestSuccess("-", 0d);


            Benchmark Results



            Using a Stopwatch this time, and ran with 1,000,000,000 (a billion) strings just to quell any debate about timing sensitivity:




            Native parser took 26220 ms.



            Custom parser took 6471 ms.



            Performance gain was 305.19%







            share|improve this answer























            • @chux I applied your suggestion in this updated version and it took the performance up another ~15% - while still supporting all valid double cases. The results still differ from Native Double.Parse by 1 or 2 ULPs for certain very large or very small numbers, but I don't consider that a meaningful drawback. It just means results aren't always being parsed to precisely the closest possible double representation of the string processed.
              – Alain
              Aug 3 at 12:36










            • Code like do nextChar = input[currentIndex]; if (nextChar > '9' while (++currentIndex < length); is doing 2 checks (is digit, is at end) per loop to see if it should stop. If able, append a non-digit to index and then code only need to to check for digit. After to loop code can test if loop quit due to end of string or not. `
              – chux
              Aug 3 at 14:20











            • That's a good idea. If only strings in c# returned the null character as the last element in their character array like c++. Can you think of any way without building a new array?
              – Alain
              Aug 3 at 14:30










            • "think of any way without building a new array?" --> No, yet appending to the existing array may not cost much.
              – chux
              Aug 3 at 14:39










            • Other idea: consider profiling a table lookup isdigit[nextChar] instead of if (nextChar > '9' || nextChar < '0').
              – chux
              Aug 3 at 14:40














            up vote
            0
            down vote













            Here is a much longer implementation that handles scientific notation, NaN, Infinity, Negative Infinity, and leading positive signs. I also added a lot of comments to help visually break it into chunks.



            It manages to be almost as fast as the previous method - most of the logic takes place in the body of previous if checks that simply returned false before.



            I found a few places where I could avoid repeated checks for non-digit characters, and use the first digit character to initialize the result directly to avoid unnecessary additions/multiplcations with zero on the first iteration of the loop.



            /// <summary>High performance double parser with rudimentary flexibility.</summary>
            /// <returns>Returns true only if we can be certain we parsed the string correctly.</returns>
            /// <remarks>Does not support thousand separators or whitespace.</remarks>
            /// <remarks>Supports all culture-specific symbols specified by the NumberFormatInfo of the
            /// <see cref="CultureInfo.CurrentCulture"/> at the time this static class is instantiated.
            /// So long as all culture symbols are a single character in length.
            /// TODO: In theory, this class could be made instantiable, take the culture as an argument,
            /// and support changing the culture at runtime in case the file the user is uploading
            /// was generated on a machine with different culture settings.</remarks>
            /// <remarks>Supports leading negative signs and positive signs, scientific notation,
            /// as well as Infinity, Negative Infinity, and NaN, string representations.</remarks>
            /// <remarks>A string containing only a negative sign (usually "-") intentionally returns a
            /// value of zero. This is because it's a common representation of 0 in accounting.</remarks>
            public static bool FastTryParseDouble(string input, out double result)


            /// <summary>Checks if the string matches one of a few supported special case
            /// double strings. If so, assigns the result and returns true.</summary>
            public static bool CheckForSpecialCaseDoubleStrings(string input, out double result)

            if (input == NumberFormat.PositiveInfinitySymbol)
            result = Double.PositiveInfinity;
            else if (input == NumberFormat.NegativeInfinitySymbol)
            result = Double.NegativeInfinity;
            else if (input == NumberFormat.NaNSymbol)
            result = Double.NaN;
            // Special Case: Excel has been known to format zero as "-".
            // We intentionally support it by returning zero now (most parsers would not)
            else if (input == NumberFormat.NegativeSign)
            result = 0d;
            // Special Case: Our organization treats the term "Unlimited" as referring
            // to Double.MaxValue (most parsers would not)
            else if (input.Equals("unlimited", StringComparison.OrdinalIgnoreCase))
            result = Double.MaxValue;
            // Anything else is not a valid input
            else

            result = Double.NaN;
            return false;

            return true;


            /// <summary>The largest exponent (or smallest when negative) that can be given to a Double.</summary>
            private const int MaxDoubleExponent = 308;

            /// <summary>The number of elements that will be generated in the Pow10 array.</summary>
            private const int Pow10Length = MaxDoubleExponent * 2 + 1;

            /// <summary>A cache of all possible positive powers of 10 that might be required to
            /// apply an exponent to a double (Indices 0-308), as well as the first 308 negative
            /// exponents. (Indices 309-616)</summary>
            private static readonly double Pow10 =
            Enumerable.Range(0, MaxDoubleExponent + 1).Select(i => Math.Pow(10, i))
            .Concat(Enumerable.Range(1, MaxDoubleExponent).Select(i => Math.Pow(10, -i)))
            .ToArray();

            /// <summary>The number of negative powers to pre-compute and store in a small array.</summary>
            private const int NegPow10Length = 16;

            /// <summary>A cache of the first 15 negative powers of 10 for quick
            /// magnitude adjustment of common parsed fractional parts of doubles.</summary>
            /// <remarks>Even though this overlaps with the Pow10 array, it is kept separate so that
            /// users that don't use scientific notation or extremely long fractional parts
            /// might get a speedup by being able to reference the smaller array, which has a better
            /// chance of being served out of L1/L2 cache.</remarks>
            private static readonly double NegPow10 =
            Enumerable.Range(0, NegPow10Length).Select(i => Math.Pow(10, -i)).ToArray();


            This new method matches all of the following test cases:



            // Numbers without a fractional part
            TestSuccess("0", 0d);
            TestSuccess("1", 1d);
            TestSuccess("-1", -1d);
            TestSuccess("12345678901234", 12345678901234d);
            TestSuccess("-12345678901234", -12345678901234d);
            // Numbers with a fractional part
            TestSuccess("123.45678", 123.45678);
            TestSuccess("-123.45678", -123.45678);
            // Numbers without an integer part
            TestSuccess(".12345678901234", 0.12345678901234);
            TestSuccess("-.12345678901234", -0.12345678901234);
            // Various high-precision numbers
            TestSuccess("0.12345678901234", 0.12345678901234);
            TestSuccess("-0.12345678901234", -0.12345678901234);
            TestSuccess("0.00000987654321", 0.00000987654321);
            TestSuccess("-0.00000987654321", -0.00000987654321);
            TestSuccess("1234567890123.0123456789", 1234567890123.0123456789);
            TestSuccess("-1234567890123.0123456789", -1234567890123.0123456789);
            // Numbers with very long fractional parts (more than 16 characters)
            TestSuccess("0.00826499999979784", 0.00826499999979784);
            TestSuccess("-0.00826499999979784", -0.00826499999979784);
            TestSuccess("1.0123456789012345678901234567890", 1.0123456789012345678901234567890);
            TestSuccess("-1.0123456789012345678901234567890", -1.0123456789012345678901234567890);
            // Numbers with a leading positive sign
            TestSuccess("+1", 1d);
            TestSuccess("+12345678901234", 12345678901234d);
            TestSuccess("+.12345678901234", 0.12345678901234);
            TestSuccess("+0.00826499999979784", 0.00826499999979784);
            // Very large numbers without scientific notation
            TestSuccess("123456789000000000000000", 123456789000000000000000d);
            TestSuccess("-123456789000000000000000", -123456789000000000000000d);
            // Very small numbers without scientific notation
            TestSuccess("0.00000000000000000123456789", 0.00000000000000000123456789);
            TestSuccess("-0.00000000000000000123456789", -0.00000000000000000123456789);
            // Scientific notation without a sign
            TestSuccess("1.2345678e5", 1.2345678e5);
            TestSuccess("1.2345678e5", 1.2345678e5);
            TestSuccess("-1.2345678e5", -1.2345678e5);
            // Scientific notation with a sign
            TestSuccess("1.2345678e+25", 1.2345678e25);
            TestSuccess("-1.2345678e+25", -1.2345678e25);
            TestSuccess("1.2345678e-255", 1.2345678e-255);
            TestSuccess("-1.2345678e-255", -1.2345678e-255);
            // Epsilon, and other tiny doubles
            // TODO: Known "failure" scenarios. Our parsing logic results in a return value of 0
            // for these, but the native parser returns Double.Epsilon (smallest number greater
            // than zero). I think we can live with this shortcoming.
            //TestSuccess("4.94065645841247e-324", 4.94065645841247e-324);
            //TestSuccess("-4.94065645841247e-324", -4.94065645841247e-324);
            TestSuccess("3.33E-333", 3.33E-333);
            TestSuccess("-3.33E-333", -3.33E-333);
            TestSuccess("1E-1022", 1E-1022);
            TestSuccess("-1E-1022", -1E-1022);
            // Boundary cases
            TestSuccess("1e0", 1);
            TestSuccess("1e1", 10);
            TestSuccess("1e-1", 0.1);
            TestSuccess("1e-308", 1e-308);
            TestSuccess("1e308", 1e308);
            // Min and Max Double
            TestSuccess("1.7976931348623157E+308", 1.7976931348623157E+308);
            TestSuccess("-1.7976931348623157E+308", -1.7976931348623157E+308);
            // Large Negative Exponents (Near-epsilon) doubles.
            TestSuccess("1.23E-999", 1.23E-999);
            TestSuccess("-1.23E-999", -1.23E-999);
            // Special keywords
            TestSuccess("∞", Double.PositiveInfinity);
            TestSuccess("-∞", Double.NegativeInfinity);
            TestSuccess("NaN", Double.NaN);
            // Special case: "Unlimited" is used in our organization to refer to Double.MaxValue
            TestSuccess("Unlimited", Double.MaxValue);
            // Special case: "-" character only means zero in accounting formats.
            TestSuccess("-", 0d);


            Benchmark Results



            Using a Stopwatch this time, and ran with 1,000,000,000 (a billion) strings just to quell any debate about timing sensitivity:




            Native parser took 26220 ms.



            Custom parser took 6471 ms.



            Performance gain was 305.19%







            share|improve this answer























            • @chux I applied your suggestion in this updated version and it took the performance up another ~15% - while still supporting all valid double cases. The results still differ from Native Double.Parse by 1 or 2 ULPs for certain very large or very small numbers, but I don't consider that a meaningful drawback. It just means results aren't always being parsed to precisely the closest possible double representation of the string processed.
              – Alain
              Aug 3 at 12:36










            • Code like do nextChar = input[currentIndex]; if (nextChar > '9' while (++currentIndex < length); is doing 2 checks (is digit, is at end) per loop to see if it should stop. If able, append a non-digit to index and then code only need to to check for digit. After to loop code can test if loop quit due to end of string or not. `
              – chux
              Aug 3 at 14:20











            • That's a good idea. If only strings in c# returned the null character as the last element in their character array like c++. Can you think of any way without building a new array?
              – Alain
              Aug 3 at 14:30










            • "think of any way without building a new array?" --> No, yet appending to the existing array may not cost much.
              – chux
              Aug 3 at 14:39










            • Other idea: consider profiling a table lookup isdigit[nextChar] instead of if (nextChar > '9' || nextChar < '0').
              – chux
              Aug 3 at 14:40












            up vote
            0
            down vote










            up vote
            0
            down vote









            Here is a much longer implementation that handles scientific notation, NaN, Infinity, Negative Infinity, and leading positive signs. I also added a lot of comments to help visually break it into chunks.



            It manages to be almost as fast as the previous method - most of the logic takes place in the body of previous if checks that simply returned false before.



            I found a few places where I could avoid repeated checks for non-digit characters, and use the first digit character to initialize the result directly to avoid unnecessary additions/multiplcations with zero on the first iteration of the loop.



            /// <summary>High performance double parser with rudimentary flexibility.</summary>
            /// <returns>Returns true only if we can be certain we parsed the string correctly.</returns>
            /// <remarks>Does not support thousand separators or whitespace.</remarks>
            /// <remarks>Supports all culture-specific symbols specified by the NumberFormatInfo of the
            /// <see cref="CultureInfo.CurrentCulture"/> at the time this static class is instantiated.
            /// So long as all culture symbols are a single character in length.
            /// TODO: In theory, this class could be made instantiable, take the culture as an argument,
            /// and support changing the culture at runtime in case the file the user is uploading
            /// was generated on a machine with different culture settings.</remarks>
            /// <remarks>Supports leading negative signs and positive signs, scientific notation,
            /// as well as Infinity, Negative Infinity, and NaN, string representations.</remarks>
            /// <remarks>A string containing only a negative sign (usually "-") intentionally returns a
            /// value of zero. This is because it's a common representation of 0 in accounting.</remarks>
            public static bool FastTryParseDouble(string input, out double result)


            /// <summary>Checks if the string matches one of a few supported special case
            /// double strings. If so, assigns the result and returns true.</summary>
            public static bool CheckForSpecialCaseDoubleStrings(string input, out double result)

            if (input == NumberFormat.PositiveInfinitySymbol)
            result = Double.PositiveInfinity;
            else if (input == NumberFormat.NegativeInfinitySymbol)
            result = Double.NegativeInfinity;
            else if (input == NumberFormat.NaNSymbol)
            result = Double.NaN;
            // Special Case: Excel has been known to format zero as "-".
            // We intentionally support it by returning zero now (most parsers would not)
            else if (input == NumberFormat.NegativeSign)
            result = 0d;
            // Special Case: Our organization treats the term "Unlimited" as referring
            // to Double.MaxValue (most parsers would not)
            else if (input.Equals("unlimited", StringComparison.OrdinalIgnoreCase))
            result = Double.MaxValue;
            // Anything else is not a valid input
            else

            result = Double.NaN;
            return false;

            return true;


            /// <summary>The largest exponent (or smallest when negative) that can be given to a Double.</summary>
            private const int MaxDoubleExponent = 308;

            /// <summary>The number of elements that will be generated in the Pow10 array.</summary>
            private const int Pow10Length = MaxDoubleExponent * 2 + 1;

            /// <summary>A cache of all possible positive powers of 10 that might be required to
            /// apply an exponent to a double (Indices 0-308), as well as the first 308 negative
            /// exponents. (Indices 309-616)</summary>
            private static readonly double Pow10 =
            Enumerable.Range(0, MaxDoubleExponent + 1).Select(i => Math.Pow(10, i))
            .Concat(Enumerable.Range(1, MaxDoubleExponent).Select(i => Math.Pow(10, -i)))
            .ToArray();

            /// <summary>The number of negative powers to pre-compute and store in a small array.</summary>
            private const int NegPow10Length = 16;

            /// <summary>A cache of the first 15 negative powers of 10 for quick
            /// magnitude adjustment of common parsed fractional parts of doubles.</summary>
            /// <remarks>Even though this overlaps with the Pow10 array, it is kept separate so that
            /// users that don't use scientific notation or extremely long fractional parts
            /// might get a speedup by being able to reference the smaller array, which has a better
            /// chance of being served out of L1/L2 cache.</remarks>
            private static readonly double NegPow10 =
            Enumerable.Range(0, NegPow10Length).Select(i => Math.Pow(10, -i)).ToArray();


            This new method matches all of the following test cases:



            // Numbers without a fractional part
            TestSuccess("0", 0d);
            TestSuccess("1", 1d);
            TestSuccess("-1", -1d);
            TestSuccess("12345678901234", 12345678901234d);
            TestSuccess("-12345678901234", -12345678901234d);
            // Numbers with a fractional part
            TestSuccess("123.45678", 123.45678);
            TestSuccess("-123.45678", -123.45678);
            // Numbers without an integer part
            TestSuccess(".12345678901234", 0.12345678901234);
            TestSuccess("-.12345678901234", -0.12345678901234);
            // Various high-precision numbers
            TestSuccess("0.12345678901234", 0.12345678901234);
            TestSuccess("-0.12345678901234", -0.12345678901234);
            TestSuccess("0.00000987654321", 0.00000987654321);
            TestSuccess("-0.00000987654321", -0.00000987654321);
            TestSuccess("1234567890123.0123456789", 1234567890123.0123456789);
            TestSuccess("-1234567890123.0123456789", -1234567890123.0123456789);
            // Numbers with very long fractional parts (more than 16 characters)
            TestSuccess("0.00826499999979784", 0.00826499999979784);
            TestSuccess("-0.00826499999979784", -0.00826499999979784);
            TestSuccess("1.0123456789012345678901234567890", 1.0123456789012345678901234567890);
            TestSuccess("-1.0123456789012345678901234567890", -1.0123456789012345678901234567890);
            // Numbers with a leading positive sign
            TestSuccess("+1", 1d);
            TestSuccess("+12345678901234", 12345678901234d);
            TestSuccess("+.12345678901234", 0.12345678901234);
            TestSuccess("+0.00826499999979784", 0.00826499999979784);
            // Very large numbers without scientific notation
            TestSuccess("123456789000000000000000", 123456789000000000000000d);
            TestSuccess("-123456789000000000000000", -123456789000000000000000d);
            // Very small numbers without scientific notation
            TestSuccess("0.00000000000000000123456789", 0.00000000000000000123456789);
            TestSuccess("-0.00000000000000000123456789", -0.00000000000000000123456789);
            // Scientific notation without a sign
            TestSuccess("1.2345678e5", 1.2345678e5);
            TestSuccess("1.2345678e5", 1.2345678e5);
            TestSuccess("-1.2345678e5", -1.2345678e5);
            // Scientific notation with a sign
            TestSuccess("1.2345678e+25", 1.2345678e25);
            TestSuccess("-1.2345678e+25", -1.2345678e25);
            TestSuccess("1.2345678e-255", 1.2345678e-255);
            TestSuccess("-1.2345678e-255", -1.2345678e-255);
            // Epsilon, and other tiny doubles
            // TODO: Known "failure" scenarios. Our parsing logic results in a return value of 0
            // for these, but the native parser returns Double.Epsilon (smallest number greater
            // than zero). I think we can live with this shortcoming.
            //TestSuccess("4.94065645841247e-324", 4.94065645841247e-324);
            //TestSuccess("-4.94065645841247e-324", -4.94065645841247e-324);
            TestSuccess("3.33E-333", 3.33E-333);
            TestSuccess("-3.33E-333", -3.33E-333);
            TestSuccess("1E-1022", 1E-1022);
            TestSuccess("-1E-1022", -1E-1022);
            // Boundary cases
            TestSuccess("1e0", 1);
            TestSuccess("1e1", 10);
            TestSuccess("1e-1", 0.1);
            TestSuccess("1e-308", 1e-308);
            TestSuccess("1e308", 1e308);
            // Min and Max Double
            TestSuccess("1.7976931348623157E+308", 1.7976931348623157E+308);
            TestSuccess("-1.7976931348623157E+308", -1.7976931348623157E+308);
            // Large Negative Exponents (Near-epsilon) doubles.
            TestSuccess("1.23E-999", 1.23E-999);
            TestSuccess("-1.23E-999", -1.23E-999);
            // Special keywords
            TestSuccess("∞", Double.PositiveInfinity);
            TestSuccess("-∞", Double.NegativeInfinity);
            TestSuccess("NaN", Double.NaN);
            // Special case: "Unlimited" is used in our organization to refer to Double.MaxValue
            TestSuccess("Unlimited", Double.MaxValue);
            // Special case: "-" character only means zero in accounting formats.
            TestSuccess("-", 0d);


            Benchmark Results



            Using a Stopwatch this time, and ran with 1,000,000,000 (a billion) strings just to quell any debate about timing sensitivity:




            Native parser took 26220 ms.



            Custom parser took 6471 ms.



            Performance gain was 305.19%







            share|improve this answer















            Here is a much longer implementation that handles scientific notation, NaN, Infinity, Negative Infinity, and leading positive signs. I also added a lot of comments to help visually break it into chunks.



            It manages to be almost as fast as the previous method - most of the logic takes place in the body of previous if checks that simply returned false before.



            I found a few places where I could avoid repeated checks for non-digit characters, and use the first digit character to initialize the result directly to avoid unnecessary additions/multiplcations with zero on the first iteration of the loop.



            /// <summary>High performance double parser with rudimentary flexibility.</summary>
            /// <returns>Returns true only if we can be certain we parsed the string correctly.</returns>
            /// <remarks>Does not support thousand separators or whitespace.</remarks>
            /// <remarks>Supports all culture-specific symbols specified by the NumberFormatInfo of the
            /// <see cref="CultureInfo.CurrentCulture"/> at the time this static class is instantiated.
            /// So long as all culture symbols are a single character in length.
            /// TODO: In theory, this class could be made instantiable, take the culture as an argument,
            /// and support changing the culture at runtime in case the file the user is uploading
            /// was generated on a machine with different culture settings.</remarks>
            /// <remarks>Supports leading negative signs and positive signs, scientific notation,
            /// as well as Infinity, Negative Infinity, and NaN, string representations.</remarks>
            /// <remarks>A string containing only a negative sign (usually "-") intentionally returns a
            /// value of zero. This is because it's a common representation of 0 in accounting.</remarks>
            public static bool FastTryParseDouble(string input, out double result)


            /// <summary>Checks if the string matches one of a few supported special case
            /// double strings. If so, assigns the result and returns true.</summary>
            public static bool CheckForSpecialCaseDoubleStrings(string input, out double result)

            if (input == NumberFormat.PositiveInfinitySymbol)
            result = Double.PositiveInfinity;
            else if (input == NumberFormat.NegativeInfinitySymbol)
            result = Double.NegativeInfinity;
            else if (input == NumberFormat.NaNSymbol)
            result = Double.NaN;
            // Special Case: Excel has been known to format zero as "-".
            // We intentionally support it by returning zero now (most parsers would not)
            else if (input == NumberFormat.NegativeSign)
            result = 0d;
            // Special Case: Our organization treats the term "Unlimited" as referring
            // to Double.MaxValue (most parsers would not)
            else if (input.Equals("unlimited", StringComparison.OrdinalIgnoreCase))
            result = Double.MaxValue;
            // Anything else is not a valid input
            else

            result = Double.NaN;
            return false;

            return true;


            /// <summary>The largest exponent (or smallest when negative) that can be given to a Double.</summary>
            private const int MaxDoubleExponent = 308;

            /// <summary>The number of elements that will be generated in the Pow10 array.</summary>
            private const int Pow10Length = MaxDoubleExponent * 2 + 1;

            /// <summary>A cache of all possible positive powers of 10 that might be required to
            /// apply an exponent to a double (Indices 0-308), as well as the first 308 negative
            /// exponents. (Indices 309-616)</summary>
            private static readonly double Pow10 =
            Enumerable.Range(0, MaxDoubleExponent + 1).Select(i => Math.Pow(10, i))
            .Concat(Enumerable.Range(1, MaxDoubleExponent).Select(i => Math.Pow(10, -i)))
            .ToArray();

            /// <summary>The number of negative powers to pre-compute and store in a small array.</summary>
            private const int NegPow10Length = 16;

            /// <summary>A cache of the first 15 negative powers of 10 for quick
            /// magnitude adjustment of common parsed fractional parts of doubles.</summary>
            /// <remarks>Even though this overlaps with the Pow10 array, it is kept separate so that
            /// users that don't use scientific notation or extremely long fractional parts
            /// might get a speedup by being able to reference the smaller array, which has a better
            /// chance of being served out of L1/L2 cache.</remarks>
            private static readonly double NegPow10 =
            Enumerable.Range(0, NegPow10Length).Select(i => Math.Pow(10, -i)).ToArray();


            This new method matches all of the following test cases:



            // Numbers without a fractional part
            TestSuccess("0", 0d);
            TestSuccess("1", 1d);
            TestSuccess("-1", -1d);
            TestSuccess("12345678901234", 12345678901234d);
            TestSuccess("-12345678901234", -12345678901234d);
            // Numbers with a fractional part
            TestSuccess("123.45678", 123.45678);
            TestSuccess("-123.45678", -123.45678);
            // Numbers without an integer part
            TestSuccess(".12345678901234", 0.12345678901234);
            TestSuccess("-.12345678901234", -0.12345678901234);
            // Various high-precision numbers
            TestSuccess("0.12345678901234", 0.12345678901234);
            TestSuccess("-0.12345678901234", -0.12345678901234);
            TestSuccess("0.00000987654321", 0.00000987654321);
            TestSuccess("-0.00000987654321", -0.00000987654321);
            TestSuccess("1234567890123.0123456789", 1234567890123.0123456789);
            TestSuccess("-1234567890123.0123456789", -1234567890123.0123456789);
            // Numbers with very long fractional parts (more than 16 characters)
            TestSuccess("0.00826499999979784", 0.00826499999979784);
            TestSuccess("-0.00826499999979784", -0.00826499999979784);
            TestSuccess("1.0123456789012345678901234567890", 1.0123456789012345678901234567890);
            TestSuccess("-1.0123456789012345678901234567890", -1.0123456789012345678901234567890);
            // Numbers with a leading positive sign
            TestSuccess("+1", 1d);
            TestSuccess("+12345678901234", 12345678901234d);
            TestSuccess("+.12345678901234", 0.12345678901234);
            TestSuccess("+0.00826499999979784", 0.00826499999979784);
            // Very large numbers without scientific notation
            TestSuccess("123456789000000000000000", 123456789000000000000000d);
            TestSuccess("-123456789000000000000000", -123456789000000000000000d);
            // Very small numbers without scientific notation
            TestSuccess("0.00000000000000000123456789", 0.00000000000000000123456789);
            TestSuccess("-0.00000000000000000123456789", -0.00000000000000000123456789);
            // Scientific notation without a sign
            TestSuccess("1.2345678e5", 1.2345678e5);
            TestSuccess("1.2345678e5", 1.2345678e5);
            TestSuccess("-1.2345678e5", -1.2345678e5);
            // Scientific notation with a sign
            TestSuccess("1.2345678e+25", 1.2345678e25);
            TestSuccess("-1.2345678e+25", -1.2345678e25);
            TestSuccess("1.2345678e-255", 1.2345678e-255);
            TestSuccess("-1.2345678e-255", -1.2345678e-255);
            // Epsilon, and other tiny doubles
            // TODO: Known "failure" scenarios. Our parsing logic results in a return value of 0
            // for these, but the native parser returns Double.Epsilon (smallest number greater
            // than zero). I think we can live with this shortcoming.
            //TestSuccess("4.94065645841247e-324", 4.94065645841247e-324);
            //TestSuccess("-4.94065645841247e-324", -4.94065645841247e-324);
            TestSuccess("3.33E-333", 3.33E-333);
            TestSuccess("-3.33E-333", -3.33E-333);
            TestSuccess("1E-1022", 1E-1022);
            TestSuccess("-1E-1022", -1E-1022);
            // Boundary cases
            TestSuccess("1e0", 1);
            TestSuccess("1e1", 10);
            TestSuccess("1e-1", 0.1);
            TestSuccess("1e-308", 1e-308);
            TestSuccess("1e308", 1e308);
            // Min and Max Double
            TestSuccess("1.7976931348623157E+308", 1.7976931348623157E+308);
            TestSuccess("-1.7976931348623157E+308", -1.7976931348623157E+308);
            // Large Negative Exponents (Near-epsilon) doubles.
            TestSuccess("1.23E-999", 1.23E-999);
            TestSuccess("-1.23E-999", -1.23E-999);
            // Special keywords
            TestSuccess("∞", Double.PositiveInfinity);
            TestSuccess("-∞", Double.NegativeInfinity);
            TestSuccess("NaN", Double.NaN);
            // Special case: "Unlimited" is used in our organization to refer to Double.MaxValue
            TestSuccess("Unlimited", Double.MaxValue);
            // Special case: "-" character only means zero in accounting formats.
            TestSuccess("-", 0d);


            Benchmark Results



            Using a Stopwatch this time, and ran with 1,000,000,000 (a billion) strings just to quell any debate about timing sensitivity:




            Native parser took 26220 ms.



            Custom parser took 6471 ms.



            Performance gain was 305.19%








            share|improve this answer















            share|improve this answer



            share|improve this answer








            edited Aug 3 at 12:33


























            answered Aug 2 at 16:08









            Alain

            2831314




            2831314











            • @chux I applied your suggestion in this updated version and it took the performance up another ~15% - while still supporting all valid double cases. The results still differ from Native Double.Parse by 1 or 2 ULPs for certain very large or very small numbers, but I don't consider that a meaningful drawback. It just means results aren't always being parsed to precisely the closest possible double representation of the string processed.
              – Alain
              Aug 3 at 12:36










            • Code like do nextChar = input[currentIndex]; if (nextChar > '9' while (++currentIndex < length); is doing 2 checks (is digit, is at end) per loop to see if it should stop. If able, append a non-digit to index and then code only need to to check for digit. After to loop code can test if loop quit due to end of string or not. `
              – chux
              Aug 3 at 14:20











            • That's a good idea. If only strings in c# returned the null character as the last element in their character array like c++. Can you think of any way without building a new array?
              – Alain
              Aug 3 at 14:30










            • "think of any way without building a new array?" --> No, yet appending to the existing array may not cost much.
              – chux
              Aug 3 at 14:39










            • Other idea: consider profiling a table lookup isdigit[nextChar] instead of if (nextChar > '9' || nextChar < '0').
              – chux
              Aug 3 at 14:40
















            • @chux I applied your suggestion in this updated version and it took the performance up another ~15% - while still supporting all valid double cases. The results still differ from Native Double.Parse by 1 or 2 ULPs for certain very large or very small numbers, but I don't consider that a meaningful drawback. It just means results aren't always being parsed to precisely the closest possible double representation of the string processed.
              – Alain
              Aug 3 at 12:36










            • Code like do nextChar = input[currentIndex]; if (nextChar > '9' while (++currentIndex < length); is doing 2 checks (is digit, is at end) per loop to see if it should stop. If able, append a non-digit to index and then code only need to to check for digit. After to loop code can test if loop quit due to end of string or not. `
              – chux
              Aug 3 at 14:20











            • That's a good idea. If only strings in c# returned the null character as the last element in their character array like c++. Can you think of any way without building a new array?
              – Alain
              Aug 3 at 14:30










            • "think of any way without building a new array?" --> No, yet appending to the existing array may not cost much.
              – chux
              Aug 3 at 14:39










            • Other idea: consider profiling a table lookup isdigit[nextChar] instead of if (nextChar > '9' || nextChar < '0').
              – chux
              Aug 3 at 14:40















            @chux I applied your suggestion in this updated version and it took the performance up another ~15% - while still supporting all valid double cases. The results still differ from Native Double.Parse by 1 or 2 ULPs for certain very large or very small numbers, but I don't consider that a meaningful drawback. It just means results aren't always being parsed to precisely the closest possible double representation of the string processed.
            – Alain
            Aug 3 at 12:36




            @chux I applied your suggestion in this updated version and it took the performance up another ~15% - while still supporting all valid double cases. The results still differ from Native Double.Parse by 1 or 2 ULPs for certain very large or very small numbers, but I don't consider that a meaningful drawback. It just means results aren't always being parsed to precisely the closest possible double representation of the string processed.
            – Alain
            Aug 3 at 12:36












            Code like do nextChar = input[currentIndex]; if (nextChar > '9' while (++currentIndex < length); is doing 2 checks (is digit, is at end) per loop to see if it should stop. If able, append a non-digit to index and then code only need to to check for digit. After to loop code can test if loop quit due to end of string or not. `
            – chux
            Aug 3 at 14:20





            Code like do nextChar = input[currentIndex]; if (nextChar > '9' while (++currentIndex < length); is doing 2 checks (is digit, is at end) per loop to see if it should stop. If able, append a non-digit to index and then code only need to to check for digit. After to loop code can test if loop quit due to end of string or not. `
            – chux
            Aug 3 at 14:20













            That's a good idea. If only strings in c# returned the null character as the last element in their character array like c++. Can you think of any way without building a new array?
            – Alain
            Aug 3 at 14:30




            That's a good idea. If only strings in c# returned the null character as the last element in their character array like c++. Can you think of any way without building a new array?
            – Alain
            Aug 3 at 14:30












            "think of any way without building a new array?" --> No, yet appending to the existing array may not cost much.
            – chux
            Aug 3 at 14:39




            "think of any way without building a new array?" --> No, yet appending to the existing array may not cost much.
            – chux
            Aug 3 at 14:39












            Other idea: consider profiling a table lookup isdigit[nextChar] instead of if (nextChar > '9' || nextChar < '0').
            – chux
            Aug 3 at 14:40




            Other idea: consider profiling a table lookup isdigit[nextChar] instead of if (nextChar > '9' || nextChar < '0').
            – chux
            Aug 3 at 14:40










            up vote
            0
            down vote













            Here is an alternative algorithm based on a recommendation by @chux.



            We append a null character to the end of the string ('') although any non-digit character will do. Now we don't have to check for array boundaries. All of our while loops become super tight:



            /// <summary>A buffer used store a copy of input characters so that we can
            /// append a null character avoid array index boundary checks while looping.</summary>
            private static readonly char characters = new char[256];

            public static bool FastTryParseDouble(string input, out double result)



            Unfortunately, C# doesn't let you cheaply append an element to an array. We need to copy the data to a new (larger) array in order to add the 'terminating' character. Even if we keep a single, shared "buffer" for re-use to avoid allocating a new array on each call, the copy operation is prohibitively expensive. As such, the performance ends up worse:




            Native parser took 21422 ms.



            Custom parser took 8684 ms.



            Performance gain was 146.68%




            I'm sure this solution would be much more optimal in a language like c++, where the strings already contain a null character at the end. In such a case, there would be no need to copy the character array at all, and the algorithm below could be that much faster than one that has to constantly test array index bounds and test for whether the character is a digit character.






            share|improve this answer





















            • I just discovered that I can go fixed (char* cString = input) in an unsafe method to get a null-terminated character array pointer directly from a string. I'm going to try again using that, which will remove the need for any memory copying.
              – Alain
              Aug 3 at 21:02














            up vote
            0
            down vote













            Here is an alternative algorithm based on a recommendation by @chux.



            We append a null character to the end of the string ('') although any non-digit character will do. Now we don't have to check for array boundaries. All of our while loops become super tight:



            /// <summary>A buffer used store a copy of input characters so that we can
            /// append a null character avoid array index boundary checks while looping.</summary>
            private static readonly char characters = new char[256];

            public static bool FastTryParseDouble(string input, out double result)



            Unfortunately, C# doesn't let you cheaply append an element to an array. We need to copy the data to a new (larger) array in order to add the 'terminating' character. Even if we keep a single, shared "buffer" for re-use to avoid allocating a new array on each call, the copy operation is prohibitively expensive. As such, the performance ends up worse:




            Native parser took 21422 ms.



            Custom parser took 8684 ms.



            Performance gain was 146.68%




            I'm sure this solution would be much more optimal in a language like c++, where the strings already contain a null character at the end. In such a case, there would be no need to copy the character array at all, and the algorithm below could be that much faster than one that has to constantly test array index bounds and test for whether the character is a digit character.






            share|improve this answer





















            • I just discovered that I can go fixed (char* cString = input) in an unsafe method to get a null-terminated character array pointer directly from a string. I'm going to try again using that, which will remove the need for any memory copying.
              – Alain
              Aug 3 at 21:02












            up vote
            0
            down vote










            up vote
            0
            down vote









            Here is an alternative algorithm based on a recommendation by @chux.



            We append a null character to the end of the string ('') although any non-digit character will do. Now we don't have to check for array boundaries. All of our while loops become super tight:



            /// <summary>A buffer used store a copy of input characters so that we can
            /// append a null character avoid array index boundary checks while looping.</summary>
            private static readonly char characters = new char[256];

            public static bool FastTryParseDouble(string input, out double result)



            Unfortunately, C# doesn't let you cheaply append an element to an array. We need to copy the data to a new (larger) array in order to add the 'terminating' character. Even if we keep a single, shared "buffer" for re-use to avoid allocating a new array on each call, the copy operation is prohibitively expensive. As such, the performance ends up worse:




            Native parser took 21422 ms.



            Custom parser took 8684 ms.



            Performance gain was 146.68%




            I'm sure this solution would be much more optimal in a language like c++, where the strings already contain a null character at the end. In such a case, there would be no need to copy the character array at all, and the algorithm below could be that much faster than one that has to constantly test array index bounds and test for whether the character is a digit character.






            share|improve this answer













            Here is an alternative algorithm based on a recommendation by @chux.



            We append a null character to the end of the string ('') although any non-digit character will do. Now we don't have to check for array boundaries. All of our while loops become super tight:



            /// <summary>A buffer used store a copy of input characters so that we can
            /// append a null character avoid array index boundary checks while looping.</summary>
            private static readonly char characters = new char[256];

            public static bool FastTryParseDouble(string input, out double result)



            Unfortunately, C# doesn't let you cheaply append an element to an array. We need to copy the data to a new (larger) array in order to add the 'terminating' character. Even if we keep a single, shared "buffer" for re-use to avoid allocating a new array on each call, the copy operation is prohibitively expensive. As such, the performance ends up worse:




            Native parser took 21422 ms.



            Custom parser took 8684 ms.



            Performance gain was 146.68%




            I'm sure this solution would be much more optimal in a language like c++, where the strings already contain a null character at the end. In such a case, there would be no need to copy the character array at all, and the algorithm below could be that much faster than one that has to constantly test array index bounds and test for whether the character is a digit character.







            share|improve this answer













            share|improve this answer



            share|improve this answer











            answered Aug 3 at 18:25









            Alain

            2831314




            2831314











            • I just discovered that I can go fixed (char* cString = input) in an unsafe method to get a null-terminated character array pointer directly from a string. I'm going to try again using that, which will remove the need for any memory copying.
              – Alain
              Aug 3 at 21:02
















            • I just discovered that I can go fixed (char* cString = input) in an unsafe method to get a null-terminated character array pointer directly from a string. I'm going to try again using that, which will remove the need for any memory copying.
              – Alain
              Aug 3 at 21:02















            I just discovered that I can go fixed (char* cString = input) in an unsafe method to get a null-terminated character array pointer directly from a string. I'm going to try again using that, which will remove the need for any memory copying.
            – Alain
            Aug 3 at 21:02




            I just discovered that I can go fixed (char* cString = input) in an unsafe method to get a null-terminated character array pointer directly from a string. I'm going to try again using that, which will remove the need for any memory copying.
            – Alain
            Aug 3 at 21:02












             

            draft saved


            draft discarded


























             


            draft saved


            draft discarded














            StackExchange.ready(
            function ()
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f200627%2fcustom-double-parser-optimized-for-performance%23new-answer', 'question_page');

            );

            Post as a guest













































































            Popular posts from this blog

            Greedy Best First Search implementation in Rust

            Function to Return a JSON Like Objects Using VBA Collections and Arrays

            C++11 CLH Lock Implementation