trjhtr

Question

I'm trying to beat the native Double.TryParse for performance in parsing large multi-million row (simple) CSV files as much as possible. I do not have to support exponential notation, thousand separators, Inf, -Inf, NaN, or anything exotic. Just millions of "0.##" format doubles.

Here's my best attempt, which is ~350% faster by my tests (64 bit release mode)

My Implementation

This is the setup of the function (mostly for context).

private static readonly char CharNegative = CurrentCulture.NumberFormat.NegativeSign[0];
private static readonly char CharDecimalSeparator =
 CurrentCulture.NumberFormat.NumberDecimalSeparator[0];

/// <summary>High performance double parser with rudimentary flexibility.
/// <returns>Returns true only if we can be certain we parsed the string correctly.
/// <remarks>Does not support exponential notation, thousand separators or whitespace.
/// Does not support Infinity, Negative Infinity, NaN, or detect over/underflows.
/// Supports only leading negative signs, no positive signs or trailing signs.</remarks>
public static bool FastTryParseDouble(string input, out double result)

 result = 0d;
 int length = input.Length;
 if (length == 0) return false;
 double sign = 1d;
 int currentIndex = 0;
 char nextChar = input[0];

 // Handle a possible negative sign at the beginning of the string.
 if (nextChar == CharNegative)
 
 sign = -1d;
 ++currentIndex;

As you can see, I try to remain culture aware and support negative numbers.
This is the remainder of the method, which I think needs to be optimized for performance:

 unchecked
 
 while (true)
 
 // Return now if we have reached the end of the string
 if (currentIndex >= length)
 
 result *= sign;
 return true;
 
 nextChar = input[currentIndex++];
 // Break if the result wasn't a digit between 0 and 9
 if (nextChar < '0' 
 // The next character should be a decimal character, or else it's invalid.
 if (nextChar != CharDecimalSeparator) return false;
 double fractionalPart = 0d;
 int fractionLengh = length - currentIndex;
 while (currentIndex < length)
 
 // Add the fractional part to the result, apply sign, and return
 if (fractionLengh < NegPow10.Length)
 result = (result + fractionalPart * NegPow10[fractionLengh]) * sign;
 else
 result = (result + fractionalPart * Math.Pow(10, -fractionLengh)) * sign;
 
 return true;

NegPow10 at the end there is just the following array, which has a quick lookup value to cover the first 20 or so values of 10^-n for quick reference. Anything bigger just falls back to Math.Pow()

/// <summary>A cache of negative powers of 10 for quick magnitude adjustment of parsed
/// decimals up to the maximum number of possible decimal places that might be consumed
/// from a string representation of a double.</summary>
private static readonly double NegPow10 = new double

 1d,
 0.1,
 0.01,
 ///... you get the idea
 0.0000000000000001
;

Test Cases

The following test cases are all passing:

TestSuccess("0", 0d);
TestSuccess("1", 1d);
TestSuccess("-1", -1d);
TestSuccess("123.45", 123.45);
TestSuccess("-123.45", -123.45);
TestSuccess("12345678901234", 12345678901234d);
TestSuccess("-12345678901234", -12345678901234d);
TestSuccess("0.12", 0.12);
TestSuccess("-0.12", -0.12);
TestSuccess("0.00", 0.00);
TestSuccess("-0.00", -0.00);
TestSuccess("1234567890123.01", 1234567890123.01);
TestSuccess("-1234567890123.01", -1234567890123.01);
TestSuccess("123456789000000000000000", 123456789000000000000000d);
TestSuccess("-123456789000000000000000", -123456789000000000000000d);

I also have the unsupported (failure) cases laid out if anyone's interested, but it's basically the limitations mentioned in the remarks above.

Benchmarks

I benchmarked my implementation against native Double.TryParse to guage the performance difference.

I tested parsing an array of 10 million different strings using:

Double.TryParse(value, NumberStyles.Float, cachedCulture, out _)

Note that I cache the culture instance and pass in explicit NumberStyles to get the native method as fast as possible before comparing it to my own. My method was of course running 10 million strings through:

Parsers.FastTryParseDouble(value, out _)

Results

Native Double.TryParse took ~4500 ms.

Custom Parsers.FastTryParseDouble took ~950 ms.

Performance gain was ~370%

Next Steps

See any other ways I can squeeze out more performance?

Any awful flaws that might cause incorrect results to be returned? Note that I'm always happy to return "false" for unsupported cases if that's what's fastest, but I'm not okay to return true and a bad result.

You missed 3 cases, Ã¢ÂˆÂž, -Ã¢ÂˆÂž and NaN. These should parse out as Double.PositiveInfinity, Double.NegativeInfinity and Double.NaN. Actually there is one more, your tests should be able to parse the positive symbol, ie: +1234.5678 — Jul 31 at 1:34
You should use a Stopwatch instead of just two time differences when doing benchmarks. — Jul 31 at 1:55
@RonBeyer I just discovered that Stopwatch uses DateTime.UTCNow behind the scenes unless you use it in high resolution mode.if (!Stopwatch.IsHighResolution) return DateTime.UtcNow.Ticks; Looks like it's only necessary if you're timing things at sub-20ms resolution. We're at seconds resolution here, so it should be okay. — Jul 31 at 2:47
Added comments and test cases to better indicate what formats are and aren't meant to be supported. — Jul 31 at 4:22
I found a bug in my 1st while loop that caused an invalid chacater to erronously return a double from all characters up to it. The fix was to use while(true) and have a separate check for if (currentIndex >= length). It brought performance gain down by almost 80%, so I'm looking for ways to merge the two again while still handling the failure scenario correctly. — Aug 1 at 0:08

score 2 · Answer 1 · 2018-08-07 00:49:54Z

Alternatives for OP to try

As OP is looking for performance improvements, consider only 1 loop for both whole and fractional part calculation. Simply iterate through all the digits in one loop and note if and where the decimal point occurred.

// Pseudo code
DP = '.'
significant = 0.0
fractionLengh = 0
for (i=0; i < input.len; i++) 
 ch = input[i]
 if (some_isdigit_test(ch)) 
 significant = significant * 10 + ch - '0'
 else if (ch == DP) 
 DP = '0' // Never match again
 fractionLengh = input.len - i - 1
 else 
 return fail;
 
}

// continue as before
if (fractionLengh < NegPow10.Length) ....

Perhaps integers?

Instead of accumulating result as some floating point type, accumulate the digits as a 64-bit integer. This, depending on platform, is often significantly faster than double.

Code could simply count leading zeros (important if there is a '.' there) and then loop onto the minimum of of the remaining text length or 18 (number of 999... digts in a 64-bit integer) and then do a final integer to double for subsequent calculation.

Not alwasy the best

There are concerns with OP's code about generating the best answer.

Challenging (lengthly) text input eventual causes significant * 10 to round its answer and even perhaps overflow (even with an in range possible result).

With OP's fractionalPart being rounded and NegPow10[fractionLengh]) also rounded, the product and than addition to result may be off by 1 or 2 ULP.

To get the best result, additional (slower) code is needed.

-0.0

It appears OP's code will generate the correct result. I suspect unposted test code is insufficient to fully test this case. Perhaps OP is not concerned about this case as "anything exotic".

Range

I'd expect test cases should include the maximum +/-Double as text and the minimum non-zero value +/-0.01

Positive numbers?

Code test for a leading '-'. How about a leading '+'? Research CurrentCulture.NumberFormat.PositiveSign.

On Integers: I actually have a second (much simpler) FastTryParseInt - it's ~2x slower (on my 64 bit machine) than than an identical routine using doubles. The answer I found on the webs was that CPUs are optimized to do integer addition quickly, but integer multiplication is slow. Doubles are the opposite - addition takes more ticks than multiplication. The multiplication by 10 in the inner loop appears to really slow down integers. Haven't tested with longs, which should in theory be the same speed on a machine with 64 bit registers, but less risk of overflowing for large doubles. — Aug 2 at 17:06
I like the idea of not tracking a separate "fraction part" double, and just continuing to add to result. I'm sure I can work with that. I still think it might be required to maintain two while loops (even though the code is not as nice) so that the branch predictor doesn't have to re-acclimate after the separator is encountered. — Aug 2 at 17:14
I applied your pseudo code to my new version below and it did improve performance, thanks! I think I'll make a separate version without support for exponential notation again and see if the single loop approach makes it any faster. I also changed the in-progress result and constants to long (until the final result) to see how that went, but it did cut performance almost by double. Quite strange. — Aug 3 at 13:52

Alain 2831314 · Answer 2 · 2018-08-01 02:25:30Z

Here's a slightly improved implementation that saves a few instructions by combining some redundant checks and assignments.

A little more work when checking for negatives:

// Handle a possible negative sign at the beginning of the string.
// We will apply the sign before returning the final result.
if (nextChar == CharNegative)

 if (length <= 1) return false;
 sign = -1d;
 ++currentIndex;
 nextChar = input[1];

And the first main loop has its while / break conditions altered to remove the need for redundant checks after breaking.

unchecked

 // Break on the first non-numeric character
 while (nextChar >= '0' && nextChar <= '9')
 
 // Multiply by 10 and add the next digit.
 result = result * 10 + (nextChar - '0');
 // Return if we have reached the end of the string
 if (++currentIndex >= length)
 
 result *= sign;
 return true;
 
 nextChar = input[currentIndex];
 
 // The next character should be the decimal separator, or else it's invalid.
 if (nextChar != CharDecimalSeparator) return false;
 // Perform a similar loop to construct the fractional part of the decimal
 double fractionalPart = 0d;
 int fractionLengh = length - ++currentIndex;
 while (currentIndex < length)
 
 nextChar = input[currentIndex++];
 // If we encounter a non-numeric character now, it's an error
 if (nextChar < '0' 
 // Adjust the magnitude and add the fractional part to the result
 // Use our power of negative powers of 10 if possible.
 if (fractionLengh < NegPow10.Length)
 result = (result + fractionalPart * NegPow10[fractionLengh]) * sign;
 else
 result = (result + fractionalPart * Math.Pow(10, -fractionLengh)) * sign;

return true;

The difference in performance is imperceptible during release mode. I'm guessing comile optimizations and/or instruction pipelining made the updates irrelevant.

I toyed with different ways of doing the final computation (wrapping the power multiplication in a ternary operator, splitting them out into multiple lines, etc) but this one version seems to compile down the best.

score 0 · Answer 3 · 2018-08-03 12:33:24Z

Here is a much longer implementation that handles scientific notation, NaN, Infinity, Negative Infinity, and leading positive signs. I also added a lot of comments to help visually break it into chunks.

It manages to be almost as fast as the previous method - most of the logic takes place in the body of previous if checks that simply returned false before.

I found a few places where I could avoid repeated checks for non-digit characters, and use the first digit character to initialize the result directly to avoid unnecessary additions/multiplcations with zero on the first iteration of the loop.

/// <summary>High performance double parser with rudimentary flexibility.</summary>
/// <returns>Returns true only if we can be certain we parsed the string correctly.</returns>
/// <remarks>Does not support thousand separators or whitespace.</remarks>
/// <remarks>Supports all culture-specific symbols specified by the NumberFormatInfo of the
/// <see cref="CultureInfo.CurrentCulture"/> at the time this static class is instantiated.
/// So long as all culture symbols are a single character in length.
/// TODO: In theory, this class could be made instantiable, take the culture as an argument,
/// and support changing the culture at runtime in case the file the user is uploading
/// was generated on a machine with different culture settings.</remarks>
/// <remarks>Supports leading negative signs and positive signs, scientific notation,
/// as well as Infinity, Negative Infinity, and NaN, string representations.</remarks>
/// <remarks>A string containing only a negative sign (usually "-") intentionally returns a
/// value of zero. This is because it's a common representation of 0 in accounting.</remarks>
public static bool FastTryParseDouble(string input, out double result)


/// <summary>Checks if the string matches one of a few supported special case
/// double strings. If so, assigns the result and returns true.</summary>
public static bool CheckForSpecialCaseDoubleStrings(string input, out double result)

 if (input == NumberFormat.PositiveInfinitySymbol)
 result = Double.PositiveInfinity;
 else if (input == NumberFormat.NegativeInfinitySymbol)
 result = Double.NegativeInfinity;
 else if (input == NumberFormat.NaNSymbol)
 result = Double.NaN;
 // Special Case: Excel has been known to format zero as "-".
 // We intentionally support it by returning zero now (most parsers would not)
 else if (input == NumberFormat.NegativeSign)
 result = 0d;
 // Special Case: Our organization treats the term "Unlimited" as referring
 // to Double.MaxValue (most parsers would not)
 else if (input.Equals("unlimited", StringComparison.OrdinalIgnoreCase))
 result = Double.MaxValue;
 // Anything else is not a valid input
 else
 
 result = Double.NaN;
 return false;
 
 return true;


/// <summary>The largest exponent (or smallest when negative) that can be given to a Double.</summary>
private const int MaxDoubleExponent = 308;

/// <summary>The number of elements that will be generated in the Pow10 array.</summary>
private const int Pow10Length = MaxDoubleExponent * 2 + 1;

/// <summary>A cache of all possible positive powers of 10 that might be required to
/// apply an exponent to a double (Indices 0-308), as well as the first 308 negative
/// exponents. (Indices 309-616)</summary>
private static readonly double Pow10 =
 Enumerable.Range(0, MaxDoubleExponent + 1).Select(i => Math.Pow(10, i))
 .Concat(Enumerable.Range(1, MaxDoubleExponent).Select(i => Math.Pow(10, -i)))
 .ToArray();

/// <summary>The number of negative powers to pre-compute and store in a small array.</summary>
private const int NegPow10Length = 16;

/// <summary>A cache of the first 15 negative powers of 10 for quick
/// magnitude adjustment of common parsed fractional parts of doubles.</summary>
/// <remarks>Even though this overlaps with the Pow10 array, it is kept separate so that
/// users that don't use scientific notation or extremely long fractional parts
/// might get a speedup by being able to reference the smaller array, which has a better
/// chance of being served out of L1/L2 cache.</remarks>
private static readonly double NegPow10 =
 Enumerable.Range(0, NegPow10Length).Select(i => Math.Pow(10, -i)).ToArray();

This new method matches all of the following test cases:

// Numbers without a fractional part
TestSuccess("0", 0d);
TestSuccess("1", 1d);
TestSuccess("-1", -1d);
TestSuccess("12345678901234", 12345678901234d);
TestSuccess("-12345678901234", -12345678901234d);
// Numbers with a fractional part
TestSuccess("123.45678", 123.45678);
TestSuccess("-123.45678", -123.45678);
// Numbers without an integer part
TestSuccess(".12345678901234", 0.12345678901234);
TestSuccess("-.12345678901234", -0.12345678901234);
// Various high-precision numbers
TestSuccess("0.12345678901234", 0.12345678901234);
TestSuccess("-0.12345678901234", -0.12345678901234);
TestSuccess("0.00000987654321", 0.00000987654321);
TestSuccess("-0.00000987654321", -0.00000987654321);
TestSuccess("1234567890123.0123456789", 1234567890123.0123456789);
TestSuccess("-1234567890123.0123456789", -1234567890123.0123456789);
// Numbers with very long fractional parts (more than 16 characters)
TestSuccess("0.00826499999979784", 0.00826499999979784);
TestSuccess("-0.00826499999979784", -0.00826499999979784);
TestSuccess("1.0123456789012345678901234567890", 1.0123456789012345678901234567890);
TestSuccess("-1.0123456789012345678901234567890", -1.0123456789012345678901234567890);
// Numbers with a leading positive sign
TestSuccess("+1", 1d);
TestSuccess("+12345678901234", 12345678901234d);
TestSuccess("+.12345678901234", 0.12345678901234);
TestSuccess("+0.00826499999979784", 0.00826499999979784);
// Very large numbers without scientific notation
TestSuccess("123456789000000000000000", 123456789000000000000000d);
TestSuccess("-123456789000000000000000", -123456789000000000000000d);
// Very small numbers without scientific notation
TestSuccess("0.00000000000000000123456789", 0.00000000000000000123456789);
TestSuccess("-0.00000000000000000123456789", -0.00000000000000000123456789);
// Scientific notation without a sign
TestSuccess("1.2345678e5", 1.2345678e5);
TestSuccess("1.2345678e5", 1.2345678e5);
TestSuccess("-1.2345678e5", -1.2345678e5);
// Scientific notation with a sign
TestSuccess("1.2345678e+25", 1.2345678e25);
TestSuccess("-1.2345678e+25", -1.2345678e25);
TestSuccess("1.2345678e-255", 1.2345678e-255);
TestSuccess("-1.2345678e-255", -1.2345678e-255);
// Epsilon, and other tiny doubles
// TODO: Known "failure" scenarios. Our parsing logic results in a return value of 0
// for these, but the native parser returns Double.Epsilon (smallest number greater
// than zero). I think we can live with this shortcoming.
//TestSuccess("4.94065645841247e-324", 4.94065645841247e-324);
//TestSuccess("-4.94065645841247e-324", -4.94065645841247e-324);
TestSuccess("3.33E-333", 3.33E-333);
TestSuccess("-3.33E-333", -3.33E-333);
TestSuccess("1E-1022", 1E-1022);
TestSuccess("-1E-1022", -1E-1022);
// Boundary cases
TestSuccess("1e0", 1);
TestSuccess("1e1", 10);
TestSuccess("1e-1", 0.1);
TestSuccess("1e-308", 1e-308);
TestSuccess("1e308", 1e308);
// Min and Max Double
TestSuccess("1.7976931348623157E+308", 1.7976931348623157E+308);
TestSuccess("-1.7976931348623157E+308", -1.7976931348623157E+308);
// Large Negative Exponents (Near-epsilon) doubles.
TestSuccess("1.23E-999", 1.23E-999);
TestSuccess("-1.23E-999", -1.23E-999);
// Special keywords
TestSuccess("Ã¢ÂˆÂž", Double.PositiveInfinity);
TestSuccess("-Ã¢ÂˆÂž", Double.NegativeInfinity);
TestSuccess("NaN", Double.NaN);
// Special case: "Unlimited" is used in our organization to refer to Double.MaxValue
TestSuccess("Unlimited", Double.MaxValue);
// Special case: "-" character only means zero in accounting formats.
TestSuccess("-", 0d);

Benchmark Results

Using a Stopwatch this time, and ran with 1,000,000,000 (a billion) strings just to quell any debate about timing sensitivity:

Native parser took 26220 ms.

Custom parser took 6471 ms.

Performance gain was 305.19%

@chux I applied your suggestion in this updated version and it took the performance up another ~15% - while still supporting all valid double cases. The results still differ from Native Double.Parse by 1 or 2 ULPs for certain very large or very small numbers, but I don't consider that a meaningful drawback. It just means results aren't always being parsed to precisely the closest possible double representation of the string processed. — Aug 3 at 12:36
Code like do nextChar = input[currentIndex]; if (nextChar > '9' while (++currentIndex < length); is doing 2 checks (is digit, is at end) per loop to see if it should stop. If able, append a non-digit to index and then code only need to to check for digit. After to loop code can test if loop quit due to end of string or not. ` — Aug 3 at 14:20
That's a good idea. If only strings in c# returned the null character as the last element in their character array like c++. Can you think of any way without building a new array? — Aug 3 at 14:30
"think of any way without building a new array?" --> No, yet appending to the existing array may not cost much. — Aug 3 at 14:39
Other idea: consider profiling a table lookup isdigit[nextChar] instead of if (nextChar > '9' || nextChar < '0'). — Aug 3 at 14:40

Alain 2831314 · Answer 4 · 2018-08-03 18:25:31Z

Here is an alternative algorithm based on a recommendation by @chux.

We append a null character to the end of the string ('') although any non-digit character will do. Now we don't have to check for array boundaries. All of our while loops become super tight:

/// <summary>A buffer used store a copy of input characters so that we can
/// append a null character avoid array index boundary checks while looping.</summary>
private static readonly char characters = new char[256];

public static bool FastTryParseDouble(string input, out double result)

Unfortunately, C# doesn't let you cheaply append an element to an array. We need to copy the data to a new (larger) array in order to add the 'terminating' character. Even if we keep a single, shared "buffer" for re-use to avoid allocating a new array on each call, the copy operation is prohibitively expensive. As such, the performance ends up worse:

Native parser took 21422 ms.

Custom parser took 8684 ms.

Performance gain was 146.68%

I'm sure this solution would be much more optimal in a language like c++, where the strings already contain a null character at the end. In such a case, there would be no need to copy the character array at all, and the algorithm below could be that much faster than one that has to constantly test array index bounds and test for whether the character is a digit character.

I just discovered that I can go fixed (char* cString = input) in an unsafe method to get a null-terminated character array pointer directly from a string. I'm going to try again using that, which will remove the need for any memory copying. — Aug 3 at 21:02

score 2 · Answer 5 · 2018-08-07 00:49:54Z

Alternatives for OP to try

As OP is looking for performance improvements, consider only 1 loop for both whole and fractional part calculation. Simply iterate through all the digits in one loop and note if and where the decimal point occurred.

// Pseudo code
DP = '.'
significant = 0.0
fractionLengh = 0
for (i=0; i < input.len; i++) 
 ch = input[i]
 if (some_isdigit_test(ch)) 
 significant = significant * 10 + ch - '0'
 else if (ch == DP) 
 DP = '0' // Never match again
 fractionLengh = input.len - i - 1
 else 
 return fail;
 
}

// continue as before
if (fractionLengh < NegPow10.Length) ....

Perhaps integers?

Instead of accumulating result as some floating point type, accumulate the digits as a 64-bit integer. This, depending on platform, is often significantly faster than double.

Code could simply count leading zeros (important if there is a '.' there) and then loop onto the minimum of of the remaining text length or 18 (number of 999... digts in a 64-bit integer) and then do a final integer to double for subsequent calculation.

Not alwasy the best

There are concerns with OP's code about generating the best answer.

Challenging (lengthly) text input eventual causes significant * 10 to round its answer and even perhaps overflow (even with an in range possible result).

With OP's fractionalPart being rounded and NegPow10[fractionLengh]) also rounded, the product and than addition to result may be off by 1 or 2 ULP.

To get the best result, additional (slower) code is needed.

-0.0

It appears OP's code will generate the correct result. I suspect unposted test code is insufficient to fully test this case. Perhaps OP is not concerned about this case as "anything exotic".

Range

I'd expect test cases should include the maximum +/-Double as text and the minimum non-zero value +/-0.01

Positive numbers?

Code test for a leading '-'. How about a leading '+'? Research CurrentCulture.NumberFormat.PositiveSign.

On Integers: I actually have a second (much simpler) FastTryParseInt - it's ~2x slower (on my 64 bit machine) than than an identical routine using doubles. The answer I found on the webs was that CPUs are optimized to do integer addition quickly, but integer multiplication is slow. Doubles are the opposite - addition takes more ticks than multiplication. The multiplication by 10 in the inner loop appears to really slow down integers. Haven't tested with longs, which should in theory be the same speed on a machine with 64 bit registers, but less risk of overflowing for large doubles. — Aug 2 at 17:06
I like the idea of not tracking a separate "fraction part" double, and just continuing to add to result. I'm sure I can work with that. I still think it might be required to maintain two while loops (even though the code is not as nice) so that the branch predictor doesn't have to re-acclimate after the separator is encountered. — Aug 2 at 17:14
I applied your pseudo code to my new version below and it did improve performance, thanks! I think I'll make a separate version without support for exponential notation again and see if the single loop approach makes it any faster. I also changed the in-progress result and constants to long (until the final result) to see how that went, but it did cut performance almost by double. Quite strange. — Aug 3 at 13:52

Alain 2831314 · Answer 6 · 2018-08-01 02:25:30Z

Here's a slightly improved implementation that saves a few instructions by combining some redundant checks and assignments.

A little more work when checking for negatives:

// Handle a possible negative sign at the beginning of the string.
// We will apply the sign before returning the final result.
if (nextChar == CharNegative)

 if (length <= 1) return false;
 sign = -1d;
 ++currentIndex;
 nextChar = input[1];

And the first main loop has its while / break conditions altered to remove the need for redundant checks after breaking.

unchecked

 // Break on the first non-numeric character
 while (nextChar >= '0' && nextChar <= '9')
 
 // Multiply by 10 and add the next digit.
 result = result * 10 + (nextChar - '0');
 // Return if we have reached the end of the string
 if (++currentIndex >= length)
 
 result *= sign;
 return true;
 
 nextChar = input[currentIndex];
 
 // The next character should be the decimal separator, or else it's invalid.
 if (nextChar != CharDecimalSeparator) return false;
 // Perform a similar loop to construct the fractional part of the decimal
 double fractionalPart = 0d;
 int fractionLengh = length - ++currentIndex;
 while (currentIndex < length)
 
 nextChar = input[currentIndex++];
 // If we encounter a non-numeric character now, it's an error
 if (nextChar < '0' 
 // Adjust the magnitude and add the fractional part to the result
 // Use our power of negative powers of 10 if possible.
 if (fractionLengh < NegPow10.Length)
 result = (result + fractionalPart * NegPow10[fractionLengh]) * sign;
 else
 result = (result + fractionalPart * Math.Pow(10, -fractionLengh)) * sign;

return true;

The difference in performance is imperceptible during release mode. I'm guessing comile optimizations and/or instruction pipelining made the updates irrelevant.

I toyed with different ways of doing the final computation (wrapping the power multiplication in a ternary operator, splitting them out into multiple lines, etc) but this one version seems to compile down the best.

score 0 · Answer 7 · 2018-08-03 12:33:24Z

Here is a much longer implementation that handles scientific notation, NaN, Infinity, Negative Infinity, and leading positive signs. I also added a lot of comments to help visually break it into chunks.

It manages to be almost as fast as the previous method - most of the logic takes place in the body of previous if checks that simply returned false before.

I found a few places where I could avoid repeated checks for non-digit characters, and use the first digit character to initialize the result directly to avoid unnecessary additions/multiplcations with zero on the first iteration of the loop.

/// <summary>High performance double parser with rudimentary flexibility.</summary>
/// <returns>Returns true only if we can be certain we parsed the string correctly.</returns>
/// <remarks>Does not support thousand separators or whitespace.</remarks>
/// <remarks>Supports all culture-specific symbols specified by the NumberFormatInfo of the
/// <see cref="CultureInfo.CurrentCulture"/> at the time this static class is instantiated.
/// So long as all culture symbols are a single character in length.
/// TODO: In theory, this class could be made instantiable, take the culture as an argument,
/// and support changing the culture at runtime in case the file the user is uploading
/// was generated on a machine with different culture settings.</remarks>
/// <remarks>Supports leading negative signs and positive signs, scientific notation,
/// as well as Infinity, Negative Infinity, and NaN, string representations.</remarks>
/// <remarks>A string containing only a negative sign (usually "-") intentionally returns a
/// value of zero. This is because it's a common representation of 0 in accounting.</remarks>
public static bool FastTryParseDouble(string input, out double result)


/// <summary>Checks if the string matches one of a few supported special case
/// double strings. If so, assigns the result and returns true.</summary>
public static bool CheckForSpecialCaseDoubleStrings(string input, out double result)

 if (input == NumberFormat.PositiveInfinitySymbol)
 result = Double.PositiveInfinity;
 else if (input == NumberFormat.NegativeInfinitySymbol)
 result = Double.NegativeInfinity;
 else if (input == NumberFormat.NaNSymbol)
 result = Double.NaN;
 // Special Case: Excel has been known to format zero as "-".
 // We intentionally support it by returning zero now (most parsers would not)
 else if (input == NumberFormat.NegativeSign)
 result = 0d;
 // Special Case: Our organization treats the term "Unlimited" as referring
 // to Double.MaxValue (most parsers would not)
 else if (input.Equals("unlimited", StringComparison.OrdinalIgnoreCase))
 result = Double.MaxValue;
 // Anything else is not a valid input
 else
 
 result = Double.NaN;
 return false;
 
 return true;


/// <summary>The largest exponent (or smallest when negative) that can be given to a Double.</summary>
private const int MaxDoubleExponent = 308;

/// <summary>The number of elements that will be generated in the Pow10 array.</summary>
private const int Pow10Length = MaxDoubleExponent * 2 + 1;

/// <summary>A cache of all possible positive powers of 10 that might be required to
/// apply an exponent to a double (Indices 0-308), as well as the first 308 negative
/// exponents. (Indices 309-616)</summary>
private static readonly double Pow10 =
 Enumerable.Range(0, MaxDoubleExponent + 1).Select(i => Math.Pow(10, i))
 .Concat(Enumerable.Range(1, MaxDoubleExponent).Select(i => Math.Pow(10, -i)))
 .ToArray();

/// <summary>The number of negative powers to pre-compute and store in a small array.</summary>
private const int NegPow10Length = 16;

/// <summary>A cache of the first 15 negative powers of 10 for quick
/// magnitude adjustment of common parsed fractional parts of doubles.</summary>
/// <remarks>Even though this overlaps with the Pow10 array, it is kept separate so that
/// users that don't use scientific notation or extremely long fractional parts
/// might get a speedup by being able to reference the smaller array, which has a better
/// chance of being served out of L1/L2 cache.</remarks>
private static readonly double NegPow10 =
 Enumerable.Range(0, NegPow10Length).Select(i => Math.Pow(10, -i)).ToArray();

This new method matches all of the following test cases:

// Numbers without a fractional part
TestSuccess("0", 0d);
TestSuccess("1", 1d);
TestSuccess("-1", -1d);
TestSuccess("12345678901234", 12345678901234d);
TestSuccess("-12345678901234", -12345678901234d);
// Numbers with a fractional part
TestSuccess("123.45678", 123.45678);
TestSuccess("-123.45678", -123.45678);
// Numbers without an integer part
TestSuccess(".12345678901234", 0.12345678901234);
TestSuccess("-.12345678901234", -0.12345678901234);
// Various high-precision numbers
TestSuccess("0.12345678901234", 0.12345678901234);
TestSuccess("-0.12345678901234", -0.12345678901234);
TestSuccess("0.00000987654321", 0.00000987654321);
TestSuccess("-0.00000987654321", -0.00000987654321);
TestSuccess("1234567890123.0123456789", 1234567890123.0123456789);
TestSuccess("-1234567890123.0123456789", -1234567890123.0123456789);
// Numbers with very long fractional parts (more than 16 characters)
TestSuccess("0.00826499999979784", 0.00826499999979784);
TestSuccess("-0.00826499999979784", -0.00826499999979784);
TestSuccess("1.0123456789012345678901234567890", 1.0123456789012345678901234567890);
TestSuccess("-1.0123456789012345678901234567890", -1.0123456789012345678901234567890);
// Numbers with a leading positive sign
TestSuccess("+1", 1d);
TestSuccess("+12345678901234", 12345678901234d);
TestSuccess("+.12345678901234", 0.12345678901234);
TestSuccess("+0.00826499999979784", 0.00826499999979784);
// Very large numbers without scientific notation
TestSuccess("123456789000000000000000", 123456789000000000000000d);
TestSuccess("-123456789000000000000000", -123456789000000000000000d);
// Very small numbers without scientific notation
TestSuccess("0.00000000000000000123456789", 0.00000000000000000123456789);
TestSuccess("-0.00000000000000000123456789", -0.00000000000000000123456789);
// Scientific notation without a sign
TestSuccess("1.2345678e5", 1.2345678e5);
TestSuccess("1.2345678e5", 1.2345678e5);
TestSuccess("-1.2345678e5", -1.2345678e5);
// Scientific notation with a sign
TestSuccess("1.2345678e+25", 1.2345678e25);
TestSuccess("-1.2345678e+25", -1.2345678e25);
TestSuccess("1.2345678e-255", 1.2345678e-255);
TestSuccess("-1.2345678e-255", -1.2345678e-255);
// Epsilon, and other tiny doubles
// TODO: Known "failure" scenarios. Our parsing logic results in a return value of 0
// for these, but the native parser returns Double.Epsilon (smallest number greater
// than zero). I think we can live with this shortcoming.
//TestSuccess("4.94065645841247e-324", 4.94065645841247e-324);
//TestSuccess("-4.94065645841247e-324", -4.94065645841247e-324);
TestSuccess("3.33E-333", 3.33E-333);
TestSuccess("-3.33E-333", -3.33E-333);
TestSuccess("1E-1022", 1E-1022);
TestSuccess("-1E-1022", -1E-1022);
// Boundary cases
TestSuccess("1e0", 1);
TestSuccess("1e1", 10);
TestSuccess("1e-1", 0.1);
TestSuccess("1e-308", 1e-308);
TestSuccess("1e308", 1e308);
// Min and Max Double
TestSuccess("1.7976931348623157E+308", 1.7976931348623157E+308);
TestSuccess("-1.7976931348623157E+308", -1.7976931348623157E+308);
// Large Negative Exponents (Near-epsilon) doubles.
TestSuccess("1.23E-999", 1.23E-999);
TestSuccess("-1.23E-999", -1.23E-999);
// Special keywords
TestSuccess("Ã¢ÂˆÂž", Double.PositiveInfinity);
TestSuccess("-Ã¢ÂˆÂž", Double.NegativeInfinity);
TestSuccess("NaN", Double.NaN);
// Special case: "Unlimited" is used in our organization to refer to Double.MaxValue
TestSuccess("Unlimited", Double.MaxValue);
// Special case: "-" character only means zero in accounting formats.
TestSuccess("-", 0d);

Benchmark Results

Using a Stopwatch this time, and ran with 1,000,000,000 (a billion) strings just to quell any debate about timing sensitivity:

Native parser took 26220 ms.

Custom parser took 6471 ms.

Performance gain was 305.19%

@chux I applied your suggestion in this updated version and it took the performance up another ~15% - while still supporting all valid double cases. The results still differ from Native Double.Parse by 1 or 2 ULPs for certain very large or very small numbers, but I don't consider that a meaningful drawback. It just means results aren't always being parsed to precisely the closest possible double representation of the string processed. — Aug 3 at 12:36
Code like do nextChar = input[currentIndex]; if (nextChar > '9' while (++currentIndex < length); is doing 2 checks (is digit, is at end) per loop to see if it should stop. If able, append a non-digit to index and then code only need to to check for digit. After to loop code can test if loop quit due to end of string or not. ` — Aug 3 at 14:20
That's a good idea. If only strings in c# returned the null character as the last element in their character array like c++. Can you think of any way without building a new array? — Aug 3 at 14:30
"think of any way without building a new array?" --> No, yet appending to the existing array may not cost much. — Aug 3 at 14:39
Other idea: consider profiling a table lookup isdigit[nextChar] instead of if (nextChar > '9' || nextChar < '0'). — Aug 3 at 14:40

Alain 2831314 · Answer 8 · 2018-08-03 18:25:31Z

Here is an alternative algorithm based on a recommendation by @chux.

We append a null character to the end of the string ('') although any non-digit character will do. Now we don't have to check for array boundaries. All of our while loops become super tight:

/// <summary>A buffer used store a copy of input characters so that we can
/// append a null character avoid array index boundary checks while looping.</summary>
private static readonly char characters = new char[256];

public static bool FastTryParseDouble(string input, out double result)

Unfortunately, C# doesn't let you cheaply append an element to an array. We need to copy the data to a new (larger) array in order to add the 'terminating' character. Even if we keep a single, shared "buffer" for re-use to avoid allocating a new array on each call, the copy operation is prohibitively expensive. As such, the performance ends up worse:

Native parser took 21422 ms.

Custom parser took 8684 ms.

Performance gain was 146.68%

I'm sure this solution would be much more optimal in a language like c++, where the strings already contain a null character at the end. In such a case, there would be no need to copy the character array at all, and the algorithm below could be that much faster than one that has to constantly test array index bounds and test for whether the character is a digit character.

I just discovered that I can go fixed (char* cString = input) in an unsafe method to get a null-terminated character array pointer directly from a string. I'm going to try again using that, which will remove the need for any memory copying. — Aug 3 at 21:02

搜尋此網誌

trjhtr

Custom double parser optimized for performance

4 Answers
4

Your Answer

Post as a guest

4 Answers
4

4 Answers
4

Post as a guest

Popular posts from this blog

Chat program with C++ and SFML

Read files from a directory using Promises

Read an image with ADNS2610 optical sensor and Arduino Uno

Custom double parser optimized for performance

4 Answers 4

Your Answer

Sign up or log in

Post as a guest

Post as a guest

4 Answers 4

4 Answers 4

Sign up or log in

Post as a guest

Post as a guest

Sign up or log in

Post as a guest

Sign up or log in

Post as a guest

Sign up or log in

Post as a guest

Popular posts from this blog

Chat program with C++ and SFML

Read files from a directory using Promises

Read an image with ADNS2610 optical sensor and Arduino Uno

4 Answers
4

4 Answers
4

4 Answers
4