Compute mean, variance and standard deviation of CSV number file

I've written my first C++ application this week, and I wanted to see if there's anything that isn't up to standard or if there are better ways to do what I'm currently doing.

For the goal of the exercise, let's assume I can't use any mathematical function other than the square root.

The application reads a file where numbers are on separate lines and computes the mean, variance and standard deviation of all the numbers. Afterwards I print the whole numbers list and the stats that were computed.

#include <iostream>
#include <fstream>
#include <vector>
#include <sstream>
#import <cmath>

using namespace std;

std::vector<int> readFile(const std::string &filePath) {
    ifstream in_file;
    in_file.open(filePath);
    std::vector<int> numbers;

    std::string line;

    while(std::getline(in_file,line,'\r')){
        numbers.push_back(std::stoi(line));
    }

    return numbers;
}

float computeMean(std::vector<int> numbers)
{
    if(numbers.empty()) return 0;

    float total = 0;
    for (int number : numbers) {
        total += number;
    }

    return (total / numbers.size());
}

float computeVariance(float mean, std::vector<int> numbers)
{
    float result = 0;
    for(int number : numbers)
    {
        result += (number - mean)*(number - mean);
    }

    return result / (numbers.size() - 1);
}

int main() {
    std::cout << "Please enter the file path :" << std::endl;
    std::string filePath;
    std::cin >> filePath;
    std::vector<int> numbers = readFile(filePath);
    float mean = computeMean(numbers);
    float variance = computeVariance(mean, numbers);
    float standardDeviation = sqrt(variance);

    std::cout << std::to_string(numbers.size()) + " numbers : ";
    std::string data;
    for(int number : numbers) {
        data += std::to_string(number) + ", ";
    }
    data = data.substr(0, data.length()-2);
    std::cout << data << std::endl;

    std::cout << "Mean : " << std::to_string(mean) << std::endl;
    std::cout << "Variance : " << std::to_string(variance) << std::endl;
    std::cout << "Standard Deviation : " << std::to_string(standardDeviation) << std::endl;
    return 0;
}
edited Jan 19 at 18:35

asked Jan 19 at 5:21

IEatBagels

3 Answers

Accepted answer

Portability

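#import is a non-standard preprocessor extension (GCC supports it mainly for the benefit of Objective-C). It is not the import declaration that C++20 introduces for modules.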

There's no good reason not to simply #include <cmath> here.

Headers and namespaces

We don't use any string-stream, so #include <sstream> can be removed.

Bringing all names in from a namespace is problematic; namespace std particularly so. It can silently change the meaning of your program when you're not expecting it. Get used to using the namespace prefix (std is intentionally very short), or importing just the names you need into the smallest reasonable scope.

In this program, the only places where the std:: prefix was missing were std::ifstream and std::sqrt, so this wasn't hard to fix.
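As a small illustration of "importing just the names you need into the smallest reasonable scope", here is a sketch using a function-local using-declaration. The helper name `hypotenuse` is made up for this example and is not part of the reviewed program.

```cpp
#include <cmath>

// Hypothetical helper (not from the original program): the using-declaration
// brings in one name only, and only for the body of this function.
double hypotenuse(double a, double b)
{
    using std::sqrt;               // visible inside this function, nowhere else
    return sqrt(a * a + b * b);
}
```

Outside the function, `sqrt` still needs the `std::` prefix, so no other code is affected by the declaration.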

`readFile()`

These lines can be simplified:

std::ifstream in_file;
in_file.open(filePath);

We can ask the constructor to open the file for us:

std::ifstream in_file(filePath);

This loop has some error checking, but it's not complete:

while(std::getline(in_file,line,'\r')){
    numbers.push_back(std::stoi(line));
}

Firstly, we don't expect any carriage-return in the input file (we opened it in text mode, so on systems that use CR as line delimiter, they will be converted to \n). Secondly, std::stoi throws exceptions when the string cannot be converted, but we probably also want to check whether there are leftover, unconverted characters after our integer (e.g. if someone thought they could supply decimal values).
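One possible way to do the leftover-characters check is sketched below. It relies on the second parameter of std::stoi, which receives the number of characters consumed. The function name `parseWholeLine` and the choice to throw std::invalid_argument are assumptions for illustration, not something from the original code.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: convert a whole line to int, rejecting trailing junk.
int parseWholeLine(const std::string& line)
{
    std::size_t consumed = 0;
    // std::stoi itself throws std::invalid_argument (no digits at all)
    // or std::out_of_range (value does not fit in int).
    const int value = std::stoi(line, &consumed);
    if (consumed != line.size()) {
        // e.g. "3.5" converts to 3 with ".5" left over
        throw std::invalid_argument("unexpected characters after integer: " + line);
    }
    return value;
}
```

Note that this sketch is strict: trailing whitespace (including a stray carriage return) is also rejected, which may or may not be what you want.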

`computeMean()`

Why return a float rather than double? Single-precision floats are normally used only where the storage size is an important consideration, which is not the case here. (Note that on many platforms, double is the natural (and fastest) size of floating-point.)

We should pass the vector by reference to a const object, as we don't want to make a copy or to modify the value.

Instead of returning zero when there are no members, perhaps we should return a NaN value (which is more consistent with arithmetic 0.0 / 0):

if (numbers.empty())
 return std::numeric_limits<double>::quiet_NaN();

This loop:

double total = 0;
for (int number : numbers) 
 total += number;

can be written (with #include <numeric>) as

double total = std::accumulate(numbers.begin(), numbers.end(), 0.0);
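The type of the initial value matters here: std::accumulate sums in the type of its third argument, so `0.0` gives a double total while `0` gives an int total. The sketch below contrasts the two. Both function names are invented for the example, and both assume a non-empty vector.

```cpp
#include <numeric>
#include <vector>

// Hypothetical example: accumulating with a double initial value.
double meanWithDoubleInit(const std::vector<int>& numbers)
{
    const double total = std::accumulate(numbers.begin(), numbers.end(), 0.0);
    return total / numbers.size();          // floating-point division
}

// Hypothetical counter-example: an int initial value keeps the sum in int,
// and the division below is then integer division, which truncates.
double meanWithIntInit(const std::vector<int>& numbers)
{
    const int total = std::accumulate(numbers.begin(), numbers.end(), 0);
    return total / static_cast<int>(numbers.size());
}
```

For the input 1, 2 the first version yields 1.5, while the second truncates to 1.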

`computeVariance()`

We need to be clear which variance (sample or population) we're returning. We're also missing a size check similar to that for the mean.

Apart from that, the comments above for computeMean() are relevant:

double computeSampleVariance(const double mean, const std::vector<int>& numbers)
{
    if (numbers.size() <= 1u)
        return std::numeric_limits<double>::quiet_NaN();

    auto add_square = [mean](double sum, int i)
        {
            auto d = i - mean;
            return sum + d*d;
        };
    double total = std::accumulate(numbers.begin(), numbers.end(), 0.0, add_square);
    return total / (numbers.size() - 1);
}

Single-pass algorithm

It is possible to compute the mean and variance in a single pass - but not using the (exact-arithmetic) methods you likely learnt in high school, which suffer from lack of precision with the inexact floating-point types we can use. The topic is too deep for this review, but if you research Welford's Algorithm, you will find reference implementations to guide you.

That said, for your purposes, the straightforward two-pass algorithm is probably appropriate, and it's easy to read and understand, so I wouldn't recommend changing it unless you reach a point where your input set becomes too large to hold in a vector (and even then, only if you can't read the file multiple times).

My version

#include <algorithm>
#include <cmath>
#include <fstream>
#include <iostream>
#include <iterator>
#include <limits>
#include <numeric>
#include <string>
#include <vector>

std::vector<int> readFile(const std::string& filePath)
{
    std::ifstream in_file(filePath);
    // reads whitespace-separated integers until the first failure or end of file
    std::istream_iterator<int> start{in_file}, end{};
    std::vector<int> numbers;
    std::copy(start, end, std::back_inserter(numbers));
    return numbers;
}

double computeMean(const std::vector<int>& numbers)
{
    if (numbers.empty())
        return std::numeric_limits<double>::quiet_NaN();

    return std::accumulate(numbers.begin(), numbers.end(), 0.0) / numbers.size();
}

double computeSampleVariance(const double mean, const std::vector<int>& numbers)
{
    if (numbers.size() <= 1u)
        return std::numeric_limits<double>::quiet_NaN();

    auto const add_square = [mean](double sum, int i) {
        auto d = i - mean;
        return sum + d*d;
    };
    double total = std::accumulate(numbers.begin(), numbers.end(), 0.0, add_square);
    return total / (numbers.size() - 1);
}

int main()
{
#ifdef TEST
    const std::vector<int> numbers = { -2, -1, 1, 2, 100000-2, 100000-1, 100000+1, 100000+2 };
#else
    std::cout << "Please enter the file path :" << std::endl;
    std::string filePath;
    std::cin >> filePath;
    const std::vector<int> numbers = readFile(filePath);
#endif

    double mean = computeMean(numbers);
    double variance = computeSampleVariance(mean, numbers);
    double standardDeviation = std::sqrt(variance);

    std::cout << numbers.size() << " numbers : ";
    auto separator = "";
    for (int number: numbers) {
        std::cout << separator << number;
        separator = ", ";
    }
    std::cout << std::endl;

    // one statistic per line
    std::cout << "Mean : " << std::to_string(mean) << '\n'
              << "Variance : " << std::to_string(variance) << '\n'
              << "Standard Deviation : " << std::to_string(standardDeviation) << std::endl;
    return 0;
}

edited Jun 6 at 13:27

answered Jan 30 at 9:38

Toby Speight


You could compute both the mean and variance in a single pass, which removes the need to store the numbers.

$\begin{align}
\sigma^2(x) \times (n-1) &= \sum (\overline{x} - x_i)^2 \\
&= \sum \left( \overline{x}^2 - 2\,\overline{x} \times x_i + x_i^2 \right) \\
&= n \times \overline{x}^2 - 2\,\overline{x} \times \sum x_i + \sum x_i^2 \\
&= n \times \left( \sum x_i / n \right)^2 - 2 \sum(x_i)/n \times \sum x_i + \sum x_i^2 \\
&= \frac{1}{n} \left( \sum x_i \right)^2 - \frac{2}{n} \left( \sum x_i \right)^2 + \sum x_i^2 \\
&= -\frac{1}{n} \left( \sum x_i \right)^2 + \sum x_i^2
\end{align}$

So all you need is to get the sum of the values and the sum of the squares of the values.

However, if the values are large and the variance is small, then you can run into numerical stability issues, because the two terms being subtracted are nearly equal. Subtracting a constant from every value leaves the variance unchanged, so instead you can subtract a constant (pick the first value) from each value before computing the variance.

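std::pair<float, float> computeVarianceAndMean(std::vector<int> const& numbers)
{
    // Returns (mean, variance), in that order.
    // Assumes numbers holds at least two values.
    float sum = 0;
    float sumAdjusted = 0;
    float sumSquares = 0;
    int constant = numbers.front();   // shift by the first value to keep the sums small
    for(int number : numbers)
    {
        float shifted = number - constant;
        sum += number;
        sumAdjusted += shifted;
        sumSquares += shifted*shifted;
    }
    float average = sum / (numbers.size());
    // sum of squared deviations = sumSquares - sumAdjusted^2 / n, as derived above
    float variance = (sumSquares - sumAdjusted*sumAdjusted/numbers.size())/(numbers.size()-1);
    return std::make_pair(average , variance);
}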

edited Jan 26 at 0:31

200_success


answered Jan 19 at 12:48

ratchet freak


That method has serious problems with numerical stability - see the resources mentioned in comments to "A bag of numbers in C++ for constant time statistics queries - follow-up 2". Consider using Welford's method instead.
– Toby Speight
Jan 19 at 16:33

This method is useful only if variance is large with respect to mean. It is possible to make this assumption before seeing the data if the data is 8-bit or 16-bit integers. For anything else, you can get into deep trouble. And when you do, you might not even notice...
– Cris Luengo
Jan 29 at 20:23

@CrisLuengo I moved the mean closer to zero using the first element as offset.
– ratchet freak
Jan 29 at 20:39

In that case you should compute the mean as constant + sumAdjusted/numbers.size(). But it would still be better to use the algorithm as described by Welford in the links referenced above. Your algorithm would have a problem if the first element is an outlier.
– Cris Luengo
Jan 29 at 22:00


In computeMean and computeVariance, you can pass the numbers vector in by const &. This avoids making a copy of the vector.

If you only have one number in your file, computeVariance will return a NaN, because it'll divide 0 by 0. There are two different ways to calculate variance. Are you using the correct one? This will have an effect on the computed standard deviation.

When you output all the numbers, don't build a string. Just output the numbers. The comma can be handled by outputting it before the number, if the number isn't the first one.

bool first = true;
for (auto number: numbers) {
    if (!first) std::cout << ", ";
    first = false;
    std::cout << number;
}
std::cout << std::endl;

endl will flush the output stream, and since you're doing more output right away you could use '\n' instead.

When outputting the results, just output the number; don't convert it to a string first.

answered Jan 26 at 0:57

1201ProgramAlarm

2,5852618

add a commentÂ |Â

Your Answer

StackExchange.ifUsing("editor", function ()
return StackExchange.using("mathjaxEditing", function ()
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
);
);
, "mathjax-editing");

StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "196"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
convertImagesToLinks: false,
noModals: false,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f185450%2fcompute-mean-variance-and-standard-deviation-of-csv-number-file%23new-answer', 'question_page');

);

Post as a guest

Name

3 Answers
3

active

oldest

votes

3 Answers
3

active

oldest

votes

up vote
8
down vote

accepted

+50

Portability

#import is a GCC extension (or perhaps a preview of C++20).

There's no good reason not to simply #include <cmath> here.

Headers and namespaces

We don't use any string-stream, so #include <sstream> can be removed.

In this program, the only places the std:: prefix were missing were std::ifstream and std::sqrt, so this wasn't hard to fix.

`readFile()`

These lines can be simplified:

std::ifstream in_file;
in_file.open(filePath);

We can ask the constructor to open the file for us:

std::ifstream in_file(filePath);

This loop has some error checking, but it's not complete:

while(std::getline(in_file,line,'r'))
 numbers.push_back(std::stoi(line));

`computeMean()`

We should pass the vector by reference to a const object, as we don't want to make a copy or to modify the value.

Instead of returning zero when there are no members, perhaps we should return a NaN value (which is more consistent with arithmetic 0.0 / 0):

if (numbers.empty())
 return std::numeric_limits<double>::quiet_NaN();

This loop:

double total = 0;
for (int number : numbers) 
 total += number;

can be written (with #include <numeric>) as

double total = std::accumulate(numbers.begin(), numbers.end(), 0.0);

`computeVariance()`

We need to be clear which variance (sample or population) we're returning. We're also missing a size check similar to that for the mean.

Apart from that, the comments above for computeMean() are relevant:

double computeSampleVariance(const double mean, const std::vector<int>& numbers)

 if (numbers.size() <= 1u)
 return std::numeric_limits<double>::quiet_NaN();

 auto add_square = [mean](double sum, int i)
 
 auto d = i - mean;
 return sum + d*d;
 ;
 double total = std::accumulate(numbers.begin(), numbers.end(), 0.0, add_square);
 return total / (numbers.size() - 1);

Single-pass algorithm

My version

#include <algorithm>
#include <cmath>
#include <fstream>
#include <iostream>
#include <iterator>
#include <limits>
#include <numeric>
#include <vector>

std::vector<int> readFile(const std::string& filePath)

 std::ifstream in_file(filePath);
 std::istream_iterator<int> startin_file, end;
 std::vector<int> numbers;
 std::copy(start, end, std::back_inserter(numbers));
 return numbers;


double computeMean(const std::vector<int>& numbers)

 if (numbers.empty())
 return std::numeric_limits<double>::quiet_NaN();

 return std::accumulate(numbers.begin(), numbers.end(), 0.0) / numbers.size();


double computeSampleVariance(const double mean, const std::vector<int>& numbers)

 if (numbers.size() <= 1u)
 return std::numeric_limits<double>::quiet_NaN();

 auto const add_square = [mean](double sum, int i) 
 auto d = i - mean;
 return sum + d*d;
 ;
 double total = std::accumulate(numbers.begin(), numbers.end(), 0.0, add_square);
 return total / (numbers.size() - 1);


int main()

#ifdef TEST
 const std::vector<int> numbers = -2, -1, 1, 2, 100000-2, 100000-1, 100000+1, 100000+2;
#else
 std::cout << "Please enter the file path :" << std::endl;
 std::string filePath;
 std::cin >> filePath;
 const std::vector<int> numbers = readFile(filePath);
#endif

 double mean = computeMean(numbers);
 double variance = computeSampleVariance(mean, numbers);
 double standardDeviation = std::sqrt(variance);

 std::cout << numbers.size() << " numbers : ";
 auto separator = "";
 for (int number: numbers) 
 std::cout << separator << number;
 separator = ", ";
 
 std::cout << std::endl;

 std::cout << "Mean : " << std::to_string(mean)
 << "Variance : " << std::to_string(variance)
 << "Standard Deviation : " << std::to_string(standardDeviation) << std::endl;
 return 0;

edited Jun 6 at 13:27

answered Jan 30 at 9:38

Toby Speight

17.8k13491

add a commentÂ |Â

up vote
8
down vote

accepted

+50

Portability

#import is a GCC extension (or perhaps a preview of C++20).

There's no good reason not to simply #include <cmath> here.

Headers and namespaces

We don't use any string-stream, so #include <sstream> can be removed.

In this program, the only places the std:: prefix were missing were std::ifstream and std::sqrt, so this wasn't hard to fix.

`readFile()`

These lines can be simplified:

std::ifstream in_file;
in_file.open(filePath);

We can ask the constructor to open the file for us:

std::ifstream in_file(filePath);

This loop has some error checking, but it's not complete:

while(std::getline(in_file,line,'r'))
 numbers.push_back(std::stoi(line));

`computeMean()`

We should pass the vector by reference to a const object, as we don't want to make a copy or to modify the value.

Instead of returning zero when there are no members, perhaps we should return a NaN value (which is more consistent with arithmetic 0.0 / 0):

if (numbers.empty())
 return std::numeric_limits<double>::quiet_NaN();

This loop:

double total = 0;
for (int number : numbers) 
 total += number;

can be written (with #include <numeric>) as

double total = std::accumulate(numbers.begin(), numbers.end(), 0.0);

`computeVariance()`

We need to be clear which variance (sample or population) we're returning. We're also missing a size check similar to that for the mean.

Apart from that, the comments above for computeMean() are relevant:

double computeSampleVariance(const double mean, const std::vector<int>& numbers)

 if (numbers.size() <= 1u)
 return std::numeric_limits<double>::quiet_NaN();

 auto add_square = [mean](double sum, int i)
 
 auto d = i - mean;
 return sum + d*d;
 ;
 double total = std::accumulate(numbers.begin(), numbers.end(), 0.0, add_square);
 return total / (numbers.size() - 1);

Single-pass algorithm

My version

#include <algorithm>
#include <cmath>
#include <fstream>
#include <iostream>
#include <iterator>
#include <limits>
#include <numeric>
#include <vector>

std::vector<int> readFile(const std::string& filePath)

 std::ifstream in_file(filePath);
 std::istream_iterator<int> startin_file, end;
 std::vector<int> numbers;
 std::copy(start, end, std::back_inserter(numbers));
 return numbers;


double computeMean(const std::vector<int>& numbers)

 if (numbers.empty())
 return std::numeric_limits<double>::quiet_NaN();

 return std::accumulate(numbers.begin(), numbers.end(), 0.0) / numbers.size();


double computeSampleVariance(const double mean, const std::vector<int>& numbers)

 if (numbers.size() <= 1u)
 return std::numeric_limits<double>::quiet_NaN();

 auto const add_square = [mean](double sum, int i) 
 auto d = i - mean;
 return sum + d*d;
 ;
 double total = std::accumulate(numbers.begin(), numbers.end(), 0.0, add_square);
 return total / (numbers.size() - 1);


int main()

#ifdef TEST
 const std::vector<int> numbers = -2, -1, 1, 2, 100000-2, 100000-1, 100000+1, 100000+2;
#else
 std::cout << "Please enter the file path :" << std::endl;
 std::string filePath;
 std::cin >> filePath;
 const std::vector<int> numbers = readFile(filePath);
#endif

 double mean = computeMean(numbers);
 double variance = computeSampleVariance(mean, numbers);
 double standardDeviation = std::sqrt(variance);

 std::cout << numbers.size() << " numbers : ";
 auto separator = "";
 for (int number: numbers) 
 std::cout << separator << number;
 separator = ", ";
 
 std::cout << std::endl;

 std::cout << "Mean : " << std::to_string(mean)
 << "Variance : " << std::to_string(variance)
 << "Standard Deviation : " << std::to_string(standardDeviation) << std::endl;
 return 0;

edited Jun 6 at 13:27

answered Jan 30 at 9:38

Toby Speight

17.8k13491

add a commentÂ |Â

up vote
8
down vote

accepted

+50

up vote
8
down vote

accepted

+50

Portability

#import is a GCC extension (or perhaps a preview of C++20).

There's no good reason not to simply #include <cmath> here.

Headers and namespaces

We don't use any string-stream, so #include <sstream> can be removed.

In this program, the only places the std:: prefix were missing were std::ifstream and std::sqrt, so this wasn't hard to fix.

`readFile()`

These lines can be simplified:

std::ifstream in_file;
in_file.open(filePath);

We can ask the constructor to open the file for us:

std::ifstream in_file(filePath);

This loop has some error checking, but it's not complete:

while(std::getline(in_file,line,'r'))
 numbers.push_back(std::stoi(line));

`computeMean()`

We should pass the vector by reference to a const object, as we don't want to make a copy or to modify the value.

Instead of returning zero when there are no members, perhaps we should return a NaN value (which is more consistent with arithmetic 0.0 / 0):

if (numbers.empty())
 return std::numeric_limits<double>::quiet_NaN();

This loop:

double total = 0;
for (int number : numbers) 
 total += number;

can be written (with #include <numeric>) as

double total = std::accumulate(numbers.begin(), numbers.end(), 0.0);

`computeVariance()`

We need to be clear which variance (sample or population) we're returning. We're also missing a size check similar to that for the mean.

Apart from that, the comments above for computeMean() are relevant:

double computeSampleVariance(const double mean, const std::vector<int>& numbers)

 if (numbers.size() <= 1u)
 return std::numeric_limits<double>::quiet_NaN();

 auto add_square = [mean](double sum, int i)
 
 auto d = i - mean;
 return sum + d*d;
 ;
 double total = std::accumulate(numbers.begin(), numbers.end(), 0.0, add_square);
 return total / (numbers.size() - 1);

Single-pass algorithm

My version

#include <algorithm>
#include <cmath>
#include <fstream>
#include <iostream>
#include <iterator>
#include <limits>
#include <numeric>
#include <vector>

std::vector<int> readFile(const std::string& filePath)

 std::ifstream in_file(filePath);
 std::istream_iterator<int> startin_file, end;
 std::vector<int> numbers;
 std::copy(start, end, std::back_inserter(numbers));
 return numbers;


double computeMean(const std::vector<int>& numbers)

 if (numbers.empty())
 return std::numeric_limits<double>::quiet_NaN();

 return std::accumulate(numbers.begin(), numbers.end(), 0.0) / numbers.size();


double computeSampleVariance(const double mean, const std::vector<int>& numbers)

 if (numbers.size() <= 1u)
 return std::numeric_limits<double>::quiet_NaN();

 auto const add_square = [mean](double sum, int i) 
 auto d = i - mean;
 return sum + d*d;
 ;
 double total = std::accumulate(numbers.begin(), numbers.end(), 0.0, add_square);
 return total / (numbers.size() - 1);


int main()

#ifdef TEST
 const std::vector<int> numbers = -2, -1, 1, 2, 100000-2, 100000-1, 100000+1, 100000+2;
#else
 std::cout << "Please enter the file path :" << std::endl;
 std::string filePath;
 std::cin >> filePath;
 const std::vector<int> numbers = readFile(filePath);
#endif

 double mean = computeMean(numbers);
 double variance = computeSampleVariance(mean, numbers);
 double standardDeviation = std::sqrt(variance);

 std::cout << numbers.size() << " numbers : ";
 auto separator = "";
 for (int number: numbers) 
 std::cout << separator << number;
 separator = ", ";
 
 std::cout << std::endl;

 std::cout << "Mean : " << std::to_string(mean)
 << "Variance : " << std::to_string(variance)
 << "Standard Deviation : " << std::to_string(standardDeviation) << std::endl;
 return 0;

edited Jun 6 at 13:27

answered Jan 30 at 9:38

Toby Speight

17.8k13491

Portability

#import is a GCC extension (or perhaps a preview of C++20).

There's no good reason not to simply #include <cmath> here.

Headers and namespaces

We don't use any string-stream, so #include <sstream> can be removed.

In this program, the only places the std:: prefix were missing were std::ifstream and std::sqrt, so this wasn't hard to fix.

`readFile()`

These lines can be simplified:

std::ifstream in_file;
in_file.open(filePath);

We can ask the constructor to open the file for us:

std::ifstream in_file(filePath);

This loop has some error checking, but it's not complete:

while(std::getline(in_file,line,'r'))
 numbers.push_back(std::stoi(line));

`computeMean()`

We should pass the vector by reference to a const object, as we don't want to make a copy or to modify the value.

Instead of returning zero when there are no members, perhaps we should return a NaN value (which is more consistent with arithmetic 0.0 / 0):

if (numbers.empty())
 return std::numeric_limits<double>::quiet_NaN();

This loop:

double total = 0;
for (int number : numbers) 
 total += number;

can be written (with #include <numeric>) as

double total = std::accumulate(numbers.begin(), numbers.end(), 0.0);

`computeVariance()`

We need to be clear which variance (sample or population) we're returning. We're also missing a size check similar to that for the mean.

Apart from that, the comments above for computeMean() are relevant:

double computeSampleVariance(const double mean, const std::vector<int>& numbers)

 if (numbers.size() <= 1u)
 return std::numeric_limits<double>::quiet_NaN();

 auto add_square = [mean](double sum, int i)
 
 auto d = i - mean;
 return sum + d*d;
 ;
 double total = std::accumulate(numbers.begin(), numbers.end(), 0.0, add_square);
 return total / (numbers.size() - 1);

Single-pass algorithm

My version

#include <algorithm>
#include <cmath>
#include <fstream>
#include <iostream>
#include <iterator>
#include <limits>
#include <numeric>
#include <string>
#include <vector>

std::vector<int> readFile(const std::string& filePath)
{
    std::ifstream in_file(filePath);
    // Read whitespace-separated integers until extraction fails or EOF.
    std::istream_iterator<int> start{in_file}, end;
    std::vector<int> numbers;
    std::copy(start, end, std::back_inserter(numbers));
    return numbers;
}

double computeMean(const std::vector<int>& numbers)
{
    if (numbers.empty())
        return std::numeric_limits<double>::quiet_NaN();

    return std::accumulate(numbers.begin(), numbers.end(), 0.0) / numbers.size();
}

double computeSampleVariance(const double mean, const std::vector<int>& numbers)
{
    if (numbers.size() <= 1u)
        return std::numeric_limits<double>::quiet_NaN();

    auto const add_square = [mean](double sum, int i) {
        auto d = i - mean;
        return sum + d*d;
    };
    double total = std::accumulate(numbers.begin(), numbers.end(), 0.0, add_square);
    return total / (numbers.size() - 1);
}

int main()
{
#ifdef TEST
    const std::vector<int> numbers = { -2, -1, 1, 2, 100000-2, 100000-1, 100000+1, 100000+2 };
#else
    std::cout << "Please enter the file path :" << std::endl;
    std::string filePath;
    std::cin >> filePath;
    const std::vector<int> numbers = readFile(filePath);
#endif

    double mean = computeMean(numbers);
    double variance = computeSampleVariance(mean, numbers);
    double standardDeviation = std::sqrt(variance);

    std::cout << numbers.size() << " numbers : ";
    auto separator = "";
    for (int number: numbers) {
        std::cout << separator << number;
        separator = ", ";
    }
    std::cout << std::endl;

    std::cout << "Mean : " << std::to_string(mean) << '\n'
              << "Variance : " << std::to_string(variance) << '\n'
              << "Standard Deviation : " << std::to_string(standardDeviation) << std::endl;
    return 0;
}

edited Jun 6 at 13:27

answered Jan 30 at 9:38

Toby Speight

17.8k13491


up vote
5
down vote

You could compute both the mean and variance on a single pass which removes the need for storing the numbers.

So all you need is to get the sum of the values and the sum of the squares of the values.

std::pair<float, float> computeVarianceAndMean(std::vector<int> const& numbers)
{
    float sum = 0;
    float sumAdjusted = 0;
    float sumSquares = 0;
    // Shift every value by the first element to reduce cancellation.
    // Precondition: numbers must not be empty.
    int constant = numbers.front();
    for(int number : numbers)
    {
        sum += number;
        sumAdjusted += number-constant;
        sumSquares += (number-constant)*(number-constant);
    }
    float average = sum / numbers.size();
    // Shifted-data formula: (sum of squares - (sum)^2 / n) / (n - 1)
    float variance = (sumSquares - sumAdjusted*sumAdjusted/numbers.size())/(numbers.size()-1);
    return std::make_pair(average, variance);
}

edited Jan 26 at 0:31

200_success

123k14143401

answered Jan 19 at 12:48

ratchet freak

11.4k1240

1

That method has serious problems with numerical stability - see the resources mentioned in comments to "A bag of numbers in C++ for constant time statistics queries - follow-up 2". Consider using Welford's method instead.
– Toby Speight
Jan 19 at 16:33

This method is useful only if variance is large with respect to mean. It is possible to make this assumption before seeing the data if the data is 8-bit or 16-bit integers. For anything else, you can get into deep trouble. And when you do, you might not even notice...
– Cris Luengo
Jan 29 at 20:23

@CrisLuengo I moved the mean closer to zero using the first element as offset.
– ratchet freak
Jan 29 at 20:39

1

In that case you should compute the mean as `constant + sumAdjusted/numbers.size()`. But it would still be better to use the algorithm as described by Welford in the links referenced above. Your algorithm would have a problem if the first element is an outlier.
– Cris Luengo
Jan 29 at 22:00


up vote
4
down vote

In computeMean and computeVariance, you can pass the numbers vector in by const &. This avoids making a copy of the vector.

When you output all the numbers, don't build a string. Just output the numbers. The comma can be handled by outputting it before the number, if the number isn't the first one.

bool first = true;
for (auto number: numbers) {
    if (!first) std::cout << ", ";
    first = false;
    std::cout << number;
}
std::cout << std::endl;

std::endl will flush the output stream, and since you're doing more output right away you could use '\n' instead.

When outputting the results, just output the number; don't convert it to a string first.

answered Jan 26 at 0:57

1201ProgramAlarm

2,5852618
