Compute mean, variance and standard deviation of CSV number file

I've written my first C++ application this week, and I wanted to see whether anything in it isn't up to standard or whether there are better ways to do what I'm currently doing.



For the purposes of the exercise, let's assume I can't use any mathematical function other than the square root.



The application reads a file where numbers are on separate lines and computes the mean, variance and standard deviation of all the numbers. Afterwards I print the whole numbers list and the stats that were computed.



    #include <iostream>
    #include <fstream>
    #include <vector>
    #include <sstream>
    #import <cmath>

    using namespace std;

    std::vector<int> readFile(const std::string &filePath) {
        ifstream in_file;
        in_file.open(filePath);
        std::vector<int> numbers;

        std::string line;

        while(std::getline(in_file,line,'\r')) {
            numbers.push_back(std::stoi(line));
        }

        return numbers;
    }

    float computeMean(std::vector<int> numbers) {
        if(numbers.empty()) return 0;

        float total = 0;
        for (int number : numbers) {
            total += number;
        }

        return (total / numbers.size());
    }

    float computeVariance(float mean, std::vector<int> numbers) {
        float result = 0;
        for(int number : numbers) {
            result += (number - mean)*(number - mean);
        }

        return result / (numbers.size() - 1);
    }

    int main() {
        std::cout << "Please enter the file path :" << std::endl;
        std::string filePath;
        std::cin >> filePath;
        std::vector<int> numbers = readFile(filePath);
        float mean = computeMean(numbers);
        float variance = computeVariance(mean, numbers);
        float standardDeviation = sqrt(variance);

        std::cout << std::to_string(numbers.size()) + " numbers : ";
        std::string data;
        for(int number : numbers) {
            data += std::to_string(number) + ", ";
        }
        data = data.substr(0, data.length()-2);
        std::cout << data << std::endl;

        std::cout << "Mean : " << std::to_string(mean) << std::endl;
        std::cout << "Variance : " << std::to_string(variance) << std::endl;
        std::cout << "Standard Deviation : " << std::to_string(standardDeviation) << std::endl;
        return 0;
    }







asked Jan 19 at 5:21 by IEatBagels, edited Jan 19 at 18:35




















          3 Answers

          Answer 1 (accepted, score 8, +50 bounty)

          Portability



          #import is a non-standard extension (GCC accepts it as a carry-over from Objective-C); it is not the import declaration that C++20 introduces for modules.



          There's no good reason not to simply #include <cmath> here.



          Headers and namespaces



          We don't use any string-stream, so #include <sstream> can be removed.



          Bringing all names in from a namespace is problematic; namespace std particularly so. It can silently change the meaning of your program when you're not expecting it. Get used to using the namespace prefix (std is intentionally very short), or importing just the names you need into the smallest reasonable scope.



          In this program, the only places where the std:: prefix was missing were std::ifstream and std::sqrt, so this wasn't hard to fix.



          readFile()



          These lines can be simplified:

              std::ifstream in_file;
              in_file.open(filePath);

          We can ask the constructor to open the file for us:

              std::ifstream in_file(filePath);

          This loop has some error checking, but it's not complete:

              while(std::getline(in_file,line,'\r')) {
                  numbers.push_back(std::stoi(line));
              }

          Firstly, we don't expect any carriage-return in the input file (we opened it in text mode, so on systems that use CR in their line delimiter, it will be converted to \n). Secondly, std::stoi throws exceptions when the string cannot be converted, but we probably also want to check whether there are leftover, unconverted characters after our integer (e.g. if someone thought they could supply decimal values).
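The leftover-character check described above could look roughly like the sketch below. The helper name `parseWholeInt` is hypothetical (it is not part of the reviewed program), and tolerating trailing whitespace is an assumption about what the input file should allow. It relies on the second parameter of `std::stoi`, which reports how many characters were consumed.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: parse a whole line as an int, rejecting input such as
// "3.5" or "12abc", for which std::stoi alone would silently return 3 or 12.
int parseWholeInt(const std::string& line)
{
    std::size_t consumed = 0;
    // std::stoi throws std::invalid_argument or std::out_of_range on failure
    const int value = std::stoi(line, &consumed);
    // Assumption: trailing whitespace (including a stray '\r') is acceptable
    const auto rest = line.find_first_not_of(" \t\r\n", consumed);
    if (rest != std::string::npos) {
        throw std::invalid_argument("unexpected characters after integer: " + line);
    }
    return value;
}
```

A reading loop could then call `numbers.push_back(parseWholeInt(line));` and let the caller decide how to report a bad line.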



          computeMean()



          Why return a float rather than double? Single-precision floats are normally used only where the storage size is an important consideration, which is not the case here. (Note that on many platforms, double is the natural (and fastest) size of floating-point.)



          We should pass the vector by reference to a const object, as we don't want to make a copy or to modify the value.



          Instead of returning zero when there are no members, perhaps we should return a NaN value (which is more consistent with arithmetic 0.0 / 0):



              if (numbers.empty())
                  return std::numeric_limits<double>::quiet_NaN();

          This loop:

              double total = 0;
              for (int number : numbers) {
                  total += number;
              }

          can be written (with #include <numeric>) as

              double total = std::accumulate(numbers.begin(), numbers.end(), 0.0);


          computeVariance()



          We need to be clear which variance (sample or population) we're returning. We're also missing a size check similar to that for the mean.



          Apart from that, the comments above for computeMean() are relevant:



              double computeSampleVariance(const double mean, const std::vector<int>& numbers)
              {
                  if (numbers.size() <= 1u)
                      return std::numeric_limits<double>::quiet_NaN();

                  auto add_square = [mean](double sum, int i)
                  {
                      auto d = i - mean;
                      return sum + d*d;
                  };
                  double total = std::accumulate(numbers.begin(), numbers.end(), 0.0, add_square);
                  return total / (numbers.size() - 1);
              }



          Single-pass algorithm



          It is possible to compute the mean and variances in a single pass - but not using the (exact-arithmetic) methods you likely learnt in high school, which suffer from lack of precision with the inexact floating-point types we can use. The topic is too deep for this review, but if you research Welford's Algorithm, you will find reference implementations to guide you.



          That said, for your purposes, the straightforward two-pass algorithm is probably appropriate, and it's easy to read and understand, so I wouldn't recommend changing it unless you reach a point where your input set becomes too large to hold in a vector (and even then, only if you can't read the file multiple times).




          My version



              #include <algorithm>
              #include <cmath>
              #include <fstream>
              #include <iostream>
              #include <iterator>
              #include <limits>
              #include <numeric>
              #include <string>
              #include <vector>

              // Read whitespace-separated integers until end of file or the first bad token
              std::vector<int> readFile(const std::string& filePath)
              {
                  std::ifstream in_file(filePath);
                  std::istream_iterator<int> start{in_file}, end;
                  std::vector<int> numbers;
                  std::copy(start, end, std::back_inserter(numbers));
                  return numbers;
              }

              double computeMean(const std::vector<int>& numbers)
              {
                  if (numbers.empty())
                      return std::numeric_limits<double>::quiet_NaN();

                  return std::accumulate(numbers.begin(), numbers.end(), 0.0) / numbers.size();
              }

              double computeSampleVariance(const double mean, const std::vector<int>& numbers)
              {
                  if (numbers.size() <= 1u)
                      return std::numeric_limits<double>::quiet_NaN();

                  auto const add_square = [mean](double sum, int i) {
                      auto d = i - mean;
                      return sum + d*d;
                  };
                  double total = std::accumulate(numbers.begin(), numbers.end(), 0.0, add_square);
                  return total / (numbers.size() - 1);
              }

              int main()
              {
              #ifdef TEST
                  const std::vector<int> numbers = {-2, -1, 1, 2, 100000-2, 100000-1, 100000+1, 100000+2};
              #else
                  std::cout << "Please enter the file path :" << std::endl;
                  std::string filePath;
                  std::cin >> filePath;
                  const std::vector<int> numbers = readFile(filePath);
              #endif

                  double mean = computeMean(numbers);
                  double variance = computeSampleVariance(mean, numbers);
                  double standardDeviation = std::sqrt(variance);

                  std::cout << numbers.size() << " numbers : ";
                  auto separator = "";   // empty before the first number, ", " afterwards
                  for (int number: numbers) {
                      std::cout << separator << number;
                      separator = ", ";
                  }
                  std::cout << std::endl;

                  std::cout << "Mean : " << std::to_string(mean) << '\n'
                            << "Variance : " << std::to_string(variance) << '\n'
                            << "Standard Deviation : " << std::to_string(standardDeviation) << std::endl;
                  return 0;
              }






            Answer 2 (score 5)

            You could compute both the mean and variance in a single pass, which removes the need for storing the numbers.

            $\begin{align}
            \sigma^2(x) \times (n-1) &= \sum (\overline{x} - x_i)^2 \\
            &= \sum \left( \overline{x}^2 - 2\,\overline{x} \times x_i + x_i^2 \right) \\
            &= n \times \overline{x}^2 - 2\,\overline{x} \times \sum x_i + \sum x_i^2 \\
            &= n \times \left(\sum x_i / n\right)^2 - 2 \sum(x_i)/n \times \sum x_i + \sum x_i^2 \\
            &= \tfrac{1}{n} \left(\sum x_i\right)^2 - \tfrac{2}{n} \left(\sum x_i\right)^2 + \sum x_i^2 \\
            &= -\tfrac{1}{n} \left(\sum x_i\right)^2 + \sum x_i^2
            \end{align}$

            So all you need is to get the sum of the values and the sum of the squares of the values.

            However, if the values are large and the variance is small then you can run into stability issues. So instead you can subtract a constant (pick the first value) from each value to compute the variance.



                // Returns {mean, sample variance}; requires a non-empty vector
                std::pair<float, float> computeVarianceAndMean(std::vector<int> const& numbers)
                {
                    float sum = 0;
                    float sumAdjusted = 0;
                    float sumSquares = 0;
                    int constant = numbers.front();   // offset that keeps the sums small
                    for(int number : numbers)
                    {
                        const float d = number - constant;   // square in float, not int
                        sum += number;
                        sumAdjusted += d;
                        sumSquares += d*d;
                    }
                    float average = sum / (numbers.size());
                    // last line of the derivation: (sum of squares - (sum)^2 / n) / (n - 1)
                    float variance = (sumSquares - sumAdjusted*sumAdjusted/numbers.size())/(numbers.size()-1);
                    return std::make_pair(average , variance);
                }






            Comments:

            • That method has serious problems with numerical stability - see the resources mentioned in comments to "A bag of numbers in C++ for constant time statistics queries - follow-up 2". Consider using Welford's method instead.
              – Toby Speight, Jan 19 at 16:33

            • This method is useful only if variance is large with respect to mean. It is possible to make this assumption before seeing the data if the data is 8-bit or 16-bit integers. For anything else, you can get into deep trouble. And when you do, you might not even notice...
              – Cris Luengo, Jan 29 at 20:23

            • @CrisLuengo I moved the mean closer to zero using the first element as offset.
              – ratchet freak, Jan 29 at 20:39

            • In that case you should compute the mean as constant + sumAdjusted/numbers.size(). But it would still be better to use the algorithm as described by Welford in the links referenced above. Your algorithm would have a problem if the first element is an outlier.
              – Cris Luengo, Jan 29 at 22:00

















            Answer 3 (score 4)

            In computeMean and computeVariance, you can pass the numbers vector in by const &. This avoids making a copy of the vector.



            If you only have one number in your file, computeVariance will return a NaN, because it'll divide 0 by 0. There are two different ways to calculate variance. Are you using the correct one? This will have an effect on the computed standard deviation.



            When you output all the numbers, don't build a string. Just output the numbers. The comma can be handled by outputting it before the number, if the number isn't the first one.



                bool first = true;
                for (auto number: numbers) {
                    if (!first) std::cout << ", ";
                    first = false;
                    std::cout << number;
                }
                std::cout << std::endl;

            std::endl will flush the output stream, and since you're doing more output right away you could use '\n' instead.



            When outputting the results, just output the number; don't convert it to a string first.
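One practical difference: in C++11 through C++23, std::to_string formats a double as if by printf's %f, so it always shows six digits after the decimal point, whereas streaming the value directly lets you choose the precision. The sketch below writes to a std::ostringstream only so the result is easy to inspect; the helper name `formatStat` is made up for this illustration, and the same manipulators work on std::cout.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Illustrative helper: format "label : value" through a stream,
// with `digits` significant digits instead of to_string's fixed six decimals.
std::string formatStat(const std::string& label, double value, int digits)
{
    std::ostringstream out;
    out << label << " : " << std::setprecision(digits) << value;
    return out.str();
}
```

In the program itself this amounts to writing `std::cout << "Mean : " << mean << '\n';` with an optional `std::setprecision` before the value.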






            share|improve this answer





















              Your Answer




              StackExchange.ifUsing("editor", function ()
              return StackExchange.using("mathjaxEditing", function ()
              StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
              StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
              );
              );
              , "mathjax-editing");

              StackExchange.ifUsing("editor", function ()
              StackExchange.using("externalEditor", function ()
              StackExchange.using("snippets", function ()
              StackExchange.snippets.init();
              );
              );
              , "code-snippets");

              StackExchange.ready(function()
              var channelOptions =
              tags: "".split(" "),
              id: "196"
              ;
              initTagRenderer("".split(" "), "".split(" "), channelOptions);

              StackExchange.using("externalEditor", function()
              // Have to fire editor after snippets, if snippets enabled
              if (StackExchange.settings.snippets.snippetsEnabled)
              StackExchange.using("snippets", function()
              createEditor();
              );

              else
              createEditor();

              );

              function createEditor()
              StackExchange.prepareEditor(
              heartbeatType: 'answer',
              convertImagesToLinks: false,
              noModals: false,
              showLowRepImageUploadWarning: true,
              reputationToPostImages: null,
              bindNavPrevention: true,
              postfix: "",
              onDemand: true,
              discardSelector: ".discard-answer"
              ,immediatelyShowMarkdownHelp:true
              );



              );








               

              draft saved


              draft discarded


















              StackExchange.ready(
              function ()
              StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f185450%2fcompute-mean-variance-and-standard-deviation-of-csv-number-file%23new-answer', 'question_page');

              );

              Post as a guest






























              3 Answers
              3






              active

              oldest

              votes








              3 Answers
              3






              active

              oldest

              votes









              active

              oldest

              votes






              active

              oldest

              votes








              up vote
              8
              down vote



              accepted
              +50










              Portability



              #import is a GCC extension (or perhaps a preview of C++20).



              There's no good reason not to simply #include <cmath> here.



              Headers and namespaces



              We don't use any string-stream, so #include <sstream> can be removed.



              Bringing all names in from a namespace is problematic; namespace std particularly so. It can silently change the meaning of your program when you're not expecting it. Get used to using the namespace prefix (std is intentionally very short), or importing just the names you need into the smallest reasonable scope.



              In this program, the only places the std:: prefix were missing were std::ifstream and std::sqrt, so this wasn't hard to fix.



              readFile()



              These lines can be simplified:



              std::ifstream in_file;
              in_file.open(filePath);


              We can ask the constructor to open the file for us:



              std::ifstream in_file(filePath);


              This loop has some error checking, but it's not complete:



              while(std::getline(in_file,line,'r'))
              numbers.push_back(std::stoi(line));



              Firstly, we don't expect any carriage-return in the input file (we opened it in text mode, so on systems that use CR as line delimiter, they will be converted to n). Secondly, std::stoi throws exceptions when the string cannot be converted, but we probably also want to check whether there are leftover, unconverted characters after our integer (e.g. if someone thought they could supply decimal values).



              computeMean()



              Why return a float rather than double? Single-precision floats are normally used only where the storage size is an important consideration, which is not the case here. (Note that on many platforms, double is the natural (and fastest) size of floating-point.)



              We should pass the vector by reference to a const object, as we don't want to make a copy or to modify the value.



              Instead of returning zero when there are no members, perhaps we should return a NaN value (which is more consistent with arithmetic 0.0 / 0):



              if (numbers.empty())
              return std::numeric_limits<double>::quiet_NaN();


              This loop:



              double total = 0;
              for (int number : numbers)
              total += number;



              can be written (with #include <numeric>) as



              double total = std::accumulate(numbers.begin(), numbers.end(), 0.0);


              computeVariance()



              We need to be clear which variance (sample or population) we're returning. We're also missing a size check similar to that for the mean.



              Apart from that, the comments above for computeMean() are relevant:



              double computeSampleVariance(const double mean, const std::vector<int>& numbers)

              if (numbers.size() <= 1u)
              return std::numeric_limits<double>::quiet_NaN();

              auto add_square = [mean](double sum, int i)

              auto d = i - mean;
              return sum + d*d;
              ;
              double total = std::accumulate(numbers.begin(), numbers.end(), 0.0, add_square);
              return total / (numbers.size() - 1);



              Single-pass algorithm



              It is possible to compute the mean and variances in a single pass - but not using the (exact-arithmetic) methods you likely learnt in high school, which suffer from lack of precision with the inexact floating-point types we can use. The topic is too deep for this review, but if you research Welford's Algorithm, you will find reference implementations to guide you.



              That said, for your purposes, the straightforward two-pass algorithm is probably appropriate, and it's easy to read and understand, so I wouldn't recommend changing it unless you reach a point where your input set becomes too large to hold in a vector (and even then, only if you can't read the file multiple times).




              My version



              #include <algorithm>
              #include <cmath>
              #include <fstream>
              #include <iostream>
              #include <iterator>
              #include <limits>
              #include <numeric>
              #include <vector>

              std::vector<int> readFile(const std::string& filePath)

              std::ifstream in_file(filePath);
              std::istream_iterator<int> startin_file, end;
              std::vector<int> numbers;
              std::copy(start, end, std::back_inserter(numbers));
              return numbers;


              double computeMean(const std::vector<int>& numbers)

              if (numbers.empty())
              return std::numeric_limits<double>::quiet_NaN();

              return std::accumulate(numbers.begin(), numbers.end(), 0.0) / numbers.size();


              double computeSampleVariance(const double mean, const std::vector<int>& numbers)

              if (numbers.size() <= 1u)
              return std::numeric_limits<double>::quiet_NaN();

              auto const add_square = [mean](double sum, int i)
              auto d = i - mean;
              return sum + d*d;
              ;
              double total = std::accumulate(numbers.begin(), numbers.end(), 0.0, add_square);
              return total / (numbers.size() - 1);


              int main()

              #ifdef TEST
              const std::vector<int> numbers = -2, -1, 1, 2, 100000-2, 100000-1, 100000+1, 100000+2;
              #else
              std::cout << "Please enter the file path :" << std::endl;
              std::string filePath;
              std::cin >> filePath;
              const std::vector<int> numbers = readFile(filePath);
              #endif

              double mean = computeMean(numbers);
              double variance = computeSampleVariance(mean, numbers);
              double standardDeviation = std::sqrt(variance);

              std::cout << numbers.size() << " numbers : ";
              auto separator = "";
              for (int number: numbers)
              std::cout << separator << number;
              separator = ", ";

              std::cout << std::endl;

              std::cout << "Mean : " << std::to_string(mean)
              << "Variance : " << std::to_string(variance)
              << "Standard Deviation : " << std::to_string(standardDeviation) << std::endl;
              return 0;






              share|improve this answer



























                up vote
                8
                down vote



                accepted
                +50










                Portability



                #import is a GCC extension (or perhaps a preview of C++20).



                There's no good reason not to simply #include <cmath> here.



                Headers and namespaces



                We don't use any string-stream, so #include <sstream> can be removed.



                Bringing all names in from a namespace is problematic; namespace std particularly so. It can silently change the meaning of your program when you're not expecting it. Get used to using the namespace prefix (std is intentionally very short), or importing just the names you need into the smallest reasonable scope.



                In this program, the only places the std:: prefix were missing were std::ifstream and std::sqrt, so this wasn't hard to fix.



                readFile()



                These lines can be simplified:



                std::ifstream in_file;
                in_file.open(filePath);


                We can ask the constructor to open the file for us:



                std::ifstream in_file(filePath);


                This loop has some error checking, but it's not complete:



                while(std::getline(in_file,line,'r'))
                numbers.push_back(std::stoi(line));



                Firstly, we don't expect any carriage-return in the input file (we opened it in text mode, so on systems that use CR as line delimiter, they will be converted to n). Secondly, std::stoi throws exceptions when the string cannot be converted, but we probably also want to check whether there are leftover, unconverted characters after our integer (e.g. if someone thought they could supply decimal values).



                computeMean()



                Why return a float rather than double? Single-precision floats are normally used only where the storage size is an important consideration, which is not the case here. (Note that on many platforms, double is the natural (and fastest) size of floating-point.)



                We should pass the vector by reference to a const object, as we don't want to make a copy or to modify the value.



                Instead of returning zero when there are no members, perhaps we should return a NaN value (which is more consistent with arithmetic 0.0 / 0):



                if (numbers.empty())
                return std::numeric_limits<double>::quiet_NaN();


                This loop:



                double total = 0;
                for (int number : numbers)
                total += number;



                can be written (with #include <numeric>) as



                double total = std::accumulate(numbers.begin(), numbers.end(), 0.0);
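The 0.0 matters: std::accumulate does its arithmetic in the type of the initial value. With a plain 0, a vector of int would be summed in int (and could overflow). The sketch below uses a vector of double purely for illustration, since truncation makes the difference easy to see; the function names are hypothetical:

```cpp
#include <numeric>
#include <vector>

// The accumulator has the type of the initial value (third argument).
double sumWithDoubleInit(const std::vector<double>& v)
{
    return std::accumulate(v.begin(), v.end(), 0.0);  // accumulates in double
}

double sumWithIntInit(const std::vector<double>& v)
{
    // Accumulates in int: the running total is truncated after every addition.
    return std::accumulate(v.begin(), v.end(), 0);
}
```

For the input {0.5, 0.5, 0.5} the first function returns 1.5, while the second returns 0 because each partial sum is converted back to int.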


                computeVariance()



                We need to be clear which variance (sample or population) we're returning. We're also missing a size check similar to that for the mean.



                Apart from that, the comments above for computeMean() are relevant:



                double computeSampleVariance(const double mean, const std::vector<int>& numbers)
                {
                    if (numbers.size() <= 1u)
                        return std::numeric_limits<double>::quiet_NaN();

                    auto add_square = [mean](double sum, int i)
                        {
                            auto d = i - mean;
                            return sum + d*d;
                        };
                    double total = std::accumulate(numbers.begin(), numbers.end(), 0.0, add_square);
                    return total / (numbers.size() - 1);
                }



                Single-pass algorithm



                It is possible to compute the mean and variance in a single pass, but not with the (exact-arithmetic) formulas you likely learnt in high school, which lose precision when used with the inexact floating-point types available to us. The topic is too deep for this review, but if you research Welford's Algorithm, you will find reference implementations to guide you.



                That said, for your purposes, the straightforward two-pass algorithm is probably appropriate, and it's easy to read and understand, so I wouldn't recommend changing it unless you reach a point where your input set becomes too large to hold in a vector (and even then, only if you can't read the file multiple times).




                My version



                #include <algorithm>
                #include <cmath>
                #include <fstream>
                #include <iostream>
                #include <iterator>
                #include <limits>
                #include <numeric>
                #include <string>
                #include <vector>

                std::vector<int> readFile(const std::string& filePath)
                {
                    std::ifstream in_file(filePath);
                    // Reads whitespace-separated integers until EOF or the first parse failure
                    std::istream_iterator<int> start{in_file}, end;
                    std::vector<int> numbers;
                    std::copy(start, end, std::back_inserter(numbers));
                    return numbers;
                }

                double computeMean(const std::vector<int>& numbers)
                {
                    if (numbers.empty())
                        return std::numeric_limits<double>::quiet_NaN();

                    return std::accumulate(numbers.begin(), numbers.end(), 0.0) / numbers.size();
                }

                double computeSampleVariance(const double mean, const std::vector<int>& numbers)
                {
                    if (numbers.size() <= 1u)
                        return std::numeric_limits<double>::quiet_NaN();

                    // Adds the squared deviation of i from the mean to the running sum
                    auto const add_square = [mean](double sum, int i) {
                        auto d = i - mean;
                        return sum + d*d;
                    };
                    double total = std::accumulate(numbers.begin(), numbers.end(), 0.0, add_square);
                    return total / (numbers.size() - 1);
                }

                int main()
                {
                #ifdef TEST
                    const std::vector<int> numbers = { -2, -1, 1, 2, 100000-2, 100000-1, 100000+1, 100000+2 };
                #else
                    std::cout << "Please enter the file path :" << std::endl;
                    std::string filePath;
                    std::cin >> filePath;
                    const std::vector<int> numbers = readFile(filePath);
                #endif

                    double mean = computeMean(numbers);
                    double variance = computeSampleVariance(mean, numbers);
                    double standardDeviation = std::sqrt(variance);

                    std::cout << numbers.size() << " numbers : ";
                    auto separator = "";
                    for (int number: numbers) {
                        std::cout << separator << number;
                        separator = ", ";
                    }
                    std::cout << std::endl;

                    // One statistic per line
                    std::cout << "Mean : " << std::to_string(mean) << '\n'
                              << "Variance : " << std::to_string(variance) << '\n'
                              << "Standard Deviation : " << std::to_string(standardDeviation) << std::endl;
                    return 0;
                }














                  edited Jun 6 at 13:27


























                  answered Jan 30 at 9:38









                  Toby Speight














                      You could compute both the mean and the variance in a single pass, which removes the need to store the numbers.



                      $\begin{align}
                      \sigma^2(x) \times (n-1) &= \sum (\overline{x} - x_i)^2 \\
                      &= \sum \left( \overline{x}^2 - 2\,\overline{x} \times x_i + x_i^2 \right) \\
                      &= n \times \overline{x}^2 - 2\,\overline{x} \times \sum x_i + \sum x_i^2 \\
                      &= n \times \left( \sum x_i / n \right)^2 - 2 \left( \sum x_i \right) / n \times \sum x_i + \sum x_i^2 \\
                      &= \frac{1}{n} \left( \sum x_i \right)^2 - \frac{2}{n} \left( \sum x_i \right)^2 + \sum x_i^2 \\
                      &= -\frac{1}{n} \left( \sum x_i \right)^2 + \sum x_i^2
                      \end{align}$



                      So all you need is the sum of the values and the sum of the squares of the values.
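As a quick sanity check of the identity derived above, the two forms can be compared on a small input where every intermediate value is exactly representable. The function names below are hypothetical, and both functions assume a non-empty vector:

```cpp
#include <vector>

// Sum of squared deviations computed directly (two passes over the data).
double squaredDeviationsTwoPass(const std::vector<int>& xs)
{
    double sum = 0.0;
    for (int x : xs)
        sum += x;
    const double mean = sum / xs.size();

    double total = 0.0;
    for (int x : xs)
        total += (mean - x) * (mean - x);
    return total;
}

// The same quantity from the last line of the derivation (one pass).
double squaredDeviationsOnePass(const std::vector<int>& xs)
{
    double sum = 0.0;
    double sumSquares = 0.0;
    for (int x : xs) {
        sum += x;
        sumSquares += static_cast<double>(x) * x;
    }
    return sumSquares - sum * sum / xs.size();
}
```

For the values 1, 2, 3, 4 both functions give 5 (the sum is 10 and the sum of squares is 30, so 30 - 100/4 = 5). Dividing by n - 1 = 3 then gives the sample variance.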



                      However, if the values are large and the variance is small, you can run into numerical stability issues. Instead, you can subtract a constant (the first value, say) from each value before accumulating; shifting every value by the same constant leaves the variance unchanged.



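                      #include <utility>
                      #include <vector>

                      // Returns {mean, sample variance}. Precondition: numbers holds at least two values.
                      std::pair<float, float> computeVarianceAndMean(std::vector<int> const& numbers)
                      {
                          float sum = 0;
                          float sumAdjusted = 0;
                          float sumSquares = 0;
                          int constant = numbers.front();   // offset that keeps the shifted values small
                          for (int number : numbers)
                          {
                              sum += number;
                              sumAdjusted += number - constant;
                              sumSquares += (number - constant) * (number - constant);
                          }
                          float average = sum / numbers.size();
                          // From the derivation: sum of squared deviations = sumSquares - sumAdjusted^2 / n
                          float variance = (sumSquares - sumAdjusted * sumAdjusted / numbers.size()) / (numbers.size() - 1);
                          return std::make_pair(average, variance);
                      }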

























                      • That method has serious problems with numerical stability; see the resources mentioned in the comments to "A bag of numbers in C++ for constant time statistics queries - follow-up 2". Consider using Welford's method instead.
                        – Toby Speight
                        Jan 19 at 16:33










                      • This method is useful only if variance is large with respect to mean. It is possible to make this assumption before seeing the data if the data is 8-bit or 16-bit integers. For anything else, you can get into deep trouble. And when you do, you might not even notice...
                        – Cris Luengo
                        Jan 29 at 20:23










                      • @CrisLuengo I moved the mean closer to zero using the first element as offset.
                        – ratchet freak
                        Jan 29 at 20:39






                      • In that case you should compute the mean as constant + sumAdjusted/numbers.size(). But it would still be better to use the algorithm as described by Welford in the links referenced above. Your algorithm would have a problem if the first element is an outlier.
                        – Cris Luengo
                        Jan 29 at 22:00






















                      edited Jan 26 at 0:31









                      200_success












                      answered Jan 19 at 12:48









                      ratchet freak













                      @CrisLuengo I moved the mean closer to zero using the first element as offset.
                      – ratchet freak
                      Jan 29 at 20:39




                      @CrisLuengo I moved the mean closer to zero using the first element as offset.
                      – ratchet freak
                      Jan 29 at 20:39




                      1




                      1




                      In that case you should compute the mean as constant + sumAdjusted/numbers.size(). But it would still be better to use the algorithm as described by Weldorf in the links referenced above. Your algorithm would have a problem if the first element is an outlier.
                      – Cris Luengo
                      Jan 29 at 22:00




                      In that case you should compute the mean as constant + sumAdjusted/numbers.size(). But it would still be better to use the algorithm as described by Weldorf in the links referenced above. Your algorithm would have a problem if the first element is an outlier.
                      – Cris Luengo
                      Jan 29 at 22:00










                      up vote
                      4
                      down vote













                      In computeMean and computeVariance, you can pass the numbers vector in by const &. This avoids making a copy of the vector.
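As a minimal sketch of that change, here is the question's `computeMean` with only the parameter type altered. The body follows the original, and the explicit cast is an addition for clarity.

```cpp
#include <vector>

// The vector is taken by const reference: no copy is made,
// and the function cannot modify the caller's data.
float computeMean(const std::vector<int>& numbers) {
    if (numbers.empty()) return 0;

    float total = 0;
    for (int number : numbers) {
        total += number;
    }
    return total / static_cast<float>(numbers.size());
}
```

Call sites stay the same: `computeMean(numbers)` still compiles unchanged.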



                      If you only have one number in your file, computeVariance will return a NaN, because it'll divide 0 by 0. There are two different ways to calculate variance: the population variance divides by n, while the sample variance (which your code computes) divides by n - 1. Are you using the correct one? This will have an effect on the computed standard deviation.
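One possible way to guard against the 0/0 case and make the choice of divisor explicit is sketched below. The `sample` parameter is hypothetical, and returning 0 when there are too few values is an assumption that mirrors what `computeMean` does for an empty vector. Returning NaN deliberately or reporting an error would be equally defensible.

```cpp
#include <cstddef>
#include <vector>

// sample == true  -> sample variance, divides by n - 1 (what the question does)
// sample == false -> population variance, divides by n
float computeVariance(float mean, const std::vector<int>& numbers, bool sample = true) {
    const std::size_t n = numbers.size();
    const std::size_t minCount = sample ? 2 : 1;
    if (n < minCount) return 0;  // assumption: too few values yields 0 instead of NaN

    float result = 0;
    for (int number : numbers) {
        result += (number - mean) * (number - mean);
    }
    const std::size_t divisor = sample ? n - 1 : n;
    return result / static_cast<float>(divisor);
}
```

The early return also covers the empty vector, where `numbers.size() - 1` would otherwise wrap around because `size()` is unsigned.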



                      When you output all the numbers, don't build a string. Just output the numbers. The comma can be handled by outputting it before the number, if the number isn't the first one.



                      bool first = true;
                      for (auto number: numbers) {
                          // print the separator before every number except the first
                          if (!first) std::cout << ", ";
                          first = false;
                          std::cout << number;
                      }
                      std::cout << std::endl;


                      std::endl flushes the output stream; since more output follows right away, you could use '\n' instead.



                      When outputting the results, just output the number; don't convert it to a string first.

















                          answered Jan 26 at 0:57









                          1201ProgramAlarm























                               
