Implementing and benchmarking Radix Sort on unsigned integers

.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty margin-bottom:0;

up vote
8
down vote

favorite

Introduction

I implemented a restricted version of the sorting algorithm, the one that works only with unsigned integers. I wanted to try using std::stable_partition with bitmask, as I chose base 2. The reason partition works here is because there are only two "bins", thus it is somewhat like partitioning. I mandated customizing the maximum power of two that is encountered in the sequence.

Code

#include <iterator>
#include <algorithm>

namespace shino

 template <typename BidirIterator>
 void radix_sort(BidirIterator first, BidirIterator last, unsigned upper_power_bound)
 
 using category = typename std::iterator_traits<BidirIterator>::iterator_category;
 using value_type = typename std::iterator_traits<BidirIterator>::value_type;
 static_assert(std::is_base_of_v<std::bidirectional_iterator_tag, category>,
 "At least bidirectional iterators are required");
 static_assert(std::is_integral_v<value_type> and std::is_unsigned_v<value_type>,
 "Only unsigned integers are supported for radix sort");
 value_type mask = 1;
 for (auto power = 0u; power <= upper_power_bound; ++power)
 
 auto predicate = [mask](const auto& value)
 
 return !(value & mask);
 ;
 std::stable_partition(first, last, predicate);
 mask <<= 1;

Weird results

The algorithm has a lottle of wiggle regarding the constant factor, though it is linear. The picture below is for highest power of 16, x-axis is element count, and y-axis is nanoseconds (sorry I'm not on good terms with gnuplot). So it takes ~1.6 milliseconds to sort 30'000 elements, which is I think quite a lot for linear sorting algorithm.

Performance of radix sort

I believe those spikes are context switches, but I couldn't verify that. It seems like there is no way to reliably track them and correspond to the spike.

Full code

#include <vector>
#include <algorithm>
#include <random>

std::vector<unsigned> generate_vector(std::size_t size, unsigned power_limit)

 auto generator = [power_limit]
 
 static std::mt19937 twister;
 static std::uniform_int_distribution<unsigned > dist0u, (1u << power_limit);
 return dist(twister);
 ;

 std::vector<unsigned> v(size);
 std::generate(v.begin(), v.end(), generator);

 return v;


#include <iterator>
#include <algorithm>

namespace shino

 template <typename BidirIterator>
 void radix_sort(BidirIterator first, BidirIterator last, unsigned upper_power_bound)
 
 using category = typename std::iterator_traits<BidirIterator>::iterator_category;
 using value_type = typename std::iterator_traits<BidirIterator>::value_type;
 static_assert(std::is_base_of_v<std::bidirectional_iterator_tag, category>,
 "At least bidirectional iterators are required");
 static_assert(std::is_integral_v<value_type> and std::is_unsigned_v<value_type>,
 "Only unsigned integers are supported for radix sort");
 value_type mask = 1;
 for (auto power = 0u; power <= upper_power_bound; ++power)
 
 auto predicate = [mask](const auto& value)
 
 return !(value & mask);
 ;
 std::stable_partition(first, last, predicate);
 mask <<= 1;
 
 


#include <chrono>
#include <atomic>

namespace shino

 template <typename Clock = std::chrono::high_resolution_clock>
 class stopwatch
 
 const typename Clock::time_point start_point;
 public:
 stopwatch() :
 start_point(Clock::now())
 

 template <typename Rep = typename Clock::duration::rep, typename Units = typename Clock::duration>
 Rep elapsed_time() const
 
 std::atomic_thread_fence(std::memory_order_relaxed);
 auto counted_time = std::chrono::duration_cast<Units>(Clock::now() - start_point).count();
 std::atomic_thread_fence(std::memory_order_relaxed);
 return static_cast<Rep>(counted_time);
 
 ;

 using precise_stopwatch = stopwatch<>;
 using system_stopwatch = stopwatch<std::chrono::system_clock>;
 using monotonic_stopwatch = stopwatch<std::chrono::steady_clock>;


#include <stdexcept>
#include <fstream>

int main()

 std::ofstream measurings("measurings.csv");
 if (!measurings)
 throw std::runtime_error("measurings file opening failed");

 for (unsigned power = 4; power <= 16; ++power)
 
 for (auto size = 0ull; size <= 20'000; ++size)
 
 std::cout << "radix sorting vector of size " << size << 'n';
 auto input = generate_vector(size, power);
 shino::precise_stopwatch stopwatch;
 shino::radix_sort(input.begin(), input.end(), power);
 auto elapsed_time = stopwatch.elapsed_time();
 measurings << power << ',' << size << ',' << elapsed_time << 'n';
 if (not std::is_sorted(input.begin(), input.end()))
 throw std::logic_error("radix sort doesn't work correctly on size " + std::to_string(size));

The code records performance and checks if sort actually worked correctly. So if you make any changes, just run the code to test it. stopwatch code is taken from here. In fact, I believe it deserves a review on its own.

I couldn't get gnuplot to plot 3d graphs, sorry no pics for the changes related to maximum power.

Concerns

Is algorithm implemented well?

Are there points to improve performance? Make a better use of standard library?

Will the inner lambda cause code bloat?

Lambdas are anonymous classes, so will be instantiated for every version of radix sort itself. I'm unsure if using lambda was a wise choice.

Anything else that comes to mind

edited Jun 23 at 7:35

asked Jun 22 at 22:53

Incomputable

5,99721144

1

One issue is when you calculate the elapsed time. This should be done right after the sorting is done, before you start doing any output. That output will have variable amounts of additional time, depending on the internals of ofstream and in particular when it needs to stop and flush the data buffer to disk.
â€“Â 1201ProgramAlarm
Jun 23 at 1:42

@1201ProgramAlarm, thanks, fixed. I guess I shouldn't write algorithms after 20 hours of no sleep...
â€“Â Incomputable
Jun 23 at 7:36

Does the stopwatch class take efforts to prevent code movement during optimization and re-ordering during execution? It could be inaccurate on the scale of a few hundred instructions.
â€“Â JDÃ…Â‚ugosz
Jun 23 at 9:23

@JDÃ…Â‚ugosz, I believe it does. IÃ¢Â€Â™m not sure though.
â€“Â Incomputable
Jun 23 at 10:34

Could I get a copy of your cvs file please? :)
â€“Â Rakete1111
Jun 26 at 15:55

Â |Â
show 2 more comments

up vote
8
down vote

favorite

Introduction

Code

#include <iterator>
#include <algorithm>

namespace shino

 template <typename BidirIterator>
 void radix_sort(BidirIterator first, BidirIterator last, unsigned upper_power_bound)
 
 using category = typename std::iterator_traits<BidirIterator>::iterator_category;
 using value_type = typename std::iterator_traits<BidirIterator>::value_type;
 static_assert(std::is_base_of_v<std::bidirectional_iterator_tag, category>,
 "At least bidirectional iterators are required");
 static_assert(std::is_integral_v<value_type> and std::is_unsigned_v<value_type>,
 "Only unsigned integers are supported for radix sort");
 value_type mask = 1;
 for (auto power = 0u; power <= upper_power_bound; ++power)
 
 auto predicate = [mask](const auto& value)
 
 return !(value & mask);
 ;
 std::stable_partition(first, last, predicate);
 mask <<= 1;

Weird results

Performance of radix sort

I believe those spikes are context switches, but I couldn't verify that. It seems like there is no way to reliably track them and correspond to the spike.

Full code

#include <vector>
#include <algorithm>
#include <random>

std::vector<unsigned> generate_vector(std::size_t size, unsigned power_limit)

 auto generator = [power_limit]
 
 static std::mt19937 twister;
 static std::uniform_int_distribution<unsigned > dist0u, (1u << power_limit);
 return dist(twister);
 ;

 std::vector<unsigned> v(size);
 std::generate(v.begin(), v.end(), generator);

 return v;


#include <iterator>
#include <algorithm>

namespace shino

 template <typename BidirIterator>
 void radix_sort(BidirIterator first, BidirIterator last, unsigned upper_power_bound)
 
 using category = typename std::iterator_traits<BidirIterator>::iterator_category;
 using value_type = typename std::iterator_traits<BidirIterator>::value_type;
 static_assert(std::is_base_of_v<std::bidirectional_iterator_tag, category>,
 "At least bidirectional iterators are required");
 static_assert(std::is_integral_v<value_type> and std::is_unsigned_v<value_type>,
 "Only unsigned integers are supported for radix sort");
 value_type mask = 1;
 for (auto power = 0u; power <= upper_power_bound; ++power)
 
 auto predicate = [mask](const auto& value)
 
 return !(value & mask);
 ;
 std::stable_partition(first, last, predicate);
 mask <<= 1;
 
 


#include <chrono>
#include <atomic>

namespace shino

 template <typename Clock = std::chrono::high_resolution_clock>
 class stopwatch
 
 const typename Clock::time_point start_point;
 public:
 stopwatch() :
 start_point(Clock::now())
 

 template <typename Rep = typename Clock::duration::rep, typename Units = typename Clock::duration>
 Rep elapsed_time() const
 
 std::atomic_thread_fence(std::memory_order_relaxed);
 auto counted_time = std::chrono::duration_cast<Units>(Clock::now() - start_point).count();
 std::atomic_thread_fence(std::memory_order_relaxed);
 return static_cast<Rep>(counted_time);
 
 ;

 using precise_stopwatch = stopwatch<>;
 using system_stopwatch = stopwatch<std::chrono::system_clock>;
 using monotonic_stopwatch = stopwatch<std::chrono::steady_clock>;


#include <stdexcept>
#include <fstream>

int main()

 std::ofstream measurings("measurings.csv");
 if (!measurings)
 throw std::runtime_error("measurings file opening failed");

 for (unsigned power = 4; power <= 16; ++power)
 
 for (auto size = 0ull; size <= 20'000; ++size)
 
 std::cout << "radix sorting vector of size " << size << 'n';
 auto input = generate_vector(size, power);
 shino::precise_stopwatch stopwatch;
 shino::radix_sort(input.begin(), input.end(), power);
 auto elapsed_time = stopwatch.elapsed_time();
 measurings << power << ',' << size << ',' << elapsed_time << 'n';
 if (not std::is_sorted(input.begin(), input.end()))
 throw std::logic_error("radix sort doesn't work correctly on size " + std::to_string(size));

I couldn't get gnuplot to plot 3d graphs, sorry no pics for the changes related to maximum power.

Concerns

Is algorithm implemented well?

Are there points to improve performance? Make a better use of standard library?

Will the inner lambda cause code bloat?

Lambdas are anonymous classes, so will be instantiated for every version of radix sort itself. I'm unsure if using lambda was a wise choice.

Anything else that comes to mind

edited Jun 23 at 7:35

asked Jun 22 at 22:53

Incomputable

5,99721144

1

One issue is when you calculate the elapsed time. This should be done right after the sorting is done, before you start doing any output. That output will have variable amounts of additional time, depending on the internals of ofstream and in particular when it needs to stop and flush the data buffer to disk.
â€“Â 1201ProgramAlarm
Jun 23 at 1:42

@1201ProgramAlarm, thanks, fixed. I guess I shouldn't write algorithms after 20 hours of no sleep...
â€“Â Incomputable
Jun 23 at 7:36

Does the stopwatch class take efforts to prevent code movement during optimization and re-ordering during execution? It could be inaccurate on the scale of a few hundred instructions.
â€“Â JDÃ…Â‚ugosz
Jun 23 at 9:23

@JDÃ…Â‚ugosz, I believe it does. IÃ¢Â€Â™m not sure though.
â€“Â Incomputable
Jun 23 at 10:34

Could I get a copy of your cvs file please? :)
â€“Â Rakete1111
Jun 26 at 15:55

Â |Â
show 2 more comments

up vote
8
down vote

favorite

Introduction

Code

#include <iterator>
#include <algorithm>

namespace shino

 template <typename BidirIterator>
 void radix_sort(BidirIterator first, BidirIterator last, unsigned upper_power_bound)
 
 using category = typename std::iterator_traits<BidirIterator>::iterator_category;
 using value_type = typename std::iterator_traits<BidirIterator>::value_type;
 static_assert(std::is_base_of_v<std::bidirectional_iterator_tag, category>,
 "At least bidirectional iterators are required");
 static_assert(std::is_integral_v<value_type> and std::is_unsigned_v<value_type>,
 "Only unsigned integers are supported for radix sort");
 value_type mask = 1;
 for (auto power = 0u; power <= upper_power_bound; ++power)
 
 auto predicate = [mask](const auto& value)
 
 return !(value & mask);
 ;
 std::stable_partition(first, last, predicate);
 mask <<= 1;

Weird results

Performance of radix sort

I believe those spikes are context switches, but I couldn't verify that. It seems like there is no way to reliably track them and correspond to the spike.

Full code

#include <vector>
#include <algorithm>
#include <random>

std::vector<unsigned> generate_vector(std::size_t size, unsigned power_limit)

 auto generator = [power_limit]
 
 static std::mt19937 twister;
 static std::uniform_int_distribution<unsigned > dist0u, (1u << power_limit);
 return dist(twister);
 ;

 std::vector<unsigned> v(size);
 std::generate(v.begin(), v.end(), generator);

 return v;


#include <iterator>
#include <algorithm>

namespace shino

 template <typename BidirIterator>
 void radix_sort(BidirIterator first, BidirIterator last, unsigned upper_power_bound)
 
 using category = typename std::iterator_traits<BidirIterator>::iterator_category;
 using value_type = typename std::iterator_traits<BidirIterator>::value_type;
 static_assert(std::is_base_of_v<std::bidirectional_iterator_tag, category>,
 "At least bidirectional iterators are required");
 static_assert(std::is_integral_v<value_type> and std::is_unsigned_v<value_type>,
 "Only unsigned integers are supported for radix sort");
 value_type mask = 1;
 for (auto power = 0u; power <= upper_power_bound; ++power)
 
 auto predicate = [mask](const auto& value)
 
 return !(value & mask);
 ;
 std::stable_partition(first, last, predicate);
 mask <<= 1;
 
 


#include <chrono>
#include <atomic>

namespace shino

 template <typename Clock = std::chrono::high_resolution_clock>
 class stopwatch
 
 const typename Clock::time_point start_point;
 public:
 stopwatch() :
 start_point(Clock::now())
 

 template <typename Rep = typename Clock::duration::rep, typename Units = typename Clock::duration>
 Rep elapsed_time() const
 
 std::atomic_thread_fence(std::memory_order_relaxed);
 auto counted_time = std::chrono::duration_cast<Units>(Clock::now() - start_point).count();
 std::atomic_thread_fence(std::memory_order_relaxed);
 return static_cast<Rep>(counted_time);
 
 ;

 using precise_stopwatch = stopwatch<>;
 using system_stopwatch = stopwatch<std::chrono::system_clock>;
 using monotonic_stopwatch = stopwatch<std::chrono::steady_clock>;


#include <stdexcept>
#include <fstream>

int main()

 std::ofstream measurings("measurings.csv");
 if (!measurings)
 throw std::runtime_error("measurings file opening failed");

 for (unsigned power = 4; power <= 16; ++power)
 
 for (auto size = 0ull; size <= 20'000; ++size)
 
 std::cout << "radix sorting vector of size " << size << 'n';
 auto input = generate_vector(size, power);
 shino::precise_stopwatch stopwatch;
 shino::radix_sort(input.begin(), input.end(), power);
 auto elapsed_time = stopwatch.elapsed_time();
 measurings << power << ',' << size << ',' << elapsed_time << 'n';
 if (not std::is_sorted(input.begin(), input.end()))
 throw std::logic_error("radix sort doesn't work correctly on size " + std::to_string(size));

I couldn't get gnuplot to plot 3d graphs, sorry no pics for the changes related to maximum power.

Concerns

Is algorithm implemented well?

Are there points to improve performance? Make a better use of standard library?

Will the inner lambda cause code bloat?

Lambdas are anonymous classes, so will be instantiated for every version of radix sort itself. I'm unsure if using lambda was a wise choice.

Anything else that comes to mind

edited Jun 23 at 7:35

asked Jun 22 at 22:53

Incomputable

5,99721144

Introduction

Code

#include <iterator>
#include <algorithm>

namespace shino

 template <typename BidirIterator>
 void radix_sort(BidirIterator first, BidirIterator last, unsigned upper_power_bound)
 
 using category = typename std::iterator_traits<BidirIterator>::iterator_category;
 using value_type = typename std::iterator_traits<BidirIterator>::value_type;
 static_assert(std::is_base_of_v<std::bidirectional_iterator_tag, category>,
 "At least bidirectional iterators are required");
 static_assert(std::is_integral_v<value_type> and std::is_unsigned_v<value_type>,
 "Only unsigned integers are supported for radix sort");
 value_type mask = 1;
 for (auto power = 0u; power <= upper_power_bound; ++power)
 
 auto predicate = [mask](const auto& value)
 
 return !(value & mask);
 ;
 std::stable_partition(first, last, predicate);
 mask <<= 1;

Weird results

Performance of radix sort

I believe those spikes are context switches, but I couldn't verify that. It seems like there is no way to reliably track them and correspond to the spike.

Full code

#include <vector>
#include <algorithm>
#include <random>

std::vector<unsigned> generate_vector(std::size_t size, unsigned power_limit)

 auto generator = [power_limit]
 
 static std::mt19937 twister;
 static std::uniform_int_distribution<unsigned > dist0u, (1u << power_limit);
 return dist(twister);
 ;

 std::vector<unsigned> v(size);
 std::generate(v.begin(), v.end(), generator);

 return v;


#include <iterator>
#include <algorithm>

namespace shino

 template <typename BidirIterator>
 void radix_sort(BidirIterator first, BidirIterator last, unsigned upper_power_bound)
 
 using category = typename std::iterator_traits<BidirIterator>::iterator_category;
 using value_type = typename std::iterator_traits<BidirIterator>::value_type;
 static_assert(std::is_base_of_v<std::bidirectional_iterator_tag, category>,
 "At least bidirectional iterators are required");
 static_assert(std::is_integral_v<value_type> and std::is_unsigned_v<value_type>,
 "Only unsigned integers are supported for radix sort");
 value_type mask = 1;
 for (auto power = 0u; power <= upper_power_bound; ++power)
 
 auto predicate = [mask](const auto& value)
 
 return !(value & mask);
 ;
 std::stable_partition(first, last, predicate);
 mask <<= 1;
 
 


#include <chrono>
#include <atomic>

namespace shino

 template <typename Clock = std::chrono::high_resolution_clock>
 class stopwatch
 
 const typename Clock::time_point start_point;
 public:
 stopwatch() :
 start_point(Clock::now())
 

 template <typename Rep = typename Clock::duration::rep, typename Units = typename Clock::duration>
 Rep elapsed_time() const
 
 std::atomic_thread_fence(std::memory_order_relaxed);
 auto counted_time = std::chrono::duration_cast<Units>(Clock::now() - start_point).count();
 std::atomic_thread_fence(std::memory_order_relaxed);
 return static_cast<Rep>(counted_time);
 
 ;

 using precise_stopwatch = stopwatch<>;
 using system_stopwatch = stopwatch<std::chrono::system_clock>;
 using monotonic_stopwatch = stopwatch<std::chrono::steady_clock>;


#include <stdexcept>
#include <fstream>

int main()

 std::ofstream measurings("measurings.csv");
 if (!measurings)
 throw std::runtime_error("measurings file opening failed");

 for (unsigned power = 4; power <= 16; ++power)
 
 for (auto size = 0ull; size <= 20'000; ++size)
 
 std::cout << "radix sorting vector of size " << size << 'n';
 auto input = generate_vector(size, power);
 shino::precise_stopwatch stopwatch;
 shino::radix_sort(input.begin(), input.end(), power);
 auto elapsed_time = stopwatch.elapsed_time();
 measurings << power << ',' << size << ',' << elapsed_time << 'n';
 if (not std::is_sorted(input.begin(), input.end()))
 throw std::logic_error("radix sort doesn't work correctly on size " + std::to_string(size));

I couldn't get gnuplot to plot 3d graphs, sorry no pics for the changes related to maximum power.

Concerns

Is algorithm implemented well?

Are there points to improve performance? Make a better use of standard library?

Will the inner lambda cause code bloat?

Lambdas are anonymous classes, so will be instantiated for every version of radix sort itself. I'm unsure if using lambda was a wise choice.

Anything else that comes to mind

edited Jun 23 at 7:35

asked Jun 22 at 22:53

Incomputable

5,99721144

edited Jun 23 at 7:35

asked Jun 22 at 22:53

Incomputable

5,99721144

asked Jun 22 at 22:53

Incomputable

5,99721144

asked Jun 22 at 22:53

Incomputable

5,99721144

1

One issue is when you calculate the elapsed time. This should be done right after the sorting is done, before you start doing any output. That output will have variable amounts of additional time, depending on the internals of ofstream and in particular when it needs to stop and flush the data buffer to disk.
â€“Â 1201ProgramAlarm
Jun 23 at 1:42

@1201ProgramAlarm, thanks, fixed. I guess I shouldn't write algorithms after 20 hours of no sleep...
â€“Â Incomputable
Jun 23 at 7:36

Does the stopwatch class take efforts to prevent code movement during optimization and re-ordering during execution? It could be inaccurate on the scale of a few hundred instructions.
â€“Â JDÃ…Â‚ugosz
Jun 23 at 9:23

@JDÃ…Â‚ugosz, I believe it does. IÃ¢Â€Â™m not sure though.
â€“Â Incomputable
Jun 23 at 10:34

Could I get a copy of your cvs file please? :)
â€“Â Rakete1111
Jun 26 at 15:55

Â |Â
show 2 more comments

1

One issue is when you calculate the elapsed time. This should be done right after the sorting is done, before you start doing any output. That output will have variable amounts of additional time, depending on the internals of ofstream and in particular when it needs to stop and flush the data buffer to disk.
â€“Â 1201ProgramAlarm
Jun 23 at 1:42

@1201ProgramAlarm, thanks, fixed. I guess I shouldn't write algorithms after 20 hours of no sleep...
â€“Â Incomputable
Jun 23 at 7:36

Does the stopwatch class take efforts to prevent code movement during optimization and re-ordering during execution? It could be inaccurate on the scale of a few hundred instructions.
â€“Â JDÃ…Â‚ugosz
Jun 23 at 9:23

@JDÃ…Â‚ugosz, I believe it does. IÃ¢Â€Â™m not sure though.
â€“Â Incomputable
Jun 23 at 10:34

Could I get a copy of your cvs file please? :)
â€“Â Rakete1111
Jun 26 at 15:55

One issue is when you calculate the elapsed time. This should be done right after the sorting is done, before you start doing any output. That output will have variable amounts of additional time, depending on the internals of ofstream and in particular when it needs to stop and flush the data buffer to disk.
â€“Â 1201ProgramAlarm
Jun 23 at 1:42

@1201ProgramAlarm, thanks, fixed. I guess I shouldn't write algorithms after 20 hours of no sleep...
â€“Â Incomputable
Jun 23 at 7:36

Does the stopwatch class take efforts to prevent code movement during optimization and re-ordering during execution? It could be inaccurate on the scale of a few hundred instructions.
â€“Â JDÃ…Â‚ugosz
Jun 23 at 9:23

@JDÃ…Â‚ugosz, I believe it does. IÃ¢Â€Â™m not sure though.
â€“Â Incomputable
Jun 23 at 10:34

Could I get a copy of your cvs file please? :)
â€“Â Rakete1111
Jun 26 at 15:55

Â |Â
show 2 more comments

1 Answer
1

active

oldest

votes

up vote
5
down vote

accepted

I'll focus just on the algorithm (not the timing code), and the questions.

static_assert(std::is_base_of_v<std::bidirectional_iterator_tag, category>,
 "At least bidirectional iterators are required");
static_assert(std::is_integral_v<value_type> and std::is_unsigned_v<value_type>,
 "Only unsigned integers are supported for radix sort");

It seems a bit superfluous to say "static_assert(is_foo, "must be foo")". The assertion predicates are clear enough on their own to not require additional explanation. Also, I wouldn't recommend combining requirements. It makes it harder to diagnose exactly which assertion failed:

static_assert(std::is_base_of_v<std::bidirectional_iterator_tag, category>);
static_assert(std::is_integral_v<value_type>);
static_assert(std::is_unsigned_v<value_type>);

Down to the lambda:

auto predicate = [mask](const auto& value)

 return !(value & mask);
;

Unless you're planning to support non-fundamental integer types, taking the argument by const& is probably an unnecessary pessimization. Even if it's not a built-in type, if the binary & operator and conversion to bool are implemented sanely, the lambda's frame can be basically optimized away. It's nice to be able to do everything in-line:

std::stable_partition(first, last, [mask](auto v) return !(v & mask); );

As for the questions:

Is it implemented well? I'd say so. I can't think of many improvements/extensions I'd make. Maybe a check to make sure upper_power_bound is less than the number of bits in value_type so you're not just spinning your wheels... but that could be tricky to pull off generally if you're supporting non-builtin integer types (maybe check to see if mask is zero, assuming the 1 bit drops off the end into the ether if you shift it too far?). Maybe a range version, but that's a pretty trivial addition once the iterator version's working.

Will the inner lambda cause code bloat? No. On any compiler worth mentioning and on any optimization level worth caring about, it will be inlined - likely to a single instruction, maybe two (assuming a built-in unsigned type). Modern compilers are terrifyingly good at optimizing lambdas (or anything where all the code is right there in front of them, really).

Even if it weren't inlined to effectively nothing, the lambda would be dwarfed by std::stable_partition()'s code. Like by two or three orders of magnitude, easy. It's totally not something to worry about.

answered Jun 23 at 11:08

indi

3,021417

add a commentÂ |Â

Your Answer

StackExchange.ifUsing("editor", function ()
return StackExchange.using("mathjaxEditing", function ()
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
);
);
, "mathjax-editing");

StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "196"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
convertImagesToLinks: false,
noModals: false,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f197087%2fimplementing-and-benchmarking-radix-sort-on-unsigned-integers%23new-answer', 'question_page');

);

Post as a guest

Name

1 Answer
1

active

oldest

votes

1 Answer
1

active

oldest

votes

up vote
5
down vote

accepted

I'll focus just on the algorithm (not the timing code), and the questions.

static_assert(std::is_base_of_v<std::bidirectional_iterator_tag, category>,
 "At least bidirectional iterators are required");
static_assert(std::is_integral_v<value_type> and std::is_unsigned_v<value_type>,
 "Only unsigned integers are supported for radix sort");

static_assert(std::is_base_of_v<std::bidirectional_iterator_tag, category>);
static_assert(std::is_integral_v<value_type>);
static_assert(std::is_unsigned_v<value_type>);

Down to the lambda:

auto predicate = [mask](const auto& value)

 return !(value & mask);
;

std::stable_partition(first, last, [mask](auto v) return !(v & mask); );

As for the questions:

answered Jun 23 at 11:08

indi

3,021417

add a commentÂ |Â

up vote
5
down vote

accepted

I'll focus just on the algorithm (not the timing code), and the questions.

static_assert(std::is_base_of_v<std::bidirectional_iterator_tag, category>,
 "At least bidirectional iterators are required");
static_assert(std::is_integral_v<value_type> and std::is_unsigned_v<value_type>,
 "Only unsigned integers are supported for radix sort");

static_assert(std::is_base_of_v<std::bidirectional_iterator_tag, category>);
static_assert(std::is_integral_v<value_type>);
static_assert(std::is_unsigned_v<value_type>);

Down to the lambda:

auto predicate = [mask](const auto& value)

 return !(value & mask);
;

std::stable_partition(first, last, [mask](auto v) return !(v & mask); );

As for the questions:

answered Jun 23 at 11:08

indi

3,021417

add a commentÂ |Â

up vote
5
down vote

accepted

I'll focus just on the algorithm (not the timing code), and the questions.

static_assert(std::is_base_of_v<std::bidirectional_iterator_tag, category>,
 "At least bidirectional iterators are required");
static_assert(std::is_integral_v<value_type> and std::is_unsigned_v<value_type>,
 "Only unsigned integers are supported for radix sort");

static_assert(std::is_base_of_v<std::bidirectional_iterator_tag, category>);
static_assert(std::is_integral_v<value_type>);
static_assert(std::is_unsigned_v<value_type>);

Down to the lambda:

auto predicate = [mask](const auto& value)

 return !(value & mask);
;

std::stable_partition(first, last, [mask](auto v) return !(v & mask); );

As for the questions:

answered Jun 23 at 11:08

indi

3,021417

I'll focus just on the algorithm (not the timing code), and the questions.

static_assert(std::is_base_of_v<std::bidirectional_iterator_tag, category>,
 "At least bidirectional iterators are required");
static_assert(std::is_integral_v<value_type> and std::is_unsigned_v<value_type>,
 "Only unsigned integers are supported for radix sort");

static_assert(std::is_base_of_v<std::bidirectional_iterator_tag, category>);
static_assert(std::is_integral_v<value_type>);
static_assert(std::is_unsigned_v<value_type>);

Down to the lambda:

auto predicate = [mask](const auto& value)

 return !(value & mask);
;

std::stable_partition(first, last, [mask](auto v) return !(v & mask); );

As for the questions:

answered Jun 23 at 11:08

indi

3,021417

answered Jun 23 at 11:08

indi

3,021417

answered Jun 23 at 11:08

indi

3,021417

answered Jun 23 at 11:08

indi

3,021417

add a commentÂ |Â

draft saved

draft discarded

draft saved

draft discarded

Post as a guest

Name

搜尋此網誌

trjhtr