Implementing and benchmarking Radix Sort on unsigned integers
Clash Royale CLAN TAG#URR8PPP
.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty margin-bottom:0;
up vote
8
down vote
favorite
Introduction
I implemented a restricted version of the sorting algorithm, the one that works only with unsigned integers. I wanted to try using std::stable_partition
with bitmask, as I chose base 2. The reason partition works here is because there are only two "bins", thus it is somewhat like partitioning. I mandated customizing the maximum power of two that is encountered in the sequence.
Code
#include <iterator>
#include <algorithm>
namespace shino
template <typename BidirIterator>
void radix_sort(BidirIterator first, BidirIterator last, unsigned upper_power_bound)
using category = typename std::iterator_traits<BidirIterator>::iterator_category;
using value_type = typename std::iterator_traits<BidirIterator>::value_type;
static_assert(std::is_base_of_v<std::bidirectional_iterator_tag, category>,
"At least bidirectional iterators are required");
static_assert(std::is_integral_v<value_type> and std::is_unsigned_v<value_type>,
"Only unsigned integers are supported for radix sort");
value_type mask = 1;
for (auto power = 0u; power <= upper_power_bound; ++power)
auto predicate = [mask](const auto& value)
return !(value & mask);
;
std::stable_partition(first, last, predicate);
mask <<= 1;
Weird results
The algorithm has a lottle of wiggle regarding the constant factor, though it is linear. The picture below is for highest power of 16, x-axis is element count, and y-axis is nanoseconds (sorry I'm not on good terms with gnuplot). So it takes ~1.6 milliseconds to sort 30'000 elements, which is I think quite a lot for linear sorting algorithm.
I believe those spikes are context switches, but I couldn't verify that. It seems like there is no way to reliably track them and correspond to the spike.
Full code
#include <vector>
#include <algorithm>
#include <random>
std::vector<unsigned> generate_vector(std::size_t size, unsigned power_limit)
auto generator = [power_limit]
static std::mt19937 twister;
static std::uniform_int_distribution<unsigned > dist0u, (1u << power_limit);
return dist(twister);
;
std::vector<unsigned> v(size);
std::generate(v.begin(), v.end(), generator);
return v;
#include <iterator>
#include <algorithm>
namespace shino
template <typename BidirIterator>
void radix_sort(BidirIterator first, BidirIterator last, unsigned upper_power_bound)
using category = typename std::iterator_traits<BidirIterator>::iterator_category;
using value_type = typename std::iterator_traits<BidirIterator>::value_type;
static_assert(std::is_base_of_v<std::bidirectional_iterator_tag, category>,
"At least bidirectional iterators are required");
static_assert(std::is_integral_v<value_type> and std::is_unsigned_v<value_type>,
"Only unsigned integers are supported for radix sort");
value_type mask = 1;
for (auto power = 0u; power <= upper_power_bound; ++power)
auto predicate = [mask](const auto& value)
return !(value & mask);
;
std::stable_partition(first, last, predicate);
mask <<= 1;
#include <chrono>
#include <atomic>
namespace shino
template <typename Clock = std::chrono::high_resolution_clock>
class stopwatch
const typename Clock::time_point start_point;
public:
stopwatch() :
start_point(Clock::now())
template <typename Rep = typename Clock::duration::rep, typename Units = typename Clock::duration>
Rep elapsed_time() const
std::atomic_thread_fence(std::memory_order_relaxed);
auto counted_time = std::chrono::duration_cast<Units>(Clock::now() - start_point).count();
std::atomic_thread_fence(std::memory_order_relaxed);
return static_cast<Rep>(counted_time);
;
using precise_stopwatch = stopwatch<>;
using system_stopwatch = stopwatch<std::chrono::system_clock>;
using monotonic_stopwatch = stopwatch<std::chrono::steady_clock>;
#include <stdexcept>
#include <fstream>
int main()
std::ofstream measurings("measurings.csv");
if (!measurings)
throw std::runtime_error("measurings file opening failed");
for (unsigned power = 4; power <= 16; ++power)
for (auto size = 0ull; size <= 20'000; ++size)
std::cout << "radix sorting vector of size " << size << 'n';
auto input = generate_vector(size, power);
shino::precise_stopwatch stopwatch;
shino::radix_sort(input.begin(), input.end(), power);
auto elapsed_time = stopwatch.elapsed_time();
measurings << power << ',' << size << ',' << elapsed_time << 'n';
if (not std::is_sorted(input.begin(), input.end()))
throw std::logic_error("radix sort doesn't work correctly on size " + std::to_string(size));
The code records performance and checks if sort actually worked correctly. So if you make any changes, just run the code to test it. stopwatch
code is taken from here. In fact, I believe it deserves a review on its own.
I couldn't get gnuplot to plot 3d graphs, sorry no pics for the changes related to maximum power.
Concerns
- Is algorithm implemented well?
Are there points to improve performance? Make a better use of standard library?
- Will the inner lambda cause code bloat?
Lambdas are anonymous classes, so will be instantiated for every version of radix sort itself. I'm unsure if using lambda was a wise choice.
- Anything else that comes to mind
c++ sorting c++17 benchmarking
 |Â
show 2 more comments
up vote
8
down vote
favorite
Introduction
I implemented a restricted version of the sorting algorithm, the one that works only with unsigned integers. I wanted to try using std::stable_partition
with bitmask, as I chose base 2. The reason partition works here is because there are only two "bins", thus it is somewhat like partitioning. I mandated customizing the maximum power of two that is encountered in the sequence.
Code
#include <iterator>
#include <algorithm>
namespace shino
template <typename BidirIterator>
void radix_sort(BidirIterator first, BidirIterator last, unsigned upper_power_bound)
using category = typename std::iterator_traits<BidirIterator>::iterator_category;
using value_type = typename std::iterator_traits<BidirIterator>::value_type;
static_assert(std::is_base_of_v<std::bidirectional_iterator_tag, category>,
"At least bidirectional iterators are required");
static_assert(std::is_integral_v<value_type> and std::is_unsigned_v<value_type>,
"Only unsigned integers are supported for radix sort");
value_type mask = 1;
for (auto power = 0u; power <= upper_power_bound; ++power)
auto predicate = [mask](const auto& value)
return !(value & mask);
;
std::stable_partition(first, last, predicate);
mask <<= 1;
Weird results
The algorithm has a lottle of wiggle regarding the constant factor, though it is linear. The picture below is for highest power of 16, x-axis is element count, and y-axis is nanoseconds (sorry I'm not on good terms with gnuplot). So it takes ~1.6 milliseconds to sort 30'000 elements, which is I think quite a lot for linear sorting algorithm.
I believe those spikes are context switches, but I couldn't verify that. It seems like there is no way to reliably track them and correspond to the spike.
Full code
#include <vector>
#include <algorithm>
#include <random>
std::vector<unsigned> generate_vector(std::size_t size, unsigned power_limit)
auto generator = [power_limit]
static std::mt19937 twister;
static std::uniform_int_distribution<unsigned > dist0u, (1u << power_limit);
return dist(twister);
;
std::vector<unsigned> v(size);
std::generate(v.begin(), v.end(), generator);
return v;
#include <iterator>
#include <algorithm>
namespace shino
template <typename BidirIterator>
void radix_sort(BidirIterator first, BidirIterator last, unsigned upper_power_bound)
using category = typename std::iterator_traits<BidirIterator>::iterator_category;
using value_type = typename std::iterator_traits<BidirIterator>::value_type;
static_assert(std::is_base_of_v<std::bidirectional_iterator_tag, category>,
"At least bidirectional iterators are required");
static_assert(std::is_integral_v<value_type> and std::is_unsigned_v<value_type>,
"Only unsigned integers are supported for radix sort");
value_type mask = 1;
for (auto power = 0u; power <= upper_power_bound; ++power)
auto predicate = [mask](const auto& value)
return !(value & mask);
;
std::stable_partition(first, last, predicate);
mask <<= 1;
#include <chrono>
#include <atomic>
namespace shino
template <typename Clock = std::chrono::high_resolution_clock>
class stopwatch
const typename Clock::time_point start_point;
public:
stopwatch() :
start_point(Clock::now())
template <typename Rep = typename Clock::duration::rep, typename Units = typename Clock::duration>
Rep elapsed_time() const
std::atomic_thread_fence(std::memory_order_relaxed);
auto counted_time = std::chrono::duration_cast<Units>(Clock::now() - start_point).count();
std::atomic_thread_fence(std::memory_order_relaxed);
return static_cast<Rep>(counted_time);
;
using precise_stopwatch = stopwatch<>;
using system_stopwatch = stopwatch<std::chrono::system_clock>;
using monotonic_stopwatch = stopwatch<std::chrono::steady_clock>;
#include <stdexcept>
#include <fstream>
int main()
std::ofstream measurings("measurings.csv");
if (!measurings)
throw std::runtime_error("measurings file opening failed");
for (unsigned power = 4; power <= 16; ++power)
for (auto size = 0ull; size <= 20'000; ++size)
std::cout << "radix sorting vector of size " << size << 'n';
auto input = generate_vector(size, power);
shino::precise_stopwatch stopwatch;
shino::radix_sort(input.begin(), input.end(), power);
auto elapsed_time = stopwatch.elapsed_time();
measurings << power << ',' << size << ',' << elapsed_time << 'n';
if (not std::is_sorted(input.begin(), input.end()))
throw std::logic_error("radix sort doesn't work correctly on size " + std::to_string(size));
The code records performance and checks if sort actually worked correctly. So if you make any changes, just run the code to test it. stopwatch
code is taken from here. In fact, I believe it deserves a review on its own.
I couldn't get gnuplot to plot 3d graphs, sorry no pics for the changes related to maximum power.
Concerns
- Is algorithm implemented well?
Are there points to improve performance? Make a better use of standard library?
- Will the inner lambda cause code bloat?
Lambdas are anonymous classes, so will be instantiated for every version of radix sort itself. I'm unsure if using lambda was a wise choice.
- Anything else that comes to mind
c++ sorting c++17 benchmarking
1
One issue is when you calculate the elapsed time. This should be done right after the sorting is done, before you start doing any output. That output will have variable amounts of additional time, depending on the internals ofofstream
and in particular when it needs to stop and flush the data buffer to disk.
â 1201ProgramAlarm
Jun 23 at 1:42
@1201ProgramAlarm, thanks, fixed. I guess I shouldn't write algorithms after 20 hours of no sleep...
â Incomputable
Jun 23 at 7:36
Does the stopwatch class take efforts to prevent code movement during optimization and re-ordering during execution? It could be inaccurate on the scale of a few hundred instructions.
â JDà Âugosz
Jun 23 at 9:23
@JDà Âugosz, I believe it does. IâÂÂm not sure though.
â Incomputable
Jun 23 at 10:34
Could I get a copy of your cvs file please? :)
â Rakete1111
Jun 26 at 15:55
 |Â
show 2 more comments
up vote
8
down vote
favorite
up vote
8
down vote
favorite
Introduction
I implemented a restricted version of the sorting algorithm, the one that works only with unsigned integers. I wanted to try using std::stable_partition
with bitmask, as I chose base 2. The reason partition works here is because there are only two "bins", thus it is somewhat like partitioning. I mandated customizing the maximum power of two that is encountered in the sequence.
Code
#include <iterator>
#include <algorithm>
namespace shino
template <typename BidirIterator>
void radix_sort(BidirIterator first, BidirIterator last, unsigned upper_power_bound)
using category = typename std::iterator_traits<BidirIterator>::iterator_category;
using value_type = typename std::iterator_traits<BidirIterator>::value_type;
static_assert(std::is_base_of_v<std::bidirectional_iterator_tag, category>,
"At least bidirectional iterators are required");
static_assert(std::is_integral_v<value_type> and std::is_unsigned_v<value_type>,
"Only unsigned integers are supported for radix sort");
value_type mask = 1;
for (auto power = 0u; power <= upper_power_bound; ++power)
auto predicate = [mask](const auto& value)
return !(value & mask);
;
std::stable_partition(first, last, predicate);
mask <<= 1;
Weird results
The algorithm has a lottle of wiggle regarding the constant factor, though it is linear. The picture below is for highest power of 16, x-axis is element count, and y-axis is nanoseconds (sorry I'm not on good terms with gnuplot). So it takes ~1.6 milliseconds to sort 30'000 elements, which is I think quite a lot for linear sorting algorithm.
I believe those spikes are context switches, but I couldn't verify that. It seems like there is no way to reliably track them and correspond to the spike.
Full code
#include <vector>
#include <algorithm>
#include <random>
std::vector<unsigned> generate_vector(std::size_t size, unsigned power_limit)
auto generator = [power_limit]
static std::mt19937 twister;
static std::uniform_int_distribution<unsigned > dist0u, (1u << power_limit);
return dist(twister);
;
std::vector<unsigned> v(size);
std::generate(v.begin(), v.end(), generator);
return v;
#include <iterator>
#include <algorithm>
namespace shino
template <typename BidirIterator>
void radix_sort(BidirIterator first, BidirIterator last, unsigned upper_power_bound)
using category = typename std::iterator_traits<BidirIterator>::iterator_category;
using value_type = typename std::iterator_traits<BidirIterator>::value_type;
static_assert(std::is_base_of_v<std::bidirectional_iterator_tag, category>,
"At least bidirectional iterators are required");
static_assert(std::is_integral_v<value_type> and std::is_unsigned_v<value_type>,
"Only unsigned integers are supported for radix sort");
value_type mask = 1;
for (auto power = 0u; power <= upper_power_bound; ++power)
auto predicate = [mask](const auto& value)
return !(value & mask);
;
std::stable_partition(first, last, predicate);
mask <<= 1;
#include <chrono>
#include <atomic>
namespace shino
template <typename Clock = std::chrono::high_resolution_clock>
class stopwatch
const typename Clock::time_point start_point;
public:
stopwatch() :
start_point(Clock::now())
template <typename Rep = typename Clock::duration::rep, typename Units = typename Clock::duration>
Rep elapsed_time() const
std::atomic_thread_fence(std::memory_order_relaxed);
auto counted_time = std::chrono::duration_cast<Units>(Clock::now() - start_point).count();
std::atomic_thread_fence(std::memory_order_relaxed);
return static_cast<Rep>(counted_time);
;
using precise_stopwatch = stopwatch<>;
using system_stopwatch = stopwatch<std::chrono::system_clock>;
using monotonic_stopwatch = stopwatch<std::chrono::steady_clock>;
#include <stdexcept>
#include <fstream>
int main()
std::ofstream measurings("measurings.csv");
if (!measurings)
throw std::runtime_error("measurings file opening failed");
for (unsigned power = 4; power <= 16; ++power)
for (auto size = 0ull; size <= 20'000; ++size)
std::cout << "radix sorting vector of size " << size << 'n';
auto input = generate_vector(size, power);
shino::precise_stopwatch stopwatch;
shino::radix_sort(input.begin(), input.end(), power);
auto elapsed_time = stopwatch.elapsed_time();
measurings << power << ',' << size << ',' << elapsed_time << 'n';
if (not std::is_sorted(input.begin(), input.end()))
throw std::logic_error("radix sort doesn't work correctly on size " + std::to_string(size));
The code records performance and checks if sort actually worked correctly. So if you make any changes, just run the code to test it. stopwatch
code is taken from here. In fact, I believe it deserves a review on its own.
I couldn't get gnuplot to plot 3d graphs, sorry no pics for the changes related to maximum power.
Concerns
- Is algorithm implemented well?
Are there points to improve performance? Make a better use of standard library?
- Will the inner lambda cause code bloat?
Lambdas are anonymous classes, so will be instantiated for every version of radix sort itself. I'm unsure if using lambda was a wise choice.
- Anything else that comes to mind
c++ sorting c++17 benchmarking
Introduction
I implemented a restricted version of the sorting algorithm, the one that works only with unsigned integers. I wanted to try using std::stable_partition
with bitmask, as I chose base 2. The reason partition works here is because there are only two "bins", thus it is somewhat like partitioning. I mandated customizing the maximum power of two that is encountered in the sequence.
Code
#include <iterator>
#include <algorithm>
namespace shino
template <typename BidirIterator>
void radix_sort(BidirIterator first, BidirIterator last, unsigned upper_power_bound)
using category = typename std::iterator_traits<BidirIterator>::iterator_category;
using value_type = typename std::iterator_traits<BidirIterator>::value_type;
static_assert(std::is_base_of_v<std::bidirectional_iterator_tag, category>,
"At least bidirectional iterators are required");
static_assert(std::is_integral_v<value_type> and std::is_unsigned_v<value_type>,
"Only unsigned integers are supported for radix sort");
value_type mask = 1;
for (auto power = 0u; power <= upper_power_bound; ++power)
auto predicate = [mask](const auto& value)
return !(value & mask);
;
std::stable_partition(first, last, predicate);
mask <<= 1;
Weird results
The algorithm has a lottle of wiggle regarding the constant factor, though it is linear. The picture below is for highest power of 16, x-axis is element count, and y-axis is nanoseconds (sorry I'm not on good terms with gnuplot). So it takes ~1.6 milliseconds to sort 30'000 elements, which is I think quite a lot for linear sorting algorithm.
I believe those spikes are context switches, but I couldn't verify that. It seems like there is no way to reliably track them and correspond to the spike.
Full code
#include <vector>
#include <algorithm>
#include <random>
std::vector<unsigned> generate_vector(std::size_t size, unsigned power_limit)
auto generator = [power_limit]
static std::mt19937 twister;
static std::uniform_int_distribution<unsigned > dist0u, (1u << power_limit);
return dist(twister);
;
std::vector<unsigned> v(size);
std::generate(v.begin(), v.end(), generator);
return v;
#include <iterator>
#include <algorithm>
namespace shino
template <typename BidirIterator>
void radix_sort(BidirIterator first, BidirIterator last, unsigned upper_power_bound)
using category = typename std::iterator_traits<BidirIterator>::iterator_category;
using value_type = typename std::iterator_traits<BidirIterator>::value_type;
static_assert(std::is_base_of_v<std::bidirectional_iterator_tag, category>,
"At least bidirectional iterators are required");
static_assert(std::is_integral_v<value_type> and std::is_unsigned_v<value_type>,
"Only unsigned integers are supported for radix sort");
value_type mask = 1;
for (auto power = 0u; power <= upper_power_bound; ++power)
auto predicate = [mask](const auto& value)
return !(value & mask);
;
std::stable_partition(first, last, predicate);
mask <<= 1;
#include <chrono>
#include <atomic>
namespace shino
template <typename Clock = std::chrono::high_resolution_clock>
class stopwatch
const typename Clock::time_point start_point;
public:
stopwatch() :
start_point(Clock::now())
template <typename Rep = typename Clock::duration::rep, typename Units = typename Clock::duration>
Rep elapsed_time() const
std::atomic_thread_fence(std::memory_order_relaxed);
auto counted_time = std::chrono::duration_cast<Units>(Clock::now() - start_point).count();
std::atomic_thread_fence(std::memory_order_relaxed);
return static_cast<Rep>(counted_time);
;
using precise_stopwatch = stopwatch<>;
using system_stopwatch = stopwatch<std::chrono::system_clock>;
using monotonic_stopwatch = stopwatch<std::chrono::steady_clock>;
#include <stdexcept>
#include <fstream>
int main()
std::ofstream measurings("measurings.csv");
if (!measurings)
throw std::runtime_error("measurings file opening failed");
for (unsigned power = 4; power <= 16; ++power)
for (auto size = 0ull; size <= 20'000; ++size)
std::cout << "radix sorting vector of size " << size << 'n';
auto input = generate_vector(size, power);
shino::precise_stopwatch stopwatch;
shino::radix_sort(input.begin(), input.end(), power);
auto elapsed_time = stopwatch.elapsed_time();
measurings << power << ',' << size << ',' << elapsed_time << 'n';
if (not std::is_sorted(input.begin(), input.end()))
throw std::logic_error("radix sort doesn't work correctly on size " + std::to_string(size));
The code records performance and checks if sort actually worked correctly. So if you make any changes, just run the code to test it. stopwatch
code is taken from here. In fact, I believe it deserves a review on its own.
I couldn't get gnuplot to plot 3d graphs, sorry no pics for the changes related to maximum power.
Concerns
- Is algorithm implemented well?
Are there points to improve performance? Make a better use of standard library?
- Will the inner lambda cause code bloat?
Lambdas are anonymous classes, so will be instantiated for every version of radix sort itself. I'm unsure if using lambda was a wise choice.
- Anything else that comes to mind
c++ sorting c++17 benchmarking
edited Jun 23 at 7:35
asked Jun 22 at 22:53
Incomputable
5,99721144
5,99721144
1
One issue is when you calculate the elapsed time. This should be done right after the sorting is done, before you start doing any output. That output will have variable amounts of additional time, depending on the internals ofofstream
and in particular when it needs to stop and flush the data buffer to disk.
â 1201ProgramAlarm
Jun 23 at 1:42
@1201ProgramAlarm, thanks, fixed. I guess I shouldn't write algorithms after 20 hours of no sleep...
â Incomputable
Jun 23 at 7:36
Does the stopwatch class take efforts to prevent code movement during optimization and re-ordering during execution? It could be inaccurate on the scale of a few hundred instructions.
â JDà Âugosz
Jun 23 at 9:23
@JDà Âugosz, I believe it does. IâÂÂm not sure though.
â Incomputable
Jun 23 at 10:34
Could I get a copy of your cvs file please? :)
â Rakete1111
Jun 26 at 15:55
 |Â
show 2 more comments
1
One issue is when you calculate the elapsed time. This should be done right after the sorting is done, before you start doing any output. That output will have variable amounts of additional time, depending on the internals ofofstream
and in particular when it needs to stop and flush the data buffer to disk.
â 1201ProgramAlarm
Jun 23 at 1:42
@1201ProgramAlarm, thanks, fixed. I guess I shouldn't write algorithms after 20 hours of no sleep...
â Incomputable
Jun 23 at 7:36
Does the stopwatch class take efforts to prevent code movement during optimization and re-ordering during execution? It could be inaccurate on the scale of a few hundred instructions.
â JDà Âugosz
Jun 23 at 9:23
@JDà Âugosz, I believe it does. IâÂÂm not sure though.
â Incomputable
Jun 23 at 10:34
Could I get a copy of your cvs file please? :)
â Rakete1111
Jun 26 at 15:55
1
1
One issue is when you calculate the elapsed time. This should be done right after the sorting is done, before you start doing any output. That output will have variable amounts of additional time, depending on the internals of
ofstream
and in particular when it needs to stop and flush the data buffer to disk.â 1201ProgramAlarm
Jun 23 at 1:42
One issue is when you calculate the elapsed time. This should be done right after the sorting is done, before you start doing any output. That output will have variable amounts of additional time, depending on the internals of
ofstream
and in particular when it needs to stop and flush the data buffer to disk.â 1201ProgramAlarm
Jun 23 at 1:42
@1201ProgramAlarm, thanks, fixed. I guess I shouldn't write algorithms after 20 hours of no sleep...
â Incomputable
Jun 23 at 7:36
@1201ProgramAlarm, thanks, fixed. I guess I shouldn't write algorithms after 20 hours of no sleep...
â Incomputable
Jun 23 at 7:36
Does the stopwatch class take efforts to prevent code movement during optimization and re-ordering during execution? It could be inaccurate on the scale of a few hundred instructions.
â JDà Âugosz
Jun 23 at 9:23
Does the stopwatch class take efforts to prevent code movement during optimization and re-ordering during execution? It could be inaccurate on the scale of a few hundred instructions.
â JDà Âugosz
Jun 23 at 9:23
@JDà Âugosz, I believe it does. IâÂÂm not sure though.
â Incomputable
Jun 23 at 10:34
@JDà Âugosz, I believe it does. IâÂÂm not sure though.
â Incomputable
Jun 23 at 10:34
Could I get a copy of your cvs file please? :)
â Rakete1111
Jun 26 at 15:55
Could I get a copy of your cvs file please? :)
â Rakete1111
Jun 26 at 15:55
 |Â
show 2 more comments
1 Answer
1
active
oldest
votes
up vote
5
down vote
accepted
I'll focus just on the algorithm (not the timing code), and the questions.
static_assert(std::is_base_of_v<std::bidirectional_iterator_tag, category>,
"At least bidirectional iterators are required");
static_assert(std::is_integral_v<value_type> and std::is_unsigned_v<value_type>,
"Only unsigned integers are supported for radix sort");
It seems a bit superfluous to say "static_assert(is_foo, "must be foo")
". The assertion predicates are clear enough on their own to not require additional explanation. Also, I wouldn't recommend combining requirements. It makes it harder to diagnose exactly which assertion failed:
static_assert(std::is_base_of_v<std::bidirectional_iterator_tag, category>);
static_assert(std::is_integral_v<value_type>);
static_assert(std::is_unsigned_v<value_type>);
Down to the lambda:
auto predicate = [mask](const auto& value)
return !(value & mask);
;
Unless you're planning to support non-fundamental integer types, taking the argument by const&
is probably an unnecessary pessimization. Even if it's not a built-in type, if the binary &
operator and conversion to bool
are implemented sanely, the lambda's frame can be basically optimized away. It's nice to be able to do everything in-line:
std::stable_partition(first, last, [mask](auto v) return !(v & mask); );
As for the questions:
Is it implemented well? I'd say so. I can't think of many improvements/extensions I'd make. Maybe a check to make sure upper_power_bound
is less than the number of bits in value_type
so you're not just spinning your wheels... but that could be tricky to pull off generally if you're supporting non-builtin integer types (maybe check to see if mask
is zero, assuming the 1
bit drops off the end into the ether if you shift it too far?). Maybe a range version, but that's a pretty trivial addition once the iterator version's working.
Will the inner lambda cause code bloat? No. On any compiler worth mentioning and on any optimization level worth caring about, it will be inlined - likely to a single instruction, maybe two (assuming a built-in unsigned type). Modern compilers are terrifyingly good at optimizing lambdas (or anything where all the code is right there in front of them, really).
Even if it weren't inlined to effectively nothing, the lambda would be dwarfed by std::stable_partition()
's code. Like by two or three orders of magnitude, easy. It's totally not something to worry about.
add a comment |Â
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
5
down vote
accepted
I'll focus just on the algorithm (not the timing code), and the questions.
static_assert(std::is_base_of_v<std::bidirectional_iterator_tag, category>,
"At least bidirectional iterators are required");
static_assert(std::is_integral_v<value_type> and std::is_unsigned_v<value_type>,
"Only unsigned integers are supported for radix sort");
It seems a bit superfluous to say "static_assert(is_foo, "must be foo")
". The assertion predicates are clear enough on their own to not require additional explanation. Also, I wouldn't recommend combining requirements. It makes it harder to diagnose exactly which assertion failed:
static_assert(std::is_base_of_v<std::bidirectional_iterator_tag, category>);
static_assert(std::is_integral_v<value_type>);
static_assert(std::is_unsigned_v<value_type>);
Down to the lambda:
auto predicate = [mask](const auto& value)
return !(value & mask);
;
Unless you're planning to support non-fundamental integer types, taking the argument by const&
is probably an unnecessary pessimization. Even if it's not a built-in type, if the binary &
operator and conversion to bool
are implemented sanely, the lambda's frame can be basically optimized away. It's nice to be able to do everything in-line:
std::stable_partition(first, last, [mask](auto v) return !(v & mask); );
As for the questions:
Is it implemented well? I'd say so. I can't think of many improvements/extensions I'd make. Maybe a check to make sure upper_power_bound
is less than the number of bits in value_type
so you're not just spinning your wheels... but that could be tricky to pull off generally if you're supporting non-builtin integer types (maybe check to see if mask
is zero, assuming the 1
bit drops off the end into the ether if you shift it too far?). Maybe a range version, but that's a pretty trivial addition once the iterator version's working.
Will the inner lambda cause code bloat? No. On any compiler worth mentioning and on any optimization level worth caring about, it will be inlined - likely to a single instruction, maybe two (assuming a built-in unsigned type). Modern compilers are terrifyingly good at optimizing lambdas (or anything where all the code is right there in front of them, really).
Even if it weren't inlined to effectively nothing, the lambda would be dwarfed by std::stable_partition()
's code. Like by two or three orders of magnitude, easy. It's totally not something to worry about.
add a comment |Â
up vote
5
down vote
accepted
I'll focus just on the algorithm (not the timing code), and the questions.
static_assert(std::is_base_of_v<std::bidirectional_iterator_tag, category>,
"At least bidirectional iterators are required");
static_assert(std::is_integral_v<value_type> and std::is_unsigned_v<value_type>,
"Only unsigned integers are supported for radix sort");
It seems a bit superfluous to say "static_assert(is_foo, "must be foo")
". The assertion predicates are clear enough on their own to not require additional explanation. Also, I wouldn't recommend combining requirements. It makes it harder to diagnose exactly which assertion failed:
static_assert(std::is_base_of_v<std::bidirectional_iterator_tag, category>);
static_assert(std::is_integral_v<value_type>);
static_assert(std::is_unsigned_v<value_type>);
Down to the lambda:
auto predicate = [mask](const auto& value)
return !(value & mask);
;
Unless you're planning to support non-fundamental integer types, taking the argument by const&
is probably an unnecessary pessimization. Even if it's not a built-in type, if the binary &
operator and conversion to bool
are implemented sanely, the lambda's frame can be basically optimized away. It's nice to be able to do everything in-line:
std::stable_partition(first, last, [mask](auto v) return !(v & mask); );
As for the questions:
Is it implemented well? I'd say so. I can't think of many improvements/extensions I'd make. Maybe a check to make sure upper_power_bound
is less than the number of bits in value_type
so you're not just spinning your wheels... but that could be tricky to pull off generally if you're supporting non-builtin integer types (maybe check to see if mask
is zero, assuming the 1
bit drops off the end into the ether if you shift it too far?). Maybe a range version, but that's a pretty trivial addition once the iterator version's working.
Will the inner lambda cause code bloat? No. On any compiler worth mentioning and on any optimization level worth caring about, it will be inlined - likely to a single instruction, maybe two (assuming a built-in unsigned type). Modern compilers are terrifyingly good at optimizing lambdas (or anything where all the code is right there in front of them, really).
Even if it weren't inlined to effectively nothing, the lambda would be dwarfed by std::stable_partition()
's code. Like by two or three orders of magnitude, easy. It's totally not something to worry about.
add a comment |Â
up vote
5
down vote
accepted
up vote
5
down vote
accepted
I'll focus just on the algorithm (not the timing code), and the questions.
static_assert(std::is_base_of_v<std::bidirectional_iterator_tag, category>,
"At least bidirectional iterators are required");
static_assert(std::is_integral_v<value_type> and std::is_unsigned_v<value_type>,
"Only unsigned integers are supported for radix sort");
It seems a bit superfluous to say "static_assert(is_foo, "must be foo")
". The assertion predicates are clear enough on their own to not require additional explanation. Also, I wouldn't recommend combining requirements. It makes it harder to diagnose exactly which assertion failed:
static_assert(std::is_base_of_v<std::bidirectional_iterator_tag, category>);
static_assert(std::is_integral_v<value_type>);
static_assert(std::is_unsigned_v<value_type>);
Down to the lambda:
auto predicate = [mask](const auto& value)
return !(value & mask);
;
Unless you're planning to support non-fundamental integer types, taking the argument by const&
is probably an unnecessary pessimization. Even if it's not a built-in type, if the binary &
operator and conversion to bool
are implemented sanely, the lambda's frame can be basically optimized away. It's nice to be able to do everything in-line:
std::stable_partition(first, last, [mask](auto v) return !(v & mask); );
As for the questions:
Is it implemented well? I'd say so. I can't think of many improvements/extensions I'd make. Maybe a check to make sure upper_power_bound
is less than the number of bits in value_type
so you're not just spinning your wheels... but that could be tricky to pull off generally if you're supporting non-builtin integer types (maybe check to see if mask
is zero, assuming the 1
bit drops off the end into the ether if you shift it too far?). Maybe a range version, but that's a pretty trivial addition once the iterator version's working.
Will the inner lambda cause code bloat? No. On any compiler worth mentioning and on any optimization level worth caring about, it will be inlined - likely to a single instruction, maybe two (assuming a built-in unsigned type). Modern compilers are terrifyingly good at optimizing lambdas (or anything where all the code is right there in front of them, really).
Even if it weren't inlined to effectively nothing, the lambda would be dwarfed by std::stable_partition()
's code. Like by two or three orders of magnitude, easy. It's totally not something to worry about.
I'll focus just on the algorithm (not the timing code), and the questions.
static_assert(std::is_base_of_v<std::bidirectional_iterator_tag, category>,
"At least bidirectional iterators are required");
static_assert(std::is_integral_v<value_type> and std::is_unsigned_v<value_type>,
"Only unsigned integers are supported for radix sort");
It seems a bit superfluous to say "static_assert(is_foo, "must be foo")
". The assertion predicates are clear enough on their own to not require additional explanation. Also, I wouldn't recommend combining requirements. It makes it harder to diagnose exactly which assertion failed:
static_assert(std::is_base_of_v<std::bidirectional_iterator_tag, category>);
static_assert(std::is_integral_v<value_type>);
static_assert(std::is_unsigned_v<value_type>);
Down to the lambda:
auto predicate = [mask](const auto& value)
return !(value & mask);
;
Unless you're planning to support non-fundamental integer types, taking the argument by const&
is probably an unnecessary pessimization. Even if it's not a built-in type, if the binary &
operator and conversion to bool
are implemented sanely, the lambda's frame can be basically optimized away. It's nice to be able to do everything in-line:
std::stable_partition(first, last, [mask](auto v) return !(v & mask); );
As for the questions:
Is it implemented well? I'd say so. I can't think of many improvements/extensions I'd make. Maybe a check to make sure upper_power_bound
is less than the number of bits in value_type
so you're not just spinning your wheels... but that could be tricky to pull off generally if you're supporting non-builtin integer types (maybe check to see if mask
is zero, assuming the 1
bit drops off the end into the ether if you shift it too far?). Maybe a range version, but that's a pretty trivial addition once the iterator version's working.
Will the inner lambda cause code bloat? No. On any compiler worth mentioning and on any optimization level worth caring about, it will be inlined - likely to a single instruction, maybe two (assuming a built-in unsigned type). Modern compilers are terrifyingly good at optimizing lambdas (or anything where all the code is right there in front of them, really).
Even if it weren't inlined to effectively nothing, the lambda would be dwarfed by std::stable_partition()
's code. Like by two or three orders of magnitude, easy. It's totally not something to worry about.
answered Jun 23 at 11:08
indi
3,021417
3,021417
add a comment |Â
add a comment |Â
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f197087%2fimplementing-and-benchmarking-radix-sort-on-unsigned-integers%23new-answer', 'question_page');
);
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
1
One issue is when you calculate the elapsed time. This should be done right after the sorting is done, before you start doing any output. That output will have variable amounts of additional time, depending on the internals of
ofstream
and in particular when it needs to stop and flush the data buffer to disk.â 1201ProgramAlarm
Jun 23 at 1:42
@1201ProgramAlarm, thanks, fixed. I guess I shouldn't write algorithms after 20 hours of no sleep...
â Incomputable
Jun 23 at 7:36
Does the stopwatch class take efforts to prevent code movement during optimization and re-ordering during execution? It could be inaccurate on the scale of a few hundred instructions.
â JDà Âugosz
Jun 23 at 9:23
@JDà Âugosz, I believe it does. IâÂÂm not sure though.
â Incomputable
Jun 23 at 10:34
Could I get a copy of your cvs file please? :)
â Rakete1111
Jun 26 at 15:55