Zip6

This post is part of a series on Mohammad Anwar’s excellent Perl Weekly Challenge, where Perl and Raku hackers submit solutions to two different challenges every week. (It’s a lot of fun, if you’re into that sort of thing.)

The zip6 function has long been a staple of the List::MoreUtils CPAN module. The Week 40 challenge #1 describes it very well. But even more succinctly: zip6 takes an array of arrays (AoA) and returns another AoA of all the 1st elements, then the 2nd, and so on.

It’s not a difficult algorithm by any stretch. In fact, the description more or less tells you how to do it. Simply iterate through all of the arrays in parallel, building up the resulting array as you go.

With a simple nested map (and borrowing max from core module List::Util, though that isn’t strictly necessary), the solution is easy:

sub my_zip6(\@\@;\@\@\@\@\@\@\@\@\@\@\@\@\@\@\@\@\@\@\@\@\@) {
    my $max = max map $#$_, @_;  # Use the size of the longest array

    map { my $i = $_; [ map $_->[$i], @_ ] } 0..$max;
}

That prototype does look a bit silly, but I was trying to match List::MoreUtils’ behaviour of accepting bare arrays, like so: my_zip6 @a1, @a2, @a3; But if we don’t care about compatibility with List::MoreUtils, we can allow an arbitrary number of arrays (including anonymous arrays) by accepting refs instead.

At minimum, all we need to do is provide a different prototype! Since the \@ in the original prototype coerces the input arrays to refs already, the code is exactly the same, and unsurprisingly, performs exactly the same as well.

However, for a robust implementation, we might be tempted to add type checking, since Perl will no longer ensure all arguments are arrays. In fact, it would actually die with a somewhat useful error message (Not an ARRAY reference), but the line number of the error will be the line number within the sub, not that of the caller. We can fix that by doing our own check and failing with Carp‘s croak.

Here is the whole sub:

sub zip6ref_check($$;@) {
    croak "Not an ARRAY ref" if any { 'ARRAY' ne ref } @_; # Type check
    my $max = max map $#$_, @_;

    map { my $i = $_; [ map $_->[$i], @_ ] } 0..$max;
}

Now the line number of the error will point to the caller, which is far more useful.

Unfortunately, the additional validation slows things down significantly on small arrays, cutting the performance almost in half. See below for the benchmark results. As always, there are tradeoffs for performance, convenience, and safety.

As always, my Perl 5 solution is available on GitHub.

How does List::MoreUtils do it?

It turns out the pure Perl List::MoreUtils::PP’ zip6 code is almost identical to my first solution, save for even more \@s in the prototype. There just aren’t many good ways to iterate through n arrays.

sub zip6 (\@\@;\@\@\@\@\@\@\@\@\@\@\@\@\@\@\@\@\@\@\@\@\@\@\@\@\@\@\@\@\@\@)
{
    my $max = -1;
    $max < $#$_ && ($max = $#$_) foreach @_;
    map {
        my $ix = $_;
        [map $_->[$ix], @_];
    } 0 .. $max;
}

Also, List::MoreUtils chose to roll their own maximum calculation, rather than using List::Utils max. For small arrays, at least, the performance difference is negligible.

Going faster: List::Utils::XS

To go even faster, List::MoreUtils::XS has a concise C solution that runs blazingly fast. Here are the numbers for all the versions, brought to you by Benchmark‘s cmpthese:

               Rate ref w/chk       ref   my_zip6  zip6(PP)  zip6(XS)
ref w/chk  531258/s        --      -39%      -41%      -42%      -79%
ref        873274/s       64%        --       -2%       -5%      -66%
my_zip6    893209/s       68%        2%        --       -3%      -65%
zip6(PP)   921625/s       73%        6%        3%        --      -64%
zip6(XS)  2535241/s      377%      190%      184%      175%        --

List::MoreUtils::XS is not installed along with List::MoreUtils, so you’re missing out if you aren’t already using it. However, if you do have the XS version installed, don’t be tempted to directly use it in your code. use List::MoreUtils instead; it will upgrade to the XS version if it’s installed, or gracefully fall back to pure-Perl when running on systems without the XS version.

Raku Version

Raku has a zip routine that should need no introduction, as it works more or less identically to the zip6 routines we just looked at.

.Str.say for zip @a1, @a2, @a3;

The .Str is necessary to convert each returned array to a string representation, to get rid of the brackets. Without it, the output doesn’t quite match the challenge specification.

There is also the builtin zip operator (Z), which is both infix and chaining, so it can support as many arrays as you like:

.Str.say for @a1 Z @a2 Z @a3;

It’s just that easy.

Leave a Reply

Your email address will not be published. Required fields are marked *