We’re back for week #242 of The Weekly Challenge.

This week, I’m trying something different. I’m going to work on my solution to the task through this blog and then copy-and-paste my solution into a terminal window to see if it works. I’ll walk you through my process, think out loud, and maybe even hit the programming lottery and write something that works on the first try.

Probably not, but maybe

Think of this as the blogging version of Twitch. Look over my shoulder as I figure things out…

“Missing Members”

You are given two arrays of integers.

Write a script to find out the missing members in each other arrays.

The example given:

Input: @arr1 = (1, 2, 3)
       @arr2 = (2, 4, 6)
Output: ([1, 3], [4, 6])

(1, 2, 3) has 2 members (1, 3) missing in the array (2, 4, 6).
(2, 4, 6) has 2 members (4, 6) missing in the array (1, 2, 3).

OK, that seems “easy” enough. Most things that I think are “easy” to program often have a horrible complication in the second half that I never see coming, though.

That would make for an interesting blog post, at least.

My first thought is that this is something that should just be brute forced. Loop through the arrays, do something with the data, compare the data, print out the solution.

I also think that it’s very likely there’s a module on CPAN that would make this program supremely simple to write. There’s probably a two line solution to this there somehow.

I also bet I could jump over to ChatGPT right now and get an answer. Maybe I’ll try that after my solution, for kicks. The point of this programming exercise, though, is to practice skills, not prompts.

Here’s my first idea:

my @arr1 = (1, 2, 3);
my @arr2 = (2, 4, 6);

my @missing_from_second;
my $match = 0;
foreach my $value_first( @arr1 ) {
  foreach my $value_second( @arr2 ) {
     $match++ if ($value_second == $value_first);
  }
  push(@missing_from_second, $value_first) if ($match == 0);
  $match = 0;
}

We do a nested loop to compare all the values. If there’s a match, we essentially set a flag(*). If there’s no match, that means the number is missing in the second array and we add it to the results array.

Is it elegant? Not really. Is it optimized? Heck, no. I already see one or two things I could do to speed it up. (Two ideas off the top of my head: Short circuiting as soon as a match is found. Sorting the arrays first and short-circuiting once the loop has passed by the number.)

This will only check the first case — how many values are in the first array that aren’t in the second?

It loops over the first array and flags any time a value from there is in the second array as it loops over it. If there is a match, that sets a flag which will cause the value NOT to be added to the list of missing values.

Does it work? I think it might. Let’s write a quick output to check:

print "Array 1 has " . scalar(@missing_from_second) . " members missing in Array 2\n";

Now I’ll do a quick copy-and-paste and check it in the terminal window and run it:

Array 1 has 2 members missing in Array 2

That is correct. 1 and 3 are in the first array, but not the second. My code just worked on the first try.

Thanks for reading my blog. I’m taking my Win and going home.

OK, nevermind, I’ll see this all the way through.

Printing Out The Solution, Wherein the Perl Weekly Newsletter Saves My Bacon

How could we print that final message in a prettier way? We need to print out an array, which means I’ll just use my pretty printer from previous challenges:


sub pretty_print_array {

        my @array = @_;
        my $length = scalar @array;
        my $count = 1;

        print "(";

        foreach my $value(@array) {
                print $value;
                print ", " if $count < $length;
                $count++;
        }

        print ")\n";

        return;
}

Let’s integrate that:

print pretty_print_array(@arr1) . " has " . scalar( @missing_from_second ) . " members " . pretty_print_array(@missing_from_second) . " missing in the array " . pretty_print_array(@arr2) . "\n";

I’ve used that pretty_print_array a lot in recent challenges, but never as much as in this one line. That gives us:

(1, 2, 3)
(1, 3)
(2, 4, 6)
 has 2 members  missing in the array 

That ain’t right.

However, I immediately recognize the problem, since it was fresh in my mind.

The reason for this wackiness was covered in a link from the Perl Weekly newsletter just last week. Check out this Reddit article discussing how the Perl print statement works. It explains everything in graphic detail.

So I made this ugly-looking thing, instead —

print pretty_print_array(@arr1);
print " has ";
print scalar( @missing_from_second ) . " members ";
print pretty_print_array(@missing_from_second);
print " missing in the array ";
print pretty_print_array(@arr2) , "\n";

— and it worked beautifully:

(1, 2, 3) has 2 members (1, 3) missing in the array (2, 4, 6)

I also had to remove the new line at the end of the pretty_print_array() function so everything would be together in a single line.

Again, Functionalize All the Things!

That solves the problem for the first example, but what about the second? How can I structure this code to easily run as many times as I want without having to copy and paste code?

It’s time to Function-ify the code! (I’m going to keep making up new verbs until I find one that I like…)

This is where we’re starting from, minus the pretty_print_array() function:

my @arr1 = (1, 2, 3);
my @arr2 = (2, 4, 6);

my @missing_from_second;
my $match = 0;
foreach my $value_first( @arr1 ) {
  foreach my $value_second( @arr2 ) {
     $match++ if ($value_second == $value_first);
  }
  push(@missing_from_second, $value_first) if ($match == 0);
  $match = 0;
}

print pretty_print_array(@arr1);
print " has ";
print scalar( @missing_from_second ) . " members ";
print pretty_print_array(@missing_from_second);
print " missing in the array ";
print pretty_print_array(@arr2) , "\n";

This should be simple enough. I’m going to make one big function out of the whole thing and then call it with the arrays as the parameters. I would normally separate the print part out at the end into its own statement, but then I’m passing along four different arrays. I don’t want to do that for a simple piece of code like this.

So let’s see the final code with calls from both examples:

my @arr1 = (1,2,3);
my @arr2 = (2,4,6);
find_missing_members( \@arr1, \@arr2 );

@arr1 = (1, 2, 3, 3);
@arr2 = (1, 1, 2, 2);
find_missing_members( \@arr1, \@arr2 );

sub find_missing_members {

  my @arr1 = @{ shift() };
  my @arr2 = @{ shift() };

  my @missing_from_second;
  my $match = 0;
  foreach my $value_first( @arr1 ) {
    foreach my $value_second( @arr2 ) {
      $match++ if ($value_second == $value_first);
    }
  push(@missing_from_second, $value_first) if ($match == 0);
  $match = 0;
  }

  print pretty_print_array(@arr1);
  print " has ";
  print scalar( @missing_from_second ) . " members ";
  print pretty_print_array(@missing_from_second); 
  print " missing in the array "; 
  print pretty_print_array(@arr2) , "\n";
}

Turning This Car Around

That works, BUT! I just realized that I haven’t done the comparison in reverse order. We know which members of @arr1 are missing in @arr2, but we don’t yet know which members of @arr2 are missing in @arr1!

I have to admit that I had a moment here where, in my head, I wondered about how I might functionalize the first half of find_missing_members() further to make a generic comparison and then send the arrays in different orders. Or what if I had to — gulp — copy and paste the nested loop to do it again in the other order?

That’s when dummy me realized the much simpler solution: Call the same function I already have, but reverse the order of the arrays I’m sending in:

my @arr1 = (1,2,3);
my @arr2 = (2,4,6);
find_missing_members( \@arr1, \@arr2 );
find_missing_members( \@arr2, \@arr1 );

@arr1 = (1, 2, 3, 3);
@arr2 = (1, 1, 2, 2);
find_missing_members( \@arr1, \@arr2 );
find_missing_members( \@arr2, \@arr1 );

That gives you the answer from the challenge write-up:

(1, 2, 3) has 2 members (1, 3) missing in the array (2, 4, 6)
(2, 4, 6) has 2 members (4, 6) missing in the array (1, 2, 3)
(1, 2, 3, 3) has 2 members (3, 3) missing in the array (1, 1, 2, 2)
(1, 1, 2, 2) has 0 members () missing in the array (1, 2, 3, 3)

Rewrite the Whole Thing? Optimize?

Like I said at the top, this is a brute force solution. Nested loops seem a little much. That means with two 3-value arrays, we go through the loop 9 times.

What if we only had to go through the loop once, and we set up a hash to carry the set of unique values in the second array? Even populating that hash by looping over one array once, we’ll cut down the number of loops from 9 to 3 + 3, or 6.

Like so:

  my @missing_from_second;
  my %second_values;
  map{ $second_values{$_} = 1; } @arr2;

  foreach my $first_value( @arr1 ) {
    push(@missing_from_second, $first_value) if !$second_values{$first_value};
  }

The one bit of trickiness we have here is that we’re hashing the second array, then looping over the first. That’s the correct order to do things. Then we loop over the first array and look for each value as a key in the hash of the second. This means only one comparison per iteration of the loop.

In the first set of input data, that changes the total iterations from 9 to 6. In the second set of data with 4-item arrays, we go from 16 down to 8.

If we had two 100 item arrays, this would cut us back from 10,000 iterations down to 200.

It doesn’t seem like a big difference with the smaller numbers, but the savings pile up quickly.

On the other hand, the nested loops thing feels easier to read. There’s less thinking to do when you’re reading it to understand what’s going on. So long as the arrays don’t grow ridiculously big, will those extra steps really be an issue? Computers are pretty fast these days. Am I being too clever?

On the other, other hand, I like the way I constructed this loop. I love how readable it actually is with the if after the push. (“Push the value onto the missing list if it doesn’t exist in the hash.”)

It would also work as a second map statement:

map{ $second_values{$_} = 1; } @arr2;
map{ push(@missing_from_second, $_) if !$second_values($_);  } @arr1;

Now it’s a one-liner loop that matches the map used to turn the other array into a hash.

Wait, if we’re talking functional programming and that whole second loop there is to filter out a list into an array, shouldn’t we just use grep{} ?!?

map{ $second_values{$_} = 1; } @arr2;

@missing_from_second = grep{
  !$second_values{$_};
} @arr1;

For some, I have no doubt we’re getting even less readable now. However, as a functional programmer wannabe, I love this solution. We ditch the push() command, an if statement, and the foreach() loop and replace it with a grep{} function.

Variable naming also counts here. With this grep we filter just to the values that don’t exist and put them in an array with “missing” in its name. That helps guide the mind in the right direction.

This is also where I get a little frustrated by Perl, because I know other languages have a built-in command to determine if a value is present in an array. Javascript has includes() as an Array method. Python has an “in” command that will work. Elixir has “Enum.member?” as well as an “in” thing of its own. Perl has grep(), but it feels weird to grep() inside a grep{}.

Also, under the hood, I’m sure those other languages are just going a similar thing. (**)

I’m 100% sure there’s a module on CPAN that will do this for me, but I try to stick to core Perl as much as possible. (Honestly, it’s easier at work to use everything built-in than having to fill out a form to get permission to use a new module from the legal and security teams…)

Ask ChatGPT

As promised near the top of this post, I ran the challenge through ChatGPT, which did a better job of printing out the answer and keeping everything in bite-sized chunks.

However, at the core of things, we were fairly close with our functional thinking:

my @missing_in_arr1 = grep { my $num = $_; !grep { $_ == $num } @$arr2 } @$arr1;
my @missing_in_arr2 = grep { my $num = $_; !grep { $_ == $num } @$arr1 } @$arr2;

I like my solution better. It doesn’t work as hard to fit everything into one line, which gives the answer room to breathe and be more easily understandable. It essentially does the same thing, but that’s a lot of stuff on each line in there.

Also, the ChatGPT solution checks in both directions inside the same function. I prefer having one function and calling it twice, just with different parameters.(***)

Annotations

(*) I just realized that back in college, we’d call these “sentinels.” I haven’t heard that term since. Everyone has called them “flags” ever since.

(**) I looked up the source code for Enum.is_member in Elixir. It’s a reduce function using some underlying Erlang code, of course:

{:error, module} ->
        module.reduce(enumerable, {:cont, false}, fn
          v, _ when v === element -> {:halt, true}
          _, _ -> {:cont, false}
        end)
        |> elem(1)

(***) I’m running out of time to publish this, so I’m not going to go look up the difference between parameters and arguments at the moment. Please forgive me if I got it backwards here. I’ll do better next time.

Categories: Weekly Challenge