Core Elixir: List Module Misc.

Core Elixir looks into the List module and finds Erlang wrappers, test-driven development, and the joy of tuples.

We’ll be dipping a toe into the List module waters today… [1]

Elixir Drop dips his toe into the water

In researching modules and functions for Core Elixir, I often come across dead ends. These are the functions that don’t go very deep and that aren’t very interesting for a Core Elixir article.

The #1 cause of this is Erlang functions that are called directly by Elixir just to switch the parameters around so the list or collection comes first. Thanks, Pipeline Operator.

For example:

List.duplicate/2 takes a list and a non-negative integer that describes how many times the caller wants to repeat that list. So, to steal the example straight out of the documentation:


iex> List.duplicate([1,2], 2)
[[1,2], [1,2]]

You know how that works? List.duplicate(list, multiplier) calls Erlang’s :lists.duplicate(multiplier, list).

For Core Elixir purposes, that’s a dead end.
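If you want to see the parameter swap for yourself, you can call both versions side by side. This is just a quick sketch; the variable names are mine and not anything from the Elixir source.

```elixir
# Elixir puts the thing being repeated first...
elixir_result = List.duplicate([1, 2], 2)

# ...while Erlang puts the count first.
erlang_result = :lists.duplicate(2, [1, 2])

# Having the list first is what lets the call sit at the end of a pipeline.
piped_result = [1, 2] |> List.duplicate(2)

IO.inspect(elixir_result)
# prints: [[1, 2], [1, 2]]
IO.inspect(elixir_result == erlang_result and erlang_result == piped_result)
# prints: true
```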

That seems simple, but Elixir goes one step simpler:

List.flatten/1 directly calls :lists.flatten/1. It doesn’t even need to swap the order of the parameters around, since there’s only one.

But, wait, you say, there’s another version of List.flatten that has an arity of 2! The second parameter is a value that gets slapped on at the end of the newly flattened list! Does that extra parameter make any difference?

No. List.flatten(list, tail) calls :lists.flatten(list, tail).

The list still comes first. Erlang is inconsistent compared to Elixir in this way.
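Here is a small sketch of both arities in action. One detail worth noticing, borrowed from the List.flatten/2 documentation example: the tail is appended as it is, so any nesting inside the tail survives. The variable names are only for illustration.

```elixir
nested = [1, [2, [3]]]

# Arity 1: everything gets flattened.
flat = List.flatten(nested)
IO.inspect(flat)
# prints: [1, 2, 3]

# Arity 2: the tail is tacked on at the end, unflattened.
with_tail = List.flatten([1, [], 2], [3, [], 4])
IO.inspect(with_tail)
# prints: [1, 2, 3, [], 4]

# Same argument order in Erlang, so there is nothing to swap.
IO.inspect(:lists.flatten([1, [], 2], [3, [], 4]) == with_tail)
# prints: true
```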

Purely Functional Looks Test Driven

I like this bit of code, to define List.last, the biggest tongue twister in Elixir’s code base:


  def last([]),    do: nil
  def last([h]),   do: h
  def last([_|t]), do: last(t)

It feels like the way you’d write something when you’re test driving it, doesn’t it? Let’s work through it:

Make it work for the simplest possible case — an empty list. In that case, there’s no making heads or tails of it (ha!), so you return a nil.

OK, so what happens when there’s only one item in the list? The answer isn’t nil anymore. You need to write that case next. The single value is the head AND the last item in the list.

And when you have two or more items? You look for the last value in the list by going through the whole list until you get to the end, recursively. Keep stripping off the head from the list and look for the last of the rest.

As a bonus, the second case now makes more sense. Why is the single value the last value? Because if you started with two or more values, this is the base case. A list with one value would effectively be the most tail item in the list. Pretty nifty.
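To watch those three clauses take turns, here is the same code dropped into a throwaway module. LastSketch is a made-up name so it doesn't collide with the real List module; the clauses themselves are the ones quoted above.

```elixir
defmodule LastSketch do
  # Empty list: nothing to find.
  def last([]), do: nil
  # Exactly one item: it is both the head and the last item.
  def last([h]), do: h
  # Two or more items: drop the head and look in the rest.
  def last([_ | t]), do: last(t)
end

# last([1, 2, 3]) -> last([2, 3]) -> last([3]) -> 3
IO.inspect(LastSketch.last([1, 2, 3]))
# prints: 3
IO.inspect(LastSketch.last([:only]))
# prints: :only
IO.inspect(LastSketch.last([]))
# prints: nil
```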

It’s logic like that that makes me like functional programming. It just feels smart, somehow.

Of Tuples and Keyfinds

List.keyfind/3 strikes me as funny, mostly because it isn’t what I thought it was at first. And then I felt stupid for missing it. Let me assume for the moment that we’re all being honest with ourselves and that all of our minds blank at something ridiculously basic every once in a while. Thanks.

List.keyfind/3 calls on the similarly named :lists.keyfind/3, like so:


  def keyfind(list, key, position, default \\ nil) do
    :lists.keyfind(key, position + 1, list) || default
  end

The function takes a list of tuples ( [a: 1, b: 2, c: 3] ), a value that it’s looking for, and which position in each tuple it’s looking for that value.

Here’s the trick to remember: A tuple can have not just two values, but also more or less than 2 values. A tuple is just an ordered list of things. Curly braces surround the list, and the values inside are separated by commas.

Valid tuples:
* {1}
* {1,2}
* {1,2,3}
* {1,2,3,4}

Et cetera. Well, you probably want to use a map after that, instead, but it’s your program. Do what you want.

When you have a two element tuple, it’s not a key and a value. It’s just two values.

I had forgotten that in a list with keywords, the curly braces can be dropped.


iex> t1 = [a: 1, b: 2]
[a: 1, b: 2]

iex> List.first(t1)
{:a, 1}

See? That’s a list with two tuples. Each is kinda sorta key and value, but not really. It’s position 0 and position 1 in a two element tuple. [a: 1, b: 2] is the same thing as [{:a, 1}, {:b, 2}]. It just looks a lot cleaner, so long as you remember that little trick. I’m a little out of practice using it, I guess.
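If you don't trust the shorthand, you can ask Elixir directly. A tiny sketch, with variable names of my own choosing:

```elixir
sugar = [a: 1, b: 2]
explicit = [{:a, 1}, {:b, 2}]

# The keyword syntax is only a nicer way to write a list of two element tuples.
IO.inspect(sugar == explicit)
# prints: true

# Each item really is a tuple, with a position 0 and a position 1.
first = List.first(sugar)
IO.inspect(elem(first, 0))
# prints: :a
IO.inspect(elem(first, 1))
# prints: 1
```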

In List.keyfind, the position parameter refers to which value you’re looking at in each tuple, the zeroth or the first, in these cases. For what I think of as the key, I’d always use 0. For what is traditionally the value, it’s a 1.

If I wanted to make a ridiculous pull request to the Elixir core language, I’d offer up positions of :key and :value that are equal to 0 and 1 in two element tuples. (Pro tip: Don’t do it. It’s a silly idea.)

Finally, the last parameter sent to List.keyfind is the default value to return if nothing is found. “Nil” is the default default.

There’s one other gotcha to point out here in this code. Erlang likes to be a 1-based language, while Elixir likes the 0-based values. List.keyfind calls :lists.keyfind, moving the list to the front and adding 1 to the position value to make up for that off-by-one position.
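Putting the pieces together, here is a short sketch of List.keyfind with different positions, with and without a default, next to the 1-based Erlang call. The sample data is invented for the example.

```elixir
pairs = [a: 1, b: 2, c: 3]

# Position 0 looks at what I think of as the key.
IO.inspect(List.keyfind(pairs, :b, 0))
# prints: {:b, 2}

# Position 1 looks at what is traditionally the value.
IO.inspect(List.keyfind(pairs, 3, 1))
# prints: {:c, 3}

# Nothing found: nil by default, or whatever default you pass in.
IO.inspect(List.keyfind(pairs, :z, 0))
# prints: nil
IO.inspect(List.keyfind(pairs, :z, 0, :none))
# prints: :none

# The Erlang version wants the key first, the list last, and counts from 1.
IO.inspect(:lists.keyfind(:b, 1, pairs))
# prints: {:b, 2}

# Tuples with more than two values work the same way.
triples = [{:a, 1, "one"}, {:b, 2, "two"}]
IO.inspect(List.keyfind(triples, "two", 2))
# prints: {:b, 2, "two"}
```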

You may all quibble amongst yourself over whether everything should start with 0 or 1. I’ll be over here indenting my code with the tab key just to annoy you.

I’m just kidding. I would never do that. I program in a non fixed width font, so indentation never looks right no matter which keys I use.

Sorry, I’m kidding again. I don’t use a non fixed width font. That’s madness! But this switch to a Dvorak keyboard means d;jdsfo dsmowm ojf of s;ljfy. !


  1. This might just be the best Elixir Drop drawing I’ve done yet. Just sayin’

Core Elixir: Path.relative_to/2

Sit back, this one has tangents on top of tangents.

(Don’t I say that every time?)

(If you’re tangential to a tangent, are you asymptotic?)

According to the documentation, Path.relative_to/2 does this:

Returns the given path relative to the given from path.
In other words, it tries to strip the from prefix from path.

Sounds like fun. Even better, it’s recursive. Eventually.

This function does not interact with the filesystem. It does it all via string manipulation, without any regexes.

Should the path not contain the from, it returns the full path back to the user, like so:

      iex> Path.relative_to("/usr/local/foo", "/etc")
      "/usr/local/foo"

Otherwise, it returns the relative path:

      iex> Path.relative_to("/usr/local/foo", "/usr/local")
      "foo"
      iex> Path.relative_to("/usr/local/foo", "/")
      "usr/local/foo"

(I stole all those examples from the source code documentation.)

But How Does It Work?

We start here:

  def relative_to(path, from) do
    path = IO.chardata_to_string(path)
    relative_to(split(path), split(from), path)
  end

First, note that this is NOT an Erlang wrapper. That always helps when writing Core Elixir. I’m selfish like that.

The first thing Path.relative_to/2 does is translate the path argument to be a string. Why’s that? Looking at the top of the Path source code’s documentation, we see this:

   The functions in this module may receive a char data as
  argument (i.e. a string or a list of characters / string)
  and will always return a string (encoded in UTF-8).

It leaves the input options open, but standardizes on one particular output type. That’s smart. That gives the user maximal flexibility on input with a reliable return. It lets the language do the work behind the scenes.

So how does IO.chardata_to_string do its thing? Smells like a tangent…

IO.chardata_to_string/1

First, it tests if it’s already a string, and returns it unchanged if it is. Thanks, guard clause!

  def chardata_to_string(string) when is_binary(string) do
    string
  end

If, on the other hand, it’s a list of characters, it goes to an Erlang command, :unicode.characters_to_binary/1 to handle the heavy lifting. It returns the resulting string if all worked according to plan. If not, it passes back an appropriate error:

  def chardata_to_string(list) when is_list(list) do
    case :unicode.characters_to_binary(list) do
      result when is_binary(result) ->
        result

      {:error, encoded, rest} ->
        raise UnicodeConversionError, encoded: encoded, rest: rest, kind: :invalid

      {:incomplete, encoded, rest} ->
        raise UnicodeConversionError, encoded: encoded, rest: rest, kind: :incomplete
    end
  end

I think that’s fairly straightforward. You get to use a case statement instead of a three way if-then-else, so it looks nicer, too.

Yes, this part of the function is an Erlang wrapper, but it has a lot of good error-checking afterwards added to it.

You might wonder what happens when the argument sent in is neither a list nor a string. Fair enough.

You get an error:

iex> c = HashDict.new
#HashDict<[]>
iex> IO.chardata_to_string(c)
** (FunctionClauseError) no function clause matching in IO.chardata_to_string/1
    (elixir) lib/io.ex:329: IO.chardata_to_string(#HashDict<[]>)

This makes sense. Let it crash and all. Give the code the two ways it might work, and then let it fail in any other. The programmer is smart enough to take the hint from there.
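Here is a small sketch of all three outcomes: a string, a list of characters, and something that is neither. I'm using the ~c sigil for the character list and an atom as the bad input; both are my choices for the example, not something from the Path source.

```elixir
# Already a string: comes straight back.
IO.inspect(IO.chardata_to_string("already a string"))
# prints: "already a string"

# A list of characters gets converted.
IO.inspect(IO.chardata_to_string(~c"a charlist"))
# prints: "a charlist"

# Chardata can also be a nested mix of strings and characters.
IO.inspect(IO.chardata_to_string(["mix", ?e, ["d"]]))
# prints: "mixed"

# Neither a string nor a list: no clause matches, so it crashes.
result =
  try do
    IO.chardata_to_string(:not_chardata)
  rescue
    FunctionClauseError -> :no_matching_clause
  end

IO.inspect(result)
# prints: :no_matching_clause
```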

Back to the Function

Where were we again? Ah, here:

  def relative_to(path, from) do
    path = IO.chardata_to_string(path)
    relative_to(split(path), split(from), path)
  end

path is now a string version of whatever was passed in, whether it was a string already or just a list of characters.

The next line (#3 above) splits apart the two paths that were passed in into their own lists of directories.

“What’s that?” you say. “How do we split a path?”

Glad you asked. Please join me on our next tangent.

Path.split

Elsewhere in the Path module, the split function takes in a path and breaks it apart into a list of its elements. It uses a forward or backward slash as the separator, because it knows if you’re on Windows and adjusts itself accordingly. (Pro tip: It’s always a good idea to use the language’s built-in functions for dealing with paths for just that reason. It will allow your program to run anywhere.)

Here’s Path.split/1:

  # Work around a bug in Erlang on UNIX
  def split(""), do: []

Wait, read that comment again. UNIX is the problem child here and not Windows? Hunh. Well, now I’ve seen everything.

But what if the path isn’t empty? It’s not so easy. Wait, actually, it is:

  def split(path) do
    FN.split(IO.chardata_to_string(path))
  end

FN is an alias for Erlang’s :filename module. We’re letting Erlang do the work.

I bet some of you are screaming “DRY” right about now. We already talked about using IO.chardata_to_string earlier in the Path.relative_to function before we got to the line where we called Path.split. Why not just let Path.split take care of that?

While it is a duplication of effort, we can’t just go rewriting Core Elixir for this one function’s sake. And, hey, this is functional programming: We can perform the same function on a given piece of data a million times and always get back the same answer. Also, the function will see that the path has already been stringified and pass it right back immediately.

I somehow doubt running this twice will result in a drag on the program’s speediness.

When all is said and done, we end up with a path that’s been split into a list of its directories, like in this example:

iex> File.cwd! |> Path.split
["/", "home", "vagrant", "augiedb", "elixir"]

Note that the first part is “/” to indicate that this isn’t a relative path to where we are at the moment. That’s not some kind of indicator of what the directory split character is or anything. You’re welcome to use Path.type to see if the path is relative or absolute, but that’s a topic for another time…

Also, I used cwd! in this example because plain old cwd returns one of those {:ok, blah_blah_blah} tuples, which can’t be pipelined into Path.split.
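A few more Path.split calls, as a sketch of the cases we care about for Path.relative_to. The paths are made up, and the results shown are what I'd expect on a Unix-like system.

```elixir
# An absolute path keeps "/" as its first element.
IO.inspect(Path.split("/usr/local/foo"))
# prints: ["/", "usr", "local", "foo"]

# A relative path has no leading "/".
IO.inspect(Path.split("usr/local"))
# prints: ["usr", "local"]

# The empty string is the special case handled by the first clause.
IO.inspect(Path.split(""))
# prints: []

# A list of characters goes in, strings come out.
IO.inspect(Path.split(~c"/etc"))
# prints: ["/", "etc"]
```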

Are we ready yet to get to the recursive part of Path.relative_to? Yes. Yes, we are:

Bring On the Recursion!

This should look familiar by now:

  def relative_to(path, from) do
    path = IO.chardata_to_string(path)
    relative_to(split(path), split(from), path)
  end

relative_to/2 calls on relative_to/3 to do all the work. The three arguments are (from left to right) a list of elements in the path that we’re looking at, a list of elements in the path that we’re starting from, and the initial path itself in plain string text. We’ll see how that’s used shortly.

The bulk of the work, assuming a long directory path, comes in the first pattern match on this function, and it’s a little piece of pattern matching genius:

  defp relative_to([h|t1], [h|t2], original) do
    relative_to(t1, t2, original)
  end

The function pops the head off the list from both paths. If they are the same (thus, the two “h”s in the parameters), we toss out that first item and do this again with the rest of the lists. So “/usr/a” and “/usr/” would both match “/” as the heads and the function would get called again against ["usr", "a"] and ["usr"]. It would run against those values and pull out the “usr” items. Then it would call itself with ["a"] and [].
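The trick of using the same variable name twice in one function head is worth seeing on its own. Here's a minimal sketch; SameHeadSketch and same_head? are hypothetical names I made up to isolate the idea, and they're not part of the Path module.

```elixir
defmodule SameHeadSketch do
  # Using h in both patterns means this clause only matches
  # when the two lists start with the same value.
  def same_head?([h | _], [h | _]), do: true
  # Anything else, including empty lists, lands here.
  def same_head?(_, _), do: false
end

IO.inspect(SameHeadSketch.same_head?(["/", "usr", "a"], ["/", "usr"]))
# prints: true
IO.inspect(SameHeadSketch.same_head?(["a"], ["b"]))
# prints: false
IO.inspect(SameHeadSketch.same_head?(["a"], []))
# prints: false
```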

In that case, where the two heads of the lists are now different (and the second one is already empty), it doesn’t match this pattern and it bounces down to this pattern match:

  defp relative_to([_|_] = l1, [], _original) do
    join(l1)
  end

If the second value is an empty list, then take whatever is left from the first value and return it. There’s your answer.

[_|_] would match ["a"|[]]. In the second line, that list, as a whole, would be passed over to Path.join to become, well, the rest of the directory after that shared portion.

If the difference were greater and we were looking at something like “/usr/a/b/c/d”, then the match would leave you with “a”, “b”, “c”, “d” and you’d join that list back up with Path.join to give you (in Unix world) “a/b/c/d”.

I bet you’re wondering how Path.join works, aren’t you? Sure, but that’s a lesson for another day.

Path.relative_to

So far, we’ve covered two out of the three pattern matches that the three-arity version of Path.relative_to can handle. We’ve seen what it can do when the first item in the path matches (it goes all recursive), and when the path has more items left than the from (time to return results), but what about when there are no lists left? It’s just one value for the from and for the path? What happens when the two paths are the same, or the ‘from’ path is longer than the path?

If we’ve come that far, we can’t pull one path out from the other. Think about it: how can we pull a one item value out from a one item value? It won’t work. We default to returning the original path, as was given to us earlier in the process.

  defp relative_to(_, _, original) do
    original
  end

Since one can’t be calculated from the other, we pass back the original value. For example: /usr/a can’t be derived from /usr/b. This function would run when passed the values for ‘a’ and ‘b’. If both paths are the same length, there’s no way one will branch off from the other.

How I Tested This All

Trying to wrap my head around the pattern matching in Path.relative_to/3 meant extracting the code for that function and testing it separately. I created a file named rel.exs and copied the function’s code into it. I did some good old fashioned “IO.puts Debugging” on this bad boy:

defmodule Augie do

  def relative_to(path, from) do
    path = IO.chardata_to_string(path)
    relative_to(Path.split(path), Path.split(from), path)
  end

  defp relative_to([h|t1], [h|t2], original) do
    IO.puts("first pm")
    relative_to(t1, t2, original)
  end

  defp relative_to([_|_] = l1, [], _original) do
    IO.puts("second pm")
    Path.join(l1)
  end

  defp relative_to(a, b, original) do
    IO.puts("third pm")
    IO.puts "-- path param: #{a}"
    IO.puts "-- from param: #{b}"
    original
  end

end

I name the module after myself, again, because I’m an egomaniac and I know it won’t have namespace issues with core elixir. (I doubt a pull request to create an “Augie” module would get past Jose…) I could have named it “Patha” or something, but I’m pretty fast with typing my own name.

I tested in iex, calling the script as a parameter to iex:

vagrant@precise32:~/augiedb/elixir$ iex rel.exs
Erlang/OTP 18 [erts-7.0] [source] [async-threads:10] [kernel-poll:false]

Interactive Elixir (1.1.0-dev) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)>

And then I ran every possible scenario through it and saw where the debug statements came from:

iex(1)> Augie.relative_to('/opt', '/opt/a')
first pm
first pm
third pm
-- path param:
-- from param: a
"/opt"

iex(2)> Augie.relative_to('/opt/a', '/opt')
first pm
first pm
second pm
"a"

iex(3)> Augie.relative_to('/opt', '/opt')
first pm
first pm
third pm
-- path param:
-- from param:
"/opt"

iex(4)> Augie.relative_to('', '')
third pm
-- path param:
-- from param:
""

iex(5)> Augie.relative_to('', '/opt')
third pm
-- path param:
-- from param: /opt
""
iex(6)> Augie.relative_to('/opt', '')
second pm
"/opt"
iex(7)>

It’s also somewhat magical when you type your name, hit tab, and a function is autocompleted after your name.


Core Elixir: List.foldl/3 and List.foldr/3

Before Elixir, I never dealt with folding functions. Now, I have a foldl and a foldr function to wrap my head around. It seemed so simple at first, but I had to bend my mind around this one for a couple of reasons. Let’s try to walk through these two functions and see what we can learn.

Digging Not Too Deep

Before we go too far with this, let’s jump to the end: List.foldl and List.foldr are wrappers for Erlang functions :lists.foldl and :lists.foldr. Elixir rearranges the parameter order and that’s it.

I mention this now because the Erlang documentation helps out with explaining these functions shortly.

What Are We Folding Here?

List.foldl takes three arguments: a list, an accumulator, and a function. The accumulator doubles as your starting point.

A fold is basically a reduction. It takes a list, processes every item of it through a function, and returns a single value.

In fact, when it’s handed a list, the Elixir Enum.reduce/3 function is also a wrapper for Erlang’s :lists.foldl/3 function.

List.foldl/3 applies the function to each element in the list, from left to right. Let me repeat it so it doesn’t get too confusing: foldl runs from left to right.

It’s a bit confusing in the Elixir documentation, which says that the foldl function

Folds (reduces) the given list to the left with a function.

To my mind, that would mean it runs right to left. The Erlang documentation defines foldr as being “Like foldl/3, but the list is traversed from right to left.”

I think the Erlang docs use the preposition better.

I’ve read other explanations of the two folds that rely on the “from” preposition to explain it better: foldl works from the left, while foldr works its way from the right.
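One way to see the direction for yourself is to fold with a function that just piles each item onto the front of the accumulator. This is a sketch; collect is a name I made up.

```elixir
# Put each item on the front of the accumulator as it is visited.
collect = fn x, acc -> [x | acc] end

# foldl visits 1, then 2, then 3, so 3 ends up on the front.
IO.inspect(List.foldl([1, 2, 3], [], collect))
# prints: [3, 2, 1]

# foldr visits 3, then 2, then 1, so 1 ends up on the front.
IO.inspect(List.foldr([1, 2, 3], [], collect))
# prints: [1, 2, 3]
```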

Does It Matter?

A lot of the time, it won’t matter whether you go from the left or from the right. If you’re just summing up or even multiplying together the values in the list, for example, you’ll get the same answer either way. You may remember the “commutative property” from your childhood. That applies here.

On the other hand, when order counts, there’s a big difference between left and right. Take subtraction, for one easy example that’s also used in the Elixir docs:

  • 1-2-3-4 is -8.
  • 4-3-2-1 is a mere -2.

Fold statements are a little more complex than that, though, as they can use the accumulator value to start at a different place, or even in the calculations along the way. Once you try to do something like f(x) -> x - acc, you have a new kettle of fish.

Also keep in mind that the accumulator is mandatory in your fold statements in Erlang and, thus, Elixir. You don’t need to use it, but you must include it.

Start the accumulator at 0 and let’s foldl the list of the first four numbers and see where we go:

x  - acc = new_acc
------------------
1  - 0   = 1
2  - 1   = 1
3  - 1   = 2
4  - 2   = 2

Reverse that and let’s subtract from the right with foldr:

x - acc = new_acc
-----------------
4 - 0   = 4
3 - 4   = -1
2 - -1  = 3
1 - 3   = -2
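Those two tables translate directly into code. A short sketch, with subtract and add as names of my own:

```elixir
subtract = fn x, acc -> x - acc end

# Left to right: ends on 4 - 2.
IO.inspect(List.foldl([1, 2, 3, 4], 0, subtract))
# prints: 2

# Right to left: ends on 1 - 3.
IO.inspect(List.foldr([1, 2, 3, 4], 0, subtract))
# prints: -2

# With addition, the direction makes no difference.
add = fn x, acc -> x + acc end
IO.inspect(List.foldl([1, 2, 3, 4], 0, add) == List.foldr([1, 2, 3, 4], 0, add))
# prints: true
```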

At each step of the way, reduce keeps track of which item on the list it’s at, and what the value of the accumulator is.

The reduce function itself can be a complicated piece of machinery, and it’s one that powers much of Enum. Someday, I hope to get around to figuring out how it works in graphic detail.

When in doubt, foldl

Why? It’s tail recursive. This makes sense. You’re iterating over a list. How many times have we seen now that lists work best when processed from left to right? I suspect it has something to do with that. And if it doesn’t — well, it’s an easy way to remember it, at least.
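To make the tail recursion point concrete, here is how the two folds could be written by hand. This is my own sketch under a made-up module name, FoldSketch, and not the actual Erlang source, though the shape of the recursion is the point.

```elixir
defmodule FoldSketch do
  # foldl: the recursive call is the very last thing that happens,
  # so nothing has to wait around on the stack.
  def foldl([], acc, _fun), do: acc
  def foldl([h | t], acc, fun), do: foldl(t, fun.(h, acc), fun)

  # foldr: fun can't run until the rest of the list has been folded,
  # so every item waits on the stack until the end is reached.
  def foldr([], acc, _fun), do: acc
  def foldr([h | t], acc, fun), do: fun.(h, foldr(t, acc, fun))
end

subtract = fn x, acc -> x - acc end

IO.inspect(FoldSketch.foldl([1, 2, 3, 4], 0, subtract))
# prints: 2
IO.inspect(FoldSketch.foldr([1, 2, 3, 4], 0, subtract))
# prints: -2
```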

For more detail on this, check out this HaskellWiki article.

Heavier Reading

This whitepaper was a pretty interesting-looking appreciation of using a fold over recursion:

…we show that even though the pattern of recursion encapsulated by fold is simple, in a language with tuples and functions as first-class values the fold operator has greater expressive power than might first be expected.

Not that I understand a word of it.

If you have any comments, questions, complaints, criticisms, or corrections, catch me on Twitter, @AugieDB. Or make a pull request on Github! That Twitter handle and Github ID is the same as my GMail account, if you want to deal with it more quietly. I want these articles to be factually correct and will update them as necessary.


Core Elixir Follow-Up: List.delete_all

Three updates from the List.delete_all post a couple weeks back:

Naming is Hard

Yeah, that’s a fair point. A better name for the function might be List.delete_all_of, maybe? (“Delete all of the 5s from this list.”) It’s tough to be descriptive and terse at the same time. Nobody wants to type List.delete_all_values_from_this_list_that_equal_this. We’re not Objective-C with XCode to autocomplete all of that for us.

I poke fun at Objective-C sometimes, but I admit that it’s helped me to make lengthier and more descriptive variable names that I’ve appreciated more than once six months later… I do, however, use Ruby’s underscore syntax over CamelCase to make those variable and function names. In Perl.

I’m a heretic.

Refactoring

I had a pull request from Daniel Garcia, who made a smart observation about my original List.delete_all code. Here’s what I started with:

  def delete_all(list, value) do
    delete_all(list, value, [] ) |> Enum.reverse
  end

  defp delete_all([ h | [] ], value, end_list) when h === value do
    end_list
  end

  defp delete_all([ h | [] ], _value, end_list) do
    [h|end_list]
  end

  defp delete_all([h|t], value, end_list) when h === value do
    delete_all(t, value, end_list)
  end

  defp delete_all([h|t], value, end_list) do
    delete_all(t, value, [h|end_list])
  end

I didn’t think too much about it. I got something to work and stopped there, because I knew I’d be working on the problem in different styles. Still, there’s an obvious pattern matching issue here that I should have noticed. Take a look at these two functions, in particular:

  defp delete_all([ h | [] ], value, end_list) when h === value do
  defp delete_all([ h | t  ], value, end_list) when h === value do

You can pattern match inside the arguments. If you’re looking for the case where h and value are the same thing, you don’t need to put that in the guard clause. You can match it inside the argument list:

  defp delete_all([ value | [] ], value, end_list) do
  defp delete_all([ value | t  ], value, end_list) do

Now, look at that and realize that in one major case, it’s the same function. When the tail is an empty list, the first version will be triggered instead of the second. Can we combine those two into one? Yes, if the base case (the one that returns the final results) is rewritten to look for just an empty list.

Those middle three functions can then be combined down into two:

  def delete_all(list, value) do
    delete_all(list, value, []) |> Enum.reverse
  end

  defp delete_all([], _value, end_list) do
    end_list
  end

  defp delete_all([value|t], value, end_list) do
    delete_all(t, value, end_list)
  end

  defp delete_all([h|t], value, end_list) do
    delete_all(t, value, [h|end_list])
  end

The three variants of the private function handle the cases where (1) the value is the same as the head, (2) the two are different, and (3) you have an empty list. That’s a lot more straightforward and obvious than my initial version was, where you had alternate versions of the same function depending on a guard statement that was redundant and the base case was one step removed from the end of the actual list. It just plain old makes more sense.
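There's a side benefit to the empty-list base case. My original version had no clause at all for an empty list, so handing it [] would have crashed with a FunctionClauseError. Here's a sketch of the refactored code in a throwaway module (DeleteAllSketch is a name I made up) with a few edge cases run through it.

```elixir
defmodule DeleteAllSketch do
  def delete_all(list, value) do
    delete_all(list, value, []) |> Enum.reverse()
  end

  # Base case: nothing left to look at, hand back what we collected.
  defp delete_all([], _value, end_list), do: end_list
  # Head matches the value: skip it.
  defp delete_all([value | t], value, end_list), do: delete_all(t, value, end_list)
  # Head is something else: keep it.
  defp delete_all([h | t], value, end_list), do: delete_all(t, value, [h | end_list])
end

IO.inspect(DeleteAllSketch.delete_all([1, 5, 2, 5, 3], 5))
# prints: [1, 2, 3]
IO.inspect(DeleteAllSketch.delete_all([5, 5], 5))
# prints: []
IO.inspect(DeleteAllSketch.delete_all([], 5))
# prints: []
```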

Reduce It Again

I wrote up a basic reduce function to do the same thing, admitting that I thought it might be done cleaner somehow.

In jumped Kash Nouroozi with a gist to change my function —

def delete_all(collection, value) do
  Enum.reduce( collection, [ ], fn(x, acc) ->
    case x !== value do
      true  -> [x | acc]   # Not the value, add it to the list
      false -> acc         # Matches the value, so don't add it
    end
  end)  |> Enum.reverse
end

— to something more elegant using foldr that doesn’t even need the reverse at the end:

def delete_all(collection, value) when is_list(collection) do
  List.foldr collection, [ ], fn
    x, acc when x == value ->
      acc
    x, acc ->
      [x|acc]
  end
end

I understand this new bit of code, but I’m afraid explaining it in graphic detail will have to wait for another day. I am working on a foldl/foldr edition of Core Elixir for some point in the future. So stay tuned there.
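In the meantime, here's a quick sketch that at least shows the foldr version keeping the original order with no reverse step. I've wrapped Kash's fold in an anonymous function so it can run on its own, and compared it against Enum.reject, which is my addition and not part of either gist.

```elixir
delete_all = fn list, value ->
  List.foldr(list, [], fn
    # Matches the value: leave the accumulator alone.
    x, acc when x == value -> acc
    # Anything else: put it on the front.
    x, acc -> [x | acc]
  end)
end

# foldr works from the right, so the list is rebuilt in its original order.
IO.inspect(delete_all.([1, 5, 2, 5, 3], 5))
# prints: [1, 2, 3]

# Enum.reject with the same comparison gives the same answer.
IO.inspect(Enum.reject([1, 5, 2, 5, 3], &(&1 == 5)))
# prints: [1, 2, 3]
```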

Thanks once again to Daniel and Kash for their contributions/pull requests/gists.



Core Elixir: File.cd and Friends

We’ve talked briefly about the File library in the past when we discussed File.stat and, of course and not confusingly, File.Stat.

This week, we return to the File library to look at how we might change the current working directory in Elixir. It’s pretty straightforward, but there’s one neat twist to it.

Change Your Directory

There might come a time in your code where you will want to change your working directory. File.cwd will tell you what that directory currently is, but it is File.cd/1 that will set you in a different directory. You tell it the directory to change to, and it will either return an :ok, or an :error with the appropriate POSIX-compliant error message.

More succinctly, the official Elixir documentation says this:

cd(Path.t) :: :ok | {:error, posix}

Same thing.

Here it is in action:

iex> File.cd("/tmp")
:ok

iex> File.cwd
{:ok, "/tmp"}

Now, since this is Core Elixir, let’s look at the implementation details.

Yup, It’s an Erlang Wrapper

The first clue that we’re wrapping Erlang here is that the source code uses the alias F. Looking at the top of the file, we see where that goes:

  alias :file, as: F

Here’s the code, then:

  def cd(path) do
    F.set_cwd(IO.chardata_to_string(path))
  end

:file.set_cwd is the Erlang command to set the current working directory. The only trick is that you have to stringify the path name so that Erlang will deal with it. That’s not so much of a trick as it is Elixir 101 at this point.

As usual, there is also the cd! version of the function. Instead of just returning the error message and carrying on its merry way, it blows the whole thing up:

iex> File.cd("/tmp/this_directory_does_not_exist")
{:error, :enoent}

iex> File.cd!("/tmp/this_directory_does_not_exist")
** (File.Error) could not set current working directory to /tmp/this_directory_does_not_exist: no such file or directory
    (elixir) lib/file.ex:1087: File.cd!/1

That latter error message appears in all red text, by the way, in case you questioned just how serious it was.

The source for this is, as you might expect, fairly similar to File.cd, but with an extra layer added on. (See lines 3 – 7, in particular.)

  def cd!(path) do
    path = IO.chardata_to_string(path)
    case F.set_cwd(path) do
      :ok -> :ok
      {:error, reason} ->
          raise File.Error, reason: reason, action: "set current working directory to", path: path
    end
  end

The source stringifies the path name first, then runs F.set_cwd (which is also basically File.cd now) and pattern matches the results. If all is good, it returns :ok. If something went horribly wrong, it pattern matches on the :error, wraps the reason in a File.Error, and blows stuff up.

It’s just a case statement, though. It’s nothing we haven’t seen before.
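If you'd rather not blow stuff up, the same case statement works on the non-bang version. This is a sketch that assumes the directory below really doesn't exist on your machine; :file.format_error/1 is the Erlang function that turns a reason like :enoent into readable text.

```elixir
missing = "/tmp/this_directory_does_not_exist"

message =
  case File.cd(missing) do
    :ok ->
      "changed directory"

    {:error, reason} ->
      # reason is a POSIX atom such as :enoent
      "could not cd: #{:file.format_error(reason)}"
  end

IO.puts(message)
# prints: could not cd: no such file or directory
```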

One More Thing…

There’s a File.cd!/2 which I find the most fascinating function of all. This one takes a directory and a function. It changes into that directory, runs that function, and then resets your current working directory to where it was before you ran the command.

That’s pretty neat.

iex> File.cwd
{:ok, "/home/vagrant/augiedb/elixir"}

iex> File.ls
{:ok,
 [".DS_Store", ".iex",  "card_game", "cards",  "cards.exs", "chapter13",  "dose",   "elixir", "Elixir.Card.beam", "Elixir.CardPoints.beam", "Elixir.Chip.beam",
  "Elixir.Factorial.beam", "Elixir.Gcd.beam", "Elixir.Guard.beam",
"factorial1.exs", "filechange", ...]}

iex> File.cd!("fileutils", fn() -> File.ls end)
{:ok,
 [".git", ".gitignore", "Augie", "config", "lib", "mix.exs", "README.md",
  "t.txt", "test", "_build"]}

iex> File.cwd
{:ok, "/home/vagrant/augiedb/elixir"}

You can see that we end in the same directory as we started, despite using a cd command in the middle there. You can also see that there are two completely different directory listings given.

Let’s see how it works:

  def cd!(path, function) do
    old = cwd!
    cd!(path)
    try do
      function.()
    after
      cd!(old)
    end
  end

I almost laughed when I saw this code. It just strikes me as funny how terse it is. It’s the most succinct code I’ve seen in Elixir to date. Nothing fancy. It’s like a baby learning to speak. Everything looks like a two word sentence.

“Mama. Dada. Try do.”

That’s mostly because we’re at a slightly higher level of abstraction here. This is File.cd!/2 using File.cd!/1 to do the work of changing directories. Much like with File.cd!/1 leaning on F.set_cwd, you don’t need to repeat all that code again for File.cd!/2. It’s already done for you; use it. Plus, the function that File.cd!/2 is running is neatly saved in a variable, function, so you don’t have to look at all that code. It’s all building up on itself.

Here’s how it works:

File.cd!/2 first grabs the current working directory and saves it so it knows where to go back to later.

Then, it changes to the new directory, failing out if that doesn’t work.

Next, it runs the function before changing back to the original directory, no matter what. The try do/after construction just means that even if the first statement fails, the statement in the after section will still run.

Note that there is no File.cd/2 (without the !). This is code that will either work, or blow up. There is no half measure on this one.
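To check the "no matter what" part, here's a sketch that deliberately raises inside the function and then looks at where we ended up. I'm using System.tmp_dir!/0 as a directory that should exist on most machines; that choice and the variable names are mine.

```elixir
before = File.cwd!()

outcome =
  try do
    # Change into the temp directory, then fail on purpose.
    File.cd!(System.tmp_dir!(), fn -> raise "boom" end)
  rescue
    e in RuntimeError -> {:raised, e.message}
  end

# The error still reaches us...
IO.inspect(outcome)
# prints: {:raised, "boom"}

# ...but the after block has already put us back where we started.
IO.inspect(File.cwd!() == before)
# prints: true
```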

One Last Piece of Trivia

File.cd and all of its variant arities aren’t used anywhere else in the File library.

Hey, they call it “trivia” for a reason…


(13)

Core Elixir: List.delete/2 and List.delete_all/2

Last week, we looked at List.delete_at, which deletes a value at a specific position of a given list.

What if you know the value you want to delete, though, but not the position? Elixir has List.delete/2 for that!

There is one catch, however. If your list contains the value you’re looking to delete in more than one place, it will only delete the first one. The rest will remain.

How does this function work? Why doesn’t it delete more than one value? What kind of daring recursion knows to short circuit itself so strongly?

Beats me. The function is an Erlang wrapper:

  def delete(list, item) do
    :lists.delete(item, list)
  end

Digging into Erlang source code is beyond the scope for Core Elixir.

It does, however, perfectly follow one of the Elixir language’s goals mentioned at the top of the List library’s documentation:

A decision was taken to delegate most functions to Erlang’s standard library but follow Elixir’s convention of receiving the target (in this case, a list) as the first argument.
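Both points, the first-match-only behavior and the argument swap, are easy to check for yourself. A quick sketch (the variable names are mine):

```elixir
list = [0, 1, 0, 0, 1, 0]

# Elixir's wrapper: list first, item second.
via_elixir = List.delete(list, 0)

# The Erlang function it delegates to: item first, list second.
via_erlang = :lists.delete(0, list)

IO.inspect(via_elixir)
IO.inspect(via_elixir == via_erlang)
```

Only the leading 0 disappears; the other three stay put.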

Delete Them. ALL of Them.

As a programming exercise, let’s create a List.delete_all/2 that won’t stop after the first example of the value is found.

We’ll try recursion with a dash of pattern matching:

  def delete_all(list, value) do
    delete_all(list, value, []) |> Enum.reverse
  end

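  # An empty list has nothing to check, so hand back whatever we have.
  # Without this clause, List.delete_all([], x) raises a FunctionClauseError.
  defp delete_all([], _value, end_list) do
    end_list
  end

  defp delete_all([h|[]], value, end_list) when h === value do
    end_list
  end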

  defp delete_all([h|[]], _value, end_list) do
    [h|end_list]
  end

  defp delete_all([h|t], value, end_list) when h === value do
    delete_all(t, value, end_list)
  end

  defp delete_all([h|t], value, end_list) do
    delete_all(t, value, [h|end_list])
  end

You have List.delete_all/2 now that takes a list and the value you’re looking to eliminate from that list. In turn, that calls the private function List.delete_all/3, where all the recursive fun begins. It goes through the list left to right, checking each member of the list to see if it matches the value you’re trying to eliminate. If it’s a match, it skips over it and calls itself with the tail of the list you started with, and without adding any values to the new list you’re constructing.

When you get to the final value, it checks for a match in the guard clause, and then returns the appropriate list.

The end result, naturally, is backwards, so the main function handles the reversing of the list to make it look right again.

Like this:

iex> li = [0,1,0,0,1,0]
iex> List.delete_all(li, 0)
[1, 1]
iex> List.delete_all(li, 1)
[0, 0, 0, 0]

iex> li = [1,2,3,4,5]
iex> List.delete_all(li, 3)
[1, 2, 4, 5]
iex> List.delete_all(li, 5)
[1, 2, 3, 4]
iex> List.delete_all(li, 0)
[1, 2, 3, 4, 5]

Take Two

Then I stopped to think, and remembered that functional programmers love three things:

  • Map applies the same function to every item in a list. (New List: Same Length)
  • Filter eliminates values from a list that don’t match up to a function. (New List: Smaller or Same Length)
  • Reduce takes a list and returns a single value derived from the items in the list. (Accumulator Value)

This is a gross simplification, but it works for me.

When I look at what I coded above, it’s basically a map function. I’m going over a list and applying a function to each item, one by one, then returning a list. But that’s not the best abstraction for this problem.

Map, by its very nature, is meant to apply a function to every item in a list and return that entire list, transformed. My code basically mapped over a list and created a new list. Kinda close, but not really it.

The problem I’m solving for here is an obvious filter. We need to filter out the values from the list that we don’t want:

  def delete_all(list, value) do
    Enum.filter(list, fn(x) -> x !== value end)
  end

These three lines of code do the same thing as the other monstrosity I programmed earlier. It filters the list, returning only the values for which the function is true — that the item in the list isn’t the same as the value we’re looking to get rid of. So it only returns values from the original list that don’t equal the value you’re looking to, well, “filter out.” It’s right there in pretty plain English. And you can save 12 lines of code.
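If the double negative of “keep everything that isn’t the value” bothers you, Enum.reject/2 is the mirror image of filter: it drops the items for which the function is true. A minimal sketch, with RejectSketch as a made-up module name so it doesn’t collide with anything real:

```elixir
defmodule RejectSketch do
  # Hypothetical module; same contract as the delete_all/2 above.
  def delete_all(list, value) do
    # Throw away every item that strictly equals the value.
    Enum.reject(list, fn x -> x === value end)
  end
end

IO.inspect(RejectSketch.delete_all([0, 1, 0, 0, 1, 0], 0))
IO.inspect(RejectSketch.delete_all([], 0))
```

Whether filter or reject reads better is a matter of taste. They do the same amount of work.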

Map. Filter. Reduce.

Such helpful concepts.

Functional programmers love it so much, they even combine them. Elixir’s Enum library also contains map_reduce/3 and filter_map/3. There’s even something called flat_map_reduce/3.

Filter It, Just a Little Bit (More)

But filter isn’t as deep as we could go. Here’s the Enum.filter source code:

def filter(collection, fun) when is_list(collection) do
  for item <- collection, fun.(item), do: item
end

def filter(collection, fun) do
  Enumerable.reduce(collection, {:cont, []}, R.filter(fun))
  |> elem(1) |> :lists.reverse
end

We’re dealing with a list here, so we get to use the much simpler first clause you see there. That’s a list comprehension that acts as a filter. It goes through each item in the collection, runs it through a function to make sure it returns a truthy value, and then does something with it. In this case, it returns the value, unaltered, so long as the function turns out to be true when run against that item.

The second filter version works when your collection is not a list. We’re not going to worry about that one now. We’ll get back to reduce in a little bit, though.
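If comprehensions with filters are new to you, here is the shape in isolation. This is a throwaway sketch with my own example data, nothing from the Enum source:

```elixir
numbers = [1, 2, 3, 4, 5, 6]

# Generator, then a filter, then the body. Items that fail the filter are skipped.
evens = for n <- numbers, rem(n, 2) == 0, do: n

# Same filter, but this time the body transforms whatever gets through.
doubled_evens = for n <- numbers, rem(n, 2) == 0, do: n * 2

IO.inspect(evens)
IO.inspect(doubled_evens)
```

Enum.filter’s list clause is the first form: the body hands each surviving item back untouched.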

What if we rewrote our delete_all function directly with the list comprehension code that filter uses?

def delete_all_filter(collection, value) when is_list(collection) do
  fun = fn(x) -> x !== value end
  for item <- collection, fun.(item), do: item
end

Double check that it works:

iex> li = [1,2,3,4,3,4,5]
iex> List.delete_all_filter(li, 4)
[1, 2, 3, 3, 5]
iex> List.delete_all_filter(li, 3)
[1, 2, 4, 4, 5]

Yup, that’ll do it.

I doubt this speeds anything up, but it does feel “closer to the metal.”

If you were really feeling adventurous, though, you’d rewrite it as a straight reduce.

Oh, we’ve gone this deep, already. Why not?

The Reduce Road

Joe Kain has a great reduce write-up that I can’t recommend enough. It’s the best explanation of reduce I’ve seen so far, and I used it as inspiration for writing my function here:

  def delete_all(collection, value) do
    Enum.reduce( collection, [], fn(x, acc) ->
      case x !== value do
        true  -> [x | acc]   # Not the value, add it to the list
        false -> acc         # Matches the value, so don't add it
      end
    end)  |> Enum.reverse
  end

Note it still has the same user interface. It’s still a list and a value for the two parameters and that’s it. The reduce function takes care of the rest. The only complication here is the case statement I had to throw in. If there’s a way around it without adding more functions, please drop me a line or an issue/pull request. I’d love to simplify this further, so long as it’s readable.
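One possible way around the case statement, offered as a sketch and not as the One True Answer: give the anonymous function two clauses and pin the value in the first one. ReduceSketch is a made-up module name. Note that a pinned match is a pattern match, which is strict about integers versus floats in the same way !== is.

```elixir
defmodule ReduceSketch do
  def delete_all(collection, value) do
    collection
    |> Enum.reduce([], fn
      # Pinned match: this item is the value, so leave the accumulator alone.
      ^value, acc -> acc
      # Anything else gets added to the front of the accumulator.
      x, acc -> [x | acc]
    end)
    |> Enum.reverse()
  end
end

IO.inspect(ReduceSketch.delete_all([0, 1, 0, 0, 1, 0], 0))
IO.inspect(ReduceSketch.delete_all([1, 2, 3, 4, 5], 3))
```

No extra named functions, and the two outcomes sit side by side as clauses.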

The way this works is by using basically the same function we used above, but instead of just returning one value at a time, the reduce statement grows its own list as the list is traversed. It does, however, return the list backwards, so we pipe the results to Enum.reverse to straighten it back out. Now the input and the output of this function are identical to the other options discussed above.

In the end, I think the filter solution is the cleanest and clearest to me. It certainly beats my original stab at the problem. Maybe that’s the lesson we learned today? The further down you dig, the more options you have and the better your final function could be. Also, you might learn some stuff.

Sure, I’ll go with that.


(12)

Core Elixir: List.delete_at/2

The List module holds the functions that only make sense for lists and can’t be covered by the Enum module, which works with any collection. If you’re having a hard time deciding whether to handle something as a List or as a Collection, try the Collection (that is, Enum) first.

But if you have a list and want to delete a single item on it, you’ve come to the right place!

List.delete_at/2

The first thing you need to remember is that data in Elixir is immutable. The return value from this function will be a new list. You can rebind it to the original list’s name, but it will not be directly affecting the data you started with. In reality, you’re not so much deleting an item out of the list as you are generating a new list that happens to not contain this value.

I do not include this at the top because I made that mistake. Oh, no. I’m a blogger and, as with Wikipedia, an Unimpeachable Voice of Authority/Wisdom. I’m telling you this for a friend. Or something.

Anyway —

delete_at takes two arguments: the list and an index, which is the position of the value you want to delete. This position is zero-based. Again, I remind you of this for a friend.

iex> a = [1,2,3,4,5]
iex> List.delete_at(a, 3)
[1, 2, 3, 5]

Remember what I said before about immutability?

iex> a
[1, 2, 3, 4, 5]

The value of the a list didn’t change at all. You can rebind, though:

iex> a = List.delete_at(a, 3)
[1, 2, 3, 5]
iex> a
[1, 2, 3, 5]

This function will also accept a negative index, which will count from the end of the list:

iex> List.delete_at(a, -2)
[1, 2, 5]

When you’re counting backwards, you count 1-based and not zero-based. Not that I’d accuse you of ever trying something silly like using a -0 index. Oh, no. That’s for me to try for you– er, your friend:

iex> a = [1,2,3,4,5]
iex> List.delete_at(a, -0)
[2, 3, 4, 5]

Remember, the index is 0-based and 0 is the same as -0 and is a valid number. If you try to delete -0, you’re just deleting 0.

I do these things so you don’t have to.

If you give a number that makes no sense, er, is out of bounds, the original list is returned:

iex> a = [1,2,3,4,5]
iex> List.delete_at(a, 27)
[1, 2, 3, 4, 5]

There’s nothing in the 27th (er, 28th) position to delete, so Elixir doesn’t do anything.

When you offer the function a negative number to count backwards with, it’s all a trick. The source code won’t be counting back, it’ll do something else. Let’s look at the source now so we can get that point.

The Source

delete_at is a gatekeeper function, meant to make things a little easier for the programmer. It exists to convert the negative index into a positive value first before passing everything along to do_delete_at, which is a private function that takes two values: the list and the index.

Let’s start with a positive number example, and look more closely at do_delete_at.

It first takes care of the simplest solutions. If the list is empty, it returns an empty list:

  defp do_delete_at([], _index) do
    []
  end

Duh.

If you only want to delete the 0th element (which is the first value, remember, as well as the -0th), return just the tail:

  defp do_delete_at([_|t], 0) do
    t
  end

With those cases out of the way, now we can get to the fun recursive stuff. This one bent my mind a little, until I sketched it out. Here’s a good tip for figuring out recursive functions: Start at the base case and increment your way up. In this case, I started with an index of 0 already: It returns the tail.

Let’s look at the code all together, and then we’ll walk through it with an index of 1:

defp do_delete_at([_|t], 0) do
  t
end

defp do_delete_at(list, index) when index < 0 do
  list
end

defp do_delete_at([h|t], index) do
  [h | do_delete_at(t, index-1)]
end

The third pattern would match (list, 1), returning the head value followed by the results of putting (tail, 0) through the same function. You pass in 0 (decrementing the index) the second time since the list is now one item shorter at the front.

We’ve already seen what (tail, 0) is going to return — the tail of the list you just passed in. Thus, the head is thrown out.

So, this function returns the head of the original list, tosses out the next value, then returns everything else: You just deleted the second item on a list, effectively.

Once you have that in your mind, the rest of them are easy. 2, 3, 4, etc. just get to the point where you cut off the head and return the tail plus all the previous heads.

As a nifty bonus, you don’t need to reverse the list at the end. The process maintains the order as it goes along.

Counting Backwards

A negative index does something interesting. There’s no code for explicitly counting backwards. delete_at instead calculates the forward-counting position in the list to delete. It adds the length of the list (hello, Kernel.length/1) to the negative number you provided and hands that position to do_delete_at.

For example: If your list is the numbers 1 through 5 and you want to delete the -2nd item in the list (4), delete_at will add list length 5 to -2 and delete the element at position 3, which is the fourth item on the list.
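Here is that arithmetic spelled out as a sketch you can paste into iex. The variable names are mine:

```elixir
list = [1, 2, 3, 4, 5]
index = -2

# The same sum described above: 5 + -2 = 3
position = length(list) + index

IO.inspect(position)
IO.inspect(Enum.at(list, position))

# Deleting at -2 and deleting at 3 come out the same for this list.
IO.inspect(List.delete_at(list, index) == List.delete_at(list, position))
```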

Voila!

That was “simple.”

Wait, What About That Middle Function?

Update: 8/23/2015 Answering a Reddit question here, which I skipped over in the initial explanation.

What, this one in List.do_delete_at?

defp do_delete_at(list, index) when index < 0 do
  list
end

That’s there in case the negative value the user submits has an absolute value greater than the length of the list. If the user does something silly like that, Elixir will just pass back the original list, much the same way it does when the user provides a positive index greater than the length of the list.

iex> a = [1,2,3,4,5]
iex> List.delete_at(a, -100)
[1, 2, 3, 4, 5]

The gateway function (List.delete_at/2) doesn’t check how negative a negative value actually is. That happens inside the private function’s (List.do_delete_at/2) pattern matching. This is what the gateway looks like:

  def delete_at(list, index) do
    if index < 0 do
      do_delete_at(list, length(list) + index)
    else
      do_delete_at(list, index)
    end
  end

You’ll see that while it checks to see if the index value being passed in is negative, it doesn’t check to make sure that it isn’t TOO negative. That happens in the next step.
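To poke at all of this in one place, here are the gateway and the four private clauses from this post gathered into a throwaway module. DeleteAtSketch is a made-up name, and the clauses are copied from the snippets above, so treat it as a study aid and not as the current List source.

```elixir
defmodule DeleteAtSketch do
  # Gateway: turn a negative index into a forward-counting one.
  def delete_at(list, index) do
    if index < 0 do
      do_delete_at(list, length(list) + index)
    else
      do_delete_at(list, index)
    end
  end

  # Ran out of list before reaching the index: nothing to delete.
  defp do_delete_at([], _index), do: []
  # Reached the index: drop this head and keep the tail.
  defp do_delete_at([_ | t], 0), do: t
  # Still negative after adding the length: the index was too negative.
  defp do_delete_at(list, index) when index < 0, do: list
  # Not there yet: keep this head and keep counting down.
  defp do_delete_at([h | t], index), do: [h | do_delete_at(t, index - 1)]
end

list = [1, 2, 3, 4, 5]

for index <- [3, -2, 0, 27, -100] do
  IO.inspect({index, DeleteAtSketch.delete_at(list, index)})
end
```

Try reordering the clauses and see which inputs break. It’s a decent way to convince yourself why each one is there.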

Coming Up Next Week

We’ll take a look at a sister function of List.delete_at, how it differs, how we might improve it, create a new function, refactor it two ways, and then take a long nap.


(11)

Core Elixir: Grab Bag 1

In the course of researching “Core Elixir” articles, I come across all sorts of little points of interest that don’t always fit into the article, no matter how far I stretch them. Those wind up in a little scrap file. That scrap file becomes the Grab Bag.

Self-Evident Git Commit

They say your Git commits should be descriptive.

This commit added backticks to surround true and false 64 times across 30 files. The git log on it reads, “add backticks to a massive amount of true, false and nil.”

“Massive” is a good word.

Alias Assumption

When you use an Alias in a module, you don’t have to specify what to Alias it as — if you just want to use the last section of the module’s name.

For example, if you have a module named Long.Module.Name, the default alias will be ‘Name’. These two commands do the same thing:

alias Long.Module.Name, as: Name
alias Long.Module.Name

I’m still wrapping my brain around writing modules with multiple levels like that, so this is a neat trick for me.
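A tiny sketch of the short form in action. Both modules are invented for the example; Long.Module.Name is just the placeholder name from above given a body:

```elixir
defmodule Long.Module.Name do
  def hello, do: :world
end

defmodule AliasSketch do
  # No `as:` given, so the alias defaults to the last segment: Name.
  alias Long.Module.Name

  def call, do: Name.hello()
end

IO.inspect(AliasSketch.call())
```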

Renaming Maps

Elixir works hard to keep the arguments of its function calls consistent, for the sake of the pipeline operator. But it doesn’t stop there.

The Map library has a couple of functions that wrap an Erlang function directly, re-ordering the arguments for the pipeline operator’s sake, but also to rename the function for readability. For example:

def has_key?(map, key), do: :maps.is_key(key, map)

It’s not enough to simply re-order the arguments there. The function name, itself, is changed to make it more readable and consistent with the rest of Elixir’s syntax. Read it out loud and I think you’ll agree that, in that order, the “has_key” name works better. ” ‘Map’ has key ‘key’ ” reads better than however-you’d-read-the-Erlang equivalent. (“‘Maps’ ‘is a key’ named ‘key’ in ‘map'”?!?) It’s obvious what it does, but it doesn’t read as cleanly.

Since you’ll likely be including the module name with a call to this function, there’s even a certain rhythm there:

Map.has_key?(map, key)

It alternates between Map and Key.

And, because Elixir allows for it (as in Ruby), the question mark in the function name indicates that the return value will be a boolean. That’s not enforced by the language. It’s in the source code’s specs, but that’s only useful for documentation and tools like Dialyzer. It’s a stylistic thing that you’re encouraged to use.

If you do declare a function that ends in a question mark that returns an integer, prepare yourself for a pull request to fix that…
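For the curious, here are the two spellings side by side, plus the pipeline form that the argument order makes possible. The map is my own example:

```elixir
map = %{name: "Elixir"}

# Elixir: map first, key second.
elixir_answer = Map.has_key?(map, :name)

# Erlang: key first, map second.
erlang_answer = :maps.is_key(:name, map)

# With the map in first position, it pipes cleanly.
piped_answer = map |> Map.has_key?(:version)

IO.inspect({elixir_answer, erlang_answer, piped_answer})
```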

Erlang is Not For the Faint of Heart

The Supervisor shutdown option is a bloodthirsty one. You can choose between three kinds of value to tell a Supervisor how to shut down a child.

First, there is :infinity. This is the most forgiving option, leaving the child process that is another Supervisor all the time it needs to stop everything below it first before the Supervisor dies off.

Second, there is a number, which represents the time it’ll wait for the child process to shut down. The Supervisor first asks the child to exit with Process.exit(child, :shutdown). If it doesn’t get confirmation back from the child that it shut itself down in x number of milliseconds, the Supervisor takes out the child hard:

  Process.exit(child, :kill)

In the words of Elixir’s documentation, “the child process is unconditionally terminated”. Gruesome!

There aren’t too many languages out there that give you the chance to kill your children like this.

But, wait, there’s more! There’s a third option:

  shutdown: :brutal_kill

Elixir is starting to sound like either an 80s action movie or a 90s fighting video game. That option skips the waiting entirely and goes straight to Process.exit(child, :kill). Just boom! The child is out of its misery without warning.

Some might call that more humane.
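For a feel of where the shutdown value actually lives, here is a sketch using the map-style child specs found in more recent Elixir releases (that is an assumption on my part; older code used the Supervisor.Spec helpers instead). The two Agent children are stand-ins with made-up ids.

```elixir
children = [
  # Asked to exit, then given up to 5 seconds before it is killed.
  %{id: :patient, start: {Agent, :start_link, [fn -> :ok end]}, shutdown: 5_000},
  # No asking, no waiting.
  %{id: :doomed, start: {Agent, :start_link, [fn -> :ok end]}, shutdown: :brutal_kill}
]

{:ok, sup} = Supervisor.start_link(children, strategy: :one_for_one)

counts = Supervisor.count_children(sup)
IO.inspect(counts)

# Stopping the supervisor applies each child's shutdown setting in turn.
:ok = Supervisor.stop(sup)
```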

Housekeeping: Updating Core Elixir

I made an update to the Collection To List installment that added a new section, cleaned up some confusing code, and just generally rambled on.

I also cleaned a couple of minor things up on last week’s System.tmp_dir post.

Further back, I made an update to a January 2014 post about random number generation to reflect a recent deprecation in Erlang.

But the big news is, this blog is now on Github! I’ve taken the original Markdown files for every post and created a Github project for them. If you see any problems with past, present, or future posts here, make a pull request. (Spelling errors, coding errors, what have you.) If there’s a part of Elixir you’d like to see me tackle in this series, raise an issue.


(10)

Core Elixir: System.tmp_dir/0

In which a Hex module is suggested, a pull request is considered, Erlang is explored, and a big warning kicks things off.

Sometimes, in the course of handling files through automation in any programming language, you’ll have need of a temporary directory to stash things away in. Elixir has something to help you find that directory: System.tmp_dir/0

The command returns the name of the most suitable temp directory for you that is writable.

But the devil’s in the details and it’s very important to set expectations properly here.

A BIG GIANT HUGE WORD OF WARNING

Seriously. Heed me.

System.tmp_dir only finds a directory for you. It does not create one special to this program. It will not guarantee for you that the directory is empty to begin with.

It will merely give you a location on the file system where files can be written. If you’re lucky, there’s an environment variable set to guide you to a good place. If you’re not lucky, it’ll return nil and you won’t have anywhere to go. If you’re supremely not lucky and are having a very bad terribly no good day, you’ll get the current working directory returned to you.

You see, my precious Perl has a module like this called File::TempDir. (It’s based on File::Temp, naturally.) It creates a new and randomly-named temporary directory for you and then destroys it and everything in it once it goes out of scope. It’s a truly temporary place that’s also self-cleaning. Best of both worlds.

Elixir’s System.tmp_dir will only help you find a directory that’s writable. Period. It won’t guarantee that other files won’t be there. It won’t guarantee that what files you write to it will ever go away. It won’t destroy the temporary directory once you’re done with it. That’s still all on you. So keep track of things.

You may consider this a warning, or you may consider this a golden opportunity to create a new application to put up on Hex for one and all. Take your pick.
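If you want a head start on that Hex package, here is roughly the shape I have in mind. This is a minimal sketch, not a library: TmpDirSketch and with_tmp_dir/1 are hypothetical names, and the directory naming scheme is an assumption that makes no promises about collisions or security.

```elixir
defmodule TmpDirSketch do
  # Hypothetical helper, not part of Elixir: make a private directory under
  # the system temp dir, run the function, then clean up no matter what.
  def with_tmp_dir(fun) when is_function(fun, 1) do
    name = "sketch-#{:os.getpid()}-#{System.unique_integer([:positive])}"
    path = Path.join(System.tmp_dir!(), name)
    File.mkdir_p!(path)

    try do
      fun.(path)
    after
      # Removes the directory and everything written into it.
      File.rm_rf!(path)
    end
  end
end

used_path =
  TmpDirSketch.with_tmp_dir(fn dir ->
    File.write!(Path.join(dir, "scratch.txt"), "hello")
    dir
  end)

IO.inspect(File.exists?(used_path))
```

The try/after is the same trick File.cd!/2 uses: the cleanup runs whether the function returns or raises.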

With that out of the way…

Let’s Look at the Code

System.tmp_dir takes no arguments and returns one value: the location of a temporary directory you can write to, as a string, or nil if it can’t find one.

  def tmp_dir do
    write_env_tmp_dir('TMPDIR') ||
      write_env_tmp_dir('TEMP') ||
      write_env_tmp_dir('TMP')  ||
      write_tmp_dir('/tmp')     ||
      ((cwd = cwd()) && write_tmp_dir(cwd))
  end

This one is pretty simple. The entire thing is a short-circuiting OR (||) statement. The first chance Elixir gets to find something that will work, it’ll exit out with that directory’s location.

This function breaks up into three pieces:

  • First, it checks for environment variables. (lines 2-4)
  • Second, it’ll check on a /tmp directory. (line 5)
  • Third, if all else fails, it’ll go with the current directory. (line 6)
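The short-circuiting is worth seeing on its own. In this sketch, lookup is a stand-in function I made up to play the part of the environment checks; the real function does not look like this.

```elixir
# Pretend environment: only "TMP" has a value.
lookup = fn
  "TMPDIR" -> nil
  "TEMP" -> nil
  "TMP" -> "/var/tmp"
end

# Each nil falls through to the next check. The first truthy value wins,
# and nothing to its right is evaluated, so the raise never fires.
result =
  lookup.("TMPDIR") ||
    lookup.("TEMP") ||
    lookup.("TMP") ||
    raise("never reached")

IO.inspect(result)
```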

Let’s drill down now and get our hands dirty at even deeper code:

Looking Up the Directories

The three checks on the environment variables all use the function write_env_tmp_dir/1, taking as an argument the environment variable to check.

Let’s see how it works:

defp write_env_tmp_dir(env) do
  case :os.getenv(env) do
    false -> nil
    tmp   -> write_tmp_dir(tmp)
  end
end

This uses the case statement, which is an Elixir macro short for “We need an IF statement but we’re functional programmers and we don’t use if statements, so cover it with a macro.”

It farms the work out to Erlang, using its :os.getenv function to look for the system variable provided to it.

That’s an interesting Erlang function by itself. So let’s take a look at it. One more level down, we go:

:os.getenv/0 – :os.getenv/2

It comes in three flavors, with arities of 0, 1, and 2. The 0 arity function call returns all available environment variables. Here’s a truncated look at what that result is on my humble little Vagrant box:

iex> :os.getenv
['LC_PAPER="en_US.utf8"', 'SSH_CONNECTION=192.168.33.1 51655 192.168.33.10 22',
 'PWD=/home/vagrant/augiedb/elixir', 'LC_ALL=en_US.utf8',
 'LC_IDENTIFICATION="en_US.utf8"', 'LC_MEASUREMENT="en_US.utf8"',
 'LESSCLOSE=/usr/bin/lesspipe %s %s', 'LC_NAME="en_US.utf8"', 'SHELL=/bin/bash',
 'ROOTDIR=/usr/local/lib/erlang', 'LC_MESSAGES="en_US.utf8"',
 'PATH=/usr/local/lib/erlang/erts-7.0/bin:/usr/local/lib/erlang/bin:/home/vagrant/.rakudobrew/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/opt/vagrant_ruby/bin',
 'LC_COLLATE="en_US.utf8"', 'TERM=xterm', '_=/usr/local/bin/iex',
 'LOGNAME=vagrant', 'BINDIR=/usr/local/lib/erlang/erts-7.0/bin',
 'MAIL=/var/mail/vagrant', 'LESSOPEN=| /usr/bin/lesspipe %s']

If you call it and pass along the environment variable (the /1 arity), it’ll return the value:

iex> :os.getenv('PWD')
'/home/vagrant/augiedb/elixir'

Finally, you can pass both a key and a default value into the function. If Erlang finds that environment variable, it returns its value. If that variable doesn’t exist, it returns the default you passed in. Do note, though, that the environment variable won’t be created or modified.

iex> :os.getenv('HELLO')  # Does not exist.
false
iex> :os.getenv('HELLO', 'WORLD') # Provide a default.
'WORLD'
iex> :os.getenv('HELLO')  # See?  It still does not exist.
false
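One thing to keep straight: the Erlang function speaks charlists and says false for “not there,” while Elixir’s own System.get_env/1 speaks strings and says nil. A small sketch, assuming the variable named below is not set on your machine:

```elixir
# Assumption: nothing on this machine sets a variable with this name.
name = "CORE_ELIXIR_SKETCH_UNSET"

# Erlang flavor: charlist in, false when missing.
from_erlang = :os.getenv(String.to_charlist(name))

# Elixir flavor: string in, nil when missing.
from_elixir = System.get_env(name)

# Erlang's two-argument version hands back the default, also as a charlist.
with_default = :os.getenv(String.to_charlist(name), ~c"WORLD")

IO.inspect({from_erlang, from_elixir, with_default})
```

That false-versus-nil difference is exactly what the case statement in write_env_tmp_dir is papering over.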

Where Were We? Oh, Yes…

defp write_env_tmp_dir(env) do
  case :os.getenv(env) do
    false -> nil
    tmp   -> write_tmp_dir(tmp)
  end
end

We’re past the case statement now. You’re getting ‘false’ in response to the :os.getenv/1 command if the environment variable doesn’t exist. As a result, this function will return a nil back, which will force the next statement after the || (or) in the tmp_dir function above.

If there is a value for that key, you get that value back, which the code pattern matches with ‘tmp’ and sends as the argument to the next function down, write_tmp_dir. We have a directory, but now we’ll check that we can use it!

write_tmp_dir/1

This is the end of the road for our exploration. It’s as deep as we’re going to get. Honest. This is where our final answer will be found and returned.

write_tmp_dir begins, amusingly, with our old friend, File.stat/1 (the plain version, no bang). This time, it’s looking for the properties of that temporary directory we’re working on providing the coder, who’s back about six steps now and doesn’t realize it since things run so quickly on modern computer chips.

defp write_tmp_dir(dir) do
  case File.stat(dir) do
    {:ok, stat} ->
      case {stat.type, stat.access} do
        {:directory, access} when access in [:read_write, :write] ->
          IO.chardata_to_string(dir)
        _ ->
          nil
      end
    {:error, _} -> nil
  end
end

Do if statements make you feel dirty? How do nested case statements work for you? Maybe I need to make a pull request to help flatten this out:

defp write_tmp_dir(dir) do
  case File.stat(dir) do
    {:ok, stat} -> return_tmp_dir(stat, dir)
    {:error, _} -> nil
  end
end

defp return_tmp_dir(stat, dir) do
  case {stat.type, stat.access} do
    {:directory, access} when access in [:read_write, :write] ->
      IO.chardata_to_string(dir)
    _ ->
      nil
  end
end

That looks so much better to me.

return_tmp_dir is probably not the best name, but naming is hard and that isn’t the point of this write-up.

I probably won’t submit this as a pull request, though. Quoting the Elixir Contributions docs:

NOTE: Do not send code style changes as pull requests like changing the indentation of some particular code snippet or how a function is called. Those will not be accepted as they pollute the repository history with non functional changes and are often based on personal preferences.

I’m in a gray area here with this one. This isn’t a high holy war of camelCase versus snaked_names or parentheses vs. no parentheses, but it also doesn’t add any functionality. It looks prettier, I think, but I don’t want to be a polluter. 😉

As much as I loved Garrett Smith’s talk, that doesn’t mean I should apply his techniques everywhere I see Erlang/Elixir code. In my projects, sure…

I’ve also looked at other Elixir source code and found a mixed bag of nested case statements. A couple do, indeed, call out to another private function, but most just nest the code right there without that extra level of redirection. (File.do_cp_r actually goes three levels deep!)

In any case, in plain English, the function grabs the stats on the directory. If it can’t for some reason (the directory doesn’t exist, for example), it won’t match with {:ok, stat} and will instead return a nil. Game over. (Or, at least, on to the next check…)

If there are stats to be had, we go deeper. Let’s pull those lines out specifically:

  {:ok, stat} ->
    case {stat.type, stat.access} do
      {:directory, access} when access in [:read_write, :write] ->
        IO.chardata_to_string(dir)
      _ ->
        nil
    end

Here’s a sample File.stat result:

iex(1)> File.stat('ia')
{:ok,
 %File.Stat{access: :read_write, atime: {{2015, 7, 24}, {22, 35, 36}},
  ctime: {{2015, 7, 24}, {22, 35, 36}}, gid: 1000, inode: 112, links: 1,
  major_device: 22, minor_device: 0, mode: 16895,
  mtime: {{2015, 7, 24}, {22, 35, 36}}, size: 0, type: :directory, uid: 1000}}

The code looks specifically for the type and access keys, pattern matching them in the next line to :directory and access. In other words, you better be a directory and not, say, a file. Secondly, there’s a guard clause on line 3 to govern what type of access you need to have to that directory to use it. It needs to be either read/write or write only. That makes sense to have for a directory you want to write files into temporarily, don’t you think?

If that all holds up, we get to our end position, which is that the function returns the writeable temporary directory name, properly stringified for peak Elixir usage.
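You can run the same two-part check by hand against whatever directory System.tmp_dir!/0 gives you. This sketch only mirrors the condition described above; it is not the private function itself, and the variable names are mine.

```elixir
dir = System.tmp_dir!()

{:ok, stat} = File.stat(dir)

# The two facts the private function cares about.
IO.inspect({stat.type, stat.access})

# Must be a directory, and must allow writing.
usable = stat.type == :directory and stat.access in [:read_write, :write]
IO.inspect(usable)
```

Since tmp_dir!/0 only returns directories that already passed this test, usable should come back true here.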

Is That All?

No.

Don’t be silly.

We haven’t discussed what happens when the case statement back at the top leads to the /tmp directory. You didn’t forget about that while I was dragging your attention all around the world, did you?

Can’t say as I blame you…

In Case of /tmp

It’s been awhile, so let’s go back to the very beginning:

  def tmp_dir do
    write_env_tmp_dir('TMPDIR') ||
      write_env_tmp_dir('TEMP') ||
      write_env_tmp_dir('TMP')  ||
      write_tmp_dir('/tmp')     ||    ## You are here
      ((cwd = cwd()) && write_tmp_dir(cwd))
  end

We’re up to line 5, where no environment variables have been set and so we look for the /tmp directory as a possible answer. With this option, we can skip over the checks for environment variable status that we had with write_env_tmp_dir and go straight to checking on the /tmp directory at write_tmp_dir/1, a function we’ve already discussed.

It’ll tell us if the directory exists, is a directory, and is writable. Or not.

If not, we arrive at the final option.

Think Globally, Write Locally

I call it the “nuclear option” for reasons outlined at the very top. This is the default answer when all else fails: The current working directory.

Instead of calling out to another function, there’s a little block of code with two commands joined by an AND (&&) to run the second half assuming the first half returns a value. (Erlang’s documentation does bring up a circumstance under which this would fail — if the directory’s permissions prevent it.) First, it sets the cwd variable up with the current working directory and, once that has a value, it goes back to the write_tmp_dir well to prove that it is a valid and writeable directory.

I’m not so sure why they didn’t just combine it into one command like so:

 (write_tmp_dir( cwd() ) )

I tested it. It works. Maybe it’s less readable that way? Is it a violation of Elixir style guides in that you’re reading it inside out instead of left to right? For one level of depth, I don’t think it’s a bad call. Plus, eliminating a temporary variable is always a pleasant thing.

What about the more Elixir-ish pipeline option:

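  cwd() |> write_tmp_dir()

(Note that piping into a capture, as in cwd() |> &write_tmp_dir/1, doesn’t compile; the pipe operator wants a function call on its right-hand side.) For a single call, the pipe doesn’t buy much, though. I’m still not convinced the current line is the prettiest version of the code, but I’m really nit-picking at this point. I definitely won’t do a pull request here.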

One Last Note

There is, of course, a version of the function with the added bang to return a more verbose error in case of failure. I present to you System.tmp_dir!/0:

  def tmp_dir! do
    tmp_dir ||
      raise RuntimeError, message: "could not get a writable temporary directory, " <>
                                   "please set the TMPDIR environment variable"
  end

As you can see, it just calls on the bang-less tmp_dir function we’ve already exhaustively covered and, should it fail and receive a nil, it executes the second half of the OR (||) statement, raising an error complete with a suggestion for how to fix the problem. After all, the problem isn’t the code; it’s the file system. Set your ENV and be a better UNIX person.
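The same bang-on-top-of-bang-less pattern is easy to try on your own code. This is only a sketch: BangSketch, find/1, and find!/1 are hypothetical names, and the error message is invented.

```elixir
defmodule BangSketch do
  # Hypothetical bang-less lookup: hands back whatever it is given, nil included.
  def find(value), do: value

  # The bang version leans on || so the raise only runs on nil or false.
  def find!(value) do
    find(value) ||
      raise RuntimeError, message: "nothing found, please pass a value"
  end
end

IO.inspect(BangSketch.find!("/tmp")) # => "/tmp"
# BangSketch.find!(nil) raises a RuntimeError carrying the message above.
```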

At Long Last, It’s Summary Time!

There you have it — the fancy pants Elixir way to find a temporary directory. Use with care. Track the files you’re playing with. Be a nice citizen and clean up after yourself.

And always eat your vegetables.

If you have any comments, questions, complaints, criticisms, or corrections, catch me on Twitter, @AugieDB. Or make a pull request on Github! That Twitter handle and Github ID is the same as my GMail account, if you want to deal with it more quietly. I want these articles to be factually correct and will update them as necessary.


Updated August 11, 2015: Whoops, got the arity slash in the wrong direction in one spot. Also added clarification to the “AND (&&) to guarantee both halves will run” bit. Thanks to Henrik for the spots!

Core Elixir: Collection to List

First, we need to define some terms: Elixir Collections include things like HashDicts, Maps, and Lists. A List is a very specific type of collection: It’s a singly linked list, basically. It has an order, which the other collections don’t have.

If you want to reach into the middle of a list, it’s expensive. You can’t get to the nth member of the list without going through the first n-1 members to get there. There are no direct references to individual members of the list. No indices. No extra pointers.
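Here is roughly what that walk looks like if you write it by hand. WalkSketch and nth/2 are names I made up for this sketch, and it assumes n is zero or greater.

```elixir
defmodule WalkSketch do
  # Found it: the counter has run down to zero.
  def nth([head | _tail], 0), do: head
  # Not there yet: drop the head and keep walking.
  def nth([_head | tail], n) when n > 0, do: nth(tail, n - 1)
  # Ran off the end of the list before the counter hit zero.
  def nth([], _n), do: nil
end

IO.inspect(WalkSketch.nth([:a, :b, :c, :d], 2)) # => :c
```

Getting to :c means stepping past :a and :b first. There is no shortcut.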

The Enum module works with collections. It takes the place of the various types of loops you might write in other languages.

Lists, in particular, have their own module, cleverly named List, that handles functions that make sense only to lists and not collections as a whole. You can flatten a list there, find an item in a specific position, or fold a list to the left or right, for four examples.

So, to sum it up:

  • Lists are a special form of Collections.
  • Enum deals with Collections.
  • List deals with list-specific things an Enum isn’t appropriate for.
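A quick illustration of that split, using only functions from the standard library (the input values are arbitrary):

```elixir
# Enum functions take any collection: a range, a map, a list...
IO.inspect(Enum.map(1..3, fn x -> x * 2 end)) # => [2, 4, 6]
IO.inspect(Enum.count(%{a: 1, b: 2}))         # => 2

# ...while List functions expect an honest-to-goodness list.
IO.inspect(List.flatten([1, [2, [3, 4]]]))                     # => [1, 2, 3, 4]
IO.inspect(List.foldl([1, 2, 3], 0, fn x, acc -> x + acc end)) # => 6
```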

With that out of the way:

Conversion Therapy

Can we convert a collection to a list? Of course we can! We use Enum.to_list/1.

The crazy thing is how that function does it.

Remember that with Elixir it’s easy to separate the head (first value) from the tail (everything afterwards) of a list. It’s not easy to do the reverse and break the last value off the rest of the list. (The List.last function does this by traversing its way all the way to the end in tail-recursive style.) You need to always move from the front to the back of a list; performance can be rather dismal, particularly as your list gets bigger.

There’s no direct way of converting a collection into a list. You can’t wave a magic wand and have it happen. Instead, you trick Elixir into looping over every value in the collection and adding those values to a list. As it happens, that’s a side effect of one of the Enum functions! How convenient.

Thinking Backwards

The Elixir Enum.reverse function takes in a collection and reverses it, making it a list in the process. It uses reduce and everything. Look:

  def reverse(collection, tail) do
    reduce(collection, to_list(tail), fn(entry, acc) ->
      [entry|acc]
    end)
  end

That reduce statement goes item by item in the collection and keeps putting the next item at the head of a new list. See that line perfectly in the middle of the code? [entry|acc] is forming that list. acc is the list so far (the accumulator), and entry is the head of the remaining collection that gets stapled onto the front of the list.

When all is said and done, you have a list of elements that came out of a collection.

The new list it creates, however, is in the reverse of the order the collection started with, since you’re always placing the next element ahead of all the others so far.
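You can watch the reversal happen one step at a time. This sketch runs the same kind of reduce against a plain list, then uses Enum.scan, which keeps every intermediate accumulator instead of only the last one:

```elixir
# The prepend-as-you-go reduce, run against a plain list.
reversed = Enum.reduce([1, 2, 3], [], fn entry, acc -> [entry | acc] end)
IO.inspect(reversed) # => [3, 2, 1]

# Enum.scan keeps each accumulator along the way, so the build-up is visible.
steps = Enum.scan([1, 2, 3], [], fn entry, acc -> [entry | acc] end)
IO.inspect(steps) # => [[1], [2, 1], [3, 2, 1]]
```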

For example, we’ll take a new Map (which is a collection), give it some values, and see what happens when we Enum.reverse it:

iex> m = Map.new()
%{}
iex> m = Map.put_new(m, :a, 1)
%{a: 1}
iex> m = Map.put_new(m, :b, 2)
%{a: 1, b: 2}
iex> m = Enum.reverse(m)
[b: 2, a: 1]

(Yes, I know the Enum.into trick, but for clarity’s sake, I’m spelling this out long hand. Maybe we’ll talk about Enum.into in the future…)

Look at that: You have a list now. You can tell because it’s surrounded in brackets and not a percent sign and curly braces. Or, you can tell programmatically:

iex> is_list(m)
true

Elixir’s List module doesn’t contain a reverse function. That’s because the List module only contains functions that wouldn’t make sense as an Enum function.

Up to this point, you’ve done two things at the same time: Converted a collection to a list, and reversed the order. That’s more than you wanted to do, though. You just wanted the conversion part, not the reversal part.

Since you just wanted to convert the collection to a list, you need to reverse the list back into its original order. Since it’s a list now, it makes sense to use the list version of reverse. On the off chance you skipped what I wrote two paragraphs ago: There is no Elixir List version of the reverse function.

We do the next best thing: we call directly out to Erlang’s.

That follows Elixir’s standards. Per the List module documentation:

A decision was taken to delegate most functions to Erlang’s standard library but follow Elixir’s convention of receiving the target (in this case, a list) as the first argument.

In any case, whether you do the second reversal with the Enum or :lists library, it will work, since we’re dealing with a list at that point, either way:

iex> Enum.reverse(%{a: 1, b: 2}) |> Enum.reverse
[a: 1, b: 2]
iex> Enum.reverse(%{a: 1, b: 2}) |> :lists.reverse
[a: 1, b: 2]

But the source code goes with the latter.

That’s 800 words to describe what is a one-liner in Core Elixir:

  def to_list(collection) do
    reverse(collection) |> :lists.reverse
  end
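If you want to poke at the double reverse outside of the Enum module, a standalone sketch could look like this. ToListSketch is a made-up module that mirrors the version quoted above; the Enum.to_list in whatever Elixir you have installed may well be implemented differently.

```elixir
defmodule ToListSketch do
  # First pass: reduce prepends each entry, which yields a reversed list.
  # Second pass: Erlang's :lists.reverse/1 puts it back in order.
  def to_list(collection) do
    collection
    |> Enum.reduce([], fn entry, acc -> [entry | acc] end)
    |> :lists.reverse()
  end
end

IO.inspect(ToListSketch.to_list(1..4))          # => [1, 2, 3, 4]
IO.inspect(ToListSketch.to_list(%{a: 1, b: 2})) # => [a: 1, b: 2]
```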

But we’re not done yet!

The Easier Way (Doesn’t Work)

05 August 2015: Major updates to change the example code in this section to be the same as the Maps-based example above. A new section has been added right afterwards, as well.

Wait, isn’t there a way to traipse across the collection one item at a time and not reverse the subsequent list?

What if we took the Enum.reverse function and rewrote it?

Here again is what the key part of it looks like today:

  def reverse(collection, tail) do
    reduce(collection, to_list(tail), fn(entry, acc) ->
      [entry|acc]
    end)
  end

As an exercise, try flip-flopping the [entry|acc] to give you [acc|entry]. Don’t you think that might stop the loop over the collection from returning results in reverse order?

Sorta, but the results are not pretty:

[[[] | {:a, 1}] | {:b, 2}]

If you tease that apart, the first item in the list (the head) is a list of an empty list with a tail of {:a, 1}.

The reverse/2 call is fronted by this reverse/1 call:

  def reverse(collection) do
    reverse(collection, [])
  end

The programmer just sends in a collection and doesn’t worry about it. Let the library worry about the accumulator. And, here, it’s seeded as an empty list, [].

Since we started the recursion with [] as the list, that starts as the head with {:a, 1} as the tail. Then, that whole construct becomes the new head, while the next value, {:b, 2} becomes the tail of a list where the head is an empty list and {:a, 1}.
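The built-in hd/1 and tl/1 make the problem easy to see. In a regular list the tail is always another list; here the tail is a bare tuple. The variable names are just for this sketch:

```elixir
regular = [1, 2]
IO.inspect(tl(regular)) # => [2], a list, as expected

odd = [[] | {:a, 1}]
IO.inspect(hd(odd)) # => []
IO.inspect(tl(odd)) # => {:a, 1}, a tuple where a list should be
```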

If we extended the original map out a little bit, the new list looks even wonkier. (I rewrote the reverse function in a new module I named after myself. I’m not just an egomaniac, but I am very fast at typing my own name.)

iex> Augie.reverse(%{a: 1, b: 2, c: 3, d: 4, e: 5, f: 6})
[[[[[[[] | {:a, 1}] | {:b, 2}] | {:c, 3}] | {:d, 4}] | {:e, 5}] | {:f, 6}]

This is the head of that list:

[[[[[[] | {:a, 1}] | {:b, 2}] | {:c, 3}] | {:d, 4}] | {:e, 5}]

And the tail:

{:f, 6}

So I suppose you could do a weird kind of reverse recursion where you keep dealing with the tail and passing the head along. You’d process the list backwards. And since that would go against every bit of conventional wisdom in programming, it’s probably safe to ignore that. It also doesn’t bring us closer to converting a collection to a list without the second reverse.

This is how the new list is then constructed:

[ [] | {:a, 1} ]
[ [ [] | {:a, 1}] | {:b, 2}]
[ [ [ [] | {:a, 1}] | {:b, 2}] | {:c, 3}]
[ [ [ [ [] | {:a, 1}] | {:b, 2}] | {:c, 3}] | {:d, 4} ]
etc. 

I stopped there before you got dizzy from the brackets and pipes. I’ve spent my whole career avoiding Lisp. This is getting perilously close.

Note the distinct lack of commas in there. You couldn’t flatten that list if you tried. And, yes, I tried. Because I’m thorough:

iex> list = [[[[[[[[] | {:a, 1}] | {:b, 2}] | {:c, 3}] | {:d, 4}] | {:e, 5}] | {:f, 6}], {:g, 7}]
iex> List.flatten(list)
** (FunctionClauseError) no function clause matching in :lists.do_flatten/2
    (stdlib) lists.erl:625: :lists.do_flatten(9, '\n')
    (stdlib) lists.erl:626: :lists.do_flatten/2

You break Erlang with that crazy request… Congratulations.

New on August 5, 2015 – You CAN Do It!

I received an email from a reader, Roman, who made a smart suggestion to help fix this. Instead of [ acc | entry ], try [ acc | [entry] ]. The system is expecting a list after the pipe “|”, so give it one.
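The difference between the two forms is easiest to see on a single step. The acc and entry values here are just sample data:

```elixir
acc = [[], {:a, 1}]
entry = {:b, 2}

# A bare tuple after the pipe leaves the list without a proper ending.
improper = [acc | entry]
IO.inspect(improper) # => [[[], {:a, 1}] | {:b, 2}]

# Wrapping the entry in a list gives the pipe the tail it expects.
proper = [acc | [entry]]
IO.inspect(proper) # => [[[], {:a, 1}], {:b, 2}]

# Which is just a longer way of writing a two-element list.
IO.inspect(proper == [acc, entry]) # => true
```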

The new reverse function looks like this:

defmodule Augie do
    def reverse(collection, tail \\ []) do
        Enum.reduce(collection, Enum.to_list(tail), fn(entry, acc) ->
            [acc|[entry]]
        end)
    end
end

Look at the big difference that gives us in results, before and after:

iex> Augie.reverse(%{a: 1, b: 2, c: 3, d: 4, e: 5, f: 6}) # [ acc | entry ]
[[[[[[[] | {:a, 1}] | {:b, 2}] | {:c, 3}] | {:d, 4}] | {:e, 5}] | {:f, 6}]

iex> Augie.reverse(%{a: 1, b: 2, c: 3, d: 4, e: 5, f: 6})  # [ acc | [entry] ]
[[[[[[[], {:a, 1}], {:b, 2}], {:c, 3}], {:d, 4}], {:e, 5}], {:f, 6}]

All of those pipes for the head/tails separators have been replaced by glorious commas. Now you can flatten it:

iex> Augie.reverse(%{a: 1, b: 2, c: 3, d: 4, e: 5, f: 6}) |> List.flatten
[a: 1, b: 2, c: 3, d: 4, e: 5, f: 6]

Doesn’t that look prettier now? And you don’t need to reverse it anymore, either!

So why not go this way? It’s slower.

Just as a down and dirty test, I ran these two lines in iex to see how many microseconds it would take to create a list with 100,000 entries and create a flat version of it. I used the Erlang :os.timestamp function to grab the time before and after each calculation. Even with the extra step, the current Enum.reverse clearly wins:

{_,_,c} = :os.timestamp; 1..100000 |> Augie.reverse |> List.flatten;  {_, _, c1} = :os.timestamp; IO.puts c1 - c;

{_,_,c} = :os.timestamp; 1..100000 |> Enum.reverse |> List.flatten |> :lists.reverse; {_, _, c1} = :os.timestamp; IO.puts c1 - c;

The results are never identical, but the current Enum.reverse solution sits somewhere in the 20,000 – 23,000 microsecond range, while the Augie.reverse solution ranges between 24,000 and 30,000 microseconds.
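One caveat on the method: subtracting only the microsecond field of :os.timestamp wraps around whenever the seconds field ticks over mid-run. If you want to repeat the experiment, Erlang’s :timer.tc/1 is a sturdier stopwatch. This is a sketch of the double-reverse half only, and your numbers will differ from mine:

```elixir
# :timer.tc/1 runs the function and returns {elapsed_microseconds, result}.
{micros, list} =
  :timer.tc(fn ->
    1..100_000 |> Enum.reverse() |> :lists.reverse()
  end)

IO.puts("double reverse: #{micros} microseconds for #{length(list)} entries")
```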

Thanks again to Roman for pointing this out. It’s a good reminder to keep more proper lists…

The Anti-Climax

This is not an essay that ends with a brilliant pull request to convert a collection into a list in one less step with moderate gains in speed and performance.

I don’t have an answer to this. Honestly, this isn’t a problem that needs a solution. That’s not why I started writing this one. This is about finding out how Elixir works behind the scenes. What are the quirks of the language? Where does it hand things off to Erlang? What would be useful for you to know as a programmer?

Sometimes, it’s a winding path that goes in circles as we blindly grope for an answer. Or an explanation.

It’s that little kid’s tendency to ask “Why?” constantly that drives this series.

Even when we hit the bottom without a clickbait twist to put in the headline.

It’s just plain old Elixir. And it’s lots of fun.

Did you ever think the way to convert one data type to another is to use two functions that do completely unrelated things to the task at hand, but do the same thing logistically in two different languages? Crazy, right?

Post Script: Did You Know?

:lists.reverse has two different versions in Erlang. You have your pick between sending one argument or two. What’s the difference? The first argument in both cases is the list you’re looking to turn around.

When the arity is 2, though, the second argument is another list that will be added to the end of your reversed list, just in case you need that kind of thing:

iex> :lists.reverse([1,2,3,4,5])
[5, 4, 3, 2, 1]
iex> :lists.reverse([1,2,3,4,5],[100,200,300])
[5, 4, 3, 2, 1, 100, 200, 300]

Note that the second list doesn’t get reversed.

If your second argument isn’t a list, it becomes the list’s new tail:

iex> :lists.reverse([1,2,3,4,5],1001)
[5, 4, 3, 2, 1 | 1001]

(Again, don’t try to run flatten on that. It doesn’t work that way. Weren’t you paying attention 500 words ago?!?)

Elixir has the same thing. When you run Enum.reverse against a collection, it actually calls Enum.reverse/2, with an empty list as the tail.

But if you call Enum.reverse/2 on purpose with some list to add to the end of a collection, then you’re actually using an optimization in the language. Elixir could just reverse the collection and then append the tail to it like this:

Enum.concat(Enum.reverse(collection), tail)

Instead, Elixir pulls out Yet Another Reduce function:

  reduce(collection, to_list(tail), fn(entry, acc) ->
    [entry|acc]
  end)

Really, is there anything that reduce can’t do?
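Both routes land on the same list, which you can check for yourself. The map and the tail here are arbitrary sample values:

```elixir
collection = %{a: 1, b: 2}
tail = [z: 26]

# One pass: reduce seeds the accumulator with the tail.
one_pass = Enum.reverse(collection, tail)
# Two passes: reverse first, then append the tail.
two_pass = Enum.concat(Enum.reverse(collection), tail)

IO.inspect(one_pass)             # => [b: 2, a: 1, z: 26]
IO.inspect(one_pass == two_pass) # => true
```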

Summing it all up

If you feel the need to convert a collection to a list, just reverse it. Twice. Once in Elixir, once in Erlang.

For extra credit, pin a tail on it afterwards.

If you have any comments, questions, complaints, criticisms, or corrections, catch me on Twitter, @AugieDB. Or make a pull request on Github! That Twitter handle and Github ID is the same as my GMail account, if you want to deal with it more quietly. I want these articles to be factually correct and will update them as necessary.
