Fix Elixir

Elixir is too complicated.

Elixir needs to be fixed.

I think we can all agree with these notions. Have you seen the Elixir-lang mailing list lately?

Having used it for a couple or three weeks, you realize it requires far too much typing — that is, pounding away on the keyboard with your fingers. What it truly lacks is typing — the kind that pre-defines a variable’s content type. We’re going to talk about fixing that shortly.

In this article, I hope to make a few suggestions to fix Elixir to make it better for future generations.

Steel Workers on a construction site to fix Elixir
If you need a change in your infrastructure, try a steel worker. Or three. They can fix Elixir for you.

Elixir Needs Types

For starters, we need to differentiate between data types. But we need to keep this simple so people will use it. I propose three types: Strings, Associative Arrays, and Lists. For simplicity’s sake, we’ll define numbers as strings, also. This doesn’t make sense, but it’s convenient and quick. Also, these types are more about the structure of the data than the contents of them. A string, thus, is a single unit of data without any sort of structure around it.

Current maps are far too complicated with far too many ways to do the same thing. Aren’t we all tired of looking for a value for a particular key by typing in this mess:

iex> example_value = Map.fetch(example_map, example_key)

::yawn::

Why not just have:

iex> example_value = example_map{example_key}

It’s so much more direct.

Fixing Strings

Strings are too complicated. I think we’re all tired of typing in “Hello” and getting back a result like:

iex> "Hello"
[67, 63, 70, 70, 72]

(No, it doesn’t actually do that in iex. This is a dramatization, like that time Dateline NBC blew up pick-up trucks, OK?)

Why not just learn Assembler at this point?

Let’s Unicode all the things and make them all just plain strings, composed of lists of characters.

Separating the Data Types

We need something short and pithy that gives us an easy visual reference to a variable’s data type. Because, right now, in Elixir, map_example could just as easily be a valid map, string, integer, or process id. MADNESS! We all know you can’t trust programmers to name things.

The solution is simple. Use a sigil. Elixir has a few left over, but I say we just need to go all the way and use whatever we can. Let’s stick with an “S” for a string, an “A” for a list since it’s so similar to an array, and, er, we’ll use an “H” for an Associative Array, which is the same thing as a Hash.

Now, it might be confusing to use letters, so let’s use special characters. The “S” looks a lot like a “$”. The “A” can be an “@”. The “H” can be “%”, which is a bit of a stretch, but I’m running out of options here already. Prefixing with “|-|” wouldn’t work as it looks a bit like a pipe, and we might need that for all the multitudes of changes the pipeline operator needs. (That’ll be a future post.)

This also makes string interpolations simpler. We don’t need that ridiculous #{} muddying up our beautiful list of Unicode characters anymore. "Hello #{variable_for_world}" is now the much simpler "Hello $variable_for_world". Plus, your left pinky finger hits shift two fewer times now.

More Minor Changes

Using “<>” to concatenate strings is the most baffling design decision in the history of computer programming. They don’t do that in C, so why would any language do anything different? Everything should be based on C, just like every command line should look like it comes out of a Unix box. Even Microsoft is using Gnu BASH now…

We’re not going to use “+” for that, because we’re better than Javascript and we’re working on clearing up the differences between strings and integers here. I don’t want to overload the operators. Just the variables.

We’ll pick something sane here. In fact, I’ll give you a couple of options: “.” and “,”. Those make sense.

Tuples are dead. They are replaced with arrays/lists. If you like {:ok, "All is good"}, you’re going to love a two element array with those values. If you want to know if the function was successful, just ask for the first element of your list:

iex> @results = ("ok", "All is good")
iex> List.first(@results)
"ok"

Or, just plain old:

iex> @results[0]
"ok"

Everything in life should be zero based. You can’t spell foist without an 0, amIrite?

Summing It Up (So Far)

In these three short steps, we’ve already fixed strings, typing, and visual identification of types. We’ve cleaned up Elixir for the programmer, not touching on OTP at all.

I’ll be submitting pull requests on all of these items momentarily.

M. Night Shyamalan, The Programmer

Congratulations, you’ve just fixed Elixir by turning it into Perl.

Kinda almost like Perl6, actually. Perl has a couple of web frameworks, too, you know. Perl Dancer and Mojolicious can help.

You’re welcome.

I’m on the Elixir Fountain Podcast!

Elixir Drop records a podcast with Elixir Fountain

Recorded on Wednesday, today sees the release of my conversation with the ever-amiable Johnny Winn on the topic of Elixir. And Phoenix. And Perl. And Ruby. I may have even mentioned Haskell at some point in there. We cover a lot of stuff. (And then we talked comics, naturally.)

It was only afterwards that I realized I’m a complete blabbermouth and didn’t go too deep on any one topic and likely will need to explain myself at length on a couple of things after the fact.

In other words: More blogging material! =)

But if you have any questions, leave a comment or send me a tweet, and I might just make a post here out of it. Thanks!

Core Elixir: List Module Misc.

Core Elixir looks into the List module and finds Erlang wrappers, test-driven development, and the joy of tuples.

We’ll be dipping a toe into the List module waters today… [1]

Elixir Drop dips his toe into the water

In researching modules and functions for Core Elixir, I often come across dead ends. These are the functions that don’t go very deep and that aren’t very interesting for a Core Elixir article.

The #1 cause of this is Erlang functions that are called directly by Elixir just to switch the parameters around so the list or collection comes first. Thanks, Pipeline Operator.

For example:

List.duplicate/2 takes an element (a list, in this example) and a non-negative integer that says how many times the caller wants to repeat it. So, to steal the example straight out of the documentation:


iex> List.duplicate([1,2], 2)
[[1,2], [1,2]]

You know how that works? List.duplicate(list, multiplier) calls Erlang’s :lists.duplicate(multiplier, list).

For Core Elixir purposes, that’s a dead end.
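To see the parameter swap for yourself, here is a small sketch that calls both versions side by side. The variable names are mine, made up for illustration.

```elixir
# Elixir's wrapper puts the element first; Erlang's original puts the count first.
elixir_result = List.duplicate([1, 2], 2)
erlang_result = :lists.duplicate(2, [1, 2])

IO.inspect(elixir_result)                    # [[1, 2], [1, 2]]
IO.inspect(elixir_result == erlang_result)   # true

# With the data in the first slot, the call drops straight into a pipeline.
[1, 2] |> List.duplicate(3) |> IO.inspect()  # [[1, 2], [1, 2], [1, 2]]
```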

That seems simple, but Elixir goes one step simpler:

List.flatten/1 directly calls :lists.flatten/1. It doesn’t even need to swap the order of the parameters around, since there’s only one.

But, wait, you say, there’s another version of List.flatten that has an arity of 2! The second parameter is a value that gets slapped on at the end of the newly flattened list! Does that extra parameter make any difference?

No. List.flatten(list, tail) calls :lists.flatten(list, tail).

The list still comes first. Erlang is inconsistent compared to Elixir in this way.
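A quick sketch of both arities, assuming nothing beyond the standard library. The sample data is made up for illustration.

```elixir
nested = [1, [2, [3, 4]], 5]

flat = List.flatten(nested)
IO.inspect(flat)       # [1, 2, 3, 4, 5]

# The second argument is a list that gets appended after flattening.
with_tail = List.flatten(nested, [6, 7])
IO.inspect(with_tail)  # [1, 2, 3, 4, 5, 6, 7]

# Same argument order on the Erlang side: list first, then tail.
IO.inspect(with_tail == :lists.flatten(nested, [6, 7]))  # true
```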

Purely Functional Looks Test Driven

I like this bit of code, to define List.last, the biggest tongue twister in Elixir’s code base:


  def last([]),    do: nil
  def last([h]),   do: h
  def last([_|t]), do: last(t)

It feels like the way you’d write something when you’re test driving it, doesn’t it? Let’s work through it:

Make it work for the simplest possible case — an empty list. In that case, there’s no making heads or tails of it (ha!), so you return a nil.

OK, so what happens when there’s only one item in the list? The answer isn’t nil anymore. You need to write that case next. The single value is the head AND the last item in the list.

And when you have two or more items? You look for the last value in the list by going through the whole list until you get to the end, recursively. Keep stripping off the head from the list and look for the last of the rest.

As a bonus, the second case now makes more sense. Why is the single value the last value? Because if you started with two or more values, this is the base case. A list with one value would effectively be the most tail item in the list. Pretty nifty.
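To watch the three clauses in action, here is a minimal sketch that copies them into a throwaway module. LastSketch is a made-up name for illustration; the real function lives in Elixir’s List module.

```elixir
defmodule LastSketch do
  # Hypothetical module: the same three clauses as above, in the same order.
  def last([]), do: nil
  def last([h]), do: h
  def last([_ | t]), do: last(t)
end

IO.inspect(LastSketch.last([]))            # nil
IO.inspect(LastSketch.last([:a]))          # :a
# [:a, :b, :c] -> [:b, :c] -> [:c], which lands on the single-item clause.
IO.inspect(LastSketch.last([:a, :b, :c]))  # :c
```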

It’s logic like that that makes me like functional programming. It just feels smart, somehow.

Of Tuples and Keyfinds

List.keyfind/3 strikes me as funny, mostly because it isn’t what I thought it was at first. And then I felt stupid for missing it. Let me assume for the moment that we’re all being honest with ourselves and that all of our minds blank at something ridiculously basic every once in a while. Thanks.

List.keyfind/3 calls on the similarly named :lists.keyfind/3, like so:


  def keyfind(list, key, position, default \\ nil) do
    :lists.keyfind(key, position + 1, list) || default
  end

The function takes a list of tuples ( [a: 1, b: 2, c: 3] ), the value it’s looking for, and the position in each tuple where it should look for that value.

Here’s the trick to remember: A tuple can have not just two values, but also more or less than 2 values. A tuple is just an ordered list of things. Curly braces surround the list, and the values inside are separated by commas.

Valid tuples:
* {1}
* {1,2}
* {1,2,3}
* {1,2,3,4}

Et cetera. Well, you probably want to use a map after that, instead, but it’s your program. Do what you want.

When you have a two element tuple, it’s not a key and a value. It’s just two values.

I had forgotten that in a list with keywords, the curly braces can be dropped.


iex> t1 = [a: 1, b: 2]
[a: 1, b: 2]

iex> List.first(t1)
{:a, 1}

See? That’s a list with two tuples. Each is kinda sorta key and value, but not really. It’s position 0 and position 1 in a two element tuple. [a: 1, b: 2] is the same thing as [{:a, 1}, {:b, 2}]. It just looks a lot cleaner, so long as you remember that little trick. I’m a little out of practice using it, I guess.
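If you want to convince yourself that the two spellings are the same data, a short sketch like this does it. The variable names are mine.

```elixir
sugar = [a: 1, b: 2]
explicit = [{:a, 1}, {:b, 2}]

IO.inspect(sugar == explicit)  # true

# elem/2 is zero-based: position 0 holds the atom, position 1 the number.
pair = List.first(sugar)
IO.inspect(elem(pair, 0))      # :a
IO.inspect(elem(pair, 1))      # 1
```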

In List.keyfind, the position parameter refers to which value you’re looking at in each tuple, the zeroth or the first, in these cases. For what I think of as the key, I’d always use 0. For what is traditionally the value, it’s a 1.

If I wanted to make a ridiculous pull request to the Elixir core language, I’d offer up positions of :key and :value that are equal to 0 and 1 in two element tuples. (Pro tip: Don’t do it. It’s a silly idea.)

Finally, the last parameter sent to List.keyfind is the default value to return if nothing is found. The default default is nil.

There’s one other gotcha to point out here in this code. Erlang likes to be a 1-based language, while Elixir likes the 0-based values. List.keyfind calls :lists.keyfind, moving the list to the front and adding 1 to the position value to make up for that off-by-one position.
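Here is a small sketch of the position parameter and the off-by-one at work, using a made-up keyword list. Note that the raw Erlang call returns false on a miss, which is why the Elixir wrapper can use || to fall back to the default.

```elixir
pairs = [a: 1, b: 2, c: 3]

# Position 0 searches the first slot of each tuple.
IO.inspect(List.keyfind(pairs, :b, 0))         # {:b, 2}
# Position 1 searches the second slot.
IO.inspect(List.keyfind(pairs, 3, 1))          # {:c, 3}
# Nothing found: the default comes back (nil unless you supply one).
IO.inspect(List.keyfind(pairs, :z, 0))         # nil
IO.inspect(List.keyfind(pairs, :z, 0, :none))  # :none

# The Erlang call counts positions from 1 and returns false on a miss.
IO.inspect(:lists.keyfind(:b, 1, pairs))       # {:b, 2}
IO.inspect(:lists.keyfind(:z, 1, pairs))       # false
```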

You may all quibble amongst yourself over whether everything should start with 0 or 1. I’ll be over here indenting my code with the tab key just to annoy you.

I’m just kidding. I would never do that. I program in a non fixed width font, so indentation never looks right no matter which keys I use.

Sorry, I’m kidding again. I don’t use a non fixed width font. That’s madness! But this switch to a Dvorak keyboard means d;jdsfo dsmowm ojf of s;ljfy. !


  1. This might just be the best Elixir Drop drawing I’ve done yet. Just sayin’

ElixirDaze 2016

I will not be attending ElixirDaze, sorry to say. I have other commitments that weekend, including a birthday of my own with far too large a number to discuss in public. ;–)

But I did design the logo for the conference, so there’s that.

Core Elixir: Path.relative_to/2

Sit back, this one has tangents on top of tangents.

(Don’t I say that every time?)

(If you’re tangential to a tangent, are you asymptotic?)

According to the documentation, Path.relative_to/2 does this:

Returns the given path relative to the given from path.
In other words, it tries to strip the from prefix from path.

Sounds like fun. Even better, it’s recursive. Eventually.

This function does not interact with the filesystem. It does it all via string manipulation, without any regexes.

Should the path not contain the from, it returns the full path back to the user, like so:

      iex> Path.relative_to("/usr/local/foo", "/etc")
      "/usr/local/foo"

Otherwise, it returns the relative path:

      iex> Path.relative_to("/usr/local/foo", "/usr/local")
      "foo"
      iex> Path.relative_to("/usr/local/foo", "/")
      "usr/local/foo"

(I stole all those examples from the source code documentation.)

But How Does It Work?

We start here:

  def relative_to(path, from) do
    path = IO.chardata_to_string(path)
    relative_to(split(path), split(from), path)
  end

First, note that this is NOT an Erlang wrapper. That always helps when writing Core Elixir. I’m selfish like that.

The first thing Path.relative_to/2 does is translate the path argument to be a string. Why’s that? Looking at the top of the Path source code’s documentation, we see this:

   The functions in this module may receive a char data as
  argument (i.e. a string or a list of characters / string)
  and will always return a string (encoded in UTF-8).

It leaves the input options open, but standardizes on one particular output type. That’s smart. That gives the user maximal flexibility on input with a reliable return. It lets the language do the work behind the scenes.

So how does IO.chardata_to_string do its thing? Smells like a tangent…

IO.chardata_to_string/1

First, it tests if it’s already a string, and returns it unchanged if it is. Thanks, guard clause!

  def chardata_to_string(string) when is_binary(string) do
    string
  end

If, on the other hand, it’s a list of characters, it hands off to an Erlang function, :unicode.characters_to_binary/1, to handle the heavy lifting. It returns the resulting string if all worked according to plan. If not, it raises an appropriate error:

  def chardata_to_string(list) when is_list(list) do
    case :unicode.characters_to_binary(list) do
      result when is_binary(result) ->
        result

      {:error, encoded, rest} ->
        raise UnicodeConversionError, encoded: encoded, rest: rest, kind: :invalid

      {:incomplete, encoded, rest} ->
        raise UnicodeConversionError, encoded: encoded, rest: rest, kind: :incomplete
    end
  end

I think that’s fairly straightforward. You get to use a case statement instead of a three way if-then-else, so it looks nicer, too.
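A short sketch of the inputs this function accepts, plus one that trips the error branch. The sample values are mine, and I’m assuming a single 0xFF byte is a fair stand-in for “not valid UTF-8”.

```elixir
# A binary comes back untouched.
IO.inspect(IO.chardata_to_string("/usr/local"))             # "/usr/local"
# A charlist is converted to a string.
IO.inspect(IO.chardata_to_string(~c"/usr/local"))           # "/usr/local"
# Chardata may also mix binaries, integers, and nested lists.
IO.inspect(IO.chardata_to_string(["/usr", ?/, ~c"local"]))  # "/usr/local"

# An invalid byte inside a list cannot be converted, so the call raises.
try do
  IO.chardata_to_string([<<0xFF>>])
rescue
  e in UnicodeConversionError -> IO.puts("raised: " <> Exception.message(e))
end
```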

Yes, this part of the function is an Erlang wrapper, but it has a lot of good error checking added after the call.

You might wonder what happens when the argument sent in is neither a list nor a string. Fair enough.

You get an error:

iex> c = HashDict.new
#HashDict<[]>
iex> IO.chardata_to_string(c)
** (FunctionClauseError) no function clause matching in IO.chardata_to_string/1
    (elixir) lib/io.ex:329: IO.chardata_to_string(#HashDict<[]>)

This makes sense. Let it crash and all. Give the code the two ways it might work, and then let it fail in any other. The programmer is smart enough to take the hint from there.

Back to the Function

Where were we again? Ah, here:

  def relative_to(path, from) do
    path = IO.chardata_to_string(path)
    relative_to(split(path), split(from), path)
  end

path is now a string version of whatever was passed in, whether it was a string already or just a list of characters.

The next line (#3 above) splits apart the two paths that were passed in into their own lists of directories.

“What’s that?” you say. “How do we split a path?”

Glad you asked. Please join me on our next tangent.

Path.split

Elsewhere in the Path module, the split function takes in a path and breaks it apart into a list of its elements. It uses a forward or backward slash as the separator, because it knows if you’re on Windows and adjusts itself accordingly. (Pro tip: It’s always a good idea to use the language’s built-in functions for dealing with paths for just that reason. It will allow your program to run anywhere.)

Here’s Path.split/1:

  # Work around a bug in Erlang on UNIX
  def split(""), do: []

Wait, read that comment again. UNIX is the problem child here and not Windows? Hunh. Well, now I’ve seen everything.

But what if the path isn’t empty? It’s not so easy. Wait, actually, it is:

  def split(path) do
    FN.split(IO.chardata_to_string(path))
  end

FN is an alias for Erlang’s :filename module. We’re letting Erlang do the work.

I bet some of you are screaming “DRY” right about now. We already talked about using IO.chardata_to_string earlier in the Path.relative_to function before we got to the line where we called Path.split. Why not just let Path.split take care of that?

While it is a duplication of effort, we can’t just go rewriting Core Elixir for this one function’s sake. And, hey, this is functional programming: We can perform the same function on a given piece of data a million times and always get back the same answer. Also, the function will see that the path has already been stringified and pass it right back immediately.

I somehow doubt running this twice will result in a drag on the program’s speediness.

When all is said and done, we end up with a path that’s been split into a list of its directories, like in this example:

iex> File.cwd! |> Path.split
["/", "home", "vagrant", "augiedb", "elixir"]

Note that the first part is “/” to indicate that this isn’t a relative path to where we are at the moment. That’s not some kind of indicator of what the directory split character is or anything. You’re welcome to use Path.type to see if the path is relative or absolute, but that’s a topic for another time…

Also, I used cwd! in this example because plain old cwd returns one of those {:ok, blah_blah_blah} tuples, which can’t be pipelined into Path.split.
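Here is a sketch of those pieces together, assuming a Unix-like system (the absolute-path results would differ on Windows). The sample paths are made up.

```elixir
IO.inspect(Path.split("/usr/local/foo"))  # ["/", "usr", "local", "foo"]
IO.inspect(Path.split("usr/local/foo"))   # ["usr", "local", "foo"]
IO.inspect(Path.split(""))                # []

IO.inspect(Path.type("/usr/local/foo"))   # :absolute
IO.inspect(Path.type("usr/local/foo"))    # :relative

# File.cwd/0 wraps the path in a tuple, so unwrap it before splitting.
{:ok, dir} = File.cwd()
IO.inspect(Path.split(dir) == Path.split(File.cwd!()))  # true
```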

Are we ready yet to get to the recursive part of Path.relative_to? Yes. Yes, we are:

Bring On the Recursion!

This should look familiar by now:

  def relative_to(path, from) do
    path = IO.chardata_to_string(path)
    relative_to(split(path), split(from), path)
  end

relative_to/2 calls on relative_to/3 to do all the work. The three arguments are (from left to right) a list of elements in the path that we’re looking at, a list of elements in the path that we’re starting from, and the initial path itself in plain string text. We’ll see how that’s used shortly.

The bulk of the work, assuming a long directory path, comes in the first pattern match on this function, and it’s a little piece of pattern matching genius:

  defp relative_to([h|t1], [h|t2], original) do
    relative_to(t1, t2, original)
  end

The function pops the head off the list from both paths. If they are the same (thus, the two “h”s in the parameters), we toss out that first item and do this again with the rest of the lists. So “/usr/a” and “/usr/” would both match “/” as the heads and the function would get called again against ["usr", "a"] and ["usr"]. It would run against those values and pull out the “usr” items. Then it would call itself with ["a"] and [].
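The trick of naming the same variable twice in one pattern is easy to try on its own. This is a sketch with a made-up anonymous function, not anything from the Path module.

```elixir
# Using h in both patterns means the clause only matches when the heads are equal.
same_head? = fn
  [h | _], [h | _] -> true
  _, _ -> false
end

IO.inspect(same_head?.(["/", "usr"], ["/", "etc"]))  # true
IO.inspect(same_head?.(["usr"], ["etc"]))            # false
IO.inspect(same_head?.(["a"], []))                   # false
```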

In that case, where the two heads of the lists are now different (and the second one is already empty), it doesn’t match this pattern and it bounces down to this pattern match:

  defp relative_to([_|_] = l1, [], _original) do
    join(l1)
  end

If the second value is an empty list, then take whatever is left from the first value and return it. There’s your answer.

[_|_] would match to [a|[]]. In the second line, that list, as a whole, would be passed over to Path.join to become, well, the rest of the directory after that shared portion.

If the difference were greater and we were looking at something like “/usr/a/b/c/d”, then the match would leave you with “a”, “b”, “c”, “d” and you’d join that list back up with Path.join to give you (in Unix world) “a/b/c/d”.
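Here is that longer example as a sketch, assuming a Unix-like system. Enum.drop is my shortcut for “strip the shared prefix”; it only works here because I already know the from path is a prefix of the path, which the real function checks one head at a time.

```elixir
path = Path.split("/usr/a/b/c/d")  # ["/", "usr", "a", "b", "c", "d"]
from = Path.split("/usr")          # ["/", "usr"]

# Drop as many leading elements as the from path has.
leftover = Enum.drop(path, length(from))
IO.inspect(leftover)               # ["a", "b", "c", "d"]
IO.inspect(Path.join(leftover))    # "a/b/c/d"

# The real function arrives at the same answer.
IO.inspect(Path.relative_to("/usr/a/b/c/d", "/usr"))  # "a/b/c/d"
```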

I bet you’re wondering how Path.join works, aren’t you? Sure, but that’s a lesson for another day.

Path.relative_to

So far, we’ve covered two out of the three pattern matches that the three-arity version of Path.relative_to can handle. We’ve seen what it can do when the first item in the path matches (it goes all recursive), and when the path has more items left than the from (time to return results), but what about when neither of those patterns fits? What happens when the two paths are the same, or the ‘from’ path is longer than the path, or the two simply branch apart?

If we’ve come that far, we can’t pull one path out from the other. Think about it: how can we pull a one item value out from a one item value? It won’t work. We default to returning the original path, as was given to us earlier in the process.

  defp relative_to(_, _, original) do
    original
  end

Since one can’t be calculated from the other, we pass back the original value. For example: /usr/a can’t be derived from /usr/b. This function would run when passed the values for ‘a’ and ‘b’. If both paths are the same length, there’s no way one will branch off from the other.
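To see which inputs fall through to that last clause, here is a sketch that copies the three clauses from the source code above into a public function. RelSketch and rel are made-up names, and I’m passing in lists that have already been split.

```elixir
defmodule RelSketch do
  # Hypothetical stand-in for the private three-argument function.
  def rel([h | t1], [h | t2], original), do: rel(t1, t2, original)
  def rel([_ | _] = rest, [], _original), do: Path.join(rest)
  def rel(_, _, original), do: original
end

# Branching paths: heads "a" and "b" differ, so the fallback fires.
IO.inspect(RelSketch.rel(["/", "usr", "a"], ["/", "usr", "b"], "/usr/a"))  # "/usr/a"
# The from path is longer: the path list runs out first.
IO.inspect(RelSketch.rel(["/", "usr"], ["/", "usr", "a"], "/usr"))         # "/usr"
# Identical paths: both lists empty out together.
IO.inspect(RelSketch.rel(["/", "usr"], ["/", "usr"], "/usr"))              # "/usr"
```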

How I Tested This All

Trying to wrap my head around the pattern matching in Path.relative_to/3 meant extracting the code for that function and testing it separately. I created a file named rel.exs and copied the function’s code into it. I did some good old fashioned “IO.puts debugging” on this bad boy:

defmodule Augie do

  def relative_to(path, from) do
    path = IO.chardata_to_string(path)
    relative_to(Path.split(path), Path.split(from), path)
  end

  defp relative_to([h|t1], [h|t2], original) do
    IO.puts("first pm")
    relative_to(t1, t2, original)
  end

  defp relative_to([_|_] = l1, [], _original) do
    IO.puts("second pm")
    Path.join(l1)
  end

  defp relative_to(a, b, original) do
    IO.puts("third pm")
    IO.puts "-- path param: #{a}"
    IO.puts "-- from param: #{b}"
    original
  end

end

I name the module after myself, again, because I’m an egomaniac and I know it won’t have namespace issues with core Elixir. (I doubt a pull request to create an “Augie” module would get past José…) I could have named it “Patha” or something, but I’m pretty fast with typing my own name.

I tested in iex, calling the script as a parameter to iex:

vagrant@precise32:~/augiedb/elixir$ iex rel.exs
Erlang/OTP 18 [erts-7.0] [source] [async-threads:10] [kernel-poll:false]

Interactive Elixir (1.1.0-dev) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)>

And then I ran every possible scenario through it and saw where the debug statements came from:

iex(1)> Augie.relative_to('/opt', '/opt/a')
first pm
first pm
third pm
-- path param:
-- from param: a
"/opt"

iex(2)> Augie.relative_to('/opt/a', '/opt')
first pm
first pm
second pm
"a"

iex(3)> Augie.relative_to('/opt', '/opt')
first pm
first pm
third pm
-- path param:
-- from param:
"/opt"

iex(4)> Augie.relative_to('', '')
third pm
-- path param:
-- from param:
""

iex(5)> Augie.relative_to('', '/opt')
third pm
-- path param:
-- from param: /opt
""
iex(6)> Augie.relative_to('/opt', '')
second pm
"/opt"
iex(7)>

It’s also somewhat magical when you type your name, hit tab, and a function is autocompleted after your name.
