Understanding Ruby's #[]

So I was doing the Ruby koans with a view to deepening my understanding of Ruby when I stumbled upon this:

array = [:peanut, :butter, :and, :jelly]
assert_equal  [], array[4,0]
assert_equal nil, array[5,0]

Given that 4 is just as out of bounds as 5, I couldn’t understand why Ruby would return something different in each case.

So I dug through Ruby’s source code until I eventually found my answer in the definition of rb_ary_subseq().

VALUE
rb_ary_subseq(VALUE ary, long beg, long len)
{
    ...
    long alen = RARRAY_LEN(ary);

    if (beg > alen) return Qnil;
    if (beg < 0 || len < 0) return Qnil;
    
    ...
}

By standard, #[start, length] returns nil if start is greater than the array’s length. That still doesn't explain why or when an empty array is returned; we'll come back to that later.

In our example, the array has size 4, which means that, by standard, it is perfectly normal for array[5,0] to return nil.
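To see exactly where that boundary lies, here is a small sketch that slices zero elements at every start position of the koan's array (the names results, start and slice are just for illustration). If the rule above holds, only a start strictly greater than the length should give nil.

```ruby
array = [:peanut, :butter, :and, :jelly]

# Slice zero elements at every start position from 0 to 6.
results = (0..6).map { |start| [start, array[start, 0]] }

results.each do |start, slice|
  puts "array[#{start},0] # => #{slice.inspect}"
end
```

Starts 0 through 4 print [], while 5 and 6 print nil: start == length is still accepted, start > length is not.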

OK... but « by standard®? »

Yes: you can refer to the Ruby spec at ruby/spec. Anyway, we’re going to explore this in the following sections by manipulating a small array of symbols:

a = %i|how are you doing|

Array slicing in Ruby

In Ruby, the #[] operator is overloaded: you can use it to query a single element from an array using the #[index] form:

a[0] # => :how
a[3] # => :doing
a[4] # => nil
a[5] # => nil
a[6] # => nil

Or you can query as many as you wish in one go, this time using the #[start, length] form. This is called slicing:

a[0,4] # => [:how, :are, :you, :doing]
a[1,3] # => [:are, :you, :doing]
a[2,2] # => [:you, :doing]
a[3,1] # => [:doing]
a[4,0] # => [] <---- WHY?
a[5,0] # => nil
a[6,0] # => nil

There might seem to be an inconsistency between #[start, length] and #[index]... Wait, no: it wouldn't make sense for #[index] to return an array, empty or not. It returns nil because it cannot make up a value for a[4]. Meanwhile, it is perfectly reasonable for #[start, length] to return an empty array, and to return nil only when it legitimately can't make up a value.
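One way to see that the two forms agree with each other is to put a[i] next to a[i, 1] for each index. In this sketch, the in-range indices should give the element wrapped in an array, and the forms should only diverge at i == a.length, where #[index] has nothing to return but #[start, length] can still hand back an empty slice.

```ruby
a = %i|how are you doing|

# Compare the single-element form with a one-element slice at each index.
(0..5).each do |i|
  puts "a[#{i}] # => #{a[i].inspect}    a[#{i},1] # => #{a[i, 1].inspect}"
end
```

For i from 0 to 3, a[i, 1] is [a[i]]; at i = 4 we get nil versus []; at i = 5 both forms give nil.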

Basically, querying zero elements starting right after the last one results in an empty array. The « Why » is purely a matter of taste, since this behavior has nothing to do with the way arrays are implemented: in Ruby’s source code, you’ll find a line which can pretty much be translated to « If we’re starting right after the last element, return Array.new ».
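To make this concrete, here is a Ruby model of rb_ary_subseq(). The method name subseq is hypothetical, and only the two nil guards come from the excerpt quoted at the beginning; the clamping and empty-array steps are my paraphrase of the part that excerpt elides. Treat it as a sketch checked against Array#[], not a faithful port of the C code.

```ruby
# Hypothetical Ruby model of the C function rb_ary_subseq(ary, beg, len).
# Assumption: beg has already been normalized, so negative starts are not handled here.
def subseq(ary, beg, len)
  alen = ary.length
  return nil if beg > alen         # start lies beyond the end: nothing to make up
  return nil if beg < 0 || len < 0 # invalid start or negative length
  len = [len, alen - beg].min      # never read past the last element
  return [] if len.zero?           # e.g. starting right after the last element
  Array.new(len) { |i| ary[beg + i] }
end

a = %i|how are you doing|
p subseq(a, 4, 0) # => []
p subseq(a, 5, 0) # => nil
p subseq(a, 1, 2) # => [:are, :you]
```

The order of the checks matters: start > length is rejected before anything else, so a start of exactly 4 survives long enough to reach the empty-array case.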

You can also refer to the Rubinius implementation of #[].

More troubling, then:

a[4,10] # => []

Shouldn't it raise an IndexError or something? No: under the hood, you can think of length as a maximum. Ruby fetches at most 10 elements here, but never more than are actually available from start, which in this case is zero. So a[4,10] is equivalent to a[4,0].

In the same fashion:

a[0,100] # => [:how, :are, :you, :doing]
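If length really acts as an upper bound, then for every valid start (0 up to a.length), asking for more elements than remain should give the same result as asking for exactly the remaining ones. This sketch checks that claim for a few lengths; the names remaining and clamped are just for illustration.

```ruby
a = %i|how are you doing|

# For every valid start, compare the requested length with the clamped one.
(0..a.length).each do |start|
  remaining = a.length - start
  [0, 1, 10, 100].each do |length|
    clamped = [length, remaining].min
    raise "unexpected result for a[#{start},#{length}]" unless a[start, length] == a[start, clamped]
  end
end

p a[4, 10]  # => []
p a[2, 100] # => [:you, :doing]
```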

Bonus

In rb_ary_subseq() (see the code listing at the beginning), we find the following code:

if (beg < 0 || ...) return Qnil;

Then why does array[-1,0] return [] instead of nil?

The truth is that rb_ary_subseq() isn’t the first function to be called. In our case, it is called by another function, rb_ary_aref2(), which adds array.length to beg whenever beg is negative.

This is what allows us to write array[-1,1] and get an array containing the last element. (In our case, array[-1,1] is equivalent to array[-1+4,1], i.e. array[3,1].)
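Here is a sketch of that normalization step in Ruby. The helper name normalize_start is hypothetical; in C, rb_ary_aref2() does this inline before delegating to rb_ary_subseq(). It also shows a consequence of the guard quoted above: if beg is still negative after adding the length, the result is nil.

```ruby
# Hypothetical helper mirroring what rb_ary_aref2() does before calling
# rb_ary_subseq(): a negative start is counted back from the end of the array.
def normalize_start(ary, beg)
  beg < 0 ? beg + ary.length : beg
end

array = [:peanut, :butter, :and, :jelly]

p normalize_start(array, -1) # => 3
p array[-1, 0]               # => [], same as array[3, 0]
p array[-1, 1]               # => [:jelly], same as array[3, 1]
p normalize_start(array, -5) # => -1
p array[-5, 0]               # => nil, the start is still negative
```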

You can refer to the Rubinius implementation of #[].

Written by Tanguy Andreani
on 18th of July
