The Well Grounded Rubyist - Part 9

Collection and container objects

In Ruby, two built-in classes dominate the container-object landscape:

  • 1) Arrays
  • 2) Hashes

Then there are

  • 3) Range, a hybrid that works partly as a Boolean filter and partly as a collection
  • 4) Set, which is not built in but is a class from the standard library

Comparison: Arrays and hashes

ordered

  • array (yes)
  • hashes (yes, in newer Ruby versions)

  • hashes remember the order in which their keys were inserted. That's the order in which the hash replays itself when you iterate through the pairs in it or print a string representation of it to the screen

  • hashes are sometimes called dictionaries or associative arrays in other languages

  • if you do use consecutive integers as hash keys, arrays and hashes start behaving similarly when you do lookups

array = ["ruby", "diamond", "emerald"]  
hash = { 0 => "ruby", 1 => "diamond", 2 => "emerald" }  
puts array[0]    # => ruby  
puts hash[0]     # => ruby  
  • you can use the with_index method to step through a hash and have its key/value pairs counted off, consecutively
hash = { "red" => "ruby", "white" => "diamond", "green" => "emerald" }  
hash.each.with_index { |(key, value), i| puts "#{i}: #{key} / #{value}" }  

Note:
The parentheses in the block parameters (key, value) serve to split apart an array. Each key/value pair comes at the block as an array of two elements. If the parameters were key, value, i, then the parameter key would end up bound to the entire [key, value] array; value would be bound to the index; and i would be nil. That's obviously not what you want. The parenthetical grouping of (key, value) is a signal that you want the array to be distributed across those two parameters, element by element.
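
The effect of the parentheses can be checked with a small sketch. The names gems, grouped and ungrouped are hypothetical and only serve the illustration.

```ruby
gems = { "red" => "ruby", "white" => "diamond" }

# With parentheses: each [key, value] pair is split across key and value.
grouped = []
gems.each.with_index { |(key, value), i| grouped << [key, value, i] }
# grouped is [["red", "ruby", 0], ["white", "diamond", 1]]

# Without parentheses: key receives the whole pair, value the index, i is nil.
ungrouped = []
gems.each.with_index { |key, value, i| ungrouped << [key, value, i] }
# ungrouped is [[["red", "ruby"], 0, nil], [["white", "diamond"], 1, nil]]
```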

  • conversions of various kinds between arrays and hashes are common

Collection handling with arrays

  • how to insert, retrieve, and remove array elements
  • combining arrays with each other
  • transforming arrays (e.g. flattening a nested array into a one-dimensional array)
  • querying arrays as to their properties and state
Creating a new array
  • you can create an array in 4 ways:
    • Array.new method
    • literal array constructor (square brackets)
    • top-level method called Array
    • special %w{...} and %i{...} notations
Array.new
  • lets you specify the size of the array if you wish
# one argument
Array.new(3) # => [nil, nil, nil]

#two arguments
Array.new(3, "hello") # => ["hello", "hello", "hello"]  
  • you can provide a code block to Array.new. In this case, the array elements are initialised by repeated callbacks to the block
n = 0  
Array.new(3) { n += 1; n * 10 }    # => [10, 20, 30]  
  • when you initialise multiple elements of an array using a second argument to Array.new, as in Array.new(3, "abc"), all the elements of the array are initialised to the same object
  • so if you do this: a = Array.new(3,"abc"); a[0] << "def"; puts a[1], you will find that all the elements are abcdef, even though you appended "def" only to the first element
  • to create an array that puts a different "abc" string into each slot, use Array.new(3) { "abc" } instead
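
A minimal sketch of the shared-object pitfall described above. The variable names shared and distinct are hypothetical.

```ruby
# Second-argument form: every slot refers to the same String object.
shared = Array.new(3, "abc")
shared[0] << "def"
# shared is now ["abcdef", "abcdef", "abcdef"]

# Block form: the block runs once per slot, creating a fresh String each time.
distinct = Array.new(3) { "abc" }
distinct[0] << "def"
# distinct is now ["abcdef", "abc", "abc"]
```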
Literal array constructor
  • pre-initialising arrays isn't always necessary because your arrays grow as you add elements to them
  • you can create an array by using the literal array constructor [] (square brackets)
a = []  
  • when you create an array with the literal constructor, you can put objects into the array at the same time:
a = [ 1, 2, "three", 4, [] ]  
  • square brackets can mean a lot of things in Ruby:
    • array construction
    • array indexing (as well as string and hash indexing)
    • character classes in regular expressions
    • delimiters in %q[]-style string notation
    • calling of an anonymous function
The Array method
  • The Array method creates an array from its single argument
  • if the argument object has a to_ary method, then Array calls that method on the object to generate an array
  • if there's no to_ary method, it tries to call to_a
  • if to_a isn't defined either, Array wraps the object in an array and returns that

  • a real conversion by the Array method therefore depends on a to_ary or to_a method being available; without one, all you get is the one-element wrapper
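
The three cases can be seen side by side in a short sketch (the variable names are hypothetical):

```ruby
from_array  = Array([1, 2])      # already an array       => [1, 2]
from_range  = Array(1..3)        # Range responds to to_a => [1, 2, 3]
from_hash   = Array({ a: 1 })    # Hash responds to to_a  => [[:a, 1]]
from_string = Array("hello")     # neither method: the object is wrapped => ["hello"]
from_nil    = Array(nil)         # nil.to_a is []         => []
```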

%w and %W array constructors
  • the %w operator automatically generates an array of strings from the space-separated strings you put inside it

  • if any string in the list contains a whitespace character, you need to escape it with a backslash

  • when using %w, the strings in the list are parsed as single-quoted strings
  • to use double-quoted strings, use %W instead
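
A brief sketch of the difference between %w and %W. The city names and the variable n are made up for the illustration.

```ruby
cities  = %w[Boston Chicago Denver]   # => ["Boston", "Chicago", "Denver"]
escaped = %w[New\ York Boston]        # => ["New York", "Boston"]

n = 3
single = %w[#{n}rd place]   # single-quoted rules: no interpolation, first element is '#{n}rd'
double = %W[#{n}rd place]   # double-quoted rules: => ["3rd", "place"]
```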
%i and %I array constructors
  • you can use %i and %I to create arrays of symbols
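
As with %w and %W, the uppercase form allows interpolation. A small sketch, with hypothetical names:

```ruby
plain = %i[a b c]                     # => [:a, :b, :c]

name = "ruby"
interpolated = %I[#{name}_gem other]  # => [:ruby_gem, :other]
```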
The try_convert family of methods
  • several built-in Ruby classes each have a class method called try_convert
  • try_convert takes one argument
  • try_convert looks for a conversion method on the argument object. If the method exists, it gets called; if not, try_convert returns nil
  • if the conversion method returns an object of a class other than the class to which conversion is being attempted, it's a fatal error (TypeError)
implementing class | required conversion method
Array | to_ary
Hash | to_hash
IO | to_io
Regexp | to_regexp
String | to_str
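
A minimal sketch of the three outcomes of Array.try_convert. The classes Deck and Broken are hypothetical and exist only for this illustration.

```ruby
# Hypothetical class with a well-behaved to_ary.
class Deck
  def initialize(cards)
    @cards = cards
  end

  def to_ary
    @cards
  end
end

# Hypothetical class whose to_ary returns the wrong kind of object.
class Broken
  def to_ary
    "not an array"
  end
end

converted = Array.try_convert(Deck.new([:ace, :king]))  # => [:ace, :king]
no_method = Array.try_convert("abc")                    # String has no to_ary => nil

wrong_type = begin
  Array.try_convert(Broken.new)
rescue TypeError => e
  e.class                                               # => TypeError
end
```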
Inserting, retrieving, and removing array elements
  • the general technique for inserting one or more items into an array is the setter method []=
  • the setter method looks odd, but syntactic sugar allows you to write a[0] = "first" instead of the literal method call a.[]=(0, "first")
  • when you have objects in an array, you can retrieve them with the [] method, which is the getter equivalent of the []= setter method
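
A short sketch of the setter and getter in both their sugared and explicit forms (the array a and its contents are illustrative):

```ruby
a = []
a[0] = "first"          # sugared call to the []= method
a.[]=(1, "second")      # the same method, called explicitly
# a is now ["first", "second"]

second = a[1]           # sugared call to the [] method => "second"
first  = a.[](0)        # explicit form                 => "first"
```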
Setting or getting more than one array element at a time
  • if you give either Array#[] or Array#[]= a second argument, it is treated as a length, i.e. a number of elements to set or retrieve
  • when retrieving more than one element, the results are returned inside a new array
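
A sketch of the start-plus-length form, using a made-up array of colour names:

```ruby
colors = %w[red orange yellow purple gray indigo violet]

pair = colors[3, 2]               # two elements starting at index 3 => ["purple", "gray"]

colors[3, 2] = "green", "blue"    # replace those two elements in place
# colors is now ["red", "orange", "yellow", "green", "blue", "indigo", "violet"]
```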

  • there is a synonym for the [] method called slice
  • like the [] method, slice takes two arguments: (1) a starting index and (2) an optional length
  • there is also a corresponding bang method called slice!, which removes the sliced items permanently from the array

  • another method for extracting multiple array elements is the values_at method
  • values_at takes one or more arguments representing indexes and returns an array consisting of the values stored at those indexes in the receiver array
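
The following sketch contrasts slice, slice! and values_at on a hypothetical array of letters:

```ruby
letters = %w[a b c d e f]

copy    = letters.slice(1, 2)     # non-destructive           => ["b", "c"]
removed = letters.slice!(1, 2)    # removes what it returns   => ["b", "c"]
# letters is now ["a", "d", "e", "f"]

picked = letters.values_at(0, 2, 3)   # => ["a", "e", "f"]
```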


Special methods for manipulating the beginnings and ends of arrays
  • you can use unshift to add an object to the beginning of an array
  • to add an element to the end of an array, you can use push
  • the method << also places an object on the end of the array
  • the difference between push and << is that push can take more than one argument

  • the shift method works opposite to unshift. shift removes one object from the beginning of the array

  • shift can remove more than one item at a time. You can do this by passing an integer representing how many elements to remove
  • it permanently changes the original (receiver) array

  • the pop method works opposite to push. pop removes one object from the end of the array

  • pop can remove more than one item at a time. You can do this by passing an integer representing how many elements to remove
  • it permanently changes the original (receiver) array
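
All four methods, plus <<, can be followed in one short sketch. The comments track the state of the hypothetical array a after each call.

```ruby
a = [1, 2, 3]
a.unshift(0)         # add to the front        => [0, 1, 2, 3]
a.push(4, 5)         # push accepts several    => [0, 1, 2, 3, 4, 5]
a << 6               # << takes one object     => [0, 1, 2, 3, 4, 5, 6]

first = a.shift      # => 0       (a is now [1, 2, 3, 4, 5, 6])
front = a.shift(2)   # => [1, 2]  (a is now [3, 4, 5, 6])
last  = a.pop        # => 6       (a is now [3, 4, 5])
back  = a.pop(2)     # => [4, 5]  (a is now [3])
```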
Combining arrays with other arrays
  • remember that in every case, even though you're dealing with two (or more) arrays, one array is always the receiver. The other arrays involved in the operation are arguments to the method

  • to add contents of one array to another, you can use concat

  • concat permanently changes the contents of the receiver

  • concat differs in an important way from push

  • push will push the second array as a single element, not the elements within the array
  • if you substitute push for concat, the receiver ends up with the whole second array nested inside it as its final element, e.g. [1, 2, 3, [4, 5, 6]]

  • if you want to combine two arrays into a third, you can use the + operator

  • the receiver, a, will not be changed by the +
  • you can replace contents of an array using replace
  • the original contents of the receiver array object will be replaced with the contents of the argument array
  • the receiver array object is still the same object
  • replace is NOT the same as reassignment of a variable. Reassignment causes a variable to refer to a completely different array object than the first one
  • when a variable is reassigned, any bond it shared with other variables referring to the same array is broken, because the variable now points to a completely different array object
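
A sketch comparing concat, push, + and replace. All variable names (a, b, c, d, original, other_ref) are hypothetical.

```ruby
a = [1, 2, 3]
a.concat([4, 5, 6])     # a is changed: [1, 2, 3, 4, 5, 6]

b = [1, 2, 3]
b.push([4, 5, 6])       # the argument becomes one element: [1, 2, 3, [4, 5, 6]]

c = [1, 2, 3]
d = c + [4, 5, 6]       # d is a new array; c is still [1, 2, 3]

original  = [1, 2, 3]
other_ref = original            # two variables, one array object
original.replace([4, 5, 6])     # same object, new contents
# other_ref is now [4, 5, 6] as well

original = [7, 8, 9]            # reassignment: original points at a new object
# other_ref is still [4, 5, 6]
```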
Array transformations
  • a useful transformation is flatten, which does an un-nesting of inner arrays
  • you can specify how many levels of flattening you want, the default being the full un-nesting
  • flatten doesn't change the receiver array
  • there is also an in-place flatten!, which permanently changes the receiver array


  • the reverse transformation method reverses the elements of an array, much like the String#reverse does for characters in a string
  • reverse doesn't change the receiver array
  • there is also a corresponding bang method, reverse!, that permanently changes the receiver array

  • the join transformation method is a common way to turn an array into a string

  • that is to say, the return value of the join method is not an array, but a string
  • join takes an optional argument; if given, it is placed between each pair of elements as the delimiter or separator
  • another way of joining arrays is by using the * method
  • it looks like multiplying an array by a string, but you're actually performing a join

  • you can also transform an array with uniq

  • the uniq transformation method gives you a new array, consisting of the elements of the original with all duplicates removed
  • duplicate status is determined by comparing elements using their hash and eql? methods
  • for most objects this agrees with ==, but not always: 1 == 1.0 is true, yet [1, 1.0].uniq keeps both elements because 1.eql?(1.0) is false
  • the uniq method also has a corresponding bang equivalent, uniq!, which permanently removes duplicates from the original receiver array

  • sometimes you have an array that includes occurrences of nil. You can use the compact transformation method to get rid of them
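
The transformations in this section can be tried out in one sketch. The sample data is made up for the illustration.

```ruby
nested = [1, 2, [3, 4, [5]], [6, [7, 8]]]
flat      = nested.flatten       # => [1, 2, 3, 4, 5, 6, 7, 8]
one_level = nested.flatten(1)    # => [1, 2, 3, 4, [5], 6, [7, 8]]
# nested itself is unchanged; flatten! would change it in place

backwards = [1, 2, 3, 4].reverse         # => [4, 3, 2, 1]

parts  = %w[abc def 123]
joined = parts.join                      # => "abcdef123"
comma  = parts.join(", ")                # => "abc, def, 123"
dashed = parts * "-"                     # same as join("-") => "abc-def-123"

unique    = [1, 2, 3, 1, 4, 3, 5, 1].uniq                # => [1, 2, 3, 4, 5]
zip_codes = ["06511", "08902", nil, "10027", nil].compact
# => ["06511", "08902", "10027"]
```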

Array querying
  • you can ask arrays for information about themselves, such as:
method name/sample call | meaning
a.size (synonym: length) | Number of elements in the array
a.empty? | True if a is an empty array; false if it has any elements
a.include?(item) | True if the array includes item; false otherwise
a.count(item) | Number of occurrences of item in the array
a.first(n=1) | First n elements of the array
a.last(n=1) | Last n elements of the array
a.sample(n=1) | n random elements from the array

Hashes

  • hashes let you perform lookup operations based on keys
  • you can also perform more complex filtering and selection operations
  • hashes remember the insertion order of their keys, and observe that order when you iterate over them or examine them
Creating a new hash
  • there are 4 ways to create a hash
    • 1) with the literal constructor (curly braces)
    • 2) with the Hash.new method
    • 3) with the Hash.[] method (a square-bracket class method of Hash)
    • 4) with the top-level method whose name is Hash
Creating a literal hash
  • the literal hash constructor is convenient when you have values you wish to hash that aren't going to change. State abbreviations are a good example
The Hash.new constructor
  • Hash.new creates an empty hash
  • if you provide an argument to Hash.new, it is treated as the default value for nonexistent hash keys
The Hash.[] class method
  • the class method [] takes a comma-separated list of items and, assuming there's an even number of arguments, treats them as alternating keys and values
  • if you provide an odd number of arguments, a fatal error is raised
  • you can also pass in an array of arrays, where each subarray consists of two elements (a key and a value)
The top-level Hash method
  • if you call the Hash method with an empty array ([]) or nil, it returns an empty hash. Otherwise, it calls to_hash on its single argument
  • if the argument doesn't have a to_hash method, a fatal error (TypeError) is raised
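
The four ways of creating a hash can be sketched as follows. The state abbreviations and variable names are illustrative only.

```ruby
# 1) literal constructor
literal = { "Connecticut" => "CT", "Delaware" => "DE" }

# 2) Hash.new, without and with a default value
empty  = Hash.new
counts = Hash.new(0)
default_value = counts["missing"]       # => 0 (the key is not added)

# 3) Hash.[] with alternating keys and values, or with an array of pairs
alternating = Hash["Connecticut", "CT", "Delaware", "DE"]
from_pairs  = Hash[[[1, 2], [3, 4]]]    # keys 1 and 3, values 2 and 4

odd_count = begin
  Hash[1, 2, 3]                         # odd number of arguments
rescue ArgumentError => e
  e.class
end

# 4) the top-level Hash method
from_empty_array = Hash([])             # => {}
from_nil         = Hash(nil)            # => {}

no_to_hash = begin
  Hash(5)                               # Integer has no to_hash
rescue TypeError => e
  e.class
end
```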
Inserting, retrieving, and removing hash pairs
Adding a key/value pair to a hash
  • you can use the []= method plus syntactic sugar to add a key/value pair to a hash, much like adding an element to an array
  • so instead of writing the literal method call hash.[]=("key", "value"), you can just write hash["key"] = "value"
  • you can also use the synonymous method store to perform this operation
  • store takes two arguments (a key and a value)
  • although hash values do not have to be unique (you can assign the same value to two or more keys), you cannot have duplicate keys
  • if you add a key/value pair to a hash that already has an entry for the key you're adding, the old entry is overwritten
  • if you reassign to a given hash key, that key still maintains its place in the insertion order of the hash
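
A small sketch of adding and overwriting pairs, using a hypothetical states hash:

```ruby
states = {}
states["New York"] = "NY"          # sugared []=
states.[]=("Ohio", "OH")           # the same method, called explicitly
states.store("Texas", "TX")        # store is a synonym for []=

states["New York"] = "N.Y."        # overwrites the old value

order = states.keys                # "New York" keeps its original position
# => ["New York", "Ohio", "Texas"]
```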
Retrieving values from a hash
  • the most common way to retrieve hash values is the [] method
  • using a hash key is much like indexing an array, except that the index (the key) can be anything, whereas in an array, it's always an integer
  • hashes also have a fetch method, which gives you an alternative way of retrieving values by key
  • fetch differs from [] in the way it behaves when you ask it to look up a nonexistent key. fetch raises an exception, whereas [] gives you either nil or a default you've specified

  • if you provide a second argument to fetch, that argument will be returned instead of an exception being raised if the key isn't found

  • you can retrieve multiple values with one operation, using values_at
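
A sketch of the three retrieval methods on a hypothetical states hash:

```ruby
states = { "New York" => "NY", "Ohio" => "OH", "Texas" => "TX" }

found   = states["Ohio"]               # => "OH"
missing = states["Nebraska"]           # => nil (no default was set)

fetched = states.fetch("Ohio")         # => "OH"
fetch_error = begin
  states.fetch("Nebraska")             # unknown key: fetch raises
rescue KeyError => e
  e.class
end

fallback = states.fetch("Nebraska", "Unknown state")   # => "Unknown state"
several  = states.values_at("New York", "Texas")       # => ["NY", "TX"]
```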

Specifying default hash values and behaviour
  • by default, when you ask a hash for a value corresponding to a nonexistent key, you get nil

  • but you can specify a different value by supplying an argument to Hash.new

  • you can also set the default value on an already existing hash with the default= method
  • keys aren't automatically added to a hash when you try to look them up. If you want a nonexistent key to be added to a hash, you have to actually put it in there
  • but if you want references to nonexistent keys to cause the keys to come into existence, you can do so by supplying a code block to Hash.new, e.g. h = Hash.new { |hash, key| hash[key] = 0 }
  • when the hash, h, is asked to retrieve the value of a key it doesn't have, the block is executed with hash set to the hash itself and key set to the nonexistent key. And thanks to the code in the block, the key is added to the hash after all, with the value of 0
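
The different kinds of default can be compared in one sketch (variable names are hypothetical):

```ruby
plain = Hash.new
no_default = plain["x"]            # => nil

counts = Hash.new(0)
zero = counts["x"]                 # => 0
added = counts.key?("x")           # => false: looking up does not add the key

existing = { "a" => 1 }
existing.default = "n/a"           # set a default on an existing hash
later = existing["zzz"]            # => "n/a"

auto = Hash.new { |hash, key| hash[key] = 0 }
value = auto["new key"]            # the block runs and stores the key => 0
stored = auto.key?("new key")      # => true
```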
Combining hashes with other hashes
  • there are two ways to combine hashes together

    • the contents of the second hash are added directly to the first (receiver) hash, changing it (destructive)
    • a third hash is created with the combined contents of the two hashes being added (non-destructive)
  • the destructive operation is performed with the update method

  • the non-destructive operation is performed with the merge method, which returns a third hash and leaves the originals unchanged

  • when the two hashes being merged share a key, the second hash wins

  • the bang version, merge! is a synonym for update
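
A sketch of merge versus update, with two made-up hashes that share the key "Smith":

```ruby
h1 = { "Smith" => "John", "Jones" => "Jane" }
h2 = { "Smith" => "Jim" }

merged = h1.merge(h2)      # new hash; for the shared key, h2's value wins
# merged maps "Smith" to "Jim" and "Jones" to "Jane"; h1 is unchanged so far
before = h1["Smith"]       # => "John"

h1.update(h2)              # destructive: h1 itself is changed
after = h1["Smith"]        # => "Jim"
```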

Hash transformations
  • you can perform several transformations on hashes
  • transformation in this context means the method is called on a hash and the result of the operation is a hash

  • to retrieve a subhash from an existing hash, use the select method

  • key/value pairs will be passed in succession to the code block you provide; any pair for which the block returns true will be included in the returned hash

  • reject works in the opposite way to select: it returns the pairs for which the block does not return true

  • both reject and select have in-place equivalents (bang methods). reject! and select! return nil if the hash doesn't change
  • to do an in-place operation that returns your original hash (even if it's unchanged), you can use keep_if and delete_if

  • Hash#invert flips the keys and the values. So values become keys, and keys become values

  • be careful when you invert hashes: hash keys are unique, but values aren't, so when you turn duplicate values into keys, one of the pairs is discarded
  • you should invert a hash only if you're certain that the values are unique as well as the keys

  • Hash#clear is an in-place operation that empties a hash

  • the empty hash is the same hash (the same object) as the one to which you send the clear message
  • like strings and arrays, hashes have a replace method
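
The hash transformations above can be sketched together. The sample hashes and variable names are hypothetical.

```ruby
h = { 1 => 2, 3 => 4, 5 => 6 }

selected = h.select { |key, value| key > 1 }   # keeps the pairs 3 => 4 and 5 => 6
rejected = h.reject { |key, value| key > 1 }   # keeps the pair 1 => 2

unchanged = h.select! { |key, value| true }    # nothing removed => nil
same_hash = h.keep_if { |key, value| true }    # always returns the hash itself

inverted = { 1 => "one", 2 => "two" }.invert   # values become keys
lossy    = { 1 => "one", 2 => "many", 3 => "many" }.invert
# the duplicate value "many" can only be a key once, so one pair is lost

id_before = h.object_id
h.clear                                        # empties h in place
same_object = (h.object_id == id_before)       # => true

h.replace({ 10 => "ten" })                     # new contents, same object
```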
Hash querying
  • you can ask hashes for information about themselves, such as:
method name/sample call | meaning
h.has_key?(1) | True if h has the key 1
h.include?(1) | Synonym for has_key?
h.key?(1) | Synonym for has_key?
h.member?(1) | Synonym for has_key?
h.has_value?("three") | True if any value in h is "three"
h.value?("three") | Synonym for has_value?
h.empty? | True if h has no key/value pairs
h.size | Number of key/value pairs in h
Hashes as final method arguments
  • if you call a method in such a way that the last argument in the argument list is a hash, Ruby allows you to write the hash without curly braces
  • in such a call, the first argument might be the name of a city and the second a hash of data about the city, written without curly braces (using the special key: value symbol notation)
  • if the hash is an argument at any other position than the last position, you will have to use the curly braces
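
A minimal sketch of both situations. The methods add_to_city_database and describe_city, and the data passed to them, are hypothetical.

```ruby
# Hypothetical method whose last parameter is a hash of city data.
def add_to_city_database(name, info)
  "#{name}: #{info[:state]}, population #{info[:population]}"
end

# The hash is the final argument, so the curly braces can be left off.
last_position = add_to_city_database("New York City",
                                     state: "New York", population: 7_000_000)

# Hypothetical method that takes the hash first.
def describe_city(info, name)
  "#{name} is in #{info[:state]}"
end

# The hash is not the final argument, so the curly braces are required.
first_position = describe_city({ state: "New York" }, "New York City")
```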
Named (keyword) arguments
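  • consider a method defined as def m(x, y, *z, a: 1, b:, **c, &block) and called with five positional arguments (1 to 5) and the keyword arguments b, p and q. That method: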
  • Takes two required positional arguments (x and y, bound to 1 and 2)
  • Has a “sponge” parameter (z) that takes care of extra arguments following the positional ones (3, 4, 5)
  • Has one optional and one required keyword argument (a and b, respectively, bound to 1 and 10)
  • Has a keyword “sponge” (c) to absorb unknown named arguments (the p and q hash)
  • Has a variable for binding to the code block, if any (block)
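
A sketch of a method that matches the description above. The method name m and the values given to p and q (20 and 30) are assumptions made for the illustration.

```ruby
# Combines required, sponge, keyword, keyword-sponge and block parameters.
def m(x, y, *z, a: 1, b:, **c, &block)
  [x, y, z, a, b, c]
end

result = m(1, 2, 3, 4, 5, b: 10, p: 20, q: 30)
x, y, z, a, b, c = result
# x is 1, y is 2, z is [3, 4, 5]
# a is 1 (its default), b is 10
# c is a hash holding the unknown keywords p and q
```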

Ranges

  • a range is an object with a start point and an end point
  • the semantics of range operations involve two major concepts:

    • inclusion: does a given value fall inside the range
    • enumeration: the range is treated as a traversable collection of individual items
  • the logic of inclusion applies to all ranges; you can always test for inclusion

  • the logic of enumeration applies only to ranges that include a finite number of discrete, identifiable values, i.e. you cannot iterate over a range that lives between two floating-point numbers, but you can iterate over a range of integers
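
The two concepts can be told apart with a short sketch (variable names are hypothetical):

```ruby
# Inclusion works for any range, including one between two floats.
inside = (1.0..2.0).include?(1.5)     # => true

# Enumeration works for a range of integers...
ints = (1..5).to_a                    # => [1, 2, 3, 4, 5]

# ...but not for a range of floats.
float_error = begin
  (1.0..2.0).each { |x| x }
rescue TypeError => e
  e.class                             # => TypeError
end
```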
Creating a range
  • you can create a range using the Range.new constructor
  • you can also use the literal syntax to create a range: two values separated by two or three dots
  • a range using two dots is an inclusive range
  • a range with three dots is an exclusive range
  • when creating a range using Range.new, the default is an inclusive range, but you can force an exclusive range by passing a third argument of true to the constructor
  • a good way to remember which is which: picture three empty slots after the first number. With two dots, the last number fills the last slot, so it is included. With three dots, the dots occupy all the slots and the last number is pushed off the edge, so it is excluded
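
A sketch of the constructor and literal forms side by side (the names r1 to r4 are hypothetical):

```ruby
r1 = Range.new(1, 100)          # inclusive by default
r2 = 1..100                     # literal, inclusive
r3 = 1...100                    # literal, exclusive
r4 = Range.new(1, 100, true)    # third argument true forces an exclusive range

same_inclusive = (r1 == r2)     # => true
same_exclusive = (r3 == r4)     # => true

inclusive_last = r2.to_a.last   # => 100
exclusive_last = r3.to_a.last   # => 99
```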
Range-inclusion logic
  • ranges have begin and end methods, which report back their starting and ending points
  • a range also knows whether it's an exclusive (three dots) range, via the exclude_end? method

  • two methods are available for testing inclusion of a value in a range: cover? and include? (which is also aliased as member?)
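
A sketch of these query methods. It also shows one case where cover? and include? disagree: cover? only compares the value against the two endpoints, whereas include? on a range of strings treats the range as a collection of items. The variable names are hypothetical.

```ruby
r = 1..10
start  = r.begin                        # => 1
finish = r.end                          # => 10
closed = r.exclude_end?                 # => false
open   = (1...10).exclude_end?          # => true

letters = "a".."z"
covered  = letters.cover?("abc")        # "a" <= "abc" <= "z"      => true
included = letters.include?("abc")      # "abc" is not an item     => false
member   = letters.member?("m")         # member? is an alias      => true
```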

Backward ranges
  • when the start number of a range of positive numbers is larger than the end number, it doesn't work how you'd expect
  • the inclusion test calculates whether the candidate for inclusion is greater than the start number and less than the end point
  • for example, 50 is not greater than 100, nor is it less than 1, so the test fails silently; this is a logic error
  • you can use backward ranges as index arguments to strings and arrays, though
  • these typically take the form of a positive start point and a negative end point, with the negative end point counting in from the right
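
A sketch of both behaviours, using made-up sample data:

```ruby
backward = 100..1
found = backward.include?(50)      # => false, with no error or warning
items = backward.to_a              # => [] (nothing to enumerate)

# As index arguments, a negative end point counts in from the right.
text  = "hello world"[3..-3]                            # => "lo wor"
words = %w[one two three four five six][1..-2]
# => ["two", "three", "four", "five"]
```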

Sets

  • Set is not a Ruby core class; it is a standard library class, which means that to use it, you have to require it
  • a set is a unique collection of objects. The objects can be anything - strings, integers, arrays, other sets - but no object can occur more than once in a set
  • uniqueness is also enforced at the commonsense content level: if a set contains the string "hello", you can't add the string "hello" to it, even though the two strings may technically be different objects. The same is true of arrays with equivalent content
  • internally, a set uses a hash, and if you recall, hash keys must be unique
Set creation
  • you can use the Set.new constructor to create a set
  • you can create an empty set, or you can pass in a collection object (i.e. an object that responds to each or each_entry)
  • there is no literal set constructor: Set is not part of core Ruby, so the core syntax of the language is already in place before the set library gets loaded
  • you can provide a block to the constructor, in which case every item in the collection is passed through the block before being added to the set
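
A sketch of the three ways of calling Set.new. The names and sample data are hypothetical.

```ruby
require "set"

empty      = Set.new
from_array = Set.new([1, 2, 2, 3])     # the duplicate 2 is stored once

names = ["David", "Yukihiro", "Chad", "Amy"]
upper = Set.new(names) { |name| name.upcase }   # each item goes through the block first
```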
Manipulating set elements
  • to add a single object to a set, you can use the << operator/method to append the element
  • the << method is also available as add
  • there is also an add? method, which returns nil (instead of the set itself) if the set is unchanged after the operation, so it can be used in conditionals
  • if you try to add an object that's already in the set (or an object that's content-equal), nothing happens
  • to remove an object, use delete
  • if you try to delete an object that isn't in the set, it doesn't raise an error; nothing happens
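
A sketch of adding and deleting set elements, with a hypothetical set of state names:

```ruby
require "set"

tri_state = Set.new(["New Jersey", "New York"])
tri_state << "Connecticut"                 # append with <<
tri_state.add("Delaware")                  # add is the same operation
tri_state << "Connecticut"                 # already present: nothing happens

duplicate = tri_state.add?("New York")     # unchanged set => nil
fresh     = tri_state.add?("Vermont")      # changed set   => the set itself

tri_state.delete("Delaware")               # removes the element
tri_state.delete("Ohio")                   # not in the set: no error, no change
```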
Set intersection, union, and difference
  • Set comes with necessary methods to perform intersection, union and difference

    • intersection, aliased as &
    • union, aliased as + and |
    • difference, aliased as -
  • each of the above methods returns a new set consisting of the original set, plus or minus the appropriate elements from the object provided as the method argument

  • the original set is unaffected

  • there is also an exclusive-or operator, ^, which you can use to take the exclusive union between a set and an enumerable, that is, a set consisting of all elements that occur in either the set or the enumerable but not both
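
The operators can be compared on two small, made-up sets:

```ruby
require "set"

a = Set.new([1, 2, 3])
b = Set.new([3, 4, 5])

common    = a & b          # intersection => Set[3]
combined  = a | b          # union        => Set[1, 2, 3, 4, 5]
also_both = a + b          # + is another alias for union
only_a    = a - b          # difference   => Set[1, 2]
either    = a ^ b          # exclusive or => Set[1, 2, 4, 5]
with_enum = a ^ [3, 4]     # the argument can be any enumerable => Set[1, 2, 4]
# a and b themselves are unchanged
```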

Merging a collection into another set
  • the merge method can take, as its argument, any object that responds to each or each_entry
  • that includes arrays, hashes, ranges and other sets
  • merging a hash into a set results in the addition of two-element key/value arrays to the set
  • if you provide a hash argument to Set.new, the behaviour is the same: you get a new set with two-element arrays based on the hash
  • to merge just the keys of a hash, rather than the entire hash, pass the result of calling keys on the hash to merge
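
A sketch of merge with different kinds of argument. The sample data and variable names are hypothetical.

```ruby
require "set"

numbers = Set.new([1, 2, 3])
numbers.merge([3, 4, 5])       # an array; the duplicate 3 is ignored
numbers.merge(6..7)            # a range
# numbers now holds the integers 1 through 7

abbreviations = { "NJ" => "New Jersey", "NY" => "New York" }

pairs     = Set.new.merge(abbreviations)        # two-element [key, value] arrays
from_hash = Set.new(abbreviations)              # same result via the constructor
keys_only = Set.new.merge(abbreviations.keys)   # just the keys
```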
Subsets and supersets
  • you can test for subset/superset relationships between sets using subset? and superset?

  • proper_subset? and proper_superset? methods are also available

  • a proper subset is a subset that is smaller than the parent set by at least one element
  • if the two sets are equal, they are subsets of each other but not proper subsets
  • a proper superset of a set is a second set that contains all the elements of the first set plus at least one element not present in the first set
  • the proper concept is a way of filtering out the case where a set is a superset or a subset of itself - because all sets are both
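
A sketch of the plain and proper tests on two hypothetical sets:

```ruby
require "set"

small = Set.new([1, 2])
big   = Set.new([1, 2, 3])

sub        = small.subset?(big)            # => true
sup        = big.superset?(small)          # => true
proper_sub = small.proper_subset?(big)     # => true
proper_sup = big.proper_superset?(small)   # => true

# A set is a subset of itself, but never a proper subset of itself.
self_sub        = big.subset?(big)         # => true
self_proper_sub = big.proper_subset?(big)  # => false
```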