The Well Grounded Rubyist - Part 8

Strings, symbols, and other scalar objects

The term scalar means one-dimensional. Here it refers to objects that represent single values, as opposed to collections or container objects, which hold multiple values.

Strings

  • In Ruby, the String and Symbol classes provide functionality for text representation and manipulation
  • strings are the standard way to represent bodies of text of arbitrary content and length
String notation
  • double quotes around a string mean the string can be interpolated (#{...} expressions inside it are evaluated)
  • single quotes around a string disable interpolation
Other quoting mechanisms
  • The alternate quoting mechanisms take the form %char{text}, where
    • char is one of several special characters and
    • the curly braces stand in for a delimiter of your choosing
    • Example: puts %q{Needn't escape apostrophes}
      • The single-quote character isn't being used as the delimiter, so the apostrophe needn't be escaped
  • %Q{} generates a double-quoted string. Actually, %{} also generates a double-quoted string
  • the delimiter can be just about anything, as long as the opening delimiter matches the closing one; for instance, { and } are a matching pair
  • you can't use alphanumeric characters as delimiters, but you can use a space
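A minimal sketch of the alternate quoting mechanisms above, assuming a throwaway variable called name purely for illustration. It shows that %q behaves like single quotes while %Q and bare % interpolate:

```ruby
name = "Ruby"

# %q works like single quotes: no interpolation, apostrophes need no escaping
single = %q(Needn't escape apostrophes in #{name})
# %Q and bare % work like double quotes: #{...} is interpolated
double = %Q(Hello from #{name})
bare   = %(Hello again from #{name})
# Non-bracket delimiters use the same character to open and close
pipes  = %q|Also fine: "quotes" and 'apostrophes'|

puts single  # prints: Needn't escape apostrophes in #{name}
puts double  # prints: Hello from Ruby
puts bare    # prints: Hello again from Ruby
puts pipes   # prints: Also fine: "quotes" and 'apostrophes'
```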
"here" documents
  • a "here" document or here-doc is usually a multiline string that often takes the form of a template or a set of data lines
  • it is said to be "here" because it's physically present in the program file, not read in from a separate text file
  • here-docs come into being through the << operator
  • Example:
text = <<EOM  
This is the first line of text.  
This is the second line.  
Now we're done.  
EOM  
  • The expression <<EOM means the text that follows, up to but not including the next occurrence of "EOM"
  • the delimiter can be any string, EOM is just a common choice
  • The closing delimiter has to be flush-left, and it has to be the only thing on the line where it occurs
  • You can switch off the flush-left requirement by putting a hyphen between the << operator and the delimiter: text = <<-EOM
  • by default, here-docs are read in as double-quoted strings
  • to get single-quoted behavior, put the delimiter in single quotes when you start the document:
text = <<-'EOM'  
Single-quoted!  
Note the literal \n.  
And the literal #{2+2}.  
EOM  
  • you can even use a here-doc in a literal object constructor. For instance, you can use a here-doc as an array element
  • you can also use the <<EOM notation as a method argument
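The two uses just mentioned can be sketched as follows. The shout method is a hypothetical helper that exists only for this illustration. Note that the here-doc body starts on the line after <<EOM, while the rest of that line is parsed normally:

```ruby
# A here-doc as an array element: the body starts on the next line,
# while the rest of the current line (", 4]") is parsed normally
array = [1, 2, 3, <<EOM, 4]
This is the here-doc!
It becomes array[3].
EOM

# A here-doc as a method argument (shout is a hypothetical helper)
def shout(text)
  text.upcase
end

result = shout(<<EOM)
quiet words
EOM

p array   # => [1, 2, 3, "This is the here-doc!\nIt becomes array[3].\n", 4]
p result  # => "QUIET WORDS\n"
```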
Getting/setting substrings
  • to retrieve the nth character in a string, you can use the [] operator/method
  • Example:
string = "Ruby so cool, like"  
string[5] # => "s"  
  • negative numbers index from the end of the string. So the last character can be found like so:
string = "Ruby so cool, like"  
string[-1] # => "e"  
  • you can use a range object as the argument
string = "Ruby so cool, like"  
# including the end character in the range
string[0..3] # => "Ruby"  
# not including the end character in the range
string[0...3] # => "Rub"  
  • Index logic only goes from left to right. So you can use negative numbers (which count from the end of the string), but the second index has to be closer to the end of the string than the first index
string = "Ruby so cool, like"  
string[-4..18] # => "like"  
  • you can get a substring based on an explicit substring search. If it is found, it is returned. If not found, the return value is nil
string = "Ruby so cool, like"  
string["Ruby"] # => "Ruby"  
string["JavaScript"] # => nil  
  • it is also possible to search for a pattern match using the [] technique with a regular expression
string = "Ruby so cool, like"  
string[/c[ol]+/] # => "cool"  
  • the [] method is also available under the name slice. The receiver-changing version, slice! removes the characters from the string permanently
string = "Ruby so cool, like"  
string.slice("Ruby") # => "Ruby"  
string # => "Ruby so cool, like"  
string.slice!(", like") # => ", like"  
string # => "Ruby so cool"  
  • to set part of the string to a new value, use the []= method
string = "Ruby so cool, like."  
string["cool"] = "great" # => "great"  
string # => "Ruby so great, like."  
string[-1] = "!"  
string # => "Ruby so great, like!"  
Combining strings
  • the methods for combining strings differ as to whether the operation changes the receiver
  • this means:
    • whether the second string is permanently added to the first, or
    • whether a new, third string is created out of the first two
  • the + method/operator creates a new string consisting of two or more strings. Example: "a" + "b" #=> "ab"
    • or with variables:
s1 = "hello"  
s2 = "world"

s1 + s2 # => "helloworld"  
  • to add a second string permanently to an existing string, use the << method, which is normally written in its syntactic-sugar, operator-like form:
str = "Hello "  
str << "World"  
str # => "Hello World"  
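A short sketch contrasting the two approaches: + leaves its receiver alone and returns a new object, while << changes the receiver and returns it, which is why << calls can be chained:

```ruby
s1 = "hello"
s2 = "world"

combined = s1 + " " + s2  # + builds a new string; s1 is untouched
puts s1                   # prints: hello

s1 << " " << s2           # << appends to s1 itself and returns s1, so calls can chain
puts s1                   # prints: hello world
puts combined.equal?(s1)  # prints: false (same content, different objects)
```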
String combination via interpolation
  • you can actually interpolate any code you want inside #{...}, though you usually don't want to put anything more complicated than a simple expression there
  • Ruby interpolates by calling to_s on the object to which the interpolation code evaluates
  • you can define your own to_s methods
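A minimal sketch of interpolation calling a custom to_s. Person is a hypothetical class invented for this example:

```ruby
# Person is a hypothetical class used only to show a custom to_s
class Person
  def initialize(name)
    @name = name
  end

  # "#{...}" calls to_s on whatever the interpolated code evaluates to
  def to_s
    "Person named #{@name}"
  end
end

david = Person.new("David")
greeting = "Hello, #{david}"
puts greeting            # prints: Hello, Person named David
puts "2 + 2 = #{2 + 2}"  # prints: 2 + 2 = 4 (Integer#to_s is called on 4)
```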
Querying strings
  • string queries give you either
    • a Boolean response (true or false)
    • a kind of status report on the current state of the string
Boolean string queries
  • include? can be used to ask a string if it includes a given substring:
string = "Ruby is a cool language."  
string.include?("Ruby") # => true  
string.include?("lame") # => false  
  • you can test for a given start or end to a string with start_with? and end_with?:
string = "Ruby is a cool language."  
string.start_with?("Ruby") # => true  
string.end_with?("!!!") # => false  
  • you can test whether a string has no characters at all with the empty? method:
string = "Ruby is a cool language."  
string.empty? # => false  
"".empty? # => true
Content queries
  • you can find the size or the length (they are synonyms for the same method):
string = "Ruby is a cool language."  
string.size # => 24  
string.length # => 24  
  • you can use count to find how many times a given letter or set of letters occurs in a string:
string = "Ruby is a cool language."  
string.count("a") # => 3  
  • you can count how many of a range of letters there are by using a hyphen-separated range:
string = "Ruby is a cool language."  
string.count("g-m") # => 5  
  • character specifications are case-sensitive:
string = "Ruby is a cool language."  
string.count("A-Z") # => 1  
  • you can also provide a written-out set of characters you want to count:
string = "Ruby is a cool language."  
string.count("aey. ") # => 10  
  • to count the number of characters that don't match the ones you specify, use a caret (^) at the beginning of your specification
string = "Ruby is a cool language."  
string.count("^aey. ") # => 14  
string.count("^g-m") # => 19  
  • the caret technique is a close cousin of the regular expression character class negation. You can combine the specification syntaxes and even provide more than one argument:
string = "Ruby is a cool language."  
string.count("ag-m") # => 8  
string.count("ag-m", "^l") # => 6  
  • the index method is sort of the inverse of using [] with a numerical index. Instead of looking up a substring at a particular index, it returns the index at which a given substring occurs
  • index returns the first occurrence from the left
  • rindex returns the first occurrence from the right (that is, the last occurrence)
string = "Ruby is a cool language."  
string.index("cool") # => 10  
string.index("l") # => 13  
string.rindex("l") # => 15  
  • the ord method returns the ordinal code of a one-character string
  • using ord on a longer string gives you the code of the first character
  • the reverse operation to ord is chr
  • using chr on a number that has no corresponding character raises a RangeError
"a".ord # => 97
"abc".ord # => 97
97.chr # => "a"  
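A small sketch of the ord/chr round trip and the error case. It assumes the default setting in which no default internal encoding is configured, which is what makes 256.chr fail:

```ruby
codes = "Ruby".chars.map(&:ord)  # ordinal code of each character
p codes                          # => [82, 117, 98, 121]

word = codes.map(&:chr).join     # chr turns each code back into a character
p word                           # => "Ruby"

# With no encoding argument (and no default internal encoding set),
# codes above 255 have no corresponding character
error_class =
  begin
    256.chr
    nil
  rescue RangeError => e
    e.class
  end
p error_class                    # => RangeError
```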
Comparing two strings for equality
  • the most common string comparison method is ==, which tests for equality of string content
  • the two literal "string" objects are different objects, but they have the same content, so they pass the == test
"string" == "string" # => true
"string" == "house" # => false
  • String#eql? tests two strings for identical content. In practice, it returns the same result as ==
  • String#equal? tests whether two strings are the same object
"a" == "a" # => true
"a".eql?("a") # => true
"a".equal?("a") # => false
String transformation
  • string transformation in Ruby informally falls into three categories:
    • 1) case transformation
    • 2) formatting transformation
    • 3) content transformation
Case transformation
  • strings let you raise, lower, and swap their case
  • all case-changing methods have receiver-modifying (bang) equivalents, such as upcase!
string = "Hello, World!"  
string.upcase # => "HELLO, WORLD!"  
string.downcase # => "hello, world!"  
string.swapcase # => "hELLO, wORLD!"  
"hello".capitalize # => "Hello"
Formatting transformations
  • strictly speaking, format transformations are a subset of content transformations
string = "Hello, World!"  
string.rjust(25) # => "            Hello, World!"  
string.ljust(25) # => "Hello, World!            "  
  • if you supply a second argument, it is used as padding. This second argument can be more than one character long
  • the padding pattern will repeat as many times as it will fit, truncating the last placement if necessary
string = "Hello, World!"  
string.rjust(25, '.') # => "............Hello, World!"  
string.rjust(25, '><') # => "><><><><><><Hello, World!"  
  • there is a center method which behaves like rjust and ljust but puts the characters of the string in the center
  • when the padding can't be split evenly, the extra character goes on the right
string = "The middle"  
string.center(20, "*") # => "*****The middle*****"  
string.center(21, "*") # => "*****The middle******"  
  • you can prettify your strings by stripping whitespace
string = "     Hello, World!     "  
string.strip # => "Hello, World!"  
string.lstrip # => "Hello, World!     "  
string.rstrip # => "     Hello, World!"  
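Most of the case and formatting methods above also come in bang versions that modify the receiver. A short sketch using strip! and upcase!:

```ruby
string = "  Hello, World!  "
tidy = string.strip       # non-bang: returns a new string
p string                  # => "  Hello, World!  " (unchanged)
p tidy                    # => "Hello, World!"

string.strip!             # bang: modifies the receiver itself
string.upcase!            # case-changing methods have bang versions too
p string                  # => "HELLO, WORLD!"

p string.upcase!          # => nil (bang methods return nil when nothing changed)
```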
Content transformations
  • the main difference between chop and chomp is that:
    • chop removes a character unconditionally
    • chomp removes a target substring if it finds the substring at the end of the string
  • by default, chomp's target substring is the newline character
  • both chop and chomp have bang equivalents
"Hello, World!".chop # => "Hello, World"
"Hello, World!\n".chomp # => "Hello, World!"
"Hello, World".chomp("ld") # => "Hello, Wor"
  • the clear method empties a string of all its characters, leaving an empty string
  • the clear method changes its receiver but doesn't end with a bang
string = "Hello, World!"  
string.clear # => ""  
string # => ""  
  • replace takes a string argument and replaces the current content of the string with the content of the argument
  • as with clear, the replace method permanently changes the string
string = "Hello, World!"  
string.replace("Goodbye, Space!") # => "Goodbye, Space!"  
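To see that replace changes the existing object rather than creating a new one, this sketch keeps a second reference to the same string and compares it with plain assignment:

```ruby
a = "Hello, World!"
b = a                        # b and a now refer to the same String object
a.replace("Goodbye, Space!") # replace changes that object's content in place
p b                          # => "Goodbye, Space!"

a = "Something else"         # plain assignment just points a at a new object
p b                          # => "Goodbye, Space!" (b is unaffected)
```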
  • delete lets you target certain characters for removal from the string
"Hello, World!".delete("aeiou") # => "Hll, Wrld!"
"Hello, World!".delete("^aeiou") # => "eoo"
"Hello, World!".delete("a-e", "^o") # => "Hllo, Worl!"
  • crypt performs a Data Encryption Standard (DES) encryption on the string
  • the single argument to crypt is the two-character salt string
"Hello, World!".crypt("34") # => "34YjtHFLlMy0g"
  • the succ method (also available as next) lets you do string incrementation
  • the ability to increment strings comes in handy when you need to batch-generate unique strings for filenames or similar
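A minimal sketch of the filename use case just mentioned. The "report_aa" stem and the .txt extension are hypothetical:

```ruby
# Hypothetical example: batch-generating unique, sortable file names
names = []
stem = "report_aa"
4.times do
  names << "#{stem}.txt"
  stem = stem.succ   # increments the rightmost alphanumeric, carrying left as needed
end
p names  # => ["report_aa.txt", "report_ab.txt", "report_ac.txt", "report_ad.txt"]
```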
"a".succ # => "b"
"a".next # => "b"
"abc".succ # => "abd"
"azz".succ # => "baa"
String conversions
  • the to_i method offers an additional feature: if you give it a positive integer argument in the range of 2-36, the string you're converting is interpreted as representing a number in the base corresponding to the argument
  • for example, if you want to interpret 100 as a base 17 number, the output is the decimal equivalent of 100, base 17:
"100".to_i(17) # => 289
  • base 8 and base 16 are special cases and have dedicated methods
"100".oct # => 64
"100".hex # => 256
  • other conversion methods available to strings are:
    • to_f
    • to_s (it returns its receiver)
    • to_sym or intern (converts to a Symbol object)
"1.2345".to_f # => 1.2345
"Hello".to_s # => "Hello"
"abcde".to_sym # => :abcde
"1.2345and some words".to_f # => 1.2345
"just some words".to_i # => 0

String encoding - setting the encoding of the source file

  • by default (since Ruby 2.0), Ruby source files are treated as UTF-8
  • you can ask Ruby to display the source encoding by putting this line in a file: puts __ENCODING__. Typing it into irb may give a different result
  • to change the encoding of a source file, you need a magic comment at the top of the file of the form # encoding: <name>. For example, ASCII encoding would use # encoding: ASCII
Encoding of individual strings
  • strings can tell you their encoding:
str = "Test string"  
str.encoding # => #<Encoding:UTF-8>  
  • you can encode a string with a different encoding, as long as the conversion from the original encoding to the new one - transcoding - is permitted
str.encode!("US-ASCII")  
str.encoding # => #<Encoding:US-ASCII>  
  • the encoding of a string is also affected by the presence of certain characters in the string and/or by appending certain characters to it
  • you can represent arbitrary characters in a string using either:
    • the \x escape sequence, followed by a two-digit hexadecimal number representing a byte, or
    • the \u escape sequence, followed by a Unicode code point, which inserts the corresponding character
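A short sketch of both escape sequences, plus a transcoding that is not permitted. It assumes the source file uses the default UTF-8 encoding:

```ruby
# \u followed by a Unicode code point inserts that character
cafe = "caf\u00e9"
puts cafe                 # prints: café
p cafe.encoding           # => #<Encoding:UTF-8>
p cafe.length             # => 4 (characters)
p cafe.bytesize           # => 5 (é takes two bytes in UTF-8)

# \x followed by two hex digits inserts a single byte
p "\x41\x42"              # => "AB"

# Transcoding fails when the target encoding has no equivalent character
converted =
  begin
    cafe.encode("US-ASCII")
  rescue Encoding::UndefinedConversionError
    nil                   # é has no US-ASCII equivalent
  end
p converted               # => nil
```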

Symbols and their uses

  • symbols are instances of the Symbol class
  • the literal constructor for symbols is the leading colon
  • Example
:a
:book
:"Hello world"
  • you can create a symbol programmatically by calling the to_sym method (also known by the synonym intern) on a string
"a".to_sym # => :a
:a.to_s # => "a"
"Hello world".intern # => :"Hello world"
Characteristics of symbols
  • the chief characteristics of symbols are
    • immutability
    • uniqueness
Immutability
  • symbols are immutable, there's no such thing as appending characters to a symbol; once the symbol exists, that's it.
  • like an integer, a symbol cannot be changed
Uniqueness
  • symbols are unique
  • whenever you see a symbol like :abc, it is the same object. With strings, though, "abc" and "abc" are two different objects
  • you can see the uniqueness by querying objects for their object_id
  • because symbols are unique, there is no point having a constructor for them. Ruby has no Symbol#new method
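The uniqueness checks described above can be sketched like this. The string comparison assumes the file has no frozen_string_literal magic comment:

```ruby
p :abc.equal?(:abc)                 # => true  (every :abc is the same object)
p :abc.object_id == :abc.object_id  # => true
p "abc".equal?("abc")               # => false (each literal makes a new String)
p Symbol.respond_to?(:new)          # => false (there is no Symbol.new constructor)
```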
Symbols and identifiers
  • symbols don't represent anything other than themselves
  • Ruby uses symbols to keep track of all the names it has created for variables, methods, and constants
  • you can see the list of them using Symbol.all_symbols class method. There are over 3000 symbols
  • grep is a regular expression based way of looking for matching elements in an array. Example: Symbol.all_symbols.grep(/my_symbol/)
  • if you use Symbol.all_symbols.include?(:abc) to test for the existence of a symbol, it will always return true, because the very act of writing :abc in the include? test puts the symbol :abc into the symbols table
  • Ruby keeps track of what symbols it's supposed to know about so it can look them up quickly. The inclusion of a symbol in the symbol table doesn't tell you anything about what the symbol is for
Symbols in practice
  • the two most common uses of symbols are:
    • symbols as method arguments
    • hash keys
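A sketch of both common uses. Book is a hypothetical class, and send, respond_to? and attr_reader are simply familiar methods that accept symbols naming methods or attributes:

```ruby
# Symbols as method arguments: they name methods or attributes
p "abc".send(:upcase)   # => "ABC"
p 5.respond_to?(:+)     # => true

class Book              # hypothetical class for illustration
  attr_reader :title    # attr_reader takes the attribute name as a symbol
  def initialize(title)
    @title = title
  end
end
p Book.new("The Well-Grounded Rubyist").title  # => "The Well-Grounded Rubyist"

# Symbols as hash keys; both literal syntaxes produce symbol keys
capitals = { :france => "Paris", germany: "Berlin" }
p capitals[:germany]    # => "Berlin"
p capitals.keys         # => [:france, :germany]
```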
Symbols and strings in comparison
  • you can think of symbols as integer-like entities dressed up in characters

Numerical Objects

  • in Ruby, numbers are objects. You can send messages to them, just as you can to any object
Numerical classes
  • several classes make up the numerical landscape
  • Numeric
    • Float
    • Integer
      • Fixnum (before Ruby 2.4)
      • Bignum (before Ruby 2.4, when both were merged into Integer)
Performing arithmetic operations
  • if you are dividing integers, the result is always an integer. If you want a floating-point result, at least one operand must be a floating-point number
  • Ruby also lets you manipulate numbers in nondecimal bases. Hexadecimal integers are indicated by a leading 0x.
0x12 # => 18  
0x12 + 12 # => 30  
  • integers beginning with a 0 are interpreted as octal (base 8)
012 # => 10  
012 + 12 # => 22  
012 + 0x12 # => 28  
  • you can supply the base you want to convert from as an argument to to_i
"10".to_i(17) # => 17
"12345".to_i(13) # => 33519
"ruby".to_i(35) # => 1194794
  • most arithmetic operators you see in Ruby are methods. They don't look that way because of operator-like syntactic sugar
  • in practice, nobody writes arithmetic operations this way; you will almost always see the syntactic-sugar equivalents
1.+(1) # => 2  
12./(3) # => 4  
-12.-(-7) # => -5 
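A short sketch of the integer versus floating-point division rule from this section. Integer#fdiv and divmod are standard methods added here for comparison:

```ruby
p 7 / 2          # => 3   (both operands are integers, so the result is an integer)
p 7.0 / 2        # => 3.5 (one Float operand is enough)
p 7 / 2.0        # => 3.5
p 7.fdiv(2)      # => 3.5 (Integer#fdiv always returns a Float)
p(-7 / 2)        # => -4  (integer division rounds down, toward negative infinity)
p 7.divmod(2)    # => [3, 1]
p 7.send(:/, 2)  # => 3   (the / operator really is just a method call)
```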

Time and dates

  • Ruby gives you a lot of ways to manipulate times and dates
  • Times and dates are manipulated through three classes: Time, Date, and DateTime. These are collectively referred to as date/time objects
  • Time is built into Ruby, but Date and DateTime need the date library, and extra Time methods such as Time.parse need the time library:
require 'date'  
require 'time'  
Creating Date objects
  • you can get today's date with the Date.today constructor
Date.today  
  • you can get a simpler string by calling to_s on the date, or by printing it with puts
puts Date.today  
  • you can create date objects with Date.new (also available as Date.civil)
puts Date.new(1959, 2, 1)  
puts Date.civil(1959, 2, 1)  
  • you can create a new date with the parse constructor, which expects a string representing a date
  • Date.parse makes an effort to make sense of whatever you pass in
puts Date.parse("2003/6/9") # => 2003-06-09  
puts Date.parse("03/6/9") # => 2003-06-09  
puts Date.parse("33/6/9") # => 2033-06-09  
puts Date.parse("November 2 2013") # => 2013-11-02  
puts Date.parse("Nov 2 2013") # => 2013-11-02  
  • you can create Julian and commercial (Monday-based rather than Sunday-based day-of-week counting) Date objects with the methods jd and commercial
  • you can scan a string against a format specification, generating a Date object with strptime
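A minimal sketch of strptime, jd and commercial. The Julian day number 2457877 is the one shown later in these notes for 2017-05-03, and week 18, day 3 of ISO year 2017 falls on the same date:

```ruby
require 'date'

# strptime: parse a string according to an explicit format
d = Date.strptime("03/11/2013", "%d/%m/%Y")
puts d                             # prints: 2013-11-03

# jd: build a date from a Julian day number
puts Date.jd(2457877)              # prints: 2017-05-03

# commercial: ISO year, week number, and day of week (1 = Monday)
puts Date.commercial(2017, 18, 3)  # prints: 2017-05-03
```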
Creating Time objects
  • you can create Time objects with the following constructors:
    • new (also available as now)
    • at
    • local (also available as mktime)
    • parse
  • To use Time.parse, you have to load the time library
Time.new # => 2017-05-03 08:59:07 +0800  
Time.now # => 2017-05-03 08:59:07 +0800  
Time.at(100000000) # => 1973-03-03 17:16:40 +0730  
Time.local(2017,10,3,14,3,6) # => 2017-10-03 14:03:06 +0800  
Time.mktime(2017,10,3,14,3,6) # => 2017-10-03 14:03:06 +0800

require 'time'  
Time.parse("March 22, 1985, 10:35 PM") # => 1985-03-22 22:35:00 +0000  
Creating Date/Time objects
  • DateTime is a subclass of Date, but its constructors are a little different thanks to some overrides
  • the most common constructors are
    • new (also available as civil)
    • now
    • parse
  • DateTime also features the specialised jd (Julian date), commercial and strptime
puts DateTime.new(2009, 1, 2, 3, 4, 5)  
puts DateTime.civil(2009, 1, 2, 3, 4, 5)  
puts DateTime.parse("October 23, 1973, 10:34 AM")  
Date/time query methods
  • DateTime objects have a second method, as well as sec. Time objects have only sec
  • you can check whether the given date/time is or isn't a particular day of the week
require 'date'  
require 'time'

dt = DateTime.now # => #<DateTime: 2017-05-03T09:33:16+08:00 ((2457877j,5596s,658389000n),+28800s,2299161j)>  
dt.year # => 2017  
dt.hour # => 9  
dt.minute # => 33  
dt.second # => 16

t = Time.now # => 2017-05-03 09:34:35 +0800  
t.month # => 5  
t.sec # => 35

d = Date.today # => #<Date: 2017-05-03 ((2457877j,0s,0n),+0s,2299161j)>  
d.day # => 3

d.monday? # => false  
d.wednesday? # => true  
Date/time formatting methods
  • all date/time objects have the strftime method, which allows you to format their fields in a flexible way using format strings
t = Time.now # => 2017-05-03 09:39:16 +0800  
t.strftime("%d-%m-%y") # => "03-05-17"

t.strftime("Otherwise known as %d-%b-%y") # => "Otherwise known as 03-May-17"  
Date.today.rfc2822  
DateTime.now.httpdate  
  • common time/date format specifiers for strftime
Specifier   Description
%Y          Year (four digits)
%y          Year (last two digits)
%b, %B      Short month name, full month name
%m          Month (number)
%d          Day of month (left-padded with zeros)
%e          Day of month (left-padded with blanks)
%a, %A      Short day name, full day name
%H, %I      Hour (24-hour clock), hour (12-hour clock)
%M          Minute
%S          Second
%c          Equivalent to "%a %b %e %H:%M:%S %Y"
%x          Equivalent to "%m/%d/%y"
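A sketch applying several specifiers from the table to a fixed local time, chosen so that the output does not depend on when the code runs. %p (AM/PM) is an extra specifier not listed in the table:

```ruby
t = Time.new(2017, 5, 3, 9, 39, 16)  # fixed local time so the output is predictable
puts t.strftime("%d-%m-%y")           # prints: 03-05-17
puts t.strftime("%A, %B %e, %Y")      # prints: Wednesday, May  3, 2017
puts t.strftime("%c")                 # prints: Wed May  3 09:39:16 2017
puts t.strftime("%x")                 # prints: 05/03/17
puts t.strftime("%I:%M %p")           # prints: 09:39 AM
```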
Date/time conversion methods
Object      Methods available
Time        to_date, to_datetime
Date        to_time, to_datetime
DateTime    to_time, to_date
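A short sketch of the conversions in the table. It uses a fixed local time and shows that converting a Date to a DateTime gives midnight, because a Date carries no time of day:

```ruby
require 'date'  # the date library adds Time#to_date and Time#to_datetime

t = Time.new(2017, 5, 3, 9, 39, 16)
d = t.to_date
p d.class             # => Date
puts d                # prints: 2017-05-03

dt = d.to_datetime
p dt.class            # => DateTime
p dt.hour             # => 0 (the Date carried no time of day)

p t.to_datetime.hour  # => 9
p d.to_time.class     # => Time
```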
Date/time arithmetic
  • Time objects let you add and subtract seconds from them, returning a new time object
t = Time.now # => 2017-05-03 10:09:00 +0800  
t - 20 # => 2017-05-03 10:08:40 +0800  
t + 20 # => 2017-05-03 10:09:20 +0800  
  • Date and DateTime objects interpret + and - as day-wise operators, and they allow month-wise shifts with >> (forward) and << (backward)
dt = DateTime.now  
puts dt + 100  
puts dt >> 3  
puts dt << 10  
  • you can move ahead one day using the next (also available as succ) method
d = Date.today  
puts d.next  
puts d.succ  
puts d.next_year  
puts d.next_month(3)  
puts d.prev_day(10)  
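The same arithmetic with a fixed date, so that the results can be checked. The date is January 31, which shows how >> and << clamp to the last valid day of the target month:

```ruby
require 'date'

d = Date.new(2017, 1, 31)
puts d + 1            # prints: 2017-02-01 (+ counts days)
puts d >> 1           # prints: 2017-02-28 (>> moves forward by months, clamping to the month's last day)
puts d << 1           # prints: 2016-12-31 (<< moves back by months)
puts d.next_month(3)  # prints: 2017-04-30
puts d.prev_day(10)   # prints: 2017-01-21

gap = Date.new(2017, 3, 1) - d  # subtracting dates gives a Rational number of days
p gap.to_i            # => 29
```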
  • Date and DateTime objects let you iterate over a range of dates using the upto and downto methods, each of which takes the date at which to stop
d = Date.today  
next_week = d + 7  
d.upto(next_week) {|date| puts "#{date} is a #{date.strftime("%A")}" }
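A deterministic variant of the iteration above, using a fixed starting Monday instead of Date.today. It collects results into arrays instead of printing them:

```ruby
require 'date'

start = Date.new(2017, 5, 1)  # a Monday
names = []
start.upto(start + 6) { |date| names << date.strftime("%a") }
p names      # => ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

countdown = []
start.downto(start - 2) { |date| countdown << date.to_s }
p countdown  # => ["2017-05-01", "2017-04-30", "2017-04-29"]
```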