Getting Started With the Language

Prerequisites

This guide assumes that you know how to create a project and how to edit and evaluate your formulas. If not, read Getting Started first. These features are referred to throughout this document.

About the language

Formulative is a functional language with its expressions closely linked to Mathematics. This means that problems modelled using mathematical constructs can - in most cases - be translated to Formulative using the same mathematical notation. Model and implementation can in many cases stay very close and the resulting implementation is terse.

Formulative supports numerical problems. Symbolic computation has not been implemented so far.

Not all models can be expressed with pure mathematics (nor is it always desirable). Therefore Formulative has generic functional constructs like map, reduce or fold to solve generic problems.

Expressions are evaluated lazily: when you ask for the value of some variable, Formulative builds up the path of expressions in your model that is needed to get that result. You cannot control the order in which results are calculated, i.e. Formulative is not a procedural language.
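To make the demand-driven evaluation concrete, here is a rough Python sketch. It is not Formulative and it says nothing about how the engine is actually implemented. The names `model` and `evaluate` and the example variables are made up for illustration. Each variable is a formula over other variables, and asking for one variable evaluates only the formulas on its dependency path.

```python
# Hypothetical model: each variable maps to (dependencies, formula).
model = {
    "x": ((), lambda: 123.2),
    "y": ((), lambda: 10.0),
    "z": (("x",), lambda x: x * 2),        # depends on x only
    "unused": (("y",), lambda y: y / 0),   # would fail, but is never requested
}

def evaluate(name, cache=None, trace=None):
    """Evaluate `name` on demand, visiting only the variables it depends on."""
    cache = {} if cache is None else cache
    trace = [] if trace is None else trace
    if name not in cache:
        deps, formula = model[name]
        args = [evaluate(d, cache, trace)[0] for d in deps]
        cache[name] = formula(*args)
        trace.append(name)                 # record the order of evaluation
    return cache[name], trace

value, visited = evaluate("z")
print(value, visited)
```

Only x and z are visited; the formula for `unused` is never run.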

Formulative has some extras to help implement Calculations like “math-style” functions, collections (sets, vectors and matrices) and operators to work with them, advanced date-time manipulation, etc.

Variables

Variable identifiers follow math conventions. Two variables that render almost identically may therefore be two different variables, e.g. x and x' are two distinct variables.

Note

Identifier embellishments are “decorators” and create distinct variables

Formulative also has some common built-in identifiers, e.g. $e, $i, $\pi.

Find more details on variables here: Identifiers.

Numerical expressions

Create a simple expression that defines a decimal literal constant:

x = 123.2

Note

You can copy & paste formulas like the above (x = 123.2) into the formula editor

Embellishments are significant, so now enter this formula:

x' = 232.1

../_images/number-literal-formula.png

Then check the Evaluation variable list; it shows the two variables with their respective values:

../_images/number-literal-eval.png

To use a built-in constant and a numeric operator, create a constant whose value equals pi squared (π²). The expression is:

a = $\pi^2

Once you have entered the formula, you can evaluate the result:

../_images/pi-square-eval.png

Check out the reference for more on Numbers (literals, arithmetic, relational operators).

String expressions

Create the following expressions to do some String literal manipulation:

  • expression defining first name: n' = "John"
  • expression defining last name: n'' = "Doe"
  • expression to concatenate them: n''' =  n' *** " " *** n''

To get a substring of the above variable n''', add the following expression:

n_'substr = (n''')_{1,...,4}

This will select the first 4 characters from the string:

../_images/string-name-substring.png

Note that character indexing starts with 1 (instead of 0 as in many other languages).
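If you are used to 0-based languages, the following Python analogue (not Formulative syntax) may help to see what the 1-based index range 1,...,4 selects. The helper name `substring_1based` is made up for this sketch.

```python
first = "John"
last = "Doe"
full = first + " " + last            # plays the role of n''' (concatenation)

def substring_1based(s, start, end):
    """Characters start..end, both inclusive and counted from 1."""
    return s[start - 1:end]

print(substring_1based(full, 1, 4))  # John
```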

Check out the reference for more on Strings.

Keywords

Keywords are specific strings, written with a leading apostrophe. For example, if you have payment frequencies, you could use 'monthly as one value in the list of payment frequency values. Keywords are often used in relational table resources to represent a strict set of values in some of the columns. More on this subject in the Tables section.

Date / Time & Durations

Formulative has built in support for Date and Time manipulation. Start with a simple date literal:

!d_'lit = '2018-10-23

Next create a time literal:

!t_'lit = '2018-10-23T01:02:03

Continue with a duration adding 2 years and 3 months:

dur_'lit = 2 $!year + 3 $!month

Now, let’s add this to the !d_'lit variable:

!d'_lit' = !d_'lit + dur_'lit

You can also convert between date and date-time values.

Create a date-time from a date value (by extending it with 00:00:00):

!t_'extend = $time(!d_'lit)

Do it the other way, create a date by extracting the date part from a date-time:

  • create a time value:
!t_'lit = '2018-10-23T01:02:03
  • extract the date part:
!d_'extract = $date(!t_'lit)

The built-in functions $today and $now provide the current date and time respectively.

The following expression constructs a date from parts (Y, M, D):

!d_'comp = $date(2018, 12, 6)

You can use the $time(Y, M, D, h, m, s) similarly to construct a time value:

!t_'comp = $time(2018, 12, 6, 12, 5, 0)

Extracting parts from a date or time is done using subscripts, e.g. to get the year part:

n_'years = (!d_'comp)_'year
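For comparison, here is a Python sketch of the same date manipulations using only the standard library. It is an analogue, not Formulative. Python has no built-in "years and months" duration, so the helper `add_years_months` is made up for this sketch, and its choice to clamp the day at the end of a shorter month is an assumption that may differ from what Formulative does.

```python
import calendar
from datetime import date, datetime, time

def add_years_months(d, years, months):
    """Shift a date by whole years and months, clamping the day if needed."""
    total = d.year * 12 + (d.month - 1) + years * 12 + months
    year, month0 = divmod(total, 12)
    month = month0 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return d.replace(year=year, month=month, day=day)

d_lit = date(2018, 10, 23)
t_lit = datetime(2018, 10, 23, 1, 2, 3)

d_shifted = add_years_months(d_lit, 2, 3)     # 2 years + 3 months later
t_extend = datetime.combine(d_lit, time.min)  # date extended with 00:00:00
d_extract = t_lit.date()                      # date part of a date-time
n_years = date(2018, 12, 6).year              # a single part of a date

print(d_shifted, t_extend, d_extract, n_years)
```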

Check out the reference for more on Date / Time & Durations.

Ranges, repeats, tuple

A simple use of the range construct is to create some collection from it. For example to create a tuple with numbers from 1 to 10, you could write this:

a = (1,...,10)

Now that we have the numbers in a collection we could sum up say the first 5 numbers like this:

b = {+}__{i=1}^^5 (a)_i

What is going on in this formula?

  • b = is an assignment of a new variable
  • {+} is the summation operator that will be invoked on our collection of integers
  • __{i=1} the subscript notation on the operator names the index variable and the value it starts from
  • ^^5 the superscript notation gives the upper bound of the index
  • (a)_i is the element at position i in collection a

You can use expressions in both the subscript and superscript, so if you wanted you could iterate over a previously unknown length by using the $len function like this:

c = {+}__{i=1}^^{$len a} (a)_i
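The two sums above correspond to the following Python analogue (not Formulative syntax). Because Python indexes from 0, the 1-based index i becomes `i - 1`.

```python
a = tuple(range(1, 11))                  # (1, ..., 10)

# b: sum of (a)_i for i = 1..5
b = sum(a[i - 1] for i in range(1, 5 + 1))

# c: the same sum with the upper bound taken from the length of a
c = sum(a[i - 1] for i in range(1, len(a) + 1))

print(b, c)  # 15 55
```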

With ranges and repeats you can also create iterator bindings that will bind a variable to a range of values and let you work with these values one by one in some expression.

For example to create the same range of numbers and bind them to a variable used to create the Tuple, you could write this:

d = (i | i = 1,...,10)

What is going on in this formula?

  • d = is an assignment of a new variable
  • i = 1,...,10 binds i to a list of integers (given as a range here)
  • i the variable in the binding is a local variable in the expression which takes the listed values
  • | maps the expression on the left to the binding on the right, using iteration
  • the parentheses ( ) are important: they are not used for grouping but for collection construction, here constructing a tuple.

Note

Had we used {{ }} instead of (  ) we would have created a set instead of a tuple.

Creating a Tuple with this approach is more complex but has some additional features. For example, if we wanted to add up only even numbers between 1 and 10 we could add a condition to the range construct as follows:

e = (i | i = 1,...,10 /\ i $mod 2 = 0)

and to sum these up:

f = {+}__{i=1}^^{$len e} (e)_i

This is what your formulas look like (source and rendered):

../_images/range-even-tuple.png
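Readers who know Python comprehensions may recognise the pattern "expression | binding /\ condition". The sketch below is a Python analogue of the formulas for d, e and f, not Formulative syntax.

```python
d = tuple(i for i in range(1, 11))                 # expression | binding
e = tuple(i for i in range(1, 11) if i % 2 == 0)   # ... with a condition
f = sum(e)

# The same binding inside a set constructor yields a set instead of a tuple.
e_as_set = {i for i in range(1, 11) if i % 2 == 0}

print(e, f)  # (2, 4, 6, 8, 10) 30
```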

For more on lists and ranges, see Lists.

For more on tuples, see Tuples.

Finite Sets

Constructing a finite set manually, by listing its elements:

s_'fromLiteral = {{1,2,3,4,4,4,4,4}}

Since we are creating a set there will be only 4 elements.

Building on the range example above, we can create a set with the even numbers from 1 to 10 like this:

S = {{i | i = 1,...,10 /\ i $mod 2 = 0}}

Then adding the Set’s values is very concise:

sum_'S = {+}_{x (- S} x

Note

For this sum operation {+} to work you need to have a Set.

There are other predefined operators on sets like multiplication or set algebra.

Create two sets of keywords (we will use letters for keywords), then we will see how the set operations work:

P={{'a,'b,'c,'d}}
T={{'a,'b,'e,'f}}

Some of the Set Operations and their respective results using the above sets:

Set operations and results

Source                   Result
P_'unionT = P (_) T      P_unionT = {a, b, c, d, e, f}
P_'intsT = P (~) T       P_intsT = {a, b}
P_'diffT = P \\ T        P_diffT = {c, d}
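The same three operations, and the de-duplication seen in s_'fromLiteral, can be tried out with Python's built-in sets. This is an analogue only; plain strings stand in for the keywords.

```python
s_from_literal = {1, 2, 3, 4, 4, 4, 4, 4}   # duplicates collapse

P = {"a", "b", "c", "d"}
T = {"a", "b", "e", "f"}

p_union_t = P | T    # union
p_ints_t = P & T     # intersection
p_diff_t = P - T     # difference

print(len(s_from_literal), sorted(p_union_t), sorted(p_ints_t), sorted(p_diff_t))
```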

For more, see Finite Sets.

Associations (maps) and Relational Tables

Associations are key-value maps. Keywords come in handy when building the keys of associations. If you want to represent the attributes of some object, you can use an association. As a simple example, here is an association that holds the name, age and gender attributes of a person:

John = {{'name |-> "John", 'age |-> 28, 'gender|->'male}}

Note the following:

  • The mathematical model of an association is simply a (finite) function: it maps elements from a set to another set
  • Elements of associations are actually (key, value) pairs
  • The maplet operator |-> is syntactic sugar, i.e. key |-> value = (key, value)
  • However, when using the maplet operator |-> to construct an association the keys are checked for uniqueness
  • We were using keywords, e.g: 'name for the keys of the associations
  • We were using strings where there is no closed value set, e.g. the name attribute
  • We were using keywords again when the value is from a closed set like genders
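The points above (an association is a set of (key, value) pairs with unique keys) can be sketched in Python as follows. This is an analogue, not Formulative; the helper `association` is made up for illustration and plain strings stand in for keywords.

```python
def association(*pairs):
    """Build a dict from (key, value) pairs, rejecting duplicate keys."""
    result = {}
    for key, value in pairs:
        if key in result:
            raise ValueError(f"duplicate key: {key!r}")
        result[key] = value
    return result

john = association(("name", "John"), ("age", 28), ("gender", "male"))

# Viewed as a set, the association consists of its (key, value) pairs.
john_pairs = set(john.items())
print(john_pairs == {("name", "John"), ("age", 28), ("gender", "male")})
```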

Then create a second person:

Thomas = {{'name |-> "Thomas", 'age |-> 32, 'gender |-> 'male}}

To create a relational table of the people/characters in our example we can combine these into a Relational Table, which is just a plain set of associations:

P = {{John, Thomas,
{{'name |-> "John", 'age |-> 31, 'gender |-> 'male}},
{{'name |-> "Winona", 'age |-> 13, 'gender |-> 'female}},
{{'name |-> "Lige", 'age |-> 35, 'gender |-> 'male}},
{{'name |-> "Starling", 'age |-> 29, 'gender |-> 'male}},
{{'name |-> "Rosalee", 'age |-> 24, 'gender |-> 'female}}
}}

Notice that:

  • we have used John and Thomas variables in the table construction, from their previous declaration
  • we have added other associations by constructing them literally in place

Now we can do some relational table manipulation:

Select Thomas’s age with the following expression:

Age_'T = {{(r)_'age | r (- P /\ (r)_'name = "Thomas"}}

Some explanation of the above expression:

  • we are constructing the result as a set by the set construction operator: {{...}}
  • (r)_'age will select the age attribute (with the subscript) of the current element bound to the local variable r
  • | r (- P here we bind r, a local variable, to elements of the set P
  • /\ (r)_'name = "Thomas" is a condition that r has to satisfy. The subscript selects the attribute (name)

Select all female characters:

C_'f = {{(r)_'name | r (- P /\ (r)_'gender = 'female}}

We could have used a simpler form of selection to create a new relational table (sub-select) with the sigma operator:

C'_f = $\sigma_{'gender|->'female} P

Select all male characters older than 30:

C_'gt30 = {{(r)_'name | r (- P /\ (r)_'gender = 'male /\ (r)_'age > 30}}

Some explanation of the above expression:

  • here we used more than one condition on the current element r
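The selections in this section map closely onto Python comprehensions. The sketch below is an analogue, not Formulative syntax: the table is a list of dicts (Python dicts cannot be members of a set), and strings stand in for keywords.

```python
people = [
    {"name": "John", "age": 28, "gender": "male"},
    {"name": "Thomas", "age": 32, "gender": "male"},
    {"name": "John", "age": 31, "gender": "male"},
    {"name": "Winona", "age": 13, "gender": "female"},
    {"name": "Lige", "age": 35, "gender": "male"},
    {"name": "Starling", "age": 29, "gender": "male"},
    {"name": "Rosalee", "age": 24, "gender": "female"},
]

# Project one attribute of the rows that satisfy a condition.
age_t = {r["age"] for r in people if r["name"] == "Thomas"}
females = {r["name"] for r in people if r["gender"] == "female"}

# Sub-select: keep whole rows, similar in spirit to the sigma operator.
female_rows = [r for r in people if r["gender"] == "female"]

# More than one condition on the current row.
males_over_30 = {r["name"] for r in people
                 if r["gender"] == "male" and r["age"] > 30}

print(age_t, sorted(females), sorted(males_over_30))
```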

For more, see Associations (Maps) & Sets of Associations (Relational Tables).

Parameter Tables

Parameter tables you define (or upload) are Relational Tables, too. Make sure you create the scoring parameter table in this getting started tutorial: Add interest rate calculation based on Scoring

To get the annual rate for a customer with a score of 720, write an expression similar to the selections from the relational table above:

S_'customer = (r)_'APR | r (- @Scores : (r)_'From <= 720 /\ (r)_'To >= 720

Notice the following:

  • we used @Scores to refer to the Parameter table (as a relational table)
  • after we bound r to elements of the Scores table, we used : to describe selection criteria; this is an alternative to using /\, as we did before.

Sequences

Start by creating a simple sequence literal:

S = {<1,2,3,4,5>}

Sequences are more complex than Sets, therefore the following expression that sums the above sequence will require some explanation:

S_'sum = {+}_{(i,e) (- S} e

Notice the following:

  • we are using the {+} operator similar to the example with Sets
  • the subscript _{(i,e) (- S} binds to the elements of Sequence S
    • the (i,e) part, however, is not as simple as in the case of sets, where we bound a single variable to the elements of the Set.
    • here we destructure each element of S into two distinct variables
      • i- the index of the element, and
      • e- the actual member at index i, i.e. the value
  • then we use e to sum up the elements of the sequence

The mathematical model of a sequence is an association that has numeric keys in the range 1,...,n, for some n. Sequences can also be described as finite functions whose domain is an initial segment of the positive integers. All this means that sequences are functions, associations, and, as such, sets as well. Therefore, if we iterate over the elements of a Sequence with the (- (element of) operator, we get (i,e) (index, member) pairs as elements.

Note

Sets have elements. Sequences have members. However, as sequences are also sets, their elements are pairs consisting of an index and the corresponding member. Therefore the following are all the same sequence: {< a, b, c >}, {{ 1 |-> a, 2 |-> b, 3 |-> c }}, {{ (1,a), (2,b), (3,c) }}.
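The equivalence of the three notations can be mimicked in Python, where `enumerate(..., start=1)` produces the (index, member) pairs. This is an analogue for illustration, not Formulative syntax.

```python
S = [1, 2, 3, 4, 5]

pairs = set(enumerate(S, start=1))     # {(1, 1), (2, 2), ..., (5, 5)}
s_sum = sum(e for (i, e) in pairs)     # destructure each pair, add up members

# One sequence, three views: member list, key-value map, set of pairs.
seq = ["a", "b", "c"]
as_map = {1: "a", 2: "b", 3: "c"}
as_pairs = {(1, "a"), (2, "b"), (3, "c")}

print(s_sum, dict(enumerate(seq, start=1)) == as_map == dict(as_pairs))
```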

Create a subset of the above sequence with only the first two elements:

S_'sub = {< e | i = 1,..., $len S; e = S(i) /\ i < 3 >}

Notice the following:

  • {<...>} is the sequence constructor operator, so the result will be a sequence
  • e will refer to the value of each sequence element (index/value) pair
  • i is bound to a range of indexes from 1 to the length of S, $len S
  • e = S(i) extracts the value from the sequence at position i
  • /\ i < 3 is a condition that tells to select only elements with indexes below 3 (indexes start at 1)
  • this example shows that several simple bindings can be joined with ;.

If you wanted the squares of those first two members of S, you could write this:

S_'subsq = {< e^2 | i = 1,..., $len S; e = S(i) /\ i < 3 >}

Note

Notice above that the expression e^2 is mapped to the binding on the right. This notation makes it possible to construct any sequence whose elements can be written as an expression of the index.

Now that you have seen the hard way to create a subset with the first two elements here is a simpler but less generic way to do the same:

S_'sub2 = ( S )_{< 1,...,2 >}
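A Python analogue of the three sub-sequence formulas (not Formulative syntax) looks like this; `enumerate(..., start=1)` supplies the 1-based index i.

```python
S = [1, 2, 3, 4, 5]

# Generic way: bind index and member, filter on the index.
s_sub = [e for i, e in enumerate(S, start=1) if i < 3]

# Same binding, but the expression e^2 is mapped over it.
s_subsq = [e ** 2 for i, e in enumerate(S, start=1) if i < 3]

# Simpler, less generic way: select by an index range (1,...,2).
s_sub2 = S[0:2]

print(s_sub, s_subsq, s_sub2)  # [1, 2] [1, 4] [1, 2]
```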

Functions

Before going further with map/reduce/fold/sort functions we need to make a little detour into functions. Other than expressions assigning values to variables you can also define functions. Let us start with a simple one calculating the Area of a circle:

f_'AofCircle: %R --> %R, r |--> r^2 $\pi

There is more than one way to define a function. Let’s dissect this one:

  • f_'AofCircle is the name of the function

  • %R --> %R describes that the function will have one argument (a real number, denoted by %R) and will return a real value (%R to the right of the arrow)

  • %R is a set that Formulative calls a “domain”, you can find out more about domains here: Operations on Numeric Sequences

  • r names our single argument (which we previously described to be a real)

    • if we had more arguments we would have put parentheses around them, like: (x,y,z)
  • to the right of the |--> literal we define the body of the function.

To call our function and assign a value to a variable you could write:

A = f_'AofCircle(x)

Where x is an unbound variable which you can set when you evaluate your formula.

Let’s do something with a sequence of real numbers like finding out the average:

avg: $Seq(%R) --> %R, S |--> 1/($len S) * ({+}_{(i,e) (- S} e)

Invoke our new avg function on the first 100 numbers with this formula:

x^^ =  avg({<1,...,100>})

What happens when you have multiple arguments? Here is a function that calculates the area of a rectangle:

f_'AofRect: %R**%R -) (a,b) |--> a b (- %R

Some explanation:

  • this notation is an alternative form of f_'AofRect: %R**%R --> %R, (a,b) |--> a b
  • we used the cartesian product of reals, %R**%R, to define the input parameter domain
  • the -) (“contains as element”) operator described the argument list (a,b)
  • we used the invisible multiplier operator to do the multiplication
  • (- %R describes that the function returns a real number.

Similarly to the invocation of the circle area function use this with unbound variables a and b:

A_'rect = f_'AofRect(a,b)

To check the result, evaluate the formula with a and b specified as input variables and A_'rect as your output variable.
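For orientation, the three functions defined in this section correspond roughly to the Python definitions below. This is an analogue, not Formulative; the type hints play the role of the domains. The average is written as sum divided by length, which is mathematically the same as 1/($len S) times the sum.

```python
import math
from typing import Sequence

def area_of_circle(r: float) -> float:
    return r ** 2 * math.pi

def avg(seq: Sequence[float]) -> float:
    return sum(seq) / len(seq)

def area_of_rect(a: float, b: float) -> float:
    return a * b

x_bar = avg(list(range(1, 101)))   # average of 1, ..., 100
print(x_bar, area_of_rect(3, 4))   # 50.5 12
```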

There are alternative notations to declare functions which you can find here: Functions (Morphisms)

Sequence operations

Now that functions have been introduced, we can return to sequences and do some more operations with them.

Sorting

To begin with let us create a sequence of unordered numbers:

N_'us = {<33,55,22,11,5,324,10>}

Sorting requires a comparator function that takes two parameters (a, b) and returns a %R (real). If the returned value is less than 0, a is sorted to a lower index than b, i.e. a comes first. Our number comparator function is the following:

numCmp: %Z**%Z --> %Z, (a,b) |--> a - b

Since the sequence contains integers, we used the %Z (integers) domain.

This expression will then do the sorting:

N_'s = $sort_numCmp (N_'us)

Note

You could also have sorted this sequence with the default sorter for integers: N_'sdef = $sort_%Z (N_'us)

The expression defines the N_'s variable that will hold the sorted sequence. The subscript _numCmp tells the $sort function which comparator function to use for sorting.
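Comparator-based sorting works the same way in Python, where `functools.cmp_to_key` adapts a two-argument comparator for `sorted`. The sketch below is a Python analogue of numCmp and $sort_numCmp, not Formulative syntax.

```python
from functools import cmp_to_key

def num_cmp(a, b):
    """Negative result: a comes first. Positive result: b comes first."""
    return a - b

n_us = [33, 55, 22, 11, 5, 324, 10]
n_s = sorted(n_us, key=cmp_to_key(num_cmp))

print(n_s)  # [5, 10, 11, 22, 33, 55, 324]
```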

Sorting Sequences of Associations

Let us do something more complex, sort the People sequence according to age. Assuming the S_'people sequence is defined, before moving on, define the domain (or schema) of your Relational Table:

PersonD = $Asc( {{'name}} --> %S,
           {{'age}} --> %N,
           {{'gender}} --> %K )

This declares that the PersonD domain is the set of all associations made up of:

  • a name keyword that maps to a String value %S
  • an age keyword that maps to a Number value %N
  • a gender keyword that maps to keywords %K

With the domain declared we can write a comparator function:

ageCmp: PersonD**PersonD  --> %R, (e_1, e_2)
    |--> (e_1)_'age - (e_2)_'age

Recall from the function introduction that following the function name declaration the Parameter Domains have to be specified. The comparator function used by the sort function is given two elements that it has to compare. The elements will be from the sequence therefore we can reuse the PersonD domain declaration to declare the domains of parameters e_1 and e_2.

Note

Instead of PersonD**PersonD we could have used the shorter form PersonD^2.

The following expression will finally create the sequence containing the people, sorted by age:

S_'peopleSorted =  $sort_ageCmp(S_'people)

Reduce / Fold

Formulative has support for map/reduce/fold algorithms that work on sequences. Working with sequences has shown how mapping works (on any collection). Reduce and Fold, however, require a collection where the order of elements is guaranteed. Sequences provide this guarantee, therefore Reduce and Fold operate only on Sequences.

As a first example let’s see how Reduce works. Create a reduce function that will add the square of the elements in an integer sequence:

sqrRe: %Z**%Z --> %Z, (acc, item)
    |--> acc + item^2

Reduce functions take two arguments:

  • the accumulator acc (initially 0 because we are using the %Z (integer) domain)
  • the current item item

Invoke the reduce function as shown below:

Re_'sqr = $reduce_sqrRe(S)

Note

If you wanted to specify an initial accumulator value e.g. start with -10, you could specify it in the subscript along with the reducer function name: Re_'sqr = $reduce_{sqrRe, -10}(S)
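The same reduction can be tried in Python with `functools.reduce`. This is an analogue, not Formulative. Note that Python's `reduce` uses the first member as the initial accumulator when none is given, so the initial value 0 is passed explicitly here to match the description above.

```python
from functools import reduce

def sqr_re(acc, item):
    return acc + item ** 2

S = [1, 2, 3, 4, 5]

re_sqr = reduce(sqr_re, S, 0)            # start from 0
re_sqr_offset = reduce(sqr_re, S, -10)   # start from -10 instead

print(re_sqr, re_sqr_offset)  # 55 45
```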

A Fold function will be used to count the even numbers in a sequence. The fold function is the following:

cntFld : %R^2 --> %R, (a,e) |--> a + [ e $mod 2 = 0 ]

The full signature of a fold function, (a,e,i,s), has more parameters than the one above (and more than the reduce callback function's):

  • a : the accumulator value (as with reduce)
  • e : the current element in the sequence (as with reduce)
  • i : the optional index of e in the input sequence
  • s : the optional input sequence

In order to do the counting evaluate this expression:

Fld_'cnt = $fold_{ cntFld, 0 } ({<1,...,50>})
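To illustrate the (a,e,i,s) signature, here is a small Python sketch. It is an analogue only: the `fold` helper is made up for illustration and is neither a Python built-in nor Formulative's implementation. It also assumes that the bracket [ e $mod 2 = 0 ] evaluates to 1 when the condition holds and 0 otherwise, which is what counting requires.

```python
def fold(fn, init, seq):
    """Hypothetical helper: call fn(a, e, i, s) for each member, left to right."""
    acc = init
    for i, e in enumerate(seq, start=1):   # indexes start at 1, as in the text
        acc = fn(acc, e, i, seq)
    return acc

def cnt_fld(a, e, i=None, s=None):
    # int(...) plays the role of the bracket: 1 if e is even, else 0
    return a + int(e % 2 == 0)

fld_cnt = fold(cnt_fld, 0, list(range(1, 51)))
print(fld_cnt)  # 25
```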

Matrix and Vector operations

To demonstrate the use of Matrices we are going to solve a simple linear regression, line-fitting problem.

The problem is of the form: Y = A + b X

Where Y is the criterion variable, X is the predictor variable. Variable A is the intercept and b is the slope of the curve.

The Example data is the following:

Linear Regression Example data

Predictor (X)   Criterion (Y)
1.0             1.0
2.0             2.0
3.0             1.3
4.0             3.75
5.0             2.25

In order to find the best fitting line we will use linear transformations. The formula we are going to use is the normal equation:

\varphi = (X^T X)^{-1} X^T Y

In order to implement our problem, let’s create the design matrix X (with x_0 = 1):

X = [ 1 # 1 ##
    1 # 2 ##
    1 # 3 ##
    1 # 4 ##
    1 # 5]

Then the vector for Y the criterion variable:

Y = [1, 2, 1.3, 3.75, 2.25]

The formula below implements the pseudo-inverse calculation (X^T X)^{-1} X^T:

X_'pinv= ($transpose(X) * X)^{-1} * $transpose(X)

Finally we get the parameter vector with the multiplication below. (The last two formulas could have been kept together; they are separated to make the calculation of the pseudo-inverse more explicit.)

\varphi = X_'pinv * Y

When you evaluate the expressions you should get the parameter vector \varphi = (0.785, 0.425) (precision cut in display).

Therefore filling the parameters into the equation, we got to this solution:

Y = 0.785 + 0.425 X
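If you would like to cross-check these numbers outside Formulative, the normal equation can be reproduced in a few lines of plain Python. This is an independent sketch, not how Formulative computes it; the helpers `transpose`, `matmul` and `inverse_2x2` are written out only to keep the example self-contained.

```python
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0], [1.0, 5.0]]
Y = [1.0, 2.0, 1.3, 3.75, 2.25]

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def inverse_2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

Xt = transpose(X)
pinv = matmul(inverse_2x2(matmul(Xt, X)), Xt)               # (X^T X)^{-1} X^T
phi = [sum(p * y for p, y in zip(row, Y)) for row in pinv]  # pinv * Y

print([round(v, 3) for v in phi])  # [0.785, 0.425]
```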

Calculating the standard error

Finish this part by computing the standard error of the estimate based on errors of prediction.

Example data with Prediction Errors

      Predictor (X)   Criterion (Y)   Prediction (Y’)   Error (Y - Y’)   Error Sqr (Y - Y’)^2
      1.00            1.00            1.210             -0.210           0.044
      2.00            2.00            1.635             0.365            0.133
      3.00            1.30            2.060             -0.760           0.578
      4.00            3.75            2.485             1.265            1.600
      5.00            2.25            2.910             -0.660           0.436
Sum   15.00           10.30           10.30             0.000            2.791

The standard error of the estimate is a measure of the accuracy of predictions. Our regression line is the line that minimizes the sum of squared deviations of prediction (also called the sum of squares error). The standard error of the estimate is defined below:

\sigma_{pred} = \sqrt{ \frac{ \sum_{i=1}^{N} Err_i }{ N } }, where Err_i is (Y_i - Y'_i)^2.

To perform this calculation we need to do the following:

  • implement the linear function formula to call to get the predicted results
  • create a sequence holding tuples of reals, one for X, Y, Y’, Y - Y’ and (Y - Y’)^2
  • do a fold on the above sequence to sum the (Y - Y’)^2 values
  • calculate the standard error according to the above formula.

The linear function implementation requires the following 3 expressions:

A = (\varphi)_1
b = (\varphi)_2
f_'lin : %R --> %R, x |--> A + b x

Note

We could also have used (\varphi)_1 and (\varphi)_2 directly in the linear function: f_{lin} : \mathbb{R} \to \mathbb{R}, x \mapsto \varphi_1 + \varphi_2 x.

The sequence holding tuples of X, Y, Y’, Y - Y’ and (Y - Y’)^2:

Pred = {< (x, y, y', y - y', (y - y')^2) | i = 1, ..., $size Y; x = ( X )_{i,2}; y = (Y)_i; y' = f_'lin(x) >}

Some explanation:

  • the (x, y, y', y - y', (y - y')^2) part will create a tuple with the values calculated in the binding to the right of |
  • i = 1, ..., $size Y creates an iterator over the indexes of the Y vector (size)
  • x = ( X )_{i,2} gets the element in row i, column 2 of the X (design) matrix, i.e. the predictor variable at index i
  • y = (Y)_i the criterion value at index i
  • y' = f_'lin(x) calls our linear function with the predictor variable to get the predicted value Y’

The fold function that sums the (Y - Y’)^2 values (the 5th element in the tuples):

fldPredErr: %R ** %R^5 --> %R, (acc, item)
   |--> acc + (item)_5

The calculation of the standard error given the above constructs:

\sigma_'pred = \/~{$fold_{fldPredErr, 0} (Pred) / $size Y}

The above formula will do the fold on the Pred sequence variable and calculate the square root of the mean of squared errors. It is rendered like this:

\sigma_{pred} = \sqrt{ \frac{ fold_{fldPredErr, 0}(Pred) }{ size\ Y } }
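The whole standard-error calculation can be cross-checked with the short Python sketch below. It is an analogue of the Formulative formulas, not a replacement for them, and it takes the parameter values 0.785 and 0.425 from the result above rather than recomputing them.

```python
import math

X = [1.0, 2.0, 3.0, 4.0, 5.0]      # predictor column of the design matrix
Y = [1.0, 2.0, 1.3, 3.75, 2.25]
A, b = 0.785, 0.425                # intercept and slope found earlier

def f_lin(x):
    return A + b * x

# One tuple per data point: (x, y, y', y - y', (y - y')^2)
pred = [(x, y, f_lin(x), y - f_lin(x), (y - f_lin(x)) ** 2)
        for x, y in zip(X, Y)]

sum_sq_err = sum(item[4] for item in pred)      # fold over the 5th element
sigma_pred = math.sqrt(sum_sq_err / len(Y))

print(round(sum_sq_err, 3), round(sigma_pred, 3))
```

The sum of squared errors agrees with the 2.791 in the table, and the standard error of the estimate comes out at roughly 0.747.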

Finally, the original article for this example is available from onlinestatbook.