Types

Mars is a strict statically typed language. Its type system is based on the parametric Hindley-Milner system of Haskell (and before it, ML). All variables and expressions have a known type at compile time.

In the traditional sense, a data type is a set of possible values. Data types in Mars are exclusive – every value belongs to exactly one type. Int and Array(Int) are examples of data types.

Mars also features type constructors, which are types with parameters. Array is an example of a type constructor. A type constructor is still a type – it is said to be “uninhabited”.

Kinds

To distinguish between different types of types, Mars has what is known as a “kind system”. This is similar to Haskell’s kind system, with two important differences.

Firstly, type constructors are n-ary, not unary. This means types take all of their arguments at the same time, rather than implicitly currying as they do in Haskell (so in Mars, T(a,b) is not equivalent to T(a)(b), whereas in Haskell, it is).

Secondly, type constructors in Mars may be variadic. This means that a type may take an arbitrary number of arguments, similar to a “var-args” function in many programming languages.

For a general overview of kinds, see Kind (type theory) on Wikipedia.

Kind values take one of three forms:

  1. * is the kind of all data types.
  2. (k1,k2,...,kn) -> k is the kind of a type constructor, which takes exactly n types of kind k1, k2, ..., kn, respectively, and produces a type of kind k.
  3. (k1,k2,...,kn,**) -> k is the kind of a type constructor, which takes n or more types, and produces a type of kind k. The first n types must be of kind k1, k2, ..., kn, respectively, and any additional arguments must be of kind *.

The kind of a type is used to distinguish between type constructors and data types, and (for type constructors) the number and kind of their arguments.

Kinds are not directly visible in Mars. They are not specified in source code, hence there is no syntax for them. They are merely a behind-the-scenes mechanic, which is automatically computed for each type. Note that it is possible to display the kind of a type [1].

Note

While this general definition of the kind system allows type constructors with non-star argument and return kinds, the Mars language provides no way to create such types, nor are there any such built-in types. Hence, an implementation may simply represent type constructors in the less general form of an (arity :: Int, isvariadic :: Bool) pair.

Type Values

Type values have the following syntax:

type          ::=  function_type | primary_type
function_type ::=  "(" [type_list] ")" "->" type
                   | primary_type "->" type
primary_type  ::=  atom_type
                    | primary_type "(" [type_list] ["," "..."] ")"
atom_type     ::=  type_variable | type_name | "->"
type_list     ::=  type ("," type)*
type_variable ::=  lower_identifier
type_name     ::=  upper_identifier

Type values take one of the following forms:

  1. A type name is an identifier beginning with an uppercase letter, and references a user-defined or built-in type constructor. Its kind is the kind of the referenced type constructor.
  2. A type variable is an identifier beginning with a lowercase letter, and represents an unknown type. Its kind is inferred from the context.
  3. The token -> is the special builtin type -> (function type constructor). Its kind is (*, **) -> *. Note that this form is rarely used.
  4. Type application, of the form t(t1,t2,...,tn), or partial type application, of the form t(t1,t2,...,tn,"...") (where the final ... is a literal “...” token). See Type application.
  5. A function type, of the form (t1,t2,...,tn) -> t is syntactic sugar for the type ->(t1,t2,...,tn,t). It represents a function which takes n arguments, t1 through to tn, as input, and returns a t. Its kind is always *. The -> operator is right-associative. Parentheses around the argument type are optional for functions of exactly one argument.

Warning

The current Mars implementation does not support the ... notation. It provides no way to perform partial type application. In light of the fact that type constructors cannot accept or produce non-star types, it is a useless feature.

Type application

Type application is the application of zero or more type arguments to a type constructor, producing a new type. Typically, type constructors have kind (*,*,...,*) -> *, and each type argument has kind *, resulting in a new type of kind *. However, this is not the most general case.

In general, type application takes the form t(t1, t2, ..., tn [, "..."]). If the application ends with “...”, it is said to be curried. Type t must have kind (k1, k2, ..., km [, **]) -> k, or the type application is a kind error, and the compiler must reject the program. If the argument kinds of t end with “**”, t is said to be variadic.

It must hold that n == m, with the following exceptions:

  • If the application is curried, it must hold that n <= m.
  • If t is variadic, it must hold that n >= m.
  • If both the application is curried, and t is variadic, n may be any size.

For each ti, where 1 <= i <= min(n, m), ti must have kind ki. Any additional arguments must have kind *.

If the application is not curried, it is considered complete. The resulting type has kind k.

If the application is curried, additional arguments are required. The resulting type has kind (kn+1, ..., km [, **]) -> k. (That is, it is a new type constructor, accepting all remaining arguments after the first n).

Examples of Type Values

  • Int, a built-in type of kind *.
  • List, a type defined in the prelude, of kind * -> * (that is, requiring one type argument).
  • List(Int), the type Int applied to type List, producing a type of kind *, which is a data type.
  • ->, a special built-in type of kind (*, **) -> *.
  • ->(Int, Int), two arguments, both Int, applied to type ->, producing a type of kind *.
    • A function which accepts an Int, and produces an Int.
  • Int -> Int, the same as above, with a more natural syntax.
  • ->(Int, ...), one argument of type Int, applied to type ->, and curried, producing a type of kind (**) -> *.
  • a, a type variable, of unknown kind (requires context).
  • a -> Int, a function type which accepts an argument of any type, and produces an Int.
  • ->(a, b, c), three arguments applied to type ->, producing a type of kind *.
    • A function which accepts two arguments, of types a and b, and produces a c.
  • (a, b) -> c, the same as above, with a more natural syntax.
  • a -> b -> c, a function which accepts an a, and produces a function which accepts a b, and produces a c.
    • This is a classic “curried style” function, as may be found in Haskell.
    • This type may also be written as ->(a, ->(b, c)).

Type unification

Unification is the main algorithm Mars uses for type checking and type inference. For a general overview of unification, see Unification on Wikipedia.

When two or more values are expected to have the same type, their types are unified. The specification explicitly states when types should be unified. If the two types are the same, they successfully unify. Otherwise, they fail to unify, and the program must be rejected (due to a type error).

For example, the type Int will unify with the type Int, but fail to unify with the type Array(Int).

This is complicated by type variables. Each type variable is either free, bound or rigid.

  • A free type variable a will successfully unify with any type t, but once unified, a is bound to t for the entirety of the function.
  • A bound type variable a, bound to type t, will unify with s if and only if t unifies with s.
  • A rigid type variable a is never bound, and unifies only with a.

For example, the type a (if a is a free type variable) will unify with the type Int, but result in a being bound to Int. In future unifications within the same function, a will unify only with Int.

Type variables explicitly named in the header or body of a procedure are rigid, and will not unify with any type other than themselves. For example, this code is invalid:

def to_int(x :: a) :: Int:
    return x                # Type error

It is invalid because the header specifies that the caller may pass an argument of non-Int type, but in that case, it won’t be able to return an Int. Thus the type variable a is rigid; the procedure is only valid if a is not unified with any other type.

Non-rigid type variables are introduced by implicitly-typed variables (for type inference) or by expressions with polymorphic types (such as the empty array literal, or global variables and functions with polymorphic types). For example:

def singleton(x :: Int) :: Array(Int):
    v = []
    return array_add(v, x)

In this example, the variable v is given the type Array(a), with free variable a, when it is first assigned. During the call to array_add, a is unified with Int, so v has type Array(Int). It would be a type error to treat v as a different type elsewhere, even though it hasn’t got any data in it (once its type is bound, it is bound for the entire body of the function; type variables in Mars are monomorphic), as in the following example:

def monomorphic(x :: Int) :: Array(Int):
    v = []
    r = array_add(v, x)
    w = array_add(v, [1])   # Type error
    return r

The unification rules are as follows (note all rules are symmetric):

  • Type variable a unifies with type t with the rules as above (regardless of whether t is a type name, a type variable or a type application). If successful, this results in variable a becoming bound.
  • Type name x unifies with type name x, and no other type name.
  • Type names do not unify with type applications.
  • Type application t ( t1, ..., tn ) unifies with type application s ( s1, ..., sn ) if and only if t unifies with s and ti unifies with si for all i <= n. Any bindings made in the recursive unifications apply.

Polymorphism in global constants

The example above shows that type variables in Mars are monomorphic. This is true only for local variables. As a special rule, global constants (including functions and data constructors) in Mars are polymorphic, meaning they can be given a different binding upon each use.

The type of each global constant is “generalised” by taking each type variable in its original type, and universally quantifying it. A variable of type t containing type variables a1, ..., an is generalised as “∀ a1, ..., an. t

A successful unification of a type with a universal quantification will not cause the quantified variables to become bound, so they may be unified again as a free variable.

The example above can be “fixed” by making v a global constant:

def v :: Array(a) = []

def monomorphic(x :: Int) :: Array(Int):
    r = array_add(v, x)
    w = array_add(v, [1])
    return r

Now v has type ∀a. Array(a). The first unification between ∀a. Array(a) and Array(Int) succeeds without binding a. Thus, the second unification between ∀a. Array(a) and Array(Array(Int)) also succeeds.

The polymorphic / monomorphic distinction is important when dealing with polymorphic functions. Consider:

def twomaps(f :: a -> b, g :: a -> c, x :: Array(a)) :: Pair(Array(b), Array(c)):
    y = array_map(f, x)
    z = array_map(g, x)
    return Pair(y, z)

Contrast with:

def twomaps(f :: a -> b, g :: a -> c, x :: Array(a)) :: Pair(Array(b), Array(c)):
    mymap = array_map
    y = mymap(f, x)
    z = mymap(g, x)     # Type error
    return Pair(y, z)

In the former example, array_map is called twice. Its type is ∀t. ∀u. (t -> u, Array(t)) -> Array(u). On the first call, its first argument is unified with a -> b; hence it is treated as though it has type (a -> b, Array(a)) -> Array(b), binding y‘s type to Array(b). However, this unification does not change the type of array_map. On the second call, its first argument is unified with a -> c; hence it is treated as though it has type (a -> c, Array(a)) -> Array(c), binding z‘s type to Array(c). The two uses of array_map treat it as though it has two different, incompatible types, which is valid because the function itself is polymorphic.

In the latter example, array_map is used only once, to assign to the local (monomorphic) variable mymap. This gives mymap the type (t -> u, Array(t)) -> Array(u) (note the lack of universal quantification – the free type variables t and u are scoped to the whole function). On the first call to mymap, its first argument is unified with a -> b; hence free variable t is bound to rigid variable a and free variable u is bound to rigid variable b. This permanently changes the type of mymap to (a -> b, Array(a)) -> Array(b). It is as if the function contained the declaration:

var mymap :: (a -> b, Array(a)) -> Array(b)

Therefore, on the second call to mymap, its first argument of type a -> b is unified with a -> c, and the unification of rigid variables b and c fails, producing a type error.

The bottom line is that once a function (or any other value) is assigned to a local variable, it has a fixed type, even if it is not explicitly declared.

Built-In Types

Mars defines three built-in types, which could not be written in the language itself.

type Int

Has kind *.

Values of this type are arbitrary-precision integers. Positive and negative integers of any magnitude are members of this type.

This type is special because it is not possible to declare user-defined types with arbitrarily-many elements, nor with the special integer literal syntax.

Integer literals form the data constructors for this type. Displaying values of this type results in integer literal form.

type Array(a)

Has kind * -> *.

Provides extensible arrays (sometimes known as “vectors”) which may be accessed and updated in constant time.

Warning

Constant-time array updates are currently only possible by importing the impure library module, and are considered to be an unofficial part of the language.

This type is special because it is not possible to declare user-defined types with the same performance characteristics as a true array.

Array literals may be used to build array values. Built-in functions are also available to create and manipulate arrays. Displaying values of this type results in array literal form.

type ->

Has kind (*, **) -> *.

The function type constructor. Values of type a -> b are functions which accept an argument of type a, and produce a result of type b.

Functions may only be created by being declared at the top level, or through currying.

The display representation of a function is implementation-defined. It is not mathematically valid to print out anything, but implementations may print out information about the function object, to be helpful. The value should not be used in computations.

Type Declarations

Type declarations have the following syntax:

typedef     ::=  "type" type_name [type_params] ":" NEWLINE
                     INDENT constructor+ DEDENT
type_params ::=  "(" [type_variable ("," type_variable)*] ")"
constructor ::=  ctor_name [ctor_params]
ctor_params ::=  "(" [ctor_param ("," ctor_param)*] ")"
ctor_param  ::=  [param_name "::"] type
ctor_name   ::=  upper_identifier
param_name  ::=  lower_identifier

Use of the typedef production will declare a new globally-available type name, as well as a number of globally-available constructor function names. Types must have a name (an identifier beginning with an uppercase letter), and at least one constructor.

A type may optionally have zero or more type parameters. If a type does not have parameters, its kind is *. If a type has n parameters, its kind is (*1, *2, ..., *n) -> *. Any parameter variables may be used in the types of the constructor parameters.

This statement only creates type constructors of the form (*,*,...,*) -> *. It is not possible to create a user-defined type constructor which accepts or returns higher-kinded types, or variadic type constructors.

Note

Declaring a type with empty parentheses (eg. type Foo()) is possible in this grammar, but not something you should ever want to do.

It creates a type constructor, of kind () -> *, which accepts zero arguments. This type cannot be used as a data type without accompanying parentheses wherever it is referred to.

This is allowed in Mars, for more transparency in the kind system, but in practice, all nullary types should be declared without parentheses (eg. type Foo), giving them the kind *.

Types have one or more constructors. Each constructor has a globally-unique name, and zero or more parameters. The constructor declares a globally-available function of the same name, which takes the given arguments as inputs, and returns a value of the type being declared. Each constructor parameter may optionally have a name. No two parameters of a constructor may have the same name. The same parameter name may appear in multiple constructors, but it MUST have the same type in all constructors.

For example, the following type is illegal, as a constructor has two parameters with the same name:

type Foo:
    X(v :: Int, v :: Int)               # Error: Duplicate field name

The following type is valid, as the name v appears in multiple constructors, but has the same type in all instances:

type Foo:
    X(u :: Int, v :: Int)
    Y(v :: Int)

The following type is illegal, as the name v appears in multiple constructors with differing types:

type Foo:
    A(u :: Int, v :: Int)
    B(v :: Array(Int))                  # Error: Duplicate field name

Note

A constructor without parameters is, in effect, declaring a global constant of this type. A type is an “enumeration type” if all of its constructors are nullary, and as such, it simply declares a fixed-size set of named constants.

Note that, just as with the type itself, a constructor without parameters is distinct from a constructor with 0 parameters. Constructors with 0 parameters require an empty function application, and are typically not useful.

Footnotes

[1]The Mars interpreter will print out the kind of any type, using the :k command.

Table Of Contents

Previous topic

Lexical analysis

Next topic

Values

This Page