ShapeRank
An Advanced Programming System for Machine Learning, Data Analytics and Reactive Programming
Gilad Bracha

1. Motivation

1.1. Mathematical Programming is on the Rise

1.2. Array Programming

Array programming originated in the early 1960s with Iverson's Turing Award-winning APL. It was two generations ahead of its time.

1.3. Example: Dot Product

Dot product is defined as

$\sum_{i=1}^{n} a_i \cdot b_i$, where $a$ and $b$ are $n$-vectors.

In a traditional imperative programming language, we'd implement the above definition via something like:

c := 0;
for i := 1 to n { c := c + a[i] * b[i]; }

In ShapeRank, we write

$reduce(+, a*b)$

where $a$ and $b$ are given in the program below
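For comparison, here is the same computation in Python with NumPy (an analogy only, not ShapeRank); the vectorized form plays the role of $reduce(+, a*b)$:

import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

# Imperative style: accumulate element by element.
c = 0.0
for i in range(len(a)):
    c = c + a[i] * b[i]

# Array style: pointwise multiply, then sum-reduce, as in reduce(+, a*b).
assert c == np.sum(a * b)  # both yield 32.0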

1.4. Advantages

1.5. Array Programming, Statically Typed

Recent research shows how we can augment array programming with static typing, making it more applicable to GPUs, TPUs and other accelerators. However, that work has remained largely theoretical; it has been unclear how to incorporate it into a practical design.

1.6. Hyperstream Programming

ShapeRank extends the ideas of APL to streams, in particular, multi-dimensional streams, which we call hyperstreams.

1.7. More than Just a Programming Language

We’ve seen a flowering of “Notebook” systems - Mathematica, Matlab, Jupyter, Observable, and others.

However, Notebooks have a long way to go. We seek to integrate advanced statically typed programming with highly interactive live programming and user-customizable visualization and tooling.

Ultimately, we want something like Medium meets arXiv meets Mathematica - an open tool for both research and development that advances the state of the art and gives users of mathematical programming tools for development, collaboration and dissemination.

2. Examples

2.1. Concepts

Scalars: basic values like numbers, strings, structs

Hyperstreams: Multi-dimensional streams, e.g.,

$\begin{bmatrix}2 & 4 & 8 \\16 & 32 & 64 \end{bmatrix}$

Written as $[[2, 4, 8], [16, 32, 64]]$.

Every hyperstream has a shape and a rank.

The shape is a vector of numbers indicating the size of each dimension.

The rank is the number of dimensions.

$\forall a:rank(a) = length(shape(a))$

Scalars are hyperstreams too!

The shape of a scalar is the empty vector []. The rank of a scalar is 0
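NumPy's conventions happen to match and can serve as a quick mental model (Python, not ShapeRank):

import numpy as np

m = np.array([[2, 4, 8], [16, 32, 64]])
print(m.shape)   # (2, 3): two rows, three columns
print(m.ndim)    # 2: the rank is the length of the shape

s = np.array(5)  # a scalar
print(s.shape)   # (): the empty shape
print(s.ndim)    # 0: scalars have rank 0

# The invariant rank(a) == length(shape(a)) holds throughout.
assert m.ndim == len(m.shape) and s.ndim == len(s.shape)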

2.2. Example : Extending Operations Pointwise

$a = [1, 2, 3]$

$-a$ evaluates to $[-1, -2, -3]$.

Implicitly, $map(-, a)$.

2.3. Example 2: Extending N-ary Operations Pointwise

$v_1 = [10, 20, 30]$

$v_2 = [100, 200, 300]$

$v_1 + v_2$ evaluates to $[110, 220, 330]$.

Implicitly, $map(+, zip(v_1, v_2))$.

2.4. Example 3: Replication

$a = [1, 2, 3]$

$2 * a$ evaluates to $[2, 4, 6]$. The scalar $2$ is replicated to match the shape of $a$, making this equivalent to $[2, 2, 2] * a$.

Implicitly, $map(*, zip(replicate(2, [3]), a))$.
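NumPy's broadcasting is a close analogue of all three examples above (Python, not ShapeRank):

import numpy as np

a = np.array([1, 2, 3])
v1 = np.array([10, 20, 30])
v2 = np.array([100, 200, 300])

print(-a)       # [-1 -2 -3]: unary ops extend pointwise, like map(-, a)
print(v1 + v2)  # [110 220 330]: n-ary ops extend pointwise, like map(+, zip(v1, v2))
print(2 * a)    # [2 4 6]: the scalar 2 is replicated to [2, 2, 2] first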

2.5. Example 4: Arbitrary Rank

3. Language Overview

3.1. Functions

$\bf{func}$ $+(a, b)$

$\bf{func}$ $sum(x: []) = reduce(+, 0, x)$

$\bf{func}$ $append(a: [], b: [])$

$\bf{func}$ $contains(v: ?, x)$

3.2. Streams

Hyperstreams can have the unbounded dimension, $?$, denoting a stream of indeterminate (and potentially infinite) length.
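A Python generator is a rough, lossy analogy for the unbounded dimension; it captures only the "consume a finite prefix on demand" aspect of a $?$-length stream:

import itertools

def naturals():
    # An unbounded stream: in ShapeRank terms, its length is ?.
    n = 0
    while True:
        yield n
        n += 1

# Operations extend pointwise over the stream; consumers take a finite prefix.
doubled = (2 * x for x in naturals())
print(list(itertools.islice(doubled, 5)))  # [0, 2, 4, 6, 8]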

4. Types

Types statically capture the shape of the data - its rank and the size of each dimension - as well as the type of the individual elements of a stream.

Concretely, types consist of a shape, which gives the size of each dimension, and a base type, which describes the individual elements.

4.1. Type Annotations

Type annotations are used only in the program header, where they indicate the types of program parameters.

A type annotation consists of a shape annotation followed by a base type annotation. The shape annotation may be omitted if the type denotes a scalar hyperstream.

A dimension variable or sum can be used as a type; it is shorthand for the integer subset it denotes.

A base type annotation is either an identifier denoting a predefined type, a type variable, or a record type.

4.1.1. Example:

$\bf func$ $..(start, end)$

is the infix range operator, which produces a vector whose elements are the integers between its two arguments (inclusive).
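For example, $1..4$ evaluates to $[1, 2, 3, 4]$.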

A shape variable or append can be used as a type; it matches specially marked integer vectors that specify a shape.

4.1.2. Example:

${\bf func}\hbox{ }makeTensor(dims: [], contents: [])$

$makeTensor([2, 3], [1, 2, 3, 4, 5, 6])$
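A rough NumPy analogue of this call (Python, not ShapeRank) is reshaping a flat vector:

import numpy as np

# makeTensor([2, 3], [1, 2, 3, 4, 5, 6]) corresponds roughly to:
t = np.array([1, 2, 3, 4, 5, 6]).reshape(2, 3)
print(t)  # [[1 2 3]
          #  [4 5 6]]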

4.1.3. Example:

In some cases, the shape of a hyperstream cannot be captured statically, e.g., when the length of a result depends on runtime data.

The dimension of the result is then inferred to be $?$. A dynamically checked cast must be used before the result can be used where a dimension other than $?$ is expected.
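As a sketch of what such a checked cast does at run time, here is a hypothetical Python helper (the name and behavior are illustrative, not ShapeRank's API):

import numpy as np

def cast_to_length(v, n):
    # Dynamically checked "cast" from unknown length (?) to a fixed length n.
    if v.shape != (n,):
        raise TypeError(f"expected shape ({n},), got {v.shape}")
    return v

v = np.array([0, 2, 4])     # statically, think of its length as ?
w = cast_to_length(v, 3)    # succeeds: the runtime shape matches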

5. Machine Learning

5.1. Some Common Functions

${\bf func}\hbox{ }boolToFloat(b) = if\hbox{ }b\hbox{ }then\hbox{ }1.0\hbox{ }else\hbox{ }0.0$

${\bf func}\hbox{ }mean(x: []) = sum(x)/length(x)$

${\bf func}\hbox{ }softmax(x: []) = e^x / sum(e^x)$

${\bf func}\hbox{ }cross\_entropy(y: [], y\_prime: []) = -sum(y\_prime * log(y))$
${\bf func}\hbox{ }argmax(v: []) = fold(maxEntry, [-infinity, 0], pair(v, indexes(v)))[2]$

where

${\bf func}\hbox{ }maxEntry(a: [2]Num, b: [2]Num) = \{if\hbox{ }a[1] > b[1]\hbox{ }then\hbox{ }a\hbox{ }else\hbox{ }b\}$

Examples:
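As a point of comparison, a minimal NumPy rendering of the functions above might look as follows (Python, not ShapeRank; names mirror the definitions, with zero-based indexing in place of the one-based indexing used above):

import numpy as np

def boolToFloat(b):
    return 1.0 if b else 0.0

def mean(x):
    return np.sum(x) / len(x)

def softmax(x):
    p = np.exp(x)
    return p / np.sum(p)            # e^x / sum(e^x)

def cross_entropy(y, y_prime):
    return -np.sum(y_prime * np.log(y))

def argmax(v):
    # fold over (value, index) pairs, keeping the entry with the larger value
    best = (-np.inf, 0)
    for entry in zip(v, range(len(v))):
        best = entry if entry[0] > best[0] else best
    return best[1]

print(softmax(np.array([1.0, 2.0, 3.0])))  # components sum to 1
print(argmax(np.array([0.1, 0.9, 0.4])))   # 1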

5.2. Linear Classification

The standard description of a linear classification model is

${\bf func}\hbox{ }model(params, x: []) = softmax(matmul(params.W, x) + params.b)$

Training

Prior to training the model, a loss function must be defined. A common formulation is

${\bf func}\hbox{ }loss(params, data: []) = mean(cross\_entropy(data.yp, model(params, data.x)))$

Using this loss function, training is defined via gradient descent, starting with initial parameters for the model and the training data

${\bf func}\hbox{ }train(params,data: []) = gradientDescent(0.05, loss, params, data)$

We can now train the model

${\bf func}\hbox{ }trainedParams = fold(train, initParams, trainingSet)$

We then reduce over an epoch, which consists of $bn$ batches.

Testing and Inference

We can now test the accuracy of the trained model thus

${\bf func}\hbox{ }infer(x: []) = argmax(model(trainedParams, x))$

$infer([[0.25, 0.65], [0.2, 0.9], [0.1, 0.1], [0.7, 0]])$
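To make the whole pipeline concrete, here is a minimal NumPy sketch of the same steps (Python, not ShapeRank): a softmax-regression model, a mean cross-entropy loss, a hand-written gradient-descent step standing in for $gradientDescent$, and inference. The dataset labels and hyperparameters are illustrative.

import numpy as np

def softmax(x):
    p = np.exp(x - np.max(x))  # subtract max for numerical stability
    return p / np.sum(p)

def model(W, b, x):
    # softmax(matmul(W, x) + b), as in the ShapeRank definition above
    return softmax(W @ x + b)

def loss(W, b, xs, ys):
    # mean cross-entropy over the batch
    return np.mean([-np.sum(y * np.log(model(W, b, x))) for x, y in zip(xs, ys)])

def train_step(W, b, xs, ys, lr=0.05):
    # One gradient-descent step. For softmax + cross-entropy, the gradient
    # of the loss with respect to the logits is (prediction - target).
    dW, db = np.zeros_like(W), np.zeros_like(b)
    for x, y in zip(xs, ys):
        err = model(W, b, x) - y
        dW += np.outer(err, x)
        db += err
    return W - lr * dW / len(xs), b - lr * db / len(xs)

# Tiny illustrative dataset: 2 features, 2 classes, one-hot targets.
xs = np.array([[0.25, 0.65], [0.2, 0.9], [0.1, 0.1], [0.7, 0.0]])
ys = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)

W, b = np.zeros((2, 2)), np.zeros(2)
print(loss(W, b, xs, ys))          # loss before training
for _ in range(200):               # plays the role of fold(train, initParams, trainingSet)
    W, b = train_step(W, b, xs, ys)
print(loss(W, b, xs, ys))          # loss after training: should be lower

def infer(x):
    return int(np.argmax(model(W, b, x)))

print([infer(x) for x in xs])      # predicted class index for each input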

6. Next Steps

Complete Implementation

High-Performance Compiler

Improve Tooling