Data analytics and Machine Learning are increasingly widespread and important.
But there is a large gap between the mathematics and the programming. Reducing that gap can make those engaged in mathematical programming more productive.
Array programming originated in the early 1960s with Iverson’s Turing-award winning APL. It was two generations ahead of its time.
Dot product is defined as
, where , are -vectors.
In a traditional imperative programming language, we'd implement the above definition via something like:
c := 0;
for i := 1 to n { c := c + a[i]*b[i];};
In ShapeRank, we write
where and are given in the program below
Recent research shows how we can augment array programming with static typing, making it more applicable to GPUs, TPUs and other accelerators. However, that work has remained largely theoretical; it has been unclear how to incorporate it into a practical design.
ShapeRank extends the ideas of APL to streams, in particular, multi-dimensional streams, which we call hyperstreams.
We’ve seen a flowering of “Notebook” systems - Mathematica, Matlab, Jupyter, Observable ….
However, Notebooks have a long way to go. We seek to integrate advanced statically typed programming with highly interactive live programming and user customizable visualization and tooling
Ultimately, we want something like Medium meets Arkiv meets Mathematica - an open tool for both research and development that advances the state of the art and gives users of mathematical programming tools for development, collaboration and dissemination.
Scalars: basic values like numbers, strings, structs
Hyperstreams: Multi-dimensional streams, e.g.,
Written as
Every hyperstream has a shape and a rank.
The shape is a vector of numbers indicating the size of each dimension.
The rank is the number of dimensions.
Scalars are hyperstreams too!
The shape of a scalar is the empty vector []. The rank of a scalar is 0
Implicitly, .
Implicitly, .
Equivalent to
Implicitly, .
Functional, statically typed
Types, including shapes, are inferred, except for external functions and streams passed to the program from the outside world
Formal parameters require explicit rank annotations, but no other shape or type information.
A rank annotation is either where each pair of square brackets denotes one dimension, or , denoting arbitrary rank
Hyperstreams can have the unbounded dimension, , denoting a stream of indeterminate (and potentially infinite) length.
Types statically capture the shape of the data - its rank and the dimensions at each rank, as well as the type of the individual elements of a stream.
Concretely, types consist of
Type annotations are used only in the program header, where they indicate the types of program parameters.
A type annotation consists of a shape annotation followed by a base type annotation. The shape annotation may be omitted if the type denotes a scalar hyperstream.
Shape annotations are either
Dimensions are given between brackets. A dimension may be:
A dimension variable or sum can be used as a type; it is shorthand for the integer subset it denotes.
A base type annotation is either an identifier denoting an predefined type or a type variable, or a record type.
is the infix range operator, which produces a vector whose elements are the integers between its two arguments (inclusive).
A shape variable or append can be used as a type; It matches specially marked integer vectors that specify a shape
In some cases, the shape of a hyperstream cannot be captured statically, e.g.:
The dimension of the result is inferred to be . A dynamically checked cast must be used to use it where a dimension other than is expected.
where
Examples:
The standard description of a linear classification model is
Prior to training the model, a loss function must be defined. A common formulation is
Using this loss function, training is defined via gradient descent, starting with initial parameters for the model and the training data
We can now train the model
We then reduce over an epoch, which consists of batches :
We can now test the accuracy of the trained model thus
Complete Implementation
High performance Compiler
Improve Tooling