Parametrically Polymorphic Datatype Definition
Summary
This do***ent describes a simple mechanism that extends the
well-known monomorphic datatype definition into a parametrically
polymorphic one.
After short motivational discussion the concepts of a polymorphic
datatype and of a polymorphic type-case deconstructor are presented,
followed by two skeletal examples of a generic tree and its reduction
to a concrete integer tree.
The reference implementation of polymorphic datatype definition,
type-case de-constructor and fully working examples of trees are
embedded within modules produced by my-module framework, which has
been
presented previously on com.lang.scheme forum. The modularized version
serves the dual purpose: it is used in my-module environment on its
own
rights and it helps testing it. The implementation has been tested and
works ****tably on Mzscheme and Petite Chez Scheme platforms.
Nevertheless, this code can be re-implemented on any platform, with
or without modularization. A list of few required changes is shown
in section 5.
Irrespectively of the modularity issue the polymorphic definitions
maintain high degree of separation from the rest of environment. There
are no references to a global registry, user types may be defined
locally and the type constructors and the type predicates are only
exposed when needed.
It is worth to note that a somewhat similar, although more complex,
problem of type cl***** has been handled in [4], from which I have
borrowed the idea of constructor dispatching.
Contents:
1. The motivation
2. The concept
3. A sketch of modularized Tree
4. A sketch of modularized Integer-Tree
5. ****ting issues
Implementation:
1. module-datatype
2. module-tree
3. module-integer-tree
1. The motivation
Many wonderful ideas, which are presented in EOPL[1] and PLAI[2]
books,
are centered around define-datatype, a.k.a define-type construct. In
essence, the construct lets us define our own recursive datatypes with
variants, which can be furnished with type constraints, and whose
roles
are:
+ clarification of datatype definition
+ runtime contract verification
+ static type checking if a type-checker is available
and is employed, as shown in [3], and other papers.
For example, the following recursive definition of datatype Tree
(define-type Tree
[leaf (datum number?)]
[branch (left Tree?) (right Tree?)])
declares a binary tree structure, with numerical data stored in its
leaves. Verification of data is performed during the tree
construction;
an error is signaled when a datum fails to pass the predicate test
(number? datum).
Accompanying define-type is the type-case construct, which serves
as the type de-constructor, as in this definition of procedure
tree-sum:
(define (tree-sum atree)
(type-case Tree atree
[(leaf datum) datum]
[(branch left right)
(+ (tree-sum left) (tree-sum right))]))
This powerful mechanism has one significant shortcoming though, which
is
best explained by the following tree-map example:
(define (tree-map f atree)
(type-case Tree atree
[(leaf datum) (Leaf (f datum))]
[(branch left right)
(Branch (tree-map f left) (tree-map f right))]))
This tree-map works fine -- as long as the function f maps numbers to
numbers:
f :: number -> number
Any other type of mapping, such as
f :: number -> string
would lead to failure, since no newly generated leaf of destination
tree would ever pass the predicate test (number? datum). There are
two naive workarounds for this problem:
+ Redefine tree definition using any-type? predicate, instead of
number?
predicate
+ Define N tree types such as Number-Tree, String-Tree, etc.
and N*N corresponding tree-map functions: tree-map-number->string,
tree-map-string->number, etc.
Neither solution seems right for obvious reasons. We need to extend
the
define-type and type-case definitions to allow for what is known as a
parametric polymorphism.
2. The concept
Using Haskell notation, the datatype definition of the example tree
discussed above is of this form:
data NumberTree =3D Leaf Number | Branch NumberTree NumberTree
or better yet
data Tree Number =3D Leaf Number | Branch (Tree Number) (Tree Number)
which leads to generalization into trees parameterized by a type
variable 'a
data Tree a =3D Leaf a | Branch (Tree a) (Tree a)
The equivalent datatype definition in Scheme will be of this form:
(define tree
(datatype (tree a?)
[leaf (datum a?)]
[branch (left (tree? a?)) (right (tree? a?))]))
In Haskell notation the symbol 'a represents a parametric type
variable, while Scheme version deals with a parametric contract
predicate a -> Bool, which I abbreviated to a? for readability.
[Compared to original define-datatype, the polymorphic datatype shown
here is a lambda expression, not a definition. As such it may be used
locally in let-expressions but to form a full definition it has to
be bound explicitly to a name, such as 'tree.]
The above example uses only one generic predicate, but nothing
prevents us from introducing two or more such predicates, as in this
example of a contrived fancy tree, which stores two kinds of data
in leaves and in branches[5]
Haskell:
data FancyTree a b =3D
FancyLeaf a
| FancyBranch b (FancyTree a b) (FancyTree a b)
Scheme:
(define fancy-tree
(datatype (fancy-tree a? b?)
[leaf (a-datum a?)]
[branch (b-datum b?)
(left (fancy-tree? a? b?))
(right (fancy-tree? a? b?))]))
[A side note: Haskell variants are global and they cannot be
reused in different datatypes. Hence a need for some new labels, such
as FancyLeaf and FancyBranch if FancyTree is to stay on the equal
footing
with Tree.
Some Scheme implementations of define-datatype follow this tradition
as
well by publicizing the datatype constructors as independent entities.
In my view the pollution of the global namespace should be kept at the
absolute minimum. For this reason this implementation of
define-datatype keeps all data local within a single definition of
datatype and exposes them only on demand. Modularization of course
helps to keep such private information well hidden.]
As a consequence of this new datatype design, it is easy to define a
parametric version of tree-map procedure. Compared to its Haskell
counterpart its signature is a bit more complicated
Haskell:
treeMap :: (a -> b) -> Tree a -> Tree b
Scheme:
tree-map :: a? -> b? -> ((a -> b) -> Tree a -> Tree b)
The Scheme version of tree-map accepts two additional arguments, the
generic predicates a? and b?. Haskell does not need this, since all
typing information is statically resolvable. In contrary, Scheme
executes contracts in runtime only, so we must attach them explicitly
to datatype definition.
The Scheme implementation of the tree-map curries these first two
arguments
(define (tree-map a? b?)
(lambda (f t)
(type-case (tree a?) t
[
(leaf x)
((leaf b?) (f x))
][
(branch left right)
((branch b?)
((tree-map a? b?) f left)
((tree-map a? b?) f right))
])))
which allows for easy specialization to some concrete types, as in:
((tree-map number? string?) number->string a-numeric-tree)
The datatype definition is generic enough to handle both, the
concrete
monomorphic cases and polymorphic ones. For example, the monomorphic
numeric-tree datatype could have been defined without referring to
generic type predicates
(define numeric-tree
(datatype (numeric-tree)
[leaf (datum number?)]
[branch (left (numeric-tree?)) (right (numeric-tree?))]))
but then such definition would have had the same limitation as the
original one, discussed before. Compared to the original definition
this form is a bit more complicated due to extra parentheses around
predicates. But such price is worth to pay for the extra
functionality;
i.e., for the parametric polymorphism sup****t.
3. A sketch of modularized Tree
The following is a skeleton of the module-tree, as implemented in
my-module framework, which I have previously described on this forum.
Following this section is a specialization of module-tree to
module-integer-tree.
(my-module module-tree
(provide:
datatype? ;; a -> Bool
tree ;; a? -> Tree a
tree? ;; a? -> (b -> Bool)
leaf ;; a? -> (a -> Tree a)
branch ;; a? -> (Tree a -> Tree a -> Tree a)
tree-height ;; a? -> (Tree a -> Integer)
tree-size ;; a? -> (Tree a -> Integer)
tree-map ;; (a? -> b?) -> ((a -> b) -> Tree a -> Tree b)
fringe ;; a? -> (Tree a -> List a)
tree->list ;; a? -> (Tree a -> S-expression a)
)
(require:
(module-datatype
datatype?
type-predicate
constructor
))
(define-syntax* datatype (from module-datatype datatype))
(define-syntax* type-case (from module-datatype type-case))
;; Parametric tree
;; tree :: a? -> Tree a
(define tree
(datatype (tree a?)
[
leaf (datum a?)
][
branch (left (tree? a?)) (right (tree? a?))
]))
;; Parametric tree predicate
;; tree? :: a? -> (b -> Bool)
;; Example: ((tree? symbol?) ((leaf symbol?) 'hello)) =3D=3D> #t
(define (tree? a?) (type-predicate (tree a?)))
;; Parametric leaf constructor
;; leaf :: a? -> (a -> Tree a)
;; Example: ((leaf symbol?) 'hello) =3D=3D> (tree . #<procedure>)
;; Example: ((leaf symbol?) 12) =3D=3D> Error
(define (leaf a?) (constructor (tree a?) 'leaf))
;; Parametric branch constructor
;; branch :: a? -> (Tree a -> Tree a -> Tree a)
;; Example: ((branch symbol?)
;; ((leaf symbol?) 'hello) ((leaf symbol?) 'world))
;; =3D=3D> (tree . #<procedure>)
(define (branch a?) (constructor (tree a?) 'branch))
;; Parametric tree-map
;; tree-map :: a? -> b? -> ((a -> b) -> Tree a -> Tree b)
;; Example: ((tree-map symbol? string?) symbol->string a-symbol-tree)
;; =3D=3D> (tree . #<procedure>) representing a-numeric-tree
(define (tree-map a? b?)
(lambda (f t)
(type-case (tree a?) t
[
(leaf x)
((leaf b?) (f x))
][
(branch left right)
((branch b?)
((tree-map a? b?) f left)
((tree-map a? b?) f right))
])))
;; Parametric tree-size
;; tree-size :: a? -> (Tree a -> Integer)
(define (tree-size a?)
(lambda (t)
(type-case (tree a) t
[
(leaf x) 1
][
(branch left right)
(+ ((tree-size a?) left)
((tree-size a?) right))
])))
;; etc.
)
4. A sketch of modularized Integer-Tree
The parameterized Tree from the module-Tree can be further specialized
by partially applying functions of the former to the type predicate
integer? (in most cases) or integer? integer? in case of tree-map.
(my-module module-integer-tree
(provide:
tree ;; Integer-Tree
tree? ;; a -> Bool
Leaf ;; Integer -> Integer-Tree
Branch ;; Integer-Tree -> Integer-Tree -> Integer-Tree
tree-height ;; Integer-Tree -> Integer
tree-size ;; Integer-Tree -> Integer
tree-map ;; (Integer -> Integer) -> Integer-Tree -> Integer-Tree
fringe ;; Integer-Tree -> Integer-List
scale-tree ;; Integer -> Integer-Tree -> Integer-Tree
tree->list ;; Integer-Tree -> Integer-S-expression
leaf-sum ;; Integer-Tree -> Integer
)
(require:
(module-tree
(tree parametric-tree)
(tree? parametric-tree?)
(leaf parametric-leaf)
(branch parametric-branch)
(tree-height parametric-tree-height)
(tree-size parametric-tree-size)
(tree-map parametric-tree-map)
(fringe parametric-fringe)
(tree->list parametric-tree->list)
))
(define tree (parametric-tree integer?))
(define tree? (parametric-tree? integer?))
(define Leaf (parametric-leaf integer?))
(define Branch (parametric-branch integer?))
(define tree-height (parametric-tree-height integer?))
(define tree-size (parametric-tree-size integer?))
(define tree-map (parametric-tree-map integer? integer?))
(define fringe (parametric-fringe integer?))
(define tree->list (parametric-tree->list integer?))
(define-syntax* type-case (from module-datatype type-case))
;; Sum of values stored in tree leaves
(define (leaf-sum t)
(type-case (parametric-tree integer?) t
((leaf datum) datum)
((branch left right)
(+ (leaf-sum left) (leaf-sum right)))))
(define (scale-tree factor t)
(type-case (parametric-tree integer?) t
((leaf datum) (Leaf (* factor datum)))
((branch left right)
(Branch (scale-tree factor left) (scale-tree factor right)))))
;; etc.
)
5. ****ting issues
There are basically three ****ting issues of module-datatype, easily
fixable:
1. Transformers 'datatype and 'type-case can be directly converted
to syntax by replacing 'define by 'define-syntax.
2. In addition, the transformers refer to helper functions via a
construct
like this:
(from module-datatype a-helper)
Replace (from module-datatype a-helper) by a-helper.
3. The module-datatype im****ts several functions from other modules
defined in my environment:
any->string
every
duplicates
=E2=89=A3
For your convenience this code is copied at the bottom of this page.
To ****t examples in module-tree and module-integer-tree few choices
are
available - depending whether you wish to use host modules or stick to
the top level. For example, the 'datytype and 'type-case macros can
be made visible within the top level, or hidden as in original design.
In either case, just follow the advice given above.
References:
[1] Daniel P. Friedman, Mitchell Wand, Christopher T. Haynes
Essentials of programming languages, second edition
[2] Shriram Krishnamurthi, Programming Languages: Application
and Interpretation
[3] Gary T. Leavens, Curtis Clifton, and Brian Dorn,
A Type Notation for Scheme
[4] Message from Andre van Tonder to comp.lang.scheme,
'Typeclass envy', Feb 13 2004, 12:25 pm
[5] Paul Hudak, The Haskell School of Expression


|