\documentclass [fleqn,runningheads% %if dvipdfm ,dvipdfm %endif ]{llncs} % Babel destroys the runningheads: %\usepackage[english]{babel} \usepackage{xspace} \usepackage{url} \usepackage{afterpage} %include lhs2TeX.fmt %include lhs2TeX.sty %include polycode.fmt %include forall.fmt %include spacing.fmt %include indices.fmt %options ghci \usepackage{stmaryrd} \usepackage{amssymb} \usepackage{doubleequals} \title{``Scrap Your Boilerplate'' Reloaded} \titlerunning{``Scrap Your Boilerplate'' Reloaded} \author{Ralf Hinze\inst{1} \and Andres L\"{o}h\inst{1} \and Bruno C.~d.~S.~Oliveira\inst{2}} \authorrunning{R.~Hinze, A.~L\"{o}h, and B.~Oliveira} \institute{Institut f\"{u}r Informatik III, Universit\"{a}t Bonn% \\R\"{o}merstra\ss e 164, 53117 Bonn, Germany% \\\email{$\{$ralf,loeh$\}$\symbol{64}informatik.uni-bonn.de} %\\\texttt{http://www.informatik.uni-bonn.de/\homedir ralf/} \and Oxford University Computing Laboratory% \\Wolfson Building, Parks Road, Oxford OX1 3QD, UK% \\\email{bruno\symbol{64}comlab.ox.ac.uk}% %\\\texttt{http://www.cs.uu.nl/\homedir$\{$ralf,johanj,andres$\}$/}% } %if techreport \subtitle{Extended version -- \today} %endif %if False > {-# OPTIONS -fglasgow-exts -fallow-undecidable-instances -fallow-overlapping-instances #-} The latter two options above are only required for the class-based implementation. > module SYB0 where > import qualified Prelude > import Prelude hiding (sum, show) > import Control.Monad (msum) > import Data.Maybe (fromJust) > import Data.Typeable -- only required for the class-based implementation > dots = undefined > pair = (,) %endif %format ... = "\dots" %format dots = "\dots" %format Dots = ... %format space = " " %format TDots1 = ... "\gobble{" %format TDots2 = TDots1 %format TDots3 = TDots1 %format TDots4 = TDots1 %format tdotsend = "}" %format == = "\doubleequals" %format pair = "(\mathbin{,})" \newcommand{\gobble}[1]{} \newcommand{\fp}{\Xenspace.} \newcommand{\fc}{\Xenspace,} \let\Xenspace=\enspace \let\originalinlinehs=\inlinehs \def\inlinehs{\let\Xenspace\empty\originalinlinehs} \newcommand{\technical}[1]{\textsl{#1}}% sl is ugly \newcommand{\SYB}{SYB\xspace} \newcommand{\GP}{GP\xspace} \newcommand{\GADT}{GADT\xspace} \newcommand{\GADTs}{GADTs\xspace} \newcommand{\GHC}{GHC\xspace} \begin{document} \maketitle \begin{abstract} The paper ``Scrap your boilerplate'' (\SYB) introduces a combinator library for generic programming that offers generic traversals and queries. Classically, support for generic programming consists of two essential ingredients: a way to write (type-)overloaded functions, and independently, a way to access the structure of data types. \SYB seems to lack the second. As a consequence, it is difficult to compare with other approaches such as PolyP or Generic Haskell. In this paper we reveal the structural view that \SYB builds upon. This allows us to define the combinators as generic functions in the classical sense. We explain the \SYB approach in this changed setting from ground up, and use the understanding gained to relate it to other generic programming approaches. Furthermore, we show that the \SYB view is applicable to a very large class of data types, including generalized algebraic data types. \end{abstract} \section{Introduction} The paper ``Scrap your boilerplate'' (\SYB)~\cite{laemmel03scrap} introduces a combinator library for generic programming that offers generic traversals and queries. Classically, support for generic programming consists of two essential ingredients: a way to write (type-)overloaded functions, and independently, a way to access the structure of data types. \SYB seems to lacks the second, because it is entirely based on combinators. In this paper, we make the following contributions: \begin{itemize} \item We explain the \SYB approach from ground up using an explicit representation of data types, the \technical{spine view}. Many of the \SYB library functions are more easily defined in the spine view than using the combinators underlying the original \SYB library. \item We compare the expressive power and applicability of the spine view to the original \SYB paper, to PolyP~\cite{polyp} and to Generic Haskell~\cite{loeh04exploring,ghcoral}. \item Furthermore, we show that the \SYB view is applicable to a very large class of data types, including generalized algebraic data types (\GADTs)~\cite{gadt,pj04wobbly}. \end{itemize} We use Haskell~\cite{Haskell} for all our examples. The source code of this paper~\cite{syb0tr} constitutes a Haskell program that can be compiled by \GHC~\cite{ghcuser} in order to test and experiment with our implementation. While our implementation is not directly usable as a separate library, because it is not extensible (new data types cannot be added in a compositional way), this deficiency is not tied to the idea of the |Spine| view: %if not techreport the technical report version of this paper~\cite{syb0tr} contains a slightly less elegant implementation that is extensible. %else We show in Section~\ref{classbased} how to integrate the |Spine| view with the third \SYB paper, thereby providing an extensible, albeit slightly less elegant implementation. %endif In this introduction, we explain the ingredients of a system for generic programming, and argue that the original \SYB presentation does not clearly qualify as such a system. In order to better understand the concept of generic programming, let us first look at plain functional programming. \subsection{Functional programming and views} As functional programmers in a statically typed language, we are used to define functions by case analysis on a data type. In fact, it is standard practice to define a function on a data type by performing case analysis on the input. The shape of the data type guides our function definitions, and affects how easy it is to define certain functions. As an example, assume we want to implement a priority queue supporting among others the operation %format PQ = PriorityQueue %format PQ' = PQ %format insert' = insert %format splitMinimum' = splitMinimum > splitMinimum :: PQ -> Maybe (Int,PQ) to separate the minimum from the remaining queue if the queue is not empty. %if False > splitMinimum = undefined > insert = undefined %endif We can choose a heap-structured tree to implement the priority queue, and define > data Tree a = Empty | Node (Tree a) a (Tree a) > type PQ = Tree Int {-"\fp"-} The choice of a heap as the underlying data stucture makes the implementation of |splitMinimum| slightly tricky, requiring an auxiliary operation to merge two heaps. If, on the other hand, we choose a sorted list to represent the priority queue > data PQ' = Void | Min Int PQ' {-"\fc"-} we make our life much easier, because |splitMinimum| is now trivial to define. The price we pay is that the implementation on lists is likely to be less efficient than the one using the tree. Such different \technical{views} on a data structure need not be mutually exclusive. Wadler and others have proposed language support for views~\cite{wadler87views,haskell13views}. Many functions on a single data type follow common traversal and recursion patterns. Instead of defining each function by case analysis, it is possible to define combinators that capture these patterns. For instance, given functions > foldTree :: r -> (r -> a -> r -> r) -> Tree a -> r > mapTree :: (a -> b) -> Tree a -> Tree b {-"\fc"-} %if False > foldTree empty node Empty = empty > foldTree empty node (Node l x r) = > node (foldTree empty node l) x (foldTree empty node r) > mapTree f Empty = Empty > mapTree f (Node l x r) = Node (mapTree f l) (f x) (mapTree f r) {-"\fc"-} %endif we can write functions to perform an inorder traversal of the tree or to increase every label in a tree by one very concisely: > inorder = foldTree [] (\l x r -> l ++ [x] ++ r) > incTree = mapTree (+1) {-"\fp"-} \subsection{Generic programming} A \technical{generic function} is a function that is defined once, but works for many data types. It can adapt itself to the structure of data types. Generic functions are also called \technical{polytypic} or \technical{structurally polymorphic}. Genericity is different from \technical{parametric polymorphism}, where the same code works for multiple types, and the structure of a data type is not available for analysis. It is also more specific than \technical{ad-hoc polymorphism}, which allows a function to be defined for different data types, by providing one implementation for each type. Typical examples of generic functions are equality or comparison, parsing and unparsing, serialization, traversals over large data structures and many others. Support for generic programming consists of two essential ingredients. Firstly, support for ad-hoc polymorphism is required. This allows the programmer to write \technical{overloaded functions}, i.e., functions that dispatch on a type argument. Secondly, we need a \technical{generic view} on the structure of data types. In a nominal type system, types with similar structure are considered to be completely distinct. To employ generic programming, we need to lower this barrier and make the structure transparent if desired. The two ingredients are orthogonal, and for both, there is a choice. Overloaded functions can be expressed in Haskell using the class system, using a type-safe cast operation, by reflecting the type system on the value level, or by a combination of the above. Any of these approaches has certain advantages and disadvantages, but they are mostly interchangeable and do not dramatically affect the expressivity of the generic programming system. The structural view, on the other hand, dictates the flavour of the whole system: it affects the set of data types we can represent in the view, the class of functions we can write using case analysis on the structure, and potentially the efficiency of these functions. The structural view is used to make an overloaded function truly generic, working for a data type even if it has no ad-hoc case for that type. For instance, PolyP views data types as fixed points of regular functors. Therefore its approach is limited to regular data types, but the view allows access to the points of recursion and allows the definition of recursion combinators such as catamorphisms. Generic Haskell uses a sum-of-products view which is more widely applicable, but limits the class of functions we can write. The concept of generic views is explained further in a recent paper~\cite{holdermans2005views}, and is related to \technical{universes} in dependently-typed programming~\cite{benke2003universes}. In summary, it turns out that there is a close analogy between plain functional and generic programming: the concepts of views, function definition by case analysis, and combinators occur in both settings. \subsection{Scrap your boilerplate} In analogy with the situation on plain functions, not all generic functions are defined by case analysis. Just as there are powerful combinators for ordinary functions, such combinators also exist for generic programming. In fact, the very combinators we have used above, |foldTree| and |mapTree|, are typical candidates for generalization. The paper ``Scrap your boilerplate'' (\SYB) describes a library for \technical{strategic programming}~\cite{visser2000language}, i.e., it offers combinators for generic traversals and queries on terms. Two central combinators of the \SYB library are |everywhere| to traverse a data structure and modify it in certain places, and |everything| to traverse a data structure and collect information in the process. The \SYB approach builds completely on combinators, and some fundamental combinators are assumed to be provided by the implementation. While this is fine in practice, it makes it difficult to compare \SYB with other approaches such as PolyP or Generic Haskell. The reason is that the concept of a generic view seems to be missing. Functions are never defined by case analysis on the structure of types. However, the generic view is only hidden in the original presentation. In this paper we reveal the structure that \SYB uses behind the scenes and that allows us to define the \SYB combinators as generic functions by case analysis on that structure. We will explain the \SYB approach in this changed setting from ground up. The focus of the presentation is on conceptual conciseness. We do not strive to replace the original implementation, but to complement it by an alternative implementation which may be easier to understand and relate to other approaches. \subsection{Organization of this paper} The rest of this paper is organized as follows: We first describe the two orthogonal ingredients required for generic programming in our presentation of the \SYB approach: overloaded functions (Section~\ref{functionsontypes}) and the \technical{spine view}, the structure that is the hidden foundation of \SYB (Section~\ref{spineview}). We then review the central combinators of \SYB in Section~\ref{mapqmapt}. Section~\ref{genericshow} shows how we can access names of constructors. In Section~\ref{comparison}, we take a step back and relate \SYB to other generic programming approaches. Inspired by our analysis on the expressiveness of the \SYB approach, we demonstrate how to extend the spine view to generalized algebraic data types (Section~\ref{ssyb}). %if techreport In Section~\ref{classbased}, we take a more detailed look at the way that extensible generic functions are encoded in the third of the \SYB papers. We explain how the spine view can be used with a class-based encoding of overloaded functions. %endif Section~\ref{conclusions} discusses related work and concludes. \section{Overloaded functions}% \label{functionsontypes} % Alternative section titles: % Type-indexed functions % Generic functions % Type case The standard way in Haskell to express an overloaded function is to use a type class. In fact, this is the way taken by the original \SYB papers: in the first \SYB paper, type classes are used in conjunction with a type-safe cast operation, and in the third paper, overloaded functions are expressed solely based on type classes. However, type classes leave it to the compiler to find the correct instance, and thus hide a non-trivial aspect of the program. In this paper, we prefer to be more explicit and emphasize the idea that an overloaded function dispatches on a type argument. Haskell excels at embedded languages, so it seems a good idea to try to embed the type language in Haskell. The following way to encode overloaded functions is not new: it is based on Hinze's ``Fun of Programming'' chapter~\cite{hinze03fun} and has been used widely elsewhere~\cite{oliveira05typecase}. The whole point of static types is that they can be used at compile time to distinguish programs, hence we certainly do not want to use an unparameterized data type |Type| to represent types. Instead, we add a parameter so that |Type t| comprises only type representations for the type |t|. We now need ways to construct values of type |Type t|. For instance, |Int| can be a representation of the type |Int|, so that we have |Int :: Type Int|. Similarly, if we have a representation |r| of type |a|, we can make |List r| a representation of type |[a]|, or formally |List :: Type a -> Type [a]|. The notation we use suggests that |Int| and |List| are data constructors of type |Type|, but this impossible in Haskell~98, because the result type of a constructor must always be unrestricted, i.e., |Type a| for some type variable~|a|. Fortunately, \GHC now supports \technical{generalized algebraic data types} (\GADTs)~\cite{gadt,pj04wobbly}, which lift exactly this restriction. Therefore, we can indeed define |Type| in Haskell using the following \GADT: > data Type :: * -> * {-" "-} where > Int :: Type Int > Char :: Type Char > List :: Type a -> Type [a] > Pair :: Type a -> Type b -> Type (a,b) > Tree :: Type a -> Type (Tree a) {-"\fp"-} %if False > Bool :: Type Bool > Maybe :: Type a -> Type (Maybe a) > Rose :: (forall a. Type a -> Type (f a)) -> Type a -> Type (Rose f a) > Type :: Type a -> Type (Type a) > Dynamic :: Type Dynamic > Dots :: Type a -- dummy For the refactoring example: > HsExp :: Type HsExp > HsPat :: Type HsPat > HsName :: Type HsName > HsLiteral :: Type HsLiteral > HsAlt :: Type HsAlt > HsGuardedAlt :: Type HsGuardedAlt %endif This type allows us to represent integers, characters, lists, pairs, and trees -- enough to give an example of a simple overloaded function that sums up all integers in a value: %format sum0 = sum > sum0 :: Type a -> a -> Int > sum0 Int n = n > sum0 Char _ = 0 > sum0 (List a) xs = foldr (+) 0 (map (sum0 a) xs) > sum0 (Pair a b) (x,y) = sum0 a x + sum0 b y > sum0 (Tree a) t = sum0 (List a) (inorder t) {-"\fp"-} The function |sum0| works on all types that can be constructed from |Int|, |Char|, |[]|, |pair|, and |Tree|, for instance, on a complex type such as |[(Char,Int)]|: the expression \begingroup \invisiblecomments %{ %format = = " " %format v0 = " " > v0=sum0 (List (Pair Char Int)) [('k',6),('s',9),(' ',27)] evaluates to \eval{v0}. %} \endgroup The function |sum0| is an example of an ad-hoc-polymorphic function. There are a limited number of cases for different types, defining potentially unrelated behavior of |sum| for these types. The function will not work on types such as |Bool| or |Maybe| or even on a type > newtype MyPair a b = MyPair (a,b) {-"\fc"-} because Haskell has a nominal type system, hence |MyPair a b| is isomorphic to, yet distinct from |(a,b)|. %if False Let us emphasize again that the encoding of overloaded functions is orthogonal to the concept of a generic view, therefore the remainder of this paper is not intricately tied to the choice that overloaded functions are represented using an explicit argument of type |Type|. %endif \section{The spine view}% \label{spineview} %format :$ = "\mathbin{\raisebox{.15ex}{$\scriptstyle\lozenge$}}" %format :$: = :$ %format $$ = "\mathbin{\raisebox{.15ex}{$\scriptstyle\blacklozenge$}}" %format Spine0 = Spine %format Constr0 = Constr In this section, we learn how to define a truly generic |sum|, which works on |Bool| and |Maybe| and |MyType|, among others. Take a look at any Haskell value. If it is not of some abstract type, it can always be written as a data constructor applied to other values. For example, |Node Empty 2 Empty| is the |Node| data constructor, applied to the three values |Empty|, |2|, and |Empty|. Even built-in types such as |Int| or |Char| are not fundamentally different: every literal can be seen as a nullary constructor. Let us make the structure of constructed values visible and mark each constructor using |Constr|, and each function application using |:$|. The example from above becomes %{ %format = = " " %format v1 = " " %format v2 = " " %format v3 = " " %if False > v1=Constr0 1 > v2=Constr0 (:) :$: 1 :$: [2,3] %endif \begingroup %if not techreport \inlinehs %endif > v3=Constr0 Node :$: Empty :$: 2 :$: Empty {-"\fp"-} \endgroup %} The functions |Constr| and |(:$)|% \footnote{We use |(:$:)| as a symbol for an infix data constructor. For our presentation, we ignore the Haskell rule that names of infix data constructors must start with a colon.} are themselves constructors of a new data type |Spine0|:% \footnote{Note that in contrast to |Type|, the data type |Spine0| is not necessarily a \emph{generalized} algebraic data type. The result types of the constructors are not restricted, |Spine| could therefore be defined in \GHC as a normal data type with existentials. However, we prefer the \GADT syntax.} > data Spine0 :: * -> * {-" "-} where > Constr0 :: a -> Spine0 a > (:$:) :: Spine0 (a -> b) -> a -> Spine0 b {-"\fp"-} %format fromSpine0 = fromSpine %format toSpine0 = toSpine Given a value of type |Spine a|, we can recover the original value of type |a| by undoing the conversion step made before: > fromSpine0 :: Spine0 a -> a > fromSpine0 (Constr0 c) = c > fromSpine0 (f :$: a) = (fromSpine0 f) a {-"\fp"-} The function |fromSpine0| is parametrically polymorphic, i.e., it works independently of the type in question: it just replaces |Constr| with the original constructor and |(:$)| with function application. Unfortunately, |fromSpine0| is the only interesting function we can write on a |Spine0|. Reconsider the type of the |(:$:)| constructor: < (:$:) :: Spine0 (a -> b) -> a -> Spine0 b {-"\fp"-} The type |a| is not visible in the final result (it is existentially quantified in the data type), so the only thing we can do with the component of type |a| is to combine it somehow with the component of type |Spine0 (a -> b)|. %format ::: = "\mathrel{:}" Since we intend to call overloaded functions on the value of type |a|, we require a representation of the type of |a|. Our solution is thus that together with the value of type |a|, we store a representation of its type. To this end, we introduce a data type for typed values\footnote{We use |(:::)| as data constructor for the type |Typed| in this paper. The \technical{cons}-operator for lists, written |(:)| in Haskell, does not occur in this paper.} > data Typed a = a ::: Type a {-"\fc"-} and then adapt |(:$:)| to use |Typed a| instead of |a|: > data Spine :: * -> * {-" "-} where > Constr :: a -> Spine a > (:$) :: Spine (a -> b) -> Typed a -> Spine b {-"\fp"-} %if False [Alternatively, we can use |Apply| instead of |(:$)| and add a third argument.] %endif Of course, we have to adapt |fromSpine| to ignore the new type annotations: > fromSpine :: Spine a -> a > fromSpine (Constr c) = c > fromSpine (f :$ (a ::: _)) = (fromSpine f) a {-"\fp"-} We can define a right inverse to |fromSpine|, as an overloaded function. For each data type, the definition follows a trivial pattern. Here are the cases for the |Int| and |Tree| types: > toSpine :: Type a -> a -> Spine a > toSpine Int n = Constr n > toSpine (Tree a) Empty = Constr Empty > toSpine (Tree a) (Node l x r) = Constr Node > :$ (l ::: Tree a) :$ (x ::: a) :$ (r ::: Tree a) {-"\fp"-} %if False > toSpine (List a) x = fromList a x > toSpine Char x = fromChar x > toSpine Bool x = fromBool x For the refactoring example: > toSpine HsExp x = fromHsExp x > toSpine HsPat x = fromHsPat x > toSpine HsName x = fromHsName x > toSpine HsLiteral x = fromHsLiteral x > toSpine HsAlt x = fromHsAlt x > toSpine HsGuardedAlt x = fromHsGuardedAlt x Some of the auxiliary functions: > fromList :: Type a -> [a] -> Spine [a] > fromList _ [] = Constr [] > fromList a (x : xs) = Constr (:) :$ (x ::: a) :$ (xs ::: List a) > fromChar :: Char -> Spine Char > fromChar c = Constr c > fromBool :: Bool -> Spine Bool > fromBool b = Constr b Old, separated (but too long) variants of |fromInt| and |fromTree|: > fromInt :: Int -> Spine Int > fromInt n = Constr n > fromTree :: Type a -> Tree a -> Spine (Tree a) > fromTree _ Empty = Constr Empty > fromTree a (Node l x r) = Constr Node > :$ (l ::: Tree a) :$ (x ::: a) :$ (r ::: Tree a) %endif With all the machinery in place, we can now write the truly generic |sum|: %, given in Figure~\ref{genericsum}. %\begin{figure}[b] > sum :: Type a -> a -> Int > sum Int n = n > sum t x = sum' (toSpine t x) > > sum' :: Spine a -> Int > sum' (Constr c) = 0 > sum' (f :$ (x ::: t)) = sum' f + sum t x {-"\fp"-} %\caption{Generic |sum|}% %\label{genericsum} %\end{figure} This function requires only a single type-specific case, namely the one for |Int|. The reason is that we want to do something specific for integers, which does not follow the general pattern, whereas the formerly explicit behavior for the types |Char|, |[]|, |pair|, and |Tree| is now completely subsumed by the function |sum'|. Note also that in the last line of |sum'|, the type information |t| for |x| is indispensable, as we call the generic function |sum| recursively. Why are we in a better situation than before? If we encounter a new data type such as |Maybe|, we still have to extend the representation type %if techreport %format Type4 = Type %format toSpine4 = toSpine %format Maybe4 = Maybe > data Type4 :: * -> * {-" "-} where > TDots4 :: Type4 tdotsend > Maybe4 :: Type4 a -> Type4 (Maybe a) and adapt the |toSpine| function > fromMaybe :: Type a -> Maybe a -> Spine (Maybe a) > fromMaybe _ Nothing = Constr Nothing > fromMaybe a (Just x) = Constr Just :$ (x ::: a) > > toSpine4 :: Type a -> a -> Spine a > toSpine4 Dots = dots > toSpine4 (Maybe a) = fromMaybe a {-"\fp"-} %else with a constructor |Maybe :: Type a -> Type (Maybe a)| and provide a case for the constructor |Maybe| in the |toSpine| function. %endif However, this has to be done only once per data type, and it is so simple that it could easily be done automatically. The code for the generic functions (of which there can be many) is completely unaffected by the addition of a new data type. \section{Generic queries and traversals}% \label{mapqmapt} In this section, we implement the two central \SYB combinators |everything| and |everywhere| that are used to construct generic queries and traversals. \subsection{Generic queries} %format everything0 = everything %format everything0' = everything0 "''" A query is an overloaded function that returns a result of a specific type: > type Query r = forall a. Type a -> a -> r {-"\fp"-} We have already seen an example of a query, namely the |sum| function from %Figure~\ref{genericsum}. Section~\ref{spineview}. There are many more applications of queries: computation of the size of a structure, collection of names, collection of free variables, building a finite map, finding a specific element etc. If we look back at the generic |sum| function, we see that it performs several tasks at once, leading to a relatively complex definition: integers are preserved, while in general, constructors are replaced by |0|; the subresults are added; finally, a recursive traversal is performed over the entire data structure. In the following, we describe how to separate these different activities into different functions, and at the same time abstract from the specific problem of summing up values. If we already have a query, we can define a derived query that applies the original query to all immediate children of a given constructor: > mapQ :: Query r -> Query [r] > mapQ q t = mapQ' q . toSpine t > > mapQ' :: Query r -> (forall a. Spine a -> [r]) > mapQ' q (Constr c) = [] > mapQ' q (f :$ (x ::: t)) = mapQ' q f ++ [q t x] {-"\fp"-} The results of the original query |q| are collected in a list. The combinator |mapQ| does not traverse the input data structure. The traversal is the job of |everything'|, which is defined in terms of |mapQ|: > everything' :: Query r -> Query [r] > everything' q t x = [q t x] ++ concat (mapQ (everything' q) t x) {-"\fp"-} Here, we apply the given query |q| to the entire argument |x|, and then recurse for the immediate children. The \SYB version of |everything| fuses |everything'| with an application of |foldl1|, using a binary operator to combine all the elements of the nonempty list returned by |everything'|: > everything :: (r -> r -> r) -> Query r -> Query r > everything op q t x = foldl1 op ([q t x] ++ mapQ (everything op q) t x) {-"\fp"-} In order to express the query |sum| in terms of |everything|, we need a simple query |sumQ| expressing that we want to count integers: %format sum1 = sum > sumQ :: Query Int > sumQ Int n = n > sumQ t x = 0 > > sum1 :: Query Int > sum1 = everything (+) sumQ {-"\fp"-} \subsection{Generic traversals}% \label{mapt}% While a query computes an answer of a fixed type from an input, a traversal is an overloaded function that preserves the type of its input: > type Traversal = forall a. Type a -> a -> a {-"\fp"-} The counterpart of |mapQ| is |mapT|. It applies a given traversal to the immediate children of a constructor, then rebuilds a value of the same constructor from the results: > mapT :: Traversal -> Traversal > mapT h t = fromSpine . mapT' h . toSpine t > > mapT' :: Traversal -> (forall a. Spine a -> Spine a) > mapT' h (Constr c) = Constr c > mapT' h (f :$ (x ::: t)) = mapT' h f :$ (h t x ::: t) {-"\fp"-} The function |mapT| not only consumes a value of the type argument, but also produces one. Therefore we call not only |toSpine| on the input value, but also |fromSpine| before returning the result. The calls to |fromSpine| and |toSpine| are determined by the type of the generic function that is defined. The general principle is described elsewhere~\cite[Chapter 11]{loeh04exploring}. Using |mapT|, we can build bottom-up or top-down variants of |everywhere|, which apply the given traversal recursively: %format everywhere_td = everywhere "_{\textsc{td}}" %format everywhere_bu = everywhere "_{\textsc{bu}}" > everywhere_bu :: Traversal -> Traversal > everywhere_bu f t = f t . mapT (everywhere_bu f) t > > everywhere_td :: Traversal -> Traversal > everywhere_td f t = mapT (everywhere_td f) t . f t {-"\fp"-} There are many applications of traversals, such as renaming variables in an abstract syntax tree, annotating a structure with additional information, optimizing or simplifying a structure etc. Here is a simplified example of a transformation performed by the Haskell refactorer HaRe~\cite{li2003refactor}, which rewrites a Haskell |if| construct into an equivalent |case| expression according to the rule %if False Haskell abstract syntax (from Language.Haskell, simplified): > data HsExp = > HsVar HsName -- different from the original > | HsCon HsName -- different from the original > | HsLit HsLiteral > | HsApp HsExp HsExp > | HsLambda {- SrcLoc -} [HsPat] HsExp > -- | HsLet [HsDecl] HsExp > | HsIf HsExp HsExp HsExp > | HsCase HsExp [HsAlt] > | HsTuple [HsExp] > | HsParen HsExp > | HsWildCard deriving (Eq,Show) > fromHsExp (HsVar name) = Constr HsVar :$ (name ::: HsName) > fromHsExp (HsLit lit) = Constr HsLit :$ (lit ::: HsLiteral) > fromHsExp (HsApp e1 e2) = Constr HsApp :$ (e1 ::: HsExp) > :$ (e2 ::: HsExp) > fromHsExp (HsLambda ps e) = Constr HsLambda :$ (ps ::: List HsPat) > :$ (e ::: HsExp) > fromHsExp (HsIf e1 e2 e3) = Constr HsIf :$ (e1 ::: HsExp) > :$ (e2 ::: HsExp) > :$ (e3 ::: HsExp) > fromHsExp (HsCase e l) = Constr HsCase :$ (e ::: HsExp) > :$ (l ::: List HsAlt) > data HsPat = > HsPVar HsName > | HsPLit HsLiteral > | HsPTuple [HsPat] > | HsPParen HsPat > | HsPWildCard deriving (Eq,Show) > > fromHsPat (HsPVar name) = Constr HsPVar :$ (name ::: HsName) > fromHsPat (HsPLit lit) = Constr HsPLit :$ (lit ::: HsLiteral) > fromHsPat (HsPWildCard) = Constr HsPWildCard > > data HsName = HsIdent String deriving (Eq,Show) > > fromHsName (HsIdent name) = Constr HsIdent :$ (name ::: List Char) > > data HsLiteral = > HsChar Char > | HsInt Int > | HsBool Bool deriving (Eq,Show) > > fromHsLiteral (HsChar c) = Constr HsChar :$ (c ::: Char) > fromHsLiteral (HsInt x) = Constr HsInt :$ (x ::: Int) > fromHsLiteral (HsBool x) = Constr HsBool :$ (x ::: Bool) > data HsAlt = HsAltC HsPat HsExp deriving (Eq,Show) > fromHsAlt (HsAltC pat exp) = Constr HsAltC :$ (pat ::: HsPat) > :$ (exp ::: HsExp) > data HsGuardedAlt = HsGuardedAltC HsExp HsExp deriving (Eq,Show) > > fromHsGuardedAlt (HsGuardedAltC e1 e2) = Constr HsGuardedAltC > :$ (e1 ::: HsExp) > :$ (e2 ::: HsExp) %endif %format ==> = "\quad\leadsto\quad" %format HsAltC = HsAlt < if e then e1 else e2 ==> case e of True -> e1; False -> e2 {-"\fp"-} We assume a suitable abstract syntax for Haskell. The rewrite rule is captured by the traversal > ifToCaseT :: Traversal > ifToCaseT HsExp (HsIf e e1 e2) = > HsCase e [ HsAltC (HsPLit (HsBool True)) e1, > HsAltC (HsPLit (HsBool False)) e2] > ifToCaseT _ e = e {-"\fp"-} The traversal can be applied to a complete Haskell program using > ifToCase = everywhere_bu ifToCaseT {-"\fp"-} %if False This is the Haskell expression |if x == 0 then 1 else x|: > testIf = HsIf (HsApp (HsApp (HsVar (HsIdent "==")) > (HsVar (HsIdent "x"))) > (HsLit (HsInt 0))) > (HsLit (HsInt 1)) > (HsVar (HsIdent "x")) > testHaReExample = ifToCase HsExp testIf %endif \section{Generically showing values}% \label{genericshow} We have seen that we can traverse data types in several ways, performing potentially complex calculations in the process. However, we cannot reimplement Haskell's |show| function, even though it looks like a |Query String|. The reason is that there is no way to access the name of a constructor. We have a case for constructors, |Constr|, in our |Spine| data type, but there is really not much we can do at this point. So far, we have either invented a constant value (|[]| in the case of |mapQ|), or applied the constructor itself again (in the case of |mapT|). But it is easy to provide additional information for each constructor. When we define |toSpine| for a specific data type, whether manually or automatically, we have information about the constructors of the data type available, so why not use it? Let us therefore modify |Spine| once more: %format SpineC = Spine %format :$$ = :$ > data SpineC :: * -> * {-" "-} where > As :: a -> ConDescr -> SpineC a > (:$$) :: SpineC (a -> b) -> Typed a -> SpineC b {-"\fp"-} We have renamed |Constr| to |As|, as we intend to use it as a binary operator which takes a constructor function and information about the constructor. In this paper, we use only the name to describe a constructor, > type ConDescr = String {-"\fc"-} %format fromTreeC = fromTree %format toSpineC = toSpine but we could include additional information such as its arity, the name of the type, the ``house number'' of the constructor and so on. Adapting |SpineC| means that the generation of |toSpineC| has to be modified as well. We show as an example how to do this for the type |Tree|: %if False Again, we had |fromTreeC| as a separate function before: > fromTreeC :: Type a -> Tree a -> SpineC (Tree a) > fromTreeC _ Empty = Empty `As` "Empty" > fromTreeC a (Node l x r) = Node `As` "Node" > :$$ (l ::: Tree a) :$$ (x ::: a) :$$ (r ::: Tree a) %endif %if False > toSpineC :: Type a -> a -> SpineC a %endif > toSpineC (Tree a) Empty = Empty `As` "Empty" > toSpineC (Tree a) (Node l x r) = Node `As` "Node" > :$$ (l ::: Tree a) :$$ (x ::: a) :$$ (r ::: Tree a) {-"\fp"-} %if False > toSpineC Dots space = dots %endif %if False > toSpineC (List a) x = fromListC a x > toSpineC Int x = fromIntC x > toSpineC Char x = fromCharC x > toSpineC (Pair a b) x = fromPairC a b x > toSpineC (Type a) x = fromType x > toSpineC Dynamic x = fromDynamic x > fromIntC :: Int -> SpineC Int > fromIntC n = n `As` Prelude.show n > fromCharC :: Char -> SpineC Char > fromCharC c = c `As` Prelude.show c > fromPairC :: Type a -> Type b -> (a,b) -> SpineC (a,b) > fromPairC a b (x,y) = pair `As` "(,)" :$$ (x ::: a) :$$ (y ::: b) > fromListC :: Type a -> [a] -> SpineC [a] > fromListC _ [] = [] `As` "[]" > fromListC a (x:xs) = (:) `As` "(:)" :$$ (x ::: a) :$$ (xs ::: List a) %endif With the new version of |Spine|, the function |show| is straightforward to write: > show :: Type a -> a -> String > show t x = show' (toSpineC t x) > > show' :: SpineC a -> String > show' (_ `As` c) = c > show' (f :$$ (a ::: t)) = "(" ++ show' f ++ " " ++ show t a ++ ")" {-"\fp"-} The result of the call %{ %format = = " " %format v4 = " " > v4=show (Tree Int) (Node (Node Empty 1 Empty) 2 (Node Empty 3 Empty)) is \begingroup \invisiblecomments %if not techreport \inlinehs %else the string %endif > {- ``\eval{putStr $ v4}''\fp -} \endgroup %} %if techreport It is also easy to define |toConDescr|, which returns the |ConDescr| for a value: > toConDescr :: Type a -> a -> ConDescr > toConDescr t = toConDescr' . toSpineC t > > toConDescr' :: SpineC a -> String > toConDescr' (_ `As` c) = c > toConDescr' (f :$$ (a ::: t)) = toConDescr' f {-"\fp"-} %endif Even though we have information about constructors, we cannot define a generic |read| without further extensions. In the next section, we will discuss this and other questions regarding the expressivity of the \SYB approach. \section{\SYB in context}% \label{comparison} In the previous sections, we have introduced the \SYB approach on the basis of the |Spine| data type. Generic functions are overloaded functions that make use of the |Spine| view by calling |toSpine| on their type argument. We have seen that we can define useful and widely applicable combinators such as |everything| and |everywhere| using some basic generic functions. As long as we stay within the expressivity of these combinators, it is possible to perform generic programming avoiding explicit case analysis on types. In this section, we want to answer how expressive the |Spine| view is in comparison to both the original presentation of \SYB, which uses only a given set of combinators, and in relation to other views, as they are employed by other approaches to generic programming such as PolyP and Generic Haskell. \subsection{The original presentation} As described in the section of implementing \SYB in the original paper, it turns out that |mapT| and |mapQ| are both instances of a function that is called |gfoldl|. We can define |gfoldl|, too. To do this, let us define the ordinary fold (or catamorphism, if you like) of the |Spine| type: > foldSpine :: (forall a . a -> r a) -> (forall a b . r (a -> b) -> Typed a -> r b) -> > SpineC a -> r a > foldSpine constr ($$) (c `As` _) = constr c > foldSpine constr ($$) (f :$$ (x ::: t)) = (foldSpine constr ($$) f) $$ (x ::: t) {-"\fp"-} The definition follows the catamorphic principle of replacing data constructors with functions. The \SYB |gfoldl| is just |foldSpine| composed with |toSpine|: > gfoldl :: Type a -> (forall a . a -> r a) -> (forall a b . r (a -> b) -> Typed a -> r b) -> > a -> r a > gfoldl t constr app = foldSpine constr app . toSpineC t {-"\fp"-} It is therefore clear that our approach via the |Spine| type and the original \SYB approach via |gfoldl| are in principle equally expressive, because the |Spine| type can be recovered from |gfoldl|. However, we believe that the presence of the explicit data type |Spine| makes the definitions of some generic functions easier, especially if they do not directly fall in the range of any of the simpler combinators. The original \SYB paper describes only generic functions that either consume a value based on its type (queries, consumers), or that consume a value based on its type and build up a similar value at the same time (traversals). %format some0 = some %format some0' = some' There are also generic functions that construct values based on a type (producers). Such functions include the already mentioned generic |read|, used to parse a string into a value of a data type, or |some|, a function that produces some non-bottom value of a given data type. We cannot define such functions without further help: The definition of |some0| would presumably follow the general pattern of overloaded functions on spines, the shape of the final case dictated by the type of |some0| (cf.~Section~\ref{mapt}): > some0 :: Type a -> a > some0 Dots = dots > some0 t = fromSpine some0' %if False > some0' :: Spine a > some0' = undefined %endif But we cannot define |some0' :: Spine a|, because that would yield %{ %format = = " " %format v5 = " " \begingroup \inlinehs > v5=fromSpine some0' {-"\linebreak[1]"-}:: forall a. a {-"\fc"-} \endgroup %} which has to be |undefined| according to the parametricity theorem~\cite{wadler89theorems}. Due to the well-definedness of |fromSpine|, |some'| would have to be |undefined|, too. It is nevertheless possible to define |some :: Type a -> a|, but only if |Type| is augmented with more information %if not techreport about the type it represents and its constructors. Due to space limitations, the implementation is deferred to the technical report~\cite{syb0tr}, where we give an equivalent of |gunfold| from the second \SYB paper~\cite{syb2}. %else about the type it represents. In particular, it must be possible to obtain a full list of constructors in some suitable data structure from a type representation. If we pursue this path further, we end up with a specific data type to hold the constructors of a single type %format ::$ = "\mathbin{\raisebox{.15ex}{$\scriptstyle\oblong$}}" > data TypeSpine :: * -> * {-" "-} where > TypeAs :: a -> ConDescr -> TypeSpine a > (::$) :: TypeSpine (a -> b) -> Type a -> TypeSpine b This data type is almost the same as |SpineC|, only the second argument of |(::$)| is a |Type| instead of a |Typed|. A |TypeSpine| therefore contains no value except the constructor function itself. We can now write a function > constrs :: Type a -> [TypeSpine a] for each data type, but this function has to be defined for each data type separately, much like |toSpine|. Here is an example for the constructors of |Tree|: > constrs (Tree a) = [ Empty `TypeAs` "Empty", > Node `TypeAs` "Node" ::$ Tree a ::$ a ::$ Tree a ] {-"\fp"-} %if False > constrs (List a) = [ [] `TypeAs` "[]", > (:) `TypeAs` "(:)" ::$ a ::$ List a ] > constrs Int = [ n `TypeAs` Prelude.show n | n <- [0..] ] > -- a proper implementation would use interleave to get negatives as well > constrs (Pair a b)= [ pair `TypeAs` "(,)" ::$ a ::$ b ] %endif With |constrs|, we now can define |some| or even |read|: > some :: Type a -> a > some = some' . head . constrs > > some' :: TypeSpine a -> a > some' (TypeAs c _) = c > some' (f ::$ a) = some' f (some a) {-"\fp"-} %if False %format toSpine2 = toSpine %format toSpine2' = toSpine' Can |toSpine| can be written as a true generic function in terms of |constrs|? I think not. %endif Indeed, the |gunfold| function that is added to the set of predefined combinators in the second \SYB paper~\cite{syb2} is derived from the catamorphism on |Constr| much like |gfoldl| is derived from the catamorphism on |SpineC|. Unfortunately though, there is no relation between |gunfold| and an anamorphism on |SpineC|. %if False > foldTypeSpine :: (forall a . a -> r a) -> (forall a b . r (a -> b) -> Type a -> r b) -> > TypeSpine a -> r a > foldTypeSpine constr ($$) (TypeAs c _) = constr c > foldTypeSpine constr ($$) (f ::$ t) = (foldTypeSpine constr ($$) f) $$ t {-"\fp"-} This is almost the same as |gunfold|, which flips the first two arguments, and uses an abstract type |Constr| instead of |TypeSpine a|. %endif %endif We \emph{can}, however, define functions on multiple type arguments without further additions. The definition of generic equality is very straightforward using the spine view: > eq :: Type a -> Type b -> a -> b -> Bool > eq t1 t2 x y = eq' (toSpineC t1 x) (toSpineC t2 y) > > eq' :: SpineC a -> SpineC b -> Bool > eq' (_ `As` c1) (_ `As` c2) = c1 == c2 > eq' (f1 :$$ (a1 ::: t1)) (f2 :$$ (a2 ::: t2)) = eq' f1 f2 && eq t1 t2 a1 a2 > eq' _ _ = False {-"\fp"-} The generalized type of |eq| avoids the necessity of a type-level equality test. In the second \SYB paper, |eq| is defined in terms of a combinator called |zipWithQ|. %if techreport %format eq1 = eq > type Query2 r = forall a b. Type a -> Type b -> a -> b -> r > > zipWithQ :: Query2 r -> Query2 [r] > zipWithQ q t1 t2 x y = zipWithQ' q (toSpineC t1 x) (toSpineC t2 y) > > zipWithQ' :: Query2 r -> (forall a b. SpineC a -> SpineC b -> [r]) > zipWithQ' q (f1 :$$ (a1 ::: t1)) (f2 :$$ (a2 ::: t2)) = zipWithQ' q f1 f2 ++ > [q t1 t2 a1 a2] > zipWithQ' q s1 s2 = [] > > eqQ :: Query2 Bool > eqQ Int Int x y = x == y > eqQ Char Char x y = x == y > eqQ t1 t2 x y = toConDescr t1 x == toConDescr t2 y > > eq1 :: Query2 Bool > eq1 t1 t2 x y = and ([eqQ t1 t2 x y] ++ zipWithQ eq1 t1 t2 x y) {-"\fp"-} %endif Although we can mirror the definition of |zipWithQ|, we believe that the direct definition is much clearer. \subsection{Other views and their strengths and weaknesses} Let us now look at two other approaches to generic programming, PolyP and Generic Haskell. They are also based on overloaded functions, but they do not represent values using |Spine|. A different choice of view affects the class of generic functions that can be written, how easily they can be expressed, and the data types that can be represented. \subsubsection{PolyP} In PolyP~\cite{polyp}, data types of kind |* -> *| are viewed as fixed points of regular pattern functors. The regular functors in turn are of kind |* -> * -> *| and represented as lifted sums of products. The view makes use of the following type definitions: > data Fix f = In (f (Fix f)) > type LSum f g a r = Either (f a r) (g a r) > type LProd f g a r = (f a r, g a r) > type LUnit a r = () > type Par a r = a > type Rec a r = r {-"\fp"-} Here, |Fix| is a fixed-point computation on the type level. The type constructors |LSum|, |LProd|, and |LUnit| are lifted variants of the binary sum type |Either|, the binary product type |pair|, and the unit type |()|. Finally we have |Par| to select the parameter, and |Rec| to select the recursive call. %format =~ = "\cong " As an example, our type |Tree| has pattern functor |TreeF|: > data TreeF a r = EmptyF | NodeF r a r {-"\fp"-} We have (modulo |undefined|) that |Tree a =~ Fix (TreeF a)|. Furthermore, we can view |TreeF| as a binary sum (it has two constructors), where the right component is a nested binary product (|NodeF| has three fields). The recursive argument |r| is represented by |Rec|, the parameter to |Tree| by |Par|: > type TreeFS a r = LSum LUnit (LProd Rec (LProd Par Rec)) a r {-"\fp"-} Again, we have (modulo |undefined|) an isomorphism |TreeF a r =~ TreeFS a r|. The view of PolyP has two obvious disadvantages: first, due to its two-level nature, it is relatively complicated; second, it is quite limited in its applicability. Only data types of kind~|* -> *| that are regular can be represented. On the other hand, many generic functions on data types of kind~|* -> *| are definable. PolyP can express functions to parse, compare, unify, or print values generically. Its particular strength is that recursion patterns such as cata- or anamorphisms can be expressed generically, because each data type is viewed as a fixed point, and the points of recursion are visible. \subsubsection{Generic Haskell} In contrast to PolyP, Generic Haskell~\cite{loeh04exploring,ghcoral} uses a view that is much more widely applicable and is slightly easier to handle: all data types are (unlifted) sums of products. The data type |Tree| is viewed as the isomorphic > type TreeS a = Either () (Tree a, (a, Tree a)) {-"\fp"-} The original type |Tree| appears in |TreeS|, there is no special mechanism to treat recursion differently. This has a clear advantage, namely that the view is applicable to nested and mutually recursive data types of arbitrary kinds. In fact, in Generic Haskell all Haskell~98 data types can be represented. The price is that recursion patterns such as cata- or anamorphisms cannot be defined directly. On the other hand, generic functions in Generic Haskell can be defined such that they work on types of all kinds. It is therefore significantly more powerful than PolyP. In Generic Haskell we can, for instance, define a generic |map| that works for generalized rose trees, a data type of kind~|(* -> *) -> * -> *|: \begingroup \setlength{\belowdisplayskip}{-2pt}% > data Rose f a = Fork a (f (Rose f a)) {-"\fp"-} \endgroup \subsubsection{Scrap your boilerplate} The |Spine| view is not so much based on the structure of types, but on the structure of values. It emphasizes the structure of a constructor application. We have already noticed that this limits the generic functions that can be written. Pure producers such as |read| or |some| require additional information. Furthermore, all generic functions work on types of kind~|*|. It is not possible to define a generic version of |map| for type constructors, or to define a recursion pattern such as a catamorphism generically. But the |Spine| view also has two noticeable advantages over the other views discussed. Firstly, the view is simple, and the relation between a value and its spine representation is very direct. As a consequence, the transformation functions |fromSpine| and |toSpine| are quite efficient, and it is easy to deforest the |Spine| data structure. Secondly, as every (non-abstract) Haskell value is a constructor application, the view is very widely applicable. Not only all Haskell~98 data types of all kinds can be represented, the |Spine| view is general enough to represent data types containing existentials and even \GADTs without any further modifications. This is particularly remarkable, because at the moment, \GHC does not support automatic derivation of classes for \GADTs. The methods of the classes |Eq|, |Ord|, and |Show| can easily be defined using the \SYB approach. Thus, there is no theoretical problem to allow derivation of these classes also for \GADTs. We discuss this newly found expressive power further in the next section. %if False Example for all kinds: Here is an example for generalized rose trees: %format Type1 = Type %format Rose1 = Rose > data Type1 :: * -> * {-" "-} where > TDots1 :: Type1 tdotsend > Rose1 :: (forall a. Type1 a -> Type1 (f a)) -> Type1 a -> Type1 (Rose f a) %format toSpine1 = toSpine > toSpine1 :: Type a -> a -> Spine a > toSpine1 (Rose f a) = toSpineRose f a > toSpineRose :: (forall a . Type a -> Type (f a)) -> > Type a -> Rose f a -> Spine (Rose f a) > toSpineRose f a (Fork x ts) = Constr Fork :$ (x ::: a) :$ (ts ::: f (Rose f a)) [Note that we cannot use above approach with type classes, because we'd need higher-order (quantified) class constraints.] Possible problem for representation: types which have polymorphic components require polymorphic type representations, but that's an orthogonal issue. > data Monad m = Monad (forall a. a -> m a) (forall a b. m a -> (a -> m b) -> m b) %endif \section{Scrap your boilerplate for ``Scrap your boilerplate''}% \label{ssyb} There are almost no limits to the data types we can represent using |Spine|. One very interesting example is the \GADT of types itself, namely |Type|. This allows us to instantiate generic functions on type |Type|. Consequently, we can show types by invoking the generic |show| function, or compute type equality using the generic equality~|eq|! Both are useful in the context of dynamically typed values: > data Dynamic where > Dyn :: t -> Type t -> Dynamic {-"\fp"-} The difference between the types |Dynamic| and |Typed| is that |Dynamic| contains an existential quantification. Before we can actually use generic functions on |Dynamic|, we require that |Type| has a constructor for types and dynamic values: %format Type2 = Type %format Dynamic2 = Dynamic > data Type2 :: * -> * {-" "-} where > TDots2 :: Type2 tdotsend > Type2 :: Type2 a -> Type2 (Type2 a) > Dynamic2 :: Type2 Dynamic {-"\fp"-} The function |toSpine| also requires cases for |Type| and |Dynamic|, but converting types or dynamics into the |SpineC| view is entirely straightforward, as the following example cases demonstrate: %format toSpineC1 = toSpineC %if False > toSpineC1 :: Type a -> a -> SpineC a %endif > toSpineC1 (Type a') (Type a) = Type `As` "Type" :$$ (a ::: Type a) > toSpineC1 Dynamic (Dyn x t) = Dyn `As` "Dyn" :$$ (x ::: t) :$$ (t ::: Type t) {-"\fp"-} In the first line above, |a'| is always equal to |a|, but the Haskell type system does not know that, so we do not enforce it in the program. %if False > fromType :: Type a -> SpineC (Type a) > fromType (Tree a) = Tree `As` "Tree" :$$ (a ::: Type a) > fromType (Type a) = Type `As` "Type" :$$ (a ::: Type a) > fromType Dots = dots > fromType Int = Int `As` "Int" > fromType Char = Char `As` "Char" > fromType (List a) = List `As` "List" :$$ (a ::: Type a) > fromType (Pair a b) = Pair `As` "Pair" :$$ (a ::: Type a) :$$ (b ::: Type b) > fromDynamic :: Dynamic -> SpineC Dynamic > fromDynamic (Dyn x t) = Dyn `As` "Dyn" :$$ (x ::: t) :$$ (t ::: Type t) > dyn = Dyn (Node Empty 2 Empty) (Tree Int) %endif The output of %{ %format = = " " %format w0 = " " \begingroup %if not techreport \inlinehs %endif > w0=show Dynamic (Dyn (Node Empty 2 Empty) (Tree Int)) is now the string \invisiblecomments < {- ``\eval{putStr $ w0}''\fc -} and comparing the dynamic value to itself using |eq Dynamic Dynamic| yields indeed \eval{eq Dynamic Dynamic dyn dyn}, incorporating a run-time type equality test. \endgroup %} %if techreport \section{Class-based implementation}% \label{classbased} In this section, we combine the idea of the |Spine| view with an alternative way to express overloaded functions, using classes as described in the third \SYB paper~\cite{syb3}, short \SYB 3 in the remainder of this section. We assume knowledge with the material in that paper, as some of the techniques described there are quite subtle. The advantage of this presentation (which comes at the price of elegance) is that the resulting functions are extensible in a compositional way. New data types can be added, and functions adapted, just by providing a few additional instances. As an introduction, here is how the overloaded |sum0| function from Section~\ref{functionsontypes} is expressed using type classes: %format Sum2 = Sum %format sum2 = sum > class Sum2 a where > sum2 :: a -> Int > > instance Sum2 Int where > sum2 n = n > > instance Sum2 Char where > sum2 _ = 0 > > instance Sum2 a => Sum2 [a] where > sum2 xs = foldr (+) 0 (map sum2 xs) > > instance (Sum2 a, Sum2 b) => Sum2 (a,b) where > sum2 (x,y) = sum2 x + sum2 y > > instance Sum2 a => Sum2 (Tree a) where > sum2 t = sum2 (inorder t) {-"\fp"-} If we want to turn this into the generic variant of |sum|, as given near the end of Section~\ref{spineview}, while keeping the function extensible in the spirit of \SYB 3, we have to do a bit of preliminary work. The whole point of the exercise is to reuse the spine view, but to do without explicit type representations. It turns out that the main thing we need to be able to do is to turn values into their spine representation, so we define a |Data| class as follows: %format SpineX = Spine %format toSpineX = toSpine %format constrsX = constrs > class (Sat (ctx a)) => Data ctx a where > toSpineX :: a -> SpineX ctx a {-"\fc"-} which plays more or less the same role as the |Data| class in the \SYB papers. The intricacies of |Sat| and the |ctx| parameter are explained in \SYB 3, they have to do with tying the recursive knot for extensible generic functions. The definition of |Sat| (read: satisfy) is simply > class Sat a where > dict :: a {-"\fp"-} We can now specify the variant of the |Spine| data type we use here, where we use the |Data| class instead of the |Type| data type: %format AsX = As %format :$? = :$ > data SpineX :: (* -> *) -> (* -> *) where > AsX :: a -> ConDescr -> SpineX ctx a > (:$?) :: (Data ctx a) => SpineX ctx (a -> b) -> a -> SpineX ctx b {-"\fp"-} Defining instances of the |Data| class is without surprises: > instance Sat (ctx Int) => Data ctx Int where > toSpineX n = n `AsX` Prelude.show n > > instance (Sat (ctx (Tree a)), Data ctx a) => Data ctx (Tree a) where > toSpineX Empty = Empty `AsX` "Empty" > toSpineX (Node l x r) = Node `AsX` "Node" :$? l :$? x :$? r {-"\fp"-} %format Sum3 = Sum %format sum3 = sum %format sum3' = sum' Now we can start defining the generic |sum3| function. The main part of the function turns out to be simpler than in the non-generic case, because we need only one specific case, for integers: > class Sum3 a where > sum3 :: a -> Int > > instance Sum3 Int where > sum3 n = n {-"\fp"-} The generic case looks as follows: > instance Data SumD a => Sum3 a where > sum3 x = sum3' (toSpineX x) {-"\fp"-} The data type |SumD| is a \technical{dictionary} for the |sum3| function. We use it for recursive calls: > data SumD a = SumD { sumD :: a -> Int } > > sum3' :: SpineX SumD a -> Int > sum3' (c `AsX` _) = 0 > sum3' (f :$? x) = sum3' f + sumD dict x > > instance Sum3 a => Sat (SumD a) where > dict = SumD { sumD = sum3 } {-"\fp"-} This function works as the original generic |sum|, only that we do not have to provide a type argument. The call \begingroup \invisiblecomments %{ %format = = " " %format v6 = " " > v6=sum3 (Node Empty (17 :: Int) (Node Empty 25 Empty)) %} \endgroup evaluates to \eval{v6}. The function |sum3| is easy to extend: if we want to handle new data types, all we have to do is to define appropriate instances for the |Data| and |Sum3| classes. When moving to the classic \SYB combinators, we have to resort to proxies as explained in \SYB 3: %format mapQX = mapQ %format mapQX' = mapQ' %format everythingX = everything %format everythingX' = everything' %format QueryX = Query > data Proxy (ctx :: * -> *) -- empty type > > proxy :: Proxy ctx > proxy = error "proxy" > > type QueryX ctx r = forall a. Data ctx a => Proxy ctx -> a -> r > > mapQX :: QueryX ctx r -> QueryX ctx [r] > mapQX q _ = mapQX' q . toSpineX > > mapQX' :: QueryX ctx r -> (forall a. SpineX ctx a -> [r]) > mapQX' q (c `AsX` _) = [] > mapQX' q (f :$? x) = mapQX' q f ++ [q proxy x] > > everythingX' :: QueryX ctx r -> QueryX ctx [r] > everythingX' q p x = [q p x] ++ concat (mapQX (everythingX' q) p x) > > everythingX :: (r -> r -> r) -> QueryX ctx r -> QueryX ctx r > everythingX op q p x = foldl1 op ([q p x] ++ mapQX (everythingX op q) p x) {-"\fp"-} With these preparations in place, we can now write |sum3| as a query: %format SumQ2 = SumQ %format sumQ2 = sumQ %format sum4 = sum > class SumQ2 a where > sumQ2 :: a -> Int > sumQ2 _ = 0 > > instance SumQ2 Int where > sumQ2 n = n > > data SumQD a = SumQD { sumQD :: a -> Int } > > instance SumQ2 a => Sat (SumQD a) where > dict = SumQD { sumQD = sumQ2 } > > sum4 :: QueryX SumQD Int > sum4 = everythingX (+) (const (sumQD dict)) {-"\fp"-} The rest of the code in this paper can be adapted to a class-based approach using the same techniques as shown above. This demonstrates that the choice of how to represent overloaded functions is mostly independent of the usage of the spine view. %endif \section{Conclusions}% \label{conclusions} The \SYB approach has been developed by Peyton Jones and L\"ammel in a series of papers~\cite{laemmel03scrap,syb2,syb3}. Originally, it was an implementation of \technical{strategic programming}~\cite{visser2000language} in Haskell, intended for traversing and querying complex, compound data such as abstract syntax trees. The ideas underlying the generic programming extension PolyP~\cite{polyp} go back to the categorical notions of functors and catamorphisms, which are independent of the data type in question~\cite{backhousegp}. Generic Haskell~\cite{hinzempc} was motivated by the desire to overcome the restrictions of PolyP. Due to the different backgrounds, it is not surprising that \SYB and generic programming have remained difficult to compare for a long time. The recent work on \technical{generic views}~\cite{holdermans2005views,holdermans2005thesis} has been an attempt to unify different approaches. We believe that we bridged the gap in this paper for the first time, by presenting the |Spine| data type which encodes the \SYB approach faithfully. Our implementation handles the two central ingredients of generic programming differently from the original \SYB paper: we use overloaded functions with explicit type arguments instead of overloaded functions based on a type-safe cast~\cite{laemmel03scrap} or a class-based extensible scheme~\cite{syb3}; and we use the explicit spine view rather than a combinator-based approach. % Both changes are independent of each other, and have been made with clarity in mind: we think that the structure of the \SYB approach is more visible in our setting, and that the relations to PolyP and Generic Haskell become clearer. We have revealed that while the spine view is limited in the class of generic functions that can be written, it is applicable to a very large class of data types, including \GADTs. Our approach cannot be used easily as a library, because the encoding of overloaded functions using explicit type arguments requires the extensibility of the |Type| data type and of functions such as |toSpine|. One can, however, incorporate |Spine| into the \SYB library while still using the techniques of the \SYB papers to encode overloaded functions% %if not techreport \ (see the technical report~\cite{syb0tr} for more details)% %endif . In this paper, we do not use classes at all, and we therefore expect that it is easier to prove algebraic properties about SYB (such as |mapT copy = copy| where |copy _ = id| is the identity traversal) in this setting. For example, we believe that the work of Reig~\cite{reig:tfp:2006} could be recast using our approach, leading to shorter and more concise proofs. \paragraph{Acknowledgements} We thank Jeremy Gibbons, Ralf L\"ammel, Pablo Nogueira, Simon Peyton Jones, Fermin Reig, and the four anonymous referees for several helpful remarks. \bibliographystyle{splncs} %\clearpage %\addcontentsline{toc}{chapter}{\bibname} %if not techreport \afterpage{\enlargethispage*{\baselineskip}}% %endif \bibliography{Syb} \end{document}