P4P: A Syntax Proposal
This document proposes an alternate syntax for Racket. It reduces the parenthetical burden, makes the language appear more traditional, and respects indentation, though not in the way you think. It does all this while retaining the essence of Racket syntax, and even offering a high degree of tool reuse.
1 A Thought Experiment
(define (square x)
  (* x x))

define (square x)
  (* x x)

define (square x)
  (* x x)
  (+ x x)

define (square x)
  (* x x)
  (+ x x)

define (square x)
  (* x x)
  (+ x x)
Except that’s not what we’re going to do. Read on.
2 Examples
Before we dive into details, let’s see some running P4P programs. I intentionally show only code, not output. If there is any doubt as to what these programs mean (and you aren’t just trying to be ornery), P4P has failed.
defvar: m = 10

defvar: this-better-be-6 = add(1, 2, 3)

defvar: this-better-be-0 = add()

deffun: five() = 5

deffun: trpl(x) = add(x, x, x)

deffun: g(a, b, c) = add(a, b, c)

deffun: d/dx(f) =
  defvar: delta = 0.001
  fun: (x) in:
    div(sub(f(add(x, delta)),
            f(x)),
        delta)

deffun: fib(n) =
  if: numeq(n, 0)
    1
  elseif: numeq(n, 1)
    1
  else:
    add(fib(sub1(n)), fib(sub(n, 2)))

defstruct: memo has: (key, ans)

deffun: memoize (f) =
  defvar: memo-table = box(empty)
  fun: args in:
    defvar: lookup =
      filter(fun: (v) in:
               equal?(args, memo-key(v)),
             unbox(memo-table))
    if: empty?(lookup)
      do: (
        set-box!(memo-table,
                 cons (make-memo (args, apply(f, args)),
                       unbox(memo-table))),
        apply(f, args))
    else:
      memo-ans(first(lookup))

defvar: this-better-be-9 = {fun: (n) in: mult(n, n)}(3)

let:
  x = 3,
  y = 2
in:
  +(x, y)

let*: x = 3, y = x in: add(x, y)

letrec: even? = fun: (n) in: if: zero?(n) true else: odd?(sub1(n)),
        odd? = fun: (n) in:
                 if: zero?(n)
                   false
                 else:
                   even?(sub1(n))
in: list(odd?(10), even?(10))
3 The Central Idea
P4P hinges on one central idea and its consequences.
First, the idea: let’s get rid of implicit-begin-ness. Where we need a variable number of terms, write a do:. This small change suddenly eliminates the ambiguity that pervades Racket parsing and forces parentheses to clarify intent. The odds are that the extra typing engendered by do: will be offset by the reduction in typing parentheses.
Once we have made the syntax unambiguous without the help of parentheses, we can get rid of the parentheses themselves. That is, keywords like deffun: are sufficient to tell us what shape of terms to expect in legal programs. (Of course, every language construct must follow this property – even do:.)
(+ (memofib (sub1 n)) (memofib (- n 2)))]))))

+(memofib(sub1(n)), memofib(-(n, 2))))

IF false IF false THEN 2
4 Design
Now I present some design decisions and design choices. Decisions are those I believe in and would change only under duress; choices are points of flexibility where I can be talked into alternatives.
4.1 Decisions
4.1.1 Embracing Prefix
We remain unabashedly prefix. By doing so, we circumvent all decisions about precedence, binding, associativity, and so on. Some initial grumbling may ensue when confronted with code like +(1, 2), but this seems much less strange after you have seen append(list1, list2). Bootstrap anyway wants students to understand that exalted + is just another operation – just like lowly append.
4.1.2 Adopting Racket’s Token Syntax
By not permitting infix, we are free to be generous about token names: append-string, overlay/xy, and d/dx are available. However, there is no reason to preclude e^<i*pi>-1, either. In short, we use Racket’s token syntax, which will simplify interoperation with traditional, parenthesized Racket.
4.1.3 Keeping Parsing Predictable
Despite the lack of parentheses, the parser is top-down and syntax-directed. It needs lookahead in only one case, and then only a single token: when given an identifier in expression position, it must look ahead for a left parenthesis to determine whether or not this is an application. This is common in other languages too. If the input stream (file, REPL interaction, etc.) ends after the identifier, P4P treats it as a variable reference. (This ambiguity will affect tools like the kill-s-expression key-binding: if it faces an identifier, it will have to check whether the identifier is followed by an argument list.)
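The lookahead decision can be sketched in a few lines of Racket over the data that Racket's reader produces, where f(x) arrives as the symbol f followed by the list (x). This is a minimal illustration under that assumption, not P4P's implementation; `classify-head` and `arg-list?` are hypothetical names.

```racket
#lang racket
;; A comma reads as (unquote ...), which is not an argument list.
(define (arg-list? d)
  (and (list? d) (not (and (pair? d) (eq? (car d) 'unquote)))))

;; Decide what an identifier at the front of the stream starts.
(define (classify-head toks)
  (match toks
    ;; identifier immediately followed by a parenthesized list: application
    [(list (? symbol?) (? arg-list?) _ ...) 'application]
    ;; identifier followed by anything else, or by end of input: variable
    [(list (? symbol?) _ ...) 'variable-reference]
    [_ 'other]))

(classify-head '(add (1 2)))       ; 'application
(classify-head '(five ()))         ; 'application
(classify-head '(x = 3))           ; 'variable-reference
(classify-head '(x))               ; 'variable-reference
(classify-head '(x (unquote y)))   ; 'variable-reference
```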
One potential source of ambiguity is an application whose function position is a non-identifier expression. In such cases, the expression must be wrapped in braces, as in {fun: (n) in: mult(n, n)}(3). Because expressions in function position are uncommon, this is a small price to pay. Note that functions passed as arguments are bound to identifiers, so they will not suffer from this burden; the problem similarly disappears if the expression is first bound to a name (which might also clarify intent).
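Racket's reader reads braces as an ordinary list, but read-syntax records the delimiter in the 'paren-shape syntax property, which is one way a parser could tell a braced function expression from an argument list. The sketch below shows that property on the document's own example; whether P4P checks it this way is an assumption, and `curly?` and `read-all-syntax` are hypothetical names.

```racket
#lang racket
;; Was this syntax object written with {...} in the source?
(define (curly? stx)
  (eqv? (syntax-property stx 'paren-shape) #\{))

;; Read every datum in a string as a syntax object.
(define (read-all-syntax str)
  (define in (open-input-string str))
  (let loop ([acc '()])
    (define s (read-syntax 'example in))
    (if (eof-object? s)
        (reverse acc)
        (loop (cons s acc)))))

(define toks (read-all-syntax "{fun: (n) in: mult(n, n)}(3)"))
(length toks)            ; 2: the braced function expression, then (3)
(curly? (first toks))    ; #t
(curly? (second toks))   ; #f
```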
4.1.4 Leaving the Semantics Untouched
This is purely about syntax. The semantics of P4P is precisely that of Racket. For instance, the P4P equivalent of begin currently allows only a sequence of expressions; if Racket began to permit definitions before expressions, so would P4P. Even naming stays untouched: if tomorrow structure constructors were to no longer be preceded by make-, that would be just as true of P4P.
4.1.5 Attaining Arity Clarity
Function invocations are delimited. Therefore we need neither to fix arity a priori nor types to tell us what the arity will be. As a result, we can have functions that unambiguously consume a variable number of arguments, just as in Racket: +(1, 2), +(1, 2, 3), +(1), and +() are all legal P4P expressions with the expected meanings.
4.1.6 Adopting Indentation Without Semantics
I increasingly view emphasizing good indentation as critical. In some languages, however, indentation controls semantics. I view this as a mistake.
In P4P, instead, the semantics controls indentation: that is, each construct has indentation rules, and the parser enforces them. However, changing the indentation of a term either leaves the program’s meaning unchanged or results in a syntax error; it cannot change the meaning of the program. I believe this delivers the advantages of enforced indentation while curbing its worst excesses.
There is a pleasant side-effect to this decision: the parser can be run in a mode where indentation-checking is simply turned off. (Obviously, this is meaningless to do in a language where indentation controls semantics.) This can be beneficial when dealing with program-generated code. Thus, it preserves the Lisp tradition’s friendliness to generated code while imposing higher standards on human programmers.
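A switchable check can be sketched with a Racket parameter around a single rule. The sketch assumes SLGC means "on the same line as the reference term, or in a greater column", an interpretation inferred from the rule's name; `check-indentation?`, `check-slgc!`, and the other names are hypothetical.

```racket
#lang racket
(define check-indentation? (make-parameter #t))

;; Raise a syntax error unless `sub` satisfies SLGC relative to `ref`,
;; but only while indentation checking is switched on.
(define (check-slgc! ref sub)
  (when (check-indentation?)
    (unless (or (= (syntax-line sub) (syntax-line ref))
                (> (syntax-column sub) (syntax-column ref)))
      (raise-syntax-error 'p4p "sub-term must be to the right of its parent" sub))))

;; Read every datum in `str` as a syntax object with line/column information.
(define (read-all-syntax str)
  (define in (open-input-string str))
  (port-count-lines! in) ; without this, syntax-line and syntax-column are #f
  (let loop ([acc '()])
    (define s (read-syntax 'example in))
    (if (eof-object? s) (reverse acc) (loop (cons s acc)))))

;; Check the body (token 4, the +) of a deffun: against the deffun: keyword.
(define (check-deffun-body toks)
  (check-slgc! (first toks) (list-ref toks 4))
  'accepted)

(define good (read-all-syntax "deffun: f (x) =\n  +(x, x)"))
(define bad  (read-all-syntax "deffun: f (x) =\n+(x, x)"))

(check-deffun-body good)                                 ; 'accepted
(with-handlers ([exn:fail:syntax? (λ (e) 'rejected)])
  (check-deffun-body bad))                               ; 'rejected
(parameterize ([check-indentation? #f])
  (check-deffun-body bad))                               ; 'accepted
```

Turning the check off changes only whether errors are raised; the parse itself is the same either way, which is what makes the mode meaningful.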
4.1.7 Reusing the Tool Chain
P4P is implemented entirely using existing high-level Racket tools: it is defined entirely in terms of (a particular pattern of) syntax-case and some lower-level syntax-processing primitives. It does not define a lexer or LR-parser. I initially viewed this as a choice, but I have come to view this as a decision: this is the best way to ensure fidelity to Racket syntax.
4.1.8 Avoiding Optional Syntax
P4P does not have any optional syntax. I believe this makes it easier to teach people to program: they want clear instructions, not “You can do this, or you can do that...you can do whatever you want!” (If they were ready to do whatever they wanted, they wouldn’t be asking you.) These trade-offs are best left to semantic and design levels, not syntax. The only options in P4P are thus semantic choices: e.g., you can use or leave out elseif: terms in a conditional, but that is a function of your program’s logic, not your syntactic whimsy.
4.1.9 Avoiding New Spacing Conventions
While P4P’s spacing conventions can (and should) be understood in their own right, experienced Racket programmers can safely fall back on their knowledge of Racket syntax. This, for instance, tells us that both deffun: f(x) = x and deffun: f (x) = x are valid (and so, even, is deffun: f(x)= x), but deffun:f(x) = x and deffun: f(x)=x will not have the presumed intended effect. I do not view this as problematic: beginners (both educators and students) always ask about spacing conventions. Since using spaces around tokens is safe, there is an easy rule to follow, which also enhances readability. It would help for P4P’s parser to be sensitive to the presence of special tokens and build in context-sensitive checks for them (e.g., if the first token after the function header is an identifier that begins with =, this should be caught by a special error case that admonishes the user to insert a space).
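Because P4P reuses Racket's reader, the spacing cases above can be checked directly: parentheses always split tokens, while = and : are ordinary symbol characters. The helper name `p4p-tokens` is hypothetical; it simply reads every datum from a string.

```racket
#lang racket
;; Read every datum in a string, the way P4P source is read.
(define (p4p-tokens str)
  (port->list read (open-input-string str)))

(p4p-tokens "deffun: f(x) = x")   ; '(deffun: f (x) = x)
(p4p-tokens "deffun: f (x) = x")  ; '(deffun: f (x) = x)
(p4p-tokens "deffun: f(x)= x")    ; '(deffun: f (x) = x)
(p4p-tokens "deffun:f(x) = x")    ; '(deffun:f (x) = x): deffun:f is one symbol
(p4p-tokens "deffun: f(x)=x")     ; '(deffun: f (x) =x): =x is one symbol
```

The last case yields a symbol beginning with =, which is exactly the token the special error case suggested above would look for.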
4.2 Choices
4.2.1 Distinguishing Keywords
P4P uses colons at the end of keywords. I believe the principle of distinguishing keywords is beneficial: it tells the user, “You are about to use a construct whose basic syntax, rules of indentation, and rules of evaluation may all be different from what you expect.” The particular choice of colon is whimsical and free to change, though it was inspired by Python’s use of colons (which is somewhat different). P4P does not prevent ordinary program variables from ending in :, though it would be silently frowning as it processed programs that took advantage of this liberty.
4.2.2 Using Syntactic Embellishments
The = in defvar: and deffun: is not necessary, but adding it seemed to improve readability immensely. In particular, it emphasizes the substitution nature of these definitions.
There is no = in fun:; I chose in: instead. This is because the argument list does not equal the body, but rather is bound in it. The choice of in: is thus not entirely whimsical, but is very open to improvement. Likewise, defstruct: has no =; it uses has: instead, to emphasize that a structure has the following fields.
do: uses braces (rather than parens) to delimit its sub-terms. (Racket's reader treats a semicolon as the start of a comment, so semicolons between the terms of a do: could never be enforced; do: uses commas instead.)
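The semicolon point follows from how Racket's reader behaves, which a few lines can show. The helper `p4p-tokens` is a hypothetical name that reads every datum from a string.

```racket
#lang racket
;; Read every datum in a string, the way P4P source is read.
(define (p4p-tokens str)
  (port->list read (open-input-string str)))

;; The reader discards "; b" as a comment, so no parser built on it
;; can insist on semicolons between terms.
(p4p-tokens "do: (a; b\n c)")   ; '(do: (a c))

;; Commas survive: the list reads as (a (unquote b) (unquote c)).
(p4p-tokens "do: (a, b, c)")
```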
Using the def- prefix for the definition constructs leaves open fun: for anonymous functions.
The syntax of fun: feels a bit naked: one needs to really understand expression-ness to understand (beyond indentation) where a function ends. A pair of delimiters wrapping the entire body would reduce this anxiety.
if: does not need any intermediate keywords at all. In their absence, however, the programmer would be reduced to counting the number of preceding expressions and checking parity to know what they were looking at. Intermediate keywords improve both readability and error-reporting (which are probably linked).
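A sketch of how the intermediate keywords guide the parser: each keyword tells it exactly what comes next, so no counting is needed. To keep it short, every test and branch here is a single datum (a real parser would call a full expression parser at those points), and the translation to cond is an illustrative assumption; `parse-if` is a hypothetical name. It returns the tree and the unconsumed rest of the stream.

```racket
#lang racket
;; if: test then {elseif: test then} else: alt
(define (parse-if toks)
  (match toks
    [(list 'if: test then more ...)
     (let loop ([clauses (list (list test then))] [more more])
       (match more
         [(list 'elseif: test then after ...)
          (loop (cons (list test then) clauses) after)]
         [(list 'else: alt after ...)
          (values `(cond ,@(reverse clauses) [else ,alt]) after)]
         [_ (raise-syntax-error 'if: "expected elseif: or else:")]))]))

(define-values (tree remaining)
  (parse-if '(if: a 1 elseif: b 2 else: 3 next-definition)))
tree        ; '(cond (a 1) (b 2) (else 3))
remaining   ; '(next-definition)
```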
4.2.3 Handling Variterm Constructs
Some constructs, such as Racket’s cond, begin, and when, contain a variable number of body terms. This makes it challenging to keep their parsing simple and predictable. I see two broad ways to handle these: what I call if:-style and do:-style. do:-style is the lazy option: it uses a delimiter pair (specifically, brackets) and brutally dumps the terms between the delimiters. if:-style instead uses carefully-designed intermediate keywords as guideposts to the parser. The brutality of the do:-style could be reduced by the use of intermediate keywords, but at that point the delimiters wouldn’t be necessary any longer. (They wouldn’t be necessary, but they may still be helpful, as the number or size of sub-terms grows large.) Constructs like when:, which frequently have multiple, imperative body terms, would be better served by the brutalist style, because otherwise programmers would have to write an additional do: inside the single body term most of the time.
4.2.4 Avoiding Closing Delimiters
Nothing in the language design precludes closing delimiters. However, because parsing is always predictable, there is no need for them, either (except for variterm constructs). Offering them could improve error reporting.
4.2.5 Not Specifying the Indentation of Parenthetical Pairs
P4P currently does not enforce any indentation convention on parenthetical constructs. Indeed, I wonder to what extent the Scheme antipathy towards putting closing delimiters on separate lines is because of just how many darn ones there are. If the only closing delimiters are for constructs that need them (such as do:), it may even – gasp – be good style to put them on distinct lines, lining up with the opening keyword.
5 Indentation...Rules!
There are only three indentation rules in P4P: SLSC, SLGC, and SLGEC. These stand for same-line-same-column, same-line-greater-column, and same-line-greater-equal-column, respectively. As you read more about these, you may find them insufficiently restrictive. Keep in mind that indentation rules are contravariant to language size: sub-languages (such as teaching languages) can enforce many more restrictions on lines and columns.
+(1, 2,
  dbl(4),
  dbl(dbl(8)))
SLSC is used more rarely, when we want rigid alignment. Currently, only if: uses SLSC for its internal keywords (elseif: and else:).
let: x = 3
in: +(x, x)

let: x = 3
in: +(x, x)

deffun: f (x) =
  +(dbl(dbl(x)),
    dbl(x))

defvar: mfib =
  memoize(
    fun: (n) in:
      ...

deffun: f (x) =
  +(dbl(dbl(x)),
    dbl(x))

deffun: f (x) =
  +(dbl(dbl(x)),
    dbl(x))

if: test1
  e1
elseif: test2
  e2
else:
  e3

if: test1
  e1
elseif: test2
  e2
else:
  e3
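The three rules SLSC, SLGC, and SLGEC can be sketched as predicates over the (line, column) positions of a reference term and a sub-term. The reading used here, that each rule is satisfied either by being on the same line as the reference term or else by the stated column relationship, is an assumption inferred from the rule names; the `pos` struct and the function names are hypothetical.

```racket
#lang racket
(struct pos (line col) #:transparent)

(define (same-line? ref sub) (= (pos-line ref) (pos-line sub)))

(define (slsc? ref sub)   ; same-line-same-column
  (or (same-line? ref sub) (= (pos-col sub) (pos-col ref))))
(define (slgc? ref sub)   ; same-line-greater-column
  (or (same-line? ref sub) (> (pos-col sub) (pos-col ref))))
(define (slgec? ref sub)  ; same-line-greater-equal-column
  (or (same-line? ref sub) (>= (pos-col sub) (pos-col ref))))

;; Reference term at line 1, column 0 (e.g. an if: keyword).
(slsc? (pos 1 0) (pos 3 0))    ; #t: aligned with the reference
(slsc? (pos 1 0) (pos 3 2))    ; #f: not aligned
(slgc? (pos 1 0) (pos 1 9))    ; #t: same line
(slgc? (pos 1 0) (pos 2 0))    ; #f under SLGC
(slgec? (pos 1 0) (pos 2 0))   ; #t under SLGEC
```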
6 On Groves and Brooks (or, Trees and Streams)
The Lisp bicameral syntax tradition is based on processing trees. The parentheses chunk tokens into well-formed trees, and the parser chunks these into valid trees. It’s parentheses – and thus trees – all the way down.
Except, it isn’t. A file is not a tree. Thus, sitting outside every Lisp parser of popular imagination is another parser that operates, instead, on streams.
Happily, Racket provides a middle ground: a file written in a #lang needs no explicit wrapper, yet #%module-begin receives its contents back as a single tree.
This mapping enables the P4P parser to leverage the Racket macro system to bootstrap. P4P removes tokens sequentially, using a slack term in every pattern to match the rest of the stream; each construct’s parser returns a tree and what remains of the stream after it is done processing.
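The stream-processing pattern can be sketched at run time with match on plain data, so that it runs on its own; P4P itself works with syntax-case on syntax objects, so this is an analogue, not its code. Each clause matches one construct's shape at the front of the token list, a slack term (`more ...`) stands for everything after it, and the parser returns its tree together with that remainder. Right-hand sides are single data to keep the sketch short, and `param-list`, `parse-definition`, and `parse-program` are hypothetical names.

```racket
#lang racket
;; Turn a parameter list as read, e.g. (a (unquote b)), into (a b);
;; return #f if it is not of that shape.
(define (param-list lst)
  (match lst
    ['() '()]
    [(list (? symbol? p) (list 'unquote (? symbol? ps)) ...) (cons p ps)]
    [_ #f]))

;; Parse one definition from the front of the stream; return tree and rest.
(define (parse-definition toks)
  (match toks
    [(list 'defvar: (? symbol? name) '= rhs more ...)
     (values `(define ,name ,rhs) more)]
    [(list 'deffun: (? symbol? name) (app param-list (? list? params)) '= body more ...)
     (values `(define (,name ,@params) ,body) more)]
    [_ (error 'parse-definition "unexpected tokens: ~e" toks)]))

;; Peel definitions off the front of the stream until nothing is left.
(define (parse-program toks)
  (if (null? toks)
      '()
      (let-values ([(tree more) (parse-definition toks)])
        (cons tree (parse-program more)))))

(parse-program
 (port->list read (open-input-string "defvar: m = 10 deffun: g(a, b) = a defvar: n = m")))
; '((define m 10) (define (g a b) a) (define n m))
```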
Oh, and commas. Of course, the Racket tokenizer converts commas to unquote. In Racket, the unquote is followed by a single tree; in P4P, it is followed by an arbitrary undelimited expression. So P4P lets Racket turn commas into unquotes, and then simply returns the subsequent tree (in Racket’s terms) to the front of the token stream, for continued P4P parsing.
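For example, Racket reads add(x, f(y)) as the symbol add followed by the list (x (unquote f) (y)): the comma captured only the next tree, f, and left its argument list behind. The sketch below rebuilds P4P arguments by moving each unquoted tree back onto the front of the stream. It is limited to atoms and applications, and `unquoted?`, `parse-expr`, and `parse-args` are hypothetical names.

```racket
#lang racket
;; A comma reads as (unquote ...), which is not an argument list.
(define (unquoted? d) (and (pair? d) (eq? (car d) 'unquote)))

;; Parse one expression from the front of the stream: an identifier followed
;; by a parenthesized list is an application; anything else is one datum.
(define (parse-expr toks)
  (match toks
    [(list (? symbol? f) (? (λ (d) (and (list? d) (not (unquoted? d)))) args) more ...)
     (values (cons f (parse-args args)) more)]
    [(list atom more ...)
     (values atom more)]))

;; Parse a comma-separated argument list as read by Racket.
(define (parse-args toks)
  (if (null? toks)
      '()
      (let-values ([(arg more) (parse-expr toks)])
        (match more
          ['() (list arg)]
          ;; The comma captured only the next tree: push it back onto the
          ;; front of the stream and keep parsing from there.
          [(list (list 'unquote tree) after ...)
           (cons arg (parse-args (cons tree after)))]
          [_ (error 'parse-args "expected a comma before: ~e" more)]))))

(define toks (port->list read (open-input-string "add(fib(sub1(n)), fib(sub(n, 2)))")))
(define-values (tree leftover) (parse-expr toks))
tree   ; '(add (fib (sub1 n)) (fib (sub n 2)))
```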
7 Error Reporting
I have, as yet, invested (almost) no time in error messages.
By being a macro over existing Racket, P4P inherits much of Racket’s context-sensitive error-reporting. Naturally, having additional clauses in P4P can improve error checking. For instance, in the current implementation, deffun: f "var" = 3 and deffun: f(3) = 3 happen to be caught by P4P itself (which highlights the appropriate term), while other errors pass through to Racket, using its error messages and highlighting. (The expression 3(4) ought to demonstrate this, but currently fails with an internal error instead.)
Because P4P’s parsing is done through streams rather than trees, it is unclear how much of Ryan Culpepper’s infrastructure for strengthening tree-based patterns to insert error checks will apply here. It is more likely that something analogous needs to be created for stream processing. In the best case, of course, Ryan’s work will carry over unchanged. Either way, this will be a fruitful area for further examination.
Finally, one known problematic case is a comma-separated list with no term between its last comma and the closing parenthesis (e.g., f(x, y,)). Racket's reader, which P4P reuses, rejects a comma that has nothing to unquote, so this error surfaces before P4P ever sees the program. This is an unfortunate consequence of P4P’s attempt to reuse the Racket toolchain, and will need special support. This is a place where EOPL’s sllgen parser has no problems, because it natively implements both scanner and parser.
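The failure can be reproduced with Racket's reader alone, which shows why P4P never gets a chance to report it. The helper `reads-ok?` is a hypothetical name.

```racket
#lang racket
;; Does the whole string read without a reader error?
(define (reads-ok? str)
  (with-handlers ([exn:fail:read? (λ (e) #f)])
    (port->list read (open-input-string str))
    #t))

(reads-ok? "f(x, y)")    ; #t
(reads-ok? "f(x, y,)")   ; #f: the reader rejects a comma followed by ")"
```

A P4P-specific message would therefore have to intercept the reader's exception (or pre-scan the source) rather than come from the stream parser.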
8 Syntax Extensions
It would be easy to add new constructs such as provide:, test:, defconst: (to distinguish from defvar:), and so on.
The current design of P4P also does not preclude the addition of syntactic enhancements such as type declarations, default argument values, and so on. It is presumably also possible to add support for Racket keywords and reader extensions.
One particularly curious form of syntactic extension would be to use fully-parenthesized terms in some contexts. For instance, we might add a racket: construct that is followed by a fully-parenthesized s-expression. Because of the nature of P4P’s syntax, this can be done without any ambiguity. One might even, say, decide to use P4P syntax to define macros for parenthetical Racket; the P4P versions of syntax-rules or syntax-case can exploit P4P’s parenthetical sparsity except for the patterns themselves, which would be fully-parenthesized as they would in traditional Racket (and in the source they process).
The macro definer has to understand the stream-processing pattern, which is different from traditional tree-shaped macro processing.
Even more importantly, the macro writer undertakes to create a construct that does not introduce syntactic ambiguity – a property that is guaranteed in Racket, but earned in P4P. (To be clear, a new Racket macro can be ambiguous: imagine an infix macro, which requires precedence rules for disambiguation. However, this ambiguity is limited to the inside of the new construct, and cannot affect terms past the closing parenthesis. In P4P, the effect may leak past the end of the construct.)
The macro writer needs to check indentation. This may require a pattern language that is indentation-sensitive.
The output of the macros will, by default, interact with the indentation checking of the underlying P4P language. One option is to have the macros respect this, though it will likely make them too difficult to write (because any loss of source location would leave the underlying P4P parser unable to perform checks, and hence forced to reject the program). A second option is to generate code in a P4P variant that doesn’t check indentation. A third, perhaps best, solution would be to generate Racket code directly, just as the current P4P does: that is, the macro system would be an attached-at-the-hip, cooperating twin of P4P, rather than a layer atop it.
9 Conclusion
Racket has an excellent language design, a great implementation, a superb programming environment, and terrific tools. Mainstream adoption will, however, always be curtailed by the syntax. Racket could benefit from liposuction, stomach stapling, or just plain getting off the couch and getting out for a ride, to reduce the layers of parenthetical adipose that – as this document argues, needlessly – engird it. P4P is a proposal for how to do this without losing the essential nature of the Lisp syntactic heritage (and, indeed, bringing to the surface the streaming nature that has always been hidden within).