From here on out, you'll be implementing , a subset of R5RS, in C++.
This week, you will implement the parser and enough of the object model to test your parsing strategy. In a later assignment, you will implement the evaluation rules that are necessary for turning a parsed expression into an interpreted program.
We have already provided some parts of the interpreter for you, in particular, the lexer and an object model. Take some time to review these sections to understand how they work. You may be surprised by what you do (or don't) need to do.
An interpreter requires several individual components working together, each of which handling a facet of the read-eval-print loop. The interpreter interacts with the user through a driver loop, which typically hands off most input to the parsing subsystem to produce logical structure from a stream of characters. The structured data that results from parsing can then be evaluated using ``evaluation rules''.
Before you begin to think about the process of taking user input and turning it into data that can be evaluated, you must have some idea of what that data should look like. In particular, the basic unit processed by the interpreter is the ``s-expression'' (``symbolic'' expression). An s-expression can be either a single element, such as a symbol or number, or a list of sub-expressions.
Each type of thing that makes up an s-expression needs to be represented by . In true object-oriented style, all possible components of an s-expression are objects of the type value, which is an abstract class that implements an interface for, at the moment, getting information about a particular value or printing the value. (For instance, in assignment 8, you must implement the insert method for each sub-type of value in order to print to a given output stream.)
Every value type implements insert and type, which returns a tag indicating the subclass of the item. This will come in handy when different types need to be handled differently--when will this show up on assignment 8?
S-expressions will be composed entirely of pointers to values; that is, they will be value *s and a given s-expression will itself be a value *.
As you've seen in the past, nil, (), has special meaning in Scheme, as
it marks the end of every proper list. There is also no reason for
there to be more than one instance of nil. To provide this uniqueness
property, the nil class implements an interface that allows access
to one, and only one, instance of the class. This ``singleton'' design
pattern means that only one copy of nil will exist. You may remember
that we mentioned Design Patterns two weeks ago in section. The
Singleton is one example of a Design Pattern. Refer back to Section Notes 7 for
pointers to some resources that discuss Design Patterns.
You can access this
copy via the static method nil::instance of class nil.
Like the empty list, there are a restricted number of booleans in the world. In particular, #t and #f. You can access these two objects via the static methods t and f of the class boolean.
In , there are only longs; you can't use floating point numbers. The number class is used to represent numbers in s-expressions. For a given numerical value, it is possible that there are two different value * objects that store the same number. How might you get around this?
Symbols are actually quite interesting; a given symbol is distinguished by name, so it will eventually be important that for each name, we always get back the same symbol. For the current assignment, you don't need to worry about this part, but you will need to respect an interface that allows us to later change how things work. Therefore, we have provided yet another static method, intern, that takes a symbol name and returns the associated symbol *.
In assignment 9, having used the intern interface already will allow us to modify how symbols are created without having to change our parsing strategy.
Finally, pairs are the heart of structured data in Scheme. Pairs are what make programming in Scheme possible, since they are the only means of combination provided by the language. Without them, we would have nothing more than an inert sequence of tokens.
A pair is composed of a car and a cdr, each of which are value *s. Your parser will probably need to create these and most parsed s-expressions will be returned as pair *s.
The printing routine for a pair is probably the most interesting, in particular because you must be able to handle both proper and improper lists (those that do not end with nil).
You won't be working with procedures yet in assignment 8. Later, we'll see that procedures get subclassed into two different types, one for primitive (built-in) procedures and another for user-defined procedures (those created with lambda).
For this reason, a mechanism for checking the type of a value * is an important piece of infrastructure. This component has the following requirements:
To meet the third requirement, you will have to perform what is known as a dynamic down-cast. Down-casting means converting a base class type to a child class type (as you know, the opposite conversion is always valid). The C++ syntax is
value *someFunction(value *arg)
{
pair *p;
...
p = dynamic_cast<pair *>(arg);
...
}
The dynamic_cast operator is useful because it will return NULL if the argument is not, in fact, of the correct type. However, needing to do a dynamic cast, NULL check, and raising an exception for every type check violates our first requirement that the code be concise. Abstraction is warranted. You should come up with a way to wrap this functionality more conveniently. Hint: a template might be useful.
The ``lexer'' and ``parser'' work together to take input from a user and turn it into a value * of the proper form.
The ``parser'', built on top of the lexer, adds structure to the stream of tokens by creating an s-expression, which is then handed back to the driver loop which can either print the expression immediately (using the overloaded « operator) or evaluate it. For assignment 8, we just print it, but this does not mean that your program will simply echo back the characters that are sent to it. Parsers allow us to represent languages in a structured format that permits interpretation and transformation as we will discuss by example in this section.
Your interpreter code had better be pretty and, ideally, elegant and easy to read. This is especially true because you will be working with your code for two assignments. Design mistakes or poor style will be more difficult to recover from because you'll be stuck with them (and, potentially, bugs caused by them).
This document was generated using the LaTeX2HTML translator Version 2002-2-1 (1.71)
Copyright © 1993, 1994, 1995, 1996,
Nikos Drakos,
Computer Based Learning Unit, University of Leeds.
Copyright © 1997, 1998, 1999,
Ross Moore,
Mathematics Department, Macquarie University, Sydney.
The command line arguments were:
latex2html -no_navigation -split 0 sec9.tex
The translation was initiated by CS51 Course Library on 2008-04-18