Harvard University, FAS

Home
Syllabus
Lecture Notes

ML Resources
MIPS and SPIM

Computer Science 153
Principles of Programming Language Compilation

Problem Set 2: Building a Front-End for Fish

Due Fri. Oct 2, 2009, 11am


Overview:

The goal of this problem set is to get you used to building lexers and parsers for semi-realistic languages using both parsing combinators as well as ML-Lex and ML-Yacc. In particular, you (and your partner if any) will build two lexers and two parsers (one using parsing combinators, the other using Lex and Yacc) for a small language called Fish, which stands for Fortran-ish. Fish is intended to have syntax similar to that of C and Java and in fact, we'll soon scale Fish up to have more features of these languages.

In the past, students have struggled with this project, so it's very important to start early and seek help if you need it.

Downloading and Compiling Instructions

Create a directory ps2. Download the fish.zip zip file and unzip it within the ps2 directory. You should find a directory fish that contains all of the files that you will need for this assignment.

The files are incomplete (you'll be modifying comblexer.sml, combparser.sml, lexer.lex, and parse.grm) but you should be able to compile them. To do so, start SML in your fish directory (or else use the command OS.FileSys.chDir "path-to-your-fish-directory") and issue the command:

  CM.make "sources.cm"
The Compilation Manager (CM) works much like the Unix make facility except it is smarter. It analyzes all of the files specified in sources.cm, figures out dependencies, and what needs to be (re-)compiled, does so, and then loads the results into the SML top-level. Every time you issue this command, it is as if you re-compiled and re-loaded all of the SML sources that make up the project.

The file ast.sml contains the abstract syntax for Fish programs. The file eval.sml contains an interpreter for Fish programs. The file hash.sml contains a little hash-table library that I used in the interpreter.

The file fish.sml contains the top-level structures defined by the project. The structure FishComb contains simple definitions that will call your combinator-based lexer and parser to build an interpreter. The structure FishYacc contains the definitions needed to glue together the Lex and Yacc-based lexer and parser to build an interpreter.

Fish Syntax

The Fish language is very simple and only includes integer expressions and statements. Here is an example Fish program:

  /* this is a comment */
  x = 42;
  y = 31;
  for (s = 0; x > 0; x = x - 1) {
    y = y + 1;
  }
  return y;
More example Fish programs can be found here. As you can see, the syntax and semantics of Fish programs is intended to follow that of C (and Java). Your job is to write a lexer and parser for Fish, first using the parsing combinators described in class, and then using ML-Lex and ML-Yacc. In the interest of not giving too much away, I've intentionally not specified the language's syntax carefully. To a large degree, this is up to you. However, here are some guidelines that I expect you to follow:
  • Comments, as in C, should start with "/*" and end with "*/". Comments in C do not nest.

  • Expressions are grouped with parentheses, whereas statements are grouped with braces "{" and "}".

  • You should follow the usual precedence and associativity for arithmetic and logical operations. For instance, * and / bind tighter than + and -, and arithmetic binds tighter than comparisons, and comparisons bind tighter than && and ||, and finally, assignment comes last in the precedence.

  • Identifiers must start with a character, and can include characters, digits, or underscores.

  • Numbers are restricted to decimal integers. Note that your parser should support things like "-42". How you handle this is up to you...

  • C lets you have an "empty" statement -- for instance:
        for (x = 0; x < 10; x = x + 1) /* empty statement */ ;
    
    You can use "skip", defined in ast.sml to represent an empty statement.

  • Your grammar should have no conflicts. So you'll have to resolve the "dangling else" problem as discussed in class.

  • You should make sure to test your parser(s) extensively to make sure they are behaving correctly. (See here for some tests.)

Part I: Using Parsing Combinators

The file combinators.sml contains the parsing combinator definitions that we saw in class and which you should use to define your first lexer and parser for Fish. The files comblexer.sml and combparser.sml contain dummy definitions for the lexer and parser. Modify these files so they correctly parse Fish programs. You can put in dummy values for position information for this parser.

Lexer Running Too Slow? The provided combinator library is eager which is quite inefficient since we're only concerned with the first parse. This is a lazy version of the same combinator library. Just overwrite combinators.sml in the fish.zip file and recompile.

Part II: Using Lex and Yacc

The files lexer.lex, parse.grm, and fish.sml contain the boilerplate code that you'll need for your Lex and Yacc-based front-end. You really only need to write the regular expression definitions in lexer.lex to define tokens, and the productions and semantic actions in parse.grm to define your parser.

Submitting your work

First, make sure that you and your partner's name (if any) are on the modified files in a comment. Second, only one of you needs to submit something. However, if you forget to put your names on the submission, we won't know who to assign credit (or more properly, we won't know who you partnered with.)

Please use the CS153 submission script provided on the nice servers at ~lib153/bin/cs153submit. For each assignment, you should create a new subdirectory to hold all of the supplied files along with any file you create. When you have complete the assignment, submit your work with the following command:

# ~lib153/bin/cs153submit N dir
where N is the assignment number, and dir the directory containing the files for the assignment. For example, if you are using a directory called ps2 for this assignment, you can submit your work with the command.
# ~lib153/bin/cs153submit 2 ps2

You can submit your work as many times as you like. Only the last submission will be graded. You can view your current submission using the submit command:

submit ls lib153 2
Show the time-stamp and size of your last submission for assignment 2.
submit contents lib153 2
Show the contents of your last submission for assignment 2.

For more information, see man submit on the nice servers.