CS 51 Week 4, L1: Sttttrrrrreeeeeeeaaaaaaaa


Today's topics:

(define mysolutions 
  (stream-filter (lambda (board) (match? board mypuzzle))
                 (stream-filter valid-board?
                                (make-stream-all-possible-boards 9))))


A Sudoku grid is a special kind of Latin square. Latin squares were defined and named by the 18th century mathematician Leonhard Euler. They are n by n matrices that are filled with n symbols in such a way that the same symbol never appears twice in the same row or column. The standard Sudoku grid is a 9x9 Latin square with the additional constraint that each of the nine subgrids does not contain duplicate numbers either. The first Sudoku puzzle is believed to have appeared in the May 1979 edition of Dell Pencil Puzzles and Word Games and is believed to have been invented by a retired architect named Howard Garns.




Streams as a Data Abstraction

We have seen how useful lists can be as a data structure. Here are some interesting lists: the list of all prime numbers, the list of all valid sudoku boards, the trajectory (list of points) of a gas particle. These are all very large (if not infinite) lists. Yet many interesting questions in these domains can be naturally phrased as list questions.

Today we will see how to build lists using delayed evaluation - in other words, lists that get generated only "just in time" to get used. This allows us to maintain the same conceptual model of lists, while efficiently managing large data.

We will use delay and force to create a new data structure called a "stream". You can think of a stream as a delayed list.

But first, let's see how we can use lambda to create delay.

Using Functions to Procrastinate

We can use lambda to implement a kind of "laziness" where we "delay" evaluating something. Notice that making a regular procedure is in fact an exercise in delayed evaluation! For example, we delay computing "square" until we know what argument to compute it on. Occasionally we may want to delay evaluation for other reasons.

(define foo
       (lambda () (some-computation)))    <--- create a procedure obj of NO ARGUMENTS

> foo   <--- returns a procedure object
> (foo) <--- executes the computation

One example of such a function is "newline" - it takes no arguments, but when you apply it, it writes a newline to the screen.

The value of "foo" goes by many names: a promise, a delayed object, or a thunk.

We will create promises like foo, by using delay. We cause their execution by calling force. Here is a simple way in which delay and force may work.

(delay expr) === (lambda () expr)             ;; syntactic sugar

(define (force delayed-obj) (delayed-obj))    ;; force just applies the function

Note that delay, like define, let, and if, is a special form. We do not want to evaluate "expr" before we manage to delay it, so we cannot define delay as an ordinary function of our own. We could, however, write the lambda by hand wherever we would use delay, so delay is simply a syntax transformation implemented by the interpreter. Force, on the other hand, can be implemented directly as a function.
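A small experiment makes the point about special forms concrete. The sketch below is only an illustration: the counter "evaluations" and the helpers "noisy-value" and "bad-delay" are hypothetical names, not part of the lecture code. It shows that an ordinary function receives an already-evaluated argument, whereas delay postpones the evaluation until force.

```scheme
;; Hypothetical names for illustration: evaluations, noisy-value, bad-delay.
(define evaluations 0)

(define (noisy-value)                  ; counts how often it is run
  (set! evaluations (+ evaluations 1))
  42)

;; An ordinary function: its argument is evaluated BEFORE the body runs,
;; so wrapping x in a lambda comes too late to delay anything.
(define (bad-delay x) (lambda () x))

(define p1 (bad-delay (noisy-value)))  ; noisy-value runs right here
(define after-bad-delay evaluations)   ; 1

(define p2 (delay (noisy-value)))      ; nothing runs yet
(define after-real-delay evaluations)  ; still 1

(define forced-value (force p2))       ; now noisy-value runs
(define after-force evaluations)       ; 2
```

The counter only moves a second time when p2 is forced, which is exactly the behavior a function-based delay cannot provide.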

What can we do with delay and force? We can represent expensive computations and expensive data, pass them around, and not evaluate them until absolutely necessary.

What's an example of an expensive piece of data?

An Infinite List !!


Streams as Delayed Lists

Now we will use "delay" and "force", which is our abstraction barrier for creating and accessing promises. We will use promises to create delayed pairs, and from there create delayed lists. We can think of these pairs and lists as being lazy -- they don't evaluate things up front but wait until we try and claim our promise.

"PROMISE" (Recap)
(Delayed Object / "Thunk")

Constructor

(delay expr) --> "PROMISE[expr]"

Selector

(force "PROMISE[expr]") --> value of expr

Contract

(delay expr) => expr is not evaluated
(force (delay expr)) = value of expr

A Possible Implementation

(delay x) syntactic sugar for (lambda () x)

(define (force y) (y))

The actual implementation and CONTRACT for a promise in Scheme is more involved, and you will hear more about this during section.
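One difference that standard Scheme does guarantee is that a promise remembers its value: the delayed expression runs at most once, and later calls to force return the saved result. A bare lambda, as in the possible implementation above, reruns the expression every time it is applied. The following is a minimal sketch of that contrast; the counter "runs" and the helper "expensive" are hypothetical names used only for this illustration.

```scheme
;; Hypothetical names for illustration: runs, expensive, plain-lambda.
(define runs 0)

(define (expensive)                     ; stands in for a costly computation
  (set! runs (+ runs 1))
  (* 6 7))

;; A real promise: evaluated at most once.
(define p (delay (expensive)))
(define first-result (force p))         ; 42, runs the computation
(define second-result (force p))        ; 42, reuses the saved value
(define runs-with-promise runs)         ; 1

;; A bare lambda: evaluated on every call.
(define plain-lambda (lambda () (expensive)))
(define third-result (plain-lambda))
(define fourth-result (plain-lambda))
(define runs-with-lambda (- runs runs-with-promise)) ; 2
```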

Once we have promises we can create delayed pairs. A delayed pair is a pair whose second element contains a promise.

    > (define a (cons 3 (delay (+ 4 5))))

    > (car a)
    3

    > (cdr a)
    #struct:promise

    > (force (cdr a))
    9

Pairs seem to behave as before, except that when we access the second element, we need to "force" it in order to see what the second element really is. We can use this idea to build an infinite list of numbers.

Streams as Infinite Lists

First let's create a regular finite (but long) list of integers.

(define (enumerate start end)
  (if (> start end)
      '()
      (cons start (enumerate (+ start 1) end))))

(define finite-integers (enumerate 0 2000))

Now let's create our infinite list of integers.

(define (enumerate-stream start)
   (cons start (delay (enumerate-stream (+ start 1)))))

(define integers (enumerate-stream 0))

Now we can compare both kinds of lists.

FINITE INTEGERS (list)
(define finite-integers (enumerate 0 2000))

> finite-integers
(0 1 2 3 ....... 2000)

> (car finite-integers)
0

> (car (cdr finite-integers))
1

> (car (cdr (cdr finite-integers)))
2

INFINITE INTEGERS (stream)
(define integers (enumerate-stream 0))

> integers
CONS[0,promise]

> (car integers)
0

> (car (force (cdr integers)))
1

> (car (force (cdr (force (cdr integers)))))
2

The interesting thing to note is that while integers is supposedly infinite, it takes less space than the finite list of integers. This is because we never compute the list until we need it. And even when we compute the list, we only compute "just-enough" to do what we need to do.

NOTE: SICP uses cons-stream, stream-car and stream-cdr, which are not defined in standard Scheme. We will not use those functions, but instead explicitly use delay and force throughout. We cannot define cons-stream as an ordinary function because it would need to be a special form. However, we can define both stream-car and stream-cdr as

(define (stream-car str) (car str))
(define (stream-cdr str) (force (cdr str)))

Example 1: Streaming Video
Most of you have probably accessed a "streaming video" over the web. What does this mean? One option for viewing the video would be to do the following steps sequentially: (1) download the whole video, (2) run the video through a decoder, (3) play the decoded video on the screen. But instead of doing these steps sequentially, it would be nice if one could start decoding and viewing before the whole video arrived. That's exactly what streaming video does.

In this case the video or "list of frames" is not infinite, but merely large. And often one does not care about the whole video -- it may be the wrong one or maybe you fast-forward or maybe you get bored before it finishes. Therefore computing things just in time can be more efficient in space and time. Treating a video as a "stream" is a useful abstraction.

Example 2: Mathematics on Infinite Series
In mathematics we regularly reason about infinite series: integers, odd numbers, powers of two, prime numbers, Taylor expansions, etc. We can say things about them and also compute with them: for example, the integers are the odd integers together with the even integers, and the list of powers of two is just 2 raised to every positive integer. If we always had to enumerate the entire list mentally, our finite brains would not be able to tolerate it. Therefore we must employ some other trick in thinking about the infinite.

Example 3: Trajectories in Time
Consider a particle moving in space according to some rules, maybe just Brownian motion. One can view the long-term trajectory (random walk) as a stream. Many interesting questions can be asked about this trajectory: for example, how long before it crosses the point where it started? How far does it get from the initial position? Similarly, one could model a sequence of dice rolls as an infinite series. Modeling trajectories (computations) over time is a powerful way of thinking, and streams allow us to capture that concept.

Other Examples:
Streams of news stories, streams of data from remote sensors, streaming databases: in all of these cases we must compute as things come in, without ever having the complete data in hand. We can think of our computation as something that doesn't "end" but rather continually acts upon the data. On the other hand, the actual implementation of streams is very different in each case. But the metaphor is the same.

(FOOTNOTE: Streams are not a useful metaphor for doing Problem Sets however, which are not meant to be infinite and we do care about seeing the whole computation)

Computing with infinite streams

Let's start with two simple streams.

;; ONES
;;-------
(define ones (cons 1 (delay ones)))

;; INTEGERS
;;----------

(define (enumerate-stream start)
   (cons start (delay (enumerate-stream (+ start 1)))))

(define integers (enumerate-stream 0))

It would be nice to have a few functions to systematically view some of the streams. For example, stream-head could return a list of the first n elements of the stream and stream-ref could return the nth element of a stream (counting from 1).

;; USEFUL FUNCTIONS ON STREAMS
;;------------------------------
(define (stream-head n str)  ;; returns first n elements as a list
  (if (<= n 0) 
      '()
      (cons (car str)
            (stream-head (- n 1) (force (cdr str)))) ))

(define (stream-ref n str)             ;; returns the nth element 
  (car (reverse (stream-head n str)))) ;; not the most efficient implementation
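As the comment says, building and reversing a whole list is not the most efficient way to reach the nth element. One possible alternative is to walk down the stream directly. The sketch below uses a hypothetical name, stream-ref-direct, and keeps the same convention as stream-ref above, where n = 1 refers to the first element.

```scheme
;; Same stream of integers as in the notes.
(define (enumerate-stream start)
  (cons start (delay (enumerate-stream (+ start 1)))))

(define integers (enumerate-stream 0))

;; Hypothetical helper: returns the nth element (n = 1 is the first)
;; by forcing one tail at a time, without building an intermediate list.
(define (stream-ref-direct n str)
  (if (<= n 1)
      (car str)
      (stream-ref-direct (- n 1) (force (cdr str)))))

(define tenth (stream-ref-direct 10 integers))   ; 9
(define first-one (stream-ref-direct 1 integers)) ; 0
```

Because the recursive call is in tail position, this version needs no extra space beyond the stream cells it forces.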





What happens when we view our simple streams?

> ones
(1 . #struct:promise)
> integers
(0 . #struct:promise)

> (stream-head 10 ones)
(1 1 1 1 1 1 1 1 1 1)

> (stream-head 10 integers)
(0 1 2 3 4 5 6 7 8 9)

> (stream-ref 10 integers)
9

From the Interpreter's Point of View

> (stream-head 10 integers)
  (cons 0 (stream-head 9 (force (cdr integers))))
  (cons 0 (stream-head 9 (cons 1 #promise)))
  (cons 0 (cons 1 (stream-head 8 (force (cdr (cons 1 #promise))))))
  (cons 0 (cons 1 (stream-head 8 (cons 2 #promise))))
  ...

The important thing to notice is that the integers do not get computed until stream-head forces them to. So it's as if stream-head is "pulling" values out of an integer box. This is a very different mental image of execution than what we are used to for regular lists, even though the code for the two looks very similar.
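The "pulling" behavior can be observed directly by counting how many stream cells are ever built. The sketch below is an illustration under stated assumptions: "cells-built" and "counting-integers" are hypothetical names, and stream-head is the definition given earlier in these notes.

```scheme
;; Hypothetical counter: how many stream cells have been constructed so far.
(define cells-built 0)

;; Like enumerate-stream, but records each cell as it is created.
(define (counting-integers start)
  (set! cells-built (+ cells-built 1))
  (cons start (delay (counting-integers (+ start 1)))))

;; stream-head as defined in the notes.
(define (stream-head n str)
  (if (<= n 0)
      '()
      (cons (car str)
            (stream-head (- n 1) (force (cdr str))))))

(define s (counting-integers 0))
(define built-after-define cells-built)       ; 1: only the first cell exists

(define first-five (stream-head 5 s))         ; (0 1 2 3 4)
(define built-after-head cells-built)         ; 6: this stream-head forces one tail past the 5th

(define again (stream-head 5 s))              ; same list, nothing new is computed
(define built-after-second-head cells-built)  ; still 6: promises remember their values
```

Nothing beyond the cells that stream-head asked for is ever created, however "infinite" the stream is in principle.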

Creating New Streams

Creating new streams from old ones is very similar to how we constructed new lists from old ones. Many of the functions we defined for lists will also be useful abstractions for streams. Wishfully thinking, we could create streams such as the even integers, the multiples of seven, and the powers of two.

Clearly having map and filter on streams would be nice. (What's wrong with having reduce?) If we had such functions, here's how we would define these new streams:

;; EVENS
;;-------
;; remove the odds from integers
(define evens (stream-filter (lambda (e) (even? e)) integers))

;; CHALLENGE: "sevens"
;;----------
;; Define the stream of multiples of 7 => 0, 7, 14, 21 ........
(define sevens ......)

;; POWERS OF TWO
;;---------------
(define powers-of-two (stream-map (lambda (e) (expt 2 e)) 
                                  (force (cdr integers))))   ; start from 1

Notice that this looks just as if we had regular lists, except that we use stream-map instead of map and stream-filter instead of filter. Otherwise, conceptually we are writing functions on lists. Now let's write stream-map and stream-filter.

Stream Map and Filter

;; STREAM-MAP
;;------------
;; Returns a new stream where func is applied to every element 
;; of the original stream

(define (stream-map func str)
  (if (null? str)	     ;; not all streams are infinite
      '()
      (cons (func (car str))
	    (delay (stream-map func (force (cdr str))))) ))

;; STREAM-FILTER
;;---------------
;; Returns a new stream where only the elements
;; that satisfy the predicate are kept

(define (stream-filter pred str)
  (cond ((null? str) '())
	((pred (car str))  ;; make this an element of the stream
	 (cons (car str) (delay (stream-filter pred (force (cdr str))))) )
	(else              ;; otherwise keep looking
	 (stream-filter pred (force (cdr str))))
	))

;; EXAMPLES
;;----------
;; what is 2^10 ?
> (stream-ref 10 powers-of-two)
1024
;; What is 2^51 ?
> (stream-ref 51 powers-of-two)
2251799813685248
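For the "sevens" challenge posed earlier, one possible answer (certainly not the only one) is to map multiplication by 7 over the integers. The sketch repeats the definitions from these notes so that it runs on its own.

```scheme
;; Definitions repeated from the notes.
(define (enumerate-stream start)
  (cons start (delay (enumerate-stream (+ start 1)))))

(define integers (enumerate-stream 0))

(define (stream-map func str)
  (if (null? str)
      '()
      (cons (func (car str))
            (delay (stream-map func (force (cdr str)))))))

(define (stream-head n str)
  (if (<= n 0)
      '()
      (cons (car str)
            (stream-head (- n 1) (force (cdr str))))))

;; One possible definition: multiply every integer by 7.
(define sevens (stream-map (lambda (e) (* 7 e)) integers))

(define first-five-sevens (stream-head 5 sevens))  ; (0 7 14 21 28)
```

An alternative would be to use stream-filter with a divisibility test; the map version does less work because it never inspects the integers it would discard.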

Prime Numbers

Prime numbers are another interesting and important number series. People over the ages have asked lots of interesting questions about primes: for example, is the sequence of primes infinite? (Yes, Euclid, c. 300 BC.) Is there a pattern for predicting the differences between consecutive primes? (No, it is irregular.) How many primes are there that are less than n? (Approximately n / log n.) Several intriguing, simple to state, and unproven hypotheses exist around primes.

We can define the series of prime numbers as follows. This is not terribly efficient, but works reasonably for our purposes.

;; PRIMES
;;----------
(define (prime? p)
  (prime-helper 2 (sqrt p) p))

(define (prime-helper factor end p)
  (cond ((> factor end) #t)
	((divisible? p factor) #f)
	(else (prime-helper (+ factor 1) end p))
	))

(define (divisible? n factor)
  (= (remainder n factor) 0))

(define primes (stream-filter prime? (enumerate-stream 2)))

Question: What are the first 20 primes?

> (stream-head 20 primes)
(2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71)

Question: What is the nth prime number?
Theory predicts that the nth prime number is roughly equal to n log n (natural log).
(For n = 10, 100, 1000 you get about 23, 460, 6908.)

> (stream-ref 10 primes)     
29
> (stream-ref 100 primes)
541
> (stream-ref 1000 primes)
7919
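The n log n estimates quoted above can be reproduced with a one-line function. This is just a sketch of the arithmetic; "nth-prime-estimate" is a hypothetical name, and log is Scheme's natural logarithm.

```scheme
;; Rough estimate of the nth prime: n * ln(n).
(define (nth-prime-estimate n)
  (* n (log n)))

(define estimate-10 (nth-prime-estimate 10))      ; about 23.03  (actual: 29)
(define estimate-100 (nth-prime-estimate 100))    ; about 460.5  (actual: 541)
(define estimate-1000 (nth-prime-estimate 1000))  ; about 6907.8 (actual: 7919)
```

The estimate is consistently low for these values, but the relative error shrinks as n grows.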

Question: What is the difference between consecutive primes?

;; What is the difference between 10th and 11th?
> (- (stream-ref 11 primes) (stream-ref 10 primes))
2

;; What is the difference between 1000th and 1001st?
> (- (stream-ref 1001 primes) (stream-ref 1000 primes))
8

;; COMBINING STREAMS
;;---------------------
;; Make a new stream which is the difference between consecutive primes
;; (wishfully assume that stream map could take multiple streams...)

(define diff-primes
  (stream-map (lambda (p1 p2) (- p1 p2))
	      (force (cdr primes)) ; shift the prime seq to the left
	      primes))


> (stream-head 20 diff-primes)

(1 2 2 4 2 4 2 4 6 2 6 4 2 4 6 6 2 6 4 2)

> (stream-head 200 diff-primes)

(1 2 2 4 2 4 2 4 6 2 6 4 2 4 6 6 2 6 4 2 6 4 6 8 4 2 4 2 4 14 4 6 2 10
2 6 6 4 6 6 2 10 2 4 2 12 12 4 2 4 6 2 10 6 6 6 2 6 4 2 10 14 4 2 4 14
....
6 4 8 6 4 8 4 14 10 12 2 10 2 4 2 10 14 4 2 4 14 4 2 4 20 4 8 10 8 4 6
6 14 4 6 6 8 6 12 4 6 2 10 2 6 10 2 10 2 6 18 4 2 4 6 6 8 6 6 22 2 10
8 10 6 6 8 12 4 6 6)

> (stream-head 1000 diff-primes)

(1 2 2 4 2 4 2 4 6 2 6 4 2 4 6 6 2 6 4 2 6 4 6 8 4 2 4 2 4 14 4 6 2 10
2 6 6 4 6 6 2 10 2 4 2 12 12 4 2 4 6 2 10 6 6 6 2 6 4 2 10 14 4 2 4 14
....
6 2 10 8 10 6 6 8 4 6 2 10 2 12 4 6 6 2 12 4 14 18 4 6 20 4 8 6 4 8 4
14 6 4 14 12 4 2 30 4 24 6 6 12 12 14 6 4 2 4 18 6 12 8)
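The diff-primes definition above wishfully assumes a stream-map that accepts two streams. A minimal sketch of such a function follows; the name stream-map2 is hypothetical, and the remaining definitions are repeated from these notes so that the example runs on its own.

```scheme
;; Definitions repeated from the notes.
(define (enumerate-stream start)
  (cons start (delay (enumerate-stream (+ start 1)))))

(define (stream-filter pred str)
  (cond ((null? str) '())
        ((pred (car str))
         (cons (car str) (delay (stream-filter pred (force (cdr str))))))
        (else (stream-filter pred (force (cdr str))))))

(define (stream-head n str)
  (if (<= n 0)
      '()
      (cons (car str)
            (stream-head (- n 1) (force (cdr str))))))

(define (divisible? n factor)
  (= (remainder n factor) 0))

(define (prime-helper factor end p)
  (cond ((> factor end) #t)
        ((divisible? p factor) #f)
        (else (prime-helper (+ factor 1) end p))))

(define (prime? p)
  (prime-helper 2 (sqrt p) p))

(define primes (stream-filter prime? (enumerate-stream 2)))

;; Hypothetical two-stream map: walks both streams in step and
;; stops as soon as either one runs out.
(define (stream-map2 func s1 s2)
  (if (or (null? s1) (null? s2))
      '()
      (cons (func (car s1) (car s2))
            (delay (stream-map2 func (force (cdr s1)) (force (cdr s2)))))))

(define diff-primes
  (stream-map2 (lambda (p1 p2) (- p1 p2))
               (force (cdr primes))   ; primes shifted left by one
               primes))

(define first-ten-gaps (stream-head 10 diff-primes))  ; (1 2 2 4 2 4 2 4 6 2)
```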

The ancient Greeks proved (c. 300 B.C.) that there are infinitely many primes and that they are irregularly spaced; in fact, there can be arbitrarily large gaps between successive primes. On the other hand, in the nineteenth century it was shown that the number of primes less than or equal to n approaches n / log n as n gets very large (the prime number theorem). A rough estimate for the nth prime is n log n.

The German-born American mathematician Don Zagier (1951-), in his inaugural lecture at Bonn University, put it this way: "There are two facts about the distribution of prime numbers which I hope to convince you.... The first is that despite their simple definition and role as the building blocks of the natural numbers, the prime numbers grow like weeds among the natural numbers, seeming to obey no other law than that of chance... The second fact is even more astonishing, for it states just the opposite: that the prime numbers exhibit stunning regularity, that there are laws governing their behavior, and that they obey these laws with almost military precision."

Streams as more efficient lists, or as Sudoku boards

What do Coloring a Graph and Solving a Sudoku Puzzle have in common?

They are both NP-complete problems! What this means (roughly) is that there exist graphs and Sudoku boards for which no known strategy does fundamentally better than trying every possible solution. Thus to build a general-purpose graph coloring program or a general-purpose Sudoku solver, one has to resort to trying every possible answer in the worst case. This unfortunately leads to programs whose running time increases exponentially with the problem size.

On the flip side, this also means that programs that solve graph coloring can be adapted to solve Sudoku puzzles! (including your back-tracker from Pset 3)

Consider the following:

All I need to do is discard the invalid colorings / boards. A graph coloring is invalid if two adjacent nodes end up with the same color. A Sudoku board is invalid if a number appears twice in the same row, column, or subgrid.

In addition, if I am given a puzzle with some fixed entries, then conflicting with those entries makes a board invalid.

Here is a conceptually simple Sudoku solver.

             ____________________          ____________
             |                  |          |          |
  -------->  |  Discard invalid | -------> | Match my | -------->
             |    boards        |          | Puzzle   |
             |__________________|          |__________|
  list of                         list of                 list of
  possible                       all valid               solutions
number boards                  Sudoku boards            to my puzzle

Let's look at a simple case, a 2x2 Sudoku board.

  +====+====+
  | A1 | A2 |
  +====+====+   has a state (A1 A2 A3 A4)
  | A3 | A4 |
  +====+====+

The space of all possible number assignments is a list of 16 boards:
(1 1 1 1) (2 1 1 1) (1 2 1 1) (2 2 1 1) (1 1 2 1) (2 1 2 1) (1 2 2 1) (2 2 2 1)
(1 1 1 2) (2 1 1 2) (1 2 1 2) (2 2 1 2) (1 1 2 2) (2 1 2 2) (1 2 2 2) (2 2 2 2)

Discarding invalid boards leaves us with: (1 2 2 1) and (2 1 1 2)

For large boards the list of possible number assignments becomes very large, very fast. For example, for a traditional Sudoku board, this method starts with a list of size 9^81, which is about 2 x 10^77. Furthermore, we may not care to get all solutions; maybe one or a few will do.

We can formulate this problem using streams (and some wishful thinking)

;; Sudoku Solver for nxn board
;;-----------------------------
;; (the high-level structure)

(define (sudoku-solver size mypuzzle)
   (stream-filter (lambda (board) (match? mypuzzle board))
                  (stream-filter valid-board?
                                 (make-state-stream size (allones-board size)) )))

;; Wishful Thinking
;;------------------
;; (match? my-partial-board solutionboard) -> #t or #f
;; (valid-board? board) -> #t or #f
;; (make-state-stream max start) -> returns a stream of states (1 1 1 1) (2 1 1 1) (1 2 1 1) .....

;; Stream version
(define (make-state-stream max start)
  (if (max-state? max start)
      (cons start (delay '()))  ; done: the tail is a promise of the empty stream,
                                ; so stream-filter can still force it
      (cons start               ; first state, followed by next state
	    (delay (make-state-stream max (next-state max start))))))

;; For a 2x2 board, max number is 2 and initial assignment is all 1s
;; (make-state-stream 2 (list 1 1 1 1))
;; (stream-head 5 (make-state-stream 2 (list 1 1 1 1)))
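To make the wishful thinking concrete for the 2x2 case, here is one possible set of helper definitions. Everything below the repeated stream-filter is an assumption made for illustration: next-state counts with the leftmost entry changing fastest (matching the order of the 16 boards listed above), valid-board-2x2? checks only the two rows and two columns, match? treats the symbol * as a blank, and stream->list is a hypothetical helper that collects a finite stream.

```scheme
(define (stream-filter pred str)             ; as in the notes
  (cond ((null? str) '())
        ((pred (car str))
         (cons (car str) (delay (stream-filter pred (force (cdr str))))))
        (else (stream-filter pred (force (cdr str))))))

;; Hypothetical helper: collect a FINITE stream into a list.
(define (stream->list str)
  (if (null? str)
      '()
      (cons (car str) (stream->list (force (cdr str))))))

;; Assumed counting scheme: bump the leftmost entry, carrying to the right.
(define (next-state max state)
  (cond ((null? state) '())
        ((< (car state) max) (cons (+ (car state) 1) (cdr state)))
        (else (cons 1 (next-state max (cdr state))))))

(define (max-state? max state)               ; every entry equals max
  (or (null? state)
      (and (= (car state) max) (max-state? max (cdr state)))))

(define (make-state-stream max start)
  (if (max-state? max start)
      (cons start (delay '()))               ; tail is a promise of the empty stream
      (cons start
            (delay (make-state-stream max (next-state max start))))))

;; 2x2 only: state is (A1 A2 A3 A4); rows are A1 A2 and A3 A4, columns A1 A3 and A2 A4.
(define (valid-board-2x2? b)
  (let ((a1 (car b)) (a2 (cadr b)) (a3 (caddr b)) (a4 (cadddr b)))
    (and (not (= a1 a2)) (not (= a3 a4))
         (not (= a1 a3)) (not (= a2 a4)))))

;; A puzzle entry of * is blank; any other entry must agree with the board.
(define (match? puzzle board)
  (cond ((null? puzzle) #t)
        ((or (eq? (car puzzle) '*) (equal? (car puzzle) (car board)))
         (match? (cdr puzzle) (cdr board)))
        (else #f)))

(define (sudoku-solver-2x2 mypuzzle)
  (stream-filter (lambda (board) (match? mypuzzle board))
                 (stream-filter valid-board-2x2?
                                (make-state-stream 2 (list 1 1 1 1)))))

(define all-valid
  (stream->list (stream-filter valid-board-2x2?
                               (make-state-stream 2 (list 1 1 1 1)))))
;; all-valid is ((1 2 2 1) (2 1 1 2))

(define solutions (stream->list (sudoku-solver-2x2 '(* 1 * *))))
;; solutions is ((2 1 1 2))
```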

The important thing is that when we call sudoku-solver, it starts to create a stream. But that stream will contain only one solution and a promise to compute more if forced to. make-state-stream never really computes the long list of states; rather, it computes them one at a time, only as long as stream-filter pulls.

Making this more efficient:

;; New Sudoku Solver for nxn board
;;-----------------------------
;; (the high-level structure)

(define (sudoku-solver mypuzzle)
   (stream-filter (lambda (board) (match? mypuzzle board)) 
                  (sudoku-valid-boards (make-empty-board-stream (board-size mypuzzle)))))

;; Sudoku-valid-boards returns only valid nxn boards
(define (sudoku-valid-boards current-boards-str)
    (if (complete? (car current-boards-str))   ; The state has no "*"s, so we're done:
        (stream-filter valid-partial-board?    ; keep only the valid complete boards
                       current-boards-str)
        (sudoku-valid-boards                   ; Otherwise, filter out invalid partial boards
             (expand-states                    ; and expand the rest by filling in the next "*"
                 (stream-filter valid-partial-board? current-boards-str)))))

;; Wishful Thinking
;;------------------
;; (match? partial-board1 partial-board2) -> #t or #f
;; (valid-partial-board? board) -> #t or #f
;; (expand-states str) -> given (1 * * *) produces (1 1 * *) (1 2 * *)

This new solver reduces the number of states we check. Even so, "current-boards-str" may have a very large number of elements. But streams ensure that we never keep the whole list around and instead compute on demand. In essence we achieve an execution behavior similar to back-tracking (depth first search) but with a conceptually simpler program.
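Here is a hedged sketch of how the wishful helpers might look for the 2x2 case. All helper bodies are assumptions made for illustration: blanks are the symbol *, board-max is fixed at 2, expand-states fills the first blank of each board with 1 and then 2, and the starting stream holds a single all-blank board. Note that the sketch filters the last, fully filled level as well, since filling the final blank can itself create an invalid board, and it returns the empty stream if no boards survive.

```scheme
(define (stream-filter pred str)             ; as in the notes
  (cond ((null? str) '())
        ((pred (car str))
         (cons (car str) (delay (stream-filter pred (force (cdr str))))))
        (else (stream-filter pred (force (cdr str))))))

(define (stream->list str)                   ; hypothetical: collect a finite stream
  (if (null? str)
      '()
      (cons (car str) (stream->list (force (cdr str))))))

(define board-max 2)                         ; assumption: 2x2 boards only

(define (complete? board)                    ; no blanks left
  (not (memq '* board)))

(define (clash? x y)                         ; two filled-in entries that are equal
  (and (number? x) (number? y) (= x y)))

(define (valid-partial-board? b)             ; rows A1 A2, A3 A4; columns A1 A3, A2 A4
  (let ((a1 (car b)) (a2 (cadr b)) (a3 (caddr b)) (a4 (cadddr b)))
    (not (or (clash? a1 a2) (clash? a3 a4)
             (clash? a1 a3) (clash? a2 a4)))))

(define (fill-first-blank board value)       ; replace the first * with value
  (cond ((null? board) '())
        ((eq? (car board) '*) (cons value (cdr board)))
        (else (cons (car board) (fill-first-blank (cdr board) value)))))

;; Lazily produce, for each board in str, one copy per candidate value.
(define (expand-states str)
  (define (loop value str)
    (cond ((null? str) '())
          ((> value board-max) (loop 1 (force (cdr str))))
          (else (cons (fill-first-blank (car str) value)
                      (delay (loop (+ value 1) str))))))
  (loop 1 str))

(define (sudoku-valid-boards current-boards-str)
  (cond ((null? current-boards-str) '())     ; nothing survived the filtering
        ((complete? (car current-boards-str))
         (stream-filter valid-partial-board? current-boards-str))
        (else
         (sudoku-valid-boards
          (expand-states
           (stream-filter valid-partial-board? current-boards-str))))))

(define empty-board-stream (cons (list '* '* '* '*) (delay '())))

(define valid-boards (stream->list (sudoku-valid-boards empty-board-stream)))
;; valid-boards is ((1 2 2 1) (2 1 1 2))
```

Only 4 complete boards are ever generated here, compared with the 16 produced by the brute-force enumeration, because invalid partial boards are discarded before they are expanded.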

Dice, particles and streams with randomness

We can also use streams to represent "time". Think of a die that you roll during a game. The sequence of rolls you make can be thought of as a list. If it is an infinite sequence of rolls, we could represent it as a stream.

(define (make-dice-rolls)
  (cons (+ (random 6) 1) ; roll between 1 and 6
	(delay (make-dice-rolls))))

;; Two dice roll sequences
(define d1 (make-dice-rolls))
(stream-head 20 d1)

(define d2 (make-dice-rolls))
(stream-head 20 d2)
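As an example of the kind of question one might ask of a dice stream, the sketch below counts how many rolls it takes to see the first six. The helper rolls-until-six is a hypothetical name, and the code assumes the same non-standard (random n) procedure that make-dice-rolls already relies on.

```scheme
(define (make-dice-rolls)                    ; as in the notes
  (cons (+ (random 6) 1)                     ; roll between 1 and 6
        (delay (make-dice-rolls))))

(define (stream-head n str)                  ; as in the notes
  (if (<= n 0)
      '()
      (cons (car str)
            (stream-head (- n 1) (force (cdr str))))))

;; Hypothetical helper: position of the first 6 in the stream (1 = first roll).
(define (rolls-until-six str)
  (if (= (car str) 6)
      1
      (+ 1 (rolls-until-six (force (cdr str))))))

(define d1 (make-dice-rolls))
(define first-twenty (stream-head 20 d1))
(define wait (rolls-until-six d1))

;; Forcing the same stream again replays the same rolls, because each
;; promise remembers the value it computed the first time.
(define replay (stream-head 20 d1))
```

A fresh call to make-dice-rolls, like d2 above, gives an independent sequence, so averaging rolls-until-six over many fresh streams would estimate the expected waiting time.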

Some questions we could ask

Particles and Brownian Motion

Similar to a die is a particle that executes a random walk. We can think of the trajectory of the particle as a stream. With this model we can build a simulator that allows us to ask questions about the motion of the particle just as we asked questions about the series of prime numbers.

;; A PARTICLE
;;-------------
;; is a stream of random steps
(define (make-particle startpos)
  (let ((newpos (random-step startpos)))
    (cons newpos
          (delay (make-particle newpos)))
    ))

Questions:

Define a stream of experiments!



CS51, Spring 2008, Radhika Nagpal