CS 51 Week 2, L2:
The Map/Reduce/Filter Trilogy


Today's topics:

Announcement: IMPORTANT!!







Functions as First Class Citizens

We often think of procedures and data as being fundamentally different things.

  
   data = numbers, symbols, lists
   functions = procedures that manipulate data

But that's really an artificial distinction. There are many cases where we might want to manipulate functions just as we manipulate data. For example, consider a function that sorts the elements of a list in ascending order. The act of sorting is the same whether the list contains strings or numbers, but "less than" means something different in each case. One possible solution: create a function sort that takes a function "lessthan" as an argument.
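
As a minimal sketch of that idea, the insertion sort below takes the comparison as an argument. The names insert-sorted and sort-by are illustrative and not part of the course code; < and string<? are standard Scheme procedures.

```scheme
;; Insert x into an already-sorted list, using less-than? to compare.
(define (insert-sorted less-than? x sorted)
  (cond ((null? sorted) (list x))
        ((less-than? x (car sorted)) (cons x sorted))
        (else (cons (car sorted)
                    (insert-sorted less-than? x (cdr sorted))))))

;; Insertion sort: the sorting logic never looks at the element type,
;; it only asks less-than? how two elements compare.
(define (sort-by less-than? lst)
  (if (null? lst)
      '()
      (insert-sorted less-than? (car lst)
                     (sort-by less-than? (cdr lst)))))

(sort-by < '(3 1 2))                   ; => (1 2 3)
(sort-by string<? '("pear" "apple"))   ; => ("apple" "pear")
```

The same sort-by works for both lists; only the function passed in changes.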

In Scheme, we can treat functions like data.

Today we will see how we can exploit this capability to capture common patterns of programming that occur in many different application areas.

Introduction to Lambda

Lambda comes from lambda calculus, and in Scheme it is the way we create functions.

> (lambda (arg1 ...) body)    <---- returns a PROCEDURE OBJECT

> (lambda (x) (* x x))        <---- a procedure with one argument

> (lambda (x y) (* x y))      <---- a procedure can have multiple arguments

> (lambda () (display "\n"))  <---- or even no arguments



> ( (lambda (x) (* x x)) 2)   <--- Applying the procedure to arguments
4

The truth behind define: Previously we created both variables and functions using define. But in reality define has a much simpler behavior - it only creates a binding between a name and a value. It is really lambda that creates the function.



 (define (square x) (* x x))

  IS SYNTACTIC SUGAR FOR

 (define square (lambda (x) (* x x)))


Syntactic sugar means that the first form is just shorthand for the second: the interpreter rewrites the define into the lambda form before evaluating it. The way we usually write define simply saves us from typing lambda all the time.

However, lambda is useful by itself for creating anonymous functions, i.e. temporary functions that have no name. We will see examples of doing this throughout the lecture today.
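
A small sketch of both points. The names square-a, square-b and apply-twice are made up for this illustration.

```scheme
;; Two equivalent ways to bind a name to the squaring procedure.
(define (square-a x) (* x x))
(define square-b (lambda (x) (* x x)))

(square-a 5)   ; => 25
(square-b 5)   ; => 25

;; A procedure that takes another procedure as its argument.
(define (apply-twice f x) (f (f x)))

;; An anonymous function, created on the spot and never given a name.
(apply-twice (lambda (n) (* n 2)) 5)   ; => 20
```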

The Trilogy: Recognizing Common Patterns in Lists

MAP

Consider the following two functions on lists

(define (scale-list s lst)
	(if (null? lst)
	    '()
	    (cons (* s (car lst)) (scale-list s (cdr lst)))))

(define (increment-list lst)
	(if (null? lst)
	    '()
	    (cons (+ 1 (car lst)) (increment-list (cdr lst)))))

There is a common way in which they manipulate a list. They both take a list as an input and "transform" the list by applying a function to every member of the original list.

      THE PATTERN 
      ------------
      (define (NAME lst)
	   (if (null? lst)
	       '()
	       (cons (FUNC (car lst)) (NAME (cdr lst)))))
This pattern is called MAP
(define (map myfunc lst)
	(if (null? lst)
	    '()
	    (cons (myfunc (car lst)) (map myfunc (cdr lst)))))
Using map we can rewrite the first two functions.
	(define (scale-list s lst) 
	    (map (lambda (e) (* s e)) lst))

	(define (increment-list lst)
	    (map (lambda (e) (+ 1 e)) lst))
But we can also easily express other useful functions like find-and-replace!
       (define (find-and-replace aword newword page)
            (map (lambda (e) (if (equal? e aword) newword e)) page))
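
To see these definitions at work, here is a short usage sketch. It relies on Scheme's built-in map, which behaves like the one defined above for a single list; the sample inputs are invented.

```scheme
(define (scale-list s lst)
  (map (lambda (e) (* s e)) lst))

(define (find-and-replace aword newword page)
  (map (lambda (e) (if (equal? e aword) newword e)) page))

(scale-list 10 '(1 2 3))
; => (10 20 30)

(find-and-replace 'cat 'dog '(the cat saw a cat))
; => (the dog saw a dog)
```

In both cases the output list has exactly as many elements as the input list.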

How to think of Map

               ABSTRACTIONBARRIER
(e1 e2 e3...)  |                | (f(e1) f(e2) f(e3)....)
-------------> |     MAP (f)    |------------------------->
               |                |
               ABSTRACTIONBARRIER

FILTER

Map creates a new list of the same size as the input list. Another pattern is taking a list and removing some of the elements, for example:

(define (remove n lst)
   (cond ((null? lst) '())
         ((equal? (car lst) n) 
          (remove n (cdr lst)))
         (else 
          (cons (car lst) (remove n (cdr lst))))
   ))

We can generalize this to create FILTER, a function that filters a list to keep only those elements that satisfy a given predicate.

(define (filter pred? lst)
  (cond ((null? lst) '())
        ((pred? (car lst))
         (cons (car lst) (filter pred? (cdr lst))))
        (else
         (filter pred? (cdr lst)))
        ))

And we redefine remove as

(define (remove n lst)  
   (filter (lambda (e) (not (equal? e n))) lst))

Some more examples of things we can do with filter:
;; Keep only the even elements (i.e. filter out the odd ones)
   (filter even? lst)

;; Compute the bias of a coin from a list of coin tosses
  (define (compute-bias coin-toss-list)
     (/ (length (filter head? coin-toss-list))
        (length coin-toss-list)))
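
The notes do not define head?, so the sketch below assumes that a toss is represented by the symbol H or T. That representation is an assumption made only for this example.

```scheme
(define (filter pred? lst)
  (cond ((null? lst) '())
        ((pred? (car lst))
         (cons (car lst) (filter pred? (cdr lst))))
        (else (filter pred? (cdr lst)))))

;; Assumption: a coin toss is the symbol H (heads) or T (tails).
(define (head? toss) (eq? toss 'H))

(define (compute-bias coin-toss-list)
  (/ (length (filter head? coin-toss-list))
     (length coin-toss-list)))

(filter even? '(1 2 3 4 5 6))   ; => (2 4 6)
(compute-bias '(H T H H))       ; => 3/4
```

Note that compute-bias divides by the length of the list, so it should only be called with at least one toss.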

               ABSTRACTIONBARRIER
(e1 e2 e3...)  |                |   (e1 e3.....) 
-------------> | FILTER (pred?) |------------------------->
               |                |   removed elements that fail
               ABSTRACTIONBARRIER

REDUCE

Let's look at a very different pattern


(define (mul-list lst)
	(if (null? lst)
	    1
	    (* (car lst) (mul-list (cdr lst)))))

;; you can similarly define max-list or min-list..

(define (length lst)
	(if (null? lst)
	    0
	    (+ 1 (length (cdr lst)))))

In each case, we take a list and REDUCE it to a single value. What differs is the function we use to combine the elements, and the initial value.

(define (reduce func value lst)
	(if (null? lst)
	    value
	    (func (car lst) (reduce func value (cdr lst))) 
))

Again we can rewrite the previous functions very succinctly

(define (mul-list lst)	(reduce * 1 lst))
(define (max-list lst)  (reduce max 0 lst))   ;; correct only for lists of non-negative numbers

(define (length lst)    (reduce (lambda (carlst value) (+ value 1)) 0 lst))

;; Note that "func" must take 2 arguments
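
It can help to expand a few calls by hand. The sketch below repeats the definition of reduce from above and shows what each call unfolds to; the sample lists are invented.

```scheme
(define (reduce func value lst)
  (if (null? lst)
      value
      (func (car lst) (reduce func value (cdr lst)))))

;; (reduce * 1 '(2 3 4)) unfolds to (* 2 (* 3 (* 4 1)))
(reduce * 1 '(2 3 4))        ; => 24

;; The nesting goes to the right, which matters when func is not
;; commutative: (- 1 (- 2 (- 3 0)))
(reduce - 0 '(1 2 3))        ; => 2

;; With cons and the empty list, reduce rebuilds the input list.
(reduce cons '() '(1 2 3))   ; => (1 2 3)

;; The initial value is part of the answer: with 0 as the start,
;; max never returns anything smaller than 0.
(reduce max 0 '(-5 -2))      ; => 0
```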


Summary:

MAP func, list --> transform the list by applying func to every element
REDUCE func, value, list --> collapse the list by computing (func e1 (func e2 (func e3 value)))
FILTER pred?, list --> return a new list where only the elements that satisfy pred? have been kept.

Important Notes:

Week 2 Pithy Design Quote
Reuse is old code on new data



Examples

More generally, we can think of lists as sequences and of our functions as operations on sequences. Map, Reduce and Filter are an abstraction layer on top of sequences. Think of sequences as SIGNALS that pass through Map/Reduce/Filter BOXES; we can connect these boxes in different ways to compute different functions. (SICP section 2.2 has many examples.)
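
One way to picture the boxes, as a sketch: the list flows through a FILTER box, then a MAP box, then a REDUCE box. The function name sum-odd-squares is chosen for this example; filter and reduce are the definitions from earlier in these notes, and map is the built-in.

```scheme
(define (filter pred? lst)
  (cond ((null? lst) '())
        ((pred? (car lst))
         (cons (car lst) (filter pred? (cdr lst))))
        (else (filter pred? (cdr lst)))))

(define (reduce func value lst)
  (if (null? lst)
      value
      (func (car lst) (reduce func value (cdr lst)))))

;; lst --> FILTER odd? --> MAP square --> REDUCE + 0 --> number
(define (sum-odd-squares lst)
  (reduce + 0
          (map (lambda (x) (* x x))
               (filter odd? lst))))

(sum-odd-squares '(1 2 3 4 5))   ; => 35   (1 + 9 + 25)
```

Rewiring the same boxes gives other functions, for example replacing odd? with even? or + with max.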


Questions:





MapReduce à la Google

(Jeff Dean and Sanjay Ghemawat,
OSDI 2004, 6th Symposium on Operating Systems Design and Implementation)

MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. The model is inspired by similar primitives in LISP and other languages. Many real world tasks are expressible in this model: distributed find, distributed sort, web access log stats, document clustering, inverted index construction, statistical machine translation, etc

Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. The run-time system takes care of the details of partitioning the input data, scheduling the program's execution across a set of machines, handling machine failures, and managing the required inter-machine communication. This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system.

Our implementation of MapReduce runs on a large cluster of commodity machines and is highly scalable: a typical MapReduce computation processes many terabytes of data on thousands of machines. Programmers find the system easy to use: hundreds of MapReduce programs have been implemented and upwards of one thousand MapReduce jobs are executed on Google's clusters every day.
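
To connect this back to our list functions, here is a toy, single-machine sketch of the key/value idea using word counting. It is NOT Google's API and does nothing in parallel; every name in it (map-phase, distinct-keys, count-for, word-count) is made up for this illustration.

```scheme
(define (filter pred? lst)
  (cond ((null? lst) '())
        ((pred? (car lst))
         (cons (car lst) (filter pred? (cdr lst))))
        (else (filter pred? (cdr lst)))))

(define (reduce func value lst)
  (if (null? lst)
      value
      (func (car lst) (reduce func value (cdr lst)))))

;; "Map" step: emit an intermediate (key . value) pair per word.
(define (map-phase words)
  (map (lambda (w) (cons w 1)) words))

;; Collect each key once.
(define (distinct-keys pairs)
  (reduce (lambda (p keys)
            (if (member (car p) keys) keys (cons (car p) keys)))
          '()
          pairs))

;; "Reduce" step: merge all values that share the same key.
(define (count-for key pairs)
  (reduce + 0
          (map cdr
               (filter (lambda (p) (equal? (car p) key)) pairs))))

(define (word-count words)
  (let ((pairs (map-phase words)))
    (map (lambda (key) (cons key (count-for key pairs)))
         (distinct-keys pairs))))

(word-count '(a b a c b a))   ; => ((c . 1) (b . 2) (a . 3))
```

In the real system the grouping by key and the distribution of work across machines are done by the run-time, not by user code.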


Images as Lists

Suppose we could represent an image as just a list of pixels. Then transforming an image is just transforming a list, and the same paradigms are very useful.


;; Image as a list
;;-------------------
;;
;; The image->color-list and color-list->image procedures convert
;; jpegs/gifs to and from a list of color pixels.
;;
;; We can treat the image as a list of color pixels.
;; A color pixel has the following abstraction interface:
;; (make-color R G B), color-red, color-green, color-blue

;; Map for Images
;;-----------------
;; Although we want to "think" of the image as a list,
;; we want to view it as an image. But calling image->color-list
;; and color-list->image everywhere is ugly.
;;
;; Therefore we will create a MAP abstraction just for IMAGES
;; MAP image, myfunc -> newimage with myfunc applied to every pixel
;;
;; myfunc must take as input a color pixel, and return a color pixel.
;; MYFUNC color --> color

(define (image-map myfunc img)
     (color-list->image (map myfunc (image->color-list img))
                   (image-width img) (image-height img) 
                   (pinhole-x img) (pinhole-y img)))


;; Extracting the RED content of a pixel
;;--------------------------------------
(define (onlyred c)  (make-color (color-red c) 0 0))

;; Displaying only the RED content in an image
;;--------------------------------------------
(image-map onlyred img)

;; Scale a pixel by some value
;;-----------------------------
(define (scale-color weight c)
  (make-color (floor (* (color-red c) weight))
              (floor (* (color-green c) weight))
              (floor (* (color-blue c) weight))))

;; Darken the Image
;;-------------------
(define (image-darken fraction img)
   (if (>= fraction 1)
       'error-scale-factor-must-be-less-than-one
       (image-map (lambda (c) (scale-color fraction c)) img)))


;; Image-merge
;;--------------
;; Take two images and a weight, and merge the two images so that
;; every pixel p = weight*p1 + (1-weight)*p2

(define (image-merge2 weight img1 img2)
  (if (> weight 1)
      'error-weight-must-be-less-than-one
      ;; Trim images to be the same size
      (let ((w (min (image-width img1) (image-width img2)))
            (h (min (image-height img1) (image-height img2)))
            (merge (lambda (c1 c2)
                     (add-colors (scale-color weight c1)
                                 (scale-color (- 1 weight) c2)))))
        (image-map2 merge
                    (shrink-tl img1 w h) 
                    (shrink-tl img2 w h)))
      ))


(define (image-map2 myfunc img1 img2)
     (color-list->image (map myfunc (image->color-list img1)
                                    (image->color-list img2))
                   (image-width img1) (image-height img1) 
                   (pinhole-x img1) (pinhole-y img1)))
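
The helper add-colors is used above but not shown in these notes. The sketch below is one possible definition, run on a stand-in pixel representation (a plain list of three numbers) so that it can be tried without the image library. Both the stand-in representation and the clamping to 255 are assumptions made for this example; merge-pixels is a made-up name for the merge lambda above.

```scheme
;; Stand-in for the color abstraction (assumption: the real one
;; is provided by the image library).
(define (make-color r g b) (list r g b))
(define (color-red c)   (car c))
(define (color-green c) (cadr c))
(define (color-blue c)  (caddr c))

(define (scale-color weight c)
  (make-color (floor (* (color-red c) weight))
              (floor (* (color-green c) weight))
              (floor (* (color-blue c) weight))))

;; Hypothetical add-colors: add channel by channel, clamping at 255.
(define (add-colors c1 c2)
  (make-color (min 255 (+ (color-red c1)   (color-red c2)))
              (min 255 (+ (color-green c1) (color-green c2)))
              (min 255 (+ (color-blue c1)  (color-blue c2)))))

(define (merge-pixels weight c1 c2)
  (add-colors (scale-color weight c1)
              (scale-color (- 1 weight) c2)))

;; map with two lists walks them in step, which is what image-map2
;; relies on.
(map (lambda (c1 c2) (merge-pixels 1/2 c1 c2))
     (list (make-color 200 0 0)  (make-color 10 10 10))
     (list (make-color 0 100 50) (make-color 30 30 30)))
; => ((100 50 25) (20 20 20))
```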

;; Image Morph Series
;;-------------------
;; Make a series of images where one image morphs into another
;; by combining them with different weights

(define (make-morph-series weightlist img1 img2)
    (map (lambda (weight) (image-merge2 weight img1 img2))
         weightlist))




CS51, Spring 2008, Radhika Nagpal