Michael Tracey Zellmann
CSCIE 259 proposal
April 12, 2005

i. Title
Consumer reviews harvested from the web and presented.

ii. Author
Michael Tracey Zellmann

iii. Abstract.
Reviews are harvested from amazon.com, audioreview.com, circuitcity.com and cnet.com. In the case of Amazon they will be obtained through SOAP. Reviews will be massaged and arranged into an XML structure. Then they will be presented as a set of web pages, created through XSLT.

iv. What.
Reviews from the other sites are harvested using Perl with regular expressions. A Java program will receive all the reviews and analyze them: are they new? Are they duplicates or near duplicates? Each web site has its own naming conventions for products, and these are mapped to a consistent set of Bose product names. Each web site also produces reviews with different fields. Common fields are established, and unique fields are handled for each site. All the reviews will then be written out as an XML structure in memory. A Java program will then invoke transformations on that data using XSLT style sheets to produce a linked set of pages.

v. Why.
This application already exists. However, it scrapes the reviews from web sites, so it is subject to problems whenever a site changes its page layout. The new application will take advantage of Amazon's API to be insulated from those changes. As currently written, the layout of the resulting pages is hardcoded in Java programs. The new application will move the presentation into XML and XSLT, making maximum effective use of those tools.

vi. How.
There is not much to add beyond what has been said above. The new skills will be thorough use of SOAP to acquire the Amazon material, and of XML / XSLT to present all the results. This will be useful to me in other applications. I will not include the Perl programs in the project. At a minimum, there will be files of the reviews, and the project will completely process that input on nice. The result should be a set of static web pages visible on nice.
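The step described under iv, building the reviews as an XML structure in memory and then transforming it with an XSLT style sheet, could look roughly like the minimal sketch below. It uses only the standard JAXP classes shipped with the JDK. The class name ReviewPages, the element and attribute names (reviews, review, source, product, text), the sample review text, and the inline style sheet are all assumptions made for illustration; they are not the format of the existing application.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class ReviewPages {

    // Hypothetical style sheet: renders each review as one HTML list item.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='html' indent='no'/>"
      + "<xsl:template match='/reviews'>"
      + "<html><body><ul>"
      + "<xsl:for-each select='review'>"
      + "<li><xsl:value-of select='@product'/> (<xsl:value-of select='@source'/>): "
      + "<xsl:value-of select='text'/></li>"
      + "</xsl:for-each>"
      + "</ul></body></html>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Creates an empty in-memory document with a <reviews> root element.
    static Document newReviewDocument() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder().newDocument();
        doc.appendChild(doc.createElement("reviews"));
        return doc;
    }

    // Appends one <review> element; the field names are illustrative only.
    static void addReview(Document doc, String source, String product, String text) {
        Element review = doc.createElement("review");
        review.setAttribute("source", source);
        review.setAttribute("product", product);
        Element body = doc.createElement("text");
        body.setTextContent(text);
        review.appendChild(body);
        doc.getDocumentElement().appendChild(review);
    }

    // Applies the style sheet to the in-memory document and returns the HTML.
    static String toHtml(Document doc, String stylesheet) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(stylesheet)));
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Document doc = newReviewDocument();
        addReview(doc, "amazon.com", "Example Product", "Sounds great.");
        addReview(doc, "cnet.com", "Example Product", "Too expensive.");
        System.out.println(toHtml(doc, XSLT));
    }
}
```

In a real run the style sheet would presumably live in its own .xsl file and the output would be written to one file per page, so that the page layout can change without touching the Java code.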
I am not sure if the Amazon review harvesting will be executed on nice, but I will submit the code that accomplishes that.

vii. Questions.
How do I make wise choices in the tradeoffs between Java and XSLT in producing the results? I could possibly create the input as XML by adding XML creation to the Perl programs. I could do that manually, or there must be a good Perl module to do that. There may be other tools that would be useful and good to learn about. Last night in CSCIE 275, I heard about the JDK 1.4 XMLEncoder/Decoder as a Java -> XML -> Java tool, but it is not suited to XML that hadn't been created by the JDK 1.4 XMLEncoder/Decoder. Also Castor.

This Thursday, the monthly meeting of the New England Java Users Group, which I try to attend, has this agenda, which may be quite relevant:

This session will help developers understand what it takes to develop Web services that are interoperable and re-usable across their Service Oriented Architecture (SOA) infrastructure. We will cover key Java and Web services technologies such as J2EE 1.4, JAX-RPC, BPEL, WS-I, WS-Security, and WS-Reliability. We will start with designing, implementing and consuming Web services, and conclude with a discussion on how to build Web services-based business processes using Business Process Execution Language (BPEL).

Finally, just to be clear, I have included a zip file entitled allBuzz_4_2_2005. If you unzip it, you will get a folder named Buzz_Scorcard, which holds the latest version of the report. Inside the folder is an HTML page entitled home.html. You should be able to open that directly and follow the links down through category pages and product pages to the pages that list the reviews. This should help show what the output is. All this information has been harvested from publicly available sources, but I would appreciate it if you would not broadcast it.

I would appreciate any advice you can offer.