HW2

 [|Andreas Weigend] Stanford University Stat 252 and MS&E 238 Spring 2008

=Homework 2=

(Big thanks to Harry Wang, who designed this entire homework!)

**Submit to TAs: ryantibs@stanford.edu / junli07@stanford.edu**
Note: everything you need to turn in is marked in **RED**.

=**Exercise 1:**= __**Set up your web page, then retrieve and analyze web access logs from your Leland account**__

 * **Step 1**, download and install the software you need for secure file transfer:
 * Windows: SecureFX. Click [|here] to learn more and [|here] to download. You can use all the default settings during installation.
 * Mac: Fetch. Click [|here] (serial number included) to learn more and [|here] to download. Using Fetch is similar to using SecureFX; follow the [|guide] and remember to change "Hostname" to elaine.stanford.edu.

 * **Step 2**, in SecureFX, connect to **elaine.stanford.edu** and log in with your **SUNet ID** and password.

If you don’t already have a web page, transfer one to the **WWW** folder. The opening page should be called **index.html** (a simple example). (See picture below.)

(If you already have your own website from which you can get logs, you can skip Steps 3 and 4.)


 * **Step 3**, request [|here] to have a log dump generated for your Stanford web site (if you don’t do this, no log will be visible to you by default).
 * **Note: according to the request page, the log dump is generated the morning after your request, so start this step early.**
 * **Note: if you experience problems, please write to the TAs immediately. IT has recently resolved an issue in the script that processes the requests, but just in case.**

 * **Step 4**, retrieve your web access logs from the server. It may take a day for the log dump to be generated. You can find the logs at **your_home_directory/WWW/logdumps/**.

You can retrieve them through **SecureFX**.

If you don't know how to extract .bz2 or gzip files, you may want to try [|SecureZip] (freeware). If everything appears squeezed into one long line in the extracted file, that is because the file uses Unix line endings, which some Windows text editors do not recognize ([|more to read] for whoever is curious); try opening it in Microsoft Word.
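If you prefer a script to a GUI tool, the extraction and the line-ending fix can be sketched in Python using only the standard library. This is a minimal sketch, not the course's required method: the function name `extract_log` and the file names in the usage note are hypothetical, and it assumes the dump ends in `.bz2` or `.gz`.

```python
import bz2
import gzip
from pathlib import Path


def extract_log(src, dest, windows_newlines=True):
    """Decompress a .bz2 or .gz log and optionally convert LF to CRLF.

    Returns the number of lines written.
    """
    src, dest = Path(src), Path(dest)
    # Choose the decompressor from the file extension (assumed naming).
    opener = bz2.open if src.suffix == ".bz2" else gzip.open
    with opener(src, "rb") as f:
        data = f.read()
    if windows_newlines:
        # Normalize to LF first so existing CRLF pairs are not doubled.
        data = data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")
    dest.write_bytes(data)
    return len(data.splitlines())
```

For example, `extract_log("access.log.bz2", "access.log")` would write a copy that opens with one entry per line in Windows editors; the file name here is only a placeholder for whatever appears in your logdumps folder.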

After creating your page and having your friends hit it a few times, you will need to wait another day for the logs to be refreshed.

 * **Step 5**, now you can analyze your web log:
 * 1) Comment on the format of the logs, and print out a snippet.
 * 2) **Formulate 3 questions whose answers you would be interested in finding.** Some example questions: what is the most popular link in a certain page? How many unique IPs are there per day?
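As a starting point for questions such as "how many unique IPs are there per day?", here is a hedged Python sketch. It assumes your log lines follow the Apache Common Log Format (host, identity, user, timestamp in brackets, quoted request, status, size); check a snippet of your own log first, because the actual Stanford format is not stated here. The sample lines and the names `LOG_LINE` and `summarize` are made up for illustration.

```python
import re
from collections import Counter, defaultdict

# Assumed layout (Apache Common Log Format prefix):
# host ident user [day/month/year:time zone] "METHOD path protocol" status size
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<day>[^:]+):[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def summarize(lines):
    """Return (number of unique IPs per day, request count per path)."""
    ips_per_day = defaultdict(set)
    hits_per_path = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m is None:
            continue  # skip lines that do not fit the assumed format
        ips_per_day[m["day"]].add(m["ip"])
        hits_per_path[m["path"]] += 1
    unique_ips = {day: len(ips) for day, ips in ips_per_day.items()}
    return unique_ips, hits_per_path


# Made-up sample lines using documentation-only IP addresses.
sample = [
    '192.0.2.1 - - [14/Apr/2008:10:00:01 -0700] "GET /index.html HTTP/1.1" 200 512',
    '192.0.2.1 - - [14/Apr/2008:10:05:09 -0700] "GET /photos.html HTTP/1.1" 200 2048',
    '192.0.2.7 - - [14/Apr/2008:11:30:42 -0700] "GET /index.html HTTP/1.1" 200 512',
    '192.0.2.7 - - [15/Apr/2008:09:12:00 -0700] "GET /index.html HTTP/1.1" 304 -',
]
unique_ips, hits = summarize(sample)
print(unique_ips)           # {'14/Apr/2008': 2, '15/Apr/2008': 1}
print(hits.most_common(1))  # [('/index.html', 3)]
```

To run it on a real log, pass an open file instead of the sample list, for example `summarize(open("access.log"))`, where the file name is a placeholder.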

 * **Step 6**, analyze your website using Google Analytics:

1) Follow the instructions to set up your Google Analytics account. **Note: don't forget step 6 in the instructions, which tells you to put the code right before the closing </body> tag of every page that you want analyzed.**

2) In Google Analytics, click "View Reports" for your website.

3) **You will be shown a Dashboard consisting of several diagrams. Take screenshots, submit these plots as part of your homework write-up, and comment on each plot and on how you could use the information to improve performance (for example, if you find that a product you are selling attracts many more people from Asia than from the U.S., you may want to focus on the Asian market).**

=**Exercise 2:**= __**Automatic Data Service with Yahoo Pipes**__

In this exercise, we will use Yahoo Pipes to do automatic data collection and build alerts on top of it.

 * **Step 1**, understanding the basic concepts
 * What is an RSS feed?

RSS is a popular way to announce recently updated items. The data in an RSS feed is represented in [|XML] format. Many online services let you subscribe to your favorite RSS feeds to keep up with changes, such as [|igoogle], [|google reader], [|livejournal], [|newsgator], etc. Typically, you subscribe to an RSS feed in your favorite feed reader so that you can view all the content you care about in a single place. You can learn more about RSS [|here]. RSS feeds are a common data source in Yahoo Pipes.
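To make the XML structure concrete, here is a small Python sketch that reads the items out of an RSS 2.0 document with the standard library. The feed content, the `example.com` links, and the name `read_items` are invented for illustration; a real feed usually carries more fields (dates, GUIDs, and so on).

```python
import xml.etree.ElementTree as ET

# A tiny hand-written RSS 2.0 document (illustrative content only).
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example listings</title>
    <link>http://example.com/</link>
    <description>Made-up feed for illustration</description>
    <item>
      <title>First item</title>
      <link>http://example.com/1</link>
      <description>Text of the first item</description>
    </item>
    <item>
      <title>Second item</title>
      <link>http://example.com/2</link>
      <description>Text of the second item</description>
    </item>
  </channel>
</rss>"""


def read_items(rss_text):
    """Return one dict per <item> with its title, link, and description."""
    root = ET.fromstring(rss_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "description": item.findtext("description", default=""),
        })
    return items


for entry in read_items(SAMPLE_RSS):
    print(entry["title"], "->", entry["link"])
```

Each `<item>` becomes one dictionary, which is roughly the unit that a feed reader displays and that a pipe passes from one module to the next.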

 * What is Yahoo Pipes?

“Yahoo pipes is a powerful composition tool to aggregate, manipulate, and mashup content from around the web.” - from [|Yahoo Pipes homepage]. We will show you an example in Step 2, but we highly encourage you to [|learn more] about it beforehand. Here are some very good video tutorials:
 * http://www.jumpcut.com/fullscreen?id=594F555C568011DC9D24000423CEF5B0&type=movie
 * http://usefulvideo.blogspot.com/2007/02/yahoo-pipes-tutorials.html


 * **Step 2**, understanding a real-world example

Assume you are sick of your landlord and are now looking for a new apartment. You want to find a “1-bedroom apartment in Palo Alto that rents for less than $1400/month and is cat-friendly”. So you go to [|craigslist] and search for it with something like http://sfbay.craigslist.org/search/apa/pen?addTwo=purrr&bedrooms=1&maxAsk=1400. But you run into two problems: first, craigslist only lets you limit the search to the “peninsula” area, so you have to look for “palo alto” within the results page yourself; second, the search runs only when you remember to do it, and you are usually too busy to remember. Ideally, the process would be automated, so that you are alerted whenever a new listing matches your requirements.
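The search URL above is just a base address plus query parameters, so it can be assembled programmatically. The sketch below rebuilds it in Python. The parameter names (`addTwo`, `bedrooms`, `maxAsk`) are copied from the URL in the text; reading `addTwo=purrr` as the cat-friendly filter is an inference from the example, and the function name `search_url` is hypothetical.

```python
from urllib.parse import urlencode

# Base address taken from the example URL in the text.
BASE = "http://sfbay.craigslist.org/search/apa/pen"


def search_url(max_rent, bedrooms, cats_ok=True):
    """Build a search URL with the query parameters seen in the example."""
    params = {}
    if cats_ok:
        # Assumed meaning: "purrr" marks cat-friendly listings.
        params["addTwo"] = "purrr"
    params["bedrooms"] = bedrooms
    params["maxAsk"] = max_rent
    return BASE + "?" + urlencode(params)


print(search_url(1400, 1))
# http://sfbay.craigslist.org/search/apa/pen?addTwo=purrr&bedrooms=1&maxAsk=1400
```

Changing the arguments gives the URL for a different search, which is the kind of input a pipe can take in place of a hand-typed address.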

Here is the Yahoo Pipe we created to solve the problem: http://pipes.yahoo.com/pipes/pipe.info?_id=bBXZNDgJ3RGCNBwWGsevXg (shown in the picture below). We can set up automatic alerts for whenever the pipe output changes. You should go there, view [|the source of the pipe], and play around. If you don't understand how the source works, go back to **Step 1** and re-study some of the concepts.

After the pipe is created, you can set up an alert on it so that, whenever the result changes, you are informed by email, mobile phone, or Yahoo Messenger.
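The two ideas in this example, filtering listings by a keyword and alerting only on changes, can be sketched in plain Python. This is a rough analogue for building intuition, not a description of how the pipe above is actually wired; the listings, the `example.com` links, and the function names `matching` and `new_items` are all made up.

```python
def matching(items, keyword):
    """Keep items whose title or description mentions the keyword."""
    needle = keyword.lower()
    return [
        item for item in items
        if needle in (item["title"] + " " + item["description"]).lower()
    ]


def new_items(items, seen_links):
    """Return items not seen before and record their links as seen."""
    fresh = [item for item in items if item["link"] not in seen_links]
    seen_links.update(item["link"] for item in fresh)
    return fresh


# Made-up listings standing in for the items of a feed.
listings = [
    {"title": "1BR near downtown Palo Alto", "link": "http://example.com/a",
     "description": "cats ok"},
    {"title": "Sunny 1BR", "link": "http://example.com/b",
     "description": "Menlo Park, quiet street"},
]
seen = set()

# First check: everything that matches is new.
first = new_items(matching(listings, "palo alto"), seen)

# Later check: one more listing has appeared in the feed.
listings.append({"title": "Cozy 1BR", "link": "http://example.com/c",
                 "description": "In Palo Alto, cats welcome"})
second = new_items(matching(listings, "palo alto"), seen)

print([item["link"] for item in first])   # ['http://example.com/a']
print([item["link"] for item in second])  # ['http://example.com/c']
```

Only the second check's new match would trigger an alert, which mirrors the "whenever the result changes" behavior described above.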




 * **Step 3**, questions for you:

Now you should design a similar problem and implement a Yahoo pipe to solve it. **Please publish your pipe and send the link in your homework submission, along with your problem definition.**

Student Web Sites
 * [|Karen Ryberg]
 * [|Eric Sun] (book project website)
 * [|Bill Whiteley]
 * [|Sunil Menon]
 * [|Andreas Weigend] (ok so he is not a student)
 * [|Ashlee Miller]
 * [|Tom Bankston] (website for my dad's Internet radio station)
 * [|Arun Saha]
 * [|Sean Sit]
 * [|Janine Molino]
 * [|Ryan Mason - http://5pears.org] (my web playground, mostly dedicated to motorcycling) user:rkm3
 * [|Yi-Fu Wu]
 * [|Jaehyeok Heo]
 * [|Ming Chen] (Anyone know people from North Pole or Central Africa?)
 * [|Shiling Lam]
 * [|Charles Tripp]
 * [|Jiajing Xu] (some travel tools)
 * [|Yi Chai]
 * [|Jaebock Lee]
 * [|Ross Wait] <--- L@@K!! AWESOME WEBSITE; COOL PICS; CHEAP V1AGRA
 * [|Elizabeth Reinoso]
 * [|Myunghwan Kim]
 * [|Pavani Vantimitta]
 * [|Bin Shen]
 * [|Wei-Ting Liu]
 * [|Lin Chao ( Read about Green)]
 * [|Daniel Cheng] <<< Terrible website that you must avoid getting addicted to
 * [|Randal Truong] <<< Best game on the web!
 * [|Shirley (Xinli) Bao]
 * [|Shaun Maguire]
 * [|Bonny Simi] <<<---Click here for a contest with a grand prize of a free airline ticket
 * [|Nelson Ray]
 * [|Enrique Allen]
 * [|Sreeram Duvur]