Andreas Weigend
Stanford University
Stat 252 and MS&E 238
Spring 2008

Homework 2


(Big thanks to Harry Wang, who designed this entire homework!)

Submit to TAs: ryantibs@stanford.edu / junli07@stanford.edu


Note: everything you need to turn in is marked in RED.

Exercise 1:

Set up your web page, then retrieve and analyze its web access logs from your Leland account:

Step 1, download and install the software needed for secure file transfer:
• Windows:
  • SecureFX: click here to learn more and click here to download. You can accept all the default settings during installation.
• Mac
  • Fetch: click here to learn more (serial number included) and click here to download.
    • Using Fetch is similar to using SecureFX; follow the guide and remember to set "Hostname" to elaine.stanford.edu.

Step 2, in SecureFX, connect to elaine.stanford.edu and log in with your SUNet ID and password.
[Screenshot: SecureFX connection settings]
If you don’t already have a web page, transfer one to your WWW folder. The opening page should be called index.html (a simple example). See the picture below.
[Screenshot: SecureFX]

(If you already have your own website from which you can get logs, you can skip Steps 3 and 4.)

Step 3, request here to have a log dump generated for your Stanford web site (by default, no logs are visible to you unless you do this).
Note: according to the request page, the log dump is generated on the morning after your request, so start this step early.
[Screenshot: log dump request page]
Note: if you experience problems, write to the TAs immediately. IT recently fixed an issue in the script that processes these requests, but problems may still occur.

Step 4, retrieve your web access logs from the server. It may take a day for the log dump to be generated. The logs are in your_home_directory/WWW/logdumps/, and you can retrieve them through SecureFX.
[Screenshot: the WWW/logdumps folder]

If you don't know how to extract .bz2 or gzip files, try SecureZip (freeware). If everything appears squeezed onto one long line in the extracted file, that is because the file uses Unix line endings (more reading for the curious); try opening it in Microsoft Word instead.
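If you are comfortable with a little scripting, Python's standard `gzip` and `bz2` modules can also read the compressed logs directly, without extracting them first. This is a minimal sketch assuming the files end in `.gz` or `.bz2`; `open_log` and `read_lines` are hypothetical helper names. Opening the files in text mode also treats Unix line endings as line breaks, which sidesteps the one-long-line problem.

```python
import bz2
import gzip


def open_log(path):
    """Open a plain, .gz, or .bz2 log file for reading as text.

    Text mode uses universal newlines, so Unix (LF-only) line endings
    are read as separate lines on any platform.
    """
    name = str(path)
    if name.endswith(".gz"):
        return gzip.open(name, "rt", encoding="utf-8", errors="replace")
    if name.endswith(".bz2"):
        return bz2.open(name, "rt", encoding="utf-8", errors="replace")
    return open(name, "r", encoding="utf-8", errors="replace")


def read_lines(path):
    """Return all lines of the log without their line-ending characters."""
    with open_log(path) as f:
        return [line.rstrip("\n") for line in f]
```

For example, `read_lines("access_log.bz2")[:5]` would return the first five lines of a (hypothetically named) `access_log.bz2`, which is also a quick way to print the snippet asked for in Step 5.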

After creating your page and having your friends visit it a few times, you will need to wait another day for the logs to be refreshed.

Step 5, analyze your web log:
1) Comment on the format of the logs, and print out a snippet.
2) Formulate 3 questions whose answers you would be interested in finding.

Some example questions: What is the most popular link on a given page? How many unique IPs are there per day?
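As a starting point for questions like these, here is a minimal Python sketch. It assumes the logs follow the Apache Common Log Format (client IP, identity, user, `[timestamp]`, `"request"`, status, size), possibly followed by the referrer and user-agent fields of the Combined Log Format; compare against your own snippet from 1) and adjust the regular expression if your logs differ. It counts distinct client IPs per day and, as a simple proxy for popularity, the most frequently requested paths. Several visitors sharing one IP address are counted as a single IP. The function names are hypothetical.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime

# Assumed format: Apache Common Log Format; extra Combined Log Format
# fields (referrer, user agent) after the size are simply ignored.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None
    fields = m.groupdict()
    # Timestamps look like 10/Apr/2008:13:55:36 -0700
    fields["time"] = datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z")
    return fields


def unique_ips_per_day(lines):
    """Map each date to the number of distinct client IPs seen that day."""
    ips_by_day = defaultdict(set)
    for line in lines:
        rec = parse_line(line)
        if rec is not None:
            ips_by_day[rec["time"].date()].add(rec["ip"])
    return {day: len(ips) for day, ips in sorted(ips_by_day.items())}


def most_requested_paths(lines, n=5):
    """Return the n most frequently requested paths with their hit counts."""
    counts = Counter()
    for line in lines:
        rec = parse_line(line)
        if rec is not None:
            counts[rec["path"]] += 1
    return counts.most_common(n)
```

For example, after `lines = open("access_log").read().splitlines()` (the file name is hypothetical), `unique_ips_per_day(lines)` maps each date to a count and `most_requested_paths(lines)` lists the five most requested paths. Lines that do not match the pattern are skipped rather than raising an error.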

Step 6, analyze your website using Google Analytics:
1) Follow the instructions to set up your Google Analytics account. Note: don't forget step 6 of those instructions, which is to put the tracking code right before the </body> tag of every page you want analyzed.

2) In Google Analytics, click "View Reports" for your website.

[Screenshot: Google Analytics]
3) You will be shown a Dashboard consisting of several diagrams like the ones below. Take screenshots and submit these plots as part of your homework write-up. Comment on each plot and on how you could use the information to improve performance (for example, if you find that a product you sell attracts many more people from Asia than from the U.S., you may want to focus on the Asian market).

[Screenshot: Google Analytics Dashboard]
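Step 6, item 1) asks you to put the tracking code right before the </body> tag of every page you want analyzed. If you have several pages, a short script can do the placement for you. This is a sketch, not part of the Google Analytics instructions: `TRACKING_SNIPPET` is a placeholder for the exact code Google gives you, and `add_tracking` / `add_tracking_to_file` are hypothetical helper names. It inserts the snippet before the last `</body>` tag and leaves pages that already contain the snippet untouched.

```python
import re
from pathlib import Path

# Placeholder only: paste the exact tracking code Google Analytics gives you.
TRACKING_SNIPPET = "<!-- paste your Google Analytics tracking code here -->"


def add_tracking(html, snippet=TRACKING_SNIPPET):
    """Insert snippet immediately before the last </body> tag (case-insensitive).

    Returns html unchanged if the snippet is already present or there is no </body>.
    """
    if snippet in html:
        return html
    matches = list(re.finditer(r"</body\s*>", html, flags=re.IGNORECASE))
    if not matches:
        return html
    pos = matches[-1].start()
    return html[:pos] + snippet + "\n" + html[pos:]


def add_tracking_to_file(path, snippet=TRACKING_SNIPPET):
    """Rewrite an HTML file in place with the snippet added."""
    p = Path(path)
    text = p.read_text(encoding="utf-8")
    p.write_text(add_tracking(text, snippet), encoding="utf-8")
```

Because it checks for the snippet first, running the script twice on the same page does not add the code twice.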

Exercise 2:

Automatic Data Service with Yahoo Pipes

In this exercise, we will use Yahoo Pipes to do automatic data collection and build alerts on top of it.

Step 1, understanding the basic concepts
  • What is an RSS feed?
RSS is a popular format for announcing recently updated items. An RSS feed's data is represented in XML. Many online services let you subscribe to your favorite RSS feeds to keep up with changes, such as iGoogle, Google Reader, LiveJournal, and NewsGator. Typically, you subscribe to RSS feeds in your favorite feed reader and view all the content you care about in a single place. You can learn more about RSS here. RSS feeds are a common data source in Yahoo Pipes.
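To see what "represented in XML" looks like in practice, the sketch below parses a tiny, made-up RSS 2.0 document with Python's standard `xml.etree.ElementTree` and lists each item's title and link (`rss_items` is a hypothetical helper name). Real feeds carry more fields, such as descriptions and publication dates, and some sites publish Atom instead of RSS, which this sketch does not handle.

```python
import xml.etree.ElementTree as ET

# A tiny made-up RSS 2.0 document, used only for illustration.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <link>http://example.com/</link>
    <description>Recently updated items</description>
    <item>
      <title>First post</title>
      <link>http://example.com/1</link>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/2</link>
    </item>
  </channel>
</rss>"""


def rss_items(xml_text):
    """Return (title, link) pairs for every <item> in an RSS 2.0 feed."""
    root = ET.fromstring(xml_text)
    return [
        (item.findtext("title", default=""), item.findtext("link", default=""))
        for item in root.iter("item")
    ]


for title, link in rss_items(SAMPLE_RSS):
    print(title, link)
```

Running it prints `First post http://example.com/1` and then `Second post http://example.com/2`. A feed reader, or a Fetch Feed step in Yahoo Pipes, starts from this same item structure.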

  • What is Yahoo Pipes?
“Yahoo pipes is a powerful composition tool to aggregate, manipulate, and mashup content from around the web.” (from the Yahoo Pipes homepage). We will show you an example in Step 2, but we strongly encourage you to learn more about it beforehand.
Here are some very good video tutorials:

Step 2, understanding a real-world example

Suppose you are sick of your landlord and are looking for a new apartment. You want to find a “1-bedroom apartment in Palo Alto that asks less than $1400/month and is cat-friendly”. So you go to craigslist and search for it with something like
http://sfbay.craigslist.org/search/apa/pen?addTwo=purrr&bedrooms=1&maxAsk=1400
But you run into two problems: first, craigslist only lets you limit the search to the “peninsula” area, so you have to look for “palo alto” within the results page; second, the search only happens when you remember to run it, and you are usually too busy to remember. Ideally, you want the process automated, so that you are alerted whenever a new listing matches your requirements.
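The “palo alto” step is essentially a keyword filter over the feed's items. Here is that idea sketched in plain Python; it is not the pipe's actual source. The listings are made-up dictionaries with `title`, optional `description`, and `link` keys standing in for parsed RSS items, and `filter_listings` is a hypothetical name.

```python
def filter_listings(items, keyword="palo alto"):
    """Keep listings whose title or description mentions keyword (case-insensitive)."""
    keyword = keyword.lower()
    return [
        item
        for item in items
        if keyword in item.get("title", "").lower()
        or keyword in item.get("description", "").lower()
    ]


# Made-up listings, for illustration only.
listings = [
    {"title": "$1350 / 1br - Sunny 1BR, cats OK (palo alto)", "link": "http://example.com/a"},
    {"title": "$1300 / 1br - Quiet unit (redwood city)", "link": "http://example.com/b"},
    {"title": "$1395 / 1br - Near Caltrain", "description": "Walk to downtown Palo Alto",
     "link": "http://example.com/c"},
]

for item in filter_listings(listings):
    print(item["link"])
```

This prints the links of the first and third listings. Matching on the description as well as the title catches listings that mention the city only in their body text.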

Here is the Yahoo Pipe we created to solve the problem (shown in the picture below):
http://pipes.yahoo.com/pipes/pipe.info?_id=bBXZNDgJ3RGCNBwWGsevXg
[Screenshot: the apartment-search pipe]
Go there, view the source of the pipe, and play around with it. If you don't understand how the source works, go back to Step 1 and review the concepts.

Once the pipe is created, you can set up an alert on it that notifies you by email, mobile, or Yahoo Messenger whenever its results change.
[Screenshots: setting up an alert on the pipe]
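Under the hood, “alert me when the results change” comes down to remembering which items you have already seen. Here is a minimal Python sketch of that bookkeeping, assuming each item is a dict with a unique `link` and keeping the seen links in a JSON file. `new_items` and the file layout are hypothetical, and actually sending the email, mobile, or Messenger notification is left to Yahoo Pipes and not shown.

```python
import json
from pathlib import Path


def new_items(current_items, state_file):
    """Return items whose link was not seen on a previous run, then record all links.

    current_items: list of dicts, each with a unique "link" key (assumed shape).
    state_file: JSON file holding the links seen so far (created if missing).
    """
    path = Path(state_file)
    seen = set(json.loads(path.read_text())) if path.exists() else set()
    fresh = [item for item in current_items if item["link"] not in seen]
    seen.update(item["link"] for item in current_items)
    path.write_text(json.dumps(sorted(seen)))
    return fresh
```

Run it on a schedule: the first call returns every current item, and later calls return only the listings that have appeared since the previous run, which are the ones worth an alert.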

Step 3, questions for you:

Now design a similar problem and implement a Yahoo Pipe to solve it. Publish your pipe and include its link in your homework submission, along with your problem definition.


Student Web Sites

Karen Ryberg
Eric Sun (book project website)
Bill Whiteley
Sunil Menon
Andreas Weigend (ok so he is not a student)
Ashlee Miller
Tom Bankston (website for my dad's Internet radio station)
Arun Saha
Sean Sit
Janine Molino
Ryan Mason - http://5pears.org (my web playground, mostly dedicated to motorcycling) - rkm3
Yi-Fu Wu
Jaehyeok Heo
Ming Chen (Anyone know people from the North Pole or Central Africa?)
Shiling Lam
Charles Tripp
Jiajing Xu (some travel tools)
Yi Chai
Jaebock Lee
Ross Wait
Elizabeth Reinoso
Myunghwan Kim
Pavani Vantimitta
Bin Shen
Wei-Ting Liu
Lin Chao (Read about Green)
Daniel Cheng <<< Terrible website that you must avoid getting addicted to
Randal Truong <<< Best game on the web!
Shirley (Xinli) Bao
Shaun Maguire
Bonny Simi <<<-------Click here for a contest with a grand prize of a free airline ticket
Nelson Ray
Enrique Allen
Sreeram Duvur