Stanford University
Stat 252 and MS&E 238
Spring 2008 class time: Monday 3:15 - 6:05 pm
Spring 2008 class location: Gates B01



Data Mining and Electronic Business

This course is about people and data: Collecting data about behavior on the web, in social networks, in communication, on dating sites, etc. Mining the data, building predictive models, creating (and rejecting) hypotheses, designing cool experiments, and learning from them quickly. And figuring out what is truly new, what is similar to the past, and what the underlying drivers are. We will discuss the impact of the communication and data revolution on individuals, business, and society, i.e., on many aspects of the world we live in.

The 90’s, the decade of algorithms (data mining), focused on the question: "Given these data, what insights can you get?" Great algorithms were invented and refined, and their strengths and weaknesses understood. The current decade is the decade of data (data mining), and the question has shifted to: "Given these problems, what data can you get?" Furthermore, economic aspects of data are becoming increasingly important, with the question becoming "Who pays whom?"

The first half of the course focuses on data: Click data (what can be collected and what it is useful for), intention data (such as the queries from the searches you do; we will also discuss social search), attention data (such as tags on social bookmarking sites, with their important application for discovery), and interaction data (from email headers and social networking sites). The second half of the quarter focuses on models and on creating appropriate structures and incentives. We will discuss models for products (recommender systems), people (reputation systems), situation and location.

The second half discusses applications. They range from personalization, recommendations and online marketing (behavioral and situational targeting), to the principles behind collective intelligence, reputation systems and peer-production, as well as prediction markets as yet another way of gleaning data from people and fostering interactions between them.

Students are expected to actively engage in class discussions, to have their assumptions challenged, and to bring their various backgrounds to bear to make it a great experience for themselves and everybody else. We will also have some great guest speakers come to class.

After each class, the students create a detailed write-up on the course wiki (see 2007). To help prospective students decide whether to take this course, previous syllabi (2004, 2005) might also be useful.

Schedule: We meet once a week (Monday afternoons) for 3 hours. The dates in Spring 2008 are:
  • Apr 7 The Business of Data
  • Apr 14 Click, Intention, and Attention Data
  • Apr 21 Social Networks and Viral Marketing
  • Apr 28 Prediction Markets
  • May 5 Reputation Systems, Instrumenting the Planet
  • May 12 Location Data (Mobile)
  • May 19 Discovery Systems (Products, People)
  • [no class on May 26, Memorial Day]
  • Jun 2 Personal Genome (tentatively guest from 23andme)
  • Jun 6 Outlook, and Project presentation by students
    Note that the last class is our slot for finals: Friday, 12:15 - 3:15
Meeting only once a week has proved useful in the past since it makes it as easy as possible for students to attend class in person. This is a lot more fun than just watching it over the web, and you learn a lot more. Note that this explicitly includes SCPD students who only signed up for remote access; just do not tell anyone :)

Course wiki: All students have full read/write access to the course wiki at stanford2008.wiskispaces.com. I encourage you to actively contribute -- the class and you will benefit.

Grading: The main goal is that you get insights and that you transfer them to your area, coming up with some interesting ideas and applications. To support this objective, your grade will be determined by the following:

  • Course wiki: We will form 8 groups. Each group is responsible for creating the initial wiki page for one of the classes by Friday 6pm (i.e., 4 days after class). These pages emphasize the key learnings of each class and link to other materials wherever useful. [40%]
  • Homework: There will be assignments. They are due the day before class at 5pm, so that we can look through them and give brief feedback in a timely manner. [40%]
  • Class participation. [20%]
  • Project: If you have a good and solid idea for an interesting project, I am happy to give feedback and jointly decide on whether it makes sense to do the project. I encourage projects in small groups. [optional]
There are also internship opportunities available for students who like to code, both in the Bay Area and abroad, ranging from Bangkok (Agoda, online travel) to Helsinki (Fruugo, e-business).

Readings
Some of the material is very recent and originates from several academic disciplines. Besides statistics and computer science, it discusses modern marketing techniques, behavioral economics, social network analysis ideas and other concepts. Depending on your specific background and interests, the following might be useful:
Readings and mp3 recordings of the classes are online at weigend.com/files/teaching/stanford/. We also have a facebook group for the class.

Information about teaching assistants, office hours, and other matters is on the course wiki.