Andreas Weigend
Stanford University
Data Mining and Electronic Business
Stat 252 and MS&E 238
Spring 2008

Student Wishes

Poll is up now, please vote for as many speakers as you like!

Content suggestions

What would you like to learn?
I am particularly interested to learn how to apply class learnings in a corporate environment. I feel that "Corporate America" does not understand how to actually apply the latest intelligence that is right under their nose (blogs, emails, recommender systems, etc). Within the airline industry, 90% of senior leaders are over the age of 45 and these leaders have no concept about Web 2.0 (or even 1.0). They think the web consists of Google, banner ads, and email. Sadly, my company falls into this category. I would like to change this.
-Bonny Simi, JetBlue

I like the idea of having projects presented during finals week. It sounds like people in the class have a diverse background and it would be interesting to see what people are doing.
-Karen Ryberg, USGS

I echo Bonny Simi. What compounds the problem of lack of awareness is a fear that these new technologies provoke in some management. Especially in companies that have a very strict hierarchy model of management built on knowledge control, people don't like to see their hierarchies dismantled if they have spent a lot of time climbing to the top based on those systems. Even if they stand to gain a lot operationally, their fear gets in the way. Or, companies want to adopt so-called "Web 2.0" enterprises but they want to adopt it in a way where they can still retain a hierarchy that they are comfortable with and have someone control it -- completely missing the point about collaboration! That's why I'm so interested in learning about companies that have done this effectively - both startups with new and fresh ideas, and established companies that know how to evolve.
-Ashlee Miller, Stanford

Guest speakers

Who would you like to hear?

Gordon Bell (wikipedia, official bio) -
He is pretty much a legend in the development of computers in general, known in part for his relatively famous Bell's Law of Computer Classes. There are even prizes for computing performance named after this gentleman. However, our class may be interested in some of his more recent work on MyLifeBits at Microsoft. It is basically an attempt to store an entire life on a single computer using devices such as a SenseCam. I believe this work is particularly interesting from a data mining perspective, not to mention that he is local to Silicon Valley, which I believe increases the probability of him actually coming to speak to our class. -Bill Whiteley, Northrop Grumman

I am incredibly interested in text analytics. There is a company located in Palo Alto that is a leader in the field: Attensity. The CEO or CTO might be a great speaker. They could show how they take unstructured data and turn it into something that companies can use. In particular, they are mining blogs and recommender sites to help companies act upon this data. My company is considering this software, and I would like to see them put through the rigor of the class to see how they respond to class questions.
-Bonny Simi, JetBlue

Toby Segaran - I read most of the book prior to class starting. I am a statistician and it was refreshing and motivational to read about all of the statistical methods he used in such an approachable, applied context.
-Karen Ryberg, USGS

Mike McCready - CEO, Platinum Blue. I am very interested in predictive statistics, and McCready's Platinum Blue has been successful in predicting hit songs (e.g., Norah Jones' success following high scores in McCready's song evaluations). A new company, Epagogix, is using a neural network to do a similar thing with movies.
-Adam Ting, Stanford

Seriosity: Incentives
Flickr - How do they manage millions of photos? What have they learned from the tags?

Paul Graham, Y Combinator

It would be interesting to hear what Jeff Hammerbacher (head of data analytics at Facebook) has to say about new kinds of data that they're collecting that they might not know what to do with yet.
-Eric Sun, Stanford

David Cole, a partner at Lunexa, a consulting firm focused on providing advisory and implementation services to help clients unlock opportunities from their data assets.
-Ming Chen, Stanford

Someone from Google research

Seth Goldstein - Majestic Research
Someone from Farecast. They predict airline prices and advise users whether to buy tickets now or wait for prices to go down.
-Yi-Fu Wu

Aspects of image recognition for photos and videos
Faces, image biometrics - Neven Vision, part of Picasa/Google
Visual Search Engine -
Photosynth - Microsoft Research
- Lin Chao, Intel

I'm interested in mobile data mining and would like to hear about some of the reality mining work being done on this at MIT.
Nathan Eagle, for example, is a Stanford grad who might oblige in spite of being on the East Coast.

- Arun Saha, Cisco

Executive from LinkedIn - professional social network data mining
Executive from Blizzard - online game/entertainment related data mining
If possible at all, Jack Ma from (if he just happens to be near Stanford?)
- Yi Chai

1. Speaker from Predictify
Predictify: a web-based prediction platform which utilizes the wisdom of crowds and a fun and engaging customer experience to gather forward-looking information and to revolutionize the ways market research and brand advertising are done.
2. Speaker from NebuAd
NebuAd is transforming the online advertising industry with the first consumer-centric behavioral targeting network.
3. Anne Wojcicki from 23andMe


Tim O'Reilly - the founder of O'Reilly Media. I would like to hear his ideas on social media and the ongoing evolution of the web.
- Wei-Ting Liu

I am not sure of specific people in any of these areas, but I think it would be interesting to hear from
1. the CEO, COO, or someone in research at Google or
2. someone from Facebook

I would be interested in having a guest lecturer from Google, as I am interested in the specific algorithms they use to develop the list of results corresponding to different searches. I use Google all the time to do research, and sometimes my searches yield exactly what I am looking for, but other times the search results are way out in left field. I would also be interested in hearing from someone who works at Facebook and who has developed a Facebook application. I use Facebook all the time and am always getting invitations to download various applications. I am interested in knowing what goes into developing these applications and how one comes up with the idea for a specific application (e.g., is it just an idea the creator had, or do they use the data gained through Facebook to develop applications that are tailored to specific Facebook users?). I am also interested in why you have to allow the application access to all the information in your Facebook account, and to know who you are, in order to download it (i.e., what do they do with all this information after you have downloaded a given application?).

Someone from Cake Financial or Prosper. I know some current and former engineers at Prosper, so I might be able to help make an introduction there.

-Tom Bankston, Authenticlick

1) Prosper is very exciting; I'm glad you have some contacts, Tom. I'm interested in modeling risk and predicting behavior, and I think P2P lending is an excellent example in this category. The main difference between this idea and, say, a targeted ad campaign seems to be the element of risk for the individual. Lenders in this market have more to gain, but possibly more to lose. So I wonder, how does risk shape a successful strategy for P2P lending? The main criticism of this idea is that individuals are not informed enough to set appropriate rates. So I wonder, if that criticism is true, what data would people need in order to be successful? Is it possible?

2) Don Tapscott from New Paradigm (author of Wikinomics) explores ideas related to people & data from a corporate perspective, but it probably wouldn't be feasible to get him here. The most interesting aspect of the corporate perspective to me is the way businesses are reversing the notion that data should be proprietary, and are becoming more collaborative and open with their data to the outside world. It parallels our discussion of "who should pay" for a service (the reversal of companies paying for something consumers used to pay for). This area addresses the question "who should know" or "who should control" the data - it used to be the company, now it is the consumers/competitors, everyone.

3) I'm also interested in Mobio, which offers widgets/mashups on your mobile to give you information about services in your current location. It is based in Cupertino, so maybe a possibility. I'm suggesting this because I'm very interested in the category of mobile services; perhaps there are others that are even more leading edge. I think most consumers are looking to simplify their lives (a purpose we discussed in class), and so I would love to hear a speaker from a company that offers real-time decision-making data for users on mobile phones.
-Ashlee Miller, Stanford

1. Founders, corporate strategists, or scientists from
2. Key people from
3. Scientists/Engineers from and
-Yuen Cheng, Stanford

One of my interests has to do with music discovery and recommendations. For anyone that listens to a lot of music, finding new material that suits your tastes perfectly can be extremely difficult. I think this extends to most forms of media where taste is very subjective. There are a number of companies that try to leverage user opinions and knowledge to make relevant recommendations, and I would be interested in hearing some of the challenges that they have dealt with or are currently facing. Some interesting guest speakers would be someone from Pandora,, or iLike. iLike also has a strong presence as a popular app on Facebook, so they could bring some particularly interesting insights into this area.

Another possibility along the lines of Prosper would be to have someone from LendingClub. I have some contacts there, so I could possibly help make this happen.
- Jaebock Lee, Stanford/Oracle

I am really interested in incentives and reputation. I think some interesting speakers would be:

1. Someone from Yahoo Answers. Markets for information have a lot of potential, and Yahoo Answers has been one of the most successful to date, despite the fact that they shunned most of their competitors' strategy of involving money. I would love to learn about the incentives they have put in place in order to encourage users to take time out of their busy lives to answer random people's questions. Is their success a result of the community they have built? Their reputation system? The mutual benefits? What are they planning on doing with all this data they are collecting? They are learning the things people are interested in and the skills they offer.
2. Someone from SourceForge. In some ways, this is similar to Yahoo Answers in that it has created a market for information. Most of the previous questions still apply.
3. Someone working on the OpenID project. I simply think this is one of the most interesting things happening on the web right now and that it will create lots of opportunities. I would love to have someone explain their vision for the project and some of the consequences they believe it will have on the web.
- Shaun Maguire, Stanford

I am interested in having the following people give a talk:
1. A Google co-founder (either Larry Page or Sergey Brin) to talk about how Google is thinking about the social graph.
2. Jimmy Wales, founder of Wikipedia
3. A VC who is passionate about social data mining (I don't know a name, sorry...) - how they decide when to fund

Avinash Kaushik - Head Analytics Guru at Google and author of the great blog Occam's Razor, which details how to be smarter about using analytics

I'm interested in any unique leverage offered by mobile applications and their development. Perhaps someone from RIM could speak on the BlackBerry developer program and what unique benefits and capturable metrics their mobile products offer to data collection applications.

-Steven V.