Andreas Weigend
Stanford University
Data Mining and Electronic Business
Stat 252 and MS&E 238
Spring 2008

Audio (as mp3): http://www.weigend.com/files/teaching/stanford/recordings/WeigendStanford2008Class8

Shiling Lam (shiling@stanford.edu): Interested in working on this page.
Alex Gleitz - would also like to help with this page agleitz@stanford.edu
Karen Ryberg - I'll help as well kryberg@stanford.edu
Satish Veerapuneni - I'll be interested as well sveerap@stanford.edu
Steven Vasilakos - I would like to participate svasilak@stanford.edu
Johnny Hwin - I would like to participate johnnyhwin@gmail.com
Rob Holmes - I need to participate holmes_rob@gsb.stanford.edu

Class 8

Housekeeping/ ASW

Our last class is Friday:
  • Each group will have a maximum of 3 minutes to present key insights in homework 6
  • Hitchoo case study
  • What students and friends are doing (see also Jobs and Internships)
  • Interested in doing a reception on Friday after class? If so, what shall we get?

Geolocation Project presented by Ryan Mason

  • Andreas will start carrying Ryan's device after class!
  • The Google Maps API makes it simple to draw complicated maps. Possibilities include:
    • Different number of data points shown based on the zoom level - when zoomed out, points close together cluster into one symbol
    • Different icons can be used based on user criteria (color, shape)
    • Layers of KML data can be updated dynamically
  • KML is the format used by Google Earth and is also an open standard (adopted by the Open Geospatial Consortium)
  • Ryan shared his Google mapping project, which combines photos, waypoints and tracks including the checkpoints from the Spot
  • Additional reading material on the topic: the research paper Visual Analytics Tools for Analysis of Movement Data. Other interesting articles can be found on kdd.org, the site of the ACM special interest group on Knowledge Discovery and Data Mining.
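
The zoom-dependent clustering mentioned above (nearby points collapsing into one symbol when zoomed out) can be sketched with a simple grid. This is only an illustration, not the Google Maps API's own method: the function name cluster_points, the 64-pixel cell size, and the use of a Web Mercator projection with 256-pixel tiles are all assumptions made for the example.

```python
import math
from collections import defaultdict


def cluster_points(points, zoom, cell_px=64):
    """Group (lat, lon) points that fall into the same on-screen grid cell.

    Hypothetical helper: projects each point to Web Mercator pixel
    coordinates at the given zoom level (assuming 256-pixel tiles) and
    buckets points by a cell_px-sized grid. Assumes latitudes strictly
    between -90 and 90 degrees.
    """
    world_px = 256 * (2 ** zoom)  # width of the whole map in pixels
    cells = defaultdict(list)
    for lat, lon in points:
        x = (lon + 180.0) / 360.0 * world_px
        s = math.sin(math.radians(lat))
        y = (0.5 - math.log((1 + s) / (1 - s)) / (4 * math.pi)) * world_px
        cells[(int(x // cell_px), int(y // cell_px))].append((lat, lon))

    clusters = []
    for members in cells.values():
        # One symbol per cell, placed at the mean position of its members.
        clusters.append({
            "lat": sum(p[0] for p in members) / len(members),
            "lon": sum(p[1] for p in members) / len(members),
            "count": len(members),
        })
    return clusters


if __name__ == "__main__":
    # Two points on the Stanford campus and one in San Francisco.
    pts = [(37.4275, -122.1697), (37.4280, -122.1690), (37.7749, -122.4194)]
    for zoom in (3, 12):
        print(zoom, cluster_points(pts, zoom))
```

Zoomed far out, all three points share one cell and are drawn as a single symbol; zoomed in, San Francisco separates from the campus points. A fixed grid is the simplest choice; distance-based merging avoids splitting neighbors that straddle a cell edge, at the cost of more computation.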

On policing of social networks: Orkut (owned by Google) has drawn criticism for hosting hate groups and child pornography and for numerous security problems. As a result, it has been the target of policing in India and Brazil (where it is most popular):
"Police tie up with Orkut" - http://www.hindu.com/2007/11/29/stories/2007112960280300.htm
"Friends of slain teen arrested, Orkut angle being probed" - http://www.indiaprwire.com/businessnews/20070821/23999.htm

Presentation Title - Data Mining, Social Networks, Time Travel and Privacy

Topics

  • Value of Privacy
  • Threats to Privacy
  • Erasure of 4th Amendment
  • Ease of use and user choice as negatives

The 4th Amendment says "The right of the people to be secure in their persons, houses, papers, and effects, against unreasonable searches and seizures, shall not be violated, and no Warrants shall issue, but upon probable cause, supported by Oath or affirmation, and particularly describing the place to be searched, and the persons or things to be seized." A transcript of the Bill of Rights (Amendments 1-10) is available on a National Archives web site.

Don't be big brother's "preferred vendor."
-Think about how Facebook could go wrong for Falun Gong, Burmese monks, German Jews

The US mandated that all digital telephony be built so that it can be wiretapped...
-What if the technology we built were used to oppress billions of people?

Think about Facebook's reversed signup dynamic
-install apps = give some company your data
-in the end, the architecture will set defaults

paradox: end-user control prevents negotiation
-we put up with terms of agreement that are subject to change, which means we have no say, so why agree in the first place? There's no room to negotiate.

data exported is data lost.
-most web companies follow the plan: step 1 underpants, step 2 ??, step 3 monetization. Because web companies don't know how to monetize yet, they're less willing to part with your potentially monetizable information. But data exported is data lost.



Break

Begins at 1:23:08 on video, ends 1:38:06

Credibility--people are on the web / Enrique Allen

1:39:35 on video

  • Quickly review what is credibility and trust
  • Review study design and results
  • Propose how to study credibility on Facebook (discussion)

Definition of credibility - believability, assessing the quality of something with a number of inputs. Two key dimensions: trustworthiness and expertise.

The online consumers in the Stanford PTL study and our expert panel of evaluators diverged greatly in their credibility assessment criteria. Overall, our experts were far less concerned about visual appeal as a marker of credibility than the Stanford PTL consumers, and they were more concerned about the quality of a site's information. Among other conclusions, this study found that health experts assigned more credibility to health sites that provided information from reputable sources and cited the names and credentials of authors for each article published. Finance experts assigned more credibility to finance sites that provided investors with a great deal of unbiased educational information and research, rather than nudging consumers toward their own products or services.
The Stanford PTL study found that consumers tended to rely heavily on overall visual design when assessing Web sites, including layout, typography and color schemes. About half (54.6%) of the comments by the consumers regarding finance sites referred to design look, which relates to the visual appeal of a site's design, compared to only 16.4 percent of finance expert comments on this topic. Likewise, 41.8 percent of consumer comments regarding health sites made note of a site's design look, compared to 7.6 percent of surveyed health experts' comments which mentioned this topic.
Our health experts most often relied on the name reputation of a site, its operator, or that of its affiliates, when assessing the credibility of health Web sites (43.9% of health expert comments related to this credibility criterion). The next most common issues mentioned when evaluating health site credibility were information source, which relates to the citation of a site's information sources (25.8%), and company motive, which relates to a user's perception of the motive of the organization behind the site, whether good or bad (22.7%).
Our finance experts most often relied on a site's scope or information focus when assessing the credibility of finance Web sites, which includes consideration of the quantity of information provided (40.3% of finance expert comments related to this credibility criterion). The next most common issues mentioned when evaluating finance site credibility were company motive (35.8%), and information bias (29.9%), which relates to a user's perception of bias in the site's content.
This study also reveals which specific elements lend credibility to a site's perception, according to each health and finance expert group. In addition, the report provides recommendations to Web publishers, particularly those in the health and finance fields, which aim to increase the credibility of sites among each type of Internet audience.


Note: Web credibility is increasingly about people.
http://www.slideshare.net/bjfogg/web-credibility-bj-fogg-stanford-university/


Persuasion Theories Relevant to Trust


Note that even though these theories describe attitude change, they may also apply to behavior change, like overcoming resistance to installing a FB app or sharing info via Personal Profiles. We process information differently based on the cues around the information. In a high-trust context we are likely to not think as hard about the persuasive attempt. There are several theories that describe two modes of processing info, based on cues.

Elaboration Likelihood Model (ELM)
In the Elaboration Likelihood Model, or ELM (Petty and Cacioppo, 1986), cognitive processing is the central route and affective/emotion processing is often associated with the peripheral route. The central route pertains to an elaborate cognitive processing of information while the peripheral route relies on cues or feelings. The ELM suggests that true attitude change only happens through the central processing route that incorporates both cognitive and affective components as opposed to the more heuristics-based peripheral route. This suggests that motivation through emotion alone will not result in an attitude change.

Heuristic-Systematic Model (HSM)
According to the competing theory (by Chaiken, Liberman, & Eagly, 1989), information is either processed in a high-involvement and high-effort systematic way, or it is processed through shortcuts known as heuristics. Emotions (affect heuristics, feelings and gut-feeling reactions) are often used as shortcuts.

Situational Theories
The big idea is that our situation/context controls our behavior more than other factors, and a lot more than we expect. The leading thinker, Lee Ross, is here at Stanford; he put these ideas forth in the book The Person and the Situation (short summary here).

Further Reading (Case Studies)
Trust and the Augmented Social Network: Enhancements to Online Community Infrastructures
The authors discuss a model for increasing the level of cooperation and user engagement in social networks by facilitating more trusted interactions. They explore communities of practice (a group of people who share an interest and engage in collaborative learning and/or working) as a type of network that builds such interactions.

How Not To: A Case Study in Breach of Trust at Eons.com
Eons.com started as a social network for "mature adults" with the initial tagline "Lovin' Life on the Flip Side of 50". Funded with $32M of venture capital, the company seemed unstoppable. When the pressure to monetize mounted, Eons.com dropped the age requirement, in the process losing the trust of members, who promptly deserted the site.

"Welcome to the big time, Facebook": Beacon and User Privacy
Facebook faced a political nightmare in the wake of launching Beacon, a product that broadcast users' purchases from participating companies to their friends using the site's newsfeed feature. This CNET blog entry covers the controversy in detail, from MoveOn.org's allegations that the program violates user privacy to complaints about Beacon lodged with the Federal Trade Commission. It also examines the parallels between users' reactions to Beacon and to Newsfeed, which in 2006 prompted a sea of complaints (and a public apology from Facebook's founder).

More from the persuasive technology lab:

http://credibility.stanford.edu/
http://captology.stanford.edu/

Joy Mountford

2:00:25 on video

Big companies have a unique opportunity to use their immense amounts of data to provide insights into the information world around us and how consumers relate to it. Joy will show a series of compelling visualisations that her Design Innovation Group at Yahoo, in SF, produced using Java and Processing tools. Her YHaus team created dozens of innovative, interactive feeds from real Yahoo! data. These feeds show FAA flight paths and NYTE (AT&T data) as well as local traffic data segments, turned into dynamic parabolic trails of light; Yahoo Answers feeds showing key words as animated word clouds; the extent of Y! Mail traffic reach; and topical search query bursts around the world. She will show applications using a multi-user, multi-touch large screen to show geo-tagged Flickr photo feeds and the Internet Archive's entire out-of-print book collection, all available on a large interactive surface. Some projects were part of an April show at MoMA in New York called Design and the Elastic Mind, as well as at TED and DUX. Some described these applications as 'useful' art, but Joy thinks they may be a start towards a different type of interface: ambient or continuous ones.

S. Joy Mountford is an internationally recognized leader in user-centered interaction design. Through her extensive career she designed interfaces for applications ranging from aircraft to personal computers to consumer devices. She has led pioneering efforts for creating interfaces to audio and visual devices, interfaces between the electronic world and the physical world of printed materials, and for interactive music creation and generation over the net. In 1990 she pioneered the Interface Design Project, which sponsors interdisciplinary design at universities around the world; it is now in its 20th year, and has touched the lives of thousands of design professionals. She headed the Human Interface Group at Apple Computer for 8 years and then moved to Interval Research for 6 years to lead a series of consumer music product teams. She then led her own interaction design company, consulting on interaction designs for a range of client companies. In 2005, Joy joined Yahoo Inc. Her initial focus was on leading a team to redesign the world’s most trafficked Web page, Yahoo's Front Page. She led the user experience and design efforts for Yahoo's Communications (Mail, Messenger, Photos) and Community products (Groups) as a vice president. She was also invited to start a Design Innovation Group in San Francisco. The past year of their work focused on data visualisations that reveal the power – and beauty -- of very large data sets of human activity. Her project interests center around building more extensive creative spaces to position technology in appealing and meaningful ways.

Some of the pieces that Joy showed in class are available at http://www.aaronkoblin.com/work.html

In particular:
The flight data visualization is at http://www.aaronkoblin.com/work/flightpatterns/FPWeb_Final_2.mov
The AT&T call/IP data visualization is at http://senseable.mit.edu/nyte/nyte-globe-encounters.mov

One visualization that is not available online, but that showed the power of visualization, is a globe showing Yahoo! Mail traffic over time. It revealed just how concentrated Yahoo! Mail traffic is in Silicon Valley. Because the data also showed a "pulse", with big swings in measured email volume, the team realized that they were losing some data because of missing equipment (a $150 box) in a colocation facility. The visualization thus yielded two relevant business insights.

Observations

  • Data can be fun and helpful: beyond Tufte
  • Companies have much data that is somewhere, but mostly not used
  • Data dashboards are viewed weekly as static graphs, but change and difference are what is important
  • Information as an aggregate view offers new insights
  • Information over time shows likely trends/outcomes before needed

Text about Mountford and video of an interview

Related information - Visualizing Large Graphs from information aesthetics (June 3, 2008)

Some notes about Processing:

Processing ( http://processing.org/ ) is an easy-to-use programming environment created by Ben Fry and Casey Reas. It can be downloaded and used for free, and companion books written by its creators are available.

A key benefit of Processing is that it is ideal for quickly writing and testing interactive visualization programs (called "Sketches" in Processing).
Processing simplifies some of Java's API abstraction and is designed for very convenient utilization of the powerful Java 2D graphics API.
As it is based on Java, programmers with Java experience will find it familiar, and because it simplifies visual and interactive Java programming, those without Java experience may find this easier to begin working with.
In addition, Processing automates some tasks, such as applet and web page generation.
Note that Processing also covers mobile device application development ( http://mobile.processing.org/ )

Summary/ ASW


Friday


Starts at 12:15
45 minutes on Data Mining by Alex Wong, Director of Product at 23andMe
Each group will have a maximum of 3 minutes to present key insights in homework 6
Then, case study on Hitchoo. Check out the website before class and be prepared to share views on:
  • what you like about it
  • what you would do differently
  • what you see as the main challenges
  • what data mining you would do
  • what ranking algorithms for events you would like to see

Final opportunity for people to present ideas they have