Graph Mining: Discovery in Large Networks
Corinna Cortes, Daryl Pregibon & Chris Volinsky
AT&T Shannon Labs
Florham Park, NJ
Large financial and telecommunication networks provide a rich source of problems for the data mining community. The problems are inherently quite distinct from traditional data mining in that the data records, representing transactions between pairs of entities, are not independent. Indeed, it is often the linkages between entities that are of primary interest. A second factor, network dynamics, induces further challenges as new nodes and edges are introduced through time while old edges and nodes disappear.
We discuss our approach to representing and mining large sparse graphs. Several applications in telecommunications fraud detection are used to illustrate the benefits of our approach.
Daryl Pregibon is Head of the Statistics Research Department at AT&T Shannon Research Labs. His department is responsible for developing a theoretical and computational foundation of statistics for very large data sets. He has been with the Labs for over twenty years. His interests and contributions span the three main areas of statistics: modeling, data analysis, and computing. His specific contributions include data analytic methods for generalized linear and tree-based models, incorporating statistical expertise in data analysis software, and designing and building application-specific data structures in statistical computing. He is very active in "data mining," which he defines as an interdisciplinary field combining statistics, artificial intelligence, and database research.
Daryl received a PhD in Statistics from the University of Toronto in 1979 and an MS in Statistics from the University of Waterloo in 1976. He is a fellow of the American Statistical Association and has published over fifty articles in his field. He was co-author of the best applications paper at KDD2001 (Empirical Bayes Screening for Multi-item Association in Large Databases) and the best research paper at KDD2000 (Hancock: A Language for Extracting Signatures from Data Streams). He is the past Chair of CATS (Committee on Applied and Theoretical Statistics, National Academy of Sciences). He was co-chair of KDD97 and has been either a special advisor or a member of the KDD program committees for the past three years. He is a co-founder of SAIAS (Society for Artificial Intelligence And Statistics). Currently he is a member of CNSTAT (Committee on National Statistics, National Academy of Sciences), a member of the SIGKDD Executive Committee, a member of the Steering Committee of IDA (Intelligent Data Analysis), and a member of the Editorial Board of Data Mining and Knowledge Discovery.