Seven Methods for Transforming Corporate Data into Business Intelligence

Vasant Dhar and Roger Stein (Prentice Hall, 1997)

Most businesses have lots of data. Virtually every company collects data about employees, customers, products, sales, transactions, and so on. A huge proliferation of databases has created a great demand for new, powerful tools that can turn data into knowledge. When datasets were small, a few specialized tools selected by an able analyst were sufficient for a particular “knowledge discovery” task. Often, only one tool was applied to a dataset, and the limitations of the tools shaped human expertise. It is still common for one group of analysts to use linear regression, another to be preoccupied with clustering, and yet another to build decision trees.

Knowledge may take many forms, such as equations, contingency tables, taxonomies, decision trees, rules, graphs, concepts, exceptions from patterns, and many more. Thus it is essential that a data analyst have a solid knowledge of a variety of available techniques. It is equally essential that a data analyst have a basic understanding of the mapping between techniques and problems, as there is no universal technique appropriate for all scenarios.

The book by Vasant Dhar and Roger Stein addresses this important issue. It provides a useful survey of seven technologies that have been gaining popularity in the business community in recent years. The authors wrote:

Our goal is to put the reader in a position to understand how a technique can be applied, why it works, and what concerns might arise as a result of its use ... Our goal is to empower you to increase your firm’s intelligence about how it deals with its customers, suppliers, and internal business process.

The book not only explains the mechanics behind the different techniques but also discusses how these techniques can be applied in the business environment. Their use is illustrated by many examples, figures, graphs, diagrams, and case studies; moreover, additional literature is suggested for each technique.

The techniques discussed in this text include decision-support systems, genetic algorithms, neural networks, rule-based systems, fuzzy logic, case-based reasoning, and machine learning. Each of these techniques is discussed briefly in a separate chapter in a uniform way. Each chapter consists of three sections. The first provides a short introduction that gives the motivation behind a particular technique. It is followed by a section explaining the basic elements of the technique (i.e., its algorithmic aspects). The third section of each chapter addresses an important question: “When is it a good idea to use this particular technique?”

I feel that the last sections of these seven chapters are of great value. It is important to know about a variety of techniques, but it is even more important to know about their strengths and limitations, and about the environments where they can be very useful or of limited use. Also, the “summary tables” that conclude each chapter nicely present the basic characteristics of the various techniques in a systematic way: their accuracy, explainability, response speed, scalability, flexibility, ease of use, development speed, computing resources, etc.

Further, the text is extended by four appendices, which address the issues of object orientation, provide a list of available AI products, give a list of references for general reading, and present seven case studies of major companies (including Compaq Computer Corporation and US West).

As no book is perfect, this text also has a few weak points. I will discuss them on the basis of Chapter 5, “Genetic algorithms”; however, similar remarks apply to some other parts of the text. Some references given at the end of the chapter are outdated. The field of “genetic algorithms” has made its greatest progress during the last five years, but the most recent reference is from 1992 (seven years ago). The list of references also contains a single volume of proceedings (from 1991): it seems that an arbitrary choice of one of many conferences in this field was made. It is unclear why the response speed of genetic algorithms was classified as “moderate to high”: many practitioners consider the low speed of this technique a significant bottleneck. Also, it is hard to agree that the flexibility of this technique is high, as each application usually requires its own data structures and variation operators. Moreover, different problems have different constraints, which often require a problem-specific approach (various penalties, decoders, repair algorithms, etc.). I hope to see these issues addressed in the second edition of this book!

The book should be important for the business community, as many other books on the techniques covered in this text are too technical. The authors wrote: “One motivation for writing the book was for business people who often asked us to explain to them in simple terms how these technologies could be used profitably in business.” Clearly, the authors achieved their goal. Without any doubt, this book will increase the visibility of new “knowledge-intensive” decision-support methods in many organizations.

ZBIGNIEW MICHALEWICZ