Hi, I've attached the data file that I mentioned in my previous email. Each line contains the information for a single patient and consists of five fields.
We want to predict the last field using the other fields. I don't have time to provide any more information about the data since I'm going out of town for a couple of days, but hopefully that won't slow you down too much.
And if you don't mind, could we meet when I get back to discuss your preliminary results? I might invite a few other members of my team. Thanks and see you in a couple of days. Despite some misgivings, you proceed to analyze the data. You put your doubts aside and start the analysis. The data file is smaller than you had hoped for, but two days later, you feel that you have made some progress.
You arrive for the meeting, and while waiting for others to arrive, you strike up a conversation with a statistician who is working on the project. When she learns that you have also been analyzing the data from the project, she asks if you would mind giving her a brief overview of your results.
Statistician: So, you got the data for all the patients?
Data Miner: Yes. I haven't had much time for analysis, but I do have a few interesting results.
Statistician: Amazing. There were so many data issues with this set of patients that I couldn't do much.
Data Miner: Oh?
Statistician: Well, first there is field 5, the variable we want to predict. It's common knowledge among people who analyze this type of data that results are better if you work with the log of the values, but I didn't discover this until later.
Was it mentioned to you?
Data Miner: No.
Statistician: But surely you heard about what happened to field 4? It's supposed to be measured on a scale from 1 to 10, with 0 indicating a missing value, but because of a data entry error, all 10's were changed into 0's. Unfortunately, since some of the patients have missing values for this field, it's impossible to say whether a 0 in this field is a real 0 or a 10. Quite a few of the records have that problem.
Data Miner: Interesting. Were there any other problems?
Statistician: Yes, fields 2 and 3 are basically the same, but I assume that you probably noticed that.
Data Miner: Yes, but these fields were only weak predictors of field 5.
Statistician: Anyway, given all those problems, I'm surprised you were able to accomplish anything.
Data Miner: True, but my results are really quite good.
Field 1 is a very strong predictor of field 5. I'm surprised that this wasn't noticed before.
Statistician: What? Field 1 is just an identification number.
Data Miner: Nonetheless, my results speak for themselves.
Statistician: Oh, no! I just remembered. We assigned ID numbers after we sorted the records based on field 5. There is a strong connection, but it's meaningless.
Although this scenario represents an extreme situation, it emphasizes the importance of "knowing your data." A data set can often be viewed as a collection of data objects. Other names for a data object are record, point, vector, pattern, event, case, sample, observation, or entity. In turn, data objects are described by a number of attributes that capture the basic characteristics of an object, such as the mass of a physical object or the time at which an event occurred.
Other names for an attribute are variable, characteristic, field, feature, or dimension. Often, a data set is a file, in which the objects are records (rows) in the file and each field (column) corresponds to an attribute. For example, Table 2.1 shows a data set containing student information. Each row corresponds to a student and each column is an attribute that describes some aspect of a student, such as grade point average (GPA) or identification number (ID).
Table 2.1. A sample data set containing student information.
Other important types of data sets are discussed later in this section; however, we first consider attributes. We first define an attribute, then consider what we mean by the type of an attribute, and finally describe the types of attributes that are commonly encountered. What Is an Attribute? We start with a more detailed definition of an attribute.
Definition 2.1 (Attribute). An attribute is a property or characteristic of an object that can vary, either from one object to another or from one time to another. For example, eye color varies from person to person, while the temperature of an object varies over time. At the most basic level, attributes are not about numbers or symbols. However, to discuss and more precisely analyze the characteristics of objects, we assign numbers or symbols to them. To do this in a well-defined way, we need a measurement scale.
Definition 2.2 (Measurement Scale). A measurement scale is a rule (function) that associates a numerical or symbolic value with an attribute of an object. Formally, the process of measurement is the application of a measurement scale to associate a value with a particular attribute of a specific object. While this may seem a bit abstract, we engage in the process of measurement all the time. For instance, we step on a bathroom scale to determine our weight, we classify someone as male or female, or we count the number of chairs in a room to see if there will be enough to seat all the people coming to a meeting.
In all these cases, the "physical value" of an attribute of an object is mapped to a numerical or symbolic value. With this background, we can now discuss the type of an attribute, a concept that is important in determining if a particular data analysis technique is consistent with a specific type of attribute. The Type of an Attribute It should be apparent from the previous discussion that the properties of an attribute need not be the same as the properties of the values used to measure it. In other words, the values used to represent an attribute may have properties that are not properties of the attribute itself, and vice versa.
This is illustrated with two attributes that might be associated with an employee: ID and age (in years). Both of these attributes can be represented as integers. However, while it is reasonable to talk about the average age of an employee, it makes no sense to talk about the average employee ID. Indeed, the only aspect of employees that we want to capture with the ID attribute is that they are distinct. Consequently, the only valid operation for employee IDs is to test whether they are equal. There is no hint of this limitation, however, when integers are used to represent the employee ID attribute.
For the age attribute, the properties of the integers used to represent age are very much the properties of the attribute. Even so, the correspondence is not complete since, for example, ages have a maximum, while integers do not. As another example, consider Figure 2.1, which shows the measurement of the lengths of line segments on two different scales. Each successive line segment, going from the top to the bottom, is formed by appending the topmost line segment to itself. Thus, the second line segment from the top is formed by appending the topmost line segment to itself twice, the third line segment from the top is formed by appending the topmost line segment to itself three times, and so forth.
In a very real (physical) sense, all the line segments are multiples of the topmost one. This fact is captured by the measurements on the right-hand side of the figure, but not by those on the left-hand side. More specifically, the measurement scale on the left-hand side captures only the ordering of the length attribute, while the scale on the right-hand side captures both the ordering and additivity properties.
Thus, an attribute can be measured in a way that does not capture all the properties of the attribute. The type of an attribute should tell us what properties of the attribute are reflected in the values used to measure it.
Knowing the type of an attribute is important because it tells us which properties of the measured values are consistent with the underlying properties of the attribute, and therefore, it allows us to avoid foolish actions, such as computing the average employee ID.
Note that it is common to refer to the type of an attribute as the type of a measurement scale.
Figure 2.1. The measurement of the length of line segments on two different scales of measurement.
The Different Types of Attributes A useful and simple way to specify the type of an attribute is to identify the properties of numbers that correspond to underlying properties of the attribute.
For example, an attribute such as length has many of the properties of numbers. It makes sense to compare and order objects by length, as well as to talk about the differences and ratios of length. The following properties (operations) of numbers are typically used to describe attributes: distinctness (= and ≠), order (<, ≤, >, and ≥), addition (+ and −), and multiplication (× and /). Given these properties, four types of attributes can be defined: nominal, ordinal, interval, and ratio. Each attribute type possesses all of the properties and operations of the attribute types above it. Consequently, any property or operation that is valid for nominal, ordinal, and interval attributes is also valid for ratio attributes.
In other words, the definition of the attribute types is cumulative. However, this does not mean that the operations appropriate for one attribute type are appropriate for the attribute types above it. Table 2.2 summarizes the four attribute types, giving a description, examples, and valid operations for each. For instance, the values of a nominal attribute, such as zip codes or employee ID numbers, are just different names, and the valid statistical operations are those based only on distinctness, such as the mode, entropy, and contingency correlation. Nominal and ordinal attributes are collectively referred to as categorical or qualitative attributes.
As the name suggests, qualitative attributes, such as employee ID, lack most of the properties of numbers. Even if they are represented by numbers, i.e., integers, they should be treated more like symbols.
The remaining two types of attributes, interval and ratio, are collectively referred to as quantitative or numeric attributes. Quantitative attributes are represented by numbers and have most of the properties of numbers. Note that quantitative attributes can be integer-valued or continuous. The types of attributes can also be described in terms of transformations that do not change the meaning of an attribute.
Indeed, S. S. Stevens, the psychologist who originally defined the types of attributes shown in Table 2.2, defined them in terms of the transformations that preserve their meaning.
For example, the meaning of a length attribute is unchanged if it is measured in meters instead of feet; Table 2.3 lists the transformations that define the attribute levels. The statistical operations that make sense for a particular type of attribute are those that will yield the same results when the attribute is transformed using a transformation that preserves the attribute's meaning. To illustrate, the average length of a set of objects is different when measured in meters rather than in feet, but both averages represent the same length.
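As a small numerical illustration of this point, the following sketch (plain Python, with made-up length values) converts a set of lengths from meters to feet and checks that the mean computed in feet is just the mean computed in meters, rescaled, so both averages describe the same underlying length.

```python
# Illustrative sketch: a permissible transformation for a ratio attribute
# such as length is multiplication by a positive constant (meters -> feet).
# The mean is a meaningful statistic because it commutes with that
# transformation.

lengths_m = [1.2, 0.8, 2.5, 1.9]   # hypothetical lengths in meters
M_PER_FOOT = 0.3048                # definition of the foot in meters

lengths_ft = [x / M_PER_FOOT for x in lengths_m]

mean_m = sum(lengths_m) / len(lengths_m)
mean_ft = sum(lengths_ft) / len(lengths_ft)

# The two means differ numerically but represent the same physical length:
# mean_ft * 0.3048 == mean_m (up to floating-point rounding).
assert abs(mean_ft * M_PER_FOOT - mean_m) < 1e-9
print(mean_m, mean_ft)
```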
Temperature provides a good illustration of some of the concepts that have been described. First, temperature can be either an interval or a ratio attribute, depending on its measurement scale. When measured on the Kelvin scale, a temperature of 2 degrees is, in a physically meaningful way, twice that of a temperature of 1 degree. This is not true when temperature is measured on either the Celsius or Fahrenheit scales.
The problem is that the zero points of the Fahrenheit and Celsius scales are, in a physical sense, arbitrary, and therefore, the ratio of two Celsius or Fahrenheit temperatures is not physically meaningful.
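The sketch below (hypothetical temperatures, plain Python) makes the same point numerically: the ratio of two temperatures is stable only on the Kelvin scale, where the zero point is physically meaningful, while differences are meaningful on all three scales.

```python
# Two hypothetical temperatures expressed on three scales.
t1_c, t2_c = 10.0, 20.0                        # Celsius
t1_f, t2_f = t1_c * 9/5 + 32, t2_c * 9/5 + 32  # Fahrenheit
t1_k, t2_k = t1_c + 273.15, t2_c + 273.15      # Kelvin

# On interval scales (Celsius, Fahrenheit) the ratio depends on the
# arbitrary zero point, so it is not physically meaningful.
print(t2_c / t1_c)   # 2.0   -- "twice as hot" on the Celsius scale
print(t2_f / t1_f)   # ~1.36 -- the apparent ratio changes
print(t2_k / t1_k)   # ~1.03 -- the physically meaningful ratio

# Differences, by contrast, are meaningful on all three scales.
print(t2_c - t1_c, t2_f - t1_f, t2_k - t1_k)
```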
Describing Attributes by the Number of Values An independent way of distinguishing between attributes is by the number of values they can take. Discrete A discrete attribute has a finite or countably infinite set of values. Such attributes can be categorical, such as zip codes or ID numbers, or numeric, such as counts. Discrete attributes are often represented using integer variables.
Binary attributes are a special case of discrete attributes and assume only two values, e.g., true/false, yes/no, male/female, or 0/1. Binary attributes are often represented as Boolean variables, or as integer variables that only take the values 0 or 1.
Continuous A continuous attribute is one whose values are real numbers. Examples include attributes such as temperature, height, or weight. Continuous attributes are typically represented as floating-point variables. Practically, real values can only be measured and represented with limited precision. In theory, any of the measurement scale types (nominal, ordinal, interval, and ratio) could be combined with any of the types based on the number of values (binary, discrete, and continuous). However, some combinations occur only infrequently or do not make much sense.
For instance, it is difficult to think of a realistic data set that contains a continuous binary attribute. Typically, nominal and ordinal attributes are binary or discrete, while interval and ratio attributes are continuous. However, count attributes, which are discrete, are also ratio attributes.
Asymmetric Attributes For asymmetric attributes, only presence (a non-zero attribute value) is regarded as important. Consider a data set where each object is a student and each attribute records whether or not a student took a particular course at a university. For a specific student, an attribute has a value of 1 if the student took the course associated with that attribute and a value of 0 otherwise. Because students take only a small fraction of all available courses, most of the values in such a data set would be 0.
Therefore, it is more meaningful and more efficient to focus on the non-zero values. To illustrate, if students are compared on the basis of the courses they don't take, then most students would seem very similar, at least if the number of courses is large. Binary attributes where only non-zero values are important are called asymmetric binary attributes.
This type of attribute is particularly important for association analysis, which is discussed in Chapter 6. It is also possible to have discrete or continuous asymmetric features. For instance, if the number of credits associated with each course is recorded, then the resulting data set will consist of asymmetric discrete or continuous attributes.
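To see why ignoring the 0-0 matches matters, the following sketch compares two hypothetical student course vectors with a symmetric measure (simple matching) and with an asymmetric one (a Jaccard-style coefficient that ignores 0-0 agreements). The course vectors are made up for illustration.

```python
# Two hypothetical students, each represented by a binary vector over the
# same set of courses: 1 = took the course, 0 = did not.
a = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
b = [0, 0, 0, 1, 0, 0, 0, 1, 0, 0]

matches_11 = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
matches_00 = sum(1 for x, y in zip(a, b) if x == 0 and y == 0)
n = len(a)

# Simple matching counts the many 0-0 agreements, so almost any two
# students look similar when the number of courses is large.
simple_matching = (matches_11 + matches_00) / n

# The Jaccard-style coefficient ignores 0-0 matches, so it reflects only
# the courses that at least one of the students actually took.
jaccard = matches_11 / (n - matches_00) if n > matches_00 else 0.0

print(simple_matching)  # 0.8   -- dominated by courses neither student took
print(jaccard)          # 0.33  -- based only on courses actually taken
```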
Types of Data Sets There are many types of data sets; in this section, we describe some of the most common. For convenience, we have grouped the types of data sets into three groups: record data, graph-based data, and ordered data.
These categories do not cover all possibilities and other groupings are certainly possible. General Characteristics of Data Sets Before providing details of specific kinds of data sets, we discuss three characteristics that apply to many data sets and have a significant impact on the data mining techniques that are used: dimensionality, sparsity, and resolution. Dimensionality The dimensionality of a data set is the number of attributes that the objects in the data set possess.
Data with a small number of dimensions tends to be qualitatively different from moderate or high-dimensional data. Indeed, the difficulties associated with analyzing high-dimensional data are sometimes referred to as the curse of dimensionality. Because of this, an important motivation in preprocessing the data is dimensionality reduction.
These issues are discussed in more depth later in this chapter and in Appendix B. Sparsity For some data sets, such as those with asymmetric features, most attribute values of an object are zero; in many cases, fewer than 1% of the entries are non-zero. In practical terms, sparsity is an advantage because usually only the non-zero values need to be stored and manipulated. This results in significant savings with respect to computation time and storage. Resolution It is frequently possible to obtain data at different levels of resolution, and often the properties of the data are different at different resolutions.
For instance, the surface of the Earth seems very uneven at a resolution of a few meters, but is relatively smooth at a resolution of tens of kilometers. The patterns in the data also depend on the level of resolution. If the resolution is too fine, a pattern may not be visible or may be buried in noise; if the resolution is too coarse, the pattern may disappear. For example, variations in atmospheric pressure on a scale of hours reflect the movement of storms and other weather systems.
On a scale of months, such phenomena are not detectable. Record Data Much data mining work assumes that the data set is a collection of records (data objects), each of which consists of a fixed set of data fields (attributes); see Figure 2.2. For the most basic form of record data, there is no explicit relationship among records or data fields, and every record (object) has the same set of attributes.
Record data is usually stored either in flat files or in relational databases. Relational databases are certainly more than a collection of records, but data mining often does not use any of the additional information available in a relational database.
Rather, the database serves as a convenient place to find records. Different types of record data are described below and are illustrated in Figure 2.2. Transaction or Market Basket Data Transaction data is a special type of record data, where each record (transaction) involves a set of items. Consider a grocery store. The set of products purchased by a customer during one shopping trip constitutes a transaction, while the individual products that were purchased are the items.
This type of data is called market basket data because the items in each record are the products in a person's "market basket." Most often, the attributes are binary, indicating whether or not an item was purchased, but more generally, the attributes can be discrete or continuous, such as the number of items purchased or the amount spent on those items.
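A minimal sketch of this representation is shown below; the transactions and item names are hypothetical, and each transaction is turned into a row of binary attributes indicating whether the corresponding item was purchased.

```python
# Hypothetical market basket data: one set of items per transaction.
transactions = [
    {"bread", "soda", "milk"},
    {"beer", "bread"},
    {"beer", "soda", "diapers", "milk"},
]

# Fix an ordering of all items so each transaction maps to a binary row.
items = sorted(set().union(*transactions))

binary_rows = [[1 if item in t else 0 for item in items] for t in transactions]

print(items)
for row in binary_rows:
    print(row)
# Each row is a record with one binary (asymmetric) attribute per item.
```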
Figure 2.2 shows a sample transaction data set; each row represents the purchases of a particular customer at a particular time. The Data Matrix If the data objects in a collection of data all have the same fixed set of numeric attributes, then the data objects can be thought of as points (vectors) in a multidimensional space, where each dimension represents a distinct attribute describing the object. A set of such data objects can be interpreted as an m by n matrix, where there are m rows, one for each object, and n columns, one for each attribute.
Figure 2.2. Different variations of record data.
(A representation that has data objects as columns and attributes as rows is also fine.) This matrix is called a data matrix or a pattern matrix.
A data matrix is a variation of record data, but because it consists of numeric attributes, standard matrix operations can be applied to transform and manipulate the data. Therefore, the data matrix is the standard data format for most statistical data. The Sparse Data Matrix A sparse data matrix is a special case of a data matrix in which the attributes are of the same type and are asymmetric; i.e., only non-zero attribute values are important. Transaction data with only 0-1 entries is one example. Another common example is document data. In particular, if the order of the terms (words) in a document is ignored, then a document can be represented as a term vector, where each term is a component (attribute) of the vector and the value of each component is the number of times the corresponding term occurs in the document.
This representation of a collection of documents is often called a document-term matrix. The documents are the rows of this matrix, while the terms are the columns. In practice, only the non-zero entries of sparse data matrices are stored.
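The sketch below illustrates the idea of storing only the non-zero entries: each document is represented by a dictionary of the terms that actually occur in it, with absent terms implicitly zero. The documents are made up for illustration.

```python
from collections import Counter

# Hypothetical documents; word order is ignored ("bag of words").
docs = [
    "data mining finds patterns in data",
    "graph data can represent chemical compounds",
]

# Sparse representation: for each document, store only the terms that
# actually occur, together with their counts (the non-zero entries).
sparse_rows = [Counter(doc.split()) for doc in docs]

for i, row in enumerate(sparse_rows):
    print(i, dict(row))

# A term that does not appear in a document is simply absent from its
# dictionary; nothing is stored for it, and its implicit value is 0.
print(sparse_rows[0].get("graph", 0))  # 0
```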
Graph-Based Data A graph can sometimes be a convenient and powerful representation for data. We consider two specific cases: (1) the graph captures relationships among data objects, and (2) the data objects themselves are represented as graphs. Data with Relationships among Objects The relationships among objects frequently convey important information. In such cases, the data is often represented as a graph. In particular, the data objects are mapped to nodes of the graph, while the relationships among objects are captured by the links between objects and link properties, such as direction and weight.
Consider Web pages on the World Wide Web, which contain both text and links to other pages. In order to process search queries, Web search engines collect and process Web pages to extract their contents. It is well known, however, that the links to and from each page provide a great deal of information about the relevance of a Web page to a query, and thus, must also be taken into consideration. Data with Objects That Are Graphs If objects have structure, that is, the objects contain subobjects that have relationships, then such objects are frequently represented as graphs.
For example, the structure of chemical compounds can be represented by a graph, where the nodes are atoms and the links between nodes are chemical bonds. A graph representation makes it possible to determine which substructures occur frequently in a set of compounds and to ascertain whether the presence of any of these substructures is associated with the presence or absence of certain chemical properties, such as melting point or heat of formation.
Substructure mining, which is a branch of data mining that analyzes such data, is considered in Section 7.5.
Figure 2.3. Different variations of graph data.
Ordered Data For some types of data, the attributes have relationships that involve order in time or space. Different types of ordered data are described next and are shown in Figure 2.4.
Sequential Data Sequential data, also referred to as temporal data, can be thought of as an extension of record data, where each record has a time associated with it. Consider a retail transaction data set that also stores the time at which the transaction took place. This time information makes it possible to find patterns such as "candy sales peak before Halloween."
A time can also be associated with each attribute. For example, each record could be the purchase history of a customer, with a listing of items purchased at different times. Using this information, it is possible to find patterns such as "people who buy DVD players tend to buy DVDs in the period immediately following the purchase."
Figure 2.4. Different variations of ordered data.
In the top table, each row corresponds to the items purchased at a particular time by each customer. For instance, at time t3, customer C2 purchased items A and D.
In the bottom table, the same information is displayed, but each row corresponds to a particular customer. Each row contains information on each transaction involving the customer, where a transaction is considered to be a set of items and the time at which those items were purchased. For example, customer C3 bought items A and C at time t2.
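The two views can be converted into one another mechanically. The sketch below, using hypothetical customer IDs, times, and items, groups time-stamped transactions into per-customer purchase histories like those in the bottom table.

```python
from collections import defaultdict

# Hypothetical sequential (transaction) data: (customer, time, items).
transactions = [
    ("C1", "t1", {"A", "B"}),
    ("C2", "t2", {"A", "C"}),
    ("C2", "t3", {"A", "D"}),
    ("C3", "t2", {"A", "C"}),
]

# Build the per-customer view: a time-ordered list of (time, items) pairs.
history = defaultdict(list)
for customer, time, items in transactions:
    history[customer].append((time, items))

for customer in sorted(history):
    print(customer, sorted(history[customer], key=lambda entry: entry[0]))
# C2 -> [('t2', {'A', 'C'}), ('t3', {'A', 'D'})], and so on.
```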
Sequence Data Sequence data consists of a data set that is a sequence of individual entities, such as a sequence of words or letters. It is quite similar to sequential data, except that there are no time stamps; instead, there are positions in an ordered sequence. For example, the genetic information of plants and animals can be represented in the form of sequences of nucleotides that are known as genes. Many of the problems associated with genetic sequence data involve predicting similarities in the structure and function of genes from similarities in nucleotide sequences. Time Series Data Time series data is a special type of sequential data in which each record is a time series, i.e., a series of measurements taken over time.
For example, a financial data set might contain objects that are time series of the daily prices of various stocks; another example is a time series of the average monthly temperature of a city over a number of years. When working with temporal data, it is important to consider temporal autocorrelation; i.e., if two measurements are close in time, then the values of those measurements are often very similar.
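As a rough numerical illustration of temporal autocorrelation, the sketch below computes the lag-1 autocorrelation of a short, made-up monthly temperature series; values close to 1 indicate that measurements taken close together in time tend to be similar.

```python
# Hypothetical monthly average temperatures (degrees C) over two years.
temps = [-8, -5, 2, 9, 16, 21, 24, 22, 16, 9, 1, -6,
         -9, -4, 3, 10, 15, 22, 23, 21, 15, 8, 0, -7]

def lag1_autocorrelation(x):
    # Standard sample autocorrelation at lag 1: covariance of the series
    # with a one-step-shifted copy of itself, divided by its variance.
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    cov = sum((x[i] - mean) * (x[i + 1] - mean) for i in range(n - 1))
    return cov / var

# A value near +1 means neighboring months have similar temperatures.
print(round(lag1_autocorrelation(temps), 2))
```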
Spatial Data Some objects have spatial attributes, such as positions or areas, as well as other types of attributes. An example of spatial data is weather data (precipitation, temperature, pressure) that is collected for a variety of geographical locations. An important aspect of spatial data is spatial autocorrelation; i.e., objects that are physically close tend to be similar in other ways as well. Thus, two points on the Earth that are close to each other usually have similar values for temperature and rainfall. Important examples of spatial data are the science and engineering data sets that are the result of measurements or model output taken at regularly or irregularly distributed points on a two- or three-dimensional grid or mesh. For instance, Earth science data sets record the temperature or pressure measured at points (grid cells) on latitude-longitude spherical grids of various resolutions.
As another example, in the simulation of the flow of a gas, the speed and direction of flow can be recorded for each grid point in the simulation. Handling Non-Record Data Most data mining algorithms are designed for record data or its variations, such as transaction data and data matrices. Record-oriented techniques can be applied to non-record data by extracting features from data objects and using these features to create a record corresponding to each object. Consider the chemical structure data that was described earlier. Given a set of common substructures, each compound can be represented as a record with binary attributes that indicate whether a compound contains a specific substructure.
Such a representation is actually a transaction data set, where the transactions are the compounds and the items are the substructures. In some cases, it is easy to represent the data in a record format, but this type of representation does not capture all the information in the data. Consider spatio-temporal data consisting of a time series from each point on a spatial grid. This data is often stored in a data matrix, where each row represents a location and each column represents a particular point in time.
However, such a representation does not explicitly capture the time relationships that are present among attributes and the spatial relationships that exist among objects. This does not mean that such a representation is inappropriate, but rather that these relationships must be taken into consideration during the analysis. For example, it would not be a good idea to use a data mining technique that assumes the attributes are statistically independent of one another.
Data Quality Data mining applications are often applied to data that was collected for another purpose, or for future, but unspecified, applications. For that reason, data mining cannot usually take advantage of the significant benefits of "addressing data quality issues at the source."
Because preventing data quality problems is typically not an option, data mining focuses on (1) the detection and correction of data quality problems and (2) the use of algorithms that can tolerate poor data quality.
The first step, detection and correction, is often called data cleaning. The following sections discuss specific aspects of data quality. The focus is on measurement and data collection issues, although some application-related issues are also discussed. It is unrealistic to expect that data will be perfect. There may be problems due to human error, limitations of measuring devices, or flaws in the data collection process. Values or even entire data objects may be missing. In other cases, there may be spurious or duplicate objects, i.e., multiple data objects that all correspond to a single "real" object. For example, there might be two different records for a person who has recently lived at two different addresses.
Even if all the data is present and "looks fine," there may be inconsistencies: a person has a height of 2 meters, but weighs only 2 kilograms. In the next few sections, we focus on aspects of data quality that are related to data measurement and collection.
We begin with a definition of measurement and data collection errors and then consider a variety of problems that involve measurement error: noise, artifacts, bias, precision, and accuracy. We conclude by discussing data quality issues that may involve both measurement and data collection problems: outliers, missing and inconsistent values, and duplicate data.
Measurement and Data Collection Errors The term measurement error refers to any problem resulting from the measurement process. A common problem is that the value recorded differs from the true value to some extent. The term data collection error refers to errors such as omitting data objects or attribute values, or inappropriately including a data object.
For example, a study of animals of a certain species might include animals of a related species that are similar in appearance to the species of interest.
Both measurement errors and data collection errors can be either systematic or random. We will only consider general types of errors; within particular domains, certain types of data errors are commonplace, and well-developed techniques often exist for detecting and/or correcting them. For example, keyboard errors are common when data is entered manually, and as a result, many data entry programs have techniques for detecting and, with human intervention, correcting such errors. Noise and Artifacts Noise is the random component of a measurement error.
It may involve the distortion of a value or the addition of spurious objects. Figure 2.5 shows a time series before and after it has been disrupted by random noise; if a bit more noise were added, the shape of the series would be lost. Figure 2.6 shows a set of data points in a spatial context before and after some noise points have been added; notice that some of the noise points are intermixed with the non-noise points. The term noise is often used in connection with data that has a spatial or temporal component. In such cases, techniques from signal or image processing can frequently be used to reduce noise and thus, help to discover patterns (signals) that might be "lost in the noise."
Data can also contain distortions that result from a more deterministic phenomenon, such as a streak in the same place on a set of photographs. Such deterministic distortions of the data are often referred to as artifacts. Precision, Bias, and Accuracy In statistics and experimental science, the quality of the measurement process and the resulting data are measured by precision and bias.
We provide the standard definitions, followed by a brief discussion. Definition 2.3 (Precision). The closeness of repeated measurements (of the same quantity) to one another. Definition 2.4 (Bias). A systematic variation of measurements from the quantity being measured. Precision is often measured by the standard deviation of a set of values, while bias is measured by taking the difference between the mean of the set of values and the known value of the quantity being measured.
Bias can only be determined for objects whose measured quantity is known by means external to the current situation. Suppose that we have a standard laboratory weight with a mass of 1g and want to assess the precision and bias of our new laboratory scale.
We weigh the standard weight several times and record the resulting values. The bias is then the difference between the mean of these values and the true mass of 1g, while the precision, as measured by the standard deviation, reflects how much the repeated measurements vary from one another.
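A small sketch of how these two quantities would be computed for the laboratory-scale example; the five measurement values are hypothetical, standing in for repeated weighings of the 1g standard weight.

```python
import statistics

TRUE_MASS_G = 1.000  # known mass of the standard weight

# Hypothetical repeated measurements from the new scale (in grams).
measurements = [1.015, 0.990, 1.013, 1.001, 0.986]

mean = statistics.mean(measurements)
bias = mean - TRUE_MASS_G                    # systematic deviation
precision = statistics.stdev(measurements)   # spread of repeated values

print(f"mean = {mean:.3f} g")
print(f"bias = {bias:+.3f} g")
print(f"precision (std dev) = {precision:.3f} g")
```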
It is common to use the more general term, accuracy, to refer to the degree of measurement error in data. Definition 2.5 (Accuracy). The closeness of measurements to the true value of the quantity being measured. Accuracy depends on precision and bias, but since it is a general concept, there is no specific formula for accuracy in terms of these two quantities.
One important aspect of accuracy is the use of significant digits. The goal is to use only as many digits to represent the result of a measurement or calculation as are justified by the precision of the data. For example, if the length of an object is measured with a meter stick whose smallest markings are millimeters, then we should only record the length of the data to the nearest millimeter.
The precision of such a measurement would be plus or minus half a millimeter. We do not discuss significant digits further, since most readers will have encountered them in previous courses and they are covered in depth in science, engineering, and statistics textbooks. Issues such as significant digits, precision, bias, and accuracy are sometimes overlooked, but they are important for data mining as well as statistics and science. Many times, data sets do not come with information on the precision of the data, and furthermore, the programs used for analysis return results without any such information.
Nonetheless, without some understanding of the accuracy of the data and the results, an analyst runs the risk of committing serious data analysis blunders. Outliers Outliers are either (1) data objects that, in some sense, have characteristics that are different from most of the other data objects in the data set, or (2) values of an attribute that are unusual with respect to the typical values for that attribute.
Alternatively, we can speak of anomalous objects or values. There is considerable leeway in the definition of an outlier, and many different definitions have been proposed by the statistics and data mining communities. Furthermore, it is important to distinguish between the notions of noise and outliers. Outliers can be legitimate data objects or values. Thus, unlike noise, outliers may sometimes be of interest. In fraud and network intrusion detection, for example, the goal is to find unusual objects or events from among a large number of normal ones.
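As a simple illustration of flagging unusual attribute values, the sketch below marks values that lie more than two standard deviations from the mean; the data is made up, and this crude rule is only a stand-in for the more careful techniques discussed later.

```python
import statistics

# Hypothetical attribute values with one suspicious entry.
values = [2.1, 1.9, 2.4, 2.2, 1.8, 2.0, 2.3, 9.7, 2.2, 1.9]

mean = statistics.mean(values)
std = statistics.stdev(values)

# Flag values more than two standard deviations from the mean.  Whether a
# flagged value is noise, a data entry error, or a genuinely interesting
# outlier still has to be decided by the analyst.
outliers = [v for v in values if abs(v - mean) > 2 * std]
print(outliers)  # [9.7]
```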
Chapter 10 discusses anomaly detection in more detail. Missing Values It is not unusual for an object to be missing one or more attribute values. In some cases, the information was not collected; e.g., some people decline to give their age or weight. In other cases, some attributes are not applicable to all objects; e.g., forms often have conditional parts that are filled out only when a person answers a previous question in a certain way, but for simplicity all fields are stored. Regardless, missing values should be taken into account during the data analysis. There are several strategies (and variations on these strategies) for dealing with missing data, each of which may be appropriate in certain circumstances.
These strategies are listed next, along with an indication of their advantages and disadvantages. Eliminate Data Objects or Attributes A simple and effective strategy is to eliminate objects with missing values. However, even a partially specified data object contains some information, and if many objects have missing values, then a reliable analysis can be difficult or impossible. Nonetheless, if a data set has only a few objects that have missing values, then it may be expedient to omit them.
A related strategy is to eliminate attributes that have missing values. This should be done with caution, however, since the eliminated attributes may be the ones that are critical to the analysis.
Estimate Missing Values Sometimes missing data can be reliably estimated. For example, consider a time series that changes in a reasonably smooth fashion, but has a few, widely scattered missing values. In such cases, the missing values can be estimated (interpolated) by using the remaining values.
As another example, consider a data set that has many similar data points. In this situation, the attribute values of the points closest to the point with the missing value are often used to estimate the missing value. If the attribute is continuous, then the average attribute value of the nearest neighbors is used; if the attribute is categorical, then the most commonly occurring attribute value can be taken.
For a concrete illustration, consider precipitation measurements that are recorded by ground stations. For areas not containing a ground station, the precipitation can be estimated using values observed at nearby ground stations. Ignore the Missing Value during Analysis Many data mining approaches can be modified to ignore missing values. For example, suppose that objects are being clustered and the similarity between pairs of data objects needs to be calculated.
If one or both objects of a pair have missing values for some attributes, then the similarity can be calculated by using only the attributes that do not have missing values. It is true that the similarity will only be approximate, but unless the total number of attributes is small or the number of missing values is high, this degree of inaccuracy may not matter much. Likewise, many classification schemes can be modified to work with missing values.
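A minimal sketch of this idea: distance between two objects is computed only over the attribute positions that are present (not None) in both, and the result is rescaled to account for the attributes that were skipped. The objects and values are hypothetical.

```python
import math

# Two hypothetical objects; None marks a missing attribute value.
x = [1.0, None, 3.0, 4.0, 2.0]
y = [1.5, 2.0, None, 4.5, 2.5]

def euclidean_ignoring_missing(a, b):
    # Use only attribute positions where both objects have a value.
    pairs = [(u, v) for u, v in zip(a, b) if u is not None and v is not None]
    if not pairs:
        return None  # no common attributes to compare
    sq_dist = sum((u - v) ** 2 for u, v in pairs)
    # Rescale as if all attributes had been present, so distances computed
    # over different numbers of attributes stay roughly comparable.
    scaled = sq_dist * len(a) / len(pairs)
    return math.sqrt(scaled)

print(euclidean_ignoring_missing(x, y))
```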
Inconsistent Values Data can contain inconsistent values. Consider an address field, where both a zip code and city are listed, but the specified zip code area is not contained in that city.
It may be that the individual entering this information transposed two digits, or perhaps a digit was misread when the information was scanned from a handwritten form. Regardless of the cause of the inconsistent values, it is important to detect and, if possible, correct such problems.
Some types of inconsistencies are easy to detect. For instance, a person's height should not be negative. In other cases, it can be necessary to consult an external source of information.
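A sketch of how simple consistency rules like these might be expressed; the field names, records, and zip-code lookup table are all hypothetical.

```python
# Hypothetical records and a hypothetical zip-code -> city lookup table
# (the lookup table plays the role of an external source of information).
records = [
    {"name": "A. Smith", "height_m": 1.82,  "city": "Minneapolis", "zip": "55455"},
    {"name": "B. Jones", "height_m": -1.70, "city": "St. Paul",    "zip": "55455"},
]

zip_to_city = {"55455": "Minneapolis"}

def check_record(r):
    problems = []
    # A person's height should not be negative.
    if r["height_m"] is not None and r["height_m"] < 0:
        problems.append("negative height")
    # The zip code should belong to the listed city (external lookup).
    expected_city = zip_to_city.get(r["zip"])
    if expected_city is not None and expected_city != r["city"]:
        problems.append("zip code does not match city")
    return problems

for r in records:
    print(r["name"], check_record(r))
```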