Understanding data

Because data storage is an important component of the machine learning process, this section briefly considers the different types and forms of data encountered in it.

1. Unit of observation

By a unit of observation we mean the smallest entity with measured properties of interest for a study.

Examples

• A person, an object or a thing

• A time point

• A geographic region

• A measurement

Sometimes, units of observation are combined to form units such as person-years.
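As a minimal sketch of how such a combined unit arises, the snippet below totals person-years from individual follow-up durations. The follow-up values are invented for illustration and do not come from any study.

```python
# Hypothetical follow-up times, in years, for four study participants.
follow_up_years = [2.0, 3.5, 1.0, 4.5]

# Each person contributes one person-year for every year under observation,
# so the total is the sum of the individual follow-up times.
person_years = sum(follow_up_years)

print(f"{len(follow_up_years)} persons contributed {person_years} person-years")
```

Here the unit of observation is no longer a single person or a single year but the combination of the two.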

2. Examples and features

Datasets that store the units of observation and their properties can be imagined as collections of data consisting of the following:

· Examples

An “example” is an instance of the unit of observation for which properties have been recorded.

An “example” is also referred to as an “instance”, “case” or “record.” (Note that the word “example” is used here in a technical sense.)

· Features

A “feature” is a recorded property or characteristic of examples. It is also referred to as an “attribute” or “variable.”

Examples for “examples” and “features”

1. Cancer detection

Consider the problem of developing an algorithm for detecting cancer. In this study we note the following.

(a) The units of observation are the patients.

(b) The examples are members of a sample of cancer patients.

(c) The features may include the following attributes of the patients:

• gender

• age

• blood pressure

• the findings of the pathology report after a biopsy

2. Pet selection

Suppose we want to predict the type of pet a person will choose.

(a) The units of observation are the persons.

(b) The examples are members of a sample of persons who own pets.

Figure 1: Example for “examples” and “features” collected in a matrix format (data relates to automobiles and their features)

3. Spam e-mail

Let it be required to build a learning algorithm to identify spam e-mail.

(a) The unit of observation could be an e-mail message.

(b) The examples would be specific messages.

Examples and features are generally collected in a “matrix format”, with one row per example and one column per feature. Fig. 1 shows such a dataset.
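The matrix format can be sketched in plain Python as a list of rows. The feature names below are the ones mentioned for the automobile data; the values themselves are invented for illustration and are not taken from the figure, and the helper `column` is a hypothetical name.

```python
# Feature names from the automobile dataset; one column per feature.
features = ["year", "model", "price", "mileage", "color", "transmission"]

# Each inner list is one example (one car). All values are invented.
examples = [
    [2011, "SEL", 21000, 7400, "yellow", "automatic"],
    [2012, "SE", 19500, 10900, "gray", "automatic"],
    [2010, "SES", 17800, 25100, "silver", "manual"],
]


def column(name):
    """Return the values of one feature across all examples."""
    j = features.index(name)
    return [row[j] for row in examples]


print(len(examples), "examples x", len(features), "features")
print("price:", column("price"))
```

Reading across a row gives everything recorded about one example; reading down a column gives one feature for every example.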

3. Different forms of data

1. Numeric data

If a feature represents a characteristic measured in numbers, it is called a numeric feature.

2. Categorical or nominal data

A categorical feature is an attribute that can take on one of a limited, and usually fixed, number of possible values on the basis of some qualitative property. A categorical feature is also called a nominal feature.

3. Ordinal data

This denotes a nominal variable with categories falling in an ordered list. Examples include clothing sizes such as small, medium, and large, or a measurement of customer satisfaction on a scale from “not at all happy” to “very happy.”
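One common way to store an ordinal feature, sketched here under the assumption that the category order is known in advance, is to map each category to its rank. The names `SIZE_ORDER` and `size_rank` and the sample values are hypothetical.

```python
# Ordered categories from the clothing-size example, smallest first.
SIZE_ORDER = ["small", "medium", "large"]

# Map each label to its position in the ordering: small=0, medium=1, large=2.
size_rank = {label: rank for rank, label in enumerate(SIZE_ORDER)}

# Invented sample of observed sizes.
observed = ["large", "small", "medium", "small"]

# Encode each observation as its rank, and sort observations by that rank.
encoded = [size_rank[s] for s in observed]
in_order = sorted(observed, key=lambda s: size_rank[s])

print(encoded)
print(in_order)
```

The ranks preserve the ordering of the categories but say nothing about the distance between them: nothing here implies that "large" is twice "medium".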

Examples

In the data given in Fig. 1, the features “year”, “price” and “mileage” are numeric, and the features “model”, “color” and “transmission” are categorical.
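The same split can be sketched in code by inspecting the stored values. The function `feature_kind` is a hypothetical helper, the rule it applies (all values are numbers) is a simplifying assumption, and the column values are invented.

```python
from numbers import Number


def feature_kind(values):
    """Label a feature 'numeric' if every value is a number, else 'categorical'."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if all(isinstance(v, Number) and not isinstance(v, bool) for v in values):
        return "numeric"
    return "categorical"


# Invented values for the six automobile features.
columns = {
    "year": [2011, 2012, 2010],
    "price": [21000, 19500, 17800],
    "mileage": [7400, 10900, 25100],
    "model": ["SEL", "SE", "SES"],
    "color": ["yellow", "gray", "silver"],
    "transmission": ["automatic", "automatic", "manual"],
}

kinds = {name: feature_kind(vals) for name, vals in columns.items()}
for name, kind in kinds.items():
    print(f"{name}: {kind}")
```

A rule like this is only a first guess. A categorical feature stored as numeric codes, such as a postal code, would be labelled numeric, so the result should be checked against what the feature actually means.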
