IBM Switch 15 User Manual, Page 109

Chapter 6
Handling Missing Values
Overview of Missing Values
During the Data Preparation phase of data mining, you will often want to replace missing values
in the data. Missing values are values in the data set that are unknown, uncollected, or incorrectly
entered. Usually, such values are invalid for their fields. For example, the field Sex should contain
the values M and F. If you discover the values Y or Z in the field, you can safely assume that such
values are invalid and should therefore be interpreted as blanks. Likewise, a negative value for the
field Age is meaningless and should also be interpreted as a blank. Frequently, such obviously
wrong values are purposely entered, or fields left blank, during a questionnaire to indicate a
nonresponse. At times, you may want to examine these blanks more closely to determine whether
a nonresponse, such as the refusal to give one’s age, is a factor in predicting a specific outcome.
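The Sex and Age examples above can be sketched in a few lines of Python. This is only an illustration of the idea, not SPSS Modeler code: the helper names `clean_sex` and `clean_age`, the use of `None` to stand for a blank, and the sample records are all assumptions made for this sketch.

```python
from typing import Optional

# Valid codes for the Sex field, as named in the text.
VALID_SEX = {"M", "F"}


def clean_sex(value: Optional[str]) -> Optional[str]:
    """Return the value if it is a valid code; otherwise None (a blank)."""
    return value if value in VALID_SEX else None


def clean_age(value: Optional[float]) -> Optional[float]:
    """Return the age if it is present and non-negative; otherwise None."""
    if value is None or value < 0:
        return None
    return value


# Hypothetical sample records, invented for illustration only.
records = [
    {"Sex": "M", "Age": 34},
    {"Sex": "Y", "Age": 51},   # Y is not a valid code
    {"Sex": "F", "Age": -1},   # a negative age is meaningless
]

cleaned = [
    {"Sex": clean_sex(r["Sex"]), "Age": clean_age(r["Age"])}
    for r in records
]
```

Keeping the blank as an explicit `None` rather than dropping the record leaves room to examine nonresponses later, for example by counting how often Age is blank for each outcome.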
Some modeling techniques handle missing data better than others. For example, C5.0 and
Apriori cope well with values that are explicitly declared as “missing” in a Type node. Other
modeling techniques have trouble dealing with missing values and experience longer training
times, resulting in less-accurate models.
There are several types of missing values recognized by IBM® SPSS® Modeler:
Null or system-missing values.
These are nonstring values that have been left blank in the
database or source file and have not been specifically defined as “missing” in a source or
Type node. System-missing values are displayed as $null$. Note that empty strings are not
considered nulls in SPSS Modeler, although they may be treated as nulls by certain databases.
Empty strings and white space.
Empty string values and white space (strings with no visible
characters) are treated as distinct from null values. Empty strings are treated as equivalent to
white space for most purposes. For example, if you select the option to treat white space as
blanks in a source or Type node, this setting applies to empty strings as well.
Blank or user-defined missing values.
These are values such as unknown, 99, or –1 that are
explicitly defined in a source node or Type node as missing. Optionally, you can also choose
to treat nulls and white space as blanks, which allows them to be flagged for special treatment
and to be excluded from most calculations. For example, you can use the @BLANK function to
treat these values, along with other types of missing values, as blanks.
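The distinction between the three kinds of missing values can be sketched as follows. This is a Python illustration under stated assumptions, not the behavior of SPSS Modeler itself: `None` stands in for a null, and `MissingKind`, `classify`, `is_blank`, and their parameters are hypothetical names introduced here. In particular, `is_blank` is only a rough analogue of the idea behind @BLANK, not its implementation.

```python
from enum import Enum
from typing import Any, Sequence


class MissingKind(Enum):
    NOT_MISSING = "not missing"
    NULL = "null or system-missing"
    WHITE_SPACE = "empty string or white space"
    USER_DEFINED = "blank or user-defined missing"


def classify(value: Any, user_blanks: Sequence[Any] = ()) -> MissingKind:
    """Assign a value to one of the categories described above."""
    if value is None:                       # assumption: None models a null
        return MissingKind.NULL
    if isinstance(value, str) and value.strip() == "":
        # Empty strings are treated like white space, distinct from nulls.
        return MissingKind.WHITE_SPACE
    if value in user_blanks:                # values explicitly declared missing
        return MissingKind.USER_DEFINED
    return MissingKind.NOT_MISSING


def is_blank(
    value: Any,
    user_blanks: Sequence[Any] = (),
    nulls_as_blanks: bool = True,
    white_space_as_blanks: bool = True,
) -> bool:
    """Return True if the value counts as a blank under the chosen options."""
    kind = classify(value, user_blanks)
    if kind is MissingKind.USER_DEFINED:
        return True
    if kind is MissingKind.NULL:
        return nulls_as_blanks
    if kind is MissingKind.WHITE_SPACE:
        return white_space_as_blanks
    return False


# The user-defined blanks named in the text.
USER_BLANKS = ("unknown", 99, -1)
```

With these definitions, `is_blank(99, USER_BLANKS)` is true, and `is_blank("", USER_BLANKS, white_space_as_blanks=True)` is also true because the white-space option covers empty strings as well. Passing `nulls_as_blanks=False` leaves nulls unflagged.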
© Copyright IBM Corporation 1994, 2012.