Working with Classes
What is a Class?
A class is a unique value from the analyzed column. For example, if a column is of logical type and contains the values "True" and "False", it contains two classes.
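To make the definition concrete, here is a minimal Python sketch, independent of EDM, that treats every distinct value in a column as one class and counts how many records belong to each. The function name `find_classes` and the sample data are hypothetical.

```python
from collections import Counter

def find_classes(column):
    """Return each unique value (class) in the column with its record count."""
    # Every distinct value is one class, so a Counter maps class -> number of records.
    return Counter(column)

# A logical column with two distinct values yields two classes.
is_active = [True, False, True, True, False]
classes = find_classes(is_active)
print(len(classes))    # 2
print(classes[True])   # 3
```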
What field should be used as the "Class field"?
The best way to obtain valuable results from data mining is to analyze the field that contains the key information in each record. The Class field can be of any type, but it should not be the ID field. Also, choosing a Text field with a high proportion of unique values will probably not return good results. For example, a "Customer name" field containing the names of a company's customers will probably contain many unique values. If you set this field as the Class field, you will receive as many Rules as there are records, each being the best description of a single customer. This does not apply to numeric fields, because for them Classes are created manually, and you can create any number of classes.
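One rough way to spot a field like "Customer name" before using it as the Class field is to compare the number of distinct values with the number of records. The Python sketch below does this; the names `unique_ratio` and `is_reasonable_class_field` and the cut-off `MAX_UNIQUE_RATIO = 0.5` are hypothetical choices for illustration, not values taken from EDM.

```python
def unique_ratio(column):
    """Ratio of distinct values to records; 1.0 means every record is its own class."""
    if not column:
        return 0.0
    return len(set(column)) / len(column)

# Hypothetical cut-off chosen for illustration only; EDM does not document one.
MAX_UNIQUE_RATIO = 0.5

def is_reasonable_class_field(column):
    """Flag text columns whose values are mostly unique as poor Class fields."""
    return unique_ratio(column) <= MAX_UNIQUE_RATIO

customer_names = ["Ann Lee", "Bob Ray", "Cy Park", "Di Moss"]
region = ["North", "South", "North", "North"]
print(unique_ratio(customer_names))               # 1.0
print(is_reasonable_class_field(customer_names))  # False
print(is_reasonable_class_field(region))          # True
```

A ratio near 1.0 corresponds to the situation described above, where the number of Rules approaches the number of records.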
How do I create numeric classes?
Creating a numeric class means creating the intervals you want to analyze. To decide which intervals to analyze, use the controls on the "Initial Query" page and the "Query Forming" chart on the "Statistics" page. The chart shows how many values fall into each of a set of equal intervals. Use the "View Table" button to view the values in the table selected for analysis. The minimum and maximum values found in the Class field are also shown above the input fields. The intervals you create must not intersect. For example, the intervals "1..10" and "11..20" do not intersect, while "1..10" and "6..20" do. If you create intersecting intervals, you will be asked to change the value you entered.
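The Python sketch below illustrates the three ideas in this answer under stated assumptions: counting values in equal-width intervals (similar in spirit to the "Query Forming" chart), checking that user-defined intervals do not intersect, and assigning a value to its numeric class. All function names are hypothetical, intervals are assumed to have inclusive bounds with low <= high (so "1..10" is `(1, 10)`), and this is not how EDM is known to compute its chart.

```python
from typing import List, Optional, Sequence, Tuple

# Inclusive bounds, so (1, 10) stands for the interval "1..10". Assumes low <= high.
Interval = Tuple[float, float]

def equal_interval_counts(values: Sequence[float], bins: int) -> List[int]:
    """Count how many values fall into each of `bins` equal-width intervals between min and max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # guard against a column with a single distinct value
    counts = [0] * bins
    for v in values:
        # The maximum value may map to index `bins`, so clamp it into the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts

def find_intersection(intervals: List[Interval]) -> Optional[Tuple[Interval, Interval]]:
    """Return the first pair of intersecting intervals, or None if all are disjoint."""
    ordered = sorted(intervals)
    for prev, cur in zip(ordered, ordered[1:]):
        # After sorting by lower bound, any overlap shows up between neighbours:
        # the next interval starts at or before the previous one ends.
        if cur[0] <= prev[1]:
            return prev, cur
    return None

def assign_class(value: float, intervals: List[Interval]) -> Optional[Interval]:
    """Return the interval (numeric class) containing the value, or None if none does."""
    for low, high in intervals:
        if low <= value <= high:
            return (low, high)
    return None

print(equal_interval_counts(list(range(1, 11)), 3))  # [3, 3, 4]
print(find_intersection([(1, 10), (11, 20)]))         # None
print(find_intersection([(1, 10), (6, 20)]))          # ((1, 10), (6, 20))
print(assign_class(7, [(1, 10), (11, 20)]))           # (1, 10)
```

With inclusive integer bounds, "1..10" and "11..20" leave a gap for non-integer data: a value such as 10.5 belongs to neither interval, and `assign_class` returns `None` for it.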
What number of classes is recommended for rules and decision trees?
The more classes are detected in the analyzed column, the more precise the results obtained during the Statistics Query. But their number should not be too large: for example, several hundred classes take much longer to analyze during rules creation than 20 classes do. So the number of classes should be a reasonable trade-off between statistical precision and analysis time.
Why are all classes analyzed when I select only one class for the Statistics Query?
All classes have to be analyzed during the Statistics Query, because ESTARD Data Miner (EDM) has to detect the differences between them. After performing the Statistics Query, you can select the values you want to analyze further.
Why does EDM detect empty classes or values in statistics?
For better results, it is best to use "clean" databases. Empty records in the database are detected automatically, and even though they are not filled, they are still analyzed. To avoid this, you can deselect empty values, if any are detected in the analyzed column, before creating rules and decision trees.
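If you prepare data outside EDM, a similar clean-up can be done before loading it. The Python sketch below removes empty values from a column and reports how many were dropped. What counts as "empty" (here `None` and blank or whitespace-only strings) is an assumption, and the names `is_empty` and `drop_empty` are hypothetical.

```python
def is_empty(value):
    """Treat None and blank or whitespace-only strings as empty (an assumed definition)."""
    return value is None or (isinstance(value, str) and value.strip() == "")

def drop_empty(column):
    """Return the non-empty values and the number of empty records removed."""
    kept = [v for v in column if not is_empty(v)]
    return kept, len(column) - len(kept)

city = ["Riga", "", "Oslo", None, "Riga", "   "]
values, removed = drop_empty(city)
print(sorted(set(values)))  # ['Oslo', 'Riga']
print(removed)              # 3
```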
Why do I get too many classes?
If you get too many classes, you have probably used a text field as the Class field, because for numeric fields you are asked to create the classes yourself and can create as many as you wish. If a text field produces too many classes, you can split them into groups and analyze them step by step. If the column contains numeric data but is detected as a text column, it contains some incorrect records that hold text instead of numbers.
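To find the incorrect records that make a numeric column look like text, and to split a long list of text classes into groups, you could use something like the Python sketch below. It assumes numbers use a decimal point (so "12,5" is reported as non-numeric); the names `non_numeric_records` and `split_into_groups` and the alphabetical grouping are hypothetical choices, not EDM features.

```python
def non_numeric_records(column):
    """Return (row index, value) pairs that cannot be parsed as numbers."""
    bad = []
    for i, value in enumerate(column):
        try:
            float(value)
        except (TypeError, ValueError):
            bad.append((i, value))
    return bad

def split_into_groups(classes, group_size):
    """Sort the classes and cut them into groups of at most `group_size` for step-by-step analysis."""
    ordered = sorted(classes)
    return [ordered[i:i + group_size] for i in range(0, len(ordered), group_size)]

price = ["12.5", "8", "n/a", "15", "ten"]
print(non_numeric_records(price))                           # [(2, 'n/a'), (4, 'ten')]
print(split_into_groups({"E", "A", "C", "B", "D"}, 2))      # [['A', 'B'], ['C', 'D'], ['E']]
```

Fixing or removing the reported records should let the column be treated as numeric again.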