Working with Classes


  1. What is a Class?
    A class is a unique value in the analyzed column. For example, if a column is of logical type and contains the values "True" and "False", it contains two classes.
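
    As a rough illustration of this definition (not EDM's own code), the classes of a column can be found by counting its distinct values. The function name find_classes and the sample column are hypothetical.

```python
from collections import Counter


def find_classes(column_values):
    """Return every distinct value (class) with the number of records holding it."""
    return Counter(column_values)


# Hypothetical logical column with the values True and False.
is_active = [True, False, True, True, False]
classes = find_classes(is_active)

print(len(classes))  # number of classes found in the column
print(dict(classes))  # records per class
```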
  2. What field should be used as the "Class field"?

    The best way to get valuable results from data mining is to analyze the field that holds the key information in a record. It can be of any type, but it should not be the ID field. Text fields with a high proportion of unique values will probably not give good results either. For example, a "Customer name" field, containing the names of a company's customers, will probably contain many unique values. If you set this field as the Class field, you will receive Rules equal to records: each rule is simply the best description of one customer. This does not apply to numeric fields, because their Classes are created manually and you can create any number of classes.
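
    One way to judge a candidate text field before using it as the Class field is to measure what share of its values is unique. The sketch below is only an illustration under that assumption; the function unique_ratio and the sample data are hypothetical and are not part of EDM.

```python
def unique_ratio(column_values):
    """Fraction of records that hold a distinct value (1.0 means every value is unique)."""
    if not column_values:
        return 0.0
    return len(set(column_values)) / len(column_values)


# Hypothetical sample columns.
customer_names = ["Ann Lee", "Bob Ray", "Cy Moss", "Dee Fox"]
regions = ["North", "South", "North", "North"]

name_ratio = unique_ratio(customer_names)
region_ratio = unique_ratio(regions)
print(name_ratio, region_ratio)
```

    A ratio close to 1.0, as for the customer names, suggests the field would yield roughly one class per record; a lower ratio, as for the regions, suggests a more useful Class field.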
     

  3. How do I create numeric classes?

    Creating a numeric class means creating the intervals you want to analyze. To decide which intervals to analyze, use the controls on the "Initial Query" page and the "Query Forming" chart on the "Statistics" page. The chart helps you see how many values fall into equal intervals. Use the "View Table" button to view the values found in the table selected for analysis. You can also see the minimum and maximum values found in the Class field above the input fields. The intervals you create must not intersect. For example, the intervals "1..10" and "11..20" do not intersect, while "1..10" and "6..20" do. If you create intersecting intervals, you will be asked to change the entered value.
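
    The non-intersection rule can be sketched as follows. This is not EDM's implementation; it assumes that both interval bounds are inclusive, as the "1..10" and "11..20" example suggests, and the function names are hypothetical.

```python
from itertools import combinations


def intervals_intersect(a, b):
    """True if the inclusive intervals a=(low, high) and b=(low, high) share a value."""
    return a[0] <= b[1] and b[0] <= a[1]


def find_intersections(intervals):
    """Return every pair of intervals that intersect."""
    return [(a, b) for a, b in combinations(intervals, 2) if intervals_intersect(a, b)]


valid = find_intersections([(1, 10), (11, 20)])
invalid = find_intersections([(1, 10), (6, 20)])
print(valid)    # no intersecting pairs
print(invalid)  # the pair that would have to be changed
```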

  4. How many classes are recommended for creating rules and decision trees?

    The more classes are detected in the analyzed column, the more precise the results of the statistics query will be. But the number should not be too large: for example, several hundred classes used for rule creation will take much longer to analyze than 20 classes. So the number of classes should be reasonable from the point of view of both statistics and performance.

  5. Why are all classes analyzed when I select only one class for the Statistics Query?
    All classes have to be analyzed during the Statistics query, because EDM has to detect the differences between them. After performing the statistics query, you can select the values you want to analyze further.
  6. Why does EDM detect empty classes or values in statistics?
    For better results, it is best to use "clean" databases. Empty records are detected automatically in the database and, although they are not filled in, they are analyzed as well. To avoid this, you can deselect empty values, if any are detected in the analyzed column, before creating rules and decision trees.
  7. Why do I get too many classes?
    If you have too many classes, you have probably used a text field as the Class field; for numeric fields you are asked to create the classes yourself, and you can create as many as you wish. If a text field has too many classes, you can split them into groups and analyze them step by step. If the column contains numeric data but is detected as a text column, it contains some incorrect records that hold text instead of numbers.
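
    To find the records that cause a numeric column to be detected as text, the values can be checked outside EDM before loading the data. The sketch below is a hypothetical helper, not an EDM feature; it assumes that a value counts as numeric if Python's float() can parse it.

```python
def non_numeric_values(column_values):
    """Return (row index, value) for each value that cannot be parsed as a number."""
    bad = []
    for row, value in enumerate(column_values):
        try:
            float(value)
        except (TypeError, ValueError):
            bad.append((row, value))
    return bad


# Hypothetical column exported as text; two records are not numbers.
amounts = ["12", "7.5", "n/a", "30", ""]
bad_rows = non_numeric_values(amounts)
print(bad_rows)
```

    Correcting or removing the reported records in the source database should let the column be treated as numeric.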