ESTARD Data Miner FAQ
Here you can find answers to the following questions:
- What is a "Class"?
- What field should be used as the "Class field"?
- How do I create numeric classes?
- Why do I get too many classes after selecting the field to
- Why are all classes analyzed when I select only one class for the Statistics Query?
- Why does EDM detect empty classes or values in statistics?
- What number of classes is recommended for rules and decision trees creation?
- What is the difference between the Learning and Analyzed databases?
- Why is an ID field necessary for analyzing two or more databases together?
- How are values in statistics divided into groups?
- What is the difference between Rules and Decision Trees?
- How do I use the settings for Rules and Decision Trees creation?
- Why are no Rules created after I press the "Create Rules" button?
- What algorithm is used in ESTARD Data Miner?
If you have further questions, please don't hesitate to ask -
What is a "Class"?
A class is a unique value from the analyzed column. For example, if a column
is of logical type and contains the values "True" and "False", it contains two
classes.
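As a rough illustration of this definition (not EDM's own code), the classes of a column can be found by collecting its distinct values. The column contents and the function name find_classes are made up for the example.

```python
# Hypothetical column of a logical field; the values are invented for illustration.
column = ["True", "False", "True", "True", "False"]


def find_classes(values):
    """Return the distinct values of a column, in order of first appearance."""
    seen = []
    for value in values:
        if value not in seen:
            seen.append(value)
    return seen


classes = find_classes(column)
print(classes)       # the distinct values found in the column
print(len(classes))  # the number of classes
```

A text field such as "Customer name" would return nearly as many classes as there are records, which is why such fields make poor Class fields.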
What field should be used as the "Class field"?
To get valuable results from data mining, analyze the field that holds the key
information in each record. It can be of any type, but it must not be the ID
field. Text fields with a high share of unique values will also probably not
give good results. For example, a "Customer name" field containing the names
of a company's customers will probably contain many unique values. If you set
this field as the Class field, you will get Rules that simply repeat the
records, each describing a single customer. This does not apply to numeric
fields, because their Classes are created manually and you can create any
number of them.
How do I create numeric classes?
Creating a numeric class means defining an interval you want to analyze. To
decide which intervals to use, click the "View Field Values" button to see how
many values fall into equal intervals. Click the "View Table" button to view
the values in the table selected for analysis. The minimum and maximum values
found in the Class field are shown above the input fields. The intervals you
create must not intersect. For example, the intervals "1..10" and "11..20" do
not intersect, while "1..10" and "6..20" do. If you create intersecting
intervals, you will be asked to change the entered value.
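The non-intersection rule can be sketched as a small check. This is only an illustration, assuming each interval is a pair of inclusive bounds (low, high); how EDM validates the input internally is not documented here, and the function names are hypothetical.

```python
def intersects(a, b):
    """Two inclusive intervals (low, high) intersect if each starts before the other ends."""
    return a[0] <= b[1] and b[0] <= a[1]


def find_intersections(intervals):
    """Return every pair of intervals in the list that intersect."""
    pairs = []
    for i in range(len(intervals)):
        for j in range(i + 1, len(intervals)):
            if intersects(intervals[i], intervals[j]):
                pairs.append((intervals[i], intervals[j]))
    return pairs


print(find_intersections([(1, 10), (11, 20)]))  # the intervals "1..10" and "11..20"
print(find_intersections([(1, 10), (6, 20)]))   # the intervals "1..10" and "6..20"
```

The first call finds no intersecting pairs, while the second reports the pair that would have to be corrected.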
Why do I get too many classes?
If you have too many classes, you have probably used a text field as the Class
field: for numeric fields you are asked to create the classes yourself, so you
control how many there are. If a text field has too many classes, you can
split them into groups and analyze them step by step. If the column contains
numeric data but is detected as a text column, it contains some incorrect
records that hold text instead of numbers.
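If a numeric column is detected as text, it can help to locate the offending records before loading the data. The sketch below is one way to do that outside EDM, assuming the column has been exported as a list of strings; the sample values and the function name are hypothetical.

```python
def non_numeric_values(values):
    """Return (position, value) for every entry that cannot be parsed as a number."""
    bad = []
    for position, value in enumerate(values):
        try:
            float(value)
        except ValueError:
            bad.append((position, value))
    return bad


# Hypothetical exported column: mostly numbers, with two text entries.
column = ["12", "7.5", "n/a", "40", "unknown"]
print(non_numeric_values(column))
```

Once the listed entries are corrected or removed, the column contains only numbers.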
Why are all classes analyzed when I select only one class for the Statistics Query?
All classes have to be analyzed during the Statistics query, because EDM has
to detect the differences between them. After performing the statistics query,
you can select the values you want to analyze further.
Why does EDM detect empty classes or values in statistics?
For better results, use "clean" databases. Empty records are automatically
detected in the database and, although they are not filled in, they are
analyzed as well. To avoid this, deselect the empty values, if any are detected
in the analyzed column, before creating rules and trees.
What number of classes is recommended for rules and decision trees creation?
The more classes are detected in the analyzed column, the more precise the
results of the statistics query will be. However, their number should not be
too large: for example, several hundred classes used for rules creation take
much longer to analyze than 20 classes. So the number of
classes should be reasonable from the point of view of statistics and
What is the difference between the Learning and Analyzed databases?
The Learning database is used for creating statistics, if-then rules and
decision trees. The Analyzed database is not used for these operations. It is
used for selecting records from the dataset with the use of rules or trees.
This additional feature allows you to easily apply the results of data mining
to some other database.
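The idea of applying a result to another database can be pictured as follows. This is a simplified sketch, not EDM's implementation: the rule format, field names and records are all hypothetical.

```python
# Hypothetical rule obtained from the Learning database:
# "if region is North and age_group is 30-40 then buys is Yes".
rule = {"conditions": {"region": "North", "age_group": "30-40"}, "class": "Yes"}

# Hypothetical Analyzed database: records that were not used to create the rule.
analyzed = [
    {"id": 101, "region": "North", "age_group": "30-40"},
    {"id": 102, "region": "South", "age_group": "30-40"},
    {"id": 103, "region": "North", "age_group": "20-30"},
]


def select_records(records, rule):
    """Return the records that satisfy every condition of the rule."""
    conditions = rule["conditions"]
    return [
        record for record in records
        if all(record.get(field) == value for field, value in conditions.items())
    ]


selected = select_records(analyzed, rule)
print([record["id"] for record in selected])  # IDs of the records the rule selects
```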
Why is an ID field necessary for analyzing two or more databases together?
The ID field is not necessary if you want to analyze only one table. But it is
vital for determining which record in one table corresponds to which record in
another table. Without an ID field, such a relation cannot be established.
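The role of the ID field can be illustrated with two small tables matched on a shared key. The tables, field names and the function join_by_id are hypothetical, and the sketch only shows the general idea, not how EDM relates tables internally.

```python
# Two hypothetical tables that share an "id" field.
customers = [
    {"id": 1, "name": "Alice"},
    {"id": 2, "name": "Bob"},
    {"id": 3, "name": "Carol"},
]
orders = [
    {"id": 1, "total": 250},
    {"id": 3, "total": 90},
]


def join_by_id(left, right, key="id"):
    """Combine each left record with the right record that has the same ID."""
    right_by_id = {record[key]: record for record in right}
    joined = []
    for record in left:
        match = right_by_id.get(record[key])
        if match is not None:
            joined.append({**record, **match})
    return joined


print(join_by_id(customers, orders))
```

Without the shared "id" values there would be no way to tell which order belongs to which customer.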
How are values in statistics divided into groups?
The values are grouped automatically, in such a way that the groups best
reveal the differences between classes.
What is the difference between Rules and Decision Trees?
These two data mining methods let you look at the same problem from different
points of view. Rules can be represented in the form of a tree, and a decision
tree in the form of rules, so the two methods differ not in how they are
represented but in how they are obtained. Used together, they form a good
combination of models for understanding relations in data.
How do I use the settings for Rules and Decision Trees creation?
It is better to start with high values for the "Probability" and "Rule cases"
settings. If few rules are obtained, or none at all, lower these values and
create the rules again.
Why are no Rules created after I press the "Create Rules" button?
This might happen if the "Probability" or "Rule cases" values ("Query
options" dialog) are too high for the dataset you are working on. For
example, the number of records for some class might be lower than "Rule
cases", or the analyzed data might contain too many unique values, so that the
probability of the rules is lower than the one in the settings.
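The effect of the two thresholds can be sketched with a toy example. It assumes that "Probability" is the share of records matching a rule's condition that belong to the class, and that "Rule cases" is the minimum number of records supporting the rule. These meanings are assumptions made for illustration, and the data and function names are hypothetical.

```python
# Hypothetical learning data.
records = [
    {"region": "North", "buys": "Yes"},
    {"region": "North", "buys": "Yes"},
    {"region": "North", "buys": "No"},
    {"region": "South", "buys": "No"},
]


def evaluate_rule(records, field, value, class_field, class_value):
    """Return (cases, probability) for the rule
    'if field == value then class_field == class_value'."""
    matching = [r for r in records if r[field] == value]
    cases = sum(1 for r in matching if r[class_field] == class_value)
    probability = cases / len(matching) if matching else 0.0
    return cases, probability


def passes(records, rule, min_probability, min_cases):
    """Check a rule against the assumed 'Probability' and 'Rule cases' thresholds."""
    cases, probability = evaluate_rule(records, *rule)
    return probability >= min_probability and cases >= min_cases


rule = ("region", "North", "buys", "Yes")
print(evaluate_rule(records, *rule))                            # cases and probability
print(passes(records, rule, min_probability=0.9, min_cases=3))  # high thresholds
print(passes(records, rule, min_probability=0.6, min_cases=2))  # lowered thresholds
```

With the high thresholds the rule is rejected. After lowering them it is kept, which mirrors the advice to reduce the values and create the rules again.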
What algorithm is used for rules creation?
The algorithm ESTARD Data Miner uses to create rules was developed specially
by our analysts. It can obtain ALL if-then rules, of ANY length, so the length
of the rules depends only on the number of fields you select for rules
creation.