Class Reference
%iKnow.Classification.Optimizer
This class automates selecting "appropriate" terms for a
See the individual property descriptions for their impact on the optimization process.
The number of terms to add during an AddTerms cycle. The top results according to RankScores will be added, selected from the AddWindowSize terms tested in the cycle.
The number of terms to test in each round. If left at 0, this defaults to the number of cores the system has available, which should be most efficient.
The builder object to be optimized.
If ScoreMetric is set to a 'Weighted*' value, the weights for each category are retrieved from this array, indexed by category name. If no category weight is set, it is assumed to be 0. Note: Weights don't need to add up to 1.
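As an illustration of how category weights that do not sum to 1 can still produce a meaningful score, the sketch below normalizes by the total weight. This is an assumption made for the example, not a statement of how the Optimizer combines scores internally; the function name and the sample categories are hypothetical.

```python
def weighted_score(per_category_scores, category_weights):
    """Weighted average of per-category scores.

    Categories without an entry in category_weights get weight 0,
    mirroring the "assumed to be 0" rule described above.
    Dividing by the total weight is an assumption of this sketch.
    """
    total_weight = 0.0
    weighted_sum = 0.0
    for category, score in per_category_scores.items():
        weight = category_weights.get(category, 0.0)
        total_weight += weight
        weighted_sum += weight * score
    if total_weight == 0:
        return 0.0  # no weighted category contributes anything
    return weighted_sum / total_weight


# Hypothetical per-category scores and weights (weights sum to 4, not 1).
scores = {"sports": 0.9, "politics": 0.6, "weather": 0.3}
weights = {"sports": 3, "politics": 1}
result = weighted_score(scores, weights)
```

Here "weather" has no weight, so it does not influence the result at all.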
The class name of the current "best" classifier. This value is set during Optimize, or as part of the AddTerms and RemoveTerms methods.
The score of the current classifier. This value is updated by AddTerms and RemoveTerms.
The key to %DeepSee.PMML.Utils.TempResult for the test results of CurrentClassifier.
The domain with which the categorization model is being trained and tested. If not set explicitly, this assumes the value of the Builder's DomainId property when registering an IKnowBuilder instance as Builder.
The maximal decrease in performance the optimizer should accept when trying to remove terms. If removing a term would imply a decrease larger than this figure, it will not be removed. A value of 1 means the maximal score decrease is 1%.
The metadata field containing the actual category values to compare predictions against. If not set explicitly, this assumes the value of the Builder's MetadataField property when registering an IKnowBuilder instance as Builder.
The minimal score increase (in %) a term should yield to be retained for further testing. If the score does not increase by at least this figure, the term will be discarded from the list of terms to test. A value of 1 means the minimal score increase should be 1%.
The number of terms to remove in a "remove" cycle. Setting this value > 1 assumes the terms deemed irrelevant (and scheduled to be removed) don't influence one another much and removing more in a single cycle will not worsen performance much more than the individual performance changes of each term removal alone.
The ratio of RemoveTerms cycles vs AddTerms cycles. This should be a value between 0 and 1 (inclusive). Note: Remove cycles take significantly longer than add cycles.
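One way to read this ratio is "one remove cycle every 1/ratio rounds". The sketch below shows that scheduling rule in isolation. The function name, the rounding of the interval, and the treatment of 0 as "never remove" are assumptions of this example rather than documented behavior.

```python
def is_remove_round(round_number, remove_step_ratio):
    """Return True if a remove cycle would run in this (1-based) round.

    Assumes a remove cycle is scheduled every round(1 / ratio) rounds,
    and that a ratio of 0 disables remove cycles altogether.
    """
    if not 0 <= remove_step_ratio <= 1:
        raise ValueError("ratio must be between 0 and 1 (inclusive)")
    if remove_step_ratio == 0:
        return False
    interval = round(1 / remove_step_ratio)
    return round_number % interval == 0


# With a ratio of 0.25, every fourth round includes a remove cycle.
remove_rounds = [r for r in range(1, 13) if is_remove_round(r, 0.25)]
```

Since remove cycles take significantly longer than add cycles, a small ratio keeps the overall run time closer to that of add cycles alone.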
The default accuracy metric to use for evaluating test results, as used by RankScores. If set to a 'Weighted*' value, the weights are retrieved from CategoryWeights.
The test set to validate model accuracy increases/decreases against.
If set to a boolean value, defines whether or not to write output to the current device during the Optimize method. If set to a string, it is treated as a global reference to which output needs to be written.
This method does one round of processing, testing AddWindowSize candidate terms and selecting the best pCount terms according to RankScores, except for terms that would not meet the MinimalScoreIncrease threshold. If pCount < 0, it defaults to RemoveCount.
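The selection step of an add cycle can be sketched as follows. This is a simplified illustration, not the Optimizer's implementation: the function and variable names are hypothetical, scores are plain numbers rather than RankScores output, and the threshold is interpreted as a relative increase over the current score, which is an assumption.

```python
def select_terms_to_add(candidate_scores, current_score, count,
                        minimal_score_increase):
    """Pick up to `count` best candidates from one test window.

    candidate_scores maps each tested term to the model score obtained
    with that term added. Terms that do not raise the score by at least
    minimal_score_increase percent (relative, by assumption) are dropped.
    """
    required = current_score * (1 + minimal_score_increase / 100.0)
    qualifying = [(term, score) for term, score in candidate_scores.items()
                  if score >= required]
    # Highest score first, then keep only the requested number of terms.
    qualifying.sort(key=lambda item: item[1], reverse=True)
    return [term for term, _ in qualifying[:count]]


# Hypothetical window of four tested terms against a current score of 0.70.
window = {"goal": 0.74, "match": 0.71, "the": 0.70, "coach": 0.73}
selected = select_terms_to_add(window, current_score=0.70, count=2,
                               minimal_score_increase=1)
```

In this example "the" fails the threshold, and "match" qualifies but is outranked by the two better candidates.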
This method clears the temporary artifacts the optimizer has created while optimizing, such as the CurrentClassifier class and CurrentTestId test results.
Initializes this Optimizer instance. This method is called automatically as part of Optimize.
Loads all terms from the supplied array. If pListIndex is non-zero, the term info is read from that index at each array position. If the term info itself is a list structure as well, it is interpreted as follows: pTerms(n) = $lb(term, type, negationpolicy, matchpolicy)
Loads a list of candidate terms based on a SQL query. The query should return a column named "term" containing the term's value and may return columns named "type", "negation" and "match" to configure the type, negation and count policy for each term being retrieved, respectively.
In at most pMaxSteps steps, the current Builder will be optimized by testing, one at a time, the terms added through LoadTermsSQL and LoadTermsArray, judging which term works best for each test window by the results of RankScores (see also AddTerms). Every (1/RemoveStepRatio) rounds, all terms in the dictionary so far will be tested for their contribution to the current model score and the lowest RemoveCount terms will be removed (see also RemoveTerms).
At the end of the optimization process, in addition to Builder being updated, CurrentClassifier will contain the class name of the last test class used to achieve the best result and pTestId will point to the test results for that class.
Tests the impact of removing each term in the current model's TermDictionary individually. The pCount terms for which, after removal, RankScores still returns the best score (which supposedly implies their contribution was minimal) will be removed from the TermDictionary, unless the decrease in performance surpasses MaximalScoreDecrease. If pCount < 0, it defaults to RemoveCount.
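The selection logic of a remove cycle can be sketched as below. As before, this is an illustration under stated assumptions: the names are hypothetical, scores are plain numbers, and the maximal decrease is interpreted as a percentage relative to the current score.

```python
def select_terms_to_remove(scores_without_term, current_score, count,
                           maximal_score_decrease):
    """Pick up to `count` terms whose removal hurts the score least.

    scores_without_term maps each dictionary term to the model score
    obtained after removing only that term. Terms whose removal would
    lower the score by more than maximal_score_decrease percent
    (relative, by assumption) are kept in the dictionary.
    """
    lowest_acceptable = current_score * (1 - maximal_score_decrease / 100.0)
    removable = [(term, score) for term, score in scores_without_term.items()
                 if score >= lowest_acceptable]
    # The best remaining score means the removed term contributed least.
    removable.sort(key=lambda item: item[1], reverse=True)
    return [term for term, _ in removable[:count]]


# Hypothetical scores after removing each term from a model scoring 0.80.
without = {"the": 0.80, "goal": 0.70, "and": 0.798, "coach": 0.76}
removed = select_terms_to_remove(without, current_score=0.80, count=2,
                                 maximal_score_decrease=1)
```

Removing "goal" or "coach" would cost more than 1% here, so both stay even though two removal slots are available.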
Saves the CurrentClassifier class to the desired pClassName, so it will not be removed after this Optimizer instance is dropped. If CurrentClassifier is not set or if the class no longer exists for other reasons, the current builder object will create a classifier class based on its current state.