目录
Quantifying Quality General Considerations
A general optimization problem :
denotes the decision space,
is the objective space,
is the vector of objective functions, and
represents a binary relation over
that defines a partial order of the objective space, which in turn induces a preorder of the decision space.
Weak Pareto dominance:
Pareto dominance:
A corresponding set problem : the symbol
stands for the sets of all Pareto set approximations over
, the set of all Pareto front approximations over
is represented by
,
,
Quality Indicators: give more precise statements that quantify the difference in quality on a continuous scale, every quality indicator represents certain assumptions about the decision maker’s preferences, binary quality indicators exist.
Properties of Unary Quality Indicators
Monotonicity
strict:
Scaling
invariance: indicator values remain the same after after a strictly monotonic transformation
independent: indicator values remain same order after after a strictly monotonic transformation
Computation effort
runtime complexity depending on the number of solutions and the number of objectives
Additional problem knowledge
known Pareto optimal set? reference point or set?
Discussion of Selected Unary Quality Indicators

Terminology
: a Pareto set approximation
: the Pareto-optimal front
: the number of objectives
Outer Diameter
to be maximized
Proportion of Pareto-Optimal Objective Vectors Found
to be maximized
Cardinality
to be maximized
The number of elements of the set
Hypervolume Indicator
to be maximized
is the attainment function
Completeness Indicator
to be maximized
Epsilon Indicator Family
to be minimized
multiplicative -dominance relation
additive -dominance relation
unary
The D Indicator Family
to be minimized
The R Indicator Family
to be minimized
unary
usual utility functions: weighted linear function, the weighted Tchebycheff function, the augmented Tchebycheff function
where
Uniformity Measures
the standard deviation of nearest neighbor distances
Indicator Combinations and Binary Indicators
e.g. combine the epsilon indicator and the hypervolume indicator strictly monotonic, much less expensive
Stochasticity: General Considerations
transform the sample of approximation sets into
- a sample of indicator values: first transforms the samples of Pareto set approximations into samples of real values using quality indicators; then, the resulting samples of indicator values are compared based on standard statistical testing procedures
- an empirical attainment function: counts the number of approximation sets by which each objective vector is attained and normalizes the resulting number with the overall sample size
- a sample of ranks: for each Pareto set approximation generated by one optimizer it is computed by how many Pareto set approximations produced by another optimizer it is dominated
Sample Transformations
Dominance Ranking
- recommended to be the first step in any comparison
- quality indicators or the attainment function are optional
Quality Indicators
the same pros and cons as aforementioned
Empirical Attainment Function
for runs
%-attainment set
attainment surface of
- less information is lost
- computationally expensive
Statistical Testing
Fundamentals
subject: a sample of Pareto set approximations generated from multiple runs of an optimizer
- descriptive statistics: mean and variance(box plots)
- statistical inferences: by testing the possibility that the null hypothesis is true(statistical tests)
Statistical Inferences
null hypothesis : “samples
and
are drawn from the same distribution” or “samples
and
are drawn from distributions with the same mean value”
-value: the probability of obtaining a finding at least as ‘impressive’ as that obtained, computed by an inferential statistical test
: the significance level defines the largest acceptable p-value and represents a threshold that is user-defined
: confidence level
-value
alternative hypothesis
at a significance level of
: “sample
comes from a better distribution than sample
” given by a one-tailed test or “sample A and sample B are from different distributions” given by a two-tailed test

Non-parametric Statistical Inference: Rank and Permutation Tests
nonparametric tests is better than parametric tests(assuming known distribution)
- rank tests: less powerful, less sensitive to outliers and computationally cheap
- permutation tests: more powerful, better for cases with many tied values, expensive
Comparing Samples of Dominance Ranks
A test statistic: sum over the ranks in each of the two samples and take the difference of these sums
whether the value of the test statistic is significant? the set is partitioned into one set of
approximations and another set of
approximations; for each partitioning the difference between the rank sums can be determined, finally yielding a distribution of rank sum differences
Comparing Sample Indicators Values
2 optimizers: the Mann-Whitney rank sum test or Fisher’s permutation test
more than 2 optimizers: the Kruskal-Wallis test
Comparing Empirical Attainment Functions
the Kolmogorov-Smirnov (KS) test: measures the maximum difference between the ECDFs and assesses the statistical significance of this difference
Which optimizer is better in which area? plot the difference in frequencies of attaining the goal or the -value of the difference
Advanced Topics
Matched Samples
matched: the influence of one or more random variables is partially removed from consideration
statistical tests to use: the Wilcoxon signed rank test, Fisher’s matched samples test
Multiple Testing
problem: the -values are not reliable when
- There are more than two algorithms and we wish to make inferences about performance differences between all or a subset of them
- There are just two algorithms, but we wish to make multiple statistical tests of their performance, e.g., considering more than one indicator
solution
- Minimize the number of different tests carried out on a pair of algorithms by carefully choosing which tests to apply before collecting the data. Collect independent data for each test to be carried out
- Apply the tests on the same data but use methods for correcting the
-values for the reduction in confidence associated with data reuse, e.g. the significance level
for each distinct test is set to
(Bonferroni correction)
Assessing Worst-Case or Best-Case Performance
use permutation methods
Summary
- dominance-ranking
- quality indicators(optional, reflect preference)
- empirical attainment functions(optional, next level detail)
Reference
Branke, J., Branke, J., Deb, K., Miettinen, K., & Slowiński, R. (Eds.). (2008). Multiobjective optimization: Interactive and evolutionary approaches (Vol. 5252). Springer Science & Business Media.
Pingback: MOOP开发日志 - Fivyex's Blog