Notes on Multiobjective Optimization: Quality Assessment of Pareto Set Approximations

Quantifying Quality General Considerations

A general optimization problem $(X, Z, \mathbf{f}, rel)$ : $X$ denotes the decision space, $Z = \mathbb{R}^k$ is the objective space, $\textbf{f} = (f_1, f_2, \cdots, f_k)$ is the vector of objective functions, and $rel$ represents a binary relation over $Z$ that defines a partial order of the objective space, which in turn induces a preorder of the decision space.

Weak Pareto dominance: $\textbf{z}^1 \preceq \textbf{z}^2 \iff \forall i \in {1, \cdots, k}: \textbf{z}^1_i \leq \textbf{z}^2_i$

Pareto dominance: $\textbf{z}^1\prec \textbf{z}^2\iff \textbf{z}^1\preceq \textbf{z}^2 \land \neg \textbf{z}^2\preceq \textbf{z}^1$

A corresponding set problem $(\Psi, \Omega, \textbf{f}', rel')$ : the symbol $\Psi$ stands for the sets of all Pareto set approximations over $X$ , the set of all Pareto front approximations over $Z$ is represented by $\Omega$ , $\textbf{f}'(E) = {\textbf{z}\in Z|\exists\textbf{x}\in E: \textbf{z} = \textbf{f}(\textbf{x})}$ , $A$ $rel'$ $B\iff \forall \textbf{z}^2\in B, \exists \textbf{z}^1\in A: \textbf{z}^1$ $rel$ $\textbf{z}^2$

Quality Indicators: give more precise statements that quantify the difference in quality on a continuous scale, every quality indicator represents certain assumptions about the decision maker’s preferences, binary quality indicators exist.

Properties of Unary Quality Indicators

Monotonicity

$\forall A, B \in \Psi: A\preceq B\implies I(A)\ge I(B)$

strict: $\forall A, B \in \Psi: A\prec B\implies I(A) > I(B)$

Scaling

invariance: indicator values remain the same after after a strictly monotonic transformation

independent: indicator values remain same order after after a strictly monotonic transformation

Computation effort

runtime complexity depending on the number of solutions and the number of objectives

Additional problem knowledge

known Pareto optimal set? reference point or set?

Discussion of Selected Unary Quality Indicators

Terminology

$A$ : a Pareto set approximation

$P$ : the Pareto-optimal front

$n$ : the number of objectives

Outer Diameter

to be maximized

$I_{OD}(A) = \max_{1\le i\le n} w_i \left ( (\max_{\textbf{x}\in A} f_i(\textbf{x})) - (\min_{\textbf{x}\in A} f_i(\textbf{x})) \right )$

Proportion of Pareto-Optimal Objective Vectors Found

to be maximized

$I_{PF}(A) = \frac{\{\textbf{z}\in P | \exists \textbf{x}\in A: \textbf{f}(\textbf{x})\preceq \textbf{z}\}}{|P|}$

Cardinality

to be maximized

The number of elements of the set

Hypervolume Indicator

to be maximized

$I^*_H(A) = \int_{\textbf{z}_{lower}}^{\textbf{z}_{upper}} \alpha_A(\textbf{z})d\textbf{z}$

$\alpha_A$ is the attainment function

$\alpha_A(\textbf{z}) = \begin{cases}1 & \text{if $\exists \textbf{x}\in A: \textbf{f}(\textbf{x})\preceq \textbf{z}$} \\0 & \text{else}\end{cases}$

Completeness Indicator

to be maximized

$I_{CP}(A) = Prob[A\preceq \{\mathcal{U}\}]$

Epsilon Indicator Family

to be minimized

$I_\epsilon(A, B) = \inf_{\epsilon\in\mathbb{R}}\{\forall \textbf{x}^2\in B, \exists\textbf{x}^1\in A: \textbf{x}^1\preceq_\epsilon \textbf{x}^2\}$

multiplicative $\epsilon$ -dominance relation

$\textbf{x}^1\preceq_\epsilon \textbf{x}^2\iff\forall i \in \{1, \cdots, n\}: f_i(\textbf{x}^1) \le \epsilon\cdot f_i(\textbf{x}^2)$

additive $\epsilon$ -dominance relation

$\textbf{x}^1\preceq_{\epsilon+} \textbf{x}^2\iff\forall i \in \{1, \cdots, n\}: f_i(\textbf{x}^1) \le \epsilon + f_i(\textbf{x}^2)$

unary

$I_\epsilon^1 = I_\epsilon(A, R), I_{\epsilon+}^1 = I_{\epsilon+}(A, R)$

The D Indicator Family

to be minimized

$I_{D1}(A) = \frac{1}{|R|}\sum_{\textbf{x}^2\in R}\min_{\textbf{x}^1\in A}\max_{1\le i\le n}(0, w_i(f_i(\textbf{x}^1) - f_i(\textbf{x}^2)))$

$I_{D2}(A) = \max_{\textbf{x}^2\in R}\min_{\textbf{x}^1\in A}\max_{1\le i\le n}(0, w_i(f_i(\textbf{x}^1) - f_i(\textbf{x}^2)))$

The R Indicator Family

to be minimized

$I_{R2}(A, B) = \frac{\sum_{\lambda\in\Lambda}u^*(\lambda, A) - u^*(\lambda, B)}{|\Lambda|}$

$I_{R3}(A, B) = \frac{\sum_{\lambda\in\Lambda}\frac{u^*(\lambda, B) - u^*(\lambda, A)}{u^*(\lambda, B)}}{|\Lambda|}$

unary

$I_{R2}^1(A) = I_{R2}(R, A)$

$I_{R3}^1(A) = I_{R3}(A, R)$

usual utility functions: weighted linear function, the weighted Tchebycheff function, the augmented Tchebycheff function

$u_\lambda(\textbf{z}) = -\sum_{j\in{1, \cdots, n}}\lambda_j|z_j^* - z_j|$

$u_\lambda(\textbf{z}) = -\max_{j\in{1, \cdots, n}}\lambda_j|z_j^* - z_j|$

$u_\lambda(\textbf{z}) = -\left ( \max_{j\in{1, \cdots, n}}\lambda_j|z_j^* - z_j| + \rho\sum_{j\in{1, \cdots, n}}|z_j^* - z_j|\right )$

where $(\forall i\in {1, \cdots, n}: \lambda_n\ge0)\land(\sum_{j\in{1, \cdots, n}}\lambda_j = 1)$

Uniformity Measures

the standard deviation of nearest neighbor distances

Indicator Combinations and Binary Indicators

e.g. combine the epsilon indicator and the hypervolume indicator $\implies$ strictly monotonic, much less expensive

Stochasticity: General Considerations

transform the sample of approximation sets into

a sample of indicator values: first transforms the samples of Pareto set approximations into samples of real values using quality indicators; then, the resulting samples of indicator values are compared based on standard statistical testing procedures
an empirical attainment function: counts the number of approximation sets by which each objective vector is attained and normalizes the resulting number with the overall sample size
a sample of ranks: for each Pareto set approximation generated by one optimizer it is computed by how many Pareto set approximations produced by another optimizer it is dominated

Sample Transformations

Dominance Ranking

$rank(A_i) = |\{B|B\in{B_1, \cdots, B_s} \land B\prec A_i\}|$

recommended to be the first step in any comparison
quality indicators or the attainment function are optional

Quality Indicators

the same pros and cons as aforementioned

Empirical Attainment Function

for $r$ runs

$\alpha_r(\textbf{z}) = \frac{1}{r}\sum_{i=1}^r\textbf{I}(\textbf{f}'(A_i)\preceq {\textbf{z}})$

$k$ %-attainment set $\textbf{f}'(A)$

$\forall \textbf{z}\in Z: \alpha_r(\textbf{z})\ge \frac{k}{100}\iff \textbf{f}'(A)\preceq \{\textbf{z}\}$

attainment surface of $\textbf{f}'(A)$

$\textbf{z}\in\mathbb{R}^n: \textbf{f}'(A)\preceq \textbf{z}\land \neg \textbf{f}'(A)\prec\prec \textbf{z}$

less information is lost
computationally expensive

Statistical Testing

Fundamentals

subject: a sample of Pareto set approximations generated from multiple runs of an optimizer

descriptive statistics: mean and variance(box plots)
statistical inferences: by testing the possibility that the null hypothesis is true(statistical tests)

Statistical Inferences

null hypothesis $H_0$ : “samples $A$ and $B$ are drawn from the same distribution” or “samples $A$ and $B$ are drawn from distributions with the same mean value”

$p$ -value: the probability of obtaining a finding at least as ‘impressive’ as that obtained, computed by an inferential statistical test

$\alpha$ : the significance level defines the largest acceptable p-value and represents a threshold that is user-defined

$C = 1-\alpha$ : confidence level

$p$ -value $< \alpha \implies$ alternative hypothesis $H_A$ at a significance level of $\alpha$ : “sample $A$ comes from a better distribution than sample $B$ ” given by a one-tailed test or “sample A and sample B are from different distributions” given by a two-tailed test

Non-parametric Statistical Inference: Rank and Permutation Tests

nonparametric tests is better than parametric tests(assuming known distribution)

rank tests: less powerful, less sensitive to outliers and computationally cheap
permutation tests: more powerful, better for cases with many tied values, expensive

Comparing Samples of Dominance Ranks

A test statistic: sum over the ranks in each of the two samples and take the difference of these sums

whether the value of the test statistic is significant? the set $\{A_1, A_2, \cdots, A_r,B_1,B_2, \cdots, B_s\}$ is partitioned into one set of $r$ approximations and another set of $s$ approximations; for each partitioning the difference between the rank sums can be determined, finally yielding a distribution of rank sum differences

Comparing Sample Indicators Values

2 optimizers: the Mann-Whitney rank sum test or Fisher’s permutation test

more than 2 optimizers: the Kruskal-Wallis test

Comparing Empirical Attainment Functions

the Kolmogorov-Smirnov (KS) test: measures the maximum difference between the ECDFs and assesses the statistical significance of this difference

Which optimizer is better in which area? plot the difference in frequencies of attaining the goal or the $p$ -value of the difference

Advanced Topics

Matched Samples

matched: the influence of one or more random variables is partially removed from consideration

statistical tests to use: the Wilcoxon signed rank test, Fisher’s matched samples test

Multiple Testing

problem: the $p$ -values are not reliable when

There are more than two algorithms and we wish to make inferences about performance differences between all or a subset of them
There are just two algorithms, but we wish to make multiple statistical tests of their performance, e.g., considering more than one indicator

solution

Minimize the number of different tests carried out on a pair of algorithms by carefully choosing which tests to apply before collecting the data. Collect independent data for each test to be carried out
Apply the tests on the same data but use methods for correcting the $p$ -values for the reduction in confidence associated with data reuse, e.g. the significance level $\alpha_s$ for each distinct test is set to $\alpha_s = \frac{\alpha}{n}$ (Bonferroni correction)

Assessing Worst-Case or Best-Case Performance

use permutation methods

Summary

dominance-ranking
quality indicators(optional, reflect preference)
empirical attainment functions(optional, next level detail)

Reference

Branke, J., Branke, J., Deb, K., Miettinen, K., & Slowiński, R. (Eds.). (2008). Multiobjective optimization: Interactive and evolutionary approaches (Vol. 5252). Springer Science & Business Media.

Notes on Multiobjective Optimization: Quality Assessment of Pareto Set Approximations

Quantifying Quality General Considerations