tianyic/only_train_once

Confused about Fig 3a in OTO v1 paper

Closed this issue · 3 comments

Hello,

image

From the definition of $\mathcal{S}(\mathbf{x})$, and assuming $\epsilon = 0,\ \mathcal{I}^{0} = \{1\}$, the half-space should be the whole region on the right, and the projection should be straight onto $[x]_1 = 0$ or $[x]_2 = 0$.

What am I missing?

@iamanigeeit

Thanks for reaching out. HSPG is a sparse optimizer that yields group sparsity. In the figure, we consider the union of $[x]_1$ and $[x]_2$ as a single variable group, so they need to be projected to zero at the same time. The half-space is then formed whose boundary is orthogonal to the projection direction $-[x]$.

image

You might be thinking of the case that produces unstructured sparsity. For that case, please look into our OBProxSG paper. In fact, the half-space projector with $\epsilon = 0$ is equivalent to the orthant projector in OBProxSG when applied to singleton groups.
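As a concrete check of that equivalence (my own toy numbers, not from the paper), with $\epsilon = 0$ and singleton groups $\{1\}, \{2\}$, the half-space test per coordinate reduces to the orthant test: zero out any coordinate whose trial value crossed zero relative to $x_k$.

```python
import numpy as np

# Hypothetical 2D example: x_k is the previous iterate, x_trial the point
# after a gradient step that flipped the sign of the first coordinate.
x_k = np.array([0.8, -0.3])
x_trial = np.array([-0.1, -0.2])

# Half-space test on singleton groups with eps = 0:
# zero out coordinate i when x_trial[i] * x_k[i] < 0.
halfspace = np.where(x_trial * x_k < 0.0, 0.0, x_trial)

# Orthant projector: zero out coordinates whose sign changed.
orthant = np.where(np.sign(x_trial) != np.sign(x_k), 0.0, x_trial)

print(halfspace)  # [ 0.  -0.2]
print(orthant)    # [ 0.  -0.2]
```

Both projectors zero the first coordinate (sign flip) and keep the second, so they agree on singleton groups.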

@tianyic Thanks for the reply!

My understanding of the algorithm is this:

  1. Do a subgradient descent update. Call the new point $\boldsymbol{x'_{k+1}}$.
  2. Zero out the param groups that were previously zeroed. Call the new point $\boldsymbol{\tilde{x}_{k+1}}$.
  3. Now, we need to figure out which new param groups to zero.
  4. We try setting each group of variables to zero. Let $\boldsymbol{[\tilde{x}_{k+1}]_g}$ denote the values of param group $g$ at the trial point.
  5. If $\boldsymbol{[\tilde{x}_{k+1}]_g \cdot [x_k]_g} < 0$, the trial point is on the opposite side of the origin from the previous point $\boldsymbol{x_k}$. Then we can set the params in group $g$ to 0, because setting them to 0 moves the update in the correct direction.
  6. If $\boldsymbol{[\tilde{x}_{k+1}]_g \cdot [x_k]_g} < \epsilon\ \boldsymbol{[x_k]_g \cdot [x_k]_g}$ and $\epsilon < 1$, the trial point is on the same side as $[x_k]_g$ but still closer to the origin. As we increase $\epsilon$ from 0 to 1, we widen the region of points for which we can set the params in $g$ to 0.
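The steps above can be sketched in NumPy; this is my own minimal reading of the procedure (function name, signature, and hyper-parameters are mine, not the repo's API):

```python
import numpy as np

def half_space_step(x_k, grad, groups, lr=0.1, eps=0.0):
    """One half-space projection step, per the understanding above (a sketch).

    groups: list of index arrays, each one a variable group.
    """
    # Step 1: subgradient descent update -> trial point x'_{k+1}.
    x_trial = x_k - lr * grad
    x_next = x_trial.copy()
    for g in groups:
        # Step 2: groups previously zeroed stay zeroed.
        if np.all(x_k[g] == 0):
            x_next[g] = 0.0
            continue
        # Steps 5-6: project group g to zero unless the trial point stays on
        # x_k's side with margin eps * ||[x_k]_g||^2.
        if np.dot(x_trial[g], x_k[g]) < eps * np.dot(x_k[g], x_k[g]):
            x_next[g] = 0.0
    return x_next
```

For example, with `x_k = [1, 1, 0.5, 0.5]`, groups `{0,1}` and `{2,3}`, and a gradient that pushes the first group past the origin, the first group gets projected to zero while the second is kept.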

If this understanding is correct, then I know why I was confused. Maybe this could help other people with similar problems.

  1. I interpreted $\mathcal{G} = \{\{1,2\}\}$ as $g=\{1\}$ and $g=\{2\}$ when it should be $g=\{1,2\}$.
  2. I thought $\mathcal{S}_k$ indicates the region where $\boldsymbol{x_{k+1}}$ would be set to zero. Now I realize it is the important region, i.e. we retain $\boldsymbol{x_{k+1}}$ if it's inside $\mathcal{S}_k$.

But I still have questions:

  1. In Figure 3b, it seems like the $\epsilon=0$ line should pass through $\mathcal{O}$ and $\epsilon=1$ should pass through $\boldsymbol{x_k}$?
  2. According to Algorithm 2, param groups that are zeroed can never be activated again. Is that correct?
  3. All of the following look like the same thing. Should they be written as $\boldsymbol{x'_{k+1}}$?

Just before Fig 3a:
image

Fig 3a:
image

Algorithm 2:
image

@iamanigeeit

Happy to hear you figured out the previous confusion. Your understanding is largely correct.

Regarding your remaining questions:

  1. Yes, $\epsilon$ is a hyper-parameter that controls the half-space region; its feasible range is $[0, 1)$. Setting it to $0$ makes the boundary pass through the origin, and letting it approach $1$ makes the boundary pass through $x_k$.

  2. Yes, in the HSPG and DHSPG schemes, the identified redundant groups are kept redundant (at zero). In AdaHSPG+, the theoretical paper, we discuss how to adaptively zero groups in and out to achieve better redundancy identification.

  3. Your suggestion makes sense, but I think it is better to write line 10 as $[x_{k+1}]_{\mathcal{I}^{\neq 0}}$ for clarity.

Since we typically denote the final next iterate as $x_{k+1}$, other notations such as $x_{k+1}'$ or $\hat{x}_{k+1}$ refer to trial iterates that need further adjustment to obtain the final next iterate.
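The $\epsilon$ geometry in answer 1 can be sanity-checked numerically. Assuming the half-space boundary is $\{x : \langle x, x_k \rangle = \epsilon \|x_k\|^2\}$ (my notation, consistent with the test in step 6 above):

```python
import numpy as np

# x_k is an arbitrary previous iterate; check which points lie on the
# half-space boundary <x, x_k> = eps * ||x_k||^2 for eps = 0 and eps = 1.
x_k = np.array([0.6, 0.8])

def on_boundary(x, eps):
    return bool(np.isclose(np.dot(x, x_k), eps * np.dot(x_k, x_k)))

print(on_boundary(np.zeros(2), eps=0.0))  # True: eps=0 boundary passes through the origin
print(on_boundary(x_k, eps=1.0))          # True: eps=1 boundary passes through x_k itself
```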