Confused about Fig 3a in OTO v1 paper
Closed this issue · 3 comments
Thanks for reaching out. HSPG is a sparse optimizer that yields group sparsity. In the figure, we consider the union of
![image](https://private-user-images.githubusercontent.com/8930611/297439743-8c97471d-6ebc-4130-9da1-792e1410cf36.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTk1NDEwMDcsIm5iZiI6MTcxOTU0MDcwNywicGF0aCI6Ii84OTMwNjExLzI5NzQzOTc0My04Yzk3NDcxZC02ZWJjLTQxMzAtOWRhMS03OTJlMTQxMGNmMzYucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI0MDYyOCUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNDA2MjhUMDIxMTQ3WiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9YzI5Y2I0ZjRhYWQ5ZGI2ODEzYjdiZWRkODZiNzBlNDE2ZGNhYmEwMTVhMjNlNGE3MWExZWU0MjRmYjIyMThhYiZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QmYWN0b3JfaWQ9MCZrZXlfaWQ9MCZyZXBvX2lkPTAifQ.HUUSOUM9aNpd0x07WcExNhlDSvWEHa0cQsDcFGzLJOg)
You might be considering the case of producing unstructured sparsity. For that case, please look into our OBProxSG paper. In fact, the Half-Space projector under
@tianyic Thanks for the reply!
My understanding of the algorithm is this:

- Do a subgradient descent update. Call the new point $\boldsymbol{x'_{k+1}}$.
- Zero out the param groups that were previously zeroed. Call the new point $\boldsymbol{\tilde{x}_{k+1}}$.
- Now we need to figure out which new param groups to zero:
  - We try setting each group of variables to zero. Let $\boldsymbol{[\tilde{x}_{k+1}]_g}$ be the trial point where param group $g$ is zeroed out.
  - If $\boldsymbol{[\tilde{x}_{k+1}]_g \cdot [x_k]_g} < 0$, the trial point is on the opposite side of the origin from the previous point $\boldsymbol{x_k}$. Then we can set the params in group $g$ to 0, because setting them to 0 updates in the correct direction.
  - If $\boldsymbol{[\tilde{x}_{k+1}]_g \cdot [x_k]_g} < \epsilon\, \boldsymbol{[x_k]_g \cdot [x_k]_g}$ with $\epsilon < 1$, the trial point is on the same side as $[x_k]_g$ but still closer to the origin. As we increase $\epsilon$ from 0 to 1, we widen the range of points where we can set the params in $g$ to 0.
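The steps above can be sketched in a few lines. This is my reading of the projection step, not the authors' implementation; the function name, the group layout (lists of indices), and the numeric values are all made up for illustration.

```python
# Sketch of the half-space projection step described above -- my reading of
# HSPG, not the authors' code. Groups are lists of variable indices.

def half_space_project(x_k, x_trial, groups, eps=0.5):
    """Zero out every group g whose trial point satisfies
    [x_trial]_g . [x_k]_g < eps * ||[x_k]_g||^2, which covers both bullet
    cases: inner product < 0, or same side but too close to the origin."""
    x_next = list(x_trial)
    for g in groups:
        if all(x_k[i] == 0 for i in g):   # previously-zeroed groups stay zero
            for i in g:
                x_next[i] = 0.0
            continue
        inner = sum(x_trial[i] * x_k[i] for i in g)
        norm_sq = sum(x_k[i] * x_k[i] for i in g)
        if inner < eps * norm_sq:         # project the whole group to zero
            for i in g:
                x_next[i] = 0.0
    return x_next

x_k     = [1.0, 1.0, 1.0, -1.0, 0.0, 0.0]
x_trial = [0.9, 0.8, 0.1, 0.05, 0.3, -0.1]   # after the subgradient step
print(half_space_project(x_k, x_trial, [[0, 1], [2, 3], [4, 5]]))
# -> [0.9, 0.8, 0.0, 0.0, 0.0, 0.0]
```

Here the first group survives (its inner product with $[x_k]_g$ stays above the threshold), the second is projected to zero, and the third was already zero and stays zero.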
If this understanding is correct, then I know why I was confused. Maybe this could help other people with similar problems.
- I interpreted $\mathcal{G} = \{\{1,2\}\}$ as $g=\{1\}$ and $g=\{2\}$, when it should be $g=\{1,2\}$.
- I thought $\mathcal{S}_k$ indicates the region where $\boldsymbol{x_{k+1}}$ would be set to zero. Now I realize it is the important region, i.e. we retain $\boldsymbol{x_{k+1}}$ if it's inside $\mathcal{S}_k$.
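A tiny illustration of that first point, with made-up values and 0-based indices (my own sketch, not from the paper or repo): $\mathcal{G}=\{\{1,2\}\}$ is a *single* group containing both variables, so they are judged jointly, whereas the misreading $\{\{1\},\{2\}\}$ would let each variable be zeroed on its own.

```python
# Group structure determines which blocks count as zero (group-sparse).

def zero_groups(x, groups):
    """Return the groups whose entries are all exactly zero."""
    return [g for g in groups if all(x[i] == 0 for i in g)]

x = [0.0, 0.7]

# G = {{1,2}}: one group holding both variables, judged jointly.
print(zero_groups(x, [[0, 1]]))    # [] -- the pair is not entirely zero
# Misreading as {{1},{2}}: each variable forms its own group.
print(zero_groups(x, [[0], [1]]))  # [[0]] -- the singleton group is zero
```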
But I still have questions:

- In Figure 3b, it seems like the $\epsilon=0$ line should pass through $\mathcal{O}$ and the $\epsilon=1$ line should pass through $\boldsymbol{x_k}$?
- According to Algorithm 2, param groups that are zeroed can never be activated again. Is that correct?
- All the below look like the same thing. Should they be written as $\boldsymbol{x'_{k+1}}$?
Happy to hear you figured out the previous confusion. Your understanding is largely correct.
Regarding your remaining questions:
- Yes, $\epsilon$ is a hyper-parameter that controls the half-space region. The feasible range is $[0,1)$. Setting it equal to $0$ makes the boundary pass through the origin, and letting it approach $1$ would make the boundary pass through $x_k$.
- Yes, in the HSPG and DHSPG schemes, the identified redundant groups are kept redundant (i.e., zero). In AdaHSPG+, the theoretical paper, we discuss how to adaptively zero groups in and out to achieve better redundancy identification.
- Your suggestion makes sense, but I think it is better to write line 10 as $[x_{k+1}]_{\mathcal{I}^{\neq 0}}$ for clarity, since we typically notate the final next iterate as
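As a quick numeric check of the $\epsilon$ answer above (my own sketch, assuming the boundary is the hyperplane $\{x : x \cdot x_k = \epsilon\, x_k \cdot x_k\}$ implied by the projection test): the origin lies on the boundary when $\epsilon = 0$, and $x_k$ itself lies on it when $\epsilon = 1$.

```python
# The half-space boundary {x : x . x_k = eps * ||x_k||^2} passes through
# the origin at eps = 0 and through x_k itself at eps = 1.

def on_boundary(x, x_k, eps, tol=1e-12):
    inner = sum(a * b for a, b in zip(x, x_k))
    norm_sq = sum(a * a for a in x_k)
    return abs(inner - eps * norm_sq) < tol

x_k = [1.0, 2.0]
print(on_boundary([0.0, 0.0], x_k, eps=0.0))  # True: origin on the eps=0 line
print(on_boundary(x_k, x_k, eps=1.0))         # True: x_k on the eps=1 line
```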