Verify Usage of Opensearch Core System and Hidden Indices
Opened this issue · 12 comments
Description/Concept of System Indices and Hidden Indices on Opensearch core:
System Index - An index must extend system index plugin for a index to be called as system index.
Example: Security Plugin correctly extends and uses system Indices, more info here
Hidden index - An index must have hidden SETTING_INDEX_HIDDEN = "index.hidden" set on the index setting to call it as hidden. It doesn't matter if it starts with "." or not.
Example: Asynchronous-Search plugin correctly sets the index setting value here
Misconception:
Many plugins still misunderstand the actual definition of system and hidden indices. Usual misconception is that any index starting with .
like .indexName
is a system or hidden index, but this is incorrect. Any user can create an index with .indexName
which is neither a system or hidden index. So users can mistakenly still create index starting with "." there is nothing stopping them from doing it.
To avoid this misconception all plugins should adopt/on-board with concept of system and hidden indices defined in opensearch core.
Goal:
The main ask of this issue is to make sure all plugins having/using system and hidden indices must on-board/adopt with concept defined in opensearch core.
Any plugin already on-board with opensearch core defined concept of system and hidden indices can ignore this issue and close the issue as completed on the plugin repo.
Additional info:
The following info provided below is not necessary for system/hidden indices, but might be useful info for plugins using system indices :
If your system indices need additonal security features/benefits provided by security plugin, follow the steps provided here. But to make sure these are additonal features provided by security plugin and it is completely decoupled from concept of system indices. It is upto the plugin owners to decide if they need these additional security benefits.
Open questions
In case of any questions or issues, please post it in core issue
Have same confusion like opensearch-project/OpenSearch#9239 (comment)
Not to add more confusion to system indices, but the security plugin also has a notion of protected indices which are indices that are given special protections, but are not system indices.
ml-commons is using security protected indices, not system you defined as
An index must extend system index plugin for a index to be called as system index.
Should we close this issue for now?
@ylwu-amzn I think, we need to figure out what is the purpose of index here. If the index holds some important info related to system/plugin and if we don't want a regular user to play around with index then it makes sense to define the index as system index. I want to make sure we are not intermixing both system index and security provided protected indices, both are different concepts and are completely decoupled.
It's upto plugin owners to decide which indices should be
-> system indices and
-> which should be security protected indices and
-> which should be both system and security protected indices
Some confusion here @Rishikesh1159.
In ML-Commons/most plugins, they do not implement the system index interface[1] instead use security plugin to configure security plugin's "system indices"[2]. Security Plugin internally doesn't register[3] these with core i.e from core perspective ML-Commons/plugins do not use core system index SystemIndexDescriptor
[1].
Security Plugin introduced the concept of system indices with commit[4], this is a new feature while deprecating protected indices[5].
That said coming back to the original issue, ML-Commons has 2 use cases:
- When no security in the cluster: ML-Commons doesn't care if a user is accessing plugin metadata, because all users can access all resources (models, connectors etc in this case) i.e it doesn't need a SystemIndex.
- When security is present in the cluster: ML-Commons wants to provide fine grained access control (FGAC) for its resources and it would only be possible by constructs provided by the security plugin (like user identity, roles, backend roles etc). So without security plugin, FGAC doesn't work which is why ML-Commons uses the system index concept introduced by security plugin (not core).
I hope this helps. This is how most plugins are configured to use today. Closing this issue, let me know if you need anything else.
[1] https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/plugins/SystemIndexPlugin.java
[2] https://github.com/opensearch-project/security/blob/main/tools/install_demo_configuration.sh#L385-L386
[3] https://github.com/opensearch-project/security/blob/main/src/main/java/org/opensearch/security/OpenSearchSecurityPlugin.java#L1886
[4] opensearch-project/security@d420f16
[5] https://github.com/opensearch-project/security/blob/main/src/main/java/org/opensearch/security/support/ConfigConstants.java#L296
Hi @saratvemulapalli , as part of this activity we want to move the indices created by the plugins to have a standardization. If your team feels that they should not onboard to system indices, they can create the indices with hidden property true(more details in the meta issue). Though I would strongly recommend that verify if you need special handling for the index created by your plugin by core and onboard to system indices.
That said coming back to the original issue, ML-Commons has 2 use cases:
When no security in the cluster: ML-Commons doesn't care if a user is accessing plugin metadata, because all users can access all resources (models, connectors etc in this case) i.e it doesn't need a SystemIndex.
When security is present in the cluster: ML-Commons wants to provide fine grained access control (FGAC) for its resources and it would only be possible by constructs provided by the security plugin (like user identity, roles, backend roles etc). So without security plugin, FGAC doesn't work which is why ML-Commons uses the system index concept introduced by security plugin (not core).
I think we are confusing the system indices with how a user can access the indices or not in non fgac or fgac setting.
You can take the following issues/PR/code on how other plugins are handling it:
Regarding this comment, I feel like we should re-visit our system indices implementation especially for encryption key. Currently we need to depend on security plugin, which is a bottleneck for us.
I think we should follow same logic as AD opensearch-project/anomaly-detection#975 (comment)
All AD indices are correctly configured as hidden indices. We have intentionally not enabled them as system indices. While system indices benefit from a dedicated threadpool for read/write operations and have higher priority compared to user indices, we want to ensure that the operations related to AD indices do not disproportionately impact the overall system performance. Hence, we chose not to use system indices for AD.
It's same for ml-commons, we don't want to impact the overall system performance
Regarding opensearch-project/OpenSearch#9239 (comment), I feel like we should re-visit our system indices implementation especially for encryption key. Currently we need to depend on security plugin, which is a bottleneck for us.
Sorry about the initial confusion, the system indices of core has no correlation with accessibility and have updated the comment. Thanks!
It's same for ml-commons, we don't want to impact the overall system performance
What all operations we do on the system indices of ml-commons? Are we doing heavy ingestion/search on these indices?
What all operations we do on the system indices of ml-commons? Are we doing heavy ingestion/search on these indices?
We save model/task related data into system indices when user create/update. User can get/search these data. No heavy ingestion/search on these indices.
@Arpit-Bandejiya
AFAIK system indices in ML-Commons is used for protecting the access to the plugin metadata index.
ML-Commons doesn't really have heavy ingest/search operations on these indices, which is why we really do not want use system indices and do not dent overall performance of the cluster.
I think there is some confusion here:
- If the indices are not ingest/search operation heavy, we should ideally move the indices to the system indices. This is for better resiliency of the system indices. Why do we believe it will impact the performance of the cluster?
- The system indices list of security plugin is a totally different thing, that is used for security of the index being accessed. The current issue is for the system indices of the core.
As far as I understood, we can move the ml-commons indices to system indices of core as well(Do not change anything in system indices list of security plugin). This will be helpfull for cases when the backpressure is applied and the ml-common plugin is being ingest/searched. We won't be rejecting those requests and the plugin functionality won't be impacted.