ruflin/Elastica

"no permissions for []" for queries against aliases API

ThibautSF opened this issue · 9 comments

Hi,

I have an error that I didn't manage to understand (and might not be completely related to Elastica, or maybe I have an issue with a config somewhere...).

Intro

I have the following ES environment :

  • Amazon OpenSearch Service
  • Elasticsearch 7.10.2
  • t3.small.search with 3 nodes
  • Access with HTTPS transport

For PHP :

The issue

Every other query (index creation, indexing, search, etc.) works, but when I try to remove and create aliases I get a 500:

PHP Fatal error: Uncaught Elastica\Exception\ResponseException: no
permissions for [] and User [name=elsadmin, backend_roles=[],
requestedTenant=null] in vendor\ruflin\elastica\src\Transport\Http.php:178

Note: the user elsadmin indicated is admin with full access to everything

The client is initialized with config

        [
            // Server
            'servers' => [
                [
                    'host' => '<host>',
                    'port' => 443,
                    'transport' => 'Https',
                    'username' => 'elsadmin',
                    'password' => '<password>',
                    'connectTimeout' => 10,
                ],
            ],
            // + some other options related to elastically
        ]

Alias query call

$data = ['actions' => []];

$data['actions'][] = ['remove' => ['index' => '*', 'alias' => $indexAlias]];
$data['actions'][] = ['add' => ['index' => $index->getName(), 'alias' => $indexAlias]];

$elasticaClient->request('_aliases', Request::POST, $data);

I remember AWS OpenSearch had some different auth mechanisms, but as normal operations do work, I'm not sure this is related. The error message doesn't tell you much, so I would look at the Elasticsearch / OpenSearch logs to see if anything shows up there.

Side note: Elastica does not support and is not tested against OpenSearch but in 7.10 I would expect the APIs to still be mostly aligned.

> I remember AWS OpenSearch had some different auth mechanisms, but as normal operations do work, I'm not sure this is related. The error message doesn't tell you much, so I would look at the Elasticsearch / OpenSearch logs to see if anything shows up there.

I will look into how to enable logs then, because it looks like they aren't on by default on AWS.
But here are the strange things I found:

  • only the aliases API fails over HTTP with Elastica; index creation/deletion is OK, search is OK, and indexing is almost OK (the only issue is bulk indexing more than 5 documents at once: 5 works, but 6 always returns 429 Too Many Requests /_bulk; the logs might help me with that too)
  • with the GUI tool Elasticvue I am able to delete and add the alias manually (with the same user)

> Side note: Elastica does not support and is not tested against OpenSearch but in 7.10 I would expect the APIs to still be mostly aligned.

This is precisely why I set up Elasticsearch 7.10 and not OpenSearch 1.2.
I wanted to test my implementation with plain Elasticsearch first (and try an OpenSearch instance later).

The bulk request issue is odd. I would have expected a different error if the bulk request is too large. Do you have lots of other traffic on this instance? As you said, ES logs should help you in this scenario too.

You also noted above that elsadmin is a superuser, so I don't see why you would get problems with aliases :-( Have you tried to just run Elasticsearch locally on your machine and run the same code to see what happens?

Still trying to obtain the AWS logs, but nothing has come through yet.

> You also noted above that elsadmin is a superuser, so I don't see why you would get problems with aliases :-( Have you tried to just run Elasticsearch locally on your machine and run the same code to see what happens?

Local run works fine (although I only have HTTP and no users...)

I tried queries with Postman against https://elsadmin:<pass>@<elasticdomain>:443/_aliases

If I use the "*" wildcard in the remove action, it produces the "no permissions for []" error.

BUT if I use the old index name directly:
[screenshot]

So this seems to confirm that the issue is on the AWS side... I also ran tests several months ago on an Elastic Cloud 30-day demo instance, and almost the same code (strictly the same code for the alias query part) worked there.

> The bulk request issue is odd. I would have expected a different error if the bulk request is too large. Do you have lots of other traffic on this instance? As you said, ES logs should help you in this scenario too.

I still need to dig into that part, but even though it's the smallest AWS instance available for OpenSearch (it's only for dev tests), my local setup can handle more documents at once (by count and by byte size) with less RAM and fewer nodes: 1 GB for 1 node (1 shard) locally, against 2 GB per node for 3 nodes (3 shards) on AWS.
[screenshot: AWS cluster as seen in the Elasticvue Firefox add-on]

OK, I managed to find a workaround.
I read more of the documentation for _aliases.
I suppose that the wildcard * was affecting hidden protected indices (security & co).

Since the implementation just adds the base name of the indices as the alias, I will override the class method in elastically and change the pattern from "*" to "my_index_base_name*".

And this way it works.
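
The workaround above can be sketched as a small helper that builds the `_aliases` request body with a prefix-scoped pattern instead of `*`. This is only an illustration: the function name `buildAliasSwapActions` and the example index names are hypothetical and are not part of Elastica or elastically.

```php
<?php
// Hypothetical helper (not an Elastica/elastically API): builds the body
// for POST /_aliases with the removal scoped to one index family.
function buildAliasSwapActions(string $indexBaseName, string $newIndexName, string $alias): array
{
    return [
        'actions' => [
            // Match "<base name>*" rather than "*", so indices outside this
            // family (hidden or protected ones included) are never targeted.
            ['remove' => ['index' => $indexBaseName . '*', 'alias' => $alias]],
            ['add' => ['index' => $newIndexName, 'alias' => $alias]],
        ],
    ];
}

// Example values are made up for illustration.
$body = buildAliasSwapActions(
    'my_index_base_name',
    'my_index_base_name_2022-05-03-111557',
    'my_index_base_name'
);
echo json_encode($body, JSON_PRETTY_PRINT), PHP_EOL;
```

The resulting array has the same shape as `$data` in the original call, so it could be passed to `$elasticaClient->request('_aliases', Request::POST, $body)` in the same way.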

Glad you found a workaround:

I suppose that the wildcard * was affecting hidden protected indices (security & co).

I remember there was a bug in some of the 7.x releases in Elasticsearch. Would be interesting to know if it works with 7.17.

On the bulk request: Even though your instances are small and above 80% memory, I would still expect you can ingest more than 5 docs in a bulk. Are these especially large docs?

> I remember there was a bug in some of the 7.x releases in Elasticsearch. Would be interesting to know if it works with 7.17.

Sadly, the AWS OpenSearch service only goes up to 7.10; after that it's OpenSearch 1.2 (or manual cluster creation).
But I should be able to try older versions.

> On the bulk request: Even though your instances are small and above 80% memory, I would still expect you can ingest more than 5 docs in a bulk. Are these especially large docs?

Those are not large docs; they are docs containing attachment files (PDF, images, Excel, etc.).
Each bulk is built based on 2 metrics:

  • number of documents in the bulk (I kept elastically's default value of 100)
  • byte size of the documents (I use min((int) (ini_get('post_max_size')), (int) (ini_get('upload_max_filesize'))) * 1.0e+6)

Before a document is added to the bulk, I check whether one of the limits is reached; in that case, I send the bulk and then create a new one.
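
As a rough sketch of the two-limit batching described above, the following splits documents into bulks by count and by byte size. The function name `splitIntoBulks`, the input shape (document id mapped to its size in bytes) and the exact threshold checks are assumptions made for illustration; the actual script is not shown in this thread.

```php
<?php
// Hypothetical batching helper: $docSizes maps a document id to its size
// in bytes; $maxDocs and $maxBytes mirror the two metrics described above.
function splitIntoBulks(array $docSizes, int $maxDocs, int $maxBytes): array
{
    $bulks = [];
    $current = [];
    $currentBytes = 0;

    foreach ($docSizes as $id => $size) {
        // Flush before adding if either limit would be exceeded. A bulk
        // always holds at least one document, even an oversized one.
        $full = count($current) >= $maxDocs || $currentBytes + $size > $maxBytes;
        if ($current !== [] && $full) {
            $bulks[] = $current;
            $current = [];
            $currentBytes = 0;
        }
        $current[] = $id;
        $currentBytes += $size;
    }

    // Send whatever is left in the last, partially filled bulk.
    if ($current !== []) {
        $bulks[] = $current;
    }

    return $bulks;
}

$bulks = splitIntoBulks(['a' => 400, 'b' => 500, 'c' => 300, 'd' => 1500], 100, 1000);
print_r($bulks);
```

With these made-up sizes, `a` and `b` share a bulk, `c` starts a new one because adding it would pass the byte limit, and the oversized `d` ends up alone in its own bulk.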

In my test case, the 6 documents sent are below 1MB:

  • the 6 files total around 600KB
  • since each binary is base64-encoded, that goes up to 825KB
  • plus some other text data such as the name in the DB, the ID in the DB, whether the file binaries are stored in the DB or on disk, the path if on disk...

> Those are not large docs; they are docs containing attachment files (PDF, images, Excel, etc.).

Agreed, these are not large documents, but they are still different from a plain JSON payload. If you don't use the attachments, do larger bulk requests go through?

Hi,

It took some time and a few fixes to my index request script, and I ran several tests and debugging sessions.
For each test I reduced my maximum upload byte size parameter; here is the output for 20MB:

Flush 1
Queue size : 1/100
Queue bytes size : 8586291/20000000
array(4) { ["took"]=> int(513) ["ingest_took"]=> int(8662) ["errors"]=> bool(false) ["items"]=> array(1) { [0]=> array(1) { ["index"]=> array(9) { ["_index"]=> string(97) "45a88cff4bbf0973e254c6e87c0a971a76d812187f3376aa04eb9d121756b031_eln_pageattach_2022-05-03-111557" ["_type"]=> string(4) "_doc" ["_id"]=> string(64) "59f855d347347c5fc730fed4bce741e255f07ec0d4d5f0d466659c0abc9f25c3" ["_version"]=> int(1) ["result"]=> string(7) "created" ["_shards"]=> array(3) { ["total"]=> int(1) ["successful"]=> int(1) ["failed"]=> int(0) } ["_seq_no"]=> int(2) ["_primary_term"]=> int(1) ["status"]=> int(201) } } } }
Flush 2
Queue size : 1/100
Queue bytes size : 15550231/20000000
array(1) { ["message"]=> string(28) "429 Too Many Requests /_bulk" }
Flush 3
Queue size : 10/100
Queue bytes size : 16879040/20000000
array(1) { ["message"]=> string(28) "429 Too Many Requests /_bulk" }

And if I reduce it to 15MB, all requests work because:

  • Flush 2 is never sent (bulk is the minimum size of 1)
  • Flush 3 is split into 2 bulks

But I need to recheck how my requests are built, because "Flush 2", which is 15.5MB, contains a file of 8.8MB. Base64-encoded, it should take around 11.7MB (a 33% increase). So even if I pack some more data with the file (really basic data like the file name, some IDs...), a 3-4MB difference looks huge, so I might be sending some unwanted data.
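
The base64 estimate above can be checked with a few lines of arithmetic. This sketch assumes the file is exactly 8,800,000 bytes (8.8MB with 1MB = 10^6 bytes), which is an approximation of the figure quoted; the function name `base64EncodedLength` is hypothetical.

```php
<?php
// Base64 emits 4 output characters for every 3 input bytes, with the last
// group padded, so the encoded length is 4 * ceil(n / 3).
function base64EncodedLength(int $rawBytes): int
{
    return 4 * intdiv($rawBytes + 2, 3);
}

$raw = 8800000; // assumed exact size of the "8.8MB" file
$encoded = base64EncodedLength($raw);
$queued = 15550231; // "Queue bytes size" reported for Flush 2

printf("encoded size: %d bytes\n", $encoded);
printf("unexplained:  %d bytes\n", $queued - $encoded);
```

Under that assumption the encoded file is 11,733,336 bytes, which leaves roughly 3.8MB of the reported 15,550,231 bytes unaccounted for, in line with the 3-4MB gap mentioned above.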

OR my method of calculating the bulk size is wrong...

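    getBytesSize($currentBulk->getActions()); // Note: see edit below

    /**
     * Rough recursive estimate of the bytes held by a value.
     * Floats, booleans and null are not counted.
     */
    function getBytesSize($arr): int
    {
        $tot = 0;

        if (is_array($arr)) {
            // Sum the estimates of all elements (array keys are ignored).
            foreach ($arr as $a) {
                $tot += getBytesSize($a);
            }
        }
        if (is_string($arr)) {
            $tot += strlen($arr);
        }
        if (is_int($arr)) {
            // In-memory size of an int, not the length of its JSON form.
            $tot += PHP_INT_SIZE;
        }
        if (is_object($arr)) {
            // Length of the PHP serialization, which differs from the JSON sent.
            $tot += strlen(serialize($arr));
        }

        return $tot;
    }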

EDIT: OK, calling getBytesSize((string) $currentBulk); instead seems to give a much better approximation of the payload size.
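
To illustrate why measuring the serialized request gives a closer figure, here is a minimal sketch that sizes a bulk body the way the `_bulk` endpoint receives it: one JSON line of action metadata and one JSON line of source per operation, each ending with a newline. The helper name `ndjsonBulkSize` and the sample document are hypothetical, and the JSON encoding options Elastica uses internally may differ from plain `json_encode`.

```php
<?php
// Hypothetical helper: $operations is a list of [actionMeta, source] pairs.
function ndjsonBulkSize(array $operations): int
{
    $bytes = 0;
    foreach ($operations as [$meta, $source]) {
        // One metadata line and one source line, each terminated by "\n".
        $bytes += strlen(json_encode($meta)) + 1;
        $bytes += strlen(json_encode($source)) + 1;
    }

    return $bytes;
}

// Made-up document with a tiny base64 "attachment".
$ops = [
    [['index' => ['_id' => '1']], ['name' => 'a.pdf', 'data' => base64_encode('hello')]],
];
echo ndjsonBulkSize($ops), PHP_EOL;
```

Unlike the recursive walk, this counts only what ends up on the wire, so field names, quotes and braces are included while PHP-specific serialization overhead is not.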