manticoresoftware/manticoresearch-helm

Worker's service should be headless for DNS resolving to work.

Closed this issue · 13 comments

The replica.php script relies on the per-pod subdomains of the StatefulSet being resolvable:

$sql = "JOIN CLUSTER $clusterName at '".$first.".".$workerService.":9312'";

This works only if the StatefulSet's governing service is headless. From the "Limitations" section of the Kubernetes StatefulSet documentation:

"StatefulSets currently require a Headless Service to be responsible for the network identity of the Pods. You are responsible for creating this Service."

There should be `clusterIP: None` in the worker service manifest:
https://github.com/manticoresoftware/manticoresearch-helm/blob/master/chart/templates/service-worker.yaml

With the current worker service manifest, the worker pod names can't be resolved and the cluster join doesn't work.
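
As a rough illustration of the requirement above, the following Python sketch checks whether a parsed Service manifest is headless. The `is_headless` helper and the sample manifest are hypothetical and are not taken from the chart; only the service name and port 9312 come from this issue.

```python
def is_headless(service: dict) -> bool:
    """Return True if a parsed Service manifest declares `clusterIP: None`."""
    spec = service.get("spec") or {}
    # Kubernetes marks a headless Service with the literal string "None".
    return spec.get("clusterIP") == "None"


# Hypothetical manifest, shaped like a worker Service (not copied from the chart).
worker_service = {
    "apiVersion": "v1",
    "kind": "Service",
    "metadata": {"name": "manticore-manticoresearch-worker-svc"},
    "spec": {
        "clusterIP": "None",
        "ports": [{"name": "binary", "port": 9312}],  # port name is an assumption
    },
}

if __name__ == "__main__":
    print(is_headless(worker_service))
```

A manifest without `clusterIP`, or with a regular cluster IP, makes the helper return False, which corresponds to the non-headless case described in this issue.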
   
 
 

➤ Klim Todrik commented:

Strange, but I don't see any errors:

Replica hook: Pods count:2 
Replica hook: Join cluster 
Replica hook: Sql query: JOIN CLUSTER manticore at 'manticore-manticoresearch-worker-0.manticore-manticoresearch-worker-svc:9312' 
Replica hook: Join success 

On worker 1 I created a table:

mysql> create table pq type='percolate'; 
mysql> show tables; 
+-------+-----------+ 
| Index | Type      | 
+-------+-----------+ 
| pq    | percolate | 
+-------+-----------+ 
mysql> ALTER CLUSTER manticore ADD pq; 

On workers 0 and 2:

root@manticore-manticoresearch-worker-0:/etc/manticoresearch# mysql -e "show tables;" 
+-------+-----------+ 
| Index | Type      | 
+-------+-----------+ 
| pq    | percolate | 
+-------+-----------+ 

I get the following log:

manticoresearch-worker Replica hook: Pods count:2
manticoresearch-worker Replica hook: Join cluster
manticoresearch-worker /* Mon Jan 24 08:41:30.078 2022 conn 1 */ JOIN CLUSTER manticore at 'manticore-manticoresearch-worker-0.manticore-manticoresearch-worker-svc:9312' # error=cluster 'manticore', no nodes available(manticore-manticoresearch-worker-0.manticore-manticoresearch-worker-svc:9312), error 'manticore-manticoresearch-worker-0.manticore-manticoresearch-worker-svc:9312 invalid node, no AF_INET address found for: manticore-manticoresearch-worker-0.manticore-manticoresearch-worker-svc'
manticoresearch-worker Replica hook: Sql query: JOIN CLUSTER manticore at 'manticore-manticoresearch-worker-0.manticore-manticoresearch-worker-svc:9312'
manticoresearch-worker Replica hook: QL error: cluster 'manticore', no nodes available(manticore-manticoresearch-worker-0.manticore-manticoresearch-worker-svc:9312), error 'manticore-manticoresearch-worker-0.manticore-manticoresearch-worker-svc:9312 invalid node, no AF_INET address found for: manticore-manticoresearch-worker-0.manticore-manticoresearch-worker-svc'
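
The "no AF_INET address found" error means the node name did not resolve to any IPv4 address. A minimal Python sketch for checking this by hand is shown below; it assumes Python is available in some pod in the same namespace, and the helper names `join_target` and `resolve_ipv4` are hypothetical, not part of replica.php.

```python
import socket
from typing import List


def join_target(pod: str, service: str, port: int = 9312) -> str:
    """Build the node address in the same pod.service:port form used by JOIN CLUSTER."""
    return f"{pod}.{service}:{port}"


def resolve_ipv4(host: str, port: int = 9312) -> List[str]:
    """Return the IPv4 addresses `host` resolves to, or [] if it has none."""
    try:
        infos = socket.getaddrinfo(
            host, port, family=socket.AF_INET, type=socket.SOCK_STREAM
        )
    except socket.gaierror:
        # Resolution failed: there is no usable IPv4 address for this name.
        return []
    # Each entry is (family, type, proto, canonname, (address, port)).
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    pod = "manticore-manticoresearch-worker-0"
    service = "manticore-manticoresearch-worker-svc"
    print(join_target(pod, service))
    # An empty list here corresponds to the failure seen in the log above.
    print(resolve_ipv4(f"{pod}.{service}"))
```

The short `pod.service` name relies on the DNS search domains of the pod's own namespace, so the check is only meaningful when run inside the namespace where the workers are deployed.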

➤ Klim Todrik commented:

If you add `clusterIP: None`, does everything work fine?

Yes, it works fine if I add `clusterIP: None` to the worker's service spec.

➤ Klim Todrik commented:

What is your Kubernetes version?

1.20.12

➤ Klim Todrik commented:

I've tried to reproduce your issue via Kind:

$ kubectl version 
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.2", GitCommit:"9d142434e3af351a628bffee3939e64c681afa4d", GitTreeState:"clean", BuildDate:"2022-01-21T02:20:54Z", GoVersion:"go1.17.6", Compiler:"gc", Platform:"linux/amd64"} 
Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.2", GitCommit:"faecb196815e248d3ecfb03c680a4507229c2a56", GitTreeState:"clean", BuildDate:"2021-03-11T06:23:38Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"} 
WARNING: version difference between client (1.23) and server (1.20) exceeds the supported minor version skew of +/-1 
$ helm install manticore -n manticore --create-namespace . 
NAME: manticore 
LAST DEPLOYED: Wed Jan 26 17:56:19 2022 
NAMESPACE: manticore 
STATUS: deployed 
REVISION: 1 
TEST SUITE: None 
NOTES: 
.... 
$ k get po -n manticore 
NAME                                                  READY   STATUS              RESTARTS   AGE 
manticore-manticoresearch-balancer-6c6b76c8ff-zf4x9   0/1     ContainerCreating   0          12s 
manticore-manticoresearch-worker-0                    0/1     ContainerCreating   0          12s 
 
$ kubectl scale statefulsets manticore-manticoresearch-worker -n manticore --replicas=3 
statefulset.apps/manticore-manticoresearch-worker scaled 
 
$ k get po -n manticore 
NAME                                                  READY   STATUS              RESTARTS   AGE 
manticore-manticoresearch-balancer-6c6b76c8ff-zf4x9   1/1     Running             0          55s 
manticore-manticoresearch-worker-0                    1/1     Running             0          55s 
manticore-manticoresearch-worker-1                    1/1     Running             0          13s 
manticore-manticoresearch-worker-2                    0/1     ContainerCreating   0          2s 
 
 
$ kubectl -n manticore logs manticore-manticoresearch-worker-0 
Mount success 
[Wed Jan 26 14:56:57.884 2022] [1] using config file '/etc/manticoresearch/manticore.conf' (435 chars)... 
[Wed Jan 26 14:56:57.886 2022] [1] starting daemon version '4.2.0 15e927b28@211223 release' ... 
starting daemon version '4.2.0 15e927b28@211223 release' ... 
[Wed Jan 26 14:56:57.886 2022] [1] listening on UNIX socket /var/run/mysqld/mysqld.sock 
listening on UNIX socket /var/run/mysqld/mysqld.sock 
[Wed Jan 26 14:56:57.886 2022] [1] listening on all interfaces for mysql, port=9306 
listening on all interfaces for mysql, port=9306 
[Wed Jan 26 14:56:57.887 2022] [1] listening on all interfaces for sphinx and http(s), port=9308 
listening on all interfaces for sphinx and http(s), port=9308 
[Wed Jan 26 14:56:57.887 2022] [1] listening on all interfaces for VIP mysql, port=9301 
listening on all interfaces for VIP mysql, port=9301 
[Wed Jan 26 14:56:57.887 2022] [1] listening on 10.244.0.7:9312 for sphinx and http(s) 
listening on 10.244.0.7:9312 for sphinx and http(s) 
PHP Warning:  mysqli::__construct(): (HY000/2002): Connection refused in /etc/manticoresearch/replica.php on line 18 
 
 
Wait for searchd came alive 
[Wed Jan 26 14:56:57.902 2022] [22] prereading 0 indexes 
prereading 0 indexes 
[Wed Jan 26 14:56:57.902 2022] [22] prereaded 0 indexes in 0.000 sec 
prereaded 0 indexes in 0.000 sec 
[Wed Jan 26 14:56:57.902 2022] [1] accepting connections 
accepting connections 
Replica hook: Pods count:1 
Replica hook: Create new cluster 
Replica hook: Sql query: CREATE CLUSTER manticore 
 
 
 
$ kubectl -n manticore logs manticore-manticoresearch-worker-1 
Mount success 
[Wed Jan 26 14:57:06.387 2022] [1] using config file '/etc/manticoresearch/manticore.conf' (435 chars)... 
[Wed Jan 26 14:57:06.389 2022] [1] starting daemon version '4.2.0 15e927b28@211223 release' ... 
starting daemon version '4.2.0 15e927b28@211223 release' ... 
[Wed Jan 26 14:57:06.389 2022] [1] listening on UNIX socket /var/run/mysqld/mysqld.sock 
listening on UNIX socket /var/run/mysqld/mysqld.sock 
[Wed Jan 26 14:57:06.390 2022] [1] listening on all interfaces for mysql, port=9306 
listening on all interfaces for mysql, port=9306 
[Wed Jan 26 14:57:06.390 2022] [1] listening on all interfaces for sphinx and http(s), port=9308 
listening on all interfaces for sphinx and http(s), port=9308 
[Wed Jan 26 14:57:06.390 2022] [1] listening on all interfaces for VIP mysql, port=9301 
listening on all interfaces for VIP mysql, port=9301 
[Wed Jan 26 14:57:06.390 2022] [1] listening on 10.244.0.9:9312 for sphinx and http(s) 
listening on 10.244.0.9:9312 for sphinx and http(s) 
PHP Warning:  mysqli::__construct(): (HY000/2002): Connection refused in /etc/manticoresearch/replica.php on line 18 
 
 
Wait for searchd came alive 
[Wed Jan 26 14:57:06.403 2022] [22] prereading 0 indexes 
prereading 0 indexes 
[Wed Jan 26 14:57:06.403 2022] [22] prereaded 0 indexes in 0.000 sec 
prereaded 0 indexes in 0.000 sec 
[Wed Jan 26 14:57:06.403 2022] [1] accepting connections 
accepting connections 
Replica hook: Pods count:2 
Replica hook: Join cluster 
Replica hook: Sql query: JOIN CLUSTER manticore at 'manticore-manticoresearch-worker-0.manticore-manticoresearch-worker-svc:9312' 
Replica hook: Join success  
 
 
$ kubectl -n manticore logs manticore-manticoresearch-worker-2 
Mount success 
[Wed Jan 26 14:57:15.848 2022] [1] using config file '/etc/manticoresearch/manticore.conf' (437 chars)... 
[Wed Jan 26 14:57:15.851 2022] [1] starting daemon version '4.2.0 15e927b28@211223 release' ... 
starting daemon version '4.2.0 15e927b28@211223 release' ... 
[Wed Jan 26 14:57:15.851 2022] [1] listening on UNIX socket /var/run/mysqld/mysqld.sock 
listening on UNIX socket /var/run/mysqld/mysqld.sock 
[Wed Jan 26 14:57:15.851 2022] [1] listening on all interfaces for mysql, port=9306 
listening on all interfaces for mysql, port=9306 
[Wed Jan 26 14:57:15.851 2022] [1] listening on all interfaces for sphinx and http(s), port=9308 
listening on all interfaces for sphinx and http(s), port=9308 
[Wed Jan 26 14:57:15.851 2022] [1] listening on all interfaces for VIP mysql, port=9301 
listening on all interfaces for VIP mysql, port=9301 
[Wed Jan 26 14:57:15.851 2022] [1] listening on 10.244.0.11:9312 for sphinx and http(s) 
listening on 10.244.0.11:9312 for sphinx and http(s) 
PHP Warning:  mysqli::__construct(): (HY000/2002): Connection refused in /etc/manticoresearch/replica.php on line 18 
 
 
Wait for searchd came alive 
[Wed Jan 26 14:57:15.866 2022] [21] prereading 0 indexes 
prereading 0 indexes 
[Wed Jan 26 14:57:15.866 2022] [21] prereaded 0 indexes in 0.000 sec 
prereaded 0 indexes in 0.000 sec 
[Wed Jan 26 14:57:15.866 2022] [1] accepting connections 
accepting connections 
Replica hook: Pods count:3 
Replica hook: Join cluster 
Replica hook: Sql query: JOIN CLUSTER manticore at 'manticore-manticoresearch-worker-0.manticore-manticoresearch-worker-svc:9312' 
Replica hook: Join success  
 

So, as you can see, everything was deployed successfully. It looks like the problem is on your side and the chart deploys normally.

I don't understand how the subdomain manticore-manticoresearch-worker-0.manticore-manticoresearch-worker-svc resolves in your cluster without a headless service. Could you please explain?
Which DNS add-on does your k8s cluster use?

➤ Klim Todrik commented:

No add-ons. Moreover, this also works on plain clusters such as Minikube or Kind. You can check it yourself if you want.

Implemented in 4.2.0.1

I've additionally tested with Kind and with a cluster configured following "Kubernetes the Hard Way", and it seems that the root of the issue is in CoreDNS.

If a k8s cluster uses CoreDNS for DNS, it creates "A" records for the subdomains of the StatefulSet's service (`<pod-name>.<service-name>`) even if that StatefulSet's governing service is not headless, as in your helm chart.

But if DNS in the k8s cluster is managed by kube-dns, these records are registered only if the governing service is headless, which is consistent with the Kubernetes documentation.

The problem is that GCP-managed Kubernetes clusters use the kube-dns add-on :)

@vesvalo can you confirm the new version works fine for you?

Yes, I confirm that 4.2.0.1 works fine in a k8s cluster with kube-dns.
Thank you!