ceph/ceph-ansible

setting up Ceph MDS server on an existing server

favourjuwe opened this issue · 4 comments

hello,

I am having a little issue setting up a Ceph MDS server on an existing server.
Last year, I set up a three-node Ceph cluster with ceph-ansible (https://github.com/ceph/ceph-ansible).
This is my Ansible hosts file configuration:

[all]
node1 ansible_host=10.xxx.1.xxx
node2 ansible_host=10.xxx.1.xxx
node3 ansible_host=10.xxx.1.xxx
[mons]
node1
node2
node3
[osds]
node1
node2
node3
[mgrs]
node1
node2
node3
[grafana-server]
node1
[rgws]

I need help setting up the MDS server.
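
My assumption (not yet verified) is that ceph-ansible deploys MDS daemons to hosts listed in an [mdss] group, so I would add the following to the inventory and then re-run the playbook limited to that group. The group name, playbook name, and --limit usage are my guesses for my ceph-ansible version:

[mdss]
node1
node2
node3

ansible-playbook -i hosts site.yml --limit mdss

Is that the supported way to add MDS daemons to an existing cluster?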

guits commented

Hi @favourjuwe,

I'm not sure how I could help you when I don't even know what you are looking for.

I am having a little issue

ok... what issue are you hitting? This definitely lacks details, are you seeing errors...?

Please, provide what is required when opening a new issue: inventory, group_vars, FULL ceph-ansible logs, ceph release, ceph-ansible version (sha1?), OS version.

The more details you share, the more effective the help will be. (We should both save a lot of time too...)

Hi @guits, thanks so much.

I am trying to set up an active MDS daemon on a Ceph cluster with three nodes.

From my dashboard, there is no active MDS daemon

The present issue is on Node3

These are the errors I am getting:

MDS_ALL_DOWN: 2 filesystems are offline

MANY_OBJECTS_PER_PG: 1 pools have many more objects per pg than average

MDS_UP_LESS_THAN_MAX: 2 filesystems are online with fewer MDS than max_mds

RECENT_CRASH: 2 daemons have recently crashed

[root@node3 mds]# ceph-mds --hot-standby 3
2022-05-04T02:30:58.508-0700 7f3ed0088600 -1 auth: error reading file: /var/lib/ceph/mds/ceph-admin/keyring: bufferlist::read_file(/var/lib/ceph/mds/ceph-admin/keyring): read error:(21) Is a directory
2022-05-04T02:30:58.508-0700 7f3ed0088600 -1 auth: failed to load /var/lib/ceph/mds/ceph-admin/keyring: (21) Is a directory
2022-05-04T02:30:58.508-0700 7f3ed0088600 -1 auth: error reading file: /var/lib/ceph/mds/ceph-admin/keyring: bufferlist::read_file(/var/lib/ceph/mds/ceph-admin/keyring): read error:(21) Is a directory
2022-05-04T02:30:58.508-0700 7f3ed0088600 -1 auth: failed to load /var/lib/ceph/mds/ceph-admin/keyring: (21) Is a directory
2022-05-04T02:30:58.508-0700 7f3ed0088600 -1 auth: error reading file: /var/lib/ceph/mds/ceph-admin/keyring: bufferlist::read_file(/var/lib/ceph/mds/ceph-admin/keyring): read error:(21) Is a directory
2022-05-04T02:30:58.508-0700 7f3ed0088600 -1 auth: failed to load /var/lib/ceph/mds/ceph-admin/keyring: (21) Is a directory
2022-05-04T02:30:58.508-0700 7f3ed0088600 -1 monclient: keyring not found
failed to fetch mon config (--no-mon-config to skip)
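
My reading of that error (an assumption on my part): because no explicit daemon id was passed, ceph-mds resolves its id to "admin" and looks for a keyring under /var/lib/ceph/mds/ceph-admin/, and that path exists as a directory rather than a keyring file. As far as I know, the usual manual layout is one directory per daemon id containing a keyring file, roughly like this (assuming the MDS on node3 should run with the id "node3"):

# assumption: daemon id "node3"; adjust to whatever id ceph-ansible expects
mkdir -p /var/lib/ceph/mds/ceph-node3
ceph auth get-or-create mds.node3 mon 'profile mds' mgr 'profile mds' mds 'allow *' osd 'allow *' -o /var/lib/ceph/mds/ceph-node3/keyring
chown -R ceph:ceph /var/lib/ceph/mds/ceph-node3
ceph-mds -i node3

I have not run these exact steps yet; I would rather have ceph-ansible lay this out correctly.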

What I had set up:

ceph auth get-or-create mds.${3} mon 'profile mds' mgr 'profile mds' mds 'allow *' osd 'allow *' > /var/lib/ceph/mds/ceph-${3}/keyring

create filesystem pools

ceph osd pool create cephfs0_data
ceph osd pool create cephfs0_metadata

create filesystem with explicit pools

ceph fs new cephfs0 cephfs0_metadata cephfs0_da
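
After that, I expected to be able to verify the filesystem and MDS state with the standard CLI, something along these lines (cephfs0 is the filesystem created above; these are just checks I am running, not part of the setup itself):

ceph fs ls
ceph fs status cephfs0
ceph mds stat
ceph health detail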

My environment:

[root@node3 mnt]# ceph -s
  cluster:
    id:     1a15bbb4-b62c-441b-889b-4e5a230911d1
    health: HEALTH_ERR
            1 pools have many more objects per pg than average
            2 filesystems are offline
            2 filesystems are online with fewer MDS than max_mds
            2 daemons have recently crashed

  services:
    mon: 3 daemons, quorum node1,node2,node3 (age 2h)
    mgr: node3(active, since 2h), standbys: node2, node1
    mds: cephfs:1 cephfs-3:0
    osd: 108 osds: 108 up (since 2h), 108 in (since 8M)
    rgw: 1 daemon active (node1.rgw0)

  task status:

  data:
    pools:   28 pools, 817 pgs
    objects: 102.46k objects, 36 GiB
    usage:   237 GiB used, 982 TiB / 982 TiB avail
    pgs:     817 active+clean

  io:
    client: 29 KiB/s rd, 11 KiB/s wr, 14 op/s rd, 2 op/s wr
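
Since ceph -s also reports recently crashed daemons, my plan is to pull the crash reports with the crash module before going further (the crash id below is a placeholder, to be taken from the list):

ceph crash ls
ceph crash info <crash-id>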

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

This issue has been automatically closed due to inactivity. Please re-open if this still requires investigation.