k8s-ceph

Setup

Install k8s with flannel network

Install ceph

Ceph setting

LVM

NTP

Install

sudo apt-get update
sudo apt-get install ntpdate ntp

Configuration
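A minimal configuration sketch, assuming the stock /etc/ntp.conf on Ubuntu 16.04; replace the pool servers with your local NTP source if you have one:

# /etc/ntp.conf (excerpt): servers that ntpd will poll
server 0.ubuntu.pool.ntp.org iburst
server 1.ubuntu.pool.ntp.org iburst

# restart the daemon so the new servers take effect
sudo systemctl restart ntp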

Extra commands

ntpdate -u <host> #The ntpdate command can be used to set the local date and time by polling the NTP server. Typically, you’ll have to do this only one time.
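To check that the daemon is actually synchronizing after configuration, list its peers (the exact columns and offsets will vary):

ntpq -p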

k8s + ceph

RBD storage

Working links

  • Bugzilla: before creating the Ceph image, run this command on the ceph-mon node
rbd create --image ceph-image --size 2G --image-feature layering

Installing the ceph-common

The ceph-common package must be installed on all schedulable OpenShift Container Platform nodes:

yum install -y ceph-common

Creating the Ceph Secret

The ceph auth get-key command is run on a Ceph MON node to display the key value for the client.admin user:

apiVersion: v1
kind: Secret
metadata:
  name: ceph-secret
data:
  key: QVFBOFF2SlZheUJQRVJBQWgvS2cwT1laQUhPQno3akZwekxxdGc9PQ== #1

1 This base64 key is generated on one of the Ceph MON nodes using the ceph auth get-key client.admin | base64 command; the output is then copied and pasted as the secret key's value.
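For example, run on a MON node and copy the printed string into the key field of the secret above:

# run on a Ceph MON node
ceph auth get-key client.admin | base64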

Save the secret definition to a file, for example ceph-secret.yaml, then create the secret:

$ oc create -f ceph-secret.yaml
secret "ceph-secret" created

Verify that the secret was created:

# oc get secret ceph-secret
NAME          TYPE      DATA      AGE
ceph-secret   Opaque    1         23d

Creating the Persistent Volume

Next, before creating the PV object in OpenShift Container Platform, define the persistent volume file:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: ceph-pv     #1
spec:
  capacity:
    storage: 2Gi    #2
  accessModes:
    - ReadWriteOnce #3
  rbd:              #4
    monitors:       #5
      - 192.168.122.133:6789
    pool: rbd
    image: ceph-image
    user: admin
    secretRef:
      name: ceph-secret #6
    fsType: ext4        #7
    readOnly: false
  persistentVolumeReclaimPolicy: Recycle

1 The name of the PV, which is referenced in pod definitions or displayed in various oc volume commands.

2 The amount of storage allocated to this volume.

3 accessModes are used as labels to match a PV and a PVC. They currently do not define any form of access control. All block storage is defined to be single user (non-shared storage).

4 This defines the volume type being used. In this case, the rbd plug-in is defined.

5 This is an array of Ceph monitor IP addresses and ports.

6 This is the Ceph secret, defined above. It is used to create a secure connection from OpenShift Container Platform to the Ceph server.

7 This is the file system type mounted on the Ceph RBD block device.

Save the PV definition to a file, for example ceph-pv.yaml, and create the persistent volume:

# oc create -f ceph-pv.yaml
persistentvolume "ceph-pv" created

Verify that the persistent volume was created:

# oc get pv
NAME                     LABELS    CAPACITY     ACCESSMODES   STATUS      CLAIM     REASON    AGE
ceph-pv                  <none>    2147483648   RWO           Available                       2s

Creating the Persistent Volume Claim

A persistent volume claim (PVC) specifies the desired access mode and storage capacity. Currently, based on only these two attributes, a PVC is bound to a single PV. Once a PV is bound to a PVC, that PV is essentially tied to the PVC's project and cannot be bound to by another PVC. There is a one-to-one mapping of PVs and PVCs. However, multiple pods in the same project can use the same PVC.

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: ceph-claim
spec:
  accessModes:     #1
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi #2

1 As mentioned above for PVs, the accessModes do not enforce access rights, but rather act as labels to match a PV to a PVC.

2 This claim will look for PVs offering 2Gi or greater capacity.

Save the PVC definition to a file, for example ceph-claim.yaml, and create the PVC:

# oc create -f ceph-claim.yaml
persistentvolumeclaim "ceph-claim" created

#and verify the PVC was created and bound to the expected PV:
# oc get pvc
NAME         LABELS    STATUS    VOLUME    CAPACITY   ACCESSMODES   AGE
ceph-claim   <none>    Bound     ceph-pv   2Gi        RWO           21s

Creating the Pod

A pod definition file or a template file can be used to define a pod. Below is a pod specification that creates a single container and mounts the Ceph RBD volume for read-write access:

apiVersion: v1
kind: Pod
metadata:
  name: ceph-pod1           #1
spec:
  containers:
  - name: ceph-busybox
    image: busybox          #2
    command: ["sleep", "60000"]
    volumeMounts:
    - name: ceph-vol1       #3
      mountPath: /usr/share/busybox #4
      readOnly: false
  volumes:
  - name: ceph-vol1         #5
    persistentVolumeClaim:
      claimName: ceph-claim #6

1 The name of this pod as displayed by oc get pod.

2 The image run by this pod. In this case, we are telling busybox to sleep.

3, 5 The name of the volume. It must be the same in both the containers and volumes sections.

4 The mount path as seen in the container.

6 The PVC that is bound to the Ceph RBD cluster.

Save the pod definition to a file, for example ceph-pod1.yaml, and create the pod:

# oc create -f ceph-pod1.yaml
pod "ceph-pod1" created

#verify pod was created
# oc get pod
NAME        READY     STATUS    RESTARTS   AGE
ceph-pod1   1/1       Running   0          2m
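Optionally, as a quick sanity check (the exact df output will differ), confirm from inside the container that the RBD volume is mounted at the expected path:

# oc exec ceph-pod1 -- df -h /usr/share/busybox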

CephFS

k8s official example

apiVersion: v1
kind: Pod
metadata:
  name: cephfs2
spec:
  containers:
  - name: cephfs-rw
    image: kubernetes/pause
    volumeMounts:
    - mountPath: "/mnt/cephfs" #1
      name: cephfs
  volumes:
  - name: cephfs
    cephfs:
      monitors: #2
      - 10.16.154.78:6789
      - 10.16.154.82:6789
      - 10.16.154.83:6789
      user: admin #3
      secretRef: #4
        name: ceph-secret
      readOnly: true #5
      path: "/" #6

1 Path inside container

2 Array of Ceph monitors.

3 The RADOS user name. If not provided, default admin is used.

4 Reference to Ceph authentication secrets. If provided, secret overrides secretFile.

5 Whether the filesystem is mounted read-only.

6 Used as the mounted root, rather than the full Ceph tree. If not provided, default / is used.

Optional parameter secretFile: the path to the keyring file. If not provided, the default /etc/ceph/user.secret is used.
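A minimal sketch of the same volume definition using secretFile instead of secretRef, assuming the admin keyring has been copied to /etc/ceph/admin.secret on every node that may run the pod:

  volumes:
  - name: cephfs
    cephfs:
      monitors:
      - 10.16.154.78:6789
      user: admin
      secretFile: "/etc/ceph/admin.secret" # path on the node's filesystem, read by the kubelet
      readOnly: true
      path: "/"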

Now you can see the pod named cephfs2:

kubectl get pods

CephFS + StatefulSet(k8s API)

StatefulSet k8s example

Prerequisite

You must create a StorageClass from the kubernetes/examples/staging/volumes/cephfs example. Use the RBAC deployment for k8s >= 1.9. Ensure that all components are in the same namespace.
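A rough sketch of what such a StorageClass may look like when the cephfs external provisioner from that example is deployed; the provisioner name, parameter keys, monitor address and secret name below are assumptions and must match the deployment you actually applied:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: cephfs
provisioner: ceph.com/cephfs       # name registered by the cephfs provisioner pod
parameters:
  monitors: 192.168.122.133:6789   # your Ceph MON address(es)
  adminId: admin
  adminSecretName: ceph-secret-admin
  adminSecretNamespace: cephfs     # the namespace used throughout this example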

Constraints
  • Critical
    • Ensure that all components are in the same namespace
StatefulSet Example

For example, we use the namespace cephfs:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
  namespace: cephfs #1
spec:
  selector:
    matchLabels:
      app: mysql
  serviceName: mysql
  replicas: 2 #2
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
      - name: mysql
        image: busybox
        command: ["sleep", "60000"]
        volumeMounts:
        - name: data
          mountPath: /usr/share/busybox
  volumeClaimTemplates: #3
  - metadata:
      name: data
    spec:
      storageClassName: cephfs #4
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 1Gi

1 Use the same namespace that you used in the Prerequisite section.

2 Two replicas of the service will be created.

3 This section enables dynamic provisioning of Ceph volumes for the containers.

4 The StorageClass created in the Prerequisite section.
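After the StatefulSet is created, each replica should get its own dynamically provisioned claim; the names follow the <claim-template>-<statefulset>-<ordinal> pattern:

kubectl get pvc -n cephfs
# expect claims such as data-mysql-0 and data-mysql-1 in Bound state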


Ceph 12.2.x

Constraints

  • Critical
    • A Ceph user can't be named ceph

Allow applications to use a pool

After creating a pool, you must enable the application that will access it. Applications:

  • cephfs
  • rbd
  • rgw
ceph osd pool application enable <pool> <app>
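For example, to create a pool for RBD images and tag it (the pool name kube and PG count are only illustrative):

ceph osd pool create kube 64
ceph osd pool application enable kube rbd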

Allow dashboard

# enable the dashboard module
ceph mgr module enable dashboard

# IP and port
ceph config-key set mgr/dashboard/server_addr $IP
ceph config-key set mgr/dashboard/server_port $PORT

# reverse proxies (URL prefix)
ceph config-key set mgr/dashboard/url_prefix $PREFIX
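After enabling the module, you can check which address the active mgr serves the dashboard on (the output below is only an approximation of its shape):

ceph mgr services
# {"dashboard": "http://<mgr-host>:7000/"}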

Deploy Ceph OSDs on a disk partition with ceph-volume (for 12.2.x+)

# install lvm packages
apt-get install lvm2

# copy ceph's client.bootstrap-osd key from file ceph.bootstrap-osd.keyring to /var/lib/ceph/bootstrap-osd/ceph.keyring
# For example:
pwd
#output:
#    /home/zagrebaev/my-cluster
cp ceph.bootstrap-osd.keyring /var/lib/ceph/bootstrap-osd/ceph.keyring

For example, we have an unused /dev/sdd device. Let's create an OSD on /dev/sdd2. First, we need to partition the device:

Prepare. Official doc

# create GPT partition table on /dev/sdd device
parted --script /dev/sdd mklabel gpt

# create /dev/sdd1
parted --script /dev/sdd mkpart primary 1 20%

# create /dev/sdd2
parted --script /dev/sdd mkpart primary 20% 100%

#prepare osd
ceph-volume lvm prepare --bluestore --data /dev/sdd2

Second, we need to activate the OSD. For example, suppose the above command created osd.2:

Activate. Official doc

# get the OSD UUID from the fsid file
cat /var/lib/ceph/osd/ceph-2/fsid
# Example output:
    7e6a5fdd-e0d4-4b4b-b21b-0b72d41177c1

# Activate osd.2. Command: ceph-volume lvm activate --bluestore $OSD_ID $OSD_UUID
ceph-volume lvm activate --bluestore 2 7e6a5fdd-e0d4-4b4b-b21b-0b72d41177c1

We have created the OSD.
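To confirm the new OSD joined the cluster:

ceph osd tree   # the new osd.2 should appear and be "up"
ceph -s         # overall cluster health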


Monitoring cluster

Deploy monitoring server

git clone https://github.com/kubernetes-incubator/metrics-server.git
cd metrics-server
kubectl create -f deploy/

Deploy Heapster


git clone https://github.com/DmitryZagr/heapster.git
cd heapster
git checkout cephfs_storage
kubectl create -f deploy/kube-config/rbac/
kubectl create -f deploy/kube-config/influxdb/

Extra Info

Ubuntu 16.04

Update the kernel to the latest version:

sudo apt install --install-recommends linux-image-generic-hwe-16.04

Testing env

Single node

OS               | Ceph    | k8s   | k8s pod network | Kernel version
Ubuntu 16.04 LTS | 10.2.10 | 1.9.0 | Calico          |
Ubuntu 16.04 LTS | 12.2.2  | 1.9.1 | Calico          | 4.13.0-32-generic
Ubuntu 16.04 LTS | 12.2.2  | 1.9.2 | Calico          | 4.13.0-32-generic
Ubuntu 16.04 LTS | 12.2.2  | 1.9.3 | Calico          | 4.13.0-32-generic

Multi node

OS               | Ceph   | k8s   | k8s pod network | Kernel version
Ubuntu 16.04 LTS | 12.2.2 | 1.9.2 | Flannel v0.10.0 | 4.13.0-32-generic
Ubuntu 16.04 LTS | 12.2.2 | 1.9.3 | Flannel v0.10.0 | 4.13.0-32-generic