What is ceph

Ceph is a software-defined storage solution designed to address the object, block, and file storage needs of data centers adopting open source as the new norm for high-growth block storage, object stores, and data lakes.

Ceph aims primarily for entirely distributed operation without a single point of failure, scalability to the exabyte level, and free availability.

In this way, administrators have a single, consolidated system that avoids silos and collects the storage within a common management framework. As a result, Ceph consolidates several storage use cases and improves resource utilization. It also lets an organization deploy servers where needed.

Ceph is more challenging to install and maintain than some of its competitors, so if you are looking for a system that addresses a single problem, and you can “outsource” the redundancy requirements to hardware (or to several servers with switchover), ceph might not be your solution for a quick deployment.

Alternatives to ceph

As we can use ceph in three categories: filesystem, object storage, and block devices, we need to look for alternatives on each of these fronts:

Filesystem

Sharing a filesystem is the oldest of these challenges, and here we can find classic alternatives to ceph, such as the following (a client-side mount sketch follows the list):

  • NFS: Network File System (NFS) is a distributed file system protocol originally developed by Sun Microsystems (Sun) in 1984, allowing a user on a client computer to access files over a computer network much like local storage is accessed. Like many other protocols, NFS builds on the Open Network Computing Remote Procedure Call (ONC RPC) system. NFS is an open standard defined in a Request for Comments (RFC), allowing anyone to implement the protocol.

  • Samba (SMB): Samba is a free software re-implementation of the SMB networking protocol and was initially developed by Andrew Tridgell. Samba provides file and print services for various Microsoft Windows clients. It can integrate with a Microsoft Windows Server domain, either as a Domain Controller (DC) or as a domain member. In addition, as of version 4, it supports Active Directory and Microsoft Windows NT domains.
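
To give a feel for the client side, both NFS and SMB shares are consumed with a single mount command. This is only a sketch: the server name fileserver, the export path /export/data, the share name data, and the username are placeholders.

mount -t nfs fileserver:/export/data /mnt/nfs
mount -t cifs //fileserver/data /mnt/smb -o username=luis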

Object Storage

  • Minio: Object Storage for the era of the Hybrid Cloud. MinIO’s high-performance, Kubernetes-native object storage suite is built for the demands of the hybrid cloud. Software-defined, it delivers a consistent experience across every Kubernetes environment.

Block devices

  • LIO: In computing, Linux-IO (LIO) Target is an open-source implementation of the SCSI target that has become the standard one included in the Linux kernel. Internally, LIO does not initiate sessions but instead provides one or more Logical Unit Numbers (LUNs), waits for SCSI commands from a SCSI initiator, and performs required input/output data transfers. LIO supports standard storage fabrics, including FCoE, Fibre Channel, IEEE 1394, iSCSI, iSCSI Extensions for RDMA (iSER), SCSI RDMA Protocol (SRP), and USB. It is included in most Linux distributions; native support for LIO in QEMU/KVM, libvirt, and OpenStack makes LIO a storage option for cloud deployments.

Source of all descriptions: Official webpage and Wikipedia

Summary

I will install ceph and test the functionality for sharing a filesystem and the object store. I consider those two more important parts of ceph than the possibility of defining and sharing block devices.

The logic behind my tests is that most development nowadays moves toward microservices and containerization and away from virtualization techniques, so the possibility of having centralized storage for block devices becomes less relevant. In microservices and containers, I consider it more important to be able to mount volumes in our containers and to save and read data from object storage.

Installing ceph

As of the latest version (pacific), the recommended installation method is via the cephadm command. This tool deploys and manages a ceph cluster by connecting the manager daemon to hosts via SSH. The manager daemon can add, remove, and update ceph containers.

The requirements are as follows (a quick check is sketched after the list):

  • Python 3
  • Systemd
  • Podman or Docker for running containers
  • Time synchronization (such as chrony or NTP)
  • LVM2 for provisioning storage devices
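
Before running the installer, we can quickly check these requirements on the host. This is only a sketch: only one of podman or docker has to succeed, and chrony can be replaced by any other time-synchronization daemon.

python3 --version
systemctl --version
podman --version || docker --version
chronyc tracking
lvm version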

To install, we shall run the following commands as root. The installation will leave the executable in the /usr/sbin directory.

  • curl --remote-name --location https://github.com/ceph/ceph/raw/pacific/src/cephadm/cephadm
  • chmod +x cephadm
  • ./cephadm add-repo --release pacific
  • ./cephadm install

Now, to deploy the services and start a cluster, we need to bootstrap it:

cephadm bootstrap --mon-ip <YOUR LOCAL IP ADDRESS>

After the installation, you will see something similar to the following.

Ceph Dashboard is now available at:
 
             URL: https://BARE:8443/
            User: admin
        Password: <some random password>
 
You can access the Ceph CLI with:
 
        sudo /usr/sbin/cephadm shell --fsid 6945738e-c42a-11eb-b8ba-ff25ba285015 -c /etc/ceph/ceph.conf -k /etc/ceph/ceph.client.admin.keyring
 
Please consider enabling telemetry to help improve Ceph:
 
        ceph telemetry on
 
For more information, see:
 
        https://docs.ceph.com/docs/master/mgr/telemetry/ 

You can access the web interface from the machine where you installed ceph.

If you wish to access it from another machine, consider a port forward with SSH, for example:

ssh luis@bare.local -L 8443:localhost:8443

With the forwarding in place, we can open https://localhost:8443 from the other machine, log in, and change the random password generated by the bootstrap command.

Installing the ceph CLI

To install the ceph command, we run:

cephadm install ceph-common
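
Once installed, we can confirm the CLI version and that it can reach the cluster:

ceph -v
ceph status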

Adding a disk to the cluster

To add all the available unused devices to ceph, we have the command:

ceph orch apply osd --all-available-devices

We can add --dry-run to simulate the command rather than executing it.

Otherwise, we will need to specify the device to use:

ceph orch daemon add osd <hostname>:/dev/sd<letter>

It is important to note that this differs from the MinIO object store, where we specify a directory to use. In ceph, we need to select a raw storage device, without any filesystem or partitions.
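
To see which devices ceph considers available before adding them, we can ask the orchestrator and cross-check the block devices on the host. A small sketch:

ceph orch device ls
lsblk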

In my case, I have two external USB disks that I am going to use for storage. Because the disks contain some old data, I will (partially) zero the devices.

Zeroing existing hard drives

The steps to zero one of the external USB disks, located at /dev/sdd, are as follows (an alternative using wipefs is sketched after these steps):

EXTREME CAUTION: Double check your command. Otherwise, you could end up destroying important data or your own OS.

  1. dd if=/dev/zero of=/dev/sdd
    
  2. Let it run for a few seconds and then stop it with CTRL+C
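
Alternatively, instead of dd, the existing filesystem and partition signatures can be wiped explicitly. A sketch, with the same caution about double-checking the device name:

wipefs --all /dev/sdd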

Adding the disks to the cluster: example

After zeroing both devices, I will run:

ceph orch apply osd --all-available-devices --dry-run

You will need to execute the command twice: the first time to trigger the scan and the second time to see the results.

After a while (3 minutes in my case), the system detected both devices, and I executed:

ceph orch apply osd --all-available-devices

Running without redundancy

We will need to change some rules regarding the size of the pools. By default, ceph is fault-tolerant, which means that data gets replicated across several devices, or at least enough information is saved to rebuild the data.

In our case, we are only experimenting with ceph, and I don’t have the resources for a recommended installation. However, in production, you should remember that you need several machines (>3) with several hard drives (>3 per machine) to do a robust, fault-tolerant installation.

We will proceed to disable this fault tolerance and use ceph as a single device. We need to modify the following values in the cluster configuration (via the web GUI, using the Advanced level); the equivalent CLI calls are sketched after the list:

  • osd_pool_default_size to 1
  • osd_pool_default_min_size to 1
  • mon_allow_pool_size_one to True
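
The same values can be set from the command line instead of the web GUI. A sketch of the equivalent calls:

ceph config set global osd_pool_default_size 1
ceph config set global osd_pool_default_min_size 1
ceph config set global mon_allow_pool_size_one true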

Nonetheless, the device_health_metrics pool was already created with a size of 3. This pool is created automatically by ceph to store the health data of the whole cluster. We can remove its redundancy by running:

ceph osd pool set device_health_metrics size 1
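
To confirm the change, we can read the pool size back:

ceph osd pool get device_health_metrics size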

Creating a filesystem

Now I will create a new filesystem in Ceph, named Bigfoot, and limit it to 1 replica.

ceph fs volume create Bigfoot
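
We can check that the filesystem and its pools were created with the expected replica count. A sketch; in recent releases the pools are typically named cephfs.Bigfoot.meta and cephfs.Bigfoot.data:

ceph fs ls
ceph osd pool ls detail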

Testing the filesystem

First, we need to install ceph-fuse on the machine where we want to perform the test:

apt-get install ceph-fuse ceph-common

Adapt the installation command if you are not on a Debian-based distribution; a rough equivalent for RPM-based systems is sketched below.
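
For example, on an RPM-based distribution the rough equivalent would be:

dnf install ceph-fuse ceph-common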

Now, we can mount:

ceph-fuse --id admin -m 192.168.178.30:6789 ./test/
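
Once mounted, we can confirm that the mount point is served by ceph-fuse:

df -h ./test/

When the tests are done, the share can be unmounted again with fusermount -u ./test/.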

I created several directories and files in the test mount point. Checking the Dashboard, you can see that the user is connected to the cluster, and you can also see the directories created in the root of Bigfoot.

Creating an object storage

Now, let’s set up our object storage. First, we need to deploy the RADOS Gateway (RGW) service, which will also create the pools it needs:

ceph orch apply rgw Bigfoot '--placement=1 BARE' --port=8000
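
Before wiring the gateway into the Dashboard, we can check that the service is up and answering on the chosen port. A sketch, reusing the host BARE and port 8000 from the placement above:

ceph orch ls rgw
curl http://BARE:8000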

We also need to give the Dashboard access to the object storage. For that, we create a system user:

radosgw-admin user create --uid=1000 --display-name=admin --system

Next, we need to save the credentials and pass them to the ceph Dashboard. For that, we can run:

radosgw-admin user info --uid=1000
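
One way to drop the two keys into the files access.key and secret.key is to parse the JSON output, assuming jq is installed. A sketch:

radosgw-admin user info --uid=1000 | jq -r '.keys[0].access_key' > access.key
radosgw-admin user info --uid=1000 | jq -r '.keys[0].secret_key' > secret.key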

With the access and secret keys saved in files, we run the commands:

  • ceph dashboard set-rgw-api-access-key -i access.key
  • ceph dashboard set-rgw-api-secret-key -i secret.key

We should now see the Object Gateway section in the Dashboard, and we can create a bucket named bigfoot, owned by the user 1000 I created previously. We can leave all the options of the bucket at their defaults.

Testing the object storage

First, we need to install s3fs, so we execute as root:

apt-get install s3fs

We also need to create a file with the credentials for s3fs. For that, we will need the access.key and secret.key files from the last part:

echo $(cat access.key):$(cat secret.key) > s3password
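
s3fs refuses credential files that are readable by other users, so we also restrict the permissions:

chmod 600 s3password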

Finally, we can mount the object storage with:

s3fs bigfoot /root/test -o passwd_file=/root/s3password -o url=http://192.168.178.30:8000/ -o use_path_request_style

With an editor, I created five files with random text and saved them. In the Dashboard, I can see that the object storage now contains five objects.

Creating a block device

Lastly, we can create an image to use as a remote block device for virtual machines, thanks to iSCSI.

First, we need to create a pool for our block disks, so we type:

ceph osd pool create hdpool
ceph osd pool application enable hdpool rbd

Next, we go to the Dashboard, and under Block -> Images, we click on “Create”.

We fill in all the fields marked with “*” and leave the rest at their defaults. For example, I used the name Shared_Disk and 5 GB of space.
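
Alternatively, the same image can be created and inspected from the CLI. A sketch using the pool and name above, with the size given in MiB:

rbd create hdpool/Shared_Disk --size 5120
rbd info hdpool/Shared_Disk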

This way, we can create a block disk and share it with virtualization software or operating systems (Windows 10, Linux, etc.) supporting iSCSI.