What is Ceph?
Ceph is an open-source, software-defined storage solution designed to address the block, file, and object storage needs of modern enterprises. Its highly scalable architecture has led to its growing adoption for high-growth block storage, object stores, and data lakes.
Ceph decouples data from physical storage hardware using software abstraction layers, which lets a cluster scale out and tolerate hardware faults without tying data to specific devices. This makes Ceph well suited to cloud, OpenStack, Kubernetes, and other microservice and container-based workloads that need to store large volumes of data.
Use cases for Ceph range from private cloud infrastructure (both hyper-converged and disaggregated) to big data analytics and rich media, or as an alternative to public cloud storage.
What is a Ceph cluster?
A Ceph storage cluster consists of the following types of daemons:
- Cluster monitors (ceph-mon) that maintain the map of the cluster state, keep track of active and failed cluster nodes, cluster configuration, and information about data placement, and manage authentication.
- Managers (ceph-mgr) that maintain cluster runtime metrics, enable dashboarding capabilities, and provide an interface to external monitoring systems.
- Object storage devices (ceph-osd) that store data in the Ceph cluster and handle data replication, erasure coding, recovery, and rebalancing. Conceptually, an OSD can be thought of as a slice of CPU/RAM and the underlying SSD or HDD.
- RADOS Gateways (ceph-rgw) that provide object storage APIs (Swift and S3) via HTTP/HTTPS.
- Metadata servers (ceph-mds) that store metadata for the Ceph File System, mapping filenames and directories of the file system to RADOS objects and enabling the use of POSIX semantics to access the files.
- iSCSI Gateways (ceph-iscsi) that provide iSCSI targets for traditional block storage workloads such as VMware or Windows Server.
Ceph stores data as objects within logical storage pools. A Ceph cluster can have multiple pools, each tuned to different performance or capacity use cases. To scale efficiently and to handle rebalancing and recovery, Ceph shards the pools into placement groups (PGs). The CRUSH algorithm determines the placement group for an object and then calculates which Ceph OSDs should store that placement group.
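The two-step lookup described above (object to placement group, then placement group to OSDs) can be illustrated with a small Python sketch. This is not Ceph's actual CRUSH implementation: it assumes a SHA-256 based hash in place of Ceph's own hash function and uses rendezvous (highest-random-weight) hashing as a simplified stand-in for CRUSH, with no notion of failure domains or device weights. The function names, the pool name, and the OSD names are all hypothetical.

```python
import hashlib


def stable_hash(*parts):
    """Deterministic 64-bit hash of the given strings (toy stand-in, not Ceph's hash)."""
    digest = hashlib.sha256("/".join(parts).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def object_to_pg(object_name, pg_count):
    """Step 1: map an object name to one of the pool's placement groups."""
    return stable_hash("obj", object_name) % pg_count


def pg_to_osds(pool, pg_id, osds, replicas):
    """Step 2: choose the OSDs for a placement group.

    Rendezvous hashing: score every OSD against the PG and keep the
    highest-scoring ones. Real CRUSH also walks a hierarchy of hosts,
    racks, and so on, which this sketch ignores.
    """
    ranked = sorted(
        osds,
        key=lambda osd: stable_hash("pg", pool, str(pg_id), osd),
        reverse=True,
    )
    return ranked[:replicas]


def locate(pool, object_name, pg_count, osds, replicas=3):
    """Return (placement group id, list of OSDs) for an object."""
    pg_id = object_to_pg(object_name, pg_count)
    return pg_id, pg_to_osds(pool, pg_id, osds, replicas)


if __name__ == "__main__":
    cluster_osds = ["osd.%d" % i for i in range(6)]  # hypothetical six-OSD cluster
    pg, acting = locate("rbd", "image-001", 128, cluster_osds)
    print("placement group:", pg, "stored on:", acting)
```

Because the placement is computed rather than looked up in a central table, any client holding the same cluster map arrives at the same answer. In this sketch, removing an OSD that does not hold a given placement group leaves that group's placement unchanged, which hints at why rebalancing can be limited to the affected placement groups.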