Kubernetes Persistent Volumes: Everything You Need to Know

Traditionally, distributed applications in Kubernetes are stateless, which means a pod can be recreated without having to worry about losing any local data from the container. However, for stateful applications, you need to store data, like images uploaded by users in a WordPress site. Kubernetes supports stateful workloads by using persistent volumes.

In this post, I’ll cover a few things you need to know about storage in Kubernetes and how you can use it within your applications. I’ll also cover other topics essential to storage management, like backing up and restoring volumes.

Storage in Kubernetes

When a pod has a problem and Kubernetes needs to recreate it, all its data is lost because the new pod starts in a clean state. For some applications, like a database, the ability to persist or replicate state is vital. To solve this problem, Kubernetes uses the volume abstraction. With the right kind of volume, your applications won’t lose any data when a restart happens and can be recreated at any time.

For a container, a volume is simply a directory, and what’s behind it is transparent. For instance, the volume could be a local folder on the host, or it could be remote storage from a cloud provider, like Amazon EBS or Azure Disk. To use a volume, you mount it at a location inside the container. Notice that you can share a volume between containers in the same pod. However, ephemeral volumes such as emptyDir have the same life cycle as the pod, meaning that they’re destroyed when the pod is deleted or recreated.
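
To illustrate volume sharing inside a pod, here’s a minimal sketch that assumes an emptyDir volume, which is one kind of volume tied to the pod’s life cycle. The pod name, container names, images, and paths are hypothetical and only serve the example.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: shared-scratch          # hypothetical name
spec:
  containers:
    - name: writer
      image: busybox
      # Appends a timestamp to a file in the shared volume every five seconds.
      command: ["sh", "-c", "while true; do date >> /data/out.log; sleep 5; done"]
      volumeMounts:
        - name: scratch
          mountPath: /data
    - name: reader
      image: busybox
      # Follows the same file through its own mount point.
      command: ["sh", "-c", "touch /logs/out.log; tail -f /logs/out.log"]
      volumeMounts:
        - name: scratch
          mountPath: /logs
  volumes:
    - name: scratch
      emptyDir: {}              # created with the pod and removed with it
```

Both containers see the same files even though they mount the volume at different paths. Once the pod is deleted, the contents of the volume are gone.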

So, how can you decouple storage from the pod life cycle? Well, by using persistent volumes.

Persistent Volumes

A persistent volume (PV) is an object that defines how a cluster provides storage, and it outlives any single pod or even a node. In other words, PVs have a different life cycle than pods, and they’re another resource in the cluster. Therefore, a Kubernetes administrator can configure storage in advance, separately from your applications. Your pods then claim that storage with a PersistentVolumeClaim object. There’s also a way to provision storage dynamically by using a StorageClass object; I’ll come back to this later.

Moreover, in a PV, you can define implementation details like the access mode (such as read-only or read-write), capacity, mount options, or the volume type (like NFS, iSCSI, or cloud provider storage). To do so, you describe the PV in a YAML manifest like the following:

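apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv0003
spec:
  capacity:
    storage: 10Gi                          # total size offered by this PV
  volumeMode: Filesystem                   # mounted as a directory, not a raw block device
  accessModes:
    - ReadWriteOnce                        # read-write from a single node at a time
  persistentVolumeReclaimPolicy: Recycle   # deprecated policy, discussed below
  storageClassName: slow                   # claims must request this class to bind
  mountOptions:
    - hard
    - nfsvers=4.1
  nfs:                                     # the volume type: an NFS export
    path: /tmp
    server: 172.17.0.2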

Think of a PV as a piece of shared storage available in the cluster that an application can mount as a volume.

An important thing to know about PVs is that if a Kubernetes administrator attempts to delete a PV while it’s being used, the removal is postponed until the PV is no longer bound to a claim. If, on the other hand, an application stops using a PV and its claim is deleted, the PV’s reclaim policy determines what happens next. At the time of writing this post, you can choose among three policies: Delete, Retain, or Recycle. However, the Recycle policy is now deprecated, and it’s recommended to use dynamic provisioning instead, which I’ll cover later in this post.
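
To make the reclaim policy concrete, here’s a minimal sketch of a PV that keeps its data after its claim goes away. It reuses the NFS server address from the earlier manifest; the PV name and export path are hypothetical.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-retained               # hypothetical name
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  # Retain keeps the volume and its data after the bound PVC is deleted;
  # an administrator then has to clean it up or reuse it manually.
  # Delete would instead remove the PV and, where the volume plugin
  # supports it, the backing storage as well.
  persistentVolumeReclaimPolicy: Retain
  storageClassName: slow
  nfs:
    path: /exports/data           # hypothetical export path
    server: 172.17.0.2
```

Retain is the safer choice when losing the data by accident would be costly, at the price of manual cleanup.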

Now let’s talk about how you can use a PV in your applications.

Persistent Volume Claims

A PersistentVolumeClaim (PVC) is how you reserve a PV to mount it later as a volume in containers within a pod. Let’s digest that slowly. When you create a PV, you define specific properties like capacity or access mode. Those properties are important because when you create a PVC, you set the properties you want a PV to have. Then, Kubernetes will try to match a PV with the specifications from the PVC. I said “try” because if you have a PVC that requests 50Gi and there’s no PV with exactly 50Gi of capacity, Kubernetes will bind it to a PV with higher capacity, even if that ends up wasting storage. However, other properties like access mode, volume mode, and storage class need to match. Let’s see how to define a PVC in a YAML manifest:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc0003
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Filesystem
  resources:
    requests:
      storage: 8Gi
  storageClassName: slow

Once a PVC is bound (matched) to a PV, you can use the PVC in pods and mount volumes in containers. It’s as simple as defining the volume in the pod and then mounting it to a local path inside the container. Let’s see how this process looks in YAML:

apiVersion: v1
kind: Pod
metadata:
  name: frontend
spec:
  containers:
    - name: frontend
      image: nginx
      volumeMounts:
      - mountPath: "/var/www/html"
        name: storage
  volumes:
    - name: storage
      persistentVolumeClaim:
        claimName: pvc0003

Before you ask, you can reference a PVC in a deployment object as well. In other words, more than one pod can use the same PVC. For example, you could run several pods of a WordPress deployment, and the deployment can reference a PVC within the pod’s spec. When a user uploads a photo, it will be available to the rest of the pods because all the pods are using the shared volume. It won’t matter if a pod dies and comes back, since no data gets lost. Keep in mind that the access mode matters here: a ReadWriteOnce volume can be mounted by only one node at a time, so pods spread across several nodes need a volume that supports ReadWriteMany, such as NFS.
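
As a rough sketch of that idea, the following deployment mounts the claim from the earlier example in every replica. The deployment name, labels, replica count, and mount path are assumptions for illustration, and the database settings a real WordPress site needs are left out.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: wordpress                 # hypothetical name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: wordpress
  template:
    metadata:
      labels:
        app: wordpress
    spec:
      containers:
        - name: wordpress
          image: wordpress
          volumeMounts:
            - name: uploads
              # Directory where WordPress stores uploaded media.
              mountPath: /var/www/html/wp-content/uploads
      volumes:
        - name: uploads
          persistentVolumeClaim:
            # Every replica references the same claim. If the replicas land on
            # different nodes, the claim must allow ReadWriteMany.
            claimName: pvc0003
```

Because the volume is declared in the pod template, any pod the deployment creates or replaces mounts the same storage.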

Dynamic Provisioning

What I’ve described so far with PVs is known as “static provisioning,” and having to match every PVC with a PV can become a problem in the long term. A better approach is dynamic provisioning with a StorageClass object. A Kubernetes administrator has to create this class in advance, but only once. Think of a storage class as a storage provider rather than a file share, which is how you think of a PV. Most managed Kubernetes clusters from cloud providers come with a default storage class already defined. But you can configure one yourself as well and use storage providers like Ceph or NFS.

To create a default storage class in the cluster, you have to use a YAML manifest like this one:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: slow
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-standard

Now, instead of creating several PV/PVC combinations, you create one storage class and several PVCs. When you create a PVC with a storage class provider, you claim storage on demand and avoid wasting storage. A YAML manifest for the PVC that uses the default storage class provider in the cluster looks like this:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-pv-claim
  labels:
    app: wordpress
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi

Notice how simple the manifest looks now. When you don’t define a storage class in the PVC, the default one is used.

Backing up Volume Data

Once you start using PVs, PVCs, and storage classes, one concern that might arise is backups. How can you back up the volume data in Kubernetes? Well, you have a few options. There are Kubernetes API objects like VolumeSnapshot and VolumeSnapshotClass, which behave as another resource in the cluster, provided your storage driver supports snapshots. Once you’ve taken a snapshot, you can restore it by creating a new PVC that uses the snapshot as its data source.
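
Here’s a minimal sketch of that snapshot and restore flow. It assumes the snapshot CRDs and controller are installed and that the claim was provisioned by a CSI driver that supports snapshots. The driver name and all object names other than mysql-pv-claim are hypothetical placeholders.

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: csi-snapclass             # hypothetical name
driver: csi.example.com           # placeholder: use your cluster's CSI driver
deletionPolicy: Delete
---
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: mysql-snapshot            # hypothetical name
spec:
  volumeSnapshotClassName: csi-snapclass
  source:
    # The existing claim whose data is captured.
    persistentVolumeClaimName: mysql-pv-claim
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-pv-claim-restored   # hypothetical name
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
  # A new volume is provisioned and pre-populated from the snapshot.
  dataSource:
    name: mysql-snapshot
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
```

The restored claim is a separate object, so you can point a pod at it to verify the data before switching your application over.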

There are also projects like Velero that can back up volumes, similar to the native objects in Kubernetes. With Velero, you can back up other resources in your cluster too. For example, you could configure backups for an entire namespace.
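
As a rough idea of what a namespace backup looks like, here’s a sketch of a Velero Backup object. It assumes Velero is already installed in the velero namespace with a backup storage location configured; the backup name, application namespace, and retention period are hypothetical.

```yaml
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: wordpress-backup          # hypothetical name
  namespace: velero               # assumption: the namespace where Velero runs
spec:
  includedNamespaces:
    - wordpress                   # hypothetical application namespace
  snapshotVolumes: true           # also snapshot the persistent volumes in scope
  ttl: 720h0m0s                   # keep the backup for 30 days
```

Check the Velero documentation for the options your version supports before relying on a manifest like this one.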

What’s Next?

What I’ve covered in this post is everything you need to know to get started with storage in Kubernetes. However, I didn’t cover other properties you can configure in PVs, PVCs, and storage classes, or how to configure storage with providers like Ceph or NFS. So, if you want to go deeper, I suggest you read the official Kubernetes documentation for more detailed information about working with storage.

Lastly, if you want to get hands-on experience, a nice way to get started with storage in Kubernetes is to configure a WordPress site with a MySQL database. With the concepts you’ve seen here, you’ll be able to follow that guide without any problem.