Using trace capabilities
The trace capabilities gadget allows us to see what capability security checks are triggered by applications running in Kubernetes Pods.
Linux capabilities allow finer-grained privilege control: they can give specific root-like privileges to a process without giving it full root access, and they can also be taken away from root processes. If a pod runs its programs directly as root, we can lock it down further by dropping capabilities. Sometimes we need to add capabilities that are not granted by default. You can see the list of default and available capabilities in the Docker documentation. Especially if our pod runs as a non-root user (runAsUser: ID), we can grant a few extra capabilities (think of it as being partly root) and still drop all unused capabilities to really lock it down.
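To make the idea of individual capabilities more concrete, here is a minimal sketch that tests single bits of a capability bitmask, such as the CapEff value found in /proc/<pid>/status. The helper name has_cap is hypothetical, and the mask 0xa80425fb is assumed to be the effective set that a root process gets under Docker's default configuration; your runtime may use a different set.

```shell
# has_cap MASK BIT: print "yes" if capability number BIT is set in MASK, else "no".
# MASK is a hex bitmask like the CapEff line of /proc/<pid>/status. It must carry
# a 0x prefix, otherwise the shell would not parse it as hexadecimal.
has_cap() {
    if [ $(( ($1 >> $2) & 1 )) -eq 1 ]; then
        echo yes
    else
        echo no
    fi
}

# Assumption: default effective capability set of a root process in Docker.
default_mask=0xa80425fb

has_cap "$default_mask" 0     # CAP_CHOWN is capability number 0
has_cap "$default_mask" 23    # CAP_SYS_NICE is capability number 23
```

Under that assumption the first call prints yes and the second prints no, which matches the situation in the demo below: SYS_NICE is not part of the default set and has to be added explicitly.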
Here we have a small demo app which logs failures due to lacking capabilities. Since none of the default capabilities is dropped, we have to find out what non-default capability we have to add.
$ cat docs/examples/app-set-priority.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: set-priority
  labels:
    k8s-app: set-priority
spec:
  selector:
    matchLabels:
      name: set-priority
  template:
    metadata:
      labels:
        name: set-priority
    spec:
      containers:
      - name: set-priority
        image: busybox
        command: [ "sh", "-c", "while /bin/true ; do nice -n -20 echo ; sleep 5; done" ]
$ kubectl apply -f docs/examples/app-set-priority.yaml
deployment.apps/set-priority created
$ kubectl logs -lname=set-priority
nice: setpriority(-20): Permission denied
nice: setpriority(-20): Permission denied
We can see the error messages in the pod’s log. Let’s use Inspektor Gadget to watch the capability checks:
$ kubectl gadget trace capabilities --selector name=set-priority
NODE NAMESPACE POD CONTAINER PID COMM UID CAP CAPNAME AUDIT VERDICT
minikube default set-priority-5646554d9d-pk4gg set-priority 110385 nice 0 23 SYS_NICE 1 Deny
minikube default set-priority-5646554d9d-pk4gg set-priority 110592 nice 0 23 SYS_NICE 1 Deny
minikube default set-priority-5646554d9d-pk4gg set-priority 110764 nice 0 23 SYS_NICE 1 Deny
minikube default set-priority-5646554d9d-pk4gg set-priority 110965 nice 0 23 SYS_NICE 1 Deny
minikube default set-priority-5646554d9d-pk4gg set-priority 111134 nice 0 23 SYS_NICE 1 Deny
^C
Terminating...
We can leave the gadget with Ctrl-C.
In the output we see that the SYS_NICE capability was checked when nice was run. We should probably add it to our pod template for nice to work. We can also drop all other capabilities from the default list (see link above) since nice did not use them. The locked-down manifest below does both.
The meaning of the columns is:
CAP: capability number.
CAPNAME: capability name in a human-friendly format.
AUDIT: whether the kernel should audit the security request or not.
VERDICT: whether the capability was present (Allow) or not (Deny).
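When a trace runs for a while, the output gets long and repetitive. As a rough sketch, the denied capability names can be pulled out of saved gadget output with awk. The function name denied_caps is hypothetical, and the parsing assumes the column layout shown above, with VERDICT as the last column and CAPNAME third from the end.

```shell
# denied_caps: read "trace capabilities" output on stdin and print each
# capability name that was denied at least once, one name per line.
denied_caps() {
    # Fields are counted from the end of the line (VERDICT is last, CAPNAME is
    # third from last) so that the leading columns do not affect the parsing.
    awk 'NF >= 3 && $NF == "Deny" { print $(NF - 2) }' | sort -u
}

# Sample input copied from the trace above.
denied_caps <<'EOF'
NODE NAMESPACE POD CONTAINER PID COMM UID CAP CAPNAME AUDIT VERDICT
minikube default set-priority-5646554d9d-pk4gg set-priority 110385 nice 0 23 SYS_NICE 1 Deny
minikube default set-priority-5646554d9d-pk4gg set-priority 110592 nice 0 23 SYS_NICE 1 Deny
EOF
```

For the sample input this prints SYS_NICE once, which is the capability we are looking for.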
$ cat docs/examples/app-set-priority-locked-down.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: set-priority
  labels:
    k8s-app: set-priority
spec:
  selector:
    matchLabels:
      name: set-priority
  template:
    metadata:
      labels:
        name: set-priority
    spec:
      containers:
      - name: set-priority
        image: busybox
        command: [ "sh", "-c", "while /bin/true ; do nice -n -20 echo ; sleep 5; done" ]
        securityContext:
          capabilities:
            add: ["SYS_NICE"]
            drop: [all]
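The securityContext part of this manifest follows a simple pattern: drop everything, then add back only what the trace showed as needed. The sketch below prints such a fragment for a list of capability names. The helper caps_fragment is hypothetical and not part of Inspektor Gadget or kubectl. It strips a leading CAP_ because Kubernetes manifests list capabilities without that prefix.

```shell
# caps_fragment CAP...: print a securityContext fragment that drops all
# capabilities and adds back only the ones given as arguments.
caps_fragment() {
    list=""
    for cap in "$@"; do
        cap=${cap#CAP_}                     # manifests use SYS_NICE, not CAP_SYS_NICE
        list="${list:+$list, }\"$cap\""     # build a comma-separated, quoted list
    done
    printf 'securityContext:\n  capabilities:\n    add: [%s]\n    drop: [all]\n' "$list"
}

caps_fragment SYS_NICE
```

The printed fragment still has to be indented to the container level before it is pasted into a pod template.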
At this point we have to make sure that we are allowed to grant SYS_NICE to new pods under the restricted pod security policy.
$ kubectl get psp
NAME                       PRIV    CAPS               SELINUX    RUNASUSER   FSGROUP     SUPGROUP    READONLYROOTFS   VOLUMES
nginx-ingress-controller   false   NET_BIND_SERVICE   RunAsAny   MustRunAs   MustRunAs   MustRunAs   false            configMap,secret
privileged                 true    *                  RunAsAny   RunAsAny    RunAsAny    RunAsAny    false            *
restricted                 false                      RunAsAny   MustRunAs   MustRunAs   MustRunAs   false            configMap, …
For privileged pods adding SYS_NICE would work, but not for the default pods. We can change that by editing the policy.
$ kubectl edit psp restricted # opens the editor to add the below two lines
spec:
  allowPrivilegeEscalation: false
  allowedCapabilities:  # <- add these two
  - SYS_NICE            # lines here
  …
After saving we can verify that we are allowed to add new pods which grant SYS_NICE.
$ kubectl get psp
NAME                       PRIV    CAPS               SELINUX    RUNASUSER   FSGROUP     SUPGROUP    READONLYROOTFS   VOLUMES
nginx-ingress-controller   false   NET_BIND_SERVICE   RunAsAny   MustRunAs   MustRunAs   MustRunAs   false            configMap,secret
privileged                 true    *                  RunAsAny   RunAsAny    RunAsAny    RunAsAny    false            *
restricted                 false   SYS_NICE           RunAsAny   MustRunAs   MustRunAs   MustRunAs   false            configMap, …
Let’s verify that our locked-down version works.
$ kubectl delete -f docs/examples/app-set-priority.yaml
deployment.apps "set-priority" deleted
$ kubectl apply -f docs/examples/app-set-priority-locked-down.yaml
deployment.apps/set-priority created
$ kubectl logs -lname=set-priority
The logs are clean, so everything works!
We can see the same checks but this time with the Allow verdict:
$ kubectl gadget trace capabilities --selector name=set-priority
NODE NAMESPACE POD CONTAINER PID COMM UID CAP CAPNAME AUDIT VERDICT
minikube default set-priority-768db6dcf7-rp8gd set-priority 10158 nice 0 23 SYS_NICE 1 Allow
minikube default set-priority-768db6dcf7-rp8gd set-priority 10365 nice 0 23 SYS_NICE 1 Allow
You may include a kernel call stack for more context with --print-stack. (If we see additional SYS_ADMIN checks we can ignore them since only privileged pods have this capability and it’s not a default capability.)
You can now delete the pod you created:
$ kubectl delete -f docs/examples/app-set-priority-locked-down.yaml