I agree, rebooting a production server isn’t great, but you also have to determine when you’re not making progress.
The first step was identifying where the large files/directories were.
The second step was deleting a bunch of them, only to find the space didn’t free, which meant you
hit the third step, which was trying to identify what was still held open and what wasn’t,
which led to the fourth step, where you were looking at possibly clearing out system libraries.
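For step three, you don’t have to guess what’s held open. A sketch of how I’d check (the service name at the end is just an example):

```shell
# Space from deleted-but-open files only frees when the holding process
# closes them. lsof +L1 lists open files with a link count below 1,
# i.e. deleted but still consuming disk space.
sudo lsof +L1

# Restarting (or reloading) whatever process shows up releases the space:
# sudo systemctl restart rsyslog    # example service name
```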
Rebooting solves 2-4. Now you’re back to 1. When the space usage spikes, don’t just DELETE stuff - use du to identify the folders and files that are growing. Identify the biggest offenders. Ignore anything less than 1MB. Figure out where the space usage is. THEN we can help you formulate a targeted plan for resolving the root cause, not just patching symptoms.
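Something like this is what I mean by hunting the biggest offenders (assumes GNU du; /var is just an example mount - point it at wherever the spike is):

```shell
# Biggest directories first, anything under 1MB dropped.
# -x stays on one filesystem so you don't wander into other mounts.
sudo du -x --threshold=1M /var 2>/dev/null | sort -rn | head -20

# Then drill into whatever the top entry is and run it again.
```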
Note there are also steps you can take to limit the damage - right now you’re not using LVM, you have a single hard drive and it’s partitioned. That’s like, 0 for 3.
I always split my filesystem into (AT LEAST) /, /var, /usr. Why? Because /var is going to grow. If it grows, it can fill /var - but it’s not going to fill / or /usr. Even better would be splitting out /var/lib/docker and /var/log as well. LVM makes this way easier because you can dynamically assign space to the different areas on the fly, without rebooting. In fact, with LVM you could just attach more space as a second drive and start using it immediately to resolve the immediate problem.
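To give you an idea of the “on the fly” part, here’s a sketch - the VG/LV names and sizes are just examples, and it assumes ext4:

```shell
# Grow /var by 5G from free space in the VG, online, no reboot.
sudo lvextend -L +5G /dev/system/var
sudo resize2fs /dev/system/var      # ext4 grows while mounted
# (or lvextend -r to resize the filesystem in one step,
#  or xfs_growfs /var if it's xfs)

# No free space left in the VG? Attach a new virtual disk and add it:
sudo pvcreate /dev/sdd
sudo vgextend system /dev/sdd
```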
Of course, this goes even FURTHER off topic. There are numerous howtos on how to do these sorts of things - Red Hat will be all over LVM, and there are guides for Debian, guides for Ubuntu, etc.
https://unix.stackexchange.com/questions/131311/moving-var-home-to-separate-partition
https://access.redhat.com/discussions/641923
https://www.control-escape.com/linux/lx-partition.html
But in the world of virtual servers, I recommend you do NOT partition your drives. My VMs generally fit this pattern:
/dev/sda 4GB drive, partitioned, /dev/sda1 is the full disk size, starting at block 2048 (properly aligned), type linux, active, and ext4 formatted. This gets mounted at /boot. Why? Because this is the only thing the BIOS/EFI needs to boot. It could be MBR, it could be GPT, it could be whatever - as long as you can install grub and it can get to the kernel and initrd, you’re golden.
/dev/sdb 8GB drive NOT PARTITIONED, literally mkswap -L swap /dev/sdb
/dev/sdc 100GB drive NOT PARTITIONED for LVM - pvcreate /dev/sdc, vgcreate system /dev/sdc
Then I carve out root, usr, var, home, opt. With Bionic I’ve noticed slow boots if / and /usr are on different partitions, so I keep them together now - something like 8GB for /, 5GB for /var, 1GB for /home, the rest in /opt, leaving about 5GB free for snapshots and future problems.
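The carving itself looks roughly like this (run from the installer or a live environment; sizes match the example layout above and would obviously change per box):

```shell
lvcreate -L 8G -n root system
lvcreate -L 5G -n var  system
lvcreate -L 1G -n home system

# Check what's left in the VG, then give /opt everything but ~5G headroom:
vgs system                       # note the VFree column
lvcreate -L 76G -n opt system    # example: 100G minus the above minus 5G

mkfs.ext4 /dev/system/root       # repeat for each LV
```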
/home is small. Usually I’m building servers, so home doesn’t need to be big. My package development/Jenkins box has a 100GB /home; on most of the others, even 1GB is mostly unused.
/var is also small. If it fills, I either extend it, or identify WHY it grew and move that stuff elsewhere (e.g. /opt), or decrease retention, or isolate it to a new partition. /var/lib/docker is a great one to isolate because docker puts EVERYTHING there - logs, containers, layers, volumes.
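Isolating /var/lib/docker after the fact is a well-worn path. A sketch, assuming the same “system” VG as above (the 40G size is just an example, and keep the .old copy until you’ve verified everything):

```shell
systemctl stop docker                      # nothing may touch the dir while we move it
lvcreate -L 40G -n docker system
mkfs.ext4 /dev/system/docker

mount /dev/system/docker /mnt
cp -a /var/lib/docker/. /mnt/              # -a preserves ownership/perms/attrs
umount /mnt

mv /var/lib/docker /var/lib/docker.old     # keep as a fallback until verified
mkdir /var/lib/docker
echo '/dev/system/docker /var/lib/docker ext4 defaults 0 2' >> /etc/fstab
mount /var/lib/docker
systemctl start docker
```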
And then the most critical piece - monitoring. You need to know when the drives reach 85% full, not when they’re 100% full and failing. When they reach 85% you usually have time to respond. (Usually. When I was playing with logstash, it ended up spewing 5GB into /var/log in a matter of 45 minutes.) See monitoring-plugins-basic and its check_disk, or just write a script - there are tons of options online - or go crazy and install Nagios.
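The “just write a script” option really can be this small - a minimal stand-in for check_disk that you’d wire up to cron and mail (the 85 threshold and the excluded filesystem types are the assumptions here):

```shell
#!/bin/sh
# Print any local filesystem at or above THRESHOLD percent full.
# Exits quietly when everything is fine, so cron only mails on trouble.
THRESHOLD=85
df -P -x tmpfs -x devtmpfs | awk -v t="$THRESHOLD" '
    NR > 1 { sub(/%/, "", $5); if ($5 + 0 >= t) print $6 " at " $5 "%" }'
```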
All of this will, however, require more work than simply firing up an AMI or taking a stock template from a cloud provider. If you’re going to go that route, since you’re using docker anyway, CoreOS already does just about all of this in an 8GB footprint with automatic updates. You just need to add a drive for /var/lib/docker and away you go. (I add /storage too for any other odds and ends I might need to bind mount.)