OT: Why does my Ubuntu box fill its disk space so quickly?

#1

Hi,

I’m not a Unix/Linux/Ubuntu admin, but I’m quite comfortable with it.
I’m on an Ubuntu box leased from DigitalOcean.
We don’t add much data to any of our services, but quite frequently something seems to fill up the disk. We have something like 80ish GB of disk space.

Any insight into the issue?

Thanks.

#2

It’s almost always going to be a log or series of logs that are grinding up disk space on Linux. Check the logs in these locations: https://tutorials.ubuntu.com/tutorial/viewing-and-monitoring-log-files#1. If you find one (or more) that is sucking up your disk space, then you have a place to start searching for how to either turn down the amount of logging or, in some cases, disable it.

HTH

– Denny

#3

Denny,

With du -mxs /* | sort -nr (advised by Joe Gooch), I noticed that /var/log takes some space, but not a ton of it; removing /var/log and /var/cache does not help much.

The /usr folder takes up most of the disk space, and yet I can’t seem to touch any subfolder under it.

It’s a hard nut to crack…

thanks.

#4

If you’re not already logged in as root (which you shouldn’t be) then you’ll need sudo to get what you’re looking for.

Try:

sudo du -a /usr | sort -nr | head -n 20

or

sudo du -mx -d 1 /usr | sort -nr | head -n 20

That should show you the top 20 entries (files or directories) taking up space.
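If you only care about individual files (du -a also lists every directory, so the top of its output tends to be directories), a find-based variant can help. This is a sketch, not something from the thread: it assumes GNU find, which Ubuntu ships, for -printf, and largest_files is a made-up helper name.

```shell
# Sketch: print the N largest regular files under a directory, biggest first.
# Assumes GNU find (for -printf). Run from a root shell to include root-owned files.
largest_files() {
  local dir count
  dir=${1:-/}       # directory to search (default: the root filesystem)
  count=${2:-20}    # how many entries to show
  # -xdev keeps find on one filesystem, like du -x; %s = size in bytes, %p = path
  find "$dir" -xdev -type f -printf '%s\t%p\n' 2>/dev/null | sort -nr | head -n "$count"
}
```

For example, from a root shell (sudo -i), paste the function and run largest_files /usr 20 or largest_files /var/log 20. Sizes are printed in bytes.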

#5

Right now the biggest offenders are (in MB, from du -m):
214 /lib
222 /var
440 /home
1275 /usr

I’ve deleted
/usr/share/man
/usr/share/doc

However, I can’t identify anything else to remove, and only 1GB was freed up.

Many thanks.

#6

I thought you had 80GB… none of those are anywhere near that big.

What’s the problem? What does df show?

#7

80GB is the total disk space of /dev/vda1:

df -BG
Filesystem 1G-blocks Used Available Use% Mounted on
udev 2G 0G 2G 0% /dev
tmpfs 1G 1G 1G 11% /run
/dev/vda1 78G 78G 1G 100% /
tmpfs 2G 0G 2G 0% /dev/shm
tmpfs 1G 0G 1G 0% /run/lock
tmpfs 2G 0G 2G 0% /sys/fs/cgroup
/dev/vda15 1G 1G 1G 4% /boot/efi
tmpfs 1G 0G 1G 0% /run/user/1000

Thanks, Joe.

#8

It might be in your root directory!

ls -laSr /

If that doesn’t work:

du -mx -d 1 / |sort -n

#9

Thanks, Joe. The ls -laSr / command was very helpful: it identified old Ubuntu kernel files in the /boot directory, which I then deleted.

Oddly, df -BG still shows only 1GB available, even after removing several large unused files.
Is there any command that is the equivalent of a hard reboot (to flush out some disk space)?

#10

Interesting, my post from Mar 8 via email didn’t make it to the forum.

Copied:
Glob patterns (*) do not include hidden files or directories. If you run ls -a /home/me you’ll see a bunch of hidden directories and files (e.g. .ssh .java .bashrc .profile).

Try du -mx -d 0 /home/me

And then du -mx -d 1 /home/me

And du -mx -d 2 /home/me
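A quick way to see that glob behavior for yourself, in a throwaway directory (the directory and file names are invented for the demo):

```shell
# Demo: the * glob skips dot-directories, but du -d 1 on the parent reports them.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/visible" "$demo_dir/.hidden"
head -c 2000000 /dev/zero > "$demo_dir/.hidden/cache.bin"   # ~2 MB tucked into a dot-directory

echo "Glob sees:"
(cd "$demo_dir" && echo *)          # prints only: visible

echo "du -mx -d 1 sees:"
du -mx -d 1 "$demo_dir" | sort -n   # lists .hidden (with its ~2 MB), visible, and the total
```

In bash, shopt -s dotglob makes * match dot-entries too, but running du -d 1 on the parent directory sidesteps the issue entirely.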

There may also be a discrepancy between df and du. Linux handles open files differently from Windows: Windows just locks the file and denies access, while Linux will allow you to replace or remove an in-use file. HOWEVER, it only removes the directory link to the data, and keeps the file around until all references to it are cleaned up (just like Java garbage collection). Real-world examples:

  1. There’s a huge log in /var/log that’s growing out of control (/var/log/syslog, for example). You rm the file. You ls; it’s gone. But df shows no space freed. WHAT? Well, your syslog server still has the file open (and is still writing to it!). If you kill -HUP your syslog process, run service rsyslog rotate, or use whatever your distro-specific process is to recycle the process or stop it gracefully (or just kill -9 it), the space is freed. (And the file is probably recreated with 0 bytes.)

  2. You’re upgrading a Java application. In Windows, all the jars used by the app are locked. In Linux, you can replace the jars with new versions, e.g. replace app.jar with another app.jar. The application carries on using the old jar, so you don’t see the updates. This is actually a GOOD thing most of the time… When you restart the server, it’ll pick up the new changes and the old, replaced files will be freed.
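Example 1 can be reproduced safely in a scratch directory. In this sketch (demo_deleted_open_file is a hypothetical name, and it assumes Linux /proc plus GNU readlink and stat), the shell plays the part of rsyslog by holding a 5 MiB file open on descriptor 9, then deletes it:

```shell
# Demo: deleting an open file removes its name, not its data.
demo_deleted_open_file() {
  local dir
  dir=$(mktemp -d)
  head -c 5242880 /dev/zero > "$dir/big.log"   # 5 MiB stand-in for a runaway log
  exec 9<"$dir/big.log"   # hold it open on descriptor 9, like rsyslog holding syslog open
  rm "$dir/big.log"       # removes the directory entry only
  ls -A "$dir"            # prints nothing: the name is gone
  # /proc/self is the child command itself, which inherits descriptor 9
  readlink /proc/self/fd/9                                  # path ending in " (deleted)"
  stat -L -c '%s bytes in the unlinked file' /proc/self/fd/9
  exec 9<&-               # closing the last open descriptor is what frees the space
  rmdir "$dir"
}

demo_deleted_open_file
```

The readlink line ends in (deleted) while stat still reports the full size; only the final exec 9<&- (the equivalent of restarting rsyslog) lets the space go.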

You can see these things in lsof. If you run lsof -n (which lists EVERY process and will be really long) you may see files with the word (deleted) after them, which means a process still has an open file handle to a file that is no longer visible in the filesystem.

It may be a little confusing to experience, but I prefer it to what happens in Windows: FILE BUSY, and you have no idea what’s accessing it or how to resolve it. Also, switching git branches failed so many times because Windows CF held locks on jars checked into git… That just doesn’t happen on Linux.

#11

Very helpful, thanks Joe.

After running du -mx -d 2 /home/userme I found the following entries. I don’t know enough about nvm and npm to be sure: can I remove them?
11 /home/userme/.nvm/.cache
56 /home/userme/.npm/_cacache

For “kill -HUP your syslog process”:
how do I find the syslog process ID? (I’ve removed the entire /var/log directory.)

Thanks again.

#12

ps awwux |grep syslog

It’s probably rsyslog, so service rsyslog restart works.

ps awwux shows you everything running.

As does lsof -n, if you had run it like I suggested. :)

You can probably delete the directories you’ve indicated, but they’re less than 100MB. Your problem is almost 1000x bigger, so it’s not going to make a dent in what you’re trying to accomplish.

#13

Thank you Joe.

I restarted the syslog process and killed it as suggested, then ran
lsof -n,
which lists tons of running processes and files.

Regarding “Your problem is almost 1000x bigger, so it’s not going to make a dent in what you’re trying to accomplish”:
is there any other way to free up disk space without rebooting?

Thanks again.

#14

Ideally you would have identified the space usage culprit, deleted the file or folder, and when it DIDN’T fix the problem, you’d know what to look for in lsof.

For instance, /var/log/syslog is 5GB. rm /var/log/syslog. df has no effect. Interesting.

lsof -n |grep deleted

Find the process(es) keeping /var/log/syslog open, and kill them or restart the services.

df shows freed up space.

At this point, run lsof -n |grep deleted and see what’s holding things open. It’s only a hypothesis that /var/log was the culprit and that rsyslog was holding those files open. lsof will give you the facts.

Note it’ll also tell you the PID, the process, the file, and how big the file is.
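If lsof’s output is too noisy, the same (deleted) markers can be read straight from /proc. This is a sketch, not Joe’s method: list_deleted_open_files is a made-up name, and it assumes Linux /proc and GNU stat. (lsof also has a built-in filter for this: sudo lsof -nP +L1 selects open files whose link count is below 1, i.e. unlinked files.)

```shell
# Sketch: list open-but-deleted files, biggest first, straight from /proc.
# Assumes Linux /proc and GNU stat. Run as root (sudo -i) to see every process.
# Only file descriptors are scanned, so deleted memory-mapped binaries
# (lsof's "txt" rows) are not reported.
list_deleted_open_files() {
  local count fd target size pid
  count=${1:-20}
  for fd in /proc/[0-9]*/fd/*; do
    # The process may exit mid-scan, so skip anything we can no longer read.
    target=$(readlink "$fd" 2>/dev/null) || continue
    case "$target" in
      *' (deleted)') ;;   # unlinked file still held open: keep it
      *) continue ;;      # regular files, sockets, pipes: skip
    esac
    size=$(stat -L -c %s "$fd" 2>/dev/null) || continue
    pid=${fd#/proc/}      # "1234/fd/5"
    pid=${pid%%/*}        # "1234"
    printf '%s\t%s\t%s\n' "$size" "$pid" "$target"
  done | sort -nr | head -n "$count"
}
```

The output columns are size in bytes, PID, and path; the PID is the process you would restart or kill to release the space.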

lsof -n |grep rsyslog |grep /var/log on my system shows:

rsyslogd 928 syslog 9w REG 253,6 3376600 133058 /var/log/auth.log
rsyslogd 928 syslog 10w REG 253,6 33663657 131139 /var/log/snmpd.log
rsyslogd 928 syslog 11w REG 253,6 12631 131178 /var/log/syslog
rsyslogd 928 syslog 12w REG 253,6 3777817 131458 /var/log/commands.log

So rsyslogd is process 928, referencing a REGular file: /var/log/snmpd.log, which is 33MB.

Which is true:

ls -l /var/log/snmpd.log -h

-rw-r----- 1 syslog adm 33M Apr 7 15:20 /var/log/snmpd.log

I also have a couple of deleted files:

systemd-l 872 root txt REG 253,0 219272 4380 /lib/systemd/systemd-logind (deleted)
ulogd 922 ulog 8w REG 253,6 9314162 131246 /var/log/ulog/syslogemu.log.1 (deleted)
systemd 85260 jenkins txt REG 253,0 1595792 11866 /lib/systemd/systemd (deleted)
(sd-pam 85261 jenkins txt REG 253,0 1595792 11866 /lib/systemd/systemd (deleted)

I’m guessing you have A LOT at this point. Or at the very least you’re going to have one or a couple of big ones, if there is so much disparity between du -sx / and df.
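To put a number on that disparity, here is a small sketch (fs_gap is a hypothetical helper; it assumes POSIX-format df -P output and GNU du) that subtracts what du can find from what df reports as used. Pass it a mount point such as /; for a plain subdirectory the df figure still covers the whole filesystem, so the gap means nothing.

```shell
# Sketch: how much "used" space (df) can du not account for?
# Assumes POSIX-format df -P and GNU du. Pass a mount point such as /.
# A small gap is normal, since filesystem metadata like the ext4 journal counts as used.
fs_gap() {
  local mnt df_kb du_kb
  mnt=${1:-/}
  df_kb=$(df -Pk "$mnt" | awk 'NR==2 {print $3}')         # "Used" column, KiB
  du_kb=$(du -skx "$mnt" 2>/dev/null | awk '{print $1}')   # what du can see, KiB
  echo "df used: $df_kb KiB, du found: $du_kb KiB, unaccounted: $((df_kb - du_kb)) KiB"
}
```

Run it as root (e.g. fs_gap / from sudo -i); otherwise du silently skips root-only directories and the "unaccounted" figure is inflated.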

Joe

#15

Awesome, thank you Joe!

#16

Joe,

Running lsof -n |grep rsyslog |grep /var/log I got the following output:
lsof: WARNING: can’t stat() overlay file system /var/lib/docker/overlay2/86ff9f62db806792d48a4128e2774e299a2a291491f400be8c6ef68bef79b2c2/merged
Output information may be incomplete.
lsof: WARNING: can’t stat() overlay file system /var/lib/docker/overlay2/39d03d36edda2628e1cef0c446e9af440b877e0c722d59b8a768d2009a5110ee/merged
Output information may be incomplete.
lsof: WARNING: can’t stat() overlay file system /var/lib/docker/overlay2/cf8a3e7199e7df38f0045764d9a675bb9e343d64d980e8cb8a3916a9faaf390f/merged
Output information may be incomplete.
lsof: WARNING: can’t stat() tmpfs file system /var/lib/docker/containers/659269a15e6c415bfee74308a3aeac6964aafed6733e41f43112ec5f3379e531/mounts/shm
Output information may be incomplete.
lsof: WARNING: can’t stat() tmpfs file system /var/lib/docker/containers/38adab9c892b284d8dad7ce5a04e05f69d119ec44ee54ce7f0cf2bc829290901/mounts/shm
Output information may be incomplete.
lsof: WARNING: can’t stat() tmpfs file system /var/lib/docker/containers/b6598524bde5fb62400107f36b3e1e66d2a871bcea7b971215e18dd6b9f9d98c/mounts/shm
Output information may be incomplete.
lsof: WARNING: can’t stat() nsfs file system /run/docker/netns/43643a5d477a
Output information may be incomplete.
lsof: WARNING: can’t stat() nsfs file system /run/docker/netns/3764d085ed75
Output information may be incomplete.
lsof: WARNING: can’t stat() nsfs file system /run/docker/netns/77318d52ffc9
Output information may be incomplete.
lsof: WARNING: can’t stat() overlay file system /var/lib/docker/overlay2/30f7ca6684a0e036e020dccb199b54156959a8ca3b027d12727fb899dd0f6ae7/merged
Output information may be incomplete.
lsof: WARNING: can’t stat() overlay file system /var/lib/docker/overlay2/45683f00860eadfce84869c5b025250b76e9e6f2bd93b62185332ef7352ece68/merged
Output information may be incomplete.
lsof: WARNING: can’t stat() overlay file system /var/lib/docker/overlay2/be3b4fabbd5f355c5df8a7dd7353316a2bbc9787937f539d81e25d737f206f1f/merged
Output information may be incomplete.
lsof: WARNING: can’t stat() overlay file system /var/lib/docker/overlay2/681d16d2a1c0121b828e9d81a507a796371fd85a6d08ef72e07d04df3fbe3f41/merged
Output information may be incomplete.
lsof: WARNING: can’t stat() tmpfs file system /var/lib/docker/containers/dd0c66841d96d0c293ce9435ea20803b62820847e5f5ab05b1225ea48051cac8/mounts/shm
Output information may be incomplete.

I probably missed some of the instructions in your post.
Thanks.

#17

OK, I missed “ps awwux shows you everything running.”
Now I’ve killed the processes owned by syslog.
Question: which option shows the processes with the highest VSZ?
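For the VSZ question: procps ps, Ubuntu’s default, can sort on any output column, with a leading - for descending order. This sketch wraps that in a hypothetical top_vsz helper. Keep in mind that VSZ is virtual memory size in KiB, so it says nothing about disk usage.

```shell
# List the processes with the largest virtual memory size (VSZ, KiB), biggest first.
# Assumes procps-ng ps (Ubuntu's default), which supports --sort.
top_vsz() {
  # +1 line so the header row is kept alongside N processes
  ps -eo pid,user,vsz,comm --sort=-vsz | head -n "$(( ${1:-10} + 1 ))"
}

top_vsz 10
```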

I’ve identified some packages/applications that can be removed.
However, my attempt to remove one ran into a disk space issue (I guess such an operation requires some minimal disk space as well). Right now there are only a few KB left, which is not enough to run it, and I’ve already looked high and low for any other files to remove.
Further thoughts?

Many thanks.

#18

Make sure you’re running it as root (through sudo as necessary)

#19

Thanks, Joe, that made the difference.
Now, sudo lsof -n |grep rsyslog outputs the following:
rsyslogd 27383 syslog cwd DIR 253,1 4096 2 /
rsyslogd 27383 syslog rtd DIR 253,1 4096 2 /
rsyslogd 27383 syslog txt REG 253,1 599328 24061 /usr/sbin/rsyslogd
rsyslogd 27383 syslog mem REG 253,1 47648 2019 /lib/x86_64-linux-gnu/libnss_nis-2.23.so
rsyslogd 27383 syslog mem REG 253,1 93128 1992 /lib/x86_64-linux-gnu/libnsl-2.23.so
rsyslogd 27383 syslog mem REG 253,1 35688 2010 /lib/x86_64-linux-gnu/libnss_compat-2.23.so
rsyslogd 27383 syslog mem REG 253,1 19936 29294 /usr/lib/rsyslog/imklog.so
rsyslogd 27383 syslog mem REG 253,1 33072 29296 /usr/lib/rsyslog/imuxsock.so
rsyslogd 27383 syslog mem REG 253,1 47600 2015 /lib/x86_64-linux-gnu/libnss_files-2.23.so
rsyslogd 27383 syslog mem REG 253,1 23648 29279 /usr/lib/rsyslog/lmnet.so
rsyslogd 27383 syslog mem REG 253,1 1868984 1995 /lib/x86_64-linux-gnu/libc-2.23.so
rsyslogd 27383 syslog mem REG 253,1 18976 2117 /lib/x86_64-linux-gnu/libuuid.so.1.3.0
rsyslogd 27383 syslog mem REG 253,1 43496 2160 /lib/x86_64-linux-gnu/libjson-c.so.2.0.0
rsyslogd 27383 syslog mem REG 253,1 14360 25955 /usr/lib/x86_64-linux-gnu/libestr.so.0.0.0
rsyslogd 27383 syslog mem REG 253,1 31712 2020 /lib/x86_64-linux-gnu/librt-2.23.so
rsyslogd 27383 syslog mem REG 253,1 14608 2001 /lib/x86_64-linux-gnu/libdl-2.23.so
rsyslogd 27383 syslog mem REG 253,1 138696 1994 /lib/x86_64-linux-gnu/libpthread-2.23.so
rsyslogd 27383 syslog mem REG 253,1 104864 2150 /lib/x86_64-linux-gnu/libz.so.1.2.8
rsyslogd 27383 syslog mem REG 253,1 162632 1993 /lib/x86_64-linux-gnu/ld-2.23.so
rsyslogd 27383 syslog 0r CHR 1,3 0t0 6 /dev/null
rsyslogd 27383 syslog 1w CHR 1,3 0t0 6 /dev/null
rsyslogd 27383 syslog 2w CHR 1,3 0t0 6 /dev/null
rsyslogd 27383 syslog 3u unix 0xffff8800370c1400 0t0 8029 /run/systemd/journal/syslog type=DGRAM
rsyslogd 27383 syslog 4r REG 0,4 0 4026531995 /proc/kmsg
in:imuxso 27383 27385 syslog txt REG 253,1 599328 24061 /usr/sbin/rsyslogd
in:imuxso 27383 27385 syslog mem REG 253,1 19936 29294 /usr/lib/rsyslog/imklog.so
in:imuxso 27383 27385 syslog mem REG 253,1 33072 29296 /usr/lib/rsyslog/imuxsock.so
in:imuxso 27383 27385 syslog mem REG 253,1 23648 29279 /usr/lib/rsyslog/lmnet.so
in:imklog 27383 27386 syslog txt REG 253,1 599328 24061 /usr/sbin/rsyslogd
in:imklog 27383 27386 syslog mem REG 253,1 19936 29294 /usr/lib/rsyslog/imklog.so
in:imklog 27383 27386 syslog mem REG 253,1 33072 29296 /usr/lib/rsyslog/imuxsock.so
in:imklog 27383 27386 syslog mem REG 253,1 23648 29279 /usr/lib/rsyslog/lmnet.so
rs:main 27383 27387 syslog txt REG 253,1 599328 24061 /usr/sbin/rsyslogd
rs:main 27383 27387 syslog mem REG 253,1 19936 29294 /usr/lib/rsyslog/imklog.so
rs:main 27383 27387 syslog mem REG 253,1 33072 29296 /usr/lib/rsyslog/imuxsock.so
rs:main 27383 27387 syslog mem REG 253,1 23648 29279 /usr/lib/rsyslog/lmnet.so

Question: how do I kill/remove/get rid of an entry like
rsyslogd 27383 syslog mem REG 253,1 1868984 1995 /lib/x86_64-linux-gnu/libc-2.23.so?

sudo kill 27383 or
sudo kill -9 27383 won’t suffice.

#20

You don’t. Not if you want the system to function. Those are shared libraries that every program is going to need to run. If you remove libc literally NO PROGRAM on your box will function.

Is there anything on that list that says (deleted)? I don’t see anything. Move on.

I gave you that command to illustrate on my system how rsyslog has logs open. Did you run the grep deleted version? That one would be relevant. Read back through what I said again.

At this point I think you need to ask yourself why you’ve spent weeks trying to solve this problem without rebooting, when a simple reboot would fix it. This is clearly beyond your understanding of Linux, and designing systems to work without rebooting is a horrible idea: you WILL need to reboot for an OS update at some point, so plan for it; don’t avoid it.

E.g., read up on Netflix’s Chaos Monkey philosophy.
