Just thought I would relay an interesting day I had tracking down a website problem.
I have two separate Hyper-V VMs on my server: one runs a Lucee site and the other a WordPress site.
I woke up that morning to WebGazer emails telling me my Lucee site was going up and down every 5-30 mins, for random amounts of time. I went to my home office, VPN'd in to my computer at work, and the website was working from there. Then I got my laptop and connected over my home internet, and the website was down whenever WebGazer reported it down. Internet on the work computer was fine, and my WordPress site was working fine from home. Weird…
- All logs (Tomcat, Lucee, Apache, Ubuntu) were fine.
- Rebooted Lucee website VM just in case. Didn’t fix.
- Looked at the firewall (and did a firmware update that was needed for something else). Didn’t fix.
- Rebooted internet modem in case. Didn’t fix.
- Updated Tomcat and Java (the installed Tomcat version had a minor CVE). Didn’t fix.
- Updated Lucee. No fix.
- Through all of this, the site would work for 5-30 mins and then go down again for anyone coming in from the Internet. Never had a problem from my work computer.
- Banged head repeatedly
- Ran tcpdump on Ubuntu VM and didn’t see anything weird.
- Banged head repeatedly some more
7 hours later I finally found the culprit!!!
Our DHCP server sits on a Windows SBS server (upgrading to Server 2019 this year). I had run one of the scripts to renew a self-signed cert last week. It had apparently reset the default DHCP exclusions for IPs without me knowing, and DHCP then handed the IP for my site out to a phone. Whenever someone used the phone, traffic from the Internet got routed to it and timed out. My work computer is on the same switch as the web server, so I guess that is why I never had problems from there. Fixed the exclusions, cleaned up DHCP and DNS, moved the website to a clear IP, and redid the firewall route. Took the rest of the day off to not think anymore.
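
For anyone who wants to catch this kind of thing before it eats a day, here is a rough sketch in Python of the check I would now run after touching the DHCP server: list any static IPs that sit inside the DHCP scope without being covered by an exclusion range. The function names and all the addresses are made up for illustration, and the scope and exclusion values would have to be copied by hand from the DHCP console. It does not talk to the DHCP server itself.

```python
import ipaddress

def ip_range(start, end):
    """Return the set of addresses from start to end, inclusive."""
    first = int(ipaddress.IPv4Address(start))
    last = int(ipaddress.IPv4Address(end))
    return {ipaddress.IPv4Address(n) for n in range(first, last + 1)}

def unprotected_static_ips(scope, exclusions, static_ips):
    """Static IPs inside the DHCP scope that no exclusion range covers."""
    leasable = ip_range(*scope)
    for start, end in exclusions:
        leasable -= ip_range(start, end)
    return sorted(
        ip for ip in map(ipaddress.IPv4Address, static_ips) if ip in leasable
    )

# Hypothetical values for illustration only, not my real network.
scope = ("192.168.1.1", "192.168.1.254")
exclusions = [("192.168.1.1", "192.168.1.10")]
static_ips = ["192.168.1.5", "192.168.1.20"]

for ip in unprotected_static_ips(scope, exclusions, static_ips):
    print(f"{ip} is static but could still be leased by DHCP")
```

With these made-up values only 192.168.1.20 gets flagged, since 192.168.1.5 falls inside the exclusion range. An empty result would mean every static IP is protected.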