Is there a straightforward, simple, clear, tested and proven way to install a bug-free version of Lucee anymore?

Putting aside all the criticisms, I’d like to focus on your last note and your observation that all attempts to get things going have eventually failed to run well (out of memory errors), even when (it seems) things may have worked at first. (Or do I have that wrong?)

Thinking back on your last good, working variant, isn’t it possible that Tomcat (or whatever you’d deployed Lucee on) had been changed by you (even perhaps several years ago) to have a higher Java heap/max memory pool size?
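(For reference, if someone had bumped that up on a Linux install, it would typically look something like the below in Tomcat’s bin/setenv.sh. I’m only sketching the idea; the exact file and path depend on how Lucee was installed, and the values are just examples.)

    # tomcat/bin/setenv.sh (location depends on the install)
    # -Xms is the starting heap size, -Xmx is the max heap the JVM may grow to
    CATALINA_OPTS="$CATALINA_OPTS -Xms512m -Xmx2048m"
    export CATALINA_OPTS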

Maybe all that’s wrong is that every new variant you’ve implemented has had the default (which is small). Might you still have ANY copy anywhere (even if just a backup, not running) of how things were configured when things worked?

If not, let’s consider the contention that “nothing has changed” about your code, since things last worked, such that you contend it MUST be something new and different in Lucee (or some component it uses, like mod_cfml). Again, for now, I want to put aside consideration of your many other disappointments with the current state of Lucee and its docs.

Might it just be that your new implementations are suffering from excessive traffic that might be brought on by hackers, thieves, and other miscreants, on top of search engine crawlers of many shades? Perhaps the former have picked up in volume coincidentally during this move to a new platform that you’ve been trying.

It’s not ludicrous to consider: I’ve helped many folks in recent weeks who found a tremendous increase in automated traffic, which we traced to requests bearing user-agent headers they’d never seen before. (And whether using Lucee or CF, and whether fronted by Apache, IIS, or otherwise, there are ways to block such unwanted user agents. It’s worked remarkably well, though some might think the bad guys would just change user agents. Some just can’t be bothered.)
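(If it helps to make that concrete, here’s roughly what the Apache side of such blocking can look like. This is only a generic sketch for Apache 2.4 using mod_setenvif, and the user-agent patterns are placeholders, not a recommended list.)

    # Tag requests whose User-Agent matches patterns you've decided are unwanted
    SetEnvIfNoCase User-Agent "(BadBot|SomeScraper|python-requests)" blocked_ua
    <Location "/">
        <RequireAll>
            # Serve everyone except requests tagged above
            Require all granted
            Require not env blocked_ua
        </RequireAll>
    </Location>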

Now, I realize you may say this is more of that mumbo jumbo server talk that you’ve never had to understand. Or you may grok it and want to contend it can’t be this because of reason x, y, or z.

But I will say that since you have FusionReactor in place, we can find with 100% clarity where the problem is: whether it’s about requests piling up, or high rates of requests, or heap (any of its parts) running out, or GCs being at issue, and on and on.

I get it, though: you’ve not been able to make heads or tails of it as you’ve assessed it. I totally understand that. In fact, I created a 4-part intro video series a few years ago walking through how to use it to troubleshoot such problems. Yes, it will take a few hours of your time. If you’re really wanting to know how to leverage it for such challenges (whether with Lucee or CF or other things it can monitor), it will get you on a very good footing. Just Google: arehart fusionreactor playlist, or see this url:

Or as others have mentioned, folks who prefer direct assistance can get it from folks like myself. I realize many people who use Lucee for free are not interested in paying for help. That’s where the community and docs usually help folks otherwise, or grit and determination win out when those fail them. Yours is a sad tale indeed, and you’ve raised some points worth considering, but you can’t push rope. Time will tell if anything is done or if you’ll win many folks to your perspective. As you said, some folks are satisfied with things as they are, or feel anyone who feels otherwise can “do something about it”. That’s a hornet’s nest you’re poking.

That’s why I’ve chosen here to focus on the more technical aspects of your server troubles, and on helping you better leverage FR if you care to. For now, I’ll leave it at that (hoping it helps you, without getting stung myself).


Thank you for your reply, @carehart. To answer some of your thoughts:

  • I still have my old server, just sitting there, doing nothing. I am keeping it around until I have about 3 months of stable operations on a new server. “Why not go back to the old server?” I can’t. The old server is a security risk, but it’s isolated and just in hibernation.
  • My last working variant had zero modifications to Tomcat of any kind. There is no universe where I would even have an understanding of how to modify Java heap/max memory pool settings, much less a desire to actually implement something I don’t understand on my server. Whatever came from the default installation profile is what was successfully running for multiple years.
  • I’m able to view traffic for the last six months and there are zero changes to the patterns. No new crawlers, no hacking/thieving, etc. Same old boring server. The overwhelming majority of traffic comes from my scheduled tasks completing dumb little, highly efficient jobs. Regardless, I have fail2ban protections against weird behavior.
  • I will look into the playlist and see if it brings any insight. I’ve already spent dozens of hours on this issue; what’s four more? Though I’ll perhaps return to my earlier point about the level of difficulty and time commitment this very basic installation seems to be requiring – and whether that complexity and challenge for new entrants is a good idea for Lucee’s future.

I appreciate your response and willingness to help me better understand these issues without accusing me of being an intemperate monster maligning the community and unfairly complaining about something I’m getting for free.

Some people get into woodworking, some people garden, and some people write dumb little CFML apps. We all have our things. Right now, I can’t do what I love, I’ve spent hours and hours of my life on this install to no avail, and it’s deeply sad. That people would take this as “complaining” is really hard for me.

Lucee used to be accessible and easy, now it’s not. I recognize that the community’s previous commitment to low-barrier access to installation and configuration guides was a priority and now it’s not. If I hadn’t invested 20+ years into CFML, I’d have left in minutes with these troubles. But I have nowhere else to turn.

By the way, the only notable place I see something that seems/feels weird in FusionReactor, to me, is in the “Memory” section under “View Heap,” where I see the class name byte[] taking up the overwhelming majority of “Live Size %,” with a number that fluctuates anywhere between 20 MB and 150 MB over the course of several seconds and then goes back down. When I try to examine byte[] further, I get some really useless stuff, like layer upon layer upon layer of stuff like this:

byte[]                                                          193,629  22,273,120 (100.0%)
  Reference: java.lang.String.value                             184,403  10,691,608 ( 48.0%)
  Reference: java.util.concurrent.ConcurrentHashMap$Node.key     33,502   1,685,944 (  7.6%)
    Reference: java.util.concurrent.ConcurrentHashMap$Node[]     26,047   1,324,912 (  5.9%)
  Reference: java.util.concurrent.ConcurrentHashMap$Node.next     7,455     361,032 (  1.6%)
  Reference: java.util.LinkedHashMap$Entry.key                    15,718     967,384 (  4.3%)
    Reference: java.util.HashMap$Node[]                           11,975     738,768 (  3.3%)
  Reference: java.util.LinkedHashMap$Entry.before                  3,078     199,800 (  0.9%)
  Reference: java.util.HashMap$Node.key                           19,204     896,616 (  4.0%)
    Reference: java.util.HashMap$Node[]                           13,860     686,248 (  3.1%)
  Reference: java.util.HashMap$Node.next                           3,982     176,240 (  0.8%)
  Reference: java.util.HashMap$Node.value                         11,902     859,400 (  3.9%)
    Reference: java.util.HashMap$Node[]                            6,931     453,520 (  2.0%)
    Reference: java.util.Map$Entry[]                               3,303     293,824 (  1.3%)

Really, none of this means anything to me; there’s nothing in my CFML code that would even remotely reference something like this. I have opened the little triangles on every line in these dumps and they make no references to any existing code, page titles, tags, anything. Just byte[] and a bunch of Java names that mean nothing to me. I mean, “HashMap$Node[]”? OK. Not a big help. Just endless long stacks of stuff about “HashMap” and “ConcurrentHashMap,” and no matter how many little triangles I dive down, they just end with something like “JNI Local” or “Other.” And more HashMap. Oh, how I wish it gave a reference to a CFML tag! Or a page! Or anything! I have never once had to learn a thing about Java class names in my entire life. I don’t know what any of them mean!

Somebody mentioned configuring Lucee to do a “Heap Dump” whenever the server encounters an OOM event. But I can’t find how to do that and in the few examples I found online, the server never creates the heap dump, so those don’t work. I understand you can analyze a heap dump for clues but … Lucee isn’t making a heap dump. However, if it’s anything as useful as the “heap snapshot” I’m viewing in FusionReactor, well, I’m not sure how seeing hundreds or perhaps thousands of iterations of the phrase “HashMap” is going to help me understand anything.
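For reference, the examples I found all boiled down to adding JVM arguments along these lines to wherever the other JVM args live (the dump path here is just what one example used):

    # Ask the JVM to write a .hprof heap dump automatically when an OutOfMemoryError is thrown
    -XX:+HeapDumpOnOutOfMemoryError
    -XX:HeapDumpPath=/tmp/lucee-oom.hprof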

These are the struggles I am running into, this need to apparently become an advanced, professional-level memory leak debugger just to get my previously working and completely unaltered ultra-stable website to work in a new installation. Somebody said I need to learn Java. Oh, OK! Piece of cake, let me get right on that. I didn’t need to know it for 20+ years of CFML development, but now I do? Just to make my server setup work?

Something is lost here. A more cynical part of me could think that a sinister cabal of Lucee developers designed seemingly easy-to-use software to lure in novices and amateurs, get them hooked on it, and then gradually force them into paid consulting services to keep their code running as the years go by. That’s a joke, but I’d be lying if I said that wasn’t how things were starting to feel.

I seriously think this thread / trainwreck could benefit from two things:

  • @gooutside: we get it, you’re frustrated. No need to keep repeating that. It’s not helping.

  • Everyone reacting to the same, especially those going “if you don’t like it do a pull request”, or “you don’t even help out here”. Seriously? STFU. You’re not helping, and this sort of thing is never helpful.

@carehart nice work ignoring all of the above, and trying to help.

Can we draw a line under all the histrionics and virtue signalling and get on with sorting this out?


@gooutside you did not feed back on my suggestion of downgrading Lucee to a more-likely-to-be-stable 5.4.4.38 to see if that helps. Do it, check, report back.

As for the install docs for how you did stuff in the past not being maintained… Things in IT move quickly. Approaches to doing things change, and when certain practices are superseded by new ones, the docs for “the old way” will cease to be maintained, largely cos the benefit from doing so is outweighed by the effort. This is how it is. Get used to it. This is webdev. It evolves quickly. You need to keep up. This is on you.

Tomcat and Java themselves will have been updated along the way, and they might be less tolerant of edge-case practices that used to work and now don’t work so well. This is normal. You might need to change your code. This is why I said “it’s gonna be your code”. This does not mean your code has changed and you’ve introduced bugs. It means there’s a lot of moving parts and some under-the-hood libs, modules, apps, services might have changed, requiring a change in approach in your code.

Thanks for confirming there’s no traffic pattern changes.

Instead of installing Lucee directly on the Ubuntu VPS, have you considered using Lucee’s Docker image instead? Yes I realise this will require you to get up to speed with Docker, but there’s not much to it, and it’s all well documented. I was able to teach myself to be productive with it in a coupla sessions, and I now “own” a bunch of production Lucee apps running on Docker, making my boss a bunch of dosh. And I hate doing this stuff; I just wanna be a dev, not someone dicking around with servers. But needs must, right? Almost all of the dicking-around you’d need to do to configure Lucee, Tomcat, Java, Nginx (not Apache, but I doubt you care) has been done for you. It seriously “just works”.
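To give you an idea of how little there is to it, spinning up the official image is roughly this (a sketch only: check Docker Hub for the current lucee/lucee tags, and the port and paths here are just examples to adapt):

    # Run the official Lucee image, mapping a local webroot into the container
    # (tag, port and paths are illustrative; see the image docs for your version)
    docker run -d \
      --name mysite \
      -p 8888:8888 \
      -v /home/me/mysite/wwwroot:/var/www \
      lucee/lucee:5.4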

^^^ This is an example of things moving on in the industry. I would never dream of installing Lucee itself onto a box / VPS any more. I’d use Docker cos it’s where my expertise (such as it is) lies. Others here will run a server via CommandBox. I have little experience with the latter, but it is well documented and it’s really bloody easy. It’s a solution designed for ppl like you (no diss) who don’t necessarily have the expertise in modern app server config strategies.

As for getting to the bottom of the memory leak thing. What environment is the new set-up in ATM? I presume it’s still in a lab environment whilst you get it stable? So have you load-tested parts of the app to try to isolate what part of it is causing the memory problem? This should be easy enough to do.
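Even something as blunt as ApacheBench pointed at one chunk of the app at a time, while you watch the memory graphs, would do the job (URL and numbers are obviously placeholders):

    # Fire 2000 requests, 10 at a time, at one page and watch what the heap/metaspace does
    ab -n 2000 -c 10 https://test.mydomain.com/some/page.cfm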

FWIW I agree with your FusionReactor comment that drilling down to find out that Byte arrays are taking all the heap space is unhelpful. However FR also gives a lot more metrics than that, and you should be able to watch the shape of the traffic to give an indicator as to what’s kicking things off.

You are going to have to do some work to help yourself here. Sorry, but that’s the way it is. There are two options: get someone else in to help, or do it yourself. You have discounted the former, so it’s gonna need to be the latter.

Complaining (rightly or wrongly / validly or invalidly) that docs are out of date is not going to move you forward, so… we’re done with that shit now right? And to the trolls (which is what you are): if you can’t help sort out the problem… go find another bridge to play under, OK?

Let’s focus on getting your problem sorted out.

It might actually be an idea to abandon this thread as it’s a mess, and raise another thread along the lines of “I have [this specific problem], I have [done this troubleshooting], [this] is some relevant info, and [these] are my findings”. Keep it tight and on-point and we can try to help.


@gooutside What @AdamCameron just said is really true. Shouting all that out was totally wrong on my part and doesn’t help at all. Sorry for being very emotional. I should have slept on it before clicking the “send” button. That was wrong on my part. Please accept my apologies, and let’s get your app working. I promise to create new guides as soon as I can.


Thank you for your suggestions, @AdamCameron. Let me try responding to some of them, as best as I can.

This is the first thing I tried to do. I have abandoned any attempt to use Lucee 6 completely, I’ll wait a long time before I go down that road. I went back to the last known working version of Lucee that was stable for my old server, and the problems continue.

Yes, I know. Like I said, I’ve used the internet as a teacher for 20+ years of CFML development, I’m aware that things evolve. Without belaboring the point even more, I am just trying to note that the current state is not conducive to low-barrier entry points. It’s probably also worth noting that this is a problem that happens after multiple attempts on various brand-new VPS instances.

I’ve heard “Docker” around the internet for some time, and have looked into it. When I last looked, it seemed considerably too complicated for my needs, but perhaps that’s changed? Here’s what seemed to be an issue, before:

  • Considerably more expensive than my $10 AWS Lightsail VPS.
  • Difficult for file management (my Ubuntu/Lucee server has an FTP server that accepts webcam images, processes and organizes them, and coordinates with S3)
  • Disk space, generally, seemed insufficient.
  • Instead of hosting MySQL on the VPS, I’d need to start up an RDS instance, which is transformationally more expensive and overkill.
  • And, of course, the “learning curve” I’d need to engage in, which is not trivial, and ultimately I worry it wouldn’t be all that helpful if Docker just isn’t even a good fit for my needs.
  • Seems to have a lot of relationships with GitHub, which is another thing to learn how to use?

I wish I could sound smart here, but I can’t. I know that this is not common practice, but it has served me well for 20 years: I develop and operate on the live server. I am well aware that this is uncommon and even discouraged. The server is live, and the problems are live, because the server processes live data, and I don’t know how to have live data (like incoming webcam images) and API calls (to data sources) into my MySQL databases take place without causing enormous conflicts when switching from “test” to “live.” As best as I can imagine, you’d need to set up intermediary file handling before images reached the live and test servers, for example. That’s a level of complexity that is just too much for my basic, simple project.

I have disabled all scheduled tasks and re-enabled them one by one, and no single one points to a failure point. There is no “common denominator” in the failure time. It’s not tied to any particular page load, or to any particular sequence of events. It just happens at some random point, sometimes within 15 minutes, sometimes within 7 hours. No pattern. Just random.

Any ideas on a place to keep an eye on things? By the way, after every crash, all of the data in FusionReactor resets and disappears, and I’m not sure what to even do about that.

I appreciate your ideas. I solve 99 percent of my issues myself; reaching out is uncommon and a last resort for me. I intend to fix this as best as I am able.

Well, I’ve spent many, many hours just staring into FusionReactor and Eclipse MAT to try and understand what’s happening. I revisited this thread and saw this comment, and it set me off, a little, that I could “watch the shape of the traffic” to understand what is “kicking things off.”

One thing FusionReactor seems really quite adept at is letting me know when a process is causing a big problem. As in, “uh oh, this page takes 10 seconds to run, and here’s why.”

What FusionReactor is not telling me is why my memory usage just goes up, up, up, up, up over the course of a few hours and then … crash. Here’s a very, very, very typical view of a one-hour increment log archive for FusionReactor.

They’re all pretty much like this. There’s no big “spike” in memory right before the end, no big “uh oh!” page load that just comes up. No “kicking things off,” basically. It’s just … slow, slow, slow memory increase until the crash. No single page is a common culprit. The times are random, sometimes 20 minutes apart, sometimes 2 hours. Nothing triggers anything. It just happens, random times.

What I’m wondering is … how can FusionReactor or Eclipse MAT help me understand why the memory is filling up slowly, over time. The server is active. It has lots of scheduled tasks that run all day and night. They’ve never once been anything even remotely close to a memory issue on my old server. They’re not complicated tasks. There are lots of MySQL calls, some file processing.

And, again, I am using the same version of Lucee and the same version of Java as my old server. Memory buildup was never an issue before. Nothing in the CFML code has changed whatsoever. I installed MySQL 8, but that seems unlikely to be the issue unless there is some weird thing with Lucee’s memory usage in MySQL 8 vs. MySQL 5.7.

I cannot figure out why my memory just keeps building, building, building, building, building over the course of time. Does anybody have ideas on how I can use FusionReactor or any other tool to better understand why things are burying themselves like this? (I tried using the “leak suspects” tool and I got more useless info about byte[] and HashMap.)
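Is the answer really just to keep taking heap dumps by hand, something like the below, and comparing them in MAT? (I pieced this together from what I could find; the PID and path are placeholders.)

    # Dump only live objects from the Tomcat/Lucee JVM into a file Eclipse MAT can open
    jmap -dump:live,format=b,file=/tmp/lucee-heap.hprof <tomcat-pid>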

Also, @carehart, I watched much of your video on post-crash troubleshooting and, regrettably, it seems geared towards identifying problem requests that cause sudden problems and not to the slow accumulation of memory usage over the course of several hours.

And, back to the frustrations around installation, is there any possibility that the weird, inscrutable connections and linkages between Apache/AJP/Tomcat/Lucee are causing some kind of scenario where the memory just builds up over time? That is one of the only differences between the old server and the new.

Somehow things are just staying in memory. I’m not asking them to. Lucee is choosing to retain them. I don’t know why. And I don’t know what Lucee is retaining except some “HashMap” and byte[], whatever that means. I especially don’t know why the same version of Lucee and the same version of Java would cause this kind of behavior to happen.

I think I see a key new fact in your reply: the rise in memory is NOT in the heap, but rather in “non-heap,” specifically. So I’m curious: when you originally reported getting OutOfMemory errors, you left it at that. Did it say anything more? Heap space? Metaspace?

I’m guessing your issue is the latter. And you can confirm things via that FR archived metrics page, and specifically its memory category (on the left in black, not shown in that screenshot), then choose the metaspace log. Does THAT show this same constant move upward? If so, there’s your culprit.

So first, what controls that? A MaxMetaspaceSize argument in the JVM args. Do you see one? If using Lucee on Windows as a service, you’d view/change it with the Tomcat9w.exe program within Lucee’s Tomcat folder. You can see the specific path using the properties of the Windows service itself.
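(On Linux, rather than Tomcat9w.exe, that argument typically goes in with the other JVM args, often in Tomcat’s bin/setenv.sh; the path and value below are only illustrative, since the Lucee installer’s layout may differ.)

    # Caps the native memory the JVM may use for class metadata (metaspace)
    CATALINA_OPTS="$CATALINA_OPTS -XX:MaxMetaspaceSize=256m"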

Ugh, hit a button that submitted this before I was done… Will finish in another reply.

But I’m thinking you may not see a limit. The graph you show doesn’t indicate one. The metaspace log will show 3 values if there is a max set.

If instead there’s none, then the JVM is free to take memory from all available OS memory. And the problem may be that it’s running out (your OS memory). You COULD set a max, but that’s not going to SOLVE the problem.

As you would rightly ask, the question would be what’s CAUSING the rise in metaspace. And that’s not answered where you’re looking, doing heap analysis.

Instead, look to FR’s tracking of class loading (another graph). Is that rising? If so, there are additional techniques to track WHAT those are… and that (or merely the info above clarifying that it may be a metaspace problem) may be what the Lucee folks need to figure out what’s been hurting you.
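(One simple way to see WHAT is being loaded, if it comes to that, is to temporarily turn on the JVM’s own class-load logging; on Java 8 that’s the flag below, with output going to Tomcat’s stdout/catalina.out, while Java 9+ offers the richer -Xlog:class+load option.)

    # Temporary JVM arg: prints a line for every class loaded and unloaded
    -verbose:class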

I’m afraid I have to run now for a meeting so must leave it at this. I could say more, which you might value or you may feel is all just more server mumbo jumbo. I realize you just want the madness to stop. I hope the new clues you and I have shared may aid others in helping you.

Good morning, @carehart. Thanks for responding. Here’s some of my replies:

I’m using Lucee on Ubuntu 22.04.3, but, regardless, I haven’t altered any memory settings beyond Lucee’s stock configuration (nor did I do this on my previous functioning server). My memory settings are whatever Lucee was configured with from scratch, which was absolutely sufficient in the past.

The system memory (when I view it through htop) is, indeed, filling up piece by piece to the upper limit, which, again, is weird.

Correct. I would rather not fiddle with memory/GC settings because I’d like to solve the underlying problem of steadily increasing memory until a crash. The problem is somehow in the way Lucee (and, I still think, some kind of issue with Apache/mod_cfml/whatever) is filling up that memory.

Yes. It continuously rises throughout the life of the server, just constantly rising. Here’s something from FusionReactor (notice how few classes are unloaded):

This is just common across the board: constant class loading, and very little if any class unloading.

Perhaps this might be of some interest, I don’t know: I changed my “application” log level to “DEBUG” and looked at my logs, and I see lots and lots of “Request start:” entries with the page, and then lots of “load class from binary [{my pages here}]” entries. I don’t know if this means anything.

And, lastly, I just want to point this out, because it’s things like this that have me super concerned that the mod_cfml/Apache/AJP/whatever connection is part of the issue, here. (I could be totally wrong, too!) We are all familiar with the “Web contexts” at the bottom of the Server administration page.

Here is one for my old server (nginx-ubuntu-lucee):

And here is one for my new server (default install with Apache and mod_cfml):

Take careful note – the new server shows port :443 but an http:// for the URL. The old server shows https:// and :443 with the domain name.

I do not host anything out of the /var/lib/tomcat(8)/webapps/ROOT folder. That’s never bothered me before, it just shows up, seems harmless.

I don’t even know where to begin to adjust these settings, because it seems like mod_cfml just does it on its own. Sometimes, when the new server restarts, it will show port :80 instead of :443. Eventually it sometimes switches to :443. Again: no idea how that happens or why.

I worry because this seems weird to me and I have no idea if this could be contributing to the Lucee behavior that I’m seeing, like thinking it’s opening classes but never closing them, because, of course, “http://www.mydomain.com:443” is in no way a valid URL.

I just thought I’d throw that out there as a possible contributor.

Depending on your application, the loaded classes growing and never being unloaded might be a clue.

Check out [LDEV-4739] - Lucee

and try a snapshot build with the fix in

I’ve tried a few different versions, 5.4.3.16, 5.4.4.38 – they all have the same issue. Would anybody happen to know if the Apache2 configuration thing is a possible issue? I’m just really curious if the URL for the “Web Context” being structured in such a way might contribute to this behavior?

Could an Apache2/Mod_cfml/Lucee configuration issue be contributing to classes growing and growing and growing and not getting unloaded?

Did this ever get resolved for you, @gooutside?

I ask not only since there was no further discussion, but also because I see that in the ticket Alex pointed to (before your last comment) it seems this issue (or one very similar) was resolved with a fix implemented in 5.4.4.32, if I’m reading things right.

But your last note had said you’d tried even 5.4.4.38 and still had the problem. If that’s so, you may want to add a comment to that ticket, and here, to let us know where things stand. It was certainly a very interesting troubleshooting challenge.