A review of Lucee start and restart times on Tomcat 8


#1

I created a bunch of time measurements throughout the Lucee loader and core in order to measure what is taking all the time and found that nearly all of the time occurs outside lucee’s code or is doing I/O operations. I know lucee records some of the measurements in its own logs, but that wasn’t recording the full amount of time Lucee code is running during startup. Now that I have isolated the hot spots, I may work on those areas or eliminate some of these features.

When we click restart engine in the Lucee admin, the code tries to stop and remove the OSGi bundles before running a reduced version of the startup again. In addition to that looking unnecessary to me, the sleep(5000) looks wrong considering it was only taking 150ms of time to finish that code and it could use a loop with logic instead if it really needs this code to run. I don’t think Felix even requires you to keep reloading them like this, but maybe you have an edge case to worry about. I think Felix bundle installation is more of a one time occasion, and then you only have to do it again on upgrades for specific bundles. Lucee shutdown becomes like 150ms instead of 5150ms now with no consequence that I can see yet, which is pretty awesome, since this saves me significant time iterating on changes to Lucee now.

The full Tomcat/Lucee in windows takes about 10 seconds to shutdown, and about 4 seconds to start again. Sometimes more then that. Also, anyone using multiple contexts would experiencing much worse startup performance probably. I switched to developing everything as one context over 7 years ago to avoid that problem, but they could at least be made to load in parallel if others are using this feature.

1500ms of the start up time is happening after lucee CFMLServlet starts, so it seems like the majority of the wait to start is happening in Tomcat and servlet jars and we’d wait at least 2 seconds even if Lucee did nothing. This startup time also includes running and finishing the first CFML request, not just starting the server. I’m also using a fast SSD with a core i7 cpu, so it could be worse then this for others.

Felix is starting and loading all the OSGi bundles in about 500ms. 200ms of that could probably done in parallel saving up to 150ms. Most of it occurs in Felix outside of Lucee’s code unfortunately. Even if you delete all of Lucee’s extensions in a custom build, I think it would only start up to 100ms faster.

I think some of these startup and shutdown processes could verify if they are needed more, instead of always doing them for OSGi loading and the XMLConfigWebFactory and deploying server/web context files. At a glance, I think it could be made up to 4 times faster with the same feature set. With some reorganization, perhaps it is possible to get the OSGi, XML and Context loading systems all running on separate threads in parallel since they are each around 500ms+. Additionally, I think some of the code could be deleted or made only to happen during upgrade/install events and even then, it is rare for all the files to need to change.

On my version, I might not even have most of these features that slow down start-up eventually. I think there only needs to force a cfclasses and temp directory to exist for the language to work, and I don’t want the Lucee admin, the cfml functions, or any of the xml code later. I think the majority of the files should be trusted and already exist on repeat startups. It could wait until a user explicitly requests an upgrade before trying to repeat some of these steps, or upgrades could be handled outside the project with a command line tool instead. Sometimes tracking md5 hashes or date/size of the data is all you need to eliminate lots of processing time.

After the CFML engine is loaded, it seems like the first CFML request takes another 200ms to 300ms startup time for a hello world kind of page that usually takes less then a millisecond. Some of that is probably servlet and initialization work that might be able to sped up or eliminated.

A clean start would probably take under a second to be able to run CFML code if there was no servlet, no XML config system, no dependencies and no OSGi for the simplified core language, which is would be 3 to 5 times faster then the full Lucee feature set allows.

Most of these startup features are there in order to make Lucee work in multiple language modes, the ability to extend it externally easier and to make the Lucee admin able to work. The core language itself is much faster/simpler to build/deploy/start if you forego these features and look at developing your own fork or looking into what I’m doing with Lucee.

Here is the raw log of measurements I took for a clean start of my modified version of Lucee 5.3.2.16. I made up some of the names based on the surrounding code. These don’t include the time for tomcat to start. Each line is measuring the time spent since the previous log item in sequence. I don’t think threads are used anywhere in startup yet.

1ms for CFMLServlet init start
0ms for CFMLServlet init Servlet init end
5ms for CFMLEngineFactory getInstance begin
3ms for CFMLEngineFactory getInstance after new factory
0ms for CFMLEngineFactory getInstance after readInitParam
0ms for CFMLEngineFactory initEngine begin
1ms for CFMLEngineFactory after patch check
1ms for CFMLEngineFactory initEngine before loadBundles
0ms for BundleLoader loadBundles begin
3ms for CFMLEngineFactory before new Felix
70ms for CFMLEngineFactory after new Felix
182ms for CFMLEngineFactory after start Felix
0ms for BundleLoader loadBundles after close all bundles
54ms for BundleLoader loadBundles after addBundle set 1
1ms for BundleLoader loadBundles after addBundle set 1
6ms for BundleLoader loadBundles after addBundle set 1
3ms for BundleLoader loadBundles after addBundle set 1
4ms for BundleLoader loadBundles after addBundle set 1
3ms for BundleLoader loadBundles after addBundle set 1
3ms for BundleLoader loadBundles after addBundle set 1
3ms for BundleLoader loadBundles after addBundle set 1
2ms for BundleLoader loadBundles after addBundle set 1
2ms for BundleLoader loadBundles after addBundle set 1
3ms for BundleLoader loadBundles after addBundle set 1
4ms for BundleLoader loadBundles after addBundle set 1
3ms for BundleLoader loadBundles after addBundle set 1
3ms for BundleLoader loadBundles after addBundle set 1
2ms for BundleLoader loadBundles after addBundle set 1
3ms for BundleLoader loadBundles after addBundle set 1
3ms for BundleLoader loadBundles after addBundle set 1
2ms for BundleLoader loadBundles after addBundle set 1
2ms for BundleLoader loadBundles after addBundle set 1
2ms for BundleLoader loadBundles after addBundle set 1
4ms for BundleLoader loadBundles after addBundle set 1
2ms for BundleLoader loadBundles after addBundle set 1
3ms for BundleLoader loadBundles after addBundle set 1
3ms for BundleLoader loadBundles after addBundle set 1
2ms for BundleLoader loadBundles after addBundle set 1
3ms for BundleLoader loadBundles after addBundle set 1
3ms for BundleLoader loadBundles after addBundle set 1
3ms for BundleLoader loadBundles after addBundle set 1
3ms for BundleLoader loadBundles after addBundle set 1
1ms for BundleLoader loadBundles after addBundle set 1
2ms for BundleLoader loadBundles after addBundle set 1
2ms for BundleLoader loadBundles after addBundle set 1
3ms for BundleLoader loadBundles after addBundle set 1
4ms for BundleLoader loadBundles after addBundle set 1
4ms for BundleLoader loadBundles after addBundle set 1
4ms for BundleLoader loadBundles after addBundle set 1
3ms for BundleLoader loadBundles after addBundle set 1
2ms for BundleLoader loadBundles after addBundle set 1
2ms for BundleLoader loadBundles after addBundle set 1
2ms for BundleLoader loadBundles after addBundle set 1
2ms for BundleLoader loadBundles after addBundle set 1
1ms for BundleLoader loadBundles after addBundle set 1
2ms for BundleLoader loadBundles after addBundle set 1
2ms for BundleLoader loadBundles after addBundle set 1
1ms for BundleLoader loadBundles after addBundle set 1
3ms for BundleLoader loadBundles after addBundle set 1
2ms for BundleLoader loadBundles after addBundle set 1
1ms for BundleLoader loadBundles after addBundle set 1
2ms for BundleLoader loadBundles after addBundle set 2
10ms for BundleLoader loadBundles after addBundle core
1ms for BundleUtil.start begin
84ms for Bundle.start before bundle load wait
0ms for Bundle.start bundles loaded
12ms for CFMLEngineFactory initEngine after loadBundles
127ms for CFMLEngineImpl getConfigServerImpl begin
1ms for XMLConfigServerFactory begin
14ms for XMLConfigServerFactory after loadDocumentCreateIfFails
1ms for XMLConfigServerFactory after getDocumentElement
152ms for XMLConfigServerFactory after new ConfigServerImpl
18ms for XMLConfigWebFactory load begin
19ms for XMLConfigWebFactory load after deployWebContext
0ms for XMLConfigWebFactory load after deployWeb
1ms for XMLConfigWebFactory load after loadConfig
78ms for XMLConfigWebFactory load after many load operations
7ms for XMLConfigWebFactory load after loadLib
110ms for XMLConfigWebFactory load after many load operations
47ms for XMLConfigWebFactory load after many load operations
41ms for XMLConfigWebFactory load after many load operations
9ms for XMLConfigWebFactory load after many load operations
0ms for XMLConfigWebFactory load end
0ms for XMLConfigServerFactory after load
3ms for XMLConfigServerFactory after createContextFiles
0ms for XMLConfigServerFactory end
0ms for CFMLEngineImpl getConfigServerImpl after newInstance for the xml configServer
2ms for CFMLEngineFactory initEngine before getUpdateType
0ms for CFMLEngineImpl getConfigServerImpl begin
0ms for CFMLEngineFactory initEngine after getUpdateType
0ms for CFMLEngineFactory initEngine end
0ms for CFMLEngineFactory getInstance after initEngineIfNecessary
0ms for CFMLServlet addServletConfig start
28ms for CFMLServlet loadJSPFactory start
0ms for CFMLEngineImpl getConfigServerImpl begin
1ms for CFMLEngineImpl getConfigDirectory end
8ms for CFMLFactoryImpl created
37ms for XMLConfigWebFactory load begin
2ms for XMLConfigWebFactory load after deployWebContext
0ms for XMLConfigWebFactory load after deployWeb
0ms for XMLConfigWebFactory load after loadConfig
13ms for XMLConfigWebFactory load after many load operations
1ms for XMLConfigWebFactory load after loadLib
27ms for XMLConfigWebFactory load after many load operations
11ms for XMLConfigWebFactory load after many load operations
19ms for XMLConfigWebFactory load after many load operations
4ms for XMLConfigWebFactory load after many load operations
1ms for XMLConfigWebFactory load end
4ms for CFMLServlet loadJSPFactory end
0ms for CFMLServlet init context(s) end
0ms for CFMLEngineFactory getInstance after addServletConfig
0ms for CFMLEngineFactory getInstance(config) end
0ms for CFMLEngineFactory readInitParam end
0ms for CFMLEngineFactory initEngine end
0ms for CFMLServlet addServletConfig start
0ms for CFMLEngineFactory addServletConfig end
3ms for CFMLServlet init CFML factory instance end
90ms for CFMLServlet service start
0ms for CFMLEngineImpl getConfigServerImpl begin
0ms for CFMLEngineImpl getCFMLFactory end
0ms for CFMLEngineImpl _service begin
36ms for PageContextImpl created
40ms for PageContextImpl initialize start
5ms for PageContextImpl initialized
0ms for CFMLEngineImpl _service after create PageContextImpl
0ms for CFMLEngineImpl _service before Request.exe
1ms for PageContextImpl execute begin


#2

So far, I found there was very little that can be done to improve Felix startup.

The hidden feature made for this ticket makes it so that Felix loads the bundles itself on startup:
https://luceeserver.atlassian.net/browse/LDEV-1757?page=com.atlassian.jira.plugin.system.issuetabpanels%3Aall-tabpanel

All the individual addBundle calls become 0ms after setting those options to “none”, which means the option just changes which project does the initial Jar loading, instead of improving performance. I also think felix might not be loading them in parallel, but even if they did, it’s less then 30ms faster when I tried to do it. I have a feeling Felix might prevent parallel loading of bundles internally. There doesn’t seem to be anything better then Felix.

Then for the loadAvailableBundles, I made it read the jar manifests in parallel, which only saves 30ms, but that’s several times faster at least.

I can’t get the OSGi stuff any faster then around 300ms without deleting features and nearly all of that is happening in the Felix project, so I’d have to look at their code to do anything more.

There are about 50 bundle jars needed for Lucee to start. However, the lucee loader is already running before those 50 jars have to load, so it seems that startup has do more parallel work in other ways in the loader project to go faster. I’m going to try moving some of the slower I/O startup features from the core to the loader and make them run in parallel. Then the core would only need to access the data already loaded which may cut out a good chunk of time. These might be new simple data containers in order to keep the implementation mostly in the core still.


#3

I don’t think that’s correct. I made that change for CommandBox because otherwise, Felix was deleting the entire bundle cache off of disk every time Lucee started and then re-creating it from scratch which took a great deal of time. (This is not the Felix default, but rather how Lucee configures it by default) I’m not aware of the “storage clean” settings changing who loads the bundles.


#4

I understand what you are trying to say, but what I said is still true that felix.start() loads the bundles when the cache is already there.

The measurements are showing felix-cache creation time is almost undetectable, under 30ms for me. If the commandbox stuff runs differently, I wouldn’t know. If all the bundles had to reload, it takes a few hundred ms extra for me.


#5

I’ve been testing all these lucee startup changes in Windows, and I’ve continued to work on this all of today.

I was able to split a lot of XMLConfigServerFactory and XMLConfigWebFactory into separate future threads using multiple threadpools. Many of those steps take many milliseconds so it should make it a little faster. I’ve got over 30temporary threads in an executor service now. It’s so random slow/fast, but sometimes it looks about 300ms faster. I don’t think it can be made any faster anymore without rewriting the code itself to do less, or to synchronize less.

Felix cache has to be wiped whenever the patch version changes and there isn’t a feature to detect that yet, so you have to leave felix-cache deleting on in order to be able to update lucee at the moment. I think cfadmin restart should be able to override the setting temporarily at least, so we have fast start-up when there isn’t an upgrade. When developing lucee, it has to clear felix-cache or you can’t see new code, so this isn’t really possible to turn it off without rewriting how it works some more.

Restarting Lucee engine takes just over 1 second, which is super fast compared to a tomcat stop/start.

Maybe if all these objects could serialize and reload themselves, it could load faster, but I don’t think they are setup to be able to be serialized yet and that’s more complex since there are a lot of resources and file system checks in this code, there is a mixture of read/write operations. The worst ones are reading those 6 huge TLD and FLD files. Perhaps 30% of startup is just xml to object translation stuff to get the language mappings setup. If those things were serialized and reloaded as java objects, perhaps that would be better on repeat starts then starting from XML which is never changing.

I’m surprised that adding over 30 threads to these slow processes only makes it about 20 to 30% faster. It should have been at least 2-3 times faster on a quad core with SSD. There must be synchronization or something blocking some of the parallelism from working. I’d have to look at more of the details instead of just wrapping these calls in futures to figure that out. However, I can see lucee use all 8 of my cpus are used briefly when it starts now, so it must be working to some extent.

I deployed the latest version of my Lucee fork to my linux test machine which is a virtual machine on the same computer to see if it works any different there. The tomcat full stop/restart is taking 4.13 seconds according to “time”. But it’s actually about 10 seconds for CFML to be able to first run without the threads. I can see I’ve cut 2 seconds off of that with all these threads and have it running in about 8 seconds now with the threads, which is a 30% reduction in startup time.


#6

I took a quick look for synchronized keyword throughout the project.

I found the XMLUtilImpl and CFMLEngineFactory are synchronizing in startup. I was able to safely minimize the need for that on CFMLEngineFactory. For the xml one, I just don’t know yet, but I removed it for now. I’m pretty sure that if we want synchronized on the xml document, it can be done in other ways, like what if synchronized is only done after startup and you do a boolean flag to toggle that mode. XMLUtilImpl functions are called I think hundreds of times, so this is probably the main thing that was stopping some of the parallelism.

After restarting it a few times, my windows box was 300ms faster with the synchronized changes, and my linux box was 700ms faster to startup to working cfml even with the felix cache being rebuilt. So now my new threads are working better.