Application.cfc best practices?

partap · November 6, 2020, 8:13pm

So I’m working with this legacy codebase, and I’ve got an application.cfc that looks something like this:

<cfcomponent
    displayname="Application"
    output="true"
    hint="Handle the application.">

  <!--- Set up the application. --->
  <cfscript>
    this.name = "myapp";
    this.sessionManagement = true;
    this.sessionType = "cfml";
    this.clientManagement = true;
    this.tag.cflocation.addtoken = false;
    // ...more stuff

    // *Location 1*

    // Application Hooks

    public boolean function onApplicationStart() {
      StructClear(Application);

      // *Location 2*

      Application.ObjectFactory = createObject("component","com.ObjectFactory").init();
      Application.preferenceObjectFactory =  application.ObjectFactory.getObject("preferenceObjectFactory");
      Application.preferenceObjectFactory.init(); 
      Application.securityObject = application.ObjectFactory.getObject("securityObject");
      Application.securityObject.init();
      // Bunch more stuff...

      // The function is declared as boolean, so it has to return a value
      return true;
    }

    // ...more hooks

  </cfscript>
</cfcomponent>

I wanted to log the version at application startup, and also whenever onApplicationStart() is called. So at Location 1, I’m logging “Running application version: 1.2.3”, and then at Location 2, I’m logging “onApplicationStart version: 1.2.3”…

I expected the log to show the output from Location 1 once when I restart lucee, and then the output from Location 2 before the first request and any other time onApplicationStart() is called (it caches a bunch of global prefs from the db in the Application object, so it is called manually when any of those values change)

Instead, Location 2 output is as expected, but I’m seeing Location 1 output before every single request. So it seems that the code in application.cfc is actually rerun on every request, but the onApplicationStart() hook is actually only run once (when the application is actually started).

So I’m wondering if it’s actually better to not have any executable code in application.cfc outside of the scope of its member functions (Location 1)? The only code that’s currently there is a bunch of member initializations (e.g. this.varname = constantValue;), so I’m not too worried about the overhead in this case, but I’m wondering about the general conventions/ best practices for how to organize application.cfc.
Like…should that member init code actually be inside onApplicationStart() or is it ok where it is?

Also, I find the distinction between this.* (the Application component?) and Application.* (the Application Scope?) a little confusing. Is there an article somewhere that covers these basic concepts?

Thanks!
-Partap

bdw429s · November 6, 2020, 8:40pm

The Application.cfc does several things including defining lifecycle events for the application, requests, sessions, and errors. One often unexpected behavior is that a new instance of the Application.cfc is created for every request. Every time a CFC is created for any reason, the code outside of any method (known as the pseudo-constructor) is run. That is why you are seeing your “location 1” code running for every request. The onApplicationStart method, however, is still only run once, the first time the application is hit, or again after the application has timed out. The re-running of the Application file is actually very powerful as it allows settings to be very dynamic. For instance, you can have more than one application scope by dynamically changing the application name based on cgi.http_host. Or you can override the session timeout for an individual request based on the user agent name (so bots don’t have a long session, etc).

There is very little overhead in creating the Application.cfc instance on every request and processing the pseudo-constructor. You obviously would not want to put application startup code in location 1.

The difference between the this scope of the Application.cfc and the application scope is that the latter persists for the entire life of the application, is unique to the application name, and is shared by all running requests. Variables placed in the application scope will be available on subsequent requests until the application times out. The this scope of the CFC is specific to that CFC instance, and therefore to that request, and is used only to set the defaults for how that application will behave. After the CF engine instantiates the Application.cfc for each request, it looks at the public properties (the this scope) to configure the metadata for the application. This is unrelated to the actual application scope.

should that member init code actually be inside onApplicationStart()…?

No, it will not work there. Any of the this.name sort of settings must be outside of any method in the pseudo constructor.
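To make the split concrete, here is a minimal sketch of how such an Application.cfc could be laid out. The host-based application name and the application.startedAt key are only illustrations and are not taken from the original codebase.

```cfml
component {

  // Pseudo-constructor: runs on every request, so keep it to cheap settings.
  // Deriving the name from the host is just an example of a dynamic setting.
  this.name = "myapp_" & hash( cgi.http_host );
  this.sessionManagement = true;
  this.sessionTimeout = createTimeSpan( 0, 0, 30, 0 );

  // Runs once per application lifetime: expensive, shared state goes here.
  public boolean function onApplicationStart() {
    application.startedAt = now(); // hypothetical key, for illustration only
    return true;
  }

}
```

The this.* lines only configure the application for the current request; anything that should survive across requests belongs in the application scope and is set inside onApplicationStart().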

StructClear(Application);

This code is pointless unless you have code in your onRequestStart() or similar that manually calls the onApplicationStart() method to force a refresh of the application variables.

partap · November 9, 2020, 6:29pm

Thanks, that actually clears it up a lot!

Pseudo constructor code, eh? So is that how all of the CFCs work?
Actually, I guess other than Application.cfc, I would be calling the cfc directly by a request, or including it in my code with createObject(), so I would expect all the code to be run in those cases (pseudo constructor on load, then whatever functions/methods I need)

Yeah, there are several places where it manually calls onApplicationStart() to reload various objects into Application …e.g. when somebody changes server settings, it needs to reload the application preferences from the db.

I don’t actually like that architecture… the ObjectFactory stuff seems like an anti-pattern to me, although it seems to be used a lot in Java. The developer who added that stuff in said it fixed “memory leaks” …from what I can tell though, it is just a verbose way of creating singleton objects shared by the application.

Shouldn’t we be able to do the same thing by using one of Lucee’s Cache services? I haven’t actually tried using Lucee caches yet, but it seems like that would work to cache objects and queries.

Oh yeah… so there’s a broken bit of code that I discovered recently. There’s a sub-application in /mysubapp/application.cfc… onApplicationStart contains similar code to create Application.objectFactory, etc. When prefs are changed in the main app, the subapp needs to reload the prefs. From the main app it does something like this:

<cfapplication name="mysubapp">
<cfscript>
  this.onApplicationStart();
</cfscript>

<cfapplication name="myapp">
<!--- continue main app... --->

But after this code is run, any requests to /mysubapp/mypage.cfm break with a lucee exception saying it can’t find Application.preferenceObjectFactory or whatever, until I restart Lucee.

This code has been there for years, and it seems crazy that nobody has ever noticed this. Is it possible that this code used to actually work with a certain server config? Would it switch contexts to the existing mysubapp application, run onApplicationStart() and switch back to the main app context?

…Because it seems like what it is maybe doing is creating a new mysubapp application and blowing away all of the existing Application scope in the old one?

I haven’t been able to tell what this piece of code actually does. It’s definitely not running onApplicationStart() in /mysubapp/application.cfc…from the logs, it seems like it’s running it from /application.cfc, but whatever is happening, the Application scope in /mysubapp is empty after this.

The code was originally written for BlueDragon, if that makes a difference.

bdw429s · November 9, 2020, 10:12pm

Yes, all CFCs work this way. It actually sort of makes sense that a UDF declaration such as

function foo() {
}

defines a UDF in your local variables scope. So when you instantiate a CFC, the code inside of the component block is executed, thus defining all of the UDFs inside of the component. Any other code mixed in with the UDF declarations runs at the same time.
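As a small illustration of that behaviour, consider the following sketch. Greeter.cfc and its contents are hypothetical and not part of the application discussed here.

```cfml
// Greeter.cfc (hypothetical)
component {

  // Pseudo-constructor: executed each time the component is instantiated
  variables.createdAt = now();
  writeLog( text = "Greeter pseudo-constructor ran", type = "information" );

  // Declaring the function is itself part of what runs at instantiation
  public string function greet( required string name ) {
    return "Hello, " & arguments.name;
  }

}
```

Each call to new Greeter() or createObject( "component", "Greeter" ) should write one log line, no matter which methods are called afterwards.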

Yeah, there are several places where it manually calls onApplicationStart() to reload various objects into Application

That’s fine. Just keep in mind that when the CF engine calls the app start method for you, it’s guaranteed to be run once and be thread safe. However, when you run it yourself, it’s up to you to provide any sort of thread safety you may need. Another approach is to call

applicationStop();

in your code and then the next request will trigger the onApplicationStart() method in its typical thread safe fashion. The main downside there is that applicationStop() is a bit of a club and can cause any currently-running requests to error if the server is busy. With my move to Docker Swarm, there is no such thing as an app reinit for me. Any update to the application is in the form of a rolling refresh of the Swarm service so I never have to worry about reloading the Application in production.
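A rough sketch of how applicationStop() might be wired into a manual reinit follows. The url.reinit parameter and the secret value are assumptions made up for this example, and a real application would want a better guard than a hard-coded string.

```cfml
// In Application.cfc. url.reinit and "someSecretValue" are hypothetical.
public boolean function onRequestStart( required string targetPage ) {
  if ( structKeyExists( url, "reinit" ) && url.reinit == "someSecretValue" ) {
    // Stop the application; the next request triggers onApplicationStart()
    applicationStop();
    // Redirect so that the follow-up request starts the application cleanly
    location( url = arguments.targetPage, addToken = false );
  }
  return true;
}
```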

I don’t actually like that architecture… the ObjectFactory stuff seems like an anti-pattern to me, although it seems to be used a lot in Java.

I’m not familiar with the particular factory code in your app so I can’t comment on that. I use WireBox DI/AOP which can be used standalone or inside of ColdBox MVC which greatly reduces the boilerplate needed for managing application state and singletons as WireBox manages creation, dependency injection, object persistence, and weaving AOP advices. I’ve never written a custom object factory before in CFML.

The developer who added that stuff in said it fixed “memory leaks”

No telling what that means

it is just a verbose way of creating singleton objects shared by the application.

Yes, that’s very common. In WireBox, it is much simpler. I simply annotate my components like so:

component singleton {
}

And then the DI container is responsible for creating it when necessary and ensuring only one instance exists at a time for the application.

wirebox.getInstance( 'myService' );

Shouldn’t we be able to do the same thing by using one of Lucee’s Cache services? I haven’t actually tried using Lucee caches yet, but it seems like that would work to cache objects and queries.

WireBox solves these issues in a much better fashion. I would be wary of using a cache since an out-of-process cache would serialize the instance, likely causing issues for you.

But after this code is run, any requests to /mysubapp/mypage.cfm break with a lucee exception saying it can’t find Application.preferenceObjectFactory or whatever, until I restart Lucee.

Hard to say what’s going on without being able to see it all in action. I’m not a fan of nested applications since they cause all sorts of issues. I prefer to have one single application name and scope. If I want to break an application apart into bite size chunks, I create modules in ColdBox MVC so I can cleanly separate models, controllers, views, settings, and SES URL routes all inside of the same parent application.

The code was originally written for BlueDragon, if that makes a difference.

I’ve never used BD, but I wouldn’t be surprised if it had some different behaviors. It never was 100% compatible from my understanding.

partap · November 12, 2020, 1:23am

Thanks, @bdw429s, you’ve been very helpful!

I’ve been looking at WireBox…it, and ColdBox in general seem very nice.
I explored the sample ColdBox site from the tutorial and it was a good example of how a coldfusion site can be modular and organized…

Unfortunately, for the time being, I need to maintain this codebase, and rather than a nice MVC layout, this one is built mostly out of nested cfinclude templates, with a few custom tags and miscellaneous components thrown in.

There are a couple of nested apps within the main app, and logically, there’s no actual need for them, but I’ll need to do a lot of “gardening” before I can be sure they are able to coexist safely within the main app. I think if I can integrate WireBox that will be very helpful in cleaning up the “globals”…the application/client/session scopes are full of them, but the worst are the standalone variables that are accessed in a cfm file out of nowhere, but expected to be defined in some other cfm file up the include chain.

Meanwhile, is there some way that I can access the sub-apps from the main app? I don’t necessarily need to run onApplicationStart directly. All I need is to be able to send a message or set a flag somehow so when the next request to the sub-app comes in, it knows that it needs to reload the prefs from the db. Or maybe the sub-apps can message the parent? I hope there would be some sort of IPC included with the concept of nested apps?

bdw429s · November 12, 2020, 4:02pm

ColdFusion doesn’t provide any direct mechanism for one application to touch another application in memory. Typical solutions to the problem you have would be:

Have the parent application hit the sub apps via CFHTTP on a URL that forces a reload. Note this can get tricky with more than one server behind a load balancer
Update the time stamp on a shared file that each app checks periodically to force a refresh
Use a shared scope that’s visible to all applications on the server such as the server scope and update a timestamp or flag that each sub app checks.

In each of these cases, you still need to be cautious of race conditions so 10 threads don’t all try to initiate a refresh at the same time. The standard solution for this is to employ the “double checked” lock like so:

// Check if app needs to reload (many threads can get inside this if at the same time)
if( needsReload() ) {

  // First thread to acquire this lock wins the rights to reinit the app.  All other threads wait here
  // Ensure timeout is greater than time to reload unless you want to have a fail-fast for waiting threads
  lock type="exclusive" name="application_myApp_reloading" timeout="60" {

    // Once the lock is acquired, check if another thread already beat you here (if so, nothing to do)
    if( needsReload() ) {

      // Actually perform the reload-- only one thread gets here.
      performActualReload();

    }

  }

}
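For the shared server scope option, the two helpers used in the lock above could be sketched roughly as follows. Every name here (server.myAppPrefsChangedAt, application.prefsLoadedAt, flagPrefsChanged) is hypothetical, and the actual reload logic is left out.

```cfml
// Called by the parent app after it saves new preferences
function flagPrefsChanged() {
  server.myAppPrefsChangedAt = getTickCount();
}

// Called by each sub app, e.g. from onRequestStart()
function needsReload() {
  var changedAt = server.myAppPrefsChangedAt ?: 0;
  var loadedAt  = application.prefsLoadedAt ?: 0;
  return changedAt > loadedAt;
}

function performActualReload() {
  // Capture the time before reloading, so a change that arrives while
  // the reload is still running triggers another reload later
  var startedAt = getTickCount();
  // ...reload the preferences from the db here...
  application.prefsLoadedAt = startedAt;
}
```

Because the server scope is visible to every application on the same server instance, this only helps when both apps run in the same JVM; behind a load balancer each server would need to be flagged separately.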

partap · November 12, 2020, 9:06pm

Speaking of thread-safe, I came across a page a while ago that was explaining the difference between initializing a variable with var vs without… I remember it was saying something about thread safety and was planning on going back to read it since it seemed interesting (was looking for something else at the time).

I lost that particular page, but from what I can find, I think it might explain some strange behavior that seems to pop up randomly. Like, sometimes a variable’s value seems to change from one line of execution to the next, randomly.

Here’s an example of one place it used to happen:

securityObject.cfc:

<cffunction name="canAccess">
  <cfargument name="scriptUrl" type="string" required="yes" />
  <cfargument name="user" type="struct" required="yes" />
  <cfscript>
    resultStruct = structNew();
    resultStruct.allowAccess = 1;
  </cfscript>
  ...
  <cfif [something...]>
    <cfset resultStruct.allowAccess = 0>
  </cfif>
  ...
  <cfreturn resultStruct>
</cffunction>

application.cfc:

<cffunction name="OnRequest" ... >
  <cfargument name="TargetPage" type="string" required="true"/>
  <cfset results = Application.securityObject.canAccess(ARGUMENTS.TargetPage, Client)>
  <!--- location1 --->
  <cfif results.allowAccess>
    <cfinclude template="#ARGUMENTS.TargetPage#" />
  <cfelse>
    <!--- location2 --->
    ...
  </cfif>
</cffunction>

There were times when I saw page access denied, despite the user having permission. I added log statements at location1 and location2 to display the value of results.allowAccess and found that every once in a while, it was 1 in location1, but then I would get the access failure and it was 0 in location2 in the same request.

I switched application.cfc over to cfscript a little while ago, and I haven’t been seeing the problem lately. I suspect that I inadvertently fixed it by using var results = ...

function onRequest (string targetPage) {
  var results = application.securityObject.canAccess(arguments.targetPage, session);
  if (results.allowAccess) {
    include arguments.targetPage;
  } else {
    ...
  }
}

If I understand the docs correctly now, when you assign a new variable like <cfset myvar = someValue>, it puts it in the variables scope, rather than local (as the code seems to believe).

Now what I’m not sure of is how this pertains to threads…From the docs, the variables scope is private to the current CFC, but shared between calls to CFC methods, which in this case would either be securityObject.cfc or application.cfc.

The securityObject is created using the objectFactory from my first post, so it is going to be a singleton. What about application.cfc? Is that a singleton, or is there a new instance created for every request? If both are singletons, then all simultaneous request threads would be sharing the same results variable in application.cfc and resultStruct in securityObject.cfc, despite the use of structNew().

So, if this is the case then I only halfway “fixed” the problem, since resultStruct is still shared. onRequest()'s results wouldn’t change after being returned from canAccess() like before, but it’s still possible for it to be overwritten halfway through canAccess() by another thread and be assigned the values from a different request… or even possibly getting a just-initialized or halfway-processed resultStruct. Or…hmm…since it’s still a pointer to the struct created in securityObject.cfc, I guess it could still change after returning.

So, this is disturbing…it potentially affects most of the site. Since pages are executed by cfincluding from application.cfc, and “local” variables are typically assigned in tag notation like <cfset myvar = someval>… occasionally there is some cfscript code that uses var myvar = val, but the vast majority of the site is written in tag notation.

OK…I thought I had it figured out, but there are actually a lot of “local” variables that are declared using <cfparam name="myvar" default="">. But now I’m not sure what scope those are created in…

bdw429s · November 12, 2020, 9:20pm

That is exactly correct. The default scope is variables which is private to the CFC but lives for the life of the CFC.

var foo = 'bar';

is just syntactic sugar for

local.foo = 'bar';

but it is very important to ensure all variables that are meant to be local to a method are declared properly. This includes loop counters, cfquery results, cfhttp results, cfsavecontent variables.
You can either pre-declare the variable above as local

<cfset var myQry = "">
<cfquery name="myQry">
...
</cfquery>

or declare it in the local scope in the first place

<cfquery name="local.myQry">
...
</cfquery>
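Applied to the canAccess() method posted earlier, the same idea might look like the sketch below. The permission checks are elided exactly as in the original post, so this is not a complete implementation.

```cfml
<cffunction name="canAccess">
  <cfargument name="scriptUrl" type="string" required="yes" />
  <cfargument name="user" type="struct" required="yes" />
  <!--- var puts resultStruct in the function-local scope, so every call gets its own struct --->
  <cfset var resultStruct = structNew()>
  <cfset resultStruct.allowAccess = 1>
  <!--- ...permission checks elided, as in the original post... --->
  <cfreturn resultStruct>
</cffunction>
```

With the var keyword in place, concurrent requests calling the singleton no longer share one resultStruct through the component's variables scope.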

I switched application.cfc over to cfscript a little while ago, and I haven’t been seeing the problem lately. I suspect that I inadvertently fixed it by using var results = ...

Yes, the script conversion had nothing to do with it. It was the local var’ing of the variable that fixed it.

What about application.cfc? Is that a singleton, or is there a new instance created for every request?

There is a new instance created for every thread. You should still correctly var stuff just to keep it kosher.

there are actually a lot of “local” variables that are declared using <cfparam name="myvar" default="">

That code will not create a variable in the local scope. It will go in the default scope of variables.

There is a static code analysis tool called var scoper you can use to help scan your code. I think it only works on tags. It’s built into the CodeChecker CLI, which you can run from the command line if you like.

Also, for ColdBox apps, I wrote a module that automatically checks for scope leak in your singleton CFCs

You may be able to take some inspiration from how it works. It creates a list of variables declared in a CFC when it is first created and then later after the app has been running, it will compare that list to the current variables. Any new or changed variables since the creation of the CFC are a potential leak from one of the methods that weren’t var scoped.
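A very reduced sketch of that idea, not taken from the module itself, could look like this. The names initialKeys and findLeakedKeys are made up for the example, and it only detects new keys, not changed values.

```cfml
component {

  public any function init() {
    // Snapshot the keys of the variables scope once construction is done.
    // initialKeys itself is not yet in the scope when the list is built.
    variables.initialKeys = structKeyList( variables );
    return this;
  }

  // Returns the names of variables that appeared after init() ran
  public array function findLeakedKeys() {
    var leaked = [];
    for ( var key in variables ) {
      if ( key != "initialKeys" && !listFindNoCase( variables.initialKeys, key ) ) {
        arrayAppend( leaked, key );
      }
    }
    return leaked;
  }

}
```

Calling findLeakedKeys() on a long-lived singleton after some traffic would list candidates such as an unscoped resultStruct.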

bdw429s · November 12, 2020, 9:23pm

One more idea I just thought of. Lucee has a setting called “local scope mode” on the “scope” page in the admin

If you change it to “modern” it will change the default scope to local unless you specifically declare it as variables.foo. However, warning-- this is a big change and will likely break any sort of libraries you may be using that rely on the old functionality. It’s worth looking at though as it’s much more strict.
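As far as I know, the same setting can also be applied per application or per function in Lucee rather than server-wide, which may be a gentler way to try it out. Treat the fragment below as a sketch to verify against the Lucee docs for your version.

```cfml
// Application.cfc pseudo-constructor (Lucee only): applies to the whole application
this.localMode = "modern";
```

```cfml
<!--- Or opt in one function at a time via the localmode attribute (Lucee only) --->
<cffunction name="canAccess" localmode="modern">
  <!--- unscoped assignments in here go to the local scope --->
  <cfset resultStruct = structNew()>
  <cfreturn resultStruct>
</cffunction>
```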

partap · November 12, 2020, 11:24pm

OK, so in the typical cfm code on this site, the variables scope would be used, which means application.cfc’s variables (cfinclude chain from application.cfc)

But you are saying that there’s a new application.cfc variables scope created for every request?
In that case, my var results = ... line in onRequest wouldn’t actually have changed anything, would it? Other simultaneous requests would have their own private application.cfc variables scope, so it’s not quite as bad as I was thinking.

The real issue would be (is still) in the securityObject.cfc code, which is using the variables scope instead of local. Probably the same for the majority of the CFC code…probably 95% of it uses tag notation and I’ve never seen <cfset var myvar = ...> used or even local.myvar = ....
I didn’t even know you could use var with <cfset> until today

…the fact that this site runs at all seems to be a happy accident…

I should test out the “modern” scope setting… hard to say if it will break anything, but most of the code seems to be written with the assumption that variables.myvar is a local scope.

Oh yeah… also: Code inside actual functions is the exception, rather than the norm… most pages consist of cfm files full of tags with cfincludes inserting other cfm files full of tags. In this case, what is local? Are we basically just running all of this code as if it were pasted inside onRequest()?

edit: I started on a rant here

It will be hard to really check everything, I’m looking at over 3000 cfm and cfc files here, although I suspect the number of files in actual use is much much lower. There are tons of different “versions” of files… basically copypasted code with a few changes and named “myfile_v9.cfm”, “myfile_v10.cfm”, “myfile_flv_new.cfm”, etc…

I’ve been trying forever to figure out some way to programmatically find zombie files and remove them from the repo, but it’s tough. I wrote a script that enumerates the files and maps out the cfinclude relationships…I thought I was onto something, but then I realized a few issues:

if a file is not referenced by cfinclude, that doesn’t mean it’s not used. It might be a top level file accessed via url
if a file is referenced by cfinclude, that doesn’t mean it is used… there are lots of cases of unused1.cfm including unused2.cfm. Or the cfinclude is in a commented out section, or surrounded by <cfif #someObsoleteFeature#> ... </cfif>

So I tried to expand the mapping by enumerating url links, but that turns out to be much more complex. There’s the simple cases of <a href="...">, <form target="...">, etc. but then there’s also a lot of files that are referenced by javascript ajax calls, or included in a frame page via url parameters, or the page name is in a variable (in cf and/or js) instead of a string literal… or sometimes the file is actually an external access endpoint that is never referred to internally.

So my script found around 2500 cfm files with no obvious <cfinclude>, <a>, or <form> references…but with a few obvious exceptions (defaultold.cfm, aaa_test.cfm, etc), I can’t risk removing them without doing a deep dive on each file… and of the ~700 other files, most of them are referenced by exactly one cfinclude statement or one url link. So not much insight there, either.

There’s probably some sort of graph traversal algorithm I could use to find clusters of linked files, but of course I would still need to positively identify all of the programmatically generated urls and external api endpoints.
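One simple traversal that fits this problem is a breadth-first search from the known entry points. The sketch below assumes the include/link map has already been built as a struct of arrays; the function name and the shape of the graph are assumptions for illustration only.

```cfml
// graph: struct mapping a file path to an array of the files it includes or links to
// entryPoints: array of files known to be reachable (top-level pages, API endpoints)
function reachableFiles( required struct graph, required array entryPoints ) {
  var visited = {};
  var queue = duplicate( arguments.entryPoints );

  while ( arrayLen( queue ) ) {
    // Take the next file off the front of the queue
    var current = queue[ 1 ];
    arrayDeleteAt( queue, 1 );

    if ( structKeyExists( visited, current ) ) {
      continue;
    }
    visited[ current ] = true;

    // Queue everything this file references
    if ( structKeyExists( arguments.graph, current ) ) {
      for ( var child in arguments.graph[ current ] ) {
        arrayAppend( queue, child );
      }
    }
  }

  return structKeyArray( visited );
}
```

Any file not in the returned array is a zombie candidate, but only relative to the entry points supplied, so dynamically generated urls and external endpoints still have to be identified by hand.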

Sigh…

/rant

bdw429s · November 13, 2020, 3:46pm

That is correct. The variables scope in each Application.cfc instance would be shared by all of the cfinclude chain for that page starting in the onRequest() method.

Yes, that’s correct.

Yeah, dunno-- I don’t really understand enough about your app to comment. If you want a variable to be in the variables scope, then it may be fine. There’s no rule here, you just have to put it in the proper scope based on the context it’s declared in and where it needs to be accessed from.

It depends on how this object is created and if it’s shared as a singleton across requests. If it’s a singleton, then yeah that’s not good. If it’s re-created at each use via createObject or via CFInvoke, then it wouldn’t matter.

Strictly speaking, there is no local scope in a cfm page and if you try to declare a variable in local, you’ll get an error. That said, a .cfm cfincluded inside of a method from a CFC will inherit the local scope of the CFC method so it can access variables in it and any local variables created in that cfm will really be created inside the method’s local scope. If all the pages in your site are cfincluded in your onRequest, then that’s the only way any of the cfms have a local scope.
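A minimal sketch of that inheritance, with a hypothetical requestStartedAt variable and page.cfm template:

```cfml
// Application.cfc (sketch)
component {
  this.name = "myapp";

  public void function onRequest( required string targetPage ) {
    // Lives in onRequest()'s local scope
    var requestStartedAt = now();
    // The included template runs inside this method call and sees the same local scope
    include arguments.targetPage;
  }
}

// In the included template (hypothetical page.cfm) this would then resolve:
//   <cfoutput>#local.requestStartedAt#</cfoutput>
```

The same page.cfm requested without going through a CFC method would have no local scope to read from.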

I’m thinking @Zackster had something that tapped into Lucee’s debugger information to track all the files that executed on a request. Presumably, you could hook such a check up so you could toss it in production for a week and log all the access. Zac, does that exist or am I imagining it?

Zackster · November 13, 2020, 4:00pm

something along the lines of

getPageContext().getDebugger().getDebuggingData(getPageContext())

that returns all the debug information

have a look at the debugging template to see what you have available, or just dump it out

github.com

lucee/Lucee/blob/5.3/core/src/main/java/resource/context/admin/debug/Classic.cfc#L220


      
          			<td class="cfdebug" align="center"><b>Total Time</b></td>
          			<td class="cfdebug" align="center"><b>Avg Time</b></td>
          			<td class="cfdebug" align="center"><b>Count</b></td>
          			<td class="cfdebug"><b>Template</b></td>
          		</tr></cfif>
          <cfset var loa=0>
          <cfset var tot=0>
          <cfset var q=0>
          <cfparam name="arguments.custom.minimal" default="0">
          <cfparam name="arguments.custom.highlight" default="250000">
          <cfloop query="pages">
          		<cfset tot=tot+pages.total><cfset q=q+pages.query>
          		<cfif pages.avg LT arguments.custom.minimal*1000><cfcontinue></cfif>
          		<cfset var bad=pages.avg GTE arguments.custom.highlight*1000><cfset loa=loa+pages.load>
          		<tr>
          			<td align="right" class="cfdebug" nowrap><cfif bad><font color="red"><span class="template_overage"></cfif>#formatUnit(arguments.custom.unit, pages.total-pages.load)#<cfif bad></span></font></cfif></td>
          			<td align="right" class="cfdebug" nowrap><cfif bad><font color="red"><span class="template_overage"></cfif>#formatUnit(arguments.custom.unit, pages.avg)#<cfif bad></span></font></cfif></td>
          			<td align="center" class="cfdebug" nowrap>#pages.count#</td>
          			<td align="left" class="cfdebug" nowrap><cfif bad><font color="red"><span class="template_overage"></cfif>#pages.src#<cfif bad></span></font></cfif></td>
          		</tr>
          </cfloop>
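Building on the call above, a rough sketch for logging every template touched by a request could look like the following. It assumes debugging is enabled so that data is actually collected, that the returned struct has a pages query with a src column (as the template excerpt suggests), and the log file name template-usage is made up.

```cfml
// In Application.cfc (Lucee only)
public void function onRequestEnd( required string targetPage ) {
  var debugData = getPageContext().getDebugger().getDebuggingData( getPageContext() );

  // pages is assumed to be a query with one row per executed template
  if ( structKeyExists( debugData, "pages" ) ) {
    for ( var row in debugData.pages ) {
      writeLog( file = "template-usage", text = row.src );
    }
  }
}
```

Left running for a while, the resulting log could be de-duplicated to get a list of files that are definitely in use.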

partap · November 16, 2020, 11:45pm

Thanks!
This might help…