Lucee returns blank pages

Hi,

Occasionally and randomly our application would take a while to respond (the browser would wait for a response) and after several seconds the page returned was blank with no content. It is quite rare that this happens. But this morning there was no timeout and the entire application was returning blank pages with no content. We had to restart Lucee to get it back to normal. It had been a while since we last restarted Lucee.

I found this topic: Lucee Stops Serving Pages after Error

When I look in the Lucee Log Viewer plugin, I see a bunch of errors like this one :

application.logERROR 08:32:30, 1 déc, 2021Thread-44
"was not able to stop controller thread running for 12943672ms;java.lang.Throwable;java.lang.Throwable

    /application.cfc:339

at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:171)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at com.microsoft.sqlserver.jdbc.TDSChannel.read(IOBuffer.java:2054)
at com.microsoft.sqlserver.jdbc.TDSReader.readPacket(IOBuffer.java:6596)
at com.microsoft.sqlserver.jdbc.TDSReader.nextPacket(IOBuffer.java:6546)
at com.microsoft.sqlserver.jdbc.TDSReader.ensurePayload(IOBuffer.java:6524)
at com.microsoft.sqlserver.jdbc.TDSReader.peekTokenType(IOBuffer.java:6731)
at com.microsoft.sqlserver.jdbc.TDSParser.parse(tdsparser.java:61)
at com.microsoft.sqlserver.jdbc.SQLServerStatement.getNextResult(SQLServerStatement.java:1628)
at com.microsoft.sqlserver.jdbc.SQLServerStatement.processResults(SQLServerStatement.java:1302)
at com.microsoft.sqlserver.jdbc.SQLServerStatement.processBatch(SQLServerStatement.java:1293)
at com.microsoft.sqlserver.jdbc.SQLServerStatement.processExecuteResults(SQLServerStatement.java:1286)
at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement$PrepStmtExecCmd.processResponse(SQLServerPreparedStatement.java:530)
at com.microsoft.sqlserver.jdbc.TDSCommand.close(IOBuffer.java:7445)
at com.microsoft.sqlserver.jdbc.SQLServerStatement.discardLastExecutionResults(SQLServerStatement.java:142)
at com.microsoft.sqlserver.jdbc.SQLServerStatement.closeInternal(SQLServerStatement.java:653)
at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.closeInternal(SQLServerPreparedStatement.java:328)
at com.microsoft.sqlserver.jdbc.SQLServerStatement.$fr$close(SQLServerStatement.java:668)
at com.microsoft.sqlserver.jdbc.SQLServerStatement.close(SQLServerStatement.java)
at lucee.commons.db.DBUtil.closeEL(DBUtil.java:71)
at lucee.runtime.type.QueryImpl.execute(QueryImpl.java:331)
at lucee.runtime.type.QueryImpl.<init>(QueryImpl.java:235)
at lucee.runtime.tag.Query.executeDatasoure(Query.java:1134)
at lucee.runtime.tag.Query._doEndTag(Query.java:699)
at lucee.runtime.tag.Query.doEndTag(Query.java:565)
at cfc.myproject.client.transactions_paiements_cfc585$cf.udfCall(/cfc/myproject/client/transactions_paiements.cfc:269)
at lucee.runtime.type.UDFImpl.implementation(UDFImpl.java:106)
at lucee.runtime.type.UDFImpl._call(UDFImpl.java:344)
at lucee.runtime.type.UDFImpl.callWithNamedValues(UDFImpl.java:207)
at lucee.runtime.ComponentImpl._call(ComponentImpl.java:685)
at lucee.runtime.ComponentImpl._call(ComponentImpl.java:572)
at lucee.runtime.ComponentImpl.callWithNamedValues(ComponentImpl.java:1925)
at lucee.runtime.tag.Invoke.doComponent(Invoke.java:209)
at lucee.runtime.tag.Invoke.doEndTag(Invoke.java:186)
at application_cfc$cf.udfCall(/application.cfc:339)
at lucee.runtime.type.UDFImpl.implementation(UDFImpl.java:106)
at lucee.runtime.type.UDFImpl._call(UDFImpl.java:344)
at lucee.runtime.type.UDFImpl.call(UDFImpl.java:217)
at lucee.runtime.ComponentImpl._call(ComponentImpl.java:684)
at lucee.runtime.ComponentImpl._call(ComponentImpl.java:572)
at lucee.runtime.ComponentImpl.call(ComponentImpl.java:1911)
at lucee.runtime.listener.ModernAppListener.call(ModernAppListener.java:437)
at lucee.runtime.listener.ModernAppListener.onSessionEnd(ModernAppListener.java:362)
at lucee.runtime.type.scope.ScopeContext.clearUnusedMemoryScope(ScopeContext.java:961)
at lucee.runtime.type.scope.ScopeContext.clearUnused(ScopeContext.java:825)
at lucee.runtime.engine.Controler.control(Controler.java:346)
at lucee.runtime.engine.Controler.control(Controler.java:244)
at lucee.runtime.engine.Controler.access$000(Controler.java:58)
at lucee.runtime.engine.Controler$ControlerThread.run(Controler.java:113)

And there is one with 607268603ms! Same CFC file. Same line in the application.cfc.

The error is related to the OnSessionEnd method in the application.cfc that loops through a set of databases to delete records in a table that relate to the session that just ended. This is code that has been around for years without a problem. I still find it weird that these threads have been running for a long time …

I looked in FusionReactor, but I can’t find anything conclusive yet.

Any ideas?

OS: Windows Server 2016 (10.0) 64bit
Java Version: 1.8.0_181 (Oracle Corporation) 64bit
Tomcat Version: Apache Tomcat/8.5.33
Lucee Version: Lucee 5.3.8.206

not sure if this is your problem or not, but I would

  • make sure you have indexes on your client storage tables
  • make sure they have old records purged

The stack in your code above is not related to session end, it’s a cfquery tag here

/cfc/myproject/client/transactions_paiements.cfc:269

It sounds like you have hanging DB queries that never return. I’d check if the queries are really still running or if it’s a network issue. I’ve seen this when using clustered SQL Server instances or Azure managed DBs (which use clustered instances) and the IP address changes midstream.

1 Like

Tony, as for pages returning blank, there can indeed be various causes, some unrelated to each other. As for the DB errors/issues you observe, they may or may not be related.

My first suspicion would be if you may find you have maxmetaspacesize set for your jvm arguments. I have a blog post with more on that, which while focused on ACF would apply also to Lucee (or any Java app).

But you mention finding log entries related to application.cfc, and certainly one could presume that any problem in there could “hold up all requests”…and that would be true, except you note that the errors relate to onsessionend. And I will just note that onsessionend is not a method that would ever (likely) be processed in the life of a CFML page request. Instead, it’s a method that the engine (Lucee or ACF) calls asynchronously when it detects that a session has timed out, which could be minutes or hours after the request using the session had run.

Finally, you mention having FR and not finding “anything conclusive” yet. Well, FR could certainly have helped you see a lot (when things were hanging), such as what requests were running, what queries they were running (if any), and if you’d stack traced the requests while running, you’d have seen exactly WHAT they were waiting for–which could be any number of things, some related to Lucee and others not.

Even after the fact, FR’s “archived metrics” (found under the Metrics or Logging buttons) would let you look BACK at the time of the outage to see a great deal, like how many requests were running, how many queries were running (if any).

And if you setup the FR CP email alerts, those would be sent WHEN such a hangup occurs, and the email would include again WHAT requests were running at the time, what queries they were running (if any), and the stack trace for each running requests (indeed, for all jvm threads), which would tell you what the requests were doing at the moment of that alert.

FWIW, FR also has pages that show each JVM memory space, including the metaspace, which can be viewed in real time or in the archived metrics. That could help you see if you were (or are) having any issues related to metaspace. (I mention this also in my blog post on the topic.)

Bottom line, you absolutely should be able to get to the bottom of whatever is causing your issues, and FR is clearly the best way (in my opinion) to do that in Lucee (and even in ACF, as it does things that the CF PMT and older Server Monitor don’t do.) Or if the problem is of sufficient trouble to warrant you getting more direct help, I can help remotely, often solving such problems in minutes. More at Charlie Arehart's Server Troubleshooting Consulting Services.

As for the DB issue and the long requests, those might even be unrelated to the hangs and blank pages, but are worth pursuing. Again, FR can help, or perhaps what Brad shared will get you going there.

2 Likes

Hi!

The SQL query in /cfc/myproject/client/transactions_paiements.cfc is very simple. It is a DELETE with a small condition. The table are usually mostly empty too. I don’t store a lot of information there. However, when I clear it, I do this for all the databases that the application uses.

The IP that would have changed is an interesting lead, but no IP has changed since the last reboot and it seems to me that onError would have caught the error if it occurred in onSessionEnd?

What intrigues me the most is the thread that has been running for 7 days? How?

was not able to stop controller thread running for 607283603ms;java.lang.Throwable;java.lang.Throwable
/application.cfc:339

Is the application.cfc a thread that keep running until the application timed out for inactivity?

Thank you for your reply!

I’ll see what I can get more out of FusionReactor. I was not present at the time of the incident and Lucee was restarted quickly. Also, errors are captured with OnError in the application.cfc file, but I will change this way otherwise the error is not caught by FusionReactor. And on top of that, this tool is new to me.

I’ll take the time to analyze all of this and get back to you.

It really doesn’t matter what’s in the query. The first question to answer is whether the actual query itself is still running on the DB. if this is SQL Server, you can find out the running processing from the host of your server and see if there is actually a query running or if CF just got hung up waiting for some low level packet to come across the network (which will never arrive). These sort of issues are usually an issue with the JDBC driver itself, not CF.

Sorry, I should have been more specific. I’ve seen this happen with clustered SQL Server (master/slave) instances when the primary instance is failed over to the backup server. If you’re not using clustered SQL Servers, then disregard.

Yeah, no clue there. You haven’t provided any details about your session end error. In fact, those don’t even happen in the context of a running HTTP request.

What’s so surprising about that? You seem to have a little too much faith in the JVM’s ability to stop threads which are running native code. :slight_smile: The rule of thumb is, any time a thread is stuck at any sort of socketRead() method, it can’t be killed. It’s just how the JVM works, love it hate it. Lucee has no control over what the JVM does or doesn’t do here. The likely only solution is to restart the JVM. But stuck threads like this generally don’t cause any issues per se (other than wasting a thread on your JVM ) so long as they don’t hold any open locks or resources.

I doubt the CF application timeout will have any affect on it. It will likely run until you restart your server.

1 Like

I took a look at my JVM arguments and nothing is there about maxmetaspacesize. I just have 8192 MB for Initial memory spool and maximum memory spool.

I think more specifically Charlie may have been implying that your server was hitting an out of memory error. Those can be associated with “white screens” since error handling often times can’t kick in. In order to confirm that, check your servlet’s “out” logs as there would be an out of memory error somewhere in there.

I must admit that at the time of the incident the memory used was high. Here are the screenshots.

I’ll look in the logs as you suggested …

I don’t see any issues at all with your memory-- not in those graphs at least. Memory being “high” as in the sawtooth approaching the top but then goes back down is perfectly normal. It would have to be smashed at the top with no room to be concerning. Plus, all your memory spaces (like non-heap) aren’t even visible in that screenshot.

To be honest, I’m not sure that there’s a correlation between sometimes getting a “white page” and the fact that some JDBC calls sometimes hang. And the on session end error you’ve talked about but not shared any details of is entirely possible to be a totally unrelated third issue. It’s also not actually clear which specific one of the 3 issues above coincided with your screenshots you posted.

1 Like

Probably not related, but in the application.log, I see a lot of this line, like every minutes :

“INFO”,“Thread-95805”,“12/01/2021”,“15:32:35”,“lucee.runtime.PageSourcePool”,“PagePool size [1747] has exceeded max size [1000]. Clearing unused…”
“INFO”,“Thread-95805”,“12/01/2021”,“15:32:35”,“lucee.runtime.PageSourcePool”,“New pagePool size [1747].”

I found this topic : PagePool: 1191>(1000)

I continue to look at the logs.

Interesting. I found 9 errors like that this morning, before the incident, related to the application.cfc and my cfc in the OnSessionEnd :

“ERROR”,“Thread-1547582”,“12/01/2021”,“04:58:28”,"","key [CFID] doesn’t exist;key [CFID] doesn’t exist;lucee.runtime.exp.ExpressionException: key [CFID] doesn’t exist
at lucee.runtime.type.util.StructSupport.invalidKey(StructSupport.java:67)
at lucee.runtime.type.StructImpl.get(StructImpl.java:139)
at lucee.runtime.type.scope.storage.StorageScopeImpl.get(StorageScopeImpl.java:255)
at lucee.runtime.util.VariableUtilImpl.get(VariableUtilImpl.java:278)
at lucee.runtime.PageContextImpl.get(PageContextImpl.java:1539)
at application_cfc$cf.udfCall(/application.cfc:338)
at lucee.runtime.type.UDFImpl.implementation(UDFImpl.java:106)
at lucee.runtime.type.UDFImpl._call(UDFImpl.java:344)
at lucee.runtime.type.UDFImpl.call(UDFImpl.java:217)
at lucee.runtime.ComponentImpl._call(ComponentImpl.java:684)
at lucee.runtime.ComponentImpl._call(ComponentImpl.java:572)
at lucee.runtime.ComponentImpl.call(ComponentImpl.java:1911)
at lucee.runtime.listener.ModernAppListener.call(ModernAppListener.java:437)
at lucee.runtime.listener.ModernAppListener.onSessionEnd(ModernAppListener.java:362)
at lucee.runtime.type.scope.ScopeContext.clearUnusedMemoryScope(ScopeContext.java:961)
at lucee.runtime.type.scope.ScopeContext.clearUnused(ScopeContext.java:825)
at lucee.runtime.engine.Controler.control(Controler.java:346)
at lucee.runtime.engine.Controler.control(Controler.java:244)
at lucee.runtime.engine.Controler.access$000(Controler.java:58)
at lucee.runtime.engine.Controler$ControlerThread.run(Controler.java:113)
"

My code is similar to this :

<cffunction
    name="OnSessionEnd"
    access="public"
    returntype="void"
    output="false"
    hint="Fires when the session is terminated.">
 
    <!--- Define arguments. --->
    <cfargument
        name="SessionScope"
        type="struct"
        required="true"
        />

    <cfargument
        name="ApplicationScope"
        type="struct"
        required="false"
        default="#StructNew()#"
        />
            
    <cfset var myDatabases = "">
            
    <cfinvoke component="cfc.myproject.client.databases" method="getMyDatabases" returnvariable="myDatabases">
        
        <cfloop query="myDatabases">
        
                <cfinvoke component="cfc.myproject.client.transactions-paiements" method="ope_clean_data" datasource="#myDatabases.Datasource#"
                cfid="#ARGUMENTS.SessionScope.CFID#"
            	cftoken="#ARGUMENTS.SessionScope.CFTOKEN#">
        
        </cfloop>
        
		<!--- Return out. --->
		<cfreturn />
        
	</cffunction>

And I found this error too :

“ERROR”,“Thread-1559271”,“12/01/2021”,“04:57:31”,“controller”,"stop request (35) because run into a timeout. ATM we have 4 active request(s) and 0 active cfthreads path: T:\inetpub\wwwroot\myProject\application.cfc (T:\customtags\cfc\myProject\client\transactions-paiements.cfc).;java.lang.Throwable;java.lang.Throwable

I would check for the existence of the CFID before assuming it’s there and using it. But I think you’re off in the weeds with that. There’s no indication that’s related to any of the other issues above. Probably just a bug that’s always been there.

The “stop requests” message is simply telling you that Lucee tried to stop a long-running request.

You are right.

When you say to check my servlet’s “out” logs, is it a file with a specific name? Right now I look in :

  • C:\lucee\tomcat\logs
  • T:\inetpub\wwwroot\myProject\WEB-INF\lucee\logs

I continue my research with the leads you gave me. Thank you so much.

Probably unrelated either, but like I’m taking time to find different errors, I might as well correct them at the same time.I found endless errors like this in the lucee-stderr.log and catalina.log :

javax.mail.Session.loadResource expected resource not found: /META-INF/javamail.default.address.map

Mentionned in this topic : Recurring log error with possible thread issues - #2 by River_Bender

I will look at it tomorrow.

It depends on the servlet container you’re using (Tomcat, Jetty, CommandBox, etc) The “out” logs capture the standard output stream of the Java process that is your server. For Tomcat, it’s usually called catalina.out and would be in Tomcat’s log dir.

Vote on this ticket here:
https://luceeserver.atlassian.net/browse/LDEV-3130

1 Like

Done!

Hi!

I opened every logs I found and there is nothing about out of memory error.

Here’s what I found, but none of it explains the White Pages.

A lot of this errors

javax.mail.Session.loadResource expected resource not found: /META-INF/javamail.default.address.map

Few errors like this

INFO [ajp-nio-8009-exec-11] org.apache.tomcat.util.http.parser.Cookie.logInvalidHeader A cookie header was received […] that contained an invalid cookie. That cookie will be ignored.Note: further occurrences of this error will be logged at DEBUG level.

Few hours before the incident, just one error like this :

01-Dec-2021 04:58:58.034 WARNING [Thread-1547582] com.microsoft.sqlserver.jdbc.TDSReader.readPacket ConnectionID:77223 ClientConnectionId: ******* TDS header contained invalid packet length:0; packet size:8000

01-Dec-2021 04:58:58.034 SEVERE [Thread-1547582] com.microsoft.sqlserver.jdbc.TDSReader.throwInvalidTDS ConnectionID:77223 ClientConnectionId: ******* got unexpected value in TDS response at offset:0

And this one :

“ERROR”,“Thread-1576539”,“12/01/2021”,“05:00:18”,“controller”,"stop request (10) because run into a timeout. ATM we have 4 active request(s) and 0 active cfthreads path: T:\inetpub\wwwroot\myProject\application.cfc (T:\customtags\cfc\myProject\client\transactions-paiements.cfc).;java.lang.Throwable;java.lang.Throwable

Then the incident and we restarted Lucee.

In archived metrics, I am able to find all the requests (POST, GET) before the incident, but nothing suspicious for now.

I’m probably saying bullshit, but is it possible that the problem is with IIS or the BonCode connector? After all, is that what manages the final return of the content? Perhaps by restarting Lucee the connection was reestablished correctly?

We are going to improve our monitoring. In case of an incident like that, before restarting, we will test if our server is able to return a static file like an image. If the server does not succeed, then the problem is not directly linked to Lucee / Tomcat.

I continue my investigations…

My colleague has just confirmed to me that there was already a probe running during the incident which checked if IIS was able to return a static html file. And IIS was still responding during the Blank Pages incident.

That’s not an “error” as you say. It’s an “INFO” message. It would appear a bot probably sent a badly formed cookie to your server. Nothing to worry about IMO.

Those interesting. Hard to say if they signal anything to be concerned about. You’d need to post to a Microsoft forum however, to get support on errors coming from the MS JDBC driver.

That’s certainly possible, but I can’t say I’ve ever seen that happen with Boncode myself. Usually when there is a connection issue, you get a Boncode error message or an IIS error message, not a white page.


This may be a silly question-- but have you checked the possibility that you actually just have a default error page that has no contents-- or an onError() method perhaps with a code path that outputs nothing? Can you tell from FR if the ALL requests to the server were returning a blank page during the incident, and were those requests also being logged in FusionReactor as hitting CF? What was their response code?