Request for Comments - New proposed QoQ feature for CFML BIFs

bdw429s · February 1, 2022, 10:26pm

Ever since completely refactoring the native Query of Query support in Lucee to be more performant and support better aggregate and grouping, I’ve been doing a steady stream of pulls adding new features and fixing incompats with Adobe’s QoQ. One really nice thing about the native QoQ support is

We have full control over the SQL parser
We have full control over the execution and optimization
We can add whatever functions/expressions we want-- many of which Adobe’s QoQ doesn’t even support such as isNull() (already supported) or CASE WHEN THEN END statements (proposed).

While most everything QoQ does follows the SQL spec, or at least takes into account MSSQL, MySQL, Postgres, and Oracle’s precedent-- CFML’s QoQ has a few nods to CFML since it’s embedded into CFML and made for CF devs. We have a lot of great first-class functions in Lucee’s native QoQ and Implementing new ones is easy and I really love making QoQ more powerful.

One of my re-occurring ideas is to gather a collection of basic string, list, and date/time functions in CFML and make them available first class SQL functions in Lucee’s QoQ which just delegated behind the scenes to the same implementation as the BIFs. This would really unlock a ton of incredibly powerful stuff you can do right from inside your Query without additional loops for mapping of the result set. Some examples could be:

Format data for your reports or use string and number manipulations like regex or lists

SELECT
  cf.reReplace( nameCol, '(.*)(Jr|Sr)', '\1' ) as nameCleaned,
  cf.listFirst( forwarded_for_IP_List ) as forwarded_for,
  cf.dollarFormat( cf.max( productTotal, orderTotal ) ) as maxDollars,
  cf.hash( password ) as passwordHashed,
  cf.now() as datetimeReport,
  cf.val( age ) as age,
  cf.dateFormat( dateCreated, 'MM/DD/YYYY' ) as dateCreatedFormatted
from myQry

Or do funky stuff with JSON or lists. Yes, performance considerations apply, but it’s not going to be any different than doing a separate myQry.filter() or myQry.map() - in fact it may even be faster.

SELECT 
  name,
  foo,
  cf.listToArray( categoryIDList ) as categoryIDArray
from myQry 
where cf.structFind( cf.deserializeJSON( strOrderJSON ), 'customerName' ) = 'brad'

The possibilities are near endless, however I would recommend putting some artificial limits on it on purpose so it’s clear that WE’RE WRITING SQL, NOT CFML HERE. i.e. everything above is SQL, being parsed by the SQL parser, following the rules of SQL. And the functions called cf.BIF() just to happen to delegate to the CFML BIFs of the same name behind the scenes. This means I’d recommend enforcing the following rules (which are really just desired limitations of the SQL parser):

Support for delegating to CF functions only, no tags
Positional params only
We choose what BIFs to expose so it’s only the ones that actually make sense (Keeping in mind a CFML query object can contain a complex value such as an array or a struct inside a table cell)
No complex object literals allowed in the SQL (this includes array, struct, and closure/UDF literals). Only query columns, string/number literals, and other SQL expressions can be passed.
There would be no use of evaluate() or anything like that. The Lucee native QoQ Java implementation would simply map those SQL functions directly through to the matching CFML BIFs when executing so it would perform the same as you writing that CFML BIF in your source code.

So this is where you come in. The technical implementation of this feature is pretty straightforward. Heck, I’ll probably spend more time writing the tests than the actual java code for this. This would also not be any sort of replacement for functional programming as any of this could also be done with queryMap(), queryFilter(), etc-- this would just be another way to skin the cat if you already were doing logic in QoQ in the first place. The questions are

Does this make sense?
Is it a good idea?
Would you use it?
Is cf_ a good prefix for the SQL functions that pass through to a BIF? I feel like they need to be name spaced to be explicit, but cf_foo also looks a lot like a custom tag call and a little bit like Adobe’s tag-in-script syntax. Better ideas? Update: I’ve updated the examples to use cf.BIF() as it matches SQL schema much better.
What should the final list of functions be that we support? I have several examples above, but honestly there are a ton of string/number/list/date functions and that doesn’t even count if we want to get funky with struct/array/JSON functions to lean on CFML queries’ abilities to hold complex values in their cells. And then there’s bit-wise operations, decision functions, conversion functions, and math functions! I can imagine use cases for nearly any of those, but I’m not sure we should support everything just because it’s there.

Zackster · February 2, 2022, 12:13am

Could they work in a UPDATE statement too?

Wait, INSERT into datasource via QoQ is also possible?

Nah, who’d ever use tag islands! Meh

dswitzer · February 2, 2022, 1:00pm

First, I like this idea a lot. The language should be helping developers to do things in fewer steps, so anytime something can be added to reduce “time to implement”, I think that’s awesome.

However, since you asked , I have a few thoughts:

It would seem like a more SQL-like thing would be to use a schema to put the functions under. So instead of cf_ use cf. schema. Most modern databases allow you to create a schema to store objects (such as functions), so this would seem like a logical way to handle things. Maybe lucee or cfml would be a better schema name as well. It might make it easier to find those implementations in the code (or maybe cfo/lfo (e.g. ColdFusion/Lucee Function Object), to mimic SQL Server’s default dbo schema name).
I would think adding support for an environmental variable to disable this feature would be smart as well. For example, if you use Lucee as your preferred platform, but are trying to code for cross compatibility, it would be good to be warned of code incompatibilities with ACF. So an environmental variable that could be set to throw an exception would be useful. That way you realize you have to do something else for ACF compatibility reasons.
What’s going to happen to a query that has to fall back to internal HSQL implementation (I think that’s the engine name). I know the code does not work in every situation, and when it can’t handle the SQL it fails back to an actual SQL engine under the hood. I could see how that would be confusing if the functions are only available some of the time. I think if nothing else, we’d need to check if BIF are being used and the fallback SQL implementation is used that we throw an explicit exception that the QoQ cannot be run and why it cannot be run. This could cause some weirdness with people building dynamic QoQ code as well, since this exception might only fire in some corner case scenarios. That could probably be solved by running 2 QoQs (once to the complex join/aggregation) and then another to apply the BIF manipulation.

Anyway, it’s definitely an interesting idea and I can see many uses for it.

scottsteinbeck · February 2, 2022, 6:37pm

+1 for using cf. over cf_, it makes it more readable, searchable, and potentially reusable if you decide to implement those functions on the database.
I hadn’t considered the fallback approach when using HSQL but I feel like that would be an adequate solution when the QoQ has defaulted back to HSQL, is that something that can be handled easily?
The second example of deserializing and pulling out a value from JSON is incredibly powerful, as well as regex support. Both of these features are not available/fully supported on all databases and are a wonderful addition to QoQ

Zackster · February 2, 2022, 7:08pm

Hsqldb supports custom functions too

http://www.hsqldb.org/doc/2.0/guide/sqlroutines-chapt.html#src_jrt_routines

But I’d say let’s just get it working with the native QoQ first

bdw429s · February 3, 2022, 1:00am

This is pure genius-- I love it. So it would be

SELECT 
  cf.uCase( fName ) as first,
  cf.reverse( lName ) as tsal,
  cf.dateFormat( dateCreated, 'full' )
FROM myQry

Yes, that’s much more SQLy just like dbo.myFunc() would be. I’m going to update the examples in the original post to use this.

I’m not so sure of this. The main reason is that Lucee already has hundreds, if not thousands of little features it supports that ACF doesn’t. Even QoQ today has lots of stuff it does in Lucee that Adobe doesn’t support. There normally are no flags to disable any Lucee-specific features. Lucee just is what it is, and it’s just expected that you’ll test your ACF-compat libraries on Adobe If I do this, I’ll actually create a ticket for Adobe to copy it, but so far Adobe has had little interest in any of the handful of tickets I’ve entered to help them stay compatible with Lucee’s QoQ.

It would error There’s not a great solution to this, and it’s a scenario that already exists since Lucee’s QoQ has stuff it does that HSQLDB doesn’t do and there’s nothing really in place for that. My goal in life is to make HSQLDB never get hit so this becomes moot. The big remaining feature HSQLDB does that Lucee’s native QoQ doesn’t do is joins. I want to tackle this, but it’s a major thing so I’ve never had the time. If someone wanted to sponsor that work, I’d love to dig into it. Once native QoQ can do joins (and I’d like to support inner AND outer joins) then it’s probably a very short list of anything HSQLDB does that the native implementation doesn’t do. Ideally we’d completely remove HSQLDB at some point IMO.

FWIW, there is already a flag you can add to completely disable HSQLDB, though that’s not quite the same as what you’re suggesting.

-Dlucee.qoq.hsqldb.disable=true

There’s not a fantastic way to know WHY the native QoQ fails-- the setup is just a basic try catch and pretty much ANY exceptions that get thrown while executing the native code (including stuff like the “you can’t divide by zero” exceptions) just fail over to HSQLDB. It’s fairly inefficient since there are probably cases like divide by zero where there’s no point in even trying to fail over. But there aren’t custom exception types really for most stuff so the exceptions all “look” the same. Furthermore, if there is an exception thrown while parsing the SQL, then it’s not possible to look at see what’s in it since there’s no parsed version of it. So putting in any sort of check like this would take some tip toeing around the error messages. Again, all of this would go away once we can support everything natively and don’t need HSQLDB any longer!

dswitzer · February 3, 2022, 12:48pm

@bdw429s,

I still think it makes sense to check if the QoQ is using the cf schema before trying to sending to HSQLDB, that way a more helpful exception can be thrown. I think we both hate unhelpful exception messages, so if there was a way in the catch that to check for the BIFs being used, then instead of running the HSQLDB query (which we know not only will fail, but I believe would have some overhead of creating all the temp structures before it actually failed), we could show an exception message like:

Your query requires the use of HSQLDB in order to run. This is required for queries using JOINS. However, your query utilizes built-in functions using the cf schema and those are not compatible with HSQLDB queries. To resolve this issue, you can split your QoQ into two distinct queries, one to do the joins and one to apply the function logic, or you can refactor your code to not use QoQs.

Something like that could go a long way to avoid devs having to go look for help on why their code is not working.

That’s just my take on it and how I’d go about implementing it.

bdw429s · February 3, 2022, 4:25pm

Right, and to be clear, I understand your ask, why you’re asking, how it would work, and why it’s a great idea. I’m just saying it won’t be possible 100% of time. I don’t mind doing it when we can-- which may be a large portion of the time, but I’m not sure you’re understanding the catch-22 that makes it difficult.

The primary scenario that causes the native QoQ to fall back to HSQLDB is exceptions thrown by the SelectParser. That class is responsible for taking the string that is the SQL statement and turning it into a collection of objects that represent all the selects, group by, order by, and expressions represented in the SQL String. This collection of objects is key to being able to tell what’s in the SQL and make determinations about whether it’s going to work in HSQLDB.

Now, consider the fact that the very situation you’re wanting to detect and identify are the situations in which the native QoQ parser is incapable of parsing the string, and therefore cannot create the collection of objects to describe it. It’s just a mystery string whose syntax didn’t make sense.

 SELECT cf.foo( col )
FROM GHJ%^&*()DFHSF#FS#
I lika to do da cha-cha

That’s not parse-able so I can’t introspect it and tell you what it does. And-- keep in mind, joins are one of the things the native QoQ parser has never been made to work with, so as soon as there’s a join, it may as well be the gobbledygook above, because it’s all going to make the native QoQ parser just give up.
Now, you may be thinking of resorting to some basic string searching on the SQL, which is possible, but now we’re in the weeds of possible false positives. Consider this SQL:

SELECT 'cf.Objective() was an awesome conferece!'
FROM myQry, myQry2
WHERE myQry.id = myQry2.id

That SQL won’t be parsable by the native QoQ because of the join
There won’t be a collection of objects built up to describe what expressions are present due to bullet one (it’s just gobbledygook)
A simple string search would produce a false positive and incorrectly prevent the user from falling back to HSQLDB

And of course, all this goes away once we finish making the native QoQ do everything we need and leave HSQLDB in the dust bin. Hopefully this helps explain why I love your idea, but it’s not quite as simple every time due to the catch 22 of the native QoQ parsing being a requirement to be able to tell if HSQLDB can run it!

michaelborn_me · February 4, 2022, 3:09pm

Brad, if I may add my uncoherent $0.02…

Why do we need this? So far, the only reasoning I’ve seen is that 1) this is easy to add, and 2) it may be more streamlined to manipulate data inside SQL simply because one is already doing such.

Your last sentence in the RFC says:

I’m not sure we should support everything just because it’s there.

Along those lines, I’d like to know should we support any of this, just because it’s there?

I’m not challenging, but I personally wouldn’t vote for this without a clear, sensible use case.

Zackster · February 4, 2022, 3:23pm

turn it around, why should native QoQ only support a limited subset of only basic SQL functions?

my favourite use case would be supporting UPDATEs native with CFML BIFs

it’s not like we are forcing you to use them, they will just be there

bdw429s · February 4, 2022, 5:42pm

Excellent, and valid question! It’s very fair to ask this. My main reasoning is that I think it would be “damn handy” to have available as an option, but I certainly understand a lot of people rarely use QoQ so there’s not a huge benifit to them. I certainly have a few dozens times I know I could have used a CFML function in my query. Formatting can be done after-the-fact so it’s not a huge one, but the ability to use CFML functions to affect the sorting, filter, and grouping of the data has a very distinct advantage that can’t be easily reproduced in CFML.

For example, grouping a list of messages based on the domain of their E-mail address is something I can do in CFML using queryReduce() or a CFLoop, but I don’t think I can beat the simple intuitive SQL version of it when I can use CF’s listRest() BIF right inside my SQL.

SELECT max( dateSent ) as dateLastMessage,
  cf.listRest( email, '@' ) as emailDomain
FROM myMessages
GROUP BY cf.listRest( email, '@' )

Or how many times have I wished I could do this in QoQ for reports to break down yearly/monthly data:

SELECT count( 1 ) as countOrders,
  cf.year( dateCreated ) as year,
  cf.month( dateCreated ) as month
FROM myOrders
GROUP BY cf.year( dateCreated ), cf.month( dateCreated )

We could start adding more SQL functions to QoQ for string manip or date manip, but then we have to decide, do we follow T-SQL, P-SQL, MySQL, etc? The common thread here is CFML and it already has all this stuff-- so it sort of seems like the natural choice.

Valid question again, but I think what I was suggesting was we approach it with moderation. Figure out how to cleanly expose the awesome power of CFML inside QoQ where it makes sense and is the most useful without going off the rails and trying to turn it into SQLML or CFQL, lol. (Actually, that last one is sort of catchy ) Basically, I think there’s probably a point where it could be taken too far, but I think there’s also a lot of ground before that where we can add a lot of value without departing too far from SQL itself.

It’s an interesting precedent to know that HSQLDB (also written in Java) has a concept of adding your own custom functions. That’s sort of like what we’re doing, but with a subset of an existing functional library we’re all familiar with.

I thought I provided about a dozen of those in my original post in the code blocks. I am genuinely curious if you don’t think they were clear, or sensible, or if you just didn’t think you’d ever personally use it (which is fine too )

bdw429s · February 4, 2022, 5:51pm

This is an incredibly interesting idea. One that I think is completely unrelated to this RFC, but probably doable. It would require an abstraction layer that detects the type of Database being used for the JDBC connection and generated an update or insert statement with valid SQL for that DB. This ground is already covered pretty well with CFML libraries like qb (https://www.forgebox.io/view/qb) but that library uses a method-based DSL, not a query syntax.
https://qb.ortusbooks.com/query-builder/executing-queries/inserts-updates-deletes#update

I would say a feature like this would come with a higher risk since the SQL we generate would need to account for the current versions of all the DB’s out there and account for things like batching rows, etc and those limits differ per DB engine. Outside of the <CFInsert> and <CFUpdate> tags, Lucee has no precedent for writing SQL on behalf of the developer, which to a degree makes Lucee’s life a lot easier

I’ll also clarify there are two sort of separate ideas I think within in this one.

Allowing INSERT, UDPATE, and DELETE QoQ to affect just a query object in memory
Allowing INSERT and UPDATE that targets an actual remote DB via the JDBC connection/datasource

My comments above were mostly directed at the second bullet. The first bullet is also an interesting idea as well.

Leftbower · February 4, 2022, 5:55pm

We use QoQ an awful lot, but only because we primarily write middleware that joins together data from several different systems. So we wouldn’t be able use most of what you’re discussing until joins are actually supported. That said - one thing that annoys the heck out of me is MySQL lacking a native way to handle “business day” calculations. So I suppose being able to use some of Lucee’s date manipulation functions might me handy in that regard…

Zackster · February 4, 2022, 6:21pm

that was my main idea as it’s pretty simple to implement…KISS for now

Jamo · February 4, 2022, 7:02pm

I’d love this QoQ functionality as a separate library (not built-in) so that it could also be used with Adobe ColdFusion. This would allow all developers to write cross-compatible code that benefits CFML in general rather than fragmenting features and causing potential compatibility issues down the road. I’m already finding CFML online that works solely on Lucee (without any disclaimer) and I’m sure that this frustrates developers that are new to ColdFusion.

I’m not sure how possible or desirable this is, but I’d thought I’d comment as we haven’t been able to migrate to Lucee as quickly & painlessly as we thought was possible.

michaelborn_me · February 4, 2022, 7:47pm

Yep, that last one. I consider a “sensible” use case as one that I would conceivably use if it improved my life in any way. IMO, you’ll be processing the data in CFML later anyway, so why not keep your CFML in CF and avoid the confusion that this CF-in-SQL will inevitably bring? However, it has been a while since I’ve done the complex reports that some businesses thrive on… so I’m probably missing the point here.

I haven’t used used QoQ once in the last 4 years (since I left the Big Legacy Codebase I was tied to at my last-last job), so I’m obviously not the “ideal customer” here.

We’re on the same page here, I think. It’s mainly just two questions 1) would anyone use this, which we can clearly see the answer is “Yes”, and 2) where do you “draw the line” on allowing CFML in QoQ. That line will always be there, and always be moving, IMO. Such is the way of lines in the sand… the waves tend to erase it sooner or later.

bdw429s · February 4, 2022, 7:54pm

This is an interesting idea, but would basically require a back-to-the-drawing-board re-implementation of the entire QoQ system in native CFML. It would be hard to optimize, and honestly I’m not sure if it’s possible. I don’t know how Adobe’s QoQ of works, but I know that Lucee’s QoQ uses a lot of underlying features of the Java query object classes directly which are not necessarily exposed at the CF level. (Such as setting an entire column of data in and skipping type checks, or having direct explicit access to Lucee’s Caster class)

Just to clarify Lucee’s position here, Lucee is resolved to create a CF engine that will execute the same code in the same manner as Adobe ColdFusion except for when

there are significant performance considerations (arrays by reference, cfthread attributes by reference, etc)
the ACF feature has little backing or demand (CFClient, CFAjax, CFSharepoint, etc)

That said, Lucee has never resolved for the code it parses to also be parsable by CF and considers its support of CFML to be a superset of Adobe’s. Therefore, Lucee has

additional functions (systemOutput(), dump(), etc)
additional features (extensions, caches, additional params to functions like directoryCreate()
additiona tag behaviors like <CFLoop times="7">

There’s two interesting schools of thought here that you’ll find in two distinct groups of people. Well, three groups, really.

People who use Lucee only and don’t use CF at all– these people are like Rhett in Gone with the Wind. Quite frankly, my dear, they don’t give a dang. Lucee is the superior engine to them and these additional features are just proof of that. Adobe should catch up with Lucee!
People using Adobe and trying to switch to Lucee. – These are the most frustrated group as every little difference they find is another thing they have to fix and they are usually correct when they bemoan the lack of good compat docs. People in this group often eventually move into the first group once they get off Adobe CF and never look back.
People writing code-libraries for the community– I’m in this group as a framework developer and while the difference are certainly annoying, it’s something you live with and grow more familiar with the terrain over time. At Ortus, we rely on comprehensive testing suites and CI/CD to root out any incompat problems.

It sounds like you’re mostly in the second two groups, and I can sympathize that when I first started using Lucee it felt like a lot of “don’t care” attitudes from Lucee people, but as I’ve moved more into the 1st and 3rd groups over the years I understand it a bit better. There’s just a limit to the amount of time and complexity that it’s worth attempt to “hold lucee back” and not let it go and do things on its own that Adobe can catch up with if they want. These Lucee-specific features are off-limits for group 3, but that’s sort of the price to pay when you have two completing engines and the “CFML Language Advisory Committee” (anyone remember that??) showed just how hard it is to get both engines to work together.

Anyway, long answer to your short and valuable feedback. I just find this topic very interesting. But it’s my long way of saying I don’t think there’s anything wrong with Lucee pushing CFML forward on its own. I know it can be frustrating, but I’ve also put in tickets in Adobe’s bug tracker for EVERY enhancement I’ve made to Lucee’s QoQ and I think over time, Adobe will hopefully incorporate most of them.

dswitzer · February 15, 2022, 6:00pm

You could always split the logic into 2 QoQ, one to do the join, one for manipulation. Whether the performance is any good, I suspect that remains to be seen!

Leftbower · February 15, 2022, 6:30pm

In this case we are comparing order dates (coming from an ancient DOS FoxPRO system) with production scheduled dates (coming from a MySQL db) and customer planning information (coming from Salesforce fetched/imported data). All that’s put into various dashboards for real time consumption in both an internal MRP (where that data can be acted upon and modified) and vendor and customer specific dashboards where it is just viewed. We also mix-in data imported from vendor reports which only come in the format of Excel documents and PDFs… as well as data scraped from email notifications from another partner’s Salesforce account.

It’s a Tower of Babel and Lucee glues it all together, mostly through various QoQ, though sometimes we need to do a lot of MySQL temp table work as well.