Is it possible to use cf_sql_array with an IN clause?

Zackster · October 12, 2018, 12:02pm

I’m trying to avoid the overheard of parsing an IN clause with cfqueryparam list=“true” which adds a bind parameter for each item in a list which forces reparsing of the sql if the number of list items changes

github.com

lucee/Lucee/blob/5.3/core/src/main/java/lucee/runtime/db/SQLCaster.java#L153


      
          	}
          
          
	public static void setValue(PageContext pc, TimeZone tz, PreparedStatement stat, int parameterIndex, SQLItem item) throws PageException, SQLException, DatabaseException {
          		pc = ThreadLocalPageContext.get(pc);
          		Object value = item.getValue();
          		if (item.isNulls() || value == null) {
          			stat.setNull(parameterIndex, item.getType());
          			return;
          		}
          		int type = item.getType();
          		boolean fns = NullSupportHelper.full(pc);
          		switch (type) {
          		/*
          		 * case Types.ARRAY: stat.setArray(parameterIndex,toArray(item.getValue())); return;
          		 */
          		case Types.BIGINT:
          			try {
          				stat.setLong(parameterIndex, Caster.toLongValue(value));
          			}
          			catch (PageException pe) {
          				if (!fns && value instanceof String && StringUtil.isEmpty((String) value)) stat.setNull(parameterIndex, item.getType());

joe.gooch · October 12, 2018, 6:50pm

The JDBC method setArray passes a parameter of type Array to the SQL server for processing, however, I know MSSQL in particular isn’t going to do what you want.

For that matter the overhead of “reparsing the sql” you mention isn’t so much parsing the SQL as it is creating an execution plan which is later cached - if your array changes in size (i.e. you pass 1 argument or 3), the plan changes. To wit, if you’re expecting SQL Server to treat WHERE col=[1,2,3] as WHERE col IN (1,2,3), then it’s going to need a new execution plan anyway :). In other words, “I don’t think that means what you think it means.

Another similar thing:

https://social.msdn.microsoft.com/Forums/sqlserver/en-US/7ed83343-142b-443e-9634-d1770b24a98b/jdbc-setarray-method?forum=sqldataaccess

One such option mentioned above would be to join to a table… i.e. in MSSQL

(Note I want 1 # to be passed to MSSQL, you may need ## to get around CF’s parsing)

CREATE TABLE #someTable ( param INT not null, PRIMARY KEY CLUSTERED (param))

Insert into #someTable VALUES ((?),(?),(?)…) (create this with cfqueryparam list for instance)

Select * from othertable where blah in (select param from #someTable)

Or similar

But, again, we still have the parameters so it’s not saving you much.

I think it’s a better question to ask if you’re actually having problems because of the # of cached plans. Sure, having a bunch of extra cached plans takes more memory, but in my experience

parsing and creating execution plans is not a significant amount of time, except for a few very EXTREME cases in which case it’s not really avoidable anyway (i.e. crazy data warehousing, aggregation and graphing things)
The additional plans don’t really hurt anything
Usually the execution of the query takes longer than the prep by a significant magnitude.

We have no tangible problems with IN (cfqueryparam list=“yes”) constructs. What we DO have problems with is if the list is big enough that we go over 2000 parameters in a single query. In that case, I’ve actually regex enforced that the lists ONLY contains characters of 0-9 and , (to prevent sql injection) and then just done IN (#thelist#). It means I can’t reuse the plan, but how useful is a plan with 2000+ hardcoded params anyway - it ends up being a single use plan that gets cleared by my maintenance jobs and/or over time. And SQL can still execute that query without blinking… I mean sure it’s a lot of data, but the plan prep isn’t the part that takes the time.

We’ve also used the tmp table method as well - usually to pass a table of params or data into a stored procedure or external job… and usually that job takes way longer than the complexity of creating a temp table and inserting data does. (And I’d take keeping it tabular over native JSON or XML or CSVs parsed in the SQL server any day of the week and twice sunday… Those DO cause significant performance hits…)

-G

Zackster · October 13, 2018, 10:47am

With sql server 2016, I’m getting better performance passing the [1,2,3] as json in as a varchar param

in ( SELECT value FROM OPENJSON(‘[1,2,3]’) )

It’s seems to be 30-40% faster for the initial query with the json approach vs compared to with list=“yes” for the initial request with x params

once the query plan is cached for each unique number of list=“yes” params, the performance is about the same, but this avoids the initial query often being twice as slow (and avoids filling up the parsed query cache with lots of variations of the same query)

joe.gooch · October 13, 2018, 11:42am

Well that’s cool. I’ll definitely have to explore that for some of our queries. Prod is on 2012 and dev 2017 so that may be useful in the future.

What magnitude are we talking about when you say 30-40%? (1ms vs 1.4ms? 100ms vs 140ms?)

Have you by chance identified where the time difference is happening? Cf parsing, JDBC, MSSQL… (I.e. if you test in SSMS w/ sp_prepare do you get the same results) obviously the execution plan will be different but I’d think sql will make some pretty contrived estimates about how many rows will come back from openjson.

What version of the mssql driver are you using?

Also maybe these settings:

https://www.brentozar.com/archive/2018/03/sp_prepare-isnt-good-sp_executesql-performance/

-G