Experimental Optimized multi table data loading with QoQ

Zackster · June 30, 2023, 3:00pm

I’ve been working on optimizing the loading of temp tables into HSQLDB for a few years

finally got it all working (more or less)

some stats and a sample .lco on the ticket, I’d been keen to have some other people test it out?

https://luceeserver.atlassian.net/browse/LDEV-3317

this only affects multi table QoQ

the main changes are as follows

only the required columns are loaded into the temp database (a view is created and asked which columns are needed to perform the query)
the synchronized lock is now only around loading data, not the actual executing of the query of query, basically all multi table QoQ is single threaded on an instance
if you are querying from large tables with lots of columns, it should be far more efficient
a bonus here is sometimes the old engine had problems ignoring columns it couldn’t handle, which weren’t even required for the QoQ! i.e. incompatible data type in operation (problem was the id columns differ in type, even tho they aren’t even referenced, this is solved by this PR too)
refactoring

this is with a simple example with optimized column loading disabled

[java]    [script] Found 1 tests to run, filter took 591ms
[java]    [script]         test.tickets.LDEV3311 q1 has 5 rows
[java]    [script] q2 has 1048581 rows
[java]    [script] SQL: CREATE TABLE "Q1" ("ID" INTEGER,"NAME" VARCHAR_IGNORECASE,"DATA" VARCHAR_IGNORECASE)
[java]    [script] QoQ HSQLDB CREATED TABLES: SELECT q1.id FROM q1 where id in (select id from q1 )
[java]    [script] getUsedColumnsForQuery: took 6
[java]    [script] Populate Table: [q1] with [5] rows, [3] columns, took 3ms
[java]    [script] SQL: CREATE TABLE "Q2" ("ID" INTEGER,"NAME" VARCHAR_IGNORECASE,"DATA" VARCHAR_IGNORECASE)
[java]    [script] QoQ HSQLDB CREATED TABLES: SELECT q2.id FROM q2 where id in (select id from q2 )
[java]    [script] getUsedColumnsForQuery: took 1
[java]    [script] Populate Table: [q2] with [1048581] rows, [3] columns, took 803ms
[java]    [script] SQL: CREATE TABLE "Q1" ("ID" INTEGER,"NAME" VARCHAR_IGNORECASE,"DATA" VARCHAR_IGNORECASE)
[java]    [script] SQL: CREATE TABLE "Q2" ("ID" INTEGER,"NAME" VARCHAR_IGNORECASE,"DATA" VARCHAR_IGNORECASE)
[java]    [script] QoQ HSQLDB CREATED TABLES: SELECT q1.id, q2.id as id2 FROM q2, q1 where q1.id = q2.id group by q1.id, q2.id
[java]    [script] getUsedColumnsForQuery: took 4
[java]    [script] Populate Table: [q1] with [5] rows, [3] columns, took 1ms
[java]    [script] Populate Table: [q2] with [1048581] rows, [3] columns, took 766ms
[java]    [script]          (1 tests passed in 4,213 ms)

this is with a simple example with optimised column loading enabled

[java]    [script] Found 1 tests to run, filter took 655ms
[java]    [script]         test.tickets.LDEV3311 q1 has 5 rows
[java]    [script] q2 has 1048581 rows
[java]    [script] SQL: CREATE TABLE "Q1" ("ID" INTEGER,"NAME" VARCHAR_IGNORECASE,"DATA" VARCHAR_IGNORECASE)
[java]    [script] QoQ HSQLDB CREATED TABLES: SELECT q1.id FROM q1 where id in (select id from q1 )
[java]    [script] getUsedColumnsForQuery: took 6
[java]    [script] Populate Table: [q1] with [5] rows, [1] columns, took 2ms
[java]    [script] SQL: CREATE TABLE "Q2" ("ID" INTEGER,"NAME" VARCHAR_IGNORECASE,"DATA" VARCHAR_IGNORECASE)
[java]    [script] QoQ HSQLDB CREATED TABLES: SELECT q2.id FROM q2 where id in (select id from q2 )
[java]    [script] getUsedColumnsForQuery: took 1
[java]    [script] Populate Table: [q2] with [1048581] rows, [1] columns, took 550ms
[java]    [script] SQL: CREATE TABLE "Q1" ("ID" INTEGER,"NAME" VARCHAR_IGNORECASE,"DATA" VARCHAR_IGNORECASE)
[java]    [script] SQL: CREATE TABLE "Q2" ("ID" INTEGER,"NAME" VARCHAR_IGNORECASE,"DATA" VARCHAR_IGNORECASE)
[java]    [script] QoQ HSQLDB CREATED TABLES: SELECT q1.id, q2.id as id2 FROM q2, q1 where q1.id = q2.id group by q1.id, q2.id
[java]    [script] getUsedColumnsForQuery: took 3
[java]    [script] Populate Table: [q1] with [5] rows, [1] columns, took 0ms
[java]    [script] Populate Table: [q2] with [1048581] rows, [1] columns, took 663ms
[java]    [script]          (1 tests passed in 3,948 ms)

As we have upgraded HSQLDB to 2.7.2, something don’t work like they used to, it’s a bit more strict

So odd things like duplicate column names don’t work, just add a column alias

Zackster · July 1, 2023, 9:51am

load testing 5.4 vs 6.0 here

running with 10 threads, 200 loops with jmeter

update, this shows the improved throughput better (two thread groups)

If you haven’t tried JMeter out, here’s an old blog post of mine

Zackster · July 6, 2023, 6:02am

A post was split to a new topic: QoQ: data exception: string data, right truncation ; size limit: 1