OpenTelemetry reports cfhttp as a separate transaction

I could only work around this with a direct Java HTTP call, which I don’t find very elegant. Here is what Claude suggests; any chance of getting it implemented?

Summary

When Lucee executes work on a PageContextThread worker — most notably cfhttp/new http() with a timeout, and cfthread — the originating thread’s ThreadLocal-bound context is not carried to the worker thread. PageContextThread propagates the Lucee PageContext, but nothing else. This breaks every tool that relies on ThreadLocal context propagation: OpenTelemetry traces, commercial APM agents, and SLF4J/Log4j MDC.

I’d like to propose a small, dependency-free SPI that lets such tools register a context propagator, discovered via ServiceLoader, so Lucee can carry arbitrary context across its worker-thread boundaries without taking a compile-time dependency on any of them. I’m willing to implement the PR.

Environment

  • Lucee 6.2.7 (verified against current source; behavior is long-standing)
  • Any OpenTelemetry / Elastic APM / Dynatrace / MDC setup

Problem / background

The OpenTelemetry Java agent (like APM agents generally) keeps the active span in a ThreadLocal (io.opentelemetry.context.Context). Outbound HTTP client calls are auto-instrumented and become child spans of the active server span, as long as that span is active on the calling thread.

For Lucee cfhttp, that’s true only when the request runs inline. With a timeout set, it runs on a worker thread that has no access to the caller’s ThreadLocals, so the HTTP client span starts with no parent and is reported as a separate root transaction instead of a child span.

Concretely: a search page that issues an HTTP call shows up in APM as two unrelated transactions (the page, and a stray GET) rather than one trace with a nested Solr span. JDBC/Hibernate calls, which run inline on the request thread, nest correctly, which makes the inconsistency obvious.
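
The underlying JVM behavior can be shown without any agent attached. This is a minimal sketch, assuming a plain ThreadLocal as a stand-in for the agent's context slot; ThreadLocalLossDemo, ACTIVE_SPAN and seenByWorker are illustrative names, not OTel or Lucee API:

```java
import java.util.concurrent.atomic.AtomicReference;

final class ThreadLocalLossDemo {
    // Hypothetical stand-in for the slot where an agent keeps the active span.
    static final ThreadLocal<String> ACTIVE_SPAN = new ThreadLocal<>();

    /** Sets a value on the calling thread and returns what a freshly started thread sees. */
    static String seenByWorker(String spanOnCaller) {
        ACTIVE_SPAN.set(spanOnCaller);
        try {
            AtomicReference<String> seen = new AtomicReference<>("not-run");
            // A plain Thread starts with empty ThreadLocals, like the threaded cfhttp path.
            Thread worker = new Thread(() -> seen.set(ACTIVE_SPAN.get()));
            worker.start();
            worker.join();
            return seen.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            ACTIVE_SPAN.remove(); // leave the calling thread clean
        }
    }
}
```

ThreadLocalLossDemo.seenByWorker("server-span-1") returns null: the worker has no parent to attach to, which is why an instrumented HTTP client on that thread would start a new root.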

Reproduction

  1. Attach the OpenTelemetry Java agent to Lucee/Tomcat.
  2. In a request, call:
    var h = new http();
    h.setUrl("http://example.internal:8983/solr/core/select?q=*:*");
    h.setTimeout(15); // forces the threaded path
    h.send();
  3. In the APM UI, observe the HTTP call appears as its own root transaction, not as a child span of the request.
  4. Remove setTimeout(15) → the call runs inline → it correctly nests. This isolates the threaded path as the cause.

Root cause

core/src/main/java/lucee/runtime/tag/Http.java, _doEndTag():

if (socketTimeout == null || socketTimeout.getMillis() <= 0) {
    rsp = e.execute(httpContext);          // INLINE: runs on the request thread, context preserved
} else {
    e.start();                             // SEPARATE thread (Executor4 extends PageContextThread)
    synchronized (this) { this.wait(socketTimeout.getMillis()); }
    if (!e.done) { req.abort(); /* timeout */ }
}

Executor4 extends PageContextThread, started via e.start(). PageContextThread copies the PageContext to the child thread but not arbitrary ThreadLocals, so the OTel context (and MDC, etc.) is lost. The same applies to cfthread, since it also runs on PageContextThread.

Impact

  • Distributed tracing is broken for cfhttp with timeout, which yields orphaned root transactions and misleading service maps. Timeouts are recommended practice, so this hits well-configured apps hardest.
  • APM agents (Elastic, Dynatrace, New Relic) that use ThreadLocal-bound context are equally affected.
  • MDC logging loses correlation IDs inside cfthread/cfhttp.
  • Forces per-call-site workarounds (replacing cfhttp with a direct Java HTTP call) that are error-prone and lose cfhttp semantics.

Proposed solution

A minimal SPI in Lucee core, plus a single hook in PageContextThread that captures the registered context on the origin thread and re-installs it on the worker thread around execution. No core dependency on OTel/MDC/etc.

SPI

package lucee.commons.lang.thread;

/** Bridges ThreadLocal-bound context across a Lucee worker-thread boundary
 *  (cfhttp Executor4, cfthread, …). Discovered via java.util.ServiceLoader,
 *  so core needs no dependency on OpenTelemetry, MDC, or any APM. */
public interface ThreadContextPropagator {
    /** Origin thread: snapshot what must cross. */
    Object capture();
    /** Worker thread: install snapshot before work; return an undo handle. */
    Restorer install(Object captured);
    interface Restorer { void close(); }
}

Registry (ServiceLoader, cached, fail-safe)

public final class ThreadContextPropagators {
    private static final List<ThreadContextPropagator> ALL = load();
    public static boolean isEmpty() { return ALL.isEmpty(); }
    public static Object[] captureAll() { /* per-propagator try/catch; null if empty */ }
    public static ThreadContextPropagator.Restorer installAll(Object[] snapshot) { /* install each; restore in reverse */ }
    // load() must use the classloader that sees external jars — see "Open questions"
}
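
To make the elided method bodies concrete, here is a hedged sketch of how such a registry could behave. It is an illustration under assumptions, not the proposed implementation: it uses an instance-based Registry (instead of the static class above) so it can be exercised without a provider jar, re-declares the SPI locally so it compiles on its own, and the FAILED marker, discover(), logging() and demo() are names invented for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

final class RegistrySketch {
    /** Local copy of the proposed SPI so the sketch is self-contained. */
    interface ThreadContextPropagator {
        Object capture();
        Restorer install(Object captured);
        interface Restorer { void close(); }
    }

    static final class Registry {
        private static final Object FAILED = new Object(); // marks a capture() that threw
        private final List<ThreadContextPropagator> all;

        Registry(List<ThreadContextPropagator> providers) { this.all = List.copyOf(providers); }

        /** ServiceLoader discovery; a broken provider reduces the list but never throws. */
        static Registry discover(ClassLoader loader) {
            List<ThreadContextPropagator> found = new ArrayList<>();
            try {
                for (ThreadContextPropagator p : ServiceLoader.load(ThreadContextPropagator.class, loader)) found.add(p);
            } catch (Throwable t) {
                // e.g. ServiceConfigurationError: keep whatever loaded before the failure
            }
            return new Registry(found);
        }

        /** Origin thread: one slot per provider, or null when there are no providers. */
        Object[] captureAll() {
            if (all.isEmpty()) return null;
            Object[] snapshot = new Object[all.size()];
            for (int i = 0; i < snapshot.length; i++) {
                try { snapshot[i] = all.get(i).capture(); }
                catch (Throwable t) { snapshot[i] = FAILED; }
            }
            return snapshot;
        }

        /** Worker thread: install every captured slot; the handle undoes them in reverse order. */
        ThreadContextPropagator.Restorer installAll(Object[] snapshot) {
            List<ThreadContextPropagator.Restorer> undo = new ArrayList<>();
            for (int i = 0; snapshot != null && i < snapshot.length; i++) {
                if (snapshot[i] == FAILED) continue;
                try {
                    ThreadContextPropagator.Restorer r = all.get(i).install(snapshot[i]);
                    if (r != null) undo.add(r);
                } catch (Throwable t) { /* skip this provider, keep the request alive */ }
            }
            return () -> {
                for (int i = undo.size() - 1; i >= 0; i--) {
                    try { undo.get(i).close(); } catch (Throwable t) { /* keep unwinding */ }
                }
            };
        }
    }

    /** Illustrative provider that records its calls; optionally fails in capture(). */
    static ThreadContextPropagator logging(String name, List<String> log, boolean failOnCapture) {
        return new ThreadContextPropagator() {
            public Object capture() {
                if (failOnCapture) throw new IllegalStateException(name + " is broken");
                return name;
            }
            public Restorer install(Object captured) {
                log.add("install " + captured);
                return () -> log.add("close " + captured);
            }
        };
    }

    /** Runs capture, install and close with one broken provider in the middle. */
    static List<String> demo() {
        List<String> log = new ArrayList<>();
        Registry registry = new Registry(List.of(
                logging("a", log, false), logging("broken", log, true), logging("b", log, false)));
        registry.installAll(registry.captureAll()).close();
        return log;
    }
}
```

RegistrySketch.demo() yields [install a, install b, close b, close a]: the failing provider is skipped and the remaining ones are unwound in reverse, which is the fail-safe behavior the proposal asks for.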

Hook in PageContextThread

public abstract class PageContextThread extends Thread {
    private final Object[] ctxSnapshot = ThreadContextPropagators.captureAll(); // ctor = origin thread
    @Override public final void run() {
        ThreadContextPropagator.Restorer r = ThreadContextPropagators.installAll(ctxSnapshot);
        try { runImpl(); } finally { r.close(); }
    }
    protected abstract void runImpl();
}
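
The capture-in-constructor, install-in-run pattern can be tried in isolation. The following is a sketch only: ContextThread stands in for PageContextThread, TRACE_ID and TraceIdPropagator are hypothetical, and a single propagator is used directly instead of the registry to keep the example short.

```java
import java.util.concurrent.atomic.AtomicReference;

final class HookSketch {
    /** Local copy of the proposed SPI so the sketch is self-contained. */
    interface ThreadContextPropagator {
        Object capture();
        Restorer install(Object captured);
        interface Restorer { void close(); }
    }

    // Hypothetical ThreadLocal-bound context, e.g. a correlation ID.
    static final ThreadLocal<String> TRACE_ID = new ThreadLocal<>();

    static final class TraceIdPropagator implements ThreadContextPropagator {
        public Object capture() { return TRACE_ID.get(); }
        public Restorer install(Object captured) {
            String previous = TRACE_ID.get();
            TRACE_ID.set((String) captured);
            // Undo handle: put back whatever the worker thread had before.
            return () -> { if (previous == null) TRACE_ID.remove(); else TRACE_ID.set(previous); };
        }
    }

    /** Stand-in for PageContextThread with the proposed hook. */
    abstract static class ContextThread extends Thread {
        private final ThreadContextPropagator propagator;
        private final Object snapshot;

        ContextThread(ThreadContextPropagator propagator) {
            this.propagator = propagator;
            this.snapshot = propagator.capture(); // constructor runs on the origin thread
        }

        @Override public final void run() {
            ThreadContextPropagator.Restorer r = propagator.install(snapshot); // now on the worker
            try { runImpl(); } finally { r.close(); }
        }

        protected abstract void runImpl();
    }

    /** Returns the trace ID observed inside the worker thread. */
    static String traceIdSeenByWorker(String traceIdOnCaller) {
        TRACE_ID.set(traceIdOnCaller);
        try {
            AtomicReference<String> seen = new AtomicReference<>();
            ContextThread worker = new ContextThread(new TraceIdPropagator()) {
                @Override protected void runImpl() { seen.set(TRACE_ID.get()); }
            };
            worker.start();
            worker.join();
            return seen.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            TRACE_ID.remove();
        }
    }
}
```

With the hook in place, HookSketch.traceIdSeenByWorker("abc-123") returns "abc-123", whereas a plain Thread would see null.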

Example external provider (ships outside Lucee)

public final class OtelThreadContextPropagator implements ThreadContextPropagator {
    public Object capture() { return io.opentelemetry.context.Context.current(); }
    public Restorer install(Object c) {
        var s = ((io.opentelemetry.context.Context) c).makeCurrent();
        return s::close;
    }
}

…registered via META-INF/services/lucee.commons.lang.thread.ThreadContextPropagator

Design considerations

  • No hard dependency / fail-safe: core imports nothing from any provider, and every provider call is wrapped so a buggy provider can never break a request. No providers → no-op, negligible cost (snapshot list cached statically).
  • Capture on the origin thread: the PageContextThread constructor runs on the origin thread, so capturing there is correct. (If a subclass could be constructed off-thread, capture in an overridden start() instead.)
  • Opt-out: a setting (e.g. lucee.thread.context.propagation, …) or attribute, since propagating context into long-lived/detached cfthreads may occasionally be undesirable.
  • Prior art: API intentionally shaped like Micrometer’s io.m…Accessor so an adapter is trivial; a minimal custom SPI is proposed to keep core dependency-free (open to depending on Micrometer’s context-propagation instead, if preferred).
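
For the opt-out, one possible shape is a system-property check consulted before capturing. This is only a sketch: the property name comes from the bullet above, but the value semantics (enabled by default, disabled only by an explicit "false") and the PropagationSwitch class are assumptions made for illustration.

```java
final class PropagationSwitch {
    /** Property name taken from the proposal; the value semantics below are assumed. */
    static final String PROPERTY = "lucee.thread.context.propagation";

    /** Assumed rule: on by default, off only when the value is "false" (case-insensitive). */
    static boolean isEnabled(String rawValue) {
        return rawValue == null || !"false".equalsIgnoreCase(rawValue.trim());
    }

    /** Reads the JVM system property, e.g. set via -Dlucee.thread.context.propagation=false. */
    static boolean isEnabled() {
        return isEnabled(System.getProperty(PROPERTY));
    }
}
```

A hook could then skip captureAll() entirely when PropagationSwitch.isEnabled() is false, so disabled installations pay no cost.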