CFHTTP character encoding issue / utf-8

I'm using a very old production workhorse Lucee here to play around with the OpenAI API, and I ran into, for the first time in years, the notorious mojibake character encoding problem.

When I post a totally valid UTF-8 payload to the OpenAI API through CFHTTP, the data I get back has its non-ASCII characters garbled (mojibake).

The response data I get from the OpenAI API using CFHTTP converts “ä” to “ö” and so on: the classic behaviour when character encoding gets mixed up somewhere along the way.

Using getAsBinary="yes" won’t work, as the API throws a 400 error with it. Setting charset="utf-8" as a CFHTTP attribute, or leaving it out, makes no difference. Tweaking the headers (plain application/json vs. adding an explicit charset there) doesn’t seem to change the situation either.

Posting the exact same payload manually through curl or HTTPie works and doesn’t mess up the encodings.

Any tips? Has this been resolved in some Lucee update during the past 6 years ( :grin: yes… I know, it is an old, old instance)?

OS: Linux
Lucee Version: 5.3.2.77

Here is the related ticket for this issue:
https://luceeserver.atlassian.net/browse/LDEV-1856
Could you please try it with the latest Lucee version (6.2.2.91)?

We are in the process of upgrading our test environment to a modern stable Lucee, but as it involves tons of other stuff to do (basically re-doing the entire server image), I can’t test the code in 6.x until sometime later this year :frowning_face:

But yeah, the ticket looks like it could well be the fix for this issue.

For the time being, we created a stupid little function to manually replace the most common mojibake character-combinations with the proper characters.
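
For anyone landing here with the same problem, a rough sketch of what such a workaround can look like is below. It is written in Python purely for illustration and is not the function we actually use; the names `MOJIBAKE_MAP`, `repair_with_table` and `repair_by_reencoding` are made up for this example. It also assumes the most common kind of mojibake, where UTF-8 bytes were decoded as Windows-1252, which may or may not be what happens in this particular CFHTTP case.

```python
# Assumption: the garbling comes from UTF-8 bytes being read as Windows-1252.
# Build a lookup table (garbled sequence -> intended character) for a
# hand-picked set of characters; extend the string as needed.
COMMON_CHARS = "äöåÄÖÅéü"
MOJIBAKE_MAP = {ch.encode("utf-8").decode("cp1252"): ch for ch in COMMON_CHARS}


def repair_with_table(text):
    """Replace known garbled sequences one by one (the 'stupid' approach)."""
    for garbled, proper in MOJIBAKE_MAP.items():
        text = text.replace(garbled, proper)
    return text


def repair_by_reencoding(text, wrong_charset="cp1252"):
    """Undo the wrong decode in one step: back to bytes, then decode as UTF-8.

    Falls back to the input unchanged if the text does not round-trip,
    which is the case for strings that were never garbled.
    """
    try:
        return text.encode(wrong_charset).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


if __name__ == "__main__":
    # Simulate the problem: a correct UTF-8 string misread as Windows-1252.
    garbled = "Hyvää päivää".encode("utf-8").decode("cp1252")
    print(repair_with_table(garbled))
    print(repair_by_reencoding(garbled))
```

The table version only fixes the characters you thought of in advance, while the re-encoding version covers everything but only if the wrong charset guess is right. In CFML the second approach should be doable with `charsetDecode()` and `charsetEncode()`, though I haven't verified that on 5.3.2.77.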