How to determine length of a UTF-8 input string using CFML

This is more of a general design issue than a CFML issue but since we are
using Lucee, I figured I would give it a try on this forum.

The problem that we are having is that we have just started accepting
Japanese characters into our Oracle database. We have set our web page
encoding to UTF-8. The problem that we are having is that we have
database field lengths set to specific length, for example first name field
is set to VARCHAR2(50). But the Japanese characters are longer than 50
characters and our validation to trap for field lengths are not working and
we are getting Insert errors because the field length is longer than 50
characters.

Trying Left(firstname,50) isn’t working.

Do I just keep increasing the size of my database fields? Or is there a
way to test the length of the Japanese string? We are running on IIS if
that matters.

Any help would be greatly appreciated.

Problem is that the jvm (where Lucee is on top) is handling all strings as
unicode, so the charset only comes into the game when you read strings from
outside the jvm or write them somewhere outside the jvm. Inside the jvm
every letter uses one position, so that a certain letter uses 2 position
when stored with a certain charset is completely independent. So what can
you do? The easiest way is to convert the string to a byte array and get
the length of that array like this:
// not tested
function bytelength(required string str, required string charset){
return arrayLen( arguments.str.getBytes(arguments. charset));
}

Then you can use this function to terminate the length and cut pieces of
the string until you have the right length.

We should consider to extend the function stringLen with this…

MichaAm 10.02.2016 4:34 vorm. schrieb “Dave Hatz” <@Dave_Hatz>:

This is more of a general design issue than a CFML issue but since we are
using Lucee, I figured I would give it a try on this forum.

The problem that we are having is that we have just started accepting
Japanese characters into our Oracle database. We have set our web page
encoding to UTF-8. The problem that we are having is that we have
database field lengths set to specific length, for example first name field
is set to VARCHAR2(50). But the Japanese characters are longer than 50
characters and our validation to trap for field lengths are not working and
we are getting Insert errors because the field length is longer than 50
characters.

Trying Left(firstname,50) isn’t working.

Do I just keep increasing the size of my database fields? Or is there a
way to test the length of the Japanese string? We are running on IIS if
that matters.

Any help would be greatly appreciated.


Love Lucee? Become a supporter and be part of the Lucee project today! -
http://lucee.org/supporters/become-a-supporter.html

You received this message because you are subscribed to the Google Groups
“Lucee” group.
To unsubscribe from this group and stop receiving emails from it, send an
email to lucee+unsubscribe@googlegroups.com.
To post to this group, send email to lucee@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/lucee/65c3b19b-9a9a-4843-9ad5-b2282bc949ba%40googlegroups.com
https://groups.google.com/d/msgid/lucee/65c3b19b-9a9a-4843-9ad5-b2282bc949ba%40googlegroups.com?utm_medium=email&utm_source=footer
.
For more options, visit https://groups.google.com/d/optout.