Refind regexp not working anymore

Hi,

I’ve just discovered that one of my regexp checks is no longer working… No changes have been made to the code, only an upgrade from Railo to Lucee (I used Railo for a long time and now Lucee :wink:).

The regexp is:

nom: {regexp:"^[éèëêàäâûüùïîöôÿç’a-zA-Z\ _-]{2,50}$", erreur="invalid_nom", msg="Votre nom n’est pas valide. Sa longueur doit être comprise entre 2 et 50 caracteres."},

(In English, the message reads: “Your name is not valid. Its length must be between 2 and 50 characters.”)

and the check is:

<cfif (not refind(CHKregexp[nom].regexp, ARGUMENTS.nom))>
    <cfset callit = application.mythrow(errorCode="400", erreur="#erreur#", detail="#erreur_detail#")>
</cfif>

If I try a name with an accent, like ‘Lashât’, it fails with the error message…

This check is in a REST API and is called from a PHP page via Ajax.

Have I done something wrong?

Stéphane

Hi Stéphane,

Having special characters inside a .cfm/.cfc file can be problematic when the files are read back in, and should be avoided IMHO. It’s better to write the regex something like this:
"^[\x8C\x9C\xC0\xC2\xC6-\xCB\xCE\xCF\xD4\xD9\xDB\xDC\xE0\xE2\xE6-\xEB\xEE\xEF\xF4\xF9\xFB\xFC 'a-zA-Z_-]{2,50}$"
where all those \x escapes are the hexadecimal character codes of these characters: http://character-code.com/french-html-codes.php

Also, I see your regex currently doesn’t check for the uppercase variants of the special characters. The regex I suggested here matches the uppercase variants as well.

Kind regards,

Paul Klinkenberg

Love Lucee? Become a supporter and be part of the Lucee project today! - http://lucee.org/supporters/become-a-supporter.html

You received this message because you are subscribed to the Google Groups “Lucee” group.
To unsubscribe from this group and stop receiving emails from it, send an email to lucee+unsubscribe@googlegroups.com.
To post to this group, send email to lucee@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/lucee/a3220998-adad-48de-a46e-4052f6820aba%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
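A minimal sketch of the escaping approach, written in Java because Lucee runs on the JVM; Lucee’s refind() is not guaranteed to use java.util.regex, so this illustrates the technique rather than Lucee’s exact behaviour. Every non-ASCII character is spelled as a \x{...} escape, so the source file stays pure ASCII. The character list is an assumption: it merges both regexes in this thread with their uppercase forms (so ä, ö, ÿ and ’ from the original pattern are kept), and writes Œ/œ as their Unicode code points U+0152/U+0153, because 0x8C/0x9C are their Windows-1252 codes rather than Unicode ones. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical validator; the accepted character list is an assumption.
class NameCheck {
    // Every non-ASCII character is written as a \x{...} regex escape, so this
    // source file is pure ASCII and cannot be garbled by an encoding change.
    static final Pattern NOM = Pattern.compile(
            "^[\\x{C0}\\x{C2}\\x{C4}\\x{C6}-\\x{CB}\\x{CE}\\x{CF}\\x{D4}\\x{D6}\\x{D9}\\x{DB}\\x{DC}"
            + "\\x{E0}\\x{E2}\\x{E4}\\x{E6}-\\x{EB}\\x{EE}\\x{EF}\\x{F4}\\x{F6}\\x{F9}\\x{FB}\\x{FC}\\x{FF}"
            + "\\x{152}\\x{153}\\x{178}\\x{2019}' a-zA-Z_-]{2,50}$");

    // True if the pattern is found in the input, like a non-zero refind() result.
    static boolean isValidNom(String nom) {
        return NOM.matcher(nom).find();
    }

    public static void main(String[] args) {
        System.out.println(isValidNom("Lash\u00E2t")); // a-circumflex: true
        System.out.println(isValidNom("C\u0153ur"));   // oe ligature: true
        System.out.println(isValidNom("L"));           // too short: false
        System.out.println(isValidNom("Lash4t"));      // digit not allowed: false
    }
}
```

Using find() mirrors refind(), which searches the string; with the ^ and $ anchors it behaves the same as a full match here.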

On Monday, 22 February 2016 21:28:55 UTC, Paul Klinkenberg wrote:

Having special characters inside a cfm/cfc file can be problematic when files are read back in, and should be avoided imho.

What?

Why?

Hi Paul,

That has indeed solved the problem!

Thanks for the tip!

Stéphane

Well, I have seen numerous occasions where special characters were garbled in a CFML template (or any text document, for that matter). That usually occurred after the file was FTP’d, updated in an editor with a different character set, or subjected to any other action that changes the file’s character set.

Of course, measures can be taken to prevent this from happening, but I have had my fair share of bugs with this exact problem, at different companies and on different platforms.

Btw, Google comes up with dozens of examples, e.g. http://stackoverflow.com/questions/22485224/coldfusion-character-encoding-issue and “ColdFusion is not UTF-8 encoded” (http://www.thickpaddy.com/2009/8/10/coldfusion-is-not-utf-8-encoded).

Paul
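Paul’s point can be made concrete with a small Java sketch. It simulates one assumed kind of mismatch, a template saved as ISO-8859-1 but read back as UTF-8: the literal accented characters decode to U+FFFD replacement characters, so ‘Lashât’ no longer matches, while the \x{...}-escaped version of the same class is plain ASCII and survives unchanged. The names are hypothetical, and this shows the mechanism only; the thread does not establish which mismatch, if any, affected Stéphane’s upgrade.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

class EncodingMismatchDemo {
    // Class with literal accented characters (e-acute, e-grave, a-circumflex),
    // written via Java Unicode escapes so this sketch itself stays ASCII.
    static final String LITERAL = "^[\u00E9\u00E8\u00E2a-zA-Z]{2,50}$";

    // The same class spelled with regex \x{...} escapes: pure ASCII text.
    static final String ESCAPED = "^[\\x{E9}\\x{E8}\\x{E2}a-zA-Z]{2,50}$";

    // Assumed mismatch: text saved as ISO-8859-1, read back as UTF-8.
    // Bytes that are not valid UTF-8 are decoded as U+FFFD.
    static String savedLatin1ReadAsUtf8(String source) {
        byte[] onDisk = source.getBytes(StandardCharsets.ISO_8859_1);
        return new String(onDisk, StandardCharsets.UTF_8);
    }

    // True if the regex is found in the input, like a non-zero refind() result.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String name = "Lash\u00E2t";
        System.out.println(found(LITERAL, name));                        // true
        System.out.println(found(savedLatin1ReadAsUtf8(LITERAL), name)); // false
        System.out.println(found(savedLatin1ReadAsUtf8(ESCAPED), name)); // true
    }
}
```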

Of course, for nearly any language besides English, “special” characters are normal. English is the outlier here. Interfaces will have words containing them, and these characters simply can’t be avoided without misspellings, or (for Chinese words, for example) without using images in place of text. Neither workaround is at all ideal. Dealing with various and varying character sets is a pain, but in my opinion it’s unavoidable unless one works only in English.

The issue I typically run across in Switzerland is that someone will give me a text in an encoding other than UTF-8, and then I have to convert it. If it’s data, I’ll import it into MySQL and convert the character set there.

Aria Media Sagl
+41 (0)76 303 4477 cell
skype: ariamedia
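Nando converts such data inside MySQL. For a plain text file, the same step can be sketched in Java (the class and method names are hypothetical). The source charset has to be known beforehand: decoding with the wrong one produces exactly the garbling Paul describes.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class Reencode {
    // Decodes the bytes with the charset the sender actually used,
    // then re-encodes the resulting text as UTF-8.
    static byte[] toUtf8(byte[] input, Charset sourceCharset) {
        String text = new String(input, sourceCharset);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "caract" + e-grave + "res" in ISO-8859-1: e-grave is the single byte 0xE8.
        byte[] latin1 = "caract\u00E8res".getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = toUtf8(latin1, StandardCharsets.ISO_8859_1);
        System.out.println(latin1.length + " -> " + utf8.length); // 10 -> 11
        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals("caract\u00E8res")); // true
    }
}
```

For a file on disk, the bytes could come from java.nio.file.Files.readAllBytes(path) and be written back with Files.write(path, bytes).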

On Tuesday, 23 February 2016 09:32:43 UTC, Paul Klinkenberg wrote:

Well, I have seen numerous occasions where special characters were garbled in a CFML template (or any text document, for that matter). That usually occurred after the file was FTP’d, updated in an editor with a different character set, or subjected to any other action that changes the file’s character set.

Well yeah, all valid observations. I think it’s more something to be mindful of than something to actively avoid. That said… having this sort of content in a code file kinda suggests there’s hard-coded content in the code (I imagine this is where this sort of thing mostly comes from), which is probably more of an issue.

I do find that charset encoding is a topic that a lot of CFML devs (perhaps
not just CFML ones) do seem to struggle with.

I s’pose it’s just more complexity and “moving parts” that can contribute
to possible problems.
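Adam’s point about hard-coded content suggests keeping user-facing text, such as the French error message earlier in the thread, out of the code and loading it with an explicitly declared charset. A hypothetical Java sketch follows; the file name messages_fr.properties is an assumption. java.util.Properties.load(InputStream) always assumes ISO 8859-1, so the stream is wrapped in a UTF-8 InputStreamReader and passed to load(Reader) instead.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

class Messages {
    // Reads key=value messages, decoding the bytes explicitly as UTF-8.
    // Properties.load(InputStream) would assume ISO 8859-1 instead.
    static Properties loadUtf8(InputStream in) throws IOException {
        Properties props = new Properties();
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            props.load(reader);
        }
        return props;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a hypothetical messages_fr.properties file saved as UTF-8.
        byte[] file = ("invalid_nom=Votre nom n\u2019est pas valide. "
                + "Sa longueur doit \u00EAtre comprise entre 2 et 50 caracteres.\n")
                .getBytes(StandardCharsets.UTF_8);
        Properties messages = loadUtf8(new ByteArrayInputStream(file));
        System.out.println(messages.getProperty("invalid_nom").startsWith("Votre nom n\u2019est")); // true
    }
}
```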

On Sunday, 28 February 2016 19:28:20 UTC, Kai Koenig wrote:

Hence the point Nando makes is absolutely correct.

Which is why… uh… I was agreeing with him.

On Saturday, 27 February 2016 22:37:46 UTC, Nando Breiter wrote:

Of course, for nearly any language besides English, “special” characters are normal.

Yeah, I do wish people would stop using such jingoistic terms. I assure you that to a lot of English speakers, there’s nothing “special” about some other language’s character set.

It’s even worse when some muppets, usually when talking about password strength, refer to punctuation as “special characters”. I know IT people are, on the whole, reasonably poor at written communication, but even to them, how are things like commas and full stops “special”?

[sigh]

I get your point, Adam, but you have to admit that, particularly in the single-language English-speaking countries (the UK, AU, NZ and the US), “special” (non-ASCII) characters are NOT the norm for a lot of English speakers, and quite frankly a lot of applications are not being built to handle non-ASCII character sets.

Hence the point Nando makes is absolutely correct.

Cheers
Kai
