Add support for sort type="natural|naturalNoCase"

A very common sort type is natural (or human) sorting, which compares the text and number
parts of each string separately, so elements containing both text and numbers are sorted properly.

Should this be added to Lucee as a built-in sort type, in addition to text, numeric and textnocase?

I really like that style of sorting, but personally, I say someone should write a library to perform that comparison (and put it on ForgeBox, of course!) and then it can be used as a comparator UDF to the sort function. I’m not sure about the core engine getting off in the weeds of alternative sorts since I could see a lot of variations of it.

I’d like to see a native Java version for performance reasons. One of the
things I love about CFML is that, out of the box, you get a lot of
base functionality without needing extra libraries.

this implementation is pretty tiny
http://www.davekoelle.com/files/AlphanumComparator.java

Of course, if you need something more customised, you can always plug in your own sort comparator from ForgeBox!
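For anyone who hasn't looked at how this kind of comparison works, here is a minimal sketch in Java. It is not Dave Koelle's code and not anything that exists in Lucee; the class and method names are made up for illustration. It assumes ASCII digits only, treats each run of digits as one number, and compares everything else character by character, with an optional case-insensitive mode to mirror the proposed natural / naturalNoCase pair.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of a natural ("human") string comparison.
class NaturalOrderSketch {

    static boolean isAsciiDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // "007" -> "7", "0" -> "0": keeps at least one digit
    static String stripLeadingZeros(String digits) {
        int k = 0;
        while (k < digits.length() - 1 && digits.charAt(k) == '0') k++;
        return digits.substring(k);
    }

    // Returns -1, 0 or 1. Digit runs are compared by numeric value,
    // all other characters one at a time.
    static int compareNatural(String a, String b, boolean ignoreCase) {
        int i = 0, j = 0;
        while (i < a.length() && j < b.length()) {
            char ca = a.charAt(i), cb = b.charAt(j);
            if (isAsciiDigit(ca) && isAsciiDigit(cb)) {
                int startA = i, startB = j;
                while (i < a.length() && isAsciiDigit(a.charAt(i))) i++;
                while (j < b.length() && isAsciiDigit(b.charAt(j))) j++;
                String numA = stripLeadingZeros(a.substring(startA, i));
                String numB = stripLeadingZeros(b.substring(startB, j));
                // without leading zeros, more digits means a larger number
                if (numA.length() != numB.length()) {
                    return numA.length() < numB.length() ? -1 : 1;
                }
                int c = numA.compareTo(numB);
                if (c != 0) return c < 0 ? -1 : 1;
            } else {
                if (ignoreCase) {
                    ca = Character.toLowerCase(ca);
                    cb = Character.toLowerCase(cb);
                }
                if (ca != cb) return ca < cb ? -1 : 1;
                i++;
                j++;
            }
        }
        // equal so far: the string with characters left over sorts last
        return Integer.compare(a.length() - i, b.length() - j);
    }

    public static void main(String[] args) {
        List<String> data = new ArrayList<>(Arrays.asList("z1", "z10", "z2"));
        data.sort((l, r) -> compareNatural(l, r, false));
        System.out.println(data);
    }
}
```

A plain text sort of the same list would leave z10 ahead of z2, because it compares '1' with '2' and never looks at the number as a whole.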

I’d be interested in seeing an actual performance comparison between CFML compiled down to bytecode and Java compiled down to bytecode doing this same algorithm.

haha, I’m currently working on transpiling lucee into webassembly and then back into cfml #justkidding

Natural sorting usually relies on a Comparable interface, i.e. having the sorted objects implement a compareTo(other) method which returns a negative number, zero, or a positive number (typically -1, 0, or 1).

You can pass a Comparator closure to ArraySort(), and implement the proposed type like so:

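comparator = function(lhs, rhs){
  // both numeric: compare by value; sgn() keeps the result at -1, 0 or 1
  if (isNumeric(arguments.lhs) && isNumeric(arguments.rhs))
    return sgn(arguments.lhs - arguments.rhs);

  // otherwise fall back to a case-sensitive string comparison
  return compare(arguments.lhs, arguments.rhs);
};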

and then

someArray.sort(comparator);

What I would propose actually, is to enhance Compare() so that it will handle other known types like Numbers and Dates (currently only strings). Then your comparator can be as simple as

comparator = function(lhs, rhs){
  return compare(arguments.lhs, arguments.rhs);
}

This sounds good, Igal, but it doesn’t seem to work as I was expecting on this array:

data = [ "z1", "z10", "z2" ];

As I understand it the “natural” order would be z1, z2, z10, but I’m not seeing any change from the input.

@Julian_Halliwell You are right. In the context of this thread “Natural Ordering” means something different from what I had in mind, and TBH I didn’t look at the link that @Zackster posted.

Coming from the Java world, natural ordering means what I wrote above [1]:

Comparable implementations provide a natural ordering for a class, which allows objects of that class to be sorted automatically.

For String that would be “Lexicographic” ordering, which seems “natural” to me, but is by no means “human”.
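To make the Java meaning of the term concrete, here is a small sketch. The Release class is hypothetical and exists only to show a Comparable implementation; the String part uses nothing but the JDK's own String.compareTo.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical value class, used only to illustrate Comparable.
class Release implements Comparable<Release> {
    final int major;
    final int minor;

    Release(int major, int minor) {
        this.major = major;
        this.minor = minor;
    }

    // Defines the "natural ordering" of Release: by major, then by minor.
    @Override
    public int compareTo(Release other) {
        int c = Integer.compare(major, other.major);
        return c != 0 ? c : Integer.compare(minor, other.minor);
    }

    @Override
    public String toString() {
        return major + "." + minor;
    }
}

class NaturalOrderingDemo {

    // Collections.sort with no comparator uses the elements' compareTo.
    static <T extends Comparable<T>> List<T> sorted(List<T> in) {
        List<T> out = new ArrayList<>(in);
        Collections.sort(out);
        return out;
    }

    public static void main(String[] args) {
        // String.compareTo is lexicographic, so "z10" sorts before "z2"
        System.out.println(sorted(Arrays.asList("z2", "z10", "z1")));
        // Release compares its fields as numbers, so 5.2 sorts before 5.10
        System.out.println(sorted(Arrays.asList(
            new Release(5, 10), new Release(5, 2), new Release(4, 9))));
    }
}
```

So in Java terms both orderings are "natural"; only the second one happens to match what a human expects, and that is because the class compares numbers rather than characters.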

I think that for the sake of clarity, the “Human” part should be emphasized in this thread rather than the “Natural” part.

I still think that the best approach is to enhance the BIF Compare(), so that instead of

Compare(String1, String2)

Its signature will be

Compare(Object1, Object2, flags)

So that (just throwing ideas here):

Compare("z2", "z10")                          // yields 1

Compare("z2", "z10", "human")                 // yields -1

Compare("01/02/2018", "2018-01-15", "date", "en-US")  // yields -1 for January 2nd

Compare("01/02/2018", "2018-01-15", "date", "en-GB")  // yields  1 for February 1st

Compare("01/02/2018", "2018-01-15", "date")  // yields -1 for me and 1 for you, assuming US and UK locales, respectively
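As a rough illustration of how the date case could behave, here is a sketch in Java. Everything in it is an assumption made for the example: the class and method names, the rule that ISO dates are tried first, and the shortcut of picking month-first for the US and day-first for every other country. It only covers a plain text comparison and the "date" flag, and is not how Lucee's Compare() works today.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Locale;

// Hypothetical sketch of a flag-aware compare; not an existing Lucee API.
class FlaggedCompareSketch {

    // Assumption: try ISO (yyyy-MM-dd) first, then a slash format whose
    // field order depends on the locale's country.
    static LocalDate parseDate(String text, Locale locale) {
        try {
            return LocalDate.parse(text);
        } catch (DateTimeParseException e) {
            String pattern = "US".equals(locale.getCountry()) ? "MM/dd/yyyy" : "dd/MM/yyyy";
            return LocalDate.parse(text, DateTimeFormatter.ofPattern(pattern));
        }
    }

    // Returns -1, 0 or 1. Any flag other than "date" falls back to a
    // case-sensitive string comparison.
    static int compare(String a, String b, String flag, Locale locale) {
        int c;
        if ("date".equalsIgnoreCase(flag)) {
            c = parseDate(a, locale).compareTo(parseDate(b, locale));
        } else {
            c = a.compareTo(b);
        }
        return Integer.signum(c);
    }

    public static void main(String[] args) {
        System.out.println(compare("z2", "z10", "text", Locale.US));
        System.out.println(compare("01/02/2018", "2018-01-15", "date", Locale.US));
        System.out.println(compare("01/02/2018", "2018-01-15", "date", Locale.UK));
    }
}
```

The three-argument form would simply pass the server's or request's current locale as the fourth argument, which is where the "-1 for me and 1 for you" result comes from.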

This implementation uses Java 8 features via the java.util.stream package.

As of now, Lucee’s source code is still limited to Java 7, so that implementation, for example, will not compile. That will only change once we decide to make Java 8 the minimum runtime environment for Lucee, so that we can take advantage of the new features that the JDK has to offer.

As of a few hours ago that is no longer the case. Lucee 5.3 will require Java 8 as a minimum runtime environment. Mazal tov :wine_glass:

We could do it as part of the core and still make it customisable. Currently the supported sort types (next to a UDF comparator) are [text,textNoCase,numeric], and the implementation in Lucee looks like this:

	// text
	if(sortType.equalsIgnoreCase("text")) comp=new SortRegisterComparator(pc,isAsc,false,true);
	// text no case
	else if(sortType.equalsIgnoreCase("textnocase")) comp=new SortRegisterComparator(pc,isAsc,true,true);			
	// numeric
	else if(sortType.equalsIgnoreCase("numeric")) comp=new NumberSortRegisterComparator(isAsc);

We could make this customisable and extendable by adding something like the following to the Lucee XML config:

<sorters>
   <sorter name="text,string" class="lucee.runtime.type.comparator.TextSortRegisterComparator" />
   <sorter name="textnocase" class="lucee.runtime.type.comparator.TextNoCaseSortRegisterComparator" />
   <sorter name="numeric,number" class="lucee.runtime.type.comparator.NumberSortRegisterComparator" />
   <sorter name="human" class="org.lucee.extension.whatever.HumanSortRegisterComparator" bundlename="whatever" bundleversion="1.0.0.0" />
   <sorter name="chaos" udf="{lucee-config}/sorter/chaos.cfm"/>
   <sorter name="chaos2" component="{lucee-config}/sorter/Chaos.cfc"/>
   
</sorters>

This could then be manipulated by an extension or with the help of <cfadmin>. We could also add the possibility to extend it in the Application.cfc itself (with a UDF or CFC):
this.sorter['human']=function(l,r){...};
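On the Java side, the idea boils down to replacing the if/else chain with a lookup by name. The following is only a sketch of that idea under my own assumptions: the class and method names are invented, it works on plain strings rather than Lucee's internal types, and it uses JDK comparators in place of Lucee's SortRegisterComparator classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical registry of named sort comparators; not Lucee source code.
class ComparatorRegistrySketch {

    private final Map<String, Comparator<String>> byName = new HashMap<>();

    // "names" may hold comma-separated aliases, e.g. "text,string".
    void register(String names, Comparator<String> comparator) {
        for (String name : names.split(",")) {
            byName.put(name.trim().toLowerCase(Locale.ROOT), comparator);
        }
    }

    // Case-insensitive lookup; descending order reverses the comparator.
    Comparator<String> lookup(String sortType, boolean ascending) {
        Comparator<String> c = byName.get(sortType.trim().toLowerCase(Locale.ROOT));
        if (c == null) {
            throw new IllegalArgumentException("unknown sort type: " + sortType);
        }
        return ascending ? c : c.reversed();
    }

    // Stand-ins for the three built-in types.
    static ComparatorRegistrySketch withDefaults() {
        ComparatorRegistrySketch r = new ComparatorRegistrySketch();
        r.register("text,string", Comparator.naturalOrder());
        r.register("textnocase", String.CASE_INSENSITIVE_ORDER);
        r.register("numeric,number", Comparator.comparingDouble(Double::parseDouble));
        return r;
    }

    public static void main(String[] args) {
        ComparatorRegistrySketch registry = withDefaults();
        // an extension or the application could add its own entry
        registry.register("length", Comparator.comparingInt(String::length));

        List<String> data = new ArrayList<>(Arrays.asList("10", "9", "2.5"));
        data.sort(registry.lookup("Numeric", true));
        System.out.println(data);
    }
}
```

Entries that point at a UDF or a component would need a small adapter that wraps the CFML callable in a Comparator, which is left out here.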

I like the Chaos2 sorter :wink:

I still think that we should enhance Compare() to take other arguments.

Instead of Sorter I would use Comparator. It’s common in other languages too, and there is no need to come up with new names.