Revisiting hashcode() - does serialize solve this?

Phillyun · July 20, 2023, 6:38pm

I’ve been attempting to refactor some code and need confidence that the original object equals the new object.

(see related thread - short version - it’s complicated)

I’m wondering if others have had experience / success using serialize (or serializeJSON()) to work around this issue.

<cfscript>
    // string
    myVar = "a"
    dump(myVar.hashCode())
    dump(myVar.hashCode())
    // array
    myArr = ["a","b"]
    dump(myArr.hashCode())
    dump(myArr.hashCode())

    // struct (cannot use hashcode on object!)
    myStruct1 = {name:"Lucee",ver:5.3}
    myStruct2 = {name:"Lucee",ver:5.3}
    dump(myStruct1.hashCode())
    dump(myStruct2.hashCode())
    // struct serialized (works!)
    dump(serialize(myStruct1).hashCode())
    dump(serialize(myStruct2).hashCode())

    // struct in different sequence serialized (works!)
    myStruct1 = {name:"Lucee",ver:5.3}
    myStruct2 = {ver:5.3,name:"Lucee"}
    dump(serialize(myStruct1).hashCode())
    dump(serialize(myStruct2).hashCode())
</cfscript>

1st execution:

2nd execution: (notice how the hashCode() on the serialized object is consistent now)

Thoughts? Will this work (for larger/more complex structures) or is it dumb luck with my simple example?

Zackster · July 20, 2023, 6:44pm

hashCode() is the internal Java object signature.

Try hashing the actual JSON with hash() instead: Hash() :: Lucee Documentation
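
A minimal sketch of that suggestion, for illustration only: hash() expects a string, so each struct is first converted to JSON with serializeJSON() and the JSON text is then hashed. The variable names and the SHA-256 algorithm are assumptions, not something prescribed in this thread, and the hashes only match when both structs serialize their keys in the same order.

```cfml
<cfscript>
    // Sample data mirroring the structs from the first post
    myStruct1 = {name:"Lucee",ver:5.3};
    myStruct2 = {name:"Lucee",ver:5.3};

    // hash() needs a string, so serialize each struct to JSON first
    json1 = serializeJSON(myStruct1);
    json2 = serializeJSON(myStruct2);

    // Hash the JSON text (SHA-256 chosen here only as an example)
    hash1 = hash(json1, "SHA-256");
    hash2 = hash(json2, "SHA-256");

    dump(var=hash1, label="hash of myStruct1 as JSON");
    dump(var=hash2, label="hash of myStruct2 as JSON");

    // compare() does a case-sensitive string comparison; 0 means identical
    dump(var=(compare(hash1, hash2) == 0), label="hashes identical?");
</cfscript>
```

compare() is used rather than == so that the hex strings are always compared as text and never coerced to numbers.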

Phillyun · July 20, 2023, 7:00pm

I cannot pass an object (struct) directly into hash() - it expects a string.
I was hoping for confirmation that serialize is my missing puzzle piece.

Zackster · July 20, 2023, 8:08pm

.toJson() should work. serialize() makes a Java object, and I dunno if it’s reliably identical.

andreas · July 20, 2023, 8:28pm

I’m just going to throw out some things that come to mind relating to that issue. I don’t know if they will help, but they won’t do any harm…

I’d try to make sure that the serialized object is always consistent. During development of the lang admin tool I’ve seen order of structs being changed inconsistently when serializing to JSON.

To easily compare/diff changes of the JSON language resource files between PRs/commits, I wanted them to always be consistent. I needed to order the object in all its depth (with all complex data types inside the object being ordered too). To achieve this, I translated the object (in all depth) into a one-dimensional struct, using the complete keyPathName as a single key name (containing the dot.Notation path name), then sorted it, and translated that one-dimensional struct back to an ordered struct with the help of structKeyTranslate(), keeping that struct as an ordered struct [:].

Thus, in your case and depending on your object… if you need to serialize it to some string, I’d order it the same way as described above before serializing it.

andreas · July 20, 2023, 9:09pm

I think I was talking some nonsense. I didn’t need to use structKeyTranslate() for that. I used that for another purpose, BUT: I used a recursive function to sort the struct. Here is the example… Because the only complex data type in those structs was the struct, the recursive call only takes effect on that data type. You might need to adapt that in case you have more complex data types.

/**
	 * Sorts a struct recursively
	 */
	public struct function sortNestedStruct( struct datastruct ) localmode = true {
		// define sorted struct
		sortedStruct = [ : ];

		// Get the keys of the struct and sort them
		keys = structKeyArray( arguments.datastruct ).sort( "textnocase" );

		// Iterate over the sorted keys
		for( var key in keys ) {
			value = arguments.datastruct[ key ];

			// If the value is a nested struct, recursively sort it
			if( isStruct( value ) ) {
				value = sortNestedStruct( value );
			}

			// Add the key-value pair to the sorted struct
			sortedStruct[ key ] = value;
		}

		return sortedStruct;
	}
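
As a rough sketch of the adaptation mentioned above, the same recursive idea can be extended to walk arrays as well and then be combined with hash() on the JSON output. The function name sortNested(), the sample data and the SHA-256 choice are hypothetical. It assumes the data contains only plain structs, arrays and simple values (no nulls, queries or components).

```cfml
<cfscript>
    /**
     * Hypothetical variant of sortNestedStruct() that also descends into arrays.
     * Structs are rebuilt as ordered structs with keys sorted case-insensitively;
     * array element order is kept as-is, since it is usually meaningful.
     */
    function sortNested( required any data ) {
        if( isStruct( arguments.data ) ) {
            var sorted = [ : ];
            var keys = structKeyArray( arguments.data );
            arraySort( keys, "textnocase" );
            for( var key in keys ) {
                sorted[ key ] = sortNested( arguments.data[ key ] );
            }
            return sorted;
        }
        if( isArray( arguments.data ) ) {
            var result = [];
            for( var item in arguments.data ) {
                arrayAppend( result, sortNested( item ) );
            }
            return result;
        }
        // Simple values are returned unchanged
        return arguments.data;
    }

    // Same content, keys written in a different sequence
    a = { name: "Lucee", meta: { ver: 5.3, tags: [ "x", "y" ] } };
    b = { meta: { tags: [ "x", "y" ], ver: 5.3 }, name: "Lucee" };

    hashA = hash( serializeJSON( sortNested( a ) ), "SHA-256" );
    hashB = hash( serializeJSON( sortNested( b ) ), "SHA-256" );

    dump( var=( compare( hashA, hashB ) == 0 ), label="normalized hashes identical?" );
</cfscript>
```

Sorting before serializing means the comparison no longer depends on how the engine happens to order struct keys internally.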

BK_BK · July 24, 2023, 12:54pm

@Phillyun: Thoughts? Will this work (for larger/more complex structures) or is it dumb luck with my simple example?

Could you please give more information on how the “1st execution” and “2nd execution” outputs were generated? Perhaps use the ‘label’ attribute to add an explanation for each dump?

Even if the hashcode of the serialized struct works, I wouldn’t trust it completely in Lucee. Not in the way you’re using it here in any case. I say this for the following reason (ignoring, for simplicity, such technical intricacies as type, overriding, and so on):

It is easy to conclude from equality of hashcodes that the corresponding objects are the same. But that can be misleading even in Java, where hashcode is native. Let alone in Lucee.

In Java, hashcode’s contract is, in pseudocode:

/* Statement 1: If obj1 and obj2 have the same state, then their corresponding hashcodes are equal */
if obj1.equals(obj2) then obj1.hashCode() == obj2.hashCode()

However, there is a logical catch. The converse of this statement is generally false:

/* Statement 2: If two hashcodes are the same, then their corresponding objects are equal : WRONG conclusion! */
if obj1.hashCode() == obj2.hashCode() then obj1.equals(obj2)

Statement 2 is wrong because two distinct object instances may have the same hashcode.
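
A small illustration of such a collision, assuming (as the string example in the first post suggests) that Lucee exposes CFML strings as java.lang.String: the strings "Aa" and "BB" are a well-known pair that both hash to 2112 under Java's String.hashCode(), yet they are clearly different values. The variable names are only for this sketch.

```cfml
<cfscript>
    // Two different strings whose java.lang.String hash codes collide
    s1 = "Aa";
    s2 = "BB";

    dump(var=s1.hashCode(), label="hashCode() of Aa");
    dump(var=s2.hashCode(), label="hashCode() of BB");

    // compare() is case-sensitive; a non-zero result means the strings differ
    dump(var=compare(s1, s2), label="compare(s1, s2)");
</cfscript>
```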

Is that the logic you have been using? If so, then you’re not 100% home and dry.

It is the contrapositive of statement 1 that is correct, namely:

/* Statement 3: If two hashcodes are different, then their corresponding objects are not equal : CORRECT conclusion! */
if obj1.hashCode() != obj2.hashCode() then !obj1.equals(obj2)

Hence, hashcode is a test of inequality of objects rather than of equality. Add to that the fact that CFML is weakly typed, and object equality in CFML becomes a much more complicated affair than in Java, where hashcode is native.

For example, if you run the following Lucee code, you will find that the hashcode of struct1 is different from that of struct2.

struct1={a=1,b=2};
struct2={a=1,b=2};

dump(var=struct1, label="struct1 is {a=1,b=2}");	
dump(var=struct2, label="struct2 is {a=1,b=2}");

writeoutput("<p>");
dump(var=struct1.hashcode(), label="struct1.hashcode()");	
dump(var=struct2.hashcode(), label="struct2.hashcode()");

Therefore, according to statement 3, struct1 and struct2 are not equal as far as Java’s equals() is concerned, even though their contents are identical.

That said, would Lucee’s objectEquals() satisfy your needs? An example:

testQuery = queryNew( "name , age" , "varchar , numeric" , { name: [ "Susi" , "Urs" ] , age: [ 20 , 24 ] } );
dump(var=testQuery, label='testQuery: queryNew( "name , age" , "varchar , numeric" , { name: [ "Susi" , "Urs" ] , age: [ 20 , 24 ] } )');

writeoutput("<p>");
testQuery1= queryNew( "name , age" , "varchar , numeric" , [ [ "Susi" , 20 ] , [ "Urs", 24 ] ]);
dump(var=testQuery1, label='testQuery1: queryNew( "name , age" , "varchar , numeric" , [ [ "Susi" , 20 ] , [ "Urs", 24 ] ])');

writeoutput("<p>");
testQuery2= queryNew( "name , age" , "varchar , numeric" , [ { name: "Susi" , age: 20 }, { name: "Urs" , age: 24 } ] );
dump(var=testQuery2, label='testQuery2: queryNew( "name , age" , "varchar , numeric" , [ { name: "Susi" , age: 20 }, { name: "Urs" , age: 24 } ] )');

writeoutput("<p>");
dump(var=objectEquals(left=testQuery, right=testQuery1), label="objectEquals(left=testQuery, right=testQuery1)");

writeoutput("<p>");
dump(var=objectEquals(left=testQuery, right=testQuery2), label="objectEquals(left=testQuery, right=testQuery2)");

writeoutput("<p>");
dump(var=objectEquals(left=testQuery1, right=testQuery2), label="objectEquals(left=testQuery1, right=testQuery2)");
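
The same check can be sketched for the structs from the original question. This is an untested illustration with made-up variable names: whether objectEquals() treats differently ordered keys as equal is worth confirming on your own Lucee version before relying on it.

```cfml
<cfscript>
    // Structs from the first post, keys written in a different sequence
    myStruct1 = {name:"Lucee",ver:5.3};
    myStruct2 = {ver:5.3,name:"Lucee"};
    // A struct with different content, as a negative check
    myStruct3 = {name:"Lucee",ver:5.4};

    sameContent = objectEquals(left=myStruct1, right=myStruct2);
    otherContent = objectEquals(left=myStruct1, right=myStruct3);

    dump(var=sameContent, label="objectEquals(left=myStruct1, right=myStruct2)");
    dump(var=otherContent, label="objectEquals(left=myStruct1, right=myStruct3)");
</cfscript>
```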

System:
Lucee 5.4.1.8
Windows 10 Professional