Say you have a friend who is a truly promising artist, and you want to help her protect her work.
One way is to advise her to photograph each of her artworks with her mobile phone and send the images to you. You then hash each image (or apply some other cryptographic algorithm) and store the results in a safe place.
Then, at a future date, if some con artist or the like tries to steal her intellectual property by faking her work, she could take a picture of the fake and compare its hash value against the stored hashes to detect the forgery.
a) Would the same authentic image at a different size generate a different hash value, even with exactly the same hash method?
b) Would the same authentic image at the same size, captured by a different make of mobile phone, generate a different hash value, even with the same hash method?
And any further thoughts on the subject would be much appreciated.
Every single image is going to have a different hash.
I guess I should have further expanded it to include the following:
c) Would the hash values of images (of exactly the same width, height and resolution) taken of the same authentic physical object (be it an artwork or otherwise) at different times by the same mobile phone remain the same?
If not, why?
If the protected work is digital, then cryptographic hashes will work, because for all practical purposes a hash is identical if and only if the digital stream passed to it is identical in length and in every byte. That's exactly why people use md5sum, sha256sum, etc. to verify downloaded files.
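To make that concrete, here is a minimal sketch using Python's standard hashlib module. The byte strings are made-up stand-ins for image files, not real image data: an exact copy hashes the same, while a copy with one extra byte does not.

```python
import hashlib

def sha256_of_bytes(data: bytes) -> str:
    """Return the SHA-256 digest of a byte string as hex text."""
    return hashlib.sha256(data).hexdigest()

# Hypothetical stand-ins for the contents of an image file.
original = b"pretend these are the bytes of painting-01.jpg"
exact_copy = bytes(original)      # byte-for-byte identical
resaved = original + b"\x00"      # one extra byte, e.g. after re-saving

print(sha256_of_bytes(original) == sha256_of_bytes(exact_copy))  # True
print(sha256_of_bytes(original) == sha256_of_bytes(resaved))     # False
```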
Hashes do NOT represent how SIMILAR two things are. Change 1 byte and the entire hash will change. This is why programs like rsync do incremental transfers by splitting the content up into smaller chunks that can be hashed separately. Even so, each chunk comparison is still binary: equal, or not.
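A rough sketch of the chunking idea, again with hashlib. This is a simplification for illustration and not rsync's actual algorithm (rsync also uses a rolling checksum so it can cope with inserted bytes). The function names and the 4-byte chunk size are my own choices.

```python
import hashlib

def chunk_digests(data: bytes, chunk_size: int = 4) -> list[str]:
    """Hash each fixed-size chunk of the data separately."""
    return [hashlib.sha256(data[i:i + chunk_size]).hexdigest()
            for i in range(0, len(data), chunk_size)]

def changed_chunks(a: bytes, b: bytes, chunk_size: int = 4) -> list[int]:
    """Return the indexes of chunks whose digests differ (or exist on one side only)."""
    da, db = chunk_digests(a, chunk_size), chunk_digests(b, chunk_size)
    longest = max(len(da), len(db))
    return [i for i in range(longest)
            if i >= len(da) or i >= len(db) or da[i] != db[i]]

old = b"AAAABBBBCCCCDDDD"
new = b"AAAABBXBCCCCDDDD"   # one byte changed inside the second chunk

print(changed_chunks(old, new))  # [1]
```

Only chunk 1 would need to be re-sent, but note that the answer for each chunk is still just "same" or "different".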
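If the protected work is analog, you’ll never get the same digital stream twice from the same analog input. If we’re talking about a 4 megapixel photo, EVERY PIXEL must be identical for the hashes to match. If your second photo is taken 1mm to the right, in slightly different lighting conditions, or has a filter applied, then a != b. That answers a), b) and c): a different size, a different phone, or even the same phone at a different time will each produce a different hash.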
Consider an MP3, which is a digital work. An MP3 and an AAC of the same song will be different. In fact, if you decompress them to WAV, the WAV files will be different too, because both formats use lossy compression. Changing a single letter in an ID3 tag of an MP3 yields a different hash.
What you CAN do is pick an algorithm that models the data you’re comparing. With music, for example, Shazam uses an algorithm that can identify the fingerprint of a song and reveal the original even in differing environments. But it’s still not good enough to identify another band covering the same song.
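For images, one simple algorithm of this kind is an "average hash", a basic perceptual hash. I'm picking it purely as an illustration; it isn't what Shazam does, and the sketch below assumes the photo has already been shrunk to a tiny grayscale grid (real implementations typically downscale to something like 8x8 first). The pixel values are invented.

```python
def average_hash(pixels: list[list[int]]) -> int:
    """Build a bit string: 1 where a pixel is brighter than the grid's mean, else 0."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits

def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")

# Hypothetical 4x4 grayscale grids (0 = black, 255 = white).
original = [[10, 200, 30, 220],
            [15, 210, 25, 230],
            [12, 205, 35, 215],
            [18, 190, 28, 225]]
brighter = [[p + 20 for p in row] for row in original]   # same scene, more light
inverted = [[255 - p for p in row] for row in original]  # very different picture

print(hamming_distance(average_hash(original), average_hash(brighter)))  # 0
print(hamming_distance(average_hash(original), average_hash(inverted)))  # 16
```

Unlike a cryptographic hash, the distance here is a measure of similarity, so a small distance means "probably the same picture". It is still easy to fool, and deciding what distance counts as a match is a judgment call.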
Realistically the best “fuzzy” comparisons are done by the human brain. That’s why a captcha, which is relatively easy for a person, is hard for a machine.
Basically, in your example, you’re better off just keeping the images your friend sends you in a safe place, tagged with the date you received them and with all EXIF information intact, to establish provenance later. But some human will still have to compare your saved image to a future work and decide whether they are the “same” or not. (This is also why things like saving images of the canvas UNDER the frame are useful for identifying forged paintings: only the artist and the person framing the canvas have seen that part of the work. It is security by obscurity rather than ironclad machine logic.)
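If you do keep such an archive, a small record per image can at least show later that your stored file hasn't been altered. A minimal sketch, with a made-up file name and field names; note that a timestamp you write yourself is only as trustworthy as you are, so this supports provenance rather than proving it.

```python
import hashlib
import json
from datetime import datetime, timezone

def make_record(name: str, data: bytes) -> dict:
    """Describe a received file: its name, size, SHA-256 and time of receipt."""
    return {
        "name": name,
        "sha256": hashlib.sha256(data).hexdigest(),
        "size_bytes": len(data),
        "received_utc": datetime.now(timezone.utc).isoformat(),
    }

def matches_record(record: dict, data: bytes) -> bool:
    """Check that the stored bytes are still exactly what was recorded."""
    return hashlib.sha256(data).hexdigest() == record["sha256"]

# Hypothetical stand-in for the bytes of a received image file.
image_bytes = b"pretend these are the bytes of painting-01.jpg"
record = make_record("painting-01.jpg", image_bytes)

print(json.dumps(record, indent=2))
print(matches_record(record, image_bytes))            # True
print(matches_record(record, image_bytes + b"\x00"))  # False
```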
Excellent, Joe, thank you.
Perfect analysis, Joe, much appreciated (to your first post).