...HOWEVER, we recently became aware of a problem...
Big thanks for mentioning this--I'll watch for the changes.
It seems my proposed data structure, which incorporates the 'actual' difference bytes, might run into 'legal' problems. I will need a more Bitzi-like approach.
Thoughts:
HashRite should be able to:
1) Publish the TigerTreeHash for a given bitprint, with the bitprint as the file name and an extension of 'TTH'(?):
7PJQCHA5IEDECEEQ4YPP2LZ2PQYQI2YJ.H4JFJ47NLNRPNASHYAPAN3RKA3VSXM2LN3EOPRQ.TTH
By using the bitprint as the file name and 'searching' for that file name, I'll be using the same concept as Bitzi: the community votes on the quality of the file.
This way many people can publish and confirm the 'correct' version of the TigerTree Hash files.
2) Search for a HashRitePatch file, named in the format 'Good SHA1'.'Bad SHA1'.HRP:
7PJQCHA5IEDECEEQ4YPP2LZ2PQYQI2YJ.BRZ5J3R53Z3O3FSMT4B6QW45TPNLDESK.HRP
These are community published files which 'contain' the actual difference data to repair the file.
3) Search for a HashRiteDifference file, named in the format 'Good SHA1'.'Bad SHA1'.HRD:
7PJQCHA5IEDECEEQ4YPP2LZ2PQYQI2YJ.BRZ5J3R53Z3O3FSMT4B6QW45TPNLDESK.HRD
These files do not contain the actual difference data, just the offset and length of each difference, so they should be legal/safe to store on a web site or to share. This requires that HashRite 'locate' the good file in the P2P community and download the 'smallest' amount of data required to correct the 'bad' file.
HashRite will also offer to 'build' a .HRP file to publish, so that others need less bandwidth to repair their files.
4) If no .HRP or .HRD files exist, use the .TTH file to 'repair' the local file (re-fetching the 1k data blocks whose hash values differ, again using P2P shared files as the source of data) and offer the option to publish '.HRD' and/or '.HRP' files.
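The file naming in items 1 to 3 can be sketched as below. This is only an illustration, not HashRite code: the function names are made up, and the length checks assume a base32 SHA1 is 32 characters and a base32 Tiger tree root is 39 characters, which matches the example names above.

```python
import re

# Assumed lengths: base32 SHA1 = 32 chars, base32 Tiger tree root = 39 chars.
SHA1_RE = r"[A-Z2-7]{32}"
TIGER_RE = r"[A-Z2-7]{39}"

TTH_NAME = re.compile(rf"^({SHA1_RE})\.({TIGER_RE})\.TTH$")
PAIR_NAME = re.compile(rf"^({SHA1_RE})\.({SHA1_RE})\.(HRP|HRD)$")


def tth_name(sha1_b32, tiger_b32):
    """Build '<bitprint>.TTH' from the two halves of a bitprint."""
    name = f"{sha1_b32}.{tiger_b32}.TTH"
    if not TTH_NAME.match(name):
        raise ValueError("not a valid bitprint: " + name)
    return name


def pair_name(good_sha1, bad_sha1, ext):
    """Build '<Good SHA1>.<Bad SHA1>.HRP' or '.HRD'."""
    name = f"{good_sha1}.{bad_sha1}.{ext}"
    if not PAIR_NAME.match(name):
        raise ValueError("not a valid patch/difference name: " + name)
    return name


def parse_name(name):
    """Return the parts of a search hit, or None if the name does not fit."""
    m = TTH_NAME.match(name)
    if m:
        return {"kind": "TTH", "sha1": m[1], "tiger": m[2]}
    m = PAIR_NAME.match(name)
    if m:
        return {"kind": m[3], "good_sha1": m[1], "bad_sha1": m[2]}
    return None
```

Parsing search results this way would let HashRite discard hits whose names merely contain the hash as a substring before any download starts.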
It makes sense for HashRite to 'reveal' how many bytes must be downloaded from a source to make the files match. That way a user can decide whether a given file is a good alternative source. This information could also be submitted as part of the bitprint metatag info.
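A rough sketch of the block comparison in item 4 and of the byte count mentioned above. It assumes the leaf hashes of the good file are already in hand (for example from a .TTH file), and it uses SHA-1 purely as a stand-in, because Python's hashlib has no Tiger. All names here are hypothetical.

```python
import hashlib

BLOCK = 1024  # 1k data blocks, as in item 4


def leaf_hashes(data, block=BLOCK):
    """Hash each block. SHA-1 is a stand-in for Tiger (not in hashlib)."""
    return [hashlib.sha1(data[i:i + block]).digest()
            for i in range(0, len(data), block)]


def bad_ranges(local, good_leaves, good_size, block=BLOCK):
    """Return merged (offset, length) ranges where the local file differs."""
    local_leaves = leaf_hashes(local, block)
    ranges = []
    for idx, good in enumerate(good_leaves):
        if idx < len(local_leaves) and local_leaves[idx] == good:
            continue  # this block already matches the good file
        offset = idx * block
        length = min(block, good_size - offset)  # last block may be short
        if ranges and sum(ranges[-1]) == offset:
            # Adjacent to the previous bad range: extend it.
            ranges[-1] = (ranges[-1][0], ranges[-1][1] + length)
        else:
            ranges.append((offset, length))
    return ranges


def bytes_to_download(ranges):
    """Total bytes that must be fetched from a good source."""
    return sum(length for _, length in ranges)
```

The list of (offset, length) pairs is the same kind of information a .HRD file would carry, just at 1k block granularity rather than byte granularity, and `bytes_to_download` gives the figure a user would see before choosing a source.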
I'll prototype HashRite first without the P2P access (I'll download popular variants/corrupted copies of the same file to test with). I'll publish a command line utility for testing.
I hope that P2P clients that support TTHash will 'offer' the entire hash tree for download--this way I can avoid using intermediate '.TTH' files--better yet, I hope they will directly support file repair.
It would be nice to add a metatag to a file's bitprint entry that holds the bitprint of its TTH data file, so I can use XML to find the TTH file.
To keep the TTH file small, maybe it should only go as deep as 64k data hashes. Or would going all the way down to 1k data hashes be better? Another choice might be to have two files: the first with the top-level hashes down to the 64k data hashes, and a second with the hashes below 64k down to 1k.
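The size trade-off can be estimated with some quick arithmetic. The sketch below assumes each stored hash is a raw 24-byte Tiger digest, that an odd node at any level is carried up to the next level, and that the .TTH file stores either the leaf level only or every level of the tree. The actual file layout is not decided, so treat the numbers as rough.

```python
TIGER_DIGEST_BYTES = 24  # Tiger produces a 192-bit digest


def leaf_count(file_size, leaf_size):
    """Number of data blocks (ceiling division, at least one)."""
    return max(1, -(-file_size // leaf_size))


def node_count(leaves):
    """Nodes in the whole tree when each level halves, rounding up."""
    total = leaves
    while leaves > 1:
        leaves = (leaves + 1) // 2
        total += leaves
    return total


def tth_file_bytes(file_size, leaf_size, leaves_only=False):
    """Estimated size of the stored hashes for one file."""
    leaves = leaf_count(file_size, leaf_size)
    nodes = leaves if leaves_only else node_count(leaves)
    return nodes * TIGER_DIGEST_BYTES
```

For a hypothetical 700 MiB file, the leaf hashes alone come to 17,203,200 bytes (about 16.4 MiB) at 1k blocks, versus 268,800 bytes (about 262 KiB) at 64k blocks. Storing every level roughly doubles either figure.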
Does this make sense or am I off base?