Thanks for bringing this conversation to our attention.
FYI, the TigerTree algorithm has only been changed once in its history, earlier this year, to address the "trivial missized collision" issue that was first brought to our attention last November. We made the change after consulting with a number of cryptographic experts as to the best solution. The THEX specification was updated in March, and the code in the 'tigertree' and 'bitcollider' projects at Sourceforge since then.
I don't quite know what Curt Welch means by "the file size problem" that he believes remains, and browsing over more of the conversation at news.software.nntp hasn't helped me understand.
TigerTree (or other THEX tree-arranged hash) should be ideal for the issue of Usenet posts split into many parts. If it is necessary to know exactly where a part fits into the whole, and the size of the whole, with every part, then you should include (a portion of) the full tree with each part.
That portion could be (1) all generations of the serialized tree from the node which includes the sent part on up to the root (in which case you could use the THEX serialized form); or (2) all the nodes in the same generation as the node which includes the sent part; or (3) an even more compact "proof", which only contains the barest minimum other node values to "paint in" the sent part's value, up to the root.
So considering a file broken into 8 parts, A B C D E F G H, and assuming you're sending part C:
The tree is:
.......O.......
...M...+...N...
.I.+.J...K.+.L.
A+B.C+D.E+F.G+H
Under strategy (1), you'd send all interim values A B C D E F G H I J K L M N O (the full THEX tree) along with the data for part C. The recipient would know for certain the whole file's size, the whole file's treehash, C's size, where C fits in, and every interim hash at C's level and higher.
Under strategy (2), you'd send just the bottom interim values A B C D E F G H along with the data for part C -- because I J K L M N O are all trivially calculatable from the bottom generation. Again, the recipient would know for certain the whole file's size, the whole file's treehash, C's size, where C fits in, and every interim hash at C's level and higher.
Under strategy (3), you'd only send the barest minimum of other values to prove where C fits in -- D I N -- along with the data for C. (D with C is enough to calculate J, J with I is enough to calculate M, M with N is enough to calculate O.) Here, the recipient would know for certain the whole file's size, the whole file's treehash, C's size, where C fits in, and the interim hashes between C and the root value.
--
Even the "send the whole (top of) the tree" strategy #1 is a small amount of overhead compared to large files: Splitting a 1GB file into 1024 1MB segments only requires a full tree of 2048 interim values, ~48K. That's under 5% overhead per 1MB part. (For strategy #2, that'd be 1024 interim values: 24K, <2.5% overhead. For strategy #3, that'd be just 10 interim values, 240bytes, <0.03% overhead -- but much more complicated code as random parts come in.)
Hope this helps,