x86freak

Smart Hashing

18 posts in this topic

I want to know if it is possible to make Apex a little smarter when handling hashed files. What I mean is that right now hashes are 'tied' to the place where the file is located, so if I move an already-hashed file to another dir it gets rehashed again. WHY? It's the same file, same size, etc. I thought a checksum is meant to identify the file, NOT its place.

One example I can think of is seeding torrents: I can seed my torrents from anywhere on my drive without rechecking.


And when it's in a different place, how do you find out that it's the same file? You must hash it :-P

Is it possible to first hash block #0 of each file and then re-hash the entire files in pass #2? Like, a fast hash-indexing pass to get the list of all files, then a slower pass #2 to complete the hashes. Block #0 could be used to intelligently detect if it is "potentially" the same file moved to another folder and avoid a complete re-hash... I'd say that there are better ways to determine if it is at least almost the same file... anyway, before actually uploading you would force complete hashing of that particular file.
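Roughly, that two-pass idea could look something like the sketch below. It's just an illustration, not how ApexDC++ actually works: the 64 KiB "block #0" size is an arbitrary choice, and plain SHA-1 stands in for TTH since Python has no built-in Tiger Tree Hash.

```python
import hashlib
import os

BLOCK0_SIZE = 64 * 1024  # hypothetical size of "block #0" for the quick pass

def quick_fingerprint(path):
    """Pass #1: hash only the first block of the file (cheap)."""
    h = hashlib.sha1()  # stand-in; the client really uses TTH
    with open(path, "rb") as f:
        h.update(f.read(BLOCK0_SIZE))
    return h.hexdigest()

def full_hash(path):
    """Pass #2: hash the whole file (expensive)."""
    h = hashlib.sha1()  # stand-in for TTH
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            h.update(chunk)
    return h.hexdigest()

def index_share(root, old_index):
    """old_index maps (size, block0 fingerprint) -> full hash from a previous run."""
    new_index = {}
    pending = []  # files that still need the slow pass #2
    for dirpath, _, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            key = (os.path.getsize(path), quick_fingerprint(path))
            if key in old_index:
                new_index[path] = old_index[key]  # "potentially" the same file, reuse its hash
            else:
                pending.append((path, key))
    for path, key in pending:  # pass #2: complete the hashes
        new_index[path] = full_hash(path)
    return new_index
```

As suggested above, a matched file would still get a forced full hash before it is actually uploaded.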


Is it possible to first hash block #0 of each file and then re-hash the entire files in pass #2? Like, a fast hash-indexing pass to get the list of all files, then a slower pass #2 to complete the hashes. Block #0 could be used to intelligently detect if it is "potentially" the same file moved to another folder and avoid a complete re-hash... I'd say that there are better ways to determine if it is at least almost the same file... anyway, before actually uploading you would force complete hashing of that particular file.

I don't agree. What happens if the folder is on another drive and the file becomes damaged in the transfer? Then the user trying to get that file would end up with a botched file.


And when it's in a different place, how do you find out that it's the same file? You must hash it :-P

I'm just curious whether it's possible to make some simple check for files that are already hashed.

Doing a full file hash all the time is way too much, and I don't think it's a good idea. I bet you don't ask your mom for ID and a blood sample every day just to tell it's her xD

And about the file getting corrupt: it can get damaged AFTER ApexDC++ hashes it and wham! You're sharing a bogus file without even knowing about it.


I'm just curious whether it's possible to make some simple check for files that are already hashed.

Doing a full file hash all the time is way too much, and I don't think it's a good idea. I bet you don't ask your mom for ID and a blood sample every day just to tell it's her xD

And about the file getting corrupt: it can get damaged AFTER ApexDC++ hashes it and wham! You're sharing a bogus file without even knowing about it.

MD5 or SFV checking? Is it faster and applicable?


MD5 or SFV checking? Is it faster and applicable?

I can't imagine it is much faster/simpler to be honest. I guess I wouldn't know though. :)


The simplest thing I could think of is to make a hash list that would look something like this:

filename - size - creation/modification dates - TTH

Every time rechecking occurs, just check whether the file was modified, and if it wasn't, use the hash from the list.
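A rough sketch of that kind of lookup (the file name, size, date and TTH values are made-up examples, and compute_tth is a hypothetical callback; this doesn't touch the client's real HashIndex.xml):

```python
import os

# Hypothetical in-memory hash list: one entry per already-hashed file.
# In a real client this would be persisted between runs.
hash_list = {
    # filename -> (size, mtime, tth)
    "track01.mp3": (4_213_770, 1156243200, "EXAMPLETTHROOTINBASE32EXAMPLETTHROOT"),
}

def lookup_or_rehash(path, compute_tth):
    """Reuse the stored TTH if name, size and modification time all match."""
    name = os.path.basename(path)
    size = os.path.getsize(path)
    mtime = int(os.path.getmtime(path))
    entry = hash_list.get(name)
    if entry and entry[0] == size and entry[1] == mtime:
        return entry[2]              # file looks unmodified, skip rehashing
    tth = compute_tth(path)          # new or modified file: rehash it
    hash_list[name] = (size, mtime, tth)
    return tth
```

Note that keying only on the file name is the weak spot: two different files with identical names, sizes and dates would collide, which is exactly the objection raised later in this thread.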

P.S. It would be nice to have a Force Rehash button for files/dirs.


But with all these details, will the hashing take much longer? Of course it will compensate when moving, but how much time will it take for the first hashing?

Edited by Zlobomir


But with all these details, will the hashing take much longer? Of course it will compensate when moving, but how much time will it take for the first hashing?

Why should it? There's no calculation involved, just writing some additional data to a file.

Think about how the filesystem handles files: it doesn't actually move your files (unless it really has to), it just changes the table indexes which assign them to some dir. It doesn't take any time at all :)
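A quick way to see that (paths here are made up and SHA-1 just stands in for TTH): a same-volume rename only rewrites the directory entry, so the file's bytes, and therefore its hash, stay exactly the same.

```python
import hashlib
import os

def sha1_of(path):
    # stand-in for TTH, just to show the content is unchanged
    with open(path, "rb") as f:
        return hashlib.sha1(f.read()).hexdigest()

before = sha1_of("downloads/movie.avi")                # hypothetical paths
os.rename("downloads/movie.avi", "videos/movie.avi")   # same volume: only the directory entry changes
after = sha1_of("videos/movie.avi")
assert before == after  # the bytes on disk never moved, so the old hash is still valid
```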


I think it's fine the way it is...

Think about all those duplicate file names people can have in their share.

For example, some tools have default names... for audio files, track01, track02, ... are common names.

If you don't check the TTH, and you remove a set of files and add a new set of files with identical names, how can the client know whether it's the same set/file or a new one?

The only safe way I can think of is to add a function to Apex to move folders, so it knows it must be the same set.

If the hashing bogs down your system... you can also simply set it to a lower max hash speed.

(For the people who know what they're doing, there is an option: close Apex... move your files... edit the hashindex.xml and it won't rehash.)

A_Alias
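For anyone wondering what that manual hashindex.xml trick could look like when scripted, here is a rough sketch. The file location, the drive paths and the <File Name="..."> layout are assumptions about the client's format, so check your own HashIndex.xml first, keep a backup, and only do this while Apex is closed.

```python
import xml.etree.ElementTree as ET

# Assumed layout: entries like <File Name="D:\old\path\file.ext" ... Root="TTH..."/>.
# Verify against your own HashIndex.xml before editing anything.
OLD_PREFIX = "D:\\Downloads\\Albums\\"   # hypothetical old share location
NEW_PREFIX = "E:\\Music\\Albums\\"       # hypothetical new location

tree = ET.parse("HashIndex.xml")
for node in tree.iter("File"):
    name = node.get("Name", "")
    if name.startswith(OLD_PREFIX):
        # Point the existing hash entry at the new path instead of rehashing.
        node.set("Name", NEW_PREFIX + name[len(OLD_PREFIX):])

tree.write("HashIndex.xml", encoding="utf-8", xml_declaration=True)
```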


Emm... since this is the "Feature Requests" part of the forum, I think people here aren't happy enough with what they have :stuart:

And about your comment, A_Alias:

I'm sure there are many files with the same names in shares. It was just a simple example.

My point was that someone should make some algorithm to prevent unneeded rehashing.


My point was that someone should make some algorithm to prevent unneeded rehashing.

...for the sake of the life of our hard disks... :D


...for the sake of the life of our hard disks... :D

There are a couple of easy solutions:

1. Set the max hash speed to a lower value.

2. Stop moving those files after you have added them to the share, and you only need to hash them once...

Simple and easy, ain't it?


@1: To my understanding, that would not really decrease the work the HDD has to do, only give it more time to do it.

@2: Yes, that would be the only solution, and actually I don't understand either why someone needs to move their files that often. As for me, I try to sort everything when I download the files, so files only have to be moved and rehashed when I change something in my ordering...

BTW: Avoiding unnecessary moving of files was the reason for my feature request concerning direct downloading to target drives and folders...


There are a couple of easy solutions:

1. Set the max hash speed to a lower value.

2. Stop moving those files after you have added them to the share, and you only need to hash them once...

Simple and easy, ain't it?

...hmm...

The hard and maniac way:

1. Make an extension for Explorer that adds the ability to change the hashindex file of DC when DC is closed.

2. Make the same plugins for Total Commander, FAR Manager and other file browsers.

3. Make the same extension for Linux-based programs.

...and now we can sell these useless programs...

:D

2Crise: You're absolutely right!


Heh... don't get hit by a truck and you won't need to go to the hospital :D

Just want to get some answers before moving on:

- Is it possible to make the improvement this thread is about?

- If yes, are the benefits of such an improvement so minor that they aren't even worth the effort?


I don't think any real improvements have been suggested, none that are related to the client at least. You could set up your file manager to edit your hash data to reflect you moving your files, so you wouldn't really need to re-hash, but that's rather complicated if it's possible at all (which it should be)... I can't imagine there's any realistic way to do it in Explorer. Just not moving files is another option, although that's not really counter-attacking the problem, just avoiding it. As for MD5 or SFV checking, as I said, I don't imagine it would be any faster.
