Thursday, July 24, 2008

Update

Unless another old hard drive turns up somewhere, I have completed the scanning and recovering phase. Now I just have to finish the identification and renaming phase, which is the much, MUCH longer and more tedious one.

In total, I have scanned 520 GB of hard drive space and recovered about 500,000 files totaling almost 300 GB. Much of it is just program files or temporary internet files and can be deleted right away. But it takes a lot of time to go through and determine what is worthless and what needs to be saved.

MP3s with full tagging can be renamed automatically, but at least half of them have no tags or incomplete ones. The only way to get through those is to listen to them. And if it's something I am not familiar with, I will be unable to do anything with it. This amounts to some 5000+ untagged MP3s that I have to go through. Needless to say, this will be a never-ending project.
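For the tagged files, here is a rough sketch of what automated renaming could look like in Python. It assumes the simple ID3v1 layout (a 128-byte block at the end of the file starting with "TAG", then 30 bytes of title and 30 bytes of artist). Many MP3s carry ID3v2 tags instead, which this does not read, and the function names `read_id3v1` and `proposed_name` are made up for illustration.

```python
from pathlib import Path


def read_id3v1(path):
    """Return (artist, title) from an ID3v1 tag, or None if missing or incomplete."""
    data = Path(path).read_bytes()
    if len(data) < 128:
        return None
    tag = data[-128:]
    if tag[:3] != b"TAG":
        return None  # no ID3v1 block at the end of the file

    def field(raw):
        # Fields are fixed width and padded with NUL bytes or spaces.
        return raw.split(b"\x00", 1)[0].decode("latin-1").strip()

    title = field(tag[3:33])
    artist = field(tag[33:63])
    if not title or not artist:
        return None  # incomplete tagging: this one needs a listen
    return artist, title


def proposed_name(path):
    """Suggest an 'Artist - Title.mp3' file name, or None if the tag is unusable."""
    info = read_id3v1(path)
    if info is None:
        return None
    artist, title = info
    name = f"{artist} - {title}"
    # Replace characters that are not allowed in Windows file names.
    safe = "".join("_" if c in '\\/:*?"<>|' else c for c in name)
    return safe + ".mp3"
```

Anything that comes back as `None` lands in the listen-to-it-manually pile, which is exactly the pile described above.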

Text files present a different challenge. I can search text files for a specific string, like "domsquad", very easily. The trouble is that certain strings turn up thousands of false positives. So the challenge is to find the right strings to search for. Failing that, I have to manually read the matched files and determine which are legit and which are garbage. I should note that when I say false positive, I mean that the string does match, but the file is not relevant to what I am looking for. For example, searching for "vj" will likely turn up a ton of matches, but 99% of them will not be what I want, which is anything dealing with the DOM member named vj.
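A minimal sketch of that kind of search in Python, assuming the recovered text files end in `.txt` and sit under one folder. The function name `find_matches` is made up for illustration. Matching on whole words only keeps a short string like "vj" from matching inside longer words, but it cannot tell whether a file is actually relevant, so the manual reading is still needed.

```python
import re
from pathlib import Path


def find_matches(root, term, whole_word=True):
    """Return (path, match_count) for every .txt file under root containing term."""
    pattern = re.escape(term)
    if whole_word:
        # \b keeps "vj" from matching in the middle of a longer word.
        pattern = r"\b" + pattern + r"\b"
    regex = re.compile(pattern, re.IGNORECASE)

    hits = []
    for path in sorted(Path(root).rglob("*.txt")):
        if not path.is_file():
            continue
        # latin-1 can decode any byte sequence, so damaged files will not raise.
        text = path.read_text(encoding="latin-1")
        count = len(regex.findall(text))
        if count:
            hits.append((path, count))
    return hits
```

Sorting the results by match count is one way to decide which files to read first.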

There are a lot of duplicates, so the total number of unique files may be far smaller. It takes a while to figure out which files are repeats and eliminate them. So, for example, the last 120 GB hard drive that I scanned, and have yet to parse, may have almost nothing on it that I haven't already recovered elsewhere. But I have to look anyway because there may be a few gems hidden somewhere in there.
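One way the duplicate hunt could be sped up is by comparing file contents with a hash. The sketch below is only an illustration, not what I actually ran: it groups files by size first (cheap), then hashes only the files that share a size. The names `file_digest` and `find_duplicates` are made up, and SHA-256 is just one reasonable choice of hash.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files are never loaded into memory whole."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Return a list of groups, each a sorted list of paths with identical content."""
    by_size = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file():
            by_size[path.stat().st_size].append(path)

    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size means the file cannot have a duplicate
        by_hash = defaultdict(list)
        for p in paths:
            by_hash[file_digest(p)].append(p)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```

Each group holds copies of the same file, so all but one can go. This only catches exact byte-for-byte copies. Two MP3s of the same song with different tags would not be grouped together.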

So that was just a recap of what I went through and the totals I ended up with, so I can remember later on. I've learned a lot during the process too. If I were to do it over again, I'd be more organized and efficient from the start. At the end of the day, though (or summer or year in this case), it will be worth it when everything is all recovered, renamed, and in its rightful spot.
