Tuesday, July 15, 2008

Forensics Methods

Some people might be interested in exactly what I did to find and recover this lost information and data. In any case, it makes sense to me to document the methods used and anything else of interest that came up during this project. After all, I'm trying to recover lost documentation of my Quake clanning experience because I wish to reminisce occasionally - I may also want to reminisce about this forensics experience.

First I went through every hard drive I owned and installed them, one by one, in a 2nd, and mostly unused, computer to see what worked and what could easily be found. This is where I was originally expecting to find those saved mp3s and videos. I discovered that two hard drives were entirely dead (which I already knew, but it was reassuring that there were no others) and a couple other drives could not be recognized. A few videos and pictures were found at this stage, but not much and nothing unexpected or entirely interesting.

I then did some research and planning. There is a lot involved in computer forensics, and while I did take a course on it 9 months ago, I am by no means an expert and had largely forgotten the details of the course. So I had to remind myself of certain things. I also had to plan which hard drives to search first and how to store what I recovered. For one thing, I have several hard drives of the same make and size, which could get confusing unless I clearly labeled each. For another thing, I needed free space to store all this stuff without potentially writing over data that had not yet been recovered. I have two 750GB hard drives in my main computer that I felt sure did not have any hidden or lost data on them. So that would be where I would store everything. But even then, I have several drives that are 80-160GB in size - I could eat through that 1.5TB of space (of which at least 500GB was already being used, by the way) in a hurry.

Rather than going through the pain of learning various filesystems all over again and writing some programs to parse through them, I found and used two programs, TestDisk and PhotoRec, for search and recovery. They are quite a nifty pair of programs. With TestDisk, I found the reason a couple of hard drives were unrecognizable to Windows - they held Linux LVM partitions, remnants of a MythTV-based PVR system I had built but since stopped using. With PhotoRec, I could search through and recover data files. I did this one hard drive at a time on my 2nd computer, to ensure that if anything went bad, I wasn't losing anything valuable. I then transferred the recovered data to my main computer, where the parsing phase began.

Parsing hundreds of GB of data, totaling over 200,000 individual files, would require automation. I first made a script to separate all the files by filetype into individual directories. That way I would have all the jpg files together in one place to look through, while all the mpg files were in another place to browse through. Separating them this way let me focus on the most likely and interesting filetypes (jpg, mpg, avi, gif, html, txt, mp3, wav) and also get rid of the more useless or uninteresting filetypes (dll, exe, h, xml).
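For anyone wanting to do something similar, here is a minimal sketch of that kind of sorting script in Python. It is not the script I actually used; the function name, the "no_extension" bucket, and the numbering scheme for name clashes are all made up for illustration, and it assumes the destination directory sits outside the source directory.

```python
import shutil
from pathlib import Path


def sort_by_extension(source: Path, dest: Path) -> dict[str, int]:
    """Move every file under source into dest/<extension>/ and count them."""
    counts: dict[str, int] = {}
    # Snapshot the file list first so moving files cannot disturb the walk.
    files = [p for p in source.rglob("*") if p.is_file()]
    for path in files:
        # "f123.JPG" and "f456.jpg" both land in the "jpg" directory.
        bucket = path.suffix.lower().lstrip(".") or "no_extension"
        target_dir = dest / bucket
        target_dir.mkdir(parents=True, exist_ok=True)
        target = target_dir / path.name
        # Files recovered from different drives could share a name,
        # so add a numeric suffix rather than overwrite anything.
        n = 1
        while target.exists():
            target = target_dir / f"{path.stem}_{n}{path.suffix}"
            n += 1
        shutil.move(str(path), str(target))
        counts[bucket] = counts.get(bucket, 0) + 1
    return counts
```

Calling sort_by_extension(Path("recovered"), Path("sorted")) would return a count per filetype, which is a quick way to see which directories are worth looking through first and which (dll, exe and so on) can simply be deleted.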

I now could look through a graphic directory like jpg say, see what is there, and separate them to a more appropriate location. Of interest were any personal pictures of me, my friends, or trips or events I may have taken or been a part of; any shows, practices, or recording sessions of the bands I worked with; or anything dealing with DP and the DOM or RiP clans. (There was also much porn, which was inevitable and mostly uninteresting, if not humorous. Mostly temporary saved Internet files rather than stuff that I had wanted to save. To that end there were a lot of other temporary Internet files dealing with music, sports, games, and anything else I typically spend time browsing on the web.)

I should note that since most of these files were deleted or overwritten by a new filesystem, filenames and directory structure could not be recreated. Thus all the files have names like f71429412.mp3 or f83917523.jpg. In other words, there was, and still is, a lot of tedious renaming to be done. In most cases I simply separated them into the appropriate location and will go back when I have time to rename them. I have to actually view or listen to each individual file to figure out what it is exactly before I can rename it - a very, VERY, time-consuming task. It should also be known that there may be a lot of repeats: files that were transferred to different hard drives over the years, their remains still present on multiple drives. There is a way I could parse through all the files, detect if they are duplicates, and delete them, but I have not done that yet.
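Since I have not written the duplicate check yet, the following is only a sketch of one common way to do it in Python: group files by size, then compare SHA-256 hashes within each group. The function names are hypothetical. Note that it only catches copies that are byte-for-byte identical, so two recoveries of the same file that differ by even one byte would not be matched, and it only reports duplicates rather than deleting them.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root: Path) -> list[list[Path]]:
    """Return groups of files under root whose contents are identical."""
    by_size = defaultdict(list)
    for path in root.rglob("*"):
        if path.is_file():
            by_size[path.stat().st_size].append(path)

    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a file with a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for path in same_size:
            by_hash[file_digest(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```

Checking sizes first means most files never need to be hashed at all, which matters when there are hundreds of GB to get through.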

With text files, since many were garbage or were uninteresting and useless system files, I had to search through them for specific strings that might be of interest, for example "domsquad". To do this I just wrote a script that used find and xargs and moved any matches to appropriate places. Since there could be false positives, I still had to manually look at each file, but the script greatly limited the number of files I had to check.
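My script was built on find and xargs, but the same idea can be sketched in Python for anyone who prefers it. This is an illustration rather than what I ran: the function name is hypothetical, the match is case-insensitive by assumption, and it only looks at files ending in .txt.

```python
import shutil
from pathlib import Path


def move_matches(source: Path, dest: Path, needle: str) -> list[Path]:
    """Move .txt files under source that contain needle (case-insensitive)."""
    needle_bytes = needle.lower().encode("utf-8")
    dest.mkdir(parents=True, exist_ok=True)
    moved = []
    # Snapshot and sort the candidates so the walk is stable and repeatable.
    candidates = sorted(p for p in source.rglob("*.txt") if p.is_file())
    for path in candidates:
        # Compare raw bytes: recovered files are often not valid text.
        if needle_bytes not in path.read_bytes().lower():
            continue
        target = dest / path.name
        if target.exists():
            continue  # never overwrite an earlier match with the same name
        shutil.move(str(path), str(target))
        moved.append(target)
    return moved
```

Something like move_matches(Path("sorted/txt"), Path("dom"), "domsquad") would pull the candidates into one place, and they still need to be read by hand to weed out the false positives.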

The result of all this work and parsing was the rediscovery of many valuable files, some long forgotten. I still have a lot of work left to do, but I am quite pleased with the initial results. I hope that additional, more in-depth forensics will result in even more rediscoveries. And anything of interest relating to DOM will be posted here.
