Wednesday, October 9, 2013

PowerShell for Hashes and Timestamps

I have recently been loving the functionality that PowerShell provides. I think it's Microsoft's best attempt at a *nix-like shell. I have been able to do so much just playing around with it that today, when the need arose, I was able to bang out two quick scripts. I figured I would post one here now (and maybe the second one after I clean it up...and maybe "module-ize" it).

I have recently been put through the wringer, first by Dell (over a failed hard drive on my primary laptop) and then by Microsoft (over my primary Live account being hacked and misused). Because of these issues, I have had to rebuild and set up my system from scratch. What I found while doing this was that I had somehow made a good number of "backup" copies of the source directory of a big project I have been working on. As I use BitBucket, this normally wouldn't be a problem...except that I had made a good number of changes while the last hard drive was failing, and multiple pushes failed when the laptop froze up. This left me less than sure about the state of the main and primary backup folders for the project. What to do?

I decided that the easiest thing would be to have a spreadsheet listing the file path, filename, MD5 hash, and Last Write Time of each file. So, I moved all of the folders under one temporary folder on my desktop. Now I just needed to collect that metadata. PowerShell and the PowerShell Community Extensions to the rescue!

The PowerShell Community Extensions (PSCX, http://pscx.codeplex.com/) provides a useful Get-Hash cmdlet. This cmdlet can produce a number of different types of hashes depending on the options supplied by the user. Even better, it accepts pipeline input and its own output can be piped onward, which comes in very handy.

To get to some code, using the PSCX Get-Hash cmdlet is as easy as:

Get-Hash MyFile.txt

The above will by default produce the MD5 hash of MyFile.txt and will output four values:
- Path: the full path and name of the file
- Algorithm: the algorithm used. In the example above the output would be 'MD5'
- HashString: the hash rendered as a hexadecimal string, based on the algorithm used
- Hash: the raw hash value as a byte array (HashString is its text form)
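To make the difference between the last two values concrete, here is a minimal sketch that uses plain .NET classes rather than PSCX. It assumes that Hash corresponds to the raw digest bytes and HashString to their hexadecimal rendering; the input text and variable names are purely illustrative.

```powershell
# Minimal sketch using plain .NET classes (no PSCX required).
# $text and the other variable names are illustrative only.
$text  = 'hello'
$bytes = [System.Text.Encoding]::UTF8.GetBytes($text)

$md5 = [System.Security.Cryptography.MD5]::Create()
try {
    # Raw digest: a 16-element byte array, comparable to the "Hash" value
    $hash = $md5.ComputeHash($bytes)
}
finally {
    $md5.Dispose()
}

# Hexadecimal rendering of the same bytes, comparable to the "HashString" value
$hashString = [System.BitConverter]::ToString($hash) -replace '-', ''

$hash.Length    # 16
$hashString     # 5D41402ABC4B2A76B9719D911017C592
```

For a spreadsheet the string form is the useful one, since it can be sorted, filtered, and compared as ordinary text.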

For my purposes, I only care about the first and third columns (Path, HashString). However, this is still not enough information. The script below is the solution that works for me. I think I am going to convert it to get rid of the hard-coded values at some point in the very near future.



####################################################
# FileName: Get-HashesAndTimeStamps.ps1            #
# Author: Dave Werden                              #
# Date:   9 Oct 2013                               #
# NOTES:                                           #
# The four columns produced by PSCX's Get-Hash     #
# cmdlet are: Path, Algorithm, HashString, Hash    #
# Dependencies: The PSCX pack must be installed and#
#  imported in order to make use of the Get-Hash   #
#  cmdlet.                                         #
####################################################



#Hardcoded csv filepath and name
$outCSVFileTemp = "C:\users\dwerd_000\Desktop\SB_File_Hashes_Full.csv"

#Hardcoded location of files
$sbpath = "C:\users\dwerd_000\Desktop\ScoutNB_Collections\"
#create the collection of files (recursively, skipping the folders themselves)
$sbfiles = Get-ChildItem $sbpath -Recurse | Where-Object { !$_.PSIsContainer }


#process each file, getting the file hash and last write time for each
#output goes to file defined in outCSVFileTemp above
foreach ($sbfile in $sbfiles) {

    #smarter to grab file's LastWriteTime value first in order to append to the Get-Hash object
    $sbfileTime = $sbfile.LastWriteTime.ToString("dd/MM/yyyy HH:mm:ss")
    #pass the full path explicitly so files in subfolders resolve regardless of the current directory
    Get-Hash $sbfile.FullName | Select-Object Path,HashString,@{Name='LastModified';Expression={$sbfileTime}} | Export-Csv $outCSVFileTemp -Append

}


To quickly explain what is going on here:
$sbfiles is set to contain all of the files in the given path. This is done recursively and excludes the folders themselves.
Next, a foreach loop processes each file by:
   - First grabbing the file's LastWriteTime property, formatting it with the given format string, and saving the result to $sbfileTime
   - Next (and this was the FUN part) trimming the object returned by Get-Hash with the Select-Object cmdlet, so that only the Path and HashString 'columns' are retained and a third column (LastModified) is added and set to the value of $sbfileTime
   - Finally, exporting the "new" object, consisting of the Path, HashString, and LastModified columns/values, to $outCSVFileTemp.
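The same Select-Object pattern can be folded into a function that takes the folder and the CSV path as parameters instead of hard-coded values. What follows is only a sketch under a few assumptions: the function name Export-HashesAndTimeStamps and its -Path and -OutFile parameters are hypothetical, and the built-in Get-FileHash cmdlet (available from PowerShell 4.0) is swapped in for PSCX's Get-Hash so the example runs without PSCX installed. Get-FileHash names its hex string Hash, so a calculated property renames it to HashString.

```powershell
# Sketch only: Get-FileHash stands in for PSCX's Get-Hash.
# The function and parameter names are hypothetical.
function Export-HashesAndTimeStamps {
    param(
        [Parameter(Mandatory = $true)][string]$Path,     # folder to scan
        [Parameter(Mandatory = $true)][string]$OutFile   # CSV file to write
    )

    Get-ChildItem -LiteralPath $Path -Recurse |
        Where-Object { -not $_.PSIsContainer } |
        ForEach-Object {
            # Keep a reference to the file so the calculated property can read its timestamp
            $file = $_
            Get-FileHash -LiteralPath $file.FullName -Algorithm MD5 |
                Select-Object Path,
                    @{ Name = 'HashString';   Expression = { $_.Hash } },
                    @{ Name = 'LastModified'; Expression = { $file.LastWriteTime.ToString('dd/MM/yyyy HH:mm:ss') } }
        } |
        Export-Csv -LiteralPath $OutFile -NoTypeInformation
}

# Example call, using the paths from the script above:
# Export-HashesAndTimeStamps -Path 'C:\users\dwerd_000\Desktop\ScoutNB_Collections\' -OutFile 'C:\users\dwerd_000\Desktop\SB_File_Hashes_Full.csv'
```

Piping everything into a single Export-Csv call, rather than appending once per file, means a second run overwrites the CSV instead of adding duplicate rows to it. Whether that is preferable depends on how the file is used.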

By running this script, I am able to use one spreadsheet to identify the newest version of each file, as well as whether multiple copies of the same file are the same or different. While there is probably a way to automate this in PowerShell, I still prefer to do these kinds of tasks semi-manually, using Excel's ability to filter/sort as well as its ability to highlight duplicates (HashString, in this case). The only other step that I currently do manually, but may add to this script, is splitting the Path value into two columns: the full path to the lowest folder in one and the filename by itself in the other (not sure which way to go on this).
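For anyone who would rather stay in PowerShell, the filtering and path splitting could be sketched roughly as follows. This is one possible approach, not a tested replacement for the Excel workflow. The sample rows stand in for what Import-Csv would return from the script's output file; their paths, hashes, and dates are made up, and the Folder, FileName, and Modified column names are hypothetical. It also assumes the timestamps were written with "/" as the date separator.

```powershell
# Sample rows standing in for Import-Csv output; all values are made up.
$rows = @(
    [pscustomobject]@{ Path = (Join-Path 'backup1' 'main.c'); HashString = 'AAAA'; LastModified = '01/10/2013 09:00:00' }
    [pscustomobject]@{ Path = (Join-Path 'backup2' 'main.c'); HashString = 'AAAA'; LastModified = '01/10/2013 09:00:00' }
    [pscustomobject]@{ Path = (Join-Path 'backup3' 'main.c'); HashString = 'BBBB'; LastModified = '05/10/2013 17:30:00' }
    [pscustomobject]@{ Path = (Join-Path 'backup1' 'util.c'); HashString = 'CCCC'; LastModified = '02/10/2013 12:00:00' }
)

$culture = [System.Globalization.CultureInfo]::InvariantCulture

# Add Folder and FileName columns, and parse LastModified back into a DateTime for sorting
$files = $rows | Select-Object *,
    @{ Name = 'Folder';   Expression = { Split-Path -Path $_.Path -Parent } },
    @{ Name = 'FileName'; Expression = { Split-Path -Path $_.Path -Leaf } },
    @{ Name = 'Modified'; Expression = { [datetime]::ParseExact($_.LastModified, 'dd/MM/yyyy HH:mm:ss', $culture) } }

# Hashes that occur more than once: those copies have identical content (barring an MD5 collision)
$duplicates = $files | Group-Object HashString | Where-Object { $_.Count -gt 1 }

# Newest copy of each file name
$newest = $files | Group-Object FileName | ForEach-Object {
    $_.Group | Sort-Object Modified -Descending | Select-Object -First 1
}

$duplicates | Select-Object Name, Count
$newest | Select-Object FileName, Folder, HashString, Modified
```

Grouping by FileName alone treats same-named files in different subfolders as the same file, so a real project would probably need to group on the path relative to each backup root instead.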

Anyway, it was a lot of fun to bang this out and to see that I ended up with a CSV file of exactly the data I needed and nothing else. The other PowerShell script I knocked out today was a (for now) hardcoded parser for finding specific items in one or more Nessus results files and creating an appropriately named CSV file for the subsets it finds. Maybe later this week or next I will post that up as well...actually, I am certain I will, as I have not found a good PS or other tool to find and compile the subsets I need from Nessus in order to provide valid data for PT reports.