Powershell for data mining Access databases into SQL Server

I need to do some data extraction from several hundreds of Access databases extracted from zip files. The extracted data will allow us to do some statistics on our customer’s behavior.
In order to do that, I decided to use Powershell, because it has all the features I need, bundled in one neat language.

The scaffolding for the scripts uses PSake for the task launcher, Pester for Powershell unit tests, and a few Nuget packages (including PSake).

In a vendor folder, add the Nuget exe, as well as a packages.config file listing the necessary Nuget packages. I have added PSake, and NUnitOrange to transform the Pester results to a nice HTML report.
I’m not using the Pester Nuget package because it includes the Pester unit tests, and some of them fail, which breaks my build (in addition to adding a thousand tests I don’t care about). Instead, I directly include the Pester scripts, and I cleaned up the samples and tests.

Since the script needs to read Access databases using some sort of ADODB or OLEDB provider, you will have to install either the Access 2007 or Access 2010 provider. If you’re like me (with Office x86 already installed), you won’t be able to install the Access 2010 x64 provider, which will trigger “The Jet/ACE OLEDB provider is not registered on the local machine” errors. So you will have to run the x86 version of Powershell. Which means you will have to use the x86 version of the SQLPS module (make sure you download the x86 version). Don’t worry, the SQL 2012 version of SQLPS is compatible with older SQL Servers (at least 2008 R2).

The batch bootstrapper is inspired by the bootstrapper from Pester:

The tasks in tasks.ps1 are pretty standard and minimalist, so that the maximum is tested in modules via Pester.

Reading Access databases is as simple as using OleDb, old-school style:

And the corresponding Pester tests, using actual Access test databases:

Executing SQL scripts uses the much more powerful Invoke-SqlCmd cmdlet from the SQLPS module. The most is that it returns a Powershell object, so I can do something like this:

Unit testing your Powershell scripts using Pester

In order to stabilize our middle office, we need to test it. Not that it’s buggy, but hey, testing is good.

We have a huge 500-lines script that processes PDF files through a bunch of programs. We need to test a few things:

  • Every step and each piece of the code must be working
  • The configuration files must be properly read
  • The proper programs must be run in the proper order
  • The processed PDFs must look like what we’re expecting

To do that, we have a lot of work to do.

There is a series of great articles on Pester on PowerShell Magazine.

Refactor your script

First, we need to extract the methods and code blocks we will test. Our script is not that hard to refactor into a bunch of methods, since it’s pretty well organized so far. A simple refactor will simply be moving a bunch of independent code blocks into methods, regrouped and externalized by feature.

Prefer using modules to do that. Create your methods into .psm1 files. It will allow you to just Import-Module mymodule.psm1 and use it the same way as if you were dot-sourcing your file (. .\myfile.ps1), but will allow you to do more awesome things later; for instance, you can get the list of available commands through Get-Command -Module mymodule, get the comment-based help of your module methods through Get-Help my-module-method, get tab-expansion on your methods, etc.

If you’re not sure how to refactor your script to extract units of work, I can’t really help you there, and you should go learn more about unit testing; there are a lot of places where you can do that.

Unit Test your extracted code

There is an awesome unit test and BDD tool called Pester. Download the latest nuget package into your scripts directory by running nuget install pester. PsGet also has a Pester package, but including the Pester files with your sources (or a way to get them like with Nuget) is very important if you intend to run your tests on a continuous integration platform, especially if you externalize them on a service like AppVeyor.

Create a _run_tests.ps1 file (or whatever your naming convention calls for), with the very simple following contents:

Invoke-Pester will run all the “*.Tests.ps1” files it finds in the current directory. Unfortunately, the Pester Nuget package comes with the Pester tests, and it’s pretty annoying to see the thousands of unit tests in the middle of yours. You can either not use Invoke-Pester and roll your own “look for *.Tests.ps1 file except in the Pester folder” method, or (like I did) forget about nuget update pester and remove the test files from the Pester folder.

To create unit test files, you can either write them manually, or use the New-Fixture module command, which will create both a file to contain your methods, and a test file to test your methods. If you’re working with modules, you will not really be able to use the power of this command, but if you’re creating a new script, it will provide you with a BDD workflow.

Your test file for your module will look like this:

As you can see, for now I’m using a custom config files with the expected values filled. This is not the best way to unit test, so we’re going to use mocking.

Mock the system methods

Here is the true power of Pester and Powershell: the ability to mock system methods. Your method reads files from the disk? No problem. Just provide an alternative implementation and you don’t have to setup a bunch of test data and config files.

My Read-MailConfig looks like this:

So, there is a Get-Content method that I want to mock and control its return value. I can now modify my test so that the values used by my test are right next to them:

Note the usage of -ModuleName mail in the Mock call: modules have their own scope (which is not the case of plain dot-sourced script files), and so need a bit more work to inject mocks.

My mail module actually sends emails through the System.Net.Mail classes, but Pester can’t mock .Net objects (note that .Net mocking frameworks can’t mock most system classes either).

In order to bypass that, we’re going to extract the .Net object calls into separate methods doing only that, we’re going to mock this extraction, and not test the .Net method call:

And the test:

Use TestDrive to test file system processes

If you need to test for complex file access, mocking system methods will quickly become too hard. For instance, I need to test two methods: one that removes files older than X days, and one that removes empty folders. Using TestDrive is much more straightforward, simple and compact than mocking the Get-ChildItem cmdlet: