Generate SQL Server test data from Excel. Operations in Excel (1)
Creating sample data
During the application development stages, in most of the time, we are
faced with the need to have at our disposal a set of test data to use in the
different processes that are developing. A similar situation occurs if we are
building an information system based on OLAP data cubes using SQL Server
Analysis Services, because in these cases we will need also a large volume of
data, to perform analysis simulations.
Suppose we have to generate a database of population data, with a table
containing data of individuals such as age, code region of residence, national
origin, sex, registration date, etc. On the other hand, also need
a series of catalog tables of countries, regions, and other values related to
the fields in the table of individuals.
Among the full range of utilities, tricks, and more, which exist to
carry out this task, in this article we will use Excel as a tool for generating
the dummy data set for individuals that later we'll dump in a SQL Server
database, which could be used as a data source for the application or
information system.
Creating data with
Excel
After starting Excel 2010 (earlier versions may also use), our first
task will be to create a column with the values that will serve to identify
the rows in the table. The simplest way to generate is to introduce a pair
of consecutive numbers in individual cells in a column of the worksheet, select
both cells and drag the fill handle to the last row for which we want to
generate the numbers.

However, the generation of values using this technique can be somewhat
cumbersome in the case that we must to produce a large number of rows and / or
columns, so that for the creation of data for all columns of the sheet we'll
use a more flexible yet powerful solution: Excel macros.
Creating a Macro
To create a macro we will click on the View tab of the Excel ribbon, and
within the Macros group we will click on
the option of the same name, which opens the Macro dialog box, where we'll write
the macro name: CrearDatosPoblacion.

Clicking the Create button opens the Visual Basic for Applications (VBA)
window editor, so that we can start writing the code of the macro.

Within the procedure body CrearDatosPoblación, write the following code block, where first, clean up the cells of the spreadsheet where we are
currently positioned. Then ask the user to enter through an InputBox
dialog box, the number of rows to generate. After inserting the column title,
we will introduce the first two values that start the series, which generate
using the method Selection.AutoFill.
Option Explicit
Sub CrearDatosPoblacion()
Dim nFilaDestino As Long
Dim sCeldaOrigen As String
Dim sCeldaDestino As String
'clean worksheet cells
Cells.Select
Selection.ClearContents
nFilaDestino = InputBox("Number of records to generate")
nFilaDestino = nFilaDestino + 1
'fila_id column
'---------------
'column caption
Range("A1").Select
ActiveCell.FormulaR1C1 = "Fila_ID"
'init values of series to generate
Range("A2").Select
ActiveCell.FormulaR1C1 = "1"
Range("A3").Select
ActiveCell.FormulaR1C1 = "2"
'select init values range
Range("A2:A3").Select
'fill total range of cells
sCeldaOrigen = "A2"
sCeldaDestino = "A" & nFilaDestino
Selection.AutoFill Destination:=Range(sCeldaOrigen & ":" & sCeldaDestino), Type:=xlFillDefault
End Sub

To run the macro will select the menu "Run | Run Sub / UserForm", or
press the key F5, filling the first column of the sheet with the number of
values indicated in the InputBox.

Before continuing we'll save our work in VBA or Excel, bearing in mind
that we must do it in a file of type "Excel Macro-Enabled Workbook (.xlsm)", what
we will be notified by a dialog box in time to save. We will click "No" in that
dialog and save our worksheet in a file named GenerarDatosPoblacion.xlsm.

Generating random data
The next column to create correspond to the age of individuals, in it,
need to generate random values within a range of numbers, representing the
maximum and minimum age that a person can have, for example between 0 and 120.
To create a random value between two numbers in a cell of Excel, we can
use a formula with RANDBETWEEN function, that receives as parameter the above
numbers, returning as a result the number generated.

Moving this functionality to the macro that we developed, we will add to
it the next code block, which assign the formula expression RANDBETWEEN
(0,120) to a cell in the second column. We will select that cell, and repeat
the formula over a range of cells using the method ActiveCell.AutoFill.
'age
'----
Range("B1").Select
ActiveCell.FormulaR1C1 = "Edad_ID"
sCeldaOrigen = "B2"
sCeldaDestino = "B" & nFilaDestino
Range(sCeldaOrigen).Select
ActiveCell.FormulaR1C1 = "=RANDBETWEEN(0,120)"
ActiveCell.AutoFill Destination:=Range(sCeldaOrigen & ":" & sCeldaDestino), Type:=xlFillDefault
Running the macro again, we'll get the new column with the age data.

Our next step is for the column of sex of individuals. Here
we could have used numbers 1 and 2, also generated randomly, to identify male
and female respectively, but let's get a little this operation, using instead
the letters H and M as values for the column. So the problem now is how to
randomly generate these letters in the cells of the column, since RANDBETWEEN only
receives and returns numerical results.
The solution is very simple, as it also involves using RANDBETWEEN
function, but combining it with the decision expression IF. We will pass to RANDBETWEEN
the numbers 1 and 2 as parameters, and according to the result obtained, IF
return the letter H or M. Below is the code block for this column, we will add
to the macro.
'sex
'---
Range("C1").Select
ActiveCell.FormulaR1C1 = "Sexo_ID"
sCeldaOrigen = "C2"
sCeldaDestino = "C" & nFilaDestino
Range(sCeldaOrigen).Select
ActiveCell.FormulaR1C1 = "=IF(RANDBETWEEN(1,2)=1,""H"",""M"")"
ActiveCell.AutoFill Destination:=Range(sCeldaOrigen & ":" & sCeldaDestino), Type:=xlFillDefault

For the next two columns: code region of residence and country of origin
code, we'll follow the same mechanics as the column ages, although using
different numerical ranges.
'code region of residence
'------------------------
Range("D1").Select
ActiveCell.FormulaR1C1 = "CCAA_ID"
sCeldaOrigen = "D2"
sCeldaDestino = "D" & nFilaDestino
Range(sCeldaOrigen).Select
ActiveCell.FormulaR1C1 = "=RANDBETWEEN(1,19)"
ActiveCell.AutoFill Destination:=Range(sCeldaOrigen & ":" & sCeldaDestino), Type:=xlFillDefault
'country
'-------
Range("E1").Select
ActiveCell.FormulaR1C1 = "Pais_ID"
sCeldaOrigen = "E2"
sCeldaDestino = "E" & nFilaDestino
Range(sCeldaOrigen).Select
ActiveCell.FormulaR1C1 = "=RANDBETWEEN(4,894)"
ActiveCell.AutoFill Destination:=Range(sCeldaOrigen & ":" & sCeldaDestino), Type:=xlFillDefault

Our next step is to create the data of an alleged registration date of
individuals in this population system, a task that will make in two phases. First,
we create each part of the date in separate columns.
'date elements:
'==============
'year
'---
Range("F1").Select
ActiveCell.FormulaR1C1 = "Anualidad"
sCeldaOrigen = "F2"
sCeldaDestino = "F" & nFilaDestino
Range(sCeldaOrigen).Select
ActiveCell.FormulaR1C1 = "=RANDBETWEEN(2008,2010)"
ActiveCell.AutoFill Destination:=Range(sCeldaOrigen & ":" & sCeldaDestino), Type:=xlFillDefault
'month
'-----
Range("G1").Select
ActiveCell.FormulaR1C1 = "Mes"
sCeldaOrigen = "G2"
sCeldaDestino = "G" & nFilaDestino
Range(sCeldaOrigen).Select
ActiveCell.FormulaR1C1 = "=RANDBETWEEN(1,12)"
ActiveCell.AutoFill Destination:=Range(sCeldaOrigen & ":" & sCeldaDestino), Type:=xlFillDefault
'day
'---
Range("H1").Select
ActiveCell.FormulaR1C1 = "Dia"
sCeldaOrigen = "H2"
sCeldaDestino = "H" & nFilaDestino
Range(sCeldaOrigen).Select
ActiveCell.FormulaR1C1 = "=IF(RC[-1]=2, RANDBETWEEN(1,28), IF(OR(RC[-1]=4, RC[-1]=6, RC[-1]=9, RC[-1]=11), RANDBETWEEN(1,30), RANDBETWEEN(1,31)))"
ActiveCell.AutoFill Destination:=Range(sCeldaOrigen & ":" & sCeldaDestino), Type:=xlFillDefault
As we have seen, the day is the part of the date that requires further
work, as we must be careful to avoid the generation, for example, of a day 31 if
in the month column we have the value 2. We will solve this checking the month
column in first place, and according this we'll use a different range to
generate the day. So that in the formula we'll use several IF expressions
combined with RANDBETWEEN functions.

In the second phase of this operation, we will create a final column in
the spreadsheet with a date in a format understandable to SQL Server, as a
result of the concatenation of the above columns: Anualidad, Mes and Dia.
It is necessary to take into account when composing the date in this
way, we must add a zero to the month and day when these values are only one
digit.
'date composition
'----------------
Range("I1").Select
ActiveCell.FormulaR1C1 = "Fecha_Alta"
sCeldaOrigen = "I2"
sCeldaDestino = "I" & nFilaDestino
Range(sCeldaOrigen).Select
ActiveCell.FormulaR1C1 = "=RC[-3] & IF(LEN(RC[-2])=1,""0"" & RC[-2],RC[-2]) & " & _
"IF(LEN(RC[-1])=1,""0"" & RC[-1],RC[-1]) "
ActiveCell.AutoFill Destination:=Range(sCeldaOrigen & ":" & sCeldaDestino), Type:=xlFillDefault
When we rerun the macro, we'll get the correctly formatted date in the
last column. This time also establish the number of records to generate in
a million, that way we will test the data creation power of our process.

To complete operations in Excel we'll save the file as "Excel
Workbook (*.xlsx)". Dialog box appears again warning that we couldn't save
macros in a .xlsx file, which we'll reply by clicking Yes.
At this point we conclude the first part of the article, in the second
installment will explain how to transfer the data just generated to SQL Server.
Regards,