After data generation in Excel explained in the first part of the
article, in this second installment we'll show how to insert that information
into a SQL Server database.
Database creation
After spreadsheet creation, we'll transfer its contents into a SQL
Server database running the following script from SQL Server Management Studio.
USE master
GO
CREATE DATABASE Poblacion
GO
USE Poblacion
GO
CREATE TABLE DatosPoblacion (
Fila_ID int NOT NULL,
Edad_ID int NULL,
Sexo_ID char(1) NULL,
CCAA_ID int NULL,
Pais_ID int NULL,
Fecha_Alta datetime NULL,
CONSTRAINT PK_DatosPoblacion PRIMARY KEY CLUSTERED (Fila_ID ASC))
GO
CREATE TABLE Sexo (
Sexo_ID char(1) NOT NULL,
Sexo_DS varchar(10) NULL,
CONSTRAINT PK_Sexo PRIMARY KEY CLUSTERED (Sexo_ID ASC))
GO
CREATE TABLE CCAA (
CCAA_ID int NOT NULL,
CCAA_DS varchar(50) NULL,
CONSTRAINT PK_CCAA PRIMARY KEY CLUSTERED (CCAA_ID ASC))
GO
CREATE TABLE Pais (
Pais_ID int NOT NULL,
Pais_DS varchar(50) NULL,
CONSTRAINT PK_Paises PRIMARY KEY CLUSTERED (Pais_ID ASC))
GO
As we have seen, in addition to the table that will house the data
generated from Excel, we will also create the catalog tables, which contain
descriptions of some fields of code existing in DatosPoblacion table with which
establish the necessary relationships.

Import Excel data from
SQL Server
To insert data into DatosPoblacion table, we'll use Transact-SQL's
OPENROWSET function as follows.
INSERT INTO DatosPoblacion
SELECT Fila_ID,Edad_ID,Sexo_ID,CCAA_ID,Pais_ID,Fecha_Alta
FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0',
'Excel 8.0;Database=C:\DatosOrigen\GenerarDatosPoblacion.xlsx',
'SELECT * FROM [Hoja1$]')
However, it's possible we get an error when trying to execute this
insert sentence, which informs us that SQL Server is not configured to query in
this way, called 'Ad Hoc Distributed Queries'.
If we want to check the SQL Server settings, we must run the system
stored procedure sp_configure. However, it is likely that among the options
listed, we do not see the distributed queries configuration. If we are in this
case, we must enable the advanced options display using the following
sentences.
EXECUTE sp_configure 'show advanced options',1
GO
RECONFIGURE
GO
Now we see the value of 'Ad Hoc Distributed Queries' option by
executing sp_configure.To enable it execute the following.
EXECUTE sp_configure 'Ad Hoc Distributed Queries',1
GO
RECONFIGURE
GO
When we run sp_configure back, we will already see activated the ability
to run distributed queries.

Thus, the previous sentence with OPENROWSET will work properly, filling
the DatosPoblacion table with the content of the GenerarDatosPoblacion.xlsx
file.

Optimizing data import
from Excel
However, at the current point we may face a performance problem, because
if we followed the above steps for creating the Excel file, we have a
spreadsheet with one million rows, which can take about fifteen minutes to load
into SQL Server table. For the sample developed in this article, we have used a virtual machine
with Windows 7 operating system and 1.5 GB RAM, so that the above times may
vary depending on the configuration used for these tests.
If we want to reduce these load times we may opt an alternative
technique for transferring data which explain below.
In first place we will open back the file GenerarDatosPoblacion.xlsx,
saving it as "CSV (Comma delimited)" type.

In this way we'll get a text file with fields delimited by the semicolon
character. We can see its contents by opening it with Notepad.

Then we will create a new table in the database with the following
structure.
CREATE TABLE DatosPoblacionExcel (
Fila_ID varchar(20) NULL,
Edad_ID varchar(20) NULL,
Sexo_ID varchar(20) NULL,
CCAA_ID varchar(20) NULL,
Pais_ID varchar(20) NULL,
Anualidad varchar(20) NULL,
Mes varchar(20) NULL,
Dia varchar(20) NULL,
Fecha_Alta varchar(20) NULL)
Into this new table we will import the contents of
GenerarDatosPoblacion.csv, using the Transact-SQL statement BULK INSERT.
Through the FIELDTERMINATOR option, we'll specify the character used as field
separator, while the option FIRSTROW indicate that the reading of data starts
in the second row of the file, because the first contains the column names.
BULK INSERT DatosPoblacionExcel
FROM 'C:\DatosOrigen\GenerarDatosPoblacion.csv'
WITH (FIELDTERMINATOR =';', FIRSTROW=2)
The time consumed by the bulk insert operation in the table
DatosPoblacionExcel be about thirty seconds.
To finish this process, we will insert in the table DatosPoblacion the
records from DatosPoblacionExcel table, excluding the unnecessary fields, as
shown in the following statement, whose execution will take approximately 15
seconds.
INSERT INTO DatosPoblacion
SELECT Fila_ID,Edad_ID,Sexo_ID,CCAA_ID,Pais_ID,Fecha_Alta
FROM DatosPoblacionExcel
As we have seen, this inserting data technique, but requires us to
perform an additional step, is a significant gain in time, employing less than
a minute in the transfer of data to the table DatosPoblacion, compared to the
fifteen minutes used by the OPENROWSET function.
Importing the other
catalog tables
For tables CCAA and Pais, we will use two Excel files containing,
respectively, the official classification of regions
and countries .These files are available in
compressed format on the website of the Madrid Community Statistics Institute. Once downloaded and unzipped we'll
run the following SQL statements for import into our database.
INSERT INTO CCAA
SELECT DISTINCT ccaa, liteccaa
FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;Database=C:\DatosOrigen\ccaaprov.xls',
'SELECT * FROM [ccaaprov$]')
INSERT INTO Pais
SELECT DISTINCT isopais,lpaisc
FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;Database=C:\DatosOrigen\cozonu.xls',
'SELECT * FROM [cozonu$]')
Note that depending on the version of Excel file to import, in
OPENROWSET function we'll use a different provider to
get the data.If the file is for Excel 2007-2010 will use
'Microsoft.ACE.OLEDB.12.0 ', while for earlier versions will be
"Microsoft.Jet.OLEDB.4.0 '.
Setting the country
code in the table DatosPoblacion
A closer look at the records of the Pais table will unveil that Pais_ID
field values are uncorrelated, being also the lowest value 4 and the highest
894.

This contrasts with existing data in the field of the same name for the
table DatosPoblacion because, although the maximum and minimum value of this
field is also between 4 and 894, we find a lot of records where the field
Pais_ID not correspond to any value in the Pais table.
To solve this problem we resort to a couple of techniques, of which the
first will be taken from DatosPoblacion table, each Pais_ID field values that
do not exist in the Pais table, adding one until we reach a value which does
exist in the countries catalog table mentioned. We will implement this
process in the SQL Server function below.
CREATE FUNCTION dbo.ObtenerPais(@nPais_ID int)
RETURNS int
AS
BEGIN
WHILE (SELECT COUNT(*) FROM Pais WHERE Pais_ID = @nPais_ID) = 0
BEGIN
SET @nPais_ID = @nPais_ID + 1
END
RETURN @nPais_ID
END
GO
The update of field Pais_ID in table DatosPoblacion the take out with
the next statement.
UPDATE DatosPoblacion
SET Pais_ID = dbo.ObtenerPais(Pais_ID)
WHERE Pais_ID NOT IN (SELECT Pais_ID FROM Pais)
The second technique is more direct, as it avoids the use of the search
function of the field Pais_ID in the countries table.What we do here is an
update of the field Pais_ID for the entire table DatosPoblacion, looking in the
Pais table, the value of Pais_ID closest to that exists in the same field in the
table DatosPoblacion.
UPDATE DatosPoblacion
SET Pais_ID = (SELECT TOP 1 Pais.Pais_ID FROM Pais
WHERE Pais.Pais_ID >= DatosPoblacion.Pais_ID
ORDER BY Pais.Pais_ID)
In both cases, we get that all the records in the DatosPoblacion table
join correctly with the Pais table through Pais_ID field.
Manual data entry
After the above operations, the only remaining empty table is Sexo, so
we'll run the following statements, which will create the necessary records.
INSERT INTO Sexo VALUES ('H','Hombre')
INSERT INTO Sexo VALUES ('M','Mujer')
Establishing
relationships between tables
To finish the database creation, we will establish the appropriate
relationships between fields in the table DatosPoblacion and other catalog
tables using the following sentences.
ALTER TABLE DatosPoblacion WITH CHECK ADD
CONSTRAINT FK_DatosPoblacion_Sexo FOREIGN KEY(Sexo_ID) REFERENCES Sexo (Sexo_ID),
CONSTRAINT FK_DatosPoblacion_CCAA FOREIGN KEY(CCAA_ID) REFERENCES CCAA (CCAA_ID),
CONSTRAINT FK_DatosPoblacion_Pais FOREIGN KEY(Pais_ID) REFERENCES Pais (Pais_ID)
After this operation we finish this article, which explained the
different parts of a process for generating, from Excel, a considerable amount
of test data that can be used later from SQL Server.We hope you find useful.
Regards.
Creating sample data
During the application development stages, in most of the time, we are
faced with the need to have at our disposal a set of test data to use in the
different processes that are developing. A similar situation occurs if we are
building an information system based on OLAP data cubes using SQL Server
Analysis Services, because in these cases we will need also a large volume of
data, to perform analysis simulations.
Suppose we have to generate a database of population data, with a table
containing data of individuals such as age, code region of residence, national
origin, sex, registration date, etc. On the other hand, also need
a series of catalog tables of countries, regions, and other values related to
the fields in the table of individuals.
Among the full range of utilities, tricks, and more, which exist to
carry out this task, in this article we will use Excel as a tool for generating
the dummy data set for individuals that later we'll dump in a SQL Server
database, which could be used as a data source for the application or
information system.
Creating data with
Excel
After starting Excel 2010 (earlier versions may also use), our first
task will be to create a column with the values that will serve to identify
the rows in the table. The simplest way to generate is to introduce a pair
of consecutive numbers in individual cells in a column of the worksheet, select
both cells and drag the fill handle to the last row for which we want to
generate the numbers.

However, the generation of values using this technique can be somewhat
cumbersome in the case that we must to produce a large number of rows and / or
columns, so that for the creation of data for all columns of the sheet we'll
use a more flexible yet powerful solution: Excel macros.
Creating a Macro
To create a macro we will click on the View tab of the Excel ribbon, and
within the Macros group we will click on
the option of the same name, which opens the Macro dialog box, where we'll write
the macro name: CrearDatosPoblacion.

Clicking the Create button opens the Visual Basic for Applications (VBA)
window editor, so that we can start writing the code of the macro.

Within the procedure body CrearDatosPoblación, write the following code block, where first, clean up the cells of the spreadsheet where we are
currently positioned. Then ask the user to enter through an InputBox
dialog box, the number of rows to generate. After inserting the column title,
we will introduce the first two values that start the series, which generate
using the method Selection.AutoFill.
Option Explicit
Sub CrearDatosPoblacion()
Dim nFilaDestino As Long
Dim sCeldaOrigen As String
Dim sCeldaDestino As String
'clean worksheet cells
Cells.Select
Selection.ClearContents
nFilaDestino = InputBox("Number of records to generate")
nFilaDestino = nFilaDestino + 1
'fila_id column
'---------------
'column caption
Range("A1").Select
ActiveCell.FormulaR1C1 = "Fila_ID"
'init values of series to generate
Range("A2").Select
ActiveCell.FormulaR1C1 = "1"
Range("A3").Select
ActiveCell.FormulaR1C1 = "2"
'select init values range
Range("A2:A3").Select
'fill total range of cells
sCeldaOrigen = "A2"
sCeldaDestino = "A" & nFilaDestino
Selection.AutoFill Destination:=Range(sCeldaOrigen & ":" & sCeldaDestino), Type:=xlFillDefault
End Sub

To run the macro will select the menu "Run | Run Sub / UserForm", or
press the key F5, filling the first column of the sheet with the number of
values indicated in the InputBox.

Before continuing we'll save our work in VBA or Excel, bearing in mind
that we must do it in a file of type "Excel Macro-Enabled Workbook (.xlsm)", what
we will be notified by a dialog box in time to save. We will click "No" in that
dialog and save our worksheet in a file named GenerarDatosPoblacion.xlsm.

Generating random data
The next column to create correspond to the age of individuals, in it,
need to generate random values within a range of numbers, representing the
maximum and minimum age that a person can have, for example between 0 and 120.
To create a random value between two numbers in a cell of Excel, we can
use a formula with RANDBETWEEN function, that receives as parameter the above
numbers, returning as a result the number generated.

Moving this functionality to the macro that we developed, we will add to
it the next code block, which assign the formula expression RANDBETWEEN
(0,120) to a cell in the second column. We will select that cell, and repeat
the formula over a range of cells using the method ActiveCell.AutoFill.
'age
'----
Range("B1").Select
ActiveCell.FormulaR1C1 = "Edad_ID"
sCeldaOrigen = "B2"
sCeldaDestino = "B" & nFilaDestino
Range(sCeldaOrigen).Select
ActiveCell.FormulaR1C1 = "=RANDBETWEEN(0,120)"
ActiveCell.AutoFill Destination:=Range(sCeldaOrigen & ":" & sCeldaDestino), Type:=xlFillDefault
Running the macro again, we'll get the new column with the age data.

Our next step is for the column of sex of individuals. Here
we could have used numbers 1 and 2, also generated randomly, to identify male
and female respectively, but let's get a little this operation, using instead
the letters H and M as values for the column. So the problem now is how to
randomly generate these letters in the cells of the column, since RANDBETWEEN only
receives and returns numerical results.
The solution is very simple, as it also involves using RANDBETWEEN
function, but combining it with the decision expression IF. We will pass to RANDBETWEEN
the numbers 1 and 2 as parameters, and according to the result obtained, IF
return the letter H or M. Below is the code block for this column, we will add
to the macro.
'sex
'---
Range("C1").Select
ActiveCell.FormulaR1C1 = "Sexo_ID"
sCeldaOrigen = "C2"
sCeldaDestino = "C" & nFilaDestino
Range(sCeldaOrigen).Select
ActiveCell.FormulaR1C1 = "=IF(RANDBETWEEN(1,2)=1,""H"",""M"")"
ActiveCell.AutoFill Destination:=Range(sCeldaOrigen & ":" & sCeldaDestino), Type:=xlFillDefault

For the next two columns: code region of residence and country of origin
code, we'll follow the same mechanics as the column ages, although using
different numerical ranges.
'code region of residence
'------------------------
Range("D1").Select
ActiveCell.FormulaR1C1 = "CCAA_ID"
sCeldaOrigen = "D2"
sCeldaDestino = "D" & nFilaDestino
Range(sCeldaOrigen).Select
ActiveCell.FormulaR1C1 = "=RANDBETWEEN(1,19)"
ActiveCell.AutoFill Destination:=Range(sCeldaOrigen & ":" & sCeldaDestino), Type:=xlFillDefault
'country
'-------
Range("E1").Select
ActiveCell.FormulaR1C1 = "Pais_ID"
sCeldaOrigen = "E2"
sCeldaDestino = "E" & nFilaDestino
Range(sCeldaOrigen).Select
ActiveCell.FormulaR1C1 = "=RANDBETWEEN(4,894)"
ActiveCell.AutoFill Destination:=Range(sCeldaOrigen & ":" & sCeldaDestino), Type:=xlFillDefault

Our next step is to create the data of an alleged registration date of
individuals in this population system, a task that will make in two phases. First,
we create each part of the date in separate columns.
'date elements:
'==============
'year
'---
Range("F1").Select
ActiveCell.FormulaR1C1 = "Anualidad"
sCeldaOrigen = "F2"
sCeldaDestino = "F" & nFilaDestino
Range(sCeldaOrigen).Select
ActiveCell.FormulaR1C1 = "=RANDBETWEEN(2008,2010)"
ActiveCell.AutoFill Destination:=Range(sCeldaOrigen & ":" & sCeldaDestino), Type:=xlFillDefault
'month
'-----
Range("G1").Select
ActiveCell.FormulaR1C1 = "Mes"
sCeldaOrigen = "G2"
sCeldaDestino = "G" & nFilaDestino
Range(sCeldaOrigen).Select
ActiveCell.FormulaR1C1 = "=RANDBETWEEN(1,12)"
ActiveCell.AutoFill Destination:=Range(sCeldaOrigen & ":" & sCeldaDestino), Type:=xlFillDefault
'day
'---
Range("H1").Select
ActiveCell.FormulaR1C1 = "Dia"
sCeldaOrigen = "H2"
sCeldaDestino = "H" & nFilaDestino
Range(sCeldaOrigen).Select
ActiveCell.FormulaR1C1 = "=IF(RC[-1]=2, RANDBETWEEN(1,28), IF(OR(RC[-1]=4, RC[-1]=6, RC[-1]=9, RC[-1]=11), RANDBETWEEN(1,30), RANDBETWEEN(1,31)))"
ActiveCell.AutoFill Destination:=Range(sCeldaOrigen & ":" & sCeldaDestino), Type:=xlFillDefault
As we have seen, the day is the part of the date that requires further
work, as we must be careful to avoid the generation, for example, of a day 31 if
in the month column we have the value 2. We will solve this checking the month
column in first place, and according this we'll use a different range to
generate the day. So that in the formula we'll use several IF expressions
combined with RANDBETWEEN functions.

In the second phase of this operation, we will create a final column in
the spreadsheet with a date in a format understandable to SQL Server, as a
result of the concatenation of the above columns: Anualidad, Mes and Dia.
It is necessary to take into account when composing the date in this
way, we must add a zero to the month and day when these values are only one
digit.
'date composition
'----------------
Range("I1").Select
ActiveCell.FormulaR1C1 = "Fecha_Alta"
sCeldaOrigen = "I2"
sCeldaDestino = "I" & nFilaDestino
Range(sCeldaOrigen).Select
ActiveCell.FormulaR1C1 = "=RC[-3] & IF(LEN(RC[-2])=1,""0"" & RC[-2],RC[-2]) & " & _
"IF(LEN(RC[-1])=1,""0"" & RC[-1],RC[-1]) "
ActiveCell.AutoFill Destination:=Range(sCeldaOrigen & ":" & sCeldaDestino), Type:=xlFillDefault
When we rerun the macro, we'll get the correctly formatted date in the
last column. This time also establish the number of records to generate in
a million, that way we will test the data creation power of our process.

To complete operations in Excel we'll save the file as "Excel
Workbook (*.xlsx)". Dialog box appears again warning that we couldn't save
macros in a .xlsx file, which we'll reply by clicking Yes.
At this point we conclude the first part of the article, in the second
installment will explain how to transfer the data just generated to SQL Server.
Regards,