La cláusula ROLLUP perteneciente a la función SUMMARIZE genera, en las columnas de agregación que especifiquemos, filas adicionales de resultados acumulados (totales parciales) para las columnas numéricas utilizadas en la consulta.
A modo de ejemplo, en la siguiente consulta utilizamos ROLLUP para la columna StoreType de la tabla DimStore, por lo que obtendremos, además de las cifras de venta por tipo de almacén, una última fila adicional con el total de las ventas.
EVALUATE
SUMMARIZE(
'DimStore',
ROLLUP('DimStore'[StoreType]),
"Ventas", SUM('FactSales'[SalesAmount])
)

Partiendo de este resultado, supongamos ahora que necesitamos que los tipos de almacén aparezcan ordenados, pero manteniendo la fila de total al final. Si usamos la cláusula ORDER BY en la consulta, el resultado no será totalmente satisfactorio, ya que se ordenarán los nombres, pero la fila de total quedará al principio.
EVALUATE
SUMMARIZE(
'DimStore',
ROLLUP('DimStore'[StoreType]),
"Ventas", SUM('FactSales'[SalesAmount])
)
ORDER BY [StoreType]

Aplicando el modificador DESC a la cláusula ORDER BY solucionamos el problema solamente en parte, ya que conseguimos que el total vuelva a colocarse al final del conjunto de resultados, pero como es lógico, los nombres quedan ordenados en sentido descendente.
EVALUATE
SUMMARIZE(
'DimStore',
ROLLUP('DimStore'[StoreType]),
"Ventas", SUM('FactSales'[SalesAmount])
)
ORDER BY [StoreType] DESC

Controlando la ubicación de los acumulados mediante ISSUBTOTAL
Si queremos ordenar los nombres en sentido ascendente, pero manteniendo la ubicación del total al final, debemos recurrir a ISSUBTOTAL, otra de las cláusulas de SUMMARIZE, la cual crea una columna de tipo lógico, en la que cada valor nos indica si la fila actual del conjunto de resultados corresponde a un acumulado (total parcial) o bien se trata de una fila normal de datos. El truco en este caso reside en aplicar una doble ordenación: primero por la columna de ISSUBTOTAL y a continuación por la de datos.
EVALUATE
SUMMARIZE(
'DimStore',
ROLLUP('DimStore'[StoreType]),
"Ventas", SUM('FactSales'[SalesAmount]),
"Subtotal_StoreType", ISSUBTOTAL('DimStore'[StoreType])
)
ORDER BY [Subtotal_StoreType],[StoreType]

Ahora vamos a añadir la columna ContinentName, de la tabla DimGeography, como nueva columna de agrupación a la consulta. De esta manera, cada fila de acumulado resultante se corresponderá con el total de ventas de un continente para todos los tipos de almacén.
EVALUATE
SUMMARIZE(
'DimStore',
ROLLUP('DimStore'[StoreType]),
'DimGeography'[ContinentName],
"Ventas", SUM('FactSales'[SalesAmount])
)

La interpretación de los datos que obtenemos, no obstante, es poco amigable, por lo que nuevamente usaremos ISSUBTOTAL y ordenaremos las columnas tal y como muestra el siguiente bloque de código, consiguiendo que las filas de acumulado se sitúen proporcionando la información de un modo más legible.
EVALUATE
SUMMARIZE(
'DimStore',
ROLLUP('DimStore'[StoreType]),
'DimGeography'[ContinentName],
"Ventas", SUM('FactSales'[SalesAmount]),
"Subtotal_StoreType", ISSUBTOTAL('DimStore'[StoreType])
)
ORDER BY [ContinentName], [Subtotal_StoreType], [StoreType]

Ya que ROLLUP admite más de una columna de agregación como parámetro, vamos a añadirle la columna ContinentName para obtener en esta ocasión, además de los acumulados anteriormente mencionados, una nueva fila de total con el importe de las ventas para todos los tipos de almacén y continentes. Al mismo tiempo crearemos con ISSUBTOTAL sendas columnas para StoreType y ContinentName, que junto a un adecuado orden facilitarán la lectura de los datos.
EVALUATE
SUMMARIZE(
'DimStore',
ROLLUP('DimStore'[StoreType], 'DimGeography'[ContinentName]), "Ventas", SUM('FactSales'[SalesAmount]),
"Subtotal_StoreType", ISSUBTOTAL('DimStore'[StoreType]), "Subtotal_ContinentName", ISSUBTOTAL('DimGeography'[ContinentName])
)
ORDER BY [Subtotal_StoreType], [StoreType], [Subtotal_ContinentName], [ContinentName]

Agrupar los acumulados con ROLLUPGROUP
A pesar de existir la posibilidad de utilizar varias columnas con ROLLUP, en determinadas situaciones quizá nos interese tener una única fila de acumulado para todas las columnas de agregación incluidas en esta cláusula, lo que podemos lograr mediante ROLLUPGROUP, otra de las cláusulas de SUMMARIZE.
Esta cláusula se utiliza como parámetro de ROLLUP, colocando las columnas de agregación como parámetros de la llamada a la función ROLLUPGROUP.
EVALUATE
SUMMARIZE(
'DimStore',
ROLLUP(ROLLUPGROUP('DimStore'[StoreType], 'DimGeography'[ContinentName])),
"Ventas", SUM('FactSales'[SalesAmount]),
"Subtotal_StoreType", ISSUBTOTAL('DimStore'[StoreType])
)
ORDER BY [Subtotal_StoreType], [StoreType], [ContinentName]

Cabe destacar que la información sobre ROLLUPGROUP disponible en la documentación online de la función SUMMARIZE es una aportación de Marco Russo, puesto que dicha entrada, correspondiente a la referencia de las funciones del lenguaje DAX, no disponía inicialmente de reseña alguna acerca de esta cláusula.
Para lograr un comportamiento similar en Transact-SQL haremos uso del operador ROLLUP cuando especifiquemos las columnas a agrupar, así como de la función GROUPING en la lista de columnas a mostrar de la instrucción SELECT.
SELECT
CASE
WHEN (GROUPING(StoreType)=1) THEN '--Total General--'
ELSE StoreType
END AS StoreType,
CASE
WHEN (GROUPING(ContinentName)=1) THEN '--Acumulado por tipo almacén--'
ELSE ContinentName
END AS ContinentName,
SUM(SalesAmount) AS Ventas
FROM DimStore
INNER JOIN DimGeography
ON DimStore.GeographyKey = DimGeography.GeographyKey
INNER JOIN FactSales
ON DimStore.StoreKey = FactSales.StoreKey
GROUP BY StoreType, ContinentName WITH ROLLUP
ORDER BY StoreType, GROUPING(ContinentName), ContinentName
Y llegados a este punto damos por concluida esta serie de artículos en los que hemos expuesto diversas técnicas para la creación de consultas contra modelos tabulares empleando el lenguaje DAX, espero que os resulten de utilidad.
Si las tablas de las columnas empleadas en la consulta con SUMMARIZE no están relacionadas se producirá un error, como vemos en el siguiente bloque de código, donde combinamos las columnas StoreType y BrandName de las tablas DimStore y DimProduct, para obtener las ventas realizadas.
EVALUATE
SUMMARIZE(
'DimStore',
'DimStore'[StoreType],
'DimProduct'[BrandName],
"Ventas", SUM('FactSales'[SalesAmount])
)

Tal y como se explica en el mensaje de error, la columna BrandName no existe en la tabla de entrada DimStore, utilizada como primer parámetro de la función SUMMARIZE; aunque en este caso sería más correcto decir que la tabla DimProduct, que contiene la columna BrandName, no está relacionada con la tabla de entrada usada en la consulta.
Si el primer parámetro de SUMMARIZE representa una tabla con las columnas a visualizar (directa o indirectamente a través de relaciones), en el caso de que no existan relaciones con las tablas necesarias podemos crear una tabla al vuelo con las columnas que tenemos que mostrar empleando la función GENERATE.
EVALUATE
SUMMARIZE(
GENERATE(
VALUES('DimStore'[StoreType]),
VALUES('DimProduct'[BrandName])
),
[StoreType],
[BrandName],
"Ventas tipo almacén marca producto", SUM('FactSales'[SalesAmount])
)
ORDER BY [StoreType],[BrandName]

En el caso de que el número de columnas sin relacionar que tengamos que presentar sea superior a dos, podemos optar por la función CROSSJOIN para crear la tabla de entrada utilizada por SUMMARIZE, como vemos en la siguiente consulta, donde creamos una medida calculada para realizar la suma de las ventas, a la que aplicaremos un formato monetario, y que emplearemos en una condición de filtro para evitar mostrar importes vacíos.
DEFINE
MEASURE 'FactSales'[Ventas] = SUM('FactSales'[SalesAmount])
EVALUATE
FILTER(
SUMMARIZE(
CROSSJOIN(
VALUES('DimStore'[StoreType]),
VALUES('DimProduct'[BrandName]),
VALUES('DimDate'[CalendarYear])
),
[StoreType],
[BrandName],
[CalendarYear],
"Ventas tipo almacén-marca producto-año venta", FORMAT('FactSales'[Ventas], "Currency")
),
NOT(ISBLANK([Ventas]))
)
ORDER BY [StoreType],[BrandName],[CalendarYear]

Partiendo del anterior resultado, supongamos ahora que necesitamos calcular el porcentaje que la combinación de ventas por tipo de almacén, marca de producto y año de venta ha supuesto con respecto al total de las ventas realizadas.
Para obtener estos resultados precisamos añadir al código de la consulta dos nuevas medidas calculadas: la primera, a la que llamaremos VentasTotales, realizará mediante la función SUMX la suma de la columna SalesAmount para todas las filas de la tabla FactSales; empleando la función ALL para ignorar cualquier filtro que pudiera encontrarse activo.
La segunda medida, PorcentajeVentas, efectuará la operación encargada de calcular el porcentaje de las ventas realizadas. Dicho valor será convenientemente formateado cuando lo mostremos junto al resto de columnas de la consulta.
DEFINE
MEASURE 'FactSales'[Ventas] = SUM('FactSales'[SalesAmount])
MEASURE 'FactSales'[VentasTotales] = SUMX(ALL('FactSales'), 'FactSales'[SalesAmount])
MEASURE 'FactSales'[PorcentajeVentas] = ('FactSales'[Ventas] / 'FactSales'[VentasTotales])
EVALUATE
FILTER(
SUMMARIZE(
CROSSJOIN(
VALUES('DimStore'[StoreType]),
VALUES('DimProduct'[BrandName]),
VALUES('DimDate'[CalendarYear])
),
[StoreType],
[BrandName],
[CalendarYear],
"Ventas tipo almacén-marca producto-año venta", FORMAT('FactSales'[Ventas], "Currency"),
"Ventas totales", 'FactSales'[VentasTotales],
"Porcentaje ventas", FORMAT('FactSales'[PorcentajeVentas], "Percent")
),
NOT(ISBLANK([Ventas]))
)
ORDER BY [StoreType],[BrandName],[CalendarYear]

Aunque la medida calculada VentasTotales se visualiza junto al resto de las columnas, podría perfectamente permanecer oculta, ya que su finalidad consiste en actuar como operando para hallar el porcentaje de las ventas.
Para obtener este mismo resultado mediante SQL, tendremos que establecer, al igual que en anteriores ejemplos, las combinaciones necesarias entre las tablas implicadas en la consulta.
SELECT StoreType, BrandName, CalendarYear,
SUM(SalesAmount) AS [Ventas tipo almacén-marca producto-año venta],
(SELECT SUM(SalesAmount) FROM FactSales) AS [Ventas totales],
FORMAT( ( SUM(SalesAmount) / (SELECT SUM(SalesAmount) FROM FactSales) ) , 'P', 'ES-ES' ) AS [Porcentaje ventas]
FROM DimStore
INNER JOIN FactSales
ON DimStore.StoreKey = FactSales.StoreKey
INNER JOIN DimProduct
ON FactSales.ProductKey = DimProduct.ProductKey
INNER JOIN DimDate
ON FactSales.DateKey = DimDate.Datekey
GROUP BY StoreType, BrandName, CalendarYear
ORDER BY StoreType, BrandName, CalendarYear
Empleando la tabla de resultados numéricos como tabla de entrada
En los últimos ejemplos de SUMMARIZE hemos planteado la manera de abordar la creación de una consulta introduciendo un problema consistente en la inexistencia de relaciones entre la tabla de entrada y el resto de tablas de las columnas de agregación implicadas en dicha consulta, lo cual nos llevaba a recurrir a soluciones un tanto artificiosas, tales como el empleo de las funciones GENERATE o CROSSJOIN en el código a escribir.
Observando las características de estas consultas nos percataremos de que la tabla FactSales, empleada para calcular la suma de ventas, se encuentra relacionada con las tablas utilizadas para las columnas de agregación. Si a esto unimos la capacidad de utilizar FactSales como tabla de entrada en SUMMARIZE, podremos construir una consulta más simple que evite el uso de trucos rebuscados. Como ventaja añadida, esta consulta no devuelve filas con la columna de ventas vacía, lo que también nos libera de aplicar el correspondiente filtro.
EVALUATE
SUMMARIZE(
'FactSales',
'DimStore'[StoreType],
'DimProduct'[BrandName],
'DimDate'[CalendarYear],
"Ventas", FORMAT(SUM('FactSales'[SalesAmount]), "Currency")
)
ORDER BY [StoreType],[BrandName],[CalendarYear]
Las columnas de agrupación de los ejemplos con SUMMARIZE mostrados en la entrega anterior pertenecían a una misma tabla, lo cual puede resultar conveniente en determinados casos, pero con toda seguridad, en algún momento nos encontraremos ante situaciones en las que tengamos que crear una consulta para visualizar columnas pertenecientes a diversas tablas del modelo.
Gracias a la potencia que ofrece el mecanismo de relaciones del motor tabular, esta labor resulta tan simple como añadir tales columnas en la lista de parámetros de la función, siempre y cuando existan las oportunas relaciones entre las tablas integrantes de la consulta.
Observando el diagrama de nuestro modelo de datos de ejemplo vemos que la tabla DimStore se relaciona con las tablas DimGeography, DimEmployee y FactSales, por lo que podemos construir una consulta con SUMMARIZE en la que ofrezcamos al usuario la suma de los descuentos sobre las ventas, agregada por el tipo de almacén, país y nombre del gerente del almacén que ha realizado la venta.
EVALUATE
SUMMARIZE(
'DimStore',
'DimStore'[StoreType],
'DimGeography'[RegionCountryName],
'DimEmployee'[FirstName],
'DimEmployee'[LastName],
"Total descuentos", SUM('FactSales'[DiscountAmount])
)
ORDER BY [StoreType],[RegionCountryName],[FirstName],[LastName]

En SQL esta consulta es muy similar a la anterior que hemos escrito en Transact, con alguna variación en las tablas a combinar.
SELECT StoreType, RegionCountryName, FirstName, LastName,
SUM(DiscountAmount) AS [Total descuentos]
FROM DimStore
INNER JOIN DimGeography
ON DimStore.GeographyKey = DimGeography.GeographyKey
INNER JOIN DimEmployee
ON DimStore.StoreManager = DimEmployee.EmployeeKey
INNER JOIN FactSales
ON DimStore.StoreKey = FactSales.StoreKey
GROUP BY StoreType, RegionCountryName, FirstName, LastName
ORDER BY StoreType, RegionCountryName, FirstName, LastName
En la tabla DimStore, la columna StoreManager representa al empleado de la organización que realiza las funciones de gerente o director del almacén, existiendo la posibilidad de que un mismo gerente se encuentre al cargo de varios almacenes. Esto nos permite complicar un poco la consulta DAX anterior, de manera que los resultados visualizados se correspondan con los directores que administran más de un almacén.
A continuación mostramos nuevamente esta consulta con los retoques que acabamos de mencionar. Por un lado quitamos la columna StoreType de la tabla DimStore, ya que uno de los objetivos consiste en contar el número de almacenes a cargo de cada director, independientemente del tipo de almacén. Dicho recuento lo realizaremos empleando una medida calculada (cláusula DEFINE) con el nombre NumeroAlmacenes, que reutilizaremos en más de un lugar de la consulta. La función FILTER, por otra parte, nos permitirá establecer la condición para obtener los gerentes a cargo de varios almacenes.
DEFINE
MEASURE 'DimStore'[NumeroAlmacenes] = COUNT('DimStore'[StoreKey])
EVALUATE
FILTER(
SUMMARIZE(
'DimStore',
'DimGeography'[RegionCountryName],
'DimEmployee'[FirstName],
'DimEmployee'[LastName],
"Total descuentos", SUM('FactSales'[DiscountAmount]),
"Número almacenes gestionados", 'DimStore'[NumeroAlmacenes] ),
'DimStore'[NumeroAlmacenes] > 1
)
ORDER BY [RegionCountryName],[FirstName],[LastName]

Utilizando SQL resulta necesario escribir una mayor cantidad de código para conseguir este mismo resultado, como vemos a continuación.
WITH
tblGerentes AS
(
SELECT StoreManager, COUNT(*) AS NumeroAlmacenes
FROM DimStore
GROUP BY StoreManager
HAVING COUNT(*) > 1
),
tblStore AS
(
SELECT StoreKey, GeographyKey, DimStore.StoreManager, tblGerentes.NumeroAlmacenes
FROM DimStore
INNER JOIN tblGerentes
ON DimStore.StoreManager = tblGerentes.StoreManager
)
SELECT RegionCountryName, StoreManager, FirstName, LastName,
SUM(DiscountAmount) AS [Total descuentos],
NumeroAlmacenes AS [Número almacenes gestionados]
FROM tblStore
INNER JOIN DimGeography
ON tblStore.GeographyKey = DimGeography.GeographyKey
INNER JOIN DimEmployee
ON tblStore.StoreManager = DimEmployee.EmployeeKey
INNER JOIN FactSales
ON tblStore.StoreKey = FactSales.StoreKey
GROUP BY RegionCountryName, StoreManager, FirstName, LastName, NumeroAlmacenes
ORDER BY RegionCountryName, StoreManager, FirstName, LastName, NumeroAlmacenes
En la siguiente entrega daremos otra vuelta de tuerca al uso de la función SUMMARIZE, mostrando el modo de resolver consultas en las que se encuentren involucradas tablas no relacionadas.
La función SUMMARIZE muestra los distintos valores de una o varias columnas (denominadas columnas de agrupación) de una tabla, permitiendo además incluir expresiones que generen columnas calculadas, encargadas de devolver valores numéricos en base a las columnas de agrupación. Con ello logramos un comportamiento similar al obtenido mediante la cláusula GROUP BY de Transact-SQL.
Comenzaremos nuestro periplo acerca del uso de esta función con un ejemplo muy básico: mostrar los distintos valores de la columna RegionCountryName, perteneciente a la tabla DimGeography.
EVALUATE
SUMMARIZE(
'DimGeography',
'DimGeography'[RegionCountryName]
)
ORDER BY [RegionCountryName]

Como es lógico, podemos utilizar varias de las columnas correspondientes a la tabla pasada en el primer parámetro, también denominada tabla de entrada.
EVALUATE
SUMMARIZE(
'DimGeography',
'DimGeography'[ContinentName],
'DimGeography'[RegionCountryName]
)
ORDER BY [ContinentName],[RegionCountryName]
Para conseguir un resultado equivalente en SQL podemos emplear GROUP BY, tal y como hemos mencionado anteriormente, o también la cláusula DISTINCT, ya que hasta el momento no estamos usando una operación de agregado como COUNT, SUM, etc., sobre el conjunto de resultados.
SELECT ContinentName, RegionCountryName
FROM DimGeography
GROUP BY ContinentName, RegionCountryName
ORDER BY ContinentName, RegionCountryName
--//////////////////////////////////////////////
SELECT DISTINCT ContinentName, RegionCountryName
FROM DimGeography
ORDER BY ContinentName, RegionCountryName
Supongamos ahora que de este resultado nos interesaría saber la cantidad de unidades vendidas para cada una de las combinaciones de continente y país. Para ello escribiremos una expresión que utilice la función SUM aplicada a la columna SalesQuantity de la tabla FactSales, y que añadiremos a SUMMARIZE como último parámetro.
EVALUATE
SUMMARIZE(
'DimGeography',
'DimGeography'[ContinentName],
'DimGeography'[RegionCountryName],
"Cantidades vendidas", SUM('FactSales'[SalesQuantity])
)
ORDER BY [ContinentName],[RegionCountryName]

En el caso de que necesitemos más columnas calculadas iremos añadiendo las expresiones correspondientes a las mismas como parámetros adicionales al final de la función, tal y como vemos a continuación, donde agregamos dos medidas para obtener el importe de venta y las ventas menos el descuento aplicado.
EVALUATE
SUMMARIZE(
'DimGeography',
'DimGeography'[ContinentName],
'DimGeography'[RegionCountryName],
"Cantidades vendidas", SUM('FactSales'[SalesQuantity]),
"Importe ventas", SUM('FactSales'[SalesAmount]),
"Venta con descuento", SUMX('FactSales', 'FactSales'[SalesAmount] - 'FactSales'[DiscountAmount])
)
ORDER BY [ContinentName],[RegionCountryName]
Al emplear una sentencia SQL para obtener un resultado equivalente ahora sí será necesario el uso de GROUP BY, dado que aplicamos diversas operaciones de agregado (suma) sobre varias columnas, así como la combinación de la tabla DimGeography con DimStore y FactSales, para que las columnas numéricas reflejen los valores correctos para cada continente y país.
SELECT ContinentName, RegionCountryName,
SUM(SalesQuantity) AS [Cantidades vendidas],
SUM(SalesAmount) AS [Importe ventas],
(SUM(SalesAmount) - SUM(DiscountAmount)) AS [Venta con descuento]
FROM DimGeography
INNER JOIN DimStore
ON DimGeography.GeographyKey = DimStore.GeographyKey
INNER JOIN FactSales
ON DimStore.StoreKey = FactSales.StoreKey
GROUP BY ContinentName, RegionCountryName
ORDER BY ContinentName, RegionCountryName
En la siguiente parte de esta serie abordaremos la forma de mostrar columnas de agrupación procedentes de múltiples tablas empleando SUMMARIZE.
La función TOPN devuelve un subconjunto de una cantidad determinada de las primeras filas pertenecientes a una tabla, en base a una expresión utilizada para ordenar dicha tabla.
Como primer parámetro pasaremos el número de filas a recuperar; en segundo lugar indicaremos la tabla de la que se obtendrán las filas; el tercer parámetro corresponderá a la expresión que realizará la ordenación de las filas; y opcionalmente, en el cuarto parámetro, pasaremos un número (0 ó 1) indicativo del sentido (ascendente o descendente) en que se realizará la ordenación especificada en el tercer parámetro.
Supongamos que de la tabla DimProduct queremos averiguar las siete marcas de productos (columna BrandName) que han generado más ventas (tabla FactSales, columna SalesAmount). Comenzaremos nuestro análisis del problema desde el motor relacional, ejecutando la siguiente consulta SQL, que nos devuelve todas las marcas de producto con sus totales de venta correspondientes.
WITH tblMarcas AS
(
SELECT DP.BrandName,
SUM(FS.SalesAmount) AS VentasMarca
FROM FactSales AS FS, DimProduct AS DP
WHERE FS.ProductKey = DP.ProductKey
GROUP BY DP.BrandName
)
SELECT BrandName, FORMAT(VentasMarca, 'C', 'es-es') AS Ventas
FROM tblMarcas
ORDER BY VentasMarca DESC

Para obtener aquí las siete marcas con más ventas, simplemente añadiremos la partícula TOP junto a la cantidad de filas que necesitamos obtener.
WITH tblMarcas AS
(
SELECT DP.BrandName,
SUM(FS.SalesAmount) AS VentasMarca
FROM FactSales AS FS, DimProduct AS DP
WHERE FS.ProductKey = DP.ProductKey
GROUP BY DP.BrandName
)
SELECT TOP 7 BrandName, FORMAT(VentasMarca, 'C', 'es-es') AS Ventas
FROM tblMarcas
ORDER BY VentasMarca DESC
Vamos a realizar un primer intento para conseguir este mismo resultado en DAX, utilizando la función TOPN en la siguiente consulta.
EVALUATE
TOPN(
7,
VALUES('DimProduct'[BrandName]),
SUM('FactSales'[SalesAmount]),
0
)
Los resultados, no obstante, no serán nada satisfactorios porque, por un lado, obtenemos todos los nombres de marcas de producto, y por otro, no aparecen los importes de ventas por marca. Observando detenidamente la documentación de la función veremos que esto es completamente lógico, ya que TOPN devuelve las filas de la tabla pasada en el segundo parámetro según el resultado de aplicar la expresión del tercero.
Como primer paso para acercarnos a la solución añadiremos a la consulta una medida calculada, que será la encargada de sumar la columna SalesAmount; esta medida la emplearemos a continuación como tercer parámetro de TOPN.
DEFINE
MEASURE DimProduct[VentasPorMarca] = SUM(FactSales[SalesAmount])
EVALUATE
TOPN(
7,
VALUES('DimProduct'[BrandName]),
DimProduct[VentasPorMarca],
0
)

Con esto casi hemos cumplido con nuestro objetivo, tan sólo nos falta visualizar la columna con los importes de ventas, para lo cual recurriremos a la función ADDCOLUMNS, a la que pasaremos como parámetro la función TOPN y la expresión con la columna calculada que realiza la suma de las ventas.
DEFINE
MEASURE 'DimProduct'[VentasPorMarca] = SUM('FactSales' [SalesAmount])
EVALUATE
ADDCOLUMNS(
TOPN(
7,
VALUES('DimProduct'[BrandName]),
'DimProduct'[VentasPorMarca],
0
),
"Ventas por nombre marca", FORMAT('DimProduct'[VentasPorMarca], "Currency")
)
ORDER BY 'DimProduct'[VentasPorMarca] DESC

Aunque podamos estar acostumbrados al uso de la partícula TOP en Transact-SQL, en el contexto de las consultas y expresiones con DAX es probable que el empleo de TOPN no resulte tan intuitivo como a priori cabría esperar. En este enlace se proporciona una explicación más detallada sobre dicha función, y en este post se muestra un interesante caso de uso.
A raíz de las consultas desarrolladas en la entrega anterior, supongamos que ahora debemos visualizar, además de la marca del producto y el año de venta, el nombre del país y tipo de almacén en el que ésta se ha producido; datos que encontramos en las tablas DimGeography y DimStore respectivamente.
CROSSJOIN. Aumentando la cantidad de tablas a combinar
Puesto que la función GENERATE admite solamente dos tablas en su lista de parámetros debemos modificar nuestra consulta, de forma que soporte la inclusión de nuevas columnas, por lo que vamos a recurrir a la función CROSSJOIN, que al igual que GENERATE, realiza un producto cartesiano de las tablas que recibe como parámetro, pero con la diferencia de poder utilizar más de dos tablas, como vemos en la siguiente consulta.
EVALUATE
CROSSJOIN(
VALUES('DimGeography'[ContinentName]),
VALUES('DimStore'[StoreType]),
VALUES('DimProduct'[BrandName]),
VALUES('DimDate'[CalendarYear])
)
ORDER BY [ContinentName], [StoreType], [BrandName], [CalendarYear]

Para obtener el mismo resultado en Transact-SQL podemos utilizar la cláusula CROSS JOIN, pero con el inconveniente de que el tiempo de ejecución de la consulta será notablemente mayor que en DAX.
SELECT DISTINCT g.ContinentName,s.StoreType,p.BrandName,d.CalendarYear
FROM DimGeography AS g
CROSS JOIN DimStore AS s
CROSS JOIN DimProduct AS p
CROSS JOIN DimDate AS d
ORDER BY g.ContinentName,s.StoreType,p.BrandName,d.CalendarYear
Para añadir a la consulta DAX la columna con las cifras de ventas emplearemos ADDCOLUMNS como uno más de los parámetros de CROSSJOIN.
EVALUATE
CROSSJOIN(
VALUES('DimGeography'[ContinentName]),
VALUES('DimStore'[StoreType]),
VALUES('DimProduct'[BrandName]),
ADDCOLUMNS(
VALUES('DimDate'[CalendarYear]),
"Ventas",FORMAT(SUMX(RELATEDTABLE(FactSales), 'FactSales'[SalesAmount]), "Currency")
)
)
ORDER BY [ContinentName], [StoreType], [BrandName], [CalendarYear]

Tras ejecutar la consulta, con toda seguridad habremos advertido que el cálculo resultante en la columna de ventas es incorrecto, ya que para cada fila únicamente se está teniendo en cuenta la columna 'DimDate'[CalendarYear] al realizar la suma de 'FactSales'[SalesAmount]. Este resultado para cada grupo de años es el mismo que si utilizáramos la siguiente consulta SQL.
SELECT YEAR(DateKey) AS AñoVenta,
FORMAT(SUM(SalesAmount), 'C', 'es-es') AS ImporteVentas
FROM FactSales
GROUP BY YEAR(DateKey)
ORDER BY YEAR(DateKey)

Sin embargo, nuestro objetivo consiste en efectuar la suma de 'FactSales'[SalesAmount] para cada combinación de 'DimGeography'[ContinentName], 'DimStore'[StoreType], 'DimProduct'[BrandName] y 'DimDate'[CalendarYear]. Supongamos que tuviéramos que construir esta consulta para ejecutarla en el motor relacional. La sentencia SQL a emplear, utilizando CTEs, sería parecida a la siguiente.
WITH
tblGeography AS
(
SELECT DISTINCT GeographyKey, ContinentName
FROM DimGeography
),
tblStore AS
(
SELECT StoreKey, StoreType, tblGeography.ContinentName
FROM DimStore
INNER JOIN tblGeography
ON DimStore.GeographyKey = tblGeography.GeographyKey
),
tblProduct AS
(
SELECT ProductKey, BrandName
FROM DimProduct
),
tblDate AS
(
SELECT Datekey, CalendarYear
FROM DimDate
)
SELECT ContinentName, StoreType, BrandName, CalendarYear, FORMAT(SUM(SalesAmount), 'C', 'es-es') AS Ventas
FROM FactSales
INNER JOIN tblStore
ON FactSales.StoreKey = tblStore.StoreKey
INNER JOIN tblProduct
ON FactSales.ProductKey = tblProduct.ProductKey
INNER JOIN tblDate
ON FactSales.DateKey = tblDate.Datekey
GROUP BY ContinentName, StoreType, BrandName, CalendarYear
ORDER BY ContinentName, StoreType, BrandName, CalendarYear

Para obtener un resultado equivalente en DAX debemos cambiar la posición de las funciones CROSSJOIN y ADDCOLUMNS dentro de la consulta.
Como primer parámetro de ADDCOLUMNS pasaremos la llamada a CROSSJOIN, y como segundo la expresión que crea la columna calculada que realiza la suma de 'FactSales'[SalesAmount]. Adicionalmente situaremos todo este código dentro de la función FILTER, añadiendo una condición para quitar las filas que carezcan de importe de venta.
EVALUATE
FILTER(
ADDCOLUMNS(
CROSSJOIN(
VALUES('DimGeography'[ContinentName]),
VALUES('DimStore'[StoreType]),
VALUES('DimProduct'[BrandName]),
VALUES('DimDate'[CalendarYear])
),
"Ventas",FORMAT(SUMX(RELATEDTABLE(FactSales), 'FactSales'[SalesAmount]), "Currency")
),
LEN([Ventas]) > 0
)
ORDER BY [ContinentName], [StoreType], [BrandName], [CalendarYear]

Como alternativa a lo anterior podemos volver a usar GENERATE, pasando en el primer parámetro la llamada a CROSSJOIN y en el segundo la función ADDCOLUMNS.
EVALUATE
FILTER(
GENERATE(
CROSSJOIN(
VALUES('DimGeography'[ContinentName]),
VALUES('DimStore'[StoreType]),
VALUES('DimProduct'[BrandName])
),
ADDCOLUMNS(
VALUES('DimDate'[CalendarYear]),
"Ventas",FORMAT(SUMX(RELATEDTABLE(FactSales), 'FactSales'[SalesAmount]), "Currency")
)
),
LEN([Ventas]) > 0
)
ORDER BY [ContinentName], [StoreType], [BrandName], [CalendarYear]
En la segunda parte de esta serie ilustrábamos cómo mediante el trabajo en conjunto de VALUES más ADDCOLUMNS, lográbamos crear consultas cuyos resultados ofrecieran al mismo tiempo la información de una tabla más una operación de agregado sobre otra tabla relacionada.
Pero habrá ocasiones en que esto no sea suficiente, siendo necesario añadir a la consulta datos de otras tablas del modelo, para ampliar y mejorar la información proporcionada al usuario.
Por ejemplo, tomando como punto de partida la consulta con la función VALUES de la entrega anterior, supongamos que además de la columna BrandName, necesitamos añadir al resultado la columna CalendarYear de la tabla DimDate y un cálculo basado en la suma de las ventas realizadas por la combinación de las dos columnas.
Si con ADDCOLUMNS intentáramos ejecutar una consulta como la siguiente se produciría un error, ya que de los parámetros admitidos por esta función, sólo el primero puede ser una tabla o expresión de tabla, siendo el resto expresiones que generan columnas calculadas.
EVALUATE
ADDCOLUMNS(
VALUES('DimProduct'[BrandName]),
VALUES('DimDate'[CalendarYear]),
"Ventas por marca y año", CALCULATE(SUM('FactSales'[SalesAmount]))
)
GENERATE. Enriqueciendo el resultado mediante la combinación de tablas
Para resolver este problema recurriremos a la función GENERATE, que recibe como parámetro dos tablas o expresiones de tabla, generando como resultado el producto cartesiano de ambas. A continuación vemos una consulta de ejemplo que emplea esta función para combinar las tablas DimStore y DimGeography.
EVALUATE
GENERATE('DimStore','DimGeography')
Observando el resultado, apreciaremos ocurrencias de filas en la tabla DimStore que tienen una correspondencia geográfica incorrecta al combinarse con la tabla DimGeography, debido precisamente al modo en el que se realiza el cruce entre los registros de ambas tablas. En breve explicaremos cómo arreglar este aspecto en particular.
La forma de aplicar GENERATE para resolver nuestro problema consiste en pasarle como primer parámetro la función VALUES, creando así la columna con las marcas de productos; y como segundo parámetro la función ADDCOLUMNS, para crear tanto la columna de años como la columna calculada con la suma de ventas. Todo ello lo vemos en la siguiente consulta.
EVALUATE
GENERATE(
VALUES('DimProduct'[BrandName]),
ADDCOLUMNS(
VALUES('DimDate'[CalendarYear]),
"Ventas por marca y año", SUMX(RELATEDTABLE('FactSales'), 'FactSales'[SalesAmount])
)
)
ORDER BY [BrandName], [CalendarYear]
En el conjunto de resultados obtenido, como ya hemos mencionado, habrá filas con el valor de la columna "Ventas por marca y año" vacías, que corresponderán a registros de la tabla FactSales para los que no se han producido ventas en la fecha con la que se están cruzando.

Para evitar la visualización de las filas que no tienen valor en la columna de importe de venta agregaremos a la consulta la función FILTER, en la que situaremos como primer parámetro la expresión anterior, mientras que en el segundo parámetro escribiremos una expresión que actúe como filtro, eliminando las filas sin valor de venta.
EVALUATE
FILTER(
GENERATE(
VALUES('DimProduct'[BrandName]),
ADDCOLUMNS(
VALUES('DimDate'[CalendarYear]),
"Ventas por marca y año", SUMX(RELATEDTABLE('FactSales'), 'FactSales'[SalesAmount])
)
),
NOT(ISBLANK([Ventas por marca y año]))
)
ORDER BY [BrandName], [CalendarYear]
Otra variante de la expresión de filtro podría ser la siguiente:
[Ventas por marca y año] > 0
Con cualquiera de estas formas de aplicar el filtro, el conjunto de resultados ya no mostrará importes vacíos.

Con Transact-SQL podemos obtener este mismo resultado utilizando una sentencia muy similar a la última empleada, pero añadiendo la combinación con la tabla DimDate y su campo DateKey, así como el campo CalendarYear en la cláusula GROUP BY.
WITH tblFactSales AS
(
SELECT DateKey, ProductKey, SalesAmount
FROM FactSales
)
SELECT BrandName, CalendarYear, SUM(tblFactSales.SalesAmount) AS [Ventas por marca y año]
FROM DimProduct
INNER JOIN tblFactSales
ON DimProduct.ProductKey = tblFactSales.ProductKey
INNER JOIN DimDate
ON DimDate.Datekey = tblFactSales.DateKey
GROUP BY BrandName, CalendarYear
ORDER BY BrandName, CalendarYear
Corrección de efectos no deseados en la ordenación de números con formato
Si en la anterior consulta DAX aplicamos un formato (moneda, separador de millar, etc.) sobre la columna "Ventas por marca y año", creada dinámicamente, y ordenamos por dicha columna de la siguiente manera.
EVALUATE
FILTER(
GENERATE(
VALUES('DimProduct'[BrandName]),
ADDCOLUMNS(
VALUES('DimDate'[CalendarYear]),
"Ventas por marca y año", FORMAT(SUMX(RELATEDTABLE('FactSales'), 'FactSales'[SalesAmount]), "Currency")
)
),
LEN([Ventas por marca y año]) > 0
)
ORDER BY [Ventas por marca y año]
El resultado no se mostrará ordenado tal y como esperamos.

Esto es debido a que la columna "Ventas por marca y año", por causa del formateo aplicado, contiene datos de tipo carácter en lugar de numéricos, lo que produce el aparentemente extraño comportamiento en la ordenación, que en realidad no es tal, ya que se trata del modo correcto de ordenar una columna de tipo string (obsérvese también que para la condición de filtro hemos utilizado la función LEN, que se emplea en el manejo de cadenas).
Vamos a solucionar este inconveniente recurriendo de nuevo a la creación de medidas mediante la instrucción DEFINE, con la que crearemos la medida que realiza la suma de las ventas, la cual emplearemos en los siguientes puntos de la consulta: dentro de ADDCOLUMNS, como expresión de filtro, y como criterio de ordenación.
DEFINE
MEASURE 'FactSales'[Ventas] = SUMX(RELATEDTABLE('FactSales'), 'FactSales'[SalesAmount])
EVALUATE
FILTER(
GENERATE(
VALUES('DimProduct'[BrandName]),
ADDCOLUMNS(
VALUES('DimDate'[CalendarYear]),
"Ventas por marca y año", FORMAT('FactSales'[Ventas], "Currency")
)
),
[Ventas] > 0
)
ORDER BY [Ventas] DESC

La obtención de resultados procedentes de dos tablas resulta una estupenda posibilidad de la que disponemos al escribir consultas DAX, aunque en escenarios en los que debamos manejar información de un mayor número de tablas resultará insuficiente. En la siguiente parte de esta serie abordaremos dicha cuestión.
En la primera parte de esta serie comenzamos abordando la creación de columnas calculadas, que uníamos al resto de columnas de la tabla mediante la función ADDCOLUMNS. En esta segunda entrega veremos cómo construir consultas basadas en los valores de una única columna.
VALUES. Valores distintos de una columna
Aunque la función ADDCOLUMNS es adecuada para dotar de valores adicionales a una tabla, puede suceder que en determinados momentos estemos más interesados en obtener el contenido de una sola de sus columnas, y más concretamente, de los distintos valores que la componen, bien para visualizarlos o como parte de una expresión más compleja.
Para tal cometido disponemos de la función VALUES, que recibe como parámetro una columna de la que extrae sus distintos valores, retornándolos en forma de tabla de una sola columna, como vemos en el siguiente ejemplo, donde se muestran los nombres de las marcas de productos de la tabla DimProduct.
EVALUATE
VALUES('DimProduct'[BrandName])
ORDER BY [BrandName]

En una consulta SQL disponemos de la cláusula DISTINCT, con la que obtendremos los distintos valores de uno de los campos de la tabla.
SELECT DISTINCT BrandName
FROM DimProduct
ORDER BY BrandName
Puesto que el tipo de dato devuelto por VALUES es una tabla, y ADDCOLUMNS recibe un valor de este tipo en su primer parámetro, podemos combinar ambas funciones para crear una consulta en la que mostremos las marcas de productos de la tabla DimProduct, y una columna calculada que sume las ventas de la tabla FactSales, obteniendo como resultado las cifras de ventas por cada marca. Recordemos que para que la operación de suma se lleve a cabo correctamente emplearemos las funciones CALCULATE y SUM, o la combinación de SUMX y RELATEDTABLE.
EVALUATE
ADDCOLUMNS(
VALUES('DimProduct'[BrandName]),
"Ventas por marca", CALCULATE(SUM('FactSales'[SalesAmount]))
)
ORDER BY [BrandName]

La función DISTINCT también devuelve, al igual que VALUES, los valores únicos de una columna, por lo que en la mayoría de las situaciones los resultados serán iguales utilizando indistintamente una u otra. No obstante, en algunos casos, VALUES devolverá un valor vacío o elemento desconocido, cuando al combinar dos tablas no exista correspondencia entre alguno de sus miembros.
EVALUATE
ADDCOLUMNS(
DISTINCT('DimProduct'[BrandName]),
"Ventas por marca", CALCULATE(SUM('FactSales'[SalesAmount]))
)
ORDER BY [BrandName]
Desde Transact-SQL necesitamos en este caso emplear la cláusula GROUP BY, puesto que junto al campo BrandName vamos a usar la función de agregado SUM. Para la combinación entre las tablas DimProduct y FactSales definiremos sobre esta última una expresión común de tabla o CTE (Common Table Expression).
WITH tblFactSales AS
(
SELECT ProductKey, SalesAmount
FROM FactSales
)
SELECT BrandName, SUM(tblFactSales.SalesAmount) AS [Ventas por marca]
FROM DimProduct
INNER JOIN tblFactSales
ON DimProduct.ProductKey = tblFactSales.ProductKey
GROUP BY BrandName
ORDER BY BrandName
DEFINE y MEASURE. Creación y reutilización de medidas calculadas
Si en la anterior consulta con VALUES quisiéramos además ordenar el resultado por la columna "Ventas por marca", tendríamos que repetir la expresión que contiene la operación de suma en la cláusula de ordenación.
EVALUATE
ADDCOLUMNS(
VALUES('DimProduct'[BrandName]),
"Ventas por marca", CALCULATE(SUM('FactSales'[SalesAmount]))
)
ORDER BY CALCULATE(SUM('FactSales'[SalesAmount]))
Esta situación puede resultar bastante incómoda en el caso de consultas extensas, provocando además, que nuestro código se vuelva más complicado de leer y mantener.
Una forma de evitar este tipo de reiteraciones pasa por utilizar las instrucciones DEFINE y MEASURE al comienzo de la consulta, las cuales, tal y como sus nombres indican, nos permiten definir una medida calculada que podemos reutilizar tantas veces como queramos a lo largo de la consulta. Como ventaja adicional, en el ejemplo que nos ocupa, al crear la medida con DEFINE ya no será necesario emplear la función CALCULATE.
DEFINE
MEASURE FactSales[Ventas] = SUM('FactSales'[SalesAmount])
EVALUATE
ADDCOLUMNS(
VALUES('DimProduct'[BrandName]),
"Ventas por marca", FORMAT(FactSales[Ventas], "Currency")
)
ORDER BY FactSales[Ventas] DESC
Para mejorar la visualización de los datos al crear esta nueva versión de la consulta incluimos la función FORMAT, aplicando un formato de moneda a la medida/columna calculada.

El siguiente paso consistirá en construir una consulta compuesta por columnas provenientes de varias tablas, cuestión esta que abordaremos en las próximas entregas.
Como ya explicábamos en el artículo sobre creación de consultas para modelos tabulares (primera y segunda parte), el lenguaje DAX puede ser utilizado para emitir consultas contra un modelo tabular, obteniendo como respuesta un conjunto de resultados, dentro de una mecánica similar a la que estamos acostumbrados a emplear cuando trabajamos con el motor relacional, es decir, mediante sentencias SQL del estilo "SELECT ListaDeCampos FROM Tabla". Esta funcionalidad del lenguaje recibe el nombre de Consultas DAX (DAX Queries), siendo la instrucción EVALUATE la encargada de llevarla a cabo empleando sentencias del tipo "EVALUATE Expresión".
Aunque inicialmente, las consultas DAX no estaban disponibles en el lenguaje cuando PowerPivot hizo su primera presentación, la llegada de SQL Server 2012, que posiciona a Business Intelligence Semantic Model (BISM) como el nuevo paradigma en el desarrollo de soluciones de inteligencia de negocio, ha supuesto una ampliación de las características del lenguaje, que ahora nos va a permitir escribir consultas contra los nuevos modelos tabulares pertenecientes a BISM.
Con el presente artículo iniciamos una serie en la que continuaremos la línea de exploración iniciada en el artículo antes mencionado, por lo que son recomendables, al menos, unos conocimientos básicos sobre DAX, de forma que los ejemplos que vamos a exponer puedan ser seguidos adecuadamente. Adicionalmente, en aquellos casos en que se estime oportuno, mostraremos el código de la consulta tanto en lenguaje DAX como en Transact-SQL, lo que nos permitirá observar las diferencias entre ambos a la hora de abordar la resolución de un mismo problema.
El proyecto de ejemplo
Como base de los ejemplos que iremos desarrollando vamos a utilizar el modelo ContosoTabular, anteriormente creado aquí; aunque en esta ocasión volveremos a abrir el proyecto de dicho modelo en SQL Server Data Tools (SSDT) para añadir la tabla DimEmployee (necesaria en alguna de las diversas pruebas a realizar), tras lo cual desplegaremos nuevamente el modelo en el servidor de análisis de SQL Server, actualizando así los cambios realizados. A continuación ejecutaremos SQL Server Management Studio (SSMS) abriendo una ventana de consulta, que a partir de este momento utilizaremos para ilustrar las funciones empleadas en la construcción de consultas DAX.

ADDCOLUMNS. Creación dinámica de columnas
La función ADDCOLUMNS permite crear una o varias columnas calculadas a partir de una tabla o expresión de tabla. Como primer parámetro situaremos la tabla a la que se añadirá la columna(s), definiendo a continuación cada una de las columnas mediante una cadena para el título y una expresión DAX con la fórmula para el cálculo de la columna.
En la siguiente sentencia mostramos la tabla DimStore junto a una columna calculada que duplica el número de empleados de cada almacén.
EVALUATE
ADDCOLUMNS(
'DimStore',
"Empleados Duplicados", [EmployeeCount] * 2
)

Para obtener el mismo resultado utilizando la instrucción SELECT en Transact-SQL escribiremos la siguiente consulta.
SELECT *, EmployeeCount * 2 AS [Empleados Duplicados]
FROM DimStore
Continuando con la tabla DimStore, supongamos que de la columna OpenDate, que contiene la fecha de apertura del almacén, queremos obtener la parte correspondiente al mes como nombre. Una posibilidad, aunque no la más óptima, como veremos enseguida, sería utilizar la siguiente consulta, en la que combinando ADDCOLUMNS con las funciones MONTH y SWITCH, obtenemos una nueva columna con el nombre del mes correspondiente a OpenDate.
EVALUATE
ADDCOLUMNS(
'DimStore',
"Mes Apertura", SWITCH(MONTH([OpenDate]),
1,"Enero",
2,"Febrero",
3,"Marzo",
4,"Abril",
5,"Mayo",
6,"Junio",
7,"Julio",
8,"Agosto",
9,"Septiembre",
10,"Octubre",
11,"Noviembre",
12,"Diciembre"
)
)

La función MONTH devuelve el número del mes de la fecha pasada como parámetro. Si este resultado lo pasamos a su vez como parámetro a la función SWITCH, para cada número retornamos una cadena con el nombre del mes correspondiente.
Pero como ya decíamos antes, existe una forma mucho más simple de obtener el mismo resultado, consistente en utilizar la función FORMAT, a la que pasaremos como primer parámetro la columna de fecha, y como segundo la cadena con el formato que deseamos aplicar. La siguiente expresión consigue el mismo resultado que la anterior pero con mucho menos código.
EVALUATE
ADDCOLUMNS(
'DimStore',
"Mes Apertura", FORMAT([OpenDate], "MMMM")
)
En Transact-SQL también disponemos de la función FORMAT que se utiliza como vemos a continuación.
SET LANGUAGE Spanish
SELECT *, FORMAT(OpenDate, 'MMMM') AS [Mes Apertura]
FROM DimStore
Compliquemos ahora un poco la expresión utilizada para crear la columna calculada: supongamos que necesitamos averiguar la cantidad de años transcurridos desde la fecha de apertura del almacén (columna OpenDate) hasta la fecha actual, teniendo en cuenta que si existe valor en la columna CloseDate significará que el almacén ya ha sido cerrado.
La expresión de columna para resolver esta situación implicará, pues, el uso de las siguientes funciones: YEAR, para extraer el año de las columnas de fecha; NOW, para obtener la fecha actual; ISBLANK, para comprobar la existencia de valor en la columna CloseDate; e IF, para determinar la operación que realiza el cálculo de años según la columna CloseDate tenga o no valor.
EVALUATE
ADDCOLUMNS(
'DimStore',
"Años actividad almacén", IF(ISBLANK([CloseDate]),
YEAR(NOW()) - YEAR([OpenDate]),
YEAR([CloseDate]) - YEAR([OpenDate])
)
)

En SQL usaremos la función DATEDIFF para hallar la diferencia entre las fechas. La expresión CASE nos servirá para utilizar en el cálculo la fecha actual o la de cierre del almacén si la primera es nula.
SELECT *,
CASE
WHEN CloseDate IS NULL THEN
DATEDIFF(year, OpenDate, GETDATE())
ELSE
DATEDIFF(year, OpenDate, CloseDate)
END AS [Años actividad almacén]
FROM DimStore
Las columnas calculadas creadas hasta el momento con la función ADDCOLUMNS se basan en operaciones sobre las columnas de la propia tabla DimStore, pasada como primer parámetro a la función. No obstante, también es posible generar columnas dinámicas que pertenezcan a otras tablas del modelo tabular con las que DimStore guarda relación.
Observando el diagrama de tablas del modelo tabular en SSDT, vemos que DimStore mantiene relaciones con las tablas FactSales, DimGeography y DimEmployee.

Es por lo tanto perfectamente posible crear una columna calculada con ADDCOLUMNS que muestre la información de la columna RegionCountryName, perteneciente a la tabla DimGeography, ya que se relaciona con la tabla DimStore a través de la columna GeographyKey, común en ambas tablas. Es necesario tener en cuenta que en el momento de crear la expresión de la columna calculada debemos indicar explícitamente la existencia de dicha relación utilizando la función RELATED. Esta nueva columna tendrá como título "País ubicación".
EVALUATE
ADDCOLUMNS(
'DimStore',
"País ubicación", RELATED('DimGeography'[RegionCountryName])
)
Además, tal y como explicábamos al principio de este apartado, con ADDCOLUMNS podemos crear varias columnas calculadas al mismo tiempo, por lo que añadiremos a la consulta anterior otra expresión con el título "Ventas almacén", que realice, empleando la función SUM, la suma de la columna SalesAmount, perteneciente a la tabla FactSales, con la que también se relaciona DimStore a través del campo StoreKey.
EVALUATE
ADDCOLUMNS(
'DimStore',
"País ubicación", RELATED('DimGeography'[RegionCountryName]),
"Ventas almacén", SUM('FactSales'[SalesAmount])
)

Pero observando el resultado obtenido vemos que hay algo que no funciona correctamente, ya que la finalidad de la anterior consulta consiste en obtener la suma del campo SalesAmount para cada uno de los almacenes, y lo que hemos conseguido es que el valor de la columna calculada "Ventas almacén" muestre en todos los casos la suma total de SalesAmount, lo cual no es el objetivo que perseguimos.
Podemos solucionar este problema de dos formas: la primera consiste en sustituir la función SUM por SUMX, que sobre una tabla pasada como primer parámetro realiza la suma de la columna pasada en el segundo parámetro. Aquí también debemos indicar que existe una relación entre DimStore y FactSales, para lo que usaremos la función RELATEDTABLE.
EVALUATE
ADDCOLUMNS(
'DimStore',
"País ubicación", RELATED('DimGeography'[RegionCountryName]),
"Ventas almacén", SUMX(RELATEDTABLE('FactSales'), 'FactSales'[SalesAmount])
)
De esta forma la suma ya se realizará de forma separada para cada almacén.

La otra vía de abordar la suma de las ventas por almacén consiste en utilizar la función CALCULATE, a la que pasaremos como parámetro la operación de suma. CALCULATE se encargará de evaluar dicha expresión para cada uno de los almacenes (valores distintos del campo StoreKey) existentes en la tabla DimStore, obteniendo de esta manera el resultado deseado.
EVALUATE
ADDCOLUMNS(
'DimStore',
"País ubicación", RELATED('DimGeography'[RegionCountryName]),
"Ventas almacén", CALCULATE(SUM('FactSales'[SalesAmount]))
)
La función CALCULATE puede revestir cierta complejidad inicial, por lo que recomendamos visitar el siguiente post del blog de Marco Russo, en el que realiza una detallada explicación acerca de su funcionamiento.
Para obtener este resultado con Transact-SQL usaremos una combinación de tipo LEFT JOIN entre las tablas DimStore y DimGeography, además de una subconsulta sobre la tabla FactSales, que realice la suma del campo SalesAmount con la función SUM.
SELECT S.*, G.RegionCountryName,
(SELECT SUM(F.SalesAmount) FROM FactSales AS F WHERE F.StoreKey = S.StoreKey) AS [Ventas almacén]
FROM DimStore AS S
LEFT JOIN DimGeography AS G
ON S.GeographyKey = G.GeographyKey
Como hemos podido comprobar, la función ADDCOLUMNS resulta de gran utilidad a la hora de añadir a nuestras consultas información adicional en forma de columnas calculadas. En las siguientes entregas explicaremos diversas técnicas y funciones del lenguaje, mediante las que conseguiremos seleccionar un subconjunto de columnas de la tabla, evitando la necesidad de obtener la totalidad de las mismas.
Una de las primeras cosas que busqué en el modo de escritorio tradicional al instalar el nuevo Windows 8 fue el botón "Mostrar escritorio". Se trata de una opción que particularmente considero muy útil y cómoda para limpiar el escritorio cuando está plagado de ventanas. Sin embargo, en un principio sólo encontré esta posibilidad haciendo clic derecho en la barra de tareas.

Tras publicar este post, Antón Molleda (al que agradezco su interés y ayuda) envió un acertado comentario indicando que en el extremo derecho de la barra de tareas, si bien no tiene la forma de un botón, podemos seguir haciendo clic, provocando la desaparición de las ventanas actualmente visibles en el escritorio (efecto que también conseguiremos con la combinación de teclas Windows+D).
Si además de todo esto queremos contar con la posibilidad de disponer de un acceso directo anclado en la barra de tareas para lograr el mismo resultado, podemos seguir lo sencillos pasos que se comentan a continuación.
En primer lugar haremos clic derecho en una zona vacía del escritorio, seleccionando la opción "Nuevo | Acceso directo". En la ventana de creación del acceso directo que aparecerá seguidamente, escribiremos la siguiente instrucción en el campo de ubicación del elemento:
C:\windows\explorer.exe shell:::{3080F90D-D7AD-11D9-BD98-0000947B0257}

En el siguiente paso daremos un nombre al acceso directo y haremos clic en Finalizar para completar su creación.

El icono del acceso directo recién creado es igual al del Explorador de archivos, por lo que puede resultar una buena idea cambiarlo. Haciendo clic derecho sobre el icono seleccionaremos la opción Propiedades, abriéndose una ventana del mismo nombre, en la que haremos clic en el botón "Cambiar icono..."; lo que a su vez abrirá el cuadro de diálogo para realizar esta operación.
Podemos elegir entre los iconos que acompañan al archivo explorer.exe, ofrecido por defecto, o bien usar alguno de los siguientes archivos, que también adjuntan colecciones de iconos.
%SystemRoot%\system32\Shell32.dll
%SystemRoot%\system32\imageres.dll

Aceptando ambas ventanas confirmaremos el cambio de icono para el acceso directo, por lo que ya podemos hacer doble clic sobre el mismo para ocultar todas las ventanas del escritorio.

Como detalle final podemos hacer clic derecho en el acceso directo y seleccionar la opción "Anclar a la barra de tareas", para tener esta opción disponible de forma mucho más inmediata.

Espero que os resulte de utilidad.
En la primera parte de este artículo explicábamos la forma de crear un modelo tabular en SQL Server 2012 para su posterior consulta utilizando algunas de las técnicas que ilustraremos en esta segunda entrega.
Consultas DAX
Todos aquellos lectores que hayan tenido la oportunidad de trabajar con PowerPivot conocerán en mayor o menor medida el lenguaje DAX y la capacidad que otorga al usuario de construir, en un modelo de datos, expresiones en forma de columnas y medidas calculadas para obtener los resultados analíticos requeridos.
En lo que respecta al desarrollo de modelos tabulares, a las funcionalidades que acabamos de mencionar se suma ahora la posibilidad de escribir consultas con las que obtener conjuntos de resultados a partir de las tablas del modelo.
Para ello haremos uso de la instrucción EVALUATE que representa la pieza fundamental en la construcción de consultas DAX contra un modelo tabular, acompañando a la misma el nombre de la tabla a obtener, tal y como vemos en el siguiente ejemplo.
EVALUATE DimStore;
Para poder apreciar el resultado de la ejecución de esta sentencia iniciaremos SSMS y nos conectaremos a Analysis Server. A continuación haremos clic en el modelo ContosoTabular recién desplegado en el servidor, que aparece como nodo dependiente de la instancia de Analysis Server, y seleccionaremos la opción de menú "File | New | Query with Current Connection", que abrirá una ventana de consultas MDX que en este caso utilizaremos para escribir sentencias DAX. Antes de proseguir aclararemos al lector que cuando nos encontramos trabajando con modelos tabulares, el editor de consultas de SSMS, a pesar de guardar los archivos de sentencias con la extensión MDX, soporta tanto consultas DAX como MDX.

Si ya hemos consultado la documentación oficial que sobre la instrucción EVALUATE existe, veremos que es posible ordenar el conjunto de resultados obtenido mediante la cláusula ORDER BY, de tal forma que si modificamos nuestra consulta de la manera mostrada en la siguiente sentencia, los datos aparecerán ordenados por la columna StoreManager.
EVALUATE DimStore
ORDER BY [StoreManager];
Podemos combinar varias columnas a la hora de ordenar, así como alterar el sentido de la ordenación con los modificadores ASC y DESC.
EVALUATE DimStore
ORDER BY [StoreManager] ASC, [StoreKey] DESC;

Pero aún hay más respecto a la ordenación, ya que usando la subcláusula START AT conseguiremos que los datos comiencen a mostrarse a partir de un determinado valor, perteneciente a la columna(s) que participa en el orden.
EVALUATE DimStore
ORDER BY [StoreManager] START AT 75;
Dedicaremos un próximo artículo a profundizar en el uso de EVALUATE y el resto de instrucciones DAX especialmente desarrolladas para el trabajo con modelos tabulares.
Elementos calculados. Medidas y columnas
En un modelo de datos tabular, además de la información que obtenemos de las columnas de una tabla, podemos encontrarnos con columnas y medidas calculadas que no forman parte de la estructura original de dicha tabla, sino que son producto de una expresión/fórmula DAX. En este apartado haremos una primera aproximación a este tipo de elementos de un modelo.
Pongamos por ejemplo que para la tabla FactSales queremos saber cuánto ha supuesto el total de ventas, por lo que tendremos que aplicar una operación de suma sobre la columna SalesAmount.
En primer lugar nos situaremos en dicha tabla dentro del diseñador. Como podemos comprobar, la cuadrícula de datos se divide en dos zonas: la superior, correspondiente a los propios registros de la tabla; y la inferior, reservada a las medidas calculadas, y que actualmente se encuentra vacía.
Para crear la medida que sume las filas de la columna SalesAmount haremos clic en una celda vacía de la zona inferior de dicha columna, y a continuación escribiremos la siguiente expresión en la barra de fórmulas del diseñador.
TotalVentas:=SUM([SalesAmount])
De esta forma obtendremos una medida con el nombre TotalVentas, que mostrará el resultado de la operación de suma sobre la columna.

También conseguiremos el mismo resultado mediante la opción de menú "Column | AutoSum | Sum", con la diferencia de que el nombre de la medida (Sum of SalesAmount) será asignado automáticamente, no obstante, siempre podemos recurrir a la ventana de propiedades de la medida y cambiarle el nombre en su propiedad Measure Name.
Pasemos seguidamente a la creación de una columna calculada, donde vamos a suponer que necesitamos una nueva columna para la tabla FactSales, en la que se realice una operación sobre la columna SalesAmount, que arroje como resultado el valor de dicha columna para la fila actual menos el 10%. Esto lo conseguiremos haciendo clic en una celda de la primera columna vacía disponible en la tabla y escribiendo la siguiente sentencia en la barra de fórmulas.
=[SalesAmount] - ([SalesAmount] * 0,1)
A continuación asignaremos el nombre a esta nueva columna haciendo doble clic en su cabecera y escribiendo el valor VentasConDescuento.

Tras la creación de estos miembros calculados proseguiremos con otro de los modos de consulta a nuestra disposición: las consultas MDX.
Consultas MDX
Aunque MDX fue desarrollado como lenguaje de consulta para cubos OLAP (modelos multidimensionales), en SQL Server 2012 ha sido adaptado para poder interaccionar también con modelos tabulares, de manera que cuando lo empleamos dentro de este contexto, el nombre del modelo (archivo bim del proyecto en SSDT) representa al cubo de datos, mientras que las tablas identifican a las dimensiones. Las medidas que hayamos calculado en las diferentes tablas quedarán agrupadas en un nodo aparte, funcionando como una dimensión especial, lo que vemos al abrir el modelo en SSMS.

Para hacer una consulta MDX contra el modelo emplearemos la instrucción SELECT, donde al igual que si consultáramos un cubo OLAP, especificaremos, mediante la cláusula ON COLUMNS, la dimensión que ubicaremos en el eje de las columnas, y que en este caso será la medida TotalVentas, ya que como hemos mencionado, las medidas del modelo también pueden comportarse como una dimensión al crear la consulta.
Con la cláusula ON ROWS indicamos que la dimensión utilizada para las filas será DimGeography, visualizando el nombre de las provincias o estados de los países con la función Children de MDX aplicada al atributo StateProvinceName.
SELECT
{[Measures].[TotalVentas]} ON COLUMNS,
{[DimGeography].[StateProvinceName].Children} ON ROWS
FROM [Model];

Agregando la cláusula WHERE a la consulta podremos aplicar filtros al resultado. Por ejemplo, si queremos que las cifras de ventas se circunscriban solamente a las ventas de productos de color azul (tabla DimProduct, atributo ColorName), emplearemos la siguiente consulta.
SELECT
{[Measures].[TotalVentas]} ON COLUMNS,
{[DimGeography].[RegionCountryName].Children} ON ROWS
FROM [Model]
WHERE ([DimProduct].[ColorName].&[Azul]);

Podemos desplazar la medida TotalVentas a la cláusula WHERE y poner en las columnas una selección de los años en los que se realizaron las ventas. Para especificar dichos años como un rango, simplemente hemos de poner el primer y último elemento del mismo separados por dos puntos, tal y como vemos a continuación.
SELECT
{[DimDate].[CalendarYear].&[2007] : [DimDate].[CalendarYear].&[2010]} ON COLUMNS,
{[DimGeography].[RegionCountryName].Children} ON ROWS
FROM [Model]
WHERE ([Measures].[TotalVentas]);

Excel como herramienta de consulta
Y llegamos a Excel, el tercero de los tipos de consulta contra modelos tabulares que estamos presentando en este artículo.
Cuenta como gran ventaja con una potente interfaz de usuario, lo que nos permite construir informes de gran riqueza ya que pone a nuestra disposición un amplio abanico de elementos visuales tales como tablas dinámicas, gráficos, estilos, formatos, etc.
Una vez iniciado Excel, para acceder al contenido de un modelo tabular haremos clic en la pestaña Datos de la cinta de opciones, y a continuación, dentro del grupo "Obtener datos externos", haremos clic en la opción "De otras fuentes" seleccionando en la lista desplegable el valor "Desde Analysis Services".

Como resultado se abrirá un asistente en el que especificaremos el servidor\instancia a la que queremos conectarnos.

En el siguiente paso seleccionaremos la base de datos que contiene el modelo tabular (ContosoTabular) junto al propio modelo, que en el asistente estará identificado como un cubo.

En el último paso guardamos la información de conexión en un archivo.

Como resultado de la ejecución del asistente se establecerá una conexión contra el modelo tabular, y en la hoja de cálculo se creará una tabla dinámica que nos servirá como vehículo para presentar los datos del modelo. En el panel de campos de la tabla dinámica tendremos a nuestra disposición las tablas y medidas que hayamos definido en el modelo, que arrastraremos, bien directamente a la tabla o a los bloques denominados Valores, Etiquetas de fila/columna y Filtro de informe, hasta componer un informe similar al que podemos ver a continuación, donde analizamos las ventas de productos de color azul en función de los años y el país en que se ha producido la venta.

Además podemos añadir un gráfico sincronizado con los valores numéricos desde la opción Gráfico dinámico, perteneciente a la pestaña Opciones de la pestaña de nivel superior Herramientas de tabla dinámica. En dicho gráfico también es posible aplicar filtros al igual que desde la tabla dinámica.

Y llegados a este punto damos por concluida esta introducción dedicada a la elaboración de consultas contra los nuevos modelos tabulares de SQL Server 2012, espero que resulte de ayuda a todos los que quieran empezar a dar sus primeros pasos con esta interesante tecnología.
Los Servicios de Análisis de SQL Server 2012 (SQL Server Analysis Services, SSAS) han experimentado un importante cambio con la introducción del Modelo Semántico de Inteligencia de Negocio (Business Intelligence Semantic Model, BISM), un nuevo paradigma en la construcción de sistemas analíticos, que aúna el tradicional modelo multidimensional (Unified Dimensional Model, UDM), propio de los tradicionales cubos OLAP, con el novedoso modelo tabular, basado en un motor de datos relacional, el cual propugna una filosofía de trabajo más simple (aunque no menos potente en cuanto al volumen de datos a manejar) que nos permita resolver determinados problemas de una forma más sencilla y ágil que la requerida habitualmente para desarrollar un modelo multidimensional.
Y es en estos últimos, los modelos de datos tabulares, sobre los que vamos a concentrar nuestra atención a lo largo del presente artículo, más concretamente en la capacidad de elaborar consultas y métricas contra dichos modelos, las cuales nos permitan recabar la información necesaria acerca de su estado.
Para la confección de estas consultas emplearemos DAX (Data Analysis eXpressions), el lenguaje desarrollado con el objetivo de crear expresiones analíticas tanto para modelos de datos en PowerPivot como para modelos tabulares en BISM. También usaremos MDX para acceder a una fuente de datos tabular como si de un cubo OLAP se tratara. Finalizaremos nuestro periplo con Excel, como ejemplo de aplicación de usuario final dotada igualmente de la posibilidad de obtener información del modelo, presentándola con la conocida potencia gráfica de esta herramienta.
Creando el modelo tabular
Lo primero que necesitamos para poner en práctica lo explicado en el párrafo anterior es un modelo de datos tabular que podamos utilizar como banco de pruebas; así que comenzaremos arrancando SQL Server Data Tools (SSDT), el entorno de desarrollo para SQL Server 2012 basado en Visual Studio, que se encuentra en el mismo grupo de programas de SQL Server, y empleando el tipo de proyecto Analysis Services Tabular Project, crearemos un nuevo proyecto con el nombre ContosoTabular que usará ContosoRetailDW como fuente de datos.

El siguiente paso que daremos a continuación consistirá en la importación de las tablas de la base de datos ContosoRetailDW, que posteriormente utilizaremos en nuestras operaciones de análisis contra el modelo tabular, por lo que mediante la opción de menú "Model | Import From Data Source" de SSDT, ejecutaremos el asistente de importación de datos Table Import Wizard, en cuyo primer paso elegiremos el tipo de origen al que deseamos conectarnos.

En el siguiente paso introduciremos el servidor e instancia de SQL Server al que nos conectaremos (en este caso EVENDIM como servidor y SQLTABULAR como instancia), así como el nombre de la base de datos desde la que importaremos las tablas. También elegiremos Autenticación de Windows como método de conexión.

A continuación deberemos facilitar las credenciales de conexión que Analysis Services utilizará para conectarse a la base de datos al realizar el proceso de importación. En este caso emplearemos la cuenta del usuario que ha iniciado sesión en el equipo desde el que se desea conectar, aunque también podríamos haber usado la cuenta del servicio de Analysis Services, tal y como se explica en el siguiente enlace de este mismo blog.

Seguidamente podemos elegir el modo en que se realizará la importación de los datos: mediante una selección de las tablas/vistas de la base de datos o usando una consulta.

Para este ejemplo nos decantaremos por la primera opción, seleccionando las siguientes tablas: DimStore, DimGeography, DimDate, DimProduct, DimProductSubcategory y FactSales.

Completado este paso haremos clic en el botón Finish, que dará comienzo al proceso de importación, a cuyo término veremos la ventana de resultados, en la que si todo ha marchado bien, observaremos el detalle de las tablas y registros importados.

Una vez terminado el proceso de importación, el asistente nos devolverá a la ventana del diseñador del modelo tabular, donde tendremos a nuestra disposición las tablas recién importadas, organizadas en diversas pestañas.

También es posible visualizar estas tablas en forma de diagrama mediante la opción de menú "Model | Model View | Diagram View", lo que nos permitirá seleccionar la visualización de sus columnas, medidas, jerarquías, etc., así como las relaciones existentes entre estas.

Para retornar a la anterior vista del diseñador usaremos la opción "Model | Model View | Data View". Podemos conmutar entre estas dos vistas empleando los botones situados en la parte inferior derecha del diseñador.

Despliegue del modelo
Finalizada por el momento la etapa de desarrollo del modelo, procederemos a efectuar el despliegue del mismo en el servidor de análisis, de modo que pueda ser consultado por aquellas aplicaciones cliente que lo requieran. Para ello seleccionaremos la opción de menú "Build | Deploy ContosoTabular" o haremos clic derecho en el icono correspondiente al proyecto en el Explorador de Soluciones, eligiendo la opción Deploy. Ambas acciones conducirán a la ventana de despliegue, que nos irá informando acerca del progreso de la operación.

Si el despliegue se realiza con éxito, cerraremos su ventana de progreso y abriremos SQL Server Management Studio (SSMS), conectándonos a Analysis Services, donde encontraremos el modelo recién subido al servidor. Expandiendo su nodo principal veremos las tablas que lo componen.

A partir de este punto podemos comenzar a enviar consultas al modelo utilizando cualquiera de los medios a nuestra disposición: consultas DAX, consultas MDX o Excel. Todo ello será tratado en la segunda parte de este artículo.
(The spanish version of this article was previously published in dNM+, issue 83)
PowerPivot is a technology for information analysis whose peculiarity lies in the ability of working with massive amounts of data using Excel as user interface, so it becomes an attractive offer, given the popularity of this tool belonging to the Office suite. This article introduce the reader to PowerPivot as well as DAX (Data Analysis eXpressions), the analytical expression language that accompanies it, reeling off the most important features of this technology, which coupled with traditional Analysis Services and MDX (MultiDimensional eXpressions) language, make SQL Server 2012 one of the most robust and powerful solutions in the Business Intelligence arena.
With increasing frequency, organizations must perform diverse analysis about their status in the most varied perspectives (business, financial, HR, etc.) and on an increasingly vast amount of data, for which require suitable tools to enable them to obtain a reliable and valid information of such state.
As explained in this blog's article, OLAP data cubes are one of the key elements in this data analysis ecosystem called Business Intelligence (BI).
However, a BI system, as occurs in other areas such as Web applications, desktop, etc., requires a development team and time to its creation, this latter factor sometimes extends for longer than initially estimated.
Given the problems posed above, we can propose as a solution to improve development tools of the information system, which accelerate its creation time, and allow handling a greater data volume, given the widespread increase of this aspect in all corporate areas.
It would also be necessary to promote the development team, incorporating new players into the system development cycle, which so far have played a role of mere validators, but they can bring great value to it, given his extensive knowledge about the structure of the various data sources available within the organization.
PowerPivot comes with the aim of providing solutions to the challenges posed above, covering a number of specific areas in the development of an information system based on BI.
Self-service BI
When we talk about self-service, usually framed this concept within the product sales activity, where a person, the client, moves to a place, the supermarket, where products are exposed, and once there the client picks up the products, all generally without requiring the intervention of any intermediary.
Within the BI context we can make an analogy with the self-service model, using a concept called self-service BI, which advocates that end users (advanced or other profile) of an information system can elaborate their own analysis and reports on the system contents, to meet the specific needs emerging in several company departments, without having on these situations to rely from the enterprise BI development team.
As requirements are the access need by these users to those data sources that require analysis and it is also crucial that self-service BI oriented applications have a learning curve as small as possible, so users can start being productive with them almost immediately.
PowerPivot meet these requirements, since being a technology built into Excel has a wide user base already working with both, the functionality of the spreadsheet and access to external data sources, thus reducing or eliminating training in the use of user interface, and focuses on learning specific features about additional support for data analysis.
Incorporating new user profiles to the development cycle
In addition to release development teams a significant workload, providing self-service BI tools to end users; we get as a bonus the possibility that advanced users from various departments of the organization, through the use of such tools, to assist in the development cycle of the information system, providing ideas and suggestions.
VertiPaq. High performance data processing engine
PowerPivot introduces a new data processing engine: VertiPaq, which through a column-based storage, implements a series of algorithms for data compression, by which is capable of loading millions of records in memory.
However, due to the execution in memory nature of VertiPaq is recommended, whenever possible, work on a 64-bit environment, with adequate software version to this architecture: operating system, SQL Server, Office 2010, PowerPivot, etc., since it does not suffer the memory address limitations found in the 32-bit systems.
Architectural elements
PowerPivot, as seen in next image, consists of an assembly that is loaded into Excel process; VertiPaq engine, which handles data loading, query management and DAX expressions execution against the data warehouse, as well as PowerPivot's pivot tables and charts, and finally, the OLAP provider, the Analysis Management Objects (AMO) objects and ADOMD.NET provider, allowing communication with analysis services, for gathering information from data cubes if applicable.

Collaboration environment. PowerPivot for SharePoint
Using PowerPivot is expected that the number of analysis models created by users grows noticeably, and it is therefore very important to have a management mechanism, with respect to the publishing and access security within the organization.
This is the purpose of PowerPivot for SharePoint, a plug-in extending Excel Services for SharePoint, in the above paragraphs about management and collaboration.
Additionally, PowerPivot for SharePoint allows users to view PowerPivot models published in the organization simply using a web browser, so you only need to install Excel and PowerPivot for Excel on the machines of users who will develop analytical models using this tool.
PowerPivot. BISM cornerstone
(Note: for a more accurate description of BISM than below, we recommend read in first place the SSAS Tabular Models article which includes updated information about this technology).
SQL Server 2012 ships with a new BI model called Business Intelligence Semantic Model (BISM). This model doesn't replace or displace Unified Dimensional Model (UDM), the existing BI model from SQL Server 2005 but coexist with it, complementing and enriching the SQL Server offer in BI area, allowing us to have both models to work with the one that best suits our needs.
SQL Server has been traditionally blamed, referring to UDM, that its implementation in BI area has a difficult learning curve with complex concepts: cubes, dimensions, measures, etc., and that the time needed to develop a solution extends too much when it comes to creating a simple analysis system.
From our point of view, this learning curve isn't much more complicated than we find in other aspects of software development, although we admit that from the perspective of developers used to working with a relational data model, getting familiar with the UDM-OLAP multidimensional model to build a business analytics solution may involve some initial complexity until the developer master the concepts and features of the technology.
BISM comes to solve this problem, providing a simpler way of work, distinguished by its relational philosophy supported by PowerPivot and VertiPaq. While BISM is primarily oriented to the development of self-service analytical solutions or for small teams, UDM is focused in the development of corporate BI solutions, which require more planning.
BISM is a model with a distributed architecture in three layers: data access, business logic and data model.

The data access layer is responsible for connecting to data sources to load the data they contain, existing two ways to operate with them: cached or real time. In the case of cached mode, data is loaded using VertiPaq, whereas if we use the real-time mode, this task is left to the original data source from which data are extracted.
The business logic layer will implement the managing and processing of data that allow us to make them relevant information. We will use for this any expression languages at our disposal: DAX or MDX.
DAX is the expression language for data analysis built-in PowerPivot. We'll use it to build queries against the data store located in VertiPaq, while MDX is the language usually employed in UDM to query OLAP cubes.
DAX is not as powerful as MDX, but it is much easier to use, and aims to query data located in a self-service BI model, which theoretically should not require the use of complex queries, reserved to be made against the corporate BI system, which under normal conditions will have been developed by UDM.
Finally, the data model layer will be used by client applications (Excel, SharePoint, Reporting Services, etc.) gathering information from the proper model: relational or multidimensional. Among the reporting tools that use this model is Power View, a project whose objective is to provide a better user experience for data visualization and reporting against BISM based models.
Some prominent voices in the BI-related technical community have expressed concern about the UDM-OLAP position with the BISM arrival in SQL Server 2012, although from Analysis Services development team wanted to send a reassuring message, insisting that BISM is not a replacement for UDM but a complementary technology.
PowerPivot in practice
After architectural features review made in previous paragraphs, it's time to get down to working to develop a PowerPivot model, which help us to get our first impressions about the scope of this technology.
First of all we need to have Office 2010 installed, and download and install PowerPivot for Excel plug-in. In the event that our system is 32 bit, we'll download the file PowerPivot_for_Excel_x86.msi, but if we work on a 64 bit we must download the file PowerPivot_for_Excel_amd64.msi. For the sake of testing, in this article we use a Windows 7 operating system virtual machine with 1.5 GB of RAM and Intel Core 2 Duo processor.
Although the name of the download page referred to SQL Server 2008 R2, it is not necessary to have it installed if we are going to work with other data formats such as Access, text files, etc. However, in our case, we will use SQL Server as data engine and the database ContosoRetailDW. Through the latter link we'll download the ContosoBIdemoBAK.exe file containing the database in format backup.
The reason why we use this database is that it has some tables with millions of records, allowing us to test one of the salient features of PowerPivot: the ability to handle large volumes of data.
Creating a model
After restoring the database we'll open Excel, then clicking on the ribbon tab PowerPivot. Among the options on this tab we will click PowerPivot window, belonging to Launch group, which as its name suggests, will open the PowerPivot working window.
The first task in creating the model will be to connect to a data source to import its content, so that from the Home tab of the PowerPivot window, in the Get External Data group, we'll click From database option, selecting From SQL Server among the available items.

As a result will open the Table Import Wizard, where we'll include information to connect to the database ContosoRetailDW. In the Select Tables and Views step we will check FactSales, DimDate, DimStore and DimProduct.

After this step will begin the import process, once completed will show each of the imported tables in different tabs in PowerPivot window.

Querying the model
The set of tables we have just import will conduct an analysis of company sales, according to data related to the sales table: dates, products, stores, etc.
To do this we will click on PivotTable option in Reports group reports, creating a PowerPivot's PivotTable with which to perform our analysis operations. This action will move the focus to Excel window, where a dialog box will ask you the coordinates to place the PivotTable in the spreadsheet. Accepting the default values the new PivotTable will be created.

On the pivot table's right side we find the PowerPivot Field List panel where we select the fields to use for our analysis, either as axis labels of rows and columns, numeric values, filters, and slicers.
Those readers with expertise in creating OLAP cubes will verify that querying a PowerPivot model is very similar to the query against a data cube, but without the existence of a real cube, as in the field list panel, the items in block Values represent the cube measures or metrics, while the rest of the block fields, represent the dimension attributes, displayed either in rows, columns or filter the PivotTable.
We will begin creating our query by checking the field SalesAmount, from FactSales table. Due this is a numeric field, it will automatically be placed as a measure in the Values block. PowerPivot will apply then an addition operation on all records in the table to which it belongs.
The current state of the PivotTable does not offer, however, great analytical possibilities, since we only have total by SalesAmount field. We need to add additional elements to the query relating to the table FactSales such as DimProduct table. This table contains each product's name and additional information as manufacturer, type, color, size, etc.
Suppose we want to find out the number of sales by product, but instead of using his name, we need to analyze by manufacturer. What we do in this case is check the field Manufacturer, belonging to the field list of DimProduct table. This action will place the field in the Row Labels block and their values in the PivotTable row axis. In this way we will know how sales have meant by each of the manufacturers.
Since the SalesAmount field values are not shown formatted, as additional work we right-click any cell in this field, choosing Number Format... option. In the format dialog box select type Currency with two decimal places and euro symbol, being applied to all cells in this field in the pivot table.

Writing DAX expressions
As noted earlier, DAX is the expression language through which we'll build the business logic in PowerPivot, both as PivotTable and data model.
Before starting to write our first sentences in this language, we must know that without us aware of this, we have already written a DAX expression! If we right-click Sum of SalesAmount in Values block and select Edit Measure..., a dialog box will open with the expression automatically created by PowerPivot
=SUM('FactSales'[SalesAmount])

The creation of a DAX expression is very similar to writing a formula in Excel, since then the equal sign, write one or more functions with their corresponding parameters, forming the whole expression. In the expression currently under review, the SUM function will make a sum of SalesAmount field for all records in the table FactSales. If the table name has spaces or other special characters we enclose it in quotes, in other cases not necessary. With respect to the field name should always be enclosed in brackets.
Now let's approach to writing our own expressions, through two major elements PowerPivot: computed columns and measures.
First we will create a computed column in the table FactSales, to obtain the sales amount without discounting. To this end, in PowerPivot window we should be on the last available empty column, writing in the formula bar as follows:
=FactSales[UnitPrice] * FactSales[SalesQuantity]
To assist in the writing of expressions, the AutoComplete feature will suggest at all times a list of functions, tables and fields, depending on what we go on writing.
After writing the expression, the new column will be filled with the resulting values. Double-clicking the column header we'll assign the name ImporteSinDescuento, ending the creation of computed column.

Another way to create a computed column is to click on Add option from Columns group, belonging to Design tab.
Because the fields used in the formula belong to the same table on which we are creating the calculated column, in the expression syntax we can circumvent the name of the table, being as follows:
=[UnitPrice] * [SalesQuantity]
To use the new column in the pivot table we'll back to the Excel window, finding a warning in PowerPivot Field List which informs us that data was modified. Clicking the Refresh button of the warning the field list will update, adding the new field ImporteSinDescuento, which we'll check to include in the PivotTable.

Our next step is creating a measure to calculate, for each manufacturer, the percentage of his sales with respect to total sales of the company.
In PowerPivot tab, within Measures group, we click the New Measure option, opening the Measure Settings window, where we'll type PorcentajeVentas as measure's name and the following expression in formula field:
=SUM(FactSales[SalesAmount]) / CALCULATE(SUM(FactSales[SalesAmount]), ALL(FactSales))

The first part of this expression, as we have seen, is responsible for adding the field SalesAmount. This sum will be made for each manufacturer, because this field is used in PivotTable labels row.
The second part of the expression returns, for all rows, the total sum of SalesAmount field, which we accomplish combining the functions CALCULATE, SUM and ALL. CALCULATE function evaluates the expression passed as the first parameter, which is a sum of SalesAmount field, and requires that the sum is over all FactSales table records by using the ALL function, regardless of the filters that are applied in the PivotTable.
Finally, add the new measure to the pivot table by applying the percentage format.

Filters and slicers
Our PivotTable now shows arranged by product manufacturers, a series of measures obtained from fields and calculations of FactSales table belonging to PowerPivot data model.
The results of these measurements are made from all records of that table, but at some point we may also be interested to have the ability to apply filters on the model's information, in order to obtain different perspectives of analysis.
PowerPivot pivot tables provide the user with filters and slicers as data filtering tools. Next, we'll explain to the reader to use them as a means to narrow the results of the report.
Let's suppose we want to filter information in the pivot table depending on the type of store where are the products. We get this data from field StoreType in DimStore table.
To create a filter of this type, in the field list pane we'll open out the field list of table DimStore and drag the StoreType field to the Report Filter block. As a result this filter is inserted in the top of the pivot table.
By clicking on the filter selecting values button, we'll select "Almacén". This action will activate the filter, causing all numeric cells in the pivot table refreshing sales information depending on the filter set.

Slicers, moreover, they behave as filters, although its handling by the user is slightly different.
Let's create a slicer that allows us to filter the results by sales year, for which we will drag the CalendarYear field from DimDate table and drop in Slicers Horizontal block. This action will create the slicer with all filter values placed above the pivot table.

Slicer values that appear in a darker color tone are those with which we can make effective filters, while the use of those with a lighter shade will not produce results as there are no sales in those years.
If we want to use more than one value in the slicer, we must keep down the CTRL key while clicking on the different values. The result of the filter is not effective until we release the CTRL key. To remove all filters from the segmentation will click the funnel-shaped icon situated at the top right.
Creating a PivotChart
In certain circumstances, the numerical data provided by a PivotTable it may not be sufficient to analyze the content of a PowerPivot model, so we also have pivot charts at our disposal, as an alternative in the information display.
We can create a pivot chart from scratch or use an existing pivot table as chart's foundation. In this case we'll choose the latter method, for which, once positioned in the pivot table, we'll click on Options tab belonging to PivotTable Tools category, and then select PivotChart option, located in the group Tools. This opens the Insert Chart dialog box, offering us a wide range of chart types to insert into the worksheet. Having made our choice, we will accept this dialog inserting the chart next to the PivotTable.
From this point, both the table and chart will be synchronized, so that the modifications of filters, measures axes labels, etc. performed in one will be reflected immediately in the other. Next image shows the appearance of our report including pivot table and pivot chart.

Conclusions
PowerPivot represents a technology with great potential to become an important part of the toolkit available for BI development with SQL Server. If we add capacity in handling large volumes of data, the power of its query expressions language, and the fact of using Excel as user interface, we obtain a product that allows users to create their own analysis models to address specific information needs. At the same time, these models can be of assistance to BI development teams in building corporate information systems.
(The spanish version of this article was previously published in dNM+, issue 93)
The recent SQL Server 2012 release (formerly codenamed Denali) is accompanied, as usual in any new version, for several interesting improvements. In this article we'll focus on development of analytical tabular models, an integral part of Business Intelligence Semantic Model (BISM), the new paradigm for developing Business Intelligence (BI) solutions based on SQL Server 2012 Analysis Services.
As was already mentioned in the introductory article about PowerPivot, the incorporation of BISM into Analysis Services makes SQL Server one of the most powerful solutions within the current BI arena. However, since the first public announcements about BISM so far, new details have been revealed about the architecture of this technology, allowing us to be more precise in the descriptions of it.
BISM. A single model for all analytical needs
BISM becomes the new model for the development of BI solutions that fulfills the analysis requirements of all kind users.
BISM represents an evolution from Unified Dimensional Model (UDM) towards a combined model that offers all UDM multidimensional development features plus new features based on a relational analysis engine, what enrich the current SQL Server offer in the analytical services area. Therefore, from now on, the term UDM is replaced for BISM, when we mention the development model used by Analysis Services. Similarly, when we update an UDM project to SQL Server 2012, it will be deemed as a BISM project for all purposes.
This integration between work philosophies, and hence on the underlying technologies (multidimensional and relational), results in an architecture distributed in three layers, as we see in the following image, with response capacity to users with very different needs in all concerning tasks related to data analysis.

Watching the workflow of this architecture in a bottom-top direction, in first place, we find the data access layer, which is responsible for performing the information extraction resident in the data sources we connect, existing two modes for this extraction: cache and passthrough.
The caching mechanism gets the data from the original source and stored it in a data structure based on a compression algorithm optimized for high-speed access. In turn, using this cache mode we must choose two different storage engines: MOLAP or xVelocity.
MOLAP is the intermediate storage system traditionally used in Analysis Services, and as the name suggests, is optimized for use in multidimensional models development (OLAP cubes).
xVelocity is a brand new system introduced with BISM for use in tabular models, consisting of a column based data storage engine, which combines sophisticated compression and search algorithms, to offer, without using indexes or aggregations, excellent performance in speed response to our queries.
Regarding to passthrough mode, as its name implies, sends the query directly to data source engine, so both operations: data processing and business logic are performed there. In this case there are two modes of execution too: ROLAP and DirectQuery.
ROLAP mode is usually used by Analysis Services, which uses the data source itself to process queries against cubes belonging to multidimensional models.
DirectQuery is used in tabular models to also process queries in the data source.
Business logic layer includes queries in MDX or DAX languages with which we'll implement our solution's logic, depending whether we work respectively against a multidimensional or tabular model.
Finally, in the data model layer we find the conceptual part of this architecture, where using SQL Server Data Tools (formerly Business Intelligence Development Studio and current development environment based on Visual Studio 2010) will build our model using one of the available project templates: Analysis Services Multidimensional and Data Mining Project or Analysis Services Tabular Project.
It's worth mentioning that it is also possible to use PowerPivot as development tool for a tabular model, but to deploy it in the server it's mandatory to use SQL Server Data Tools.
Installing Analysis Services. A semantic model, two modes of execution
Due to the particular features of each BISM data model, in SQL Server 2012 setup, at Analysis Services step, we must choose which analysis server execution mode we want to install: Multidimensional or Tabular, because it is not possible to install both simultaneously in the same instance. For this reason the recommended procedure is to install separate instances. However, in our case it is sufficient to install only the tabular mode for the development of this example.

Let's get started. Creating a tabular model
Exposed all necessary aspects to conceptually locate the reader within the new semantic model, let's get into the practical part of the article, in which we will develop a tabular analysis project.
We will use the sample database AdventureWorksDWDenali being the model's goal to analyze the shipping orders expenses placed by customers through the Internet and the number of orders issued, all according customer's residence location and promotion details about items purchased.
In first place we will start SQL Server Data Tools, creating a new project based on Analysis Services Tabular Project template, giving it the name PruebaModeloTabular. As we shall see in Solution Explorer, this project will consist of a file named Model.bim, which represents the tabular model designer.

The next step will be to add tables to the model, so we'll select the menu option "Model | Import From Data Source", which will start the data import wizard. After choosing the type of data source, SQL Server instance and database, we reach the step where we must enter credentials to connect to the data source and proceed with data extraction.
At this point we'll use our Windows account (in case we have adequate access permissions) or choose the Service Account option, which will use the NT SERVICE\MSOLAP$SQL2012TAB account (SQL2012TAB is the SQL Server instance name of my machine) associated with the Analysis Services service. If we use the latter possibility, we need to grant access and read permissions to this account on the AdventureWorksDWDenali database.

To assign that permission, in SQL Server Management Studio we right-click the Logins node, belonging in turn to Security node, choosing New Login option, which will open the dialog box to add a new login to the server, where we'll type the Analysis Services account name. In User Mapping section we'll check the database to grant access.

Continuing the import wizard we'll reach the selection table step, where we'll check DimSalesTerritory, DimPromotion and FactInternetSales tables, and proceeded to start the import process.

When the data import process finishes, the designer will show the tables in a grid, arranging them into tabs.

This display mode can be switched to a diagram mode, more suitable if we want to focus on concrete elements such as relationships between tables. To change the view mode use the buttons for that purpose in the bottom right of the designer window.

Creating measures
Despite having imported tables to the model, we still can't conduct a proper analysis on it, because we lack the calculations (measures in BI context) responsible for providing numerical results, essential in any system of this type, as we saw in the article about OLAP Cubes.
To create a measure for the model, we must firstly set the designer's display to grid (default mode). Once this adjustment is made, we see that each table shows two sections: top, which contains the rows of the table itself, and bottom, reserved for calculated measures created by the model developer.
Next we'll select the FactInternetSales table (also known as fact table in multidimensional context) that contains those columns that can be used in obtaining numerical results.
We will use the Freight column to create the first of our measures: sum of the values in that column for table's rows. In grid's bottom area we'll select an empty cell for the column above mention, then we'll click the toolbar's Sum button, which apply the DAX language formula "SUM([Freight])" to the column.
As a result of this operation we'll obtain the Sum of Freight measure, but we'll change this name, assigned automatically by the development environment, to GastosTransporte, using the properties window.

Watching the resulting figure we will realize that there's a formatting problem with the obtained value (7,339,696,091.00 €), as the correct result is 733969.6091. We can verify this running the following query in SQL Server Management Studio.
SELECT SUM(Freight) FROM FactInternetSales
However, we'll find this difficulty only in model designer scope, since as we shall see in the next section, when analyze it from an external tool, the values are displayed correctly.
Next we will create the measure to calculate the orders quantity issued by the company, so we'll apply to SalesOrderNumber column a distinct count operation, because the FactInternetSales table can have more than one row for the same order.
We can drop-down the Sum button in Visual Studio's toolbar, so we can choose some of the most common calculations, Distinct Count among them, but this time we'll manually create the measure FacturasEmitidas, writing the DAX expression "FacturasEmitidas:=DistinctCount([SalesOrderNumber])" in the formula bar.
Analyzing the model from Excel
If we have Excel 2010 installed we can use it as analysis tool for the model we are developing. All we have to do is select the menu option "Model | Analyze in Excel" and Visual Studio will open Excel, loading the model in a PivotTable.
On the pivot table panel PivotTable Field List we will check the GastosTransporte measure, which will be located in the block Values. Similarly we'll proceed with SpanishPromotionCategory field from DimPromotion table, but this time we'll place it on Column Labels block. Finally, fields SalesTerritoryGroup and SalesTerritoryCountry from DimSalesTerritory table will be placed in Row Labels block, so that we can watch this information into a hierarchical fashion.

At the same time we can analyze the amount of bills issued by the company for sold items, adding FacturasEmitidas measure to the PivotTable, so we also get this information by the fields currently located in rows and columns or by other fields belonging to the model.
Those readers who have had the opportunity to use PowerPivot will find interesting similarities with this way of work, since in both cases the underlying technology is the same.
Conclusions
Tabular analysis models belonging to BISM, the new paradigm in building BI solutions, shipped with SQL Server 2012, are a great tool to develop powerful information systems using SQL Server Analysis Services. In this article we have made an introduction to this interesting technology, showing the reader how to take advantage from it.
Durante el desarrollo de un proyecto de análisis de tipo tabular en SQL Server 2012 empleando SQL Server Data Tools, al ejecutar el asistente Table Import Wizard, que realiza la importación de datos al modelo tabular (opción de menú "Model | Import From Data Source"), en el caso de que nuestra fuente de datos sea una base de datos SQL Server y en el paso Connect to a Microsoft SQL Server Database hayamos elegido conectarnos mediante autenticación de Windows, deberemos indicar, dentro del paso Impersonation Information, las credenciales de acceso a dicha base de datos, para que el proceso de importación se realice de forma satisfactoria.

Por defecto, el asistente de importación en el mencionado paso Impersonation Information utiliza la información de la cuenta del usuario actual (nombre de usuario y contraseña) para conceder o denegar el acceso a la base de datos.

Sin embargo, podemos elegir que sea la cuenta asignada al servicio de Analysis Services la que se utilice para conceder acceso al origen del que extraeremos los datos que formarán parte del modelo tabular; procedimiento que describiremos a continuación.

Como primera consideración, la cuenta asociada al servicio de Analysis Services tiene el siguiente formato:
NT Service\MSOLAP$NombreInstancia
Por lo que si nos encontramos trabajando contra una instancia cuyo nombre es, por ejemplo, SQLTABULAR, el nombre de la cuenta será:
NT Service\MSOLAP$SQLTABULAR
Por otro lado, supongamos que la base de datos con la que vamos a trabajar en la creación de nuestro modelo tabular es ContosoRetailDW, disponible en el siguiente enlace.
Para establecer los oportunos permisos de acceso de la cuenta NT Service\MSOLAP$SQLTABULAR sobre esta base de datos, iniciaremos SQL Server Management Studio y nos conectaremos al motor de base de datos.

A continuación desplegaremos el nodo Security, y dentro de este, el nodo Logins, donde veremos las cuentas que actualmente tienen asignado algún tipo de permiso de acceso.

Haciendo clic derecho en el nodo Logins, seleccionaremos la opción New Login, que abrirá el cuadro de diálogo de creación del login, en cuyo apartado General escribiremos el nombre de la cuenta de servicio de Analysis Services, que utilizará autenticación de Windows para identificarse.

En el apartado User Mapping haremos clic en la casilla correspondiente a la base de datos ContosoRetailDW, y en la parte inferior estableceremos el modo de acceso a la misma: propietario, lectura, escritura, etc.

Aceptando el cuadro de diálogo, la cuenta de servicio de Analysis Services se añadirá al conjunto de logins del servidor SQL Server, con lo que ya podremos importar las tablas de la base de datos ContosoRetailDW a través de dicha cuenta en el asistente de importación de modelos tabulares.

Espero que os resulte de utilidad.
Un saludo.
(The spanish version of this article was previously published in dNM+, issue 77)
It is an undeniable fact that in recent times, the volume of data that organizations must manage has soared. Analyze so much data; in order to make strategic decisions has become a real problem. In this article we will make an introduction to OLAP data cubes in SQL Server 2008 R2 Analysis Services, a powerful tool that can transform vast amounts of data into useful information.
The excessive increase of the volume of data into information systems of a company, without proper organization and structure, can have negative effects such as slow in their status analysis, or worse, lead to inappropriate strategic decision making, since having millions of records distributed across multiple heterogeneous data sources (SQL Server and Access databases, flat files, Excel spreadsheets, etc..), need not be synonymous in all cases of a system that provides quality information.
To solve such problems we have the so-called Business Intelligence (BI) tools, and in the case we are working with SQL Server, one of its main components: SQL Server Analysis Services (SSAS), allow us to create a data cube, an element by which to generate information to analyze the state of the company from all its data sources.
Conceptual
From a conceptual perspective, a data cube is one more part into the gear of an information system called data warehouse. The cube is provided with internal machinery that can process high volumes of data in a relatively short period of time, and whose goal is always to obtain a numerical result (amount of sales, expenses, number of products sold, etc.). These results may change depending on one or more filters that apply on the cube. The response time is minimal because the cube processing engine precalculates the possible combinations of results that the user can request. The numerical results obtained are called measures, while the elements used to sort / filter information are called dimensions.
Represented graphically, a data cube is shown as the geometric shape which takes its name, horizontally and vertically partitioned into a series of divisions that give rise to multiple cells, which identify each of the possible outcomes of the measures, obtained by the intersection in each cell of the dimensions that make up the cube. Next figure shows such a graphical representation of a cube, with sales information by product, employees and currency. On the cube sides are placed the dimensions, whose cross produces numerical results in the cells.

Watching the previous figure, the reader may think that the number of dimensions in a cube is limited to those we represent through that geometric shape. Nothing is further from reality, since a cube can withstand a high number of dimensions, which loosely cover the requirements of the information to be obtained.
We can find additional information about the conceptual aspects in a series of articles on data cubes in SQL Server, previously appeared in this publication.
Main elements in a data warehouse
As mentioned above, a data cube is one of the pieces of a more complex architecture: the data warehouse, in which creative process involved several components, which are responsible to take the original data in the rough and polish it until it becomes in information ready for analysis. Next figure shows a diagram where we can see the phases of this transformation process.

Broadly depicted, this process performs, in first place, an operation of extraction, processing and loading data from source origin, located in the operational area, to a database located in the integration area, using SQL Server Integration Services (SSIS) packages, which also performed data cleaning tasks.
Then we would pass to the cube construction phase, which we will develop using SQL Server Analysis Services (SSAS). Finally, we reach the stage of cube query by end users, using several products such as SQL Server Reporting Services (SSRS), Excel, etc.
Physical elements. Fact and dimension tables
At the physical level, to build a data cube we need a database containing a table called fact table, whose structure is formed by a series of fields, called measurement fields, from which we obtain the cube's numerical results; and on the other hand, a set of fields called dimension fields, that we will use to join with the dimension tables, in order to get filtered results for the several dimensions integrating the cube.
The other main foundation in the cube creation is made from the dimension tables. For each dimension or query / filter category we incorporate into our cube will need a table that will join with the fact table by a key field. This dimension table will act as a catalog of values, also called attributes, which will be used independently or combined with other dimensions, to obtain results with greater accuracy.
Developing a data cube
Once explained the necessary basics, we enter into the practical part of the article, where we'll develop our own data cube. We will focus all our efforts exclusively on the cube creation, without addressing the extraction, transformation and loading operations, which would be made using SSIS packages, since the latter are issues that fall outside the scope of this article, pending a future delivery.
First, from Windows Start menu we'll launch SQL Server Business Intelligence Development Studio, which is located in Microsoft SQL Server 2008 R2 program group. This is a special version of Visual Studio for BI projects development. In the starting dialog we'll select Analysis Services Project as project template, giving it the name CuboDatosAdvWorks.
Then right-click Data Sources node in the Solution Explorer and select New Data Source option, which opens the wizard to create the cube data source, which in our case will be the test database AdventureWorksDW2008 available in CodePlex, whose structure is ready to be used in data cubes design.
We will leave the default options in the wizard until the connection to the data source step, selecting AdventureWorksDW2008. On reaching the wizard final step will see a summary of the data source we have created.

Our next step is to create a view of the data source, allowing us, as his name suggests, define a custom view of the database, including the tables we need to create the cube.
In this sample cube we'll measure the amount of sales that AdventureWorks company resellers have invoiced. Being possible query / filter the results by currency type in which the sale was made, and geographic area where the order was sent. We will use FactResellerSales as fact table, and DimSalesTerritory and DimCurrency as dimension tables.
By right-clicking the Data Source Views node in Solution Explorer, we'll select New Data Source View option, which opens a wizard in which first step we will choose the data source we just created. In the second step we'll select the tables just mentioned.

Once finished this wizard it will display its design window where we see a diagram of the selected tables, with the existing relationships between them.

Building dimensions. Basic dimension
The next step is to create the dimension that allows us to query / filter the cube information by order payment currency. Right clicking on the Dimensions node of Solution Explorer, we'll select the New Dimension option, starting the wizard as usual. In his first step Select Creation Method, we'll leave the default option Use an existing table. Upon entering step Specify Source Information, three drop-down lists allow us to configure the information to be obtained for the dimension: Main table to choose the table that use in the dimension: DimCurrency; Key columns to indicate the primary key; and finally, Name column, selecting CurrencyName field, which identifies the attribute to be displayed.

In the Select Dimension Attributes step, the wizard will offer Currency Key as an attribute of the dimension, which we will use, but changing its name to Currency. An attribute is a field, normal or calculated, belonging to the dimension table that is shown as a label anywhere in the dimension involved as part of a query against the data cube.

At the last step we shall give the name Currency to the dimension, finishing the wizard. The dimension designer will show now with the structure we just created. Among all the properties of the attribute Currency, the most important are Name, which contains the name that will appear in queries against the cube; KeyColumns, which contains the key field in the table that relates to the fact table, and NameColumn, containing the field of the table that shows the attribute's value.

In case we need to add more attributes to the dimension, just drag and drop fields from the table in Data Source View pane to the Attributes pane of the designer.
When attribute creation is finished, we'll process the dimension by clicking the Process button on the toolbar's designer, or through the Visual Studio Build | Process menu. When the processing dimension is completed, we will click on the designer's Browser tab, where we'll inspect its contain.

Building dimensions. Hierarchical dimension
In addition to the single level dimensions, as we just saw in the previous section, we can create dimensions that group data on multiple levels, which provide a greater ability to disaggregate the cube information when consulted through a dimension of this type. This element is called a dimension hierarchy.
Let's take DimSalesTerritory table from our Data Source View sample project. We can see that the combination of SalesTerritoryGroup, SalesTerritoryCountry and SalesTerritoryRegion fields allow us to establish several grouping levels.

Suppose we need to create a dimension based on this table to obtain, starting from SalesTerritoryGroup field, a hierarchical level display effect.
To do this, we will create the dimension using the wizard in the way explained in previous section. The default attribute selected by the wizard will correspond to table's primary key: SalesTerritoryKey field.
Once located in the dimension designer, drag from the table in Data Source View pane the SalesTerritoryGroup, SalesTerritoryCountry and SalesTerritoryRegion fields, and drop on the Attributes panel of the same designer.
Then drag the Sales Territory Group attribute to the Hierarchies pane, which will create a new hierarchy. We'll change its default name by Sales Territory. Also in this hierarchy will drop the Sales Territory Country and Sales Territory Region, noting that next to the hierarchy name appears a warning icon which informs us that relationships between the hierarchy attributes aren't properly created, which can negatively affect the dimension processing.

To solve this problem we will click the Attribute Relationships tab, where we see the relationships between attributes automatically created by the designer.

These relationships, however, are not valid for our purposes, so we'll select their representing arrows in the diagram and delete them. To create the new relationships drag from source attribute to destination, to leave them as see in next figure.

Before dimension processing let's back to the Dimension Structure tab to verify that the warning is gone. On the other hand we'll select all attributes of the Attributes panel, assigning the False value on his property AttributeHierarchyVisible, so we'll get independent attributes are not shown, since what interests us here is to explore only the hierarchy. Next figure shows the dimension result, with all elements of the hierarchy expanded.

Cube Building
We reached the final stage in the development of our sample project: building the data cube. We will begin by right-clicking the Cubes node in Solution Explorer and selecting the Add Cube option, which will open the creation wizard, leaving the default values until the Select Measure Group Tables step, which as its name suggests, it asked us to select the table containing the fields that we use as measures for the cube, i.e. the fact table, which in this case will be FactResellerSales.

Clicking Next, go into the Select Measures step, where we have to select the fields that serve as cube measures. The goal of this cube is to ascertain the sales amount made by resellers, therefore, we'll select only the SalesAmount field.

The next step prompts us to select the dimensions that will be part of the cube. It automatically detected the dimensions created by us earlier, which already are offered selected by default.

The wizard performs then a search in the fact table to find a field that could also be capable of being treated as a dimension. Since we don't need this feature, we'll uncheck the selection of the fact table as source for the creation of dimensions.

And now the final step, where we will give the name VentasDistribuidores to the cube, ending the wizard.

As a result the cube designer displays itself, showing various key elements, such as the dimensions pane, table diagram, measures, etc.

In the Measures pane appears the measure selected in the wizard, but we'll change its name to Importe Ventas in properties window. In this same window we can watch the name, aggregate function used to calculate the measure, used table field, format string, and so on.

For the measure to appear correctly formatted, in addition to assigning a value to the FormatString property, in the cube properties we have to assign the value Spanish (Spain) to the Language property.
Finally, before we can see the cube, as we did with the dimensions, we must process it by clicking the Process button, which opens the dialog box cube processing where we'll click the Run button.

Once the cube has been processed, we can see its contents by clicking on the Browser tab. In the Measure Group pane we'll expand the Measures node up to the Importe Ventas measure, which drag into the central area of the display. Then drag the SalesTerritory dimension to the display's left margin. We can right click on this dimension, selecting Expand Items option, resulting in a display of the dimension elements. Finally drag the Currency dimension to the upper margin. As a result we get a data grid where each cell displays the measure for the intersection of the dimensions placed in columns and rows of data display.

Conclusions
In this article we have made an introduction to the development of data cubes with SQL Server 2008 Analysis Services, a component of SQL Server product family intended to provide business intelligence solutions that exploit the analytics potential that lies in data's company. The possibilities and power of this tool are enormous, and we encourage the reader to implement them.
An essential feature in BISM (Business Intelligence Semantic Model) tabular data models, the new BI development paradigm built in SQL Server 2012, are the relationships between tables of a particular model, because without them it would be impossible perform a proper analysis of the information contained therein.
Let's say we have a database named AdvWksDW, created from some of the tables belonging to AdventureWorksDW2012 database, but without the corresponding relations between them. The following code block shows the SQL statements necessary for its creation.
CREATE DATABASE AdvWksDW
GO
USE AdvWksDW
GO
SELECT * FROM INTO FactInternetSales AdventureWorksDW2012. Dbo.FactInternetSales
GO
SELECT * FROM INTO DimDate AdventureWorksDW2012. Dbo.DimDate
GO
For those readers familiar working with multidimensional analysis models, if we were to build an OLAP cube using this database, the FactInternetSales table represents the fact table because it contains the columns from which we obtain measures or analyze numerical results, while DimDate table represents a dimension table, used to filter the information by the dates in the model.
Then we'll create a tabular analysis project with SQL Server Data Tools (SSDT), the development environment for SQL Server 2012 based on Visual Studio 2010, known in earlier versions as Business Intelligence Development Studio.

In the data model of this project we will include the tables in the database just created, adding also a measure that sums the values in column SalesAmount.

To analyze this model from Excel we will select the menu option SSDT "Model | Analyze in Excel", or we will click on the button on the toolbar that serves the same purpose.
I
This action will result in opening a new spreadsheet showing the data model in a pivot table which we'll place the measure "Sum of SalesAmount", from FactInternetSales table, into the Values block, and the CalendarYear field from DimDate table in block "Row Labels".
The result we would need to obtain from this pivot table is the sum of SalesAmount field, that belongs to FactInternetSales table, grouped by sale years, that is, for the year in OrderDateKey field. Since we have DimDate table as catalog dates, if we combine both tables in a SQL query using the appropriate date fields, we'll get the expected numbers.

However, the result obtained in the pivot table is very different, because all the cells show the same value: the sum total of SalesAmount field.This is due to the lack of a relationship between fields and DateKey OrderDateKey from FactInternetSales and DimDate tables respectively, which causes tabular model engine don't know how to apply the measure "Sum of SalesAmount".

Relationships in the tabular model
In scenarios like the one just described is where we realize the importance that proper establishment of relations between tables in our model, in order to achieve optimal data analysis.
A tabular model gets, if any, the relationships directly from the data source during execution of the data import wizard.The relationships obtained in this way will name automatic relationships.
If the model has no relationships or we need additional relationships to existing ones, we can create them using the management relationships window available for this purpose in the development environment of Visual Studio 2010; we'll call this type manual relationships.
In our current situation, for the information in model is consistent, we need, as noted above, create a relationship between FactInternetSales and DimDate tables, using OrderDateKey and DateKey fields; so we'll select the menu opion "Table | Create Relationships" which opens the window where we will make this operation by selecting tables and fields members of the relationship.

After relationship creation, we'll back to the pivot table in Excel that we are using to analyze the model, and then clicking on the Refresh option (Data group in Options tab, contained in the top-level tab "PivotTable Tools") a new data read will take place, refreshing the pivot table content, and making sales figures by year show correctly.

The state of the relationship. Active and inactive relationships
In a data model may happen that a table where we have defined measures, several of his fields relate to a single field in another table that we will use to filter data from the first.When this happens, how can we know what field into measures table is relating to the filter table when we analyze the model for both tables? For those readers experienced OLAP cubes development, the question to ask is: what field of the fact table is relating to the dimension table?
The answer to this question is found in the relationship status (active or inactive), and the best way to illustrate this concept is through an example.Therefore, we create a new tabular project in SSDT, whose data source is this time the AdventureWorksDW2012 database, importing into the model, as in the previous example, FactInternetSales and DimDate tables, and also defining a measure with the sum of SalesAmount column.
When we analyze the model in Excel, placing the field CalendarYear in the row labels, the resulting numbers correspond to sales by invoice date.

That will be the same figures we obtained from the SQL query presented in a previous section.

At this point is where the reader will surely wonder: "How does the model I want to get the sum of SalesAmount field based on OrderDateKey field, and not on DueDateKey or ShipDateKey, which are also type date? "
Actually, the model does not know, but merely it uses the active relationship of the model. Let us explain this in more detail:
If we look at the designer's model diagram view, we'll see that from the table FactInternetSales three relationships go to DimDate table.Between the arrows indicating visually the relationships, the one with the continuous line trace corresponds to the active relationship, and therefore, will be used by default when you see the model using the field CalendarYear from DimDate table.Clicking on this relationship, we'll see its detail in the Visual Studio properties window . Note that the Active property is True.
We can also find out the working relationship, and therefore inactive by selecting in Visual Studio the option menu "Table | Manage Relationships", which will open the relationship management window, where the Active column report us this particular aspect.
Querying the model through a inactive relationship
The issue to ask now would be how to question the model, so that it provide us the sum of SalesAmount field, but using any of the other two date fields (ShipDateKey or DueDateKey) that are also related to the DateKey field in DimDate table. In other words, what we need is to alter the context of the currently active default filter.
The solution is to create a new measure, which will also sum the field SalesAmount, but somehow indicating that the grouping field will be one other than the default.For example, if we want the sales results through ShipDateKey field, we'll use the following DAXexpression:
VentasPorFechaEnvio: = CALCULATE (SUM ([SalesAmount]); USERELATIONSHIP (FactInternetSales [ShipDateKey] DimDate [DateKey]))
To add this measure to the data model, within the area reserved for the measures we will click on an empty cell below the SalesAmount column, and write the previous expression in the formulas / expressions panel.
Consider this expression by parts: we use the SUM() function since our main goal is to add the SalesAmount field, but when grouping the sum, we want use the ShipDateKey field through its relationship with the DateKey field of DimDate table, so we must explicitly designate that relationship because isn't active, and this is what we do with USERELATIONSHIP() function.But because of using a relationship that is not active, we need to change the current filter context, what we get using the CALCULATE() function.

As we did in previous examples, if we transfer this to a SQL query to check that the values obtained are the same, the sentence would be as illustrated below.

Conclusions
In this paper we have made an introduction to the relationships in tabular models of the new semantic model for Business Intelligence built in SQL Server 2012 2012.This feature represents a cornerstone in building analysis systems using this technology, so we hope this article will help the reader to start in developing their own analysis models using BISM. The worked examples throughout the article are available at this link.
After data generation in Excel explained in the first part of the
article, in this second installment we'll show how to insert that information
into a SQL Server database.
Database creation
After spreadsheet creation, we'll transfer its contents into a SQL
Server database running the following script from SQL Server Management Studio.
USE master
GO
CREATE DATABASE Poblacion
GO
USE Poblacion
GO
CREATE TABLE DatosPoblacion (
Fila_ID int NOT NULL,
Edad_ID int NULL,
Sexo_ID char(1) NULL,
CCAA_ID int NULL,
Pais_ID int NULL,
Fecha_Alta datetime NULL,
CONSTRAINT PK_DatosPoblacion PRIMARY KEY CLUSTERED (Fila_ID ASC))
GO
CREATE TABLE Sexo (
Sexo_ID char(1) NOT NULL,
Sexo_DS varchar(10) NULL,
CONSTRAINT PK_Sexo PRIMARY KEY CLUSTERED (Sexo_ID ASC))
GO
CREATE TABLE CCAA (
CCAA_ID int NOT NULL,
CCAA_DS varchar(50) NULL,
CONSTRAINT PK_CCAA PRIMARY KEY CLUSTERED (CCAA_ID ASC))
GO
CREATE TABLE Pais (
Pais_ID int NOT NULL,
Pais_DS varchar(50) NULL,
CONSTRAINT PK_Paises PRIMARY KEY CLUSTERED (Pais_ID ASC))
GO
As we have seen, in addition to the table that will house the data
generated from Excel, we will also create the catalog tables, which contain
descriptions of some fields of code existing in DatosPoblacion table with which
establish the necessary relationships.

Import Excel data from
SQL Server
To insert data into DatosPoblacion table, we'll use Transact-SQL's
OPENROWSET function as follows.
INSERT INTO DatosPoblacion
SELECT Fila_ID,Edad_ID,Sexo_ID,CCAA_ID,Pais_ID,Fecha_Alta
FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0',
'Excel 8.0;Database=C:\DatosOrigen\GenerarDatosPoblacion.xlsx',
'SELECT * FROM [Hoja1$]')
However, it's possible we get an error when trying to execute this
insert sentence, which informs us that SQL Server is not configured to query in
this way, called 'Ad Hoc Distributed Queries'.
If we want to check the SQL Server settings, we must run the system
stored procedure sp_configure. However, it is likely that among the options
listed, we do not see the distributed queries configuration. If we are in this
case, we must enable the advanced options display using the following
sentences.
EXECUTE sp_configure 'show advanced options',1
GO
RECONFIGURE
GO
Now we see the value of 'Ad Hoc Distributed Queries' option by
executing sp_configure.To enable it execute the following.
EXECUTE sp_configure 'Ad Hoc Distributed Queries',1
GO
RECONFIGURE
GO
When we run sp_configure back, we will already see activated the ability
to run distributed queries.

Thus, the previous sentence with OPENROWSET will work properly, filling
the DatosPoblacion table with the content of the GenerarDatosPoblacion.xlsx
file.

Optimizing data import
from Excel
However, at the current point we may face a performance problem, because
if we followed the above steps for creating the Excel file, we have a
spreadsheet with one million rows, which can take about fifteen minutes to load
into SQL Server table. For the sample developed in this article, we have used a virtual machine
with Windows 7 operating system and 1.5 GB RAM, so that the above times may
vary depending on the configuration used for these tests.
If we want to reduce these load times we may opt an alternative
technique for transferring data which explain below.
In first place we will open back the file GenerarDatosPoblacion.xlsx,
saving it as "CSV (Comma delimited)" type.

In this way we'll get a text file with fields delimited by the semicolon
character. We can see its contents by opening it with Notepad.

Then we will create a new table in the database with the following
structure.
CREATE TABLE DatosPoblacionExcel (
Fila_ID varchar(20) NULL,
Edad_ID varchar(20) NULL,
Sexo_ID varchar(20) NULL,
CCAA_ID varchar(20) NULL,
Pais_ID varchar(20) NULL,
Anualidad varchar(20) NULL,
Mes varchar(20) NULL,
Dia varchar(20) NULL,
Fecha_Alta varchar(20) NULL)
Into this new table we will import the contents of
GenerarDatosPoblacion.csv, using the Transact-SQL statement BULK INSERT.
Through the FIELDTERMINATOR option, we'll specify the character used as field
separator, while the option FIRSTROW indicate that the reading of data starts
in the second row of the file, because the first contains the column names.
BULK INSERT DatosPoblacionExcel
FROM 'C:\DatosOrigen\GenerarDatosPoblacion.csv'
WITH (FIELDTERMINATOR =';', FIRSTROW=2)
The time consumed by the bulk insert operation in the table
DatosPoblacionExcel be about thirty seconds.
To finish this process, we will insert in the table DatosPoblacion the
records from DatosPoblacionExcel table, excluding the unnecessary fields, as
shown in the following statement, whose execution will take approximately 15
seconds.
INSERT INTO DatosPoblacion
SELECT Fila_ID,Edad_ID,Sexo_ID,CCAA_ID,Pais_ID,Fecha_Alta
FROM DatosPoblacionExcel
As we have seen, this inserting data technique, but requires us to
perform an additional step, is a significant gain in time, employing less than
a minute in the transfer of data to the table DatosPoblacion, compared to the
fifteen minutes used by the OPENROWSET function.
Importing the other
catalog tables
For tables CCAA and Pais, we will use two Excel files containing,
respectively, the official classification of regions
and countries .These files are available in
compressed format on the website of the Madrid Community Statistics Institute. Once downloaded and unzipped we'll
run the following SQL statements for import into our database.
INSERT INTO CCAA
SELECT DISTINCT ccaa, liteccaa
FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;Database=C:\DatosOrigen\ccaaprov.xls',
'SELECT * FROM [ccaaprov$]')
INSERT INTO Pais
SELECT DISTINCT isopais,lpaisc
FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;Database=C:\DatosOrigen\cozonu.xls',
'SELECT * FROM [cozonu$]')
Note that depending on the version of Excel file to import, in
OPENROWSET function we'll use a different provider to
get the data.If the file is for Excel 2007-2010 will use
'Microsoft.ACE.OLEDB.12.0 ', while for earlier versions will be
"Microsoft.Jet.OLEDB.4.0 '.
Setting the country
code in the table DatosPoblacion
A closer look at the records of the Pais table will unveil that Pais_ID
field values are uncorrelated, being also the lowest value 4 and the highest
894.

This contrasts with existing data in the field of the same name for the
table DatosPoblacion because, although the maximum and minimum value of this
field is also between 4 and 894, we find a lot of records where the field
Pais_ID not correspond to any value in the Pais table.
To solve this problem we resort to a couple of techniques, of which the
first will be taken from DatosPoblacion table, each Pais_ID field values that
do not exist in the Pais table, adding one until we reach a value which does
exist in the countries catalog table mentioned. We will implement this
process in the SQL Server function below.
CREATE FUNCTION dbo.ObtenerPais(@nPais_ID int)
RETURNS int
AS
BEGIN
WHILE (SELECT COUNT(*) FROM Pais WHERE Pais_ID = @nPais_ID) = 0
BEGIN
SET @nPais_ID = @nPais_ID + 1
END
RETURN @nPais_ID
END
GO
The update of field Pais_ID in table DatosPoblacion the take out with
the next statement.
UPDATE DatosPoblacion
SET Pais_ID = dbo.ObtenerPais(Pais_ID)
WHERE Pais_ID NOT IN (SELECT Pais_ID FROM Pais)
The second technique is more direct, as it avoids the use of the search
function of the field Pais_ID in the countries table.What we do here is an
update of the field Pais_ID for the entire table DatosPoblacion, looking in the
Pais table, the value of Pais_ID closest to that exists in the same field in the
table DatosPoblacion.
UPDATE DatosPoblacion
SET Pais_ID = (SELECT TOP 1 Pais.Pais_ID FROM Pais
WHERE Pais.Pais_ID >= DatosPoblacion.Pais_ID
ORDER BY Pais.Pais_ID)
In both cases, we get that all the records in the DatosPoblacion table
join correctly with the Pais table through Pais_ID field.
Manual data entry
After the above operations, the only remaining empty table is Sexo, so
we'll run the following statements, which will create the necessary records.
INSERT INTO Sexo VALUES ('H','Hombre')
INSERT INTO Sexo VALUES ('M','Mujer')
Establishing
relationships between tables
To finish the database creation, we will establish the appropriate
relationships between fields in the table DatosPoblacion and other catalog
tables using the following sentences.
ALTER TABLE DatosPoblacion WITH CHECK ADD
CONSTRAINT FK_DatosPoblacion_Sexo FOREIGN KEY(Sexo_ID) REFERENCES Sexo (Sexo_ID),
CONSTRAINT FK_DatosPoblacion_CCAA FOREIGN KEY(CCAA_ID) REFERENCES CCAA (CCAA_ID),
CONSTRAINT FK_DatosPoblacion_Pais FOREIGN KEY(Pais_ID) REFERENCES Pais (Pais_ID)
After this operation we finish this article, which explained the
different parts of a process for generating, from Excel, a considerable amount
of test data that can be used later from SQL Server.We hope you find useful.
Regards.
Creating sample data
During the application development stages, in most of the time, we are
faced with the need to have at our disposal a set of test data to use in the
different processes that are developing. A similar situation occurs if we are
building an information system based on OLAP data cubes using SQL Server
Analysis Services, because in these cases we will need also a large volume of
data, to perform analysis simulations.
Suppose we have to generate a database of population data, with a table
containing data of individuals such as age, code region of residence, national
origin, sex, registration date, etc. On the other hand, also need
a series of catalog tables of countries, regions, and other values related to
the fields in the table of individuals.
Among the full range of utilities, tricks, and more, which exist to
carry out this task, in this article we will use Excel as a tool for generating
the dummy data set for individuals that later we'll dump in a SQL Server
database, which could be used as a data source for the application or
information system.
Creating data with
Excel
After starting Excel 2010 (earlier versions may also use), our first
task will be to create a column with the values that will serve to identify
the rows in the table. The simplest way to generate is to introduce a pair
of consecutive numbers in individual cells in a column of the worksheet, select
both cells and drag the fill handle to the last row for which we want to
generate the numbers.

However, the generation of values using this technique can be somewhat
cumbersome in the case that we must to produce a large number of rows and / or
columns, so that for the creation of data for all columns of the sheet we'll
use a more flexible yet powerful solution: Excel macros.
Creating a Macro
To create a macro we will click on the View tab of the Excel ribbon, and
within the Macros group we will click on
the option of the same name, which opens the Macro dialog box, where we'll write
the macro name: CrearDatosPoblacion.

Clicking the Create button opens the Visual Basic for Applications (VBA)
window editor, so that we can start writing the code of the macro.

Within the procedure body CrearDatosPoblación, write the following code block, where first, clean up the cells of the spreadsheet where we are
currently positioned. Then ask the user to enter through an InputBox
dialog box, the number of rows to generate. After inserting the column title,
we will introduce the first two values that start the series, which generate
using the method Selection.AutoFill.
Option Explicit
Sub CrearDatosPoblacion()
Dim nFilaDestino As Long
Dim sCeldaOrigen As String
Dim sCeldaDestino As String
'clean worksheet cells
Cells.Select
Selection.ClearContents
nFilaDestino = InputBox("Number of records to generate")
nFilaDestino = nFilaDestino + 1
'fila_id column
'---------------
'column caption
Range("A1").Select
ActiveCell.FormulaR1C1 = "Fila_ID"
'init values of series to generate
Range("A2").Select
ActiveCell.FormulaR1C1 = "1"
Range("A3").Select
ActiveCell.FormulaR1C1 = "2"
'select init values range
Range("A2:A3").Select
'fill total range of cells
sCeldaOrigen = "A2"
sCeldaDestino = "A" & nFilaDestino
Selection.AutoFill Destination:=Range(sCeldaOrigen & ":" & sCeldaDestino), Type:=xlFillDefault
End Sub

To run the macro will select the menu "Run | Run Sub / UserForm", or
press the key F5, filling the first column of the sheet with the number of
values indicated in the InputBox.

Before continuing we'll save our work in VBA or Excel, bearing in mind
that we must do it in a file of type "Excel Macro-Enabled Workbook (.xlsm)", what
we will be notified by a dialog box in time to save. We will click "No" in that
dialog and save our worksheet in a file named GenerarDatosPoblacion.xlsm.

Generating random data
The next column to create correspond to the age of individuals, in it,
need to generate random values within a range of numbers, representing the
maximum and minimum age that a person can have, for example between 0 and 120.
To create a random value between two numbers in a cell of Excel, we can
use a formula with RANDBETWEEN function, that receives as parameter the above
numbers, returning as a result the number generated.

Moving this functionality to the macro that we developed, we will add to
it the next code block, which assign the formula expression RANDBETWEEN
(0,120) to a cell in the second column. We will select that cell, and repeat
the formula over a range of cells using the method ActiveCell.AutoFill.
'age
'----
Range("B1").Select
ActiveCell.FormulaR1C1 = "Edad_ID"
sCeldaOrigen = "B2"
sCeldaDestino = "B" & nFilaDestino
Range(sCeldaOrigen).Select
ActiveCell.FormulaR1C1 = "=RANDBETWEEN(0,120)"
ActiveCell.AutoFill Destination:=Range(sCeldaOrigen & ":" & sCeldaDestino), Type:=xlFillDefault
Running the macro again, we'll get the new column with the age data.

Our next step is for the column of sex of individuals. Here
we could have used numbers 1 and 2, also generated randomly, to identify male
and female respectively, but let's get a little this operation, using instead
the letters H and M as values for the column. So the problem now is how to
randomly generate these letters in the cells of the column, since RANDBETWEEN only
receives and returns numerical results.
The solution is very simple, as it also involves using RANDBETWEEN
function, but combining it with the decision expression IF. We will pass to RANDBETWEEN
the numbers 1 and 2 as parameters, and according to the result obtained, IF
return the letter H or M. Below is the code block for this column, we will add
to the macro.
'sex
'---
Range("C1").Select
ActiveCell.FormulaR1C1 = "Sexo_ID"
sCeldaOrigen = "C2"
sCeldaDestino = "C" & nFilaDestino
Range(sCeldaOrigen).Select
ActiveCell.FormulaR1C1 = "=IF(RANDBETWEEN(1,2)=1,""H"",""M"")"
ActiveCell.AutoFill Destination:=Range(sCeldaOrigen & ":" & sCeldaDestino), Type:=xlFillDefault

For the next two columns: code region of residence and country of origin
code, we'll follow the same mechanics as the column ages, although using
different numerical ranges.
'code region of residence
'------------------------
Range("D1").Select
ActiveCell.FormulaR1C1 = "CCAA_ID"
sCeldaOrigen = "D2"
sCeldaDestino = "D" & nFilaDestino
Range(sCeldaOrigen).Select
ActiveCell.FormulaR1C1 = "=RANDBETWEEN(1,19)"
ActiveCell.AutoFill Destination:=Range(sCeldaOrigen & ":" & sCeldaDestino), Type:=xlFillDefault
'country
'-------
Range("E1").Select
ActiveCell.FormulaR1C1 = "Pais_ID"
sCeldaOrigen = "E2"
sCeldaDestino = "E" & nFilaDestino
Range(sCeldaOrigen).Select
ActiveCell.FormulaR1C1 = "=RANDBETWEEN(4,894)"
ActiveCell.AutoFill Destination:=Range(sCeldaOrigen & ":" & sCeldaDestino), Type:=xlFillDefault

Our next step is to create the data of an alleged registration date of
individuals in this population system, a task that will make in two phases. First,
we create each part of the date in separate columns.
'date elements:
'==============
'year
'---
Range("F1").Select
ActiveCell.FormulaR1C1 = "Anualidad"
sCeldaOrigen = "F2"
sCeldaDestino = "F" & nFilaDestino
Range(sCeldaOrigen).Select
ActiveCell.FormulaR1C1 = "=RANDBETWEEN(2008,2010)"
ActiveCell.AutoFill Destination:=Range(sCeldaOrigen & ":" & sCeldaDestino), Type:=xlFillDefault
'month
'-----
Range("G1").Select
ActiveCell.FormulaR1C1 = "Mes"
sCeldaOrigen = "G2"
sCeldaDestino = "G" & nFilaDestino
Range(sCeldaOrigen).Select
ActiveCell.FormulaR1C1 = "=RANDBETWEEN(1,12)"
ActiveCell.AutoFill Destination:=Range(sCeldaOrigen & ":" & sCeldaDestino), Type:=xlFillDefault
'day
'---
Range("H1").Select
ActiveCell.FormulaR1C1 = "Dia"
sCeldaOrigen = "H2"
sCeldaDestino = "H" & nFilaDestino
Range(sCeldaOrigen).Select
ActiveCell.FormulaR1C1 = "=IF(RC[-1]=2, RANDBETWEEN(1,28), IF(OR(RC[-1]=4, RC[-1]=6, RC[-1]=9, RC[-1]=11), RANDBETWEEN(1,30), RANDBETWEEN(1,31)))"
ActiveCell.AutoFill Destination:=Range(sCeldaOrigen & ":" & sCeldaDestino), Type:=xlFillDefault
As we have seen, the day is the part of the date that requires further
work, as we must be careful to avoid the generation, for example, of a day 31 if
in the month column we have the value 2. We will solve this checking the month
column in first place, and according this we'll use a different range to
generate the day. So that in the formula we'll use several IF expressions
combined with RANDBETWEEN functions.

In the second phase of this operation, we will create a final column in
the spreadsheet with a date in a format understandable to SQL Server, as a
result of the concatenation of the above columns: Anualidad, Mes and Dia.
It is necessary to take into account when composing the date in this
way, we must add a zero to the month and day when these values are only one
digit.
'date composition
'----------------
Range("I1").Select
ActiveCell.FormulaR1C1 = "Fecha_Alta"
sCeldaOrigen = "I2"
sCeldaDestino = "I" & nFilaDestino
Range(sCeldaOrigen).Select
ActiveCell.FormulaR1C1 = "=RC[-3] & IF(LEN(RC[-2])=1,""0"" & RC[-2],RC[-2]) & " & _
"IF(LEN(RC[-1])=1,""0"" & RC[-1],RC[-1]) "
ActiveCell.AutoFill Destination:=Range(sCeldaOrigen & ":" & sCeldaDestino), Type:=xlFillDefault
When we rerun the macro, we'll get the correctly formatted date in the
last column. This time also establish the number of records to generate in
a million, that way we will test the data creation power of our process.

To complete operations in Excel we'll save the file as "Excel
Workbook (*.xlsx)". Dialog box appears again warning that we couldn't save
macros in a .xlsx file, which we'll reply by clicking Yes.
At this point we conclude the first part of the article, in the second
installment will explain how to transfer the data just generated to SQL Server.
Regards,
This article addresses the challenge of develop a process to create a database with demographic information, which serves as a starting point for creating a data model in PowerPivot, used in the construction of the population pyramid shown in Population pyramids with PowerPivot. Preparing the data and Population pyramids with PowerPivot. Chart development.
Unlike the article "Generate SQL Server test data from Excel" (part 1 and part 2) also published in this blog, where we used Excel to generate dummy data population, this time we will work with real data, which represent a larger volume of information to manage with respect the above-mentioned article.
The current aim is to create a database containing a population table, in which each record represents an individual, with its corresponding age, sex and healthcare area.
Before proceeding, I want to express my gratitude to Ricard Gènova Maleras, a demographer specializing in population health analysis, part of the Health Service Reports and Studies (Directorate of Health Promotion and Prevention, DG Primary Care, Health Ministry CM) for their valuable advice in those key demographics for the development of this and the all related population pyramids published in this blog; and the other members of that Service: Jenaro Astray Mochales, Felicitas Dominguez Berjón, María Dolores Esteban Vasallo and Beatriz Elvira Rodriguez, for their support and assistance in all health concepts about population necessary for the proper focus of this article.
Last but not least, thanks to Enrique Barceló, fellow sufferer in this work of BI, for the interesting times we chatting about matters relating to the development of information systems in general and data cubes in particular, and for sharing his amazing knowledge of OLAP with the rest of the team.
Data collection and preparation
There are several Web sites belonging to different agencies, where we will find demographic information needed to perform our analysis, organized by criteria such as sex, age, geographic region, etc. Of all these agencies to highlight the relevant United Nations , National Statistical Institute (Spain), Institute of Statistics of the CAM (Madrid) and CELADE (demographics of Latin America and Caribbean). In all cases, in addition to direct consultation in the website, we have the ability to download information in various formats such as Excel, CSV, etc.
For the development of the examples in this article we'll use the registered population data from the Community of Madrid for the year 2010, classified by age, sex and health zoning, effective as of that year, found in the following link (T10Z2_SALUD10) from the website of the Statistical Institute of CAM.
Data extraction strategy
The Excel file pc10t10z2_salud10.xls obtained from the above link, organizes the data of population around three worksheets containing respectively, the values of total population, men and women.
Each worksheet has in its first two columns the code and name of the health care area, while the remaining columns contain population figures classified in five-year groups, where each group has a column with the total five-year and several columns representing the detail by simple age of the ages that make up the group.

The data we need extract are, first, the codes and names of health zoning, for which we can use any of the worksheets in the Excel file. Moreover, we have to get the total five-year age group in both sheets: male and female population.
We will bring all these values to a new worksheet in which we copy each combination of health area codes, age group and population type (male / female) in order of steps similar to the following figure.

Here we see an approximation of what would be the resulting spreadsheet.
Using select, copy and paste operations by hand to transfer the needed data to a new sheet can be a cumbersome and heavy task. That is why we propose to use a macro that automates all this work, allowing us to repeat the process as many times as we want, and apply it to other populations whose data is structured the same way.
To create the macro, on the Excel Ribbon will click on the "Macros" option, belonging to the same name group, located on the "View" tab.
This option is also available on the "Developer" tab, within "Code" group.
In the event that the "Developer" tab is not visible, we will click on the "File" tab, and within this on "Options". In "Excel Options" window will click "Customize Ribbon" and displaying the list on the right select the item "Main Tabs". In the panel below the list will check "Developer", which will make this tab visible in the ribbon.
Regardless the way chosen for its creation, the "Macro" window will appear, we'll give the name "TraspasarDatosPoblacion" to our macro and click "Create", which will lead to the VBA (Visual Basic for Applications) editor, where we write the code for the macro.
We will divide the work to carry out the macro into three parts: data creation for health area code column, age range column, and finally, the columns of numbers of population by sex.
In the next block of code we can see the instructions responsible of creating a new worksheet, which assign a name and column headings. Later we'll select the area codes from a source worksheet, which will paste in the new sheet many times as there age ranges.
Option Explicit
Private Const RANGOS_COLUMNAS As Integer = 20
Sub TraspasarDatosPoblacion()
' create new sheet and assign name
Dim sHojaDestino As String
sHojaDestino = "DatosBasePoblacion"
Dim oWorksheet As Worksheet
Set oWorksheet = Sheets.Add(After:=Sheets(Sheets.Count))
oWorksheet.Name = sHojaDestino
' assign column titles
Range("A1").Select
ActiveCell.FormulaR1C1 = "Zona"
Range("B1").Select
ActiveCell.FormulaR1C1 = "Rango_Edad"
Range("C1").Select
ActiveCell.FormulaR1C1 = "Poblacion_H"
Range("D1").Select
ActiveCell.FormulaR1C1 = "Poblacion_M"
' back to source data sheet
Sheets("Hombres").Select
' set start and end data rows
Dim sFilaInicial As String
Dim sFilaFinal As String
sFilaInicial = "14"
sFilaFinal = "299"
' calculate number of existing zones
Dim nCantidadZonas As Integer
nCantidadZonas = (CInt(sFilaFinal) - CInt(sFilaInicial))
' zone codes
' ---------------
Dim sColumna As String
sColumna = "A"
' select codes, copy to clipboard and paste in a new sheet
Range(sColumna & sFilaInicial & ":" & sColumna & sFilaFinal).Select
Selection.Copy
Sheets(sHojaDestino).Select
Dim nContador As Integer
Dim nFilaVacia As Long
' do paste for each age group
For nContador = 0 To RANGOS_COLUMNAS
' obtain next empty row number from column
nFilaVacia = FilaVaciaEnColumna(1)
' place in cell
Range(sColumna & CStr(nFilaVacia)).Select
' paste data
ActiveSheet.Paste
Next
' assign white color background to column
Columns("A:A").Select
Selection.Interior.Pattern = xlNone
Selection.Interior.TintAndShade = 0
Selection.Interior.PatternTintAndShade = 0
Before each assignment of values in cells, we need to know the empty row from which paste the data. For this, we'll use the auxiliary function "FilaVaciaEnColumna" to iterate the collection of rows in the sheet to find the row containing an empty cell in that column on which we are working.
Function FilaVaciaEnColumna(ByVal nPosicionCelda As Integer) As Long
Dim vFila As Variant
Dim nFila As Long
For Each vFila In ActiveSheet.Rows
nFila = vFila.Row
If vFila.Cells(nPosicionCelda).Text = "" Then
Exit For
End If
Next
FilaVaciaEnColumna = nFila
End Function
The next phase in creating the macro will be to assign values to the column age ranges. So we'll create an array containing these values, and to cross it, in each iteration, the current range value will be copied the same number of times than the amount of existing health zones. Note that before assigning values to this column we will apply the text format (property Selection.NumberFormat), since otherwise, the default format may cause some values of the age ranges are interpreted as month-year.
' age ranges
' --------------
Dim aRangosEdad(RANGOS_COLUMNAS) As String
aRangosEdad(0) = "0-4"
aRangosEdad(1) = "5-9"
aRangosEdad(2) = "10-14"
aRangosEdad(3) = "15-19"
aRangosEdad(4) = "20-24"
aRangosEdad(5) = "25-29"
aRangosEdad(6) = "30-34"
aRangosEdad(7) = "35-39"
aRangosEdad(8) = "40-44"
aRangosEdad(9) = "45-49"
aRangosEdad(10) = "50-54"
aRangosEdad(11) = "55-59"
aRangosEdad(12) = "60-64"
aRangosEdad(13) = "65-69"
aRangosEdad(14) = "70-74"
aRangosEdad(15) = "75-79"
aRangosEdad(16) = "80-84"
aRangosEdad(17) = "85-89"
aRangosEdad(18) = "90-94"
aRangosEdad(19) = "95-99"
aRangosEdad(20) = "100+"
' assign format of age column to text
sColumna = "B"
Columns(sColumna & ":" & sColumna).Select
Selection.NumberFormat = "@"
' repeat each value of aRangosEdad array
' as times as number of existing zones
Dim vRangoEdad As Variant
For Each vRangoEdad In aRangosEdad
' obtain next empty row number from column
nFilaVacia = FilaVaciaEnColumna(2)
' assign age range value in next two cells down
Range(sColumna & CStr(nFilaVacia)).Select
ActiveCell.FormulaR1C1 = vRangoEdad
Range(sColumna & CStr(nFilaVacia + 1)).Select
ActiveCell.FormulaR1C1 = vRangoEdad
' set cell range and fill with value of above cells
Range(sColumna & CStr(nFilaVacia) & ":" & sColumna & CStr(nFilaVacia + 1)).Select
Selection.AutoFill Destination:=Range(sColumna & CStr(nFilaVacia) & ":" & sColumna & CStr(nFilaVacia + nCantidadZonas)), Type:=xlFillDefault
Next
And we end the macro with the assignment of the columns dedicated to the population figures by sex, where this time use two combined arrays containing the column headings that correspond to the five-year total population and population type. When touring both in a nested way, for a certain type of population, we'll extract their numbers of individuals, assigning them to the destination worksheet.
' population figures by sex
' ----------------------------
Dim aColumnasPoblacion(RANGOS_COLUMNAS) As String
aColumnasPoblacion(0) = "D"
aColumnasPoblacion(1) = "J"
aColumnasPoblacion(2) = "P"
aColumnasPoblacion(3) = "V"
aColumnasPoblacion(4) = "AB"
aColumnasPoblacion(5) = "AH"
aColumnasPoblacion(6) = "AN"
aColumnasPoblacion(7) = "AT"
aColumnasPoblacion(8) = "AZ"
aColumnasPoblacion(9) = "BF"
aColumnasPoblacion(10) = "BL"
aColumnasPoblacion(11) = "BR"
aColumnasPoblacion(12) = "BX"
aColumnasPoblacion(13) = "CD"
aColumnasPoblacion(14) = "CJ"
aColumnasPoblacion(15) = "CP"
aColumnasPoblacion(16) = "CV"
aColumnasPoblacion(17) = "DB"
aColumnasPoblacion(18) = "DH"
aColumnasPoblacion(19) = "DN"
aColumnasPoblacion(20) = "DT"
Dim aTiposPoblacionColumnas(1) As String
aTiposPoblacionColumnas(0) = "Hombres,C,3"
aTiposPoblacionColumnas(1) = "Mujeres,D,4"
Dim vTipoPoblacionColumna As Variant
Dim sTipoPoblacion As String
Dim nPosicionColumna As Integer
Dim vColumnaPoblacion As Variant
' for each population type
For Each vTipoPoblacionColumna In aTiposPoblacionColumnas
' obtain information of population type:
' sex, target column, numeric position in target column
sTipoPoblacion = Split(vTipoPoblacionColumna, ",")(0)
sColumna = Split(vTipoPoblacionColumna, ",")(1)
nPosicionColumna = Split(vTipoPoblacionColumna, ",")(2)
' iterate the columns with population figures
' and paste in new sheet
For Each vColumnaPoblacion In aColumnasPoblacion
' place in sheet with source population data
Sheets(sTipoPoblacion).Select
' select and copy cells
Range(vColumnaPoblacion & sFilaInicial & ":" & vColumnaPoblacion & sFilaFinal).Select
Selection.Copy
' place in target sheet
Sheets(sHojaDestino).Select
' obtain next empty row, place in cell and paste data
nFilaVacia = FilaVaciaEnColumna(nPosicionColumna)
Range(sColumna & CStr(nFilaVacia)).Select
ActiveSheet.Paste
Next
Next
' change column style to display cell borders
Columns("C:D").Select
Selection.Style = "Normal"
Range("A1").Select
End Sub
Then create a new macro with the name "TraspasarDatosZonificacion" whereby we'll move to a new worksheet codes and names of the healthcare areas. These data will be used to load a table in the database we'll create later.
Sub TraspasarDatosZonificacion()
' create new sheet and assign name
Dim sHojaDestino As String
sHojaDestino = "DatosZonificacion"
Dim oWorksheet As Worksheet
Set oWorksheet = Sheets.Add(After:=Sheets(Sheets.Count))
oWorksheet.Name = sHojaDestino
' set column titles
Range("A1").Select
ActiveCell.FormulaR1C1 = "Zona_ID"
Range("B1").Select
ActiveCell.FormulaR1C1 = "Zona_DS"
' back to data sheet
Sheets("Hombres").Select
' copy zone codes and names to clipboard
Range("A14:B299").Select
Selection.Copy
' back to target sheet and paste data
Sheets(sHojaDestino).Select
Range("A2").Select
ActiveSheet.Paste
' remove cell background color
Columns("A:B").Select
Selection.Interior.Pattern = xlNone
Selection.Interior.TintAndShade = 0
Selection.Interior.PatternTintAndShade = 0
Range("A1").Select
End Sub
When we copy the codes and names of the health zones to new sheets within the Excel file, the color configuration of the cells corresponding to these values is copied too. To achieve white background color in these cells, we use the following lines of code.
Selection.Interior.TintAndShade = 0
Selection.Interior.PatternTintAndShade = 0
But if our version of Excel is prior to 2007 these lines will fail at runtime, so we must comment them to avoid being executed.
Finally, from the macro management window, we'll select each one and click on "Run".
As a result we will obtain two new sheets with the data generated by the macros.

Database creation
Our next step is to create the database PiramidePoblacion, to accommodate the information we have just prepared in Excel. The following code block shows the Transact-SQL script that will be executed in our SQL Server instance (in this article we have used SQL Server 2008 R2).
CREATE DATABASE PiramidePoblacion
GO
USE PiramidePoblacion
GO
CREATE TABLE DatosBasePoblacion
(
Fila_ID int IDENTITY(1,1) NOT NULL,
Zona varchar(3) NULL,
Rango_Edad varchar(10) NULL,
Poblacion_H int NULL,
Poblacion_M int NULL
)
GO
CREATE TABLE Poblacion
(
Fila_ID int IDENTITY(1,1) NOT NULL,
Zona_ID varchar(3) NULL,
Edad_ID int NULL,
Sexo_ID char(1) NULL,
CONSTRAINT PK_Poblacion PRIMARY KEY CLUSTERED (Fila_ID ASC)
)
GO
CREATE TABLE Zona
(
Zona_ID varchar(3) NOT NULL,
Zona_DS varchar(30) NULL,
CONSTRAINT PK_Zona PRIMARY KEY CLUSTERED (Zona_ID ASC)
)
GO
CREATE TABLE Edad
(
Edad_ID int NOT NULL,
Edad_Grupo varchar(20) NULL,
CONSTRAINT PK_Edad PRIMARY KEY CLUSTERED (Edad_ID ASC)
)
GO
CREATE TABLE Sexo
(
Sexo_ID char(1) NOT NULL,
Sexo_DS varchar(10) NULL,
CONSTRAINT PK_Sexo PRIMARY KEY CLUSTERED (Sexo_ID ASC)
)
GO
The table Poblacion will contain the main data of our population, while tables Zona, Edad and Sexo contain catalog information (code / descriptor).
Surely, in the script we will have noticed the lack of foreign keys between Poblacion table and the rest. This is an intentional forgetting that we'll fix in the articles Population pyramids with PowerPivot. Preparing the data and Population pyramids with PowerPivot. Chart development, where we demonstrate how using PowerPivot we can also create relationships between tables in the model.
After running the script, the first action that will take place in the new database will be transfer data from sheet DatosBasePoblacion of file pc10t10z2_salud10.xls into the table of the same name, using the following statement.
INSERT INTO DatosBasePoblacion
SELECT * FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0',
'Excel 8.0;Database=C:\DatosOrigen\pc10t10z2_salud10.xls',
'SELECT * FROM [DatosBasePoblacion$]')
Generating population records in the database
The purpose of the process that we are doing so far is to generate, for table Poblacion, a number of records that represent the population with which we work, and that will equal the sum of columns Poblacion_H and Poblacion_M from table DatosBasePoblacion.
For this volume of data, using a loop to create one by one the records is totally unfeasible due to time and performance reasons, so we must find an alternative technique that allows us to work using sets of results, to create the necessary number of records, using a small number of operations.
The technique that we will opt is shown in the book Microsoft SQL Server 2008: T-SQL Querying , by Itzik Ben-Gan and other great names in SQL Server: Lubor Kollar, Dejan Sarka and Steve Kass; and is use several "Common Table Expressions" or CTE nested, through which generate a result set of appropiate size, that we will combine with an INSERT INTO statement, to add to Poblacion table a determined number of records. All this will include in the following stored procedure.
CREATE PROCEDURE GenerarRegistrosPoblacion
@nZona_ID AS int,
@sRango_Edad AS varchar(10),
@sSexo_ID AS char(1),
@nPoblacion AS int
AS
BEGIN
DECLARE @nEdad_ID AS int =
CASE
WHEN CHARINDEX('-',@sRango_Edad) = 0 THEN 100
WHEN CHARINDEX('-',@sRango_Edad) = 2 THEN CAST(LEFT(@sRango_Edad,1) AS int)
WHEN CHARINDEX('-',@sRango_Edad) = 3 THEN CAST(LEFT(@sRango_Edad,2) AS int)
END;
WITH
Numeros0 AS (SELECT 1 AS Numero UNION ALL SELECT 1),
Numeros1 AS (SELECT 1 AS Numero FROM Numeros0 AS TblPrim CROSS JOIN Numeros0 AS TblSeg),
Numeros2 AS (SELECT 1 AS Numero FROM Numeros1 AS TblPrim CROSS JOIN Numeros1 AS TblSeg),
Numeros3 AS (SELECT 1 AS Numero FROM Numeros2 AS TblPrim CROSS JOIN Numeros2 AS TblSeg),
Numeros4 AS (SELECT 1 AS Numero FROM Numeros3 AS TblPrim CROSS JOIN Numeros3 AS TblSeg),
Numeros5 AS (SELECT 1 AS Numero FROM Numeros4 AS TblPrim CROSS JOIN Numeros4 AS TblSeg),
NumerosTotal AS (SELECT ROW_NUMBER() OVER(ORDER BY (SELECT 0)) AS NumeroFila FROM Numeros5)
INSERT INTO Poblacion
SELECT @nZona_ID, @nEdad_ID, @sSexo_ID
FROM NumerosTotal
WHERE NumeroFila <= @nPoblacion;
END
GO
Then write a script which will cross the DatosBasePoblacion table, and each of their records will take the value of Poblacion_H and Poblacion_M fields, inserting in Poblacion table, by executing the stored procedure GenerarRegistrosPoblacion, a number of rows equal to value of the aforementioned fields.
DECLARE @nContador AS int = 1;
DECLARE @nDatosBaseFilaFinal AS int = (SELECT MAX(Fila_ID) FROM DatosBasePoblacion);
DECLARE @nZona AS int;
DECLARE @sRango_Edad AS varchar(10);
DECLARE @nPoblacion_H AS int;
DECLARE @nPoblacion_M AS int;
WHILE (@nContador <= @nDatosBaseFilaFinal)
BEGIN
SELECT @nZona = Zona,
@sRango_Edad = Rango_Edad,
@nPoblacion_H = Poblacion_H,
@nPoblacion_M = Poblacion_M
FROM DatosBasePiramidePoblacion
WHERE Fila_ID = @nContador;
EXECUTE GenerarRegistrosPoblacion @nZona, @sRango_Edad, 'H',@nPoblacion_H
EXECUTE GenerarRegistrosPoblacion @nZona, @sRango_Edad, 'M',@nPoblacion_M
SET @nContador += 1
END
The time spent in the execution of this script was 1 minute and 37 seconds in a virtual machine equipped with a Core 2 Duo CPU and 1.5 GB of RAM, which represents excellent performance given the large number of records added to Poblacion table.
We ended the preparing database operations with the statements used to add data in tables that serve as catalogs for the table Poblacion. As we see below, for the Zona table we also get the records from pc10t10z2_salud10.xls file.
--////
INSERT INTO Zona
SELECT * FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0',
'Excel 8.0;Database=C:\DatosOrigen\pc10t10z2_salud10.xls',
'SELECT * FROM [DatosZonificacion$]')
--////
DECLARE @nContador AS int = 0;
WHILE (@nContador <= 120)
BEGIN
INSERT INTO Edad
SELECT @nContador,
CASE
WHEN @nContador BETWEEN 0 AND 4 THEN '000-004'
WHEN @nContador BETWEEN 5 AND 9 THEN '005-009'
WHEN @nContador BETWEEN 10 AND 14 THEN '010-014'
WHEN @nContador BETWEEN 15 AND 19 THEN '015-019'
WHEN @nContador BETWEEN 20 AND 24 THEN '020-024'
WHEN @nContador BETWEEN 25 AND 29 THEN '025-029'
WHEN @nContador BETWEEN 30 AND 34 THEN '030-034'
WHEN @nContador BETWEEN 35 AND 39 THEN '035-039'
WHEN @nContador BETWEEN 40 AND 44 THEN '040-044'
WHEN @nContador BETWEEN 45 AND 49 THEN '045-049'
WHEN @nContador BETWEEN 50 AND 54 THEN '050-054'
WHEN @nContador BETWEEN 55 AND 59 THEN '055-059'
WHEN @nContador BETWEEN 60 AND 64 THEN '060-064'
WHEN @nContador BETWEEN 65 AND 69 THEN '065-069'
WHEN @nContador BETWEEN 70 AND 74 THEN '070-074'
WHEN @nContador BETWEEN 75 AND 79 THEN '075-079'
WHEN @nContador BETWEEN 80 AND 84 THEN '080-084'
WHEN @nContador BETWEEN 85 AND 89 THEN '085-089'
WHEN @nContador BETWEEN 90 AND 94 THEN '090-094'
WHEN @nContador BETWEEN 95 AND 99 THEN '095-099'
WHEN @nContador >= 100 THEN '100+'
END
SET @nContador += 1;
END
--////
INSERT INTO Sexo VALUES ('H','Hombre')
INSERT INTO Sexo VALUES ('M','Mujer')
Data loading completed
At this point we conclude the process of creation and loading of demographic information in our database. In the articles pointed out at the beginning, we will use this database as a starting point for the construction of population pyramids with PowerPivot.
Más artículos
Página siguiente >