Tuesday, October 13, 2020

Benefits and capabilities of Data Lake

Data is a key business asset for any organisation, so it must be kept secure and remain accessible to business users whenever it is required.
A Data Lake is a storage repository that holds a vast amount of raw data in its native format, including structured, semi-structured, and unstructured data. The data structure and requirements are not defined until the data is needed. In other words, a Data Lake is a more organic store of data, kept without regard for the perceived value or structure of that data.

The data lake is essential for any organization that wants to take full advantage of its data. The data lake arose because new types of data needed to be captured and exploited by the enterprise. As this data became increasingly available, early adopters discovered that they could extract insight through new applications built to serve the business.
Benefits and capabilities of Data Lake - A data lake supports the following capabilities:
  • To capture and store raw data at scale for a low cost – e.g. a Hadoop-based data lake
  • To store many types of data in the same repository – data lakes store the data as-is and support structured, semi-structured, and unstructured data
  • To perform transformations on the data
  • To define the structure of the data at the time it is used, referred to as schema on read
  • To perform new types of data processing
  • To perform single subject analytics based on particular use cases
  • To serve as a catch-all for anything that does not fit into the traditional data warehouse architecture
  • To be accessed by users with technical and/or data analytics skills; expecting users without those skills to work with a data lake directly is unrealistic
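
To make the "schema on read" idea from the list above more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the sample records, the field names and the helper `read_with_schema` are all hypothetical, and a real data lake would read files from distributed storage rather than an in-memory list.

```python
import json

# Hypothetical raw records, stored exactly as they arrived (no schema on write).
RAW_EVENTS = [
    '{"user": "u1", "amount": "19.99", "ts": "2020-10-13"}',
    '{"user": "u2", "amount": 5, "channel": "web"}',
    '{"user": "u3"}',
]


def read_with_schema(raw_lines, schema):
    """Apply a schema at read time: pick the named fields and cast each one.

    `schema` maps a field name to a casting function.
    Fields missing from a record come back as None.
    """
    rows = []
    for line in raw_lines:
        record = json.loads(line)
        row = {}
        for field, cast in schema.items():
            value = record.get(field)
            row[field] = cast(value) if value is not None else None
        rows.append(row)
    return rows


# Two consumers read the same raw data, each with its own schema.
billing_view = read_with_schema(RAW_EVENTS, {"user": str, "amount": float})
channel_view = read_with_schema(RAW_EVENTS, {"user": str, "channel": str})
```

The point of the sketch is that nothing is rejected or reshaped when the data is stored; each consumer decides which fields and types it needs only when it reads.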

Salient Points of Data Lake - Some of the salient points are given below:
  1. A Data Lake stores data in 'near exact original format' and by itself does not provide data integration
  2. Data Lakes need to bring in ALL data (including the relevant relational data)
  3. A Data Lake becomes meaningful ONLY when it becomes a Data Integration Platform
  4. A Data Integration Platform (Meaningful Data Lake) requires the following 4 major components: an ingestion layer, a multi-model NoSQL database for data persistence, transformation code (to cleanse, massage and harmonize data), and a Hadoop cluster (to generate batch and real-time analytics)
  5. The goal of this architecture is to use 'the right technology solution for the right problem'
  6. This architecture utilizes the foundational data management principle of ELT, not ETL. In fact, T is continuous (T to the power of n). Transformation (change) is continuous in every aspect of any thriving business, and Data Integration Platforms (Meaningful Data Lakes) need to support that.
  7. So the process is as follows:
     1) Ingest ALL data
     2) Persist in a scalable multi-model NoSQL database - RawDB
     3) Transform the data continuously - CleanDB
     4) Transport 'clean data' to Hadoop to generate Analytics
     5) Persist the 'Analytics' back in the NoSQL database - AnalyticsDB
     6) Expose the databases using REST endpoints
     7) Consume the data via applications
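
The seven steps above can be sketched end to end in a few lines of Python. This is a toy, in-memory illustration only: the lists and dict standing in for RawDB, CleanDB and AnalyticsDB, the `region`/`sales` fields and all function names are assumptions made for the example. A plain aggregation replaces the Hadoop job, and a function call replaces the REST endpoint.

```python
from collections import defaultdict

# In-memory stand-ins for the three stores (assumption: a real platform
# would use a multi-model NoSQL database, not Python lists and dicts).
raw_db = []
clean_db = []
analytics_db = {}


def ingest(records):
    """Steps 1-2: persist every record as-is in RawDB."""
    raw_db.extend(records)


def transform():
    """Step 3: cleanse and harmonize RawDB into CleanDB (rebuilt on each run)."""
    clean_db.clear()
    for rec in raw_db:
        region = str(rec.get("region", "")).strip().upper()
        try:
            sales = float(rec.get("sales"))
        except (TypeError, ValueError):
            continue  # skip records whose sales value cannot be parsed
        if region:
            clean_db.append({"region": region, "sales": sales})


def analyze():
    """Steps 4-5: aggregate the clean data and persist it in AnalyticsDB."""
    totals = defaultdict(float)
    for rec in clean_db:
        totals[rec["region"]] += rec["sales"]
    analytics_db.clear()
    analytics_db.update(totals)


def get_total(region):
    """Steps 6-7: the kind of lookup a REST endpoint might expose."""
    return analytics_db.get(region.upper())


ingest([
    {"region": " east ", "sales": "100.5"},
    {"region": "East", "sales": 20},
    {"region": "west", "sales": "n/a"},
    {"region": "west", "sales": 7},
])
transform()
analyze()
```

Note that the raw records are never modified: `transform()` can be rerun whenever the cleansing rules change, which is the ELT idea of loading first and transforming continuously.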

Friday, June 1, 2018

SSIS - How to Call Multiple Child Packages by Master Package

In our day-to-day work, we often build a data extraction module that can be called from different packages.
For example, to load data into a star schema, we can build a separate package to populate each dimension and the fact table, and keep these packages in one folder. They should be executed one by one in a certain order.
Now, we want to create one master SSIS package that goes to that folder, picks up those child packages, and executes them one by one (no repeats) in a predefined order.
To accomplish this, we could build a Foreach Loop Container with an Execute Package Task in it to execute them; we would then need to name the packages so that they are retrieved in the order we want.
With the help of a variable, we can set the package name property in the Execute Package Task, and this variable gets the next value from the Foreach Loop Container on each iteration.

Alternatively, the Foreach Loop Container can get these package names from a data table that contains the order we want. We populate an SSIS Object-type variable with the record set and use it to feed the ordered list to a Foreach Loop, just as described above. The only difference is the source of the list.
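
The control flow described above can be sketched outside SSIS in Python, purely to show the ordering logic. This is not SSIS code: the `(run_order, package_name)` rows stand in for the control table, the package names are made up, and `execute` is a hypothetical callback standing in for the Execute Package Task.

```python
def plan_execution(package_rows):
    """Return package names in execution order, each at most once.

    `package_rows` is a list of (run_order, package_name) pairs, standing in
    for the rows of the control table.
    """
    seen = set()
    ordered = []
    for _, name in sorted(package_rows, key=lambda row: row[0]):
        if name not in seen:  # no repeats
            seen.add(name)
            ordered.append(name)
    return ordered


def run_master(package_rows, execute):
    """Loop over the plan; each iteration hands the next name to `execute`."""
    executed = []
    for package_name in plan_execution(package_rows):  # plays the loop variable
        execute(package_name)
        executed.append(package_name)
    return executed


# Hypothetical control-table rows, deliberately out of order and with a duplicate.
rows = [
    (3, "FactSales.dtsx"),
    (1, "DimDate.dtsx"),
    (2, "DimCustomer.dtsx"),
    (2, "DimCustomer.dtsx"),
]
log = []
result = run_master(rows, log.append)
```

Here `package_name` plays the role of the SSIS variable that receives the next value on each loop iteration, so dimensions load before the fact table regardless of how the rows were stored.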

The video below explains how to call multiple child packages from a parent (master) package.

Saturday, May 5, 2018

SSIS - Script Task Check if Sub-folder || Directory Exists or Not


In this tutorial, we are going to explain how to use the Script Task to check whether a sub-folder exists in SQL Server Integration Services. The Script task provides code to perform functions that are not available in the built-in tasks and transformations that SQL Server Integration Services provides. The Script task can also combine functions in one script instead of using multiple tasks and transformations.
For more information, please visit us at http://www.sql-datatools.com/2016/04/ssis-object-types-variable-in-execute.html
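
As a rough illustration of the check itself, here is a minimal sketch in Python. Please note that an SSIS Script Task is written in C# or VB.NET, so this is not code to paste into the task; it only shows the logic. The helper names and the "Archive" folder are assumptions, and creating the folder when it is missing is an optional extra step that the tutorial itself does not require.

```python
import os
import tempfile


def subfolder_exists(parent_dir, subfolder_name):
    """Return True if `subfolder_name` exists as a directory inside `parent_dir`."""
    return os.path.isdir(os.path.join(parent_dir, subfolder_name))


def ensure_subfolder(parent_dir, subfolder_name):
    """Create the sub-folder if it is missing; return True only if it was created."""
    if subfolder_exists(parent_dir, subfolder_name):
        return False
    os.makedirs(os.path.join(parent_dir, subfolder_name))
    return True


# Demonstration in a throwaway directory so nothing on disk is left behind.
with tempfile.TemporaryDirectory() as parent:
    before = subfolder_exists(parent, "Archive")
    created = ensure_subfolder(parent, "Archive")
    after = subfolder_exists(parent, "Archive")
    created_again = ensure_subfolder(parent, "Archive")
```

In a package, the boolean result of such a check would typically be written to a variable so that later tasks can branch on it.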

Wednesday, May 2, 2018

SSIS - Tracking Object Type Variables || Foreach Loop Container


In this tutorial, we are going to explain how Object-type variables (custom variables) work in the Foreach Loop Container in SQL Server Integration Services. Variables are extremely important and are widely used in an SSIS package. A variable is a named object that stores one or more values and can be referenced by various SSIS components throughout the package’s execution.
For more information, please visit us at http://www.sql-datatools.com/2016/04/ssis-object-types-variable-in-execute.html

SSIS - How to work with Foreach Loop Container || Basics of Foreach Loop...


In this tutorial, we are going to explain how Object-type variables (custom variables) work in the Foreach Loop Container in SQL Server Integration Services. Variables are extremely important and are widely used in an SSIS package. A variable is a named object that stores one or more values and can be referenced by various SSIS components throughout the package’s execution.
For more information, please visit us at http://www.sql-datatools.com/2016/04/ssis-object-types-variable-in-execute.html