What is SQL Server Integration Services (SSIS)?
- SQL Server Integration Services (SSIS) is component of SQL Server 2005 and later versions. SSIS is an enterprise scale ETL (Extraction, Transformation and Load) tool which allows you to develop data integration and workflow solutions. Apart from data integration, SSIS can be used to define workflows to automate updating multi-dimensional cubes and automating maintenance tasks for SQL Server databases.
- This is a platform for Integration and Workflow applications. It is known for a fast and flexible OLTP and OLAP extensions used for data extraction, transformation, and loading (ETL). The tool may also be used to automate maintenance of SQL Server databases and multidimensional data sets.
How does SSIS differ from DTS?
- SSIS is a successor to DTS (Data Transformation Services) and has been completely re-written from scratch to overcome the limitations of DTS which was available in SQL Server 2000 and earlier versions. A significant improvement is the segregation of the control/work flow from the data flow and the ability to use a buffer/memory oriented architecture for data flows and transformations which improve performance.
Data Transformation Services
| SQL Server Integration Services
|
Limited Error Handling
| Complex and powerful Error Handling
|
Message Boxes in ActiveX Scripts
| Message Boxes in .NET Scripting
|
No Deployment Wizard
| Interactive Deployment Wizard
|
Limited Set of Transformation
| Good number of Transformations
|
NO BI functionality
| Complete BI Integration
|
What is a workflow in SSIS ?
- Workflow is a set of instructions on to specify the Program Executor on how to execute tasks and containers within SSIS Packages.
What is the Control Flow?
- When you start working with SSIS, you first create a package which is nothing but a collection of tasks or package components. The control flow allows you to order the workflow, so you can ensure tasks/components get executed in the appropriate order.
What is the Data Flow Engine?
- The Data Flow Engine, also called the SSIS pipeline engine, is responsible for managing the flow of data from the source to the destination and performing transformations (lookups, data cleansing etc.). Data flow uses memory oriented architecture, called buffers, during the data flow and transformations which allows it to execute extremely fast. This means the SSIS pipeline engine pulls data from the source, stores it in buffers (in-memory), does the requested transformations in the buffers and writes to the destination. The benefit is that it provides the fastest transformation as it happens in memory and we don't need to stage the data for transformations in most cases.
What is a Transformation?
- A transformation simply means bringing in the data in a desired format. For example you are pulling data from the source and want to ensure only distinct records are written to the destination, so duplicates are removed. Anther example is if you have master/reference data and want to pull only related data from the source and hence you need some sort of lookup. There are around 30 transformation tasks available and this can be extended further with custom built tasks if needed.
What is a Task?
- A task is very much like a method of any programming language which represents or carries out an individual unit of work. There are broadly two categories of tasks in SSIS, Control Flow tasks and Database Maintenance tasks. All Control Flow tasks are operational in nature except Data Flow tasks. Although there are around 30 control flow tasks which you can use in your package you can also develop your own custom tasks with your choice of .NET programming language.
What is a Precedence Constraint and what types of Precedence Constraint are there?
- SSIS allows you to place as many as tasks you want to be placed in control flow. You can connect all these tasks using connectors called Precedence Constraints. Precedence Constraints allow you to define the logical sequence of tasks in the order they should be executed. You can also specify a condition to be evaluated before the next task in the flow is executed.
- These are the types of precedence constraints and the condition could be either a constraint, an expression or both
- Success (next task will be executed only when the last task completed successfully) or
- Failure (next task will be executed only when the last task failed) or
- Complete (next task will be executed no matter the last task was completed or failed).
What is a container and how many types of containers are there?
- A container is a logical grouping of tasks which allows you to manage the scope of the tasks together.
- These are the types of containers in SSIS:
- Sequence Container - Used for grouping logically related tasks together
- For Loop Container - Used when you want to have repeating flow in package
- For Each Loop Container - Used for enumerating each object in a collection; for example a record set or a list of files.
- Apart from the above mentioned containers, there is one more container called the Task Host Container which is not visible from the IDE, but every task is contained in it (the default container for all the tasks).
What are variables and what is variable scope?
- A variable is used to store values. There are basically two types of variables, System Variable (like ErrorCode, ErrorDescription, PackageName etc) whose values you can use but cannot change and User Variable which you create, assign values and read as needed. A variable can hold a value of the data type you have chosen when you defined the variable.
- Variables can have a different scope depending on where it was defined. For example you can have package level variables which are accessible to all the tasks in the package and there could also be container level variables which are accessible only to those tasks that are within the container.
How many difference source and destinations have you used?
What configuration options have you used?
How do you apply business rules in SSIS (Transformations….Specific calculations but also cleansing)?
How to quickly load data into sql server table?
Give example of handling data quality issues?
When to use Stored Procedures?
During ETL extract you usually type simple SQL because it comes from other sources and usually over complication is not a good choice (make it dynamic) because any changes usually affect the package which has to be updated as well.
During Transformation phase (business rules, cleaning, core work) you should use Transformation tasks not Stored procedures! There are loads of tasks that make the package much easier to develop but also a very important reason is readability which is very important for other people who need to change the package and obviously it reduces risks of making errors. Performance is usually very good with SSIS as it is memory/flow based approach. So when to use Stored Procedures for transformations? If you don't have strong SSIS developers or you have performance reasons to do it. In some cases SPs can be much faster (usually it only applies to very large datasets). Most important is have reasons which approach is better for the situation.
What is your approach for ETL with data warehouses (how many packages you developer during typical load etc.)?
- It is very common to get all kinds of sources so the more the person worked with the better for you. Common ones are SQL Server, CSV/TXT, Flat Files, Excel, Access, Oracle, MySQL but also Salesforce, web data scrapping.
What configuration options have you used?
- This is an important one. Configuration should always be dynamic and usually is done using XML and/or Environment Variable and SQL Table with all configurations.
How do you apply business rules in SSIS (Transformations….Specific calculations but also cleansing)?
- Some people use SSIS only to extract data and then go with stored procedures only….they are usually missing the point of the power of SSIS. Which allows to create "a flow" and on each step applies certain rules this greatly simplifies the ETL process and simplicity is very good.
How to quickly load data into sql server table?
- Fast Load option. This option is not set by default so most developers know this answer as otherwise the load is very slow.
Give example of handling data quality issues?
- Data Quality is almost always a problem and SSIS handles it very well. Examples include importing customers from different sources where customer name can be duplicates. For instance you can have as company name: SQL Server Business Intelligence but also SQL Server BI or SQL Server BI LTD or SQL Server BI Limited or intelligence (with one l). There are different ways to handle it. Robust and time consuming is to create a table with or possible scenarios and update it after each update. You can also use fuzzy grouping which is usually easy to implement and will make usually very good decisions but it is not 100% accurate so this approach has to be justified. Other typical quality issues are nulls (missing values), outliers (dates like 2999 or types like 50000 instead of 5000 especially important if someone is adjusting the value to get bigger bonus), incorrect addresses and these are either corrected during ETL, ignored, re-directed for further manual updates or it fails the packages which for big processes is usually not practised.
When to use Stored Procedures?
- This was one of the requested question in comment (at the bottom of the page). This one is very important but also tricky. ALL SSIS developers have SQL Server background and that is sometime not very good if they use SQL not SSIS approach.
During ETL extract you usually type simple SQL because it comes from other sources and usually over complication is not a good choice (make it dynamic) because any changes usually affect the package which has to be updated as well.
During Transformation phase (business rules, cleaning, core work) you should use Transformation tasks not Stored procedures! There are loads of tasks that make the package much easier to develop but also a very important reason is readability which is very important for other people who need to change the package and obviously it reduces risks of making errors. Performance is usually very good with SSIS as it is memory/flow based approach. So when to use Stored Procedures for transformations? If you don't have strong SSIS developers or you have performance reasons to do it. In some cases SPs can be much faster (usually it only applies to very large datasets). Most important is have reasons which approach is better for the situation.
What is your approach for ETL with data warehouses (how many packages you developer during typical load etc.)?
- This is rather generic question. A typical approach (for me) when building ETL is to. Have a package to extract data per source with extract specific transformations (lookups, business rules, cleaning) and loads data into staging table. Then a package do a simple merge from staging to data warehouse (Stored Procedure) or a package that takes data from staging and performs extra work before loading to data warehouse. I prefer the first one and due to this approach I occasionally consider having extract stage (as well as stage phase) which gives me more flexibility with transformation (per source) and makes it simpler to follow (not everything in one go). So to summarize you usually have package per source and one package per data warehouse table destination. There are might be other approach valid as well so ask for reasons.
dis is gd fr interviews
ReplyDeletethank you. Very helpful.
ReplyDeleteGoos article
ReplyDeletethank you so much.. very useful information.
ReplyDeleteVery Nice
ReplyDeletethank you so much.. very useful information.
ReplyDeletethank you so much.. very useful information.
ReplyDeletethank u so much
ReplyDeleteVery Nice
ReplyDeleteWhen I initially commented, I clicked the “Notify me when new comments are added” checkbox and now each time a comment is added I get several emails with the same comment. Is there any way you can remove people from that service? Thanks.
ReplyDeleteAmazon Web Services Training in OMR , Chennai | Best AWS Training in OMR, Chennai
Amazon Web Services Training in Tambaram, Chennai|Best AWS Training in Tambaram, Chennai
AWS Training in Chennai |Best Amazon Web Services Training in Chennai
This is exceedingly helpful information, very good work. Thanks for sharing and let me wait for further updates.
ReplyDeleteMachine Learning course in Chennai
Machine Learning institute in Chennai
Data Analytics Courses in Chennai
Big Data Analytics Courses in Chennai
DevOps course in Chennai
Best DevOps Training in Chennai
Machine Learning Training in Anna Nagar
Machine Learning Training in T Nagar
Great article, valuable and excellent article, lots of great information, thanks for sharing with peoples.
ReplyDeleteData Science
This comment has been removed by the author.
ReplyDeleteThis is an awesome blog. Really very informative and creative contents. This concept is a good way to enhance the knowledge. Thanks for sharing.
ReplyDeleteExcelR business analytics course
Nice Blog. Really appreciate One those who are working behind of this Article.
ReplyDeleteData Science Training Course In Chennai | Data Science Training Course In Anna Nagar | Data Science Training Course In OMR | Data Science Training Course In Porur | Data Science Training Course In Tambaram | Data Science Training Course In Velachery
ReplyDeleteGreat information!! Thanks for sharing nice blog.
Data Science Course in Hyderabad
It would have been the happiest moment for you,I mean if we have been waiting for something to happen and when it happens we
ReplyDeleteforgot all hardwork and wait for getting that happened.
Java training in Chennai
Java training in Bangalore
Java training in Hyderabad
Java Training in Coimbatore
Java Online Training
The Post is really very impressive.every concept should be explained very unique in the manner.keep sharing this blog.
ReplyDeletehardware and networking training in chennai
hardware and networking training in tambaram
xamarin training in chennai
xamarin training in tambaram
ios training in chennai
ios training in tambaram
iot training in chennai
iot training in tambaram
I found your blog while searching for the updates, I am happy to be here. Very useful content and also easily understandable providing..
ReplyDeletehadoop training in chennai
hadoop training in annanagar
salesforce training in chennai
salesforce training in annanagar
c and c plus plus course in chennai
c and c plus plus course in annanagar
machine learning training in chennai
machine learning training in annanagar
I just see the post i am so happy to the communication science post of information's.So I have really enjoyed and reading your blogs for these posts.Any way I’ll be replay for your great thinks and I hope you post again soon...
ReplyDeletejava training in chennai
java training in omr
aws training in chennai
aws training in omr
python training in chennai
python training in omr
selenium training in chennai
selenium training in omr
I am really happy with your blog because your article is very unique and powerful for new reader.data science course in Hyderabad
ReplyDeleteIt is imperative that we read blog post very carefully. I am already done it and find that this post is really amazing.
ReplyDeletedata scientist training and placement in hyderabad
ReplyDeleteGreat blog found to be well written in a simple manner that everyone will understand and gain the enough knowledge from your blog
DevOps Training in Hyderabad
Thanks for posting the best information and the blog is very good and the blog is very good .digital marketing institute in hyderabad
ReplyDeleteEşya Depolama
ReplyDeleteReally an awesome blog. Informative and knowledgeable content. Keep sharing more blogs with us. Thank you.
ReplyDeleteOnline Data Science Course in Hyderabad
I have bookmarked your site since this site contains significant data in it. You rock for keeping incredible stuff. I am very appreciative of this site.
ReplyDeletedata analytics course in hyderabad
En Son Çıkan Perde Modelleri
ReplyDeletesms onay
mobil ödeme bozdurma
nft nasıl alınır
Ankara Evden Eve Nakliyat
trafik sigortası
dedektor
Web Sitesi Kurmak
aşk kitapları
beykoz daikin klima servisi
ReplyDeletekartal toshiba klima servisi
ümraniye toshiba klima servisi
kartal beko klima servisi
ümraniye beko klima servisi
beykoz lg klima servisi
üsküdar lg klima servisi
beykoz alarko carrier klima servisi
üsküdar alarko carrier klima servisi
en son çıkan perde modelleri
ReplyDeleteyurtdışı kargo
uc satın al
en son çıkan perde modelleri
lisans satın al
minecraft premium
özel ambulans
nft nasıl alınır