Thursday, 7 March 2013

SSIS 2012 Improvements

1. ODBC Support

ODBC support is becoming first class now, I guess because of the coming deeper integration with Hadoop and an increased demand to integrate more easily with various open source platforms. So the days when you can easily connect from SQL Server to a Linux machine are coming. Attunity connectors are also becoming more readily available and cover more vendors.

2. Change Data Capture for SSIS

Change Data Capture (CDC) is not new to SQL Server, but it is a new kind of animal to SSIS:
with CDC you can now easily capture the changes in data sources and provide them for reporting, data analysis, or feeding into the data warehouse.

3. Support for Variable Number of Columns in a Flat File

This is a productivity enhancement that potentially pays for a good portion of the upgrade fee (IMHO). I have seen many developers stumble upon this reality, unable to overcome the barrier and resorting to various online forums or blogs. No longer!
A file in which rows have a varying number of columns is now understood by the SSIS engine and handled without incident.
Hooray! No more time wasted scratching your head!
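To make the idea concrete, here is a minimal plain-Python sketch of reading such a ragged file. The file contents, the three-column layout, and the `read_ragged` helper are all made up for illustration; SSIS handles this internally in the flat file source.

```python
import csv
from io import StringIO

# A flat file where some rows are missing their trailing columns,
# the situation the SSIS 2012 flat file source now tolerates.
raw = """id,name,city
1,Alice,London
2,Bob
3,Carol,Paris
"""

def read_ragged(text, n_columns):
    """Read CSV rows, padding short rows with empty strings."""
    rows = []
    for row in csv.reader(StringIO(text)):
        rows.append(row + [""] * (n_columns - len(row)))
    return rows

rows = read_ragged(raw, 3)   # every row now has exactly 3 columns
```

The padding step is the whole trick: downstream consumers can rely on a fixed column count even when the source rows do not provide one.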

4. Revamped Configurations

This is another big improvement.
Did you ever wonder why a deployed package picked up its design-time parameters? Did you struggle to deploy your config files or a database along with the package?
No longer! You can now have several configurations, say for Dev and Prod, no problem. If you envied your fellow C# or VB .NET developers for being able to store parameters right in Visual Studio, envy no more, now you can, too. As an aside, there is no more BIDS; there is the new Data Tools, but to me it is Visual Studio 2010, I just develop special projects in it, and it is a first-class tool! And how about this: you can even add parameters after the package has been deployed. Do you feel as thrilled as me? Not yet? Then how about the possibility of sharing parameters across many packages within a project?

5. Script Component – you can debug it, finally!

If your heart is not beating faster by now, then let's recall how much you struggled to find out why a Script Component did not work as expected. A value, or worse yet, three are not right?
Remember? No? I do. I remember how I once had to build a console app until 10 PM just to solve the mystery of the wrong values, sitting alone in the office biting my nails because at midnight a package just had to load the latest flight data. I wished I could just debug the mysterious component with its 400 lines of code. Sigh and smile, now I can.
Better yet, all my runtime values are captured. Did I say it is Visual Studio?

6. SSIS Package Format Changed and the Specs are Open Source!

Bye-bye lineage IDs and cryptic, long XML! Hello comparable, mergeable packages!
Easily compare packages with diff tools now! The full specs are openly published.

7. Built-in Reporting

Yes, there will be three canned reports provided for you, dear developer, to benchmark, troubleshoot, and just better support a live implementation.

8. Data Taps

This is totally new: have you ever been asked to fix a package with no rights to access the data source? I had such an “opportunity”; their DBA just shrugged off my requests for a read-only account. But now you are more in control: you can turn small data dumps to a CSV file on and off for ad hoc analysis. Those, most often, are instrumental in finding metadata differences and thus allow a real quick fix to many issues.
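The idea of a tap is easy to picture outside of SSIS, too. Here is a plain-Python sketch of a tap that mirrors rows flowing through a pipeline into a CSV sink without disturbing the downstream consumer; the `tap` helper, the order rows, and the in-memory sink are all hypothetical, not the SSIS API.

```python
import csv
import io

def tap(rows, sink, enabled=True):
    """Pass rows through unchanged, mirroring them to a CSV sink when enabled."""
    writer = csv.writer(sink) if enabled else None
    for row in rows:
        if writer is not None:
            writer.writerow(row)
        yield row

source = [("ORD-1", 120.0), ("ORD-2", 75.5)]
dump = io.StringIO()                  # stands in for the CSV dump file
processed = list(tap(source, dump))   # downstream still sees every row
```

Flip `enabled` off and the pipeline behaves exactly as before, which is the appeal of a tap: it is observation, not modification.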

9. Deploying Projects from Visual Studio

Yes, like I said, Visual Studio is the centerpiece of developing and deploying an SSIS solution. You now need to think in a more project-oriented way, so there is a bit of a paradigm shift; in SSIS 2012 you should think of a project as the unit rather than a package (for those not ready for the change, the old deployment model still works, so not to worry).
So what is different? Actually everything, and it is simpler: you just deploy with a right-click on the project, no more fiddling around with the deployment manifest or manual copy and paste, import, etc.
The configurations are taken care of automatically!

10. Manage with PowerShell

Did I mention the PowerShell book at the beginning of the post? I did this on purpose. SSIS 2012 provides first-class support for managing SSIS entities such as the SSIS catalog, package deployment, and maintenance. You can craft and automate most tasks using an editor; just reference the needed module.
There are also APIs to validate, configure, and deploy a package.
Oh, I have already covered 10 improvements, but wait, there are more:
  • Undo and Redo are now possible (I can hear the wow!);
  • New designer surface (AKA canvas) with adorners;
  • Shared (across a project) Connection Managers (no more click, copy, paste)!
  • Shared (across packages in a project) Cache Managers;
  • Do you remember the dreaded errors all over the package after some metadata changed? Now you can resolve them all upstream with a single click!
  • Group items to reduce clutter without resorting to sequence containers;
  • The ability to roll back to an older (and working) version of a package.

I can hear the applause…

An Important Consideration for SSIS and SQL Server 2012 Business Intelligence Edition

It is very easy to misinterpret what one sees in SQL Server Data Tools (SSDT – the new BIDS). When you open SSDT all SSIS components are available, whether or not those components are allowed in your edition of SQL Server. In addition, SSDT will allow you to use and execute these components in the SSDT Environment at design time.

SSIS 2012 BI Edition is Not SSIS 2012 Enterprise

You can build and test an SSIS package using Enterprise-Edition-only components, but you cannot execute this package outside the SSDT environment. Here’s a screenshot of a BI Edition toolbox:


Note the Enterprise Edition-only components: Fuzzy Grouping, Fuzzy Lookup, etc. Take a look at what happens when I execute a Data Flow Task that contains a Fuzzy Lookup Transformation in BI Edition: inside SSDT, it runs just fine.


Now take a look at what happens when I execute that same package using DtExec on the same virtual machine: the run fails, because the Enterprise-only component is not licensed for execution outside SSDT.

A lot of confusion!

Tuesday, 26 February 2013

New Features in SQL Server 2008 R2

Since the introduction of OLAP Services in version 7.0, BI features have been a critical part of SQL Server. From a high-level view, the SQL Server BI offering is a complete and integrated one, with widespread delivery of business intelligence, including through Microsoft Office.
In this blog entry I will share some of the new BI features of SQL Server 2008 in three main areas: SSIS, SSRS and SSAS.
SQL Server Integration Services (SSIS) 2008
With version 2008 a more scalable and efficient architecture for extraction, transformation, and loading (ETL) is offered to the customers. Some of the new features include the following.
SSIS pipeline improvements: In SQL Server 2005, the SSIS pipeline execution engine did not scale up on multi-proc machines with more than two processors or CPUs. The Integration Services design in SQL Server 2008 improves the scalability of the runtime to multiple processors.
SSIS data profiling: The new data profiling feature in SSIS 2008 identifies patterns in data and generates statistics that can be viewed in the stand-alone Data Profile Viewer.
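As a rough illustration of what a profiling pass computes, here is a small plain-Python sketch; the `profile` helper and the sample rows are invented, and the real feature emits richer statistics such as column patterns and candidate keys.

```python
def profile(rows):
    """Compute simple per-column statistics from a list of row dicts."""
    stats = {}
    for col in rows[0]:
        values = [row[col] for row in rows]
        non_null = [v for v in values if v is not None]
        stats[col] = {
            "null_count": len(values) - len(non_null),
            "distinct_count": len(set(non_null)),
            "min": min(non_null) if non_null else None,
            "max": max(non_null) if non_null else None,
        }
    return stats

rows = [
    {"country": "UK", "amount": 10},
    {"country": "UK", "amount": None},
    {"country": "FR", "amount": 7},
]
report = profile(rows)
```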
Lookups: Performance is improved, access to new data sources (including XML) is added, and a new TXLookup transformation component is introduced.
The new MERGE SQL statement allows you to insert, update, or delete data based on join conditions in a single statement, where in previous versions of SQL Server you had to write separate statements.
CDC (Change Data Capture) captures insert, update, and delete activity on a SQL table using the transaction log, and places the changed data in a separate relational table.
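To see what these two features accomplish, here is a plain-Python sketch, not T-SQL: `capture_changes` plays the role of CDC (diffing two snapshots into inserts, updates, and deletes), and `merge_apply` plays the role of MERGE (applying all three kinds of change in one pass). All names and data are hypothetical.

```python
def capture_changes(before, after):
    """Diff two snapshots keyed by primary key, like CDC change tables do."""
    inserts = {k: v for k, v in after.items() if k not in before}
    updates = {k: v for k, v in after.items()
               if k in before and before[k] != v}
    deletes = [k for k in before if k not in after]
    return inserts, updates, deletes

def merge_apply(target, inserts, updates, deletes):
    """Apply inserts, updates and deletes in one pass, as T-SQL MERGE does."""
    target.update(inserts)
    target.update(updates)
    for key in deletes:
        del target[key]
    return target

before = {1: "widget", 2: "gadget", 3: "gizmo"}
after = {1: "widget", 2: "gadget-v2", 4: "doohickey"}

synced = merge_apply(dict(before), *capture_changes(before, after))
```

After the apply step the target matches the new snapshot, which is exactly the contract a CDC-fed MERGE gives you without rescanning unchanged rows.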
Data Warehousing: It is now easier to manage large tables effectively by transparently breaking them into manageable blocks of data. Also, with the Resource Governor you can now assign limits on CPU and memory use for each workload. And finally, with native data compression techniques, disk I/O can be dramatically reduced, increasing overall performance.
SQL Server Reporting Services (SSRS) 2008
In SSRS 2008, a report server is implemented as a Windows-based service that hosts the Report Manager, the Report Server Web service, and the background processing feature areas. This decreases the dependence on IIS, and consolidating the server applications into a single service reduces configuration and maintenance tasks.
The report server has new management features to set a memory threshold for background operations and performance counters for monitoring service activity.
SSRS 2008 supports two modes of deployment for a report server: native mode and SharePoint integrated mode. In native mode, a report server is a stand-alone application server that provides viewing, management, and delivery of reports and report models.
In the SharePoint integrated mode, a report server becomes part of a SharePoint Web application. Users of Windows SharePoint Services 3.0 can store reports in SharePoint libraries and access them from SharePoint sites that are used for accessing other business documents.
The Office Word 2007 rendering extension can be used to export a report to a Word document, and the Office Excel renderer has been enhanced to support features like nested data regions and sub-reports.
Report Designer improvements:
  • Entity hierarchies: Provides a flattened analytic-style metadata browser that presents all entities as a flattened list.
  • Live data in design view: Allows display of live data by using simple iteration of design-time elements.
  • Instances in metadata browser: Extends the metadata browser to include instance data.
  • Filtering on the design surface: Adds UI elements for defining basic filter conditions directly on the design surface.
  • Multiple data sources: Builds reports against more than one data source, with more than one data region per report.
  • Office 12 ribbon: Provides UI parity with Microsoft Office 2007 products.
  • Conditional formatting: Provides conditional formatting, a recommended customer feature.
SQL Server Analysis Services (SSAS) 2008
In SSAS 2008, Multidimensional Expressions (MDX) query performance has improved because Analysis Services can deal better with cube space. SSAS 2008 divides the space to separate calculated members, regular members, and empty space to better evaluate cells that need to be included in calculations.
The new MOLAP-enabled write back capabilities in SSAS 2008 remove the need to query relational online analytical processing (ROLAP) partitions. This provides you with enhanced write back scenarios from within analytical applications without affecting the traditional online analytical processing (OLAP) performance.
A single read-only copy of an SSAS 2008 database can be shared between many SQL Servers through a virtual IP address. This creates a highly scalable deployment option for an Analysis Services solution.
With SQL Server 2008 backup compression, less storage is required to keep backups online, and the backups run significantly faster because less disk I/O is required.
In SSAS 2008, data mining models have been enhanced by appending a new algorithm to the Microsoft Time Series algorithm. This improves the accuracy and stability of predictions in the data mining models. The new algorithm is based on the Auto Regressive Integrated Moving Average (ARIMA) algorithm, and provides better long-term predictions than the Auto Regression Trees with Cross Predict (ARTxp) algorithm that has been used in SQL Server 2005 Analysis Services (SSAS 2005).
You can use the Analysis Services personalization extensions to create new Analysis Services objects and functionality, and then provide these objects and functionality dynamically in the context of the user session.
SQL Server 2008 generates Analysis Management Objects (AMO) warnings spanning all objects to alert you when you violate design best practices or make logical errors in database design.
The improvements in designing dimensions include features such as the Attribute Relationship designer, a simplified and enhanced Dimension wizard, and the Key Columns dialog box.

It hardly seems possible, but SQL Server 2008 R2 is almost ready. Like most R2 releases, SQL Server 2008 R2 builds on the functionality of the base SQL Server 2008 release by improving scalability as well as BI features. 
1. Support for 256 logical processors
Organizations pushing the high end of scalability will want to take advantage of SQL Server 2008 R2 support for up to 256 logical processors (which requires Windows Server 2008 R2). This is a big improvement over the previous maximum of 64 processors.
2. Improvements in multi-server management
SQL Server Management Studio (SSMS) has always been great for managing SQL Server. However, SSMS was really for single server management and wasn’t strong in the area of multi-server management. New wizards, dashboards, and enhancements to Policy Based Management will boost SQL Server 2008 R2’s multi-server management capabilities.
3. Master Data Services
Almost all large organizations face the problem of multiple data definitions where the same data is defined and used differently in various locations. SQL Server 2008 R2’s new Master Data Services (MDS) feature provides a central portal from which administrators can create and update master data members and hierarchies, with the goal of ensuring data consistency across the enterprise.
4. Geospatial visualizations in Reporting Services
Building on the geospatial data types in the base SQL Server 2008 release, SQL Server 2008 R2’s Reporting Services will support reports with visual geographic mapping. New geospatial features include visualizations for mapping routine and custom shapes and support for Microsoft Virtual Earth tiles.
5. Self-service BI with Project Gemini
Project Gemini is the core feature of the SQL Server 2008 R2 release. Its goal is to enable organizations to more easily adopt and take advantage of SQL Server’s BI capabilities. It provides self-service BI through a powerful Excel add-in and integrates with SharePoint for centralized management. Project Gemini will require Excel 2010 and SharePoint 2010. For more about Gemini, check out Sheila Molnar’s interview of Microsoft’s Donald Farmer, InstantDoc ID 102613, in this issue of SQL Server Magazine.

Star Join Query Optimizations

Business Intelligence (BI) plays a significant role in businesses nowadays, and the databases that handle BI-related queries face an ever-increasing workload. When queries are sent to very large databases, millions of rows may be returned, and users face extended query response times when such queries involve joins across multiple tables. Star Join Query Optimization is a new feature of SQL Server 2008 Enterprise Edition. This mechanism uses bitmap filtering to improve the performance of some types of queries through the effective retrieval of rows from fact tables.
Improved Query Response Times
In general, data warehouses employ dimensionally modeled star or snowflake schemas. These schemas have one or more fact tables that contain transactional data, and many dimension tables, which hold information such as product data, customer information, and times and dates – all of which describe the fact table data. Usually, foreign keys maintain the relationships between rows in fact tables and rows in dimension tables. SQL Server 2008 Enterprise recognizes databases that contain star schemas and uses the new star join query logic to process queries against them more efficiently. Typically, data warehouse queries run on average approximately 20 percent faster.
Automatically Implemented
Star Join Query Optimization is applied automatically by SQL Server and does not require any special database or application configuration. The query processor usually optimizes queries with medium selectivity (queries that retrieve approximately 10% to 75% of the rows from a fact table). Such queries are typically handled with hash joins, which join the dimension and fact tables using the foreign keys to identify matching rows. For hash joins, a hash table is built for each dimension table referenced in the query, and the optimization process derives bitmap filters from these hash tables. A bitmap filter identifies the key values from each dimension table that qualify for inclusion in the query. When the fact table is scanned, the bitmap filters are applied to it, eliminating the fact table rows that do not qualify for the result set. The most selective bitmap filter is applied first, as it eliminates the largest number of rows. Because the eliminated rows need no further processing, subsequent filters are not applied to them, which makes the process more efficient.
Query Selectivity
Bitmap filtering enhances performance for medium-selectivity queries because rows are filtered before any joins are executed, which decreases the number of rows processed by each join. Bitmap filtering is not applied when queries are highly selective (those that return less than 10% of the rows in a fact table); in such cases a nested loop join is generally more efficient. Similarly, when queries are not selective at all (returning more than 75% of the rows in a fact table), bitmap filtering is not applied, since there are very few rows to filter out and hence no performance gain to be had.
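The mechanism can be sketched in plain Python, with per-dimension key sets standing in for the engine's bitmap filters; the tables, column names, and `star_join` helper are all invented for illustration, and "most selective first" is approximated by applying the smallest key set first.

```python
def star_join(fact_rows, dimensions):
    """Filter fact rows with per-dimension key sets before joining."""
    # Key sets stand in for the engine's bitmap filters.
    filters = {col: set(dim) for col, dim in dimensions.items()}
    # Apply the most selective filter first: here, the smallest key set.
    for col, keys in sorted(filters.items(), key=lambda kv: len(kv[1])):
        fact_rows = [row for row in fact_rows if row[col] in keys]
    # Only surviving rows reach the (comparatively expensive) joins.
    return [
        {**row, **{col + "_name": dimensions[col][row[col]]
                   for col in dimensions}}
        for row in fact_rows
    ]

dim_product = {10: "bolt", 11: "nut"}       # product_key -> product name
dim_date = {20130101: "2013-01-01"}         # date_key -> calendar date
facts = [
    {"product_key": 10, "date_key": 20130101, "qty": 5},
    {"product_key": 99, "date_key": 20130101, "qty": 1},  # unknown product
    {"product_key": 11, "date_key": 20120101, "qty": 2},  # unknown date
]
result = star_join(facts, {"product_key": dim_product, "date_key": dim_date})
```

Two of the three fact rows are eliminated before any join work happens, which is precisely where the performance win comes from.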
Integer Data Types
Star join optimization gives the highest efficiency when the data type of the columns used in the joins is integer, because the bitmap filter can then be applied as part of the initial table or index scan rather than at a later stage in the query plan. Most queries benefit from star join optimization, since foreign key relationships are commonly implemented using integer-based alternate key columns.

SQL Server 2008 will offer much more powerful relational data warehousing capabilities than its predecessor. As part of the development effort for the star join optimization in SQL Server 2008, Microsoft performed a number of performance studies based on benchmark and real customer workloads and made a significant enhancement.
The star join enhancements in SQL Server 2008 provide:
  • Improved query response times
  • Automatic implementation, with no special configuration
  • Targeting of medium-selectivity queries
  • Bitmap filters, most effective on integer data types

Microsoft performed extensive testing, and it is worth looking at the results from three of these workloads in the following figures.
Figure: Performance improvements with star join optimization
As the figure shows, all workloads improve significantly, by 12 to 30 percent. While your individual mileage will vary, Microsoft expects decision support workloads against the SQL Server engine to improve by approximately 15–20 percent, based on the star join-specific optimizations that are new in SQL Server 2008.

Friday, 22 February 2013

Business Intelligence FAQs

What is Business Intelligence (BI)?
Business Intelligence (BI) is a broad term for a software technology that provides reporting and analytical tools for pulling data from various sources to generate customizable reports and support decision making. It allows administrators and other business decision makers to view reliable, timely data.
Business Intelligence simplifies information discovery and analysis, making it possible for decision-makers at all levels of an organization to more easily access, understand, analyze, collaborate, and act on information, anytime and anywhere. BI helps move from just consuming information to developing deep contextual knowledge about that information. By tying strategy to metrics, organizations can gain competitive advantage by making better decisions faster, at all levels of the organization.
BI is the capability that transforms data into meaningful, actionable information.
Business Intelligence (BI) software consolidates, analyzes and displays the data collected or generated by your company’s transactional business systems to provide powerful insights into how your business works and how it can be improved.
BI software consolidates data from different sources and assembles it in “data warehouses” or “data marts” that eliminate distinctions in data formats. It then presents the results through a reporting, analytics or dashboard interface. BI software thus serves as a common platform for shared, company-wide insight. BI software makes analysis and report-making much faster and more reliable.
Using BI software, users can do things like profile customer behavior, monitor buying patterns, detect bottlenecks and opportunities to reduce costs, and pinpoint top customers. The data warehouse serves as a gateway to the underlying data systems. Users can drill down into the source data to find the details behind a particular query.
Properly implemented, BI is a decision-support system that puts the entire wealth of your enterprise’s electronically stored information to optimal use.

What is Data Warehouse (DW)?
A data warehouse is a repository of your organization’s stored data. In essence, it is a place where content from different systems can be consolidated together in one logical location. A related term is “data mart”. Although there can be important design differences between data warehouses and data marts, in general a “warehouse” supports an entire corporation or enterprise, and a “mart” supports one specific business area. For the purposes of this discussion, we will not make a distinction between data warehouses and data marts.

What is a "Cube"?
A cube is a multidimensional data structure in which data is organized along dimensions. This approach makes it much easier for users to understand and manipulate the data available to them. In BI data warehouses, data is stored in multidimensional cubes optimized for analysis and reporting. You can have not just two or three dimensions, but basically as many as you need.
ETL (extract, transform, load) is the name given to the software tools that populate cubes and warehouses from existing data sources.
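A toy plain-Python sketch makes the cube idea concrete: facts are aggregated into cells addressed by dimension coordinates, and a "slice" fixes one dimension while summing over the rest. The dimensions, measure, and helper names are all hypothetical.

```python
from collections import defaultdict

def build_cube(facts, dims, measure):
    """Aggregate fact rows into cells keyed by tuples of dimension values."""
    cube = defaultdict(float)
    for row in facts:
        cube[tuple(row[d] for d in dims)] += row[measure]
    return dict(cube)

def slice_cube(cube, dims, **fixed):
    """Total the cells whose coordinates match the fixed dimension values."""
    return sum(
        value for key, value in cube.items()
        if all(key[dims.index(d)] == v for d, v in fixed.items())
    )

facts = [
    {"region": "EU", "year": 2012, "sales": 100.0},
    {"region": "EU", "year": 2013, "sales": 150.0},
    {"region": "US", "year": 2013, "sales": 200.0},
]
dims = ["region", "year"]
cube = build_cube(facts, dims, "sales")
eu_total = slice_cube(cube, dims, region="EU")   # all years for EU
total_2013 = slice_cube(cube, dims, year=2013)   # all regions for 2013
```

A real OLAP cube adds hierarchies, pre-aggregation, and storage tricks on top, but the addressing-by-dimensions idea is the same.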

Who can benefit from Business Intelligence (BI)?
Business Intelligence (BI) is for businesses that want to drive intelligent decision-making throughout their organizations and make it easy for everyone in the organization to collaborate, analyze, share, and act on business information from a centrally managed, more secure source. Business Intelligence (BI) supports IT professionals, information workers, and developers. BI can empower and cater to the needs of the business users of organizations of all sizes.

How can Business Intelligence (BI) be useful to my organization?
Business Intelligence (BI) connects the right people to the right information at the right time. With 1KEY BI, all business users, from managers to operational staff and from frontline executives to the back office team, get the power to define ad hoc reports through an intuitive, highly interactive, user-friendly interface, coupled with multi-format, visually stunning, flexible data presentation layers. It becomes easy to report on and analyze data to efficiently manage personnel objectives across key indicators. 1KEY end users can perform sales, category, and inventory analyses more effectively to track product sales, product mix, and inventory levels.

Does Business Intelligence (BI) work with existing applications and databases?
Yes. A Business Intelligence (BI) solution like 1KEY is designed to interoperate with data that exists in virtually any enterprise data source, such as Oracle, IBM DB2, or Sybase. For example, you can use SQL Server 2005 Reporting Services to run reports on data stored in an Oracle database. Also, SQL Server 2005 Integration Services can be used to extract data to or from any OLE DB data source. 1KEY can connect to multiple databases and multiple applications like ERP, CRM, SCM, HRM, financial accounting, or any other core transactional application (SAP, Oracle E-Business Suite, PeopleSoft, JD Edwards, Siebel, QAD MFG/Pro, Microsoft Dynamics, Navision, Ramco, Tally.ERP).

Does the BI accommodate requirements that organizations have to provide data across the organization?
The BI capabilities that 1KEY provides are geared more towards reporting and analyzing data rather than collecting the data.

Should BI projects be prioritized?
Organizations need to determine the tangible benefits of BI, such as the eliminated cost of producing legacy reports. Secondly, BI extends access to data to the entire organization. In this way even a small benefit, such as a few minutes saved, makes a difference when it is multiplied by the number of employees in the entire organization.

Does BI project require Data Warehousing?
No. Not all BI projects require data warehousing. BI tools like 1KEY can give reporting and analytics to business users across the enterprise from any data source directly, without the need to build a data warehouse.
A data warehouse facilitates reporting and analysis. It serves an organisation's need for reliable, consolidated, unique, and integrated reporting and analysis of its data, at different levels of aggregation. Data warehousing is not just about accumulating historical data, but about analyzing it as well, to help business decision makers make effective decisions that affect their company’s bottom line.
Benefits of Data Warehouse
- Data Warehouse provides a common data model for all data of interest regardless of the data's source. This makes it easier to report and analyze information than it would be if multiple data models were used to retrieve information such as sales invoices, order receipts, general ledger charges, etc.
- Prior to loading data into the data warehouse, inconsistencies are identified and resolved. This greatly simplifies reporting and analysis.
- Because they are separate from operational systems, data warehouses provide retrieval of data without slowing down operational systems.
- Data warehouses facilitate BI reports such as trend reports (e.g., the items with the most sales in a particular area within the last two years), exception reports, and reports that show actual performance versus goals.

What do various terms like Dashboard, KPI, Operational BI, Analytics, and OLAP mean?
An executive dashboard is a user interface, usually Web-based and often with a BI system/data warehouse on the backend, that organizes and presents information in a way that is easy to read and interpret. Somewhat akin to an automobile's dashboard, executive dashboards often have graphical elements, such as color-coding, gauges and charts.
A Key Performance Indicator (KPI) is a measure of performance commonly used to help an organization define and evaluate how successful it is, typically in terms of progress toward its long-term organizational goals. KPIs can be specified by answering the question, "What is really important to different stakeholders?". KPIs are monitored using BI to assess the present state of the business and to assist in prescribing a course of action. KPIs are frequently used to "value" hard-to-measure activities such as the benefits of leadership development, engagement, service, and satisfaction. KPIs differ depending on the nature of the organization and the organization's strategy. They help to evaluate the progress of an organization towards its vision and long-term goals, especially toward difficult-to-quantify, knowledge-based goals. A KPI is a key part of a measurable objective, which is made up of a direction, a KPI, a benchmark, a target, and a time frame. For example: "Increase Average Revenue per Customer from $100 to $150 by year end 2010". In this case, 'Average Revenue per Customer' is the KPI.
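That kind of measurable objective can be evaluated with a tiny calculation. This plain-Python sketch uses invented revenue and customer figures against the quoted $100-to-$150 Average Revenue per Customer objective; the helper name and numbers are hypothetical.

```python
def kpi_progress(current, baseline, target):
    """Express a KPI reading as a fraction of the planned improvement."""
    progress = (current - baseline) / (target - baseline)
    return {"progress": progress, "target_met": current >= target}

# Hypothetical figures for the objective:
# "Increase Average Revenue per Customer from $100 to $150".
revenue, customers = 13750.0, 110
arpc = revenue / customers            # the KPI: Average Revenue per Customer
status = kpi_progress(arpc, baseline=100.0, target=150.0)
```

At $125 per customer, the organization is halfway to the target but has not yet met it, which is exactly the kind of reading a dashboard gauge would display.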
Operational business intelligence is BI that is available enterprise-wide – to mid-level managers and front-line operational workers. Ideally, operational BI delivers timely information to employees at all levels, so important metrics and other data are delivered in the context of associated business processes.
Data analytics is the science of examining raw data, using sophisticated software and analytic algorithms, for the purpose of drawing conclusions about that information. Data analytics is often distinguished from data mining by the scope, purpose and focus of the analysis.
OLAP - Online analytical processing is the automated processing and analysis of data, defined as "fast analysis of shared multidimensional information".

What are the challenges faced while successfully implementing Business intelligence?
Most companies grasp the potential value of successful business intelligence applications. However, a common challenge IT professionals encounter is building BI applications that are adopted by users and make a positive impact on the organization and on the bottom line. It can be difficult to structure effective BI project teams, select the right BI application, manage business-IT communications, and measure BI success. Developing and optimizing a successful BI application is always a challenge, one that involves structuring BI teams and managing business-IT communication.

How much does Business Intelligence (BI) cost? Is there going to be a proven ROI?
Believe it or not, you can start benefiting from BI technology for much less than you may think (and much quicker than you may think). Let's take an example, with some ballpark numbers and timelines, so you can have realistic expectations: for $9,000 and within 4 weeks, 1KEY BI can be implemented enterprise-wide. Actual prices may vary depending on the modules you choose to implement from the BI stack and how many people you deploy them to. Please contact your local sales representative or MAIA partner for more information about Business Intelligence pricing.
In the early stages of a BI deployment, most companies measure value based on IT-centric, quantifiable criteria, including improved operational efficiency and minimized total cost of ownership. But today, as more BI deployments mature and companies focus on optimizing business processes, the metrics have changed and are more difficult to interpret. Measuring the value of BI is more of a matter of mind over money.

Which industries can benefit from BI?
Companies across all the industries like Manufacturing, BFSI, Retail, Service, Telecom, ITES, Pharma & Healthcare, Infrastructure & Construction, Automotive, Education, Government, Public Sector, etc. can benefit from BI.

Which business Processes / departments can benefit from BI?
Business processes / departments like Contact Centers, Field Service, IT Services Engagements, Human Resources, Marketing & Advertising, Supply & Demand Chain, Retail Operations, IT Operations Performance, and Insurance Underwriting can benefit from BI.

What are the examples of BI Initiatives (Focus Areas)?
Customer Satisfaction, Problem Resolution, Loyalty & Customer Value, Up-sell & Cross-sell, Customer Churn, Customer Segmentation, Cost Control, Revenue & Profitability, Performance Management, Outage / Downtime, Defect, Marketing Effectiveness, Collections are some of the examples of BI initiatives.

Who are BI systems intended for?
For any member of the company who makes decisions based on transactional systems' data. Such systems may be oriented to different levels (analysts, managers, directors, auditors, etc.), providing different kinds of information in each case.

Can BI be integrated with any system active at the company?
Yes, BI can be integrated directly, through the development of interfaces that extract and update transactional systems’ data.

Does BI require a special technological platform?
No. BI tools can connect to any database, because they generate their own repository to upload and transform the data that will be fed into the multidimensional cubes.

Is it necessary to have specialized staff for BI?
No. BI tools like 1KEY are easy to learn and use; built-in features for queries, automated data extraction interfaces and remote support ensure rapid adoption without engaging the company’s staff specifically.

Can they be applied to any company area?
Yes. Through transactional systems’ data extraction interfaces, you can build solutions for sales, auditing, suppliers, accounting, production, etc., and additionally integrate key indicators for each area, summarizing the information required by the company’s first-level decision-makers.

How do I recover my investment?
Companies that have not adopted a decision-making system yet have a “hidden cost” to access information: time to enter data in transactional systems, upload and transformation, presentation through reports, etc. Features provided by BI systems decrease such costs dramatically, because information is available in a timely manner, with integration, security and orientation to optimize decisions that empower the company’s activities.
Return on investment (ROI). An ROI can be achieved in a number of ways, such as:

  • Lower cost – Costs could be lowered through better inventory management, fewer dollars spent on unproductive measures, product promotions, and so on.
  • Improved productivity – Greater productivity could be expected from both IT and the users. Today user analysts may spend 80 percent of their time gathering data and only 20 percent analyzing it; the data warehouse should reverse those numbers. IT will still be responsible for developing complex reports, as well as writing reports for production systems. The data warehouse can provide reporting tools with a well-documented, clean and easily accessible database. This capability should significantly improve IT productivity.
  • Increased revenue – This could be a result of greater market share and increased sales as marketing is able to more effectively target customers and provide the right products at the right time to the right market. The effects on costs and revenues may be difficult to assign to the impact of the data warehouse: as the data warehouse is being implemented, the organization is not standing still, and both internal and external factors impact costs and revenues, so the actual benefits of the data warehouse may be difficult to determine.

Pushdown Optimization and Types in Informatica

The pushdown optimization technique in Informatica pushes part or all of the transformation logic to the source or target database. The Integration Service translates the transformation logic into SQL queries and sends those SQL statements to the database. The source or target database then runs those SQL queries to process the transformation logic.

The amount of transformation logic you can push to the database depends on the database, transformation logic, mapping and session configuration. The Integration Service processes all transformation logic that it cannot push to a database. 

To preview the SQL statements and mapping logic that the Integration Service pushes to the source or target database, use the Pushdown Optimization Viewer.

Pushdown Optimization Types

There are three types of pushdown optimization techniques that you can configure in a session. They are: 

  • Source-side Pushdown Optimization
  • Target-side Pushdown Optimization
  • Full Pushdown Optimization

Source-Side Pushdown Optimization

The Integration Service first analyzes the mapping from the source to the target, or until it reaches a downstream transformation that it cannot push to the source database. The Integration Service then generates the SELECT statement and executes the SQL in the source database. Finally, it reads the result rows and processes the remaining transformations.
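As a rough illustration (all table and column names here are hypothetical, not actual Informatica output), source-side pushdown effectively collapses upstream filter and expression logic into the generated SELECT. This sketch builds the kind of statement that replaces a filter transformation plus an expression transformation:

```python
# Hypothetical sketch: source-side pushdown collapses a filter and an
# expression transformation into one SELECT executed on the source database.
def build_pushdown_select(table, columns, filter_cond, derived):
    """Build a SELECT that embeds filter + expression logic in SQL."""
    derived_cols = [f"{expr} AS {alias}" for alias, expr in derived.items()]
    select_list = ", ".join(columns + derived_cols)
    return f"SELECT {select_list} FROM {table} WHERE {filter_cond}"

sql = build_pushdown_select(
    table="ORDERS",
    columns=["ORDER_ID", "AMOUNT"],
    filter_cond="AMOUNT > 1000",               # filter transformation condition
    derived={"AMOUNT_USD": "AMOUNT * 0.012"},  # expression transformation port
)
print(sql)
# SELECT ORDER_ID, AMOUNT, AMOUNT * 0.012 AS AMOUNT_USD FROM ORDERS WHERE AMOUNT > 1000
```

The Integration Service would then read the result rows of such a statement and apply only the transformations it could not push.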

Target-Side Pushdown Optimization

The Integration Service first analyzes the mapping from the target to the source, or until it reaches an upstream transformation that it cannot push to the target database. It then generates the INSERT, UPDATE or DELETE statements, processes the transformation logic up to the point at which it can push the logic to the database, and executes the generated SQL statements on the target database.

Full Pushdown Optimization

You can use full pushdown optimization only when the source and target databases are in the same relational database management system. 

The Integration Service analyzes the mapping from the source to the target or until it reaches a downstream transformation it cannot push to the target database. If the Integration Service cannot push all transformation logic to the target database, it tries to push all transformation logic to the source database. 

If it cannot push all transformation logic to the source or target, the Integration Service pushes as much transformation logic to the source database, processes intermediate transformations that it cannot push to any database, and then pushes the remaining transformation logic to the target database. The Integration Service generates and executes an INSERT SELECT, DELETE, or UPDATE statement for each database to which it pushes transformation logic. 
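To make the full-pushdown case concrete, here is a hedged sketch (the staging and dimension table names are hypothetical) of the kind of INSERT ... SELECT statement that results when an insert-only mapping runs entirely inside one database:

```python
# Hypothetical sketch: with source and target in the same database, an
# insert-only mapping collapses into a single INSERT ... SELECT statement
# that the database executes on its own, with no rows flowing through
# the Integration Service.
source_query = "SELECT CUST_ID, UPPER(NAME) AS NAME FROM STG_CUSTOMERS"
full_pushdown_sql = f"INSERT INTO DIM_CUSTOMERS (CUST_ID, NAME) {source_query}"
print(full_pushdown_sql)
# INSERT INTO DIM_CUSTOMERS (CUST_ID, NAME) SELECT CUST_ID, UPPER(NAME) AS NAME FROM STG_CUSTOMERS
```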

Pushdown Optimization and Slowly Changing Dimensions

Slowly changing dimension (SCD) type 1 and type 3 logic can be pushed to the database using the pushdown optimization technique. Use the following rules and guidelines when pushing SCD logic to the database.

  • Type 1 and type 3 SCDs can be pushed to Oracle and IBM DB2 databases.
  • Source data must not have duplicate rows. The database can become deadlocked if it makes multiple updates to the same row.
  • You must create SCDs using the wizard in version 8.5 or higher. SCDs created in lower versions cannot be pushed to the database.

When to Use Pushdown Optimization

Use the pushdown optimization technique when the source or target database gives better performance than the Integration Service.

Reverse the Contents of Flat File – Informatica

Q1) I have a flat file and want to reverse its contents, meaning the first record should come out as the last record and the last record as the first, then load the result into the target file.

As an example consider the source flat file data as

Informatica Enterprise Solution
Informatica Power center
Informatica Power exchange
Informatica Data quality

The target flat file data should look as

Informatica Data quality
Informatica Power exchange
Informatica Power center
Informatica Enterprise Solution


Follow the steps below to create the mapping logic:

  • Create a new mapping.
  • Drag the flat file source into the mapping.
  • Create an expression transformation and drag the ports of source qualifier transformation into the expression transformation.
  • Create the additional ports below in the expression transformation and assign the corresponding expressions:

Variable port: v_count = v_count + 1
Output port: o_count = v_count

  • Now create a sorter transformation and drag the ports of expression transformation into it.
  • In the sorter transformation specify the sort key as o_count and sort order as DESCENDING.
  • Drag the target definition into the mapping and connect the ports of sorter transformation to the target.
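Outside Informatica, the same steps can be sketched in a few lines of Python: the expression transformation's running counter corresponds to numbering each row as it is read, and the sorter's descending sort on o_count reverses the file.

```python
# Sketch of the mapping logic in plain Python: the expression transformation's
# v_count/o_count counter becomes enumerate(), and the descending sort on
# the counter reverses the records.
lines = [
    "Informatica Enterprise Solution",
    "Informatica Power center",
    "Informatica Power exchange",
    "Informatica Data quality",
]

numbered = [(i + 1, line) for i, line in enumerate(lines)]  # o_count per row
numbered.sort(key=lambda row: row[0], reverse=True)         # sorter: DESCENDING
reversed_lines = [line for _, line in numbered]

print("\n".join(reversed_lines))
```

Running this prints the records in reverse order, matching the expected target file above.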

Q2) Load the header record of the flat file into first target, footer record into second target and the remaining records into the third target.

I have already posted a solution to this problem using aggregator and joiner transformations. Now we will see how to implement it by reversing the contents of the file.


  • Connect the source qualifier transformation to the expression transformation. In the expression transformation create the additional ports as mentioned above.
  • Connect the expression transformation to a router. In the router transformation create an output group and specify the group condition as o_count=1. Connect this output group to a target and the default group to sorter transformation.
  • Sort the data in descending order on o_count port.
  • Connect the output of sorter transformation to expression transformation (don’t connect o_count port).
  • Again in the expression transformation create the same additional ports mentioned above.
  • Connect this expression transformation to a router and create an output group. In the output group specify the condition as o_count=1 and connect this group to the second target. Connect the default group to the third target.
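The routing logic above can be sketched in plain Python (the record values are made up for illustration): number the rows, route row 1 (the header) to the first target, reverse the rest, number again, and route the new row 1 (the original footer) to the second target, with everything else going to the third.

```python
# Sketch of the Q2 mapping: two expression/router passes around a
# descending sort split the file into header, footer and body targets.
rows = ["header", "rec1", "rec2", "rec3", "footer"]

first_pass = list(enumerate(rows, start=1))        # expression: o_count
target1 = [r for c, r in first_pass if c == 1]     # router group o_count=1
rest = [r for c, r in first_pass if c != 1]        # default group

rest_desc = list(reversed(rest))                   # sorter: DESCENDING
second_pass = list(enumerate(rest_desc, start=1))  # second expression: o_count
target2 = [r for c, r in second_pass if c == 1]    # router group o_count=1
target3 = [r for c, r in second_pass if c != 1]    # default group

print(target1, target2, target3)
# ['header'] ['footer'] ['rec3', 'rec2', 'rec1']
```

Note that, as in the mapping described, the body records land in the third target in reverse order, since nothing re-sorts them after the descending sort.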