SAP Data Services delivers a single enterprise-class solution for data integration, data quality, data profiling, and text data processing that allows you to integrate, transform, improve, and deliver trusted data to critical business processes. It has many advantages but there are some challenges that can occur. In this blog, I will give an overview of various strategies within a SAP Data Services that will improve your integration job’s performance time.
Suggestion #1 – Push Down Operations
Data Services can push down SELECT operations to the source database and put them together into one request to the database. These operations include joins, group by, and other common functions. The software also allows you to include aggregations and filters in SQL statements to decrease the size of the data retrieved and processed. The two different types of push down operations are full and partial.
A full pushdown operation exists when the application can push down all of the transform operations to the databases and there is a direct data stream from the source to the target database. These two databases (source and target) must either reside in the same data store, reside in data stores that have database links, or reside in data stores that have linked remote servers. SAP Data Services can still execute a partial push down to just the source database if a full push down is not possible. Examples of partial operations are aggregations, distinct rows, filtering, joins, ordering, projection, and functions correspondent to the source database.
Unfortunately, there are operations that Data Services is not able to push down to the source database. Examples of these are expressions with functions that do not exist within the source database, operations including triggers, join operations between sources existing on different servers without database links, and any other transform than a Query.
Suggestion #2 – Caching
Caching data can greatly improve performance of transformations of data that take place in memory. This strategy reduces the amount of calls made to the database to retrieve data. The two different types of caches in SAP Data Services are In-Memory and Pageable cache. The default cache type is pageable cache. The in-memory cache option is a good technique to use when you are only processing a small dataset that will fit into memory and the pageable cache option is a good option if the dataset will exceed the available memory limit.
Suggestion #3 – Parallel Execution
SAP Data Services is able to handle data extraction, transformation, and loading in parallel. The application accomplishes this by parallel options for sources, transforms, and targets. Data flows and workflows can also execute in parallel when not connected in the workspace. The Parallel process threads option is an example of these settings. The best practice for adjusting this number is to set the option to the number of CPUs on your Job Server computer.
Suggestion #4 – Distributed Execution
SAP Data Services accomplishes distributed execution by breaking a data flow into separate sub data flows and utilizing additional memory from various gigabytes of memory or from another computer. As an example, if your computer has six gigabytes of memory, you could have three sub data flows executed using distribution with two gigabytes associated with each sub data flow. This feature allows SAP Data Services to distribute operations that are CPU and memory intensive such as joins or grouping. This leads to better memory management and job performance. The Run as separate process option breaks the data flow into various sub data flows.
Other Performance Tuning Techniques
Join rank settings and array fetch size are two other techniques to consider when improving job performance in SAP Data Services. The join rank dictates the order in which sources are coupled. Sources with higher join ranks are joined before sources with lower ranks. The default for join rank is zero and this setting requires it to be a non-negative integer. The array fetch size setting controls the amount of data retrieved from a source database per request. Increasing this number will also increase the amount of data retrieved from a source database per request. SAP Data Service will automatically retrieve 1000 rows of data and the option’s maximum is 5000 bytes.
I hope you find these strategies helpful and useful when developing integration jobs using SAP Data Services.