Thursday, December 31, 2015

SSIS Best Performance tuning considerations

1.     Other Performance considerations

2.     Packet size in connection should equal to 32767

3.     Consider using NOLOCK in source table

4.     Select only columns you need in source query

5.     Use Shared lookup cache in lookup transfer

6.     Consider is the transformation fall under syn/asynchronous type

7.     Data types as narrow as possible for less memory usage

8.     Do not perform excessive casting

9.     Use merge instead of SCD

10. Use group by instead of aggregation

11. insert into instead of a data flow on a single sql instance

12. unnecessary delta detection vs. reload

13. use sql server destination

14. commit size 0 == fastest

15. drop some indexes based on load growth %

16. clustered indexes:+inf%(dont drop)

17. single nonclustered index: >=~100%

18. multiple nonclusted index: >=~10%(varies)       

19. truncate,not delete

20. In “Table or view” access mode, the OLE DB source adapter calls OpenRowset to get column metadata at Validate phase. The data returned from OpenRowset include more than just column metadata. Thus the adapter issues “SET ROWCOUNT 1” statement to retrieve column metadata information. “SET ROWCOUNT 1” causes an inefficient execution plan (i.e. Nested Loop) to be cached and later used in the subsequent Execute phase.

21. In “SQL command” access mode, the OLE DB source adapter calls “sp_prepare” to get column metadata at Validate phase, and “sp_execute” at Execute phase. The execution plan used at Execute phase is Hash Match which is more efficient than Nested Loop.


22. Avoid many small buffers. Tweak the values for DefaultMaxBufferRows and DefaultMaxBufferSize to get as many records into a buffer as possible. To optimize the Pipeline, the goal is to pass as many records as possible through the fewest number of buffers, especially when dealing with large data volumes. 

23. DefaultMaxBufferRows and DefaultMaxBufferSize are configured separately for each Data Flow task. When integrating data from multiple data sources using a single Data Flow task, these two settings will only impact the data source components and transformations in that task. 

24. Do not increase buffer size to the point where paging to disk starts to occur. 

25. Design the package in such a way that it does a full pull of data only in the beginning or on-demand, next time onward it should do the incremental pull, this will greatly reduce the volume of data load operations, especially when volumes are likely to increase over the lifecycle of an application. For this purpose, use upstream enabled CDC (Change Data Capture) feature of SQL Server 2008; for previous versions of SQL Server incremental pull logic.

26. It is recommended to set two of the data flow task properties viz. BufferTempStoragePath and BLOBTempStoragePath instead of using default values. The location provided should be of some fast drives. It is also recommended that BufferTempStoragePath and BLOBTempStoragePath should point to drives which are on separate spindles in order to maximize I/O throughput


No comments:

Post a Comment