1. Other Performance considerations
2. Packet size in connection should equal to
32767
3. Consider using NOLOCK in source table
4. Select only columns you need in source
query
5. Use Shared lookup cache in lookup transfer
6. Consider is the transformation fall under
syn/asynchronous type
7. Data types as narrow as possible for less
memory usage
8. Do not perform excessive casting
9. Use merge instead of SCD
10. Use group by instead of aggregation
11. insert into instead of a data flow on a
single sql instance
12. unnecessary delta detection vs. reload
13. use sql server destination
14. commit size 0 == fastest
15. drop some indexes based on load growth %
16. clustered indexes:+inf%(dont drop)
17. single nonclustered index: >=~100%
18. multiple nonclusted index:
>=~10%(varies)
19. truncate,not delete
20. In “Table or view” access mode, the OLE DB
source adapter calls OpenRowset to get column metadata at Validate phase. The
data returned from OpenRowset include more than just column metadata. Thus the
adapter issues “SET ROWCOUNT 1” statement to retrieve column metadata information.
“SET ROWCOUNT 1” causes an inefficient execution plan (i.e. Nested Loop) to be
cached and later used in the subsequent Execute phase.
21. In “SQL command” access mode, the OLE DB
source adapter calls “sp_prepare” to get column metadata at Validate phase, and
“sp_execute” at Execute phase. The execution plan used at Execute phase is Hash
Match which is more efficient than Nested Loop.
22. Avoid many small buffers. Tweak the values
for DefaultMaxBufferRows and DefaultMaxBufferSize to get as many records into a
buffer as possible. To optimize the Pipeline, the goal is to pass as many
records as possible through the fewest number of buffers, especially when
dealing with large data volumes.
23. DefaultMaxBufferRows and
DefaultMaxBufferSize are configured separately for each Data Flow task. When
integrating data from multiple data sources using a single Data Flow task,
these two settings will only impact the data source components and
transformations in that task.
24. Do not increase buffer size to the point
where paging to disk starts to occur.
25. Design the package in such a way that it
does a full pull of data only in the beginning or on-demand, next time onward
it should do the incremental pull, this will greatly reduce the volume of data
load operations, especially when volumes are likely to increase over the
lifecycle of an application. For this purpose, use upstream enabled CDC (Change
Data Capture) feature of SQL Server 2008; for previous versions of SQL Server
incremental pull logic.
26. It is recommended to set two of the data
flow task properties viz. BufferTempStoragePath and BLOBTempStoragePath instead
of using default values. The location provided should be of some fast drives.
It is also recommended that BufferTempStoragePath and BLOBTempStoragePath
should point to drives which are on separate spindles in order to maximize I/O
throughput
No comments:
Post a Comment