DataStage Training - Best Practice Techniques to Become a Pro

So you are pursuing DataStage training. Here are some useful practices to make you a pro DataStager by the time you finish your course. It's time to sharpen your skills, broaden your knowledge, and become a competitive player in today's BI marketplace with a few easy yet crucial practices covered in this article.

1. Don’t go for Pointless Conversions

Avoid unnecessary type conversions: it's crucial to verify that run-time schemas match the job design. To do that, set the environment variable OSH_PRINT_SCHEMAS, which prints the record schema of all data sets and operator interfaces to the job log.
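The same schema-verification idea can be sketched in plain Python (this is not DataStage code, and the column names and types below are hypothetical examples): compare the designed schema against what actually arrives at run time, so no implicit conversion sneaks in.

```python
# Sketch (not DataStage code): check that a run-time record schema matches
# the job design before processing. Column names/types are hypothetical.
design_schema = {"cust_id": "int32", "name": "string", "balance": "decimal"}

def schema_mismatches(design, runtime):
    """Return columns whose run-time type differs from the design type."""
    return {col: (design[col], runtime.get(col))
            for col in design
            if runtime.get(col) != design[col]}

runtime_schema = {"cust_id": "int64", "name": "string", "balance": "decimal"}
mismatches = schema_mismatches(design_schema, runtime_schema)
# cust_id arrives as int64, which would force an implicit conversion
```

Flagging such mismatches up front is the manual analogue of reading the schemas OSH_PRINT_SCHEMAS writes to the log.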

2. Be Careful with Transformer Stage Usage

The Transformer stage can be slow. Instead of chaining multiple Transformer stages, consolidate their logic into a single stage, and use other, lighter stage types for simple transformation operations.
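As a rough analogy in Python (hypothetical data, not DataStage code): rather than making several passes over the rows, one per small transformation, fold the derivations into a single pass, the way you would consolidate several Transformer stages into one.

```python
# Sketch: fold several simple derivations into one pass over the data,
# instead of one pass per transformation. Row fields are hypothetical.
rows = [{"name": " alice ", "qty": 3, "price": 2.5},
        {"name": "bob",     "qty": 1, "price": 9.0}]

def transform(row):
    # One consolidated "stage": trim, uppercase, and derive a total.
    return {"name": row["name"].strip().upper(),
            "total": row["qty"] * row["price"]}

out = [transform(r) for r in rows]   # a single pass
```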

3. Improve Sort Performance

Careful job design is key to improving sort performance, both for on-link sorts (specified on the Input page's Partitioning tab) and for standalone Sort stages.
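One design habit that helps in any tool: sort once on the key and let downstream steps rely on that order, instead of re-sorting at every step. A minimal Python sketch with hypothetical data:

```python
# Sketch: sort once, then let downstream aggregation assume sorted input
# (the analogue of avoiding redundant Sort stages). Data is hypothetical.
from itertools import groupby

rows = [("east", 5), ("west", 2), ("east", 1), ("west", 7)]
rows.sort(key=lambda r: r[0])            # one explicit sort on the key

# groupby requires sorted input -- it reuses the order instead of re-sorting.
totals = {region: sum(v for _, v in grp)
          for region, grp in groupby(rows, key=lambda r: r[0])}
```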

4. Eliminate Inessential Columns

During DataStage training, creating many columns may seem important, but at the business level it adds complexity. It's therefore recommended to eliminate unnecessary columns, or columns you no longer need. Inessential columns occupy buffer memory and, as a result, hurt performance when rows are transferred to the next stage.
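The principle is easy to illustrate in Python (hypothetical column names): project away unneeded columns as early as possible, so every later step buffers and moves less data.

```python
# Sketch: drop unneeded columns early so downstream steps buffer less data.
# Column names are hypothetical.
NEEDED = ("cust_id", "balance")

def project(row, keep=NEEDED):
    """Keep only the columns the next stage actually needs."""
    return {k: row[k] for k in keep}

wide_row = {"cust_id": 1, "balance": 10.0,
            "audit_blob": "x" * 1000, "legacy_flag": None}
narrow = project(wide_row)   # the large unused fields never travel further
```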

5. Never Follow a Sequential File Read with Same Partitioning

Unless you specify more than one source file (or multiple readers), the entire file is read sequentially into a single partition, and you may end up spending hours with all the work stuck in that one partition before repartitioning the data.
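The effect of splitting the input can be sketched in Python (hypothetical records, not DataStage code): a single sequential read lands everything in one partition, while distributing records across readers spreads the work. Round-robin is one simple scheme.

```python
# Sketch: spread input records across partitions instead of leaving them
# all in one. Round-robin assignment; records are hypothetical.
def round_robin_partition(lines, n_partitions):
    parts = [[] for _ in range(n_partitions)]
    for i, line in enumerate(lines):
        parts[i % n_partitions].append(line)
    return parts

lines = [f"record-{i}" for i in range(10)]
parts = round_robin_partition(lines, 4)
# each of the 4 partitions now holds 2-3 records, not one holding all 10
```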

6. Additional Practice Tips

• From a performance perspective, issuing individual SQL statements per row is expensive. In many cases, a DataStage Join between the input and DB2 reference data is faster than a Sparse Lookup, so make your choice wisely.
• Where the number of input rows is much smaller than the number of DB2 or Oracle reference table rows (a ratio of 1:100 or more), a Sparse Lookup is appropriate.
• Bulky jobs are handled through partition parallelism: a CPU-intensive application performs the same demanding operations concurrently in every partition, with the work and memory divided among the partitions.
• Applications that extract data from and load data into an RDBMS are highly disk- and I/O-intensive. Such applications benefit from a configuration where the number of logical nodes equals the number of disk spindles.
• Where it isn't required, turn off runtime column propagation, as propagating unused columns wastes space and processing time.
• Stick to dedicated servers, or at least dedicated CPUs in the case of virtualization.
• Use an enterprise scheduler rather than the built-in job scheduler, where one is available.
• An enterprise scheduler can handle dependency management (across applications and platforms), resource and concurrency management, dependencies during a DataStage upgrade, and active-to-active engine needs.
• An ideal I/O setup separates three file systems: the target files folder, the resource disk, and the scratch disk.
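The join-versus-sparse-lookup trade-off from the tips above can be sketched in Python (hypothetical data, not DataStage code): a bulk join touches the reference data once for the whole input, while a sparse lookup issues one query per input row, which is cheap only when input rows are few.

```python
# Sketch of join vs. sparse lookup. The dict stands in for a reference
# table such as DB2; all data here is hypothetical.
reference = {i: f"name-{i}" for i in range(1000)}

def bulk_join(input_rows, ref):
    # One pass against the whole reference set, loaded once.
    return [(r, ref.get(r)) for r in input_rows]

def sparse_lookup(input_rows, query):
    # One "query" per input row -- fine for a handful of rows, costly
    # when the input is large.
    return [(r, query(r)) for r in input_rows]

few_rows = [3, 7]
looked_up = sparse_lookup(few_rows, reference.get)   # only 2 queries
joined = bulk_join(few_rows, reference)
```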

DataStage is the future of BI. From traditional to modern workloads, it encapsulates every crucial feature. As a workhorse of data integration, it demonstrates how and why keeping one's organization connected to DataStage is imperative. Of course, its popularity has led many people to choose DataStage over other technologies. So stand out from the crowd and make yourself business-ready by applying these handy practices during your DataStage training.
