Are you aware that the process which may slow down the PDI (Pentaho Data Integration) is the insert/update step?

We GrayMatter Software Service are glad to describe the process to avoid in PDI (Pentaho Data Integration), namely, insert/update step, which is the reason for slow down in PDI process. Explanation is given below in brief.

Get a Free Quote for Pentaho Consulting

    Pentaho Data Integration (PDI) Insert/Update step by step process slows down the PDI process as mentioned below

    Let us take an example of loading a target table. Assume that there is a daily load of 100k records into a target table with 10million records and every incoming row from the source table looks up against all the 10 million records in the target table. This process continues for all the 100k input records.

    Quick Enquiry





      Are you aware that the process which may slow down the PDI (Pentaho Data Integration) is the insert/update step?

      We GrayMatter Software Service are glad to describe the process to avoid in PDI (Pentaho Data Integration), namely, insert/update step, which is the reason for slow down in PDI process. Explanation is given below in brief.

      Get a Free Quote for Pentaho Consulting

        Pentaho Data Integration (PDI) Insert/Update step by step process slows down the PDI process as mentioned below

        Let us take an example of loading a target table. Assume that there is a daily load of 100k records into a target table with 10million records and every incoming row from the source table looks up against all the 10 million records in the target table. This process continues for all the 100k input records.

        Quick Enquiry





          Insert/Update step involves multiple round trips to the database depending on the commit size.

          Performance of most steps depends heavily upon the number of round trips, speed of round trip which is a combination of speed of network, the latency on the network and the performance of database.

          Find below the wiki link to explain the Insert/Update step

          Steps to Insert / Update

          As there is an extra lookup process involved in this step, it definitely slows down the process as it needs to run through the entire lot of records to check for a matching record ( to update ) and if none matches ( to perform insert ).This step is also slower than the regular “Table Output step”.

          Addressing the above situation, on the network latency side, there is very little one can do. In the lookout for other possible solutions, reduction of number of round trips to the database is the first thing to be considered. How this can be accomplished is by having a mechanism to load lookup data in memory (cache)

          In PDI, most Lookup steps have options to cache data. Steps such as “Stream Lookup” & “Merge Join” are some of them.

          In the above scenario, one can design the transformation with Merge join” step to perform operations within a single statement, therefore, minimizing the number of times data in the source and target tables are processed. This helps in less memory consumption and good performance. “Merge join” requires input data for the step to be sorted.

          Find below the wiki link to explain the Merge Join step.

          Steps to Merge / Join

          Our Innovative Pentaho Projects by Service Line

          75+

          BUSINESS ANALYTICS

          65+

          DATA INTEGRATION

          16+

          EMBED PENTAHO

          43+

          PREMIUM CONSULTING

          10+

          BIG DATA

          15+

          MIGRATION

          Projects by service line

          75+

          BUSINESS ANALYTICS

          65+

          DATA INTEGRATION

          16+

          EMBED PENTAHO

          43+

          PREMIUM CONSULTING

          10+

          BIG DATA

          15+

          MIGRATION

          At GrayMatter, Your Business Matters!
          At GrayMatter, Your Business Matters!