[Redshift only] "Fail to transform data. ERROR: could not open relation with OID xxx” error during Import/Transform Job

:bulb: You should know:

This section below is applicable only to Holistics 3.0 tenants that use AWS Redshift. If you are on Holistics 4.0, we offer Flow-based Persistence which should prevent this issue from happening.

:question:The problem:

Your Import or Transform job, when executed in Full mode, has failed with this error: "Fail to transform data. ERROR: could not open relation with OID xxx”. What does this mean?

:memo: Answer:

What is this error?

To understand this issue, let us quickly go through the execution flow of Data Model Persistence in Full mode. It involves 3 steps:

  1. Create a temporary table.
  2. Run the query to persist the result on the temporary table.
  3. Hotswap the temporary table with destination table in the same database transaction. Specifically, it will:
    1. Delete the existing destination table
    2. Rename the temporary table with the (dropped) destination table’s name.

To learn more about Hotswap step, it is advisable to check out this Community Post.

In step 3a, if there are other operations that query the existing destination table while it is being dropped, Redshift will alert “relation does not exist” error.

Furthermore, it is noteworthy that in Redshift, the dropped table in step 3a is visible to other transactions even if its transaction is not committed.

From AWS Redshift document:

System catalog tables (PG) and other Amazon Redshift system tables (STL and STV) are not locked in a transaction; therefore, changes to database objects that arise from DDL and TRUNCATE operations are visible on commit to any concurrent transactions.

For example, suppose that table A exists in the database when two concurrent transactions, T1 and T2, start. If T2 returns a list of tables by selecting from the PG_TABLES catalog table, and then T1 drops table A and commits, and then T2 lists the tables again, table A is no longer listed. If T2 tries to query the dropped table, Amazon Redshift returns a “relation does not exist” error. The catalog query that returns the list of tables to T2 or checks that table A exists is not subject to the same isolation rules as operations against user tables.

Can I prevent it?

Yes. There are a couple ways to workaround the issue:

  • Use the Incremental mode for your Model Storage Settings. The Incremental mode inserts data directly into the existing destination table instead of doing a Hotswap.
  • Arrange Persistences’ schedule so that they run one after another instead of concurrently.
  • Do not use Storage Settings on small queries.
  • Consider migrating to Holistics 4.0, where Flow-based Persistence can prevent this issue.