Decide on Seeding Strategy for Staging Environments
Problem
Longer-lived staging environments need a dataset that closely resembles production. If this dataset becomes stale, we’ll not be effectively testing releases before they hit production. Restoring snapshots from production is not recommended.
Considerations
-
Should contain anonymized users, invalid email addresses
-
No CHD, PHI, PII must be contained in the database
-
The scale of data should be close to the production database
-
Snapshots from production are dangerous if not anonymized/scrubbed (imagine the risk of sending emails to everyone from your staging env)
-
Fixtures are not recommended (scale of data for fixtures usually does not represent production)
-
We recommend including the DBA in these conversations.
-
QA teams want stable data so that they can run through their test scenarios
Recommendations
Cloud Posse does not have a turnkey solution for seeding staging environments
- ETL pipeline scrubs the data and refreshes the database weekly or monthly. (e.g. AWS Glue, GitHub Action Schedule Job)