Hi there,
I use Terraform for my whole infrastructure.
The current approach is that the master branch represents the applied state of the production infrastructure.
So the idea is to evaluate everything in a dev environment and create a pull request when everything is ready.
In the pull request I can only run a terraform plan and some other checks, but I cannot directly verify the apply.
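To make the setup concrete, here is a rough sketch of the kind of PR check I mean, written as a small Python wrapper around the Terraform CLI. The function names and the plan file name `pr.tfplan` are hypothetical and only for illustration; the CLI flags (`-input=false`, `-check`, `-recursive`, `-detailed-exitcode`, `-out`) are the standard ones. It assumes the CI job already has credentials and a backend configured.

```python
import subprocess

# Exit codes of `terraform plan -detailed-exitcode`:
# 0 = succeeded with no changes, 1 = error, 2 = succeeded with changes.
PLAN_OUTCOMES = {0: "no-changes", 1: "error", 2: "changes"}


def plan_command(plan_file="pr.tfplan"):
    """Build the plan invocation; plan_file is a hypothetical file name."""
    return [
        "terraform", "plan",
        "-input=false",          # never prompt in CI
        "-detailed-exitcode",    # distinguish "no changes" from "changes"
        f"-out={plan_file}",     # save the plan so it can be reviewed
    ]


def classify_plan(exit_code):
    """Map a plan exit code to a label; unknown codes count as errors."""
    return PLAN_OUTCOMES.get(exit_code, "error")


def run_pr_checks(workdir="."):
    """Run the static checks, then a plan. Nothing is applied here."""
    static_checks = (
        ["terraform", "init", "-input=false"],
        ["terraform", "fmt", "-check", "-recursive"],
        ["terraform", "validate"],
    )
    for cmd in static_checks:
        # check=True raises CalledProcessError if a check fails
        subprocess.run(cmd, cwd=workdir, check=True)
    result = subprocess.run(plan_command(), cwd=workdir)
    return classify_plan(result.returncode)
```

Calling `run_pr_checks()` in the PR pipeline tells me whether the plan is clean, but it says nothing about whether the later apply will actually succeed, which is exactly the gap I am asking about.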
So here comes the actual question:
How to deal with broken apply attempts?
It has happened to me quite a few times that an apply broke the infrastructure because of missing permissions, wrong dependencies, deadlocks, timeouts… All things that cannot be caught during a plan.
Ideally one would then simply revert the PR that broke things, but I have also experienced situations where this was not possible and the only way to recover was manual intervention from a local laptop/PC.
What are strategies to resolve those situations in bigger teams? In my opinion the PR-to-master approach with a fast-forward merge strategy is in theory a great way to organize and queue applies.
Would Terraform Cloud or Terragrunt or any of these tools resolve this specific issue, and if so, how? What other alternatives exist? I found very few resources on how to handle infrastructure in a team where multiple people can apply changes. For me the state file is only part of the equation. But maybe I am missing something here.
Thanks for your help!