Why can't I just write DML in my transformations?
select
statements make transformations accessible
More people know how to write select
statements, than DMLData Manipulation Language (DML) is a class of SQL statements that are used to query, edit, add and delete row-level data from database tables or views. The main DML statements are SELECT, INSERT, DELETE, and UPDATE., making the transformation layer accessible to more people!
Writing good DML is hard
If you write the DDLData Definition Language (DDL) is a group of SQL statements that you can execute to manage database objects, including tables, views, and more. / DML yourself you can end up getting yourself tangled in problems like:
- What happens if the tableIn simplest terms, a table is the direct storage of data in rows and columns. Think excel sheet with raw values in each of the cells. already exists? Or this table already exists as a viewA view (as opposed to a table) is a defined passthrough SQL query that can be run against a database (or data warehouse)., but now I want it to be a table?
- What if the schema already exists? Or, should I check if the schema already exists?
- How do I replace a model atomically (such that there's no down-time for someone querying the table)
- What if I want to parameterize my schema so I can run these transformations in a development environment?
- What order do I need to run these statements in? If I run a
cascade
does it break other things?
Each of these problems can be solved, but they are unlikely to be the best use of your time.
dbt does more than generate SQL
You can test your models, generate documentation, create snapshots, and more!
You reduce your vendor lock in
SQL dialects tend to diverge the most in DML and DDL (rather than in select
statements) — check out the example here. By writing less SQL, it can make a migration to a new database technology easier.
If you do need to write custom DML, there are ways to do this in dbt using custom materializations.