Over the past decade, innovation in the modern data stack has pushed the boundaries of what’s possible with data. At the bottom of the stack, cloud data warehouses like Snowflake and Redshift have made it easier than ever to store and query large volumes of data. At the top, visualization tools like Tableau and Looker have democratized self-service analytics, enabling business users to answer their questions without writing code. Between these layers, dbt (data build tool) has quickly become an industry standard for data transformation. This blog post explains why we decided to build the Numbers Station Data Transformation Assistant on top of dbt, and how Numbers Station enables any data analyst to rapidly create powerful data transformation pipelines.
dbt is one of the fastest-growing tools for data modeling and transformation. It serves as a collaborative tool that enables technical data analysts, increasingly known as analytics engineers, to build well-governed data transformation pipelines. At its core, dbt brings software engineering best practices into the modern data stack. With dbt, analytics engineers can collaborate on their transformation pipelines, document their code, manage dependencies, define metrics, test their models, implement version control and orchestrate production runs. The result is a powerful tool that governs the data transformation process and enforces these practices for analytics engineers who may not be software engineering experts. dbt users also no longer need to worry about setting up and maintaining their own infrastructure, which can be a significant barrier to entry for smaller organizations or teams.
Unfortunately, technical skills (e.g. SQL, Python) are still a prerequisite for using dbt, which creates a high barrier to entry for less technical data and business analysts. Consequently, business stakeholders must constantly work with analytics engineers to create new data views, which can lead to costly back-and-forth communication before the data is prepared and ready for analysis. Communication costs are even higher for business questions that rely on statistical and machine learning-based transformations, as the data science team also needs to be looped in.
At Numbers Station, our mission is to close this skills gap using foundation model technology. Because of their natural language interface, foundation models enable users with little to no technical expertise to take part in the data transformation journey. Our team pioneered applying foundation models to data transformation tasks (see our research blog on this topic) and was the first to show that foundation models can clean data, reformat columns, fill in missing values or match duplicate records. We firmly believe that the next level of accessibility for data transformation will be powered by foundation model technology, a vision we share with Tristan Handy, dbt Labs’ CEO, who told us:
Available now, the Numbers Station Data Transformation Assistant opens up the data transformation process to all skill levels in the data and analytics space. In more detail, the platform offers two main types of transformations powered by dbt: SQL-based and AI-based transformations. For data and business analysts who are not comfortable expressing their ideas in code, Numbers Station’s SQL Transformations offer a natural language interface that generates SQL code for mundane transformation tasks like joining or aggregating data. For advanced tasks such as extracting values from text, classifying data, and predicting sentiment, Numbers Station’s AI Transformations offer a natural language interface to intelligent foundation models over users’ data. By directly producing answers for these transformation tasks, the foundation models unlock the power of AI for any data analyst.
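To make the SQL-based case concrete, here is a minimal sketch of the kind of join-and-aggregate query that a plain-English request could correspond to. The request text, the `customers` and `orders` tables, and the SQL itself are all hypothetical and hand-written for illustration; this is not output from Numbers Station or a description of how it works internally. The query runs against an in-memory SQLite database through Python so the example is self-contained.

```python
import sqlite3

# Hypothetical natural-language request an analyst might type.
REQUEST = "Total order amount per customer, highest first"

# Hand-written SQL of the kind such a request could map to (join + aggregate).
# Table and column names are assumptions made up for this example.
EXAMPLE_SQL = """
SELECT c.name AS customer, SUM(o.amount) AS total_amount
FROM orders AS o
JOIN customers AS c ON c.id = o.customer_id
GROUP BY c.name
ORDER BY total_amount DESC
"""


def run_example():
    """Build a tiny in-memory database and run the example query."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER,
            amount REAL
        );
        INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex');
        INSERT INTO orders VALUES (1, 1, 100.0), (2, 1, 50.0), (3, 2, 75.0);
        """
    )
    rows = conn.execute(EXAMPLE_SQL).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(REQUEST)
    for customer, total in run_example():
        print(customer, total)
```

In a dbt project, a `SELECT` statement like this one would typically live in its own model file, where engineers can review, test and version it alongside the rest of the pipeline.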
The combination of Numbers Station and dbt can accelerate data transformation and intelligence tasks by allowing data analysts to create data pipelines directly in natural language. This saves precious time for engineers, who can now focus on more important problems than writing GROUP BY queries or reformatting entries. Of course, there is always a tension between speed and accuracy. To ensure their pipelines can be trusted, Numbers Station users have the option to export their pipelines as dbt projects and share them with engineers for verification and deployment into production.
As AI technology continues to advance, the flexibility and power to gain valuable insights from data will only grow. At Numbers Station, we are bringing our cutting-edge foundation model technology to analytics workflows, empowering data and business analysts to accelerate their data-driven insights. To learn more, listen to our podcast with dbt on Spotify or sign up to start your free trial.