Building in Public: Why We Contribute to the Open Source Data Ecosystem

At Blueprint, one of our core philosophies is "Open Source at the Core." We don't just use open source tools because they are cost-effective; we choose them because they offer control, flexibility, and the power of community.

In the fast-moving world of data, proprietary vendors can't always keep up with every niche API or new database technology. That's where the Singer specification and frameworks like Meltano shine. If a connector doesn't exist, we don't have to wait on a roadmap. We can build it ourselves.

And when we build something useful, we believe in sharing it back. Over the last few months, we've released several taps to help data teams connect more tools to their modern data stack.

Bridging the Gaps: Our Newest Taps

We've focused on connectors that unlock data from diverse sources, from blockchain analytics to edge databases. Here is a look at our recent open source contributions.

1. Unlocking On-Chain Insights with `tap-dune`

Dune Analytics is the gold standard for blockchain data analysis. However, getting those insights out of Dune and into your own data warehouse for broader analysis can be a challenge.

We built tap-dune to solve this. It allows you to:

Extract data from any Dune query.
Replicate data incrementally using query parameters.
Automatically infer schemas from query results.

This tap is a good fit for teams bridging the gap between Web3 on-chain data and their internal business intelligence.
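To make the schema inference idea concrete, here is a minimal sketch of how a JSON Schema could be derived from query result rows. It is not the code in tap-dune; the function names (`infer_json_type`, `infer_schema`) and the sample columns are hypothetical, and the rule of marking every column as nullable is an assumption made for illustration.

```python
from typing import Any


def infer_json_type(value: Any) -> str:
    """Map a Python value to a JSON Schema type name."""
    if value is None:
        return "null"
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        return "array"
    return "string"


def infer_schema(rows: list[dict[str, Any]]) -> dict[str, Any]:
    """Build a JSON Schema by collecting the types seen in each column."""
    seen_types: dict[str, set[str]] = {}
    for row in rows:
        for column, value in row.items():
            seen_types.setdefault(column, set()).add(infer_json_type(value))

    properties: dict[str, Any] = {}
    for column, types in seen_types.items():
        types = set(types)
        # A column holding both ints and floats is widened to "number".
        if {"integer", "number"} <= types:
            types.discard("integer")
        # Assumption: treat every column as nullable, since a sample of
        # rows cannot prove that a column never contains NULL.
        types.add("null")
        properties[column] = {"type": sorted(types)}
    return {"type": "object", "properties": properties}


# Hypothetical rows, shaped like the result of a query.
sample_rows = [
    {"block": 1, "fee": 0.5, "tx_hash": "0xabc"},
    {"block": 2, "fee": 1, "tx_hash": None},
]
schema = infer_schema(sample_rows)
```

A real tap would likely sample only the first batch of rows and might also detect date-time strings, which this sketch leaves as plain strings.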

2. Taming NoSQL Data with `tap-firestore`

Google Cloud Firestore is a fantastic document database for app development, but its NoSQL structure makes analytics tricky.

tap-firestore simplifies this by:

Extracting data from collections and subcollections.
Flattening complex document structures into analytics-ready tables.
Supporting incremental extraction via timestamps or string-based keys.

Now, you can join your app's operational data with your marketing and financial data seamlessly.
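As an illustration of what flattening means here, the sketch below turns a nested document into a single-level row. It is a simplified example, not the tap-firestore implementation: the `flatten_document` name, the `__` separator, and the choice to serialize arrays as JSON strings are all assumptions.

```python
import json
from typing import Any


def flatten_document(
    doc: dict[str, Any], parent_key: str = "", sep: str = "__"
) -> dict[str, Any]:
    """Flatten nested maps into one level of column names."""
    flat: dict[str, Any] = {}
    for key, value in doc.items():
        column = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested maps, prefixing keys with the parent path.
            flat.update(flatten_document(value, column, sep))
        elif isinstance(value, list):
            # Assumption: keep arrays as JSON text rather than exploding
            # them into child tables.
            flat[column] = json.dumps(value)
        else:
            flat[column] = value
    return flat


# Hypothetical document, shaped like something an app might store.
document = {
    "id": "u1",
    "profile": {"name": "Ada", "address": {"city": "Paris"}},
    "tags": ["a", "b"],
}
row = flatten_document(document)
```

Keeping arrays as JSON text is only one option. Depending on the warehouse, loading them into a native JSON or array column may be the better choice.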

3. Integrating Identity Data with `tap-persona`

Identity verification is critical for fintech and compliance-heavy industries. Persona handles the heavy lifting of KYB/KYC, but you often need that data in your warehouse to build risk models or operational dashboards.

tap-persona enables you to:

Sync inquiries, accounts, and cases.
Use cursor-based pagination for reliable data fetching.
Keep a historical record of verification statuses for compliance auditing.
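For readers unfamiliar with cursor-based pagination, the general pattern looks roughly like the sketch below. It runs against an in-memory stand-in rather than the Persona API: the `data` and `next_cursor` field names, the `make_fake_api` helper, and the record IDs are hypothetical and do not describe Persona's actual response format.

```python
from typing import Any, Callable, Iterator, Optional

Page = dict[str, Any]


def paginate(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[dict[str, Any]]:
    """Yield every record, following cursors until the API reports no more pages."""
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor)
        yield from page["data"]
        cursor = page.get("next_cursor")
        if not cursor:
            break


def make_fake_api(
    records: list[dict[str, Any]], page_size: int
) -> Callable[[Optional[str]], Page]:
    """Build an in-memory stand-in for a paginated endpoint (illustration only)."""

    def fetch_page(cursor: Optional[str]) -> Page:
        start = int(cursor) if cursor else 0
        end = start + page_size
        next_cursor = str(end) if end < len(records) else None
        return {"data": records[start:end], "next_cursor": next_cursor}

    return fetch_page


# Hypothetical records split across three pages.
inquiries = [{"id": f"inq_{i}"} for i in range(5)]
fetched = list(paginate(make_fake_api(inquiries, page_size=2)))
```

Because each request resumes from a cursor rather than a page number, records added while a sync is running are less likely to cause skipped or duplicated rows.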

4. Data from the Edge with `tap-turso`

Turso is redefining databases at the edge with its fork of SQLite. As more applications move to the edge, the need to centralize that distributed data becomes paramount.

tap-turso allows you to:

Extract data from both local and remote Turso instances.
Replicate full tables or perform incremental updates.
Bring edge-generated data into your central warehouse for holistic analysis.
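The difference between a full-table sync and an incremental update comes down to a bookmark on a replication key. The sketch below shows the idea using Python's standard `sqlite3` module as a stand-in for a local database file. It is not the tap-turso code: the `extract_incremental` function, the `events` table, and the `updated_at` column are hypothetical, and a remote Turso instance would need a different client.

```python
import sqlite3
from typing import Any, Optional


def extract_incremental(
    conn: sqlite3.Connection,
    table: str,
    replication_key: str,
    bookmark: Optional[Any],
) -> tuple[list[dict[str, Any]], Optional[Any]]:
    """Return rows newer than the bookmark, plus the updated bookmark.

    With no bookmark, this degrades to a full-table extract.
    Note: table and replication_key are interpolated into the SQL, so they
    must come from trusted configuration, never from user input.
    """
    query = f'SELECT * FROM "{table}"'
    params: tuple[Any, ...] = ()
    if bookmark is not None:
        query += f' WHERE "{replication_key}" > ?'
        params = (bookmark,)
    query += f' ORDER BY "{replication_key}"'

    conn.row_factory = sqlite3.Row
    rows = [dict(row) for row in conn.execute(query, params)]
    new_bookmark = rows[-1][replication_key] if rows else bookmark
    return rows, new_bookmark


# Hypothetical table in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, updated_at TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, "2024-01-01"), (2, "2024-01-02"), (3, "2024-01-03")],
)

first_run, bookmark = extract_incremental(conn, "events", "updated_at", None)
second_run, bookmark = extract_incremental(conn, "events", "updated_at", "2024-01-01")
```

A strict `>` comparison can miss rows that share the bookmark's exact value, so some taps use `>=` and let the target deduplicate on the primary key.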

Why We Build with the Meltano SDK

All of these taps were built using the Meltano Singer SDK. The SDK has been a game-changer for us. It handles the boilerplate of the Singer spec (state management, stream selection, and configuration parsing) so we can focus on the logic that matters: communicating with the API.

This ensures that our taps are:

Standardized: They behave predictably and play nicely with other Singer targets.
Maintainable: The SDK provides a robust structure that makes updates and bug fixes easier.
High Quality: We get features like batching and stream maps out of the box.
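To give a sense of the boilerplate involved, here is a rough, SDK-free sketch of the message stream a Singer tap writes to standard output: a SCHEMA message, then RECORD messages, then a STATE message. The `singer_messages` function and the sample stream are hypothetical, and the layout inside the STATE `value` is illustrative, since the spec leaves that structure up to the tap.

```python
import json
from typing import Any, Iterable, Iterator


def singer_messages(
    stream: str,
    schema: dict[str, Any],
    key_properties: list[str],
    records: Iterable[dict[str, Any]],
    replication_key: str,
) -> Iterator[dict[str, Any]]:
    """Yield SCHEMA, RECORD, and STATE messages for one stream."""
    yield {
        "type": "SCHEMA",
        "stream": stream,
        "schema": schema,
        "key_properties": key_properties,
    }
    bookmark = None
    for record in records:
        yield {"type": "RECORD", "stream": stream, "record": record}
        # Track the latest replication key value so the next run can resume.
        bookmark = record[replication_key]
    # Illustrative state layout; the Singer spec does not prescribe one.
    yield {
        "type": "STATE",
        "value": {"bookmarks": {stream: {"replication_key_value": bookmark}}},
    }


# Hypothetical stream with two records.
messages = list(
    singer_messages(
        stream="inquiries",
        schema={"type": "object", "properties": {"id": {"type": "string"}}},
        key_properties=["id"],
        records=[
            {"id": "inq_1", "updated_at": "2024-01-01"},
            {"id": "inq_2", "updated_at": "2024-01-02"},
        ],
        replication_key="updated_at",
    )
)

# A tap writes one JSON document per line to stdout for the target to read.
for message in messages:
    print(json.dumps(message))
```

With the SDK, a tap author mostly declares the stream's schema and replication key, and the framework takes care of emitting and tracking messages like these.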

What's Next?

Our open source journey is just getting started. We will continue to add new sources as we encounter them in our client work, ensuring that the community has access to the tools they need.

But innovation isn't always about building from scratch. We are also dedicated to maintaining and improving existing tools in the ecosystem. We've made significant contributions to and maintain robust versions of:

tap-zendesk and tap-appsflyer: Ensuring reliable extraction for critical business data.
target-bigquery: We've added features like Storage Write API support and JSON column handling to make loading data into BigQuery faster and more flexible.
tap-mongodb: Improving support for schemaless document extraction.

Beyond ingestion, we are looking at the next step in the pipeline. We plan to start developing dbt packages designed to simplify the first layer of data modeling. Our goal is to provide standard staging models that work seamlessly with our taps, helping you go from raw data to analysis-ready tables with minimal friction.

Join Us in Building the Future

We build these tools because we need them for our clients, but we open source them because we know we aren't the only ones facing these challenges.

If you are using any of these tools, give our taps a spin. Star them on GitHub, open an issue if you find a bug, or, even better, submit a pull request.

Let's build a better data ecosystem, together.
