I'm building Briefcase—a platform designed to help people facing the legal system feel less alone and more in control.

A large part of that mission is helping people understand what is happening.

That includes the legal matter itself, but it also includes the activity surrounding it: who reviewed a document, which parts of a presentation held someone's attention, where questions may be forming, what changed since the last visit, and whether the information being shared is actually moving a conversation forward.

For founders using a Briefcase intelligence room during fundraising, that intelligence should not disappear into an internal analytics dashboard that only we can see. It should become part of the product.

A traditional data room can tell you that someone opened a file.

I want Briefcase to help you understand the story behind that activity.

Did an investor open the pitch deck once and disappear?

Did three people from the same firm return to the financial model over several days?

Did readers move quickly through the market slides but slow down around the business model, competition, or use of funds?

Did their cursors repeatedly return to a particular number, chart, or claim?

Are people revisiting one contract because it answers an important question—or because it is creating one?

A mouse-movement heatmap will never tell you exactly what someone is thinking. But combined with dwell time, presentation progress, repeat visits, document activity, and broader engagement trends, it can tell a much richer story than a simple open notification.

The goal is not surveillance for its own sake. It is to give founders and teams useful signals so they can communicate more clearly, anticipate questions, and make better decisions.

The same principle applies across Briefcase.

We need product-level KPIs that help us understand whether people are successfully completing difficult workflows. We need to identify friction before it becomes abandonment. We need to know whether a new experience is reducing confusion—or merely moving it somewhere else.

That creates a lot of event data.

Page views. Document opens. Presentation progress. Session activity. Search behavior. Mouse movement. Return visits. Changes over time.

Some events matter individually. Most become valuable only when they can be aggregated, compared, and understood as patterns.

## Choosing the analytics foundation

I was fortunate to be selected for both the Google for Startups Cloud Program and AWS Activate, giving Briefcase meaningful access to—and credits within—both cloud ecosystems.

Because Briefcase is being built as a hybrid-cloud platform, I did not want to choose an analytics system simply because the rest of a particular workload happened to live nearby. I spent time evaluating the strengths of both ecosystems and looking for the right architecture for this specific data path.

The requirements were straightforward, even if the decision was not.

The system needed to accept a very high volume of relatively small events, retain the raw history economically, support questions I had not thought to ask yet, and scale without requiring me to operate a large analytics cluster before the product actually needed one.

For that workload, Athena on S3 stood out.

S3 provides effectively unbounded object storage at a remarkably low cost. Athena lets me query that data using SQL without provisioning or maintaining a dedicated database cluster. The architecture can begin small, remain inexpensive while usage grows, and scale to enormous datasets without a fundamental redesign.

It also creates a clean separation between the application's operational database and the much larger stream of behavioral and analytical data surrounding it.

Raw events can land in S3 as an immutable history, organized by time and workload for efficient querying. From there, incremental rollups transform that stream into increasingly useful layers of analytics.
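One way to picture that layout is a Hive-style partitioned object key, which lets Athena prune partitions by workload and date. This is a minimal sketch rather than Briefcase's actual scheme: the `events/` prefix, the `workload`, `dt`, and `hour` partition names, and the `eventObjectKey` helper are all assumptions made for illustration.

```typescript
// Hypothetical helper: builds a Hive-style partitioned S3 key for a batch of raw events.
// The prefix and the partition names (workload, dt, hour) are illustrative assumptions.
function eventObjectKey(workload: string, at: Date, batchId: string): string {
  const iso = at.toISOString(); // always UTC, e.g. "2024-05-01T13:45:10.000Z"
  const dt = iso.slice(0, 10); // "2024-05-01"
  const hour = iso.slice(11, 13); // "13"
  return `events/workload=${workload}/dt=${dt}/hour=${hour}/${batchId}.json`;
}

const exampleKey = eventObjectKey(
  "room_activity",
  new Date("2024-05-01T13:45:10Z"),
  "batch-0001",
);
console.log(exampleKey);
// events/workload=room_activity/dt=2024-05-01/hour=13/batch-0001.json
```

With a table partitioned this way, a query that filters on `workload` and `dt` only has to read the matching prefixes instead of the full history.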

Those rollups are watermarked, so each job knows exactly how far it has processed. They are also idempotent, so retrying a job does not duplicate data or corrupt the result.
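To make those two properties concrete, here is a small in-memory sketch of an hourly rollup. It is not the production job: `RawEvent`, `runRollup`, and the `roomId|hourStart` key format are hypothetical, and it assumes events for an hour have all arrived by the time that hour closes. The idea it illustrates is that each bucket is recomputed from raw events and overwritten, never incremented, so a retry from the same watermark lands on the same result.

```typescript
// Illustrative only: an in-memory stand-in for a watermarked, idempotent hourly rollup.
// RawEvent, runRollup, and the "roomId|hourStart" key format are hypothetical names.
interface RawEvent {
  roomId: string;
  at: number; // event time, epoch milliseconds
}

const HOUR_MS = 60 * 60 * 1000;

// Processes only complete hours in [watermark, upper) and returns the new watermark.
function runRollup(
  counts: Map<string, number>,
  watermark: number,
  events: RawEvent[],
  nowMs: number,
): number {
  const upper = Math.floor(nowMs / HOUR_MS) * HOUR_MS; // start of the still-open hour
  if (upper <= watermark) return watermark; // nothing new to process

  // Recompute every bucket in the window from the raw events...
  const fresh = new Map<string, number>();
  for (const e of events) {
    if (e.at < watermark || e.at >= upper) continue;
    const hourStart = Math.floor(e.at / HOUR_MS) * HOUR_MS;
    const key = `${e.roomId}|${hourStart}`;
    fresh.set(key, (fresh.get(key) ?? 0) + 1);
  }
  // ...then overwrite rather than increment, so a retry cannot double count.
  fresh.forEach((n, key) => counts.set(key, n));
  return upper;
}

const rollupCounts = new Map<string, number>();
const sampleEvents: RawEvent[] = [
  { roomId: "A", at: 10 * 60 * 1000 },
  { roomId: "A", at: 20 * 60 * 1000 },
  { roomId: "A", at: HOUR_MS + 5 * 60 * 1000 },
];
const sampleNow = 2 * HOUR_MS + 60 * 1000;

const nextWatermark = runRollup(rollupCounts, 0, sampleEvents, sampleNow); // first attempt
runRollup(rollupCounts, 0, sampleEvents, sampleNow); // simulated retry, same watermark
console.log(nextWatermark, rollupCounts.get("A|0"), rollupCounts.get(`A|${HOUR_MS}`));
// 7200000 2 1
```

The retry stands in for a job that wrote its output but failed before persisting the new watermark. Because the write is an overwrite keyed by room and hour, the second run leaves the counts unchanged.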

Instead of repeatedly scanning the full history to render a dashboard, the application can query compact, purpose-built datasets containing the metrics it needs.

That is where Athena becomes especially compelling.

With the data properly partitioned, stored in efficient columnar formats, and reduced through well-designed rollups, queries remain fast and the amount of data scanned stays controlled—even as the raw event history continues to grow.

The raw layer preserves flexibility.

The rolled-up layers provide speed.

And because the transformation process is repeatable, the system can recover, backfill, or produce new metrics without treating the first version of the analytics model as permanent.

Those datasets can eventually power:

- Engagement summaries and activity timelines
- Document and presentation rankings
- Slide-level attention and mouse-movement heatmaps
- Conversion and return-visit trends
- Team- and room-level KPIs
- Unusual activity or intent signals
- AI-generated summaries of meaningful changes

In the cloud, the architecture made sense.

Then I needed to build it.

More specifically, I needed to run the full application locally and test the entire analytics path in CI/CD.

Sigh. 😮‍💨

You can't run Athena on your laptop. There is no native local mode or offline switch—and emulating it through providers like LocalStack means paying for a commercial license.

That leaves you hitting the real cloud during development, skipping the data path in CI, or paying for the privilege of testing locally.

None of the three felt acceptable.

## The local-development gap

Using the live cloud during development sounds manageable until it becomes part of the normal inner loop.

Every query needs credentials. Every developer needs access to an AWS environment. Test data has to be uploaded or seeded remotely. Network availability becomes a dependency. Queries take longer. Costs may be small individually, but the friction becomes permanent.

CI/CD is even less forgiving.

You can mock the Athena client, but a mock only proves that your code called the interface you expected. It does not prove that the SQL is valid, that result pagination behaves correctly, that the response shape matches Athena, or that your application can execute the full path from query submission to returned rows.

It does not prove that a watermark advances correctly.

It does not prove that retrying a rollup remains idempotent.

And it does not prove that the data produced by one stage can actually be queried by the next.

You can skip those tests, but then one of the most important data paths in the product is validated only after deployment.

You can point CI at a shared AWS account, but now your supposedly isolated test suite depends on cloud credentials, remote state, network availability, account configuration, cleanup, and the possibility that two test runs interfere with one another.

None of those options gave me the development experience I wanted.

I wanted the same thing I expect from the rest of the stack:

Clone the repository.

Start the local services.

Seed the data.

Run the application.

Run the tests.

No special cloud account. No hidden shared environment. No conditional code path that bypasses production behavior. No pretending that a mocked response is an integration test.

So I built Athena Local.

## A local Athena API backed by Trino

Athena Local provides an Athena-compatible API backed by Trino, the distributed SQL engine that shares lineage with the technology behind Athena itself.

The application continues using the real AWS SDK.

Instead of rewriting the analytics layer or introducing a development-only abstraction, you point the Athena client at localhost.

The endpoint changes.

The application does not.
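In practice, the switch can live in a single piece of configuration. The sketch below is an assumption about how an application might wire it, not Athena Local's documented setup: the `ATHENA_ENDPOINT` variable, the port in the example URL, and the `athenaSettings` helper are hypothetical. The returned object uses the `region`, `endpoint`, and `credentials` options that the AWS SDK's `AthenaClient` constructor accepts.

```typescript
// Hypothetical config helper. ATHENA_ENDPOINT and the port used below are assumptions;
// use whatever address your local Athena-compatible service actually listens on.
interface AthenaClientSettings {
  region: string;
  endpoint?: string;
  credentials?: { accessKeyId: string; secretAccessKey: string };
}

function athenaSettings(env: Record<string, string | undefined>): AthenaClientSettings {
  const region = env["AWS_REGION"] ?? "us-east-1";
  const endpoint = env["ATHENA_ENDPOINT"];

  // Production: no override, so the SDK uses its default endpoint and credential chain.
  if (!endpoint) return { region };

  // Local or CI: point at the local service and pass placeholder credentials,
  // assuming the local service does not validate them.
  return {
    region,
    endpoint,
    credentials: { accessKeyId: "local", secretAccessKey: "local" },
  };
}

// With the real SDK this would be: new AthenaClient(athenaSettings(process.env))
console.log(athenaSettings({ ATHENA_ENDPOINT: "http://localhost:8080" }));
```

Keeping the override in the environment means the query code itself has no development-only branch.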

That means local development and CI can exercise the same core workflow used in production:

1. Submit a query through the AWS SDK.
2. Poll for its execution status.
3. Retrieve the result set.
4. Handle pagination and Athena-style responses.
5. Run assertions against real SQL execution.
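The steps above can be sketched as a single helper. To keep the example self-contained, it talks to a small `QueryApi` interface instead of importing the SDK; that interface, the `runQuery` name, and the polling interval are hypothetical. In a real application each method would wrap the SDK's `StartQueryExecutionCommand`, `GetQueryExecutionCommand`, and `GetQueryResultsCommand`, and the states are the ones Athena reports.

```typescript
// Query states reported by Athena's GetQueryExecution.
type QueryState = "QUEUED" | "RUNNING" | "SUCCEEDED" | "FAILED" | "CANCELLED";
type Row = (string | undefined)[];

// Hypothetical thin adapter over the three Athena calls used below.
interface QueryApi {
  start(sql: string): Promise<string>; // returns the query execution id
  state(id: string): Promise<{ state: QueryState; reason?: string }>;
  page(id: string, nextToken?: string): Promise<{ rows: Row[]; nextToken?: string }>;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function runQuery(api: QueryApi, sql: string, pollMs = 200): Promise<Row[]> {
  // 1. Submit the query.
  const id = await api.start(sql);

  // 2. Poll until the query reaches a terminal state.
  for (;;) {
    const { state, reason } = await api.state(id);
    if (state === "SUCCEEDED") break;
    if (state === "FAILED" || state === "CANCELLED") {
      throw new Error(`Query ${id} ${state}: ${reason ?? "no reason given"}`);
    }
    await sleep(pollMs);
  }

  // 3 and 4. Retrieve every page of results by following the continuation token.
  // Note: for SELECT queries, real Athena typically returns the column headers
  // as the first row, so callers may want to drop it.
  const rows: Row[] = [];
  let token: string | undefined;
  do {
    const result = await api.page(id, token);
    rows.push(...result.rows);
    token = result.nextToken;
  } while (token);

  // 5. The caller asserts against these rows.
  return rows;
}
```

Because the helper only depends on the request and response flow, the same code path runs whether the client behind the adapter points at AWS or at localhost.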

The underlying data can be seeded locally, queried immediately, and discarded when the test completes.

It is fast, free, offline, and repeatable.

More importantly, it turns the analytics path into something that can be developed and tested like the rest of the application.

I can test a rollup from its starting watermark through its output data.

I can deliberately run it again and verify that it produces the same result.

I can test malformed SQL, pagination, query-state transitions, failure handling, and the application behavior surrounding them.

And I can do it without creating cloud resources or coupling a test run to a shared environment.

## Built with AI agents, but not by accident

Athena Local was also built the way I build most software now: with AI agents, detailed specifications, and tight review loops.

That does not mean asking an agent to "build an Athena clone" and accepting whatever appears.

It means defining the contract carefully, breaking the work into bounded implementation stages, testing behavior through the real AWS SDK, reviewing architectural decisions, and repeatedly tightening the gap between something that merely runs and something I would trust other developers to use.

AI dramatically increased the speed at which I could research, scaffold, test, and document the project.

The judgment still mattered.

What should be emulated?

What should intentionally remain out of scope?

How faithfully should errors and asynchronous query states behave?

How should developers seed S3-compatible data?

What needs to work for unit tests, integration tests, and a locally running application?

Those are product and architecture decisions, not autocomplete.

The result is an open-source package that solved a real problem inside Briefcase—and, I suspect, a problem other teams using Athena have quietly worked around for years.

Athena Local is available on npm and designed for local development, automated testing, and CI/CD.

The endpoint changes.

Your application does not.