Year in Review 2025

Commercial VIN decoders charge $0.10-0.50 per lookup. They rate-limit you, require API keys, and make you pay for data that's public domain from NHTSA. The entire industry exists because nobody bothered to make the government data usable.
We fixed that, then built an entire automotive platform on top of it.
The Problem: VIN Decoding Is a Racket
NHTSA publishes the vPIC database - every vehicle specification for every VIN pattern ever registered in North America. It's free, public-domain data. But it's distributed as a 5GB+ SQL dump that's unusable without serious infrastructure.
So companies charge you to query it. Per lookup. With rate limits. Some charge $50-200/month for API access. Others charge per thousand lookups. It's rent-seeking on public data.
Corgi: 50MB, Fully Offline, Sub-Millisecond Decoding
We took NHTSA's database, optimized it down to 50MB, and released it as open source. Read the full technical breakdown in our Corgi deep dive.
The entire decoder runs offline. No API calls, no rate limits, no costs. It works in Node.js, browsers (via sql.js), and Cloudflare Workers (via D1). We wrote database adapters for each runtime so the core decoding logic stays identical.
Here's how it works:
Database optimization: We analyzed NHTSA's schema, removed redundant columns, normalized strings, and compressed the SQLite database. The final database is 50MB gzipped, 150MB uncompressed. Commercial solutions ship 1GB+ databases or require server roundtrips.
Pattern matching engine: VINs encode vehicle data in the first 11 characters. We built a pattern matcher that queries the optimized database and returns full vehicle specs in <1ms. No external API calls required.
Multi-runtime support: Same core logic, different database adapters: better-sqlite3 for Node.js, sql.js for browsers, a D1 binding for Cloudflare Workers. The decoder doesn't care where it runs.
```typescript
import { decodeVIN } from "@cardog/corgi";

const result = await decodeVIN("1HGBH41JXMN109186");
```

That's it. No API keys, no HTTP requests, no vendor lock-in. Just VIN in, vehicle data out.
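For a feel of how the runtime adapters fit together, here's a minimal sketch of the idea - the interface, table, and column names are illustrative, not Corgi's actual internals:

```typescript
// A minimal sketch of the multi-runtime adapter idea (names are illustrative).
// Each runtime supplies its own query implementation; the decoder only sees this interface.
interface DatabaseAdapter {
  query(sql: string, params: unknown[]): Promise<Record<string, unknown>[]>;
}

// Hypothetical pattern lookup: match the VIN's identifying characters
// against stored wildcard patterns in the bundled SQLite database.
async function lookupPatterns(db: DatabaseAdapter, vin: string) {
  const wmi = vin.slice(0, 3);        // world manufacturer identifier (chars 1-3)
  const descriptor = vin.slice(3, 8); // part of the vehicle descriptor section
  return db.query(
    `SELECT element, value FROM patterns
     WHERE wmi = ? AND ? GLOB pattern`,
    [wmi, descriptor]
  );
}
```

Because the decoding logic only depends on that narrow interface, swapping better-sqlite3, sql.js, or D1 underneath is an adapter change, not a rewrite.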
Market Intelligence: Real-Time Automotive Analytics
The core product is market analysis. We built a system that analyzes millions of vehicle listings and provides actual insights, not just aggregated averages that tell you nothing.
The Architecture
At the bottom is a Postgres database (on Neon) with every active vehicle listing in North America. On top of that, we built a market analysis service that computes:
- Price distribution histograms - Not just "average price" but actual distribution curves showing where listings cluster
- Odometer-price correlation - Depreciation analysis with enough data points to be statistically meaningful
- Geographic breakdowns - Where vehicles are listed and how prices vary by region
- Time series trends - Week-over-week and month-over-month price movements
- Trim-level analysis - Price deltas between trims, showing which configurations hold value
- Individual listing positioning - Where any specific listing sits relative to the market (percentile rank, pricing guidance)
- Similar listings algorithm - Proximity-based matching considering make/model/year/trim/mileage/price
The service layer is written in TypeScript using Drizzle ORM for type-safe SQL queries. Every endpoint is contract-validated with Zod schemas to prevent schema drift between API and clients.
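Two of the calculations above - the price histogram and the percentile rank - are worth showing because they're so simple once all the listings are in one place. In production they run as SQL aggregations over the listings table; the standalone helpers below are just an illustrative sketch of the math:

```typescript
// Illustrative sketch (not our service code): where does a listing sit
// relative to the prices of comparable listings?
function percentileRank(comparablePrices: number[], listingPrice: number): number | null {
  if (comparablePrices.length === 0) return null;
  const below = comparablePrices.filter((p) => p < listingPrice).length;
  return (below / comparablePrices.length) * 100; // e.g. 37 => priced above 37% of the market
}

// Illustrative histogram bucketing for the price distribution curve.
function priceHistogram(prices: number[], bucketSize = 1000): Map<number, number> {
  const buckets = new Map<number, number>();
  for (const price of prices) {
    const bucket = Math.floor(price / bucketSize) * bucketSize;
    buckets.set(bucket, (buckets.get(bucket) ?? 0) + 1);
  }
  return buckets;
}
```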
API routes look like this:
```typescript
app.get("/:make/:model/:year/overview", async (c) => {
  const { make, model, year } = c.req.valid("param");
  const svc = new MarketService(c.env.CORE_DB_URL);
  return c.json(await svc.overview(make, model, year));
});
```

Simple, fast, type-safe. Hono + Cloudflare Workers means sub-100ms response times globally. Explore the full API documentation.
Research Database: Vehicle Specs That Don't Lie
We maintain a comprehensive vehicle specification database covering every make/model/year/trim variant. This includes:
- Full trim lineups by manufacturer and year
- Technical specifications (engine, transmission, drivetrain)
- OEM imagery (exterior shots, interior, detail photos)
- Factory color options with hex codes
- Media categorization (shot types, angles, contexts)
The research API exposes all of this through clean REST endpoints:
```typescript
const lineup = await client.research.lineup("Toyota", { year: 2024 });

const images = await client.research.getImages({
  make: "Honda",
  model: "Civic",
  year: 2023,
});

const colors = await client.research.getColors({
  make: "Ford",
  model: "F-150",
  year: 2024,
});
```

This powers both our consumer app and dealer tools. Same API, different use cases.
Recalls Pipeline: Government Data That Actually Works
Government recall data is a mess. Transport Canada and NHTSA both publish recall information, but in different formats, with inconsistent schemas, and zero normalization.
We built beaver - an ETL pipeline that:
- Extracts raw recall CSVs from government sources
- Transforms the data into a normalized schema
- Loads it into SQLite with proper indexes
The schema is straightforward:
```sql
CREATE TABLE transport_canada_recalls (
  id TEXT PRIMARY KEY,
  recall_number TEXT NOT NULL,
  year INTEGER,
  make TEXT NOT NULL,
  model TEXT NOT NULL,
  units_affected INTEGER,
  system_type TEXT,
  comment TEXT,
  recall_date DATE
);
```

Now we can query recalls by VIN, make/model/year, or recall number instantly. The recalls tool exposes this through simple endpoints that return clean JSON instead of government CSV soup.
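Consuming the pipeline's output is then a plain SQLite query. Here's a minimal sketch using better-sqlite3 - the file path and helper are illustrative, not part of beaver itself:

```typescript
import Database from "better-sqlite3";

// "recalls.db" is a hypothetical path to the SQLite file the pipeline produces.
const db = new Database("recalls.db", { readonly: true });

// Illustrative helper: look up recalls for a decoded make/model/year.
function recallsFor(make: string, model: string, year: number) {
  return db
    .prepare(
      `SELECT recall_number, system_type, units_affected, recall_date, comment
       FROM transport_canada_recalls
       WHERE make = ? AND model = ? AND year = ?
       ORDER BY recall_date DESC`
    )
    .all(make, model, year);
}
```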
MCP Server: AI That Knows Cars
We built an MCP (Model Context Protocol) server that exposes our entire data platform to AI agents. This means Claude (or any MCP-compatible AI) can:
- Search vehicle listings with complex filters
- Analyze market conditions for specific vehicles
- Look up recalls by VIN
- Get comprehensive vehicle specifications
- Find similar listings
- Calculate market positioning
The tools are defined with Zod schemas and return structured JSON. For example:
```typescript
agent.server.tool(
  "get_market_overview",
  "Get market statistics for a make/model/year",
  makeModelYearSchema,
  async (params) => {
    const overview = await client.market.overview(
      params.make,
      params.model,
      params.year
    );
    return { content: [{ type: "text", text: JSON.stringify(overview) }] };
  }
);
```

This turns our entire platform into a knowledge base that AI can query programmatically. No prompt engineering required - just clean tool definitions and structured data.
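The makeModelYearSchema passed in above is just a set of shared Zod validators from our contracts package. A minimal illustrative version - the exact fields and constraints here are assumptions, not the published schema - looks roughly like this:

```typescript
import { z } from "zod";

// Illustrative stand-in for the shared make/model/year parameter schema.
const makeModelYearSchema = {
  make: z.string().min(1).describe("Vehicle manufacturer, e.g. Toyota"),
  model: z.string().min(1).describe("Vehicle model, e.g. Camry"),
  year: z.number().int().min(1981).describe("Model year"),
};
```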
The Client Layer: Type-Safe Everything
We abstracted all API interaction into a typed client library (@cardog/client) that both our web and mobile apps consume. It's built on top of Zod-validated contracts (@cardog/contracts), so if the API changes, the client breaks at compile time, not runtime.
The client exposes methods for every API endpoint and includes TanStack Query hooks for React:
```typescript
const { data, isLoading } = useMarketOverview("Toyota", "Camry", 2024);
```

No manual cache management, no stale data, no imperative fetching logic. Just declare what data you need and the hooks handle the rest.
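Each hook is a thin wrapper that pairs the typed client with TanStack Query. Roughly - the internals below are a sketch, not the published implementation:

```typescript
import { useQuery } from "@tanstack/react-query";
// `client` is the shared API client instance from @cardog/client (construction omitted here).

export function useMarketOverview(make: string, model: string, year: number) {
  return useQuery({
    // The query key encodes the inputs, so cache entries are scoped per vehicle.
    queryKey: ["market", "overview", make, model, year],
    queryFn: () => client.market.overview(make, model, year),
  });
}
```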
The Apps: Mobile-First, Web Second
We shipped two consumer-facing applications this year:
Mobile app - React Native using Expo 52. Multi-platform (iOS + Android) with native performance. Uses the same API client as web, same authentication system, same data contracts.
Web app - Next.js 15 on Vercel. Tailwind + shadcn/ui primitives. Server components where it makes sense, client components where we need interactivity. Fast, responsive, doesn't feel like a web app pretending to be native.
Both apps hit the same Hono API running on Cloudflare Workers. Authentication is handled by better-auth with custom plugins.
The Data Layer: Postgres + Drizzle
We use two Postgres databases on Neon:
- Core DB - Vehicle listings, specs, images, market data, recalls. Read-heavy, optimized for analytical queries.
- App DB - User accounts, auth, subscriptions, billing, garage. OLTP workload, optimized for transactional consistency.
Drizzle ORM gives us type-safe queries without the bloat of Prisma. Migrations are SQL files we actually understand. Schema definitions live in TypeScript and get inferred everywhere.
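To give a sense of what that looks like, a Drizzle table definition is plain TypeScript. The trimmed listings table below is an illustrative example, not our real schema:

```typescript
import { pgTable, text, integer, numeric, timestamp } from "drizzle-orm/pg-core";

// Illustrative (not our actual) listings table. The row type is inferred
// directly from this definition, so queries stay in sync with the schema.
export const listings = pgTable("listings", {
  id: text("id").primaryKey(),
  vin: text("vin").notNull(),
  make: text("make").notNull(),
  model: text("model").notNull(),
  year: integer("year").notNull(),
  price: numeric("price"),
  odometer: integer("odometer"),
  listedAt: timestamp("listed_at").defaultNow(),
});

export type Listing = typeof listings.$inferSelect;
```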
Neon's branching feature means we can spin up ephemeral database branches for testing without contaminating production data. Delete the branch when done.
The Monorepo: Turborepo + pnpm
Everything lives in a monorepo with apps/ and packages/:
Apps:
- apps/api - Hono + Cloudflare Workers API
- apps/crdg.ai - Next.js web application
- apps/cardog-native - React Native mobile app
- apps/corgi.vin - Corgi VIN decoder demo site
- apps/beaver - Recalls ETL pipeline
Packages:
- packages/client - API client library
- packages/contracts - Zod schemas (single source of truth)
- packages/corgi - Open source VIN decoder
- packages/data/postgres - Drizzle ORM schemas
- packages/mcp - MCP server implementation
Turborepo handles build orchestration. pnpm handles dependency management. We don't think about either one anymore - they just work.
What We Learned
Building automotive infrastructure taught us a few things:
- Government data is always worse than you expect - Plan for inconsistency, missing fields, and schema changes with zero notice.
- Offline-first is underrated - Corgi works offline because we bundled the data. No network = no problem. This matters more than you think.
- Type safety prevents production fires - Zod contracts between API and clients caught hundreds of bugs before they shipped. Worth the extra code.
- Monorepos scale - Sharing code between mobile and web is trivial when everything's in one repo with proper package boundaries.
- Simple architectures win - Hono + Workers + Postgres is fast, cheap, and easy to reason about. No Kubernetes, no microservices, no complexity budget blown on infrastructure.
What's Next
We're building dealer tools on top of this infrastructure. Same APIs, different UI. Market analysis for inventory optimization, listing insights, competitive intelligence.
The platform is flexible enough to support both consumer and B2B use cases without rewriting core services. That's the benefit of building proper abstractions from day one.
All the code runs in production today. Corgi is open source at github.com/cardog-ai/corgi. The rest powers cardog.app and our mobile app.
Not bad for a year of work.