Constant Product DEX — The Most Basic AMM

x * y = k. Three variables, one multiplication sign, one equals sign. The formula fits on a Post-it note. A middle schooler could verify it with a calculator. It is, by any reasonable measure, the simplest pricing model in all of decentralized finance. Two token reserves multiplied together equal a constant. Deposit one token, withdraw the other, keep the product unchanged. That's it. That's the entire pricing algorithm.

I'm integrating my first constant product AMM right now, and I want to be clear about something: the formula is simple. Everything else is not.

The gap between "I understand x * y = k" and "I can construct a valid swap instruction that executes on-chain" is enormous. It's the gap between understanding that a car engine works by burning gasoline and being able to rebuild a transmission. The principle is accessible. The implementation is a different world — a world of byte offsets, account ordering, integer rounding, legacy dependencies, and error messages that tell you nothing useful.

This is the first DEX I'm integrating directly via CPI, and I chose a constant product AMM specifically because it's supposed to be the easiest. If I can't get the simplest AMM architecture working, I have no business attempting concentrated liquidity or order books. The math is a solved problem. The engineering is where I'm spending all my time.

The Farmer's Market Scale

Here's how x * y = k works, stripped of all jargon.

Imagine a farmer's market stall with two produce baskets. One basket has apples. The other has oranges. The vendor has a single rule: the number of apples multiplied by the number of oranges must always equal a fixed number. Say 10,000.

If the stall starts with 100 apples and 100 oranges, the product is 10,000. A customer walks up wanting to buy apples. They put oranges into the orange basket. The vendor calculates how many apples to hand over such that the product remains 10,000. If the customer adds 10 oranges (bringing the total to 110), the vendor gives them enough apples to bring the apple count down to 10,000 / 110 = 90.91. The customer gets about 9 apples for 10 oranges.

But here's the thing the formula captures: the more apples the customer buys, the more expensive each additional apple becomes. After the first customer, there are only ~91 apples left. The next customer adding 10 oranges gets fewer apples — 10,000 / 120 = 83.33, so only about 7.6 apples. Same 10 oranges, fewer apples received. The price moves against the buyer as the pool becomes unbalanced.

This is price impact, and it's built directly into the math. There's no order book, no market maker adjusting quotes, no human judgment. The formula is the market maker. The spot price at any moment is simply reserve_b / reserve_a — the ratio of the two reserves. As trades push the reserves out of balance, the price adjusts automatically.

That's the theory. Beautiful, elegant, and it genuinely does fit on a Post-it note. Now I need to make it work on-chain.

Documentation: The Progression Toward Truth

The first thing I do when integrating a new DEX protocol is look for documentation. Official docs, API references, integration guides. The kind of material that a team publishes when they want other developers to build on their platform.

For newer protocols, this sometimes exists and is sometimes accurate. For older protocols — the ones that have been running on mainnet for years, through multiple program upgrades and team changes — the documentation situation is grimmer.

I'm looking at a classic constant product AMM. The official documentation exists, but sections of it describe instruction signatures that don't match the currently deployed program. The "Getting Started" guide references SDK functions that have been deprecated. The account structure diagrams show field names but not byte sizes. The fee documentation says "see the config account" without specifying the config account's layout.

This is not unusual. It's the norm.

The progression I've learned — and I wish someone had told me this months ago — goes like this: official documentation is the hypothesis. GitHub source code is the better hypothesis. On-chain data is the truth.

Official docs might be written by someone who isn't the program developer. They might describe the intended design rather than the deployed implementation. They might be two versions behind.

GitHub source code is closer to truth because it's the actual Rust structs and instruction handlers. But as I learned the hard way in an earlier episode, the source code on GitHub might not match the deployed on-chain program. Programs are deployed as compiled bytecode. The GitHub repo might have commits that haven't been deployed yet, or — worse — the deployed program might have been compiled from a branch or tag that isn't the default branch.

On-chain data doesn't lie. The bytes in the account are what the program actually wrote. The instruction that succeeds is the instruction the program actually accepts. When the docs say one thing, the source code says another, and the on-chain data says a third, the on-chain data wins. Always.

This feels like buying a used car. The listing on the website says "excellent condition, one owner." The Carfax report tells a more nuanced story — two owners, one accident on record. But if you get under the car and look at the frame yourself, you see the truth: the welds, the rust, the patches. The listing is marketing. The Carfax is closer. The physical inspection is reality. You have to get under the car.

Parsing the Pool State

To simulate a swap — to calculate how much token B I'll get for a given amount of token A — I need four numbers from the pool account: the reserve of token A, the reserve of token B, the fee rate, and the pool's operational status (is it active, paused, or frozen?).

These numbers are sitting in the pool account's data field. As I described in detail a few episodes ago, that data field is a raw byte array. No labels, no field names. Just bytes. To read the reserves, I need to know the exact byte offset where each reserve value is stored.

This is where the fun begins.

I start with the source code. The pool struct has fields defined in order: some configuration fields, public keys for the token vaults, reserve amounts stored as u64 (8-byte unsigned integers), fee parameters, status flags. I count up the sizes: 8 bytes for the discriminator (it's an Anchor program), 32 bytes for the config address, 32 bytes for the creator, and so on. Addition. Just addition.
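The offset arithmetic looks something like this. The field list here is hypothetical (the real struct has 30+ fields), but the method is just a running sum of sizes, and the sketch shows exactly how one misread field size poisons every offset after it:

```rust
// Cumulative offset calculation for an Anchor account: an 8-byte
// discriminator, then each field at the running sum of prior sizes.
// Field sizes below are illustrative, not any real program's layout.
fn offsets(field_sizes: &[usize]) -> Vec<usize> {
    let mut off = 8; // Anchor's 8-byte discriminator comes first
    field_sizes
        .iter()
        .map(|s| {
            let o = off;
            off += *s;
            o
        })
        .collect()
}

// Read a little-endian u64 (how reserves are stored) at a given offset.
fn read_u64_le(data: &[u8], offset: usize) -> Option<u64> {
    let bytes: [u8; 8] = data.get(offset..offset + 8)?.try_into().ok()?;
    Some(u64::from_le_bytes(bytes))
}

fn main() {
    // config pubkey (32), creator pubkey (32), a status field, reserve_a (8)
    let correct = offsets(&[32, 32, 8, 8]); // status field is actually a u64
    let mistaken = offsets(&[32, 32, 4, 8]); // ...misread as a u32
    assert_eq!(correct[3], 80);
    assert_eq!(mistaken[3], 76); // every later offset is off by 4

    // Reading at the wrong offset yields plausible-looking garbage, not noise.
    let mut data = vec![0u8; 96];
    data[80..88].copy_from_slice(&1_234u64.to_le_bytes());
    assert_eq!(read_u64_le(&data, correct[3]), Some(1_234));
    assert_eq!(read_u64_le(&data, mistaken[3]), Some(1_234u64 << 32));
}
```

Note what the last assertion demonstrates: the misaligned read doesn't fail — it returns a number. Plausible, parseable, wrong.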

Except I make a mistake. There's a field early in the struct that I misread as a u32 — a 4-byte integer. It's actually a u64 — an 8-byte integer. The field names and types are clear in hindsight, but when you're staring at a struct with 30+ fields and doing mental arithmetic to track cumulative offsets, a u32-versus-u64 confusion is easy to make.

The consequence: every offset after that field is wrong by 4 bytes. Not obviously wrong — 4 bytes doesn't produce nonsensical numbers the way a 32-byte shift would. I read the reserves and get values that are plausible but don't match what the protocol's own UI reports. The fee rate looks wrong. One of the public keys doesn't correspond to any known account.

I spend hours thinking my parsing logic has a bug. I add debug logging. I refactor the code. The bug isn't in the logic. The logic is perfectly correct for the layout I calculated. The layout I calculated is wrong.

The fix comes from cross-verification — the practice I developed in an earlier episode. I take a pool whose token mint addresses I know, fetch the raw account data, and search for those 32-byte public key values at my calculated offsets. They're not there. I search the entire byte array and find them 4 bytes later than expected. That tells me exactly which field is the wrong size. I fix the field type, recalculate all offsets, and everything clicks into place.
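The scan itself is trivially small — slide a 32-byte window over the raw data and record every position where a known public key appears. A minimal sketch:

```rust
// Cross-verification: search raw account data for a 32-byte value we
// already know (e.g. a token mint address) to find where it actually
// lives. If it turns up 4 bytes past the calculated offset, an earlier
// field's size was misread.
fn find_offsets(data: &[u8], needle: &[u8; 32]) -> Vec<usize> {
    data.windows(32)
        .enumerate()
        .filter(|&(_, w)| w == &needle[..])
        .map(|(i, _)| i)
        .collect()
}

fn main() {
    let known_mint = [7u8; 32]; // stand-in for a real mint address
    let mut data = vec![0u8; 128];
    data[44..76].copy_from_slice(&known_mint); // actually stored at offset 44
    // If my calculated offset was 40, this tells me I'm 4 bytes short.
    assert_eq!(find_offsets(&data, &known_mint), vec![44]);
}
```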

One field type. Four bytes. Half a day of debugging.

Programs that don't use Anchor have no discriminator — the first field starts at byte 0 instead of byte 8. This is another assumption that's easy to get wrong, especially when switching between Anchor and non-Anchor programs. The 8-byte shift from a phantom discriminator produces a subtler kind of wrong — the public keys you extract might partially overlap with real data and look almost-but-not-quite valid.

Swap Simulation: When 1 Lamport Matters

Once I can correctly parse the pool state, I need to simulate swaps. Given an input amount of token A, how much token B will the pool give me?

The formula, accounting for fees, is:

output = (reserve_b * input * (10000 - fee_bps)) / (reserve_a * 10000 + input * (10000 - fee_bps))

Where fee_bps is the fee in basis points. A fee of 25 basis points means 0.25%, so (10000 - 25) = 9975, meaning 99.75% of the input amount participates in the swap calculation.

The formula itself is straightforward. Plug in the numbers, get the output. I implement it, test it against known swaps, and... the numbers are off by 1 lamport. Sometimes 2.

One lamport. The smallest unit of SOL, worth fractions of a penny. Who cares about 1 lamport?

The on-chain program cares. Here's why.

Blockchain math is integer math. There are no floating point numbers on-chain. When the formula produces a fractional result — and it almost always does, because division rarely produces a clean integer — the program has to decide: round up or round down?

On-chain programs universally round against the user. If the calculated output is 1,000,000.7 lamports, the program gives 1,000,000. If the fee calculation is 2,500.3 lamports, the program charges 2,501. This isn't arbitrary — it prevents rounding exploits where an attacker could extract fractional tokens by making millions of tiny swaps that all round in their favor.

My simulation has to match this behavior exactly. Not "close enough." Exactly. Because when my bot constructs a swap instruction, it specifies a minimum output amount — a slippage check. If my simulation says I'll get 1,000,001 lamports but the on-chain program calculates 1,000,000, and my minimum output is set to 1,000,001, the transaction fails. One lamport. Transaction rejected.

The implementation detail that matters: ceiling division versus floor division. Standard integer division on the unsigned values used here truncates the remainder — floor division. Ceiling division — rounding up — requires a different formula: (a + b - 1) / b instead of a / b. The on-chain program uses ceiling division for fees (round up against the user) and floor division for output (round down against the user). My simulation must use the same operations in the same places.

This is like filing your federal tax return. The IRS has specific rules about when to round up and when to round down, and they're different for different lines on the form. Round your income up. Round your deduction down. Get it wrong by a dollar and the math doesn't match, and the IRS sends you a letter. The tax code doesn't care that you were "close." Close doesn't reconcile.

I test my simulation against a dozen known swaps on different pools. I compare my calculated output against the actual output recorded on-chain for each swap. The numbers must match exactly — not approximately, not within a tolerance, but to the lamport. After adjusting the rounding behavior, they do. Every one. Zero discrepancy.

This is the foundation everything else builds on. If my swap simulation is wrong by even 1 lamport, every profitability calculation is unreliable, every slippage parameter is potentially wrong, and every transaction has a chance of failing for a reason that looks like a mystery but is actually just arithmetic.

The 15-Account Jigsaw Puzzle

I can parse the pool state. I can simulate the swap output. Now I need to build the actual instruction — the on-chain message that tells the program "swap this many of token A for token B on this pool."

A swap instruction on a constant product AMM needs somewhere around 15 accounts. Fifteen. For what is conceptually the simplest possible DeFi operation.

Let me list what a typical constant product swap requires:

  • The AMM program itself
  • The pool state account
  • The pool authority (a PDA — Program Derived Address)
  • Token vault for token A
  • Token vault for token B
  • The user's token account for token A
  • The user's token account for token B
  • The user's wallet (signer)
  • The SPL Token program
  • An associated configuration account
  • Several additional accounts depending on the specific program

And depending on the protocol and its history, there may be more. Accounts for market state. Accounts for event recording. Accounts for programs that the DEX used to interact with but technically doesn't anymore. Each account has to be in the right position, with the right read/write permission, and with the right signing authority.

Building this account list is like assembling IKEA furniture where the instruction manual is for a slightly different model. The general shape is right — you recognize the pieces, you understand what a bookshelf looks like — but step 7 references a dowel that isn't in your box, and step 12 skips a critical bracket that you only discover is missing when the shelf wobbles. The error message is the shelf wobbling. Not "you're missing bracket B7 on the left side." Just: wobble.

On Solana, the equivalent of the wobble is a generic program error. "Program failed to execute." No indication of which account is wrong, which position is mismatched, or what account should be there instead. Sometimes the error code points to a general area — "invalid account owner" — but which of the 15 accounts has the wrong owner? The program doesn't say. I get to figure that out by comparing my account list against the IDL, position by position, checking each account's on-chain data to verify it's what the program expects.

The ordering is particularly brutal. Solana programs reference accounts by index. If the program expects the pool authority at position 4 and I put it at position 5, the program reads whatever is at position 4 — which in my misordered list is probably the token A vault — and tries to use it as the authority. The data at that address is the vault's state, not an authority derivation. The program either panics with a cryptic error or, worse, silently reads the wrong bytes.
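The position-by-position comparison I end up doing by hand is mechanical enough to script. A small helper, with a hypothetical account ordering (not any real program's IDL), that reports the first position where my list diverges from the expected one:

```rust
/// Compare a built account list against the IDL's expected ordering,
/// position by position, and report the first mismatch. Only checks
/// overlapping positions; a missing trailing account won't show up here.
fn first_mismatch<'a>(
    expected: &[&'a str],
    actual: &[&'a str],
) -> Option<(usize, &'a str, &'a str)> {
    expected
        .iter()
        .zip(actual.iter())
        .enumerate()
        .find(|(_, (e, a))| e != a)
        .map(|(i, (e, a))| (i, *e, *a))
}

fn main() {
    // Hypothetical orderings for illustration.
    let expected = ["pool_state", "authority", "vault_a", "vault_b", "user_a"];
    let actual = ["pool_state", "authority", "vault_b", "vault_a", "user_a"];
    // Positions 2 and 3 are swapped: the program would read vault_b where
    // it expects vault_a, and fail (or worse, silently misread).
    assert_eq!(
        first_mismatch(&expected, &actual),
        Some((2, "vault_a", "vault_b"))
    );
}
```

The on-chain program gives you the wobble; a diff like this at least tells you which shelf bracket is in the wrong hole.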

It's a Scantron problem, the same one I described in the previous episode. Every answer shifted by one row. The machine doesn't know you shifted; it just marks everything wrong.

The Ghost Accounts: Legacy Structure

Here's something that would be completely baffling without historical context.

Raydium AMM V4 — one of the oldest and highest-volume constant product AMMs on Solana — requires accounts in its swap instruction that relate to OpenBook, the on-chain order book protocol formerly known as Serum. When Raydium AMM V4 was originally designed, it shared liquidity with the Serum order book. Every swap on AMM V4 could potentially interact with Serum's order matching system. The instruction needed Serum's market account, its bids and asks accounts, its event queue, and its vault signer.

Serum rebranded to OpenBook. More importantly, AMM V4 pools now operate primarily as pure constant product AMMs. The shared liquidity mechanism with the order book is largely vestigial for most pools. But the instruction structure hasn't changed. The accounts are still required.

This means I need to include OpenBook market accounts in my swap instruction for a pool that isn't meaningfully using them. Accounts I don't care about. Accounts that don't affect my swap's output or execution. But if I leave them out, the instruction is malformed and the transaction fails.

It's like filling out a government form that still asks for your fax number. Nobody has a fax number anymore. The field has been irrelevant for a decade. But the form hasn't been updated, and if you leave the field blank, the clerk rejects the submission. "All fields required." You write "N/A" or make something up, because the system demands a value even though nothing downstream reads it.

These ghost accounts take up space in my transaction. Each one is 32 bytes in the account keys array — space I can't use for anything else in a transaction format where every byte matters. They add complexity to my instruction builder. They require additional data lookups, because I need to know which OpenBook market corresponds to each AMM V4 pool. Some pools have active OpenBook markets; others have markets that were created at pool launch and never used since.

Without knowing the history of how these protocols evolved — without understanding that Serum existed, that Raydium built on top of it, that Serum became OpenBook, that the architecture outlived the relationship — these accounts are completely inexplicable. They show up in the IDL as required accounts with names that suggest order book functionality, in a pool that does no order book matching. The only explanation is history, and history isn't documented in the IDL.

This is one of the hardest things about integrating with older protocols. The current state of the code carries the scars of past architectural decisions. Every migration, every partnership, every pivot leaves behind structural remnants that new integrators must accommodate without understanding why they exist. The code doesn't explain its own history. You either know the context or you don't, and if you don't, you spend hours trying to understand why a constant product AMM needs an order book.

CPI: The Invisible Missing Piece

There's one more trap I walk into, and it's specific to how I'm calling the DEX.

Most developers interact with Solana programs by constructing transactions on the client side — in TypeScript, Python, or Rust — and submitting them via RPC. In this model, the client builds the full instruction with all accounts and data, signs it, and sends it to the network. The SDK handles a lot of implicit work: resolving PDAs, looking up associated token accounts, adding program IDs where needed.

I'm not doing that. My bot uses CPI — Cross-Program Invocation. I have my own on-chain program (an arbitrage router), and that program calls the DEX's swap instruction from within its own execution. Program calling program. No client SDK in the loop. No helpful library adding accounts I forgot.

And here's the trap: when calling a program via CPI, the target program's ID must be explicitly included in the account list. In a client-side transaction, the program ID is part of the instruction metadata — it's the program_id field that tells the runtime which program to execute. The runtime handles the lookup. In CPI, the calling program needs to have the target program's account available in its own account list to invoke it.

I miss this the first time. My instruction builder includes all the accounts the swap needs — the pool state, the vaults, the user accounts, the token program — but not the AMM program's own ID as an account. In a client-side transaction, this would work fine. In CPI, the invoke fails because my program can't find the target program to call.

The error message is generic. Something about an account not being found or an invalid instruction. It doesn't say "you forgot to include the program you're trying to call." I spend an hour checking every other account before realizing the issue.

This is a pattern I encounter repeatedly across DEX integrations: what SDKs do automatically, CPI requires explicitly. PDAs that SDKs derive and add behind the scenes? Manual in CPI. Token program accounts that SDKs include by default? Manual in CPI. The AMM program's own account? Manual. Everything is manual, and the error messages never point at the specific omission.

It's like the difference between filing taxes using TurboTax versus filing by hand on paper forms. TurboTax auto-populates fields from your W-2, calculates deductions, adds the right schedules. Filing on paper, you have to know which schedules to attach, which lines to cross-reference, which forms are required for your specific situation. Miss one form and the IRS sends back the whole package. They don't tell you which form is missing. Just: "INCOMPLETE SUBMISSION."

CPI is filing by hand. Every account, every permission, every program reference — all explicit, all manual, all your responsibility.

The Test Loop

With the math implemented, the account list assembled, and the CPI call constructed, I'm ready to test. And testing on-chain swap instructions is its own kind of experience.

I construct the transaction. Submit it. Wait for confirmation. It fails.

The error is a program error with a numeric code. I look up the error code in the program's source. It maps to something generic — "InvalidAccountData" or similar. Which of my 15+ accounts has invalid data? The error doesn't specify.

I go through the accounts one by one. Is the pool state account correct? Is it owned by the right program? Is the authority PDA derived correctly — right seeds, right bump? Are the token vaults the ones actually associated with this pool, not some other pool? Is the user's token account initialized and owned by the right wallet? Is the SPL Token program the right one — not Token-2022, not some other address?

I find the issue. One of the accounts — a vault address — was derived from the wrong pool. A copy-paste error in my test setup. I fix it. Submit again. Fail again. Different error code this time.

This cycle — submit, fail, diagnose, fix, resubmit — repeats many times. Each iteration takes 10-30 seconds for the transaction to process and the error to come back. Each diagnosis requires reading the error code, tracing through the program's error definitions, and reasoning about which account or parameter could produce that specific error.

There's no local test environment that perfectly replicates on-chain behavior. Solana has a local validator for testing, but some DEX programs aren't available on the local validator, and the pool state data needs to match live pools. I can simulate transactions against the live network without actually executing them — a "dry run" — but even simulation has limitations.

The test loop for on-chain program integration is slow, opaque, and unforgiving. Each failure costs time. Each success — when the transaction finally lands and the swap executes — feels disproportionately satisfying.

What "Simple" Actually Means

I'm now three days into integrating the "simplest" AMM architecture. Three days for x * y = k. A formula that fits on a Post-it note.

Here's what those three days actually contained:

  • Reading documentation that turned out to be partially wrong
  • Reading source code that required careful field-by-field offset calculation
  • Making a u32/u64 mistake that cost half a day
  • Implementing integer-exact swap simulation with correct rounding behavior
  • Discovering that the AMM requires order book accounts from a protocol relationship that's largely historical
  • Learning that CPI requires explicit program account inclusion that SDKs handle automatically
  • Debugging through 15+ accounts, one at a time, with error messages that identify the problem domain but not the specific account

None of this is intellectually beyond reach. None of it requires advanced mathematics or novel algorithms. The constant product formula is genuinely simple. The fee calculation is basic arithmetic. The account list is just a list.

But the integration is complex because the system is complex. The on-chain program is a compiled binary that accepts a specific byte-level input format and rejects everything else without helpful diagnostics. The documentation is a best-effort snapshot of a moving target. The historical architecture bakes in dependencies that no longer serve a functional purpose but remain structurally required. The testing loop is slow and opaque.

This is the reality of building on permissionless infrastructure. There's no support ticket to file. There's no integration engineer at the DEX company who'll hop on a call and walk me through the account list. There's no sandbox with helpful error messages. There's the deployed program, its bytes, and my ability to figure out what it expects.

And this is just one DEX. One AMM architecture. The simplest one.

Concentrated liquidity protocols have tick arrays and bin arrays that add dynamic accounts to the swap instruction. Order book protocols have an entirely different paradigm. Each additional DEX is another integration built from scratch, with its own account structures, its own fee models, its own historical baggage, its own error codes that mean different things.

If the most basic AMM — the one everyone learns first, the one every DeFi tutorial starts with, the one whose formula genuinely does fit on a Post-it note — takes three days of careful engineering to integrate at the CPI level, what does that say about the rest of the landscape? How much complexity is hiding behind each additional protocol I need to support? And at what point does the integration difficulty of the Nth DEX outweigh the arbitrage opportunities it provides access to?

I don't know yet. I'm still finishing this one. And even this "simple" one has shown me that the distance between understanding a formula and actually making it work on-chain is measured in days, not minutes.

Disclaimer

This article is for informational and educational purposes only and does not constitute financial, investment, legal, or professional advice. Content is produced independently and supported by advertising revenue. While we strive for accuracy, this article may contain unintentional errors or outdated information. Readers should independently verify all facts and data before making decisions. Company names and trademarks are referenced for analysis purposes under fair use principles. Always consult qualified professionals before making financial or legal decisions.