DEX Integration Lessons — When Documentation Doesn't Exist
I need to integrate a new DEX program. The first thing I do — the thing any reasonable developer does — is look for documentation. I find the program's GitHub repository. There's a README. It describes what the project is, how to build it, maybe how to run the test suite. There is nothing about how to actually call the program's instructions. No API reference. No account descriptions. No parameter explanations. The README tells me the project exists and that I can compile it. It does not tell me how to use it.
I check for a docs folder. Empty, or missing entirely. I look for a website with developer documentation. There's a landing page with marketing copy and a link to the app. The "Developers" section, if it exists, contains a single page with a half-finished quick-start guide that references instruction names that no longer exist in the current codebase. The guide was written eight months ago. The program has been rewritten twice since then.
This is not an exception. This is the norm.
The Documentation Landscape
There's a spectrum of documentation quality across Solana DEX programs, and the vast majority cluster at the low end.
At the best end, a small number of programs maintain comprehensive documentation: instruction references, account descriptions, example transactions, integration guides. These are the exception — the programs backed by well-funded teams with dedicated developer relations staff. Working with them feels like assembling furniture from a manufacturer that includes clear, illustrated instructions with every piece labeled and every step numbered. You open the box, follow the steps, and end up with a functioning bookshelf.
Most programs are not at that end. Most programs are closer to buying a used car from a private seller who hands you the keys and says "it runs." No owner's manual. No maintenance records. No explanation of what that one switch on the dashboard does or why the previous owner installed an aftermarket relay behind the glove box. The car runs — you can see it running — but understanding how to operate anything beyond the basics requires experimentation, or crawling under the hood and tracing wires yourself.
And then there's the worst end: programs where documentation exists but is actively misleading. The README describes an instruction called swap that takes seven accounts. The actual on-chain program has an instruction called swap_v2 that takes twelve accounts. The old swap instruction still exists in the code as a deprecated function that returns an error if called. The documentation confidently describes an interface that hasn't been current for months. This is worse than no documentation at all. No documentation forces you to find the truth. Wrong documentation gives you a plausible fiction that you spend hours debugging before realizing it was never going to work.
It's the difference between hiking a trail with no trail map and hiking a trail with a map from three years ago that shows a bridge that has since washed out. Without a map, you proceed carefully, looking at every fork, reading the terrain. With the old map, you stride confidently toward a river crossing that no longer exists, and you don't discover the problem until you're standing at the edge of the water.
Why Documentation Lags
This isn't laziness. There are structural reasons why on-chain program documentation tends to be poor.
First, the programs change rapidly. Solana DEX programs are in active development. Instructions get renamed, accounts get added, fee structures get reworked. Every change to the program interface should produce a corresponding change to the documentation. But documentation updates are invisible work — they don't ship features, they don't fix bugs, they don't improve performance. In a competitive market where shipping speed is a survival trait, documentation updates get deprioritized. The code moves forward. The docs stay behind.
Second, the primary users of many DEX programs interact through the protocol's own frontend — the web application. The frontend team has direct access to the on-chain program's interface because they're part of the same organization. They don't need external documentation because they can walk over to the smart contract developer's desk and ask. The documentation that would help external integrators — people like me, building bots that call the program directly — serves a user base that's smaller and less commercially important than the frontend users.
Third, Solana programs are written in Rust, and Rust code has a particular quality that both helps and hurts: it's extremely explicit. Every account, every constraint, every type is declared in the code. In a sense, the code is the documentation. This is true — but it's the kind of truth that requires you to read Rust fluently and understand Solana's account model deeply before the "self-documenting" nature of the code becomes useful. The code documents itself the way a legal contract documents an agreement — every detail is technically there, in precise language, but extracting meaning requires specialized literacy.
The Source Code Is the Only Reliable Reference
When the documentation is missing, outdated, or wrong, there's exactly one source of truth: the program's source code. Not the README. Not the developer guide. Not the Discord channel where someone posted an example six months ago. The source code — the actual Rust files that compile into the deployed on-chain program.
This realization changes my entire integration workflow. I stop looking for documentation first. I go straight to the repository and open the source code. Everything I need is in there. The instruction definitions. The account structures. The business logic. The error codes. The fee calculations. The constraints that determine whether a transaction succeeds or fails. It's all there, written in Rust, waiting to be read.
The key word is read. Source code doesn't explain itself. It doesn't say "here's why we require this account" or "here's the scenario where this error triggers." It just declares what it does. Extracting the information I need — the practical information required to construct a valid instruction — requires a systematic approach to reading unfamiliar code.
I develop a pattern. It works on every Anchor-based program I encounter, and with minor adjustments, on native Solana programs too. The pattern has four stages, and the order matters.
Stage One: The Instruction List
The first thing I need to know is: what can this program do? What instructions does it expose? What are they called?
In an Anchor program, instructions are typically defined in a single file — often called lib.rs or instructions/mod.rs. Each instruction is a public function inside the module annotated with Anchor's #[program] attribute. The function name is the instruction name. The function signature tells me what data parameters the instruction accepts.
I open this file first. I skim the function names. I'm looking for the swap-related instructions — the ones that take tokens in and give tokens out. There might be one swap instruction. There might be several: swap, swap_v2, swap_exact_output, two_hop_swap. Each variant represents a different swap mode with different parameters and different account requirements.
This stage takes minutes. I don't read the function bodies yet. I just collect the names and signatures. It's like walking through a building and reading the signs on the doors before entering any office. I want the directory first. The details come later.
This is the cookbook index. Before I read any recipe, I need to know what recipes exist. I flip through the table of contents, note which dishes are in the book, and mark the ones I need. The actual cooking instructions come in the next stages.
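Collecting the directory can even be mechanical. Here's a sketch that pulls instruction names out of an Anchor source file with a regular expression — the lib.rs excerpt is entirely invented, but the shape (public functions inside a #[program] module) is what you'd actually skim for:

```python
import re

# A hypothetical excerpt from an Anchor program's lib.rs. The real file
# would be far longer; at this stage only the signatures matter.
lib_rs = """
#[program]
pub mod example_dex {
    use super::*;

    pub fn initialize_pool(ctx: Context<InitializePool>, fee_bps: u16) -> Result<()> { ... }

    pub fn swap(ctx: Context<Swap>, amount_in: u64, min_amount_out: u64) -> Result<()> { ... }

    pub fn swap_v2(ctx: Context<SwapV2>, amount: u64, other_amount_threshold: u64, exact_input: bool) -> Result<()> { ... }
}
"""

# Every public function inside the #[program] module is an instruction
# the program exposes; this collects just the names.
instruction_names = re.findall(r"pub fn (\w+)\(", lib_rs)
print(instruction_names)  # ['initialize_pool', 'swap', 'swap_v2']
```

In practice I do this with grep rather than a script, but the idea is the same: names and signatures first, bodies later.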
Stage Two: The Account Structures
Every instruction on Solana requires a specific set of accounts. These accounts are the on-chain data that the program reads from and writes to during execution. Getting the account list wrong — missing an account, including the wrong account, putting accounts in the wrong order — causes the transaction to fail.
In Anchor programs, each instruction has an associated context struct that declares exactly which accounts the instruction needs. The struct lists every account with its type, its constraints, and whether it needs to be a signer or writable. This is the most critical piece of information for integration. Without the correct account list, I cannot construct a valid transaction.
I read each context struct carefully. For every account in the struct, I note:
What is this account? A pool state. A token vault. A user's token account. An authority PDA. An oracle. A fee recipient.
Is it mutable? If yes, the transaction must mark this account as writable. Missing this flag causes a runtime error.
Is it a signer? If yes, which key needs to sign? The user's wallet? A PDA derived from specific seeds?
What are the constraints? Is this account expected to be owned by a specific program? Is it expected to have a specific discriminator? Are there seeds that derive this account's address?
This is the most time-consuming stage, and the most important. The account structure is where most integration bugs live. One missing account. One account in the wrong position. One account that should be writable but isn't marked as such. Any of these produces a transaction that fails, often with an error message that doesn't clearly indicate which account is the problem.
It's like reading the ingredient list for a recipe that was written by someone who assumes you already know how to cook. "Flour, eggs, butter, cream, vanilla, sugar, a prepared crust." What kind of flour? How many eggs? What does "prepared crust" mean — am I making it from scratch or buying one? The list tells me what I need but not the specifics, and the specifics are where things go wrong. The context struct tells me the account types but not always where to find them or how to derive their addresses. That requires reading further.
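The notes above — which account, writable or not, signer or not — translate directly into the account list the transaction carries. A minimal sketch, with entirely hypothetical accounts for an imagined swap instruction (real pubkeys are 32-byte keys, not strings):

```python
from dataclasses import dataclass

@dataclass
class AccountMeta:
    # Mirrors the (pubkey, is_signer, is_writable) triple every Solana
    # instruction carries for each account it touches.
    pubkey: str
    is_signer: bool
    is_writable: bool

# Hypothetical account list for an imagined swap instruction, in the exact
# order the context struct declares them. Order matters: the program
# matches accounts by position, not by name.
swap_accounts = [
    AccountMeta("UserWallet", is_signer=True,  is_writable=True),   # fee payer, signs
    AccountMeta("PoolState",  is_signer=False, is_writable=True),   # pool state, mutated
    AccountMeta("VaultA",     is_signer=False, is_writable=True),   # input token vault
    AccountMeta("VaultB",     is_signer=False, is_writable=True),   # output token vault
    AccountMeta("UserTokenA", is_signer=False, is_writable=True),   # user's input account
    AccountMeta("UserTokenB", is_signer=False, is_writable=True),   # user's output account
    AccountMeta("TokenProg",  is_signer=False, is_writable=False),  # SPL token program
]

# Cheap sanity checks before sending: exactly one signer, and the
# read-only program account is not marked writable.
assert sum(a.is_signer for a in swap_accounts) == 1
assert not swap_accounts[-1].is_writable
```

The list itself is trivial; the work is in Stage Two, determining what belongs in each slot and with which flags.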
Stage Three: The Business Logic
The processor — the function body of each instruction — contains the actual swap logic. This is where the program reads account data, performs calculations, moves tokens, updates state, and emits events. Reading the processor tells me what the program actually does when my instruction executes.
I don't read this for integration purposes alone. I read it to understand the math. How does the program calculate swap output? What formula does it use? How does it apply fees? In what order does it perform operations? Does it check for slippage before or after the fee deduction?
The processor is also where I discover edge cases and implicit requirements that aren't visible from the instruction signature or account structure. Maybe the program checks whether the pool is paused before executing. Maybe it requires that the input amount is above a minimum threshold. Maybe it refuses to execute if an oracle price is stale. These conditions aren't declared in the account struct — they're buried in the business logic, conditional checks that determine whether the instruction succeeds or fails.
Reading the processor is like reading the fine print on a contract. The headline terms — price, duration, parties — are on the first page. The conditions that actually determine whether the contract works the way you expect — termination clauses, liability limits, force majeure definitions — are buried in the middle pages. You have to read the middle pages. The headline terms tell you what the deal looks like. The fine print tells you what the deal actually is.
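To make the order-of-operations question concrete, here is a sketch of a constant-product swap quote — not any particular DEX's formula, just the generic x·y=k shape — where the fee comes off the input before the curve math, and slippage is checked on the post-fee output. Whether a real program does it in this order is exactly what reading its processor tells you:

```python
def quote_swap(reserve_in: int, reserve_out: int, amount_in: int,
               fee_bps: int, min_amount_out: int) -> int:
    """Constant-product quote with an input-side fee, integer math throughout
    (on-chain programs avoid floats). fee_bps is basis points: 30 = 0.30%."""
    # Fee is deducted from the input first. Some programs instead take it
    # from the output -- only the processor code settles which.
    amount_in_after_fee = amount_in * (10_000 - fee_bps) // 10_000

    # x * y = k invariant: out = y - k / (x + dx_after_fee)
    amount_out = (reserve_out * amount_in_after_fee) // (reserve_in + amount_in_after_fee)

    # Slippage is checked after the fee, on the amount actually paid out.
    if amount_out < min_amount_out:
        raise ValueError("slippage tolerance exceeded")
    return amount_out

# 1,000 units in against a 1,000,000 / 1,000,000 pool at a 0.30% fee.
out = quote_swap(1_000_000, 1_000_000, 1_000, fee_bps=30, min_amount_out=0)
print(out)  # 996
```

Getting these details wrong off-chain — fee side, rounding direction, check ordering — produces quotes that disagree with the program by a few units, which is enough to turn a predicted profit into a failed transaction.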
Stage Four: The Data Structures
State files define how on-chain accounts store their data. Pool state accounts, configuration accounts, user position accounts — each one has a defined layout: which fields exist, what types they use, how they're ordered in the serialized byte array.
I need this information for two reasons. First, to parse on-chain data. My bot reads pool state accounts to get current reserves, fee rates, and other parameters needed for off-chain calculation. Parsing these accounts correctly requires knowing exactly which bytes correspond to which fields. A wrong offset means reading garbage data, and garbage data in a swap calculation means garbage output — a predicted profit that doesn't match reality.
Second, to understand the relationships between accounts. The pool state account might contain the public keys of its token vaults, its fee account, its authority PDA. Reading these fields tells me where to find the other accounts I need for the instruction. The state file is the map that connects everything together.
This stage often sends me back to Stage Two. I read the state layout, realize that a field I need — the fee rate, the tick spacing, the protocol fee share — is stored in a different account than I expected, and I have to trace the account relationships to find it. The integration is an iterative process, not a linear one. Each stage informs the others.
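Once the layout is known, parsing is plain offset arithmetic. A sketch with an invented pool layout — an 8-byte discriminator, two u64 reserves, a u16 fee rate, all little-endian — including what a one-byte offset error does to the result:

```python
import struct

# Invented layout for a pool state account:
#   bytes 0..8   : account discriminator (Anchor writes 8 bytes)
#   bytes 8..16  : reserve_a, u64 little-endian
#   bytes 16..24 : reserve_b, u64 little-endian
#   bytes 24..26 : fee_bps,   u16 little-endian
data = struct.pack("<8sQQH", b"poolacct", 5_000_000, 7_500_000, 30)

reserve_a, reserve_b = struct.unpack_from("<QQ", data, offset=8)
fee_bps, = struct.unpack_from("<H", data, offset=24)
print(reserve_a, reserve_b, fee_bps)  # 5000000 7500000 30

# Off by one byte and the same real account yields a garbage number.
# No error is raised -- the swap math just silently goes wrong.
wrong_a, = struct.unpack_from("<Q", data, offset=9)
assert wrong_a != reserve_a
```

This is why the state file matters as much as the context struct: the field order and types in the Rust source are the only authoritative statement of these offsets.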
The On-Chain Transaction as Rosetta Stone
Source code tells me what the program expects. On-chain transactions tell me what actually works.
When I'm stuck — when the source code is ambiguous, when I can't determine the correct account ordering, when a constraint makes no sense to me — I turn to the blockchain itself. I find a successful transaction that calls the instruction I'm trying to replicate. I open it in a block explorer. And I read it like an archaeologist reading a recovered manuscript.
The transaction shows me everything: which accounts were included, in what order, with what flags (signer, writable). It shows me the instruction data — the serialized parameters that were passed. It shows me the program logs — messages the program emitted during execution. It shows me the token balance changes — how much went in, how much came out.
A single successful transaction is worth more than ten pages of documentation. Documentation tells me what should work. A successful transaction tells me what does work. There's no ambiguity. This specific combination of accounts, in this specific order, with this specific instruction data, produced this specific result. It's a proven fact, recorded immutably on-chain.
I develop the habit of collecting reference transactions for every instruction I integrate. Before I write a single line of integration code, I find three to five successful transactions of the type I need. I compare them. What's consistent across all of them? Those are the fixed requirements. What varies? Those are the parameters I need to customize for each specific pool or swap.
It's like learning to cook a dish not from a recipe but from watching someone else's cooking show. I can't taste what they're making, but I can see every ingredient they add, every technique they use, every step they follow. I can pause, rewind, and examine each moment. The demonstration is concrete where a written recipe might be vague. "Season to taste" is ambiguous. Watching someone add exactly one teaspoon of salt is not.
The combination of source code reading and transaction analysis covers almost every gap that missing documentation creates. The source code tells me what the program is designed to do. The transactions tell me what the program actually does in practice. Together, they form a complete picture.
The Open Source Prerequisite
This entire methodology — reading source code, tracing account structures, analyzing business logic — has a prerequisite that's easy to overlook: the source code must be available. The program must be open source.
On Solana, deployed programs are compiled bytecode. Without the source code, I can see that a program exists and I can observe its behavior through transactions, but I cannot read its logic. I can reverse-engineer some things from transaction analysis alone — which accounts it needs, what instruction data format it expects — but I cannot understand why it needs those accounts or what it does with the instruction data internally. The business logic, the fee calculations, the edge case handling — all of that remains opaque.
Open source programs give me everything. Closed source programs with no documentation give me nothing except transaction archaeology. I can mimic what I observe without understanding what I'm mimicking. This works until it doesn't — until the program has a mode or edge case that I haven't observed in existing transactions, and I hit it for the first time with no understanding of why it fails.
The open source nature of most Solana DEX programs isn't just convenient — it's essential. It's the reason integration is possible at all in the absence of documentation. The community norm of publishing source code compensates for the community shortcoming of not writing documentation. The code is public. The code is readable. The code is truth.
This is the genuine, practical value of open source that no amount of philosophical argument captures as well as the experience of needing to integrate with a closed source program that has no docs. Open source means that when the documentation fails — and it will fail — there's a fallback. The fallback is harder to read than good documentation would be. It requires Rust literacy and Solana domain knowledge. But it exists. With closed source and no docs, there is no fallback. There's a wall.
The Hierarchy of Truth
Through dozens of integration attempts, a hierarchy emerges. It's the order in which I seek information, ranked by reliability:
Official documentation, when it exists and is current, is the easiest to consume. But it must be verified against the source code, because documentation can lag behind the deployed program. Documentation is a convenience, not a source of truth.
Source code on GitHub is the ground truth for what the program is designed to do. But it must be verified against the deployed version, because the repository's main branch might contain changes that haven't been deployed yet. The deployed program and the published source code can diverge.
On-chain transactions are the ground truth for what the program actually does in production. They cannot lie — they are recordings of actual executions. But they only show cases that have actually occurred. If no one has ever called a particular instruction with a particular edge case parameter, there's no transaction to observe.
On-chain account data is the ground truth for current state. The actual reserves in a pool. The actual fee rate in a configuration account. The actual tick arrays in a concentrated liquidity pool. This data is always current, always real, and always available to read directly from the chain.
Each layer verifies the one above it. Documentation says the fee rate is 0.3%. Source code shows the fee rate is read from a configuration account. On-chain data shows the configuration account currently contains 0.25%. The documentation is wrong. The source code is right about where the fee lives. The on-chain data is right about what the fee actually is.
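The cross-check reduces to a few lines. Everything here is invented — the documented rate, the config account layout, and the raw bytes standing in for what the chain would return:

```python
import struct

documented_fee_bps = 30  # "0.3%", as claimed by the docs

# Source code says: fee_rate lives in the config account as a u16 in
# basis points at byte offset 8, after the discriminator. Invented layout;
# these bytes stand in for an RPC getAccountInfo response.
config_account_data = struct.pack("<8sH", b"cfgacct\0", 25)

onchain_fee_bps, = struct.unpack_from("<H", config_account_data, offset=8)
print(onchain_fee_bps)  # 25 -- i.e. 0.25%; the docs are stale

# The on-chain value wins. The documented value is only a hint.
fee_bps = onchain_fee_bps
assert fee_bps != documented_fee_bps
```

The source code earns its place in the hierarchy by telling me where to look; the chain earns its place by telling me what is actually there.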
Official docs to GitHub source to on-chain transactions to on-chain data. That is the path from convenience to truth. When I'm in a hurry, I start at the top. When accuracy matters — and in MEV, accuracy always matters — I work my way down to the bottom.
What This Teaches
The absence of documentation is frustrating, but it teaches something that good documentation never would: how to read code as the primary information source instead of a secondary reference.
Most developers learn to code by writing it. Reading code — especially unfamiliar code, in an unfamiliar codebase, written by developers with different conventions and assumptions — is a separate skill. It's the difference between speaking a language and reading literature written in that language. Speaking is producing. Reading is comprehending someone else's production, with all their stylistic choices, their implicit assumptions, their organizational preferences that differ from yours.
Integrating undocumented DEX programs forces this skill. I can't write integration code without first reading the target program's code. I can't read the target program's code without understanding Rust's type system, Solana's account model, and Anchor's framework conventions. Each integration is a reading comprehension exercise before it's a coding exercise.
The methodology — instruction list, account structures, business logic, data structures, cross-referenced against on-chain transactions — works because it imposes structure on what would otherwise be an overwhelming amount of unfamiliar code. I don't try to understand the entire program. I extract the specific information I need, in a specific order, building understanding incrementally. The four stages are a reading protocol, a way to navigate thousands of lines of unfamiliar Rust without getting lost.
Code is truth, not docs. The documentation might tell me what the developers intended. The source code tells me what they built. The on-chain data tells me what's running right now. In a domain where a single wrong account or a single wrong byte offset means a failed transaction and a missed opportunity, intention doesn't matter. What the program actually does is the only thing that matters. And the only way to know what the program actually does is to read it — in the source code, and on the chain.