Meteora Debugging — The War Against Custom Errors

The integration code compiles. The swap simulation runs. The instruction builder produces transactions. Everything looks correct on paper. I submit the first batch of bin-based swap transactions, and every single one fails.

Not "fails silently." Not "produces a wrong answer." Fails loudly, on-chain, with an error message that reads like a rejection notice from the DMV: a number, no explanation, and an implicit suggestion that I should have known what it means.

Custom program error: 0x1772.

That's it. That's the entire diagnostic. A hex number. No text. No hint about which account was wrong, which parameter was invalid, which assumption I violated. Just a number, printed in the transaction logs like a citation on a parking ticket — technically all the information I need, if I know where to look it up.

I don't know where to look it up. Not yet.

The Error Code Wall

On-chain programs on Solana don't throw exceptions with helpful English messages. They can't. Every byte of output costs compute units. Every character in an error message adds to the transaction's resource footprint. So instead of telling you "the third account in your instruction is the wrong type and should be a writable token account owned by the token program," the program emits a number.

Just a number.

It's the blockchain equivalent of your car's check engine light. The light comes on — something is wrong. What's wrong? The light doesn't say. You need an OBD-II scanner to read the diagnostic trouble code. P0300 means random misfire. P0420 means catalytic converter efficiency below threshold. P0171 means the system is too lean. Each code maps to a specific condition, documented in a thick manual that the manufacturer provides to certified mechanics.

On-chain programs work the same way. The program defines a list of error conditions, each assigned a number. When a condition is triggered, the program logs the number and aborts. To decode the number, you need access to the program's error definition list — its equivalent of the OBD-II code manual.

The problem is that not every program publishes its manual in an easily accessible format. Some programs are built with the Anchor framework, which generates an IDL — an Interface Definition Language file — that includes a complete mapping of error codes to human-readable names and descriptions. If I have the IDL, decoding an error is trivial: look up the number, read the description, understand the problem.

If I don't have the IDL, I'm reading source code. Scrolling through Rust files, looking for an enum decorated with #[error_code], counting variants from the top to figure out which variant corresponds to the number I'm seeing. It's like trying to diagnose a car problem by reading the factory service manual in the original German, page by page, because nobody translated the code table into English.

0x1772 in decimal is 6002. Anchor encodes custom errors by adding 6000 to the variant's index, so 6002 means the variant at index 2 — the third entry in the program's error enum. That entry holds the answer to why my transaction is failing.
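
The conversion is easy to script. A minimal sketch in TypeScript (the function name is mine; the 6000 offset is Anchor's documented base for custom errors):

    // Convert a hex custom error code to its Anchor enum index.
    // 0x1772 -> 6002 decimal; custom errors start at 6000, so the
    // variant index is 6002 - 6000 = 2 (the third entry, 0-indexed).
    const ANCHOR_CUSTOM_ERROR_OFFSET = 6000;

    function decodeCustomError(hexCode: string): { code: number; index: number } {
      const code = parseInt(hexCode, 16); // parseInt accepts the 0x prefix
      return { code, index: code - ANCHOR_CUSTOM_ERROR_OFFSET };
    }

    console.log(decodeCustomError("0x1772")); // { code: 6002, index: 2 }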

The Anchor Error System

Anchor, the framework that most Solana programs are built with, uses a structured error system. Programs define their errors in an enum, like this conceptually:

Error 0: SomeFirstCondition
Error 1: AnotherCondition  
Error 2: YetAnotherThing
...

But the numbers you see in transaction logs aren't 0, 1, 2. Anchor offsets custom program errors by a base value — 6000 for program-specific errors. So the first custom error in the program's enum is 6000, the second is 6001, the third is 6002, and so on.

This means error 6002 is the third custom error variant in the program's error enum. To decode it, I need to find that enum and count to the third entry.

For programs with published IDLs, this is straightforward. The IDL is a JSON file that includes an errors array with code-name-message triples. Load the IDL, find the entry where code equals 6002, read the name and message. Done.
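
In code that's a few lines. A sketch, assuming the program's IDL has been downloaded to a local JSON file (the path is hypothetical):

    import { readFileSync } from "fs";

    // Look up a numeric error code in an Anchor IDL's `errors` array.
    function lookupIdlError(idlPath: string, code: number): string {
      const idl = JSON.parse(readFileSync(idlPath, "utf8"));
      const entry = (idl.errors ?? []).find((e: { code: number }) => e.code === code);
      return entry
        ? `${entry.name}: ${entry.msg ?? "(no message)"}`
        : `code ${code} not in IDL -- possible version mismatch`;
    }

    // Hypothetical local copy of the pool program's IDL.
    console.log(lookupIdlError("./dlmm_idl.json", 6002));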

For programs without a conveniently published IDL — or programs where the published IDL is outdated and doesn't match the deployed version — I'm spelunking through source code. Clone the repo. Find the error enum. Start counting. Hope that the version I'm reading matches the version deployed on-chain, because if the developers inserted or reordered error variants between the version I'm reading and the deployed version, my count is off and I'm decoding the wrong error.

This is exactly what happens. I find the error enum, count to the third entry (index 2, which maps to code 6002), and read the variant name. It describes a specific constraint violation related to the swap state. Now I have a name. A name is not a fix, but it's the first handhold on a cliff face.

The Collision Problem

Here's where it gets genuinely hostile. When my bot calls a DEX program via CPI (a Cross-Program Invocation, where one program calls another), and that program calls yet another program internally, the error code that bubbles up could originate from any program in the call chain.

Anchor framework errors — things like "account not initialized," "constraint violation," "account owner mismatch" — live in a lower range (typically below 6000). Program-specific custom errors start at 6000. But when multiple programs are involved through CPI, each with its own 6000+ error range, the same number can mean completely different things depending on which program threw it.

The practical result: when I see "Custom Error: 6002" in a failed transaction's logs, I need to determine which program in the CPI chain produced it. The same number, multiple possible sources, completely different meanings.

It's like getting a rejection letter from the IRS that says "Error 1040." Is that a reference to Form 1040 — your tax return? Or is 1040 an error code in their processing system that means something unrelated to the form number? The number is the same. The context determines the meaning. And the context isn't always clear.

To disambiguate, I have to look at where in the instruction's execution the error occurs. If it happens during Anchor's account deserialization and constraint checking phase — before the program's business logic runs — it's likely a framework error. If it happens during the program's swap execution — after accounts are validated and the actual swap math runs — it's likely a custom program error.

But the transaction logs don't always make this distinction cleanly. Especially when a program uses CPIs, the error might bubble up through multiple layers, and the layer that emitted the error code might not be the layer I expect.

I end up building a manual disambiguation process: check the error code against the program's IDL error list, check it against Anchor's framework error list, look at the instruction index where the failure occurred, look at the CPI depth, and make a judgment call. It works. It's tedious. It's error-prone in exactly the ironic way you'd expect from a process designed to debug errors.
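
The first step of that process is mechanical enough to sketch. The ranges below follow Anchor's published error list (framework errors below 6000, custom errors at 6000 and above); which program in the CPI chain actually emitted the code still has to come from the logs:

    // First-pass triage of a numeric error code from a failed transaction.
    // This narrows the category; it cannot tell you which program in a
    // CPI chain emitted the code -- that still requires reading the logs.
    function classifyErrorCode(code: number): string {
      if (code >= 6000) return "custom program error: check the program's IDL or error enum";
      if (code >= 3000 && code < 4000) return "Anchor account error (e.g. 3012 = AccountNotInitialized)";
      if (code >= 2000 && code < 3000) return "Anchor constraint violation";
      return "Anchor framework or instruction error";
    }

    console.log(classifyErrorCode(6002)); // custom program error: ...
    console.log(classifyErrorCode(3012)); // Anchor account error ...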

"Just One Bin" — The Assumption That Breaks Everything

After decoding the error and understanding the constraint it describes, I start investigating why the constraint is being violated. The error points to a missing account — an account that the program expects to find in the instruction's account list but doesn't find.

The account in question is a bin array.

Here's my original mental model: a swap happens within the active bin. The active bin is in a specific bin array. I include that bin array in the transaction's account list. The swap executes within the active bin, moves some liquidity around, done.

This mental model is wrong.

A swap doesn't necessarily stay within one bin. If the input amount is large enough to exhaust the active bin's liquidity at the current price, the swap continues into the next bin. And the next. And the next, until the entire input amount is consumed or there's no more liquidity to trade against.

I know this. I wrote the simulation code that handles multi-bin traversal. I implemented the bitmap scanning that finds the next active bin when the current one empties. This is not new information.

What I missed is the account implication. When the swap crosses from one bin into another, and those two bins are in different bin arrays, both bin arrays need to be in the transaction's account list. Not just the one containing the active bin — all of them. Every bin array that the swap might touch during execution must be present as an account in the instruction.

And "might touch" is the operative phrase. I have to predict, before submission, how many bins the swap will traverse. If my prediction is wrong — if the on-chain state has changed between my simulation and the transaction's execution, causing the swap to traverse one more bin array than I expected — the transaction fails. Missing account. Custom error. Back to the rejection notice.

It's like planning a road trip and being told you need to list every single highway you'll drive on before you start the car. Miss one? Trip cancelled. And the highways you'll need depend on traffic conditions that change between the moment you plan the route and the moment you actually drive it.

My initial implementation includes exactly one bin array — the one containing the active bin. This works for small swaps that stay within a single bin. It fails for any swap that crosses a bin boundary into a different bin array. Which, given that my arbitrage bot is trying to exploit price differences that often span multiple bins, is most of them.

The fix is to include additional bin arrays. But how many? One extra? Two? The safe approach is to include the bin arrays on both sides of the active bin — the next higher and the next lower — covering the most likely traversal directions. For larger swaps or swaps in pools with small bin steps (where bins are close together and a price movement crosses many of them), I might need even more.

Each additional bin array account costs transaction space. Every account in a Solana transaction's account list takes 32 bytes, and the entire serialized transaction has to fit within the 1,232-byte packet limit. Including too many "just in case" bin arrays bloats the transaction, potentially pushing it over the size limit or the compute budget. Including too few means failing transactions.

The engineering answer is a balance: enough bin arrays to cover the swap's likely traversal range plus a safety margin, and no more than that. Finding the balance, per pool and per swap size, is part of the integration work.
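
A sketch of the derivation, covering the active bin's array plus a configurable number of arrays on each side. Two details here are my assumptions and need verifying against the deployed program: 70 bins per bin array, and the "bin_array" PDA seed.

    import { PublicKey } from "@solana/web3.js";

    // ASSUMPTIONS to verify against the deployed program: 70 bins per
    // bin array, and PDA seeds of ["bin_array", pool, i64 index].
    const BINS_PER_ARRAY = 70;

    function binArrayPdas(
      programId: PublicKey,
      pool: PublicKey,
      activeBinId: number,
      arraysPerSide: number,
    ): PublicKey[] {
      // Math.floor gives correct floor division for negative bin ids too.
      const center = Math.floor(activeBinId / BINS_PER_ARRAY);
      const pdas: PublicKey[] = [];
      for (let i = center - arraysPerSide; i <= center + arraysPerSide; i++) {
        const indexBytes = Buffer.alloc(8);
        indexBytes.writeBigInt64LE(BigInt(i)); // i64, little-endian
        const [pda] = PublicKey.findProgramAddressSync(
          [Buffer.from("bin_array"), pool.toBuffer(), indexBytes],
          programId,
        );
        pdas.push(pda);
      }
      return pdas;
    }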

The Bitmap Extension Surprise

The fix for missing bin arrays resolves most of the failures. Most, not all. A subset of transactions keep failing with the same class of error — missing account — but this time the missing account isn't a bin array. It's something else.

The bitmap extension.

The bitmap is the data structure that tracks which bins have liquidity. The base bitmap covers a certain range of bins centered around the pool's current active bin. For pools where liquidity is spread over a wider range — or where the active bin is near the edge of the base bitmap's coverage — the program needs to look beyond the base bitmap to find the next bin with liquidity.

This is where the bitmap extension PDA comes in. It extends the search range beyond what the base bitmap covers. And here's the thing I missed: the on-chain swap instruction doesn't just use the bitmap as an internal lookup table that the program reads from its own state. The bitmap extension is a separate account, and it needs to be included in the transaction's account list if the swap might need to consult it.

My active bin search logic has a bug. When the swap simulation determines which direction to search for the next active bin (after exhausting the current one), it searches the base bitmap. If the next active bin is within the base bitmap's range, everything works. But if the search needs to extend beyond the base bitmap — if the next bin with liquidity is far enough away that the base bitmap doesn't cover it — the search needs the bitmap extension, and my instruction builder doesn't include it.

The failure mode is insidious. Most of the time, the next active bin is close enough that the base bitmap covers it. The bitmap extension is only needed when there's a gap in liquidity — a stretch of empty bins wide enough that the base bitmap's range is insufficient. This happens unpredictably. Some pools have dense, continuous liquidity and never trigger the issue. Other pools have spotty liquidity with gaps, and those pools fail intermittently, creating the worst kind of bug: the kind that works most of the time.

To fix this, I need to determine whether the bitmap extension is needed for a given swap. The bitmap extension PDA can be derived from the pool address, and I can check on-chain whether it exists. If it does, I include it in the account list. If it doesn't, it's not needed.
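
That check is cheap to automate. A sketch, assuming the extension PDA is derived with a "bitmap" seed over the pool address (another assumption worth verifying against the program source):

    import { Connection, PublicKey } from "@solana/web3.js";

    // Derive the bitmap extension PDA and include it only if the account
    // actually exists on-chain. The "bitmap" seed is an assumption.
    async function maybeBitmapExtension(
      connection: Connection,
      programId: PublicKey,
      pool: PublicKey,
    ): Promise<PublicKey | null> {
      const [pda] = PublicKey.findProgramAddressSync(
        [Buffer.from("bitmap"), pool.toBuffer()],
        programId,
      );
      const info = await connection.getAccountInfo(pda);
      return info !== null ? pda : null; // exists -> include in the account list
    }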

The remaining failures disappear. Every bin array, every bitmap extension, every account the on-chain program might possibly reference during the swap — all of them present in the transaction. The account list for a bin-based swap instruction is longer than I expected. Not by a little. Some swaps that traverse multiple bins across multiple bin arrays with bitmap extension lookups end up with a dozen or more accounts in the instruction.

The Simulator as Debugger

Through all of this, one tool saves me more time than everything else combined: simulateTransaction.

Solana's RPC provides a simulation endpoint. Send a transaction — it doesn't even need valid signatures — and the RPC node executes it against the current state without actually committing anything. No fee charged. No on-chain effect. The simulation returns the execution result: success or failure, compute units consumed, and full program logs.

Full program logs. That's the key.

When a transaction fails on-chain, the logs in the transaction record are often truncated or missing. The explorer shows "Custom Error: 6002" and maybe one or two log lines. Not enough to trace the execution path and understand where exactly the error occurred.

Simulation gives me everything. Every msg!() the program emits during execution. Every CPI call. Every account access. The complete trace of what happened between "instruction starts executing" and "program aborts with error." It's the difference between a crash report that says "application stopped working" and a full stack trace with line numbers and variable values.

I build simulation into my debugging workflow as a first-class tool. Before submitting any transaction, simulate it. If the simulation fails, parse the logs, decode the error, fix the instruction, simulate again. Only submit to the network after simulation succeeds.
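
A sketch of that pre-flight check with web3.js (the function and variable names are mine):

    import { Connection, VersionedTransaction } from "@solana/web3.js";

    // Simulate a transaction and dump the full program logs on failure.
    // No signature, no fee, no on-chain effect.
    async function preflight(
      connection: Connection,
      tx: VersionedTransaction,
    ): Promise<boolean> {
      const result = await connection.simulateTransaction(tx, {
        sigVerify: false,             // unsigned transactions are fine
        replaceRecentBlockhash: true, // avoid stale-blockhash noise while iterating
      });
      if (result.value.err) {
        console.error("simulation failed:", JSON.stringify(result.value.err));
        for (const line of result.value.logs ?? []) console.error(line);
        return false;
      }
      console.log("compute units consumed:", result.value.unitsConsumed);
      return true;
    }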

This sounds obvious. It is obvious. But there's a nuance: simulation runs against the current state at the moment of the RPC call. The on-chain state changes constantly — every slot, new transactions modify account data. A simulation that succeeds at time T might fail at time T+400ms because another transaction modified the pool state in between.

For debugging, this doesn't matter. I'm using simulation to verify that my instruction construction is correct, not to predict whether the transaction will land successfully in a competitive MEV environment. If my simulation fails, the problem is in my code — wrong accounts, wrong data, wrong parameters. If my simulation succeeds but the on-chain transaction fails, the problem is state change between simulation and execution, which is a timing issue, not a construction issue.

Simulation becomes my primary debugging loop. Fail → simulate → read logs → decode error → fix → simulate → succeed → submit. The cycle time drops from minutes (submit, wait for confirmation, check explorer, parse sparse logs) to seconds (simulate, read full logs immediately). The density of debugging information per iteration increases dramatically.

I start thinking of simulation as the on-chain equivalent of a local test environment. In traditional software development, you run your code locally, see the errors, fix them, run again. On-chain development doesn't have a "local" in the same way — the program runs on the network, not on my machine. But simulation gives me a close approximation. I can throw malformed transactions at the simulator all day, read the full error context, and iterate without spending a single lamport.

The Account Checklist

After weeks of debugging bin-based swap failures, a pattern emerges. The vast majority of errors fall into a small number of categories, and they all reduce to the same root cause: the transaction's account list is wrong.

Not "the swap math is wrong." Not "the parameters are invalid." The accounts. Always the accounts.

Here's the mental checklist I develop for building a bin-based swap instruction (a code sketch follows the list):

Pool state account. Obviously. This is the main account that holds the pool's configuration and current state — active bin ID, fee parameters, protocol fees. Every swap instruction starts here.

Token vaults. The pool's reserve accounts for token X and token Y. The program reads the current reserve balances and modifies them during the swap. Both must be present and writable.

Token mint accounts. The mint accounts for both tokens in the pair. The program needs these to verify token types and decimals. Sometimes only needed for specific token program interactions, but safer to include.

User token accounts. My token accounts for the input and output tokens. The direction of the swap determines which is the source (tokens go out) and which is the destination (tokens come in). Getting the source/destination indices wrong based on swap direction is a bug I've already fixed once and refuse to fix again.

Bin arrays. All bin arrays that the swap might traverse. Not just the active bin's bin array — the adjacent ones too, based on swap simulation results plus a safety margin.

Oracle account. Some pools have an oracle account that tracks price history. If the pool uses it, the instruction must reference it.

Bitmap extension. If the swap's bin search might extend beyond the base bitmap's range, the bitmap extension PDA must be included. Direction-dependent — a swap that moves the price up needs the extension in the positive direction, and vice versa.

Token program accounts. The SPL Token program, or Token-2022, depending on which token standard the pool's tokens use. Some pools mix both. Getting this wrong produces unhelpful "owner mismatch" errors from the token program.

Event authority. Some programs emit events through a separate event authority PDA. If the program's instruction expects this account, omitting it causes a failure that has nothing to do with the swap logic.
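
Rendered as code, the checklist becomes a container and an assembly function. Every field name below is a placeholder for a value resolved elsewhere in the bot, and the ordering of the metas is illustrative; the on-chain program dictates the real order.

    import { AccountMeta, PublicKey } from "@solana/web3.js";

    // Placeholder container for everything the checklist names.
    interface SwapAccounts {
      pool: PublicKey;                   // pool state
      vaultX: PublicKey;                 // token X reserve
      vaultY: PublicKey;                 // token Y reserve
      mintX: PublicKey;
      mintY: PublicKey;
      userSource: PublicKey;             // set by swap direction
      userDestination: PublicKey;        // set by swap direction
      oracle: PublicKey;
      tokenProgram: PublicKey;           // SPL Token or Token-2022
      eventAuthority: PublicKey;
      binArrays: PublicKey[];            // traversal range + safety margin
      bitmapExtension: PublicKey | null; // only if it exists on-chain
    }

    function toAccountMetas(a: SwapAccounts): AccountMeta[] {
      const metas: AccountMeta[] = [
        { pubkey: a.pool, isSigner: false, isWritable: true },
        { pubkey: a.vaultX, isSigner: false, isWritable: true },
        { pubkey: a.vaultY, isSigner: false, isWritable: true },
        { pubkey: a.mintX, isSigner: false, isWritable: false },
        { pubkey: a.mintY, isSigner: false, isWritable: false },
        { pubkey: a.userSource, isSigner: false, isWritable: true },
        { pubkey: a.userDestination, isSigner: false, isWritable: true },
        { pubkey: a.oracle, isSigner: false, isWritable: true },
        { pubkey: a.tokenProgram, isSigner: false, isWritable: false },
        { pubkey: a.eventAuthority, isSigner: false, isWritable: false },
        ...a.binArrays.map((pk) => ({ pubkey: pk, isSigner: false, isWritable: true })),
      ];
      if (a.bitmapExtension) {
        metas.push({ pubkey: a.bitmapExtension, isSigner: false, isWritable: false });
      }
      return metas;
    }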

Every missing account produces the same symptom: a numeric error code. Every wrong account produces the same symptom: a different numeric error code, or sometimes the same one. The error doesn't say "you forgot the bitmap extension." It says "6002." Or "6003." Or "3012." And I'm back to the OBD-II scanner, looking up codes in a manual.

What Numbers Don't Tell You

The deepest frustration with numeric error codes isn't the decoding. I can build lookup tables. I can memorize the most common codes. I can write scripts that automatically map hex codes to error names. The decoding is a solved problem after enough repetition.

The frustration is what the error code doesn't tell me: why the condition was triggered in this specific case.

Error 6002 might mean "invalid account." Great. Which account? There are fifteen accounts in this instruction. The error doesn't say which one is invalid. It doesn't say what makes it invalid — wrong owner, wrong type, not initialized, not writable, wrong PDA derivation? Any of these could produce the same error code.

It's like getting a hospital bill that says "Charge: $847.00 — Code 99213." What's 99213? It's a CPT code for an outpatient office visit of moderate complexity. Does that tell me what the doctor did? Technically yes — "moderate complexity visit." Does it tell me which test they ran, which diagnosis they considered, which medication they discussed? No. The code is a category, not a narrative.

On-chain error codes are categories. "Something was wrong with an account." "A mathematical invariant was violated." "An authorization check failed." The category narrows the search space but doesn't pinpoint the problem. I still have to reconstruct the narrative — which specific account, which specific invariant, which specific authorization — through simulation logs, account inspection, and sometimes plain guesswork validated by trial and error.

Over time, I build an intuition. Error X in the context of instruction Y with this particular set of accounts usually means Z. This intuition speeds up debugging enormously. But it's pattern matching, not systematic diagnosis, and pattern matching fails when the situation doesn't match any pattern I've seen before. Then I'm back to first principles: simulate, read every log line, inspect every account, and work through the possibilities one by one.

The War Is Ongoing

I don't "solve" the error code problem. It's not the kind of problem that gets solved. It's the kind of problem that gets managed.

New error codes appear when the program updates. Existing error codes change meaning when the program's error enum is modified. Error codes that I've never seen before surface when a rare code path triggers for the first time. The lookup table grows. The pattern-matching intuition expands. The simulation-based debugging workflow gets more efficient. But the fundamental challenge — a number that means something, decode it — never goes away.

What does improve is my relationship with the number wall. The first time I saw "Custom Error: 0x1772," it was a brick wall with no doors. Now it's a locked door and I have a ring of keys. Not all the keys work on the first try, but I know which ones to reach for first, and I know how to cut a new key when none of the existing ones fit.

The bin-based swap integration is working now. Transactions land. The account lists are comprehensive — sometimes longer than I'd like, but complete. The simulation catches construction errors before they hit the chain. The error code lookup table handles the common cases. The disambiguation logic between framework errors and program errors usually gets it right on the first interpretation.

The swap traverses multiple bins when it needs to. The bitmap extension is included when the search goes wide. The bin arrays cover the traversal range with a safety margin. Each of these was a separate failure, a separate debugging session, a separate numeric code decoded and traced to a root cause.

Nobody tells you, when you start building on-chain systems, how much of your time will be spent decoding numbers. Not writing algorithms. Not optimizing performance. Not designing architecture. Decoding numbers. Looking up error 6002. Figuring out which 6002. Determining which account triggered 6002 this time, versus the different account that triggered 6002 last time.

The transactions land now. The numbers still appear, sometimes, for new edge cases, for new pools with unusual configurations, for state changes that my simulation didn't predict. And when they do, the process restarts: simulation, logs, decode, fix, simulate again.

The war against custom errors doesn't end. The front line just moves.
