692% Overestimation — The Marginal Profit Trap
The math is fixed. Every fee parser is correct. The rounding direction matches what the on-chain programs actually do. Integer ceiling division for fees, floor division for outputs, volatility accumulators properly included. I've spent weeks hunting down every source of mathematical error, and I'm confident — genuinely confident — that my formulas now produce accurate results. Give them the right inputs, and they give me the right outputs.
So I run the bot. I watch the screener evaluate cycles, predict profits, filter for the ones that look good, and forward them for execution. And the results come back wrong. Not slightly wrong. Not off by a rounding error or a fee miscalculation. The predicted profits are off by multiples — several hundred percent higher than what the chain would actually yield. My screener says a cycle will return a healthy profit. The chain says the cycle barely breaks even, or loses money outright.
I'm staring at this gap like a homeowner staring at their Zillow estimate versus the actual offer they got from a buyer. The algorithm said my house was worth $450,000. The highest bid at the open house was $310,000. The methodology behind the estimate isn't wrong, exactly — it uses real comparable sales, real square footage, real neighborhood data. But the number it produces and the number someone will actually pay are living in different realities.
My formulas are the Zillow algorithm. They're mathematically sound. And they're producing numbers that don't match what happens when real money hits the table.
The Moment the Gap Becomes Undeniable
I notice it gradually, then all at once.
The gradual part: my execution success rate is dismal. Bundles come back as invalid or unprofitable. The on-chain program's built-in profit check — a safety rail I built that rejects any cycle that doesn't actually produce the expected return — keeps firing. My screener identifies what it considers a profitable opportunity, I build and submit the transaction, and the chain says no, the actual output doesn't meet the minimum threshold. Over and over.
I initially attribute this to the fee bugs I've been fixing. Maybe there's another parser error I haven't found. Maybe there's a rounding edge case in a DEX I haven't audited yet. I keep looking for math bugs, because math bugs are what I know how to fix. When you have a hammer, everything looks like a nail.
The all-at-once part: I add detailed logging that compares, for every cycle the screener evaluates, the predicted profit against what a simulation of the same cycle returns when I query the chain's latest state just before submission. Not what happens after execution — what the chain says the state is right now, at the moment I'm about to submit.
The results are devastating. Even at the moment of submission, when I'm working with the freshest data I can get, the predicted profit from my screener is already diverging from what a fresh query shows. My screener evaluated the cycle using state from some number of milliseconds ago. By the time I'm ready to submit, the state has already moved. And between submission and actual execution — when other transactions land in the same block or the next block — it moves again.
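To make that comparison concrete, here is a minimal sketch of the kind of logging I'm describing. The struct, field names, and numbers are all illustrative rather than the bot's real interfaces; the point is the ratio between the screener's prediction and a fresh simulation at submission time.

```rust
// Minimal sketch of the predicted-vs-fresh comparison described above.
// All names and numbers are illustrative, not the bot's real interfaces.
struct CycleEvaluation {
    cycle_id: u64,
    predicted_profit_lamports: i64, // screener's estimate from the stale snapshot
    fresh_profit_lamports: i64,     // simulation against the latest state, just before submit
}

fn log_overestimation(eval: &CycleEvaluation) {
    let ratio = if eval.fresh_profit_lamports > 0 {
        eval.predicted_profit_lamports as f64 / eval.fresh_profit_lamports as f64
    } else {
        f64::INFINITY // the "profit" evaporated entirely before submission
    };
    println!(
        "cycle {}: predicted {} lamports, fresh {} lamports, overestimation x{:.2}",
        eval.cycle_id, eval.predicted_profit_lamports, eval.fresh_profit_lamports, ratio
    );
}

fn main() {
    // A typical entry: the screener saw several times more profit than the fresh simulation.
    let eval = CycleEvaluation {
        cycle_id: 42,
        predicted_profit_lamports: 1_380_000,
        fresh_profit_lamports: 200_000,
    };
    log_overestimation(&eval);
}
```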
The math isn't the problem anymore. The math is a telescope pointed at a star that has already moved by the time the light reaches me.
Cause One: Stale State — Yesterday's Stock Price
Here's the fundamental mechanical issue. When my screener reads the state of a liquidity pool — the reserves, the current tick, the fee parameters — it's reading a snapshot. That snapshot reflects the state as of the most recently committed block. On Solana, blocks are produced roughly every 400 milliseconds. By the time my screener reads the state, processes the data, evaluates the cycle, builds a transaction, and submits it, multiple blocks may have passed. Each of those blocks may contain transactions that changed the pool state.
It's like making a stock trade based on yesterday's closing price. You check the ticker at market close: ACME Corp is at $50. You decide to buy. You place the order the next morning. But overnight, earnings were released, analysts revised their targets, and the pre-market price is already $53. Your decision was based on sound analysis of the $50 price point. But you're not buying at $50. You're buying at whatever price exists at the moment your order hits the exchange floor.
In traditional stock markets, this is the difference between a quote and a fill. Everyone understands that the quoted price is indicative, not guaranteed. Market orders fill at the best available price at execution time, not at the price you saw when you decided to trade. There are entire regulatory frameworks built around this distinction.
In on-chain arbitrage, the same principle applies with even more severity. The pool state my screener reads is the quote. The pool state at execution time is the fill. And the gap between them is not a minor slippage — it can be enormous, because the pools I'm trading in aren't just subject to random market fluctuations. They're subject to deliberate, high-speed, competitive trading by other participants who are actively reshaping the state in the time between my read and my execution.
Consider what happens conceptually. My screener reads a pool at block height N and calculates the expected output of a swap based on those reserves. The cycle looks profitable.
But between block N and block N+2 — roughly 800 milliseconds later — other swaps land in the same pool. The reserves shift. The price moves. My swap, calculated against the old reserves, would have yielded one amount. Against the new reserves, it yields something different — maybe more, maybe less, but certainly not what I predicted. The larger the intervening trades relative to the pool's liquidity, the wider the gap between my prediction and reality.
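Here is that conceptual example with numbers attached. I'm assuming a plain constant-product pool with a 0.3% fee purely for illustration; the real venues, curves, and sizes differ, but the mechanism is the same: a competitor's swap lands first, and my trade meets different reserves than the ones I priced against.

```rust
// Illustrative only: a constant-product (x*y = k) pool with a 0.3% fee,
// a simplification of the real venues. All reserves and sizes are made up.
fn swap_out(reserve_in: u128, reserve_out: u128, amount_in: u128, fee_bps: u128) -> u128 {
    let amount_in_after_fee = amount_in * (10_000 - fee_bps) / 10_000;
    reserve_out * amount_in_after_fee / (reserve_in + amount_in_after_fee)
}

fn main() {
    // Reserves the screener observed at block N (9-decimal units).
    let (mut r_in, mut r_out) = (1_000_000_000_000u128, 2_000_000_000_000u128);
    let my_trade = 10_000_000_000u128; // 10 units

    // What the screener predicts, using the block-N snapshot.
    let predicted = swap_out(r_in, r_out, my_trade, 30);

    // Between block N and N+2, another participant swaps 50 units into the same pool.
    let competitor_trade = 50_000_000_000u128;
    let competitor_out = swap_out(r_in, r_out, competitor_trade, 30);
    r_in += competitor_trade;
    r_out -= competitor_out;

    // What my swap actually yields against the post-competitor reserves.
    let actual = swap_out(r_in, r_out, my_trade, 30);

    println!("predicted out: {predicted}");
    println!("actual out:    {actual}");
    println!("shortfall:     {}", predicted - actual);
}
```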
This isn't a bug in my code. It's a fundamental property of the system. Off-chain state queries can only fetch the state of the last-committed block. They inherently lag behind what's actually happening on the network. Research into MEV systems consistently identifies this as a structural limitation: by the time you've observed the opportunity, the opportunity has already begun to change.
The analogy I keep coming back to is a weather forecast. A weather forecast isn't wrong because the meteorologist used bad equations. The atmospheric models are sophisticated, well-calibrated, and based on decades of observation. The forecast is imprecise because the atmosphere is a chaotic system that evolves between the time the model runs and the time you experience the weather. A forecast that says "72°F and sunny at 2 PM" isn't a guarantee — it's a probability statement based on the best available data at modeling time. By 2 PM, conditions have evolved. Maybe it's 74°F. Maybe a cloud rolled in. The model was right given its inputs; the inputs just aren't the future.
My screener is a weather forecast. It tells me what profit to expect given current conditions. But conditions are already changing by the time I hear the forecast.
Cause Two: Competition — Everyone's Hunting the Same Deer
Stale state would be manageable if I were the only participant. If no one else were trading in these pools, the state would change slowly — just organic market activity, regular users swapping tokens for their own purposes. The drift between my read and my execution would be small and somewhat predictable.
But I'm not the only one. Not even close.
The Solana MEV ecosystem is a crowded room. Dozens — probably hundreds — of bots are scanning the same pools, evaluating the same cycles, and identifying the same opportunities. When a price discrepancy opens up between two pools, I see it. And so does everyone else. We're all looking at the same data, running similar math, and arriving at similar conclusions about which cycles are profitable.
This is the hunting analogy. Imagine a hundred hunters entering the same forest, all tracking the same deer. Each hunter sees the deer's tracks, estimates its position, and plans an approach. But there's only one deer. The first hunter to get within range takes the shot. After that, the deer is gone — or at least, it's no longer where the other ninety-nine hunters expected it to be. Their tracking data was accurate when they collected it. Their trajectory calculations were correct. But the situation changed because someone else acted first.
In MEV terms, when another bot executes a swap on the same pool I'm targeting, it moves the reserves. That movement directly affects the profitability of my planned trade. If someone else arbitrages the price discrepancy before I do, the discrepancy shrinks or vanishes. My screener predicted profit based on the pre-arbitrage state. By the time my transaction executes, the post-arbitrage state yields far less — or nothing at all.
This creates a particularly cruel dynamic. The most profitable opportunities — the ones with the largest predicted margins — attract the most competition. A cycle that my screener evaluates at a high predicted return is, by definition, one that every other bot also evaluates as highly profitable. It's the brightest signal in the forest. Every hunter heads straight for it. The chances that I'm the first to execute are inversely proportional to how attractive the opportunity looks.
It's like showing up to a garage sale that was advertised as having a vintage Les Paul guitar for $50. You saw the Craigslist ad, you calculated the resale value, you budgeted for gas and decided the trip was worth it. You arrive at 6 AM, confident. There are already thirty people in line. Some of them drove from two states over. The guitar sells to the first person through the door. Your analysis was correct — the guitar was absolutely underpriced. But correctness doesn't matter when you're thirty-first in line.
On Solana, the "line" is determined by transaction ordering within a block. Transactions are processed by the validator's scheduler, and while Solana doesn't have a traditional mempool like Ethereum-like chains do, there's still a race to be included. Validators receive transactions from many sources simultaneously, and the ordering within a block determines who gets the pre-move state and who gets the post-move state. Being a few milliseconds late can mean the difference between profit and loss — not because your math was wrong, but because someone else's identical math executed first.
The competition factor compounds the stale state problem. Stale state means my data is from the past. Competition means the present is being actively shaped by other participants who are also acting on data from the past. The result is a feedback loop: everyone acts on stale data, everyone's actions change the state, and the state that actually exists at execution time is the result of all those competing actions — none of which any individual participant could have predicted.
Cause Three: Cumulative Slippage — The Three-Hop Tax
There's a third factor that amplifies the overestimation, and it's intrinsic to cyclic arbitrage: multi-hop slippage accumulation.
A typical arbitrage cycle in my system goes A→B→C→A. Three pools. Three swaps. Each swap has its own slippage characteristics — the relationship between the trade size and the price impact, determined by the pool's liquidity depth and the shape of its bonding curve.
When I evaluate this cycle with my screener, I calculate each swap's output sequentially. I take the output of swap 1 as the input to swap 2, and the output of swap 2 as the input to swap 3. This is correct. But each calculation uses the current state of its respective pool. And the slippage error — the difference between predicted output and actual output — compounds at each hop.
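In code, the sequential evaluation looks roughly like this, again using the simplified constant-product pool from earlier as a stand-in for the real curves. The pools, fees, and sizes are invented; what matters is that each hop's output feeds the next hop, and each hop is priced against its own snapshot.

```rust
// Sketch of the sequential A->B->C->A evaluation described above, using a
// simplified constant-product pool. Pools, fees, and sizes are illustrative.
#[derive(Clone, Copy)]
struct Pool {
    reserve_in: u128,
    reserve_out: u128,
    fee_bps: u128,
}

fn swap_out(p: Pool, amount_in: u128) -> u128 {
    let after_fee = amount_in * (10_000 - p.fee_bps) / 10_000;
    p.reserve_out * after_fee / (p.reserve_in + after_fee)
}

fn evaluate_cycle(pools: &[Pool], amount_in: u128) -> i128 {
    // Each hop's output becomes the next hop's input; each hop is priced
    // against the snapshot of its own pool, exactly as the screener does.
    let mut amount = amount_in;
    for &pool in pools {
        amount = swap_out(pool, amount);
    }
    amount as i128 - amount_in as i128 // predicted profit in the starting token
}

fn main() {
    let cycle = [
        Pool { reserve_in: 1_000_000_000_000, reserve_out: 2_000_000_000_000, fee_bps: 30 }, // A -> B
        Pool { reserve_in: 4_000_000_000_000, reserve_out: 1_500_000_000_000, fee_bps: 25 }, // B -> C
        Pool { reserve_in: 1_200_000_000_000, reserve_out: 1_700_000_000_000, fee_bps: 30 }, // C -> A
    ];
    let profit = evaluate_cycle(&cycle, 10_000_000_000);
    println!("predicted cycle profit: {profit}");
}
```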
Think of it like a GPS estimated arrival time on a road trip with three legs. Leg 1: the GPS says 2 hours. You hit construction. It takes 2 hours and 20 minutes. Leg 2: the GPS estimated 1.5 hours based on current traffic. But you started leg 2 twenty minutes late, so traffic patterns have shifted. It takes 1 hour and 45 minutes. Leg 3: the GPS says 45 minutes. You're now over an hour behind the original estimate, rush hour has started, and it takes 1 hour and 10 minutes.
Each leg's estimate was reasonable in isolation. But the errors don't cancel out — they accumulate. Your GPS said the total trip would take 4 hours and 15 minutes. It actually took 5 hours and 15 minutes. Each individual estimate was off by a modest amount. The cumulative effect was an hour of unexpected driving.
In a three-hop arbitrage cycle, slippage works the same way. Each hop introduces its own prediction error. The error from hop 1 becomes a larger or smaller input to hop 2 than expected. Hop 2's own prediction error then applies to this already-deviated input. By hop 3, the cumulative deviation can be substantial — even if each individual hop's prediction was only slightly off.
And this is where it gets mathematically interesting: slippage errors in a cycle tend to compound in the same direction. When the stale state problem causes overestimation, it overestimates at each hop. So hop 1 predicts slightly more output than you'll get. That slightly inflated output feeds into hop 2, which also overestimates. The slightly inflated output of hop 2 feeds into hop 3. By the end of the cycle, you've run your profit prediction through three consecutive overestimation stages.
It's not additive. It's multiplicative. If each hop overestimates by, say, a few percent, the three-hop cycle doesn't overestimate by three times that. It overestimates by the product. Several percent at each hop, compounded three times, produces a total overestimation that's significantly larger than any individual hop's error would suggest. At scale, across hundreds of cycle evaluations, this systematic bias means the screener is consistently, structurally optimistic about every multi-hop opportunity it evaluates.
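A quick worked example of that compounding, using a hypothetical per-hop figure:

```rust
// Worked example of the multiplicative claim above; the 5% per-hop figure is hypothetical.
fn main() {
    let per_hop = 1.05_f64; // each hop predicts about 5% more output than it delivers
    let total = per_hop.powi(3);
    println!("one hop: +5.0%, three hops: +{:.1}%", (total - 1.0) * 100.0);
    // Prints roughly +15.8%, versus +5% for any single hop in isolation.
}
```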
I run a controlled test: I compare the overestimation ratio for two-hop cycles versus three-hop cycles. The three-hop cycles have measurably worse prediction accuracy. Same screener, same pools, same math — just one more hop. That additional link in the chain doesn't just add slippage. It multiplies the uncertainty.
This is why portfolio theory in traditional finance treats multi-leg trades with extra caution. A pairs trade — simultaneously buying one stock and shorting another — has execution risk on both legs. If one leg fills and the other doesn't, or if both legs fill at slightly worse prices than expected, the combined slippage eats into the theoretical edge. Professional trading desks model this explicitly. They don't just calculate the theoretical profit; they model the distribution of possible execution outcomes and size their positions based on the worst reasonable case, not the expected case.
My screener has been sizing its confidence based on the expected case. And the expected case is systematically too optimistic.
The Fundamental Limit: Right Formula, Wrong Inputs
I step back and look at what all three causes have in common. It's not bad math. It's not buggy code. It's not incorrect formulas or misunderstood fee structures. I've fixed all of those already. What I'm dealing with now is something more fundamental.
My screener computes a "predicted state" — its best estimate of what the pool state will be at execution time, based on the state it can observe right now. The formulas it uses to transform inputs into outputs against that predicted state are correct. The predicted state itself is the problem.
The concept of predicted state is inherently an approximation. It takes the current on-chain state — the last committed block's snapshot — and treats it as if it will persist until execution time. But it won't. Other transactions will modify it. Market participants will trade against these pools. Validators will order transactions in ways I can't predict. By the time my transaction actually touches the pool contracts, the state has moved to somewhere I didn't — couldn't — anticipate.
This is the mathematical equivalent of using the right formula with the wrong inputs. The output is guaranteed to be wrong, and the wrongness has nothing to do with the formula. If I plug the wrong numbers into a perfectly correct equation, I get a perfectly wrong answer. The equation isn't at fault. The measurement is.
I've been thinking about this in terms of a Kelley Blue Book valuation. You can look up your car — year, make, model, mileage, condition — and get a precise dollar figure. The methodology is sound. It reflects real transaction data from real car sales across the country. But when you walk into a dealer, the offer you get depends on factors the algorithm couldn't capture: today's lot inventory, the dealer's monthly quota, whether it's a Tuesday in February or a Saturday in June, the guy ahead of you who just traded in the same model. The algorithm gave you a number. The market gives you a different number. Both are "correct" in their respective frames of reference. Only one of them is the number you actually get.
The predicted state is my Kelley Blue Book. It's a useful reference, but it's not a contract. It's not a guarantee. It's an estimate based on observable data, and the gap between the estimate and reality is not a fixable bug — it's a structural feature of operating in a system where the state changes faster than I can observe and act on it.
This realization forces a fundamental shift in how I think about the screener's role. The screener is not an oracle. It doesn't tell me what will happen. It tells me what would happen in a world where the state doesn't change between observation and execution. That world doesn't exist. The screener's output is a reference point, not a prediction.
Coping: Safety Margins and the Pursuit of Freshness
So what do I do with this? I can't solve the problem — it's inherent to the system. But I can manage it.
The first adjustment is conceptually simple: safety margins. If I know that my predicted profit is systematically higher than what execution will actually yield, I need to set my execution threshold higher than the true minimum profit I'd accept. The prediction has to clear my true minimum by enough to absorb the overestimation.
This is how civil engineers design bridges. The bridge needs to hold ten thousand cars a day? They design it to hold fifty thousand. Not because they expect fifty thousand, but because the gap between their model and reality — wind loads, material fatigue, temperature expansion, unexpected overweight vehicles — needs to be covered by a safety factor. The engineering term is "factor of safety." Mine is an overestimation factor.
The size of the safety margin is a balancing act. Too small, and I'm still executing trades that turn out to be unprofitable — the predicted profit clears the threshold but the actual profit doesn't. Too large, and I'm passing on opportunities that would have been profitable — the predicted profit doesn't clear my inflated threshold even though the actual execution would have been fine. It's a precision-recall tradeoff. Higher thresholds mean fewer false positives (bad trades executed) but more false negatives (good trades missed).
Finding the right balance requires data. I need to observe the distribution of overestimation ratios across many cycle evaluations. How often is the prediction off by a little versus a lot? What's the median overestimation? What's the 90th percentile? The answers to these questions tell me where to set the threshold — and those answers change over time as market conditions, competition levels, and network congestion fluctuate.
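A sketch of what that looks like in practice, with made-up ratios standing in for the real comparison logs: take the observed predicted-to-actual ratios, look at the median and the 90th percentile, and scale the minimum acceptable profit by whichever percentile I'm willing to trust.

```rust
// Sketch of turning observed overestimation ratios into a threshold multiplier.
// The sample ratios are invented; in practice they come from the comparison logs.
fn percentile(sorted: &[f64], p: f64) -> f64 {
    // Nearest-rank percentile on an already-sorted slice.
    let idx = ((p / 100.0) * (sorted.len() as f64 - 1.0)).round() as usize;
    sorted[idx]
}

fn main() {
    // predicted_profit / actual_profit for a batch of recent cycles (hypothetical).
    let mut ratios = vec![1.3, 1.8, 2.1, 2.4, 2.9, 3.5, 4.2, 5.0, 6.9, 9.4];
    ratios.sort_by(|a, b| a.partial_cmp(b).unwrap());

    let median = percentile(&ratios, 50.0);
    let p90 = percentile(&ratios, 90.0);

    // Requiring actual >= min_profit while actual ~= predicted / ratio means the
    // screener's prediction has to clear min_profit scaled by that ratio.
    let min_profit_lamports = 100_000.0;
    let required_prediction = min_profit_lamports * p90;

    println!("median overestimation: x{median:.1}, p90: x{p90:.1}");
    println!("required predicted profit: {required_prediction:.0} lamports");
}
```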
The second adjustment is pursuing data freshness. If stale state is a primary driver of overestimation, then fresher state means smaller overestimation. This means reducing the time between reading pool state and submitting a transaction. Faster data sources. More efficient processing pipelines. Minimizing every millisecond of latency between observation and action.
But this is an arms race with diminishing returns. I can make my system faster, but so can everyone else. And no matter how fast I get, I can't eliminate the gap entirely. There will always be some time between observation and execution. There will always be other transactions that land between my read and my write. The speed of light imposes a floor. The validator's transaction processing pipeline imposes another floor. Even if I could read state and submit a transaction in zero time — which I can't — there would still be other transactions in the same block that affect the state before my transaction is processed.
It's like trying to improve your reaction time in a drag race. You can train. You can optimize your launch technique. You can shave milliseconds off your response. But you can't react before the light turns green. There's a physical limit, and beyond that limit, additional optimization yields nothing. My data freshness pursuit has a similar asymptote: I can get closer to real-time, but I can't achieve real-time, and at some point the marginal improvement from being 1 millisecond faster isn't worth the engineering cost.
The third adjustment is philosophical, and it might be the most important one. I stop treating the screener as a predictor and start treating it as a filter. The screener's job isn't to tell me how much money I'll make on a given cycle. Its job is to tell me which cycles are worth attempting. It's a ranking system, not a pricing system.
This is a subtle but crucial distinction. A weather forecast that says "72°F" is trying to predict a specific number. A weather forecast that says "warm, bring a light jacket" is providing decision-relevant guidance without claiming precision. My screener doesn't need to predict that a cycle will yield exactly 0.23% profit. It needs to tell me that this cycle, given current conditions, is more likely to be profitable than that other cycle. It's a relative ranking, not an absolute measurement.
When I reframe it this way, the overestimation problem becomes less catastrophic. Yes, the predicted profit numbers are inflated. But if the overestimation ratio is roughly consistent across cycles — if every cycle's prediction is inflated by a similar factor — then the ranking is still meaningful. The cycle my screener ranks first is probably still better than the cycle it ranks tenth, even if the absolute profit numbers for both are wrong. The ordering is more reliable than the magnitude.
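A toy illustration of why the ordering survives a shared bias, with invented profits and an invented bias factor:

```rust
// Toy illustration of the ranking argument: magnitudes wrong, order preserved,
// assuming (optimistically) that the overestimation factor is shared across cycles.
fn main() {
    // (cycle_id, actual_profit) pairs; both numbers are invented.
    let actuals: [(u32, i64); 3] = [(1, 40_000), (2, 250_000), (3, 90_000)];
    let shared_bias = 4.3; // hypothetical common overestimation factor

    let mut predicted: Vec<(u32, f64)> = actuals
        .iter()
        .map(|&(id, actual)| (id, actual as f64 * shared_bias))
        .collect();
    predicted.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());

    // Prints cycles in the order 2, 3, 1: the same order as the true profits,
    // even though every printed number is inflated.
    for (id, pred) in &predicted {
        println!("cycle {id}: predicted profit {pred:.0}");
    }
}
```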
This isn't a perfect solution. The overestimation ratio isn't perfectly consistent — it varies with pool liquidity, trade size, and competition intensity. A large trade in a shallow pool overestimates more than a small trade in a deep pool. So the ranking can be distorted too. But it's a better framework than treating the predictions as gospel.
The Probability Game
I sit with this for a while. The overestimation problem isn't going away. It can't go away. It's not a bug to fix — it's a property of the environment. Predicting the future state of a decentralized, permissionless, adversarial financial system from a past observation is inherently approximate. Perfect prediction would require knowing every transaction that every other participant is going to submit between now and when my transaction lands. That's not a hard engineering problem. It's an impossible one.
What I have instead is a probability game. Correct math improves my odds. Fresher data improves my odds. Appropriate safety margins improve my odds. Understanding the competition improves my odds. None of these things guarantee a win on any individual trade. They shift the distribution — they make it more likely that, across many trades, the wins outweigh the losses.
This is how insurance companies operate. An actuary can't tell you whether your specific house will flood this year. But they can tell you, given the historical data, the elevation, the proximity to water, and the quality of local drainage, what the probability is. They price the policy based on that probability. They're wrong about individual houses all the time. They're right about the aggregate — and that's how they make money.
My bot is the actuary. Each individual cycle evaluation is a policy being underwritten. Some will pay out. Some won't. The question isn't whether any given prediction is accurate. The question is whether the system, across thousands of predictions, is calibrated well enough that the profitable trades pay for the unprofitable ones with something left over.
I don't know yet if it is. That's the honest answer. I've fixed the math. I've identified the overestimation and its causes. I've started implementing safety margins and pursuing fresher data. But whether the resulting system is profitable in aggregate — whether the probability game tips in my favor — is an empirical question that can only be answered by running it and measuring the results.
What I do know is what I'm not doing anymore. I'm not trusting the screener's numbers at face value. I'm not treating predicted profits as expected profits. I'm not confusing the map for the territory.
The formula is correct. The numbers I plug in are wrong. And the answer, therefore, is wrong — by a predictable, measurable, manageable amount. The question that remains open is whether "manageable" translates to "profitable," or whether the gap between prediction and reality is simply the cost of playing a game where the future refuses to hold still long enough to be calculated.