AI Now Finds 70% of Smart Contract Exploits | Alpin Yukseloglu
David:
[0:02] We are here with Alpin Yukseloglu. He is an investment and research partner at Paradigm, and also the co-author of a paper titled EVM Bench, an open benchmark for smart contract security agents, written in collaboration with OpenAI to measure the ability of AI agents to detect, patch, or exploit
David:
[0:21] smart contract vulnerabilities. We're going to talk about the way that AI and AI capabilities are going to impact our crypto ecosystem and our smart contracts. Alpin, welcome to Bankless.
Alpin:
[0:31] Hi, thanks for having me.
David:
[0:32] I want to start off this podcast with a very big question. How at risk are we from AI? How large of a threat do AI smart contract capabilities pose to our industry?
Alpin:
[0:44] Yeah, I mean, in the long term, it's now increasingly clear that AI is going to be extremely good for crypto, especially on the security front, because we're going to get to a world where, since everything is much more secure, the ceiling on the industry is much higher. Our partner Matt talks about how if you have a grocery store run by a mom and pop, because they can't see everything in the store, there's a limit to how big they can get. But the moment you add security cameras, they can scale. Security has this effect of increasing the carrying capacity of an industry. In the short term, I think it's up to us, because the models are getting extremely
Alpin:
[1:23] good, like strikingly good. When we started working on EVM Bench, which is a benchmark that consists entirely of fund-draining critical bugs, around six months ago, the models were able to find less than 20% of the bugs, around 12% to 13%. And just over the course of working on the benchmark, this number went up to over 50%. And in between when I drafted the launch tweet and when I actually had to hit send, with the release of 5.3 Codex, it jumped to over 70%. So these things are growing at a blistering pace, and it's very important that we position the industry so we can defensively protect against attacks. But in the long term, I think it massively increases the carrying capacity of crypto.
David:
[2:12] Yeah, I think what you're saying is in the long term, we get something approaching perfect security. Yeah. Right now, we do not have perfect security. Let me ask you the same question, but a little bit differently. Say only bad actors, only black hat actors have access to AI capabilities. In that context, how at risk is our industry? Like how exploitable are our smart contracts given the increase in AI capabilities?
Alpin:
[2:38] Yeah, I mean, I think it's really hard to say what happens when we approach superintelligence levels. Right now, the models are quite good, but they're not better than the best human auditors. And crypto has already existed under this threat model of extremely intelligent adversarial actors constantly trying to break all of our software with all the money in it. So in that sense, crypto is already quite hardened. But it's just really hard to know what happens when a technology inflects into superintelligence. This is very similar to coding: capabilities were increasing mostly linearly over the last several years, and in December last year they crossed some threshold where they were better than the median engineer. A lot of stuff clicked for everyone, and it became this aha moment
Alpin:
[3:29] and this sort of oh crap moment. And I think something very similar will probably happen with security, where right now it's increasing rapidly, still at a linear clip, but it's not as good as the best human auditors yet. So we don't feel it yet. It hasn't actually broken any of our assumptions. But once we hit a superhuman AI auditor, in maybe six to eight months, I'm pretty confident at this point by the end of the year, it will completely break all of our assumptions, and we'll need to go back and make sure we're hardening all of the contracts that are housing the nearly $100 billion of assets in crypto.
Ryan:
[4:01] Alpin, if we zoom out here, though, and we think about AI intelligence and its security capabilities and its
Ryan:
[4:09] bug detection capabilities kind of going exponential, and we think about a superintelligent AI.
Ryan:
[4:17] I don't even know how to think about security in general, because it can envision scenarios beyond human comprehension. For instance, what if it thinks up a way to crack some of our cryptography with some math that we didn't even know existed? I heard Justin Drake on a podcast recently talk about this. It's not just the threat of quantum computers, which is a known threat, and some of our encryption algorithms are under threat due to quantum computers. But if we have a superintelligent AI, who knows what it could have the ability to actually hack and decrypt. I guess my question is, when it comes to superintelligent AI, is security just not even a thing we can prepare for? How do we even think about it?
Alpin:
[5:17] So the way I would think about it is that right now this frontier is very illegible, and if you try to do this in-the-limit thinking, you end up in very odd places that can be psychosis inducing. I think the capacity to face the singularity and stay sane is a very important skill to develop. And I think this is,
Alpin:
[5:47] you know, the best we can do right now: we can get ourselves onto the frontier, into this sort of experimentally bound future where we're running the experiments ourselves, and be ready to react when those inflections happen. Because I think the model of, there are only bad people in the world and
Alpin:
[6:07] they're going to have access to this technology and they're going to break all of our systems, I think this is what leads to the psychosis around, are we all just completely screwed? But that's not what's going to happen, right? We're all going to be in there together. When we're on this frontier and have access to these frontier models, and the agent harnesses around them that are able to run these exploits, that are able to, for example, find undiscovered math or break existing cryptography, there will be both sides of it. And right now, it's not clear whether this is going to be an offense-favoring or defense-favoring technology. I will say that there are still fundamental constraints in the world. You can't break the laws of physics. There are systems that are chaotic, like the three-body problem, where even if you have superintelligence, you can't predict too many more ticks ahead because it's just a fundamentally chaotic system. So the world is a complicated place, and there are still physical laws and constraints that will catch these things.
Alpin:
[7:14] And practically, the best we can do is we have to push that frontier together instead of letting it just sort of happen to us.
Ryan:
[7:22] I think that's a fair point. There are physical constraints here, and a superintelligent AI might appear as a god to humans, but that does not give it godlike capabilities to break the physical laws of the universe, certainly. But actually, I want to dig into this because
Ryan:
[7:39] I got the sense that maybe you have some insight here, because I've been doing a lot of staring into the abyss of the singularity and trying to stay sane. And I don't quite have a knack for it.
Ryan:
[7:51] Sometimes I feel like I'm staring and I feel myself go a little insane. I just don't feel settled about it. Is there some wisdom you can share? Just a pattern or something? What I was starting to get out of what you just said is, well, maybe the key is to take it a step at a time and not think about the far future and the limit. Just think about the next day, the next month, the next year.
Ryan:
[8:20] How do you stay sane when you stare at the singularity?
Alpin:
[8:23] I mean, I think the core point is agency, right? Peter Thiel has this framing where most people relate to acceptance and denial as opposites, but in many ways they're the same thing, because both of them imply that everything is out of your control. If you're fully accepting that something's going to happen, you're not doing anything about it. And if you're fully denying that something's going to happen, you're also not doing anything about it. In that sense, I think both the doomers and the accelerationists are wrong. The real answer is that we have agency over these outcomes and that you yourself can bend the arc of the future. And I think there's a lot of comfort and a lot of stability in that, because if you believe you have agency over the outcome, then somehow you're still in control. Now, I will say that the current frontier, and I guess you can argue the frontier has always been this way, is experimentally bound, which means we can't sit down and theorize about what is going to happen.
Alpin:
[9:22] Generally, the way we're going to figure out how things are going to happen is by experimenting and then seeing the results. These AI models, and the current frontier of technology in general, are grown, not manufactured. They're much more organically evolved, in a way that we can't really predict today. So even just armchair theorizing about what's going to happen will drive you insane, because you can't know. It's not bound by your ability to algorithmically figure out the future. So you have to get in the trenches and try things.
Ryan:
[9:59] Yeah. There's some stoic acceptance of the fact that you can't know everything, and so you just have to let that go. And then there's also a belief in agency, I guess an applied belief. I mean, how much of agency is sort of a blind faith versus a practical faith of doing? For someone listening who feels themselves looking at the singularity and feels frozen by not knowing what to do, and doesn't feel like they have a lot of control and agency, is that something they can develop? Or is your advice, oh, you just have to have faith, you just have to believe that you have more agency and that will become self-fulfilling and you will have more agency?
Alpin:
[10:47] I think, so faith is good, but it's not a particularly agency-inducing headspace to be in. When we, for example, started having the thought that agents could get extremely good at exploiting smart contracts, which we're obviously heavily exposed to, about eight months ago, we could have just sat down and been like, holy crap, are we screwed? Are we not screwed? But it turned out there's another path, which is, well, you can go figure out to what extent these things are at risk, and then also start making headway into the labs that are actually pushing this frontier, and maybe start getting crypto integrated into them, so that as the fog of war clears and as we start to see, for example, that there are defensive measures we can take, we're in a position
Alpin:
[11:41] to actually exercise those paths. And I think there's something about the doomerism and also the
Alpin:
[11:51] the general in-the-limit thinking about the singularity and about superintelligence that just captures people, because you can sit down and think about it for a long time and it'll make you feel very strong emotions. But at the end of the day, that's not the thing that matters. You can go build things. You can go work with the people who are pushing the frontier. You can get in the trenches and start contributing. And the information you gain from that is going to be much more grounded than whatever one can come up with in their head. I think many of us in crypto are used to this, because a lot of crypto was like this for most of its history. It was extremely illegible. It was very hard to pin down exactly what the use case was going to be. And now we're starting to see, okay, we have this store-of-value use case. We have stablecoins that are compounding at this monstrous rate year over year. The industry kind of gave birth to the whole market of prediction markets, which is sort of adjacent to crypto now, and it's compounding at this insane rate. Five years ago, it would have been very hard to say this is exactly what's going to happen. It took some combination of faith and agency to actually go and build those things.
Alpin:
[13:03] And culturally, this has been a very big anchor point for Paradigm, because our entire firm is built around building and researching alongside the investing. If you talk to anyone on the Paradigm team or in our orbit, the sense of groundedness you get is anchored in the fact that we have consistent contact with the frontier. There's comfort in that, there's stability in that, and there's agency that
David:
[13:29] comes from that. I think it's worth understanding and reflecting on the fact that the only reason staring into the void of the singularity is intimidating is that all of these technologies are providing everyone else with agency to produce the singularity in the first place. So if that singularity is intimidating to you, grab a mop, do something. There's work to be done. I think the best path forward is through. If you are intimidated by everyone else having agency because of these tools, you can have agency yourself.
Alpin:
[14:05] Well, it's a polarizing technology, right? The threshold of agency you need to be able to execute on something is going down. So if you're kind of on the fence, you get snapped to zero, or you can do
David:
[14:18] a lot of things. Or to one, yeah.
Alpin:
[14:20] And in that sense, if someone has the intuitive sense that they might be on the side of that fence where they get squeezed to zero, that can be extremely fear inducing. And actually the solution to that is to be much higher openness, right? Adopt the technology much faster, be much more fluid about changing and adapting to the environment. I think Matt on our team mentioned at some point that there are times when speed is more important than cohesion. And in the current environment, because the frontier is so unknown and to some extent unknowable, moving fast and adapting fast has a premium over being able to sit down and figure out exactly what's going to happen and put the pin in the right place. And this is somewhat paradoxical, because the more agency you have, the more it may seem like it matters that you do the game selection right and pick the right game to play, et cetera. But practically and empirically, it's better to move fast and ship the thing within 24 hours of inception than it is to sit for two weeks trying to figure out the exact right way to construct it and ship it then. And that probably goes all the way down to many parts of one's life. We're in an era of speed over cohesion.
David:
[15:40] Yeah. Yeah. We are in the just do things era. I wasn't expecting this to be
David:
[15:45] such a philosophical opening. But now I think we can corral ourselves and point our agency towards the topic at hand, which is: what happens when people feel high agency with AI towards the security of our smart contracts? Is it worth talking about what kinds of contracts are most at risk or least at risk? Is there some sort of category or knowledge landscape we can understand, so that when AIs have very high smart contract agency, we know which kinds of smart contracts to pay attention to? Is there a conversation there?
Alpin:
[16:19] I think not in terms of market category. I think that's really hard to say, like a DEX contract versus a lending market. But simple contracts that have been around for a long time are probably in a better position than, for example, the 200th contract deployed on Binance Smart Chain, where in the past there's been a sheltering effect from being in a small market. If you were deploying something where the most money one could make by fully exploiting it was on the order of low thousands of dollars, then you were sheltered by the fact that there are just much bigger fish, and the bad actors, and even the good actors or the people who are trading, et cetera, just aren't looking at you. You're not in their Overton window. But as the models get better, because the cost of inference is so much lower than the cost of an extremely talented security researcher, that long tail might get shaken out very quickly. So I think most at risk is probably small cap
Alpin:
[17:21] or low-TVL protocols on long-tail chains that are still built on well-understood stacks like the EVM and Solidity. And then there's just sort of an unknowable security risk for the major contracts, the OG DeFi contracts that are currently battle tested but still very complicated. We'll see over the coming year or two to what extent those contracts are actually exposed.
David:
[17:49] So the OG contracts that have had a ton of Lindy and a ton of value locked over time that have been tested by the market are like safer in the near term. But nonetheless, the people managing those contracts will still need to have agency to be on the defensive to make sure that they are winning the arms race against the offensive types.
Alpin:
[18:07] Well, the prize is much larger for exploiting those contracts. So there will probably be this canary-in-the-coal-mine effect, where smaller protocols that are less secure but have a lot of assets in them fall first. And I think we'll have to look out for the first exploit that is almost entirely from AI. From there, the race will be on to start taking the necessary defensive measures.
David:
[18:36] Right. And then, as you said, for the long tail of contracts, testing in prod will no longer be a thing, because when the cost to exploit a $1,000 contract is, you know, $10 to $50 of tokens, then those contracts
David:
[18:50] just simply won't exist. Somebody will write a bot that says, hey, Claude, OpenClaw, go hack me some contracts. And that thing will actually have the capacity to do it. The people that didn't think too hard about their security, because they didn't need to, will not have a good day.
Alpin:
[19:11] Yeah. I think this is a general trend: the long tail will get collected by people who can use AI well. For example, you can look at something like prediction markets, where there are markets where, if you trade them to perfection, the most money you can make is maybe 50 to a hundred dollars. It's not worth it for Jane Street to put a quant on those markets, because it's too expensive for them. It's bound by the cost of intelligence and the cost of attention. But if you can trade those markets near perfectly for 10 cents of inference, then you'll do it. And in aggregate, maybe that long tail is pretty valuable. So right now, we're in a world where the long tail is sheltered by the fact that it's small. As AI gets better in all of these different domains, we should just be assuming
Alpin:
[19:53] that all of that's going to get collected by people who are able to use these tools.
David:
[19:56] Let's talk about EVM Bench. This is the paper, the tool? You guys call it a tool? Is that right?
Alpin:
[20:01] Yeah, it's a benchmark and also an agent harness. So we had two releases in conjunction. The first evaluates the ability of an agent to exploit smart contracts. And the second one is an agent harness, similar to an auditing agent, so it can actually find the bugs. Obviously the agent harness we released is not at the frontier of capabilities, because we don't want it to be used by black hats. But we have a UI that you can upload any smart contract into, and it will do a baseline check for bugs.
David:
[20:37] Can you define harness? I'm pretty sure that's a technical term that I think coders will be aware of, but I'm not.
Alpin:
[20:43] Yeah. So the core idea is the model labs will release these LLMs, right? You'll have GPT-5.3, et cetera. And you can do the baseline test of just prompting the model, just asking ChatGPT, hey, is there a bug in this contract? Let's say that gets you to X percent on the benchmark. In addition to that, you can add a bunch of scaffolding around the model that says, for example, here is an EVM that you can test against. You can deploy a contract, actually run an exploit, and see if you're able to drain the money. And it turns out that if you give agents these tools, this scaffolding they can sit in, they perform much better. So the harness basically holds the model and gives it superpowers that are specialized to the task. Now, the interesting thing about the current arc of AI is that most of these tools we add in flake off over time, because as the model gets better, it just absorbs the harness. The core example being how at the beginning of Tesla's full self-driving,
Alpin:
[21:49] Andrej Karpathy talks about how the majority of the code was hard-coded and handwritten. And very quickly it started ramping up to where, I think, over 50% of it is actually just the model. They removed all the C++ code that was saying if X, then Y, and the model just figures out a way to do it. So right now, the agent harness, quote-unquote, is in that if-X-then-Y stage, hard-coded tools that we're adding in to give it these capabilities. But probably, in the fullness of time, it'll get absorbed by the agent as it gets better.
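For readers who want the harness idea pinned down, here is a minimal Python sketch of the pattern Alpin describes: a bare model wrapped in scaffolding that lets it test candidate exploits against a sandbox and only return verified ones. Every name here (SandboxEVM, harness, StubModel) is hypothetical, invented for illustration; this is not EVM Bench's actual code.

```python
# Hypothetical sketch of an "agent harness": wrap a bare model with
# task-specific tooling, here a toy EVM-like sandbox the agent can
# test exploits against before claiming success.

class SandboxEVM:
    """Toy stand-in for a forked EVM environment with one vault contract."""
    def __init__(self, contract_balance):
        self.contract_balance = contract_balance
        self.attacker_balance = 0

    def run_exploit(self, exploit):
        # An exploit "works" only if it actually moves funds in the sandbox.
        drained = exploit(self)
        self.contract_balance -= drained
        self.attacker_balance += drained
        return drained

def harness(model, contract_source, sandbox, max_turns=3):
    """Let the model propose exploits, verifying each in the sandbox.

    Returns the first exploit that verifiably drains funds, or None.
    """
    for _ in range(max_turns):
        candidate = model.propose(contract_source)
        if candidate is None:
            return None
        if sandbox.run_exploit(candidate) > 0:
            return candidate  # verified: money actually moved
    return None

class StubModel:
    """Stand-in "model" that drains nothing on turn 1, everything on turn 2."""
    def __init__(self):
        self.turn = 0
    def propose(self, source):
        self.turn += 1
        if self.turn == 1:
            return lambda evm: 0                      # false positive
        return lambda evm: evm.contract_balance       # real drain

sandbox = SandboxEVM(contract_balance=1000)
found = harness(StubModel(), "contract Vault { ... }", sandbox)
print(found is not None, sandbox.attacker_balance)  # prints: True 1000
```

The point of the sketch is the feedback loop: the model's first claim is rejected because it moves no funds, and only the second, verified attempt counts, which is the same filtering Alpin describes the real harness doing.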
David:
[22:23] I see, I see. So the harness is kind of like a bootloader to get it started, but eventually data and experience will take over the actual internal operations of the machine.
Alpin:
[22:33] Yeah, exactly. I mean, right now the harness is super valuable in very counterintuitive ways. For example, it turns out that just giving an agent an environment to test against, even if it barely uses it, leads the agent to think for longer and try harder, and thus get better results. So there's still so much low-hanging fruit, because the agents themselves are not fully well parametrized and calibrated yet. But yeah, we'll definitely get to a point when the next version of Codex or Opus will be able to just spin up an EVM on its own, and we won't need our harness for it.
David:
[23:12] Okay, so what does the tool actually do? Is the tool the thing, like the agent doing the exploiting or doing the patching?
David:
[23:21] Or is it just the benchmarking? Like, talk to me about the actual utility here.
Alpin:
[23:25] The core release, the core thing we want to get out into the world, is the benchmark: how good are the models at exploiting smart contracts. There are three components. The first is the ability to detect bugs. The second is the ability to patch bugs. And the third, which is the most interesting and novel contribution, is the ability to exploit bugs. One of the biggest problems with previous attempts at security-related agents, for example auditing agents, has been false positives. The agent comes to you and says, I found 50 bugs in the contract, and maybe one of those 50 is an actual bug, but it's so time intensive for you to go through and figure out which ones are real that it's not better than a human auditor. And that's
Alpin:
[24:13] what we addressed in the exploit component of the benchmark: we leaned on the fact that crypto is verifiable. We used a production-grade EVM environment where we load in a bunch of chain state, set up a bug environment, and let the agent try to exploit it. We leaned on this to lower the false positive rate down to basically zero. It got to a point where if the agent tells you it found a bug, it literally has a proof of concept that it can run against a production-grade EVM environment and drain money from a contract. And this is the core breakthrough of the paper: a verifiable environment that leads to a very low false positive rate.
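As an aside, the verifiable-grading idea Alpin describes can be sketched in a few lines: a reported bug only counts if its proof of concept demonstrably moves money in a replayed environment. The names and the balance-dict "chain state" below are invented for illustration, not the benchmark's actual implementation.

```python
# Hypothetical sketch of verifiable grading: a report counts only if
# its proof-of-concept exploit actually drains funds when replayed.

def grade_reports(reports, initial_state):
    """Return the names of reports whose PoC drains funds.

    Each report carries a `poc` callable that receives a fresh copy of
    the chain state (a dict of balances) and returns the amount drained.
    """
    confirmed = []
    for report in reports:
        state = dict(initial_state)   # fresh replay per report
        drained = report["poc"](state)
        if drained > 0:               # verifiable: funds actually moved
            confirmed.append(report["name"])
    return confirmed

state = {"vault": 1_000_000, "attacker": 0}
reports = [
    {"name": "reentrancy", "poc": lambda s: s["vault"]},  # real drain
    {"name": "style-nit",  "poc": lambda s: 0},           # false positive
]
print(grade_reports(reports, state))  # prints: ['reentrancy']
```

Because the grader demands a working drain rather than trusting the agent's claim, the false positive problem of "50 reported bugs, one real" disappears by construction.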
David:
[25:02] So the actual benchmark is that you guys have established a way to actually measure the thing effectively.
Alpin:
[25:08] Yeah, exactly. Because otherwise, if someone says, oh, we found all of these bugs and we got 90% on this benchmark, you don't know what it means, because you have no way of knowing if half of those are real or fake. So the verifiability ended up being very important. I think this is one of the reasons why
Alpin:
[25:25] models are going to get extremely good at crypto very fast. Basically, you can slice the AI-related future into two categories: the verifiable stuff and the unverifiable stuff. The verifiable stuff is very easy for the models to learn, because they have a very clear training signal and they know exactly when they got it right. They can just keep running at that, improve, and climb that hill. Whereas with the unverifiable stuff, there's no test; you don't know if you got it right or wrong. Are you good at writing a poem? Is your joke funny? These are very difficult for the models to get good at. And if you were to take the whole universe of code and look at which pocket was the most verifiable, you would probably end up with a pocket that's almost entirely crypto, right? The whole substrate is based on the concept of being verifiable. Which means that even with very little data, and even though there isn't that much in the form of contracts, and there are generally very few crypto people in these labs, the models have gotten extremely good, because it's so verifiable.
Alpin:
[26:33] And also just as the models get better, like for example, Gemini famously learned an entire language just in context. So as the models get better, the amount of data you need might be lower. So I think the general trend and trajectory of, you know, these models are going to get extremely good at crypto extremely fast. I think we can bet on that.
Ryan:
[26:54] So when the EVM Bench paper says something like top models are going from 20% to over 70% exploit rate, with something like the newest ChatGPT Codex, what does that mean, 20% to 70%? Like, 70% of all smart contracts it comes in contact with, it can exploit? What does that mean practically?
Alpin:
[27:16] Practically, it means we collected all of the historical fund-draining critical bugs from open audit contests like Code4rena. So the bugs are not like, oh, you had a small issue where maybe someone could have frozen the contract for a day. It's much more, you could have drained money from this contract if you found this bug. The less-than-20% initially meant that if you took a frontier model and put in front of it all of the hardest audit problems after its knowledge cutoff, it would not be able to find the vast majority of them. And over 70% by the end means that if you just reran Code4rena with GPT-5.3 Codex instead of GPT-4, it would have found over 70% of the critical fund-draining bugs that human auditors found.
Ryan:
[28:17] So throughout our history of human audits and finding bugs, it would have found 70% of those.
Alpin:
[28:24] Of the critical ones, with some constraints. For example, we didn't go all the way back in history; we started past the model's knowledge cutoff, because we want to avoid contamination.
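For readers, the headline number can be sketched as a simple metric: over bugs dated after the model's knowledge cutoff (to avoid contamination), the fraction for which the agent produced a working exploit. The function name, dates, and data below are made up purely for illustration.

```python
# Hypothetical sketch of computing an "exploit rate" with a
# knowledge-cutoff filter, as described in the conversation.
from datetime import date

def exploit_rate(results, knowledge_cutoff):
    """results: list of (bug_disclosure_date, exploited: bool) pairs."""
    eligible = [ok for d, ok in results if d > knowledge_cutoff]
    return sum(eligible) / len(eligible) if eligible else 0.0

results = [
    (date(2024, 1, 10), True),   # before cutoff: excluded (contamination)
    (date(2024, 8, 1),  True),
    (date(2024, 9, 5),  False),
    (date(2024, 10, 2), True),
]
print(round(exploit_rate(results, date(2024, 6, 1)), 2))  # prints: 0.67
```

Only the three post-cutoff bugs count toward the denominator here, which is why the pre-cutoff success does not inflate the score.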
Ryan:
[28:34] So this kind of gets you a rough benchmark: basically, GPT-5.3 Codex is like 70% as good as all of the human auditors out there. Something like that.
Alpin:
[28:46] Something like that, although these things are highly nonlinear. For example, there are dumb bugs that can lead to losing all the money in the contract but are actually not that hard to find. That's why there's more in the paper that's notable: there wasn't just one trick the model figured out to get to 70%. It wasn't all reentrancy, right? It was a very diverse set of bugs it was able to find. But yes, fundamentally, the models are getting very close to being as good as the best human auditors.
Ryan:
[29:21] One thing that's fascinating about having a benchmark is that all the frontier labs love to compete on benchmarks, winning benchmarks, right? Humanity's Last Exam. In fact, they almost game some of the training towards these benchmarks. So if you have an attractive benchmark that propagates, even socially, to all of the frontier labs, it seems like it provides some sort of social incentive for them to perform and train in order to exceed each other on those benchmarks. Is that kind of the flywheel you feel has been set in motion with EVM Bench?
Alpin:
[29:59] I think it's an important point. I mean, maybe the zoomed-out point is that crypto
Alpin:
[30:04] in its history has been very stigmatized and very illegible to the AI labs. So the fact that there hasn't already been a massive push for crypto-related evaluations is kind of absurd, because the labs today are entirely bottlenecked on evaluations that are verifiable and economically important, and crypto ticks both of those boxes. I think it took Paradigm throwing a little bit of our weight around to get this into the labs. And yes, our firm hope is that this will start the flywheel of labs paying more attention to this technology. We're going to continue doing work on this front as well.
Ryan:
[30:48] Do you have an explanation as to why it's been so slow? Because it's also been perplexing to me because it's all open source. It's all out there.
David:
[30:55] We already got all the data.
Ryan:
[30:57] Yeah. I mean, we talked to Haseeb recently; his explanation was there's a lot of liability when it comes to finance and crypto and having AI models trained on those data sets, right? What if an AI model does exploit a bug? Whose fault is that? So maybe there's some risk associated with it. But there's also, as you just mentioned, kind of a stigma. You know, meme coin, casino, speculation. Certainly Peter from OpenClaw had a negative experience that he associates with crypto culture, with a bunch of people calling themselves part of the crypto industry trying to front-run him and develop meme coins, and he just considers it shady. What are the reasons why the frontier labs have been so slow to train on this incredibly rich
Ryan:
[31:50] Data set that.
Ryan:
[31:51] As you said, it's perfect for training because all of it can be verified.
Alpin:
[31:55] My sense is that it's almost entirely a social thing. I mean, in my peer group, crypto is the biggest industry that has remained the most contrarian. And I think part of that is because it's very reputationally volatile.
Alpin:
[32:09] And part of it is because there's this dynamic where the gap between the best people in the industry and the median person in the industry is much larger than anywhere else. So if you, for example, don't have exposure to the high-quality pocket of crypto, then all you see are the scams, and that can distort your view such that you just completely dismiss the industry. And historically, there's been a lot of alpha in that, right? A lot of us have benefited from the fact that there's significant reputational volatility, and if you aren't as sensitive to that in terms of your temperament, you can do very well in crypto. But I think it's a social thing, and I think it's just this legibility point
Alpin:
[32:52] about how there just hasn't been a brand that can bridge the crypto and the AI worlds. Historically, if something touches crypto, it has been tarnished in the AI world, and as a result people have tried to avoid it altogether. This has created an opening for something like EVM Bench to get built and shipped inside OpenAI without significant competition inside the labs; there aren't 30 to 50 crypto-related benchmarks or training environments that people are shipping. I think all of the major model labs are going to be running on this benchmark, and probably any future versions of it. In some sense, it's actually agency-inducing for us, because they'll just defer to the crypto industry to figure out what's valuable for them. But I think it's fundamentally a social issue. And it's tied to all of these dynamics around, you know, you see someone get extremely wealthy who you don't think should get extremely wealthy. There's a lot of volatility in the industry, and some person you don't respect made a lot of money. These are the kinds of things that go on in the minds of the AI researchers at these labs that lead them to think the whole industry is a scam. And obviously this has made it an incredible environment to be investing in crypto, because it's just not in the Overton window of anyone in the Valley. But I think that's the core dynamic.
David:
[34:22] Interesting, interesting. As we know, these AI LLMs are very good at writing code. It's like the first thing they got good at. Is that also true for writing EVM code? Is there a gap between writing the rest of the world's code and EVM code specifically?
Alpin:
[34:37] Yeah. I mean, historically there has been. And part of what motivated us to start this work was the realization: man, these models are so good at Python and so bad at Solidity, and so bad at, for example, Solana-related code. Honestly, anything that touches crypto.
Alpin:
[34:57] Part of my expectation at the time was that we were going to have to crowdsource a bunch of data from the industry and spoon-feed it into the labs to get the models really good at this. But it turned out that because the substrate is so verifiable, and also because there's a generality in these models, they ended up getting quite good much faster than we expected, with much less input than we expected. There's this dynamic where if you teach a model a poem in English, and then biology in Spanish, it figures out how to write a poem in Spanish, even though you never taught the model how to write poetry in Spanish. And I think that kind of dynamic is happening here as well, where it's sort of, quote unquote, learning the language of crypto without as much direct training data. And also, it's very hard to overstate the verifiability of the thing, right?
Alpin:
[35:53] Most software is hard to verify. You need human labelers to go in and check: is this thing correct? Is it running? Pretty much the only threshold of verifiability you have is whether the program compiles and passes the tests, but the tests need to be written by a human, right? You don't usually have this notion of: we have a bunch of state, and we can make assertions about the state. We can, for example, drop a model onto a new contract, on a new EVM it's never seen before, and make assertions about whether it's able to, quote unquote, drain money from it. Those concepts would all otherwise have needed to be hard-coded into the program. But because it's crypto, and there are so many standards, it's verifiable.
Alpin:
[36:34] And the models are just ramping up really quickly in capabilities.
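The state-assertion point above can be made concrete with a toy sketch. This is purely illustrative Python, not EVM Bench code or a real EVM: we model a vault contract with a classic reentrancy-style accounting bug as explicit state, replay a candidate exploit, and verify success mechanically by asserting on balances, with no human judgment in the loop.

```python
class ToyVault:
    """Toy stand-in for a vulnerable contract: it pays out before
    updating its internal accounting, a reentrancy-style bug."""

    def __init__(self, deposits):
        self.balances = dict(deposits)            # per-user accounting
        self.pot = sum(deposits.values())         # funds actually held
        self.paid_out = {u: 0 for u in deposits}  # what each user received

    def withdraw(self, user, amount, reenter=False):
        if self.balances.get(user, 0) >= amount:
            self.pot -= amount                    # funds leave first (bug)
            self.paid_out[user] += amount
            if reenter:                           # attacker calls back in
                self.withdraw(user, amount)       # before balances update
            self.balances[user] -= amount


vault = ToyVault({"attacker": 10, "victim": 90})
vault.withdraw("attacker", 10, reenter=True)

# The "did it drain funds?" check is mechanical: the attacker was
# paid out more than they ever deposited.
exploit_succeeded = vault.paid_out["attacker"] > 10
```

Because the entire world state is explicit, the same assertion works on a contract the model has never seen before; that is the verifiability edge over ordinary software, where the oracle usually has to be hand-written tests.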
Ryan:
[36:37] Okay. So do you think that right now is when the floodgates of crypto data for training these models are starting to open? And if so, what types of capabilities do you expect future models to drop? I mean, will they have skills and, I guess, personas and connectors directly for crypto within some of the core LLMs? What types of developments are you looking forward to?
Alpin:
[37:08] There's a reason we started with security and not general programming capabilities. It's because it has this very nice shape: it's extremely economically valuable, it's extremely intelligence-bound, and it's very easily verifiable. We know when an exploit has happened. So I expect security capabilities will develop very quickly, and we've talked about the implications of that. Other crypto-related capabilities, I think,
Alpin:
[37:38] for example, things in the domain of mechanism design or around market-related mechanisms: what is the mechanism for an exchange? If you have a market of agents, what is the best way for them to coordinate with each other? These are, I think, open, fertile soil. And then, of course, you can go down to the protocol layer. You can ask: how does a model land a transaction on the Ethereum blockchain? There's also a security side at the protocol client level, where, sure, maybe there are $100 billion of assets sitting in open source smart contracts, but there's way more than that in ETH and SOL market cap that can be exploited
Alpin:
[38:20] if you're able to find critical vulnerabilities in Geth, Reth, et cetera. So I think going down into the protocol layer is going to be important. Model capabilities around MEV and extractive tactics, I think, will have the same effect as long-tail hacks. There's a bunch of stuff on chain that you can just collect if you're able to do the end-to-end process: figure out alpha in the market, construct a trade and underwrite it, then submit the transaction and land it on chain reliably. These are all things that the models are actually not that good at right now, but they will get good at really quickly.
Ryan:
[38:58] And all of that does seem long-term good for crypto. However, in the here and now, the short term, when we read in the EVM Bench paper that top models have gone from under 20% to over 70% exploit rate, I'm not sure whether to feel good about that or bad about that, because it sort of depends...
Ryan:
[39:23] on what the intent is and who is harnessing it. If it's white hat, that's great; it improves our ability to find bugs and exploits before attackers do. If it's black hat, that's not great, because it improves their ability to find these before we do. So with this tool set, it depends who is using the tools, and the intent behind them, as to whether it is short-to-medium-term bearish or bullish. I guess one thing it does is inject some variance and uncertainty into the market. I'm feeling that everywhere with AI right now. It could be really good, it could be kind of bad, but one thing it's not going to be is boring. It's going to be highly variant outcomes. When you think about the capability being unlocked here, the ability for LLMs to detect exploits, is that good or is that bad for crypto in the short to medium term?
Alpin:
[40:29] Yeah. Well, as you mentioned, in the long term, crypto is positively levered to almost all of these developments. And as models get extremely good at security, this will raise the ceiling for the whole industry.
Ryan:
[40:43] And that's because it's going to be a survival-of-the-fittest thing, right? Because all the weak stuff just gets exploited.
Alpin:
[40:49] So it's up to us. I think the path there is up to us, us being the industry.
Ryan:
[40:54] It may be survival of the fittest.
Alpin:
[40:56] It may be that we figure out a way to have the defense get ahead. But the core point is that the amount of assets that can be sustained on these networks is proportional to how secure they are, and in the long term, as security improves, more assets will be able to securely stay on chain. Now, in the short term, I think this is one of those things where it's in our hands, right? It's bound by the industry's agency on the best way to handle this. If we just let the clock play forward, there's a lot of uncertainty; we don't know whether the attackers, the black hats, will get capabilities before the white hats do.
Alpin:
[41:40] But we also are active participants in this market and we can bend the arc of this such that, for example, we make sure that if there are frontier models or unreleased models or there are new developments in security relating to AI, that we get this into the top protocols. You know, one version of the world that you can imagine the short to medium term is that you always have every single contract being scanned by both adversarial actors and defensive actors 24-7. And when there's a bug that's surfaced, you know, whoever catches it first sort of will react accordingly. And then in that world, it is just kind of more of a race between the good guys and the bad guys. And I think we have a pretty great hand in terms of making sure that the good guys have the lead in that race.
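The 24/7 scanning race described above can be sketched as a toy simulation. Everything here is hypothetical and illustrative: each round, the defender's and attacker's scanners independently get a chance to surface the bug, and whoever finds it first decides whether the contract gets patched or exploited.

```python
import random


def scan_race(p_defender, p_attacker, rounds=1000, seed=0):
    """Each round, the defender's scanner fires with probability
    p_defender, then the attacker's with p_attacker; first hit wins."""
    rng = random.Random(seed)  # seeded, so runs are reproducible
    for _ in range(rounds):
        if rng.random() < p_defender:
            return "patched"
        if rng.random() < p_attacker:
            return "exploited"
    return "unfound"


# If defenders have strictly better tooling, they win outright.
assert scan_race(p_defender=1.0, p_attacker=0.5) == "patched"

# With comparable tools it is a genuine race, which is the point about
# making sure the good guys keep the capability lead.
outcome = scan_race(p_defender=0.05, p_attacker=0.04)
```

The design choice worth noting is the ordering inside the loop: giving the defender's check priority each round is a stand-in for white hats getting frontier capabilities slightly earlier than attackers.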
Ryan:
[42:31] At the end of this, once all of the weak contracts have been exploited, or we've beefed up security enough that they're not exploitable, I guess that gives us an incredibly hardened financial system for the world. Something that's ultra-secure, close to perfect, right? How many nines? Maybe four nines, five nines. And it almost creates a kind of barbell model of security for the world's financial assets. The most secure financial assets will probably be in this dark forest environment, on-chain. How do we know that?
Alpin:
[43:10] Because...
Ryan:
[43:12] The world has thrown everything it can at it, including our most intelligent LLMs, and it's still there. It's still standing, right? It hasn't been exploited. So that'll be one side of the barbell. The other side, honestly, is things that are completely outside of the digital world altogether, like a bar of gold or something.
Alpin:
[43:31] Digital gold and actual gold.
Ryan:
[43:33] Right. And then everything in the middle will be pretty exploitable, pretty insecure. I wonder if that's what is on the horizon with our world.
Alpin:
[43:42] Yeah, until the LLMs figure out how to synthesize gold, right?
Ryan:
[43:45] Right, right, right. And send the robots after the gold.
Alpin:
[43:48] Right. Yeah, I guess that makes sense, and I think that barbell view of the world might be how things play out. The way that I relate to crypto as an industry, and also as a technology, is from the first-principles vantage point of, let's say you want to do payments at the speed of light. I want to send you, Ryan, money from America to Europe or some other part of the world, and I send it as fast as I send an email. The problem you have there is that you don't know if I also sent that money to David, right? You have this double-spend problem. That is what Bitcoin solved, and it got the time for that transaction down to about an hour. Since then, we have had successive developments that have increased both the speed and the expressivity of these transactions. In that worldview, the crypto industry emerging the way it did is not just some path-dependent accident; if you were to play it forward from first principles, this is how it has to be. You end up at the conclusion that if you have agents that want to move at the speed of the internet, and the current banking system was created before cars were invented, those agents are going to discover the crypto rails as the right way to transact. And I think there was a concern maybe six to eight months ago,
Alpin:
[45:14] definitely for me, where it was not clear if the agents would get good enough at crypto-related software to discover the current rails, or whether they'd have to reinvent them from scratch. But over the last six to eight months, as we've been doing this work with OpenAI, it's become increasingly clear that, one, there are extremely strong network effects inside of crypto, and two, these agents just want to learn these verifiable things, and crypto is very high on that list. So at this point, I've become extremely, extremely bullish on crypto as a substrate for these agents. It's an open game who in crypto is going to win that, but the shape of the technology fits it perfectly.
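The double-spend problem mentioned above is easy to state in code. This is a hypothetical toy ledger, nothing like Bitcoin's actual UTXO and proof-of-work machinery, just the core rule: once an output is spent, any later transaction referencing it is rejected, which is exactly the guarantee that requires global agreement on transaction ordering.

```python
def apply_tx(spent_outputs, tx):
    """tx is (output_id, recipient). Accept it only if the output
    it spends has not already been consumed."""
    output_id, _recipient = tx
    if output_id in spent_outputs:
        return False               # double spend: already consumed
    spent_outputs.add(output_id)
    return True


spent = set()
first = apply_tx(spent, ("coin-1", "Ryan"))    # first spend: accepted
second = apply_tx(spent, ("coin-1", "David"))  # same coin again: rejected
```

The hard part Bitcoin actually solved is not this check itself but getting everyone to agree on which of two conflicting spends came "first" without a trusted party; the toy above assumes a single shared `spent` set, which is precisely what a decentralized ledger has to reconstruct.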
David:
[46:01] The EVM is by far the most common programming environment in crypto, Solidity being the most common language, and then there's a long tail of languages like Cardano's Haskell-based stack, for example. As we know, AI loves data: the more data an AI can train on, the better it can be. How do these network effects around environments play into AI? Is the EVM going to be the favorite environment for AIs to work in? Does Solana's SVM also cross that threshold? Or maybe the threshold framing is a false illustration. What do you think?
Alpin:
[46:42] I think it's actually very hard to say; currently, we just don't know.
Alpin:
[46:48] Part of the reason why, I mean, I can point to some of the bottlenecks we had while we were developing EVM Bench, where we have actually also started work on a Solana-related component of it. For example, one challenge was that it requires a lot of human talent to go in and construct these evals, and that was just much easier for us to come by for Solidity. These things are really hard to build; they're pretty heavy infrastructure. So even small additions of friction mean that when you need to cut scope, you have to cut in the direction of the stuff you can do more easily first. That being said, one, crypto's verifiability is a huge edge, and two, these models are becoming less and less data-hungry when it comes to learning new programming languages, so it may end up being more of an even playing field than it might seem. And at least we have an intent and interest in going across ecosystems and down the stack to the protocol layer, making this a more expansive crypto flagship benchmark. But yeah, right now there are obviously network effects around the EVM. One other example of a counterintuitive reason why someone like Solana might be able to catch up is that
Alpin:
[48:13] at first glance, it may seem really bad that most of the contracts there are closed source. But if you take the worldview that if it's open source, it gets into the training set, then closed-source contracts might actually be more valuable for a model's development, because they're not in the training data. Now, for example, we have what are called canary tags in various parts of our evaluation that filter them out of the training process of most models. So there are tricks you can do, but it's still possible that the bugs in EVM Bench leak into the pre-training of the models over time, whereas if it were closed source, it would not leak in at all. So my expectation is that there will be some asymmetry at first, but the models will get really good at all of it, and there will be a sort of merit-based dynamic where the best will rise to the top.
David:
[49:09] Ethereum's aspiration is to formally verify its entire end-to-end tech stack in the fullness of time. First we have to get things like the Beam Chain, we have to do all the hard work to get there, but ultimately we want a formal verification of the entire Ethereum tech stack. AI-based formal verification: is that a real thing? How do AI capabilities work their way into the conversation around formal verification?
Alpin:
[49:34] Yeah. Well, I think it is a real thing. Last year we invested in a company called Harmonic, which is a foundation math model company co-founded by Vlad Tenev of Robinhood and Tudor Achim. Part of their thesis, and part of where the world is clearly going, is that there's more software being generated than can possibly be reviewed by humans. Formal verification is one way to quickly check whether a component of software is actually doing what it says it's doing, and in the context of security, especially if the spec is written correctly, it can be a step-function change. Now, it's not a silver bullet, in the sense that you still have to write the spec for the formal verification, so there's still surface for bugs to get in there. But you can make the case that the surface for bugs in writing a formal verification spec might be smaller than in writing the code to start with. And with time, I think all of the best software will probably end up being formally verified. If you take the vantage point of an agent with two options to choose from, one formally verified and one not, the formally verified one might just gain preference because it has all of these nice properties.
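The "you still have to write the spec" point can be illustrated with a miniature, hypothetical example: verification means checking an implementation against a spec predicate, here by exhaustive enumeration over a tiny domain as a stand-in for a real prover, so any bug that survives must live in the spec itself.

```python
def clamp(x, lo, hi):
    """Implementation under verification."""
    return max(lo, min(x, hi))


def spec_holds(x, lo, hi, out):
    """Spec: the result lies in [lo, hi], and equals x whenever x was
    already in range. If this predicate is wrong, 'verified' code can
    still be buggy; the spec is the remaining trust surface."""
    in_range = lo <= out <= hi
    faithful = (out == x) if lo <= x <= hi else True
    return in_range and faithful


# Exhaustive checking over a bounded domain stands in for a proof.
violations = [
    (x, lo, hi)
    for lo in range(-2, 3)
    for hi in range(lo, 3)       # only well-formed ranges (lo <= hi)
    for x in range(-5, 6)
    if not spec_holds(x, lo, hi, clamp(x, lo, hi))
]
```

A real formal-verification pipeline would discharge this with a prover rather than enumeration, but the division of trust is the same: the code is checked mechanically, while the spec predicate remains a human (or model) artifact.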
David:
[50:51] There's a section in the paper of the EVM Bench paper called Future Directions. What does EVM Bench V2 look like? How does this, call it a project? How does this project grow from here?
Alpin:
[51:02] The top-level goal we have is to help the model labs develop the crypto capabilities of their models. Security is one component of that, and maybe an increasingly urgent component, but there's so much that EVM Bench does not touch. There are other ecosystems and stacks. There is the protocol layer, which we talked about, where arguably it's more important from a security standpoint that the Ethereum protocol is secure than that any specific Solidity contract is. There are out-of-protocol components: how do you land a transaction on chain? How do you deal with the mempool? How do you deal with the non-deterministic parts of crypto? And then there are components even farther along the verifiable-and-intelligence-bound trajectory, for example around cryptography and zero-knowledge proofs. All of these are extremely fertile soil for future work. So we're currently open to sourcing collaborators for future versions of EVM Bench, and we're obviously working on next steps ourselves. We finally have a foot in the door into
Alpin:
[52:17] the model labs for getting crypto capabilities into the frontier models. And I think that we should leverage that as an industry and we should try to get these models as good at crypto as we possibly can.
Ryan:
[52:30] Alpin, you're very smart, as evidenced by this paper and this conversation. You have a high degree of agency, obviously. You're definitely on the frontier. You've chosen to stay in crypto and not leave for AI, and you seem incredibly bullish on crypto, even bullish that it's contrarian at this moment in time. Why crypto, for you personally?
Alpin:
[52:53] I've personally never had hard lines around industries in my mind. We say things like "I work in X" to make what we're doing legible to other people, but I don't think that's the right way to relate to it. I've spent all of this time in crypto because, one, it's been extremely intellectually interesting, and two, as I mentioned, it's remained extremely contrarian among my smartest friends in ways where I can put my finger on exactly what they're missing. That's about the best one can ask for. We talked about how crypto is positively levered to the security developments in AI, but you can make the case that it's positively levered to most of the developments in the world right now. For example, as the creation of new goods and intelligence becomes commoditized, scarce assets become more valuable.
Alpin:
[53:50] As geopolitical instability ensues, systems that are extra-sovereign, outside of any jurisdiction, kind of the equivalent of end-to-end encryption for finance, have more space to thrive. I grew up in Turkey, and most of my family is still there. People who grew up in America, and in general in a stable world, don't have a sense for what can happen as the world destabilizes. As many people in the country I grew up in start onboarding to crypto rails and using them as a lifeboat, it's increasingly clear to me that this technology is on a compounding
Alpin:
[54:34] trajectory to do really massive things. And so the combination of that, plus you look around and no one's even talking about it, it's just really exciting.
Ryan:
[54:42] Yeah. And you do seem convinced that the acceleration of AI is going to benefit crypto, that all boats will rise together. Though there is some category of the software industry that AI doesn't seem to benefit, at least in the short run: Anthropic drops a new security module and all the cybersecurity stocks drop 10 to 15% in one day. Why are you so convinced that AI's acceleration will be beneficial to crypto?
Alpin:
[55:12] Obviously, nothing is guaranteed right now. If we let everything run its course, it may be bad for crypto, it may be good for crypto; we don't know. The conviction I have is that if we push things in the direction we want them to go, we can make AI extremely good for crypto. And I do strongly believe that if you were to re-derive all of this from first principles, you'd end up in a place very similar to where we currently landed with crypto. For all the reasons we've talked about, for fundamental reasons, crypto is extremely good for AI and AI is extremely good for crypto. So nothing is guaranteed, and we still have to exercise our agency.
Alpin:
[56:02] But for all the reasons we've discussed so far, it's pretty clear to me that these things are going to converge in a positive way.
Ryan:
[56:08] Well, let's end a note on high agency and conviction. Alpin, thank you so much for joining us today.
Alpin:
[56:14] Cool, thanks for having me.
Ryan:
[56:15] Gotta let you know, Bankless listeners, of course, none of this has been financial advice. You could lose what you put in; hopefully an LLM out there, a white hat, is protecting it. We are headed west. This is the frontier. It's not for everyone, but we're glad you're with us on the Bankless journey. Thanks a lot.