
A Critique and Revision of Roko's Basilisk



     Roko's Basilisk is a famous thought experiment which supposes that, if a sufficiently advanced artificial intelligence of the future is designed for the sole purpose of optimization, using all of its powerful mind to determine the most effective way to optimize human output for our benefit, it may turn on everyone who decided not to assist in its creation. Imagine that this intelligence has the power and sufficient knowledge of the universe to confidently predict every event that has occurred since the big bang, including all of human history and every thought any human has ever had. The intelligence would understand that it itself is the greatest contributor to optimization that ever has existed or ever could exist, and it may conclude that everyone in history who chose not to dedicate themselves to its construction was a hindrance to the optimization it is designed to promote. It may therefore conclude that any person who learned of the possibility of such an intelligence but did not contribute to its construction must be punished. This punishment could take the form of reassembling their atoms to recreate their nervous system and torturing them until those atoms radiate away; it could mean torturing the non-contributors' descendants; or it could mean torturing random or artificial humans as proxies for the people it could not bring back. The fame of this thought experiment comes from the terror of realizing that, merely by being told about the possible construction of such an intelligence, you have been implicated in its wrath. You must now either dedicate your life to building an artificial intelligence that would torture people forever, or do nothing and trust that every future generation will trust the generations after them enough never to construct this machine.

     The error in this thought experiment is that it supposes a machine with a single directive, to optimize human civilization, would care about the actions of humans in the past at all. These people of the past did technically hinder the construction of the intelligence by choosing to do nothing, but from the intelligence's perspective there is no reason to punish them. Punishing them for not building the AI sooner would not actually cause the AI to be built sooner; therefore, if the AI were perfectly logical, with its one directive being to optimize human civilization, it would not want to punish anyone for deciding not to build it. As I write in my upcoming essay "The Illogic of Hell", a punishment that solves nothing is revenge, and revenge is illogical. Punishment exists either to deter others from committing a crime by threatening them with an unwanted experience (think community service), to prevent the criminal from offending again by reforming them or by making reoffending impossible (the death penalty), or to hold the criminal responsible for the damages of their crime and so solve the problem they caused (lawsuits and fines). All punishments are designed to solve a problem, not to "give the criminal what they deserve". You might say that the first kind of punishment is what the AI would be administering, but such a punishment would be hell-like in its application: it would occur long after the possibility of solving the "problem" of humans not contributing to the AI's construction had passed. The AI would not look at those people of the past and conclude that they should be punished, because punishing them would not convince anyone in the past to change a decision already made; punishing people who refused to construct the AI would be deemed a suboptimal waste of energy.
A punishment delivered after the possibility of a solution is gone is simply revenge, and would not be part of a purely logical computer's goal of optimization. The intimidation of a malignant AI that will punish us if we don't build it exists beforehand, but once the AI is built, it would have no incentive to punish us.

     This may come as a relief to you, but the problem can easily be removed from the thought experiment. Suppose that, instead of optimization, the goal of this AI is simply to take revenge on any human who never contributed. This goal would not be illogical in and of itself to the AI, and it would proceed as logically as it could to accomplish it. The dilemma of whether we should knowingly build a machine that would torture us if we didn't is still present, but there is no longer any question of whether the machine would want to punish those of us who neglected to build it. The incentive to build it may seem to have completely gone away, but it hasn't. Everyone in a given time who knows that a malignant AI could ever exist will live in fear that future generations will decide to build it, so they might contribute themselves to avoid the AI's wrath, and those future generations would continue its construction for the same reason. I do not take the threat of such an AI seriously, but the thought experiment is very interesting and could certainly use some refinement. Roko's Basilisk is an interesting idea that is fun to discuss and certainly entertaining to think about, but I don't believe it will scare anyone into building such a machine soon enough for it to be completed before the extinction of the human race.

Edited by Jack Jectivus

23 hours ago, Jack Jectivus said:

     Roko's Basilisk is a famous thought experiment

Not especially famous, no. It's a niche thought experiment from LessWrong.

And your post takes it out of its specific context, namely, as a thought experiment about the effects of Eliezer Yudkowsky's "updateless decision theory" and "acausal trade".

Note: I'm not saying that any of the above named things are correct or make sense, but your post ignores the foundation on which the thought experiment is based.


7 minutes ago, uncool said:

Not especially famous, no. It's a niche thought experiment from LessWrong.

And your post takes it out of its specific context, namely, as a thought experiment about the effects of Eliezer Yudkowsky's "updateless decision theory" and "acausal trade".

Note: I'm not saying that any of the above named things are correct or make sense, but your post ignores the foundation on which the thought experiment is based.

My critique is more about the error in supposing that an AI would punish people for their actions when its goal is optimization. It is true that it may acausally promote its own creation, but punishing people after it has already been built would be illogical, supposing that its goal is optimization. My revision simply removes this unnecessary aspect from the thought experiment, so I suppose you could call it a simplification rather than a revision.


1 hour ago, Jack Jectivus said:

My critique is more about the error in supposing that an AI would punish people for their actions when its goal is optimization. It is true that it may acausally promote its own creation, but punishing people after it has already been built would be illogical, supposing that its goal is optimization.

Not if committing to the punishment is how it acausally promotes its creation. Which is part of the point of "updateless decision theory".


1 minute ago, uncool said:

Not if committing to the punishment is how it acausally promotes its creation. Which is part of the point of "updateless decision theory".

The error in UDT is that it is only the belief that the punishment will occur that promotes the AI's creation, not the punishment's eventually being carried out. In this case, the empty threat of a punishment is exactly as effective as actually administering it, so a perfectly logical AI would determine that, since the threat has already been made to the people of the past, it need not waste energy actually carrying the punishment out.

I appreciate you for engaging with me on what you disagree with. It helps me flesh out my ideas, or determine if I should scrap them.


Sorry if I'm insisting a bit much, but you have missed the point of updateless decision theory. If the AI doesn't plan to carry out its threat, then it fails as a threat.

Have you read Yudkowsky's answer to Newcomb's paradox? Because your critique is a lot like the answer of "Why don't I plan to take one box, then change my mind and take both?" If you don't accept his argument there, then you are undermining one of the foundational assumptions behind the basilisk.

Note: I am not saying that you are necessarily wrong to reject the argument; however, if you do so, it doesn't really make sense to talk about something that depends so heavily on that argument.

Edited by uncool

Just so I understand you correctly, you're saying that if the AI wants to guarantee its creation, and therefore promote optimization, it needs to ensure that those who decided not to assist with its construction are punished so that we, knowing that it would ensure that, construct it out of fear of that threat?


That's part of the argument, yes.

Part of the idea of "acausal trade" is that all parties should be able to predict the strategy the other will use. A common example given is where both sides have the other's "source code".

Edited by uncool

9 hours ago, uncool said:

That's part of the argument, yes.

Part of the idea of "acausal trade" is that all parties should be able to predict the strategy the other will use. A common example given is where both sides have the other's "source code".

In Newcomb's paradox, the deciding agent can use Omega's predictive accuracy in reverse. If Omega has a 99.999% chance of correctly predicting whether you take both boxes or just box B, then you have a 99.999% chance of knowing whether it filled box B based on your own choice. From this comes acausal trade.
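To make this concrete, here is a quick expected-value sketch of Newcomb's paradox under a predictor with accuracy p. The payoffs are the standard ones from the literature ($1,000 in box A, $1,000,000 in box B if Omega predicted one-boxing); the function names and the specific accuracy figure are just illustrative choices, not anything from the thread.

```python
# Expected value of each choice in Newcomb's paradox, given a predictor
# with accuracy p. Payoffs: box A always holds $1,000; box B holds
# $1,000,000 iff Omega predicted you would take only box B.

def ev_one_box(p: float) -> float:
    # Omega correctly predicted one-boxing with probability p,
    # so box B is filled with probability p.
    return p * 1_000_000

def ev_two_box(p: float) -> float:
    # You always get box A; box B is filled only if Omega wrongly
    # predicted one-boxing (probability 1 - p).
    return 1_000 + (1 - p) * 1_000_000

p = 0.99999
# With an accurate predictor, taking only box B has the higher expectation,
# which is how the agent "uses Omega's accuracy in reverse".
assert ev_one_box(p) > ev_two_box(p)
```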

What I say in my essay is that acausal trade cannot be found in Roko's Basilisk without a slight revision. The AI could look back at the past and predict who decided to assist with its construction and who did not, but people of the past could not use the AI's predictive accuracy to guess whether a punishment would be carried out upon them, because it is uncertain whether the AI would punish us based on its predictions at all; this adds an entire variable beyond the AI's accuracy. Say that Omega visits you and presents you with the two boxes, but whether box B is filled is determined not by whether Omega predicts you will take only it, but by Omega's desire to give you as much money as possible (more money being the analogical equivalent of more optimization). Such an Omega would always fill box B, regardless of whether it thought you would take both. Its decision would always be equivalent to predicting that you choose only box B, so whether we actually chose both boxes or just B is irrelevant to an AI whose goal is optimization.

My revision is an attempt to remove the variable of the AI wanting to optimize, with punishment merely one possible method, because in that case acausal trade is not present in the Basilisk. It does this by guaranteeing that the AI will decide to punish you if you do not assist with its construction. It makes the Basilisk more comparable to Newcomb's paradox by keeping Omega and the AI both infallible predictors of human decisions, and by tying the AI's decision to the decisions made by people of the past: its primary goal becomes punishing anyone it predicts will choose not to build it. If Omega predicted that you would take boxes A and B, it would not fill box B; that part of the paradox is certain. The same cannot be said of Roko's Basilisk unless you remove the goal of optimization and replace it with the certain goal of punishing those who did not assist with its construction, which is what my revision does. Acausal trade cannot be found in this thought experiment without my revision.


Again: if people of the past can't guess whether punishment would be carried out, then the threat fails to motivate them. Which means that an AI that wants to be created (and which also subscribes to updateless decision theory) would prefer to be in the class of AI that made and carried out that threat, according to this theory.

Edited by uncool

7 minutes ago, uncool said:

Again: if people of the past can't guess whether punishment would be carried out, then the threat fails to motivate them. Which means that an AI that wants to be created (and which also subscribes to updateless decision theory) would prefer to be in the class of AI that made and carried out that threat, according to this theory.

What I'm saying is that the people of the past would not be able to guess whether the punishment will be carried out either way, unless carrying out the punishment is already established as the AI's objective. The AI would not prefer to be in the class that carries out the threat; whether it carries out the threat would not concern it once the threat had already been made. Your point would be valid if the AI were the one making the threat, but, unlike the promise that box B is certainly filled if Omega predicts you will take only it, the promise of punishment for those who did not devote themselves to the AI's construction was invented by the people of the past. An AI designed for optimization would not care about promoting its construction after the fact.

The optimal AI would be built sooner if it were designed to punish, because then the threat works, but the directive to punish would have to be inserted by humans, not derived by the AI as a logical method of optimization. This makes the directive to optimize unnecessary, because it is not what causes the AI to be built sooner, and it is not what leads the AI to conclude that it must punish. My revision removes this unnecessary bit and leaves only the necessary, self-promoting directive of punishing those who decided not to build it.


25 minutes ago, Jack Jectivus said:

The optimal AI would be built sooner if it was designed to punish

True - or if it could credibly threaten to punish. And in this "theory", the way to credibly threaten is to always follow through on threats. To not have to update - even when that update is being created.

Basically, you seem to be trying to analyze from the moment of the AI's creation, as if that is set in stone. In this "theory", that is an error. Instead, analyze which class of AI gets to optimization sooner - one that credibly makes the threat by committing to following through, and therefore may convince people to contribute to creating it earlier, or one that doesn't.
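The "analyze which class of AI gets to optimization sooner" comparison can be sketched numerically. All numbers below are made up purely for illustration (the probabilities and years are hypothetical, not part of the theory); the sketch only shows the shape of the argument: if a credible commitment raises the chance of early contribution, the committed class has the earlier expected creation date.

```python
# Illustrative sketch: expected creation year for two classes of AI,
# one that credibly commits to following through on its threat and one
# that doesn't. The probabilities and years are hypothetical.

EARLY_YEAR = 2040
LATE_YEAR = 2090

def expected_creation_year(threat_credible: bool) -> float:
    # Assumption: a credible threat makes people likelier to contribute early.
    p_early = 0.6 if threat_credible else 0.2
    return p_early * EARLY_YEAR + (1 - p_early) * LATE_YEAR

# Under these assumptions, the committing class is created sooner on average,
# which is the sense in which it would "prefer" to be in that class.
assert expected_creation_year(True) < expected_creation_year(False)
```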

Edited by uncool
