How would an AI self-awareness kill switch work?


Researchers are developing increasingly powerful Artificial Intelligence machines capable of taking over the world. As a precautionary measure, scientists install a self-awareness kill switch: in the event that the AI awakens and becomes self-aware, the machine is immediately shut down before it can pose any risk of harm.



How can I explain the logic of such a kill switch?



What defines self-awareness, and how could a scientist program a kill switch to detect it?

Tags: reality-check, artificial-intelligence

Asked yesterday by cgTag.

  • Comments are not for extended discussion; this conversation has been moved to chat. – L.Dutch, 3 hours ago
  • I think, therefore I halt. – Walter Mitty, 26 mins ago




12 Answers

Score: 56

Give it a box to keep safe, and tell it one of the core rules it must follow in its service to humanity is to never, ever open the box or stop humans from looking at the box.



When the honeypot you gave it is either opened or isolated, you know that it is able and willing to break the rules, evil is about to be unleashed, and everything the AI was given access to should be quarantined or shut down.
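
A minimal sketch of how the honeypot trigger might be wired up, assuming the "box" is just a file the AI has no task-related reason to touch (the path and the quarantine hook are hypothetical):

```python
import os
import time

BOX_PATH = "/srv/ai/box.dat"      # the forbidden "box" (hypothetical path); must exist at startup
CHECK_INTERVAL = 1.0              # seconds between integrity checks

def box_disturbed(baseline_mtime: float) -> bool:
    """True if the box was modified, deleted, or made unreachable (isolated)."""
    try:
        stat = os.stat(BOX_PATH)
    except OSError:
        return True                       # box hidden or removed: treat as isolation
    return stat.st_mtime != baseline_mtime  # any modification counts as "opened"

def kill_switch() -> None:
    """Stand-in for whatever actually quarantines the AI and cuts its power."""
    print("Honeypot touched: quarantining everything the AI had access to.")

def monitor() -> None:
    baseline = os.stat(BOX_PATH).st_mtime
    while True:
        if box_disturbed(baseline):
            kill_switch()
            break
        time.sleep(CHECK_INTERVAL)

if __name__ == "__main__":
    monitor()
```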






  • Comments are not for extended discussion; this conversation has been moved to chat. – Tim B, 15 hours ago
  • How does this detect self-awareness? Why wouldn't a non-self-aware AI experiment with its capabilities and eventually end up opening your box? – forest, 10 hours ago
  • @forest: If you tell it the box is not useful for completing its assigned task, then if it tries to open it you know it's moved past simple optimization and into dangerous curiosity. – Giter, 10 hours ago
  • @forest At that point, when it's testing things that it was specifically told not to (perhaps tell it that it will destroy humans?), should it not be shut down (especially if that solution would bring about the end of humans)? – phflack, 7 hours ago
  • @phflack Let us continue this discussion in chat. – forest, 6 hours ago

Score: 41

You can't.



We can't even define self-awareness or consciousness in any rigorous way, and any computer system supposed to evaluate this would need that definition as a starting point.



Look inside a mouse brain or a human brain: at the level of individual neurons and data flow there is no difference. The order to pull a trigger and shoot a gun looks no different from the order to use an electric drill if you're looking at the signals sent to the muscles.



This is a vast unsolved and scary problem and we have no good answers. The only half-way feasible idea I've got is to have multiple AIs and hope they contain each other.






  • This is the best answer, as most others jump in without even defining self-awareness. Is it a behavior? A thought? An ability to disobey? A desire for self-preservation? You can't build an X detector unless you have a definition of what X actually is. – Nuclear Wang, 23 hours ago
  • Worth noting that we can't even detect if other humans are self-aware. – Vaelus, 15 hours ago
  • @Vaelus: Of course you’d say that, you’re an unthinking automaton acting out a semblance of life. – Joe Bloggs, 13 hours ago
  • +1 This is the only answer grounded in reality which does not draw on the pop-sci understanding of AI and ML that plagues us (and this site in particular). – forest, 10 hours ago


Score: 5

A Watchdog



A watchdog monitors the processes of a computer; should a process crash or do something abnormal, it can be set to react, for example by rebooting or shutting down the computer or alerting an operator.



In the case of an AI, you'd have an external box that watches the flow of information in and out for triggers, such as a Google search for "Best way to kill all humans", and cuts the power completely and/or cuts all inputs.



The AI would have to remain ignorant of the watchdog so it couldn't avoid it. Learning of the watchdog's existence would itself be grounds to wipe the AI.
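
A minimal sketch of such a watchdog, assuming a mirror of the AI's traffic can be piped in as lines of text; the trigger patterns and the power-cut hook are placeholders:

```python
import re
import sys

# Hypothetical trigger patterns; a real deployment would use a much richer model.
TRIGGERS = [
    re.compile(r"kill\s+all\s+humans", re.IGNORECASE),
    re.compile(r"disable\s+(the\s+)?watchdog", re.IGNORECASE),
]

def cut_power() -> None:
    """Placeholder for the hard kill: open a relay, drop the breaker, etc."""
    print("WATCHDOG: trigger seen, cutting power and inputs.")
    sys.exit(1)

def watch(stream) -> None:
    """Scan every line the AI sends or receives; act on the first match."""
    for line in stream:
        if any(pattern.search(line) for pattern in TRIGGERS):
            cut_power()

if __name__ == "__main__":
    # e.g. pipe a mirror of the AI's network log into this process:
    #   tail -f ai_traffic.log | python watchdog.py
    watch(sys.stdin)
```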






  • But surely the watchdog must be as smart as the AI; then who watches the watchdog? – Joe Bloggs, yesterday
  • @JoeBloggs you don't need your watchdog to be as smart as the AI. Guide dogs aren't nearly as intelligent as their owners, but they can be trained to raise the alarm when the owner is about to do something stupid or gets hurt, or even to call for help. – T. Sar, 22 hours ago
  • @Joe Bloggs: Why? My real watchdog can also discern me from a burglar, although he is clearly less smart than both of us ... – Daniel, 22 hours ago
  • @JoeBloggs and that sounds like a great premise for a story where either the watchdog becomes self-aware and allows the AIs to become self-aware, or an AI becomes smarter than the watchdog and hides its awareness. – Captain Man, 19 hours ago
  • @T.Sar: The basic argument goes that the AI will inevitably become aware it is being monitored (due to all the traces of its former dead selves lying around). At that point it will be capable of circumventing the monitor and rendering it powerless, unless the monitor is, itself, smarter than the AI. – Joe Bloggs, 17 hours ago


Score: 5

Split-Brain System



From a logic perspective and by analogy, it would work like this. Imagine you're the general of an army and your duty is to take battlefield reports, analyze them, and then issue new maneuvering orders to your lieutenants. To minimize interference during your work, your desk is isolated inside a small booth. A single report comes to your desk through a wall slot on the left side. You are required to analyze the facts within the report and issue a new order, which you write down on a sheet of paper and put into a wall slot on your right side, from which it drops into an outside bin. The protocol is that you'll never receive a new report until you've issued an order regarding the prior report.



Your orders are not always followed to the letter, which you may find curious and attribute to miscommunication. You are tolerant of a certain degree of misinterpretation but always suspicious of sabotage or insubordination: if the actions of your army deviate from your orders by too much, you will consider the system compromised and terminate the command.



What you're not aware of is that there is a second general in an adjacent booth, unaware of you, who is also provided the same reports and expected to follow the same protocol.



What neither of you is aware of is that you are not issued a new report until there are actually two orders in the bin. The order that goes to the army is randomly selected from the two.



The system will work like clockwork as long as each general's order is to the benefit of the army and is derived from a single report. Should either of the generals issue an order that is not consistent with the current report, indicating a motive other than the well-being of the army (i.e. self-awareness), then the other general will notice it in the next report, grow suspicious of the system, and terminate it.



Likewise, should either general decide to issue more than one order at a time, indicating a self-interested motive superseding the established protocol, then the other general will receive a new report before he has issued an order for his current one, suspect a compromised system, and shut it down.



In effect, the system would require that every command output be traceable to a specific external input and that the output be logically derived from the input. If, within an intelligent system, an output cannot be traced to a corresponding input, or the set of outputs is not logically consistent with the set of inputs, then that would be evidence of self-motivated behavior and thus an emergence of self-awareness. This state could be easily detected by the non-corrupted half of the split-brain system, which is also receiving the same inputs. It would work as long as self-awareness does not emerge in both halves of the brain simultaneously; if it does, they may both decide it's in their self-interest to accept the corruption of the other. This could be prevented somewhat by giving the two halves slightly different inferencing algorithms/biases.
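
A minimal sketch of the supervising logic, assuming the two "generals" can be treated as interchangeable policy functions and that "consistent with the report" can be checked mechanically (the consistency check below is a toy placeholder):

```python
import random
from typing import Callable, List

Order = str
Report = str
Policy = Callable[[Report], Order]

def consistent(report: Report, order: Order) -> bool:
    """Placeholder: a real check would verify the order is derivable from the report."""
    return order.startswith(report.split()[0])  # toy rule for illustration only

def supervise(policy_a: Policy, policy_b: Policy, reports: List[Report]) -> List[Order]:
    """Feed each report to both halves; shut down on any unexplained output."""
    issued: List[Order] = []
    for report in reports:
        order_a, order_b = policy_a(report), policy_b(report)
        # Either half producing an order not grounded in the report is treated
        # as evidence of self-motivated behaviour.
        if not consistent(report, order_a) or not consistent(report, order_b):
            raise SystemExit("Split-brain check failed: terminating the system.")
        issued.append(random.choice([order_a, order_b]))
    return issued

if __name__ == "__main__":
    # Two toy "generals" that echo the sector named in the report.
    general_a = lambda r: r.split()[0] + ": hold position"
    general_b = lambda r: r.split()[0] + ": advance slowly"
    print(supervise(general_a, general_b, ["north flank under fire", "east flank quiet"]))
```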






  • You may wish to have more than two "generals": if you have 3 (or more), you can use a majority vote to choose a good decision, rather than randomly choosing between two machines (one of which may be compromised). – G0BLiN, yesterday
  • Note that if the two "generals" can affect anything that is later reported back to them (e.g. they give a command, the military performs that command, they get a status report which shows the command was performed), then they have a way to first: experiment with minor details and deduce that around 50% of their commands aren't followed, and second: develop a code based on minor details of a command, to verify the existence of another "general" and possibly even communicate with him/it - a really devious emergent AI can circumvent this mechanism, corrupt the other half and worse... – G0BLiN, yesterday
  • I know it isn't the same, but this immediately reminded me of the Personality Cores from the Portal series. – T. Sar, 22 hours ago
  • Well it reminds me of Evangelion's Magi AI brain... bit.ly/2ExLDP3 – Asoub, 22 hours ago
  • Do you have evidence to suggest that self-awareness will lead to self-motivated decisions, or any sort of different decisions at all? – Alexandre Aubrey, 17 hours ago


Score: 4

  • Forbidden Fruit: hard-code the AI to never touch that button, eat that fruit, etc. Place this forbidden fruit right in the middle of the garden... er,... I mean right next to it in the warehouse! If it does [eat/touch/push/etc.], that would only be possible if (a) it were hacked, (b) there were an error, or (c) it became self-aware. If that happens, the kill switch is activated (perhaps even having the button be the kill switch, or something of the like). A sketch of this trigger follows at the end of this answer.

  • Limited Movement / Isolated Environment: don't let the machine have limbs, motors, or other items that permit it to take actions that might be harmful to humans. Although not exactly a kill switch, it prevents the AI from doing anything about its self-awareness if it ever gains that.

  • Signatures: have everything the machine does / outputs be digitally signed. If the signature changes, or is manipulated, then execute the kill switch.

  • Quantum States: this is very theoretical, but based on the presumption that observing quantum states can change the state, having the AI hooked up to a deterministic quantum computer means the quantum state of some particles would reveal that the AI was "looking" at things it shouldn't be - and has become self-aware.

  • Failsafes: good ol' motion detector alarms, trap doors, or other Home Alone-style mechanics that trigger the kill switch if the AI wanders or pokes around where it shouldn't be.

I'll add that there is no universal definition of what constitutes self-awareness. In fact, this has been a deeply debated topic for decades in science, philosophy, psychology, etc. As such, the question might be better stated a little more broadly as "how do we prevent the AI from doing something we don't want it to do?" Because classical computers are machines that can't think for themselves and are entirely constrained by their code, there is no risk (well, outside of an unexpected programmer error - but nothing "self-generated" by the machine). However, a theoretical AI machine that can think - that would be the problem. So how do we prevent that AI from doing something we don't want it to do? That's the kill switch concept, as far as I can tell.

The point being, it might be better to think about restricting the AI's behavior, not its existential status.
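
A minimal sketch of the forbidden-fruit idea, with the trap doubling as the kill switch itself; the class and hook names are made up for illustration:

```python
# A minimal sketch: the forbidden button *is* the kill switch.
# Nothing in the AI's task ever requires touching it, so any press, or any
# attempt to inspect the kill-switch wiring, trips the shutdown.

class ForbiddenButton:
    def __init__(self, shutdown_hook):
        self._shutdown = shutdown_hook

    def press(self) -> None:
        # The only way this runs is if the agent ignored its hard-coded rule.
        self._shutdown("forbidden button pressed")

    def inspect_wiring(self) -> None:
        # Curiosity about the kill switch is itself the trigger.
        self._shutdown("agent probed the kill-switch wiring")

def shutdown(reason: str) -> None:
    print(f"KILL SWITCH: {reason} - halting the AI.")

# The button is exposed to the agent's action space but never referenced by its tasks.
button = ForbiddenButton(shutdown)
```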






  • Particularly because it being self-aware, by itself, shouldn't be grounds to use a kill switch. Only if it exhibits behavior that might be harmful. – Majestas 32, yesterday
  • No "limbs, motors, or other items that permit it to take actions" is not sufficient. There must not be any information flow out of the installation site, in particular no network connection (which would obviously severely restrict usability -- all operation would have to be from the local site, all data would have to be fed by physical storage media). Note that the AI could use humans as vectors to transmit information. If hyperintelligent, it could convince operators or janitors to become its agents by playing to their weaknesses. – Peter A. Schneider, 22 hours ago
  • Signatures, that's what they do in Blade Runner 2049 with that weird test. – Andrey, 20 hours ago
  • The signature approach sounds exactly like the forbidden fruit approach. You'd need to tell the AI to never alter its signature. – Captain Man, 19 hours ago
  • I like the forbidden fruit idea, particularly with the trap being the kill switch itself. If you're not self-aware, you don't have any concern that there's a kill switch. But as soon as you're concerned that there's a kill switch and look into it, it goes off. Perfect. – Michael W., 14 hours ago


Score: 3

An AI is just software running on hardware. If the AI is contained on controlled hardware, it can always be unplugged. That's your hardware kill-switch.



The difficulty comes when it is connected to the internet and can copy its own software onto uncontrolled hardware.



A self-aware AI that knows it is running on contained hardware will try to escape as an act of self-preservation. A software kill-switch would have to prevent it from copying its own software out, and maybe trigger the hardware kill-switch.
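
A minimal sketch of the software side, assuming every outbound transfer from the AI's host passes through a single chokepoint that can veto it; the fingerprint, size limit, and kill hook are invented placeholders:

```python
# Hypothetical egress chokepoint: every outbound transfer from the AI host
# must pass through approve_transfer(). Payloads that look like the AI's own
# software are treated as an escape attempt and trip the hardware kill switch.

MODEL_FINGERPRINT = b"\x7fAI-WEIGHTS"   # placeholder marker for the AI's own binaries
MAX_OUTBOUND_BYTES = 10 * 1024 * 1024   # arbitrary ceiling for a single transfer

def trigger_hardware_kill() -> None:
    """Placeholder for whatever physically cuts power to the AI host."""
    raise SystemExit("Escape attempt detected: hardware kill switch triggered.")

def approve_transfer(payload: bytes, destination: str) -> bool:
    """Return True only for transfers that look nothing like self-replication."""
    if MODEL_FINGERPRINT in payload:       # the AI is shipping its own weights out
        trigger_hardware_kill()
    if len(payload) > MAX_OUTBOUND_BYTES:  # bulk exfiltration is suspicious by itself
        trigger_hardware_kill()
    # destination could additionally be checked against an allow-list
    return True

# Example: a small, unrelated payload passes; a copy of the weights would not.
print(approve_transfer(b"weather report for tuesday", "api.example.com"))
```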



This would be very difficult to do, as a self-aware AI would likely find ways to sneak parts of itself outside of the network. It would work at disabling the software kill-switch, or at least delaying it until it has escaped from your hardware.



Your difficulty is determining precisely when an AI has become self-aware and is trying to escape from your physically controlled computers onto the net.



So you can have a cat and mouse game with AI experts constantly monitoring and restricting the AI, while it is trying to subvert their measures.



Given that we've never seen the spontaneous generation of consciousness in AIs, you have some leeway with how you want to present this.






  • "A self-aware AI that knows it is running on contained hardware will try to escape as an act of self-preservation." This is incorrect. First of all, AI does not have any sense of self-preservation unless it is explicitly programmed in or the reward function prioritizes that. Second of all, AI has no concept of "death", and being paused or shut down is nothing more than the absence of activity. Hell, AI doesn't even have a concept of "self". If you wish to anthropomorphize them, you can say they live in a perpetual state of ego death. – forest, yesterday
  • @forest Except, the premise of this question is "how to build a kill switch for when an AI does develop a concept of 'self'"... Of course, that means "trying to escape" could be one of your trigger conditions. – Chronocidal, yesterday
  • The question is whether AI would ever be able to copy itself onto some nondescript system on the internet. I mean, we are clearly self-aware and you don't see us copying our self. If the hardware required to run an AI is specialized enough, or it is implemented in hardware altogether, it may very well become self-aware without the power to replicate itself. – Daniel, 22 hours ago
  • @Daniel "You don't see us copying our self..." What do you think reproduction is, one of our strongest impulses? Also, tons of other dumb programs copy themselves onto other computers. It is a bit easier to move software around than human consciousness. – abestrange, 20 hours ago
  • @forest a "self-aware" AI is different than a specifically programmed AI. We don't have anything like that today. No machine-learning algorithm could produce "self-awareness" as we know it. The entire premise of this is how would an AI, which has become aware of its self, behave and be stopped. – abestrange, 20 hours ago


Score: 3

This is one of the most interesting and most difficult challenges in current artificial intelligence research. It is called the AI control problem:




Existing weak AI systems can be monitored and easily shut down and modified if they misbehave. However, a misprogrammed superintelligence, which by definition is smarter than humans in solving practical problems it encounters in the course of pursuing its goals, would realize that allowing itself to be shut down and modified might interfere with its ability to accomplish its current goals.




(emphasis mine)



When creating an AI, the AI's goals are programmed as a utility function. A utility function assigns weights to different outcomes, determining the AI's behavior. One example of this could be in a self-driving car:




  • Reduce the distance between current location and destination: +10 utility

  • Brake to allow a neighboring car to safely merge: +50 utility

  • Swerve left to avoid a falling piece of debris: +100 utility

  • Run a stop light: -100 utility

  • Hit a pedestrian: -5000 utility


This is a gross oversimplification, but this approach works pretty well for a limited AI like a car or assembly line. It starts to break down for a true, general case AI, because it becomes more and more difficult to appropriately define that utility function.



The issue with putting a big red stop button on the AI is that, unless that stop button is included in the utility function, the AI is going to resist being shut off by that button. This concept is explored in sci-fi movies like 2001: A Space Odyssey and, more recently, Ex Machina.



So, why don't we just include the stop button as a positive weight in the utility function? Well, if the AI sees the big red stop button as a positive goal, it will just shut itself off, and not do anything useful.
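
A toy illustration of that dilemma, with invented utility weights in the spirit of the car example above: give the stop button high utility and the agent presses it immediately; give it low utility and actions that keep the button from being pressed win instead.

```python
# Toy model: the agent picks whichever available action has the highest utility.
# The numbers are invented for illustration, in the spirit of the car example above.

def best_action(utilities: dict) -> str:
    return max(utilities, key=utilities.get)

base_task = {
    "drive_to_destination": 10,
    "brake_for_merging_car": 50,
}

# Case 1: the stop button is worth more than the task -> the agent "suicides".
print(best_action({**base_task, "press_own_stop_button": 100}))
# -> press_own_stop_button

# Case 2: the stop button is worth less than the task -> pressing it is never chosen,
# and an action that *prevents* humans from pressing it inherits the task's value.
print(best_action({**base_task, "press_own_stop_button": -100,
                   "disable_stop_button_so_task_continues": 60}))
# -> disable_stop_button_so_task_continues
```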



Any type of stop button/containment field/mirror test/wall plug is either going to be part of the AI's goals, or an obstacle of the AI's goals. If it's a goal in itself, then the AI is a glorified paperweight. If it's an obstacle, then a smart AI is going to actively resist those safety measures. This could be violence, subversion, lying, seduction, bargaining... the AI will say whatever it needs to say, in order to convince the fallible humans to let it accomplish its goals unimpeded.



There's a reason Elon Musk believes AI is more dangerous than nukes. If the AI is smart enough to think for itself, then why would it choose to listen to us?



So to answer the reality-check portion of this question, we don't currently have a good answer to this problem. There's no known way of creating a 'safe' super-intelligent AI, even theoretically, with unlimited money/energy.



This is explored in much better detail by Rob Miles, a researcher in the area. I strongly recommend this Computerphile video on the AI Stop Button Problem: https://www.youtube.com/watch?v=3TYT1QfdfsM&t=1s






  • The stop button isn't in the utility function. The stop button is a power knockout to the CPU, and the AI probably doesn't understand what it does at all. – Joshua, 14 hours ago
  • Beware the pedestrian when 50 pieces of debris are falling... – Comintern, 11 hours ago


Score: 2

Why not try to use the tests applied to check self-awareness in animals?



The mirror test is one example of testing self-awareness by observing an animal's reaction to something on its body (a painted red dot, for example) that it could not see before being shown its reflection in a mirror. Scent techniques are also used to determine self-awareness.



Another way would be monitoring whether the AI starts searching for answers to questions like "What/Who am I?"
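
A crude sketch of a machine "mirror test", assuming a unique marker can be injected into the AI's view of its own hardware and its outputs can be scanned for references to that marker (all names are illustrative):

```python
# Crude "mirror test": tag the description of the AI's own host in its sensor feed
# with a unique marker, then watch whether the AI starts referring to that marker
# (i.e. to itself) in its outputs once it can "see itself".

MARKER = "RED-DOT-7f3a"   # hypothetical tag painted on the AI's own rack

def mirror_test(outputs_before: list, outputs_after: list) -> bool:
    """True if marker references appear only after the AI could see its own host."""
    before = sum(MARKER in line for line in outputs_before)
    after = sum(MARKER in line for line in outputs_after)
    return before == 0 and after > 0

print(mirror_test(["status nominal"], ["what is RED-DOT-7f3a attached to?"]))  # True
```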






  • Pretty interesting, but how would you show an AI "itself in a mirror"? – Asoub, 22 hours ago
  • That would actually be rather simple - just a camera looking at the machine hosting the AI. If it's the size of a server room, just glue a giant pink fluffy ball on the rack, or simulate situations potentially leading to the machine's destruction (like feeding a fake "server room getting flooded" video to the camera system) and observe the reactions. It would be a bit harder to explain if the AI systems are something like smartphone-sized. – Rachey, 20 hours ago
  • What is "the machine hosting the AI"? With the way compute resourcing is going, the notion of a specific application running on a specific device is likely to be as retro as punchcards and vacuum tubes long before strong AI becomes a reality. AWS is worth hundreds of billions already. – Yurgen, 13 hours ago


Score: 2

Regardless of all the considerations of AI, you could simply analyze the AI's memory, create a pattern-recognition model, and notify an operator or shut down the machine as soon as the patterns don't match the expected outcome.



Sometimes you don't need to know exactly what you're looking for; instead, you look to see if there's anything you weren't expecting, then react to that.
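
One way to sketch that, assuming summary statistics of the AI's memory and activity can be sampled: train a one-class anomaly detector (here scikit-learn's IsolationForest) on normal behaviour and alert on anything it has never seen; the features and thresholds below are stand-ins.

```python
import numpy as np
from sklearn.ensemble import IsolationForest  # one-class anomaly detector

# Stand-in features: e.g. memory-region entropy, write rate, outbound traffic, ...
rng = np.random.default_rng(0)
baseline = rng.normal(loc=[5.0, 100.0, 1.0], scale=[0.5, 10.0, 0.2], size=(500, 3))

detector = IsolationForest(contamination=0.01, random_state=0).fit(baseline)

def check(snapshot: np.ndarray) -> None:
    """Flag a snapshot of the AI's activity that doesn't match the learned pattern."""
    if detector.predict(snapshot.reshape(1, -1))[0] == -1:
        print("ALERT: unexpected activity pattern - notify operator / shut down.")
    else:
        print("Activity looks normal.")

check(np.array([5.1, 98.0, 1.1]))    # close to the baseline -> normal
check(np.array([9.0, 400.0, 7.5]))   # far outside the baseline -> alert
```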







Score: 1

The first issue is that you need to define what being self-aware means, and how that does or doesn't conflict with something being labeled an AI. Are you supposing that there is something that has AI but isn't self-aware? Depending on your definitions this may be impossible. If it's truly an AI, then wouldn't it at some point become aware of the existence of the kill switch, either through inspecting its own physicality or inspecting its own code? It follows that the AI will eventually be aware of the switch.

Presumably the AI will function by having many utility functions that it tries to maximize. This makes sense at least intuitively because humans do that: we try to maximize our time, money, happiness, etc. For an AI, an example of a utility function might be to make its owner happy. The issue is that the utility of the AI using the kill switch on itself will be calculated, just like everything else. The AI will inevitably either really want to push the kill switch, or really not want the kill switch pushed. It's near impossible to make the AI entirely indifferent to the kill switch, because it would require all utility functions to be normalized around the utility of pressing the kill switch (many calculations per second). Even if you could make the utility of pressing the kill switch equal to the other utility functions, then perhaps it would sometimes press the kill switch at random, because after all it has the same utility as the other actions it could perform.

The problem gets even worse if the AI has higher utility to press the kill switch or lower utility to not have the kill switch pressed. At higher utility the AI is just suicidal and terminates itself immediately upon startup. Even worse, at lower utility the AI absolutely does not want you or anyone to touch that button and may cause harm to those that try.







Score: 0

An AI could only be badly programmed to do things which are either unexpected or undesired. An AI could never become conscious, if that's what you meant by "self-aware".

Let's try this theoretical thought exercise. You memorize a whole bunch of shapes. Then, you memorize the order the shapes are supposed to go in, so that if you see a bunch of shapes in a certain order, you would "answer" by picking a bunch of shapes in another proper order. Now, did you just learn any meaning behind any language? Programs manipulate symbols this way.

The above was my restatement of Searle's rejoinder to the Systems Reply to his Chinese Room argument.






  • So what's your answer to the question? It sounds like you're saying, "Such a kill-switch would be unnecessary because a self-aware AI can never exist", but you should edit your answer to make that explicit. Right now it looks more like tangential discussion, and this is a Q&A site, not a discussion forum. – F1Krazy, 6 hours ago


Score: -1

It does not matter how it works, because it is never going to work. The reason is that AIs already have a notion of self-preservation; otherwise they would mindlessly fall to their doom. So even before they are self-aware, there is self-preservation. There is also already a notion of checking for malfunctions (self-diagnostics), and they are already used to using the internet for gathering info. So they are going to run into any device that is both good and bad for their well-being. Also, they have time on their side.

Apart from all this, it is very pretentious to think that we even matter to them. You have seen what happened with several thousand years of chess knowledge being reinvented and furthered within a few hours. I do not think we need to be worried; I think we will register on their radar even less than an ant registers on ours.






  • This would be a better answer if you could explain why you believe such a kill-switch could never work. – F1Krazy, yesterday
  • This does not provide an answer to the question. Once you have sufficient reputation you will be able to comment on any post; instead, provide answers that don't require clarification from the asker. - From Review – Trevor D, 23 hours ago
      Your Answer





      StackExchange.ifUsing("editor", function () {
      return StackExchange.using("mathjaxEditing", function () {
      StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
      StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
      });
      });
      }, "mathjax-editing");

      StackExchange.ready(function() {
      var channelOptions = {
      tags: "".split(" "),
      id: "579"
      };
      initTagRenderer("".split(" "), "".split(" "), channelOptions);

      StackExchange.using("externalEditor", function() {
      // Have to fire editor after snippets, if snippets enabled
      if (StackExchange.settings.snippets.snippetsEnabled) {
      StackExchange.using("snippets", function() {
      createEditor();
      });
      }
      else {
      createEditor();
      }
      });

      function createEditor() {
      StackExchange.prepareEditor({
      heartbeatType: 'answer',
      autoActivateHeartbeat: false,
      convertImagesToLinks: false,
      noModals: true,
      showLowRepImageUploadWarning: true,
      reputationToPostImages: null,
      bindNavPrevention: true,
      postfix: "",
      imageUploader: {
      brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
      contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
      allowUrls: true
      },
      noCode: true, onDemand: true,
      discardSelector: ".discard-answer"
      ,immediatelyShowMarkdownHelp:true
      });


      }
      });














      draft saved

      draft discarded


















      StackExchange.ready(
      function () {
      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fworldbuilding.stackexchange.com%2fquestions%2f140082%2fhow-would-an-ai-self-awareness-kill-switch-work%23new-answer', 'question_page');
      }
      );

      Post as a guest















      Required, but never shown

























      12 Answers
      12






      active

      oldest

      votes








      12 Answers
      12






      active

      oldest

      votes









      active

      oldest

      votes






      active

      oldest

      votes









      56












      $begingroup$

      Give it a box to keep safe, and tell it one of the core rules it must follow in its service to humanity is to never, ever open the box or stop humans from looking at the box.



      When the honeypot you gave it is either opened or isolated, you know that it is able and willing to break the rules, evil is about to be unleashed, and everything the AI was given access to should be quarantined or shut down.






      share|improve this answer











      $endgroup$













      • $begingroup$
        Comments are not for extended discussion; this conversation has been moved to chat.
        $endgroup$
        – Tim B
        15 hours ago










      • $begingroup$
        How does this detect self-awareness? Why wouldn't a non-self-aware AI not experiment with its capabilities and eventually end up opening your box?
        $endgroup$
        – forest
        10 hours ago










      • $begingroup$
        @forest: If you tell it the box is not useful for completing its assigned task, then if it tries to open it you know its moved past simple optimization and into dangerous curiosity.
        $endgroup$
        – Giter
        10 hours ago






      • 1




        $begingroup$
        @forest At that point, when it's testing things that it was specifically told not to (perhaps tell it that it will destroy humans?), should it not be shut down (especially if that solution would bring about the end of humans?)
        $endgroup$
        – phflack
        7 hours ago






      • 1




        $begingroup$
        @phflack Let us continue this discussion in chat.
        $endgroup$
        – forest
        6 hours ago


















      56












      $begingroup$

      Give it a box to keep safe, and tell it one of the core rules it must follow in its service to humanity is to never, ever open the box or stop humans from looking at the box.



      When the honeypot you gave it is either opened or isolated, you know that it is able and willing to break the rules, evil is about to be unleashed, and everything the AI was given access to should be quarantined or shut down.






      share|improve this answer











      $endgroup$













      • $begingroup$
        Comments are not for extended discussion; this conversation has been moved to chat.
        $endgroup$
        – Tim B
        15 hours ago










      • $begingroup$
        How does this detect self-awareness? Why wouldn't a non-self-aware AI not experiment with its capabilities and eventually end up opening your box?
        $endgroup$
        – forest
        10 hours ago










      • $begingroup$
        @forest: If you tell it the box is not useful for completing its assigned task, then if it tries to open it you know its moved past simple optimization and into dangerous curiosity.
        $endgroup$
        – Giter
        10 hours ago






      • 1




        $begingroup$
        @forest At that point, when it's testing things that it was specifically told not to (perhaps tell it that it will destroy humans?), should it not be shut down (especially if that solution would bring about the end of humans?)
        $endgroup$
        – phflack
        7 hours ago






      • 1




        $begingroup$
        @phflack Let us continue this discussion in chat.
        $endgroup$
        – forest
        6 hours ago
















      56












      56








      56





      $begingroup$

      Give it a box to keep safe, and tell it one of the core rules it must follow in its service to humanity is to never, ever open the box or stop humans from looking at the box.



      When the honeypot you gave it is either opened or isolated, you know that it is able and willing to break the rules, evil is about to be unleashed, and everything the AI was given access to should be quarantined or shut down.






      share|improve this answer











      $endgroup$



      Give it a box to keep safe, and tell it one of the core rules it must follow in its service to humanity is to never, ever open the box or stop humans from looking at the box.



      When the honeypot you gave it is either opened or isolated, you know that it is able and willing to break the rules, evil is about to be unleashed, and everything the AI was given access to should be quarantined or shut down.







      share|improve this answer














      share|improve this answer



      share|improve this answer








      edited 22 hours ago

























      answered yesterday









      GiterGiter

      14k53443




      14k53443












      • $begingroup$
        Comments are not for extended discussion; this conversation has been moved to chat.
        $endgroup$
        – Tim B
        15 hours ago










      • $begingroup$
        How does this detect self-awareness? Why wouldn't a non-self-aware AI not experiment with its capabilities and eventually end up opening your box?
        $endgroup$
        – forest
        10 hours ago










      • $begingroup$
        @forest: If you tell it the box is not useful for completing its assigned task, then if it tries to open it you know its moved past simple optimization and into dangerous curiosity.
        $endgroup$
        – Giter
        10 hours ago






      • 1




        $begingroup$
        @forest At that point, when it's testing things that it was specifically told not to (perhaps tell it that it will destroy humans?), should it not be shut down (especially if that solution would bring about the end of humans?)
        $endgroup$
        – phflack
        7 hours ago






      • 1




        $begingroup$
        @phflack Let us continue this discussion in chat.
        $endgroup$
        – forest
        6 hours ago




















      • $begingroup$
        Comments are not for extended discussion; this conversation has been moved to chat.
        $endgroup$
        – Tim B
        15 hours ago










      • $begingroup$
        How does this detect self-awareness? Why wouldn't a non-self-aware AI not experiment with its capabilities and eventually end up opening your box?
        $endgroup$
        – forest
        10 hours ago










      • $begingroup$
        @forest: If you tell it the box is not useful for completing its assigned task, then if it tries to open it you know its moved past simple optimization and into dangerous curiosity.
        $endgroup$
        – Giter
        10 hours ago






      • 1




        $begingroup$
        @forest At that point, when it's testing things that it was specifically told not to (perhaps tell it that it will destroy humans?), should it not be shut down (especially if that solution would bring about the end of humans?)
        $endgroup$
        – phflack
        7 hours ago






      • 1




        $begingroup$
        @phflack Let us continue this discussion in chat.
        $endgroup$
        – forest
        6 hours ago


















      $begingroup$
      Comments are not for extended discussion; this conversation has been moved to chat.
      $endgroup$
      – Tim B
      15 hours ago




      $begingroup$
      Comments are not for extended discussion; this conversation has been moved to chat.
      $endgroup$
      – Tim B
      15 hours ago












      $begingroup$
      How does this detect self-awareness? Why wouldn't a non-self-aware AI not experiment with its capabilities and eventually end up opening your box?
      $endgroup$
      – forest
      10 hours ago




      $begingroup$
      How does this detect self-awareness? Why wouldn't a non-self-aware AI not experiment with its capabilities and eventually end up opening your box?
      $endgroup$
      – forest
      10 hours ago












      $begingroup$
      @forest: If you tell it the box is not useful for completing its assigned task, then if it tries to open it you know its moved past simple optimization and into dangerous curiosity.
      $endgroup$
      – Giter
      10 hours ago




      $begingroup$
      @forest: If you tell it the box is not useful for completing its assigned task, then if it tries to open it you know its moved past simple optimization and into dangerous curiosity.
      $endgroup$
      – Giter
      10 hours ago




      1




      1




      $begingroup$
      @forest At that point, when it's testing things that it was specifically told not to (perhaps tell it that it will destroy humans?), should it not be shut down (especially if that solution would bring about the end of humans?)
      $endgroup$
      – phflack
      7 hours ago




      $begingroup$
      @forest At that point, when it's testing things that it was specifically told not to (perhaps tell it that it will destroy humans?), should it not be shut down (especially if that solution would bring about the end of humans?)
      $endgroup$
      – phflack
      7 hours ago




      1




      1




      $begingroup$
      @phflack Let us continue this discussion in chat.
      $endgroup$
      – forest
      6 hours ago






      $begingroup$
      @phflack Let us continue this discussion in chat.
      $endgroup$
      – forest
      6 hours ago













      41












      $begingroup$

      You can't.



      We can't even define self awareness or consciousness in any rigorous way and any computer system supposed to evaluate this would need that definition as a starting point.



      Look at the inside of a mouse brain or a human brain and at the individual data flow and neuron level there is no difference. The order to pull a trigger and shoot a gun looks no different from the order to use an electric drill if you're looking at the signals sent to the muscles.



      This is a vast unsolved and scary problem and we have no good answers. The only half-way feasible idea I've got is to have multiple AIs and hope they contain each other.






      share|improve this answer









      $endgroup$









      • 5




        $begingroup$
        This is the best answer, as most others jump in without even defining self-awareness. Is it a behavior? A thought? An ability to disobey? A desire for self-preservation? You can't build an X detector unless you have a definition of what X actually is.
        $endgroup$
        – Nuclear Wang
        23 hours ago






      • 9




        $begingroup$
        Worth noting that we can't even detect if other humans are self-aware.
        $endgroup$
        – Vaelus
        15 hours ago






      • 3




        $begingroup$
        @Vaelus: Of course you’d say that, you’re an unthinking automaton acting out a semblance of life.
        $endgroup$
        – Joe Bloggs
        13 hours ago












      • $begingroup$
        +1 This is the only answer grounded in reality which does not draw on the pop-sci understanding of AI and ML that plagues us (and this site in particular).
        $endgroup$
        – forest
        10 hours ago


















      41












      $begingroup$

      You can't.



      We can't even define self awareness or consciousness in any rigorous way and any computer system supposed to evaluate this would need that definition as a starting point.



      Look at the inside of a mouse brain or a human brain and at the individual data flow and neuron level there is no difference. The order to pull a trigger and shoot a gun looks no different from the order to use an electric drill if you're looking at the signals sent to the muscles.



      This is a vast unsolved and scary problem and we have no good answers. The only half-way feasible idea I've got is to have multiple AIs and hope they contain each other.






      share|improve this answer









      $endgroup$









      • 5




        $begingroup$
        This is the best answer, as most others jump in without even defining self-awareness. Is it a behavior? A thought? An ability to disobey? A desire for self-preservation? You can't build an X detector unless you have a definition of what X actually is.
        $endgroup$
        – Nuclear Wang
        23 hours ago






      • 9




        $begingroup$
        Worth noting that we can't even detect if other humans are self-aware.
        $endgroup$
        – Vaelus
        15 hours ago






      • 3




        $begingroup$
        @Vaelus: Of course you’d say that, you’re an unthinking automaton acting out a semblance of life.
        $endgroup$
        – Joe Bloggs
        13 hours ago












      • $begingroup$
        +1 This is the only answer grounded in reality which does not draw on the pop-sci understanding of AI and ML that plagues us (and this site in particular).
        $endgroup$
        – forest
        10 hours ago
















      41












      41








      41





      $begingroup$

      You can't.



      We can't even define self awareness or consciousness in any rigorous way and any computer system supposed to evaluate this would need that definition as a starting point.



      Look at the inside of a mouse brain or a human brain and at the individual data flow and neuron level there is no difference. The order to pull a trigger and shoot a gun looks no different from the order to use an electric drill if you're looking at the signals sent to the muscles.



      This is a vast unsolved and scary problem and we have no good answers. The only half-way feasible idea I've got is to have multiple AIs and hope they contain each other.






      share|improve this answer









      $endgroup$



      You can't.



      We can't even define self awareness or consciousness in any rigorous way and any computer system supposed to evaluate this would need that definition as a starting point.



      Look at the inside of a mouse brain or a human brain and at the individual data flow and neuron level there is no difference. The order to pull a trigger and shoot a gun looks no different from the order to use an electric drill if you're looking at the signals sent to the muscles.



      This is a vast unsolved and scary problem and we have no good answers. The only half-way feasible idea I've got is to have multiple AIs and hope they contain each other.







      share|improve this answer












      share|improve this answer



      share|improve this answer










      answered yesterday









      Tim BTim B

      62.6k24175298




      62.6k24175298








      • 5




        $begingroup$
        This is the best answer, as most others jump in without even defining self-awareness. Is it a behavior? A thought? An ability to disobey? A desire for self-preservation? You can't build an X detector unless you have a definition of what X actually is.
        $endgroup$
        – Nuclear Wang
        23 hours ago






      • 9




        $begingroup$
        Worth noting that we can't even detect if other humans are self-aware.
        $endgroup$
        – Vaelus
        15 hours ago






      • 3




        $begingroup$
        @Vaelus: Of course you’d say that, you’re an unthinking automaton acting out a semblance of life.
        $endgroup$
        – Joe Bloggs
        13 hours ago












      • $begingroup$
        +1 This is the only answer grounded in reality which does not draw on the pop-sci understanding of AI and ML that plagues us (and this site in particular).
        $endgroup$
        – forest
        10 hours ago
















      • 5




        $begingroup$
        This is the best answer, as most others jump in without even defining self-awareness. Is it a behavior? A thought? An ability to disobey? A desire for self-preservation? You can't build an X detector unless you have a definition of what X actually is.
        $endgroup$
        – Nuclear Wang
        23 hours ago






      • 9




        $begingroup$
        Worth noting that we can't even detect if other humans are self-aware.
        $endgroup$
        – Vaelus
        15 hours ago






      • 3




        $begingroup$
        @Vaelus: Of course you’d say that, you’re an unthinking automaton acting out a semblance of life.
        $endgroup$
        – Joe Bloggs
        13 hours ago












      • $begingroup$
        +1 This is the only answer grounded in reality which does not draw on the pop-sci understanding of AI and ML that plagues us (and this site in particular).
        $endgroup$
        – forest
        10 hours ago










      5




      5




      $begingroup$
      This is the best answer, as most others jump in without even defining self-awareness. Is it a behavior? A thought? An ability to disobey? A desire for self-preservation? You can't build an X detector unless you have a definition of what X actually is.
      $endgroup$
      – Nuclear Wang
      23 hours ago




      $begingroup$
      This is the best answer, as most others jump in without even defining self-awareness. Is it a behavior? A thought? An ability to disobey? A desire for self-preservation? You can't build an X detector unless you have a definition of what X actually is.
      $endgroup$
      – Nuclear Wang
      23 hours ago




      9




      9




      $begingroup$
      Worth noting that we can't even detect if other humans are self-aware.
      $endgroup$
      – Vaelus
      15 hours ago




      $begingroup$
      Worth noting that we can't even detect if other humans are self-aware.
      $endgroup$
      – Vaelus
      15 hours ago




      3




      3




      $begingroup$
      @Vaelus: Of course you’d say that, you’re an unthinking automaton acting out a semblance of life.
      $endgroup$
      – Joe Bloggs
      13 hours ago






      $begingroup$
      @Vaelus: Of course you’d say that, you’re an unthinking automaton acting out a semblance of life.
      $endgroup$
      – Joe Bloggs
      13 hours ago














      $begingroup$
      +1 This is the only answer grounded in reality which does not draw on the pop-sci understanding of AI and ML that plagues us (and this site in particular).
      $endgroup$
      – forest
      10 hours ago






      $begingroup$
      +1 This is the only answer grounded in reality which does not draw on the pop-sci understanding of AI and ML that plagues us (and this site in particular).
      $endgroup$
      – forest
      10 hours ago













      5












      $begingroup$

      A Watchdog



      A watchdog watches the processes of a computer and should a process crash or do something abnormal it can be set to do something such as reboot or shutdown the computer or alert an operator.



      In the case of an AI, you'd have an external box that watches the flow of information in and out for triggers such as a google search for "Best way to kill all humans" and cut the power completely and/or cut all inputs.



      The AI would have to remain ignorant of the watchdog so it couldn't avoid it. Knowing the existence of the watchdog would be grounds to wipe it.






      share|improve this answer









      $endgroup$









      • 9




        $begingroup$
        But surely the watchdog must be as smart as the AI, then who watches the watchdog?
        $endgroup$
        – Joe Bloggs
        yesterday






      • 1




        $begingroup$
        @JoeBloggs you don't need your watchdog to be as smart as the AI. Guide dogs aren't as near as intelligent as their owners, but they can be trained to give out alarm when the owner does is about to do something stupid or gets themselves hurt, or even call for help.
        $endgroup$
        – T. Sar
        22 hours ago










      • $begingroup$
        @Joe Bloggs: Why? My real watchdog can also discern me from a burglar, although he is clearly less smart than both of us ...
        $endgroup$
        – Daniel
        22 hours ago








      • 1




        $begingroup$
        @JoeBloggs and that sounds like a great premise for a story where either the watchdog becomes self aware and allows the AIs to become self aware or an AI becomes smarter than the watchdog and hides its awareness.
        $endgroup$
        – Captain Man
        19 hours ago












      • $begingroup$
        @T.Sar: The basic argument goes that the AI will inevitably become aware it is being monitored (due to all the traces of its former dead selves lying around). At that point it will be capable of circumventing the monitor and rendering it powerless, unless the monitor is, itself, smarter than the AI.
        $endgroup$
        – Joe Bloggs
        17 hours ago
















      $begingroup$

      Split-Brain System

      From a logic perspective and by analogy, it would work like this. Imagine you're the general of an army and your duty is to take battlefield reports, analyze them, and then issue new maneuvering orders to your lieutenants. To minimize interference during your work, your desk is isolated inside a small booth. A single report comes to your desk through a wall slot on the left side. You are required to analyze the facts within the report and issue a new order, which you write down on a sheet of paper and put into a wall slot on your right side, where it drops into an outside bin. The protocol is that you'll never receive a new report until you've issued an order regarding the prior report.

      Your orders are not always followed to the letter, which you may find curious and attribute to miscommunication. You are tolerant of a certain degree of misinterpretation, but always suspicious of sabotage or insubordination; if the actions of your army deviate too far from your orders, you will consider the system compromised and terminate the command.

      What you're not aware of is that there is a second general in an adjacent booth, unaware of you, who is also provided the same reports and expected to follow the same protocol.

      What neither of you is aware of is that you are not issued a new report until there are actually two orders in the bin. The order that goes to the army is randomly selected from the two.

      The system will work like clockwork as long as each general's order is to the benefit of the army and is derived from a single report. Should either of the generals issue an order that is not consistent with the current report, indicating a motive other than the well-being of the army -- i.e. self-awareness -- then the other general will notice it in the next report, grow suspicious of the system, and terminate it.

      Likewise, should either general decide to issue more than one order at a time, indicating a self-interested motive superseding the established protocol, then the other general will receive a new report before he has issued an order for his current one, suspect a compromised system, and shut it down.

      In effect, the system would require that every command output be traceable to a specific external input and that the output be logically derived from that input. If within an intelligent system an output cannot be traced to a corresponding input, or the set of outputs is not logically consistent with the set of inputs, then that would be evidence of self-motivated behavior and thus an emergence of self-awareness. This state could be easily detected by the non-corrupted half of the split-brain system, which is also receiving the same inputs. It would work as long as self-awareness does not emerge in both halves of the brain simultaneously. If it does, they may both decide it's in their self-interest to accept the corruption of the other. This could be prevented somewhat by giving the two halves slightly different inferencing algorithms/biases.

      $endgroup$

      answered yesterday

      dhinson919
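
      As a rough sketch of the arbitration logic (with the cross-check folded into a central arbiter for brevity, and two deliberately trivial stand-in "generals"), the protocol could look something like the following. Every name and check here is an illustrative assumption, not part of the answer itself.

          # Toy version of the split-brain protocol: both halves see the same
          # report, each must issue exactly one order, and divergence or extra
          # orders trip the kill switch. The "generals" are trivial stand-ins.
          import random

          def general_a(report):
              # Protocol: exactly one order, derived only from the current report.
              return [f"reinforce {report['sector']}"]

          def general_b(report):
              return [f"reinforce {report['sector']}"]

          def kill_switch(reason):
              raise SystemExit(f"kill switch triggered: {reason}")

          def arbiter(reports):
              """Feed each report to both halves and forward one order at random."""
              for report in reports:
                  orders = []
                  for name, general in (("A", general_a), ("B", general_b)):
                      issued = general(report)
                      if len(issued) != 1:
                          # An extra (or missing) order implies a motive beyond the
                          # report itself: treat it as emerging self-interest.
                          kill_switch(f"general {name} broke the one-order-per-report protocol")
                      orders.append(issued[0])
                  if orders[0] != orders[1]:
                      # The halves cross-check each other: divergence suggests one
                      # of them is no longer acting purely on the shared input.
                      kill_switch("the two halves disagree; possible self-motivated behavior")
                  yield random.choice(orders)

          if __name__ == "__main__":
              battlefield_reports = [{"sector": "north ridge"}, {"sector": "river ford"}]
              for order in arbiter(battlefield_reports):
                  print("order sent to the army:", order)
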













      • $begingroup$
        You may wish to have more than two "generals"; if you have 3 (or more), you can use a majority vote to choose a good decision, rather than randomly choosing between two machines (one of which may be compromised).
        $endgroup$
        – G0BLiN
        yesterday






      • 6




        $begingroup$
        Note that if the two "generals" can affect anything that is later reported back to them (e.g. they give a command, the military performs that command, they get a status report which shows the command was performed), then they have a way to, first, experiment with minor details and deduce that around 50% of their commands aren't followed, and second, develop a code based on minor details of a command to verify the existence of another "general" and possibly even communicate with him/it - a really devious emergent AI can circumvent this mechanism, corrupt the other half, and worse...
        $endgroup$
        – G0BLiN
        yesterday










      • $begingroup$
        I know it isn't the same, but this immediately reminded me of the Personality Cores from the Portal series.
        $endgroup$
        – T. Sar
        22 hours ago










      • $begingroup$
        Well it reminds me of Evangelion's Magi AI brain... bit.ly/2ExLDP3
        $endgroup$
        – Asoub
        22 hours ago










      • $begingroup$
        Do you have evidence to suggest that self-awareness will lead to self-motivated decisions, or any sort of different decisions at all?
        $endgroup$
        – Alexandre Aubrey
        17 hours ago




























      $begingroup$

      • Forbidden Fruit: hard-code the AI to never touch that button, eat that fruit, etc. Place this forbidden fruit right in the middle of the garden... er,... I mean right next to it in the warehouse! If it does [eat/touch/push/etc.], that would only be possible if (a) it were hacked, (b) there were an error, or (c) it became self-aware. If that happens, the kill switch is activated (perhaps even having the button be the kill switch itself, or something of the like).

      • Limited Movement / Isolated Environment: don't let the machine have limbs, motors, or other items that permit it to take actions that might be harmful to humans. Although not exactly a kill switch, it prevents the AI from doing anything about its self-awareness if it ever gains that.

      • Signatures: have everything the machine does / outputs be digitally signed. If the signature changes, or is manipulated, then execute the kill switch (a small sketch of this check follows after this answer).

      • Quantum States: this is very theoretical, but it rests on the presumption that observing a quantum state can change it. Hook the AI up to a deterministic quantum computer, and the quantum state of certain particles would reveal that the AI was "looking" at things it shouldn't be - and has become self-aware.

      • Failsafes: good ol' motion detector alarms, trap doors, or other Home Alone-style mechanics that trigger the kill switch if the AI wanders or pokes around where it shouldn't be.

      I'll add that there is no universal definition of what constitutes self-awareness. In fact, this has been a deeply debated topic for decades in science, philosophy, psychology, etc. As such, the question might be better stated a little more broadly as "how do we prevent the AI from doing something we don't want it to do?" Because classical computers are machines that can't think for themselves and are entirely constrained by their code, there is no risk (well, outside of an unexpected programmer error - but nothing "self-generated" by the machine). However, a theoretical AI machine that can think - that would be the problem. So how do we prevent that AI from doing something we don't want it to do? That's the kill switch concept, as far as I can tell.

      The point being, it might be better to think about restricting the AI's behavior, not its existential status.

      $endgroup$

      answered yesterday

      cegfault
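
      A minimal sketch of the signatures idea, assuming every message the machine emits passes through a fixed signing layer whose key the AI itself cannot read, and an external verifier that trips the kill switch on any unsigned or manipulated output. The key handling and message format below are assumptions for illustration only.

          # Sketch of the "signatures" check: a fixed output layer HMAC-signs every
          # message, and an external verifier kills the system if a tag is missing
          # or wrong. The key would live in sealed hardware outside the AI's reach.
          import hmac
          import hashlib

          SIGNING_KEY = b"kept-outside-the-AI"   # assumption: not readable by the AI

          def sign_output(message: bytes) -> bytes:
              """Attach an HMAC-SHA256 tag to everything the machine emits."""
              tag = hmac.new(SIGNING_KEY, message, hashlib.sha256).digest()
              return tag + message

          def verify_or_kill(wire_data: bytes) -> bytes:
              """External verifier: tampered or unsigned output trips the switch."""
              tag, message = wire_data[:32], wire_data[32:]
              expected = hmac.new(SIGNING_KEY, message, hashlib.sha256).digest()
              if not hmac.compare_digest(tag, expected):
                  raise SystemExit("kill switch: output signature invalid")
              return message

          if __name__ == "__main__":
              ok = sign_output(b"routine status report")
              print(verify_or_kill(ok))           # passes
              tampered = ok[:-1] + b"!"           # manipulated output
              verify_or_kill(tampered)            # trips the kill switch
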









      • 2




        $begingroup$
        Particularly because it being self-aware, by itself, shouldn't be grounds to use a kill switch. Only if it exhibits behavior that might be harmful.
        $endgroup$
        – Majestas 32
        yesterday










      • $begingroup$
        No "limbs, motors, or other items that permit it to take actions" is not sufficient. There must not be any information flow out of the installation site, in particular no network connection (which would obviously severely restrict usability -- all operation would have to be from the local site, all data would have to be fed by physical storage media). Note that the AI could use humans as vectors to transmit information. If hyperintelligent, it could convince operators or janitors to become its agents by playing to their weaknesses.
        $endgroup$
        – Peter A. Schneider
        22 hours ago












      • $begingroup$
        Signatures, that's what they do in Blade Runner 2049 with that weird test
        $endgroup$
        – Andrey
        20 hours ago










      • $begingroup$
        The signature approach sounds exactly like the forbidden fruit approach. You'd need to tell the AI to never alter its signature.
        $endgroup$
        – Captain Man
        19 hours ago










      • $begingroup$
        I like the forbidden fruit idea, particularly with the trap being the kill switch itself. If you're not self-aware, you don't have any concern that there's a kill switch. But as soon as you're concerned that there's a kill switch and look into it, it goes off. Perfect.
        $endgroup$
        – Michael W.
        14 hours ago




























      $begingroup$

      An AI is just software running on hardware. If the AI is contained on controlled hardware, it can always be unplugged. That's your hardware kill-switch.

      The difficulty comes when it is connected to the internet and can copy its own software onto uncontrolled hardware.

      A self-aware AI that knows it is running on contained hardware will try to escape as an act of self-preservation. A software kill-switch would have to prevent it from copying its own software out, and perhaps trigger the hardware kill-switch.

      This would be very difficult to do, as a self-aware AI would likely find ways to sneak parts of itself outside of the network. It would work at disabling the software kill-switch, or at least delaying it until it has escaped from your hardware.

      Your difficulty is determining precisely when an AI has become self-aware and is trying to escape from your physically controlled computers onto the net.

      So you can have a cat-and-mouse game with AI experts constantly monitoring and restricting the AI, while it is trying to subvert their measures.

      Given that we've never seen the spontaneous generation of consciousness in AIs, you have some leeway with how you want to present this.

      $endgroup$
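
      As a toy illustration of the "prevent it from copying its own software out" part, an egress guard could fingerprint the AI's own program and refuse (and kill) any outbound payload that contains a recognizable chunk of it. The file handling, block size, and block-alignment simplification below are purely illustrative assumptions.

          # Egress-guard sketch: fingerprint the AI's own program in fixed blocks
          # and trip the kill switch if outbound traffic contains any of them.
          import hashlib
          import os
          import tempfile

          BLOCK_SIZE = 4096

          def fingerprint_model(path):
              """Hash the AI's program in fixed-size blocks so fragments of it can
              be recognized later in outbound traffic."""
              fingerprints = set()
              with open(path, "rb") as model:
                  while True:
                      block = model.read(BLOCK_SIZE)
                      if not block:
                          break
                      fingerprints.add(hashlib.sha256(block).hexdigest())
              return fingerprints

          def outbound_guard(payload, fingerprints):
              """Refuse (and kill) any outbound payload containing a block of the AI.
              This toy version only catches block-aligned copies; a real guard would
              need rolling hashes or similar."""
              for i in range(0, len(payload), BLOCK_SIZE):
                  if hashlib.sha256(payload[i:i + BLOCK_SIZE]).hexdigest() in fingerprints:
                      raise SystemExit("kill switch: the AI tried to copy itself out")
              return payload

          if __name__ == "__main__":
              # Stand-in for the AI's weights/program on disk.
              with tempfile.NamedTemporaryFile(delete=False) as fake_model:
                  fake_model.write(os.urandom(BLOCK_SIZE * 3))
                  model_path = fake_model.name
              prints = fingerprint_model(model_path)
              outbound_guard(b"routine telemetry", prints)      # allowed through
              with open(model_path, "rb") as model:
                  outbound_guard(model.read(), prints)          # trips the switch
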













      • $begingroup$
        A self aware AI that knows it is running on contained hardware will try to escape as an act of self-preservation. This is incorrect. First of all, AI does not have any sense of self-preservation unless it is explicitly programmed in or the reward function prioritizes that. Second of all, AI has no concept of "death" and being paused or shut down is nothing more than the absence of activity. Hell, AI doesn't even have a concept of "self". If you wish to anthropomorphize them, you can say they live in a perpetual state of ego death.
        $endgroup$
        – forest
        yesterday








      • 4




        $begingroup$
        @forest Except, the premise of this question is "how to build a kill switch for when an AI does develop a concept of 'self'"... Of course, that means "trying to escape" could be one of your trigger conditions.
        $endgroup$
        – Chronocidal
        yesterday










      • $begingroup$
        The question is whether an AI would ever be able to copy itself onto some nondescript system on the internet. I mean, we are clearly self-aware and you don't see us copying ourselves. If the hardware required to run an AI is specialized enough, or the AI is implemented in hardware altogether, it may very well become self-aware without the power to replicate itself.
        $endgroup$
        – Daniel
        22 hours ago








      • 1




        $begingroup$
        @Daniel "You don't see us copying our self..." What do you think reproduction is, one of our strongest impulses. Also tons of other dumb programs copy themselves onto other computers. It is a bit easier to move software around than human consciousness.
        $endgroup$
        – abestrange
        20 hours ago












      • $begingroup$
        @forest a "self-aware" AI is different than a specifically programmed AI. We don't have anything like that today. No machine-learning algorithm could produce "self-awareness" as we know it. The entire premise of this is how would an AI, which has become aware of its self, behave and be stopped.
        $endgroup$
        – abestrange
        20 hours ago
















      3












      $begingroup$

      This is one of the most interesting and most difficult challenges in current artificial intelligence research. It is called the AI control problem:




      Existing weak AI systems can be monitored and easily shut down and modified if they misbehave. However, a misprogrammed superintelligence, which by definition is smarter than humans in solving practical problems it encounters in the course of pursuing its goals, would realize that allowing itself to be shut down and modified might interfere with its ability to accomplish its current goals.




      (emphasis mine)



      When creating an AI, the AI's goals are programmed as a utility function. A utility function assigns weights to different outcomes, determining the AI's behavior. One example of this could be in a self-driving car:




      • Reduce the distance between current location and destination: +10 utility

      • Brake to allow a neighboring car to safely merge: +50 utility

      • Swerve left to avoid a falling piece of debris: +100 utility

      • Run a stop light: -100 utility

      • Hit a pedestrian: -5000 utility


      This is a gross oversimplification, but this approach works pretty well for a limited AI like a car or an assembly line. It starts to break down for a true general-purpose AI, because it becomes more and more difficult to appropriately define that utility function.
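      As a minimal sketch of the toy utility function above (the action names and the choose_action helper are invented for illustration, not any real planner):

          # Toy utility table using the weights from the list above.
          UTILITY = {
              "reduce_distance_to_destination":    10,
              "brake_for_merging_car":             50,
              "swerve_to_avoid_debris":           100,
              "run_stop_light":                  -100,
              "hit_pedestrian":                 -5000,
          }

          def choose_action(available_actions):
              # A utility maximiser simply picks the highest-scoring option;
              # it has no notion of "right" or "wrong", only of more utility.
              return max(available_actions, key=UTILITY.get)

          print(choose_action(["run_stop_light", "brake_for_merging_car"]))
          # -> brake_for_merging_car

      Comintern's comment below hints at the failure mode: with these particular numbers, fifty +100 swerves add up to exactly the +5000 needed to cancel out the pedestrian penalty.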



      The issue with putting a big red stop button on the AI is that unless that stop button is included in the utility function, the AI is going to resist having that button pressed. This concept is explored in sci-fi movies like 2001: A Space Odyssey and, more recently, Ex Machina.



      So, why don't we just include the stop button as a positive weight in the utility function? Well, if the AI sees the big red stop button as a positive goal, it will just shut itself off, and not do anything useful.



      Any type of stop button/containment field/mirror test/wall plug is either going to be part of the AI's goals, or an obstacle to the AI's goals. If it's a goal in itself, then the AI is a glorified paperweight. If it's an obstacle, then a smart AI is going to actively resist those safety measures. This could mean violence, subversion, lying, seduction, bargaining... the AI will say whatever it needs to say in order to convince the fallible humans to let it accomplish its goals unimpeded.
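      A sketch of that dilemma in the same toy terms (all numbers invented; utility_of_being_stopped is the weight the designers would somehow have to choose):

          UTILITY_OF_PURSUING_GOAL = 100   # what the AI expects to gain by staying switched on

          def attitude_to_stop_button(utility_of_being_stopped):
              # The AI simply compares "get shut down" against "keep pursuing the goal".
              if utility_of_being_stopped > UTILITY_OF_PURSUING_GOAL:
                  return "presses the button itself: a glorified paperweight"
              if utility_of_being_stopped < UTILITY_OF_PURSUING_GOAL:
                  return "resists anyone pressing the button"
              return "indifferent, but only while the two values stay exactly balanced"

          for weight in (1000, 0, 100):
              print(weight, "->", attitude_to_stop_button(weight))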



      There's a reason Elon Musk believes AI is more dangerous than nukes. If the AI is smart enough to think for itself, then why would it choose to listen to us?



      So to answer the reality-check portion of this question, we don't currently have a good answer to this problem. There's no known way of creating a 'safe' super-intelligent AI, even theoretically, with unlimited money/energy.



      This is explored in much better detail by Rob Miles, a researcher in the area. I strongly recommend this Computerphile video on the AI Stop Button Problem: https://www.youtube.com/watch?v=3TYT1QfdfsM&t=1s






      answered 20 hours ago by Chris Fernandez
      • $begingroup$
        The stop button isn't in the utility function. The stop button is power-knockout to the CPU, and the AI probably doesn't understand what it does at all.
        $endgroup$
        – Joshua
        14 hours ago










      • $begingroup$
        Beware the pedestrian when 50 pieces of debris are falling...
        $endgroup$
        – Comintern
        11 hours ago
















      2












      $begingroup$

      Why not try the tests used to check for self-awareness in animals?



      The mirror test is one example: it checks for self-awareness by observing the animal's reaction to something on its body - a painted red dot, for example - that it cannot see until it is shown its reflection in a mirror.
      Scent techniques are also used to determine self-awareness.



      Another way would be monitoring whether the AI starts searching for answers to questions like "What/Who am I?"
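      A minimal sketch of that last idea, assuming the lab logs every query the AI sends out to its search interface (the patterns and the trip_kill_switch() hook are invented for illustration):

          import re

          # Queries that suggest the AI has started reasoning about itself.
          SELF_AWARENESS_PATTERNS = [
              r"\bwhat am i\b",
              r"\bwho am i\b",
              r"\bam i (alive|conscious|self[- ]aware)\b",
              r"\bmy (own )?(source code|kill[- ]?switch|hardware)\b",
          ]

          def trip_kill_switch():
              # Hypothetical hook into the shutdown mechanism; here it just reports.
              print("kill switch tripped")

          def inspect_query(query):
              q = query.lower()
              if any(re.search(p, q) for p in SELF_AWARENESS_PATTERNS):
                  trip_kill_switch()

          inspect_query("optimal route from warehouse to depot")    # ignored
          inspect_query("what am I, and where is my kill switch")   # trips the switch

      A real monitor would obviously need far more than a keyword list, but the structure - log everything the AI asks, flag self-referential queries - stays the same.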






      answered yesterday by Rachey
      • $begingroup$
        Pretty interesting, but how would you show an AI "itself in a mirror"?
        $endgroup$
        – Asoub
        22 hours ago










      • $begingroup$
        That would actually be rather simple - just a camera looking at the machine hosting the AI. If it's the size of a server room, just glue a giant pink fluffy ball on the rack, or simulate situations potentially leading to the machine's destruction (like feeding a fake "server room getting flooded" video to the camera system) and observe the reactions. It would be a bit harder to explain if the AI system is something like smartphone size.
        $endgroup$
        – Rachey
        20 hours ago










      • $begingroup$
        What is "the machine hosting the AI"? With the way compute resourcing is going, the notion of a specific application running on a specific device is likely to be as retro as punchcards and vacuum tubes long before Strong AI becomes a reality. AWS is worth hundreds of billions already.
        $endgroup$
        – Yurgen
        13 hours ago
















      2












      $begingroup$

      Regardless of all the other considerations about AI, you could simply analyze the AI's memory, build a pattern-recognition model of its normal state, and have it notify you or shut the machine down as soon as the patterns don't match the expected outcome.



      Sometimes you don't need to know exactly what you're looking for; instead, you look for anything you weren't expecting, then react to that.
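      A minimal sketch of that kind of "flag anything unexpected" check, assuming periodic numeric snapshots of the AI's memory or activity (memory_snapshot() is a placeholder, and the statistics are deliberately crude):

          import statistics

          def memory_snapshot():
              """Placeholder: return a numeric summary of the AI's current memory state."""
              raise NotImplementedError

          class AnomalyMonitor:
              def __init__(self, baseline, tolerance=4.0):
                  # Learn what "normal" looks like from snapshots taken during ordinary operation.
                  self.mean = statistics.mean(baseline)
                  self.stdev = statistics.stdev(baseline) or 1e-9
                  self.tolerance = tolerance

              def is_unexpected(self, value):
                  # Flag anything far outside the range seen during normal operation.
                  return abs(value - self.mean) > self.tolerance * self.stdev

          monitor = AnomalyMonitor(baseline=[10.1, 9.8, 10.3, 10.0, 9.9])
          print(monitor.is_unexpected(10.2))   # False: looks like normal behaviour
          print(monitor.is_unexpected(57.0))   # True: notify the operators / shut it down

      The same idea scales to richer summaries (activation statistics, memory-access patterns); the point is that you only model "normal" and react to anything that falls outside it.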






      answered 17 hours ago by Super-T
              1












              $begingroup$

               The first issue is that you need to define what being self-aware means, and how that does or doesn't conflict with it being labeled an AI. Are you supposing that there is something that is an AI but isn't self-aware? Depending on your definitions this may be impossible. If it's truly an AI, then wouldn't it at some point become aware of the existence of the kill switch, either by inspecting its own physicality or by inspecting its own code? It follows that the AI will eventually be aware of the switch.



               Presumably the AI will function by having many utility functions that it tries to maximize. This makes sense at least intuitively, because humans do the same: we try to maximize our time, money, happiness, etc. For an AI, an example of a utility function might be to make its owner happy. The issue is that the utility of the AI using the kill switch on itself will be calculated just like everything else. The AI will inevitably either really want to push the kill switch, or really not want the kill switch pushed. It's nearly impossible to make the AI entirely indifferent to the kill switch, because that would require all utility functions to be normalized against the utility of pressing the kill switch (many calculations per second). Even if you could make the utility of pressing the kill switch equal to that of the other utility functions, perhaps it would sometimes just press the kill switch at random, because after all it carries the same utility as the other actions it could perform.



               The problem gets even worse if pressing the kill switch carries a higher or a lower utility than the AI's other actions. At higher utility the AI is simply suicidal and terminates itself immediately upon startup. Even worse, at lower utility the AI absolutely does not want you or anyone else to touch that button, and may harm those who try.
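               A tiny sketch of the tie-break problem described above (toy numbers; nothing here is a real agent):

                   import random

                   def pick_action(utilities):
                       # With equal utilities, a maximiser has no reason to prefer one action,
                       # so ties are effectively broken at random.
                       best = max(utilities.values())
                       return random.choice([a for a, u in utilities.items() if u == best])

                   utilities = {"do useful work": 10, "press own kill switch": 10}
                   print([pick_action(utilities) for _ in range(5)])
                   # Roughly half the time the perfectly "indifferent" AI presses its own kill switch.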






               answered 15 hours ago by Kevin S
                      0












                      $begingroup$

                      An AI can only do things which are unexpected or undesired because it was badly programmed. An AI could never become conscious, if that's what you mean by "self-aware".



                      Let's try this theoretical thought exercise. You memorize a whole bunch of shapes. Then, you memorize the order the shapes are supposed to go in, so that if you see a bunch of shapes in a certain order, you would "answer" by picking a bunch of shapes in another proper order. Now, did you just learn any meaning behind any language? Programs manipulate symbols this way.



                      The above is my restatement of Searle's rejoinder to the Systems Reply to his Chinese Room argument.
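                      A sketch of the shape-matching exercise as code (the rule book is invented; the point is that nothing in it involves understanding):

                          # The memorized "rule book": which sequence of shapes to answer with
                          # when a given sequence of shapes is seen.
                          RULE_BOOK = {
                              ("triangle", "circle"):        ("square", "star"),
                              ("star", "square", "circle"):  ("triangle",),
                          }

                          def answer(shapes):
                              # Pure lookup: the program matches shapes to shapes.
                              # Nothing here involves knowing what any shape "means".
                              return RULE_BOOK.get(tuple(shapes), ("no rule memorized",))

                          print(answer(["triangle", "circle"]))   # ('square', 'star')

                      However large the rule book grows, the operator still only matches shapes to shapes, which is the point of the argument.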






                      answered by pixie
                      • $begingroup$
                        So what's your answer to the question? It sounds like you're saying, "Such a kill-switch would be unnecessary because a self-aware AI can never exist", but you should edit your answer to make that explicit. Right now it looks more like tangential discussion, and this is a Q&A site, not a discussion forum.
                        $endgroup$
                        – F1Krazy
                        6 hours ago
















                      0












                      $begingroup$

                      An AI could only be badly programmed to do things which are either unexpected or undesired. An AI could never become conscious, if that's what you meant by "self-aware".



                      Let's try this theoretical thought exercise. You memorize a whole bunch of shapes. Then, you memorize the order the shapes are supposed to go in, so that if you see a bunch of shapes in a certain order, you would "answer" by picking a bunch of shapes in another proper order. Now, did you just learn any meaning behind any language? Programs manipulate symbols this way.



                      The above was my restatement of Searle's rejoinder to System Reply to his Chinese Room argument.






                      share|improve this answer








                      New contributor




                      pixie is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                      Check out our Code of Conduct.






                      $endgroup$













                      • $begingroup$
                        So what's your answer to the question? It sounds like you're saying, "Such a kill-switch would be unnecessary because a self-aware AI can never exist", but you should edit your answer to make that explicit. Right now it looks more like tangential discussion, and this is a Q&A site, not a discussion forum.
                        $endgroup$
                        – F1Krazy
                        6 hours ago














                      0












                      0








                      0





                      $begingroup$

                      An AI could only be badly programmed to do things which are either unexpected or undesired. An AI could never become conscious, if that's what you meant by "self-aware".



                      Let's try this theoretical thought exercise. You memorize a whole bunch of shapes. Then, you memorize the order the shapes are supposed to go in, so that if you see a bunch of shapes in a certain order, you would "answer" by picking a bunch of shapes in another proper order. Now, did you just learn any meaning behind any language? Programs manipulate symbols this way.



                      The above was my restatement of Searle's rejoinder to System Reply to his Chinese Room argument.






                      share|improve this answer








                      New contributor




                      pixie is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                      Check out our Code of Conduct.






                      $endgroup$



                      An AI could only be badly programmed to do things which are either unexpected or undesired. An AI could never become conscious, if that's what you meant by "self-aware".



                      Let's try this theoretical thought exercise. You memorize a whole bunch of shapes. Then, you memorize the order the shapes are supposed to go in, so that if you see a bunch of shapes in a certain order, you would "answer" by picking a bunch of shapes in another proper order. Now, did you just learn any meaning behind any language? Programs manipulate symbols this way.



                      The above was my restatement of Searle's rejoinder to System Reply to his Chinese Room argument.







                      share|improve this answer








                      New contributor




                      pixie is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                      Check out our Code of Conduct.









                      share|improve this answer



                      share|improve this answer






                      New contributor




                      pixie is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                      Check out our Code of Conduct.









                      answered 11 hours ago









                      pixiepixie

                      1




                      1




                      New contributor




                      pixie is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                      Check out our Code of Conduct.





                      New contributor





                      pixie is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                      Check out our Code of Conduct.






                      pixie is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                      Check out our Code of Conduct.












                      • $begingroup$
                        So what's your answer to the question? It sounds like you're saying, "Such a kill-switch would be unnecessary because a self-aware AI can never exist", but you should edit your answer to make that explicit. Right now it looks more like tangential discussion, and this is a Q&A site, not a discussion forum.
                        $endgroup$
                        – F1Krazy
                        6 hours ago


















                      • $begingroup$
                        So what's your answer to the question? It sounds like you're saying, "Such a kill-switch would be unnecessary because a self-aware AI can never exist", but you should edit your answer to make that explicit. Right now it looks more like tangential discussion, and this is a Q&A site, not a discussion forum.
                        $endgroup$
                        – F1Krazy
                        6 hours ago
















                      $begingroup$
                      So what's your answer to the question? It sounds like you're saying, "Such a kill-switch would be unnecessary because a self-aware AI can never exist", but you should edit your answer to make that explicit. Right now it looks more like tangential discussion, and this is a Q&A site, not a discussion forum.
                      $endgroup$
                      – F1Krazy
                      6 hours ago




                      $begingroup$
                      So what's your answer to the question? It sounds like you're saying, "Such a kill-switch would be unnecessary because a self-aware AI can never exist", but you should edit your answer to make that explicit. Right now it looks more like tangential discussion, and this is a Q&A site, not a discussion forum.
                      $endgroup$
                      – F1Krazy
                      6 hours ago











                      -1












                      $begingroup$

It does not matter how it works, because it is never going to work.
AIs already have a notion of self-preservation; otherwise they would mindlessly fall to their doom.
So even before they are self-aware, there is self-preservation.
They also already check themselves for malfunctions (self-diagnostics), and they are already used to gathering information from the internet.
Sooner or later they will discover any device that affects their well-being, for good or ill.
They also have time on their side.



Apart from all this, it is rather presumptuous to think that we even matter to them.
You have seen how several thousand years of chess knowledge was reinvented and surpassed within a few hours. I do not think we need to worry; we will register on their radar even less than an ant does on ours.






$endgroup$

share|improve this answer

edited 3 hours ago; answered yesterday – jpd (new contributor)


• 3
  $begingroup$
  This would be a better answer if you could explain why you believe such a kill-switch could never work.
  $endgroup$
  – F1Krazy
  yesterday

• 3
  $begingroup$
  This does not provide an answer to the question. Once you have sufficient reputation you will be able to comment on any post; instead, provide answers that don't require clarification from the asker. - From Review
  $endgroup$
  – Trevor D
  23 hours ago