aatventure
Good news: apparently, the newest Claude model has a 0% blackmail rate! Bad news: the researchers think it's because the model realizes researchers are testing it, so it goes on its best behavior. If this trend continues, then we won't be able to find out if the models are scheming.