Why didn't Meta meet their own Llama 3.0 Safety benchmark? Why does Llama 3.2 generate drastically more unsafe responses?
Safety Benchmark of Meta Llama 3.x Models