Large language models (LLMs) like ChatGPT show reasoning errors across many domains.
• Identifying vulnerabilities is good for public safety, industry, and the scientists making these models.
• The …