SafePyramid: A Hierarchical Benchmark for In-context Policy Guardrailing

The SafePyramid benchmark evaluates guardrail systems' ability to identify safety violations through in-context policy specification across multiple domains and complexity levels.

Hugging Face · Daily Papers · Jiacheng Zhang, Haoyu He · ▲ 3 upvotes

This paper is featured in the Hugging Face Daily Papers selection, curated by the AI research community.

Authors: Jiacheng Zhang, Haoyu He, Sen Zhang, Shen Wang, Xiaolei Xu, Yuhao Sun

  • 3 community upvotes
  • Topics: guardrails, in-context policy guardrailing, safety benchmark, policy specifications, multi-turn conversations, natural-language rules

Summary

Original abstract, as extracted from the paper:

SafePyramid benchmark evaluates guardrail systems' ability to identify safety violations through in-context policy specification across multiple domains and complexity levels.
