Evaluates health-related AI model responses for quality, helpfulness, and factuality against predefined criteria.