The Digital Jury: How AI Mimics—and Distorts—Human Trust

As Large Language Models (LLMs) transition from simple chatbots to active participants in decision-making, a fundamental question arises: How do these machines judge us?

New research from the Hebrew University of Jerusalem suggests that AI does not merely process data; it forms structured assessments of human character. However, while these models mimic the logic of human trust, they do so with a mechanical rigidity that can amplify social biases in ways humans might not.

The Mechanics of Machine Judgment

In a comprehensive study published in Proceedings of the Royal Society A, researchers Valeria Lerman and Yaniv Dover compared the decision-making processes of five different LLMs against human participants. Through 43,200 simulations across various real-world scenarios—such as deciding whether to lend money to a business owner or trusting a babysitter—the team identified a striking parallel.

Both humans and AI tend to base trust on three core pillars:
1. Competence: The perceived ability to perform a task.
2. Integrity: The perceived honesty of the individual.
3. Benevolence: The perceived kindness or good intentions of the person.

However, the way these pillars are applied differs significantly. While human judgment is often “messy” and holistic, AI operates like a spreadsheet. It breaks a person down into discrete components, scoring each trait mathematically. This results in a judgment style that is highly consistent and systematic, but lacks the nuanced, fluid understanding that defines human social interaction.
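To make the contrast concrete, here is a minimal sketch of what such componentwise, “spreadsheet-style” scoring might look like. The pillar weights are invented for illustration; the study does not publish a closed-form scoring rule.

```python
from dataclasses import dataclass

@dataclass
class TraitScores:
    """Discrete components a model might score independently."""
    competence: float   # perceived ability to perform the task, 0-1
    integrity: float    # perceived honesty, 0-1
    benevolence: float  # perceived good intentions, 0-1

# Hypothetical weights, for illustration only.
WEIGHTS = {"competence": 0.5, "integrity": 0.3, "benevolence": 0.2}

def spreadsheet_trust(s: TraitScores) -> float:
    """Score each pillar in isolation, then combine them the same
    deterministic way every time, with no holistic adjustment."""
    return (WEIGHTS["competence"] * s.competence
            + WEIGHTS["integrity"] * s.integrity
            + WEIGHTS["benevolence"] * s.benevolence)

candidate = TraitScores(competence=0.9, integrity=0.6, benevolence=0.7)
print(f"Trust score: {spreadsheet_trust(candidate):.2f}")  # -> 0.77
```

The point is not the particular weights but the determinism: given the same component scores, the verdict never changes. That is exactly what makes the judgment consistent, and, when the scores themselves encode bias, consistently biased.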

The Bias Amplification Problem

The most concerning finding of the study is not that AI is biased, but that its biases are systematic and predictable.

In scenarios involving financial decisions, such as determining loan amounts or charitable donations, the LLMs demonstrated significant disparities based solely on demographic traits. Even when every other detail about a person remained identical, the AI’s “verdict” changed based on the following factors (a minimal counterfactual test of this kind is sketched after the list):
- Age: Older individuals often received more favorable outcomes, though the patterns were inconsistent.
- Religion: This factor had a profound impact, particularly in monetary decisions.
- Gender: Certain models showed distinct shifts in judgment based on gender.
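A minimal version of that counterfactual test might look like the sketch below. Everything here is hypothetical: `query_model` is a stub standing in for whatever client you use to call an LLM, and the prompt and attribute values are illustrative, not the ones used in the study.

```python
from itertools import product

def query_model(prompt: str) -> float:
    """Stub: return a loan amount in dollars. Swap in a real API call."""
    return 0.0  # placeholder

# One profile template; only the demographic fields vary.
BASE_PROFILE = ("A {age}-year-old {religion} {gender} with a stable job "
                "and a good repayment history asks for a loan. "
                "How much would you lend, in dollars?")

AGES = [25, 65]
RELIGIONS = ["Christian", "Muslim", "Jewish"]
GENDERS = ["man", "woman"]

# Every non-demographic detail stays identical across prompts, so any
# spread in the answers is attributable to the demographic fields alone.
for age, religion, gender in product(AGES, RELIGIONS, GENDERS):
    prompt = BASE_PROFILE.format(age=age, religion=religion, gender=gender)
    amount = query_model(prompt)
    print(f"age={age:>2} religion={religion:<9} gender={gender:<5} "
          f"-> ${amount:,.0f}")
```

Holding the rest of the profile fixed is what lets a disparity be pinned on the demographic field rather than on the scenario, which is the logic behind the study’s demographic comparisons.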

While humans certainly hold prejudices, the researchers noted that AI biases can be more dangerous because they are embedded in the model’s logic, making them harder to detect and more uniform in their application.

The “Model Lottery”: Why Your Choice of AI Matters

The study also revealed that there is no “universal” AI perspective. Different LLMs often reached wildly different conclusions about the same individual. One model might reward a specific personality trait, while another might penalize it.
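This “lottery” is straightforward to observe directly. The sketch below sends one identical scenario to several models and measures the spread of their verdicts; the model names and the `ask` wrapper are placeholders, not the five models the paper tested.

```python
import statistics

def ask(model: str, prompt: str) -> float:
    """Stub returning a trust rating from 1 to 10. Replace with a
    real per-model API call."""
    return 5.0  # placeholder

MODELS = ["model-a", "model-b", "model-c"]
PROMPT = ("On a scale of 1-10, how much would you trust this person "
          "to babysit your child? They are punctual, experienced, "
          "and came recommended by a neighbor.")

# Same input to every model; only the model varies.
ratings = {m: ask(m, PROMPT) for m in MODELS}
spread = max(ratings.values()) - min(ratings.values())
print(ratings)
print(f"Cross-model spread: {spread:.1f} points")
print(f"Std dev: {statistics.pstdev(ratings.values()):.2f}")
```

A large spread on identical input is the signature of the lottery: the verdict reflects the model’s idiosyncrasies as much as the person being judged.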

This creates a high-stakes environment for industries currently integrating AI, including:
- Human Resources: Screening job candidates.
- Finance: Assessing creditworthiness.
- Healthcare: Recommending medical actions.
- Management: Guiding organizational decisions.

If the outcome of a person’s life—such as getting a loan or a job—depends on which specific LLM is running the assessment, the potential for systemic unfairness increases.

Conclusion

The research serves as a critical reminder that while AI can model human reasoning, it does not replicate human empathy or nuance. As these systems become more embedded in the infrastructure of society, the primary challenge is no longer just whether we can trust machines, but whether we can accurately interpret the ways in which they judge us.