Real-world deployment of autonomous vehicles (AVs) has advanced significantly
in recent years, yet safety remains a critical barrier to
widespread adoption. Traditional functional safety approaches, which primarily
verify the reliability, robustness, and adequacy of AV hardware and software
systems from a vehicle-centric perspective, do not sufficiently address the
AV's broader interactions with, and behavioral impact on, the surrounding traffic
environment. To overcome this limitation, we propose a paradigm shift toward
behavioral safety, a comprehensive approach focused on evaluating AV responses
and interactions within the traffic environment. To systematically assess
behavioral safety, we introduce a third-party AV safety assessment framework
comprising two complementary evaluation components: the Driver Licensing Test
and the Driving Intelligence Test. The Driver Licensing Test evaluates the AV's
reactive behaviors under controlled scenarios, ensuring basic behavioral
competency. In contrast, the Driving Intelligence Test assesses the AV's
interactive behaviors within naturalistic traffic conditions, quantifying the
frequency of safety-critical events to deliver statistically meaningful safety
metrics before large-scale deployment. We validated our proposed framework
using Autoware.Universe, an open-source Level 4 AV software stack, tested both in simulated
environments and on the physical test track at the University of Michigan's
Mcity Testing Facility. The results indicate that Autoware.Universe passed 6
out of 14 Driver Licensing Test scenarios and, in the Driving Intelligence Test,
exhibited a crash rate of 3.01e-3 crashes per mile, approximately 1,000 times
higher than the average human driver crash rate.
During the tests, we also uncovered several previously unknown unsafe scenarios for
Autoware.Universe. These findings underscore the necessity of behavioral safety
evaluations for improving AV safety performance prior to widespread public
deployment.
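
As a rough illustration of the arithmetic behind the reported figures, the sketch below computes a per-mile crash rate and its ratio to a baseline. The function names are hypothetical, and the human baseline used here is only back-derived from the stated "approximately 1,000 times" ratio; it is not a measured value taken from this work.

```python
def crash_rate_per_mile(crashes: int, miles: float) -> float:
    """Return crashes per mile for a given exposure (miles must be positive)."""
    if miles <= 0:
        raise ValueError("miles must be positive")
    return crashes / miles


def ratio_to_baseline(rate: float, baseline: float) -> float:
    """Return how many times larger `rate` is than `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return rate / baseline


# Reported AV crash rate (crashes per mile), taken from the text above.
AV_RATE = 3.01e-3

# ASSUMPTION: baseline implied by the stated ~1,000x ratio, not a measured figure.
IMPLIED_HUMAN_RATE = AV_RATE / 1000

# Reported scenario results: 6 of 14 scenarios passed.
PASS_FRACTION = 6 / 14

if __name__ == "__main__":
    print(f"AV crash rate:          {AV_RATE:.2e} crashes/mile")
    print(f"Implied human baseline: {IMPLIED_HUMAN_RATE:.2e} crashes/mile")
    print(f"Ratio to baseline:      {ratio_to_baseline(AV_RATE, IMPLIED_HUMAN_RATE):.0f}x")
    print(f"Scenario pass fraction: {PASS_FRACTION:.1%}")
```

Expressing both quantities per mile keeps the comparison independent of how much total exposure each test accumulated.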