Sony Advanced Visual Sensing AG
Fall detection for elderly care using non-invasive vision-based systems remains an important yet unsolved problem. Driven by strict privacy requirements, inference must run at the edge, directly at the vision sensor, demanding robust, real-time, always-on perception under tight hardware constraints. To address these challenges, we propose a neuromorphic fall detection system that integrates the Sony IMX636 event-based vision sensor with the Intel Loihi 2 neuromorphic processor via a dedicated FPGA-based interface, leveraging the sparsity of event data together with near-memory asynchronous processing. Using a newly recorded dataset covering diverse environmental conditions, we explore the design space of sparse neural networks deployable on a single Loihi 2 chip and analyze the trade-offs between detection F1 score and computational cost. Notably, on the Pareto front, our LIF-based convolutional SNN with graded spikes achieves the highest computational efficiency, reaching a synaptic-operation sparsity of 55x at an F1 score of 58%. Compared to binary spikes, the graded-spike LIF gains 6% in F1 score while using 5x fewer operations. Furthermore, our MCUNet feature extractor with patched inference, combined with the S4D state-space model, achieves the highest F1 score of 84% with a synaptic-operation sparsity of 2x and a total power consumption of 90 mW on Loihi 2. Overall, our smart security camera proof of concept highlights the potential of integrating neuromorphic sensing and processing for edge AI applications where latency, energy consumption, and privacy are critical.
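As a rough illustration of the binary versus graded spike distinction discussed above, the following Python/NumPy sketch shows a discrete-time LIF update in which a firing neuron emits either a unit spike or a spike carrying its membrane value; the decay factor, threshold, and hard-reset rule are assumptions for illustration, not the paper's actual configuration.

import numpy as np

def lif_step(v, x, decay=0.9, v_th=1.0, graded=True):
    # One discrete-time step for a layer of LIF neurons.
    # v      : membrane potentials (NumPy array)
    # x      : weighted synaptic input for this step
    # decay  : membrane leak factor per step (assumed value)
    # v_th   : firing threshold (assumed value)
    # graded : if True, spikes carry the membrane value (graded payloads,
    #          as supported on Loihi 2); if False, spikes are binary 0/1
    v = decay * v + x                     # leaky integration
    fired = v >= v_th                     # threshold crossing
    if graded:
        spikes = np.where(fired, v, 0.0)  # payload carries the magnitude
    else:
        spikes = fired.astype(v.dtype)    # unit spikes only
    v = np.where(fired, 0.0, v)           # hard reset of neurons that fired
    return v, spikes

# Example: one step for four neurons
v = np.zeros(4)
x = np.array([0.5, 1.2, 0.9, 2.0])
v, spikes = lif_step(v, x, graded=True)

Graded payloads let downstream synapses receive magnitude information in a single spike, which is one plausible reason a graded-spike network can reach higher accuracy with fewer synaptic operations than a binary-spike one.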
Every generation of mobile devices strives to capture video at higher resolution and frame rate than the previous one. This increase in quality also requires additional power and computation to capture and encode high-quality media. We propose a method to reduce the overall power consumption of capturing high-quality video on mobile devices. Using video frame interpolation (VFI), the sensor can be driven at a lower frame rate, which reduces sensor power consumption. With modern RGB hybrid event-based vision sensors (EVS), event data can be used to guide the interpolation, leading to results of much higher quality. Applied naively, however, interpolation methods can be expensive and produce large amounts of intermediate data before the video is encoded. This paper proposes a video encoder that generates a bitstream for high-frame-rate video without explicit interpolation. The proposed method estimates encoded video data (notably motion vectors) rather than frames, so an encoded video file can be produced directly without explicitly producing intermediate frames.
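As a hedged sketch of the encoding idea described above, the following Python example emits the captured sensor frames plus motion-vector-only predicted frames in between, without ever synthesizing intermediate images; the ToyEncoder class and the event-based motion estimate are illustrative stand-ins, not the paper's method or any real codec API.

import numpy as np

class ToyEncoder:
    # Stand-in for a video encoder; records stream entries instead of
    # producing a real bitstream.
    def __init__(self):
        self.stream = []

    def write_captured_frame(self, frame):
        self.stream.append(("captured", frame))

    def write_predicted_frame(self, motion_vectors):
        self.stream.append(("predicted", motion_vectors))  # motion vectors only

def toy_motion_vectors(events, alpha, grid=(30, 40)):
    # Crude illustrative estimate: take the mean event displacement as a
    # global flow, scale it by the temporal position alpha of the
    # interpolated frame, and replicate it over a grid of macroblocks.
    mean_flow = events.mean(axis=0) if len(events) else np.zeros(2)
    return np.tile(alpha * mean_flow, grid + (1,))

def encode_high_fps(sensor_frames, events_between, n_interp=3):
    enc = ToyEncoder()
    for frame, events in zip(sensor_frames, events_between):
        enc.write_captured_frame(frame)            # low-frame-rate capture
        for k in range(1, n_interp + 1):
            alpha = k / (n_interp + 1)
            mv = toy_motion_vectors(events, alpha)
            enc.write_predicted_frame(mv)          # no intermediate image
    return enc.stream

# Example: two captured frames with (x, y) event displacements between captures
frames = [np.zeros((480, 640)) for _ in range(2)]
events = [np.random.randn(100, 2) for _ in range(2)]
stream = encode_high_fps(frames, events)

Because the predicted frames are described only by motion data, no full-resolution intermediate frames need to be buffered before encoding, which is the saving the abstract targets.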