Announcing: PhD Proposal Review
Name: Mehrshad Zandigohar
Title: Deployable and Multimodal Human Grasp Intent Inference in Prosthetic Hand Control
Date: 7/18/2024
Time: 11:30 AM
Location: ISEC 532 – https://teams.microsoft.com/l/meetup-join/19%3ameeting_N2QyNzc1MWMtOWJmMi00NGNmLThlNzctN2JlNjU2Y2I1MmI1%40thread.v2/0?context=%7b%22Tid%22%3a%22a8eec281-aaa3-4dae-ac9b-9a398b9215e7%22%2c%22Oid%22%3a%22de13c261-ac42-49d7-8950-6dec3adaca4e%22%7d
Committee Members:
Prof. Gunar Schirner (Advisor)
Prof. Deniz Erdogmus
Prof. Mallesham Dasari
Prof. Mariusz P. Furmanek
Abstract:
For transradial amputees, robotic prosthetic hands promise to restore the ability to perform activities of daily living. Among control methods for prosthetic hand actuators, coarse-grained grasp types are a common means of effortless yet effective control. However, to advance next-generation prosthetic hand control design, it is crucial to address current shortcomings in robustness to out-of-lab artifacts, generalizability to new environments, and deployment of such compute-intensive grasp estimators.
First and foremost, control methods based on physiological modalities such as electromyography (EMG) are prone to poor inference outcomes due to motion artifacts, muscle fatigue, and other confounds. Similarly, methods based on the visual modality are susceptible to their own artifacts, most often object occlusion and lighting changes. To address these drawbacks of single-modality approaches, we present a multimodal evidence fusion framework for grasp intent inference that uses eye-view video, eye gaze, and forearm EMG, each processed by neural network models. Given the lack of a synchronized multimodal dataset for evaluating multimodal grasp estimation, we introduce our customized HANDSv2 dataset, which pairs the most complete EMG profile with time-synchronized visual data. Our experimental results indicate that fusing both modalities, on average, improves instantaneous upcoming grasp type classification accuracy during the reaching phase by 13.66 and 14.8 percentage points over the EMG-only (81.64%) and vision-only (80.5%) baselines, respectively, yielding an overall fusion accuracy of 95.3%.
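As an illustration, the following minimal sketch shows one way per-modality grasp-type posteriors could be fused; the grasp taxonomy, weights, and function names are hypothetical placeholders and do not reproduce the proposal's exact fusion rule.

```python
import numpy as np

# Hypothetical late-fusion sketch: each modality network outputs a posterior
# over grasp types, and the fused estimate combines them with a weighted
# log-linear pool. Labels and weights are illustrative assumptions.

GRASP_TYPES = ["power", "precision", "lateral", "tripod", "open"]

def fuse_posteriors(p_emg: np.ndarray, p_vision: np.ndarray,
                    w_emg: float = 0.5, w_vision: float = 0.5) -> np.ndarray:
    """Combine per-modality grasp-type posteriors into one distribution."""
    eps = 1e-9  # avoid log(0)
    log_fused = w_emg * np.log(p_emg + eps) + w_vision * np.log(p_vision + eps)
    fused = np.exp(log_fused - log_fused.max())  # subtract max for stability
    return fused / fused.sum()

# Example: the modalities disagree; fusion favors the jointly supported class.
p_emg = np.array([0.55, 0.20, 0.10, 0.10, 0.05])
p_vision = np.array([0.40, 0.45, 0.05, 0.05, 0.05])
fused = fuse_posteriors(p_emg, p_vision)
print(GRASP_TYPES[int(fused.argmax())], fused.round(3))
```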
Although visual grasp classification has shown promising results, generalizability to unseen object classes remains a significant challenge within the research community. This limitation arises from the fixed number of grasp types available in existing models, contrasted with the virtually infinite variety of objects encountered in the real world. The poor performance of grasp detection models on unseen objects negatively affects users' independence and quality of life. To address this, we propose the Grasp Vision Language Model (Grasp-VLM). Grasp-VLM leverages the zero-shot capability of large vision-language models and guides them to perform human-like reasoning, inferring a suitable grasp type from the physical characteristics of previously unseen objects, which results in better generalizability in real-life scenarios. Our initial results show that Grasp-VLM reaches 49% accuracy on unseen object types, a significant improvement over the 15.3% accuracy of the current state of the art.
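The sketch below illustrates the kind of zero-shot prompting this approach entails; the prompt wording, grasp list, and the query_vlm callable are assumed placeholders rather than the actual Grasp-VLM pipeline.

```python
# Illustrative zero-shot grasp-type prompting with a vision-language model.
# The prompt, grasp taxonomy, and `query_vlm` interface are hypothetical.

GRASP_TYPES = ["power", "precision", "lateral", "tripod", "open palm"]

PROMPT_TEMPLATE = (
    "You are assisting a prosthetic hand controller. "
    "Look at the highlighted object and reason about its size, shape, weight, "
    "and rigidity. Then answer with exactly one grasp type from this list: "
    "{grasp_types}."
)

def infer_grasp_type(image_bytes: bytes, query_vlm) -> str:
    """Ask a VLM for a grasp type; fall back to a default if parsing fails."""
    prompt = PROMPT_TEMPLATE.format(grasp_types=", ".join(GRASP_TYPES))
    answer = query_vlm(image=image_bytes, prompt=prompt).strip().lower()
    for grasp in GRASP_TYPES:
        if grasp in answer:
            return grasp
    return "power"  # conservative default for unparsable answers
```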
Lastly, given the computational intensity of such models, which often contain billions of parameters, deploying them on edge devices poses a serious challenge. To mitigate this, we investigate the Hybrid Grasp Network (HGN), a deployment infrastructure that combines an edge-specialized model for low-latency operation with a cloud-based universal model that ensures high generalization, effectively balancing performance and resource constraints.
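A minimal sketch of such edge/cloud routing is shown below, assuming a confidence-threshold hand-off; the threshold, data types, and function names are illustrative assumptions, not the HGN design itself.

```python
# Hypothetical routing sketch for a hybrid edge/cloud grasp estimator: a
# lightweight on-device model answers when it is confident, otherwise the
# frame is deferred to a larger cloud model.

from dataclasses import dataclass

@dataclass
class Prediction:
    grasp_type: str
    confidence: float  # max class posterior in [0, 1]

def hybrid_infer(frame, edge_model, cloud_model,
                 confidence_threshold: float = 0.8,
                 cloud_available: bool = True) -> Prediction:
    """Prefer the fast edge model; escalate low-confidence frames to the cloud."""
    edge_pred = edge_model(frame)
    if edge_pred.confidence >= confidence_threshold or not cloud_available:
        return edge_pred       # low-latency on-device path
    return cloud_model(frame)  # higher-accuracy, higher-latency path
```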
The holistic approach presented in this dissertation tackles four essential areas of robotic prosthetic hand control design. HANDSv2 fills the gap of a synchronized multimodal dataset. Our multimodal fusion approach outperforms single-modality approaches, providing accurate and robust grasp type estimates throughout the entire grasping timeline. In addition, Grasp-VLM addresses the lack of generalizability to new object types, providing more realistic grasp estimation. Lastly, our HGN design aims to provide a real-time solution that addresses both speed and accuracy objectives.