Plenty of CVC members attended, and a number of colloquia debated Deep Learning and its use in Computer Vision. Dr. Dimosthenis Karatzas, from CVC, moderated an intense debate in which speakers were asked to reflect on the future of Computer Vision and the role of Deep Learning within this new paradigm.
A team of CVC and INRIA researchers, along with clinicians, has developed a neural network model able to identify and monitor the evolution of Alzheimer’s disease in patients by interpreting the results of a test known as PRAXIS with the use of Computer Vision and Deep Learning.
Most of the world’s developed societies are experiencing an aging trend in their population. Aging is correlated with cognitive impairment such as dementia and its most common type: Alzheimer’s disease (AD). With this in mind, researchers and policymakers know that there is an urgent need to develop effective technological tools that will help doctors make early and precise diagnoses of cognitive decline.
Acknowledged as a simple test, the PRAXIS test consists of a series of two-handed tasks in which an Alzheimer’s patient has to imitate the gestures of a doctor (or an avatar): simple movements like waving, indicating actions like going to sleep, or rotating the hands or upper body. Although the test is straightforward, it is highly time-consuming and costly, as it demands the presence of an expert psychiatric clinician. Most hospitals and primary care facilities are short of such experts, or lack them altogether, and patients are reportedly getting less assistance time in hospital appointments, even in the most developed healthcare systems.
Researchers Pau Rodríguez and Dr. Jordi Gonzalez at CVC, along with Farhood Negin and Dr. François Brémond from INRIA and members of the Institute Claude Pompidou (Nice), have developed a computer vision system able to detect whether the patient’s gestures or hand positions when performing tasks correspond to those shown by the avatar or doctor, and whether the patient is effectively following the movements. This system is fully described in the paper entitled “PRAXIS: Towards automatic cognitive assessment using gesture recognition”.
“The avatar will make two types of gestures”, as explained by Pau Rodríguez, second author of the paper and CVC PhD student under Dr. Gonzalez’s supervision. “Static or dynamic gestures. The static gesture goes from repose to movement and back to repose again. The dynamic gesture has a movement associated to it, a specific action with both or one hand, like indicating they’re tired by bringing their hand to their cheek and tilting their head, or rotating their hands from their wrists”.
From this, and by assessing the patient’s movements, professional doctors are able to identify if they are performing optimally or if the body language differs. “Sometimes, they’ll be asked to move their right hand clockwise, and they’ll move it anticlockwise, or will move the wrong hand”, clarifies Dr. Gonzalez. “Thus this is an indication that they’re not performing properly, and according to the degree of performance and compared to a previous test, physicians can estimate the degree of cognitive loss caused by AD and hence evaluate their evolution”.
According to the American Psychiatric Textbook of Alzheimer Disease, “most cognitively normal adults will perform these tasks effortlessly. Patients with mild dementia most often perform poorly on the two-handed praxis test”. However, clinicians tend to skip it, and they have their reasons. As the authors explain in their paper, the method lacks objectivity and its outcome can vary depending on the clinician who is performing it. General agreement on results is rare and, as has already been mentioned, time shortage in medical visits and the lack of trained clinicians make it even more difficult. Therefore, as the authors put it in their paper, “an automatic solution that can address these problems by providing a standardized test can be considered as a significant contribution in the field”.
This has been the main motivation for CVC and INRIA researchers to develop an algorithm that can detect and evaluate performance with an average accuracy of 90%. In their paper ‘PRAXIS: Towards automatic cognitive assessment using gesture recognition’, they lay out their research, assessing four different computer vision methods in a quest for the optimal option. As is increasingly common, the method using Deep Learning outperformed the other three.
“We started off with a skeleton based method, defining the global appearance of poses by joint angle and distance features”, details Pau Rodríguez. “We then tried a multi-modal fusion method. In this approach, the skeleton feature captures only the global appearance of a person, while deep VGG features extracted from RGB video stream give additional information about hand shape and motion. This is important in order to discriminate gestures, especially the ones with similar poses”. VGG is a deep convolutional network for object recognition that takes its name from the group that developed it: the Visual Geometry Group at the University of Oxford.
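To make the idea of joint angle and distance features concrete, here is a minimal sketch, not the authors' implementation. It assumes each frame provides 3D joint coordinates in a dictionary; the joint names and the particular angles and distances chosen are hypothetical and only serve as an illustration.

```python
import math

def joint_angle(a, b, c):
    """Angle in degrees at joint b, formed by the segments b->a and b->c."""
    ba = [a[i] - b[i] for i in range(3)]
    bc = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(ba, bc))
    norm = math.sqrt(sum(x * x for x in ba)) * math.sqrt(sum(x * x for x in bc))
    if norm == 0.0:
        return 0.0  # degenerate case: two joints coincide
    cos = max(-1.0, min(1.0, dot / norm))  # guard against rounding errors
    return math.degrees(math.acos(cos))

def joint_distance(a, b):
    """Euclidean distance between two joints."""
    return math.dist(a, b)

def pose_features(skeleton):
    """Build a small feature vector from one skeleton frame.

    `skeleton` maps hypothetical joint names to (x, y, z) tuples.
    """
    feats = []
    for side in ("left", "right"):
        # Elbow flexion angle for each arm.
        feats.append(joint_angle(skeleton[side + "_shoulder"],
                                 skeleton[side + "_elbow"],
                                 skeleton[side + "_wrist"]))
    # Distance between the hands, and from the right hand to the head.
    feats.append(joint_distance(skeleton["left_wrist"], skeleton["right_wrist"]))
    feats.append(joint_distance(skeleton["right_wrist"], skeleton["head"]))
    return feats
```

Angles and distances of this kind do not change when the person stands elsewhere in the room, which is one reason they are a common choice for describing the global appearance of a pose.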
Thirdly, the authors tried a local descriptor based method, in which they recognised actions using dense trajectories to extract local spatio-temporal descriptors. Dense trajectories are an effective method for video representation with which researchers are able to track the location of features from one frame to the next.
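The tracking step behind dense trajectories can be sketched as follows. This is a simplified illustration, not the pipeline used in the paper: it assumes the per-frame optical flow fields are already available as nested lists (`flows[t][y][x] = (dx, dy)`), and it omits the flow smoothing and the descriptors computed along each trajectory. The function names are hypothetical.

```python
def sample_grid(width, height, step):
    """Sample starting points on a regular grid, every `step` pixels."""
    return [(float(x), float(y))
            for y in range(0, height, step)
            for x in range(0, width, step)]

def track_points(points, flows):
    """Follow each point through a sequence of optical flow fields.

    Each flow field is indexed as flow[y][x] and holds a (dx, dy)
    displacement. Returns one trajectory (list of positions) per point.
    """
    trajectories = [[p] for p in points]
    for flow in flows:
        height, width = len(flow), len(flow[0])
        for traj in trajectories:
            x, y = traj[-1]
            # Read the flow at the nearest pixel, clamped to the image.
            xi = min(max(int(round(x)), 0), width - 1)
            yi = min(max(int(round(y)), 0), height - 1)
            dx, dy = flow[yi][xi]
            traj.append((x + dx, y + dy))
    return trajectories
```

For example, with a uniform flow of one pixel to the right per frame, every sampled point drifts one pixel to the right at each step, and the shape of the resulting trajectory is what a descriptor would then encode.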
Last but not least, the fourth approach was a deep learning based method. “Deep Learning performed best due to its nature”, as Dr. Jordi Gonzalez points out. “It’s the method that best uses the high amount of information it receives. All other models make a simplification of the input information, in order to be able to process it. Deep Learning does exactly the contrary: with a high computational complexity, it is the only method able to identify subtle information, crucial for tasks within the medical sector”.
Building the dataset in order to train the neural networks was definitely challenging. “It was challenging because you’ve got 2D and 3D information to work with. We were faced with both dynamic and static objects, not only having to recognise and evaluate actions, but also train the neural network to recognise the patient’s gesture and give a level of proximity with the avatar’s movement in order to assess the evolution of AD in the patient”.
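One simple way to picture a "level of proximity" between the patient's gesture and the avatar's is to compare the two sequences of pose features. The sketch below uses dynamic time warping (DTW) as a stand-in technique; it is an assumption chosen for illustration and is not the neural network described in the paper. The function names and the mapping from distance to score are hypothetical.

```python
import math

def dtw_distance(seq_a, seq_b):
    """Dynamic time warping distance between two sequences of feature vectors."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = best alignment cost of the first i and first j frames.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # patient frame repeated
                                 cost[i][j - 1],      # avatar frame repeated
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]

def proximity_score(patient, avatar):
    """Map the DTW distance to a score in (0, 1]; 1.0 means identical motion."""
    path_len = len(patient) + len(avatar)
    return 1.0 / (1.0 + dtw_distance(patient, avatar) / path_len)
```

Because DTW tolerates differences in speed, a patient who performs the right movement more slowly than the avatar still scores highly, while a wrong movement lowers the score.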
The system removes the need for a specialized physician in the periodic monitoring of the AD patient and provides an objective measurement tool, making it possible to assess and monitor the disease’s evolution with a higher degree of accuracy. But most importantly, the system is portable. It can easily be adapted to small devices, such as a smartphone, giving caregivers, be it family or nurses, a tool for diagnosis and the possibility of giving periodic feedback to specialists back at the hospital.
“It is totally non-invasive”, notes Pau Rodríguez. “The advantages of using computer vision in diagnosis assessment are varied: the materials of mechanical sensors sometimes cause symptoms, allergic skin reactions, for instance. In other cases, users need to be fully accustomed to devices and their functioning”.
The system will be put to use in the hospitals with which the team in France has been collaborating, and more specifically, with the Cognition, behaviour and technology Unit and the CHU memory Centre at the University Cote d’Azur, Institute Claude Pompidou, in Nice.
“It’s the first time we’ve used Deep Learning for the detection of Alzheimer’s and we think results are highly promising”, Dr. Jordi Gonzalez states, optimistic. The system has been built on images of real AD patients compiled by clinicians at the Claude Pompidou Institute. Posed as a patient-oriented solution, the research opens the door to personalised monitoring of AD and a new tool for caregivers and family; a notable improvement for a disease that has been growing fast and steadily in Europe for the last 20 years.
The research leading to the results obtained in this work has been partially supported by the French ANR Safee project, INRIA Large-scale initiative action called PAL (Personally Assisted Living), the Spanish project TIN2015-65464-R (MINECO/FEDER), the 2016FI_B 01163 grant of Generalitat de Catalunya, and the COST Action IC1307 European Network on Integrating Vision and Language (iV&L Net) supported by COST (European Cooperation in Science and Technology).
Six CVC papers have been accepted at this year’s European Conference on Computer Vision (ECCV), which will take place in Munich from the 8th to the 14th of September. Most of the papers aren’t accessible yet, and we will be publishing them as they become public.
For now, we only have two available papers and several tentative titles: