When COVID-19 infections in Spain went from a trickle to an exponential growth curve, and a confined population became obsessed with epidemiological prediction models and trend charts, Concha Bielza, professor in the Department of Artificial Intelligence at the Higher Technical School of Computer Engineers of the Polytechnic University of Madrid (UPM), felt that her entire career had led her to this moment. The pandemic brought with it an avalanche of data and information that urgently needed computational methods to turn it into scientific knowledge.
With experience in applying machine learning to biomedicine, bioinformatics, neuroscience and even sport, researchers from the Computational Intelligence Group at the UPM began to work with data from the health crisis in Spain. To their dismay, they found chaos: as has been confirmed over these months, there are no unified criteria, and problems of counting and consolidation are frequent. At that point, Professor Bielza began the laborious job of going from hospital to hospital, knocking on doors to ask for access to their clinical information.
The mathematician's perseverance has paid off. Building on the dataset released by HM Hospitales, COVID Data Saves Lives, her project for an artificial intelligence model that predicts how a patient will evolve, based on their prognosis on admission and on how effective different treatments are for them, has received one of the special grants awarded by the BBVA Foundation to research teams addressing the various facets of the pandemic. Three of the main hospitals in Madrid (the Ramón y Cajal, the Jiménez Díaz Foundation and the Sanitas-La Zarzuela hospital) are now adding their accumulated experience in treating COVID-19.
What were the first steps of the project like, at the height of the health crisis?
We have 25 years of experience analyzing data. Not only in medicine; we also work on things as diverse as football. Back in early March, we wanted to do something with the pandemic data, but we had a hard time finding any. It so happened that we have a former student from China, and he was able to provide us with data from a hospital near Wuhan. So we started with information from China, when the logical thing would have been to work on Spain, and more specifically Madrid, which was suffering so much. But after a few days, they told us that in the end we couldn’t use the data, because they were going to try a new treatment and wanted to publish it. That’s what happens with data, which is like oil …
It is very valuable.
Very valuable, yes. We then started making calls, but those were very chaotic days. Then we saw that HM Hospitales had released some 2,000 clinical records of patients admitted to its health centers from February 24 to April 24. We submitted a request, which was approved by its data science committee, and we started looking at what information there was. Shortly afterwards, the BBVA Foundation launched its special call for grants.
The Jiménez Díaz Foundation, the Ramón y Cajal Hospital and the Hospital de la Zarzuela are also participating in the project. How did they get ‘recruited’?
When we were preparing the BBVA Foundation grant application, we told ourselves that it made no sense to do an analysis with a single database, all the more so when the treatment of patients is changing so much as the disease becomes better understood. We were interested in analyzing much more data, and from different hospitals. All three, which are university hospitals, agreed to share their data and to collaborate in the project, also contributing their knowledge of clinical practice. The project officially started on the ninth, and right now one of the most difficult things is making the data homogeneous across the three hospitals. In the second year, I would love to bring in more. In the past I have worked with hospitals such as Gregorio Marañón, and I do not rule out contacting them. We have asked the Community of Madrid to help us open doors, which would benefit the project.
What difficulties have you encountered in getting organized? On your side, you were confined; on the hospital side, there was saturation; and in the middle, the problems of unifying data that we are seeing in Spain.
At the moment we organize ourselves through virtual meetings, telephone and email. The Jiménez Díaz Foundation groups together four hospitals, which is very positive. It has committed to providing data from 4,000 patients; the Ramón y Cajal, another 2,400; Sanitas-La Zarzuela, 600 … So I hope to reach 10,000 patients if we include those from HM Hospitales. That would be the ideal minimum volume, but the more the merrier. We aim to keep feeding the model with data over the next two years of the project, to cover a wider range of cases and generalize better. Something interesting would be to compare the treatments used at the peak of the pandemic with what is happening now, and see what the data say about whether a change has actually occurred.
What data do you need hospitals to provide?
The objectives include predicting intubation and mortality, estimating the effectiveness of treatment and, indeed, any question that occurs to us to ask the model. We need data on age and sex; risk factors such as smoking, diabetes, hypertension, ischemic events and cholesterol; whether the patient has cancer, COPD … In the HM Hospitales database we had vital signs at the time of admission: temperature, pulse, saturation … then, after admission to the ICU, all the markers of the inflammatory response, the viral load … Then all the treatments, of which there have been many. And the final outcome. The important thing will be to see what the hospitals are collecting and try to homogenize it, although there are machine learning techniques that can impute a missing value: for example, if some hospitals have begun to record symptoms such as abdominal pain or diarrhea and others have not.
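The interview does not say which imputation technique the team uses. As a rough illustration of the idea only, the sketch below fills in a symptom that one hospital did not record by taking a majority vote among the most similar patients that do have it recorded. The function name `impute_knn`, the field names and the toy records are all invented for this example.

```python
from collections import Counter

def impute_knn(records, target, k=3):
    """Fill missing `target` values (None) with the majority value among
    the k most similar complete records. Similarity is the number of
    other fields with equal values. Hypothetical helper, not the
    project's actual method."""
    complete = [r for r in records if r[target] is not None]
    if not complete:
        raise ValueError("no complete records to learn from")
    filled = []
    for r in records:
        if r[target] is not None:
            filled.append(dict(r))
            continue
        features = [f for f in r if f != target]
        # Rank complete records by how many features they share with r.
        ranked = sorted(
            complete,
            key=lambda c: sum(c[f] == r[f] for f in features),
            reverse=True,
        )
        votes = Counter(c[target] for c in ranked[:k])
        filled.append({**r, target: votes.most_common(1)[0][0]})
    return filled

# Toy data (invented): the last patient comes from a hospital that
# did not record diarrhea on admission.
records = [
    {"fever": True, "cough": True, "diarrhea": True},
    {"fever": True, "cough": True, "diarrhea": True},
    {"fever": False, "cough": False, "diarrhea": False},
    {"fever": False, "cough": True, "diarrhea": False},
    {"fever": True, "cough": True, "diarrhea": None},
]
filled = impute_knn(records, "diarrhea")
print(filled[-1])
```

The original records are left untouched; the function returns filled copies, so the imputed values can always be told apart from the recorded ones.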
What ‘machine learning’ would do is take all the factors at the time of admission, relate them to different treatments and infer the probable results.
Yes, you could put it that way, focusing on the result. The model will not only tell me the probability that the patient will end up dying or intubated, which is very important, but it will also show me, in the form of a graph of circles and arrows, how the factors relate to one another. It is very transparent, and it is what the field has been demanding recently: Explainable Artificial Intelligence. There are ‘black box’ models that only show you the result, without explaining why. But what we are going to use, Bayesian networks, lets us ask very specific questions about a given patient and know the reason for the answer. What drugs are associated with high mortality, what factors lead to successful treatment, what is the most likely profile of the deceased … these are questions that we can answer.
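To make the "circles and arrows" idea concrete, here is a minimal sketch of a Bayesian network queried by brute-force enumeration. The structure (diabetes and low saturation influence intubation, which influences death) and every probability in it are invented for illustration; they are not the project's model or real clinical figures, and the names `NETWORK`, `joint` and `query` are hypothetical.

```python
from itertools import product

# Each node maps to (parents, CPT). The CPT maps a tuple of parent
# values to P(node=True | parents). All numbers are made up.
NETWORK = {
    "diabetes": ((), {(): 0.2}),
    "low_saturation": ((), {(): 0.3}),
    "intubation": (
        ("diabetes", "low_saturation"),
        {(True, True): 0.6, (True, False): 0.2,
         (False, True): 0.4, (False, False): 0.05},
    ),
    "death": (("intubation",), {(True,): 0.4, (False,): 0.05}),
}

def joint(assignment):
    """Probability of one full assignment: product of each node's CPT entry."""
    p = 1.0
    for node, (parents, cpt) in NETWORK.items():
        p_true = cpt[tuple(assignment[q] for q in parents)]
        p *= p_true if assignment[node] else 1.0 - p_true
    return p

def query(target, evidence):
    """P(target=True | evidence), summing the joint over unobserved nodes."""
    hidden = [n for n in NETWORK if n not in evidence]
    totals = {True: 0.0, False: 0.0}
    for values in product([True, False], repeat=len(hidden)):
        assignment = {**evidence, **dict(zip(hidden, values))}
        totals[assignment[target]] += joint(assignment)
    return totals[True] / (totals[True] + totals[False])

# Predictive question: risk of death for a specific patient profile.
risk = query("death", {"diabetes": True, "low_saturation": True})
# Diagnostic question: how likely is diabetes among the deceased?
profile = query("diabetes", {"death": True})
print(risk, profile)
```

The same network answers both kinds of question, from admission data forward to the outcome and from the outcome back to the likely patient profile, and each answer can be traced through the arrows. Enumeration grows exponentially with the number of variables, so a real network with many clinical variables would need a dedicated inference algorithm.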
And all this would result not only in better survival and recovery, but in shorter hospital stays and less strain on the health system.
Of course. If the treatment is more effective, patients will be discharged earlier and hospitals will be less saturated. But another thing I attach great importance to is our intention to make our model available to the medical community so that they can use it from a web platform. The idea is that a clinician fills in the variables for their patient and obtains an answer for the outcome they are interested in.
Why hasn’t this type of technology been applied more to the fight against the pandemic in Spain? Not even the tracking app has gotten off the ground.
Well, I would like to know that myself. I, who dedicate myself to this, saw from the first moment the opportunity of my life to be able to help. But there was no data. Thanks to the help of the BBVA Foundation, the project is now seen differently. It is a very competitive call; in my area, four out of 150 projects have been selected. And that brings greater commitment: they realize that you are worth it. I understand that hospitals already have enough to deal with day to day, but having the information of thousands of patients is far more than a doctor can see in a lifetime. As for digitizing and sharing health data, it will be a revolution, but from my point of view we are already late compared with other countries around us.
After having worked in so many areas, do you have the feeling that sharing our health data anonymously causes us more concern than handing over other types of personal information that we do freely give?
It’s a matter of changing your mentality: every day you leave a trail of information on social networks from which many features of your personality can be blatantly inferred. Why don’t we do the same in medicine? In the case of the coronavirus, people don’t want to be confined, or singled out. In my own family, we are many cousins, and in our WhatsApp group they talk about anything, but if someone is sick, it is hard for them to say so. I find out some other way. People are afraid, they feel stigmatized, controlled by the Government … instead of thinking about what is necessary to stop the virus, as they do in other countries in the most natural way. Just as you donate organs, awareness must be raised about “donating” your data on the virus for the common good. It is about contributing a very important grain of sand to stopping a global pandemic. And there are many ways to ensure privacy. There may have been security flaws that turn people against it, but you have to be aware that this will help. If we do not develop a social conscience, we are not going anywhere.