data analysis - RISC2 Project

Scientific Machine Learning and HPC

wp_risc — Wed, 28 Jun 2023 08:24:28 +0000

In recent years we have seen rapid growth in interest in artificial intelligence in general, and in machine learning (ML) techniques in particular, across different branches of science and engineering. The rapid growth of the Scientific Machine Learning field derives from the combined development and use of efficient data analysis algorithms, the availability of data from scientific instruments and computer simulations, and advances in high-performance computing. On May 25, 2023, COPPE/UFRJ organized a forum to discuss developments in Artificial Intelligence and their impact on society [*].

Alvaro Coutinho, coordinator of the High Performance Computing Center (Nacad) at COPPE/UFRJ, presented advances in AI in Engineering and the importance of multidisciplinary research networks to address current issues in Scientific Machine Learning. He took the opportunity to highlight the need for Brazil to invest in high-performance computing capacity.

The country’s sovereignty requires autonomy in producing ML advances, which depends on HPC support at universities and research centers. Brazil has nine machines in the Top 500 list of the most powerful computer systems in the world, but almost all of them are at Petrobras, and universities need much more. ML is well known to require HPC; when it is combined with scientific computer simulations, HPC becomes essential.

The conventional notion of ML involves training an algorithm to automatically discover patterns, signals, or structures that may be hidden in huge databases and whose exact nature is unknown and therefore cannot be explicitly programmed. This approach faces two major drawbacks: it needs a significant volume of (labelled) data that is expensive to acquire, and it extrapolates poorly, which makes predictions beyond the scenarios contained in the training data difficult.

Considering that an algorithm’s predictive ability is a learned skill, current challenges must be addressed to improve the analytical and predictive capacity of Scientific ML algorithms, for example, to maximize their impact in renewable energy applications. References [1-5] illustrate recent advances in Scientific Machine Learning in different areas of engineering and computer science.

References:

[*] https://www.coppe.ufrj.br/pt-br/planeta-coppe-noticias/noticias/coppe-e-sociedade-especialistas-debatem-os-reflexos-da-inteligencia

[1] Baker, Nathan, Alexander, Frank, Bremer, Timo, Hagberg, Aric, Kevrekidis, Yannis, Najm, Habib, Parashar, Manish, Patra, Abani, Sethian, James, Wild, Stefan, Willcox, Karen, and Lee, Steven. Workshop Report on Basic Research Needs for Scientific Machine Learning: Core Technologies for Artificial Intelligence. United States: N. p., 2019. Web. doi:10.2172/1478744.

[2] Brunton, Steven L., Bernd R. Noack, and Petros Koumoutsakos. “Machine learning for fluid mechanics.” Annual Review of Fluid Mechanics 52 (2020): 477-508.

[3] Karniadakis, George Em, et al. “Physics-informed machine learning.” Nature Reviews Physics 3.6 (2021): 422-440.

[4] Inria White Book on Artificial Intelligence: Current challenges and Inria’s engagement, 2nd edition, 2021. URL: https://www.inria.fr/en/white-paper-inria-artificial-intelligence

[5] Silva, Romulo, Umair bin Waheed, Alvaro Coutinho, and George Em Karniadakis. “Improving PINN-based Seismic Tomography by Respecting Physical Causality.” In AGU Fall Meeting Abstracts, vol. 2022, pp. S11C-09. 2022.

The post Scientific Machine Learning and HPC first appeared on RISC2 Project.

Latin American researchers present greener gateways for Big Data in INRIA Brazil Workshop

wp_risc — Wed, 03 May 2023 13:29:03 +0000

In the scope of the RISC2 Project, the State University of Sao Paulo and INRIA (Institut National de Recherche en Informatique et en Automatique), a renowned French research institute, held a workshop that set the stage for presenting the results achieved in the project Developing Efficient Scientific Gateways for Bioinformatics in Supercomputer Environments Supported by Artificial Intelligence.

The goal of the investigation is to provide users with simplified access to computing infrastructure through scientific solutions that represent significant developments in their fields. This project intends to develop intelligent, green scientific solutions for BioinfoPortal (a multiuser Brazilian infrastructure) supported by High-Performance Computing environments.

Technologically, the project spans areas such as scientific workflows, data mining, machine learning, and deep learning. If it succeeds, the analysis and interpretation of Big Data will open new paths in molecular biology, genetics, biomedicine, and health. That requires tools capable of efficiently digesting the volume of information that may arrive.

The team performed several large-scale bioinformatics experiments that are considered to be computationally intensive. Currently, artificial intelligence is being used to generate models that analyze computational and bioinformatics metadata, in order to understand how machine learning can predict computational resource needs efficiently. The workshop was held from April 10th to 11th at the University of Sao Paulo.

The RISC2 Project, which aims to explore the impact of HPC on the economies of Latin America and Europe, relies on the interaction between researchers and policymakers in both regions. It includes 16 academic partners, among them the University of Buenos Aires, the National Laboratory for High Performance Computing of Chile, the Jülich Supercomputing Centre, and the Barcelona Supercomputing Center (the leader of the consortium).


Developing Efficient Scientific Gateways for Bioinformatics in Supercomputer Environments Supported by Artificial Intelligence

wp_risc — Mon, 20 Mar 2023 09:37:46 +0000

Scientific gateways bring enormous benefits to end users by simplifying access and hiding the complexity of the underlying distributed computing infrastructure, but they require significant development and maintenance efforts. BioinfoPortal^[1], a multiuser Brazilian science gateway, takes advantage of the heterogeneous resources of Santos Dumont^[3] through its CSGrid^[2] middleware. However, task submission still requires a substantial step: deciding on the configuration that leads to efficient execution. This project aims to develop green and intelligent scientific gateways for BioinfoPortal, supported by high-performance computing (HPC) environments and specialised technologies such as scientific workflows, data mining, machine learning, and deep learning. The efficient analysis and interpretation of Big Data opens new challenges in exploring molecular biology, genetics, biomedicine, and healthcare to improve personalised diagnostics and therapeutics, so finding new avenues to deal with this massive amount of information becomes necessary. New Bioinformatics and Computational Biology paradigms drive storage, management, and data access. HPC and Big Data advances in this domain represent both a vast new field of opportunities for bioinformatics researchers and a significant challenge. We present several challenges for efficiently executing applications and discuss our findings on improving the use of computational resources. We performed several large-scale bioinformatics experiments that are considered computationally intensive and time-consuming. We are currently applying artificial intelligence to generate models that analyze computational and bioinformatics metadata, in order to understand how machine learning can predict the efficient use of computational resources.

The computational executions are conducted on Santos Dumont, the largest supercomputer in Latin America, dedicated to the research community, with 5.1 petaflops and 36,472 computational cores distributed across 1,134 computational nodes.

By:

Carneiro, B. Fagundes, C. Osthoff, G. Freire, K. Ocaña, L. Cruz, L. Gadelha, M. Coelho, M. Galheigo, and R. Terra are with the National Laboratory of Scientific Computing, Rio de Janeiro, Brazil.

Carvalho is with the Federal Center for Technological Education Celso Suckow da Fonseca, Rio de Janeiro, Brazil.

Douglas Cardoso is with the Polytechnic Institute of Tomar, Portugal.

Boito and L. Teylo are with the University of Bordeaux, CNRS, Bordeaux INP, INRIA, LaBRI, Talence, France.

Navaux is with the Informatics Institute, Federal University of Rio Grande do Sul, Rio Grande do Sul, Brazil.

References:

Ocaña, K. A. C. S.; Galheigo, M.; Osthoff, C.; Gadelha, L. M. R.; Porto, F.; Gomes, A. T. A.; Oliveira, D.; Vasconcelos, A. T. BioinfoPortal: A scientific gateway for integrating bioinformatics applications on the Brazilian national high-performance computing network. Future Generation Computer Systems, v. 107, p. 192-214, 2020.

Mondelli, M. L.; Magalhães, T.; Loss, G.; Wilde, M.; Foster, I.; Mattoso, M. L. Q.; Katz, D. S.; Barbosa, H. J. C.; Vasconcelos, A. T. R.; Ocaña, K. A. C. S; Gadelha, L. BioWorkbench: A High-Performance Framework for Managing and Analyzing Bioinformatics Experiments. PeerJ, v. 1, p. 1, 2018.

Coelho, M.; Freire, G.; Ocaña, K.; Osthoff, C.; Galheigo, M.; Carneiro, A. R.; Boito, F.; Navaux, P.; Cardoso, D. O. Desenvolvimento de um Framework de Aprendizado de Máquina no Apoio a Gateways Científicos Verdes, Inteligentes e Eficientes: BioinfoPortal como Caso de Estudo Brasileiro In: XXIII Simpósio em Sistemas Computacionais de Alto Desempenho – WSCAD 2022 (https://wscad.ufsc.br/), 2022.

Terra, R.; Ocaña, K.; Osthoff, C.; Cruz, L.; Boito, F.; Navaux, P.; Carvalho, D. Framework para a Construção de Redes Filogenéticas em Ambiente de Computação de Alto Desempenho. In: XXIII Simpósio em Sistemas Computacionais de Alto Desempenho – WSCAD 2022 (https://wscad.ufsc.br/), 2022.

Ocaña, K.; Cruz, L.; Coelho, M.; Terra, R.; Galheigo, M.; Carneiro, A.; Carvalho, D.; Gadelha, L.; Boito, F.; Navaux, P.; Osthoff, C. ParslRNA-Seq: an efficient and scalable RNAseq analysis workflow for studies of differentiated gene expression. In: Latin America High-Performance Computing Conference (CARLA), 2022, Rio Grande do Sul, Brazil. Proceedings of the Latin American High-Performance Computing Conference – CARLA 2022 (http://www.carla22.org/), 2022.

^[1] https://bioinfo.lncc.br/

^[2] https://git.tecgraf.puc-rio.br/csbase-dev/csgrid/-/tree/CSGRID-2.3-LNCC

^[3] https://sdumont.lncc.br


Mapping human brain functions using HPC

wp_risc — Wed, 01 Feb 2023 13:17:19 +0000

ContentMAP is the first Portuguese project in the field of Psychology and Cognitive Neuroscience to be awarded a European Research Council grant (ERC Starting Grant #802553). The project maps how the human brain represents object knowledge: for example, how does one represent in the brain everything one knows about a knife (that it cuts, that it has a handle, that it is made of metal and plastic or metal and wood, that it has a serrated and sharp part, that it is smooth and cold, etc.)? To do this, the project collects numerous functional MRI (fMRI) images while participants see and interact with objects. HPC (High Performance Computing) is of central importance for processing these images. It has made it possible to manipulate these data and to perform machine learning analyses and complex computations in a timely manner.

Humans are particularly efficient at recognising objects. Think about what surrounds us: one recognises the object one is reading this text from as a screen, the place where one sits as a chair, the utensil from which one drinks coffee as a cup, and one does all of this extremely quickly and virtually automatically. One is able to do all this despite the fact that 1) one holds large amounts of information about each object (if one were asked to write down everything one knows about a pen, one would certainly have a lot to say); and that 2) there are several exemplars of each object type (a glass can be tall, made out of glass, metal, paper or plastic, it can be different colours, etc., but despite that, any of them would still be a glass). How does one do this? How is one able to store and process so much information in the process of recognising a glass, and to generalise across all the different instances of a glass to get the concept “glass”? The goal of ContentMAP is to understand the processes that lead to successful object recognition.

The answer to these questions lies in a better understanding of the organisational principles of information in the brain. It is, in fact, the efficient organisation of conceptual information and object representations in the brain that allows each of us to quickly and efficiently recognise the keyboard in front of us. To study the neuronal organisation of object knowledge, the project collects large sets of fMRI data from several participants and then tries to decode the organisational principles of information in the brain.

Given the amount of data and the computational requirements of this type of data at the pre-processing and post-processing levels, the use of HPC is essential to enable these studies to be conducted in a timely manner. For example, at the post-processing level, the project uses whole-brain Support Vector Machine classification algorithms (searchlight procedures) that require hundreds of thousands of classifiers to be trained. Moreover, for each of these classifiers one needs to compute a sample distribution of the average, as well as test the various classifications of interest, and this has to be done per participant.
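To give a feel for why a searchlight analysis multiplies into so many classifiers, here is a minimal sketch in plain Python. It is not the project's pipeline: the 4x4x4 voxel grid, the synthetic trials, and the function names are all hypothetical, and a simple nearest-centroid classifier is swapped in for the Support Vector Machine to keep the example dependency-free. The structure is the point: one cross-validated classifier per voxel neighbourhood.

```python
import itertools
import random

def neighborhood(center, coords, radius):
    """Indices of voxels whose Euclidean distance to `center` is <= radius."""
    r2 = radius * radius
    return [i for i, c in enumerate(coords)
            if sum((a - b) ** 2 for a, b in zip(c, center)) <= r2]

def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def loo_accuracy(X, y):
    """Leave-one-out accuracy of a nearest-centroid classifier (SVM stand-in)."""
    hits = 0
    for i in range(len(X)):
        train = [(x, lab) for j, (x, lab) in enumerate(zip(X, y)) if j != i]
        cents = {lab: centroid([x for x, l in train if l == lab])
                 for lab in {l for _, l in train}}
        pred = min(cents, key=lambda lab: sq_dist(X[i], cents[lab]))
        hits += int(pred == y[i])
    return hits / len(X)

def searchlight(data, labels, coords, radius):
    """Return one cross-validated accuracy per voxel neighbourhood."""
    scores = []
    for center in coords:
        idx = neighborhood(center, coords, radius)
        X = [[trial[i] for i in idx] for trial in data]
        scores.append(loo_accuracy(X, labels))
    return scores

# Hypothetical toy data: a 4x4x4 "brain" (64 voxels) and 20 trials, two conditions.
rng = random.Random(0)
coords = list(itertools.product(range(4), repeat=3))
labels = [0, 1] * 10
# Voxels with x == 0 carry a strong condition signal; all others are pure noise.
data = [[(10.0 * lab if c[0] == 0 else 0.0) + rng.uniform(-0.5, 0.5) for c in coords]
        for lab in labels]

scores = searchlight(data, labels, coords, radius=1)
classifiers_trained = len(coords) * len(labels)  # one fit per voxel per fold
```

Even this toy case fits 64 x 20 = 1,280 classifiers. A whole-brain volume has orders of magnitude more voxels, and permutation tests and multiple participants multiply the count again, which is where the cluster becomes necessary.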

Because of this, the use of the HPC facilities of the Advanced Computing Laboratory (LCA) at the University of Coimbra is crucial. It allows us to perform these analyses in one to two weeks, something that would take a few months on our 14-core computers and would, in practice, most probably mean that the analysis would not be done.

By Faculty of Psychology and Educational Sciences, University of Coimbra

Reference

ProAction Lab http://proactionlab.fpce.uc.pt/


Webinar: Developing complex workflows that include HPC, Artificial Intelligence and Data Analytics

wp_risc — Tue, 24 Jan 2023 10:51:32 +0000

Date: February 22, 2023 | 4 p.m. (UTC)

Speaker: Rosa M. Badia, Barcelona Supercomputing Center

Moderator: Esteban Mocskos, Universidad de Buenos Aires

The evolution of High-Performance Computing (HPC) systems towards ever more complex machines is opening the opportunity to host larger and more heterogeneous applications. In this sense, the demand for applications that are not purely HPC, but that combine aspects of Artificial Intelligence and/or Data Analytics, is becoming more common. However, there is a lack of environments that support the development of these complex workflows. The webinar will present PyCOMPSs, a parallel task-based programming model in Python. Based on simple annotations, sequential Python programs can be executed in parallel on HPC clusters and other distributed infrastructures.
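As a rough illustration of what "simple annotations" means, the sketch below marks two ordinary functions as tasks with the PyCOMPSs `@task` decorator and synchronises the final result with `compss_wait_on`. The functions `square`, `add`, and `sum_of_squares` are hypothetical examples, not taken from the webinar. The fallback branch is only a stand-in so that the same file also runs sequentially where PyCOMPSs is not installed.

```python
try:
    from pycompss.api.task import task
    from pycompss.api.api import compss_wait_on
except ImportError:
    # Stand-ins (assumption for this sketch): without PyCOMPSs, tasks are
    # plain function calls and waiting on a result is a no-op.
    def task(**_kwargs):
        def decorator(func):
            return func
        return decorator

    def compss_wait_on(obj):
        return obj

@task(returns=1)
def square(x):
    # Each call can become an independent task scheduled by the runtime.
    return x * x

@task(returns=1)
def add(a, b):
    # Depends on its two inputs; the runtime tracks this data dependency.
    return a + b

def sum_of_squares(values):
    partials = [square(v) for v in values]   # independent tasks
    total = 0
    for p in partials:
        total = add(total, p)                # chain of dependent tasks
    return compss_wait_on(total)             # block until the value is ready

result = sum_of_squares([1, 2, 3, 4])
```

The main program stays sequential Python. When launched through the COMPSs runtime, the calls to `square` have no dependencies on each other and can run in parallel, while the `add` chain is ordered by its data dependencies.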

PyCOMPSs has been extended to support tasks that invoke HPC applications and can be combined with Artificial Intelligence and Data analytics frameworks.

Some of these extensions are made in the framework of the eFlows4HPC project, which in addition is developing the HPC Workflows as a Service (HPCWaaS) methodology to make the development, deployment, execution and reuse of workflows easier. The webinar will present the current status of the PyCOMPSs programming model and how it is being extended in the eFlows4HPC project towards the project needs. Also, the HPCWaaS methodology will be introduced.

About the speaker: Rosa M. Badia holds a PhD in Computer Science (1994) from the Technical University of Catalonia (UPC). She is the manager of the Workflows and Distributed Computing research group at the Barcelona Supercomputing Center (BSC).

Her current research interests are programming models for complex platforms (from edge and fog to Clouds and large HPC systems). The group led by Dr. Badia has been developing the StarSs programming model for more than 15 years, with great success in adoption by application developers. Currently the group focuses its efforts on PyCOMPSs/COMPSs, an instance of the programming model for distributed computing, including Cloud.

Dr. Badia has published nearly 200 papers in international conferences and journals on the topics of her research. Her group is very active in projects funded by the European Commission and in contracts with industry. Dr. Badia is the PI of the eFlows4HPC project.

Registrations are now closed.


JUPITER Ascending – First European Exascale Supercomputer Coming to Jülich

wp_risc — Mon, 02 Jan 2023 12:14:22 +0000

It was finally decided in 2022: Forschungszentrum Jülich will be home to Europe’s first exascale computer. The supercomputer is set to be the first in Europe to surpass the threshold of one quintillion (“1” followed by 18 zeros) calculations per second. The system will be acquired by the European supercomputing initiative EuroHPC JU. The exascale computer should help to solve important and urgent scientific questions regarding, for example, climate change, how to combat pandemics, and sustainable energy production, while also enabling the intensive use of artificial intelligence and the analysis of large data volumes. The overall costs for the system amount to 500 million euros. Of this total, 250 million euros is being provided by EuroHPC JU and a further 250 million euros in equal parts by the German Federal Ministry of Education and Research (BMBF) and the Ministry of Culture and Science of the State of North Rhine-Westphalia (MKW NRW).

The computer, named JUPITER (short for “Joint Undertaking Pioneer for Innovative and Transformative Exascale Research”), will be installed in 2023/2024 on the campus of Forschungszentrum Jülich. It is intended that the system will be operated by the Jülich Supercomputing Centre (JSC), whose supercomputers JUWELS and JURECA currently rank among the most powerful in the world. JSC participated in the application procedure for a high-end supercomputer as a member of the Gauss Centre for Supercomputing (GCS), an association of the three German national supercomputing centres: JSC in Jülich, the High-Performance Computing Center Stuttgart (HLRS), and the Leibniz Supercomputing Centre (LRZ) in Garching. The competition was organized by the European supercomputing initiative EuroHPC JU, which was formed by the European Union together with European countries and private companies.

JUPITER is now set to become the first European supercomputer to make the leap into the exascale class. In terms of computing power, it will be more powerful than 5 million modern laptops or PCs. Just like Jülich’s current supercomputer JUWELS, JUPITER will be based on a dynamic, modular supercomputing architecture, which Forschungszentrum Jülich developed together with European and international partners in the EU’s DEEP research projects.
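The laptop comparison can be turned into a quick back-of-the-envelope check. The sketch below assumes the exascale threshold of exactly 10^18 operations per second and takes the figure of 5 million laptops at face value; the per-laptop number is only what those two figures imply, not a measured value.

```python
# Assumptions: exactly one exa-operation per second, split over 5 million laptops.
EXA_OPS_PER_SECOND = 10**18
LAPTOPS = 5_000_000

ops_per_laptop = EXA_OPS_PER_SECOND / LAPTOPS   # operations per second per laptop
gigaops_per_laptop = ops_per_laptop / 1e9       # same figure in units of 10^9
```

Under these assumptions each laptop would have to sustain 2 x 10^11 operations per second, i.e. about 200 billion operations per second.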

In a modular supercomputer, various computing modules are coupled together. This enables program parts of complex simulations to be distributed over several modules, ensuring that the various hardware properties can be optimally utilized in each case. Its modular construction also means that the system is well prepared for integrating future technologies such as quantum computing or neuromorphic modules, which emulate the neural structure of a biological brain.

Figure: Modular Supercomputing Architecture. Computing and storage modules of the exascale computer in its base configuration (blue), as well as optional modules (green) and modules for future technologies (purple) as possible extensions.

In its base configuration, JUPITER will have an enormously powerful booster module with highly efficient GPU-based computation accelerators. Massively parallel applications are accelerated by this booster in a similar way to a turbocharger, for example to calculate high-resolution climate models, develop new materials, simulate complex cell processes and energy systems, advance basic research, or train next-generation, computationally intensive machine-learning algorithms.

One major challenge is the energy that is required for such large computing power. The average power is anticipated to be up to 15 megawatts. JUPITER has been designed as a “green” supercomputer and will be powered by green electricity. The envisaged warm water cooling system should help to ensure that JUPITER achieves the highest efficiency values. At the same time, the cooling technology opens up the possibility of intelligently using the waste heat that is produced. For example, just like its predecessor system JUWELS, JUPITER will be connected to the new low-temperature network on the Forschungszentrum Jülich campus. Further potential applications for the waste heat from JUPITER are currently being investigated by Forschungszentrum Jülich.
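To put the 15 megawatt figure in perspective, the following sketch derives two rough quantities from it. Both rest on stated assumptions rather than published specifications: that the machine sustains exactly 10^18 operations per second while drawing the full 15 MW, and that it runs continuously for a 365-day year.

```python
# Assumptions: exactly 1e18 operations per second at an average draw of 15 MW,
# running 24 hours a day for 365 days.
OPS_PER_SECOND = 1e18
POWER_WATTS = 15e6
HOURS_PER_YEAR = 24 * 365

# Energy efficiency: operations per second delivered per watt, in units of 1e9.
gigaops_per_watt = OPS_PER_SECOND / POWER_WATTS / 1e9

# Yearly energy use in gigawatt-hours (MW * h = MWh; 1000 MWh = 1 GWh).
annual_gwh = (POWER_WATTS / 1e6) * HOURS_PER_YEAR / 1000
```

Under these assumptions the system would deliver roughly 67 billion operations per second per watt and consume about 131 GWh per year, which is why the power source and the reuse of waste heat matter at this scale.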

By Jülich Supercomputing Centre (JSC)

Image: Germany’s fastest supercomputer, JUWELS, at Forschungszentrum Jülich. JUWELS is funded in equal parts by the Federal Ministry of Education and Research (BMBF) and the Ministry of Culture and Science of the State of North Rhine-Westphalia (MKW NRW) via the Gauss Centre for Supercomputing (GCS). (Copyright: Forschungszentrum Jülich / Sascha Kreklau)


Managing Data and Machine Learning Models in HPC Applications

wp_risc — Mon, 21 Nov 2022 14:09:42 +0000

The synergy of data science (including big data and machine learning) and HPC yields many benefits for data-intensive applications in terms of more accurate predictive data analysis and better decision making. For instance, in the context of the HPDaSc (High Performance Data Science) project between Inria and Brazil, we have shown the importance of realtime analytics to make critical high-consequence decisions in HPC applications, e.g., preventing useless drilling based on a driller’s realtime data and realtime visualization of simulated data, or the effectiveness of ML to deal with scientific data, e.g., computing Probability Density Functions (PDFs) over simulated seismic data using Spark.
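As an illustration of how a probability density function can be estimated over partitioned data, here is a minimal histogram-based sketch in plain Python. It is not the HPDaSc implementation and does not use Spark. It only mimics the map-and-reduce shape such a job might take, and the partitions, value range, and function names are hypothetical.

```python
from collections import Counter
from functools import reduce

def bin_counts(values, lo, hi, nbins):
    """Map step: histogram counts for one partition of the data."""
    width = (hi - lo) / nbins
    counts = Counter()
    for v in values:
        if v < lo or v > hi:
            continue                       # ignore values outside the range
        b = min(int((v - lo) // width), nbins - 1)   # hi falls in the last bin
        counts[b] += 1
    return counts

def estimate_pdf(partitions, lo, hi, nbins):
    """Reduce step: merge per-partition counts, then normalise to a density."""
    total = reduce(lambda a, b: a + b,
                   (bin_counts(p, lo, hi, nbins) for p in partitions),
                   Counter())
    n = sum(total.values())
    width = (hi - lo) / nbins
    return [total[b] / (n * width) for b in range(nbins)]

# Hypothetical simulated values, already split into three partitions.
partitions = [[0.1, 0.2, 0.6], [0.7, 0.9, 1.0], [0.3, 0.4]]
pdf = estimate_pdf(partitions, lo=0.0, hi=1.0, nbins=2)
area = sum(p * 0.5 for p in pdf)           # bin width is 0.5, so this is the integral
```

Because each partition is counted independently and the counts are simply added, the map step can run wherever the data lives and only small count tables need to be exchanged.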

However, to realize the full potential of this synergy, ML models (or models for short) must be built, combined and ensembled, which can be very complex as there can be many models to select from. Furthermore, they should be shared and reused, in particular, in different execution environments such as HPC or Spark clusters.

To address this problem, we proposed Gypscie [Porto 2022, Zorrilla 2022], a new framework that supports the entire ML lifecycle and enables model reuse and import from other frameworks. The approach behind Gypscie is to combine several rich capabilities for model and data management, and model execution, which are typically provided by different tools, in a single framework. Overall, Gypscie provides: a platform for supporting the complete model life-cycle, from model building to deployment, monitoring and policy enforcement; an environment for casual users to find ready-to-use models that best fit a particular prediction problem; an environment to optimize ML task scheduling and execution; an easy way for developers to benchmark their models against other competitive models and improve them; a central point of access to assess models’ compliance with policies and ethics and to obtain and curate observational and predictive data; and provenance information and model explainability. Finally, Gypscie interfaces with multiple execution environments to run ML tasks, e.g., an HPC system such as the Santos Dumont supercomputer at LNCC or a Spark cluster.

Gypscie comes with SAVIME [Silva 2020], a multidimensional array in-memory database system for importing, storing and querying model (tensor) data. The SAVIME open-source system has been developed to support analytical queries over scientific data. It offers an extremely efficient ingestion procedure, which practically eliminates the waiting time to analyze incoming data. It also supports dense and sparse arrays and non-integer dimension indexing, and it provides a functional query language processed by a query optimiser that generates efficient query execution plans.
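To make "sparse arrays with non-integer dimension indexing" concrete, here is a toy sketch in plain Python. It is not SAVIME's API, storage layout, or query language. The `Dimension` and `SparseArray` classes and the sample coordinates are hypothetical, and only illustrate the idea of mapping arbitrary coordinate values to integer positions while storing just the cells that exist.

```python
from bisect import bisect_left, bisect_right

class Dimension:
    """Maps sortable coordinate values (e.g. floats) to integer positions."""
    def __init__(self, name, values):
        self.name = name
        self.values = sorted(values)
        self._pos = {v: i for i, v in enumerate(self.values)}

    def position(self, value):
        return self._pos[value]

    def span(self, lo, hi):
        """Integer positions whose coordinate lies in the closed range [lo, hi]."""
        return range(bisect_left(self.values, lo), bisect_right(self.values, hi))

class SparseArray:
    """Stores only the populated cells, keyed by integer positions."""
    def __init__(self, dims):
        self.dims = dims
        self.cells = {}

    def set(self, coords, value):
        key = tuple(d.position(c) for d, c in zip(self.dims, coords))
        self.cells[key] = value

    def subarray(self, bounds):
        """Return (coords, value) pairs for cells inside per-dimension bounds."""
        spans = [d.span(lo, hi) for d, (lo, hi) in zip(self.dims, bounds)]
        return [(tuple(d.values[k] for d, k in zip(self.dims, key)), value)
                for key, value in self.cells.items()
                if all(k in s for k, s in zip(key, spans))]

# Hypothetical example: time in seconds and depth in metres as float dimensions.
time = Dimension("time", [0.0, 0.5, 1.0])
depth = Dimension("depth", [10.0, 12.5, 15.0])
arr = SparseArray([time, depth])
arr.set((0.5, 12.5), 3.2)
arr.set((1.0, 15.0), 4.1)

early = arr.subarray([(0.0, 0.5), (10.0, 15.0)])
```

The range query works on integer positions internally but accepts and returns the original float coordinates, which is the convenience that non-integer indexing is meant to provide.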

References

[Porto 2022] Fabio Porto, Patrick Valduriez: Data and Machine Learning Model Management with Gypscie. CARLA 2022 – Workshop on HPC and Data Sciences meet Scientific Computing, SCALAC, Sep 2022, Porto Alegre, Brazil. pp.1-2.

[Zorrilla 2022] Rocío Zorrilla, Eduardo Ogasawara, Patrick Valduriez, Fabio Porto: A Data-Driven Model Selection Approach to Spatio-Temporal Prediction. SBBD 2022 – Brazilian Symposium on Databases, SBBD, Sep 2022, Buzios, Brazil. pp.1-12.

[Silva 2020] A.C. Silva, H. Lourenço, D. Ramos, F. Porto, P. Valduriez. Savime: An Array DBMS for Simulation Analysis and Prediction. Journal of Information Data Management 11(3), 2020.

By LNCC and Inria


RISC2 receives honors in 2022 HPCwire Readers’ and Editors’ Choice Awards

wp_risc — Thu, 17 Nov 2022 11:17:39 +0000

The RISC2 project has been recognised in the annual HPCwire Readers’ and Editors’ Choice Awards, presented at the 2022 International Conference for High Performance Computing, Networking, Storage, and Analysis (SC22), in Dallas, Texas. RISC2 was selected for the Best HPC Collaboration (Academia/Government/Industry).

Fabrizio Gagliardi, director of RISC2, says: “I feel particularly honored by this recognition on behalf of all the project members who have worked so hard to achieve in such a short time, and with limited resources, a considerable impact in promoting HPC activities in Latin America in collaboration with Europe”.

Editors’ Choice: Best HPC Collaboration (Academia/Government/Industry)

The RISC2 project, following the RISC project, aims to promote and improve the relationship between research and industrial communities, focusing on HPC application and infrastructure deployment, between Europe and Latin America. Led by the Barcelona Supercomputing Center (BSC), RISC2 brings together 16 partners from 12 different countries.

About the HPCwire Readers’ and Editors’ Choice Awards

The list of winners was revealed at the SC22 HPCwire booth and on the HPCwire website.

The coveted annual HPCwire Readers’ and Editors’ Choice Awards are determined through a nomination and voting process with the global HPCwire community, as well as selections from the HPCwire editors. The awards are an annual feature of the publication and constitute prestigious recognition from the HPC community. They are revealed each year to kick off the annual supercomputing conference, which showcases high performance computing, networking, storage, and data analysis.

“The 2022 Readers’ and Editors’ Choice Awards are exceptional, indeed. Solutions developed with HPC led the world out of the Pandemic, and we officially broke the Exascale threshold – HPC has now reached a billion, billion operations per second!” said Tom Tabor, CEO of Tabor Communications, publishers of HPCwire. “Between our worldwide readership of HPC experts and the most renowned panel of editors in the industry, the Readers’ and Editors’ Choice Awards represent resounding recognition of HPC accomplishments throughout the world. Our sincerest gratitude and hearty congratulations go out to all of the winners.”


Seminar on High-Performance Scientific Computing

wp_risc — Wed, 09 Nov 2022 09:53:01 +0000

RISC2 is organizing, together with our partner Universidad de la República Uruguay (UdelaR), a seminar on High-Performance Scientific Computing. RISC2 is participating directly in:

ClusterUY Advanced Usage Tutorial, November 11, 2022, Santiago Iturriaga, UdelaR

Talk: Systems based on blockchain. Studying its behaviour as a distributed system: from emulation to simulation, December 8, 2022, Esteban Mocskos, […]


Leveraging HPC technologies to unravel epidemic dynamics

wp_risc — Mon, 17 Oct 2022 08:10:17 +0000

When we talk about the 14th century, we are probably referring to one of the most adverse periods of human history. It was an era of regular armed conflicts, declining social systems, famine, and disease. It was the time of the bubonic plague pandemic, the Black Death, which wiped out millions of people in Europe, Africa, and Asia [1].

Several factors contributed to the catastrophic outcomes of the Black Death. The crisis was aggravated by the lack of two important components: knowledge and technology. Nobody understood the spread dynamics of the disease, and containment policies were desperately based on assumptions or beliefs. Some opted for self-isolation to get away from the “bad air” that was believed to be the cause of the illness [2]. Others thought the plague was a divine punishment and persecuted heretics in order to “appease the heavens” [3]. Though the first of these two strategies was actually very effective, the second one only deepened the tragedy.

The bubonic plague of the 14th century is a great example of how costly ignorance can be in the context of epidemics. If the transmission mechanisms are not well understood, we are not able to design productive measures against them. We may end up, like our medieval predecessors, making things much worse. Fortunately, advances in science and technology have provided humanity with powerful tools to comprehend infectious diseases and rapidly develop response plans. In this particular matter, epidemic models and simulations have become crucial.

In the recent COVID-19 events, many public health authorities relied on the outcomes of models, so as to determine the most probable paths of the epidemic and make informed decisions regarding sanitary measures [4]. Epidemic models have been around for a long time, and have become more and more sophisticated. One reason is the fact that they feed on data that has to be collected and processed, and which has increased in quantity and variety.

Data contains interesting patterns that give hints about the influence of apparently non-epidemiological factors such as mobility and interaction type [5]. This is how, in the 19th century, John Snow managed to discover the cause of a cholera epidemic in Soho. He plotted the registered cholera cases on a map and saw that they clustered around a water pump that he presumed was contaminated [6]. Thanks to Dr. Snow’s findings, water quality started to be considered an important component of public health.

As models grow in intricacy, the demand for more powerful computing systems also increases. In advanced approaches such as agent-based [7] and network (graph) models [8], every person is represented inside a complex framework in which the infection spreads according to specific rules. These rules could be related to the nature of the relations between individuals, their number of contacts, the places they visit, disease characteristics, and even stochastic influences. Frameworks are commonly composed of millions of individuals too, because we often want to analyze countrywide effects.
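A minimal sketch of such a network model, in plain Python, may help fix ideas. It is a generic susceptible-infectious-recovered (SIR) process on a contact graph, not the model of any of the cited works. The adjacency dictionary, the per-contact infection probability, and the recovery probability are all hypothetical inputs.

```python
import random

def simulate_sir(neighbors, seeds, p_infect, p_recover, steps, rng):
    """Synchronous SIR spread on a contact network.

    neighbors: dict mapping each person to a list of contacts.
    Returns a list of (S, I, R) counts, one tuple per time step.
    """
    state = {node: "S" for node in neighbors}
    for node in seeds:
        state[node] = "I"
    history = []
    for _ in range(steps):
        new_state = dict(state)
        for node, status in state.items():
            if status != "I":
                continue
            # Rule 1: each infectious person may infect each susceptible contact.
            for contact in neighbors[node]:
                if state[contact] == "S" and rng.random() < p_infect:
                    new_state[contact] = "I"
            # Rule 2: each infectious person may recover at the end of the step.
            if rng.random() < p_recover:
                new_state[node] = "R"
        state = new_state
        values = list(state.values())
        history.append((values.count("S"), values.count("I"), values.count("R")))
    return history

# Hypothetical four-person chain of contacts: 0 - 1 - 2 - 3.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
# With both probabilities set to 1, the infection advances one link per step.
history = simulate_sir(chain, seeds=[0], p_infect=1.0, p_recover=1.0,
                       steps=4, rng=random.Random(42))
```

Real models replace the toy chain with millions of nodes and make the rules depend on the kind of contact, the place, and the stage of the disease, but the update loop has the same shape.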

In brief, to unravel epidemic dynamics we need to process and produce a lot of accurate information, and we need to do it fast. High-performance computing (HPC) systems provide high-spec hardware and support advanced techniques such as parallel computing, which accelerates computation by using several resources at once to perform one or several tasks concurrently. This is an advantage for stochastic epidemic models, which require hundreds of independent executions to deliver reliable outputs. Frameworks with millions of nodes or agents need several GB of memory to be processed, a requirement that often can be met only by HPC systems.
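To make the idea of independent stochastic executions concrete, here is a minimal Python sketch. It is not the model discussed in this post: `simulate_once` is a hypothetical toy SIR simulation, and the parameter values (`beta`, `gamma`, population size) are illustrative assumptions. What it shows is the pattern itself: each replicate gets its own seed, replicates run concurrently, and the results are summarized afterwards.

```python
import concurrent.futures
import random
import statistics


def simulate_once(seed, population=1000, beta=0.3, gamma=0.1, days=100):
    """One toy stochastic SIR run (hypothetical model); returns the peak number infected."""
    rng = random.Random(seed)  # private generator, so each replicate is reproducible
    s, i = population - 10, 10
    peak = i
    for _ in range(days):
        # Daily probability that a given susceptible person becomes infected.
        p_inf = 1.0 - (1.0 - beta / population) ** i
        new_inf = sum(rng.random() < p_inf for _ in range(s))
        new_rec = sum(rng.random() < gamma for _ in range(i))
        s, i = s - new_inf, i + new_inf - new_rec
        peak = max(peak, i)
    return peak


def run_replicates(seeds, executor_factory=concurrent.futures.ProcessPoolExecutor):
    """Run independent replicates concurrently and summarize their peaks."""
    with executor_factory() as pool:
        peaks = list(pool.map(simulate_once, seeds))
    return {
        "peaks": peaks,
        "mean_peak": statistics.mean(peaks),
        "stdev_peak": statistics.stdev(peaks),  # needs at least two replicates
    }
```

Calling `run_replicates(range(200))` from a script's `if __name__ == "__main__":` block would spread 200 replicates over the available cores. On a cluster, the same pattern is usually expressed with a job scheduler or MPI rather than a single process pool.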

Based on the work of Cruz et al. [9], we developed a model that represents the spread dynamics of COVID-19 in Costa Rica [10]. This model consists of a contact network of five million nodes, in which every Costa Rican citizen has family, school, work, or random connections with their neighbors. The probability of getting infected depends on these relations and on the “infection status” of the neighbors. The infection status varies with time, as people evolve from not having symptoms to having mild, severe, or critical conditions. People may be asymptomatic as well. The model also addresses variations in location, school and workplace sizes, age, mobility, and vaccination rates. In addition, some of these inputs are stochastic.
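The general shape of such a layered contact network can be sketched in a few lines of Python. This is a heavily simplified illustration, not the model from [9] or [10]: the per-layer transmission probabilities in `LAYER_PROB`, the group sizes, and the three-state status (`"S"`, `"I"`, `"R"`) are all assumptions chosen for readability, and the helper names are hypothetical.

```python
import random

# Hypothetical daily transmission probability per contact, by connection type.
LAYER_PROB = {"family": 0.10, "school": 0.04, "work": 0.03, "random": 0.01}


def add_groups(edges, people, group_size, layer):
    """Fully connect consecutive blocks of `people` within the given layer."""
    for start in range(0, len(people), group_size):
        group = people[start:start + group_size]
        for idx, a in enumerate(group):
            for b in group[idx + 1:]:
                edges[a].append((b, layer))
                edges[b].append((a, layer))


def build_network(n_people, rng):
    """Return adjacency lists of (neighbor, layer) pairs for a toy population."""
    edges = {p: [] for p in range(n_people)}
    people = list(range(n_people))
    add_groups(edges, people, 4, "family")  # households of consecutive ids
    shuffled = people[:]
    rng.shuffle(shuffled)
    half = n_people // 2
    add_groups(edges, shuffled[:half], 20, "school")
    add_groups(edges, shuffled[half:], 10, "work")
    for a in people:  # one extra random contact per person
        b = rng.randrange(n_people)
        if b != a:
            edges[a].append((b, "random"))
            edges[b].append((a, "random"))
    return edges


def step(status, edges, rng, recovery_prob=0.1):
    """Advance one day: infected nodes may infect susceptible neighbors, then may recover."""
    new_status = dict(status)
    for person, state in status.items():
        if state != "I":
            continue
        for neighbor, layer in edges[person]:
            if status[neighbor] == "S" and rng.random() < LAYER_PROB[layer]:
                new_status[neighbor] = "I"
        if rng.random() < recovery_prob:
            new_status[person] = "R"
    return new_status


def simulate(n_people=200, days=30, seed=0):
    """Seed five infections and return the final status plus daily infected counts."""
    rng = random.Random(seed)
    edges = build_network(n_people, rng)
    status = {p: "S" for p in range(n_people)}
    for p in rng.sample(range(n_people), 5):
        status[p] = "I"
    history = []
    for _ in range(days):
        status = step(status, edges, rng)
        history.append(sum(1 for s in status.values() if s == "I"))
    return status, history
```

A call such as `status, history = simulate(n_people=200, days=30, seed=1)` returns the final state of every node and the number of infected people per day. Scaling this structure to millions of nodes, with richer infection states and attributes such as age and vaccination, is where memory and run time start to call for HPC resources.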

Such a model takes only a few hours to simulate on an HPC cluster, whereas ordinary systems would require much more time. We evaluated scenarios in which different sanitary measures were changed or eliminated. This analysis yielded interesting results. For example, in terms of the additional infections generated, meeting with family or friends could be as harmful as attending a concert with dozens of strangers. Such findings are valuable inputs for health authorities, because they demonstrate that preventing certain behaviors in the population can delay the peak of infections and give authorities more time to save lives.

Even though HPC has been fundamental in computational epidemiology for providing key insights into epidemic dynamics, we have yet to leverage this technology in some contexts. For example, we must first strengthen health and information systems in developing countries to take full advantage of HPC and epidemic models. This can be achieved through inter-institutional and international collaboration, but also through national policies that support research and development. If we encourage the study of infectious diseases, we will gain knowledge that lets us handle future pandemics better.

References

[1] Encyclopedia Britannica. n.d. Crisis, recovery, and resilience: Did the Middle Ages end?. [online] Available at: [Accessed 13 September 2022].

[2] Mellinger, J., 2006. Fourteenth-Century England, Medical Ethics, and the Plague. AMA Journal of Ethics, 8(4), pp.256-260.

[3] Carr, H., 2020. Black Death Quarantine: How Did We Try To Contain The Deadly Disease?. [online] Historyextra.com. Available at: [Accessed 13 September 2022].

[4] McBryde, E., Meehan, M., Adegboye, O., Adekunle, A., Caldwell, J., Pak, A., Rojas, D., Williams, B. and Trauer, J., 2020. Role of modelling in COVID-19 policy development. Paediatric Respiratory Reviews, 35, pp.57-60.

[5] Pasha, D., Lundeen, A., Yeasmin, D. and Pasha, M., 2021. An analysis to identify the important variables for the spread of COVID-19 using numerical techniques and data science. Case Studies in Chemical and Environmental Engineering, 3, p.100067.

[6] Bbc.co.uk. 2014. Historic Figures: John Snow (1813 – 1858). [online] Available at: [Accessed 13 September 2022].

[7] Publichealth.columbia.edu. 2022. Agent-Based Modeling. [online] Available at: [Accessed 13 September 2022].

[8] Keeling, M. and Eames, K., 2005. Networks and epidemic models. Journal of The Royal Society Interface, 2(4), pp.295-307.

[9] Cruz, E., Maciel, J., Clozato, C., Serpa, M., Navaux, P., Meneses, E., Abdalah, M. and Diener, M., 2021. Simulation-based evaluation of school reopening strategies during COVID-19: A case study of São Paulo, Brazil. Epidemiology and Infection, 149.

[10] Abdalah, M., Soto, C., Arce, M., Cruz, E., Maciel, J., Clozato, C. and Meneses, E., 2022. Understanding COVID-19 Epidemic in Costa Rica Through Network-Based Modeling. Communications in Computer and Information Science, pp.61-75.

By CeNAT

The post Leveraging HPC technologies to unravel epidemic dynamics first appeared on RISC2 Project.