Blog - RISC2 Project
https://www.risc2-project.eu

Scientific Machine Learning and HPC
https://www.risc2-project.eu/2023/06/28/scientific-machine-learning-and-hpc/ (Wed, 28 Jun 2023)

In recent years we have seen a rapid growth of interest in artificial intelligence in general, and in machine learning (ML) techniques in particular, across different branches of science and engineering. The rapid growth of the Scientific Machine Learning field derives from the combined development and use of efficient data analysis algorithms, the availability of data from scientific instruments and computer simulations, and advances in high-performance computing. On May 25, 2023, COPPE/UFRJ organized a forum to discuss developments in Artificial Intelligence and its impact on society [*].

Alvaro Coutinho, coordinator of the High Performance Computing Center (Nacad) at COPPE/UFRJ, presented advances in AI in engineering and stressed the importance of multidisciplinary research networks to address current issues in Scientific Machine Learning. Alvaro took the opportunity to highlight the need for Brazil to invest in high-performance computing capacity.

The country's sovereignty requires autonomy in producing ML advances, which depends on HPC support at universities and research centers. Brazil has nine machines in the Top 500 list of the most powerful computer systems in the world, but almost all of them belong to the oil company Petrobras, and universities need much more. ML is well known to require HPC, and when combined with scientific computer simulations it becomes essential.

The conventional notion of ML involves training an algorithm to automatically discover patterns, signals, or structures that may be hidden in huge databases and whose exact nature is unknown and therefore cannot be explicitly programmed. This field faces two major drawbacks: the need for a significant volume of (labelled) data that is expensive to acquire, and a limited ability to extrapolate (making predictions beyond the scenarios contained in the training data is difficult).
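The second drawback can be made concrete with a minimal sketch on synthetic data (the values and the choice of model are illustrative assumptions, not part of the original study): a purely data-driven model fitted on a limited input range degrades sharply when asked to predict outside that range.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic "physics": y = sin(x), observed only for 0 <= x <= 3
x_train = rng.uniform(0.0, 3.0, size=(500, 1))
y_train = np.sin(x_train).ravel()

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(x_train, y_train)

# Interpolation (inside the training range) vs. extrapolation (outside it)
print("inside range :", model.predict([[1.5]])[0], "true:", np.sin(1.5))
print("outside range:", model.predict([[6.0]])[0], "true:", np.sin(6.0))
# Tree-based models can only return values seen during training, so the
# out-of-range prediction stays near sin(3) instead of following sin(6):
# a simple illustration of why purely data-driven ML struggles to extrapolate.
```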

Considering that an algorithm's predictive ability is a learning skill, current challenges must be addressed to improve the analytical and predictive capacity of Scientific ML algorithms, for example to maximize their impact in renewable energy applications. References [1-5] illustrate recent advances in Scientific Machine Learning in different areas of engineering and computer science.

References:

[*] https://www.coppe.ufrj.br/pt-br/planeta-coppe-noticias/noticias/coppe-e-sociedade-especialistas-debatem-os-reflexos-da-inteligencia

[1] Baker, N., Alexander, F., Bremer, T., Hagberg, A., Kevrekidis, Y., Najm, H., Parashar, M., Patra, A., Sethian, J., Wild, S., Willcox, K., and Lee, S. Workshop Report on Basic Research Needs for Scientific Machine Learning: Core Technologies for Artificial Intelligence. United States: N. p., 2019. Web. doi:10.2172/1478744.

[2] Brunton, Steven L., Bernd R. Noack, and Petros Koumoutsakos. “Machine learning for fluid mechanics.” Annual Review of Fluid Mechanics 52 (2020): 477-508.

[3] Karniadakis, George Em, et al. “Physics-informed machine learning.” Nature Reviews Physics 3.6 (2021): 422-440.

[4] Inria White Book on Artificial Intelligence: Current challenges and Inria’s engagement, 2nd edition, 2021. URL: https://www.inria.fr/en/white-paper-inria-artificial-intelligence

[5] Silva, Romulo, Umair bin Waheed, Alvaro Coutinho, and George Em Karniadakis. “Improving PINN-based Seismic Tomography by Respecting Physical Causality.” In AGU Fall Meeting Abstracts, vol. 2022, pp. S11C-09. 2022.

Subsequent Progress And Challenges Concerning The México-UE Project ENERXICO: Supercomputing And Energy For México
https://www.risc2-project.eu/2023/05/24/subsequent-progress-and-challenges-concerning-the-mexico-ue-project-enerxico-supercomputing-and-energy-for-mexico/ (Wed, 24 May 2023)

In this short note, we briefly describe some subsequent advances and challenges related to two work packages developed in the ENERXICO Project. The project opened the possibility of collaborating with colleagues from institutions that did not participate in it, for example from the University of Santander in Colombia and from the University of Vigo in Spain. This exemplifies the importance of the RISC2 project in the sense that strengthening collaboration and finding joint research areas and applied HPC ventures is of great benefit to both our Latin American countries and the EU. We are now initiating talks to target several energy-related topics with some of the RISC2 partners.

The ENERXICO Project focused on developing advanced simulation software solutions for the oil & gas, wind energy and transportation powertrain industries. The institutions that collaborated in the project were, for México: ININ (institution responsible for México), Centro de Investigación y de Estudios Avanzados del IPN (Cinvestav), Universidad Nacional Autónoma de México (UNAM IINGEN, FCUNAM), Universidad Autónoma Metropolitana-Azcapotzalco, Instituto Mexicano del Petróleo, Instituto Politécnico Nacional (IPN) and Pemex; and for the European Union: Barcelona Supercomputing Center (institution responsible for the EU), Technische Universität München, Germany (TUM), Université Grenoble Alpes, France (UGA), CIEMAT, Spain, Repsol, Iberdrola, Bull, France, and Universidad Politécnica de Valencia, Spain.

The project comprised four work packages (WP):

WP1 Exascale Enabling: This was a cross-cutting work package that focused on assessing performance bottlenecks and improving the efficiency of the HPC codes proposed in the vertical WPs (EU coordinator: BULL, MEX coordinator: CINVESTAV-COMPUTACIÓN);

WP2 Renewable energies: This WP deployed new applications required to design, optimize and forecast the production of wind farms (EU coordinator: IBR, MEX coordinator: ININ);

WP3 Oil and gas energies: This WP addressed the impact of HPC on the entire oil industry chain (EU coordinator: REPSOL, MEX coordinator: ININ);

WP4 Biofuels for transport: This WP carried out advanced numerical simulations of biofuels under conditions similar to those of an engine (EU coordinator: UPV-CMT, MEX coordinator: UNAM).

For WP1, the following codes were optimized for exascale computers: Alya, BSIT, DualSPHysics, ExaHyPE, SeisSol, SEM46 and WRF.

As an example, we present some of the results for the DualSPHysics code. We evaluated two architectures. The first set of hardware consisted of identical nodes, each equipped with 2 Intel Xeon Gold 6248 processors clocked at 2.5 GHz and about 192 GB of system memory; each node contained 4 NVIDIA Tesla V100 GPUs with 32 GB of main memory each. The second set of hardware consisted of identical nodes, each equipped with 2 AMD Milan 7763 processors clocked at 2.45 GHz and about 512 GB of system memory; each node contained 4 NVIDIA A100 (Ampere) GPUs with 40 GB of main memory each. The code was compiled and linked with CUDA 10.2 and OpenMPI 4. The application was executed using one GPU per MPI rank.

In Figures 1 and 2 we show the scalability of the code for the strong and weak scaling tests, which indicate that the scaling is very good. Motivated by these excellent results, we are in the process of performing, on the LUMI supercomputer, new SPH simulations with up to 26,834 million particles that will be run on up to 500 GPUs, i.e. 53.7 million particles per GPU. These simulations will be done initially for a Wave Energy Converter (WEC) farm (see Figure 3), and later for turbulent models.

Figure 1. Strong scaling test with a fixed number of particles and an increasing number of GPUs.

 

Figure 2. Weak scaling test with increasing number of particles and GPUs.

 

Figure 3. Wave Energy Converter (WEC) Farm (taken from https://corpowerocean.com/)
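Scaling results like those in Figures 1 and 2 are usually summarised as speedups and parallel efficiencies derived from measured runtimes. The following is a minimal sketch of that bookkeeping with made-up timings, not the actual ENERXICO measurements:

```python
# Hypothetical runtimes (seconds); real values come from the benchmark logs.
# Strong scaling: fixed total number of particles, increasing GPU count.
strong = {1: 1000.0, 2: 520.0, 4: 270.0, 8: 145.0}   # GPUs -> runtime
# Weak scaling: particles per GPU kept constant, so the ideal runtime is flat.
weak = {1: 1000.0, 2: 1010.0, 4: 1030.0, 8: 1060.0}

base = 1
for gpus, t in strong.items():
    speedup = strong[base] / t
    efficiency = speedup / (gpus / base)
    print(f"strong scaling, {gpus} GPUs: speedup {speedup:.2f}, efficiency {efficiency:.1%}")

for gpus, t in weak.items():
    efficiency = weak[base] / t        # ideal weak scaling keeps the runtime constant
    print(f"weak scaling,   {gpus} GPUs: efficiency {efficiency:.1%}")
```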

 

As part of WP3, ENERXICO developed a first version of a computer code called Black Hole (or BH code) for the numerical simulation of oil reservoirs, based on the numerical technique known as Smoothed Particle Hydrodynamics (SPH). This new code is an extension of the DualSPHysics code (https://dual.sphysics.org/) and is the first SPH-based code developed for the numerical simulation of oil reservoirs, with important benefits over commercial codes based on other numerical techniques.

The BH code is a large-scale, massively parallel reservoir simulator capable of performing simulations with billions of “particles” or fluid elements that represent the system under study. It contains improved multi-physics modules that automatically combine the effects of interrelated physical and chemical phenomena to accurately simulate in-situ recovery processes. This work has also led to the development of a graphical user interface, a multi-platform application for code execution and visualization, used for carrying out simulations with data provided by industrial partners and for performing comparisons with available commercial packages.

Furthermore, a considerable effort is presently being made to simplify the process of setting up the input for reservoir simulations from exploration data by means of a workflow fully integrated in our industrial partners’ software environment.  A crucial part of the numerical simulations is the equation of state.  We have developed an equation of state based on crude oil data (the so-called PVT) in two forms, the first as a subroutine that is integrated into the code, and the second as an interpolation subroutine of properties’ tables that are generated from the equation of state subroutine.  
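The second form described above, interpolating tables generated once by the equation-of-state routine, can be sketched as follows; the table values and the property chosen here are purely illustrative, not the actual BH code data or interface:

```python
import numpy as np

# Hypothetical PVT table produced offline by the equation-of-state subroutine:
# pressure (MPa) -> oil formation volume factor Bo (reservoir m3 per standard m3)
pressure_table = np.array([5.0, 10.0, 15.0, 20.0, 25.0, 30.0])
bo_table       = np.array([1.08, 1.12, 1.17, 1.21, 1.19, 1.18])

def bo_from_pressure(p):
    """Linear interpolation in the tabulated PVT data (clamped at the table ends)."""
    return np.interp(p, pressure_table, bo_table)

# At run time the simulator looks up properties instead of re-evaluating the EOS.
print(bo_from_pressure(17.5))
```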

An oil reservoir is composed of a porous medium containing a multiphase fluid made of oil and gas, together with rock and other solids. The aim of the code is to simulate fluid flow in a porous medium, as well as the behaviour of the system at different pressures and temperatures. The tool should allow the reduction of uncertainties in the predictions that are carried out. For example, it may answer questions about the benefits of injecting a solvent, which could be CO2, nitrogen, combustion gases, methane, etc., into a reservoir, and about the breakthrough times of the gases in the production wells. With these estimates, operators can take the necessary measures to mitigate their presence, and calculate the cost, the pressure to be injected, the injection volumes and, most importantly, where and for how long to inject. The same happens with more complex processes, such as those where fluids, air or steam are injected and interact with the rock, oil, water and gas present in the reservoir. The simulator should be capable of supporting monitoring and the preparation of measurement plans.

In order to perform a simulation of an oil reservoir field, an initial model needs to be created. Using geophysical forward and inverse numerical techniques, the ENERXICO project evaluated novel, high-performance simulation packages for challenging seismic exploration cases characterized by extreme geometric complexity. We are now exploring high-order methods based upon fully unstructured tetrahedral meshes, as well as tree-structured Cartesian meshes with adaptive mesh refinement (AMR), for better spatial resolution. Using this methodology, our packages (and some commercial packages), together with seismic and geophysical data of naturally fractured reservoir oil fields, are able to create the geometry (see Figure 4) and capture the basic properties of the oil reservoir field we want to study. A number of numerical simulations are then performed, and from these, exploitation scenarios for the oil fields are generated.

 

Figure 4. A detail of the initial model for a SPH simulation of a porous medium.

 

More information about the ENERXICO Project can be found at: https://enerxico-project.eu/

By: Jaime Klapp (ININ, México) and Isidoro Gitler (Cinvestav, México)
Developing Efficient Scientific Gateways for Bioinformatics in Supercomputer Environments Supported by Artificial Intelligence
https://www.risc2-project.eu/2023/03/20/developing-efficient-scientific-gateways-for-bioinformatics-in-supercomputer-environments-supported-by-artificial-intelligence/ (Mon, 20 Mar 2023)

Scientific gateways bring enormous benefits to end users by simplifying access to, and hiding the complexity of, the underlying distributed computing infrastructure, but they require significant development and maintenance efforts. BioinfoPortal [1], through its CSGrid [2] middleware, takes advantage of the heterogeneous resources of Santos Dumont [3]. However, task submission still requires a substantial step: deciding on the configuration that leads to efficient execution. This project aims to develop green and intelligent scientific gateways for BioinfoPortal supported by high-performance computing (HPC) environments and specialised technologies such as scientific workflows, data mining, machine learning, and deep learning. The efficient analysis and interpretation of Big Data opens new challenges in molecular biology, genetics, biomedicine, and healthcare, for improving personalised diagnostics and therapeutics, and finding new avenues to deal with this massive amount of information becomes necessary. New Bioinformatics and Computational Biology paradigms drive storage, management, and data access. HPC and Big Data advances in this domain represent a vast new field of opportunities for bioinformatics researchers, as well as a significant challenge. The BioinfoPortal science gateway is a multiuser Brazilian infrastructure. We present several challenges for efficiently executing applications and discuss our findings on improving the use of computational resources. We performed several large-scale bioinformatics experiments that are considered computationally intensive and time-consuming. We are currently coupling artificial intelligence to generate models that analyse computational and bioinformatics metadata in order to understand how automatic learning can predict the efficient use of computational resources. The computational executions are conducted at Santos Dumont, the largest supercomputer in Latin America, dedicated to the research community, with 5.1 Petaflops and 36,472 computational cores distributed across 1,134 computational nodes.
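The idea of learning from execution metadata to predict an efficient configuration can be viewed as a simple supervised-learning problem. The sketch below is only an illustration of that idea; the features, labels and tool names are assumptions, not the actual BioinfoPortal schema or pipeline.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical metadata from past gateway executions (toy values).
runs = pd.DataFrame({
    "input_size_mb":   [120, 800, 64, 2300, 450, 1500, 90, 3100],
    "sequence_count":  [1e4, 8e4, 5e3, 2e5, 4e4, 1e5, 8e3, 3e5],
    "tool":            ["blast", "blast", "raxml", "blast", "raxml", "blast", "raxml", "blast"],
    "best_core_count": [24, 48, 24, 96, 48, 96, 24, 96],  # label: fastest observed configuration
})

X = pd.get_dummies(runs.drop(columns="best_core_count"))   # one-hot encode the tool name
y = runs["best_core_count"]

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# For a new submission, the gateway could suggest a core count before queueing the job.
new_job = pd.DataFrame({"input_size_mb": [700], "sequence_count": [6e4], "tool": ["blast"]})
print(model.predict(pd.get_dummies(new_job).reindex(columns=X.columns, fill_value=0)))
```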

By:

A. Carneiro, B. Fagundes, C. Osthoff, G. Freire, K. Ocaña, L. Cruz, L. Gadelha, M. Coelho, M. Galheigo, and R. Terra are with the National Laboratory of Scientific Computing, Rio de Janeiro, Brazil.

D. Carvalho is with the Federal Center for Technological Education Celso Suckow da Fonseca, Rio de Janeiro, Brazil.

Douglas Cardoso is with the Polytechnic Institute of Tomar, Portugal.

F. Boito and L. Teylo are with the University of Bordeaux, CNRS, Bordeaux INP, Inria, LaBRI, Talence, France.

P. Navaux is with the Informatics Institute, Federal University of Rio Grande do Sul, Brazil.

References:

Ocaña, K. A. C. S.; Galheigo, M.; Osthoff, C.; Gadelha, L. M. R.; Porto, F.; Gomes, A. T. A.; Oliveira, D.; Vasconcelos, A. T. BioinfoPortal: A scientific gateway for integrating bioinformatics applications on the Brazilian national high-performance computing network. Future Generation Computer Systems, v. 107, p. 192-214, 2020.

Mondelli, M. L.; Magalhães, T.; Loss, G.; Wilde, M.; Foster, I.; Mattoso, M. L. Q.; Katz, D. S.; Barbosa, H. J. C.; Vasconcelos, A. T. R.; Ocaña, K. A. C. S; Gadelha, L. BioWorkbench: A High-Performance Framework for Managing and Analyzing Bioinformatics Experiments. PeerJ, v. 1, p. 1, 2018.

Coelho, M.; Freire, G.; Ocaña, K.; Osthoff, C.; Galheigo, M.; Carneiro, A. R.; Boito, F.; Navaux, P.; Cardoso, D. O. Desenvolvimento de um Framework de Aprendizado de Máquina no Apoio a Gateways Científicos Verdes, Inteligentes e Eficientes: BioinfoPortal como Caso de Estudo Brasileiro In: XXIII Simpósio em Sistemas Computacionais de Alto Desempenho – WSCAD 2022 (https://wscad.ufsc.br/), 2022.

Terra, R.; Ocaña, K.; Osthoff, C.; Cruz, L.; Boito, F.; Navaux, P.; Carvalho, D. Framework para a Construção de Redes Filogenéticas em Ambiente de Computação de Alto Desempenho. In: XXIII Simpósio em Sistemas Computacionais de Alto Desempenho – WSCAD 2022 (https://wscad.ufsc.br/), 2022.

Ocaña, K.; Cruz, L.; Coelho, M.; Terra, R.; Galheigo, M.; Carneiro, A.; Carvalho, D.; Gadelha, L.; Boito, F.; Navaux, P.; Osthoff, C. ParslRNA-Seq: an efficient and scalable RNAseq analysis workflow for studies of differentiated gene expression. In: Latin America High-Performance Computing Conference (CARLA), 2022, Rio Grande do Sul, Brazil. Proceedings of the Latin American High-Performance Computing Conference – CARLA 2022 (http://www.carla22.org/), 2022.

[1] https://bioinfo.lncc.br/

[2] https://git.tecgraf.puc-rio.br/csbase-dev/csgrid/-/tree/CSGRID-2.3-LNCC

[3] https://sdumont.lncc.br

Towards a greater HPC capacity in Latin America
https://www.risc2-project.eu/2023/02/24/towards-a-greater-hpc-capacity-in-latin-america/ (Fri, 24 Feb 2023)

High-Performance Computing (HPC) has proven to be a strong driver for science and technology development, and is increasingly considered indispensable for most scientific disciplines. HPC is making a difference in key topics of great interest such as climate change, personalised medicine, engineering, astronomy, education, economics, industry and public policy, becoming a pillar for the development of any country and a field to which the great powers are giving strategic importance, investing billions of dollars in a competition without limits in which data is the new gold.

A country that does not have the computational capacity to solve its own problems will have no alternative but to try to acquire solutions provided by others. One of the most important aspects of sovereignty in the 21st century is the ability to produce mathematical models and the capacity to solve them. Today, the availability of computing power commensurate with one's wealth exponentially increases a country's capacity to produce knowledge. In the developed world, it is estimated that for every dollar invested in supercomputing, the return to society is of the order of US$ 44 (1) and to the academic world US$ 30 (2). For these reasons, HPC occupies an important place on the political and diplomatic agendas of developed countries.

In Latin America, investment in HPC is very low compared to what the US, Asia and Europe are doing. To quantify this difference, we present the tables below, which show the accumulated computing capacity in the ranking of the 500 most powerful supercomputers in the world – the TOP500 (3) – (Table 1), and the local reality (Table 2). Other data are also included, such as the population (in millions), the number of researchers per 1,000 inhabitants (Res/1000), the computing capacity per researcher (GFlops/Res) and the computing capacity per million US$ of GDP. In Table 1, we have grouped the countries by geographical area. America appears as the area with the highest computing capacity, essentially due to the USA, which holds almost 45% of the world's computing capacity in the TOP500. It is followed by Asia and then Europe. The TOP500 list includes mainly academic research centres, but also industry ones, typically those used in applied research (many private systems are not published for obvious reasons). For example, in Brazil – which shows good computing capacity with 88,175 TFlops – the vast majority is in the hands of the oil industry, and only about 3,000 TFlops are used for basic research. Countries listed in the TOP500 invest in HPC from a few TFlops per million GDP (Belgium 5, Spain 7, Bulgaria 8), through countries investing in the order of hundreds (Italy 176, Japan 151, USA 138), to even thousands, as is the case of Finland with 1,478. For those countries where we were able to find data on the number of researchers, these range from a few GFlops per researcher (Belgium 19, Spain 24, Hungary 52) to close to 1,000 GFlops, i.e. 1 TFlop (USA 970, Italy 966), with Finland surpassing this barrier with 4,647. Note that, unlike what happens locally, countries with a certain degree of development invest in supercomputing every 3-4 years, so the data we are showing will soon be updated and there will be variations in the list. For example, this year a new supercomputer will come into operation in Spain (4), which, with an investment of some 150 million euros, will give Spain one of the most powerful supercomputers in Europe – and the world.

Country | Rpeak (TFlops) | Population (millions) | Res/1000 | GFlops/Res | TFlops/M US$
United States | 3,216,124 | 335 | 9.9 | 969.7 | 138.0
Canada | 71,911 | 39 | 8.8 | 209.5 | 40.0
Brazil | 88,175 | 216 | 1.1 | 371.1 | 51.9
AMERICA | 3,376,211 | 590 | | |
China | 1,132,071 | 1400 | | | 67.4
Japan | 815,667 | 124 | 10.0 | 657.8 | 151.0
South Korea | 128,264 | 52 | 16.6 | 148.6 | 71.3
Saudi Arabia | 98,982 | 35 | | | 141.4
Taiwan | 19,562 | 23 | | | 21.7
Singapore | 15,785 | 6 | | | 52.6
Thailand | 13,773 | 70 | | | 27.5
United Arab Emirates | 12,164 | 10 | | | 15.2
India | 12,082 | 1380 | | | 4.0
ASIA | 2,248,353 | 3100 | | |
Finland | 443,391 | 6 | 15.9 | 4647.7 | 1478.0
Italy | 370,262 | 59 | 6.5 | 965.5 | 176.3
Germany | 331,231 | 85 | 10.1 | 385.8 | 78.9
France | 251,166 | 65 | 11.4 | 339.0 | 83.7
Russia | 101,737 | 145 | | | 59.8
United Kingdom | 92,563 | 68 | 9.6 | 141.8 | 29.9
Netherlands | 56,740 | 18 | 10.6 | 297.4 | 56.7
Switzerland | 38,600 | 9 | 9.4 | 456.3 | 48.3
Sweden | 32,727 | 10 | 15.8 | 207.1 | 54.5
Ireland | 26,320 | 5 | 10.6 | 496.6 | 65.8
Luxembourg | 18,291 | 0.6 | | | 365.8
Poland | 17,099 | 38 | 7.6 | 59.2 | 28.5
Norway | 17,031 | 6 | 13.0 | 218.3 | 34.1
Czech Republic | 12,914 | 10 | 8.3 | 155.6 | 43.0
Spain | 10,296 | 47 | 7.4 | 29.6 | 7.4
Slovenia | 10,047 | 2 | 9.9 | 507.4 | 167.5
Austria | 6,809 | 9 | 11.6 | 65.2 | 13.6
Bulgaria | 5,942 | 6 | | | 8.5
Hungary | 4,669 | 10 | 9.0 | 51.9 | 23.3
Belgium | 3,094 | 12 | 13.6 | 19.0 | 5.2
EUROPE | 1,850,934 | 610.6 | | |
OTHER | | | | |
Australia | 60,177 | 26 | | | 40.1
Morocco | 5,014 | 39 | | | 50.1

Table 1. HPC availability per researcher and relative to GDP in the TOP500 countries (includes HPC in industry).

The local reality is far from these numbers. Table 2 shows data from Argentina, Brazil, Chile and Mexico. In Chile, the available computing power per researcher is 2-3 times lower than in the OECD countries with the least computing power, and up to 100 times lower than for a researcher in the US. Our investment measured in TFlops per million US$ of GDP is 166 times less than in the US; with respect to the European countries that invest least in HPC it is 9 times less, and with respect to the European average (including Finland) it is 80 times less, i.e. the difference is considerable. It is clear that we need to close this gap. An investment of about 5 million dollars in HPC infrastructure over the next 5 years would narrow this gap by increasing our computational capacity by a factor of almost 20. However, returning to the example of Spain, the supercomputer it will receive this year will offer 23 times more computing power than at present, so we would only maintain our relative distance. If we do not invest, the gap will increase by at least 23 times and will end up being huge. Therefore, we do not only need a one-time investment; we need to ensure regular investment. Some neighbouring countries are already investing significantly in supercomputing. This is the case of Argentina, which is investing 7 million dollars (2 million for the datacenter and 5 million to buy a new supercomputer), which will increase its current capacity by almost 40 times (5).

Country | Rpeak (TFlops) | Population (millions) | Res/1000 | GFlops/Res | TFlops/M US$
Brazil* | 3,000 | 216 | 1.1 | 12.6 | 1.8
Mexico | 2,200 | 130 | 1.2 | 14.1 | 1.8
Argentina | 400 | 45 | 1.2 | 7.4 | 0.8
Chile | 250 | 20 | 1.3 | 9.6 | 0.8

Table 2. HPC availability per researcher and relative to GDP in the region (*only HPC capacity in academia is considered in this table).
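The derived columns in Tables 1 and 2 follow directly from Rpeak, population, researcher density and GDP. A minimal sketch of that arithmetic is shown below; the function is an illustration, and GDP must be supplied in whatever monetary unit the table used.

```python
def hpc_indicators(rpeak_tflops, population_millions, researchers_per_1000, gdp):
    """Derive the per-researcher and per-GDP columns of Tables 1 and 2.

    `gdp` is expressed in the same monetary unit used by the table, so the
    second value comes back as TFlops per that unit of GDP."""
    researchers = population_millions * 1e6 * researchers_per_1000 / 1000.0
    gflops_per_researcher = rpeak_tflops * 1000.0 / researchers
    tflops_per_gdp = rpeak_tflops / gdp
    return gflops_per_researcher, tflops_per_gdp

# Chile row of Table 2: 250 TFlops, 20 million inhabitants, 1.3 researchers per 1,000.
gflops_res, _ = hpc_indicators(250, 20, 1.3, gdp=1.0)
print(round(gflops_res, 1))   # -> 9.6, matching the GFlops/Res column
```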

For the above reasons, we are working to convince the Chilean authorities that we must have greater funding and, more crucially, permanent state funding for HPC. In relation to this, on July 6 a collaboration agreement was signed between 44 institutions, with the support of the Ministry of Science, to work on the creation of the National Supercomputing Laboratory (6). The agreement recognises that supercomputers are a critical infrastructure for Chile's development, that it is necessary to centralise requirements and resources at the national level, to obtain permanent funding from the State, and to create a new institutional framework to provide governance. In an unprecedented inter-institutional collaboration in Chile, competition for HPC resources at the national level is eliminated and the possibility of direct funding from the State is opened up without generating controversy.

Undoubtedly, supercomputing is a fundamental pillar for the development of any country, where increasing investment provides a strategic advantage, and in Latin America we should not be left behind.

By NLHPC

 

References

(1) Hyperion Research HPC Investments Bring High Returns

(2) EESI-2 Special Study To Measure And Model How Investments In HPC Can Create Financial ROI And Scientific Innovation In Europe 

(3) https://top500.org/ 

(4) https://www.lavanguardia.com/ciencia/20230129/8713515/llega-superordenador-marenostrum-5-bsc-barcelona.html

(5) https://www.hpcwire.com/2022/12/15/argentina-announces-new-supercomputer-for-national-science/

(6) https://uchile.cl/noticias/187955/44-instituciones-crearan-el-laboratorio-nacional-de-supercomputacion

 

Mapping human brain functions using HPC
https://www.risc2-project.eu/2023/02/01/mapping-human-brain-functions-using-hpc/ (Wed, 01 Feb 2023)

ContentMAP is the first Portuguese project in the field of Psychology and Cognitive Neuroscience to be awarded a European Research Council grant (ERC Starting Grant #802553). The project is mapping how the human brain represents object knowledge – for example, how one represents in the brain all one knows about a knife (that it cuts, that it has a handle, that it is made out of metal and plastic or metal and wood, that it has a serrated and sharp part, that it is smooth and cold, etc.). To do this, the project collects numerous MRI images while participants see and interact with objects (fMRI). High Performance Computing (HPC) is of central importance for processing these images. The use of HPC has made it possible to manipulate these data and to perform machine learning analyses and complex computations in a timely manner.

Humans are particularly efficient at recognising objects – think about what surrounds us: one recognises the object one is reading this text from as a screen, the place where one sits as a chair, the utensil from which one drinks coffee as a cup, and one does all of this extremely quickly and virtually automatically. One is able to do all this despite the fact that 1) one holds large amounts of information about each object (if asked to write down everything one knows about a pen, one would certainly have a lot to say); and 2) there are several exemplars of each object type (a glass can be tall, made out of glass, metal, paper or plastic, it can come in different colours, etc. – but despite that, any of them would still be a glass). How does one do this? How is one able to store and process so much information in the process of recognising a glass, and generalise over all the different instances of a glass to get the concept “glass”? The goal of ContentMAP is to understand the processes that lead to successful object recognition.

The answer to these questions lies in a better understanding of the organisational principles of information in the brain. It is, in fact, the efficient organisation of conceptual information and object representations in the brain that allows one to quickly and efficiently recognise the keyboard in front of us. To study the neuronal organisation of object knowledge, the project collects large sets of fMRI data from several participants, and then tries to decode the organisational principles of information in the brain.

Given the amount of data and the computational requirements of this type of data at the pre-processing and post-processing levels, the use of HPC is essential to enable these studies to be conducted in a timely manner. For example, at the post-processing level, the project uses whole-brain Support Vector Machine classification algorithms (searchlight procedures) that require hundreds of thousands of classifiers to be trained. Moreover, for each of these classifiers one needs to compute a sample distribution of the average, as well as test the various classifications of interest, and this has to be done per participant.
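A rough sketch of this kind of searchlight analysis is given below, using generic scikit-learn tools on synthetic data. It is not the project's actual pipeline: the data sizes, the sliding-window stand-in for spherical neighbourhoods, and the classifier choice are assumptions made for illustration. It shows why the analysis is so demanding: one cross-validated classifier is trained per voxel neighbourhood.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

n_trials, n_voxels, sphere_size = 120, 500, 30   # toy sizes; real data are far larger
X = rng.normal(size=(n_trials, n_voxels))         # fMRI activity patterns, one row per trial
y = rng.integers(0, 2, size=n_trials)             # object category shown on each trial

# One classifier per "searchlight": in a real whole-brain analysis there are
# hundreds of thousands of spheres, one centred on every brain voxel,
# which is what makes HPC necessary.
accuracies = []
for centre in range(n_voxels - sphere_size):
    sphere = X[:, centre:centre + sphere_size]    # stand-in for a spherical neighbourhood
    accuracies.append(cross_val_score(SVC(kernel="linear"), sphere, y, cv=5).mean())

print(np.mean(accuracies))   # ~0.5 (chance level) for this random data
```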

Because of this, the use of the HPC facilities of the Advanced Computing Laboratory (LCA) at the University of Coimbra is crucial. It allows us to actually perform these analyses in one to two weeks – something that on our 14-core computers would take a few months, which in practice would most probably mean that the analysis would not be done.

By Faculty of Psychology and Educational Sciences, University of Coimbra

 

Reference 

ProAction Lab http://proactionlab.fpce.uc.pt/ 

JUPITER Ascending – First European Exascale Supercomputer Coming to Jülich
https://www.risc2-project.eu/2023/01/02/jupiter-ascending-first-european-exascale-supercomputer-coming-to-julich/ (Mon, 02 Jan 2023)

It was finally decided in 2022: Forschungszentrum Jülich will be home to Europe's first exascale computer. The supercomputer is set to be the first in Europe to surpass the threshold of one quintillion (a “1” followed by 18 zeros) calculations per second. The system will be acquired by the European supercomputing initiative EuroHPC JU. The exascale computer should help to solve important and urgent scientific questions regarding, for example, climate change, how to combat pandemics, and sustainable energy production, while also enabling the intensive use of artificial intelligence and the analysis of large data volumes. The overall costs for the system amount to 500 million euros. Of this total, 250 million euros is being provided by EuroHPC JU and a further 250 million euros in equal parts by the German Federal Ministry of Education and Research (BMBF) and the Ministry of Culture and Science of the State of North Rhine-Westphalia (MKW NRW).

The computer, named JUPITER (short for “Joint Undertaking Pioneer for Innovative and Transformative Exascale Research”), will be installed in 2023/2024 on the campus of Forschungszentrum Jülich. It is intended that the system will be operated by the Jülich Supercomputing Centre (JSC), whose supercomputers JUWELS and JURECA currently rank among the most powerful in the world. JSC participated in the application procedure for a high-end supercomputer as a member of the Gauss Centre for Supercomputing (GCS), an association of the three German national supercomputing centres: JSC in Jülich, the High-Performance Computing Center Stuttgart (HLRS), and the Leibniz Supercomputing Centre (LRZ) in Garching. The competition was organised by the European supercomputing initiative EuroHPC JU, which was formed by the European Union together with European countries and private companies.

JUPITER is now set to become the first European supercomputer to make the leap into the exascale class. In terms of computing power, it will be more powerful than 5 million modern laptops or PCs. Just like Jülich's current supercomputer JUWELS, JUPITER will be based on a dynamic, modular supercomputing architecture, which Forschungszentrum Jülich developed together with European and international partners in the EU's DEEP research projects.

In a modular supercomputer, various computing modules are coupled together. This enables the parts of complex simulation programs to be distributed over several modules, ensuring that the specific hardware properties of each module can be optimally utilised. Its modular construction also means that the system is well prepared for integrating future technologies such as quantum computing or neuromorphic modules, which emulate the neural structure of a biological brain.

Figure. Modular Supercomputing Architecture: computing and storage modules of the exascale computer in its base configuration (blue), as well as optional modules (green) and modules for future technologies (purple) as possible extensions.

In its base configuration, JUPITER will have an enormously powerful booster module with highly efficient GPU-based computation accelerators. Massively parallel applications are accelerated by this booster in a similar way to a turbocharger, for example to calculate high-resolution climate models, develop new materials, simulate complex cell processes and energy systems, advance basic research, or train next-generation, computationally intensive machine-learning algorithms.

One major challenge is the energy that is required for such large computing power. The average power is anticipated to be up to 15 megawatts. JUPITER has been designed as a “green” supercomputer and will be powered by green electricity. The envisaged warm water cooling system should help to ensure that JUPITER achieves the highest efficiency values. At the same time, the cooling technology opens up the possibility of intelligently using the waste heat  that is produced. For example, just like its predecessor system JUWELS, JUPITER will be connected to the new low-temperature network on the Forschungszentrum Jülich campus. Further potential applications for the waste heat from JUPITER are currently being investigated by Forschungszentrum Jülich.

By Jülich Supercomputing Centre (JSC)

 

Image: Germany's fastest supercomputer JUWELS at Forschungszentrum Jülich, which is funded in equal parts by the Federal Ministry of Education and Research (BMBF) and the Ministry of Culture and Science of the State of North Rhine-Westphalia (MKW NRW) via the Gauss Centre for Supercomputing (GCS). (Copyright: Forschungszentrum Jülich / Sascha Kreklau)

Advanced Computing Collaboration to Growth Sustainable Ecosystems
https://www.risc2-project.eu/2022/12/12/advanced-computing-collaboration-to-growth-sustainable-ecosystems/ (Mon, 12 Dec 2022)

The impact of High-Performance Computing (HPC) in contexts that require large computing and simulation capabilities is well known. In the development of the RISC2 project, and in view of its main goals, HPC is not merely a potential support for scientific challenges identified along the way, but an essential requirement for scientific, productive, and social activities. Different outcomes have been presented in academic venues such as the workshops and main tracks of the Latin American Conference on High-Performance Computing (CARLA 2023). In these venues, different RISC2 contributions show how HPC enables competitiveness, demands collaboration to address global interests, and supports sustainability.

In the European and Latin American (EuroLatAm) HPC ecosystems, it is possible to identify actors in different domains: industry, academia, research, society, and government. Each of them, at different levels, has a set of demands and interactions, depending on its interests. For example, industry demands HPC solutions for productivity and looks to academia for the skilled developers needed to build the applications that use those solutions. Another example is the relationship between research and government. In the HPC ecosystem, collaboration creates synergies to address common interests, but it also demands policies and coordinated roadmaps to support long-term projects and activities with a clear impact on society.

Of course, a historical relationship has existed between Latin America and Europe since colonial times. In the case of advanced computing, this can be traced from the first EuroLatAm grid computing projects more than twenty years ago to supercomputing projects such as RISC and RISC2, now driven more and more by shared interests, with the different EuroLatAm HPC projects improving competitiveness and collaboration: competitiveness for industrial and productive business, partnership (and competitiveness) in science and education goals, and human wellness. So, paraphrasing Mateo Valero, “who does not compute does not compete”, I would add “who does not collaborate does not survive”.

Building on collaboration and competitiveness, the RISC2 project makes it possible to identify sustainability elements and sustainable workflows for different projects. The impressive interaction between the actors of the EuroLatAm HPC ecosystem has produced not only scientific results but also policies, recommendations, best practices, and new questions. For these outcomes, at the 2022 Supercomputing Conference, RISC2 received the 2022 HPCwire Editors' Choice Award for Best HPC Collaboration.

The growth of sustainable advanced computing ecosystems becomes evident from the results of projects such as RISC2. Collaboration, interaction, and competitiveness build human development and guarantee technological diversification and peer-to-peer relationships to address common interests and problems. RISC2 is therefore a crucial step towards a future RISC3, just as the previous RISC project was a step towards RISC2.

 

By Universidad Industrial de Santander

Managing Data and Machine Learning Models in HPC Applications
https://www.risc2-project.eu/2022/11/21/managing-data-and-machine-learning-models-in-hpc-applications/ (Mon, 21 Nov 2022)

The synergy of data science (including big data and machine learning) and HPC yields many benefits for data-intensive applications in terms of more accurate predictive data analysis and better decision making. For instance, in the context of the HPDaSc (High Performance Data Science) project between Inria and Brazil, we have shown the importance of realtime analytics to make critical high-consequence decisions in HPC applications, e.g., preventing useless drilling based on a driller’s realtime data and realtime visualization of simulated data, or the effectiveness of ML to deal with scientific data, e.g., computing Probability Density Functions (PDFs) over simulated seismic data using Spark.
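As an illustration of the kind of PDF computation mentioned above, the sketch below builds a normalised histogram over a column of simulated seismic amplitudes with PySpark. The file path and column name are placeholders for illustration, not the project's actual data layout.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("seismic-pdf").getOrCreate()

# Placeholder input: a table of simulated seismic samples with an "amplitude" column.
df = spark.read.parquet("hdfs:///simulations/seismic_run_042.parquet")

amplitudes = df.select("amplitude").rdd.map(lambda row: float(row[0]))
edges, counts = amplitudes.histogram(100)           # 100 equally spaced bins

total = sum(counts)
bin_width = edges[1] - edges[0]
pdf = [c / (total * bin_width) for c in counts]     # normalise so the PDF integrates to 1
print(list(zip(edges[:-1], pdf))[:5])                # first few (bin_start, density) pairs

spark.stop()
```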

However, to realize the full potential of this synergy, ML models (or models for short) must be built, combined and ensembled, which can be very complex as there can be many models to select from. Furthermore, they should be shared and reused, in particular, in different execution environments such as HPC or Spark clusters.

To address this problem, we proposed Gypscie [Porto 2022, Zorrilla 2022], a new framework that supports the entire ML lifecycle and enables model reuse and import from other frameworks. The approach behind Gypscie is to combine, in a single framework, the rich capabilities for model management, data management and model execution that are typically provided by different tools. Overall, Gypscie provides: a platform supporting the complete model life-cycle, from model building to deployment, monitoring and policy enforcement; an environment for casual users to find ready-to-use models that best fit a particular prediction problem; an environment to optimize ML task scheduling and execution; an easy way for developers to benchmark their models against other competitive models and improve them; a central point of access to assess models' compliance with policies and ethics and to obtain and curate observational and predictive data; and provenance information and model explainability. Finally, Gypscie interfaces with multiple execution environments to run ML tasks, e.g., an HPC system such as the Santos Dumont supercomputer at LNCC or a Spark cluster.

Gypscie comes with SAVIME [Silva 2020], a multidimensional array in-memory database system for importing, storing and querying model (tensor) data. The SAVIME open-source system has been developed to support analytical queries over scientific data. It offers an extremely efficient ingestion procedure, which practically eliminates the waiting time before incoming data can be analysed. It also supports dense and sparse arrays and non-integer dimension indexing. It offers a functional query language processed by a query optimiser that generates efficient query execution plans.

 

References

[Porto 2022] Fabio Porto, Patrick Valduriez: Data and Machine Learning Model Management with Gypscie. CARLA 2022 – Workshop on HPC and Data Sciences meet Scientific Computing, SCALAC, Sep 2022, Porto Alegre, Brazil. pp.1-2. 

[Zorrilla 2022] Rocío Zorrilla, Eduardo Ogasawara, Patrick Valduriez, Fabio Porto: A Data-Driven Model Selection Approach to Spatio-Temporal Prediction. SBBD 2022 – Brazilian Symposium on Databases, SBBD, Sep 2022, Buzios, Brazil. pp.1-12. 

[Silva 2020] A. C. Silva, H. Lourenço, D. Ramos, F. Porto, P. Valduriez. SAVIME: An Array DBMS for Simulation Analysis and Prediction. Journal of Information and Data Management 11(3), 2020.

 

By LNCC and Inria 

Using supercomputing for accelerating life science solutions
https://www.risc2-project.eu/2022/11/01/using-supercomputing-for-accelerating-life-science-solutions/ (Tue, 01 Nov 2022)

The world of High Performance Computing (HPC) is now moving towards exascale performance, i.e. the ability to perform 10^18 operations per second. A variety of applications will be improved to take advantage of this computing power, leading to better predictions and models in different fields, like Environmental Sciences, Artificial Intelligence, Material Sciences and Life Sciences.

In Life Sciences, HPC advancements can improve different areas:

  • a reduced time to scientific discovery;
  • the ability of generating predictions necessary for precision medicine;
  • new healthcare and genomics-driven research approaches;
  • the processing of huge datasets for deep and machine learning;
  • the optimization of modeling, such as Computer Aided Drug Design (CADD);
  • enhanced security and protection of healthcare data in HPC environments, in compliance with European GDPR regulations;
  • management of massive amount of data for example for clinical trials, drug development and genomics data analytics.

The outbreak of COVID-19 has further accelerated this progress from different points of view. Some European projects aim at repurposing known active ingredients to prepare new drugs against COVID-19 [Exscalate4CoV, Ligate], while others focus on the management and monitoring of contagion clusters to provide an innovative approach to learning from the SARS-CoV-2 crisis and deriving recommendations for future waves and pandemics [Orchestra].

The ability to deal with massive amounts of data in HPC environments is also used to create databases with data from nucleic acid sequencing and to use them to detect allelic variant frequencies, as in the NIG project [Nig], a collaboration with the Network for Italian Genomes. Another example of this capability is the set-up of a data-sharing platform based on novel Federated Learning schemes to advance research in personalised medicine in haematological diseases [Genomed4All].

Supercomputing is widely used in drug design (the process of finding medicines for diseases for which there are no or insufficient treatments), with many projects active in this field, including RISC2.

Sometimes, when there is no previous knowledge of the biological target, as happened with COVID-19, discovering new drugs requires creating new molecules from scratch [Novartis]. This process involves billion-dollar investments to produce and test thousands of molecules, and it usually has a low success rate: only about 12% of potential drugs entering clinical development are approved [Engitix]. The whole process, from identifying a possible compound to the end of the clinical trial, can take up to 10 years. Nowadays there is an uneven coverage of diseases: most of the compounds target genetic conditions, while only a few antivirals and antibiotics have been found.

The search for candidate drugs occurs mainly through two different approaches: high-throughput screening and virtual screening. The first is more reliable but also very expensive and time-consuming: it is usually applied to well-known targets, mainly by pharmaceutical companies. The second approach is a good compromise between cost and accuracy and is typically applied to relatively new targets, in academic laboratories, where it is also used to discover or better understand the mechanisms of these targets [Liu2016].

Candidate drugs are usually small molecules that bind to a specific protein or part of it, inhibiting the usual activity of the protein itself. For example, binding the correct ligand to a viral enzyme may stop a viral infection. In the process of virtual screening, millions of compounds are screened against the target protein at different levels: the most basic one simply takes into account the shape needed to fit correctly into the protein, while at higher levels other features are also considered, such as specific interactions, protein flexibility, solubility, human tolerance, and so on. A “score” is assigned to each docked ligand: the compounds with the highest scores are studied further. With massively parallel computers, we can rapidly filter extremely large molecule databases (e.g. billions of molecules).
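The filtering step just described amounts to scoring every ligand and keeping only the best-ranked ones for further study. The sketch below is a deliberately simplified illustration: the scoring function is a random placeholder standing in for a real docking engine, and the library is synthetic.

```python
import heapq
import random

random.seed(0)

def docking_score(ligand_id):
    """Placeholder for a real docking/scoring engine; higher means a better fit here."""
    return random.uniform(0.0, 10.0)

# In a real campaign the library holds millions to billions of compounds and the
# scoring loop is distributed across many CPU/GPU nodes of an HPC cluster.
ligand_library = (f"compound_{i:07d}" for i in range(1_000_000))

# Keep only the 100 best-scoring compounds for more expensive follow-up studies.
top_hits = heapq.nlargest(100, ((docking_score(l), l) for l in ligand_library))

for score, ligand in top_hits[:5]:
    print(f"{ligand}: {score:.3f}")
```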

The current computational power of HPC clusters allows us to analyse up to 3 million compounds per second [Exscalate]. Even though vaccines were developed remarkably quickly, effective drug treatments for people already suffering from COVID-19 were still scarce at the beginning of the pandemic. At that time, supercomputers around the world were asked to help with drug design, a real-world example of the power of urgent computing. CINECA participates in Exscalate4CoV [Exscalate4CoV], currently the most advanced centre of competence for fighting the coronavirus, combining the most powerful supercomputing resources and Artificial Intelligence with experimental facilities and clinical validation.

 

References

[Engitix] https://engitix.com/technology/

[Exscalate] https://www.exscalate.eu/en/projects.html

[Exscalate4CoV] https://www.exscalate4cov.eu/

[Genomed4All] https://genomed4all.eu/

[Ligate] https://www.ligateproject.eu/

[Liu2016] T. Liu, D. Lu, H. Zhang, M. Zheng, H. Yang, Ye. Xu, C. Luo, W. Zhu, K. Yu, and H. Jiang, “Applying high-performance computing in drug discovery and molecular simulation” Natl Sci Rev. 2016 Mar; 3(1): 49–63.

[Nig] http://www.nig.cineca.it/

[Novartis] https://www.novartis.com/stories/art-drug-design-technological-age

[Orchestra] https://orchestra-cohort.eu/

 

By CINECA

First School of HPC Administrators in Latin America and the Caribbean: A space for the formation of computational thinking
https://www.risc2-project.eu/2022/10/31/first-school-of-hpc-administrators-in-latin-america-and-the-caribbean-a-space-for-the-formation-of-computational-thinking/ (Mon, 31 Oct 2022)

Of the top 500 high-performance computing systems in the world, only 6 are located in Latin America. This makes evident the need to develop and pool technological efforts which, due to many social and economic issues, tend to be relegated to second place. HPC tools are used for economic, demographic, weather and social analysis, and even to save lives when applied to medicine, achieving a direct impact on science-based decision making.

The NLHPC staff have made it a fundamental pillar to focus efforts on the scientific community and to show HPC as an essential tool for the country's development, attracting users from diverse scientific areas, industry and the public sector. This entails breaking down the barriers to accessing this kind of technology. NLHPC faces this challenge by providing training on the basic use of HPC and on scientific software optimization, which is key to making good use of the resources.

The training was carried out within a framework of computational thinking, the process by which an individual, through professional experience and acquired knowledge, manages to tackle problems of different kinds. This was evident in our active participation in solving the proposed activities, which strengthened our abstraction and engineering thinking. We will certainly take this vision of education and collaborative work into our professional environments, in the different roles we play as HPC administrators, teachers and students.

The proper use of computing services involves efforts to perform monitoring, control and infrastructure management tasks. With the help of the tools reviewed during our visit, we will be able to provide our users with the highest standards of quality, security and accessibility.

The joint effort of the RISC2 and EU-CELAC ResInfra projects made it possible for engineers from Colombia, Mexico and Peru to participate in this HPC management course, learn about Chilean culture, gain knowledge and valuable contacts for our profession.

After this great experience, we hope that in the near future other supercomputing centres will replicate this type of initiative in other parts of the world, thus building more bridges of communication between HPC administrators from different places, sharing knowledge and experiences.

We are left with the milestone of being part of the First School of HPC Administrators of Latin America and the Caribbean, with experiences that made us grow professionally, academically and personally, as well as with alliances among colleagues, now friends, forming a support network across our region.

We conclude by thanking Rafael Mayo of CIEMAT for the initiative; Ginés Guerrero, Pedro Schürmann, Eugenio Guerra, Pablo Flores, Angelo Guajardo, Esteban Osorio and José Morales for the knowledge and experiences they shared; and RISC2 and EU-CELAC ResInfra for providing us with this learning opportunity and supporting the scholarship grant.

By:

Miguel Angel Barrera Arbelaez, Universidad de los Andes, Colombia

Carlos Enrique Mosquera Trujillo, Centro de bioinformática y biología computacional de Colombia BIOS, Colombia

César Alexander Bernal Díaz, Universidad Industrial de Santander, Colombia.

Eduardo Romero Arzate, Universidad Autónoma Metropolitana, México.

Ronald Darwin Apaza Veliz, Universidad Nacional de San Agustín, Perú.

Joel Gonzalez Lara, Centro de Análisis de Datos y Supercómputo, México
