clusters - RISC2 Project | https://www.risc2-project.eu | Mon, 11 Sep 2023 15:01:49 +0000

Webinar: Improving energy-efficiency of High-Performance Computing clusters
https://www.risc2-project.eu/events/webinar-7-improving-energy-efficiency-of-high-performance-computing-clusters/
Thu, 26 Jan 2023 13:37:07 +0000

Date: April 26, 2023 | 3 p.m. (UTC+1)

Speakers: Lubomir Riha and Ondřej Vysocký, IT4Innovations National Supercomputing Center

Moderator: Esteban Mocskos, Universidad de Buenos Aires

High-Performance Computing centers consume megawatts of electrical power, which is a limiting factor in building bigger systems on the path to exascale and post-exascale clusters. Such high power consumption leads to several challenges, including the need for a robust power supply and distribution network, enormous energy bills, and significant CO2 emissions. To increase power efficiency, vendors are adding various kinds of heterogeneous hardware, which users' applications must fully utilize in order to run efficiently. This requirement may be hard to fulfill, which opens up the possibility of limiting the available resources for additional power and energy savings at little or no performance penalty.

The talk will present best practices on how to grant rights to control hardware parameters, how to measure the energy consumption of the hardware, and what can be expected from performing energy-saving activities based on hardware tuning.
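Energy figures like the ones discussed in the talk are typically derived from discrete power samples. As a minimal sketch, a power trace can be turned into consumed energy with the trapezoidal rule (the samples below are synthetic; on real hardware the readings would come from interfaces such as Intel RAPL or out-of-band board sensors):

```python
# Integrate a sampled power trace (watts) into energy (joules) using the
# trapezoidal rule. Samples here are synthetic; real measurements would
# come from RAPL counters or board-level power sensors.
def energy_joules(timestamps, watts):
    total = 0.0
    for i in range(1, len(timestamps)):
        dt = timestamps[i] - timestamps[i - 1]
        total += 0.5 * (watts[i] + watts[i - 1]) * dt
    return total

# One minute of samples at 1 Hz: a constant 200 W baseline plus a
# 100 W burst during the middle 20 seconds.
ts = list(range(61))
power = [200.0 + (100.0 if 20 <= t < 40 else 0.0) for t in ts]
print(energy_joules(ts, power))  # 14000.0 J
```

Comparing such integrals before and after a hardware-tuning change (e.g., a lower core frequency cap) is the basic way to quantify the energy savings the talk refers to.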

About the speakers:

Lubomir Riha, Ph.D. is the Head of the Infrastructure Research Lab at IT4Innovations National Supercomputing Center. Previously he was a research scientist in the High-Performance Computing Lab at George Washington University, ECE Department. He received his Ph.D. degree in Electrical Engineering from the Czech Technical University in Prague, Czech Republic, and a Ph.D. degree in Computer Science from Bowie State University, USA. Currently, he is the local principal investigator of two EuroHPC Centers of Excellence, MAX and SPACE, and of two EuroHPC projects, SCALABLE and EUPEX (which designs a prototype of a European exascale machine). Previously he was a local PI of the H2020 Center of Excellence POP2 and of the H2020-FET HPC READEX project. His research interests are the optimization of HPC applications, energy-efficient computing, the acceleration of scientific and engineering applications using GPUs and many-core accelerators, and parallel and distributed rendering.

Ondrej Vysocky is a Ph.D. candidate at VSB – Technical University of Ostrava, Czech Republic, and at the same time works at IT4Innovations in the Infrastructure Research Lab. His research is focused on energy efficiency in high-performance computing. He was an investigator of the Horizon 2020 READEX project, which dealt with the energy efficiency of parallel applications using dynamic tuning. Since then, he has been developing the MERIC library, a runtime system for energy measurement and hardware-parameter tuning during a parallel application run. Using the library, he is an investigator in several H2020 projects, including Performance Optimisation and Productivity (POP2) and the European Pilot for Exascale (EUPEX). He is also a member of the PowerStack initiative, which works on a holistic, extensible, and scalable approach to power management.

The post Webinar: Improving energy-efficiency of High-Performance Computing clusters first appeared on RISC2 Project.

RISC2 webinar series aims to benefit HPC research and industry in Europe and Latin America
https://www.risc2-project.eu/2023/01/26/risc2-webinar-season-is-back-for-season-2/
Thu, 26 Jan 2023 13:32:50 +0000

After the success of the first 4 webinars, the RISC2 Webinar Series “HPC System & Tools” is back for its 2nd season. The webinars will be happening until May 2023, starting on February 22.

Each webinar will present the state of the art in methods and tools for setting up and maintaining HPC hardware and software infrastructures. Each talk will last around 30-40 minutes, followed by a 10-15 minute moderated discussion with the audience.

Four webinars have already been scheduled.

The post RISC2 webinar series aims to benefit HPC research and industry in Europe and Latin America first appeared on RISC2 Project.

Webinar: Developing complex workflows that include HPC, Artificial Intelligence and Data Analytics
https://www.risc2-project.eu/events/webinar-5-developing-complex-workflows-that-include-hpc-artificial-intelligence-and-data-analytics/
Tue, 24 Jan 2023 10:51:32 +0000

Date: February 22, 2023 | 4 p.m. (UTC)

Speaker: Rosa M. Badia, Barcelona Supercomputing Center

Moderator: Esteban Mocskos, Universidad de Buenos Aires

The evolution of High-Performance Computing (HPC) systems towards ever more complex machines is opening up the opportunity of hosting larger and more heterogeneous applications. In this sense, the demand for developing applications that are not purely HPC, but that combine aspects of Artificial Intelligence and/or Data Analytics, is becoming more common. However, there is a lack of environments that support the development of these complex workflows. The webinar will present PyCOMPSs, a parallel task-based programming model for Python. Based on simple annotations, sequential Python programs can be executed in parallel on HPC clusters and other distributed infrastructures.

PyCOMPSs has been extended to support tasks that invoke HPC applications and can be combined with Artificial Intelligence and Data analytics frameworks.
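The annotation style described above can be sketched as follows. The `@task` decorator and `compss_wait_on` are part of the PyCOMPSs API; the snippet falls back to a local stand-in when PyCOMPSs is not installed, so it runs even without a COMPSs runtime (with the runtime, the independent tasks below would be scheduled in parallel across the cluster):

```python
# Sketch of the PyCOMPSs annotation style. With a COMPSs runtime, each
# decorated call becomes an asynchronous task; without one, the local
# stand-in below simply runs the function eagerly in-process.
try:
    from pycompss.api.task import task
    from pycompss.api.api import compss_wait_on
except ImportError:
    def task(**kwargs):            # stand-in: no-op decorator
        def wrapper(fn):
            return fn
        return wrapper
    def compss_wait_on(obj):       # stand-in: results are already concrete
        return obj

@task(returns=int)
def square(x):
    return x * x

# The runtime detects that the calls are independent and can run them
# in parallel; the program itself stays sequential Python.
partials = [square(i) for i in range(10)]
total = sum(compss_wait_on(p) for p in partials)
print(total)  # 285
```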

Some of these extensions are made in the framework of the eFlows4HPC project, which is also developing the HPC Workflows as a Service (HPCWaaS) methodology to make the development, deployment, execution and reuse of workflows easier. The webinar will present the current status of the PyCOMPSs programming model and how it is being extended in the eFlows4HPC project towards the project's needs. The HPCWaaS methodology will also be introduced.

About the speaker: Rosa M. Badia holds a PhD in Computer Science (1994) from the Technical University of Catalonia (UPC). She is the manager of the Workflows and Distributed Computing research group at the Barcelona Supercomputing Center (BSC).

Her current research interests are programming models for complex platforms (from edge and fog to clouds and large HPC systems). The group led by Dr. Badia has been developing the StarSs programming model for more than 15 years, with high adoption among application developers. Currently the group focuses its efforts on PyCOMPSs/COMPSs, an instance of the programming model for distributed computing, including the Cloud.

Dr Badia has published nearly 200 papers in international conferences and journals in the topics of her research. Her group is very active in projects funded by the European Commission and in contracts with industry. Dr Badia is the PI of the eFlows4HPC project.

Registrations are now closed.

 

The post Webinar: Developing complex workflows that include HPC, Artificial Intelligence and Data Analytics first appeared on RISC2 Project.

Managing Data and Machine Learning Models in HPC Applications
https://www.risc2-project.eu/2022/11/21/managing-data-and-machine-learning-models-in-hpc-applications/
Mon, 21 Nov 2022 14:09:42 +0000

The synergy of data science (including big data and machine learning) and HPC yields many benefits for data-intensive applications, in terms of more accurate predictive data analysis and better decision making. For instance, in the context of the HPDaSc (High Performance Data Science) project between Inria and Brazil, we have shown the importance of real-time analytics for making critical high-consequence decisions in HPC applications, e.g., preventing useless drilling based on a driller's real-time data and real-time visualization of simulated data, and the effectiveness of ML in dealing with scientific data, e.g., computing Probability Density Functions (PDFs) over simulated seismic data using Spark.
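As an illustration of the kind of per-dataset statistic mentioned above, a minimal histogram-based PDF estimate can be sketched in plain Python (the data here is synthetic, and the real pipeline distributes this computation with Spark over simulated seismic data):

```python
import random

# Histogram-based estimate of a Probability Density Function (PDF),
# the kind of statistic computed over ensembles of simulated traces.
# Synthetic Gaussian samples stand in for the seismic data.
random.seed(42)
samples = [random.gauss(0.0, 1.0) for _ in range(100_000)]

def pdf_histogram(values, bins=50):
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        i = min(int((v - lo) / width), bins - 1)  # clamp v == hi into last bin
        counts[i] += 1
    n = len(values)
    # Return (bin center, density); densities integrate to 1 over the range.
    return [(lo + (i + 0.5) * width, c / (n * width)) for i, c in enumerate(counts)]

density = pdf_histogram(samples)
width = (max(samples) - min(samples)) / 50
print(round(sum(d * width for _, d in density), 3))  # 1.0
```

In the distributed version, each Spark partition computes partial bin counts and a reduce step merges them, so the estimate scales to datasets that do not fit on one node.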

However, to realize the full potential of this synergy, ML models (or models for short) must be built, combined and ensembled, which can be very complex as there can be many models to select from. Furthermore, they should be shared and reused, in particular, in different execution environments such as HPC or Spark clusters.

To address this problem, we proposed Gypscie [Porto 2022, Zorrilla 2022], a new framework that supports the entire ML lifecycle and enables model reuse and import from other frameworks. The approach behind Gypscie is to combine, in a single framework, rich capabilities for model and data management and model execution that are typically provided by separate tools. Overall, Gypscie provides: a platform supporting the complete model life-cycle, from model building to deployment, monitoring and policy enforcement; an environment where casual users can find ready-to-use models that best fit a particular prediction problem; an environment to optimize ML task scheduling and execution; an easy way for developers to benchmark their models against competing models and improve them; a central point of access to assess models' compliance with policies and ethics and to obtain and curate observational and predictive data; and provenance information and model explainability. Finally, Gypscie interfaces with multiple execution environments to run ML tasks, e.g., an HPC system such as the Santos Dumont supercomputer at LNCC or a Spark cluster.

Gypscie comes with SAVIME [Silva 2020], a multidimensional array in-memory database system for importing, storing and querying model (tensor) data. The open-source SAVIME system has been developed to support analytical queries over scientific data. It offers an extremely efficient ingestion procedure, which practically eliminates the waiting time to analyze incoming data. It also supports dense and sparse arrays and non-integer dimension indexing, and it offers a functional query language processed by a query optimizer that generates efficient query execution plans.

 

References

[Porto 2022] Fabio Porto, Patrick Valduriez: Data and Machine Learning Model Management with Gypscie. CARLA 2022 – Workshop on HPC and Data Sciences meet Scientific Computing, SCALAC, Sep 2022, Porto Alegre, Brazil. pp.1-2. 

[Zorrilla 2022] Rocío Zorrilla, Eduardo Ogasawara, Patrick Valduriez, Fabio Porto: A Data-Driven Model Selection Approach to Spatio-Temporal Prediction. SBBD 2022 – Brazilian Symposium on Databases, SBBD, Sep 2022, Buzios, Brazil. pp.1-12. 

[Silva 2020] A.C. Silva, H. Lourenço, D. Ramos, F. Porto, P. Valduriez. Savime: An Array DBMS for Simulation Analysis and Prediction. Journal of Information Data Management 11(3), 2020. 

 

By LNCC and Inria 

The post Managing Data and Machine Learning Models in HPC Applications first appeared on RISC2 Project.

Using supercomputing for accelerating life science solutions
https://www.risc2-project.eu/2022/11/01/using-supercomputing-for-accelerating-life-science-solutions/
Tue, 01 Nov 2022 14:11:06 +0000

The world of High Performance Computing (HPC) is now moving towards exascale performance, i.e. the ability to perform 10^18 operations per second. A variety of applications will be improved to take advantage of this computing power, leading to better predictions and models in different fields, like Environmental Sciences, Artificial Intelligence, Material Sciences and Life Sciences.

In Life Sciences, HPC advancements can improve different areas:

  • a reduced time to scientific discovery;
  • the ability of generating predictions necessary for precision medicine;
  • new healthcare and genomics-driven research approaches;
  • the processing of huge datasets for deep and machine learning;
  • the optimization of modeling, such as Computer Aided Drug Design (CADD);
  • enhanced security and protection of healthcare data in HPC environments, in compliance with the European GDPR regulation;
  • the management of massive amounts of data, for example for clinical trials, drug development and genomics data analytics.

The outbreak of COVID-19 has further accelerated this progress from different points of view. Some European projects aim at repurposing known active ingredients into new drugs as therapies against COVID-19 [Exscalate4CoV, Ligate], while others focus on the management and monitoring of contagion clusters, providing an innovative approach to learning from the SARS-CoV-2 crisis and deriving recommendations for future waves and pandemics [Orchestra].

The ability to deal with massive amounts of data in HPC environments is also used to create databases of nucleic acid sequencing data and to use them to detect allelic variant frequencies, as in the NIG project [Nig], a collaboration with the Network for Italian Genomes. Another example of this capability is the set-up of a data-sharing platform based on novel Federated Learning schemes to advance research in personalised medicine for haematological diseases [Genomed4All].

Supercomputing is widely used in Drug Design (the process of finding medicines for diseases for which there are no or insufficient treatments), with many projects active in this field, including RISC2.

Sometimes, when there is no previous knowledge of the biological target, as happened with COVID-19, discovering new drugs requires creating new molecules from scratch [Novartis]. This process involves billion-dollar investments to produce and test thousands of molecules, and it usually has a low success rate: only about 12% of potential drugs entering clinical development are approved [Engitix]. The whole process, from identifying a possible compound to the end of the clinical trial, can take up to 10 years. Nowadays there is an uneven coverage of diseases: most of the compounds target genetic conditions, while only a few antivirals and antibiotics have been found.

The search for candidate drugs occurs mainly through two different approaches: high-throughput screening and virtual screening. The first is more reliable but also very expensive and time-consuming: it is usually applied to well-known targets, mainly by pharmaceutical companies. The second approach is a good compromise between cost and accuracy and is typically applied to relatively new targets, in academic laboratories, where it is also used to discover or better understand the mechanisms of these targets [Liu2016].

Candidate drugs are usually small molecules that bind to a specific protein, or part of it, inhibiting the protein's usual activity. For example, binding the right ligand to a viral enzyme may stop a viral infection. In the virtual screening process, millions of compounds are screened against the target protein at different levels: the most basic level simply takes into account the shape needed to fit correctly into the protein, while higher levels also consider other features such as specific interactions, protein flexibility, solubility, and human tolerance. A "score" is assigned to each docked ligand, and the compounds with the highest scores are studied further. With massively parallel computers, we can rapidly filter extremely large molecule databases (e.g., billions of molecules).
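The filtering step described above amounts to streaming a scored library and keeping only the best candidates. A toy sketch of that pattern (the scoring function here is a random placeholder for a real docking engine, and the compound names are illustrative):

```python
import heapq
import random

# Toy virtual-screening pass: score a compound library against a target
# and keep the top-scoring ligands for follow-up study. The score is a
# random stand-in; real docking scores combine shape complementarity,
# interaction terms, flexibility, solubility, etc.
random.seed(0)
library = [f"compound_{i}" for i in range(100_000)]

def docking_score(compound):
    return random.random()  # placeholder for a real docking engine

TOP_K = 100
# heapq.nlargest streams the library once, keeping only K candidates in
# memory; the same pattern scales to billions of molecules when the
# library is sharded across cluster nodes and the top-K lists are merged.
hits = heapq.nlargest(TOP_K, ((docking_score(c), c) for c in library))
print(len(hits))  # 100
```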

The current computational power of HPC clusters allows us to analyze up to 3 million compounds per second [Exscalate]. Even though vaccines were developed remarkably quickly, effective drug treatments for people already suffering from COVID-19 were scarce at the beginning of the pandemic. At that time, supercomputers around the world were asked to help with drug design, a real-world example of the power of Urgent Computing. CINECA participates in Exscalate4CoV [Exscalate4Cov], currently the most advanced center of competence for fighting the coronavirus, combining the most powerful supercomputing resources and Artificial Intelligence with experimental facilities and clinical validation.

 

References

[Engitix] https://engitix.com/technology/

[Exscalate] https://www.exscalate.eu/en/projects.html

[Exscalate4CoV] https://www.exscalate4cov.eu/

[Genomed4All] https://genomed4all.eu/

[Ligate] https://www.ligateproject.eu/

[Liu2016] T. Liu, D. Lu, H. Zhang, M. Zheng, H. Yang, Ye. Xu, C. Luo, W. Zhu, K. Yu, and H. Jiang, “Applying high-performance computing in drug discovery and molecular simulation” Natl Sci Rev. 2016 Mar; 3(1): 49–63.

[Nig] http://www.nig.cineca.it/

[Novartis] https://www.novartis.com/stories/art-drug-design-technological-age

[Orchestra] https://orchestra-cohort.eu/

 

By CINECA

The post Using supercomputing for accelerating life science solutions first appeared on RISC2 Project.

HPC meets AI and Big Data
https://www.risc2-project.eu/2022/10/06/hpc-meets-ai-and-big-data/
Thu, 06 Oct 2022 08:23:34 +0000

HPC services are no longer solely targeted at highly parallel modelling and simulation tasks. Indeed, the computational power offered by these services is now being used to support data-centric Big Data and Artificial Intelligence (AI) applications. By combining both types of computational paradigms, HPC infrastructures will be key for improving the lives of citizens, speeding up scientific breakthrough in different fields (e.g., health, IoT, biology, chemistry, physics), and increasing the competitiveness of companies [OG+15, NCR+18].

As the utility and usage of HPC infrastructures increase, more computational and storage power is required to efficiently handle the growing number of targeted applications. In fact, many HPC centers are now aiming at exascale supercomputers delivering at least one exaFLOP (10^18 operations per second), which represents a thousandfold increase in processing power over the first petascale computer deployed in 2008 [RD+15]. Although this extra power is a necessary requirement for handling the increasing number of HPC applications, several outstanding challenges still need to be tackled before it can be fully leveraged.

Management of large infrastructures and heterogeneous workloads: By adding more compute and storage nodes, one is also increasing the complexity of the overall HPC distributed infrastructure, making it harder to monitor and manage. This complexity is increased by the need to support highly heterogeneous applications that translate into different workloads with specific data storage and processing needs [ECS+17]. For example, traditional scientific modeling and simulation tasks require large slices of computational time, are CPU-bound, and rely on iterative approaches (parametric/stochastic modeling). Data-driven Big Data applications, on the other hand, comprise shorter computational tasks that are I/O-bound and, in some cases, have real-time response requirements (i.e., latency-oriented). Also, many of these applications leverage AI and machine learning tools that require specific hardware (e.g., GPUs) to be efficient.

Support for general-purpose analytics: The increased heterogeneity also demands that HPC infrastructures support general-purpose AI and Big Data applications that were not explicitly designed to run on specialised HPC hardware [KWG+13]. That way, developers are not required to significantly change their applications for them to execute efficiently on HPC clusters.

Avoiding the storage bottleneck: Increasing the computational power and improving the management of HPC infrastructures alone may still not fully harness the capabilities of these infrastructures. In fact, Big Data and AI applications are data-driven and require efficient data storage and retrieval from HPC clusters. With an increasing number of applications and heterogeneous workloads, the storage systems supporting HPC may easily become a bottleneck [YDI+16, ECS+17]. Indeed, as several studies point out, storage access time is one of the major bottlenecks limiting the efficiency of current and next-generation HPC infrastructures.

In order to address these challenges, RISC2 partners are exploring:

  • New monitoring and debugging tools that can aid in the analysis of complex AI and Big Data workloads in order to pinpoint potential performance and efficiency bottlenecks, while helping system administrators and developers troubleshoot them [ENO+21].

  • Emerging virtualization technologies, such as containers, that enable users to efficiently deploy and execute traditional AI and Big Data applications in an HPC environment without requiring any changes to their source code [FMP21].

  • The Software-Defined Storage paradigm, in order to improve the Quality-of-Service (QoS) of HPC storage services when supporting hundreds to thousands of data-intensive AI and Big Data applications [DLC+22, MTH+22].
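As a rough illustration of storage QoS enforcement in this spirit, a per-application token bucket can cap each tenant's I/O bandwidth so that a single data-intensive job cannot starve the others (this is a minimal sketch only; systems such as PAIO [MTH+22] implement far richer, configurable policies):

```python
# Minimal per-tenant storage QoS sketch: a token bucket caps I/O
# bandwidth, granting requests while tokens remain and rejecting
# (i.e., delaying/queueing) them once the budget is exhausted.
class TokenBucket:
    def __init__(self, rate_bytes_per_s, capacity):
        self.rate = rate_bytes_per_s   # sustained bandwidth cap
        self.capacity = capacity       # maximum burst size in bytes
        self.tokens = capacity
        self.last = 0.0

    def allow(self, nbytes, now):
        # Refill proportionally to elapsed time, up to the burst capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= nbytes:
            self.tokens -= nbytes
            return True
        return False  # request must be delayed or queued

# 100 MiB/s sustained rate with a 10 MiB burst allowance.
bucket = TokenBucket(rate_bytes_per_s=100 * 2**20, capacity=10 * 2**20)
granted = sum(bucket.allow(2**20, now=0.0) for _ in range(15))
print(granted)  # only the 10 MiB burst fits at t=0
```

In a real SDS control plane, a policy engine would assign each application its own bucket parameters and adjust them at runtime as workloads come and go.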

To sum up, these three research goals, and their respective contributions, will enable the next generation of HPC infrastructures and services to efficiently meet the demands of Big Data and AI workloads.

 

References

[DLC+22] Dantas, M., Leitão, D., Cui, P., Macedo, R., Liu, X., Xu, W., Paulo, J., 2022. Accelerating Deep Learning Training Through Transparent Storage Tiering. IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGrid)  

[ECS+17] Joseph, E., Conway, S., Sorensen, B., Thorp, M., 2017. Trends in the Worldwide HPC Market (Hyperion Presentation). HPC User Forum at HLRS.  

[FMP21] Faria, A., Macedo, R., Paulo, J., 2021. Pods-as-Volumes: Effortlessly Integrating Storage Systems and Middleware into Kubernetes. Workshop on Container Technologies and Container Clouds (WoC’21). 

[KWG+13] Katal, A., Wazid, M. and Goudar, R.H., 2013. Big data: issues, challenges, tools and good practices. International conference on contemporary computing (IC3). 

[NCR+18] Netto, M.A., Calheiros, R.N., Rodrigues, E.R., Cunha, R.L. and Buyya, R., 2018. HPC cloud for scientific and business applications: Taxonomy, vision, and research challenges. ACM Computing Surveys (CSUR). 

[MTH+22] Macedo, R., Tanimura, Y., Haga, J., Chidambaram, V., Pereira, J., Paulo, J., 2022. PAIO: General, Portable I/O Optimizations With Minor Application Modifications. USENIX Conference on File and Storage Technologies (FAST). 

[OG+15] Osseyran, A. and Giles, M. eds., 2015. Industrial applications of high-performance computing: best global practices. 

[RD+15] Reed, D.A. and Dongarra, J., 2015. Exascale computing and big data. Communications of the ACM. 

[ENO+21] Esteves, T., Neves, F., Oliveira, R., Paulo, J., 2021. CaT: Content-aware Tracing and Analysis for Distributed Systems. ACM/IFIP Middleware conference (Middleware). 

[YDI+16] Yildiz, O., Dorier, M., Ibrahim, S., Ross, R. and Antoniu, G., 2016, May. On the root causes of cross-application I/O interference in HPC storage systems. IEEE International Parallel and Distributed Processing Symposium (IPDPS). 

 

By INESC TEC

The post HPC meets AI and Big Data first appeared on RISC2 Project.
