ieee cluster 2022 - RISC2 Project
https://www.risc2-project.eu

HPC meets AI and Big Data
https://www.risc2-project.eu/2022/10/06/hpc-meets-ai-and-big-data/
Thu, 06 Oct 2022

HPC services are no longer solely targeted at highly parallel modelling and simulation tasks. Indeed, the computational power offered by these services is now being used to support data-centric Big Data and Artificial Intelligence (AI) applications. By combining both types of computational paradigms, HPC infrastructures will be key for improving the lives of citizens, speeding up scientific breakthroughs in different fields (e.g., health, IoT, biology, chemistry, physics), and increasing the competitiveness of companies [OG+15, NCR+18].

As the utility and usage of HPC infrastructures increase, more computational and storage power is required to efficiently handle the growing number and diversity of targeted applications. In fact, many HPC centers are now aiming at exascale supercomputers capable of at least one exaFLOP (10^18 floating-point operations per second), which represents a thousandfold increase in processing power over the first petascale computer deployed in 2008 [RD+15]. Although this extra capacity is necessary for handling the increasing number of HPC applications, several outstanding challenges still need to be tackled so that it can be fully leveraged.
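
Before turning to those challenges, a quick sanity check on the orders of magnitude mentioned above (a back-of-the-envelope calculation using the standard SI prefixes, not figures from any particular machine) shows where the thousandfold factor comes from:

# Back-of-the-envelope comparison of petascale vs. exascale throughput.
PETAFLOP_PER_S = 10**15   # floating-point operations per second of a petascale system
EXAFLOP_PER_S = 10**18    # floating-point operations per second of an exascale system

speedup = EXAFLOP_PER_S / PETAFLOP_PER_S
print(f"Exascale delivers {speedup:.0f}x the throughput of petascale")
# Output: Exascale delivers 1000x the throughput of petascale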

Management of large infrastructures and heterogeneous workloads: By adding more compute and storage nodes, one also increases the complexity of the overall HPC distributed infrastructure, making it harder to monitor and manage. This complexity is further increased by the need to support highly heterogeneous applications that translate into different workloads with specific data storage and processing needs [ECS+17]. For example, on the one hand, traditional scientific modeling and simulation tasks require large slices of computational time, are CPU-bound, and rely on iterative approaches (parametric/stochastic modeling). On the other hand, data-driven Big Data applications comprise shorter computational tasks that are I/O-bound and, in some cases, have real-time response requirements (i.e., they are latency-oriented). Also, many of these applications leverage AI and machine learning tools that require specific hardware (e.g., GPUs) to run efficiently.
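
To make this contrast concrete, the toy Python sketch below (purely illustrative, with hypothetical function names, and not taken from any RISC2 workload) juxtaposes a CPU-bound iterative kernel with an I/O-bound data-loading step of the kind found in Big Data pipelines:

def cpu_bound_step(iterations=10_000_000):
    # Stand-in for an iterative simulation kernel: time is spent computing.
    acc = 0.0
    for i in range(iterations):
        acc += (i % 7) * 0.5
    return acc

def io_bound_step(path, chunk_size=1 << 20):
    # Stand-in for a data-driven task: time is spent reading from storage.
    total_bytes = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            total_bytes += len(chunk)
    return total_bytes

Timing the two on a shared cluster makes the difference in resource pressure obvious: the first saturates a CPU core, while the second stresses the storage path, which is why these classes of workloads call for different scheduling and QoS policies.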

Support for general-purpose analytics: The increased heterogeneity also demands that HPC infrastructures support general-purpose AI and Big Data applications that were not explicitly designed to run on specialised HPC hardware [KWG+13]. This way, developers are not required to significantly change their applications in order to execute them efficiently on HPC clusters.

Avoiding the storage bottleneck: Increasing the computational power and improving the management of HPC infrastructures may still not be enough to fully harness their capabilities. In fact, Big Data and AI applications are data-driven and require efficient data storage and retrieval from HPC clusters. With an increasing number of applications and heterogeneous workloads, the storage systems supporting HPC may easily become a bottleneck [YDI+16, ECS+17]. Indeed, as pointed out by several studies, storage access time is one of the major bottlenecks limiting the efficiency of current and next-generation HPC infrastructures.

To address these challenges, RISC2 partners are exploring:

- New monitoring and debugging tools that aid in the analysis of complex AI and Big Data workloads, pinpointing potential performance and efficiency bottlenecks and helping system administrators and developers troubleshoot them [ENO+21].

- Emerging virtualization technologies, such as containers, that enable users to efficiently deploy and execute traditional AI and Big Data applications in an HPC environment without requiring any changes to their source code [FMP21].

- The Software-Defined Storage paradigm, to improve the Quality-of-Service (QoS) of HPC storage services when supporting hundreds to thousands of data-intensive AI and Big Data applications [DLC+22, MTH+22] (see the toy sketch after this list).
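
As a purely illustrative example of the kind of mechanism a Software-Defined Storage layer can apply (a toy sketch under simplified assumptions, not the PAIO [MTH+22] or any other RISC2 implementation), the snippet below throttles an application's read path with a simple token bucket, one common way of enforcing per-application storage bandwidth limits:

import time

class TokenBucket:
    # Toy per-application I/O rate limiter; tokens are bytes of allowed traffic.
    def __init__(self, rate_bytes_per_s, burst_bytes):
        self.rate = rate_bytes_per_s
        self.capacity = burst_bytes
        self.tokens = burst_bytes
        self.last = time.monotonic()

    def consume(self, nbytes):
        # Refill tokens for the elapsed time, then wait until the request fits.
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if nbytes > self.tokens:
            time.sleep((nbytes - self.tokens) / self.rate)
            self.tokens = 0
        else:
            self.tokens -= nbytes

def throttled_read(path, bucket, chunk_size=1 << 20):
    # Reads a file while respecting the per-application bandwidth budget.
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            bucket.consume(len(chunk))
            yield chunk

In a real deployment, this kind of logic sits below the application (for example, intercepted at the file-system or I/O library level), so that, as noted above, applications do not need to be modified.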

To sum up, these three research goals, and their respective contributions, will enable the next generation of HPC infrastructures and services to efficiently meet the demands of Big Data and AI workloads.

 

References

[DLC+22] Dantas, M., Leitão, D., Cui, P., Macedo, R., Liu, X., Xu, W., Paulo, J., 2022. Accelerating Deep Learning Training Through Transparent Storage Tiering. IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGrid).

[ECS+17] Joseph, E., Conway, S., Sorensen, B., Thorp, M., 2017. Trends in the Worldwide HPC Market (Hyperion Presentation). HPC User Forum at HLRS.

[ENO+21] Esteves, T., Neves, F., Oliveira, R., Paulo, J., 2021. CaT: Content-aware Tracing and Analysis for Distributed Systems. ACM/IFIP Middleware Conference (Middleware).

[FMP21] Faria, A., Macedo, R., Paulo, J., 2021. Pods-as-Volumes: Effortlessly Integrating Storage Systems and Middleware into Kubernetes. Workshop on Container Technologies and Container Clouds (WoC'21).

[KWG+13] Katal, A., Wazid, M., Goudar, R.H., 2013. Big Data: Issues, Challenges, Tools and Good Practices. International Conference on Contemporary Computing (IC3).

[MTH+22] Macedo, R., Tanimura, Y., Haga, J., Chidambaram, V., Pereira, J., Paulo, J., 2022. PAIO: General, Portable I/O Optimizations With Minor Application Modifications. USENIX Conference on File and Storage Technologies (FAST).

[NCR+18] Netto, M.A., Calheiros, R.N., Rodrigues, E.R., Cunha, R.L., Buyya, R., 2018. HPC Cloud for Scientific and Business Applications: Taxonomy, Vision, and Research Challenges. ACM Computing Surveys (CSUR).

[OG+15] Osseyran, A., Giles, M. (eds.), 2015. Industrial Applications of High-Performance Computing: Best Global Practices.

[RD+15] Reed, D.A., Dongarra, J., 2015. Exascale Computing and Big Data. Communications of the ACM.

[YDI+16] Yildiz, O., Dorier, M., Ibrahim, S., Ross, R., Antoniu, G., 2016. On the Root Causes of Cross-Application I/O Interference in HPC Storage Systems. IEEE International Parallel and Distributed Processing Symposium (IPDPS).

 

By INESC TEC

RISC2 organized a workshop co-located with IEEE Cluster 2022
https://www.risc2-project.eu/2022/09/21/risc2-organized-a-workshop-co-located-with-ieee-cluster-2022/
Wed, 21 Sep 2022

RISC2, in collaboration with EU-LAC ResInfra, organized the workshop “HPC for International Collaboration between Europe and Latin America”, in conjunction with the IEEE Cluster 2022 Conference in Heidelberg, Germany. About 15 people participated in the workshop, which took place on September 6, 2022.

The workshop aimed to exchange experiences, results, and best practices of collaboration initiatives between Europe and Latin America, in which HPC was essential, and to discuss how to work towards sustainability by reinforcing the bridges between the HPC communities in both regions. The workshop was organized by our partners Esteban Meneses from CeNAT, Fabrizio Gagliardi from BSC, Bernd Mohr from JSC, Carlos J. Barrios H. from UIS, and Rafael Mayo-García from CIEMAT.

The workshop was opened with a keynote by Daniele Lezzi from BSC, who reviewed EU-LATAM collaboration on HPC. Six more presentations highlighted research work from Latin America and collaborative work between organizations on both continents. More information about the workshop, including a detailed program, can be found here.

The RISC2 project supported the IEEE Cluster Conference, a major international forum for presenting and sharing recent accomplishments and technological developments in cluster computing and in the use of cluster systems for scientific and commercial applications, by organizing a networking event at the end of the workshop day.

Our partner Esteban Meneses, from the National High Technology Center (CeNAT) in Costa Rica, was one of the Publicity Co-Chairs of the IEEE Cluster 2022 Conference.
HPC for International Collaboration between Europe and Latin America Workshop
https://www.risc2-project.eu/events/hpc-for-international-collaboration-between-europe-and-latin-america/
Mon, 23 May 2022

IEEE Cluster 2022
https://www.risc2-project.eu/events/ieee-cluster-2022/
Thu, 19 May 2022
