Towards Cloud-HPC Continuum

Publications

2024

Simone Leo, Michael R. Crusoe, Laura Rodríguez-Navas, Raül Sirvent, Alexander Kanitz, Paul De Geest, Rudolf Wittner, Luca Pireddu, Daniel Garijo, José M. Fernández, Iacopo Colonnelli, Matej Gallo, Tazro Ohta, Hirotaka Suetake, Salvador Capella-Gutierrez, Renske Wit, Bruno P. Kinoshita, Stian Soiland-Reyes

Recording provenance of workflow runs with RO-Crate Journal Article

In: PLoS ONE, vol. 19, no. 9, pp. 1–35, 2024.

Abstract | Links | BibTeX

@article{24:pone:wfrunrocrate,

title = {Recording provenance of workflow runs with RO-Crate},

author = {Simone Leo and Michael R. Crusoe and Laura Rodríguez-Navas and Raül Sirvent and Alexander Kanitz and Paul De Geest and Rudolf Wittner and Luca Pireddu and Daniel Garijo and José M. Fernández and Iacopo Colonnelli and Matej Gallo and Tazro Ohta and Hirotaka Suetake and Salvador Capella-Gutierrez and Renske Wit and Bruno P. Kinoshita and Stian Soiland-Reyes},

doi = {10.1371/journal.pone.0309210},

year  = {2024},

date = {2024-09-01},

journal = {PLoS ONE},

volume = {19},

number = {9},

pages = {1–35},

publisher = {Public Library of Science},

abstract = {Recording the provenance of scientific computation results is key to the support of traceability, reproducibility and quality assessment of data products. Several data models have been explored to address this need, providing representations of workflow plans and their executions as well as means of packaging the resulting information for archiving and sharing. However, existing approaches tend to lack interoperable adoption across workflow management systems. In this work we present Workflow Run RO-Crate, an extension of RO-Crate (Research Object Crate) and Schema.org to capture the provenance of the execution of computational workflows at different levels of granularity and bundle together all their associated objects (inputs, outputs, code, etc.). The model is supported by a diverse, open community that runs regular meetings, discussing development, maintenance and adoption aspects. Workflow Run RO-Crate is already implemented by several workflow management systems, allowing interoperable comparisons between workflow runs from heterogeneous systems. We describe the model, its alignment to standards such as W3C PROV, and its implementation in six workflow systems. Finally, we illustrate the application of Workflow Run RO-Crate in two use cases of machine learning in the digital image analysis domain.},

keywords = {},

pubstate = {published},

tppubtype = {article}

}

Alberto Mulone, Doriana Medić, Marco Aldinucci

A Fault Tolerance mechanism for Hybrid Scientific Workflows Proceedings Article

In: 1st workshop about High-Performance e-Science (HiPES), Madrid, Spain, 2024.

Abstract | BibTeX

Iacopo Colonnelli, Robert Birke, Giulio Malenza, Gianluca Mittone, Alberto Mulone, Jeroen Galjaard, Lydia Y. Chen, Sanzio Bassini, Gabriella Scipione, Jan Martinovič, Vit Vondrák, Marco Aldinucci

Cross-Facility Federated Learning Journal Article

In: Procedia Computer Science, vol. 240, pp. 3–12, 2024, ISSN: 1877-0509.

Abstract | Links | BibTeX

2023

Marco Aldinucci, Elena Maria Baralis, Valeria Cardellini, Iacopo Colonnelli, Marco Danelutto, Sergio Decherchi, Giuseppe Di Modica, Luca Ferrucci, Marco Gribaudo, Francesco Iannone, Marco Lapegna, Doriana Medic, Giuseppa Muscianisi, Francesca Righetti, Eva Sciacca, Nicola Tonellotto, Mauro Tortonesi, Paolo Trunfio, Tullio Vardanega

A Systematic Mapping Study of Italian Research on Workflows Proceedings Article

In: Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, SC-W 2023, pp. 2065–2076, ACM, Denver, CO, USA, 2023.

Abstract | Links | BibTeX

Alberto Mulone, Sherine Awad, Davide Chiarugi, Marco Aldinucci

Porting the Variant Calling Pipeline for NGS data in cloud-HPC environment Proceedings Article

In: Shahriar, Hossain, Teranishi, Yuuichi, Cuzzocrea, Alfredo, Sharmin, Moushumi, Towey, Dave, Majumder, A. K. M. Jahangir Alam, Kashiwazaki, Hiroki, Yang, Ji-Jiang, Takemoto, Michiharu, Sakib, Nazmus, Banno, Ryohei, Ahamed, Sheikh Iqbal (Ed.): 47th IEEE Annual Computers, Software, and Applications Conference, COMPSAC 2023, pp. 1858–1863, IEEE, Torino, Italy, 2023.

Abstract | Links | BibTeX

Iacopo Colonnelli, Bruno Casella, Gianluca Mittone, Yasir Arfat, Barbara Cantalupo, Roberto Esposito, Alberto Riccardo Martinelli, Doriana Medić, Marco Aldinucci

Federated Learning meets HPC and cloud Proceedings Article

In: Bufano, Filomena, Riggi, Simone, Sciacca, Eva, Schillirò, Francesco (Ed.): Astrophysics and Space Science Proceedings, pp. 193–199, Springer, Catania, Italy, 2023, ISBN: 978-3-031-34167-0.

Abstract | Links | BibTeX

Sandro Gepiro Contaldo, Luca Alessandri, Iacopo Colonnelli, Marco Beccuti, Marco Aldinucci

Bringing Cell Subpopulation Discovery on a Cloud-HPC Using rCASC and StreamFlow Book Chapter

In: Calogero, Raffaele Adolfo, Benes, Vladimir (Ed.): Single Cell Transcriptomics: Methods and Protocols, pp. 337–345, Springer US, New York, NY, 2023, ISBN: 978-1-0716-2756-3.

Abstract | Links | BibTeX

Iacopo Colonnelli

Workflow Models for Heterogeneous Distributed Systems Proceedings Article

In: Bena, Nicola, Martino, Beniamino Di, Maratea, Antonio, Sperduti, Alessandro, Nardo, Emanuel Di, Ciaramella, Angelo, Montella, Raffaele, Ardagna, Claudio A. (Ed.): Proceedings of the 2nd Italian Conference on Big Data and Data Science (ITADATA 2023), Naples, Italy, September 11-13, 2023, CEUR-WS.org, 2023.

Abstract | BibTeX

2022

Marco Aldinucci, David Atienza, Federico Bolelli, Mónica Caballero, Iacopo Colonnelli, José Flich, Jon Ander Gómez, David González, Costantino Grana, Marco Grangetto, Simone Leo, Pedro López, Dana Oniga, Roberto Paredes, Luca Pireddu, Eduardo Quiñones, Tatiana Silva, Enzo Tartaglione, Marina Zapater

The DeepHealth Toolkit: A Key European Free and Open-Source Software for Deep Learning and Computer Vision Ready to Exploit Heterogeneous HPC and Cloud Architectures Book Section

In: Curry, Edward, Auer, Sören, Berre, Arne J., Metzger, Andreas, Perez, Maria S., Zillner, Sonja (Ed.): Technologies and Applications for Big Data Value, pp. 183–202, Springer International Publishing, Cham, 2022, ISBN: 978-3-030-78307-5.

Abstract | Links | BibTeX

@incollection{22:TABDV,

title = {The DeepHealth Toolkit: A Key European Free and Open-Source Software for Deep Learning and Computer Vision Ready to Exploit Heterogeneous HPC and Cloud Architectures},

author = {Marco Aldinucci and David Atienza and Federico Bolelli and Mónica Caballero and Iacopo Colonnelli and José Flich and Jon Ander Gómez and David González and Costantino Grana and Marco Grangetto and Simone Leo and Pedro López and Dana Oniga and Roberto Paredes and Luca Pireddu and Eduardo Quiñones and Tatiana Silva and Enzo Tartaglione and Marina Zapater},

editor = {Edward Curry and Sören Auer and Arne J. Berre and Andreas Metzger and Maria S. Perez and Sonja Zillner},

doi = {10.1007/978-3-030-78307-5_9},

isbn = {978-3-030-78307-5},

year  = {2022},

date = {2022-01-01},

booktitle = {Technologies and Applications for Big Data Value},

pages = {183–202},

publisher = {Springer International Publishing},

address = {Cham},

chapter = {9},

abstract = {At the present time, we are immersed in the convergence between Big Data, High-Performance Computing and Artificial Intelligence. Technological progress in these three areas has accelerated in recent years, forcing different players like software companies and stakeholders to move quickly. The European Union is dedicating a lot of resources to maintain its relevant position in this scenario, funding projects to implement large-scale pilot testbeds that combine the latest advances in Artificial Intelligence, High-Performance Computing, Cloud and Big Data technologies. The DeepHealth project is an example focused on the health sector whose main outcome is the DeepHealth toolkit, a European unified framework that offers deep learning and computer vision capabilities, completely adapted to exploit underlying heterogeneous High-Performance Computing, Big Data and cloud architectures, and ready to be integrated into any software platform to facilitate the development and deployment of new applications for specific problems in any sector. This toolkit is intended to be one of the European contributions to the field of AI. This chapter introduces the toolkit with its main components and complementary tools, providing a clear view to facilitate and encourage its adoption and wide use by the European community of developers of AI-based solutions and data scientists working in the healthcare sector and others.},

keywords = {},

pubstate = {published},

tppubtype = {incollection}

}

Eduardo Quiñones, Jesus Perales, Jorge Ejarque, Asaf Badouh, Santiago Marco, Fabrice Auzanneau, François Galea, David González, José Ramón Hervás, Tatiana Silva, Iacopo Colonnelli, Barbara Cantalupo, Marco Aldinucci, Enzo Tartaglione, Rafael Tornero, José Flich, Jose Maria Martinez, David Rodriguez, Izan Catalán, Jorge Garcia, Carles Hernández

The DeepHealth HPC Infrastructure: Leveraging Heterogenous HPC and Cloud Computing Infrastructures for IA-based Medical Solutions Book Section

In: Terzo, Olivier, Martinovič, Jan (Ed.): HPC, Big Data, and AI Convergence Towards Exascale: Challenge and Vision, pp. 191–216, CRC Press, Boca Raton, Florida, 2022, ISBN: 978-1-0320-0984-1.

Abstract | Links | BibTeX

Martin Golasowski, Jan Martinovič, Marc Levrier, Stephan Hachinger, Sophia Karagiorgou, Aikaterini Papapostolou, Spiros Mouzakitis, Ioannis Tsapelas, Monica Caballero, Marco Aldinucci, Jon Ander Gómez, Antony Chazapis, Jean-Thomas Acquaviva

Toward the Convergence of High-Performance Computing, Cloud, and Big Data Domains Book Section

In: Terzo, Olivier, Martinovič, Jan (Ed.): HPC, Big Data, and AI Convergence Towards Exascale: Challenge and Vision, pp. 1–16, CRC Press, Boca Raton, Florida, 2022, ISBN: 978-1-0320-0984-1.

Abstract | Links | BibTeX

Dana Oniga, Barbara Cantalupo, Enzo Tartaglione, Daniele Perlo, Marco Grangetto, Marco Aldinucci, Federico Bolelli, Federico Pollastri, Michele Cancilla, Laura Canalini, Costantino Grana, Cristina Muñoz Alcalde, Franco Alberto Cardillo, Monica Florea

Applications of AI and HPC in the Health Domain Book Section

In: Terzo, Olivier, Martinovič, Jan (Ed.): HPC, Big Data, and AI Convergence Towards Exascale: Challenge and Vision, pp. 217–239, CRC Press, Boca Raton, Florida, 2022, ISBN: 978-1-0320-0984-1.

Abstract | Links | BibTeX

Iacopo Colonnelli, Marco Aldinucci, Barbara Cantalupo, Luca Padovani, Sergio Rabellino, Concetto Spampinato, Roberto Morelli, Rosario Di Carlo, Nicolò Magini, Carlo Cavazzoni

Distributed workflows with Jupyter Journal Article

In: Future Generation Computer Systems, vol. 128, pp. 282–298, 2022, ISSN: 0167-739X.

Abstract | Links | BibTeX

2021

Giovanni Agosta, William Fornaciari, Andrea Galimberti, Giuseppe Massari, Federico Reghenzani, Federico Terraneo, Davide Zoni, Carlo Brandolese, Massimo Celino, Francesco Iannone, Paolo Palazzari, Giuseppe Zummo, Massimo Bernaschi, Pasqua D'Ambra, Sergio Saponara, Marco Danelutto, Massimo Torquati, Marco Aldinucci, Yasir Arfat, Barbara Cantalupo, Iacopo Colonnelli, Roberto Esposito, Alberto Riccardo Martinelli, Gianluca Mittone, Olivier Beaumont, Berenger Bramas, Lionel Eyraud-Dubois, Brice Goglin, Abdou Guermouche, Raymond Namyst, Samuel Thibault, Antonio Filgueras, Miquel Vidal, Carlos Alvarez, Xavier Martorell, Ariel Oleksiak, Michal Kulczewski, Alessandro Lonardo, Piero Vicini, Francesco Lo Cicero, Francesco Simula, Andrea Biagioni, Paolo Cretaro, Ottorino Frezza, Pier Stanislao Paolucci, Matteo Turisini, Francesco Giacomini, Tommaso Boccali, Simone Montangero, Roberto Ammendola

TEXTAROSSA: Towards EXtreme scale Technologies and Accelerators for euROhpc hw/Sw Supercomputing Applications for exascale Proceedings Article

In: Proc. of the 24th Euromicro Conference on Digital System Design (DSD), IEEE, Palermo, Italy, 2021.

Abstract | Links | BibTeX

@inproceedings{21:DSD:textarossa,

title = {TEXTAROSSA: Towards EXtreme scale Technologies and Accelerators for euROhpc hw/Sw Supercomputing Applications for exascale},

author = {Giovanni Agosta and William Fornaciari and Andrea Galimberti and Giuseppe Massari and Federico Reghenzani and Federico Terraneo and Davide Zoni and Carlo Brandolese and Massimo Celino and Francesco Iannone and Paolo Palazzari and Giuseppe Zummo and Massimo Bernaschi and Pasqua D'Ambra and Sergio Saponara and Marco Danelutto and Massimo Torquati and Marco Aldinucci and Yasir Arfat and Barbara Cantalupo and Iacopo Colonnelli and Roberto Esposito and Alberto Riccardo Martinelli and Gianluca Mittone and Olivier Beaumont and Berenger Bramas and Lionel Eyraud-Dubois and Brice Goglin and Abdou Guermouche and Raymond Namyst and Samuel Thibault and Antonio Filgueras and Miquel Vidal and Carlos Alvarez and Xavier Martorell and Ariel Oleksiak and Michal Kulczewski and Alessandro Lonardo and Piero Vicini and Francesco Lo Cicero and Francesco Simula and Andrea Biagioni and Paolo Cretaro and Ottorino Frezza and Pier Stanislao Paolucci and Matteo Turisini and Francesco Giacomini and Tommaso Boccali and Simone Montangero and Roberto Ammendola},

doi = {10.1109/DSD53832.2021.00051},

year  = {2021},

date = {2021-08-01},

booktitle = {Proc. of the 24th Euromicro Conference on Digital System Design (DSD)},

publisher = {IEEE},

address = {Palermo, Italy},

abstract = {To achieve high performance and high energy effi- ciency on near-future exascale computing systems, three key technology gaps needs to be bridged. These gaps include: en- ergy efficiency and thermal control; extreme computation effi- ciency via HW acceleration and new arithmetics; methods and tools for seamless integration of reconfigurable accelerators in heterogeneous HPC multi-node platforms. TEXTAROSSA aims at tackling this gap through a co-design approach to heterogeneous HPC solutions, supported by the integration and extension of HW and SW IPs, programming models and tools derived from European research.},

keywords = {},

pubstate = {published},

tppubtype = {inproceedings}

}

Iacopo Colonnelli, Barbara Cantalupo, Concetto Spampinato, Matteo Pennisi, Marco Aldinucci

Bringing AI pipelines onto cloud-HPC: setting a baseline for accuracy of COVID-19 diagnosis Proceedings Article

In: Iannone, Francesco (Ed.): ENEA CRESCO in the fight against COVID-19, ENEA, 2021.

Abstract | Links | BibTeX

Iacopo Colonnelli, Barbara Cantalupo, Roberto Esposito, Matteo Pennisi, Concetto Spampinato, Marco Aldinucci

HPC Application Cloudification: The StreamFlow Toolkit Proceedings Article

In: Bispo, João, Cherubin, Stefano, Flich, José (Ed.): 12th Workshop on Parallel Programming and Run-Time Management Techniques for Many-core Architectures and 10th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2021), pp. 5:1–5:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl, Germany, 2021, ISSN: 2190-6807.

Abstract | Links | BibTeX

@inproceedings{colonnelli_et_al:OASIcs.PARMA-DITAM.2021.5,

title = {HPC Application Cloudification: The StreamFlow Toolkit},

author = {Iacopo Colonnelli and Barbara Cantalupo and Roberto Esposito and Matteo Pennisi and Concetto Spampinato and Marco Aldinucci},

editor = {João Bispo and Stefano Cherubin and José Flich},

doi = {10.4230/OASIcs.PARMA-DITAM.2021.5},

issn = {2190-6807},

year  = {2021},

date = {2021-01-01},

booktitle = {12th Workshop on Parallel Programming and Run-Time Management Techniques for Many-core Architectures and 10th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2021)},

volume = {88},

pages = {5:1–5:13},

publisher = {Schloss Dagstuhl – Leibniz-Zentrum für Informatik},

address = {Dagstuhl, Germany},

series = {Open Access Series in Informatics (OASIcs)},

abstract = {Finding an effective way to improve accessibility to High-Performance Computing facilities, still anchored to SSH-based remote shells and queue-based job submission mechanisms, is an open problem in computer science. This work advocates a cloudification of HPC applications through a cluster-as-accelerator pattern, where computationally demanding portions of the main execution flow hosted on a Cloud Finding an effective way to improve accessibility to High-Performance Computing facilities, still anchored to SSH-based remote shells and queue-based job submission mechanisms, is an open problem in computer science. This work advocates a cloudification of HPC applications through a cluster-as-accelerator pattern, where computationally demanding portions of the main execution flow hosted on a Cloud infrastructure can be offloaded to HPC environments to speed them up. We introduce StreamFlow, a novel Workflow Management System that supports such a design pattern and makes it possible to run the steps of a standard workflow model on independent processing elements with no shared storage. We validated the proposed approach's effectiveness on the CLAIRE COVID-19 universal pipeline, i.e. a reproducible workflow capable of automating the comparison of (possibly all) state-of-the-art pipelines for the diagnosis of COVID-19 interstitial pneumonia from CT scans images based on Deep Neural Networks (DNNs).},

keywords = {},

pubstate = {published},

tppubtype = {inproceedings}

}

Iacopo Colonnelli, Barbara Cantalupo, Ivan Merelli, Marco Aldinucci

StreamFlow: cross-breeding cloud with HPC Journal Article

In: IEEE Transactions on Emerging Topics in Computing, vol. 9, no. 4, pp. 1723–1737, 2021.

Abstract | Links | BibTeX