2023
- S. G. Contaldo, L. Alessandri, I. Colonnelli, M. Beccuti, and M. Aldinucci, “Bringing cell subpopulation discovery on a cloud-HPC using rCASC and StreamFlow,” in Single cell transcriptomics: methods and protocols, R. A. Calogero and V. Benes, Eds., New York, NY: Springer US, 2023, pp. 337–345. doi:10.1007/978-1-0716-2756-3_17
[BibTeX] [Abstract]
The idea behind novel single-cell RNA sequencing (scRNA-seq) pipelines is to isolate single cells through microfluidic approaches and generate sequencing libraries in which the transcripts are tagged to track their cell of origin. Modern scRNA-seq platforms are capable of analyzing up to many thousands of cells in each run. Then, combined with massive high-throughput sequencing producing billions of reads, scRNA-seq allows the assessment of fundamental biological properties of cell populations and biological systems at unprecedented resolution.
@inbook{Contaldo2023, abstract = {The idea behind novel single-cell RNA sequencing (scRNA-seq) pipelines is to isolate single cells through microfluidic approaches and generate sequencing libraries in which the transcripts are tagged to track their cell of origin. Modern scRNA-seq platforms are capable of analyzing up to many thousands of cells in each run. Then, combined with massive high-throughput sequencing producing billions of reads, scRNA-seq allows the assessment of fundamental biological properties of cell populations and biological systems at unprecedented resolution.}, address = {New York, NY}, author = {Contaldo, Sandro Gepiro and Alessandri, Luca and Colonnelli, Iacopo and Beccuti, Marco and Aldinucci, Marco}, booktitle = {Single Cell Transcriptomics: Methods and Protocols}, doi = {10.1007/978-1-0716-2756-3_17}, editor = {Calogero, Raffaele Adolfo and Benes, Vladimir}, isbn = {978-1-0716-2756-3}, pages = {337--345}, publisher = {Springer {US}}, title = {Bringing Cell Subpopulation Discovery on a Cloud-{HPC} Using {rCASC} and {StreamFlow}}, year = 2023 }
2022
- I. Colonnelli, B. Casella, G. Mittone, Y. Arfat, B. Cantalupo, R. Esposito, A. R. Martinelli, D. Medić, and M. Aldinucci, “Federated learning meets HPC and cloud,” in Astrophysics and space science proceedings, Catania, Italy, 2022.
[BibTeX] [Abstract]
HPC and AI are fated to meet for several reasons. This article will discuss some of them and argue why this will happen through the set of methods and technologies that underpin cloud computing. As a paradigmatic example, we present a new federated learning system that collaboratively trains a deep learning model in different supercomputing centers. The system is based on the StreamFlow workflow manager designed for hybrid cloud-HPC infrastructures. A minimal sketch of the underlying aggregation scheme follows the BibTeX entry below.
@inproceedings{22:ml4astro, abstract = {HPC and AI are fated to meet for several reasons. This article will discuss some of them and argue why this will happen through the set of methods and technologies that underpin cloud computing. As a paradigmatic example, we present a new federated learning system that collaboratively trains a deep learning model in different supercomputing centers. The system is based on the StreamFlow workflow manager designed for hybrid cloud-HPC infrastructures.}, address = {Catania, Italy}, author = {Iacopo Colonnelli and Bruno Casella and Gianluca Mittone and Yasir Arfat and Barbara Cantalupo and Roberto Esposito and Alberto Riccardo Martinelli and Doriana Medi\'{c} and Marco Aldinucci}, booktitle = {Astrophysics and Space Science Proceedings}, publisher = {Springer}, title = {Federated Learning meets {HPC} and cloud}, year = {2022} }
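The federated scheme described in the entry above is, at its heart, an instance of federated averaging: each center trains on its own data, and only model parameters travel between sites. The snippet below is a minimal, hypothetical sketch of that aggregation step in plain NumPy; it is not the actual StreamFlow-based implementation.

```python
import numpy as np

def federated_average(site_params, site_sizes):
    """FedAvg-style aggregation: size-weighted mean of per-site parameters.

    Only parameters cross site boundaries; raw training data never leaves
    the center that owns it.
    """
    total = sum(site_sizes)
    return sum(p * (n / total) for p, n in zip(site_params, site_sizes))

# Hypothetical round with three supercomputing centers of unequal data size.
params = [np.array([0.2, 1.0]), np.array([0.4, 0.8]), np.array([0.3, 0.9])]
sizes = [1000, 4000, 5000]
print(federated_average(params, sizes))  # aggregated model parameters
```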
- M. Aldinucci, D. Atienza, F. Bolelli, M. Caballero, I. Colonnelli, J. Flich, J. A. Gómez, D. González, C. Grana, M. Grangetto, S. Leo, P. López, D. Oniga, R. Paredes, L. Pireddu, E. Quiñones, T. Silva, E. Tartaglione, and M. Zapater, “The DeepHealth toolkit: a key European free and open-source software for deep learning and computer vision ready to exploit heterogeneous HPC and Cloud architectures,” in Technologies and applications for big data value, E. Curry, S. Auer, A. J. Berre, A. Metzger, M. S. Perez, and S. Zillner, Eds., Cham: Springer International Publishing, 2022, pp. 183–202. doi:10.1007/978-3-030-78307-5_9
[BibTeX] [Abstract]
At the present time, we are immersed in the convergence between Big Data, High-Performance Computing and Artificial Intelligence. Technological progress in these three areas has accelerated in recent years, forcing different players like software companies and stakeholders to move quickly. The European Union is dedicating a lot of resources to maintain its relevant position in this scenario, funding projects to implement large-scale pilot testbeds that combine the latest advances in Artificial Intelligence, High-Performance Computing, Cloud and Big Data technologies. The DeepHealth project is an example focused on the health sector whose main outcome is the DeepHealth toolkit, a European unified framework that offers deep learning and computer vision capabilities, completely adapted to exploit underlying heterogeneous High-Performance Computing, Big Data and cloud architectures, and ready to be integrated into any software platform to facilitate the development and deployment of new applications for specific problems in any sector. This toolkit is intended to be one of the European contributions to the field of AI. This chapter introduces the toolkit with its main components and complementary tools, providing a clear view to facilitate and encourage its adoption and wide use by the European community of developers of AI-based solutions and data scientists working in the healthcare sector and others.
@incollection{22:TABDV, abstract = {At the present time, we are immersed in the convergence between Big Data, High-Performance Computing and Artificial Intelligence. Technological progress in these three areas has accelerated in recent years, forcing different players like software companies and stakeholders to move quickly. The European Union is dedicating a lot of resources to maintain its relevant position in this scenario, funding projects to implement large-scale pilot testbeds that combine the latest advances in Artificial Intelligence, High-Performance Computing, Cloud and Big Data technologies. The DeepHealth project is an example focused on the health sector whose main outcome is the DeepHealth toolkit, a European unified framework that offers deep learning and computer vision capabilities, completely adapted to exploit underlying heterogeneous High-Performance Computing, Big Data and cloud architectures, and ready to be integrated into any software platform to facilitate the development and deployment of new applications for specific problems in any sector. This toolkit is intended to be one of the European contributions to the field of AI. This chapter introduces the toolkit with its main components and complementary tools, providing a clear view to facilitate and encourage its adoption and wide use by the European community of developers of AI-based solutions and data scientists working in the healthcare sector and others.}, address = {Cham}, author = {Marco Aldinucci and David Atienza and Federico Bolelli and M\'{o}nica Caballero and Iacopo Colonnelli and Jos\'{e} Flich and Jon Ander G\'{o}mez and David Gonz\'{a}lez and Costantino Grana and Marco Grangetto and Simone Leo and Pedro L\'{o}pez and Dana Oniga and Roberto Paredes and Luca Pireddu and Eduardo Qui\~{n}ones and Tatiana Silva and Enzo Tartaglione and Marina Zapater}, booktitle = {Technologies and Applications for Big Data Value}, chapter = {9}, doi = {10.1007/978-3-030-78307-5_9}, editor = {Edward Curry and S\"{o}ren Auer and Arne J. Berre and Andreas Metzger and Maria S. Perez and Sonja Zillner}, isbn = {978-3-030-78307-5}, pages = {183--202}, publisher = {Springer International Publishing}, title = {The {DeepHealth} Toolkit: A Key European Free and Open-Source Software for Deep Learning and Computer Vision Ready to Exploit Heterogeneous {HPC} and {C}loud Architectures}, year = {2022} }
- E. Quiñones, J. Perales, J. Ejarque, A. Badouh, S. Marco, F. Auzanneau, F. Galea, D. González, J. R. Hervás, T. Silva, I. Colonnelli, B. Cantalupo, M. Aldinucci, E. Tartaglione, R. Tornero, J. Flich, J. M. Martinez, D. Rodriguez, I. Catalán, J. Garcia, and C. Hernández, “The DeepHealth HPC infrastructure: leveraging heterogenous HPC and cloud computing infrastructures for IA-based medical solutions,” in HPC, big data, and AI convergence towards exascale: challenge and vision, O. Terzo and J. Martinovič, Eds., Boca Raton, Florida: CRC Press, 2022, pp. 191–216. doi:10.1201/9781003176664
[BibTeX] [Abstract]
This chapter presents the DeepHealth HPC toolkit for an efficient execution of deep learning (DL) medical applications on HPC and cloud-computing infrastructures, featuring many-core, GPU, and FPGA acceleration devices. The toolkit offers to the European Computer Vision Library and the European Distributed Deep Learning Library (EDDL), developed in the DeepHealth project as well, the mechanisms to distribute and parallelize DL operations on HPC and cloud infrastructures in a fully transparent way. The toolkit implements workflow managers used to orchestrate HPC workloads for an efficient parallelization of EDDL training operations on HPC and cloud infrastructures, and includes the parallel programming models for an efficient execution of EDDL inference and training operations on many-core, GPU, and FPGA acceleration devices. A toy sketch of the data-parallel update that this kind of toolkit automates follows the BibTeX entry below.
@incollection{22:deephealth:HPCbook, abstract = {This chapter presents the DeepHealth HPC toolkit for an efficient execution of deep learning (DL) medical application into HPC and cloud-computing infrastructures, featuring many-core, GPU, and FPGA acceleration devices. The toolkit offers to the European Computer Vision Library and the European Distributed Deep Learning Library (EDDL), developed in the DeepHealth project as well, the mechanisms to distribute and parallelize DL operations on HPC and cloud infrastructures in a fully transparent way. The toolkit implements workflow managers used to orchestrate HPC workloads for an efficient parallelization of EDDL training operations on HPC and cloud infrastructures, and includes the parallel programming models for an efficient execution EDDL inference and training operations on many-core, GPUs and FPGAs acceleration devices.}, address = {Boca Raton, Florida}, author = {Eduardo Qui\~{n}ones and Jesus Perales and Jorge Ejarque and Asaf Badouh and Santiago Marco and Fabrice Auzanneau and Fran\c{c}ois Galea and David Gonz\'{a}lez and Jos\'{e} Ram\'{o}n Herv\'{a}s and Tatiana Silva and Iacopo Colonnelli and Barbara Cantalupo and Marco Aldinucci and Enzo Tartaglione and Rafael Tornero and Jos\'{e} Flich and Jose Maria Martinez and David Rodriguez and Izan Catal\'{a}n and Jorge Garcia and Carles Hern\'{a}ndez}, booktitle = {{HPC}, Big Data, and {AI} Convergence Towards Exascale: Challenge and Vision}, chapter = {10}, doi = {10.1201/9781003176664}, editor = {Olivier Terzo and Jan Martinovi\v{c}}, isbn = {978-1-0320-0984-1}, pages = {191--216}, publisher = {{CRC} Press}, title = {The {DeepHealth} {HPC} Infrastructure: Leveraging Heterogenous {HPC} and Cloud Computing Infrastructures for {IA}-based Medical Solutions}, year = {2022} }
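The "distribute and parallelize DL operations" mentioned in the abstract above boils down, in the simplest synchronous case, to averaging per-worker gradients before each update. The toy sketch below illustrates only this idea; the names are illustrative and do not reflect the real EDDL API.

```python
# One synchronous data-parallel step: every worker computes gradients on
# its own data shard, gradients are averaged, and the shared model moves
# one step against the averaged gradient.
def data_parallel_step(weights, shard_gradients, lr=0.01):
    avg = [sum(g) / len(shard_gradients) for g in zip(*shard_gradients)]
    return [w - lr * g for w, g in zip(weights, avg)]

weights = [0.5, -0.2]
shard_gradients = [[0.1, 0.3], [0.2, 0.1], [0.3, 0.2]]  # one list per worker
print(data_parallel_step(weights, shard_gradients))
```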
- M. Golasowski, J. Martinovič, M. Levrier, S. Hachinger, S. Karagiorgou, A. Papapostolou, S. Mouzakitis, I. Tsapelas, M. Caballero, M. Aldinucci, J. A. Gómez, A. Chazapis, and J. Acquaviva, “Toward the convergence of high-performance computing, cloud, and big data domains,” in HPC, big data, and AI convergence towards exascale: challenge and vision, O. Terzo and J. Martinovič, Eds., Boca Raton, Florida: CRC Press, 2022, pp. 1–16. doi:10.1201/9781003176664
[BibTeX] [Abstract]
Convergence between big data, high-performance computing, and the cloud is the key driving factor for sustainable economic growth in the future. Technological advances in many fields are determined by competence to gain precise information from the large amounts of data collected, which in turn requires powerful computing resources. This chapter provides an overview on the evolution of the three fields and four different points of view on their convergence provided by the CYBELE, DeepHealth, Evolve, and LEXIS projects funded by the European Union under the Horizon 2020 Programme.
@incollection{22:intro:HPCbook, abstract = {Convergence between big data, high-performance computing, and the cloud is the key driving factor for sustainable economic growth in the future. Technological advances in many fields are determined by competence to gain precise information from the large amounts of data collected, which in turn requires powerful computing resources. This chapter provides an overview on the evolution of the three fields and four different points of view on their convergence provided by the CYBELE, DeepHealth, Evolve, and LEXIS projects funded by the European Union under the Horizon 2020 Programme.}, address = {Boca Raton, Florida}, author = {Martin Golasowski and Jan Martinovi{\v c} and Marc Levrier and Stephan Hachinger and Sophia Karagiorgou and Aikaterini Papapostolou and Spiros Mouzakitis and Ioannis Tsapelas and Monica Caballero and Marco Aldinucci and Jon Ander G{\'o}mez and Antony Chazapis and Jean-Thomas Acquaviva}, booktitle = {{HPC}, Big Data, and {AI} Convergence Towards Exascale: Challenge and Vision}, chapter = {1}, doi = {10.1201/9781003176664}, editor = {Olivier Terzo and Jan Martinovi\v{c}}, isbn = {978-1-0320-0984-1}, pages = {1--16}, publisher = {{CRC} Press}, title = {Toward the Convergence of High-Performance Computing, Cloud, and Big Data Domains}, year = {2022} }
- D. Oniga, B. Cantalupo, E. Tartaglione, D. Perlo, M. Grangetto, M. Aldinucci, F. Bolelli, F. Pollastri, M. Cancilla, L. Canalini, C. Grana, C. M. Alcalde, F. A. Cardillo, and M. Florea, “Applications of AI and HPC in the health domain,” in HPC, big data, and AI convergence towards exascale: challenge and vision, O. Terzo and J. Martinovič, Eds., Boca Raton, Florida: CRC Press, 2022, pp. 217–239. doi:10.1201/9781003176664
[BibTeX] [Abstract]
This chapter presents the applications of artificial intelligence (AI) and high-performance computing (HPC) in the health domain, illustrated by the description of five of the use cases that are developed in the DeepHealth project. In the context of the European Commission supporting the use of AI and HPC in the health sector, the DeepHealth project is helping health experts process large quantities of images, putting at their disposal deep learning and computer vision techniques, combined in the DeepHealth toolkit and HPC infrastructures. The DeepHealth toolkit is tested and validated through 15 use cases, each of them representing a biomedical application. The most promising use cases are described in the chapter, which concludes with the value proposition and the benefits that the DeepHealth toolkit offers to future end users.
@incollection{22:applications:HPCbook, abstract = {This chapter presents the applications of artificial intelligence (AI) and high-computing performance (HPC) in the health domain, illustrated by the description of five of the use cases that are developed in the DeepHealth project. In the context of the European Commission supporting the use of AI and HPC in the health sector, DeepHealth Project is helping health experts process large quantities of images, putting at their disposal DeepLearning and computer vision techniques, combined in the DeepHealth toolkit and HPC infrastructures. The DeepHealth toolkit is tested and validated through 15 use cases, each of them representing a biomedical application. The most promising use cases are described in the chapter, which concludes with the value proposition and the benefits that DeepHealth toolkit offers to future end users.}, address = {Boca Raton, Florida}, author = {Dana Oniga and Barbara Cantalupo and Enzo Tartaglione and Daniele Perlo and Marco Grangetto and Marco Aldinucci and Federico Bolelli and Federico Pollastri and Michele Cancilla and Laura Canalini and Costantino Grana and Cristina Mu{\~n}oz Alcalde and Franco Alberto Cardillo and Monica Florea}, booktitle = {{HPC}, Big Data, and {AI} Convergence Towards Exascale: Challenge and Vision}, chapter = {11}, doi = {10.1201/9781003176664}, editor = {Olivier Terzo and Jan Martinovi\v{c}}, isbn = {978-1-0320-0984-1}, pages = {217--239}, publisher = {{CRC} Press}, title = {Applications of {AI} and {HPC} in the Health Domain}, year = {2022} }
- I. Colonnelli, M. Aldinucci, B. Cantalupo, L. Padovani, S. Rabellino, C. Spampinato, R. Morelli, R. Di Carlo, N. Magini, and C. Cavazzoni, “Distributed workflows with Jupyter,” Future generation computer systems, vol. 128, pp. 282–298, 2022. doi:10.1016/j.future.2021.10.007
[BibTeX] [Abstract]
The designers of a new coordination interface enacting complex workflows have to tackle a dichotomy: choosing a language-independent or language-dependent approach. Language-independent approaches decouple workflow models from the host code’s business logic and advocate portability. Language-dependent approaches foster flexibility and performance by adopting the same host language for business and coordination code. Jupyter Notebooks, with their capability to describe both imperative and declarative code in a unique format, allow taking the best of the two approaches, maintaining a clear separation between application and coordination layers but still providing a unified interface to both aspects. We advocate the Jupyter Notebooks’ potential to express complex distributed workflows, identifying the general requirements for a Jupyter-based Workflow Management System (WMS) and introducing a proof-of-concept portable implementation working on hybrid Cloud-HPC infrastructures. As a byproduct, we extended the vanilla IPython kernel with workflow-based parallel and distributed execution capabilities. The proposed Jupyter-workflow (Jw) system is evaluated on common scenarios for High Performance Computing (HPC) and Cloud, showing its potential in lowering the barriers between prototypical Notebooks and production-ready implementations. A toy sketch of the cells-as-workflow-steps model follows the BibTeX entry below.
@article{21:FGCS:jupyflow, abstract = {The designers of a new coordination interface enacting complex workflows have to tackle a dichotomy: choosing a language-independent or language-dependent approach. Language-independent approaches decouple workflow models from the host code's business logic and advocate portability. Language-dependent approaches foster flexibility and performance by adopting the same host language for business and coordination code. Jupyter Notebooks, with their capability to describe both imperative and declarative code in a unique format, allow taking the best of the two approaches, maintaining a clear separation between application and coordination layers but still providing a unified interface to both aspects. We advocate the Jupyter Notebooks' potential to express complex distributed workflows, identifying the general requirements for a Jupyter-based Workflow Management System (WMS) and introducing a proof-of-concept portable implementation working on hybrid Cloud-HPC infrastructures. As a byproduct, we extended the vanilla IPython kernel with workflow-based parallel and distributed execution capabilities. The proposed Jupyter-workflow (Jw) system is evaluated on common scenarios for High Performance Computing (HPC) and Cloud, showing its potential in lowering the barriers between prototypical Notebooks and production-ready implementations.}, author = {Iacopo Colonnelli and Marco Aldinucci and Barbara Cantalupo and Luca Padovani and Sergio Rabellino and Concetto Spampinato and Roberto Morelli and Rosario {Di Carlo} and Nicol{\`o} Magini and Carlo Cavazzoni}, doi = {10.1016/j.future.2021.10.007}, issn = {0167-739X}, journal = {Future Generation Computer Systems}, pages = {282-298}, title = {Distributed workflows with {Jupyter}}, volume = {128}, year = {2022} }
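The central idea of the system described above is to treat notebook cells as workflow steps connected by explicit data dependencies, so that independent cells can run on different processing elements. The toy model below captures that idea only; all names are hypothetical and do not reflect the actual Jupyter-workflow API.

```python
# Toy model: each "cell" is a function plus the names of the cells whose
# outputs it consumes; cells are executed in topological order.
cells = {
    "load":   (lambda: list(range(10)), []),
    "double": (lambda data: [x * 2 for x in data], ["load"]),
    "total":  (lambda data: sum(data), ["double"]),
}

def run(dag):
    results = {}
    for name, (fn, deps) in dag.items():  # assumes topological order
        results[name] = fn(*[results[d] for d in deps])
    return results

print(run(cells)["total"])  # 90
```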
2021
- G. Agosta, W. Fornaciari, A. Galimberti, G. Massari, F. Reghenzani, F. Terraneo, D. Zoni, C. Brandolese, M. Celino, F. Iannone, P. Palazzari, G. Zummo, M. Bernaschi, P. D’Ambra, S. Saponara, M. Danelutto, M. Torquati, M. Aldinucci, Y. Arfat, B. Cantalupo, I. Colonnelli, R. Esposito, A. R. Martinelli, G. Mittone, O. Beaumont, B. Bramas, L. Eyraud-Dubois, B. Goglin, A. Guermouche, R. Namyst, S. Thibault, A. Filgueras, M. Vidal, C. Alvarez, X. Martorell, A. Oleksiak, M. Kulczewski, A. Lonardo, P. Vicini, F. L. Cicero, F. Simula, A. Biagioni, P. Cretaro, O. Frezza, P. S. Paolucci, M. Turisini, F. Giacomini, T. Boccali, S. Montangero, and R. Ammendola, “TEXTAROSSA: towards extreme scale technologies and accelerators for EuroHPC HW/SW supercomputing applications for exascale,” in Proc. of the 24th Euromicro conference on digital system design (DSD), Palermo, Italy, 2021. doi:10.1109/DSD53832.2021.00051
[BibTeX] [Abstract]
To achieve high performance and high energy efficiency on near-future exascale computing systems, three key technology gaps need to be bridged. These gaps include: energy efficiency and thermal control; extreme computation efficiency via HW acceleration and new arithmetics; methods and tools for seamless integration of reconfigurable accelerators in heterogeneous HPC multi-node platforms. TEXTAROSSA aims at tackling these gaps through a co-design approach to heterogeneous HPC solutions, supported by the integration and extension of HW and SW IPs, programming models and tools derived from European research.
@inproceedings{21:DSD:textarossa, abstract = {To achieve high performance and high energy efficiency on near-future exascale computing systems, three key technology gaps need to be bridged. These gaps include: energy efficiency and thermal control; extreme computation efficiency via HW acceleration and new arithmetics; methods and tools for seamless integration of reconfigurable accelerators in heterogeneous HPC multi-node platforms. TEXTAROSSA aims at tackling these gaps through a co-design approach to heterogeneous HPC solutions, supported by the integration and extension of HW and SW IPs, programming models and tools derived from European research.}, address = {Palermo, Italy}, author = {Giovanni Agosta and William Fornaciari and Andrea Galimberti and Giuseppe Massari and Federico Reghenzani and Federico Terraneo and Davide Zoni and Carlo Brandolese and Massimo Celino and Francesco Iannone and Paolo Palazzari and Giuseppe Zummo and Massimo Bernaschi and Pasqua D'Ambra and Sergio Saponara and Marco Danelutto and Massimo Torquati and Marco Aldinucci and Yasir Arfat and Barbara Cantalupo and Iacopo Colonnelli and Roberto Esposito and Alberto Riccardo Martinelli and Gianluca Mittone and Olivier Beaumont and Berenger Bramas and Lionel Eyraud-Dubois and Brice Goglin and Abdou Guermouche and Raymond Namyst and Samuel Thibault and Antonio Filgueras and Miquel Vidal and Carlos Alvarez and Xavier Martorell and Ariel Oleksiak and Michal Kulczewski and Alessandro Lonardo and Piero Vicini and Francesco Lo Cicero and Francesco Simula and Andrea Biagioni and Paolo Cretaro and Ottorino Frezza and Pier Stanislao Paolucci and Matteo Turisini and Francesco Giacomini and Tommaso Boccali and Simone Montangero and Roberto Ammendola}, booktitle = {Proc. of the 24th Euromicro Conference on Digital System Design ({DSD})}, doi = {10.1109/DSD53832.2021.00051}, month = aug, publisher = {IEEE}, title = {{TEXTAROSSA}: Towards EXtreme scale Technologies and Accelerators for euROhpc hw/Sw Supercomputing Applications for exascale}, year = {2021} }
- I. Colonnelli, B. Cantalupo, C. Spampinato, M. Pennisi, and M. Aldinucci, “Bringing AI pipelines onto cloud-HPC: setting a baseline for accuracy of COVID-19 diagnosis,” in ENEA CRESCO in the fight against COVID-19, 2021. doi:10.5281/zenodo.5151511
[BibTeX] [Abstract]
HPC is an enabling platform for AI. The introduction of AI workloads in the HPC applications basket has non-trivial consequences both on the way of designing AI applications and on the way of providing HPC computing. This is the leitmotif of the convergence between HPC and AI. The formalized definition of AI pipelines is one of the milestones of HPC-AI convergence. If well conducted, it allows, on the one hand, to obtain portable and scalable applications. On the other hand, it is crucial for the reproducibility of scientific pipelines. In this work, we advocate the StreamFlow Workflow Management System as a crucial ingredient to define a parametric pipeline, called “CLAIRE COVID-19 Universal Pipeline”, which is able to explore the optimization space of methods to classify COVID-19 lung lesions from CT scans, compare them for accuracy, and therefore set a performance baseline. The universal pipeline automatizes the training of many different Deep Neural Networks (DNNs) and many different hyperparameters. It, therefore, requires a massive computing power, which is found in traditional HPC infrastructure thanks to the portability-by-design of pipelines designed with StreamFlow. Using the universal pipeline, we identified a DNN reaching over 90% accuracy in detecting COVID-19 lesions in CT scans. A minimal sketch of the hyperparameter-exploration loop follows the BibTeX entry below.
@inproceedings{21:covi:enea, abstract = {HPC is an enabling platform for AI. The introduction of AI workloads in the HPC applications basket has non-trivial consequences both on the way of designing AI applications and on the way of providing HPC computing. This is the leitmotif of the convergence between HPC and AI. The formalized definition of AI pipelines is one of the milestones of HPC-AI convergence. If well conducted, it allows, on the one hand, to obtain portable and scalable applications. On the other hand, it is crucial for the reproducibility of scientific pipelines. In this work, we advocate the StreamFlow Workflow Management System as a crucial ingredient to define a parametric pipeline, called ``CLAIRE COVID-19 Universal Pipeline'', which is able to explore the optimization space of methods to classify COVID-19 lung lesions from CT scans, compare them for accuracy, and therefore set a performance baseline. The universal pipeline automatizes the training of many different Deep Neural Networks (DNNs) and many different hyperparameters. It, therefore, requires a massive computing power, which is found in traditional HPC infrastructure thanks to the portability-by-design of pipelines designed with StreamFlow. Using the universal pipeline, we identified a DNN reaching over 90\% accuracy in detecting COVID-19 lesions in CT scans.}, author = {Colonnelli, Iacopo and Cantalupo, Barbara and Spampinato, Concetto and Pennisi, Matteo and Aldinucci, Marco}, booktitle = {ENEA CRESCO in the fight against COVID-19}, doi = {10.5281/zenodo.5151511}, editor = {Francesco Iannone}, publisher = {ENEA}, title = {Bringing AI pipelines onto cloud-HPC: setting a baseline for accuracy of COVID-19 diagnosis}, year = {2021} }
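Conceptually, the universal pipeline is a systematic sweep over candidate networks and hyperparameters, training each combination and ranking the results by accuracy. The sketch below shows the shape of such an exploration loop; the search space and the training stub are placeholders, not the CLAIRE pipeline code.

```python
from itertools import product

# Hypothetical search space; the real pipeline sweeps many more knobs.
networks = ["resnet50", "densenet121", "efficientnet_b0"]
learning_rates = [1e-3, 1e-4]
augmentations = [True, False]

def train_and_eval(net, lr, augment):
    """Placeholder for 'train a DNN on CT scans, return validation accuracy'."""
    return (hash((net, lr, augment)) % 1000) / 1000.0  # dummy score

best = max(product(networks, learning_rates, augmentations),
           key=lambda cfg: train_and_eval(*cfg))
print("best configuration:", best)
```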
- I. Colonnelli, B. Cantalupo, R. Esposito, M. Pennisi, C. Spampinato, and M. Aldinucci, “HPC Application Cloudification: The StreamFlow Toolkit,” in 12th workshop on parallel programming and run-time management techniques for many-core architectures and 10th workshop on design tools and architectures for multicore embedded computing platforms (PARMA-DITAM 2021), Dagstuhl, Germany, 2021, pp. 5:1–5:13. doi:10.4230/OASIcs.PARMA-DITAM.2021.5
[BibTeX] [Abstract]
Finding an effective way to improve accessibility to High-Performance Computing facilities, still anchored to SSH-based remote shells and queue-based job submission mechanisms, is an open problem in computer science. This work advocates a cloudification of HPC applications through a cluster-as-accelerator pattern, where computationally demanding portions of the main execution flow hosted on a Cloud infrastructure can be offloaded to HPC environments to speed them up. We introduce StreamFlow, a novel Workflow Management System that supports such a design pattern and makes it possible to run the steps of a standard workflow model on independent processing elements with no shared storage. We validated the proposed approach’s effectiveness on the CLAIRE COVID-19 universal pipeline, i.e. a reproducible workflow capable of automating the comparison of (possibly all) state-of-the-art pipelines for the diagnosis of COVID-19 interstitial pneumonia from CT scan images based on Deep Neural Networks (DNNs). A sketch of this offloading pattern follows the BibTeX entry below.
@inproceedings{colonnelli_et_al:OASIcs.PARMA-DITAM.2021.5, abstract = {Finding an effective way to improve accessibility to High-Performance Computing facilities, still anchored to SSH-based remote shells and queue-based job submission mechanisms, is an open problem in computer science. This work advocates a cloudification of HPC applications through a cluster-as-accelerator pattern, where computationally demanding portions of the main execution flow hosted on a Cloud infrastructure can be offloaded to HPC environments to speed them up. We introduce StreamFlow, a novel Workflow Management System that supports such a design pattern and makes it possible to run the steps of a standard workflow model on independent processing elements with no shared storage. We validated the proposed approach's effectiveness on the CLAIRE COVID-19 universal pipeline, i.e. a reproducible workflow capable of automating the comparison of (possibly all) state-of-the-art pipelines for the diagnosis of COVID-19 interstitial pneumonia from CT scans images based on Deep Neural Networks (DNNs).}, address = {Dagstuhl, Germany}, author = {Colonnelli, Iacopo and Cantalupo, Barbara and Esposito, Roberto and Pennisi, Matteo and Spampinato, Concetto and Aldinucci, Marco}, booktitle = {12th Workshop on Parallel Programming and Run-Time Management Techniques for Many-core Architectures and 10th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2021)}, doi = {10.4230/OASIcs.PARMA-DITAM.2021.5}, editor = {Bispo, Jo\~{a}o and Cherubin, Stefano and Flich, Jos\'{e}}, isbn = {978-3-95977-181-8}, issn = {2190-6807}, pages = {5:1--5:13}, publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik}, series = {Open Access Series in Informatics (OASIcs)}, title = {{HPC Application Cloudification: The StreamFlow Toolkit}}, urn = {urn:nbn:de:0030-drops-136419}, volume = {88}, year = {2021} }
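In the cluster-as-accelerator pattern, a driver hosted on cloud resources submits its heavy steps to an HPC batch queue and collects the results. The fragment below sketches one such offload over SSH and Slurm; the host name and script path are hypothetical placeholders, and StreamFlow itself handles this orchestration far more generally.

```python
import subprocess

def offload(host: str, job_script: str) -> str:
    """Submit a Slurm job on a remote cluster over SSH; return sbatch output."""
    result = subprocess.run(
        ["ssh", host, "sbatch", job_script],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()  # e.g. "Submitted batch job 12345"

# Hypothetical usage (requires a reachable cluster):
# print(offload("hpc.example.org", "/home/user/train_step.sh"))
```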
- I. Colonnelli, B. Cantalupo, I. Merelli, and M. Aldinucci, “StreamFlow: cross-breeding cloud with HPC,” IEEE Transactions on Emerging Topics in Computing, vol. 9, iss. 4, pp. 1723–1737, 2021. doi:10.1109/TETC.2020.3019202
[BibTeX] [Abstract]
Workflows are among the most commonly used tools in a variety of execution environments. Many of them target a specific environment; few of them make it possible to execute an entire workflow in different environments, e.g. Kubernetes and batch clusters. We present a novel approach to workflow execution, called StreamFlow, that complements the workflow graph with the declarative description of potentially complex execution environments, and that makes it possible to execute workflows onto multiple sites not sharing a common data space. StreamFlow is then exemplified on a novel bioinformatics pipeline for single-cell transcriptomic data analysis. A simplified sketch of the step-to-environment binding idea follows the BibTeX entry below.
@article{20Lstreamflow:tetc, abstract = {Workflows are among the most commonly used tools in a variety of execution environments. Many of them target a specific environment; few of them make it possible to execute an entire workflow in different environments, e.g. Kubernetes and batch clusters. We present a novel approach to workflow execution, called StreamFlow, that complements the workflow graph with the declarative description of potentially complex execution environments, and that makes it possible the execution onto multiple sites not sharing a common data space. StreamFlow is then exemplified on a novel bioinformatics pipeline for single cell transcriptomic data analysis workflow.}, author = {Iacopo Colonnelli and Barbara Cantalupo and Ivan Merelli and Marco Aldinucci}, doi = {10.1109/TETC.2020.3019202}, journal = {{IEEE} {T}ransactions on {E}merging {T}opics in {C}omputing}, number = {4}, pages = {1723--1737}, title = {{StreamFlow}: cross-breeding cloud with {HPC}}, volume = {9}, year = {2021} }
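StreamFlow's distinguishing trait, per the abstract above, is pairing the workflow graph with a declarative description of where each step runs, even when the sites share no common data space. The dictionary below sketches that step-to-environment binding in a deliberately simplified form; it is not the real streamflow.yml schema.

```python
# Simplified, hypothetical rendition of "workflow graph + declarative
# description of execution environments": each step is bound to a named
# deployment, and deployments describe heterogeneous targets.
workflow_bindings = {
    "steps": {
        "preprocess": {"deployment": "k8s-cluster"},
        "align":      {"deployment": "slurm-hpc"},
        "report":     {"deployment": "local"},
    },
    "deployments": {
        "k8s-cluster": {"type": "kubernetes", "config": {"namespace": "scrnaseq"}},
        "slurm-hpc":   {"type": "slurm", "config": {"host": "hpc.example.org"}},
        "local":       {"type": "docker", "config": {"image": "python:3.11"}},
    },
}

step = "align"
print(step, "runs on", workflow_bindings["steps"][step]["deployment"])
```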