Learning To Use Essential Tools


This project aims to research and develop a dynamic, reconfigurable tool that connects users to remote computing resources in support of educational tools for data science. With this tool, curricula and educational materials on essential tools for data science will be developed and disseminated.


Business, government, and science researchers are producing massive amounts of complex data. The availability of these huge datasets fuels a need for both data-driven analytics and a 21st-century workforce that can use data analytics to answer questions and solve problems. This collaborative project will develop a cloud-based virtual tool to train undergraduate students to use software tools essential to data science. The platform will make state-of-the-art computing resources, including both powerful data analysis tools and parallel hardware systems, more accessible to students and faculty, even if they are at institutions without locally available high-power computing systems. The project aims to help students develop critical workforce skills in data science. The project will also provide professional development opportunities to help faculty use data-analysis tools in their courses and research. The output of this project includes a cloud-based infrastructure in the form of a virtual science platform with related training modules. It will leverage an existing framework for building web applications to provide broad access to open-source, high-performance computing resources at the collaborating universities and through the NSF Extreme Science and Engineering Discovery Environment (XSEDE). The cloud-based platform will support both training of students and collaboration among students. The project will produce a data science curriculum targeted to undergraduate students. The curriculum will also be suitable for graduate students, post-doctoral researchers, and information technology professionals interested in data science. The project will deliver a full set of interactive documents and video tutorials on using and configuring the platform. The educational activities, accessed through the cloud-based platform, will use graphical, interactive, simulation-based, and experiential learning components to teach data science concepts and computing skills.


Through the project, students will have the opportunity to learn how to use powerful data science resources, enabling them to transform data-rich computer science and engineering problems into practical solutions. The project will deliver professional development for faculty at multiple institutions, helping them learn how to use data science in their classrooms and their own research. This project addresses national interests by making state-of-the-art computing resources more accessible to students, supporting their development of critical workforce skills.


Weijia Xu
Manager, Data Mining And Statistics

Ruizhu Huang
Research Associate

Rosie Gomez
Education & Outreach Manager


W. Xu, R. Huang, Y. Wang. A framework for reconfigurable web service on high performance computing resources. Submitted.

Y. Wang, R. Huang, W. Xu. On enabling social credential with traditional high performance computing resources. Submitted.

Funding Source

National Science Foundation (#1726816, #1726532)