10 Things to Know about New TACC Resources for Texas Researchers

Lonestar6, Corral4 and Longhorn power research across the state

    At the quarterly TACCSTER seminar in February 2021, TACC staff members Joe Allen, Nick Thorne, and Chris Jordan presented an overview of three new systems at TACC.

    Before we jump into the secret powers of the latest advanced computing systems at the Texas Advanced Computing Center (TACC), let's start with the basics.

    Through the University of Texas Research Cyberinfrastructure (UTRC) program, scholars and students at more than 30 Texas universities and research centers use computing resources and services at TACC that are among the most powerful in the world. Thousands of researchers use TACC to conduct groundbreaking science in areas such as wind energy, nanomaterials, and health.

    The University of Texas System launched the program in 2009, and it was later joined by Texas A&M System, Texas Tech, and the University of North Texas System.

    These institutions, as well as some long-standing academic research partners, contributed to the purchase and operations of TACC's latest machines, Lonestar6 and Corral4, which went into operation in late 2021. As part of the latest UTRC funding, Texas researchers can now also use Longhorn, an IBM/NVIDIA GPU system deployed in 2019.

    What follows are the basics about Lonestar6, Corral4, and Longhorn, as well as 10 features and capabilities that you may not be aware of.

    Lonestar6

    The new Lonestar6 system will support Texas university researchers and students.

    Lonestar6 is UTRC's primary compute resource. Composed of more than 800 Dell EMC PowerEdge C6525 servers with 3rd Generation AMD EPYC processors, it can perform about three quadrillion mathematical operations per second or, in high performance computing terminology, three petaFLOPS of peak performance.

    BUT DID YOU KNOW…?

    1.

    The bulk of the system (70%) is housed in four immersion cooling tanks from Green Revolution Cooling (GRC), providing greater density than could be achieved otherwise. (The remainder of the system is contained in 10 air-cooled racks nearby.)

    Each tank contains 21 2U chassis submerged in mineral oil with heat exchangers keeping the components and oil cool. The servers literally operate in the oil! It's wild.

    TACC's previous experience with GRC has shown that immersion cooling can improve power efficiency, with possible benefits to component failure rates.

    Cranes are used to lift the servers out of the mineral oil when needed. (And paper towels are available to keep administrators' hands clean.)

    2.

    Lonestar6 represents a departure for TACC in a number of ways. It is the first TACC system in more than a decade (since Ranger, circa 2008) to use AMD processors. It is also the first TACC system to deploy a file system other than Lustre (in Lonestar6's case, BeeGFS) and the first to use the new open-source Rocky Linux operating system.

    These departures show TACC's openness to new technologies, and allow Lonestar6 to serve as a testbed for future leadership-class systems at TACC.

    3.

    At present, Lonestar6 provides a small subset of nodes that each have two NVIDIA A100 GPUs. Depending on demand and usage patterns, other GPU configurations are possible, most likely more GPU nodes with fewer GPUs per node (32 nodes with one GPU each instead of 16 nodes with two GPUs each). Some experimental subsystems are also being tested with the intention of providing more variations in node configuration, for both GPU count and memory per node.

    TACC administrators monitor usage patterns on other TACC GPU systems to determine how they will ultimately set up Lonestar6 to best serve the research community.
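    To illustrate the trade-off between the two layouts mentioned above, here is a minimal sketch. The `GpuLayout` class is hypothetical, and the whole-node allocation rule is an assumption made for illustration, not a description of how Lonestar6 actually schedules jobs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuLayout:
    """Hypothetical GPU partition: `nodes` servers with `gpus_per_node` GPUs each."""
    nodes: int
    gpus_per_node: int

    @property
    def total_gpus(self) -> int:
        return self.nodes * self.gpus_per_node

    def concurrent_jobs(self, gpus_per_job: int) -> int:
        """Jobs that fit at once, ASSUMING each job is given one whole node."""
        if gpus_per_job > self.gpus_per_node:
            return 0  # this sketch ignores multi-node jobs
        return self.nodes


# The two layouts named in the article.
current = GpuLayout(nodes=16, gpus_per_node=2)
alternative = GpuLayout(nodes=32, gpus_per_node=1)

for label, layout in (("16 x 2", current), ("32 x 1", alternative)):
    print(label, layout.total_gpus, layout.concurrent_jobs(gpus_per_job=1))
```

    Both layouts hold the same number of GPUs. Under the whole-node assumption, the second layout lets twice as many single-GPU jobs run at once, which is one reason usage patterns matter when choosing between them.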

    Corral4

    Corral4 is TACC's primary high-performance data storage and archival system. An IBM Elastic Storage System 5000, Corral4 contains more than 40 petabytes (PB) of raw capacity (just under 40 PB of usable storage) and can support eight billion files.
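    For a sense of scale, the two limits can be combined into a rough average file size. This is a back-of-the-envelope sketch only: it rounds usable capacity up to 40 PB, assumes decimal petabytes, and the function name is made up for illustration.

```python
PB = 10**15  # decimal petabyte in bytes (an assumption; the article does not say)


def average_file_size_bytes(capacity_bytes: int, max_files: int) -> float:
    """Average size per file if capacity and file count were both exhausted."""
    return capacity_bytes / max_files


# "Just under 40 PB" usable, rounded up to 40 PB for simplicity.
avg = average_file_size_bytes(40 * PB, 8 * 10**9)
print(f"{avg / 10**6:.1f} MB per file")  # prints "5.0 MB per file"
```

    In other words, the file limit would only be reached before the capacity limit if the average file were smaller than about 5 MB.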

    BUT DID YOU KNOW…?

    4.

    Corral4 is intended for high-value data collections, including those that were costly to generate or have significant shared-use value. One example: data from the Arecibo Observatory, whose radio telescope collapsed in December 2020.

    In the wake of the disaster, five decades of irreplaceable digital records were relayed to TACC to store indefinitely and make available to the astronomy community.

    Corral4 also houses earthquake and hurricane data for the Natural Hazards Engineering Research Infrastructure project; genomic data from the Galaxy bioinformatics platform; millions of specimen and observation records from the Arctos community; and more than 200 other collections.

    5.

    The original Corral was installed in 2009, and data from that machine still resides on the newly deployed Corral4 system. In other words, the system is continuous: if you stored data in 2009, you can still retrieve it today and use it in the same way.

    6.

    Fundamentally, Corral4 is a big file system with a lot of storage, but it's also service oriented. The system uses SSH and GridFTP for high-speed data transfer. It provides NFS access to data, with hundreds of systems connected at any given time, and has a web publication mechanism for open data.

    Corral4 is available to anyone through a TACC allocation. University of Texas researchers can use up to 5 TB of storage at no cost.
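    As a rough illustration of an SSH-based transfer, the sketch below assembles an `rsync` command that copies a local directory to a remote storage host. The user name, host name, and paths are placeholders, not real Corral4 values; consult the TACC user guide for the actual endpoints and recommended tools.

```python
import shlex


def build_rsync_command(local_dir: str, user: str, host: str, remote_dir: str) -> list[str]:
    """Build (but do not run) an rsync-over-SSH command.

    The trailing slash on the source copies the directory's contents
    into remote_dir rather than creating a nested directory.
    """
    source = local_dir.rstrip("/") + "/"
    destination = f"{user}@{host}:{remote_dir}"
    return [
        "rsync",
        "-a",         # archive mode: recurse and preserve times/permissions
        "--partial",  # keep partially transferred files so a retry can resume
        "-e", "ssh",  # use SSH as the transport
        source,
        destination,
    ]


# All values below are hypothetical placeholders.
cmd = build_rsync_command("results", "jdoe", "storage.example.org", "/path/to/collection")
print(shlex.join(cmd))
```

    The function only returns the argument list. Passing it to `subprocess.run(cmd, check=True)` would perform the transfer once real values are substituted.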

    7.

    Recently, TACC added protected data services to the capabilities available on Corral4 and other systems.

    A portion of Corral4 is reserved for protected data, with a full NIST compliance stack, regular audits, at-rest encryption, and special policies that enable safe and secure analysis. This capability is allowing researchers to apply high performance computing and analytics to COVID-19 data, maternal mortality records, and detailed investigation of chronic pain.

    (Information on how to apply for this special category of Corral4 allocation can be found at: https://www.tacc.utexas.edu/protected-data-service.)

    Longhorn

    Longhorn is a TACC resource built in partnership with IBM to support GPU-powered, double-precision machine learning and deep learning workloads, as well as general purpose GPU calculations.

    Longhorn was originally available only to researchers with an allocation on TACC's Frontera supercomputer. UTRC funding in 2021 opened the powerful machine to Texas academic researchers.

    BUT DID YOU KNOW…?

    8.

    Machine learning and deep learning are two of Longhorn's main use cases. TACC maintains most of the latest ML/DL packages and APIs on Longhorn in custom, optimized Conda environments available through the module system.
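    Once an environment is active, a quick way to see which packages it provides is to ask Python directly. The sketch below is generic and does not depend on any TACC-specific module; the package names checked are just common examples, not a list of what Longhorn ships.

```python
import importlib.util


def available_packages(names: list[str]) -> dict[str, bool]:
    """Report which top-level packages can be found, without importing them."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Example package names only; substitute whatever your workflow needs.
report = available_packages(["torch", "tensorflow"])
for name, found in report.items():
    print(f"{name}: {'available' if found else 'not found'}")
```

    Because `find_spec` only locates a package rather than loading it, the check stays fast even for large frameworks.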

    9.

    Demand for GUI apps, including RStudio, Jupyter Notebooks, and Microsoft Visual Studio Code, is exploding among HPC users. RStudio and Jupyter Notebooks can be launched on Longhorn compute nodes and then accessed through a web browser. Visual Studio Code can be run directly on a Longhorn login node, giving researchers a familiar and powerful interface to their files and code.
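    One common, generic way to reach a notebook server running on a compute node is SSH local port forwarding through a login node. The sketch below only builds such a command. It is not a description of TACC's own launch mechanism, and every host name, user name, and port in it is a placeholder.

```python
import shlex


def build_tunnel_command(user: str, login_host: str, compute_node: str,
                         remote_port: int, local_port: int) -> list[str]:
    """Build (but do not run) an SSH command that forwards a local port
    to a service listening on a compute node, via the login node."""
    return [
        "ssh",
        "-N",  # do not run a remote command; forward ports only
        "-L", f"{local_port}:{compute_node}:{remote_port}",
        f"{user}@{login_host}",
    ]


# All values below are hypothetical placeholders.
tunnel = build_tunnel_command("jdoe", "login.example.org", "node001", 8888, 8888)
print(shlex.join(tunnel))
```

    With a tunnel like this open, pointing a browser at `http://localhost:8888` on the local machine would reach the forwarded service.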

    10.

    Do you need to scale your work beyond the normal queue limits or do you have massive data requirements? Longhorn may be the system for you. TACC admins are eager to work with researchers to help enable their larger-than-normal science endeavors. Reach out to them through the ticket system.

    To kick the tires and apply any of these computing resources to your research, visit the University of Texas Research Cyberinfrastructure Portal to apply for time.

    Also consider incorporating HPC into your curriculum. Visit https://tacc.github.io/TeachingWithTACC/ to get started.