The University of Texas at Dallas’ Cyberinfrastructure & Research Services Department was established in 2015 to support research and teaching on campus by offering a suite of cyberinfrastructure services to faculty, staff, and students.
Our mission is to facilitate research and education on campus by providing computing resources, support and training that are synergistic with national resources. The UTD CIRC department provides free services to faculty, staff and students that include:
- Access to High Performance Computing (HPC) resources
- Full system lifecycle support including design, purchase, installation, administration and consulting
- Software consulting including installation and debugging / optimization support
- Workflow optimization and parallelization assistance
- Cloud computing support for commercial cloud services
CIRC’s primary resource is Ganymede, a heterogeneous local High Performance Computing (HPC) cluster. Ganymede is a ~5,000-core cluster based on CentOS and OpenHPC. It has a 10 Gigabit Ethernet network and a QDR (40 Gbps) InfiniBand interconnect configured in a fat-tree topology. It has two distributed file systems: home directories are served via NFS over the 10 Gigabit Ethernet network, and the work file system is a 1.2 PB high-performance parallel file system (GPFS) accessible over the InfiniBand network. The GPFS file system uses DataDirect Networks (DDN) storage attached directly to InfiniBand as its back-end hardware. Compute nodes are all dual-processor, many-core systems spanning several Intel architectures, including Sandy Bridge, Haswell, Broadwell, Skylake, and Cascade Lake. Additionally, several nodes have NVIDIA Tesla accelerators based on the NVIDIA Pascal and Volta architectures.
In addition to Ganymede, CIRC provides system support for ~10 other small clusters belonging to individual research groups, as well as several research group file and compute servers. We have also recently started providing support for Linux desktops. Finally, in addition to supporting on-premises Linux clusters, servers, and workstations, we provide full support for cloud computing resources, including database and website support, via Amazon Web Services (AWS).
The Texas Advanced Computing Center (TACC) is an NSF center funded through XSEDE (Extreme Science and Engineering Discovery Environment) that provides computational resources to scientists and researchers nationwide. Additionally, it provides dedicated resources for University of Texas System schools. Through partnerships with the XSEDE program, and with TACC specifically, CIRC is able to assist our campus researchers in obtaining access to, and compute time on, TACC systems.
The first TACC system we suggest to our users is Lonestar5 (LS5). LS5 is made available to UT System schools through the University of Texas Research Cyberinfrastructure initiative. It is a Cray XC40 system featuring 1,252 compute nodes, each with two 12-core Intel® Xeon® processors, for a total of 30,048 compute cores; 2 large-memory compute nodes, each with 1 TB of memory; 8 large-memory compute nodes, each with 512 GB of memory; 16 nodes with NVIDIA K40 GPUs; and a 5 PB DataDirect Networks storage system. If your workloads are modest in size and use traditional CPUs for computation, LS5 might be the perfect fit for your research.
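As a rough planning aid, the LS5 figures above can be turned into a quick calculation. The sketch below only reproduces the quoted core total and estimates how many standard nodes a job of a given size would occupy. The function names are illustrative, and the assumption of one task per core is ours, not a TACC requirement.

```python
import math

# Figures quoted above for LS5 standard compute nodes.
LS5_NODES = 1252
SOCKETS_PER_NODE = 2
CORES_PER_SOCKET = 12


def cores_per_node():
    """Cores available on one standard LS5 compute node."""
    return SOCKETS_PER_NODE * CORES_PER_SOCKET


def total_cores(nodes=LS5_NODES):
    """Total compute cores across the given number of standard nodes."""
    return nodes * cores_per_node()


def nodes_needed(tasks, tasks_per_node=None):
    """Smallest number of nodes that can hold `tasks` tasks.

    Assumes one task per core unless `tasks_per_node` is given
    (a hypothetical parameter, useful for memory-bound jobs).
    """
    if tasks < 1:
        raise ValueError("tasks must be a positive integer")
    per_node = tasks_per_node or cores_per_node()
    return math.ceil(tasks / per_node)


if __name__ == "__main__":
    print(total_cores())      # 30048, matching the total quoted above
    print(nodes_needed(100))  # 5 nodes at 24 tasks per node
```

A job that needs more memory per task can pass a smaller `tasks_per_node`, which raises the node count accordingly.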
If, however, your work requires larger computational resources, another option is Stampede2. Stampede2 is an NSF-funded system featuring 4,200 Intel Knights Landing nodes, each with 68 cores, 96 GB of DDR RAM, and 16 GB of high-speed MCDRAM; 1,736 Intel Xeon Skylake nodes, each with 48 cores and 192 GB of RAM; a 100 Gb/s Intel Omni-Path network with a fat-tree topology employing six core switches; and two dedicated high-performance Lustre file systems with a storage capacity of 31 PB. Note that to run large jobs on Stampede2, users are usually required to submit a scaling study as part of the allocation request. LS5 is therefore a good place to start in order to demonstrate the need for Stampede2.
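A scaling study typically reports how run time changes as the same problem is run on more cores. The following is a minimal sketch of how strong-scaling speedup and parallel efficiency might be tabulated from timings collected on LS5. The timings shown are hypothetical placeholders, and the exact format an allocation request expects may differ.

```python
def strong_scaling(timings):
    """Compute speedup and efficiency relative to the smallest run.

    `timings` maps core count -> wall-clock seconds for the SAME
    problem size. Returns a list of (cores, speedup, efficiency)
    tuples sorted by core count.
    """
    if not timings:
        raise ValueError("at least one timing is required")
    base_cores = min(timings)
    base_time = timings[base_cores]
    rows = []
    for cores in sorted(timings):
        speedup = base_time / timings[cores]
        # Ideal speedup equals the ratio of core counts.
        efficiency = speedup / (cores / base_cores)
        rows.append((cores, speedup, efficiency))
    return rows


if __name__ == "__main__":
    # Hypothetical timings (seconds); replace with measured values.
    example = {24: 1000.0, 48: 520.0, 96: 280.0, 192: 170.0}
    for cores, speedup, efficiency in strong_scaling(example):
        print(f"{cores:>5} cores  speedup {speedup:5.2f}  "
              f"efficiency {efficiency:6.1%}")
```

Efficiency that stays close to 100% as the core count grows is the kind of evidence that supports a request for a larger system.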
For our deep learning users, TACC has a dedicated resource, Maverick2. Maverick2 is reserved for machine learning workloads and features 23 nodes with 4 NVIDIA GTX 1080 Ti GPUs each, 4 nodes with 2 NVIDIA V100 cards each, and 3 nodes with 2 NVIDIA P100 cards each.
To get access to any TACC resources, contact [email protected] to discuss your research and computing needs.