Description: Infrastructure Engineering & Lab Operations is engaged in the design and deployment of High-Performance Computing (HPC) platforms used for Machine Learning (ML), Artificial Intelligence (AI) research, Monte Carlo simulation, and Data Analysis/Data Analytics to accelerate the development of applications for government customers. We have a passion for excellence that is reflected in both the quality of our products and the services we provide our customers. Infrastructure Engineering & Lab Operations has successfully worked with HPC vendors that include Penguin Computing, Dell, Nvidia, Intel, Cisco, Univa, Veritas, and Mellanox to deliver:
• Linux based Beowulf clusters ranging from hundreds to thousands of CPU cores
• Tesla V100 & P100 GP-GPU-enabled compute systems for Deep Learning / Machine Learning and scientific applications
• Petabytes of storage
• High-bandwidth, low-latency networks
• Cloud & Hyper-Converged Computing
As an HPC Systems Administrator/Engineer, you will join a development team focused on infrastructure development and collaboration with industry and government partners. This position provides the opportunity to define, implement, productize, and deliver high performance computing resources used for developing a wide range of Department of Defense (DoD) applications. Interested candidates should have Cyber-Security/IA experience maintaining and supporting HPC platforms in classified environments, including applying vulnerability scans and STIGs, and should have worked in DevOps or DevSecOps settings. Additionally, this individual will serve as a trusted advisor, technical leader, and HPC subject matter expert for the organization and drive future growth capabilities from existing engagements.
Candidate is expected to demonstrate extensive knowledge and experience of Linux operating systems (RHEL), workload management systems, parallel file systems, networking and security.
Direct experience and demonstrated proficiency with multiple programming and scripting languages (e.g., Perl, Python, C, Fortran, etc.) preferred
Ability to maintain system software, utilizing debugging tools for problem isolation; will perform software builds, software upgrades, and patch installation as needed
Possess the organizational and analytical skills needed to effectively isolate both hardware and software problems and drive solutions through to conclusion
Excellent interpersonal, customer relations and problem management skills, with the ability to stay calm and professional under pressure while working to strict deadlines
Experience with project planning and management, process management, and team or project leadership preferred. Demonstrated ability to clearly document processes and procedures with a focus toward mentoring and knowledge sharing along with very good communication skills, both verbal and written
Infrastructure Engineering & Lab Operations is currently seeking an HPC Professional with Linux Systems Administration experience and a programming background to join our Development Systems Integration Group (DSI).
The Infrastructure Engineering & Lab Operations organization designs computing platforms, adapts new methodologies, and creates tools/scripts for improving the HPC User Experience. This team also designs, integrates and supports HPC cluster operations and maintains Cybersecurity posture by implementing the Risk Management Framework (RMF). Our HPC clusters incorporate Intel Broadwell and Skylake processors, NVIDIA Tesla GP-GPUs, parallel and clustered tiered storage, Univa distributed resource management, Fibre Channel, InfiniBand, Gigabit Ethernet switches, accelerated and Red Hat Enterprise Linux (RHEL) system software.
Candidate must be able to obtain & maintain a DoD Secret Security Clearance.
• Bachelor's degree in Computer Science, Data Science, Engineering or related fields with scientific computing experience; Master's degree highly desired
• 4+ years' experience in IT, with at least 2 of those in a fast-moving, heavily technology-dependent startup environment
• 4+ years of Red Hat Enterprise Linux (RHEL) administration
• 3+ years of HPC administration in a product engineering environment
• 3+ years' experience in clustered file systems [such as Veritas] or experience with parallel file systems [such as Lustre]
• High level of hands-on experience in managing, architecting and administering large CPU and GP-GPU based HPC platforms
• Experience in a design engineering environment working on tight schedules
• Experience performing Security patching across multiple Unix platforms
• Experience with hypervisors and virtualization technology such as KVM, Xen or ESX
• Experience deploying Cloud systems such as OpenStack, OpenShift, Nutanix or Eucalyptus within a DevOps or DevSecOps environment
• Experience with identity management technology such as Active Directory, IDM and LDAP
• Experience configuring, installing and troubleshooting Univa Grid Engine (preferred) or other job schedulers/resource managers
• Experience configuring and managing network-attached storage systems, such as RAID arrays or ZFS pools, high speed disk/SSD/NVMe systems, and storage networks
• Experience creating Linux Kickstart images to support the efficient deployment of system buildouts
• Demonstrated ability to manage the full stack (data-center rack equipment, server hardware, OS, network, and security) of multi-tenant Linux-based systems both individually and within a team environment
• Excellent collaboration and team-oriented skills as well as exceptional oral and written communication skills
• Experience scripting and automating tasks, using tools such as Python, Perl, and Bash
• Candidate must be able to obtain & maintain a DoD Secret Security Clearance.
• Master's or PhD degree in Computer Science, Data Science, Engineering or related fields with scientific computing experience
• Developing, optimizing, compiling, implementing, and testing multithreaded, multiprocessor performance-oriented software with Message Passing Interface, OpenMP, CUDA or other parallel processing frameworks
• Electromagnetics, fluid dynamics, multi-physics Finite Element Analysis, Monte Carlo Analysis, generative design, control theory, optimization, directed energy and/or other physics-based modeling and simulation
• Artificial Intelligence technologies, such as general machine learning algorithms and neural networks in a parallel HPC environment
• Interfacing, configuration, and optimization of HPC technologies such as parallel/distributed file systems [Lustre], high-speed interconnect fabrics [InfiniBand], and HPC batch scheduling software [Univa]
• Advanced knowledge of RHEL, including Security-Enhanced Linux [SELinux] as well as Multi-Level Security (MLS) tagging or labeling technologies
• Understand complex engineering and modeling principles to effectively communicate with the user community, understand their work, and translate requirements into HPC solutions
• Experience with systems automation tools such as Ansible or Puppet
• Hold a current Cybersecurity certification in Security+ and/or CISSP
Lockheed Martin is an Equal Opportunity/Affirmative Action Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, pregnancy, sexual orientation, gender identity, national origin, age, protected veteran status, or disability status. Join us at Lockheed Martin, where your mission is ours. Our customers tackle the hardest missions. Those that demand extraordinary amounts of courage, resilience and precision. They're dangerous. Critical. Sometimes they even provide an opportunity to change the world and save lives. Those are the missions we care about.
As a leading technology innovation company, Lockheed Martin's vast team works with partners around the world to bring proven performance to our customers' toughest challenges. Lockheed Martin has employees based in many states throughout the U.S., and internationally, with business locations in many nations and territories.
Experience Level: Hourly/Non-Exempt
Business Unit: RMS
Relocation Available: No
Career Area: Information Technology
Clearance Level: Secret
Type: Full-Time
Virtual Location: no
Work Schedule: TEMPO: 9X80A - Standard Fri to Fri (Flex & Rigid)
Shift: First