About this job

Job Description

The Analytic Data Engineering team at Gap Inc is responsible for building systems and frameworks that leverage the company’s heterogeneous data systems, coalescing and synthesizing them to serve its Advanced Analytics and Business Intelligence needs.

The Senior Data Engineer plays a key role in the group, evaluating and implementing tools and patterns to create consistent high-throughput pipelines for a variety of internal uses and for external business teams.

The role has a specific Analytics focus centered on Big Data technologies. It is tasked with using our Hadoop ecosystem (HDP 2.2) to unify data from our various retail and online systems into a cohesive view of the company’s omni-channel business patterns, implementing systems that give end users low-latency, high-granularity access to that information, and creating APIs and access patterns that influence analytic pipelines.

Examples of projects the Senior Data Engineer may lead or be involved in include:
  • Large-scale data ingestion and transformation on Hadoop, using general Hadoop primitives where necessary and evolving to cascaded pipelines as patterns are recognized (a minimal sketch of this kind of pipeline follows the list).

  • Creating tools, APIs, or domain-specific languages that can be deployed to facilitate consistent access to the company’s sales, promotions, inventory, and customer information. Building templates to demonstrate how these can be used across a variety of data domains, including reporting, analysis, and analytic workflows.

  • Implementing, testing, and proving out Big Data tools and patterns for analytic workloads – in-memory architectures, streaming workflows, and event-sourced and Lambda data architectures.
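
For illustration only, here is a minimal sketch of the first kind of project, written in Scala against the Spark DataFrame API (a plausible fit for the Java/Scala preference noted below, though the posting does not name the team’s framework). The paths, object name, and column names are hypothetical assumptions, not details from the posting.

```scala
// Hypothetical sketch: ingest raw sales events landed on HDFS, normalize
// keys and types, and aggregate daily sales by store for analytic access.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object DailySalesRollup {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("daily-sales-rollup")
      .getOrCreate()

    // Raw, heterogeneous source data on HDFS (hypothetical location).
    val rawSales = spark.read
      .option("header", "true")
      .csv("hdfs:///data/raw/sales/*.csv")

    // Light normalization: consistent types and keys across source systems.
    val sales = rawSales
      .withColumn("sale_date", to_date(col("sale_ts")))
      .withColumn("amount", col("amount").cast("double"))

    // Aggregate to an analysis-friendly grain.
    val dailyByStore = sales
      .groupBy(col("store_id"), col("sale_date"))
      .agg(sum("amount").as("net_sales"), count("*").as("txn_count"))

    // Persist in a columnar format for downstream BI and analytic consumers.
    dailyByStore.write
      .mode("overwrite")
      .partitionBy("sale_date")
      .parquet("hdfs:///data/curated/daily_sales_by_store")

    spark.stop()
  }
}
```

The same shape applies whether the pipeline is expressed with raw MapReduce primitives or a cascaded framework: land raw data, normalize keys and types, aggregate to an analysis-friendly grain, and persist for downstream consumers.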


Qualifications

Requirements:
  • The candidate should have a proven track record working with Big Data tools and constructs. Experience with high-performance in-memory architectures (Spark, Hana) a plus.

  • The candidate should be an expert in relational database technologies, including Teradata, Oracle, and MS SQL Server, with experience in data architecture, metadata management, and semantic schemas. An understanding of traditional ETL tools and concepts is expected, along with current knowledge of emerging high-throughput data frameworks.


  • Familiarity with relational data warehouse methodologies is desired, but the emphasis is that the job is not to build a relational warehouse, but rather to leverage existing data in high-value analytic flows.

  • The candidate should be highly skilled in data inspection, analysis, cross-referencing, and debugging. As we are synthesizing information across many different data systems – often with different design and keying patterns – the candidate must be adept at inferring how they all stitch together.

  • Formal programming experience in Java or an equivalent language preferred. Scala and Python programming a plus.

  • Statistical and Data Science experience a plus, as is experience working with retail problem sets.

  • An inherent ability and willingness to learn new and complex concepts quickly, including platforms, languages, syntaxes, and technologies.

  • Coordination with and/or management of global teams may be required.


The ideal candidate will likely hold a BS/MS in Computer Science and have significant experience working with data systems or on high-throughput server-side programming projects. Degrees in other technical or scientific disciplines will also be considered.

The emphasis is on working with data at large scale – terabytes at a minimum. As such, the candidate will also need to demonstrate familiarity with computer architecture in order to manage tradeoffs between memory, disk, and network efficiency when executing complex data workstreams in tight SLA environments.
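
As a hedged illustration of what those tradeoffs can look like in practice, here is a small Scala/Spark sketch that balances executor memory, shuffle parallelism, and caching for an SLA-bound join. The configuration values, object name, and paths are hypothetical examples, not recommendations from this posting.

```scala
// Illustrative only: trading memory, disk, and network when a wide join
// must finish inside a tight SLA. Values and paths are hypothetical.
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

object SlaTunedJoin {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("sla-tuned-join")
      // Memory vs. disk: give executors enough heap to keep shuffle data
      // and cached inputs in memory instead of spilling to disk.
      .config("spark.executor.memory", "16g")
      // Network vs. parallelism: more shuffle partitions mean smaller,
      // better-balanced transfers at the cost of more tasks.
      .config("spark.sql.shuffle.partitions", "2048")
      .getOrCreate()

    val orders    = spark.read.parquet("hdfs:///data/curated/orders")    // hypothetical path
    val inventory = spark.read.parquet("hdfs:///data/curated/inventory") // hypothetical path

    // Cache the smaller side in memory, spilling to disk only if needed,
    // so repeated passes do not re-read it over the network.
    val inv = inventory.persist(StorageLevel.MEMORY_AND_DISK)

    val joined = orders.join(inv, Seq("sku_id"))
    joined.write.mode("overwrite").parquet("hdfs:///data/curated/orders_with_inventory")

    spark.stop()
  }
}
```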