About the Company:
Our client has been recognized as one of Fortune Magazine's 100 Best Companies to Work For, an honor earned for their focus on being a great place to work and on delivering value for their customers and clients. They were also the only financial services company on Fortune's inaugural "Best Big Companies to Work For" list, which recognized seven companies with more than 100,000 U.S.-based employees that passed the Great Place to Work Certification bar.
The Big Data Hadoop Architect / Lead position will be part of the Insight Core Hadoop platform team within Global Banking and Markets. The role is expected to lead deliverables in platform design and configuration, capacity planning, incident management, monitoring, and business continuity planning.
Responsible for developing, enhancing, modifying, and/or maintaining a multi-tenant big data platform
Functionally lead a team of developers located onshore and offshore, and collaborate with Product Owners, Quants, and other technology teams to deliver data/applications/tools
Work closely with the Business Stakeholders, Management Team, Development Teams, Infrastructure Management and support partners
Use your in-depth knowledge of development tools and languages in the design and development of applications that meet complex business requirements
Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc
Design and implement scalable data platforms for our customer facing services
Deploy and scale Hadoop infrastructure
Hadoop / HDFS maintenance and operations
Data cluster monitoring and troubleshooting
Hadoop capacity planning
OS integration and application installation
Partner with program management, network engineering, site reliability operations, and other related groups
Willingness to participate in a 24x7 on-call rotation for escalations
Bachelor’s Degree in Information/Computer Science or related field OR equivalent professional experience
Deep understanding of UNIX and network fundamentals
Expertise with Hadoop and its ecosystem (Hive, Pig, Spark, HDFS, HBase, Oozie, Sqoop, Flume, ZooKeeper, Kerberos, Sentry, Impala, etc.)
Experience designing multi-tenant, containerized Hadoop architectures that manage and share memory and CPU across different lines of business (LOBs)
5+ years managing clustered services, secure distributed systems, and production data stores
3+ years of experience administering and operating Hadoop clusters
Cloudera CDH4/CDH5 cluster management and capacity planning experience
Ability to rapidly learn new software languages, frameworks, and APIs
Experience scripting for automation and config management (Chef, Puppet)
Multi-datacenter, multi-tenant deployment experience a plus
Strong troubleshooting skills with exposure to large scale production systems
Hands on development experience and high proficiency in Java / Python
Skilled in data analysis, profiling, data quality, and data processing, with the ability to create visualizations
Experience working with Agile Methodology
Strong SQL knowledge and experience working with relational databases and query authoring, as well as working familiarity with a variety of databases (Hive, Impala; Kudu a plus)
Experience building and optimizing 'big data' pipelines, architectures, and data sets.
Experience performing root cause analysis on internal and external data and processes to answer specific business questions and identify opportunities for improvement.
Strong analytic skills related to working with unstructured datasets.
Experience building processes supporting data transformation, data structures, metadata, dependency management, and workload management.
Experience supporting and working with cross-functional teams in a dynamic environment.