Big Data Engineer
  • Experience: 3 - 6 Yrs
  • Location: Pune, New York, San Francisco

Company Summary

At Cuelogic, we develop software, intelligence, and IoT systems for leading-edge startups across the globe as well as for Fortune 500 enterprises looking to compete with unicorns.

At Cuelogic, learning is part of the culture: we don't force our engineers to take training; instead, we encourage them to learn, unlearn, and relearn every day. This culture creates full-stack engineers who can contribute above and beyond the usual.

Technology changes every day. Cuelogic funds a dedicated Center of Excellence (COE) team that keeps track of technology advancements and integrates them within the organization.

The COE team acts as a support ecosystem for the developers working on each product, serving as their technology mentors, buddies, and partners. Whenever a developer is blocked, needs help, wants to explore technology options, or is simply indisposed, the COE team steps in and works with that team to provide a 360-degree perspective.

Join us and kick-start a professional journey like no other.

Technical / Process Skills

  • Work with Apache Spark, HDFS, AWS EMR, Spark Streaming, GraphX, MLlib, Cassandra, Elasticsearch, YARN, Hadoop, Hive, AWS cloud services, and SQL.
  • Work with machine learning / deep learning libraries (MLlib, TensorFlow, PyTorch) to implement solutions that solve or automate real-world tasks such as prediction, image processing, object detection, natural language processing, anomaly detection, text-to-speech, and more.
  • Build compact models that can run on edge devices, such as IoT hardware, to perform edge computing and provide predictions locally.
  • Design, implement, and automate the deployment of distributed systems for collecting and processing data from large data sources.
  • Write ETL and ELT jobs and Spark/Hadoop jobs that perform computation on large-scale datasets (see the batch ETL sketch after this list).
  • Design streaming applications using Apache Spark and Apache Kafka for real-time computation (see the streaming sketch after this list).
  • Design complex data models and schemas for structured and semi-structured datasets in SQL and NoSQL environments.
  • Deploy and test solutions on cloud platforms such as Amazon EMR, Google Cloud Dataproc, and Google Cloud Dataflow.
  • Explore and analyze data using visualization tools such as Tableau and Qlik.
  • Write unit tests, perform code reviews, and collaborate with the team to follow coding best practices.
  • Explore big data technologies to design new product architectures and build POCs for them.
  • Proficiency in at least one of Scala, Java, or Python.
  • Minimum of 1-2 years of experience with Apache Spark.
  • Experience working with streaming environments (Spark Streaming / Flink).
  • Experience with the Hadoop ecosystem (Hadoop MapReduce, HDFS, Pig, Sqoop, Impala, Hive, Presto).
  • Solid experience using the Spark and Hadoop frameworks on Amazon EMR.
  • Strong knowledge of data modeling and design principles in SQL and NoSQL environments.
  • Strong experience building and operating ETL and ELT pipelines and their components.
  • Experience or familiarity with visualization tools such as Tableau, Qlik, or Grafana.
  • Strong experience developing REST APIs and consuming data from external web APIs.
  • Comfortable with source control systems (GitHub) and Linux environments.
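To give a flavor of the day-to-day work, here is a minimal sketch of a batch ETL job in Scala on Spark. The S3 paths, column names, and aggregation are hypothetical placeholders chosen for illustration, not part of any actual Cuelogic codebase:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object DailyRevenueEtl {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("DailyRevenueEtl").getOrCreate()

        // Extract: read raw order records from a data lake path (path and schema are placeholders).
        val orders = spark.read.parquet("s3://example-bucket/raw/orders/")

        // Transform: keep completed orders and aggregate revenue per day.
        val dailyRevenue = orders
          .filter(col("status") === "COMPLETED")
          .groupBy(to_date(col("created_at")).as("order_date"))
          .agg(sum(col("amount")).as("revenue"))

        // Load: write the result back out, partitioned by date for downstream consumers.
        dailyRevenue.write
          .mode("overwrite")
          .partitionBy("order_date")
          .parquet("s3://example-bucket/curated/daily_revenue/")

        spark.stop()
      }
    }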
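And a minimal sketch of the kind of streaming job described above, using Spark Structured Streaming to consume from Kafka and maintain windowed counts. The broker address, topic name, and window sizes are hypothetical, and the job assumes the spark-sql-kafka connector is on the classpath:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object EventCounts {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("EventCounts").getOrCreate()
        import spark.implicits._

        // Source: subscribe to a Kafka topic (broker and topic names are placeholders).
        val events = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092")
          .option("subscribe", "events")
          .load()
          .selectExpr("CAST(value AS STRING) AS body", "timestamp")

        // Compute: count events per 1-minute window, tolerating 2 minutes of late data.
        val counts = events
          .withWatermark("timestamp", "2 minutes")
          .groupBy(window($"timestamp", "1 minute"))
          .count()

        // Sink: print running counts to the console; a production job would target
        // a store such as Cassandra or Elasticsearch instead.
        counts.writeStream
          .outputMode("update")
          .format("console")
          .start()
          .awaitTermination()
      }
    }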

Added Advantage:

  • Experience with any machine learning / deep learning platform (Spark ML, scikit-learn, DL4J, TensorFlow); see the Spark ML sketch after this list.
  • Experience processing text with natural language processing libraries such as CoreNLP, OpenNLP, or spaCy.
  • Experience with interactive notebooks such as Jupyter and Zeppelin.
  • Experience with infrastructure tools such as Docker, Kubernetes, and Mesos.
  • Experience with graph databases.
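As a taste of the Spark ML work mentioned above, here is a minimal sketch of a classification pipeline in Scala. The dataset path, feature columns, and label column are hypothetical:

    import org.apache.spark.ml.Pipeline
    import org.apache.spark.ml.classification.LogisticRegression
    import org.apache.spark.ml.feature.VectorAssembler
    import org.apache.spark.sql.SparkSession

    object ChurnModel {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("ChurnModel").getOrCreate()

        // Training data with numeric features and a 0/1 label column (path and schema are placeholders).
        val training = spark.read.parquet("s3://example-bucket/features/churn/")

        // Assemble raw columns into the single vector column Spark ML estimators expect.
        val assembler = new VectorAssembler()
          .setInputCols(Array("tenure_days", "monthly_spend", "support_tickets"))
          .setOutputCol("features")

        val lr = new LogisticRegression()
          .setLabelCol("churned")
          .setFeaturesCol("features")

        // Chain feature assembly and the classifier into a single reusable pipeline.
        val model = new Pipeline().setStages(Array(assembler, lr)).fit(training)

        model.transform(training).select("churned", "prediction").show(5)
        spark.stop()
      }
    }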

Apply for this job