What You’ll Do
- Liaising with coworkers and clients to clarify the requirements for each task.
- Designing and building infrastructure that allows big data to be accessed and analyzed.
- Refining existing frameworks to improve their performance.
- Testing these systems to ensure that they are fit for purpose.
- Preparing raw data for manipulation by data scientists.
- Detecting and correcting errors in your work.
- Ensuring that your work remains backed up and readily accessible to relevant coworkers.
- Remaining up to date with industry standards and technological advancements that will improve the quality of your output.
Requirements
- Very strong SQL (deep competency required; most critical skill)
- PySpark
- Python
- Databricks (preferred; not mandatory if SQL is strong)
- AWS exposure (Databricks runs on AWS)
- Ability to do deep data exploration and attribute-level analysis
- Good analytical aptitude and critical thinking
- Good communication skills (the interview panel will assess this)
Nice-to-Have
- Shell scripting