Databricks Data Architect

Description

Our client is building a unified Master Data Management (MDM) platform that consolidates data from many applications into a single source of truth. An AI layer is being built on top of this platform, either as a separate layer over the unified data or embedded directly into the ETL processes, to deliver clean, connected, and enriched data. The project domain covers Occupational Health & Safety, Incident Management, Risk Management, and global regulatory frameworks. The platform must provide trustworthy, connected, and explainable data for incident management, risk assessment, and compliance with global regulatory requirements.

This is a hands-on architecture role. The person will own the platform design, build reference pipelines and data models, and define AI/RAG patterns together with the engineering team.

Tech stack: Databricks (Spark, PySpark, SQL, Delta Lake, Unity Catalog, Lakeflow), ETL/ELT, knowledge graphs / GraphFrames, semantic layer (Unity Catalog metric views), AI Search / Vector Search, RAG / LLM tooling, cloud infrastructure (AWS / Azure / GCP).

Office / Remote, Poland

Requirements

  • Strong production experience with Databricks Lakehouse architecture, including Spark, PySpark, SQL, Delta Lake, Unity Catalog, and workflow orchestration
  • Hands-on experience designing and building ETL/ELT pipelines for batch and incremental ingestion, cleansing, normalization, deduplication, and enrichment
  • Practical experience with MDM: golden records, survivorship/merge rules, trust ranking, identity resolution, duplicate detection, SCD, and exception workflows
  • Strong data modeling skills for analytical, operational, and semantic consumption patterns
  • Experience designing a semantic layer with shared business definitions, governed metrics, reusable dimensions, and consistent entity definitions
  • Experience with data quality and observability: pipeline SLAs, schema drift, CDC, data contracts, dead-letter handling, and source-to-master reconciliation
  • Experience implementing data governance and security: Unity Catalog lineage, RBAC/ABAC, row/column-level security, PII handling, and regulatory traceability
  • Ability to translate business requirements from product, compliance, and engineering stakeholders into scalable data architecture

Nice to have

  • Experience with Databricks Lakeflow Connect, Lakeflow Spark Declarative Pipelines, and Lakeflow Jobs
  • Experience with Unity Catalog metric views or comparable semantic-layer technologies
  • Experience with knowledge graphs, graph analytics (e.g. GraphFrames), or graph-based entity resolution - linking people, organizations, locations, incidents, hazards, controls, regulations, assets, and corrective actions
  • Experience building AI/RAG solutions over enterprise data using AI Search / Vector Search, embeddings, metadata filtering, retrieval evaluation, and source-grounded generation with citations
  • Experience with ML-based data enrichment, classification, anomaly detection, or entity matching
  • Experience in regulated domains such as occupational health and safety, incident management, risk, compliance, ESG, insurance, healthcare, or industrial operations

Responsibilities

  • Own the end-to-end architecture of a Databricks-based MDM platform for occupational health, safety, incident, risk, and regulatory data
  • Design ingestion and transformation patterns using Databricks, Spark, PySpark, SQL, Delta Lake, Unity Catalog, and Lakeflow where appropriate
  • Define canonical data models, golden-record logic, entity-resolution rules, and survivorship strategies across heterogeneous source systems
  • Build a semantic layer that provides consistent definitions for incidents, organizations, locations, hazards, controls, risks, regulations, corrective actions, and compliance metrics
  • Design graph-based relationship models for linking entities across systems and enriching downstream analytics and AI use cases
  • Architect AI/RAG capabilities for semantic search, regulatory lookup, incident enrichment, data validation, and source-grounded answers over governed enterprise data
  • Embed data quality, lineage, governance, access control, auditability, and monitoring into the platform from the start
  • Partner with product, engineering, compliance, and analytics teams to convert domain requirements into scalable architecture and implementation patterns

We offer

  • Projects for clients such as PayPal, Wargaming, Xerox, Philips, Adidas, and Toyota
  • Competitive compensation that depends on your qualifications and skills
  • Career development system with clear skill qualifications
  • Flexible working hours aligned to your schedule
  • Options to work remotely
  • Corporate medical insurance covering services at private and public medical centers
  • Online English courses
  • Corporate parties and events for employees and their children
  • Internal conferences, workshops and meetups for learning and experience sharing
  • Gym membership compensation
  • 5 days of paid sick leave per year with no obligation to submit a sick-leave certificate

Apply for Databricks Data Architect

Apply by filling in the application form or sending your CV to hh@itransition.com

By clicking the "Agree & send" button, I give my consent to Itransition Group to process my personal data in accordance with the Recruitment Privacy Statement for the purpose of potential employment, internship, and future career opportunities.

The total size of attachments should not exceed 10 MB.

Allowed types: jpg, jpeg, png, gif, doc, docx, ppt, pptx, pdf, txt, rtf, odt, ods, odg, odp, xls, xlsx, vcf, vcard, key, rar, zip, 7z, gz, gzip, tar