Principal Software Engineer - Data Hub
About the team
HubSpot’s Data Hub helps RevOps, marketing, sales, and customer teams turn fragmented data into actionable intelligence. We unify data across channels and tools, improve data quality, and activate it inside HubSpot so teams can run AI-powered demand generation, smarter campaigns and sales motions, agentic automations, and trustworthy reporting, all without needing to be data experts.
We’re a product engineering team at the intersection of data engineering, ML, applied AI, and go-to-market, and we care as much about reliability, cost, and scale as we do about time-to-value and usability for marketers and sales reps.
About the role
We’re looking for a Principal Software Engineer to lead the next evolution of Data Hub as the backbone for data-driven demand generation.
In this role, you’ll:
- Own core pieces of our data lake and analytics stack (e.g., Iceberg, Spark, batch and streaming pipelines) that power demand gen, segmentation, and scoring at scale.
- Design and evolve data systems that balance cost, latency, data freshness, and reliability, making explicit tradeoffs using concepts like the CAP theorem, efficient partitioning, and storage layout.
- Partner closely with PM, product analytics, and GTM leaders to shape commercially meaningful solutions: better lead scoring, funnel visibility, audience building, and campaign attribution for marketers and sales.
- Help make Data Hub an AI‑agent‑forward platform, where curated, evergreen datasets automatically feed AI agents and reporting surfaces rather than requiring manual stitching or ad-hoc pipelines.
Principal Engineers at HubSpot are expected to be hands-on builders, strong partners to product and design, and multipliers for the broader engineering organization.
Key expectations:
Technical skills & domain expertise
- Data engineering & storage: Deep experience building large‑scale data systems with Apache Spark and modern table formats like Apache Iceberg, including efficient partitioning, clustering, and file layout for both heavy ingestion and low‑latency reads.
- Distributed systems & tradeoffs: Applies distributed systems principles and the CAP theorem pragmatically to design fault‑tolerant, horizontally scalable services that balance availability, consistency, latency, and cost where it matters.
- Business outcomes: Can turn ambiguous business goals into clear data models, contracts, and SLAs across multiple storage and compute layers (e.g., Iceberg, warehouses, logs, CRM stores).
As a Principal Engineer, you will:
- Own platform-scale outcomes: Influence technical direction across the Data Hub product line and shape the architecture for unified profiles, segmentation, and datasets that other teams can build on.
- Be a high-leverage, hands-on builder: Write code and build systems while leading end-to-end delivery of high-impact, multi-quarter initiatives, setting standards for reliability, observability, testing, and incident response.
- Lead through architecture and influence: Define reusable patterns for ingestion, transformation, quality, sync, and observability, and mentor senior engineers and tech leads.
Software development excellence & AI
- Use AI code agents: Actively use AI-assisted development tools to speed iteration, reduce toil (e.g., scaffolding, tests, refactors), and improve code quality, while defining best practices for a human‑in‑the‑loop approach.
- Champion incremental, outcome-focused delivery: Break down big, ambiguous problems into incremental milestones that deliver value early and often, balancing long-term platform bets with clear business impact (ARR, adoption, usage, efficiency).
- Raise the bar on engineering practice