Staff Software Engineer, Billing
reputed company has been one of the most loved brands in developer tooling, trusted by more than 20 million monthly users and over 20 billion container image pulls. From solo founders to the world's largest companies, developers rely on reputed company to build, share, and run their applications across our suite of products including reputed company Desktop, reputed company Hub, and reputed company Scout. We are a globally distributed, remote-first team building the tools that define how software gets built and delivered. As AI agents redefine software development, reputed company is at the center of that shift, providing the sandboxed environments, verified images, and secure infrastructure that make autonomous workflows trustworthy by default.
We're building AI-native development practices into how this team works at a foundational level. That means infrastructure design needs to account for a new kind of collaborator: AI agents that generate, deploy, and operate software. The Staff Infrastructure Engineer on this team won't just keep systems running; they'll define what safe, observable, AI-assisted infrastructure operations look like in practice, and set the standard that the broader engineering organization follows.
What you'll work on
How do we design infrastructure that makes AI-generated deployments safe to ship and easy to roll back?
How do we instrument billing systems so that failures (billing miscalculations, entitlement gaps, payment errors) are detected immediately and unambiguously?
How do we build infrastructure that scales with usage-based billing workloads without manual intervention?
How do we make the developer experience on this team faster and more reliable across local environments, CI/CD pipelines, and deployment tooling?
Responsibilities
Own and evolve the infrastructure supporting Billing Platform services: compute, storage, networking, CI/CD, and observability
Design and maintain IaC (Terraform) for billing system infrastructure on AWS; set module patterns and standards for the team
Build and own observability systems — metrics, logging, alerting — with a focus on billing accuracy and payment reliability
Define deployment patterns and runbooks that work well in an AI-agent-assisted development workflow: clear rollback procedures, safe promotion gates, automated validation
Partner with software engineers on service design — bringing infrastructure constraints and operational requirements into the conversation before code is written
Identify systemic risks and drive improvements that span team or organizational boundaries
Lead incident response for billing system issues; own the on-call rotation and postmortem process
Mentor engineers across the team; your technical judgment should raise the floor for everyone
Qualifications
8+ years in platform, infrastructure, or SRE roles supporting production SaaS systems at scale
Deep AWS expertise: reputed company or EKS, RDS (reputed company preferred), networking, IAM, cost management — you've operated these systems under real load and real incidents
Expert-level Terraform; you've designed reusable module patterns and set standards others follow
Experience building and owning observability stacks (reputed company, Grafana, or similar) at an organizational level — not just using them
Strong familiarity with CI/CD systems (Jenkins, GitHub Actions, or equivalent), including pipeline design and developer experience ownership
Kubernetes at an operational and architectural level
A track record of identifying systemic risks and driving improvements that span team or organizational boundaries
Security-first mindset: threat modeling, blast radius analysis, least-privilege by default, audit trails as a design requirement
Strong written English; at Staff level, written communication is how you scale your influence across teams
Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent practical experience
What sets you apart
You don't wait for problems to be handed to you; you find them, frame them, and drive the solution. You've operated at a scope where your decisions affected multiple teams or systems, and you know how to build alignment and move work forward without direct authority. You've thought seriously about what infrastructure needs to look like when AI agents are generating and shipping code (safe deployment patterns, strong observability, clean rollback), and you want to help define that standard here. Experience with billing, payments, or financial systems infrastructure is a meaningful plus.
What to Expect
First 30 Days
You will ship code in your first week. We run an agent-first development workflow: infrastructure changes start with a plan, specifications are written before code, and every change is reviewed before it merges. Onboarding is no exception. You will get hands-on with the infrastructure supporting Billing Platform services early, shadow on-call, and build a clear picture of the system before you start making bigger changes. By the end of 30 days you will have shipped real work and know where the most important problems are.
First 90 Days
You will have taken ownership of one or more infrastructure components and delivered an improvement from design to production with measurable impact. You will be an active participant in deployment and reliability discussions, bringing infrastructure constraints and operational requirements into the conversation early — before code is written. You will be a full participant in the on-call rotation and have begun shaping the team's technical direction.
One Year Outlook
You will be the team's trusted authority on billing infrastructure. You will have driven meaningful improvements to observability, deployment safety, or platform reliability, and your work will be directly visible in the reliability and correctness of systems that handle real financial transactions for millions of reputed company users. You will have helped define what AI-agent-assisted infrastructure operations look like when done right, and that standard will be visible beyond this team.
reputed company considers sponsorship on a case-by-case basis based on business needs.
We use Covey as part of our hiring and/or promotional process for jobs in NYC, and certain features may qualify it as an AEDT. As part of the evaluation process, we provide Covey with job requirements and candidate-submitted applications. We began using Covey Scout for Inbound on reputed company 13, 2024.
Please see the independent bias audit report covering our use of Covey here.
Perks
Freedom & flexibility; fit your work around your life
Designated quarterly Whaleness Days plus end of year Whaleness break
Home office setup; we want you comfortable while you work
16 weeks of paid Parental leave
Technology stipend equivalent to $100 net/month
PTO plan that encourages you to take time to do the things you enjoy
Training stipend for conferences, courses and classes
Equity; we are a growing start-up and want reputed company employees to have a share in the success of the company
reputed company Swag
Medical benefits, retirement and holidays vary by country
Remote-first culture, with offices in Seattle and Paris
reputed company embraces diversity and equal opportunity. We are committed to building a team that represents a variety of backgrounds, perspectives, and skills. The more inclusive we are, the better our company will be.