Able

AI Senior Engineer (Vision)

Engineering | Full-time | Remote, LATAM

Salary: Not listed
Work type: Remote
Job type: Full-time

About the role

What you’ll be doing

We are seeking someone who enjoys working at the cutting edge where Computer Vision meets Logic. You will be responsible for the "eyes" and the "brain" of our system: extracting complex data from visual documents and then orchestrating how that data is used by Large Language Models.

In short, someone who likes:

Unlocking Visual Data: Building pipelines that can "read" complex documents and understand layout, charts, and visual context using Vision-Language Models (GPT-4V, Claude 3.5) and Layout Analysis.
Orchestrating Intelligence: Owning the application logic layer. You will use LangChain or LangGraph to build the agents and chains that query our data, reason about it, and generate responses.
Native PDF Handling: Handling the messy reality of PDF processing (PyMuPDF, layout parsing) to preserve structure before the AI even sees it.
Prompt Engineering & Logic: Crafting complex prompts and control flows to ensure models interpret financial charts and layouts accurately without hallucinating.
Cost & Scale: Applying a cost-optimization mindset (batch processing, model selection) to ensure our vision and orchestration layers are economically viable.

What we’re looking for

We want to work with people who are passionate about collaborating with their teams and who build software while nurturing inclusive, respectful relationships with their coworkers. We value people who are open about their shortcomings and about what they do not yet know, and who remain eager to keep growing and closing those gaps.

Ideally, they would also have:

LLM Orchestration (Must Have): Deep experience with LangChain, LangGraph, or similar frameworks. You know how to manage context windows, tool calling, and agentic workflows.
Multimodal AI Experience: Hands-on experience integrating state-of-the-art vision models (GPT-4V, Claude 3.5 Sonnet) and embedding models (CLIP).
Document Intelligence Specialist: Familiarity with specialized models (e.g., Donut, Pix2Struct) and tools like Unstructured.io or Docling.
PDF Processing Mastery: Command of tools like PyMuPDF or pdfplumber for native element extraction.
Python ML Stack: Strong proficiency in PyTorch or TensorFlow.

Nice-to-Have:

Fine-Tuning: Experience fine-tuning vision or language models, specifically to improve accuracy on domain-specific artifacts like financial charts or tables.
Domain Knowledge: Prior experience handling documents in the Real Estate domain.
