AI Strategy for a 10-Person Business: Where to Start

The owner of a ten-person accounting firm decided it was time to "use AI." She bought three subscriptions, signed up for a chatbot, and asked her team to "try things out." Six months later, two people used one tool occasionally, the other subscriptions went untouched, and no one could say whether anything had improved. The firm had an AI spend but not an AI strategy.

Building a real AI strategy for a small business does not require a dedicated data science team or a six-figure consulting engagement. It requires a deliberate sequence: surface the work that consumes the most repetitive effort, run a focused pilot on exactly one workflow, measure what actually changed, and only then decide whether to expand. That sequence is short enough to execute in a quarter, concrete enough to defend to skeptical team members, and resilient enough to survive the inevitable overhype.


Step 1: Find the Repetitive Work That Is Draining Your Team

Before touching any tool, spend two weeks doing a light time audit. Ask each person on the team to flag tasks they do more than twice a week that feel mechanical — work where they are essentially applying the same judgment or the same format to new inputs every time. Common examples include: drafting outbound emails or proposals from a standard template, summarizing meeting notes or client calls, converting raw data into formatted reports, answering the same intake questions from prospects, and categorizing or routing incoming requests.

The goal is not to build a comprehensive process map. It is to produce a short list — ideally five to eight tasks — ranked by two simple criteria: how much total time does the task consume per week across the whole team, and how well-defined is the output? Tasks that are high on both dimensions are the strongest AI candidates. A task that takes eight hours a week but produces output that looks different every time is harder to automate than a task that takes four hours a week and always produces the same kind of document.
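As a rough illustration, the two-criteria ranking can be sketched in a few lines of Python. The 1-to-5 definition scale, the multiplication of the two criteria, and the sample tasks are all assumptions made for this example, not a prescribed scoring method.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    hours_per_week: float  # total across the whole team
    definition: int        # assumed scale: 1 (output differs every time) to 5 (always the same kind of document)

def rank_candidates(tasks):
    """Return tasks sorted so the strongest AI candidates come first."""
    # Multiplying the criteria rewards tasks that score high on both.
    return sorted(tasks, key=lambda t: t.hours_per_week * t.definition, reverse=True)

# Hypothetical audit results
audit = [
    Task("Custom client proposals", 8.0, 2),
    Task("Weekly status emails", 4.0, 5),
    Task("Ticket categorization", 3.0, 4),
]

ranked = rank_candidates(audit)
for t in ranked:
    print(f"{t.name}: score {t.hours_per_week * t.definition:.0f}")
```

With these sample numbers, the four-hour task with a consistent output outranks the eight-hour task whose output varies, which is the trade-off described above.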

Pay particular attention to tasks that sit at handoff points — work that one person does to pass something to another. These bottlenecks are often invisible because each individual only sees their slice, but the cumulative delay is substantial. A five-minute formatting job that sits in a queue for two days before it gets done is a far better AI candidate than a technically harder task that someone completes immediately.

Once the list exists, apply one more filter: which tasks have clear success criteria? You will need to measure improvement, and that requires knowing what "better" looks like before you start. If the task is "draft a weekly client status email," better means less time spent drafting and fewer revision rounds. If the task is "categorize inbound support tickets," better means consistent categorization and faster first response. Tasks without measurable success criteria are not ready for a pilot — they are ready for more definition.

Step 2: Pick One Workflow and Run a Measured Pilot

With a ranked list in hand, choose one task and commit to it for four to six weeks. The selection rule is not "most impactful"; it is "most likely to produce a clean signal." A clean signal comes from a workflow that is high-volume (enough repetitions to generate meaningful data), well-defined (a person can explain the exact inputs and the expected output in two sentences), and not so business-critical that a stumble causes serious harm. Customer-facing communications that are reviewed before sending, internal reports, and intake summaries all fit this profile well.

Establish a baseline before the pilot starts. Measure actual time, not estimated time: have the person responsible log how long the task takes each time they do it for the two weeks before the pilot. Actual time is almost always different from perceived time, and the discrepancy matters when you report results. If the task involves quality (accuracy of categorization, number of revision rounds on a draft), record those numbers too.
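A minimal sketch of a baseline summary, assuming the time log is simply a list of minutes per completed task. The function name, the field names, and the sample log are hypothetical.

```python
from statistics import mean, median

def baseline(logged_minutes, estimated_minutes):
    """Summarize a pre-pilot time log and compare it with the person's own estimate."""
    avg = mean(logged_minutes)
    return {
        "mean_minutes": avg,
        "median_minutes": median(logged_minutes),
        # Positive gap: the task takes longer than the person believed.
        "estimate_gap_minutes": avg - estimated_minutes,
    }

# Hypothetical two-week log: minutes spent on each weekly-status email
log = [35, 42, 38, 50, 40, 45, 30, 40]
summary = baseline(log, estimated_minutes=30)
print(summary)
```

In this made-up log the task averages 40 minutes against an estimate of 30, the kind of gap between perceived and actual time that only a log will surface.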

Then introduce the tool with a specific operating procedure, not a generic instruction to "experiment." A useful prompt or workflow spec is narrow enough to be repeatable. For document drafting tasks, Claude Sonnet 4 works well as a balanced general-purpose model; for lightweight classification or routing tasks where speed matters more than nuance, Claude Haiku 4.5 offers fast, cost-efficient responses; for complex multi-step reasoning or agentic tasks, Claude Opus 4 is the appropriate tier. The point is not to evaluate models — it is to pick one configuration and hold it constant so you can compare before and after cleanly.

Research from Brynjolfsson, Li, and Raymond (NBER Working Paper 31161, 2023) tracked 5,179 customer support agents given access to an AI assistant. On average, productivity — measured as issues resolved per hour — increased 14%. Notably, the largest gains went to newer, less-experienced workers, who saw a 34% improvement as the AI effectively accelerated their access to the organization's institutional knowledge. For a ten-person firm where experienced staff are stretched thin and newer hires are still climbing the learning curve, that asymmetry is worth planning around deliberately.

After four to six weeks, run the numbers: time on task before vs. after, quality metrics before vs. after, and a qualitative check with the person doing the work. Did the tool reduce cognitive load or create new friction? Would they use it indefinitely if given the choice, or are they relieved to stop? Both the quantitative and qualitative signals matter. A tool that saves three hours a week but demoralizes the person using it is not a win.
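The quantitative part of that comparison is simple arithmetic. The sketch below assumes each metric was recorded as a single average before and after the pilot; the metric names and values are illustrative only.

```python
def percent_change(before, after):
    """Percentage change from baseline; negative means the number went down."""
    return (after - before) * 100 / before

def compare(before, after):
    """Compute the percent change for every metric recorded in the baseline."""
    return {metric: round(percent_change(before[metric], after[metric]), 1)
            for metric in before}

# Hypothetical pilot numbers
before = {"minutes_per_task": 40.0, "revision_rounds": 2.0}
after = {"minutes_per_task": 28.0, "revision_rounds": 1.5}

result = compare(before, after)
print(result)
```

These figures cover only the quantitative signal. The qualitative check with the person doing the work still has to happen in conversation.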

Step 3: Decide When and How to Expand

A successful pilot creates an evidence-based case for expanding to a second workflow, not a mandate to automate everything. The evidence from the pilot will typically point in one of three directions: clear win, marginal gain, or the wrong tool for the job. Each warrants a different next step.

A clear win — measurable time savings, maintained or improved quality, and positive user sentiment — justifies running the same workflow with a second or third team member, and then identifying the next candidate from the original ranked list. Anthropic's own guidance on building effective AI systems emphasizes starting with the simplest solution and only adding complexity when necessary. Resist the temptation to string together multiple workflows into an ambitious automated pipeline before the simpler pieces are proven stable. The overhead of managing a fragile multi-step automation often exceeds the savings it produces.

A marginal gain — some time saved but inconsistent quality or user frustration — usually signals that the workflow was the right kind of task but the operating procedure needs tightening. Prompts that are too open-ended produce inconsistent outputs. The fix is usually more specification: provide the model with a concrete output template, two or three examples of ideal results, and explicit constraints. If tightening the procedure does not move the needle after a second attempt, move on to a different workflow rather than over-engineering this one.
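One way to make that specification repeatable is to assemble the prompt from fixed parts, so that every run uses the same template, examples, and constraints. This is a sketch only: the section labels, the function name, and the sample content are assumptions, and the resulting text would be pasted or sent to whichever tool the pilot uses.

```python
def build_prompt(task, template, examples, constraints, new_input):
    """Assemble a tightly specified prompt from fixed, reusable parts."""
    parts = [f"Task: {task}", "", "Output template:", template, ""]
    for i, example in enumerate(examples, 1):
        parts += [f"Example {i}:", example, ""]
    parts.append("Constraints:")
    parts += [f"- {c}" for c in constraints]
    parts += ["", "Input:", new_input]
    return "\n".join(parts)

# Hypothetical content for a weekly client status email
prompt = build_prompt(
    task="Draft a weekly client status email.",
    template="Subject: <client> weekly status\nDone this week: ...\nNext week: ...\nOpen questions: ...",
    examples=["<paste an ideal past email here>", "<paste a second ideal email here>"],
    constraints=["Under 150 words", "No jargon", "List at most three open questions"],
    new_input="<this week's notes>",
)
print(prompt)
```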

The wrong tool for the job shows up when the task requires judgment that cannot be captured in a prompt — nuanced client relationship management, decisions that depend on contextual knowledge the model does not have, or work where the cost of an error is high and review time offsets the savings. These tasks are not AI candidates at this stage. Removing them from the list is not a failure — it is evidence that the scoping process is working.
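The three outcomes can be written down as an explicit decision rule before the pilot ends, which makes it harder to rationalize a weak result afterward. The sketch below is one possible rule under stated assumptions: the 10% minimum saving is an arbitrary threshold, "net" time saved is assumed to already include review time, and the function name is hypothetical.

```python
def classify_pilot(net_time_saved_pct, quality_held, user_positive, min_saving_pct=10.0):
    """Map pilot results to one of the three outcomes described above."""
    # No net saving once review time is counted: not an AI candidate at this stage.
    if net_time_saved_pct <= 0:
        return "wrong tool"
    # All three signals must line up for a clear win.
    if net_time_saved_pct >= min_saving_pct and quality_held and user_positive:
        return "clear win"
    # Some time saved, but quality, sentiment, or the size of the saving fell short.
    return "marginal gain"

print(classify_pilot(30.0, quality_held=True, user_positive=True))
print(classify_pilot(12.0, quality_held=False, user_positive=True))
print(classify_pilot(-5.0, quality_held=True, user_positive=True))
```

A rule like this cannot capture the judgment-heavy cases on its own, so tasks that depend on context the model does not have still need to be screened out by the people who do the work.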

When expansion is warranted, maintain the same discipline: one workflow at a time, a defined baseline, and a clear success criterion before starting. Organizations that chase broad AI adoption across multiple workflows simultaneously tend to end up with partial implementations everywhere and proven results nowhere. The ten-person business has a structural advantage here: decisions are fast, feedback loops are short, and a single well-run pilot is visible to everyone on the team. That visibility is a forcing function for honest evaluation that larger organizations often lack.

Document what works. A short internal playbook — two pages describing the workflow, the exact prompt or operating procedure, the tool configuration, and the measured results — does two things: it prevents regression when the person running the workflow changes, and it gives the next pilot a starting framework rather than a blank page.

Key Takeaways

  • Start with a two-week time audit across the team to surface repetitive, well-defined tasks before touching any tool.
  • Run a single focused pilot for four to six weeks with a measured baseline — resist the pressure to pilot multiple workflows simultaneously.
  • Match the AI model tier to the task: lightweight tasks benefit from faster, lower-cost options; complex or agentic tasks warrant more capable models.
  • The largest productivity gains from AI tools tend to accrue to less-experienced team members, so design pilots to capture that leverage.
  • Expand only after measuring results from the current workflow — breadth without depth produces AI spend without AI value.

References

  1. Brynjolfsson, Erik, Danielle Li, and Lindsey R. Raymond. "Generative AI at Work." NBER Working Paper 31161, 2023. https://doi.org/10.3386/w31161

  2. Anthropic. "Building Effective Agents." Anthropic Research, 2024. https://www.anthropic.com/research/building-effective-agents

  3. Anthropic. "Claude 4 Model Family." Anthropic News, 2025. https://www.anthropic.com/news/claude-4
