My AI-powered data analysis workflow (no more writing pandas boilerplate)
How I use Claude to analyze sales data, catch outliers, and generate charts — without fighting with matplotlib every time.
Every data scientist I know has a graveyard of half-finished Jupyter notebooks for one-off analyses. Client sends a CSV, you spend 45 minutes writing the same pandas boilerplate you've written a hundred times, you get your answer, and the notebook sits there forever unused.
I stopped doing this about two months ago. Here's my current workflow.
The old way
Client: "Can you look at this sales data and tell me why Q3 was bad?"
Me: Open Jupyter, load the CSV, check dtypes, fix the date columns (always the date columns), group by quarter, realize they have some NaN revenue rows from cancelled orders, decide how to handle those, generate the chart, realize the chart is ugly, fix the chart, write the summary.
Time: 1.5 hours minimum.
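For anyone who hasn't lived this, here is a minimal sketch of the boilerplate in question. The column names (order_date, revenue), the inline CSV, and the decision to drop NaN revenue rows are all assumptions for illustration; real client files differ, and dropping cancelled orders is only one of several defensible policies.

```python
import io

import pandas as pd

# Hypothetical stand-in for a client's CSV; column names and values are made up.
RAW_CSV = """order_id,order_date,revenue
1,01/15/2024,1200.0
2,02/03/2024,
3,04/20/2024,800.0
4,07/11/2024,450.0
5,08/30/2024,
6,10/05/2024,950.0
"""


def quarterly_revenue(csv_source) -> pd.Series:
    """Load a sales CSV and return total revenue per calendar quarter."""
    df = pd.read_csv(csv_source)
    # Fix the date column: parse MM/DD/YYYY strings into real timestamps.
    df["order_date"] = pd.to_datetime(df["order_date"], format="%m/%d/%Y")
    # Assumed policy: rows with NaN revenue are cancelled orders, so drop them.
    df = df.dropna(subset=["revenue"])
    # Group by calendar quarter and sum.
    return df.groupby(df["order_date"].dt.to_period("Q"))["revenue"].sum()


revenue_by_quarter = quarterly_revenue(io.StringIO(RAW_CSV))
print(revenue_by_quarter)  # toy data: Q1 1200, Q2 800, Q3 450, Q4 950
```

None of this is hard. It is just the same twenty lines, retyped for every new file.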
The new way
I upload the CSV. I write: "Analyze this sales data. Q3 looks bad — find out why. Check for data quality issues first, then look at the numbers."
Time: 8 minutes.
What I get back is actually better than what I used to write because Claude will catch things I'd miss when I'm moving fast. Last week it noticed that the "Q3 slump" a client was panicking about was entirely explained by a single large customer who always pays in October (the invoice dates were in Q3, payment dates in Q4 — they had a measurement problem, not a revenue problem).
I would not have caught that in my first pass. I was already mentally drafting the "your marketing spend efficiency dropped" email.
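The check behind that finding is simple once you know to look for it: total the same invoices by invoice date and by payment date, then see which quarters move. The sketch below uses made-up customers and amounts, and the column names (invoice_date, payment_date, amount) are assumptions, not the client's actual schema.

```python
import pandas as pd

# Hypothetical invoices; customers, dates, and amounts are invented for illustration.
invoices = pd.DataFrame({
    "customer": ["BigCo", "SmallCo", "SmallCo", "OtherCo"],
    "invoice_date": pd.to_datetime(
        ["2024-09-20", "2024-08-05", "2024-10-10", "2024-07-15"]),
    "payment_date": pd.to_datetime(
        ["2024-10-15", "2024-08-20", "2024-10-25", "2024-07-30"]),
    "amount": [50000.0, 4000.0, 4500.0, 3000.0],
})


def revenue_by_quarter(df: pd.DataFrame, date_col: str) -> pd.Series:
    """Sum invoice amounts per calendar quarter of the given date column."""
    return df.groupby(df[date_col].dt.to_period("Q"))["amount"].sum()


# Same invoices, two different clocks.
comparison = pd.DataFrame({
    "by_invoice_date": revenue_by_quarter(invoices, "invoice_date"),
    "by_payment_date": revenue_by_quarter(invoices, "payment_date"),
}).fillna(0.0)
comparison["shift"] = comparison["by_payment_date"] - comparison["by_invoice_date"]

# Invoices whose payment landed in a different quarter than the invoice.
crossers = invoices[
    invoices["invoice_date"].dt.to_period("Q")
    != invoices["payment_date"].dt.to_period("Q")
]

print(comparison)
print(crossers[["customer", "amount"]])
```

In this toy data, Q3 loses 50,000 and Q4 gains 50,000 when you switch from invoice date to payment date, and a single customer accounts for the entire shift.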
What I actually upload and ask
The prompts that work best for me:
For initial exploration:
Analyze this sales CSV. Tell me:
1. Data quality issues (nulls, duplicates, weird values)
2. Key metrics: total revenue, average order value, top 10 customers
3. Trends: month over month for the last 6 months
4. Anything that looks anomalous
Include Python code for any charts you recommend.
For specific questions:
In this dataset, why did customer acquisition cost go up in March?
Look at spend by channel, conversion rates, and any seasonality patterns.
For cohort analysis:
Build a cohort retention table by signup month.
The 'signup_date' and 'last_purchase_date' columns are in MM/DD/YYYY format.
The model matters for this
For data work I've found Claude Sonnet 4.6 meaningfully better than GPT-4o. Not in every case — for simple calculations GPT is fine — but for reasoning about why a number is what it is, Claude tends to think through it more carefully.
The auto-routing usually sends data questions to Claude, which is the right call.
What still needs my judgment
The AI is good at finding patterns. It's not good at knowing which patterns matter to a specific business. When it tells me "marketing spend in Channel A has a 0.73 correlation with new customer acquisition," I still have to decide whether that's causal or whether both are driven by a third thing (usually seasonality).
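One quick way to test the seasonality explanation is to subtract each calendar month's average from both series and recompute the correlation on what's left. The sketch below runs on synthetic data built so that spend and new customers share a seasonal pattern but are otherwise unrelated. The column names and every number are assumptions, and this checks for exactly one confounder; it does not establish causation.

```python
import numpy as np
import pandas as pd

# Synthetic data: both series follow the same yearly cycle plus independent noise.
rng = np.random.default_rng(0)
months = pd.period_range("2020-01", periods=60, freq="M")
month_of_year = months.month.to_numpy()
season = 100 * np.sin(2 * np.pi * (month_of_year - 1) / 12)

df = pd.DataFrame({
    "month_of_year": month_of_year,
    "spend": 500 + season + rng.normal(0, 5, size=60),
    "new_customers": 200 + season + rng.normal(0, 5, size=60),
}, index=months)

cols = ["spend", "new_customers"]

# Raw correlation, seasonality included.
raw_corr = df["spend"].corr(df["new_customers"])

# Remove seasonality: subtract each calendar month's mean from its rows.
seasonal_means = df.groupby("month_of_year")[cols].transform("mean")
deseasonalized = df[cols] - seasonal_means
adjusted_corr = deseasonalized["spend"].corr(deseasonalized["new_customers"])

print(f"raw correlation:            {raw_corr:.2f}")
print(f"after removing seasonality: {adjusted_corr:.2f}")
```

If the correlation mostly survives the adjustment, the relationship is worth a closer look. If it collapses, the two series were probably just riding the same calendar.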
I also don't let clients see the AI output directly. I take the analysis, check the numbers myself, and rewrite the narrative. The structure is the AI's, the judgment calls are mine.
Practical stuff
Files over ~10MB start getting slower. For really large datasets I'll sample first (10k rows) to iterate on my questions, then run on the full dataset once I know what I'm looking for.
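The sampling step is a couple of lines of pandas. This is a minimal sketch: the helper name sample_for_upload, the fixed seed, and the synthetic frame standing in for a large dataset are all assumptions. Keep in mind that a uniform random sample can miss rare outliers, which is one more reason to rerun on the full data at the end.

```python
import numpy as np
import pandas as pd


def sample_for_upload(df: pd.DataFrame, n: int = 10_000, seed: int = 42) -> pd.DataFrame:
    """Return a reproducible random sample of at most n rows."""
    if len(df) <= n:
        return df
    # A fixed random_state means the same rows come back on every run.
    return df.sample(n=n, random_state=seed)


# Synthetic stand-in for a large sales file.
full = pd.DataFrame({
    "order_id": np.arange(50_000),
    "revenue": np.arange(50_000) % 97,
})

sample = sample_for_upload(full)
csv_text = sample.to_csv(index=False)  # with no path given, to_csv returns a string

print(len(full), "rows in full dataset")
print(len(sample), "rows in sample")
```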
The "include Python code" instruction is important — it gives me reproducible code I can hand off or put in a notebook if the client needs updates later.
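To make "reproducible" concrete, here is a hand-written sketch of the kind of short script worth keeping for the cohort prompt earlier. It is an illustration, not actual model output. The customer rows are invented, the helper name cohort_retention is hypothetical, and because the only inputs are signup_date and last_purchase_date, "retained at month k" here means "last purchase was at least k calendar months after signup", which only approximates true retention from a full purchase history.

```python
import pandas as pd

# Hypothetical customers; dates use the MM/DD/YYYY format named in the prompt.
customers = pd.DataFrame({
    "customer_id": [1, 2, 3, 4, 5],
    "signup_date": ["01/10/2024", "01/25/2024", "02/02/2024",
                    "02/14/2024", "02/28/2024"],
    "last_purchase_date": ["01/20/2024", "03/05/2024", "02/10/2024",
                           "03/15/2024", "04/01/2024"],
})


def cohort_retention(df: pd.DataFrame, max_months: int = 3) -> pd.DataFrame:
    """Share of each signup-month cohort still purchasing k or more months later."""
    signup = pd.to_datetime(df["signup_date"], format="%m/%d/%Y")
    last = pd.to_datetime(df["last_purchase_date"], format="%m/%d/%Y")
    cohort = signup.dt.to_period("M")
    # Whole calendar months between signup month and last purchase month.
    active_months = ((last.dt.year - signup.dt.year) * 12
                     + (last.dt.month - signup.dt.month))
    # The mean of a boolean column is the fraction of True values.
    table = pd.DataFrame({
        f"month_{k}": (active_months >= k).groupby(cohort).mean()
        for k in range(max_months + 1)
    })
    table.index.name = "signup_month"
    return table


retention = cohort_retention(customers)
print(retention.round(2))
```

When the client sends next quarter's file, rerunning this is a one-line change to the data source.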
If you're a data analyst spending more than an hour on routine CSV analysis, try this for a week. The time savings are real.