Overview
This guide helps you write effective prompts for processing large datasets. Unlike one-off AI interactions, prompting at scale needs special care because your prompt is executed independently on every row, across thousands or even hundreds of thousands of rows.
Key principle: Every prompt runs in isolation on each row. The AI doesn't know about other rows, can't learn from previous examples, and must make consistent decisions based solely on your instructions and the data provided in that single row.
Before You Start: Essential Planning
Plan your approach before you start writing prompts. The steps below help ensure your prompts are effective and your results are actionable.
📋 Before you write your first prompt, work through our requirements planning guide to define your goals, assess your data, and set up your success criteria.
The planning process involves four key areas:
The Scale-Based Prompting Framework
Basic Structure
Every prompt follows this three-part structure:
# System Message
This is where you define the AI's role, provide context about your business,
and give detailed instructions about the task.
# Prompt
This is where you insert the actual data from your warehouse.
Format it clearly using XML tags like <message>...</message>
# Output Specification
Define the exact format you want back (string, boolean, JSON object, etc.)
Why This Structure Works
- System Message: Sets context and provides comprehensive instructions
- Prompt Section: Contains the variable data for each row
- Output Spec: Ensures consistent, structured responses across all rows
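To make the row-by-row isolation concrete, here is a minimal Python sketch of how the three parts might be assembled for each row. The function name `build_request`, the dictionary keys, and the `MESSAGE_BODY` column are assumptions made for illustration; this is not Cotera's API, only a picture of what stays fixed and what varies.

```python
SYSTEM_MESSAGE = (
    "You are a customer experience analyst for a meal delivery service company. "
    "Determine whether the message mentions a dietary restriction or food allergy."
)
OUTPUT_SPEC = {"dietary_restriction": "string", "has_dietary_info": "true | false"}


def build_request(row: dict) -> dict:
    """Assemble one self-contained request from a single warehouse row (hypothetical helper)."""
    body = row.get("MESSAGE_BODY") or ""  # fall back to empty text when the value is missing
    return {
        "system_message": SYSTEM_MESSAGE,        # identical for every row
        "prompt": f"<message>{body}</message>",  # the only part that varies
        "output_spec": OUTPUT_SPEC,              # identical for every row
    }


rows = [{"MESSAGE_BODY": "I'm gluten-free, any options?"}, {"MESSAGE_BODY": None}]
requests = [build_request(row) for row in rows]
```

Each request carries everything the AI needs, and nothing from any other row.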
Writing Effective System Messages
1. Establish Clear Context
Start by explaining who the AI is and what company/domain it's working in:
You are a customer experience analyst for a meal delivery service company.
This orients the AI toward the relevant domain and helps it make appropriate business decisions.
2. Explain the Business Context
Don't assume the AI understands your business. Provide relevant background:
The company wants to analyze customer feedback messages to identify when
customers mention dietary restrictions or food allergies, so the customer
service team can proactively suggest appropriate menu options in future
interactions.
3. Define the Specific Task
Be extremely specific about what you want the AI to do:
Your task is to determine if a customer is mentioning a dietary restriction
or food allergy, and if so, extract the specific dietary information mentioned.
4. Handle Edge Cases Explicitly
Since you can't manually review every row, anticipate edge cases in your instructions:
- Only flag messages that explicitly mention restrictions like vegetarian,
gluten-free, dairy-free, nut allergies, etc.
- Do not flag general food preferences or complaints about taste
- If multiple restrictions are mentioned, include all of them in your response
- If the message is unclear or ambiguous about dietary needs, return "false"
- Treat phrases like "I can't eat dairy" and "dairy-free" as equivalent
5. Provide Concrete Examples
Include examples that show both positive and negative cases:
For example, if a customer writes "I loved the pasta dish but I'm actually
lactose intolerant so I had to skip the cheese," you would extract
"lactose intolerant" as the dietary restriction.
Do NOT flag messages that only mention taste preferences like "I don't
really like spicy food" without mentioning actual restrictions or allergies.
Formatting Data in the Prompt Section
Data Formatting Decision Tree
Choose your formatting approach based on content length and complexity:
Use label: value format when:
- Content is under ~50 characters
- Single-line data (IDs, dates, statuses, numbers)
- No risk of special characters causing confusion
Use XML tags when:
- Content is longer than ~50 characters
- Multi-line content (messages, emails, transcripts)
- Content might contain quotes, apostrophes, or line breaks
- You need clear start/end boundaries
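The decision tree above can be sketched as a small Python helper. This is only an illustration of the rule of thumb; the function name `format_field` and the way a tag name is derived from the label are assumptions, and the 50-character cutoff is approximate, as noted above.

```python
def format_field(label: str, value: str, max_inline: int = 50) -> str:
    """Use 'label: value' for short single-line values, XML tags otherwise."""
    needs_tags = (
        len(value) > max_inline                 # longer than ~50 characters
        or "\n" in value                        # multi-line content
        or any(ch in value for ch in "\"'")     # quotes or apostrophes
    )
    if needs_tags:
        tag = label.lower().replace(" ", "_")   # e.g. "Message Body" -> "message_body"
        return f"<{tag}>{value}</{tag}>"
    return f"{label}: {value}"


short = format_field("Customer ID", "C-1042")
long_text = format_field("Message Body", "I can't eat dairy")
```

Here `short` uses the label format, while `long_text` is wrapped in `<message_body>` tags because it contains an apostrophe.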
Short Data: Label Format
Perfect for concise, structured information:
<customer_info>
Customer ID: {{"CUSTOMER_ID"}}
Account Type: {{"ACCOUNT_TYPE"}}
Signup Date: {{"SIGNUP_DATE"}}
Total Orders: {{"ORDER_COUNT"}}
</customer_info>
<interaction_metadata>
Channel: {{"CHANNEL"}}
Agent ID: {{"AGENT_ID"}}
Duration: {{"CALL_DURATION_MINUTES"}} minutes
</interaction_metadata>
Long Data: XML Tag Format
Essential for content that might contain complex formatting:
<message_subject>{{"MESSAGE_SUBJECT" |> coalesce('')}}</message_subject>
<message_body>{{"MESSAGE_BODY" |> coalesce('')}}</message_body>
<call_transcript>{{"TRANSCRIPT_TEXT" |> coalesce('')}}</call_transcript>
<customer_notes>{{"SUPPORT_NOTES" |> coalesce('')}}</customer_notes>
Special Characters and Data Quality
Good news: Cotera handles special characters automatically! You don't need to worry about:
- Quotes and apostrophes in customer messages
- Line breaks in transcripts or emails
- Special unicode characters
- Escaping any characters
Cotera processes your data as-is and passes it cleanly to the AI, including:
- Newlines (preserved exactly as they appear)
- Quotation marks ("These work fine")
- Apostrophes (Don't worry about these)
- Emojis and special characters (😊 ✓ ñ)
Handle Missing Data Gracefully
Always account for null or missing values in your data formatting. Use coalesce functions to provide fallbacks:
Message Subject: {{"MESSAGE_SUBJECT" |> coalesce('')}}
Customer Notes: {{"NOTES" |> coalesce('No notes available')}}
Previous Interaction: {{"LAST_CONTACT" |> coalesce('First-time customer')}}
The AI should know what to expect when data is missing and how to handle these cases appropriately.
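If you want to preview what the AI will see for rows with missing values, a rough Python analogue of the fallbacks above looks like this. It assumes coalesce substitutes the fallback only for null values (as SQL's COALESCE does), not for empty strings; the `coalesce` function here is a stand-in written for this example, not the template function itself.

```python
def coalesce(value, fallback: str) -> str:
    """Return the fallback when the value is null (None); otherwise the value as text."""
    return fallback if value is None else str(value)


# A hypothetical row where two of the three columns are null.
row = {"MESSAGE_SUBJECT": None, "NOTES": None, "LAST_CONTACT": "2024-05-01"}

prompt = (
    f"Message Subject: {coalesce(row['MESSAGE_SUBJECT'], '')}\n"
    f"Customer Notes: {coalesce(row['NOTES'], 'No notes available')}\n"
    f"Previous Interaction: {coalesce(row['LAST_CONTACT'], 'First-time customer')}"
)
```

Whatever fallback text you choose, mention it in the system message so the AI treats "No notes available" as missing data rather than as a customer note.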
Keep Data Organized and Logical
Group related information together to make it easier for the AI to process:
<customer_profile>
Customer ID: {{"CUSTOMER_ID"}}
Tier: {{"CUSTOMER_TIER"}}
Tenure: {{"DAYS_SINCE_SIGNUP"}} days
Lifetime Value: ${{"LTV"}}
</customer_profile>
<current_interaction>
<channel>{{"INTERACTION_CHANNEL"}}</channel>
<timestamp>{{"CREATED_AT"}}</timestamp>
<message_content>{{"MESSAGE_TEXT"}}</message_content>
</current_interaction>
Output Specifications
Be Extremely Specific
Define exactly what format you want back:
{
"dietary_restriction": "string",
"has_dietary_info": "true | false"
}
Available Output Types
You can specify various output formats:
- String: Simple text responses
- Boolean: True/false values
- JSON Object: Structured data with multiple fields
- Array: Lists of items
- Enum: Specific predefined options
- Float/Int: Numeric values
Example Complex Output Spec
{
"sentiment": "positive | negative | neutral",
"confidence_score": "float between 0 and 1",
"topics": array["string"],
"requires_followup": "boolean",
"priority_level": "low | medium | high | urgent"
}
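A strict output spec also makes responses easy to check downstream. The following Python sketch validates one response against the example spec above; it assumes responses arrive as JSON text, and the function name `validate_output` is hypothetical.

```python
import json

ALLOWED_SENTIMENT = {"positive", "negative", "neutral"}
ALLOWED_PRIORITY = {"low", "medium", "high", "urgent"}


def validate_output(raw: str) -> list:
    """Return the names of fields that violate the spec (empty list if valid)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return ["not valid JSON"]
    if not isinstance(data, dict):
        return ["not a JSON object"]

    problems = []
    sentiment = data.get("sentiment")
    if not (isinstance(sentiment, str) and sentiment in ALLOWED_SENTIMENT):
        problems.append("sentiment")
    score = data.get("confidence_score")
    is_number = isinstance(score, (int, float)) and not isinstance(score, bool)
    if not (is_number and 0 <= score <= 1):
        problems.append("confidence_score")
    topics = data.get("topics")
    if not (isinstance(topics, list) and all(isinstance(t, str) for t in topics)):
        problems.append("topics")
    if not isinstance(data.get("requires_followup"), bool):
        problems.append("requires_followup")
    priority = data.get("priority_level")
    if not (isinstance(priority, str) and priority in ALLOWED_PRIORITY):
        problems.append("priority_level")
    return problems
```

Running a check like this over a test sample shows quickly whether the spec is being followed on every row or only on most of them.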
Advanced Prompting Techniques
1. Multi-Step Reasoning
For complex decisions, break down the reasoning process:
Follow these steps:
1. First, scan the message for any mentions of food restrictions or allergies
2. If found, extract the specific restriction mentioned
3. Determine if this information is actionable for menu recommendations
4. Format your response according to the output specification
2. Boundary Case Handling
Be extremely specific about edge cases:
**Boundary Cases:**
- Include explicit dietary restrictions (vegetarian, vegan, gluten-free, etc.)
- Include medical allergies (nut allergies, shellfish allergies, etc.)
- Exclude general taste preferences ("I don't like mushrooms")
- Exclude temporary dietary choices ("I'm trying to eat less sugar this week")
- If a message mentions both restrictions and preferences, only flag the restrictions
3. Quality Thresholds
Set clear thresholds for when the AI should act:
Only extract dietary information when you are confident (>80% certain) that
it represents a genuine restriction or allergy. When in doubt, return false
rather than guessing.
Tool Integration
When to Use Tool Calls
Tool calls are powerful for taking actions based on AI decisions. Common examples include:
- Appending rows to Google Sheets
- Sending Slack notifications
- Processing refunds or transactions
- Creating support tickets
- Updating CRM records
- Other automated actions your business requires
Tool Call Structure
When incorporating tool calls, be specific about parameters:
Use the appendRow tool to add data to this Google Sheet:
- **Spreadsheet ID**: [YOUR_SPREADSHEET_ID]
Provide values as an array with exactly 5 elements in this order:
1. id
2. created_at
3. summary
4. category_tag
5. assessment
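Fixed ordering matters because a spreadsheet row has no field names. As an illustration, this Python sketch shows the kind of ordered, five-element array the instructions above ask for; the helper `build_row_values` and the sample record are hypothetical and are not part of the appendRow tool.

```python
COLUMN_ORDER = ["id", "created_at", "summary", "category_tag", "assessment"]


def build_row_values(record: dict) -> list:
    """Return the record's values as a list in the exact column order."""
    missing = [column for column in COLUMN_ORDER if column not in record]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return [record[column] for column in COLUMN_ORDER]


record = {
    "summary": "Customer reports a nut allergy",
    "id": "msg_001",
    "assessment": "actionable",
    "created_at": "2024-03-07",
    "category_tag": "dietary",
}
values = build_row_values(record)
```

The resulting `values` list always has five elements in sheet order, regardless of how the record's keys were ordered.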
Decision-Based Tool Usage
You can have the AI make decisions about which tools to use:
Based on the urgency level determined:
- If "urgent": Use the slack_message tool to notify the support team
- If "high" or "medium": Use appendRow to add to the priority queue sheet
- If "low": Use appendRow to add to the standard processing sheet
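Rules like these are easiest to audit when they can also be written as a lookup table. The sketch below mirrors the routing above in Python so that you could compare the tool the AI actually called against the tool the rules imply; the function name `expected_tool` and the target labels are assumptions for illustration.

```python
# Urgency level -> (tool name, destination), mirroring the instructions above.
ROUTING = {
    "urgent": ("slack_message", "support team"),
    "high": ("appendRow", "priority queue sheet"),
    "medium": ("appendRow", "priority queue sheet"),
    "low": ("appendRow", "standard processing sheet"),
}


def expected_tool(urgency: str) -> tuple:
    """Return the (tool, destination) pair the routing rules imply for an urgency level."""
    key = urgency.strip().lower()
    if key not in ROUTING:
        raise ValueError(f"unknown urgency level: {urgency!r}")
    return ROUTING[key]
```

If your rules cannot be expressed this plainly, the prompt is probably leaving room for inconsistent tool choices.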
Best Practices for Scale
1. Provide Comprehensive Instructions
Write your prompts as if you're providing detailed specifications to a highly capable analyst who:
- Has never worked at your company
- Doesn't understand your industry-specific terminology
- Can handle complex reasoning but needs explicit context about your business rules
- Will encounter edge cases you haven't anticipated
While AI models are sophisticated and can handle nuanced reasoning, they need you to explicitly share your domain expertise and business context to make decisions that align with your goals.
2. Domain-Specific Instructions
Include relevant business context:
Important context about our business:
- Customers often use "delivery issues" and "shipping problems" interchangeably
- Our premium service tier is called "Priority" - treat mentions of this
as high-value customer feedback
- Service pauses are different from cancellations - be careful
to distinguish between these
3. Consistency is Critical
Since you're processing thousands of rows, small inconsistencies become major problems:
**Consistency Requirements:**
- Always use the same date format: YYYY-MM-DD
- Capitalize proper nouns consistently
- Use the same terminology throughout (e.g., always "subscription pause"
not "account pause" or "service pause")
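Even with clear instructions, it can help to normalize outputs after the fact. Here is a minimal Python sketch of that idea, assuming you post-process results yourself; the list of accepted input date formats and the term mapping are examples, not an exhaustive set.

```python
from datetime import datetime

# Input formats we are willing to accept; output is always YYYY-MM-DD.
KNOWN_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%Y/%m/%d")

# Variant phrasing -> the single term we want everywhere.
CANONICAL_TERMS = {
    "account pause": "subscription pause",
    "service pause": "subscription pause",
}


def normalize_date(text: str) -> str:
    """Convert a date in any known format to YYYY-MM-DD, or raise ValueError."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {text!r}")


def normalize_terms(text: str) -> str:
    """Replace known variant terms with the canonical wording."""
    for variant, canonical in CANONICAL_TERMS.items():
        text = text.replace(variant, canonical)
    return text
```

For example, `normalize_date("03/07/2024")` yields `2024-03-07` under the month-first assumption encoded in `KNOWN_FORMATS`.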
4. Handle Uncertainty Gracefully
Tell the AI what to do when it's not sure:
When you encounter ambiguous cases:
- If you're less than 70% confident, return "uncertain"
- If the data is incomplete, return "insufficient_data"
- If the message is too short to analyze, return "too_brief"
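The same fallback rules can be expressed as a small Python function, which is useful for spot-checking outputs against the instructions. This is a sketch under stated assumptions: the model reports a confidence between 0 and 1, an empty message counts as incomplete data, and "too short" means fewer than 10 characters. The cutoff `min_chars` and the name `resolve_label` are illustrative only.

```python
def resolve_label(label: str, confidence: float, message, min_chars: int = 10,
                  threshold: float = 0.70) -> str:
    """Apply the fallback rules in order and return the label that should be stored."""
    if message is None or not message.strip():
        return "insufficient_data"   # nothing to analyze
    if len(message.strip()) < min_chars:
        return "too_brief"           # too short to analyze reliably
    if confidence < threshold:
        return "uncertain"           # below the 70% confidence bar
    return label


result = resolve_label("churn_risk", 0.55, "I might look at other services soon.")
```

In this example `result` is `"uncertain"` because the confidence of 0.55 falls below the 0.70 threshold.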
Common Pitfalls to Avoid
❌ Being Too Vague
Bad: "Analyze this customer message and tell me what's important."
Good: "Determine if this customer message indicates they're likely to cancel their subscription in the next 30 days based on language indicating dissatisfaction, mention of competitors, or explicit cancellation threats."
❌ Assuming Business Knowledge
Bad: "Check if this is a priority escalation situation."
Good: "Check if this message indicates a priority escalation situation, which is defined as a customer requesting a refund, replacement, or credit due to product quality issues."
❌ Ignoring Data Quality Issues
Bad: Not addressing what happens with null values, empty strings, or malformed data.
Good: "If the message body is empty or null, return 'no_content'. If the timestamp is malformed, return 'unknown_date' and skip any date calculations."
Testing and Iteration
1. Start Small
Test your prompts on a small sample (100 to 1,000 rows) before running at full scale.
2. Review Edge Cases
Look for patterns in incorrect outputs:
- Are certain types of messages consistently misclassified?
- Do null values cause problems?
- Are there industry terms the AI doesn't understand?
3. Refine Iteratively
Update your prompts based on what you learn:
- Add new boundary cases to your instructions
- Clarify ambiguous language
- Provide additional examples for difficult cases
4. Monitor Output Quality
Set up quality checks:
- Random sampling of outputs
- Confidence score tracking
- Alert systems for unusual patterns
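As one possible shape for these checks, the Python sketch below draws a reproducible random sample for manual review and flags labels whose share has shifted from a baseline. The function names, the baseline figures, and the 10-point tolerance are all assumptions chosen for illustration.

```python
import random
from collections import Counter


def sample_for_review(outputs: list, k: int, seed: int = 0) -> list:
    """Pick up to k outputs at random; a fixed seed makes the sample reproducible."""
    rng = random.Random(seed)
    return rng.sample(outputs, min(k, len(outputs)))


def label_shares(labels: list) -> dict:
    """Return each label's share of the total (empty dict for no labels)."""
    if not labels:
        return {}
    counts = Counter(labels)
    return {label: count / len(labels) for label, count in counts.items()}


def drifted_labels(shares: dict, baseline: dict, tolerance: float = 0.10) -> list:
    """List labels whose share differs from the baseline by more than the tolerance."""
    all_labels = set(shares) | set(baseline)
    return sorted(
        label for label in all_labels
        if abs(shares.get(label, 0.0) - baseline.get(label, 0.0)) > tolerance
    )


shares = label_shares(["positive"] * 6 + ["negative"] * 4)
alerts = drifted_labels(shares, {"positive": 0.8, "negative": 0.2})
```

Here both labels are flagged, since each share moved by 20 points relative to the baseline.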
Advanced Patterns
Conditional Logic
You can include conditional instructions:
If the message is from a VIP customer (indicated by CUSTOMER_TIER = "VIP"):
- Flag issues more readily (use a lower threshold for flagging)
- Include VIP status in your response
- Use more detailed analysis
If the message is from a trial customer:
- Be more conservative about recommendations
- Focus on retention-related insights
Multi-Field Analysis
For complex analysis across multiple data points:
Analyze the combination of:
- Message sentiment
- Customer tenure (signup date vs message date)
- Previous interaction history (if provided)
- Product usage patterns
Use all these factors to determine the likelihood of churn.
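Derived values such as tenure are usually more reliable when computed before the prompt rather than left to the AI. A minimal Python sketch, assuming both dates are available as ISO `YYYY-MM-DD` strings and that you compute the value upstream of the prompt (the function name `tenure_days` is hypothetical):

```python
from datetime import date


def tenure_days(signup_date: str, message_date: str) -> int:
    """Number of days between signup and the message, given ISO date strings."""
    return (date.fromisoformat(message_date) - date.fromisoformat(signup_date)).days


days = tenure_days("2024-01-01", "2024-03-01")
tenure_line = f"Tenure: {days} days"
```

The resulting line can be placed in the prompt alongside the message, so the AI weighs tenure as a given fact instead of doing date arithmetic.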
Dynamic Tool Selection
Let the AI choose appropriate actions:
Based on your analysis:
- If fraud is detected: Use the fraud_alert tool
- If refund is needed: Use the process_refund tool
- If escalation required: Use the create_priority_ticket tool
- Otherwise: Use the standard_log tool
Final Tips
Remember the Golden Rules
- Be specific: You know your business best - share that knowledge
- Think about edge cases: With 100k+ rows, weird edge cases will happen
- Test thoroughly: Start small and iterate
- Keep it maintainable: Well-documented prompts are easier to update
- Monitor quality: Set up systems to catch when things go wrong
You Are the Expert
The AI is a powerful tool with sophisticated reasoning capabilities, but you understand your business, your data, and your goals better than any model ever could. The more context and specific instructions you provide, the better results you'll get. Think of prompt engineering as writing comprehensive specifications for a highly capable analyst who can handle complex tasks but is completely unfamiliar with your specific business domain.
Getting Help
When in doubt, err on the side of being too detailed rather than too vague. It's easier to simplify a working prompt than to debug a vague one that produces inconsistent results.
Remember: Great prompts at scale require great specificity. The time you invest in detailed prompt engineering pays dividends when processing large datasets.