Overview

This guide will help you craft effective prompts for processing large datasets at scale. Unlike one-off AI interactions, scale-based prompting requires special consideration because your prompt will be executed independently across thousands or hundreds of thousands of rows of data.

Key principle: Every prompt runs in isolation on each row. The AI doesn't know about other rows, can't learn from previous examples, and must make consistent decisions based solely on your instructions and the data provided in that single row.

Before You Start: Essential Planning

Plan your approach before you write any prompt. The steps below help ensure your prompts are effective and your results are actionable.

📋 Before you write your first prompt, work through our requirements planning guide to define your goals, assess your data, and set up your success criteria.

The planning process involves four key areas:

The Scale-Based Prompting Framework

Basic Structure

Every prompt follows this three-part structure:

# System Message
This is where you define the AI's role, provide context about your business,
and give detailed instructions about the task.

# Prompt
This is where you insert the actual data from your warehouse.
Format it clearly using XML tags like <message>...</message>

# Output Specification
Define the exact format you want back (string, boolean, JSON object, etc.)

Why This Structure Works

  • System Message: Sets context and provides comprehensive instructions
  • Prompt Section: Contains the variable data for each row
  • Output Spec: Ensures consistent, structured responses across all rows
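
As a rough illustration of how the three parts fit together, the Python sketch below assembles one self-contained request per row. The `PromptSpec` class, the `render_request` function, and the `{MESSAGE_BODY}` placeholder syntax are hypothetical stand-ins, not Cotera's actual API. The point is only that the system message and output spec stay fixed while the data section changes for each row.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptSpec:
    """Hypothetical container for the three parts described above."""
    system_message: str   # role, business context, task instructions
    prompt_template: str  # per-row data section with {COLUMN} placeholders
    output_spec: str      # description of the expected output format


def render_request(spec: PromptSpec, row: dict) -> dict:
    """Build one request for a single row; no other row is visible to it."""
    return {
        "system": spec.system_message,
        "prompt": spec.prompt_template.format(**row),
        "output_spec": spec.output_spec,
    }


spec = PromptSpec(
    system_message=(
        "You are a customer experience analyst for a meal delivery "
        "service company."
    ),
    prompt_template="<message>{MESSAGE_BODY}</message>",
    output_spec='{"dietary_restriction": "string", "has_dietary_info": "true | false"}',
)

rows = [
    {"MESSAGE_BODY": "I'm lactose intolerant, so I skipped the cheese."},
    {"MESSAGE_BODY": "The pasta was great!"},
]

# Each row becomes its own independent request.
requests = [render_request(spec, row) for row in rows]
```

Because every request carries the full system message and output spec, any single request can be read and debugged on its own, which mirrors how the AI sees each row.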

Writing Effective System Messages

1. Establish Clear Context

Start by explaining who the AI is and what company/domain it's working in:

You are a customer experience analyst for a meal delivery service company.

This immediately gives the AI relevant domain knowledge and helps it make appropriate business decisions.

2. Explain the Business Context

Don't assume the AI understands your business. Provide relevant background:

The company wants to analyze customer feedback messages to identify when
customers mention dietary restrictions or food allergies, so the customer
service team can proactively suggest appropriate menu options in future
interactions.

3. Define the Specific Task

Be extremely specific about what you want the AI to do:

Your task is to determine if a customer is mentioning a dietary restriction
or food allergy, and if so, extract the specific dietary information mentioned.

4. Handle Edge Cases Explicitly

Since you can't manually review every row, anticipate edge cases in your instructions:

- Only flag messages that explicitly mention restrictions like vegetarian,
  gluten-free, dairy-free, nut allergies, etc.
- Do not flag general food preferences or complaints about taste
- If multiple restrictions are mentioned, include all of them in your response
- If the message is unclear or ambiguous about dietary needs, return "false"
- Treat phrases like "I can't eat dairy" and "dairy-free" as equivalent

5. Provide Concrete Examples

Include examples that show both positive and negative cases:

For example, if a customer writes "I loved the pasta dish but I'm actually
lactose intolerant so I had to skip the cheese," you would extract
"lactose intolerant" as the dietary restriction.

Do NOT flag messages that only mention taste preferences like "I don't
really like spicy food" without mentioning actual restrictions or allergies.

Formatting Data in the Prompt Section

Data Formatting Decision Tree

Choose your formatting approach based on content length and complexity:

Use label: value format when:

  • Content is under ~50 characters
  • Single-line data (IDs, dates, statuses, numbers)
  • No risk of special characters causing confusion

Use XML tags when:

  • Content is longer than ~50 characters
  • Multi-line content (messages, emails, transcripts)
  • Content might contain quotes, apostrophes, or line breaks
  • You need clear start/end boundaries
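
The decision rule above can be sketched in Python. The `format_field` helper and its 50-character default are illustrative assumptions based on the rough threshold in this guide, not a Cotera feature; adjust the checks to fit your own data.

```python
def format_field(label: str, tag: str, value: str, max_label_len: int = 50) -> str:
    """Pick label format for short, simple values and XML tags otherwise."""
    needs_tags = (
        len(value) > max_label_len   # long content
        or "\n" in value             # multi-line content
        or '"' in value              # quotes that could blur boundaries
        or "'" in value              # apostrophes
    )
    if needs_tags:
        return f"<{tag}>{value}</{tag}>"
    return f"{label}: {value}"


short = format_field("Channel", "channel", "email")
multi_line = format_field("Message", "message_body", "Hi,\nI can't eat dairy.")
```

Here `short` uses the label format, while `multi_line` is wrapped in `<message_body>` tags because it contains a line break and an apostrophe.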

Short Data: Label Format

Perfect for concise, structured information:

<customer_info>
Customer ID: {{"CUSTOMER_ID"}}
Account Type: {{"ACCOUNT_TYPE"}}
Signup Date: {{"SIGNUP_DATE"}}
Total Orders: {{"ORDER_COUNT"}}
</customer_info>

<interaction_metadata>
Channel: {{"CHANNEL"}}
Agent ID: {{"AGENT_ID"}}
Duration: {{"CALL_DURATION_MINUTES"}} minutes
</interaction_metadata>

Long Data: XML Tag Format

Essential for content that might contain complex formatting:

<message_subject>{{"MESSAGE_SUBJECT" |> coalesce('')}}</message_subject>
<message_body>{{"MESSAGE_BODY" |> coalesce('')}}</message_body>
<call_transcript>{{"TRANSCRIPT_TEXT" |> coalesce('')}}</call_transcript>
<customer_notes>{{"SUPPORT_NOTES" |> coalesce('')}}</customer_notes>

Special Characters and Data Quality

Good news: Cotera handles special characters automatically! You don't need to worry about:

  • Quotes and apostrophes in customer messages
  • Line breaks in transcripts or emails
  • Special unicode characters
  • Escaping any characters

Cotera processes your data as-is and passes it cleanly to the AI, including:

  • Newlines (preserved exactly as they appear)
  • Quotation marks ("These work fine")
  • Apostrophes (Don't worry about these)
  • Emojis and special characters (😊 ✓ ñ)

Handle Missing Data Gracefully

Always account for null or missing values in your data formatting. Use coalesce functions to provide fallbacks:

Message Subject: {{"MESSAGE_SUBJECT" |> coalesce('')}}
Customer Notes: {{"NOTES" |> coalesce('No notes available')}}
Previous Interaction: {{"LAST_CONTACT" |> coalesce('First-time customer')}}

Tell the AI in your system message what each fallback value means and how to handle it, so rows with missing data are treated consistently.
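
For readers who want to reason about the fallback behavior outside the template, here is a minimal Python sketch. It assumes that `coalesce` substitutes the fallback only when the value is null (and leaves empty strings alone); that is an assumption about the template function, and the `coalesce` helper below is a hypothetical stand-in rather than Cotera's implementation.

```python
from typing import Optional


def coalesce(value: Optional[str], fallback: str) -> str:
    """Return the fallback only when the value is missing (None)."""
    return fallback if value is None else value


row = {"MESSAGE_SUBJECT": None, "NOTES": None, "LAST_CONTACT": "2024-03-02"}

rendered = "\n".join([
    f"Message Subject: {coalesce(row['MESSAGE_SUBJECT'], '')}",
    f"Customer Notes: {coalesce(row['NOTES'], 'No notes available')}",
    f"Previous Interaction: {coalesce(row['LAST_CONTACT'], 'First-time customer')}",
])
```

Rendering a few rows this way before a full run makes it easy to see exactly which fallback strings the AI will encounter.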

Keep Data Organized and Logical

Group related information together to make it easier for the AI to process:

<customer_profile>
Customer ID: {{"CUSTOMER_ID"}}
Tier: {{"CUSTOMER_TIER"}}
Tenure: {{"DAYS_SINCE_SIGNUP"}} days
Lifetime Value: ${{"LTV"}}
</customer_profile>

<current_interaction>
<channel>{{"INTERACTION_CHANNEL"}}</channel>
<timestamp>{{"CREATED_AT"}}</timestamp>
<message_content>{{"MESSAGE_TEXT" |> coalesce('')}}</message_content>
</current_interaction>

Output Specifications

Be Extremely Specific

Define exactly what format you want back:

{
  "dietary_restriction": "string",
  "has_dietary_info": "true | false"
}

Available Output Types

You can specify various output formats:

  • String: Simple text responses
  • Boolean: True/false values
  • JSON Object: Structured data with multiple fields
  • Array: Lists of items
  • Enum: Specific predefined options
  • Float/Int: Numeric values

Example Complex Output Spec

{
  "sentiment": "positive | negative | neutral",
  "confidence_score": "float between 0 and 1",
  "topics": ["string"],
  "requires_followup": "boolean",
  "priority_level": "low | medium | high | urgent"
}
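
If you post-process results yourself, a small validator can confirm that each response matches a spec like the one above. This Python sketch is an assumption about how you might check outputs downstream; the `validate_output` function is hypothetical and is not part of any platform API.

```python
import json
from typing import List

ALLOWED_SENTIMENT = ("positive", "negative", "neutral")
ALLOWED_PRIORITY = ("low", "medium", "high", "urgent")


def validate_output(raw: str) -> List[str]:
    """Return the names of fields that violate the spec (empty list = valid)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return ["not valid JSON"]
    if not isinstance(data, dict):
        return ["not a JSON object"]

    problems = []
    if data.get("sentiment") not in ALLOWED_SENTIMENT:
        problems.append("sentiment")
    score = data.get("confidence_score")
    # bool is a subclass of int in Python, so reject it explicitly.
    if (
        isinstance(score, bool)
        or not isinstance(score, (int, float))
        or not 0 <= score <= 1
    ):
        problems.append("confidence_score")
    topics = data.get("topics")
    if not isinstance(topics, list) or not all(isinstance(t, str) for t in topics):
        problems.append("topics")
    if not isinstance(data.get("requires_followup"), bool):
        problems.append("requires_followup")
    if data.get("priority_level") not in ALLOWED_PRIORITY:
        problems.append("priority_level")
    return problems
```

Counting how often each field name appears in the returned lists across a test sample points to the part of the output spec that needs tighter wording.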

Advanced Prompting Techniques

1. Multi-Step Reasoning

For complex decisions, break down the reasoning process:

Follow these steps:
1. First, scan the message for any mentions of food restrictions or allergies
2. If found, extract the specific restriction mentioned
3. Determine if this information is actionable for menu recommendations
4. Format your response according to the output specification

2. Boundary Case Handling

Be extremely specific about edge cases:

**Boundary Cases:**
- Include explicit dietary restrictions (vegetarian, vegan, gluten-free, etc.)
- Include medical allergies (nut allergies, shellfish allergies, etc.)
- Exclude general taste preferences ("I don't like mushrooms")
- Exclude temporary dietary choices ("I'm trying to eat less sugar this week")
- If a message mentions both restrictions and preferences, only flag the restrictions

3. Quality Thresholds

Set clear thresholds for when the AI should act:

Only extract dietary information when you are confident (>80% certain) that
it represents a genuine restriction or allergy. When in doubt, return false
rather than guessing.

Tool Integration

When to Use Tool Calls

Tool calls are powerful for taking actions based on AI decisions. Common examples include:

  • Appending rows to Google Sheets
  • Sending Slack notifications
  • Processing refunds or transactions
  • Creating support tickets
  • Updating CRM records
  • And many other automated actions your business requires

Tool Call Structure

When incorporating tool calls, be specific about parameters:

Use the appendRow tool to add data to this Google Sheet:
- **Spreadsheet ID**: [YOUR_SPREADSHEET_ID]

Provide values as an array with exactly 5 elements in this order:
1. id
2. created_at
3. summary
4. category_tag
5. assessment
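
If you also assemble or audit these values in your own code, a fixed column order helps catch mistakes early. The Python sketch below is only an illustration of that check: `build_row_values` is a hypothetical helper, and it does not call the appendRow tool itself.

```python
COLUMN_ORDER = ("id", "created_at", "summary", "category_tag", "assessment")


def build_row_values(record: dict) -> list:
    """Return exactly five values in the order the sheet expects."""
    missing = [column for column in COLUMN_ORDER if column not in record]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return [record[column] for column in COLUMN_ORDER]


values = build_row_values({
    "summary": "Customer reports a late delivery",
    "id": "row-001",
    "assessment": "needs follow-up",
    "created_at": "2024-05-01",
    "category_tag": "delivery",
})
```

The input dictionary can list fields in any order; the helper always emits them in the sheet's column order and fails loudly when one is absent.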

Decision-Based Tool Usage

You can have the AI make decisions about which tools to use:

Based on the urgency level determined:
- If "urgent": Use the slack_message tool to notify the support team
- If "high" or "medium": Use appendRow to add to the priority queue sheet
- If "low": Use appendRow to add to the standard processing sheet
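
The routing rules above amount to a small lookup table. The Python sketch below mirrors them so you can check that every urgency level maps to exactly one action; the `ROUTES` table and `route` function are hypothetical, and the destination strings are just labels taken from the example.

```python
ROUTES = {
    "urgent": ("slack_message", "support team"),
    "high": ("appendRow", "priority queue sheet"),
    "medium": ("appendRow", "priority queue sheet"),
    "low": ("appendRow", "standard processing sheet"),
}


def route(urgency: str) -> tuple:
    """Map an urgency label to a (tool name, destination) pair."""
    key = urgency.strip().lower()
    if key not in ROUTES:
        # An unknown label means the prompt let an unexpected value through.
        raise ValueError(f"unexpected urgency level: {urgency!r}")
    return ROUTES[key]
```

Writing the rules out as a table also makes gaps obvious: any urgency value the AI can return but the table does not cover is an edge case your instructions should address.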

Best Practices for Scale

1. Provide Comprehensive Instructions

Write your prompts as if you're providing detailed specifications to a highly capable analyst who:

  • Has never worked at your company
  • Doesn't understand your industry-specific terminology
  • Can handle complex reasoning but needs explicit context about your business rules
  • Will encounter edge cases you haven't anticipated

While AI models are sophisticated and can handle nuanced reasoning, they need you to explicitly share your domain expertise and business context to make decisions that align with your goals.

2. Domain-Specific Instructions

Include relevant business context:

Important context about our business:
- Customers often use "delivery issues" and "shipping problems" interchangeably
- Our premium service tier is called "Priority" - treat mentions of this
  as high-value customer feedback
- Service pauses are different from cancellations - be careful
  to distinguish between these

3. Consistency is Critical

Since you're processing thousands of rows, small inconsistencies become major problems:

**Consistency Requirements:**
- Always use the same date format: YYYY-MM-DD
- Capitalize proper nouns consistently
- Use the same terminology throughout (e.g., always "subscription pause"
  not "account pause" or "service pause")

4. Handle Uncertainty Gracefully

Tell the AI what to do when it's not sure:

When you encounter ambiguous cases:
- If you're less than 70% confident, return "uncertain"
- If the data is incomplete, return "insufficient_data"
- If the message is too short to analyze, return "too_brief"

Common Pitfalls to Avoid

❌ Being Too Vague

Bad: "Analyze this customer message and tell me what's important."

Good: "Determine if this customer message indicates they're likely to cancel their subscription in the next 30 days based on language indicating dissatisfaction, mention of competitors, or explicit cancellation threats."

❌ Assuming Business Knowledge

Bad: "Check if this is a priority escalation situation."

Good: "Check if this message indicates a priority escalation situation, which is defined as a customer requesting a refund, replacement, or credit due to product quality issues."

❌ Ignoring Data Quality Issues

Bad: Not addressing what happens with null values, empty strings, or malformed data.

Good: "If the message body is empty or null, return 'no_content'. If the timestamp is malformed, treat the date as 'unknown_date' and skip any date-based calculations."
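
The same rules can also be checked in code before or after a run. The Python sketch below is one possible pre-check, assuming timestamps arrive as ISO 8601 strings; the `normalize_row` function and its return shape are hypothetical.

```python
from datetime import datetime
from typing import Optional


def normalize_row(message_body: Optional[str], created_at: Optional[str]) -> dict:
    """Apply the 'no_content' and 'unknown_date' rules to one row."""
    body = (message_body or "").strip()
    try:
        # fromisoformat raises ValueError on strings it cannot parse.
        date = (
            datetime.fromisoformat(created_at).date().isoformat()
            if created_at
            else "unknown_date"
        )
    except ValueError:
        date = "unknown_date"
    return {
        "skip_reason": None if body else "no_content",
        "message_body": body,
        "date": date,
    }
```

Rows with a `skip_reason` can be counted or set aside, which gives you a quick measure of how much of the dataset hits the data-quality rules.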

Testing and Iteration

1. Start Small

Test your prompts on a small sample (100-1000 rows) before running at full scale.

2. Review Edge Cases

Look for patterns in incorrect outputs:

  • Are certain types of messages consistently misclassified?
  • Do null values cause problems?
  • Are there industry terms the AI doesn't understand?

3. Refine Iteratively

Update your prompts based on what you learn:

  • Add new boundary cases to your instructions
  • Clarify ambiguous language
  • Provide additional examples for difficult cases

4. Monitor Output Quality

Set up quality checks:

  • Random sampling of outputs
  • Confidence score tracking
  • Alert systems for unusual patterns
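
A minimal Python sketch of the first and third checks follows. It assumes outputs are collected as a list of strings and reuses the sentinel values from the uncertainty examples earlier in this guide; `sample_for_review` and `sentinel_rate` are hypothetical helpers, and any alert threshold is yours to choose.

```python
import random
from collections import Counter
from typing import List

SENTINELS = ("uncertain", "insufficient_data", "too_brief")


def sample_for_review(outputs: List[str], k: int, seed: int = 0) -> List[str]:
    """Draw a reproducible random sample of outputs for manual review."""
    rng = random.Random(seed)
    return rng.sample(outputs, min(k, len(outputs)))


def sentinel_rate(outputs: List[str]) -> float:
    """Fraction of outputs where the AI declined to give a real answer."""
    if not outputs:
        return 0.0
    counts = Counter(outputs)
    return sum(counts[s] for s in SENTINELS) / len(outputs)


outputs = ["true", "false", "uncertain", "too_brief"]
review_batch = sample_for_review(outputs, k=2)
rate = sentinel_rate(outputs)
```

A sudden rise in the sentinel rate between runs is one simple signal that the input data or the prompt has changed in a way worth investigating.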

Advanced Patterns

Conditional Logic

You can include conditional instructions:

If the message is from a VIP customer (indicated by CUSTOMER_TIER = "VIP"):
- Apply more lenient criteria for flagging issues
- Include VIP status in your response
- Use more detailed analysis

If the message is from a trial customer:
- Be more conservative about recommendations
- Focus on retention-related insights

Multi-Field Analysis

For complex analysis across multiple data points:

Analyze the combination of:
- Message sentiment
- Customer tenure (signup date vs message date)
- Previous interaction history (if provided)
- Product usage patterns

Use all these factors to determine the likelihood of churn.

Dynamic Tool Selection

Let the AI choose appropriate actions:

Based on your analysis:
- If fraud is detected: Use the fraud_alert tool
- If refund is needed: Use the process_refund tool
- If escalation required: Use the create_priority_ticket tool
- Otherwise: Use the standard_log tool

Final Tips

Remember the Golden Rules

  1. Be specific: You know your business best - share that knowledge
  2. Think about edge cases: With 100k+ rows, weird edge cases will happen
  3. Test thoroughly: Start small and iterate
  4. Keep it maintainable: Well-documented prompts are easier to update
  5. Monitor quality: Set up systems to catch when things go wrong

You Are the Expert

The AI is a powerful tool with sophisticated reasoning capabilities, but you understand your business, your data, and your goals better than any model ever could. The more context and specific instructions you provide, the better results you'll get. Think of prompt engineering as writing comprehensive specifications for a highly capable analyst who can handle complex tasks but is completely unfamiliar with your specific business domain.

Getting Help

When in doubt, err on the side of being too detailed rather than too vague. It's easier to simplify a working prompt than to debug a vague one that produces inconsistent results.


Remember: Great prompts at scale require great specificity. The time you invest in detailed prompt engineering pays dividends when processing large datasets.