Run on All Rows vs Run on New Rows

When executing agent columns in your datasets, Cotera gives you control over which rows get processed. Understanding the difference between the two execution modes helps you manage both cost and processing scope effectively.


The Two Execution Modes

Run on New Rows (Default)

This mode runs the column only on rows that the current configuration has not yet processed. When you modify a prompt or change column logic, "Run on new rows" applies those changes going forward without touching previously processed data.

What happens:

  • Existing rows keep their current outputs
  • Only new data rows or triggers get processed with your updated logic
  • Previous results remain unchanged in your dataset
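
Conceptually, incremental execution picks up only rows that don't yet have an output from the column. The sketch below is purely illustrative - it is not Cotera's API - and the row shape and config_version field are assumptions made for this example:

# Illustrative sketch only, not Cotera's API. Assumes each row records which
# column configuration (config_version) last produced its output.
CURRENT_CONFIG_VERSION = "v2"  # hypothetical identifier for the current prompt/logic

rows = [
    {"id": 1, "output": "negative", "config_version": "v1"},  # already processed
    {"id": 2, "output": None, "config_version": None},        # new, never processed
]

def run_on_new_rows(rows, process):
    """Process only rows with no output yet; existing outputs stay untouched."""
    for row in rows:
        if row["output"] is None:
            row["output"] = process(row)
            row["config_version"] = CURRENT_CONFIG_VERSION
    return rows

run_on_new_rows(rows, process=lambda row: "placeholder result")
# Row 1 keeps its original "negative" output; only row 2 receives a new one.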

Run on All Rows (Backfill)

This mode reprocesses your entire dataset with the current column configuration. Every row gets processed again, replacing all previous outputs with new results based on your updated logic.

What happens:

  • All existing rows get reprocessed
  • Previous outputs are replaced with new results
  • The column applies its current configuration to the complete dataset
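
A backfill is the opposite: every row is reprocessed and its previous output is overwritten. Continuing the illustrative (non-Cotera) sketch above:

def run_on_all_rows(rows, process, current_config_version="v2"):
    """Reprocess every row with the current configuration, replacing prior outputs."""
    for row in rows:
        row["output"] = process(row)                    # previous result is overwritten
        row["config_version"] = current_config_version  # row now reflects current logic
    return rows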

When to Use Each Mode

Use "Run on New Rows" When:

  • You've made prompt refinements - You improved your prompt for future processing but don't need to update historical analysis.
  • Your dataset is large - Processing thousands or millions of rows can be expensive. Incremental updates keep costs manageable (a rough cost sketch follows this list).
  • Historical context matters - You want to preserve what the column understood at the time it processed each row.
  • You're testing changes - Verify your updates work correctly on new data before committing to a full backfill.
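
To make the cost trade-off concrete, here is a rough back-of-envelope sketch. The per-row token count and price are assumptions for illustration only - actual costs depend on your model, prompt, and data:

# Back-of-envelope cost sketch. All figures are assumptions, not Cotera pricing:
# roughly 1,000 tokens per row at $0.002 per 1,000 tokens.
COST_PER_ROW = 1_000 / 1_000 * 0.002  # about $0.002 per row under these assumptions

def run_cost(row_count, cost_per_row=COST_PER_ROW):
    return row_count * cost_per_row

print(f"Backfill of 1,000,000 rows: ~${run_cost(1_000_000):,.2f}")   # ~$2,000.00
print(f"Incremental run of 500 new rows: ~${run_cost(500):,.2f}")    # ~$1.00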

Use "Run on All Rows" When:

  • You need consistency across all data - All rows should reflect your current logic, like after fixing a bug or updating classification criteria.
  • You've changed core logic - Fundamental changes usually require reprocessing everything to maintain data integrity.
  • The dataset is manageable - Small to medium datasets (hundreds to low thousands of rows) can be reprocessed without significant cost.
  • You're standardizing historical data - All data needs to follow the same current standards.

Practical Example

Scenario: Customer Feedback Analysis

You've built an agent column that analyzes customer feedback for sentiment and categorizes issues. Your dataset contains 25,000 historical reviews plus new ones arriving daily.

Initial prompt:

Analyze customer feedback for sentiment (positive/negative/neutral)
and identify the main topic discussed.

Improved prompt:

Analyze customer feedback for sentiment (positive/negative/neutral).
Identify the main topic: Product Quality, Customer Service,
Shipping Experience, or Pricing Concerns.
Rate urgency as High, Medium, or Low based on language intensity.
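
Because the improved prompt constrains topics and urgency to fixed values, it also becomes easy to sanity-check outputs downstream. The snippet below is a hypothetical validation helper - the result shape is an assumption, not a Cotera output format:

# Hypothetical check that a result uses the categories the improved prompt defines.
SENTIMENTS = {"positive", "negative", "neutral"}
TOPICS = {"Product Quality", "Customer Service", "Shipping Experience", "Pricing Concerns"}
URGENCIES = {"High", "Medium", "Low"}

def is_valid(result: dict) -> bool:
    return (
        result.get("sentiment") in SENTIMENTS
        and result.get("topic") in TOPICS
        and result.get("urgency") in URGENCIES
    )

print(is_valid({"sentiment": "negative", "topic": "Shipping Experience", "urgency": "High"}))  # True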

Decision point:

  • Run on New Rows - Apply the improved analysis to all future reviews while preserving the simpler analysis on historical data. Keeps processing focused on new information.
  • Run on All Rows - Reprocess all 25,000 historical reviews to standardize everything with the improved categorization and urgency assessment. Provides complete dataset consistency for historical reporting.

The choice depends on your needs:

  • Need consistent historical reporting? → Run on all rows
  • Only care about improving future analysis? → Run on new rows
  • Want to test the new approach first? → Run on new rows initially, then backfill later if needed

The execution mode you choose determines the scope of your processing. For large datasets with thousands of rows, consider whether you need historical consistency or whether forward-looking improvements are sufficient for your business needs.