Snowflake have launched an LLM function feature they call Cortex. The idea is that it lets you create a new column with a plain old SQL expression, which makes it super flexible. It's in public preview, so anyone can try it out. I thought I'd put together a little guide on setting it up, as it's worth a play! I've also got some thoughts on potential improvements to the feature, which I'll discuss near the end.
Ok, here's how to get started.
Firstly, this is only available in some regions at the moment, as it's still in preview. So if you follow the steps to grant the appropriate role to a user and then get an error that the Cortex functions don't exist, this is probably why! The available regions are listed here: https://docs.snowflake.com/en/user-guide/snowflake-cortex/llm-functions#availability
Snowflake's docs cover the setup well; you just need to follow the short explanation of the required privileges here:
https://docs.snowflake.com/en/user-guide/snowflake-cortex/llm-functions#required-privileges
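In practice that boils down to a single grant, something like the following (the role name here is illustrative):

```sql
-- Grant the Cortex database role to whichever role needs to call the functions.
GRANT DATABASE ROLE SNOWFLAKE.CORTEX_USER TO ROLE analyst_role;
```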
There are 5 functions you can use, each of which is good for different purposes. The functions are all available in the `snowflake.cortex` schema. Here's a rundown of each one with a simple example!
COMPLETE. This just completes a prompt. You give it some text and it keeps on talking. A practical use case for this might be personalized, data-driven icebreakers to pipe into your sales software.
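Something like this, say (the model name and the `leads` table are illustrative):

```sql
-- Generate a one-line icebreaker per lead from whatever columns you have.
SELECT
    lead_id,
    SNOWFLAKE.CORTEX.COMPLETE(
        'llama2-70b-chat',
        'Write a one sentence icebreaker for an email to ' || first_name ||
        ', who works as a ' || job_title || ' at ' || company_name || '.'
    ) AS icebreaker
FROM leads;
```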
TRANSLATE. This translates between two languages. The most reasonable use case I can think of for this is translating messages from international customers at scale. It's probably no good for an actual internationalization project on a website or anything like that. The quality of the translations doesn't seem that great either.
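For example (the `messages` table is illustrative; the language arguments are ISO codes):

```sql
-- Translate inbound German support messages into English.
SELECT
    message_id,
    SNOWFLAKE.CORTEX.TRANSLATE(body, 'de', 'en') AS body_en
FROM messages;
```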
EXTRACT_ANSWER. This one's quite interesting: you give it an input and then a specific question, and the LLM tries to answer that question. Unlike the other functions, this one returns the data as an array with a single element, and that element is a struct with two keys, "answer" and "score". This implies that you might be able to get multiple candidate answers out of it, but I haven't actually been able to. The score is also quite useful, as it lets you know how confident the model is, which could be used to implement some fallback logic. A good use case for this might be something like identifying all the reviews that talk negatively about a particular product. The main catch here is that the answer is just freeform text, and there's no obvious way to constrain it like you might with an OpenAI tool call. This makes post-processing difficult.
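Here's roughly what that looks like, unpacking the array/struct output described above (the `reviews` table is illustrative):

```sql
-- Ask a question of each review, then pull the answer and confidence score
-- out of the single-element array that comes back.
WITH answered AS (
    SELECT
        review_id,
        SNOWFLAKE.CORTEX.EXTRACT_ANSWER(
            review_text,
            'Which product is the reviewer unhappy with?'
        ) AS result
    FROM reviews
)
SELECT
    review_id,
    result[0]['answer']::string AS product,
    result[0]['score']::float   AS confidence
FROM answered;
```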
SUMMARIZE. This one just summarizes some text. Good if you've got a long conversation history, or a web page or something that you want to summarize. As far as I can tell, the summaries seem pretty good. The implication here is that you're submitting fairly chunky bits of text to be summarized. I haven't pushed it that far, so I don't know how it handles strings that are longer than the model's context length (which can certainly happen, as Snowflake strings can be up to 16,777,216 characters long).
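For example (the `conversations` table is illustrative):

```sql
-- One summary per support conversation transcript.
SELECT
    conversation_id,
    SNOWFLAKE.CORTEX.SUMMARIZE(transcript) AS summary
FROM conversations;
```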
SENTIMENT. This one gives a sentiment score from -1 to 1 for a given bit of text. It's probably one of the most practical functions here, to be honest. It just returns a floating point number, which is extremely simple to use and feed into downstream queries. The fact it's just a single number from -1 to 1 is not _that_ helpful though. In practice, you usually want to break down sentiment by topic, and it would be quite awkward to construct that out of the primitives we have here.
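The basic call is as simple as it gets (the `reviews` table is illustrative):

```sql
-- Score each review from -1 (negative) to 1 (positive).
SELECT
    review_id,
    SNOWFLAKE.CORTEX.SENTIMENT(review_text) AS sentiment
FROM reviews;
```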
In general, it should be pretty clear how flexible these functions are. They're just plain old scalar expressions, so they can be dropped anywhere in your queries. For example, here's a bit of post-processing on the sentiment function (the table name and thresholds are illustrative):
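```sql
-- Bucket the raw sentiment score into labels that are easier to report on.
WITH scored AS (
    SELECT
        review_id,
        SNOWFLAKE.CORTEX.SENTIMENT(review_text) AS sentiment
    FROM reviews
)
SELECT
    review_id,
    sentiment,
    CASE
        WHEN sentiment < -0.3 THEN 'negative'
        WHEN sentiment >  0.3 THEN 'positive'
        ELSE 'neutral'
    END AS sentiment_bucket
FROM scored;
```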
Honestly, overall I quite like this feature; I think it has several good points. Firstly, it's incredibly simple to use. By comparison, it's incredibly arduous and error prone to pull a load of data into another process, call an API, and write it all back to the warehouse. There are a few pretty sensible use cases built in with the various functions here too, which remove the need to worry about fine tuning, few shot learning, prompt engineering, etc, for common use cases.
That said, I feel there are some things that could be improved here.
First up, you're constrained to the open source models here. This is fine and I get it, but it would probably be pretty easy for Snowflake to let you make an HTTP request via a similar mechanism and use GPT-4 (or any other commercial model, or a custom model that you host yourself). GPT-4 in particular has a couple of great features (JSON mode and tools) that make certain kinds of work much easier here. As it stands, the COMPLETE and EXTRACT_ANSWER functions are fairly constrained, and can't be used to do anything more structured.
Secondly, it's not clear what the intended workflow is for these. If they're intended to be run ad hoc as part of a 'last mile' query, then the form factor is kind of fine. It's likely some of these outputs would want to be used in downstream transformations though. Sentiment score in particular comes to mind here: a number that you might want to aggregate and pass into a customer 360, or sync over to a CRM, etc. However, the existing form factor of the scalar expression makes this hard to do efficiently. I think you pretty much have to use a dbt incremental model (or some similar approach, sketched below) to avoid running the same prompts again and again, racking up a huge bill in the process!
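A rough sketch of what that might look like in dbt (the model and column names are illustrative):

```sql
-- dbt incremental model: only score rows we haven't scored before,
-- so the same prompt is never paid for twice.
{{ config(materialized='incremental', unique_key='review_id') }}

SELECT
    review_id,
    SNOWFLAKE.CORTEX.SENTIMENT(review_text) AS sentiment
FROM {{ ref('reviews') }}
{% if is_incremental() %}
WHERE review_id NOT IN (SELECT review_id FROM {{ this }})
{% endif %}
```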
Thirdly, there's no convenient way to provide examples as part of a prompt. LLMs are famously "few shot learners", and you can often get huge improvements in performance by providing a few examples for the LLM to follow. You _could_ construct such a prompt by hand here (they're just strings after all), but it's pretty awkward. Additionally, there's no built in way to assess the quality of a given iteration of a prompt, i.e. there's no eval framework. Of course, you have all the tools in Snowflake to build one, but not everyone is going to be comfortable doing that.
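To give a sense of the awkwardness, hand-rolled few shot prompting ends up looking something like this (the model name, table and examples are all illustrative):

```sql
-- Concatenate the examples straight into the prompt string by hand.
SELECT
    ticket_id,
    SNOWFLAKE.CORTEX.COMPLETE(
        'llama2-70b-chat',
        'Classify the support ticket as BUG, BILLING or OTHER.\n' ||
        'Ticket: "I was charged twice this month" -> BILLING\n' ||
        'Ticket: "The export button does nothing" -> BUG\n' ||
        'Ticket: "' || ticket_text || '" ->'
    ) AS category
FROM tickets;
```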
Finally, they're still embedded within SQL queries, so they're not that accessible to many of the people who would want to make their own custom columns based on LLM outputs. Think marketers, salespeople and CX experts: many in these roles are using LLMs day to day, but don't have the tools to scale up and automate their own work. This feature feels tantalisingly close, but it's still not quite there for that.
On that note, a shameless plug. Cotera has a feature that addresses more or less exactly those issues, and which works on any data warehouse. It's a UI that lets anybody define "magic columns" (that's what we call them) using LLMs. Similar to Snowflake, we have some pre-baked workflows. I'm biased, but I think our workflows are a little better thought through. For example, we have a fine tuned model for creating a set of tags and assessing sentiment for those tags independently, which is far more useful and actionable than a plain sentiment score. All of our workflows also have full support for building a set of examples. This is really easy to do: you just run through examples in the UI and correct the output. Whenever you correct something, it gets saved as an example and used as further context in the future. The feature is also built on top of our "events system", which lets you detect changes in any table, so the system will never run the same prompt twice.