I believe AI coding is a big part of the future for data professionals, including those in data engineering, data science, and analytics engineering. This means that adopting AI for development will be critical for career success. Since the Cursor article and video, I’ve been digging into the AI coding space more and using Claude Code as well, because it is a favorite among many developers. This post and the corresponding video focus on Claude Code and the most important concepts to master as you get started.
Many of these foundational ideas are also very useful for Cursor and other AI coding tools. With the right use of the different components we discuss, plus a bit of practice, I believe you and your colleagues will gain a lot of efficiency on certain types of work with these AI coding tools.
The 5 essentials you should understand about Claude Code:
- CLAUDE.md – Your main instruction file. It tells Claude your project’s rules, frameworks, and design principles from the start.
- Skills – Packages of expertise for specific tasks. Claude automatically pulls these in when they’re relevant to what you’re asking.
- Commands – Custom shortcuts you trigger manually (like /my_command) for tasks you do repeatedly.
- Subagents – Separate sessions for specific jobs (like writing tests). Keeps your main conversation focused and uncluttered.
- MCP Servers – Plugins that connect Claude to external tools and data sources, like Databricks SQL or a company’s internal APIs.
The key to succeeding with agents is context, basically the session memory. It’s about providing the right details and managing memory so the agent has the correct information at the right time. And if the memory gets too full, your AI coding session won’t be as effective.
Here are the core components of Claude Code. They each can play a part in managing context, knowledge, and workflow:
| Component | Function | Usage |
| --- | --- | --- |
| CLAUDE.md | Provides the core information and guardrails for a session. It is automatically added to context. | Contains key design principles, required frameworks, and style references at the project or user level. Keep it concise to save memory for specific tasks. Can reference external documentation or code. |
| Skills | A way to package specialized expertise, guidance, or workflows for specific tasks. | Automatically invoked by the agent when its short description (YAML front matter) suggests relevance to the user’s prompt (e.g., a ‘PDF Writer’ skill is invoked when working with PDFs). They provide context only when needed to avoid memory overload. |
| Commands | A way to trigger a predefined prompt for a specific task. | Manually triggered by the user (e.g., /mycommand). You control exactly what happens when you invoke it, but it does not run automatically. |
| Subagents | Stand-alone workers that split up tasks for fresh context or parallel work. | Used to focus on a particular item with isolated memory (e.g., spinning up a subagent just to write unit tests). This keeps the main agent memory clean. Configurable via `/agents`. |
| MCP Servers (Model Context Protocol) | A standardized way to plug in external capabilities and tools to your agent. | Often used to access internal data, authentication, or specialized APIs that would be complex to run via a simple terminal command. MCPs open up the ability to use specialized system calls from anywhere. |
CLAUDE.md
CLAUDE.md is essentially the core information that you want in memory when you start up a new agent session. You can have one of these at your user level for everything you do, or often you’ll see one at a project level that defines some things that are key and really worth having all the time. But you don’t want this to be too big or you’re going to fill up a lot of that memory with this information, and you probably need to save space for the specific things you’re going to work on.
The CLAUDE.md file usually provides instructions and guardrails for your session: key design principles, which tools or frameworks you want to use, and perhaps a few references to examples it should follow so the code it writes fits your style. If your whole project is focused on one area, it might make sense to put everything in your CLAUDE.md. It can reference additional markdown files, extra documentation, or repositories, or you can put some sample code in it. If you put a lot in here, your prompts can be much simpler, but you also have less room to work with in memory before you need to start a fresh session.
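One way to keep an eye on the size trade-off described above is a quick size check on the file. The sketch below is only a rough aid: the four-characters-per-token ratio is a common rule of thumb rather than Claude’s actual tokenizer, and the 2,000-token budget is a hypothetical limit you would pick yourself, not a Claude Code setting.

```python
from pathlib import Path
from typing import Tuple

# Rough rule of thumb (an assumption, not Claude's tokenizer): ~4 characters per token.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Return a rough token estimate for a block of text (rounded up)."""
    return (len(text) + CHARS_PER_TOKEN - 1) // CHARS_PER_TOKEN


def check_instruction_file(path: Path, budget_tokens: int = 2000) -> Tuple[int, bool]:
    """Estimate an instruction file's size and report whether it fits the budget.

    budget_tokens is a hypothetical, team-chosen limit, not a Claude Code setting.
    """
    tokens = estimate_tokens(path.read_text(encoding="utf-8"))
    return tokens, tokens <= budget_tokens


if __name__ == "__main__":
    claude_md = Path("CLAUDE.md")
    if claude_md.exists():
        tokens, ok = check_instruction_file(claude_md)
        status = "within" if ok else "over"
        print(f"CLAUDE.md is roughly {tokens} tokens ({status} budget)")
```

If the estimate keeps growing, that is usually a hint to move task-specific material out of CLAUDE.md and into a skill or a command.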
Agent Skills
Claude Agent Skills are a powerful way to package expertise and workflows for AI agents. A skill bundles specialized knowledge, guidance, or step-by-step workflows that help an agent accomplish specific tasks. Think of it as giving Claude a playbook for particular types of work. Instead of loading all possible instructions into every conversation, skills use a smart activation system:
- YAML Front Matter: Each skill includes a short description that acts as a trigger
- Automatic Invocation: When you start working on a relevant task (like creating a PDF), Claude recognizes the context and automatically loads the appropriate skill
- Dynamic Loading: The skill’s detailed instructions and reference materials are pulled into memory only when needed
The PDF writer skill from Anthropic demonstrates this concept well. Rather than having PDF-creation instructions constantly in context, the skill:
- Waits in the background until you need to work with PDFs
- Activates automatically when PDF work begins
- Brings in relevant documentation and external resources as needed
- Provides step-by-step guidance for creating or modifying PDFs
This approach keeps your context clean while ensuring Claude has expert-level guidance available exactly when needed. Skills can reference external URLs and web resources to stay current and comprehensive, making Claude more intelligent and effective at specialized tasks from the very first attempt.
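To make the load-only-when-needed idea concrete, here is a toy model in Python. It is not how Claude Code is implemented: the `SKILL_MD` text is a hypothetical skill file (not Anthropic’s actual PDF skill), and the keyword overlap in `activate` is a crude stand-in for the model’s own judgment about relevance. What it does show is the shape of the mechanism: only the name and description sit in context up front, and the body is pulled in on a match.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical SKILL.md content: YAML front matter, then detailed instructions.
SKILL_MD = """\
---
name: pdf-writer
description: Create or modify PDF files. Use when the user asks to generate, fill, or edit a PDF.
---

# PDF Writer

1. Pick a PDF library already approved for the project.
2. Build the document, then verify the page count.
"""

# Words too common to count as a relevance signal in this toy matcher.
STOP_WORDS = {"a", "the", "or", "to", "use", "when", "user", "asks"}


@dataclass
class Skill:
    name: str
    description: str
    body: str  # detailed instructions, loaded only on activation


def parse_skill(text: str) -> Skill:
    """Split a SKILL.md string into its front matter fields and its body."""
    _, front_matter, body = text.split("---\n", 2)
    fields = {}
    for line in front_matter.splitlines():
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return Skill(fields["name"], fields["description"], body.strip())


def startup_context(skills: List[Skill]) -> str:
    """Only names and short descriptions are listed at session start."""
    return "\n".join(f"- {s.name}: {s.description}" for s in skills)


def activate(skills: List[Skill], prompt: str) -> List[str]:
    """Return the bodies of skills whose description shares a keyword with the prompt.

    Keyword overlap is an assumption for illustration; the real decision is made by the model.
    """
    prompt_words = {w.strip(".,").lower() for w in prompt.split()} - STOP_WORDS
    loaded = []
    for skill in skills:
        keywords = {w.strip(".,").lower() for w in skill.description.split()} - STOP_WORDS
        if prompt_words & keywords:
            loaded.append(skill.body)
    return loaded
```

In this model, a prompt about refactoring a Spark job loads nothing extra, while a prompt that mentions generating a PDF pulls the full instructions into context.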
Commands
Claude Commands offer a different approach to extending Claude Code’s capabilities. When you run /<command_name>, it executes a predefined prompt with exact instructions you’ve specified. Unlike skills, commands aren’t triggered automatically. You need to explicitly choose to run them. Commands are particularly useful for capturing solutions to recurring challenges. When you encounter a situation where the LLM isn’t responding as you’d hoped and you find a better approach, you can create a command to preserve those exact steps for future use.
Skills, by contrast, are designed to be invoked automatically based on context. In practice, you may still need to hint in your prompt that Claude Code should use a particular skill, which often indicates that the YAML frontmatter description needs to be clearer or more specific. The key difference is this: commands are straightforward instructions you call directly, and everything gets added to context as a prompt. Skills can be equally simple, but when built out properly, they provide a smarter way to load information into your context only when it’s actually needed.
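As a minimal sketch of capturing a prompt as a command, the helper below writes a Markdown file into the project’s `.claude/commands/` folder, where the file name becomes the command name. The `write-tests` name and the prompt text are hypothetical examples; `$ARGUMENTS` is the placeholder that receives whatever you type after the command.

```python
from pathlib import Path

# Hypothetical prompt for a /write-tests command; adjust to your own project.
EXAMPLE_PROMPT = """
Write pytest unit tests for the module at $ARGUMENTS.

- Follow the conventions of the existing tests in tests/.
- Cover the happy path and at least one failure case.
- Run the tests and fix any failures before finishing.
"""


def write_command(project_root: Path, name: str, prompt: str) -> Path:
    """Save a prompt as <project_root>/.claude/commands/<name>.md and return the path."""
    commands_dir = project_root / ".claude" / "commands"
    commands_dir.mkdir(parents=True, exist_ok=True)
    command_file = commands_dir / f"{name}.md"
    command_file.write_text(prompt.strip() + "\n", encoding="utf-8")
    return command_file


if __name__ == "__main__":
    path = write_command(Path("."), "write-tests", EXAMPLE_PROMPT)
    print(f"Created {path}")
```

With that file in place, typing `/write-tests src/transform.py` in a session would send the saved prompt with the path filled in.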
Subagents
Subagents are worth understanding, though I haven’t been using them heavily yet. I’m more likely to run two different Claude Code sessions in separate terminal tabs. But it’s an important concept to be aware of as you work with Claude Code.
Subagents provide a way to split up work, either to enable parallel execution or to apply pre-defined instructions and skills focused on just one type of task. Think of them as a way to have fresh context and fresh memory dedicated to a particular task within your larger project. For example, writing or refactoring a group of unit tests could be a good candidate for a subagent. This kind of task doesn’t require nearly as much knowledge about how you got to this point in the codebase as your main session has accumulated.
You can use /agents to configure what your prebuilt subagents will do. This gives you isolated sessions that handle their specific work without clogging up your main session’s memory with unnecessary context.
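Subagent definitions can also live as files in the project. The sketch below renders one for the unit-test example, assuming the layout I understand Claude Code to use: a Markdown file under `.claude/agents/` with `name`, `description`, and `tools` in the YAML front matter and the system prompt as the body. The `test-writer` agent, its tool list, and its prompt are hypothetical.

```python
from pathlib import Path
from typing import Sequence


def render_subagent(name: str, description: str, tools: Sequence[str], system_prompt: str) -> str:
    """Build the definition text: YAML front matter, then the system prompt.

    Keep the description on one line and free of ': ' so it stays valid plain YAML.
    """
    lines = [
        "---",
        f"name: {name}",
        f"description: {description}",
        f"tools: {', '.join(tools)}",
        "---",
        "",
        system_prompt.strip(),
        "",
    ]
    return "\n".join(lines)


def write_subagent(project_root: Path, name: str, text: str) -> Path:
    """Save the definition as <project_root>/.claude/agents/<name>.md."""
    agents_dir = project_root / ".claude" / "agents"
    agents_dir.mkdir(parents=True, exist_ok=True)
    agent_file = agents_dir / f"{name}.md"
    agent_file.write_text(text, encoding="utf-8")
    return agent_file


# Hypothetical subagent that only writes and runs unit tests.
TEST_WRITER = render_subagent(
    name="test-writer",
    description="Writes and refactors unit tests. Use after code changes that need test coverage.",
    tools=["Read", "Grep", "Glob", "Edit", "Write", "Bash"],
    system_prompt="You write focused pytest tests. Read the target module, add tests, and run them.",
)
```

Limiting the tool list and keeping the prompt narrow is what lets the subagent work from a small, fresh context instead of inheriting everything in the main session.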
Model Context Protocol
MCP (Model Context Protocol) is a standardized way to plug different capabilities into your agent. It’s been around longer than skills and serves a similar purpose of extending what your agent can do. For simple command-line tools you already have enabled, you might not need MCP since your agent can access these directly via bash or terminal commands. But when you need to work with internal data that requires authentication or specialized API calls, wrapping those as an MCP server makes them available across Claude Code, Cursor, and other agents.
On the Databricks side, there are several managed MCP servers you can configure and enable in your workspace. My favorite is the Databricks SQL one, which lets you generate and run SQL against your SQL warehouse through MCP. You could alternatively have your agent write code that calls the API directly, but MCP simplifies this process so the agent doesn’t need to load extensive context to learn how to use these tools.
One important consideration: the tool definitions your MCP servers expose typically add to your context. Multiple MCP servers can fill up context quickly, which is where skills help. However, it’s likely this downside of MCP servers becomes less of an issue over time. For example, Cursor recently shared improvements they are making.
- Databricks Managed MCPs:
  - Genie Space: Wraps a set of tables and logic, converting questions into SQL and summarizing results.
  - Databricks SQL: Generates SQL and runs it against a Databricks SQL warehouse.
  - Vector Search: Query Vector Search indexes to find relevant documents.
  - Unity Catalog Functions: Use Unity Catalog functions to run predefined SQL queries.
- External MCPs:
  - Context7: Fetches up-to-date code examples and documentation (e.g., Databricks documentation).
  - Playwright MCP: Automates web UI checking for web app development.
  - MLflow MCP: Interacts with MLflow capabilities, like programmatically interacting with MLflow traces.
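For a sense of what wiring up one of these servers looks like, here is a small sketch that generates a project-level `.mcp.json`, assuming the `mcpServers` layout that Claude Code reads from the project root. It registers only the Playwright server via `npx`; the server key `playwright` is an arbitrary label, and the managed Databricks servers are left out because their endpoints and authentication are workspace-specific.

```python
import json
from pathlib import Path
from typing import Dict, List


def stdio_server(command: str, args: List[str]) -> Dict[str, object]:
    """Describe an MCP server that the client launches as a local process."""
    return {"command": command, "args": list(args)}


def build_mcp_config(servers: Dict[str, Dict[str, object]]) -> str:
    """Return the JSON text for a project-level MCP configuration."""
    return json.dumps({"mcpServers": servers}, indent=2) + "\n"


def write_mcp_config(project_root: Path, servers: Dict[str, Dict[str, object]]) -> Path:
    """Write the configuration to <project_root>/.mcp.json."""
    config_file = project_root / ".mcp.json"
    config_file.write_text(build_mcp_config(servers), encoding="utf-8")
    return config_file


# "playwright" is just a label; the package name is the published Playwright MCP server.
SERVERS = {"playwright": stdio_server("npx", ["@playwright/mcp@latest"])}

if __name__ == "__main__":
    print(build_mcp_config(SERVERS))
```

Because each registered server contributes tool definitions to the session, it is worth enabling only the ones a project actually uses.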
Claude Code with Databricks
When integrating your AI agent’s code with Databricks, the most popular options are:
- Databricks Connect: Generates notebooks and scripts that use Databricks Compute for Spark work but run other Python/Scala commands locally. Good for work “baked into the code itself.”
- Databricks CLI (Command Line Interface): A wrapper for running ad-hoc commands or deploying/running jobs and pipelines from a Databricks bundle. Recommended as a comprehensive wrapper for capabilities.
- Databricks SDK or REST API: Lower-level options that your agent can handle. Agents may initially default to older SDK versions or have trouble with connectivity, but these issues can be worked through.
The question becomes: Can you provide enough guidance for the agent to use Databricks Connect, the CLI, or the SDK directly, or do you need a layer on top, like an MCP server, to simplify the process? The managed MCP servers listed above can definitely help, and adding skills seems to make a big impact. Adding context to CLAUDE.md or through specific custom commands is also a valid option.
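As one example of the kind of guidance that can help with the CLI route, the sketch below wraps the bundle workflow (`validate`, `deploy`, `run`) in a small helper an agent could call or copy. The target name `dev` and the job key `my_job` are hypothetical and would come from your own bundle configuration; the helper defaults to a dry run that only prints the commands.

```python
import shlex
import subprocess
from typing import List, Optional

BUNDLE_ACTIONS = {"validate", "deploy", "run"}


def bundle_command(action: str, target: str, resource_key: Optional[str] = None) -> List[str]:
    """Build the argv for a `databricks bundle` subcommand against one target."""
    if action not in BUNDLE_ACTIONS:
        raise ValueError(f"unsupported bundle action: {action}")
    argv = ["databricks", "bundle", action, "--target", target]
    if action == "run":
        if not resource_key:
            raise ValueError("`run` needs the key of a job or pipeline in the bundle")
        argv.append(resource_key)
    return argv


def deploy_and_run(target: str, job_key: str, dry_run: bool = True) -> List[List[str]]:
    """Validate, deploy, then run a job; with dry_run=True only print the commands."""
    steps = [
        bundle_command("validate", target),
        bundle_command("deploy", target),
        bundle_command("run", target, job_key),
    ]
    for argv in steps:
        print("$", shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)  # stop at the first failing step
    return steps


if __name__ == "__main__":
    # Hypothetical target and job key; replace with values from your databricks.yml.
    deploy_and_run("dev", "my_job")
```

Pointing the agent at a helper like this from CLAUDE.md or a skill keeps it on a known-good path instead of guessing at SDK versions or connection settings.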
