SQL JOINs allow retrieving data from several tables according to some specific conditions, with a single query. This article explores the usage of ChatGPT as a helper for writing SQL JOINs, providing an illustrated step-by-step guideline.
The database we’ll use for demonstration purposes will be AdventureWorks2019, a well-known SQL Server sample database. To validate and execute ChatGPT queries, we’ll apply dbForge Studio for SQL Server (a multi-featured and more powerful alternative to SSMS).
Whether you are a newcomer or an experienced professional in the area of database development, this article will help you master ChatGPT and utilize it in writing SQL JOINs. Let us start.
Similarities and Differences Between GPT-3 and GPT-4
The latest version of ChatGPT – ChatGPT-4 – was introduced to the public on March 14, 2023, though the previous version remained available. The purpose of both models remained the same, which is to generate human-like responses based on an input text. However, ChatGPT-4 came with significant improvements.
- Higher complexity: The GPT-4 version is larger and more complex. Therefore, its learning and text-generating capacities are higher. In practice, it means improved performance and better quality of the output text (summarization, translation, and so on).
- Better training: GPT-4 was trained on then-available data up until September 2021, while GPT-3, being naturally older, was trained up until June 2020. Thus ChatGPT-4 operates with more relevant information and provides more up-to-date responses.
- Improved fine-tuning: GPT-4 gets adapted to specific tasks or fields more easily. As a result, this model is more flexible when applied in various applications.
- Overall performance: GPT-4 delivers more accurate and informative results in comparison with GPT-3. The broader range of information available to it and higher complexity also lead to better context understanding, especially in longer conversations with human users.
- Better comprehension: The capability of generating human-like text in GPT-4 is superior to that of GPT-3. The latest model demonstrates more profound comprehension abilities.
Advantages of Using ChatGPT for SQL Database Developers
SQL developers already employ ChatGPT widely in their work. Its capabilities allow them to simplify a lot of code development tasks, such as:
- SQL query writing: With the help of ChatGPT, developers can quickly write standard SQL queries for data retrieval and manipulation tasks.
- SQL code debugging: Troubleshooting SQL code can become easier with ChatGPT’s ability to identify and fix syntax errors.
- SQL syntax recommendations: ChatGPT can provide important information related to the SQL query syntax so that developers can write accurate SQL code faster.
- Query optimization tips: ChatGPT can give recommendations on how to optimize various SQL queries to achieve better performance.
- Reports: Based on provided SQL data (sales, customers, inventory information, etc.), ChatGPT-4 can generate quality reports helping identify important trends in data.
Overall, SQL developers can benefit from using ChatGPT-4 significantly. It can’t replace everything, but it can provide support by quickly answering various questions and helping conduct SQL-related tasks. As a result, developers can become more productive and successful.
Now, let us have a look at practical cases where ChatGPT aids with writing complex SQL queries like JOINs.
Prerequisites
We are going to demonstrate the work of ChatGPT based on a few practical scenarios. To replicate them, you will need the following prerequisites:
- ChatGPT Account. Interaction with ChatGPT is personalized, so you need to register an account (if you don’t have one already). Besides, we’ll use both ChatGPT-3, which is free of charge, and ChatGPT-4, which is a paid service.
- AdventureWorks2019 database. We will use this version of Microsoft’s standard sample database to demonstrate how ChatGPT writes SQL JOIN queries (a newer version, AdventureWorks2022, is not yet covered by ChatGPT).
- dbForge Studio for SQL Server. This solution will serve us to test and verify the accuracy and effectiveness of the SQL JOIN queries created by ChatGPT.
When these prerequisites are in place, we can proceed to the next stage.
Types and Practical Examples of SQL JOIN Queries
Here’s how we’ll work on writing SQL queries with JOINs using ChatGPT:
- First, we will provide ChatGPT with the information about the desired outcome for each query. It requires direct interaction with this language model: the user specifies the tables and columns that should be involved in the JOIN query and the required result.
- Next, we will copy each SQL JOIN query generated by ChatGPT.
- Finally, we will execute each query made by ChatGPT against the actual AdventureWorks2019 database. This way, we’ll verify whether it works correctly.
In our practical examination, we are going to explore all types of SQL JOINs.
INNER JOIN
The simplest and most widespread JOIN is INNER JOIN – it returns rows from both specified tables if they match the join conditions explicitly. Any rows that don’t match our conditions will be ignored.
Assume we want to list all customers and their addresses from the database, and also add the orders made by each customer. It requires querying several tables using the INNER JOIN.
First, we utilize ChatGPT-3.
Let’s execute this query to check if it returns the required results.
As we can see, the query is valid, and the output matches our criteria.
Now, let’s use ChatGPT-4 for the same task and compare the outputs.
Here is what ChatGPT-4 has created:
Will it all work if we execute the query against the actual AdventureWorks2019 database?
Excellent. The query generated by ChatGPT according to our requirements is 100% valid, and the results are legit. However, do both these GPT versions provide the same results?
As we can see, ChatGPT-4 created a SELECT statement with an extra table involved. Besides, the results are presented in ascending order. This way, the query looks more informative and easier for analyzing and interpreting the results.
LEFT JOIN
A LEFT JOIN returns all rows from the left table and a selection of rows from the right table that match the conditions. If no rows match the conditions, the output will present NULL values for the right table.
Below we ask the AI to compose a query aimed to list all customers and all their orders.
First, we have the query from ChatGPT-3.
Next, we switch to ChatGPT-4 and set the same task.
And we have the below query as the result:
It is the Studio’s turn to enter as we need to validate the query.
Both queries are accurate, and they both deliver the expected results, but are they the same in syntax?
The primary difference is the JOIN type. In ChatGPT-3, we have a LEFT JOIN, but ChatGPT-4 applied an INNER JOIN instead.
Therefore, ChatGPT-3 joined the above-mentioned tables and returned the list of all customers from the Sales.Customer table – the left table, despite the fact that not all of them matched the records in the right Person.Person table. As for ChatGPT-4, it used the INNER JOIN for the tables in question to present the list of only those customers with the matching records in both Sales.Customers and Person.Person tables. Besides, ChatGPT-4 included the ORDER BY clause.
RIGHT JOIN
The RIGHT JOIN is similar to the LEFT JOIN, but it delivers results as mirrored, fetching all rows from the right table and a selection of rows from the left table (those rows that meet the conditions). If no match is detected, the query will return NULL values for the left table records.
Let us check how ChatGPT works with LEFT JOINs. Assume we need to query the database to list all vendors and correlate them with orders made from each vendor.
Here we have the query that uses the RIGHT JOIN option. To test its validity, we’ll execute it against the AdventureWorks2019 database.
Now, let’s see how ChatGPT-4 copes with the same task.
We have the following query generated by ChatGPT-4:
Let’s check how it works.
Both queries use a RIGHT JOIN for the Purchasing.PurchaseOrderHeader and Purchasing.Vendor tables, and they both retrieve records from the VendorName and OrderDate columns. However, we can also observe the difference: ChatGPT-4 also retrieves records from the VendorID and PurchaseOrderID columns, while ChatGPT-3 does not address those columns. Again, we see the ORDER BY clause included in the query syntax by ChatGPT-4.
FULL OUTER JOIN
This JOIN type returns all rows from all joined tables, both matching and non-matching ones. For non-matching rows in a joined table, the query returns NULLs.
Let us check how ChatGPT operates FULL OUTER JOINs.
ChatGPT has generated the below query, and we’ll check it against the AdventureWorks2019 database.
CROSS JOIN
A CROSS JOIN (a.k.a. Cartesian JOIN) matches each row in one table to each row in the second table. That’s how we get all possible combinations of rows.
To check how ChatGPT works with CROSS JOINs, we’ll ask it to present the list with all combos of product categories and subcategories.
ChatGPT-3 offers the following query:
Let us validate it using the Studio:
How about ChatGPT-4? Will it cope with the task better?
ChatGPT-4 generated the below query. Now, we need to test if it is correct.
Again, we notice the difference in queries presented by ChatGPT-3 and ChatGPT-4: the ORDER BY clause is added by ChatGPT-4. As for ChatGPT-3, it provides more concise results.
SELF JOIN
A SELF JOIN in SQL joins a table with itself. As a rule, this option is used to compare some data portions within a single table.
In our test case, we need to fetch the information about the categories of two employees from the HumanResources.Employee table.
Here is the query from ChatGPT-3:
See the output of this query:
Now, let us ask ChatGPT-4 for help:
Have a look at the query presented by ChatGPT-4:
Again, we need to validate the query:
The query is correct.
In general, we can see that both ChatGPT-3 and ChatGPT-4 create valid and effective SQL queries to retrieve data from SQL Server databases. However, ChatGPT-4 aims to provide the most informative results – it often queries more columns and fetches related data, and it sorts the results using the ORDER BY clause to ensure better readability. Thus, the data is easier to interpret.
Can ChatGPT learn?
ChatGPT was created to learn and improve using a vast amount of available text data. In addition, ChatGPT interacts with human users and receives their feedback continuously, and that information provides more context. Further, ChatGPT can use the information obtained from user requests to improve its responses. It learns all the time and becomes more sophisticated and relevant in its work.
The current article explored the usage of SQL JOINs and illustrated their performance with the help of AdventureWorks2019 – the standard sample database in SQL Server. What about your own projects? Let us refer to one more scenario.
Assume we have a database with information about Olympic athletes, including the medals they won in the Olympic Games, their types of sports, and so on. We want to retrieve some data portion from that database, specifically, the list of athletes and some related details about each of them. For that, we’ll provide our AI with a database schema and set the task to write a query for the database on that schema only.
The below script represents the schema. As you can see, it does not include any actual data.
Now, we ask ChatGPT-4 to write a query to present the information according to our requirements.
Let us now validate this query in our database:
Yes, this query is valid, even though we executed it against a custom database, and not some standard test database. Therefore, we can see that ChatGPT can generate queries for custom scenarios provided there is an appropriate context.
Conclusion
Using AI in SQL coding is a relatively new area, but it opens excellent possibilities. Actually, you only need to explain to the AI the precise results you want to obtain and from which tables and columns.
The right wording means much, so, we may expect that interacting with AI will become a separate required skill in the future. And, finally, by applying ChatGPT, we can make complex database development and analysis tasks much more accessible.