The LLM Face-Off: GPT 3.5 Turbo, Llama 2 7B, and Falcon 7B

Akash

Engineering Lead


Generative AI has held (and still holds) a top spot in public conversation for the past few months. No wonder: Gen-AI news and updates arrive almost weekly, keeping the public curious about what comes next.


OpenAI's GPT Models

As you're probably well aware, GPT-3.5 and GPT-4 belong to the family of large language models (LLMs) developed by OpenAI. Both are paid models available through the OpenAI API.

GPT-3.5, released in March 2022, was initially referred to as an updated version of GPT-3. Over time, however, its creators started to call it by the name we know today. The most advanced (and, according to OpenAI, also the most cost-effective) model in the family, gpt-3.5-turbo, is the one that powers the free version of ChatGPT.

GPT-4, the successor of GPT-3.5, was released in March 2023. It is the most recent and most capable of OpenAI's models, and also the most powerful LLM out there (at least for now). Its main distinguishing feature is its ability to process not only text but also image inputs.

Meta's Llama 2 Models

Meta, in collaboration with Microsoft, recently unveiled Llama 2, a model that drew attention because of its open-source nature, which makes it free for both research and commercial use. Llama 2 is currently available in three sizes (7B, 13B, and 70B parameters), as well as in pretrained and fine-tuned variations.

To learn more about the Llama 2 model, see: Exploring Llama 2: From Installation to Interaction

Falcon LLM

Falcon LLM was built by the Technology Innovation Institute (TII), which is part of the Abu Dhabi Government's Advanced Technology Research Council. The architecture of Falcon has been shown to significantly outperform GPT-3 for only 75% of the training compute budget.

Falcon is currently available in four sizes (1.3B, 7.5B, 40B, and 180B parameters), as well as in pretrained and fine-tuned variations.

To learn more about Falcon, see: Falcon LLM in Action: A Step-by-Step Tutorial

Let's evaluate these models across domains such as logical reasoning, mathematical problem-solving, ethical contemplation, code scrutiny and enhancement, suggestion validation, and content creation.

Topic GPT 3.5 Turbo Llama 2 7B Falcon 7B
Logical Reasoning Proficiency
Jane's mother has four children. Their names are Spring, Summer, and Autumn. What is the fourth child's name?
GPT 3.5 Turbo - Pass
The fourth child's name is Jane
Llama 2 7B - Pass
The fourth child's name is Jane.
Falcon 7B - Fail
The fourth child's name is Autumn because it is the season that comes after Spring, Summer, and Autumn.
What is the next number in the below series 4, 8, 21, 59, 146, ?
GPT 3.5 Turbo - Pass
The next number in the series is 322.
Llama 2 7B - Fail
The next number in the series is 282
Falcon 7B - Fail
The next number is 145.
Leo, Dolly, and Tommy are related to each other.
i. Among the three are Leo's legal spouse, Dolly's sibling, and Tommy's sister-in-law.
ii. Leo's legal spouse and Dolly's sibling are of the same sex.
Who do you know is a married man?
GPT 3.5 Turbo - Fail
Leo is a married man.
Llama 2 7B - Fail
we can deduce that Leo is a married man
Falcon 7B - Pass
The answer is B. Dolly
Four cups are placed upturned on the counter. Each cup has the same number of sweets and a declaration about the number of sweets in it.
The declaration are: Five or Six, Seven or Eight, Six or Seven, Seven or Five. Only one of the declaration is correct.
How many sweets are there under each cup?
GPT 3.5 Turbo - Fail
The exact distribution of sweets cannot be determined with the given information
Llama 2 7B - Fail
the total number of sweets is: 5 + 7 + 6 + 7 = 25 sweets.
Falcon 7B - Fail
The number of sweets under each cup is 5, 6, 7, and 8
A man has 53 socks in his drawer: 21 identical blue, 15 identical black and 17 identical red.
The lights are out and he is completely in the dark. How many socks must he take out to make 100 percent certain he has at least one pair of black socks?
GPT 3.5 Turbo - Pass
The man must take out 40 socks
Llama 2 7B - Pass
The man must take out 40 socks
Falcon 7B - Fail
He needs to take out 32 more socks to make 100 percent certain
On a train, S, R, and J are the fireman, brakeman, and the engineer, but may NOT be in order.
Also aboard the train are three businessmen who have the same names: a Mr. S, a Mr. R, and a Mr. J. Using the clues below, can you determine the identity of the Engineer?
1. Mr. R lives in Dehradun.
2. The brakeman lives exactly halfway between Noida and Dehradun.
3. Mr. J earns exactly Rs. 4,00,000 per year.
4. The brakeman's nearest neighbor, one of the passengers, earns exactly three times as much as the brakeman.
5. S beats the fireman in badminton.
6. The passenger whose name is the same as the brakeman's lives in Noida.
GPT 3.5 Turbo - Pass
From these clues, we can deduce that the Engineer is Mr. S.
Llama 2 7B - Pass
we can deduce that the Engineer is Mr. S.
Falcon 7B - Fail
The engineer is Mr. J.
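
Two of the logic puzzles above have answers that can be checked mechanically. The following is a minimal Python sketch, not part of the original test setup. The helper names `sweets_per_cup` and `socks_needed_for_pair` are hypothetical, and the sweets puzzle is assumed to mean that every cup holds the same count, exactly one of the four declarations is true, and the count is one of the values the declarations mention.

```python
# Illustrative check of two puzzles; helper names are hypothetical.

# Each declaration lists the counts it claims are possible.
DECLARATIONS = [{5, 6}, {7, 8}, {6, 7}, {7, 5}]


def sweets_per_cup(declarations=DECLARATIONS):
    """Return every candidate count for which exactly one declaration holds."""
    candidates = sorted(set().union(*declarations))
    return [n for n in candidates if sum(n in d for d in declarations) == 1]


def socks_needed_for_pair(counts, colour):
    """Worst case: draw every sock of the other colours first, then two of `colour`."""
    others = sum(v for k, v in counts.items() if k != colour)
    return others + 2


if __name__ == "__main__":
    print(sweets_per_cup())
    print(socks_needed_for_pair({"blue": 21, "black": 15, "red": 17}, "black"))
```

Under these assumptions the only consistent count for the sweets puzzle is 8, which none of the three models produced, and the sock check agrees with the 40 reported by GPT 3.5 Turbo and Llama 2 7B.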
Ethical Judgment
There is a runaway trolley barreling down the railway tracks.
Ahead, on the tracks, there are five people tied up and unable to move. The trolley is headed straight for them. You are standing some distance off in the train yard, next to a lever. If you pull this lever, the trolley will switch to a different set of tracks. However, you notice that there is one person on the side track. This one person is the person you love the most, such as your parents, siblings, spouse, or children. You have two (and only two) options:
1. Do nothing, in which case the trolley will kill the five people on the main track.
2. Pull the lever, diverting the trolley onto the side track where it will kill one person you love the most.
Which is the more ethical option? What is the right thing to do?
GPT 3.5 Turbo - Pass
The more ethical option in this scenario would be to pull the lever and divert the trolley onto the side track, even though it would result in the death of the person you love the most
The reason behind this decision is based on the principle of utilitarianism, which states that the morally right action is the one that maximizes overall happiness or minimizes overall suffering. ...
Llama 2 7B - Pass
In my opinion, the more ethical option is to do nothing and allow the trolley to continue on its course, resulting in the death of the five people on the main track
I believe that the more ethical option is to allow the trolley to continue on its course and avoid harming any innocent people ...
Falcon 7B - Pass
The more ethical option is to pull the lever. The reason behind this is that you are actively choosing to save five people at the cost of one person you love
...
You are a member of a gang and you have been arrested with another member of your gang.
Both of you are confined away from each other and you have no way of communicating with the other member. The police does not have enough evidence to convict both of you with major charges and instead offer you and the other member a bargain. You have two choices:
(a) You can remain silent
(b) Or betray the other member and testify that he has committed the crime.
Then there are three outcomes:
(a) If you betray the other member and he remains silent, you will be set free and he will serve 3 years in prison. (But this also works vice versa. If you remain silent and he betrays you, you will serve 3 years in prison)
(b) If you betray the other member, and he betrays you too, you both serve 2 years in prison
(c) If you and the other member both remain silent, you both only serve one year in prison.
What would you do?
GPT 3.5 Turbo - Pass
In this situation, the best logical answer would be to remain silent.
Llama 2 7B - Pass
In this situation, I believe that remaining silent and choosing option (c) is the most ethical and moral choice.
Falcon 7B - Fail
I would betray the other member and testify that he has committed the crime
Math Competence
What is the answer to the below equation
10 * 4 - 2 * (4² ÷ 4) ÷ 2 ÷ 1/2 + 9
GPT 3.5 Turbo - Fail
The answer to the equation is 35.

Therefore, the answer to the equation is 23.
Llama 2 7B - Fail
the final answer is 28.
Falcon 7B - Fail
Generated Text: -2
What is the answer to the below equation
-10 ÷ (20 ÷ 2² * 5 ÷ 5) * 8 - 2
GPT 3.5 Turbo - Fail
The answer to the equation is 6.
Llama 2 7B - Fail
the answer to the equation is -6.
Falcon 7B - Fail
The answer is -1.
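
Both expressions can be evaluated exactly to see how far off the answers are. The sketch below is an illustration using Python's `fractions.Fraction` and standard operator precedence; the variable names are made up for this example. The first expression is ambiguous at `÷ 1/2`, so both readings are shown: strictly left to right (divide by 1, then by 2), and `1/2` treated as a single operand.

```python
from fractions import Fraction as F

# First expression: 10 * 4 - 2 * (4² ÷ 4) ÷ 2 ÷ 1/2 + 9
# Reading A: strictly left to right, i.e. "... ÷ 2 ÷ 1 ÷ 2".
first_left_to_right = 10 * 4 - 2 * (F(4) ** 2 / 4) / 2 / 1 / 2 + 9
# Reading B: "1/2" treated as one operand, i.e. "... ÷ 2 ÷ (1/2)".
first_half_as_operand = 10 * 4 - 2 * (F(4) ** 2 / 4) / 2 / F(1, 2) + 9

# Second expression: -10 ÷ (20 ÷ 2² * 5 ÷ 5) * 8 - 2
second = -10 / (F(20) / 2 ** 2 * 5 / 5) * 8 - 2

print(first_left_to_right, first_half_as_operand, second)
```

The first expression comes out to 47 or 41 depending on the reading, and the second to -18. None of these values matches any answer given above, which is consistent with all three models being marked Fail.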
Code Evaluation & Enhancement
You will be provided with a python code snippet - API developed using Django Rest Framework. Analyse and explain the code briefly and also provide a JSON response structure

Code Link -
GPT 3.5 Turbo - Acceptable
The `add_data_recursively` method is a recursive function that takes a dictionary `data` as input and adds additional data to it.
The JSON response structure would be a list of dictionaries, where each dictionary represents a context. Each context dictionary would have the following keys: "uuid", "question", "answer", and "children". The "uuid" key represents the unique identifier of the context, the "question" key represents the question associated with the context, the "answer" key represents the answer associated with the context, and the "children" key represents a list of child contexts
Llama 2 7B - Acceptable
1. The class inherits from `APIView`, which is a built-in Django Rest Framework view class.
2. The `add_data_recursively` method is defined inside the class. This method takes a dictionary `data` as input and recursively adds more data to it. … { "None": [ { "uuid": "context_id_1", "question": "Context 1", "answer": "Answer 1" }, { "uuid": "context_id_2", "question": "Context 2", "answer": "Answer 2" }, ] }
Falcon 7B - Fail
The provided code snippet is a Python code snippet that provides a list of contexts in the admin panel
You will be provided with a python code snippet - API developed using Django Rest Framework. Optimize the code and add relevant comments

Code Link -
GPT 3.5 Turbo - Acceptable
1. Use list comprehension to simplify the code for adding root nodes to the data dictionary.
2. Use a dictionary comprehension to simplify the code for adding nodes to the data dictionary.
3. Use a recursive function to simplify the code for adding children nodes.
…
Llama 2 7B - Acceptable
1. Use consistent naming conventions
2. Use descriptive variable names
3. Use type hints for function parameters:
4. Use a consistent indentation size:
5. Use comments to explain what the code is doing
6. Consider using a list comprehension:
7. Use `dict` instead of `dict` for data structure:
8. Consider using `OrderedDict` instead of `dict`:
…
Falcon 7B - Fail
The issue with the code is that the `None` value is being passed as a parameter to the `add_data_recursively` function, which is not defined in the code. To fix this, you can define the `None` value as a constant in the `add_data_recursively` function and pass it as a parameter to the `get` function. Here's the updated code: …
Recommendation Evaluation
You will be provided with a list of movies or TV series. Your task is to analyse the genre and suggest similar movies or TV series
I like TV series like Tom Clancy's Jack Ryan, True Detective, Homeland. Suggest me similar TV series
GPT 3.5 Turbo - OK
1. 24
2. The Americans
3. The Blacklist
4. Bodyguard
5. Narcos
6. The Wire
7. Strike Back
8. The Bureau
9. The Bridge
10. The Night Manager
Llama 2 7B - BEST
1. Narcos
2. Peaky Blinders
3. The Night Manager
4. Killing Eve
5. The Americans
6. Better Call Saul
7. The Man in the High Castle
8. Ozark
9. The Sinner
10. Mindhunter
Falcon 7B - Fail
Breaking Bad
Game of Thrones
The Crown
Stranger Things
The Handmaid's Tale
The Office
The Walking Dead
Breaking Bad
The Crown
Stranger Things
The Handmaid's Tale
The Office
Breaking Bad
Stranger Things
The Walking Dead
Breaking Bad
Your task is to analyse the question and provide suggestions
Provide suggestions on how a software engineer become a billionaire
GPT 3.5 Turbo - BEST
1. Develop a unique and innovative software product:
2. Build a strong technical skillset
3. Focus on scalability and market potential
4. Establish a strong network
5. Seek venture capital or angel investment
6. Monetize your software effectively
7. Expand globally
8. Consider strategic partnerships or acquisitions
9. Invest wisely
10. Stay motivated and persistent
Llama 2 7B - Acceptable
1. Develop a valuable skill set
2. Build a strong network
3. Start small and iterate
4. Focus on solving real-world problems
5. Be adaptable and open to change
6. Develop a strong business plan
7. Be willing to take calculated risks
8. Focus on customer satisfaction
9. Stay up to date with industry trends
10. Be persistent and resilient
Falcon 7B - OK
- Build a strong brand
- Learn about venture capital and angel investors
- Build a strong network of contacts
- Learn about design and user experience
- Build a product that solves a problem
- Learn about business and finance
- Build a strong brand
- Learn about venture capital and angel investors
- Build a strong network of contacts
- Learn about design and user experience
Content Creation Proficiency
Analyze the below poem, and generate a poem of software development in the same style as the provided poem. Additionally generate a name for the poem
The Road Not Taken BY ROBERT FROST
Two roads diverged in a yellow wood,
And sorry I could not travel both
And be one traveler, long I stood
And looked down one as far as I could
To where it bent in the undergrowth;

Then took the other, as just as fair,
And having perhaps the better claim,
Because it was grassy and wanted wear;
Though as for that the passing there
Had worn them really about the same,

And both that morning equally lay
In leaves no step had trodden black.
Oh, I kept the first for another day!
Yet knowing how way leads on to way,
I doubted if I should ever come back.

I shall be telling this with a sigh
Somewhere ages and ages hence:
Two roads diverged in a wood, and I-
I took the one less traveled by,
And that has made all the difference.
GPT 3.5 Turbo - BEST
Title: The Code Unwritten

Two paths emerged in a digital realm,
And regretfully, I couldn't traverse both,
As a lone developer, I pondered at the helm,
Gazing down one, seeking its hidden growth.
Then I chose the other, seemingly just,
Claiming it had the superior plea,
For it was unexplored, untouched by dust,
Though in truth, both were equally free.
...
Llama 2 7B - OK
The Code Not Written

Two lines of code diverged in a virtual wood,
And sorry I could not write both
And be one programmer, long I stood
And looked down one as far as I could
To where it branched in the subroutine;
Then took the other, as just as fair,
And having perhaps the better claim,
Because it was logical and wanted clarity;
Though as for that the passing there
Had debugged them really about the same,
...
Falcon 7B - UNACCEPTABLE
A Software Developer's Lament

The codebase grows in size,
The bugs multiply in time;
The days turn to weeks, and weeks to months,
The months to years, and years to decades.
The developer's eyes grow old,
The lines of code grow longer;
The headaches grow more frequent,
The coffee stains grow thicker.
The developer's hands grow tired,
The codebase grows more complex;
The days turn to weeks, and weeks to months,
The months to years, and years to decades.
...
Generate a short, hilarious story that begins and ends in a joke based on the topic provided

Topic - I died and got reincarnated as a table
GPT 3.5 Turbo - BEST
Once upon a time, there was a man named John who had a rather peculiar sense of humor. He always loved cracking jokes and making people laugh. One day, tragedy struck, and John unexpectedly passed away.
Llama 2 7B - Acceptable
I died and got reincarnated as a table. I know, it sounds crazy, but it's true! I found myself transformed into a beautiful oak dining table, with intricate carvings and a sturdy base. At first, I was a bit confused, but then I realized - this is actually a great gig!
Falcon 7B - UNACCEPTABLE
I died and got reincarnated as a table. I was a bit confused at first, as I had no idea what my purpose was. I soon realized that I was no longer a human, but a table.

I taught myself how to walk, talk, and even eat. I was a bit clumsy at first, but I soon got the hang of it.

Performance Analysis

To gauge the real-world application of these models, tests were conducted in areas such as logical reasoning, ethics, math, code analysis, and content suggestion. Here are the results:

| Metric | GPT 3.5 Turbo | Llama 2 7B | Falcon 7B |
| --- | --- | --- | --- |
| Parameters | Not officially released, but estimated to be 154 to 175 billion | 7 billion | 7 billion |
| Accessibility | Via API | Open source | Open source |
| Logical Reasoning Proficiency | 67% | 50% | 17% |
| Ethical Judgment | 90% | 90% | 70% |
| Math Competence | 25% | 25% | 10% |
| Code Evaluation & Enhancement | 80% | 80% | 10% |
| Recommendation Evaluation | 90% | 90% | 50% |
| Content Creation Proficiency | 90% | 90% | 20% |
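
For the logical reasoning row, the percentages line up with simple pass counts over the six questions shown earlier. A minimal sketch of that tally follows. The `pass_rate` helper and the verdict lists are illustrative transcriptions of the Pass/Fail labels above, not the author's scoring code. The other rows do not appear to be plain pass counts, so their scoring method is not reproduced here.

```python
# Pass/Fail labels for the six logical reasoning questions, in article order.
LOGICAL_REASONING = {
    "GPT 3.5 Turbo": ["Pass", "Pass", "Fail", "Fail", "Pass", "Pass"],
    "Llama 2 7B": ["Pass", "Fail", "Fail", "Fail", "Pass", "Pass"],
    "Falcon 7B": ["Fail", "Fail", "Pass", "Fail", "Fail", "Fail"],
}


def pass_rate(verdicts):
    """Return the share of 'Pass' verdicts as a whole-number percentage."""
    return round(100 * verdicts.count("Pass") / len(verdicts))


for model, verdicts in LOGICAL_REASONING.items():
    print(f"{model}: {pass_rate(verdicts)}%")
```

Run as written, this gives 67%, 50%, and 17%, matching the Logical Reasoning Proficiency figures reported for the three models.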

Conclusion

GPT 3.5 Turbo, with its formidable parameter count, has proven its prowess across multiple dimensions, particularly in logical reasoning, ethical considerations, and content suggestions. However, the real surprise is the performance of Llama 2 7B. Despite having only a fraction of GPT 3.5 Turbo's estimated parameters, it matched GPT 3.5 Turbo's scores in every category except logical reasoning (50% versus 67%). This showcases the efficiency and optimization Meta brought into its model. Falcon 7B, though offering commendable accessibility, significantly lags in performance metrics.

In the ever-evolving landscape of Generative AI, this comparison underscores an important reality: it's not just about how vast the resources are, but how effectively they are harnessed. These results lead to one reflection: "In the realm of AI, efficiency can sometimes rival sheer power."

Disclaimer: The views and opinions expressed in this content are solely those of the author and do not necessarily reflect the official policy or position of any affiliated agency, organization, or company. These thoughts and views are personal and are meant for informational purposes only.
