Falcon 7B: a first run on Google Colab
First of all, thanks to Google for providing GPUs (expensive hardware) free of cost.
Google Colab, for someone hearing about it for the very first time, is a freemium environment that lets us practice small experiments (especially deep learning) without a local setup.
This article shows how to run your first open-source LLM on the free Colab tier and get a feel for the model with minimum steps.
Falcon 7B is chosen for this purpose because of its permissive licence (Apache 2.0), which allows commercial use as well, so nothing needs to change if you later want to use it in a commercial setting.
!nvidia-smi
If you run the above command, you will know whether a GPU is set for your notebook.
GPUs are specialized hardware that run tasks in parallel; they are what you want when experiments must finish quickly and when models are too big to fit in ordinary memory.
If no GPU is set yet, the above command errors out.
In that case, go to the top of the notebook and click on Runtime.

Once you click on the Runtime menu, there is an option called “Change runtime type”.

Select a GPU as the hardware accelerator and save the settings.
If you rerun nvidia-smi, and if you are lucky, you may get a 16 GB GPU for free.
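If you prefer checking from Python instead of the shell, here is a minimal sketch (assuming PyTorch is preinstalled on your Colab runtime, which it normally is) that confirms the GPU and its total memory:
import torch

# Check that a CUDA GPU is visible and print its name and total memory
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"{props.name}: {props.total_memory / 1024**3:.1f} GB")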
Then run the following commands:
!pip install -q -U trl transformers accelerate git+https://github.com/huggingface/peft.git
!pip install -q datasets bitsandbytes einops wandb
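Before downloading a 7B-parameter model, you can optionally confirm the installs went through. A quick version check of the libraries named in the install commands above (purely a sanity check, not required):
import transformers, accelerate, datasets, einops

print("transformers:", transformers.__version__)
print("accelerate:", accelerate.__version__)
print("datasets:", datasets.__version__)
With the environment ready, the script below loads the tokenizer and builds a text-generation pipeline.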
from transformers import AutoTokenizer, AutoModelForCausalLM
import transformers
import torch
model = "tiiuae/falcon-7b"
tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,  # load weights in bfloat16 to save memory
    trust_remote_code=True,
    device_map="auto",  # place the model on the available GPU automatically
)
sequences = pipeline(
    "Girafatron is obsessed with giraffes, the most glorious animal on the face of this Earth. Giraftron believes all other animals are irrelevant when compared to the glorious majesty of the giraffe.\nDaniel: Hello, Girafatron!\nGirafatron:",
    max_length=200,
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")
The above commands download the model and set everything up for inference. The results are as shown below (highlighted in yellow).

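You can reuse the same pipeline object for further prompts. The sketch below is just an illustration (the prompt text and parameter values are my own choices, not part of the original example); max_new_tokens caps how much new text is generated:
# Reuse the pipeline with a different prompt and generation settings
outputs = pipeline(
    "Explain in one short paragraph why giraffes have such long necks.",
    max_new_tokens=100,
    do_sample=True,
    temperature=0.7,
    eos_token_id=tokenizer.eos_token_id,
)
print(outputs[0]["generated_text"])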
This is how you run a small, manageable model on Colab and get inference from it. Be aware there is a reason a 7B model was chosen instead of 13B, 65B, or 70B models: those are too big to fit into the single GPU Google provides. If you have more GPU memory available you can try them, or try quantized loading as sketched below.
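If you want to squeeze a bigger model into the same GPU (or simply save memory), the bitsandbytes package installed earlier supports quantized loading. A rough sketch, assuming a recent transformers version that provides the BitsAndBytesConfig API; this is optional and not needed for Falcon 7B in bfloat16:
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

# 4-bit quantization config (assumes transformers/bitsandbytes versions that support this API)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model_4bit = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-7b",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)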
Thank you, and happy learning.