Understand Machine Learning and Deep Learning with Decision-Making!
Episode-1
Preface: Hi, my dear readers. Nowadays, in the hustle and bustle of life, we must not forget to skill up, whether to pursue a passion or to keep pace in the IT industry. People from all walks of life are now directly or indirectly connected to technology because of the influential power of social media.
Data is everywhere, and using data efficiently is very important; the software industry now knows the potential of consuming data. In continuation of this discussion, the IT industry has introduced a buzzword: "Data Science".
So, let's dig deeper and learn what Data Science is, and how we can grasp this knowledge to gain insight into the domain of Artificial Intelligence. Though AI started evolving in the late 1950s, it is now in the spotlight for its impact on IT automation and decision-making.
To keep this explanation simple and effective for learning Machine Learning, I will jot down some important points for you. This discussion will continue in episodes, to keep our relationship going, as I do in my other blog posts.
What is Machine Learning, and how can we learn this skill set? Let's break this technical term into smaller pieces to learn more about it.
- Supervised Learning
- Unsupervised learning
- Reinforcement learning
- Learning Models
- Training & Training Models
Supervised Learning:
Let's start with the basic idea of "Machine Learning": educating a piece of software with a set of labeled data, which is like providing a student with both the questions and the answer key to prepare for an exam.
Later, when you ask questions whose answers were already provided to the system, the system predicts the right answers based on sequences of words and some important patterns, which involve:
- Tokenization
- Regression
- Grounding
- Feedback
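To make the question-and-answer-key idea concrete, here is a tiny Python sketch of my own (an illustrative toy, not a real training algorithm): the labeled data acts as the answer key, and "prediction" simply means finding the stored question most similar to the one you ask.

```python
# Toy "supervised learning": labeled data = questions plus answer keys.
# This word-overlap lookup is only an illustration of the idea.
labeled_data = {
    "what is 2 + 2": "4",
    "capital of france": "Paris",
}

def predict(question):
    # Pick the stored question sharing the most words with the input,
    # then return its labeled answer.
    def overlap(stored):
        return len(set(stored.split()) & set(question.split()))
    best_match = max(labeled_data, key=overlap)
    return labeled_data[best_match]

print(predict("capital of france"))  # Paris
```

A real supervised model generalizes to questions it has never seen; this sketch only shows the role the labels play.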
Tokenization:
Tokenization is the process of breaking sentences down into words, and words into smaller sub-word pieces, so that these words, their meanings, and their context can be correlated through the process of embedding. Embedding, in turn, is the process of assigning numbers to words so that vector databases can hold the data for semantic search.
This may be difficult to follow at first, so let's break all the difficult terms above into understandable pieces. Let's do it 😊
Step 1: What is Tokenization?
Tokenization is the process of breaking text into smaller units (tokens) that a Large Language Model (LLM) can understand.
- Think of it like chopping a sentence into Lego blocks.
- Each block (token) could be a word, part of a word, or even punctuation.
Example:
Sentence: “AI secures the cloud.”
Tokens might look like: ["AI", "secures", "the", "cloud", "."]
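As a quick sketch, a naive word-level tokenizer can be written in a few lines of Python. Real LLM tokenizers (such as BPE) split text into learned sub-word pieces, so treat this as an approximation of the idea:

```python
import re

def tokenize(text):
    # Naive tokenizer: grab runs of word characters, and keep each
    # punctuation mark as its own separate token.
    return re.findall(r"\w+|[^\w\s]", text)

print(tokenize("AI secures the cloud."))
# ['AI', 'secures', 'the', 'cloud', '.']
```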
Step 2: Turning Tokens into Numbers
LLMs don't understand words directly; they understand numbers. Each token is mapped to a unique ID number using a vocabulary.
Example (the IDs below are illustrative):
- "AI" → 1023
- "secures" → 5678
- "the" → 300
- "cloud" → 9012
- "." → 2
So the sentence becomes: [1023, 5678, 300, 9012, 2]
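The token-to-ID step is just a dictionary lookup. Here is a minimal Python sketch; the vocabulary and its ID values are made up for illustration, whereas real vocabularies are learned and hold tens of thousands of entries:

```python
# Toy vocabulary mapping tokens to IDs (values invented for this example).
vocab = {"AI": 1023, "secures": 5678, "the": 300, "cloud": 9012, ".": 2}

tokens = ["AI", "secures", "the", "cloud", "."]
ids = [vocab[t] for t in tokens]
print(ids)  # [1023, 5678, 300, 9012, 2]
```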
Step 3: Embeddings (Vector Representation)
Now, instead of just IDs, we give each token a vector (a list of numbers) that captures its meaning.
Example:
- "cloud" might be represented as a vector like: [0.12, -0.45, 0.88, …]
- This allows the model to understand semantic relationships (e.g., “cloud” is closer to “Azure” or “AWS” than to “banana”).
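Closeness between vectors is usually measured with cosine similarity. The Python sketch below uses hand-made 3-dimensional vectors (real embeddings have hundreds or thousands of dimensions) to show that "cloud" lands closer to "Azure" than to "banana":

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: closer to 1.0 means
    # the vectors point in nearly the same direction.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hand-made toy embeddings, not the output of a real model.
embeddings = {
    "cloud":  [0.9, 0.8, 0.1],
    "Azure":  [0.85, 0.75, 0.15],
    "banana": [0.1, 0.05, 0.9],
}

sim_azure = cosine_similarity(embeddings["cloud"], embeddings["Azure"])
sim_banana = cosine_similarity(embeddings["cloud"], embeddings["banana"])
print(sim_azure > sim_banana)  # True: "cloud" is nearer to "Azure"
```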
Step 4: Storing in a Vector Database
A vector database (like Pinecone, Weaviate, or Milvus) stores these embeddings so they can be searched efficiently. Instead of searching exact words, you search by meaning.
Example:
- If you ask, "How do I secure AWS?", the database finds vectors close to "secure" + "AWS" and retrieves relevant documents, even if they don't use the exact same words.
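A vector database essentially runs this same similarity search at scale. Here is a minimal in-memory sketch (the document titles and all vectors are invented for illustration): embed the query, then return the document whose stored vector is closest.

```python
import math

def cosine_similarity(a, b):
    # Same similarity measure a vector database uses under the hood.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Tiny in-memory "vector store" with hand-made document vectors.
docs = {
    "Securing AWS IAM roles": [0.9, 0.7, 0.1],
    "Baking banana bread":    [0.1, 0.1, 0.9],
}

# Pretend this is the embedding of the query "How do I secure AWS?".
query_vec = [0.85, 0.75, 0.05]

best = max(docs, key=lambda title: cosine_similarity(query_vec, docs[title]))
print(best)  # Securing AWS IAM roles
```

Products like Pinecone, Weaviate, and Milvus do this over millions of vectors with approximate nearest-neighbour indexes rather than a brute-force scan.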
Summary of the concepts above
Imagine a library:
- Tokenization = cutting books into sentences and words.
- IDs = giving each word a catalog number.
- Embeddings = describing each word with a “meaning fingerprint.”
- Vector database = shelves organized by meaning, so when you ask for “cloud security,” the librarian instantly finds books about Azure, AWS, SIEM—even if the word “cloud” isn’t explicitly written.
In short: tokenization breaks text into pieces, embeddings give those pieces meaning, and vector databases store them for smart retrieval.
Now our next step is to understand "Regression". Stay tuned for my next write-up... Happy Learning 😀

