Understand Machine Learning and Deep Learning with Decision-Making!

Episode-1


Preface: Hi, my dear readers. Nowadays, in the hustle and bustle of life, we must not forget to skill up, whether out of passion or to keep pace in the IT industry. People from different walks of life are now directly or indirectly connected to technology because of the influential power of social media.

Data is everywhere, and using it efficiently matters; the software industry now knows the potential of consuming data. So, in continuation of this discussion, the IT industry has introduced a buzzword: “Data Science”.

So, let’s dig deeper into what Data Science is, and how we can grasp this knowledge to gain insight into the domain of Artificial Intelligence. Though AI has been evolving since the late 1950s, it is now in the spotlight for its impact on IT automation and decision-making.

To keep this explanation simple and make “Machine Learning” easy to learn, I will jot down some important points for you. This discussion will continue in episodes, to keep our relationship going, as I do in my other blog posts.

What is Machine Learning, and how can we learn this skill set? Let’s break this technical term into smaller pieces to learn more about it.

Supervised Learning:

Let’s start with the basics of “Machine Learning”: educating a specific piece of software with a set of labeled data, which is like giving a student both the questions and the answer key to prepare for an exam.

Later, when you ask questions, the system predicts the right answers from the answers it was trained on, based on sequences of words and the important patterns among them. The first step behind this is:

Tokenization:

Tokenization is the process of breaking sentences down into words (and words into smaller units) so that their meanings, context, and correlations can be captured through a process called embedding. Embedding, in turn, is the process of assigning numbers to words so that vector databases can hold the data for semantic search.

That may be a lot to take in, so let’s break all the difficult words (terms) above into understandable pieces. Let’s do it 😊



Step 1: What is Tokenization?

Tokenization is the process of breaking text into smaller units (tokens) that a Large Language Model (LLM) can understand.

  • Think of it like chopping a sentence into Lego blocks.
  • Each block (token) could be a word, part of a word, or even punctuation.

Example: 

Sentence: “AI secures the cloud.” 

Tokens might look like: ["AI", "secures", "the", "cloud", "."]
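As a minimal sketch, the example above can be reproduced with a toy word-level tokenizer built on Python’s `re` module (real LLMs use subword tokenizers such as BPE, but the idea is the same):

```python
import re

def tokenize(text):
    """Split text into word and punctuation tokens (a toy word-level tokenizer)."""
    return re.findall(r"\w+|[^\w\s]", text)

print(tokenize("AI secures the cloud."))
# ['AI', 'secures', 'the', 'cloud', '.']
```

The regex grabs either a run of word characters or a single punctuation mark, so the period becomes its own token, just like in the example.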

Step 2: Turning Tokens into Numbers

LLMs don’t understand words directly; they understand numbers. Each token is mapped to a unique ID number using a vocabulary.

Example:

  • "AI" → 1023
  • "secures" → 5678
  • "the" → 300
  • "cloud" → 9012
  • "." → 2

So the sentence becomes: [1023, 5678, 300, 9012, 2]
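This mapping is just a dictionary lookup. Here is a sketch using the toy IDs from the example above (real vocabularies contain tens of thousands of entries, and a reserved ID for unknown tokens):

```python
# Toy vocabulary mirroring the example IDs above
vocab = {"AI": 1023, "secures": 5678, "the": 300, "cloud": 9012, ".": 2}

def encode(tokens, vocab, unk_id=0):
    """Map each token to its vocabulary ID; unknown tokens get a reserved unk_id."""
    return [vocab.get(tok, unk_id) for tok in tokens]

print(encode(["AI", "secures", "the", "cloud", "."], vocab))
# [1023, 5678, 300, 9012, 2]
```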

Step 3: Embeddings (Vector Representation)

Now, instead of just IDs, we give each token a vector (a list of numbers) that captures its meaning.

Example:

  • "cloud" might be represented as a vector like: [0.12, -0.45, 0.88, …]
  • This allows the model to understand semantic relationships (e.g., “cloud” is closer to “Azure” or “AWS” than to “banana”).
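“Closeness” between embeddings is usually measured with cosine similarity. Below is a sketch with made-up 3-dimensional vectors (real embeddings have hundreds of dimensions, and the numbers here are invented purely for illustration):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1 means similar meaning."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy embeddings (made up for illustration)
emb = {
    "cloud":  [0.9, 0.8, 0.1],
    "azure":  [0.85, 0.75, 0.2],
    "banana": [0.1, 0.05, 0.9],
}

print(cosine_similarity(emb["cloud"], emb["azure"]))   # high: close in meaning
print(cosine_similarity(emb["cloud"], emb["banana"]))  # low: unrelated
```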

Step 4: Storing in a Vector Database

A vector database (like Pinecone, Weaviate, or Milvus) stores these embeddings so they can be searched efficiently. Instead of searching exact words, you search by meaning.

Example:

  • If you ask, “How do I secure AWS?”, the database finds vectors close to “secure” + “AWS” and retrieves relevant documents, even if they don’t use the exact same words.
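At its core, this search is just “find the stored vectors closest to the query vector.” A brute-force sketch is below; the document titles and all vectors are invented for illustration, and a real vector database (Pinecone, Weaviate, Milvus) adds indexing so this scales to millions of documents:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Toy document store: each title paired with a made-up embedding vector
docs = [
    ("Hardening IAM roles on AWS",      [0.9, 0.7, 0.1]),
    ("Baking the perfect banana bread", [0.1, 0.1, 0.9]),
    ("Azure network security groups",   [0.7, 0.5, 0.3]),
]

def search(query_vec, docs, top_k=2):
    """Return the top_k documents whose embeddings are closest to the query."""
    ranked = sorted(docs, key=lambda d: cosine(query_vec, d[1]), reverse=True)
    return [title for title, _ in ranked[:top_k]]

query = [0.85, 0.65, 0.15]  # pretend this is the embedding of "How do I secure AWS?"
print(search(query, docs))
```

Notice that the top result never mentions the word “secure”; it is retrieved because its vector lies near the query’s vector, which is exactly what searching by meaning looks like.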

Summary of the whole above concept

Imagine a library:

  • Tokenization = cutting books into sentences and words.
  • IDs = giving each word a catalog number.
  • Embeddings = describing each word with a “meaning fingerprint.”
  • Vector database = shelves organized by meaning, so when you ask for “cloud security,” the librarian instantly finds books about Azure, AWS, SIEM—even if the word “cloud” isn’t explicitly written.

In short: Tokenization breaks text into pieces, embeddings give those pieces meaning, and vector databases store them for smart retrieval.

Now our next step is to understand “Regression”. Stay tuned for my next write-up... Happy learning 😀



