My journey to Data Science (part 4.2)

A database is any collection of related information. A database management system (DBMS) is software that helps users create and maintain a database.

Two types of database

  1. Relational database (SQL): Organizes data into one or more tables. Each table has columns and rows, and a unique key identifies each row. A relational database stores and provides access to data points that are related to one another, e.g., MySQL, Oracle, PostgreSQL, MariaDB.
  2. Non-relational database (NoSQL): Organizes data in anything other than a traditional table: key-value pairs, documents, graphs, or flexible tables, e.g., MongoDB, DynamoDB, Apache Cassandra.
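To make the difference between the two types concrete, here is a minimal sketch in Python. It assumes the built-in sqlite3 module as a stand-in for a relational database and a plain dictionary as a stand-in for a document store; the `users` table and the sample names are made up for illustration.

```python
import sqlite3

# Relational: data lives in a table, and a unique key (id) identifies each row.
conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO users (id, name) VALUES (?, ?)",
    [(1, "Asha"), (2, "Ben")],
)
row = conn.execute("SELECT name FROM users WHERE id = ?", (2,)).fetchone()
relational_name = row[0]
conn.close()

# Non-relational (document style): each record is a free-form document
# looked up by key, and documents need not share the same fields.
document_store = {
    "user:1": {"name": "Asha", "tags": ["admin"]},
    "user:2": {"name": "Ben"},
}
document_name = document_store["user:2"]["name"]

print(relational_name, document_name)
```

A real NoSQL system such as MongoDB has its own client library and query language; the dictionary only illustrates the idea that records are not forced into fixed columns.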


Transaction control language to manage transactions in a database using DML(…

My journey to Data Science (part 5) Regression Algorithms (part 1)

Machine learning is about creating an algorithm with which the computer finds a model that fits the data as well as possible and predicts accurately.

Types of machine learning

  1. Supervised machine learning: The model is trained on labeled data. Each data point is labeled with the target, meaning that for a particular input, this will be the output. If the model produces any other output, it can be optimized until it gives the specified output for that input.
  2. Unsupervised learning: No data labeling is done. The model finds hidden patterns, relations, and…
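As a small illustration of the supervised case, the sketch below fits a straight line to labeled pairs using ordinary least squares. The function name `fit_line` and the sample data (hours studied and scores) are hypothetical, chosen only to show how labeled inputs and outputs let the model learn a rule it can apply to a new input.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (the 1/n factors cancel out).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Labeled data: each input (hours) comes with its known output (score).
hours = [1.0, 2.0, 3.0, 4.0]
scores = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(hours, scores)
prediction = slope * 5.0 + intercept  # predict the score for an unseen input
print(slope, intercept, prediction)
```

Libraries such as scikit-learn wrap this kind of fitting behind a common interface; the hand-written version is only meant to show what "learning from labels" amounts to for the simplest regression model.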

My journey to Data Science (part 4.1)

Object-oriented programming (OOP) is a programming paradigm built around objects rather than only functions. Objects contain data, called attributes, and behaviors, called methods.

Class and Object

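A class is a code template or blueprint for creating objects. The class provides attributes and methods to the objects created from it. A class is a logical entity: defining it does not allocate memory for any instance data (although in Python the class itself is also an object that exists at run time).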

An object is a run-time instance created from a class during execution. Objects are often used to model real-world entities. An object consumes memory when it is created.

Attribute: Class attributes are variables of a class that are shared between all instances, while instance attributes belong to a single object.
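The ideas of class, object, and attribute can be sketched in a few lines of Python. The `Dog` class, its `species` attribute, and the names used here are hypothetical examples, not taken from the original post.

```python
class Dog:
    # Class attribute: defined once on the class and shared by all instances.
    species = "Canis familiaris"

    def __init__(self, name):
        # Instance attribute: stored separately on each object.
        self.name = name

    def speak(self):
        # Method: behavior that works with the object's own data.
        return f"{self.name} says woof"


# Two objects (instances) created from the same class (blueprint).
rex = Dog("Rex")
fido = Dog("Fido")

print(rex.species, fido.species)  # same shared value
print(rex.speak())
print(fido.speak())
```

Each call to `Dog(...)` creates a new object with its own `name`, while `species` is looked up on the class itself.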

My journey to Data Science (part 4)

Python is a high-level, general-purpose programming language. Python is used for web development, AI, machine learning, operating systems, mobile application development, and video games.

For the most part, Python is an interpreted language and not a compiled one, although compilation is a step. Python code, written in a .py file, is first compiled to bytecode, which can be cached in a .pyc file (versions before Python 3.5 also used .pyo for optimized bytecode).

The compiler converts the high-level source code to bytecode, an intermediate form that is read by the Python virtual machine rather than directly by the hardware.


  1. Source code: The Python code
  2. Compilation: The source code is converted to bytecode
  3. Bytecode: Intermediate or low-level code
  4. Virtual Machine: Here, the code gets executed with…
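The steps above can be observed from inside Python itself. This is a minimal sketch using the built-in `compile` and `exec` functions and the standard `dis` module; the one-line source string and the `<example>` filename are arbitrary choices for illustration.

```python
import dis

# 1. Source code: plain Python text.
source = "total = 2 + 3"

# 2. Compilation: turn the source into a code object holding bytecode.
code_obj = compile(source, "<example>", "exec")

# 3. Bytecode: the raw bytes the virtual machine will interpret.
raw_bytecode = code_obj.co_code

# 4. Virtual machine: execute the code object in a fresh namespace.
namespace = {}
exec(code_obj, namespace)
print(namespace["total"])

# Human-readable listing of the bytecode instructions.
dis.dis(code_obj)
```

The exact instructions printed by `dis.dis` differ between Python versions, so treat the listing as something to explore rather than a fixed output.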

My journey to Data Science (part 3.1)

Samples are drawn because it would take a lot of time and money to collect data on the entire population. Before we can analyze the sample data and draw inferences from it, statistical tests are performed to check whether the sample represents the population under study.

Different Types Of Tests And Their Purpose

Correlation Tests

Correlation tests check whether variables are related, and a correlation coefficient measures how strongly they are related.

Pearson’s Correlation Coefficient

It measures the direction and the magnitude of the linear relationship between two variables. …
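For reference, Pearson's coefficient can be computed by hand as the sum of the products of deviations divided by the square root of the product of the summed squared deviations. The sketch below is a plain-Python version; the function name `pearson_r` and the sample lists are made up for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: how x and y vary together around their means.
    co_dev = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: how much each variable varies on its own.
    dev_x = sum((x - mean_x) ** 2 for x in xs)
    dev_y = sum((y - mean_y) ** 2 for y in ys)
    return co_dev / math.sqrt(dev_x * dev_y)


rising = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])   # y increases with x
falling = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])  # y decreases with x
print(rising, falling)
```

The sign gives the direction of the relationship and the absolute value gives its strength, with values near 1 or -1 indicating an almost perfectly linear relationship.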

My journey to Data Science (Part 3)

Statistics is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data.

Why Statistics for Data Science

Data as gathered are raw, and raw data do not provide meaningful information. That's why we need statistics to collect, organize, and analyze data. Statistics can answer basic questions such as: Which observation occurs most often? Is there a difference between two experiments? Is the collected sample representative of the population? Is the result significant enough to make a difference? Answering these questions transforms the raw data into meaningful information.
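As a small example of turning raw data into a summary, the sketch below uses Python's standard `statistics` module to answer the "most often" question along with two other common summaries. The list of observations is invented for illustration.

```python
import statistics

# Raw observations (made-up values for illustration).
observations = [2, 3, 3, 5, 7]

most_common = statistics.mode(observations)   # which value occurs most often
average = statistics.mean(observations)       # arithmetic mean
middle = statistics.median(observations)      # middle value when sorted

print(most_common, average, middle)
```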

Descriptive and Inferential Statistics

My journey to Data Science (Part 2)

The field of data science revolves around probability and statistics. Hence, it is crucial to have a solid understanding of these concepts.

Why Probability for Machine Learning?

Probability is the science of uncertainty. Whenever it is uncertain whether an event will occur, probability concepts are used to estimate the likelihood of the event.

  1. Classification problems use probability to predict which output class an input belongs to.
  2. Several models are designed based on probability (linear regression, logistic regression, Naive Bayes).
  3. Models are trained using iterative algorithms based on probability (maximum likelihood estimation, expectation-maximization).
  4. Models are evaluated with probabilistic measures (log loss, ROC AUC).
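To show what a probabilistic measure looks like in practice, here is a minimal sketch of binary log loss, the negative average log-probability that the model assigned to the true labels. The function name, the `eps` clipping value, and the sample predictions are assumptions made for this example.

```python
import math


def log_loss(y_true, p_pred, eps=1e-15):
    """Binary log loss: lower is better, 0 means perfect confidence."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        # Clip probabilities so log(0) never occurs.
        p = min(max(p, eps), 1 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)


labels = [1, 0]
unsure = log_loss(labels, [0.5, 0.5])     # model cannot tell the classes apart
confident = log_loss(labels, [0.9, 0.1])  # model leans the right way
print(unsure, confident)
```

A model that is confidently correct gets a lower loss than one that hedges at 0.5, which is why log loss rewards well-calibrated probabilities rather than just correct class labels.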

Uncertainty in Machine Learning

Machine learning involves lots of uncertainties.

My journey to Data Science (Part 1)

This is my journey to data science and how I learned.

What is Data Science?

Data science is an interdisciplinary field that combines domain expertise, programming skills, and knowledge of mathematics and statistics to extract meaningful insights from data. Data science practitioners apply machine learning algorithms to numbers, text, images, video, audio, and more to produce artificial intelligence systems to perform tasks that ordinarily require human intelligence. In turn, these systems generate insight that analysts and business users can translate into tangible business value.

