Scikit-Learn Machine Learning Tutorial

7th Jan 2024
19:57

Scikit-learn is a popular Python library used for machine learning. The Python Assignment Help provides online 1-on-1 tutoring sessions, assignment help, and project help regarding Scikit-learn coursework.

Scikit-learn Basic Tutorial

Here's a basic tutorial to get you started with scikit-learn:

Step 1: Installation

Ensure you have Python installed on your system. You can install scikit-learn using pip:

pip install scikit-learn
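To confirm the installation worked, a quick check along these lines can help. It only imports the package and prints whatever version is installed; the variable name installed_version is just an illustrative choice.

```python
# Quick sanity check: import the package and print its version string.
import sklearn

installed_version = sklearn.__version__
print(f"scikit-learn version: {installed_version}")
```

If the import fails, the package is most likely installed for a different Python interpreter than the one running the script.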

Step 2: Importing Libraries

import numpy as np
import pandas as pd
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

Step 3: Loading Dataset

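For the purpose of this tutorial, let's use the Diabetes dataset bundled with scikit-learn. (The Boston Housing dataset and its load_boston() loader were removed in scikit-learn 1.2, so code that calls it fails on current versions.)

# Load a small bundled regression dataset (442 samples, 10 numeric features)
diabetes = datasets.load_diabetes()
X = diabetes.data
y = diabetes.target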

Step 4: Data Preprocessing

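Split the data into training and testing sets, then scale the features. The scaler is fitted on the training set only and then applied to the test set, so no information from the test data influences preprocessing: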

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
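The effect of fitting the scaler on the training split only can be checked directly. The sketch below uses synthetic data from make_regression so that it runs on its own; the data and the variable names are assumptions for illustration, not part of the tutorial's dataset.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption: not the tutorial's dataset).
X_demo, y_demo = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)

scaler_demo = StandardScaler()
X_tr_scaled = scaler_demo.fit_transform(X_tr)  # learn mean and std from training rows only
X_te_scaled = scaler_demo.transform(X_te)      # reuse those statistics on the test rows

train_means = X_tr_scaled.mean(axis=0)
train_stds = X_tr_scaled.std(axis=0)
print("Train means (close to 0):", np.round(train_means, 3))
print("Train stds (close to 1):", np.round(train_stds, 3))
print("Test means (usually not exactly 0):", np.round(X_te_scaled.mean(axis=0), 3))
```

The test-set means are typically a little off zero, which is expected: the test rows are transformed with the training statistics rather than their own.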

Step 5: Choosing and Training a Model

For demonstration, let's use a Linear Regression model:

model = LinearRegression()
model.fit(X_train, y_train)

Step 6: Model Evaluation

Predict on the held-out test set and compute the mean squared error (lower is better):

y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse:.2f}")
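A mean squared error is hard to judge in isolation because it is expressed in squared target units. One common way to put it in context, sketched here on synthetic data (an assumption so the snippet is self-contained), is to take the square root (RMSE) and compare it with a baseline that always predicts the training mean.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption: not the tutorial's dataset).
X_demo, y_demo = make_regression(n_samples=300, n_features=4, noise=15.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)

linear = LinearRegression().fit(X_tr, y_tr)
baseline = DummyRegressor(strategy="mean").fit(X_tr, y_tr)  # always predicts the training mean

# RMSE is in the same units as the target, which makes it easier to read than MSE.
rmse_linear = np.sqrt(mean_squared_error(y_te, linear.predict(X_te)))
rmse_baseline = np.sqrt(mean_squared_error(y_te, baseline.predict(X_te)))

print(f"Linear regression RMSE: {rmse_linear:.2f}")
print(f"Mean baseline RMSE:     {rmse_baseline:.2f}")
```

A model that does not clearly beat the mean baseline has probably not learned much from the features.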

Scikit-learn Advanced Tutorial

This section covers more advanced topics: feature engineering, hyperparameter tuning, and pipeline creation.

Step 1: Feature Engineering

Feature engineering involves creating new features or modifying existing ones to improve model performance.

from sklearn.preprocessing import PolynomialFeatures

# Example: Creating Polynomial Features
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)

Step 2: Hyperparameter Tuning using GridSearchCV

GridSearchCV trains the model with every combination in a parameter grid, scores each combination with cross-validation, and keeps the best one:

from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestRegressor

# Example: Random Forest Regressor with GridSearchCV
param_grid = {
    'n_estimators': [100, 200, 300],
    'max_depth': [None, 5, 10, 15],
    'min_samples_split': [2, 5, 10]
}

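rf = RandomForestRegressor(random_state=42)  # fixed seed so results are reproducible
grid_search = GridSearchCV(rf, param_grid, cv=5)  # 5-fold cross-validation per combination
grid_search.fit(X_train, y_train)

best_params = grid_search.best_params_  # best hyperparameter combination found
best_model = grid_search.best_estimator_  # model refit on all of X_train with those settings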
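Grid search cost grows with the product of the list lengths, so it is worth counting candidates before running it. The sketch below counts the combinations in the grid above with ParameterGrid, then runs a much smaller grid on synthetic data; the small grid, the data, and the variable names are assumptions chosen to keep the example quick.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, ParameterGrid

# The grid from the tutorial: 3 * 4 * 3 = 36 candidate settings.
full_grid = {
    "n_estimators": [100, 200, 300],
    "max_depth": [None, 5, 10, 15],
    "min_samples_split": [2, 5, 10],
}
n_candidates = len(ParameterGrid(full_grid))
print(f"Candidates: {n_candidates}, model fits with cv=5: {n_candidates * 5}")

# A deliberately small grid on synthetic data (assumption) so the demo runs fast.
small_grid = {"n_estimators": [10, 20], "max_depth": [None, 3]}
X_demo, y_demo = make_regression(n_samples=120, n_features=5, noise=10.0, random_state=0)

search = GridSearchCV(RandomForestRegressor(random_state=0), small_grid, cv=3)
search.fit(X_demo, y_demo)

print("Best parameters:", search.best_params_)
print("Best mean CV score (R^2 by default for regressors):", round(search.best_score_, 3))
```

For larger grids, RandomizedSearchCV from the same module samples a fixed number of candidates instead of trying them all.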

Step 3: Pipeline Creation

A pipeline chains preprocessing steps and a model into a single estimator, so all steps are fitted and applied with one call:

from sklearn.pipeline import Pipeline
from sklearn.decomposition import PCA
from sklearn.svm import SVR

# Example: Pipeline with Feature Scaling, PCA, and Support Vector Regressor
pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('pca', PCA(n_components=5)),  # keep the first 5 principal components
    ('svr', SVR())
])

pipe.fit(X_train, y_train)
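Pipelines and grid search combine naturally: parameters of a pipeline step are addressed as step name, two underscores, then the parameter name. The following is a minimal sketch on synthetic data; the grid values and the names pipe_demo and param_grid_demo are assumptions for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in data with 8 features (assumption: not the tutorial's dataset).
X_demo, y_demo = make_regression(n_samples=150, n_features=8, noise=10.0, random_state=0)

pipe_demo = Pipeline([
    ("scaler", StandardScaler()),
    ("pca", PCA()),
    ("svr", SVR()),
])

# "<step name>__<parameter>" targets a parameter of one pipeline step.
param_grid_demo = {
    "pca__n_components": [2, 4, 8],
    "svr__C": [1.0, 10.0],
}

# Each CV fold refits the scaler and PCA on that fold's training part only.
search = GridSearchCV(pipe_demo, param_grid_demo, cv=3)
search.fit(X_demo, y_demo)

print("Best parameters:", search.best_params_)
print("Steps in best pipeline:", list(search.best_estimator_.named_steps))
```

Because the preprocessing lives inside the pipeline, the cross-validation scores are not contaminated by statistics computed on the validation folds.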

Step 4: Model Evaluation and Performance Metrics

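Evaluate the tuned model on the test set. R-squared measures the share of variance in the target that the model explains (1.0 is a perfect fit):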

from sklearn.metrics import r2_score

# Example: Evaluating Model Performance (R-squared)
y_pred = best_model.predict(X_test)
r2 = r2_score(y_test, y_pred)
print(f"R-squared Score: {r2}")
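Different metrics answer different questions, so it can be useful to report several side by side. The sketch below computes MAE, MSE, RMSE, and R-squared on four made-up values, small enough to verify by hand; the numbers are invented for illustration and are not results from the tutorial's model.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Made-up targets and predictions; the errors are 0.5, 0, -1, 0.
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.5, 5.0, 8.0, 9.0])

mae = mean_absolute_error(y_true, y_hat)       # (0.5 + 0 + 1 + 0) / 4 = 0.375
mse_value = mean_squared_error(y_true, y_hat)  # (0.25 + 0 + 1 + 0) / 4 = 0.3125
rmse = np.sqrt(mse_value)                      # square root of the MSE
r2_value = r2_score(y_true, y_hat)             # 1 - 1.25 / 20 = 0.9375

print(f"MAE:  {mae:.4f}")
print(f"MSE:  {mse_value:.4f}")
print(f"RMSE: {rmse:.4f}")
print(f"R^2:  {r2_value:.4f}")
```

MAE and RMSE are in the units of the target, while R-squared is unitless, which makes it easier to compare across datasets.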

Author - Radhika

Radhika is a skilled Python programmer with a passion for leveraging technology to solve complex problems. With a solid foundation in Python programming, she specializes in data analysis and application development. Her innovative mindset and attention to detail make her a valuable asset in diverse tech environments.
