- 7th Jan 2024
- 19:57
Scikit-learn is a popular Python library for machine learning. Python Assignment Help provides online 1-on-1 tutoring sessions, assignment help, and project help for scikit-learn coursework.
Scikit-learn Basic Tutorial
Here's a basic tutorial to get you started with scikit-learn:
Step 1: Installation
Ensure you have Python installed on your system. You can install scikit-learn using pip:
pip install scikit-learn
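To confirm the installation worked, a quick check is to import the package and print its version. This is a minimal sketch; the variable name installed_version is only for illustration.

```python
import sklearn

# The package is installed as "scikit-learn" but imported as "sklearn"
installed_version = sklearn.__version__
print(f"scikit-learn version: {installed_version}")
```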
Step 2: Importing Libraries
import numpy as np
import pandas as pd
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
Step 3: Loading Dataset
For the purpose of this tutorial, let's use the diabetes regression dataset bundled with scikit-learn (the Boston Housing loader, load_boston, was removed in scikit-learn 1.2):
diabetes = datasets.load_diabetes()
X = diabetes.data
y = diabetes.target
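Before modeling, it often helps to look at what a dataset loader returns. The sketch below uses load_diabetes, one of the regression datasets bundled with scikit-learn, purely as an illustration; the names demo_data and demo_frame are assumptions and not part of the tutorial code.

```python
from sklearn.datasets import load_diabetes

# Loaders return a Bunch object with .data, .target and .feature_names
demo_data = load_diabetes()
print(demo_data.data.shape)      # (n_samples, n_features)
print(demo_data.feature_names)

# as_frame=True returns pandas objects; .frame holds features plus a "target" column
demo_frame = load_diabetes(as_frame=True).frame
print(demo_frame.head())
```

Checking the shape and column names first makes it easier to spot problems before any model is trained.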
Step 4: Data Preprocessing
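Split the data into training and testing sets, then standardize the features. The scaler is fit on the training set only and then applied to the test set, so no information from the test set leaks into training: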
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
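To see what the split and the scaler actually do, here is a small sketch on synthetic data. The arrays X_demo and y_demo are made up for illustration and stand in for any numeric dataset.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): 100 samples, 3 features on very different scales
rng = np.random.default_rng(0)
X_demo = rng.normal(loc=[10.0, 200.0, -5.0], scale=[1.0, 50.0, 0.1], size=(100, 3))
y_demo = rng.normal(size=100)

# test_size=0.2 keeps 80 rows for training and 20 for testing
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)

# Fit on the training rows only, then reuse the same statistics for the test rows
scaler_demo = StandardScaler()
X_tr_scaled = scaler_demo.fit_transform(X_tr)
X_te_scaled = scaler_demo.transform(X_te)

train_means = X_tr_scaled.mean(axis=0)
train_stds = X_tr_scaled.std(axis=0)
print(train_means.round(6), train_stds.round(6))
```

After scaling, each training feature has a mean close to 0 and a standard deviation close to 1. The test features will be near those values but not exactly equal, because they were transformed with the training statistics.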
Step 5: Choosing and Training a Model
For demonstration, let's use a Linear Regression model:
model = LinearRegression()
model.fit(X_train, y_train)
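After fitting, a linear model exposes what it learned through its coef_ and intercept_ attributes. The sketch below fits a model to noise-free synthetic data generated from a known formula (an assumption made for illustration), so the recovered values can be compared with the true ones.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (assumption): y = 3*x0 - 2*x1 + 5 with no noise
rng = np.random.default_rng(0)
X_demo = rng.uniform(-1.0, 1.0, size=(50, 2))
y_demo = 3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 1] + 5.0

demo_model = LinearRegression()
demo_model.fit(X_demo, y_demo)

# One coefficient per feature, plus a single intercept term
print(demo_model.coef_)
print(demo_model.intercept_)
```

On real, noisy data the coefficients will not match any known formula, but they can still be read the same way: one weight per input feature.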
Step 6: Model Evaluation
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse}")
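To make the metric concrete, the sketch below computes the mean squared error by hand on four made-up values and compares it with mean_squared_error. It also takes the square root (RMSE), which is in the same units as the target. The arrays are illustrative only.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Made-up true values and predictions (assumption, for illustration only)
y_true_demo = np.array([3.0, -0.5, 2.0, 7.0])
y_pred_demo = np.array([2.5, 0.0, 2.0, 8.0])

# MSE is the average of the squared differences
mse_manual = np.mean((y_true_demo - y_pred_demo) ** 2)
mse_sklearn = mean_squared_error(y_true_demo, y_pred_demo)

# RMSE puts the error back on the scale of the target
rmse = np.sqrt(mse_sklearn)
print(mse_manual, mse_sklearn, rmse)
```

Here the squared errors are 0.25, 0.25, 0 and 1, so the MSE is 0.375.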
Scikit-learn Advanced Tutorial
Let us learn some advanced concepts like feature engineering, hyperparameter tuning, and pipeline creation.
Step 1: Feature Engineering
Feature engineering involves creating new features or modifying existing ones to improve model performance.
from sklearn.preprocessing import PolynomialFeatures
# Example: Creating Polynomial Features
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)
Step 2: Hyperparameter Tuning using GridSearchCV
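Grid search trains the model with every combination of the listed hyperparameter values, scores each one with cross-validation, and keeps the best. The grid below has 3 x 4 x 3 = 36 combinations, each evaluated on 5 folds: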
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestRegressor
# Example: Random Forest Regressor with GridSearchCV
param_grid = {
    'n_estimators': [100, 200, 300],
    'max_depth': [None, 5, 10, 15],
    'min_samples_split': [2, 5, 10]
}
rf = RandomForestRegressor()
grid_search = GridSearchCV(rf, param_grid, cv=5)
grid_search.fit(X_train, y_train)
best_params = grid_search.best_params_
best_model = grid_search.best_estimator_
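Once a search has been fitted, its results can be inspected through best_params_, best_score_ and cv_results_. The sketch below is a deliberately small version that runs quickly: the synthetic data, the reduced grid small_grid and the choice of scoring are assumptions for illustration, not the settings used above.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic regression data (assumption) so the example is self-contained
X_demo, y_demo = make_regression(n_samples=120, n_features=5, noise=10.0, random_state=0)

# A reduced grid: 2 x 2 = 4 candidate settings
small_grid = {"n_estimators": [10, 30], "max_depth": [None, 3]}

search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    small_grid,
    cv=3,
    scoring="neg_mean_squared_error",  # scikit-learn maximizes scores, so MSE is negated
)
search.fit(X_demo, y_demo)

best_cv_mse = -search.best_score_           # flip the sign back to a plain MSE
n_candidates = len(search.cv_results_["params"])
print(search.best_params_, best_cv_mse, n_candidates)
```

When no scoring is given, GridSearchCV falls back to the estimator's own score method, which is R-squared for regressors. Passing scoring explicitly makes the selection criterion visible in the code.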
Step 3: Pipeline Creation
A pipeline chains multiple preprocessing steps and a final model into a single estimator, so that fit and predict run every step in order:
from sklearn.pipeline import Pipeline
from sklearn.decomposition import PCA
from sklearn.svm import SVR
# Example: Pipeline with Feature Scaling, PCA, and Support Vector Regressor
pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('pca', PCA(n_components=10)),
    ('svr', SVR())
])
pipe.fit(X_train, y_train)
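One practical benefit of a pipeline is that it can be cross-validated as a whole: the scaler and PCA are refit on the training portion of each fold, which avoids leaking information from the held-out portion. The sketch below shows this on synthetic, unscaled data; the data, the number of components and the names demo_pipe and cv_scores are assumptions for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic raw (unscaled) data (assumption) so the example is self-contained
X_raw, y_raw = make_regression(n_samples=150, n_features=12, noise=5.0, random_state=0)

demo_pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("pca", PCA(n_components=5)),
    ("svr", SVR()),
])

# Each fold refits all three steps on that fold's training rows only
cv_scores = cross_val_score(demo_pipe, X_raw, y_raw, cv=5)
print(cv_scores.mean())

# The fitted pipeline applies scaling and PCA automatically at prediction time
demo_pipe.fit(X_raw, y_raw)
demo_predictions = demo_pipe.predict(X_raw[:3])
print(demo_predictions)
```

Because the pipeline handles preprocessing internally, it is passed raw features rather than data that has already been scaled.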
Step 4: Model Evaluation and Performance Metrics
Evaluate the tuned model on the held-out test set, here using the R-squared score:
from sklearn.metrics import r2_score
# Example: Evaluating Model Performance (R-squared)
y_pred = best_model.predict(X_test)
r2 = r2_score(y_test, y_pred)
print(f"R-squared Score: {r2}")
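R-squared compares the model's squared error with that of a baseline that always predicts the mean of the target. The sketch below computes it by hand on four made-up values, checks the result against r2_score, and adds mean_absolute_error as a second metric. The arrays are illustrative only.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, r2_score

# Made-up true values and predictions (assumption, for illustration only)
y_true_demo = np.array([3.0, -0.5, 2.0, 7.0])
y_pred_demo = np.array([2.5, 0.0, 2.0, 8.0])

# R^2 = 1 - (residual sum of squares) / (total sum of squares)
ss_res = np.sum((y_true_demo - y_pred_demo) ** 2)
ss_tot = np.sum((y_true_demo - y_true_demo.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

r2_sklearn = r2_score(y_true_demo, y_pred_demo)
mae = mean_absolute_error(y_true_demo, y_pred_demo)
print(r2_manual, r2_sklearn, mae)
```

A score of 1.0 means perfect predictions, 0.0 means no better than predicting the mean, and negative values mean worse than that baseline.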
Author - Radhika
Radhika is a skilled Python programmer with a passion for leveraging technology to solve complex problems. With a solid foundation in Python programming, she specializes in data analysis and application development. Her innovative mindset and attention to detail make her a valuable asset in diverse tech environments.