This project is based on Chapter 2 of the Hands-On Machine Learning book. The goal of the project is to build a housing price prediction model using the California housing dataset.
Project link: peeplika/Housing-price-prediction (github.com)
Step 1: Downloading the Data
The project uses the California housing dataset, a shortened version of the full dataset. The smaller size makes it easier to work with while still providing enough data to train a model.
Step 2: Splitting the Data
A portion of the data was set aside for testing before any analysis was performed. This ensures that the test data remains unseen during the model training process, providing a realistic evaluation of the model's performance.
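A minimal sketch of such a hold-out split, assuming scikit-learn's train_test_split and a 20% test share. The write-up does not state the actual split ratio. The DataFrame below is synthetic stand-in data with hypothetical column names, not the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the housing table (hypothetical values and columns).
rng = np.random.default_rng(42)
housing = pd.DataFrame({
    "median_income": rng.uniform(0.5, 15.0, size=1000),
    "median_house_value": rng.uniform(15_000, 500_000, size=1000),
})

# Hold out 20% of the rows (an assumed ratio); a fixed random_state
# keeps the split reproducible between runs.
train_set, test_set = train_test_split(housing, test_size=0.2, random_state=42)

print(len(train_set), "training rows,", len(test_set), "test rows")
```

Fixing the random seed matters here: if the split changed on every run, rows from the test set would eventually leak into the data seen during analysis.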
Step 3: Visualizing the Data
To gain a better understanding of the dataset, visualizations such as histograms were created. These visualizations revealed the distribution of features, allowing potential patterns or outliers to be identified.
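As an illustration of what a histogram can reveal, the sketch below bins a synthetic, hypothetical feature with numpy.histogram and prints a text version of the chart. A plotting library would normally draw this; the text output is used only so the example runs without a display. The values are made up and deliberately capped so that a pattern shows up in the last bin.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature: values clipped to the range 15,000 to 500,000,
# imitating a column whose extremes were capped during collection.
values = rng.normal(250_000, 150_000, size=5000)
values = np.clip(values, 15_000, 500_000)

# Count how many values fall into each of 20 equal-width bins.
counts, bin_edges = np.histogram(values, bins=20)

for count, left, right in zip(counts, bin_edges[:-1], bin_edges[1:]):
    bar = "#" * (int(count) // 25)
    print(f"{left:>10.0f} to {right:>10.0f}: {bar}")

# A final bin that is much taller than its neighbour hints at capped values.
last_bin_spike = counts[-1] > 1.5 * counts[-2]
print("Spike in last bin:", bool(last_bin_spike))
```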
Step 4: Modifying the Data
The data was modified to enhance its quality. This included:
Adding new, relevant features such as rooms_per_household.
Handling missing values to avoid losing critical information.
Converting categorical features to numerical values using one-hot encoding.
Step 5: Trying Different Models
Several machine learning models were tested to determine which one performed best on the dataset. Comparing several models on the same data makes it more likely that the one selected actually suits the problem.
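One possible way to run such a comparison is cross-validation with a shared error metric. The write-up does not name the models or the metric, so the three regressors and the RMSE scoring below are assumptions, and the data is synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data standing in for the prepared housing features.
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(300, 3))
y = 3.0 * X[:, 0] + 5.0 * np.sin(X[:, 1]) + rng.normal(0, 1.0, size=300)

# Candidate models (an assumed selection, not the project's confirmed list).
candidates = {
    "linear_regression": LinearRegression(),
    "decision_tree": DecisionTreeRegressor(random_state=42),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=42),
}

rmse_by_model = {}
for name, model in candidates.items():
    # scikit-learn reports negated RMSE so that higher is better; flip the sign.
    scores = cross_val_score(
        model, X, y, scoring="neg_root_mean_squared_error", cv=5
    )
    rmse_by_model[name] = -scores.mean()
    print(f"{name}: mean RMSE over 5 folds = {rmse_by_model[name]:.3f}")

best_name = min(rmse_by_model, key=rmse_by_model.get)
print("Lowest cross-validated RMSE:", best_name)
```

Cross-validation scores each model on data it was not trained on, so the comparison does not need to touch the test set.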
Step 6: Fine-Tuning the Model
After selecting the best-performing model, hyperparameter tuning was applied to optimize its performance. This step looks for hyperparameter values that improve on the model's default settings.
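A grid search is one common way to tune hyperparameters. The sketch below assumes a random forest was the selected model and uses a small, hypothetical parameter grid; neither is confirmed by this write-up, and the data is synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic regression data standing in for the prepared housing features.
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(200, 3))
y = 3.0 * X[:, 0] + 5.0 * np.sin(X[:, 1]) + rng.normal(0, 1.0, size=200)

# Hypothetical grid: 2 x 3 = 6 combinations, each scored with 3-fold CV.
param_grid = {
    "n_estimators": [10, 30],
    "max_features": [1, 2, 3],
}

search = GridSearchCV(
    RandomForestRegressor(random_state=42),
    param_grid,
    scoring="neg_root_mean_squared_error",
    cv=3,
)
search.fit(X, y)

best_rmse = -search.best_score_  # flip the sign back to a plain RMSE
print("Best parameters:", search.best_params_)
print(f"Best cross-validated RMSE: {best_rmse:.3f}")
```

After fitting, search.best_estimator_ holds the model retrained on all of the supplied data with the best parameter combination.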
Step 7: Testing the Model
Finally, the model was tested on the previously unseen test data. Performance metrics were recorded to evaluate the model's ability to predict housing prices accurately.
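The final evaluation can be sketched as below. The write-up does not say which metrics were recorded, so RMSE is used here as an assumption, and both the model and the data are illustrative stand-ins.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic regression data standing in for the prepared housing features.
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(500, 3))
y = 3.0 * X[:, 0] + 5.0 * np.sin(X[:, 1]) + rng.normal(0, 1.0, size=500)

# The test portion is split off first and only used once, at the very end.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

final_model = RandomForestRegressor(n_estimators=50, random_state=42)
final_model.fit(X_train, y_train)

# Predict on the held-out rows and compute the root mean squared error.
test_predictions = final_model.predict(X_test)
test_rmse = np.sqrt(mean_squared_error(y_test, test_predictions))
print(f"Test RMSE: {test_rmse:.3f}")
```

Because the test rows played no part in training or tuning, this number estimates how the model would perform on new districts.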