Housing Prediction Project

This project is based on Chapter 2 of Aurélien Géron's Hands-On Machine Learning book. Its goal is to build a model that predicts housing prices from the California housing dataset.

Project link: peeplika/Housing-price-prediction (github.com)

Step 1: Downloading the Data

The project uses the California housing dataset in a shortened version of the full dataset. The smaller size makes it easier to work with while still providing enough data to train the model.
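The write-up does not show the loading code. Here is a minimal sketch, assuming the data has already been downloaded as a CSV file (for example housing.csv) that uses the column names from the book (total_rooms, total_bedrooms, households, ocean_proximity, and so on). It reads the file with Python's standard csv module. The book itself loads the file with pandas.read_csv, and this version only makes the parsing explicit. In practice you would pass open("housing.csv", newline="") instead of the in-memory sample, whose values are made up.

```python
import csv
import io

# Numeric columns as named in the book's housing.csv (assumed layout).
NUMERIC_COLUMNS = {
    "longitude", "latitude", "housing_median_age", "total_rooms",
    "total_bedrooms", "population", "households", "median_income",
    "median_house_value",
}


def load_housing_rows(file_obj):
    """Parse CSV rows into dicts, converting numeric columns to float.

    Empty numeric cells become None so missing values stay visible
    and can be handled explicitly during data preparation.
    """
    rows = []
    for record in csv.DictReader(file_obj):
        row = {}
        for key, value in record.items():
            if key in NUMERIC_COLUMNS:
                row[key] = float(value) if value and value.strip() else None
            else:
                row[key] = value
        rows.append(row)
    return rows


# Made-up sample with a missing total_bedrooms value in the second row.
sample = io.StringIO(
    "total_rooms,total_bedrooms,households,median_income,ocean_proximity\n"
    "1000,200,100,5.0,NEAR BAY\n"
    "2000,,400,3.5,INLAND\n"
)
rows = load_housing_rows(sample)
print(rows[1]["total_bedrooms"])  # None
```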

Step 2: Splitting the Data

A portion of the data (the test set) was set aside before any exploration or analysis. Because it stays untouched until the end, the model is never trained or tuned on it, so the final evaluation gives a realistic estimate of how the model performs on new data.
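Here is a sketch of holding out a test set, assuming the rows fit in memory as a Python list. A fixed seed keeps the split reproducible across reruns. The 20% ratio and seed 42 are arbitrary choices and were not taken from the project. scikit-learn's train_test_split(data, test_size=0.2, random_state=42) offers equivalent functionality.

```python
import random


def split_train_test(rows, test_ratio=0.2, seed=42):
    """Shuffle row indices with a fixed seed and hold out a test set.

    The fixed seed makes the split reproducible, so the same rows land
    in the test set every time the code is rerun.
    """
    if not 0.0 < test_ratio < 1.0:
        raise ValueError("test_ratio must be between 0 and 1")
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    test_size = int(len(rows) * test_ratio)
    test_idx = set(indices[:test_size])
    train = [row for i, row in enumerate(rows) if i not in test_idx]
    test = [row for i, row in enumerate(rows) if i in test_idx]
    return train, test


data = list(range(100))  # stand-in for the loaded rows
train_set, test_set = split_train_test(data)
print(len(train_set), len(test_set))  # 80 20
```

A plain random split can, by chance, under-represent some income ranges. For this reason, the book also discusses stratified sampling on an income category.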

Step 3: Visualizing the Data

To understand the dataset better, visualizations such as histograms were created. They show how each feature is distributed and make patterns and potential outliers easier to spot.
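The book draws these plots with pandas' DataFrame.hist, which uses matplotlib. To show what a histogram actually computes, here is a dependency-free sketch that counts values into equal-width bins and prints a text bar chart. The median_income-like values are made up. They include one deliberately large value to show how an outlier ends up isolated in the last bin.

```python
def histogram(values, bins=10):
    """Count how many values fall into each of `bins` equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid division by zero when all values match
    counts = [0] * bins
    for v in values:
        # The maximum value belongs in the last bin rather than one past it.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return counts, edges


def print_histogram(values, bins=10, bar_width=40):
    """Print one text bar per bin, scaled so the tallest bin has bar_width marks."""
    counts, edges = histogram(values, bins)
    peak = max(counts)
    for count, left, right in zip(counts, edges, edges[1:]):
        bar = "#" * round(bar_width * count / peak)
        print(f"{left:10.2f} - {right:10.2f} | {bar} {count}")


incomes = [1.5, 2.0, 2.2, 2.8, 3.1, 3.3, 3.5, 3.9, 4.2, 4.8, 5.6, 15.0]
print_histogram(incomes, bins=5)
```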

Step 4: Modifying the Data

The data was then prepared for the machine learning algorithms. This included:

Adding new, relevant features such as rooms_per_household.
Handling missing values to avoid losing critical information.
Converting categorical features to numerical values using one-hot encoding.
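Here is a sketch of these three preparation steps using only the standard library. It assumes rows are dicts keyed by the book's column names, with None for missing numbers. It follows a fit/transform split: the imputation median and the category list are learned from the training rows only and then reused on any other rows, so the test set does not influence preparation. Median imputation is one common choice and may differ from what the project used. scikit-learn's SimpleImputer(strategy="median") and OneHotEncoder cover the same steps.

```python
from statistics import median


def fit_preparation(train_rows):
    """Learn the statistics needed for preparation from training rows only."""
    observed = [r["total_bedrooms"] for r in train_rows if r["total_bedrooms"] is not None]
    return {
        "bedrooms_median": median(observed),
        "categories": sorted({r["ocean_proximity"] for r in train_rows}),
    }


def transform(rows, stats):
    """Impute missing values, add rooms_per_household, and one-hot encode ocean_proximity."""
    prepared = []
    for r in rows:
        out = {k: v for k, v in r.items() if k != "ocean_proximity"}
        # Fill missing total_bedrooms with the training median instead of dropping the row.
        if out["total_bedrooms"] is None:
            out["total_bedrooms"] = stats["bedrooms_median"]
        # Derived feature: average number of rooms per household.
        out["rooms_per_household"] = out["total_rooms"] / out["households"]
        # One 0/1 column per known category; unseen categories get all zeros.
        for cat in stats["categories"]:
            out["ocean_proximity_" + cat] = 1.0 if r["ocean_proximity"] == cat else 0.0
        prepared.append(out)
    return prepared


# Made-up training rows; the second one is missing total_bedrooms.
train_rows = [
    {"total_rooms": 1000.0, "total_bedrooms": 200.0, "households": 100.0, "ocean_proximity": "INLAND"},
    {"total_rooms": 2000.0, "total_bedrooms": None, "households": 400.0, "ocean_proximity": "NEAR BAY"},
    {"total_rooms": 1500.0, "total_bedrooms": 300.0, "households": 300.0, "ocean_proximity": "INLAND"},
]
stats = fit_preparation(train_rows)
prepared = transform(train_rows, stats)
print(prepared[1])
```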

Step 5: Trying Different Models

Several machine learning models were trained and compared to see which one performed best on the dataset. Comparing multiple models helps identify the most suitable one before investing time in tuning it.
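The write-up does not list which models were compared. The book compares candidates such as linear regression, decision trees and random forests using cross-validated RMSE (scikit-learn's cross_val_score). As a self-contained illustration of that comparison, the sketch below runs k-fold cross-validation on two deliberately simple stand-ins: a baseline that always predicts the mean, and a one-feature least-squares line. Both models and the data are made-up assumptions, not the project's.

```python
import math
import random


def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


class MeanBaseline:
    """Predicts the mean target seen during fit, ignoring the features."""

    def fit(self, xs, ys):
        self.mean_ = sum(ys) / len(ys)
        return self

    def predict(self, xs):
        return [self.mean_ for _ in xs]


class SimpleLinearRegression:
    """Least-squares fit of y = slope * x + intercept on a single feature."""

    def fit(self, xs, ys):
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        sxx = sum((x - mx) ** 2 for x in xs)
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        self.slope_ = sxy / sxx if sxx else 0.0
        self.intercept_ = my - self.slope_ * mx
        return self

    def predict(self, xs):
        return [self.slope_ * x + self.intercept_ for x in xs]


def cross_val_rmse(model_factory, xs, ys, k=5, seed=0):
    """Average RMSE over k folds; each fold is held out once for validation."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in indices if i not in held_out]
        model = model_factory().fit([xs[i] for i in train], [ys[i] for i in train])
        preds = model.predict([xs[i] for i in fold])
        scores.append(rmse([ys[i] for i in fold], preds))
    return sum(scores) / len(scores)


# Made-up data with a roughly linear relationship, standing in for income vs. price.
xs = [float(i) for i in range(50)]
ys = [3.0 * x + 5.0 + (i % 3 - 1) for i, x in enumerate(xs)]

for name, factory in [("mean baseline", MeanBaseline), ("linear", SimpleLinearRegression)]:
    print(f"{name}: {cross_val_rmse(factory, xs, ys):.3f}")
```

Because the same folds are used for every candidate, differences in the averaged RMSE reflect the models rather than a lucky split.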

Step 6: Fine-Tuning the Model

After the best-performing model was selected, its hyperparameters were tuned to improve its performance. This step searches for the settings that give the best validation results instead of relying on the defaults.
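Grid search is one common way to do this tuning. It tries every combination of candidate values and keeps the one with the lowest validation error. The sketch below is a minimal standard-library version. The parameter names n_estimators and max_features (random-forest settings) and the toy error function are illustrative assumptions, not the project's actual search. In scikit-learn, GridSearchCV(estimator, param_grid, cv=5, scoring="neg_mean_squared_error") performs the same search with cross-validation and exposes the winner as best_params_.

```python
import math
from itertools import product


def grid_search(param_grid, score_fn):
    """Evaluate every hyperparameter combination and keep the lowest-error one.

    param_grid maps each hyperparameter name to a list of candidate values.
    score_fn receives one combination as a dict and returns a validation
    error (lower is better), for example a cross-validated RMSE.
    """
    names = sorted(param_grid)
    best_params, best_score = None, math.inf
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical stand-in for "train with these settings and return the CV RMSE".
def toy_validation_error(params):
    return (params["n_estimators"] - 30) ** 2 + (params["max_features"] - 6) ** 2


grid = {"n_estimators": [3, 10, 30], "max_features": [2, 4, 6, 8]}
best, score = grid_search(grid, toy_validation_error)
print(best, score)  # {'max_features': 6, 'n_estimators': 30} 0
```

The number of combinations grows multiplicatively with each added hyperparameter. For large search spaces, randomized search (scikit-learn's RandomizedSearchCV) samples a fixed number of combinations instead.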

Step 7: Testing the Model

Finally, the tuned model was evaluated on the held-out test data. Performance metrics were recorded to measure how accurately it predicts housing prices.
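The write-up does not say which metrics were recorded. Here is a sketch of common regression metrics, assuming the model's test-set predictions are available as a list. RMSE is the metric the book focuses on. MAE and R² are shown as frequent companions. The numbers in the usage lines are made up.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: penalizes large errors more heavily."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error: average size of an error, in the target's units."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1.0 is perfect, 0.0 matches predicting the mean."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up test-set targets and predictions (house values in dollars).
y_test = [150000.0, 220000.0, 310000.0, 95000.0]
y_pred = [160000.0, 200000.0, 330000.0, 100000.0]
for name, metric in [("RMSE", rmse), ("MAE", mae), ("R2", r2)]:
    print(f"{name}: {metric(y_test, y_pred):.3f}")
```

Only the final, already-tuned model should be scored this way. Tuning again after looking at test scores would make the test estimate optimistic.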
