Designing and Implementing a Relational Database for a Used Car Marketplace
Project Overview: Used Car Marketplace Database
This project aims to create an entity-relationship diagram (ERD) for a used car marketplace database. The ERD will serve as the foundation for the database schema and will be used to inform the creation of the data model. In addition, insights will be drawn from the data to better understand the used car marketplace and inform future business decisions.
This project will create a database schema with tables for cars, sellers, buyers, transactions, and bidding history. The cars table will contain car details like make, model, year, price, and location. The sellers table will have information on the seller’s name, contact details, and cars for sale. The buyers table will include the buyer’s name and contact information. The transactions table will store information such as sale price, date, and payment method, while the bidding history table will contain data about the bidding history for each car. Insights will be drawn from the data on car sales patterns, popular models and makes, and average prices, to inform business decisions on pricing strategies, marketing campaigns, and inventory management.
Feature Requirements
- User Profile: The application should allow users to offer more than one used car for sale. Before selling,
users must complete their personal information, including name, contact information, and location.
- Product Listings: Users can post their car for sale on the platform, and the listing should include product details
such as make, model, body type, transmission type, and year of manufacture. Optional information like color and mileage can also be added.
- Search Functionality: Users can search for available cars based on the seller's location, make, and body type
- Bidding Feature: If the seller allows the bidding feature, potential buyers can make a bid on the product.
- Ad Display: The platform should display ads with product information, titles, and seller contacts.
- Entity-Relationship Diagram (ERD): The project should include the creation of an ERD for the database schema.
- Insights: The project should involve drawing insights from the data to understand the used car marketplace better.
- Scope: The project should not include developing a payment or transaction system since the transactions for
purchasing a car will be conducted outside of the platform.
ERD Table Design
Designing the Database
- Identify the entities: We have identified seven entities: Location, Product, User_seller, User_buyer, Bid_buyer, Advertisement, and Interest.
- Create tables: We will create tables for each entity, following the data structure provided for each entity.
- Define primary keys: For each table, we will define a primary key. The primary key uniquely identifies each row in the table.
- Define foreign keys: We will define foreign keys in tables where one table needs to reference another table. For example,
the User_seller and User_buyer tables have a foreign key referencing the Location table.
- Define table relationships: We will define the relationships between tables based on the data structure. For example, the Product table
has a one-to-many relationship with the Advertisement and Bid_buyer tables, and the User_buyer table has a one-to-many relationship with
the Bid_buyer and Interest tables.
- Implement table constraints: We will implement constraints to ensure data integrity, such as requiring non-null values for certain columns.
- Populate the database: We can now populate the database with data.
Following these steps, we can create a functional relational database based on the given data. The system contains a list of 7 objects that require representation in the database.
These objects include Location,Product, User_seller, User_buyer, Bid_buyer, Advertisement, and Interest.
Based on the identified objects, I created table structures for each object. For each table, I determined the required columns and their respective data types. I also specified primary keys and foreign keys as necessary. The table structures that I created are as follows:
- Location table: includes columns for location_id (primary key), city, latitude, and longitude.
- Product table: includes columns for product_id (primary key), brand, model, body_type, transmission, and year.
- User_seller table: includes columns for user_id (primary key), name, contact, and location_id.
- User_buyer table: includes columns for user_id (primary key), name, contact, and location_id.
- Bid_buyer table: includes columns for bid_buyer_id (primary key), user_id_buyer, product_id, bid_price, and date_created.
- Advertisement table: includes columns for ad_id (primary key), product_id, user_id, title, description, price_sell, and date_created.
- Interest table: includes columns for interest_id (primary key), user_id_buyer, product_id, location_id, and date_created.
Implementing The Design
The implementation of the database design involves creating seven tables, each corresponding to one of the objects.
- The first table is called dim_location, which contains information about the location of the products, sellers, and buyers. This table has columns for location_id, city, latitude, and longitude
- The second table is dim_user_seller, which contains information about the sellers. This table has columns for user_id, name, contact, and location_id, where location_id is a foreign key that references the location table.
- The third table is dim_product, which contains information about the products. This table has columns for product_id, brand, model, body_type, transmission, and year.
- The fourth table is dim_user_buyer, which contains information about the buyers. This table has columns for user_id, name, contact, and location_id, where location_id is a foreign key that references the location table.
- The fifth table is dim_bid_buyer, which contains information about the bids made by buyers. This table has columns for bid_buyer_id, user_id_buyer (a foreign key that references the buyer table), product_id (a foreign key that references the product table), bid_price, and date_created.
- The sixth table is fact_advertisement, which contains information about the advertisements created by the sellers. This table has columns for ad_id, product_id (a foreign key that references the product table), user_id (a foreign key that references the seller table), title, description, price_sell, and date_created.
- The seventh table is fact_interest, which contains information about the interests of the buyers in the products. This table has columns for interest_id, user_id_buyer (a foreign key that references the buyer table), product_id (a foreign key that references the product table), location_id (a foreign key that references the location table), and date_created.
Populating the database
The first step in populating the database is to insert the provided data for the dim_location and dim_product tables, which are in CSV format. We can use the script below to insert the data. Once the data is available, we will generate additional data using Faker.
Generate Data using Faker
The next step is to generate data using faker. The entire data that needs to be generated is already provided in a Jupyter notebook file, which just needs to be executed. The notebook file is located in the ipynb folder. The following steps will be taken:
- This Python code snippet imports the psycopg2, random, time, and Faker modules to connect to a PostgreSQL database and generate fake data using the id_ID locale. The psycopg2 library is used to establish a connection to the database by specifying the host name, port number, database name, username, and password.
- Once the connection is established, a cursor object cur is created to execute SQL commands on the connected database. The cursor object is used to send queries and retrieve results from the database.
- Insert data into the fact_advertisement table by executing a loop for 1000 times. In each iteration,the script generates fake data such as ad_id, product_id, user_id, title, description, price_sell, and date_created.
- Insert data into the dim_bid_buyer table by executing a loop for 300 times. In each iteration, the script generates fake data such as bid_buyer_id, user_id_buyer, product_id, bid_price, and date_created.
- Insert data into the fact_interest table by executing a loop for 1000 times. In each iteration, the script generates fake data such as interest_id, user_id_buyer, product_id, location_interest, and date_created.
Transactional Query Project
At this stage, we are asked to provide an insight by finding the nearest used car based on a city ID, where the closest distance is calculated based on latitude and longitude. The distance calculation can be done using the Euclidean distance formula based on latitude and longitude.
The expected result is to calculate the nearest distance to find the closest car to the city ID 3173. To do this, we need to obtain the latitude and longitude values of the used car dealerships and calculate the Euclidean distance between them and the city ID 3173. The dealership with the shortest distance will be considered the closest to the city.
Analytical Query Project
In this code snippet, we are comparing the prices of cars based on the average price per city. The query retrieves the city, brand, model, and year of the car from the fact_advertisement table. It then joins this table with the dim_product, dim_user_seller, and dim_location tables using their respective keys.
To calculate the average price per city, the query uses the AVG() window function over the partition of the city column. This means that the average price is calculated for each city separately. The result is then ordered in descending order based on the car’s year.
This code snippet compares the percentage difference between the average price of cars based on their models and the average bid price offered by customers in the last 6 months. The query uses common table expressions (CTE) to calculate the average price of cars and the average bid price for each car model. The first CTE calculates the average price of cars using the fact_advertisement table, and the second CTE calculates the average bid price offered by customers in the last 6 months using the dim_bid_buyer table.
The main query then joins these two CTEs using the model column and calculates the difference and percentage difference between the average price and the average bid price. The result is then ordered by the car brand.