MSBA7012/MACC7022 Individual Assignment 2: Fraudulent Job Post Detection
Deadline: Sunday, April 28, 2024 11:59pm
Objective:
• Leverage Alteryx to develop a workflow that can preprocess data, engineer features, and build a machine learning model to predict whether a job posting is fraudulent.
Dataset:
• The Balanced_Fraudulent_Job_Posts.xlsx dataset includes attributes related to job postings, with key columns like 'title', 'company_profile', 'description', 'requirements', 'benefits', and 'fraudulent'.
Tasks:
1. Data Preprocessing:
• Use Alteryx to load the dataset and create a new column that combines the textual data in 'title', 'company_profile', 'description', 'requirements', and 'benefits' columns.
• Perform text pre-processing on the combined text column.
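As a rough illustration of the two preprocessing steps above, the following sketch combines the five text columns and cleans the result using only the Python standard library. The column names come from the assignment; the specific cleaning steps (lowercasing, removing HTML tags, URLs, punctuation and digits, dropping stop words), the tiny `STOP_WORDS` list, and the helper names `combine_text` and `preprocess` are assumptions for illustration, not requirements of the brief. The same logic could be built with Alteryx's Formula and Data Cleansing tools.

```python
import re

# Column names are taken from the assignment text.
TEXT_COLUMNS = ["title", "company_profile", "description", "requirements", "benefits"]

# A tiny illustrative stop-word list (hypothetical; a real workflow would use a fuller one).
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "for", "is", "are", "with"}


def combine_text(row):
    """Join the five text columns, treating None or empty values as missing."""
    parts = [str(row.get(col) or "") for col in TEXT_COLUMNS]
    return " ".join(part for part in parts if part)


def preprocess(text):
    """Lowercase, strip HTML tags, URLs, punctuation and digits, then drop stop words."""
    text = text.lower()
    text = re.sub(r"<[^>]+>", " ", text)       # remove HTML tags
    text = re.sub(r"https?://\S+", " ", text)  # remove URLs
    text = re.sub(r"[^a-z\s]", " ", text)      # keep letters and whitespace only
    tokens = [tok for tok in text.split() if tok not in STOP_WORDS]
    return " ".join(tokens)


# A made-up row standing in for one record of the dataset.
row = {
    "title": "Data Entry Clerk",
    "company_profile": None,
    "description": "<p>Earn $500 a day! Visit http://example.com</p>",
    "requirements": "No experience required.",
    "benefits": "",
}
combined = combine_text(row)
cleaned = preprocess(combined)
print(cleaned)
```

Note that this sketch represents missing values as `None`; if the rows come from a pandas DataFrame, missing cells are NaN and would need to be replaced with empty strings first.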
2. Feature Engineering:
• Implement TF-IDF vectorization in Alteryx using the Python Tool to convert the text data into a numerical format suitable for machine learning.
3. Model Building and Evaluation:
• Split the data into a training set and a testing set with a ratio of 70:30.
• Utilize Alteryx's Forest Model tool to train a model using the training set.
• Use only the TF-IDF values as the model features.
• Evaluate the model's performance on the testing set through the Model Comparison tool and record the metrics (accuracy, F1-score, AUC, and confusion matrix).
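The Model Comparison tool reports these metrics directly. For cross-checking the recorded numbers, the sketch below shows how the confusion matrix, accuracy, F1-score, and AUC can be computed from true labels and predicted scores using plain Python. The labels, the scores, and the 0.5 decision threshold are made-up assumptions for illustration, with class 1 standing for "fraudulent".

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn), treating label 1 as the positive (fraudulent) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    """Share of all predictions that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def auc(y_true, scores):
    """Probability that a random positive is scored above a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical test-set labels and predicted probabilities of fraud.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
acc = accuracy(tp, fp, fn, tn)
f1 = f1_score(tp, fp, fn)
auc_value = auc(y_true, scores)
print(tp, fp, fn, tn, round(acc, 4), round(f1, 4), round(auc_value, 4))
```

Accuracy and F1 depend on the chosen threshold, whereas AUC is computed from the scores alone, so the two can move independently when the threshold changes.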
4. Reporting:
• Create a report in Word to summarize the model evaluation results and insights into the key factors that help predict fraudulent job postings.
Deliverables:
• An Alteryx workflow (.yxmd) containing the complete analysis, with annotations explaining each tool and step. Use relative paths for workflow dependencies in Alteryx so that the grader can run your workflow without making any changes.
• A Word document (.docx) summarizing the findings and insights from the model.
• Compress the above two files into a zip file named with your student ID, e.g., 123456.zip.
• You should not make any modifications to the input file: Balanced_Fraudulent_Job_Posts.xlsx. Also, DO NOT include this input file in your zip file.
Evaluation Criteria:
• Correctness and completeness of the preprocessing and feature engineering steps implemented in Alteryx.
• Accuracy and thoroughness of the model evaluation and interpretation of results within Alteryx.
• Quality and clarity of the final report, including insights and conclusions drawn from the analysis.