Validity of the subject description
Valid from:
21 March 2026
Valid until:
—
| Subject name (Hungarian, English) | Adatelemzés / Data Analysis |
|---|---|
| Subject code | BMEVISZAC00 |
| Subject type | — |
| Training level | — |
| Course types and hours (weekly/semester) | |
| Assessment type | exam |
| Credits | 4 |
| Subject coordinator | Dr. Csima Judit, associate professor; contact: csima@cs.bme.hu |
| Responsible department | Department of Computer Science and Information Theory (Számítástudományi és Információelméleti Tanszék) |
| Faculty | Faculty of Electrical Engineering and Informatics (Villamosmérnöki és Informatikai Kar) |
| Subject website | — |
| Primary curriculum type | — |
| Direct prerequisites – Strong prerequisite | none |
| Direct prerequisites – Weak prerequisite | none |
| Direct prerequisites – Parallel prerequisite | none |
| Direct prerequisites – Milestone prerequisite | none |
| Direct prerequisites – Exclusion | none |
Objectives
Programme
Schedule of lectures
- Basic concepts of hypothesis testing: null and alternative hypotheses, test statistic, acceptance and critical regions, type I and type II errors, power function, level of significance, strength of the test, unbiasedness, consistency. Distributions derived from the normal distribution: chi-squared, F and t. Lukacs’s theorem. Parameters, parametric tests.
- Parametric tests for the parameters of the normal distribution: one-sample u- and t-tests, two-sample u- and t-tests for independent samples, paired two-tailed t-test, F-test, Welch's t-test, Bartlett's test.
- Non-parametric tests I. Fundamental theorem of chi-square tests. Chi-square goodness-of-fit tests with fully specified and with estimated parameters. Chi-square test of independence. Chi-square test of homogeneity for two independent samples.
- Non-parametric tests II. Gnedenko-Korolyuk theorem. Order statistics. Goodness-of-fit testing with the one-sample Kolmogorov-Smirnov test. Testing homogeneity with the two-sample Kolmogorov-Smirnov test.
- Tests of homogeneity. Homogeneity of two independent samples: Mann-Whitney test. Homogeneity of several independent samples: Kruskal-Wallis test. Homogeneity of two paired samples: Wilcoxon test. Homogeneity of several paired samples: Friedman test.
- Bivariate regression models. Theoretical background: conditional expectation. Types of bivariate regression: linear regression, polynomial regression, and regressions that can be reduced to two-parameter linear regression. Logistic regression. The least squares method. Analysis of variance (ANOVA) to assess the validity of the model. Coefficient of determination (R-squared). Nadaraya's method.
- Multivariate linear regression. Model-building techniques. Correlation coefficients: total, multiple and partial. Beta coefficients. Adjusted coefficient of determination. Multicollinearity. Heteroskedasticity. Detection and analysis of outliers.
- Introduction to business intelligence and practical considerations. Data-driven solutions, CRISP-DM (CRoss-Industry Standard Process for Data Mining). Customer relationship management (CRM) analytics.
- Preparation for enterprise data analytics. Frequent itemsets, market basket analysis, association rules and their practical applications.
- Supervised machine learning. Evaluation metrics. Profit matrix. Simple algorithms: kNN, Naive Bayes, and useful metrics.
- Decision trees (DT) and their application to decision making. Some DT algorithms (C4.5; purity measures, splitting, pre- and post-pruning), applications.
- Classification and regression. Customer value and churn prediction. Creditworthiness prediction. Prediction in direct marketing campaigns.
- Customer segmentation and other unsupervised learning problems. K-means clustering and its variants (bisecting and adaptive). Density-based clustering algorithms (DBSCAN, OPTICS), hierarchical clustering, and their applications.
- Open and/or widely used data mining software, model building and practical constraints.
Practical lessons
- Introduction to statistical software packages. Definitions of basic statistics and their interpretation. Plots: scatter plots, box plots, pie charts, histograms, P-P and Q-Q plots, dot plots. Confidence intervals; parametric tests on economic datasets.
- Statistical hypothesis testing on enterprise and business datasets. Tests of independence and homogeneity, significance testing. Graphical and statistical analysis.
- Regression analysis on business datasets. Model building and constraints. Evaluation: multicollinearity, heteroscedasticity, sensitivity and outliers.
- Statistical data mining. Decision making with basic algorithms for creditworthiness analysis.
- Frequent itemsets in a bookstore dataset. Various quality measures of association rules (“lift”) and their connection to decision trees.
- Modeling user behavior in a webshop and predicting whether the user is going to buy something.
- Segmentation of customers according to past activities, group work with available tools.
The aim of the lectures is to teach master's students the fundamental methods of statistics and business data mining during the semester. The practical sessions present several application examples that arose from analyzing and solving various real problems with the support of data-intensive computing. Skills and abilities gained: students will be able to recognize problems that can be solved with business intelligence, will be able to apply statistical and data mining tools at a high level in the corporate sector, and will be capable of accessing corporate customer data and integrating it with other corporate data to plan and deliver profit-oriented analytical solutions.
Learning outcomes
This subject develops the following competences, as defined in the KKK (training and outcome requirements) decree:
Knowledge
No learning outcomes recorded.
Skills
No learning outcomes recorded.
Attitudes
No learning outcomes recorded.
Autonomy and responsibility
No learning outcomes recorded.
Teaching methodology
3 lectures/week + 1 computer practice/week (held in 7 double sessions)
Learning materials
Online resources
- P. Tan, M. Steinbach, V. Kumar: Introduction to Data Mining. Addison-Wesley, 2006. Cloth, 769 pp. ISBN-10: 0321321367, ISBN-13: 9780321321367. http://www-users.cs.umn.edu/~kumar/dmbook/index.php
- J. Leskovec, A. Rajaraman, J. D. Ullman: Mining of Massive Datasets. http://infolab.stanford.edu/~ullman/mmds.html
Recommended preliminary knowledge for completing the subject
Knowledge type competencies
(prior knowledge that is not mandatory but greatly helps in completing the subject successfully)
Algorithms
Skill type competencies
(prior abilities and skills that are not mandatory but greatly help in completing the subject successfully)
none
Recommended (non-compulsory) preliminary competencies
(recommended, non-compulsory competencies to acquire beforehand that contribute significantly to completing the subject successfully)
Algorithms
General rules
Requirements:
a) During the term: attendance is mandatory at all computer seminars. Students should arrive prepared, based on the lectures and the previously distributed schedule and materials. A test at the beginning of each laboratory session assesses this preparation.
Students must keep a record of their work and hand it in at the end of the laboratory session. Participation and the records are graded.
Requirements for obtaining a signature: participation in at least 70% of the seminars. Course work is optional; additional points can be awarded if it reaches a sufficient level.
b) Exam period: written final exam. The exam mark is determined as follows: 50% is the average of the best 70% of the laboratory marks and 50% is the written exam mark, provided the written exam is passed.
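The exam-mark rule and the 70% signature threshold described above can be sketched as follows. This is a minimal, hypothetical Python illustration, not an official grading tool: the function names are invented, and the 1-5 mark scale, treating a failed written exam as failing the subject, rounding the "top 70%" up to whole laboratory sessions, and leaving the final rounding open are all assumptions, because the requirements do not specify them.

```python
from statistics import mean


def has_signature(attended: int, total: int) -> bool:
    """Signature rule: attendance at no fewer than 70% of the computer seminars."""
    # Integer form of attended / total >= 0.7, avoiding floating-point edge cases.
    return 10 * attended >= 7 * total


def exam_mark(lab_marks: list[int], written_mark: int) -> float:
    """Hypothetical helper: 50% best-70% lab average, 50% written exam.

    Assumed, not stated in the requirements: marks are on the 1-5 scale,
    a failed written exam (mark 1) fails the subject, the "top 70%" is
    rounded up to whole laboratory sessions, and the result is not rounded.
    """
    if not lab_marks:
        raise ValueError("at least one laboratory mark is required")
    if written_mark < 2:
        # The written exam counts only "when it passes"; assume a fail fails the subject.
        return 1.0
    # Number of labs kept: ceil(0.7 * n) using integer arithmetic.
    keep = -(-7 * len(lab_marks) // 10)
    best_labs = sorted(lab_marks, reverse=True)[:keep]
    # Equal weights for the lab average and the written exam mark.
    return 0.5 * mean(best_labs) + 0.5 * written_mark


if __name__ == "__main__":
    print(has_signature(5, 7))  # True: 5 of 7 seminars is about 71%
    print(has_signature(4, 7))  # False: 4 of 7 is about 57%
    print(exam_mark([5, 4, 3, 5, 2, 4, 5], 4))  # best five labs average 4.6
```

With seven laboratory sessions, ceil(0.7 × 7) = 5, so under this reading the best five lab marks count, and attending five of the seven seminars satisfies the signature threshold.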
Additional possibilities:
Computer seminars cannot be made up.
Assessment methods
In-term assessments
No detailed assessments provided.
Weight of in-term assessments
No weights provided.
Exam-period assessments
No detailed assessments provided.
Weight of exam elements
No weights provided.
Grade calculation
No grade thresholds provided.
Attendance requirements
No attendance requirements provided.
Rules for retake and resubmission
Not provided.
Short description
Not provided.
Detailed description
Not provided.
Recommended courses
Not provided.
Workload to complete the subject
No workload breakdown provided.
Validity of subject requirements
Requirements valid from:
—
Requirements valid until:
—
Curriculum placement
No curriculum placements recorded for this subject version.