Data Analysis

Adatelemzés
Validity of the subject description
Valid from:
21 March 2026
Valid until:
Subject name (Hungarian, English)
Adatelemzés
Data Analysis
Subject code BMEVISZAC00
Subject type
Training Level
Course types and hours (weekly/semester)
Course type lecture tutorial laboratory
hours (weekly) 2 1 0
type (linked/independent) derived course
Assessment type exam
Credits 4
Subject coordinator
Dr. Csima Judit
position: associate professor
Responsible department
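Department of Computer Science and Information Theory (Számítástudományi és Információelméleti Tanszék)
Faculty Faculty of Electrical Engineering and Informatics (Villamosmérnöki és Informatikai Kar)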
Subject website
Primary curriculum type
Direct prerequisites – Strong prerequisite none
Direct prerequisites – Weak prerequisite none
Direct prerequisites – Parallel prerequisite none
Direct prerequisites – Milestone prerequisite none
Direct prerequisites – Exclusion none

Objectives

Programme

Schedule of lectures

 

  1. Basic concepts of hypothesis testing: null and alternative hypotheses, test statistic, acceptance and critical regions, errors (type I and type II), power function, significance level, power of the test, unbiasedness, consistency. Distributions derived from the normal distribution: chi-squared, F, t. Lukacs’s Theorem. Parameters, parametric tests.
  2. Parametric tests for the parameters of the normal distribution: one-sample t- and u-tests, u- and t-tests for two independent samples, paired two-tailed t-test, F-test, Welch's t-test, Bartlett's test.
  3. Non-parametric tests I. Fundamental theorem of the chi-square tests. Chi-square goodness-of-fit tests with fully specified and with estimated parameters. Chi-square test of independence. Chi-square test of homogeneity for two independent samples.
  4. Non-parametric tests II. Gnedenko-Korolyuk theorem. Order statistics. Goodness-of-fit testing with the one-sample Kolmogorov-Smirnov test. Homogeneity testing with the two-sample Kolmogorov-Smirnov test.
  5. Tests of homogeneity. Testing the homogeneity of two independent samples with the Mann-Whitney test, of several independent samples with the Kruskal-Wallis test, of two paired samples with the Wilcoxon test, and of several paired samples with the Friedman test.
  6. Bivariate regression models. Theoretical background: the conditional expectation. Types of bivariate regression: linear regression, polynomial regression, two-parameter regressions that can be reduced to the linear case. Logistic regression. The least squares method. Analysis of variance (ANOVA) to assess the validity of the model. Coefficient of determination (R-squared). Nadaraya method.
  7. Multivariate linear regression. Techniques of model construction. Correlation coefficients: total, multiple, partial. The beta coefficients. The adjusted coefficient of determination. Multicollinearity. Heteroskedasticity. Outlier detection and analysis.
  8. Introduction to Business Intelligence and practical considerations. Data driven solution, CRISP-DM (CRoss Industry Standard Process for Data Mining). Customer Relationship Management (CRM) analytics.
  9. Preparation for enterprise data analytics. Frequent item sets, market basket analysis, association rules and their application in practice.
  10. Supervised machine learning. Evaluation metrics. Profit matrix. Simple algorithms: kNN, naive Bayes, and useful metrics.
  11. Decision trees (DT) and their application to decision making. Some DT algorithms (C4.5, purity measures, splitting, pre- and post-pruning), applications.
  12. Classification and regression. Customer value and churn prediction. Credibility prediction. Prediction in direct marketing campaign.
  13. Customer segmentation and other unsupervised learning problems. K-means clustering variants (bisecting and adaptive). Density-based clustering algorithms (DBSCAN, OPTICS), hierarchical clustering, and their applications.
  14. Open and/or widely used data mining software, model building and practical constraints.
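
As an illustration of the least squares method and the coefficient of determination listed under lecture 6, the following is a minimal sketch in plain Python. The function names fit_line and r_squared and the sample points are hypothetical and are not taken from the course material.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x and the cross-deviation sum of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Made-up sample points that lie exactly on y = 1 + 2x.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)
print(slope, intercept, r_squared(xs, ys, slope, intercept))
```

Because the sample points lie exactly on a line, the fit recovers that line and R-squared equals 1; with noisy data R-squared falls below 1.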

 

       Practical lessons

 

  1. Introduction to statistical software packages. Definition of basic statistics and their interpretation. Plots: scatter plot, box plot, pie chart, histogram, P-P and Q-Q plots, dot plot. Confidence intervals, parametric tests on economic data sets.
  2. Statistical hypothesis testing on enterprise and business data sets. Tests of independence and homogeneity, significance testing. Graphical and statistical analysis.
  3. Regression analysis on business data sets. Model building and constraints. Evaluation: multicollinearity, heteroscedasticity, sensitivity and outliers.
  4. Statistical data mining. Decision making with basic algorithms for credibility analysis.
  5. Frequent item sets over a bookstore dataset. Various quality measures of association rules (“list”) and their connection to decision trees.
  6. Modeling user behavior in a webshop and predicting whether the user is going to buy something.
  7. Segmentation of customers according to past activities, group work with available tools.
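
The quality measures of association rules mentioned in practical lesson 5 can be sketched as follows. This is a minimal illustration of three standard measures (support, confidence, lift), assuming transactions are given as sets of item names; the function names and the tiny bookstore-style baskets are made up for the example and are not the course dataset.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item of the itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


def lift(transactions, antecedent, consequent):
    """Confidence divided by the support of the consequent (1 means independence)."""
    return confidence(transactions, antecedent, consequent) / support(transactions, consequent)


# Hypothetical baskets, for illustration only.
baskets = [
    {"novel", "cookbook"},
    {"novel"},
    {"novel", "cookbook", "atlas"},
    {"atlas"},
]
print(support(baskets, {"novel", "cookbook"}))
print(confidence(baskets, {"novel"}, {"cookbook"}))
print(lift(baskets, {"novel"}, {"cookbook"}))
```

A lift above 1 suggests that the antecedent and the consequent occur together more often than independence would predict.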
The lectures aim to teach master's students the fundamental methods of statistics and business data mining over the semester. The practical sessions present application examples drawn from real problems in analysis and problem solving supported by data-intensive computing. Skills and abilities gained: students will be able to recognise problems that business intelligence can solve, to solve them at a high level with the statistical and data mining tools used in the corporate sector, and to access corporate customer data and integrate it with other corporate data in order to plan and deliver profit-oriented analytical solutions.

Learning outcomes

This subject develops the following competencies defined in the KKK decree (training and outcome requirements):

Knowledge

No learning outcomes recorded.

Skills

No learning outcomes recorded.

Attitudes

No learning outcomes recorded.

Autonomy and responsibility

No learning outcomes recorded.

Teaching methodology

 3 lectures/week + 1 computer practice (in 7 double hours)

Learning support materials

Online resources
P. Tan, M. Steinbach, V. Kumar: Introduction to Data Mining, Addison-Wesley, 2006, cloth, 769 pp., ISBN-10: 0321321367, ISBN-13: 9780321321367; http://www-users.cs.umn.edu/~kumar/dmbook/index.php
Leskovec, Rajaraman, Ullman: Mining of Massive Datasets; http://infolab.stanford.edu/~ullman/mmds.html

Recommended preliminary knowledge for completing the subject

Knowledge type competencies
(the set of prior knowledge that is not mandatory but greatly helps in completing the subject successfully)
Algorithms  
Skill type competencies
(the set of prior abilities and skills that are not mandatory but greatly help in completing the subject successfully)
none
Recommended (non-compulsory) preliminary competencies
(the set of recommended (non-compulsory) competencies to be acquired beforehand that contribute significantly to completing the subject successfully)
Algorithms  
General rules
Requirements:
a.) In the academic term: attendance is mandatory at all computer seminars. Students should arrive prepared, based on the lectures and the previously handed-out schedule. A test at the beginning of each laboratory assesses the preparation. Records of the work have to be kept and handed in at the end of the laboratory session. Participation and the records are marked. Requirement for obtaining a signature: participation in at least 70% of the seminars. Course work is optional; additional points can be awarded when it reaches a sufficient level.
b.) In the exam period: written final exam. Determination of the exam mark: 50% is the average of the top 70% of the laboratory marks and 50% is the written exam mark, provided the written exam is passed.
Additional possibilities: computer seminars cannot be made up.
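
The exam mark rule above can be sketched as a short calculation. This is only an illustration under stated assumptions, not the official grading procedure: the source does not say how "top 70%" is rounded (rounded up to whole laboratories here), what the pass mark of the written exam is (2 on a 1 to 5 scale is assumed), or how the combined value is rounded (left unrounded here). The function name exam_mark and the sample marks are hypothetical.

```python
def exam_mark(lab_marks, written_exam_mark, pass_mark=2.0):
    """Combine laboratory marks and the written exam into one unrounded mark."""
    if not lab_marks:
        raise ValueError("at least one laboratory mark is required")
    if written_exam_mark < pass_mark:
        # Assumption: a failed written exam yields no combined mark.
        return None
    # Assumption: "top 70%" is rounded up to a whole number of laboratories.
    # Integer arithmetic for ceil(0.7 * n) avoids floating-point surprises.
    keep = (7 * len(lab_marks) + 9) // 10
    best = sorted(lab_marks, reverse=True)[:keep]
    lab_average = sum(best) / len(best)
    # Equal weights: 50% laboratory average, 50% written exam.
    return 0.5 * lab_average + 0.5 * written_exam_mark


# Hypothetical marks for seven laboratory sessions and a written exam.
print(exam_mark([5, 4, 3, 5, 2, 4, 1], 4))
```

With seven laboratory sessions, this assumption keeps the best five marks.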
Assessment methods
In-term assessments

No detailed assessments provided.

Weight of in-term assessments

No weights provided.

Exam-period assessments

No detailed assessments provided.

Weight of exam elements

No weights provided.

Grade calculation

No grade thresholds provided.

Attendance requirements

No attendance requirements provided.

Rules for retake and resubmission

Not provided.

Short description

Not provided.

Detailed description

Not provided.

Recommended courses

Not provided.

Workload to complete the subject

No workload breakdown provided.

Validity of subject requirements
Requirements valid from:
Requirements valid until:
Curriculum placement

No curriculum placements recorded for this subject version.