ETL concepts
What is fact? Why do we need Staging table? What is dimension table and how it is related to fact table? What is surrogate key and why it is needed? Slowly Changing Dimension Types Difference between o Star Schema and Snow Flake Schema and it’s normalized or denormalized o Database and Data warehouse o Data mart and Data warehouse o OLTP and OLAP Explain Transformations and give some examples Data load types - Full load/Initial load/Historic load, Incremental load..
Database
Normalization vs Denormalization and examples Views vs Materialized views Clustered and Non-clustered index Explain each type of SQL Statements - DDL, DML, TCL Difference between o unique key and primary key o primary key and foreign key
SQL
Fetch nth highest (eg: 5th highest) salary - dense_rank() vs rank() vs row_number() functions There would be questions to write SQLs using aggregate functions (max(),min(),sum(),count(),avg()..), group by and having, joins-inner join/left outer join/right outer join/full outer join/self join How would you find if a table has duplicate data? Display unique records from a table. Union vs Union All
Testing Process
Others
Difference between o Test Plan vs Test Strategy o Test Scenarios vs Test Cases o Regression Testing vs Retesting Defect Life Cycle Which STLC model (V model, Agile, etc..) is followed in your project? Explain about it. Which test management and defect tracking tool you are using (HP ALM, JIRA, etc..) How would you ensure that you covered all the requirements in your testing? (RTM)
List down the test scenarios which you execute for an ETL Project Explain your recent ETL Project architecture How do you compare data between source and staging or staging and fact? How do you check data if the source is flat file? How do you execute ETL job? Is there any job scheduling tool you have experience with? What’s you role and responsibilities?