It’s more convenient to store the data in your MySQL instead of Hive to improve the efficiency and facilitate retrieval of the data if the amount of data is not that large. Here are some tips about SQL, according to IBM Databases and SQL for Data Science
using string pattern: like ‘%%’
using a range: between … and …
using a set of values: in (‘’, ‘’)
Sorting: order by … / order by … desc / order by 2(column number)
Eliminating Duplicates: distinct
It seems like you don’t need to write distinct(), distinct also works.
Group by clause
Restricting the result set - Having clause: Having
works only with the GROUP BY clause.
Aggregate Functions: sum(), min(), max(), avg()
Scaler and String functions: round(), length(), ucase, lcase
Date and Time functions: year(), month(), day(), dayofmonth(), dayofweek(), dayofyear(), week(), hour(), minute(), second()
Date or time arithmetic: + 1 Days, CURRENT_DATE, CURRENT_TIME
Accessing multiple tables with with Implicit join:
select * from employees E, departments D where E.DEP_ID = D.DEPT_ID_DEP;