Tag Archives: SQL

Solve ERROR 1136 (21S01) in Mysql: Column count doesn’t match value count at row 1

The problem is that when a new row is inserted, an error is returned.
type:

mysql> insert into students values(1,"goutou",18,"male",12121);

Returns:

ERROR 1136 (21S01): Column count doesn't match value count at row 1

check statement is ok. Considering the previous operation of deleting a column, two new columns are added, and the input is:

mysql> insert into students(id,name,age,gender,clc_id) values(0,"zhutou",12,"male",12121);
Query OK, 1 row affected (0.06 sec)

 

How to quickly locate the problem in DB2-407 error reporting SQLCODE: -407, SQLSTATE: 23502

SQLCODE: -407, SQLSTATE: 23502: The exact description of this error is such that a NULL value cannot be inserted into a column defined as NOT NULL. This is a violation of the integrity constraint exception.

in development, this error is something that we often encounter. The problem is simply, “Columns that cannot be empty are empty.” For a table with a large number of fields, it is difficult to sift. How to accurately

positioning?It’s actually pretty simple.

We can learn more from the error message, which can help us pinpoint the problem. Common error messages are as follows:

com.ibm.db2.jcc.b.SqlException: DB2 SQL error: SQLCODE: -407, SQLSTATE: 23502, SQLERRMC: TBSPACEID=2, TABLEID=201, COLNO=3
	at com.ibm.db2.jcc.b.sf.d(sf.java:1396)
	at com.ibm.db2.jcc.c.jb.l(jb.java:356)
	at com.ibm.db2.jcc.c.jb.a(jb.java:64)
	at com.ibm.db2.jcc.c.w.a(w.java:48)
	at com.ibm.db2.jcc.c.dc.c(dc.java:312)
	at com.ibm.db2.jcc.b.tf.cb(tf.java:1723)
	at com.ibm.db2.jcc.b.tf.d(tf.java:2315)
	at com.ibm.db2.jcc.b.tf.Z(tf.java:1326)
	at com.ibm.db2.jcc.b.tf.execute(tf.java:1310)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)

Message: Integrity constraint exception (  
--- The error occurred in D:\------- path omitted-------.  
--- The error occurred while applying a parameter map.  
--- Check the aaa.sql-InlineParameterMap.  
--- Check the statement (update failed).  
--- Cause: com.ibm.db2.jcc.b.SqlException: DB2 SQL error: SQLCODE: -407, SQLSTATE: 23502, SQLERRMC: TBSPACEID=2, TABLEID=201, COLNO=3)

observe this error message carefully: SQLCODE: -407, SQLSTATE: 23502, SQLERRMC: TBSPACEID=2, TABLEID=201, COLNO=3. You can see that the error message has been located very well

is clear. We only need to query in syscat.columns of the system table. The syscat.columns table is a system table that holds detailed information about the COLUMNS of all tables in a DB2 database. We just need the basis

view, table name, column number can be queried to find which column. COLNO=3 in the error message above is the column number for which the error was reported. As follows:

SELECT 
	* 
FROM 
	SYSCAT.COLUMNS 
WHERE 
	TABSCHEMA = 'DB' AND 
	TABNAME = 'TT_PARAM_LOG' AND 
	COLNO = '3'

 

SQL Server 2008 18456 error and SA unable to login solution

SQL Server 2008 does not allow remote connection by default, and SA account is disabled by default. If you want to use SSMS to connect to SQL Server 2008 on remote server locally, you need to do two parts of configuration:

SQL Server 2008 does not allow remote connection by default, and SA account is disabled by default. If you want to use SSMS to connect to SQL Server 2008 on remote server locally, you need to do two parts of configuration:

1. SQL Server Management Studio Express (SSMS)

2. SQL Server Configuration Manager/SQL Server Configuration Manager (SSCM)

There are two points to note:

12005’s peripheral application configurator changed place in 2008

2. Some settings take effect and need to restart SQL server

Step 1: open SSMS, connect to database with windows identity, log in, right-click and select “properties”

Step 2: select “security” on the left and “SQL server and windows authentication mode” on the right to enable mixed login mode

Step 3: select “connect”, check “allow remote connection to this server”, and then click “OK”

Step 4: expand “security” & gt; “login” & gt; “Sa”, right-click and select “properties”

Step 5: select “general” on the left and “SQL server authentication” on the right, and set the password

Step 6: select “status”, select “enable”, and click “OK”

Step 7: right click the database and select aspect

Step 8: set “server configuration” and “remoteaccessenabled” property to “true”, and click “OK”

Step 9: so far, SSMS has been set up. Exit first, stop and restart SQL Server service, and then log in with SA. Success means that the SA account has been enabled

 

Step 10: start to configure SSCM

Start SQL Server Configuration Manager in windows menu

Select “SQL Server service” on the left and make sure that “SQL Server” and “SQL server browser” on the right are running (if not, go to “Microsoft SQL Server 2012localdb” and re open SQL configuration manager)

Step 11: the following is the English version of the interface. As shown in the figure: the TCP/IP on the right is “disabled” by default. Double click to open the settings panel and change it to “enabled”

Step 12: select the “IP addresses” tab and set the port “1433”

Step 13: as shown in the figure, change “TCP/IP” of “client protocols” to “enabled”

Step 14: after configuration, restart SQL Server 2008.

At this point should be able to use, but still want to confirm the firewall.

After SQL Server 2005/2008 Express is installed, remote connection is disabled by default. If remote access is required, manual configuration is required.

Turn on firewall settings. take SQLServr.exe (C:\Program Files\Microsoft SQL Server\MSSQL10.SQLEXPRESS\MSSQL\Binn\ sqlservr.exe )Add to the allowed list.

More articles, related information, and so on, WeChat official account: fish science and technology information. Or click the link: http://mp.weixin.qq.com/mp/homepage?__ biz=MzU5MjY1ODg0NA==&hid=3&sn=c8b9f858ce938d785e3f799a8cb59574&scene=18#wechat_ redirect 

Lock mechanism in Oracle

In order to explore lock, I set up a test table test, the structure and test data are as follows:
sql & gt; select * from test;

        ID NAME
———- ———-
         6 yoxi
         2 robinson
         2 tttt
         3 david
         4 jason
         5 lucy
         1 test

7 rows selected.

Row share if row share locking is performed on a table, other sessions cannot perform exclusive locking on the table, but they can perform any other locking. That is to say, if a session performs row share lock on the table, other sessions can be executed except drop command.
In session 1:
sql & gt; lock table test in row share mode;
the table is locked.
Session 2:
sql & gt; lock table test in row share mode;
the table is locked.
SQL & gt; commit;
commit completed.
SQL & gt; lock table test in row exclusive mode;
the table is locked.
SQL & gt; commit;
commit completed.
SQL & gt; lock table test in share mode;
the table is locked.
SQL & gt; commit;
submit complete
sql & gt; lock table test in exclusive mode nowait;
lock table test in exclusive mode nowait
*
error in line 1:
ora-00054: the resource is busy, but the specified nowait method is used to obtain the resource
sql & gt; update test set id = 10;
7 lines have been updated.
SQL & gt; rollback;
rollback completed.
SQL & gt; drop table test;
drop table test
*
error in line 1:
ora-00054: the resource is busy, but it is specified to obtain the resource by nowait// in fact, the drop table command will perform exclusive locking on the table
row exclusive when executing insert, update, Delete will obtain a row exclusive lock, which has all the characteristics of row share. If a session obtains a row exclusive lock, then other sessions cannot perform exclusive lock on the table or share lock on the table. DML operation can be performed on the table, but drop is not allowed;

In session 1, the
sql & gt; lock table test in row exclusive mode;
table is locked.
Session 2:
sql & gt; lock table test in row share mode;
the table is locked.
SQL & gt; commit;
commit completed.
SQL & gt; lock table test in row exclusive mode;
the table is locked.
SQL & gt; commit;
commit completed.
SQL & gt; lock table test in share mode nowait;
lock table test in share mode nowait
*
error in line 1:
ora-00054: the resource is busy, but it is specified to get the resource in nowait mode
sql & gt; lock table test in exclusive mode nowait;
Lock table test in exclusive mode nowait
*
error in line 1:
ora-00054: the resource is busy, but the nowait mode is specified to obtain the resource

Share (shared lock) when a session has a share lock on a table, other sessions cannot lock the table in exclusive, row exclusive mode. In other words, if row exclusive is not available, insert, update, delete and drop operations cannot be performed.
In session 1, the
sql & gt; lock table test in share mode;
table is locked.
In session 2, the
sql & gt; lock table test in row share mode;
table is locked.
SQL & gt; lock table test in row exclusive mode nowait;
error in line 1:
ora-00054: the resource is busy, but the specified nowait method is used to get the resource
sql & gt; lock table test in share mode;
the table is locked.
SQL & gt; commit;
commit completed.
SQL & gt; lock table test in exclusive mode nowait;
lock table test in exclusive mode nowait
*
error in line 1:
ora-00054: the resource is busy, but it is specified to get the resource in nowait mode

Exclusive (exclusive lock) when the drop command is executed, an exclusive lock is generated. As the name suggests, as long as a session has an exclusive lock on a table, other sessions cannot lock the table.
In session 1, the
sql & gt; lock table test in exclusive mode;
table is locked.
In session 2,
sql & gt; lock table test in row share mode nowait;
lock table test in row share mode nowait
*
there is an error in line 1:
ora-00054: the resource is busy, but the specified nowait method is used to obtain the resource
other locks do not need to be tested, because row share cannot obtain other locks, let alone other locks

Share row exclusive: if a session obtains a share row exclusive lock, other sessions can only obtain the row share lock on the table, that is to say, other sessions can’t perform insert, update, delete or drop operations on the table. Row share locking is good for nothing except preventing drop…
sql & gt; lock table test in share in session 1 Row exclusive mode;
the table is locked.
In session 2
sql & gt; lock table test in row share mode nowait;
the table is locked.
SQL & gt; commit;
commit completed.
SQL & gt; lock table test in row exclusive mode nowait;
lock table test in row exclusive mode nowait
*
error in line 1:
ora-00054: the resource is busy, but it is specified to get the resource in nowait mode
sql & gt; lock table test in share mode nowait;
Error in lock table test in share mode nowait
*
line 1:
ora-00054: the resource is busy, but the specified nowait method is used to obtain the resource
sql & gt; lock table test in exclusive mode nowait;
error in lock table test in exclusive mode nowait
*
line 1:
ora-00054: the resource is busy, However, it is specified to obtain resources in nowait mode

To sum up:

1. If a table is locked by a session, no matter what the lock is, the other session will not be able to obtain the exclusive lock for the table, that is, the table cannot be deleted. In fact, it is very easy to be the same. If someone is using that table, you must delete it

2. If row exclusive locking can be performed on a table, DML operation can certainly be performed

3. Share lock can be understood in this way. As the name suggests, share lock is used for sharing. Since it is shared, other users can only view it and cannot change it.

4. We can understand exclusive lock in this way. As the name suggests, exclusive lock is exclusive. Other users can only view it and cannot modify it.

Note: the lock command above can only get TM (table) lock

There are two kinds of locks generated by update, delete and insert. One is TM lock, the other is TX lock. TX lock means transaction lock

Select….. For update will also get a TX lock and a row exclusive lock
as well

Reprint address: http://blog.csdn.net/robinson1988/article/details/4675371

It’s time to upgrade your parquet: IOException: totalvaluecount = = 0

This article is from Huawei cloud community “your parquet should be upgraded: IOException: totalvaluecount = = 0 problem positioning Tour”, original author: wzhfy.

1. Problem description

When using spark SQL to perform ETL task, an error is reported when reading a table: “IOException: totalvaluecount = = 0”, but there is no exception when writing the table.

2. Preliminary analysis

The result of this table is generated after two tables join. After analysis, the result of join produces data skew, and the skew key is null. After join, each task writes a file, so the task whose partition key is null writes a large number of null values to a file, and the number of null values reaches 2.2 billion.

The figure of 2.2 billion is sensitive, just exceeding the maximum value of int 2147483647 (more than 2.1 billion). Therefore, it is suspected that parquet is writing more than int.max There’s a problem with a value.

[note] this paper only focuses on the problem that a large number of null values are written to the same file, resulting in an error when reading. As for whether it is reasonable to generate such a large number of nulls in this column, it is beyond the scope of this paper.

3. Deep dive into parquet (version 1.8.3, some contents may need to be understood in combination with parquet source code)

Entry: Spark (spark 2.3) – & gt; parquet

The parquet call entry is in spark, so the call stack is mined from spark.

InsertIntoHad oopFsRelationCommand.run ()/ SaveAsHiveFile.saveAsHiveFile () -> FileFormatWriter.write ()

There are several steps

    before starting a job, create an outputwriterfactory: ParquetFileFormat.prepareWrite ()。 A series of configuration information related to parquet writing files will be set here. The main one is to set the writesupport class ParquetOutputFormat.setWriteSupportClass (job, classof [parquetwritesupport]), parquetwritesupport is a class defined by spark itself. In executetask () – & gt; writeTask.execute In (), first create the outputwriter (parquetoutputwriter) through the outputwriterfactory: outputWriterFactory.newInstance ()。 For each row of records, use ParquetOutputWriter.write The (internalrow) method writes the parquet file in turn. Before the task ends, call ParquetOutputWriter.close () shut down resources.

3.1 write process

In parquetoutputwriter, through the ParquetOutputFormat.getRecordWriter Construct a recordwriter (parquet recordwriter), which includes:

Writesupport set when

    preparewrite(): responsible for converting spark record and writing to parquet structure parquetfilewriter: responsible for writing to file

    In parquetrecordwriter, the write operation is actually delegated to an internalwriter (internal parquetrecordwriter, constructed with writesupport and parquetfilewriter).

    Now let’s sort out the general process so far:

    single directory writetask/dynam icPartitionWriteTask.execute
    -> ParquetOutputWriter.write -> ParquetRecordWriter.write -> Interna lParquetRecordWriter.write

    Next, interna lParquetRecordWriter.write There are three things in it

    (1) writeSupport.write , i.e ParquetWriteSupport.write There are three steps

        MessageColumnIO.MessageColumnIORecordConsumer .startMessage; ParquetWriteSupport.writeFields : write the value of each column in a row, except null value; MessageColumnIO.MessageColumnIORecordConsumer . endmessage: write null value for missing fields in the second step.
        Columnwriterv1. Writenull – & gt; accountforvaluewritten:
        1) increase the counter valuecount (int type)
        2) to check whether the space is full, writepage – checkpoint 1 is required

        (2) Increase counter RecordCount (long type)

        (3) Check the block size to see if flushrowgrouptostore – checkpoint 2 is required

        Since all the written values are null and the memsize of 1 and 2 checkpoints is 0, page and row group will not be refreshed. As a result, null values are always added to the same page. The counter valuecount of columnwriterv1 is of type int, when it exceeds int.max The overflow becomes a negative number.

        Therefore, flushrowgrouptostore is executed only when the close() method is called (at the end of the task):
        the ParquetOutputWriter.close -> ParquetRecordWriter.close
        -> Interna lParquetRecordWriter.close -> flushRowGroupToStore
        -> ColumnWriteStoreV1.flush -> for each column ColumnWriterV1.flush

        Page will not be written here because valuecount overflow is negative.

        Because writepage has not been called, the totalvaluecount here is always 0.
        ColumnWriterV1.writePage -> C olumnChunkPageWriter.writePage -&Value total

        At the end of the write, interna lParquetRecordWriter.close -> flushRowGroupToStore -> Colum nChunkPageWriteStore.flushToFileWriter -> for each column C olumnChunkPageWriter.writeToFileWriter :

          ParquetFileWriter.startColumn : totalvaluecount is assigned to currentchunkvalueco untParquetFileWriter.writeDataPagesParquetFileWriter . endcolumn: currentchunk valuecount (0) and other metadata information construct a columnchunk metadata, and the relevant information will be written to the file eventually.

        3.2 read process

        Also, take spark as the entry to view.
        Initialization phase: ParquetFileFormat.BuildReaderWithPartitionValues -> Vectorize dParquetRecordReader.initialize -> ParquetFileReader.readFooter -> Parq uetMetadataConverter.readParquetMetadata -> fromParquetMetadata -> ColumnChunkMetaData.get , which contains valuecount (0).

        When reading: vectorize dParquetRecordReader.nextBatch -> checkEndOfRowGroup:
        1) ParquetFileReader.readNextRowGroup -> for each chunk, currentRowGroup.addColumn ( chunk.descriptor.col , chunk.readAllPages ())

        Since getvaluecount is 0, pagesinchunk is empty.

        2) Construct columnchunkpagereader:

        Because the page list is empty, the totalvaluecount is 0, resulting in an error in the construction of vectorizedcolumnreader.

        4. Solution: parquet upgrade (version 1.11.1)

        In the new version, ParquetWriteSupport.write ->
        MessageColumnIO.MessageColumnIORecordConsumer .endMessage ->
        ColumnWriteStoreV1(ColumnWriteStoreBase).endRecord:

        In endrecord, the attribute of maximum number of records per page (2W records by default) and the check logic are added. When the limit is exceeded, writepage will be generated, so that the valuecount of columnwriterv1 will not overflow (it will be cleared after each writepage).

        Compared with the old version 1.8.3, columnwritestorev1.endrecord is empty.

        Attachment: a small trick in parquet

        In parquet, in order to save space, when a long type value is within a certain range, int will be used to store it. The method is as follows:

        Determine whether it can be stored with int:

        If you can, use intcolumnchunkmetadata instead of longcolumnchunkmetadata to convert on construction time:

        When you use it, turn it back, in tColumnChunkMetaData.getValueCount -> intToPositiveLong():

        The common int range is – 2 ^ 31 ~ (2 ^ 31 – 1). Because metadata information (such as valuecount) is a non negative integer, it can only store numbers in the 0 ~ (2 ^ 31 – 1) range. In this way, the number in the range of 0 ~ (2 ^ 32 – 1) can be expressed, and the expression range is doubled.

        Attachment: test case code that can be used to reproduce (depending on some spark classes, it can be run in spark project)

        Test case code.txt 1.88kb

         

        Click follow to learn about Huawei’s new cloud technology for the first time~

Hive SQL syntax error and corresponding solutions

Hive SQL syntax is different from the frequently used MySQL syntax. SQL written according to the habit of writing MySQL often reports errors, and it is difficult to see the cause of the problem. Therefore, this paper records the phenomenon of the problem and the solution

If you don’t find any problem with the alias: select from error ‘> select from error = 4200* From a) treror: error while compiling statement: failed: semanticexception [error 10025]: expression not in group by key ID (state = 42000, code = 10025)
cause: fields in the select statement but not in the group by statement will cause the error
solution: change the select id, name from a group by name to select collect_ Set (ID), name from a group by NameError: error while compiling statement: failed: semanticexception [error 10004]: Line 1:13 invalid table alias or column reference ‘ID’:
cause: the corresponding field in the subquery statement has changed, such as using a function or renaming
solution: select id, name from (select collect)_ Set (ID), name from a group by name) t “is changed to” select id, name from (select collect)_ Set (ID) id, name from a group by name) t or select t.id, name from (select collect)_ Set (ID), name from a group by name) tproblem: unable to query data after hive multiple SQL unions
cause: the data after union is saved in HDFS to multiple new directories under the table directory
solution: add configuration (which can be directly input on the CLI command line) set mapred.input.dir .recursive=true;
Or use a select statement to package multiple union statements and then execute hsql on tez to report an error. Out of memory
needs to adjust the size of the container
set hive.tez.container .size=4096;
set hive.tez.java . opts = – xmx3072m; hive does not query subdirectories recursively by default, so when creating a table, if there are subdirectories in the specified directory, it will report ERROR:not a file
You can perform the following four configurations in hive cli to enable recursive access to subdirectories in the callback. Instead of recursive query, all the data under the directory will be loaded in. Therefore, when the subdirectories are very deep or there are many subdirectories, the speed will be very slow.
set hive.input.dir .recursive=true; 
set hive.mapred.supports .subdirectories=true; 
set hive.supports.subdirectories=true ; 
set mapred.input.dir .recursive=true;

[mybatis] error: malformed ognl expression: name! = null and name! = ‘”

1、 Questions

Malformed ognl expression: name! = null and name! = ‘
report an error;
malformed means abnormal;

2、 Solutions and reasons

It turned out to be* Mapper.xml In the document:

<if test ="userId ! null">

The equals sign is missing; the SQL in the automatically generated mybatis file is missing an equals sign

<if test ="userId != null">

Moreover, there should be no space between exclamation mark and equal sign;

MAMP failed to start: Apache could’t be started. Please check your MAMP installation and configuration

Solution 1:

In / applications / MAMP / library / bin, find the file envvars and rename it to_ envvars。

It doesn’t work for me.

Solution 2:

sudo /Applications/MAMP/bin/startApache.sh

or

sudo /Applications/MAMP/Library/bin/apachectl start

There is no display.

Solution 3:

Change document root to a fixed address that you don’t know how to delete.
MAMP > Preferences > Server > Document Root

Still failed.

Solution 4:

sudo /Applications/MAMP/Library/bin/apachectl restart

report errors:

shell-init: error retrieving current directory: getcwd: cannot access parent directories: Operation not permitted
job-working-directory: error retrieving current directory: getcwd: cannot access parent directories: Operation not permitted
httpd not running, trying to start

I solved it lazily

Final solution

I had a video conference with the professor. He asked me to come to him

    delete the installation package, turn off the computer, wait for five minutes, and then turn on the system. If there is an update, update it, and then download the MAMP software again

    Well I guess it’s the system update that works, updating Catalina to Big Sur
    as a person who doesn’t shut down several times a year, and who can delay several versions of the update
    After doing all the above honestly In front of the professor, there was no problem at all, and then he was connected
    for a time, I was a little embarrassed, and I told him by email how many attempts I had made I’m impressed

    reference resources

      Apache server cannot start why won Apache server start in MAMP

Datacamp course: database design

1. Processing, Storing, and Organizing Data
</> OLAP vs. OLTP

Categorize the cards into the approach that they describe best.

OLAP OLTP
Queries a larger amount of data Most likely to have data from the past hour
Helps businesses with decision making and problem solving Typically uses an operational database
Typically uses a data warehouse Data is inserted and updated more often

</> Which is better?
Explore the dataset. What data processing approach is this larger repository most likely using?
OLTP because this table could not be used for any analysis. OLAP because each record has a unique service request number. OLTP because this table’s structure appears to require frequent updates. OLAP because this table focuses on pothole requests only.
</> Name that data type!
Each of these cards hold a type of data. Place them in the correct category.

Unstructured Semi-Structured Structured
To-do notes in a text editor CSVs of open data downloaded from your local government websites A relational database with latest withdrawals and deposits made by clients
Images in your photo library JSON object of tweets outputted in real-time by the Twitter API
Zip file of all text messages ever received

</> Ordering ETL Tasks
In the ETL flow you design, different steps will take place. Place the steps in the most appropriate order.

eCommerce API outputs real time data of transactions

Python script drops null rows and clean data into pre-determined columns

Resulting dataframe is written into an AWS Redshift Warehouse

</> Recommend a storage solution
When should you choose a data warehouse over a data lake?
To train a machine learning model with a 150 GB of raw image data. To store real-time social media posts that may be used for future analysis To store customer data that needs to be updated regularly To create accessible and isolated data repositories for other analysts
</> Classifying data models
Each of these cards hold a tool or concept that fits into a certain type of data model. Place the cards in the correct category.

Conceptual Data Model Logical Data Model Physical Data Model
Gathers business requirements Relational model File structure of data storage
Entities, attributes, and relationships Determining tables and columns

</> Deciding fact and dimension tables
Out of these possible answers, what would be the best way to organize the fact table and dimensional tables?
A fact table holding duration_mins and foreign keys to dimension tables holding route details and week details, respectively. A fact table holding week,month, year and foreign keys to dimension tables holding route details and duration details, respectively. A fact table holding route_name,park_name, distance_km,city_name, and foreign keys to dimension tables holding week details and duration details, respectively.
Create a dimension table called route that will hold the route information.
Create a dimension table called week that will hold the week information.

CREATE TABLE route(
	route_id INTEGER PRIMARY KEY,
    park_name VARCHAR(160) NOT NULL,
    city_name VARCHAR(160) NOT NULL,
    distance_km FLOAT NOT NULL,
    route_name VARCHAR(160) NOT NULL
);

CREATE TABLE week(
	week_id INTEGER PRIMARY KEY,
    week INTEGER NOT NULL,
    month VARCHAR(160) NOT NULL,
    year INTEGER NOT NULL
);

</> Querying the dimensional model
Calculate the sum of the duration_mins column.

SELECT 
	SUM(duration_mins)
FROM 
	runs_fact;

sum
1172.16

Join week_dim and runs_fact.
Get all the week_id’s from July, 2019.

SELECT 
	SUM(duration_mins)
FROM 
	runs_fact
INNER JOIN week_dim ON runs_fact.week_id = week_dim.week_id
WHERE month = 'July' and year = '2019';

sum
381.46

2. Database Schemas and Normalization
</> Running from star to snowflake
After learning about the snowflake schema, you convert the current star schema into a snowflake schema. To do this, you normalize route_dim and week_dim. Which option best describes the resulting new tables after doing this?
The tables runs_fact, route_dim, and week_dim have been loaded.
week_dim is extended two dimensions with new tables for month and year. route_dim is extended one dimension with a new table for city. week_dim is extended two dimensions with new tables for month and year. route_dim is extended two dimensions with new tables for city and park. week_dim is extended three dimensions with new tables for week, month and year. route_dim is extended one dimension with new tables for city and park.
</> Adding foreign keys
In the constraint called sales_book, set book_id as a foreign key.
In the constraint called sales_time, set time_id as a foreign key.
In the constraint called sales_store, set store_id as a foreign key.

-- Add the book_id foreign key
ALTER TABLE fact_booksales ADD CONSTRAINT sales_book
    FOREIGN KEY (book_id) REFERENCES dim_book_star (book_id);
-- Add the time_id foreign key
ALTER TABLE fact_booksales ADD CONSTRAINT sales_time
    FOREIGN KEY (time_id) REFERENCES dim_time_star (time_id);
-- Add the store_id foreign key
ALTER TABLE fact_booksales ADD CONSTRAINT sales_store
    FOREIGN KEY (store_id) REFERENCES dim_store_star (store_id);

</> Extending the book dimension
Create dim_author with a column for author.
Insert all the distinct authors from dim_book_star into dim_author.

CREATE TABLE dim_author (
    author VARCHAR(256)  NOT NULL
);
-- Insert authors into the new table
INSERT INTO dim_author
SELECT DISTINCT author FROM dim_book_star;

Alter dim_author to have a primary key called author_id.
Output all the columns of dim_author.

CREATE TABLE dim_author (
    author varchar(256)  NOT NULL
);
INSERT INTO dim_author
SELECT DISTINCT author FROM dim_book_star;
-- Add a primary key 
ALTER TABLE dim_author ADD COLUMN author_id SERIAL PRIMARY KEY;
-- Output the new table
SELECT * FROM dim_author;

author				author_id
F. Scott Fitzgerald	1
Barack Obama		2
Agatha Christie		3
...

</> Querying the star schema
Select state from the appropriate table and the total sales_amount.
Complete the JOIN on book_id.
Complete the JOIN to connect the dim_store_star table
Conditionally select for books with the genre novel.
Group the results by state.

SELECT dim_store_star.state, SUM(fact_booksales.sales_amount)
FROM fact_booksales
    JOIN dim_book_star on fact_booksales.book_id = dim_book_star.book_id
    JOIN dim_store_star on fact_booksales.store_id = dim_store_star.store_id
WHERE  
    dim_book_star.genre = 'novel'
GROUP BY
    dim_store_star.state;

state		sum
Florida		295594.2
Vermont		216282
Louisiana	176979
...

</> Querying the snowflake schema
Select state from the appropriate table and the total sales_amount.
Complete the two JOINS to get the genre_id’s.
Complete the three JOINS to get the state_id’s.
Conditionally select for books with the genre novel.
Group the results by state.

SELECT dim_state_sf.state, SUM(fact_booksales.sales_amount)
FROM fact_booksales
    JOIN dim_book_sf on fact_booksales.book_id = dim_book_sf.book_id
    JOIN dim_genre_sf on dim_book_sf.genre_id = dim_genre_sf.genre_id
    JOIN dim_store_sf on fact_booksales.store_id = dim_store_sf.store_id 
    JOIN dim_city_sf on dim_store_sf.city_id = dim_city_sf.city_id
	JOIN dim_state_sf on  dim_city_sf.state_id = dim_state_sf.state_id
WHERE  
    dim_genre_sf.genre = 'novel'
GROUP BY
    dim_state_sf.state;

state				sum
British Columbia	374629.2
California			583248.6
Florida				295594.2
...

</> Updating countries
Output all the records that need to be updated in the star schema so that countries are represented by their abbreviations.

SELECT * FROM dim_store_star
WHERE country != 'USA' AND country !='CA';

store_id	store_address		city			state		country
798			23 Jeanne Ave		Montreal		Quebec		Canada
799			56 University St	Quebec City		Quebec		Canada
800			23 Verte Ave		Montreal		Quebec		Canada
...

How many records would need to be updated in the snowflake schema?
18 records 2 records 1 record 0 records
</> Extending the snowflake schema
Add a continent_id column to dim_country_sf with a default value of 1. Note that NOT NULL DEFAULT(1) constrains a value from being null and defaults its value to 1.
Make that new column a foreign key reference to dim_continent_sf’s continent_id.

ALTER TABLE dim_country_sf
ADD continent_id int NOT NULL DEFAULT(1);
-- Add the foreign key constraint
ALTER TABLE dim_country_sf ADD CONSTRAINT country_continent
   FOREIGN KEY (continent_id) REFERENCES dim_continent_sf(continent_id);
-- Output updated table
SELECT * FROM dim_country_sf;

country_id	country		continent_id
1			Canada		1
2			USA			1

</> Converting to 1NF
Does the customers table meet 1NF criteria?
Yes, all the records are unique. No, because there are multiple values in cars_rented and invoice_id No, because the non-key columns such as don’t depend on customer_id, the primary key.
cars_rented holds one or more car_ids and invoice_id holds multiple values. Create a new table to hold individual car_ids and invoice_ids of the customer_ids who’ve rented those cars.
Drop two columns from customers table to satisfy 1NF

-- Create a new table to satisfy 1NF
CREATE TABLE cust_rentals (
  customer_id INT NOT NULL,
  car_id VARCHAR(128) NULL,
  invoice_id VARCHAR(128) NULL
);
-- Drop column from customers table to satisfy 1NF
ALTER TABLE customers
DROP COLUMN cars_rented,
DROP COLUMN invoice_id;

</> Converting to 2NF
Why doesn’t customer_rentals meet 2NF criteria?
Because the end_date doesn’t depend on all the primary keys. Because there can only be at most two primary keys. Because there are non-key attributes describing the car that only depend on one primary key, car_id.
Create a new table for the non-key columns that were conflicting with 2NF criteria.
Drop those non-key columns from customer_rentals.

-- Create a new table to satisfy 2NF
CREATE TABLE cars (
  car_id VARCHAR(256) NULL,
  model VARCHAR(128),
  manufacturer VARCHAR(128),
  type_car VARCHAR(128),
  condition VARCHAR(128),
  color VARCHAR(128)
);
-- Drop columns in customer_rentals to satisfy 2NF
ALTER TABLE customer_rentals
DROP COLUMN model,
DROP COLUMN manufacturer, 
DROP COLUMN type_car,
DROP COLUMN condition,
DROP COLUMN color;

</> Converting to 3NF
Why doesn’t rental_cars meet 3NF criteria?
Because there are two columns that depend on the non-key column, model. Because there are two columns that depend on the non-key column, color. Because 2NF criteria isn’t satisfied.
Create a new table for the non-key columns that were conflicting with 3NF criteria.
Drop those non-key columns from rental_cars.

-- Create a new table to satisfy 3NF
CREATE TABLE car_model(
  model VARCHAR(128),
  manufacturer VARCHAR(128),
  type_car VARCHAR(128)
);
-- Drop columns in rental_cars to satisfy 3NF
ALTER TABLE rental_cars
DROP COLUMN manufacturer, 
DROP COLUMN type_car;

3. Database Views
</> Tables vs. views

Only Tables Views&Tables Only Views
Part of the physical schema of a database Contains rows and columns Always defined by a query
Can be queried Takes up less memory
Has access control

</> Viewing views
Query the information schema to get views.
Exclude system views in the results.

SELECT * FROM information_schema.views
WHERE table_schema NOT IN ('pg_catalog', 'information_schema');

What does view1 do?

SELECT content.reviewid,
content.content
FROM content
WHERE (length(content.content) > 4000);

Returns the content records with reviewids that have been viewed more than 4000 times. Returns the content records that have reviews of more than 4000 characters. Returns the first 4000 records in content.
What does view2 do?

SELECT reviews.reviewid,
reviews.title,
reviews.score
FROM reviews
WHERE (reviews.pub_year = 2017)
ORDER BY reviews.score DESC
LIMIT 10;

Returns 10 random reviews published in 2017. Returns the top 10 lowest scored reviews published in 2017. Returns the top 10 highest scored reviews published in 2017.
</> Creating and querying a view
Create a view called high_scores that holds reviews with scores above a 9.

CREATE VIEW high_scores AS
SELECT * FROM reviews
WHERE score > 9;

Count the number of records in high_scores that are self-released in the label field of the labels table.

CREATE VIEW high_scores AS
SELECT * FROM REVIEWS
WHERE score > 9;
-- Count the number of self-released works in high_scores
SELECT COUNT(*) FROM high_scores
INNER JOIN labels ON high_scores.reviewid = labels.reviewid
WHERE label = 'self-released';

count
3

</> Creating a view from other views
Create a view called top_artists_2017 with one column artist holding the top artists in 2017.
Join the views top_15_2017 and artist_title.
Output top_artists_2017.

CREATE VIEW top_artists_2017 AS
SELECT artist_title.artist FROM top_15_2017
INNER JOIN artist_title
ON top_15_2017.reviewid = artist_title.reviewid;
-- Output the new view
SELECT * FROM top_artists_2017;

artist
massive attack
krallice
uranium club
...

Which is the DROP command that would drop both top_15_2017 and top_artists_2017?
DROP VIEW top_15_2017 CASCADE; DROP VIEW top_15_2017 RESTRICT; DROP VIEW top_artists_2017 RESTRICT; DROP VIEW top_artists_2017 CASCADE;
</> Granting and revoking access
Revoke all database users’ update and insert privileges on the long_reviews view.
Grant the editor user update and insert privileges on the long_reviews view.

REVOKE update, insert ON long_reviews FROM PUBLIC; 
GRANT update, insert ON long_reviews TO editor; 

</> Updatable views
Which views are updatable?
long_reviews and top_25_2017 top_25_2017 long_reviews top_25_2017 and artist_title

SELECT * FROM information_schema.views
WHERE table_schema NOT IN ('pg_catalog', 'information_schema');

</> Redefining a view
Can the CREATE OR REPLACE statement be used to redefine the artist_title view?
Yes, as long as the label column comes at the end. No, because the new query requires a JOIN with the labels table. No, because a new column that did not exist previously is being added to the view. Yes, as long as the label column has the same data type as the other columns in artist_title
Redefine the artist_title view to include a column for the label field from the labels table.

CREATE OR REPLACE VIEW artist_title AS
SELECT reviews.reviewid, reviews.title, artists.artist, labels.label
FROM reviews
INNER JOIN artists
ON artists.reviewid = reviews.reviewid
INNER JOIN labels
ON labels.reviewid = reviews.reviewid;

SELECT * FROM artist_title;

reviewid	title					artist			label
22703		mezzanine				massive attack	virgin
22721		prelapsarian			krallice		hathenter
22659		all of them naturals	uranium club	fashionable idiots
...

</> Materialized versus non-materialized
Organize these characteristics into the category that they describe best.

Non-Materialized Views Non-Materialized&Materialized Views Materialized Views
Always turns up-to-date data Can be used in a data warehouse Stores the query result on disk
Better to use on write-intensive databases Helps reduce the overhead of writing queries Consumes more storage

</> Creating and refreshing a materialized view
Create a materialized view called genre_count that holds the number of reviews for each genre.
Refresh genre_count so that the view is up-to-date.

CREATE MATERIALIZED VIEW genre_count AS
SELECT genre, COUNT(*) 
FROM genres
GROUP BY genre;

INSERT INTO genres
VALUES (50000, 'classical');
-- Refresh genre_count
REFRESH MATERIALIZED VIEW genre_count;

SELECT * FROM genre_count;

</> Managing materialized views
Why do companies use pipeline schedulers, such as Airflow and Luigi, to manage materialized views?
To set up a data warehouse and make sure tables have the most up-to-date data. To refresh materialized views with consideration to dependences between views. To convert non-materialized views to materialized views. To prevent the creation of new materialized views when there are too many dependencies.
4. Database Management
</> Create a role
Create a role called data_scientist.

CREATE ROLE data_scientist;

Create a role called marta that has one attribute: the ability to login (LOGIN).

CREATE ROLE marta LOGIN;

Create a role called admin with the ability to create databases (CREATEDB) and to create roles (CREATEROLE).

CREATE ROLE admin WITH CREATEDB CREATEROLE;

</> GRANT privileges and ALTER attributes
Grant the data_scientist role update and insert privileges on the long_reviews view.
Alter Marta’s role to give her the provided password.

GRANT UPDATE, INSERT ON long_reviews TO data_scientist;

ALTER ROLE marta WITH PASSWORD '[email protected]';

</> Add a user role to a group role
Add Marta’s user role to the data scientist group role.
Celebrate! You hired multiple data scientists.
Remove Marta’s user role from the data scientist group role.

GRANT data_scientist TO marta;

REVOKE data_scientist FROM marta;

</> Reasons to partition
In the video, you saw some very good reasons to use partitioning. However, can you find which one wouldn’t be a good reason to use partitioning?
Improve data integrity Save records from 2017 or earlier on a slower medium Easily extend partitioning to sharding, and thus making use of parallelization
</> Partitioning and normalization
Can you classify the characteristics in the correct bucket?

Normalization Vertical Partitioning Horizontal Partitioning
Reduce redundancy in table Move specific columns to slower medium Sharding is an extension on this, using multiple machines
Changes the logical data model (Example) Move the third and fourth column to separate table (Examples) Use the timestamp to move rows from Q4 in a specific table

</> Creating vertical partitions
Create a new table film_descriptions containing 2 fields: film_id, which is of type INT, and long_description, which is of type TEXT.
Occupy the new table with values from the film table.

CREATE TABLE film_descriptions (
    film_id INT,
    long_description TEXT
);
-- Copy the descriptions from the film table
INSERT INTO film_descriptions
SELECT film_id, long_description FROM film;

Drop the field long_description from the film table.
Join the two resulting tables to view the original table.

CREATE TABLE film_descriptions (
    film_id INT,
    long_description TEXT
);
-- Copy the descriptions from the film table
INSERT INTO film_descriptions
SELECT film_id, long_description FROM film;
-- Drop the descriptions from the original table
ALTER TABLE film DROP COLUMN long_description;
-- Join to view the original table
SELECT * FROM film_descriptions 
JOIN film
ON film_descriptions.film_id = film.film_id;

film_id	long_description																						film_id	title				rental_duration		rental_rate	length	replacement_cost	rating	release_year
1		A Epic Drama of a Feminist And a Mad Scientist who must Battle a Teacher in The Canadian Rockies		1		ACADEMY DINOSAUR	6					0.99		86		20.99				PG		2019
2		A Astounding Epistle of a Database Administrator And a Explorer who must Find a Car in Ancient China	2		ACE GOLDFINGER		3					4.99		48		12.99				G		2017
3		A Astounding Reflection of a Lumberjack And a Car who must Sink a Lumberjack in A Baloon Factory		3		ADAPTATION HOLES	7					2.99		50		18.99				NC-17	2019
...

</> Creating horizontal partitions
Create the table film_partitioned, partitioned on the field release_year.

CREATE TABLE film_partitioned (
  film_id INT,
  title TEXT NOT NULL,
  release_year TEXT
)
PARTITION BY RANGE (release_year);

Create three partitions: one for each release year: 2017, 2018, and 2019. Call the partition for 2019 film_2019, etc.

CREATE TABLE film_partitioned (
  film_id INT,
  title TEXT NOT NULL,
  release_year TEXT
)
PARTITION BY LIST (release_year);
-- Create the partitions for 2019, 2018, and 2017
CREATE TABLE film_2019
	PARTITION OF film_partitioned FOR VALUES IN ('2019');
CREATE TABLE film_2018
	PARTITION OF film_partitioned FOR VALUES IN ('2018');  
CREATE TABLE film_2017
	PARTITION OF film_partitioned FOR VALUES IN ('2017');

Occupy the new table the three fields required from the film table.

CREATE TABLE film_partitioned (
  film_id INT,
  title TEXT NOT NULL,
  release_year TEXT
)
PARTITION BY LIST (release_year);
-- Create the partitions for 2019, 2018, and 2017
CREATE TABLE film_2019
	PARTITION OF film_partitioned FOR VALUES IN ('2019');
CREATE TABLE film_2018
	PARTITION OF film_partitioned FOR VALUES IN ('2018');
CREATE TABLE film_2017
	PARTITION OF film_partitioned FOR VALUES IN ('2017');
-- Insert the data into film_partitioned
INSERT INTO film_partitioned
SELECT film_id, title, release_year FROM film;
-- View film_partitioned
SELECT * FROM film_partitioned;

film_id	title				release_year
2		ACE GOLDFINGER		2017
4		AFFAIR PREJUDICE	2017
5		AFRICAN EGG			2017
...

</> Data integration do’s and dont’s
Categorize the following items as being True or False when talking about data integration.

False True
Everybody should have access to sensitive data in the final view. You should be careful choosing a hand-coded solution because of maintenance cost.
All your data has to be updated in real time in the final view. Being able to access the desired data through a single view does not mean all data is stored together.
Automated testing and proactive alerts are not needed. Data in the final view can be updated in different intervals.
You should choose whichever solution is right for the job right now. Data integration should be business driven, e.g. what combination of data will be useful for the business.
After data integration all your data should be in a single table. My source data can be stored in different physical locations.
Your data integration solution, hand-coded or ETL tool, should work once and then you can use the resulting view to run queries forever. My source data can be in different formats and database management systems.

</> Analyzing a data integration plan
Which risk is not clearly indicated on the data integration plan?
It is unclear if you took data governance into account. You didn’t clearly show where your data originated from. You should indicate that you plan to anonymize patient health records. If data is lost during ETL you will not find out.
</> SQL versus NoSQL
When is it better to use a SQL DBMS?
You are dealing with rapidly evolving features, functions, data types, and it’s difficult to predict how the application will grow over time. You have a lot of data, many different data types, and your data needs will only grow over time. You are concerned about data consistency and 100% data integrity is your top goal. Your data needs scale up, out, and down.
</> Choosing the right DBMS
Categorize the cards into the appropriate DBMS bucket.

SQL NoSQL
A banking application where it’s extremely important that data integrity is ensured. Data warehousing on big data.
A social media tool that provides users with the oppotunities to grow their networks via connections.
During the holiday shopping season, a e-commerce website needs to keep track of millions of shopping carts.
A blog that needs to create and incorporate new types of content, such as images, comments, and videos.

Cause: java.sql.SQLException: invalid column type: 1111 Error [How to Fix]

Warming: Servlet.service() for servlet [spring] in context with path [/*****] threw exception [Request processing failed; nested exception is org.mybatis.spring.MyBatisSystemException: nested exception is org.apache.ibatis.type.TypeException: Could not set parameters for mapping: ParameterMapping{property='ENTERPRISE_NAME', mode=IN, javaType=class java.lang.Object, jdbcType=null, numericScale=null, resultMapId='null', jdbcTypeName='null', expression='null'}. Cause: org.apache.ibatis.type.TypeException: Error setting null for parameter #13 with JdbcType OTHER . Try setting a different JdbcType for this parameter or a different jdbcTypeForNull configuration property. Cause: java.sql.SQLException: Invalid column type: 1111] with root cause
java.sql.SQLException: Invalid column type: 1111
	at oracle.jdbc.driver.OracleStatement.getInternalType(OracleStatement.java:3978)

Screen:

When inserting the name,ENTERPRISE_NAME passes null and mybatis returns an error.
The solution is: