CCA Spark and Hadoop Developer Exam Online Practice
Last updated: November 8, 2024
These online practice questions let you gauge your knowledge of the Cloudera CCA175 exam before deciding whether to register for it.
To pass the exam and save 35% of your preparation time, choose the CCA175 dumps (the latest real exam questions), which currently contain 96 questions and answers.
Answer:
Step 1: Create all three files in HDFS in a directory called spark1 (we will do this using Hue). Alternatively, you can first create the files in the local filesystem and then copy them to HDFS with hdfs dfs -put.
Step 2: Load EmployeeManager.csv file from hdfs and create PairRDDs
val manager = sc.textFile("spark1/EmployeeManager.csv")
val managerPairRDD = manager.map(x => (x.split(",")(0), x.split(",")(1)))
Step 3: Load EmployeeName.csv file from hdfs and create PairRDDs
val name = sc.textFile("spark1/EmployeeName.csv")
val namePairRDD = name.map(x => (x.split(",")(0), x.split(",")(1)))
Step 4: Load EmployeeSalary.csv file from hdfs and create PairRDDs
val salary = sc.textFile("spark1/EmployeeSalary.csv")
val salaryPairRDD = salary.map(x => (x.split(",")(0), x.split(",")(1)))
Step 5: Join all the PairRDDs.
val joined = namePairRDD.join(salaryPairRDD).join(managerPairRDD)
Step 6: Now sort the joined results.
val joinedData = joined.sortByKey()
Step 7: Now generate comma separated data.
val finalData = joinedData.map(v => Seq(v._1, v._2._1._1, v._2._1._2, v._2._2).mkString(","))
Step 8: Save this output in HDFS as a text file.
finalData.saveAsTextFile("spark1/result.txt")
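To confirm the result, you can list the saved output directory and print its part files (a quick optional check, assuming the paths used above):
hdfs dfs -ls spark1/result.txt
hdfs dfs -cat spark1/result.txt/part*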
Answer:
Step 1: Create directory
hdfs dfs -mkdir hdfs_commands
Step 2: Create a file named data.txt in hdfs_commands in HDFS.
hdfs dfs -touchz hdfs_commands/data.txt
Step 3: Now copy this data.txt file on local filesystem, however while copying file please make sure file properties are not changed e.g. file permissions.
hdfs dfs -copyToLocal -p hdfs_commands/data.txt /home/cloudera/Desktop/HadoopExam
Step 4: Now create a file in local directory named data_local.txt and move this file to hdfs in hdfs_commands directory.
touch /home/cloudera/Desktop/HadoopExam/data_local.txt
hdfs dfs -moveFromLocal /home/cloudera/Desktop/HadoopExam/data_local.txt hdfs_commands/
Step 5: Create a file data_hdfs.txt in hdfs_commands directory and copy it to local file system.
hdfs dfs -touchz hdfs_commands/data_hdfs.txt
hdfs dfs -get hdfs_commands/data_hdfs.txt /home/cloudera/Desktop/HadoopExam/
Step 6: Create a file named file1.txt in the local filesystem and put it in HDFS.
touch /home/cloudera/Desktop/HadoopExam/file1.txt
hdfs dfs -put /home/cloudera/Desktop/HadoopExam/file1.txt hdfs_commands/
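As an optional sanity check (paths as assumed above), verify that the files ended up where expected and that the -p flag preserved the file properties:
hdfs dfs -ls hdfs_commands
ls -l /home/cloudera/Desktop/HadoopExam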
Answer:
Step 1: Create a directory.
mkdir /tmp/spooldir2
Step 2: Create the Flume configuration file with the below configuration for the source, sinks and channels, and save it as flume8.conf.
agent1.sources = source1
agent1.sinks = sink1a sink1b
agent1.channels = channel1a channel1b
agent1.sources.source1.channels = channel1a channel1b
agent1.sources.source1.selector.type = replicating
agent1.sources.source1.selector.optional = channel1b
agent1.sinks.sink1a.channel = channel1a
agent1.sinks.sink1b.channel = channel1b
agent1.sources.source1.type = spooldir
agent1.sources.source1.spoolDir = /tmp/spooldir2
agent1.sinks.sink1a.type = hdfs
agent1.sinks.sink1a.hdfs.path = /tmp/flume/primary
agent1.sinks.sink1a.hdfs.filePrefix = events
agent1.sinks.sink1a.hdfs.fileSuffix = .log
agent1.sinks.sink1a.hdfs.fileType = DataStream
agent1.sinks.sink1b.type = hdfs
agent1.sinks.sink1b.hdfs.path = /tmp/flume/secondary
agent1.sinks.sink1b.hdfs.filePrefix = events
agent1.sinks.sink1b.hdfs.fileSuffix = .log
agent1.sinks.sink1b.hdfs.fileType = DataStream
agent1.channels.channel1a.type = file
agent1.channels.channel1b.type = memory
Step 3: Run the below command, which uses this configuration file and appends data in HDFS.
Start flume service:
flume-ng agent --conf /home/cloudera/flumeconf --conf-file /home/cloudera/flumeconf/flume8.conf --name agent1
Step 4: Open another terminal and create a file in /tmp/spooldir2/
echo "IBM,100,20160104" >> /tmp/spooldir2/.bb.txt
echo "IBM,103,20160105" >> /tmp/spooldir2/.bb.txt
mv /tmp/spooldir2/.bb.txt /tmp/spooldir2/bb.txt
After a few minutes:
echo "IBM,100.2,20160104" >> /tmp/spooldir2/.dr.txt
echo "IBM,103.1,20160105" >> /tmp/spooldir2/.dr.txt
mv /tmp/spooldir2/.dr.txt /tmp/spooldir2/dr.txt
Answer:
Step 1: Create the directories.
mkdir -p /tmp/spooldir/bb
mkdir -p /tmp/spooldir/dr
Step 2: Create the Flume configuration file with the below configuration for the sources, sink and channel, and save it as flume7.conf.
agent1.sources = source1 source2
agent1.sinks = sink1
agent1.channels = channel1
agent1.sources.source1.channels = channel1
agent1.sources.source2.channels = channel1
agent1.sinks.sink1.channel = channel1
agent1.sources.source1.type = spooldir
agent1.sources.source1.spoolDir = /tmp/spooldir/bb
agent1.sources.source2.type = spooldir
agent1.sources.source2.spoolDir = /tmp/spooldir/dr
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = /tmp/flume/finance
agent1.sinks.sink1.hdfs.filePrefix = events
agent1.sinks.sink1.hdfs.fileSuffix = .log
agent1.sinks.sink1.hdfs.inUsePrefix = _
agent1.sinks.sink1.hdfs.fileType = DataStream
Step 3: Run the below command, which uses this configuration file and appends data in HDFS.
Start flume service:
flume-ng agent --conf /home/cloudera/flumeconf --conf-file /home/cloudera/flumeconf/flume7.conf --name agent1
Step 4: Open another terminal and create files in /tmp/spooldir/
echo "IBM,100,20160104" >> /tmp/spooldir/bb/.bb.txt
echo "IBM,103,20160105" >> /tmp/spooldir/bb/.bb.txt
mv /tmp/spooldir/bb/.bb.txt /tmp/spooldir/bb/bb.txt
After a few minutes:
echo "IBM,100.2,20160104" >> /tmp/spooldir/dr/.dr.txt
echo "IBM,103.1,20160105" >> /tmp/spooldir/dr/.dr.txt
mv /tmp/spooldir/dr/.dr.txt /tmp/spooldir/dr/dr.txt
Answer:
Step 1: Create a directory.
mkdir /tmp/nrtcontent
Step 2: Create the Flume configuration file with the below configuration for the source, sink and channel, and save it as flume6.conf.
agent1.sources = source1
agent1.sinks = sink1
agent1.channels = channel1
agent1.sources.source1.channels = channel1
agent1.sinks.sink1.channel = channel1
agent1.sources.source1.type = spooldir
agent1.sources.source1.spoolDir = /tmp/nrtcontent
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = /tmp/flume
agent1.sinks.sink1.hdfs.filePrefix = events
agent1.sinks.sink1.hdfs.fileSuffix = .log
agent1.sinks.sink1.hdfs.inUsePrefix = _
agent1.sinks.sink1.hdfs.fileType = DataStream
Step 3: Run the below command, which uses this configuration file and appends data in HDFS.
Start flume service:
flume-ng agent --conf /home/cloudera/flumeconf --conf-file /home/cloudera/flumeconf/flume6.conf --name agent1
Step 4: Open another terminal and create a file in /tmp/nrtcontent
echo "I am preparing for CCA175 from ABCTech m.com " > /tmp/nrtcontent/.he1.txt
mv /tmp/nrtcontent/.he1.txt /tmp/nrtcontent/he1.txt
After a few minutes:
echo "I am preparing for CCA175 from TopTech .com " > /tmp/nrtcontent/.qt1.txt
mv /tmp/nrtcontent/.qt1.txt /tmp/nrtcontent/qt1.txt
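As a rough check that the spooled content reached HDFS, list the sink path configured above and print the event files:
hdfs dfs -ls /tmp/flume
hdfs dfs -cat /tmp/flume/events*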
Answer:
Step 1: Create hive tables flumemaleemployee1 and flumefemaleemployee1.
CREATE TABLE flumemaleemployee1
(
sex_type int, name string, city string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
CREATE TABLE flumefemaleemployee1
(
sex_type int, name string, city string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
Step 2: Create the below directory and change into it.
mkdir /home/cloudera/flumetest/
cd /home/cloudera/flumetest/
Step 3: Create the Flume configuration file with the below configuration for the source, sinks and channels, and save it as flume5.conf.
agent.sources = tailsrc
agent.channels = mem1 mem2
agent.sinks = std1 std2
agent.sources.tailsrc.type = exec
agent.sources.tailsrc.command = tail -F /home/cloudera/flumetest/in.txt
agent.sources.tailsrc.batchSize = 1
agent.sources.tailsrc.interceptors = i1
agent.sources.tailsrc.interceptors.i1.type = regex_extractor
agent.sources.tailsrc.interceptors.i1.regex = ^(\\d)
agent.sources.tailsrc.interceptors.i1.serializers = t1
agent.sources.tailsrc.interceptors.i1.serializers.t1.name = type
agent.sources.tailsrc.selector.type = multiplexing
agent.sources.tailsrc.selector.header = type
agent.sources.tailsrc.selector.mapping.1 = mem1
agent.sources.tailsrc.selector.mapping.2 = mem2
agent.sinks.std1.type = hdfs
agent.sinks.std1.channel = mem1
agent.sinks.std1.hdfs.batchSize = 1
agent.sinks.std1.hdfs.path = /user/hive/warehouse/flumemaleemployee1
agent.sinks.std1.hdfs.rollInterval = 0
agent.sinks.std1.hdfs.fileType = DataStream
agent.sinks.std2.type = hdfs
agent.sinks.std2.channel = mem2
agent.sinks.std2.hdfs.batchSize = 1
agent.sinks.std2.hdfs.path = /user/hive/warehouse/flumefemaleemployee1
agent.sinks.std2.hdfs.rollInterval = 0
agent.sinks.std2.hdfs.fileType = DataStream
agent.channels.mem1.type = memory
agent.channels.mem1.capacity = 100
agent.channels.mem2.type = memory
agent.channels.mem2.capacity = 100
agent.sources.tailsrc.channels = mem1 mem2
Step 4: Run the below command, which uses this configuration file and appends data in HDFS.
Start flume service:
flume-ng agent --conf /home/cloudera/flumeconf --conf-file /home/cloudera/flumeconf/flume5.conf --name agent
Step 5: Open another terminal and create a file at /home/cloudera/flumetest/in.txt.
Step 6: Enter below data in file and save it.
1,alok,mumbai
1,jatin,chennai
1,yogesh,kolkata
2,ragini,delhi
2,jyotsana,pune
1,valmiki,banglore
Step 7: Open Hue and check whether the data is available in the hive tables.
Step 8: Stop the flume service by pressing Ctrl+C.
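If Hue is not handy, a rough alternative is to query the two tables created in Step 1 from the hive command line:
hive -e "select * from flumemaleemployee1; select * from flumefemaleemployee1;"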
Answer:
Step 1: Create hive table flumemaleemployee.
CREATE TABLE flumemaleemployee
(
name string,
salary int,
sex string,
age int
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
Step 2: Create the Flume configuration file with the below configuration for the source, sink and channel, and save it as flume4.conf.
#Define source, sink, channel and agent.
agent1.sources = source1
agent1.sinks = sink1
agent1.channels = channel1
# Describe/configure source1
agent1.sources.source1.type = netcat
agent1.sources.source1.bind = 127.0.0.1
agent1.sources.source1.port = 44444
#Define interceptors
agent1.sources.source1.interceptors = i1
agent1.sources.source1.interceptors.i1.type = regex_filter
agent1.sources.source1.interceptors.i1.regex = female
agent1.sources.source1.interceptors.i1.excludeEvents = true
## Describe sink1
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = /user/hive/warehouse/flumemaleemployee
agent1.sinks.sink1.hdfs.writeFormat = Text
agent1.sinks.sink1.hdfs.fileType = DataStream
# Now we need to define channel1 property.
agent1.channels.channel1.type = memory
agent1.channels.channel1.capacity = 1000
agent1.channels.channel1.transactionCapacity = 100
# Bind the source and sink to the channel
agent1.sources.source1.channels = channel1
agent1.sinks.sink1.channel = channel1
Step 3: Run the below command, which uses this configuration file and appends data in HDFS.
Start flume service:
flume-ng agent --conf /home/cloudera/flumeconf --conf-file /home/cloudera/flumeconf/flume4.conf --name agent1
Step 4: Open another terminal and use the netcat service.
nc localhost 44444
Step 5: Enter data line by line.
alok,100000,male,29
jatin,105000,male,32
yogesh,134000,male,39
ragini,112000,female,35
jyotsana,129000,female,39
valmiki,123000,male,29
Step 6: Open Hue and check whether the data is available in the hive table.
Step 7: Stop the flume service by pressing Ctrl+C.
Step 8: Calculate the average salary on the hive table using the below query. You can use either the hive command line tool or Hue.
select avg(salary) from flumemaleemployee;
Answer:
Step 1: Create the Flume configuration file with the below configuration for the source, sink and channel, and save it as flume3.conf.
#Define source, sink, channel and agent.
agent1.sources = source1
agent1.sinks = sink1
agent1.channels = channel1
# Describe/configure source1
agent1.sources.source1.type = exec
agent1.sources.source1.command = tail -F /opt/gen_logs/logs/access.log
#Define interceptors
agent1.sources.source1.interceptors = i1
agent1.sources.source1.interceptors.i1.type = timestamp
agent1.sources.source1.interceptors.i1.preserveExisting = true
## Describe sink1
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = flume3/%Y/%m/%d/%H/%M
agent1.sinks.sink1.hdfs.fileType = DataStream
# Now we need to define channel1 property.
agent1.channels.channel1.type = memory
agent1.channels.channel1.capacity = 1000
agent1.channels.channel1.transactionCapacity = 100
# Bind the source and sink to the channel
agent1.sources.source1.channels = channel1
agent1.sinks.sink1.channel = channel1
Step 2: Run the below commands, which use this configuration file and append data in HDFS.
Start the log service using: start_logs
Start flume service:
flume-ng agent --conf /home/cloudera/flumeconf --conf-file /home/cloudera/flumeconf/flume3.conf -Dflume.root.logger=DEBUG,INFO,console --name agent1
Wait for a few minutes and then stop the log service.
stop_logs
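Because the sink path contains time escape sequences, Flume creates a subdirectory per year/month/day/hour/minute. A rough way to confirm this layout (path as configured above) is:
hdfs dfs -ls -R flume3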
Answer:
Step 1: Create hive table flumeemployee.
CREATE TABLE flumeemployee
(
name string, salary int, sex string,
age int
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';
Step 2: Create the Flume configuration file with the below configuration for the source, sink and channel, and save it as flume2.conf.
#Define source, sink, channel and agent.
agent1.sources = source1
agent1.sinks = sink1
agent1.channels = channel1
# Describe/configure source1
agent1.sources.source1.type = netcat
agent1.sources.source1.bind = 127.0.0.1
agent1.sources.source1.port = 44444
## Describe sink1
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = /user/hive/warehouse/flumeemployee
agent1.sinks.sink1.hdfs.writeFormat = Text
agent1.sinks.sink1.hdfs.fileType = DataStream
# Now we need to define channel1 property.
agent1.channels.channel1.type = memory
agent1.channels.channel1.capacity = 1000
agent1.channels.channel1.transactionCapacity = 100
# Bind the source and sink to the channel
agent1.sources.source1.channels = channel1
agent1.sinks.sink1.channel = channel1
Step 3: Run the below command, which uses this configuration file and appends data in HDFS.
Start flume service:
flume-ng agent --conf /home/cloudera/flumeconf --conf-file /home/cloudera/flumeconf/flume2.conf --name agent1
Step 4: Open another terminal and use the netcat service.
nc localhost 44444
Step 5: Enter data line by line.
alok,100000,male,29
jatin,105000,male,32
yogesh,134000,male,39
ragini,112000,female,35
jyotsana,129000,female,39
valmiki,123000,male,29
Step 6: Open Hue and check whether the data is available in the hive table.
Step 7: Stop the flume service by pressing Ctrl+C.
Step 8: Calculate the average salary on the hive table using the below query. You can use either the hive command line tool or Hue.
select avg(salary) from flumeemployee;
Answer:
Step 1: Create the Flume configuration file with the below configuration for the source, sink and channel, and save it as flume1.conf.
#Define source, sink, channel and agent.
agent1.sources = source1
agent1.sinks = sink1
agent1.channels = channel1
# Describe/configure source1
agent1.sources.source1.type = exec
agent1.sources.source1.command = tail -F /opt/gen_logs/logs/access.log
## Describe sink1
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = flume1
agent1.sinks.sink1.hdfs.fileType = DataStream
# Now we need to define channel1 property.
agent1.channels.channel1.type = memory
agent1.channels.channel1.capacity = 1000
agent1.channels.channel1.transactionCapacity = 100
# Bind the source and sink to the channel
agent1.sources.source1.channels = channel1
agent1.sinks.sink1.channel = channel1
Step 2: Run the below commands, which use this configuration file and append data in HDFS.
Start the log service using: start_logs
Start flume service:
flume-ng agent --conf /home/cloudera/flumeconf --conf-file /home/cloudera/flumeconf/flume1.conf -Dflume.root.logger=DEBUG,INFO,console --name agent1
Wait for a few minutes and then stop the log service.
stop_logs
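To verify that the tailed log events reached HDFS, list the sink directory configured above (a rough check; exact file names depend on Flume's roll settings):
hdfs dfs -ls flume1
hdfs dfs -cat flume1/* | head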
Answer:
Step 1: Connect to the existing MySQL database.
mysql --user=retail_dba --password=cloudera retail_db
Step 2: Show all the available tables.
show tables;
Step 3: Below is the command to create the Sqoop job (please note that the space after -- and before import is mandatory).
sqoop job --create sqoopjob \
-- import \
--connect "jdbc:mysql://quickstart:3306/retail_db" \
--username=retail_dba \
--password=cloudera \
--table categories \
--target-dir categories_target_job \
--fields-terminated-by '|' \
--lines-terminated-by '\n'
Step 4: List all the Sqoop jobs.
sqoop job --list
Step 5: Show details of the Sqoop job.
sqoop job --show sqoopjob
Step 6: Execute the Sqoop job.
sqoop job --exec sqoopjob
Step 7: Check the output of import job
hdfs dfs -ls categories_target_job
hdfs dfs -cat categories_target_job/part*
Answer:
Step 1: Import the departments table from mysql to HDFS as a text file.
sqoop import \
--connect jdbc:mysql://quickstart:3306/retail_db \
--username=retail_dba \
--password=cloudera \
--table departments \
--as-textfile \
--target-dir=departments_text
Verify the imported data:
hdfs dfs -cat departments_text/part*
Step 2: Import the departments table from mysql to HDFS as a sequence file.
sqoop import \
--connect jdbc:mysql://quickstart:3306/retail_db \
--username=retail_dba \
--password=cloudera \
--table departments \
--as-sequencefile \
--target-dir=departments_sequence
Verify the imported data:
hdfs dfs -cat departments_sequence/part*
Step 3: Import the departments table from mysql to HDFS as an Avro data file.
sqoop import \
--connect jdbc:mysql://quickstart:3306/retail_db \
--username=retail_dba \
--password=cloudera \
--table departments \
--as-avrodatafile \
--target-dir=departments_avro
Verify the imported data:
hdfs dfs -cat departments_avro/part*
Step 4: Import the departments table from mysql to HDFS as a Parquet file.
sqoop import \
--connect jdbc:mysql://quickstart:3306/retail_db \
--username=retail_dba \
--password=cloudera \
--table departments \
--as-parquetfile \
--target-dir=departments_parquet
Verify the imported data:
hdfs dfs -cat departments_parquet/part*
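Note that the sequence, Avro and Parquet outputs are binary, so hdfs dfs -cat prints raw bytes. A somewhat friendlier check is to confirm the files exist and, for the sequence file output, use -text, which can decode SequenceFiles:
hdfs dfs -ls departments_sequence departments_avro departments_parquet
hdfs dfs -text departments_sequence/part*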
Answer:
Step 1: Create table in mysql db as well.
mysql --user=retail_dba --password=cloudera
use retail_db
CREATE TABLE IF NOT EXISTS departments_hive02(id int, department_name varchar(45), avg_salary int);
show tables;
Step 2: Now export data from hive table to mysql table as per the requirement.
sqoop export --connect jdbc:mysql://quickstart:3306/retail_db \
--username retail_dba \
--password cloudera \
--table departments_hive02 \
--export-dir /user/hive/warehouse/departments_hive01 \
--input-fields-terminated-by '\001' \
--input-lines-terminated-by '\n' \
--num-mappers 1 \
--batch \
--input-null-string "" \
--input-null-non-string -999
Step 3: Now validate the data.
select * from departments_hive02;
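For example, reconnecting with the same credentials used in Step 1 (adjust if your setup differs), the validation query can be run directly from the shell:
mysql --user=retail_dba --password=cloudera retail_db -e "select * from departments_hive02;"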
Answer:
Step 1: Create hive table as below.
hive
show tables;
create table departments_hive01(department_id int, department_name string, avgsalary int);
Step 2: Create table in mysql db as well.
mysql --user=retail_dba --password=cloudera
use retail_db
CREATE TABLE IF NOT EXISTS departments_hive01(id int, department_name varchar(45), avg_salary int);
show tables;
Step 3: Insert data in the mysql table.
insert into departments_hive01 select a.*, null from departments a;
Check the inserted data:
select * from departments_hive01;
Now insert null records as given in the problem.
insert into departments_hive01 values(777, "Not known", 1000);
insert into departments_hive01 values(8888, null, 1000);
insert into departments_hive01 values(666, null, 1100);
Step 4: Now import data in hive as per requirement.
sqoop import \
--connect jdbc:mysql://quickstart:3306/retail_db \
--username=retail_dba \
--password=cloudera \
--table departments_hive01 \
--hive-home /user/hive/warehouse \
--hive-import \
--hive-overwrite \
--hive-table departments_hive01 \
--fields-terminated-by '\001' \
--null-string "" \
--null-non-string -999 \
--split-by id \
-m 1
Step 5: Check the data in the directory.
hdfs dfs -ls /user/hive/warehouse/departments_hive01
hdfs dfs -cat /user/hive/warehouse/departments_hive01/part*
Check data in hive table.
Select * from departments_hive01;
Answer:
Step 1: Create hive table as said.
hive
show tables;
create table departments_hive(department_id int, department_name string);
Step 2: The important point here is that when we create a table without specifying a field delimiter, Hive's default delimiter is ^A (\001). Hence, while importing the data we have to provide the matching delimiter.
sqoop import \
--connect jdbc:mysql://quickstart:3306/retail_db \
--username=retail_dba \
--password=cloudera \
--table departments \
--hive-home /user/hive/warehouse \
--hive-import \
--hive-overwrite \
--hive-table departments_hive \
--fields-terminated-by '\001'
Step 3: Check the data in the directory.
hdfs dfs -ls /user/hive/warehouse/departments_hive
hdfs dfs -cat /user/hive/warehouse/departments_hive/part*
Check data in hive table.
Select * from departments_hive;
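To actually see the ^A (\001) delimiter discussed in Step 2, you can pipe the output through cat -v, which prints control characters visibly (shown as ^A):
hdfs dfs -cat /user/hive/warehouse/departments_hive/part* | cat -v | head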