Commit d822ff5

Add hadoop-compat-bench

1 parent 5ad7737 commit d822ff5

59 files changed: +5631 -0 lines

Lines changed: 58 additions & 0 deletions
@@ -0,0 +1,58 @@
<!---
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

# Compatibility Benchmark over HCFS Implementations

## <a name="Background"></a> Background

The Hadoop-Compatible File System (HCFS) is a core concept in the big data storage ecosystem.
It provides unified interfaces with generally clear semantics,
and has become the de facto standard for industry storage systems to follow and conform to.
There have been a series of HCFS implementations in Hadoop,
such as S3AFileSystem for Amazon's S3 object store,
WASB for Microsoft's Azure Blob Storage, and the OSS connector for Alibaba Cloud Object Storage,
with more maintained by storage service providers on their own.

## <a name="Problems"></a> Problems

However, as indicated by the [`HCFS Introduction`](hadoop-common-project/hadoop-common/src/site/markdown/filesystem/introduction.md),
there is no formal suite for assessing the compatibility of a file system across all such HCFS implementations.
Thus, whether the functionality is well implemented and meets the core compatibility expectations
mainly relies on the service provider's own report.
Meanwhile, Hadoop keeps evolving, and new features are continuously added to the HCFS interfaces
for existing implementations to follow and adopt, in which case
Hadoop also needs a tool to quickly assess whether a specific HCFS implementation supports these features.
Besides, the well-known hadoop command line tool (the hdfs shell) is used to interact directly with an HCFS storage system,
where most commands correspond to specific HCFS interfaces and work well.
Still, there are complicated cases that may not work, such as the expunge command.
To check such commands for an HCFS, we also need an approach to figure them out.

## <a name="Proposal"></a> Proposal

Accordingly, we propose to define a formal HCFS compatibility benchmark and provide a corresponding tool
to perform the compatibility assessment of an HCFS storage system.
The benchmark and tool should cover both HCFS interfaces and hdfs shell commands.
Different scenarios require different kinds of compatibility,
so we could define different suites in the benchmark; a hypothetical invocation is sketched below.

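For illustration only, running such a tool might look like the following sketch; the jar name comes from the pom.xml later in this commit, while the flags and suite name are assumptions rather than a CLI defined here:

```sh
# Hypothetical: run one suite of the compatibility benchmark against a target HCFS.
# "-uri" would select the file system under test; "-suite" would pick the case set.
hadoop jar hadoop-compat-bench-3.5.0-SNAPSHOT.jar \
    -uri hdfs://namenode:8020/tmp/compat-bench \
    -suite shell
```
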
## <a name="Benefits"></a> Benefits

We intend the benchmark and tool to be useful for both storage providers and storage users.
For end users, it can be used to evaluate the compatibility level and
determine whether the storage system in question is suitable for the required scenarios.
For storage providers, it helps to quickly generate an objective and reliable report
about the core functions of the storage service.
For instance, if an HCFS scores 100% on a suite named 'tpcds',
this demonstrates that all functions needed by a TPC-DS workload are well supported.
It is also a guide indicating how storage service abilities can map to HCFS interfaces, such as storage class on S3.

hadoop-compat-bench/pom.xml

Lines changed: 118 additions & 0 deletions
@@ -0,0 +1,118 @@
<?xml version="1.0" encoding="UTF-8"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <parent>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-project</artifactId>
    <version>3.5.0-SNAPSHOT</version>
    <relativePath>../hadoop-project</relativePath>
  </parent>
  <artifactId>hadoop-compat-bench</artifactId>
  <version>3.5.0-SNAPSHOT</version>
  <packaging>jar</packaging>

  <description>Apache Hadoop Compatibility</description>
  <name>Apache Hadoop Compatibility Benchmark</name>

  <dependencies>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
      <scope>provided</scope>
    </dependency>
    <dependency>
      <!-- Should we keep this -->
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-hdfs</artifactId>
      <scope>provided</scope>
    </dependency>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <scope>compile</scope>
    </dependency>

    <!-- For test -->
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-hdfs-client</artifactId>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
      <type>test-jar</type>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-hdfs</artifactId>
      <type>test-jar</type>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.mockito</groupId>
      <artifactId>mockito-core</artifactId>
      <scope>test</scope>
    </dependency>
  </dependencies>

  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-jar-plugin</artifactId>
        <configuration>
          <archive>
            <manifest>
              <mainClass>org.apache.hadoop.compat.HdfsCompatibility</mainClass>
            </manifest>
          </archive>
        </configuration>
        <executions>
          <execution>
            <goals>
              <goal>test-jar</goal>
            </goals>
            <configuration>
              <archive>
                <manifest>
                  <mainClass>org.apache.hadoop.compat.hdfs.HdfsCompatMiniCluster</mainClass>
                </manifest>
              </archive>
            </configuration>
          </execution>
        </executions>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-surefire-plugin</artifactId>
        <configuration>
          <forkedProcessTimeoutInSeconds>3600</forkedProcessTimeoutInSeconds>
        </configuration>
      </plugin>
    </plugins>
    <resources>
      <resource>
        <directory>src/main/resources</directory>
      </resource>
      <resource>
        <directory>shell</directory>
      </resource>
    </resources>
  </build>
</project>
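
Assuming the new module is hooked into the root Hadoop build, it would be built and tested with the usual Maven commands; a sketch (the module path is inferred from the artifactId above):

```sh
# Build only this module plus whatever it depends on, skipping tests.
mvn clean package -pl hadoop-compat-bench -am -DskipTests

# Run the module's tests; surefire above allows each forked JVM 3600 seconds.
mvn test -pl hadoop-compat-bench
```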
Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
#!/bin/sh
. "$(dirname "$0")/../misc.sh"

echo "Hello World!" > "${localDir}/dat"
hadoop fs -put "${localDir}/dat" "${baseDir}/"

echo "1..10"

# 1. chown
hadoop fs -chown "hadoop-compat-bench-user" "${baseDir}/dat"
expect_out "chown" "user:hadoop-compat-bench-user" hadoop fs -stat "user:%u" "${baseDir}/dat"

# 2. chgrp
hadoop fs -chgrp "hadoop-compat-bench-group" "${baseDir}/dat"
expect_out "chgrp" "group:hadoop-compat-bench-group" hadoop fs -stat "group:%g" "${baseDir}/dat"

# 3. chmod
hadoop fs -chmod 777 "${baseDir}/dat"
expect_out "chmod" "perm:777" hadoop fs -stat "perm:%a" "${baseDir}/dat"

# 4. touch
hadoop fs -touch -m -t "20000615:000000" "${baseDir}/dat"
expect_out "touch" "date:2000-06-.*" hadoop fs -stat "date:%y" "${baseDir}/dat"

# 5. setfattr
expect_ret "setfattr" 0 hadoop fs -setfattr -n "user.key" -v "value" "${baseDir}/dat"

# 6. getfattr
expect_out "getfattr" ".*value.*" hadoop fs -getfattr -n "user.key" "${baseDir}/dat"

# 7. setfacl
expect_ret "setfacl" 0 hadoop fs -setfacl -m "user:foo:---" "${baseDir}/dat"

# 8. getfacl
expect_out "getfacl" ".*foo.*" hadoop fs -getfacl "${baseDir}/dat"

# 9. setrep
hadoop fs -setrep 1 "${baseDir}/dat"
expect_out "setrep" "replication:1" hadoop fs -stat "replication:%r" "${baseDir}/dat"

# 10. checksum
expect_ret "checksum" 0 hadoop fs -checksum "${baseDir}/dat" # TODO
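
These scripts source a shared misc.sh that is not included in the lines shown here. Judging from the TAP-style "1..N" plan lines, its expect_ret/expect_out helpers presumably behave like this sketch (the variable names and exact output format are assumptions):

```sh
# Hypothetical sketch of misc.sh helpers; the real file is not shown in this commit.
caseId=0

# expect_ret NAME EXPECTED_CODE CMD...: pass when CMD exits with EXPECTED_CODE.
expect_ret() {
  name="$1"; expected="$2"; shift 2
  "$@" >/dev/null 2>&1
  code=$?
  caseId=$((caseId + 1))
  if [ "$code" -eq "$expected" ]; then
    echo "ok $caseId - $name"
  else
    echo "not ok $caseId - $name (exit $code, expected $expected)"
  fi
}

# expect_out NAME PATTERN CMD...: pass when some line of CMD's output matches PATTERN.
expect_out() {
  name="$1"; pattern="$2"; shift 2
  caseId=$((caseId + 1))
  if "$@" 2>/dev/null | grep -q "$pattern"; then
    echo "ok $caseId - $name"
  else
    echo "not ok $caseId - $name (no output matching $pattern)"
  fi
}
```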
Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
#!/bin/sh
. "$(dirname "$0")/../misc.sh"

echo "Hello World!" > "${localDir}/dat"
hadoop fs -put "${localDir}/dat" "${baseDir}/src1"
hadoop fs -put "${localDir}/dat" "${baseDir}/src2"

echo "1..3"

# 1. touchz
hadoop fs -touchz "${baseDir}/dat"
expect_out "touchz" "size:0" hadoop fs -stat "size:%b" "${baseDir}/dat"

# 2. concat
expect_ret "concat" 0 hadoop fs -concat "${baseDir}/dat" "${baseDir}/src1" "${baseDir}/src2"
# expect_out "size:26" hadoop fs -stat "size:%b" "${baseDir}/dat"

# 3. getmerge
hadoop fs -getmerge "${baseDir}" "${localDir}/merged"
expect_ret "getmerge" 0 test -s "${localDir}/merged"
Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
#!/bin/sh
. "$(dirname "$0")/../misc.sh"

echo "Hello World!" > "${localDir}/dat"

echo "1..3"

# 1. copyFromLocal
expect_ret "copyFromLocal" 0 hadoop fs -copyFromLocal "${localDir}/dat" "${baseDir}/"

# 2. cp
hadoop fs -cp "${baseDir}/dat" "${baseDir}/dat2"
expect_ret "cp" 0 hadoop fs -test -f "${baseDir}/dat2"

# 3. copyToLocal
hadoop fs -copyToLocal "${baseDir}/dat2" "${localDir}/"
expect_ret "copyToLocal" 0 test -f "${localDir}/dat2"
Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
#!/bin/sh
. "$(dirname "$0")/../misc.sh"

# "Hello World!" plus the trailing newline is 13 bytes, hence the size checks below.
echo "Hello World!" > "${localDir}/dat"
hadoop fs -put "${localDir}/dat" "${baseDir}/"
hadoop fs -mkdir -p "${baseDir}/dir/sub"

echo "1..9"

# 1. ls
expect_lines "ls" 2 ".*dat.*" ".*dir.*" hadoop fs -ls "${baseDir}"

# 2. lsr
expect_lines "lsr" 3 ".*dat.*" ".*dir.*" ".*sub.*" hadoop fs -lsr "${baseDir}"

# 3. count
expect_out "count" ".*13.*" hadoop fs -count "${baseDir}"

# 4. du
expect_out "du" ".*13.*" hadoop fs -du "${baseDir}"

# 5. dus
expect_out "dus" ".*13.*" hadoop fs -dus "${baseDir}"

# 6. df
expect_ret "df" 0 hadoop fs -df "${baseDir}"

# 7. stat
expect_out "stat" "size:13" hadoop fs -stat "size:%b" "${baseDir}/dat"

# 8. test
expect_ret "test" 0 hadoop fs -test -f "${baseDir}/dat"

# 9. find
expect_out "find" ".*dat.*" hadoop fs -find "${baseDir}" -name "dat" -print
Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
#!/bin/sh
. "$(dirname "$0")/../misc.sh"

echo "Hello World!" > "${localDir}/dat"

echo "1..4"

# 1. mkdir
expect_ret "mkdir" 0 hadoop fs -mkdir -p "${baseDir}/dir"

# 2. put
expect_ret "put" 0 hadoop fs -put "${localDir}/dat" "${baseDir}/"

# 3. appendToFile
expect_ret "appendToFile" 0 hadoop fs -appendToFile "${localDir}/dat" "${baseDir}/dat"

# 4. truncate (back to the original 13 bytes after the append doubled the file)
expect_ret "truncate" 0 hadoop fs -truncate 13 "${baseDir}/dat"
Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
#!/bin/sh
. "$(dirname "$0")/../misc.sh"

echo "Hello World!" > "${localDir}/dat"

echo "1..2"

# 1. moveFromLocal
expect_ret "moveFromLocal" 0 hadoop fs -moveFromLocal "${localDir}/dat" "${baseDir}/"

# 2. mv
hadoop fs -mv "${baseDir}/dat" "${baseDir}/dat2"
expect_ret "mv" 0 hadoop fs -test -f "${baseDir}/dat2"

# moveToLocal is not implemented on HDFS
# hadoop fs -moveToLocal "${baseDir}/dat2" "${localDir}/"
# expect_ret "moveToLocal" 0 test -f "${localDir}/dat2"
Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
#!/bin/sh
. "$(dirname "$0")/../misc.sh"

echo "Hello World!" > "${localDir}/dat"
hadoop fs -put "${localDir}/dat" "${baseDir}/"

echo "1..5"

# 1. get
hadoop fs -get "${baseDir}/dat" "${localDir}/"
expect_ret "get" 0 test -f "${localDir}/dat"

# 2. cat
expect_out "cat" "Hello World!" hadoop fs -cat "${baseDir}/dat"

# 3. text
expect_out "text" "Hello World!" hadoop fs -text "${baseDir}/dat"

# 4. head
expect_out "head" "Hello World!" hadoop fs -head "${baseDir}/dat"

# 5. tail
expect_out "tail" "Hello World!" hadoop fs -tail "${baseDir}/dat"
Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
#!/bin/sh
. "$(dirname "$0")/../misc.sh"

echo "Hello World!" > "${localDir}/dat"
hadoop fs -mkdir -p "${baseDir}/dir/sub"
hadoop fs -put "${localDir}/dat" "${baseDir}/dir/"
hadoop fs -put "${localDir}/dat" "${baseDir}/dir/sub/"

echo "1..4"

# 1. rm ("-test -e" exits 1 once the path is gone)
hadoop fs -rm -f -skipTrash "${baseDir}/dir/dat"
expect_ret "rm" 1 hadoop fs -test -e "${baseDir}/dir/dat"

# 2. rmr
hadoop fs -rmr "${baseDir}/dir/sub"
expect_ret "rmr" 1 hadoop fs -test -e "${baseDir}/dir/sub"

# 3. rmdir
hadoop fs -rmdir "${baseDir}/dir"
expect_ret "rmdir" 1 hadoop fs -test -e "${baseDir}/dir"

# 4. expunge
expect_ret "expunge" 0 hadoop fs -expunge -immediate -fs "${baseDir}"
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
#!/bin/sh
. "$(dirname "$0")/../misc.sh"

echo "1..3"

# 1. createSnapshot
expect_out "createSnapshot" "Created snapshot .*" hdfs dfs -createSnapshot "${snapshotDir}" "s-name"

# 2. renameSnapshot
expect_ret "renameSnapshot" 0 hdfs dfs -renameSnapshot "${snapshotDir}" "s-name" "d-name"

# 3. deleteSnapshot
expect_ret "deleteSnapshot" 0 hdfs dfs -deleteSnapshot "${snapshotDir}" "d-name"
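
Snapshot commands only succeed on a directory that has been made snapshottable, so misc.sh presumably prepares ${snapshotDir} beforehand; on HDFS that setup would be (an assumption, since misc.sh is not shown):

```sh
# Assumed setup, not part of this diff: mark the test directory snapshottable.
hdfs dfsadmin -allowSnapshot "${snapshotDir}"
```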
