Set up a SeaweedFS distributed object storage cluster on Ubuntu 20.04


With the explosion of information in the connected age, many applications need ever more data to describe their clients, power AI, and drive large-scale data projects. Coping with this booming, petabyte-scale information requires an efficient, reliable, and stable system that can quickly process, store, and retrieve data. Several systems have been built to handle such data at scale, including Ceph, GlusterFS, HDFS, MinIO, and SeaweedFS, and each handles objects well in its own way. In this guide, we focus on SeaweedFS: we will look at its features and then install it step by step. First, let us get familiar with SeaweedFS.

SeaweedFS is a distributed object store and file system that can store and serve billions of files fast! The object store has O(1) disk seeks and transparent cloud integration. The Filer supports cross-cluster active-active replication, Kubernetes, POSIX, the S3 API, encryption, erasure coding for warm storage, FUSE mounts, Hadoop, and WebDAV. Source: SeaweedFS GitHub repository

SeaweedFS was originally built as an object store that handles small files efficiently. Instead of managing all file metadata on a central master, the master only manages file volumes and lets the volume servers manage the files and their metadata. This relieves the master of concurrency pressure and spreads the file metadata across the volume servers, so files can be accessed faster (O(1), usually just one disk read). Source: SeaweedFS GitHub repository

SeaweedFS has two goals:

  • To store billions of files!
  • To serve files fast!

Features of SeaweedFS

SeaweedFS offers the following features:

  • Transparent cloud integration: SeaweedFS combines fast local access times with elastic cloud storage capacity, without any client changes.
  • O(1) disk reads. You are welcome to challenge the performance with your actual use cases. Each file's metadata has only 40 bytes of disk storage overhead.
  • Optional replication: choose no replication or one of several replication levels, with rack and data center awareness.
  • Automatic master server failover, so there is no single point of failure (SPOF).
  • Automatic Gzip compression depending on the file's MIME type.
  • Automatic compaction to reclaim disk space after deletions or updates.
  • Automatic entry expiration with TTL.
  • Any server with some spare disk space can be added to the total storage space.
  • Adding or removing servers does not cause any data rebalancing unless triggered by an admin command.
  • Optional image resizing.

Install SeaweedFS on Ubuntu 20.04

Now to the part where we put on our boots and gloves, head out to the farm, and plant and water SeaweedFS on our Ubuntu 20.04 server. Before pushing the shovel into the dirt, follow the steps below to install Go, which SeaweedFS needs in order to build.

Step 1: Prepare the server

This is an important step: we update the server with the latest software and patches before installing Go and SeaweedFS, and we also install the required build tools here.

sudo apt update
sudo apt install vim curl wget zip git -y
sudo apt install build-essential autoconf automake gdb git libffi-dev zlib1g-dev libssl-dev -y

Step 2: Obtain and install Go

You can install Go from the APT repository or download the official release tarball.

Method 1: Install from APT repository:

Run the following command to install Golang on Ubuntu from the APT repository.

sudo apt install golang

Method 2: Manual installation

Visit the Go downloads page to find the latest available tarball, then download and extract it into /usr/local as follows (Go 1.15.5 is used here):

cd ~
wget -c https://golang.org/dl/go1.15.5.linux-amd64.tar.gz -O - | sudo tar -xz -C /usr/local

Next, add the “/usr/local/go/bin” directory to the PATH environment variable so that the shell can find the go binary. Do this by appending the line below to /etc/profile (for a system-wide installation) or to $HOME/.profile (for the current user only). The single quotes stop $PATH from being expanded when the line is written, so the file stores the literal export statement:

#### For a system-wide installation

echo 'export PATH=$PATH:/usr/local/go/bin' | sudo tee -a /etc/profile

#### For the current user installation

echo 'export PATH=$PATH:/usr/local/go/bin' | tee -a $HOME/.profile

Source the file you edited so that the new PATH is loaded into the current shell session:

$ source ~/.profile

# Or
$ source /etc/profile
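To confirm that the PATH change took effect in the current shell, a small check like the one below can help. It is only a sketch; the function name path_has_dir is hypothetical and is not part of Go or SeaweedFS.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if DIR is one of the entries in a PATH-style
# string (defaults to the current $PATH).
path_has_dir() {
  local dir="$1" path_str="${2:-$PATH}"
  case ":${path_str}:" in
    *":${dir}:"*) return 0 ;;
    *) return 1 ;;
  esac
}

if path_has_dir /usr/local/go/bin; then
  echo "/usr/local/go/bin is on PATH"
  # Print the Go version only if the go binary can actually be found.
  if command -v go >/dev/null 2>&1; then
    go version
  fi
else
  echo "/usr/local/go/bin is not on PATH yet; source your profile again" >&2
fi
```

Wrapping PATH in colons before matching avoids false positives such as /usr/local/go/binx matching /usr/local/go/bin.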

Step 3: Clone the SeaweedFS repository

To install SeaweedFS, we need its source code on our server. Everything is hosted on GitHub, so let’s clone the repository and proceed with the installation.

cd ~
git clone https://github.com/chrislusf/seaweedfs.git

Step 4: Download, compile and install SeaweedFS

After cloning the sources, navigate to the new directory and build and install SeaweedFS with the following commands:

$ cd ~/seaweedfs
$ make install

## Progress of the installation
$ go get  -d ./weed/
go: downloading github.com/chrislusf/raft v1.0.3
go: downloading github.com/golang/protobuf v1.4.2
go: downloading github.com/gorilla/mux v1.7.4
go: downloading google.golang.org/grpc v1.29.1
go: downloading github.com/google/uuid v1.1.1
go: downloading github.com/syndtr/goleveldb v1.0.0
go: downloading go.etcd.io/etcd v0.5.0-alpha.5.0.20200425165423-262c93980547
go: downloading github.com/klauspost/crc32 v1.2.0

Once this is done, you will find the “weed” executable in your $GOPATH/bin directory. Because GOPATH was not set, Go defaults it to the go directory in your home folder, so the binary lands in “~/go/bin/weed”. To make it easy to run, copy the SeaweedFS binary into the Go installation directory from Step 2, which is already on the PATH (if you installed Go from APT in Method 1, copy it to /usr/local/bin/ instead and update the ExecStart and WorkingDirectory paths in the Systemd units below):

sudo cp ~/go/bin/weed /usr/local/go/bin/

Now that the “weed” command is on the PATH, confirm the installation; we can then comfortably proceed to configure SeaweedFS as shown in the steps below.

$ weed version
version 30GB 2.12 6d30b21b linux amd64

Step 5: Example usage of SeaweedFS

To follow the simple example in this step, it helps to first understand how SeaweedFS works. The architecture is very simple. The actual data is stored in volumes on storage nodes (which can be on the same server or on different servers). A volume server can have multiple volumes, and all of them support read and write access with basic authentication. Source: SeaweedFS documentation

All volumes are managed by the master server, which contains the mapping of the volume ID to the volume server.

Instead of managing chunks like many distributed file systems, SeaweedFS manages data volumes in the master server. Each data volume is about 32GB in size and can hold many files, and each storage node can have many data volumes. Therefore, the master node only needs to store metadata about the volumes, which is a relatively small amount of data and is usually stable. Source: SeaweedFS documentation

By default, the master node runs on port 9333 and a volume node runs on port 8080. For illustration, we will start one master node and two volume nodes, on ports 8080 and 8081 respectively. Ideally, as mentioned earlier, they should run on different machines, but we will use a single server as an example. If you plan to start a volume server on another machine, make sure its -mserver address points to the master server and that the master's port is reachable from that volume server.
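Before starting a volume server on another machine, you can check from that machine that the master is reachable. The sketch below uses Bash's built-in /dev/tcp redirection and the timeout command from coreutils; port_open and MASTER_HOST are hypothetical names. It probes both 9333 and 19333, assuming SeaweedFS's default of placing the gRPC port at the HTTP port plus 10000, since volume servers send their heartbeats to the master over gRPC (note the volume_grpc_client_to_master entries in the logs below).

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if a TCP connection to HOST:PORT can be
# opened within 2 seconds, using Bash's /dev/tcp pseudo-device.
port_open() {
  local host="$1" port="$2"
  timeout 2 bash -c ">/dev/tcp/${host}/${port}" 2>/dev/null
}

# Assumption: replace this with your master server's IP address.
MASTER_HOST="${MASTER_HOST:-172.22.3.196}"

# Check the master's HTTP port and its gRPC port (HTTP port + 10000 assumed).
for port in 9333 $((9333 + 10000)); do
  if port_open "$MASTER_HOST" "$port"; then
    echo "master port $port reachable"
  else
    echo "master port $port NOT reachable" >&2
  fi
done
```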

SeaweedFS uses HTTP REST operations to read, write, and delete. Responses are in JSON or JSONP format.

Start the master server

By default, the master node listens on port 9333. We can start the master server as follows:

Option 1: Manual mode

$ weed master &

I1126 20:22:17  6485 file_util.go:23] Folder /tmp Permission: -rwxrwxrwx
I1126 20:22:17  6485 master.go:168] current: 172.22.3.196:9333 peers:
I1126 20:22:17  6485 master_server.go:107] Volume Size Limit is 30000 MB
I1126 20:22:17  6485 master_server.go:192] adminScripts:
I1126 20:22:17  6485 master.go:122] Start Seaweed Master 30GB 2.12 a1021570 at 0.0.0.0:9333
I1126 20:22:17  6485 raft_server.go:70] Starting RaftServer with 172.22.3.196:9333
I1126 20:22:17  6485 raft_server.go:129] current cluster leader:

Option 2: Use Systemd to start the master server

You can run the master under Systemd by creating a unit file, as shown below:

sudo tee /etc/systemd/system/seaweedmaster.service<<EOF
[Unit]
Description=SeaweedFS Master
After=network.target

[Service]
Type=simple
User=root
Group=root

ExecStart=/usr/local/go/bin/weed master
WorkingDirectory=/usr/local/go/bin/
SyslogIdentifier=seaweedfs-master

[Install]
WantedBy=multi-user.target
EOF

After creating the file, reload the Systemd daemon, then start and enable the master server:

sudo systemctl daemon-reload
sudo systemctl start seaweedmaster
sudo systemctl enable seaweedmaster

Then check its status

$ systemctl status seaweedmaster -l
● seaweedmaster.service - SeaweedFS Master
     Loaded: loaded (/etc/systemd/system/seaweedmaster.service; disabled; vendor preset: enabled)
     Active: active (running) since Mon 2020-11-30 08:11:37 UTC; 2s ago
   Main PID: 1653 (weed)
      Tasks: 10 (limit: 2204)
     Memory: 11.8M
     CGroup: /system.slice/seaweedmaster.service
             └─1653 /usr/local/go/bin/weed master

Start the volume server

Once the master is up and waiting for volume servers, we can safely start them with the following commands. First, create the sample data directories:

mkdir -p /tmp/{data1,data2,data3,data4}

Then start the first volume server as shown below (the command comes first, followed by the shell output):

Option 1: Manual mode

$ weed volume -dir="/tmp/data1" -max=5  -mserver="localhost:9333" -port=8080 &


I1126 20:37:24  6595 disk_location.go:133] Store started on dir: /tmp/data1 with 0 volumes max 5        
I1126 20:37:24  6595 disk_location.go:136] Store started on dir: /tmp/data1 with 0 ec shards
I1126 20:37:24  6595 volume.go:331] Start Seaweed volume server 30GB 2.12 a1021570 at 0.0.0.0:8080      
I1126 20:37:24  6595 volume_grpc_client_to_master.go:52] Volume server start with seed master nodes: [localhost:9333]
I1126 20:37:24  6595 volume_grpc_client_to_master.go:114] Heartbeat to: localhost:9333
I1126 20:37:24  6507 node.go:278] topo adds child DefaultDataCenter
I1126 20:37:24  6507 node.go:278] topo:DefaultDataCenter adds child DefaultRack
I1126 20:37:24  6507 node.go:278] topo:DefaultDataCenter:DefaultRack adds child 172.22.3.196:8080       
I1126 20:37:24  6507 master_grpc_server.go:73] added volume server 172.22.3.196:8080
I1126 20:37:24  6595 volume_grpc_client_to_master.go:135] Volume Server found a new master newLeader: 172.22.3.196:9333 instead of localhost:9333
W1126 20:37:24  6507 master_grpc_server.go:57] SendHeartbeat.Recv server 172.22.3.196:8080 : rpc error: code = Canceled desc = context canceled
I1126 20:37:24  6507 node.go:294] topo:DefaultDataCenter:DefaultRack removes 172.22.3.196:8080
I1126 20:37:24  6507 master_grpc_server.go:29] unregister disconnected volume server 172.22.3.196:8080  
I1126 20:37:27  6595 volume_grpc_client_to_master.go:114] Heartbeat to: 172.22.3.196:9333
I1126 20:37:27  6507 node.go:278] topo:DefaultDataCenter:DefaultRack adds child 172.22.3.196:8080
I1126 20:37:27  6507 master_grpc_server.go:73] added volume server 172.22.3.196:8080

Then start the second volume server in the same way (the command comes first, followed by the shell output):

$ weed volume -dir="/tmp/data2" -max=10 -mserver="localhost:9333" -port=8081 &

I1126 20:38:56  6612 disk_location.go:133] Store started on dir: /tmp/data2 with 0 volumes max 10       
I1126 20:38:56  6612 disk_location.go:136] Store started on dir: /tmp/data2 with 0 ec shards
I1126 20:38:56  6612 volume_grpc_client_to_master.go:52] Volume server start with seed master nodes: [localhost:9333]
I1126 20:38:56  6612 volume.go:331] Start Seaweed volume server 30GB 2.12 a1021570 at 0.0.0.0:8081      
I1126 20:38:56  6612 volume_grpc_client_to_master.go:114] Heartbeat to: localhost:9333
I1126 20:38:56  6507 node.go:278] topo:DefaultDataCenter:DefaultRack adds child 172.22.3.196:8081       
I1126 20:38:56  6507 master_grpc_server.go:73] added volume server 172.22.3.196:8081
I1126 20:38:56  6612 volume_grpc_client_to_master.go:135] Volume Server found a new master newLeader: 172.22.3.196:9333 instead of localhost:9333
W1126 20:38:56  6507 master_grpc_server.go:57] SendHeartbeat.Recv server 172.22.3.196:8081 : rpc error: code = Canceled desc = context canceled
I1126 20:38:56  6507 node.go:294] topo:DefaultDataCenter:DefaultRack removes 172.22.3.196:8081
I1126 20:38:56  6507 master_grpc_server.go:29] unregister disconnected volume server 172.22.3.196:8081  
I1126 20:38:59  6612 volume_grpc_client_to_master.go:114] Heartbeat to: 172.22.3.196:9333
I1126 20:38:59  6507 node.go:278] topo:DefaultDataCenter:DefaultRack adds child 172.22.3.196:8081
I1126 20:38:59  6507 master_grpc_server.go:73] added volume server 172.22.3.196:8081
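With both volume servers running, a quick back-of-the-envelope calculation shows how much data this test cluster could hold. The sketch assumes every volume fills up to the 30000 MB volume size limit the master printed at startup, and it ignores replication and deleted data awaiting compaction; capacity_mb is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Volume size limit reported by the master ("Volume Size Limit is 30000 MB").
LIMIT_MB=30000

# capacity_mb MAX...: sum of (-max volumes x size limit) over all volume
# servers, i.e. an upper bound on raw storage in MB.
capacity_mb() {
  local total=0 max
  for max in "$@"; do
    total=$(( total + max * LIMIT_MB ))
  done
  echo "$total"
}

# The two volume servers above were started with -max=5 and -max=10.
echo "$(capacity_mb 5 10) MB"
```

In practice the free space in the data directories is likely to be the tighter limit, especially on /tmp.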

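Option 2: Use Systemd

To start the volume servers with Systemd, create one unit file per volume server. If the volume servers started manually in Option 1 are still running, stop them first, because the units below reuse ports 8080 and 8081. It is very simple, as follows: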

For volume 1

$ sudo vim /etc/systemd/system/seaweedvolume1.service

[Unit]
Description=SeaweedFS Volume
After=network.target

[Service]
Type=simple
User=root
Group=root

ExecStart=/usr/local/go/bin/weed volume -dir="/tmp/data2" -max=10 -mserver="172.22.3.196:9333" -port=8081
WorkingDirectory=/usr/local/go/bin/
SyslogIdentifier=seaweedfs-volume

[Install]
WantedBy=multi-user.target

Replace the data directory, port, and master address with the values for your setup, then reload Systemd and start and enable the service:

sudo systemctl daemon-reload
sudo systemctl start seaweedvolume1.service
sudo systemctl enable seaweedvolume1.service

Check status:

$ systemctl status seaweedvolume1
● seaweedvolume1.service - SeaweedFS Volume
     Loaded: loaded (/etc/systemd/system/seaweedvolume1.service; disabled; vendor preset: enabled)
     Active: active (running) since Mon 2020-11-30 08:24:43 UTC; 3s ago
   Main PID: 2063 (weed)
      Tasks: 9 (limit: 2204)
     Memory: 9.8M
     CGroup: /system.slice/seaweedvolume1.service
             └─2063 /usr/local/go/bin/weed volume -dir=/tmp/data3 -max=10 -mserver=localhost:9333 -port=8081 -ip=172.22.3.196

For volume 2

$ sudo vim /etc/systemd/system/seaweedvolume2.service

[Unit]
Description=SeaweedFS Volume
After=network.target

[Service]
Type=simple
User=root
Group=root

ExecStart=/usr/local/go/bin/weed volume -dir="/tmp/data1" -max=5  -mserver="172.22.3.196:9333" -port=8080
WorkingDirectory=/usr/local/go/bin/
SyslogIdentifier=seaweedfs-volume2

[Install]
WantedBy=multi-user.target

After saving the file, reload the Systemd daemon, then start and enable the second volume server:

sudo systemctl daemon-reload
sudo systemctl start seaweedvolume2
sudo systemctl enable seaweedvolume2

Then check its status:

sudo systemctl status seaweedvolume2
● seaweedvolume2.service - SeaweedFS Volume
     Loaded: loaded (/etc/systemd/system/seaweedvolume2.service; disabled; vendor preset: enabled)
     Active: active (running) since Mon 2020-11-30 08:29:22 UTC; 5s ago
   Main PID: 2103 (weed)
      Tasks: 10 (limit: 2204)
     Memory: 10.3M
     CGroup: /system.slice/seaweedvolume2.service
             └─2103 /usr/local/go/bin/weed volume -dir=/tmp/data4 -max=5 -mserver=localhost:9333 -port=8080 -ip=172.22.3.196
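If you need more than two volume servers, writing each unit by hand gets repetitive. The sketch below prints a unit file in the same shape as the two above; volume_unit is a hypothetical helper, and WEED_BIN and MASTER default to the values used in this guide, so adjust them to your setup.

```shell
#!/usr/bin/env bash
# Defaults mirror the units above; override them through the environment.
WEED_BIN="${WEED_BIN:-/usr/local/go/bin/weed}"
MASTER="${MASTER:-172.22.3.196:9333}"

# volume_unit NAME DIR MAX PORT: print a Systemd unit for one volume server.
volume_unit() {
  local name="$1" dir="$2" max="$3" port="$4"
  cat <<EOF
[Unit]
Description=SeaweedFS Volume (${name})
After=network.target

[Service]
Type=simple
User=root
Group=root

ExecStart=${WEED_BIN} volume -dir=${dir} -max=${max} -mserver=${MASTER} -port=${port}
WorkingDirectory=$(dirname "${WEED_BIN}")/
SyslogIdentifier=seaweedfs-${name}

[Install]
WantedBy=multi-user.target
EOF
}
```

For example, piping volume_unit volume3 /tmp/data3 10 8082 into sudo tee /etc/systemd/system/seaweedvolume3.service, followed by the usual daemon-reload, start, and enable commands, would add a third volume server on port 8082.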

Write a sample file

Uploading files to SeaweedFS object storage is fun. First, send an HTTP POST, PUT, or GET request to /dir/assign to get a file ID (fid) and a volume server URL:

$ curl http://localhost:9333/dir/assign

{"fid":"7,0101406762","url":"172.22.3.196:8080","publicUrl":"172.22.3.196:8080","count":1}
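When scripting uploads, you will want to pull the fid and url out of this reply. If jq is installed, jq -r .fid does the job; the sketch below avoids that dependency with sed. It is a regex shortcut rather than a real JSON parser, assumes the flat, single-line reply shown above, and json_field is a hypothetical helper.

```shell
#!/usr/bin/env bash
# json_field NAME: read a flat JSON object on stdin and print the string
# value of "NAME". Assumes the value contains no escaped quotes.
json_field() {
  sed -n "s/.*\"$1\":\"\([^\"]*\)\".*/\1/p"
}

# Sample reply from above; live: resp=$(curl -s http://localhost:9333/dir/assign)
resp='{"fid":"7,0101406762","url":"172.22.3.196:8080","publicUrl":"172.22.3.196:8080","count":1}'

fid=$(printf '%s' "$resp" | json_field fid)
url=$(printf '%s' "$resp" | json_field url)

# The upload target is url + "/" + fid.
echo "http://${url}/${fid}"
```

Numeric fields such as count are not quoted in the reply, so this helper deliberately returns nothing for them.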

With these details, the next step is to store the file contents. To do this, send an HTTP multipart POST request to url + '/' + fid from the response. Our fid is 7,0101406762 and the url is 172.22.3.196:8080, so we send the request as follows and receive a response like the one below.

$ curl -F file=@/home/tech/teleport-logo.png http://172.22.3.196:8080/7,0101406762

{"name":"teleport-logo.png","size":70974,"eTag":"ef8deb64899176d3de492f2fa9951e14"}
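The fid itself has structure. According to the SeaweedFS README, the number before the comma is the volume id, and the hex digits after it are the file key followed by a 32-bit file cookie (the last 8 hex digits). So 7,0101406762 lives in volume 7 with file key 01 and cookie 01406762, and the volume id is what the master lookup later in this guide takes. The sketch below splits a fid that way; fid_parts is a hypothetical helper.

```shell
#!/usr/bin/env bash
# fid_parts FID: print "volumeId fileKey cookie" for a fid such as 7,0101406762.
fid_parts() {
  local fid="$1"
  local vid="${fid%%,*}"        # text before the comma: the volume id
  local rest="${fid#*,}"        # text after the comma: key + cookie in hex
  if (( ${#rest} <= 8 )); then
    return 1                    # too short to hold a key and an 8-digit cookie
  fi
  local cookie="${rest: -8}"        # last 8 hex digits: the 32-bit cookie
  local key="${rest:0:${#rest}-8}"  # everything before the cookie: the file key
  echo "$vid $key $cookie"
}

fid_parts 7,0101406762
```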

Update files that have been sent to object storage

Updating is easier than you think. Send the same request as above with the new file while keeping the same fid and url, and the existing content is replaced:

curl -F file=@/home/tech/teleport-logo-updated.png http://172.22.3.196:8080/7,0101406762

Delete files from object storage

To get rid of a file you have stored in SeaweedFS, send an HTTP DELETE request to the same url + '/' + fid:

curl -X DELETE http://172.22.3.196:8080/7,0101406762

Read the saved file

Reading stored files is just as easy. First, look up the volume server URL by the file's volumeId, as shown in the following example:

$ curl "http://172.22.3.196:9333/dir/lookup?volumeId=7"

{"volumeId":"7","locations":[{"url":"172.22.3.196:8080","publicUrl":"172.22.3.196:8080"}]}

Since volumes do not move often, you can cache the lookup results most of the time to improve speed and performance. Depending on the replication type, a volume can have multiple replica locations; just pick one location at random to read from.
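A client-side cache following this advice could look like the sketch below. It is an illustration rather than an official client: MASTER defaults to the address used in this guide, pick_volume_server and fetch_lookup are hypothetical helpers, and the url extraction is a grep/sed shortcut that assumes the compact reply shown above. The result is stored in a variable instead of printed, because a command substitution would run in a subshell and discard the cache.

```shell
#!/usr/bin/env bash
# MASTER is an assumption: the master address used throughout this guide.
MASTER="${MASTER:-172.22.3.196:9333}"

# volumeId -> space-separated list of volume server urls
declare -A LOCATION_CACHE

# fetch_lookup VID: ask the master where a volume lives (raw JSON reply).
fetch_lookup() {
  curl -s "http://${MASTER}/dir/lookup?volumeId=$1"
}

# pick_volume_server VID: set PICKED_URL to one replica, chosen at random.
# Answers are cached per volume id; an empty answer is fetched again next time.
pick_volume_server() {
  local vid="$1"
  if [[ -z "${LOCATION_CACHE[$vid]:-}" ]]; then
    LOCATION_CACHE[$vid]="$(fetch_lookup "$vid" \
      | grep -o '"url":"[^"]*"' \
      | sed 's/^"url":"//; s/"$//' \
      | tr '\n' ' ')"
  fi
  local -a urls
  read -r -a urls <<< "${LOCATION_CACHE[$vid]}"
  if (( ${#urls[@]} == 0 )); then
    return 1    # unknown volume or master unreachable
  fi
  PICKED_URL="${urls[RANDOM % ${#urls[@]}]}"
}
```

Usage would be along the lines of: pick_volume_server 7 && echo "http://${PICKED_URL}/7,0101406762".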

Now, point your preferred browser or application at the file URL (the volume server url + '/' + fid) to view the file stored in SeaweedFS object storage. If you are running a firewall, allow access to the port first:

sudo ufw allow 8080

http://172.22.3.196:8080/7,0101406762
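Browsers only need the HTTP ports, but if volume servers or other clients run on separate machines, you will likely also need to open the gRPC ports, which by default sit at the HTTP port plus 10000 (volume servers send heartbeats to the master over gRPC, as the logs above show). The sketch below only prints ufw commands for the ports used in this guide so you can review them before running anything; ufw_rules is a hypothetical helper and the +10000 offset is the assumed default.

```shell
#!/usr/bin/env bash
# ufw_rules PORT...: print ufw commands for each HTTP port and its assumed
# gRPC port (HTTP port + 10000). Nothing is executed.
ufw_rules() {
  local port
  for port in "$@"; do
    echo "sudo ufw allow ${port}/tcp"
    echo "sudo ufw allow $(( port + 10000 ))/tcp"
  done
}

# Master on 9333, volume servers on 8080 and 8081, as in this guide.
ufw_rules 9333 8080 8081
```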


If you want a nicer URL, you can use one of the following alternative URL formats:

 http://172.22.3.196:8080/7/0101406762/your_preferred_name.jpg
 http://172.22.3.196:8080/7/0101406762.jpg
 http://172.22.3.196:8080/7,0101406762.jpg
 http://172.22.3.196:8080/7/0101406762
 http://172.22.3.196:8080/7,0101406762
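All of these forms are derived mechanically from the fid: the comma can become a slash, and a file name or extension can be appended. The sketch below prints the five forms above for a given base URL, fid, and file name; fid_urls is a hypothetical helper, and the file name only affects how the URL looks.

```shell
#!/usr/bin/env bash
# fid_urls BASE FID NAME: print the alternative URL forms for one file.
fid_urls() {
  local base="$1" fid="$2" name="$3"
  local vid="${fid%%,*}" rest="${fid#*,}"
  local ext="${name##*.}"                    # "jpg" from "your_preferred_name.jpg"
  echo "${base}/${vid}/${rest}/${name}"      # volume/key/name
  echo "${base}/${vid}/${rest}.${ext}"       # volume/key.ext
  echo "${base}/${fid}.${ext}"               # volume,key.ext
  echo "${base}/${vid}/${rest}"              # volume/key
  echo "${base}/${fid}"                      # volume,key
}

fid_urls http://172.22.3.196:8080 7,0101406762 your_preferred_name.jpg
```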

If you want a resized version of an image, add some parameters, as in the following examples:

http://172.22.3.196:8080/7/0101406762.jpg?height=200&width=200
http://172.22.3.196:8080/7/0101406762.jpg?height=200&width=200&mode=fit
http://172.22.3.196:8080/7/0101406762.jpg?height=200&width=200&mode=fill
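When building these URLs in a shell script, remember to quote them: an unquoted & would send the command to the background and cut the query string short. The sketch below assembles a resize URL and only accepts the two modes shown above; resize_url is a hypothetical helper.

```shell
#!/usr/bin/env bash
# resize_url URL WIDTH HEIGHT [MODE]: append resize parameters to an image URL.
resize_url() {
  local url="$1" width="$2" height="$3" mode="${4:-}"
  case "$mode" in
    ""|fit|fill) ;;
    *) echo "unsupported mode: $mode" >&2; return 1 ;;
  esac
  local query="height=${height}&width=${width}"
  if [[ -n "$mode" ]]; then
    query="${query}&mode=${mode}"
  fi
  echo "${url}?${query}"
}

thumb="$(resize_url http://172.22.3.196:8080/7/0101406762.jpg 200 200 fit)"
echo "$thumb"
# Quote the URL so the shell does not treat & as a control operator:
# curl -o thumb.jpg "$thumb"
```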

SeaweedFS can do much more, such as eliminating single points of failure by using multiple servers and multiple volumes. Check out the SeaweedFS documentation on GitHub for more information about this excellent object storage tool.

Conclusion

SeaweedFS can add real value to your project, especially when it involves storing and retrieving large amounts of data as objects. If your application needs to serve photos and similar data, SeaweedFS is a good choice. As always, thank you for your time on the blog and for the support you have given us so far. You can read other similar guides shared below.

EKS Kubernetes persistent storage using EFS storage service

Set up GlusterFS storage with Heketi on CentOS 8 / CentOS 7

How to create and delete GlusterFS volumes

Use Minio to set up an S3-compatible object storage server
