Wiby Personal Search Engine
With the advent of private local AI and sourcing fresh information, this is imperative to use for your own private internet and uncensored data.
Build Your Own Search Engine
(Wiby Install Guide)
Wiby is a search engine for the World Wide Web. The source code is now free as of July 8, 2022 under the GPLv2 license. I have been longing for this day! You can watch a quick demo.
It includes a web interface allowing guardians to control where, how far, and how often it crawls websites and follows hyperlinks. The search index is stored inside of an InnoDB full-text index.
Fast queries are maintained by concurrently searching different sections of the index across multiple replication servers or across duplicate server connections, returning a list of top results from each connection, then searching the combined list to ensure correct ordering. Replicas that fail are automatically excluded; new replicas are easy to include. As new pages are crawled, they are stored randomly across the index, ensuring each search section can obtain relevant results.
The search engine is not meant to index the entire web and then sort it with a ranking algorithm. It prefers to seed its index through human submissions made by guests, or by the guardian(s) of the search engine.
The software is designed for anyone with some extra computers (even a Pi), to host their own search engine catering to whatever niche matters to them. The search engine includes a simple API for meta search engines to harness.
I hope this will enable anyone with a love of computers to cheaply build and maintain a search engine of their own. I hope it can cultivate free and independent search engines, ensuring accessibility of ideas and information across the World Wide Web.
I can only provide manual install instructions at this time.
Note that while the software is functionally complete, it is still in beta. Anticipate that some bugs will be discovered now that the source is released. Ensure that you isolate the search engine from your other important services, and if you are running parts of it out of your home, keep the servers on a separate VLAN. Make sure this VLAN cannot access your router or switch interface. Continue this practise even when the software reaches "1.0".
If you have created a "LAMP", or rather a "LEMP" server before, this isn't much more complicated. If you've never done that, I suggest you find a "LEMP" tutorial.
Build a LEMP server
For the sake of simplicity, assume all instructions are for Ubuntu 20 or 22. If you are on a different distro, modify the install steps accordingly to suit your distro.
If you don't have a physical server, you can rent computing space by looking for a "VPS provider". This virtual computer will be your reverse proxy, and if you want, it can host everything else too.
Install the following additional packages:
apt install build-essential php-gd libcurl4-openssl-dev libmysqlclient-dev golang git
Get Wiby Source Files
Compile the crawler (cr), refresh scheduler (rs), replication tracker (rt):
gcc cr.c -o cr -lmysqlclient -lcurl -std=c99 -O3
gcc rs.c -o rs -lmysqlclient -std=c99 -O3
gcc rt.c -o rt -lmysqlclient -std=c99 -O3
If you get any compile errors, it is likely due to the path of the mysql or libcurl header files. This could happen if you are not using Ubuntu. You might have to locate the correct path for curl.h, easy.h, mysql.h, then edit the #include paths in the source files.
Build the core server application:
The core application is located inside the go folder. Run the following commands after copying the files over to your preferred location:
For Ubuntu 20:
go get -u github.com/go-sql-driver/mysql
go build core.go
go build 1core.go
If you are just starting out, you can use '1core'. If you are going to setup replication servers or you are using a computer with a lot of available cores, you can use 'core', but make sure to read the scaling section.
If you want to use 1core on a server separate from your reverse proxy server, modify line 37 of 1core.go: replace 'localhost' with '0.0.0.0' so that it accepts connections over your VPN from your reverse proxy.
You can also use index.php in the root of the www directory and not use the Go version at all. Though the PHP version is used mainly for prototyping.