Troubleshooting Guide
{{TableOfContents}}

This page describes problems encountered using Fuego, and their solutions.
''Note for Editors: please put each issue in its own page section''
= Installation =
== Problem creating docker file ==
Make sure your host machine is running a 64-bit version of the Linux kernel.
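One quick way to check the host kernel is to look at the machine name reported by `uname -m`. The following is a minimal sketch, not part of Fuego: the helper name `is_64bit_machine` and the list of architecture strings are assumptions made for illustration, and the list is not exhaustive.

```shell
# Hypothetical helper (not part of Fuego): succeed if the given
# machine string names a common 64-bit architecture.
is_64bit_machine() {
    case "$1" in
        x86_64|aarch64|ppc64|ppc64le|s390x|riscv64) return 0 ;;
        *) return 1 ;;
    esac
}

# uname -m reports the machine architecture of the running kernel.
if is_64bit_machine "$(uname -m)"; then
    echo "64-bit kernel detected: $(uname -m)"
else
    echo "Kernel does not look 64-bit: $(uname -m)"
fi
```

If the second message appears, the docker image build is likely to fail on this host.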
== Problem starting Jenkins after initial container creation ==
Doug Crawford reported a problem starting Jenkins in the container after his initial build.
{{{#!YellowBox
$ sudo ./docker-create-container.sh
Created JTA container 6a420f901af7847f2afa3100d3fb3852b71bc65f92aecd13a9aefe0823d42b77
$ sudo ./docker-start-container.sh
Starting JTA container 6a420f901af7847f2afa3100d3fb3852b71bc65f92aecd13a9aefe0823d42b77
[....] Starting Jenkins Continuous Integration Server: jenkinssu: System error failed!
[ ok ] Starting OpenBSD Secure Shell server: sshd.
[ ok ] Starting network benchmark server.
}}}
The error string is jenkinssu: System error

Takuo Kogushi provides the following response:
I had the same issue. I searched the net and found that it is not a problem in Fuego itself. As far as I know there are two workarounds:
 * 1) Rebuild and install libpam with the --disable-audit option (in the container), or
 * 2) Modify docker-create-container.sh to add the --pid="host" option to the docker create command
Here is a patch provided by Koguchi-san:
{{{#!YellowBox
diff --git a/fuego-host-scripts/docker-create-container.sh b/fuego-host-scripts/docker-create-container.sh
index 2ea7961..24663d6 100755
--- a/fuego-host-scripts/docker-create-container.sh
+++ b/fuego-host-scripts/docker-create-container.sh
@@ -7,7 +7,7 @@ while [ -h "$SOURCE" ]; do # resolve $SOURCE until the file is no longer a symli
 done
 DIR="$( cd -P "$( dirname "$SOURCE" )" && pwd )"

-CONTAINER_ID=`sudo docker create -it -v $DIR/../userdata:/userdata --net="host" fuego`
+CONTAINER_ID=`sudo docker create -it -v $DIR/../userdata:/userdata --pid="host" --net="host" fuego`
 CONTAINER_ID_FILE="$DIR/../last_fuego_container.id"
 echo "Created Fuego container $CONTAINER_ID"
 echo $CONTAINER_ID > $DIR/../last_fuego_container.id
}}}
Actually I have not tried the first one and do not know if there are any side effects for the second.

---

This may be related to this docker bug: https://github.com/docker/docker/issues/5899
== Problem with bad port on ssh connection ==
ovgen.py doesn't parse SSH_PORT from:
 * /home/jenkins/fuego/engine/scripts/overlays/base/base-board.fuegoclass
because it is missing double quotes.
The symptom is that you see the following in the test log for some test you tried to run:
{{{#!YellowBox
+++ sshpass -e ssh -o ServerAliveInterval=30 -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o ConnectTimeout=15 -p root@10.0.0.1 true
Bad port 'root@10.0.0.1'
+++ abort_job 'Cannot connect to 10.0.0.1 via ssh'
+++ set +
}}}
The error string here is "Bad port 'root@10.0.0.1'"

This occurs because the port is empty. It should have been passed to the ssh command after the '-p' command line option, but since it is empty, ssh takes the next word, the account-name@address combination, as the port argument.
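The effect of an empty port can be reproduced without ssh. The sketch below is illustrative only: `count_args` and `EMPTY_PORT` are hypothetical names, and the `22` fallback is simply the usual ssh default, not something taken from the Fuego scripts.

```shell
# Hypothetical helper: print how many arguments were received.
count_args() { echo "$#"; }

EMPTY_PORT=""

# An unquoted empty variable expands to nothing, so "-p" is followed
# directly by the user@host word, which ssh would then read as the port.
count_args -p $EMPTY_PORT root@10.0.0.1        # prints 2

# Supplying a fallback value keeps the port in its own argument slot.
count_args -p ${EMPTY_PORT:-22} root@10.0.0.1  # prints 3
```

With only two arguments, `root@10.0.0.1` lands in the position where ssh expects the port number, which matches the "Bad port" message in the log.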
The port is empty because of a bug in base-board.fuegoclass: the SSH_PORT value is missing its double quotes.

This is fixed in the tbird20d repository with the following commit:
 * https://bitbucket.org/tbird20d/fuego-core/commits/abb2e7161ba66017a267c09897e5db4d938ab214
= General =
== Timeout executing ssh commands ==
In some cases, the ssh command used by Fuego takes a very long time to connect. There is a timeout for the ssh commands, specified as 15 seconds in the cogent repository and 30 seconds in the tbird20d repository.
The timeout for ssh commands is specified in the file:
 * /home/jenkins/fuego/engine/scripts/overlays/base/base-params.fuegoclass

You can change ConnectTimeout to something longer by editing the file.
FIXTHIS - make ConnectTimeout for ssh connections a board-level test variable
== ssh commands taking a long time ==
Sometimes, even if the command does not time out, each SSH operation on the target takes a very long time.
The symptom is that when you are watching the console output for a test, the test stalls at each SSH connection to the target.

One cause of long ssh connection times can be that the target ssh server (sshd) is configured to do DNS lookups on each inbound connection.
To turn this off, on the target, edit the file:
 * /etc/ssh/sshd_config
and add the line:
{{{#!YellowBox
UseDNS no
}}}
This line can be added anywhere in the file, but I recommend adding it right after the UsePrivilegeSeparation line (if that's there).
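For targets that are set up by script, the UseDNS change can be applied idempotently. This is a minimal sketch, not something shipped with Fuego: the helper name `set_usedns_no` is hypothetical, and it appends the line at the end of the file when no UseDNS entry exists rather than placing it after UsePrivilegeSeparation.

```shell
# Hypothetical helper: make sure the given sshd config file says "UseDNS no".
# An existing UseDNS line (commented out or not) is rewritten in place;
# otherwise the setting is appended. A backup is kept as <file>.bak on rewrite.
set_usedns_no() {
    config="$1"
    if grep -Eq '^[[:space:]]*#?[[:space:]]*UseDNS[[:space:]]' "$config"; then
        cp "$config" "$config.bak"
        sed -E 's/^[[:space:]]*#?[[:space:]]*UseDNS[[:space:]].*/UseDNS no/' "$config.bak" > "$config"
    else
        echo 'UseDNS no' >> "$config"
    fi
}

# Demonstration on a scratch file rather than the real /etc/ssh/sshd_config.
demo_sshd=$(mktemp)
printf 'Port 22\n#UseDNS yes\n' > "$demo_sshd"
set_usedns_no "$demo_sshd"
cat "$demo_sshd"
```

On a real target the function would be run as root against /etc/ssh/sshd_config, and sshd typically has to be restarted or reloaded before the change takes effect. How to do that depends on the target's init system.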