johanot has joined #nixos-kubernetes
ixxie has joined #nixos-kubernetes
johanot has quit [Read error: No route to host]
johanot has joined #nixos-kubernetes
<ixxie> johanot: I added the wrapper but still get the error
<johanot> ixxie: which error?
<ixxie> error: unable to read client-key /var/lib/kubernetes/secrets/cluster-admin-key.pem for cluster-admin due to open /var/lib/kubernetes/secrets/cluster-admin-key.pem: permission denied
<ixxie> johanot: am I missing some kind of imperative step?
<ixxie> I thought maybe because it's a node and a master I would need to call nixos-kubernetes-node-join
<johanot> The private key is owned by root, so you need to either execute your kubectl commands as root, or run "sudo chmod go+r /var/lib/kubernetes/secrets/cluster-admin-key.pem"
<ixxie> what is best practice?
<johanot> That's ... up for discussion I guess. It is bootstrapped this way so that you cannot gain cluster-admin rights just by being a normal user (by default). I think, rather than allowing anyone to read the private key, it would be better to create a system group for "k8s-admins" and assign group permissions accordingly.
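A minimal sketch of the group-based setup johanot describes above, done imperatively; the group name "k8s-admins" comes from his suggestion, the key path from the error message, and the user name "ixxie" is only illustrative (on NixOS the group could equally be declared via users.groups in configuration.nix):

    # create the admin group and add a user to it
    sudo groupadd k8s-admins
    sudo usermod -aG k8s-admins ixxie
    # grant the group read access to the cluster-admin key, keeping "other" locked out
    sudo chgrp k8s-admins /var/lib/kubernetes/secrets/cluster-admin-key.pem
    sudo chmod 640 /var/lib/kubernetes/secrets/cluster-admin-key.pem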
<ixxie> do you plan to incorporate that sort of group into the module or will it be left for the cluster administrators to set up?
<johanot> One could argue that it would be a nice feature of the module, yes. Such that the administrator only has to assign group memberships and nothing else. I don't think I will modify the current PR for this, though. Maybe a separate PR.
<ixxie> yeah makes sense
<ixxie> it seems the DNS pod is still in a CrashLoop
<johanot> oh? can you grab the logs from the kube-dns container and put it in a pastebin or something?
<johanot> ixxie: seeing your msg from before, nixos-kubernetes-node-join is only necessary if you have a multi-node cluster. If you run a single-node (master+node) cluster, secret exchange happens locally, i.e. no need for manual work there :)
<ixxie> well, probably not the full logs
<ixxie> but I haven't learned how to get pod logs yet
<johanot> ixxie: sudo kubectl logs -n kube-system kube-dns-5746ddc44f-5cn5l -c kubedns
<ixxie> updated it
<ixxie> looks like permissions errors
<johanot> yeah the apiserver is not very happy with the requests from kubedns
<johanot> ixxie: sudo kubectl get serviceaccounts --all-namespaces
<johanot> what does that give you? ^^
<ixxie> NAMESPACE     NAME                                 SECRETS   AGE
<ixxie> default       default                              1         5d
<ixxie> kube-public   default                              1         5d
<ixxie> kube-system   attachdetach-controller              1         16h
<ixxie> kube-system   certificate-controller               1         16h
<ixxie> kube-system   clusterrole-aggregation-controller   1         16h
<ixxie> kube-system   cronjob-controller                   1         16h
<ixxie> kube-system   daemon-set-controller                1         16h
<ixxie> kube-system   default                              1         5d
<ixxie> kube-system   deployment-controller                1         16h
<ixxie> kube-system   disruption-controller                1         16h
<ixxie> kube-system   endpoint-controller                  1         16h
<ixxie> kube-system   expand-controller                    1         16h
<ixxie> kube-system   generic-garbage-collector            1         16h
<ixxie> kube-system   horizontal-pod-autoscaler            1         16h
<ixxie> kube-system   job-controller                       1         16h
<ixxie> kube-system   kube-dns                             1         5d
<ixxie> kube-system   namespace-controller                 1         16h
<ixxie> kube-system   node-controller                      1         16h
<ixxie> kube-system   persistent-volume-binder             1         16h
<ixxie> kube-system   pod-garbage-collector                1         16h
<ixxie> kube-system   pv-protection-controller             1         16h
<ixxie> kube-system   pvc-protection-controller            1         16h
<ixxie> kube-system   replicaset-controller                1         16h
<ixxie> kube-system   replication-controller               1         16h
<ixxie> kube-system   resourcequota-controller             1         16h
<ixxie> kube-system   service-account-controller           1         16h
<ixxie> kube-system   service-controller                   1         16h
<ixxie> kube-system   statefulset-controller               1         16h
<ixxie> kube-system   ttl-controller                       1         16h
<ixxie> johanot: could it be as simple as restarting the kube-dns pod?
<johanot> ixxie: could be.. Perhaps try and delete the kube-dns serviceaccount, wait for it to be recreated and then restart the kubedns pod..
<johanot> sudo kubectl delete serviceaccount -n kube-system kube-dns
<johanot> Then check the list of serviceaccounts until it reappears, should happen within a minute
<johanot> after that: sudo kubectl delete -n kube-system deploy kube-dns
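Taken together, the recovery sequence johanot outlines looks roughly like this (the serviceaccount and deployment names are the ones already used in this cluster; the addon-manager is expected to recreate both):

    # delete the kube-dns serviceaccount and wait for it to be recreated
    sudo kubectl delete serviceaccount -n kube-system kube-dns
    # poll until it reappears (should happen within a minute)
    sudo kubectl get serviceaccounts -n kube-system
    # then delete the kube-dns deployment so its pods pick up the fresh serviceaccount token
    sudo kubectl delete -n kube-system deploy kube-dns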
<ixxie> sudo kubectl get serviceaccounts?
<johanot> ixxie: Yes.. with "-n kube-system" or "--all-namespaces"
<ixxie> right
<ixxie> johanot: I shouldn't have to recreate the kube-dns deployment right?
<johanot> right.. the addon-manager should do that
<ixxie> it seems... stuck?
<johanot> stuck how? kube-dns never comes back, or?
<ixxie> you can see in the last file of the gist
<ixxie> I'm still not very good at reading the kube state command outputs but it seems weird
<johanot> ixxie: You might need to check the outputs of "sudo kubectl get events -n kube-system" and "journalctl -u kubelet" + "journalctl -u kube-controller-manager"
<johanot> something is not right in that cluster
<johanot> for some reason
<ixxie> hmm
<ixxie> 12m 12m 4 kube-dns-7689ffc6b8-x9clw.155213282b76e1a1 Pod Warning FailedCreatePodSandBox kubelet, flux-master (combined from similar events): Failed create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "ec36b62111bd9fc8fab5fc31c317640a607cf37c6b9495820723d3ab02393857" network for pod "kube-dns-7689ffc6b8-x9clw":
<ixxie> NetworkPlugin cni failed to set up pod "kube-dns-7689ffc6b8-x9clw_kube-system" network: failed to set bridge addr: "docker0" already has an IP address different from 10.2.62.1/24, failed to clean up sandbox container "ec36b62111bd9fc8fab5fc31c317640a607cf37c6b9495820723d3ab02393857" network for pod "kube-dns-7689ffc6b8-x9clw": NetworkPlugin cni failed to teardown pod
<ixxie> "kube-dns-7689ffc6b8-x9clw_kube-system" network: running [/nix/store/mdm7f4mka3dqdkvj1nz8gckzzad1hqqq-iptables-1.6.2/bin/iptables -t nat -D POSTROUTING -s 10.2.62.29/24 -j CNI-fd9c25428653fe1922c805eb -m comment --comment name: "mynet" id: "ec36b62111bd9fc8fab5fc31c317640a607cf37c6b9495820723d3ab02393857" --wait]: exit status 2: iptables v1.6.2: Couldn't load target
<ixxie> `CNI-fd9c25428653fe1922c805eb':No such file or directory
<ixxie> aah sorry
<ixxie> ill paste in a gist
<ixxie> it looked more compact in the terminal
<ixxie> it seems to be a conflict with the docker0 networking interface....
<johanot> ixxie: looks like a networking issue.. Might need to see the output of "ip route" and "ip addr" as well
<johanot> yes
<johanot> ixxie: docker0 should have an ip-range subordinate to that of flannel.1
<johanot> I would expect docker0 to have something like 10.2.62.1/24
<ixxie> maybe it's because I separately specified docker as a system package?
<johanot> could be.. however.. the flannel package should reconfigure the docker daemon automatically. Maybe you could do "systemctl stop docker" "ip addr del 172.17.0.1/16" "systemctl restart flannel" "systemctl start docker"
<johanot> ip addr del 172.17.0.1/16 dev docker0
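The whole sequence in one place, with the device argument added per the corrected command above; this assumes docker0 is still holding the default 172.17.0.1/16 address that conflicts with flannel's range:

    # stop docker, drop its default bridge address, and let flannel reconfigure the bridge
    sudo systemctl stop docker
    sudo ip addr del 172.17.0.1/16 dev docker0
    sudo systemctl restart flannel
    sudo systemctl start docker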
<ixxie> it responds RTNETLINK answers: Cannot assign requested address
<johanot> maybe the address disappeared when you closed down the docker daemon?
<ixxie> how would I check?
<johanot> ip addr :)
<ixxie> but of course xD
<ixxie> 3: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
<ixxie>     link/ether 02:42:03:18:ce:b8 brd ff:ff:ff:ff:ff:ff
<ixxie>     inet 10.2.62.1/24 brd 10.2.62.255 scope global docker0
<ixxie>        valid_lft forever preferred_lft forever
<johanot> lgtm
<ixxie> looks better right?
<johanot> indeed
<ixxie> should I restart dns again now?
<johanot> yep
<ixxie> seems to be running okay now
<ixxie> but I wonder what caused this
<johanot> ixxie: probably a case that we/I didn't foresee. i.e. the docker daemon is already installed with X config, and overriding that config might fail in some cases.
<johanot> But I think this must have been a problem on nixos stable as well, because I haven't really touched the flannel setup during the 1.11 refactor
<ixxie> johanot: I guess explicitly setting docker isn't common but some people learning Kube might wanna have docker_compose available as a fallback
<johanot> ixxie: I definitely don't think setting docker in "environment.systemPackages" caused the problem :)
<johanot> Rather that virtualisation.docker.enable = true; was set before the kubernetes.flannel submodule was enabled.
<ixxie> ah
<johanot> .. and some combination of state conflicted
<johanot> If I knew which atm. I would fix it instantly :D
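One way to narrow it down, assuming the standard flanneld behaviour of writing its lease to /run/flannel/subnet.env, is to compare that file against what docker0 actually carries:

    # the subnet flannel allocated for this node (FLANNEL_SUBNET should match docker0)
    cat /run/flannel/subnet.env
    # what docker0 is actually configured with, plus the routes in use
    ip addr show docker0
    ip route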
<ixxie> well if I can help debug it somehow let me know
<ixxie> I was wondering if I rebuilt the system now whether the error would return
<johanot> ixxie: urgh.. it really shouldn't. But it would be nice to have a test to confirm, perhaps a reboot as well - to get interfaces de-configured and re-configured and all daemons restarted in (hopefully) the correct order.
<ixxie> logs look fine now btw
<ixxie> the only oddity is
<ixxie> [ixxie@flux-master:~]$ sudo kubectl get pods -n kube-system
<ixxie> NAME                        READY     STATUS    RESTARTS   AGE
<ixxie> kube-dns-5746ddc44f-l52h2   1/3       Unknown   650        5d
<ixxie> kube-dns-7689ffc6b8-cltzd   3/3       Running   0          10m
<ixxie> this weird leftover old kube-dns pod
<ixxie> it shouldn't be stuck there right?
<johanot> ixxie: you should be able to force delete it: "sudo kubectl delete pod -n kube-system --force <podname>"
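For reference, with the stuck pod name from the listing above; depending on the kubectl version, an explicit zero grace period may be needed alongside --force:

    sudo kubectl delete pod -n kube-system kube-dns-5746ddc44f-l52h2 --force --grace-period=0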
<ixxie> alright looking good
<ixxie> johanot: I'll do another deploy
<ixxie> rebooting now
<ixxie> hmm
<ixxie> it won't reboot
<ixxie> johanot: it can't find the root device
<ixxie> very weird
<ixxie> I mean I don't care because everything is in the configs and I can just recreate the VM
<ixxie> but it's odd that what we just did broke the installation like that
<johanot> ixxie: ok.. it's really not likely that the kubernetes module touched the bootloader or anything :)
<ixxie> more likely I did something stupid
<ixxie> just trying to figure out what
<johanot> regarding the docker/flannel problem, I'll try to reproduce later today on a new vm.
<ixxie> well here is the current HEAD in case you need to reference my repo 8c6bded1b9da3cbcdfcf249836024ca31ac1497a
johanot has quit [Ping timeout: 240 seconds]
<ixxie> I'm converting a fresh VM now
<ixxie> so I guess we will find out soon enough how easy it is to reproduce
johanot has joined #nixos-kubernetes
<johanot> ixxie: I won't get time to reproduce today, will try over the weekend. Please let me know if you manage to reproduce it.
<ixxie> johanot: will do
<ixxie> and thanks once again!
<johanot> ixxie: thank you for testing. very helpful. I realize that not everything in the module is obvious and perfect :)
<ixxie> johanot: thanks for developing! This is obviously a lot of work and while it has its kinks to figure out, this is surely way easier than most approaches to setting up k8s
<ixxie> (I meant it's obvious a lot of work went into this)
<johanot> it has been a lot of work yes :) almost to the point where I considered recommending people to just use kubeadm.. But... kubeadm is just... Not quite there yet, IMHO
<ixxie> johanot: give me half a year to a year and maybe I could start contributing
johanot has quit [Remote host closed the connection]
johanot has joined #nixos-kubernetes
<johanot> ixxie: looking forward to that! :) feel free to suggest changes anytime, whether you feel ready or not..
johanot has quit [Remote host closed the connection]
ixxie has quit [Quit: Lost terminal]