Category: Offsites

Stuff I find valuable at key websites

How to Start a Startup (Stanford): Summary

Post date March 6, 2021

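Now that I am moving from a large company back to a startup, I want to summarize the How to Start a Startup lectures, which left a strong impression on me when I first watched them. The course consists of 20 lectures given at Stanford in 2014. It was run by Sam Altman, president of the well-known accelerator Y Combinator, and speakers from various startups (most, or perhaps all, funded by Y Combinator) lecture on each topic.

Rather than summarizing every lecture one by one, I group related material under broad categories such as 'product' and 'users' and condense it into bullet points. I also aimed to look more closely at a few terms (e.g. CLV, cohort analysis) and write them up separately.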

General

Great Idea → Great Product → Great Company
Why do you want to start a company?
- The best reasons
  - This idea cannot happen without me
  - The world needs me
Starting a startup is a lot like raising a child.

Source: How to Start a Startup, Lecture Note 1

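Product

Idea

The way to come up with good startup ideas is to take a step back.

Ideas should come to you unconsciously, not through conscious effort.
1. Learn a lot about things that matter
  1. Learn technology
  2. Keep an eye on the latest technologies
2. Work on things that interest you
  1. Entrepreneurship means becoming an expert in a domain
3. Work with friends (people you respect)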

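Users & Customers

You need to understand precisely the problem you are solving.
- You should be able to state the 'problem' in one sentence.
- Which industry does it belong to, and how big is that industry?
  You should be able to study the industry in detail. (2-3 months)
- When building the product, put yourself in the customer's shoes from every angle.
  - Become an expert in the industry.
Define your target customer segment.
Before writing code, storyboard the user experience.

Source: How to Start a Startup, Lecture Note 7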

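Focus on what a small number of users love.
- Users will then grow organically
- What should you do before launch? Build features for just 10 to 100 people.
Founders should meet customers in person.
Focus on your first customers.
New customers are like dating:
- First impressions matter, because people keep that feeling and talk about it later
- People have far less patience during their first interaction.
- That is why making a good first impression is so important for a product.
  e.g. the Vimeo, Wufoo, Heroku, Chocolat, hurl, MailChimp, and Stripe cases
Existing customers are like a spouse.
- Couples in successful long-term relationships fight well. (the biggest factor)
- Money: Cost / Billing
- Kids: Users' Clients
- Sex: Performance
- Time: Roadmap
- Others: Others

Source: How to Start a Startup, Lecture Note 1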

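SDD: Support Driven Development
- A way to build high-quality software
- Have everyone do customer support → feedback reaches developers and designers alike.
- Support is at its best when it comes from the people who built the product.
- Stonewalling is the worst behavior, and one of the most common startup mistakes.
Jared Spool: the amount of time team members are directly exposed to customers is directly proportional to how much the design improves. (You have to talk with customers in real time.)
That exposure should happen at least once every six weeks, for at least two hours.
Sending thank-you letters: possible because the team tried to stay humble (a kind of ritual that builds team cohesion and keeps the team working on things it cares about)
Focus on your most profitable users → manage those customers well and the rest will follow.
How do you get customers?
1. Start with people you know, online communities, local communities, email, the press, and so on.
2. Go where you can meet real customers
Get feedback from customers.
1. Rather than waiting for feedback to arrive, call them and visit them
2. Surveys: you will only hear from people who really love or really hate the product.
3. Measure how often customers come back.
4. If you can get customers to pay, you will get truly excellent feedback.
How do you get more users?
- Pick a single focus and concentrate on it all week.
- Understand one channel at a time.
- Creativity is the most important part
- Find the one thing nobody else is doing and take it to the extreme.
How to win over customers who already use a similar product
- Switching services has a cost, so you need a clear advantage or differentiator. One or two real differentiators beat 50 minor features.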

Metrics

Metrics: Focus on growth
- Total Registrations
- Active users
- Activity Levels
- Cohort Retention
- Revenue
- Net Promoter Score
  - How likely are you to recommend this product or service to a colleague or friend?
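To make the Net Promoter Score item concrete, here is a minimal sketch of the commonly used calculation: respondents answer the question on a 0-10 scale, 9-10 count as promoters, 0-6 as detractors, and the score is the percentage of promoters minus the percentage of detractors. The function name and the sample responses are my own, not from the lecture.

```python
def net_promoter_score(scores):
    """Return the NPS (between -100 and 100) for a list of 0-10 answers."""
    if not scores:
        raise ValueError("need at least one response")
    promoters = sum(1 for s in scores if s >= 9)   # answered 9 or 10
    detractors = sum(1 for s in scores if s <= 6)  # answered 0 through 6
    return 100.0 * (promoters - detractors) / len(scores)


# Hypothetical survey responses, for illustration only.
responses = [10, 9, 9, 8, 7, 6, 3, 10, 5, 9]
print(net_promoter_score(responses))
```

Answers of 7 and 8 (passives) count toward the total but toward neither group, which is why a product with many lukewarm users can still score close to zero.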
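Growth: the interaction between conversion rate and churn rate
- Raising the conversion rate by 1% and lowering the churn rate by 1% have the same effect on growth. The latter is usually easier and cheaper.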
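The conversion/churn claim is easy to check with a toy model. The sketch below assumes the simplest possible setup, where both rates are expressed as percentage points of the same user base per month, so net growth is just their difference. The function name and the numbers are illustrative, not from the lecture.

```python
def net_growth(conversion_pct, churn_pct):
    """Net monthly growth in percentage points of the user base.

    Simplifying assumption: conversion and churn are both measured
    against the same base, so they can be subtracted directly.
    """
    return conversion_pct - churn_pct


baseline = net_growth(5.0, 3.0)         # 2 points of net growth
more_conversion = net_growth(6.0, 3.0)  # +1 point of conversion
less_churn = net_growth(5.0, 2.0)       # -1 point of churn

print(baseline, more_conversion, less_churn)
```

Under this assumption the two improvements are interchangeable, which is why the cheaper one (usually reducing churn) is the better place to start.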

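Other

Once you understand the idea, the customers, and the industry, it is time to build the product.
1. MVP: Minimum Viable Product
2. Be clear about what the product does.
  (You should be able to say plainly, "this is what our product is.")
A product people love is one with a passionate customer base that wants not only the product but the company itself to succeed.
Three ways to dominate a market
- Best price: distribution
- Best product: R&D
- Best overall solution: customer intimacy
  This is the only approach that any company at any stage can use.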

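Q&A

Q. How do you build a product that a wide range of customers all love?

⇒ Early on, focus on your most passionate customers. By fitting the product to them you will eventually discover broader value. Build the basic features first, then make them shine.

Q. How do you balance building the product with everything else?

⇒ While you focus on the product, someone also has to talk to customers, so some division of labor is necessary. What matters is a continuous feedback loop.

Q. How do you make decisions when the team disagrees about the product?

⇒ Resolve it through customer support (prioritize features by how many support requests they generate).
Don't simply do whatever customers tell you; find the root reason behind the request and solve that. It can also help for each person to build their own version and test it.

Q. Has Pinterest's vision changed since the early days?
⇒ We learned later that you can discover surprising, unexpected things in people's collections. (How people actually use the product points in a different direction.)
Early on we were thrilled that anyone used the product at all. The funny thing is that your ambitions grow along with the company, so there is always a gap between where we are and where we should be. Objectively we have come a long way, yet the gap keeps widening.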

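Execution

Two questions about execution
1. Can you figure out what to do?
2. Can you get it done?
  - Focus
    - Pick only the two or three most important things and do those
    - Communication has to work well.
      - Share the direction every week so everyone can run toward detailed target numbers.
    - Identify the right performance metrics and keep them growing.
  - Intensity (working hard)
    - A startup is not a place for work-life balance.
    - A relentless operating rhythm and an obsession with quality
    - Move fast on every front
Any big task can be broken into small ones.
- Split the work into small projects.
When it comes to people, it is fine to rely on intuition.
The knowledge that makes a startup succeed is not knowledge about startups.
- Become an expert on your customers.
Gaming the system does not work in startups.
Every startup is extremely hard work.
Whether a startup will succeed cannot be predicted.
- It is as unpredictable as the future of an athlete or a mathematician.
- It depends on how tough and ambitious the founder is.
Pivoting
1. When the company is not growing
2. When users are not using the product
3. When the business does not make logical sense
"A good plan violently executed today is better than a perfect plan for tomorrow."
"Move fast and break things!"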

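The DoorDash (food delivery app) case

With nothing in place, they launched the service with just a landing page
- Early on, what matters is testing the idea, getting started, and checking whether people want it
"Deliberately do things that don't scale"
- This makes you an expert in your own business. For example, they emailed each customer individually for feedback.
In the beginning, just start, then check whether the product fits the market.
Summary
1. Test your hypothesis
2. Launch fast
3. Do things that don't scale
  You can scale once demand is confirmed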

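The Teespring (e-commerce platform) case

A startup's most fundamental advantage: it can do things that don't scale

Getting the first customers
There is no shortcut to acquiring customers, and it will probably be the hardest thing you do. The hardest part is building momentum. Worrying about return on time at this stage is pointless. Don't give the product away for free. (It undermines the product's value and is therefore unsustainable.)
Turning customers into champions
Give them a memorable experience so they spread the word (this is about how you talk to customers).
Nothing improves a product more than listening to real users.
Monitor social media and the press.
What matters is actually fixing problems. (Your most dissatisfied customer can sometimes become your most influential champion.)
Finding product/market fit
The first version you launch does not have to be capable of scaling. Speed matters.
An insight from experience: think only about the next order of magnitude of customers
Keep doing things that don't scale for as long as you can.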

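Growth

Maintaining momentum (the core of running a startup)
- Build a team that always wins
- Establish a shipping cadence (new features)
- Share metrics
Sales fix everything.
Facebook's "growth group" case
- It served as the company's growth engine

Source: How to Start a Startup, Lecture Note 4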

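Sustainable growth is what matters.
1. Sticky: existing users keep coming back
  - A good experience is essential.
  - You can see this through CLV (customer lifetime value) and cohort analysis.
  - Customers who keep using the product long after signing up → your core customers
2. Viral: customers recommend the product to the people around them
  - A good experience is essential, and so is a mechanism that lets word of mouth spread.
    1. Tell them what they can share.
    2. A built-in incentive → invite a friend and both sides get credit
3. Paid: earn money from revenue and keep growing
  - CLV > CAC (customer acquisition cost): you must get back more money than you spend.
  - The time it takes to recoup the spend matters. (about 3 months)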
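The two conditions for paid growth (CLV above CAC, and a short payback period) can be written down as a quick check. This is a sketch under my own simplifying assumptions: every customer contributes a constant margin each month, and the 3-month target from the notes is used as the default threshold. Function names and figures are illustrative.

```python
def payback_months(cac, monthly_margin_per_customer):
    """Months needed to earn back the cost of acquiring one customer."""
    if monthly_margin_per_customer <= 0:
        raise ValueError("monthly margin must be positive")
    return cac / monthly_margin_per_customer


def paid_growth_is_viable(clv, cac, monthly_margin_per_customer,
                          max_payback_months=3):
    """True when CLV exceeds CAC and the spend is recouped fast enough."""
    months = payback_months(cac, monthly_margin_per_customer)
    return clv > cac and months <= max_payback_months


# Hypothetical numbers: $90 to acquire a customer who brings in
# $30 of margin per month and $360 over their lifetime.
print(payback_months(90, 30))
print(paid_growth_is_viable(clv=360, cac=90, monthly_margin_per_customer=30))
```

A channel can satisfy CLV > CAC and still fail the second check: with a $300 CAC and the same $30 monthly margin, payback takes 10 months, which ties up far more cash than the 3-month target allows.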
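When building a business, you need to see the size of the whole market.
What matters most for growth?
⇒ A good product → customer retention (the core of growth)
If the retention curve flattens out parallel to the x-axis, you have built a product that fits its market.
If retention trends toward 0, first check whether the product fits the market at all.
What is a good retention percentage?
⇒ The retention level needed for success differs from business to business.
A startup should not have a separate growth team. The whole company should be the growth team.
It is hard for people doing the work to know what matters most to the company right now.
The "magic moment"
- The moment users experience an "aha"
- Examples
  - Facebook → seeing a friend's photo right after signing up (the moment you see what the social network is for)
  - WhatsApp → friends, eBay → item listings
- Work out when your product's magic moment is, and get users there as fast as possible
Is growth something you observe rather than design?
The customers to focus on are those on the margin. When working on growth, you don't need to worry about customers who already use the product heavily. This is the most important point about growth.
Build it well, and be able to bring customers in. (marketing)
Acquisition channels
1. Virality: break virality into three parts.
  - Payload: how many people a single viral send can reach
  - Conversion rate (CR): how often recipients convert
  - Frequency: how often you can reach them
  - Import → Send → How many People → Click → Sign up → Import …
    - Judge virality by the K-factor
2. SEO: search engines
  - Keywords: decide which keywords you will compete on.
  - PageRank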
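The import → send → click → sign up loop under Virality can be sketched as a K-factor calculation. The version below uses the common definition (new users generated per existing user per loop) and my own parameter names; the lecture notes do not give a formula, so treat the decomposition into invites, click rate, and signup rate as an assumption.

```python
def k_factor(invites_per_user, click_rate, signup_rate):
    """Average number of new users each user brings in per viral loop."""
    return invites_per_user * click_rate * signup_rate


def total_users(seed_users, k, cycles):
    """Cumulative users after a number of loops, starting from seed_users."""
    total = float(seed_users)
    new_users = float(seed_users)
    for _ in range(cycles):
        new_users *= k      # each cohort recruits k times its own size
        total += new_users
    return total


# Hypothetical loop: 10 invites sent, 25% clicked, 20% of clicks sign up.
k = k_factor(10, 0.25, 0.2)
print(k)
print(total_users(1000, k, 3))
```

With K below 1 each cohort recruits a smaller one and growth from virality alone tapers off; with K above 1 each loop is larger than the last.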

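The Facebook case

What the Facebook growth team did

Get new users to find 10 friends within 2 weeks of signing up
International expansion → a product that can scale at any time (slowly if necessary, but done properly)
- Focus first on the highest-priority languages

Q&A

Q. Email marketing

⇒ People under 25 hardly use email. Email, SMS, and app push notifications all work in much the same way: you must not end up in spam, so it is important to behave like a model sender. What matters is that the message actually gets delivered.
First decide which channel to send the notification through, then work on content that encourages engagement.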

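Business

Source: How to Start a Startup, Lecture Note 5

Always aim for monopoly and avoid competition.
What makes a business valuable?
⇒ It creates X amount of value and captures Y of that value. (X and Y are independent variables.)
There are only two kinds of businesses in the world: 'perfect competition' and 'monopoly'.
Monopolies typically lie to avoid regulation, while companies in perfectly competitive markets describe themselves as monopolies so they can claim to be number one.
Always ask whether an "intersection" market is real, whether it makes sense, and whether it is valuable.
How do you build a monopoly?
- Go after a small market, then expand outward from it.
  - Very small markets are undervalued.
Characteristics of monopolies
- Proprietary technology.
  - Unhappy companies are all alike; every happy company is different.
  - You need a breakthrough improvement on some core dimension.
- Software has zero marginal cost!
  - Economies of scale arise when fixed costs are high but marginal costs are low
- Network effects generally strengthen over time.
Be the last company in your field.
The most important part of a valuation is durability.
Think hard about why this business will last a long time.
Competition makes you better in your field. But if you never ask the bigger question of what really matters, you lose something greater.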

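Q&A

Q. How can you tell whether an idea will become a monopoly business?

⇒ Focus on the actual market.

Great companies, unlike others, made an advance that leapt a full level ahead.
Sometimes they proceed without taking input from other people, or even from customers.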

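Investment: Fundraising

You should be able to explain the business in one sentence.
Founders must be decisive. Whatever you do, keep things moving.
Go as far as you can without money. (Bootstrap)
- It is entirely possible to proceed without taking investment.
"The secret to success is being so good they can't ignore you."
Actually making money is harder than raising investment.

The relationship between money and risk
The onion theory of risk: you start with a long list of risks, and fundraising is the process of peeling those risks away one layer at a time with each round. Every milestone you hit removes a risk and moves the business forward.
Don't ask investors to sign an NDA (non-disclosure agreement).
Keep records.
Seed funding from a good firm → Series A: the odds keep improving this way.
- ⇒ Who is a good investor?
How should you negotiate an investment?
How much equity should you give up in seed funding?
- ⇒ 20-30%. Above 30% can create problems on the shareholder side, and founders lose motivation as their stake shrinks, so the structure matters. For seed rounds, 10-15%. … Investors often pass because a company's relationships with outside investors are too complicated.
What makes the best investments?
- ⇒ Google, Airbnb → because all three founders were comparably excellent
Every company has a CEO. When you start a company, find co-founders who are as good as you or better. Doing just that raises your chance of success astronomically.
Mark Zuckerberg, leading alone, is an exceptional case.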

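How to talk to investors

Source: How to Start a Startup, Lecture Note 19-2

The 30-second pitch: your first introduction of the company
- Three sentences are enough.
- First sentence: what the company does (explain it intuitively; the "mom test" is recommended)
- Next sentence: the size of the market (bottom-up analysis)
- Last sentence: growth rate (show that you are a fast-moving company)
The 2-minute pitch: for people who are already interested in your company (people prepare for 10 minutes, 30 minutes, or an hour, but 2 minutes is enough)
- Unique Insight: the "aha" moment; you should be able to say it in the first two sentences.
- How you make money: the business model; state clearly how you make, or will make, money
- Team: 1. Brag about what the team has accomplished (not degrees or awards), 2. Number of founders + how long you have worked together + whether everyone is full-time
- The Big Ask ($$$): the funding request; treat it seriously.
When to Fundraise
- Investors want to invest based on growth rate.
- The stage at which you are both strong and vulnerable is the time to ask for funding
- Don't let the investor hold all the power: never say "we can't do anything without your money."
How to set up investor meetings
- Ideal: another executive introduces your company on your behalf (a trusted introduction)
- Run things in parallel
  - Schedule all your investor meetings within the same week
- One team member should own fundraising
- After the meeting
  - Follow up
  - Do thorough due diligence on investors
  - Know when to stop
  - Build the company! (Raising money is not the goal.)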

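Q&A

Q. What kinds of teams and companies do you invest in?

⇒ It becomes clear within about the first minute of talking. Is this person a leader? Are they focused on their product (question: what led you to start this business)? Next comes communication ability → you need both leadership and strong communication skills.

⇒ Two general points.

Venture capital is a game of extreme outliers.
4,000 companies → only 200 get funded → 15% of those generate 97% of the returns
Do you invest in a company with great strengths, or one with no weaknesses?
A checklist alone cannot capture the strengths a team has.
If you rule out every company with flaws, you will rarely end up with one of the companies in that 15%.

Q. What about startups that need a lot of capital up front?

⇒ Going back to the onion theory of risk: you will need a more concrete plan for removing the risks.

Q. What makes a bad investor?

⇒ One who lacks a network or has nothing to offer in the way of help. Choosing an investor is like choosing a spouse, because you will lean on each other and move forward together for roughly 15-20 years.

Once a founder, always a founder.

When you meet an investor, ask yourself whether this is someone you respect and can learn a lot from.

The relationship with portfolio companies must be built on trust.

A VC's biggest constraint is ultimately its high opportunity cost. VCs try to invest in the best company in each category.

Q. How can a team with no product raise money?

⇒ The team matters. Investors fundamentally invest in the team anyway.

Q. Do you need to reduce your investment thesis to a single sentence?

⇒ You should have a thesis clear enough to state in one sentence. (the elevator pitch)

Q. Finding the right market vs. creating a market

⇒ A very hard problem. When testing a thesis, the question is whether this direction helps confirm that a market exists. You should be able to argue clearly that a market that does not exist today can be created, thanks to a contrarian insight grounded in evidence only you have.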

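Culture: Organization and Company Culture

What the founders do is the company's culture.
"Your beliefs become your thoughts, your thoughts become your words, your words become your actions, your actions become your habits, your habits become your values, your values become your destiny." (Gandhi)
Why do these matter?
⇒ First Principles (the first basis for decisions), Alignment, Stability, Trust, Exclusion (knowing what not to do matters more), Retention
Zappos: the first core value is to deliver "wow" through service; another core value is humility
Characteristics of high-performing teams
- Patrick Lencioni's pyramid: 1. Trust, 2. Conflict, 3. Commitment, 4. Accountability, 5. Results

Source: How to Start a Startup, Lecture Note 10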

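Take culture into account in the interview process.
- What makes or breaks a good culture is whether it is practiced habitually.
If culture is the way things get done, it has two components.
1. Behaviors (which change), 2. Something that does not change (core values)
You should have three or four core values of your own.
A company that settled its core values before hiring a single person! → Hiring the first engineer took 5-6 months → because that person determines the company's DNA. Diversity is respected, but people need to share similar values.
The company's mission
- ⇒ Giving people a sense of 'belonging' and building a community
Airbnb's core values
1. Champion the mission
  - Do you believe in what we do here?
  - We look for people with a calling
2. Be a "cereal" entrepreneur (creative)

These turn into the company's values and principles.

Problems with culture
1. Writing about culture is hard to find.
2. It is hard to quantify.
3. It only pays off in the long run.
Culture is like investing in the future. Ultimately it is what you need to build a company that lasts.

Be clear about what you stand for.
- Hire and fire based on these values.
- It is not ideal to have engineers run this part of hiring, because they evaluate mainly on skill
Overall company culture
Early team composition
Changes and adaptation as the company grows beyond that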

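Q&A

Q. What is culture? → What should a company's culture be?

⇒ A set of values?

Q. What is company culture?

⇒ Core values, behaviors or actions, and a sense of mission (goals)

Q. The Airbnb scenario

⇒ It is important to build a team so talented that it makes you uncomfortable. (great co-founders)
Once you have built the product, you have to build the company. → We wanted to build a company that would last. What long-lived companies have in common → a clear mission and clear values that let people work together.

Q. How do you get hosts to follow Airbnb's culture?

⇒ We thought hosts didn't need to fit the culture, but we were wrong (problems came up, and so on). Since then, the 'Superhost' program has rewarded hosts who follow the culture.

Q. What did you consider most important while building the company?

⇒ Who do we hire? What do the people we hire value? What do we do every day, and why? What do we choose to communicate? And finally, what do we celebrate?

⇒ More emphasis on internal transparency, so that every employee can support and believe in what the company is doing. (It helps when people can easily access information and see where things stand.)

⇒ Establishing a culture solves many kinds of problems, because as headcount grows, the direct influence you have on each person inevitably shrinks.
Bringing in the first 10 people requires great care, because you also have to consider their influence on the 90 people they will bring in.

Q. What did you do to spread the right culture after the first hires?

⇒ Keep reminding employees of the long-term goal, because someone handed a problem tends to think that problem is the whole world. After spending a lot of time on hiring, we work to improve the first 30 days on the job. (For example: do you know any colleagues by name? Do you know who your manager is? Are you meeting your teammates? Do you understand the company's overall structure? Do you know the top priorities? We keep developing and evaluating programs like these.)

⇒ 1. Get new employees busy on real work right away. That is the only way to surface real problems and see how much progress has been made. We try to onboard people intensively. 2. Give feedback as quickly as possible, and feedback about cultural fit even faster. (One part of Stripe's culture is communicating in writing.)

Q. How did Stripe scale its transparency?

⇒ We defined a startup as an organization that isn't caught up in politics. In a large company many things are entangled, and what is best for the product can be a problem at the company level. In a startup where everyone is heading in the same direction, you can share all information. In the early days, Stripe CC'd every employee on every email that was sent. The thinking was that if people already know what is going on, you don't need separate meetings. It is true, though, that this had trouble scaling.

⇒ First, we changed the tools; second, we evolved the culture to match the scale.
Tools: sharing every email → now sharing information in a weekly digest (a deck, for example)
Culture: a huge amount of information is open internally, and there are internal norms that govern it.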

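Team: Colleagues & Hiring

Hiring

Be proud of how much you get done with a small team. Early on, it is better not to hire until you hit a limit.
The Airbnb case
- Set the hiring bar extremely high and hired slowly, so that everyone felt a sense of mission about the company's work
How to get the best people
- Smart people know they should get on a rocket ship. (Hence the need for a great product)
- Time spent on hiring: 0% or 25% (either none or the maximum)
- Mediocre people won't do in a startup; they can kill the company. (Could I bet the company's future on this one person?)
- Early on, recruit people you already know
- Three things to look for when hiring
  1. Are they smart?
  2. Do they get things done?
  3. Do I want to spend a lot of time around them?
  4. (Recommended) Work on a project together instead of interviewing
    1. (If you do interview) Ask in depth about the projects they have worked on
    2. Check references thoroughly.
- Other important points
  - Communication skills
  - People who like a certain amount of risk
  - People who are obsessed with startups
  - Mark Zuckerberg's criteria
    - Someone you enjoy spending time with
    - Someone you would be comfortable reporting to if the roles were reversed
- Get the best terms you can when negotiating with investors, but be generous with employees (equity and so on)
Make sure employees feel happy and respected.
- The importance of equity distribution
- Give the team all the credit for the good things that happen, and take all the responsibility for the bad.
- Keys to motivation
  - Autonomy
  - A sense of growth
  - A clear purpose for the work
Fire fast.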

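Q&A

Q. What did you look for in early team members from a culture perspective?

⇒ People we wanted to work with who were also talented.
People think of culture as something you construct, but to me it is more like tending a garden.
Early on we looked for people similar to the founders. (They were also quite quirky.)
We saw that creative, quirky people, interested in many fields yet world-class in one, build great products and are also great collaborators.

We are people who want to build something great. Given the conditions early on, people joined for pure reasons only.
As for where hiring happens: outstanding people are probably busy doing something else, and you can meet them in all kinds of places. You have to go and find them.

⇒ Recruiting was a long process of persuading friends of friends over an extended period.
Look for strong people who are not yet well known and are undervalued
Just as with an elevator pitch, you have to keep pitching to candidates.

⇒ The first 10 people were deeply genuine and honest: people others want to work with, who inspire trust, and who are intellectually honest in how they approach problems
It is obviously far more interesting to work with someone who has spent two years digging into a particular field; people who care about even the smallest details, and people who finish what they start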

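Q. How do you recognize outstanding people?

⇒ You can't be 100% sure until you have worked together.
Talent falls into two categories.

Cases where you know in advance what skills the job requires
Cases where you don't → this is the hard one; here, before talking to any candidate, first think about what world-class means in the field you need

I made a habit of talking with people who are world-class authorities in the field and asking what they look for. That also taught me which questions to ask.
Interview questions should tell you whether this person is a good fit to come and work with you.
(For example, the clever puzzles Google poses to people who enjoy solving problems)

Be honest about why this is a good idea, but also lay out in painful detail what is going to be hard. (When recruiting for the iPhone, candidates reportedly weren't even told what they would be working on: "You won't see your family for three years, but when it's done, your children's children will remember what you built.")

References matter too, because you are asking people with firsthand experience. (For example, in the interview: "We all know Jonathan, and I'll be talking to him in a few weeks. If I asked him what you are best at, what you are proudest of, and what you are trying to improve, what do you think he would say?")

Then turn soft-sounding questions into more quantitative ones and track the answers over time. (For example: "Among the people you have worked with, is this person in the top 1% on this dimension? Top 5%? Top 10%?") Forcing a relative ranking produces more objective assessments. "They're pretty good" is not very helpful.

⇒ You need the confidence to run interviews your own way. When you don't know better you tend to copy well-known methods, but figuring out your own method works better. (For example, engineers → instead of a coding test, have them code next to you and watch; business roles → talk through projects: how they would improve an existing project and what new project they would like to do)

⇒ For the first 10 people, work with candidates as much as possible before hiring. We worked with every one of them for at least a week. One more point: people don't appreciate how important the first 10 hires and culture are until they experience it themselves.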

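Q. As the company grows from 10 to 1,000 people, how do hiring and team management change?

⇒ On the management side, let each team feel as independent as possible within the limits of belonging to a larger organization, so it can respond nimbly. Over time, make the company feel like a startup made up of many startups (rather than a huge company run on standardized processes).

One goal is to give each team control over the resources it needs to deliver, and to make sure it knows what matters most and how to measure it. Once those are in place, team management becomes reasonably tractable.

We want problems to be solvable within each group. This makes things harder, but it is central to Pinterest's philosophy of building products. Bring together people from different disciplines and they each have different interests. The job is to unite them around one project and remove the obstacles so they can do their best work.

As for hiring, as the company grows you can reach more people through your network. Our 15th employee had worked at both startups and large companies, and that experience helped a lot.

⇒ Your time horizon changes dramatically. Early on you look at a one-month roadmap; later you look one year out, and eventually five. You have to plan and prepare ahead.
Early on you will hire mainly for immediate productivity. After that, you need to hire with a long-term view.
Think about how to solve problems through systems.

Q. How do you persuade people to join a startup?

⇒ I think uncertainty is exactly why startups resonate with people. Wouldn't it be boring if success were guaranteed? Another important motivation is personal growth.

⇒ Tell them what will be hard and what your best plan is, and explain why their role is essential. What I am against is glossing over all of this. For example, to find out whether someone really wants to solve this problem, ask which other companies they applied to. If they just want to join a good company rather than work on the problem, you will usually get a list of famous names. Those are people who join for the experience, not to achieve the goal.

Stripe hired four early Stripe users. They were people who could not have been found any other way, and working on a product they loved was itself part of the appeal.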

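Q. I'd like more detail on how to manage people.

⇒ Hold one-on-ones at least once every two weeks. The employee, not the manager, should set the agenda. Once real trust has built up, you can stretch this to about once a month.

Q. When firing or demoting an executive, how do you talk to the person and explain it to others?

⇒ The most important thing when firing someone is to be honest. Being emotional or sentimental is not the right approach. Admit that there was a hiring failure and identify the problem. (a good starting point for fixing it)
You can take someone's job, and sometimes you must. But you must not take their dignity. (What you say will become that person's reputation, and you should be able to acknowledge the company's own failings clearly.)

Q. How do you win over someone who has been opposing you?

⇒ As the leader, you have to offer a better direction. Your way literally has to be better.

Q. Tips for seeing things from other people's point of view

⇒ It is hard in everyday life and harder in business.
If you don't yet have a process in place, pause and think. An important quality of a leader is knowing when to pause. If something is important and you haven't thought it through yet, say so honestly: "I think this is an important issue, and I want to consider all the different perspectives before I answer."
*The kimchi problem: a small emotional issue that ends up burning down the whole forest (the deeper you bury it, the more it ferments??)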

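Founder: CEO & Co-founders

Source: How to Start a Startup, Lecture Note 13

The CEO's job
1. Set the vision
2. Raise money
3. Evangelize
4. Hire & manage
5. Set the bar for execution
In reality, a founder is the person who absorbs problems coming from every direction
How can I tell whether I'm a great founder?
1. Team
  - Two or three co-founders are better than a solo founder (a very wide range of skills is required, and two or three people can cover more of it)
  - The importance of trust
2. Location
  - Great founders always seek out the network of people who can help solve their problems and tasks
    - Silicon Valley is not the right place for every industry
3. Contrarian thinking
  - A good test: would smart people look at this and think it's crazy?
    - Also consider why smart people disagree with you
    - What do I know that others don't?
4. Doing the work yourself vs. delegating
  - Simply put, you have to do both. (depending on the situation)
5. Flexibility vs. persistence
  - Again both, depending on the situation
  - As a company, you need an investment thesis so that you can make choices against it.
6. Confidence vs. caution
  - Hold on to your conviction while staying open to other people's objections.
    (Stay confident, but understand the risks fully.)
7. Introverted vs. extroverted
  - Again both; great founders move freely across this line.
8. Vision vs. data
  - Data is only meaningful within the vision you are designing.
  - Data can change your direction, but you still need a clear vision.
9. Taking risks vs. minimizing risk
  - A founder must always calculate the risks and be able to bet boldly.
  - When taking on risk, reduce it intelligently.
    - Always look for ways to minimize risk while maximizing impact.
10. Short term vs. long term
  - Naturally, you have to consider both.
Starting a company is like jumping off a cliff and assembling an airplane on the way down.
Co-founders
- Conflict between co-founders is the leading cause of startup failure
- Picking a co-founder you haven't known long, or picking one casually, is a recipe for disaster
  - Meeting in college is ideal; failing that, find good people at a company where you work
  - Founding with someone you have known for a long time is best
- The 'relentlessly resourceful' type
  - Someone like James Bond (007) rather than a specialist in one field
  - Someone tough but calm
- About two or three people is ideal
- Q. How should equity be split among co-founders?
  ⇒ Settle it soon after you start working together. A roughly equal split is ideal
- Q. What if the relationship between co-founders breaks down?
  ⇒ You need a safeguard for this (conditional equity allocation, i.e. vesting)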

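Management

Building a company is as hard as building a great product, fundamentally because people are irrational.
Building a company is like building an "engine": you need to be able to build a very high-performance machine that runs without trouble.
A manager's output = the output of their organization + the output of the neighboring teams they influence
Problems need to be triaged.
A cold gets better on its own with time, so it doesn't deserve much effort.
Editing: the editor is the best metaphor for the founder's role.
- Make work "clear" and "simple" for every team member
- The more you simplify, the more productivity rises.
Ask clarifying questions.
Allocate resources.
- Most of the people you work with should be proposing their own plans. (bottom-up)
- Keep an eye on how much red ink you are using.
Keep the organization's voice consistent.
- The Economist: its articles read as though one person wrote them all.
Delegating
- The catch with delegation is that you (the founder) are still ultimately responsible for everything
- How to balance delegation and responsibility
  - Task-relevant maturity: delegate more to people who have done the task many times
    - A manager should not stick to a single management style; use the style that suits each employee.
  - Conviction about the decision × expected impact of the outcome (a 2 × 2)
    - If your conviction is low and the impact is also low, delegate completely.
    - If your conviction is high and the impact is high, do it yourself and do it properly.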
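The conviction × impact grid can be written as a tiny decision helper. The notes only spell out two of the four quadrants, so the sketch below returns an explicit "not specified" answer for the mixed cases rather than guessing; the function name and return strings are my own.

```python
def delegation_advice(conviction_high: bool, impact_high: bool) -> str:
    """Map the 2 x 2 (conviction, impact) grid to the advice in the notes."""
    if not conviction_high and not impact_high:
        return "delegate completely"
    if conviction_high and impact_high:
        return "do it yourself"
    # The notes do not cover the two mixed quadrants.
    return "judgment call (not specified in the notes)"


for conviction in (False, True):
    for impact in (False, True):
        print(conviction, impact, "->", delegation_advice(conviction, impact))
```

In practice the mixed quadrants are where task-relevant maturity matters most: how much the person has done this kind of work before decides how far you let go.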

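Source: How to Start a Startup, Lecture Note 14

Building the team
- The barrels-and-ammunition analogy: most people you hire are ammunition, and only a few are the barrels that fire it.
  How much ammunition you can fire at once is determined by the number of barrels.
  A barrel is someone who can take an idea from conception all the way to delivering a real product to customers, rallying people along the way. This takes considerable cultural skill.
- To develop more barrels, gradually expand everyone's scope of responsibility until it breaks. Note the point of failure and assign responsibility at that level.
- In a flat organization, if there is someone colleagues keep going to, that person is probably helping those around them. In other words, they are probably close to being a barrel.
Scaling: when and how to scale
- Every company has a different growth curve.
Insist on Focus: have each person focus on just one thing
- Peter Thiel's approach: people tend to tackle the visible, easily solved tasks first → so nobody spends 100% of their time properly solving the real problem.
Metrics & Transparency
- Building a dashboard is recommended.
  - The founder should build the first dashboard.
  - The dashboard should be intuitive and clear enough that everyone can see what matters.
- Transparency
  - Every employee should be able to know everything that is going on in the company.
  - Share the board decks with all employees
  - Share every meeting with the whole company
  - All conference room walls are glass.
  - Steve Jobs at NeXT → transparent compensation
- Metrics
  - Don't focus on a single factor; also measure the factor it trades off against. + the level of the person accountable
Analyze anomalies you can't explain. → They can be an opportunity to open up a new market
Details Matter
- "If you get every detail right, you don't have to worry about starting a million-dollar business, generating a million dollars in revenue, or reaching a million users."
- If the whole organization does everything right, you end up with a top-class team.
- The office environment, where employees work and practically live every day, shapes the company's culture, how people make decisions, and how much effort they put in. Don't hand these details off to someone else; handle them yourself.
One Management Concept: when making a critical decision, you need to understand how everyone affected sees it. In other words, you need to be able to look at the problem from the perspective of the whole company.
1. Demotions
  - Firing vs. demotion
    - Offering options that include demotion, rather than simply firing, is likely better for the individual, for the decision-maker, and for the company.
    - After a demotion, can the person regain others' trust, and will they lose motivation?
    - This defines what happens when an employee in a highly paid position fails to do the job.
2. Raises
  - When a strong employee asks for a raise
    - The employee has probably gone through a lot before bringing up the request
    - You also have to consider the employees who don't ask for raises.
    - Process (principle) matters. → It is a way to protect the culture
3. Sam Altman blog post
  - A proposal on how stock options are exercised (a 10-year exercise window)
    - Today: options must be exercised within 90 days of leaving, and exercising requires cash
    - The 90-day window was chosen because uncertainty about the stock price made the cost impossible to calculate.
    - Cultural Statements (two alternatives)
      1. Be honest with employees: offer terms that allow exercise even 10 years later
      2. Tell them the facts: you will be granted stock options, but it is a right you only benefit from if you stay and help take the company public → this says something about the kind of people you want to hire.
4. History's greatest practitioner
  - Example: Toussaint Louverture
    - He had to fight and defeat the enemies around him
      1. The perspective of his own army
      2. The perspective of the enemy
      3. The perspective of the country's culture

        ⇒ No pillaging, no rape, no affairs among officers (because he was thinking about the long-term culture)

        ⇒ After winning battles, he employed capable officers from the enemy side (because he wanted expertise and a more advanced culture)
    - Dealing with the slave owners
      1. The slaves' perspective (kill the slave owners)
      2. Toussaint's perspective (his own experience as a former slave, plus an understanding of the sugar economy)
      3. The slave owners' perspective (how they ran their business)

        ⇒ Slavery abolished; the owners kept their land but had to pay wages; forced labor was banned (again because he wanted a more advanced culture)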

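Conclusion: the most important thing you can learn, and the hardest part of being CEO, is training yourself to see the company through the eyes of your employees and partners: the people who are not in the room and not talking to you.

Q&A

Q. When should you hire a lot of ammunition?

⇒ For engineers, about 10-20 people seems sufficient, and I think it is right to hire ammunition only once the barrels are ready. Designers are a bit different.
X (total output) / Y (number of team members) → tell managers that this is the number their performance will be evaluated on.
Then Y will not grow.

Q. How do you balance delegation, responsibility, and attention to detail?

⇒ The philosophy of getting details right is very important early in a company, because that is when standards are set.
That way, people who join later can work to the same standard. Culture is ultimately the framework of thinking that drives decisions.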

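User Interview

Emmett Shear, Twitch CEO
Who you talk to, and what you ask and learn, matters enormously.
At Twitch, feedback from viewers and from broadcasters is completely different.
Case: a lecture-focused note-taking app
1. Who should you talk to in order to learn about what you are building?
  - Examples
    - Students who actually attend lectures
      - People who take online courses at home instead of going to a cram school
      - People taking different subjects
      - Group them by study method and meet each group
      - ⇒ Students don't spend much money. Planning to sell to university IT departments or to parents may be the better direction.
    - Teachers and instructors who give the lectures
    - Editors who cut lectures into video
  - If you have a big idea, think about the broadest group of users you can imagine.
2. Use interviews to decide what to build.
  - Ask follow-up question after follow-up question, digging into more and more detail.
  - In the first interviews, stay as far from features as possible and focus only on the problem.
    - As you talk about the problem with many different people, you find where the big obstacles are. That can become the core of the product.
  - After interviewing 6-8 people in one segment, you have usually learned most of what there is to learn.
3. Come up with features based on the interviews.
  - Are these features enough? Would people switch to our product after seeing them?
    1. Build it yourself and release it to the world.
    2. Since that takes time, use prototyping.
      - Approaching people with "here's a feature I came up with" won't get you good feedback.
      - The minimum you need to evaluate a feature: bolt it onto another product and see whether people use it
      - The money test tells you whether people will actually pay for the product.
The Twitch case
- Looking at some of the feedback: the fact that people used the product despite those issues is highly significant.
- Feedback from users of other products shows a completely different pattern.
- Feedback from broadcasters, from broadcasters using competing products, and from general users
- In game streaming, most people are general users, which means that is the bigger market.
- Identify the problems of the people you interviewed, build the product, and pitch it to them!
Making decisions based on data is good, but data does not show you the product's direction.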

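Q&A

Q. What mistakes do startups make when interviewing?

⇒ Don't show the product. Showing the product is the same as telling people about features.
You should find out what is already in people's heads, not put new things in.
Startups often interview whoever is available rather than the people they really need to talk to. You need a process for working out who the actual users are.

Q. How do you convince other people in the company?

⇒ Record the interviews. If you want to argue that something should be built, just play the recording.

Q. What tools did you use for the interviews?

⇒ Use a tool like Skype rather than email. The most interesting points come from "That's interesting, tell me more." The key insights come at unplanned moments. (That is why interactive feedback matters.)

Q. What about interviews in global markets (users who speak other languages)?

⇒ For the Korean market, we tried interviewing through an interpreter, but it was hard to find interviewees who were representative of real users.

Q. What channels and incentives did you use to find interviewees?

⇒ At Twitch the channel was the on-site messaging system. We didn't need to pay people to talk to us. People who are really feeling a problem will usually tell you what they want without any reward.

Q. Do you have user feedback tools on the site itself?

⇒ That is a second, also important, kind of user feedback. We had none on the site because we were focused on discovering problems.

Q. With limited resources, which single customer segment should you focus on?

⇒ We focused on people using competing products. They were already interested in the behavior we were asking for, and we figured they would switch if we met their needs. The company's situation also demanded fast growth.

Q. How did you deal with game publishers?

⇒ Nobody wanted to talk to us. As more users adopted the product, we naturally became important to publishers, and the conversations started.
The product keeps changing, and it is important to talk to the right people at each stage.

Q. How can users give good feedback?

⇒ I'd like them to talk about the real problem on their mind. I want them to ramble and just say whatever comes to mind. The more context you have about a person, the better you understand what they want.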

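Legal & Accounting: Finance Basics

Source: How to Start a Startup, Lecture Note 18

Raising money, hiring employees, signing contracts
Delaware Corporation
- Am I doing this as an individual, or on behalf of the company, which is a separate legal entity?
Equity allocation
- Equity
- Equity split among co-founders → every founder received an equal share
- Look only at the future, not the past
Stock issuance
- 83(b) Election
- Sign the paperwork and keep proof of it
Vesting
- Silicon Valley standard: 4 years with a 1-year cliff. You own 25% after 1 year and the rest gradually
- Why use it? → It covers the case where a founder leaves, and it provides the incentive to keep working for a long time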
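The 4-year vesting schedule with a 1-year cliff can be sketched as a small function. The notes only say that the remainder vests "gradually" after the cliff; the sketch assumes equal monthly increments, which is a common arrangement but an assumption on my part. The function and parameter names are illustrative.

```python
def vested_fraction(months_worked, total_months=48, cliff_months=12):
    """Fraction of the grant vested after a number of full months.

    Assumption: nothing vests before the cliff, 25% vests at the
    12-month cliff, and the rest vests in equal monthly steps.
    """
    if months_worked < cliff_months:
        return 0.0
    return min(months_worked, total_months) / total_months


for months in (6, 12, 24, 48):
    print(months, vested_fraction(months))
```

A founder who leaves after 6 months keeps nothing under this schedule, while one who leaves after 2 years keeps half, which is exactly the protection for the remaining team that the bullet above describes.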
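Fundraising
- Valuation cap
- Example: $5 million cap, $100,000 invested → if the company is later valued at $20 million, the investor can buy shares at 25 cents each
- Make sure people understand that their shares may be diluted in the future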
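The valuation cap example works out as follows. The sketch assumes the later round prices shares at $1.00 each (a figure I am supplying to make the arithmetic land on 25 cents; the notes do not state it) and ignores discounts and pre-money versus post-money details. Function and parameter names are my own.

```python
def capped_conversion_price(round_price_per_share, valuation_cap,
                            round_valuation):
    """Price per share at which a capped investment converts.

    If the round valuation is at or below the cap, the investor pays
    the round price; otherwise the price is scaled down by
    cap / valuation.
    """
    if round_valuation <= valuation_cap:
        return round_price_per_share
    return round_price_per_share * valuation_cap / round_valuation


# Numbers from the example: $5M cap, $100k invested, $20M valuation.
# The $1.00 round price is an assumption for illustration.
price = capped_conversion_price(1.00, 5_000_000, 20_000_000)
shares = 100_000 / price
print(price, shares)
```

Because the cap is a quarter of the round valuation, the early investor receives four times as many shares per dollar as an investor buying into the priced round.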
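Investor Requests
- Board seat: generally best to decline. That said, someone who can provide real strategy and direction is something money can't buy
- Advisor: advisors rarely give useful advice, and anyone who has invested should be helping anyway.
- Pro-rata rights: the right to keep one's ownership percentage by buying additional shares in future rounds
- Information rights: YC recommends sending investors an update once a month
Company Expenses
- Office rent, hiring employees, and so on
- "If I had to explain every line item, is there even one I would be embarrassed by?"
- Keep all receipts
Employing founders
- It is illegal for founders to work without pay.
- Use a payroll service
- Problems can arise when a founder leaves
Hiring employees
- The difference between employees and contractors
- Workers' insurance, …
- Here too, use a service rather than doing the paperwork yourself
Firing employees
- Do it as quickly and clearly as possible
Legitimacy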

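Sales and Marketing

Sales → once you start a company, the founder ends up doing it
- Requires an understanding of the product and the industry domain

Source: How to Start a Startup, Lecture Note 19-1

Almighty Funnel: 1. Prospecting, 2. Conversations, 3. Closing, 4. Revenue / Promised Land

Source: http://egloos.zum.com/maniac-blogging/v/5197166

Tactics that worked at each stage
1. Prospecting
  - The technology adoption lifecycle
    - Early adopters = potential customers (about 2.5%)
    - In other words, only about 2.5% of customers will take your call and consider it
  - Top 3: Your network, conferences, cold emails
    - Conferences: a place to meet early customers
2. Conversations
  - Once you have them on the phone, get them talking!
    - The top 1% of salespeople let the customer talk 70% of the time and talk only about 30% themselves. In particular, they ask questions to properly understand the problem
  - Follow up after the call
  - Email (no response) → email × n
  - Time is a startup's key asset → learn to tell quickly, early on, whether a prospect is likely to buy.
3. Closing
  - Standard contract templates
  - Don't lose time obsessing over minor points.
  - "I like the product, but I wish it had this feature" → there is no guarantee they will use it even if you add the feature.
    1. Sign the contract on the condition that you add the feature
    2. Decide whether to add the feature based on what other customers say as well
  - Free trials: asking for the purchase after a free trial means selling the product all over again. If someone asks for a free trial, say: "We usually sign annual contracts, but just this once, you can cancel within the first 30 or 60 days if you're not happy with the product."
4. Revenue / Promised Land
  - Five ways to build a business: check which one yours belongs to
    - $10 × 10,000k customers: Marketing
    - $100 × 1,000k customers: Marketing
    - $1,000 × 100k customers: Marketing
    - $10,000 × 10k customers: Inside Sales
    - $100,000 × 1k customers: Field Sales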
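Multiplying out the five tiers above shows what they have in common: each combination of price and customer count produces the same total revenue. The sketch below does that arithmetic; the tier data comes from the list, while the variable and function names are my own.

```python
# (price per customer in dollars, number of customers, sales motion)
TIERS = [
    (10, 10_000_000, "Marketing"),
    (100, 1_000_000, "Marketing"),
    (1_000, 100_000, "Marketing"),
    (10_000, 10_000, "Inside Sales"),
    (100_000, 1_000, "Field Sales"),
]


def tier_revenue(price, customers):
    """Total revenue for one tier: price times number of customers."""
    return price * customers


for price, customers, motion in TIERS:
    print(f"${price:,} x {customers:,} customers = "
          f"${tier_revenue(price, customers):,} ({motion})")
```

Every row comes to $100 million, so the list is really a menu of routes to the same revenue: the higher the price per customer, the fewer customers you need and the more hands-on the sales motion becomes.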

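B2B: Enterprise Software

Aaron Levie, Box CEO
Always watch for the technology factors that are changing.
Box: focused on file sharing.
At the time, consumer products kept getting more features while enterprise products failed to meet requirements such as security
Market size for consumer vs. enterprise → the value equation is different
(Consumers try to spend as little as possible; enterprises care more about efficiency and effectiveness than about price)
On-Premise ↔ Cloud
Being able to serve companies of any size, from 2 people to 300,000, means the addressable market expands.
When technology disruption happens in the enterprise, there are exactly two kinds of opportunity
1. When the raw materials change → e.g. the cost of computing falls
2. When customers need a new experience from a company's products → e.g. Uber in transportation
  ⇒ This suggests that now is a good time to start a vertical software company targeting a single industry → because this applies to every sector
Companies will keep pushing to work smarter, more efficiently, and more effectively.
Advice from a founder's perspective
1. Spot technology changes: big, fundamental shifts in trends
  - Look closely and you will find that the technologies taking hold now were already attempted years ago. (They failed then because the environment couldn't support them.)
2. Start intentionally small: your product should be a wedge that can slip in between existing products.
3. Find asymmetries: do what incumbents couldn't or wouldn't do.
4. Find the outliers: make use of your product's early adopters.
5. Listen to customers, but don't always give them what they ask for. Listen to the request, interpret it, and build something truly good.
6. Modularize instead of customizing
7. Focus on the user: put consumer DNA into enterprise software.
8. Make the product capable of spreading virally: fundamentally, the product has to be at the center.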

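Later-Stage Advice

Source: How to Start a Startup, Lecture Note 20

Things that become important 12-24 months after founding

Management

Once you reach about 25 employees, there will come a moment when you suddenly feel the lack of structure
What you need: every employee knows who their manager is, each employee has exactly one manager, and every manager knows who their direct reports are.
A clear reporting structure is important
Don't try to innovate on management structure. The product is what you should innovate on.
Your job shifts from building a great product to building a great company.

Source: How to Start a Startup, Lecture Note 20

Failure cases
1. Being afraid to hire senior people
2. Hero mode (setting the example)
  - "I'm going to take vacation and hire more people. We are going to grow this much, so we will hire accordingly."
3. Bad delegation
  - Bad: "We have something big to do. Please research it, and then I'll decide and proceed."
  - Good: "We have something big to do. I trust you. Research it and tell me what you decide."
4. Personal organization
  - Tracking personal productivity and product development is essential.
  - It is also important simply to write down what you are doing and why. (the 'how' and the 'why')
    - This becomes the company's norms.
    - Why → cultural values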

HR

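Clear structure: performance feedback is important
As the company grows, you may need compensation bands.
Equity: give employees a lot of stock
- Over the next 10 years, distribute 3-5% to the company's employees
- Personal recommendation: 6-year vesting
- The need for an option management system
Once you have more than 50 employees
- Training on sexual harassment and diversity
- Watch for burnout → because it is now a marathon
- Hiring process
  - Before making a hire, announce the candidate internally, for example on an internal mailing list
- A program for growing headcount
- Team diversity
- Managing early employees
  - The company usually grows faster than its employees do
  - Talk to them directly and honestly → "What do early employees want to do as the company grows?"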

Company Productivity

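Not something to worry about in the early days.
As headcount grows, productivity falls with the square of the number of employees.
One important word: Alignment
- Everyone is on the same page and looking in the same direction
One way to get there is a clear roadmap and clear goals
- If you ask three randomly chosen employees what the company's top three goals are, they should be able to answer
- If people understand the framework behind decisions, they will make the same decisions.
Be driven by product rather than process, and ship every day.
Communication: transparency and rhythm
- Weekly management meetings
- A monthly all-hands meeting to share the roadmap and vision
- Offsites (workshops)
The goal is to build a company that creates value over a long period
- Founders can drive innovation in one product, but you must be able to innovate again on the next one.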

Mechanics

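Accounting, finance, and so on
FF Stock (Founder's Fund) in the B round
Intellectual property, trademarks, patents
- Provisional patents: about 11 months after product launch is a reasonable time.
- Trademarks, domains, and so on
FP&A: financial modeling
Tax structuring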

Your Psychology

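The intensity will only keep increasing.
If you succeed, you will get a lot of haters. Ignore them.
Take a long-term view.
You may experience severe burnout.
Focus
- Don't entertain acquisition interest. → It is a way to kill the company
Startups die when the founders give up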

Marketing & PR

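Once the product starts selling, begin paying a little attention to this.
Founders must find the key message themselves. → It is about deciding what the company's message is
Build personal relationships with key journalists (you can get your point across more directly than by hiring an agency)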

Deals

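Build a good product
Develop personal relationships
Competitive dynamics
A founder's persistence should go to the point of being uncomfortable
If you want something, ask for it
The curve that startups go through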

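Q&A collection

Q. On diversity

⇒ What you want is diversity of background, not diversity of vision, because uniform backgrounds lead to a monoculture.

Q. How to increase productivity at a personal level

⇒ Keep a sheet of paper listing the goals to accomplish over the next 3 to 12 months, and look at it every day.
Separately, keep a short-term list and a list for every person (what they work on, conversations you have had, other information)

Q. How a startup can fail well

⇒ If you are failing, tell your investors, and do not let the money run out completely. Act quickly (toward employees and investors alike).

Q. The number of immigrant founders in YC

⇒ Around 41%??

Q. When is the right time to hire a professional CEO?

⇒ Never. The founders should keep running the company; from the standpoint of building a good product, it has to be that way.

Q. What criteria does YC use to select startups, and have they changed over time?

⇒ Good founders and good ideas. The criteria have not changed.

Q. How to approach a market you think is good but do not yet know much about

⇒ 1) Just jump in and learn as you go.

⇒ 2) Work at a company in that field, or do anything at all in that market for a year or two.
I slightly recommend the latter. But if you are truly committed to learning about your users, it does not matter much.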

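In closing

Putting these notes together gave me the chance to go back over the material several times. I think most of it still applies today, roughly seven years later. You can also feel the founders' insight in each lecture. Startups can grow and scale in many different ways, but 'building a great product' is probably the most fundamental one.

Someday I hope to build a great product and a great company, and to share insights the way these lectures do.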

Offsites

PAIRED: A New Multi-agent Approach for Adversarial Environment Generation

Post date March 5, 2021

Posted by Natasha Jaques, Google Research and Michael Dennis, UC Berkeley

The effectiveness of any machine learning method is critically dependent on its training data. In the case of reinforcement learning (RL), one can rely either on limited data collected by an agent interacting with the real world, or a simulated training environment that can be used to collect as much data as needed. This latter method of training in simulation is increasingly popular, but it has a problem: the RL agent can learn what is built into the simulator, but tends to be bad at generalizing to tasks that are even slightly different from the ones simulated. And obviously building a simulator that covers all the complexity of the real world is extremely challenging.

An approach to address this is to automatically create more diverse training environments by randomizing all the parameters of the simulator, a process called domain randomization (DR). However, DR can fail even in very simple environments. For example, in the animation below, the blue agent is trying to navigate to the green goal. The left panel shows an environment created with DR where the positions of the obstacles and goal have been randomized. Many of these DR environments were used to train the agent, which was then transferred to the simple Four Rooms environment in the middle panel. Notice that the agent can’t find the goal. This is because it has not learned to walk around walls. Even though the wall configuration from the Four Rooms example could have been generated randomly in the DR training phase, it’s unlikely. As a result, the agent has not spent enough time training on walls similar to the Four Rooms structure, and is unable to reach the goal.

Domain randomization (left) does not effectively prepare an agent to transfer to previously unseen environments, such as the Four Rooms scenario (middle). To address this, a minimax adversary is used to construct previously unseen environments (right), but can result in creating situations that are impossible to solve.

Instead of just randomizing the environment parameters, one could train a second RL agent to learn how to set the environment parameters. This minimax adversary can be trained to minimize the performance of the first RL agent by finding and exploiting weaknesses in its policy, e.g. building wall configurations it has not encountered before. But again there is a problem. The right panel shows an environment built by a minimax adversary in which it is actually impossible for the agent to reach the goal. While the minimax adversary has succeeded in its task — it has minimized the performance of the original agent — it provides no opportunity for the agent to learn. Using a purely adversarial objective is not well suited to generating training environments, either.

In collaboration with UC Berkeley, we propose a new multi-agent approach for training the adversary in “Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design”, a publication recently presented at NeurIPS 2020. In this work we present an algorithm, Protagonist Antagonist Induced Regret Environment Design (PAIRED), that is based on minimax regret and prevents the adversary from creating impossible environments, while still enabling it to correct weaknesses in the agent’s policy. PAIRED incentivizes the adversary to tune the difficulty of the generated environments to be just outside the agent’s current abilities, leading to an automatic curriculum of increasingly challenging training tasks. We show that agents trained with PAIRED learn more complex behavior and generalize better to unknown test tasks. We have released open-source code for PAIRED on our GitHub repo.

PAIRED
To flexibly constrain the adversary, PAIRED introduces a third RL agent, which we call the antagonist agent, because it is allied with the adversarial agent, i.e., the one designing the environment. We rename our initial agent, the one navigating the environment, the protagonist. Once the adversary generates an environment, both the protagonist and antagonist play through that environment.

The adversary’s job is to maximize the antagonist’s reward while minimizing the protagonist’s reward. This means it must create environments that are feasible (because the antagonist can solve them and get a high score), but challenging to the protagonist (exploit weaknesses in its current policy). The gap between the two rewards is the regret — the adversary tries to maximize the regret, while the protagonist competes to minimize it.
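The reward structure described above can be sketched in a few lines. This is a minimal illustration, not the released PAIRED code: the function name, the episode returns, and the choice to average returns over episodes are all assumptions made for the example.

```python
from statistics import mean


def estimate_regret(antagonist_returns, protagonist_returns):
    """Regret estimate: antagonist's average return minus the protagonist's.

    Averaging over episodes is an assumption of this sketch; the text only
    says that regret is the gap between the two rewards.
    """
    return mean(antagonist_returns) - mean(protagonist_returns)


# Hypothetical episode returns on one adversary-generated environment.
antagonist_returns = [0.9, 0.8, 1.0]
protagonist_returns = [0.2, 0.0, 0.1]

regret = estimate_regret(antagonist_returns, protagonist_returns)

# The adversary and its ally (the antagonist) try to maximize regret,
# while the protagonist tries to minimize it.
adversary_reward = regret
antagonist_reward = regret
protagonist_reward = -regret

# An impossible environment gives both agents zero return, hence zero regret,
# so the adversary gains nothing by building one.
impossible_regret = estimate_regret([0.0, 0.0], [0.0, 0.0])
```

The last line shows why this objective discourages impossible environments: the adversary only scores when the antagonist succeeds where the protagonist does not.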

The methods discussed above (domain randomization, the minimax adversary and PAIRED) can be analyzed using the same theoretical framework, unsupervised environment design (UED), which we describe in detail in the paper. UED draws a connection between environment design and decision theory, enabling us to show that domain randomization is equivalent to the Principle of Insufficient Reason, the minimax adversary follows the Maximin Principle, and PAIRED is optimizing minimax regret. This formalism enables us to use tools from decision theory to understand the benefits and drawbacks of each method. Below, we show how each of these ideas works for environment design:

Domain randomization (a) generates unstructured environments that aren’t tailored to the agent’s learning progress. The minimax adversary (b) may create impossible environments. PAIRED (c) can generate challenging, structured environments, which are still possible for the agent to complete.
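To make the three decision rules concrete, here is a small sketch that applies each of them to a toy payoff table. The table, the policy names, and the function names are hypothetical and chosen only so that the three rules disagree; none of it comes from the paper.

```python
# Rows are candidate policies, columns are environments; entries are returns.
# All numbers are made up for illustration.
payoffs = {
    "A": [10, 0, 2],
    "B": [3, 3, 3],
    "C": [6, 5, 0],
}


def insufficient_reason(payoffs):
    """Treat every environment as equally likely and maximize the average."""
    return max(payoffs, key=lambda p: sum(payoffs[p]) / len(payoffs[p]))


def maximin(payoffs):
    """Maximize the worst-case return."""
    return max(payoffs, key=lambda p: min(payoffs[p]))


def minimax_regret(payoffs):
    """Minimize the worst-case gap to the best achievable return."""
    n_envs = len(next(iter(payoffs.values())))
    best = [max(row[e] for row in payoffs.values()) for e in range(n_envs)]

    def worst_regret(policy):
        return max(best[e] - payoffs[policy][e] for e in range(n_envs))

    return min(payoffs, key=worst_regret)


choice_dr = insufficient_reason(payoffs)   # averages: A=4.0, B=3.0, C=3.67
choice_maximin = maximin(payoffs)          # worst cases: A=0, B=3, C=0
choice_regret = minimax_regret(payoffs)    # worst regrets: A=5, B=7, C=4
```

In this table the averaging rule prefers A, the maximin rule prefers the cautious B, and minimax regret prefers C, which is never far from the best achievable return in any environment.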

Curriculum Generation
What’s interesting about minimax regret is that it incentivizes the adversary to generate a curriculum of initially easy, then increasingly challenging environments. In most RL environments, the reward function will give a higher score for completing the task more efficiently, or in fewer timesteps. When this is true, we can show that regret incentivizes the adversary to create the easiest possible environment the protagonist can’t solve yet. To see this, let’s assume the antagonist is perfect, and always gets the highest score that it possibly can. Meanwhile, the protagonist is terrible, and gets a score of zero on everything. In that case, the regret just depends on the difficulty of the environment. Since easier environments can be completed in fewer timesteps, they allow the antagonist to get a higher score. Therefore, the regret of failing at an easy environment is greater than the regret of failing on a hard environment.

So, by maximizing regret the adversary is searching for easy environments that the protagonist fails to solve. Once the protagonist learns to solve each environment, the adversary must move on to finding a slightly harder environment that the protagonist can’t solve. Thus, the adversary generates a curriculum of increasingly difficult tasks.
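The argument above can be checked with a toy calculation. The sketch below assumes a hypothetical reward of `1 - steps / MAX_STEPS` for a solved maze and zero otherwise, a perfect antagonist, and a protagonist that solves exactly the mazes whose shortest path is within its current "skill". All of these are simplifying assumptions for illustration.

```python
MAX_STEPS = 100  # hypothetical episode length limit


def solve_reward(path_length):
    """Hypothetical reward: higher for fewer steps, zero if unsolvable in time."""
    if path_length > MAX_STEPS:
        return 0.0
    return 1.0 - path_length / MAX_STEPS


def regret(path_length, protagonist_skill):
    """Perfect antagonist's reward minus the protagonist's reward."""
    antagonist = solve_reward(path_length)
    solved = path_length <= protagonist_skill
    protagonist = solve_reward(path_length) if solved else 0.0
    return antagonist - protagonist


def adversary_choice(candidate_lengths, protagonist_skill):
    """The adversary picks the candidate maze with the highest regret."""
    return max(candidate_lengths, key=lambda length: regret(length, protagonist_skill))


candidates = [5, 10, 20, 40, 80]

# As the protagonist improves, the regret-maximizing maze gets longer.
curriculum = [adversary_choice(candidates, skill) for skill in [0, 5, 10, 20, 40]]
```

With these assumptions the adversary first picks the shortest maze, and moves to the next longer one only once the protagonist can solve the current one, which is the curriculum effect described in the text.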

Results
We can see the curriculum emerging in the learning curves below, which plot the shortest path length of a maze the agents have successfully solved. Unlike minimax or domain randomization, the PAIRED adversary creates a curriculum of increasingly long, but possible, mazes, enabling PAIRED agents to learn more complex behavior.

But can these different training schemes help an agent generalize better to unknown test tasks? Below, we see the zero-shot transfer performance of each algorithm on a series of challenging test tasks. As the complexity of the transfer environment increases, the performance gap between PAIRED and the baselines widens. For extremely difficult tasks like Labyrinth and Maze, PAIRED is the only method that can occasionally solve the task. These results provide promising evidence that PAIRED can be used to improve generalization for deep RL.

Admittedly, these simple gridworlds do not reflect the complexities of the real-world tasks that many RL methods are attempting to solve. We address this in “Adversarial Environment Generation for Learning to Navigate the Web”, which examines the performance of PAIRED when applied to more complex problems, such as teaching RL agents to navigate web pages. We propose an improved version of PAIRED, and show how it can be used to train an adversary to generate a curriculum of increasingly challenging websites:

Above, you can see websites built by the adversary in the early, middle, and late training stages, which progress from using very few elements per page to many simultaneous elements, making the tasks progressively harder. We test whether agents trained on this curriculum can generalize to standardized web navigation tasks, and achieve a 75% success rate, with a 4x improvement over the strongest curriculum learning baseline:

Conclusions
Deep RL is very good at fitting a simulated training environment, but how can we build simulations that cover the complexity of the real world? One solution is to automate this process. We propose Unsupervised Environment Design (UED) as a framework that describes different methods for automatically creating a distribution of training environments, and show that UED subsumes prior work like domain randomization and minimax adversarial training. We think PAIRED is a good approach for UED, because regret maximization leads to a curriculum of increasingly challenging tasks, and prepares agents to transfer successfully to unknown test tasks.

Acknowledgements
We would like to recognize the co-authors of “Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design”: Michael Dennis, Natasha Jaques, Eugene Vinitsky, Alexandre Bayen, Stuart Russell, Andrew Critch, and Sergey Levine, as well as the co-authors of “Adversarial Environment Generation for Learning to Navigate the Web”: Izzeddin Gur, Natasha Jaques, Yingjie Miao, Jongwook Choi, Kevin Malta, Manoj Tiwari, Honglak Lee, Aleksandra Faust. In addition, we thank Michael Chang, Marvin Zhang, Dale Schuurmans, Aleksandra Faust, Chase Kew, Jie Tan, Dennis Lee, Kelvin Xu, Abhishek Gupta, Adam Gleave, Rohin Shah, Daniel Filan, Lawrence Chan, Sam Toyer, Tyler Westenbroek, Igor Mordatch, Shane Gu, DJ Strouse, and Max Kleiman-Weiner for discussions that contributed to this work.

Offsites

Lyra: A New Very Low-Bitrate Codec for Speech Compression

Post date February 25, 2021

Posted by Alejandro Luebs, Software Engineer and Jamieson Brettle, Product Manager, Chrome

Connecting to others online via voice and video calls is something that is increasingly a part of everyday life. The real-time communication frameworks, like WebRTC, that make this possible depend on efficient compression techniques, codecs, to encode (or decode) signals for transmission or storage. A vital part of media applications for decades, codecs allow bandwidth-hungry applications to efficiently transmit data, and have led to an expectation of high-quality communication anywhere at any time.

As such, a continuing challenge in developing codecs, both for video and audio, is to provide increasing quality, using less data, and to minimize latency for real-time communication. Even though video might seem much more bandwidth hungry than audio, modern video codecs can reach lower bitrates than some high-quality speech codecs used today. Combining low-bitrate video and speech codecs can deliver a high-quality video call experience even in low-bandwidth networks. Yet historically, the lower the bitrate for an audio codec, the less intelligible and more robotic the voice signal becomes. Furthermore, while some people have access to a consistent high-quality, high-speed network, this level of connectivity isn’t universal, and even those in well connected areas at times experience poor quality, low bandwidth, and congested network connections.

To solve this problem, we have created Lyra, a high-quality, very low-bitrate speech codec that makes voice communication available even on the slowest networks. To do this, we’ve applied traditional codec techniques while leveraging advances in machine learning (ML) with models trained on thousands of hours of data to create a novel method for compressing and transmitting voice signals.

Lyra Overview
The basic architecture of the Lyra codec is quite simple. Features, or distinctive speech attributes, are extracted from speech every 40ms and are then compressed for transmission. The features themselves are log mel spectrograms, a list of numbers representing the speech energy in different frequency bands, which have traditionally been used for their perceptual relevance because they are modeled after human auditory response. On the other end, a generative model uses those features to recreate the speech signal. In this sense, Lyra is very similar to other traditional parametric codecs, such as MELP.
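As a rough illustration of the feature extraction step, the sketch below computes log mel energies for consecutive 40ms frames using NumPy. Only the 40ms frame interval comes from the text; the 16kHz sample rate, the 40 mel bands, the Hann window, the non-overlapping frames, and all function names are assumptions of this example and are not Lyra's actual settings.

```python
import numpy as np


def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale, shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    mel_edges = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_mels + 2)
    hz_edges = mel_to_hz(mel_edges)
    bank = np.zeros((n_mels, fft_freqs.size))
    for m in range(n_mels):
        low, center, high = hz_edges[m], hz_edges[m + 1], hz_edges[m + 2]
        rising = (fft_freqs - low) / (center - low)
        falling = (high - fft_freqs) / (high - center)
        bank[m] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank


def log_mel_features(signal, sample_rate=16000, hop_ms=40, n_mels=40):
    """One vector of log mel energies per 40ms frame (non-overlapping frames)."""
    hop = sample_rate * hop_ms // 1000
    n_frames = len(signal) // hop
    frames = signal[: n_frames * hop].reshape(n_frames, hop) * np.hanning(hop)
    power = np.abs(np.fft.rfft(frames, n=hop, axis=1)) ** 2
    mel_energy = power @ mel_filterbank(n_mels, hop, sample_rate).T
    return np.log(mel_energy + 1e-6)  # small offset avoids log(0)


# One second of a 440 Hz tone yields 25 feature vectors (one per 40ms).
tone = np.sin(2.0 * np.pi * 440.0 * np.arange(16000) / 16000.0)
features = log_mel_features(tone)
```

Each row of `features` is the "list of numbers representing the speech energy in different frequency bands" for one frame; a real codec would then quantize and compress these vectors before transmission.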

However, traditional parametric codecs, which simply extract critical parameters from speech that can then be used to recreate the signal at the receiving end, achieve low bitrates but often sound robotic and unnatural. These shortcomings have led to the development of a new generation of high-quality audio generative models that have revolutionized the field by being able to not only differentiate between signals, but also generate completely new ones. DeepMind’s WaveNet was the first of these generative models that paved the way for many to come. Additionally, WaveNetEQ, the generative model-based packet-loss-concealment system currently used in Duo, has demonstrated how this technology can be used in real-world scenarios.

A New Approach to Compression with Lyra
Using these models as a baseline, we’ve developed a new model capable of reconstructing speech using minimal amounts of data. Lyra harnesses the power of these new natural-sounding generative models to maintain the low bitrate of parametric codecs while achieving high quality, on par with state-of-the-art waveform codecs used in most streaming and communication platforms today. The drawback of waveform codecs is that they achieve this high quality by compressing and sending over the signal sample-by-sample, which requires a higher bitrate and, in most cases, isn’t necessary to achieve natural sounding speech.

One concern with generative models is their computational complexity. Lyra avoids this issue by using a cheaper recurrent generative model, a WaveRNN variation, that works at a lower rate, but generates in parallel multiple signals in different frequency ranges that it later combines into a single output signal at the desired sample rate. This trick enables Lyra to not only run on cloud servers, but also on-device on mid-range phones in real time (with a processing latency of 90ms, which is in line with other traditional speech codecs). This generative model is then trained on thousands of hours of speech data and optimized, similarly to WaveNet, to accurately recreate the input audio.

Comparison with Existing Codecs
Since the inception of Lyra, our mission has been to provide the best quality audio using a fraction of the bitrate data of alternatives. Currently, the royalty-free open-source codec Opus is the most widely used codec for WebRTC-based VoIP applications and, with audio at 32kbps, typically obtains transparent speech quality, i.e., indistinguishable from the original. However, while Opus can be used in more bandwidth-constrained environments down to 6kbps, it starts to demonstrate degraded audio quality. Other codecs are capable of operating at comparable bitrates to Lyra (Speex, MELP, AMR), but each suffers from increased artifacts and results in a robotic-sounding voice.

Lyra is currently designed to operate at 3kbps and listening tests show that Lyra outperforms any other codec at that bitrate and compares favorably to Opus at 8kbps, thus achieving more than a 60% reduction in bandwidth. Lyra can be used wherever the bandwidth conditions are insufficient for higher bitrates and existing low-bitrate codecs do not provide adequate quality.
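The bandwidth figures above follow from simple arithmetic. The sketch below reproduces them; the helper names are hypothetical, and the per-frame budget assumes the whole 3kbps is spent on the 40ms feature frames with no transport overhead.

```python
def bits_per_frame(bitrate_bps, frame_ms):
    """Bits available for each frame at a given bitrate."""
    return bitrate_bps * frame_ms / 1000


def bandwidth_reduction(new_bps, reference_bps):
    """Fraction of bandwidth saved relative to a reference bitrate."""
    return 1 - new_bps / reference_bps


# 3kbps with one frame every 40ms leaves 120 bits per frame.
lyra_frame_bits = bits_per_frame(3000, 40)

# Lyra at 3kbps versus Opus at 8kbps: a 62.5% reduction.
reduction_vs_opus_8k = bandwidth_reduction(3000, 8000)
```

A saving of 62.5% is consistent with the "more than a 60% reduction" stated in the text.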

Audio samples (clean speech): Original, Opus@6kbps, Lyra@3kbps, Speex@3kbps.

Audio samples (noisy environment): Original, Opus@6kbps, Lyra@3kbps, Speex@3kbps.

Additional comparison: Reference, Opus@6kbps, Lyra@3kbps.

Ensuring Fairness
As with any ML-based system, the model must be trained to make sure that it works for everyone. We’ve trained Lyra with thousands of hours of audio with speakers in over 70 languages using open-source audio libraries, and then verified the audio quality with expert and crowdsourced listeners. One of the design goals of Lyra is to ensure universally accessible high-quality audio experiences. Lyra trains on a wide dataset, including speakers in a myriad of languages, to make sure the codec is robust to any situation it might encounter.

Societal Impact and Where We Go From Here
The implications of technologies like Lyra are far reaching, both in the short and long term. With Lyra, billions of users in emerging markets can have access to an efficient low-bitrate codec that allows them to have higher quality audio than ever before. Additionally, Lyra can be used in cloud environments enabling users with various network and device capabilities to chat seamlessly with each other. Pairing Lyra with new video compression technologies, like AV1, will allow video chats to take place, even for users connecting to the internet via a 56kbps dial-in modem.

Duo already uses ML to reduce audio interruptions, and is currently rolling out Lyra to improve audio call quality and reliability on very low bandwidth connections. We will continue to optimize Lyra’s performance and quality to ensure maximum availability of the technology, with investigations into acceleration via GPUs and TPUs. We are also beginning to research how these technologies can lead to a low-bitrate general-purpose audio codec (i.e., music and other non-speech use cases).

Acknowledgements
Thanks to everyone who made Lyra possible including Jan Skoglund, Felicia Lim, Michael Chinen, Bastiaan Kleijn, Tom Denton, Andrew Storus, Yero Yeh (Chrome Media), Henrik Lundin, Niklas Blum, Karl Wiberg (Google Duo), Chenjie Gu, Zach Gleicher, Norman Casagrande, Erich Elsen (DeepMind).

Offsites

The Technology Behind Cinematic Photos

Post date February 23, 2021

Posted by Per Karlsson and Lucy Yu, Software Engineers, Google Research

Looking at photos from the past can help people relive some of their most treasured moments. Last December we launched Cinematic photos, a new feature in Google Photos that aims to recapture the sense of immersion felt the moment a photo was taken, simulating camera motion and parallax by inferring 3D representations in an image. In this post, we take a look at the technology behind this process, and demonstrate how Cinematic photos can turn a single 2D photo from the past into a more immersive 3D animation.

Camera 3D model courtesy of Rick Reitano.

Depth Estimation
Like many recent computational photography features such as Portrait Mode and Augmented Reality (AR), Cinematic photos requires a depth map to provide information about the 3D structure of a scene. Typical techniques for computing depth on a smartphone rely on multi-view stereo, a geometric method that solves for the depth of objects in a scene by simultaneously capturing multiple photos at different viewpoints, where the distances between the cameras are known. In the Pixel phones, the views come from two cameras or dual-pixel sensors.

To enable Cinematic photos on existing pictures that were not taken in multi-view stereo, we trained a convolutional neural network with encoder-decoder architecture to predict a depth map from just a single RGB image. Using only one view, the model learned to estimate depth using monocular cues, such as the relative sizes of objects, linear perspective, defocus blur, etc.

Because monocular depth estimation datasets are typically designed for domains such as AR, robotics, and self-driving, they tend to emphasize street scenes or indoor room scenes instead of features more common in casual photography, like people, pets, and objects, which have different composition and framing. So, we created our own dataset for training the monocular depth model using photos captured on a custom 5-camera rig as well as another dataset of Portrait photos captured on Pixel 4. Both datasets included ground-truth depth from multi-view stereo that is critical for training a model.

Mixing several datasets in this way exposes the model to a larger variety of scenes and camera hardware, improving its predictions on photos in the wild. However, it also introduces new challenges, because the ground-truth depth from different datasets may differ from each other by an unknown scaling factor and shift. Fortunately, the Cinematic photo effect only needs the relative depths of objects in the scene, not the absolute depths. Thus we can combine datasets by using a scale-and-shift-invariant loss during training and then normalize the output of the model at inference.
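One common way to build a scale-and-shift-invariant loss is to first align the prediction to the ground truth with a least-squares scale and shift, and only then measure the error. The sketch below shows that idea in NumPy. The post does not give the exact loss used, so the least-squares alignment, the squared-error form, the min-max normalization, and the function names are all assumptions of this example.

```python
import numpy as np


def align_scale_shift(pred, target):
    """Least-squares scale s and shift t such that s * pred + t is closest to target."""
    design = np.stack([pred.ravel(), np.ones(pred.size)], axis=1)
    (scale, shift), *_ = np.linalg.lstsq(design, target.ravel(), rcond=None)
    return scale, shift


def scale_shift_invariant_loss(pred, target):
    """Mean squared error after removing the best-fit scale and shift."""
    scale, shift = align_scale_shift(pred, target)
    return float(np.mean((scale * pred + shift - target) ** 2))


def normalize_depth(depth):
    """Map a predicted depth map to the range [0, 1] at inference time."""
    low, high = depth.min(), depth.max()
    return (depth - low) / (high - low + 1e-8)


rng = np.random.default_rng(0)
pred = rng.random((4, 4))

# The same relative depths, expressed with a different scale and offset,
# should cost (almost) nothing.
loss_same_structure = scale_shift_invariant_loss(pred, 3.0 * pred + 2.0)

# A target with a different relative structure is still penalized.
loss_different_structure = scale_shift_invariant_loss(pred, pred ** 2)
```

Because the loss ignores a global scale and offset, ground truth from datasets with different depth conventions can be mixed, which matches the motivation given in the text.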

The Cinematic photo effect is particularly sensitive to the depth map’s accuracy at person boundaries. An error in the depth map can result in jarring artifacts in the final rendered effect. To mitigate this, we apply median filtering to improve the edges, and also infer segmentation masks of any people in the photo using a DeepLab segmentation model trained on the Open Images dataset. The masks are used to pull forward pixels of the depth map that were incorrectly predicted to be in the background.
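The two clean-up steps described above can be sketched as follows. This is only an illustration under stated assumptions: a 3x3 median window, smaller depth values meaning "closer to the camera", and a pull-forward rule that clamps person pixels to the median person depth. The post does not specify the filter size or the exact pull-forward rule, and the function names are hypothetical.

```python
import numpy as np


def median_filter_3x3(depth):
    """3x3 median filter with edge replication."""
    padded = np.pad(depth, 1, mode="edge")
    height, width = depth.shape
    windows = np.stack(
        [padded[dy:dy + height, dx:dx + width] for dy in range(3) for dx in range(3)]
    )
    return np.median(windows, axis=0)


def pull_person_forward(depth, person_mask):
    """Clamp person pixels so none lies farther away than the median person depth.

    Assumes smaller values are nearer to the camera.
    """
    out = depth.copy()
    if not person_mask.any():
        return out
    person_depth = np.median(depth[person_mask])
    out[person_mask] = np.minimum(depth[person_mask], person_depth)
    return out


# A single outlier in an otherwise flat depth map is removed by the median filter.
noisy = np.ones((5, 5))
noisy[2, 2] = 9.0
filtered = median_filter_3x3(noisy)

# One person pixel (value 5.0) was wrongly predicted near the background (8.0).
depth = np.array([[1.0, 1.0, 8.0],
                  [1.0, 5.0, 8.0]])
mask = np.array([[True, True, False],
                 [True, True, False]])
corrected = pull_person_forward(depth, mask)
```

After the correction the mislabelled pixel sits at the same depth as the rest of the person, while background pixels outside the mask are left untouched.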

Camera Trajectory
There can be many degrees of freedom when animating a camera in a 3D scene, and our virtual camera setup is inspired by professional video camera rigs to create cinematic motion. Part of this is identifying the optimal pivot point for the virtual camera’s rotation in order to yield the best results by drawing one’s eye to the subject.

The first step in 3D scene reconstruction is to create a mesh by extruding the RGB image onto the depth map. By doing so, neighboring points in the mesh can have large depth differences. While this is not noticeable from the “face-on” view, the more the virtual camera is moved, the more likely it is to see polygons spanning large changes in depth. In the rendered output video, this will look like the input texture is stretched. The biggest challenge when animating the virtual camera is to find a trajectory that introduces parallax while minimizing these “stretchy” artifacts.

The parts of the mesh with large depth differences become more visible (red visualization) once the camera is away from the “face-on” view. In these areas, the photo appears to be stretched, which we call “stretchy artifacts”.

Because of the wide spectrum in user photos and their corresponding 3D reconstructions, it is not possible to share one trajectory across all animations. Instead, we define a loss function that captures how much of the stretchiness can be seen in the final animation, which allows us to optimize the camera parameters for each unique photo. Rather than counting the total number of pixels identified as artifacts, the loss function triggers more heavily in areas with a greater number of connected artifact pixels, which reflects a viewer’s tendency to more easily notice artifacts in these connected areas.

We utilize padded segmentation masks from a human pose network to divide the image into three different regions: head, body and background. The loss function is normalized inside each region before computing the final loss as a weighted sum of the normalized losses. Ideally the generated output video is free from artifacts but in practice, this is rare. Weighting the regions differently biases the optimization process to pick trajectories that prefer artifacts in the background regions, rather than those artifacts near the image subject.
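The region-weighted loss can be illustrated with a small sketch. Here each region's loss is simply the fraction of its pixels flagged as artifacts, and the final loss is a weighted sum over head, body, and background. The weights, the masks, and the function names are hypothetical, and the sketch leaves out the extra penalty for connected artifact pixels described earlier.

```python
import numpy as np

# Hypothetical weights: artifacts on the subject cost more than in the background.
REGION_WEIGHTS = {"head": 3.0, "body": 2.0, "background": 1.0}


def region_loss(artifact_mask, region_mask):
    """Fraction of a region's pixels that show artifacts (normalized per region)."""
    area = region_mask.sum()
    if area == 0:
        return 0.0
    return float((artifact_mask & region_mask).sum() / area)


def trajectory_loss(artifact_mask, regions, weights=REGION_WEIGHTS):
    """Weighted sum of the normalized per-region losses."""
    return sum(weights[name] * region_loss(artifact_mask, mask)
               for name, mask in regions.items())


# Toy 4x4 image: 2 head pixels, 4 body pixels, 10 background pixels.
head = np.zeros((4, 4), dtype=bool)
head[0, 1:3] = True
body = np.zeros((4, 4), dtype=bool)
body[1:3, 1:3] = True
regions = {"head": head, "body": body, "background": ~(head | body)}

# Trajectory A shows two artifact pixels in the background.
artifacts_a = np.zeros((4, 4), dtype=bool)
artifacts_a[3, 0] = artifacts_a[3, 3] = True

# Trajectory B shows a single artifact pixel on the head.
artifacts_b = np.zeros((4, 4), dtype=bool)
artifacts_b[0, 1] = True

losses = {"A": trajectory_loss(artifacts_a, regions),
          "B": trajectory_loss(artifacts_b, regions)}
best_trajectory = min(losses, key=losses.get)
```

Trajectory A is preferred even though it shows more artifact pixels, because they fall in the background rather than on the subject.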

During the camera trajectory optimization, the goal is to select a path for the camera with the least amount of noticeable artifacts. In these preview images, artifacts in the output are colored red while the green and blue overlay visualizes the different body regions.

Framing the Scene
Generally, the reprojected 3D scene does not neatly fit into a rectangle with portrait orientation, so it was also necessary to frame the output with the correct aspect ratio while still retaining the key parts of the input image. To accomplish this, we use a deep neural network that predicts per-pixel saliency of the full image. When framing the virtual camera in 3D, the model identifies and captures as many salient regions as possible while ensuring that the rendered mesh fully occupies every output video frame. This sometimes requires the model to shrink the camera’s field of view.

Heatmap of the predicted per-pixel saliency. We want the creation to include as much of the salient regions as possible when framing the virtual camera.

Conclusion
Through Cinematic photos, we implemented a system of algorithms – with each ML model evaluated for fairness – that work together to allow users to relive their memories in a new way, and we are excited about future research and feature improvements. Now that you know how they are created, keep an eye open for automatically created Cinematic photos that may appear in your recent memories within the Google Photos app!

Acknowledgments
Cinematic Photos is the result of a collaboration between Google Research and Google Photos teams. Key contributors also include: Andre Le, Brian Curless, Cassidy Curtis, Ce Liu, Chun-po Wang, Daniel Jenstad, David Salesin, Dominik Kaeser, Gina Reynolds, Hao Xu, Huiwen Chang, Huizhong Chen, Jamie Aspinall, Janne Kontkanen, Matthew DuVall, Michael Kucera, Michael Milne, Mike Krainin, Mike Liu, Navin Sarma, Orly Liba, Peter Hedman, Rocky Cai, Ruirui Jiang, Steven Hickson, Tracy Gu, Tyler Zhu, Varun Jampani, Yuan Hao, Zhongli Ding.

Offsites

Introducing Model Search: An Open Source Platform for Finding Optimal ML Models

Post date February 19, 2021

Posted by Hanna Mazzawi, Research Engineer and Xavi Gonzalvo, Research Scientist, Google Research

The success of a neural network (NN) often depends on how well it can generalize to various tasks. However, designing NNs that can generalize well is challenging because the research community’s understanding of how a neural network generalizes is currently somewhat limited: What does the appropriate neural network look like for a given problem? How deep should it be? Which types of layers should be used? Would LSTMs be enough or would Transformer layers be better? Or maybe a combination of the two? Would ensembling or distillation boost performance? These questions become even harder across machine learning (ML) domains, because intuition and understanding run deeper in some domains than in others.

In recent years, AutoML algorithms have emerged [e.g., 1, 2, 3] to help researchers find the right neural network automatically without the need for manual experimentation. Techniques like neural architecture search (NAS) use algorithms such as reinforcement learning (RL), evolutionary algorithms, and combinatorial search to build a neural network out of a given search space. With the proper setup, these techniques have demonstrated they are capable of delivering results that are better than the manually designed counterparts. But more often than not, these algorithms are compute heavy, and need to train thousands of models before converging. Moreover, they explore search spaces that are domain specific and incorporate substantial prior human knowledge that does not transfer well across domains. As an example, in image classification, the traditional NAS searches for two good building blocks (convolutional and downsampling blocks), which it arranges following traditional conventions to create the full network.

To overcome these shortcomings and to extend access to AutoML solutions to the broader research community, we are excited to announce the open source release of Model Search, a platform that helps researchers develop the best ML models, efficiently and automatically. Instead of focusing on a specific domain, Model Search is domain agnostic, flexible, and capable of finding the appropriate architecture that best fits a given dataset and problem, while minimizing coding time, effort and compute resources. It is built on TensorFlow, and can run either on a single machine or in a distributed setting.

Overview
The Model Search system consists of multiple trainers, a search algorithm, a transfer learning algorithm and a database to store the various evaluated models. The system runs both training and evaluation experiments for various ML models (different architectures and training techniques) in an adaptive, yet asynchronous, fashion. While each trainer conducts experiments independently, all trainers share the knowledge gained from their experiments. At the beginning of every cycle, the search algorithm looks up all the completed trials and uses beam search to decide what to try next. It then invokes mutation over one of the best architectures found thus far and assigns the resulting model back to a trainer.

Model Search schematic illustrating the distributed search and ensembling. Each trainer runs independently to train and evaluate a given model. The results are shared with the search algorithm, which stores them. The search algorithm then invokes mutation over one of the best architectures and then sends the new model back to a trainer for the next iteration. S is the set of training and validation examples and A is the set of all candidates used during training and search.

The system builds a neural network model from a set of predefined blocks, each of which represents a known micro-architecture, like LSTM, ResNet or Transformer layers. By using blocks of pre-existing architectural components, Model Search is able to leverage existing best knowledge from NAS research across domains. This approach is also more efficient, because it explores structures, not their more fundamental and detailed components, therefore reducing the scale of the search space.

Neural network micro architecture blocks that work well, e.g., a ResNet Block.

Because the Model Search framework is built on TensorFlow, blocks can implement any function that takes a tensor as an input. For example, imagine that one wants to introduce a new search space built with a selection of micro architectures. The framework will take the newly defined blocks and incorporate them into the search process so that algorithms can build the best possible neural network from the components provided. The blocks provided can even be fully defined neural networks that are already known to work for the problem of interest. In that case, Model Search can be configured to simply act as a powerful ensembling machine.

The search algorithms implemented in Model Search are adaptive, greedy and incremental, which makes them converge faster than RL algorithms. They do however imitate the “explore & exploit” nature of RL algorithms by separating the search for a good candidate (explore step), and boosting accuracy by ensembling good candidates that were discovered (exploit step). The main search algorithm adaptively modifies one of the top k performing experiments (where k can be specified by the user) after applying random changes to the architecture or the training technique (e.g., making the architecture deeper).
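
The top-k mutation step described above can be sketched in a few lines of Python. This is a toy, sequential illustration under stated assumptions, not the Model Search implementation: the block names in `BLOCKS`, the `mutate` and `toy_score` functions, and the scoring rule are all hypothetical stand-ins, and a real system would train and evaluate each candidate asynchronously across trainers.

```python
import random

# Hypothetical block vocabulary; these names are placeholders, not Model Search identifiers.
BLOCKS = ["conv", "resnet", "lstm", "dense"]


def mutate(architecture, rng):
    """Return a copy of the architecture with one random change applied."""
    child = list(architecture)
    if rng.random() < 0.5 or len(child) == 0:
        # Make the network deeper by appending a block.
        child.append(rng.choice(BLOCKS))
    else:
        # Replace one existing block with a randomly chosen one.
        child[rng.randrange(len(child))] = rng.choice(BLOCKS)
    return child


def toy_score(architecture):
    """Stand-in for training and evaluation: rewards depth up to 6 and 'resnet' blocks."""
    return min(len(architecture), 6) + 0.5 * architecture.count("resnet")


def search(num_trials, k, seed=0):
    """Repeatedly mutate one of the k best completed trials and record the result."""
    rng = random.Random(seed)
    # Completed trials, stored as (score, architecture) pairs.
    trials = [(toy_score(["conv"]), ["conv"])]
    for _ in range(num_trials):
        top_k = sorted(trials, key=lambda t: t[0], reverse=True)[:k]
        _, parent = rng.choice(top_k)
        child = mutate(parent, rng)
        trials.append((toy_score(child), child))
    return max(trials, key=lambda t: t[0])


best_score, best_architecture = search(num_trials=50, k=3)
```

Choosing the parent among the top k rather than always taking the single best keeps some diversity in the candidates while still being greedy.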

An example of an evolution of a network over many experiments. Each color represents a different type of architecture block. The final network is formed via mutations of high performing candidate networks, in this case adding depth.

To further improve efficiency and accuracy, transfer learning is enabled between various internal experiments. Model Search does this in two ways — via knowledge distillation or weight sharing. Knowledge distillation allows one to improve candidates’ accuracies by adding a loss term that matches the high performing models’ predictions in addition to the ground truth. Weight sharing, on the other hand, bootstraps some of the parameters (after applying mutation) in the network from previously trained candidates by copying suitable weights from previously trained models and randomly initializing the remaining ones. This enables faster training, which allows opportunities to discover more (and better) architectures.
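
As a rough illustration of the two transfer mechanisms, the sketch below shows a distillation-style loss and a weight-sharing initializer. It assumes plain Python lists in place of tensors; the `weight` coefficient, the name-and-length matching rule, and the Gaussian initialization are illustrative assumptions, not details taken from Model Search.

```python
import math
import random


def cross_entropy(target_probs, predicted_probs):
    """Cross-entropy H(target, predicted) for two discrete distributions."""
    return -sum(t * math.log(p) for t, p in zip(target_probs, predicted_probs) if t > 0)


def distillation_loss(label_one_hot, teacher_probs, student_probs, weight=0.5):
    """Ground-truth loss plus a term that pulls the student toward the teacher.

    `weight` is a hypothetical mixing coefficient chosen for illustration.
    """
    hard = cross_entropy(label_one_hot, student_probs)
    soft = cross_entropy(teacher_probs, student_probs)
    return hard + weight * soft


def share_weights(parent_weights, child_sizes, rng):
    """Copy parent weights whose name and size match; randomly initialize the rest."""
    child = {}
    for name, size in child_sizes.items():
        parent = parent_weights.get(name)
        if parent is not None and len(parent) == size:
            child[name] = list(parent)
        else:
            child[name] = [rng.gauss(0.0, 0.1) for _ in range(size)]
    return child


loss = distillation_loss([0, 1, 0], [0.1, 0.8, 0.1], [0.2, 0.6, 0.2])
parent = {"layer1": [0.3, -0.2], "layer2": [0.5]}
child = share_weights(parent, {"layer1": 2, "layer2": 1, "layer3": 3}, random.Random(0))
```

Here `layer1` and `layer2` are inherited from the parent, while the newly added `layer3` starts from random values.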

Experimental Results
Model Search improves upon production models with minimal iterations. In a recent paper, we demonstrated the capabilities of Model Search in the speech domain by discovering a model for keyword spotting and language identification. Over fewer than 200 iterations, the resulting model slightly exceeded the accuracy of internal state-of-the-art production models designed by experts, while using ~130K fewer trainable parameters (184K compared to 315K parameters).

Model accuracy by iteration in our system compared to the previous production model for keyword spotting. A similar graph can be found for language identification in the linked paper.

We also applied Model Search to find an architecture suitable for image classification on the heavily explored CIFAR-10 imaging dataset. Using a set of known convolution blocks, including convolutions, resnet blocks (i.e., two convolutions and a skip connection), NAS-A cells, fully connected layers, etc., we observed that we were able to quickly reach a benchmark accuracy of 91.83 in 209 trials (i.e., exploring only 209 models). In comparison, previous top performers reached the same threshold accuracy in 5807 trials for the NasNet algorithm (RL), and 1160 for PNAS (RL + Progressive).
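
To put the trial counts in perspective, the ratios can be computed directly. The snippet below is plain arithmetic on the numbers quoted above, not an additional result.

```python
# Trial counts quoted above for reaching the same CIFAR-10 threshold accuracy.
trials = {"Model Search": 209, "NasNet (RL)": 5807, "PNAS (RL + Progressive)": 1160}

baseline = trials["Model Search"]
ratios = {name: count / baseline for name, count in trials.items()}

for name, ratio in ratios.items():
    # A larger ratio means more models were explored to reach the threshold.
    print(f"{name}: {trials[name]} trials, {ratio:.1f}x the Model Search count")
```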

Conclusion
We hope the Model Search code will provide researchers with a flexible, domain-agnostic framework for ML model discovery. By building upon previous knowledge for a given domain, we believe that this framework is powerful enough to build models with state-of-the-art performance on well-studied problems when provided with a search space composed of standard building blocks.

Acknowledgements
Special thanks to all code contributors to the open sourcing and the paper: Eugen Ehotaj, Scotty Yak, Malaika Handa, James Preiss, Pai Zhu, Aleks Kracun, Prashant Sridhar, Niranjan Subrahmanya, Ignacio Lopez Moreno, Hyun Jin Park, and Patrick Violette.

Offsites

Mastering Atari with Discrete World Models

Post date February 18, 2021

Posted by Danijar Hafner, Student Researcher, Google Research

Deep reinforcement learning (RL) enables artificial agents to improve their decisions over time. Traditional model-free approaches learn which of the actions are successful in different situations by interacting with the environment through a large amount of trial and error. In contrast, recent advances in deep RL have enabled model-based approaches to learn accurate world models from image inputs and use them for planning. World models can learn from fewer interactions, facilitate generalization from offline data, enable forward-looking exploration, and allow reusing knowledge across multiple tasks.

Despite their intriguing benefits, existing world models (such as SimPLe) have not been accurate enough to compete with the top model-free approaches on the most competitive reinforcement learning benchmarks — to date, the well-established Atari benchmark requires model-free algorithms, such as DQN, IQN, and Rainbow, to reach human-level performance. As a result, many researchers have focused instead on developing task-specific planning methods, such as VPN and MuZero, which learn by predicting sums of expected task rewards. However, these methods are specific to individual tasks and it is unclear how well they would generalize to new tasks or learn from unsupervised datasets. Similar to the recent breakthrough of unsupervised representation learning in computer vision [1, 2], world models aim to learn patterns in the environment that are more general than any particular task to later solve tasks more efficiently.

Today, in collaboration with DeepMind and the University of Toronto, we introduce DreamerV2, the first RL agent based on a world model to achieve human-level performance on the Atari benchmark. It constitutes the second generation of the Dreamer agent that learns behaviors purely within the latent space of a world model trained from pixels. DreamerV2 relies exclusively on general information from the images and accurately predicts future task rewards even when its representations were not influenced by those rewards. Using a single GPU, DreamerV2 outperforms top model-free algorithms with the same compute and sample budget.

Gamer normalized median score across the 55 Atari games after 200 million steps. DreamerV2 substantially outperforms previous world models. Moreover, it exceeds top model-free agents within the same compute and sample budget.

Behaviors learned by DreamerV2 for some of the 55 Atari games. These videos show images from the environment. Video predictions are shown below in the blog post.

An Abstract Model of the World
Just like its predecessor, DreamerV2 learns a world model and uses it to train actor-critic behaviors purely from predicted trajectories. The world model automatically learns to compute compact representations of its images that discover useful concepts, such as object positions, and learns how these concepts change in response to different actions. This lets the agent generate abstractions of its images that ignore irrelevant details and enables massively parallel predictions on a single GPU. During 200 million environment steps, DreamerV2 predicts 468 billion compact states for learning its behavior.

DreamerV2 builds upon the Recurrent State-Space Model (RSSM) that we introduced for PlaNet and was also used for DreamerV1. During training, an encoder turns each image into a stochastic representation that is incorporated into the recurrent state of the world model. Because the representations are stochastic, they do not have access to perfect information about the images and instead extract only what is necessary to make predictions, making the agent robust to unseen images. From each state, a decoder reconstructs the corresponding image to learn general representations. Moreover, a small reward network is trained to rank outcomes during planning. To enable planning without generating images, a predictor learns to guess the stochastic representations without access to the images from which they were computed.

Learning process of the world model used by DreamerV2. The world model maintains recurrent states (h₁–h₃) that receive actions (a₁–a₂) and incorporate information about the images (x₁–x₃) via stochastic representations (z₁–z₃). A predictor guesses the representations as (ẑ₁–ẑ₃) without access to the images from which they were generated.

Importantly, DreamerV2 introduces two new techniques to RSSM that lead to a substantially more accurate world model for learning successful policies. The first technique is to represent each image with multiple categorical variables instead of the Gaussian variables used by PlaNet, DreamerV1, and many more world models in the literature [1, 2, 3, 4, 5]. This leads the world model to reason about the world in terms of discrete concepts and enables more accurate predictions of future representations.

The encoder turns each image into 32 distributions over 32 classes each, the meanings of which are determined automatically as the world model learns. The one-hot vectors sampled from these distributions are concatenated to a sparse representation that is passed on to the recurrent state. To backpropagate through the samples, we use straight-through gradients that are easy to implement using automatic differentiation. Representing images with categorical variables allows the predictor to accurately learn the distribution over the one-hot vectors of the possible next images. In contrast, earlier world models that use Gaussian predictors cannot accurately match the distribution over multiple Gaussian representations for the possible next images.
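
The sampling and concatenation step can be sketched with the Python standard library. The 32 variables and 32 classes come from the text; everything else is an assumption for illustration, in particular the random logits that stand in for a trained encoder's output. Plain Python has no automatic differentiation, so the straight-through gradient appears only as a comment.

```python
import math
import random

NUM_VARIABLES = 32  # categorical variables per image (from the text)
NUM_CLASSES = 32    # classes per variable (from the text)


def softmax(logits):
    """Convert logits to probabilities, subtracting the max for numerical stability."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_one_hot(probs, rng):
    """Draw one class index from `probs` and return it as a one-hot vector."""
    index = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return [1.0 if i == index else 0.0 for i in range(len(probs))]


def encode(logits_per_variable, rng):
    """Sample every categorical and concatenate the one-hot vectors."""
    representation = []
    for logits in logits_per_variable:
        representation.extend(sample_one_hot(softmax(logits), rng))
    return representation


rng = random.Random(0)
# Random logits stand in for the output of a trained encoder network.
logits = [[rng.gauss(0.0, 1.0) for _ in range(NUM_CLASSES)] for _ in range(NUM_VARIABLES)]
z = encode(logits, rng)

# In an autodiff framework, a straight-through estimator is commonly written as
#   sample + probs - stop_gradient(probs)
# so the forward value is the sample while gradients flow through probs.
```

The resulting vector has 32 × 32 = 1024 entries, exactly 32 of which are non-zero, which is what makes the representation sparse.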

Multiple categoricals that represent possible next images can be accurately predicted by a categorical predictor, whereas a Gaussian predictor is not flexible enough to accurately predict multiple possible Gaussian representations.

The second new technique of DreamerV2 is KL balancing. Many previous world models use the ELBO objective that encourages accurate reconstructions while keeping the stochastic representations (posteriors) close to their predictions (priors) to regularize the amount of information extracted from each image and facilitate generalization. Because the objective is optimized end-to-end, the stochastic representations and their predictions can be made more similar by bringing either of the two towards the other. However, bringing the representations towards their predictions can be problematic when the predictor is not yet accurate. KL balancing lets the predictions move faster toward the representations than vice versa. This results in more accurate predictions, a key to successful planning.
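
One way to write KL balancing down uses a stop-gradient operator. The notation below is chosen here for illustration: q is the representation (posterior) computed from the image x_t, p is the prediction (prior) made from the recurrent state h_t alone, sg(·) blocks gradients, and α is the balancing weight. The DreamerV2 paper reports α = 0.8; treat the exact symbols as assumptions rather than the paper's own notation.

```latex
\mathcal{L}_{\mathrm{KL}}
  = \alpha \, \mathrm{KL}\big[\, \mathrm{sg}\big(q(z_t \mid h_t, x_t)\big) \,\big\|\, p(z_t \mid h_t) \,\big]
  + (1 - \alpha) \, \mathrm{KL}\big[\, q(z_t \mid h_t, x_t) \,\big\|\, \mathrm{sg}\big(p(z_t \mid h_t)\big) \,\big]
```

The first term updates only the predictor and the second only the representations, so any α above 0.5 moves the predictions toward the representations faster than the reverse.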

Long-term video predictions of the world model for holdout sequences. Each model receives 5 frames as input (not shown) and then predicts 45 steps forward given only actions. The video predictions are only used to gain insights into the quality of the world model. During planning, only compact representations are predicted, not images.

Measuring Atari Performance
DreamerV2 is the first world model that enables learning successful behaviors with human-level performance on the well-established and competitive Atari benchmark. We select the 55 games that many previous studies have in common and recommend this set of games for future work. Following the standard evaluation protocol, the agents are allowed 200M environment interactions using an action repeat of 4 and sticky actions (25% chance that an action is ignored and the previous action is repeated instead). We compare to the top model-free agents IQN and Rainbow, as well as to the well-known C51 and DQN agents implemented in the Dopamine framework.
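
The evaluation protocol's action repeat and sticky actions can be illustrated with a small wrapper. This is a sketch under assumptions, not the benchmark's actual code: `step_fn` is a hypothetical callable that executes one action and returns a reward, and applying the stickiness check on every repeated frame is one common convention that the text does not spell out.

```python
import random


class StickyActionRepeat:
    """Wraps a step function with action repeat and sticky actions."""

    def __init__(self, step_fn, repeat=4, sticky_prob=0.25, seed=0):
        self.step_fn = step_fn  # hypothetical callable: action -> reward
        self.repeat = repeat
        self.sticky_prob = sticky_prob
        self.rng = random.Random(seed)
        self.previous_action = 0

    def step(self, action):
        """Apply `action` for `repeat` frames and return the summed reward."""
        total_reward = 0.0
        for _ in range(self.repeat):
            # With probability sticky_prob, ignore the new action and
            # repeat the previously executed one instead.
            if self.rng.random() < self.sticky_prob:
                executed = self.previous_action
            else:
                executed = action
            total_reward += self.step_fn(executed)
            self.previous_action = executed
        return total_reward


executed_actions = []
env = StickyActionRepeat(lambda a: executed_actions.append(a) or 1.0)
reward = env.step(3)
```

Sticky actions make the environment stochastic, so an agent cannot succeed by memorizing a fixed sequence of actions.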

Different standards exist for aggregating the scores across the 55 games. Ideally, a new algorithm would perform better under all conditions. For all four aggregation methods, DreamerV2 indeed outperforms all compared model-free algorithms while using the same computational budget.

DreamerV2 outperforms the top model-free agents according to four methods for aggregating scores across the 55 Atari games. We introduce and recommend the Clipped Record Mean (right-most plot) as an informative and robust performance metric.

The first three aggregation methods were previously proposed in the literature. We identify important drawbacks in each and recommend a new aggregation method, the clipped record mean, to overcome their drawbacks.

Gamer Median. Most commonly, scores for each game are normalized by the performance of a human gamer that was assessed for the DQN paper and the median of the normalized scores of all games is reported. Unfortunately, the median ignores the scores of many simpler and harder games.
Gamer Mean. The mean takes the scores for all games into account but is mainly influenced by a small number of games where the human gamer performed poorly. This makes it easy for an algorithm to achieve large normalized scores on some games (e.g., James Bond, Video Pinball) that then dominate the mean.
Record Mean. Prior work recommends normalization based on the human world record instead, but such a metric is still overly influenced by a small number of games where it is easy for the artificial agents to outscore the human record.
Clipped Record Mean. We introduce a new metric that normalizes scores by the world record and clips them to not exceed the record. This yields an informative and robust metric that takes the performance on all games into account to an approximately equal amount.
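
The four aggregation methods can be compared on a toy example. The sketch below assumes the common convention of subtracting a random agent's score before dividing by the reference, which the text above does not spell out; the game names and all scores are made up for illustration.

```python
from statistics import mean, median


def normalize(score, random_score, reference_score):
    """Scale so a random agent scores 0 and the reference (gamer or record) scores 1."""
    return (score - random_score) / (reference_score - random_score)


def aggregate(agent, random_agent, gamer, record):
    """Compute the four aggregate scores from per-game score dictionaries."""
    gamer_norm = [normalize(agent[g], random_agent[g], gamer[g]) for g in agent]
    record_norm = [normalize(agent[g], random_agent[g], record[g]) for g in agent]
    return {
        "gamer_median": median(gamer_norm),
        "gamer_mean": mean(gamer_norm),
        "record_mean": mean(record_norm),
        # Clip at 1.0 so beating the record on one game cannot dominate the average.
        "clipped_record_mean": mean([min(s, 1.0) for s in record_norm]),
    }


# Made-up scores for three hypothetical games.
agent = {"game_a": 50.0, "game_b": 300.0, "game_c": 10.0}
random_agent = {"game_a": 0.0, "game_b": 0.0, "game_c": 0.0}
gamer = {"game_a": 100.0, "game_b": 10.0, "game_c": 100.0}
record = {"game_a": 200.0, "game_b": 100.0, "game_c": 1000.0}

results = aggregate(agent, random_agent, gamer, record)
```

In this toy data, `game_b` alone pushes the gamer mean and record mean far above the other games' scores, while the median ignores it entirely; only the clipped record mean reflects all three games to a similar degree.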

While many current algorithms exceed the human gamer baseline, they are still quite far behind the human world record. As shown in the right-most plot above, DreamerV2 leads by achieving 25% of the human record on average across games. Clipping the scores at the record line lets us focus our efforts on developing methods that come closer to the human world record on all of the games rather than exceeding it on just a few games.

What matters and what doesn’t
To gain insights into the important components of DreamerV2, we conduct an extensive ablation study. Importantly, we find that categorical representations offer a clear advantage over Gaussian representations despite the fact that Gaussians have been used extensively in prior works. KL balancing provides an even more substantial advantage over the KL regularizer used by most generative models.

By preventing the image reconstruction or reward prediction gradients from shaping the model states, we study their importance for learning successful representations. We find that DreamerV2 relies completely on universal information from the high-dimensional input images and its representations enable accurate reward predictions even when they were not trained using information about the reward. This mirrors the success of unsupervised representation learning in the computer vision community.

Atari performance for various ablations of DreamerV2 (Clipped Record Mean). Categorical representations, KL balancing, and learning about the images are crucial for the success of DreamerV2. Using reward information, which is specific to narrow tasks, offers no additional benefits for learning the world model.

Conclusion
We show how to learn a powerful world model to achieve human-level performance on the competitive Atari benchmark and outperform the top model-free agents. This result demonstrates that world models are a powerful approach for achieving high performance on reinforcement learning problems and are ready to use for practitioners and researchers. We see this as an indication that the success of unsupervised representation learning in computer vision [1, 2] is now starting to be realized in reinforcement learning in the form of world models. An unofficial implementation of DreamerV2 is available on GitHub and provides a productive starting point for future research projects. We see world models that leverage large offline datasets, long-term memory, hierarchical planning, and directed exploration as exciting avenues for future research.

Acknowledgements
This project is a collaboration with Timothy Lillicrap, Mohammad Norouzi, and Jimmy Ba. We further thank everybody on the Brain Team and beyond who commented on our paper draft and provided feedback at any point throughout the project.

Offsites

First mlverse survey results – software, applications, and beyond

Post date February 17, 2021

Thank you to everyone who participated in our first mlverse survey!

Wait: What even is the mlverse?

The mlverse originated as an abbreviation of multiverse¹, which, for its part, came into being as an intended allusion to the well-known tidyverse. As such, although mlverse software aims for seamless interoperability with the tidyverse, or even integration when feasible (see our recent post featuring a wholly tidymodels-integrated torch network architecture), the priorities are probably a bit different: Often, mlverse software’s raison d’être is to allow R users to do things that are commonly known to be done with other languages, such as Python.

As of today, mlverse development takes place mainly in two broad areas: deep learning, and distributed computing / ML automation. By its very nature, though, it is open to changing user interests and demands. Which leads us to the topic of this post.

The survey

GitHub issues and community questions are valuable feedback, but we wanted something more direct. We wanted a way to find out how you, our users, employ the software, and what for; what you think could be improved; what you wish existed but is not there (yet). To that end, we created a survey. Complementing software- and application-related questions for the above-mentioned broad areas, the survey had a third section, asking about how you perceive ethical and social implications of AI as applied in the “real world”.

A few things upfront:

Firstly, the survey was completely anonymous, in that we asked for neither identifiers (such as e-mail addresses) nor things that render one identifiable, such as gender or geographic location. In the same vein, we had collection of IP addresses disabled on purpose.

Secondly, just as GitHub issues are a biased sample, so must this survey’s participants be. Main venues of promotion were rstudio::global, Twitter, LinkedIn, and RStudio Community. As this was the first time we did such a thing (and under significant time constraints), not everything was planned to perfection – not wording-wise and not distribution-wise. Nevertheless, we got a lot of interesting, helpful, and often very detailed answers – and for the next time we do this, we’ll have our lessons learned!

Thirdly, all questions were optional, naturally resulting in different numbers of valid answers per question. On the other hand, not having to select a bunch of “not applicable” boxes freed respondents to spend time on topics that mattered to them.

As a final pre-remark, most questions allowed for multiple answers.

In sum, we ended up with 138 completed² surveys. Thanks again to everyone who participated, and especially, thank you for taking the time to answer the – many – free-form questions!

Deep learning

Areas and applications

Our first goal was to find out in which settings, and for what kinds of applications, deep-learning software is being used.

Overall, 72 respondents reported using DL in their jobs in industry, followed by academia (23), studies (21), spare time (43), and not-actually-using-but-wanting-to (24).

Of those working with DL in industry, more than twenty said they worked in consulting, finance, and healthcare (each). IT, education, retail, pharma, and transportation were each mentioned more than ten times:

Number of users reporting to use DL in industry. Smaller groups not displayed.

In academia, dominant fields (as per survey participants) were bioinformatics, genomics, and IT, followed by biology, medicine, pharmacology, and social sciences:

Number of users reporting to use DL in academia. Smaller groups not displayed.

What application areas matter to larger subgroups of “our” users? Nearly a hundred (of 138!) respondents said they used DL for some kind of image-processing application (including classification, segmentation, and object detection). Next up was time-series forecasting, followed by unsupervised learning.

The popularity of unsupervised DL was a bit unexpected; had we anticipated this, we would have asked for more detail here. So if you’re one of the people who selected this – or if you didn’t participate, but do use DL for unsupervised learning – please let us know a bit more in the comments!

Next, NLP was about on par with the former; followed by DL on tabular data, and anomaly detection. Bayesian deep learning, reinforcement learning, recommendation systems, and audio processing were still mentioned frequently.

Applications deep learning is used for. Smaller groups not displayed.

Frameworks and skills

We also asked what frameworks and languages participants were using for deep learning, and what they were planning on using in the future. Single-time mentions (e.g., deeplearning4J) are not displayed.

Framework / language used for deep learning. Single mentions not displayed.

An important thing for any software developer or content creator to investigate is the level of expertise present in their audience. It (nearly) goes without saying that expertise is very different from self-reported expertise. I’d like to be very cautious, then, in interpreting the results below.

While with regard to R skills³, the aggregate self-ratings look plausible (to me), I would have guessed a slightly different outcome re DL. Judging from other sources (like, e.g., GitHub issues), I tend to suspect more of a bimodal distribution (a far stronger version of the bimodality we’re already seeing, that is). To me, it seems like we have rather many users who know a lot about DL. In agreement with my gut feeling, though, is the bimodality itself – as opposed to, say, a Gaussian shape.

But of course, sample size is moderate, and sample bias is present.

Self-rated skills re R and deep learning.

Wishes and suggestions

Now, to the free-form questions. We wanted to know what we could do better.

I’ll address the most salient topics in order of frequency of mention.⁴ For DL, this is surprisingly easy (as opposed to Spark, as you’ll see).

“No Python”

The number one concern with deep learning from R, for survey respondents, clearly has to do not with R but with Python. This topic appeared in various forms, the most frequent being frustration over how hard it can be, dependent on the environment, to get Python dependencies for TensorFlow/Keras correct. (It also appeared as enthusiasm for torch, which we are very happy about.)

Let me clarify and add some context.

TensorFlow is a Python framework (nowadays subsuming Keras, which is why I’ll be addressing both of those as “TensorFlow” for simplicity) that is made available from R through packages tensorflow and keras. As with other Python libraries, objects are imported and accessible via reticulate. While tensorflow provides the low-level access, keras brings idiomatic-feeling, nice-to-use wrappers that let you forget about the chain of dependencies involved.

On the other hand, torch, a recent addition to mlverse software, is an R port of PyTorch that does not delegate to Python. Instead, its R layer directly calls into libtorch, the C++ library behind PyTorch. In that way, it is like a lot of heavy-duty R packages, making use of C++ for performance reasons.

Now, this is not the place for recommendations. Here are a few thoughts though.

Clearly, as one respondent remarked, as of today the torch ecosystem does not offer functionality on par with TensorFlow, and for that to change, time and – hopefully! more on that below – your, the community’s, help is needed. Why? Because torch is so young, for one; but also, there is a “systemic” reason! With TensorFlow, as we can access any symbol via the tf object, it is always possible, if inelegant, to do from R what you see done in Python. With the respective R wrappers nonexistent⁵, quite a few blog posts (see, e.g., https://blogs.rstudio.com/ai/posts/2020-04-29-encrypted_keras_with_syft/, or A first look at federated learning with TensorFlow) relied on this!

Switching to the topic of tensorflow’s Python dependencies causing problems with installation, my experience (from GitHub issues, as well as my own) has been that difficulties are quite system-dependent. On some OSes, complications seem to appear more often than on others; and low-control (to the individual user) environments like HPC clusters can make things especially difficult. In any case though, I have to (unfortunately) admit that when installation problems appear, they can be very tricky to solve.

tidymodels integration

The second most frequent mention clearly was the wish for tighter tidymodels integration. Here, we wholeheartedly agree. As of today, there is no automated way to accomplish this for torch models generically, but it can be done for specific model implementations.

Last week, the post “torch, tidymodels, and high-energy physics” featured the first tidymodels-integrated torch package. And there’s more to come. In fact, if you are developing a package in the torch ecosystem, why not consider doing the same? Should you run into problems, the growing torch community will be happy to help.

Documentation, examples, teaching materials

Thirdly, several respondents expressed the wish for more documentation, examples, and teaching materials. Here, the situation is different for TensorFlow than for torch.

For tensorflow, the website has a multitude of guides, tutorials, and examples. For torch, reflecting the discrepancy in respective lifecycles, materials are not that abundant (yet). However, after a recent refactoring, the website has a new, four-part Get started section addressed to both beginners in DL and experienced TensorFlow users curious to learn about torch. After this hands-on introduction, a good place to get more technical background would be the section on tensors, autograd, and neural network modules.

Truth be told, though, nothing would be more helpful here than contributions from the community. Whenever you solve even the tiniest problem (which is often how things appear to oneself), consider creating a vignette explaining what you did. Future users will be thankful, and a growing user base means that over time, it’ll be your turn to find that some things have already been solved for you!

Community, community, community

The remaining items discussed didn’t come up quite as often (individually), but taken together, they all have something in common: They all are wishes we happen to have, as well!

This definitely holds in the abstract – let me cite:

“Develop more of a DL community”

“Larger developer community and ecosystem. Rstudio has made great tools, but for applied work is has been hard to work against the momentum of working in Python.”

We wholeheartedly agree, and building a larger community is exactly what we’re trying to do. I like the formulation “a DL community” insofar as it is framework-independent. In the end, frameworks are just tools, and what counts is our ability to usefully apply those tools to problems we need to solve.

Concrete wishes include

More paper/model implementations (such as TabNet).
Facilities for easy data reshaping and pre-processing (e.g., in order to pass data to RNNs or 1d convnets in the expected 3-d format).
Probabilistic programming for torch (analogously to TensorFlow Probability).
A high-level library (such as fast.ai) based on torch.

In other words, there is a whole cosmos of useful things to create; and no small group alone can do it. This is where we hope we can build a community of people, each contributing what they’re most interested in, and to whatever extent they wish.

Spark

Areas and applications

For Spark, questions broadly paralleled those asked about deep learning.

Overall, judging from this survey (and unsurprisingly), Spark is predominantly used in industry (n = 39). For academic staff and students (taken together), n = 8. Seventeen people reported using Spark in their spare time, while 34 said they wanted to use it in the future.

Looking at industry sectors, we again find finance, consulting, and healthcare dominating.⁶

Number of users reporting to use Spark in industry. Smaller groups not displayed.

What do survey respondents do with Spark? Analyses of tabular data and time series dominate:

Applications Spark is used for. Smaller groups not displayed.

Frameworks and skills

As with deep learning, we wanted to know what language people use to do Spark. If you look at the below graphic, you see R appearing twice: once in connection with sparklyr, once with SparkR. What’s that about?

Both sparklyr and SparkR are R interfaces for Apache Spark, each designed and built with a different set of priorities and, consequently, trade-offs in mind.

sparklyr, on the one hand, will appeal to data scientists at home in the tidyverse, as they’ll be able to use all the data manipulation interfaces they’re familiar with from packages such as dplyr, DBI, tidyr, or broom.

SparkR, on the other hand, is a light-weight R binding for Apache Spark, and is bundled with the same. It’s an excellent choice for practitioners who are well-versed in Apache Spark and just need a thin wrapper to access various Spark functionalities from R.

Language / language bindings used to do Spark.

When asked to rate their expertise in R⁷ and Spark, respectively, respondents showed behavior similar to that observed for deep learning above: Most people seem to think more of their R skills than their theoretical Spark-related knowledge. However, even more caution should be exercised here than above: The number of responses here was significantly lower.

Self-rated skills re R and Spark.

Wishes and suggestions

Just like with DL, Spark users were asked what could be improved, and what they were hoping for.

Interestingly, answers were less “clustered” than for DL. While with DL, a few things cropped up again and again, and there were very few mentions of concrete technical features, here we see about the opposite: The great majority of wishes were concrete, technical, and often only came up once.

Probably though, this is not a coincidence.

Looking back at how sparklyr has evolved from 2016 until now, there is a persistent theme of it being the bridge that joins the Apache Spark ecosystem to numerous useful R interfaces, frameworks, and utilities (most notably, the tidyverse).

Many of our users’ suggestions were essentially a continuation of this theme. This holds, for example, for two features already available as of sparklyr 1.4 and 1.2, respectively: support for the Arrow serialization format and for Databricks Connect. It also holds for tidymodels integration (a frequent wish), a simple R interface for defining Spark UDFs (frequently desired, this one too), out-of-core direct computations on Parquet files, and extended time-series functionalities.

We’re thankful for the feedback and will evaluate carefully what could be done in each case. In general, integrating sparklyr with some feature X is a process to be planned carefully, as modifications could, in theory, be made in various places (sparklyr; X; both sparklyr and X; or even a newly-to-be-created extension). In fact, this is a topic deserving of much more detailed coverage, and has to be left to a future post.

Ethics and AI in society

To start, this is probably the section that will profit most from more preparation, the next time we do this survey. Due to time pressure, some (not all!) of the questions ended up being too suggestive, possibly resulting in social-desirability bias.⁸

Next time, we’ll try to avoid this, and questions in this area will likely look pretty different (more like scenarios or what-if stories)⁹. However, I was told by several people they’d been positively surprised by simply encountering this topic at all in the survey. So perhaps this is the main point – although there are a few results that I’m sure will be interesting by themselves!

Anticlimactically, the most non-obvious results are presented first.

“Are you worried about societal/political impacts of how AI is used in the real world?”

For this question, we had four answer options, formulated in a way that left no real “middle ground”. (The labels in the graphic below verbatim reflect those options.)

Number of users responding to the question ‘Are you worried about societal/political impacts of how AI is used in the real world?’ with the answer options given.

The next question is one to keep for future editions, as of all the questions in this section, it has the highest information content.

“When you think of the near future, are you more afraid of AI misuse or more hopeful about positive outcomes?”

Here, the answer was to be given by moving a slider, with -100 signifying “I tend to be more pessimistic”; and 100, “I tend to be more optimistic”. Although it would have been possible to remain undecided, choosing a value close to 0, we instead see a bimodal distribution:

When you think of the near future, are you more afraid of AI misuse or more hopeful about positive outcomes?

Why worry, and what about

The following two questions are those already alluded to as possibly being overly prone to social-desirability bias. They asked what applications people were worried about, and for what reasons, respectively. Both questions allowed respondents to select as many responses as they wanted, intentionally not forcing people to rank things that are not comparable (the way I see it). In both cases though, it was possible to explicitly indicate None (corresponding to “I don’t really find any of these problematic” and “I am not extensively worried”, respectively).

What applications of AI do you feel are most problematic?

Number of users selecting the respective application in response to the question: What applications of AI do you feel are most problematic?

If you are worried about misuse and negative impacts, what exactly is it that worries you?

Number of users selecting the respective impact in response to the question: If you are worried about misuse and negative impacts, what exactly is it that worries you?

Complementing these questions, it was possible to enter further thoughts and concerns in free-form. Although I can’t cite everything that was mentioned here, recurring themes were:

Misuse of AI for the wrong purposes, by the wrong people, and at scale.
Not feeling responsible for how one’s algorithms are used (the I’m just a software engineer topos).
Reluctance, in AI but in society overall as well, to even discuss the topic (ethics).

Finally, although this was mentioned just once, I’d like to relay a comment that went in a direction absent from all provided answer options, but that probably should have been there already: AI being used to construct social credit systems.

“It’s also that you somehow might have to learn to game the algorithm, which will make AI application forcing us to behave in some way to be scored good. That moment scares me when the algorithm is not only learning from our behavior but we behave so that the algorithm predicts us optimally (turning every use case around).”

Conclusion

This has become a long text. But seeing how much time respondents took to answer the many questions, often including lots of detail in the free-form answers, it seemed a matter of decency to go into some detail in the analysis and report as well.

Thanks again to everyone who took part! We hope to make this a recurring thing, and will strive to design the next edition in a way that makes answers even more information-rich.

Thanks for reading!


Offsites

Rearranging the Visual World

Post date February 16, 2021

Posted by Andy Zeng and Pete Florence, Research Scientists, Robotics at Google

Rearranging objects (such as organizing books on a bookshelf, moving utensils on a dinner table, or pushing piles of coffee beans) is a fundamental skill that can enable robots to physically interact with our diverse and unstructured world. While easy for people, accomplishing such tasks remains an open research challenge for embodied machine learning (ML) systems, as it requires both high-level and low-level perceptual reasoning. For example, when stacking a pile of books, one might consider where the books should be stacked, and in which order, while ensuring that the edges of the books align with each other to form a neat pile.

Across many application areas in ML, simple differences in model architecture can lead to vastly different generalization properties. Therefore, one might ask whether there are certain deep network architectures that favor simple underlying elements of the rearrangement problem. Convolutional architectures, for example, are common in computer vision as they encode translational invariance, yielding the same response even if an image is shifted, while Transformer architectures are common in language processing because they exploit self-attention to capture long-range contextual dependencies. In robotics applications, one common architectural element is to use object-centric representations such as poses, keypoints, or object descriptors inside learned models, but these representations require additional training data (often manually annotated) and struggle to describe difficult scenarios such as deformables (e.g., playdough), fluids (honey), or piles of stuff (chopped onions).

Today, we present the Transporter Network, a simple model architecture for learning vision-based rearrangement tasks, which appeared as a publication and plenary talk during CoRL 2020. Transporter Nets use a novel approach to 3D spatial understanding that avoids reliance on object-centric representations, making them general for vision-based manipulation but far more sample efficient than benchmarked end-to-end alternatives. As a consequence, they are fast and practical to train on real robots. We are also releasing an accompanying open-source implementation of Transporter Nets together with Ravens, our new simulated benchmark suite of ten vision-based manipulation tasks.

Transporter Networks: Rearranging the Visual World for Robotic Manipulation
The key idea behind the Transporter Network architecture is that one can formulate the rearrangement problem as learning how to move a chunk of 3D space. Rather than relying on an explicit definition of objects (which is bound to struggle at capturing all edge cases), 3D space is a much broader definition for what could serve as the atomic units being rearranged, and can broadly encompass an object, part of an object, or multiple objects, etc. Transporter Nets leverage this structure by capturing a deep representation of the 3D visual world, then overlaying parts of it on itself to imagine various possible rearrangements of 3D space. It then chooses the rearrangements that best match those it has seen during training (e.g., from expert demonstrations), and uses them to parameterize robot actions. This formulation allows Transporter Nets to generalize to unseen objects and enables them to better exploit geometric symmetries in the data, so that they can extrapolate to new scene configurations. Transporter Nets are applicable to a wide variety of rearrangement tasks for robotic manipulation, expanding beyond our earlier models, such as affordance-based manipulation and TossingBot, that focus only on grasping and tossing.

Transporter Nets capture a deep representation of the visual world, then overlay parts of it on itself to imagine various possible rearrangements of 3D space to find the best one and inform robot actions.
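
The "overlay parts of it on itself" step described above can be pictured as a cross-correlation between a crop of deep features around the pick location and the features of the whole scene. The following is only a minimal NumPy sketch of that idea, not the released implementation: the function name `place_heatmap`, the use of two separate feature maps (`query_feat` and `key_feat`), and the crop size are illustrative assumptions, and rotations of the crop are left out for brevity.

```python
import numpy as np

def place_heatmap(query_feat, key_feat, pick_rc, crop=3):
    """Score every placement by overlaying a feature crop on the scene.

    query_feat, key_feat: (H, W, C) feature maps (hypothetical names).
    pick_rc: (row, col) of the pick location.
    """
    h, w, _ = key_feat.shape
    half = crop // 2
    pad = ((half, half), (half, half), (0, 0))
    # Zero-pad so that crops near the image border stay in bounds.
    query_p = np.pad(query_feat, pad)
    key_p = np.pad(key_feat, pad)
    r, c = pick_rc
    # Crop of query features centred on the pick location.
    kernel = query_p[r:r + crop, c:c + crop, :]
    heat = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            # Overlay the crop at (i, j) and measure how well it matches.
            window = key_p[i:i + crop, j:j + crop, :]
            heat[i, j] = np.sum(window * kernel)
    return heat
```

The highest-scoring cell of the returned map, for example `np.unravel_index(np.argmax(heat), heat.shape)`, would then serve as the placement candidate.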

Ravens Benchmark
To evaluate the performance of Transporter Nets in a consistent environment for fair comparisons to baselines and ablations, we developed Ravens, a benchmark suite of ten simulated vision-based rearrangement tasks. Ravens features a Gym API with a built-in stochastic oracle to evaluate the sample efficiency of imitation learning methods. Ravens avoids assumptions that cannot transfer to a real setup: observation data contains only RGB-D images and camera parameters; actions are end effector poses (converted into joint positions with inverse kinematics).

Experiments on these ten tasks show that Transporter Nets are orders of magnitude more sample efficient than other end-to-end methods, and are capable of achieving over 90% success on many tasks with just 100 demonstrations, while the baselines struggle to generalize with the same amount of data. In practice, this makes collecting enough demonstrations a more viable option for training these models on real robots (which we show examples of below).

Our new Ravens benchmark includes ten simulated vision-based manipulation tasks, including pushing and pick-and-place, for which experiments show that Transporter Nets are orders of magnitude more sample efficient than other end-to-end methods. Ravens features a Gym API with a built-in stochastic oracle to evaluate the sample efficiency of imitation learning methods.

Highlights
Given 10 example demonstrations, Transporter Nets can learn pick and place tasks such as stacking plates (surprisingly easy to misplace!), multimodal tasks like aligning any corner of a box to a marker on the tabletop, or building a pyramid of blocks.

By leveraging closed-loop visual feedback, Transporter Nets have the capacity to learn various multi-step sequential tasks with a modest number of demonstrations, such as moving disks for Tower of Hanoi, palletizing boxes, or assembling kits of new objects not seen during training. These tasks have considerably “long horizons”, meaning that to solve the task the model must correctly sequence many individual choices. Policies also tend to learn emergent recovery behaviors.

One surprising thing about these results was that beyond just perception, the models were starting to learn behaviors that resemble high-level planning. For example, to solve Tower of Hanoi, the models have to pick which disk to move next, which requires recognizing the state of the board based on the current visible disks and their positions. With a box-palletizing task, the models must locate the empty spaces of the pallet, and identify how new boxes can fit into those voids. Such behaviors are exciting because they suggest that with all the baked-in invariances, the model can focus its capacity on learning the more high-level patterns in manipulation.

Transporter Nets can also learn tasks that use any motion primitive defined by two end effector poses, such as pushing piles of small objects into a target set, or reconfiguring a deformable rope to connect the two end-points of a 3-sided square. This suggests that rigid spatial displacements can serve as useful priors for nonrigid ones.

Conclusion
Transporter Nets bring a promising approach to learning vision-based manipulation, but are not without limitations. For example, they can be susceptible to noisy 3D data, we have only demonstrated them for sparse waypoint-based control with motion primitives, and it remains unclear how to extend them beyond spatial action spaces to force or torque-based actions. But overall, we are excited about this direction of work, and we hope that it provides inspiration for extensions beyond the applications we’ve discussed. For more details, please check out our paper.

Acknowledgements
This research was done by Andy Zeng, Pete Florence, Jonathan Tompson, Stefan Welker, Jonathan Chien, Maria Attarian, Travis Armstrong, Ivan Krasin, Dan Duong, Vikas Sindhwani, and Johnny Lee, with special thanks to Ken Goldberg, Razvan Surdulescu, Daniel Seita, Ayzaan Wahid, Vincent Vanhoucke, Anelia Angelova, Kendra Byrne, for helpful feedback on writing; Sean Snyder, Jonathan Vela, Larry Bisares, Michael Villanueva, Brandon Hurd for operations and hardware support; Robert Baruch for software infrastructure, Jared Braun for UI contributions; Erwin Coumans for PyBullet advice; Laura Graesser for video narration.

Offsites

3D Scene Understanding with TensorFlow 3D

Post date February 11, 2021

Posted by Alireza Fathi, Research Scientist and Rui Huang, AI Resident, Google Research

The growing ubiquity of 3D sensors (e.g., Lidar, depth sensing cameras and radar) over the last few years has created a need for scene understanding technology that can process the data these devices capture. Such technology can enable machine learning (ML) systems that use these sensors, like autonomous cars and robots, to navigate and operate in the real world, and can create an improved augmented reality experience on mobile devices. The field of computer vision has recently begun making good progress in 3D scene understanding, including models for mobile 3D object detection, transparent object detection, and more, but entry to the field can be challenging due to the limited availability of tools and resources that can be applied to 3D data.

In order to further improve 3D scene understanding and reduce barriers to entry for interested researchers, we are releasing TensorFlow 3D (TF 3D), a highly modular and efficient library that is designed to bring 3D deep learning capabilities into TensorFlow. TF 3D provides a set of popular operations, loss functions, data processing tools, models and metrics that enables the broader research community to develop, train and deploy state-of-the-art 3D scene understanding models.

TF 3D contains training and evaluation pipelines for state-of-the-art 3D semantic segmentation, 3D object detection and 3D instance segmentation, with support for distributed training. It also enables other potential applications like 3D object shape prediction, point cloud registration and point cloud densification. In addition, it offers a unified dataset specification and configuration for training and evaluation of the standard 3D scene understanding datasets. It currently supports the Waymo Open, ScanNet, and Rio datasets. However, users can freely convert other popular datasets, such as NuScenes and Kitti, into a similar format and use them in the pre-existing or custom created pipelines, and can leverage TF 3D for a wide variety of 3D deep learning research and applications, from quickly prototyping and trying new ideas to deploying a real-time inference system.

An example output of the 3D object detection model in TF 3D on a frame from Waymo Open Dataset is shown on the left. An example output of the 3D instance segmentation model on a scene from ScanNet dataset is shown on the right.

Here, we will present the efficient and configurable sparse convolutional backbone that is provided in TF 3D, which is the key to achieving state-of-the-art results on various 3D scene understanding tasks. Furthermore, we will go over each of the three pipelines that TF 3D currently supports: 3D semantic segmentation, 3D object detection and 3D instance segmentation.

3D Sparse Convolutional Network
The 3D data captured by sensors often consists of a scene that contains a set of objects of interest (e.g. cars, pedestrians, etc.) surrounded mostly by open space, which is of limited (or no) interest. As such, 3D data is inherently sparse. In such an environment, a standard implementation of convolutions would be computationally intensive and consume a large amount of memory. So, in TF 3D we use submanifold sparse convolution and pooling operations, which are designed to process 3D sparse data more efficiently. Sparse convolutional models are core to the state-of-the-art methods applied in most outdoor self-driving (e.g. Waymo, NuScenes) and indoor benchmarks (e.g. ScanNet).
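
To make the idea of a submanifold sparse convolution more concrete, here is a small, deliberately slow Python sketch. It is not the TF 3D operation (which is implemented with custom CUDA kernels); the dictionary-based voxel storage and the function name `submanifold_conv3d` are assumptions made for illustration. The defining property it shows is that outputs are computed only at voxels that are already active, and only active neighbours contribute, so empty space costs nothing and the set of active voxels does not grow.

```python
import itertools
import numpy as np

def submanifold_conv3d(active, weights):
    """active: {(x, y, z): feature vector of length c_in}.
    weights: {(dx, dy, dz): matrix of shape (c_in, c_out)}.
    """
    c_out = next(iter(weights.values())).shape[1]
    out = {}
    # Visit active voxels only; empty space is never touched.
    for (x, y, z) in active:
        acc = np.zeros(c_out)
        for (dx, dy, dz), w in weights.items():
            neighbour = active.get((x + dx, y + dy, z + dz))
            if neighbour is not None:
                acc += np.asarray(neighbour) @ w
        out[(x, y, z)] = acc
    return out

# A 3x3x3 filter with one input and one output channel, all weights 1.
offsets = itertools.product((-1, 0, 1), repeat=3)
ones_filter = {o: np.ones((1, 1)) for o in offsets}
```

With `ones_filter`, each output is simply the sum of the features of the active voxels in the 3x3x3 neighbourhood, and the output has exactly the same active voxels as the input.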

We also use various CUDA techniques to speed up the computation (e.g., hashing, partitioning / caching the filter in shared memory, and using bit operations). Experiments on the Waymo Open dataset show that this implementation is around 20x faster than a well-designed implementation with pre-existing TensorFlow operations.

TF 3D then uses the 3D submanifold sparse U-Net architecture to extract a feature for each voxel. The U-Net architecture has proven to be effective by letting the network extract both coarse and fine features and combining them to make the predictions. The U-Net network consists of three modules, an encoder, a bottleneck, and a decoder, each of which consists of a number of sparse convolution blocks with possible pooling or un-pooling operations.

A 3D sparse voxel U-Net architecture. Note that a horizontal arrow takes in the voxel features and applies a submanifold sparse convolution to it. An arrow that is moving down performs a submanifold sparse pooling. An arrow that is moving up will gather back the pooled features, concatenate them with the features coming from the horizontal arrow, and perform a submanifold sparse convolution on the concatenated features.

The sparse convolutional network described above is the backbone for the 3D scene understanding pipelines that are offered in TF 3D. Each of the models described below uses this backbone network to extract features for the sparse voxels, and then adds one or multiple additional prediction heads to infer the task of interest. The user can configure the U-Net network by changing the number of encoder / decoder layers and the number of convolutions in each layer, and by modifying the convolution filter sizes, which enables a wide range of speed / accuracy tradeoffs to be explored through the different backbone configurations.

3D Semantic Segmentation
The 3D semantic segmentation model has only one output head for predicting the per-voxel semantic scores, which are mapped back to points to predict a semantic label per point.

3D semantic segmentation of an indoor scene from ScanNet dataset.
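
The step of mapping per-voxel scores back to points can be sketched as follows. This is an illustrative NumPy version under simple assumptions (a regular voxel grid with a fixed `voxel_size`, and hypothetical helper names `voxelize` and `point_labels`), not the code used in TF 3D.

```python
import numpy as np

def voxelize(points, voxel_size):
    """Return the occupied voxel coordinates and, per point, its voxel index."""
    coords = np.floor(points / voxel_size).astype(np.int64)
    voxels, point_to_voxel = np.unique(coords, axis=0, return_inverse=True)
    # reshape(-1) keeps the index array flat across NumPy versions.
    return voxels, point_to_voxel.reshape(-1)

def point_labels(voxel_scores, point_to_voxel):
    """Give every point the highest-scoring class of the voxel it falls in."""
    voxel_labels = np.argmax(voxel_scores, axis=1)
    return voxel_labels[point_to_voxel]
```

In this sketch, `voxel_scores` stands in for the output of the semantic head, with one row per occupied voxel in the order returned by `voxelize`.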

3D Instance Segmentation
In 3D instance segmentation, in addition to predicting semantics, the goal is to group the voxels that belong to the same object together. The 3D instance segmentation algorithm used in TF 3D is based on our previous work on 2D image segmentation using deep metric learning. The model predicts a per-voxel instance embedding vector as well as a semantic score for each voxel. The instance embedding vectors map the voxels to an embedding space where voxels that correspond to the same object instance are close together, while those that correspond to different objects are far apart. In this case, the input is a point cloud instead of an image, and it uses a 3D sparse network instead of a 2D image network. At inference time, a greedy algorithm picks one instance seed at a time, and uses the distance between the voxel embeddings to group them into segments.
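
The greedy grouping at inference time might look roughly like the sketch below. The text above only says that one seed is picked at a time and that embedding distances are used for grouping; the per-voxel `seed_scores` used to order the seeds and the fixed `radius` threshold are assumptions introduced here for illustration.

```python
import numpy as np

def greedy_instances(embeddings, seed_scores, radius):
    """Assign an instance id to every voxel by greedy seed growing.

    embeddings: (N, D) per-voxel instance embeddings.
    seed_scores: (N,) hypothetical scores used to order the seeds.
    radius: embedding distance below which a voxel joins the seed.
    """
    instance = np.full(len(embeddings), -1)
    next_id = 0
    # Consider candidate seeds from highest to lowest score.
    for seed in np.argsort(-seed_scores):
        if instance[seed] != -1:
            continue  # already claimed by an earlier segment
        dist = np.linalg.norm(embeddings - embeddings[seed], axis=1)
        members = (dist < radius) & (instance == -1)
        instance[members] = next_id
        next_id += 1
    return instance
```

Because voxels of the same object are trained to lie close together in the embedding space, a single distance threshold around each seed is enough to collect a segment in this simplified setting.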

3D Object Detection
The 3D object detection model predicts per-voxel size, center, and rotation matrices and the object semantic scores. At inference time, a box proposal mechanism is used to reduce the hundreds of thousands of per-voxel box predictions to a few accurate box proposals. At training time, box prediction and classification losses are applied to the per-voxel predictions. We apply a Huber loss on the distance between the predicted and ground-truth box corners. Since the function that estimates the box corners from its size, center and rotation matrix is differentiable, the loss will automatically propagate back to those predicted object properties. We use a dynamic box classification loss that classifies a box that strongly overlaps with the ground-truth as positive and classifies the non-overlapping boxes as negative.

Our 3D object detection results on ScanNet dataset.
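
The corner-based box loss described above can be illustrated with a few lines of NumPy. This is a sketch under assumptions, not the TF 3D loss: the function names, the averaging over the eight corners, and the Huber threshold `delta` are all chosen here for illustration. In a real training setup the same computation would be written with differentiable tensor operations so that gradients reach the size, center and rotation predictions.

```python
import itertools
import numpy as np

def box_corners(size, center, rotation):
    """Eight corners of a box from its size (3,), center (3,), rotation (3, 3)."""
    signs = np.array(list(itertools.product((-0.5, 0.5), repeat=3)))
    local = signs * size  # corners in the box's own frame
    return local @ rotation.T + center

def huber(residual, delta=1.0):
    """Quadratic for small residuals, linear for large ones."""
    abs_r = np.abs(residual)
    quadratic = 0.5 * abs_r ** 2
    linear = delta * (abs_r - 0.5 * delta)
    return np.where(abs_r <= delta, quadratic, linear)

def corner_loss(pred, target, delta=1.0):
    """Mean Huber loss on distances between matching corners.

    pred and target are (size, center, rotation) tuples.
    """
    diff = box_corners(*pred) - box_corners(*target)
    dist = np.linalg.norm(diff, axis=1)
    return huber(dist, delta).mean()
```

For example, shifting a unit box by 3 along one axis moves every corner by a distance of 3, which lies in the linear regime of the Huber loss for `delta=1.0`.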

In our recent paper, “DOPS: Learning to Detect 3D Objects and Predict their 3D Shapes”, we describe in detail the single-stage weakly supervised learning algorithm used for object detection in TF 3D. In addition, in follow-up work, we extended the 3D object detection model to leverage temporal information by proposing a sparse LSTM-based multi-frame model. We go on to show that this temporal model outperforms the frame-by-frame approach by 7.5% on the Waymo Open dataset.

The 3D object detection and shape prediction model introduced in the DOPS paper. A 3D sparse U-Net is used to extract a feature vector for each voxel. The object detection module uses these features to propose 3D boxes and semantic scores. At the same time, the other branch of the network predicts a shape embedding that is used to output a mesh for each object.

Ready to Get Started?
We’ve certainly found this codebase to be useful for our 3D computer vision projects, and we hope that you will as well. Contributions to the codebase are welcome. Please stay tuned for our own further updates to the framework. To get started, please visit our GitHub repository.

Acknowledgements
The release of the TensorFlow 3D codebase and model has been the result of widespread collaboration among Google researchers with feedback and testing from product groups. In particular we want to highlight the core contributions by Alireza Fathi and Rui Huang (work performed while at Google), with special additional thanks to Guangda Lai, Abhijit Kundu, Pei Sun, Thomas Funkhouser, David Ross, Caroline Pantofaru, Johanna Wald, Angela Dai and Matthias Niessner.

Offsites

Uncovering Unknown Unknowns in Machine Learning

Post date February 11, 2021

Posted by Lora Aroyo and Praveen Paritosh, Research Scientists, Google Research

The performance of machine learning (ML) models depends on both the learning algorithms and the data used for training and evaluation. The role of the algorithms is well studied and the focus of a multitude of challenges, such as SQuAD, GLUE, ImageNet, and many others. In addition, there have been efforts to also improve the data, including a series of workshops addressing issues for ML evaluation. In contrast, research and challenges that focus on the data used for evaluation of ML models are not commonplace. Furthermore, many evaluation datasets contain items that are easy to evaluate, e.g., photos with a subject that is easy to identify, and thus they miss the natural ambiguity of real-world context. The absence of ambiguous real-world examples in evaluation undermines the ability to reliably test machine learning performance, which makes ML models prone to develop “weak spots”, i.e., classes of examples that are difficult or impossible for a model to accurately evaluate, because that class of examples is missing from the evaluation set.

To address the problem of identifying these weaknesses in ML models, we recently launched the Crowdsourcing Adverse Test Sets for Machine Learning (CATS4ML) Data Challenge at HCOMP 2020 (open until 30 April, 2021 to researchers and developers worldwide). The goal of the challenge is to raise the bar in ML evaluation sets and to find as many examples as possible that are confusing or otherwise problematic for algorithms to process. CATS4ML relies on people’s abilities and intuition to spot new data examples that ML models classify with confidence, but incorrectly.

What are ML “Weak Spots”?
There are two categories of weak spots: known unknowns and unknown unknowns. Known unknowns are examples for which a model is unsure about the correct classification. The research community continues to study this in a field known as active learning, and has found the solution to be, in very general terms, to interactively solicit new labels from people on uncertain examples. For example, if a model is not certain whether or not the subject of a photo is a cat, a person is asked to verify; but if the system is certain, a person is not asked. While there is room for improvement in this area, what is comforting is that the confidence of the model is correlated with its performance, i.e., one can see what the model doesn’t know.

Unknown unknowns, on the other hand, are examples where a model is confident about its answer, but is actually wrong. Efforts to proactively discover unknown unknowns (e.g., Attenberg 2015 and Crawford 2019) have helped uncover a multitude of unintended machine behaviours. In contrast to such approaches for the discovery of unknown unknowns, generative adversarial networks (GANs) generate unknown unknowns for image recognition models in the form of optical illusions for computers that cause deep learning models to make mistakes beyond human perception. While GANs uncover model exploits in the event of an intentional manipulation, real-world examples can better highlight a model’s failures in its day-to-day performance. These real-world examples are the unknown unknowns of interest to CATS4ML — the challenge aims to gather unmanipulated examples that humans can reliably interpret but on which many ML models would confidently disagree.

Example illustrating how optical illusions for computers caused by adversarial noise help discover machine manipulated unknown unknowns for ML models (based on Brown 2018).
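
The distinction between the two kinds of weak spots described above can be written down as a tiny decision rule. The sketch below is only an illustration of the definitions: the function name `weak_spot` and the confidence `threshold` of 0.8 are assumptions, not part of the challenge.

```python
def weak_spot(confidence, predicted, actual, threshold=0.8):
    """Categorize one model prediction.

    confidence: the model's confidence in its predicted label (0 to 1).
    threshold: hypothetical cut-off above which the model counts as confident.
    """
    if confidence < threshold:
        # The model signals its own uncertainty, e.g. a case where
        # active learning would ask a person for a label.
        return "known unknown"
    if predicted != actual:
        # Confident and wrong: nothing in the model's output reveals the error.
        return "unknown unknown"
    return "ok"
```

Note that finding an unknown unknown requires the `actual` label from a person, which is why the challenge relies on human judgment rather than on model confidence.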

First Edition of CATS4ML Data Challenge: Open Images Dataset
The CATS4ML Data Challenge focuses on visual recognition, using images and labels from the Open Images Dataset. The target images for the challenge are selected from the Open Images Dataset along with a set of 24 target labels from the same dataset. The challenge participants are invited to invent new and creative ways to explore this existing publicly available dataset and, focussed on a list of pre-selected target labels, discover examples of unknown unknowns for ML models.

Examples from the Open Images Dataset as possible unknown unknowns for ML models.

CATS4ML is a complementary effort to FAIR’s recently introduced DynaBench research platform for dynamic data collection. Where DynaBench tackles issues with static benchmarks using ML models with humans in the loop, CATS4ML focuses on improving evaluation datasets for ML by encouraging the exploration of existing ML benchmarks for adverse examples that can be unknown unknowns. The results will help detect and avoid future errors, and also will give insights into model explainability.

In this way, CATS4ML aims to raise greater awareness of the problem by providing dataset resources that developers can use to uncover the weak spots of their algorithms. This will also inform researchers on how to create benchmark datasets for machine learning that are more balanced, diverse and socially aware.

Get Involved
We invite the global community of ML researchers and practitioners to join us in the effort of discovering interesting, difficult examples from the Open Images Dataset. Register on the challenge website, download the target images and labeled data, contribute the images you discover and compete to become the winning participant!

To score points in this competition, participants should submit a set of image-label pairs that will be confirmed by human-in-the-loop raters, whose votes should be in disagreement with the average machine score for the label over a number of machine learning models.

An example of how a submitted image can score points. The same image can score as a false positive (Left) and as a false negative (Right) with two different labels. In both cases the human verification is in disagreement with the machine score. Participants score on submitted image-label pairs, which means that one and the same image can be an example of an ML unknown unknown for different labels.
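
As a rough illustration of the scoring idea, the sketch below compares a majority vote of human raters with the average machine score for one image-label pair. The exact scoring rules are defined on the challenge website; the majority vote, the 0.5 `threshold`, and the function name `score_pair` are assumptions made here.

```python
def score_pair(human_votes, machine_scores, threshold=0.5):
    """Classify one submitted image-label pair.

    human_votes: list of booleans, True if a rater says the label applies.
    machine_scores: scores for the label from several models (0 to 1).
    Returns "false positive", "false negative", or None if there is no disagreement.
    """
    human_yes = sum(human_votes) / len(human_votes) > 0.5
    machine_yes = sum(machine_scores) / len(machine_scores) > threshold
    if machine_yes and not human_yes:
        return "false positive"  # models see the label, people do not
    if human_yes and not machine_yes:
        return "false negative"  # people see the label, models do not
    return None
```

Since the function is called per image-label pair, the same image can be scored once as a false positive for one label and once as a false negative for another.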

The challenge is open until 30 April, 2021 to researchers and developers worldwide. To learn more about CATS4ML and how to join, please review these slides and visit the challenge website.

Acknowledgements
The release of the CATS4ML Data Challenge has been possible thanks to the hard work of a lot of people including, but not limited to, the following (in alphabetical order of last name): Osman Aka, Ken Burke, Tulsee Doshi, Mig Gerard, Victor Gomes, Shahab Kamali, Igor Karpov, Devi Krishna, Daphne Luong, Carey Radebaugh, Jamie Taylor, Nithum Thain, Kenny Wibowo, Ka Wong, and Tong Zhou.